Classification for multi row observation: Long format to Wide format always efficient?

Thread starter: Salih (Guest)
I have a table of observations, or rather 'grouped' observations, where each group represents a deal and each row represents a product. But the prediction is to be done at the deal level. Below is the sample dataset.

Sample Dataset :

Code:
import pandas as pd

df = pd.DataFrame({'deal': ['deal1', 'deal1', 'deal2', 'deal2', 'deal3', 'deal3'],
                   'product': ['prd_1', 'prd_2', 'prd_1', 'prd_2', 'prd_1', 'prd_2'],
                   'Quantity': [2, 1, 5, 3, 6, 7],
                   'Total Price': [10, 7, 25, 24, 30, 56],
                   'Result': ['Won', 'Won', 'Lost', 'Lost', 'Won', 'Won']})

My Approach: Flatten the data with pivot_table so that we get one row per deal (a rough sketch is below), and then proceed with the classification modelling, probably logistic regression or gradient boosting.
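Roughly, the flattening step looks like this (a minimal sketch; the flattened column names and the way Result is re-attached are just one possible choice):

Code:
wide = df.pivot_table(index='deal',
                      columns='product',
                      values=['Quantity', 'Total Price'])
# Flatten the MultiIndex columns, e.g. ('Quantity', 'prd_1') -> 'Quantity_prd_1'
wide.columns = ['_'.join(col) for col in wide.columns]
# Re-attach the deal-level target (Result is constant within a deal)
wide = wide.join(df.groupby('deal')['Result'].first()).reset_index()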

But in the above case we had:

  • 1 column to be pivoted (product, with 2 unique values)
  • 2 measures (Quantity and Total Price) as the series/values

resulting in 4 pivoted columns. The wide-format table is shown below:

[Wide-format table image: https://i.sstatic.net/Yd4y0Xx7.png]

Question/Problem/Thought:

Is this always the best way in cases like these? The problem I see (or maybe it isn't one?) is that when more than one column needs to be pivoted, and the number of unique value combinations across them is large, the table can get very wide.
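To put rough numbers on that concern (the counts below are hypothetical, not from my data):

Code:
# Back-of-the-envelope for the width blow-up; all counts are hypothetical.
n_products = 50   # unique values in the pivoted 'product' column
n_regions = 20    # a second, invented categorical column also being pivoted
n_measures = 2    # Quantity and Total Price
print(n_products * n_regions * n_measures)   # -> 2000 wide-format columns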

I would be grateful to hear about alternative, more efficient ways to prepare the dataset for training, if any!