OiO.lk Community platform!

Python: Only 2 unique column names in dataframe, 3105 columns total. How to get average of row, grouped by unique column name

  • Thread starter: careless_caramel
My dataframe (image): https://i.sstatic.net/TMnDm9DJ.png

My dataframe is in the linked image. To keep it simple, it currently looks something like this:

Gene    Cell_A  Cell_B  Cell_B  Cell_B  Cell_A
Gene_A  0       4       35.5    4.5     3.5
Gene_B  1.3     52      3.4     2.4     0
Gene_C  2.3     3.3     32      0       2

There are 3105 columns of Cell_A and Cell_B combined, and around 13k (I think?) rows of genes. What I want is the average value per gene (row), grouped by the unique column name. So in the end I would have just 2 columns, Cell_A and Cell_B, holding the average (per gene, i.e. per row) as data.

I expect this has something to do with either agg or groupby, but I have no idea where to even start. If you can offer some guidance, I would be very grateful!
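
For illustration, here is a minimal sketch of one possible approach (not necessarily the only or the best one). It assumes the gene names are the index of the dataframe; if "Gene" is an ordinary column, call df.set_index("Gene") first. The variable names df and means are placeholders, and the toy values are copied from the small table in the question. The idea is to transpose, so the repeated column labels become row labels, group by those labels, take the mean, and transpose back:

```python
import pandas as pd

# Toy data from the question; "Gene" is assumed to be the index.
df = pd.DataFrame(
    [
        [0.0, 4.0, 35.5, 4.5, 3.5],
        [1.3, 52.0, 3.4, 2.4, 0.0],
        [2.3, 3.3, 32.0, 0.0, 2.0],
    ],
    index=pd.Index(["Gene_A", "Gene_B", "Gene_C"], name="Gene"),
    columns=["Cell_A", "Cell_B", "Cell_B", "Cell_B", "Cell_A"],
)

# Transpose so the duplicated column labels become the row index,
# group rows that share a label, average them, then transpose back.
means = df.T.groupby(level=0).mean().T

print(means)
```

The result has one row per gene and one column per unique label (Cell_A and Cell_B). The transpose is used here because grouping along columns with groupby(axis=1) is deprecated in recent pandas releases; the same code should apply unchanged to the full 3105-column frame.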