OiO.lk Community platform!

Polars: pandas equivalent of selecting column names from a list

  • Thread starter: Einar (Guest)

I have two DataFrames in polars: one that describes the metadata and one that holds the actual data (LazyFrames are used because the actual data is larger):

Code:
import polars as pl
df = pl.LazyFrame(
    {
        "ID": ["CX1", "CX2", "CX3"],
        "Sample1": [1, 1, 1],
        "Sample2": [2, 2, 2],
        "Sample3": [4, 4, 4],
    }
)

df_meta = pl.LazyFrame(
    {
        "sample": ["Sample1", "Sample2", "Sample3", "Sample4"],
        "qc": ["pass", "pass", "fail", "pass"]
    }
)

I need to select the columns in df for samples that have passing qc using the information in df_meta. As you can see, df_meta has an additional sample, which of course we are not interested in as it's not part of our data.

In pandas, I'd do (not very elegant but does the job):

Code:
df.loc[:, df.columns.isin(df_meta.query("qc == 'pass'")["sample"])]

However, I'm not sure how to do this in polars. Reading through SO and the docs didn't give me a definite answer.

I've tried:

Code:
df.with_context(
    df_meta.filter(pl.col("qc") == "pass").select(pl.col("sample").alias("meta_ids"))
).with_columns(
    pl.all().is_in("meta_ids")
).collect()

This, however, raises an exception:

Code:
InvalidOperationError: `is_in` cannot check for String values in Int64 data

I assume it's checking the content of the columns, but I'm interested in the column names.

I've also tried:

Code:
meta_ids = df_meta.filter(pl.col("qc") == "pass").get_column("sample")
df.select(pl.col(meta_ids))

But, as expected, this raises an exception because one sample is not present in the first DataFrame:

Code:
ColumnNotFoundError: Sample4

What would be the correct way to do this?
 
