
Check whether boolean column contains only True values

  • Thread starter: the_economist (Guest)
Working in Databricks, I've got a dataframe which looks like this:

Code:
columns = ["a", "b", "c"]
data = [(True, True, True), (True, True, True), (True, False, True)]
df = spark.createDataFrame(data).toDF(*columns)
df.display()

(Screenshot: the dataframe rendered by df.display())

I'd like to select only those columns of the dataframe in which not all values are True.
In pandas, I would use df['a'].all() to check whether all values of column "a" are True. Unfortunately, I can't find an equivalent in PySpark. I have found a solution to the problem, but it seems much too complicated:

Code:
# Keep a column only if its distinct values differ from a one-row
# DataFrame containing just True, i.e. the column is not all-True.
df.select(*[column for column in df.columns
            if df.select(column).distinct().collect() !=
            spark.createDataFrame([True], 'boolean').toDF(column).collect()])

The solution returns what I want:

(Screenshot: the returned selection, containing only the column that is not all True)

Is there an easier way of doing this in PySpark?
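
For comparison, here is a minimal sketch of one alternative I've been experimenting with. It assumes that Spark's min() aggregate works on boolean columns (false sorts before true), so a column's minimum is True only when every value is True:

Code:
from pyspark.sql import functions as F

# Aggregate every column with min(); for a boolean column the minimum
# is False as soon as a single row is False (nulls are ignored).
mins = df.agg(*[F.min(F.col(c)).alias(c) for c in df.columns]).first()

# Keep only the columns whose minimum is not True, i.e. not all-True.
not_all_true = [c for c in df.columns if mins[c] is not True]
df.select(*not_all_true).display()

This triggers a single aggregation job instead of one distinct() per column, but I'm not sure it's the most idiomatic option either.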