OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Pandas read csv simultaneously passing usecols and names args

  • Thread starter Thread starter silence_of_the_lambdas
  • Start date Start date
S

silence_of_the_lambdas

Guest
When reading a CSV file as a pandas dataframe, an error is raised when trying to select a subset of columns based on original column names (usecols=) and renaming the selected columns (names=). Passing renamed column names to usecols works, but all columns must be passed to names to correctly select columns.

Code:
# read the entire CSV
df1a = pd.read_csv(folder_csv+'test_read_csv.csv')
# select a subset of columns while reading the CSV
df1b = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'])
# rename columns while reading the CSV
df1c = pd.read_csv(folder_csv+'test_read_csv.csv', names=['first', 'second', 'third'], header=0)

# select a subset of columns and rename them while reading the CSV;
# throws error "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']"
df1d = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'], names=['first','third'])

# selects columns 1 and 2, calling them 1 and 3
df1e = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','third'])
# selects columns 1 and 3 correctly
df1f = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','second','third'])

The CSV file test_read_csv.csv is:

Code:
Col1,Col2,Col3
val1a,val2a,val3a
val1b,val2b,val3b
val1c,val2c,val3c
val1d,val2d,val3d
val1e,val2e,val3e

Wouldn't it be a fairly common use case to select certain columns based on the original column names and then renaming only those columns while reading the data?

Of course, it is possible to select the columns and rename them after loading the entire CSV file:

Code:
df1 = df1[['Col1','Col3']]
df1.columns = ['first', 'third']

But I don't know how and whether this can be integrated directly when reading the data. The same holds also for pd.read_excel().
<p>When reading a CSV file as a pandas dataframe, an error is raised when trying to select a subset of columns based on original column names (<code>usecols=</code>) and renaming the selected columns (<code>names=</code>). Passing renamed column names to <code>usecols</code> works, but all columns must be passed to <code>names</code> to correctly select columns.</p>
<pre><code># read the entire CSV
df1a = pd.read_csv(folder_csv+'test_read_csv.csv')
# select a subset of columns while reading the CSV
df1b = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'])
# rename columns while reading the CSV
df1c = pd.read_csv(folder_csv+'test_read_csv.csv', names=['first', 'second', 'third'], header=0)

# select a subset of columns and rename them while reading the CSV;
# throws error "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']"
df1d = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'], names=['first','third'])

# selects columns 1 and 2, calling them 1 and 3
df1e = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','third'])
# selects columns 1 and 3 correctly
df1f = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','second','third'])
</code></pre>
<p>The CSV file <em>test_read_csv.csv</em> is:</p>
<pre><code>Col1,Col2,Col3
val1a,val2a,val3a
val1b,val2b,val3b
val1c,val2c,val3c
val1d,val2d,val3d
val1e,val2e,val3e
</code></pre>
<p><strong>Wouldn't it be a fairly common use case to select certain columns based on the <em>original</em> column names and then renaming only those columns while reading the data?</strong></p>
<p>Of course, it is possible to select the columns and rename them after loading the entire CSV file:</p>
<pre><code>df1 = df1[['Col1','Col3']]
df1.columns = ['first', 'third']
</code></pre>
<p>But I don't know how and whether this can be integrated directly when reading the data. The same holds also for <code>pd.read_excel()</code>.</p>
 

Latest posts

S
Replies
0
Views
1
Safwan Aipuram
S
Top