Adding leading zeros to data columns when loading from CSV files using pandas

  • Thread starter: PhysyCola (Guest)

I have a script that loads and combines time-series data from two .csv files that share a base filename (passed as a pathlib path) but have different suffixes. A minimal working example is as follows:

Code:
import pandas as pd

def load_data(filename):
    # filename is a pathlib.Path; both files share its base name and end in '_0' and '_1'
    headers_0 = ['a', 'b', 'c']  # Headers for first file. May have more entries than columns in file
    headers_1 = ['d', 'e']       # Headers for second file.

    base = str(filename.with_suffix(''))

    # sep=r'\s+' splits on whitespace; it replaces delim_whitespace=True,
    # which is deprecated in recent pandas releases
    data_0 = pd.read_csv(base + '_0', header=None, sep=r'\s+')
    data_0.columns = headers_0[0:data_0.shape[1]]

    data_1 = pd.read_csv(base + '_1', header=None, sep=r'\s+')
    data_1.columns = headers_1[0:data_1.shape[1]]

    data = data_0.join(data_1)    # join aligns rows on the index
    data.fillna(0, inplace=True)  # rows missing from data_1 become 0

    return data

Thus far, I have only used load_data on datasets where data_0 and data_1 have the same number of rows (time series of the same length). However, I now have a case where data_1 has fewer rows than data_0, because the data in data_1 only starts being recorded at some later time than data_0.

How do I use pandas to pad the columns of data_1 with leading zeros, so that data_0 and data_1 have the same number of rows? I believe that the line data.fillna(0, inplace=True) fills the length mismatch with trailing zeros; is there an obvious way to change this to leading zeros? Note that I do not know the length of either dataset a priori, so I would appreciate a solution that works from the lengths of the data as loaded by pandas.
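The trailing zeros can be reproduced with a small sketch. The two frames below are made-up stand-ins for the loaded files, not the poster's data. Because DataFrame.join aligns rows on the index, the rows of the shorter frame pair up with the first rows of the longer one, and the unmatched rows at the end become NaN before fillna turns them into zeros:

```python
import pandas as pd

# Made-up stand-ins for the two loaded files (illustrative values only).
data_0 = pd.DataFrame({'a': [1, 2, 3, 4]})
data_1 = pd.DataFrame({'d': [9, 10]})

# join matches rows by index label: data_1 has labels 0 and 1,
# so it lines up with the FIRST two rows of data_0.
joined = data_0.join(data_1)
print(joined)

# The unmatched rows (labels 2 and 3) are NaN and become trailing zeros.
filled = joined.fillna(0)
print(filled)
```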

I have tried different options for DataFrame.fillna, such as method='backfill', but none of them has produced the expected result.
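One possible approach, sketched here under assumptions rather than taken from the hidden answers: backfill copies the next valid value backwards, so it cannot produce zeros, and the padding position is decided by the index alignment in join rather than by fillna. If the two recordings can be assumed to end at the same time, relabelling the rows of data_1 with the last index labels of data_0 before joining moves the gap to the start. The helper name join_with_leading_zeros and the sample frames are hypothetical.

```python
import pandas as pd

def join_with_leading_zeros(data_0, data_1):
    """Hypothetical helper: join data_1 onto data_0, padding data_1 at the start."""
    offset = len(data_0) - len(data_1)
    if offset < 0:
        raise ValueError("data_1 has more rows than data_0")

    shifted = data_1.copy()
    # ASSUMPTION: both recordings end together, so data_1 should line up
    # with the LAST len(data_1) rows of data_0.
    shifted.index = data_0.index[offset:]

    # Rows of data_0 with no partner in shifted are now at the top.
    return data_0.join(shifted).fillna(0)

# Made-up sample data (illustrative values only).
data_0 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
data_1 = pd.DataFrame({'d': [9, 10]})

result = join_with_leading_zeros(data_0, data_1)
print(result['d'].tolist())  # [0.0, 0.0, 9.0, 10.0]
```

In load_data, the line data = data_0.join(data_1) and the fillna call after it could be swapped for a call to such a helper. The offset comes from the loaded lengths, so neither length needs to be known in advance.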