October 21, 2024
python

Dates mismatch in DataFrame – Data alignment


I’m seeking assistance with a date mismatch issue in my pandas DataFrame. I appreciate any insights you can provide.


I’ve attached an image showing a subset of my DataFrame. The full DataFrame contains about 98 columns, with each pair of columns representing a stock’s dates and values. The issue I’m facing is that some stocks have data starting later (e.g., in 2019 or 2020) instead of the earliest date (13/01/2018).

My objectives are to:

  1. Align the data across all stocks by shifting down the data for stocks that are not aligned with the earliest date.
  2. Aggregate the data, keeping only the first ‘Dates’ column.

Additional information:

  • The data is weekly, so sometimes 12/01/2018 needs to be aligned with the closest date (e.g., 13/01/2018) for a particular stock.
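One possible way to handle the "closest date" requirement is `pd.merge_asof` with `direction="nearest"`. The sketch below uses two small made-up frames (`calendar` and `stock` are hypothetical names, and the 3-day tolerance is an assumption chosen for weekly data), so treat it as an illustration rather than a drop-in solution.

```python
import pandas as pd

# Hypothetical weekly reference calendar (dates of the earliest-starting stock).
calendar = pd.DataFrame(
    {"Dates": pd.to_datetime(["2018-01-13", "2018-01-20", "2018-01-27"])}
)

# Hypothetical stock whose observations fall one day earlier each week.
stock = pd.DataFrame(
    {
        "Dates": pd.to_datetime(["2018-01-12", "2018-01-19", "2018-01-26"]),
        "StockB": [10.0, 11.0, 12.0],
    }
)

# Both frames must be sorted by the key. direction="nearest" picks the closest
# date on either side; tolerance rejects matches more than 3 days away.
aligned = pd.merge_asof(
    calendar,
    stock,
    on="Dates",
    direction="nearest",
    tolerance=pd.Timedelta(days=3),
)
print(aligned)
```

Here the value recorded on 12/01/2018 ends up on the 13/01/2018 row, and any calendar date with no observation within the tolerance is left as NaN.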

I need to do this in Python, please.
Has anyone encountered a similar issue before? I’d greatly appreciate any suggestions on how to approach this problem efficiently.

Thank you in advance for your help!

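import pandas as pd

# df is the wide DataFrame described above, with paired date and value columns.
date_columns = [col for col in df.columns if 'Dates' in col]
value_columns = [col for col in df.columns if 'Dates' not in col]

# Parse every date column; the dates are day-first (e.g. 13/01/2018).
for date_col in date_columns:
    df[date_col] = pd.to_datetime(df[date_col], dayfirst=True, errors="coerce")

# Build a single date column, filling gaps in the first one from the others.
df_combined = pd.DataFrame()
df_combined['Common Date'] = df[date_columns[0]]

for i in range(1, len(date_columns)):
    df_combined['Common Date'] = df_combined['Common Date'].combine_first(df[date_columns[i]])

df_combined = df_combined.drop_duplicates(subset=['Common Date']).sort_values('Common Date').reset_index(drop=True)

# Values are copied by row label here, not matched to each stock's own dates.
for date_col, value_col in zip(date_columns, value_columns):
    df_combined[value_col] = df[value_col]

df_combined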
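The attempt above assigns each value column by row label, so a stock that starts in 2019 still lands on the 2018 rows. A minimal sketch of one alternative follows: index each stock's values by its own dates, then reindex onto a common calendar with nearest-date matching. The helper name `align_pairs`, the 3-day tolerance, and the assumption that columns alternate (dates, values, dates, values, ...) are mine and not from the original post.

```python
import pandas as pd


def align_pairs(df, tolerance_days=3):
    """Align each (dates, values) column pair onto the first pair's dates."""
    cols = list(df.columns)
    # Assumption: columns alternate date, value, date, value, ...
    pairs = list(zip(cols[0::2], cols[1::2]))

    # Reference calendar: the first stock's dates, assumed to start earliest.
    ref = pd.to_datetime(df[pairs[0][0]], dayfirst=True, errors="coerce")
    calendar = pd.DatetimeIndex(ref.dropna().unique()).sort_values()

    aligned = {}
    for date_col, value_col in pairs:
        dates = pd.to_datetime(df[date_col], dayfirst=True, errors="coerce")
        series = pd.Series(df[value_col].to_numpy(), index=dates)
        # Drop padding rows (NaT dates) and duplicate dates, then sort, since
        # reindex(method="nearest") needs a unique, monotonic index.
        series = series[series.index.notna()]
        series = series[~series.index.duplicated()].sort_index()
        aligned[value_col] = series.reindex(
            calendar,
            method="nearest",
            tolerance=pd.Timedelta(days=tolerance_days),
        )

    out = pd.DataFrame(aligned)
    out.index.name = "Dates"
    return out.reset_index()


# Made-up example: StockB starts a week late but is stored from row 0.
df = pd.DataFrame(
    {
        "Dates": pd.to_datetime(["2018-01-13", "2018-01-20", "2018-01-27"]),
        "StockA": [1.0, 2.0, 3.0],
        "Dates.1": pd.to_datetime(["2018-01-19", "2018-01-26", None]),
        "StockB": [20.0, 30.0, None],
    }
)
result = align_pairs(df)
print(result)
```

In this example StockB is NaN on the first calendar date and its two values move down to the rows for 20/01/2018 and 27/01/2018, leaving a single `Dates` column. If the first stock is not the one with the longest history, the calendar would need to be built from a different column or from the union of all dates.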


