I’m seeking assistance with a date mismatch issue in my pandas DataFrame. I appreciate any insights you can provide.
I’ve attached an image showing a subset of my DataFrame. The full DataFrame contains about 98 columns, with each pair of columns representing a stock’s dates and values. The issue I’m facing is that some stocks have data starting later (e.g., in 2019 or 2020) instead of the earliest date (13/01/2018).
My objectives are to:
- Align the data across all stocks by shifting down the data for stocks that are not aligned with the earliest date.
- Aggregate the data, keeping only the first ‘Dates’ column.
Additional information:
- The data is weekly, so sometimes 12/01/2018 needs to be aligned with the closest date (e.g., 13/01/2018) for a particular stock.
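The "closest date" requirement can be illustrated with a small sketch. The reference calendar, the sample stock dates, and the 3-day tolerance below are all assumptions made up for illustration, not values from the actual DataFrame. The sketch uses `DatetimeIndex.get_indexer` with `method="nearest"` to find, for each stock date, the nearest date in the reference calendar.

```python
import pandas as pd

# Hypothetical reference calendar (weekly, starting 13/01/2018).
reference = pd.DatetimeIndex(
    pd.to_datetime(["13/01/2018", "20/01/2018", "27/01/2018"], dayfirst=True)
)

# Hypothetical dates for one stock: some are a day off, one is far away.
stock_dates = pd.DatetimeIndex(
    pd.to_datetime(
        ["12/01/2018", "20/01/2018", "26/01/2018", "15/06/2019"], dayfirst=True
    )
)

# Position of the nearest reference date for each stock date.
# The 3-day tolerance is an assumption; -1 marks "no reference date in range".
positions = reference.get_indexer(
    stock_dates, method="nearest", tolerance=pd.Timedelta(days=3)
)

# Map each stock date to its snapped reference date (NaT where unmatched).
snapped = pd.Series(
    reference[positions].to_numpy(), index=stock_dates
).where(positions >= 0)

print(snapped)
```

Here 12/01/2018 maps to 13/01/2018, while a date with no reference date within the tolerance stays unmatched instead of being forced onto the wrong week.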
I need to do this in Python, please.
Has anyone encountered a similar issue before? I’d greatly appreciate any suggestions on how to approach this problem efficiently.
Thank you in advance for your help!
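import pandas as pd

# Split the columns into date columns and value columns
date_columns = [col for col in df.columns if 'Dates' in col]
value_columns = [col for col in df.columns if 'Dates' not in col]

# Parse every date column; dates are day-first (e.g. 13/01/2018), bad entries become NaT
for date_col in date_columns:
    df[date_col] = pd.to_datetime(df[date_col], dayfirst=True, errors="coerce")

# Build a common date column, filling gaps from the other date columns
df_combined = pd.DataFrame()
df_combined['Common Date'] = df[date_columns[0]]
for i in range(1, len(date_columns)):
    df_combined['Common Date'] = df_combined['Common Date'].combine_first(df[date_columns[i]])
df_combined = df_combined.drop_duplicates(subset=['Common Date']).sort_values('Common Date').reset_index(drop=True)

# Copy each value column across by index label (values are not matched on date)
for date_col, value_col in zip(date_columns, value_columns):
    df_combined[value_col] = df[value_col]
df_combined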
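One possible way to meet both objectives (align every stock to the earliest calendar and keep a single 'Dates' column) is sketched below with `pd.merge_asof`. This is only a sketch under several assumptions: the sample frame and the column names ("Dates", "Dates.1", "Stock A", "Stock B") are hypothetical, date and value columns are assumed to alternate in matching order, the first 'Dates' column is assumed to hold the full calendar, and the 3-day tolerance is a guess suited to weekly data.

```python
import pandas as pd

# Hypothetical wide frame: alternating 'Dates*' / value columns.
# Stock B starts a week later and its dates are off by one day.
df = pd.DataFrame({
    "Dates": ["13/01/2018", "20/01/2018", "27/01/2018"],
    "Stock A": [10.0, 11.0, 12.0],
    "Dates.1": ["19/01/2018", "26/01/2018", None],
    "Stock B": [5.0, 6.0, None],
})

date_columns = [c for c in df.columns if "Dates" in c]
value_columns = [c for c in df.columns if "Dates" not in c]

# Master calendar: the first 'Dates' column, parsed day-first and sorted.
aligned = (
    pd.DataFrame({
        "Dates": pd.to_datetime(df[date_columns[0]], dayfirst=True, errors="coerce")
    })
    .dropna()
    .sort_values("Dates")
    .reset_index(drop=True)
)

# Attach each stock to the master calendar by nearest date.
for date_col, value_col in zip(date_columns, value_columns):
    pair = (
        pd.DataFrame({
            "Dates": pd.to_datetime(df[date_col], dayfirst=True, errors="coerce"),
            value_col: df[value_col],
        })
        .dropna(subset=["Dates"])
        .sort_values("Dates")
    )
    # Assumed tolerance of 3 days: weeks with no nearby observation stay NaN.
    aligned = pd.merge_asof(
        aligned, pair, on="Dates",
        direction="nearest", tolerance=pd.Timedelta(days=3),
    )

print(aligned)
```

In this sample, Stock B is empty for 13/01/2018 and its 19/01 and 26/01 observations land on the 20/01 and 27/01 rows. Both inputs to `merge_asof` must be sorted on the key, which is why each pair is sorted before merging.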