I have code like the following where I split a dataframe into different groups. The "treatment" group is the one where I might want to delete and/or modify rows; for performance reasons I split it away from the group of rows that should survive unchanged.
It is guaranteed that all DFs have the same columns and dtypes (they all come from the original `df` parameter).
At the end of the treatment, I want to concat them back into a single DF. Now, I do not know in advance whether any of the DFs will be empty (and if `df` is empty, all DFs will be empty, which happens especially in testing; usually, though, `df` has ~500k rows).
See code:
```python
def some_fn(df: pd.DataFrame) -> pd.DataFrame:
    df_no_treatment, df_treatment = split_df(df)
    df_treatment = do_something_complex(df_treatment)

    assert (df.dtypes == df_treatment.dtypes).all()
    assert (df.dtypes == df_no_treatment.dtypes).all()

    result = pd.concat([df_no_treatment, df_treatment]).sort_index()
    assert (df.dtypes == result.dtypes).all()
    return result
```
Now, `concat` throws a `FutureWarning`:

> The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
Note the asserts in the code above: it seems to work as intended?
How do I fix the warning or opt into the new behavior? I don't want any dtype automatics; the dtypes already match, and `concat` should just concatenate and not do anything else.
I find code like

```python
if df_no_treatment.empty:
    return df_treatment
if df_treatment.empty:
    return df_no_treatment
return pd.concat([df_no_treatment, df_treatment]).sort_index()
```

absolutely over the top for what was previously a simple concat. What am I missing?
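The most compact variant I can think of filters the empties first, as the warning suggests. A sketch (the helper name `concat_nonempty` is made up; the both-empty case falls back to an empty slice of the original frame so the dtypes survive):

```python
import pandas as pd

def concat_nonempty(frames: list[pd.DataFrame], fallback: pd.DataFrame) -> pd.DataFrame:
    # Exclude empty entries before concat, per the FutureWarning's advice.
    nonempty = [f for f in frames if not f.empty]
    if not nonempty:
        # All inputs empty: return an empty frame with the right columns/dtypes.
        return fallback.iloc[0:0]
    return pd.concat(nonempty).sort_index()
```

Usage would then be `result = concat_nonempty([df_no_treatment, df_treatment], fallback=df)`, but that still feels like a lot of ceremony for a plain concat.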