I have a dataset with missing values and I want to fill them group-wise. I did this with the groupby() method and it works fine, but I want to do the same thing using KNNImputer.
Code I have so far (using the groupby() method, which worked as expected):
null_columns = df.columns[df.isnull().any()]
# filling median values by country
for column in null_columns:
    if column != "Life expectancy":
        df[column] = df.groupby("Country")[column].apply(lambda x: x.fillna(x.median()))
Code I tried that did not work (using KNNImputer):
from sklearn.impute import KNNImputer

# Initializing imputer
imputer = KNNImputer(n_neighbors=5)
# Select numeric columns
numeric_cols = df.select_dtypes(include="number").columns
# Loop through numeric columns
for cols in numeric_cols:
    if cols != "Life expectancy":
        # Group by country and apply the imputer
        df[cols] = df.groupby("Country")[cols].transform(lambda x: imputer.fit_transform(x[[cols]]))
I tried several variations, but none of them gave me a result comparable to the groupby() version.
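A likely cause of the failure: inside transform() on a single selected column, x is a Series, so x[[cols]] looks up an index label named after the column instead of selecting a column, while KNNImputer expects a 2-D input. Imputing one column at a time also leaves KNNImputer with no other features to measure distance on. One possible direction, as a minimal sketch only, is to run the imputer once per country over all numeric columns together. The helper name knn_impute_by_group, its parameters, and the demo data are hypothetical, and the sketch assumes the DataFrame has a unique index.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def knn_impute_by_group(df, group_col, skip_cols=(), n_neighbors=5):
    """Hypothetical helper: fit KNNImputer separately inside each group."""
    out = df.copy()
    numeric_cols = [
        c for c in out.select_dtypes(include="number").columns
        if c not in skip_cols and c != group_col
    ]
    # Cast to float so non-integer imputed values can be written back.
    out[numeric_cols] = out[numeric_cols].astype(float)

    # .groups maps each group key to the index labels of its rows
    # (assumes the index is unique).
    for _, idx in out.groupby(group_col).groups.items():
        block = out.loc[idx, numeric_cols]
        # KNNImputer drops columns that are entirely NaN when fitting,
        # so only pass columns with at least one observed value.
        usable = block.columns[block.notna().any()]
        if len(usable) == 0 or not block[usable].isna().any().any():
            continue  # nothing to impute in this group
        imputer = KNNImputer(n_neighbors=n_neighbors)
        out.loc[idx, usable] = imputer.fit_transform(block[usable])
    return out


# Made-up demo data, only to show the call.
df_demo = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Life expectancy": [70.0, np.nan, 72.0, 60.0, 61.0, np.nan],
    "GDP": [1.0, np.nan, 3.0, 100.0, np.nan, 300.0],
    "Schooling": [10.0, 11.0, 12.0, 5.0, 6.0, 7.0],
})
filled = knn_impute_by_group(
    df_demo, "Country", skip_cols=["Life expectancy"], n_neighbors=2
)
print(filled)
```

Columns that are entirely missing within a country stay missing in this sketch, because there is nothing inside that group to impute from. The results will generally differ from the median fill, since KNNImputer averages the nearest rows rather than taking the group median.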