October 22, 2024
python

Pandas DataFrame – KNNImputer Algorithm Implementation


I have a dataset with missing values that I want to fill group-wise. I did this with the groupby() method and it works fine, but I want to do the same thing using the KNNImputer algorithm.

Code I have so far (using the groupby() method, which works as expected):

null_columns = df.columns[df.isnull().any()]

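# Fill missing values with the per-country median.
# transform() returns a result aligned to the original index, so the
# assignment stays correct regardless of how apply() handles group keys.
for column in null_columns:
    if column != "Life expectancy":
        df[column] = df.groupby("Country")[column].transform(lambda x: x.fillna(x.median()))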

Code I tried that did not work (using KNNImputer):

from sklearn.impute import KNNImputer

# Initializing imputer
imputer = KNNImputer(n_neighbors=5)

# Select numeric columns
numeric_cols = df.select_dtypes(include="number").columns

# Loop through numeric columns
for cols in numeric_cols:
    if cols != "Life expectancy":
        # Group by country and apply the imputer
        df[cols] = df.groupby("Country")[cols].transform(lambda x: imputer.fit_transform(x[[cols]]))

I tried several different approaches, but none gave me a result like the previous code (using the groupby() method).
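
For reference, here is a minimal sketch of what a group-wise KNN imputation might look like. It rests on a few assumptions: inside transform() on a single column, x is a Series, so x[[cols]] looks up a label rather than selecting a column, which is probably why the attempt above fails. KNNImputer also needs several feature columns at once to measure distances between rows, so the sketch fits it on all numeric columns of each country together. The helper name knn_impute_by_group, its parameters, and the toy column names "GDP" and "Population" are made up for illustration, and the DataFrame index is assumed to be unique.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def knn_impute_by_group(df, group_col, skip_cols=(), n_neighbors=5):
    """Hypothetical helper: KNN-impute numeric columns separately per group."""
    out = df.copy()
    numeric_cols = [
        c for c in out.select_dtypes(include="number").columns
        if c not in skip_cols and c != group_col
    ]
    # Work in float so imputed (possibly fractional) values fit the columns.
    out[numeric_cols] = out[numeric_cols].astype(float)

    for _, idx in out.groupby(group_col).groups.items():
        block = out.loc[idx, numeric_cols]
        # KNNImputer drops columns that are entirely NaN when fitting,
        # so only pass columns with at least one observed value in this group.
        usable = block.columns[block.notna().any()]
        if len(usable) == 0 or not block[usable].isna().any().any():
            continue  # nothing to impute for this group
        imputer = KNNImputer(n_neighbors=n_neighbors)
        out.loc[idx, usable] = imputer.fit_transform(block[usable])
    return out


# Toy data (made up) to show the call; "Life expectancy" is left untouched.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "GDP": [1.0, np.nan, 3.0, 10.0, 20.0, np.nan],
    "Population": [5.0, 6.0, 7.0, 50.0, 60.0, 70.0],
    "Life expectancy": [70.0, np.nan, 72.0, 80.0, 81.0, 82.0],
})
filled = knn_impute_by_group(toy, "Country", skip_cols=["Life expectancy"], n_neighbors=2)
print(filled)
```

Fitting a fresh imputer per group keeps neighbors from other countries out of the estimate, at the cost of small groups having few donors. A column that is entirely missing within a country is left as NaN in this sketch and would need a separate fallback.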


