Impute missing values by group data



    Wyx Smrf


    Impute missing values based on df.groupby() method. The imputed values will be based on the group statistic defined in the categorical columns stated.

    Library: Pandas

    Filename pattern: .ipynb, .py

    import pandas as pd
    params = {'dataframe':       df,
          'cat_col_1':       'categorical_column from df', 
          'cat_col_2':       'categorical_column from df', 
          'num_col':         'numerical_column from df', 
          'impute_strategy': 'mean, median, etc.'}
    def conditional_fillna(dataframe, cat_col_1, cat_col_2, num_col, impute_strategy):
      """Impute the missing values in a column using central tendency. 
        The values of the central tendency is obtained by the statistic of the grouped data
        dataframe (DataFrame): Two-dimensional data structure stored in a tabular format.
        cat_col_1 (Series): Categorical column to be used in data grouping
        cat_col_2 (Series): Categorical column to be used in data grouping
        num_col (Series): Numerical Column to be used to compute for numeric values among different data groups
        impute_strategy (String): The method on which to impute the missing values
        DataFrame: Imputed missing values based on the statistic from other column/s
      # Create a grouping for the dataset
      grouped_data = dataframe.groupby([cat_col_1, cat_col_2])[num_col]
      # Fill missing values in a column using the defined impute strategy
      dataframe[num_col].fillna(grouped_data.transform(impute_strategy), inplace=True)
      # Optional: Covert the imputed numerical values into an integer
      dataframe[num_col] = dataframe[num_col].round(0).astype(int)
      return dataframe
    Codiga Logo
    Codiga Hub
    • Rulesets
    • Playground
    • Snippets
    • Cookbooks
    soc-2 icon

    We are SOC-2 Compliance Certified

    G2 high performer medal

    Codiga – All rights reserved 2022.