OiO.lk Community platform!

Fill Column Values using a dictionary and pattern matching

  • Thread starter: canconfirm24

I am working on categorizing credit card transactions. Right now I am using a dictionary combined with np.select(), as follows:

Code:
def cat_mapper(frame, targ_col, cat_col):

    category_retailers = {'Online Shopping': ['amazon', 'amzn mktp', 'target.com'],
                          'Wholesale Stores': ['costco', 'target'],
                          }
    cond = [frame[targ_col].str.contains('|'.join(category_retailers['Online Shopping']), regex=True, case=False),
            frame[targ_col].str.contains('|'.join(category_retailers['Wholesale Stores']), regex=True, case=False),
            ]

    choice = ['Online Shopping',
              'Wholesale Stores',
              ]

    default_cond = frame[cat_col]

    frame[cat_col] = np.select(cond, choice, default_cond)
    return frame

Here, frame is the DataFrame, targ_col is the Description column holding the transaction description or name, and cat_col is the Category column that will hold the transaction category.

The basic premise is to check the transaction description column for partial matches against the dictionary values; if a description contains one of those values, the corresponding dictionary key is assigned to the category column.

The block above works, but there is some redundancy: I have to define the dictionary values and then repeat them as the corresponding conditions and choices for np.select.
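One way to remove that duplication while keeping np.select is to build both the condition list and the choice list from the dictionary itself. This is a minimal sketch, assuming the dictionary is passed in as a hypothetical fourth parameter, category_retailers, rather than defined inside the function. Dictionary order sets the priority, because np.select takes the first condition that is True. It also escapes each keyword with re.escape so that the "." in 'target.com' matches only a literal dot, and passes na=False so a missing description counts as no match.

```python
import re

import numpy as np
import pandas as pd


def cat_mapper(frame, targ_col, cat_col, category_retailers):
    """Set cat_col wherever targ_col contains one of a category's keywords.

    Conditions and choices are both derived from category_retailers
    (a hypothetical parameter here), so each keyword list is written once.
    """
    descriptions = frame[targ_col]
    # One boolean mask per category, in dictionary order.
    cond = [
        descriptions.str.contains('|'.join(map(re.escape, keywords)),
                                  case=False, na=False, regex=True)
        for keywords in category_retailers.values()
    ]
    # The choices are simply the dictionary keys, in the same order.
    choice = list(category_retailers.keys())
    # Rows with no match keep their existing category.
    frame[cat_col] = np.select(cond, choice, default=frame[cat_col])
    return frame


retailers = {'Online Shopping': ['amazon', 'amzn mktp', 'target.com'],
             'Wholesale Stores': ['costco', 'target']}
df = pd.DataFrame({'Description': ['amazon 345689', 'amzn mktp online 7765', 'amazon 4444',
                                   'costco location', 'Wholefoods'],
                   'Category': ['NaN', 'NaN', 'NaN', 'NaN', 'Groceries']})
print(cat_mapper(df, 'Description', 'Category', retailers)['Category'].tolist())
```

Adding a category then only means adding a key to the dictionary.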

Is there a way to pattern match the transaction description against a list of values in the dictionary and assign the dictionary key to the category column without using np.select as an intermediary?
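A sketch that avoids np.select altogether, assuming the dictionary is passed in as a hypothetical category_retailers parameter: invert the dictionary into a keyword-to-category lookup, pull the matching keyword out of each description with Series.str.extract, and translate it with Series.map. There is one behavioral difference from np.select: when a description contains keywords from several categories, this returns the category of the keyword that appears earliest in the text, not the first category in dictionary order. Keywords are tried longest first, so 'target.com' wins over 'target' when both could match at the same position.

```python
import re

import pandas as pd


def cat_mapper(frame, targ_col, cat_col, category_retailers):
    # Invert the dictionary: lower-cased keyword -> category.
    # If a keyword is listed under several categories, the first one wins.
    keyword_to_cat = {}
    for category, keywords in category_retailers.items():
        for kw in keywords:
            keyword_to_cat.setdefault(kw.lower(), category)

    # A single alternation over every keyword, longest first, with regex
    # metacharacters (such as the '.' in 'target.com') escaped.
    alternatives = sorted(keyword_to_cat, key=len, reverse=True)
    pattern = '(' + '|'.join(map(re.escape, alternatives)) + ')'

    # Earliest keyword found in each description; NaN where nothing matches.
    matched = frame[targ_col].str.extract(pattern, flags=re.IGNORECASE, expand=False)

    # Keyword -> category; rows without a match keep their current value.
    frame[cat_col] = matched.str.lower().map(keyword_to_cat).fillna(frame[cat_col])
    return frame


retailers = {'Online Shopping': ['amazon', 'amzn mktp', 'target.com'],
             'Wholesale Stores': ['costco', 'target']}
df = pd.DataFrame({'Description': ['amazon 345689', 'amzn mktp online 7765', 'amazon 4444',
                                   'costco location', 'Wholefoods'],
                   'Category': ['NaN', 'NaN', 'NaN', 'NaN', 'Groceries']})
print(cat_mapper(df, 'Description', 'Category', retailers)['Category'].tolist())
# ['Online Shopping', 'Online Shopping', 'Online Shopping', 'Wholesale Stores', 'Groceries']
```

The regular expression is built once, so the matching itself stays vectorized no matter how many categories the dictionary holds.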

I assume I could use a nested loop, but even that seems too verbose. Is there a more elegant way to achieve the same outcome?
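For comparison, the loop version does not have to be verbose. This sketch assumes the dictionary is passed in as a hypothetical category_retailers parameter and uses a hypothetical helper, categorize. It relies on plain case-insensitive substring checks instead of regular expressions and keeps np.select's priority rule: the first category in dictionary order with a matching keyword wins. Because it runs row by row in Python, it will likely be slower than the vectorized string methods on large frames.

```python
import pandas as pd


def categorize(description, category_retailers, default):
    """Return the first category (in dict order) whose keyword occurs in description."""
    # Non-string descriptions (e.g. NaN) keep the existing category.
    if not isinstance(description, str):
        return default
    text = description.lower()
    return next(
        (category
         for category, keywords in category_retailers.items()
         if any(kw.lower() in text for kw in keywords)),
        default,
    )


def cat_mapper(frame, targ_col, cat_col, category_retailers):
    # Pair each description with its current category, which is the fallback.
    frame[cat_col] = [
        categorize(desc, category_retailers, current)
        for desc, current in zip(frame[targ_col], frame[cat_col])
    ]
    return frame


retailers = {'Online Shopping': ['amazon', 'amzn mktp', 'target.com'],
             'Wholesale Stores': ['costco', 'target']}
df = pd.DataFrame({'Description': ['amazon 345689', 'amzn mktp online 7765', 'amazon 4444',
                                   'costco location', 'Wholefoods'],
                   'Category': ['NaN', 'NaN', 'NaN', 'NaN', 'Groceries']})
print(cat_mapper(df, 'Description', 'Category', retailers)['Category'].tolist())
# ['Online Shopping', 'Online Shopping', 'Online Shopping', 'Wholesale Stores', 'Groceries']
```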

Sample DataFrame:

Code:
data_dict = {'Description': ['amazon 345689','amzn mktp online 7765','amazon 4444','costco location','Wholefoods'],
           'Category':['NaN','NaN','NaN','NaN','Groceries']
          }
df = pd.DataFrame(data=data_dict)

Sample Output:

Code:
data_dict = {'Description': ['amazon 345689','amzn mktp online 7765','amazon 4444','costco location','Wholefoods'],
           'Category':['Online Shopping','Online Shopping','Online Shopping','Wholesale Stores','Groceries']
          }
df = pd.DataFrame(data=data_dict)

The sample output above should be reproducible with the code block below, using my current np.select() approach.

Code:
import pandas as pd
import numpy as np


def cat_mapper(frame, targ_col, cat_col):

    category_retailers = {'Online Shopping':['amazon','amzn mktp', 'target.com'],
                            'Wholesale Stores': ['costco', 'target'],
                            }
    cond = [frame[targ_col].str.contains('|'.join(category_retailers['Online Shopping']),regex=True,case=False),
            frame[targ_col].str.contains('|'.join(category_retailers['Wholesale Stores']),regex=True,case=False),
            ]

    choice = ['Online Shopping',
                'Wholesale Stores',
                ]

    default_cond = frame[cat_col]

    frame[cat_col] = np.select(cond, choice, default_cond)
    return frame

data_dict ={'Description': ['amazon 345689','amzn mktp online 7765','amazon 4444','costco location','Wholefoods'],
           'Category':['NaN','NaN','NaN','NaN','Groceries']
          }


df = pd.DataFrame(data=data_dict)

cat_mapper(df,'Description','Category')

Thanks in advance, and let me know if you need any additional details.