Singularizing (multi-term) ingredients from cooking recipes

  • Thread starter: carly lange (Guest)

I am working on a recipe classification system and I am struggling with the preprocessing of my data. The data is from Food.com, and I need to make sure that all ingredients are in singular form to reduce the number of unique ingredients. I am not sure how to do that or which library to use. It is also a bit tricky because some ingredient names are very long, e.g. "del monte crushed tomatoes with mild green chilies", or include specific brand names, e.g. "baker's angel flake sweetened coconut".

After first reading the ingredients to see what they look like (first image: https://i.sstatic.net/XIxn4kKc.png), I removed all ",', and [] characters. After this cleaning, the data looks like the second image (https://i.sstatic.net/o76nk3A4.png). I intentionally did not remove numbers and dots, since there are ingredients like "a.1. steak sauce" and "1% low-fat chocolate milk". But now I am struggling with converting everything to singular. What can I do to singularize the terms? So far I have around 14000 unique ingredients, and I am sure that number would drop by half once everything is in singular form. Some names are very distinct, so I tried using a mapping function that converts, e.g.: '8-inch 97% fat free flour tortillas': '8-inch 97% fat-free flour tortilla'
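A minimal sketch of what such a mapping function might look like, assuming the ingredient names have already been cleaned into plain strings. The names `MANUAL_OVERRIDES` and `apply_overrides` are hypothetical, and the only mapping entry is the one quoted above.

```python
# Hand-written corrections for names that are too unusual for general rules.
# Only the example from the post is filled in; real entries would be added by hand.
MANUAL_OVERRIDES = {
    "8-inch 97% fat free flour tortillas": "8-inch 97% fat-free flour tortilla",
}


def apply_overrides(ingredient: str, overrides: dict = MANUAL_OVERRIDES) -> str:
    """Return the hand-mapped form of an ingredient, or the normalized input if unmapped."""
    # Lowercase and collapse repeated whitespace so lookups are not
    # defeated by trivial formatting differences.
    key = " ".join(ingredient.lower().split())
    return overrides.get(key, key)


print(apply_overrides("8-inch 97% fat free flour tortillas"))
```

Falling back to the normalized input means unmapped ingredients pass through unchanged, so the table only needs entries for the cases that general rules get wrong.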

Sometimes authors also write "8"" instead of "8-inch", which complicates the process a lot. I do not know how to go about it, since I cannot go through all 14000 ingredients and adjust them by hand with the mapping function. Also, since I am building a recipe classifier that should be able to classify completely new recipes, I need to find a way to integrate the preprocessing into my system.
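One way the two problems above could be combined into a single reusable step is sketched below: a regular expression unifies the inch notation, and a small rule-based singularizer is applied token by token. This is only a rough heuristic under stated assumptions, not a complete solution: the function names, the irregular-word table, and the suffix rules are all illustrative, and English plurals have many exceptions these rules will miss. A dedicated inflection or lemmatization library may handle more cases.

```python
import re

# Matches 8", 8 inch, 8-inch, 8 inches (assumed variants) after a number.
_INCH = re.compile(r'(\d+(?:\.\d+)?)\s*(?:"|-?\s*inch(?:es)?\b)')

# Illustrative tables only; they would need to grow with the data.
_IRREGULAR = {"chilies": "chili", "cookies": "cookie", "leaves": "leaf", "loaves": "loaf"}
_UNCHANGED = {"molasses"}


def normalize_inches(text: str) -> str:
    """Rewrite inch notations such as 8" or 8 inches as 8-inch."""
    return _INCH.sub(r"\1-inch", text)


def singularize_word(word: str) -> str:
    """Naive suffix-based singularizer for a single lowercase token."""
    # Leave tokens with digits, dots, hyphens or apostrophes alone
    # (e.g. "a.1.", "1%", "fat-free", "baker's").
    if not word.isalpha():
        return word
    if word in _IRREGULAR:
        return _IRREGULAR[word]
    if word in _UNCHANGED or len(word) <= 3:
        return word
    if word.endswith("ies"):
        return word[:-3] + "y"  # berries -> berry
    if word.endswith(("ches", "shes", "xes", "sses", "oes")):
        return word[:-2]  # peaches -> peach, tomatoes -> tomato
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]  # tortillas -> tortilla
    return word


def preprocess_ingredient(raw: str) -> str:
    """Single entry point, usable for both training data and new recipes."""
    text = normalize_inches(raw.strip().lower())
    return " ".join(singularize_word(token) for token in text.split())


print(preprocess_ingredient('8" flour tortillas'))
print(preprocess_ingredient("del monte crushed tomatoes with mild green chilies"))
```

Because everything goes through one function, the same preprocessing can be applied to the training set and to any new recipe at prediction time, so both see identical ingredient names.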