UDF in Snowpark + DBT gives ModuleNotFoundError with group_by().apply_in_pandas()

  • Thread starter: Juan Saiz Lomas

I am using Snowpark and dbt to run some data transformations and create views/tables in Snowflake. My models are structured like this:

Code:
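import pandas as pd
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
# Types used in the output_schema below
from snowflake.snowpark.types import (
    BooleanType,
    FloatType,
    StringType,
    StructField,
    StructType,
)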

def model(dbt, session: Session):
    dbt.config(materialized="table")
    df = dbt.ref("<SOME_TABLE>")
    df_grouped = df.group_by(col('ID1_COL'), col('ID2_COL'))

    def transform_function(df: pd.DataFrame) -> pd.DataFrame:
        """
        Function that transforms data and is to be used in a grouped dataframe
        """
        [...]
        return df_transformed

    final_df = df_grouped.apply_in_pandas(
        transform_function,
        output_schema=StructType([
            StructField("COL1", StringType()),
            StructField("COL2", BooleanType()),
            [...]
            StructField("COLN", FloatType()),
        ])
    )

    return final_df

I want to define transform_function outside of the dbt model function, but whatever I try produces an error, most often this one:

Code:
  ModuleNotFoundError: No module named 'main_module'

Post about similar problem: https://community.snowflake.com/s/question/0D5VI000009QRW00AO/groupbyapplyinpandas-gives-error

I tried the @udf decorator and registering the function in several different ways, but none of them worked. When the function is not passed to apply_in_pandas, it seems to work.

I have also tried pandas_udf, but it hangs for a very long time.

The only thing I haven't tried is writing a library in a separate file but ideally I would like to avoid this.
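
For context, one plausible reading of the ModuleNotFoundError quoted above (an assumption, not something confirmed in this thread) is that the function gets serialized by reference: Python's pickle stores a module-level function as a module name plus a function name, so the receiving side has to be able to import that module. The sketch below reproduces that behaviour with the standard library only. The module object and the body of transform_function are hypothetical stand-ins; only the name main_module comes from the error message.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module the model code is loaded as. Only the
# name "main_module" comes from the error message; the rest is assumed.
module = types.ModuleType("main_module")
exec("def transform_function(x):\n    return x * 2\n", module.__dict__)
sys.modules["main_module"] = module

# pickle stores a module-level function as (module name, qualified name),
# not as code, so the payload only holds a reference to it.
payload = pickle.dumps(module.transform_function)
reference_only = b"main_module" in payload and b"transform_function" in payload

# Simulate the remote side, where no module called "main_module" exists.
del sys.modules["main_module"]
try:
    pickle.loads(payload)
    error_message = None
except ModuleNotFoundError as exc:
    error_message = str(exc)

print(reference_only)
print(error_message)
```

If this is what happens in the dbt model, it would also be consistent with the nested version working: a function defined inside model() cannot be looked up by module and name, so a serializer has to ship its code instead of a reference.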
