I’m working on a Kedro project where I have a dataset defined in catalog.yml as follows:
daily_stats_dataset:
type: SQLTableDataSet
orm_model: my_proj.schemas.sqla_schemas.DailyStats
credentials: my_database
monthly_stats_dataset:
type: SQLTableDataSet
orm_model: my_proj.schemas.sqla_schemas.MonthlyStats
credentials: my_database
I need to incrementally append new rows from my daily_stats_dataset to my monthly_stats_dataset. However the daily stats are aggregated each time and added to the monthly stats. This means that, when the monthly stats already exist, I need to overwrite that line. I want to ensure that I do not insert duplicate entries based on a date column. I would thus like to implement upsert functionality for existing rows.
Is it possible to do this with Kedro, since the nodes look like this
node(
func=update_monthly_stats,
inputs=daily_stats_dataset,
outputs="monthly_stats_dataset",
),
but the monthly_stats will be inserted, resulting in unique constraints to be violated.
You need to sign in to view this answers