October 26, 2024
SQL

How to Incrementally Append and Upsert Rows in Kedro directly to PostgreSQL DB (SQLTableORMDataset)?


I’m working on a Kedro project where I have a dataset defined in catalog.yml as follows:

daily_stats_dataset:
  type: SQLTableDataSet
  orm_model: my_proj.schemas.sqla_schemas.DailyStats
  credentials: my_database


monthly_stats_dataset:
  type: SQLTableDataSet
  orm_model: my_proj.schemas.sqla_schemas.MonthlyStats
  credentials: my_database

I need to incrementally append new rows derived from daily_stats_dataset to monthly_stats_dataset. However, the daily stats are re-aggregated on every run and then written to the monthly stats, so when a monthly row already exists I need to overwrite that row instead of inserting a new one. I do not want duplicate entries for the same value of the date column, so I would like upsert behaviour for existing rows.

Is it possible to do this with Kedro? My nodes look like this:

    node(
        func=update_monthly_stats,
        inputs="daily_stats_dataset",  # dataset names are catalog keys, so they are passed as strings
        outputs="monthly_stats_dataset",
    ),

However, the monthly stats are always inserted as new rows, so the unique constraint on the table is violated.
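
To make the behaviour I am after concrete, here is a minimal sketch of the upsert itself. It uses Python's built-in sqlite3 module only so that it runs without a database server; as far as I understand, PostgreSQL accepts the same INSERT ... ON CONFLICT ... DO UPDATE form (with the driver's own placeholder style instead of `?`). The table layout (`month`, `total`) and the names `upsert_monthly_stats` and `demo` are made up for illustration and are not my real schema or anything Kedro provides.

```python
import sqlite3


def upsert_monthly_stats(conn, rows):
    """Insert (month, total) rows; overwrite the total when the month already exists.

    Hypothetical helper: the table and column names are assumptions for this sketch.
    """
    conn.executemany(
        """
        INSERT INTO monthly_stats (month, total)
        VALUES (?, ?)
        ON CONFLICT (month) DO UPDATE SET total = excluded.total
        """,
        rows,
    )
    conn.commit()


def demo():
    # In-memory database standing in for the real PostgreSQL table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_stats (month TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    # First run: both months are new, so both rows are inserted.
    upsert_monthly_stats(conn, [("2024-09", 120), ("2024-10", 40)])
    # Second run: 2024-10 already exists and is overwritten; 2024-11 is appended.
    upsert_monthly_stats(conn, [("2024-10", 75), ("2024-11", 5)])
    return conn.execute(
        "SELECT month, total FROM monthly_stats ORDER BY month"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

What I cannot work out is where this logic belongs in Kedro: the node only returns the data, and the dataset's save step decides how it is written.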


