October 21, 2024
Chicago 12, Melborne City, USA
python

rolling sum with right-closed interval in duckdb


In Polars / pandas I can do a rolling sum where row each row the window is (row - 10 minutes, row]. For example:

import polars as pl

data = {
    "timestamp": [
        "2023-08-04 10:00:00",
        "2023-08-04 10:05:00",
        "2023-08-04 10:10:00",
        "2023-08-04 10:10:00",
        "2023-08-04 10:20:00",
        "2023-08-04 10:20:00",
    ],
    "value": [1, 2, 3, 4, 5, 6],
}

df = pl.DataFrame(data).with_columns(pl.col("timestamp").str.strptime(pl.Datetime))

print(
    df.with_columns(pl.col("value").rolling_sum_by("timestamp", "10m", closed="right"))
)

This outputs

shape: (6, 2)
┌─────────────────────┬───────┐
│ timestamp           ┆ value │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-08-04 10:00:00 ┆ 1     │
│ 2023-08-04 10:05:00 ┆ 3     │
│ 2023-08-04 10:10:00 ┆ 9     │
│ 2023-08-04 10:10:00 ┆ 9     │
│ 2023-08-04 10:20:00 ┆ 11    │
│ 2023-08-04 10:20:00 ┆ 11    │
└─────────────────────┴───────┘

How can I do this in DuckDB? Closest I could come up with is:

rel = duckdb.sql("""
SELECT
  timestamp,
  value,
  SUM(value) OVER roll AS rolling_sum
FROM df
WINDOW roll AS (
  ORDER BY timestamp
  RANGE BETWEEN INTERVAL '9 minutes 59 seconds' PRECEDING AND CURRENT ROW
)
ORDER BY timestamp;
""")
print(rel)

but that makes the window [row - 10 minutes, row], not (row - 10 minutes, row]



You need to sign in to view this answers

Leave feedback about this

  • Quality
  • Price
  • Service

PROS

+
Add Field

CONS

+
Add Field
Choose Image
Choose Video