OiO.lk Blog SQL Counting Active Transactions in SQL: Issues with Status Logic in Time Series Data
SQL

Counting Active Transactions in SQL: Issues with Status Logic in Time Series Data


I am trying to create a SQL query to count active transactions month by month from a transaction_status table. Each transaction_id can have multiple statuses, with two key statuses: ENDORSEMENT_SUCCESS (when the transaction is active) and ENDORSEMENT_CLOSED_SUCCESS (when the transaction is closed).

Problem:

The challenge I’m facing is ensuring that:

  • A transaction should only be counted as active if it has the ENDORSEMENT_SUCCESS status and does not have the ENDORSEMENT_CLOSED_SUCCESS status at any point after that.

  • If a transaction is closed in a particular month, it should not be counted as active in the following months, but it should still be counted in the months where it was active.

  • Additionally, we have a parallel transaction table that tracks the last_status of each transaction_id. The monthly count of active transactions from the transaction_status table should match the count of transactions with last_status="ENDORSEMENT_SUCCESS" from the transaction table.

Despite multiple attempts, the results are consistently incorrect, with the total count being higher than expected.

What I Tried:

Here’s the query I’m using to try and count the active transactions per month:

WITH transactions AS (
    SELECT 
        transaction_id,
        MIN(CASE WHEN status="ENDORSEMENT_SUCCESS" THEN created_at END) AS start_date,
        MAX(CASE WHEN status="ENDORSEMENT_CLOSED_SUCCESS" THEN created_at END) AS end_date
    FROM 
        transaction_status
    GROUP BY 
        transaction_id
),
active_transactions AS (
    SELECT 
        transaction_id,
        start_date
    FROM 
        transactions
    WHERE 
        end_date IS NULL OR end_date > CURRENT_DATE
),
months AS (
    SELECT DISTINCT
        DATE_TRUNC('month', created_at) AS month_start
    FROM 
        transaction_status
)

SELECT 
    m.month_start,
    COUNT(DISTINCT at.transaction_id) AS active_transactions
FROM 
    months m
LEFT JOIN 
    active_transactions at
ON 
    at.start_date <= m.month_start
GROUP BY 
    m.month_start
ORDER BY 
    m.month_start;

The data in the "status" column of the "transaction_status" table is output like this:

transaction_id status created_at
1222 ENDORSEMENT_SUCCESS 2024-03-01
1222 ENDORSEMENT_CLOSED_SUCCESS 2024-10-11
1333 ENDORSEMENT_SUCCESS 2024-01-15
1444 ENDORSEMENT_SUCCESS 2024-02-21
1444 ENDORSEMENT_CLOSED_SUCCESS 2024-09-15

Expected Output:

month_start active_transactions
2024-03-01 30
2024-04-01 65

For example, based on the data above, the expected output for the month of March 2023 should be:

What I Expected: I expected the query to return the number of active transactions for each month, with totals that would match the last_status count from the parallel transaction table.

What Actually Resulted: The total count returned by the query is higher than expected, and it does not align with the counts from the transaction table.



You need to sign in to view this answers

Exit mobile version