I am trying to create a SQL query to count active transactions month by month from a transaction_status
table. Each transaction_id
can have multiple statuses, with two key statuses: ENDORSEMENT_SUCCESS
(when the transaction is active) and ENDORSEMENT_CLOSED_SUCCESS
(when the transaction is closed).
Problem:
The challenge I’m facing is ensuring that:
-
A transaction should only be counted as active if it has the
ENDORSEMENT_SUCCESS
status and does not have theENDORSEMENT_CLOSED_SUCCESS
status at any point after that. -
If a transaction is closed in a particular month, it should not be counted as active in the following months, but it should still be counted in the months where it was active.
-
Additionally, we have a parallel
transaction
table that tracks thelast_status
of eachtransaction_id
. The monthly count of active transactions from thetransaction_status
table should match the count of transactions withlast_status="ENDORSEMENT_SUCCESS"
from thetransaction
table.
Despite multiple attempts, the results are consistently incorrect, with the total count being higher than expected.
What I Tried:
Here’s the query I’m using to try and count the active transactions per month:
WITH transactions AS (
SELECT
transaction_id,
MIN(CASE WHEN status="ENDORSEMENT_SUCCESS" THEN created_at END) AS start_date,
MAX(CASE WHEN status="ENDORSEMENT_CLOSED_SUCCESS" THEN created_at END) AS end_date
FROM
transaction_status
GROUP BY
transaction_id
),
active_transactions AS (
SELECT
transaction_id,
start_date
FROM
transactions
WHERE
end_date IS NULL OR end_date > CURRENT_DATE
),
months AS (
SELECT DISTINCT
DATE_TRUNC('month', created_at) AS month_start
FROM
transaction_status
)
SELECT
m.month_start,
COUNT(DISTINCT at.transaction_id) AS active_transactions
FROM
months m
LEFT JOIN
active_transactions at
ON
at.start_date <= m.month_start
GROUP BY
m.month_start
ORDER BY
m.month_start;
The data in the "status" column of the "transaction_status" table is output like this:
transaction_id | status | created_at |
---|---|---|
1222 | ENDORSEMENT_SUCCESS | 2024-03-01 |
1222 | ENDORSEMENT_CLOSED_SUCCESS | 2024-10-11 |
1333 | ENDORSEMENT_SUCCESS | 2024-01-15 |
1444 | ENDORSEMENT_SUCCESS | 2024-02-21 |
1444 | ENDORSEMENT_CLOSED_SUCCESS | 2024-09-15 |
Expected Output:
month_start | active_transactions |
---|---|
2024-03-01 | 30 |
2024-04-01 | 65 |
For example, based on the data above, the expected output for the month of March 2023 should be:
What I Expected: I expected the query to return the number of active transactions for each month, with totals that would match the last_status count from the parallel transaction table.
What Actually Resulted: The total count returned by the query is higher than expected, and it does not align with the counts from the transaction table.
You need to sign in to view this answers