I have a table:
create table accounts_service.operation_history
(
history_id bigint generated always as identity
primary key,
operation_id varchar(36) not null
unique,
operation_type varchar(30) not null,
operation_time timestamp with time zone default now() not null,
from_phone varchar(20),
user_id varchar(21),
-- and a lot of another varchar(x), text and even couple of number, boolean, jsonb, timestamp columns
);
create index operation_history_user_id_operation_time_idx
on accounts_service.operation_history (user_id, operation_time);
create index operation_history_operation_time_idx
on accounts_service.operation_history (operation_time);
I want to run a simple select with a where filter on operation_time (this filter is required and typically spans a day or two), plus additional filters on other columns, most commonly of varchar(x) type.
But my queries are slow:
explain (buffers, analyze)
select *
from operation_history operationh0_
where (null is null or operationh0_.user_id = null)
and operationh0_.operation_time >= '2024-09-30 20:00:00.000000 +00:00'
and operationh0_.operation_time <= '2024-10-02 20:00:00.000000 +00:00'
and (operationh0_.from_phone = '+000111223344')
order by operationh0_.operation_time asc, operationh0_.history_id asc
limit 25;
Limit (cost=8063.39..178328.00 rows=25 width=1267) (actual time=174373.106..174374.395 rows=0 loops=1)
Buffers: shared hit=532597 read=1433916
I/O Timings: read=517880.241
-> Incremental Sort (cost=8063.39..198759.76 rows=28 width=1267) (actual time=174373.105..174374.394 rows=0 loops=1)
Sort Key: operation_time, history_id
Presorted Key: operation_time
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB
Buffers: shared hit=532597 read=1433916
I/O Timings: read=517880.241
-> Gather Merge (cost=1000.60..198758.50 rows=28 width=1267) (actual time=174373.099..174374.388 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=532597 read=1433916
I/O Timings: read=517880.241
-> Parallel Index Scan using operation_history_operation_time_idx on operation_history operationh0_ (cost=0.57..197755.24 rows=12 width=1267) (actual time=174362.932..174362.933 rows=0 loops=3)
Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
Filter: ((from_phone)::text = '+000111223344'::text)
Rows Removed by Filter: 723711
Buffers: shared hit=532597 read=1433916
I/O Timings: read=517880.241
Planning Time: 0.193 ms
Execution Time: 174374.449 ms
-- for simplicity:
set max_parallel_workers_per_gather = 0;
-- This only simplifies the plan; the numbers remain representative. Rerun the previous query:
Limit (cost=7535.40..189179.35 rows=25 width=1267) (actual time=261432.728..261432.729 rows=0 loops=1)
Buffers: shared hit=374346 read=1591362
I/O Timings: read=257253.065
-> Incremental Sort (cost=7535.40..210976.63 rows=28 width=1267) (actual time=261432.727..261432.727 rows=0 loops=1)
Sort Key: operation_time, history_id
Presorted Key: operation_time
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB
Buffers: shared hit=374346 read=1591362
I/O Timings: read=257253.065
-> Index Scan using operation_history_operation_time_idx on operation_history operationh0_ (cost=0.57..210975.37 rows=28 width=1267) (actual time=261432.720..261432.720 rows=0 loops=1)
Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
Filter: ((from_phone)::text = '+000111223344'::text)
Rows Removed by Filter: 2171134
Buffers: shared hit=374346 read=1591362
I/O Timings: read=257253.065
Planning Time: 0.170 ms
Execution Time: 261432.774 ms
So it only filtered out 2,171,134 rows, yet it took more than 4 minutes. That seems too long, doesn't it?
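As a rough sanity check (my own back-of-the-envelope arithmetic, assuming the default 8 kB block size), the read=1591362 buffers in the plan correspond to roughly 12 GB pulled from disk for this single query:

```sql
-- 1,591,362 blocks * 8192 bytes per block ≈ 12 GB read from disk
select pg_size_pretty(1591362 * 8192::bigint);
```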
I tried selecting only specific columns (e.g. operation_time, from_phone, to_phone, history_id); it had no effect.
I tried vacuum analyze; it had no effect.
I checked some Postgres parameters such as shared_buffers and work_mem; changing them had no effect. I also compared the configuration against pgTune and it looks fine.
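For completeness, this is how the current values can be read from pg_settings (the parameter list below is just the ones I looked at):

```sql
select name, setting, unit
from pg_settings
where name in ('shared_buffers', 'work_mem',
               'effective_cache_size', 'random_page_cost');
```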
Some additional info:
SELECT relpages, pg_size_pretty(pg_total_relation_size(oid)) AS table_size
FROM pg_class
WHERE relname = 'operation_history';
18402644 | 210 GB
select count(*) from operation_history;
352402877
Server drives: AWS gp3
I don’t want to create indexes for all columns because there are massive writes to this table…
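For illustration only: a composite index matching this particular filter would look like the sketch below, but it is exactly the kind of extra index I'm trying to avoid, since I'd need one variant per filterable column:

```sql
-- hypothetical index covering the from_phone + operation_time filter;
-- every additional filterable column would need its own variant
create index operation_history_from_phone_operation_time_idx
    on accounts_service.operation_history (from_phone, operation_time);
```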
Is there any way to optimize this?
Or is it simply doing a lot of reads from the index and the table, so this is expected behavior and we need to move to sharding, etc.?