I have a table:
create table accounts_service.operation_history
(
history_id bigint generated always as identity
primary key,
operation_id varchar(36) not null
unique,
operation_type varchar(30) not null,
operation_time timestamp with time zone default now() not null,
from_phone varchar(20),
user_id varchar(21),
-- and a lot of other varchar(x) and text columns, plus a couple of numeric, boolean, jsonb and timestamp columns
);
create index operation_history_user_id_operation_time_idx
on accounts_service.operation_history (user_id, operation_time);
create index operation_history_operation_time_idx
on accounts_service.operation_history (operation_time);
I want to run a simple SELECT with a WHERE filter on operation_time (this filter is always present and usually spans a day or two), plus additional filters on other columns, most commonly of varchar(x) type.
But my queries are slow:
explain (buffers, analyze)
select *
from operation_history operationh0_
where (null is null or operationh0_.user_id = null)
and operationh0_.operation_time >= '2024-09-30 20:00:00.000000 +00:00'
and operationh0_.operation_time <= '2024-10-02 20:00:00.000000 +00:00'
and (operationh0_.from_phone = '+000111223344')
order by operationh0_.operation_time asc, operationh0_.history_id asc
limit 25;
Limit  (cost=8063.39..178328.00 rows=25 width=1267) (actual time=174373.106..174374.395 rows=0 loops=1)
  Buffers: shared hit=532597 read=1433916
  I/O Timings: read=517880.241
  ->  Incremental Sort  (cost=8063.39..198759.76 rows=28 width=1267) (actual time=174373.105..174374.394 rows=0 loops=1)
        Sort Key: operation_time, history_id
        Presorted Key: operation_time
        Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
        Buffers: shared hit=532597 read=1433916
        I/O Timings: read=517880.241
        ->  Gather Merge  (cost=1000.60..198758.50 rows=28 width=1267) (actual time=174373.099..174374.388 rows=0 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              Buffers: shared hit=532597 read=1433916
              I/O Timings: read=517880.241
              ->  Parallel Index Scan using operation_history_operation_time_idx on operation_history operationh0_  (cost=0.57..197755.24 rows=12 width=1267) (actual time=174362.932..174362.933 rows=0 loops=3)
                    Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
                    Filter: ((from_phone)::text = '+000111223344'::text)
                    Rows Removed by Filter: 723711
                    Buffers: shared hit=532597 read=1433916
                    I/O Timings: read=517880.241
Planning Time: 0.193 ms
Execution Time: 174374.449 ms
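Note that the (null is null or operationh0_.user_id = null) group contributes nothing here: null is null is constant true, so the planner folds the entire OR away, user_id doesn't constrain the query at all, and the (user_id, operation_time) index is out of the picture. A quick way to see the constant folding:

select (null is null) as always_true;  -- returns true, so the whole OR group is true at plan time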
-- for simplicity:
set max_parallel_workers_per_gather = 0;
-- This only simplifies the plan; the numbers stay representative. Re-run the previous query:
Limit  (cost=7535.40..189179.35 rows=25 width=1267) (actual time=261432.728..261432.729 rows=0 loops=1)
  Buffers: shared hit=374346 read=1591362
  I/O Timings: read=257253.065
  ->  Incremental Sort  (cost=7535.40..210976.63 rows=28 width=1267) (actual time=261432.727..261432.727 rows=0 loops=1)
        Sort Key: operation_time, history_id
        Presorted Key: operation_time
        Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
        Buffers: shared hit=374346 read=1591362
        I/O Timings: read=257253.065
        ->  Index Scan using operation_history_operation_time_idx on operation_history operationh0_  (cost=0.57..210975.37 rows=28 width=1267) (actual time=261432.720..261432.720 rows=0 loops=1)
              Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
              Filter: ((from_phone)::text = '+000111223344'::text)
              Rows Removed by Filter: 2171134
              Buffers: shared hit=374346 read=1591362
              I/O Timings: read=257253.065
Planning Time: 0.170 ms
Execution Time: 261432.774 ms
So it only had to filter out 2,171,134 rows, yet it took more than 4 minutes. That seems too long, doesn't it?
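To put the I/O in perspective: the second plan reads 1,591,362 pages from disk just to return zero rows. A back-of-the-envelope check, assuming the default 8 kB block size:

-- pages read from disk vs. time spent reading them (numbers taken from the plan above)
select 1591362::bigint * 8192 / 1024.0 ^ 3 as gib_read,  -- ≈ 12.1 GiB read from disk
       257253.065 / 1591362 as ms_per_page;              -- ≈ 0.16 ms per page

So roughly 12 GiB has to come off the gp3 volume for a single lookup; the per-page latency doesn't look unusual to me, it's the sheer volume of reads.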
I tried selecting specific columns (e.g. operation_time, from_phone, to_phone, history_id); it had no effect (see the covering-index sketch after this list for my guess why).
I tried vacuum analyze; it had no effect.
I checked some PostgreSQL parameters such as shared_buffers, work_mem, etc.; changing them had no effect. I also compared the settings against pgTune and they look fine.
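My guess is that selecting fewer columns can't help because every candidate row is still fetched from the heap; an index-only scan would need a covering index along these lines (a sketch only — the name and INCLUDE list are mine, I haven't created it):

create index operation_history_time_covering_idx
    on accounts_service.operation_history (operation_time)
    include (history_id, from_phone, to_phone);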
Some additional info:
SELECT relpages, pg_size_pretty(pg_total_relation_size(oid)) AS table_size
FROM pg_class
WHERE relname = 'operation_history';
18402644 | 210 GB
select count(*) from operation_history;
352402877
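For reference, those numbers work out to roughly 140 GiB of heap pages and about 640 bytes per row including indexes (assuming 8 kB pages and treating pg_size_pretty's GB as binary):

select 18402644::bigint * 8192 / 1024.0 ^ 3 as heap_gib,      -- ≈ 140 GiB of heap pages
       210 * 1024.0 ^ 3 / 352402877 as bytes_per_row_total;   -- ≈ 640 bytes/row incl. indexes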
Server drives: AWS gp3
I don’t want to create indexes for all columns because there are massive writes to this table…
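(For example, serving just this one filter efficiently would already take something like the following — a hypothetical index, and only one of many I'd need:

create index operation_history_from_phone_time_idx
    on accounts_service.operation_history (from_phone, operation_time);

and the same pattern would have to be repeated for every filterable varchar column.)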
Is there any way to optimize this?
Or is this volume of reads from the index and the table simply expected at this scale, and we need to move to sharding, etc.?