
How to efficiently FOR-loop over a huge SELECT result set in PL/pgSQL


In a PL/pgSQL stored procedure, I would like to calculate multiple averages (simple, weighted, etc.) over different periods of time and over different numbers of data points, all in a single loop.

Therefore, I cannot put a LIMIT boundary on the SELECT that the loop iterates over, because even the biggest time period may still not contain enough data points; so I can't use LIMIT 1000, nor can I use WHERE date_created >= (one year ago).

Thus I would like to leave the SELECT query unconstrained, loop through it until I'm done and have calculated all the averages I need over all the relevant chunks of data (whether time- or quantity-based), and then simply call EXIT to stop the loop.

My concern is: what if, without LIMIT or WHERE constraints, the SELECT would return 100 million rows while I only end up needing the first 10k of them? Will my stored PL/pgSQL procedure actually materialize the huge 100-million-row result set and then loop over it, or will it fetch rows only as needed?

I do understand that the FOR loop uses cursors internally. But my question is about the result set that needs to be built before a cursor can loop through it. So, is it a bad idea to run a FOR loop over a SELECT query with no LIMIT/WHERE constraints that could potentially return 100M rows, and then EXIT as needed? Or should I just process the data in chunks of ~1k items?
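For comparison, the chunked alternative mentioned above could be written with keyset pagination, re-issuing the SELECT with a LIMIT and resuming from the last timestamp seen. This is only a sketch under assumed names: the table `data_points`, its `value` column, and the 1000-row chunk size are illustrative placeholders, not from the original post (and it assumes `date_created` is unique; ties would need a tiebreaker column).

```sql
DO $$
DECLARE
    rec        RECORD;
    last_seen  timestamptz := now();  -- assumed starting point
    got_rows   boolean;
BEGIN
    LOOP
        got_rows := false;
        FOR rec IN
            SELECT date_created, value
            FROM data_points
            WHERE date_created < last_seen
            ORDER BY date_created DESC
            LIMIT 1000                 -- process ~1k rows per chunk
        LOOP
            got_rows  := true;
            last_seen := rec.date_created;
            -- ... update the running averages here;
            --     EXIT early once all averages are calculated ...
        END LOOP;
        EXIT WHEN NOT got_rows;        -- no more data
    END LOOP;
END;
$$;
```

Each chunk runs as its own bounded query, so no single statement has to cover the whole table.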

The SELECT statement looks backwards starting from some point in time, and that is why it has:

  1. where date_created < START_TIME
  2. order by date_created desc
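Putting the pieces together, the loop described in the question might look like the following sketch. The table name `data_points`, the `value` column, and the 10k cutoff are illustrative placeholders, not from the original post:

```sql
CREATE OR REPLACE FUNCTION calc_averages(start_time timestamptz)
RETURNS numeric AS $$
DECLARE
    rec data_points%ROWTYPE;
    n   bigint  := 0;
    s   numeric := 0;
BEGIN
    -- FOR loop over an unbounded, backwards-ordered SELECT;
    -- EXIT stops the iteration once enough rows have been seen.
    FOR rec IN
        SELECT *
        FROM data_points
        WHERE date_created < start_time
        ORDER BY date_created DESC
    LOOP
        n := n + 1;
        s := s + rec.value;
        -- ... accumulate weighted / period-based averages here ...
        EXIT WHEN n >= 10000;
    END LOOP;

    RETURN CASE WHEN n > 0 THEN s / n END;  -- simple average
END;
$$ LANGUAGE plpgsql;
```

Whether this touches 10k rows or 100M is exactly the behavior the question asks about.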



