Does fetching values into local variables allow greater optimization in C/C++?

I’ve often wondered this. Suppose you’ve got a loop in a function or method, like:

for (int i = 0; i < count; ++i)
    ...do stuff including function/method calls...;

Suppose count is some variable external to the function – a global variable, a data member of the object performing the loop, whatever. Not a local variable. It seems to me that the compiler doesn’t know, in general, whether count might change in value across the work that the loop does, and therefore it has to re-fetch the value of count each time through the loop, from whatever memory location it lives in; it can’t, for example, be kept in a register. If you instead wrote:

int local_count = count;

for (int i = 0; i < local_count; ++i)
    ...do stuff including function/method calls...;

then you’re telling the compiler "the value of count should be cached in a local variable and used, even if the original value were to change". That would allow it to be, for example, placed in a register for the duration of the loop.
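To make the concern concrete, here is a minimal sketch. The global `count` and the callee `do_stuff` are hypothetical stand-ins for the real code. If the callee can write to `count`, re-reading `count` on every iteration and caching it in a local give observably different results. That is why the compiler can't quietly turn one form into the other unless it can prove the callee never writes `count`:

```cpp
// Hypothetical global standing in for the question's `count`.
int count = 0;

// Hypothetical callee. If it were defined in another translation unit (and
// LTO were off), the compiler could not see whether it writes to `count`.
void do_stuff(int i) {
    if (i == 2) {
        count = 3;  // shrinks the loop bound partway through the loop
    }
}

// Compares against the global on every iteration; because do_stuff() might
// have changed it, the compiler must reload `count` after each call.
int loop_reading_global() {
    int iterations = 0;
    for (int i = 0; i < count; ++i) {
        do_stuff(i);
        ++iterations;
    }
    return iterations;
}

// Reads the global once; later writes to `count` cannot change the trip
// count, so the bound may be kept in a register for the whole loop.
int loop_reading_local() {
    const int local_count = count;
    int iterations = 0;
    for (int i = 0; i < local_count; ++i) {
        do_stuff(i);
        ++iterations;
    }
    return iterations;
}
```

Starting from `count = 10`, `loop_reading_global()` runs 3 iterations and `loop_reading_local()` runs 10. When the callee's body is visible to the optimizer (same translation unit, inlining, or LTO), it can see whether `count` is written; when the callee is opaque, it has to assume that it might be.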

Assume that the value of count does not, in fact, change for the duration of the loop. Does this difference in coding style make a performance difference or not, with a typical modern compiler? How smart are compilers at figuring out "this value won’t change and doesn’t need to be re-fetched"? Note that I’m not asking about volatile, which I understand; I’m asking about the possibility that the "do stuff" section of the code changes the value of count (which I know doesn’t happen, but the compiler perhaps does not know).

This comes up even in simpler situations like:

use non-local variable foo;
...do stuff...;
use non-local variable foo again;

Assume that I know the value of foo doesn’t change. Should I still cache its value in a local variable for maximal performance? Like:

auto local_foo = foo;

use local_foo;
...do stuff...;
use local_foo again;
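The same pattern with a data member can be sketched as follows. The class `Simulation`, the member `foo`, and the `do_stuff` callback are all hypothetical. The `std::function` callback models "do stuff" that the compiler cannot see into, and that may or may not reach back into the object:

```cpp
#include <functional>

// Hypothetical class standing in for "the object performing the work".
class Simulation {
public:
    double foo = 2.0;                          // non-local value used twice
    std::function<void()> do_stuff = [] {};   // opaque work; may touch foo

    // Reads this->foo before and after the call. Unless the compiler can
    // prove do_stuff() leaves foo alone, it must reload foo after the call.
    double use_member_twice() {
        double result = foo;   // first use
        do_stuff();
        result += foo;         // second use: reloaded from the object
        return result;
    }

    // Copies foo into a local first. Nothing else can reach local_foo, so
    // the compiler need not reload it from the object after the call.
    double use_local_twice() {
        const double local_foo = foo;
        double result = local_foo;   // first use
        do_stuff();
        result += local_foo;         // second use: same value as the first
        return result;
    }
};
```

If the callback happens to set `foo` to 5.0, `use_member_twice()` returns 2.0 + 5.0 while `use_local_twice()` returns 2.0 + 2.0. The compiler has to preserve that difference whenever it can't rule the write out.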

I imagine this will depend on the compiler, but I’m hoping some general, useful statements can be made about how smart/dumb modern compilers tend to be about drawing this inference that a value has not changed, when set to a high optimization level. Does it depend on inlining? Link-time optimization? Other considerations, such as the use of pointers in the "do stuff" sections that the compiler presumably cannot assume do not point at the external variable in question? Is there any more elegant way of handling this problem than making local-variable copies of stuff all over the place?
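The pointer case can be sketched like this. The global `count` and both functions are hypothetical. A store through an `int*` may legally modify an `int` global, so the compiler has to re-read `count` after every such store. By contrast, a store through a `float*` cannot legally modify an `int` under the strict-aliasing rules. The second function also shows one slightly tidier spelling of the local copy: declaring the cached bound in the `for` statement's init clause.

```cpp
// Hypothetical global loop bound, as in the question.
int count = 0;

// `out` has the same type as `count`, so the compiler must assume the store
// through `out` might modify `count` and re-read the bound every iteration.
int store_index_global_bound(int* out) {
    int stores = 0;
    for (int i = 0; i < count; ++i) {
        *out = i;   // hypothetical "do stuff" that writes through a pointer
        ++stores;
    }
    return stores;
}

// Same loop with the bound copied once in the for-init statement; stores
// through `out` can no longer affect how many iterations run.
int store_index_cached_bound(int* out) {
    int stores = 0;
    for (int i = 0, n = count; i < n; ++i) {
        *out = i;
        ++stores;
    }
    return stores;
}
```

With `count = 5` and `out` pointing at an unrelated `int`, both functions perform 5 stores. If `out` points at `count` itself, the first version stops after one iteration and the second still runs five. That aliasing possibility is exactly what the compiler must allow for when it can't see where `out` comes from.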

(Please don’t reply about premature optimization, tell me not to worry about such tiny performance details, etc.; I am asking specifically about the context where every little bit of performance really does matter for the code in question. I work on simulations that take days or weeks to run, often with an extreme hotspot in a short section of code. And please don’t tell me I ought to hand-code such performance-sensitive code in assembler if I care so much; I’d love to, but my software has to run cross-platform on end-user machines that might be macOS, Linux, or Windows, so assembly is a non-starter, and indeed I don’t know what compiler I’ll be on. But I really do need to squeeze maximum performance out of the compiler, to the extent possible.)
