October 24, 2024
Chicago 12, Melborne City, USA
templates

Do templates bloat the binary size needlessly?


Binary size can have an indirect effect on performance due to cache usage.

I wrote this bit of code to investigate if compilers would fold identical template code (in this case, the sorting of pointers).

#include <cstdio>
#include <cstdlib>
#include <algorithm>

#define CAST 1

template<typename T>
void f() {
    std::size_t n = 1024;

    T** v = (T**)std::malloc(n * sizeof(T*));

    for(std::size_t i = 0; i < n; ++i) {
        v[i] = (T*)std::malloc(sizeof(T));
        *v[i] = T{};
    }

#if CAST
    std::sort((void**)v, (void**)(v + n));
#else
    std::sort(v, v + n);
#endif

    std::printf("%p, %p\n", v[0], v[n-1]);

    for(std::size_t i = 0; i < n; ++i) {
        std::free(v[i]);
    }

    std::free(v);
}

int main() {
    std::printf("CAST = %d\n", CAST);
    f<double>();
    f<float>();
    f<int>();
    f<short>();
    f<char>();
    f<long>();
    f<unsigned>();
    f<unsigned long>();
}

I compile it with -O3 and then strip the binary. Interestingly, the generated binary size does depend on whether the pointers are cast to void* before being sorted. For Clang++ especially, the binary is much smaller.

CAST G++ 12 Clang++ 15
false 22KB 35KB
true 18KB 19KB

This is odd, because one would think that if sort is inlined, the generated machine code should not depend on CAST. But if it’s not inlined, then the compiler should notice that a bunch of sort<> functions have identical machine code and fold them into one.

But apparently, this does not happen. I wonder why?



You need to sign in to view this answers

Leave feedback about this

  • Quality
  • Price
  • Service

PROS

+
Add Field

CONS

+
Add Field
Choose Image
Choose Video