October 23, 2024
C#

Why do modern calling conventions pass variadic arguments in registers?


If we look at a few modern calling conventions, such as the System V ABI for x86-64 or the AArch64 one (document aapcs64.pdf, titled "Procedure Call Standard for the Arm® 64-bit Architecture"), we see explicit notes that variadic arguments are passed the same way as other arguments. For example, a call open(path, flags, mode) on x86-64 gets path in RDI, flags in RSI and (the only variadic one) mode in RDX.
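
To make the open example concrete, here is a minimal sketch of how such a callee reads its single variadic argument. The names demo_open_mode and DEMO_CREAT are hypothetical stand-ins, not the real libc interface; the sketch only assumes that the optional mode is fetched with va_arg when a "create" flag is set.

```c
#include <stdarg.h>

/* Hypothetical flag standing in for O_CREAT. */
#define DEMO_CREAT 0x40

/* Hypothetical stand-in for open(): returns the mode it would use,
   or 0 when the flags say that no mode argument was passed. */
unsigned int demo_open_mode(const char *path, int flags, ...)
{
    unsigned int mode = 0;

    (void)path;                 /* unused in this sketch */
    if (flags & DEMO_CREAT) {
        va_list ap;

        va_start(ap, flags);    /* variadic part starts after 'flags' */
        mode = va_arg(ap, unsigned int);
        va_end(ap);
    }
    return mode;
}
```

A call such as demo_open_mode("file", DEMO_CREAT, 0644) passes all three values in registers on x86-64, yet the callee can only reach the third one through va_arg.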

There is no question about passing a fixed argument set in registers: it saves resources. But if we look into a function that then interprets the variadic arguments, and so calls va_start on them, we see that va_start is compiled into storing all possible register arguments (typically many more than are really present) onto the stack. For example, a full emulation of printf via vfprintf starts with the following (I compacted similar rows to keep the listing short):

my_printf:
        endbr64
; nearly unconditional saving
        subq    $216, %rsp
        movq    %rsi, 40(%rsp)
<...>
        movq    %r9, 72(%rsp)
        testb   %al, %al
        je      .L2
        movaps  %xmm0, 80(%rsp)
<...>
        movaps  %xmm7, 192(%rsp)
; repacking into registers for enclosed vfprintf
.L2:
        movq    %fs:40, %rax
        movq    %rax, 24(%rsp)
        xorl    %eax, %eax
        movl    $8, (%rsp)
        movl    $48, 4(%rsp)
        leaq    224(%rsp), %rax
        movq    %rax, 8(%rsp)
        leaq    32(%rsp), %rax
        movq    %rax, 16(%rsp)
        movq    %rsp, %rcx
        movq    %rdi, %rdx
        movl    $1, %esi
; finally, call the function
        movq    stdout(%rip), %rdi
        call    __vfprintf_chk@PLT
... skipped epilogue

Here the VA frame takes 192 bytes. Similarly, the AArch64 version stores 184 bytes (x1..x7 and q0..q7).
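
For reference, the four stores to 0, 4, 8 and 16(%rsp) in the x86-64 listing fill in the va_list record. The field names below follow the System V x86-64 psABI; the struct tag and the helper sketch_va_start are hypothetical, and the helper only mimics what the compiler emitted inline.

```c
#include <stddef.h>

/* Layout of one va_list element in the System V x86-64 ABI.
   Field names are from the psABI; the struct tag is hypothetical. */
struct sysv_va_list_sketch {
    unsigned int gp_offset;   /* offset of the next GP slot in reg_save_area */
    unsigned int fp_offset;   /* offset of the next vector slot in reg_save_area */
    void *overflow_arg_area;  /* next argument that was passed on the stack */
    void *reg_save_area;      /* where the prologue spilled the registers */
};

/* Hypothetical helper mirroring the inline va_start code:
   named_gp / named_fp count the named arguments that already
   consumed general-purpose / vector registers. */
void sketch_va_start(struct sysv_va_list_sketch *ap,
                     unsigned int named_gp, unsigned int named_fp,
                     void *stack_args, void *save_area)
{
    ap->gp_offset = named_gp * 8;        /* one named GP arg  -> 8  */
    ap->fp_offset = 6 * 8 + named_fp * 16; /* no named FP args -> 48 */
    ap->overflow_arg_area = stack_args;  /* 224(%rsp) in the listing */
    ap->reg_save_area = save_area;       /* 32(%rsp) in the listing */
}
```

With one named pointer argument (the format string) this gives gp_offset 8 and fp_offset 48, matching the immediates $8 and $48 in the listing.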

If the variadic tail of every function call were always put on the stack, the code would be much simpler and cheaper at run time, because none of this packing and copying would be needed. va_start would reduce to a single move of the variadic list's starting location (on the stack) into a variable. This is how it really worked on i386 (where all arguments were passed on the stack). Assembly output of the same trivial wrapper for Linux/i386:

my_printf:
        pushl   %ebx
        subl    $8, %esp
        call    __x86.get_pc_thunk.bx
        addl    $_GLOBAL_OFFSET_TABLE_, %ebx
        leal    20(%esp), %eax ; <--- This is va_start
        pushl   %eax ; VA pointer pushed for vfprintf
        pushl   20(%esp)
        pushl   $1
        movl    stdout@GOT(%ebx), %eax
        pushl   (%eax)
        call    __vfprintf_chk@PLT
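
For completeness, a wrapper of roughly the following shape would produce listings like the two above. This is a reconstruction and an assumption on my side, not necessarily the exact source that was compiled; the call to __vfprintf_chk suggests it was built with optimization and _FORTIFY_SOURCE enabled.

```c
#include <stdarg.h>
#include <stdio.h>

/* Trivial printf emulation: forwards the variadic tail to vfprintf. */
int my_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);              /* the expensive part on x86-64 */
    n = vfprintf(stdout, fmt, ap);  /* may become __vfprintf_chk when fortified */
    va_end(ap);
    return n;                       /* number of characters written */
}
```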

Hence the question: why is the implementation of variadic arguments, at least for x86-64 and AArch64, so complicated and wasteful of resources?

(I could imagine cases where both styles, one with fixed arguments and one with a variadic list, have to be equally allowed in declarations of the same function. But I don't know of such a case. The open mentioned above is unlikely to be one.)


