October 22, 2024
C++

Cortex-M loading 32-bit variable optimization


I’m trying to compile the test code below, which only writes a 32-bit variable through a pointer. I write it once using byte accesses and a second time using a word access.

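#include <stdint.h>

/* Byte access: writes value least significant byte first (4 bytes),
   then most significant byte first (4 more bytes). */
void load_data_8(uint32_t value, void* d) {
    uint8_t* d_ptr = (uint8_t*)d;   /* explicit cast so this also compiles as C++ */

    *d_ptr++ = (value>>0)&0xFF;
    *d_ptr++ = (value>>8)&0xFF;
    *d_ptr++ = (value>>16)&0xFF;
    *d_ptr++ = (value>>24)&0xFF;

    *d_ptr++ = (value>>24)&0xFF;
    *d_ptr++ = (value>>16)&0xFF;
    *d_ptr++ = (value>>8)&0xFF;
    *d_ptr++ = (value>>0)&0xFF;
}

/* Word access: a single 32-bit store. */
void load_data_32(uint32_t value, void* d) {
    uint32_t* d_ptr = (uint32_t*)d;

    *d_ptr = value;
}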

Compiler: ARM GCC 11.2.1
Compiler flags: -mcpu=cortex-m7 -O3 (the Cortex-M7 supports unaligned word accesses)
The compiler produces the following:

load_data_8:
        rev     r3, r0
        str     r0, [r1]  @ unaligned
        str     r3, [r1, #4]      @ unaligned
        bx      lr
load_data_32:
        str     r0, [r1]
        bx      lr
main:
        movs    r0, #0
        bx      lr
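
To make the Cortex-M7 listing easier to read, here is a small host-side sketch (not part of the original post) of what the two str instructions appear to amount to: one store of value and one store of its byte-reversed form, which is what rev computes. It assumes a little-endian machine. byte_swap32, load_data_8_as_words and same_bytes are hypothetical helper names, and memcpy is used so the sketch itself makes no alignment assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-wise version, equivalent to load_data_8 from the question:
// four bytes LSB first, then four bytes MSB first.
void load_data_8(std::uint32_t value, void* d) {
    assert(d != nullptr);
    std::uint8_t* p = static_cast<std::uint8_t*>(d);
    for (int shift = 0; shift <= 24; shift += 8)
        *p++ = static_cast<std::uint8_t>((value >> shift) & 0xFF);
    for (int shift = 24; shift >= 0; shift -= 8)
        *p++ = static_cast<std::uint8_t>((value >> shift) & 0xFF);
}

// Hypothetical helper: reverse the byte order of a 32-bit value,
// which is the operation the rev instruction performs.
std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical word-wise equivalent: two 4-byte copies.
// Matches load_data_8 only on a little-endian machine (assumption).
void load_data_8_as_words(std::uint32_t value, void* d) {
    assert(d != nullptr);
    const std::uint32_t swapped = byte_swap32(value);
    std::memcpy(d, &value, sizeof value);
    std::memcpy(static_cast<std::uint8_t*>(d) + 4, &swapped, sizeof swapped);
}

// Returns true when both versions write the same 8 bytes for this value.
bool same_bytes(std::uint32_t value) {
    std::uint8_t a[8] = {};
    std::uint8_t b[8] = {};
    load_data_8(value, a);
    load_data_8_as_words(value, b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

On a little-endian host, same_bytes(0x11223344u) returns true: both versions write the bytes 44 33 22 11 11 22 33 44.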

If I compile the same code for cortex-m0plus, which has no hardware support for unaligned memory access, I get this:

Compiler flags: -mcpu=cortex-m0plus -O3

load_data_8:
        push    {r4, lr}
        lsrs    r3, r0, #8
        lsrs    r2, r0, #16
        uxtb    r4, r0
        uxtb    r3, r3
        uxtb    r2, r2
        lsrs    r0, r0, #24
        strb    r4, [r1]
        strb    r3, [r1, #1]
        strb    r2, [r1, #2]
        strb    r0, [r1, #3]
        strb    r0, [r1, #4]
        strb    r2, [r1, #5]
        strb    r3, [r1, #6]
        strb    r4, [r1, #7]
        pop     {r4, pc}
load_data_32:
        str     r0, [r1]
        bx      lr
  • C-M7 test: Why does the load_data_8 output for Cortex-M7 carry the @ unaligned annotation, while load_data_32 does not? How does the compiler know that the data pointer in load_data_32 won’t be unaligned?

  • C-M0+ test: Why does the compiler not produce the same code for load_data_8 and load_data_32, given that in both cases we write 32 bits of data in the CPU’s (little) endianness? From the core’s standpoint, what difference does it make whether the type is 8-bit or 32-bit, given that the memory is sequential?
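
As a sketch of what the first question is probing (an illustration, not an answer taken from this thread): in C and C++ a uint32_t* is required to point to suitably aligned storage, so a store through it may be compiled as an aligned word store, whereas a uint8_t* or void* carries no such guarantee. The helper names below are hypothetical. store_u32_any uses memcpy, which is valid for any address. store_u32_aligned checks the alignment precondition with assert before dereferencing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p meets uint32_t's alignment requirement.
bool is_u32_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0;
}

// Store through a typed pointer. The language requires d to be suitably
// aligned here; a misaligned d is undefined behavior, so it is asserted.
void store_u32_aligned(std::uint32_t value, void* d) {
    assert(is_u32_aligned(d));
    *static_cast<std::uint32_t*>(d) = value;
}

// Store with no alignment assumption: memcpy is valid for any address
// that has room for 4 bytes.
void store_u32_any(std::uint32_t value, void* d) {
    std::memcpy(d, &value, sizeof value);
}
```

Which instructions a given compiler emits for either helper depends on the target and compiler version and is not shown here. The point is only that the pointer type, not the address value, is what tells the compiler which alignment it may assume.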


