• arm-gcc, Cortex-M0+, uint64_t and alignment

    From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Jan 20 13:26:15 2026
    From Newsgroup: comp.arch.embedded

    I just discovered that my arm-gcc assigns an alignment of 8 to a struct
    with a uint64_t member.

    First of all: I can't explain why. The Cortex-M0+ shouldn't have any special load/store instructions for 64-bit data. I think a uint64_t variable
    is *always* accessed with two separate instructions.

    Second thing. Is it safe to force the alignment of such structs to 4
    with __attribute__((aligned(4)))?

    I have big arrays of structs that contain uint64_t members, so I'm
    thinking about how to save some space.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 17:07:45 2026

    On 20/01/2026 13:26, pozz wrote:
    I just discovered that my arm-gcc assigns an alignment of 8 to a struct
    with a uint64_t member.

    First of all: I can't explain why. The Cortex-M0+ shouldn't have any special load/store instructions for 64-bit data. I think a uint64_t variable
    is *always* accessed with two separate instructions.


    There are other Cortex-M devices that /can/ access 64-bit data with a
    single instruction (though not always as an atomic operation).

    Compilers use family ABIs, not ABIs specifically tuned for exact
    devices. The EABI for 32-bit ARM says long longs are 8-byte aligned,
    so that's what is used for all targets that use the EABI. (There's a
    lot to dislike about the EABI - this is not the worst thing.)

    Second thing. Is it safe to force the alignment of such structs to 4
    with __attribute__((aligned(4)))?


    You can't reduce the alignment of a struct or its elements by adding an "aligned" attribute to the struct itself or any of its fields. The
    best you can do on the struct itself is __attribute__((packed)). But
    that can come with disadvantages, such as inefficient access.

    I have big arrays of structs that contain uint64_t members, so I'm
    thinking about how to save some space.

    The best way is to organise the fields so that they are naturally
    aligned, and don't have padding for alignment. I like "-Wpadded" to
    tell me if there is unexpected padding.


    What you /can/ do, however, is define a type that is 64 bits wide but has 4-byte alignment:

    typedef uint64_t __attribute__((aligned(4))) uint64_a;

    Now you can use "uint64_a" instead of "uint64_t", and it will have 4
    byte alignment.


  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Tue Jan 20 16:41:26 2026

    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by adding an "aligned" attribute to the struct itself or any of its fields. The
    best you can do on the struct itself is __attribute__((packed)). But
    that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function. Somebody I used to
    work with was very fond of making all of his structures aligned (for
    no apparent reason). Then he would test his code on an X86 desktop
    machine. It worked fine because the X86 supports unaligned
    accesses. Then he would move to an ARM target, and it would
    fail. Inevitably the cry "The compiler's broken!" would be heard, and
    I would have to explain to him for the Nth time about misaligned
    accesses on different ARM targets. Some of our targets generate a bus
    fault, some just silently read/write only part of the data.

    That same guy once insisted that with the 32-bit GCC compiler we were
    using "unsigned long variables work, but unsigned variables don't". So
    he was busily changing all of his "unsigned" variables to "unsigned
    long". I printed out the assembly generated for both cases showing
    that it was identical. He then insisted that the linker must be doing
    something to break unsigned integers.

    And then there was the time he decided that cross compiling on a
    single-core Linux host worked but compiling on a dual-core
    didn't. [Both cases using a single-threaded "make".]

    And the time he decided that he needed to upgrade a bunch of the Ubuntu
    X11 libraries on the X86 host machine to fix a problem in the ARM
    target.

    --
    Grant

  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Jan 20 17:55:31 2026

    On 20/01/2026 17:07, David Brown wrote:
    On 20/01/2026 13:26, pozz wrote:
    I just discovered that my arm-gcc assigns an alignment of 8 to a
    struct with a uint64_t member.

    First of all: I can't explain why. The Cortex-M0+ shouldn't have any
    special load/store instructions for 64-bit data. I think a uint64_t
    variable is *always* accessed with two separate instructions.


    There are other Cortex-M devices that /can/ access 64-bit data with a
    single instruction (though not always as an atomic operation).

    Compilers use family ABIs, not ABIs specifically tuned for exact devices.  The EABI for 32-bit ARM says long longs are 8-byte aligned,
    so that's what is used for all targets that use the EABI.  (There's a
    lot to dislike about the EABI - this is not the worst thing.)

    So the ABI used by arm-gcc is the EABI, which is valid for a whole list of Cortex-M devices, a few of which require 8-byte alignment of 64-bit integers.


    Second thing. Is it safe to force the alignment of such structs to 4
    with __attribute__((aligned(4)))?

    You can't reduce the alignment of a struct or its elements by adding an "aligned" attribute to the struct itself or any of its fields.  The
    best you can do on the struct itself is __attribute__((packed)).  But
    that can come with disadvantages, such as inefficient access.

    But this is the opposite of what you write below!


    I have big arrays of structs that contain uint64_t members, so I'm
    thinking about how to save some space.

    The best way is to organise the fields so that they are naturally
    aligned, and don't have padding for alignment.  I like "-Wpadded" to
    tell me if there is unexpected padding.


    What you /can/ do, however, is define a type that is 64 bits wide but has 4-byte alignment:

    typedef uint64_t __attribute__((aligned(4))) uint64_a;

    Now you can use "uint64_a" instead of "uint64_t", and it will have 4
    byte alignment.

    Earlier you wrote that it's impossible to reduce the alignment from 8 to 4
    with __attribute__((aligned(4))), but now you write that it is possible.

  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Jan 20 18:09:42 2026

    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by adding an
    "aligned" attribute to the struct itself or any of its fields. The
    best you can do on the struct itself is __attribute__((packed)). But
    that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    However, as long as the application runs on Cortex-M0+, the aligned
    version shouldn't introduce issues, should it?


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 18:41:58 2026

    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by adding an
    "aligned" attribute to the struct itself or any of its fields. The
    best you can do on the struct itself is __attribute__((packed)). But
    that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle

    To be clear - you mean making the structure /packed/, not /aligned/, or
    in some other way allowing objects to be placed at a smaller alignment
    than the ABI says.

    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function. Somebody I used to
    work with was very fond of making all of his structures aligned (for
    no apparent reason). Then he would test his code on an X86 desktop
    machine. It worked fine because the X86 supports unaligned
    accesses. Then he would move to an ARM target, and it would
    fail. Inevitably the cry "The compiler's broken!" would be heard, and
    I would have to explain to him for the Nth time about misaligned
    accesses on different ARM targets. Some of our targets generate a bus
    fault, some just silently read/write only part of the data.


    Cortex M3 and bigger all handle misaligned accesses without problem
    (albeit possibly at a performance penalty). Cortex M0, M0+ and M1 do
    not support misaligned accesses. On an M4, the compiler should generate normal 32-bit loads and stores for a "packed" struct with 32-bit fields,
    but it should generate byte-by-byte accesses on a Cortex M0.

    It's fine to have packed structs, or types with smaller than normal
    alignment, as long as the compiler knows that's the case. So you don't
    take pointers to fields in a "packed" struct, and if you use something
    like the "uint64_a" type I suggested, it should be accessed by pointers
    to its real type, not pointers to "uint64_t". (In practice, I would
    expect that pointers to uint64_t would work, because the accesses will
    be 32-bit anyway, but you should never lie to your compiler!)

    That same guy once insisted that with the 32-bit GCC compiler we were
    using "unsigned long variables work, but unsigned variables don't". So
    he was busily changing all of his "unsigned" variables to "unsigned
    long". I printed out the assembly generated for both cases showing
    that it was identical. He then insisted that the linker must be doing something to break unsigned integers.

    That is, shall we say, a strange idea from that guy. Perhaps he was
    confused by uint32_t being "unsigned long" on EABI 32-bit ARM, rather
    than "unsigned int" (as it is on 32-bit ARM Linux, and on Windows) ?

    I have often seen people think that "unsigned int" and "unsigned long"
    are the same type on 32-bit ARM, just because they are both 32-bit, and
    get confused when there are compiler complaints about incompatible
    pointers when they are mixed.


    And then there was the time he decided that cross compiling on a
    single-core Linux host worked but compiling on a dual-core
    didn't. [Both cases using a single-threaded "make".]

    And the time he decided that he needed to upgrade a bunch of the Ubuntu
    X11 libraries on the X86 host machine to fix a problem in the ARM
    target.


    Someone is a little confused :-)


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 18:44:06 2026

    On 20/01/2026 18:09, pozz wrote:
    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by adding an
    "aligned" attribute to the struct itself or any of its fields.  The
    best you can do on the struct itself is __attribute__((packed)).  But
    that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    However, as long as the application runs on Cortex-M0+, the aligned
    version shouldn't introduce issues, should it?



    Correctly aligned data is never a problem. /Misaligned/ data is a problem.

    The Cortex-M0+ cannot access misaligned data directly. But if the
    compiler knows that it is misaligned - via a "packed" struct, or an "aligned" attribute on the typedef - it should break apart the accesses into bytes
    or 16-bit half-words as necessary. (Giving a uint64_t 4-byte
    alignment will not be a problem.)

  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Tue Jan 20 17:48:42 2026

    On 2026-01-20, pozz <pozzugno@gmail.com> wrote:
    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by adding an
    "aligned" attribute to the struct itself or any of its fields. The
    best you can do on the struct itself is __attribute__((packed)). But
    that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    Aargh, my bad. I meant that making a structure _packed_ is an
    excellent way to introduce subtle bugs that happen when somebody,
    somewhere passes a pointer to one of those structure fields to some
    library function.

    However, as long as the application runs on Cortex-M0+, the aligned
    version shouldn't introduce issues, should it?

    A non-packed structure should always be OK.

    A packed structure will work fine as long as it's being accessed by
    code that "knows" it's packed. You can pass a pointer to a packed
    struct to a function as long as it's declared in that function as a
    pointer to a packed struct: the compiler will generate extra code to
    deal with accesses to values that are misaligned due to the
    packing. However, passing a pointer to a field of a packed structure
    (e.g. to a uint64_t field) to a function where it was declared as a normal
    "uint64_t *p" can cause failures on ARM targets. It will work OK on
    X86. I think it used to work OK on m68k also. IIRC SPARC failed in
    similar ways to ARM.

    --
    Grant



  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Tue Jan 20 18:10:19 2026

    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    Cortex M3 and bigger all handle misaligned accesses without problem
    (albeit possibly at a performance penalty).

    FWIW, the M3 can be configured to generate a fault on unaligned
    accesses, so whether it works or not depends on your low-level init
    code. I believe that unaligned-fault-enable feature is disabled by
    default at reset. Also, the M3 only supports non-word-aligned
    accesses for normal single load/store instructions. LDM/STM and
    LDRD/STRD will fault on non-word-aligned access.

    --
    Grant

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 22:24:35 2026

    On 20/01/2026 17:55, pozz wrote:
    On 20/01/2026 17:07, David Brown wrote:
    On 20/01/2026 13:26, pozz wrote:
    I just discovered that my arm-gcc assigns an alignment of 8 to a
    struct with a uint64_t member.

    First of all: I can't explain why. The Cortex-M0+ shouldn't have any
    special load/store instructions for 64-bit data. I think a
    uint64_t variable is *always* accessed with two separate instructions.


    There are other Cortex-M devices that /can/ access 64-bit data with a
    single instruction (though not always as an atomic operation).

    Compilers use family ABIs, not ABIs specifically tuned for exact
    devices.  The EABI for 32-bit ARM says long longs are 8-byte aligned,
    so that's what is used for all targets that use the EABI.  (There's a
    lot to dislike about the EABI - this is not the worst thing.)

    So the ABI used by arm-gcc is the EABI, which is valid for a whole list of Cortex-M devices, a few of which require 8-byte alignment of 64-bit integers.


    I don't think any of the 32-bit Cortex-M cores actually needs 8-byte
    alignment in the hardware - it could have been for compatibility with
    64-bit devices.


    Second thing. Is it safe to force the alignment of such structs to 4
    with __attribute__((aligned(4)))?

    You can't reduce the alignment of a struct or its elements by adding
    an "aligned" attribute to the struct itself or any of its fields.
    The best you can do on the struct itself is __attribute__((packed)).
    But that can come with disadvantages, such as inefficient access.

    But this is the opposite of what you write below!

    No, but I might have been unclear.

    Adding the "aligned" attribute to the /struct/, or to the field members /directly/, does not help you here. Adding it to a new typedef does.



    I have big arrays of structs that contain uint64_t members, so I'm
    thinking about how to save some space.

    The best way is to organise the fields so that they are naturally
    aligned, and don't have padding for alignment.  I like "-Wpadded" to
    tell me if there is unexpected padding.


    What you /can/ do, however, is define a type that is 64 bits wide but
    has 4-byte alignment:

    typedef uint64_t __attribute__((aligned(4))) uint64_a;

    Now you can use "uint64_a" instead of "uint64_t", and it will have 4
    byte alignment.

    Earlier you wrote that it's impossible to reduce the alignment from 8 to 4
    with __attribute__((aligned(4))), but now you write that it is possible.


    Putting it in a typedef lets you change the alignment.

    See <https://godbolt.org/z/3fac7n7Yo>, and look at the code generated
    for the M0+ and the M4 to see how "packed" and "aligned" affects things.



  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 22:32:01 2026

    On 20/01/2026 19:10, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    Cortex M3 and bigger all handle misaligned accesses without problem
    (albeit possibly at a performance penalty).

    FWIW, the M3 can be configured to generate a fault on unaligned
    accesses, so whether it works or not depends on your low-level init
    code. I believe that unaligned-fault-enable feature is disabled by
    default at reset.

    I did not know that.

    Also, the M3 only supports non-word-aligned
    accesses for normal single load/store instructions. LDM/STM and
    LDRD/STRD will fault on non-word-aligned access.


    Yes. Of course, the LDM/STM are primarily used for pushing and popping registers on the stack, so you are always going to be aligned there.

    In the godbolt.org link I posted in a reply to Pozz, we can see that
    when the compiler knows the uint64_t is aligned at least to 4 bytes, it
    uses LDRD, but when it does not know that it is 4 bytes aligned, it uses
    two LDR instructions.

    (As an aside, I find it annoying that STRD can be interrupted in the
    middle - it means you don't have an atomic 64-bit store. LDRD can also
    be interrupted in the middle, but as it is restarted, it gives you a
    64-bit atomic read.)

  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Wed Jan 21 03:38:29 2026

    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    (As an aside, I find it annoying that STRD can be interrupted in the
    middle - it means you don't have an atomic 64-bit store. LDRD can also
    be interrupted in the middle, but as it is restarted, it gives you a
    64-bit atomic read.)

    Yes, I just noticed that in the manual the other day, and it seemed
    like an odd decision.

    --
    Grant


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Jan 21 08:54:34 2026

    On 21/01/2026 04:38, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    (As an aside, I find it annoying that STRD can be interrupted in the
    middle - it means you don't have an atomic 64-bit store. LDRD can also
    be interrupted in the middle, but as it is restarted, it gives you a
    64-bit atomic read.)

    Yes, I just noticed that in the manual the other day, and it seemed
    like an odd decision.


    It's not odd from the implementation viewpoint, but disappointing from
    the user viewpoint. The double loads and stores are implemented as a
    sort of combination of two instructions, or at least two actions.
    Disabling interrupts in the middle of the instructions would mean
    additional hardware logic. (I think all longer-running instructions,
    like divisions, are interruptible.) When the interrupt returns, the instructions are simply restarted.

    That gives an atomic 64-bit load, so it lets you safely read 64-bit data
    that is changed by an interrupt or higher-priority thread - unlike using
    two separate 32-bit load instructions. (Using a volatile read appears
    to force the use of LDRD on gcc for M3 and above, while non-volatile
    reads might be split and re-arranged depending on the surrounding code.)

    An interrupted double store is, obviously, a very different matter -
    your interrupt routines or pre-empting threads see half-written data.

    My guess as to the decision process is that making these instructions non-interruptible would have taken more hardware, and weakened
    guarantees on interrupt latency. But if they had asked /me/, I'd have
    chosen to make STRD non-interruptible :-)



  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Jan 21 09:11:38 2026

    On 20/01/2026 18:44, David Brown wrote:
    On 20/01/2026 18:09, pozz wrote:
    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by adding an "aligned" attribute to the struct itself or any of its fields.  The
    best you can do on the struct itself is __attribute__((packed)).  But that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    However, as long as the application runs on Cortex-M0+, the aligned
    version shouldn't introduce issues, should it?



    Correctly aligned data is never a problem.  /Misaligned/ data is a problem.

    The Cortex-M0+ cannot access misaligned data directly.  But if the
    compiler knows that it is misaligned - via a "packed" struct, or an "aligned" attribute on the typedef - it should break apart the accesses into bytes
    or 16-bit half-words as necessary.  (Giving a uint64_t 4-byte alignment will not be a problem.)

    However, for the Cortex-M0+, a uint64_t aligned at 4 bytes is:
    - aligned for the core (two 4-byte-aligned accesses are required)
    - misaligned for the ABI and the compiler

    We agree that forcing the gcc compiler to consider 4 bytes as the
    required alignment of uint64_t (using the aligned attribute) is always safe. However, what really changes in the binary output?

    In some cases, a uint64_t can move from an 8-byte-aligned address to a
    4-byte-aligned one (because we instructed the compiler to allow it). What about
    the code that accesses a uint64_t aligned to 4 bytes? Is it identical
    for 4- and 8-byte alignment requirements? I think so, because in
    both cases the compiler should emit two 4-byte load/store instructions.

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Jan 21 10:02:10 2026

    On 21/01/2026 09:11, pozz wrote:
    On 20/01/2026 18:44, David Brown wrote:
    On 20/01/2026 18:09, pozz wrote:
    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by
    adding an "aligned" attribute to the struct itself or any of its fields.  The
    best you can do on the struct itself is __attribute__((packed)).  But
    that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    However, as long as the application runs on Cortex-M0+, the aligned
    version shouldn't introduce issues, should it?



    Correctly aligned data is never a problem.  /Misaligned/ data is a
    problem.

    The Cortex-M0+ cannot access misaligned data directly.  But if the
    compiler knows that it is misaligned - via a "packed" struct, or an
    "aligned" attribute on the typedef - it should break apart the
    accesses into bytes or 16-bit half-words as necessary.  (Giving a
    uint64_t 4-byte alignment will not be a problem.)

    However, for the Cortex-M0+, a uint64_t aligned at 4 bytes is:
    - aligned for the core (two 4-byte-aligned accesses are required)

    Yes. As far as I know, the M0+ core does not need any alignment greater
    than 4 for any purpose. (But I might not know everything about the
    core!) There can be alignment requirements for other things, such as DMA.

    - misaligned for the ABI and the compiler

    Yes.


    We agree that forcing the gcc compiler to consider 4 bytes as the
    required alignment of uint64_t (using the aligned attribute) is always safe.

    No.

    It will almost always be safe, but you don't have any guarantees. The compiler knows that if "p" is of type "uint64_t *", then "(uintptr_t) p
    & 0x07" will always be zero. Is it likely that you would have anything
    in your code where that is relevant, and also that the compiler would
    generate code that relies on that assumption? No, it is very unlikely.

    But there is a general principle that you should not lie to your
    compiler - don't write code that executes UB, breaks ABIs, or is
    otherwise breaking the contract you have with the compiler unless you
    are using compiler features that let you keep everything honest.

    Part of that is that code you are writing now for the M0+ might be
    copied or adapted to a different target at a different time. Maybe on a different core, the same data will be read using some kind of SIMD or
    vector instruction that /does/ require 8-byte alignment. Don't mess with
    these things without telling your compiler. And don't mess with them
    without telling future maintainers and programmers using the code
    (including your future self).

    I would be extremely surprised to find code that fails to work on an M0+ because of a uint64_t pointer that is 4-byte aligned but not 8-byte
    aligned. But if /I/ want to use 64-bit integers with 4-byte alignments,
    I'd use the typedef'd aligned type for the object type and for any
    relevant pointers.


    However, what really changes in the binary output?

    In some cases, a uint64_t can move from an 8-byte-aligned address to a
    4-byte-aligned one (because we instructed the compiler to allow it). What about
    the code that accesses a uint64_t aligned to 4 bytes? Is it identical
    for 4- and 8-byte alignment requirements? I think so, because in
    both cases the compiler should emit two 4-byte load/store instructions.


  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Jan 21 15:58:03 2026

    On 21/01/2026 10:02, David Brown wrote:
    On 21/01/2026 09:11, pozz wrote:
    On 20/01/2026 18:44, David Brown wrote:
    On 20/01/2026 18:09, pozz wrote:
    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by
    adding an "aligned" attribute to the struct itself or any of its fields.  The
    best you can do on the struct itself is __attribute__((packed)).  But
    that can come with disadvantages, such as inefficient access.

    Yep making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    However, as long as the application runs on Cortex-M0+, the aligned
    version shouldn't introduce issues, should it?



    Correctly aligned data is never a problem.  /Misaligned/ data is a
    problem.

    The Cortex-M0+ cannot access misaligned data directly.  But if the
    compiler knows that it is misaligned - via a "packed" struct, or an
    "aligned" attribute on the typedef - it should break apart the
    accesses into bytes or 16-bit half-words as necessary.  (Giving a
    uint64_t 4-byte alignment will not be a problem.)

    However, for the Cortex-M0+, a uint64_t aligned at 4 bytes is:
    - aligned for the core (two 4-byte-aligned accesses are required)

    Yes.  As far as I know, the M0+ core does not need any alignment greater than 4 for any purpose.  (But I might not know everything about the core!)  There can be alignment requirements for other things, such as DMA.

    - misaligned for the ABI and the compiler

    Yes.


    We agree that forcing the gcc compiler to consider 4 bytes as the
    required alignment of uint64_t (using the aligned attribute) is always safe.

    No.

    It will almost always be safe, but you don't have any guarantees.  The compiler knows that if "p" is of type "uint64_t *", then "(uintptr_t) p
    & 0x07" will always be zero.  Is it likely that you would have anything
    in your code where that is relevant, and also that the compiler would generate code that relies on that assumption?  No, it is very unlikely.

    But there is a general principle that you should not lie to your
    compiler - don't write code that executes UB, breaks ABIs, or is
    otherwise breaking the contract you have with the compiler unless you
    are using compiler features that let you keep everything honest.

    Part of that is that code you are writing now for the M0+ might be
    copied or adapted to a different target at a different time.  Maybe on a different core, the same data will be read using some kind of SIMD or
    vector instruction that /does/ require 8-byte alignment.  Don't mess with
    these things without telling your compiler.  And don't mess with them without telling future maintainers and programmers using the code
    (including your future self).

    But that is exactly what I wanted to do: explicitly tell the compiler to
    align uint64_t at a 4-byte address (as I wrote, with the aligned attribute).
    I didn't intend to lie to my best friend, the compiler.

    What I wanted to know is whether there are other issues or drawbacks, such as
    a penalty in extra instructions. From the godbolt link that you shared in
    another post, it seems there's a penalty of a single instruction (strangely, the compiler seems to need to save the struct pointer in r3
    before loading the two halves of the word, but only if the uint64_t is
    aligned to 4 bytes).


    I would be extremely surprised to find code that fails to work on an M0+ because of a uint64_t pointer that is 4-byte aligned but not 8-byte aligned.  But if /I/ want to use 64-bit integers with 4-byte alignments, I'd use the typedef'd aligned type for the object type and for any
    relevant pointers.

    Yes, of course. Even so, I don't understand why the compiler can't
    place the uint64_t member of struct B at a 4-byte-aligned address.

    However, what really changes in the binary output?

    In some cases, the address of a uint64_t can change from an 8-byte to a
    4-byte-aligned address (because we instructed it to do so). What
    about the code that accesses a 4-byte-aligned uint64_t? Is it
    identical under the 4- and 8-byte alignment requirements? I think so,
    because in both cases the compiler has to emit two 4-byte load/store
    instructions.
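    To see what changes in the data layout, and how much space is at stake for big arrays, you can compare the sizes of two hypothetical record types: one with a plain uint64_t and one with a 4-byte-aligned typedef. The type and function names below are made up for illustration. The "plain" layout assumes a target where uint64_t is 8-byte aligned inside structs, as on ARM EABI. The typedef relies on the GCC/Clang rule that "aligned" on a typedef may lower alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: on a typedef, "aligned" may also lower alignment. */
typedef uint64_t uint64_a __attribute__((aligned(4)));

/* Hypothetical record using the ABI's natural alignment for uint64_t. */
struct rec_plain {
    uint32_t id;
    uint64_t value;   /* padded to offset 8 on ARM EABI */
};

/* Same record, but the 64-bit field only needs 4-byte alignment. */
struct rec_a4 {
    uint32_t id;
    uint64_a value;   /* offset 4, no padding */
};

static_assert(sizeof(struct rec_a4) == 12, "no padding with the typedef");

/* Bytes used by an array of n records of each kind. */
size_t plain_array_bytes(size_t n)    { return n * sizeof(struct rec_plain); }
size_t aligned4_array_bytes(size_t n) { return n * sizeof(struct rec_a4); }
```

    On ARM EABI, 1000 records would take 16000 bytes with the plain layout and 12000 bytes with the typedef.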



  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Jan 21 17:13:54 2026
    From Newsgroup: comp.arch.embedded

    On 21/01/2026 15:58, pozz wrote:
    On 21/01/2026 10:02, David Brown wrote:
    On 21/01/2026 09:11, pozz wrote:
    On 20/01/2026 18:44, David Brown wrote:
    On 20/01/2026 18:09, pozz wrote:
    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by
    adding an _aligned_ attribute to the struct itself or any of its fields.  The
    best you can do on the struct itself is __attribute__((packed)).  But
    that can come with disadvantages, and inefficient use.

    Yep, making a structure aligned is an excellent way to introduce
    subtle bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    However, as long as the application runs on Cortex-M0+, the aligned version shouldn't introduce issues, should it?



    Correctly aligned data is never a problem.  /Misaligned/ data is a
    problem.

    The Cortex-M0+ cannot access misaligned data directly.  But if the
    compiler knows that it is misaligned - by "packed" struct, or
    "aligned" attribute on the typedef - it should break apart the
    accesses into bytes or 16-bit half-words as necessary.  (Aligning a
    uint64_t to 4 byte alignment will not be a problem.)
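    The two ways of telling the compiler about a less-aligned 64-bit field can be compared in a short sketch. The struct names are hypothetical, and the layouts follow GCC's documented attribute semantics. "packed" removes all padding and drops the struct's alignment to 1, so the compiler must assume any field may be byte-misaligned. The aligned(4) typedef lowers only the 64-bit field's requirement to 4, which is all the M0+ core needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t uint64_a __attribute__((aligned(4)));

/* Packed: no padding at all, and the struct's alignment drops to 1. */
struct __attribute__((packed)) rec_packed {
    uint32_t x;
    uint64_t y;
};

/* aligned(4) typedef: the 64-bit field needs only 4-byte alignment. */
struct rec_a4 {
    uint32_t x;
    uint64_a y;
};

static_assert(sizeof(struct rec_packed) == 12, "packed: no padding");
static_assert(_Alignof(struct rec_packed) == 1, "packed: byte alignment");
static_assert(sizeof(struct rec_a4) == 12, "aligned(4): no padding either");
static_assert(_Alignof(struct rec_a4) == 4, "aligned(4): word alignment");
```

    Both layouts are 12 bytes. However, a rec_packed may sit at any address, so on the M0+ gcc should fall back to byte or halfword accesses for y. With rec_a4 it can still use word loads and stores.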

    However, for Cortex-M0+, a uint64_t aligned at 4 bytes is:
    - aligned for the core (two 4-byte-aligned accesses are required)

    Yes.  As far as I know, the M0+ core does not need any alignment
    greater than 4 for any purpose.  (But I might not know everything
    about the core!)  There can be alignment requirements for other
    things, such as DMA.

    - misaligned for the ABI and the compiler

    Yes.


    We agree that forcing gcc to treat 4 bytes as the required
    alignment of uint64_t (using the aligned attribute) is always safe.
    No.

    It will almost always be safe, but you don't have any guarantees.  The
    compiler knows that if "p" is of type "uint64_t *", then "(uintptr_t)
    p & 0x07" will always be zero.  Is it likely that you would have
    anything in your code where that is relevant, and also that the
    compiler would generate code that relies on that assumption?  No, it
    is very unlikely.
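    Here is a concrete case of the "(uintptr_t) p & 0x07 is zero" assumption. Take an array of 12-byte records with a 4-byte-aligned 64-bit field. Consecutive fields alternate between 8-byte-aligned and merely 4-byte-aligned addresses. Converting them to plain uint64_t * would therefore break the compiler's assumption for every other element. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t uint64_a __attribute__((aligned(4)));

struct rec {
    uint32_t x;
    uint64_a y;   /* offset 4, record size 12 */
};

/* True if the address is a multiple of 8, which is the alignment the
   ABI promises for any valid uint64_t pointer. */
bool is_aligned8(const void *p)
{
    return ((uintptr_t)p & 0x07u) == 0;
}

/* Honest accessor: the parameter type tells the compiler that only
   4-byte alignment may be assumed. */
uint64_t rec_get(const uint64_a *p)
{
    return *p;
}
```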

    But there is a general principle that you should not lie to your
    compiler - don't write code that executes UB, breaks ABIs, or is
    otherwise breaking the contract you have with the compiler unless you
    are using compiler features that let you keep everything honest.

    Part of that is that code you are writing now for the M0+ might be
    copied or adapted to a different target at a different time.  Maybe on
    a different core, the same data will be read using some kind of SIMD
    or vector instruction that /does/ require 8-byte alignment.  Don't
    mess with these things without telling your compiler.  And don't mess with
    them without telling future maintainers and programmers using the code
    (including your future self).

    But that is exactly what I wanted to do: explicitly tell the compiler to
    align uint64_t at a 4-byte address (as I wrote, with the aligned attribute).
    I didn't think I was lying to my best friend, the compiler.


    uint64_t on 32-bit EABI ARM has an alignment of 8 bytes. That's cut in
    stone, and you cannot change it (short of adding a new ABI to the
    toolchain). If you try to use uint64_t objects that are not 8-byte
    aligned, or try to use pointers that are not 8-byte aligned to access
    uint64_t types, you are lying to your compiler.

    If you make a new type that is like a uint64_t but with an "aligned(4)" attribute, you have a /new/ type. And that type will work just like you
    want - it is an 8 byte unsigned integer with a 4 byte alignment. As
    long as you use that consistently, you'll be fine.
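    Here is a minimal sketch of "use that consistently", assuming the GCC/Clang aligned attribute on a typedef. Declare the field with the new type, and do the same for every pointer that may point at such a field, rather than using uint64_t. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A uint64_t that only promises 4-byte alignment (GCC/Clang typedef). */
typedef uint64_t uint64_a __attribute__((aligned(4)));

struct counter {
    uint32_t id;
    uint64_a total;   /* stored at offset 4 */
};

/* Pointers that may refer to such fields use uint64_a *, not uint64_t *,
   so the compiler never assumes 8-byte alignment through them. */
void add_to(uint64_a *dst, uint64_t amount)
{
    *dst += amount;
}

/* Element access through the struct type is already honest. */
uint64_t sum_totals(const struct counter *c, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += c[i].total;
    return sum;
}
```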

    What I wanted to know is whether there are other issues or drawbacks, such as a penalty in extra instructions.

    The drawback from trying to use an object of a type with an improper
    alignment is that you have UB. What more reasons do you want for not
    doing it?

    From the godbolt link that you shared in
    another post, it seems there's a penalty of a single instruction (strangely, the compiler seems to need to copy the struct pointer into r3 before loading the two halves of the word, but only when the uint64_t is
    4-byte aligned).


    The compiler is not perfect here - there is definitely an extra
    instruction because it is reading the low word first. (clang reads the
    low word first for uint64_t as well, meaning it gives worse code for A
    and B as well.) In real code, rather than a brief test snippet, other
    factors could mean this does not happen - it's only because the pointer happens to be in r0 that you see it here.

    But there's no harm in filing a gcc bug on this, looking for an obvious improvement.
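    The godbolt snippet mentioned in this thread isn't reproduced here. What follows is a hypothetical reconstruction of that kind of test. You could paste it into Compiler Explorer with something like arm-none-eabi-gcc -O2 -mcpu=cortex-m0plus -mthumb and diff the code generated for each accessor. It makes no claim about the exact instructions any particular gcc version emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t uint64_a __attribute__((aligned(4)));

/* Hypothetical test structs, not necessarily those from the original link. */
struct A { uint32_t x; uint64_t y; };                              /* y at 8 on ARM EABI */
struct B { uint32_t x; uint64_t y __attribute__((aligned(4))); };  /* same layout as A */
struct C { uint32_t x; uint64_a y; };                              /* y at 4 */

static_assert(offsetof(struct C, y) == 4, "typedef lowered the alignment");

/* Each function just loads the 64-bit member; compare the generated code. */
uint64_t get_a(const struct A *p) { return p->y; }
uint64_t get_b(const struct B *p) { return p->y; }
uint64_t get_c(const struct C *p) { return p->y; }
```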


    I would be extremely surprised to find code that fails to work on an
    M0+ because of a uint64_t pointer that is 4-byte aligned but not
    8-byte aligned.  But if /I/ want to use 64-bit integers with 4-byte
    alignments, I'd use the typedef'd aligned type for the object type and
    for any relevant pointers.

    Yes, of course. Even so, I don't understand why the compiler can't
    place the uint64_t member of struct B at a 4-byte-aligned address.


    It can't align the uint64_t member because the EABI says uint64_t (or,
    rather, unsigned long long) is 8 bytes aligned. gcc didn't make those
    rules - ARM did.

    As I briefly mentioned before, there are a number of very poor choices
    in the EABI (and the 32-bit ARM ABI used for Linux). This is far from
    the worst.
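    One way to document the EABI rule in code rather than in a comment is a set of compile-time checks guarded by the compiler's ARM EABI macros. This sketch assumes GCC or Clang targeting 32-bit ARM, where __arm__ and __ARM_EABI__ are predefined. On other targets the ARM-specific checks simply drop out, and only the typedef checks remain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t uint64_a __attribute__((aligned(4)));

struct plain { uint32_t x; uint64_t y; };
struct a4    { uint32_t x; uint64_a y; };

#if defined(__arm__) && defined(__ARM_EABI__)
/* EABI/AAPCS: unsigned long long (and so uint64_t) is 8-byte aligned. */
static_assert(_Alignof(uint64_t) == 8, "EABI 64-bit alignment");
static_assert(offsetof(struct plain, y) == 8, "4 bytes of padding after x");
static_assert(sizeof(struct plain) == 16, "plain layout is 16 bytes");
#endif

/* Should hold with GCC/Clang on any target: the typedef lowers it to 4. */
static_assert(offsetof(struct a4, y) == 4, "typedef'd field at offset 4");
static_assert(sizeof(struct a4) == 12, "no padding");
```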

    However, what really changes in the binary output?

    In some cases, the address of a uint64_t can change from an 8-byte to a
    4-byte-aligned address (because we instructed it to do so). What
    about the code that accesses a 4-byte-aligned uint64_t? Is it
    identical under the 4- and 8-byte alignment requirements? I think so,
    because in both cases the compiler has to emit two 4-byte load/store
    instructions.




  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Jan 21 17:57:39 2026
    From Newsgroup: comp.arch.embedded

    On 21/01/2026 17:13, David Brown wrote:
    On 21/01/2026 15:58, pozz wrote:
    On 21/01/2026 10:02, David Brown wrote:
    On 21/01/2026 09:11, pozz wrote:
    On 20/01/2026 18:44, David Brown wrote:
    On 20/01/2026 18:09, pozz wrote:
    On 20/01/2026 17:41, Grant Edwards wrote:
    On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

    You can't reduce the alignment of a struct or its elements by adding an
    _aligned_ attribute to the struct itself or any of its fields.  The
    best you can do on the struct itself is __attribute__((packed)).  But
    that can come with disadvantages, and inefficient use.

    Yep, making a structure aligned is an excellent way to introduce subtle
    bugs that happen when somebody, somewhere passes a pointer to one of
    those structure fields to some library function.

    However, as long as the application runs on Cortex-M0+, the
    aligned version shouldn't introduce issues, should it?



    Correctly aligned data is never a problem.  /Misaligned/ data is a problem.

    The Cortex-M0+ cannot access misaligned data directly.  But if the
    compiler knows that it is misaligned - by "packed" struct, or
    "aligned" attribute on the typedef - it should break apart the
    accesses into bytes or 16-bit half-words as necessary.  (Aligning a
    uint64_t to 4 byte alignment will not be a problem.)

    However, for Cortex-M0+, a uint64_t aligned at 4 bytes is:
    - aligned for the core (two 4-byte-aligned accesses are required)

    Yes.  As far as I know, the M0+ core does not need any alignment
    greater than 4 for any purpose.  (But I might not know everything
    about the core!)  There can be alignment requirements for other
    things, such as DMA.

    - misaligned for the ABI and the compiler

    Yes.


    We agree that forcing gcc to treat 4 bytes as the required
    alignment of uint64_t (using the aligned attribute) is always
    safe.

    No.

    It will almost always be safe, but you don't have any guarantees.
    The compiler knows that if "p" is of type "uint64_t *", then
    "(uintptr_t) p & 0x07" will always be zero.  Is it likely that you
    would have anything in your code where that is relevant, and also
    that the compiler would generate code that relies on that
    assumption?  No, it is very unlikely.

    But there is a general principle that you should not lie to your
    compiler - don't write code that executes UB, breaks ABIs, or is
    otherwise breaking the contract you have with the compiler unless you
    are using compiler features that let you keep everything honest.

    Part of that is that code you are writing now for the M0+ might be
    copied or adapted to a different target at a different time.  Maybe
    on a different core, the same data will be read using some kind of
    SIMD or vector instruction that /does/ require 8-byte alignment.
    Don't mess with these things without telling your compiler.  And don't
    mess with them without telling future maintainers and programmers
    using the code (including your future self).

    But that is exactly what I wanted to do: explicitly tell the compiler to
    align uint64_t at a 4-byte address (as I wrote, with the aligned
    attribute). I didn't think I was lying to my best friend, the compiler.


    uint64_t on 32-bit EABI ARM has an alignment of 8 bytes.  That's cut in stone, and you cannot change it (short of adding a new ABI to the toolchain).  If you try to use uint64_t objects that are not 8-byte aligned, or try to use pointers that are not 8-byte aligned to access uint64_t types, you are lying to your compiler.

    If you make a new type that is like a uint64_t but with an "aligned(4)" attribute, you have a /new/ type.  And that type will work just like you want - it is an 8 byte unsigned integer with a 4 byte alignment.  As
    long as you use that consistently, you'll be fine.

    What I wanted to know is whether there are other issues or drawbacks, such
    as a penalty in extra instructions.

    The drawback from trying to use an object of a type with an improper alignment is that you have UB.  What more reasons do you want for not
    doing it?

    I probably haven't explained myself well. I don't want to use an *improper* alignment (different from the one gcc is actually using).
    I want to know what happens when I *instruct* the compiler to use a
    4-byte alignment for uint64_t, in the context of the Cortex-M0+ core only.

    In other words, is it completely safe to use, as you suggested,

    typedef uint64_t __attribute__((aligned(4))) uint64_a;

    ???

    From what you wrote, I think yes. Maybe just a very small optimization penalty.


    From the godbolt link that you shared in another post, it seems there's
    a penalty of a single instruction (strangely, the compiler seems to need
    to copy the struct pointer into r3 before loading the two halves
    of the word, but only when the uint64_t is 4-byte aligned).


    The compiler is not perfect here - there is definitely an extra
    instruction because it is reading the low word first.  (clang reads the
    low word first for uint64_t as well, meaning it gives worse code for A
    and B as well.)  In real code, rather than a brief test snippet, other factors could mean this does not happen - it's only because the pointer happens to be in r0 that you see it here.

    But there's no harm in filing a gcc bug on this, looking for an obvious improvement.


    I would be extremely surprised to find code that fails to work on an
    M0+ because of a uint64_t pointer that is 4-byte aligned but not
    8-byte aligned.  But if /I/ want to use 64-bit integers with 4-byte
    alignments, I'd use the typedef'd aligned type for the object type
    and for any relevant pointers.

    Yes, of course. Even so, I don't understand why the compiler can't
    place the uint64_t member of struct B at a 4-byte-aligned address.


    It can't align the uint64_t member because the EABI says uint64_t (or, rather, unsigned long long) is 8 bytes aligned.  gcc didn't make those rules - ARM did.

    But struct B is defined with the correct alignment attribute on the uint64_t
    member. I also tried:

    struct B {
        uint32_t x;
        uint64_t y __attribute__((aligned(4)));
    };

    The struct size is still 16, so y is placed at offset 8, not 4. It
    seems to me gcc doesn't honour the 4-byte aligned attribute
    when it is specified inside the struct definition.

    I don't see much difference from:

    typedef __attribute__((aligned(4))) uint64_t uint64_a;

    struct C {
        uint32_t x;
        uint64_a y;
    };


    As I briefly mentioned before, there are a number of very poor choices
    in the EABI (and the 32-bit ARM ABI used for Linux).  This is far from
    the worst.

    However, what really changes in the binary output?

    In some cases, the address of a uint64_t can change from an 8-byte to a
    4-byte-aligned address (because we instructed it to do so). What
    about the code that accesses a 4-byte-aligned uint64_t? Is it
    identical under the 4- and 8-byte alignment requirements? I think so,
    because in both cases the compiler has to emit two 4-byte load/store
    instructions.





  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Thu Jan 22 10:03:21 2026
    From Newsgroup: comp.arch.embedded

    On 21/01/2026 17:57, pozz wrote:
    On 21/01/2026 17:13, David Brown wrote:
    On 21/01/2026 15:58, pozz wrote:
    On 21/01/2026 10:02, David Brown wrote:
    On 21/01/2026 09:11, pozz wrote:

    <snip for brevity>

    I probably haven't explained myself well. I don't want to use an *improper* alignment (different from the one gcc is actually using).
    I want to know what happens when I *instruct* the compiler to use a
    4-byte alignment for uint64_t, in the context of the Cortex-M0+ core only.


    I think we may have been talking slightly past each other, so that
    re-wording was helpful.

    In other words, is it completely safe to use, as you suggested,

       typedef uint64_t __attribute__((aligned(4))) uint64_a;

    ???

    Barring compiler bugs, yes, that is completely safe. When the compiler
    lets you make such a type, and use it, it is the compiler's
    responsibility to get the details right. You should never see issues
    from the compiler's knowledge and assumptions of alignments, and it
    should generate instructions that work on the target (for example, if
    the target hardware required 8-byte alignment for 64-bit loads and
    stores, then the compiler would generate two 32-bit accesses instead).
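    Sometimes code has only a byte pointer to a 64-bit value whose alignment is unknown, such as a receive buffer. A common portable idiom is to memcpy into a properly aligned local, which hands the alignment problem to the compiler in the same spirit. This is general C practice rather than something proposed in the thread. With optimisation enabled, gcc may replace a small fixed-size memcpy with whatever accesses are legal on the target, for example byte accesses on an M0+. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a uint64_t (host byte order) from a buffer of unknown alignment.
   memcpy lets the compiler pick accesses that are legal for the target. */
uint64_t load_u64(const void *src)
{
    uint64_t v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Write a uint64_t (host byte order) to a buffer of unknown alignment. */
void store_u64(void *dst, uint64_t v)
{
    memcpy(dst, &v, sizeof v);
}
```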

    And for the Cortex M series, 4-byte alignment is the maximum needed for working code (though there might be efficiency differences on some of
    the biggest M cores that have 64-bit buses internally, or when data
    caches are used).


    From what you wrote, I think yes. Maybe just a very small optimization penalty.


    Yes. And I think that is a "missed optimisation opportunity" bug. I
    suspect (or speculate), but have not looked at the compiler code to be
    sure, that the code generator generally accesses the low half of 64-bit
    data first. And then it may have specific optimisations ("peephole" optimisations) for re-ordering the accesses for "long long" types in
    certain circumstances, saving a register and an instruction. However,
    that would apply only to the specific type - and while "uint64_a" works
    a lot like "unsigned long long", it is not that exact type, and won't
    trigger the same optimisation.


    From the godbolt link that you shared in another post, it seems
    there's a penalty of a single instruction (strangely, the compiler
    seems to need to copy the struct pointer into r3 before loading the
    two halves of the word, but only when the uint64_t is 4-byte aligned).


    The compiler is not perfect here - there is definitely an extra
    instruction because it is reading the low word first.  (clang reads
    the low word first for uint64_t as well, meaning it gives worse code
    for A and B as well.)  In real code, rather than a brief test snippet,
    other factors could mean this does not happen - it's only because the
    pointer happens to be in r0 that you see it here.

    But there's no harm in filing a gcc bug on this, looking for an
    obvious improvement.


    I would be extremely surprised to find code that fails to work on an
    M0+ because of a uint64_t pointer that is 4-byte aligned but not
    8-byte aligned.  But if /I/ want to use 64-bit integers with 4-byte
    alignments, I'd use the typedef'd aligned type for the object type
    and for any relevant pointers.

    Yes, of course. Even so, I don't understand why the compiler can't
    place the uint64_t member of struct B at a 4-byte-aligned address.


    It can't align the uint64_t member because the EABI says uint64_t (or,
    rather, unsigned long long) is 8 bytes aligned.  gcc didn't make those
    rules - ARM did.

    But struct B is defined with the correct alignment attribute on the uint64_t member. I also tried:

    struct B {
        uint32_t x;
        uint64_t y __attribute__((aligned(4)));
    };

    The struct size is still 16, so y is placed at offset 8, not 4. It
    seems to me gcc doesn't honour the 4-byte aligned attribute
    when it is specified inside the struct definition.

    That is my conclusion too. (I tried the "aligned" attribute in every
    place I could.) The only place it worked was on a typedef for the new "uint64_a" type.


    I don't see much difference from:

    typedef __attribute__((aligned(4))) uint64_t uint64_a;

    struct C {
        uint32_t x;
        uint64_a y;
    };


    It certainly seems inconsistent to me that it works on the typedef, and
    not directly in the struct definition. After all, a typedef does not
    actually define a new type (it's a silly name) - it merely defines an
    alias or shortcut name for a type. So it would seem logical that using "uint64_a" or "__attribute__((aligned(4))) uint64_t" in the struct
    definition would mean exactly the same thing. But apparently not. gcc attributes are not part of the normal C grammar, so there's no standard
    to fall back on here.
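    For what it's worth, the GCC manual documents this asymmetry. On a struct or struct member, the aligned attribute can only increase alignment unless packed is also specified. On a typedef, it can both increase and decrease alignment. So, assuming GCC (or a compiler that follows GCC's semantics here), combining packed with aligned(4) on the member itself should give the same layout as the typedef. The struct names mirror the ones above; struct D is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t uint64_a __attribute__((aligned(4)));

/* Member attribute alone: can only raise alignment, so it has no effect. */
struct B { uint32_t x; uint64_t y __attribute__((aligned(4))); };

/* Typedef: allowed to lower the alignment to 4. */
struct C { uint32_t x; uint64_a y; };

/* packed drops the member to byte alignment, then aligned(4) raises it to 4. */
struct D { uint32_t x; uint64_t y __attribute__((packed, aligned(4))); };

static_assert(offsetof(struct C, y) == 4, "typedef: offset 4");
static_assert(offsetof(struct D, y) == 4, "packed + aligned(4): offset 4");
static_assert(sizeof(struct D) == 12 && _Alignof(struct D) == 4, "same layout as C");
```

    The typedef remains the tidier option, because the reduced alignment then travels with the type into pointers and function parameters.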


    As I briefly mentioned before, there are a number of very poor choices
    in the EABI (and the 32-bit ARM ABI used for Linux).  This is far from
    the worst.

    However, what really changes in the binary output?

    In some cases, the address of a uint64_t can change from an 8-byte to a
    4-byte-aligned address (because we instructed it to do so). What
    about the code that accesses a 4-byte-aligned uint64_t? Is it
    identical under the 4- and 8-byte alignment requirements? I think so,
    because in both cases the compiler has to emit two 4-byte load/store
    instructions.





