Forum: War Ensemble BBS

arm-gcc, Cortex-M0+, uint64_t and alignment

From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Jan 20 13:26:15 2026

From Newsgroup: comp.arch.embedded

I just discovered that my arm-gcc assigns an alignment of 8 to a struct
with uint64_t member.

First of all: I can't explain why. Cortex-M0+ shouldn't have any special
load/store instructions for 64-bit data. I think the uint64_t variable
is *always* accessed with two separate instructions.

Second thing. Is it safe to force the alignment of such structs to 4
with __attribute__((aligned(4)))?

I have big arrays of structs that contain uint64_t members, so I'm
thinking how to save some space.
--- Synchronet 3.21b-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 17:07:45 2026

From Newsgroup: comp.arch.embedded

On 20/01/2026 13:26, pozz wrote:

I just discovered that my arm-gcc assigns an alignment of 8 to a struct
with uint64_t member.

First of all: I can't explain why. Cortex-M0+ shouldn't have any special
load/store instructions for 64-bit data. I think the uint64_t variable
is *always* accessed with two separate instructions.

There are other Cortex-M devices that /can/ access 64 bit data with a
single instruction (though not always as an atomic function).

Compilers use family ABI's, not ABI's specifically tuned for exact
devices. The EABI for 32-bit ARM says long long's are 8 byte aligned,
so that's what is used for all targets that use the EABI. (There's a
lot to dislike about the EABI - this is not the worst thing.)

Second thing. Is it safe to force the alignment of such structs to 4
with __attribute__((aligned(4)))?

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

I have big arrays of structs that contain uint64_t members, so I'm
thinking how to save some space.

The best way is to organise the fields so that they are naturally
aligned, and don't have padding for alignment. I like "-Wpadded" to
tell me if there is unexpected padding.
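
As a minimal sketch of the field-ordering advice (the record and its field names are hypothetical, not from the original post), the same three fields can be laid out with or without padding. On targets where uint64_t is 8-byte aligned, such as 32-bit ARM EABI, the first layout is expected to take 24 bytes and the second 16:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with fields in "as they came to mind" order.
 * Where uint64_t is 8-byte aligned, the compiler inserts 4 padding
 * bytes after 'id' and 4 more after 'flags'. */
struct rec_interleaved {
    uint32_t id;
    uint64_t timestamp;
    uint32_t flags;
};

/* Same fields, widest first: every member lands on its natural
 * alignment, so no padding is needed. */
struct rec_ordered {
    uint64_t timestamp;
    uint32_t id;
    uint32_t flags;
};

/* Compile-time guard: the build fails if padding ever creeps in. */
static_assert(sizeof(struct rec_ordered) ==
                  sizeof(uint64_t) + 2 * sizeof(uint32_t),
              "rec_ordered contains padding");
```

Building with -Wpadded should flag rec_interleaved and stay quiet for rec_ordered.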

What you /can/ do, however, is define a type that is 64 bits, but 4 byte alignment:

typedef uint64_t __attribute__((aligned(4))) uint64_a;

Now you can use "uint64_a" instead of "uint64_t", and it will have 4
byte alignment.
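
A small sketch of what that typedef buys in a struct, assuming GCC or Clang (the struct names are made up for illustration). The attribute takes three closing parentheses: one for aligned(...) and two for __attribute__((...)).

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit integer type whose required alignment is lowered to 4 bytes
 * (GCC/Clang extension). */
typedef uint64_t __attribute__((aligned(4))) uint64_a;

/* Hypothetical array element using the ABI's default alignment. */
struct sample_default {
    uint32_t channel;
    uint64_t timestamp;
};

/* Same element using the reduced-alignment type: no gap after 'channel'
 * and no tail padding. */
struct sample_compact {
    uint32_t channel;
    uint64_a timestamp;
};

static_assert(_Alignof(uint64_a) == 4, "typedef did not lower alignment");
static_assert(sizeof(struct sample_compact) == 12, "unexpected padding");
```

On 32-bit ARM EABI, sample_default should be 16 bytes, so a 1000-element array would shrink from 16000 to 12000 bytes.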


From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Tue Jan 20 16:41:26 2026

From Newsgroup: comp.arch.embedded

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function. Somebody I used to
work with was very fond of making all of his structures aligned (for
no apparent reason). Then he would test his code on an X86 desktop
machine. It worked fine because the X86 supports unaligned
accesses. Then he would move to an ARM target, and it would
fail. Inevitably the cry "The compiler's broken!" would be heard, and
I would have to explain to him for the Nth time about misaligned
accesses on different ARM targets. Some of our targets generate a bus
fault, some just silently read/write only part of the data.

That same guy once insisted that with the 32-bit GCC compiler we were
using "unsigned long variables work, but unsigned variables don't". So
he was busily changing all of his "unsigned" variables to "unsigned
long". I printed out the assembly generated for both cases showing
that it was identical. He then insisted that the linker must be doing
something to break unsigned integers.

And then there was the time he decided that cross compiling on a
single-core Linux host worked but compiling on a dual-core
didn't. [Both cases using a single-threaded "make".]

And the time he decided that he needed to upgrade a bunch of the Ubuntu
X11 libraries on the X86 host machine to fix a problem in the ARM
target.

--
Grant


From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Jan 20 17:55:31 2026

From Newsgroup: comp.arch.embedded

Il 20/01/2026 17:07, David Brown ha scritto:

On 20/01/2026 13:26, pozz wrote:

I just discovered that my arm-gcc assigns an alignment of 8 to a
struct with uint64_t member.

First of all: I can't explain why. Cortex-M0+ shouldn't have any
special load/store instructions for 64-bit data. I think the uint64_t
variable is *always* accessed with two separate instructions.

There are other Cortex-M devices that /can/ access 64 bit data with a
single instruction (though not always as an atomic function).

Compilers use family ABI's, not ABI's specifically tuned for exact devices. The EABI for 32-bit ARM says long long's are 8 byte aligned,
so that's what is used for all targets that use the EABI. (There's a
lot to dislike about the EABI - this is not the worst thing.)

So the ABI used by arm-gcc is the EABI, which covers a whole list of
Cortex-M devices, a few of which require 8-byte alignment for 64-bit
integers.

Second thing. Is it safe to force the alignment of such structs to 4
with __attribute__((aligned(4)))?

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

But this is the opposite of what you write below!

I have big arrays of structs that contain uint64_t members, so I'm
thinking how to save some space.

The best way is to organise the fields so that they are naturally
aligned, and don't have padding for alignment. I like "-Wpadded" to
tell me if there is unexpected padding.

What you /can/ do, however, is define a type that is 64 bits, but 4 byte alignment:

typedef uint64_t __attribute__((aligned(4))) uint64_a;

Now you can use "uint64_a" instead of "uint64_t", and it will have 4
byte alignment.

Before you wrote it's impossible to reduce the alignment from 8 to 4
with __attribute__((aligned(4))), but now you write it is possible.


From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Jan 20 18:09:42 2026

From Newsgroup: comp.arch.embedded

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

However, as long as the application runs on Cortex-M0+, the aligned
version shouldn't introduce issues, should it?


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 18:41:58 2026

From Newsgroup: comp.arch.embedded

On 20/01/2026 17:41, Grant Edwards wrote:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle

To be clear - you mean making the structure /packed/, not /aligned/, or
in some other way allowing objects to be placed at a smaller alignment
than the ABI says.

bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function. Somebody I used to
work with was very fond of making all of his structures aligned (for
no apparent reason). Then he would test his code on an X86 desktop
machine. It worked fine because the X86 supports unaligned
accesses. Then he would move to an ARM target, and it would
fail. Inevitably the cry "The compiler's broken!" would be heard, and
I would have to explain to him for the Nth time about misaligned
accesses on different ARM targets. Some of our targets generate a bus
fault, some just silently read/write only part of the data.

Cortex M3 and bigger all handle misaligned accesses without problem
(albeit possibly at a performance penalty). Cortex M0, M0+ and M1 do
not support misaligned accesses. On an M4, the compiler should generate normal 32-bit loads and stores for a "packed" struct with 32-bit fields,
but it should generate byte-by-byte accesses on a Cortex M0.

It's fine to have packed structs, or types with smaller than normal
alignment, as long as the compiler knows that's the case. So you don't
take pointers to fields in a "packed" struct, and if you use something
like the "uint64_a" type I suggested, it should be accessed by pointers
to its real type, not pointers to "uint64_t". (In practice, I would
expect that pointers to uint64_t would work, because the accesses will
be 32-bit anyway, but you should never lie to your compiler!)
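
A sketch of what "pointers to its real type" can look like in practice. The struct and function names here are hypothetical; only the uint64_a typedef comes from the earlier post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t __attribute__((aligned(4))) uint64_a;

/* Hypothetical array element with a 4-byte aligned 64-bit member. */
struct sample {
    uint32_t channel;
    uint64_a timestamp;
};

/* The parameter is a pointer to the reduced-alignment type, so the
 * compiler assumes only 4-byte alignment when it generates the load. */
uint64_t read_stamp(const uint64_a *p)
{
    return *p;
}

/* Returns the largest timestamp in an array of samples. */
uint64_t latest_stamp(const struct sample *s, size_t n)
{
    uint64_t latest = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = read_stamp(&s[i].timestamp);
        if (t > latest) {
            latest = t;
        }
    }
    return latest;
}
```

Declaring the parameter as "const uint64_t *" instead would probably still work on an M0+, but it would promise the compiler 8-byte alignment that odd-indexed array elements do not have.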

That same guy once insisted that with the 32-bit GCC compiler we were
using "unsigned long variables work, but unsigned variables don't". So
he was busily changing all of his "unsigned" variables to "unsigned
long". I printed out the assembly generated for both cases showing
that it was identical. He then insisted that the linker must be doing something to break unsigned integers.

That is, shall we say, a strange idea from that guy. Perhaps he was
confused by uint32_t being "unsigned long" on EABI 32-bit ARM, rather
than "unsigned int" (as it is on 32-bit ARM Linux, and on Windows) ?

I have often seen people think that "unsigned int" and "unsigned long"
are the same type on 32-bit ARM, just because they are both 32-bit, and
get confused when there are compiler complaints about incompatible
pointers when they are mixed.
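
The point that the two are distinct types even when they have the same size can be checked with a C11 _Generic selection. This is only an illustration; the TYPE_ID macro and the enum are made-up names.

```c
#include <assert.h>

/* Tags for the result of the type lookup below. */
enum type_id { TYPE_OTHER = 0, TYPE_UINT = 1, TYPE_ULONG = 2 };

/* _Generic selects on the static type of its argument. "unsigned int"
 * and "unsigned long" are separate association entries, which is only
 * legal because they are distinct (incompatible) types. */
#define TYPE_ID(x) _Generic((x),        \
    unsigned int:  TYPE_UINT,           \
    unsigned long: TYPE_ULONG,          \
    default:       TYPE_OTHER)

/* Holds on every target, including those where both are 32 bits wide. */
static_assert(TYPE_ID(0u) != TYPE_ID(0ul),
              "unsigned int and unsigned long are distinct types");
```

For the same reason, passing an "unsigned int *" where an "unsigned long *" is expected draws an incompatible-pointer diagnostic, whatever the widths are.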

And then there was the time he decided that cross compiling on a
single-core Linux host worked but compiling on a dual-core
didn't. [Both cases using a single-threaded "make".]

And the time he decided that he needed to upgrade a bunch of the Ubuntu
X11 libraries on the X86 host machine to fix a problem in the ARM
target.

Someone is a little confused :-)


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 18:44:06 2026

From Newsgroup: comp.arch.embedded

On 20/01/2026 18:09, pozz wrote:

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

However, as long as the application runs on Cortex-M0+, the aligned
version shouldn't introduce issues, should it?

Correctly aligned data is never a problem. /Misaligned/ data is a problem.

The Cortex-M0+ cannot access misaligned data directly. But if the
compiler knows that it is misaligned - by "packed" struct, or "aligned" attribute on the typedef - it should break apart the accesses into bytes
or 16-bit half-words as necessary. (Aligning a uint64_t to 4 byte
alignment will not be a problem.)


From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Tue Jan 20 17:48:42 2026

From Newsgroup: comp.arch.embedded

On 2026-01-20, pozz <pozzugno@gmail.com> wrote:

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

Aargh, my bad. I meant that making a structure _packed_ is an
excellent way to introduce subtle bugs that happen when somebody,
somewhere passes a pointer to one of those structure fields to some
library function.

However, as long as the application runs on Cortex-M0+, the aligned
version shouldn't introduce issues, should it?

A non-packed structure should always be OK.

A packed structure will work fine as long as it's being accessed by
code that "knows" it's packed. You can pass a pointer to packed
struct to a function as long as it's declared in that function as a
pointer to a packed struct: the compiler will generate extra code to
deal with accesses to values that are misaligned due to the
packing. However, passing a pointer to a field of a packed structure
(e.g. to a uint64_t) to a function where it was declared as a normal
"uint64_t *p" can cause failures on ARM targets. It will work OK on
X86. I think it used to work OK on m68k also. IIRC SPARC failed in
similar ways to ARM.
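
A rough sketch of the safe and unsafe patterns described above, assuming GCC or Clang. The message layout and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire-format message: 'value' sits at offset 1. */
struct __attribute__((packed)) wire_msg {
    uint8_t  tag;
    uint64_t value;
};

static_assert(sizeof(struct wire_msg) == 9, "packed layout expected");

/* Library-style function that expects a normally aligned pointer. */
uint64_t double_it(const uint64_t *p)
{
    return *p * 2u;
}

/* OK: the access goes through a pointer to the packed struct, so the
 * compiler knows the field may be misaligned and splits the load. */
uint64_t msg_value(const struct wire_msg *m)
{
    return m->value;
}

/* OK: copy the field into an aligned local, then pass that address.
 * Calling double_it(&m->value) directly is the bug described above. */
uint64_t msg_value_doubled(const struct wire_msg *m)
{
    uint64_t tmp = m->value;
    return double_it(&tmp);
}
```

Recent GCC and Clang versions can warn about the direct call via -Waddress-of-packed-member.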

--
Grant


From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Tue Jan 20 18:10:19 2026

From Newsgroup: comp.arch.embedded

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

Cortex M3 and bigger all handle misaligned accesses without problem
(albeit possibly at a performance penalty).

FWIW, the M3 can be configured to generate a fault on unaligned
accesses, so whether it works or not depends on your low-level init
code. I believe that unaligned-fault-enable feature is disabled by
default at reset. Also, the M3 only supports non-word-aligned
accesses for normal single load/store instructions. LDM/STM and
LDRD/STRD will fault on non-word-aligned access.

--
Grant


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 22:24:35 2026

From Newsgroup: comp.arch.embedded

On 20/01/2026 17:55, pozz wrote:

Il 20/01/2026 17:07, David Brown ha scritto:

On 20/01/2026 13:26, pozz wrote:

I just discovered that my arm-gcc assigns an alignment of 8 to a
struct with uint64_t member.

First of all: I can't explain why. Cortex-M0+ shouldn't have any
special load/store instructions for 64-bit data. I think the
uint64_t variable is *always* accessed with two separate instructions.

There are other Cortex-M devices that /can/ access 64 bit data with a
single instruction (though not always as an atomic function).

Compilers use family ABI's, not ABI's specifically tuned for exact
devices. The EABI for 32-bit ARM says long long's are 8 byte aligned,
so that's what is used for all targets that use the EABI. (There's a
lot to dislike about the EABI - this is not the worst thing.)

So the ABI used by arm-gcc is the EABI, which covers a whole list of
Cortex-M devices, a few of which require 8-byte alignment for 64-bit
integers.

I don't think any of the 32-bit Cortex-M cores actually need 8 byte
alignment in the hardware - it could have been for compatibility with
64-bit devices.

Second thing. Is it safe to force the alignment of such structs to 4
with __attribute__((aligned(4)))?

You can't reduce the alignment of a struct or its elements by adding
an __aligned__ attribute to the struct itself or any of its fields.
The best you can do on the struct itself is __attribute__((packed)).
But that can come with disadvantages, and inefficient use.

But this is the opposite of what you write below!

No, but I might have been unclear.

Adding the "aligned" attribute to the /struct/, or to the field members /directly/, does not help you here. Adding it to a new typedef does.
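
The three placements can be compared side by side. This is a sketch assuming GCC (Clang is expected to behave the same way); the struct names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Baseline: ABI default alignment for the 64-bit member. */
struct plain {
    uint32_t a;
    uint64_t b;
};

/* Attribute on the struct: it can raise the struct's alignment but
 * cannot lower it below what the members require. */
struct __attribute__((aligned(4))) on_struct {
    uint32_t a;
    uint64_t b;
};

/* Attribute directly on the member: same limitation without "packed". */
struct on_member {
    uint32_t a;
    uint64_t b __attribute__((aligned(4)));
};

/* Attribute on a typedef: here the alignment really is lowered. */
typedef uint64_t __attribute__((aligned(4))) uint64_a;

struct via_typedef {
    uint32_t a;
    uint64_a b;
};

static_assert(sizeof(struct via_typedef) == 12,
              "typedef should remove the padding");
```

On a target where uint64_t is 8-byte aligned, the first three structs should all come out at 16 bytes and only via_typedef at 12.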

I have big arrays of structs that contain uint64_t members, so I'm
thinking how to save some space.

The best way is to organise the fields so that they are naturally
aligned, and don't have padding for alignment. I like "-Wpadded" to
tell me if there is unexpected padding.

What you /can/ do, however, is define a type that is 64 bits, but 4
byte alignment:

typedef uint64_t __attribute__((aligned(4))) uint64_a;

Now you can use "uint64_a" instead of "uint64_t", and it will have 4
byte alignment.

Before you wrote it's impossible to reduce the alignment from 8 to 4
with __attribute__((aligned(4))), but now you write it is possible.

Putting it in a typedef lets you change the alignment.

See <https://godbolt.org/z/3fac7n7Yo>, and look at the code generated
for the M0+ and the M4 to see how "packed" and "aligned" affects things.


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Jan 20 22:32:01 2026

From Newsgroup: comp.arch.embedded

On 20/01/2026 19:10, Grant Edwards wrote:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

Cortex M3 and bigger all handle misaligned accesses without problem
(albeit possibly at a performance penalty).

FWIW, the M3 can be configured to generate a fault on unaligned
accesses, so whether it works or not depends on your low-level init
code. I believe that unaligned-fault-enable feature is disabled by
default at reset.

I did not know that.

Also, the M3 only supports non-word-aligned
accesses for normal single load/store instructions. LDM/STM and
LDRD/STRD will fault on non-word-aligned access.

Yes. Of course, the LDM/STM are primarily used for pushing and popping registers on the stack, so you are always going to be aligned there.

In the godbolt.org link I posted in a reply to Pozz, we can see that
when the compiler knows the uint64_t is aligned at least to 4 bytes, it
uses LDRD, but when it does not know that it is 4 bytes aligned, it uses
two LDR instructions.

(As an aside, I find it annoying that STRD can be interrupted in the
middle - it means you don't have an atomic 64-bit store. LDRD can also
be interrupted in the middle, but as it is restarted, it gives you a
64-bit atomic read.)


From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Wed Jan 21 03:38:29 2026

From Newsgroup: comp.arch.embedded

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

(As an aside, I find it annoying that STRD can be interrupted in the
middle - it means you don't have an atomic 64-bit store. LDRD can also
be interrupted in the middle, but as it is restarted, it gives you a
64-bit atomic read.)

Yes, I just noticed that in the manual the other day, and it seemed
like an odd decision.

--
Grant


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Jan 21 08:54:34 2026

From Newsgroup: comp.arch.embedded

On 21/01/2026 04:38, Grant Edwards wrote:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

(As an aside, I find it annoying that STRD can be interrupted in the
middle - it means you don't have an atomic 64-bit store. LDRD can also
be interrupted in the middle, but as it is restarted, it gives you a
64-bit atomic read.)

Yes, I just noticed that in the manual the other day, and it seemed
like an odd decision.

It's not odd from the implementation viewpoint, but disappointing from
the user viewpoint. The double loads and stores are implemented as a
sort of combination of two instructions, or at least two actions.
Disabling interrupts in the middle of the instructions would mean
additional hardware logic. (I think all longer-running instructions,
like divisions, are interruptible.) When the interrupt returns, the instructions are simply restarted.

That gives an atomic 64-bit load, so it lets you safely read 64-bit data
that is changed by an interrupt or higher-priority thread - unlike using
two separate 32-bit load instructions. (Using a volatile read appears
to force the use of LDRD on gcc for M3 and above, while non-volatile
reads might be split and re-arranged depending on the surrounding code.)

An interrupted double store is, obviously, a very different matter -
your interrupt routines or pre-empting threads see half-written data.

My guess as to the decision process is that making these instructions non-interruptible would have taken more hardware, and weakened
guarantees on interrupt latency. But if they had asked /me/, I'd have
chosen to make STRD non-interruptible :-)
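
Where no atomic 64-bit access is available (two separate 32-bit loads on an M0+, or an interruptible STRD on the writer side), one common workaround is a re-read loop. This technique is not from the thread; it is a sketch under the assumptions that the value is a counter that only counts upward, that only an interrupt handler writes it, and that the handler always runs to completion before the reader resumes. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit tick counter kept as two 32-bit halves. */
struct tick64 {
    volatile uint32_t lo;
    volatile uint32_t hi;
};

/* Writer side, intended to be called from the interrupt handler. */
void tick64_increment(struct tick64 *t)
{
    uint32_t lo = t->lo + 1u;
    t->lo = lo;
    if (lo == 0u) {
        t->hi = t->hi + 1u;   /* carry into the high half */
    }
}

/* Reader side: read hi, then lo, then hi again. If the high half
 * changed in between, an update slipped in, so try again. */
uint64_t tick64_read(const struct tick64 *t)
{
    uint32_t hi;
    uint32_t lo;
    do {
        hi = t->hi;
        lo = t->lo;
    } while (hi != t->hi);
    return ((uint64_t)hi << 32) | lo;
}
```

For data that is not a monotonic counter, briefly masking interrupts around the access is the more general option.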


From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Jan 21 09:11:38 2026

From Newsgroup: comp.arch.embedded

Il 20/01/2026 18:44, David Brown ha scritto:

On 20/01/2026 18:09, pozz wrote:

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

However, as long as the application runs on Cortex-M0+, the aligned
version shouldn't introduce issues, should it?

Correctly aligned data is never a problem. /Misaligned/ data is a problem.

The Cortex-M0+ cannot access misaligned data directly. But if the
compiler knows that it is misaligned - by "packed" struct, or "aligned" attribute on the typedef - it should break apart the accesses into bytes
or 16-bit half-words as necessary. (Aligning a uint64_t to 4 byte alignment will not be a problem.)

However for Cortex-M0+ uint64_t aligned at 4 bytes is:
- aligned for the core (two 4-bytes aligned accesses are required)
- misaligned for the ABI and the compiler

We agree that forcing the gcc compiler to consider 4 bytes as the
required alignment of uint64_t (using the aligned attribute) is always safe.
However, what really changes in the binary output?

In some cases, the address of a uint64_t can change from an 8-byte to
a 4-byte aligned address (because we instructed it to do so). What about
the code that accesses a uint64_t aligned to 4 bytes? Is it identical
for the 4- and 8-byte alignment requirements? I think so, because in
both cases the compiler should emit two 4-byte load/store instructions.


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Jan 21 10:02:10 2026

From Newsgroup: comp.arch.embedded

On 21/01/2026 09:11, pozz wrote:

Il 20/01/2026 18:44, David Brown ha scritto:

On 20/01/2026 18:09, pozz wrote:

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

However, as long as the application runs on Cortex-M0+, the aligned
version shouldn't introduce issues, should it?

Correctly aligned data is never a problem. /Misaligned/ data is a
problem.

The Cortex-M0+ cannot access misaligned data directly. But if the
compiler knows that it is misaligned - by "packed" struct, or
"aligned" attribute on the typedef - it should break apart the
accesses into bytes or 16-bit half-words as necessary. (Aligning a
uint64_t to 4 byte alignment will not be a problem.)

However for Cortex-M0+ uint64_t aligned at 4 bytes is:
- aligned for the core (two 4-bytes aligned accesses are required)

Yes. As far as I know, the M0+ core does not need any alignment greater
than 4 for any purpose. (But I might not know everything about the
core!) There can be alignment requirements for other things, such as DMA.

- misaligned for the ABI and the compiler

Yes.

We agree that forcing the gcc compiler to consider 4-bytes as the
required alignment of uint64_t (using aligned attribute) is always safe.

No.

It will almost always be safe, but you don't have any guarantees. The compiler knows that if "p" is of type "uint64_t *", then "(uintptr_t) p
& 0x07" will always be zero. Is it likely that you would have anything
in your code where that is relevant, and also that the compiler would
generate code that relies on that assumption? No, it is very unlikely.
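
The "(uintptr_t) p & 0x07" test can be written as a small helper. This is only a sketch with a made-up function name, and it assumes the usual flat address space where converting a pointer to uintptr_t yields its address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the address in p is a multiple of 'align', which must be a
 * power of two. Takes a void pointer so that the check is made before
 * the address is ever treated as a "uint64_t *". */
bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1u)) == 0u;
}
```

If the argument already has type "uint64_t *", the compiler is entitled to assume the 8-byte check passes, so a check like this is only meaningful on a raw buffer address or on a pointer to a reduced-alignment type.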

But there is a general principle that you should not lie to your
compiler - don't write code that executes UB, breaks ABIs, or is
otherwise breaking the contract you have with the compiler unless you
are using compiler features that let you keep everything honest.

Part of that is that code you are writing now for the M0+ might be
copied or adapted to a different target at a different time. Maybe on a different core, the same data will be read using some kind of SIMD or
vector instruction that /does/ require 8-byte alignment. Don't mess with
these things without telling your compiler. And don't mess with them
without telling future maintainers and programmers using the code
(including your future self).

I would be extremely surprised to find code that fails to work on an M0+ because of a uint64_t pointer that is 4-byte aligned but not 8-byte
aligned. But if /I/ want to use 64-bit integers with 4-byte alignments,
I'd use the typedef'd aligned type for the object type and for any
relevant pointers.

However, what really changes in the binary output?

In some cases, the address of a uint64_t can change from an 8-byte to
a 4-byte aligned address (because we instructed it to do so). What about
the code that accesses a uint64_t aligned to 4 bytes? Is it identical
for the 4- and 8-byte alignment requirements? I think so, because in
both cases the compiler should emit two 4-byte load/store instructions.


From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Jan 21 15:58:03 2026

From Newsgroup: comp.arch.embedded

Il 21/01/2026 10:02, David Brown ha scritto:

On 21/01/2026 09:11, pozz wrote:

Il 20/01/2026 18:44, David Brown ha scritto:

On 20/01/2026 18:09, pozz wrote:

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

However, as long as the application runs on Cortex-M0+, the aligned
version shouldn't introduce issues, should it?

Correctly aligned data is never a problem. /Misaligned/ data is a
problem.

The Cortex-M0+ cannot access misaligned data directly. But if the
compiler knows that it is misaligned - by "packed" struct, or
"aligned" attribute on the typedef - it should break apart the
accesses into bytes or 16-bit half-words as necessary. (Aligning a
uint64_t to 4 byte alignment will not be a problem.)

However for Cortex-M0+ uint64_t aligned at 4 bytes is:
- aligned for the core (two 4-bytes aligned accesses are required)

Yes. As far as I know, the M0+ core does not need any alignment greater than 4 for any purpose. (But I might not know everything about the core!) There can be alignment requirements for other things, such as DMA.

- misaligned for the ABI and the compiler

Yes.

We agree that forcing the gcc compiler to consider 4-bytes as the
required alignment of uint64_t (using aligned attribute) is always safe.

No.

It will almost always be safe, but you don't have any guarantees. The compiler knows that if "p" is of type "uint64_t *", then "(uintptr_t) p
& 0x07" will always be zero. Is it likely that you would have anything
in your code where that is relevant, and also that the compiler would generate code that relies on that assumption? No, it is very unlikely.

But there is a general principle that you should not lie to your
compiler - don't write code that executes UB, breaks ABIs, or is
otherwise breaking the contract you have with the compiler unless you
are using compiler features that let you keep everything honest.

Part of that is that code you are writing now for the M0+ might be
copied or adapted to a different target at a different time. Maybe on a different core, the same data will be read using some kind of SIMD or
vector instruction that /does/ require 8-byte alignment. Don't mess with
these things without telling your compiler. And don't mess with them
without telling future maintainers and programmers using the code
(including your future self).

But it is exactly what I wanted to do: explicitly tell the compiler to
align uint64_t at a 4-byte address (as I wrote, with the aligned attribute).
I didn't mean to lie to my best friend, the compiler.

What I wanted to know is whether there are other issues or drawbacks, such
as a penalty of extra instructions. From the godbolt link that you shared in
another post, it seems there's a penalty of a single instruction (it's
strange: it seems the compiler needs to save the struct pointer to r3
before loading the two halves of the word, but only if uint64_t is
aligned to 4 bytes).

I would be extremely surprised to find code that fails to work on an M0+ because of a uint64_t pointer that is 4-byte aligned but not 8-byte aligned. But if /I/ want to use 64-bit integers with 4-byte alignments, I'd use the typedef'd aligned type for the object type and for any
relevant pointers.

Yes, of course. Even so, I don't understand why the compiler can't
place the uint64_t member of struct B at a 4-byte aligned address.

However, what really changes in the binary output?

In some cases, the address of a uint64_t can change from an 8-byte to
a 4-byte aligned address (because we instructed it to do so). What about
the code that accesses a uint64_t aligned to 4 bytes? Is it identical
for the 4- and 8-byte alignment requirements? I think so, because in
both cases the compiler should emit two 4-byte load/store instructions.


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Jan 21 17:13:54 2026

From Newsgroup: comp.arch.embedded

On 21/01/2026 15:58, pozz wrote:

Il 21/01/2026 10:02, David Brown ha scritto:

On 21/01/2026 09:11, pozz wrote:

Il 20/01/2026 18:44, David Brown ha scritto:

On 20/01/2026 18:09, pozz wrote:

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned__ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

However, as long as the application runs on Cortex-M0+, the aligned
version shouldn't introduce issues, should it?

Correctly aligned data is never a problem. /Misaligned/ data is a
problem.

The Cortex-M0+ cannot access misaligned data directly. But if the
compiler knows that it is misaligned - by "packed" struct, or
"aligned" attribute on the typedef - it should break apart the
accesses into bytes or 16-bit half-words as necessary. (Aligning a
uint64_t to 4 byte alignment will not be a problem.)

However for Cortex-M0+ uint64_t aligned at 4 bytes is:
- aligned for the core (two 4-bytes aligned accesses are required)

Yes. As far as I know, the M0+ core does not need any alignment
greater than 4 for any purpose. (But I might not know everything
about the core!) There can be alignment requirements for other
things, such as DMA.

- misaligned for the ABI and the compiler

Yes.

We agree that forcing the gcc compiler to consider 4-bytes as the
required alignment of uint64_t (using aligned attribute) is always safe.

No.

It will almost always be safe, but you don't have any guarantees. The
compiler knows that if "p" is of type "uint64_t *", then "(uintptr_t)
p & 0x07" will always be zero. Is it likely that you would have
anything in your code where that is relevant, and also that the
compiler would generate code that relies on that assumption? No, it
is very unlikely.

But there is a general principle that you should not lie to your
compiler - don't write code that executes UB, breaks ABIs, or is
otherwise breaking the contract you have with the compiler unless you
are using compiler features that let you keep everything honest.

Part of that is that code you are writing now for the M0+ might be
copied or adapted to a different target at a different time. Maybe on
a different core, the same data will be read using some kind of SIMD
or vector instruction that /does/ require 8-byte alignment. Don't
mess these things without telling your compiler. And don't mess with
them without telling future maintainers and programmers using the code
(including your future self).

But it is exactly what I wanted to do: explicitly tell the compiler to
align uint64_t at a 4-bytes address (as I wrote, with the aligned
attribute). I didn't intend to lie to my best friend, the compiler.

uint64_t on 32-bit EABI ARM has an alignment of 8 bytes. That's cut in
stone, and you cannot change it (short of adding a new ABI to the
toolchain). If you try to use uint64_t objects that are not 8-byte
aligned, or try to use pointers that are not 8-byte aligned to access
uint64_t types, you are lying to your compiler.

If you make a new type that is like a uint64_t but with an "aligned(4)" attribute, you have a /new/ type. And that type will work just like you
want - it is an 8 byte unsigned integer with a 4 byte alignment. As
long as you use that consistently, you'll be fine.
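
A minimal sketch of that idea, using hypothetical struct names (rec_plain, rec_a) that are not taken from the test code discussed in this thread. The static assertions only check the layout of the new type; the size of the plain struct is left to the ABI (16 bytes under the 32-bit ARM EABI, as reported elsewhere in the thread).

```c
#include <stddef.h>
#include <stdint.h>

/* A distinct type: 8 bytes wide, but with a 4-byte alignment requirement. */
typedef uint64_t __attribute__((aligned(4))) uint64_a;

/* Hypothetical record with a plain uint64_t: the ABI alignment applies. */
struct rec_plain {
    uint32_t x;
    uint64_t y;
};

/* Hypothetical record with uint64_a: y can sit directly after x. */
struct rec_a {
    uint32_t x;
    uint64_a y;
};

/* Layout checks, evaluated at compile time. */
_Static_assert(_Alignof(uint64_a) == 4, "uint64_a should be 4-byte aligned");
_Static_assert(offsetof(struct rec_a, y) == 4, "no padding before y");
_Static_assert(sizeof(struct rec_a) == 12, "rec_a should be 12 bytes");
```

For a large array, the saving would be the 4 bytes of padding per element that rec_plain carries when uint64_t is 8-byte aligned.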

What I wanted to know is if there were other issues or drawback, such as more instructions penalty.

The drawback from trying to use an object of a type with an improper
alignment is that you have UB. What more reasons do you want for not
doing it?

From the godbolt link that you shared in
another post, it seems there's a penalty of a single instruction (it's
strange, it seems the compiler needs to save the struct pointer to r3,
before loading the two halves of the word, but only if uint64_t is
aligned to 4-bytes).

The compiler is not perfect here - there is definitely an extra
instruction because it is reading the low word first. (clang reads the
low word first for uint64_t as well, meaning it gives worse code for A
and B as well.) In real code, rather than a brief test snippet, other
factors could mean this does not happen - it's only because the pointer happens to be in r0 that you see it here.

But there's no harm in filing a gcc bug on this, looking for an obvious improvement.
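
To compare the two cases outside a compiler-explorer session, a pair of accessors along the following lines could be compiled with something like "arm-none-eabi-gcc -mcpu=cortex-m0plus -O2 -S" and the two function bodies inspected. The struct and function names are made up for illustration; this is not the snippet behind the godbolt link, and the exact instruction sequence will depend on the compiler version.

```c
#include <stdint.h>

typedef uint64_t __attribute__((aligned(4))) uint64_a;

/* Hypothetical test structs: same members, different alignment of y. */
struct plain { uint32_t x; uint64_t y; };
struct tight { uint32_t x; uint64_a y; };

/* Each accessor reads one 64-bit member through a pointer argument,
 * so the struct pointer arrives in the first argument register. */
uint64_t get_plain(const struct plain *p) { return p->y; }
uint64_t get_tight(const struct tight *p) { return p->y; }
```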

I would be extremely surprised to find code that fails to work on an
M0+ because of a uint64_t pointer that is 4-byte aligned but not
8-byte aligned. But if /I/ want to use 64-bit integers with 4-byte
alignments, I'd use the typedef'd aligned type for the object type and
for any relevant pointers.

Yes, of course. Even if I don't understand why the compiler isn't able
to align at 4-bytes address the uint64_t member in struct B.

It can't align the uint64_t member because the EABI says uint64_t (or,
rather, unsigned long long) is 8 bytes aligned. gcc didn't make those
rules - ARM did.

As I briefly mentioned before, there are a number of very poor choices
in the EABI (and the 32-bit ARM ABI used for Linux). This is far from
the worst.

However, what really changes in the binary output?

In some cases, the address of uint64_t can change from 8-bytes to
4-bytes aligned address (because we instructed it to do so). What
about the code that accesses uint64_t aligned to 4-bytes? Is it
identical between 4- and 8-bytes alignment requirement? I think so,
because in both case, the compiler should add two load/store 4-bytes
instructions.

--- Synchronet 3.21b-Linux NewsLink 1.2

From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Jan 21 17:57:39 2026

From Newsgroup: comp.arch.embedded

Il 21/01/2026 17:13, David Brown ha scritto:

On 21/01/2026 15:58, pozz wrote:

Il 21/01/2026 10:02, David Brown ha scritto:

On 21/01/2026 09:11, pozz wrote:

Il 20/01/2026 18:44, David Brown ha scritto:

On 20/01/2026 18:09, pozz wrote:

Il 20/01/2026 17:41, Grant Edwards ha scritto:

On 2026-01-20, David Brown <david.brown@hesbynett.no> wrote:

You can't reduce the alignment of a struct or its elements by adding an
__aligned_ attribute to the struct itself or any of its fields. The
best you can do on the struct itself is __attribute__((packed)). But
that can come with disadvantages, and inefficient use.

Yep making a structure aligned is an excellent way to introduce subtle
bugs that happen when somebody, somewhere passes a pointer to one of
those structure fields to some library function.

However, as long as the application runs on Cortex-M0+, the
aligned version shouldn't introduce issues, should it?

Correctly aligned data is never a problem. /Misaligned/ data is a
problem.

The Cortex-M0+ cannot access misaligned data directly. But if the
compiler knows that it is misaligned - by "packed" struct, or
"aligned" attribute on the typedef - it should break apart the
accesses into bytes or 16-bit half-words as necessary. (Aligning a
uint64_t to 4 byte alignment will not be a problem.)

However for Cortex-M0+ uint64_t aligned at 4 bytes is:
- aligned for the core (two 4-bytes aligned accesses are required)

Yes. As far as I know, the M0+ core does not need any alignment
greater than 4 for any purpose. (But I might not know everything
about the core!) There can be alignment requirements for other
things, such as DMA.

- misaligned for the ABI and the compiler

Yes.

We agree that forcing the gcc compiler to consider 4-bytes as the
required alignment of uint64_t (using aligned attribute) is always
safe.

No.

It will almost always be safe, but you don't have any guarantees.
The compiler knows that if "p" is of type "uint64_t *", then
"(uintptr_t) p & 0x07" will always be zero. Is it likely that you
would have anything in your code where that is relevant, and also
that the compiler would generate code that relies on that
assumption? No, it is very unlikely.

But there is a general principle that you should not lie to your
compiler - don't write code that executes UB, breaks ABIs, or is
otherwise breaking the contract you have with the compiler unless you
are using compiler features that let you keep everything honest.

Part of that is that code you are writing now for the M0+ might be
copied or adapted to a different target at a different time. Maybe
on a different core, the same data will be read using some kind of
SIMD or vector instruction that /does/ require 8-byte alignment.
Don't mess these things without telling your compiler. And don't
mess with them without telling future maintainers and programmers
using the code (including your future self).

But it is exactly what I wanted to do: explicitly tell the compiler to
align uint64_t at a 4-bytes address (as I wrote, with the aligned
attribute). I didn't intend to lie to my best friend, the compiler.

uint64_t on 32-bit EABI ARM has an alignment of 8 bytes. That's cut in stone, and you cannot change it (short of adding a new ABI to the toolchain). If you try to use uint64_t objects that are not 8-byte aligned, or try to use pointers that are not 8-byte aligned to access uint64_t types, you are lying to your compiler.

If you make a new type that is like a uint64_t but with an "aligned(4)" attribute, you have a /new/ type. And that type will work just like you want - it is an 8 byte unsigned integer with a 4 byte alignment. As
long as you use that consistently, you'll be fine.

What I wanted to know is if there were other issues or drawback, such
as more instructions penalty.

The drawback from trying to use an object of a type with an improper alignment is that you have UB. What more reasons do you want for not
doing it?

Most probably I can't explain what I want to say. I don't want to use an *improper* alignment (different from the one that gcc really is using).
I want to know what happens when I *instruct* the compiler to use a
4-bytes alignment for uint64_t in the context of Cortex-M0+ core only.

In other words, is it completely safe to use, as you suggested,

typedef uint64_t __attribute__((aligned(4))) uint64_a;

???

From what you wrote, I think yes. Maybe just a very small optimization penalty.

From the godbolt link that you shared in another post, it seems there's
a penalty of a single instruction (it's strange, it seems the compiler
needs to save the struct pointer to r3, before loading the two halves
of the word, but only if uint64_t is aligned to 4-bytes).

The compiler is not perfect here - there is definitely an extra
instruction because it is reading the low word first. (clang reads the
low word first for uint64_t as well, meaning it gives worse code for A
and B as well.) In real code, rather than a brief test snippet, other factors could mean this does not happen - it's only because the pointer happens to be in r0 that you see it here.

But there's no harm in filing a gcc bug on this, looking for an obvious improvement.

I would be extremely surprised to find code that fails to work on an
M0+ because of a uint64_t pointer that is 4-byte aligned but not
8-byte aligned. But if /I/ want to use 64-bit integers with 4-byte
alignments, I'd use the typedef'd aligned type for the object type
and for any relevant pointers.

Yes, of course. Even if I don't understand why the compiler isn't able
to align at 4-bytes address the uint64_t member in struct B.

It can't align the uint64_t member because the EABI says uint64_t (or, rather, unsigned long long) is 8 bytes aligned. gcc didn't make those rules - ARM did.

But struct B is defined with correct alignment attribute for uint64_t
member. I tried also:

struct B {
    uint32_t x;
    uint64_t y __attribute__((aligned(4)));
};

The struct size is always 16, so y is placed at offset 8 and not 4. It
seems to me gcc isn't able to respect the aligned attribute of 4 bytes
when it is specified inside the struct definition.

I don't see many differences with:

typedef __attribute__((aligned(4))) uint64_t uint64_a;

struct C {
    uint32_t x;
    uint64_a y;
};

As I briefly mentioned before, there are a number of very poor choices
in the EABI (and the 32-bit ARM ABI used for Linux). This is far from
the worst.

However, what really changes in the binary output?

In some cases, the address of uint64_t can change from 8-bytes to
4-bytes aligned address (because we instructed it to do so). What
about the code that accesses uint64_t aligned to 4-bytes? Is it
identical between 4- and 8-bytes alignment requirement? I think so,
because in both case, the compiler should add two load/store 4-bytes
instructions.

--- Synchronet 3.21b-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Thu Jan 22 10:03:21 2026

From Newsgroup: comp.arch.embedded

On 21/01/2026 17:57, pozz wrote:

Il 21/01/2026 17:13, David Brown ha scritto:

On 21/01/2026 15:58, pozz wrote:

Il 21/01/2026 10:02, David Brown ha scritto:

On 21/01/2026 09:11, pozz wrote:

<snip for brevity>

Most probably I can't explain what I want to say. I don't want to use an *improper* alignment (different from the one that gcc really is using).
I want to know what happens when I *instruct* the compiler to use a
4-bytes alignment for uint64_t in the context of Cortex-M0+ core only.

I think we may have been talking slightly past each other, so that
re-wording was helpful.

In other words, is it completely safe to use, as you suggested,

   typedef uint64_t __attribute__((aligned(4))) uint64_a;

???

Barring compiler bugs, yes, that is completely safe. When the compiler
lets you make such a type, and use it, it is the compiler's
responsibility to get the details right. You should never see issues
from the compiler's knowledge and assumptions of alignments, and it
should generate instructions that work on the target (for example, if
the target hardware required 8-byte alignment for 64-bit loads and
stores, then the compiler would generate two 32-bit accesses instead).

And for the Cortex M series, 4-byte alignment is the maximum needed for working code (though there might be efficiency differences on some of
the biggest M cores that have 64-bit buses internally, or when data
caches are used).
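
As a sketch of what "use it consistently" can look like in practice, the pointer parameter below carries the 4-byte-aligned type as well, so the compiler has no reason to assume 8-byte alignment when it dereferences it. The names (sample, load_stamp, sum_stamps) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t __attribute__((aligned(4))) uint64_a;

/* Hypothetical record type stored in a large array. */
struct sample {
    uint32_t id;
    uint64_a stamp;
};

/* The parameter type says "4-byte aligned", matching the member type. */
uint64_t load_stamp(const uint64_a *p)
{
    return *p;
}

/* Walk the array and only ever form pointers of the matching type. */
uint64_t sum_stamps(const struct sample *s, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += load_stamp(&s[i].stamp);
    return total;
}
```

Handing &s[i].stamp to a function that takes a plain uint64_t * would be the step that breaks the contract with the compiler.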

From what you wrote, I think yes. Maybe just a very small optimization penalty.

Yes. And I think that is a "missed optimisation opportunity" bug. I
suspect (or speculate), but have not looked at the compiler code to be
sure, that the code generator generally accesses the low half of 64-bit
data first. And then it may have specific optimisations ("peephole" optimisations) for re-ordering the accesses for "long long" types in
certain circumstances, saving a register and an instruction. However,
that would apply only to the specific type - and while "uint64_a" works
a lot like "unsigned long long", it is not that exact type, and won't
trigger the same optimisation.

From the godbolt link that you shared in another post, it seems
there's a penalty of a single instruction (it's strange, it seems the
compiler needs to save the struct pointer to r3, before loading the
two halves of the word, but only if uint64_t is aligned to 4-bytes).

The compiler is not perfect here - there is definitely an extra
instruction because it is reading the low word first. (clang reads
the low word first for uint64_t as well, meaning it gives worse code
for A and B as well.) In real code, rather than a brief test snippet,
other factors could mean this does not happen - it's only because the
pointer happens to be in r0 that you see it here.

But there's no harm in filing a gcc bug on this, looking for an
obvious improvement.

I would be extremely surprised to find code that fails to work on an
M0+ because of a uint64_t pointer that is 4-byte aligned but not
8-byte aligned. But if /I/ want to use 64-bit integers with 4-byte
alignments, I'd use the typedef'd aligned type for the object type
and for any relevant pointers.

Yes, of course. Even if I don't understand why the compiler isn't
able to align at 4-bytes address the uint64_t member in struct B.

It can't align the uint64_t member because the EABI says uint64_t (or,
rather, unsigned long long) is 8 bytes aligned. gcc didn't make those
rules - ARM did.

But struct B is defined with correct alignment attribute for uint64_t member. I tried also:

struct B {
    uint32_t x;
    uint64_t y __attribute__((aligned(4)));
};

The struct size is always 16, so y is placed at offset 8 and not 4. It
seems to me gcc isn't able to respect the aligned attribute of 4 bytes
when it is specified inside the struct definition.

That is my conclusion too. (I tried the "aligned" attribute in every
place I could.) The only place it worked was on a typedef for the new "uint64_a" type.

I don't see many differences with:

typedef __attribute__((aligned(4))) uint64_t uint64_a;

struct C {
    uint32_t x;
    uint64_a y;
};

It certainly seems inconsistent to me that it works on the typedef, and
not directly in the struct definition. After all, a typedef does not
actually define a new type (it's a silly name) - it merely defines an
alias or shortcut name for a type. So it would seem logical that using "uint64_a" or "__attribute__((aligned(4))) uint64_t" in the struct
definition would mean exactly the same thing. But apparently not. gcc attributes are not part of the normal C grammar, so there's no standard
to fall back on here.
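
For what it is worth, the GCC manual's entry for the "aligned" attribute says that on a struct or struct member it can only increase the alignment unless "packed" is specified as well, while on a typedef it can both increase and decrease it. Going by that description, a member-level spelling would look like the sketch below. The name struct B2 is hypothetical, and the assertions only check the layout, not the code generated for the M0+.

```c
#include <stddef.h>
#include <stdint.h>

struct B2 {
    uint32_t x;
    /* packed drops the member's alignment to 1; aligned(4) raises it to 4 */
    uint64_t y __attribute__((packed, aligned(4)));
};

/* Layout checks, evaluated at compile time. */
_Static_assert(offsetof(struct B2, y) == 4, "y directly follows x");
_Static_assert(sizeof(struct B2) == 12, "no padding up to 16 bytes");
```

The typedef'd type still has the advantage that the reduced alignment travels with pointers to the member, which a per-member attribute does not give you.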

As I briefly mentioned before, there are a number of very poor choices
in the EABI (and the 32-bit ARM ABI used for Linux). This is far from
the worst.

However, what really changes in the binary output?

In some cases, the address of uint64_t can change from 8-bytes to
4-bytes aligned address (because we instructed it to do so). What
about the code that accesses uint64_t aligned to 4-bytes? Is it
identical between 4- and 8-bytes alignment requirement? I think so,
because in both case, the compiler should add two load/store
4-bytes instructions.

--- Synchronet 3.21b-Linux NewsLink 1.2
