Forum: War Ensemble BBS

Re: Memory protection between compilation units?

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Jun 15 13:57:59 2025

From Newsgroup: comp.lang.c

Mateusz Viste <mateusz@not.gonna.tell> wrote:

On 13.06.2025 15:56, Michael S wrote:

A significant part of x86 installed base (all Intel Core CPUs starting
from gen 6 up to gen 9 and their Xeon contemporaries) has extension
named Intel MPX that was invented exactly for that purpose. But it didn't
work particularly well. Compiler people never liked it, but despite
that it was supported by several generations of gcc and probably by
clang as well.

This does not really sound like something "readily available", unless you
are suggesting that I migrate to a Linux kernel from 10 years ago, switch
to gcc 5.0 and use outdated hardware.

The proper solution to your problem is to stop using memory-unsafe
language for complex application programming. It's not that successful
use of unsafe languages for complex application programming is
impossible. Practice has proved many times that it can be done. But
only by a very good team. Your team is not good enough.

Just to clarify: I didn’t post here seeking help with a simple out-of-bounds
issue, nor was I here to vent. I’ve been wrangling C code in complex, high-performance systems for over a decade - I’m managing just fine. Code improvement is a continual, non-negotiable process in our line of work, but fires happen occasionally nonetheless. While fixing the issue, I started wondering about how faults like this could be located faster, that is assuming they do slip into production - because in spite of the testing process, some faults will inevitably get to customers.

A crash that happens closer to the source of the problem (same compilation unit) would significantly ease the debugging effort. I figured it was a
topic worth sharing, in the spirit of sparking some constructive
discussions.

You should understand that C array indexing and pointer
operations are defined in a specific way. This has several
advantages. But also has significant cost: checking validity
of array indexing in C is much harder than in other languages.
Namely, in most languages implementation knows size/bounds of
an array and can automatically generate checks on each access.
This has some cost, but modern experience is that this cost
is quite acceptable (on average about 5-10% increase in runtime
and similar increase in size). In C compiler sometimes knows
size of the array, but in general it does not. So in C you
either use half measures, like hoping that paging hardware
will catch out-of-bounds access (possibly arranging data layout to
increase the chance of a fault) or very expensive approaches,
which essentially bundle bounds with the pointer (Intel
tried to add hardware support for this, but even with
hardware support it is still much more expensive than checking
in some other languages).
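
To make the "bounds bundled with the pointer" approach concrete, here is a minimal sketch in C. The names int_span, span_get and span_set are hypothetical, invented for this illustration; this is not how MPX or any particular compiler implements it. It only shows where the cost comes from: every pointer doubles in size and every access pays for a comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the base address travels with its length. */
struct int_span {
    int    *base;  /* first element */
    size_t  len;   /* number of elements reachable through base */
};

/* Read element idx into *out; returns false instead of reading out of bounds. */
static bool span_get(struct int_span s, size_t idx, int *out)
{
    assert(out != NULL);
    if (idx >= s.len)
        return false;
    *out = s.base[idx];
    return true;
}

/* Write value to element idx; returns false instead of writing out of bounds. */
static bool span_set(struct int_span s, size_t idx, int value)
{
    if (idx >= s.len)
        return false;
    s.base[idx] = value;
    return true;
}
```

A caller builds the span once, where the size is still known (for example { data, 4 } for int data[4]), and passes the struct around instead of a bare int *.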

IIUC in your example the array was global, so compiler knew its
bound and in principle could generate bounds checks. But
I am not aware of a C compiler which actually generates such
checks. AFAIK gcc sanitize options are doing a somewhat different
thing. Tiny C has an option to generate bounds checks, but
it is not clear to me in which cases it is effective (and you
probably would not use Tiny C for performance-critical code).
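
For the case where the bound is visible at the point of access, the check can also be written by hand. The sketch below is only an illustration of the principle; ARRAY_LEN, checked_index and AT are hypothetical names, and the check uses assert, so it disappears when NDEBUG is defined. GCC and Clang also document a -fsanitize=bounds option that instruments indexing of arrays whose bound is statically known, which may be worth comparing against.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer): the bound the compiler knows. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: pass the index through only if it is below len. */
static size_t checked_index(size_t idx, size_t len)
{
    assert(idx < len);   /* compiled out when NDEBUG is defined */
    return idx;
}

/* Hypothetical macro: bounds-checked element access for arrays of known size.
 * A negative index converts to a huge size_t and fails the check as well. */
#define AT(a, i) ((a)[checked_index((size_t)(i), ARRAY_LEN(a))])
```

This only works while the operand still has array type. Once the array has decayed to a pointer, ARRAY_LEN silently computes the wrong value.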

Note that in C++ when you use C arrays, you have the same
situation as in C. But you can instead use array classes which
check accesses.
--
Waldek Hebisch
--- Synchronet 3.21a-Linux NewsLink 1.2

From Mateusz Viste@mateusz@not.gonna.tell to comp.lang.c on Sun Jun 15 20:27:17 2025

From Newsgroup: comp.lang.c

On 15.06.2025 15:57, antispam@fricas.org wrote:

IIUC in your example the array was global, so compiler knew its
bound and in principle could generate bounds checks. But
I am not aware of C compiler which actually generate such
checks.

There was one apparently as early as 1983 :)

https://www.doc.ic.ac.uk/~afd/rarepapers/KendallBccRuntimeCheckingsforC.pdf

Granted, it wasn’t a full-fledged C compiler, more of a bounds-checking code
generator. Still, the paper is a fascinating read and highlights that this
topic has been explored for quite some time. A more recent variation on the
theme can be seen here (based on GCC BP, abandoned a couple years ago):

https://www.cs.purdue.edu/homes/xyzhang/fall07/Papers/TR181.pdf

That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic. Hence why I was looking in the direction
of the MMU. All compilation units of a program share the same set of TLBs.
I figured there might perhaps be a way to isolate a given compilation unit
in different TLBs, effectively sandboxing its memory, then make this unit
communicate with the rest of the program via shm when shared memory
accesses are needed.

Of course, even if such a solution were possible, it would not be very
practical. Besides, one could easily achieve the same isolation by turning
that compilation unit into a standalone, service-providing daemon.
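
Within a single process, a rough approximation of the idea can be sketched on POSIX systems with mmap and mprotect. This is a minimal sketch under stated assumptions, not a recommendation: module_state, module_protect, module_init and module_bump are hypothetical names, the "module" keeps all of its private data in its own pages, and those pages are made inaccessible whenever the module is not running. Any stray write from elsewhere in the program then faults immediately.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical private state of one "module", kept in its own page(s). */
struct module_state {
    long counter;
};

static struct module_state *state;  /* NULL until module_init() succeeds */
static size_t state_bytes;          /* mapping size, a whole number of pages */

/* Change access rights on the private pages. Each call is a system call. */
static void module_protect(int prot)
{
    int rc = mprotect(state, state_bytes, prot);
    assert(rc == 0);
    (void)rc;
}

/* Map the private pages and leave them inaccessible. Returns 0 on success. */
static int module_init(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t pg = (size_t)page;
    state_bytes = (sizeof *state + pg - 1) / pg * pg;
    void *p = mmap(NULL, state_bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    state = p;
    state->counter = 0;
    module_protect(PROT_NONE);
    return 0;
}

/* The only entry point: unlock, do the work, lock again. */
static long module_bump(void)
{
    module_protect(PROT_READ | PROT_WRITE);
    long v = ++state->counter;
    module_protect(PROT_NONE);
    return v;
}
```

Two system calls per entry point is a heavy price, the sketch is not thread-safe, and it stops protecting anything as soon as a pointer into the private pages is handed to other code.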

Mateusz

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Jun 15 23:50:15 2025

From Newsgroup: comp.lang.c

Mateusz Viste <mateusz@not.gonna.tell> wrote:

That said, detecting out-of-bounds array access is no panacea. Memory corruption can arise from various sources, such as dangling pointers or poorly managed pointer arithmetic.

AFAICS there is no reason for explicit pointer arithmetic in well
written C programs. Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem. As in the
case of bounds checking, using other languages can help in avoiding
dangling pointers.

Hence why I was looking in the direction
of the MMU. All compilation units of a program share the same set of TLBs.
I figured there might perhaps be a way to isolate a given compilation unit
in different TLBs, effectively sandboxing its memory, then make this unit communicate with the rest of the program via shm when shared memory
accesses are needed.

Changing TLB contents is rather expensive. Also, what is "its memory"
supposed to mean? Normally functions in a C program pass pointers
to other functions, so several functions can legally access rather
large and time-varying parts of memory. The best approximation to
your idea available in PC hardware is 286/386 segmentation. But
it proved to be quite inconvenient, so "everybody" is now using flat
mode. One could try to emulate segmentation using paging hardware,
and your idea clearly goes in such direction, but it is unlikely
to work well.
--
Waldek Hebisch

From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Mon Jun 16 01:01:35 2025

From Newsgroup: comp.lang.c

On 2025-06-15, Waldek Hebisch <antispam@fricas.org> wrote:

Mateusz Viste <mateusz@not.gonna.tell> wrote:

That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.

AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.

LOL, you heard it here.

Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem. Like in

Array indexing *is* pointer arithmetic.

Are you not aware of this equivalence?

(E1)[(E2)] <---> *((E1) + (E2))

In fact, let's draw the commutative diagram

(E1)[(E2)] <---> *((E1) + (E2))
    ^                  ^
    |                  |
    |                  |
    v                  v
(E2)[(E1)] <---> *((E2) + (E1))

You're not saying anything here other than that you like the p[i]
/notation/ better than *(p + i), and &p[i] better than p + i.
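
The equivalence can be checked directly. A minimal sketch follows; element_all_ways is a hypothetical name used only for this illustration. All four spellings from the diagram denote the same element, and &p[i] is the same address as p + i.

```c
#include <assert.h>
#include <stddef.h>

/* Fetch p[i] and confirm that every spelling in the diagram agrees. */
static int element_all_ways(const int *p, ptrdiff_t i)
{
    assert(p[i] == *(p + i));   /* subscript vs. explicit arithmetic */
    assert(i[p] == *(i + p));   /* operands swapped: addition commutes */
    assert(p[i] == i[p]);
    assert(&p[i] == p + i);     /* same address either way */
    return p[i];
}
```

With int a[3] = {10, 20, 30}, both element_all_ways(a, 2) and element_all_ways(a + 2, 0) yield 30.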

Great, thanks for sharing!

You're not doing yourself any favor by confusing
"not styled in my taste" with "not well written".

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 16 10:00:34 2025

From Newsgroup: comp.lang.c

Kaz Kylheku <643-408-1753@kylheku.com> wrote:

On 2025-06-15, Waldek Hebisch <antispam@fricas.org> wrote:

Mateusz Viste <mateusz@not.gonna.tell> wrote:

That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.

AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.

LOL, you heard it here.

Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem. Like in

Array indexing *is* pointer arithmetic.

Are you not aware of this equivalence?

(E1)[(E2)] <---> *((E1) + (E2))

Learn to read.

In fact, let's draw the commutative diagram

(E1)[(E2)] <---> *((E1) + (E2))
    ^                  ^
    |                  |
    |                  |
    v                  v
(E2)[(E1)] <---> *((E2) + (E1))

You're not saying anything here other than that you like the p[i]
/notation/ better than *(p + i), and &p[i] better than p + i.

The indexing notation at least has a chance of being automatically
checked (in cases when the compiler/checker knows the array size). With
arbitrary user-written pointer arithmetic there is no hope of automatic
checking.
--
Waldek Hebisch

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jun 16 06:12:01 2025

From Newsgroup: comp.lang.c

On 2025-06-16 06:00, Waldek Hebisch wrote:

Kaz Kylheku <643-408-1753@kylheku.com> wrote:

...

You're not saying anything here other than that you like the p[i]
/notation/ better than *(p + i), and &p[i] better than p + i.

The indexing notation at least has a chance of being automatically
checked (in cases when the compiler/checker knows the array size). With
arbitrary user-written pointer arithmetic there is no hope of automatic
checking.

Since they are, by definition, equivalent, *(p+i) can be
automatically checked under precisely the same situations where p[i] can
be checked. It makes NO difference.


From Louis Krupp@lkrupp@invalid.pssw.com.invalid to comp.lang.c on Mon Jun 16 06:29:30 2025

From Newsgroup: comp.lang.c

On 6/11/2025 7:32 AM, Mateusz Viste wrote:

This might not be a strictly C question, but it definitely concerns all
C programmers.

Earlier today, I fixed an out-of-bounds write bug. An obvious issue:

static int *socks[0xffff];

void update_my_socks(int *sock, int val) {
    socks[val & 0xffff] = sock;
}

<snip>

Imagine an alternate universe in which array declarations took the form (borrowed from Unisys ALGOL):

array_name[lower_bound : upper_bound]

The array in question would have been declared

static int *socks[0 : 0xffff]

The mask 0xffff and the upper bound would have been the same, and the
code would have been obviously right instead of subtly wrong.
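
Something close to this can be approximated in today's C by naming the highest valid index once and deriving both the array size and the mask from it. A minimal sketch, with SOCKS_MAX_INDEX and SOCKS_COUNT as hypothetical names introduced for the illustration:

```c
#include <assert.h>

/* State the highest valid index once and derive everything else from it. */
#define SOCKS_MAX_INDEX 0xffffu
#define SOCKS_COUNT     (SOCKS_MAX_INDEX + 1u)

/* Masking with SOCKS_MAX_INDEX only reduces to a valid index
 * when SOCKS_COUNT is a power of two. */
static_assert((SOCKS_COUNT & SOCKS_MAX_INDEX) == 0,
              "SOCKS_COUNT must be a power of two for the mask to work");

static int *socks[SOCKS_COUNT];

void update_my_socks(int *sock, int val)
{
    socks[(unsigned)val & SOCKS_MAX_INDEX] = sock;
}
```

The array now has 0x10000 elements, so index 0xffff is the last valid slot rather than one past the end.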

Louis


From Mateusz Viste@mateusz@x.invalid to comp.lang.c on Mon Jun 16 15:01:28 2025

From Newsgroup: comp.lang.c

On Mon, 16 Jun 2025 06:29:30 Louis Krupp wrote:

Imagine an alternate universe in which array declarations took the
form (borrowed from Unisys ALGOL):

array_name[lower_bound : upper_bound]

This alternate C universe you describe looks appealing, but I strongly
suspect it is currently tormented by violent conflicts between the
noble 0-based traditionalists, the idealistic 1-based reformists, and
the rogue "random-based" anarchists. Our C is not perfect, but we could
have ended up with much worse.

Mateusz


From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 16 06:10:46 2025

From Newsgroup: comp.lang.c

antispam@fricas.org (Waldek Hebisch) writes:

Mateusz Viste <mateusz@not.gonna.tell> wrote:

That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.

AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.

This assertion is in effect a No True Scotsman statement.

Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem.

Even if there is no direct manipulation ("pointer arithmetic") of
pointer variables, access can be checked only if array bounds
information is available, and in many cases it isn't. The reason is
(among other things) C doesn't have array parameters; what it does
have instead is pointer parameters. At the point in the code when
an "array" access is to be done, the information needed to check
that an index value is in bounds just isn't available. The culprit
here is not explicit pointer arithmetic, but lacking the information
needed to do a bounds check. That lack is inherent in how the C
language works with respect to arrays and pointer conversion.
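
The loss of bounds information at a function boundary can be made visible with C11 _Generic. This is a minimal sketch; table, param_is_plain_pointer and bound_from_array_pointer are hypothetical names used only here. It also shows one way to keep the bound: pass a pointer to the whole array.

```c
#include <assert.h>
#include <stddef.h>

static int table[10];

/* At file scope the bound is part of the type: &table is int (*)[10]. */
static_assert(_Generic(&table, int (*)[10]: 1, default: 0),
              "table still has its array type here");

/* Declared with [10], but the parameter is adjusted to plain int *,
 * so &a is int ** and the 10 is gone from the type. */
static int param_is_plain_pointer(int a[10])
{
    return _Generic(&a, int **: 1, default: 0);
}

/* A pointer to the whole array keeps the bound in its type. */
static size_t bound_from_array_pointer(int (*a)[10])
{
    return sizeof *a / sizeof (*a)[0];
}
```

Calling param_is_plain_pointer(table) returns 1, while bound_from_array_pointer(&table) returns 10. The second form is rarely used in practice because it fixes the size in the function's signature.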

From Rosario19@Ros@invalid.invalid to comp.lang.c on Mon Jun 16 18:14:05 2025

From Newsgroup: comp.lang.c

On Thu, 12 Jun 2025 19:15:26 +0100, Richard Heathfield wrote:

Sure. Or some people prefer to single-step with a debugger. Such
people can make their lives a little easier by surrounding the
buffer with sentinel soldiers, setting the sentinel soldiers to a
magic number, and putting a watch on them both - the buffer high
soldier and the buffer low soldier.
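
A minimal sketch of the sentinel arrangement described above, with hypothetical names (guarded_buffer, guarded_init, soldiers_alive) and an arbitrary magic number:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SOLDIER_MAGIC 0xDEADBEEFu   /* arbitrary, hypothetical magic number */

/* The buffer with one sentinel on each side. */
struct guarded_buffer {
    uint32_t      low_soldier;
    unsigned char buf[64];
    uint32_t      high_soldier;
};

/* Post both soldiers and clear the buffer. */
static void guarded_init(struct guarded_buffer *g)
{
    assert(g != NULL);
    g->low_soldier = SOLDIER_MAGIC;
    memset(g->buf, 0, sizeof g->buf);
    g->high_soldier = SOLDIER_MAGIC;
}

/* True while neither sentinel has been overwritten. */
static bool soldiers_alive(const struct guarded_buffer *g)
{
    return g->low_soldier == SOLDIER_MAGIC &&
           g->high_soldier == SOLDIER_MAGIC;
}
```

In a debugger, a watchpoint on each soldier (for example with gdb's watch command) stops the program at the write that kills it. Without a debugger, soldiers_alive can be called at checkpoints to narrow down when the damage happened.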

I think that when a write goes out of bounds, it often hits the memory
right at the two limits of the array... but there are cases where those
limits stay intact and memory outside the array is written anyway, in
some other place

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 16 16:47:26 2025

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

antispam@fricas.org (Waldek Hebisch) writes:

Mateusz Viste <mateusz@not.gonna.tell> wrote:

That said, detecting out-of-bounds array access is no panacea. Memory
corruption can arise from various sources, such as dangling pointers or
poorly managed pointer arithmetic.

AFAICS there is no reason for explicit pointer arithmetic in well
written C programs.

This assertion is in effect a No True Scotsman statement.

Implicit pointer arithmetic (coming from array
indexing) is done by compiler so should be no problem.

Even if there is no direct manipulation ("pointer arithmetic") of
pointer variables, access can be checked only if array bounds
information is available, and in many cases it isn't. The reason is
(among other things) C doesn't have array parameters; what it does
have instead is pointer parameters. At the point in the code when
an "array" access is to be done, the information needed to check
that an index value is in bounds just isn't available. The culprit
here is not explicit pointer arithmetic, but lacking the information
needed to do a bounds check. That lack is inherent in how the C
language works with respect to arrays and pointer conversion.

Yes, I wrote this in an earlier message. Here OP concern was
specifically "poorly managed pointer arithmetic".
--
Waldek Hebisch

From Richard Heathfield@rjh@cpax.org.uk to comp.lang.c on Mon Jun 16 17:53:31 2025

From Newsgroup: comp.lang.c

On 16/06/2025 17:14, Rosario19 wrote:

On Thu, 12 Jun 2025 19:15:26 +0100, Richard Heathfield wrote:

Sure. Or some people prefer to single-step with a debugger. Such
people can make their lives a little easier by surrounding the
buffer with sentinel soldiers, setting the sentinel soldiers to a
magic number, and putting a watch on them both - the buffer high
soldier and the buffer low soldier.

I think that when a write goes out of bounds, it often hits the memory
right at the two limits of the array... but there are cases where those
limits stay intact and memory outside the array is written anyway, in
some other place

<whoosh>
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within


From olcott@polcott333@gmail.com to comp.lang.c on Sat Jun 21 15:49:10 2025

From Newsgroup: comp.lang.c

On 6/11/2025 8:32 AM, Mateusz Viste wrote:

This might not be a strictly C question, but it definitely concerns all
C programmers.

Earlier today, I fixed an out-of-bounds write bug. An obvious issue:

static int *socks[0xffff];

void update_my_socks(int *sock, int val) {
    socks[val & 0xffff] = sock;
}

While the presented issue is common knowledge for anyone familiar with
C, *locating* the bug was challenging. The program did not crash at the moment of the out-of-bounds write but much later - somewhere entirely different, in a different object file that maintained a static pointer
for tracking a position in a linked list. To my surprise, the pointer
was randomly reset to NULL about once a week, causing a segfault.
Tracing this back to an unrelated out-of-bounds write elsewhere in the
code was tedious, to say the least.

This raises a question: how can such corruptions be detected sooner? Protected mode prevents interference between programs but doesn’t
safeguard a program from corrupting itself. Is there a way to enforce
memory protection between module files of the same program? After all,
static objects shouldn't be accessible outside their compilation unit.

How would you approach this?

Mateusz

https://en.cppreference.com/w/c/types/integer.html
One way to fix the problem in the above specific
case is to define: void update_my_socks(int *sock, uint16_t val)
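
Note that narrowing the parameter type alone is not enough with the array as posted: a uint16_t can still hold 0xffff, which is one past the last element of an array declared with 0xffff elements. A minimal sketch of the idea with the array sized to match the type (the sizing via UINT16_MAX is an assumption added here, not part of the original code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One slot for every value a uint16_t can hold: 0 .. UINT16_MAX inclusive. */
static int *socks[(size_t)UINT16_MAX + 1];

static_assert(sizeof socks / sizeof socks[0] == (size_t)UINT16_MAX + 1,
              "every uint16_t value must be a valid index");

void update_my_socks(int *sock, uint16_t val)
{
    socks[val] = sock;   /* no mask needed: the type bounds the index */
}
```

A caller that passes a wider integer gets an implicit conversion to uint16_t, which has the same effect as the original mask but is no longer visible at the call site.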
--
Copyright 2025 Olcott "Talent hits a target no one else can hit; Genius
hits a target no one else can see." Arthur Schopenhauer

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Jul 1 09:54:36 2025

From Newsgroup: comp.lang.c

Mateusz Viste <mateusz@not.gonna.tell> writes:

On 14.06.2025 01:31, Tim Rentsch wrote:

It isn't wrong to think of bitwise-and as masking-in (or possibly
masking-out) of certain bits, but it still isn't a modulo. A
modulo operation is what is desired;

By "different viewpoints," I meant that while you approach the
problem by applying a modulo operation to the index so it fits the
array size, I tend to think in terms of ensuring the index
correctly maps to a location within an n-bit address space.
Naturally, the array should accommodate the maximum possible index
for the given address space, and that's where the original code
fell short. And you're absolutely right that hardcoded values are
problematic; the size of the array should have been linked with
the n-bit address space expectation.

I understand what you're doing. However one thinks of it, what is
needed is a way to ensure the produced index value is in the range
of array index values, and that the mapping covers the full range of
array index values. Using bitwise-and is a way of solving a less
general problem. Unfortunately: one, although it is known that
using bitwise-and works only for certain array sizes, there was no
check or assertion in the code to verify that requirement; two,
it's a holdover from earlier times when the performance difference
might matter, but now it's a premature optimization (and in most
cases does not result in any improvement); and three, in this case
using bitwise-and contributed to the bug, which wouldn't have
happened if modulo had been used instead.
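
A minimal sketch of the modulo variant, keeping the array size from the original post. ARRAY_LEN is a hypothetical helper macro, and the conversion to unsigned is an added assumption so that negative values of val cannot produce a negative remainder:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static int *socks[0xffff];   /* size kept as in the original post */

void update_my_socks(int *sock, int val)
{
    /* The divisor is derived from the array itself, so the result is
     * always in 0 .. ARRAY_LEN(socks) - 1, whatever the size is. */
    size_t idx = (unsigned)val % ARRAY_LEN(socks);
    assert(idx < ARRAY_LEN(socks));
    socks[idx] = sock;
}
```

With this version val == 0xffff maps to index 0 instead of writing one element past the end, and changing the array size cannot silently invalidate the range reduction.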
