• Re: Memory protection between compilation units?

    From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Jun 15 13:57:59 2025
    From Newsgroup: comp.lang.c

    Mateusz Viste <mateusz@not.gonna.tell> wrote:
    On 13.06.2025 15:56, Michael S wrote:

    A significant part of the x86 installed base (all Intel Core CPUs starting
    from gen 6 up to gen 9 and their Xeon contemporaries) has an extension
    named Intel MPX that was invented exactly for that purpose. But it didn't
    work particularly well. Compiler people never liked it, but despite
    that it was supported by several generations of gcc and probably by
    clang as well.

    This does not really sound like something "readily available", unless you
    are suggesting that I migrate to a Linux kernel from 10 years ago, switch
    to gcc 5.0 and use outdated hardware.

    The proper solution to your problem is to stop using a memory-unsafe
    language for complex application programming. It's not that successful
    use of unsafe languages for complex application programming is
    impossible. Practice has proved many times that it can be done. But
    only by a very good team. Your team is not good enough.

    Just to clarify: I didn’t post here seeking help with a simple out-of-bounds
    issue, nor was I here to vent. I’ve been wrangling C code in complex,
    high-performance systems for over a decade - I’m managing just fine. Code
    improvement is a continual, non-negotiable process in our line of work, but
    fires happen occasionally nonetheless. While fixing the issue, I started
    wondering about how faults like this could be located faster, that is
    assuming they do slip into production - because in spite of the testing
    process, some faults will inevitably get to customers.

    A crash that happens closer to the source of the problem (same compilation unit) would significantly ease the debugging effort. I figured it was a
    topic worth sharing, in the spirit of sparking some constructive
    discussions.

    You should understand that C array indexing and pointer
    operations are defined in a specific way. This has several
    advantages. But it also has a significant cost: checking validity
    of array indexing in C is much harder than in other languages.
    Namely, in most languages the implementation knows the size/bounds of
    an array and can automatically generate checks on each access.
    This has some cost, but modern experience is that this cost
    is quite acceptable (on average about 5-10% increase in runtime
    and similar increase in size). In C the compiler sometimes knows
    the size of the array, but in general it does not. So in C you
    either use half measures, like hoping that paging hardware
    will catch out-of-bounds access (possibly arranging data layout to
    increase the chance of a fault) or very expensive approaches,
    which essentially bundle bounds with the pointer (Intel
    tried to add hardware support for this, but even with
    hardware support it is still much more expensive than checking
    in some other languages).
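
    A minimal sketch of what "bundling bounds with the pointer" can look
    like when done by hand in plain C. The names int_span, span_at and
    span_in_bounds are hypothetical and not taken from any library; the
    check runs on every access, which is where the cost comes from.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "fat pointer": the base address travels with its element count. */
struct int_span {
    int    *base;
    size_t  len;
};

/* Non-aborting test: returns 1 if index i is valid for span s. */
int span_in_bounds(struct int_span s, size_t i)
{
    return i < s.len;
}

/* Checked element access: abort at the faulty access instead of
   silently corrupting unrelated memory. */
int *span_at(struct int_span s, size_t i)
{
    if (!span_in_bounds(s, i)) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, s.len);
        abort();
    }
    return &s.base[i];
}
```

    Every function that used to take a bare int * would take the struct
    instead, so the length can no longer get separated from the pointer.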

    IIUC in your example the array was global, so the compiler knew its
    bound and in principle could generate bounds checks. But
    I am not aware of a C compiler which actually generates such
    checks. AFAIK gcc sanitize options are doing a somewhat different
    thing. Tiny C has an option to generate bounds checks, but
    it is not clear to me in which cases it is effective (and you
    probably would not use Tiny C for performance-critical code).
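
    For an array whose size is visible at the point of use, the check can
    also be written by hand. The following is a sketch only: AT and
    checked_index are hypothetical helpers, and the array keeps the
    too-small size from the original post so that the faulty access is the
    one that aborts. (GCC and Clang also document a -fsanitize=bounds
    option that instruments indexing of arrays whose size is known at
    compile time.)

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Element count of a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: validate an index, aborting at the faulty access. */
size_t checked_index(size_t i, size_t len, const char *file, int line)
{
    if (i >= len) {
        fprintf(stderr, "%s:%d: index %zu out of bounds (len %zu)\n",
                file, line, i, len);
        abort();
    }
    return i;
}

/* Index an array whose size the compiler knows, with a run-time check. */
#define AT(a, i) \
    ((a)[checked_index((size_t)(i), ARRAY_LEN(a), __FILE__, __LINE__)])

static int *socks[0xffff];   /* same (too small) size as in the original post */

void update_my_socks(int *sock, int val)
{
    AT(socks, val & 0xffff) = sock;   /* aborts here when the index is 0xffff */
}
```

    The macro only works where the array itself is in scope; once the
    array has been converted to a pointer, sizeof no longer reports the
    array size.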

    Note that in C++ when you use C arrays, you have the same
    situation as in C. But you can instead use array classes which
    check accesses.
    --
    Waldek Hebisch
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mateusz Viste@mateusz@not.gonna.tell to comp.lang.c on Sun Jun 15 20:27:17 2025
    From Newsgroup: comp.lang.c

    On 15.06.2025 15:57, antispam@fricas.org wrote:
    IIUC in your example the array was global, so compiler knew its
    bound and in principle could generate bounds checks. But
    I am not aware of C compiler which actually generate such
    checks.

    There was one apparently as early as 1983 :)

    https://www.doc.ic.ac.uk/~afd/rarepapers/KendallBccRuntimeCheckingsforC.pdf

    Granted, it wasn’t a full-fledged C compiler, more of a bounds-checking code
    generator. Still, the paper is a fascinating read and highlights that this
    topic has been explored for quite some time. A more recent variation on the
    theme can be seen here (based on GCC BP, abandoned a couple years ago):

    https://www.cs.purdue.edu/homes/xyzhang/fall07/Papers/TR181.pdf

    That said, detecting out-of-bounds array access is no panacea. Memory
    corruption can arise from various sources, such as dangling pointers or
    poorly managed pointer arithmetic. Hence why I was looking in the direction
    of the MMU. All compilation units of a program share the same set of TLBs.
    I figured there might perhaps be a way to isolate a given compilation unit
    in different TLBs, effectively sandboxing its memory, then make this unit
    communicate with the rest of the program via shm when shared memory
    accesses are needed.

    Of course, even if such a solution were possible, it would not be very
    practical. Besides, one could easily achieve the same isolation by turning
    that compilation unit into a standalone, service-providing daemon.
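
    Short of a separate process, one page-level approximation on POSIX
    systems is to keep a unit's private state on its own page and leave
    that page read-only except while the owning unit updates it. This is a
    sketch under assumptions: list_state and the state_* functions are
    hypothetical names, the code is not thread-safe, and each update costs
    two system calls.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical private state of one compilation unit. */
struct list_state {
    void *current;               /* e.g. a position in a linked list */
};

static struct list_state *state; /* points into the protected mapping */
static size_t state_size;

/* Map the state on a dedicated page and make it read-only.
   Returns 0 on success, -1 on failure. */
int state_init(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    state_size = (size_t)page;
    void *p = mmap(NULL, state_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    state = p;                   /* anonymous mappings start zero-filled */
    return mprotect(state, state_size, PROT_READ);
}

/* The only writer: unlock the page, update, lock it again. */
int state_set_current(void *pos)
{
    if (mprotect(state, state_size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    state->current = pos;
    return mprotect(state, state_size, PROT_READ);
}

void *state_get_current(void)
{
    return state->current;
}
```

    A stray write from any other unit into this page would then fault at
    the offending instruction rather than surfacing later as a corrupted
    pointer.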

    Mateusz
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Jun 15 23:50:15 2025
    From Newsgroup: comp.lang.c

    Mateusz Viste <mateusz@not.gonna.tell> wrote:

    That said, detecting out-of-bounds array access is no panacea. Memory corruption can arise from various sources, such as dangling pointers or poorly managed pointer arithmetic.

    AFAICS there is no reason for explicit pointer arithmetic in well
    written C programs. Implicit pointer arithmetic (coming from array
    indexing) is done by the compiler so should be no problem. As in the
    case of bounds checking, using other languages can help in avoiding
    dangling pointers.

    Hence why I was looking in the direction
    of the MMU. All compilation units of a program share the same set of TLBs.
    I figured there might perhaps be a way to isolate a given compilation unit
    in different TLBs, effectively sandboxing its memory, then make this unit communicate with the rest of the program via shm when shared memory
    accesses are needed.

    Changing TLB contents is rather expensive. Also, what is "its memory"
    supposed to mean? Normally functions in a C program pass pointers
    to other functions, so several functions can legally access rather
    large and time-varying parts of memory. The best approximation to
    your idea available in PC hardware is 286/386 segmentation. But
    it proved to be quite inconvenient, so "everybody" is now using flat
    mode. One could try to emulate segmentation using paging hardware,
    and your idea clearly goes in such a direction, but it is unlikely
    to work well.
    --
    Waldek Hebisch
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Mon Jun 16 01:01:35 2025
    From Newsgroup: comp.lang.c

    On 2025-06-15, Waldek Hebisch <antispam@fricas.org> wrote:
    Mateusz Viste <mateusz@not.gonna.tell> wrote:

    That said, detecting out-of-bounds array access is no panacea. Memory
    corruption can arise from various sources, such as dangling pointers or
    poorly managed pointer arithmetic.

    AFAICS there is no reason for explicit pointer arithmetic in well
    written C programs.

    LOL, you heard it here.

    Implicit pointer arithmetic (coming from array
    indexing) is done by compiler so should be no problem. Like in

    Array indexing *is* pointer arithmetic.

    Are you not aware of this equivalence?

    (E1)[(E2)] <---> *((E1) + (E2))

    In fact, let's draw the commutative diagram

    (E1)[(E2)] <---> *((E1) + (E2))
        ^                   ^
        |                   |
        |                   |
        v                   v
    (E2)[(E1)] <---> *((E2) + (E1))

    You're not saying anything here other than that you like the p[i]
    /notation/ better than *(p + i), and &p[i] better than p + i.
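
    The equivalence can be checked directly. A small sketch (the function
    name same_element is made up for illustration):

```c
#include <stddef.h>

/* Returns 1 if all spellings designate the same element of the array
   that p points into. */
int same_element(int *p, size_t i)
{
    return &p[i] == p + i
        && &i[p] == p + i          /* i[p] is legal: E1[E2] is *((E1)+(E2)) */
        && &*(p + i) == &p[i]
        && &*(i + p) == &p[i];
}
```

    For any pointer p into an array and any in-range i, the function
    returns 1.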

    Great, thanks for sharing!

    You're not doing yourself any favor by confusing
    "not styled in my taste" with "not well written".
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 16 10:00:34 2025
    From Newsgroup: comp.lang.c

    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    On 2025-06-15, Waldek Hebisch <antispam@fricas.org> wrote:
    Mateusz Viste <mateusz@not.gonna.tell> wrote:

    That said, detecting out-of-bounds array access is no panacea. Memory
    corruption can arise from various sources, such as dangling pointers or
    poorly managed pointer arithmetic.

    AFAICS there is no reason for explicit pointer arithmetic in well
    written C programs.

    LOL, you heard it here.

    Implicit pointer arithmetic (coming from array
    indexing) is done by compiler so should be no problem. Like in

    Array indexing *is* pointer arithmetic.

    Are you not aware of this equivalence?

    (E1)[(E2)] <---> *((E1) + (E2))


    Learn to read.

    In fact, let's draw the commutative diagram

    (E1)[(E2)] <---> *((E1) + (E2))
        ^                   ^
        |                   |
        |                   |
        v                   v
    (E2)[(E1)] <---> *((E2) + (E1))

    You're not saying anything here other than that you like the p[i]
    /notation/ better than *(p + i), and &p[i] better than p + i.

    The indexing notation at least has a chance of being automatically
    checked (in cases when the compiler/checker knows the array size). With
    arbitrary user-written pointer arithmetic there is no hope of automatic
    checking.
    --
    Waldek Hebisch
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jun 16 06:12:01 2025
    From Newsgroup: comp.lang.c

    On 2025-06-16 06:00, Waldek Hebisch wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    ...
    You're not saying anything here other than that you like the p[i]
    /notation/ better than *(p + i), and &p[i] better than p + i.

    The indexing notation at least has a chance of being automatically
    checked (in cases when the compiler/checker knows the array size). With
    arbitrary user-written pointer arithmetic there is no hope of automatic
    checking.

    Since they are, by definition, equivalent, *(p+i) can be
    automatically checked under precisely the same situations where p[i] can
    be checked. It makes NO difference.


  • From Louis Krupp@lkrupp@invalid.pssw.com.invalid to comp.lang.c on Mon Jun 16 06:29:30 2025
    From Newsgroup: comp.lang.c

    On 6/11/2025 7:32 AM, Mateusz Viste wrote:
    This might not be a strictly C question, but it definitely concerns all
    C programmers.

    Earlier today, I fixed an out-of-bounds write bug. An obvious issue:

    static int *socks[0xffff];

    void update_my_socks(int *sock, int val) {
        socks[val & 0xffff] = sock;
    }

    <snip>

    Imagine an alternate universe in which array declarations took the form (borrowed from Unisys ALGOL):

    array_name[lower_bound : upper_bound]

    The array in question would have been declared

    static int *socks[0 : 0xffff]

    The mask 0xffff and the upper bound would have been the same, and the
    code would have been obviously right instead of subtly wrong.
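
    Standard C can approximate the inclusive-bound declaration by naming
    the highest valid index and deriving both the array size and the mask
    from it. A sketch, where SOCKS_MAX_INDEX and lookup_my_socks are names
    made up for illustration:

```c
#include <stddef.h>

/* Highest valid index, playing the role of ALGOL's upper_bound. */
#define SOCKS_MAX_INDEX 0xffff

/* C declares the element count, so an inclusive bound needs the +1. */
static int *socks[SOCKS_MAX_INDEX + 1];

void update_my_socks(int *sock, int val)
{
    /* Mask and bound now come from one constant. */
    socks[val & SOCKS_MAX_INDEX] = sock;
}

/* Hypothetical accessor, for illustration only. */
int *lookup_my_socks(int val)
{
    return socks[val & SOCKS_MAX_INDEX];
}
```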

    Louis


  • From Mateusz Viste@mateusz@x.invalid to comp.lang.c on Mon Jun 16 15:01:28 2025
    From Newsgroup: comp.lang.c

    On Mon, 16 Jun 2025 06:29:30 Louis Krupp wrote:
    Imagine an alternate universe in which array declarations took the
    form (borrowed from Unisys ALGOL):

    array_name[lower_bound : upper_bound]

    This alternate C universe you describe looks appealing, but I strongly
    suspect it is currently tormented by violent conflicts between the
    noble 0-based traditionalists, the idealistic 1-based reformists, and
    the rogue "random-based" anarchists. Our C is not perfect, but we could
    have ended up with much worse.

    Mateusz

  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 16 06:10:46 2025
    From Newsgroup: comp.lang.c

    antispam@fricas.org (Waldek Hebisch) writes:

    Mateusz Viste <mateusz@not.gonna.tell> wrote:

    That said, detecting out-of-bounds array access is no panacea. Memory
    corruption can arise from various sources, such as dangling pointers or
    poorly managed pointer arithmetic.

    AFAICS there is no reason for explicit pointer arithmetic in well
    written C programs.

    This assertion is in effect a No True Scotsman statement.

    Implicit pointer arithmetic (coming from array
    indexing) is done by compiler so should be no problem.

    Even if there is no direct manipulation ("pointer arithmetic") of
    pointer variables, access can be checked only if array bounds
    information is available, and in many cases it isn't. The reason is
    (among other things) C doesn't have array parameters; what it does
    have instead is pointer parameters. At the point in the code when
    an "array" access is to be done, the information needed to check
    that an index value is in bounds just isn't available. The culprit
    here is not explicit pointer arithmetic, but lacking the information
    needed to do a bounds check. That lack is inherent in how the C
    language works with respect to arrays and pointer conversion.
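
    A short sketch of the point about pointer parameters: the callee sees
    only a pointer, so any bounds check depends on the caller passing the
    element count by hand. The function name get_checked is hypothetical.

```c
#include <stddef.h>

/* Valid only where a true array (not a pointer) is in scope. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* 'const int a[]' in a parameter list means 'const int *a'; the length
   is not part of the type and has to be supplied separately. */
int get_checked(const int a[], size_t len, size_t i, int *out)
{
    if (i >= len)
        return -1;           /* out of bounds: report instead of reading */
    *out = a[i];
    return 0;
}
```

    A caller that still has the array in scope would write
    get_checked(v, ARRAY_LEN(v), i, &out). Nothing in the language stops
    it from passing a wrong length.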
  • From Rosario19@Ros@invalid.invalid to comp.lang.c on Mon Jun 16 18:14:05 2025
    From Newsgroup: comp.lang.c

    On Thu, 12 Jun 2025 19:15:26 +0100, Richard Heathfield wrote:
    Sure. Or some people prefer to single-step with a debugger. Such
    people can make their lives a little easier by surrounding the
    buffer with sentinel soldiers, setting the sentinel soldiers to a
    magic number, and putting a watch on them both - the buffer high
    soldier and the buffer low soldier.

    I think an out-of-bounds write often lands on the memory right next to
    the array's two limits... but there are cases where that memory is left
    intact and the write still lands outside the array, somewhere else.
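
    The sentinel idea quoted above can be sketched as follows. The names
    guarded_buf, guarded_init and guarded_ok and the magic value are
    arbitrary choices for illustration. As noted, it only detects writes
    that actually hit one of the two sentinels.

```c
#include <stdint.h>
#include <string.h>

#define GUARD_MAGIC 0xDEADBEEFu    /* arbitrary magic number */
#define BUF_LEN     64

/* Buffer bracketed by a low and a high sentinel. */
struct guarded_buf {
    uint32_t      low;
    unsigned char data[BUF_LEN];
    uint32_t      high;
};

void guarded_init(struct guarded_buf *g)
{
    g->low = GUARD_MAGIC;
    memset(g->data, 0, sizeof g->data);
    g->high = GUARD_MAGIC;
}

/* Returns 1 while both sentinels are intact. */
int guarded_ok(const struct guarded_buf *g)
{
    return g->low == GUARD_MAGIC && g->high == GUARD_MAGIC;
}
```

    Calling guarded_ok() at checkpoints, or putting a debugger watchpoint
    on low and high, narrows down when an adjacent overrun happened.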
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 16 16:47:26 2025
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:

    Mateusz Viste <mateusz@not.gonna.tell> wrote:

    That said, detecting out-of-bounds array access is no panacea. Memory
    corruption can arise from various sources, such as dangling pointers or
    poorly managed pointer arithmetic.

    AFAICS there is no reason for explicit pointer arithmetic in well
    written C programs.

    This assertion is in effect a No True Scotsman statement.

    Implicit pointer arithmetic (coming from array
    indexing) is done by compiler so should be no problem.

    Even if there is no direct manipulation ("pointer arithmetic") of
    pointer variables, access can be checked only if array bounds
    information is available, and in many cases it isn't. The reason is
    (among other things) C doesn't have array parameters; what it does
    have instead is pointer parameters. At the point in the code when
    an "array" access is to be done, the information needed to check
    that an index value is in bounds just isn't available. The culprit
    here is not explicit pointer arithmetic, but lacking the information
    needed to do a bounds check. That lack is inherent in how the C
    language works with respect to arrays and pointer conversion.

    Yes, I wrote this in an earlier message. Here OP concern was
    specifically "poorly managed pointer arithmetic".
    --
    Waldek Hebisch
  • From Richard Heathfield@rjh@cpax.org.uk to comp.lang.c on Mon Jun 16 17:53:31 2025
    From Newsgroup: comp.lang.c

    On 16/06/2025 17:14, Rosario19 wrote:
    On Thu, 12 Jun 2025 19:15:26 +0100, Richard Heathfield wrote:

    Sure. Or some people prefer to single-step with a debugger. Such
    people can make their lives a little easier by surrounding the
    buffer with sentinel soldiers, setting the sentinel soldiers to a
    magic number, and putting a watch on them both - the buffer high
    soldier and the buffer low soldier.

    I think out of bound of the array many times there is a write of the 2
    limit bounds memory... but there are cases where bound are ok but
    memory is written out the array the same, in some other places

    <whoosh>
    --
    Richard Heathfield
    Email: rjh at cpax dot org dot uk
    "Usenet is a strange place" - dmr 29 July 1999
    Sig line 4 vacant - apply within

  • From olcott@polcott333@gmail.com to comp.lang.c on Sat Jun 21 15:49:10 2025
    From Newsgroup: comp.lang.c

    On 6/11/2025 8:32 AM, Mateusz Viste wrote:
    This might not be a strictly C question, but it definitely concerns all
    C programmers.

    Earlier today, I fixed an out-of-bounds write bug. An obvious issue:

    static int *socks[0xffff];

    void update_my_socks(int *sock, int val) {
        socks[val & 0xffff] = sock;
    }

    While the presented issue is common knowledge for anyone familiar with
    C, *locating* the bug was challenging. The program did not crash at the
    moment of the out-of-bounds write but much later - somewhere entirely
    different, in a different object file that maintained a static pointer
    for tracking a position in a linked list. To my surprise, the pointer
    was randomly reset to NULL about once a week, causing a segfault.
    Tracing this back to an unrelated out-of-bounds write elsewhere in the
    code was tedious, to say the least.

    This raises a question: how can such corruptions be detected sooner?
    Protected mode prevents interference between programs but doesn’t
    safeguard a program from corrupting itself. Is there a way to enforce
    memory protection between module files of the same program? After all,
    static objects shouldn't be accessible outside their compilation unit.

    How would you approach this?

    Mateusz


    https://en.cppreference.com/w/c/types/integer.html
    One way to fix the problem in the above specific
    case is to define: void update_my_socks(int *sock, uint16_t val)
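
    Note that a uint16_t can still hold 0xffff (65535), which is one past
    the last index of a 0xffff-element array, so the array size has to
    change as well. A sketch assuming the array is resized to
    UINT16_MAX + 1; lookup_my_socks is a hypothetical accessor added for
    illustration:

```c
#include <stdint.h>

/* One slot for every possible uint16_t value, 0 through 65535. */
static int *socks[UINT16_MAX + 1];

void update_my_socks(int *sock, uint16_t val)
{
    socks[val] = sock;       /* no mask needed: the type bounds the index */
}

/* Hypothetical accessor, for illustration only. */
int *lookup_my_socks(uint16_t val)
{
    return socks[val];
}
```

    An int argument passed by a caller is converted to uint16_t modulo
    65536, which has the same effect as the original mask.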
    --
    Copyright 2025 Olcott "Talent hits a target no one else can hit; Genius
    hits a target no one else can see." Arthur Schopenhauer
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Jul 1 09:54:36 2025
    From Newsgroup: comp.lang.c

    Mateusz Viste <mateusz@not.gonna.tell> writes:

    On 14.06.2025 01:31, Tim Rentsch wrote:

    It isn't wrong to think of bitwise-and as masking-in (or possibly
    masking-out) of certain bits, but it still isn't a modulo. A
    modulo operation is what is desired;

    By "different viewpoints," I meant that while you approach the
    problem by applying a modulo operation to the index so it fits the
    array size, I tend to think in terms of ensuring the index
    correctly maps to a location within an n-bit address space.
    Naturally, the array should accommodate the maximum possible index
    for the given address space, and that's where the original code
    fell short. And you're absolutely right that hardcoded values are
    problematic, the size of the array should have been linked with
    the n-bits address space expectation.

    I understand what you're doing. However one thinks of it, what is
    needed is a way to ensure the produced index value is in the range
    of array index values, and that the mapping covers the full range of
    array index values. Using bitwise-and is a way of solving a less
    general problem. Unfortunately: one, although it is known that
    using bitwise-and works only for certain array sizes, there was no
    check or assertion in the code to verify that requirement; two,
    it's a holdover from earlier times when the performance difference
    might matter, but now it's a premature optimization (and in most
    cases does not result in any improvement); and three, in this case
    using bitwise-and contributed to the bug, which wouldn't have
    happened if modulo had been used instead.
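
    A sketch of the two options described above: a modulo tied to the
    actual element count, and, if a mask is kept, a compile-time assertion
    of the power-of-two requirement it relies on. The names socks_index,
    masked_index and MASKED_LEN are made up for illustration, and the
    index parameter is unsigned here so the remainder cannot be negative.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static int *socks[0xffff];          /* any element count works with modulo */

/* Map any value onto 0 .. ARRAY_LEN(socks)-1. */
size_t socks_index(unsigned val)
{
    return val % ARRAY_LEN(socks);
}

void update_my_socks(int *sock, unsigned val)
{
    socks[socks_index(val)] = sock;
}

/* If a mask is kept instead, make its requirement explicit. */
#define MASKED_LEN 0x10000
_Static_assert((MASKED_LEN & (MASKED_LEN - 1)) == 0,
               "mask indexing needs a power-of-two length");

size_t masked_index(unsigned val)
{
    return val & (MASKED_LEN - 1);
}
```

    With the modulo version, the value 0xffff maps to index 0 instead of
    writing one element past the end.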