• A neat hash table implementation

    From highcrew@high.crew3868@fastmail.com to comp.lang.c on Tue Apr 7 23:38:01 2026
    From Newsgroup: comp.lang.c

    The most annoying bit of C, to me, is the lack of a decent hash table implementation.

    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a
    delete primitive, and it can only use string keys.

    I recently stumbled into klib: https://attractivechaos.github.io/klib/
    Klib has a nice header-only hash-table implementation: khash.h

    To be honest, I'm not a huge fan of vendoring header-only libraries.
    It is also a little hard to understand at first, since it makes heavy
    use of cpp macros. Yet it definitely captured my attention.

    What I like most:

    - A reasonable API
    - It is possible to redefine malloc/calloc/free before including the
      header. This allows me to use a memory pool, if I so wish
    - Proper handling of malloc failures (in contrast with glib2 and
      uthash)
    - It has a kh_resize() primitive, which I can use to pre-allocate the
      table to its maximum size, if I want to avoid re-allocations
      (of course I need to free up entries once kh_size() == max)

    So far I couldn't find any obvious shortcoming, so I thought of sharing
    it with this NG, in case anyone missed it.
    --
    High Crew
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From jmj@jmj@energokod.gda.pl to comp.lang.c on Wed Apr 8 05:14:36 2026

    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes in a first-year computer science student's textbook?
    --
    Jacek Marcin Jaworski, Pruszcz Gd., Pomeranian Voivodeship, Poland 🇵🇱, EU 🇪🇺;
    tel.: +48-609-170-742, best between 5:00-5:55 or 16:00-17:25; <jmj@energokod.gda.pl>, gpg: 4A541AA7A6E872318B85D7F6A651CC39244B0BFA;
    Home WWW page: <https://energokod.gda.pl>;
    Mini Netiquette: <https://energokod.gda.pl/MiniNetykieta.html>;
    Email Self-Defense: <https://emailselfdefense.fsf.org/pl>.
    NOTE:
    DON'T TAKE ON "HIDDEN DEBT"! PAY FOR FOSS SOFTWARE AND ONLINE INFORMATION! READ THE FREE "17th Totaliztic Report - Patrons Versus Bankers": <https://energokod.gda.pl/raporty-totaliztyczne/17.%20Patroni%20Kontra%20Bankierzy.pdf>

  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Wed Apr 8 09:29:13 2026

    On 4/8/26 5:14 AM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    Did you know that these "limitations" are standard hash-table attributes in a first-year computer science student's textbook?

    No, I did not know.
    --
    High Crew
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Apr 8 10:42:26 2026

    On 2026-04-07 23:38, highcrew wrote:
    The most annoying bit of C, to me, is the lack of a decent hash table implementation.

    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a delete primitive, and it can only use string keys.

    I recently stumbled into klib: https://attractivechaos.github.io/klib/
    Klib has a nice header-only hash-table implementation: khash.h

    To be honest, I'm not a huge fan of vendoring header-only libraries.
    It is also a little hard to understand at first, since it makes heavy
    use of cpp macros. Yet it definitely captured my attention.

    What I like most:

    - A reasonable API
    - It is possible to redefine malloc/calloc/free before including the
      header. This allows me to use a memory pool, if I so wish
    - Proper handling of malloc failures (in contrast with glib2 and
      uthash)
    - It has a kh_resize() primitive, which I can use to pre-allocate the
      table to its maximum size, if I want to avoid re-allocations
      (of course I need to free up entries once kh_size() == max)

    So far I couldn't find any obvious shortcoming, so I thought of sharing
    it with this NG, in case anyone missed it.


    The lack of hash tables I consider indeed a severe limitation if you're
    used to simple data mappings - hail to Awk for that - and contributions
    welcome (for any language without that feature builtin).

    Beyond "[not] any obvious shortcoming"; have you practical experiences
    with using that library?

    The CPP macros are certainly something I despise. They seem to be used
    to provide some primitive form of genericity; is that observed correctly?

    Are recent "C" standards (meanwhile) providing such functionality?

    Janis

  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Wed Apr 8 21:44:58 2026

    On 4/8/26 10:42 AM, Janis Papanagnou wrote:
    Beyond "[not] any obvious shortcoming"; have you practical experiences
    with using that library?

    Not yet, but I'm currently integrating khash into a personal project.
    If anyone is interested, I can share my thoughts later on...

    The CPP macros are certainly something I despise. They seem to be used to provide some primitive form of genericity; is that observed correctly?

    Correct.

    Quoting:
    Generic programming in C. Except my khash.h, all the other C hash
    libraries use (void*) to achieve generic typing. Using void* is okey for strings, but will cause overhead for integers. This is why all C
    libraries, except khash.h, is slower than C++ libraries on integer keys,
    but close to on string keys.

    https://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/
    (see "Generic programming in C")

    Are recent "C" standards (meanwhile) providing such functionality?

    AFAIK, the hsearch interface is the closest thing provided by POSIX,
    and hsearch_r (which I consider the usable variant) is a GNU extension.
    --
    High Crew
  • From makendo@makendo@makendo.invalid to comp.lang.c on Fri Apr 10 10:40:17 2026

    Quoting:
    Generic programming in C. Except my khash.h, all the other C hash
    libraries use (void*) to achieve generic typing. Using void* is okey for
    strings, but will cause overhead for integers. This is why all C
    libraries, except khash.h, is slower than C++ libraries on integer keys,
    but close to on string keys.

    Barring macro usage, you also have to use function pointers to compare
    equality or hash an arbitrary object, and I dunno if compilers can know
    enough to devirtualize and inline these. Another source of overhead.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Apr 10 05:42:47 2026

    Take unordered_map<>, that's ten times more comfortable than any C solution.
  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 01:17:44 2026

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a
    delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty much everyone rolls custom nearly every time they need one.

    So, fast and easy to implement for a specific task;
    But, more difficult to "generalize" without also ruining any advantages
    it might have otherwise had (almost invariably the code would end up an
    order of magnitude or more bulkier and slower than had they rolled
    something specifically for the task at hand).

    So, generally, no one really generalizes or attempts to reuse them,
    because doing so is almost invariably worse than writing something one-off.


    By the time one has generalized it to the point of dealing with multiple
    key and value types, one has often gotten much of the way to
    implementing something like a dynamic type-system or similar, which is
    alas another one of those things that people either avoid entirely, or
    tend to just end up rolling their own.

    And, if one tries to generalize it enough that one could in theory use
    it across multiple projects, almost invariably one will end up with the
    sort of thing that no one wants to poke with a stick.

    And almost invariably someone else will think it a good idea to have it
    use a Garbage Collector, etc, and thus far pretty much no one has
    managed to make a garbage collector that manages to "consistently not suck".

    So, the pattern continues...



    That said, one does tend to develop a toolbox of patterns they can use whenever they need to roll a hash table or whatever else.

    And, there are some semi-difficult problems, like how to most
    efficiently hash a variable-length string.


    Well, decided to leave out going into things like how to efficiently
    hash strings, as getting fast string hashing here may involve use of techniques that some people might consider, unnatural.

    Well, and also someone at the level of asking about reusable hash tables
    is likely not at the level of the relative dark arts of
    speed-optimized string handling (well, of the sort that involves wild
    pointer casting, treating blobs of characters as integer data, and
    using arithmetic tricks to detect the presence of a NUL terminator).



    But, then again, arguably someone at the level of asking about generic
    hash implementations might not be in a good position to write moderately efficient hand-rolled code (so, the inefficiency of trying to use
    something generic might not necessarily be worse than if they try to
    write it themselves).

    I guess someone could try to write an implementation where the hash
    functions and similar are provided by vtables with an interface
    mechanism along vaguely similar lines to qsort, and an initialization mechanism that could fill in pointers for some common cases (such as
    common power-of-2 sizes for payload data, ...).

    Well, and at this level, one can question if hash-tables are the best interface (by the time someone is paying these sorts of costs, one may
    have crossed into a realm where a B-Tree or similar might be a better
    option, etc).

    Or, people may use the term "hash table" when they mean "some sort of
    generic key/value mapping" without caring about the underlying algorithm
    (say, for example, if one has a dictionary-object like in a language
    like JavaScript or similar, do they really need to care whether it is implemented on top of a hash table, B-Tree, or AVL-Tree, ?...).

    Sometimes, interface overhead costs may be hard to work around.

    ...


  • From Michael S@already5chosen@yahoo.com to comp.lang.c,comp.lang.c++ on Fri Apr 10 13:13:40 2026

    On Fri, 10 Apr 2026 05:42:47 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Take unordered_map<>, that's ten times more comfortable than any C
    solution.

    First, your reply is mildly off topic in comp.lang.c.
    Second, my reply is going to be even more off topic. Not that it stops
    me.
    In order to remain partly on topic I'm cross-posting to comp.lang.c++

    A. For most common cases I agree.

    B. For less common cases, where the standard library does not provide a
    ready-made hash function for your Key type, it is less clear.
    On one hand, implementing only a custom hash function (or functor) is
    less work than implementing a full associative array.
    On the other hand, it's harder work. The exact requirements for Hash are not
    specified in easy-to-read language. I don't know about you, but to me
    personally it feels like walking into a minefield.
    So, personally, I'd rather code a full associative array than a Hash for
    std::unordered_map<>. More work, but a better feeling of control.

    All that is rather different from supplying a comparison primitive for
    std::map<>, where the interface and requirements are crystal clear and
    the task itself is much simpler.

    C. std::unordered_map<> can be surprising.
    I think it was you who showed us such a case a couple of years ago,
    when you tried to use string literals as keys.

    D. Performance.
    For an optimized "Release" build, the performance of std::unordered_map<>
    is typically very good. But for a non-optimized "Debug" build it is
    typically very bad.
    That's unlike C-style implementations in a separately compiled library:
    there the performance is maybe not quite as good as a "Release" build of
    std::unordered_map<>, but close. And it remains the same under a
    "Debug" build.

    E. Compilation time. Slow.
    It may not be a problem when unordered_map is used in just a few
    compilation units. It could be a serious problem otherwise.


  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Apr 10 13:25:12 2026

    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.

    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.
    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is different.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Fri Apr 10 13:07:47 2026

    On 10.04.2026 at 12:13, Michael S wrote:

    On the other hand, it's harder work. The exact requirements for Hash are not specified in easy-to-read language. I don't know about you, but to me personally it feels like walking into a minefield.

    Most hashers are 5min of work.

    C. std::unordered_map<> can be surprising.
    I think it was you who showed us such a case a couple of years ago,
    when you tried to use string literals as keys.

    Maybe that was before C++17. Today I'd use a string_view.

    E. Compilation time. Slow.
    It may not be a problem when unordered_map is used in just a few
    compilation units. It could be a serious problem otherwise.

    I'm using a compiler which supports C++20 modules.
    With that, compile times are an order of magnitude lower.
  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 13:00:53 2026

    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY attitude. Please realize that for better or worse the bigger world is different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.


    Then again, I can note that (for example) in the Doom engine, the
    original WAD code, while it did use a trick to speed up lump-name
    checking, was still using linear search to lookup WAD lumps.

    Something like (from memory):
      for(i=0; i<numwadlumps; i++)
      {
        if( (((int *)waddir[i].name)[0] == ((int *)name)[0]) &&
            (((int *)waddir[i].name)[1] == ((int *)name)[1]) )
          break;
      }

    But, this could be sped up some, say:
      llname = *(long long *)name;
      h = ((int *)name)[0] ^ ((int *)name)[1];
      h ^= h >> 17;
      h ^= h >> 7;
      h &= 255;
      c = waddirhash[h];
      while(c>=0)
      {
        if( *(long long *)(waddir[c].name) == llname )
          break;
        c = waddir[c].chain;
      }

    This doesn't feel too atypical, except that some people don't like this
    sort of wanton pointer casting and similar.



    But, as noted, gets harder if one wants a fast hash for a
    variable-length string, say:
      lcomask=0x8080808080808080ULL;
      lcoadj =0x0101010101010101ULL;
      v=*(uint64_t *)str;
      s=str; va=0;
      //check for NUL terminators
      while(!((v-lcoadj)&((~v)&lcomask)))
      {
        va^=(va>>17);
        va^=(va<< 7);
        va^=v;
        s+=8;
        v=*(uint64_t *)s;
      }
      //trailing bytes
      for(i=0; i<64; i+=8)
      {
        c=(v>>i)&255;
        if(!c) break; // NUL byte
        va^=(va>>17);
        va^=(va<< 7);
        va^=(c<<13);
      }
      //generate final hash
      v=va^(va>>31);
      v^=v>>17;
      v^=v>>11;
      h=v&4095;
      return(h);

    As opposed to a simpler but slower:
      h=0;
      while(*s)h=(h*251)+(*s++);
      h=((h*251+7)>>8)&4095;
      return(h);

    But, then again, a lot depends on average string lengths, and if the
    average length is less than around 8 characters or so, the latter may
    still be competitive (or faster). Or, one may not be worrying too much
    about the speed of the hash function.

    Well, and the former does seem to fall outside the realm of typical C
    coding practice.

    Also this particular algorithm also assumes a machine with fast
    unaligned memory loads, ... Whereas the naive strategy may be safer if
    one allows for targets which may be alignment sensitive, ...


    Though, a similar approach can be used to determine things like string
    length, or to implement things like "strcmp()".

    Where, for example, the C library I am using in some of my projects had originally been using more naive string functions (looping and working character by character), but this was a lot slower.

    Even if the faster versions can look like ugly hair...


    Well, and sometimes there are cases where things like AVL Trees or
    B-Trees or similar are preferable (for example, they may scale better
    and use less memory, even if the average time lookup cost is higher).

    But, hashes can work well as a good quick/dirty way to speed up lookups,
    etc.


    Well, or sometimes they can be combined:
    A small hash-table as a lookup cache on the front-end of a B-Tree;
    ...


    Etc...

  • From Bart@bc@freeuk.com to comp.lang.c on Fri Apr 10 21:07:54 2026

    On 10/04/2026 11:25, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    C programmers or all programmers? Since I'd guess the majority of the
    latter wouldn't know *how* to implement a hash table, if they even know
    what it is.


    It seems, you mostly converse with people that share your extreme DIY attitude. Please realize that for better or worse the bigger world is different.

    C encourages DIY code, by not having any standard libraries for many
    things, by omitting certain language features, by being used to
    /implement such/ things for other languages where they are built-in.

    For hash tables however, I agree with BGB: a hash algorithm, once you've
    got one that works well for your task, is a few lines of code. If self-implemented, rather than buried in a library with a clunky 'general purpose' interface, it can be made very tight and efficient.

    Hash tables have simple criteria: how long it takes to work out the hash function, and how many clashes arise. Those are easy to test and compare.

  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 16:13:06 2026

    On 4/10/2026 3:07 PM, Bart wrote:
    On 10/04/2026 11:25, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?

    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    C programmers or all programmers? Since I'd guess the majority of the
    latter wouldn't know *how* to implement a hash table, if they even know
    what it is.


    Probably true, the guys using JavaScript and Python probably have little
    need to write their own.

    And the guys using Java and C++ and similar can use what the libraries provide.


    Contrast, C is often used in cases where you don't even know for certain
    if there is an OS...



    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.

    C encourages DIY code, by not having any standard libraries for many
    things, by omitting certain language features, by being used to /
    implement such/ things for other languages where they are built-in.

    For hash tables however, I agree with BGB: a hash algorithm, once you've
    got one that works well for your task, is a few lines of code. If
    self-implemented, rather than buried in a library with a clunky 'general
    purpose' interface, it can be made very tight and efficient.

    Hash tables have simple criteria: how long it takes to work out the hash function, and how many clashes arise. Those are easy to test and compare.


    Yeah, you find some things that work, and end up doing that every time
    it comes up.


    Not everything needs to be a big general purpose library.
    Well, and if one does, it can turn into a feedback loop of trying to
    make things ever bigger and more complex in an attempt to address every possible use-case. Rather than just the very specific scenario that
    prompted use of hashing as a workaround (most often cases of "well,
    using linear search was too slow here", or other types of lookup, ...).


    So, in my thinking, they still fall in a similar category to things like linked lists.


    Contrast, if there is something that is more deserving of a library
    interface, it is something like sorting algorithms:
    Implementing a naive sorting algorithm (like selection or bubble sort)
    is often slow;
    Something like quicksort is involved and fiddly enough to where it
    doesn't make sense to freehand it every time it comes up;
    ...

    But, at the same time, you don't generally use a library call like
    "qsort()" to select the smallest of 3 or 4 numbers or to sort 5 to 10
    items, as the constant cost of using "qsort()" will be higher than
    whatever one could potentially "save" by using it (and internally, these
    tend to fall back to a naive sorting algorithm whenever the subset falls
    below some minimum number of items; so it may be extra API overhead just
    to have it fall back to bubble sort or similar anyways).



    But, on the other side, one might end up looking back at their code and wondering why exactly they ended up with four different implementations
    of math code for doing software emulation of Binary128 numbers (each
    written for a specific use-case, but the underlying ideas of Binary128 floating-point math don't tend to vary as much as the specific forms the values take and the contexts in which it was used, *).

    *:
    A struct with two 64-bit members, for running on a hosted PC-like target;
    One version that is intended to run in an interrupt handler and is given
    values as pointers to pairs of "uint64_t *" numbers (using "__int128"
    math internally);
    One that is written in assembler for a specific CPU ISA;
    One that is written in C and assumes the compiler having __int128 and
    __uint128 types and the ability to use __m128 to facilitate bit-casts
    between __float128 and __uint128.

    Say:
      __float128 x;
      __uint128 vix;
      vix=(__uint128)((__m128)x);
    Which is not exactly standard or portable behavior, but on the target in question is the fastest option (note that trying to use "memcpy()" here
    would wreck performance, as the compiler in question can't optimize away
    the memory stores/loads, but can turn the casts into a direct 128-bit register-pair move, interpreting it as a request to do a direct
    bit-for-bit copy; where leaving out the __m128 cast here would instead
    result in it trying to convert the value into an integer).

    Sometimes, there be dragons here...


    ...


  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Fri Apr 10 17:31:34 2026

    On 2026-04-10 14:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    ...
    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.

    How do you reach that conclusion from the above comment? All that you
    can conclude from the above is that if they think they need a hash
    table, they use one that someone else has already implemented, rather
    than implementing one of their own. I personally have never implemented
    a hash table, but I have used the hashing capabilities built into
    several of the tools I do use. The closest I have come to implementing
    my own hash table is to help someone else fix the hash function they
    were passing to a hashing routine. Their function was producing a
    ridiculously high number of collisions, because they hadn't put any
    thought into the distribution of the data they were hashing.

  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 23:34:41 2026

    On 4/10/2026 4:31 PM, James Kuyper wrote:
    On 2026-04-10 14:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    ...
    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.

    How do you reach that conclusion from the above comment? All that you
    can conclude from the above is that if they think they need a hash
    table, they use one that someone else has already implemented, rather
    than implementing one of their own. I personally have never implemented
    a hash table, but I have used the hashing capabilities built into
    several of the tools I do use. The closest I have come to implementing
    my own hash table is to help someone else fix the hash function they
    were passing to a hashing routine. Their function was producing a ridiculously high number of collisions, because they hadn't put any
    thought into the distribution of the data they were hashing.


    There are few options sometimes.


    Say, for example, you want to look over a list of strings and map them
    to an ordinal number:
      char *strings[]={
        "first",
        "second",
        ...
        NULL
      };


    One could use a "for()" loop, say:
      for(i=0; strings[i]; i++)
        if(!strcmp(strings[i], str))
          break;
    Which is often OK.


    What else, use a 3rd party library? More likely, that is a bigger pain...
    Well, and 3rd party library dependencies are often not worth it.


    What else could you do? Maybe:
      int strings_hash[256];
      short strings_chain[(sizeof(strings)/sizeof(char *))+1];
      ...
      int HashString(char *str)
      {
        char *s;
        int h;
        h=0; s=str;
        while(*s)h=h*251+(*s++);
        return(((h*251+7)>>8)&255);
      }
      int GetIndexInStrings(char *str)
      {
        int i, j;
        if(!strings_hash[255])
        {
          //one-time lazy init (static arrays start zeroed)
          for(i=0; i<256; i++)
            strings_hash[i]=-1;
          for(i=0; strings[i]; i++)
          {
            j=HashString(strings[i]);
            strings_chain[i]=strings_hash[j];
            strings_hash[j]=i;
          }
        }
        i=strings_hash[HashString(str)];
        while(i>=0)
        {
          if(!strcmp(strings[i], str))
            break;
          i=strings_chain[i];
        }
        return(i);
      }
    ...


    I suspect finding code that someone else had written for this would
    likely be more of a hassle than writing it oneself.

    It likely isn't even to the level where it would make sense for someone
    to ask Grok or Gemini to write it...

    It is to the level where one may one-off it just to write a Usenet
    response...


    Or, maybe they want a key/value mapping, or any number of other things
    that can be lumped under the name of "hash table"?...

    well, OK, maybe get a little creative:
      typedef struct SomeSortOfHash_s SomeSortOfHash;
      struct SomeSortOfHash_s {
        short *keys;
        void **vals;
        char shsz;
      };
      SomeSortOfHash *NewSomeSortOfHash(int shsz)
      {
        SomeSortOfHash *tmp;
        tmp=malloc(sizeof(SomeSortOfHash));
        tmp->shsz=shsz;
        tmp->keys=malloc((1<<shsz)*sizeof(short));
        tmp->vals=malloc((1<<shsz)*sizeof(void *));
        memset(tmp->keys, 255, (1<<shsz)*sizeof(short));
        return(tmp);
      }
      int SomeSortOfHash_LookupSlotNumberForOrdinal(
        SomeSortOfHash *tab, short ord)
      {
        int i, j, k, h, sz, szm;
        sz=1<<tab->shsz;
        szm=(sz-1);
        h=((ord*65521+17)>>16)&szm;
        if(tab->keys[h]==ord)
          return(h);
        j=h;
        for(i=0; i<16; i++)
        {
          j=j*31+7;
          k=(j>>5)&szm;
          if(tab->keys[k]==ord)
            return(k);
        }
        //dunno, if near full, fall back to linear search or such.
        for(i=0; i<sz; i++)
          if(tab->keys[i]==ord)
            return(i);
        return(-1);
      }
      int SomeSortOfHash_GetSlotNumberForOrdinal(
        SomeSortOfHash *tab, short ord)
      {
        //nearly the same, just insert the ordinal, or expand table.
        //or do both Lookup and Get with same internal function,
        //using a flag for whether or not to add a new member
      }
      int SomeSortOfHash_LookupSlotNumberForName(
        SomeSortOfHash *tab, char *name)
      {
        int ord;
        ord=GetOrdinalForName(name);
        if(ord<=0)
          return(-1);
        return(SomeSortOfHash_LookupSlotNumberForOrdinal(tab, ord));
      }
    ...

    Well, could fill out the rest, but point is obvious enough.

    Maybe this is enough to start to care about it.


    Granted, I don't use hash tables in this particular way all that often
    (for small key/value dictionaries I had more often ended up using
    certain types of B-Trees or similar).

    Though, ironically, despite being "technically" B-Trees, these are
    often a very different sort of structure from those typically used in
    filesystems or databases (which are often built around disk blocks and
    associations between name-strings and data-blobs, rather than
    associating an ordinal with a pointer or similar).

    ...


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Apr 11 08:20:36 2026
    From Newsgroup: comp.lang.c

    On 2026-04-11 06:34, BGB wrote:
    [...]

    There are few options sometimes.

    Say, for example, you want to look over a list of strings and map them
    to an ordinal number:
      char *strings[]={
      "first",
      "second",
      ...
      NULL
      };
    How do you do it then, exactly?...

    Static lists of strings will get incorporated presorted in programs.
    (I use Unix sort(1) for that, either directly or using my editor.)

    In case an empty array will get populated dynamically from unknown
    input there's the option to insert them in a sorted order instead
    of sorting them.

    [...]

    Granted, I don't use hash tables in this particular way more often (for small key/value dictionaries had more often ended up using certain types
    of B-Trees or similar).

    You really mean B-Trees here, not binary trees? - Compared to binary
    trees, B-Trees are at least a bit more complex to organize. And the
    linked structures of dynamic tree structures require more effort and
    care compared to a static hash array.


    Though, ironically, despite being "technically" B-Trees, often a very different sort of structure from that typically used in filesystems or databases (which are often built around disk blocks and for associations between name-strings and data-blobs, rather than associating an ordinal
    with a pointer or similar).

    In case I'd need a fast tree structure I'd rather (for specific cases)
    use B*-Trees instead of B-Trees; they have better properties. Usually,
    though, I never needed these sorts of trees; I used binary or AVL in
    some (very few) cases.

    BTW, in early days I assumed that Awk realized its associative arrays
    with trees, but Arnold told me that internally he uses a hash array (I
    never checked). Obviously one variant that fits all application cases.

    Janis

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Sat Apr 11 04:19:15 2026
    From Newsgroup: comp.lang.c

    On 4/11/2026 1:20 AM, Janis Papanagnou wrote:
    On 2026-04-11 06:34, BGB wrote:
    [...]

    There are few options sometimes.

    Say, for example, you want to look over a list of strings and map them
    to an ordinal number:
       char *strings[]={
       "first",
       "second",
       ...
       NULL
       };
    How do you do it then, exactly?...

    Static lists of strings will get incorporated presorted in programs.
    (I use Unix sort(1) for that, either directly or using my editor.)

    In case an empty array will get populated dynamically from unknown
    input there's the option to insert them in a sorted order instead
    of sorting them.


    I guess yeah, could use binary search in that case...


    [...]

    Granted, I don't use hash tables in this particular way more often
    (for small key/value dictionaries had more often ended up using
    certain types of B-Trees or similar).

    You really mean B-Trees here, not binary trees? - Compared to binary
    trees, B-Trees are at least a bit more complex to organize. And the
    linked structures of dynamic tree structures require more effort and
    care compared to a static hash array.


    Yes, B-Trees, though for dictionary-type objects I typically used a
    small fixed branching factor, usually 16 or so (each node may have up
    to 16 children).


    So, in this case:
      0-   16 children: 1 level, 1 node
      17-  256 children: 2 levels, 3 to 17 nodes
      257-4096 children: 3 levels

    Typically there was a max of 3 or 4 levels as I tend to usually use
    16-bit ordinal numbers as keys.

    In a few cases:
      0001-7FFF: Identifier
      8000-FFFE: Indexed items.


    Though in BGBCC's ASTs, there was a limit of 4096 AST identifiers, with
    the high bits used to identify element type (node, string, 64-bit
    integer, Binary64 number). With an indexing scheme for up to 32K sub-nodes.

    The ASTs are kinda weird in that they pretend to be XML DOM but
    internally now operate very differently from a conventional DOM API.
    Mostly this is because they were tuned more heavily for performance
    and memory use.

    Also, I quickly found that any sort of variable-size memory allocation
    was something best avoided in any way possible.


    In cases where I had more identifiers, the dictionaries usually
    contained tagged-references, typically (high 4 bits):
      0000: Pointer (48-bit address, 12-bit object type)
      0001: Smaller Literal Types (32-48 bits of payload data)
      0010: Bounded Pointers in BGBCC, 0 based
      0011: Bounded Pointers in BGBCC, variable-based
      01xx: Fixnum (62 bits)
      10xx: Flonum (62 bits, S.E11.M50)
      ...

    Though, the tag-refs are more used in my runtime library and ABI (either
    on my own ISA or RISC-V); with it also using similar B-Tree style
    objects for things like ex-nihilo objects.

    In BGBCC, it is sort of possible to use C language extensions to do
    JavaScript-like stuff, but there is a lot of hair here, as the C and
    JavaScript style type-systems don't mix well (fully preserving C
    semantics inside a dynamic system is not viable, so mixed use exposes
    a lot more sharp corners).

    Well, and also there is a class/instance object system, but it behaves
    more like a Java/C# style object system, rather than a C++ style system.


    There is no usable C++ support in BGBCC, this is basically a bit beyond
    my current effort budget.



    Though, ironically, despite being "technically" B-Trees, often a very
    different sort of structure from that typically used in filesystems or
    databases (which are often built around disk blocks and for
    associations between name-strings and data-blobs, rather than
    associating an ordinal with a pointer or similar).

    In case I'd need a fast tree structure I'd rather (for specific cases)
    use B*-Trees instead of B-Trees; they have better properties. Usually, though, I never needed these sorts of trees; I used binary or AVL in
    some (very few) cases.


    I have used AVL trees as well.

    A little easier to implement, but generally more memory use and slower
    (in typical use).

    say:
      typedef struct AVLNode_s AVLNode;
      struct AVLNode_s {
        AVLNode *lno;
        AVLNode *rno;
        void *data;
        u16 key;
        byte depth;
      }; // 32 bytes

      struct BTreeNode_s {
        uint64_t vals[16];
        uint16_t keys[16];
        byte nkey;
        byte depth;
      }; // 168 bytes

    Now, say you want an object with 8 items:
      AVL:    256 bytes
      B-Tree: 168 bytes
    Or, 12 items:
      AVL:    384 bytes
      B-Tree: 168 bytes

    ...



    BTW, in early days I assumed that Awk realized its associative arrays
    with trees, but Arnold told me that internally he uses a hash array (I
    never checked). Obvious one variant that fits for all application cases.


    FWIW:
    In BGBCC, it uses B-Trees for the AST nodes, but most lookups for things
    like the global toplevel and similar use hashed lookups (though,
    generally all of the global toplevel objects exist in a big array, and
    may be referenced by index).

    Symbols/labels generally also use a tagged integer structure (generally
    32-bit), with a range of symbols keyed to a string table of symbol
    names (all of the C identifiers and similar are essentially interned
    into a table with 32-bit index numbers, as at this point a 16-bit
    number is no longer sufficient).


    Some of this is mostly because integer compare is faster than using
    "strcmp()" for lookups. Though, a lot of these objects have name strings
    as well.

    Well, and also types are represented both as a signature string and as
    a bit-packed tagged integer format, where the strings are generally
    used as the external representation (and the bit-packed integer format
    mostly for internal use by the compiler).

    Some of this stuff can get a little crazy though.

    ...


    Janis


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Apr 11 20:28:35 2026
    From Newsgroup: comp.lang.c

    On Fri, 10 Apr 2026 23:34:41 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/10/2026 4:31 PM, James Kuyper wrote:
    On 2026-04-10 14:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    ...
    FWIW:
    Hash tables, like linked lists, are one of those things that
    pretty much everyone rolls custom nearly every time they need
    one.

    Not that I disagree with many of your points below, but here you
    are wrong. Hash tables are very *unlike* linked lists. Be sure
    that overwhelming majority of professional programmers, including
    good ones, never implemented a hash table themselves after they
    left school.

    It seems, you mostly converse with people that share your extreme
    DIY attitude. Please realize that for better or worse the bigger
    world is different.


    ?...

    So, then are people just naively looping over arrays and using
    linear search for everything? Etc.

    How do you reach that conclusion from the above comment? All that
    you can conclude from the above is that if they think they need a
    hash table, they use one that someone else has already implemented,
    rather than implementing one of their own. I personally have never implemented a hash table, but I have used the hashing capability
    that are built into several of the tools I do use. The closest I
    have come to implementing my own hash table is to help someone else
    fix the hash function they were passing to a hashing routine. Their function was producing a ridiculously high number of collisions,
    because they hadn't put any thought into the distribution of the
    data they were hashing.

    There are few options sometimes.


    Say, for example, you want to look over a list of strings and map
    them to an ordinal number:
    char *strings[]={
    "first",
    "second",
    ...
    NULL
    };
    How do you do it then, exactly?...


    One could use a "for()" loop, say:
    for(i=0; strings[i]; i++)
    if(!strcmp(strings[i], str))
    break;
    Which is, often OK.


    What else, use a 3rd party library? Far likely, that is a bigger
    pain... Well, and 3rd party library dependencies are often not worth
    it.


    And still, the overwhelming majority of pro programmers will happily
    accept the pain. That is, if the idea to use hash tables crossed their
    mind in the first place.
    Another option, probably more common in bigger organizations, is to
    ask a guru. Bigger organizations tend to employ gurus. The guru could
    either help them with a 3rd party lib or, if in a good mood, implement
    something that meets their needs.





    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From HashTables@invalid@invalid.invalid to comp.lang.c on Sat Apr 11 20:06:43 2026
    From Newsgroup: comp.lang.c

    On 10/04/2026 19:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    W dniu 7.04.2026 o 23:38, highcrew pisze:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are hash table attributes in
    1. y. computer science students book?

    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.


    Then again, I can note that (for example) in the Doom engine, the
    original WAD code, while it did use a trick to speed up lump-name
    checking, was still using linear search to look up WAD lumps.

    Something like (from memory):
      for(i=0; i<numwadlumps; i++)
      {
        if( (((int *)waddir[i].name)[0] == ((int *)name)[0]) &&
            (((int *)waddir[i].name)[1] == ((int *)name)[1]) )
              break;
      }

    But, this could be sped up some, say:
      llname = *(long long *)name;
      h = ((int *)name)[0] ^ ((int *)name)[1];
      h ^= h >> 17;
      h ^= h >>  7;
      h &= 255;
      c = waddirhash[h];
      while(c>=0)
      {
        if( *(long long *)(waddir[c].name) == llname )
          break;
        c = waddir[c].chain;
      }

    This doesn't feel too atypical, except that some people don't like this
    sort of wanton pointer casting and similar.



    But, as noted, gets harder if one wants a fast hash for a
    variable-length string, say:
      lcomask=0x8080808080808080ULL;
      lcoadj =0x0101010101010101ULL;
      s=str; va=0;
      v=*(uint64_t *)s;
      //check for NUL terminators
      while(!((v-lcoadj)&((~v)&lcomask)))
      {
        va^=(va>>17);
        va^=(va<< 7);
        va^=v;
        s+=8;
        v=*(uint64_t *)s;
      }
      //trailing bytes
      for(i=0; i<64; i+=8)
      {
        c=(v>>i)&255;
        if(!c) break;  // NUL byte
        va^=(va>>17);
        va^=(va<< 7);
        va^=(c<<13);
      }
      //generate final hash
      v=va^(va>>31);
      v^=v>>17;
      v^=v>>11;
      h=v&4095;
      return(h);

    As opposed to a simpler but slower:
      s=str; h=0;
      while(*s)h=(h*251)+(*s++);
      h=((h*251+7)>>8)&4095;
      return(h);

    But, then again, a lot depends on average string lengths, and if the
    average length is less than around 8 characters or so, the latter may
    still be competitive (or faster). Or, one may not be worrying too much
    about the speed of the hash function.

    Well, and the former does seem to fall outside the realm of typical C
    coding practice.

    Also, this particular algorithm assumes a machine with fast unaligned
    memory loads, ... Whereas the naive strategy may be safer if one
    allows for targets which may be alignment sensitive, ...


    Though, a similar approach can be used to determine things like string
    length, or to implement things like "strcmp()".

    Where, for example, the C library I am using in some of my projects
    had originally been using more naive string functions (looping and
    working character by character), but this was a lot slower.

    Even if the faster versions can look like ugly hair...


    Well, and sometimes there are cases where things like AVL Trees or
    B-Trees or similar are preferable (for example, they may scale better
    and use less memory, even if the average time lookup cost is higher).

    But, hashes can work well as a good quick/dirty way to speed up lookups, etc.


    Well, or sometimes they can be combined:
      A small hash-table as a lookup cache on the front-end of a B-Tree;
      ...


    Etc...


    I used to do something like this at college (many years ago!!):

      void initHashTable(HashTable *table, int size) {
        table->size = size;
        table->count = 0;
        table->items = malloc(sizeof(Item*) * size);
        for (int i = 0; i < size; i++) {
          table->items[i] = NULL; // Initialize all buckets to NULL
        }
      }

      void addItem(HashTable *table, int key, int value) {
        int index = hash(key, table->size);
        // Add item to the linked list at index
      }

    Michael says, 'Hash tables are very unlike linked lists'. Actually, they
    are very much like linked lists, so I agree with BGB, but I tend to
    think in more abstract terms.




    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Sat Apr 11 14:19:06 2026
    From Newsgroup: comp.lang.c

    On 4/10/2026 11:00 AM, BGB wrote:
    [...]

    Have you ever experimented with Cantor pairing as a hash?
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Apr 12 11:06:56 2026
    From Newsgroup: comp.lang.c

    On Sat, 11 Apr 2026 20:06:43 +0100
    HashTables <invalid@invalid.invalid> wrote:

    Michael says, 'Hash tables are very unlike linked lists'. Actually,
    they are very much like linked lists, so I agree with BGB, but

    It sounds like you didn't understand what I was talking about.
    The context was "where do professional C programmers tend to use
    pre-written libraries vs coding the damn thing from first principles
    as it suits the situation at hand". For hash tables it's predominantly
    the former; for linked lists it's often enough the latter.

    I tend to think in more abstract terms.


    I tend not to.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Sun Apr 12 03:36:29 2026
    From Newsgroup: comp.lang.c

    On 4/11/2026 4:19 PM, Chris M. Thomasson wrote:
    On 4/10/2026 11:00 AM, BGB wrote:
    [...]

    Have you ever experimented with Cantor pairing as a hash?

    Not Cantor specifically, but have made use of Morton Ordering a fair
    bit, and also Hilbert Ordering in some use cases.


    For hashing in a grid space, have often done things like:
      (((X*Prime1+Y*Prime2+Bias1)*Prime3+Bias2)>>SHIFT)&MASK

    Though, this is sometimes not particularly cheap (due to all the
    multiplies and similar).

    Granted, can reduce this to:
      ((X*(Prime1*Prime3)+Y*(Prime2*Prime3)+(Bias1*Prime3+Bias2))>>SHIFT)&MASK
    Which constant propagation reduces to:
      ((X*Magic1+Y*Magic2+Magic3)>>SHIFT)&MASK

    While it might seem intuitive to just use 3 primes in the latter case,
    there can be some mathematical weirdness at play which may make
    composite numbers following the previous pattern preferable.

    Also (for maybe esoteric reasons) best results come if
    Magic1+Magic2+Magic3 add up to something just larger than the
    power-of-2 value of the Shift, which is also better if larger than
    MASK (though, this part is looser). Usually the Bias values are small
    prime numbers just big enough to push the value over the threshold,
    and it is better if the X/Y multipliers are uneven (don't pick primes
    that are too similar, but also not too far apart), and given primes
    should not be reused, ...

    Though, say, for picking two primes, maybe something near 3/8 of the
    target sum and another near 5/8 or similar.

    Also be careful of Mersenne primes as they seem to have different
    behavioral properties than other primes (good for biases, but avoid for multipliers in this configuration).

    ...


    Or, at least, this sort of approach has seemed to work OK for me in
    practice, but I can't really explain the "why" aspects all that well.

    But, yeah, this part does sometimes seem almost like some sort of
    numerical black arts (do it well, hash works well; do it poorly, and
    hash function sucks).



    Though, one "why" that is easier to explain:
    Hash table sizes should always be a power of 2.
    Why:
    Mostly because the % operator is typically painfully slow.



    Another option is to use Morton Shuffling, say:
      ((__morton(x, y)*Prime+Bias)>>SHIFT)&MASK

    or, 4D:
      ((__morton(__morton(x, y), __morton(z, w))*Prime+Bias)>>SHIFT)&MASK
    For 3D, can make up a W axis, like:
      x-y+z+Magic

    Though, mostly makes sense on an ISA with built-in support for Morton
    Shuffles (otherwise can end up being more expensive than using
    multiplies or similar).

    ...

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Sun Apr 12 04:24:32 2026
    From Newsgroup: comp.lang.c

    On 4/12/2026 3:06 AM, Michael S wrote:
    On Sat, 11 Apr 2026 20:06:43 +0100
    HashTables <invalid@invalid.invalid> wrote:

    Michael says, 'Hash tables are very unlike linked lists'. Actually,
    they are very much like linked lists, so I agree with BGB, but

    It sounds like you didn't understand what I was talking about.
    The context was "where do professional C programmers tend to use
    pre-written libraries vs coding the damn thing from the first
    principles as it suits the situation at hand". For hash tables it's predominantly the former, for linked lists it's often enough the latter.


    Well, my experience differs here.

    But, I haven't done much of any real professional programming, mostly
    just poking around with FOSS stuff, and fiddling with my own projects.


    I tend to think in more abstract terms.


    I tend not to.

    I suspect my thinking tends towards a fairly low level of abstraction.

    Low level of abstractions, typically;
    Seemingly tends towards a "bottom up" approach to a lot of things;
    ...



    Had noted that my thinking patterns (and typically also dreams) tend to
    be dominated by a mixture of text and monochrome graphics.

    In some inner parts of my "inner world", it seems almost like a MUD
    (like I had found within my mind, it is like there are free-running text
    feeds of whatever I am doing, thinking, and whatever I see around me,
    etc). But, then on top of this, a layer that represents worlds in mostly static images and monochrome (with some limited color following
    something akin to a 16-color palette).

    Can imagine 3D scenery and graphics easily, but things mostly remain as
    a sort of white-on-black monochrome.


    Otherwise, I noted on things like a NOVA episode talking about sensory
    perception, and some other science-type videos on the same topic, that
    for me the senses seem to work very differently...

    Like, I don't tend to live in a fully realized high-detail world. I can
    see that my eyes see such a world, but it doesn't quite match descriptions.


    Many illusions don't work correctly, and I am left being all too aware
    of visual perception glitches involving things like mirrors and shadows:
    Things like "hollow face" don't work as described;
    Many others give different behaviors;
    Visual perception tends to flag a mirror as a window into another room,
    and my reflection as a person standing in that room (intellectually, I
    know it is myself, visual processing disagrees);
    Shadows sometimes are either seen as separate objects, or almost like "darkness volumes" that extend off of objects (not easy to describe
    exactly);
    Color constancy doesn't behave exactly as described. At least in my
    case, the color of an object is not seen independently of the color of
    the lighting (like, say, warm white makes everything look kinda yellow,
    etc).


    But, given "mirror as doorway" (or a person fighting with the person in
    the mirror, etc) or "shadows as independent figures" are fairly common
    tropes in media, seems like these sensory glitches aren't entirely
    unorthodox.


    In cases of ambiguous figures (images which can be seen as one of
    several possibilities), my vision doesn't usually settle on one or the
    other, but rather tends to chaotically flicker between both of them.


    Well, except in the "one eye sees green, the other sees red", where
    ironically (and contrary to claims), my vision combines them to yellow
    (and does not see a flicker).

    Also, I don't see the world as a single monolithic thing, but more sort
    of a layered "onion like" thing.

    Well, and I don't have sort of a "seeing everything all at once"
    experience, rather I am rather aware that one needs to actually look at
    things to see them, and can only see it well if one is looking at it
    directly (that someones' sensory experience could somehow not include
    this aspect seems almost alien).


    Mind can keep a map of things I am not currently looking at, but it is
    more of a mental map, and tends to represent things like black-and-white glyphs representing objects on the inside of a sphere. Sometimes, it can
    also construct a top down view (for path-finding) and can visualize
    waypoints (and superimpose them onto my normal vision).

    But, like, I know that I am doing this. And, say, that the floating line
    I need to follow to get to a given location is not an "actually real
    entity", but rather something that one part of my mind draws so that my
    legs know which direction to keep walking in.


    But, yeah, general existence is its own thing I guess...



    ...


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Apr 12 14:51:33 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    W dniu 7.04.2026 o 23:38, highcrew pisze:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are hash table attributes in
    1. y. computer science students book?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY attitude. Please realize that for better or worse the bigger world is different.

    I have made my own implementation of hash tables. For many tasks
    I have used Perl scripts. But once I had 8MB machine (more or less
    average at that time) and something like 1.3 MB data. Perl
    was running out of memory and I considered this unreasonable.
    So I created my implementation of hash tables in C and it had
    no trouble processing the data. IIRC I spent something like
    2 days thinking about design and 1 day coding. I did not need
    to delete items, so this was/is not supported. Keys are counted
    strings, that allows hashing arbitrary data, but equality is
    hardcoded to bitwise comparison. Table stores pointers to
    key-data structs, it uses "double hashing" to handle
    collisions. Table grows when data is inserted, starting at
    about kilobyte and possibly growing to gigabytes.

    My initial implementation had a bug: it lost the piece of data that
    triggered resizing. I was using it to collect counts for essentially
    statistical processing, so losing a few items did not matter much for
    the final result, and I initially overlooked the problem. Also, I
    initially used xor to reduce multiword data to a single word. That
    caused excessive collisions on some of my data, so I replaced xor
    with addition (that could be improved with some extra mixing; I do
    not remember if I did that).

    I used this implementation several times, sometimes with small
    tweaks, so it is not "fully custom" every time I need it. OTOH the
    initial design satisfied my needs (fully binary keys, reasonable
    space use, reasonable speed) while "ready" things that I could use
    did not. Considering the total implementation time, about 4 days
    (3 for the initial implementation and 1 for fixes), I consider it
    time well spent (one can easily spend more time searching for a
    good implementation on the net).

    Of course, if you do not need a hash table, then you do not need to
    implement it. Similarly, if a co-worker created a good implementation,
    or more generally if you can use an implementation created by some
    other person, then use it. So I agree that the majority of programmers
    probably do not need to implement a hash table. But IMO the minority
    that implements hash tables in C is not small.
    --
    Waldek Hebisch
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From HashTables@invalid@invalid.invalid to comp.lang.c on Sun Apr 12 19:56:50 2026
    From Newsgroup: comp.lang.c

    On 12/04/2026 09:06, Michael S wrote:
    I tend to think in more abstract terms.

    I tend not to.

    I've noticed that, which is why I decided to encourage people to take a
    break from coding and think about what the code is supposed to do. It's
    very easy to dismiss something just because it uses a different name,
    without looking at the code itself.
    --
    *Newsgroup Post *
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 13 08:27:13 2026
    From Newsgroup: comp.lang.c

    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists. Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface. A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists. The same is true for
    hash tables.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Mon Apr 13 14:46:45 2026
    From Newsgroup: comp.lang.c

    On 4/13/2026 10:27 AM, Tim Rentsch wrote:
    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists. Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface. A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists. The same is true for
    hash tables.

    In the original sense, the idea was that they were like linked
    lists in the sense that a programmer may often tend to write
    both as one-off things whenever they come up (as opposed to
    using some 3rd party "foo_linked_list.h" library or
    something).

    Not in the sense of there being any sort of technical or functional
    similarity between hash tables and linked lists.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Mon Apr 13 14:42:39 2026
    From Newsgroup: comp.lang.c

    On 4/13/2026 8:27 AM, Tim Rentsch wrote:
    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists. Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface. A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists. The same is true for
    hash tables.

    Linked lists for hash collisions are fairly common?
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Apr 14 01:14:56 2026
    From Newsgroup: comp.lang.c

    On 2026-04-13 21:46, BGB wrote:
    [...]

    In the original sense, the idea was that they were like linked
    lists in the sense that a programmer may often tend to write
    both as one-off things whenever they come up (as opposed to
    using some 3rd party "foo_linked_list.h" library or something).

    Okay, you've now clarified what you meant by that comparison,
    but I'm not sure where you're coming from, or where you got
    the impression that "a programmer may often tend to write both
    as one-off things". I can of course only speak for the
    contexts I've worked in professionally, and I experienced
    something different. First, it was never the individual
    programmer's decision whether a project created its own
    functions or libraries, or whether some existing product or
    source would be used (or examined before deciding whether it
    was fit for use). If the language we used didn't provide the
    necessary functions, we got third-party code (for C++ in the
    1990s, for example, Boost). Why spend manpower and endure all
    the hassle of producing correct code when others have already
    provided the functions or libraries in good quality?

    I'm aware that the Not Invented Here syndrome existed (and
    maybe still exists). So I'm curious how widespread that habit
    actually is, especially in professional commercial contexts,
    as opposed to private projects where every individual
    programmer may do as he likes.

    Janis

    [...]

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Apr 14 01:17:06 2026
    From Newsgroup: comp.lang.c

    On 2026-04-13 23:42, Chris M. Thomasson wrote:
    On 4/13/2026 8:27 AM, Tim Rentsch wrote:
    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists.  Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface.  A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists.  The same is true for
    hash tables.

    Linked lists for hash collisions are fairly common?

    They are an implementation option: one common variant among a
    few other, also common, possibilities.
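
    For concreteness, the chaining variant is only a few lines of C.
    This is a minimal sketch with a fixed bucket count and no delete,
    not code from anyone in this thread:

    ```c
    #include <stdlib.h>

    /* Separate chaining: each bucket heads a singly linked list of
     * the entries whose keys hashed to that slot. */
    struct entry {
        struct entry *next;
        unsigned long key;
        int value;
    };

    struct table {
        struct entry *bucket[64];   /* fixed 64 slots for brevity */
    };

    static int put(struct table *t, unsigned long key, int value)
    {
        size_t slot = key % 64;
        for (struct entry *e = t->bucket[slot]; e; e = e->next)
            if (e->key == key) { e->value = value; return 0; }
        struct entry *e = malloc(sizeof *e);
        if (!e)
            return -1;              /* report allocation failure */
        e->key = key;
        e->value = value;
        e->next = t->bucket[slot];  /* push onto the chain */
        t->bucket[slot] = e;
        return 0;
    }

    static int *get(struct table *t, unsigned long key)
    {
        for (struct entry *e = t->bucket[key % 64]; e; e = e->next)
            if (e->key == key)
                return &e->value;
        return NULL;                /* not present */
    }
    ```

    Colliding keys simply extend the chain, at the cost of one pointer
    per entry and less cache-friendly lookups than open addressing.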

    Janis

    --- Synchronet 3.21f-Linux NewsLink 1.2