• A neat hash table implementation

    From highcrew@high.crew3868@fastmail.com to comp.lang.c on Tue Apr 7 23:38:01 2026
    From Newsgroup: comp.lang.c

    The most annoying bit of C, to me, is the lack of a decent hash table implementation.

    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a
    delete primitive, and it can only use string keys.

    I recently stumbled into klib: https://attractivechaos.github.io/klib/
    Klib has a nice header-only hash-table implementation: khash.h

    To be honest, I'm not a huge fan of vendoring header-only libraries.
    It is also a little hard to understand at first, since it makes heavy
    use of cpp macros. Yet it definitely captured my attention.

    What I like most:

    - A reasonable API
    - It is possible to redefine malloc/calloc/free before including the
      header. This allows me to use a memory pool, if I so wish
    - Proper handling of malloc failures (in contrast with glib2 and
      uthash)
    - It has a kh_resize() primitive, which I can use to pre-allocate the
      table to its maximum size, if I want to avoid re-allocations
      (of course I need to free up entries once kh_size() == max)

    So far I couldn't find any obvious shortcoming, so I thought of sharing
    it with this NG, in case anyone missed it.
    --
    High Crew
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From jmj@jmj@energokod.gda.pl to comp.lang.c on Wed Apr 8 05:14:36 2026

    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes in a first-year computer science student's textbook?
    --
    Jacek Marcin Jaworski, Pruszcz Gd., Pomeranian Voivodeship, Poland 🇵🇱, EU 🇪🇺;
    tel.: +48-609-170-742, best between 5:00-5:55 or 16:00-17:25; <jmj@energokod.gda.pl>, gpg: 4A541AA7A6E872318B85D7F6A651CC39244B0BFA;
    Home WWW page: <https://energokod.gda.pl>;
    Mini Netiquette: <https://energokod.gda.pl/MiniNetykieta.html>;
    Email Self-Defense: <https://emailselfdefense.fsf.org/pl>.
    NOTE:
    DON'T TAKE ON "HIDDEN DEBT"! PAY FOR FOSS SOFTWARE AND ONLINE INFORMATION! READ THE FREE "17th Totaliztic Report - Patrons Versus Bankers": <https://energokod.gda.pl/raporty-totaliztyczne/17.%20Patroni%20Kontra%20Bankierzy.pdf>

  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Wed Apr 8 09:29:13 2026

    On 4/8/26 5:14 AM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    Did you know that these "limitations" are standard hash-table attributes in a first-year computer science student's textbook?

    No, I did not know.
    --
    High Crew
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Apr 8 10:42:26 2026

    On 2026-04-07 23:38, highcrew wrote:
    The most annoying bit of C, to me, is the lack of a decent hash table implementation.

    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a delete primitive, and it can only use string keys.

    I recently stumbled into klib: https://attractivechaos.github.io/klib/
    Klib has a nice header-only hash-table implementation: khash.h

    To be honest, I'm not a huge fan of vendoring header-only libraries.
    It is also a little hard to understand at first, since it makes heavy
    use of cpp macros. Yet it definitely captured my attention.

    What I like most:

    - A reasonable API
    - It is possible to redefine malloc/calloc/free before including the
      header. This allows me to use a memory pool, if I so wish
    - Proper handling of malloc failures (in contrast with glib2 and
      uthash)
    - It has a kh_resize() primitive, which I can use to pre-allocate the
      table to its maximum size, if I want to avoid re-allocations
      (of course I need to free up entries once kh_size() == max)

    So far I couldn't find any obvious shortcoming, so I thought of sharing
    it with this NG, in case anyone missed it.


    The lack of hash tables I consider indeed a severe limitation if you're
    used to simple data mappings - hail to Awk for that - and contributions
    welcome (for any language without that feature builtin).

    Beyond "[not] any obvious shortcoming"; have you practical experiences
    with using that library?

    The CPP macros are certainly something I despise. They seem to be used
    to provide some primitive form of genericity; is that observed correctly?

    Are recent "C" standards (meanwhile) providing such functionality?

    Janis

  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Wed Apr 8 21:44:58 2026

    On 4/8/26 10:42 AM, Janis Papanagnou wrote:
    Beyond "[not] any obvious shortcoming"; have you practical experiences
    with using that library?

    Not yet, but I'm currently integrating khash into a personal project.
    If anyone is interested, I can share my thoughts later on...

    The CPP macros are certainly something I despise. They seem to be used to provide some primitive form of genericity; is that observed correctly?

    Correct.

    Quoting:
    Generic programming in C. Except my khash.h, all the other C hash
    libraries use (void*) to achieve generic typing. Using void* is okey for strings, but will cause overhead for integers. This is why all C
    libraries, except khash.h, is slower than C++ libraries on integer keys,
    but close to on string keys.

    https://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/
    (see "Generic programming in C")

    Are recent "C" standards (meanwhile) providing such functionality?

    AFAIK, the hsearch interface is the closest thing provided by POSIX,
    and hsearch_r (which I consider the usable variant) is a GNU extension.
    --
    High Crew
  • From makendo@makendo@makendo.invalid to comp.lang.c on Fri Apr 10 10:40:17 2026

    Quoting:
    Generic programming in C. Except my khash.h, all the other C hash
    libraries use (void*) to achieve generic typing. Using void* is okey for
    strings, but will cause overhead for integers. This is why all C
    libraries, except khash.h, is slower than C++ libraries on integer keys,
    but close to on string keys.

    Barring macro usage, you also have to use function pointers to compare
    equality or hash an arbitrary object, and I dunno if compilers can know
    enough to devirtualize and inline these. Another source of overhead.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Apr 10 05:42:47 2026

    Take unordered_map<>, that's ten times more comfortable than any C solution.
  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 01:17:44 2026

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use, lack of a
    delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty much everyone rolls custom nearly every time they need one.

    So, fast and easy to implement for a specific task;
    But, more difficult to "generalize" without also ruining any advantages
    it might have otherwise had (almost invariably the code would end up an
    order of magnitude or more bulkier and slower than had they rolled
    something specifically for the task at hand).

    So, generally, no one really generalizes or attempts to reuse them,
    because doing so is almost invariably worse than writing something one-off.


    By the time one has generalized it to the point of dealing with multiple
    key and value types, one has often gotten much of the way to
    implementing something like a dynamic type-system or similar, which is
    alas another one of those things that people either avoid entirely, or
    tend to just end up rolling their own.

    And, if one tries to generalize it enough that one could in theory use
    it across multiple projects, almost invariably one will end up with the
    sort of thing that no one wants to poke with a stick.

    And almost invariably someone else will think it a good idea to have it
    use a Garbage Collector, etc, and thus far pretty much no one has
    managed to make a garbage collector that manages to "consistently not suck".

    So, the pattern continues...



    That said, one does tend to develop a toolbox of patterns they can use whenever they need to roll a hash table or whatever else.

    And, there are some semi-difficult problems, like how to most
    efficiently hash a variable-length string.


    Well, decided to leave out going into things like how to efficiently
    hash strings, as getting fast string hashing here may involve use of techniques that some people might consider, unnatural.

    Well, and also someone at the level of asking about reusable hash tables
    is likely not at the level of the relative dark arts of
    speed-optimized string handling (well, of the sort that involves wild
    pointer casting, treating blobs of characters as integer data, and
    using arithmetic tricks to detect the presence of a NUL terminator).



    But, then again, arguably someone at the level of asking about generic
    hash implementations might not be in a good position to write moderately efficient hand-rolled code (so, the inefficiency of trying to use
    something generic might not necessarily be worse than if they try to
    write it themselves).

    I guess someone could try to write an implementation where the hash
    functions and similar are provided by vtables with an interface
    mechanism along vaguely similar lines to qsort, and an initialization mechanism that could fill in pointers for some common cases (such as
    common power-of-2 sizes for payload data, ...).

    Well, and at this level, one can question if hash-tables are the best interface (by the time someone is paying these sorts of costs, one may
    have crossed into a realm where a B-Tree or similar might be a better
    option, etc).

    Or, people may use the term "hash table" when they mean "some sort of
    generic key/value mapping" without caring about the underlying algorithm
    (say, for example, if one has a dictionary-object like in a language
    like JavaScript or similar, do they really need to care whether it is implemented on top of a hash table, B-Tree, or AVL-Tree, ?...).

    Sometimes, interface overhead costs may be hard to work around.

    ...


  • From Michael S@already5chosen@yahoo.com to comp.lang.c,comp.lang.c++ on Fri Apr 10 13:13:40 2026

    On Fri, 10 Apr 2026 05:42:47 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Take unordered_map<>, that's ten times more comfortable than any C
    solution.

    First, your reply is mildly off topic in comp.lang.c.
    Second, my reply is going to be even more off topic. Not that it stops
    me.
    In order to remain partly on topic I'm cross-posting to comp.lang.c++

    A. For most common cases I agree.

    B. For less common cases, where the standard library does not provide a
    ready-made hash function for your Key type, it is less clear.
    On one hand, implementing only a custom hash function (or functor) is
    less work than implementing a full associative array.
    On the other hand, it's harder work. The exact requirements for Hash are not
    specified in easy-to-read language. I don't know about you, but to me
    personally it feels like walking into a minefield.
    So, personally, I'd rather code a full associative array than a Hash for
    std::unordered_map<>. More work, but a better feeling of control.

    All that is rather different from supplying a comparison primitive for
    std::map<>, where the interface and requirements are crystal clear and
    the task itself is much simpler.

    C. std::unordered_map<> can be surprising.
    I think it was you who showed us such a case a couple of years ago,
    when you tried to use string literals as keys.

    D. Performance.
    For an optimized "Release" build, the performance of std::unordered_map<>
    is typically very good. But for a non-optimized "Debug" build it is
    typically very bad.
    That's unlike C-style implementations in a separately compiled library:
    there the performance is maybe not quite as good as a "Release" build of
    std::unordered_map<>, but close. And it remains the same under a
    "Debug" build.

    E. Compilation time. Slow.
    It may not be a problem when unordered_map is used in just a few
    compilation units. It could be a serious problem otherwise.


  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Apr 10 13:25:12 2026

    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.

    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.
    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is different.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Fri Apr 10 13:07:47 2026

    On 10.04.2026 at 12:13, Michael S wrote:

    On the other hand, it's harder work. The exact requirements for Hash are not specified in easy-to-read language. I don't know about you, but to me personally it feels like walking into a minefield.

    Most hashers are 5min of work.

    C. std::unordered_map<> can be surprising.
    I think it was you who showed us such a case a couple of years ago,
    when you tried to use string literals as keys.

    Maybe that was before C++17. Today I'd use a string_view.

    E. Compilation time. Slow.
    It may not be a problem when unordered_map is used in just a few
    compilation units. It could be a serious problem otherwise.

    I'm using a compiler which supports C++20 modules.
    With that, compile times are an order of magnitude lower.
  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 13:00:53 2026

    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY attitude. Please realize that for better or worse the bigger world is different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.


    Then again, I can note that (for example) in the Doom engine, the
    original WAD code, while it did use a trick to speed up lump-name
    checking, was still using linear search to lookup WAD lumps.

    Something like (from memory):
      for(i=0; i<numwadlumps; i++)
      {
        if( (((int *)waddir[i].name)[0] == ((int *)name)[0]) &&
            (((int *)waddir[i].name)[1] == ((int *)name)[1]) )
          break;
      }

    But, this could be sped up some, say:
      llname = *(long long *)name;
      h = ((int *)name)[0] ^ ((int *)name)[1];
      h ^= h >> 17;
      h ^= h >> 7;
      h &= 255;
      c = waddirhash[h];
      while(c>=0)
      {
        if( *(long long *)(waddir[c].name) == llname )
          break;
        c = waddir[c].chain;
      }

    This doesn't feel too atypical, except that some people don't like this
    sort of wanton pointer casting and similar.



    But, as noted, gets harder if one wants a fast hash for a
    variable-length string, say:
      lcomask=0x8080808080808080ULL;
      lcoadj =0x0101010101010101ULL;
      v=*(uint64_t *)str;
      s=str; va=0;
      //check for NUL terminators
      while(!((v-lcoadj)&((~v)&lcomask)))
      {
        va^=(va>>17);
        va^=(va<< 7);
        va^=v;
        s+=8;
        v=*(uint64_t *)s;
      }
      //trailing bytes
      for(i=0; i<64; i+=8)
      {
        c=(v>>i)&255;
        if(!c) break; // NUL byte
        va^=(va>>17);
        va^=(va<< 7);
        va^=(c<<13);
      }
      //generate final hash
      v=va^(va>>31);
      v^=v>>17;
      v^=v>>11;
      h=v&4095;
      return(h);

    As opposed to a simpler but slower:
      h=0;
      while(*s)h=(h*251)+(*s++);
      h=((h*251+7)>>8)&4095;
      return(h);

    But, then again, a lot depends on average string lengths, and if the
    average length is less than around 8 characters or so, the latter may
    still be competitive (or faster). Or, one may not be worrying too much
    about the speed of the hash function.

    Well, and the former does seem to fall outside the realm of typical C
    coding practice.

    Also this particular algorithm also assumes a machine with fast
    unaligned memory loads, ... Whereas the naive strategy may be safer if
    one allows for targets which may be alignment sensitive, ...


    Though, a similar approach can be used to determine things like string
    length, or to implement things like "strcmp()".

    Where, for example, the C library I am using in some of my projects had originally been using more naive string functions (looping and working character by character), but this was a lot slower.

    Even if the faster versions can look like ugly hair...


    Well, and sometimes there are cases where things like AVL Trees or
    B-Trees or similar are preferable (for example, they may scale better
    and use less memory, even if the average time lookup cost is higher).

    But, hashes can work well as a good quick/dirty way to speed up lookups,
    etc.


    Well, or sometimes they can be combined:
    A small hash-table as a lookup cache on the front-end of a B-Tree;
    ...


    Etc...

  • From Bart@bc@freeuk.com to comp.lang.c on Fri Apr 10 21:07:54 2026

    On 10/04/2026 11:25, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    C programmers or all programmers? Since I'd guess the majority of the
    latter wouldn't know *how* to implement a hash table, if they even know
    what it is.


    It seems, you mostly converse with people that share your extreme DIY attitude. Please realize that for better or worse the bigger world is different.

    C encourages DIY code, by not having any standard libraries for many
    things, by omitting certain language features, by being used to
    /implement such/ things for other languages where they are built-in.

    For hash tables however, I agree with BGB: a hash algorithm, once you've
    got one that works well for your task, is a few lines of code. If self-implemented, rather than buried in a library with a clunky 'general purpose' interface, it can be made very tight and efficient.

    Hash tables have simple criteria: how long it takes to work out the hash function, and how many clashes arise. Those are easy to test and compare.

  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 16:13:06 2026

    On 4/10/2026 3:07 PM, Bart wrote:
    On 10/04/2026 11:25, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    On 7.04.2026 at 23:38, highcrew wrote:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are standard hash-table attributes
    in a first-year computer science student's textbook?

    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    C programmers or all programmers? Since I'd guess the majority of the
    latter wouldn't know *how* to implement a hash table, if they even know
    what it is.


    Probably true, the guys using JavaScript and Python probably have little
    need to write their own.

    And the guys using Java and C++ and similar can use what the libraries provide.


    Contrast, C is often used in cases where you don't even know for certain
    if there is an OS...



    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.

    C encourages DIY code, by not having any standard libraries for many
    things, by omitting certain language features, by being used to /
    implement such/ things for other languages where they are built-in.

    For hash tables however, I agree with BGB: a hash algorithm, once you've
    got one that works well for your task, is a few lines of code. If
    self-implemented, rather than buried in a library with a clunky 'general
    purpose' interface, it can be made very tight and efficient.

    Hash tables have simple criteria: how long it takes to work out the hash function, and how many clashes arise. Those are easy to test and compare.


    Yeah, you find some things that work, and end up doing that every time
    it comes up.


    Not everything needs to be a big general purpose library.
    Well, and if one does, it can turn into a feedback loop of trying to
    make things ever bigger and more complex in an attempt to address every possible use-case. Rather than just the very specific scenario that
    prompted use of hashing as a workaround (most often cases of "well,
    using linear search was too slow here", or other types of lookup, ...).


    So, in my thinking, they still fall in a similar category to things like linked lists.


    Contrast, if there is something that is more deserving of a library
    interface, it is something like sorting algorithms:
    Implementing a naive sorting algorithm (like selection or bubble sort)
    is often slow;
    Something like quicksort is involved and fiddly enough to where it
    doesn't make sense to freehand it every time it comes up;
    ...

    But, at the same time, you don't generally use a library call like
    "qsort()" to select the smallest of 3 or 4 numbers or to sort 5 to 10
    items, as the constant cost of using "qsort()" will be higher than
    whatever one could potentially "save" by using it (and internally, these
    tend to fall back to a naive sorting algorithm whenever the subset falls
    below some minimum number of items; so it may be extra API overhead just
    to have it fall back to bubble sort or similar anyways).



    But, on the other side, one might end up looking back at their code and wondering why exactly they ended up with four different implementations
    of math code for doing software emulation of Binary128 numbers (each
    written for a specific use-case, but the underlying ideas of Binary128 floating-point math don't tend to vary as much as the specific forms the values take and the contexts in which it was used, *).

    *:
    A struct with two 64-bit members, for running on a hosted PC-like target;
    One version that is intended to run in an interrupt handler and is given
    values as pointers to pairs of "uint64_t *" numbers (using "__int128"
    math internally);
    One that is written in assembler for a specific CPU ISA;
    One that is written in C and assumes the compiler having __int128 and
    __uint128 types and the ability to use __m128 to facilitate bit-casts
    between __float128 and __uint128.

    Say:
      __float128 x;
      __uint128 vix;
      vix=(__uint128)((__m128)x);
    Which is not exactly standard or portable behavior, but on the target in question is the fastest option (note that trying to use "memcpy()" here
    would wreck performance, as the compiler in question can't optimize away
    the memory stores/loads, but can turn the casts into a direct 128-bit register-pair move, interpreting it as a request to do a direct
    bit-for-bit copy; where leaving out the __m128 cast here would instead
    result in it trying to convert the value into an integer).

    Sometimes, there be dragons here...


    ...


  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Fri Apr 10 17:31:34 2026

    On 2026-04-10 14:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    ...
    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.

    How do you reach that conclusion from the above comment? All that you
    can conclude from the above is that if they think they need a hash
    table, they use one that someone else has already implemented, rather
    than implementing one of their own. I personally have never implemented
    a hash table, but I have used the hashing capabilities built into
    several of the tools I do use. The closest I have come to implementing
    my own hash table is to help someone else fix the hash function they
    were passing to a hashing routine. Their function was producing a
    ridiculously high number of collisions, because they hadn't put any
    thought into the distribution of the data they were hashing.

  • From BGB@cr88192@gmail.com to comp.lang.c on Fri Apr 10 23:34:41 2026

    On 4/10/2026 4:31 PM, James Kuyper wrote:
    On 2026-04-10 14:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    ...
    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.

    How do you reach that conclusion from the above comment? All that you
    can conclude from the above is that if they think they need a hash
    table, they use one that someone else has already implemented, rather
    than implementing one of their own. I personally have never implemented
    a hash table, but I have used the hashing capabilities built into
    several of the tools I do use. The closest I have come to implementing
    my own hash table is to help someone else fix the hash function they
    were passing to a hashing routine. Their function was producing a ridiculously high number of collisions, because they hadn't put any
    thought into the distribution of the data they were hashing.


    There are few options sometimes.


    Say, for example, you want to look over a list of strings and map them
    to an ordinal number:
      char *strings[]={
        "first",
        "second",
        ...
        NULL
      };


    One could use a "for()" loop, say:
      for(i=0; strings[i]; i++)
        if(!strcmp(strings[i], str))
          break;
    Which is often OK.


    What else, use a 3rd party library? More likely, that is a bigger pain...
    Well, and 3rd party library dependencies are often not worth it.


    What else could you do? Maybe:
      int strings_hash[256];
      short strings_chain[(sizeof(strings)/sizeof(char *))+1];
      ...
      int HashString(char *str)
      {
        char *s;
        int h;
        h=0; s=str;
        while(*s)h=h*251+(*s++);
        return(((h*251+7)>>8)&255);
      }
      int GetIndexInStrings(char *str)
      {
        int i, j;
        if(!strings_hash[255])
        {
          //one-time lazy init (static arrays start zeroed)
          for(i=0; i<256; i++)
            strings_hash[i]=-1;
          for(i=0; strings[i]; i++)
          {
            j=HashString(strings[i]);
            strings_chain[i]=strings_hash[j];
            strings_hash[j]=i;
          }
        }
        i=strings_hash[HashString(str)];
        while(i>=0)
        {
          if(!strcmp(strings[i], str))
            break;
          i=strings_chain[i];
        }
        return(i);
      }
    ...


    I suspect finding code that someone else had written for this would
    likely be more of a hassle than writing it oneself.

    It likely isn't even to the level where it would make sense for someone
    to ask Grok or Gemini to write it...

    It is to the level where one may one-off it just to write a Usenet
    response...


    Or, maybe they want a key/value mapping, or any number of other things
    that can be lumped under the name of "hash table"?...

    well, OK, maybe get a little creative:
      typedef struct SomeSortOfHash_s SomeSortOfHash;
      struct SomeSortOfHash_s {
        short *keys;
        void **vals;
        char shsz;
      };
      SomeSortOfHash *NewSomeSortOfHash(int shsz)
      {
        SomeSortOfHash *tmp;
        tmp=malloc(sizeof(SomeSortOfHash));
        tmp->shsz=shsz;
        tmp->keys=malloc((1<<shsz)*sizeof(short));
        tmp->vals=malloc((1<<shsz)*sizeof(void *));
        memset(tmp->keys, 255, (1<<shsz)*sizeof(short));
        return(tmp);
      }
      int SomeSortOfHash_LookupSlotNumberForOrdinal(
        SomeSortOfHash *tab, short ord)
      {
        int i, j, k, h, sz, szm;
        sz=1<<tab->shsz;
        szm=(sz-1);
        h=((ord*65521+17)>>16)&szm;
        if(tab->keys[h]==ord)
          return(h);
        j=h;
        for(i=0; i<16; i++)
        {
          j=j*31+7;
          k=(j>>5)&szm;
          if(tab->keys[k]==ord)
            return(k);
        }
        //dunno, if near full, fall back to linear search or such.
        for(i=0; i<sz; i++)
          if(tab->keys[i]==ord)
            return(i);
        return(-1);
      }
      int SomeSortOfHash_GetSlotNumberForOrdinal(
        SomeSortOfHash *tab, short ord)
      {
        //nearly the same, just insert the ordinal, or expand table.
        //or do both Lookup and Get with same internal function,
        //using a flag for whether or not to add a new member
      }
      int SomeSortOfHash_LookupSlotNumberForName(
        SomeSortOfHash *tab, char *name)
      {
        int ord;
        ord=GetOrdinalForName(name);
        if(ord<=0)
          return(-1);
        return(SomeSortOfHash_LookupSlotNumberForOrdinal(tab, ord));
      }
    ...

    Well, could fill out the rest, but point is obvious enough.

    Maybe this is enough to start to care about it.


    Granted, I don't use hash tables in this particular way all that often
    (for small key/value dictionaries I had more often ended up using
    certain types of B-Trees or similar).

    Though, ironically, despite being "technically" B-Trees, these are
    often a very different sort of structure from those typically used in
    filesystems or databases (which are often built around disk blocks and
    associations between name-strings and data-blobs, rather than
    associating an ordinal with a pointer or similar).

    ...


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Apr 11 08:20:36 2026
    From Newsgroup: comp.lang.c

    On 2026-04-11 06:34, BGB wrote:
    [...]

    There are few options sometimes.

    Say, for example, you want to look over a list of strings and map them
    to an ordinal number:
      char *strings[]={
      "first",
      "second",
      ...
      NULL
      };
    How do you do it then, exactly?...

    Static lists of strings will get incorporated presorted in programs.
    (I use Unix sort(1) for that, either directly or using my editor.)

    In case an empty array will get populated dynamically from unknown
    input there's the option to insert them in a sorted order instead
    of sorting them.

    [...]

    Granted, I don't use hash tables in this particular way more often (for small key/value dictionaries had more often ended up using certain types
    of B-Trees or similar).

    You really mean B-Trees here, not binary trees? - Compared to binary
    trees, B-Trees are at least a bit more complex to organize. And the
    linked structures of dynamic tree structures require more effort and
    care compared to a static hash array.


    Though, ironically, despite being "technically" B-Trees, often a very different sort of structure from that typically used in filesystems or databases (which are often built around disk blocks and for associations between name-strings and data-blobs, rather than associating an ordinal
    with a pointer or similar).

    In case I'd need a fast tree structure I'd rather (for specific cases)
    use B*-Trees instead of B-Trees; they have better properties. Usually,
    though, I never needed these sorts of trees; I used binary or AVL in
    some (very few) cases.

    BTW, in early days I assumed that Awk realized its associative arrays
    with trees, but Arnold told me that internally he uses a hash array (I
    never checked). Obviously one variant that fits all application cases.

    Janis

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Sat Apr 11 04:19:15 2026
    From Newsgroup: comp.lang.c

    On 4/11/2026 1:20 AM, Janis Papanagnou wrote:
    On 2026-04-11 06:34, BGB wrote:
    [...]

    There are few options sometimes.

    Say, for example, you want to look over a list of strings and map them
    to an ordinal number:
       char *strings[]={
       "first",
       "second",
       ...
       NULL
       };
    How do you do it then, exactly?...

    Static lists of strings will get incorporated presorted in programs.
    (I use Unix sort(1) for that, either directly or using my editor.)

    In case an empty array will get populated dynamically from unknown
    input there's the option to insert them in a sorted order instead
    of sorting them.


    I guess yeah, could use binary search in that case...


    [...]

    Granted, I don't use hash tables in this particular way more often
    (for small key/value dictionaries had more often ended up using
    certain types of B-Trees or similar).

    You really mean B-Trees here, not binary trees? - Compared to binary
    trees, B-Trees are at least a bit more complex to organize. And the
    linked structures of dynamic tree structures require more effort and
    care compared to a static hash array.


    Yes, B-Trees, though for dictionary-type objects I typically used a
    small fixed branching factor, usually 16 or so (each node may have up
    to 16 children).


    So, in this case:
      0-   16 children: 1 level, 1 node
      17-  256 children: 2 levels, 3 to 17 nodes
      257-4096 children: 3 levels

    Typically there was a max of 3 or 4 levels as I tend to usually use
    16-bit ordinal numbers as keys.

    In a few cases:
      0001-7FFF: Identifier
      8000-FFFE: Indexed items.


    Though in BGBCC's ASTs, there was a limit of 4096 AST identifiers, with
    the high bits used to identify element type (node, string, 64-bit
    integer, Binary64 number). With an indexing scheme for up to 32K sub-nodes.

    The ASTs are kinda weird in that they pretend to be XML DOM but
    internally now operate very differently from a conventional DOM API.
    Mostly this is because they were tuned more heavily for performance
    and memory use.

    Also, I quickly found that any sort of variable-size memory allocation
    was something best avoided in any way possible.


    In cases where I had more identifiers, the dictionaries usually
    contained tagged-references, typically (high 4 bits):
      0000: Pointer (48-bit address, 12-bit object type)
      0001: Smaller Literal Types (32-48 bits of payload data)
      0010: Bounded Pointers in BGBCC, 0 based
      0011: Bounded Pointers in BGBCC, variable-based
      01xx: Fixnum (62 bits)
      10xx: Flonum (62 bits, S.E11.M50)
      ...

    Though, the tag-refs are more used in my runtime library and ABI (either
    on my own ISA or RISC-V); with it also using similar B-Tree style
    objects for things like ex-nihilo objects.

    In BGBCC, it is sort of possible to use C language extensions to do
    JavaScript-like stuff, but there is a lot of hair here, as the C and
    JavaScript style type-systems don't mix well (fully preserving C
    semantics inside a dynamic system is not viable, so mixed use exposes
    a lot more sharp corners).

    Well, and also there is a class/instance object system, but it behaves
    more like a Java/C# style object system, rather than a C++ style system.


    There is no usable C++ support in BGBCC, this is basically a bit beyond
    my current effort budget.



    Though, ironically, despite being "technically" B-Trees, often a very
    different sort of structure from that typically used in filesystems or
    databases (which are often built around disk blocks and for
    associations between name-strings and data-blobs, rather than
    associating an ordinal with a pointer or similar).

    In case I'd need a fast tree structure I'd rather (for specific cases)
    use B*-Trees instead of B-Trees; they have better properties. Usually, though, I never needed these sorts of trees; I used binary or AVL in
    some (very few) cases.


    I have used AVL trees as well.

    A little easier to implement, but generally more memory use and slower
    (in typical use).

    say:
      typedef struct AVLNode_s AVLNode;
      struct AVLNode_s {
        AVLNode *lno;
        AVLNode *rno;
        void *data;
        u16 key;
        byte depth;
      }; // 32 bytes

      struct BTreeNode_s {
        uint64_t vals[16];
        uint16_t keys[16];
        byte nkey;
        byte depth;
      }; // 168 bytes

    Now, say you want an object with 8 items:
      AVL:    256 bytes
      B-Tree: 168 bytes
    Or, 12 items:
      AVL:    384 bytes
      B-Tree: 168 bytes

    ...



    BTW, in early days I assumed that Awk realized its associative arrays
    with trees, but Arnold told me that internally he uses a hash array (I
    never checked). Obvious one variant that fits for all application cases.


    FWIW:
    In BGBCC, it uses B-Trees for the AST nodes, but most lookups for things
    like the global toplevel and similar use hashed lookups (though,
    generally all of the global toplevel objects exist in a big array, and
    may be referenced by index).

    Symbols/labels generally also use a tagged integer structure (generally
    32-bit), with a range of symbols keyed to a string table of symbol
    names (all of the C identifiers and similar are essentially interned
    into a table with 32-bit index numbers, as at this point a 16-bit
    number is no longer sufficient).


    Some of this is mostly because integer compare is faster than using
    "strcmp()" for lookups. Though, a lot of these objects have name strings
    as well.

    Well, and also types are represented both as a signature string and as
    a bit-packed tagged integer format, where the strings are generally
    used as the external representation (and the bit-packed integer format
    mostly for internal use by the compiler).

    Some of this stuff can get a little crazy though.

    ...


    Janis


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Apr 11 20:28:35 2026
    From Newsgroup: comp.lang.c

    On Fri, 10 Apr 2026 23:34:41 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/10/2026 4:31 PM, James Kuyper wrote:
    On 2026-04-10 14:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:
    ...
    FWIW:
    Hash tables, like linked lists, are one of those things that
    pretty much everyone rolls custom nearly every time they need
    one.

    Not that I disagree with many of your points below, but here you
    are wrong. Hash tables are very *unlike* linked lists. Be sure
    that overwhelming majority of professional programmers, including
    good ones, never implemented a hash table themselves after they
    left school.

    It seems, you mostly converse with people that share your extreme
    DIY attitude. Please realize that for better or worse the bigger
    world is different.


    ?...

    So, then are people just naively looping over arrays and using
    linear search for everything? Etc.

    How do you reach that conclusion from the above comment? All that
    you can conclude from the above is that if they think they need a
    hash table, they use one that someone else has already implemented,
    rather than implementing one of their own. I personally have never implemented a hash table, but I have used the hashing capability
    that are built into several of the tools I do use. The closest I
    have come to implementing my own hash table is to help someone else
    fix the hash function they were passing to a hashing routine. Their function was producing a ridiculously high number of collisions,
    because they hadn't put any thought into the distribution of the
    data they were hashing.

    There are few options sometimes.


    Say, for example, you want to look over a list of strings and map
    them to an ordinal number:
    char *strings[]={
    "first",
    "second",
    ...
    NULL
    };
    How do you do it then, exactly?...


    One could use a "for()" loop, say:
    for(i=0; strings[i]; i++)
    if(!strcmp(strings[i], str))
    break;
    Which is, often OK.


    What else, use a 3rd party library? Far likely, that is a bigger
    pain... Well, and 3rd party library dependencies are often not worth
    it.


    And still, the overwhelming majority of pro programmers will happily
    accept the pain. That is, if the idea to use hash tables crossed their
    mind in the first place.
    Another option, probably more common in bigger organizations, is to
    ask a guru. Bigger organizations tend to employ gurus. The guru could
    either help them with a 3rd party lib or, if in a good mood, implement
    something that meets their needs.





    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From HashTables@invalid@invalid.invalid to comp.lang.c on Sat Apr 11 20:06:43 2026
    From Newsgroup: comp.lang.c

    On 10/04/2026 19:00, BGB wrote:
    On 4/10/2026 5:25 AM, Michael S wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    W dniu 7.04.2026 o 23:38, highcrew pisze:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are hash table attributes in
    1. y. computer science students book?

    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY
    attitude. Please realize that for better or worse the bigger world is
    different.


    ?...

    So, then are people just naively looping over arrays and using linear
    search for everything? Etc.


    Then again, I can note that (for example) in the Doom engine, the
    original WAD code, while it did use a trick to speed up lump-name
    checking, was still using linear search to look up WAD lumps.

    Something like (from memory):
      for(i=0; i<numwadlumps; i++)
      {
        if( (((int *)waddir[i].name)[0] == ((int *)name)[0]) &&
            (((int *)waddir[i].name)[1] == ((int *)name)[1]) )
              break;
      }

    But, this could be sped up some, say:
      llname = *(long long *)name;
      h = ((int *)name)[0] ^ ((int *)name)[1];
      h ^= h >> 17;
      h ^= h >>  7;
      h &= 255;
      c = waddirhash[h];
      while(c>=0)
      {
        if( *(long long *)(waddir[c].name) == llname )
          break;
        c = waddir[c].chain;
      }

    This doesn't feel too atypical, except that some people don't like this
    sort of wanton pointer casting and similar.



    But, as noted, gets harder if one wants a fast hash for a
    variable-length string, say:
      lcomask=0x8080808080808080ULL;
      lcoadj =0x0101010101010101ULL;
      s=str; va=0;
      v=*(uint64_t *)s;
      //check for NUL terminators
      while(!((v-lcoadj)&((~v)&lcomask)))
      {
        va^=(va>>17);
        va^=(va<< 7);
        va^=v;
        s+=8;
        v=*(uint64_t *)s;
      }
      //trailing bytes
      for(i=0; i<64; i+=8)
      {
        c=(v>>i)&255;
        if(!c) break;  // NUL byte
        va^=(va>>17);
        va^=(va<< 7);
        va^=(c<<13);
      }
      //generate final hash
      v=va^(va>>31);
      v^=v>>17;
      v^=v>>11;
      h=v&4095;
      return(h);

    As opposed to a simpler but slower:
      s=str; h=0;
      while(*s)h=(h*251)+(*s++);
      h=((h*251+7)>>8)&4095;
      return(h);

    But, then again, a lot depends on average string lengths, and if the
    average length is less than around 8 characters or so, the latter may
    still be competitive (or faster). Or, one may not be worrying too much
    about the speed of the hash function.

    Well, and the former does seem to fall outside the realm of typical C
    coding practice.

    Also, this particular algorithm assumes a machine with fast unaligned
    memory loads, ... Whereas the naive strategy may be safer if one
    allows for targets which may be alignment sensitive, ...


    Though, a similar approach can be used to determine things like string
    length, or to implement things like "strcmp()".

    Where, for example, the C library I am using in some of my projects
    had originally been using more naive string functions (looping and
    working character by character), but this was a lot slower.

    Even if the faster versions can look like ugly hair...


    Well, and sometimes there are cases where things like AVL Trees or
    B-Trees or similar are preferable (for example, they may scale better
    and use less memory, even if the average time lookup cost is higher).

    But, hashes can work well as a good quick/dirty way to speed up lookups, etc.


    Well, or sometimes they can be combined:
      A small hash-table as a lookup cache on the front-end of a B-Tree;
      ...


    Etc...


    I used to do something like this at college (many years ago!!):

      void initHashTable(HashTable *table, int size) {
        table->size = size;
        table->count = 0;
        table->items = malloc(sizeof(Item*) * size);
        for (int i = 0; i < size; i++) {
          table->items[i] = NULL; // Initialize all buckets to NULL
        }
      }

      void addItem(HashTable *table, int key, int value) {
        int index = hash(key, table->size);
        // Add item to the linked list at index
      }

    Michael says, 'Hash tables are very unlike linked lists'. Actually, they
    are very much like linked lists, so I agree with BGB, but I tend to
    think in more abstract terms.




    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Sat Apr 11 14:19:06 2026
    From Newsgroup: comp.lang.c

    On 4/10/2026 11:00 AM, BGB wrote:
    [...]

    Have you ever experimented with Cantor pairing as a hash?
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Apr 12 11:06:56 2026
    From Newsgroup: comp.lang.c

    On Sat, 11 Apr 2026 20:06:43 +0100
    HashTables <invalid@invalid.invalid> wrote:

    Michael says, 'Hash tables are very unlike linked lists'. Actually,
    they are very much like linked lists, so I agree with BGB, but

    It sounds like you didn't understand what I was talking about.
    The context was "where do professional C programmers tend to use
    pre-written libraries vs coding the damn thing from first principles
    as it suits the situation at hand". For hash tables it's predominantly
    the former; for linked lists it's often enough the latter.

    I tend to think in more abstract terms.


    I tend not to.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Sun Apr 12 03:36:29 2026
    From Newsgroup: comp.lang.c

    On 4/11/2026 4:19 PM, Chris M. Thomasson wrote:
    On 4/10/2026 11:00 AM, BGB wrote:
    [...]

    Have you ever experimented with Cantor pairing as a hash?

    Not Cantor specifically, but have made use of Morton Ordering a fair
    bit, and also Hilbert Ordering in some use cases.


    For hashing in a grid space, have often done things like:
      (((X*Prime1+Y*Prime2+Bias1)*Prime3+Bias2)>>SHIFT)&MASK

    Though, this is sometimes not particularly cheap (due to all the
    multiplies and similar).

    Granted, can reduce this to:
      ((X*(Prime1*Prime3)+Y*(Prime2*Prime3)+(Bias1*Prime3+Bias2))>>SHIFT)&MASK
    Which constant propagation reduces to:
      ((X*Magic1+Y*Magic2+Magic3)>>SHIFT)&MASK

    While it might seem intuitive to just use 3 primes in the latter case,
    there can be some mathematical weirdness at play which may make
    composite numbers following the previous pattern preferable.

    Also (for maybe esoteric reasons) best results come if
    Magic1+Magic2+Magic3 add up to something just larger than the
    power-of-2 value of the Shift, which is also better if larger than
    MASK (though, this part is looser). Usually the Bias values are small
    prime numbers just big enough to push the value over the threshold,
    and it is better if the X/Y multipliers are uneven (don't pick primes
    that are too similar, but also not too far apart), and given primes
    should not be reused, ...

    Though, say, for picking two primes, maybe something near 3/8 of the
    target sum and another near 5/8 or similar.

    Also be careful of Mersenne primes as they seem to have different
    behavioral properties than other primes (good for biases, but avoid for multipliers in this configuration).

    ...


    Or, at least, this sort of approach has seemed to work OK for me in
    practice, but I can't really explain the "why" aspects all that well.

    But, yeah, this part does sometimes seem almost like some sort of
    numerical black arts (do it well, hash works well; do it poorly, and
    hash function sucks).



    Though, one "why" that is easier to explain:
    Hash table sizes should always be a power of 2.
    Why:
    Mostly because the % operator is typically painfully slow.



    Another option is to use Morton Shuffling, say:
      ((__morton(x, y)*Prime+Bias)>>SHIFT)&MASK

    or, 4D:
      ((__morton(__morton(x, y), __morton(z, w))*Prime+Bias)>>SHIFT)&MASK
    For 3D, can make up a W axis, like:
      x-y+z+Magic

    Though, mostly makes sense on an ISA with built-in support for Morton
    Shuffles (otherwise can end up being more expensive than using
    multiplies or similar).

    ...

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Sun Apr 12 04:24:32 2026
    From Newsgroup: comp.lang.c

    On 4/12/2026 3:06 AM, Michael S wrote:
    On Sat, 11 Apr 2026 20:06:43 +0100
    HashTables <invalid@invalid.invalid> wrote:

    Michael says, 'Hash tables are very unlike linked lists'. Actually,
    they are very much like linked lists, so I agree with BGB, but

    It sounds like you didn't understand what I was talking about.
    The context was "where do professional C programmers tend to use
    pre-written libraries vs coding the damn thing from the first
    principles as it suits the situation at hand". For hash tables it's predominantly the former, for linked lists it's often enough the latter.


    Well, my experience differs here.

    But, I haven't done much of any real professional programming, mostly
    just poking around with FOSS stuff, and fiddling with my own projects.


    I tend to think in more abstract terms.


    I tend not to.

    I suspect my thinking tends towards a fairly low level of abstraction.

    Low level of abstractions, typically;
    Seemingly tends towards a "bottom up" approach to a lot of things;
    ...



    Had noted that my thinking patterns (and typically also dreams) tend to
    be dominated by a mixture of text and monochrome graphics.

    In some inner parts of my "inner world", it seems almost like a MUD
    (like I had found within my mind, it is like there are free-running text
    feeds of whatever I am doing, thinking, and whatever I see around me,
    etc). But, then on top of this, a layer that represents worlds in mostly static images and monochrome (with some limited color following
    something akin to a 16-color palette).

    Can imagine 3D scenery and graphics easily, but things mostly remain as
    a sort of white-on-black monochrome.


    Otherwise, I noted on things like a NOVA episode talking about sensory
    perception, and some other science-type videos on the same topic, that
    for me the senses seem to work very differently...

    Like, I don't tend to live in a fully realized high-detail world. I can
    see that my eyes see such a world, but it doesn't quite match descriptions.


    Many illusions don't work correctly, and I am left being all too aware
    of visual perception glitches involving things like mirrors and shadows:
    Things like "hollow face" don't work as described;
    Many others give different behaviors;
    Visual perception tends to flag a mirror as a window into another room,
    and my reflection as a person standing in that room (intellectually, I
    know it is myself, visual processing disagrees);
    Shadows sometimes are either seen as separate objects, or almost like "darkness volumes" that extend off of objects (not easy to describe
    exactly);
    Color constancy doesn't behave exactly as described. At least in my
    case, the color of an object is not seen independently of the color of
    the lighting (like, say, warm white makes everything look kinda yellow,
    etc).


    But, given "mirror as doorway" (or a person fighting with the person in
    the mirror, etc) or "shadows as independent figures" are fairly common
    tropes in media, seems like these sensory glitches aren't entirely
    unorthodox.


    In cases of ambiguous figures (images which can be seen as one of
    several possibilities), my vision doesn't usually settle on one or the
    other, but rather tends to chaotically flicker between both of them.


    Well, except in the "one eye sees green, the other sees red", where
    ironically (and contrary to claims), my vision combines them to yellow
    (and does not see a flicker).

    Also, I don't see the world as a single monolithic thing, but more sort
    of a layered "onion like" thing.

    Well, and I don't have sort of a "seeing everything all at once"
    experience, rather I am rather aware that one needs to actually look at
    things to see them, and can only see it well if one is looking at it
    directly (that someones' sensory experience could somehow not include
    this aspect seems almost alien).


    Mind can keep a map of things I am not currently looking at, but it is
    more of a mental map, and tends to represent things like black-and-white glyphs representing objects on the inside of a sphere. Sometimes, it can
    also construct a top down view (for path-finding) and can visualize
    waypoints (and superimpose them onto my normal vision).

    But, like, I know that I am doing this. And, say, that the floating line
    I need to follow to get to a given location is not an "actually real
    entity", but rather something that one part of my mind draws so that my
    legs know which direction to keep walking in.


    But, yeah, general existence is its own thing I guess...



    ...


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Apr 12 14:51:33 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 10 Apr 2026 01:17:44 -0500
    BGB <cr88192@gmail.com> wrote:

    On 4/7/2026 10:14 PM, 🇵🇱Jacek Marcin Jaworski🇵🇱 wrote:
    W dniu 7.04.2026 o 23:38, highcrew pisze:
    hsearch_r is close to my definition of decent, but it has huge
    limitations: fixed size that needs to be estimated before use,
    lack of a delete primitive, and it can only use string keys.

    Did you know that these "limitations" are hash table attributes in
    1. y. computer science students book?


    FWIW:
    Hash tables, like linked lists, are one of those things that pretty
    much everyone rolls custom nearly every time they need one.


    Not that I disagree with many of your points below, but here you are
    wrong. Hash tables are very *unlike* linked lists. Be sure that
    overwhelming majority of professional programmers, including good ones,
    never implemented a hash table themselves after they left school.

    It seems, you mostly converse with people that share your extreme DIY attitude. Please realize that for better or worse the bigger world is different.

    I have made my own implementation of hash tables. For many tasks
    I have used Perl scripts. But once I had 8MB machine (more or less
    average at that time) and something like 1.3 MB data. Perl
    was running out of memory and I considered this unreasonable.
    So I created my implementation of hash tables in C and it had
    no trouble processing the data. IIRC I spent something like
    2 days thinking about design and 1 day coding. I did not need
    to delete items, so this was/is not supported. Keys are counted
    strings, that allows hashing arbitrary data, but equality is
    hardcoded to bitwise comparison. Table stores pointers to
    key-data structs, it uses "double hashing" to handle
    collisions. Table grows when data is inserted, starting at
    about kilobyte and possibly growing to gigabytes.

    My initial implementation had a bug: it lost the piece of data that
    triggered resizing. I was using it to collect counts for essentially
    statistical processing, so losing a few items did not matter much for
    the final result, and I initially overlooked the problem. Also, I
    initially used xor to reduce multiword data to a single word. That
    caused excessive collisions on some of my data, so I replaced xor
    with addition (that could be improved with some extra mixing; I do
    not remember if I did that).

    I used this implementation several times, sometimes with small
    tweaks, so it is not "fully custom" every time I need it. OTOH the
    initial design satisfied my needs (fully binary keys, reasonable
    space use, reasonable speed) while "ready" things that I could use
    did not. Considering the total implementation time, about 4 days
    (3 for the initial implementation and 1 for fixes), I consider it
    time well spent (one can easily spend more time searching for a
    good implementation on the net).

    Of course, if you do not need a hash table, then you do not need to
    implement it. Similarly, if a co-worker created a good implementation,
    or more generally if you can use an implementation created by some
    other person, then use it. So I agree that the majority of programmers
    probably do not need to implement a hash table. But IMO the minority
    that implements hash tables in C is not small.
    --
    Waldek Hebisch
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From HashTables@invalid@invalid.invalid to comp.lang.c on Sun Apr 12 19:56:50 2026
    From Newsgroup: comp.lang.c

    On 12/04/2026 09:06, Michael S wrote:
    I tend to think in more abstract terms.

    I tend not to.

    I've noticed that, which is why I decided to encourage people to take a
    break from coding and think about what the code is supposed to do. It's
    very easy to dismiss something just because it uses a different name,
    without looking at the code itself.
    --
    *Newsgroup Post *
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 13 08:27:13 2026
    From Newsgroup: comp.lang.c

    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists. Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface. A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists. The same is true for
    hash tables.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.lang.c on Mon Apr 13 14:46:45 2026
    From Newsgroup: comp.lang.c

    On 4/13/2026 10:27 AM, Tim Rentsch wrote:
    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists. Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface. A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists. The same is true for
    hash tables.

    In the original sense, the idea was that they were like linked
    lists in the sense that a programmer may often tend to write
    both as one-off things whenever they come up (as opposed to
    using some 3rd party "foo_linked_list.h" library or
    something).

    Not in the sense of there being any sort of technical or functional
    similarity between hash tables and linked lists.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Mon Apr 13 14:42:39 2026
    From Newsgroup: comp.lang.c

    On 4/13/2026 8:27 AM, Tim Rentsch wrote:
    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists. Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface. A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists. The same is true for
    hash tables.

    Linked lists for hash collisions are fairly common?
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Apr 14 01:14:56 2026
    From Newsgroup: comp.lang.c

    On 2026-04-13 21:46, BGB wrote:
    [...]

    In the original sense, the idea was that they were like linked
    lists in the sense that a programmer may often tend to write
    both as one-off things whenever they come up (as opposed to
    using some 3rd party "foo_linked_list.h" library or something).

    Okay, you've now clarified what you meant by that comparison,
    but I'm not sure where you're coming from, or where you got
    the impression that "a programmer may often tend to write both
    as one-off things". I can of course only speak for the
    contexts I've worked in professionally, and I experienced
    something different. First, it was never the individual
    programmer's decision whether a project created its own
    functions or libraries, or whether some existing product or
    source would be used (or examined before deciding whether it
    was fit for use). If the language we used didn't provide the
    necessary functions, we got third-party code (for C++ in the
    1990s, for example, Boost). Why spend manpower and endure all
    the hassle of producing correct code when others have already
    provided the functions or libraries in good quality?

    I'm aware that the Not Invented Here syndrome existed (and
    maybe still exists). So I'm curious how widespread that habit
    actually is, especially in professional commercial contexts,
    as opposed to private projects where every individual
    programmer may do as he likes.

    Janis

    [...]

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Apr 14 01:17:06 2026
    From Newsgroup: comp.lang.c

    On 2026-04-13 23:42, Chris M. Thomasson wrote:
    On 4/13/2026 8:27 AM, Tim Rentsch wrote:
    HashTables <invalid@invalid.invalid> writes:

    Michael says, 'Hash tables are very unlike linked lists'.
    Actually, they are very much like linked lists, [...]

    Hash tables are nothing like linked lists.  Some kinds of hash
    tables (but not all) use linked lists internally, but that has
    nothing to do with their public interface.  A function to sort an
    array might choose internally to produce a linked list and then sort
    the list before putting elements back into the array, but that
    doesn't mean arrays are like linked lists.  The same is true for
    hash tables.

    Linked lists for hash collisions are fairly common?

    They are an implementation option: one common variant among a
    few other, also common, possibilities.
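
    For concreteness, the chaining variant is only a few lines of C.
    This is a minimal sketch with a fixed bucket count and no delete,
    not code from anyone in this thread:

    ```c
    #include <stdlib.h>

    /* Separate chaining: each bucket heads a singly linked list of
     * the entries whose keys hashed to that slot. */
    struct entry {
        struct entry *next;
        unsigned long key;
        int value;
    };

    struct table {
        struct entry *bucket[64];   /* fixed 64 slots for brevity */
    };

    static int put(struct table *t, unsigned long key, int value)
    {
        size_t slot = key % 64;
        for (struct entry *e = t->bucket[slot]; e; e = e->next)
            if (e->key == key) { e->value = value; return 0; }
        struct entry *e = malloc(sizeof *e);
        if (!e)
            return -1;              /* report allocation failure */
        e->key = key;
        e->value = value;
        e->next = t->bucket[slot];  /* push onto the chain */
        t->bucket[slot] = e;
        return 0;
    }

    static int *get(struct table *t, unsigned long key)
    {
        for (struct entry *e = t->bucket[key % 64]; e; e = e->next)
            if (e->key == key)
                return &e->value;
        return NULL;                /* not present */
    }
    ```

    Colliding keys simply extend the chain, at the cost of one pointer
    per entry and less cache-friendly lookups than open addressing.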

    Janis

    --- Synchronet 3.21f-Linux NewsLink 1.2