I experimented a bit more (in fact, more like a lot more) with
the test batteries of L’Ecuyer. It led me to the conclusion that an
occasional failure in either the middle or the big battery means nothing.
Sometimes even a crypto-quality PRNG does not pass one test or another.
Then you try to reproduce it and see that with any other seed you
try, the failure does not happen.
All in all, it makes me more suspicious of PRNGs that consistently pass
both batteries with various seeds. I am starting to see that as a sign
of a PRNG being rigged to pass tests.
The above does not apply to the scomp_LinearComp() failures of mt19937.
Those failures are very consistent. I just don't consider them
significant for my own use of PRNGs, or for any other use of PRNGs
that I have personally encountered.
Overall, the experience strengthened my position that the general
wisdom, previously shared by O'Neill herself, got it right: in the
absence of special considerations, people should select mt19937, and
especially mt19937-64, as their default PRNG of choice.
Looking closer, apart from its randomness properties and apart from
its huge period (which does not matter for me), I started to
appreciate mt19937 for the following properties that I was not aware
of before:
- it does not use a multiplier, so it is a good fit for embedded
systems that have no (or very slow) multiplier hardware.
- the algorithm is very SIMD-friendly, so optimized implementations
can be very fast on modern x86 and ARM64 hardware.
The latter property also means that a very fast FPGA implementation
is easily possible, as long as the designer is willing to throw a
moderate amount of logic resources at it.
The above does not mean that the PCG generators of O'Neill have no
place. Intuitively, they do not appear bad. But the theory is
unproven, an optimized implementation is likely slower than optimized
mt19937, and the claimed "security" advantages are nonsense, as
admitted later by O'Neill herself. And, as said above, I no longer
trust her empirical methodology, based on the work of L’Ecuyer.
So, the PCG generators are a valuable addition to the toolbox, but
not good enough to change my default.
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Pay attention that the C Standard only requires that the same seed
always produce the same sequence. There is no requirement that
different seeds produce different sequences.
So, for the generator in your example, an implementation like the one
below would be fully legal. Personally, I wouldn't even consider it
particularly poor quality:
void srand(unsigned seed) { init = seed | 1; }
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
Michael S <already5chosen@yahoo.com> writes:
[regarding rand() and srand()]
Pay attention that the C Standard only requires that the same seed
always produce the same sequence. There is no requirement that
different seeds produce different sequences.
So, for the generator in your example, an implementation like the
one below would be fully legal. Personally, I wouldn't even consider
it particularly poor quality:
void srand(unsigned seed) { init = seed | 1; }
It seems better to do, for example,
void srand(unsigned seed) { init = seed - !seed; }
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)[...]
antispam@fricas.org (Waldek Hebisch) wrote:
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Also, the TestU01 suite is made for generators with 32-bit output.
M. O'Neill used an ad hoc technique to make it applicable to
generators with 64-bit output. Is this technique right? Or maybe
it puts 64-bit PRNGs at an unfair disadvantage?
Besides, I strongly disagree with at least one assertion made by
O'Neill: "While security-related applications should use a secure
generator, because we cannot always know the future contexts in
which our code will be used, it seems wise for all applications to
avoid generators that make discovering their entire internal state
completely trivial."
No, I know exactly what I am doing. I know exactly that for my
application easy discovery of the complete state of the PRNG is not
a defect.
Anyway, even if I am skeptical about her criticism of popular PRNGs,
intuitively I agree with the constructive part of the article: a
medium-quality PRNG that feeds a medium-quality hash function can
potentially produce a very good, fast PRNG with a rather small
internal state.
On a related note, I think that even a simple counter fed into a
high-quality hash function (not cryptographically high quality, far
less than that) can produce an excellent PRNG with an even smaller
internal state. But not a very fast one. Although the speed
depends on the specifics of the computer used. I can imagine a
computer that has a low-latency Rijndael128 instruction. On such a
computer, running a counter through 3-4 rounds of Rijndael will
produce a very good PRNG that is only 2-3 times slower than, for
example, LCG 128/64.
John McCue <jmclnx@gmail.com.invalid> writes:
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
Michael S <already5chosen@yahoo.com> writes:
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)[...]
antispam@fricas.org (Waldek Hebisch) wrote:
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Do you think any of the tests in the TestU01 suite are actually
counter-indicated? As long as you don't think any TestU01 test
makes things worse, there is no reason not to use all of them.
You are always free to disregard tests you don't care about.
Also, the TestU01 suite is made for generators with 32-bit output.
M. O'Neill used an ad hoc technique to make it applicable to
generators with 64-bit output. Is this technique right? Or maybe
it puts 64-bit PRNGs at an unfair disadvantage?
As long as the same mapping is applied to all 64-bit PRNGs under
consideration, I don't see a problem. The point of the test is to
compare PRNGs, not to compare test methods. If someone thinks a
different set of tests is called for they are free to run them.
Besides, I strongly disagree with at least one assertion made by
O'Neill: "While security-related applications should use a secure
generator, because we cannot always know the future contexts in
which our code will be used, it seems wise for all applications to
avoid generators that make discovering their entire internal state
completely trivial."
No, I know exactly what I am doing. I know exactly that for my
application easy discovery of the complete state of the PRNG is not
a defect.
You and she are talking about different things. You are talking
about choosing a PRNG to be used only by yourself. She is talking
about choosing a PRNG to be made available to other people without
knowing who they are or what their needs are. In the second case
it's reasonable to raise the bar for the set of criteria that need
to be met.
Anyway, even if I am skeptical about her criticism of popular PRNGs,
intuitively I agree with the constructive part of the article: a
medium-quality PRNG that feeds a medium-quality hash function can
potentially produce a very good, fast PRNG with a rather small
internal state.
After looking at one of the example PCG generators, I would
describe it as a medium-quality PRNG that feeds a low-quality
hash. The particular combination I looked at produced good
results, but it isn't clear which combinations of PRNG and
hash would do likewise.
On a related note, I think that even a simple counter fed into a
high-quality hash function (not cryptographically high quality, far
less than that) can produce an excellent PRNG with an even smaller
internal state. But not a very fast one. Although the speed
depends on the specifics of the computer used. I can imagine a
computer that has a low-latency Rijndael128 instruction. On such a
computer, running a counter through 3-4 rounds of Rijndael will
produce a very good PRNG that is only 2-3 times slower than, for
example, LCG 128/64.
I think the point of her paper where she talks about determining
how much internal state is needed is to measure the efficacy of
the PRNG, not to try to reduce the amount of state needed. Based
on my own experience with various PRNGs I think it's a mistake to
try to minimize the amount of internal state needed. My own rule
of thumb is to allow at least a factor of four: for example, a
PRNG with a 32-bit output should have at least 128 bits of state.
My latest favorite has 256 bits of state to produce 32-bit
outputs (and so might also do well to produce 64-bit outputs, but
I haven't tested that).
[...]
Michael S <already5chosen@yahoo.com> wrote:
I experimented a bit more (in fact, more like a lot more) with
the test batteries of L’Ecuyer. It led me to the conclusion that an
occasional failure in either the middle or the big battery means
nothing. Sometimes even a crypto-quality PRNG does not pass one test
or another. Then you try to reproduce it and see that with any other
seed that you try, the failure does not happen.
All in all, it makes me more suspicious of PRNGs that consistently
pass both batteries with various seeds. I am starting to see that as
a sign of a PRNG being rigged to pass tests.
Yes, there are many stable tests. But there are also many unstable
ones. Well, that depends on the tests and the thresholds in the tests.
Some tests, when fed with a truly random source, will produce
very small variation in the results. With a generous threshold,
such a test will essentially never fail for a truly random source.
OTOH, when the expected variation of the results is larger and
the threshold is tight, then a truly random source will fail the
test from time to time. And if you test long enough, you should
be able to estimate the probability of failure and possibly compare
it with the theoretical result, if available.
To tell the truth, I do not know what failures of a crypto-quality
PRNG mean. It may mean that the test is of the tight kind that is
supposed to fail from time to time for a truly random source. Or it
may mean that the PRNG improves the crypto part at the cost of
statistics. That is, a non-uniform distribution of the output is not
a problem in crypto applications. Simply, for crypto purposes future
output should not be predictable from current and past output alone.
And a slightly non-uniform distribution can increase the probability
of test failure enough that you can observe such failures.
BTW: You mentioned using a counter and a hardware AES128 round.
Given a cheap AES128 round I would try a 128-bit state and the
AES128 round as the state update. I do not know if hardware AES128
is fast enough to make it worthwhile, but using an AES128 round as a
state update should be much better than scrambling a counter.
The above does not apply to the scomp_LinearComp() failures of
mt19937. Those failures are very consistent. I just don't consider
them significant for my own use of PRNGs, or for any other use of
PRNGs that I have personally encountered.
Overall, the experience strengthened my position that the general
wisdom, previously shared by O'Neill herself, got it right: in the
absence of special considerations, people should select mt19937, and
especially mt19937-64, as their default PRNG of choice.
Looking closer, apart from its randomness properties and apart
from its huge period
A huge period alone is easy. AFAICS, matrix variants of the LCG can
easily achieve quite large periods. I did not test matrix LCGs,
but on statistical tests they should be essentially as good
as multiprecision LCGs, while being cheaper to implement.
Just to be clear, I mean the equation x_{n+1} = A x_n + b, where x_n
is a vector representing the n-th state, A is a matrix and b is a
vector. The matrix A may be somewhat sparse, that is, have a limited
number of non-zero entries, and some entries may be simple, like
1 or -1. With a proper choice of A and b, the period should be
comparable with the number of available states.
I see no reason to go for very long periods; already 512 bits
of state allow periods which in practice should be indistinguishable
from an infinite period.
(which does not matter for me) I started to appreciate
mt19937 for the following properties that I was not aware of before:
- it does not use a multiplier, so it is a good fit for embedded
systems that have no (or very slow) multiplier hardware.
The biggest MCU with no hardware multiplier that I have has 2 kB of
RAM. I do not want mt19937 on such a machine. 64-bit multiplication
on an 8-bitter with a hardware multiplier may be slow, so a 64-bit
LCG (and improvements based on it) may be slow. But multiplication
nicely spreads out bits, so it is not clear to me if there is an
equally good cheaper alternative. If needed I would investigate
matrix LCGs; they may be slightly cheaper.
64-bit multipliers spread bits nicely, indeed. But implementing a
64-bit multiplier on a core that natively has only an 8x8=16-bit
multiply is not easy.
- the algorithm is very SIMD-friendly, so optimized implementations
can be very fast on modern x86 and ARM64 hardware.
Just the size of the state puts a limit on how fast it can be. And
the size of the state means that it will compete for cache with user
data. BTW: AFAICS a matrix LCG can be SIMD-friendly too.
The latter property also means that a very fast FPGA implementation
is easily possible, as long as the designer is willing to throw a
moderate amount of logic resources at it.
The above does not mean that the PCG generators of O'Neill have no
place. Intuitively, they do not appear bad. But the theory is
unproven, an optimized implementation is likely slower than
optimized mt19937, and the claimed "security" advantages are
nonsense, as admitted later by O'Neill herself. And, as said above,
I no longer trust her empirical methodology, based on the work of
L’Ecuyer.
So, the PCG generators are a valuable addition to the toolbox, but
not good enough to change my default.
I agree that ATM it is not entirely clear if PCGs are as good as
suggested by the tests. But I am surprised by your opinion about
speed. I did not analyze either of them deeply, but PCGs are way
simpler, so I would expect them to be faster.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
John McCue <jmclnx@gmail.com.invalid> writes:
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(3) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
It does provide an srand_deterministic() function that behaves the
way srand() is supposed to.
Michael S <already5chosen@yahoo.com> wrote:[...]
Anyway, even if I am skeptical about her criticism of popular PRNGs,
intuitively I agree with the constructive part of the article -
medium-quality PRNG that feeds medium quality hash function can
potentially produce very good fast PRNG with rather small internal
state.
She seems to care very much about having the minimal possible state.
That may be nice on embedded systems, but in general I would
happily accept a slightly bigger state (say, 256 bits). But if
we can get good properties with a very small state, then why not?
[...]
Michael S <already5chosen@yahoo.com> wrote:
I experimented a bit more (in fact, more like a lot more) with
the test batteries of L'Ecuyer. It led me to the conclusion that an
occasional failure in either the middle or the big battery means
nothing. Sometimes even a crypto-quality PRNG does not pass one test
or another. Then you try to reproduce it and see that with any other
seed that you try, the failure does not happen.
All in all, it makes me more suspicious of PRNGs that consistently
pass both batteries with various seeds. I am starting to see that as
a sign of a PRNG being rigged to pass tests.
Well, that depends on the tests and the thresholds in the tests.
Some tests, when fed with a truly random source, will produce
very small variation in the results. With a generous threshold,
such a test will essentially never fail for a truly random source.
OTOH, when the expected variation of the results is larger and
the threshold is tight, then a truly random source will fail the
test from time to time. And if you test long enough, you should
be able to estimate the probability of failure and possibly compare
it with the theoretical result, if available.
On Wed, 07 Jan 2026 13:54:21 -0800, Keith Thompson wrote:[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(3) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
It does provide an srand_deterministic() function that behaves the way
srand() is supposed to.
So then clang would use:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But I don't know (yet) that gcc does as well under OpenBSD.
Michael Sanders <porkchop@invalid.foo> writes:
On Wed, 07 Jan 2026 13:54:21 -0800, Keith Thompson wrote:[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(1) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
It does provide an srand_deterministic() function that behaves the way
srand() is supposed to.
So then clang would use:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But I don't know (yet) that gcc does as well under OpenBSD.
I don't know what you mean when you say that clang "would use"
that code.
I'm not aware that either clang or gcc uses random numbers
internally. I don't know why they would.
You could certainly write the above code and compile it with either
gcc or clang (or any other C compiler on OpenBSD). I've confirmed
that gcc on OpenBSD does predefine the symbol __OpenBSD__. There
should be no relevant difference between gcc and clang; random
number generation is implemented in the library, not in the compiler.
If your point is that a programmer using either gcc or clang could
use the above code to get the required deterministic behavior
for rand(), I agree. (Though it shouldn't be necessary; IMHO the
OpenBSD folks made a very bad decision.)
Relatedly, the NetBSD implementation of rand() is conforming, but
of very low quality. The low-order bit alternates between 0 and
1 on successive rand() calls, the two low-order bits repeat with
a cycle of 4, and so on. Larry Jones wrote about it here in 2010:
The even/odd problem was caused at Berkeley by a well meaning
but clueless individual who increased the range of the generator
(which originally matched the sample implementation) by returning
the *entire* internal state rather than just the high-order
bits of it. BSD was very popular, so that defective generator
got around a lot, unfortunately.
And I've just discovered that the OpenBSD rand() returns alternating
odd and even results after a call to srand_deterministic().
It's disturbing that this has never been fixed.
On Thu, 08 Jan 2026 14:44:27 -0800, Keith Thompson wrote:[...]
Michael Sanders <porkchop@invalid.foo> writes:
So then clang would use:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But I don't know (yet) that gcc does as well under OpenBSD.
I don't know what you mean when you say that clang "would use"
that code.
I'm not aware that either clang or gcc uses random numbers
internally. I don't know why they would.
Well, I meant the macro itself is (I'm guessing) probably defined
by clang, since it's the default compiler.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
John McCue <jmclnx@gmail.com.invalid> writes:
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(1) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
But your original statement implied that clang would *use* that
particular piece of code, which didn't make much sense. Were you
just asking about how the __OpenBSD__ macro is defined, without
reference to srand?
On Thu, 08 Jan 2026 22:46:42 -0800, Keith Thompson wrote:
But your original statement implied that clang would *use* that
particular piece of code, which didn't make much sense. Were you
just asking about how the __OpenBSD__ macro is defined, without
reference to srand?
Well, under OpenBSD I plan on using:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But what I was asking is whether or not gcc would recognize
the __OpenBSD__ macro (why wouldn't it, I'm assuming), since clang
is the default compiler.
But also about srand()... you've got me really wondering why
OpenBSD would deviate from the standard as they have. I get
that those folks disagree because it's deterministic, but
it's the accepted standard for srand() to be deterministic.
Only speaking for myself here: rather than srand_deterministic()
and an srand() that's not deterministic under OpenBSD, it
would've made more sense to have implemented
srand_non_deterministic() and left srand() alone. That design
decision on their part only muddies the waters in my thinking.
Live & learn =)
On Wed, 07 Jan 2026 08:41:25 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)
antispam@fricas.org (Waldek Hebisch) wrote:
[...]
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Do you think any of the tests in the TestU01 suite are actually
counter-indicated? As long as you don't think any TestU01 test
makes things worse, there is no reason not to use all of them.
You are always free to disregard tests you don't care about.
Except that it's difficult psychologically.
A battery of tests gains a position of authority in your mind.
Well, maybe you specifically are resistant, but I am not. Nor is
Melissa O'Neill, it seems.
To illustrate my point, I will tell you the story about myself.
Sort of confession.
[very large portion]
One important point that I seem to have figured out recently is that
the only practical way to produce a PRNG that is both solid and very
fast, and that adheres to standard language APIs with 32-bit (and to
a somewhat smaller extent 64-bit) output, is to use buffering. I.e.,
most of the time the generator simply reads a pre-calculated word
from a buffer, and only once per N iterations runs the actual PRNG
algorithm, probably in a loop, often in SIMD. For this approach to
be effective, the buffer can't be particularly small; 32 bytes (256
bits) appears to be an absolute minimum. The buffer, and the counter
that manages buffering, are parts of the generator state. That alone
sets a practical lower limit on the size of the generator and
diminishes the significance of the difference between PRNGs with an
"algorithmic" state of 64 bits, 128 bits or even 256 bits.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 07 Jan 2026 08:41:25 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)
antispam@fricas.org (Waldek Hebisch) wrote:
[...]
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the
Mersenne Twister is that it is implemented in available libraries.
Otherwise it is not so good: it has a large state and needs more
execution time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Do you think any of the tests in the TestU01 suite are actually
counter-indicated? As long as you don't think any TestU01 test
makes things worse, there is no reason not to use all of them.
You are always free to disregard tests you don't care about.
Except that it's difficult psychologically.
A battery of tests gains a position of authority in your mind.
Well, maybe you specifically are resistant, but I am not. Nor is
Melissa O'Neill, it seems.
To illustrate my point, I will tell you the story about myself.
Sort of confession.
[very large portion]
I have read through your whole posting several times, and also
looked through your other postings in this thread. Despite my
efforts I am still not sure what you think or what you're trying to
say.
Let me put it as a question. Do you think there is a good and
objective test for measuring the quality of a PRNG? If so what test
(or tests) do you think would suffice? Here "quality" is meant as
some sort of numeric measure, which could be a monotonic metric (as
in "the larger the number the higher the quality") or just a simple pass/fail.
If you don't think there is any such test, how do you propose that
PRNGs be evaluated?
On Tue, 03 Feb 2026 05:26:47 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
Let me put it as a question. Do you think there is a good and
objective test for measuring the quality of a PRNG? If so what test
(or tests) do you think would suffice? Here "quality" is meant as
some sort of numeric measure, which could be a monotonic metric (as
in "the larger the number the higher the quality") or just a simple
pass/fail.
I don't think that it is possible to create generic empirical tests
of this sort.
What is possible is a test that measures a specific property that is
known to be important for a specific use.
If you don't think there is any such test, how do you propose that
PRNGs be evaluated?
I don't know.
But I believe in one philosophical principle that I first learned ~40
years ago, in a field unrelated to PRNGs or to Computer Science: when
you don't know how to measure the quality of your product, it is
advisable to ask your consumer. Do not ask a random consumer; ask the
one who requested improvement of the property that you have difficulty
measuring. He is the one most likely to know.
The field where I first heard about that principle was the
manufacturing of ultra-pure chemicals.
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
On 18/02/2026 07:47, Tim Rentsch wrote:
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
For a PRNG, they exhibit the pattern of following the sequence of the PRNG!
Is it that, for any finite sequence of numbers from a PRNG, without
information about where it came from and how many numbers came before,
you can't predict the next number better than chance?
On 18/02/2026 12:21, Tristan Wibberley wrote:
On 18/02/2026 07:47, Tim Rentsch wrote:
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
For a PRNG, they exhibit the pattern of following the sequence of the PRNG!
As a deterministic function, a PRNG will obviously follow the pattern of
its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and thus
no way to predict the next number, 9, in the sequence. But there is a pattern there - it's the 90th - 100th digits of the decimal expansion of pi.
On 2026-02-19 04:01, David Brown wrote:
On 18/02/2026 12:21, Tristan Wibberley wrote:
On 18/02/2026 07:47, Tim Rentsch wrote:
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
For a PRNG, they exhibit the pattern of following the sequence of the PRNG!
As a deterministic function, a PRNG will obviously follow the pattern of
its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and thus
no way to predict the next number, 9, in the sequence. But there is a
pattern there - it's the 90th - 100th digits of the decimal expansion of pi.
I think you're being overoptimistic. I suspect that the pattern could
be identified, exactly, without knowing how it was generated. That's
because every possible pattern has infinitely many different ways in
which it can be produced. One of those other ways might be easier to
describe than the way in which the numbers were actually produced, in
which case that simpler way might be guessed more easily than the
actual one - possibly a lot easier.
David Brown <david.brown@hesbynett.no> writes:
[...]
As a deterministic function, a PRNG will obviously follow the pattern[...]
of its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and
thus no way to predict the next number, 9, in the sequence. But there
is a pattern there - it's the 90th - 100th digits of the decimal
expansion of pi.
A Google search for 342117067 gives numerous hits referring to the
digits of pi.
How likely is it that someone would guess a formula that happened to
generate the decimal digits of pi, without more knowledge than a part
of the sequence? I don't believe it is possible to quantify such a probability, but I would expect it to be very low.
On 2026-02-19 14:47, David Brown wrote:
...
How likely is it that someone would guess a formula that happened to
generate the decimal digits of pi, without more knowledge than a part
of the sequence? I don't believe it is possible to quantify such a
probability, but I would expect it to be very low.
I'm thinking of the kind of software that looks for patterns in
something, such as compression utilities. A compression utility
basically converts a long string of numbers into a much shorter string
that can be expanded by the decompression utility to recover the
original pattern. If you look at the algorithms such code uses, you
realize that they do not attempt to recreate the process that
originally generated the long string; they just, in effect,
characterize the resulting sequence.
On 19/02/2026 23:39, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
As a deterministic function, a PRNG will obviously follow the pattern[...]
of its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and
thus no way to predict the next number, 9, in the sequence. But there
is a pattern there - it's the 90th - 100th digits of the decimal
expansion of pi.
A Google search for 342117067 gives numerous hits referring to the
digits of pi.
That is using knowledge of where the sequence comes from - something else's knowledge rather than your own, but it's the same principle.
On Fri, 2/20/2026 3:16 AM, David Brown wrote:
On 19/02/2026 23:39, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
As a deterministic function, a PRNG will obviously follow the pattern[...]
of its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and
thus no way to predict the next number, 9, in the sequence. But there
is a pattern there - it's the 90th - 100th digits of the decimal
expansion of pi.
A Google search for 342117067 gives numerous hits referring to the
digits of pi.
That is using knowledge of where the sequence comes from - something else's knowledge rather than your own, but it's the same principle.
"In the following sequence, what is the next digit 7,7,7,7,7,7,7,7,7 ? " :-)
PI=3.
1415926535 8979323846 2643383279 5028841971 6939937510
...
7777777772 4846769425 9310468643 5260899021 0266057232 # Line 517834
I suspect seeing that, that's not good.
Using pgmp-chudnovsky.c , and dumping pi as a binary float to a file,
I get this:
(text version of PI) 100,000,022 bytes
PI-Binary.bin 41,524,121 bytes exponent and limbs
PI-Binary.bin.7Z 41,526,823 bytes 7Z Ultra compression, running on 1 core
The entropy property looks pretty good, but I doubt I would
be using that for my supply of random numbers :-)
https://gmplib.org/list-archives/gmp-discuss/2008-November/003444.html
https://stackoverflow.com/questions/3318979/how-to-serialize-the-gmp-mpf-type
https://gmplib.org/list-archives/gmp-discuss/2007-November/002981.html
gcc -DNO_FACTOR -fopenmp -Wall -O2 -o pgmp-chudnovsky pgmp-chudnovsky.c -lgmp -lm
Paul
On 23/02/2026 14:32, Paul wrote:
[...]
"In the following sequence, what is the next digit
7,7,7,7,7,7,7,7,7 ? " :-)
PI=3.
1415926535 8979323846 2643383279 5028841971 6939937510
...
7777777772 4846769425 9310468643 5260899021 0266057232 # Line
517834
I suspect seeing that, that's not good.
Using pgmp-chudnovsky.c , and dumping pi as a binary float to a
file, I get this:
(text version of PI) 100,000,022 bytes
PI-Binary.bin 41,524,121 bytes exponent and limbs
PI-Binary.bin.7Z 41,526,823 bytes 7Z Ultra
compression, running on 1 core
The entropy property looks pretty good, but I doubt I would
be using that for my supply of random numbers :-)
In a random sequence of decimal digits, you would expect a sequence
of nine identical digits to turn up on average every 10 ^ 8 digits or
so. You calculated 10 ^ 8 digits, so it's not surprising to see that
here.
As for your compression, remember that your text file contains only
the digits 0 to 9, spaces and newlines - 12 different characters in
8-bit bytes. If these were purely randomly distributed, you'd expect
a best compression ratio of log(12) / log(256), or 0.448. But they
are not completely random - your space characters and newlines are predictably spaced so you get marginally better compression ratios.
Without spaces and newlines, you'd expect log(10) / log(256)
compression - 0.415241012. What a coincidence - this matches your
"exponent and limbs", and your compressor can't improve on it. (I
downloaded a billion digits of pi and gzip'ed it, for a compression
ratio of 0.469.)
It turns out that the pseudo-randomness here is extremely good.
While it has not been proven that pi is "normal" (that is to say, its
digits are all evenly distributed), it is strongly believed to be so.
Of course it's not a great source of entropy for secure random
numbers, but the digits of pi form a fine pseudo-random generator
function (if you don't mind the calculation time).
On Mon, 23 Feb 2026 16:05:45 +0100
David Brown <david.brown@hesbynett.no> wrote:
[...]
It turns out that the pseudo-randomness here is extremely good.
While it has not been proven that pi is "normal" (that is to say, its
digits are all evenly distributed), it is strongly believed to be so.
Of course it's not a great source of entropy for secure random
numbers, but the digits of pi form a fine pseudo-random generator
function (if you don't mind the calculation time).
It would be interesting to find out if it passes L'Ecuyer's BigCrush.
Of course, one would need far more than a billion decimal digits to
have a chance. Something like 100B hexadecimal digits appears to be a
minimum.
Weirdly (at least /I/ think it is weird), it is easier to calculate hexadecimal digits of pi than decimal digits.
Of course, confirming that the hexadecimal digits of pi are random
enough to pass such a test does not ensure that the decimal digits
would do so too.
David Brown <david.brown@hesbynett.no> writes:
Of course, confirming that the hexadecimal digits of pi are random
enough to pass such a test does not ensure that the decimal digits
would do so too.
I was puzzled by the "Of course": To me, this is not intuitively clear.
Is there any easy (not too technical) way to "see this"/make it
plausible? My gut feeling (wrongly?) said that the base should not
affect the randomness of a numerical pattern. "Of course" I am aware
(and have taught dozens of numerical beginners) that, say, 0.1 in base
10 has a non-terminating representation in base 2, but neither
representation is random.
Pointers or simple counter-examples highly welcome!
Axel
Numbers can be (simply) normal in some bases and not in others. This
is easy to see if we pick related bases, such as base 2 and base 16.
For example, let x be 1/3. Then x is 0.0101010101... in base 2, where
the digits 0 and 1 each appear with frequency 1/2 - simply normal in
base 2. But in base 16 it is 0.55555555..., which is clearly very far
from normal: every digit is 5.