I experimented a bit more (in fact, more like a lot more) with
the test batteries of L’Ecuyer. It led me to the conclusion that an
occasional failure in either the middle or the big battery means nothing.
Sometimes even a crypto-quality PRNG does not pass one test or another.
Then you try to reproduce it and see that with any other seed you
try, the failure does not happen.
All in all, it makes me more suspicious of PRNGs that consistently pass
both batteries with various seeds. I am starting to see that as a sign
of a PRNG being rigged to pass tests.
The above does not apply to the scomp_LinearComp() failures of mt19937.
Those failures are very consistent. I just don't consider them
significant for my own use of PRNGs, or for any other use of PRNGs
that I have personally encountered.
Overall, the experience strengthened my position that the general
wisdom, previously shared by O'Neill herself, got it right: in the
absence of special considerations, people should select mt19937, and
especially mt19937-64, as their default PRNG of choice.
Looking closer, apart from its randomness properties and apart from
its huge period (which does not matter for me), I started to
appreciate mt19937 for the following properties that I was not aware
of before:
- it does not use a multiplier, so it is a good fit for embedded
systems that have no (or very slow) multiplier hardware.
- the algorithm is very SIMD-friendly, so optimized implementations
can be very fast on modern x86 and ARM64 hardware.
The latter property also means that a very fast FPGA implementation
is easily possible, as long as the designer is willing to throw a
moderate amount of logic resources at it.
The above does not mean that the PCG generators of O'Neill have no
place. Intuitively, they do not appear bad. But the theory is
unproven, an optimized implementation is likely slower than optimized
mt19937, and the claimed "security" advantages are nonsense, as
admitted later by O'Neill herself. And, as said above, I no longer
trust her empirical methodology, based on the work of L’Ecuyer.
So, the PCG generators are a valuable addition to the toolbox, but
not good enough to change my default.
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Pay attention that the C Standard only requires that the same seed
always produce the same sequence. There is no requirement that
different seeds produce different sequences.
So, for the generator in your example, an implementation like the one
below would be fully legal. Personally, I wouldn't even consider it
particularly poor quality:
void srand(unsigned seed) { init = seed | 1; }
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
Michael S <already5chosen@yahoo.com> writes:
[regarding rand() and srand()]
Pay attention that the C Standard only requires that the same seed
always produce the same sequence. There is no requirement that
different seeds produce different sequences.
So, for the generator in your example, an implementation like the
one below would be fully legal. Personally, I wouldn't even consider
it particularly poor quality:
void srand(unsigned seed) { init = seed | 1; }
It seems better to do, for example,
void srand(unsigned seed) { init = seed - !seed; }
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)[...]
antispam@fricas.org (Waldek Hebisch) wrote:
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Also, the TestU01 suite is made for generators with 32-bit output.
M. O'Neill used an ad hoc technique to make it applicable to
generators with 64-bit output. Is this technique right? Or maybe
it puts 64-bit PRNGs at an unfair disadvantage?
Besides, I strongly disagree with at least one assertion made by
O'Neill: "While security-related applications should use a secure
generator, because we cannot always know the future contexts in
which our code will be used, it seems wise for all applications to
avoid generators that make discovering their entire internal state
completely trivial."
No, I know exactly what I am doing. I know exactly that for my
application easy discovery of the complete state of the PRNG is not
a defect.
Anyway, even if I am skeptical about her criticism of popular PRNGs,
intuitively I agree with the constructive part of the article: a
medium-quality PRNG that feeds a medium-quality hash function can
potentially produce a very good, fast PRNG with a rather small
internal state.
On a related note, I think that even a simple counter fed into a
high-quality hash function (not cryptographically high quality, far
less than that) can produce an excellent PRNG with an even smaller
internal state. But not a very fast one. Although the speed
depends on the specifics of the computer used. I can imagine a
computer that has a low-latency Rijndael128 instruction. On such a
computer, running a counter through 3-4 rounds of Rijndael will
produce a very good PRNG that is only 2-3 times slower than, for
example, LCG 128/64.
John McCue <jmclnx@gmail.com.invalid> writes:
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
Michael S <already5chosen@yahoo.com> writes:
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)[...]
antispam@fricas.org (Waldek Hebisch) wrote:
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Do you think any of the tests in the TestU01 suite are actually
counter-indicated? As long as you don't think any TestU01 test
makes things worse, there is no reason not to use all of them.
You are always free to disregard tests you don't care about.
Also, the TestU01 suite is made for generators with 32-bit output.
M. O'Neill used an ad hoc technique to make it applicable to
generators with 64-bit output. Is this technique right? Or maybe
it puts 64-bit PRNGs at an unfair disadvantage?
As long as the same mapping is applied to all 64-bit PRNGs under
consideration, I don't see a problem. The point of the test is to
compare PRNGs, not to compare test methods. If someone thinks a
different set of tests is called for they are free to run them.
Besides, I strongly disagree with at least one assertion made by
O'Neill: "While security-related applications should use a secure
generator, because we cannot always know the future contexts in
which our code will be used, it seems wise for all applications to
avoid generators that make discovering their entire internal state
completely trivial."
No, I know exactly what I am doing. I know exactly that for my
application easy discovery of the complete state of the PRNG is not
a defect.
You and she are talking about different things. You are talking
about choosing a PRNG to be used only by yourself. She is talking
about choosing a PRNG to be made available to other people without
knowing who they are or what their needs are. In the second case
it's reasonable to raise the bar for the set of criteria that need
to be met.
Anyway, even if I am skeptical about her criticism of popular PRNGs,
intuitively I agree with the constructive part of the article: a
medium-quality PRNG that feeds a medium-quality hash function can
potentially produce a very good, fast PRNG with a rather small
internal state.
After looking at one of the example PCG generators, I would
describe it as a medium-quality PRNG that feeds a low-quality
hash. The particular combination I looked at produced good
results, but it isn't clear which combinations of PRNG and
hash would do likewise.
On a related note, I think that even a simple counter fed into a
high-quality hash function (not cryptographically high quality, far
less than that) can produce an excellent PRNG with an even smaller
internal state. But not a very fast one. Although the speed
depends on the specifics of the computer used. I can imagine a
computer that has a low-latency Rijndael128 instruction. On such a
computer, running a counter through 3-4 rounds of Rijndael will
produce a very good PRNG that is only 2-3 times slower than, for
example, LCG 128/64.
I think the point of her paper where she talks about determining
how much internal state is needed is to measure the efficacy of
the PRNG, not to try to reduce the amount of state needed. Based
on my own experience with various PRNGs I think it's a mistake to
try to minimize the amount of internal state needed. My own rule
of thumb is to allow at least a factor of four: for example, a
PRNG with a 32-bit output should have at least 128 bits of state.
My latest favorite has 256 bits of state to produce 32-bit
outputs (and so might also do well to produce 64-bit outputs, but
I haven't tested that).
[...]
Michael S <already5chosen@yahoo.com> wrote:
I experimented a bit more (in fact, more like a lot more) with
the test batteries of L’Ecuyer. It led me to the conclusion that an
occasional failure in either the middle or the big battery means
nothing. Sometimes even a crypto-quality PRNG does not pass one test
or another. Then you try to reproduce it and see that with any other
seed that you try, the failure does not happen.
All in all, it makes me more suspicious of PRNGs that consistently
pass both batteries with various seeds. I am starting to see that as
a sign of a PRNG being rigged to pass tests.
Yes, there are many stable tests. But there are also many unstable
ones. Well, that depends on the tests and the thresholds in the tests.
Some tests, when fed with a truly random source, will produce
very small variation in the results. With a generous threshold,
such a test will essentially never fail for a truly random source.
OTOH, when the expected variation of the results is larger and
the threshold is tight, then a truly random source will fail the
test from time to time. And if you test long enough, you should
be able to estimate the probability of failure and possibly compare
it with the theoretical result, if available.
To tell the truth, I do not know what failures of a crypto-quality
PRNG mean. It may mean that the test is of the tight kind that is
supposed to fail from time to time for a truly random source. Or it
may mean that the PRNG improves the crypto part at the cost of
statistics. That is, a non-uniform distribution of the output is not
a problem in crypto applications. Simply, for crypto purposes future
output should not be predictable from current and past output alone.
And a slightly non-uniform distribution can increase the probability
of test failure enough that you can observe such failures.
BTW: You mentioned using a counter and a hardware AES128 round.
Given a cheap AES128 round I would try a 128-bit state and the
AES128 round as the state update. I do not know if hardware AES128
is fast enough to make it worthwhile, but using an AES128 round as a
state update should be much better than scrambling a counter.
The above does not apply to the scomp_LinearComp() failures of
mt19937. Those failures are very consistent. I just don't consider
them significant for my own use of PRNGs, or for any other use of
PRNGs that I have personally encountered.
Overall, the experience strengthened my position that the general
wisdom, previously shared by O'Neill herself, got it right: in the
absence of special considerations, people should select mt19937, and
especially mt19937-64, as their default PRNG of choice.
Looking closer, apart from its randomness properties and apart
from its huge period
A huge period alone is easy. AFAICS, matrix variants of the LCG can
easily achieve quite large periods. I did not test matrix LCGs,
but on statistical tests they should be essentially as good
as multiprecision LCGs, while being cheaper to implement.
Just to be clear, I mean the equation x_{n+1} = A x_n + b, where x_n
is a vector representing the n-th state, A is a matrix and b is a
vector. The matrix A may be somewhat sparse, that is, have a limited
number of non-zero entries, and some entries may be simple, like
1 or -1. With a proper choice of A and b, the period should be
comparable with the number of available states.
I see no reason to go for very long periods; already 512 bits
of state allow periods which in practice should be indistinguishable
from an infinite period.
(which does not matter for me) I started to appreciate
mt19937 for the following properties that I was not aware of before:
- it does not use a multiplier, so it is a good fit for embedded
systems that have no (or very slow) multiplier hardware.
The biggest MCU with no hardware multiplier that I have has 2 kB of
RAM. I do not want mt19937 on such a machine. 64-bit multiplication
on an 8-bitter with a hardware multiplier may be slow, so a 64-bit
LCG (and improvements based on it) may be slow. But multiplication
nicely spreads out bits, so it is not clear to me if there is an
equally good cheaper alternative. If needed I would investigate
matrix LCGs; they may be slightly cheaper.
64-bit multipliers spread bits nicely, indeed. But implementing a
64-bit multiplier on a core that natively has only an 8x8=16-bit
multiply is not easy.
- the algorithm is very SIMD-friendly, so optimized implementations
can be very fast on modern x86 and ARM64 hardware.
Just the size of the state puts a limit on how fast it can be. And
the size of the state means that it will compete for cache with user
data. BTW: AFAICS a matrix LCG can be SIMD-friendly too.
The latter property also means that a very fast FPGA implementation
is easily possible, as long as the designer is willing to throw a
moderate amount of logic resources at it.
The above does not mean that the PCG generators of O'Neill have no
place. Intuitively, they do not appear bad. But the theory is
unproven, an optimized implementation is likely slower than
optimized mt19937, and the claimed "security" advantages are
nonsense, as admitted later by O'Neill herself. And, as said above,
I no longer trust her empirical methodology, based on the work of
L’Ecuyer.
So, the PCG generators are a valuable addition to the toolbox, but
not good enough to change my default.
I agree that ATM it is not entirely clear if PCGs are as good as
suggested by the tests. But I am surprised by your opinion about
speed. I did not analyze either of them deeply, but PCGs are way
simpler, so I would expect them to be faster.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
John McCue <jmclnx@gmail.com.invalid> writes:
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(3) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
It does provide an srand_deterministic() function that behaves the
way srand() is supposed to.
Michael S <already5chosen@yahoo.com> wrote:[...]
Anyway, even if I am skeptical about her criticism of popular PRNGs,
intuitively I agree with the constructive part of the article -
medium-quality PRNG that feeds medium quality hash function can
potentially produce very good fast PRNG with rather small internal
state.
She seems to care very much about having the minimal possible state.
That may be nice on embedded systems, but in general I would
happily accept a slightly bigger state (say, 256 bits). But if
we can get good properties with a very small state, then why not?
[...]
Michael S <already5chosen@yahoo.com> wrote:
I experimented a bit more (in fact, more like a lot more) with
the test batteries of L'Ecuyer. It led me to the conclusion that an
occasional failure in either the middle or the big battery means
nothing. Sometimes even a crypto-quality PRNG does not pass one test
or another. Then you try to reproduce it and see that with any other
seed that you try, the failure does not happen.
All in all, it makes me more suspicious of PRNGs that consistently
pass both batteries with various seeds. I am starting to see that as
a sign of a PRNG being rigged to pass tests.
Well, that depends on the tests and the thresholds in the tests.
Some tests, when fed with a truly random source, will produce
very small variation in the results. With a generous threshold,
such a test will essentially never fail for a truly random source.
OTOH, when the expected variation of the results is larger and
the threshold is tight, then a truly random source will fail the
test from time to time. And if you test long enough, you should
be able to estimate the probability of failure and possibly compare
it with the theoretical result, if available.
On Wed, 07 Jan 2026 13:54:21 -0800, Keith Thompson wrote:[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(3) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
It does provide an srand_deterministic() function that behaves the way
srand() is supposed to.
So then clang would use:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But I don't know (yet) that gcc does as well under OpenBSD.
Michael Sanders <porkchop@invalid.foo> writes:
On Wed, 07 Jan 2026 13:54:21 -0800, Keith Thompson wrote:[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(1) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
It does provide an srand_deterministic() function that behaves the way
srand() is supposed to.
So then clang would use:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But I don't know (yet) that gcc does as well under OpenBSD.
I don't know what you mean when you say that clang "would use"
that code.
I'm not aware that either clang or gcc uses random numbers
internally. I don't know why they would.
You could certainly write the above code and compile it with either
gcc or clang (or any other C compiler on OpenBSD). I've confirmed
that gcc on OpenBSD does predefine the symbol __OpenBSD__. There
should be no relevant difference between gcc and clang; random
number generation is implemented in the library, not in the compiler.
If your point is that a programmer using either gcc or clang could
use the above code to get the required deterministic behavior
for rand(), I agree. (Though it shouldn't be necessary; IMHO the
OpenBSD folks made a very bad decision.)
Relatedly, the NetBSD implementation of rand() is conforming, but
of very low quality. The low-order bit alternates between 0 and
1 on successive rand() calls, the two low-order bits repeat with
a cycle of 4, and so on. Larry Jones wrote about it here in 2010:
The even/odd problem was caused at Berkeley by a well meaning
but clueless individual who increased the range of the generator
(which originally matched the sample implementation) by returning
the *entire* internal state rather than just the high-order
bits of it. BSD was very popular, so that defective generator
got around a lot, unfortunately.
And I've just discovered that the OpenBSD rand() returns alternating
odd and even results after a call to srand_deterministic().
It's disturbing that this has never been fixed.
On Thu, 08 Jan 2026 14:44:27 -0800, Keith Thompson wrote:[...]
Michael Sanders <porkchop@invalid.foo> writes:
So then clang would use:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But I don't know (yet) that gcc does as well under OpenBSD.
I don't know what you mean when you say that clang "would use"
that code.
I'm not aware that either clang or gcc uses random numbers
internally. I don't know why they would.
Well, I meant the macro itself is (I'm guessing) probably defined
by clang, since it's the default compiler.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
John McCue <jmclnx@gmail.com.invalid> writes:
Michael Sanders <porkchop@invalid.foo> wrote:
Is it incorrect to use 0 (zero) to seed srand()?
int seed = (argc >= 2 && strlen(argv[1]) == 9)
? atoi(argv[1])
: (int)(time(NULL) % 900000000 + 100000000);
srand(seed);
I like to just read /dev/urandom when I need a random
number. Seems easier and more portable across Linux &
the *BSDs.
int s;
read(fd, &s, sizeof(int));
Apples and oranges. Many applications that use random numbers
need a stream of numbers that is deterministic and reproducible,
which /dev/urandom is not.
And neither is the non-conforming rand() on OpenBSD.
The rand(1) man page on OpenBSD 7.8 says:
Standards insist that this interface return deterministic
results. Unsafe usage is very common, so OpenBSD changed the
subsystem to return non-deterministic results by default.
To satisfy portable code, srand() may be called to initialize
the subsystem. In OpenBSD the seed variable is ignored,
and strong random number results will be provided from
arc4random(3). In other systems, the seed variable primes a
simplistic deterministic algorithm.
But your original statement implied that clang would *use* that
particular piece of code, which didn't make much sense. Were you
just asking about how the __OpenBSD__ macro is defined, without
reference to srand?
On Thu, 08 Jan 2026 22:46:42 -0800, Keith Thompson wrote:
But your original statement implied that clang would *use* that
particular piece of code, which didn't make much sense. Were you
just asking about how the __OpenBSD__ macro is defined, without
reference to srand?
Well, under OpenBSD I plan on using:
#ifdef __OpenBSD__
srand_deterministic(seed);
#else
srand(seed);
#endif
But what I was asking is whether or not gcc would recognize
the __OpenBSD__ macro (why wouldn't it, I'm assuming), since clang
is the default compiler.
But also about srand()... you've got me really wondering why
OpenBSD would deviate from the standard as they have. I get
that those folks disagree because it's deterministic, but
it's the accepted standard for srand() to be deterministic.
Only speaking for myself here: rather than srand_deterministic()
and an srand() that's not deterministic under OpenBSD, it
would've made more sense to have implemented
srand_non_deterministic() and left srand() alone. That design
decision on their part only muddies the waters in my thinking.
Live & learn =)
On Wed, 07 Jan 2026 08:41:25 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)
antispam@fricas.org (Waldek Hebisch) wrote:
[...]
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the Mersenne
Twister is that it is implemented in available libraries. Otherwise
it is not so good: it has a large state and needs more execution
time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Do you think any of the tests in the TestU01 suite are actually
counter-indicated? As long as you don't think any TestU01 test
makes things worse, there is no reason not to use all of them.
You are always free to disregard tests you don't care about.
Except that it's difficult psychologically.
A battery of tests gains a position of authority in your mind.
Well, maybe you specifically are resistant, but I am not. Nor is
Melissa O'Neill, it seems.
To illustrate my point, I will tell you the story about myself.
Sort of confession.
[very large portion]
One important point that I seem to have figured out recently is that
the only practical way to produce a PRNG that is both solid and very
fast, and that adheres to standard language APIs with 32-bit (and to
a somewhat smaller extent 64-bit) output, is to use buffering. I.e.,
most of the time the generator simply reads a pre-calculated word
from a buffer, and only once per N iterations runs the actual PRNG
algorithm, probably in a loop, often in SIMD. For this approach to
be effective, the buffer can't be particularly small; 32 bytes (256
bits) appears to be an absolute minimum. The buffer, and the counter
that manages buffering, are parts of the generator state. That alone
sets a practical lower limit on the size of the generator and
diminishes the significance of the difference between PRNGs with an
"algorithmic" state of 64 bits, 128 bits or even 256 bits.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 07 Jan 2026 08:41:25 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Tue, 23 Dec 2025 17:54:05 -0000 (UTC)
antispam@fricas.org (Waldek Hebisch) wrote:
[...]
There is a paper "PCG: A Family of Simple Fast Space-Efficient
Statistically Good Algorithms for Random Number Generation"
by M. O'Neill where she gives a family of algorithms and runs
several statistical tests against known algorithms. Mersenne
Twister does not look good in the tests. If you have enough (128)
bits, LCGs do pass the tests. A bunch of generators with 64-bit
state also pass the tests. So the only reason to prefer the
Mersenne Twister is that it is implemented in available libraries.
Otherwise it is not so good: it has a large state and needs more
execution time than alternatives.
I don't know. Testing randomness is a complicated matter.
How can I be sure that L'Ecuyer and Simard's TestU01 suite tests
things that I personally care about and does not test
things that are of no interest to me? Especially the latter.
Do you think any of the tests in the TestU01 suite are actually
counter-indicated? As long as you don't think any TestU01 test
makes things worse, there is no reason not to use all of them.
You are always free to disregard tests you don't care about.
Except that it's difficult psychologically.
A battery of tests gains a position of authority in your mind.
Well, maybe you specifically are resistant, but I am not. Nor is
Melissa O'Neill, it seems.
To illustrate my point, I will tell you the story about myself.
Sort of confession.
[very large portion]
I have read through your whole posting several times, and also
looked through your other postings in this thread. Despite my
efforts I am still not sure what you think or what you're trying to
say.
Let me put it as a question. Do you think there is a good and
objective test for measuring the quality of a PRNG? If so what test
(or tests) do you think would suffice? Here "quality" is meant as
some sort of numeric measure, which could be a monotonic metric (as
in "the larger the number the higher the quality") or just a simple pass/fail.
If you don't think there is any such test, how do you propose that
PRNGs be evaluated?
On Tue, 03 Feb 2026 05:26:47 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
Let me put it as a question. Do you think there is a good and
objective test for measuring the quality of a PRNG? If so what test
(or tests) do you think would suffice? Here "quality" is meant as
some sort of numeric measure, which could be a monotonic metric (as
in "the larger the number the higher the quality") or just a simple
pass/fail.
I don't think that it is possible to create generic empirical tests
of this sort.
What is possible is a test that measures a specific property that is
known to be important for a specific use.
If you don't think there is any such test, how do you propose that
PRNGs be evaluated?
I don't know.
But I believe in one philosophical principle that I first learned ~40
years ago, in a field unrelated to PRNGs or to Computer Science: when
you don't know how to measure the quality of your product, it is
advisable to ask your consumer. Do not ask a random consumer; ask the
one who requested improvement of the property that you have difficulty
measuring. He is the one most likely to know.
The field where I first heard about that principle was the
manufacturing of ultra-pure chemicals.
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
On 18/02/2026 07:47, Tim Rentsch wrote:
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
For a PRNG, they exhibit the pattern of following the sequence of the PRNG!
Is it that, for any finite sequence of numbers from a PRNG, without
information about where it came from and how many numbers came before,
you can't predict the next number better than chance?
On 18/02/2026 12:21, Tristan Wibberley wrote:
On 18/02/2026 07:47, Tim Rentsch wrote:
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
For a PRNG, they exhibit the pattern of following the sequence of the PRNG!
As a deterministic function, a PRNG will obviously follow the pattern of
its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and thus
no way to predict the next number, 9, in the sequence. But there is a pattern there - it's the 90th - 100th digits of the decimal expansion of pi.
On 2026-02-19 04:01, David Brown wrote:
On 18/02/2026 12:21, Tristan Wibberley wrote:
On 18/02/2026 07:47, Tim Rentsch wrote:
The key property of a (pseudo) random number generator is that the
values produced exhibit no discernible pattern.
For a PRNG, they exhibit the pattern of following the sequence of the PRNG!
As a deterministic function, a PRNG will obviously follow the pattern of
its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and thus
no way to predict the next number, 9, in the sequence. But there is a
pattern there - it's the 90th - 100th digits of the decimal expansion of pi.
I think you're being overoptimistic. I suspect that the pattern could
be identified, exactly, without knowing how it was generated. That's
because every possible pattern has infinitely many different ways in
which it can be produced. One of those other ways might be easier to
describe than the way in which the numbers were actually produced, in
which case that simpler way might be guessed more easily than the
actual one - possibly a lot easier.
David Brown <david.brown@hesbynett.no> writes:
[...]
As a deterministic function, a PRNG will obviously follow the pattern[...]
of its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and
thus no way to predict the next number, 9, in the sequence. But there
is a pattern there - it's the 90th - 100th digits of the decimal
expansion of pi.
A Google search for 342117067 gives numerous hits referring to the
digits of pi.
How likely is it that someone would guess a formula that happened to
generate the decimal digits of pi, without more knowledge than a part
of the sequence? I don't believe it is possible to quantify such a probability, but I would expect it to be very low.
On 2026-02-19 14:47, David Brown wrote:
...
How likely is it that someone would guess a formula that happened to
generate the decimal digits of pi, without more knowledge than a part
of the sequence? I don't believe it is possible to quantify such a
probability, but I would expect it to be very low.
I'm thinking of the kind of software that looks for patterns in
something, such as compression utilities. A compression utility
basically converts a long string of numbers into a much shorter string
that can be expanded by the decompression utility to recover the
original pattern. If you look at the algorithms such code uses, you
realize that they do not attempt to recreate the process that
originally generated the long string; they just, in effect,
characterize the resulting sequence.
On 19/02/2026 23:39, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
As a deterministic function, a PRNG will obviously follow the pattern[...]
of its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and
thus no way to predict the next number, 9, in the sequence. But there
is a pattern there - it's the 90th - 100th digits of the decimal
expansion of pi.
A Google search for 342117067 gives numerous hits referring to the
digits of pi.
That is using knowledge of where the sequence comes from - something else's knowledge rather than your own, but it's the same principle.
On Fri, 2/20/2026 3:16 AM, David Brown wrote:
On 19/02/2026 23:39, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
As a deterministic function, a PRNG will obviously follow the pattern[...]
of its generating function. But the aim is to have no /discernible/
pattern. The sequence 3, 4, 2, 1, 1, 7, 0, 6, 7 has no pattern that
could be identified without knowledge of where they came from - and
thus no way to predict the next number, 9, in the sequence. But there
is a pattern there - it's the 90th - 100th digits of the decimal
expansion of pi.
A Google search for 342117067 gives numerous hits referring to the
digits of pi.
That is using knowledge of where the sequence comes from - something else's knowledge rather than your own, but it's the same principle.
"In the following sequence, what is the next digit 7,7,7,7,7,7,7,7,7 ? " :-)
PI=3.
1415926535 8979323846 2643383279 5028841971 6939937510
...
7777777772 4846769425 9310468643 5260899021 0266057232 # Line 517834
I suspect seeing that, that's not good.
Using pgmp-chudnovsky.c , and dumping pi as a binary float to a file,
I get this:
(text version of PI) 100,000,022 bytes
PI-Binary.bin 41,524,121 bytes exponent and limbs
PI-Binary.bin.7Z 41,526,823 bytes 7Z Ultra compression, running on 1 core
The entropy property looks pretty good, but I doubt I would
be using that for my supply of random numbers :-)
https://gmplib.org/list-archives/gmp-discuss/2008-November/003444.html
https://stackoverflow.com/questions/3318979/how-to-serialize-the-gmp-mpf-type
https://gmplib.org/list-archives/gmp-discuss/2007-November/002981.html
gcc -DNO_FACTOR -fopenmp -Wall -O2 -o pgmp-chudnovsky pgmp-chudnovsky.c -lgmp -lm
Paul
On 23/02/2026 14:32, Paul wrote:
[...]
"In the following sequence, what is the next digit
7,7,7,7,7,7,7,7,7 ? " :-)
PI=3.
1415926535 8979323846 2643383279 5028841971 6939937510
...
7777777772 4846769425 9310468643 5260899021 0266057232 # Line
517834
I suspect seeing that, that's not good.
Using pgmp-chudnovsky.c , and dumping pi as a binary float to a
file, I get this:
(text version of PI) 100,000,022 bytes
PI-Binary.bin 41,524,121 bytes exponent and limbs
PI-Binary.bin.7Z 41,526,823 bytes 7Z Ultra
compression, running on 1 core
The entropy property looks pretty good, but I doubt I would
be using that for my supply of random numbers :-)
In a random sequence of decimal digits, you would expect a sequence
of nine identical digits to turn up on average every 10 ^ 8 digits or
so. You calculated 10 ^ 8 digits, so it's not surprising to see that
here.
As for your compression, remember that your text file contains only
the digits 0 to 9, spaces and newlines - 12 different characters in
8-bit bytes. If these were purely randomly distributed, you'd expect
a best compression ratio of log(12) / log(256), or 0.448. But they
are not completely random - your space characters and newlines are predictably spaced so you get marginally better compression ratios.
Without spaces and newlines, you'd expect log(10) / log(256)
compression - 0.415241012. What a coincidence - this matches your
"exponent and limbs", and your compressor can't improve on it. (I
downloaded a billion digits of pi and gzip'ed it, for a compression
ratio of 0.469.)
It turns out that the pseudo-randomness here is extremely good.
While it has not been proven that pi is "normal" (that is to say, its
digits are all evenly distributed), it is strongly believed to be so.
Of course it's not a great source of entropy for secure random
numbers, but the digits of pi form a fine pseudo-random generator
function (if you don't mind the calculation time).
On Mon, 23 Feb 2026 16:05:45 +0100
David Brown <david.brown@hesbynett.no> wrote:
[...]
It turns out that the pseudo-randomness here is extremely good.
While it has not been proven that pi is "normal" (that is to say, its
digits are all evenly distributed), it is strongly believed to be so.
Of course it's not a great source of entropy for secure random
numbers, but the digits of pi form a fine pseudo-random generator
function (if you don't mind the calculation time).
It would be interesting to find out if it passes L'Ecuyer's BigCrush.
Of course, one would need far more than a billion decimal digits to
have a chance. Something like 100B hexadecimal digits appears to be a
minimum.
Weirdly (at least /I/ think it is weird), it is easier to calculate hexadecimal digits of pi than decimal digits.
Of course, confirming that the hexadecimal digits of pi are random
enough to pass such a test does not ensure that the decimal digits
would do so too.
David Brown <david.brown@hesbynett.no> writes:
Of course, confirming that the hexadecimal digits of pi are random
enough to pass such a test does not ensure that the decimal digits
would do so too.
I was puzzled by the "Of course": To me, this is not intuitively clear.
Is there any easy (not too technical) way to "see this"/make it
plausible? My gut feeling (wrongly?) said that the base should not
affect the randomness of a numerical pattern. "Of course" I am aware
(and have taught dozens of numerical beginners) that, say, 0.1 in base
10 has a non-terminating representation in base 2, but neither
representation is random.
Pointers or simple counter-examples highly welcome!
Axel
Numbers can be (simply) normal in some bases and not in others. This
is easy to see if we pick related bases, such as base 2 and base 16.
For example, let x be 1/3. Then x is 0.0101010101... in base 2, where
the digits 0 and 1 each appear with frequency 1/2 - simply normal in
base 2. But in base 16 it is 0.55555555..., which is clearly very far
from normal: every digit is 5.