DFS <nospam@dfs.com> writes:
On 3/21/2026 2:07 AM, Tim Rentsch wrote:
DFS <nospam@dfs.com> writes:
On 3/12/2026 2:24 AM, Bonita Montero wrote:
There was a programming-"contest" on comp.lang.c and I wanted to show the simpler code in C, here it is:
[C++ code]
C++ really rocks since you have to deal with much less detail than in C.
Is your strategy to just ignore reality, and keep making bogus claims that - for this challenge at least - you can't support?
Please don't encourage people who insist on using comp.lang.c
for other languages. It's better to just ignore them.
It doesn't bother me.
I'm sorry you place so little value on my participation.
It's no more off-topic than a discussion of sorting algorithms.
That's wrong. There have been lots of posts in the discussion of
sorting that are relevant to C. The problem with encouraging
discussion of C++, or Python, or other such languages, is they
tend to produce lots of postings that have nothing to do with C,
and rarely produce postings that have anything to do with C.
Not true at all [1]. But clc is a "No Kings" newsgroup, and nobody
can dictate who or what kind of posts are made. If Montero or I annoy
you, killfile us.
On 21/03/2026 02:35, DFS wrote:
On 3/20/2026 9:53 PM, Janis Papanagnou wrote:
On 2026-03-20 22:10, DFS wrote:
Sometimes listening to you clc guys is like listening to a room
full of CS professors (which I suspect some of you are or were).
CS professors sitting in their ivory tower and without practical
experiences, and programmers without a substantial CS background;
both of these extreme characters can be problematic (if fanatic).
It was meant as a compliment. Plenty of CS professors have good
practical and industry experience, too, which you'll see on their
bios and cv's.
It's got to be tough to find good CS teachers that stick around,
given they can probably make much more money in private industry.
Many folks here seem to have a good mix of the necessary practical
IT, project, and CS knowledge and proficiencies, as I'd rate it.
And a clear mind for discussing topics. Sometimes it gets heated and
personal, though; that's certainly nothing to foster.
I've definitely seen Bart take some heat for continuing to post code
in his scripting language, rather than C
People post bits of code in all sorts of languages when it suits
them. It's happened in this thread (Bash, AWK, Python as well as C++),
and in many others.
But I get a lot of heat because I use my personal languages,
Heapsort is only marginally more complex than bubble sort [...]
On average heapsort is not as fast as quicksort (the main factor
seems to be much worse cache behaviour),
-----------------------
void isort(char** data, int ll, int rr) {
char* temp;
int i = ll, j = rr;
char* pivot = data[(ll + rr) / 2];
do {
while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
while (strcmp(pivot, data[j]) < 0 && j > ll) --j;
if (i <= j) {
temp = data[i]; data[i] = data[j]; data[j] = temp;
++i;
--j;
}
} while (i <= j);
if (ll < j) isort(data, ll, j);
if (i < rr) isort(data, i, rr);
}
//For a char* array A of N elements, call as 'isort(A, 0, N-1)'.
Looks o.k. speed wise.
It is not idiomatic C,
On Fri, 20 Mar 2026 14:47:18 +0100
David Brown <david.brown@hesbynett.no> wrote:
The cost of doing a comparison does not affect the complexity of
an algorithm - but it can certainly affect the best choice of
algorithm for the task in hand.
In the case of a lexicographic sort, the comparison most certainly
affects BigO. BigO is about counting elementary operations or, at
least, about counting non-elementary operations whose complexity is
independent of N. In our specific case (described in the previous
paragraph) the average number of elementary comparisons within
strcmp() does depend on N. One can expect that it grows with N,
because for larger N quicksort spends relatively more time
comparing strings with longer and longer common prefixes.
I did a couple of experiments.
Here is the number of character comparisons during sorting by
quicksort with a strcmp-like comparison, as a function of the
number of sorted strings. Inputs used for the experiments were the
first N lines of two long books (those that I suggested to Bart in
another post in this thread).
Book 1:
1000 40367
2000 93404
3000 156995
4000 238062
5000 322275
6000 401332
7000 483294
8000 588128
9000 707559
10000 811159
11000 969934
12000 1061037
13000 1172776
14000 1327489
15000 1456356
16000 1565355
17000 1691351
18000 1811836
19000 1971700
20000 2101477
21000 2237132
22000 2348410
23000 2492149
24000 2618858
25000 2726828
26000 2881609
27000 3017903
28000 3183088
29000 3320716
30000 3523695
31000 3719782
32000 3852290
33000 4007476
34000 4149704
35000 4228143
36000 4399583
37000 4581223
38000 4761085
39000 4919182
40000 5110223
41000 5463271
42000 5605860
43000 5830282
44000 6096857
45000 6464758
46000 6860786
47000 7101972
48000 7434769
49000 7736464
50000 8028681
51000 8334358
52000 8560822
53000 8814096
54000 8986343
55000 9162604
56000 9358948
57000 9468853
58000 9638512
59000 9863370
60000 10000734
61000 10242187
62000 10500148
63000 10654166
64000 10882232
65000 11187624
66000 11290778
67000 11481047
68000 11664433
69000 11858529
70000 12059970
71000 12173240
72000 12232160
73000 12418818
74000 12577954
75000 12786122
76000 12939916
77000 13083515
78000 13275432
79000 13616563
80000 13797048
81000 14137362
82000 14388124
83000 14667272
84000 14914600
85000 15174195
86000 15227048
87000 15414838
88000 15526278
89000 15670034
90000 15966396
91000 16120859
92000 16267285
93000 16402168
94000 16718506
95000 16968960
96000 17169618
97000 17362159
98000 17486179
99000 17944209
100000 18085095
101000 18455071
102000 18733130
102100 18769521
Book 2:
1000 104846
2000 243080
3000 399564
4000 661436
5000 855870
6000 1017042
7000 1205936
8000 1341884
9000 1511727
10000 1681973
11000 1820114
12000 1978196
13000 2081813
14000 2162091
15000 2280537
16000 2375272
17000 2500672
18000 2617364
19000 2787782
20000 2960678
21000 3149483
22000 3286894
23000 3481047
24000 3638352
25000 3842186
26000 4001105
27000 4083466
28000 4239273
29000 4399250
30000 4505316
30384 4598719
It is easy to see that at the beginning the # of comparisons grows
faster than N*log(N). For the 1st book the trend continues
throughout the whole chart. For the 2nd book it does not.
I'd guess that the reason behind this is that the 1st book is
relatively homogeneous, while the 2nd book consists of rather
distinct parts, likely with different characteristics at the
beginnings of lines. In parts the 2nd book is full of periods, of
rhythmical prose close to poetry, and in other parts it is not. I
didn't have time to dig deeper.
Michael S <already5chosen@yahoo.com> writes:
[discussion of sorting code]
[isort code quoted above]
Looks o.k. speed wise.
It is not idiomatic C,
What about the code prompts you to say it is not idiomatic C?
Is it because there is a line with multiple statements on it,
or is there something else?
Personally I would not call this code non-idiomatic, apart from
having three statements on one line, a minor style exception.
Also I think Bart should be applauded for posting code that
AFAICT falls completely within the confines of ISO C, and
even looks fairly natural, style-wise.
Michael S <already5chosen@yahoo.com> writes:
On Fri, 20 Mar 2026 14:47:18 +0100
David Brown <david.brown@hesbynett.no> wrote:
The cost of doing a comparison does not affect the complexity of
an algorithm - but it can certainly affect the best choice of
algorithm for the task in hand.
In the case of a lexicographic sort, the comparison most certainly
affects BigO. BigO is about counting elementary operations or, at
least, about counting non-elementary operations whose complexity is
independent of N. In our specific case (described in the previous
paragraph) the average number of elementary comparisons within
strcmp() does depend on N. One can expect that it grows with N,
because for larger N quicksort spends relatively more time
comparing strings with longer and longer common prefixes.
This reasoning isn't right. The reason it isn't is that, even as
more words are sorted, the words don't get longer. Average match
length doesn't matter. All words in the two files mentioned are, as
best I can determine, not longer than 40 characters. Because the
length is bounded, so is the time needed to do a strcmp() call. In
other words the comparison operation is O(1). Using strcmp() can
change the constant factor, but it doesn't change BigO.
I did a couple of experiments.
Here is the number of character comparisons during sorting by
quicksort with a strcmp-like comparison, as a function of the
number of sorted strings. Inputs used for the experiments were the
first N lines of two long books (those that I suggested to Bart in
another post in this thread).
[comparison-count tables for Book 1 and Book 2, quoted in full above, elided]
It is easy to see that at the beginning the # of comparisons grows
faster than N*log(N). For the 1st book the trend continues
throughout the whole chart. For the 2nd book it does not.
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
That choice has an
effect, but as long as line lengths are bounded the strcmp() calls
are still O(1).
I'd guess that the reason behind this is that the 1st book is
relatively homogeneous, while the 2nd book consists of rather
distinct parts, likely with different characteristics at the
beginnings of lines. In parts the 2nd book is full of periods, of
rhythmical prose close to poetry, and in other parts it is not. I
didn't have time to dig deeper.
A more careful experiment is needed.
On Mon, 06 Apr 2026 13:48:04 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[discussion of sorting code]
[isort code quoted above]
Looks o.k. speed wise.
It is not idiomatic C,
What about the code prompts you to say it is not idiomatic C?
Is it because there is a line with multiple statements on it,
or is there something else?
Something else.
Here is a mechanical translation of Bart's code to what I consider
more idiomatic C. Note that the translation was mechanical; I didn't
check whether the original logic is 100% correct or whether it is
possible to omit some checks.
void isort(char** data, int len) {
if (len > 1) {
char **i = data, **j = &data[len-1];
char* pivot = data[(len-1) / 2];
do {
while (strcmp(pivot, *i) > 0 && i < &data[len-1]) ++i;
while (strcmp(pivot, *j) < 0 && j > data) --j;
if (i <= j) {
char* temp = *i; *i = *j; *j = temp;
++i;
--j;
}
} while (i <= j);
if (j > data) isort(data, j+1-data);
if (i < &data[len-1]) isort(i, &data[len]-i);
}
}
Whether i and j are pointers or integers is less important.
But passing two indices into the call instead of one length, and the
fact that the second index points at the last element of the array
instead of one place past the last element, is certainly non-idiomatic.
On Thu, 19 Mar 2026 14:49:16 +0000
Bart <bc@freeuk.com> wrote:
On 19/03/2026 09:19, Michael S wrote:
On Wed, 18 Mar 2026 11:20:03 +0100
David Brown <david.brown@hesbynett.no> wrote:
Normally however (and again in scripting code) I'd use my built-in
sort() based on quicksort, which is nearly 1000 times faster than
bubble-sort for my test (sort 20K random strings), and some 300x
faster than your routines. It's not O(n-squared) either.
For a lexicographic sort of 20K random strings, plain quicksort is
probably quite sub-optimal.
If performance is important, I'd consider a combined method: first
pre-sort by a 3-char or 4-char prefix with radix sort (the 'least
significant character first' variation of the algorithm), then use
quicksort to sort sections with the same prefix. For strings taken
from the real world it will not work as well as for artificial
random strings, but it should still significantly outperform plain
quicksort.
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with average length of few
dozens characters, i.e. many times longer than log2(N). That was
stated at one of earlier posts.
That has nothing to do with BigO, which is about asymptotic behavior
as N goes to infinity.
On Wed, 18 Mar 2026 11:20:03 +0100 [...]
David Brown <david.brown@hesbynett.no> wrote:
I don't think pure bubblesort has much serious use outside
educational purposes, but it is easy to understand, easy to
implement correctly in pretty much any language, and can do a
perfectly good job if you don't need efficient sorting of large
datasets (for small enough datasets, a hand-written bubblesort
in C will be faster than calling qsort).
IMHO, Bubble Sort is harder to understand than Straight Selection
Sort.
My conclusions...
The right place to start on sorting algorithms is with bubble
sort, not because it's a good algorithm, but because it's the
easiest one to show, and what is more important it's a good way
to teach about how algorithms evolve.
There is no reason to use the Knuth version of bubble sort.
Compared to ordinary bubble sort the upside is small and not
worth the downside of needing more intricate code.
If there is a good chance that input starts off being close to
sorted, ping pong sort can be considered.
In almost all cases prefer binary insertion sort to plain
insertion sort, not because the average case is better but
because the worst case is better. Moreover the extra code needed
is small and easily understood.
Writing a full-scale sorting algorithm, such as a replacement for
qsort, is a non-trivial undertaking.
Other considerations apply if the sort needs to be stable, but
surely this posting is long enough already. :)
bubble 75638850 37391291 100.000
knuth bubble 75611850 37391291 99.964
gnome 74794859 37391291 98.884
ping pong 49983149 37391291 66.081
selection 75638850 12289 100.000
insertion 37403579 37391291 49.450
titanic 75638850 37391291 100.000
seven up 37403579 37391291 49.450
heapsort 309459 168889 0.409
quicksort 192033 117831 0.254
binary insertion 150102 37391291 0.198
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with average length of few
dozens characters, i.e. many times longer than log2(N). That was
stated at one of earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
An honest Big(O) analysis varies the length of the key with N. In
the practical range the key length may be constant, but fixing the
length gives an unrealistic problem for Big(O) analysis: without
varying key length there are finitely many keys, and sorting is
equivalent to counting how many times each key appears in the input.
On Mon, 06 Apr 2026 21:00:46 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with average length of few
dozens characters, i.e. many times longer than log2(N). That was
stated at one of earlier posts.
That has nothing to do with BigO, which is about asymptotic behavior
as N goes to infinity.
So, what do you call the part of the curve from 1e2 to 3e5 lines? MiddleO?
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 21:00:46 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with average length of few
dozens characters, i.e. many times longer than log2(N). That was
stated at one of earlier posts.
That has nothing to do with BigO, which is about asymptotic behavior
as N goes to infinity.
So, what do you call the part of the curve from 1e2 to 3e5 lines? MiddleO?
Consider a thought experiment. We have a qsort-like sorting function;
for the sake of discussion let's say it uses heapsort. It is widely
understood that heapsort is an O(N*log(N)) algorithm (not counting
the theoretical objections mentioned in Waldek Hebisch's post and my
response to those comments).
Now suppose we have two distinct invocations of said function. In
both cases the arguments are length 1000 character strings, having
only spaces and alphabetic characters, and no duplicates between the
strings. In one call all the strings have distinct values in the
first six characters, and in the other call the strings are all the
same in the first 993 characters, differing only in the last six
characters. The comparison function handed to the sorting function
uses strcmp() to do its comparisons.
The heapsort algorithm is well-known to be N log N. All the strings
in both sorts are exactly 1000 characters, so the strcmp() calls have
a fixed maximum time cost. And yet one of these sorts runs almost 200
times as fast as the other.
I think what you're seeing is a data dependency that is orthogonal to
the time complexity of the algorithm. For example, the travelling
salesman problem is widely known to be NP-complete, and yet on some
inputs a solution can be found very quickly. BigO is about worst-case
behavior, or sometimes average-case behavior. Other kinds of behavior
can depend dramatically on the nature of the inputs.
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with average length of few
dozens characters, i.e. many times longer than log2(N). That was
stated at one of earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big(O) varies length of the key with N. In practical range
key length may be constant, but fixing length gives unrealistic
problem for Big(O) analysis: without varying key length there are
finitely many keys and sorting is equivalent to counting how many
times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
What is being asked about is the behavior as a function of N as N
increases without bound. Of course, theoretically, as the number of
records increases without bound, eventually the character strings
being sorted will have to have duplicates. But long before that
happens the index variable N will run out of bits. This property is
well understood in theoretical computer science, not just in terms
of how much time is used but how much storage is needed. In theory
log N bits are needed just to hold the index pointers. It is
customary though, outside of purely theoretical discussions, to
ignore that and treat the size of an index or pointer variable as
constant. In purely theoretical terms no sorting algorithm is
O(N*log(N)), because just incrementing a pointer takes more than
O(1) operations. Surely the discussions in Knuth's books take such
things into consideration.
On the practical side, which almost
always covers discussions that take place in usenet newsgroups,
these minor theoretical issues are ignored. Any actual computer in
the physical universe will never have occasion to process more than
2**512 records, due to the limitation of the number of elementary
particles in the universe, so a 512-bit address (or index value)
always suffices.
So yes, in theory, the considerations around processing an enormous
number of values are relevant. In the practical context of the
discussion underway here, they aren't.
In article <863416xid5.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you
don't say whether you are sorting words or lines.
This sub-thread is about sorting lines with average length of few
dozens characters, i.e. many times longer than log2(N). That was
stated at one of earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big(O) varies length of the key with N. In practical range
key length may be constant, but fixing length gives unrealistic
problem for Big(O) analysis: without varying key length there are
finitely many keys and sorting is equivalent to counting how many
times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
What is being asked about is the behavior as a function of N as N
increases without bound. Of course, theoretically, as the number of
records increases without bound, eventually the character strings
being sorted will have to have duplicates. But long before that
happens the index variable N will run out of bits. This property is
well understood in theoretical computer science, not just in terms
of how much time is used but how much storage is needed. In theory
log N bits are needed just to hold the index pointers. It is
customary though, outside of purely theoretical discussions, to
ignore that and treat the size of an index or pointer variable as
constant. In purely theoretical terms no sorting algorithm is
O(N*log(N)), because just incrementing a pointer takes more than
O(1) operations. Surely the discussions in Knuth's books take such
things into consideration.
If by "Knuth's books" you're referring to TAOCP, then he does
not seem to give it too much attention. At the start of Volume
3, he spends considerable time discussing the combinatorics of
permutations as prefatory to discussing sorting, but if he talks
about the complexity of key comparison at all, it's not an
extended treatment. He refers to a "<" operation for comparing
keys, and he does talk about "multiprecision keys" and
lexicographic orderings, but doesn't spend a lot of ink talking
about how it might be implemented; one of the exercises
acknowledges that this can have a significant impact on
performance, but doesn't go into much detail. Another exercise
challenges the reader to write a MIX program for comparing keys,
but again, doesn't give details about complexity analysis _of
comparison_. There is another exercise that talks about this
tangentially, in which he suggests that sorting (it's implied, of
text data) slows down as the sort nears completion, since
more and more of the key value is now the same, and suggests
keeping track of the length of the common prefix to use as an
offset for subsequent comparisons.
The notion that the actual time required for individual operations
should be taken into account when analyzing time complexity does
appear in other books; Dasgupta, Papadimitriou, and Vazirani
(http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf)
talk about this in the context of computing Fibonacci numbers,
for example.
On the practical side, which almost
always covers discussions that take place in usenet newsgroups,
these minor theoretical issues are ignored. Any actual computer in
the physical universe will never have occasion to process more than
2**512 records, due to the limitation of the number of elementary
particles in the universe, so a 512-bit address (or index value)
always suffices.
So yes, in theory, the considerations around processing an enormous
number of values are relevant. In the practical context of the
discussion underway here, they aren't.
Indeed. As Rob Pike once put it, "Fancy algorithms are slow
when $n$ is small, and $n$ is usually small. Fancy algorithms
have big constants. Until you know that $n$ is frequently going
to get big, don't get fancy."
- Dan C.
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with average length of few
dozens characters, i.e. many times longer than log2(N). That was
stated at one of earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big(O) varies length of the key with N. In practical range
key length may be constant, but fixing length gives unrealistic
problem for Big(O) analysis: without varying key length there are
finitely many keys and sorting is equivalent to counting how many
times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
What is being asked about is the behavior as a function of N as N
increases without bound. Of course, theoretically, as the number of
records increases without bound, eventually the character strings
being sorted will have to have duplicates. But long before that
happens the index variable N will run out of bits. This property is
well understood in theoretical computer science, not just in terms
of how much time is used but how much storage is needed. In theory
log N bits are needed just to hold the index pointers. It is
customary though, outside of purely theoretical discussions, to
ignore that and treat the size of an index or pointer variable as
constant. In purely theoretical terms no sorting algorithm is
O(N*log(N)), because just incrementing a pointer takes more than
O(1) operations. Surely the discussions in Knuth's books take such
things into consideration. On the practical side, which almost
always covers discussions that take place in usenet newsgroups,
these minor theoretical issues are ignored. Any actual computer in
the physical universe will never have occasion to process more than
2**512 records, due to the limitation of the number of elementary
particles in the universe, so a 512-bit address (or index value)
always suffices.
So yes, in theory, the considerations around processing an enormous
number of values are relevant. In the practical context of the
discussion underway here, they aren't.
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big-O varies the length of the key with N. In a practical
range the key length may be constant, but fixing the length gives an
unrealistic problem for Big-O analysis: without varying the key
length there are only finitely many keys, and sorting is equivalent
to counting how many times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
There is one choice, often made to simplify the problem. But there
is a quite universal choice: N is the total size of the input data.
In article <86y0iyvypl.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 21:00:46 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
That has nothing to do with BigO, which is about asymptotic behavior
as N goes to infinity.
So, what do you call the part of the curve from 1e2 to 3e5 lines? MiddleO?
Consider a thought experiment. We have a qsort-like sorting function,
for the sake of discussion let's say it uses heapsort. It is widely
understood that heapsort is an O(N*log(N)) algorithm (not counting the
theoretical objections mentioned in Waldek Hebisch's post and my
response to those comments).
Now suppose we have two distinct invocations of said function. In
both cases the arguments are length 1000 character strings, having
only spaces and alphabetic characters, and no duplicates between the
strings. In one call all the strings have distinct values in the
first six characters, and in the other call the strings are all the
same in the first 993 characters, differing only in the last six
characters. The call to the sorting function points back to a
function that uses strcmp() to do its comparisons.
The heapsort algorithm is well-known to be N log N. All the strings
in both sorts are exactly 1000 characters, so the strcmp() calls have
a fixed maximum time cost. And yet one of these sorts runs almost 200
times as fast as the other.
I think what you're seeing is a data dependency that is orthogonal to
the time complexity of the algorithm. For example, the travelling
salesman problem is widely known to be NP complete, and yet on some
inputs a solution can be found very quickly.
A useful thought experiment, but it is perhaps easier to explain
that the asymptotic time complexity of a sorting algorithm, for
example heapsort being O(n log n), is expressed in terms of the
number of comparison operations required by the algorithm,
independent of the complexity of those comparisons.
BigO is about worst case behavior, or sometimes average case
behavior. Other kinds of behavior can depend dramatically on the
nature of the inputs.
I disagree with this. The "Big-O" of an algorithm isn't about
worst-case behavior, it's a notation for categorizing which term
dominates in an expression describing the complexity of an
algorithm, as the number of inputs to that algorithm grows.
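For reference, the formal definition that both framings appeal to can be written as follows; the disagreement is really over which cost function f(n) one plugs into it (worst-case time, average-case time, or a comparison count):

```latex
f(n) \in O(g(n))
\iff
\exists\, c > 0,\; n_0 \in \mathbb{N} :\quad
0 \le f(n) \le c \, g(n) \quad \text{for all } n \ge n_0
```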
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <86y0iyvypl.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 21:00:46 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
That has nothing to do with BigO, which is about asymptotic behavior
as N goes to infinity.
So, what do you call the part of the curve from 1e2 to 3e5 lines? MiddleO?
Consider a thought experiment. We have a qsort-like sorting function,
for the sake of discussion let's say it uses heapsort. It is widely
understood that heapsort is an O(N*log(N)) algorithm (not counting the
theoretical objections mentioned in Waldek Hebisch's post and my
response to those comments).
Now suppose we have two distinct invocations of said function. In
both cases the arguments are length 1000 character strings, having
only spaces and alphabetic characters, and no duplicates between the
strings. In one call all the strings have distinct values in the
first six characters, and in the other call the strings are all the
same in the first 993 characters, differing only in the last six
characters. The call to the sorting function points back to a
function that uses strcmp() to do its comparisons.
The heapsort algorithm is well-known to be N log N. All the strings
in both sorts are exactly 1000 characters, so the strcmp() calls have
a fixed maximum time cost. And yet one of these sorts runs almost 200
times as fast as the other.
I think what you're seeing is a data dependency that is orthogonal to
the time complexity of the algorithm. For example, the travelling
salesman problem is widely known to be NP complete, and yet on some
inputs a solution can be found very quickly.
A useful thought experiment, but it is perhaps easier to explain
that the asymptotic time complexity of a sorting algorithm, for
example heapsort being O(n log n), is expressed in terms of the
number of comparison operations required by the algorithm,
independent of the complexity of those comparisons.
No. This misses the essence of what I was trying to convey.
BigO is about worst case behavior, or sometimes average case
behavior. Other kinds of behavior can depend dramatically on the
nature of the inputs.
I disagree with this. The "Big-O" of an algorithm isn't about
worst-case behavior, it's a notation for categorizing which term
dominates in an expression describing the complexity of an
algorithm, as the number of inputs to that algorithm grows.
You misunderstood the point of my comment. Of course O()
notation is about expressing an asymptotic upper bound of
some function. The key issue here though is not about O()
but about what function should be the one under consideration.
In article <863416xid5.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big-O varies the length of the key with N. In a practical
range the key length may be constant, but fixing the length gives an
unrealistic problem for Big-O analysis: without varying the key
length there are only finitely many keys, and sorting is equivalent
to counting how many times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
What is being asked about is the behavior as a function of N as N
increases without bound. Of course, theoretically, as the number of
records increases without bound, eventually the character strings
being sorted will have to have duplicates. But long before that
happens the index variable N will run out of bits. This property is
well understood in theoretical computer science, not just in terms
of how much time is used but how much storage is needed. In theory
log N bits are needed just to hold the index pointers. It is
customary though, outside of purely theoretical discussions, to
ignore that and treat the size of an index or pointer variable as
constant. In purely theoretical terms no sorting algorithm is
O(N*log(N)), because just incrementing a pointer takes more than
O(1) operations. Surely the discussions in Knuth's books take such
things into consideration.
If by "Knuth's books" you're referring to TAOCP, then he does
not seem to give it too much attention. [...]
On the practical side, which almost
always covers discussions that take place in usenet newsgroups,
these minor theoretical issues are ignored. Any actual computer in
the physical universe will never have occasion to process more than
2**512 records, due to the limitation of the number of elementary
particles in the universe, so a 512-bit address (or index value)
always suffices.
So yes, in theory, the considerations around processing an enormous
number of values are relevant. In the practical context of the
discussion underway here, they aren't.
Indeed. As Rob Pike once put it, "Fancy algorithms are slow
when $n$ is small, and $n$ is usually small. Fancy algorithms
have big constants. Until you know that $n$ is frequently going
to get big, don't get fancy."
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <863416xid5.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big-O varies the length of the key with N. In a practical
range the key length may be constant, but fixing the length gives an
unrealistic problem for Big-O analysis: without varying the key
length there are only finitely many keys, and sorting is equivalent
to counting how many times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
What is being asked about is the behavior as a function of N as N
increases without bound. Of course, theoretically, as the number of
records increases without bound, eventually the character strings
being sorted will have to have duplicates. But long before that
happens the index variable N will run out of bits. This property is
well understood in theoretical computer science, not just in terms
of how much time is used but how much storage is needed. In theory
log N bits are needed just to hold the index pointers. It is
customary though, outside of purely theoretical discussions, to
ignore that and treat the size of an index or pointer variable as
constant. In purely theoretical terms no sorting algorithm is
O(N*log(N)), because just incrementing a pointer takes more than
O(1) operations. Surely the discussions in Knuth's books take such
things into consideration.
If by "Knuth's books" you're referring to TAOCP, then he does
not seem to give it too much attention. [...]
In most of the chapter on Sorting, TAOCP uses the number of
comparisons as the basis of comparison. But not everywhere
in the chapter.
My statement was not meant to be limited to the discussion of
Sorting.
On the practical side, which almost
always covers discussions that take place in usenet newsgroups,
these minor theoretical issues are ignored. Any actual computer in
the physical universe will never have occasion to process more than
2**512 records, due to the limitation of the number of elementary
particles in the universe, so a 512-bit address (or index value)
always suffices.
So yes, in theory, the considerations around processing an enormous
number of values are relevant. In the practical context of the
discussion underway here, they aren't.
Indeed. As Rob Pike once put it, "Fancy algorithms are slow
when $n$ is small, and $n$ is usually small. Fancy algorithms
have big constants. Until you know that $n$ is frequently going
to get big, don't get fancy."
Whether the Rob Pike advisory is applicable or not is beside the
point.
My comment was about fancy mathematics, not fancy
algorithms. My statement is just as applicable to Tim Sort (one
of the fancier sorting algorithms) as it is to Bubble Sort.
On Thu, 9 Apr 2026 21:15:14 -0000 (UTC)
[...]
cross@spitfire.i.gajendra.net (Dan Cross) wrote:
Indeed. As Rob Pike once put it, "Fancy algorithms are slow
when $n$ is small, and $n$ is usually small. Fancy algorithms
have big constants. Until you know that $n$ is frequently going
to get big, don't get fancy."
One case where these considerations are not at all theoretical, and
where simple quicksort from the books performs very slowly precisely
because each lexicographic comparison takes more and more time as
sorting progresses, is the sort at the core of the Burrows-Wheeler
transform, which in turn is at the core of various compression
schemes, including bzip2. The problem hits you the worst when the
data set compresses well.
In the specific case of bzip2, they limited the block size to 900KB,
which is quite low, and did preprocessing on the input data, which
often seriously impacts the quality of compression. I can't say for
sure, but it seems to me that the reason was exactly that: avoiding
prohibitively slow sorting. Had they had the time and desire to use
"fancy algorithms", whether combinations of bucket and non-bucket
variants of radix sort, or a quicksort that memorizes common
prefixes, or even a combination of all three, then they would not
have needed preprocessing and small blocks, and would have ended up
with better compression ratios.
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big-O varies the length of the key with N. In a practical
range the key length may be constant, but fixing the length gives an
unrealistic problem for Big-O analysis: without varying the key
length there are only finitely many keys, and sorting is equivalent
to counting how many times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
There is one choice, often made to simplify the problem. But there
is a quite universal choice: N is the total size of the input data.
What is being asked about is the behavior as a function of N as N
increases without bound. Of course, theoretically, as the number of
records increases without bound, eventually the character strings
being sorted will have to have duplicates. But long before that
happens the index variable N will run out of bits. This property is
well understood in theoretical computer science, not just in terms
of how much time is used but how much storage is needed. In theory
log N bits are needed just to hold the index pointers. It is
customary though, outside of purely theoretical discussions, to
ignore that and treat the size of an index or pointer variable as
constant. In purely theoretical terms no sorting algorithm is
O(N*log(N)), because just incrementing a pointer takes more than
O(1) operations. Surely the discussions in Knuth's books take such
things into consideration. On the practical side, which almost
always covers discussions that take place in usenet newsgroups,
these minor theoretical issues are ignored. Any actual computer in
the physical universe will never have occasion to process more than
2**512 records, due to the limitation of the number of elementary
particles in the universe, so a 512-bit address (or index value)
always suffices.
So yes, in theory, the considerations around processing an enormous
number of values are relevant. In the practical context of the
discussion underway here, they aren't.
It was you who insisted on asymptotic complexity, rejecting
practical amendments...
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 06 Apr 2026 15:13:32 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Obviously what words (or lines) appear can affect the character
counts, but that still doesn't change BigO. By the way you don't
say whether you are sorting words or lines.
This sub-thread is about sorting lines with an average length of a
few dozen characters, i.e. many times longer than log2(N). That was
stated in one of the earlier posts.
That has nothing to do with BigO, which is about asymptotic
behavior as N goes to infinity.
Honest Big-O varies the length of the key with N. In a practical
range the key length may be constant, but fixing the length gives an
unrealistic problem for Big-O analysis: without varying the key
length there are only finitely many keys, and sorting is equivalent
to counting how many times each key appears in the input.
There's an important clarification to make here. There are two
independent parameters: N, the number of records to be sorted (a
record being a character string that is either a word or a line),
and the (maximum) length of any record, which in the discussion is
bounded above by a constant.
There is one choice, often made to simplify the problem. But there
is a quite universal choice: N is the total size of the input data.
No, N is usually the number of records to be sorted, not the size of
the input. Also, in the discussion I was responding to, there were
clearly two distinct axes relevant to the discussion.
What is being asked about is the behavior as a function of N as N
increases without bound. Of course, theoretically, as the number of
records increases without bound, eventually the character strings
being sorted will have to have duplicates. But long before that
happens the index variable N will run out of bits. This property is
well understood in theoretical computer science, not just in terms
of how much time is used but how much storage is needed. In theory
log N bits are needed just to hold the index pointers. It is
customary though, outside of purely theoretical discussions, to
ignore that and treat the size of an index or pointer variable as
constant. In purely theoretical terms no sorting algorithm is
O(N*log(N)), because just incrementing a pointer takes more than
O(1) operations. Surely the discussions in Knuth's books take such
things into consideration. On the practical side, which almost
always covers discussions that take place in usenet newsgroups,
these minor theoretical issues are ignored. Any actual computer in
the physical universe will never have occasion to process more than
2**512 records, due to the limitation of the number of elementary
particles in the universe, so a 512-bit address (or index value)
always suffices.
So yes, in theory, the considerations around processing an enormous
number of values are relevant. In the practical context of the
discussion underway here, they aren't.
It was you who insisted on asymptotic complexity, rejecting
practical amendments...
This statement isn't right. BigO was already part of the discussion
when I joined in. Also, it is customary in discussion of sorting
algorithms to use the metric of number of comparisons done, without
regard to the size of the variables needed to hold the indices of
the records being sorted.
See Knuth chapter 5, on Sorting, in volume 3 of TAOCP.
On Tue, 07 Apr 2026 02:12:49 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
<snip>
My conclusions...
The right place to start on sorting algorithms is with bubble
sort, not because it's a good algorithm, but because it's the
easiest one to show, and what is more important it's a good way
to teach about how algorithms evolve.
If you write it in terms of moves rather than in terms of swaps,
bubble sort is no easier than insertion or selection sort.
There is no reason to use the knuth version of bubble sort.
Compared to ordinary bubble sort the upside is small and not
worth the downside of needing more intricate code.
The Knuth version has at least one merit. You claim that in practice
it's too rare for that merit to be exercised. Maybe that's true. Or
maybe the case of an almost-sorted array with misplaced elements
coming in pairs in close proximity is not that uncommon. I don't
know. But the merit exists.
OTOH, 'ordinary bubble sort' has no merits at all relative to simple
insertion sort.
BTW, thank you for 'boustrophedon'. Not that there is a serious
chance that I will remember the word tomorrow or even tonight, but
hopefully I'll remember that such a word exists.