• Re: Isn't that beauty ? (no it's not)

    From DFS@nospam@dfs.com to comp.lang.c on Fri Mar 27 00:35:39 2026
    From Newsgroup: comp.lang.c

    On 3/26/2026 11:08 PM, Tim Rentsch wrote:
    DFS <nospam@dfs.com> writes:

    On 3/21/2026 2:07 AM, Tim Rentsch wrote:

    DFS <nospam@dfs.com> writes:

    On 3/12/2026 2:24 AM, Bonita Montero wrote:

    There was a programming-"contest" on comp.lang.c and I wanted to show the simpler code in C, here it is:

    [C++ code]

    C++ really rocks since you have to deal with far fewer details than in C.
    Is your strategy to just ignore reality, and keep making bogus claims
    that - for this challenge at least - you can't support?

    Please don't encourage people who insist on using comp.lang.c
    for other languages. It's better to just ignore them.

    It doesn't bother me.

    I'm sorry you place so little value on my participation.


    Not true at all [1]. But clc is a "No Kings" newsgroup, and nobody can dictate who or what kind of posts are made. If Montero or I annoy you, killfile us.



    It's no more off-topic than a discussion of sorting algorithms.

    That's wrong. There have been lots of posts in the discussion of
    sorting that are relevant to C. The problem with encouraging
    discussion of C++, or Python, or other such languages, is they
    tend to produce lots of postings that have nothing to do with C,
    and rarely produce postings that have anything to do with C.

    As a percentage, I don't see lots of postings that have nothing to do with C.

    This group is still very clean, as opposed to the rec.sport.golf group I
    used to participate in; non-golfing interlopers completely wrecked it
    years ago.




    [1] You da man. You deserve another kudos with your performance in the
    'sort of trivial code' challenge. I was looking at that code again, and
    it's kind of mind-blowing. Is that what you do to challenge yourself?
    How long have you been writing C programs? Do you do it for a living?

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Mar 26 21:53:28 2026
    From Newsgroup: comp.lang.c

    DFS <nospam@dfs.com> writes:
    [...]
    Not true at all [1]. But clc is a "No Kings" newsgroup, and nobody
    can dictate who or what kind of posts are made. If Montero or I annoy
    you, killfile us.
    [...]

    There is no authority that can enforce topicality rules in
    comp.lang.c. But come on, the name of the programming language is
    right in the name.

    A lot of other languages have their own newsgroups, some of which
    I actually read. Posts about language Foo, with no C content,
    are better posted to comp.lang.foo, or perhaps to comp.lang.misc
    if comp.lang.foo doesn't exist, or to comp.programming if the post
    is more about algorithms than about the language, or ....

    I dislike posts here that have no C content at all, and there have
    been too many such posts lately.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Fri Mar 27 17:03:16 2026
    From Newsgroup: comp.lang.c

    In article <10q51er$3ec1o$3@dont-email.me>, DFS <nospam@dfs.com> wrote:
    ...
    (Tim wrote)
    I'm sorry you place so little value on my participation.

    Not true at all [1]. But clc is a "No Kings" newsgroup, and nobody can dictate who or what kind of posts are made. If Montero or I annoy you, killfile us.

    I think you get this week's "Totally Missing the Point" award.
    --
    I've learned that people will forget what you said, people will forget
    what you did, but people will never forget how you made them feel.

    - Maya Angelou -
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 6 12:32:48 2026
    From Newsgroup: comp.lang.c

    Bart <bc@freeuk.com> writes:

    On 21/03/2026 02:35, DFS wrote:

    On 3/20/2026 9:53 PM, Janis Papanagnou wrote:

    On 2026-03-20 22:10, DFS wrote:

    Sometimes listening to you clc guys is like listening to a room
    full of CS professors (which I suspect some of you are or were).

    CS professors sitting in their ivory tower and without practical
    experiences, and programmers without a substantial CS background;
    both of these extreme characters can be problematic (if fanatic).

    It was meant as a compliment. Plenty of CS professors have good
    practical and industry experience, too, which you'll see on their
    bios and cv's.

    It's got to be tough to find good CS teachers that stick around,
    given they can probably make much more money in private industry.


    Many folks here seem to have a good mix of necessary practical
    IT, Project, and CS knowledge and proficiencies, as I'd value it.
    And a clear mind to discuss topics. Sometimes it gets heated and
    personal, though; certainly nothing to foster.

    I've definitely seen Bart take some heat for continuing to post code
    in his scripting language, rather than C

    People post bits of code in all sorts of languages when it suits
    them. It's happened in this thread (Bash, AWK, Python as well as C++),
    and in many others.

    But I get a lot of heat because I use my personal languages,

    You get heat because you ignore the rules of usual newsgroup
    etiquette, which most other posters observe, and because you
    insist on focusing on your self-centered interests, without
    regard or consideration for other readers or the chartered
    purpose of the group. Besides being better for the group,
    it would be better for you if you could learn to be less
    self-focused, and more considerate of others.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 6 13:23:02 2026
    From Newsgroup: comp.lang.c

    antispam@fricas.org (Waldek Hebisch) writes:

    [...]

    Heapsort is only marginally more complex than bubble sort [...]

    I think this claim is ridiculous. Code for heapsort is more than
    twice as long as code for bubble sort, and is both harder to write
    and harder to understand. Heapsort really needs two functions
    rather than one. All of the simple sorts (bubble sort, selection
    sort, insertion sort, merge sort) are IME easily coded without
    needing any significant thought. Not so for heapsort. Even
    quicksort, which isn't trivial, is easier than heapsort.
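
    Tim's point that heapsort naturally splits into two functions can be illustrated with a minimal sketch (sift_down and heapsort_ints are illustrative names, not code from the thread):

```c
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at i,
   considering only the first n elements of a. */
static void sift_down(int a[], size_t n, size_t i) {
    for (;;) {
        size_t largest = i, l = 2*i + 1, r = 2*i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) break;
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        i = largest;
    }
}

void heapsort_ints(int a[], size_t n) {
    /* Build the heap bottom-up ... */
    for (size_t i = n / 2; i-- > 0; ) sift_down(a, n, i);
    /* ... then repeatedly move the maximum to the end. */
    for (size_t end = n; end-- > 1; ) {
        int t = a[0]; a[0] = a[end]; a[end] = t;
        sift_down(a, end, 0);
    }
}
```

    The sift-down helper is the second function that bubble sort has no analogue of, which is where the extra length and subtlety come from.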

    On average heapsort is not as fast as quicksort (the main factor
    seems to be much worse cache behaviour),

    More to say on that matter; saving for another reply and hoping to
    get to that soon.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 6 13:48:04 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    [discussion of sorting code]

    -----------------------
    void isort(char** data, int ll, int rr) {
        char* temp;
        int i = ll, j = rr;
        char* pivot = data[(ll + rr) / 2];

        do {
            while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
            while (strcmp(pivot, data[j]) < 0 && j > ll) --j;

            if (i <= j) {
                temp = data[i]; data[i] = data[j]; data[j] = temp;
                ++i;
                --j;
            }
        } while (i <= j);

        if (ll < j) isort(data, ll, j);
        if (i < rr) isort(data, i, rr);
    }

    //For a char* array A of N elements, call as 'isort(A, 0, N-1)'.

    Looks o.k. speed-wise.
    It is not idiomatic C,

    What about the code prompts you to say it is not idiomatic C?
    Is it because there is a line with multiple statements on it,
    or is there something else?

    Personally I would not call this code non-idiomatic, with
    having three statements on one line a minor style exception.

    Also I think Bart should be applauded for posting code that
    AFAICT falls completely within the confines of ISO C, and
    even looks fairly natural, style-wise.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 6 15:13:32 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 20 Mar 2026 14:47:18 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    The cost of doing a comparison does not affect the complexity of
    an algorithm - but it can certainly affect the best choice of
    algorithm for the task in hand.

    In case of lexicographic sort the comparison most certainly affects
    BigO. BigO is about counting elementary operations or, at least,
    about counting non-elementary operations that have complexity
    independent of N. In our specific case (described in the previous
    paragraph) the average number of elementary comparisons within
    strcmp() does depend on N. One can expect that it grows with N,
    because for larger N quicksort spends relatively more time
    comparing strings with longer and longer common prefixes.

    This reasoning isn't right. The reason it isn't is that, even as
    more words are sorted, the words don't get longer. Average match
    length doesn't matter. All words in the two files mentioned are, as
    best I can determine, not longer than 40 characters. Because the
    length is bounded, so is the time needed to do a strcmp() call. In
    other words the comparison operation is O(1). Using strcmp() can
    change the constant factor, but it doesn't change BigO.

    I did a couple of experiments.
    Here is the number of character comparisons during sorting by
    quicksort with a strcmp-alike comparison, as a function of the
    number of sorted strings. Inputs used for the experiments were the
    first N lines of two long books (those that I suggested to Bart in
    another post in this thread).


    Book 1:
    1000 40367
    2000 93404
    3000 156995
    4000 238062
    5000 322275
    6000 401332
    7000 483294
    8000 588128
    9000 707559
    10000 811159
    11000 969934
    12000 1061037
    13000 1172776
    14000 1327489
    15000 1456356
    16000 1565355
    17000 1691351
    18000 1811836
    19000 1971700
    20000 2101477
    21000 2237132
    22000 2348410
    23000 2492149
    24000 2618858
    25000 2726828
    26000 2881609
    27000 3017903
    28000 3183088
    29000 3320716
    30000 3523695
    31000 3719782
    32000 3852290
    33000 4007476
    34000 4149704
    35000 4228143
    36000 4399583
    37000 4581223
    38000 4761085
    39000 4919182
    40000 5110223
    41000 5463271
    42000 5605860
    43000 5830282
    44000 6096857
    45000 6464758
    46000 6860786
    47000 7101972
    48000 7434769
    49000 7736464
    50000 8028681
    51000 8334358
    52000 8560822
    53000 8814096
    54000 8986343
    55000 9162604
    56000 9358948
    57000 9468853
    58000 9638512
    59000 9863370
    60000 10000734
    61000 10242187
    62000 10500148
    63000 10654166
    64000 10882232
    65000 11187624
    66000 11290778
    67000 11481047
    68000 11664433
    69000 11858529
    70000 12059970
    71000 12173240
    72000 12232160
    73000 12418818
    74000 12577954
    75000 12786122
    76000 12939916
    77000 13083515
    78000 13275432
    79000 13616563
    80000 13797048
    81000 14137362
    82000 14388124
    83000 14667272
    84000 14914600
    85000 15174195
    86000 15227048
    87000 15414838
    88000 15526278
    89000 15670034
    90000 15966396
    91000 16120859
    92000 16267285
    93000 16402168
    94000 16718506
    95000 16968960
    96000 17169618
    97000 17362159
    98000 17486179
    99000 17944209
    100000 18085095
    101000 18455071
    102000 18733130
    102100 18769521

    Book 2:
    1000 104846
    2000 243080
    3000 399564
    4000 661436
    5000 855870
    6000 1017042
    7000 1205936
    8000 1341884
    9000 1511727
    10000 1681973
    11000 1820114
    12000 1978196
    13000 2081813
    14000 2162091
    15000 2280537
    16000 2375272
    17000 2500672
    18000 2617364
    19000 2787782
    20000 2960678
    21000 3149483
    22000 3286894
    23000 3481047
    24000 3638352
    25000 3842186
    26000 4001105
    27000 4083466
    28000 4239273
    29000 4399250
    30000 4505316
    30384 4598719

    It is easy to see that at the beginning the # of comparisons grows
    faster than N*log(N). For the 1st book the trend continues
    throughout the whole chart. For the 2nd book it does not.

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines. That choice has an
    effect, but as long as line lengths are bounded the strcmp() calls
    are still O(1).

    I'd guess that the reason behind it is that the 1st book is
    relatively homogeneous, while the 2nd book consists of rather
    distinct parts, likely with different characteristics of the
    beginnings of lines. In parts the 2nd book is full of periods of
    rhythmical prose close to poetry, and in other parts it is not. I
    didn't have time to dig deeper.

    A more careful experiment is needed.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Apr 7 01:58:23 2026
    From Newsgroup: comp.lang.c

    On Mon, 06 Apr 2026 13:48:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [discussion of sorting code]

    -----------------------
    void isort(char** data, int ll, int rr) {
        char* temp;
        int i = ll, j = rr;
        char* pivot = data[(ll + rr) / 2];

        do {
            while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
            while (strcmp(pivot, data[j]) < 0 && j > ll) --j;

            if (i <= j) {
                temp = data[i]; data[i] = data[j]; data[j] = temp;
                ++i;
                --j;
            }
        } while (i <= j);

        if (ll < j) isort(data, ll, j);
        if (i < rr) isort(data, i, rr);
    }

    //For a char* array A of N elements, call as 'isort(A, 0, N-1)'.

    Looks o.k. speed-wise.
    It is not idiomatic C,

    What about the code prompts you to say it is not idiomatic C?
    Is it because there is a line with multiple statements on it,
    or is there something else?


    Something else.
    Here is a mechanical translation of Bart's code to what I consider
    more idiomatic C. Note that the translation was mechanical; I didn't
    check whether the original logic is 100% correct or whether it is
    possible to omit some checks.

    void isort(char** data, int len) {
        if (len > 1) {
            char **i = data, **j = &data[len-1];
            char* pivot = data[(len-1) / 2];
            do {
                while (strcmp(pivot, *i) > 0 && i < &data[len-1]) ++i;
                while (strcmp(pivot, *j) < 0 && j > data) --j;
                if (i <= j) {
                    char* temp = *i; *i = *j; *j = temp;
                    ++i;
                    --j;
                }
            } while (i <= j);
            if (j > data) isort(data, j+1-data);
            if (i < &data[len-1]) isort(i, &data[len]-i);
        }
    }

    Whether i and j are pointers or integers is less important.
    But passing two indices into the call instead of one, and the fact
    that the second index points at the last element of the array
    instead of one past the last element, is certainly non-idiomatic.


    Personally I would not call this code non-idiomatic, with
    having three statements on one line a minor style exception.

    Also I think Bart should be applauded for posting code that
    AFAICT falls completely within the confines of ISO C, and
    even looks fairly natural, style-wise.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Apr 7 02:22:13 2026
    From Newsgroup: comp.lang.c

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 20 Mar 2026 14:47:18 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    The cost of doing a comparison does not affect the complexity of
    an algorithm - but it can certainly affect the best choice of
    algorithm for the task in hand.

    In case of lexicographic sort the comparison most certainly affects
    BigO. BigO is about counting elementary operations or, at least,
    about counting non-elementary operations that have complexity
    independent of N. In our specific case (described in the previous
    paragraph) the average number of elementary comparisons within
    strcmp() does depend on N. One can expect that it grows with N,
    because for larger N quicksort spends relatively more time
    comparing strings with longer and longer common prefixes.

    This reasoning isn't right. The reason it isn't is that, even as
    more words are sorted, the words don't get longer. Average match
    length doesn't matter. All words in the two files mentioned are, as
    best I can determine, not longer than 40 characters. Because the
    length is bounded, so is the time needed to do a strcmp() call. In
    other words the comparison operation is O(1). Using strcmp() can
    change the constant factor, but it doesn't change BigO.

    I did a couple of experiments.
    Here is the number of character comparisons during sorting by
    quicksort with a strcmp-alike comparison, as a function of the
    number of sorted strings. Inputs used for the experiments were the
    first N lines of two long books (those that I suggested to Bart in
    another post in this thread).


    [data tables snipped]

    It is easy to see that at the beginning the # of comparisons grows
    faster than N*log(N). For the 1st book the trend continues
    throughout the whole chart. For the 2nd book it does not.

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.
    If it was about sorting English words it would be a completely
    different matter. Likely the dependency still exists for very short
    sorts, but likely disappears at a few K words.

    That choice has an
    effect, but as long as line lengths are bounded the strcmp() calls
    are still O(1).

    Maybe, for text consisting of many millions of lines, it would be
    the case. But the longest books ever written by humans are much,
    much shorter than that.


    I'd guess that the reason behind it is that the 1st book is
    relatively homogeneous, while the 2nd book consists of rather
    distinct parts, likely with different characteristics of the
    beginnings of lines. In parts the 2nd book is full of periods of
    rhythmical prose close to poetry, and in other parts it is not. I
    didn't have time to dig deeper.

    A more careful experiment is needed.

    There aren't many books that are sufficiently long.




    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Bart@bc@freeuk.com to comp.lang.c on Tue Apr 7 01:02:09 2026
    From Newsgroup: comp.lang.c

    On 06/04/2026 23:58, Michael S wrote:
    On Mon, 06 Apr 2026 13:48:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [discussion of sorting code]

    -----------------------
    void isort(char** data, int ll, int rr) {
        char* temp;
        int i = ll, j = rr;
        char* pivot = data[(ll + rr) / 2];

        do {
            while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
            while (strcmp(pivot, data[j]) < 0 && j > ll) --j;

            if (i <= j) {
                temp = data[i]; data[i] = data[j]; data[j] = temp;
                ++i;
                --j;
            }
        } while (i <= j);

        if (ll < j) isort(data, ll, j);
        if (i < rr) isort(data, i, rr);
    }

    //For a char* array A of N elements, call as 'isort(A, 0, N-1)'.

    Looks o.k. speed-wise.
    It is not idiomatic C,

    What about the code prompts you to say it is not idiomatic C?
    Is it because there is a line with multiple statements on it,
    or is there something else?


    Something else.
    Here is a mechanical translation of Bart's code to what I consider
    more idiomatic C. Note that the translation was mechanical; I didn't
    check whether the original logic is 100% correct or whether it is
    possible to omit some checks.

    void isort(char** data, int len) {
        if (len > 1) {
            char **i = data, **j = &data[len-1];
            char* pivot = data[(len-1) / 2];
            do {
                while (strcmp(pivot, *i) > 0 && i < &data[len-1]) ++i;
                while (strcmp(pivot, *j) < 0 && j > data) --j;
                if (i <= j) {
                    char* temp = *i; *i = *j; *j = temp;
                    ++i;
                    --j;
                }
            } while (i <= j);
            if (j > data) isort(data, j+1-data);
            if (i < &data[len-1]) isort(i, &data[len]-i);
        }
    }

    Whether i and j are pointers or integers is less important.
    But passing two indices into the call instead of one, and the fact
    that the second index points at the last element of the array
    instead of one past the last element, is certainly non-idiomatic.

    It can be idiomatic for a Quicksort API. I've just browsed loads of
    examples on RosettaCode, and Left/Right arguments are common, even
    when arrays carry their own lengths, allowing slices to be passed.

    Regarding the upper bound not being one past the end, both are intended
    to refer to actual elements.

    I also saw some examples like this Python:

    def quicksort(array):
        _quicksort(array, 0, len(array) - 1)

    def _quicksort(array, start, stop): ...

    Python is zero-based like C, but here the 'stop' index is still that of
    the last element, not one past.

    I think your idea of idiomatic C is to have something that is
    harder to understand and less readable. I acknowledge that a lot of
    C is like that, but it doesn't need to be.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 6 18:37:47 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Thu, 19 Mar 2026 14:49:16 +0000
    Bart <bc@freeuk.com> wrote:

    On 19/03/2026 09:19, Michael S wrote:

    On Wed, 18 Mar 2026 11:20:03 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    Normally however (and again in scripting code) I'd use my built-in
    sort() based on quicksort, which is nearly 1000 times faster than
    bubble-sort for my test (sort 20K random strings), and some 300x
    faster than your routines. It's not O(n-squared) either.

    For lexicographic sort of 20K random strings, plain quicksort is
    probably quite sub-optimal.
    If performance is important, I'd consider a combined method: first
    pre-sort by a 3-char or 4-char prefix with Radix sort (the 'by LS
    character first' variation of the algorithm), then use quicksort to
    sort sections with the same prefix. For strings taken from the real
    world it will not work as well as for artificial random strings,
    but should still significantly outperform plain quicksort.
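
    The combined method Michael describes might be sketched as below, simplified to a one-character prefix rather than the 3-4 characters he suggests (prefix_then_qsort and cmpstr are illustrative names, not code from the thread):

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char* */
static int cmpstr(const void *a, const void *b) {
    return strcmp(*(char * const *)a, *(char * const *)b);
}

/* Counting-sort the pointers into buckets by first byte,
   then finish each same-prefix run with qsort. */
void prefix_then_qsort(char **s, size_t n) {
    size_t start[256], idx[256], count[256] = {0};
    char **out = malloc(n * sizeof *out);
    if (!out) return;
    for (size_t i = 0; i < n; i++)
        count[(unsigned char)s[i][0]]++;
    /* turn counts into bucket start offsets */
    size_t sum = 0;
    for (int c = 0; c < 256; c++) {
        start[c] = idx[c] = sum;
        sum += count[c];
    }
    /* stable distribution pass */
    for (size_t i = 0; i < n; i++)
        out[idx[(unsigned char)s[i][0]]++] = s[i];
    memcpy(s, out, n * sizeof *s);
    free(out);
    /* sort within each bucket; prefixes already agree */
    for (int c = 0; c < 256; c++)
        if (count[c] > 1)
            qsort(s + start[c], count[c], sizeof *s, cmpstr);
}
```

    A real implementation would radix-sort on the first several characters (LSD order, as Michael says) before falling back to quicksort, but the bucket-then-sort structure is the same.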

    Put the strings in a hash table. If a string has been seen
    before nothing more needs to be done. Any string not seen
    before is inserted into a balanced tree (with no further
    tests for equality needed). The balanced tree can be used as
    a basis to associate an integer with each string, where the
    associated integers have the same sort order as the strings.
    The only string comparisons needed are for building the
    hash table (which for the most part will have no misses),
    and for building the balanced tree, which is good because
    most of the comparisons are with values that are far away,
    lexicographically, from the string being inserted, and so
    not many character comparisons are needed before a decision
    is made about which way to go in the tree.
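
    The scheme Tim describes might be sketched like this, using a plain (unbalanced) binary search tree for brevity where he specifies a balanced one; all names are illustrative and error handling and freeing are omitted:

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKET 1024

typedef struct HNode { const char *s; struct HNode *next; } HNode;
typedef struct TNode { const char *s; struct TNode *lo, *hi; } TNode;

static unsigned hash(const char *s) {
    unsigned h = 5381;                       /* djb2-style hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKET;
}

/* Returns 1 if s was already present, else records it and returns 0. */
static int seen_before(HNode *tab[], const char *s) {
    unsigned h = hash(s);
    for (HNode *p = tab[h]; p; p = p->next)
        if (strcmp(p->s, s) == 0) return 1;
    HNode *p = malloc(sizeof *p);
    p->s = s; p->next = tab[h]; tab[h] = p;
    return 0;
}

/* Duplicates never reach the tree, so no equality tests here. */
static TNode *tree_insert(TNode *t, const char *s) {
    if (!t) {
        t = calloc(1, sizeof *t);
        t->s = s;
    } else if (strcmp(s, t->s) < 0) {
        t->lo = tree_insert(t->lo, s);
    } else {
        t->hi = tree_insert(t->hi, s);
    }
    return t;
}

/* In-order walk emits the distinct strings in sorted order. */
static void tree_walk(TNode *t, const char **out, size_t *k) {
    if (!t) return;
    tree_walk(t->lo, out, k);
    out[(*k)++] = t->s;
    tree_walk(t->hi, out, k);
}

/* Returns the number of distinct strings placed at the front of out[]. */
size_t sort_unique(const char **in, size_t n, const char **out) {
    HNode *tab[NBUCKET] = {0};
    TNode *root = NULL;
    for (size_t i = 0; i < n; i++)
        if (!seen_before(tab, in[i]))
            root = tree_insert(root, in[i]);
    size_t k = 0;
    tree_walk(root, out, &k);
    return k;
}
```

    The in-order walk also gives the sort-ordered integer Tim mentions: the position of each string in out[] is that integer.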
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Apr 6 21:00:46 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic behavior
    as N goes to infinity.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Apr 7 09:37:05 2026
    From Newsgroup: comp.lang.c

    On Mon, 06 Apr 2026 21:00:46 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic behavior
    as N goes to infinity.

    So what do you call the part of the curve from 1e2 to 3e5 lines? MiddleO?

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Apr 7 02:12:49 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    [..lots left out..]

    On Wed, 18 Mar 2026 11:20:03 +0100
    David Brown <david.brown@hesbynett.no> wrote:
    [...]
    I don't think pure bubblesort has much serious use outside
    educational purposes, but it is easy to understand, easy to
    implement correctly in pretty much any language, and can do a
    perfectly good job if you don't need efficient sorting of large
    datasets (for small enough datasets, a hand-written bubblesort
    in C will be faster than calling qsort).

    IMHO, Bubble Sort is harder to understand than Straight Select
    Sort.

    After reading through the several discussions regarding sorting,
    I have a variety of comments to make about sorting algorithms,
    not just the remarks above but to the discussions in general.
    The posting above seems a reasonable departure point for those
    comments, although much of what I have to say is prompted by
    comments in other postings.

    For starters, I contend that a completely simple bubble sort is
    the easiest sorting algorithm to show to novice programmers. By
    novice here I mean people who barely even know what is meant by
    the word algorithm. To be specific, what I mean by simple bubble
    sort is like this:

    void
    simple_minded_bubble_sort( unsigned n, Element a[n] ){
        unsigned i, j;
        for( i = 0; i < n; i++ ){
            for( j = 1; j < n; j++ ){
                if( follows( j-1, j, a ) ) swap( j-1, j, a );
            }
        }
    }

    Incidentally all the algorithms I will show here are written in
    terms of the two functions follows() and swap(), whose meanings
    are (I hope) very obvious. The array to be sorted has values of
    type Element, which doesn't play any direct role in the
    discussion.
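
    For concreteness, follows() and swap() might look like this, assuming Element is int (the thread doesn't show these helpers):

```c
typedef int Element;

/* True when a[i] should come after a[j], i.e. the pair is out of order. */
static int follows( unsigned i, unsigned j, Element a[] ){
    return a[i] > a[j];
}

/* Exchange a[i] and a[j]. */
static void swap( unsigned i, unsigned j, Element a[] ){
    Element t = a[i];
    a[i] = a[j];
    a[j] = t;
}
```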

    I think it goes without saying that this is a crummy algorithm.
    But it's a very simple algorithm, which makes it a good first
    choice for novice programmers.

    We can reasonably ask how bad this algorithm is, and ask in a way
    that novices can understand. It seems obvious that a good
    sorting algorithm should compare any two particular elements at
    most once. If we try this algorithm on an array of randomly
    chosen input values, we might get statistics like this:

    simple bubble 151277700 37391291 200.000

    Here the columns are name, number of compares, number of swaps,
    and number of compares shown as a percentage of the "maximal"
    number of compares. The crumminess is evident: this algorithm
    compares each pair of elements not once but twice. Awful!

    There is a second and more subtle problem, which is that even
    though the algorithm is simple it isn't obvious that it works or
    why. However it isn't hard to see that the inner loop carries
    the largest value to the upper end of the array, and similarly
    the second iteration of the inner loop will carry the second
    largest value to the last place before that, and so on. This
    observation leads to a second algorithm that is both faster and
    has a motivating argument for why it works:

    void
    bubble_sort( unsigned n, Element a[n] ){
        unsigned i, j;
        for( i = n; i > 1; i-- ){
            for( j = 1; j < i; j++ ){
                if( follows( j-1, j, a ) ) swap( j-1, j, a );
            }
        }
    }

    Now the inner loop works over smaller and smaller subsections of
    the whole array, each time carrying the largest remaining value
    to its appropriate location in the to-be-sorted array. Using the
    same test data as before, we get these measurements:

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000

    Excellent! Besides being able to see why it works, we do better
    on the number of compares by a factor of two. The number of
    swaps is the same (as indeed it cannot be improved for any
    algorithm that swaps only adjacent elements), but cutting down on
    the number of compares is clearly a step in the right direction.

    Now if we are inspired we can notice something. During one
    iteration of the inner loop, if we remember the last place a swap
    occurred, subsequent iterations of the inner loop don't need to
    go any farther than that. The reason is that we know all the
    values after the last swap must be larger than the element being
    carried along, so only smaller values were passed over, so we
    don't need to look past the location of the rightmost swap. It's
    kind of a subtle argument, but we can program it thusly:

    void
    knuth_bubble_sort( unsigned n, Element a[n] ){
        unsigned i, j, t;
        for( i = n; i > 1; i = t ){
            for( t = 0, j = 1; j < i; j++ ){
                if( follows( j-1, j, a ) ){
                    swap( j-1, j, a ), t = j;
                }
            }
        }
    }

    The variable 't' records the location past which no more compares
    need be done in the next iteration. By using 'i = t' in the
    outer loop control, we get to limit the range of what is looked
    at in the next iteration of the inner loop. Here is the data for
    the first three algorithms, again all run on the same input:

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964

    The number of compares done has indeed gone down, but not by
    much. There are a couple of dirty secrets about the "improved"
    version of bubble sort. One is that it doesn't help very much.
    The other is about the idea that it terminates early when the
    array values are already almost sorted. That is mostly a myth.
    It can happen that when the array values are already almost
    sorted the knuth_bubble_sort() will terminate early, but it
    almost never does. Even when there is only one array value out
    of place, it is relatively rare for this algorithm to get a big
    speedup.
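    The "mostly a myth" claim is easy to check for oneself:
    instrument follows() with a counter and run both variants on the
    same input. A sketch (follows/swap in the style of the listings
    above, plus a counting harness; this is not the author's
    measurement code):

```c
#include <string.h>

typedef int Element;

static unsigned long ncmp;   /* bumped on every comparison */

static int follows( unsigned i, unsigned j, Element a[] ){
    ++ncmp;
    return a[i] > a[j];
}
static void swap( unsigned i, unsigned j, Element a[] ){
    Element t = a[i]; a[i] = a[j]; a[j] = t;
}

void bubble_sort( unsigned n, Element a[] ){
    unsigned i, j;
    for( i = n; i > 1; i-- )
        for( j = 1; j < i; j++ )
            if( follows( j-1, j, a ) ) swap( j-1, j, a );
}

void knuth_bubble_sort( unsigned n, Element a[] ){
    unsigned i, j, t;
    for( i = n; i > 1; i = t )
        for( t = 0, j = 1; j < i; j++ )
            if( follows( j-1, j, a ) ) swap( j-1, j, a ), t = j;
}

/* Sort a copy of src (n <= 64 in this toy harness) and report how
   many compares the given sorter used. */
unsigned long compares_used( void (*sort)(unsigned, Element[]),
                             const Element *src, unsigned n ){
    Element buf[64];
    memcpy( buf, src, n * sizeof *src );
    ncmp = 0;
    sort( n, buf );
    return ncmp;
}
```

    On {1,2,...,7,0}, for instance, both variants do exactly the same
    28 compares: the single misplaced small value forces a full-length
    pass every time, so the early-exit bookkeeping saves nothing.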

    It's time to look at another algorithm. Here is one that wasn't
    around when I was first learning about sorting algorithms, called
    gnome sort:

    void
    gnome_sort( unsigned n, Element a[n] ){
        unsigned i = 0;
        while( ++i < n ){
            if( follows( i-1, i, a ) ){
                swap( i-1, i, a );
                i = i > 1 ? i-2 : 1;
            }
        }
    }

    The idea is simple. Imagine a gnome wandering among the array
    elements, starting at the left end. If the two elements being
    looked at are in order the gnome moves right; if they aren't
    then the gnome swaps them and moves left (of course never moving
    past the left end). So the gnome just wanders back and forth,
    swapping elements that are out of order and going either left or
    right, depending on whether it just did a swap or not. How does
    it do, measurement-wise? Here is the data:

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964
    gnome 74794859 37391291 98.884

    We see the relatively simple gnome sort does better than the more
    complicated knuth bubble sort. Perhaps not a lot better, but
    still better.

    Even so, these numbers are not very inspiring. We can do better
    if we take the basic idea of (knuth) bubble sort and make it
    boustrophedonic, or what might be called a ping pong sort:

    void
    ping_pong_sort( unsigned n, Element a[n] ){
        unsigned i=1, j=-1, k=n, newj, newk;
        while( j+1 < k-1 ){
            for( i = j+2, newk = i; i < k; i++ ){
                if( follows( i-1, i, a ) ){
                    swap( i-1, i, a ), newk = i;
                }
            }
            k = newk;
            for( i = k-1, newj = i; i > j+1; i-- ){
                if( follows( i-1, i, a ) ){
                    swap( i-1, i, a ), newj = i-1;
                }
            }
            j = newj;
        }
    }

    Gosh, what a mess. But let's look at the metrics.

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964
    gnome 74794859 37391291 98.884
    ping pong 49983149 37391291 66.081

    That is nice to see - a healthy performance improvement. But of
    course what everyone is waiting for are the well-known simple
    standard sorts:

    void
    selection_sort( unsigned n, Element a[n] ){
        unsigned i, j;
        for( i = 0; i < n; i++ ){
            unsigned k = i;
            for( j = i+1; j < n; j++ ){
                if( follows( k, j, a ) ) k = j;
            }
            swap( i, k, a );
        }
    }

    void
    insertion_sort( unsigned n, Element a[n] ){
        unsigned i, j;
        for( i = 1; i < n; i++ ){
            for( j = i; j > 0; j-- ){
                if( !follows( j-1, j, a ) ) break;
                swap( j-1, j, a );
            }
        }
    }

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964
    gnome 74794859 37391291 98.884
    ping pong 49983149 37391291 66.081

    selection 75638850 12289 100.000
    insertion 37403579 37391291 49.450

    Clearly selection sort should be used only when the amount of
    movement is the overriding factor, and compares don't matter.
    Conversely, insertion sort is the flip side of that - the number
    of compares is much lower, the amount of movement is the same as
    all the others.

    For my own amusement I wrote a new sorting algorithm, which I
    whimsically call titanic sort:

    void
    titanic_sort( unsigned n, Element a[n] ){
        unsigned i, j;
        for( i = n; i > 0; i-- ){
            for( j = i-1; j < n-1; j++ ){
                if( follows( j, j+1, a ) ) swap( j, j+1, a );
            }
        }
    }

    Titanic sort is just like an (ordinary) bubble sort, except it
    does ever larger iterations in the inner loop, rather than ever
    smaller iterations like an ordinary bubble sort. Here are the
    metrics:

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964
    gnome 74794859 37391291 98.884
    ping pong 49983149 37391291 66.081

    selection 75638850 12289 100.000
    insertion 37403579 37391291 49.450

    titanic 75638850 37391291 100.000

    After writing the titanic sort, I had an idea. Because we are
    working on ever growing subintervals, we can stop the inner loop
    early whenever we find we don't need to swap. For this sort I
    once again chose a whimsical name, namely seven up sort:

    void
    sevenup_sort( unsigned n, Element a[n] ){
        unsigned i, j;
        for( i = n; i > 0; i-- ){
            for( j = i-1; j < n-1; j++ ){
                if( !follows( j, j+1, a ) ) break;
                swap( j, j+1, a );
            }
        }
    }

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964
    gnome 74794859 37391291 98.884
    ping pong 49983149 37391291 66.081

    selection 75638850 12289 100.000
    insertion 37403579 37391291 49.450

    titanic 75638850 37391291 100.000
    seven up 37403579 37391291 49.450

    Oh of course; a seven up sort is just like an insertion sort,
    but inserting on the far end rather than the near end. Duh!

    After doing all these I wrote and measured three other sorts, for
    which I will give the metrics but the code for only one. Here
    are the metrics:

    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964
    gnome 74794859 37391291 98.884
    ping pong 49983149 37391291 66.081

    selection 75638850 12289 100.000
    insertion 37403579 37391291 49.450

    titanic 75638850 37391291 100.000
    seven up 37403579 37391291 49.450

    heapsort 309459 168889 0.409
    quicksort 192033 117831 0.254
    binary insertion 150102 37391291 0.198

    The code for heapsort and for quicksort was not as clean as I
    would like so I'm not posting that. Here is the code for binary
    insertion sort:

    void
    binary_insertion_sort( unsigned n, Element a[n] ){
        unsigned i, j, k, m;
        for( i = 1; i < n; i++ ){
            j = -1, k = i;
            while( j+1 < k ){
                m = j+(k-j)/2;
                if( follows( m, i, a ) ) k = m;
                else j = m;
            }
            for( j = i; j > k; j-- ) swap( j-1, j, a );
        }
    }

    A comment about heapsort, especially as compared with quicksort.
    Using this test input heapsort did a lot more compares than
    quicksort did. That result is not an accident of the data. Of
    course sometimes quicksort misbehaves, but usually it doesn't,
    and in those cases heapsort is going to run slower than quicksort
    because it does more compares. That result is inherent in the
    heapsort algorithm. It's true that the worst case for heapsort
    is O( N * log N ), but the constant for heapsort is bigger than
    the (average case) constant for quicksort. Also it's worth
    noting that binary
    insertion sort does the best of all in terms of number of
    compares done.
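    Since the author's heapsort isn't posted, here is a plain
    textbook sift-down heapsort in the same follows/swap style, for
    comparison only (this is not the version that produced the
    numbers above):

```c
typedef int Element;

static int follows( unsigned i, unsigned j, Element a[] ){
    return a[i] > a[j];
}
static void swap( unsigned i, unsigned j, Element a[] ){
    Element t = a[i]; a[i] = a[j]; a[j] = t;
}

/* Re-establish the max-heap property for the subtree rooted at i,
   looking only at the first n elements. */
static void sift_down( unsigned i, unsigned n, Element a[] ){
    for(;;){
        unsigned big = i, l = 2*i + 1, r = 2*i + 2;
        if( l < n && follows( l, big, a ) ) big = l;
        if( r < n && follows( r, big, a ) ) big = r;
        if( big == i ) return;
        swap( i, big, a );
        i = big;
    }
}

void heap_sort( unsigned n, Element a[] ){
    unsigned i;
    if( n < 2 ) return;
    for( i = n/2; i-- > 0; )     /* build the heap bottom-up */
        sift_down( i, n, a );
    for( i = n; --i > 0; ){      /* move max to the end, shrink heap */
        swap( 0, i, a );
        sift_down( 0, i, a );
    }
}
```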

    My conclusions...

    The right place to start on sorting algorithms is with bubble
    sort, not because it's a good algorithm, but because it's the
    easiest one to show, and what is more important it's a good way
    to teach about how algorithms evolve.

    There is no reason to use the knuth version of bubble sort.
    Compared to ordinary bubble sort the upside is small and not
    worth the downside of needing more intricate code.

    If there is a good chance that input starts off being close to
    sorted, ping pong sort can be considered.

    In almost all cases prefer binary insertion sort to plain
    insertion sort, not because the average case is better but
    because the worst case is better. Moreover the extra code needed
    is small and easily understood.

    Writing a full-scale sorting algorithm, such as a replacement for
    qsort, is a non-trivial undertaking.

    Other considerations apply if the sort needs to be stable, but
    surely this posting is long enough already. :)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Apr 7 14:00:41 2026
    From Newsgroup: comp.lang.c

    On Tue, 07 Apr 2026 02:12:49 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    <snip>

    My conclusions...

    The right place to start on sorting algorithms is with bubble
    sort, not because it's a good algorithm, but because it's the
    easiest one to show, and what is more important it's a good way
    to teach about how algorithms evolve.


    If you write it in terms of moves rather than in terms of swaps,
    bubble sort is not easier than insertion and selection.

    There is no reason to use the knuth version of bubble sort.
    Compared to ordinary bubble sort the upside is small and not
    worth the downside of needing more intricate code.


    The knuth version has at least one merit. You claim that in
    practice it's too rare that the merit is exercised. Maybe that
    is true. Or maybe cases of almost-sorted arrays with misplaced
    elements coming in pairs in close proximity are not that
    uncommon. I don't know. But the merit exists.
    OTOH, 'ordinary bubble sort' has no merits at all relative to
    simple insertion sort.

    If there is a good chance that input starts off being close to
    sorted, ping pong sort can be considered.

    In almost all cases prefer binary insertion sort to plain
    insertion sort, not because the average case is better but
    because the worst case is better. Moreover the extra code needed
    is small and easily understood.


    The most important practical use case for insertion sorts is in sorting
    short sections created by higher level sorting algorithms. I'm
    reasonably certain that for this use case straight insertion sort is
    better than binary insertion sort.
    Esp. so on modern CPUs where branch mispredictions have become
    relatively more expensive (about the same as 15-20 years ago in
    terms of cycles, but in each cycle a modern CPU can do 3-4 times
    more work) and with modern gcc compilers that [rightly] apply
    branch avoidance techniques (if-conversion in gcc docs) much more
    conservatively than older versions did.
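    Michael S's point about insertion sort finishing the work of a
    higher-level sort can be sketched like this: a quicksort that
    simply stops recursing on short partitions, followed by one
    straight insertion pass over the whole array. (Hypothetical
    illustration; CUTOFF and the partition scheme are not from any
    code posted in this thread.)

```c
/* Hypothetical hybrid: quicksort that stops recursing once a
   partition is short, then a single straight insertion pass to
   finish. CUTOFF is illustrative, not tuned. */
#define CUTOFF 16

static void coarse_quicksort( int *a, int lo, int hi ){
    while( hi - lo > CUTOFF ){
        /* Hoare-style partition around the middle element's value */
        int p = a[lo + (hi - lo)/2], i = lo, j = hi;
        do {
            while( a[i] < p ) ++i;
            while( a[j] > p ) --j;
            if( i <= j ){
                int t = a[i]; a[i] = a[j]; a[j] = t;
                ++i; --j;
            }
        } while( i <= j );
        /* recurse into the smaller side, iterate on the larger */
        if( j - lo < hi - i ){ coarse_quicksort( a, lo, j ); lo = i; }
        else                 { coarse_quicksort( a, i, hi ); hi = j; }
    }
}

void hybrid_sort( int *a, int n ){
    int i, j;
    coarse_quicksort( a, 0, n-1 );
    /* straight insertion: cheap now, since every element sits close
       to its final position */
    for( i = 1; i < n; i++ ){
        int v = a[i];
        for( j = i; j > 0 && a[j-1] > v; j-- ) a[j] = a[j-1];
        a[j] = v;
    }
}
```

    The insertion pass is correct on any input; the coarse quicksort
    only makes it cheap, which is exactly the division of labor being
    described.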

    Writing a full-scale sorting algorithm, such as a replacement for
    qsort, is a non-trivial undertaking.

    Other considerations apply if the sort needs to be stable, but
    surely this posting is long enough already. :)

    Still quite a lot shorter than TAOCP vol.3 :-)


    BTW, thank you for 'boustrophedon'. Not that there is a serious
    chance that I will remember the word tomorrow or even tonight, but
    hopefully I'll remember that such a word exists.




    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Apr 7 14:46:19 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with average length of few
    dozens characters, i.e. many times longer than log2(N). That was
    stated at one of earlier posts.

    That has nothing to do with BigO, which is about asymptotic behavior
    as N goes to infinity.

    Honest Big(O) varies length of the key with N. In practical range
    key length may be constant, but fixing length gives unrealistic
    problem for Big(O) analysis: without varying key length there
    are finitely many keys and sorting is equivalent to counting how
    many times each key appears in the input.
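    Waldek's observation can be made concrete: with a fixed finite
    key space (here unsigned char, 256 possible values) sorting
    reduces to counting occurrences, in O(N) time no matter how
    large N gets. A sketch:

```c
#include <stddef.h>

/* With a finite key space (unsigned char: 256 possible values),
   sorting is equivalent to counting how many times each key appears
   -- one O(N) counting pass plus an O(N + 256) rewrite. */
void counting_sort( unsigned char *a, size_t n ){
    size_t count[256] = {0};
    size_t i, out = 0;
    unsigned k;
    for( i = 0; i < n; i++ ) count[a[i]]++;
    for( k = 0; k < 256; k++ )
        for( i = 0; i < count[k]; i++ )
            a[out++] = (unsigned char)k;
}
```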
    --
    Waldek Hebisch
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Apr 7 08:01:49 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 13:48:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [discussion of sorting code]

    -----------------------
    void isort(char** data, int ll, int rr) {
        char* temp;
        int i = ll, j = rr;
        char* pivot = data[(ll + rr) / 2];

        do {
            while (strcmp(pivot, data[i]) > 0 && i < rr) ++i;
            while (strcmp(pivot, data[j]) < 0 && j > ll) --j;

            if (i <= j) {
                temp = data[i]; data[i] = data[j]; data[j] = temp;
                ++i;
                --j;
            }
        } while (i <= j);

        if (ll < j) isort(data, ll, j);
        if (i < rr) isort(data, i, rr);
    }

    //For a char* array A of N elements, call as 'isort(A, 0, n-1)'.

    Looks o.k. speed-wise.
    It is not idiomatic C,

    What about the code prompts you to say it is not idiomatic C?
    Is it because there is a line with multiple statements on it,
    or is there something else?

    Something else.
    Here is a mechanical translation of Bart's code to what I consider
    more idiomatic C. Note that the translation was mechanical; I
    didn't check whether the original logic is 100% correct or whether
    it is possible to omit some checks.

    void isort(char** data, int len) {
        if (len > 1) {
            char **i = data, **j = &data[len-1];
            char* pivot = data[(len-1) / 2];
            do {
                while (strcmp(pivot, *i) > 0 && i < &data[len-1]) ++i;
                while (strcmp(pivot, *j) < 0 && j > data) --j;
                if (i <= j) {
                    char* temp = *i; *i = *j; *j = temp;
                    ++i;
                    --j;
                }
            } while (i <= j);
            if (j > data) isort(data, j+1-data);
            if (i < &data[len-1]) isort(i, &data[len]-i);
        }
    }

    Whether i and j are pointers or integers is less important.
    But passing into call two indices instead of one and the fact that the
    second index points into last element of araay instead of the next
    place after last element is certainly non-idiomatic.

    I agree that using one past the last item is more common. I
    myself wouldn't call using at the last item non-idiomatic,
    but I understand your point.

    As for whether the number of index/pointer arguments is one
    or two, that's an internal design decision. Depending on
    other factors either choice might be preferable; I have
    used both in the past, even just in this last exercise of
    trying out different sorting algorithms. So I have to
    object to calling either choice non-idiomatic.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From DFS@nospam@dfs.com to comp.lang.c on Tue Apr 7 16:39:03 2026
    From Newsgroup: comp.lang.c

    On 4/7/2026 5:12 AM, Tim Rentsch wrote:

    <snip>
    simple bubble 151277700 37391291 200.000
    bubble 75638850 37391291 100.000
    knuth bubble 75611850 37391291 99.964
    gnome 74794859 37391291 98.884
    ping pong 49983149 37391291 66.081

    selection 75638850 12289 100.000
    insertion 37403579 37391291 49.450

    titanic 75638850 37391291 100.000
    seven up 37403579 37391291 49.450

    heapsort 309459 168889 0.409
    quicksort 192033 117831 0.254
    binary insertion 150102 37391291 0.198



    Nice post! About 15 minutes to write? :)


    A visualization with sound would be nice:

    https://www.youtube.com/watch?v=NVIjHj-lrT4







    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Apr 7 20:04:38 2026
    From Newsgroup: comp.lang.c

    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with average length of few
    dozens characters, i.e. many times longer than log2(N). That was
    stated at one of earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    Honest Big(O) varies length of the key with N. In practical range
    key length may be constant, but fixing length gives unrealistic
    problem for Big(O) analysis: without varying key length there are
    finitely many keys and sorting is equivalent to counting how many
    times each key appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration. On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Apr 7 21:54:30 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 21:00:46 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with average length of few
    dozens characters, i.e. many times longer than log2(N). That was
    stated at one of earlier posts.

    That has nothing to do with BigO, which is about asymptotic behavior
    as N goes to infinity.

    So, how do you call part of the curve from 1e2 to 3e5 lines? MiddleO ?

    Consider a thought experiment. We have a qsort-like sorting function,
    for the sake of discussion let's say it uses heapsort. It is widely
    understood that heapsort is an O(N*log(N)) algorithm (not counting
    the theoretical objections mentioned in Waldek Hebisch's post and
    my response to those comments).

    Now suppose we have two distinct invocations of said function. In
    both cases the arguments are length 1000 character strings, having
    only spaces and alphabetic characters, and no duplicates between the
    strings. In one call all the strings have distinct values in the
    first six characters, and in the other call the strings are all the
    same in the first 993 characters, differing only in the last six
    characters. The call to the sorting function points back to a
    function that uses strcmp() to do its comparisons.

    The heapsort algorithm is well-known to be N log N. All the strings
    in both sorts are exactly 1000 characters, so the strcmp() calls have
    a fixed maximum time cost. And yet one of these sorts runs almost 200
    times as fast as the other.

    I think what you're seeing is a data dependency that is orthogonal to
    the time complexity of the algorithm. For example, the travelling
    salesman problem is widely known to be NP complete, and yet on some
    inputs a solution can be found very quickly. BigO is about worst
    case behavior, or sometimes average case behavior. Other kinds of
    behavior can depend dramatically on the nature of the inputs.
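    The asymmetry in the thought experiment comes entirely from how
    far a strcmp-style loop must scan before it can decide. A
    hypothetical helper that counts the positions examined makes that
    visible:

```c
/* Count how many character positions an strcmp-style comparison
   examines before it can return: the position where the strings
   first differ, or the terminator if they are equal. (Hypothetical
   helper for the thought experiment, not measured code.) */
static unsigned long chars_examined( const char *s, const char *t ){
    unsigned long n = 1;
    while( *s && *s == *t ){ s++; t++; n++; }
    return n;
}
```

    Strings that differ within the first six characters are decided
    after at most six positions; strings identical through 993
    characters cost two orders of magnitude more per compare, which
    is where a factor approaching 200 comes from.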
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Apr 9 16:06:32 2026
    From Newsgroup: comp.lang.c

    In article <86y0iyvypl.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 21:00:46 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with average length of few
    dozens characters, i.e. many times longer than log2(N). That was
    stated at one of earlier posts.

    That has nothing to do with BigO, which is about asymptotic behavior
    as N goes to infinity.

    So, how do you call part of the curve from 1e2 to 3e5 lines? MiddleO ?

    Consider a thought experiment. We have a qsort-like sorting function,
    for the sake of discussion let's say it uses heapsort. It is widely
    understood that heapsort is an O(N*log(N)) algorithm (not counting
    the theoretical objections mentioned in Waldek Hebisch's post and
    my response to those comments).

    Now suppose we have two distinct invocations of said function. In
    both cases the arguments are length 1000 character strings, having
    only spaces and alphabetic characters, and no duplicates between the
    strings. In one call all the strings have distinct values in the
    first six characters, and in the other call the strings are all the
    same in the first 993 characters, differing only in the last six
    characters. The call to the sorting function points back to a
    function that uses strcmp() to do its comparisons.

    The heapsort algorithm is well-known to be N log N. All the strings
    in both sorts are exactly 1000 characters, so the strcmp() calls have
    a fixed maximum time cost. And yet one of these sorts runs almost 200
    times as fast as the other.

    I think what you're seeing is a data dependency that is orthogonal to
    the time complexity of the algorithm. For example, the travelling
    salesman problem is widely known to be NP complete, and yet on some
    inputs a solution can be found very quickly.

    A useful thought experiment, but it is perhaps easier to explain
    that the asymptotic time complexity of a sorting algorithm, for
    example heapsort being O(n log n), is expressed in terms of the
    number of comparison operations required by the algorithm,
    independent of the complexity of those comparisons.

    Obviously, the real-world performance of such algorithms
    depends on the time required for the comparison, but analysis of
    algorithms themselves abstracts over that and considers it a
    constant.

    BigO is about worst case
    behavior, or sometimes average case behavior. Other kinds of behavior
    can depend dramatically on the nature of the inputs.

    I disagree with this. The "Big-O" of an algorithm isn't about
    worst-case behavior, it's a notation for categorizing which term
    dominates in an expression describing the complexity of an
    algorithm, as the number of inputs to that algorithm grows.

    That is, it describes an aspect of rates of growth; "Big-O" in
    particular expresses an upper bound for such growth (so an
    O(n^2) algorithm is also O(n^3) and O(n^k) for k in 2..inf).
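    For reference, the rate-of-growth definition being described here
    can be stated formally (the standard definition, not taken from
    the post):

```latex
% f = O(g): beyond some point N, f is bounded by a constant multiple of g
f(n) = O(g(n)) \iff \exists\, c > 0,\ \exists\, N \ge 0
    \ \text{such that}\ |f(n)| \le c\,g(n) \ \text{for all}\ n \ge N
% e.g. n^2 = O(n^3): take c = 1, N = 1
```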

    Perhaps you meant "upper bound" instead of "worst case", but I
    would argue that this is misleading, as what one canonically
    calls the "worst case" is again different; consider QuickSort,
    which is O(n^2) in the worst case (an already ordered input, or
    one that is otherwise pathological with respect to selection of
    the pivot), but O(n lg n) in the average case (a
    relatively-random distribution in the input's order).

    There are other common notations for related concepts. Omega is
    often used for describing lower bounds, and Theta for what CLRS
    call "tight" bounds, which we may think of as the infimum of O.

    Note that these notions predate computer science by some time.
    Hardy defined all three rigorously in "A Course of Pure
    Mathematics" (1908), using O() in the conventional sense, though
    he used "o" and "~" for omega and theta, respectively. These
    are useful in different fields of analysis (among other places).

    - Dan C.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Apr 9 21:15:14 2026
    From Newsgroup: comp.lang.c

    In article <863416xid5.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with average length of few
    dozens characters, i.e. many times longer than log2(N). That was
    stated at one of earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    Honest Big(O) varies length of the key with N. In practical range
    key length may be constant, but fixing length gives unrealistic
    problem for Big(O) analysis: without varying key length there are
    finitely many keys and sorting is equivalent to counting how many
    times each key appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration.

    If by "Knuth's books" you're referring to TAOCP, then he does
    not seem to give it too much attention. At the start of Volume
    3, he spends considerable time discussing the combinatorics of
    permutations as prefatory to discussing sorting, but if he talks
    about the complexity of key comparison at all, it's not an
    extended treatment. He refers to a "<" operation for comparing
    keys, and he does talk about "multiprecision keys" and
    lexicographic orderings, but doesn't spend a lot of ink talking
    about how it might be implemented; one of the exercises
    acknowledges that this can have a significant impact on
    performance, but doesn't go into much detail. Another exercise
    challenges the reader to write a MIX program for comparing keys,
    but again, doesn't give details about complexity analysis _of
    comparison_. There is another exercise that talks about this
    tangentially, in which he suggests that sorting (implicitly, of
    text data) slows down as the sort nears completion, since more
    and more of the key value is the same, and suggests
    keeping track of the length of the common prefix to use as an
    offset for subsequent comparisons.

    The notion that the actual time required for individual
    operations should be taken into account when analyzing time
    complexity does appear in other books;
    Dasgupta, Papadimitriou, and Vazirani (http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf)
    talk about this in the context of computing Fibonacci numbers,
    for example.

    On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.

    Indeed. As Rob Pike once put it, "Fancy algorithms are slow
    when $n$ is small, and $n$ is usually small. Fancy algorithms
    have big constants. Until you know that $n$ is frequently going
    to get big, don't get fancy."

    - Dan C.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Apr 10 01:31:39 2026
    From Newsgroup: comp.lang.c

    On Thu, 9 Apr 2026 21:15:14 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:
    In article <863416xid5.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you
    don't say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    An honest Big(O) analysis varies the length of the key with N. In
    the practical range the key length may be constant, but fixing the
    length gives an unrealistic problem for Big(O) analysis: without
    varying the key length there are finitely many keys, and sorting is
    equivalent to counting how many times each key appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration.

    If by "Knuth's books" you're referring to TAOCP, then he does
    not seem to give it too much attention. At the start of Volume
    3, he spends considerable time discussing the combinatorics of
    permutations as prefatory to discussing sorting, but if he talks
    about the complexity of key comparison at all, it's not an
    extended treatment. He refers to a "<" operation for comparing
    keys, and he does talk about "multiprecision keys" and
    lexicographic orderings, but doesn't spend a lot of ink talking
    about how it might be implemented; one of the exercises
    acknowledges that this can have a significant impact on
    performance, but doesn't go into much detail. Another exercise
    challenges the reader to write a MIX program for comparing keys,
    but again, doesn't give details about complexity analysis _of
    comparison_. There is another exercise that touches on this
    tangentially, in which he suggests that sorting (text data is
    implied) slows down as the sort nears completion, since
    more and more of the key value is now the same, and suggests
    keeping track of the length of the common prefix to use as an
    offset for subsequent comparisons.

    The notion that the actual time required for
    individual operations should be taken into account when
    analyzing time complexity does appear in other books;
    Dasgupta, Papadimitriou, and Vazirani (http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf)
    talk about this in the context of computing Fibonacci numbers,
    for example.

    On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.

    Indeed. As Rob Pike once put it, "Fancy algorithms are slow
    when $n$ is small, and $n$ is usually small. Fancy algorithms
    have big constants. Until you know that $n$ is frequently going
    to get big, don't get fancy."

    - Dan C.

    One case where these considerations are not at all theoretical, and
    where a simple quicksort from the books performs very, very slowly
    precisely because each lexicographic comparison takes more and more
    time as sorting progresses, is the sort at the core of the
    Burrows-Wheeler transform, which in turn is at the core of various
    compression schemes, including bzip2. The problem hits worst when
    the data set compresses well. In the specific case of bzip2, they
    limited the block size to 900KB, which is quite low, and did
    preprocessing on the input data, which often seriously impacts the
    quality of compression. I can't say for sure, but it seems to me
    that the reason was exactly that: avoiding prohibitively slow
    sorting. Had they had the time and desire to use "fancy
    algorithms", either combinations of bucket and non-bucket variants
    of radix sort, or a quicksort that memorizes common prefixes, or
    even a combination of all three, then they would not have needed
    preprocessing and small blocks, and would have ended up with better
    compression ratios.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Thu Apr 9 23:33:28 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    An honest Big(O) analysis varies the length of the key with N. In
    the practical range the key length may be constant, but fixing the
    length gives an unrealistic problem for Big(O) analysis: without
    varying the key length there are finitely many keys, and sorting is
    equivalent to counting how many times each key appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    There is one choice, made often to simplify problem. But there
    is quite universal choice: N is total size of input data.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration. On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.

    It was you who insisted on asymptotic complexity, rejecting
    practical amendments...
    --
    Waldek Hebisch
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Fri Apr 10 11:35:38 2026
    From Newsgroup: comp.lang.c

    In article <10r9d06$32585$1@paganini.bofh.team>,
    Waldek Hebisch <antispam@fricas.org> wrote:
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    An honest Big(O) analysis varies the length of the key with N. In
    the practical range the key length may be constant, but fixing the
    length gives an unrealistic problem for Big(O) analysis: without
    varying the key length there are finitely many keys, and sorting is
    equivalent to counting how many times each key appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    There is one choice, made often to simplify problem. But there
    is quite universal choice: N is total size of input data.

    This is too simplistic.

    Analysis of sorting algorithms treats $n$ as the number of input
    _records_, only considering their size tangentially. In turn,
    records are distinguished by having some kind of "key" and the
    existence of some comparison operator, "<", that obeys the usual
    ordering properties of mathematics.

    Sorting involves establishing an order for some collection of
    records, <R_1, ..., R_n> so that R_1 < R_2 < ... < R_n. When we
    say, O(n lg n), we are referring to the number of comparisons
    required.

    Whether the comparison function itself is complex, specifically
    whether it is non-constant, is an entirely separate matter.

    "Total size of input data" doesn't enter into it at all;
    indeed, it doesn't even make much sense conceptually: suppose I
    have some sequence of large records, but the key is just a fixed
    size integer in that record. Then the comparison is cheap;
    probably just a single machine instruction on real computers,
    despite the data itself being large.

    "Aha, but what about the cost of moving data within such a
    sequence? If the records are large, that's non-trivial and you
    must account for that." Indeed, this can be an issue; often one
    works around it by holding an auxiliary array of pointers to the
    records themselves, and sorting those. Again, exchanges here
    are cheap: just a handful of machine instructions.
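    A minimal sketch of that pointer-array workaround (the struct and
    names below are hypothetical, not from anything posted in this
    thread): sort an array of pointers on a fixed-size integer key, so
    each exchange moves one pointer while the bulky records stay put.

```c
#include <stdlib.h>

/* Hypothetical record: a bulky payload, but a fixed-size integer key. */
struct rec {
    long key;
    char payload[4096];    /* never moved during the sort */
};

/* qsort() callback: each element is a struct rec *, so dereference
 * once and compare keys; a handful of instructions per call. */
static int cmp_rec_ptr(const void *pa, const void *pb)
{
    const struct rec *a = *(const struct rec *const *)pa;
    const struct rec *b = *(const struct rec *const *)pb;
    return (a->key > b->key) - (a->key < b->key);
}
```

    With an array struct rec *ptrs[n] filled in,
    qsort(ptrs, n, sizeof *ptrs, cmp_rec_ptr) reorders only the
    pointers; no record is ever copied.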

    Do real-world programmers care about the actual cost of both
    the comparison function _and_ moving data around to exchange
    elements in e.g., an array when sorting? Absolutely. But
    trying to shoehorn those concerns into big-O notation is not a
    useful exercise in accounting for them.

    - Dan C.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Apr 11 09:04:44 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <86y0iyvypl.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 21:00:46 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic behavior
    as N goes to infinity.

    So, what do you call the part of the curve from 1e2 to 3e5 lines? MiddleO?

    Consider a thought experiment. We have a qsort-like sorting function,
    for the sake of discussion let's say it uses heapsort. It is widely
    understood that heapsort is an O(N*log(N)) algorithm (not counting the
    theoretical objections mentioned in Waldek Hebisch's post and my
    response to those comments).

    Now suppose we have two distinct invocations of said function. In
    both cases the arguments are length 1000 character strings, having
    only spaces and alphabetic characters, and no duplicates between the
    strings. In one call all the strings have distinct values in the
    first six characters, and in the other call the strings are all the
    same in the first 993 characters, differing only in the last six
    characters. The call to the sorting function points back to a
    function that uses strcmp() to do its comparisons.

    The heapsort algorithm is well-known to be N log N. All the strings
    in both sorts are exactly 1000 characters, so the strcmp() calls have
    a fixed maximum time cost. And yet one of these sorts runs almost 200
    times as fast as the other.
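    The gap in this thought experiment is easy to make visible with an
    instrumented comparison; the counter below is my addition, not part
    of the original experiment. A pair of equal-length strings that
    differ in the first character costs one character inspection per
    comparison, while a pair sharing a 993-character prefix costs 994.

```c
#include <string.h>

static size_t chars_examined;   /* total characters looked at so far */

/* strcmp() with a counter, so the per-call cost becomes observable. */
static int counting_strcmp(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i])
        i++;
    chars_examined += i + 1;    /* i matching chars plus the mismatch */
    return (unsigned char)a[i] - (unsigned char)b[i];
}
```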

    I think what you're seeing is a data dependency that is orthogonal to
    the time complexity of the algorithm. For example, the travelling
    salesman problem is widely known to be NP complete, and yet on some
    inputs a solution can be found very quickly.

    A useful thought experiment, but it is perhaps easier to explain
    that the asymptotic time complexity of a sorting algorithm, for
    example heapsort being O(n log n), is expressed in terms of the
    number of comparison operations required by the algorithm,
    independent of the complexity of those comparisons.

    No. This misses the essence of what I was trying to convey.

    BigO is about worst case behavior, or sometimes average case
    behavior. Other kinds of behavior can depend dramatically on the
    nature of the inputs.

    I disagree with this. The "Big-O" of an algorithm isn't about
    worst-case behavior, it's a notation for categorizing which term
    dominates in an expression describing the complexity of an
    algorithm, as the number of inputs to that algorithm grows.

    You misunderstood the point of my comment. Of course O()
    notation is about expressing an asymptotic upper bound of
    some function. The key issue here though is not about O()
    but about what function should be the one under consideration.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Apr 11 19:55:10 2026
    From Newsgroup: comp.lang.c

    In article <86a4v9a3fn.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <86y0iyvypl.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 21:00:46 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic behavior
    as N goes to infinity.

    So, what do you call the part of the curve from 1e2 to 3e5 lines? MiddleO?
    Consider a thought experiment. We have a qsort-like sorting function,
    for the sake of discussion let's say it uses heapsort. It is widely
    understood that heapsort is an O(N*log(N)) algorithm (not counting the
    theoretical objections mentioned in Waldek Hebisch's post and my
    response to those comments).

    Now suppose we have two distinct invocations of said function. In
    both cases the arguments are length 1000 character strings, having
    only spaces and alphabetic characters, and no duplicates between the
    strings. In one call all the strings have distinct values in the
    first six characters, and in the other call the strings are all the
    same in the first 993 characters, differing only in the last six
    characters. The call to the sorting function points back to a
    function that uses strcmp() to do its comparisons.

    The heapsort algorithm is well-known to be N log N. All the strings
    in both sorts are exactly 1000 characters, so the strcmp() calls have
    a fixed maximum time cost. And yet one of these sorts runs almost 200
    times as fast as the other.

    I think what you're seeing is a data dependency that is orthogonal to
    the time complexity of the algorithm. For example, the travelling
    salesman problem is widely known to be NP complete, and yet on some
    inputs a solution can be found very quickly.

    A useful thought experiment, but it is perhaps easier to explain
    that the asymptotic time complexity of a sorting algorithm, for
    example heapsort being O(n log n), is expressed in terms of the
    number of comparison operations required by the algorithm,
    independent of the complexity of those comparisons.

    No. This misses the essence of what I was trying to convey.

    Then what you are trying to convey is misleading. Note that I
    was making factual statements.

    BigO is about worst case behavior, or sometimes average case
    behavior. Other kinds of behavior can depend dramatically on the
    nature of the inputs.

    I disagree with this. The "Big-O" of an algorithm isn't about
    worst-case behavior, it's a notation for categorizing which term
    dominates in an expression describing the complexity of an
    algorithm, as the number of inputs to that algorithm grows.

    You misunderstood the point of my comment. Of course O()
    notation is about expressing an asymptotic upper bound of
    some function. The key issue here though is not about O()
    but about what function should be the one under consideration.

    The misunderstanding is yours, I'm afraid.

    You wrote, "BigO is about worst case behavior." That is
    definitionally incorrect, and it's not really something that is
    up for debate or open to alternative interpretations.

    If you want to account for the complexity of a comparison
    function, and you can characterize it in some useful, way, you
    can of course incorporate that into the analysis. For example,
    string comparison is linear in the size of the input strings;
    if one denotes that as m, then the time required to sort strings
    using heapsort might be described as O(m*n*log(n)).

    - Dan C.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Apr 11 21:32:21 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <863416xid5.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    An honest Big(O) analysis varies the length of the key with N. In
    the practical range the key length may be constant, but fixing the
    length gives an unrealistic problem for Big(O) analysis: without
    varying the key length there are finitely many keys, and sorting is
    equivalent to counting how many times each key appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration.

    If by "Knuth's books" you're referring to TAOCP, then he does
    not seem to give it too much attention. [...]

    In most of the chapter on Sorting, TAOCP uses the number of
    comparisons as the basis of comparison. But not everywhere
    in the chapter.

    My statement was not meant to be limited to the discussion of
    Sorting.

    On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.

    Indeed. As Rob Pike once put it, "Fancy algorithms are slow
    when $n$ is small, and $n$ is usually small. Fancy algorithms
    have big constants. Until you know that $n$ is frequently going
    to get big, don't get fancy."

    Whether the Rob Pike advisory is applicable or not is beside the
    point. My comment was about fancy mathematics, not fancy
    algorithms. My statement is just as applicable to Tim Sort (one
    of the fancier sorting algorithms) as it is to Bubble Sort.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sun Apr 12 04:59:39 2026
    From Newsgroup: comp.lang.c

    In article <861pgkaje2.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <863416xid5.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of a
    few dozen characters, i.e. many times longer than log2(N). That was
    stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    Honest Big(O) varies length of the key with N. In practical range
    key length may be constant, but fixing length gives unrealistic
    problem for Big(O) analysis: without varying key length there are
    finitely many keys and sorting is equivalent to counting how many
    times each key appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration.

    If by "Knuth's books" you're referring to TAOCP, then he does
    not seem to give it too much attention. [...]

    In most of the chapter on Sorting, TAOCP uses the number of
    comparisons as the basis of comparison. But not everywhere
    in the chapter.

    It seems like I pointed out a few places where he acknowledges
    a more complex picture. Are there other places to which you are
    referring?

    My statement was not meant to be limited to the discussion of
    Sorting.

    What do you think I was referring to, exactly? I was responding
    to your comments about Knuth's books, specifically, and the
    quoted text above, which seems concerned solely with sorting.

    As I mentioned, Dasgupta et al _do_ mention that analysis of
    algorithms is more complex than most treatments, because of
    precisely the idea that as things grow, seemingly constant
    operations are no longer constant. As I mentioned, they did
    this within the context of Fibonacci numbers, not sorting, but
    the point stands. Since, as you say, your statement was not
    meant to be limited to discussions of sorting, then it seems to
    be supporting what you are saying.

    On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.

    Indeed. As Rob Pike once put it, "Fancy algorithms are slow
    when $n$ is small, and $n$ is usually small. Fancy algorithms
    have big constants. Until you know that $n$ is frequently going
    to get big, don't get fancy."

    Whether the Rob Pike advisory is applicable or not is beside the
    point.

    On the contrary; I mentioned it because it supports your thesis.

    My comment was about fancy mathematics, not fancy
    algorithms. My statement is just as applicable to Tim Sort (one
    of the fancier sorting algorithms) as it is to Bubble Sort.

    There's nothing particularly fancy about it, but that aside, I'm
    honestly not sure what exactly I said that you are (apparently?)
    disagreeing with.

    I was responding with a specific statement about Knuth's books,
    a reference to another book in support of your statement, and
    yet another reference to something that Pike had written that,
    again, supports your point.

    - Dan C.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Apr 12 06:17:12 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Thu, 9 Apr 2026 21:15:14 -0000 (UTC)
    cross@spitfire.i.gajendra.net (Dan Cross) wrote:
    [...]
    Indeed. As Rob Pike once put it, "Fancy algorithms are slow
    when $n$ is small, and $n$ is usually small. Fancy algorithms
    have big constants. Until you know that $n$ is frequently going
    to get big, don't get fancy."

    One case where these considerations are not at all theoretical, and
    where a simple quicksort from the books performs very, very slowly
    precisely because each lexicographic comparison takes more and more
    time as sorting progresses, is the sort at the core of the
    Burrows-Wheeler transform, which in turn is at the core of various
    compression schemes, including bzip2. The problem hits worst when
    the data set compresses well.

    I'm curious to know if you can quantify this to some degree, and
    if so how much. I don't have any experience with Burrows-Wheeler
    transform or (any internals of) bzip2.

    In the specific case of bzip2, they limited the block size to
    900KB, which is quite low, and did preprocessing on the input
    data, which often seriously impacts the quality of compression.
    I can't say for sure, but it seems to me that the reason was
    exactly that: avoiding prohibitively slow sorting. Had they had
    the time and desire to use "fancy algorithms" (combinations of
    bucket and non-bucket variants of radix sort, or a quicksort
    that memorizes common prefixes, or even a combination of all
    three), then they would not have needed to use preprocessing and
    small blocks, and would have ended up with better compression
    ratios.

    There could be other reasons for wanting to limit the block size.
    Have you done any web searches or tried to investigate in some
    other ways?
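    For readers unfamiliar with the transform: the Burrows-Wheeler
    transform sorts all rotations of the input block, and with a
    naive comparison sort, repetitive input forces each comparison
    to scan a long common prefix. A minimal sketch, for illustration
    only (the naive O(n^2 log n) version; the name bwt_last_column
    and the global-context trick for qsort are mine, not bzip2's):

```c
#include <stdlib.h>
#include <string.h>

/* Context for comparing rotations of text[0..n-1].  qsort's
   comparator takes no user argument, so we pass context globally. */
static const char *g_text;
static size_t g_n;

/* Compare two rotations by starting index.  On repetitive input
   the common prefix of two rotations can be very long, so each
   call may scan many characters -- the slowdown described above. */
static int rot_cmp(const void *pa, const void *pb){
    size_t a = *(const size_t *)pa, b = *(const size_t *)pb;
    for( size_t k = 0; k < g_n; k++ ){
        char ca = g_text[(a + k) % g_n];
        char cb = g_text[(b + k) % g_n];
        if( ca != cb ) return (unsigned char)ca < (unsigned char)cb ? -1 : 1;
    }
    return 0;
}

/* Write the BWT (last column of the sorted rotation matrix) into out. */
void bwt_last_column(const char *text, size_t n, char *out){
    size_t *rot = malloc(n * sizeof *rot);
    for( size_t i = 0; i < n; i++ ) rot[i] = i;
    g_text = text; g_n = n;
    qsort(rot, n, sizeof *rot, rot_cmp);
    for( size_t i = 0; i < n; i++ )
        out[i] = text[(rot[i] + n - 1) % n];
    free(rot);
}
```

    Real implementations replace this with suffix-array style
    algorithms precisely to avoid rescanning shared prefixes.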
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Apr 12 07:13:41 2026
    From Newsgroup: comp.lang.c

    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of
    a few dozen characters, i.e. many times longer than log2(N).
    That was stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    An honest Big-O analysis varies the length of the key with N. In
    the practical range the key length may be constant, but fixing
    the length gives an unrealistic problem for Big-O analysis:
    without varying the key length there are finitely many keys, and
    sorting is equivalent to counting how many times each key
    appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    That is one choice, often made to simplify the problem. But
    there is a quite universal choice: N is the total size of the
    input data.

    No, N is usually the number of records to be sorted, not the size of
    the input. Also, in the discussion I was responding to, there were
    clearly two distinct axes relevant to the discussion.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration. On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.

    It was you who insisted on asymptotic complexity, rejecting
    practical amendments...

    This statement isn't right. BigO was already part of the discussion
    when I joined in. Also, it is customary in discussion of sorting
    algorithms to use the metric of number of comparisons done, without
    regard to the size of the variables needed to hold the indices of
    the records being sorted. See Knuth chapter 5, on Sorting, in
    volume 3 of TAOCP.
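    The comparison-count metric is also easy to measure empirically.
    A small sketch (the instrumentation and names are mine, not from
    the thread): wrap the comparison in a counting function and tally
    how many times a straight insertion sort calls it.

```c
#include <stddef.h>

/* Number of comparisons performed; the metric Knuth uses in
   TAOCP volume 3 when analyzing sorting algorithms. */
static unsigned long compare_count;

/* Comparison wrapper that tallies each call. */
static int counted_less(int x, int y){
    compare_count++;
    return x < y;
}

/* Straight insertion sort, instrumented to count comparisons.
   Returns the number of comparisons made. */
unsigned long insertion_sort_counted(size_t n, int a[]){
    compare_count = 0;
    for( size_t i = 1; i < n; i++ ){
        int v = a[i];
        size_t j = i;
        while( j > 0 && counted_less(v, a[j-1]) ){
            a[j] = a[j-1];
            j--;
        }
        a[j] = v;
    }
    return compare_count;
}
```

    On already-sorted input of n elements this reports n-1
    comparisons; on reversed input, n*(n-1)/2.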
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Apr 12 11:16:50 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    [...]

    After my previous posting I got curious about how different sorting
    algorithms behave for "nearly sorted" inputs. I decided to run some
    trials for some of the sorting functions given in my last posting.
    Here are the results:

    Average number of compares for sorting 100 elements, with one
    element out of place:

    insertion sort        132.647
    gnome sort            166.293
    ping pong sort        180.177
    knuth bubble sort     956.500

    Average number of compares for sorting 100 elements, with two
    elements out of place:

    insertion sort        166.126
    gnome sort            233.253
    ping pong sort        234.163
    knuth bubble sort    1566.487

    My conclusion, based on the above, is that bubble sort has nothing
    to recommend it as a practical sorting algorithm, even considering
    the refinement of early pruning (called the "knuth bubble sort" in
    the list above). The gnome sort is both simpler and better in terms
    of performance.
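    Gnome sort itself isn't shown anywhere in the thread; for
    reference, a minimal sketch on plain ints (the thread's
    follows()/swap() helpers are replaced here by direct comparison
    and exchange):

```c
#include <stddef.h>

/* Plain gnome sort: when the pair to the left is out of order,
   swap and step left; otherwise step right.  A single loop, no
   nested loop structure. */
void gnome_sort(size_t n, int a[]){
    size_t i = 1;
    while( i < n ){
        if( i == 0 || a[i-1] <= a[i] ){
            i++;                               /* in order: move right */
        } else {
            int t = a[i-1]; a[i-1] = a[i]; a[i] = t;
            i--;                               /* out of order: move left */
        }
    }
}
```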

    Incidentally, some years ago I devised a variation of gnome sort,
    which I call "leaping gnome sort". A leaping gnome sort is like
    gnome sort when going right to left, that is, when the elements
    being looked at are out of order, swap them and move one place to
    the left. When no swap is needed, the gnome "leaps" to the last
    place a rightward shift was initiated. In terms of code, leaping
    gnome sort can be written as follows:

    void
    leaping_gnome_sort( unsigned n, Element a[n] ){
        for(
            unsigned i = 1, j = 2;
            i < n;
            i = follows(i-1,i,a) ? swap(i-1,i,a), i > 1 ? i-1 : j++ : j++
        ){}
    }

    The "leaping" behavior is evident whenever the current location 'i'
    gets the value 'j++' rather than 'i-1'.

    Of course this algorithm could be written using a while() loop
    rather than a for() loop:

    void
    leaping_gnome_sort( unsigned n, Element a[n] ){
        unsigned i = 1, j = 2;
        while( i < n ){
            if( !follows(i-1,i,a) ) i = j++;
            else {
                swap(i-1,i,a);
                i = i > 1 ? i-1 : j++;
            }
        }
    }

    In either case it's nice having only a single loop structure rather
    than an outer loop and a second, nested loop.
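    The follows() and swap() helpers are not defined in the thread.
    One plausible reading, assuming an integer Element type (both the
    typedef and these definitions are my guesses, not the poster's):

```c
typedef int Element;

/* follows(x,y,a): true when a[x] should come after a[y] in sorted
   order, i.e. the pair (a[x], a[y]) is out of order. */
static int follows(unsigned x, unsigned y, Element a[]){
    return a[x] > a[y];
}

/* swap(x,y,a): exchange a[x] and a[y]. */
static void swap(unsigned x, unsigned y, Element a[]){
    Element t = a[x];
    a[x] = a[y];
    a[y] = t;
}
```

    Note that in the for() version, swap() appears as the left
    operand of a comma expression, which is valid C even when swap()
    returns void.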
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Apr 13 20:44:07 2026
    From Newsgroup: comp.lang.c

    In article <86fr508dwq.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 06 Apr 2026 15:13:32 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Obviously what words (or lines) appear can affect the character
    counts, but that still doesn't change BigO. By the way you don't
    say whether you are sorting words or lines.

    This sub-thread is about sorting lines with an average length of
    a few dozen characters, i.e. many times longer than log2(N).
    That was stated in one of the earlier posts.

    That has nothing to do with BigO, which is about asymptotic
    behavior as N goes to infinity.

    An honest Big-O analysis varies the length of the key with N. In
    the practical range the key length may be constant, but fixing
    the length gives an unrealistic problem for Big-O analysis:
    without varying the key length there are finitely many keys, and
    sorting is equivalent to counting how many times each key
    appears in the input.

    There's an important clarification to make here. There are two
    independent parameters: N, the number of records to be sorted (a
    record being a character string that is either a word or a line),
    and the (maximum) length of any record, which in the discussion is
    bounded above by a constant.

    That is one choice, often made to simplify the problem. But
    there is a quite universal choice: N is the total size of the
    input data.

    No, N is usually the number of records to be sorted, not the size of
    the input. Also, in the discussion I was responding to, there were
    clearly two distinct axes relevant to the discussion.

    What is being asked about is the behavior as a function of N as N
    increases without bound. Of course, theoretically, as the number of
    records increases without bound, eventually the character strings
    being sorted will have to have duplicates. But long before that
    happens the index variable N will run out of bits. This property is
    well understood in theoretical computer science, not just in terms
    of how much time is used but how much storage is needed. In theory
    log N bits are needed just to hold the index pointers. It is
    customary though, outside of purely theoretical discussions, to
    ignore that and treat the size of an index or pointer variable as
    constant. In purely theoretical terms no sorting algorithm is
    O(N*log(N)), because just incrementing a pointer takes more than
    O(1) operations. Surely the discussions in Knuth's books take such
    things into consideration. On the practical side, which almost
    always covers discussions that take place in usenet newsgroups,
    these minor theoretical issues are ignored. Any actual computer in
    the physical universe will never have occasion to process more than
    2**512 records, due to the limitation of the number of elementary
    particles in the universe, so a 512-bit address (or index value)
    always suffices.

    So yes, in theory, the considerations around processing an enormous
    number of values are relevant. In the practical context of the
    discussion underway here, they aren't.

    It was you who insisted on asymptotic complexity, rejecting
    practical amendments...

    This statement isn't right. BigO was already part of the discussion
    when I joined in. Also, it is customary in discussion of sorting
    algorithms to use the metric of number of comparisons done, without
    regard to the size of the variables needed to hold the indices of
    the records being sorted.

    Hmm, this response echoes almost exactly what I wrote in <10ranaa$ihf$1@reader1.panix.com> on April 10th, also in
    response to Waldek. Did you see it?

    See Knuth chapter 5, on Sorting, in volume 3 of TAOCP.

    As for TAOCP, as much as I respect and admire Knuth, I don't
    think "The Art of Computer Programming" is a good reference for
    working programmers. At 391 pages long, Chapter Five occupies
    more than half of an ~750 page book, the examples are all in
    assembly language for his notional MIX computer, and the
    analysis he presents presumes a mathematics background that,
    frankly, most programmers neither have nor need. "See chapter
    5" is thus not useful as a reference, and rather comes across as
    an admonishment.

    A much more useful treatment is Cormen, Leiserson, Rivest, and
    Stein's, "Introduction to Algorithms." The treatment is much
    more accessible than TAOCP, while still rigorous. Disclaimer:
    I once served as an industry mentor for students taking one of
    Leiserson's classes, though I haven't worked with him closely.

    CLRS suggests that Aho, Hopcroft, and Ullman advocated for
    asymptotic analysis of algorithms in, "The Design and Analysis
    of Computer Algorithms". Again, it's a wonderful book, but I
    would argue it's more theoretical than most programmers would
    find comfortable. Disclaimer: Aho taught me compilers.

    Personally, I like Skiena's, "Algorithm Design Manual" and think
    that its treatment is among the best I've seen when it comes to
    threading the needle between rigorous attention to detail and
    accessibility, though one needs more mathematics to negotiate it
    than e.g., CLRS.

    - Dan C.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Thu Apr 16 10:23:23 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 07 Apr 2026 02:12:49 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    <snip>

    My conclusions...

    The right place to start on sorting algorithms is with bubble
    sort, not because it's a good algorithm, but because it's the
    easiest one to show, and what is more important it's a good way
    to teach about how algorithms evolve.

    If you write it in terms of moves rather than in terms of swaps,
    bubble sort is not easier than insertion and selection.

    A reason to use swap() is that swap is easier to understand
    than moves. Each swap() operation, all by itself, preserves
    the invariant that the array contains the same elements after
    the swap() is done that were there before. With code written
    using moves, several statements together need to be looked at
    to verify that they are doing the right thing. Furthermore,
    when used in conjunction with follows(), whenever we see the
    two together written like this

    if( follows(x,y,array) ) swap(x,y,array);

    we know that both (1) the array has the same elements as it did
    before, and (2) the "sortedness" of array elements has increased,
    or at least has remained unchanged. Compare that to an insertion
    sort written using moves, where a whole loop body (including a
    'break;' no less), plus a statement after, needs to be examined
    to verify that it is doing something sensible. Clearly less
    mental effort is needed to see that the one line written above is
    working than what is needed for the eight lines seen in the
    insertion sort shown in your (much) earlier posting.
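    Written in that style, a whole bubble sort is just that one line
    inside two index loops. A sketch, assuming an integer Element
    type and the obvious follows()/swap() definitions (none of which
    are spelled out in the thread):

```c
typedef int Element;

/* follows(x,y,a): true when a[x] should come after a[y],
   i.e. the pair is out of order. */
static int follows(unsigned x, unsigned y, Element a[]){
    return a[x] > a[y];
}

/* swap(x,y,a): exchange a[x] and a[y]; by itself it preserves
   the multiset of elements in the array. */
static void swap(unsigned x, unsigned y, Element a[]){
    Element t = a[x];
    a[x] = a[y];
    a[y] = t;
}

/* Plain bubble sort: every step is the one-line guarded swap,
   which never decreases the sortedness of the array. */
void bubble_sort(unsigned n, Element a[]){
    for( unsigned pass = 1; pass < n; pass++ )
        for( unsigned i = 1; i <= n - pass; i++ )
            if( follows(i-1, i, a) ) swap(i-1, i, a);
}
```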

    There is no reason to use the knuth version of bubble sort.
    Compared to ordinary bubble sort the upside is small and not
    worth the downside of needing more intricate code.

    The knuth version has at least one merit. You claim that in
    practice it's too rare that the merit is exercised. Maybe that
    is true. Or maybe cases of an almost sorted array, with
    misplaced elements coming in pairs in close proximity, are not
    that uncommon. I don't know. But the merit exists.

    I posted a performance comparison of several sorts including knuth
    bubble sort. It's true that knuth bubble sort is better than plain
    bubble sort. But it's still awful compared to the even simpler
    gnome sort.

    OTOH, 'ordinary bubble sort' has no merits at all relative to
    simple insertion sort.

    Simple bubble sort is easier to understand than insertion sort.

    For those who don't like bubble sort, there is gnome sort, which
    is even simpler than bubble sort, and actually has better
    performance. And, if people will forgive me, it's only a short
    leap to get to leaping gnome sort, which has better performance
    still.

    BTW, thank you for 'boustrophedon'. Not that there is a
    serious chance that I will remember the word tomorrow or even
    tonight, but hopefully I'll remember that such a word exists.

    It comes from Greek, meaning roughly "ox plow turning". Maybe
    that will help you remember; I think it does for me.
    --- Synchronet 3.21f-Linux NewsLink 1.2