• Command line globber/tokenizer library for C?

    From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Tue Sep 10 19:01:37 2024
    From Newsgroup: comp.lang.c

    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    hello -world "This is foo.*" foo.*

    becomes something like

    my_argv[0] "hello"
    my_argv[1] "-world"
    my_argv[2] "This is foo.*"
    my_argv[3] foo.h
    my_argv[4] foo.c
    my_argv[5] foo.txt

    my_argc = 6

    I could live without the globbing if that's a bridge too far.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Sep 10 20:58:36 2024
    From Newsgroup: comp.lang.c

    On 10 Sep 2024 19:01:37 GMT, Ted Nolan <tednolan> wrote:

    I have the case where my C program is handed a string which is basically
    a command line.

    If that’s what your OS is giving you, your OS is doing it wrong.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Sep 10 23:05:32 2024
    From Newsgroup: comp.lang.c

    On 10.09.2024 21:01, Ted Nolan <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    IIUC you don't want the shell to do the expansion but, sort of,
    re-invent the wheel in your application (a'la DOS). - Okay.


    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    I also suppose that by "tokenizing" you don't mean something like
    strtok (3) - extract tokens from strings
    but a field separation as the Unix shell does using 'IFS'.

    I don't know of a C library but if I'd want to implement a function
    that all POSIX shells do then I'd look into the shell packages...

    For Kornshell (e.g. version 93u+m) I see these files in the package
    src/lib/libast/include/glob.h
    src/lib/libast/misc/glob.c
    that obviously care about the globbing function. (I suspect you'll
    need some more supporting files from the ksh package.)

    HTH

    Janis


  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Sep 10 14:12:50 2024
    From Newsgroup: comp.lang.c

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On 10 Sep 2024 19:01:37 GMT, Ted Nolan <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    If that’s what your OS is giving you, your OS is doing it wrong.

    He didn't say the string is coming from the OS.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Sep 10 14:37:15 2024
    From Newsgroup: comp.lang.c

    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    hello -world "This is foo.*" foo.*

    becomes something like

    my_argv[0] "hello"
    my_argv[1] "-world"
    my_argv[2] "This is foo.*"
    my_argv[3] foo.h
    my_argv[4] foo.c
    my_argv[5] foo.txt

    my_argc = 6

    I could live without the globbing if that's a bridge too far.

    What environment(s) does this need to run in?

    I don't know of a standard(ish) function that does this. POSIX defines
    the glob() function, but it only does globbing, not word-splitting.
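
    To illustrate the glob() half on its own, here is a minimal sketch. The
    helper name print_matches is made up for this example, and GLOB_NOCHECK
    is chosen on the assumption that an unmatched pattern should be passed
    through unchanged, as most shells do.

```c
#include <glob.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print every path that `pattern` expands to.
 * Returns the number of words printed, or -1 if glob() fails. */
int print_matches(const char *pattern)
{
    glob_t g;

    /* GLOB_NOCHECK: if nothing matches, return the pattern itself. */
    if (glob(pattern, GLOB_NOCHECK, NULL, &g) != 0)
        return -1;

    for (size_t i = 0; i < g.gl_pathc; i++)
        printf("%s\n", g.gl_pathv[i]);

    int count = (int)g.gl_pathc;
    globfree(&g);
    return count;
}
```

    Note that glob() sorts its results by default, so the order may differ
    from the one shown in the question, and the word-splitting and quote
    handling still have to come from somewhere else.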

    If you're trying to emulate the way the shell (which one?) parses
    command lines, and *if* you're on a system that has a shell, you can
    invoke a shell to do the work for you. Here's a quick and dirty
    example:

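    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *line = "hello -world \"This is foo.*\" foo.*";
        /* 50 bytes of slack covers the fixed printf prefix and the NUL. */
        char *cmd = malloc(50 + strlen(line));
        if (cmd == NULL)
            return EXIT_FAILURE;
        /* The shell splits and globs the line; printf echoes one word per line. */
        sprintf(cmd, "printf '%%s\n' %s", line);
        system(cmd);
        free(cmd);
        return 0;
    }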

    This prints the arguments to stdout, one per line (and doesn't handle
    arguments with embedded newlines very well). You could modify the
    command to write the output to a temporary file and then read that file,
    or you could use popen() if it's available.
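
    A rough sketch of the popen() variant follows, assuming a POSIX system
    with /bin/sh. The helper name shell_split is invented for this example.
    It inherits the limits already mentioned (embedded newlines break it),
    and because the line is handed to a real shell, anything like $(...) in
    it will be executed, so it only suits trusted input.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: let the shell split and glob `line`, then collect
 * the words it prints. Returns the word count, or -1 on failure.
 * The caller frees each argv[i] and then argv itself. */
int shell_split(const char *line, char ***argv_out)
{
    static const char prefix[] = "printf '%s\\n' ";
    size_t len = sizeof prefix + strlen(line);   /* sizeof counts the NUL */
    char *cmd = malloc(len);
    if (cmd == NULL)
        return -1;
    snprintf(cmd, len, "%s%s", prefix, line);

    FILE *fp = popen(cmd, "r");
    free(cmd);
    if (fp == NULL)
        return -1;

    char **argv = NULL;
    int argc = 0;
    char *buf = NULL;
    size_t cap = 0;
    ssize_t n;

    while ((n = getline(&buf, &cap, fp)) != -1) {
        if (n > 0 && buf[n - 1] == '\n')
            buf[n - 1] = '\0';                   /* strip the separator */
        char **tmp = realloc(argv, (argc + 2) * sizeof *argv);
        if (tmp == NULL)
            break;
        argv = tmp;
        argv[argc] = strdup(buf);
        if (argv[argc] == NULL)
            break;
        argv[++argc] = NULL;                     /* keep the array terminated */
    }
    free(buf);
    pclose(fp);

    *argv_out = argv;
    return argc;
}
```

    The result is an argv-style, NULL-terminated array, so it can be handed
    straight to getopt() afterwards.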

    Of course this is portable only to systems that have a Unix-style shell,
    and it can even behave differently depending on how the default shell
    behaves. And invoking a new process is going to make this relatively
    slow, which may or may not matter depending on how many times you need
    to do it.

    There is no completely portable solution, since you need to be able to
    get directory listings to handle wildcards.

    A quick Google search points to this question:

    https://stackoverflow.com/q/21335041/827263
    "How to split a string using shell-like rules in C++?"

    An answer refers to Boost.Program_options, which is specific to C++.
    Apparently boost::program_options::split_unix() does what you're looking
    for.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Tue Sep 10 22:11:29 2024
    From Newsgroup: comp.lang.c

    In article <vbqcat$35kjh$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 10.09.2024 21:01, Ted Nolan <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    IIUC you don't want the shell to do the expansion but, sort of,
    re-invent the wheel in your application (a'la DOS). - Okay.


    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    I also suppose that by "tokenizing" you don't mean something like
    strtok (3) - extract tokens from strings
    but a field separation as the Unix shell does using 'IFS'.


    More or less, and something that understands double and single quoting
    so that a token can have white space inside. Backslash handling
    would be nice too so

    'Who\'s a good boy?'

    would work as one token.
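
    A minimal sketch of such a splitter, with no globbing, might look like
    the following. The name split_words and its interface are made up for
    illustration. It assumes a backslash escapes the next character
    everywhere, including inside single quotes, so that the example above
    yields one token; that is a deliberate departure from POSIX sh, where a
    backslash inside single quotes is literal.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `line` into whitespace-separated words,
 * honouring '...' and "..." quoting and backslash escapes.
 * Returns a NULL-terminated array (release with a single free()), or NULL
 * on allocation failure or an unterminated quote. */
char **split_words(const char *line, int *argc_out)
{
    size_t len = strlen(line);
    size_t maxw = len / 2 + 2;          /* upper bound on words, plus NULL */
    /* One block: the pointer array followed by the word storage. */
    char **argv = malloc(maxw * sizeof *argv + len + 1);
    if (argv == NULL)
        return NULL;

    char *out = (char *)(argv + maxw);
    const char *p = line;
    int argc = 0;

    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        argv[argc++] = out;
        char quote = '\0';
        while (*p != '\0' && (quote || !isspace((unsigned char)*p))) {
            if (*p == '\\' && p[1] != '\0') {
                *out++ = p[1];          /* escaped character, taken literally */
                p += 2;
            } else if (quote) {
                if (*p == quote)
                    quote = '\0';       /* closing quote */
                else
                    *out++ = *p;
                p++;
            } else if (*p == '"' || *p == '\'') {
                quote = *p++;           /* opening quote */
            } else {
                *out++ = *p++;
            }
        }
        if (quote) {                    /* ran off the end inside quotes */
            free(argv);
            return NULL;
        }
        *out++ = '\0';
    }

    argv[argc] = NULL;
    *argc_out = argc;
    return argv;
}
```

    The quoting information is discarded once the words are built, so a
    later globbing pass could not tell a quoted "foo.*" from a bare one;
    a fuller version would need to record which words were quoted.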

    I don't know of a C library but if I'd want to implement a function
    that all POSIX shells do then I'd look into the shell packages...

    For Kornshell (e.g. version 93u+m) I see these files in the package
    src/lib/libast/include/glob.h
    src/lib/libast/misc/glob.c
    that obviously care about the globbing function. (I suspect you'll
    need some more supporting files from the ksh package.)

    HTH

    Janis


    Thanks, fixing up something out of shell components is probably more
    than I want to take on here though.

    Ted
    --
    columbiaclosings.com
    What's not in Columbia anymore..
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Tue Sep 10 22:13:06 2024
    From Newsgroup: comp.lang.c

    In article <87ldzzyyus.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    hello -world "This is foo.*" foo.*

    becomes something like

    my_argv[0] "hello"
    my_argv[1] "-world"
    my_argv[2] "This is foo.*"
    my_argv[3] foo.h
    my_argv[4] foo.c
    my_argv[5] foo.txt

    my_argc = 6

    I could live without the globbing if that's a bridge too far.

    What environment(s) does this need to run in?

    I don't know of a standard(ish) function that does this. POSIX defines
    the glob() function, but it only does globbing, not word-splitting.

    If you're trying to emulate the way the shell (which one?) parses
    command lines, and *if* you're on a system that has a shell, you can
    invoke a shell to do the work for you. Here's a quick and dirty
    example:

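    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *line = "hello -world \"This is foo.*\" foo.*";
        /* 50 bytes of slack covers the fixed printf prefix and the NUL. */
        char *cmd = malloc(50 + strlen(line));
        if (cmd == NULL)
            return EXIT_FAILURE;
        /* The shell splits and globs the line; printf echoes one word per line. */
        sprintf(cmd, "printf '%%s\n' %s", line);
        system(cmd);
        free(cmd);
        return 0;
    }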

    This prints the arguments to stdout, one per line (and doesn't handle
    arguments with embedded newlines very well). You could modify the
    command to write the output to a temporary file and then read that file,
    or you could use popen() if it's available.

    Of course this is portable only to systems that have a Unix-style shell,
    and it can even behave differently depending on how the default shell
    behaves. And invoking a new process is going to make this relatively
    slow, which may or may not matter depending on how many times you need
    to do it.

    There is no completely portable solution, since you need to be able to
    get directory listings to handle wildcards.

    Yeah, that's the kind of thing I was hoping to avoid, and probably more
    than I want to get into, but thanks!


    --
    columbiaclosings.com
    What's not in Columbia anymore..
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Wed Sep 11 01:56:27 2024
    From Newsgroup: comp.lang.c

    In article <lkbjchFebk9U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    Have a look at wordexp(3).
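
    For reference, a minimal sketch of how wordexp() might be used here.
    The helper name expand_line is invented for the example. WRDE_NOCMD is
    passed on the assumption that command substitution should be rejected
    rather than run; tilde and variable expansion still take place, which
    goes slightly beyond what was asked for.

```c
#include <stddef.h>
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper: split and glob `line` the way a POSIX shell would,
 * print the resulting words, and return their count (-1 on error). */
int expand_line(const char *line)
{
    wordexp_t we;

    /* WRDE_NOCMD: treat $(...) and `...` as an error instead of running it. */
    int rc = wordexp(line, &we, WRDE_NOCMD);
    if (rc != 0) {
        if (rc == WRDE_NOSPACE)
            wordfree(&we);              /* a partial result may be allocated */
        return -1;
    }

    for (size_t i = 0; i < we.we_wordc; i++)
        printf("my_argv[%zu] \"%s\"\n", i, we.we_wordv[i]);

    int count = (int)we.we_wordc;
    wordfree(&we);
    return count;
}
```

    we.we_wordv is already a NULL-terminated argv-style array, so a real
    caller would keep it (and call wordfree() later) instead of printing.
    Unquoted shell metacharacters such as | or ; make wordexp() fail with
    WRDE_BADCHAR, so the input has to be a plain argument list.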
    --
    Trump has normalized hate.

    The media has normalized Trump.

  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Wed Sep 11 02:54:32 2024
    From Newsgroup: comp.lang.c

    In article <vbqtcb$1r0i3$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <lkbjchFebk9U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    Have a look at wordexp(3).


    Very interesting, thanks!

    Something added since last time I paged through section 3...
    --
    columbiaclosings.com
    What's not in Columbia anymore..
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Wed Sep 11 14:17:33 2024
    From Newsgroup: comp.lang.c

    Do you think it would make sense to switch the language ?

    #include <Windows.h>
    #include <iostream>
    #include <string_view>

    using namespace std;

    template<typename CharType, typename Consumer>
        requires requires( Consumer consumer, basic_string_view<CharType> sv ) { { consumer( sv ) }; }
    void Tokenize( basic_string_view<CharType> sv, Consumer consumer )
    {
        using sv_t = basic_string_view<CharType>;
        auto it = sv.begin();
        for( ; it != sv.end(); )
        {
            CharType end;
            typename sv_t::iterator tkBegin;
            if( *it == '\"' )
            {
                end = '\"';
                tkBegin = ++it;
            }
            else
            {
                end = ' ';
                tkBegin = it++;
            }
            for( ; it != sv.end() && *it != end; ++it );
            consumer( sv_t( tkBegin, it ) );
            if( it != sv.end() ) [[unlikely]]
            {
                while( ++it != sv.end() && *it == ' ' );
                continue;
            }
        }
    }

    int main()
    {
        LPWSTR pCmdLine = GetCommandLineW();
        size_t i = 1;
        Tokenize( wstring_view( pCmdLine ), [&]( wstring_view sv )
            {
                wcout << i++ << L": \"" << sv << L"\"" << endl;
            } );
    }
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Wed Sep 11 12:22:16 2024
    From Newsgroup: comp.lang.c

    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?


    No, not an option, thanks.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Wed Sep 11 14:28:33 2024
    From Newsgroup: comp.lang.c

    Am 11.09.2024 um 14:22 schrieb Ted Nolan <tednolan>:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?


    No, not an option, thanks.


    I could write a C-bridge for you.

  • From Bart@bc@freeuk.com to comp.lang.c on Wed Sep 11 13:42:00 2024
    From Newsgroup: comp.lang.c

    On 11/09/2024 13:17, Bonita Montero wrote:
    #include <Windows.h>
    #include <iostream>
    #include <string_view>

    using namespace std;

    template<typename CharType, typename Consumer>
        requires requires( Consumer consumer, basic_string_view<CharType> sv ) { { consumer( sv ) }; }
    void Tokenize( basic_string_view<CharType> sv, Consumer consumer )
    {
        using sv_t = basic_string_view<CharType>;
        auto it = sv.begin();
        for( ; it != sv.end(); )
        {
            CharType end;
            typename sv_t::iterator tkBegin;
            if( *it == '\"' )
            {
                end = '\"';
                tkBegin = ++it;
            }
            else
            {
                end = ' ';
                tkBegin = it++;
            }
            for( ; it != sv.end() && *it != end; ++it );
            consumer( sv_t( tkBegin, it ) );
            if( it != sv.end() ) [[unlikely]]
            {
                while( ++it != sv.end() && *it == ' ' );
                continue;
            }
        }
    }

    int main()
    {
        LPWSTR pCmdLine = GetCommandLineW();
        size_t i = 1;
        Tokenize( wstring_view( pCmdLine ), [&]( wstring_view sv )
            {
                wcout << i++ << L": \"" << sv << L"\"" << endl;
            } );
    }


    This doesn't do globbing (expanding non-quoted wildcard filenames into
    lists of individual filenames).

    Neither is it clear if the OP is on Windows. (Otherwise I can supply
    something in C for the globbing part. Chopping up the line into
    separate items is fairly trivial.)
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Wed Sep 11 12:44:19 2024
    From Newsgroup: comp.lang.c

    In article <vbs2da$3jobe$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 14:22 schrieb Ted Nolan <tednolan>:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?


    No, not an option, thanks.


    I could write a C-bridge for you.


    No, thank you.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Wed Sep 11 14:59:48 2024
    From Newsgroup: comp.lang.c

    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?
    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/IceCream
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Wed Sep 11 20:14:11 2024
    From Newsgroup: comp.lang.c

    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Wed Sep 11 18:17:04 2024
    From Newsgroup: comp.lang.c

    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    You know the rules around here, just as well as I do.
    --
    The coronavirus is the first thing, in his 74 pathetic years of existence,
    that the orange menace has come into contact with, that he couldn't browbeat, bully, bullshit, bribe, sue, legally harrass, get Daddy to fix, get his siblings to bail him out of, or, if all else fails, simply wish it away.
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Wed Sep 11 18:49:19 2024
    From Newsgroup: comp.lang.c

    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.


    Thanks, I appreciate that, but it does have to be C.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
  • From Bart@bc@freeuk.com to comp.lang.c on Wed Sep 11 21:19:58 2024
    From Newsgroup: comp.lang.c

    On 11/09/2024 19:14, Bonita Montero wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero  <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.


    C++ is a simpler language? You're having a laugh!

    I made a version of your code that was about 50 lines, so a higher line
    count, but was some 10% smaller in character count.

    It doesn't need 'templates', or 'basic-string-view', or 'Consumer',
    whatever that is, or iterators. This is a trivial exercise as I said.

    However, if working on Windows, there may be no need: there is already a
    CommandLineToArgvW function.

  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Sep 11 14:43:35 2024
    From Newsgroup: comp.lang.c

    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 04:22:08 2024
    From Newsgroup: comp.lang.c

    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.


  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Thu Sep 12 03:06:15 2024
    From Newsgroup: comp.lang.c

    In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?


    This would be for work, so I am limited in what I can say about it, but
    it has to be in C because it would be a C callout from a GT.M mumps
    process. GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can
    be passed to a callout. I would like to parse that string as the
    shell does a command line. Basically, if it isn't a C library that
    is commonly available through Linux package managers I probably can't
    use it. In the end this is a "nice to have" and I have a q&d approach
    that I will probably use.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Sep 11 20:37:06 2024
    From Newsgroup: comp.lang.c

    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?

    This would be for work, so I am limited in what I can say about it, but
    it has to be in C because it would be a C callout from a GT.M mumps
    process. GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can
    be passed to a callout. I would like to parse that string as the
    shell does a command line. Basically, if it isn't a C library that
    is commonly available through Linux package managers I probably can't
    use it. In the end this is a "nice to have" and I have a q&d approach
    that I will probably use.

    Since you mentioned Linux package managers, I presume this only needs to
    work on Linux-based systems, which means you can use POSIX-specific
    functions. That could have been useful to know earlier.

    And you might consider posting to comp.unix.programmer for more
    system-specific solutions.

    Earlier I suggested using system() to pass the string to the shell.
    That wouldn't work on Windows, but it should be ok for your
    requirements. There are good reasons not to want to do that, but "there
    might not be a POSIX shell available" apparently isn't one of them.

    I'd also suggest nailing down your exact requirements; "as the
    shell does" is inexact, since different shells behave differently.

    Suggested reading: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Thu Sep 12 03:56:09 2024
    From Newsgroup: comp.lang.c

    In article <87r09py23h.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?

    This would be for work, so I am limited in what I can say about it, but
    it has to be in C because it would be a C callout from a GT.M mumps
    process. GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can
    be passed to a callout. I would like to parse that string as the
    shell does a command line. Basically, if it isn't a C library that
    is commonly available through Linux package managers I probably can't
    use it. In the end this is a "nice to have" and I have a q&d approach
    that I will probably use.

    Since you mentioned Linux package managers, I presume this only needs to
    work on Linux-based systems, which means you can use POSIX-specific
    functions. That could have been useful to know earlier.

    And you might consider posting to comp.unix.programmer for more
    system-specific solutions.

    Earlier I suggested using system() to pass the string to the shell.
    That wouldn't work on Windows, but it should be ok for your
    requirements. There are good reasons not to want to do that, but "there
    might not be a POSIX shell available" apparently isn't one of them.

    I'd also suggest nailing down your exact requirements; "as the
    shell does" is inexact, since different shells behave differently.

    Suggested reading:
    https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    Thank you. system() would not work as I don't want to execute
    anything, just parse into an argv-like array.

    I appreciate the responses, but it looks like I will be staying with
    my q&d approach for now.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Sep 12 04:14:05 2024
    From Newsgroup: comp.lang.c

    On 12 Sep 2024 03:06:15 GMT, Ted Nolan <tednolan> wrote:

    GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can be passed to a callout.

    What, all the arguments smooshed together into a single string?

    That’s a dumb way to do it.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Ben Bacarisse@ben@bsb.me.uk to comp.lang.c on Thu Sep 12 10:43:52 2024
    From Newsgroup: comp.lang.c

    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:

    In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?


    This would be for work, so I am limited in what I can say about it, but
    it has to be in C because it would be a C callout from a GT.M mumps
    process. GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can
    be passed to a callout. I would like to parse that string as the
    shell does a command line. Basically, if it isn't a C library that
    is commonly available through Linux package managers I probably can't
    use it. In the end this is a "nice to have" and I have a q&d approach
    that I will probably use.

    If it were down to me I'd do the word splitting "by hand" and use POSIX
    glob(3) to do the file expansion.

    For the word splitting, the key would be to know where these strings
    come from and what is really needed. That would enable you to pick a
    syntax that makes sense for your particular use-case. For example, if
    the strings are typed by people, I wouldn't use the typical shell
    quoting. I would not want anyone (other than technical Unix users) to
    have to type

    'He said "you can'"'""t"

    You might get away with a very simple word splitting algorithm.
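
    As a rough illustration of the glob(3) half of that suggestion, here is a
    minimal sketch, assuming a POSIX system. The helper name expand_word is
    hypothetical. GLOB_NOCHECK makes glob() hand back the pattern itself when
    nothing matches, which is what most shells do by default. Words that came
    from inside quotes would have to bypass this step.

    ```c
    #include <assert.h>
    #include <glob.h>
    #include <stdio.h>

    /* Expand one unquoted word with POSIX glob(3) and print each result.
     * Returns the number of resulting words, or -1 if glob() fails.
     * expand_word is a hypothetical helper name, not a library function. */
    int expand_word(const char *word)
    {
        glob_t g;
        int n;

        assert(word != NULL);

        /* GLOB_NOCHECK: if no file matches, return the pattern unchanged. */
        if (glob(word, GLOB_NOCHECK, NULL, &g) != 0) {
            globfree(&g);
            return -1;
        }

        n = (int)g.gl_pathc;
        for (size_t i = 0; i < g.gl_pathc; i++)
            printf("%s\n", g.gl_pathv[i]);

        globfree(&g);
        return n;
    }
    ```

    Called on "foo.*" in a directory holding foo.c, foo.h and foo.txt, this
    would print those three names; called on a word with no match, it prints
    the word as given.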
    --
    Ben.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.c on Thu Sep 12 12:29:26 2024
    From Newsgroup: comp.lang.c

    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Thu Sep 12 12:13:55 2024
    From Newsgroup: comp.lang.c

    In article <vbujak$733i$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    And, of course, totally off-topic.

    Maybe I should start posting Fortran "solutions".

    Or maybe Haskell?

    Or Intercal?
    --
    Mike Huckabee has yet to consciously uncouple from Josh Duggar.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Sep 12 14:20:01 2024
    From Newsgroup: comp.lang.c

    On 12.09.2024 13:29, Bart wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    In the given context of C and C++ I've always perceived the features
    of C++ to add to comprehensibility of source code where the respective
    C code required writing clumsy code and needed (unnecessary) syntactic
    ballast to implement similar functions and program constructs.

    Your undifferentiated complaint sounds more like someone not willing
    to understand the other concepts or have a reluctance or laziness to
    make yourself familiar with them.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Sep 12 14:24:37 2024
    From Newsgroup: comp.lang.c

    On 12.09.2024 14:13, Kenny McCormack wrote:

    Maybe I should start posting Fortran "solutions".

    Or maybe Haskell?

    Or Intercal?

    The latter might certainly be enlightening. I have always had trouble
    writing such code. And seeing functional code would help. - But
    it's off-topic as you say. Less off-topic are (IMO) C++ solutions
    in contrast to C; C++ has a C base and C appears to me to advance
    "with an eye on" C++.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Thu Sep 12 13:20:14 2024
    From Newsgroup: comp.lang.c

    In article <vbumi6$8ipp$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 12.09.2024 14:13, Kenny McCormack wrote:

    Maybe I should start posting Fortran "solutions".

    Or maybe Haskell?

    Or Intercal?

    The latter might certainly be enlightening. I have always had trouble
    writing such code. And seeing functional code would help. - But
    it's off-topic as you say. Less off-topic are (IMO) C++ solutions
    in contrast to C; C++ has a C base and C appears to me to advance
    "with an eye on" C++.

    It's not me saying this. I am just repeating the CLC party line.

    Ask Leader Keith. He'll tell you.

    It has always been CLC policy that C++ is just as off-topic as Fortran or
    C# or any other language (other than C, of course). And, of course, that
    being "off topic" is the highest and most unforgivable sin.

    Just ask Leader Keith. He'll tell you.

    Leader Keith will tell you that we are not here to solve problems or to
    discuss programming techniques. We are here to debate minutiae of the
    various standards documents.
    --
    Elect a clown, expect a circus.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Thu Sep 12 13:22:48 2024
    From Newsgroup: comp.lang.c

    In article <lkf72pFd61U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    ...
    Thank you. system() would not work as I don't want to execute
    anything, just parse into an argv-like array.

    I appreciate the responses, but it looks like I will be staying with
    my q&d approach for now.

    This is a "solved problem". Or, to put it another way, if wordexp(3) is
    not the solution, then there is no general solution (and that means, yes, you'll have to "roll your own", as many here have suggested you do).
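
    For reference, a minimal sketch of what a wordexp(3) call looks like,
    assuming a POSIX libc. The wrapper name print_words is hypothetical.
    WRDE_NOCMD makes wordexp() refuse command substitution, so nothing gets
    executed. Note that wordexp() still performs tilde and $VARIABLE expansion,
    which the OP said he does not need, so this may or may not fit.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <wordexp.h>

    /* Split a command-line-like string with POSIX wordexp(3) and print the
     * resulting words. Returns the word count, or -1 on any wordexp() error.
     * print_words is a hypothetical wrapper name. */
    int print_words(const char *line)
    {
        wordexp_t w;
        int rc, n;

        assert(line != NULL);

        /* WRDE_NOCMD: fail instead of running $(...) or `...`. */
        rc = wordexp(line, &w, WRDE_NOCMD);
        if (rc != 0) {
            if (rc == WRDE_NOSPACE)
                wordfree(&w);   /* partial results may have been allocated */
            return -1;
        }

        n = (int)w.we_wordc;
        for (size_t i = 0; i < w.we_wordc; i++)
            printf("my_argv[%zu] \"%s\"\n", i, w.we_wordv[i]);

        wordfree(&w);
        return n;
    }
    ```

    Unquoted words such as foo.* are pathname-expanded against the current
    directory, while the quoted "This is foo.*" stays a single word.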

    columbiaclosings.com
    What's not in Columbia anymore..

    Which Columbia are we talking about here? And why?
    --
    Mike Huckabee has yet to consciously uncouple from Josh Duggar.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.c on Thu Sep 12 14:44:03 2024
    From Newsgroup: comp.lang.c

    On 12/09/2024 13:20, Janis Papanagnou wrote:
    On 12.09.2024 13:29, Bart wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    In the given context of C and C++ I've always perceived the features
    of C++ to add to comprehensibility of source code where the respective
    C code required writing clumsy code and needed (unnecessary) syntactic ballast to implement similar functions and program constructs.

    Your undifferentiated complaint sounds more like someone not willing
    to understand the other concepts or have a reluctance or laziness to
    make yourself familiar with them.

    I'm saying it's not necessary to use such advanced features to do some
    trivial parsing.

    I've given a C solution below. (To test outside of Windows, remove
    windows.h and set cmdline to any string containing a test input or use a
    local function to get the program's command line as one string.)

    It uses no special features. Anybody can understand such code. Anybody
    can port it to another language far more easily than the C++. (Actually
    I wrote it first in my language then ported it to C. I only needed to do
    1- to 0-based conversion.)

    There are two things missing compared with the C++ (other than it uses
    UTF8 strings):

    * Individual parameters are capped in length (to 1023 chars here). This
    can be solved by determining only the span of the item then working from
    that.

    * Handling an unknown number of parameters is not automatic:

    For the latter, the example uses a fixed array size. For a dynamic array
    size, call 'strtoargs' with a count of 0 to first determine the number
    of args, then allocate an array and call again to populate it.


    -------------------------------------------
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    /* Split cmd into items separated by spaces/tabs; "..." groups an item.
       Stores copies of at most count items in dest.
       Returns the total number of items found. */
    int strtoargs(char* cmd, char** dest, int count) {
        enum {ilen=1024};
        char item[ilen];
        int n=0, length, c;
        char *p=cmd, *q, *end=&item[ilen-1];

        while (c=*p++) {
            if (c==' ' || c=='\t')
                continue;
            else if (c=='"') {              /* quoted item */
                length=0;
                q=item;

                while (c=*p++, c!='"') {
                    if (c==0) {
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;
                    }
                }
                goto store;
            } else {                        /* bare item */
                length=0;
                q=item;
                --p;

                while (c=*p++, c!=' ' && c!='\t') {
                    if (c==0) {
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;
                    }
                }

    store:      *q=0;
                ++n;
                if (n<=count) dest[n-1]=strdup(item);
            }
        }
        return n;
    }

    int main(void) {
        char* cmdline;
        enum {cap=30};
        char* args[cap];
        int n;

        cmdline = GetCommandLineA();        /* Windows-only */

        n=strtoargs(cmdline, args, cap);

        for (int i=0; i<n; ++i) {
            if (i<cap)
                printf("%d %s\n", i, args[i]);
            else
                printf("%d <overflow>\n", i);
        }
    }
    -------------------------------------------
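
    The two remedies described before the listing (work from the span of each
    item, and call once with a count of 0 to size the array) could look
    roughly like the sketch below. It is not the code posted above: the names
    split_args and split_all are hypothetical, and the splitting rules are
    kept the same (blanks separate items, double quotes group them, nothing
    else is special).

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Span-based splitter (hypothetical): locate where each item starts and
     * ends, then copy exactly that many bytes, so there is no per-item cap.
     * Stores at most count items in dest and returns the total found, or -1
     * if an allocation fails. With count == 0, dest may be NULL. */
    int split_args(const char *cmd, char **dest, int count)
    {
        int n = 0;
        const char *p = cmd;

        assert(cmd != NULL);

        while (*p) {
            const char *start, *end;

            if (*p == ' ' || *p == '\t') { p++; continue; }

            if (*p == '"') {                /* quoted item: up to closing quote */
                start = ++p;
                while (*p && *p != '"') p++;
                end = p;
                if (*p == '"') p++;         /* step over the closing quote */
            } else {                        /* bare item: up to the next blank */
                start = p;
                while (*p && *p != ' ' && *p != '\t') p++;
                end = p;
            }

            if (n < count) {
                size_t len = (size_t)(end - start);
                char *copy = malloc(len + 1);
                if (copy == NULL) return -1;
                memcpy(copy, start, len);
                copy[len] = '\0';
                dest[n] = copy;
            }
            n++;
        }
        return n;
    }

    /* Two-pass use: count first, then allocate and fill.
     * Returns a NULL-terminated array the caller must free, or NULL. */
    char **split_all(const char *cmd, int *argc_out)
    {
        int n = split_args(cmd, NULL, 0);
        char **argv = calloc((size_t)n + 1, sizeof *argv);

        if (argv == NULL) return NULL;
        if (split_args(cmd, argv, n) < 0) {
            for (int i = 0; i < n; i++) free(argv[i]);
            free(argv);
            return NULL;
        }
        *argc_out = n;
        return argv;
    }
    ```

    The input is scanned twice, which is cheap for command-line-sized
    strings and avoids growing the array with realloc.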

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.c on Thu Sep 12 13:50:38 2024
    From Newsgroup: comp.lang.c

    In article <vbupv8$1t2d8$2@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <lkf72pFd61U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    ...
    Thank you. system() would not work as I don't want to execute
    anything, just parse into an argv-like array.

    I appreciate the responses, but it looks like I will be staying with
    my q&d approach for now.

    This is a "solved problem". Or, to put it another way, if wordexp(3) is
    not the solution, then there is no general solution (and that means, yes,
    you'll have to "roll your own", as many here have suggested you do).

    columbiaclosings.com
    What's not in Columbia anymore..

    Which Columbia are we talking about here? And why?


    SC. It keeps me busy.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 15:58:44 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 13:29 schrieb Bart:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    I don't use the newest features as an end in themselves. I use these
    features because they make sense. E.g. I use a lot of functional
    programming to avoid returning vectors of values. Instead I hand the
    data directly to a callback. That's more efficient and more convenient.
    And I use concepts to get meaningful errors when type properties aren't
    met. Maybe you think it's better to live with the errors from inside
    a templated function; I think the errors which say which part of a
    concept isn't met are more readable.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 16:00:07 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 14:13 schrieb Kenny McCormack:
    In article <vbujak$733i$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    And, of course, totally off-topic.

    Maybe I should start posting Fortran "solutions".

    Or maybe Haskell?
    Or Intercal?

    Maybe Rust, that would fit like C++ because it's also a systems
    programming language with some capabilities like C or C++. Haskell
    would share far fewer properties.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 16:01:17 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 15:20 schrieb Kenny McCormack:

    It has always been CLC policy that C++ is just as off-topic as Fortran or
    C# or any other language (other than C, of course). And, of course, that being "off topic" is the highest and most unforgivable sin.

    A switch to C++ is much more likely than to Fortran.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 16:04:12 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 14:20 schrieb Janis Papanagnou:

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    C++ shares a property with C: the language facilities are mostly so
    simple that it's easy to roughly imagine the resulting code. So C++
    can be written with the same mindset.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Thu Sep 12 14:07:46 2024
    From Newsgroup: comp.lang.c

    In article <vbus74$9k96$3@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 12.09.2024 um 15:20 schrieb Kenny McCormack:

    It has always been CLC policy that C++ is just as off-topic as Fortran or
    C# or any other language (other than C, of course). And, of course, that
    being "off topic" is the highest and most unforgivable sin.

    A switch to C++ is much more likely than to Fortran.

    Doesn't matter. I'm talking policy, not personal feelings.
    --
    "Only a genius could lose a billion dollars running a casino."
    "You know what they say: the house always loses."
    "When life gives you lemons, don't pay taxes."
    "Grab 'em by the p***y!"
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 16:14:41 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 16:07 schrieb Kenny McCormack:
    In article <vbus74$9k96$3@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 12.09.2024 um 15:20 schrieb Kenny McCormack:

    It has always been CLC policy that C++ is just as off-topic as Fortran or
    C# or any other language (other than C, of course). And, of course, that
    being "off topic" is the highest and most unforgivable sin.

    A switch to C++ is much more likely than to Fortran.

    Doesn't matter. I'm talking policy, not personal feelings.

    C and C++ are programmed with the same mindset.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.c on Thu Sep 12 15:16:02 2024
    From Newsgroup: comp.lang.c

    On 12/09/2024 14:44, Bart wrote:
    On 12/09/2024 13:20, Janis Papanagnou wrote:
    On 12.09.2024 13:29, Bart wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    In the given context of C and C++ I've always perceived the features
    of C++ to add to comprehensibility of source code where the respective
    C code required writing clumsy code and needed (unnecessary) syntactic
    ballast to implement similar functions and program constructs.

    Your undifferentiated complaint sounds more like someone not willing
    to understand the other concepts or have a reluctance or laziness to
    make yourself familiar with them.

    I'm saying it's not necessary to use such advanced features to do some trivial parsing.

    I've given a C solution below.

    BTW here are the sources sizes for the tokeniser function. (For C++ I've included the 'using' statement.)

           Spaces   Hard tabs

    C++     829       682       characters
    C       959       634
    M       785       548       (My original of the C version)

    So my C version is actually smaller than the C++ when using hard tabs.

    In any case, the C++ is not significantly smaller than the C, and
    certainly not a fifth the size.

    For proper higher level solutions in different languages, below is one
    of mine. That function is 107 bytes with hard tabs.

    (It's not possible to just split the string on white space because of
    quoted items with embedded spaces.)

    -------------------------------
    func strtoargs(cmdline)=
        args::=()
        sreadln(cmdline)

        while k:=sread("n") do
            args &:= k
        od
        args
    end

    println strtoargs(getcommandlinea())


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 16:23:58 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 16:16 schrieb Bart:

    So my C version is actually smaller than the C++ when using hard tabs.

    Did you really do your own parsing ? And your own filename-expansion ?


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 17:08:24 2024
    From Newsgroup: comp.lang.c

    Am 10.09.2024 um 21:01 schrieb Ted Nolan <tednolan>:

    I have the case where my C program is handed a string which is basically
    a command line.

    I tried to experiment with that with /proc/<pid>/cmdline. The first
    problem was that the arguments aren't space-delimited, but separated
    by zero bytes. The second problem was that the cmdline file doesn't
    contain the original command line, but the one with file names already
    expanded.
    This was my code so far:

    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <algorithm>
    #include <unistd.h>

    using namespace std;

    int main()
    {
        pid_t pid = getpid();
        string cmdLineFile( (ostringstream() << "/proc/" << pid << "/cmdline").str() );
        ifstream ifs;
        ifs.exceptions( ifstream::failbit | ifstream::badbit );
        ifs.open( cmdLineFile );
        string fullCmdLine;
        ifs >> fullCmdLine;
        ifs.close();
        replace( fullCmdLine.begin(), fullCmdLine.end(), (char)0, (char)' ' );
        cout << fullCmdLine << endl;
    }
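
    Since the group is about C, here is a rough C sketch of the same
    experiment, assuming Linux and its /proc file system. The helper names
    split_nul and read_cmdline are hypothetical. It keeps the zero-byte
    separators instead of turning them into spaces, because the separators are
    the only reliable argument boundaries (an argument may itself contain
    spaces).

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Walk a buffer of NUL-terminated strings laid end to end (the layout
     * Linux uses for /proc/<pid>/cmdline) and record a pointer to each one.
     * Stores at most cap pointers in out; returns the number of complete
     * strings found. An unterminated tail is ignored. */
    int split_nul(char *buf, size_t len, char **out, int cap)
    {
        int n = 0;
        size_t i = 0;

        assert(buf != NULL);

        while (i < len) {
            char *nul = memchr(buf + i, '\0', len - i);
            if (nul == NULL)
                break;                      /* no terminator: stop here */
            if (n < cap)
                out[n] = buf + i;
            n++;
            i = (size_t)(nul - buf) + 1;    /* step past this string's NUL */
        }
        return n;
    }

    /* Read this process's own argument block (Linux-specific).
     * Returns the number of bytes read, 0 on failure. */
    size_t read_cmdline(char *buf, size_t cap)
    {
        FILE *f = fopen("/proc/self/cmdline", "rb");
        size_t got;

        if (f == NULL)
            return 0;
        got = fread(buf, 1, cap, f);
        fclose(f);
        return got;
    }
    ```

    As noted above, what comes back is the already-expanded argument list, so
    this does not recover the command line as it was typed.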
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Sep 12 18:16:25 2024
    From Newsgroup: comp.lang.c

    On Thu, 12 Sep 2024 14:44:03 +0100
    Bart <bc@freeuk.com> wrote:

    On 12/09/2024 13:20, Janis Papanagnou wrote:
    On 12.09.2024 13:29, Bart wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as
    in C.

    In this case, it actually needed somewhat more code, even if the
    line
    count was half.

    But your solutions are always incomprehensible because they strive
    for the most advanced features possible.

    I don't know of the other poster's solutions. But a quick browse
    seems to show nothing incomprehensible or anything that should be
    difficult to understand. (YMMV; especially if you're not familiar
    with C++ then I'm sure the code may look like noise to you.)

    In the given context of C and C++ I've always perceived the features
    of C++ to add to comprehensibility of source code where the
    respective C code required writing clumsy code and needed
    (unnecessary) syntactic ballast to implement similar functions and
    program constructs.

    Your undifferentiated complaint sounds more like someone not willing
    to understand the other concepts or have a reluctance or laziness to
    make yourself familiar with them.

    I'm saying it's not necessary to use such advanced features to do
    some trivial parsing.

    I've given a C solution below. (To test outside of Windows, remove
    windows.h and set cmdline to any string containing a test input or
    use a local function to get the program's command line as one string.)

    It uses no special features. Anybody can understand such code.
    Anybody can port it to another language far more easily than the C++.
    (Actually I wrote it first in my language then ported it to C. I only
    needed to do 1- to 0-based conversion.)

    There are two things missing compared with the C++ (other than it
    uses UTF8 strings):

    * Individual parameters are capped in length (to 1023 chars here).
    This can be solved by determining only the span of the item then
    working from that.

    * Handling an unknown number of parameters is not automatic:

    For the latter, the example uses a fixed array size. For a dynamic
    array size, call 'strtoargs' with a count of 0 to first determine the
    number of args, then allocate an array and call again to populate it.


    -------------------------------------------
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    /* Split cmd into items separated by spaces/tabs; "..." groups an item.
       Stores copies of at most count items in dest.
       Returns the total number of items found. */
    int strtoargs(char* cmd, char** dest, int count) {
        enum {ilen=1024};
        char item[ilen];
        int n=0, length, c;
        char *p=cmd, *q, *end=&item[ilen-1];

        while (c=*p++) {
            if (c==' ' || c=='\t')
                continue;
            else if (c=='"') {              /* quoted item */
                length=0;
                q=item;

                while (c=*p++, c!='"') {
                    if (c==0) {
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;
                    }
                }
                goto store;
            } else {                        /* bare item */
                length=0;
                q=item;
                --p;

                while (c=*p++, c!=' ' && c!='\t') {
                    if (c==0) {
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;
                    }
                }

    store:      *q=0;
                ++n;
                if (n<=count) dest[n-1]=strdup(item);
            }
        }
        return n;
    }

    int main(void) {
        char* cmdline;
        enum {cap=30};
        char* args[cap];
        int n;

        cmdline = GetCommandLineA();        /* Windows-only */

        n=strtoargs(cmdline, args, cap);

        for (int i=0; i<n; ++i) {
            if (i<cap)
                printf("%d %s\n", i, args[i]);
            else
                printf("%d <overflow>\n", i);
        }
    }
    -------------------------------------------


    Apart from the unnecessary ilen limit, the unnecessary goto into a block
    (I have nothing against forward gotos out of blocks, but gotos into
    blocks make me nervous) and the variable 'length' that serves no purpose,
    your code simply does not fulfill the requirements of the OP.
    I can immediately see two gotchas: no handling of escaped double
    quotation marks \" and no handling of single quotation marks. Quite
    possibly there are additional omissions.
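
    To make the two gotchas concrete, here is a rough sketch of a splitter
    that handles them. The name shell_split is hypothetical, and the rules are
    only an approximation of POSIX shell quoting: single quotes are fully
    literal, double quotes recognise only \" and \\ as escapes, and outside
    quotes a backslash makes the next character literal. No expansion of any
    kind is attempted.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Quote-aware word splitter (hypothetical sketch).
     * Stores up to cap malloc'd words in argv and returns the number of
     * words found, or -1 on an unterminated quote or allocation failure
     * (words already stored are freed in that case). */
    int shell_split(const char *s, char **argv, int cap)
    {
        int n = 0;
        char *buf;

        assert(s != NULL);
        buf = malloc(strlen(s) + 1);    /* a word never outgrows the input */
        if (buf == NULL) return -1;

        while (*s) {
            size_t len = 0;

            while (*s == ' ' || *s == '\t') s++;
            if (*s == '\0') break;

            while (*s && *s != ' ' && *s != '\t') {
                if (*s == '\'') {                       /* '...' is literal */
                    s++;
                    while (*s && *s != '\'') buf[len++] = *s++;
                    if (*s != '\'') goto fail;
                    s++;
                } else if (*s == '"') {                 /* "..." keeps blanks */
                    s++;
                    while (*s && *s != '"') {
                        if (*s == '\\' && (s[1] == '"' || s[1] == '\\')) s++;
                        buf[len++] = *s++;
                    }
                    if (*s != '"') goto fail;
                    s++;
                } else if (*s == '\\' && s[1] != '\0') { /* \x outside quotes */
                    s++;
                    buf[len++] = *s++;
                } else {
                    buf[len++] = *s++;
                }
            }

            if (n < cap) {
                char *w = malloc(len + 1);
                if (w == NULL) goto fail;
                memcpy(w, buf, len);
                w[len] = '\0';
                argv[n] = w;
            }
            n++;
        }
        free(buf);
        return n;

    fail:
        for (int i = 0; i < n && i < cap; i++) free(argv[i]);
        free(buf);
        return -1;
    }
    ```

    Adjacent quoted pieces join into one word, so an input like
    'He said "you can'"'""t" comes out as a single argument.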
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Thu Sep 12 15:23:02 2024
    From Newsgroup: comp.lang.c

    In article <vbv04v$aci0$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 10.09.2024 um 21:01 schrieb Ted Nolan <tednolan>:

    I have the case where my C program is handed a string which is basically
    a command line.

    I tried to experiment with that with /proc/<pid>/cmdline. The first
    problem was that the arguments aren't space delimited, but broken up
    with zeroes. The second problem was the cmdline-file doesn't contain
    the original commandline but with expanded files.

    More OT b***s***.

    Take it somewhere else.
    --
    Q: How much do dead batteries cost?

    A: Nothing. They are free of charge.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 17:24:16 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 17:23 schrieb Kenny McCormack:
    In article <vbv04v$aci0$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 10.09.2024 um 21:01 schrieb Ted Nolan <tednolan>:

    I have the case where my C program is handed a string which is basically
    a command line.

    I tried to experiment with that with /proc/<pid>/cmdline. The first
    problem was that the arguments aren't space delimited, but broken up
    with zeroes. The second problem was the cmdline-file doesn't contain
    the original commandline but with expanded files.

    More OT b***s***.

    The problem would be the same in C.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Sep 12 17:30:26 2024
    From Newsgroup: comp.lang.c

    On 12.09.2024 16:04, Bonita Montero wrote:
    Am 12.09.2024 um 14:20 schrieb Janis Papanagnou:

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    C++ shares a property with C: the language facilities are mostly so
    simple that it's easy to roughly imagine the resulting code. So C++
    can be written with the same mindset.

    Not only "roughly imagine"; I think the imperative languages have
    so many common basic concepts that you can have a quite good idea,
    especially if you know more than just two or three such languages.

    But there are features, even basic ones, that do not exist in
    "C", which evidently confuses folks who are focused on some
    specific restricted or poorer language(s).

    Yes, C++ can be written with a "C" mindset. But this is nothing
    I'd suggest. Better make yourself familiar with the new concepts
    (OO, genericity, or even simple things like references). - IMO.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Thu Sep 12 15:37:33 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:
    On Thu, 12 Sep 2024 14:44:03 +0100
    Bart <bc@freeuk.com> wrote:

    On 12/09/2024 13:20, Janis Papanagnou wrote:

    * Handling an unknown number of parameters is not automatic:

    For the latter, the example uses a fixed array size. For a dynamic
    array size, call 'strtoargs' with a count of 0 to first determine the
    number of args, then allocate an array and call again to populate it.


    -------------------------------------------
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int strtoargs(char* cmd, char** dest, int count) {
        enum {ilen=1024};
        char item[ilen];
        int n=0, length, c;
        char *p=cmd, *q, *end=&item[ilen-1];

        while (c=*p++) {
            if (c==' ' || c=='\t')
                continue;
            else if (c=='"') {
                length=0;
                q=item;

                while (c=*p++, c!='"') {
                    if (c==0) {
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;
                    }
                }
                goto store;
            } else {
                length=0;
                q=item;
                --p;

                while (c=*p++, c!=' ' && c!='\t') {
                    if (c==0) {
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;
                    }
                }

                store: *q=0;
                ++n;
                if (n<=count) dest[n-1]=strdup(item);
            }
        }
        return n;
    }

    int main(void) {
        char* cmdline;
        enum {cap=30};
        char* args[cap];
        int n;

        cmdline = GetCommandLineA();

        n=strtoargs(cmdline, args, cap);

        for (int i=0; i<n; ++i) {
            if (i<cap)
                printf("%d %s\n", i, args[i]);
            else
                printf("%d <overflow>\n", i);
        }
    }
    -------------------------------------------


    Apart from unnecessary ilen limit, of unnecessary goto into block (I
    have nothing against forward gotos out of blocks, but gotos into blocks
    make me nervous) and of variable 'length' that serves no purpose, your
    code simply does not fulfill requirements of OP.
    I can immediately see two gotchas: no handling of escaped double
    quotation marks \" and no handling of single quotation marks. Quite
    possibly there are additional omissions.

    /*
     * For most commands, we'll split the rest of the line into
     * individual arguments, separated by whitespace. However,
     * some commands may wish to process the entire remainder of
     * the line as a single argument. Those commands will set the
     * ce_splitargs field to zero in the command table.
     */
    if (cep->ce_splitargs) {
        argcount = 0;
        cp = line;
        while (*cp != '\0') {
            if (argcount == MAX_ARGCOUNT) {
                fprintf(stdout,
                        "Error: More than %d arguments unsupported\n",
                        MAX_ARGCOUNT);
                return 1;
            }
            while (*cp != '\0' && isspace(*cp)) cp++;
            if (*cp == '\0') continue;
            if (*cp == '"') {
                in_quote = true;
                cp++;
            }
            arglist[argcount++] = cp;
            if (in_quote) {
                while (*cp != '\0' && *cp != '"') cp++;
                in_quote = false;
            } else {
                while (*cp != '\0' && !isspace(*cp)) cp++;
            }
            if (*cp == '\0') continue;
            *cp++ = '\0';
        }
    } else {
        arglist[0] = command;
        arglist[1] = line;
        argcount = 2;
    }
    --- Synchronet 3.20a-Linux NewsLink 1.114
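
    The two gotchas named in the critique quoted above (escaped double
    quotes and single quotes) and the count-first, allocate-second usage
    described for 'strtoargs' can be sketched together as follows. This is
    only an illustration: the names split_args, scan_token and
    args_from_string are hypothetical, and the backslash rule is a
    simplification (a backslash escapes any following character except
    inside single quotes), not the exact POSIX shell rule. No globbing is
    attempted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Scan one token starting at *pp (must not point at whitespace or NUL).
 * If out is non-NULL the unquoted characters are written there.
 * Returns the token length without the terminator and advances *pp. */
static size_t scan_token(const char **pp, char *out) {
    const char *p = *pp;
    size_t n = 0;
    char quote = 0;                       /* 0, '\'' or '"' */

    for (; *p != '\0'; ++p) {
        char c = *p;
        if (quote == 0 && (c == ' ' || c == '\t'))
            break;                        /* unquoted whitespace ends the token */
        if (quote == 0 && (c == '\'' || c == '"')) {
            quote = c;                    /* opening quote: not part of the token */
            continue;
        }
        if (quote != 0 && c == quote) {
            quote = 0;                    /* closing quote */
            continue;
        }
        if (c == '\\' && quote != '\'' && p[1] != '\0')
            c = *++p;                     /* simplified escape: take next char literally */
        if (out) out[n] = c;
        ++n;
    }
    if (out) out[n] = '\0';
    *pp = p;
    return n;
}

/* Store at most cap tokens in dest; return the total number of tokens
 * found, or -1 on allocation failure. Call with cap == 0 just to count. */
int split_args(const char *cmd, char **dest, int cap) {
    int n = 0;
    const char *p = cmd;
    for (;;) {
        while (*p == ' ' || *p == '\t') ++p;
        if (*p == '\0') break;
        if (n < cap) {
            const char *q = p;
            size_t len = scan_token(&q, NULL);    /* first measure ... */
            dest[n] = malloc(len + 1);
            if (dest[n] == NULL) {
                while (n > 0) free(dest[--n]);
                return -1;
            }
            scan_token(&p, dest[n]);              /* ... then copy */
        } else {
            scan_token(&p, NULL);                 /* count only */
        }
        ++n;
    }
    return n;
}

/* Two-pass use: count, allocate an exact-size NULL-terminated vector, fill. */
char **args_from_string(const char *cmd, int *argc_out) {
    assert(cmd != NULL && argc_out != NULL);
    int n = split_args(cmd, NULL, 0);
    char **argv = calloc((size_t)n + 1, sizeof *argv);
    if (argv == NULL) return NULL;
    if (split_args(cmd, argv, n) != n) { free(argv); return NULL; }
    *argc_out = n;
    return argv;                          /* argv[n] is NULL, as with main() */
}
```

    The caller owns the result and frees each argv[i] and then argv itself.
    An unterminated quote is silently closed at the end of the string here;
    a stricter version would report it.
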
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Sep 12 17:40:17 2024
    From Newsgroup: comp.lang.c

    On 12.09.2024 16:14, Bonita Montero wrote:

    C and C++ are programmed with the same mindset.

    Careful! This is depending on the background and experiences of the
    respective programmer(s). If you come, say, from Simula you'd most
    likely have another (OOP) perspective than if you'd come from "C".

    Programming C++ with only a "C" mindset I'd not consider advisable.
    That's what I've generally observed; with sole knowledge of X there
    seems to be an impetus and preference to infer those techniques to
    programming in Y. A lot of early C++ programs I've seen were just,
    umm, "enhanced" "C" programs.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Sep 12 17:46:44 2024
    From Newsgroup: comp.lang.c

    On 12.09.2024 16:16, Bart wrote:
              Spaces    Hard tabs

    C++       829       682         characters

    You are counting spaces, tabs and characters to characterize programs'
    quality or legibility or what? - Abandon all hope ye who enter here.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 17:47:23 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 17:30 schrieb Janis Papanagnou:

    Not only "roughly imagine"; I think the imperative languages have
    so many common basic concepts that you can have a quite good idea,
    especially if you know more than just two or three such languages.

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1
    like they fit into memory (platform-dependent).

    Yes, C++ can be written with a "C" mindset. But this is nothing
    I'd suggest. Better make yourself familiar with the new concepts
    (OO, genericity, or even simple things like references). - IMO.

    I'm using mostly all new features as you can see from my code.
    But the mindset is still the same.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 17:48:25 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 17:40 schrieb Janis Papanagnou:

    Programming C++ with only a "C" mindset I'd not consider advisable.
    That's what I've generally observed; with sole knowledge of X there
    seems to be an impetus and preference to infer those techniques to
    programming in Y. A lot of early C++ programs I've seen were just,
    umm, "enhanced" "C" programs.

    I'm using most new language facilities, but the mindset is still the same.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Sep 12 18:49:11 2024
    From Newsgroup: comp.lang.c

    On Thu, 12 Sep 2024 15:37:33 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    <snip code from unidentified source>

    Huh?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Sep 12 17:56:58 2024
    From Newsgroup: comp.lang.c

    On 12.09.2024 17:47, Bonita Montero wrote:
    Am 12.09.2024 um 17:30 schrieb Janis Papanagnou:

    Not only "roughly imagine"; I think the imperative languages have
    so many common basic concepts that you can have a quite good idea,
    especially if you know more than just two or three such languages.

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1
    like they fit into memory (platform-dependent).

    Don't know what you're trying to say here or what it is you aim
    at. If you think it's worth discussing please elaborate.


    Yes, C++ can be written with a "C" mindset. But this is nothing
    I'd suggest. Better make yourself familiar with the new concepts
    (OO, genericity, or even simple things like references). - IMO.

    I'm using mostly all new features as you can see from my code.
    But the mindset is still the same.

    I don't know you or your background or much of your programming.
    So please understand that I'm not inclined to make any comments
    about you or your code; this would be all speculative and not
    contribute anything to the discussion. If you had the impression
    that what I said was referring to you personally you were wrong.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Sep 12 17:59:01 2024
    From Newsgroup: comp.lang.c

    On 12.09.2024 17:48, Bonita Montero wrote:
    Am 12.09.2024 um 17:40 schrieb Janis Papanagnou:

    Programming C++ with only a "C" mindset I'd not consider advisable.
    That's what I've generally observed; with sole knowledge of X there
    seems to be an impetus and preference to infer those techniques to
    programming in Y. A lot of early C++ programs I've seen were just,
    umm, "enhanced" "C" programs.

    I'm using most new language facilities, but the mindset is still the same.

    You already said that in your previous posting. See my reply in my
    response to that post.

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 18:09:43 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 17:56 schrieb Janis Papanagnou:

    On 12.09.2024 17:47, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1
    like they fit into memory (platform-dependent).

    Don't know what you're trying to say here or what it is you aim
    at. If you think it's worth discussing please elaborate.

    For 95% of C++'s language facilities it's easy to imagine which
    computation steps are processed at the ISA level. For C that's also easy
    to answer. In both languages it's easy to lay out data structures as they
    could be hex-dumped with a debugger. The combination of both features may
    not hold true for a lot of other languages.


    I'm using mostly all new features as you can see from my code.
    But the mindset is still the same.

    I don't know you or your background or much of your programming.
    So please understand that I'm not inclined to make any comments
    about you or your code; this would be all speculative and not
    contribute anything to the discussion. If you had the impression
    that what I said was referring to you personally you were wrong.

    I just wanted to say that the kind of thinking is the same in C
    and C++.

    --- Synchronet 3.20a-Linux NewsLink 1.114
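
    The claim above about laying out data structures the way they would
    appear in a hex dump can be illustrated with a small C11 sketch. The
    struct record_header and the function header_from_bytes are
    hypothetical names, and the offsets asserted are what common ABIs
    produce for these fixed-width types; the C standard itself leaves
    padding to the implementation, which is why the checks are spelled out
    instead of assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte header: 4-byte magic, 2-byte version, 2-byte flags. */
struct record_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
};

/* Compile-time checks that the in-memory layout matches the intended one. */
_Static_assert(sizeof(struct record_header) == 8, "unexpected padding");
_Static_assert(offsetof(struct record_header, magic) == 0, "magic offset");
_Static_assert(offsetof(struct record_header, version) == 4, "version offset");
_Static_assert(offsetof(struct record_header, flags) == 6, "flags offset");

/* Reinterpret 8 raw bytes as a header. memcpy avoids alignment and
 * aliasing problems; the byte order is that of the host. */
struct record_header header_from_bytes(const unsigned char *bytes) {
    struct record_header h;
    assert(bytes != NULL);
    memcpy(&h, bytes, sizeof h);
    return h;
}
```

    If one of the static assertions fails on some platform, the build stops
    there, which is usually preferable to silently reading fields from the
    wrong offsets.
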
  • From Bart@bc@freeuk.com to comp.lang.c on Thu Sep 12 17:15:35 2024
    From Newsgroup: comp.lang.c

    On 12/09/2024 16:46, Janis Papanagnou wrote:
    On 12.09.2024 16:16, Bart wrote:
              Spaces    Hard tabs

    C++       829       682         characters

    You are counting spaces, tabs and characters to characterize programs'
    quality or legibility or what? - Abandon all hope ye who enter here.

    I'm counting the number of characters needed to express the function.
    Since one of BM's claims is that the C++ example was smaller than C.

    The difference between the two columns is whether indentation uses hard
    tabs or spaces. The C version is more deeply indented so that makes a
    difference. (Also the width of the tabs, but everything was measured
    with tabs set to 4 characters.)

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 18:26:48 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 18:15 schrieb Bart:

    I'm counting the number of characters needed to express the function.
    Since one of BM's claims is that the C++ example was smaller than C.

    That was a general statement about C++ and not about my code. I'm using
    abstractions like iterators for the benefit of bounds-checking while
    debugging, but the code is similar.
    But usually you write a fifth of the code or less in C++. Look at how much
    work a simple vector<T>::emplace_back() saves over manually relocating a
    complex vector-like data structure in C. Or consider how convenient a map
    or unordered_map is compared with, for example, something like libavl.
    This stupid character-counting from a simple example shows that Bart
    has no professional C++ experience.

    The difference between the two columns is whether indentation uses hard
    tabs or spaces. The C version is more deeply indented so that makes a
    difference. (Also the width of the tabs, but everything was measured
    with tabs set to 4 characters.)

    That's as ridiculous as Bart's discussion.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.c on Thu Sep 12 17:28:27 2024
    From Newsgroup: comp.lang.c

    On 12/09/2024 16:16, Michael S wrote:
    On Thu, 12 Sep 2024 14:44:03 +0100
    Bart <bc@freeuk.com> wrote:

    Apart from unnecessary ilen limit, of unnecessary goto into block (I
    have nothing against forward gotos out of blocks, but gotos into blocks
    make me nervous) and of variable 'length' that serves no purpose, your
    code simply does not fulfill requirements of OP.
    I can immediately see two gotchas: no handling of escaped double
    quotation marks \" and no handling of single quotation marks. Quite
    possibly there are additional omissions.

    BM's C++ version doesn't handle embedded quotes or single quotes either.
    Nor does it expand wildcards into sequences of filename arguments.

    But you're right about 'length' which in the end was not used. It makes
    the C version even smaller without it.

    I wasn't trying to match the OP's requirements, as I don't know what
    they are.

    If this has to exactly match how the OS parses the command line into
    separate parameters, then that's likely to be a significantly more
    complex program, especially if it is to run on Linux.

    There's probably no point in trying to create such a program; you'd need
    to find a way of utilising the OS to do the work.

    Note that I wasn't posting to solve the OP's problem, but as a
    counter-example to that C++ code which literally hurt my eyes to look at.


    --- Synchronet 3.20a-Linux NewsLink 1.114
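
    On POSIX systems, one way of "utilising the OS to do the work" is
    wordexp() from <wordexp.h>, which performs shell-style word splitting,
    quote removal and pathname (glob) expansion on a string. The sketch
    below is a minimal illustration with hypothetical wrapper names
    (expand_line, print_words). Note the caveats: wordexp() also performs
    tilde and $VARIABLE expansion, which the original question did not ask
    for, it rejects unquoted characters such as | & ; < >, and it is not
    available on Windows.

```c
#include <assert.h>
#include <stdio.h>
#include <wordexp.h>

/* Split `line` into words the way a POSIX shell would.
 * Returns 0 on success; the caller must then release `out` with wordfree().
 * WRDE_NOCMD turns command substitution such as $(...) into an error
 * instead of executing it. */
int expand_line(const char *line, wordexp_t *out) {
    assert(line != NULL && out != NULL);
    return wordexp(line, out, WRDE_NOCMD);
}

/* Print the words in the my_argv[i] style used in the original question. */
void print_words(const wordexp_t *we) {
    for (size_t i = 0; i < we->we_wordc; ++i)
        printf("my_argv[%zu] \"%s\"\n", i, we->we_wordv[i]);
    printf("my_argc = %zu\n", we->we_wordc);
}
```

    With a line such as hello -world "This is foo.*" foo.* the quoted
    argument stays one word, while the unquoted foo.* is replaced by the
    matching file names in the current directory (or left as-is if nothing
    matches). Where only globbing of an already-split word is wanted,
    POSIX glob() is the narrower tool.
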
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Sep 12 19:02:35 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 18:28 schrieb Bart:

    BM's C++ version doesn't handle embedded quotes or single quotes either.
    Nor does it expand wildcards into sequences of filename arguments.

    Ok, that must be impossible with C++.
    I just wanted to show how to do it basically and what are the
    advantages: no intermediate data structure through functional
    programming and debug iterators.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Thu Sep 12 17:39:46 2024
    From Newsgroup: comp.lang.c

    In article <vbv6r1$bhc9$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 12.09.2024 um 18:28 schrieb Bart:

    BM's C++ version doesn't handle embedded quotes or single quotes either.
    Nor does it expand wildcards into sequences of filename arguments.

    Ok, that must be impossible with C++.
    I just wanted to show how to do it basically and what are the
    advantages: no intermediate data structure through functional
    programming and debug iterators.

    All of which would have been fine - and they'd probably all be raving
    about what a clever boy you are - if you'd only posted it to an
    appropriate newsgroup.
    --
    Many (most?) Trump voters voted for him because they thought if they
    supported Trump enough, they'd get to *be* Trump.

    Similarly, Trump believes that if *he* praises Putin enough, he'll get to *be* Putin.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Sep 12 22:38:28 2024
    From Newsgroup: comp.lang.c

    On Thu, 12 Sep 2024 19:02:35 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 18:28 schrieb Bart:

    BM's C++ version doesn't handle embedded quotes or single quotes
    either.
    Nor does it expand wildcards into sequences of filename arguments.

    Ok, that must be impossible with C++.
    I just wanted to show how to do it basically and what are the
    advantages: no intermediate data structure through functional
    programming and debug iterators.



    Callback is as easy in C as in C++.
    Debug iterators are not needed in such a simple program. At least, I
    don't need them.
    Here is an equivalent of your parser written in C. It does not look 5
    times longer.

    Attention! That is an equivalent of Bonita's code, no more and
    hopefully no less. The routine does not fulfill requirements of OP!

    #include <stddef.h>

    void parse(const char* src,
               void (*OnToken)(const char* beg, size_t len, void* context),
               void* context) {
        char c0 = ' ', c1 = '\t';
        const char* beg = 0;
        for (;;src++) {
            char c = *src;
            if (c == c0 || c == c1 || c == 0) {
                if (beg) {
                    OnToken(beg, src-beg, context);
                    c0 = ' ', c1 = '\t';
                    beg = 0;
                }
                if (c == 0)
                    break;
            } else if (!beg) {
                beg = src;
                if (c == '"') {
                    c0 = c1 = c;
                    ++beg;
                }
            }
        }
    }

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Sep 12 22:09:52 2024
    From Newsgroup: comp.lang.c

    On Thu, 12 Sep 2024 17:08:24 +0200, Bonita Montero wrote:

    I tried to experiment with that with /proc/<pid>/cmdline. The first
    problem was that the arguments aren't space delimited, but broken up
    with zeroes.

    That’s not a “problem”: it actually simplifies the parsing, because you
    can unambiguously extract the original command arguments without having
    to apply any complicated parsing/quoting/unquoting rules.
    --- Synchronet 3.20a-Linux NewsLink 1.114
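
    To make the point about zero-delimited arguments concrete, here is a
    minimal sketch that splits such a buffer into an argument vector. It
    assumes the bytes have already been read into memory (for example from
    /proc/<pid>/cmdline, which is Linux-specific); the name split_nul is
    hypothetical. No quoting rules are needed because each argument is
    simply a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a buffer holding consecutive NUL-terminated strings.
 * Pointers into buf are stored in argv (nothing is copied), at most cap
 * of them. Returns the number of strings found, which may exceed cap.
 * Trailing bytes without a terminating NUL are ignored. */
size_t split_nul(const char *buf, size_t len, const char **argv, size_t cap) {
    size_t n = 0;
    const char *p = buf;
    const char *end = buf + len;

    assert(buf != NULL);
    while (p < end) {
        const char *z = memchr(p, '\0', (size_t)(end - p));
        if (z == NULL)
            break;
        if (n < cap)
            argv[n] = p;
        ++n;
        p = z + 1;              /* next argument starts after the NUL */
    }
    return n;
}
```

    As noted earlier in the thread, what such a buffer holds is the
    argument list after the shell has done its expansion, so the original
    command line text cannot be recovered from it.
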
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Sep 12 22:32:10 2024
    From Newsgroup: comp.lang.c

    On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:

    A lot of early C++ programs I've seen were just, umm, "enhanced" "C" programs.

    Given that C++ makes “virtual” optional instead of standard behaviour, I’d
    say that C++ is in fact designed to be used that way.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Thu Sep 12 18:50:11 2024
    From Newsgroup: comp.lang.c

    On 9/12/24 18:32, Lawrence D'Oliveiro wrote:
    On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:

    A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
    programs.

    Given that C++ makes “virtual” optional instead of standard behaviour, I’d
    say that C++ is in fact designed to be used that way.

    Like many other aspects of C++, that was dictated by a necessity of
    maintaining a certain minimum level of backwards compatibility with
    existing C code. You shouldn't draw any larger conclusions from that
    choice.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Sep 13 01:37:08 2024
    From Newsgroup: comp.lang.c

    On Thu, 12 Sep 2024 18:50:11 -0400, James Kuyper wrote:

    On 9/12/24 18:32, Lawrence D'Oliveiro wrote:

    On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:

    A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
    programs.

    Given that C++ makes “virtual” optional instead of standard behaviour,
    I’d say that C++ is in fact designed to be used that way.

    Like many other aspects of C++, that was dictated by a necessity of
    maintaining a certain minimum level of backwards compatibility with
    existing C code.

    No it wasn’t. OO was an entirely new feature, with no counterpart in C, so there was nothing to maintain “backwards compatibility” with.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Sep 13 04:06:58 2024
    From Newsgroup: comp.lang.c

    On 13.09.2024 00:32, Lawrence D'Oliveiro wrote:
    On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:

    A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
    programs.

    Given that C++ makes “virtual” optional instead of standard behaviour, I’d
    say that C++ is in fact designed to be used that way.

    There's different semantics with and without a 'virtual' specification.

    Even if you want polymorphism (and have to use 'virtual') there's no
    need to define it as _default_ (and "disable" it where unnecessary).
    A language designer is free to have it explicitly specified for that.
    Given that C++ has its paragon in Simula it's not surprising that it
    has been defined in a similar way; to explicitly declare it virtual.

    (Other OO languages may have that differently designed. But I won't
    engage in any "real" OO languages have 'virtual' defined as default
    sort of discussions. - It is fine for me as it is in Simula or C++.
    If you want to program in OO paradigm just specify 'virtual'.[*])

    Janis

    [*] Note: The Simula compiler I use nowadays is Cim. And Cim has a
    serious bug where specifying 'virtual' does not work as specified;
    it's actually ineffective. I discovered an alternative non-standard
    syntax specification that makes polymorphism work in Cim. (Search
    the Web/Usenet or just ask or mail me if interested in the details.)

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Sep 13 02:43:56 2024
    From Newsgroup: comp.lang.c

    On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1 like they
    fit into memory (platform-dependent).

    Python.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Sep 13 02:58:07 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 04:06:58 +0200, Janis Papanagnou wrote:

    On 13.09.2024 00:32, Lawrence D'Oliveiro wrote:

    On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:

    A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
    programs.

    Given that C++ makes “virtual” optional instead of standard behaviour,
    I’d say that C++ is in fact designed to be used that way.

    There's different semantics with and without a 'virtual' specification.

    Precisely. And consider what the meaning of a non-virtual destructor is:
    it is essentially always the wrong thing to do.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Sep 13 07:27:47 2024
    From Newsgroup: comp.lang.c

    Am 13.09.2024 um 04:43 schrieb Lawrence D'Oliveiro:
    On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1 like they
    fit into memory (platform-dependent).

    Python.

    lol

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Sep 13 07:28:34 2024
    From Newsgroup: comp.lang.c

    Am 12.09.2024 um 21:38 schrieb Michael S:

    Callback is as easy in C as in C++.

    Absolutely not because callbacks can't have state in C.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Sep 13 06:49:20 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 07:27:47 +0200, Bonita Montero wrote:

    Am 13.09.2024 um 04:43 schrieb Lawrence D'Oliveiro:

    On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1 like
    they fit into memory (platform-dependent).

    Python.

    Have a look at <https://gitlab.com/ldo/inotipy_examples/-/blob/master/fanotify_7_example?ref_type=heads>,
    and compare the C original from <https://manpages.debian.org/7/fanotify.7.en.html>. The Python code is
    half the size and can use high-level async calls.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Sep 13 11:30:29 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 01:37:08 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Thu, 12 Sep 2024 18:50:11 -0400, James Kuyper wrote:

    On 9/12/24 18:32, Lawrence D'Oliveiro wrote:

    On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:

    A lot of early C++ programs I've seen were just, umm, "enhanced"
    "C" programs.

    Given that C++ makes “virtual” optional instead of standard
    behaviour, I’d say that C++ is in fact designed to be used that
    way.

    Like many other aspects of C++, that was dictated by a necessity of
    maintaining a certain minimum level of backwards compatibility with
    existing C code.

    No it wasn’t. OO was an entirely new feature, with no counterpart in
    C, so there was nothing to maintain “backwards compatibility” with.

    Agreed.
    Method syntax was entirely new with no backward compatibility
    restrictions.
    BTW, in this sort of discussion I'd rather avoid non-concrete words
    like "OO".
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Sep 13 11:38:15 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 07:28:34 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 21:38 schrieb Michael S:

    Callback is as easy in C as in C++.

    Absolutely not because callbacks can't have state in C.



    So what is 'context' parameter in my code?

    --- Synchronet 3.20a-Linux NewsLink 1.114
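
    How the 'context' parameter carries state can be shown with a short
    sketch. The parse() function is the one posted earlier in the thread
    (only a cast was added); struct token_stats, tally and measure are
    hypothetical names introduced for illustration. The callback keeps no
    globals: everything it accumulates lives in the object that 'context'
    points to.

```c
#include <assert.h>
#include <stddef.h>

/* Parser from the thread: calls OnToken once per whitespace-separated or
 * double-quoted token, passing the caller's context pointer through. */
void parse(const char *src,
           void (*OnToken)(const char *beg, size_t len, void *context),
           void *context) {
    char c0 = ' ', c1 = '\t';
    const char *beg = 0;
    for (;; src++) {
        char c = *src;
        if (c == c0 || c == c1 || c == 0) {
            if (beg) {
                OnToken(beg, (size_t)(src - beg), context);
                c0 = ' ', c1 = '\t';
                beg = 0;
            }
            if (c == 0)
                break;
        } else if (!beg) {
            beg = src;
            if (c == '"') {
                c0 = c1 = c;    /* inside quotes only '"' ends the token */
                ++beg;
            }
        }
    }
}

/* Hypothetical per-call state owned by the caller. */
struct token_stats {
    size_t count;     /* number of tokens seen */
    size_t longest;   /* length of the longest token */
};

/* Callback: recovers its state from the context pointer. */
static void tally(const char *beg, size_t len, void *context) {
    struct token_stats *st = context;
    (void)beg;
    st->count++;
    if (len > st->longest)
        st->longest = len;
}

struct token_stats measure(const char *line) {
    struct token_stats st = { 0, 0 };
    assert(line != NULL);
    parse(line, tally, &st);
    return st;
}
```

    Two calls to measure() can run with independent state, which is the
    property usually meant by a "stateful" callback; what C lacks is only
    the implicit capture syntax.
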
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Sep 13 11:49:35 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 06:49:20 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Fri, 13 Sep 2024 07:27:47 +0200, Bonita Montero wrote:

    Am 13.09.2024 um 04:43 schrieb Lawrence D'Oliveiro:

    On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1 like
    they fit into memory (platform-dependent).

    Python.

    Have a look at <https://gitlab.com/ldo/inotipy_examples/-/blob/master/fanotify_7_example?ref_type=heads>,
    and compare the C original from <https://manpages.debian.org/7/fanotify.7.en.html>. The Python code is
    half the size and can use high-level async calls.

    What exactly does your response have to do with producing data
    structures with a predefined layout?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Sep 13 14:12:32 2024
    From Newsgroup: comp.lang.c

    Am 13.09.2024 um 10:38 schrieb Michael S:
    On Fri, 13 Sep 2024 07:28:34 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 21:38 schrieb Michael S:

    Callback is as easy in C as in C++.

    Absolutely not because callbacks can't have state in C.



    So what is 'context' parameter in my code?

    In C++ the state is its own internal "this"-like object and you don't
    need any explicit parameters. Just a [&] and the lambda refers to the
    whole outer context.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Sep 13 15:25:00 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 14:12:32 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 13.09.2024 um 10:38 schrieb Michael S:
    On Fri, 13 Sep 2024 07:28:34 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 21:38 schrieb Michael S:

    Callback is as easy in C as in C++.

    Absolutely not because callbacks can't have state in C.



    So what is 'context' parameter in my code?

    In C++ the state is its own internal "this"-like object and you don't
    need any explicit parameters.

    So, do you admit that callback in C can have state?

    Just a [&] and the lambda refers to the whole outer context.

    Bad software engineering practice that easily leads to incomprehensible
    code.
    When in C++ and not in the mood for C-style, I very much prefer functors.
    Ideologically they are the same as a C-style context, but a little
    sugared syntactically.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Sep 13 14:31:06 2024
    From Newsgroup: comp.lang.c

    On 13.09.2024 04:58, Lawrence D'Oliveiro wrote:
    On Fri, 13 Sep 2024 04:06:58 +0200, Janis Papanagnou wrote:

    On 13.09.2024 00:32, Lawrence D'Oliveiro wrote:

    On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:

    A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
    programs.

    Given that C++ makes “virtual” optional instead of standard behaviour,
    I’d say that C++ is in fact designed to be used that way.

    There's different semantics with and without a 'virtual' specification.

    Precisely. And consider what the meaning of a non-virtual destructor is:
    it is essentially always the wrong thing to do.

    I've used both design patterns depending on what I intended,
    so I cannot say that one would be "wrong" in any way.

    (Upthread I seem to have rightly sensed that this might lead
    to a "right/wrong" ("real" OO) sort of discussion. I abstain.)

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Sep 13 15:20:25 2024
    From Newsgroup: comp.lang.c

    Am 13.09.2024 um 14:25 schrieb Michael S:
    On Fri, 13 Sep 2024 14:12:32 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 13.09.2024 um 10:38 schrieb Michael S:
    On Fri, 13 Sep 2024 07:28:34 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 21:38 schrieb Michael S:

    Callback is as easy in C as in C++.

    Absolutely not because callbacks can't have state in C.



    So what is 'context' parameter in my code?

    In C++ the state is its own internal "this"-like object and you don't
    need any explicit parameters.

    So, do you admit that callback in C can have state?

    No, because this is a parameter and a lambda is a glue of an
    object and a calling operator. The object is the state and a
    C function-pointer misses that.

    Bad software engineering practice that easily leads to incomprehensible
    code.

    I'm using this convention with nearly every lambda, and if I can't
    remember later which outer variables are used I remove the &; the
    locals that are then no longer captured get underlined in red, and
    once I've noticed which locals were used I press ^Z.

    When in C++ and not in the mood for C-style, I very much prefer functors.
    Ideologically they are the same as a C-style context, but a little
    sugared syntactically.

    No, a C++ functor may be an object with a calling operator. In C you
    don't have the implicit object; that's magnitudes less convenient. C
    is always multiple times more work.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Fri Sep 13 09:05:04 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    #include <stddef.h>

    void parse(const char* src,
               void (*OnToken)(const char* beg, size_t len, void* context),
               void* context) {
        char c0 = ' ', c1 = '\t';
        const char* beg = 0;
        for (;;src++) {
            char c = *src;
            if (c == c0 || c == c1 || c == 0) {
                if (beg) {
                    OnToken(beg, src-beg, context);
                    c0 = ' ', c1 = '\t';
                    beg = 0;
                }
                if (c == 0)
                    break;
            } else if (!beg) {
                beg = src;
                if (c == '"') {
                    c0 = c1 = c;
                    ++beg;
                }
            }
        }
    }

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
        char c = *s;

        return
          is_space(c) ? words_do( s+1, go ) :
          c           ? collect_word( s, s, 1, go ) :
          /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
        char c = *s;

        return
          c == 0           ? go->f( go, r, s ), w :
          is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
          /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
        return c == ' ' || c == '\t';
    }
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Sep 13 22:04:43 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly does your response have to do with producing data structures
    with predefined layout?

    Look at those structures: they have a specific predefined layout.
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Sep 13 22:24:00 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 14:12:32 +0200, Bonita Montero wrote:

    In C++ the state is an own internal "this"-like object and you don't
    need any explicit parameters.

    But you need a calling convention that passes “this” explicitly.
  • From Bart@bc@freeuk.com to comp.lang.c on Fri Sep 13 23:48:38 2024
    From Newsgroup: comp.lang.c

    On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:
    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly does your response have to do with producing data structures
    with predefined layout?

    Look at those structures: they have a specific predefined layout.

    Look at them where? One link is a man-page with several C structs
    defined (triple-spaced for some reason).

    But I can't see anything in the Python link that looks like it might be
    defining a struct layout.

    So I would also question what it has to do with it.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Sep 14 01:42:05 2024
    From Newsgroup: comp.lang.c

    Am 14.09.2024 um 00:24 schrieb Lawrence D'Oliveiro:
    On Fri, 13 Sep 2024 14:12:32 +0200, Bonita Montero wrote:

    In C++ the state is an own internal "this"-like object and you don't
    need any explicit parameters.

    But you need a calling convention that passes “this” explicitly.

    That's not part of the C++-language.

  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Sep 14 01:41:02 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 23:48:38 +0100, Bart wrote:

    On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:

    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly does your response have to do with producing data structures
    with predefined layout?

    Look at those structures: they have a specific predefined layout.

    One link is a man-page with several C structs defined ...

    Correct. Structures that the Python wrapper is able to map exactly.

    And the choice between which particular structure variants to use is
    dynamic, dependent on the event type. So the Python wrapper is able to
    dynamically generate a suitable type-safe wrapper -- something that a
    statically-typed language cannot do.
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Sep 14 01:41:32 2024
    From Newsgroup: comp.lang.c

    On Sat, 14 Sep 2024 01:42:05 +0200, Bonita Montero wrote:

    Am 14.09.2024 um 00:24 schrieb Lawrence D'Oliveiro:

    On Fri, 13 Sep 2024 14:12:32 +0200, Bonita Montero wrote:

    In C++ the state is an own internal "this"-like object and you don't
    need any explicit parameters.

    But you need a calling convention that passes “this” explicitly.

    That's not part of the C++-language.

    If the implementation doesn’t do it, it doesn’t work.
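
    A rough C rendering of the point about "this": a member call can be
    modelled as an ordinary function whose first parameter is the object
    pointer. The names accumulator, accumulator_add and demo_accumulate
    are hypothetical, and how a given C++ ABI actually passes the pointer
    (register or stack) is implementation-specific and not shown here.

```c
/* State that a C++ member function would reach through "this". */
struct accumulator {
    long total;
};

/* The object pointer is an explicit, ordinary first parameter. */
long accumulator_add(struct accumulator *self, long n)
{
    self->total += n;
    return self->total;
}

/* Usage: the caller supplies the object on every call. */
long demo_accumulate(void)
{
    struct accumulator a = { 0 };
    accumulator_add(&a, 2);
    return accumulator_add(&a, 40);
}
```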
  • From Bart@bc@freeuk.com to comp.lang.c on Sat Sep 14 10:58:34 2024
    From Newsgroup: comp.lang.c

    On 14/09/2024 02:41, Lawrence D'Oliveiro wrote:
    On Fri, 13 Sep 2024 23:48:38 +0100, Bart wrote:

    On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:

    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly does your response have to do with producing data structures
    with predefined layout?

    Look at those structures: they have a specific predefined layout.

    One link is a man-page with several C structs defined ...

    Correct. Structures that the Python wrapper is able to map exactly.

    And the choice between which particular structure variants to use is
    dynamic, dependent on the event type. So the Python wrapper is able to
    dynamically generate a suitable type-safe wrapper -- something that a
    statically-typed language cannot do.

    So, where IS the struct defined in that Python code? Which line number?

    If it is defined elsewhere, then that Python proves nothing.

    For example, where is this struct:

    struct fanotify_event_info_header {
        __u8 info_type;
        __u8 pad;
        __u16 len;
    };

    defined in that Python? I have an idea how this might be done using the
    ctypes module for example, but it's not pretty. However I'm not even
    seeing that.
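
    For reference, the C-side layout of that header can be stated and
    checked at compile time. This is a sketch only: uint8_t and uint16_t
    stand in for the kernel's __u8 and __u16, the names info_header and
    header_len are made up for illustration, and it says nothing about
    how the Python wrapper maps the struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct fanotify_event_info_header (assumed equivalent). */
struct info_header {
    uint8_t  info_type;
    uint8_t  pad;
    uint16_t len;
};

/* Compile-time checks of the layout the code relies on (C11). */
_Static_assert(offsetof(struct info_header, info_type) == 0, "info_type at 0");
_Static_assert(offsetof(struct info_header, pad) == 1, "pad at 1");
_Static_assert(offsetof(struct info_header, len) == 2, "len at 2");
_Static_assert(sizeof(struct info_header) == 4, "no trailing padding");

/* Read the len field from a raw byte buffer in native byte order.
   memcpy avoids alignment and aliasing problems with a direct cast. */
uint16_t header_len(const unsigned char *buf)
{
    struct info_header h;
    memcpy(&h, buf, sizeof h);
    return h.len;
}
```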


  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Sep 14 22:37:49 2024
    From Newsgroup: comp.lang.c

    On Sat, 14 Sep 2024 10:58:34 +0100, Bart wrote:

    So, where IS the struct defined in that Python code?

    In the API wrapper module, of course.

    <https://gitlab.com/ldo/inotipy>
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Sep 15 12:22:11 2024
    From Newsgroup: comp.lang.c

    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    #include <stddef.h>

    void parse(const char* src,
    void (*OnToken)(const char* beg, size_t len, void* context),
    void* context) {
    char c0 = ' ', c1 = '\t';
    const char* beg = 0;
    for (;;src++) {
    char c = *src;
    if (c == c0 || c == c1 || c == 0) {
    if (beg) {
    OnToken(beg, src-beg, context);
    c0 = ' ', c1 = '\t';
    beg = 0;
    }
    if (c == 0)
    break;
    } else if (!beg) {
    beg = src;
    if (c == '"') {
    c0 = c1 = c;
    ++beg;
    }
    }
    }
    }

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }


    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    Also, while formally the program is written in C, in spirit it's
    something else. Maybe Lisp.
    Lisp compilers are known to be very good at tail call elimination.
    C compilers also can do it, but not reliably. In this particular case I
    am afraid that common C compilers will implement it as written, i.e.
    without turning recursion into iteration.
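
    For comparison, a sketch of the same word splitting written as explicit
    loops, so that nothing depends on the compiler eliminating tail calls.
    It is meant to behave like words_do() above (same callback interface,
    same return value), but the names words_do_iter, Counter, count_word
    and count_words_iter are hypothetical and not from the thread.

```c
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool is_space( char c ){
    return c == ' ' || c == '\t';
}

/* Returns 1 on success, 0 if the string ends inside double quotes. */
_Bool words_do_iter( const char *s, Gopher go ){
    for (;;) {
        while (is_space(*s))            /* skip separators between words */
            s++;
        if (*s == 0)
            return 1;

        const char *r = s;              /* start of the current word */
        _Bool w = 1;                    /* 1 while outside double quotes */
        while (*s != 0 && !(is_space(*s) && w)) {
            w ^= (*s == '"');           /* each quote toggles the state */
            s++;
        }
        go->f( go, r, s );              /* report the word [r, s) */
        if (*s == 0)
            return w;
    }
}

/* Hypothetical counting callback, used only to exercise the loop. */
typedef struct { struct gopher_s go; unsigned words; } Counter;

static void count_word( Gopher go, const char *s, const char *t ){
    Counter *context = (void*) go;      /* go is the first member */
    (void) s; (void) t;
    context->words ++;
}

unsigned count_words_iter( const char *text, _Bool *ok ){
    Counter c = { { count_word }, 0 };
    _Bool r = words_do_iter( text, &c.go );
    if (ok)
        *ok = r;
    return c.words;
}
```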

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration along at least one
    code path.
    Latest MSVC implements it as written, 100% recursion.
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Sep 16 00:52:26 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    [comments reordered]

    Also, while formally the program is written in C, by spirit it's
    something else. May be, Lisp.

    I would call it a functional style, but still C. Not a C style
    as most people are used to seeing it, I grant you that. I still
    think of it as C though.


    Lisp compilers are known to be very good at tail call elimination.
    C compilers also can do it, but not reliably. In this particular
    case I am afraid that common C compilers will implement it as
    written, i.e. without turning recursion into iteration.

    I routinely use gcc and clang, and both are good at turning
    this kind of mutual recursion into iteration (-Os or higher,
    although clang was able to eliminate all the recursion at -O1).
    I agree the recursion elimination is not as reliable as one
    would like; in practice though I find it quite usable.


    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0

    Both as expected.

    Latest icc still does not turn it into iteration at least along one
    code paths.

    That's disappointing, but good to know.

    Latest MSVC implements it as written, 100% recursion.

    I'm not surprised at all. In my admittedly very limited experience,
    MSVC is garbage.


    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    You say that like you think such macros don't have well-defined
    behavior. If I needed such a macro probably I would just
    define it myself (and would be confident that it would
    work correctly).
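
    A sketch of what such a self-defined macro could look like, built on
    the standard offsetof. The macro name CONTAINER_OF, the struct
    TrailingCounter (where the gopher is deliberately not the first
    member) and the demo function are hypothetical illustrations.

```c
#include <stddef.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

/* Step back from a member's address to the start of its container. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical context with the gopher somewhere other than offset 0. */
typedef struct {
    unsigned words;
    struct gopher_s go;
} TrailingCounter;

static void
count_word( Gopher go, const char *s, const char *t ){
    TrailingCounter *context = CONTAINER_OF( go, TrailingCounter, go );
    (void) s; (void) t;
    context->words ++;
}

/* Invoke the callback twice by hand and report the count. */
unsigned
demo_container_of( void ){
    TrailingCounter tc = { 0, { count_word } };
    const char *w = "ab";

    tc.go.f( &tc.go, w, w + 2 );
    tc.go.f( &tc.go, w, w + 1 );
    return tc.words;
}
```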

    In this case I don't need a macro because I would put the gopher
    struct at the beginning of the containing struct. For example:

    #include <stdio.h>

    typedef struct {
        struct gopher_s go;
        unsigned words;
    } WordCounter;


    static void
    print_word( Gopher go, const char *s, const char *t ){
        WordCounter *context = (void*) go;
        int n = t-s;

        printf( " word: %.*s\n", n, s );
        context->words ++;
    }


    int
    main(){
        WordCounter wc = { { print_word }, 0 };
        char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";

        words_do( words, &wc.go );
        printf( "\n" );
        printf( " There were %u words found\n", wc.words );
        return 0;
    }
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Sep 16 12:23:38 2024
    From Newsgroup: comp.lang.c

    On Mon, 16 Sep 2024 00:52:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [comments reordered]

    Also, while formally the program is written in C, by spirit it's
    something else. May be, Lisp.

    I would call it a functional style, but still C. Not a C style
    as most people are used to seeing it, I grant you that. I still
    think of it as C though.


    Lisp compilers are known to be very good at tail call elimination.
    C compilers also can do it, but not reliably. In this particular
    case I am afraid that common C compilers will implement it as
    written, i.e. without turning recursion into iteration.

    I routinely use gcc and clang, and both are good at turning
    this kind of mutual recursion into iteration (-Os or higher,
    although clang was able to eliminate all the recursion at -O1).
    I agree the recursion elimination is not as reliable as one
    would like; in practice though I find it quite usable.


    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0

    Both as expected.


    So, only 15 years for gcc and only 7 years for clang.

    Latest icc still does not turn it into iteration at least along one
    code paths.

    That's disappointing, but good to know.

    Latest MSVC implements it as written, 100% recursion.

    I'm not surprised at all. In my admittedly very limited experience,
    MSVC is garbage.


    For the sort of code that is important to me, gcc, clang and MSVC tend to
    generate code of similar quality. Of the three, clang is the most likely
    to unexpectedly produce utter crap. On the other hand, it is
    sometimes the most brilliant.
    In the case of gcc, I hate that they recently put tree-slp-vectorize
    under the -O2 umbrella.


    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    You say that like you think such macros don't have well-defined
    behavior. If I needed such a macro probably I would just
    define it myself (and would be confident that it would
    work correctly).

    In this case I don't need a macro because I would put the gopher
    struct at the beginning of the containing struct. For example:

    #include <stdio.h>

    typedef struct {
    struct gopher_s go;
    unsigned words;
    } WordCounter;


    static void
    print_word( Gopher go, const char *s, const char *t ){
    WordCounter *context = (void*) go;

    That's what I was missing. Simple and adequate.

    int n = t-s;

    printf( " word: %.*s\n", n, s );
    context->words ++;
    }



    int
    main(){
    WordCounter wc = { { print_word }, 0 };
    char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";

    words_do( words, &wc.go );
    printf( "\n" );
    printf( " There were %u words found\n", wc.words );
    return 0;
    }

    There are a couple of differences between your parsing and mine.
    1. "42""43"
    You parse it as a single word, I split it. It seems your behavior is
    closer to that of both bash and cmd.exe.
    2. I strip " characters from "-delimited words. You seem to leave them.
    In this case what I do is more similar to both bash and cmd.exe.

    Not that it matters.

  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Sep 17 03:12:04 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 16 Sep 2024 00:52:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [comments reordered]

    Also, while formally the program is written in C, by spirit it's
    something else. May be, Lisp.

    I would call it a functional style, but still C. Not a C style
    as most people are used to seeing it, I grant you that. I still
    think of it as C though.


    Lisp compilers are known to be very good at tail call elimination.
    C compilers also can do it, but not reliably. In this particular
    case I am afraid that common C compilers will implement it as
    written, i.e. without turning recursion into iteration.

    I routinely use gcc and clang, and both are good at turning
    this kind of mutual recursion into iteration (-Os or higher,
    although clang was able to eliminate all the recursion at -O1).
    I agree the recursion elimination is not as reliable as one
    would like; in practice though I find it quite usable.


    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.

    That's disappointing, but good to know.

    Latest MSVC implements it as written, 100% recursion.

    I'm not surprised at all. In my admittedly very limited experience,
    MSVC is garbage.

    For the sort of code that is important to me, gcc, clang and MSVC tend to
    generate code of similar quality.

    To clarify, my earlier comment about MSVC is about what it thinks
    the language is, not anything about quality of generated code. But
    the lack of tail call elimination fits in with what else I have
    seen.

    clang is most suspect of the three to sometimes unexpectedly
    produce utter crap. On the other hand, it is sometimes most
    brilliant.

    That's interesting. Recently I encountered a problem where clang
    did just fine but gcc generated bad code under -O3.

    In case of gcc, I hate that recently they put tree-slp-vectorize
    under -O2 umbrella.

    Yes, gcc is like a box of chocolates - you never know what you're
    going to get.

    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    You say that like you think such macros don't have well-defined
    behavior. If I needed such a macro probably I would just
    define it myself (and would be confident that it would
    work correctly).

    In this case I don't need a macro because I would put the gopher
    struct at the beginning of the containing struct. For example:

    #include <stdio.h>

    typedef struct {
    struct gopher_s go;
    unsigned words;
    } WordCounter;


    static void
    print_word( Gopher go, const char *s, const char *t ){
    WordCounter *context = (void*) go;

    That's what I was missing. Simple and adequate.

    I now prefer this technique for callbacks. Cuts down on the
    number of parameters, safer than a (void*) parameter, and it puts
    the function pointer near the context state so it's easier to
    connect the two (and less worry about them getting out of sync).

    int n = t-s;

    printf( " word: %.*s\n", n, s );
    context->words ++;
    }

    int
    main(){
    WordCounter wc = { { print_word }, 0 };
    char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";

    words_do( words, &wc.go );
    printf( "\n" );
    printf( " There were %u words found\n", wc.words );
    return 0;
    }

    There are couple of differences between your and my parsing.
    1. "42""43"
    You parse it as a single word, I split. It seems, your behavior is
    closer to that of both bash and cmd.exe

    Yes. I chose that deliberately because I often use patterns like
    foo."$suffix" and it made sense to allow quoted subparts for that
    reason.

    2. I strip " characters from "-delimited words. You seem to leave them.
    In this case what I do is more similar to both bash and cmd.exe

    I do, both because it's easier, and in case the caller wants to
    know where the quotes are. If it's important to strip them out
    it's up to the caller to do that.
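
    A sketch of how a caller might strip the quotes itself when copying a
    reported word out of the source string. The helper name copy_unquoted
    and its snprintf-like return convention are assumptions made for this
    example, not part of the code posted in the thread.

```c
#include <stddef.h>

/* Copy the word [s, t) into dst, dropping every '"' character.
   At most dstsize-1 characters are stored, followed by a NUL.
   Returns the length the full result needs (excluding the NUL),
   so a return value >= dstsize signals truncation. */
size_t copy_unquoted(char *dst, size_t dstsize, const char *s, const char *t)
{
    size_t n = 0;

    for (; s < t; s++) {
        if (*s == '"')
            continue;               /* quotes delimit, they are not data */
        if (n + 1 < dstsize)
            dst[n] = *s;
        n++;
    }
    if (dstsize > 0)
        dst[n < dstsize ? n : dstsize - 1] = '\0';
    return n;
}
```

    Inside a callback this would be called with the two pointers the
    gopher receives, for example copy_unquoted( buf, sizeof buf, s, t ).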

    Not that it matters.

    Yeah. These choices are only minor details; the general
    approach taken is the main thing.
  • From antispam@antispam@fricas.org to comp.lang.c on Tue Sep 17 22:34:33 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal calls.
    The other calls are compiled to jumps. But the call to 'collect_word'
    in 'words_do' is not a "sibcall", and depending on the calling
    convention the compiler may treat it as a normal call. The two other
    calls, that is the call to 'words_do' in 'words_do' and the call to
    'collect_word' in 'collect_word', are clearly tail self-recursion and
    the compiler should always optimize them to a jump.
    --
    Waldek Hebisch
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Sep 17 16:33:16 2024
    From Newsgroup: comp.lang.c

    antispam@fricas.org writes:

    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Right, they are not tail calls, simply ordinary calls (indirect
    calls, but still ordinary calls).

    The other calls are compiled to jumps. But the call to 'collect_word'
    in 'words_do' is not a "sibcall", and depending on the calling
    convention the compiler may treat it as a normal call. The two other
    calls, that is the call to 'words_do' in 'words_do' and the call to
    'collect_word' in 'collect_word', are clearly tail self-recursion and
    the compiler should always optimize them to a jump.

    Yes, a different set of calling conventions could result in the
    call to collect_word from words_do being a normal call. It
    should be possible to correct that by adding two dummy parameters
    to words_do(), and wrapping the result in one outer function so
    that there is at most one extra call besides the call from outside.
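
    One way that suggestion might look, as a sketch: the former body of
    words_do() moves into a worker with the same four-parameter signature
    as collect_word(), and a thin outer words_do() keeps the original
    interface. The name skip_spaces and the counting helpers at the end
    are hypothetical; whether a given compiler then emits jumps still
    depends on the compiler and its options.

```c
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool skip_spaces( const char *, const char *, _Bool, Gopher );
static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );

/* Outer wrapper: the only call whose signature differs from its callee. */
_Bool
words_do( const char *s, Gopher go ){
    return skip_spaces( s, s, 1, go );
}

/* Former words_do() body; r and w are dummies that exist only so that
   both workers share one signature. */
static _Bool
skip_spaces( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
      is_space(c) ? skip_spaces( s+1, r, w, go ) :
      c           ? collect_word( s, s, 1, go ) :
      /***************/ 1;
}

static _Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
      c == 0           ? go->f( go, r, s ), w :
      is_space(c) && w ? go->f( go, r, s ), skip_spaces( s, r, w, go ) :
      /***************/ collect_word( s+1, r, w ^ (c == '"'), go );
}

static _Bool
is_space( char c ){
    return c == ' ' || c == '\t';
}

/* Hypothetical counting gopher, used only to exercise the code. */
typedef struct { struct gopher_s go; unsigned words; } Counter;

static void
count_word( Gopher go, const char *s, const char *t ){
    Counter *context = (void*) go;
    (void) s; (void) t;
    context->words ++;
}

unsigned
count_words( const char *text ){
    Counter wc = { { count_word }, 0 };
    words_do( text, &wc.go );
    return wc.words;
}
```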
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Sep 18 02:46:11 2024
    From Newsgroup: comp.lang.c

    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * );
    };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Naturally.

    The other calls are compiled to jumps. But the call to 'collect_word'
    in 'words_do' is not a "sibcall", and depending on the calling
    convention the compiler may treat it as a normal call. The two other
    calls, that is the call to 'words_do' in 'words_do' and the call to
    'collect_word' in 'collect_word', are clearly tail self-recursion and
    the compiler should always optimize them to a jump.


    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I rewrote
    the code in more boring style.

    _Bool
    words_do( const char *s, Gopher go ){
      char c = *s;
    #if 1
      if (is_space(c))
        return words_do( s+1, go );
      if (c)
        return collect_word( s, s, 1, go );
      return 1;
    #else
      return
        is_space(c) ? words_do( s+1, go ) :
        c ? collect_word( s, s, 1, go ):
        /***************/ 1;
    #endif
    }

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
      char c = *s;
    #if 1
      if (c == 0) {
        go->f( go, r, s );
        return w;
      }
      if (is_space(c) && w) {
        go->f( go, r, s );
        return words_do( s, go );
      }
      return collect_word( s+1, r, w ^ c == '"', go );
    #else
      return
        c == 0 ? go->f( go, r, s ), w :
        is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
        /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }
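
    A note on the last argument, with a small sketch: in C, == binds more
    tightly than ^, so w ^ c == '"' is read as w ^ (c == '"'), which
    toggles the flag on each quote character. The function names below
    are hypothetical; toggle_misgrouped shows the other grouping only to
    make the difference visible.

```c
/* The grouping the language actually uses for  w ^ c == '"'. */
_Bool toggle_explicit( _Bool w, char c ){
    return w ^ (c == '"');
}

/* The other grouping: XOR first, then compare. This is a different
   function and does not track quotes. */
_Bool toggle_misgrouped( _Bool w, char c ){
    return (w ^ c) == '"';
}
```

    With w == 1, toggle_explicit yields 0 for '"' and 1 for any other
    character, whereas toggle_misgrouped yields 0 for '"'.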







  • From Bart@bc@freeuk.com to comp.lang.c on Wed Sep 18 01:07:17 2024
    From Newsgroup: comp.lang.c

    On 18/09/2024 00:46, Michael S wrote:
    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * );
    };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Naturally.

    The other calls are compiled to jumps. But the call to 'collect_word'
    in 'words_do' is not a "sibcall", and depending on the calling
    convention the compiler may treat it as a normal call. The two other
    calls, that is the call to 'words_do' in 'words_do' and the call to
    'collect_word' in 'collect_word', are clearly tail self-recursion and
    the compiler should always optimize them to a jump.


    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I rewrote
    the code in more boring style.

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;
    #if 1
    if (c == 0) {
    go->f( go, r, s );
    return w;
    }
    if (is_space(c) && w) {
    go->f( go, r, s );
    return words_do( s, go );
    }
    return collect_word( s+1, r, w ^ c == '"', go );
    #else
    return
    c == 0 ? go->f( go, r, s ), w :
    is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
    /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }

    I find such a coding style pretty much impossible to grasp and
    unpleasant to look at. I had to refactor it like this:

    ---------------

    static _Bool collect_word(const char *s, const char *r, _Bool w, Gopher go) {
        char c = *s;
    #if 1
        if (c == 0) {
            go->f(go, r, s);
            return w;
        }
        if (is_space(c) && w) {
            go->f(go, r, s);
            return words_do(s, go);
        }
        /* == binds tighter than ^, so the flag toggles on each quote */
        return collect_word(s+1, r, w ^ (c == '"'), go);

    #else
        if (c == 0) {
            go->f(go, r, s);
            return w;
        }
        else if (is_space(c) && w) {
            go->f(go, r, s);
            return words_do(s, go);
        }
        else {
            return collect_word(s+1, r, w ^ (c == '"'), go);
        }

    #endif
    }

    ---------------

    When I'd finished, I realised that those two conditional blocks do more
    or less the same thing! If that's what you mean by 'boring', then I'm
    all for it.
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Sep 17 18:31:10 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * );
    };

    static _Bool collect_word( const char *, const char *, _Bool,
    Gopher ); static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration along at least one
    code path.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal calls.

    Naturally.

    The other calls are compiled to jumps. But the call to 'collect_word'
    in 'words_do' is not a "sibcall", and depending on the calling
    convention the compiler may treat it as a normal call. The two other
    calls, that is the call to 'words_do' in 'words_do' and the call to
    'collect_word' in 'collect_word', are clearly tail self-recursion and
    the compiler should always optimize them to a jump.

    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I rewrote
    the code in a more boring style.

    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;
    #if 1
    if (is_space(c))
    return words_do( s+1, go );
    if (c)
    return collect_word( s, s, 1, go );
    return 1;
    #else
    return
    is_space(c) ? words_do( s+1, go ) :
    c ? collect_word( s, s, 1, go ):
    /***************/ 1;
    #endif
    }

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;
    #if 1
    if (c == 0) {
    go->f( go, r, s );
    return w;
    }
    if (is_space(c) && w) {
    go->f( go, r, s );
    return words_do( s, go );
    }
    return collect_word( s+1, r, w ^ c == '"', go );
    #else
    return
    c == 0 ? go->f( go, r, s ), w :
    is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
    /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }

    That's amusing. :)

    Do you know if icc will do tail call elimination for
    the boring version of the code?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Wed Sep 18 02:20:18 2024
    From Newsgroup: comp.lang.c

    On Wed, 18 Sep 2024 02:46:11 +0300, Michael S wrote:

    "Should" or not, MSVC does not eliminate them.

    Another reason to stay away from MSVC?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Sep 18 11:03:44 2024
    From Newsgroup: comp.lang.c

    On Tue, 17 Sep 2024 18:31:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    That's amusing. :)

    Do you know if icc will do tail call elimination for
    the boring version of the code?

    Output of 'icc -O2' does recursive inlining to quite significant depth,
    so it is rather hard to follow.
    But it seems that the answer is "No".

    Anyway, by now icc is mostly of historical interest.
    They ceased independent compiler development 2-3 years ago and turned
    into yet another LLVM/clang distributor.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Sep 18 11:05:24 2024
    From Newsgroup: comp.lang.c

    On Wed, 18 Sep 2024 02:20:18 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Wed, 18 Sep 2024 02:46:11 +0300, Michael S wrote:

    "Should" or not, MSVC does not eliminate them.

    Another reason to stay away from MSVC?

    No, it isn't.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Sep 18 11:43:05 2024
    From Newsgroup: comp.lang.c

    On Wed, 18 Sep 2024 01:07:17 +0100
    Bart <bc@freeuk.com> wrote:

    On 18/09/2024 00:46, Michael S wrote:
    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char *
    ); };

    static _Bool collect_word( const char *, const char *, _Bool,
    Gopher ); static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration along at least
    one code path.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal calls.

    Naturally.

    The other calls are compiled to jumps. But the call to 'collect_word'
    in 'words_do' is not a "sibcall", and depending on the calling
    convention the compiler may treat it as a normal call. The two other
    calls, that is the call to 'words_do' in 'words_do' and the call to
    'collect_word' in 'collect_word', are clearly tail self-recursion and
    the compiler should always optimize them to a jump.


    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I
    rewrote the code in a more boring style.

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;
    #if 1
    if (c == 0) {
    go->f( go, r, s );
    return w;
    }
    if (is_space(c) && w) {
    go->f( go, r, s );
    return words_do( s, go );
    }
    return collect_word( s+1, r, w ^ c == '"', go );
    #else
    return
    c == 0 ? go->f( go, r, s ), w :
    is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
    /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }

    I find such a coding style pretty much impossible to grasp and
    unpleasant to look at. I had to refactor it like this:

    ---------------

    static_Bool collect_word(char *s, char *r, _Bool w, Gopher go ) {
    char c = *s;
    #if 1
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    return collect_word(s+1, r, (w ^ c) == '"', go);

    That's not how it was written in the original. Should be:
    return collect_word(s+1, r, w ^ c == '"', go);
    Not the same thing at all. https://en.cppreference.com/w/c/language/operator_precedence


    #else
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    else if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    else {
    return collect_word(s+1, r, (w ^ c) = '"', go);

    The same here.

    }

    #endif
    }

    ---------------

    When I'd finished, I realised that those two conditional blocks do
    more or less the same thing! If that's what you mean by 'boring',
    then I'm all for it.


    Since I am not accustomed to the functional programming style, for me
    even a boring variant is way too entertaining.
    I prefer mundane (untested, could be buggy):

    static
    const char* collect_word(const char *s) {
    _Bool w = 0;
    char c;
    while ((c = *s) != 0) {
    if (!w && is_space(c))
    break;
    if (c == '"')
    w = !w;
    ++s;
    }
    return s;
    }

    void words_do(const char *s, Gopher go ){
    char c;
    while ((c = *s) != 0) {
    if (is_space(c)) {
    ++s;
    } else {
    const char *r = s;
    s = collect_word(s);
    go->f(go, r, s);
    }
    }
    }






    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bart@bc@freeuk.com to comp.lang.c on Wed Sep 18 10:49:38 2024
    From Newsgroup: comp.lang.c

    On 18/09/2024 09:43, Michael S wrote:
    On Wed, 18 Sep 2024 01:07:17 +0100
    Bart <bc@freeuk.com> wrote:

    I find such a coding style pretty much impossible to grasp and
    unpleasant to look at. I had to refactor it like this:

    ---------------

    static_Bool collect_word(char *s, char *r, _Bool w, Gopher go ) {
    char c = *s;
    #if 1
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    return collect_word(s+1, r, (w ^ c) == '"', go);

    That's not how it was written in the original. Should be:
    return collect_word(s+1, r, w ^ c == '"', go);
    Not the same thing at all.

    So, what you are saying is that it means 'w ^ (c == '"')'? Because there
    could be some ambiguity, I put in the brackets. I had to guess the
    precedence and chose the one that seemed more plausible, but I guessed
    wrong.

    My version then should be:

    return collect_word(s+1, r, w ^ (c == '"'), go);


    The same here.

    I'm surprised there weren't more typos, but that's not what my post was
    about; it was about presentation and layout.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Sep 18 12:44:51 2024
    From Newsgroup: comp.lang.c

    On 18/09/2024 11:49, Bart wrote:
    On 18/09/2024 09:43, Michael S wrote:
    On Wed, 18 Sep 2024 01:07:17 +0100
    Bart <bc@freeuk.com> wrote:


              return collect_word(s+1, r, (w ^ c) == '"', go);

    That's not how it was written in the original. Should be:
              return collect_word(s+1, r, w ^ c == '"', go);
    Not the same thing at all.

    So, what you are saying is that it means 'w ^ (c == '"')'? Because there
    could be some ambiguity, I put in the brackets. I had to guess the
    precedence and chose the one that seemed more plausible, but I guessed
    wrong.


    There is no ambiguity in the C language - the equality operators have
    higher precedence than the binary bitwise operators (&, ^, |) and the
    logical operators (&& and ||).

    However, I fully agree with you that code is clearer if parentheses are
    added. It makes the code easier to read, easier to write, and
    eliminates the risk of programmers (either those writing the code or
    those reading it) getting it wrong.

    My version then should be:

          return collect_word(s+1, r, w ^ (c == '"'), go);


    Put spaces around the "+" operator, and it would be perfect :-)


    The same here.

    I'm surprised there weren't more typos, but that's not what my post was
    about; it was about presentation and layout.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Wed Sep 18 05:09:08 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 17 Sep 2024 18:31:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    That's amusing. :)

    Do you know if icc will do tail call elimination for
    the boring version of the code?

    Output of 'icc -O2' does recursive inlining to quite significant depth,
    so it is rather hard to follow.
    But it seems that the answer is "No".

    Anyway, by now icc is mostly of historical interest.
    They ceased independent compiler development 2-3 years ago and turned
    into yet another LLVM/clang distributor.

    Thank you, that is good to know.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Wed Sep 18 06:01:33 2024
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Since I am not accustomed to the functional programming style, for
    me even a boring variant [not shown] is way too entertaining. I
    prefer mundane (untested, could be buggy):

    static
    const char* collect_word(const char *s) {
    _Bool w = 0;
    char c;
    while ((c = *s) != 0) {
    if (!w && is_space(c))
    break;
    if (c == '"')
    w = !w;
    ++s;
    }
    return s;
    }

    void words_do(const char *s, Gopher go ){
    char c;
    while ((c = *s) != 0) {
    if (is_space(c)) {
    ++s;
    } else {
    const char *r = s;
    s = collect_word(s);
    go->f(go, r, s);
    }
    }
    }

    If writing in an imperative-rather-than-functional style, I would
    likely gravitate toward something like this:


    static const char *process_word( const char *, Gopher );

    void
    words_do( const char *s, Gopher go ){
    char c;

    while( c = *s++ ){
    if( ! is_space(c) ) s = process_word( s-1, go );
    }
    }

    const char *
    process_word( const char *r, Gopher go ){
    const char *s = r;
    _Bool q = 0;

    do q ^= *s++ == '"'; while( *s && (q || !is_space(*s)) );

    return go->f( go, r, s ), s;
    }


    which seems to result in slightly better generated code than my
    functional version, in a few spot checks using gcc or clang
    under various -O settings (-Os, -O2, -O3).
    --- Synchronet 3.20a-Linux NewsLink 1.114