I have the case where my C program is handed a string which is basically
a command line.
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
Note that I'm not asking for getopt(), that comes afterwards, and
I'm not asking for any variable interpolation, but just that a string
like, say
hello -world "This is foo.*" foo.*
becomes something like
my_argv[0] "hello"
my_argv[1] "-world"
my_argv[2] "This is foo.*"
my_argv[3] foo.h
my_argv[4] foo.c
my_argv[5] foo.txt
my_argc = 6
I could live without the globbing if that's a bridge too far.
On 10 Sep 2024 19:01:37 GMT, Ted Nolan <tednolan> wrote:
I have the case where my C program is handed a string which is basically
a command line.
If that’s what your OS is giving you, your OS is doing it wrong.
On 10.09.2024 21:01, Ted Nolan <tednolan> wrote:
I have the case where my C program is handed a string which is basically
a command line.
IIUC you don't want the shell to do the expansion but, sort of,
re-invent the wheel in your application (à la DOS). - Okay.
Is there a common open source C library for tokenizing and globbing
this into an argc/argv as a shell would do? I've googled, but I get
too much C++ & other language stuff.
I also suppose that by "tokenizing" you don't mean something like
strtok (3) - extract tokens from strings
but a field separation as the Unix shell does using 'IFS'.
I don't know of a C library, but if I wanted to implement a function
that all POSIX shells provide, I'd look into the shells' source packages...
For Kornshell (e.g. version 93u+m) I see these files in the package
src/lib/libast/include/glob.h
src/lib/libast/misc/glob.c
that evidently handle the globbing. (I suspect you'll
need some more supporting files from the ksh package.)
HTH
Janis
ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
I have the case where my C program is handed a string which is basically
a command line.
What environment(s) does this need to run in?
I don't know of a standard(ish) function that does this. POSIX defines
the glob() function, but it only does globbing, not word-splitting.
If you're trying to emulate the way the shell (which one?) parses
command lines, and *if* you're on a system that has a shell, you can
invoke a shell to do the work for you. Here's a quick and dirty
example:
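#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void) {
    const char *line = "hello -world \"This is foo.*\" foo.*";
    /* 50 bytes covers the "printf '%s\n' " prefix plus the terminating null */
    char *cmd = malloc(50 + strlen(line));
    if (cmd == NULL)
        return EXIT_FAILURE;
    /* The shell does the word splitting and globbing; printf prints one argument per line */
    sprintf(cmd, "printf '%%s\n' %s", line);
    system(cmd);
    free(cmd);
}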
This prints the arguments to stdout, one per line (and doesn't handle arguments with embedded newlines very well). You could modify the
command to write the output to a temporary file and then read that file,
or you could use popen() if it's available.
Of course this is portable only to systems that have a Unix-style shell,
and it can even behave differently depending on how the default shell behaves. And invoking a new process is going to make this relatively
slow, which may or may not matter depending on how many times you need
to do it.
There is no completely portable solution, since you need to be able to
get directory listings to handle wildcards.
A quick Google search points to this question:
https://stackoverflow.com/q/21335041/827263
"How to split a string using shell-like rules in C++?"
An answer refers to Boost.Program_options, which is specific to C++. Apparently boost::program_options::split_unix() does what you're looking
for.
In article <lkbjchFebk9U1@mid.individual.net>,
Ted Nolan <tednolan> wrote:
I have the case where my C program is handed a string which is basically
a command line.
Have a look at wordexp(3).
Do you think it would make sense to switch the language ?
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
No, not an option, thanks.
#include <Windows.h>
#include <iostream>
#include <string_view>
using namespace std;
template<typename CharType, typename Consumer>
    requires requires( Consumer consumer, basic_string_view<CharType> sv ) { { consumer( sv ) }; }
void Tokenize( basic_string_view<CharType> sv, Consumer consumer )
{
    using sv_t = basic_string_view<CharType>;
    auto it = sv.begin();
    for( ; it != sv.end(); )
    {
        CharType end;
        typename sv_t::iterator tkBegin;
        if( *it == '\"' )
        {
            end = '\"';
            tkBegin = ++it;
        }
        else
        {
            end = ' ';
            tkBegin = it++;
        }
        for( ; it != sv.end() && *it != end; ++it );
        consumer( sv_t( tkBegin, it ) );
        if( it != sv.end() ) [[unlikely]]
        {
            while( ++it != sv.end() && *it == ' ' );
            continue;
        }
    }
}
int main()
{
    LPWSTR pCmdLine = GetCommandLineW();
    size_t i = 1;
    Tokenize( wstring_view( pCmdLine ), [&]( wstring_view sv )
        {
            wcout << i++ << L": \"" << sv << L"\"" << endl;
        } );
}
On 11.09.2024 14:22, Ted Nolan <tednolan> wrote:
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
No, not an option, thanks.
I could write a C-bridge for you.
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
On 11.09.2024 16:59, Kenny McCormack wrote:
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
On 11.09.2024 16:59, Kenny McCormack wrote:
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
C++ is a simpler language? You're having a laugh!
ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
On 11.09.2024 16:59, Kenny McCormack wrote:
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.
Why exactly does it have to be C?
What system or systems do you need to support? (I asked this before and
you didn't answer.)
If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.
It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
On 11.09.2024 16:59, Kenny McCormack wrote:
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.
Why exactly does it have to be C?
What system or systems do you need to support? (I asked this before and
you didn't answer.)
If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.
It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
This would be for work, so I am limited in what I can say about it, but
it has to be in C because it would be a C callout from a GT.M mumps process. GT.M stores the command line tail (everything it doesn't need
to get a program running) in the special variable $ZCMDLINE which can
be passed to a callout. I would like to parse that string as the
shell does a command line. Basically, if it isn't a C library that
is commonly available through Linux package managers I probably can't
use it. In the end this is a "nice to have" and I have a q&d approach
that I will probably use.
ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
On 11.09.2024 16:59, Kenny McCormack wrote:
In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Do you think it would make sense to switch the language ?
Do you think it would make sense to pay attention to the "Newsgroups" line
in your header before clicking "Send"?
I just wanted to suggest a simpler language.
Compare that with a manual implementation of the same in C.
Thanks, I appreciate that, but it does have to be C.
We could help you more effectively if we understood your requirements.
Why exactly does it have to be C?
What system or systems do you need to support? (I asked this before and
you didn't answer.)
If you only care about Windows, for example, that's going to affect what
solutions we can offer; likewise if you only care about POSIX-based
systems, or only about Linux-based systems.
It might also be useful to know more about the context. If this is for
some specific application, what is that application intended to do, and
why does it need to do tokenization and globbing?
This would be for work, so I am limited in what I can say about it, but
it has to be in C because it would be a C callout from a GT.M mumps
process. GT.M stores the command line tail (everything it doesn't need
to get a program running) in the special variable $ZCMDLINE which can
be passed to a callout. I would like to parse that string as the
shell does a command line. Basically, if it isn't a C library that
is commonly available through Linux package managers I probably can't
use it. In the end this is a "nice to have" and I have a q&d approach
that I will probably use.
Since you mentioned Linux package managers, I presume this only needs to
work on Linux-based systems, which means you can use POSIX-specific
functions. That could have been useful to know earlier.
And you might consider posting to comp.unix.programmer for more
system-specific solutions.
Earlier I suggested using system() to pass the string to the shell.
That wouldn't work on Windows, but it should be ok for your
requirements. There are good reasons not to want to do that, but "there
might not be a POSIX shell available" apparently isn't one of them.
I'd also suggest nailing down your exact requirements; "as the
shell does" is inexact, since different shells behave differently.
Suggested reading:
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
On 12/09/2024 03:22, Bonita Montero wrote:
On 11.09.2024 22:19, Bart wrote:
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
On 12.09.2024 14:13, Kenny McCormack wrote:
Maybe I should start posting Fortran "solutions".
Or maybe Haskell?
Or Intercal?
The latter might certainly be enlightening. I've always had problems
writing such code, and seeing functional code would help. - But
it's off-topic, as you say. Less off-topic (IMO) are C++ solutions
as opposed to C; C++ has a C base, and C appears to me to advance
"with an eye on" C++.
Thank you. system() would not work as I don't want to execute
anything, just parse into an argv-like array.
I appreciate the responses, but it looks like I will be staying with
my q&d approach for now.
columbiaclosings.com
What's not in Columbia anymore..
On 12.09.2024 13:29, Bart wrote:
On 12/09/2024 03:22, Bonita Montero wrote:
On 11.09.2024 22:19, Bart wrote:
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)
In the given context of C and C++ I've always perceived the features
of C++ to add to comprehensibility of source code where the respective
C code required writing clumsy code and needed (unnecessary) syntactic ballast to implement similar functions and program constructs.
Your undifferentiated complaint sounds more like someone not willing
to understand the other concepts, or reluctant or too lazy to
make themselves familiar with them.
In article <lkf72pFd61U1@mid.individual.net>,
Ted Nolan <tednolan> wrote:
...
Thank you. system() would not work as I don't want to execute
anything, just parse into an argv-like array.
I appreciate the responses, but it looks like I will be staying with
my q&d approach for now.
This is a "solved problem". Or, to put it another way, if wordexp(3) is
not the solution, then there is no general solution (and that means, yes,
you'll have to "roll your own", as many here have suggested you do).
columbiaclosings.com
What's not in Columbia anymore..
Which Columbia are we talking about here? And why?
In article <vbujak$733i$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 12/09/2024 03:22, Bonita Montero wrote:
On 11.09.2024 22:19, Bart wrote:
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line count was half.
But your solutions are always incomprehensible because they strive for
the most advanced features possible.
And, of course, totally off-topic.
Maybe I should start posting Fortran "solutions".
Or maybe Haskell?
Or Intercal?
It has always been CLC policy that C++ is just as off-topic as Fortran or
C# or any other language (other than C, of course). And, of course, that being "off topic" is the highest and most unforgivable sin.
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)
On 12.09.2024 15:20, Kenny McCormack wrote:
It has always been CLC policy that C++ is just as off-topic as Fortran or
C# or any other language (other than C, of course). And, of course, that
being "off topic" is the highest and most unforgivable sin.
A switch to C++ is much more likely than to Fortran.
In article <vbus74$9k96$3@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
On 12.09.2024 15:20, Kenny McCormack wrote:
It has always been CLC policy that C++ is just as off-topic as Fortran or
C# or any other language (other than C, of course). And, of course, that
being "off topic" is the highest and most unforgivable sin.
A switch to C++ is much more likely than to Fortran.
Doesn't matter. I'm talking policy, not personal feelings.
So my C version is actually smaller than the C++ when using hard tabs.
On 12/09/2024 13:20, Janis Papanagnou wrote:
On 12.09.2024 13:29, Bart wrote:
On 12/09/2024 03:22, Bonita Montero wrote:
On 11.09.2024 22:19, Bart wrote:
C++ is a simpler language? You're having a laugh!
The solutions are simpler because you've a fifth of the code as in C.
In this case, it actually needed somewhat more code, even if the line count was half.
But your solutions are always incomprehensible because they strive
for the most advanced features possible.
I don't know of the other poster's solutions. But a quick browse
seems to show nothing incomprehensible or anything that should be
difficult to understand. (YMMV; especially if you're not familiar
with C++ then I'm sure the code may look like noise to you.)
In the given context of C and C++ I've always perceived the features
of C++ to add to comprehensibility of source code where the
respective C code required writing clumsy code and needed
(unnecessary) syntactic ballast to implement similar functions and
program constructs.
Your undifferentiated complaint sounds more like someone not willing
to understand the other concepts, or reluctant or too lazy to
make themselves familiar with them.
I'm saying it's not necessary to use such advanced features to do
some trivial parsing.
I've given a C solution below. (To test outside of Windows, remove
windows.h and set cmdline to any string containing a test input or
use a local function to get the program's command line as one string.)
It uses no special features. Anybody can understand such code.
Anybody can port it to another language far more easily than the C++. (Actually I wrote it first in my language then ported it to C. I only
needed to do 1- to 0-based conversion.)
There are two things missing compared with the C++ (other than it
uses UTF8 strings):
* Individual parameters are capped in length (to 1023 chars here).
This can be solved by determining only the span of the item then
working from that.
* Handling an unknown number of parameters is not automatic:
For the latter, the example uses a fixed array size. For a dynamic
array size, call 'strtoargs' with a count of 0 to first determine the
number of args, then allocate an array and call again to populate it.
-------------------------------------------
#include <windows.h>
#include <stdio.h>
#include <string.h>
int strtoargs(char* cmd, char** dest, int count) {
    enum {ilen=1024};
    char item[ilen];
    int n=0, length, c;
    char *p=cmd, *q, *end=&item[ilen-1];
    while (c=*p++) {
        if (c==' ' || c=='\t')
            continue;
        else if (c=='"') {
            length=0;
            q=item;
            while (c=*p++, c!='"') {
                if (c==0) {
                    --p;
                    break;
                } else {
                    if (q<end) *q++ = c;
                }
            }
            goto store;
        } else {
            length=0;
            q=item;
            --p;
            while (c=*p++, c!=' ' && c!='\t') {
                if (c==0) {
                    --p;
                    break;
                } else {
                    if (q<end) *q++ = c;
                }
            }
store:      *q=0;
            ++n;
            if (n<=count) dest[n-1]=strdup(item);
        }
    }
    return n;
}
int main(void) {
    char* cmdline;
    enum {cap=30};
    char* args[cap];
    int n;
    cmdline = GetCommandLineA();
    n=strtoargs(cmdline, args, cap);
    for (int i=0; i<n; ++i) {
        if (i<cap)
            printf("%d %s\n", i, args[i]);
        else
            printf("%d <overflow>\n", i);
    }
}
-------------------------------------------
On 10.09.2024 21:01, Ted Nolan <tednolan> wrote:
I have the case where my C program is handed a string which is basically
a command line.
I tried to experiment with that with /proc/<pid>/cmdline. The first
problem was that the arguments aren't space-delimited but separated
by NUL bytes. The second problem was that the cmdline file doesn't contain
the original command line but one with the file names already expanded.
In article <vbv04v$aci0$1@raubtier-asyl.eternal-september.org>,
Bonita Montero <Bonita.Montero@gmail.com> wrote:
On 10.09.2024 21:01, Ted Nolan <tednolan> wrote:
I have the case where my C program is handed a string which is basically
a command line.
I tried to experiment with that with /proc/<pid>/cmdline. The first
problem was that the arguments aren't space-delimited but separated
by NUL bytes. The second problem was that the cmdline file doesn't contain
the original command line but one with the file names already expanded.
More OT b***s***.
On 12.09.2024 14:20, Janis Papanagnou wrote:
I don't know of the other poster's solutions. But a quick browse seems
to show nothing incomprehensible or anything that should be difficult
to understand. (YMMV; especially if you're not familiar with C++ then
I'm sure the code may look like noise to you.)
C++ shares a property with C: the language facilities are mostly so
simple that it's easy to roughly imagine the resulting code. So C++
can be written with the same mindset.
On Thu, 12 Sep 2024 14:44:03 +0100
Bart <bc@freeuk.com> wrote:
Apart from the unnecessary ilen limit, the unnecessary goto into a block (I
have nothing against forward gotos out of blocks, but gotos into blocks
make me nervous) and the variable 'length' that serves no purpose, your
code simply does not fulfill the OP's requirements.
I can immediately see two gotchas: no handling of escaped double
quotation marks \" and no handling of single quotation marks. Quite
possibly there are additional omissions.
C and C++ are programmed with the same mindset.
            Spaces   Hard tabs
C++         829      682        (characters)
Not only "roughly imagine"; I think the imperative languages have
so many common basic concepts that you can have a quite good idea,
especially if you know more than just two or three such languages.
Yes, C++ can be written with a "C" mindset. But this is nothing
I'd suggest. Better make yourself familiar with the new concepts
(OO, genericity, or even simple things like references). - IMO.
Programming C++ with only a "C" mindset I'd not consider advisable.
That's what I've generally observed; with sole knowledge of X there
seems to be an impetus and preference to transfer those techniques to
programming in Y. A lot of early C++ programs I've seen were just,
umm, "enhanced" "C" programs.
Am 12.09.2024 um 17:30 schrieb Janis Papanagnou:
Not only "roughly imagine"; I think the imperative languages have
so many common basic concepts that you can have a quite good idea,
especially if you know more than just two or three such languages.
Then tell me which language a) has this kind of mostly minimized language facilities and b) lets you lay out data structures 1:1
as they fit into memory (platform-dependent).
Yes, C++ can be written with a "C" mindset. But this is nothing
I'd suggest. Better make yourself familiar with the new concepts
(OO, genericity, or even simple things like references). - IMO.
I'm using almost all of the new features, as you can see from my code.
But the mindset is still the same.
Am 12.09.2024 um 17:40 schrieb Janis Papanagnou:
Programming C++ with only a "C" mindset I'd not consider advisable.
That's what I've generally observed; with sole knowledge of X there
seems to be an impetus and preference to transfer those techniques to
programming in Y. A lot of early C++ programs I've seen were just,
umm, "enhanced" "C" programs.
I'm using most new language facilities, but the mindset is still the same.
On 12.09.2024 17:47, Bonita Montero wrote:
Then tell me which language a) has this kind of mostly minimized
language facilities and b) lets you lay out data structures 1:1
as they fit into memory (platform-dependent).
Don't know what you're trying to say here or what it is you aim
at. If you think it's worth discussing please elaborate.
I'm using almost all of the new features, as you can see from my code.
But the mindset is still the same.
I don't know you or your background or much of your programming.
So please understand that I'm not inclined to make any comments
about you or your code; this would be all speculative and not
contribute anything to the discussion. If you had the impression
that what I said was referring to you personally you were wrong.
On 12.09.2024 16:16, Bart wrote:
        Spaces   Hard tabs
C++     829      682        (characters)
You are counting spaces, tabs and characters to characterize programs' quality or legibility or what? - Abandon all hope ye who enter here.
I'm counting the number of characters needed to express the function.
Since one of BM's claims is that the C++ example was smaller than C.
The difference between the two columns is whether indentation uses hard
tabs or spaces. The C version is more deeply indented, so that makes a difference. (So does the width of the tabs, but everything was measured
with tabs set to 4 characters.)
On Thu, 12 Sep 2024 14:44:03 +0100
Bart <bc@freeuk.com> wrote:
Apart from the unnecessary ilen limit, the unnecessary goto into a block (I
have nothing against forward gotos out of blocks, but gotos into blocks
make me nervous) and the variable 'length' that serves no purpose, your
code simply does not fulfill the OP's requirements.
I can immediately see two gotchas: no handling of escaped double
quotation marks \" and no handling of single quotation marks. Quite
possibly there are additional omissions.
BM's C++ version doesn't handle embedded quotes or single quotes either.
Neither expand wildcards into sequences of filename arguments.
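For the wildcard half, POSIX `glob()` from `<glob.h>` does the filesystem matching. This is a sketch under stated assumptions: an earlier tokenizing pass is assumed to have recorded which words contained unquoted wildcard characters (the hypothetical `glob_it[]` flags), because a quoted word such as `"This is foo.*"` must not be expanded. `GLOB_NOCHECK` keeps an unmatched pattern as a literal word, which is what sh does by default. `push_copy`, `free_words` and `expand_words` are hypothetical names.

```c
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append a copy of s to a NULL-terminated, growable array. */
static int push_copy(char ***av, size_t *ac, size_t *cap, const char *s)
{
    if (*ac + 1 >= *cap) {                     /* keep room for the NULL */
        size_t ncap = *cap * 2;
        char **p = realloc(*av, ncap * sizeof *p);
        if (!p) return -1;
        *av = p;
        *cap = ncap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (!copy) return -1;
    memcpy(copy, s, len);
    (*av)[(*ac)++] = copy;
    (*av)[*ac] = NULL;                         /* array stays terminated */
    return 0;
}

void free_words(char **av)
{
    if (!av) return;
    for (char **p = av; *p; p++)
        free(*p);
    free(av);
}

/* words[i] is globbed only if glob_it[i] is nonzero; other words are
   copied verbatim.  Returns a NULL-terminated argv (release it with
   free_words) and sets *argc, or returns NULL on error. */
char **expand_words(char *const *words, const int *glob_it, size_t n,
                    size_t *argc)
{
    size_t ac = 0, cap = 8;
    char **av = malloc(cap * sizeof *av);
    if (!av) return NULL;
    av[0] = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!glob_it[i]) {
            if (push_copy(&av, &ac, &cap, words[i])) goto fail;
            continue;
        }
        glob_t g;
        /* GLOB_NOCHECK: no match -> the pattern itself, as sh does */
        int rc = glob(words[i], GLOB_NOCHECK, NULL, &g);
        for (size_t j = 0; rc == 0 && j < g.gl_pathc; j++)
            if (push_copy(&av, &ac, &cap, g.gl_pathv[j])) rc = -1;
        globfree(&g);
        if (rc != 0) goto fail;
    }
    *argc = ac;
    return av;

fail:
    free_words(av);
    return NULL;
}
```

`glob()` sorts its matches unless `GLOB_NOSORT` is given, so `foo.*` would come back as `foo.c foo.h foo.txt` rather than in directory order.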
Am 12.09.2024 um 18:28 schrieb Bart:
BM's C++ version doesn't handle embedded quotes or single quotes either.
Neither expand wildcards into sequences of filename arguments.
Ok, that must be impossible with C++.
I just wanted to show how to do it in principle and what the
advantages are: no intermediate data structure thanks to functional
programming, and debug iterators.
I tried to experiment with that with /proc/<pid>/cmdline. The first
problem was that the arguments aren't space delimited, but broken up
with zeroes.
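On Linux, `/proc/<pid>/cmdline` stores the arguments as consecutive NUL-terminated strings, so rebuilding an argv from it needs no quote handling at all, only a scan for the NUL bytes. The helper names below are hypothetical. The reader also NUL-terminates the buffer so that a truncated last argument is still a valid C string.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a /proc/<pid>/cmdline file into buf.
   Returns the number of bytes read, or -1.  buf[n] is always '\0'. */
long read_cmdline(const char *path, char *buf, size_t cap)
{
    if (cap == 0) return -1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[n] = '\0';                       /* guard against truncation */
    return (long)n;
}

/* Point argv[] at each NUL-separated argument in buf[0..len).
   Stores at most max pointers; returns the total argument count. */
int split_nul_args(char *buf, size_t len, char **argv, int max)
{
    int argc = 0;
    size_t i = 0;
    while (i < len) {
        if (argc < max)
            argv[argc] = buf + i;
        argc++;
        while (i < len && buf[i] != '\0')   /* find the end of this argument */
            i++;
        i++;                                /* step over the NUL */
    }
    return argc;
}
```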
A lot of early C++ programs I've seen were just, umm, "enhanced" "C" programs.
On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d
say that C++ is in fact designed to be used that way.
On 9/12/24 18:32, Lawrence D'Oliveiro wrote:
On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d say that C++ is in fact designed to be used that way.
Like many other aspects of C++, that was dictated by the necessity of
retaining a certain minimum level of backwards compatibility with
existing C code.
Then tell me which language a) has this kind of mostly minimized language facilities and b) lets you lay out data structures 1:1 as they
fit into memory (platform-dependent).
On 13.09.2024 00:32, Lawrence D'Oliveiro wrote:
On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d say that C++ is in fact designed to be used that way.
There are different semantics with and without a 'virtual' specification.
On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:
Then tell me which language a) has this kind of mostly minimized
language facilities and b) lets you lay out data structures 1:1 as they
fit into memory (platform-dependent).
Python.
Callback is as easy in C as in C++.
Am 13.09.2024 um 04:43 schrieb Lawrence D'Oliveiro:
On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:
Then tell me which language a) has this kind of mostly minimized
language facilities and b) lets you lay out data structures 1:1 as
they fit into memory (platform-dependent).
Python.
On Thu, 12 Sep 2024 18:50:11 -0400, James Kuyper wrote:Agreed.
On 9/12/24 18:32, Lawrence D'Oliveiro wrote:
On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:
A lot of early C++ programs I've seen were just, umm, "enhanced"
"C" programs.
Given that C++ makes “virtual” optional instead of standard
behaviour, I’d say that C++ is in fact designed to be used that
way.
Like many other aspects of C++, that was dictated by the necessity of retaining a certain minimum level of backwards compatibility with
existing C code.
No it wasn’t. OO was an entirely new feature, with no counterpart in
C, so there was nothing to maintain “backwards compatibility” with.
Am 12.09.2024 um 21:38 schrieb Michael S:
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
On Fri, 13 Sep 2024 07:27:47 +0200, Bonita Montero wrote:
Am 13.09.2024 um 04:43 schrieb Lawrence D'Oliveiro:
On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:
Then tell me which language a) has this kind of mostly minimized
language facilities and b) lets you lay out data structures 1:1 as
they fit into memory (platform-dependent).
Python.
Have a look at <https://gitlab.com/ldo/inotipy_examples/-/blob/master/fanotify_7_example?ref_type=heads>,
and compare the C original from <https://manpages.debian.org/7/fanotify.7.en.html>. The Python code is
half the size and can use high-level async calls.
On Fri, 13 Sep 2024 07:28:34 +0200
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Am 12.09.2024 um 21:38 schrieb Michael S:
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
So what is 'context' parameter in my code?
Am 13.09.2024 um 10:38 schrieb Michael S:
On Fri, 13 Sep 2024 07:28:34 +0200
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Am 12.09.2024 um 21:38 schrieb Michael S:
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
So what is 'context' parameter in my code?
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
Just a [&] and the lambda refers to the whole outer context.
On Fri, 13 Sep 2024 04:06:58 +0200, Janis Papanagnou wrote:
On 13.09.2024 00:32, Lawrence D'Oliveiro wrote:
On Thu, 12 Sep 2024 17:40:17 +0200, Janis Papanagnou wrote:
A lot of early C++ programs I've seen were just, umm, "enhanced" "C"
programs.
Given that C++ makes “virtual” optional instead of standard behaviour, I’d say that C++ is in fact designed to be used that way.
There are different semantics with and without a 'virtual' specification.
Precisely. And consider what the meaning of a non-virtual destructor is:
it is essentially always the wrong thing to do.
On Fri, 13 Sep 2024 14:12:32 +0200
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Am 13.09.2024 um 10:38 schrieb Michael S:
On Fri, 13 Sep 2024 07:28:34 +0200
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Am 12.09.2024 um 21:38 schrieb Michael S:
Callback is as easy in C as in C++.
Absolutely not because callbacks can't have state in C.
So what is 'context' parameter in my code?
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
So, do you admit that callback in C can have state?
Bad software engineering practice that easily leads to incomprehensible
code.
When in C++ and not in the mood for C style, I very much prefer functors. Conceptually they are the same as a C-style context, just with a little
syntactic sugar.
#include <stddef.h>
void parse(const char* src,
void (*OnToken)(const char* beg, size_t len, void* context),
void* context) {
char c0 = ' ', c1 = '\t';
const char* beg = 0;
for (;;src++) {
char c = *src;
if (c == c0 || c == c1 || c == 0) {
if (beg) {
OnToken(beg, src-beg, context);
c0 = ' ', c1 = '\t';
beg = 0;
}
if (c == 0)
break;
} else if (!beg) {
beg = src;
if (c == '"') {
c0 = c1 = c;
++beg;
}
}
}
}
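To make the point about stateful C callbacks concrete: the state travels through the `void *context` argument of `parse()`. The sketch below copies `parse()` as posted (only reindented and commented) and adds a hypothetical `struct collector` that gathers the tokens into an argv-like array. Its fixed sizes are arbitrary assumptions for the example.

```c
#include <stddef.h>
#include <string.h>

/* parse() as posted above, reindented and commented. */
void parse(const char* src,
  void (*OnToken)(const char* beg, size_t len, void* context),
  void* context) {
  char c0 = ' ', c1 = '\t';   /* current delimiters: blanks, or '"' twice */
  const char* beg = 0;
  for (;;src++) {
    char c = *src;
    if (c == c0 || c == c1 || c == 0) {
      if (beg) {
        OnToken(beg, src-beg, context);
        c0 = ' ', c1 = '\t';
        beg = 0;
      }
      if (c == 0)
        break;
    } else if (!beg) {
      beg = src;
      if (c == '"') {         /* quoted token: only '"' ends it */
        c0 = c1 = c;
        ++beg;
      }
    }
  }
}

/* Hypothetical state object; sizes are arbitrary for the sketch. */
struct collector {
  char  *argv[16];
  int    argc;
  char   store[256];          /* token text, each copy NUL-terminated */
  size_t used;
};

/* The callback's state arrives through the void *context argument. */
static void on_token(const char* beg, size_t len, void* context) {
  struct collector *col = context;
  if (col->argc >= 16 || col->used + len + 1 > sizeof col->store)
    return;                   /* sketch: silently drop tokens on overflow */
  char *dst = col->store + col->used;
  memcpy(dst, beg, len);
  dst[len] = '\0';
  col->argv[col->argc++] = dst;
  col->used += len + 1;
}
```

A caller would write `struct collector col = {0}; parse(cmdline, on_token, &col);` and then use `col.argc` and `col.argv`.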
What exactly does your response have to do with producing data structures with a predefined layout?
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:
What exactly does your response have to do with producing data structures with
a predefined layout?
Look at those structures: they have a specific predefined layout.
On Fri, 13 Sep 2024 14:12:32 +0200, Bonita Montero wrote:
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
But you need a calling convention that passes “this” explicitly.
On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:
On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:
What exactly your response has to do with producing data structures
with predefined layout?
Look at those structures: they have a specific predefined layout.
One link is a man-page with several C structs defined ...
Am 14.09.2024 um 00:24 schrieb Lawrence D'Oliveiro:
On Fri, 13 Sep 2024 14:12:32 +0200, Bonita Montero wrote:
In C++ the state is its own internal "this"-like object and you don't
need any explicit parameters.
But you need a calling convention that passes “this” explicitly.
That's not part of the C++-language.
On Fri, 13 Sep 2024 23:48:38 +0100, Bart wrote:
On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:
On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:
What exactly your response has to do with producing data structures
with predefined layout?
Look at those structures: they have a specific predefined layout.
One link is a man-page with several C structs defined ...
Correct. Structures that the Python wrapper is able to map exactly.
And the choice between which particular structure variants to use is
dynamic, dependent on the event type. So the Python wrapper is able to dynamically generate a suitable type-safe wrapper -- something that a statically-typed language cannot do.
So, where IS the struct defined in that Python code?
Michael S <already5chosen@yahoo.com> writes:
[..iterate over words in a string..]
#include <stddef.h>
void parse(const char* src,
void (*OnToken)(const char* beg, size_t len, void* context),
void* context) {
char c0 = ' ', c1 = '\t';
const char* beg = 0;
for (;;src++) {
char c = *src;
if (c == c0 || c == c1 || c == 0) {
if (beg) {
OnToken(beg, src-beg, context);
c0 = ' ', c1 = '\t';
beg = 0;
}
if (c == 0)
break;
} else if (!beg) {
beg = src;
if (c == '"') {
c0 = c1 = c;
++beg;
}
}
}
}
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool,
Gopher ); static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
Also, while formally the program is written in C, by spirit it's
something else. Maybe Lisp.
Lisp compilers are known to be very good at tail call elimination.
C compilers also can do it, but not reliably. In this particular
case I am afraid that common C compilers will implement it as
written, i.e. without turning recursion into iteration.
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
Can you give an example implementation of go->f() ?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or analogous non-standard macro.
Michael S <already5chosen@yahoo.com> writes:
[comments reordered]
Also, while formally the program is written in C, by spirit it's
something else. Maybe Lisp.
I would call it a functional style, but still C. Not a C style
as most people are used to seeing it, I grant you that. I still
think of it as C though.
Lisp compilers are known to be very good at tail call elimination.
C compilers also can do it, but not reliably. In this particular
case I am afraid that common C compilers will implement it as
written, i.e. without turning recursion into iteration.
I routinely use gcc and clang, and both are good at turning
this kind of mutual recursion into iteration (-Os or higher,
although clang was able to eliminate all the recursion at -O1).
I agree the recursion elimination is not as reliable as one
would like; in practice though I find it quite usable.
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Both as expected.
Latest icc still does not turn it into iteration along at least one
code path.
That's disappointing, but good to know.
Latest MSVC implements it as written, 100% recursion.
I'm not surprised at all. In my admittedly very limited experience,
MSVC is garbage.
Can you give an example implementation of go->f() ?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or analogous non-standard macro.
You say that like you think such macros don't have well-defined
behavior. If I needed such a macro probably I would just
define it myself (and would be confident that it would
work correctly).
In this case I don't need a macro because I would put the gopher
struct at the beginning of the containing struct. For example:
#include <stdio.h>
typedef struct {
struct gopher_s go;
unsigned words;
} WordCounter;
static void
print_word( Gopher go, const char *s, const char *t ){
WordCounter *context = (void*) go;
int n = t-s;
printf( " word: %.*s\n", n, s );
context->words ++;
}
int
main(){
WordCounter wc = { { print_word }, 0 };
char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";
words_do( words, &wc.go );
printf( "\n" );
printf( " There were %u words found\n", wc.words );
return 0;
}
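If the gopher were not the first member, the callback would need the container_of-style macro discussed above. This is a minimal sketch, assuming the usual `offsetof`-based definition (the same idea as the Linux kernel's `container_of`). `WordStats` and `count_word` are hypothetical names.

```c
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to one of
   its members, using offsetof. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

/* Hypothetical context in which the gopher is NOT the first member,
   so the plain (void*) cast used in WordCounter would not be valid. */
typedef struct {
    unsigned words;
    size_t   longest;
    struct gopher_s go;
} WordStats;

static void
count_word( Gopher go, const char *s, const char *t ){
    WordStats *ws = container_of( go, WordStats, go );
    size_t n = (size_t)(t - s);
    ws->words++;
    if (n > ws->longest)
        ws->longest = n;
}
```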
On Mon, 16 Sep 2024 00:52:26 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[comments reordered]
Also, while formally the program is written in C, by spirit it's
something else. Maybe Lisp.
I would call it a functional style, but still C. Not a C style
as most people are used to seeing it, I grant you that. I still
think of it as C though.
Lisp compilers are known to be very good at tail call elimination.
C compilers also can do it, but not reliably. In this particular
case I am afraid that common C compilers will implement it as
written, i.e. without turning recursion into iteration.
I routinely use gcc and clang, and both are good at turning
this kind of mutual recursion into iteration (-Os or higher,
although clang was able to eliminate all the recursion at -O1).
I agree the recursion elimination is not as reliable as one
would like; in practice though I find it quite usable.
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
That's disappointing, but good to know.
Latest MSVC implements it as written, 100% recursion.
I'm not surprised at all. In my admittedly very limited experience,
MSVC is garbage.
For the sort of code that is important to me, gcc, clang and MSVC tend to generate code of similar quality.
clang is the most likely of the three to sometimes unexpectedly
produce utter crap. On the other hand, it is sometimes the most
brilliant.
In the case of gcc, I hate that they recently put tree-slp-vectorize
under the -O2 umbrella.
Can you give an example implementation of go->f() ?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or analogous non-standard macro.
You say that like you think such macros don't have well-defined
behavior. If I needed such a macro probably I would just
define it myself (and would be confident that it would
work correctly).
In this case I don't need a macro because I would put the gopher
struct at the beginning of the containing struct. For example:
#include <stdio.h>
typedef struct {
struct gopher_s go;
unsigned words;
} WordCounter;
static void
print_word( Gopher go, const char *s, const char *t ){
WordCounter *context = (void*) go;
That's what I was missing. Simple and adequate.
int n = t-s;
printf( " word: %.*s\n", n, s );
context->words ++;
}
int
main(){
WordCounter wc = { { print_word }, 0 };
char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";
words_do( words, &wc.go );
printf( "\n" );
printf( " There were %u words found\n", wc.words );
return 0;
}
There are a couple of differences between your parsing and mine.
1. "42""43"
You parse it as a single word; I split it. It seems your behavior is
closer to that of both bash and cmd.exe.
2. I strip the " characters from "-delimited words. You seem to leave them in.
In this case what I do is more similar to both bash and cmd.exe.
Not that it matters.
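Difference 2 does not require changing the parser: the callback receives the raw `[s, t)` range and can drop the `"` characters itself. The sketch below reuses Tim's `words_do()`/`collect_word()` as posted, with the layout adjusted and parentheses added around `c == '"'` for readability, plus a hypothetical `Unquoter` gopher whose fixed-size storage is arbitrary. Because Tim's parser keeps `"42""43"` together as one word, the unquoted result is `4243`, which matches what bash produces.

```c
#include <stddef.h>
#include <string.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );

/* Tim's parser: 1 on success, 0 if the string ends inside quotes. */
_Bool
words_do( const char *s, Gopher go ){
    char c = *s;
    return
      is_space(c) ? words_do( s+1, go )
      : c         ? collect_word( s, s, 1, go )
      :             1;
}

static _Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;
    return
      c == 0             ? go->f( go, r, s ), w
      : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
      :                    collect_word( s+1, r, w ^ (c == '"'), go );
}

static _Bool
is_space( char c ){
    return c == ' ' || c == '\t';
}

/* Hypothetical gopher that stores each word with its '"' removed. */
typedef struct {
    struct gopher_s go;       /* first member, as in WordCounter */
    char out[8][32];
    int  n;
} Unquoter;

static void
unquote_word( Gopher go, const char *s, const char *t ){
    Unquoter *u = (void *) go;
    if (u->n >= 8) return;            /* sketch: drop extra words */
    char *d = u->out[u->n++];
    size_t k = 0;
    for (; s < t && k < sizeof u->out[0] - 1; s++)
        if (*s != '"')                /* skip the quote characters */
            d[k++] = *s;
    d[k] = '\0';
}
```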
On Fri, 13 Sep 2024 09:05:04 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool,
Gopher ); static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
Michael S <already5chosen@yahoo.com> wrote:
On Fri, 13 Sep 2024 09:05:04 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
static _Bool collect_word( const char *, const char *, _Bool,
Gopher ); static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
<snip>
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
are not tail calls and gcc 12 compiles them as normal calls.
The other calls are compiled to jumps. But the call to 'collect_word'
in 'words_do' is not a "sibcall" and, depending on the calling convention,
the compiler may treat it as a normal call. The two other calls, that is
the call to 'words_do' in 'words_do' and the call to 'collect_word' in
'collect_word', are clearly tail self-recursion and the compiler
should always optimize them to a jump.
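Since whether this mutual recursion becomes a loop depends on the compiler, here is a sketch of the same scan written with explicit loops, so it needs no tail-call elimination at all. `words_do_loop` is a hypothetical name. It is intended to keep the contract Tim described (1 on success, 0 if the string ends inside double quotes) and to call `go->f` with the same `[r, s)` ranges. `CountGopher` is a hypothetical gopher included only for trying it out.

```c
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

_Bool
words_do_loop( const char *s, Gopher go ){
    for (;;) {
        while (*s == ' ' || *s == '\t')     /* was: words_do( s+1, go ) */
            s++;
        if (*s == 0)
            return 1;
        const char *r = s;                  /* start of the current word */
        _Bool w = 1;                        /* 1 while outside double quotes */
        for (;; s++) {                      /* was: collect_word( s+1, ... ) */
            char c = *s;
            if (c == 0) {
                go->f( go, r, s );
                return w;
            }
            if ((c == ' ' || c == '\t') && w)
                break;
            w ^= (c == '"');
        }
        go->f( go, r, s );                  /* then back to skipping blanks */
    }
}

/* Hypothetical gopher that only counts words (gopher placed first). */
typedef struct { struct gopher_s go; int count; } CountGopher;

static void
count_one( Gopher go, const char *s, const char *t ){
    (void)s; (void)t;
    ((CountGopher *)(void *)go)->count++;
}
```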
On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
antispam@fricas.org wrote:
Michael S <already5chosen@yahoo.com> wrote:
On Fri, 13 Sep 2024 09:05:04 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * );
};
static _Bool collect_word( const char *, const char *, _Bool,
Gopher ); static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
<snip>
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
are not tail calls and gcc 12 compiles them as normal calls.
Naturally.
The other calls are compiled to jumps. But the call to 'collect_word'
in 'words_do' is not a "sibcall" and, depending on the calling convention,
the compiler may treat it as a normal call. The two other calls, that is
the call to 'words_do' in 'words_do' and the call to 'collect_word' in
'collect_word', are clearly tail self-recursion and the compiler
should always optimize them to a jump.
"Should" or not, MSVC does not eliminate them.
The funny thing is that it does eliminate all four calls after I rewrote
the code in more boring style.
static
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
#if 1
if (c == 0) {
go->f( go, r, s );
return w;
}
if (is_space(c) && w) {
go->f( go, r, s );
return words_do( s, go );
}
return collect_word( s+1, r, w ^ c == '"', go );
#else
return
c == 0 ? go->f( go, r, s ), w :
is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
/***************/ collect_word( s+1, r, w ^ c == '"', go );
#endif
}
On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
antispam@fricas.org wrote:
Michael S <already5chosen@yahoo.com> wrote:
On Fri, 13 Sep 2024 09:05:04 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * );
};
static _Bool collect_word( const char *, const char *, _Bool,
Gopher ); static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
<snip>
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
are not tail calls and gcc 12 compiles them as normal calls.
Naturally.
The other calls are compiled to jumps. But the call to 'collect_word'
in 'words_do' is not a "sibcall" and, depending on the calling convention,
the compiler may treat it as a normal call. The two other calls, that is
the call to 'words_do' in 'words_do' and the call to 'collect_word' in
'collect_word', are clearly tail self-recursion and the compiler
should always optimize them to a jump.
"Should" or not, MSVC does not eliminate them.
The funny thing is that it does eliminate all four calls after I rewrote
the code in more boring style.
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
#if 1
if (is_space(c))
return words_do( s+1, go );
if (c)
return collect_word( s, s, 1, go );
return 1;
#else
return
is_space(c) ? words_do( s+1, go ) :
c ? collect_word( s, s, 1, go ):
/***************/ 1;
#endif
}
static
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
#if 1
if (c == 0) {
go->f( go, r, s );
return w;
}
if (is_space(c) && w) {
go->f( go, r, s );
return words_do( s, go );
}
return collect_word( s+1, r, w ^ c == '"', go );
#else
return
c == 0 ? go->f( go, r, s ), w :
is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
/***************/ collect_word( s+1, r, w ^ c == '"', go );
#endif
}
"Should" or not, MSVC does not eliminate them.
That's amusing. :)
Do you know if icc will do tail call elimination for
the boring version of the code?
On Wed, 18 Sep 2024 02:46:11 +0300, Michael S wrote:
"Should" or not, MSVC does not eliminate them.
Another reason to stay away from MSVC?
On 18/09/2024 00:46, Michael S wrote:
On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
antispam@fricas.org wrote:
Michael S <already5chosen@yahoo.com> wrote:
On Fri, 13 Sep 2024 09:05:04 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[..iterate over words in a string..]
I couldn't resist writing some code along similar lines. The
entry point is words_do(), which returns one on success and
zero if the end of string is reached inside double quotes.
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char *
); };
static _Bool collect_word( const char *, const char *, _Bool,
Gopher ); static _Bool is_space( char );
_Bool
words_do( const char *s, Gopher go ){
char c = *s;
return
is_space(c) ? words_do( s+1, go )
: c ? collect_word( s, s, 1, go )
: /***************/ 1;
}
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
char c = *s;
return
c == 0 ? go->f( go, r, s ), w
: is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
: /***************/ collect_word( s+1, r, w ^ c == '"', go );
}
_Bool
is_space( char c ){
return c == ' ' || c == '\t';
}
<snip>
Tested on godbolt.
gcc -O2 turns it into iteration starting from v.4.4
clang -O2 turns it into iteration starting from v.4.0
Latest icc still does not turn it into iteration along at least
one code path.
Latest MSVC implements it as written, 100% recursion.
I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
are not tail calls and gcc 12 compiles them as normal calls.
Naturally.
The other calls are compiled to jumps. But the call to 'collect_word'
in 'words_do' is not a "sibcall" and, depending on the calling convention,
the compiler may treat it as a normal call. The two other calls, that is
the call to 'words_do' in 'words_do' and the call to 'collect_word' in
'collect_word', are clearly tail self-recursion and the compiler
should always optimize them to a jump.
"Should" or not, MSVC does not eliminate them.
The funny thing is that it does eliminate all four calls after I
rewrote the code in more boring style.
static
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;
#if 1
    if (c == 0) {
        go->f( go, r, s );
        return w;
    }
    if (is_space(c) && w) {
        go->f( go, r, s );
        return words_do( s, go );
    }
    return collect_word( s+1, r, w ^ c == '"', go );
#else
    return
        c == 0 ? go->f( go, r, s ), w :
        is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
        /***************/ collect_word( s+1, r, w ^ c == '"', go );
#endif
}
I find such a coding style pretty much impossible to grasp and
unpleasant to look at. I had to refactor it like this:
---------------
static _Bool collect_word(char *s, char *r, _Bool w, Gopher go ) {
    char c = *s;
#if 1
    if (c == 0) {
        go->f(go, r, s);
        return w;
    }
    if (is_space(c) && w) {
        go->f(go, r, s);
        return words_do(s, go);
    }
    return collect_word(s+1, r, (w ^ c) == '"', go);
#else
    if (c == 0) {
        go->f(go, r, s);
        return w;
    }
    else if (is_space(c) && w) {
        go->f(go, r, s);
        return words_do(s, go);
    }
    else {
        return collect_word(s+1, r, (w ^ c) = '"', go);
    }
#endif
}
---------------
When I'd finished, I realised that those two conditional blocks do
more or less the same thing! If that's what you mean by 'boring',
then I'm all for it.
On Wed, 18 Sep 2024 01:07:17 +0100
Bart <bc@freeuk.com> wrote:
return collect_word(s+1, r, (w ^ c) == '"', go);
That's not how it was written in the original. It should be:
return collect_word(s+1, r, w ^ c == '"', go);
Not the same thing at all.
The same here.
On 18/09/2024 09:43, Michael S wrote:
On Wed, 18 Sep 2024 01:07:17 +0100
Bart <bc@freeuk.com> wrote:
return collect_word(s+1, r, (w ^ c) == '"', go);
That's not how it was written in original. Should be:
return collect_word(s+1, r, w ^ c == '"', go);
Not the same thing at all.
So, what you are saying is that it means 'w ^ (c == '"')'? Because there could be some ambiguity, I put in the brackets. I had to guess the precedence and chose the one that seemed more plausible, but I guessed wrong.
My version should then be:
return collect_word(s+1, r, w ^ (c == '"'), go);
The same here.
I'm surprised there weren't more typos, but that's not what my post was about, which was presentation and layout.
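The precedence question can be settled directly. In C, == binds more tightly than ^, so w ^ c == '"' groups as w ^ (c == '"'). The functions below put the readings from this exchange side by side; their names are hypothetical. The first two agree for every character. The (w ^ c) == '"' reading differs, for example when w is 1 and c is 'a'. The #else variant with = does not compile at all, because (w ^ c) is not an lvalue.

```c
/* As written in the original code; == is evaluated before ^. */
int toggle_as_written( _Bool w, char c ){ return w ^ c == '"'; }

/* The same grouping made explicit (the corrected line above). */
int toggle_explicit( _Bool w, char c ){ return w ^ (c == '"'); }

/* The first refactor's reading: XOR w into the character code,
   then compare the result with '"'. */
int toggle_misread( _Bool w, char c ){ return (w ^ c) == '"'; }
```

The grouping matters in practice. With the misread form, w drops to 0 after the first ordinary character, and spaces then stop ending words.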
On Tue, 17 Sep 2024 18:31:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
That's amusing. :)
Do you know if icc will do tail call elimination for
the boring version of the code?
The output of 'icc -O2' does recursive inlining to a quite significant
depth, so it is rather hard to follow.
But it seems that the answer is "No".
Anyway, by now icc is mostly of historical interest.
Intel ceased independent compiler development 2-3 years ago and turned
into yet another LLVM/clang distributor.
Since I am not accustomed to the functional programming style, for
me even a boring variant [not shown] is way too entertaining. I
prefer the mundane (untested, could be buggy):
static
const char* collect_word(const char *s) {
    _Bool w = 0;
    char c;
    while ((c = *s) != 0) {
        if (!w && is_space(c))
            break;
        if (c == '"')
            w = !w;
        ++s;
    }
    return s;
}
void words_do(const char *s, Gopher go ){
    char c;
    while ((c = *s) != 0) {
        if (is_space(c)) {
            ++s;
        } else {
            const char *r = s;
            s = collect_word(s);
            go->f(go, r, s);
        }
    }
}
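The iterative version above returns void, so it no longer reports the unterminated-quote case that the recursive words_do signals by returning 0. The sketch below is a self-contained variant that keeps the same loops and restores that contract. It is not from the thread. The names collect_word_loop, words_do_loop, struct counter and count_word are hypothetical.

```c
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool is_space( char c ){ return c == ' ' || c == '\t'; }

/* Scan one word starting at s and return a pointer just past it.
   *in_quotes is set to 1 if the string ended inside "...". */
static const char *
collect_word_loop( const char *s, _Bool *in_quotes ){
    _Bool q = 0;                        /* 1 while inside double quotes */
    char c;
    while ((c = *s) != 0) {
        if (!q && is_space(c))
            break;
        if (c == '"')
            q = !q;
        ++s;
    }
    *in_quotes = q;
    return s;
}

/* Returns 1 on success, 0 if the string ends inside double quotes
   (the last, unterminated word is still passed to go->f first). */
_Bool
words_do_loop( const char *s, Gopher go ){
    char c;
    while ((c = *s) != 0) {
        if (is_space(c)) {
            ++s;
        } else {
            const char *r = s;
            _Bool unclosed;
            s = collect_word_loop( s, &unclosed );
            go->f( go, r, s );
            if (unclosed)
                return 0;               /* only possible when *s == 0 */
        }
    }
    return 1;
}

/* Hypothetical callback for trying it out: it only counts words. */
struct counter { struct gopher_s base; int n; };

void
count_word( Gopher go, const char *begin, const char *end ){
    (void)begin; (void)end;
    ((struct counter *)go)->n++;
}
```

The early return is safe. collect_word_loop can only finish with q set if it reached the terminating NUL, and no words remain at that point.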