In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
The 2nd difference: Assembly contains too many burdensome labels. In C, we use 'structure', for example:
while(a<b) { // 'while', '(', ')' may be the places for implicit labels
a+=1;
} // '}' is an implicit label
if(a<b) {
} else { // '{', '}' are implicit labels
} // ditto
The 3rd difference: The function calling convention in C is (mostly) reentrant.
The 4th difference: Local variables.
(Assembly can theoretically do the same, but I can't recall which assemblers support this feature.)
On 14/04/2026 15:47, wij wrote:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
What C's 'a=b' equates to in assembly could be anything, depending on
target machine, the types of 'a' and 'b', their scopes and linkage, the compiler used, and the optimisation levels employed.
The 2nd difference: Assembly contains too many burdensome labels. In C, we use
'structure', for example:
while(a<b) { // 'while', '(', ')' may be the places for implicit labels
a+=1;
} // '}' is an implicit label
if(a<b) {
} else { // '{', '}' are implicit labels
} // ditto
The 3rd difference: The function calling convention in C is (mostly) reentrant.
The 4th difference: Local variables.
(Assembly can theoretically do the same, but I can't recall which assemblers
support this feature.)
So basically, C and Assembly are NOT essentially the same. C has far
more abstractions: it is a HLL.
Anyway, IMO, 'portable assembly' is more descriptive.
And actually, there are at least a couple of language levels I've used
that sit between Assembly and C.
On Tue, 2026-04-14 at 18:45 +0100, Bart wrote:
On 14/04/2026 15:47, wij wrote:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
What C's 'a=b' equates to in assembly could be anything, depending on
target machine, the types of 'a' and 'b', their scopes and linkage, the
compiler used, and the optimisation levels employed.
The 2nd difference: Assembly contains too many burdensome labels. In C, we use
'structure', for example:
while(a<b) { // 'while', '(', ')' may be the places for implicit labels
a+=1;
} // '}' is an implicit label
if(a<b) {
} else { // '{', '}' are implicit labels
} // ditto
The 3rd difference: The function calling convention in C is (mostly) reentrant.
The 4th difference: Local variables.
(Assembly can theoretically do the same, but I can't recall which assemblers
support this feature.)
So basically, C and Assembly are NOT essentially the same. C has far
more abstractions: it is a HLL.
Anyway, IMO, 'portable assembly' is more descriptive.
'High-Level Language' is open to anyone's interpretation (and so prone to misinterpretation and
misunderstanding).
'Assembly' can also be like C:
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You can also call the above example 'C'. If so, you still eventually have to know how wide
int/char are while writing "a=b" (not a rare concern: programmers often struggle over which size to use). What does the 'abstraction' really mean? Maybe it eventually
comes back to int32_t and int8_t after long theoretical/philosophical pondering?
A HLL is just a 'style' favouring one specific purpose over another, for me. I am not
saying that is wrong; on the contrary, it is very helpful (measured by actual effort and
gain).
And actually, there are at least a couple of language levels I've used
that sit between Assembly and C.
On Tue, 2026-04-14 at 18:45 +0100, Bart wrote:
On 14/04/2026 15:47, wij wrote:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
What C's 'a=b' equates to in assembly could be anything, depending on
target machine, the types of 'a' and 'b', their scopes and linkage, the
compiler used, and the optimisation levels employed.
The 2nd difference: Assembly contains too many burdensome labels. In C, we use
'structure', for example:
while(a<b) { // 'while', '(', ')' may be the places for implicit labels
a+=1;
} // '}' is an implicit label
if(a<b) {
} else { // '{', '}' are implicit labels
} // ditto
The 3rd difference: The function calling convention in C is (mostly) reentrant.
The 4th difference: Local variables.
(Assembly can theoretically do the same, but I can't recall which assemblers
support this feature.)
So basically, C and Assembly are NOT essentially the same. C has far
more abstractions: it is a HLL.
Anyway, IMO, 'portable assembly' is more descriptive.
'High-Level Language' is open to anyone's interpretation (and so prone to misinterpretation and
misunderstanding).
'Assembly' can also be like C:
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You can also call the above example 'C'.
If so, you still eventually have to know how wide
int/char are while writing "a=b" (not a rare concern: programmers often struggle over which size to use). What does the 'abstraction' really mean? Maybe it eventually
comes back to int32_t and int8_t after long theoretical/philosophical pondering?
A HLL is just a 'style' favouring one specific purpose over another, for me. I am not
saying that is wrong; on the contrary, it is very helpful (measured by actual effort and
gain).
But if you want to call C some kind of assembler, even though it is
several levels above actual assembly, then that's up to you.
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
On 2026-04-14 23:41, Bart wrote:
But if you want to call C some kind of assembler, even though it is
several levels above actual assembly, then that's up to you.
Can you name and describe a couple of these "several levels above
actual assembly"? (Assembler macros might qualify as one level.)
Beyond the inherent subjective aspects of that or the OP's initial
statement I certainly see "C" closer to the machine than many HLLs.
It certainly depends on where one is coming from; from an abstract
or user-application level or from the machine level.
It has often been mentioned here - very much to the disdain of the
audience - that a lot of effort is necessary to implement simple
concepts. To jump on that bandwagon: how would, say, Awk's array
construct map[key] = value have to be modeled in (native) "C"?
(Note that this simple statement represents an associative array.)
"C" is abstracting from the machine. And the OP's initial statement
"C and assembly are essentially the same" may be nonsense.
wij <wyniijj5@gmail.com> writes:
[Repeat] 'Assembly' can also be like C:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
No, C is not any kind of assembly. Assembly language and C are fundamentally different.
An assembly language program specifies a sequence of CPU instructions.
A C program specifies run-time behavior. (A compiler generates CPU instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
All mentioned could also be implemented in assembly.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
(Or "mov b,a" depending on the assembly syntax.)
Nope. `a=b` could translate to a lot of different instruction
sequences. Either or both of the operands could be registers. There
might or might not be different "mov" instructions for integers, pointers, floating-point values. a and b could be large structs, and the
assignment might be translated to a call to memcpy(), or to equivalent
inline code.
Or the assignment might not result in any code at all, if the compiler
can prove that it has no side effects and the value of a is not used.
[...]
On 14/04/2026 19:41, wij wrote:
On Tue, 2026-04-14 at 18:45 +0100, Bart wrote:
On 14/04/2026 15:47, wij wrote:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
What C's 'a=b' equates to in assembly could be anything, depending on target machine, the types of 'a' and 'b', their scopes and linkage, the compiler used, and the optimisation levels employed.
The 2nd difference: Assembly contains too many burdensome labels. In C, we use
'structure', for example:
while(a<b) { // 'while', '(', ')' may be the places for implicit labels
a+=1;
} // '}' is an implicit label
if(a<b) {
} else { // '{', '}' are implicit labels
} // ditto
The 3rd difference: The function calling convention in C is (mostly) reentrant.
The 4th difference: Local variables.
(Assembly can theoretically do the same, but I can't recall which assemblers
support this feature.)
So basically, C and Assembly are NOT essentially the same. C has far
more abstractions: it is a HLL.
Anyway, IMO, 'portable assembly' is more descriptive.
'High-Level Language' is anyone's interpretation (prone to mis-interpretation and
misunderstanding).
'Assembly' can also be like C:
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You also can call the above example 'C'.
That's because it is pretty much C. It's not like any assembly I've ever seen!
If so, you still eventually have to know how wide
int/char are while writing "a=b" (not a rare concern: programmers often struggle over which size to use). What does the 'abstraction' really mean? Maybe it eventually
comes back to int32_t and int8_t after long theoretical/philosophical pondering?
Wrap the above into a viable C function. Paste it into godbolt.org, then look at the actual assembly that is generated for combinations of
target, compiler and options. All will be different. Some may not even generate any code for that loop. (You might also try Clang with -emit-llvm.)
Then change the types of a and b, say to floats or pointers, and do it again. The assembly will change yet again, even though you've modified nothing else. That is a characteristic of a HLL: you change one small
part, and the generated code changes across the program.
But if you want to call C some kind of assembler, even though it is
several levels above actual assembly, then that's up to you.
Do you program (read/write) IL directly?
A HLL is just a 'style' favouring one specific purpose over another, for me. I am not
saying that is wrong; on the contrary, it is very helpful (measured by actual effort and
gain).
Below C there are HLAs or high-level assemblers, which at one time were
also called machine-oriented languages, intended for humans to use. And
a little below that might be intermediate languages (ILs), usually machine-generated, intended for compiler backends.
ILs will be target-independent and so portable to some extent. I'd say
that 'portable assembly' fits those better.
(I've implemented, or devised and implemented, all the four levels
discussed here. There are also other languages in this space such as
PL/M, or Forth.)
I am not talking about compiler technology.
On Tue, 2026-04-14 at 15:31 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
No, C is not any kind of assembly. Assembly language and C are
fundamentally different.
An assembly language program specifies a sequence of CPU instructions.
[Repeat] 'Assembly' can also be like C:
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros).
A C program specifies run-time behavior. (A compiler generates CPU
instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
(Or "mov b,a" depending on the assembly syntax.)
Nope. `a=b` could translate to a lot of different instruction
sequences. Either or both of the operands could be registers. There
might or might not be different "mov" instructions for integers, pointers, floating-point values. a and b could be large structs, and the
assignment might be translated to a call to memcpy(), or to equivalent
inline code.
Or the assignment might not result in any code at all, if the compiler
can prove that it has no side effects and the value of a is not used.
[...]
All mentioned could also be implemented in assembly.
Note that I am not saying C is assembly.
C and assembly are essentially the same
wij <wyniijj5@gmail.com> writes:
I think you realize the example above is just an example to demo my idea.
On Tue, 2026-04-14 at 15:31 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
No, C is not any kind of assembly. Assembly language and C are fundamentally different.
An assembly language program specifies a sequence of CPU instructions.
[Repeat] 'Assembly' can also be like C:
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
Compiler? You said this was assembly.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
How/what do you specify 'run-time behavior'? Not based on CPU?
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros)
A C program specifies run-time behavior. (A compiler generates CPU instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between assembly and C.
When I heard 'sophisticated assemblers', I would think of something like my idea of 'portable' assembly, but maybe different.
In C, we don't explicitly specify how wide the register/memory unit is, we use
char/int (short/long, signed/unsigned) to denote the basic unit. I.e.
a=b; // equ. to "mov a,b"
(Or "mov b,a" depending on the assembly syntax.)
Nope. `a=b` could translate to a lot of different instruction sequences. Either or both of the operands could be registers. There might or might not be different "mov" instructions for integers, pointers,
floating-point values. a and b could be large structs, and the assignment might be translated to a call to memcpy(), or to equivalent inline code.
Or the assignment might not result in any code at all, if the compiler can prove that it has no side effects and the value of a is not used.
[...]
All mentioned could also be implemented in assembly.
Sure, many C compilers can generate assembly code. But I question your claim that an assembler can plausibly generate a call to memcpy() for something that looks like a simple assignment.
Many assemblers support macros, but the assembly language still
specifies the sequence of CPU instructions.
If you can cite a real-world "assembler" that behaves that way,
there might be something to discuss.
Note that I am not saying C is assembly.
You said that "C and assembly are essentially the same, maybe better
call it 'portable assembly'." I disagree.
I had a similar discussion here some time ago. As I recall, the
other participant repeatedly claimed that sophisticated assemblers
that don't generate specified sequences of CPU instructions are
common, but never provided an example. (I haven't been able to
track down the discussion.)
On 14/04/2026 23:20, Janis Papanagnou wrote:
On 2026-04-14 23:41, Bart wrote:
But if you want to call C some kind of assembler, even though it is
several levels above actual assembly, then that's up to you.
Can you name and describe a couple of these "several levels above
actual assembly"? (Assembler macros might qualify as one level.)
I said C is several levels above, and mentioned 2 categories and 2
specific ones that can be considered to be in-between.
Namely:
* HLAs (high-level assemblers) of various kinds, as this is a broad
category (see note)
* Intermediate languages (IRs/ILs) such as LLVM IR
* Forth
* PL/M (an old one; there was also C--, now dead)
(Note: the one I implemented was called 'Babbage', devised for the GEC
4000 machines. My task was to port it to DEC PDP10. There's something
about it 2/3 down this page: https://en.wikipedia.org/wiki/GEC_4000_series)
Beyond the inherent subjective aspects of that or the OP's initial
statement I certainly see "C" closer to the machine than many HLLs.
I see it as striving to distance itself from the machine as much as possible!
Certainly until C99 when stdint.h came along.
For example:
* Not committing to actual machine types, widths or representations,
such as a 'byte', or 'twos complement'.
* Being vague about the relations between the different integer types
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
It certainly depends on where one is coming from; from an abstract
or user-application level or from the machine level.
It has often been mentioned here - very much to the disdain of the
audience - that a lot of effort is necessary to implement simple
concepts. To jump on that bandwagon: how would, say, Awk's array
construct map[key] = value have to be modeled in (native) "C"?
(Note that this simple statement represents an associative array.)
"C" is abstracting from the machine. And the OP's initial statement
"C and assembly are essentially the same" may be nonsense.
Actually, describing C as 'portable assembly' annoys me, which is why I
went into some detail.
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
On Tue, 2026-04-14 at 15:31 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
In attempting to write a simple language, I had a thought about what a language is that I want to share, because I saw many people stuck in thinking that C/C++ (or another high-level language) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve any human description of an idea.
C and assembly are essentially the same, maybe better call it 'portable assembly'.
No, C is not any kind of assembly. Assembly language and C are
fundamentally different.
An assembly language program specifies a sequence of CPU instructions.
[Repeat] 'Assembly' can also be like C:
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
Compiler? You said this was assembly.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros)
A C program specifies run-time behavior. (A compiler generates CPU
instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between
assembly and C.
How/what do you specify 'run-time behavior'? Not based on CPU?
E.g. in C, int types are fixed-size, and have range, wrap-around, alignment,
'atomic' and 'overlapping' properties; you cannot really hide these and still understand or
program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY to get rid of these (hardware) features, no matter
how high-level one thinks C is or expects C to be.
On Tue, 2026-04-14 at 22:41 +0100, Bart wrote:
A HLL is just a 'style' favouring one specific purpose over another, for me. I am not
saying that is wrong; on the contrary, it is very helpful (measured by actual effort and
gain).
Below C there are HLAs or high-level assemblers, which at one time were
also called machine-oriented languages, intended for humans to use. And
a little below that might be intermediate languages (ILs), usually
machine-generated, intended for compiler backends.
ILs will be target-independent and so portable to some extent. I'd say
that 'portable assembly' fits those better.
Do you program (read/write) IL directly?
I am talking about the language that humans use directly.
(I've implemented, or devised and implemented, all the four levels
discussed here. There are also other languages in this space such as
PL/M, or Forth.)
I am not talking about compiler technology.
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros)
A C program specifies run-time behavior. (A compiler generates CPU
instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between
assembly and C.
How/what do you specify 'run-time behavior'? Not based on CPU?
E.g. in C, int types are fixed-size, and have range, wrap-around, alignment,
'atomic' and 'overlapping' properties; you cannot really hide these and still understand or
program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY to get rid of these (hardware) features, no matter
how high-level one thinks C is or expects C to be.
I had a similar discussion here some time ago. As I recall, the
other participant repeatedly claimed that sophisticated assemblers
that don't generate specified sequences of CPU instructions are
common, but never provided an example. (I haven't been able to
track down the discussion.)
When I heard 'sophisticated assemblers', I would think of something like my idea of
'portable' assembly, but maybe different.
One of my points should be clear, as stated in the int example above: "... C has NO WAY
to get rid of these (hardware) features, no matter how high-level one thinks C is or
expects C to be."
On 15/04/2026 05:20, wij wrote:
There are many reasons why people have chosen C. I agree the dominant one
On Tue, 2026-04-14 at 22:41 +0100, Bart wrote:
A HLL is just a 'style' favouring one specific purpose over another, for me. I am not
saying that is wrong; on the contrary, it is very helpful (measured by actual effort and
gain).
Below C there are HLAs or high-level assemblers, which at one time were also called machine-oriented languages, intended for humans to use. And
a little below that might be intermediate languages (ILs), usually machine-generated, intended for compiler backends.
ILs will be target-independent and so portable to some extent. I'd say that 'portable assembly' fits those better.
Do you program (read/write) IL directly?
I am talking about the language that humans use directly.
It is possible to write IL directly, when a textual form of it exists.
Not many do that, but then not many write assembly these days either, /because more convenient higher level languages exist/, one of them being C.
Why do /you/ think that people prefer to use C to write programs rather
than assembly, if they are 'essentially the same'?
(I've implemented, or devised and implemented, all the four levels discussed here. There are also other languages in this space such as PL/M, or Forth.)
I am not talking about compiler technology.
You claimed that C and assembler are at pretty much the same level. I'm
saying that they are not only at different levels, but other levels
exist, and I know because I've used them!
This argument about 'levels' seems to be based on the engineering of compilers for multiple languages. My point of view is based on the theory of computation, and maybe on psychological recognition.
A compiler can choose to translate a language to any of those levels, including to C (usually from a language at a higher level than C).
On 15/04/2026 07:05, wij wrote:
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
So you've invented an 'assembly' syntax that looks exactly like C, in
order to support your notion that C and assembly are really the same thing!
Real assembly generally uses explicit instructions and labels rather
than the implicit ones used here. It would also have limits on the complexity of expressions. If your pseudo-assembler supports:
a = b+c*f(x,y);
then you've invented a HLL.
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros)
A C program specifies run-time behavior. (A compiler generates CPU instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', no exact instructions.
Yes, that's what I said. And that's the fundamental difference between assembly and C.
How/what do you specify 'run-time behavior'? Not based on CPU?
E.g. in C, int types are fixed-size, have range, wrap-around, alignment
and 'atomic','overlapping' properties, you cannot really understand or hide it and
program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY get rid of these (hard-ware) features, no matter
how high-level one thinks C is or expect C would be.
There are a dozen or more HLLs that have exactly such a set of integer types. Actually, those have fixed-width integers with fixed ranges, wrap-around behaviour, twos complement format and so on, even more so
than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even
more closely tied to the machine than C is. (In C, built-in types are
not sized, but have minimum widths, and until C23, integer
representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
Thanks for the example. I did not stress 'C is assembly', maybe it is
I had a similar discussion here some time ago. As I recall, the
other participant repeatedly claimed that sophisticated assemblers
that don't generate specified sequences of CPU instructions are
common, but never provided an example. (I haven't been able to
track down the discussion.)
When I heard 'sophisticated assemblers', I would think of something like my idea of
'portable' assembly, but maybe different.
One of my points should be clear, as stated in the int example above: "... C has NO WAY
to get rid of these (hardware) features, no matter how high-level one thinks C is or
expects C to be."
Starting with C23, C has _BitInt, where you can define a 1000000-bit
integer type if you want. (There may be limits as to how big.)
Or a 37-bit type.
While I don't agree with such a feature for this language (partly
/because/ it is a big departure from machine types), it is a
counter-example to your point.
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
Typo: 'structured assembly'
On 15/04/2026 07:05, wij wrote:
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
So you've invented an 'assembly' syntax that looks exactly like C, in order to support your notion that C and assembly are really the same thing!
Exactly. But not really 'invented'. I figured that if anyone wants to implement
a 'portable assembly', he would find it not much different from C (from the example shown, 'structured C'). So, in a sense, it is not worth implementing.
Real assembly generally uses explicit instructions and labels rather
than the implicit ones used here. It would also have limits on the complexity of expressions. If your pseudo-assembler supports:
a = b+c*f(x,y);
then you've invented a HLL.
You may say that.
Typo: char carr[sizeof(float)];
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros).
A C program specifies run-time behavior. (A compiler generates CPU
instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between
assembly and C.
How/what do you use to specify 'run-time behavior'? Is it not based on the CPU?
E.g. in C, int types are fixed-size and have range, wrap-around, alignment and 'atomic'/'overlapping' properties; you cannot really hide these and still understand or
program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY to get rid of these (hardware) features, no matter
how high-level one thinks C is or expects C to be.
There are a dozen or more HLLs that have exactly such a set of integer types. Actually, those have fixed-width integers with fixed ranges, wrap-around behaviour, twos complement format and so on, even more so
than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even more closely tied to the machine than C is. (In C, built-in types are
not sized, but have minimum widths, and until C23, integer
representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
I claim C is (maybe I should use 'may be'; sometimes I feel the conversation is difficult) 'portable assembly' because C (a subset) could map to 'assembly',
and in a sense has to. E.g.
int p2; // p2 is connected to external hardware
p2=0;
p2=0; // significant: the hardware knows the second 'touch' triggers a different
      // action (or it is for delay purposes)
And, with a union, I don't see how 'high-level' can explain the official way to read/write part of a float object:
union {
char carr[sizeof(uint64_t)]; // C++ guarantees sizeof(char)==1
float f;
}
I had a similar discussion here some time ago. As I recall, the
other participant repeatedly claimed that sophisticated assemblers
that don't generate specified sequences of CPU instructions are
common, but never provided an example. (I haven't been able to
track down the discussion.)
When I heard 'sophisticated assemblers', I would think something like my idea of
'portable' assembly, but maybe different.
One of my points should be clear, as stated in the above int example: "... C has NO WAY to
get rid of these (hardware) features, no matter how high-level one thinks C is or
expects C to be."
Starting with C23, C has _BitInt, where you can define a 1000000-bit integer type if you want. (There may be limits as to how big.)
Or a 37-bit type.
While I don't agree with such a feature for this language (partly /because/ it is a big departure from machine types), it is a counter-example to your point.
Thanks for the example. I did not stress 'C is assembly'; maybe it is
because I saw too many Bonita-type programming concepts that I stress 'portable assembly' (also I think it may be helpful to others).
My understanding of C is that the development of C came simply from practical needs
(i.e. little of C is from 'theoretical imagination'). Maybe _BitInt is the same, but
I don't know.
On Wed, 2026-04-15 at 11:21 +0100, Bart wrote:
On 15/04/2026 05:20, wij wrote:
On Tue, 2026-04-14 at 22:41 +0100, Bart wrote:
An HLL is just a 'style' favouring one specific purpose over another, for me. I am not
saying it is wrong; rather, it is very helpful (measured by actual effort and
gain).
Below C there are HLAs or high-level assemblers, which at one time were
also called machine-oriented languages, intended for humans to use. And
a little below that might be intermediate languages (ILs), usually
machine-generated, intended for compiler backends.
ILs will be target-independent and so portable to some extent. I'd say
that 'portable assembly' fits those better.
Do you program (read/write) IL directly?
I am talking about the language that human uses directly.
It is possible to write IL directly, when a textual form of it exists.
Not many do that, but then not many write assembly these days either,
/because more convenient higher level languages exist/, one of them being C.
Why do /you/ think that people prefer to use C to write programs rather
than assembly, if they are 'essentially the same'?
There are many reasons why people have chosen C. I agree the dominant one
is support and convenience.
(I've implemented, or devised and implemented, all the four levels
discussed here. There are also other languages in this space such as
PL/M, or Forth.)
I am not talking about compiler technology.
You claimed that C and assembler are at pretty much the same level. I'm
saying that they are not only at different levels, but other levels
exist, and I know because I've used them!
A compiler can choose to translate a language to any of those levels,
including C (from a higher level language than C usually).
This argument about 'levels' seems based on the engineering of compilers for multiple languages. My point of view is based on the theory of computation and maybe psychological recognition.
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
There are a dozen or more HLLs that have exactly such a set of integer
types. Actually, those have fixed-width integers with fixed ranges,
wrap-around behaviour, twos complement format and so on, even more so
than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even
more closely tied to the machine than C is. (In C, built-in types are
not sized, but have minimum widths, and until C23, integer
representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
I claim C is (maybe I should use 'may be'; sometimes I feel the conversation is difficult) 'portable assembly' because C (a subset) could map to 'assembly',
and in a sense has to. E.g.
int p2; // p2 is connected to external hardware
p2=0;
p2=0; // significant: the hardware knows the second 'touch' triggers a different
      // action (or it is for delay purposes)
The 4th difference: Local variable.
(Assembly can theoretically do the same, but I have no recollection of which assembler supports this feature.)
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
On 15/04/2026 07:05, wij wrote:
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
So you've invented an 'assembly' syntax that looks exactly like C, in
order to support your notion that C and assembly are really the same thing!
Exactly. But not really 'invented'. I figured that if anyone wants to implement
a 'portable assembly', he would find it not much different from C (from the example shown, 'structured C'). So, in a sense, it is not worth implementing.
Real assembly generally uses explicit instructions and labels rather
than the implicit ones used here. It would also have limits on the
complexity of expressions. If your pseudo-assembler supports:
a = b+c*f(x,y);
then you've invented a HLL.
You may say that.
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros).
A C program specifies run-time behavior. (A compiler generates CPU
instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between
assembly and C.
How/what do you use to specify 'run-time behavior'? Is it not based on the CPU?
E.g. in C, int types are fixed-size and have range, wrap-around, alignment
and 'atomic'/'overlapping' properties; you cannot really hide these and still understand or
program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY to get rid of these (hardware) features, no matter
how high-level one thinks C is or expects C to be.
There are a dozen or more HLLs that have exactly such a set of integer
types. Actually, those have fixed-width integers with fixed ranges,
wrap-around behaviour, twos complement format and so on, even more so
than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even
more closely tied to the machine than C is. (In C, built-in types are
not sized, but have minimum widths, and until C23, integer
representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
I claim C is (maybe I should use 'may be'; sometimes I feel the conversation is difficult) 'portable assembly' because C (a subset) could map to 'assembly',
and in a sense has to. E.g.
int p2; // p2 is connected to external hardware
p2=0;
p2=0; // significant: the hardware knows the second 'touch' triggers a different
      // action (or it is for delay purposes)
And, with a union, I don't see how 'high-level' can explain the official way to read/write part of a float object:
union {
char carr[sizeof(float)]; // C++ guarantees sizeof(char)==1
float f;
}
The 4th difference: Local variable.
(Assembly can theoretically do the same, but I have no recollection of which assembler
supports this feature.)
If you are talking about function-local data, there are multiple ways
to store it in an easy-to-clean-up fashion:
- Volatile registers, for the shortest lived data. Calling other
functions causes them to be overwritten with the function's
return value or irrelevant data.
- Non-volatile registers, for data that need to persist across
function calls. You save the contents of them before using them,
as your caller expects the contents of these registers to be
intact once you return.
- The stack, for long-lived function-local data when you are out of
non-volatile registers. You manipulate a dedicated stack pointer
register to allocate and deallocate space for your data.
- Immediates, and the .rodata (ELF) / .rdata (PE) section, for
constants and tables of constants.
The notion of local variable allows you to ignore all of these in C,
though. Assembly having multiple ways to store local data instead of
one can make things fairly complicated to read, write and debug.
(forwarding to alt.lang.asm because you are comparing C with it)
On 15/04/2026 12:52, wij wrote:
On Wed, 2026-04-15 at 11:21 +0100, Bart wrote:
On 15/04/2026 05:20, wij wrote:
On Tue, 2026-04-14 at 22:41 +0100, Bart wrote:
An HLL is just a 'style' favouring one specific purpose over another, for me. I am not
saying it is wrong; rather, it is very helpful (measured by actual effort and
gain).
Below C there are HLAs or high-level assemblers, which at one time were
also called machine-oriented languages, intended for humans to use. And
a little below that might be intermediate languages (ILs), usually machine-generated, intended for compiler backends.
ILs will be target-independent and so portable to some extent. I'd say
that 'portable assembly' fits those better.
Do you program (read/write) IL directly?
I am talking about the language that human uses directly.
It is possible to write IL directly, when a textual form of it exists.
Not many do that, but then not many write assembly these days either, /because more convenient higher level languages exist/, one of them being C.
Why do /you/ think that people prefer to use C to write programs rather than assembly, if they are 'essentially the same'?
There are many reasons why people have chosen C. I agree the dominant one
is support and convenience.
(I've implemented, or devised and implemented, all the four levels discussed here. There are also other languages in this space such as PL/M, or Forth.)
I am not talking about compiler technology.
You claimed that C and assembler are at pretty much the same level. I'm saying that they are not only at different levels, but other levels exist, and I know because I've used them!
A compiler can choose to translate a language to any of those levels, including C (from a higher level language than C usually).
This argument about 'levels' seems based on the engineering of compilers for multiple languages. My point of view is based on the theory of computation and maybe psychological recognition.
If you take syntax out of the equation, and then 'lower' what's left
(i.e. flatten the various abstractions), then probably you can compare
the behaviour of a lot of languages with assembly.
However, those things are exactly what HLLs are about, while that
removing of syntax and lowering is exactly what compilers do.
That's why we use HLLs and not ASM unless we need to.
I guess yes, every fundamental of computation is the same (e.g. assembly and a TM), yet you might be surprised: assembly (or a TM) is more powerful than any other formal 'language' (formal system), including those seen in math/logic.
On 15/04/2026 14:21, wij wrote:
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
There are a dozen or more HLLs that have exactly such a set of integer types. Actually, those have fixed-width integers with fixed ranges, wrap-around behaviour, twos complement format and so on, even more so than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even more closely tied to the machine than C is. (In C, built-in types are
not sized, but have minimum widths, and until C23, integer
representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
I claim C is (maybe I should use 'may be'; sometimes I feel the conversation
is difficult) 'portable assembly' because C (a subset) could map to 'assembly',
and in a sense has to. E.g.
int p2; // p2 is connected to external hardware
p2=0;
p2=0; // significant: the hardware knows the second 'touch' triggers a different
      // action (or it is for delay purposes)
You are not making any sense. I don't think you understand what C is,
how the language is defined, or how typical C implementations work.
In C, when you write the code above there is /nothing/ to suggest that
there should be two actions.
C compilers can - and many will - combine
the two "p2 = 0;" statements. This is critical to understanding why C
is not in any sense an "assembler".
In assembly languages, if you write
the equivalent of "p2 = 0;" twice, you get the appropriate opcode twice.
In C, the language does not require an operation for the statement "p2 = 0;". It requires that after that statement, any observable behaviour produced by the program will be as if the value 0 had been assigned to
the object "p2".
Repeating that same requirement does not change it -
the compiler does not have to implement "p2 = 0;" twice. (It is
free to do so twice - or two hundred times if it likes. And if the
value of p2 is not used, it can be completely eliminated.)
Have you actually done any C programming at all?
I think the local variable thing (of my trial) depends on the program model (computation model). Thanks for the info; your suggestion relates to more complicated stuff than I have encountered (I am a compiler newbie).
The 4th difference: Local variable.
(Assembly can theoretically do the same, but I have no recollection of which assembler
supports this feature.)
If you are talking about function-local data, there are multiple ways
to store it in an easy-to-clean-up fashion:
- Volatile registers, for the shortest lived data. Calling other functions causes them to be overwritten with the function's
return value or irrelevant data.
- Non-volatile registers, for data that need to persist across
function calls. You save the contents of them before using them, as your caller expects the contents of these registers to be intact once you return.
- The stack, for long-lived function-local data when you are out of non-volatile registers. You manipulate a dedicated stack pointer register to allocate and deallocate space for your data.
- Immediates, and the .rodata (ELF) / .rdata (PE) section, for
constants and tables of constants.
The notion of local variable allows you to ignore all of these in C,
though. Assembly having multiple ways to store local data instead of
one can make things fairly complicated to read, write and debug.
(forwarding to alt.lang.asm because you are comparing C with it)
On 15.04.2026 at 15:40, makendo wrote:
(forwarding to alt.lang.asm because you are comparing C with it)
Great, but what is wrong with comp.lang.asm? I subscribe to it instead of any other asm-related groups. Is this the wrong approach?
--
Jacek Marcin Jaworski, Pruszcz Gd., Pomorskie voivodeship, Poland 🇵🇱, EU 🇪🇺;
tel.: +48-609-170-742, preferably between 5:00-5:55 or 16:00-17:25; <jmj@energokod.gda.pl>, gpg: 4A541AA7A6E872318B85D7F6A651CC39244B0BFA;
Home WWW page: <https://energokod.gda.pl>;
Mini Netiquette: <https://energokod.gda.pl/MiniNetykieta.html>; Email Self-Defense: <https://emailselfdefense.fsf.org/pl>.
NOTE:
DON'T TAKE ON "HIDDEN DEBT"! PAY FOR FOSS SOFTWARE AND ONLINE INFORMATION! READ THE FREE "17th Totaliztic Report - Patrons Versus Bankers": <https://energokod.gda.pl/raporty-totaliztyczne/17.%20Patroni%20Kontra%20Bankierzy.pdf>
On Wed, 2026-04-15 at 15:38 +0200, David Brown wrote:
On 15/04/2026 14:21, wij wrote:
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
There are a dozen or more HLLs that have exactly such a set of integer
types. Actually, those have fixed-width integers with fixed ranges,
wrap-around behaviour, twos complement format and so on, even more so
than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even
more closely tied to the machine than C is. (In C, built-in types are
not sized, but have minimum widths, and until C23, integer
representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
I claim C is (maybe I should use 'may be'; sometimes I feel the conversation
is difficult) 'portable assembly' because C (a subset) could map to 'assembly',
and in a sense has to. E.g.
int p2; // p2 is connected to external hardware
p2=0;
p2=0; // significant: the hardware knows the second 'touch' triggers a different
      // action (or it is for delay purposes)
You are not making any sense. I don't think you understand what C is,
how the language is defined, or how typical C implementations work.
I switched from C to C++ 30 years ago.
But that is 'theoretical'; I see things
from the real-world side. I think you approach 'C' from standard documents, and that is
not the way to understanding. You cannot understand the world by reading the bible.
In C, when you write the code above there is /nothing/ to suggest that
there should be two actions.
As far as I know, 'old-time' C had no optimization.
C compilers can - and many will - combine
the two "p2 = 0;" statements. This is critical to understanding why C
is not in any sense an "assembler".
Not a valid reason.
In assembly languages, if you write
the equivalent of "p2 = 0;" twice, you get the appropriate opcode twice.
An assembler (or assembly language) could also do the same optimization.
In C, the language does not require an operation for the statement "p2 =
0;". It requires that after that statement, any observable behaviour
produced by the program will be as if the value 0 had been assigned to
the object "p2".
You need a model now by saying so.
Repeating that same requirement does not change it -
the compiler does not have to implement "p2 = 0;" twice. (It is
free to do so twice - or two hundred times if it likes. And if the
value of p2 is not used, it can be completely eliminated.)
Have you actually done any C programming at all?
Nope, I quit C (but I keep watching C, since part of C++ is C)
On Wed, 15 Apr 2026 20:23:52 +0200
🇵🇱Jacek Marcin Jaworski🇵🇱<jmj@energokod.gda.pl> wrote:
On 15.04.2026 at 15:40, makendo wrote:
(forwarding to alt.lang.asm because you are comparing C with it)
Great, but what is wrong with comp.lang.asm? I subscribe to it instead of any
other asm-related groups. Is this the wrong approach?
DYM comp.lang.asm.x86?
comp.lang.asm is an empty header for me on eternal september's feed.
The newsgroup comp.lang.asm is generally considered an unmoderated Usenet group. Unlike comp.lang.asm.x86, which is known to be moderated, comp.lang.asm does not have an official moderation process and typically allows posts to appear without prior review.
sig is overlong, and crowded, IMHO.
On 15/04/2026 13:21, wij wrote:
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
On 15/04/2026 07:05, wij wrote:
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
So you've invented an 'assembly' syntax that looks exactly like C, in order to support your notion that C and assembly are really the same thing!
Exactly. But not really 'invented'. I figured that if anyone wants to implement a 'portable assembly', he would find it not much different from C (from the example shown, 'structured C'). So, in a sense, it is not worth implementing.
Real assembly generally uses explicit instructions and labels rather
than the implicit ones used here. It would also have limits on the complexity of expressions. If your pseudo-assembler supports:
a = b+c*f(x,y);
then you've invented a HLL.
You may say that.
It sounds like you don't understand the difference between a low-level language and a high-level one.
These days C might be considered mid-level (I call it a lower-level HLL, because so many HLLs are much higher level and more abstract).
Compiling a HLL involves lowering it to a different representation, say
from language A to language B.
But just because that translation happens to be routine doesn't mean
that A is essentially B.
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros).
A C program specifies run-time behavior. (A compiler generates CPU
instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between
assembly and C.
How/what do you use to specify 'run-time behavior'? Is it not based on the CPU?
E.g. in C, int types are fixed-size and have range, wrap-around, alignment and 'atomic'/'overlapping' properties; you cannot really hide these and still understand or
program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY to get rid of these (hardware) features, no matter
how high-level one thinks C is or expects C to be.
There are a dozen or more HLLs that have exactly such a set of integer types. Actually, those have fixed-width integers with fixed ranges, wrap-around behaviour, twos complement format and so on, even more so than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even more closely tied to the machine than C is. (In C, built-in types are
not sized, but have minimum widths, and until C23, integer
representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
I claim C is (maybe I should use 'may be'; sometimes I feel the conversation
is difficult) 'portable assembly' because C (a subset) could map to 'assembly',
and in a sense has to. E.g.
int p2; // p2 is connected to external hardware
p2=0;
p2=0; // significant: the hardware knows the second 'touch' triggers a different
      // action (or it is for delay purposes)
That won't work in C. 'p2' is likely to be in a register; that extra
write may be elided.
You'd have to use 'volatile' to guard against that. But you still can't control where p2 is put into memory. C /is/ used for this stuff, but all sorts of special extensions, or compiler specifics, may be employed.
In assembly it's much easier.
I have a Soft-CPU class you might be interested in, suitable for various kinds of script languages. The idea should not be too difficult to implement in C.
And, with a union, I don't see how 'high-level' can explain the official way to read/write part of a float object:
union {
char carr[sizeof(float)]; // C++ guarantees sizeof(char)==1
float f;
}
(Fixed that sizeof.)
I normally use my own systems language. That one is aligned much more directly to hardware than C is, even though it is marginally higher level.
This is because C is intended to work on any possible hardware, while mine
was created to work with one target at a time.
Also, when I started on mine (c. 1982 rather than 1972), hardware was already standardising on 8-bit bytes, byte-addressed, power-of-two word sizes, and twos-complement integers.
I don't however consider my language to be a form of assembly for lots
of reasons already mentioned.
Its compilers use 3 internal representations before it gets to native code:
HLL source -> AST -> IL -> MCL -> Native
'MCL' is the internal representation of the native code. If I need ASM output, then MCL can be dumped into a suitable syntax (I support 4
different ASM syntaxes for x64).
This MCL/ASM itself has abstractions, so the same 'MOV' mnemonic is used
for dozens of different move instructions that each have different
binary opcodes.
On 15/04/2026 18:58, wij wrote:
On Wed, 2026-04-15 at 15:38 +0200, David Brown wrote:
On 15/04/2026 14:21, wij wrote:
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
There are a dozen or more HLLs that have exactly such a set of integer
types. Actually, those have fixed-width integers with fixed ranges, wrap-around behaviour, twos complement format and so on, even more so than C.
So those HLLs (that is, C++, C#, D, Rust, Java, Zig, Go, ...) are even
more closely tied to the machine than C is. (In C, built-in types are not sized, but have minimum widths, and until C23, integer representation was not specified.)
Would you claim that those are also essentially assembly? If not, why not?
I claim C is (maybe I should use 'may be'; sometimes I feel the conversation
is difficult) 'portable assembly' because C (a subset) could map to 'assembly',
and in a sense has to. E.g.
int p2; // p2 is connected to external hardware
p2=0;
p2=0; // significant: the hardware knows the second 'touch' triggers a different
      // action (or it is for delay purposes)
You are not making any sense. I don't think you understand what C is, how the language is defined, or how typical C implementations work.
I switched from C to C++ 30 years ago.
I don't think you understand C++ either. In the context of this discussion, it is not different from C.
But that is 'theoretical'; I see things
from the real-world side. I think you approach 'C' from standard documents, and that is
not the way to understanding. You cannot understand the world by reading
the bible.
No, I understand C and C++ from using them in real-world code - as well
as knowing what the code means and what is guaranteed by the language. Practical experience tells you what works well in practice - but
theoretical knowledge tells you what you can expect so that you are not
just programming by luck and "it worked for me when I tried it".
In C, when you write the code above there is /nothing/ to suggest that there should be two actions.
As far as I know, 'old-time' C had no optimization.
Nonsense.
Modern C compilers often do more optimisation than older ones, but there
was never a "pre-optimisation" world. Things like eliminating dead
code, or optimising based on knowing that signed integer overflow never occurs in a correct program, have been around from early tools. I have used heavily optimising compilers for 30 years.
C compilers can - and many will - combine
the two "p2 = 0;" statements. This is critical to understanding why C is not in any sense an "assembler".
Not a valid reason.
What do you mean by that? It's a fact, not a "reason".
In assembly languages, if you write
the equivalent of "p2 = 0;" twice, you get the appropriate opcode twice.
An assembler (or assembly language) could also do the same optimization.
No, assemblers cannot do that - if they did, they would not be
assemblers. An assembler directly translates your instructions from mnemonic codes (assembly instructions) to binary opcodes. Some
assemblers might have pseudo-instructions that translate to more than
one binary opcode, but always in a specific defined pattern.
In C, the language does not require an operation for the statement "p2 =
0;". It requires that after that statement, any observable behaviour produced by the program will be as if the value 0 had been assigned to the object "p2".
You need a model now by saying so.
Again, I don't know what you are trying to say.
Repeating that same requirement does not change it -
the compiler does not have to implement "p2 = 0;" twice. (It is free to do so twice - or two hundred times if it likes. And if the value of p2 is not used, it can be completely eliminated.)
Have you actually done any C programming at all?
Nope, I quit C (but I keep watching C, since part of C++ is C)
Okay, have you ever actually done any C++ programming? The languages
share the same philosophy here.
You are really a sick person. Loser of the real world. You just don't know yourself.
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
On Tue, 2026-04-14 at 15:31 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
[Repeat] 'Assembly' can also be like C:
In attempting to write a simple language, I had a thought about what a language is,
to share. Because I saw many people are stuck in thinking C/C++ (or other
high-level languages) can be so abstract, so unlimitedly 'high level', as to mysteriously
solve various human descriptions of ideas.
C and assembly are essentially the same; maybe it is better to call it 'portable assembly'.
No, C is not any kind of assembly. Assembly language and C are
fundamentally different.
An assembly language program specifies a sequence of CPU instructions.
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
Compiler? You said this was assembly.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros).
A C program specifies run-time behavior. (A compiler generates CPU
instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between
assembly and C.
How/what do you use to specify 'run-time behavior'? Is it not based on the CPU?
E.g. in C, int types are fixed-size and have range, wrap-around, alignment
and 'atomic'/'overlapping' properties; you cannot really hide these and still understand or
program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY get rid of these (hard-ware) features, no matter
how high-level one thinks C is or expect C would be.
When I heard 'sophisticated assemblers', I would think of something like
my idea of 'portable' assembly, but maybe different. One of my points
should be clear, as stated in the above int example: "... C has NO WAY
to get rid of these (hardware) features, no matter how high-level one
thinks C is or expects C to be."
On Wed, 2026-04-15 at 15:38 +0200, David Brown wrote:[...]
In C, when you write the code above there is /nothing/ to suggest that
there should be two actions.
As far as I know, 'old-time' C compilers did no optimization.
In assembly languages, if you write
the equivalent of "p2 = 0;" twice, you get the appropriate opcode twice.
An assembly compiler (or language) could also do the same optimization.
On 15/04/2026 01:33, Bart wrote:[...]
Certainly until C99 when stdint.h came along.
I would not draw that distinction - indeed, I see the opposite. Prior
to <stdint.h>, your integer type sizes were directly from the target
machine - with <stdint.h> explicitly sized integer types, they are now
independent of the target hardware.
On Wed, 2026-04-15 at 22:11 +0200, David Brown wrote:[...]
Okay, have you ever actually done any C++ programming? The languages
share the same philosophy here.
You are really a sick person. Loser of the real world. You just don't know yourself.
I have a gold medal, an aluminum medal and a bronze commemorative plaque (for
solving a riddle of Northrop Corp.). What do you have? Well... a paper (paid for),
and you are still making false memories every day for yourself.
I retired at 37; can you?
Ah, recently, you also failed to verify a simple program that proves the
3x+1 problem. Facts are not made by mouth (like DJT?), loser.
On Wed, 2026-04-15 at 15:06 +0100, Bart wrote:
The boundary of assembly and HLL is not clear to me.
I once wrote a killer-grade commercial assembly program; it may still be running
today after >30 years. My experience is that assembly is not as scary as commonly
thought; just don't think in low level.
wij <wyniijj5@gmail.com> writes:
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
On Tue, 2026-04-14 at 15:31 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
In attempting to write a simple language, I had a thought about what a language
is, which I want to share. I saw many people stuck thinking C/C++ (or other
high-level languages) can be so abstract, so unlimitedly 'high level', as to
mysteriously solve any human description of an idea.
C and assembly are essentially the same; maybe it is better to call C 'portable assembly'.
No, C is not any kind of assembly. Assembly language and C are fundamentally different.
An assembly language program specifies a sequence of CPU instructions.
[Repeat] 'Assembly' can also be like C:
// This is 'assembly'
def int=32bit; // Choose right bits for your platform, or leave it for
def char= 8bit; // compiler to decide.
Compiler? You said this was assembly.
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my idea.
I hadn't. I realize it now that you've admitted it.
In other words, you made it up.
I don't believe there is any real-world assembler that accepts
that syntax. Your example is meaningless.
For every assembler I've used, the assembly language input
unambiguously specifies the sequence of CPU instructions in the
generated object file. Support for macros does not change that;
it just means the mapping is slightly more complicated.
Cite an example of an existing real-world assembler that does not
behave that way, and we might have something interesting to discuss.
Yes, the C-like example above specifies exactly a sequence of CPU instructions
(well, small deviations are allowed, and assembly can also have functions and macros).
A C program specifies run-time behavior. (A compiler generates CPU instructions behind the scenes to implement that behavior.)
Being 'portable', it should specify 'run-time behavior', not exact instructions.
Yes, that's what I said. And that's the fundamental difference between assembly and C.
How, and with what, do you specify 'run-time behavior'? Is it not based on the CPU?
The C standard defines "behavior" as "external appearance or action",
which is admittedly vague. Run-time behavior is what happens when the program is running on the target system. It includes things like input
and output, either to a console or to files.
The C standard specifies the behavior of this program:
#include <stdio.h>
int main(void) { puts("hello, world"); }
It does so without reference to any CPU. (Of course some CPU will be
used to implement that behavior.)
E.g. in C, int types are fixed-size and have range, wrap-around, alignment,
'atomic' and 'overlapping' properties; you cannot really ignore or hide them and
still program C/C++ correctly from the high-level concept of 'integer'.
The point is that C has NO WAY to get rid of these (hardware) features, no matter
how high-level one thinks C is or expects C to be.
Right, C doesn't directly support abstract mathematical integers.
Of course I agree that C is a lower level language than many others.
Python, for example, has reasonably transparent support for integers
of arbitrary width. Python is a higher level language than C.
(Notably, the Python interpreter is written in C).
That doesn't make C an assembly language.
[...]
When I heard 'sophisticated assemblers', I would think of something like
my idea of 'portable' assembly, but maybe different. One of my points
should be clear, as stated in the above int example: "... C has NO WAY
to get rid of these (hardware) features, no matter how high-level one
thinks C is or expects C to be."
Again, yes, C is a relatively low-level language. And again,
C is not an assembly language.
And again, if you can cite a real-world example of the kind of
"sophisticated assembler" you're talking about, that would be an
interesting data point.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
On 15/04/2026 22:12, wij wrote:
On Wed, 2026-04-15 at 15:06 +0100, Bart wrote:
The boundary of assembly and HLL is not clear to me.
That seems to be obvious.
I once wrote a killer-grade commercial assembly program; it may still be running
today after >30 years. My experience is that assembly is not as scary as commonly
thought; just don't think in low level.
It's not that scary. Just unergonomic to code in it, taking longer,
being more error prone, much harder to understand, harder to maintain,
much less portable ...
On Wed, 2026-04-15 at 23:52 +0100, Bart wrote:
On 15/04/2026 22:12, wij wrote:
On Wed, 2026-04-15 at 15:06 +0100, Bart wrote:
The boundary of assembly and HLL is not clear to me.
That seems to be obvious.
I once wrote a killer-grade commercial assembly program; it may still be running
today after >30 years. My experience is that assembly is not as scary as commonly
thought; just don't think in low level.
It's not that scary. Just unergonomic to code in it, taking longer,
being more error prone, much harder to understand, harder to maintain,
much less portable ...
Skill. Treat assembly as a chunk. Document it well.
Again, yes, C is a relatively low-level language. And again,
C is not an assembly language.
And again, if you can cite a real-world example of the kind of
"sophisticated assembler" you're talking about, that would be an
interesting data point.
I had thought questions like yours might have been due to the English problem.
I did not mean C is (equal to) assembly, but that C is-a assembly (logic course 101).
And I hope the following code could clear up some confusion.
The 'assembly' could be 'structured assembly', but then I felt the
result would not be much different from C...
[... comparing C and assembly language ...]
On 16/04/2026 00:30, wij wrote:
On Wed, 2026-04-15 at 23:52 +0100, Bart wrote:
On 15/04/2026 22:12, wij wrote:
Skill. Treat assembly as a chunk. Document it well.
On Wed, 2026-04-15 at 15:06 +0100, Bart wrote:
The boundary of assembly and HLL is not clear to me.
That seems to be obvious.
I once wrote a killer-grade commercial assembly program; it may still be running
today after >30 years. My experience is that assembly is not as scary as commonly
thought; just don't think in low level.
It's not that scary. Just unergonomic to code in it, taking longer,
being more error prone, much harder to understand, harder to maintain,
much less portable ...
You're not making sense. It's like saying I should walk everywhere
instead of using my car.
But I don't want to spend two extra hours a day walking and carrying
shopping etc.
What exactly is the benefit of using assembly over a HLL when both can
tackle the task?
When I first started with microprocessors, I first had to build the
hardware, which was programmed in binary. I wrote a hex editor so I
could use a keyboard. Then used that to write an assembler. Then used
the assembler to write a compiler for a simple HLL.
The HLL allowed me to be far more productive than otherwise. Everybody
seems to understand that, except you.
But I have a counter-proposal: why don't you also program in binary
machine code (I'll let you use hex!) instead of assembly? After all
it's just a skill.
wij <wyniijj5@gmail.com> writes:
[...]
[signature snipped]
When you post a followup, please trim quoted text that's not relevant to
your reply. And in particular, don't quote signatures.
wij <wyniijj5@gmail.com> writes:
[... comparing C and assembly language ...]
Gentlemen,
I understand the natural reaction to want to respond to the kind of statements being made in this thread. I hope y'all can resist this
natural reaction and not respond to people who persist in making
arguments that are basically isomorphic to saying 1 equals 0.
Thank you for your assistance in this matter.
On Wed, 2026-04-15 at 17:14 -0700, Tim Rentsch wrote:
wij <wyniijj5@gmail.com> writes:
[... comparing C and assembly language ...]
Gentlemen,
I understand the natural reaction to want to respond to the kind of
statements being made in this thread. I hope y'all can resist this
natural reaction and not respond to people who persist in making
arguments that are basically isomorphic to saying 1 equals 0.
Thank you for your assistance in this matter.
Maybe you are right. I say A is-a B; some persist in reading it as A is (exactly) B.
I offer help with using assembly; some persist in reading it as me urging people to use assembly and give up HLLs. What is going on here?
On 15/04/2026 18:58, wij wrote:
An assembly compiler (or language) could also do the same optimization.
No, assemblers cannot do that - if they did, they would not be
assemblers. An assembler directly translates your instructions from mnemonic codes (assembly instructions) to binary opcodes. Some
assemblers might have pseudo-instructions that translate to more than
one binary opcode, but always in a specific defined pattern.
So what you're saying is that assembly can do anything that any other arbitrary language (that has to eventually compile down to the same
machine code) can do? This should not be surprising to anyone.
C has never been, and was never intended to be, a "portable assembly".
It was designed to reduce the need to write assembly code. There is a
huge difference in these concepts.
On 15/04/2026 01:33, Bart wrote:
On 14/04/2026 23:20, Janis Papanagnou wrote:
On 2026-04-14 23:41, Bart wrote:
But if you want to call C some kind of assembler, even though it is
several levels above actual assembly, then that's up to you.
Can you name and describe a couple of these "several levels above
actual assembly"? (Assembler macros might qualify as one level.)
I said C is several levels above, and mentioned 2 categories and 2
specific ones that can be considered to be in-between.
I agree with a great deal you have written in this thread (at least what
I have read so far). My points below are mainly additional comments
rather than arguments or disagreements. Like you, my disagreement is primarily with wij.
Namely:
* HLAs (high-level assemblers) of various kinds, as this is a broad
category (see note)
When I used to do significant amounts of assembly programming (often on "brain-dead" 8-bit CISC microcontrollers), I would make heavy use of assembler macros as a way of getting slightly "higher level" assembly.
Even with common assembler tools you can write something that is a kind
of HLA. And then for some targets, there are more sophisticated tools
(or you can write them yourself) for additional higher level constructs.
* Intermediate languages (IRs/ILs) such as LLVM IR
LLVM is probably the best candidate for something that could be called a "portable assembler". It is quite likely that other such "languages"
have been developed and used (perhaps internally in multi-target
compilers), but LLVM's is the biggest and with the widest support.
* Forth
Forth is always a bit difficult to categorise. Many Forth
implementations are done with virtual machines or byte-code
interpreters, raising them above assembly. Others are for stack machine processors (very common in the 4-bit world) where the assembly /is/ a
small Forth language. A lot of Forth tools compile very directly
(giving you the "you get what your code looks like" aspect of assembly), others do more optimisation (for the "you get what your code means"
aspect of high level languages).
* PL/M (an old one; there was also C--, now dead)
I never used PL/M - I'm too young for that! C-- was conceived as a portable intermediary language that compilers could generate to get cross-target compilation without needing individual target backends. In practice, ordinary C does a good enough job for many transpilers during development, then they can move to LLVM for more control and efficiency
if they see it as worth the effort.
(Note: the one I implemented was called 'Babbage', devised for the GEC
4000 machines. My task was to port it to DEC PDP10. There's something
about it 2/3 down this page: https://en.wikipedia.org/wiki/GEC_4000_series)
Beyond the inherent subjective aspects of that or the OP's initial
statement I certainly see "C" closer to the machine than many HLLs.
I see it as striving to distance itself from the machine as much as
possible!
Yes - as much as possible while retaining efficiency.
Certainly until C99 when stdint.h came along.
I would not draw that distinction - indeed, I see the opposite. Prior
to <stdint.h>, your integer type sizes were directly from the target
machine - with <stdint.h> explicitly sized integer types, they are now independent of the target hardware.
C has always intended to be as independent from the machine as
practically possible without compromising efficiency. That's why it has implementation-defined behaviour where it makes a significant difference (such as the size of integer types), while giving full definitions of
things that can reasonably be widely portable while still being
efficient (and sometimes leaving things undefined to encourage
portability).
For example:
* Not committing to actual machine types, widths or representations,
such as a 'byte', or 'twos complement'.
(With C23, two's complement is the only allowed signed integer representation. There comes a point where something is so dominant that even C commits it to the standards.)
* Being vague about the relations between the different integer types
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
That one is more that no one had bothered standardising binary literals.
The people that wanted them, for the most part, are low-level embedded programmers and their tools already supported them. (And even then,
they are not much used in practice.) Printing in binary is not
something people often want - it is far too cumbersome for numbers, and
if you want to dump a view of some flag register then a custom function
with letters is vastly more useful.
C is standardised on binary - unsigned integer types would not work well
on a non-binary target.
* Not being allowed to do a dozen things that you KNOW are well-
defined on your target machine, but C says are UB.
That is certainly part of it. Things like "signed integer arithmetic overflow" are UB at least partly because C models mathematical integer arithmetic. It does not attempt to mimic the underlying hardware. This is clearly "high level language" territory - C defines the behaviour of
an abstract machine in terms of mathematics. It is not an "assembler"
that defines operations in terms of hardware instructions.
It certainly depends on where one is coming from; from an abstract
or user-application level or from the machine level.
It was often mentioned here - very much to the displeasure of the
audience - that there's a lot of effort necessary to implement simple
concepts. To jump on that bandwagon: how would, say, Awk's array
construct map[key] = value have to be modeled in (native) "C".
(Note that this simple statement represents an associative array.)
"C" is abstracting from the machine. And the OP's initial statement
"C and assembly are essentially the same" may be nonsense
Actually, describing C as 'portable assembly' annoys me which is why I
went into some detail.
Indeed.
C is defined in terms of an abstract machine, not hardware. And the C source code running on this abstract machine only needs to match up with
the actual binary code on the real target machine in very specific and limited ways - the "observable behaviour" of the program. That's
basically start, stop, volatile accesses and IO. Everything else
follows the "as if" - the compiler needs to generate target code that
works (for observable behaviour) "as if" it had done a direct, naïve translation of the source.
As I understand the history - and certainly the practice - of the C language, it is a language with two goals. One is that it should be possible to write highly portable C code that can be used on a very wide range of target systems while remaining efficient. The other is that it should be useable for a lot of target-specific system code.
C has never been, and was never intended to be, a "portable assembly".
It was designed to reduce the need to write assembly code. There is a
huge difference in these concepts.
On 4/15/2026 12:57 AM, David Brown wrote:
[... David Brown's post, quoted in full above, snipped ...]
Use C to create the ASM, then GAS it... ;^)
Nope, I quit C (but I keep watching C, since part of C++ is C)
wij <wyniijj5@gmail.com> writes:
On Wed, 2026-04-15 at 22:11 +0200, David Brown wrote:[...]
Okay, have you ever actually done any C++ programming? The languages
share the same philosophy here.
You are really a sick person. Loser of the real world. You just don't know yourself.
I have a gold medal, an aluminum medal and a bronze commemorative plaque (for
solving a riddle of Northrop Corp.). What do you have? Well... a paper (paid for),
and you are still making false memories every day for yourself.
I retired at 37; can you?
Ah, recently, you also failed to verify a simple program that proves the
3x+1 problem. Facts are not made by mouth (like DJT?), loser.
Keep the personal abuse to yourself.
On 15/04/2026 13:21, wij wrote:
A high-level language can dump code for a lower-level one, and vice versa.
On Wed, 2026-04-15 at 11:46 +0100, Bart wrote:
On 15/04/2026 07:05, wij wrote:
On Tue, 2026-04-14 at 21:46 -0700, Keith Thompson wrote:
int a;
char b;
a=b; // allow auto promotion
while(a<b) {
a+=1;
}
You've claimed that that's assembly language. What assembler?
For what CPU?
Is it even for a real assembler?
I think you realize the example above is just an example to demo my
idea.
So you've invented an 'assembly' syntax that looks exactly like C, in
order to support your notion that C and assembly are really the same
thing!
Exactly. But not really 'invented'. I figured that if anyone wanted to implement
a 'portable assembly', he would find it not much different from C (from the
example shown, 'structured C'). So, in a sense, it is not worth implementing.
Real assembly generally uses explicit instructions and labels rather
than the implicit ones used here. It would also have limits on the
complexity of expressions. If your pseudo-assembler supports:
a = b+c*f(x,y);
then you've invented a HLL.
You may say that.
It sounds like you don't understand the difference between a low-level language and a high-level one.
On 4/15/2026 4:30 PM, wij wrote:
On Wed, 2026-04-15 at 23:52 +0100, Bart wrote:
On 15/04/2026 22:12, wij wrote:
On Wed, 2026-04-15 at 15:06 +0100, Bart wrote:
The boundary of assembly and HLL is not clear to me.
That seems to be obvious.
I once wrote a killer-grade commercial assembly program; it may still be running
today after >30 years. My experience is that assembly is not as scary as commonly
thought; just don't think in low level.
It's not that scary. Just unergonomic to code in it, taking longer,
being more error prone, much harder to understand, harder to maintain,
much less portable ...
Skill. Treat assembly as a chunk. Document it well.
Well crafted asm is not bad. Only used when needed! simple... :^)
I found some of my old asm on the way back machine:
https://web.archive.org/web/20060214112345/http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html
On 4/15/2026 11:37 PM, Chris M. Thomasson wrote:
[... quote of the previous post snipped ...]
2005, damn time goes on bye, bye... ;^o
David Brown <david.brown@hesbynett.no> writes:
On 15/04/2026 01:33, Bart wrote:[...]
Certainly until C99 when stdint.h came along.
I would not draw that distinction - indeed, I see the opposite. Prior
to <stdint.h>, your integer type sizes were directly from the target
machine - with <stdint.h> explicitly sized integer types, they are now
independent of the target hardware.
A minor quibble: The sizes of the predefined integer types have
always been determined by the compiler, often mandated by an ABI
for the target platform. The choice is *influenced* by the target
hardware, but not controlled by it. For example, the width of
`long` on x86_64 is likely to be 32 bits on Windows, 64 bits on
other platforms.
David Brown <david.brown@hesbynett.no> wrote:
On 15/04/2026 18:58, wij wrote:
An assembly compiler (or language) could also do the same optimization.
No, assemblers cannot do that - if they did, they would not be
assemblers. An assembler directly translates your instructions from
mnemonic codes (assembly instructions) to binary opcodes. Some
assemblers might have pseudo-instructions that translate to more than
one binary opcode, but always in a specific defined pattern.
Well, as a program assembler is not a compiler. But people talk
about "assembly language" and you can have a compiler that
takes assembly language as an input. This was done by DEC
for VAX assembly. A guy created compilers for 360 assembly,
one targeting 386, another one targeting Java. To be useful,
such compilers should do the same optimizations.
Bart <bc@freeuk.com> writes:
On 16/04/2026 00:30, wij wrote:
On Wed, 2026-04-15 at 23:52 +0100, Bart wrote:
On 15/04/2026 22:12, wij wrote:
Skill. Treat assembly as a chunk. Well document.
On Wed, 2026-04-15 at 15:06 +0100, Bart wrote:
The boundary of assembly and HLL is not clear to me.
That seems to be obvious.
I had written a killer-grade commercial assembly program; it may still be running
today after more than 30 years. My experience is that assembly is not as scary as commonly
thought; just don't think at a low level.
It's not that scary. Just unergonomic to code in it, taking longer,
being more error prone, much harder to understand, harder to maintain,
much less portable ...
You're not making sense. It's like saying I should walk everywhere
instead of using my car.
But I don't want to spend two extra hours a day walking and carrying
shopping etc.
What exactly is the benefit of using assembly over a HLL when both can
tackle the task?
When I first started with microprocessors, I first had to build the
hardware, which was programmed in binary. I wrote a hex editor so I
could use a keyboard. Then used that to write an assembler. Then used
the assembler to write a compiler for a simple HLL.
The HLL allowed me to be far more productive than otherwise. Everybody
seems to understand that, except you.
But I have a counter-proposal: why don't you also program in binary
machine code (I'll let you use hex!) instead of assembly? After all
it's just a skill.
Assembly is a great thing to know. It makes it easier to know what's
going on under the hood of higher level languages, and can even help in troubleshooting and reasoning about how to make your code more efficient.
Do I think that learning assembly is an asset? Absolutely.
Do I think it's something that a project should be written in directly?
In most cases, absolutely not.
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
On 15.04.2026 at 22:01, Kerr-Mudd, John wrote:
On Wed, 15 Apr 2026 20:23:52 +0200
🇵🇱Jacek Marcin Jaworski🇵🇱<jmj@energokod.gda.pl> wrote:
On 15.04.2026 at 15:40, makendo wrote:
(forwarding to alt.lang.asm because you are comparing C with it)
DYM comp.lang.asm.x86?
Great, but what is wrong with comp.lang.asm? I subscribe to it instead of any
other asm-related groups. Is this the wrong approach?
No!
comp.lang.asm is an empty header for me on Eternal September's feed.
After the question "is comp.lang.asm moderated?", the ecosia.org AI answered today, quote:
The newsgroup comp.lang.asm is generally considered an unmoderated Usenet group. Unlike comp.lang.asm.x86, which is known to be moderated, comp.lang.asm does not have an official moderation process and typically allows posts to appear without prior review.
I see old posts published on comp.lang.asm, and the last is yours: "Kenny
Code for DOS", from 2023-04-24, Mon. (without any answers).
sig is overlong, and crowded, IMHO.
I have so many things to communicate to Poles - this is the reason for the big
sig. But I try to be laconic.
--
Jacek Marcin Jaworski, Pruszcz Gd., Pomorskie voivodeship, Poland 🇵🇱, EU 🇪🇺;
tel.: +48-609-170-742, best between 5:00-5:55 or 16:00-17:25; <jmj@energokod.gda.pl>, gpg: 4A541AA7A6E872318B85D7F6A651CC39244B0BFA;
Home page: <https://energokod.gda.pl>;
Mini Netiquette: <https://energokod.gda.pl/MiniNetykieta.html>; Email Self-Defense: <https://emailselfdefense.fsf.org/pl>.
NOTE:
DO NOT TAKE ON "HIDDEN DEBT"! PAY FOR FOSS SOFTWARE AND ONLINE INFORMATION! READ THE FREE "17th Totaliztic Report - Patrons Versus Bankers": <https://energokod.gda.pl/raporty-totaliztyczne/17.%20Patroni%20Kontra%20Bankierzy.pdf>
wij <wyniijj5@gmail.com> writes:
On Wed, 2026-04-15 at 17:14 -0700, Tim Rentsch wrote:
wij <wyniijj5@gmail.com> writes:
[... comparing C and assembly language ...]
Gentlemen,
I understand the natural reaction to want to respond to the kind of statements being made in this thread. I hope y'all can resist this natural reaction and not respond to people who persist in making arguments that are basically isomorphic to saying 1 equals 0.
Thank you for your assistance in this matter.
Maybe you are right. I say A is-a B; one persists in reading it as A is (exactly) B. I offer help with using assembly; one persists in reading it as me urging the use of assembly and giving up HLLs. What is going on here?
You say that C is an assembly language. Nobody here thinks that
you're *equating* C and assembly language. It's obvious that
there are plenty of assembly languages that are not C, and nobody
has said otherwise. I have no idea why you think anyone has that
particular confusion.
At least one person has apparently interpreted your defense of
assembly language (that it isn't as scary as some think it is)
as a claim that we should program in assembly language rather
than in HLLs. You're right, that was a misinterpretation of what
you wrote. I considered mentioning that, but didn't bother.
The issue I've been discussing is your claim that C is an assembly language. It is not.
I do not intend to post again in this thread until and unless you
post something substantive on that issue.
wij <wyniijj5@gmail.com> writes:
[... comparing C and assembly language ...]
Gentlemen,
I understand the natural reaction to want to respond to the kind of statements being made in this thread. I hope y'all can resist this
natural reaction and not respond to people who persist in making
arguments that are basically isomorphic to saying 1 equals 0.
Thank you for your assistance in this matter.
It seems you insist C and assembly have to be exactly what your bible says.
If so, I would say that what the C standard (I cannot read it) says defines
the meaning of the terms in it, and is not intended to govern usage in any
other situation.
I do not intend to post again in this thread until and unless you
post something substantive on that issue.
On Wed, 2026-04-15 at 19:04 -0700, Keith Thompson wrote:
(continued)
wij <wyniijj5@gmail.com> writes:
On Wed, 2026-04-15 at 17:14 -0700, Tim Rentsch wrote:
wij <wyniijj5@gmail.com> writes:
[... comparing C and assembly language ...]
Gentlemen,
I understand the natural reaction to want to respond to the kind of statements being made in this thread. I hope y'all can resist this natural reaction and not respond to people who persist in making arguments that are basically isomorphic to saying 1 equals 0.
Thank you for your assistance in this matter.
Maybe you are right. I say A is-a B; one persists in reading it as A is (exactly) B.
I offer help with using assembly; one persists in reading it as me urging the use of assembly and giving up HLLs. What is going on here?
You say that C is an assembly language. Nobody here thinks that
you're *equating* C and assembly language. It's obvious that
there are plenty of assembly languages that are not C, and nobody
has said otherwise. I have no idea why you think anyone has that particular confusion.
At least one person has apparently interpreted your defense of
assembly language (that it isn't as scary as some think it is)
as a claim that we should program in assembly language rather
than in HLLs. You're right, that was a misinterpretation of what
you wrote. I considered mentioning that, but didn't bother.
The issue I've been discussing is your claim that C is an assembly language. It is not.
If I said C is assembly, it is in the sense shown at least in my last post (s_tut2.cpp), where even an 'instruction' can be any function (e.g. change directory, copy files, launch an editor, ...). It also demonstrates what 'computation' is, which includes a suggestion of what C essentially is - any program - and in this sense what an HLL is. Finally, it could demonstrate the meaning of, and testify to, the Church-Turing thesis (my words: no computation language, including various kinds of math formulas, can exceed the expressive power of a TM).
It seems you insist C and assembly have to be exactly what your bible says. If so, I would say that what the C standard (I cannot read it) says defines the meaning of the terms in it, and is not intended to govern usage in any other situation.
I do not intend to post again in this thread until and unless you
post something substantive on that issue.
In article <1e4ef965d5ee27013e0abfd3c5dc18831400ad5f.camel@gmail.com>,
wij <wyniijj5@gmail.com> wrote:
...
It seems you insist C and assembly have to be exactly what your bible says. If so, I would say that what the C standard (I cannot read it) says defines the meaning of the terms in it, and is not intended to govern usage in any
other situation.
Keith is the king of this newsgroup. What he says, goes.
The way he defines words is the law, and all must fall in line with that.
You're new around here, so you are probably not familiar with these rules,
but you will be soon (if you choose to stick around).
comp.lang.c allows idiots (not all, but maybe including me)
Kind Keith then stated:
I do not intend to post again in this thread until and unless you
post something substantive on that issue.
For which we are all grateful.
On Thu, 2026-04-16 at 13:10 +0000, Kenny McCormack wrote:
In article <1e4ef965d5ee27013e0abfd3c5dc18831400ad5f.camel@gmail.com>,
wij <wyniijj5@gmail.com> wrote:
...
It seems you insist C and assembly have to be exactly what your bible says.
If so, I would say that what the C standard (I cannot read it) says defines
the meaning of the terms in it, and is not intended to govern usage in any
other situation.
Keith is the king of this newsgroup. What he says, goes.
The way he defines words is the law, and all must fall in line with that.
Forget about that; facts first. There are LLMs. This is not a court.
As far as I know, comp.lang.c should be a forum for more general topics than lang.c.mod or comp.lang.c.std. And refrain from telling others what they should do; you are just another participant.
antispam@fricas.org (Waldek Hebisch) writes:
David Brown <david.brown@hesbynett.no> wrote:
On 15/04/2026 18:58, wij wrote:
Assembly compiler (or language) can also do the same optimization.
No, assemblers cannot do that - if they did, they would not be
assemblers. An assembler directly translates your instructions from
mnemonic codes (assembly instructions) to binary opcodes. Some
assemblers might have pseudo-instructions that translate to more than
one binary opcode, but always in a specific defined pattern.
Well, as a program assembler is not a compiler. But people talk
about "assembly language" and you can have a compiler that
takes assembly language as an input. This was done by DEC
for VAX assembly. A guy created compilers for 360 assembly,
one targeting 386, another one targeting Java. To be useful,
such compilers should do the same optimizations.
The C compiler in the GNU Compiler Collection provides
a mechanism to 'take assembly language as an input'
in the form of in-line assembler fragments. It's
useful in some limited cases (machine-level software like
kernels, boot loaders and the like).
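The mechanism Scott mentions can be sketched minimally. This is a hedged illustration, not canonical usage: the x86-64 mnemonic is an assumption, and on other targets or compilers the portable fallback below is used instead.

```c
#include <stdint.h>

/* A minimal sketch of GCC's extended inline assembler: add two
   integers with an explicit "addl" instruction instead of C's '+'. */
static uint32_t asm_add(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* "+r": a is read and written in a register; "r": b is read from one. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b));
    return a;
#else
    return a + b;   /* same result without inline asm */
#endif
}
```

The point of the constraint syntax is that the compiler, not the programmer, allocates the registers, which is what lets such fragments coexist with optimised C code.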
The Burroughs Large systems (B5500 and descendants) have
never had an assembler; all code is written in a flavor
of Algol (with special syntax extensions required for
the MCP and other privileged applications).
The Burroughs Medium systems COBOL68 compiler supported
the 'ENTER SYMBOLIC' statement, which was followed by
in-line assembler until the LEAVE SYMBOLIC statement.
[snip]
FWIW, I believe that the origins of C had much the same
philosophy: write parts in suitable languages, and link
them together prior to execution.
K&R C had no reason
to support inline assembly (and, as far as I have read)
the authors studiously avoided that capability.
On 2026-04-16 17:11, Lew Pitcher wrote:
[snip]
FWIW, I believe that the origins of C had much the same
philosophy: write parts in suitable languages, and link
them together prior to execution.
But was that an outcome of the C-language design, or of
the UNIX operating system concepts with its languages,
toolbox, and linking-editor?
There also seems to have been an asymmetry here with "C",
at least evolving later...
From what I observed, "C" had reached a status to not be
"inter pares". As a comparably low-level language it had
been often used for other languages as the compile-output
to be then handled by any C-compiler. Also HLLs supported
interfaces to access (primarily) "C" modules because of
their (much better) performance and the typically easier
access to system resources.
K&R C had no reason
to support inline assembly (and, as far as I have read)
the authors studiously avoided that capability.
Nonetheless it supported the reserved word 'asm' (as I can
read in my old translation of K&R). (Not exactly what I'd
call "studiously avoided".)
Janis
On Thu, 16 Apr 2026 14:38:06 +0000, Scott Lurndal wrote:
antispam@fricas.org (Waldek Hebisch) writes:
David Brown <david.brown@hesbynett.no> wrote:
On 15/04/2026 18:58, wij wrote:
Assembly compiler (or language) can also do the same optimization.
No, assemblers cannot do that - if they did, they would not be
assemblers. An assembler directly translates your instructions from
mnemonic codes (assembly instructions) to binary opcodes. Some
assemblers might have pseudo-instructions that translate to more than >>>> one binary opcode, but always in a specific defined pattern.
Well, as a program assembler is not a compiler. But people talk
about "assembly language" and you can have a compiler that
takes assembly language as an input. This was done by DEC
for VAX assembly. A guy created compilers for 360 assembly,
one targeting 386, another one targetimg Java. Such compilers
to be useful should do same optimization.
The C compiler in the GNU Compiler Collection provides
a mechanism to 'take assembly language as an input'
in the form of in-line assembler fragments. It's
useful in some limited cases (machine-level software like
kernels, boot loaders and the like).
I believe that the authors of GNU C latched on to an (at the
time) useful extension of the C language, originally implemented
in Ron Cain's "Small C Compiler for the 8080's" (Dr. Dobbs
Journal # 45, 1980) as the #asm/#endasm preprocessor directives.
Ron's K&R C subset compiler didn't compile to machine code;
instead, it compiled to CP/M 8080 assembler (CP/M came with
an 8080 assembler as its only language tool), and so a
source-code assembly "passthrough" was easily implemented.
The Burroughs Large systems (B5500 and descendants) have
never had an assembler; all code is written in a flavor
of Algol (with special syntax extensions required for
the MCP and other privileged applications).
The Burroughs Medium systems COBOL68 compiler supported
the 'ENTER SYMBOLIC' statement, which was followed by
in-line assembler until the LEAVE SYMBOLIC statement.
The IBM language environments that I worked in all
supported static (and later, dynamic) linkage, and my
employer could afford a suite of IBM language tools.
IBMs language tools shared a common object interface,
so it was (relatively) easy to write the Assembly
parts in Assembler, and the HLL parts in the appropriate
HLL (usually, for us, COBOL), and link them together
for execution.
Consequently, none of the high-level languages supported
an "assembly" escape (although COBOL provided extensions
for IBM DB2 relational database interaction).
On 2026-04-16 17:11, Lew Pitcher wrote:
[snip]
FWIW, I believe that the origins of C had much the same
philosophy: write parts in suitable languages, and link
them together prior to execution.
But was that an outcome of the C-language design, or of
the UNIX operating system concepts with its languages,
toolbox, and linking-editor?
On Thu, 16 Apr 2026 17:43:19 +0200, Janis Papanagnou wrote:
On 2026-04-16 17:11, Lew Pitcher wrote:
[snip]
FWIW, I believe that the origins of C had much the same
philosophy: write parts in suitable languages, and link
them together prior to execution.
But was that an outcome of the C-language design, or of
the UNIX operating system concepts with its languages,
toolbox, and linking-editor?
All of the above.
Linkage editors were (and still are) common technology,
as was separation of languages (assembler vs high level
language). Originally, Unix was written in assembler, and
(according to the histories) C was designed (with the existent
language tools in mind) to allow the Unix developers to use
a high-level language in their development. Remember, Bell
Labs wrote more than just Unix in C; C became the lingua-franca
for all the tools and applications, including the text management
tools (TROFF, EQN, SED, AWK, etc) and games (CHESS/CHECKERS/
BACKGAMMON)
I recall reading (but cannot find the reference now) that
Unix (V7 perhaps?) consisted of thousands of lines of C code,
and a few hundred lines of assembly for device drivers.
There also seems to have been an asymmetry here with "C",
at least evolving later...
From what I observed, "C" had reached a status to not be
"inter pares". As a comparably low-level language it had
been often used for other languages as the compile-output
to be then handled by any C-compiler. Also HLLs supported
interfaces to access (primarily) "C" modules because of
their (much better) performance and the typically easier
access to system resources.
K&R C had no reason
to support inline assembly (and, as far as I have read)
the authors studiously avoided that capability.
Nonetheless it supported the reserved word 'asm' (as I can
read in my old translation of K&R). (Not exactly what I'd
call "studiously avoided".)
To quote K&R ("The C Programming Language" 1978)
from Appendix A ("C Reference Manual") section 2.3 ("Keywords")
"The 'entry' keyword is not currently implemented by
any compiler, but is reserved for future use. Some
implementations also reserve the words 'fortran' and 'asm'."
I note that, according to that appendix, C had been ported to
PDP 11, Honeywell 6000, IBM 360/370, and Interdata 8/32 systems
at that time, none of them running Unix, to my knowledge.
As
such, the language (at that time in a bit of a plastic state,
being supplied as source code to AT&T customers and educators
alike) may have been altered on a site-by-site basis to suit
the needs of each particular client. As the context of these
keywords was never explained, I find it easier to believe that
the intent for these keywords was as a storage modifier, and
not an inline language change. Something like
extern fortran int F1(); /* use fortran calling convention */
extern asm char *F2(); /* use assembly calling convention */
On 15/04/2026 01:33, Bart wrote:
...
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
The latest draft standard supports %b and %B formats.
...
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
If you know they are well-defined on your only target platform, there's nothing wrong with writing such code. That's part of the reason why C
says the behavior is undefined, rather than requiring that such code be rejected. Implementations are intended to take advantage of that fact
for code that does not need to be portable.
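The C23 additions mentioned above can be sketched briefly; this is illustrative only, assuming a recent compiler (GCC/Clang accept `0b...` literals as an extension even pre-C23, while `%b` needs a C23-era printf implementation):

```c
#include <stdio.h>

/* Print a value in decimal, and in binary where the library supports
   the %b conversion standardised in C23. */
static void show_binary(unsigned x)
{
    printf("%u\n", x);            /* decimal: always available */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ > 201710L
    printf("%b\n", x);            /* binary: C23 printf conversion */
#endif
}
```

Usage would be e.g. `show_binary(0b1010);`, combining both features Bart asked for.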
On 2026-04-16 08:37, Chris M. Thomasson wrote:
Well crafted asm is not bad. Only used when needed! simple... :^)
And in practice a throwaway-product once you change platform.
(I'm shuddering thinking about porting my decades old DSP asm
code to some other platform/CPU architecture.) But I've ported
or re-used old "C" code without much effort. This is a crucial
difference, especially in light of the thread's theses.
On 16/04/2026 11:28, James Kuyper wrote:
On 15/04/2026 01:33, Bart wrote:
...
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
The latest draft standard supports %b and %B formats.
...
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
If you know they are well-defined on your only target platform, there's
nothing wrong with writing such code. That's part of the reason why C
says the behavior is undefined, rather than requiring that such code be
rejected. Implementations are intended to take advantage of that fact
for code that does not need to be portable.
Taking advantage in what way? Doing something entirely unexpected or unintuitive?
/That's/ the problem!
Bart <bc@freeuk.com> writes:
On 16/04/2026 11:28, James Kuyper wrote:
On 15/04/2026 01:33, Bart wrote:
...
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
The latest draft standard supports %b and %B formats.
...
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
If you know they are well-defined on your only target platform, there's
nothing wrong with writing such code. That's part of the reason why C
says the behavior is undefined, rather than requiring that such code be
rejected. Implementations are intended to take advantage of that fact
for code that does not need to be portable.
Taking advantage in what way? Doing something entirely unexpected or
unintuitive?
How ridiculous! If you can figure out a way to take advantage of
unexpected behavior, I'd appreciate knowing what it is.
I was talking
about defining the behavior that the C standard itself leaves undefined,
in ways that make things convenient for the developer.
On 17/04/2026 01:26, James Kuyper wrote:
Bart <bc@freeuk.com> writes:
On 16/04/2026 11:28, James Kuyper wrote:
On 15/04/2026 01:33, Bart wrote:
...
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
The latest draft standard supports %b and %B formats.
...
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
If you know they are well-defined on your only target platform, there's
nothing wrong with writing such code. That's part of the reason why C
says the behavior is undefined, rather than requiring that such code be
rejected. Implementations are intended to take advantage of that fact
for code that does not need to be portable.
Taking advantage in what way? Doing something entirely unexpected or
unintuitive?
How ridiculous! If you can figure out a way to take advantage of
unexpected behavior, I'd appreciate knowing what it is.
It was you who mentioned taking advantage.
And by taking advantage, I assume you meant all the unpredictable things that optimising compilers like to do, because they assume that UB cannot happen.
Signed integer overflow is the one that everyone knows (though oddly it
is not listed in Appendix J.2, or if it is, it doesn't use the word 'overflow'!).
I think there are other obscure ones to do with the order you read and
write members of unions, or apply type-punning, or what you can do with pointers.
A common scenario is where someone is implementing a language where such things are well-defined, and they want to run it on a target machine
where they are also well-defined, but decide to use C as an intermediate language.
Unfortunately C has other ideas! So this means somehow getting around
the UB in the C that is generated, or stipulating specific compilers or compiler options.
Or just crossing your fingers and hoping the compiler will not be so crass.
Another scenario is where you are just writing C code and want that same behaviour.
I was talking
about defining the behavior that the C standard itself leaves undefined,
in ways that make things convenient for the developer.
The developer of the C implementation, or the C application?
I don't often use intermediate C code now, but that code is no longer portable among C compilers. It is for gcc only, and requires:
-fno-strict-aliasing
I can't remember exactly why it's needed, but some programs won't work without it.
(It's used with -O2, also necessary due to much redundancy in the C
code. Without the aliasing option, gcc will warn with: "dereferencing type-punned pointer will break strict-aliasing rules")
Whatever it is, I don't need anything like that when bypassing C and
going straight to native code.
And you won't need it if writing real assembly.
On 17/04/2026 13:27, Bart wrote:
That would, indeed, avoid undefined behavior, but it leaves you in the
On 17/04/2026 01:26, James Kuyper wrote:
Bart <bc@freeuk.com> writes:
On 16/04/2026 11:28, James Kuyper wrote:
On 15/04/2026 01:33, Bart wrote:
...
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
The latest draft standard supports %b and %B formats.
...
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
If you know they are well-defined on your only target platform, there's
nothing wrong with writing such code. That's part of the reason
why C says the behavior is undefined, rather than requiring that
such code be rejected. Implementations are intended to take
advantage of that fact for code that does not need to be
portable.
Taking advantage in what way? Doing something entirely unexpected
or unintuitive?
How ridiculous! If you can figure out a way to take advantage of
unexpected behavior, I'd appreciate knowing what it is.
It was you who mentioned taking advantage.
And by taking advantage, I assume you meant all the unpredictable
things that optimising compilers like to do, because they assume
that UB cannot happen.
Signed integer overflow is the one that everyone knows (though
oddly it is not listed in Appendix J.2, or if it is, it doesn't use
the word 'overflow'!).
"An exceptional condition occurs during the evaluation of an
expression (6.5.1)"
You are correct that it does not use the word "overflow" - it's a bit
more generic than that.
I think there are other obscure ones to do with the order you read
and write members of unions, or apply type-punning, or what you can
do with pointers.
A common scenario is where someone is implementing a language where
such things are well-defined, and they want to run it on a target
machine where they are also well-defined, but decide to use C as an intermediate language.
That is an extraordinarily /uncommon/ scenario. I know it applies to
you, but you are not a typical C user in this respect.
People who want to use C as an intermediate language need to generate
code that is correct according to C semantics. It does not matter
how well the source language matches the target processor in its
behaviour if the C code in the middle has different ideas. (Indeed,
it does not matter what the target processor semantics are, except
for knowing the efficiency you can hope to achieve.) Thus if you
want wrapping signed integer arithmetic in your source language, you
must generate C code that emulates those semantics - such as by
having casts back and forth to unsigned types,
or using bigger types and then masking,
Which can be problematic when dealing with the widest integer type.
or writing non-portable code such as adding
"#pragma GCC optimize ("wrapv")" to the generated code.
Unfortunately C has other ideas! So this means somehow getting
around the UB in the C that is generated, or stipulating specific
compilers or compiler options.
Should C semantics be designed to suit millions of general C
developers over several generations, or should they be optimised to
suit a single developer of non-C languages who can't be bothered
adding some casts to his code generator? Hm, that's a difficult
trade-off question...
Or just crossing your fingers and hoping the compiler will not be
so crass.
Another scenario is where you are just writing C code and want that
same behaviour.
That's a great deal more common than the transpiler situation. But
it is still far rarer than many people think. In general, people
don't want their integer arithmetic to overflow - doing so is a bug,
no matter what the results.
I was talking
about defining the behavior that the C standard itself leaves
undefined, in ways that make things convenient for the developer.
The developer of the C implementation, or the C application?
I don't often use intermediate C code now, but that code is no
longer portable among C compilers. It is for gcc only, and requires:
-fno-strict-aliasing
I recommend adding that as a pragma, not expecting people (yourself)
to remember it as a command-line option.
I can't remember exactly why it's needed, but some programs won't
work without it.
It is needed if you faff around with converting pointer types - lying
to your compiler by saying "this is a pointer to type A" when you are setting it to the address of an object of type B. Such "tricks" can
be convenient sometimes, more convenient than semantically correct
methods (like unions or using memmove) so I can understand the
appeal. But you should understand clearly that your C code here is non-portable and has undefined behaviour according to the C standard
- "gcc -fno-strict-aliasing" provides additional semantics that you
can rely on as long as you use that flag.
(It's used with -O2, also necessary due to much redundancy in the C
code. Without the aliasing option, gcc will warn with:
"dereferencing type-punned pointer will break strict-aliasing
rules")
gcc's warning here is slightly inaccurately worded, but very useful.
Whatever it is, I don't need anything like that when bypassing C
and going straight to native code.
And you won't need it if writing real assembly.
Sure. If you don't use C, you don't have to care about C semantics.
On 17/04/2026 13:27, Bart wrote:
Signed integer overflow is the one that everyone knows (though oddly
it is not listed in Appendix J.2, or if it is, it doesn't use the word
'overflow'!).
"An exceptional condition occurs during the evaluation of an expression (6.5.1)"
You are correct that it does not use the word "overflow" - it's a bit
more generic than that.
I think there are other obscure ones to do with the order you read and
write members of unions, or apply type-punning, or what you can do
with pointers.
A common scenario is where someone is implementing a language where
such things are well-defined, and they want to run it on a target
machine where they are also well-defined, but decide to use C as an
intermediate language.
That is an extraordinarily /uncommon/ scenario.
People who want to use C as an intermediate language need to generate
code that is correct according to C semantics. It does not matter how
well the source language matches the target processor in its behaviour
if the C code in the middle has different ideas.
Unfortunately C has other ideas! So this means somehow getting around
the UB in the C that is generated, or stipulating specific compilers
or compiler options.
Should C semantics be designed to suit millions of general C developers
over several generations, or should they be optimised to suit a single developer of non-C languages who can't be bothered adding some casts to
his code generator? Hm, that's a difficult trade-off question...
Another scenario is where you are just writing C code and want that same
behaviour.
That's a great deal more common than the transpiler situation. But it
is still far rarer than many people think. In general, people don't
want their integer arithmetic to overflow - doing so is a bug, no matter what the results.
I was talking
about defining the behavior that the C standard itself leaves undefined,
in ways that make things convenient for the developer.
The developer of the C implementation, or the C application?
I don't often use intermediate C code now, but that code is no longer
portable among C compilers. It is for gcc only, and requires:
-fno-strict-aliasing
I recommend adding that as a pragma, not expecting people (yourself) to remember it as a command-line option.
It is needed if you faff around with converting pointer types - lying to your compiler by saying "this is a pointer to type A" when you are
setting it to the address of an object of type B.
On Fri, 17 Apr 2026 14:37:47 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 17/04/2026 13:27, Bart wrote:
On 17/04/2026 01:26, James Kuyper wrote:
Bart <bc@freeuk.com> writes:
On 16/04/2026 11:28, James Kuyper wrote:
On 15/04/2026 01:33, Bart wrote:
...
* Not allowing (until standardised after half a century) binary
literals, and still not allowing those to be printed
The latest draft standard supports %b and %B formats.
...
* Not being allowed to do a dozen things that you KNOW are well-defined
on your target machine, but C says are UB.
If you know they are well-defined on your only target platform, there's
nothing wrong with writing such code. That's part of the reason
why C says the behavior is undefined, rather than requiring that
such code be rejected. Implementations are intended to take
advantage of that fact for code that does not need to be
portable.
Taking advantage in what way? Doing something entirely unexpected
or unintuitive?
How ridiculous! If you can figure out a way to take advantage of
unexpected behavior, I'd appreciate knowing what it is.
It was you who mentioned taking advantage.
And by taking advantage, I assume you meant all the unpredictable
things that optimising compilers like to do, because they assume
that UB cannot happen.
Signed integer overflow is the one that everyone knows (though
oddly it is not listed in Appendix J.2, or if it is, it doesn't use
the word 'overflow'!).
"An exceptional condition occurs during the evaluation of an
expression (6.5.1)"
You are correct that it does not use the word "overflow" - it's a bit
more generic than that.
I think there are other obscure ones to do with the order you read
and write members of unions, or apply type-punning, or what you can
do with pointers.
A common scenario is where someone is implementing a language where
such things are well-defined, and they want to run it on a target
machine where they are also well-defined, but decide to use C as an
intermediate language.
That is an extraordinarily /uncommon/ scenario. I know it applies to
you, but you are not a typical C user in this respect.
People who want to use C as an intermediate language need to generate
code that is correct according to C semantics. It does not matter
how well the source language matches the target processor in its
behaviour if the C code in the middle has different ideas. (Indeed,
it does not matter what the target processor semantics are, except
for knowing the efficiency you can hope to achieve.) Thus if you
want wrapping signed integer arithmetic in your source language, you
must generate C code that emulates those semantics - such as by
having casts back and forth to unsigned types,
That would, indeed, avoid undefined behavior, but it leaves you in the
realm of implementation-defined behavior (6.3.1.3.3).
6.3.1.3 Signed and unsigned integers
1
When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged.
2
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.60)
3
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an implementation-defined signal is raised.
or using bigger types and then masking,
Which can be problematic when dealing with the widest integer type.
Besides, it's still implementation-defined (the same 6.3.1.3.3 applies),
unless generated code is *very* elaborate.
On 17/04/2026 13:37, David Brown wrote:
On 17/04/2026 13:27, Bart wrote:
Signed integer overflow is the one that everyone knows (though oddly
it is not listed in Appendix J.2, or if it is, it doesn't use the
word 'overflow'!).
"An exceptional condition occurs during the evaluation of an
expression (6.5.1)"
You are correct that it does not use the word "overflow" - it's a bit
more generic than that.
I think there are other obscure ones to do with the order you read
and write members of unions, or apply type-punning, or what you can
do with pointers.
A common scenario is where someone is implementing a language where
such things are well-defined, and they want to run it on a target
machine where they are also well-defined, but decide to use C as an
intermediate language.
That is an extraordinarily /uncommon/ scenario.
Lots of languages do this. A few that you may have heard of include
Haxe, Seed7, Nim, FreeBasic and Haskell. Although with some it will be
an option.
Even early C++ did so, but there it had mainly C semantics anyway.
People who want to use C as an intermediate language need to generate
code that is correct according to C semantics. It does not matter how
well the source language matches the target processor in its behaviour
if the C code in the middle has different ideas.
Well, this is the problem.
But the thread is about C being equated to assembly, and this is one of
the differences.
Some UBs are reasonable, others are not because the
behaviour is poorly defined on some rare or obsolete hardware, but would
be fine on virtually anything someone is likely to use.
Unfortunately C has other ideas! So this means somehow getting around
the UB in the C that is generated, or stipulating specific compilers
or compiler options.
Should C semantics be designed to suit millions of general C
developers over several generations, or should they be optimised to
suit a single developer of non-C languages who can't be bothered
adding some casts to his code generator? Hm, that's a difficult
trade-off question...
My generated C now is full of casts. It doesn't help much, partly
because the casts are designed to match the C to the semantics of the
source language (here it is typed IL code transpiled to C), rather than fixing the problems of C.
Example (extract from a larger output; module name is 'h'):
#define asi64(x) *(i64*)&x
#define tou64(x) (u64)x
extern i32 printf(u64 $1, ...);
extern void exit(i32);
void h_main();
int main(int nargs, char** args) {
h_main();
}
void h_main() {
u64 R1, R2;
i64 a;
i64 b;
i64 c;
asi64(R1) = b;
asi64(R2) = c;
asi64(R1) += asi64(R2);
a = asi64(R1);
asi64(R1) = a;
R2 = tou64("hello %lld\n");
printf(asu64(R2), asi64(R1));
R1 = 0;
exit(R1);
return;
}
Another scenario is where you are just writing C code and want that same
behaviour.
That's a great deal more common than the transpiler situation. But it
is still far rarer than many people think. In general, people don't
want their integer arithmetic to overflow - doing so is a bug, no
matter what the results.
They want to do arbitrary conversions and type-punning. They want to use unions in whatever way they like without worrying that it may or may not
be UB.
I was talking
about defining the behavior that the C standard itself leaves
undefined,
in ways that make things convenient for the developer.
The developer of the C implementation, or the C application?
I don't often use intermediate C code now, but that code is no longer
portable among C compilers. It is for gcc only, and requires:
-fno-strict-aliasing
I recommend adding that as a pragma, not expecting people (yourself)
to remember it as a command-line option.
I've tried pragmas before, but they only worked on gcc/Windows; they
seemed to be ignored on gcc/Linux. For example for '-fno-builtin'. But
maybe I'll try it again.
Still, the entire build process for any of my programs, when expressed
as C, is still one command line involving one source file.
It is needed if you faff around with converting pointer types - lying
to your compiler by saying "this is a pointer to type A" when you are
setting it to the address of an object of type B.
Why would I be lying if I clearly use a cast to change T* (or some
integer X) to U*?
On 17/04/2026 15:49, Bart wrote:
On 17/04/2026 13:37, David Brown wrote:
On 17/04/2026 13:27, Bart wrote:
Signed integer overflow is the one that everyone knows (though oddly
it is not listed in Appendix J.2, or if it is, it doesn't use the
word 'overflow'!).
"An exceptional condition occurs during the evaluation of an
expression (6.5.1)"
You are correct that it does not use the word "overflow" - it's a bit
more generic than that.
I think there are other obscure ones to do with the order you read
and write members of unions, or apply type-punning, or what you can
do with pointers.
A common scenario is where someone is implementing a language where
such things are well-defined, and they want to run it on a target
machine where they are also well-defined, but decide to use C as an
intermediate language.
That is an extraordinarily /uncommon/ scenario.
Lots of languages do this. A few that you may have heard of include
Haxe, Seed7, Nim, FreeBasic and Haskell. Although with some it will be
an option.
Do all these languages support type-punning, unions, and signed integer arithmetic overflow defined in the way you think? I know Haskell does
not, I don't imagine FreeBasic does, but I can't answer for the others.
Well, this is the problem.
If you feel it is a problem for /you/, then I can't argue against that -
but it is /your/ problem.
#define asi64(x) *(i64*)&x
This is an extremely bad way to do conversions. It is possibly the
reason you need the "-fno-strict-aliasing" flag. Prefer to use value casts, not pointer casts, as you do with "tou64".
extern i32 printf(u64 $1, ...);
extern void exit(i32);
Why would you declare these standard library functions like that? Using "printf" will be UB, as the declaration does not match the definition.
It might happen to work on x86, but some platform ABIs pass pointers and integers in different registers.
And you are doing all this with uninitialised variables, which is UB.
It's not C that's the problem here. If you see problems, it is because
you are pretending that C is something that it is not, and that you can write all sorts of risky nonsense.
Why would I be lying if I clearly use a cast to change T* (or some
integer X) to U*?
C requires you to access objects using lvalues of the appropriate type.
But it also allows conversions between various pointer types - that is
how you can have generic and flexible code (such as using malloc returns
for different types). So if you have a pointer "p" of type "T*", and
you write "(U*) p", you are telling the compiler "I know I said p was a pointer to objects of type T, but in this particular case it is actually pointing to an object of type U - the value contained in p started off
as the address of a U, before it was converted to a T*". If the thing
your pointer "p" points to is /not/ an object of type U (or other
suitable type, following the compatibility and qualifier rules), then
you are lying to the compiler.
On 17/04/2026 15:45, David Brown wrote:
On 17/04/2026 15:49, Bart wrote:
On 17/04/2026 13:37, David Brown wrote:
On 17/04/2026 13:27, Bart wrote:
Signed integer overflow is the one that everyone knows (though oddly it is not listed in Appendix J.2, or if it is, it doesn't use the word 'overflow'!).
"An exceptional condition occurs during the evaluation of an expression (6.5.1)"
You are correct that it does not use the word "overflow" - it's a bit more generic than that.
I think there are other obscure ones to do with the order you read and write members of unions, or apply type-punning, or what you can do with pointers.
A common scenario is where someone is implementing a language where such things are well-defined, and they want to run it on a target machine where they are also well-defined, but decide to use C as an intermediate language.
That is an extraordinarily /uncommon/ scenario.
Lots of languages do this. A few that you may have heard of include Haxe, Seed7, Nim, FreeBasic and Haskell. Although with some it will be an option.
Do all these languages support type-punning, unions, and signed integer arithmetic overflow defined in the way you think? I know Haskell does not, I don't imagine FreeBasic does, but I can't answer for the others.
I've no idea. You just said it was uncommon to use C in this way. But
every other amateur compiler project on Reddit forums likes to use a C target.
Well, this is the problem.
If you feel it is a problem for /you/, then I can't argue against that - but it is /your/ problem.
It is a problem when using C for this purpose, which wouldn't arise
using a language designed to be used as an intermediate target.
#define asi64(x) *(i64*)&x
This is an extremely bad way to do conversions. It is possibly the reason you need the "-fno-strict-aliasing" flag. Prefer to use value casts, not pointer casts, as you do with "tou64".
For this purpose, the C has to emulate a stack machine, with the stack
slots being a fixed type (u64) which have to contain signed and unsigned integers, floats or doubles, or any kinds of pointer, or even any
arbitrary struct or array, by value.
One option was to use a union type for each stack element, but I decided
my choice would give cleaner code.
extern i32 printf(u64 $1, ...);
extern void exit(i32);
Why would you declare these standard library functions like that? Using "printf" will be UB, as the declaration does not match the definition.
On most 64-bit machines these days you have float and non-float register banks. Pointers are non-floats so can be handled like ints.
In the IL that this C comes from, there are no pointer types. The
convention is to use 'u64' to represent addresses.
It might happen to work on x86, but some platform ABIs pass pointers and integers in different registers.
The only one I know off-hand is 68K, which has separate data and address registers, and that might happen, so I'll keep it in mind!
(There is a slim chance I can target 68K from my IL, via an emulator
that I would make, but likely I wouldn't be able to use a C library
anyway. It's funny I remember thinking around 1984 that those dual
register files would make it tricky to compile for.)
And you are doing all this with uninitialised variables, which is UB.
So this is something else. There should be no problem with using
uninitialised data here, other than not being meaningful or useful.
They're uninitialised because my test program didn't bother to do so.
But running the program shouldn't be a problem. Why, what do you think C might do that is so bad?
Here is the original HLL fragment:
int a, b, c
a := b + c
This is the portable IL generated from that:
i64 x
i64 y
i64 z
load y i64 # load/store = push/pop
load z i64
add i64
store x i64
This uses two stack slots. In the C above, those slots are called R1 and R2.
The same IL can be turned directly into x64 code:
R.a = D3 # D0-D15 are 64-bit regs
R.b = D4
R.c = D5
mov D0, R.b
add D0, R.c
mov R.a, D0
(This could be reduced to one instruction, but it's no faster.)
AFAIK no hardware exceptions are caused by adding whatever bit patterns happen to be in those 'b' and 'c' registers.
It's not C that's the problem here. If you see problems, it is because you are pretending that C is something that it is not, and that you can write all sorts of risky nonsense.
I use generated C for three things:
* To share my non-C programs with others, who can't/won't use my
compiler binary
* To optimise my non-C programs
* To run my non-C programs on platforms I don't directly support.
When I do that, it seems to work. Eg. Paul Edwards is using my C
compiler, and I distribute it as a 66Kloc file full of code like my
example. But I can also distribute it as NASM, AT&T or MASM assembly.
Why would I be lying if I clearly use a cast to change T* (or some integer X) to U*?
If I understand from several of your posts, you might expect that the standard C requires you to access objects using lvalues of the appropriate type. But it also allows conversions between various pointer types - that is
how you can have generic and flexible code (such as using malloc returns for different types). So if you have a pointer "p" of type "T*", and
you write "(U*) p", you are telling the compiler "I know I said p was a pointer to objects of type T, but in this particular case it is actually pointing to an object of type U - the value contained in p started off
as the address of a U, before it was converted to a T*". If the thing your pointer "p" points to is /not/ an object of type U (or other
suitable type, following the compatibility and qualifier rules), then
you are lying to the compiler.
I don't care about any of this. If I take a byte* pointer and cast it to int* and then write via that version, I expect it to do exactly that,
and not question my choice!
This is exactly how it works in assembly, in my HLLs, and in my ILs.
It's possible that I may have done that erroneously, but that is another matter. This is not about detecting coding bugs in the source language.
On Thu, 2026-04-16 at 18:42 +0800, wij wrote:
(Continue)
On Wed, 2026-04-15 at 19:04 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
On Wed, 2026-04-15 at 17:14 -0700, Tim Rentsch wrote:
wij <wyniijj5@gmail.com> writes:
[... comparing C and assembly language ...]
Gentlemen,
I understand the natural reaction to want to respond to the kind of statements being made in this thread. I hope y'all can resist this natural reaction and not respond to people who persist in making arguments that are basically isomorphic to saying 1 equals 0.
Thank you for your assistance in this matter.
Maybe you are right. I say A is-a B; one persists in reading A is (exactly) B.
I provide help with using assembly. One persists in reading that I am persuading people to use assembly and give up HLLs. What is going on here?
You say that C is an assembly language. Nobody here thinks that
you're *equating* C and assembly language. It's obvious that
there are plenty of assembly languages that are not C, and nobody
has said otherwise. I have no idea why you think anyone has that particular confusion.
At least one person has apparently interpreted your defense of
assembly language (that it isn't as scary as some think it is)
as a claim that we should program in assembly language rather
than in HLLs. You're right, that was a misinterpretation of what
you wrote. I considered mentioning that, but didn't bother.
The issue I've been discussing is your claim that C is an assembly language. It is not.
If I said C is assembly, it is in the sense that I have at least shown in the last
post (s_tut2.cpp), where even an 'instruction' can be any function (e.g. change
directory, copy files, launch an editor, ...). And also, what 'computation' is
was demonstrated, which includes a suggestion of what C is, essentially any program,
and in this sense what an HLL is. Finally, it could demonstrate the meaning of, and
testify to, the Church-Turing thesis (my words: no computation language, including
various kinds of math formula, can exceed the expressive power of a TM).
It seems you insist C and assembly have to be exactly what your bible says. If
so, I would say that what the C standard (I cannot read it) says is the meaning of the terminology in it, not intended to be used in any other situation.
I do not intend to post again in this thread until and unless you
post something substantive on that issue.
(continue)
IMO, the C standard is like a book of legal terms. Like many symbols in a header file, it defines one symbol in terms of another symbol. The real meaning is not fixed. The result is that you cannot 'prove' the correctness of a source program; even consistency is a problem.
Is 'instruction' low-level? Yes, by definition, but not as one might think. An instruction could refer to a processing unit (which might be like the x87 math co-processor, which may even be higher level, processing expressions, ...). A good chance for C is to find a good function that can be hardwired.
So, the basic feature of an HLL is 'structured' (or 'nested') text, which removes labels. Semantics is the inventor's imagination. So, avoid bizarre complexity; it won't add expressive power to the language, it is just a matter of short or lengthy expression of a programming idea.
On 17/04/2026 18:42, Bart wrote:
I've no idea. You just said it was uncommon to use C in this way. But
every other amateur compiler project on Reddit forums likes to use a C
target.
You didn't simply claim that people were using C as an intermediary
language - you claimed they were doing so specifically for languages
that defined things like type punning, wrapping signed integer
arithmetic, and messing about with pointers.
It is a problem when using C for this purpose, which wouldn't arise
using a language designed to be used as an intermediate target.
C is not designed for that purpose, nor are C compilers.
So if you
don't like C here, don't use it. It is not the fault of C, its language designers, or toolchain implementers. And if this really were the
problem you seem to think, people would use something else.
As it turns out, people /do/ use something else. There are countless virtual machines with their own byte-codes, specialised for different
types of source languages. And there is a common intermediary language used by a lot of tools - LLVM "assembly". This /was/ designed for that purpose, and does a pretty good job at it.
And if you don't like it (of course you don't like it - you didn't
invent it), find or make something else.
One option was to use a union type for each stack element, but I
decided my choice would give cleaner code.
Oh, right - you knew of a correct solution, but decided instead that something broken would be cleaner.
So you think UB is better than doing things correctly, and then you
complain when C doesn't have the semantics you want?
So this is something else. There should be no problem with using
uninitialised data here, other than not being meaningful or useful.
Again - you are pretending that C means what you think it should mean.
Using uninitialised local data leads, in most cases, to UB in C. If
your language treats uninitialised data as unspecified values, or
default initialised (typically to 0), or has some other determined behaviour, then you need to implement that behaviour in the generated C code. You don't get to generate C and pretend it means something different.
On 18/04/2026 14:37, David Brown wrote:
On 17/04/2026 18:42, Bart wrote:
I've no idea. You just said it was uncommon to use C in this way. But
every other amateur compiler project on Reddit forums likes to use a C
target.
You didn't simply claim that people were using C as an intermediary
language - you claimed they were doing so specifically for languages
that defined things like type punning, wrapping signed integer
arithmetic, and messing about with pointers.
The broader picture is being forgotten. The thread is partly about C
being a 'portable assembler', and this is a common notion.
C is famous for being low level; being close to the hardware; for a
1:1 correspondence between types that people work with, and the
operations on those, with the equivalent assembly.
Whether that is correct or not, that is what people think or say, and
what many assume.
It is also what very many want, including me.
Yes, I know. There should have been one that is much better - a HLL,
not the monstrosity that is LLVM. But it doesn't exist.
Oh, right - you knew of a correct solution, but decided instead that
something broken would be cleaner.
Well, it shouldn't BE broken! That's the problem with C.
So you think UB is better than doing things correctly, and then you
complain when C doesn't have the semantics you want?
I'm saying that a lot of things shouldn't be UB. Some people just want
to write assembly - for a specific machine - but also want HLL conveniences.
So this is something else. There should be no problem with using
uninitialised data here, other than not being meaningful or useful.
Again - you are pretending that C means what you think it should
mean. Using uninitialised local data leads, in most cases, to UB in
C. If your language treats uninitialised data as unspecified
values, or default initialised (typically to 0), or has some other
determined behaviour, then you need to implement that behaviour in
the generated C code. You don't get to generate C and pretend it
means something different.
In a real application, using uninitialised data, outside of .bss,
would be uncommon, and likely be a bug.
But outside of a real application, such as in fragments of test code
that I work on every day, then variables can be uninitialised,
especially if I'm interested more in the code that is being generated
and will not actually run it.
It seems that a C compiler cannot make that distinction and must
always assume that every program, even in development, is
mission-critical.
In my original example, they weren't initialised in order to keep the
posted examples short.
I still wouldn't call A + B undefined behaviour when A/B are not
initialised; the result is the sum of whatever A and B happen to
contain, and is little different from:
A = rand();
B = rand();
A + B;
It's a common wrong notion. [...]
If you are talking about function-local data, there are multiple
ways to store it in an easy-to-clean-up fashion:
On 18/04/2026 14:37, David Brown wrote:
On 17/04/2026 18:42, Bart wrote:
I've no idea. You just said it was uncommon to use C in this way. But
every other amateur compiler project on Reddit forums likes to use a
C target.
You didn't simply claim that people were using C as an intermediary
language - you claimed they were doing so specifically for languages
that defined things like type punning, wrapping signed integer
arithmetic, and messing about with pointers.
The broader picture is being forgotten. The thread is partly about C
being a 'portable assembler', and this is a common notion.
C is famous for being low level; being close to the hardware; for a 1:1 correspondence between types that people work with, and the operations
on those, with the equivalent assembly.
Whether that is correct or not, that is what people think or say, and
what many assume.
It is also what very many want, including me.
This particular use-case for C as an intermediate language is one
example, a good one as it highlights the issues. But I also want all
those assumptions to be true.
(In my systems language, it is a lot truer than in C. But my language supports a small number of targets, and usually one at a time.)
It is a problem when using C for this purpose, which wouldn't arise
using a language designed to be used as an intermediate target.
C is not designed for that purpose, nor are C compilers.
Yes, I know. There should have been one that is much better - a HLL, not
the monstrosity that is LLVM. But it doesn't exist.
If it did, then it could have served another purpose for which C is currently used and is not ideal either, which is to express APIs of libraries. Currently that is too C-centric and it is a big task to
translate into bindings for other languages.
(For example, the headers for GTK2 include about 4000 C macro definitions.)
So if you don't like C here, don't use it. It is not the fault of
C, its language designers, or toolchain implementers. And if this
really were the problem you seem to think, people would use something
else.
There /is/ nothing else. C is the best of a bad bunch.
One option was to use a union type for each stack element, but I
decided my choice would give cleaner code.
Oh, right - you knew of a correct solution, but decided instead that
something broken would be cleaner.
Well, it shouldn't BE broken! That's the problem with C.
So you think UB is better than doing things correctly, and then you
complain when C doesn't have the semantics you want?
I'm saying that a lot of things shouldn't be UB. Some people just want
to write assembly - for a specific machine - but also want HLL conveniences.
So this is something else. There should be no problem with using
uninitialised data here, other than not being meaningful or useful.
Again - you are pretending that C means what you think it should mean.
Using uninitialised local data leads, in most cases, to UB in C. If
your language treats uninitialised data as unspecified values, or
default initialised (typically to 0), or has some other determined
behaviour, then you need to implement that behaviour in the generated
C code. You don't get to generate C and pretend it means something
different.
In a real application, using uninitialised data, outside of .bss,
would be uncommon, and likely be a bug.
But outside of a real application, such as in fragments of test code
that I work on every day, then variables can be uninitialised,
especially if I'm interested more in the code that is being generated
and will not actually run it.
It seems that a C compiler cannot make that distinction and must always assume that every program, even in development, is mission-critical.
In my original example, they weren't initialised in order to keep the
posted examples short.
I still wouldn't call A + B undefined behaviour when A/B are not initialised; the result is the sum of whatever A and B happen to
contain, and is little different from:
A = rand();
B = rand();
A + B;
On 18/04/2026 17:08, Bart wrote:
(Yes, LLVM and the tools around it are big. It takes a lot of effort to make use of them, but you get a lot in return. A "little language" has
to grow to a certain size in numbers of toolchain developers and numbers
of toolchain users before it can make sense to move to LLVM.
But doing
so is still a fraction of the work compared to making a serious
optimising back-end for multiple targets.)
If it did, then it could have served another purpose for which C is
currently used and is not ideal either, which is to express APIs of
libraries. Currently that is too C-centric and it is a big task to
tranlate into bindings for other languages.
(For example, the headers for GTK2 include about 4000 C macro
definitions.)
And yet in practice C is good enough for almost all cases.
A C compiler expects code written in valid C. Compilers expect code to
be run - I don't think that is unreasonable.
And when I use a compiler
to look at generated assembly for some C code (and I do that quite
often), I am using C code that has a meaning if it were to be run.
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
(Yes, LLVM and the tools around it are big. It takes a lot of effort
to make use of them, but you get a lot in return. A "little language"
has to grow to a certain size in numbers of toolchain developers and
numbers of toolchain users before it can make sense to move to LLVM.
Actually lots of small projects use LLVM.
But probably people don't realise it is like installing the engine from
a container ship into your small family car.
A C compiler expects code written in valid C. Compilers expect code
to be run - I don't think that is unreasonable.
What's not valid about 'a = b + c'?
And when I use a compiler to look at generated assembly for some C
code (and I do that quite often), I am using C code that has a meaning
if it were to be run.
I'm interested too, but if I compile this in godbolt:
void F() {
int a, b, c;
a = b + c * 8;
}
then all the C compilers I tried generated code at -O0 which kept those variables in memory.
What does the code look like when a/b/c are kept in registers? I've no
idea, because as soon as you try -O1 and above, the whole expression is elided.
If you stick 'static' in front, then the whole function disappears. This
is not very useful when trying to compare code generation across
compilers and languages!
If I do something meaningful with 'a' to keep the expression alive, and initialise b and c, then the whole expression is reduced to a constant.
What do you have to do to see if the expression would be compiled to, for example, 'lea ra, [rb + rc*8]'?
Bart <bc@freeuk.com> writes:
On 18/04/2026 14:37, David Brown wrote:
On 17/04/2026 18:42, Bart wrote:
I've no idea. You just said it was uncommon to use C in this way. But
every other amateur compiler project on Reddit forums likes to use a C
target.
You didn't simply claim that people were using C as an intermediary
language - you claimed they were doing so specifically for languages
that defined things like type punning, wrapping signed integer
arithmetic, and messing about with pointers.
The broader picture is being forgotten. The thread is partly about C
being a 'portable assembler', and this is a common notion.
It's a common wrong notion.
One person here recently claimed that C is a kind of assembly
language.
Yes, I know. There should have been one that is much better - a HLL,
not the monstrosity that is LLVM. But it doesn't exist.
Given your habit of inventing your own languages and writing your own compilers, I'm surprised you haven't defined your own intermediate
language, something like LLVM IR but suiting your purposes better.
You're complaining about a problem that *you* might be in a position
to address.
On 18/04/2026 17:08, Bart wrote:
The broader picture is being forgotten. The thread is partly about C
being a 'portable assembler', and this is a common notion.
It is a common misconception - and I believe we agree it is a
misconception.
C is famous for being low level; [...]
I describe C as being a relatively low level high-level language. [...]
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
(Yes, LLVM and the tools around it are big. It takes a lot of
effort to make use of them, but you get a lot in return. A "little language" has to grow to a certain size in numbers of toolchain
developers and numbers of toolchain users before it can make sense
to move to LLVM.
Actually lots of small projects use LLVM.
But probably people don't realise it is like installing the engine
from a container ship into your small family car.
But doing
so is still a fraction of the work compared to making a serious
optimising back-end for multiple targets.)
If it did, then it could have served another purpose for which C
is currently used and is not ideal either, which is to express
APIs of libraries. Currently that is too C-centric and it is a big
task to translate into bindings for other languages.
(For example, the headers for GTK2 include about 4000 C macro
definitions.)
And yet in practice C is good enough for almost all cases.
It is not even good enough. To get back to GTK2 (which I looked at
in detail some years back), compiling this program:
#include <gtk2.h>
involved processing over 1000 #includes, some 550 discrete headers,
330K lines of declarations, with a bunch of -I options to tell it the
dozen different folders it needs to go and look for those headers.
I was looking at reducing the whole thing to one file - a set of
bindings in my language for the functions, types etc that are exposed.
This file would have been 25Kloc in my language (including those 4000
macros; most would have been simple #defines, but many would have
needed manual translation: macros can contain actual C code, not just declarations).
HOWEVER... if such an exercise works for my language, why can't it
work for C too? That is, reduce those 100s of header files and dozens
of folders into a single 25Kloc file, specific to your platform.
Think how much easier it would be to install, or employ, and how
much faster to /compile/!
So why doesn't this happen? The equivalent exercise for SDL2 would
reduce 50Kloc across 80 header files (at least these are in the same
folder) to one 3Kloc file.
A C compiler expects code written in valid C. Compilers expect
code to be run - I don't think that is unreasonable.
What's not valid about 'a = b + c'?
And when I use a compiler
to look at generated assembly for some C code (and I do that quite
often), I am using C code that has a meaning if it were to be run.
I'm interested too, but if I compile this in godbolt:
void F() {
int a, b, c;
a = b + c * 8;
}
then all the C compilers I tried generated code at -O0 which kept
those variables in memory.
What does the code look like when a/b/c are kept in registers? I've
no idea, because as soon as you try -O1 and above, the whole
expression is elided.
If you stick 'static' in front, then the whole function disappears.
This is not very useful when trying to compare code generation across compilers and languages!
If I do something meaningful with 'a' to keep the expression alive,
and initialise b and c, then the whole expression is reduced to a
constant.
What do you have to do to see if the expression would be compiled to,
for example, 'lea ra, [rb + rc*8]'?
On 19/04/2026 13:50, Bart wrote:
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
(Yes, LLVM and the tools around it are big. It takes a lot of effort
to make use of them, but you get a lot in return. A "little
language" has to grow to a certain size in numbers of toolchain
developers and numbers of toolchain users before it can make sense to
move to LLVM.
Actually lots of small projects use LLVM.
But probably people don't realise it is like installing the engine
from a container ship into your small family car.
The strange thing about the software world is that it does not matter.
I do appreciate liking things to be small, simple and efficient.
Sometimes that is important - in my own work, it is very often
important. But often it doesn't matter at all. There are other things more worthy of our time and effort.
Would it be better if the gcc toolchain installation for the cross
compiler I use were 1 MB of installation over 20 files, rather than
whatever it is now?
void F() {
int a, b, c;
a = b + c * 8;
}
So you want to know how the compiler deals with meaningless code. Why?
Do you not know how to write meaningful code?
then all the C compilers I tried generated code at -O0 which kept
those variables in memory.
They are on the stack in memory, yes. You've asked for close to a
direct and naïve translation, which gives no insight into what kind of
code the compiler can generate and is harder to follow (because it's
mostly moving things onto and off from the stack).
What does the code look like when a/b/c are kept in registers? I've no
idea, because at soon as you try -O1 and above, the whole expression
is elided.
If you stick 'static' in front, then the whole function disappears.
This is not very useful when trying to compare code generation across
compilers and languages!
If I do something meaningful with 'a' to keep the expression alive,
and initialise b and c, then the whole expression is reduced to a
constant.
What do you have to do to see if the expression would be compiled to, for
example, 'lea ra, [rb + rc*8]'?
int f(int b, int c)
{
int a;
a = b + c * 8;
return a;
}
If you don't want to use parameters and return values, I recommend
declaring externally linked volatile variables and use them as the
source and destination of your calculations:
volatile int xa;
volatile int xb;
volatile int xc;
void foo(void) {
int a, b, c;
b = xa;
c = xc;
a = b + c * 8;
xa = a;
}
When you ask the compiler "give me an efficient implementation of this
code" and the compiler can see that the code does nothing, it generates
no code (or just a "ret"). This should not be a surprise. So you might
need "tricks" to make the code mean something - access to volatile
objects is one such trick.
On Sun, 19 Apr 2026 12:50:04 +0100
Bart <bc@freeuk.com> wrote:
It is not even good enough C. To get back to GTK2 (which I looked at
in detail some years back), compiling this program:
#include <gtk2.h>
involved processing over 1000 #includes, some 550 discrete headers,
330K lines of declarations, with a bunch of -I options to tell it the
dozen different folders it needs to go and look for those headers.
I was looking at reducing the whole thing to one file - a set of
bindings in my language for the functions, types etc that are exposed.
This file would have been 25Kloc in my language (including those 4000
headers; most would have been simple #defines, but many will have
needed manual translation: macros can contain actual C code, not just
declarations).
HOWEVER... if such an exercise works for my language, why can't it
work for C too? That is, reduce those 100s of header files and dozens
of folders into a single 25Kloc file, specific to your platform.
Think how much easier it would be to install, or employ, and how
much faster to /compile/!
It would be faster to compile. Probably meaningfully faster for
compiling a large GUI project from scratch with a very slow compiler like
gcc. Probably not meaningfully faster in other situations.
It would not be easier to install or employ unless one happens to be as stubborn as you are.
If I ever want to write code using GTK2 for hobby purposes, which is
extremely unlikely, then all I'd need to do is type 'pacman -S mingw-w64-ucrt-x86_64-gtk2' at the msys2 command prompt. That's all.
For somebody on Debian/Ubuntu it likely would be 'apt-get install
gtk2'. RHEL/Fedora, MSVC command prompt or Mac it would be some other
magic incantation. Except that for the latter two it's probably not
available at all, so even easier.
The point is - it's already so easy that you can't really make it any
easier, at best the same.
On 19/04/2026 13:17, David Brown wrote:
On 19/04/2026 13:50, Bart wrote:
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
void F() {
int a, b, c;
a = b + c * 8;
}
So you want to know how the compiler deals with meaningless code.
Why? Do you not know how to write meaningful code?
I don't want the compiler deciding what's a meaningful program. The
intent here is clear:
* Allocate 3 local slots for int
* Add the contents of two of those, and store into the third.
That is the task. In terms of observable effects, there are at least two: the code that is generated, and the time it might take to execute.
There is also the code size, and the compilation time.
One of my favourite compilation benchmarks is this:
void F() {
int a, b=2, c=3, d=4;
a = b + c * d;
.... // repeat N times
printf("%d\n", a);
}
Here initialisation is used otherwise it causes problems with
interpreted languages for example.
It is amazing how many language implementations have trouble with this, especially with bigger N such as 1000000. The bigger ones usually fare worse.
This program is not meaningful; it is simply a stress test. Two more
observable effects are at what N it fails, and whether it crashes or
fails gracefully.
then all the C compilers I tried generated code at -O0 which kept
those variables in memory.
They are on the stack in memory, yes. You've asked for close to a
direct and naïve translation, which gives no insight into what kind of
code the compiler can generate and is harder to follow (because it's
mostly moving things onto and off from the stack).
It's easier to follow. Or it would be, if the compiler were to generate
decent assembly. gcc -O0 produces:
F:
pushq %rbp
movq %rsp, %rbp
movl -4(%rbp), %eax
leal 0(,%rax,8), %edx
movl -8(%rbp), %eax
addl %edx, %eax
movl %eax, -12(%rbp)
nop
popq %rbp
ret
What does the code look like when a/b/c are kept in registers? I've
no idea, because at soon as you try -O1 and above, the whole
expression is elided.
If you stick 'static' in front, then the whole function disappears.
This is not very useful when trying to compare code generation across
compilers and languages!
If I do something meaningful with 'a' to keep the expression alive,
and initialise b and c, then the whole expression is reduced to a
constant.
What do you have to do to see if the expression would be compiled to,
for example, 'lea ra, [rb + rc*8]'?
int f(int b, int c)
{
int a;
a = b + c * 8;
return a;
}
If you don't want to use parameters and return values, I recommend
declaring externally linked volatile variables and use them as the
source and destination of your calculations:
volatile int xa;
volatile int xb;
volatile int xc;
void foo(void) {
int a, b, c;
b = xa;
c = xc;
a = b + c * 8;
xa = a;
}
When you ask the compiler "give me an efficient implementation of this
code" and the compiler can see that the code does nothing, it
generates no code (or just a "ret"). This should not be a surprise.
So you might need "tricks" to make the code mean something - access to
volatile objects is one such trick.
So, you have to spend time fooling the compiler. And then you are never quite sure if it has left something out so that you're not comparing
like with like.
However, this is a perfect example of how even a language and especially
its compilers differ from assembly and assemblers.
It can happen with my compilers too, but on a much smaller scale. For example 'a = 2 + 2' is reduced to 'a = 4'. But it is easier to get around.
On 19/04/2026 16:28, Bart wrote:
On 19/04/2026 13:17, David Brown wrote:
On 19/04/2026 13:50, Bart wrote:
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
void F() {
int a, b, c;
a = b + c * 8;
}
So you want to know how the compiler deals with meaningless code.
Why? Do you not know how to write meaningful code?
I don't want the compiler deciding what's a meaningful program. The
intent here is clear:
No, the intent is not clear. If you are writing in C, and you intend
the code to have a definite meaning, you have to write that meaning in
C. Break C's rules, and the code does not have meaning as a whole - and compilers cannot be expected to guess what you meant, especially when
you ask them to analyse your code carefully to generate optimised output.
* Allocate 3 local slots for int
* Add the contents of two of those, and store into the third.
That is not what you wrote - because that's not what the C means.
As you pointed out yourself, C is not assembly. It does not have a
direct meaning like this.
Stress tests of tools can be useful. I would not say something like
this is useful as a compilation benchmark - I want my tools to be fast enough for practical use on the real code I write, and don't care how
slow they are for totally meaningless and unrealistic code.
But if I
were writing a tool, I'd like to know how well it handled extreme cases.
(Sometimes generated C code has functions with huge numbers of simple lines, totally unlike code that anyone would write by hand.
It would
not have pointless repetition of lines, however.)
So, you have to spend time fooling the compiler. And then you are
never quite sure if it has left something out so that you're not
comparing like with like.
Sorry, I thought I was being helpful so that you would understand how to
get the results you are asking for from compilers. I am not "fooling"
the compiler, I am showing you how to ask the right questions.
On 19/04/2026 16:47, David Brown wrote:
On 19/04/2026 16:28, Bart wrote:
On 19/04/2026 13:17, David Brown wrote:
On 19/04/2026 13:50, Bart wrote:
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
void F() {
int a, b, c;
a = b + c * 8;
}
So you want to know how the compiler deals with meaningless code.
Why? Do you not know how to write meaningful code?
I don't want the compiler deciding what's a meaningful program. The
intent here is clear:
No, the intent is not clear. If you are writing in C, and you intend
the code to have a definite meaning, you have to write that meaning in
C. Break C's rules, and the code does not have meaning as a whole -
and compilers cannot be expected to guess what you meant, especially
when you ask them to analyse your code carefully to generate optimised
output.
* Allocate 3 local slots for int
* Add the contents of two of those, and store into the third.
That is not what you wrote - because that's not what the C means.
I forgot the scaling of 'c'.
As you pointed out yourself, C is not assembly. It does not have a
direct meaning like this.
I don't understand what else it can possibly mean.
Get the value of 'b',
whatever it happens to be, add the value of 'c'
scaled by 8, and store the result in 'a'. The only things to
consider are that some intermediate results may lose the top bits.
Is 'a = b' equally undefined? If so, then C is even crazier than I'd thought.
Stress tests of tools can be useful. I would not say something like
this is useful as a compilation benchmark - I want my tools to be fast
enough for practical use on the real code I write, and don't care how
slow they are for totally meaningless and unrealistic code.
Meaningless and unrealistic are what stress tests and benchmarks are!
But they can also give useful insights, highlight shortcomings, and can
be used to compare implementations.
I think if I used a real program such as sqlite3.c, you still wouldn't
care about my results.
But if I were writing a tool, I'd like to know how well it handled
extreme cases. (Sometimes generated C code has functions with huge
numbers of simple lines, totally unlike code that anyone would write
by hand.
It would not have pointless repetition of lines, however.)
Then it becomes much, much harder to have a simple test that can be used
for practically any language.
As a matter of interest, I tried 1 million lines of 'a=b+c*d' now. These
are some results:
gcc -O0 560 seconds
Tiny C 1.7 seconds
bcc 2.0 seconds
mm 1.9 seconds (non-C); both these run unoptimised code
gcc likely uses some sort of SSA representation, meaning a new variable
for each intermediate result. Here it probably needs 5 million intermediates.
(From memory, it is faster when using optimise flags, because it can eliminate 99.9999% of those assignments, so there is less code to
process later. You know when that happens as the resulting EXE is too
small to be feasible.)
Here's an interesting one:
void F(){
L1:
L2:
...
L1000000:
;
}
Both bcc and tcc crash instantly, because in C, labels have a recursive definition in the grammar, and this causes a stack overflow.
But I can't tell you what gcc does, as I aborted it after 5 minutes.
(In my language, labels are just another statement, and this compiled in
1.5 seconds. However I had to increase the hashtable size as it doesn't
grow as needed.)
So, you have to spend time fooling the compiler. And then you are
never quite sure if it has left something out so that you're not
comparing like with like.
Sorry, I thought I was being helpful so that you would understand how
to get the results you are asking for from compilers. I am not
"fooling" the compiler, I am showing you how to ask the right questions.
The question was already posed as I wanted in my original fragment.
On 19/04/2026 01:35, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
On 18/04/2026 14:37, David Brown wrote:
On 17/04/2026 18:42, Bart wrote:
I've no idea. You just said it was uncommon to use C in this way. But
every other amateur compiler project on Reddit forums likes to use a C
target.
You didn't simply claim that people were using C as an intermediary
language - you claimed they were doing so specifically for languages
that defined things like type punning, wrapping signed integer
arithmetic, and messing about with pointers.
The broader picture is being forgotten. The thread is partly about C
being a 'portable assembler', and this is a common notion.
It's a common wrong notion.
One person here recently claimed that C is a kind of assembly
language.
'C being portable assembly' keeps coming up, not just here.
Yes, I know. There should have been one that is much better - a HLL,
not the monstrosity that is LLVM. But it doesn't exist.
Given your habit of inventing your own languages and writing your own
compilers, I'm surprised you haven't defined your own intermediate
language, something like LLVM IR but suiting your purposes better.
You're complaining about a problem that *you* might be in a position
to address.
If you read my post again, you'll see that I did exactly that.
On 19/04/2026 19:47, Bart wrote:
Get the value of 'b',
You can't do that. "b" has no value. "b" is indeterminate, and using
its value is UB - the code has no meaning right out of the gate.
When you use "b" in an expression, you are /not/ asking C to read the
bits and bytes stored at the address of the object "b". You are asking
for the /value/ of the object "b". How the compiler gets that value is
up to the compiler - it can read the memory, or use a stored copy in a register, or use program analysis to know what the value is in some
other way. And if the object "b" does not have a value, you are asking
the impossible.
Try asking a human "You have two numbers, b and c. Add them. What is
the answer?".
whatever it happens to be, add the value of 'c' scaled by 8, and store
the result into 'a'. The only things to consider are that some
intermediate results may lose the top bits.
Is 'a = b' equally undefined? If so, then C is even crazier than I'd
thought.
If "a" or "b" are indeterminate, then using them is undefined. I have
two things - are they the same colour? How is that supposed to make sense?
You keep thinking of objects like "b" as a section of memory with a bit pattern in it. Objects are not that simple in C - C is not assembly.
You mean when the object code is small because the compiler did a good job?
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
(Yes, LLVM and the tools around it are big. It takes a lot of effort to
make use of them, but you get a lot in return. A "little language" has
to grow to a certain size in numbers of toolchain developers and numbers
of toolchain users before it can make sense to move to LLVM.
Actually lots of small projects use LLVM.
But probably people don't realise it is like installing the engine from
a container ship into your small family car.
A C compiler expects code written in valid C. Compilers expect code to
be run - I don't think that is unreasonable.
What's not valid about 'a = b + c'?
And when I use a compiler
to look at generated assembly for some C code (and I do that quite
often), I am using C code that has a meaning if it were to be run.
I'm interested too, but if I compile this in godbolt:
void F() {
    int a, b, c;
    a = b + c * 8;
}
then all the C compilers I tried generated code at -O0 which kept those variables in memory.
What does the code look like when a/b/c are kept in registers? I've no
idea, because as soon as you try -O1 and above, the whole expression is elided.
If you stick 'static' in front, then the whole function disappears. This
is not very useful when trying to compare code generation across
compilers and languages!
If I do something meaningful with 'a' to keep the expression alive, and initialise b and c, then the whole expression is reduced to a constant.
What do you have to do to see whether the expression would be compiled to, for example, 'lea ra, [rb + rc*8]'?
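One way to sidestep the elision (a sketch, not the only option): pass the operands in as parameters and return the result, so the compiler cannot constant-fold or discard the expression. The function name F matches the earlier fragment; the int64_t types are an assumption.

```c
#include <stdint.h>

/* With operands as parameters and the result returned, the expression
   must survive optimisation. On x86-64, gcc and clang at -O1 and above
   typically compile the body to a single lea instruction. */
int64_t F(int64_t b, int64_t c) {
    return b + c * 8;
}
```

This keeps the example small enough to compare code generation across compilers in godbolt without the whole function disappearing.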
On 19/04/2026 20:32, David Brown wrote:
On 19/04/2026 19:47, Bart wrote:
Get the value of 'b',
You can't do that. "b" has no value. "b" is indeterminate, and using
its value is UB - the code has no meaning right out of the gate.
When you use "b" in an expression, you are /not/ asking C to read the
bits and bytes stored at the address of the object "b". You are
asking for the /value/ of the object "b". How the compiler gets that
value is up to the compiler - it can read the memory, or use a stored
copy in a register, or use program analysis to know what the value is
in some other way. And if the object "b" does not have a value, you
are asking the impossible.
Try asking a human "You have two numbers, b and c. Add them. What is
the answer?".
You have two slates A and B which someone should have wiped clean then written a new number on each.
But that part hasn't been done; they each still have an old number from their last use.
You can still add them together, nothing bad will happen. It just may be
the wrong answer if the purpose of the exercise was to find the sum of
two specific new numbers.
But the purpose may also be to see how good they are at adding. Or at
following instructions.
whatever it happens to be, add the value of 'c' scaled by 8, and
store the result into 'a'. The only things to consider are that
some intermediate results may lose the top bits.
Is 'a = b' equally undefined? If so, then C is even crazier than I'd
thought.
If "a" or "b" are indeterminate, then using them is undefined. I have
two things - are they the same colour? How is that supposed to make
sense?
You keep thinking of objects like "b" as a section of memory with a
bit pattern in it. Objects are not that simple in C - C is not assembly.
Why ISN'T it that simple? What ghastly thing would happen if it was?
"b" will be some location in memory or it might be some register, and it WILL have a value. That value happens to be unknown until it is
initialised.
So accessing it will return garbage (unless you know exactly what you
are doing then it may be something useful).
My original example was something like 'a = b + c' (I think in my
language), converted to my IL, then expressed in very low-level C.
You were concerned that in that C, the values weren't initialised. How
would that have affected the code that C compiler generated from that?
It's starting to appear that the compiler is more of the problem!
Because mine would certainly not be bothered by it and nobody would be scratching their heads wondering what surprises the compiler might have
in store.
Would the compiler have been happier with this:
int a, b = F(), c = F();
a = b + c;
If so, then suppose F was this:
int F() {int x; return x;}
When the body of F is not visible, then that cannot possibly affect what
is generated for 'a = b + c'.
So I'm still interested in what possible reason the compiler might have
for generating code that is any different in the absence of
initialisation. Warn about it, sure, but why do anything else?
THIS is why I try to stay away from using C intermediate code.
You mean when the object code is small because the compiler did a good
job?
It may do a good job of eliminating duplicate or redundant code. But
maybe you are measuring how well it copes with a certain quantity of
code, which when synthesised may well be duplicate or redundant.
Then it is not helpful that it discards most of it. How is that supposed
to give an accurate measure of how well it does when it really does need
to do it all?
It's like comparing car A and car B over a course (we've been here
before), but A's driver is using clever shortcuts. Or maybe he doesn't
even bother going anywhere if the course is circular.
That will give A an unfair advantage, and a misleading result. It could
be that B is actually faster, so somebody deciding to buy A based on
this test is going to be disappointed!
On 20/04/2026 01:36, Bart wrote:
In C, "b" is not any specific place. In optimising compilers, the implementation is unlikely to exist at all until it has a real value,
C programmers don't need to scratch their heads. They simply have to
write meaningful code. It's not rocket science.
(As a C implementer, you should have a better understanding of these
details than C programmers usually need.)
You are arguing that C is difficult to use as an intermediate language because you don't know what happens when you generate shite sort-of C code? Just generate valid C code that has meaning, and stop worrying.
So gcc is a more powerful compiler than yours, and that's not fair?
On 20/04/2026 07:25, David Brown wrote:
On 20/04/2026 01:36, Bart wrote:
In C, "b" is not any specific place. In optimising compilers, the
implementation is unlikely to exist at all until it has a real value,
It should come into existence when it is referenced. Then it will have a value.
Here for example:
int b;
if (rand()&1) b=0;
printf("%d", b);
'b' may or may not be initialised. But I expect the 'b' used in that
last line to exist somewhere and for the generated code to access that location. I'd also expect the same if the assignment was commented out.
Someone could write some actual code like my example, with an
unconditional assignment, but for various reasons has to temporarily
comment out that assignment.
It might be a function that is not called. Or it might be in a program
that is not run at all, because the developer is sorting out some build issue.
But according to you, that part of the code is UB, whether the program
is ever run or not, and so the whole thing is undefined.
That would be ludicrous.
C programmers don't need to scratch their heads. They simply have to
write meaningful code. It's not rocket science.
(As a C implementer, you should have a better understanding of these
details than C programmers usually need.)
I implement it in a common sense manner.
I don't say, Ah, 'x' might not
be initialised at this point, so it is UB, therefore I don't need to
bother compiling these remaining 100 lines, then the program will be
smaller and faster!
You are arguing that C is difficult to use as an intermediate language
because you don't know what happens when you generate shite sort-of C
code? Just generate valid C code that has meaning, and stop worrying.
My language allows you to do this:
int a, b
a := b
It is well-defined in the language, and I know it is well defined on all
my likely targets. (I think we're back where we started!)
However, we are generating C via an IL. The IL will be something like this:
local a
local b
...
load b
store a
Again, it is perfectly well-defined. Whatever bit-pattern in b is
transfered to a. In assembly, the same thing: b will be in memory or register.
All well and good. UNTIL we decided to involve C! Let's say everything
has u64 type:
u64 R1; # represents the one stack slot used
u64 a, b; # our local variables
Now we need to translate that load and store:
R1 = b;
a = R1;
This looks really easy, but no, C just has to make it UB.
So, how do you suggest this is fixed? Do I now have to do an in-depth analysis of that IL to figure out whether 'b' was initialised at this
point (there might be 100 lines of IL code in-between, including
conditional code and loops). Even if I find out it wasn't, what do I do about it?
Maybe a simpler solution: zero all locals whether necessary or not:
u64 a = 0, b = 0;
However, the point of using C may be to get a faster program. I don't
want unnecessary assignments which the C compiler may or may not be able
to elide.
Especially when declaring entire arrays or structs:
struct $B1 dd = {0};
struct $B1 ee = {0};
In any case, there will be a million other things that are probably UB
as well. How far do you go in trying to appease C?
So, C as an intermediate is undesirable, but the real reason is the compilers. A simple, dumb compiler like bcc or tcc is preferable (but
mine doesn't optimise and is for Windows, and tcc has its own problems).
This is why we needed C--.
So gcc is a more powerful compiler than yours, and that's not fair?
If it is effectively cheating at benchmarks, then no it isn't.
If you take recursive Fibonacci, then fib(N) is expected to execute
about 2*fib(N) function calls (for versions that start 'if (N<3) return
1').
However, using -Os or -O1, gcc's code only does half of those calls. And
with -O2/-O3, only 5%, via aggressive inlining and TCO (tail-call
optimisation).
So, no, that's not a fair comparison.
Fibonacci is supposed to be a
measure of how many calls/second a language implementation can make, and
the figure you'd get with gcc can be misleading.
(It might as well use memoisation, or be clever enough to convert to iterative form, then it can report an infinite number of calls per
second. That's really useful!
So, for gcc and Fibonacci, I now use -fno-inline and another to turn off TCO.)
On 19/04/2026 20:32, David Brown wrote:
On 19/04/2026 19:47, Bart wrote:
Get the value of 'b',
You can't do that. "b" has no value. "b" is indeterminate, and using
its value is UB - the code has no meaning right out of the gate.
When you use "b" in an expression, you are /not/ asking C to read the
bits and bytes stored at the address of the object "b". You are asking
for the /value/ of the object "b". How the compiler gets that value is
up to the compiler - it can read the memory, or use a stored copy in a
register, or use program analysis to know what the value is in some
other way. And if the object "b" does not have a value, you are asking
the impossible.
Try asking a human "You have two numbers, b and c. Add them. What is
the answer?".
You have two slates A and B which someone should have wiped clean then written a new number on each.
But that part hasn't been done; they each still have an old number from their last use.
You can still add them together, nothing bad will happen. It just may be
the wrong answer if the purpose of the exercise was to find the sum of
two specific new numbers.
But the purpose may also be see how good they are adding. Or in
following instructions.
whatever it happens to be, add the value of 'c' scaled by 8, and store
the result it into 'a'. The only things to consider are that some
intermediate results may lose the top bits.
Is 'a = b' equally undefined? If so, then C is even crazier than I'd
thought.
If "a" or "b" are indeterminate, then using them is undefined. I have
two things - are they the same colour? How is that supposed to make sense?
You keep thinking of objects like "b" as a section of memory with a bit
pattern in it. Objects are not that simple in C - C is not assembly.
Why ISN'T it that simple? What ghastly thing would happen if it was?
"b" will be some location in memory or it might be some register, and it WILL have a value. That value happens to be unknown until it is initialised.
So accessing it will return garbage (unless you know exactly what you
are doing then it may be something useful).
My original example was something like 'a = b + c' (I think in my
language), converted to my IL, then expressed in very low-level C.
You were concerned that in that C, the values weren't initialised. How
would that have affected the code that C compiler generated from that?
So I'm still interested in what possible reason the compiler might have
for generating code that is any different in the absence of
initialisation. Warn about it, sure, but why do anything else?
THIS is why I try to stay away from using C intermediate code.
You mean when the object code is small because the compiler did a good job?
It may do a good job of eliminating duplicate or redundant code. But
maybe you are measuring how well it copes with a certain quantity of
code, which when synthesised may well be duplicate or redundant.
Then it is not helpful that it discards most of it.
How is that supposed
to give an accurate measure of how well it does when it really does need
to do it all?
On 20/04/2026 13:45, Bart wrote:
I implement it in a common sense manner.
"Common sense" is another way of saying "I don't know the actual rules".
It is a shame if we are back where we started - because you started out wrong. You started out treating C like assembly, and you haven't shown
you understand the difference.
The semantics of your language are important to you - but not to C. The semantics of whatever targets you use are important to the back-end of
the C compiler, but not to the C language or its semantics.
If it is effectively cheating at benchmarks, then no it isn't.
Again - you are asking the wrong questions in your benchmarks.
You /think/ you are asking the car to drive round a loop. But what
you are writing is asking the car to go from A to B. And then you
complain when gcc figures out it can drive directly from A to B without going through the loop.
If you want to benchmark a compiler going through the whole path, write
that in the code. Force observable behaviour at the start (with a
volatile access), use code lines that depend on that input and previous lines, and observe the behaviour at the end (with a volatile write, or a printf, or something else /real/).
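The volatile technique described above can be sketched as follows. The names 'input' and 'sink' are illustrative; the point is that volatile reads and writes are observable behaviour, so the arithmetic between them cannot be folded away even at -O2.

```c
#include <stdint.h>

static volatile int64_t input = 7;  /* volatile read: value unknown to the optimiser */
static volatile int64_t sink;       /* volatile write: result is observable */

void bench(void) {
    int64_t b = input;      /* compiler cannot assume what this is */
    int64_t c = input;
    int64_t a = b + c * 8;  /* the expression now survives optimisation */
    sink = a;               /* forces the result to be computed and stored */
}
```

With this shape, compiling at -O1 and above shows the real code for the expression rather than an empty or constant-folded function.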
int fibonacci(int n)
{
    if (n <= 2) return 1;
    return fibonacci(n - 1) + fibonacci(n - 2);
}
No, I don't expect the generated code to have 2 * fib(n) recursive
calls. I expect the code to give the same results as if it had made
those calls.
If a compiler can optimise in such a way as to reduce the number of
calls, that's great.
So, for gcc and Fibonacci, I now use -fno-inline and another to turn
off TCO.)
And does that give you any kind of information that is useful for any purpose? I suspect not.
On 20/04/2026 14:02, David Brown wrote:
On 20/04/2026 13:45, Bart wrote:
I implement it in a common sense manner.
"Common sense" is another way of saying "I don't know the actual rules".
It means doing the obvious thing with no unexpected surprises.
It is a shame if we are back where we started - because you started
out wrong. You started out treating C like assembly, and you haven't
shown you understand the difference.
So why should I listen to you, and why should I care?
Bart <bc@freeuk.com> wrote:
On 19/04/2026 11:17, David Brown wrote:
On 18/04/2026 17:08, Bart wrote:
(Yes, LLVM and the tools around it are big. It takes a lot of effort to
make use of them, but you get a lot in return. A "little language" has
to grow to a certain size in numbers of toolchain developers and numbers
of toolchain users before it can make sense to move to LLVM.
Actually lots of small projects use LLVM.
But probably people don't realise it is like installing the engine from
a container ship into your small family car.
AFAICS people are proud of using a powerful engine and tend to ignore the disadvantages.
In a non-C context, a co-worker in a project uses a
"standard" documentation tool to generate tens of megabytes of
HTML documentation. This needs something like 2 min 30 seconds,
and tens of megabytes of extra packages. I wrote a few hundred
lines of code to do almost the same thing but directly. The amount
of specialized code is similar to what is needed to interface with
the external package. My code does the job in about 1.5 sec.
His reaction was essentially: "Why are you wasting time when the
code works". Actually, there were differing assumptions: he
assumed that the code would be run once every few months, so
performance would not matter at all. I would run the code as part
of the build and test cycle, and 2 min 30 seconds per cycle
matters a lot.
The external package has a lot of features; for example, it
supports tens (maybe hundreds) of color schemes. But we need only
one color scheme.
Anyway, people believe that by using a major "standard" package
they will somehow get superior features.
What's not valid about 'a = b + c'?
It is incomplete. Why not use an equivalent function:
int
add(int b, int c) {
    return b + c;
}
Bart <bc@freeuk.com> wrote:
On 19/04/2026 20:32, David Brown wrote:
On 19/04/2026 19:47, Bart wrote:
Get the value of 'b',
You can't do that. "b" has no value. "b" is indeterminate, and using
its value is UB - the code has no meaning right out of the gate.
When you use "b" in an expression, you are /not/ asking C to read the
bits and bytes stored at the address of the object "b". You are asking
for the /value/ of the object "b". How the compiler gets that value is
up to the compiler - it can read the memory, or use a stored copy in a
register, or use program analysis to know what the value is in some
other way. And if the object "b" does not have a value, you are asking
the impossible.
Try asking a human "You have two numbers, b and c. Add them. What is
the answer?".
You have two slates A and B which someone should have wiped clean then
written a new number on each.
But that part hasn't been done; they each still have an old number from
their last use.
You can still add them together, nothing bad will happen. It just may be
the wrong answer if the purpose of the exercise was to find the sum of
two specific new numbers.
But the purpose may also be to see how good they are at adding. Or at
following instructions.
whatever it happens to be, add the value of 'c' scaled by 8, and store
the result into 'a'. The only things to consider are that some
intermediate results may lose the top bits.
Is 'a = b' equally undefined? If so, then C is even crazier than I'd
thought.
If "a" or "b" are indeterminate, then using them is undefined. I have
two things - are they the same colour? How is that supposed to make sense?
You keep thinking of objects like "b" as a section of memory with a bit
pattern in it. Objects are not that simple in C - C is not assembly.
Why ISN'T it that simple? What ghastly thing would happen if it was?
"b" will be some location in memory or it might be some register, and it
WILL have a value. That value happens to be unknown until it is initialised.
So accessing it will return garbage (unless you know exactly what you
are doing then it may be something useful).
My original example was something like 'a = b + c' (I think in my
language), converted to my IL, then expressed in very low-level C.
You were concerned that in that C, the values weren't initialised. How
would that have affected the code that C compiler generated from that?
You look at a trivial example, where AFAICS the best answer is:
"The compiler follows general rules; why should it make an exception
for this case?". Note that in this trivial case "interesting"
behaviour could happen on exotic hardware (probably disallowed
by C23 rules, but AFAICS legal for earlier C versions).
Namely, consider a machine where one bit pattern is illegal
and causes an exception at runtime when read from memory by an
integer load. The compiler could "initialize" all otherwise
uninitialized variables with this bit pattern. So accessing an
uninitialised integer variable would cause a runtime exception.
If you look at more complex examples you may see why the rule
allows more efficient code on ordinary machines. Namely,
look at:
#include <stdio.h>
#include <stdbool.h>

void
f(void) {
    bool b;            /* never initialized */
    printf("b is ");
    if (b) {
        printf("true\n");
    }
    if (!b) {
        printf("false\n");
    }
}
The compiler could contain a function called 'known_false' and omit
the code for a conditional statement if the condition (in our case
'b') is known to be false. How could the compiler know this? The
simplest case is when the condition is a constant. But that is the
trivial case. More interesting cases are when some earlier statement
assigns a constant value to 'b'. But a function may contain
"interesting" control flow, so determining which assignments are
executed is tricky. Instead, the compiler would probably use some
kind of approximation, tracking possible values at different program
points. Now, according to your point of view, an uninitialized
variable would mean "any value is possible". According to C rules an
uninitialized variable cannot occur in a correct program, which means
that there must be an assignment later, and when analyzing possible
values the current statement yields "no value". In the function
above, consistently propagating information according to your rules
means that in the conditional 'b' can take any value, so the
compiler must emit the code. Using C rules, 'b' has no value, so it
cannot be true and the compiler can delete the conditional (and the
same for the conditional involving '!b').
This example
is still pretty simple, so you may think that your rules are
superior.
But imagine that between the declaration of 'b' and the
conditional there is some hairy code. This code initializes
'b' to false, but only if some conditions are satisfied.
Now, consider a situation where in fact 'b' is always initialized,
but the compiler is too limited to see this. Under C rules the
compiler will assume that 'b' is initialized and conclude
that it is false, allowing it to delete the conditional.
Under your rules the compiler would have to consider the possibility
that 'b' is uninitialized and keep the conditional.
So I'm still interested in what possible reason the compiler might have
for generating code that is any different in the absence of
initialisation. Warn about it, sure, but why do anything else?
As explained, under C rules compiler can generate more efficient
code.
It may do a good job of eliminating duplicate or redundant code. But
maybe you are measuring how well it copes with a certain quantity of
code, which when synthesised may well be duplicate or redundant.
I very much want my compiler backend to eliminate duplicate or
redundant code inserted by front end.
Then it is not helpful that it discards most of it.
If you use a C compiler as a backend it is quite helpful.
On 20/04/2026 07:25, David Brown wrote:
On 20/04/2026 01:36, Bart wrote:
In C, "b" is not any specific place. In optimising compilers, the
implementation is unlikely to exist at all until it has a real
value,
It should come into existence when it is referenced. Then it will have
a value.
Here for example:
int b;
if (rand()&1) b=0;
printf("%d", b);
'b' may or may not be initialised. But I expect the 'b' used in that
last line to exist somewhere and for the generated code to access that location. I'd also expect the same if the assignment was commented
out.
So gcc is a more powerful compiler than yours, and that's not fair?
If it is effectively cheating at benchmarks, then no it isn't.
If you take recursive Fibonacci, then fib(N) is expected to execute
about 2*fib(N) function calls (for versions that start 'if (N<3)
return 1').
However, using -Os or O1, gcc's code only does half of those
calls. And with -O2/-O3, only 5%, via aggressive inlining and TCO
optimising.
So, no, that's not a fair comparison. Fibonacci is supposed to be a
measure of how many calls/second a language implementation can make,
and the figure you'd get with gcc can be misleading.
(It might as well use memoisation, or be clever enough to convert to
iterative form, then it can report an infinite number of calls per
second. That's really useful!
So, for gcc and Fibonacci, I now use -fno-inline and another to turn
off TCO.)
So why should I listen to you, and why should I care?
Bart <bc@freeuk.com> writes:
[...]
So why should I listen to you, and why should I care?
I don't know, why should you?
You obviously care a great deal, or you wouldn't spend so much
time arguing.
Bart <bc@freeuk.com> writes:
Yes, that's really useful!
So, for gcc and Fibonacci, I now use -fno-inline and another to turn
off TCO.)
If I write this program:
#include <stdio.h>
int fib(int n) {
    if (n <= 1) {
        return 1;
    }
    else {
        return fib(n-2) + fib(n-1);
    }
}

int main(void) {
    printf("%d\n", fib(10));
}
the implementation's job is to generate code that prints "89".
If it's able to do so by replacing the whole thing with `puts("89");`
*that's a good thing*. That's not cheating. That's good code
generation.
If you want to write a benchmark that avoids certain optimizations,
you need to write it carefully so you get the code you want.
On 20/04/2026 18:48, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
Yes, that's really useful!
So which implementation is faster at actually doing function calls? And
how many calls were actually made?
the implementation's job is to generate code that prints "89".
In that case, why bother using very slow recursive Fibonacci?
"(Why would recursive Fibonacci even ever be used as a benchmark when
the iterative method be much faster in every case?)"
I will give you the answer: it is to compare how implementations cope
with very large numbers of recursive function calls. So if one finds a
way to avoid doing such calls, then it is not a fair comparison.
Nobody is interested in the actual output, other than checking it worked correctly. But in how long it took.
On 20/04/2026 14:49, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 19/04/2026 20:32, David Brown wrote:
On 19/04/2026 19:47, Bart wrote:
You look at a trivial example, where AFAICS the best answer is:
"The compiler follows general rules; why should it make an exception
for this case?". Note that in this trivial case "interesting"
behaviour could happen on exotic hardware (probably disallowed
by C23 rules, but AFAICS legal for earlier C versions).
I don't care about exotic hardware. I don't see why its needs should
impact the 99.99% (if not 100%) of actual hardware that people use.
It ought to have made more things implementation defined.
Namely, consider a machine where one bit pattern is illegal
and causes an exception at runtime when read from memory by an
integer load. The compiler could "initialize" all otherwise
uninitialized variables with this bit pattern. So accessing an
uninitialised integer variable would cause a runtime exception.
I acknowledge this somewhere, for the case of floating point numbers. A
poor implementation may have problems. But in the case of XMM registers
on x64, they seem to tolerate arbitrary bit patterns used in floating
point operations.
At worst you end up with a NaN result or something.
And obviously, it is inadvisable to dereference an unknown pointer value.
But you can give all this advice, issue warnings etc, and still not
seize upon such UB as an excuse to invalidate the rest of the program or
for a compiler to choose to do whatever it likes.
If you look at more complex examples you may see why the rule
allows more efficient code on ordinary machines. Namely,
look at:
#include <stdio.h>
#include <stdbool.h>

void
f(void) {
    bool b;            /* never initialized */
    printf("b is ");
    if (b) {
        printf("true\n");
    }
    if (!b) {
        printf("false\n");
    }
}
The compiler could contain a function called 'known_false' and omit
the code for a conditional statement if the condition (in our case
'b') is known to be false. How could the compiler know this? The
simplest case is when the condition is a constant. But that is the
trivial case. More interesting cases are when some earlier statement
assigns a constant value to 'b'. But a function may contain
"interesting" control flow, so determining which assignments are
executed is tricky. Instead, the compiler would probably use some
kind of approximation, tracking possible values at different program
points. Now, according to your point of view, an uninitialized
variable would mean "any value is possible". According to C rules an
uninitialized variable cannot occur in a correct program, which means
that there must be an assignment later, and when analyzing possible
values the current statement yields "no value". In the function
above, consistently propagating information according to your rules
means that in the conditional 'b' can take any value, so the
compiler must emit the code. Using C rules, 'b' has no value, so it
cannot be true and the compiler can delete the conditional (and the
same for the conditional involving '!b').
If I apply gcc -O2 to your example, it prints that b is false without actually testing the value. If I get it to return the value of b, it
returns a hard-coded zero.
This example
is still pretty simple, so you may think that your rules are
superior.
They're certainly simpler. I can't predict what gcc will do. And
whatever it does, can differ depending on options.
It comes down to the user's intention: was the non-initialisation an oversight? Did they know that only one of those conditionals can be true?
My compilers don't try and double-guess the user: they will simply do
what is requested.
So I'm still interested in what possible reason the compiler might have
for generating code that is any different in the absence of
initialisation. Warn about it, sure, but why do anything else?
As explained, under C rules compiler can generate more efficient
code.
A lot of it seems to be for dodgy-looking code. I tend to rely on, and assume, sensibly written programs. That seems to go a long way!
It may do a good job of eliminating duplicate or redundant code. But
maybe you are measuring how well it copes with a certain quantity of
code, which when synthesised may well be duplicate or redundant.
I very much want my compiler backend to eliminate duplicate or
redundant code inserted by front end.
Then it is not helpful that it discards most of it.
If you use a C compiler as a backend it is quite helpful.
My current transpiled C is full of redundant intermediates like your example, and such optimising is necessary to get reasonable size and speed.
On 20/04/2026 18:50, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
So why should I listen to you, and why should I care?
I don't know, why should you?
You obviously care a great deal, or you wouldn't spend so much
time arguing.
I first posted this to show how casts are extensively used in my
generated C:
i64 a;
i64 b;
i64 c;
asi64(R1) = b;
asi64(R2) = c;
asi64(R1) += asi64(R2);
a = asi64(R1);
This was generated from this fragment HLL code: "a := b + c". There is
no initialisation because that is rarely done when testing compiler code-generation. Examples are kept as simple as possible, and
initialisation would have absolutely no bearing on the matter.
But somebody said this was UB. Now even though uninitialised variables
are not used in my production programs (AFAIK), I disagreed about this matter.
On 20/04/2026 18:48, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
Yes, that's really useful!
So which implementation is faster at actually doing function calls?
And how many calls were actually made?
So, for gcc and Fibonacci, I now use -fno-inline and another to turn
off TCO.)
If I write this program:
#include <stdio.h>
int fib(int n) {
if (n <= 1) {
return 1;
}
else {
return fib(n-2) + fib(n-1);
}
}
int main(void) {
printf("%d\n", fib(10));
}
the implementation's job is to generate code that prints "89".
In that case, why bother using very slow recursive Fibonacci?
Presumably the expectation is that it would actually be using recursion.
I already posed this question:
"(Why would recursive Fibonacci even ever be used as a benchmark when
the iterative method would be much faster in every case?)"
I will give you the answer: it is to compare how implementations cope
with very large numbers of recursive function calls. So if one finds a
way to avoid doing such calls, then it is not a fair comparison.
If you want to write a benchmark that avoids certain optimizations,
you need to write it carefully so you get the code you want.
It's not possible to do that with Fibonacci without making it
unrecognisable and so a poor comparison for other reasons.
If testing with gcc now, I'd use these two options:
-fno-inline
-fno-optimize-sibling-calls
On my PC, gcc -O2 code then manages some 560M calls/second running
Fibonacci, rather than a misleading 1270M calls/second.
See also:
https://github.com/drujensen/fib/issues/119
Referenced from: https://github.com/drujensen/fib
On 20/04/2026 19:34, Bart wrote:
And obviously, it is inadvisable to dereference an unknown pointer value.
Okay, so you think it is "obvious" that you should avoid doing some
things that are explicitly UB, and yet you think it is "obvious" that
you should be able to do other types of UB. Who makes up those
"obvious" rules? Why do you think such inconsistency is a good idea?
No, your rules are far from simple - you have internal ideas about what kinds of UB you think should produce certain results, and which should
not, and how compilers should interpret things that have no meaning in
C. That's not simple.
I can predict what gcc will do,
The compiler can quite reasonably generate all sorts of different code
here. A different version, or a different compiler, or on a different
day, you could get different results. That's life when you use UB.
What else could the non-initialisation have been other than an oversight
- a bug in their code due to ignorance, or just making a mistake as we
all do occasionally? Do you think it is likely that someone
intentionally and knowingly wrote incorrect code?
My compilers don't try and double-guess the user: they will simply do
what is requested.
No, guessing the user's intentions is /exactly/ what your compiler is
trying to do. It is trying to guess what the programmer meant even
though the programmer made a mistake and wrote something that does not
make sense.
A good
compiler will work with sensibly written programs, yet you insist on
writing C that is not sensibly written.
Oh, so you want gcc to optimise away redundant code when your transpiler generates redundant code, but it is "cheating" if it optimises away redundant code in bizarre tests because your C compiler can't do that?
Actually _EVERYBODY_ is interested in the actual output, and NOBODY is interested in how long it took.
The 5 people in the world who think in terms of random irrelevant
benchmarks are the only people who would even think to care.
Bart <bc@freeuk.com> writes:
On 20/04/2026 18:50, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
So why should I listen to you, and why should I care?
I don't know, why should you?
You obviously care a great deal, or you wouldn't spend so much
time arguing.
I first posted this to show how casts are extensively used in my
generated C:
i64 a;
i64 b;
i64 c;
asi64(R1) = b;
asi64(R2) = c;
asi64(R1) += asi64(R2);
a = asi64(R1);
This was generated from this fragment of HLL code: "a := b + c". There is
no initialisation because that is rarely done when testing compiler
code-generation. Examples are kept as simple as possible, and
initialisation would have absolutely no bearing on the matter.
But somebody said this was UB. Now even though uninitialised variables
are not used in my production programs (AFAIK), I disagreed about this
matter.
The point is not that "somebody said" that this was UB.
I'm aware of your opinions about this, but will you acknowledge that
the standard actually says what it says? I'm not asking whether
you think the behavior should be undefined. I'm asking whether
you'll acknowledge that the ISO C standard says it's undefined.
Yes or no.
On Thu, 2026-04-16 at 22:14 +0800, wij wrote:
Therefore, from this point of view, C is also a formal language, and
On Thu, 2026-04-16 at 18:42 +0800, wij wrote:
On Wed, 2026-04-15 at 19:04 -0700, Keith Thompson wrote:
wij <wyniijj5@gmail.com> writes:
On Wed, 2026-04-15 at 17:14 -0700, Tim Rentsch wrote:
wij <wyniijj5@gmail.com> writes:
[... comparing C and assembly language ...]
Gentlemen,
I understand the natural reaction to want to respond to the kind of statements being made in this thread. I hope y'all can resist this
natural reaction and not respond to people who persist in making arguments that are basically isomorphic to saying 1 equals 0.
Thank you for your assistance in this matter.
Maybe you are right. I say A is-a B; one persists in reading it as A is (exactly) B.
I offer help with using assembly; one persists in reading it as me urging the use
of assembly and giving up HLLs. What is going on here?
You say that C is an assembly language. Nobody here thinks that you're *equating* C and assembly language. It's obvious that
there are plenty of assembly languages that are not C, and nobody
has said otherwise. I have no idea why you think anyone has that particular confusion.
At least one person has apparently interpreted your defense of
assembly language (that it isn't as scary as some think it is)
as a claim that we should program in assembly language rather
than in HLLs. You're right, that was a misinterpretation of what
you wrote. I considered mentioning that, but didn't bother.
The issue I've been discussing is your claim that C is an assembly language. It is not.
If I said C is assembly, it is in the sense I have at least shown in the last
post (s_tut2.cpp), where even an 'instruction' can be any function (e.g. change
directory, copy files, launch an editor,...). And also, what 'computation' is
was demonstrated, which includes a suggestion of what C essentially is (any
program), and in this sense what an HLL is. Finally, it could demonstrate the
meaning of, and testify to, the Church-Turing thesis (my words: no computation
language, including various kinds of math formula, can exceed the expressive
power of a TM).
It seems you insist C and assembly have to be exactly what your bible says. If
so, I would say that what the C standard (I cannot read it) says is the meaning
of the terminology within it, not intended to be used in any other situation.
I do not intend to post again in this thread until and unless you
post something substantive on that issue.
(continue)
IMO, the C standard is like a book of legal terms. Like many symbols in a header
file, it defines one symbol in terms of another symbol. The real meaning is not
fixed. The result is that you cannot 'prove' the correctness of the source
program; even consistency is a problem.
Is 'instruction' low-level? Yes, by definition, but not as one might think. An
instruction could refer to a processing unit (perhaps like the x87 math
co-processor, which may even be higher level, processing whole expressions,...)
A good opportunity for C is to find good functions that can be hardwired.
So, the basic feature of an HLL is 'structured' (or 'nested') text, which
removes labels. Semantics is the inventor's imagination. So, avoid bizarre
complexity: it won't add expressive power to the language; it is just a matter
of a short or lengthy expression of a programming idea.
(Continue)
Thus, C is-a language for controlling hardware. Therefore, the term 'portable
assembly' seems to fit this meaning. But on the other side, C needs to be user
friendly. Setting aside the friendly part, I think there should be more: C could
be the foundation of a formal system (particularly for academic uses). For example:
Case 1: "Σ(n=1,m) f(n)" should be defined as:
sum=0;
for(int n=1; n<=m; ++n) {
sum+=f(n);
}
By doing so, it is easier to deduce things from nested series.
Case 2: What if m=∞ ?
for(int n=1; ; ++n) {
sum+=f(n);
}
The infinity case has no consensus. At least, it demonstrates that 'infinity'
simply refers to an infinite loop. This leads to the long debate over
0.999... =? (0.999... will not terminate BY DEFINITION; no finite proof can
prove it equals anything unless you define it)... and to what INF and INFINITY
should be in C.
Case 3: Proposition ∀x,P(x)::= P(x1)∧P(x2)∧..∧P(xn) (x∈{x1,x2,..})
bool f() { // f() = "∀x,P(x)"
for(int i=0; i<S.size(); ++i) {
if(P(S[i])==false) {
return false;
}
}
return true;
}
The universal quantifier itself is also a proposition; therefore, by definition, its negation exists:
~Prop(∀x,P(x)) = ~(P(x1)∧P(x2)∧..∧P(xn)) = ~P(x1)∨~P(x2)∨..∨~P(xn)
= Prop(∃x,~P(x))
Math/logic has no such clear definition.
Multiple quantifiers (∀x∃y∃z) and their negations are thus easier to
understand and to use in 'reasoning'.
Note: This leads to a case: if(a&&b) { /*...*/ }
I tend to think the omission of the evaluation of b in the case a==false
is not really an optimization. It is a problem of the definition of
traditional logic.
So, don't make C too bizarre.
Bart <bc@freeuk.com> writes:
On 20/04/2026 18:48, Keith Thompson wrote:
Presumably the expectation is that it would actually be using recursion.
That expectation was not expressed in the code.
"(Why would recursive Fibonacci even ever be used as a benchmark when
the iterative method would be much faster in every case?)"
Because you want to measure the speed of function calls, of course.
I will give you the answer: it is to compare how implementations cope
with very large numbers of recursive function calls. So if one finds a
way to avoid doing such calls, then it is not a fair comparison.
Then write the code so the compiler can't eliminate the calls.
You want the compiler to work with one hand tied behind its
metaphorical back for the sake of "fairness". Not gonna happen.
If you ask me to go from point A to point B, if it's a few kilometers
away, I'll probably drive my car. If you intended it to be a
three-legged race, I'm not cheating *if you didn't tell me that*.
If testing with gcc now, I'd use these two options:
-fno-inline
-fno-optimize-sibling-calls
On my PC, gcc -O2 code then manages some 560M calls/second running
Fibonacci, rather than a misleading 1270M calls/second.
It's misleading *to you*, because you (deliberately?) misinterpret
the results.
See also:
https://github.com/drujensen/fib/issues/119
Referenced from: https://github.com/drujensen/fib
Let me ask you a simple question. Given my fibonacci example,
if a compiler compiled it to the equivalent of `puts("89")`, would
that compiler fail to conform to the ISO C standard? If so, why?
Are you able to distinguish between "I dislike this requirement
in the standard" and "I deny that this requirement exists"?
On 20/04/2026 18:48, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
Yes, that's really useful!
So which implementation is faster at actually doing function calls? And
how many calls were actually made?
So, for gcc and Fibonacci, I now use -fno-inline and another to turn
off TCO.)
If I write this program:
#include <stdio.h>
int fib(int n) {
if (n <= 1) {
return 1;
}
else {
return fib(n-2) + fib(n-1);
}
}
int main(void) {
printf("%d\n", fib(10));
}
the implementation's job is to generate code that prints "89".
In that case, why bother using very slow recursive Fibonacci?
Presumably the expectation is that it would actually be using recursion.
I already posed this question:
"(Why would recursive Fibonacci even ever be used as a benchmark when
the iterative method would be much faster in every case?)"
I will give you the answer: it is to compare how implementations cope
with very large numbers of recursive function calls. So if one finds a
way to avoid doing such calls, then it is not a fair comparison.
Nobody is interested in the actual output, other than checking it worked correctly; the interest is in how long it took.
If it's able to do so by replacing the whole thing with `puts("89");`
*that's a good thing*. That's not cheating. That's good code
generation.
What gcc-O2/-O3 actually does is to take the 5 lines of the Fibonacci function in C, which normally generates 25 lines of assembly, and turn
it into 270 lines of assembly.
Imagine such a ten-fold explosion in code size across a whole program,
for some tiny function which might not even ever be called as far as it knows. It's a little suspect; why these 5 lines over a 100Kloc program
for example?
On 20/04/2026 22:59, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
On 20/04/2026 18:48, Keith Thompson wrote:
Presumably the expectation is that it would actually be using recursion.
That expectation was not expressed in the code.
Other than clearly using recursion?
"(Why would recursive Fibonacci even ever be used as a benchmark whenBecause you want to measure the speed of function calls, of course.
the iterative method be much faster in every case?)"
That's ... a surprising response.
I assumed both you and David had absolutely no interest in such
matters and were not sympathetic to those who did.
The naive fib() benchmark tells me it can achieve 1.27 billion fib()
calls per second on my PC. Great!
In that case, I should also get 1.27 billion calls/second when I run
the fib1/fib2/fib3 version.
But, it doesn't; I get less than half that throughput. What's gone wrong?
According to you, gcc code should be able to have that throughput; why doesn't it?
Let me ask you a simple question. Given my fibonacci example,
if a compiler compiled it to the equivalent of `puts("89")`, would
that compiler fail to conform to the ISO C standard? If so, why?
Are you able to distinguish between "I dislike this requirement
in the standard" and "I deny that this requirement exists"?
I don't understand. I assume you know the answer, that a C compiler
can do whatever it likes (including emailing your source to a human accomplice and having them mail back a cheat like this).
My problem is doing fair comparisons between implementations doing the
same task using the same algorithm. And in the case of recursive
fibonacci, I showed above that the naive gcc results are unreliable.
Bart <bc@freeuk.com> wrote:
On 20/04/2026 18:48, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
Yes, that's really useful!
So which implementation is faster at actually doing function calls? And
how many calls were actually made?
An implementation that uses 0 instructions to implement a function call
on a normal machine will get a shorter runtime, so it is clearly faster
at doing function calls. This does not differ much from what
some modern processors do; namely, a move instruction may effectively
take 0 cycles. People used to the old ways, when confronted with
movq %rax, %rdx
expect that there will be an actual movement of data, that the instruction
must travel the whole CPU pipeline. But modern processors do
register renaming, and after looking at this instruction may
simply note that to get the value of %rdx one uses the place storing
%rax (I am using AT&T convention, so the direction is from %rax to
%rdx), and otherwise drop the instruction. Is the processor
cheating? A naive benchmark where moves are overrepresented may
execute unexpectedly fast, but moves are frequent in real
programs, so this gives a valuable speedup for all programs.
Coming back to function calls, consider a programmer who cares
very much about speed. He knows that his program would be
simpler and easier to write if he used a lot of small
functions. In the old days he would worry about the cost of
function calls, and he probably would write much bigger and
more complicated functions to get good speed. But if the cost of
a function call is 0 he can freely use small functions, without
worrying about the cost of calls.
I will give you the answer: it is to compare how implementations cope
with very large numbers of recursive function calls. So if one finds a
way to avoid doing such calls, then it is not a fair comparison.
Well, Fibonacci and similar functions have limited use.
So the
real question is what is the cost of function calls in actual
programs. For calls to a small non-recursive function the cost is
close to 0. Recursion makes optimization more tricky,
so it increases the cost. But still, in practice the cost is lower than
one could naively expect.
Concerning fairness, AFAIK gcc optimizations were developed to
speed up real programs. They speed up Fibonacci basically as
a side effect.
So IMO it is fair: a compiler that cannot speed
up calls in Fibonacci will probably have trouble speeding up
calls in at least some real programs.