There's also the realization that computer memory, except on a few
specialized Forth chips, is always made from RAM. So ideological
devotion to a pure stack VM seems to pass up perfectly good hardware
capabilities.
Gforth does support address-like locals if you want to use them.
With competent Forth compilers, the machine code is 1) the same when
using stack operations, when using the return stack, or when using
locals
If you want to use a language that is "ideologically devoted" to the
architecture, maybe you shouldn't use Forth at all - and stick with C.
I know there are situations when there are six values on the data stack
and four on the return stack which leave you with few other options. But
you can always use vanilla variables or an extra stack (which is trivial
to implement) to remedy that.
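A minimal sketch of such an extra stack in standard Forth, assuming a fixed capacity of 16 cells. The names >X, X> and X@ are made up for this example and do not come from any particular system:

```forth
\ Hypothetical auxiliary "X stack"; names and capacity are assumptions.
16 constant /xstack                 \ capacity in cells
create xstack /xstack cells allot   \ storage, grows toward higher addresses
variable xsp  xstack xsp !          \ points to the next free cell

: >x ( x -- ) ( X: -- x )           \ push onto the X stack
  xsp @ xstack /xstack cells + = abort" X stack overflow"
  xsp @ !  1 cells xsp +! ;
: x> ( -- x ) ( X: x -- )           \ pop from the X stack
  xsp @ xstack = abort" X stack underflow"
  -1 cells xsp +!  xsp @ @ ;
: x@ ( -- x ) ( X: x -- x )         \ copy the top of the X stack
  xsp @ xstack = abort" X stack underflow"
  xsp @ 1 cells - @ ;
```

Unlike values parked with >R, items on such a stack may stay there across DO loops and across word boundaries.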
Using Forth means being resourceful. Not to choose the most convenient
and lazy solution imaginable.
I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register allocation of any of the three is similarly difficult...
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
With competent Forth compilers, the machine code is 1) the same when
using stack operations, when using the return stack, or when using
locals
"Competent Forth compilers" there describes what by Forth standards
would be called quite fancy optimizing compilers ("analytic compilers").
They are a significant technical feat and there aren't that many of
them. Traditionally Forth has been implemented as simple interpreters.
[Disassembly listing garbled in extraction: Gforth primitives (>l, third, dup) interleaved with AMD64 instructions.]
In that case, a pure stack VM seems to ignore capabilities of the
underlying hardware, in particular that the stack's memory is actually
RAM.
Doesn't PICK go back to the earliest days of Forth, as a way
to bypass the limitation?
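As an illustration of PICK reaching into the stack interior without juggling (the name MY-3DUP is made up, to avoid clashing with a 3DUP the system may already provide):

```forth
\ Copy the top three items by reaching into the stack with PICK.
\ Each PICK pushes one item, which moves the originals one slot deeper,
\ so "2 pick" picks up a, then b, then c.
: my-3dup ( a b c -- a b c a b c )
  2 pick 2 pick 2 pick ;
```

On a system that keeps the data stack in memory, each PICK is essentially an indexed load from that memory, which is close to treating the stack as plain RAM.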
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult...
I believe early C compilers didn't attempt much if any register
allocation. You could say "register int x" to manually assign a
register to x if one was available. You were limited to 2 or 3 of those
on the PDP-11. Local variables in C otherwise lived in the stack. The
difference was that the C compiler generated straightforward assembly
code to access those variables even when they were in the stack
interior. You didn't have to use ROT or juggle stuff to the R stack to
get to the inner elements.
In assembler, you could also program in a stack-oriented style yet
straightforwardly access the inner elements. Forth for whatever reason
chose strict stack discipline (with some loopholes like PICK). I
understand wanting to stay with purity of a model, but a more
hardware-sympathetic model would have been "stack implemented in RAM".
So I still don't understand the benefit of the "pure abstract stack"
approach, other than for a few weird special CPUs.
         with locals  without locals  ratio
max           3.56us          2.69us   1.32
strcmp       83.20us         70.50us   1.18
- anton
String handling and move operation are the exception, because
they are both simpler and faster in low level.
Simpler is the argument (especially for i86).
Faster is the bonus.
Hans Bezemer <the.beez.speaks@gmail.com> writes:
I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register allocation of any of the three is similarly difficult (with big
differences in difficulty between solutions that provide some register allocation to those that are so reliable that you usually count on
them).
Using Forth means being resourceful. Not to choose the most convenient
and lazy solution imaginable.
According to <https://www.dictionary.com/browse/resourceful>:
|able to deal skillfully and promptly with new situations,
|difficulties, etc.
Forth systems that do not implement locals are not a new situation.
So do you mean to say that it is a difficulty?
But blaming the programmer for the system implementor's failings is a
tactic used widely by system implementors (in the C world as well as
in the Forth world).
(..) and they often find some arguments that appeal to
elitism (i.e., only the chosen ones can use this programming language
for the elite as it should be used, and the others should program in
Python or "should never have been allowed to touch a keyboard" (Ulrich Drepper).
In any case, why should it be better to use an inconvenient solution
that requires more work rather than a convenient solution that
requires less work (i.e., is lazy)?
For me virtues in programming are to produce correct code, to produce
it quickly, the code should use the resources economically (which does
not mean that saving a few bytes on a machine with GBs of memory is
virtuous), and the code should be readable and maintainable.
albert@spenarnc.xs4all.nl writes:
String handling and move operation are the exception, because
they are both simpler and faster in low level.
Simpler is the argument (especially for i86).
Faster is the bonus.
In other words, Forth without locals is not well suited for words
that have so much active data. That is also reflected in hardware
designed for Forth, which got additional registers like A or B (or
additional capabilities for the top of the return stack register R),
which make it simpler and faster to implement such words.
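To make the A-register point concrete, one can emulate such a register with a VARIABLE on a conventional Forth system. This is only a sketch: AREG, A! and C@A+ are hypothetical names (not the instruction set of any particular chip), and on real Forth hardware A would be a register rather than a memory cell. With one cursor in A, the loop only has to carry the other cursor on the data stack:

```forth
\ Emulated A (address) register; AREG, A! and C@A+ are made-up names.
variable areg
: a!   ( addr -- )  areg ! ;
: c@a+ ( -- c )     areg @ c@  1 chars areg +! ;  \ fetch char, advance A

: strcmp-a ( addr1 u1 addr2 u2 -- n )
  rot 2dup 2>r min          ( addr1 addr2 umin )  ( R: u2 u1 )
  rot a!                    ( addr2 umin )        \ cursor 1 lives in A
  0 ?do                     ( addr2 )             \ cursor 2 on the stack
    c@a+ over c@ -          ( addr2 diff )
    dup if  nip unloop 2r> 2drop exit  then
    drop char+
  loop
  drop 2r> swap - ;         \ common part equal: result is u1-u2
```

On a conventional system this is probably no faster, since A is just a memory cell here; the point is only how the loop body shrinks to a single stack cursor.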
A definition of STRCMP in the paper is
: strcmp { addr1 u1 addr2 u2 -- n }
  addr1 addr2
  u1 u2 min 0
  ?do { s1 s2 }
    s1 c@ s2 c@ - ?dup
    if
      unloop exit
    then
    s1 char+ s2 char+
  loop
  2drop
  u1 u2 - ;
So in the loop we have a loop count (on the return stack), two cursors
(s1 and s2) into the compared strings, and within the loop body we additionally have the two characters, for a total of five live values,
three of which survive across iterations and are changed in every
iteration. One could implement it as
\ untested, and the following versions, too
: strcmp { addr1 u1 addr2 u2 -- n }
  u1 u2 min 0
  ?do
    addr1 i + c@ addr2 i + c@ - ?dup
    if
      unloop exit
    then
  loop
  u1 u2 - ;
where only one of the values changes in each iteration, but now the ?DO...LOOP cannot be replaced with a version that does not store a
second value but counts down (or up) to 0, so now we have a total of 6
live values, four of which survive across iterations, and one is
changed on every iteration.
One can reduce this by one value by keeping one of the cursors in the
loop counter:
: strcmp {: addr1 u1 addr2 u2 -- n :}
  addr2 addr1 - {: offset :}
  u1 u2 min addr1 + addr1 ?do
    i c@ i offset + c@ - ?dup
    if
      unloop exit
    then
  loop
  u1 u2 - ;
So now we have five live values in the body of the loop at the same
time, three of which live across iterations, and one of which changes
in each iteration. Keeping the loop parameters separate significantly lessens the load on the data stack.
Let's see if we can eliminate the local from the loop body:
: strcmp {: addr1 u1 addr2 u2 -- n :}
  addr2 addr1 - ( offset )
  u1 u2 min addr1 + addr1 ?do ( offset )
    i c@ over i + c@ - ?dup
    if
      nip unloop exit
    then
  loop
  drop u1 u2 - ;
That leaves stack purists with the task of eliminating the locals from
the prologue and epilogue of this word. Two items have to be stored
across the loop, or the difference could be computed speculatively and
only one item stored across the loop. And the computations before the
loop involve four values alive at the same time (fortunately addr2
does not live long). Let's see:
: strcmp ( addr1 u1 addr2 u2 -- n )
  rot 2dup - >r        ( addr1 addr2 u2 u1 R: n1 )
  min -rot over -      ( u12 addr1 offset R: n1 )
  swap rot bounds      ( offset limit start R: n1 )
  ?do                  ( offset R: n1 loop-sys )
    i c@ over i + c@ - ?dup
    if
      nip unloop r> drop exit
    then
  loop
  drop r> negate ;
As can be seen by the many stack comments, the stack load here is more
than I can easily deal with.
Maybe a stack purist can improve on that. But can he improve it
enough to make it as easy to understand as any of the versions with
locals?
- anton
On Sat, 25 Apr 2026 10:22:16 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
<SNIP>
I recently reviewed the string comparison for search-wordlist
and came up with the following
The string stored in the word header is already uppercased.
So string comparison will be case insensitive
: UC ( c -- c' ) \ uppercase char
dup $61 $7B within $20 and - ;
: NCOMP4 ( addr n addr' n' - f) \ 0 is match
dup >r
begin
rot = while \ str cstr
r> dup 1- >r
while \ str cstr
swap count uc \ cstr str' s1
rot count \ str' s1 cstr' c1
repeat
2drop r> drop 0 exit
then
2drop r> drop 1 ;
First iteration in the loop it does not compare chars but the length!
BR
Peter
On 25-04-2026 07:26, Anton Ertl wrote: [reinserted deleted, relevant context]
Hans Bezemer <the.beez.speaks@gmail.com> writes:
If you want to use a language that is "ideologically devoted" to the
architecture, maybe you shouldn't use Forth at all - and stick with C.
I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult (with big
differences in difficulty between solutions that provide some register
allocation to those that are so reliable that you usually count on
them).
Well, you're actually shooting at Paul Rubin - not at me. Thank you! I
take all the help I can get!
(..) and they often find some arguments that appeal to
elitism (i.e., only the chosen ones can use this programming language
for the elite as it should be used, and the others should program in
Python or "should never have been allowed to touch a keyboard" (Ulrich
Drepper).
It's your own pal Bernd that said: "A good programmer will write even
better code in Forth. A bad programmer will write abysmal code in Forth.
And I'm sorry to say - but most programmers are quite bad."
So, either you agree with him or we have an unfortunate departure of one
of the most foremost members of Gforth. Because this states - in no
uncertain words - that Forth programmers *ARE* elite.
It would be better to think deeply, find an original solution and learn.
Like Albert with his brilliant ;: word.
Hans Bezemer
<SNIP>
This one is about a third bigger than yours - if we disregard the "UC",
that is:
: comp
rot over - if drop 2drop true exit then
0 ?do
over i chars + c@ over i chars + c@ -
if drop drop unloop true exit then
loop drop drop false
;
In 4tH, it is even visually more compact:
: comp
rot over - if drop 2drop true ;then
0 ?do over i [] c@ over i [] c@ - if drop drop unloop true ;then loop
drop drop false
;
The extra length comes mainly from the three different possible exits:
- It's not the same size (first line);
- It's not the same content (exit within loop);
- It's the same thing (after loop).
I can't say I particularly like the use of "COUNT" here - because it
actually represents "C@+" - except for the first run. Neither am I very
happy with the BEGIN..WHILE..WHILE..REPEAT..THEN construct - but that's
not your fault ;-)
All that being said, I cannot deny it is a clever piece of code using
the full capabilities of the language, bravo!
Hans Bezemer
...
In the case of Forth and locals this tactic has not worked very well,
so even Forth, Inc. (who have been the most vocal among the commercial
Forth providers about their dislike of locals) have implemented
locals.
...
And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work
In any case, when it comes to performance measurements on "simple interpreters" like the Gforth of 1994, Forth code with locals usually
turns out to be slower and consume more memory than Forth code using
(and trying to avoid) stack juggling.
... looking at the code for Gforth for 3DUP.3 compared to the others,
Gforth still uses more primitives ...
You seem to argue that the random-access aspect of locals provides a performance advantage on simple systems, but in most cases, code using
locals is at a performance disadvantage on such systems
(and traditionalists have often used that to argue against locals).
Keeping at least one stack item in a register leads to a smaller and
faster implementation, and is not more complex than keeping all the
stack memory in RAM.
A way to use RAM that is less frowned upon by Forth traditionalists is (global) variables. The fact that the use of global variables is
frowned upon in the wider programming community for various reasons
seems to pour oil into the fire of their elitism.
Paul Rubin <no.email@nospam.invalid> writes:
Hans Bezemer <the.beez.speaks@gmail.com> writes:
We do have N>R (https://forth-standard.org/standard/tools/NtoR). So if
the whole problem is "there is no more room on the FP stack", there is
a way out.
That must be pretty new (it's not in gforth 0.7.3)
It was accepted into Forth-200x at the 2010 standards meeting.
so I wonder how
helpful it really is.
We have two uses in the Gforth sources. I.e., not particularly
useful.
In any case, it does not help with FP stack limitations at all,
because N>R transfers cells from the data stack to the return stack.
My take on FP stack depth limitations in some systems is that you use
as much FP stack as you need, use a Forth system (like Gforth) where
you can make the FP stack as deep as available memory and address
space permit, and publish that. Maybe it will inspire the system
implementors with shallow FP stacks to provide deep FP stacks, at
least optionally.
However, when I did something that required a deep FP stack (adding up
an array with pairwise addition <2025Jul16.132504@mips.complang.tuwien.ac.at>), I actually worked
around the limitations of systems that only provide a shallow FP
stack. But that was easy enough in that case.
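For those who have not seen pairwise addition, here is a minimal recursive sketch in standard Forth. FSUM-PAIRWISE is a made-up name, and this is not the code from the posting referenced above. While the second half of a range is summed, the partial sum of the first half waits on the FP stack, so the FP stack depth grows roughly with log2 of the array length; that is how a large enough array can exceed a shallow FP stack such as the eight registers of the 387.

```forth
\ Pairwise (cascade) summation of u floats starting at f-addr.
\ The first half's partial sum waits on the FP stack while the second
\ half is summed, so about log2(u) partial sums are live at most.
: fsum-pairwise ( f-addr u -- ) ( F: -- r )
  dup 0= if  2drop 0e exit  then
  dup 1 = if  drop f@ exit  then
  2dup 2/ recurse              ( f-addr u )   ( F: r1 )
  dup 2/ tuck -                ( f-addr h u-h )
  >r floats + r> recurse       ( F: r1 r2 )
  f+ ;
```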
Concerning systems with FP stack limits, AFAIK VFX has FP packages
that support very deep stacks, including the SSE-based package that
used to be the default in VFX64 for a while.
iForth implements a deep stack: it uses the 387 stack within a
definition and stores the FP stack items that are on the 387 stack to
memory on calls, and if the FP stack would overflow from the
computations within a word. I think this is a good approach: Much FP computation time is spent in words that do not call other words, or at
least the FP stack items do not live across the calls. iForth seems
to overdo it, however, even code like
: bar
dup f@ cell+ dup f@ cell+ dup f@ cell+
dup f@ cell+ dup f@ cell+ dup f@ cell+
f+ f+ f+ f+ f+ ;
which uses only 6 FP stack items does not produce the obvious code,
but something significantly longer: It first performs 6 FLD
instructions corresponding to the 6 F@, then stores 4 FP items,
presumably on the memory FP stack, and only then starts the additions (interleaved with some other code).
- anton
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
In any case, it does not help with FP stack limitations at all,
because N>R transfers cells from the data stack to the return stack.
In the code I mentioned, I wasn't running out of FP stack space, but
rather, I didn't see how to write the function in any non-horrible way without using FP locals. Horrible ways included: 1) implementing a
separate FP stack in memory for intermediate values during the
recursion, or 2) using ugly hacks to stash FP values on the regular data stack.
N>R was suggested as a way to implement horribleness #2, but it would
actually have to be FN>R or something like that.
lxf uses the cpu FP stack. I think that is one of the worse decisions
I made for it. It will fail on all but the simplest complex fp math
operations. For lxf64 a priority was to have a separate in memory
FP stack. It has worked out very well!
BR
Peter
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work
A simple implementation of locals doesn't sound like that much work?
Mostly you need a runtime scheme to make sure the locals are cleaned up
in case of exceptions being thrown. If you're willing to ignore the
standard you don't need to complicate the text interpreter much. I
Hans Bezemer <the.beez.speaks@gmail.com> writes:
On 25-04-2026 07:26, Anton Ertl wrote: [reinserted deleted, relevant context]
Hans Bezemer <the.beez.speaks@gmail.com> writes:
If you want to use a language that is "ideologically devoted" to the
architecture, maybe you shouldn't use Forth at all - and stick with C.
I don't see anything about C that is closer to the hardware than Forth
is, and I think that both languages are about equally '"ideologically
devoted" to the architecture'. In particular, a C local variable is
no closer to a register (the most efficient hardware feature for
storing data) than a stack item or return stack item is, and register
allocation of any of the three is similarly difficult (with big
differences in difficulty between solutions that provide some register
allocation to those that are so reliable that you usually count on
them).
Well, you're actually shooting at Paul Rubin - not at me. Thank you! I
take all the help I can get!
Actually, this whole paragraph is a reaction to your statement, not
his. You deleted it for whatever reason, so I reinserted it.
Concerning Paul Rubin, just because he is wrong does not mean you are
right.
(..) and they often find some arguments that appeal to
elitism (i.e., only the chosen ones can use this programming language
for the elite as it should be used, and the others should program in
Python or "should never have been allowed to touch a keyboard" (Ulrich
Drepper).
It's your own pal Bernd that said: "A good programmer will write even
better code in Forth. A bad programmer will write abysmal code in Forth.
And I'm sorry to say - but most programmers are quite bad."
So, either you agree with him or we have an unfortunate departure of one
of the most foremost members of Gforth. Because this states - in no
uncertain words - that Forth programmers *ARE* elite.
What departure? We disagree on a number of things.
And the issue is not whether Forth programmers or any other
programmers are elite, but that many programmers think that they are
elite (whether they are or aren't) and that the designers or advocates
of deficient programming systems make use of that to dupe them, along
the lines of: "You as elite programmers can cope with this deficiency
[of course they don't call it a deficiency], it's only subpar
programmers [more elaborate denigrations are common, see Ulrich
Drepper] who complain about it."
In the case of Forth and locals this tactic has not worked very well,
so even Forth, Inc. (who have been the most vocal among the commercial
Forth providers about their dislike of locals) have implemented
locals. But of course we see the echo of all of this still around
here.
In article <nnd$1196d1a5$0da70c85@6de98b5b6c1b0418>,
Hans Bezemer <the.beez.speaks@gmail.com> wrote:
<SNIP>
It would be better to think deeply, find an original solution and learn.
Like Albert with his brilliant ;: word.
Chuck Moore invented and coined the ;: word.
I came up with CO, which is similar, or maybe the same.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work
A simple implementation of locals doesn't sound like that much work?
I've
imagined some alternate versions of COLON, e.g.
: foo ( ... ) ; \ regular colon, no locals
1: foo ( ... ) ; \ one local called A
2: foo ( ... ) ; \ two locals, A and B
...
4: foo ( ... ) ; \ four locals: A, B, C, D.
The slowdown doesn't surprise me but it's not that big a deal, compared
to the slowdown of using interpreted Forth instead of assembly language
in the first place.
... looking at the code for Gforth for 3DUP.3 compared to the others,
Gforth still uses more primitives ...
That's a lot of code in the expansion! I wonder how it will look in a
simple interpreter.
You seem to argue that the random-access aspect of locals provides a
performance advantage on simple systems, but in most cases, code using
locals is at a performance disadvantage on such systems
Well, if the slowdown is less than say 2x, I'd say the code cleanup
matters more, due to the traditional 90/10 rule (maybe now 99/1) of
where CPU cycles go. Code the hot spots for speed and the rest for
convenience.
Keeping at least one stack item in a register leads to a smaller and
faster implementation, and is not more complex than keeping all the
stack memory in RAM.
That's only with a fancy compiler AND a requirement of the application
code having statically determined stack effects. Traditional words like
?DUP would confuse this scheme amirite?
A way to use RAM that is less frowned upon by Forth traditionalists is
(global) variables. The fact that the use of global variables is
frowned upon in the wider programming community for various reasons
seems to pour oil into the fire of their elitism.
I see what you mean by that. But, whole-program C compilers do
something like register allocation to re-use those "global" cells when
sets of them won't be needed at the same time. The Forth approach would
need either a similar fancy compiler, or else require the programmer to
do an error-prone manual memory layout process, or else burn memory
unnecessarily for those cells whose usage doesn't overlap.
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
And traditionally Forth has been implemented without locals, for the
same reason: It takes less memory and, for the system implementor,
less work
A simple implementation of locals doesn't sound like that much work?
Bernd Paysan wrote a simple locals implementation <https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:
On 26-04-2026 11:50, Anton Ertl wrote:
Bernd Paysan wrote a simple locals implementation
<https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:
With all respect to Bernd, but yeah - compare that to this 0.5 SLOC
implementation of local:
: local r> swap dup >r @ >r ;: r> r> ! ;
Paul Rubin <no.email@nospam.invalid> writes:
...
I've
imagined some alternate versions of COLON, e.g.
: foo ( ... ) ; \ regular colon, no locals
1: foo ( ... ) ; \ one local called A
2: foo ( ... ) ; \ two locals, A and B
...
4: foo ( ... ) ; \ four locals: A, B, C, D.
If you cannot choose the names, locals lose a lot of their benefits in
making the code more understandable (OTOH, mathematicians have made do
with similar naming schemes for a long time). You might then just as
well work with >R >R >R >R and R@, R'@, 2 RPICK and 3 RPICK.
Hans Bezemer <the.beez.speaks@gmail.com> writes:
On 26-04-2026 11:50, Anton Ertl wrote:
Bernd Paysan wrote a simple locals implementation
<https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
that takes 84 SLOC:
With all respect to Bernd, but yeah - compare that to this 0.5 SLOC
implementation of local:
: local r> swap dup >r @ >r ;: r> r> ! ;
Let's see:
[~:167902] gforth-0.5.0
GForth 0.5.0, Copyright (C) 1995-2000 Free Software Foundation, Inc.
GForth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
warnings off include locals.fs ok
ok
: local r> swap dup >r @ >r ;: r> r> ! ;
*the terminal*:1: Undefined word
peter <peter.noreply@tin.it> writes:
I recently reviewed the string comparison for search-wordlist
and came up with the following
The string stored in the word header is already uppercased.
So string comparison will be case insensitive
: UC ( c -- c' ) \ uppercase char
dup $61 $7B within $20 and - ;
: NCOMP4 ( addr n addr' n' - f) \ 0 is match
dup >r
begin
rot = while \ str cstr
r> dup 1- >r
while \ str cstr
swap count uc \ cstr str' s1
rot count \ str' s1 cstr' c1
repeat
2drop r> drop 0 exit
then
2drop r> drop 1 ;
First iteration in the loop it does not compare chars but the length!
Clever, but, at least without comment, too clever.
This code, and, more clearly, Hans Bezemer's version, demonstrate that
STR= is easier than COMPARE, STRCMP, or STR<, because you can deal
with the case of length difference right at the start, whereas the
latter words have to check the characters up to the end of the shorter
string first before dealing with the length. This shows the greatest
benefit in cases like
s" 0123456789abcdefg" s" 0123456789abcdefgh" strcmp
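A minimal sketch of that point, assuming only equality matters: STR=? (a made-up name, chosen so as not to clash with a system's STR=) rejects strings of different length without reading any characters, whereas a three-way STRCMP first has to compare the 17 common characters in the example above.

```forth
\ Equality test that checks the lengths first; true if the strings match.
: str=? ( addr1 u1 addr2 u2 -- f )
  rot over <> if  drop 2drop false exit  then   \ lengths differ: no char read
  0 ?do                                         ( addr1 addr2 )
    over i chars + c@  over i chars + c@  <>
    if  2drop unloop false exit  then
  loop
  2drop true ;
```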
As for STRCMP, I have measured the five versions shown in my earlier
posting (whole program posted below), with the bugs fixed, and the
?DUP IF replaced by DUP IF ... THEN DROP, because it produces better
code.
I have also included the following versions:
: strcmp { addr1 u1 addr2 u2 -- n }
  u1 u2 min 0
  ?do
    addr1 c@ addr2 c@ - ?dup
    if
      unloop exit
    then
    addr1 char+ TO addr1
    addr2 char+ TO addr2
  loop
  u1 u2 - ;
This comes from the '94 paper and is the version that uses TO instead
of defining new locals at every iteration. Paul Rubin will love the
code that current Gforth produces for "addr2 char+ TO addr2":
<strcmp+$E0> @local2 1->2
$7F337DA71BBA: mov 0x10(%rbp),%r15
<strcmp+$E8> char+ 2->2
$7F337DA71BBE: add $0x1,%r15
<strcmp+$F0> !local2 2->1
$7F337DA71BC2: mov %r15,0x10(%rbp)
The TO <local> code was not that efficient in earlier Gforth versions.
The other version I added is:
: strcmp ( addr1 u1 addr2 u2 -- n )
  rot 2dup 2>r min 0 ?do ( addr1 addr2 )
    over c@ over c@ - dup if
      nip nip 2rdrop unloop exit then
    drop
    char+ swap char+ swap
  loop
  2drop r> r> - ;
This is the STRCMP3 from <2024Apr9.175958@mips.complang.tuwien.ac.at>
and may be the locals-less version I compared against in the '94
paper.
I also included your version (without the UC call) and Hans Bezemer's version.
I benchmarked two Forth systems, gforth-fast and gforth-itc.
gforth-itc uses indirect-threaded code and should perform similarly to
the "simple interpreters" that Paul Rubin had in mind.
I ran three different benchmarks on these words; each performs one of
the following a number of times:
s" 0123456789abcdefg" 2dup strcmp drop \ bench1
s" 0123456789abcdefg" s" 2123456789abcdefg" strcmp drop \ bench2
s" 0123456789abcdefg" s" 0123456789abcdefgh" strcmp drop \ bench3
In bench1 the strings are equal and everything has to be compared. In
bench2 the strings have the same length, but differ in the first char,
so the loop can terminate after the first char. In bench3 the strings
have different lengths, but all chars that both strings have are the
same. In the latter case versionpeter and versionbezemer have an
advantage because they do not implement the same functionality (they
only test for equality).
The cycle counts are per invocation of STRCMP, including benchmark overhead.
The benchmarks were run on a Ryzen 8700G (Zen4).
In addition to the cycles, I also show the bytes of native code of
the whole word in gforth-fast on AMD64 (without the final 2-byte jmp),
and of the loop (including the code for the if...then).
    Bytes    | cycles gforth-fast   | cycles gforth-itc    |
strcmp  loop | bench1 bench2 bench3 | bench1 bench2 bench3 |
   262   127 |  109.5   16.6  109.4 | 1732.7  147.4 1724.5 | version0
   303   151 |  164.2   17.2  164.4 | 1714.1  170.4 1613.5 | version1
   257   122 |  105.3   17.4  105.1 | 1496.7  166.4 1493.0 | version2
   280   113 |   98.6   19.2   99.0 | 1230.1  194.4 1116.2 | version3
   267   118 |   91.2   17.9   91.2 | 1268.6  198.4 1269.0 | version4
   273   108 |   89.9   17.0   90.0 | 1136.0  178.4 1138.9 | version5
   261   128 |  121.1   14.6  118.5 | 1221.4  131.3 1213.3 | version6
   210   142 |  137.5   15.4    9.5 | 1244.4  155.3   78.3 | versionpeter
   260   119 |  107.8   16.4    9.8 | 1186.2  134.5   71.3 | versionbezemer
So the champion among the full-featured strcmps for bench1 and bench3
is version5, and for bench2 it is version6. The str= variants are much
faster for bench3 (of course), but slower than several other versions
for bench1 and slower than version6 for bench2. The native code size is
smallest for version2 (among the full-featured strcmp
implementations), so the locals-less versions do not win everything.
Overall, the locals-less versions (version5 and version6) are somewhat
faster on both gforth-fast and gforth-itc.
lxf has a more efficient locals implementation. Let's see how it
fares. It does not support the usage in version1, so I leave that
one out.
      cycles lxf
bench1 bench2 bench3
  79.9   12.0   79.9  version0
  99.6   12.0   99.6  version2
  98.8   14.1   98.1  version3
  86.0   13.2   86.0  version4
  84.1   12.6   84.2  version5
  88.7   10.0   92.8  version6
  98.3   10.0    6.0  versionpeter
  72.1    9.5    6.0  versionbezemer
On lxf, version0 (with locals) is the fastest for bench1 and bench3,
and version6 is the fastest for bench2. Hans Bezemer's version wins
everything if we are only interested in str= functionality.
And here's the code (measurement scripts at the bottom):
----------------------------------------------------------
[defined] version0 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
  u1 u2 min 0
  ?do
    addr1 c@ addr2 c@ - dup
    if
      unloop exit
    then
    drop
    addr1 char+ TO addr1
    addr2 char+ TO addr2
  loop
  u1 u2 - ;
[then]
[defined] version1 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
  addr1 addr2
  u1 u2 min 0
  ?do {: s1 s2 :}
    s1 c@ s2 c@ - dup
    if
      unloop exit
    then
    drop s1 char+ s2 char+
  loop
  2drop
  u1 u2 - ;
[then]
[defined] version2 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
  u1 u2 min 0
  ?do
    addr1 i + c@ addr2 i + c@ - dup
    if
      unloop exit
    then
    drop
  loop
  u1 u2 - ;
[then]
[defined] version3 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
  addr2 addr1 - {: offset :}
  u1 u2 min addr1 + addr1 ?do
    i c@ i offset + c@ - dup
    if
      unloop exit
    then
    drop
  loop
  u1 u2 - ;
[then]
[defined] version4 [if]
: strcmp {: addr1 u1 addr2 u2 -- n :}
  addr2 addr1 - ( offset )
  u1 u2 min addr1 + addr1 ?do ( offset )
    dup i + c@ i c@ - dup
    if
      nip negate unloop exit
    then
    drop
  loop
  drop u1 u2 - ;
[then]
[defined] version5 [if]
: strcmp ( addr1 u1 addr2 u2 -- n )
  rot 2dup - >r ( addr1 addr2 u2 u1 R: n1 )
  min -rot over - ( u12 addr1 offset R: n1 )
  swap rot bounds ( offset limit start R: n1 )
  ?do ( offset R: n1 loop-sys )
    dup i + c@ i c@ - dup
    if
      nip negate unloop r> drop exit
    then
    drop
  loop
  drop r> negate ;
[then]
[defined] version6 [if]
[undefined] 2rdrop [if]
: 2rdrop postpone 2r> postpone 2drop ; immediate
[then]
: strcmp ( addr1 u1 addr2 u2 -- n )
  rot 2dup 2>r min 0 ?do ( addr1 addr2 )
    over c@ over c@ - dup if
      nip nip 2rdrop unloop exit then
    drop
    char+ swap char+ swap
  loop
  2drop r> r> - ;
[then]
[defined] versionpeter [if]
\ from <20260425160747.00007f4a@tin.it>
\ renamed and deleted the call to UC
: strcmp ( addr n addr' n' -- f ) \ 0 is match
  dup >r
  begin
    rot = while  \ str cstr
    r> dup 1- >r
  while  \ str cstr
    swap count   \ cstr str' s1
    rot count    \ str' s1 cstr' c1
  repeat
  2drop r> drop 0 exit
  then
  2drop r> drop 1 ;
[then]
[defined] versionbezemer [if]
\ from <nnd$548d4f1b$1e104571@905dda44db1f54ae>
\ renamed
: strcmp ( c-addr1 u1 c-addr2 u2 -- f ) \ 0 is match
  rot over - if drop 2drop true exit then
  0 ?do
    over i chars + c@ over i chars + c@ -
    if drop drop unloop true exit then
  loop drop drop false
;
[then]
[defined] t{ [if]
t{ s" abc" s" abc" strcmp -> 0 }t
t{ s" abc" s" abcd" strcmp -> -1 }t
t{ s" abc" s" abd" strcmp -> -1 }t
t{ s" abd" s" abc" strcmp -> 1 }t
t{ s" cbc" s" abc" strcmp -> 2 }t
t{ s" abc" s" adc" strcmp -> -2 }t
[then]
\ Benchmarks
[undefined] iterations [if]
  100000000 constant iterations
[then]
: benchmark ( c-addr1 u1 c-addr2 u2 -- )
  iterations 0 do
    2over 2over strcmp drop
  loop
  2drop 2drop ;
: bench1
  s" 0123456789abcdefg" 2dup benchmark ;
: bench2
  s" 0123456789abcdefg" s" 2123456789abcdefg" benchmark ;
: bench3
  s" 0123456789abcdefg" s" 0123456789abcdefgh" benchmark ;
0 [if]
# bash script for producing the cycles
IFS=":"
for i in 0 1 2 3 4 5 6 peter bezemer; do
  for forthit in gforth-fast:100000000 gforth-itc:10000000; do
    fields=($forthit); forth="${fields[0]}"; iterations="${fields[1]}"
    for bench in 1 2 3; do
      perf stat --log-fd 3 -x, -e cycles:u $forth -e "create version$i $iterations constant iterations" ~/forth/strcmp.4th -e "bench$bench bye" 3>&1 >/dev/null|
        awk -F, '{printf "%6.1f ",$1/'$iterations'}'
    done
  done
  echo version$i
done
IFS=":"
for i in 0 2 3 4 5 6 peter bezemer; do
  forth=lxf; iterations=100000000
  for bench in 1 2 3; do
    perf stat --log-fd 3 -x, -e cycles:u $forth "create version$i $iterations constant iterations include $HOME/forth/strcmp.4th bench$bench bye" 3>&1 >/dev/null|
      awk -F, '{printf "%6.1f ",$1/'$iterations'}'
  done
  echo version$i
done
[then]
--------------------------------------------------------------
- anton
On Sun, 26 Apr 2026 14:03:03 GMT...
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
Anton, thanks for running all these tests.
I have now also run them on my Ryzen 9950X.
There is an error in version 6 that I corrected:
2rdrop needs to come after unloop. On lxf64, which keeps the loop
parameters in registers, this is necessary!
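For reference, a sketch of version6 with this fix applied: UNLOOP now comes before 2RDROP, so the loop parameters are discarded before the two lengths saved by 2>R. The 2RDROP fallback is the one from the benchmark file; this sketch is not itself one of the measured variants.

```forth
\ Fallback for systems without 2RDROP (same as in the benchmark file).
[undefined] 2rdrop [if]
: 2rdrop postpone 2r> postpone 2drop ; immediate
[then]
\ version6 with UNLOOP moved before 2RDROP: the loop parameters must be
\ removed before the two lengths that were pushed by 2>R.
: strcmp ( addr1 u1 addr2 u2 -- n )
  rot 2dup 2>r min 0 ?do    ( addr1 addr2 )  ( R: u2 u1 loop-sys )
    over c@ over c@ - dup if
      nip nip unloop 2rdrop exit   \ result is the char difference c1-c2
    then
    drop
    char+ swap char+ swap
  loop
  2drop r> r> - ;           \ all shared chars equal: result is u1-u2
```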
I also needed to change the log-fd to 5 to get it to run.
The tests are run with Debian under WSL2.
Here are the results:
lxf64
bench1 bench2 bench3
  59.1   10.0   57.6  version0
  48.1   10.0   48.4  version2
  43.0   10.7   42.5  version4
  42.2    9.1   42.2  version5
  55.1    9.0   55.0  version6
  65.7    8.0    6.0  versionpeter
  32.8    9.0    4.2  versionbezemer
lxf
bench1 bench2 bench3
  64.2    8.5   64.2  version0
 112.3   10.2   90.1  version2
  78.8   10.6   75.6  version4
  88.1    9.4   88.2  version5
 112.2    7.5  114.7  version6
  71.0    8.2    7.4  versionpeter
  50.9    8.3    4.3  versionbezemer
There is a significant impact in having loop parameters in registers!
Versions 2 and 6 are interesting for lxf. The full perf stat output
gives some more info.
peter <peter.noreply@tin.it> writes:
On Sun, 26 Apr 2026 14:03:03 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
And, to top it off, sf64 and vfx64, after correcting the bug in
version6 that you pointed out:
cycles sf-4.0.0-RC89 | cycles vfx64 5.43    |
bench1 bench2 bench3 | bench1 bench2 bench3 |
 195.1   62.0  194.5 |  124.2   42.2  123.3 | version0
 136.3   63.0  136.2 |  200.4  124.1  204.4 | version2
 143.7   69.6  143.4 |   90.7   36.7   91.3 | version4
 115.1   36.0  114.1 |  102.0   30.2  101.8 | version5
 132.8   38.0  133.3 |   85.8   28.2   88.2 | version6
 182.0   19.0    9.0 |   95.7   10.2    6.2 | versionpeter
 224.9   40.2    8.0 |   63.2   29.2    6.2 | versionbezemer
Interesting performance variations.
> Anton, thanks for running all these tests.
> I have now also run them on my Ryzen 9950X.
> There is an error in version 6 that i corrected.
> 2rdrop needs to be after unloop. On lxf64 that uses registers for
> loop parameters this is necessary!
Thanks. In sf64 and vfx64 this change is necessary, too.
> I needed also to change the log-fd to 5 to get it to run.
> The tests are run with Debian under WSL2.
WSL2 supports performance counters. Great!
What happens with log-fd=3?
> Here are the results
> [...]
> There is a significant impact in having loop parameters in registers!
> version 2 and 6 are interesting for lxf. The full stat gives some more
> info.
Not any info that I find helpful. But my guess is as follows: Keeping
the loop index in memory has reliably meant that counted loops take at
least 5 cycles per iteration. In recent processors (from this decade
or a little earlier), hardware can perform zero-cycle store-to-load forwarding, but it is not reliable. So my guess is that in version2
and version6 we are seeing cases where this hardware optimization has
not worked. So, yes, keeping loop parameters that change in registers
is a good idea even on recent CPUs.
The differences between Zen4 and Zen5 on lxf are significant, but I
guess that if you take the average, you get the picture of small
progress that I see on various websites.
- anton
Hans Bezemer <the.beez.speaks@gmail.com> writes:
> On 26-04-2026 11:50, Anton Ertl wrote:
>> Bernd Paysan wrote a simple locals implementation
>> <https://cgit.git.savannah.gnu.org/cgit/gforth.git/tree/locals.fs>
>> that takes 84 SLOC:
> With all respect to Bernd, but yeah - compare that to this 0.5 SLOC
> implementation of local:
> : local r> swap dup >r @ >r ;: r> r> ! ;
Let's see:
[~:167902] gforth-0.5.0
GForth 0.5.0, Copyright (C) 1995-2000 Free Software Foundation, Inc.
GForth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
warnings off include locals.fs ok
ok
: local r> swap dup >r @ >r ;: r> r> ! ;
*the terminal*:1: Undefined word
: local r> swap dup >r @ >r ;: r> r> ! ;
^^
Backtrace:
$F7B5A158 throw
$F7B6418C no.extensions
Although, admittedly, while Bernd Paysan's locals.fs loads, it does
not work AFAICT (I tried it on gforth-0.4 and gforth-0.5; it does not
load on gforth-0.6 and later). Apparently it had bitrotted between
the time when it was written in 1992 and gforth-0.4 in 1998.
- anton
On 26/04/2026 7:50 pm, Anton Ertl wrote:
> Paul Rubin <no.email@nospam.invalid> writes:
> ...
>> I've imagined some alternate versions of COLON, e.g.
>>   : foo ( ... ) ;    \ regular colon, no locals
>>   1: foo ( ... ) ;   \ one local called A
>>   2: foo ( ... ) ;   \ two locals, A and B
>>   ...
>>   4: foo ( ... ) ;   \ four locals: A, B, C, D.
> If you cannot choose the names, locals lose a lot of their benefits in
> making the code more understandable (OTOH, mathematicians have made do
> with similar naming schemes for a long time). You might then just as
> well work with >R >R >R >R and R@, R'@, 2 RPICK and 3 RPICK.
That Julian Noble (among others) felt the need for FTRAN, INTRAN etc.
informs what scientists and academics really want - and it's a long way
from the 'stack based' locals offered by most forth systems. The latter
represent a concession to forth before a user has even begun to consider
identifiers.
To an outsider, forth locals do nothing to ameliorate what they see as
fundamentally broken about the language. ISTM if a forther has conceded
to use stack-based locals, he can certainly make choices about what form
identifiers take.
As a matter of fact, this thingy creates locals:
: ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;
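The trick depends on the return stack holding threaded-code return addresses, which is non-standard: ;: pushes the caller's continuation and EXITs into it, leaving the tail of LOCAL (r> r> !) pending on the return stack until the caller itself exits. A commented copy with a usage example may make the control flow easier to follow; it assumes a threaded-code system in which R> inside a colon definition yields the caller's return address, and the names X and DEMO are hypothetical.

```forth
\ Commented copy of the two words above (non-standard: assumes the
\ return stack holds threaded-code return addresses).
\ ;: ( ret -- )  push RET and EXIT into it; the rest of the word that
\ called ;: stays on the return stack and runs when RET's word exits.
: ;: >r ;
\ LOCAL ( addr -- )  save the cell at ADDR now and restore it when the
\ word that called LOCAL exits.
: local ( addr -- )
  r> swap          ( ret addr )  \ take our caller's return address
  dup >r @ >r      ( ret )       ( R: addr old )
  ;:               \ resume the caller; the caller's EXIT returns here
  r> r> ! ;        \ old addr ! : restore the saved value
\ Hypothetical usage: X holds 5 only while DEMO runs.
variable x  7 x !
: demo ( -- n )  x local  5 x !  x @ ;
```

Unlike Gforth's named locals, this saves and restores a global variable (dynamic scoping), so words called from DEMO also see the temporary value.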
On 28/04/2026 13:34, Hans Bezemer wrote:
> As a matter of fact, this thingy creates locals:
> : ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;
LOCAL can also be defined as:
: local r> over @ rot 2>r ;: 2r> ! ;
which I guess you won't like, but is a bit shorter. It also survives
your pre-processor conversion of 2>r to >r >r, and similarly of 2r>.
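The claim that this LOCAL survives the conversion can be checked with a small sketch. In standard Forth, 2>R is equivalent to SWAP >R >R, so it leaves the two items on the return stack in the opposite order from a plain >R >R. As long as 2>R and 2R> are both expanded the same naive way, however, push and pop still form a round trip, which is all LOCAL needs. The word names below are hypothetical.

```forth
\ Standard: 2>R ( x1 x2 -- ) ( R: -- x1 x2 ), i.e. SWAP >R >R.
: std-pair   ( x1 x2 -- x1 x2 )  2>r 2r> ;
\ Naive expansion of both words: still a round trip.
: naive-pair ( x1 x2 -- x1 x2 )  >r >r r> r> ;
\ The expansions differ in which item ends up on top of the R stack:
: std-top    ( x1 x2 -- x2 )  2>r r> r> drop ;
: naive-top  ( x1 x2 -- x1 )  >r >r r> r> drop ;
```

In the shorter LOCAL, either expansion stores the saved value back at the saved address, because 2r> (or r> r>) returns the pair in the order it was pushed.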
On 29-04-2026 13:44, Gerry Jackson wrote:
> On 28/04/2026 13:34, Hans Bezemer wrote:
>> As a matter of fact, this thingy creates locals:
>> : ;: >r ; : local r> swap dup >r @ >r ;: r> r> ! ;
> LOCAL can also be defined as:
> : local r> over @ rot 2>r ;: 2r> ! ;
> which I guess you won't like, but is a bit shorter. It also survives
> your pre-processor conversion of 2>r to >r >r, similarly 2r>
I don't say you're wrong, but there is some logic to this madness:
1. In 4tH, "2>R" is the same as ">R >R". The compiler expands it like
that. So there is no advantage in using "2>R". Yes, you can do "2R@",
but not "R@". It won't be portable;
> If you cannot choose the names... You might then just as well work with
> >R >R >R >R and R@, R'@, 2 RPICK and 3 RPICK.
In the code you see the threaded code interspersed with the native
code. If you ignore the native code, you see what a simple
interpreter would see (if it had a locals implementation that produced
code similar to that of Gforth).
So it's "code cleanup", not making use of hardware facilities for
efficiency on simple interpreters, that you see as the benefit of
locals.
>> You might then just as well work with >R >R >R >R and R@, R'@, 2 RPICK
>> and 3 RPICK.
> ...
> Flashforth has a separate P stack which can be used for temporaries
> within a word, but I don't remember how cleanup is handled, if at all.
It's a CPU register, not a stack. For re-entrancy, the old value must
first be pushed onto the CPU stack before loading the new one. IIRC FF
has a word that combines those. Basically a variable.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:[...]
I wonder if gforth would get less code bloat if you added some
primitives for pushing more than one local. E.g. 2>L, 3>L, etc. would
push that many stack elements to LOCAL0, LOCAL1, LOCAL2. Then there
wouldn't be that big chunk of replicated code.
  >l >l            62  len= 4+ 26+ 3
  >l >l >l          9  len= 4+ 34+ 3
  >l >l >l >l       5  len= 4+ 42+ 3
  >l f>l            2  len= 4+ 42+ 3
  >l @local0       20  len= 4+ 11+ 3
  >l lit f@localn   1  len= 4+ 24+ 3
  >l               67  len= 4+ 18+ 3
  >l               10  len= 4+ 23+ 3
> Even with multi-representation stack-caching as used since Gforth 0.7
> (which does require more compiler smarts), no statically determined
> stack effect is necessary, because the code generator returns to the
> canonical state on control-flow.
I see, yeah, but that means stack juggling to get to the canonical
state.
> ... we have user variables like BASE and HLD (in F83, HOLDPTR in
> gforth). They are used across multiple words, and the fact that you
> don't have to pass them and put them into a local has been touted as
> an advantage over locals: Definitions that use global variables are
> easier to factor.
Urgggh...
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> You might then just as well work with >R >R >R >R and R@, R'@, 2 RPICK
> and 3 RPICK.
But now you have to avoid mixing that style with using the R stack for
temporaries, including stuff like loop indexes which sometimes go
there. And you have to clean up the R stack before returning, and maybe
arrange for that to happen in case of an exception.