Here's a preprint of an IEEE Micro article:
Google's Training Supercomputers from TPU v2 to Ironwood:
Architectural Stability, Scale, Resilience, Power Efficiency,
and Sustainability Across Five Generations
They call the architecture VLIW which I don't think is quite
right -- they do indeed have wide instruction words but I don't
think they do speculative execution.
John Levine <johnl@taugh.com> writes:
Here's a preprint of an IEEE Micro article:
Google's Training Supercomputers from TPU v2 to Ironwood:
Architectural Stability, Scale, Resilience, Power Efficiency,
and Sustainability Across Five Generations
They call the architecture VLIW which I don't think is quite
right -- they do indeed have wide instruction words but I don't
think they do speculative execution.
It is more like clocked (or lock-step) data flow architectures,
without the natural asynchronicity in data flow.
On 6/17/2026 11:05 AM, Scott Lurndal wrote:
John Levine <johnl@taugh.com> writes:
Here's a preprint of an IEEE Micro article:
Google's Training Supercomputers from TPU v2 to Ironwood:
Architectural Stability, Scale, Resilience, Power Efficiency,
and Sustainability Across Five Generations
They call the architecture VLIW which I don't think is quite
right -- they do indeed have wide instruction words but I don't
think they do speculative execution.
It is more like clocked (or lock-step) data flow architectures,
without the natural asynchronicity in data flow.
An idle thought here is whether there is any "better" option than conventional register-machine designs.
On Wed, 17 Jun 2026 15:22:35 -0500, BGB wrote:
An idle thought here is whether there is any "better" option than
conventional register-machine designs.
Back in the 1980s and earlier, there were many architectural
possibilities being considered.
Consider the Transputer: this consisted of a lot of CPU nodes, each
with its own local memory. I guess they envisioned scaling up both
numbers of CPUs as well as amount of memory in more powerful
configurations, so the amount of memory per CPU stayed roughly
constant.
Why didn’t that work? Obviously, because CPU speeds increased disproportionately more than memory speeds. And so the total amount of
memory has in reality been increasing much faster than the number of
CPUs.
And I don’t think you see NUMA in consumer machines; maybe in supercomputers.
On 2026-06-18 10:45, Lawrence D’Oliveiro wrote:
On Wed, 17 Jun 2026 15:22:35 -0500, BGB wrote:
An idle thought here is whether there is any "better" option than
conventional register-machine designs.
Back in the 1980s and earlier, there were many architectural
possibilities being considered.
Consider the Transputer: this consisted of a lot of CPU nodes, each
with its own local memory. I guess they envisioned scaling up both
numbers of CPUs as well as amount of memory in more powerful
configurations, so the amount of memory per CPU stayed roughly
constant.
Why didn’t that work? Obviously, because CPU speeds increased
disproportionately more than memory speeds. And so the total amount of
memory has in reality been increasing much faster than the number of
CPUs.
The Transputer (from Inmos) led to the XCORE many-core chips from XMOS (https://www.xmos.com/) which seem to be somewhat successful today.
The Transputer was successful for a while, but IMO then waned because
Inmos focused on making each single-core processor chip more powerful,
which put the transputer into direct competition with conventional
processors and made many-processor transputer systems expensive, instead
of making many-core chips, which is what XMOS does, making the cost per
core small.
On Wed, 17 Jun 2026 15:22:35 -0500, BGB wrote:
An idle thought here is whether there is any "better" option than
conventional register-machine designs.
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
On 18/06/2026 16:39, Scott Lurndal wrote:
On Wed, 17 Jun 2026 15:22:35 -0500, BGB wrote:
An idle thought here is whether there is any "better" option than
conventional register-machine designs.
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
There have been plenty of microcontrollers where there is only one or
very few actual registers - everything else is ram. The 8-bit PIC
family works like that, and has been hugely popular. There are also
"stack machine" architectures where you have, at most, a register for
the top-of-stack (along with at least one stack pointer register, a
program counter, and perhaps a flag/status register). Pretty much all
4-bit processors work like that, AFAIK.
I think there's a lot to be said for stack machine type designs,
possibly with more than one stack.
The Transputer was successful for a while, but IMO then waned because
Inmos focused on making each single-core processor chip more powerful,
which put the transputer into direct competition with conventional
processors and made many-processor transputer systems expensive, instead
of making many-core chips, which is what XMOS does, making the cost per
core small.
Another massively multi-core device I read about was the GreenArrays
GA144 <https://www.greenarraychips.com/>. In theory, the 144 processing
elements means it can do a massive number of operations per second with
very little power and cost - in practice, the tiny amount of ram for
code and data on each element means it can do almost nothing. It is
programmed in a type of Forth (I know there are Forth experts in this
group, who might have more informed opinions on the chip and development
for it), but it is an obscure and limited Forth. Combined with the
complication of splitting tasks between many elements and communicating
and synchronising between them, I think making use of these devices is a
very niche skill.
David Brown <david.brown@hesbynett.no> writes:
Another massively multi-core device I read about was the GreenArrays
GA144 <https://www.greenarraychips.com/>. In theory, the 144 processing
elements means it can do a massive number of operations per second with
very little power and cost - in practice, the tiny amount of ram for
code and data on each element means it can do almost nothing. It is
programmed in a type of Forth (I know there are Forth experts in this
group, who might have more informed opinions on the chip and development
for it), but it is an obscure and limited Forth.
That is not its major problem IMO.
The Greenarrays chips have IIRC 64 18-bit words per core. That's
really little for a general-purpose computer, and too little to be of
any use in that capacity. A number of people in the Forth community
were fascinated by these chips and ordered some to play around with
them, but I rarely heard of any actual uses, much less production
uses. Greenarrays apparently is still around, so maybe someone has
found some use for it.
One suggestion I have read is that it would be useful for bit-banging
on I/O lines. 64 words might be enough for that (as long as the
protocol is not too complex), and at 700MHz these chips might outdo
FPGAs in some of these applications. But I have not heard much about
such applications, either.
Combined with the
complication of splitting tasks between many elements and communicating
and synchronising between them, I think making use of these devices is a very
niche skill.
One interesting aspect is the synchronization. AFAIK you can send
over a word to a neighboring core. As long as that word is not
consumed, sending another word will block. Reading a word from a
neighbor if there is none available will block, too. Not sure if it
has a way to check whether something is in the buffer before trying to
read from or write to it.
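The blocking behaviour described above can be modelled in software as a channel that holds exactly one word. The following is only a sketch of that idea using Python's standard `queue.Queue`; the names `port`, `producer` and `consumer` are made up for illustration, and nothing here reflects how the chip implements its ports.

```python
import queue
import threading

# Hypothetical model of the link between two neighbouring cores:
# a buffer that can hold exactly one word.
port = queue.Queue(maxsize=1)
received = []

def producer(words):
    for w in words:
        port.put(w)                  # blocks while the previous word is unconsumed

def consumer(count):
    for _ in range(count):
        received.append(port.get())  # blocks while no word is available

words = [0x15555, 0x2AAAA, 0x00001]  # sample values that fit in 18 bits
t_prod = threading.Thread(target=producer, args=(words,))
t_cons = threading.Thread(target=consumer, args=(len(words),))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
```

In this software model `port.full()` and `port.empty()` give a non-blocking check; whether the hardware offers an equivalent is exactly the open question above.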
On 6/18/2026 7:59 AM, David Brown wrote:
On 18/06/2026 16:39, Scott Lurndal wrote:
On Wed, 17 Jun 2026 15:22:35 -0500, BGB wrote:
An idle thought here is whether there is any "better" option than
conventional register-machine designs.
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
There have been plenty of microcontrollers where there is only one or
very few actual registers - everything else is ram. The 8-bit PIC
family works like that, and has been hugely popular. There are also
"stack machine" architectures where you have, at most, a register for
the top-of-stack (along with at least one stack pointer register, a
program counter, and perhaps a flag/status register). Pretty much all
4-bit processors work like that, AFAIK.
I think there's a lot to be said for stack machine type designs,
possibly with more than one stack.
Yes, for some applications. As you noted, many/most of the successful
stack-architecture CPUs are in the small embedded space. The
advantages of stack architectures, besides the ones you mentioned,
include smaller code footprint and faster context switch.
The downside becomes more problematic when you get to more powerful
systems and try to do superscalar operations. For example, it is easy
to see how to perform simultaneously two adds that involve different
registers, but since essentially all operations in a stack machine have
top of stack as the destination, it gets more tricky.
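A small sketch of why the two cases differ, looking only at the locations each instruction names. The location names ("sp", "tos", "nos") and the helper `independent` are assumptions made for illustration; a real stack machine could rename stack slots internally to recover some of the parallelism.

```python
def independent(first, second):
    """Return True if the two instructions share no RAW, WAW or WAR hazard.

    Each instruction is a (reads, writes) pair of sets of location names.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    raw = writes1 & reads2    # second reads what first writes
    waw = writes1 & writes2   # both write the same location
    war = reads1 & writes2    # second overwrites what first reads
    return not (raw or waw or war)

# Register machine: add r1 = r2 + r3 ; add r4 = r5 + r6
reg_add_a = ({"r2", "r3"}, {"r1"})
reg_add_b = ({"r5", "r6"}, {"r4"})

# Stack machine: each ADD pops the two top entries and pushes the sum,
# so it implicitly reads and writes the stack pointer and the top of stack.
stack_add_a = ({"sp", "tos", "nos"}, {"sp", "tos"})
stack_add_b = ({"sp", "tos", "nos"}, {"sp", "tos"})

print(independent(reg_add_a, reg_add_b))      # True
print(independent(stack_add_a, stack_add_b))  # False
```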
scott@slp53.sl.home (Scott Lurndal) writes:
On Wed, 17 Jun 2026 15:22:35 -0500, BGB wrote:
An idle thought here is whether there is any "better" option than
conventional register-machine designs.
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
Registers in high-performance CPUs give you several benefits:
1) The addresses are hard-coded in the instructions. This means that
read access can start early, and that dependencies (read-after-write,
write-after-write, write-after-read) can be determined early and used
for forwarding, for renaming registers, and for reducing port
requirements.
2) They have many read and write ports.
3) Fast access time. Well, maybe. Thanks to 1) fast access time is
actually not necessary, it just means that you need fewer forwarding
paths.
Let's look at your thought experiment:
Advantage 1 is missing. Some AMD64 implementations still manage to
implement 0-cycle store-to-load-forwarding in many cases, but AFAIK
not as reliably as for registers.
Advantage 2 tends to be missing. E.g., the most extreme I have seen
up to now is 3 reads and 2 writes per cycle, and IIRC <5 total memory
accesses per cycle, on a machine that can do 8 or 10 instructions per
cycle, i.e. at least 16 register reads and 8 register writes per cycle
(maybe limited to less, but with advantage 1 mitigating that to some
extent).
Advantage 3: What would single-cycle memory access mean for d=a+b+c? It would be compiled to
t=b+c
d=a+t
With registers this has a latency of typically 2 cycles. With
single-cycle memory access this typically has a latency of 6 cycles.
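One way to arrive at the 2-cycle and 6-cycle figures, as a back-of-the-envelope model. The breakdown is an assumption: each add takes one cycle, each load or store takes one cycle, operands of one add are loaded in parallel, and there is no store-to-load forwarding, so every dependent add pays load + add + store.

```python
ALU = 1   # cycles per add (assumption)
MEM = 1   # cycles per load or store (the premise of the thought experiment)

def register_latency(n_dependent_adds):
    # Operands already sit in registers; each dependent add costs one ALU cycle.
    return n_dependent_adds * ALU

def memory_latency(n_dependent_adds):
    # Each dependent add loads its operands, adds, and stores the result
    # before the next add can load it (no forwarding assumed).
    return n_dependent_adds * (MEM + ALU + MEM)

# d = a + b + c compiles to two dependent adds: t = b + c; d = a + t
print(register_latency(2))  # 2
print(memory_latency(2))    # 6
```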
BTW, it's not just a thought experiment:
A number of IA-64 implementations have had single-cycle D-cache
access. It still had registers.
Processors like the 6502 and the 6809 have single-cycle memory access.
They still have registers (actually, accumulators and index
registers).
Of course, the challenge here is that programming is significantly
different from what we are used to, and you need a new type of OS as
well as new applications - and ideally, new programming languages.
That's a lot of big hurdles to clear even if the result is theoretically more efficient. (And you still need to keep the fast single-threaded
cpus, and the fast SIMD / vector processing systems, for other kinds of tasks.)
Possibly the biggest millstone around the neck of computing
architectures is the C language.
For every processor that is not just
for highly niche code (like gpus), what matters is how fast C code can
run on it. Most other languages either use a similar model, or run on
VMs written in C.
Why bother with support for multiple stacks, or other interesting hardware innovations, if it doesn't support faster C?
With
all due respect to Anton and other Forth enthusiasts, "fastest Forth benchmarks" is not going to attract much investment money.
I'd love to see new architectures and new hardware features that are genuinely different, but they rarely turn up.
Even with C programming, there are so many things that could be made
more efficient with a bit of interesting hardware. (I say this with
little knowledge of the complications implied.) A lot of time in C code
is spent in memory allocation work - that's got to be a prime candidate
for hardware acceleration, especially if we can get away from the
brutish malloc/free approach.
(Stack-based allocators are one possibility for a lot of allocations.)
There could be hardware support for threading, locking,
and inter-process communication.
Separate data stacks and return stacks would make things faster and more secure.
Fat pointers that can track access modes and range limits, at least in common cases, would aid reliability and security.
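To make the fat-pointer idea concrete, here is a software-only sketch in which a pointer carries its range limit and access mode and every access is checked. The class and function names are hypothetical, and this is not meant to describe any particular hardware scheme.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FatPointer:
    base: int        # first byte the pointer may touch
    length: int      # number of bytes in the permitted range
    writable: bool   # access mode: False means read-only

def load(memory, ptr, offset):
    if not 0 <= offset < ptr.length:
        raise IndexError("load outside the pointer's range")
    return memory[ptr.base + offset]

def store(memory, ptr, offset, value):
    if not ptr.writable:
        raise PermissionError("store through a read-only pointer")
    if not 0 <= offset < ptr.length:
        raise IndexError("store outside the pointer's range")
    memory[ptr.base + offset] = value

memory = bytearray(64)
buf = FatPointer(base=16, length=8, writable=True)
ro_view = FatPointer(base=16, length=8, writable=False)
store(memory, buf, 7, 0xAB)   # last valid byte: allowed
```

A store at offset 8 through `buf`, or any store through `ro_view`, raises instead of silently corrupting a neighbouring object.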
I guess a big what if is, say, rather than having a 64-bit or
128-bit pipe to a relatively large RAM, you could have a whole lot
of pipes to smaller and narrower RAM modules.
Say, for example, 64x 16b LPDDR?...
On Thu, 18 Jun 2026 03:50:52 -0500, BGB wrote:
I guess a big what if is, say, rather than having a 64-bit or
128-bit pipe to a relatively large RAM, you could have a whole lot
of pipes to smaller and narrower RAM modules.
Say, for example, 64x 16b LPDDR?...
16-bit ... wasn’t that the bus width of Rambus?
Was Rambus trying to do anything like this? Remember, Intel invested
heavily in it ... only for just about the entire rest of the industry
to bring out DDR.
Yes, pipelining RAM would seem an obvious answer to try to keep up
with faster and faster CPUs.
David Brown <david.brown@hesbynett.no> posted:
-----------------------
Of course, the challenge here is that programming is significantly
different from what we are used to, and you need a new type of OS as
well as new applications - and ideally, new programming languages.
That's a lot of big hurdles to clear even if the result is theoretically
more efficient. (And you still need to keep the fast single-threaded
cpus, and the fast SIMD / vector processing systems, for other kinds of
tasks.)
Possibly the biggest millstone around the neck of computing
architectures is the C language.
C is not an albatross !! it is a standard to which one designs--
exactly like air (our atmosphere) provides the standard to which
airplanes have to be designed.
C ended up being this model because its floor supports almost all other
programming languages: certainly {Fortran, C++, Algol68, Pascal, Jovial,
...} and is not all that bad when doing {LISP, RPG, COBOL, APL, SNOBOL}.
For every processor that is not just
for highly niche code (like gpus), what matters is how fast C code can
run on it. Most other languages either use a similar model, or run on
VMs written in C.
You state that like it was a BAD thing--it is not. I just wish we had
all chosen the same standards to which to design; {BE or LE} is like
having the steering wheel on the {left or right}, ...
Why bother with support for multiple stacks, or other
interesting hardware innovations, if it doesn't support faster C?
I can argue that having 2 stacks {one for the preserved state from
caller to callee, the other for data} enables ever so slightly faster
C--but that is not the point--the point is robustness in the face of
threats (buffer overruns, ROP, malicious use of memory).
The speed advantage comes from knowing that registers written to the
call stack do not need to be written to L2 (or farther) if RET has
occurred and the cache line is replaced. Saves a trifling of power, too.
There is another advantage when an EXIT instruction is still reading
from the stack and an ENTER instruction starts writing to the stack. We
taught the compiler a prescribed order to utilize the registers, so that
when an EXIT is running and an ENTER is decoded, the EXIT can be
short-circuited and some of the ENTER short-circuited, eliding cycle
waste.
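The robustness argument for two stacks can be illustrated with a toy model. It is purely a sketch: the frame layout, the `RET` value and both function names are invented for the example and say nothing about how My 66000 lays out its stacks.

```python
RET = 0x1000   # hypothetical legitimate return address

def call_unified(payload):
    # One stack: a 4-word local buffer sits directly below the saved
    # return address, as in a conventional C frame.
    stack = [0, 0, 0, 0, RET]
    for i, word in enumerate(payload):   # unchecked copy, like strcpy
        stack[i] = word
    return stack[4]                      # where the function "returns" to

def call_split(payload):
    # Two stacks: the buffer lives on the data stack next to other data,
    # the return address lives on a separate call stack.
    data_stack = [0, 0, 0, 0, 0]
    call_stack = [RET]
    for i, word in enumerate(payload):
        data_stack[i] = word
    return call_stack[-1]

overrun = [1, 2, 3, 4, 0xBAD]            # one word more than the buffer holds
print(hex(call_unified(overrun)))        # 0xbad
print(hex(call_split(overrun)))          # 0x1000
```

The overrun still corrupts neighbouring data in the split case; it just cannot redirect control flow.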
With
all due respect to Anton and other Forth enthusiasts, "fastest Forth
benchmarks" is not going to attract much investment money.
I'd love to see new architectures and new hardware features that are
genuinely different, but they rarely turn up.
My 66000 is replete with those features--and it is argued here daily
that it (my 66000 ISA) has gone too far !!
[Rbase+Rindex<<scale+DISP] is more than most would allow. Yet with universal Constants, a single memory reference can access anywhere in memory at any time.
Jump-Through-Table (switch) making PIC standard; while making the tables smaller {1/8th to 1/4th}
Load IP instructions (CALX, JMPX, CALA, JMPA) enable control transfers directly through GOT (or other SW table).
Multi-line multi-instruction ATOMIC sequences freely available to SW.
Transcendental instructions that take FDIV number of cycles.
Context Switches performed without instruction execution--as if the
state of a thread was treated like a write-back cache.
Interrupt tables that can be used as a low level scheduler built into
the priority (and privilege) model with support for vVMs monitoring vMs
One can schedule a DPC/softIRQ in 1 instruction that never fails
(excepting when the interrupt message takes an unrecoverable ECC failure
between core and table).
Even with C programming,
there are so many things that could be made more efficient with a bit of
interesting hardware. (I say this with little knowledge of the
complications implied.) A lot of time in C code is spent in memory
allocation work - that's got to be a prime candidate for hardware
acceleration, especially if we can get away from the brutish malloc/free
approach.
And then C++ goes all 'new' on using memory...
(Stack-based allocators are one possibility for a lot of
allocations.)
There could be hardware support for threading, locking,
and inter-process communication.
My 66000 can switch threads in a single instruction.
My 66000 ESM provides unrealized synchronization capabilities.
My 66000 Interrupt tables provide single instruction message send and
single instruction message receives.
SW determines what the messages mean.
Separate data stacks and return stacks
would make things faster and more secure.
Already present. But in addition, My 66000 DRAM controller is free from RowHammer-like attack vectors--By Architecture of the memory hierarchy.
Plus, the code sees a 64-bit Virtual Address Space, while the system has
four 64-bit physical address spaces. The spaces are used to determine
the consistency model:: {
Cacheable DRAM is causally ordered and coherent and cached
unCacheable DRAM is sequentially consistent incoherent
ROM is unordered incoherent but cached
MMI/O is sequentially consistent incoherent
Config is strongly ordered incoherent
}
Fat pointers that can track
access modes and range limits, at least in common cases, would aid
reliability and security.
A bit Too "Cheri" for me.
Possibly the biggest millstone around the neck of computing
architectures is the C language. For every processor that is not just
for highly niche code (like gpus), what matters is how fast C code can
run on it. Most other languages either use a similar model, or run on
VMs written in C. Why bother with support for multiple stacks, or other
interesting hardware innovations, if it doesn't support faster C? With
all due respect to Anton and other Forth enthusiasts, "fastest Forth
benchmarks" is not going to attract much investment money.
I'd love to see new architectures and new hardware features that are
genuinely different, but they rarely turn up.
Even with C programming,
there are so many things that could be made more efficient with a bit of
interesting hardware. (I say this with little knowledge of the
complications implied.) A lot of time in C code is spent in memory
allocation work - that's got to be a prime candidate for hardware
acceleration, especially if we can get away from the brutish malloc/free
approach. (Stack-based allocators are one possibility for a lot of
allocations.)
There could be hardware support for threading, locking,
and inter-process communication.
Separate data stacks and return stacks
would make things faster and more secure.
The early transputers were fast for their time, e.g., with the T414 at
up to 20MHz (and single-cycle instruction execution) in 1985, while
the 80386 was introduced in 1985 at 12.5MHz (with at least two cycles
per instruction), and the MIPS R2000 introduced in 1986 was available
at up to 15MHz.
Another architectural feature: One might think that tagging support
would help dynamically typed programming languages (e.g., Lisp), and
SPARC contains some support for that, but as one of the IIRC Franz Lisp
developers has explained in this newsgroup, they actually did not use
this feature, because the performance benefit was not big enough to
justify the complications of modifying their tagging architecture to
make use of that. However, in recent years AMD, ARM, and Intel have
added features to ignore the top (7, 8, or 16) bits in every address (how
many depends on the feature and the selected variant of the feature),
probably to support pointer tagging in such programming languages. I am
sure that no C need is behind this feature addition.
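What "ignoring the top bits" buys a language runtime can be sketched as follows. This is a simplified software model that assumes an 8-bit tag in the top of a 64-bit pointer; the real features differ in details such as which bits are ignored and how canonical addresses are handled, and all names below are made up.

```python
ADDR_BITS = 64
TAG_BITS = 8                               # assumption: the 8-bit variant
TAG_SHIFT = ADDR_BITS - TAG_BITS
ADDR_MASK = (1 << TAG_SHIFT) - 1

def tag_pointer(addr, tag):
    # Software stores a type tag in the bits the hardware will ignore.
    return (tag << TAG_SHIFT) | (addr & ADDR_MASK)

def effective_address(ptr):
    # What the memory system would use: the pointer minus its top bits.
    return ptr & ADDR_MASK

def pointer_tag(ptr):
    return ptr >> TAG_SHIFT

p = tag_pointer(0x7FFF12345678, 0x2A)
print(hex(effective_address(p)))   # 0x7fff12345678
print(hex(pointer_tag(p)))         # 0x2a
```

Without the hardware feature, every dereference needs the explicit masking step that `effective_address` performs.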
I'd love to see new architectures and new hardware features that are
genuinely different, but they rarely turn up.
I see lots of architectural features that are not or badly supported by
C, and so obviously are not designed for C.
Once upon a time Sun boasted architectural features to support
Java. AFAIK these were features to improve the indirect-branch
performance in the Java VM interpreter, to improve the startup
performance.
Admittedly, SIMD has existed for more than 50 years, so it's not
a new architectural feature, but the fact that it has been added
to many architectures after C became prominent is another
indication that architects do not restrain themselves to things
that C supports.
On Fri, 19 Jun 2026 06:02:16 GMT, Anton Ertl wrote:
Another architectural feature: One might think that tagging support
would help dynamically typed programming languages (e.g., Lisp), and
SPARC contains some support for that, but as one of the IIRC Franz Lisp developers has explained in this newsgroup, they actually did not use
this feature, because the performance benefit was not big enough to
justify the complications of modifying their tagging architecture to
make use of that. However, in recent years AMD, ARM, and Intel have
added features to ignore the top (7,8, or 16) bits in every address (how many depends on the feature and the selected variant of the feature), probably to support pointer tagging in such programming languages. I am sure that no C need is behind this feature addition.
The architectural support for tagging in SPARC only avoided the need to
untag and tag integers in compiled code.
David Ungar's thesis on SOAR provided measurements of the impact of this
on benchmarks for Smalltalk.
The layout of having the tags in the bottom 2 bits of a 32 bit word works fine without architectural support, being able to turn on traps for unaligned data access helps though.
In 64 bit machines you can use the bottom three bits for Lisp tags but SPARC64 didn't provide instructions to work with this.
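To illustrate the low-bit layout described above, here is a minimal C sketch. It assumes a hypothetical scheme in which tag 00 marks a fixnum in a 32-bit word; the names (lispword, make_fixnum, TAG_CONS and so on) are invented for the example and not taken from any particular Lisp. Because the fixnum tag is zero, two tagged integers can be added directly, which is the untag/retag work that the SPARC tagged-arithmetic support was meant to save.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 2 bits of a 32-bit word hold the type tag. */
enum { TAG_BITS = 2, TAG_MASK = 3, TAG_FIXNUM = 0, TAG_CONS = 1 };

typedef uint32_t lispword;

static int is_fixnum(lispword w) { return (w & TAG_MASK) == TAG_FIXNUM; }

/* Box: shift the integer left so the tag bits become 00
   (only 30 bits of integer range remain). */
static lispword make_fixnum(int32_t n) { return (lispword)n << TAG_BITS; }

/* Unbox: a fixnum is a multiple of 4, so the division is exact.
   Assumes the usual two's-complement conversion to int32_t. */
static int32_t fixnum_value(lispword w) {
    assert(is_fixnum(w));
    return (int32_t)w / (1 << TAG_BITS);
}

/* Tagged add: no untag/retag needed because TAG_FIXNUM is 0. */
static lispword fixnum_add(lispword a, lispword b) {
    assert(is_fixnum(a) && is_fixnum(b));
    return a + b;
}
```

In such a layout a cons pointer tagged 01 would typically be dereferenced at offset -1, so with alignment traps enabled, a load through a word that is not really a cons lands on a misaligned address and faults. That is presumably the kind of help the unaligned-access traps provide.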
Franz Lisp doesn't use tags at all and only ran on VAX and 68k.
In previous discussions, I had tried to press Mitch to see if he could remember what kind of benchmarks they had run on the 88100 that showed
it running Lisp faster than SPARC.
To me, the old SPEC li benchmark was a test of the speed of an interpreter written in C and doesn't say anything useful about how well a system would run Lisp that had been compiled to machine code.
There were well known (non SPEC) Lisp benchmarks at the time.
I'd love to see new architectures and new hardware features that are
genuinely different, but they rarely turn up.
I see lots of architectural features that are not or badly supported by
C, and so obviously are not designed for C.
The one architectural feature of Lisp Machines that I don't think has been carried forward was a multi-way switch instruction.
The rest of the MIT Lisp Machine microarchitecture was just a pipelined, three address, load/store one that provides another data point for the discussion from a few months ago on whether VAX could have been a RISC
using TTL chips.
MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
-----------------------
Of course, the challenge here is that programming is significantly
different from what we are used to, and you need a new type of OS
as well as new applications - and ideally, new programming
languages. That's a lot of big hurdles to clear even if the result
is theoretically more efficient. (And you still need to keep the
fast single-threaded cpus, and the fast SIMD / vector processing
systems, for other kinds of tasks.)
Possibly the biggest millstone around the neck of computing
architectures is the C language.
C is not an albatross !! it is a standard to which one designs--
exactly like air (our atmosphere) provides the standard to which
airplanes have to be designed.
C ended up being this model because its floor supports almost all
other programming languages: certainly {Fortran, C++, Algol68,
Pascal, Jovial, ...} and is not all that bad when doing {LISP, RPG,
COBOL, APL, SNOBOL}.
For every processor that is not
just for highly niche code (like gpus), what matters is how fast C
code can run on it. Most other languages either use a similar
model, or run on VMs written in C.
You state that like it was a BAD thing--it is not. I just wish we had
all chosen the same standards to which to design: {BE or LE} is like
having the steering wheel on the {left or right}, ...
Why bother with support for multiple stacks,
or other interesting hardware innovations, if it doesn't support
faster C?
I can argue that having 2 stacks {one for the preserved state from
caller to callee, the other for data} enables ever so slightly
faster C--but that is not the point--the point is robustness in the
face of threats (buffer overruns, ROP, malicious use of memory).
The speed advantage comes from knowing that registers written to the
call stack do not need to be written to L2 (or farther) if RET has
occurred and the cache line is replaced. Saves a trifling of power,
too. There is another advantage when an EXIT instruction is
still reading from the stack and an ENTER instruction starts writing to
the stack. We taught the compiler a prescribed order to utilize the
registers, so that when an EXIT is running and an ENTER is decoded,
the EXIT can be short circuited and some of the ENTER short
circuited, eliding cycle waste.
With all due respect to Anton and other Forth enthusiasts, "fastest
Forth benchmarks" is not going to attract much investment money.
I'd love to see new architectures and new hardware features that
are genuinely different, but they rarely turn up.
My 66000 is replete with those features--and it is argued here daily
that it (my 66000 ISA) has gone too far !!
[Rbase+Rindex<<scale+DISP] is more than most would allow. Yet with universal Constants, a single memory reference can access anywhere
in memory at any time.
Nice to have.
Jump-Through-Table (switch) making PIC standard; while making the
tables smaller {1/8th to 1/4th}
I used this idea, in its extreme form when my 486 Word Count code had
the state variable in BL and loaded the next byte into BH: At this
point I could jump directly to the code BX was pointing to, so a
256*number of main states (=2, inside or outside a word) => a
512-entry jump table.
When the Pentium turned up a few years later, branching got
relatively even costlier, so I got rid of every branch inside the
256-byte main processing loop.
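The combined state-plus-byte index described above can be sketched in portable C. This is not the original 486 code: it uses a data table rather than a jump table, and the names (step, build_step_table, count_words) and the whitespace set are assumptions made for the example. What it does share is the 2 x 256 = 512-entry table indexed by (state, next byte).

```c
#include <assert.h>
#include <stddef.h>

enum { OUTSIDE = 0, INSIDE = 1 };

/* 2 states x 256 byte values = 512 entries.
   Bit 0 of an entry is the next state, bit 1 is set when a word starts. */
static unsigned char step[2][256];
static int step_ready = 0;

static void build_step_table(void) {
    for (int state = 0; state < 2; state++) {
        for (int c = 0; c < 256; c++) {
            int space = (c == ' ' || c == '\t' || c == '\n' ||
                         c == '\r' || c == '\f' || c == '\v');
            int next = space ? OUTSIDE : INSIDE;
            int starts_word = (state == OUTSIDE && next == INSIDE);
            step[state][c] = (unsigned char)(next | (starts_word << 1));
        }
    }
    step_ready = 1;
}

/* One table lookup per input byte; the loop body itself has no
   data-dependent branch. */
static size_t count_words(const unsigned char *buf, size_t len) {
    size_t words = 0;
    unsigned state = OUTSIDE;
    assert(buf != NULL || len == 0);
    if (!step_ready) build_step_table();
    for (size_t i = 0; i < len; i++) {
        unsigned entry = step[state][buf[i]];
        words += entry >> 1;   /* add 1 only on an outside-to-inside step */
        state = entry & 1u;
    }
    return words;
}
```

Calling count_words on the 17 bytes of "hello  big world\n" counts three words. Whether a compiler keeps the loop branch-free is up to the compiler, of course.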
Load IP instructions (CALX, JMPX, CALA, JMPA) enable control
transfers directly through GOT (or other SW table).
Also nice to have.
Multi-line multi-instruction ATOMIC sequences freely available to
SW.
Transcendental instructions that take FDIV number of cycles.
:-)
Context Switches performed without instruction execution--as if the
state of a thread was treated like a write-back cache.
Interrupt tables that can be used as a low level scheduler built
into the priority (and privilege) model with support for vVMs
monitoring vMs. One can schedule a DPC/softIRQ in 1 instruction that
never fails (excepting when the interrupt message takes an
unrecoverable ECC failure between core and table.)
Even with C
programming, there are so many things that could be made more
efficient with a bit of interesting hardware. (I say this with
little knowledge of the complications implied.) A lot of time in
C code is spent in memory allocation work - that's got to be a
prime candidate for hardware acceleration, especially if we can
get away from the brutish malloc/free approach.
And then C++ goes all 'new' on using memory...
(Stack-based allocators are one possibility for a lot
of allocations.)
There could be hardware support for threading,
locking, and inter-process communication.
My 66000 can switch threads in a single instruction.
My 66000 ESM provides unrealized synchronization capabilities.
My 66000 Interrupt tables provide single instruction message send
and single instruction message receives.
SW determines what the messages mean.
Separate data stacks and return
stacks would make things faster and more secure.
Already present. But in addition, My 66000 DRAM controller is free
from RowHammer-like attack vectors--By Architecture of the memory hierarchy.
Plus, the code sees a 64-bit Virtual Address Space, while the
system has four 64-bit physical address spaces. The spaces are used
to determine the consistency model:: {
Cacheable DRAM is causally ordered and coherent and cached
unCacheable DRAM is sequentially consistent incoherent
ROM is unordered incoherent but cached
MMI/O is sequentially consistent incoherent
Config is strongly ordered incoherent
}
Fat pointers that can
track access modes and range limits, at least in common cases,
would aid reliability and security.
A bit Too "Cheri" for me.
:-)
The Mill is probably the closest to Cheri that is still in active development.
Terje
On Fri, 19 Jun 2026 16:16:05 +0200
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
Fat pointers that can
track access modes and range limits, at least in common cases,
would aid reliability and security.
A bit Too "Cheri" for me.
:-)
The Mill is probably the closest to Cheri that is still in active
development.
Terje
Did Ivan not say that he likes capabilities, but decided that the Mill
already has too many innovative concepts as it is, and that including
capabilities would be too much?
Possibly the biggest millstone around the neck of computing
architectures is the C language. ...
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
According to David Brown <david.brown@hesbynett.no>:
Possibly the biggest millstone around the neck of computing
architectures is the C language. ...
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
C killed off every memory model other than flat byte addressed memory. Pointers are sort of typed, but any real C program does stuff like this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
or worse
    typedef struct { // string with explicit length
        int len;
        char str[0];
    } varstr;

    varstr *p;
    char *s = "swordfish";

    // initialize p from s
    p = (varstr *)malloc(sizeof(varstr)+strlen(s));
    p->len = strlen(s);
    strncpy(p->str, s, p->len);
so in practice pointers all have to be pointers to bytes or something
that can losslessly be converted to and from them.
This evolution was certainly helped along by the horrible implementation
of segmented memory in the Intel 8086 and 286, which persuaded people
that segments are a plague to be avoided rather than a tool to make
programs more reliable.
The Mill is probably the closest to Cheri that is still in active development.
On Fri, 19 Jun 2026 16:16:05 +0200
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
MitchAlsup wrote:
Transcendental instructions that take FDIV number of cycles.
:-)
"FDIV number of cycles" is a moving target. Mitch has a tendency of
using "Opteron" as his measurement stick. The question of what is
"number of cycles" is also not obvious. Single or double precision?
Latency or throughput?
Apple has had single-cycle FDIV throughput since ~2019. That applies to
both the scalar and the 128-bit SIMD variants of the instruction.
So, for the single-precision vector variant the throughput is 4 FDIV per
clock.
Intel has had 4-cycle SP FDIV throughput (or 5-cycle according to other
sources) for 256-bit vectors since 2015. That's 2 SP FDIV per clock.
AMD started with 6-cycle 256-bit SP FDIV on Zen1.
It progressed to 3-3.5 cycles on Zen 2/3/4. Then on Zen5 they
progressed to 3 cycles per 512-bit vector register. So, by now they are
at 5.33 SP FDIV per clock - ahead of the Apple of 6 years ago.
I don't know where Apple stands right now.
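The per-clock figures quoted above are just lanes per vector divided by cycles per vector operation. A small C helper makes the arithmetic explicit; the function name fdiv_per_clock is made up for this sketch, and the inputs are the numbers from this post, not independent measurements.

```c
#include <assert.h>

/* Divides completed per clock = lanes per vector / cycles per vector op. */
static double fdiv_per_clock(int vector_bits, int element_bits,
                             double cycles_per_op) {
    assert(element_bits > 0 && vector_bits >= element_bits);
    assert(cycles_per_op > 0.0);
    int lanes = vector_bits / element_bits;
    return lanes / cycles_per_op;
}
```

With 32-bit elements, 128 bits at 1 cycle gives 4 per clock, 256 bits at 4 cycles gives 2 per clock, and 512 bits at 3 cycles gives 16/3, or about 5.33 per clock.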
Somehow, I suspect that when Mitch says that his transcendental
instructions "take FDIV number of cycles" he does not mean that he can
run 5.33 transcendental instructions per clock.
Against DP rather than SP and latency rather than throughput, Mitch's
claim is probably closer to reality. But still...
Apple of 6 years ago had latency of DP FDIV = 10.
AMD has worst case latency = 13 since Zen1 (best latency used to be
faster, but on newer chips worst and best are the same).
Intel has worst case DP FDIV latency = 14 since Ivy Bridge (2012-04).
That's probably close to the date when Mitch started to consider His
66000.
Fat pointers that can
track access modes and range limits, at least in common cases,
would aid reliability and security.
A bit Too "Cheri" for me.
:-)
The Mill is probably the closest to Cheri that is still in active development.
Terje
Did Ivan not say that he likes capabilities, but decided that the Mill
already has too many innovative concepts as it is, and that including
capabilities would be too much?
Also, claiming that Mill is still in active development sounds to me like
a stretch of the word "active".
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
This is not necessarily a bad thing - it lets everyone get stuff done.
But it means that we are stuck on a local maximum. If there is a better
way out there somewhere, it would be a long and arduous journey to get there.
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
On Thu, 18 Jun 2026 14:39:07 GMT, scott@slp53.sl.home (Scott Lurndal)
wrote:
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
Back when logic and memory were more evenly matched, computers still
had one register - the accumulator. And instructions basically did
arithmetic between memory and the accumulator. Of course,
memory-to-memory operations were also possible without even an
accumulator.
But since memory isn't likely to get that fast, it's not really useful
to think of how to design for something that can't happen.
On 6/19/2026 11:59 AM, John Levine wrote:
According to David Brown <david.brown@hesbynett.no>:
Possibly the biggest millstone around the neck of computing
architectures is the C language. ...
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
C killed off every memory model other than flat byte addressed memory.
Pointers are sort of typed, but any real C program does stuff like this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
Fwiw, why all of the casts?
On Fri, 19 Jun 2026 09:24:43 +0200, David Brown
<david.brown@hesbynett.no> wrote:
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
This is not necessarily a bad thing - it lets everyone get stuff done.
But it means that we are stuck on a local maximum. If there is a better
way out there somewhere, it would be a long and arduous journey to get
there.
I might well be willing to concede that C does have its flaws. But
these are flaws it has _as a programming language_, and not flaws that
have affected the design of computers. Why do I say this?
Because while C, as a procedural language, took a lot from its
predecessors - its punctuation came from PL/I - it was designed to not
be too different from the underlying hardware. It had pointers to
memory, which most programming languages up until then did not bother
with, in order to be able to substitute for assembler language.
If, instead, a "better" language, like ALGOL or Pascal, had become the
"standard", we might have ended up with computers like the Burroughs
B6500 or the Intel 432.
That would indeed have been a situation where computers were less
efficient and powerful because they were designed around the
peculiarities of the most-used higher-level language.
John Savard
quadibloc@invalid.com (John Savard) writes:
On Thu, 18 Jun 2026 14:39:07 GMT, scott@slp53.sl.home (Scott Lurndal)
wrote:
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
Back when logic and memory were more evenly matched, computers still
had one register - the accumulator. And instructions basically did
arithmetic between memory and the accumulator. Of course,
memory-to-memory operations were also possible without even an
accumulator.
And some computers in those days simply used memory to memory operations exclusively without needing an accumulator.
Memory-to-memory superscalar/OoO requires a ROB that works with addresses
rather than registers (perhaps CAM based); the size of the ROB is still
limited to the degree of OoO.
That noted, it seems to me that if access to all of memory costs
the same as access to a register, the need for OoO support in
the core becomes less interesting when the normal delays for which
instruction-level parallelism helps don't apply
(e.g. cache misses, NUMA latency, etc).
But since memory isn't likely to get that fast, it's not really useful
to think of how to design for something that can't happen.
Never say never.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 6/19/2026 11:59 AM, John Levine wrote:
According to David Brown <david.brown@hesbynett.no>:
Possibly the biggest millstone around the neck of computing
architectures is the C language. ...
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
C killed off every memory model other than flat byte addressed memory.
Pointers are sort of typed, but any real C program does stuff like this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
Fwiw, why all of the casts?
C and C++ handle void* conversions differently. You must cast
the malloc result to a pointer of the declared type when using C++.
It doesn't hurt to add the cast in C, and may help with documenting
the intention of the programmer who wrote the code.
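For comparison, a cast-free C version might look like the sketch below. The struct layout and the helper name alloc_foos are placeholders for the example. In C the void * returned by malloc converts implicitly to any object pointer type; the same line would be rejected by a C++ compiler without a cast.

```c
#include <assert.h>
#include <stdlib.h>

struct foo { int id; double weight; };   /* placeholder layout */

/* C only: the void * from malloc converts implicitly to struct foo *.
   Using sizeof *p keeps the element size tied to the pointer's type. */
static struct foo *alloc_foos(size_t n) {
    struct foo *p = malloc(n * sizeof *p);
    return p;   /* NULL on failure; the caller must check and later free() */
}
```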
quadibloc@invalid.com (John Savard) posted:
On Fri, 19 Jun 2026 09:24:43 +0200, David Brown
<david.brown@hesbynett.no> wrote:
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
This is not necessarily a bad thing - it lets everyone get stuff done.
But it means that we are stuck on a local maximum. If there is a better
way out there somewhere, it would be a long and arduous journey to get
there.
I might well be willing to concede that C does have its flaws. But
these are flaws it has _as a programming language_, and not flaws that
have affected the design of computers. Why do I say this?
Because while C, as a procedural language, took a lot from its
predecessors - its punctuation came from PL/I - it was designed to not
be too different from the underlying hardware.
PL/I's most useful memory trick was using an area. So, one could 'malloc'
a bunch of data, and then free it all with one free! Nothing prevents C
from doing this, but C++ has new and new is not compatible with area.
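A rough C equivalent of the area idea is a bump ("arena") allocator: carve many objects out of one block and release them all with a single free. The sketch below is a minimal illustration, not PL/I semantics; the names (arena, arena_alloc, arena_release) and the fixed 16-byte alignment are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { ARENA_ALIGN = 16 };   /* assumed to be enough for any basic type */

typedef struct {
    unsigned char *base;     /* one malloc'ed block */
    size_t size, used;
} arena;

static int arena_init(arena *a, size_t size) {
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bump allocation: round the offset up, hand out n bytes, never free singly. */
static void *arena_alloc(arena *a, size_t n) {
    size_t start = (a->used + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (start > a->size || n > a->size - start)
        return NULL;         /* arena exhausted */
    a->used = start + n;
    return a->base + start;
}

/* "Free it all with one free". */
static void arena_release(arena *a) {
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

Every pointer handed out by arena_alloc becomes invalid at arena_release, which is the price of the single free.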
If, instead, a "better" language, like ALGOL
Algol was ruined with its parameter passing in 'thunks' and strict
1-file compilation.
or Pascal, had become the "standard", we might have ended up with
computers like the Burroughs B6500 or the Intel 432.
That is one bullet we dodged!!
"When your Register file is as big as your cache,
register access will be as slow as your cache" Andy Glew.
In article <2026Jun19.080216@mips.complang.tuwien.ac.at>,
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
once upon a time Sun boasted architectural features to support
Java. AFAIK these were features to improve the indirect-branch
performance in the Java VM interpreter, to improve the startup
performance.
Did this become obsolete when Java runtime environments switched to
JITing to native code?
They don't, but they don't do a good job of making those features usable
either. Support for new instructions is readily provided via intrinsics,
but those aren't portable.
Robert Swindells <rjs@fdy2.co.uk> posted:
In previous discussions, I had tried to press Mitch to see if he could
remember what kind of benchmarks they had run on the 88100 that showed
it running Lisp faster than SPARC.
M88K shift instructions could perform extracts, whereas SPARC had to use
2 shifts to perform an extract; indexing was scaled:: both helped interpreters.
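The difference can be shown in C. This is only a model of the two operation shapes, not actual 88100 or SPARC code, and the function names are invented for the sketch. The two-shift form is what a machine without a field-extract instruction has to issue; the second function computes what a single extract instruction would deliver in one operation.

```c
#include <assert.h>
#include <stdint.h>

/* Two-shift form: shift left to discard the bits above the field,
   then shift right to discard the bits below it. */
static uint32_t extract_two_shifts(uint32_t x, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 32);
    return (x << (32 - offset - width)) >> (32 - width);
}

/* What one extract instruction computes: 'width' bits starting at 'offset'. */
static uint32_t extract_field(uint32_t x, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 32);
    uint32_t mask = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    return (x >> offset) & mask;
}
```

Pulling a tag or an opcode field out of a word is a typical use of such an extract in an interpreter's inner loop.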
On 2026-06-19 18:16, Michael S wrote:
The most Cheri-like feature of the Mill is probably the hardware byte granularity security, with the user ability to create subsets in size
On Fri, 19 Jun 2026 16:16:05 +0200
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
[snip]
Fat pointers that can
track access modes and range limits, at least in common cases,
would aid reliability and security.
A bit Too "Cheri" for me.
:-)
The Mill is probably the closest to Cheri that is still in active
development.
Terje
Did Ivan not say that he likes capabilities, but decided that the Mill
already has too many innovative concepts as it is, and that including
capabilities would be too much?
As I recall, Ivan used to say that he knew how to /build/ a capability
machine, but did not know how to /sell/ it. I believe he meant that such
a machine would not run "normal" C/C++ code, at least not very well,
which would turn away many potential customers.
On 6/19/2026 7:16 AM, Terje Mathisen wrote:
[...]
The Mill is probably the closest to Cheri that is still in active
development.
How close are you guys to making a Mill processor?
The architectural support for tagging in SPARC only avoided the need to
untag and tag integers in compiled code.
David Ungar's thesis on SOAR provided measurements of the impact of this
on benchmarks for Smalltalk.
The layout of having the tags in the bottom 2 bits of a 32 bit word works
fine without architectural support, being able to turn on traps for
unaligned data access helps though.
Franz Lisp doesn't use tags at all and only ran on VAX and 68k.
To me, the old SPEC li benchmark was a test of the speed of an interpreter
written in C and doesn't say anything useful about how well a system would
run Lisp that had been compiled to machine code.
The one architectural feature of Lisp Machines that I don't think has been
carried forward was a multi-way switch instruction.
The rest of the MIT Lisp Machine microarchitecture was just a pipelined,
three address, load/store one that provides another data point for the
discussion from a few months ago on whether VAX could have been a RISC
using TTL chips.
On 19/06/2026 00:37, MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
-----------------------
Fat pointers that can track
access modes and range limits, at least in common cases, would aid
reliability and security.
A bit Too "Cheri" for me.
I think "Cheri" took it too far. I believe there is scope for tagging a bit of information onto pointers without trying to do everything.
I also think a lot can be done on the side of programming languages and tools, which could catch far more possible pointer mistakes. That won't stop the bad guys, of course, but I think more bad accesses are from
bugs than hackers.
- [C] was designed to not
be too different from the underlying hardware.
The underlying hardware *of that time*. Therefore, it may have
contributed to "locking in" that style of hardware. But I do not pretend
to know that a different style of hardware would be better today.
Algol was ruined with its parameter passing in 'thunks' and strict
1-file compilation.
The 1-file compilation was an implementation issue. For example,
Burroughs Algol supported (and no doubt still supports) separate
compilation of subprograms. (IIRC, even the paper-tape-based HP Algol
for the HP2100 series did that.) Algol can do pass-by-value, and the
alternative pass-by-name method could have been deprecated and removed
as the language evolved, or reduced to pass-by-reference, if it was
judged to be an obstruction.
or Pascal, had become the "standard", we might have ended up with
computers like the Burroughs B6500 or the Intel 432.
That is one bullet we dodged!!
On Fri, 19 Jun 2026 14:34:10 GMT, MitchAlsup wrote:
Robert Swindells <rjs@fdy2.co.uk> posted:
In previous discussions, I had tried to press Mitch to see if he could
remember what kind of benchmarks they had run on the 88100 that showed
it running Lisp faster than SPARC.
M88K shift instructions could perform extracts, whereas SPARC had to use
2 shifts to perform an extract; indexing was scaled:: both helped
interpreters.
Production Lisp environments are not interpreters, even back then.
jgd@cix.co.uk (John Dallman) writes:
In article <2026Jun19.080216@mips.complang.tuwien.ac.at>,
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
once upon a time Sun boasted architectural features to support
Java. AFAIK these were features to improve the indirect-branch
performance in the Java VM interpreter, to improve the startup
performance.
Did this become obsolete when Java runtime environments switched to
JITing to native code?
I don't remember in which SPARC this feature was added, but IIRC it
was after the introduction of the HotSpot Java VM implementation.
Note that HotSpot uses an interpreter on startup and on the cold
methods, and only compiles hot methods to native code after executing
them for a while and collecting execution statistics.
They don't, but they don't do a good job of making those features usable
either. Support for new instructions is readily provided via intrinsics,
but those aren't portable.
Yes. But with a programming language like C, what is the alternative?
GNU C supports a vector extension; I don't know how fast Intel added
support for AVX, AVX2, and the various AVX-512 variants to gcc and
llvm (which also supports this extension); certainly recent gcc and
clang versions use AVX-512 if you press the right buttons.
Fortran supports the array sublanguage which AFAIU makes vectorization
easy within an expression. But as Thomas Koenig tells us, gcc's
Fortran front end translates that into scalar code and relies on auto-vectorization in the back end to produce vectorized code.
Intel's Fortran compiler ifort has been replaced by something that
uses IIRC LLVM as back end.
And finally we have auto-vectorization, where it is a matter of luck
whether the scalar code is vectorized or not (e.g., I have code that
one compiler auto-vectorizes with -Os, but not -O3, and another
compiler that autovectorizes with -O3, but not -Os).
On Fri, 19 Jun 2026 14:34:10 GMT, MitchAlsup wrote:
Robert Swindells <rjs@fdy2.co.uk> posted:
In previous discussions, I had tried to press Mitch to see if he could
remember what kind of benchmarks they had run on the 88100 that showed
it running Lisp faster than SPARC.
M88K shift instructions could perform extracts, whereas SPARC had to use
2 shifts to perform an extract; indexing was scaled:: both helped
interpreters.
Production Lisp environments are not interpreters, even back then.
PL/I's most useful memory trick was using an area. So, one could 'malloc'
a bunch of data, and then free it all with one free!
Nothing prevents C from doing this, but C++ has new and new is not
compatible with area.
On 6/19/2026 2:24 AM, David Brown wrote:
On 19/06/2026 00:37, MitchAlsup wrote:
A bit Too "Cheri" for me.
I think "Cheri" took it too far. I believe there is scope for tagging a bit of information onto pointers without trying to do everything.
I also think a lot can be done on the side of programming languages and tools, which could catch far more possible pointer mistakes. That won't stop the bad guys, of course, but I think more bad accesses are from
bugs than hackers.
Agreed, this is a route I experimented with.
A basic bounds-checking mechanism can help with debugging and security.
One option here is, say, using pointer tagging bits to encode
bounds-check information and then have the compiler emit instructions to detect (roughly) when an access has gone out-of-bounds.
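One way such a tag-bit scheme could look is sketched below in C. This is a hypothetical encoding chosen for illustration, not the mechanism actually used here: it assumes only the low 56 bits of a pointer are address bits, that the top 8 bits are free for a tag, and that objects live in power-of-two-sized, size-aligned blocks whose log2 size is stored in the tag. Pointers are modelled as uint64_t values so the sketch does not depend on the host's address layout.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56                                  /* assumed address width */
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Attach the block's log2 size to an address inside that block. */
static uint64_t tag_pointer(uint64_t addr, unsigned log2_size) {
    assert(log2_size < TAG_SHIFT);
    assert((addr & ADDR_MASK) == addr);
    return addr | ((uint64_t)log2_size << TAG_SHIFT);
}

/* The check a compiler might emit around pointer arithmetic: the old and
   new addresses must agree in every bit above the block size, otherwise
   the result has left the block. Returns 1 and stores the new pointer on
   success, 0 on an out-of-bounds result. */
static int ptr_add_checked(uint64_t p, int64_t delta, uint64_t *out) {
    unsigned log2_size = (unsigned)(p >> TAG_SHIFT);
    uint64_t old_addr = p & ADDR_MASK;
    uint64_t new_addr = (old_addr + (uint64_t)delta) & ADDR_MASK;
    assert(log2_size < TAG_SHIFT);
    if (((old_addr ^ new_addr) >> log2_size) != 0)
        return 0;
    *out = new_addr | (p & ~ADDR_MASK);
    return 1;
}
```

The check is only approximate, since bounds are rounded up to the enclosing power-of-two block, which matches the "roughly" above. It also costs address bits, so it does not apply unchanged to a machine that uses all 64 bits of virtual address.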
Extending it to the scope CHERI did adds new problems:
Adds significant implementation overhead;
Interferes with C programming practices;
And with a glaring weakness:
By its design, it is theoretically incapable of by itself forming a
sandbox capable of stopping actively hostile code.
It *could* still make it a PITA for human programmers to break out of,
but if a determined human programmer (or an AI assisted one) could put
in the work and break out of it via convoluted pointer de-referencing
(and if this break-ability is likely necessary for things like the C
runtime to be able to work), this is a weak point.
And, if it can't lock down security against actively hostile code, then
its more heavy-handed aspects are no longer justifiable.
Meanwhile, if the task is subdivided, some similar benefits can be
realized more cheaply:
Bounds checked pointers to trap on out-of-bounds access;
ASLR to make it much harder for shell-code to know where anything is;
Tagging to make it harder to stomp the link register;
Any attack attempt will need to know an N-bit magic number.
On 6/20/2026 9:25 AM, Robert Swindells wrote:
On Fri, 19 Jun 2026 14:34:10 GMT, MitchAlsup wrote:
Robert Swindells <rjs@fdy2.co.uk> posted:
In previous discussions, I had tried to press Mitch to see if he could
remember what kind of benchmarks they had run on the 88100 that showed
it running Lisp faster than SPARC.
M88K shift instructions could perform extracts, whereas SPARC had to use
2 shifts to perform an extract; indexing was scaled:: both helped
interpreters.
Production Lisp environments are not interpreters, even back then.
Lisp is a funny language:
Big promises in the design;
But, only deliver them poorly (and can't improve on the delivery of any given thing without eroding the original promises).
Simplicity and elegance of an interpreter,
But only if already operating within a Lisp environment...
Clean and elegant syntax,
That actually sucks real hard to use in practice.
Performance, but only if compiled to something else...
A naive Lisp interpreter being almost the slowest style of interpreter...
On Fri, 19 Jun 2026 06:02:16 GMT, Anton Ertl wrote:
[...]
Another architectural feature: One might think that tagging support
would help dynamically typed programming languages (e.g., Lisp), and
SPARC contains some support for that, but as one of the IIRC Franz Lisp
developers has explained in this newsgroup, they actually did not use
this feature, because the performance benefit was not big enough to
Franz Lisp doesn't use tags at all and only ran on VAX and 68k.
BGB <cr88192@gmail.com> posted:
On 6/20/2026 9:25 AM, Robert Swindells wrote:
On Fri, 19 Jun 2026 14:34:10 GMT, MitchAlsup wrote:
Robert Swindells <rjs@fdy2.co.uk> posted:
In previous discussions, I had tried to press Mitch to see if he could
remember what kind of benchmarks they had run on the 88100 that showed
it running Lisp faster than SPARC.
M88K shift instructions could perform extracts, whereas SPARC had to use
2 shifts to perform an extract; indexing was scaled:: both helped
interpreters.
Production Lisp environments are not interpreters, even back then.
Lisp is a funny language:
Big promises in the design;
But, only deliver them poorly (and can't improve on the delivery of any
given thing without eroding the original promises).
Simplicity and elegance of an interpreter,
But only if already operating within a Lisp environment...
Clean and elegant syntax,
That actually sucks real hard to use in practice.
Performance, but only if compiled to something else...
A naive Lisp interpreter being almost the slowest style of interpreter...
Given that one can create a data structure in LISP, and then execute it;
how would you do this without an interpreter or a JIT ??
BGB <cr88192@gmail.com> posted:
On 6/19/2026 2:24 AM, David Brown wrote:--------------------------
On 19/06/2026 00:37, MitchAlsup wrote:
A bit Too "Cheri" for me.
I think "Cheri" took it too far. I believe there is scope for tagging a bit of information onto pointers without trying to do everything.
I also think a lot can be done on the side of programming languages and
tools, which could catch far more possible pointer mistakes. That won't stop the bad guys, of course, but I think more bad accesses are from
bugs than hackers.
Agreed, this is a route I experimented with.
A basic bounds-checking mechanism can help with debugging and security.
One option here is, say, using pointer tagging bits to encode
bounds-check information and then have the compiler emit instructions to
detect (roughly) when an access has gone out-of-bounds.
How do you do this with 64-bit registers and a 64-bit virtual address space ??!!
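One way such a scheme is sometimes sketched, purely as an illustration: if only the low 48 bits of a pointer carry address (an assumption that does not hold for a full 64-bit virtual address space, which is the objection above), the upper 16 bits are free to hold a size tag. The names `tag_ptr` and `ptr_add_checked`, the 48-bit split, and the power-of-two size classes are all hypothetical and not taken from any ISA discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BITS 48                                   /* assumed usable address bits */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack an address with log2 of its (power-of-two, size-aligned) object size. */
static uint64_t tag_ptr(uint64_t addr, unsigned log2_size) {
    assert(log2_size < ADDR_BITS);
    return (addr & ADDR_MASK) | ((uint64_t)log2_size << ADDR_BITS);
}

static uint64_t ptr_addr(uint64_t tagged)      { return tagged & ADDR_MASK; }
static unsigned ptr_log2_size(uint64_t tagged) { return (unsigned)(tagged >> ADDR_BITS); }

/* Pointer arithmetic with a rough bounds check: the result is accepted only
   if it stays inside the same aligned 2^k block as the starting pointer. */
static bool ptr_add_checked(uint64_t tagged, int64_t offset, uint64_t *out) {
    unsigned k = ptr_log2_size(tagged);
    uint64_t base = ptr_addr(tagged);
    uint64_t next = (base + (uint64_t)offset) & ADDR_MASK;
    if (k >= ADDR_BITS || ((base ^ next) >> k) != 0)
        return false;                                  /* a real design would trap here */
    *out = tag_ptr(next, k);
    return true;
}
```

The check is only "rough" in the sense used above: it catches accesses that leave the padded power-of-two block, not accesses past the exact end of a smaller object inside it.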
Extending it to the scope CHERI did adds new problems:
Adds significant implementation overhead;
Interferes with C programming practices;
Which means it is not going to fly.....
...
And with a glaring weakness:
By its design, it is theoretically incapable of by itself forming a
sandbox capable of stopping actively hostile code.
It *could* still make it a PITA for human programmers to break out of,
but if a determined human programmer (or an AI assisted one) could put
in the work and break out of it via convoluted pointer de-referencing
(and if this break-ability is likely necessary for things like the C
runtime to be able to work), this is a weak point.
And, if it can't lock down security against actively hostile code, then
its more heavy-handed aspects are no longer justifiable.
Meanwhile, if the task is subdivided, some similar benefits can be
realized more cheaply:
Bounds checked pointers to trap on out-of-bounds access;
ASLR to make it much harder for shell-code to know where anything is;
With pure PIC coding practices, and the link-pointer being stored directly onto the call stack (not the data stack), one has no particular reason to know where they currently are (IP) and few ways of seeing where they are.
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the application has no access permissions.
Any attack attempt will need to know an N-bit magic number.
No need for a number, it cannot be accessed outside of calling and returning. {{Which is not being done with a series of instructions--but by 1 designed for the task at hand}}
Instead, C presents a programming model way down at the vonNeumann level::
1 unit of work (step) at a time.
or Pascal, had become the "standard", we might have ended up with
computers like the Burroughs B6500 or the Intel 432.
That is one bullet we dodged!!
I doubt it. Several parallel strands of RISC research independently found that moving complexity from the hardware into the compiler made computers faster and cheaper. IBM's PL.8 compiler had excellent error checking even though it was originally targeted at the RISC 801, but somehow people always want to turn off the error checks in the production build of their code.
C killed off every memory model other than flat byte addressed memory.
Pointers are sort of typed, but any real C program does stuff like this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
typedef struct { // string with explicit length
    int len;
    char str[0];
} varstr;
varstr *p;
char *s = "swordfish";
// initialize p from s
p = (varstr *)malloc(sizeof(varstr)+strlen(s));
p->len = strlen(s);
strncpy(p->str, s, p->len);
so in practice pointers all have to be pointers to bytes or something
that can losslessly be converted to and from them.
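For readers who want to compile the idiom, here is a self-contained sketch of the same length-prefixed string. It departs from the fragment above in ways that are my assumptions, not the poster's: it uses a C99 flexible array member instead of the `str[0]` extension, `size_t` for the length, and a hypothetical helper name `varstr_from`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {        /* string with explicit length */
    size_t len;
    char   str[];       /* C99 flexible array member, not NUL-terminated */
} varstr;

/* Allocate a varstr holding a copy of s; returns NULL if malloc fails. */
static varstr *varstr_from(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s);
    varstr *p = malloc(sizeof(varstr) + n);   /* header plus n payload bytes */
    if (p == NULL)
        return NULL;
    p->len = n;
    memcpy(p->str, s, n);
    return p;
}
```

The point of the original fragment still stands: the `malloc` result is a byte-granular block that is reinterpreted as a header followed by characters.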
This evolution was certainly helped along by the horrible implementation
of segmented memory in the Intel 8086 and 286, which persuaded people
that segments are a plague to be avoided rather than a tool to make
programs more reliable.
According to Niklas Holsti <niklas.holsti@tidorum.invalid>:
- [C] was designed to not
be too different from the underlying hardware.
The underlying hardware *of that time*. Therefore, it may have
contributed to "locking in" that style of hardware. But I do not pretend to know that a different style of hardware would be better today.
C evolved from B which had a memory model that addressed words, which made sense for a lot of the computers of the 1960s. I gather the earliest versions of C were on the GE 635 which was a 36 bit word addressed machine but when it moved to the byte addressed PDP-11, dmr had to add typed pointers so it could do something reasonable with pointers to character strings vs pointers to words.
I think that with or without C, flat byte addressed memory would have won out due to the success of S/360 and the PDP-11, both of which were programmed
in lots of languages other than C.
Call by name and thunks were a mistake. The Algol committee was trying
to write an elegant description of call by reference and only when
Jensen's device came along did they realize what they'd done. Alan
Perlis, who was on the Algol committee, told me this. Then when they
tried to fix Algol 60, the committee was hijacked by people who
produced Algol 68 which was quite a good language, but was defined so obscurely that people wrongly assumed it was hard to learn and use.
or Pascal, had become the "standard", we might have ended up with
computers like the Burroughs B6500 or the Intel 432.
On 6/20/2026 5:01 PM, MitchAlsup wrote:---------------
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the application has no access permissions.
Multiple stacks is a big ask, and non-accessible memory is not so good
when dealing with an ISA where user code needs to handle the Link-Register.
Can at least reduce success rate (for stomped LR being able to redirect control without immediate CPU fault) from 100% down to 0.4%, ...
Stack canaries can also help, but then compilers (and programmers) like
to disable them for sake of the "usually fraction of a percent"
performance overhead.
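As a rough illustration of where a figure like 0.4% could come from: an 8-bit tag gives a blind guess a 1 in 256 (about 0.39%) chance of passing. That reading of the number is my assumption. The sketch below stores a key in otherwise unused high bits of a saved link register and checks it on return; `lr_tag`, `lr_check`, and the 48-bit address split are hypothetical and do not describe the actual ISA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_BITS  8
#define TAG_SHIFT 48    /* assumes return addresses fit in the low 48 bits */
#define TAG_MASK  ((UINT64_C(1) << TAG_BITS) - 1)
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Combine a return address with a per-process key before saving it. */
static uint64_t lr_tag(uint64_t ret_addr, uint8_t key) {
    assert((ret_addr & ~ADDR_MASK) == 0);
    return ret_addr | ((uint64_t)key << TAG_SHIFT);
}

/* On return: reject the saved value unless its tag matches the key. */
static bool lr_check(uint64_t saved_lr, uint8_t key, uint64_t *ret_addr) {
    if (((saved_lr >> TAG_SHIFT) & TAG_MASK) != key)
        return false;                         /* hardware would fault here */
    if ((saved_lr >> (TAG_SHIFT + TAG_BITS)) != 0)
        return false;                         /* bits above the tag must be clear */
    *ret_addr = saved_lr & ADDR_MASK;
    return true;
}

/* Chance that an overwrite with a uniformly random tag passes the check. */
static double blind_guess_rate(unsigned tag_bits) {
    assert(tag_bits < 64);
    return 1.0 / (double)(UINT64_C(1) << tag_bits);
}
```

An attacker who can read the key defeats the scheme entirely, so this lowers the success rate of blind stomping only.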
On Sat, 20 Jun 2026 01:01:29 GMT, MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
Instead, C presents a programming model way down at the vonNeumann level::
1 unit of work (step) at a time.
That could be considered a flaw.
Of course, there are languages that address that flaw.
APL has mathematical operators that act directly on vectors and
matrices without loops.
Modula-2, Ada, and some other languages include constructs for
parallel execution.
But then, even C has fork().
John Savard
On 2026-Jun-20 15:07, John Levine wrote:
or Pascal, had become the "standard", we might have ended up with
computers like the Burroughs B6500 or the Intel 432.
That is one bullet we dodged!!
I doubt it. Several parallel strands of RISC research independently found that moving complexity from the hardware into the compiler made computers faster and cheaper. IBM's PL.8 compiler had excellent error checking even though it was originally targeted at the RISC 801, but somehow people always
want to turn off the error checks in the production build of their code.
I suspect that is because error checks were so badly designed.
e.g. the x86 BOUND instruction costs more to set up than it saves
because it requires 2 bounds to be set up in memory and then
read every time.
If checks are designed from a risc point of view
they should have little to no runtime costs.
For example, almost all arrays are 1-dimension, base-0 or base-1
and most array bounds are constants, so one only needs to check,
- for base-0 a single index unsigned < register or constant limit,
- for base-1 a single index != 0 and unsigned <= register or constant limit. (It uses an unsigned compare because signed negative integers
will be treated as large positive unsigned integers and fault.)
Since the index will already be in a register, this is just a
reg-reg or reg-imm compare and possibly fault.
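The two checks described above can be written out in C as a minimal sketch. The function names are hypothetical, and the code assumes limits never exceed `INT64_MAX`, so that a negative index reinterpreted as unsigned is always larger than any valid limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base-0 array: valid iff 0 <= index < limit.
   One unsigned compare covers both ends, because a negative index
   becomes a very large unsigned value. */
static bool in_bounds_base0(int64_t index, uint64_t limit) {
    assert(limit <= (uint64_t)INT64_MAX);
    return (uint64_t)index < limit;
}

/* Base-1 array: valid iff 1 <= index <= limit. */
static bool in_bounds_base1(int64_t index, uint64_t limit) {
    assert(limit <= (uint64_t)INT64_MAX);
    return index != 0 && (uint64_t)index <= limit;
}
```

In compiled code the failing case would branch to a trap rather than return a flag; the boolean form is used here only to keep the sketch testable.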
There are two forms of conditional check ChkCC.
With the standard check, the following LD or ST is not dependent on
the check success and could be speculatively executed before an
index fault was thrown. It is therefore slightly faster but not
Spectre safe, suitable for secure environments.
The second form, a sequential check ChkSeqCC, has 3 operands:
the source index register, the limit (immediate or register), and a dest register.
ChkSeqCC rd_index, rs1_index, imm_limit
ChkSeqCC rd_index, rs1_index, rs2_limit
When the check succeeds the rs1_index is copied into rd_index register,
and the rd_index register is then used in the LD or ST instruction.
This creates a sequential dependency of the LD/ST on the check having
been passed and thus blocks Spectre style speculative indexing.
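The data-flow idea behind the sequential form can be modelled in C, with the caveat that this is only a software model: a C compiler and an out-of-order core give no speculation guarantees of the kind the proposed instruction is meant to provide. `chk_seq_ult` and `load_checked` are hypothetical names, and `abort()` stands in for the index fault.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Model of a sequential unsigned-less-than check: the checked index is
   produced only when the comparison passes; otherwise the program traps. */
static uint64_t chk_seq_ult(uint64_t rs1_index, uint64_t limit) {
    if (!(rs1_index < limit))
        abort();                 /* stands in for the index fault */
    return rs1_index;            /* plays the role of rd_index */
}

/* The load uses the check's result, not the original index, so the
   access is ordered after the check by a data dependency. */
static int load_checked(const int *array, uint64_t n, uint64_t index) {
    assert(array != NULL);
    uint64_t rd_index = chk_seq_ult(index, n);
    return array[rd_index];
}
```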
On Sat, 20 Jun 2026 22:08:23 GMT, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 6/20/2026 9:25 AM, Robert Swindells wrote:
On Fri, 19 Jun 2026 14:34:10 GMT, MitchAlsup wrote:
Robert Swindells <rjs@fdy2.co.uk> posted:
In previous discussions, I had tried to press Mitch to see if he
could remember what kind of benchmarks they had run on the 88100
that showed it running Lisp faster than SPARC.
M88K shift instructions could perform extracts, whereas SPARC had to
use 2 shifts to perform an extract; indexing was scaled:: both
helped interpreters.
Production Lisp environments are not interpreters, even back then.
Lisp is a funny language:
Big promises in the design;
But, only deliver them poorly (and can't improve on the delivery of any
given thing without eroding the original promises).
Simplicity and elegance of an interpreter,
But only if already operating within a Lisp environment...
Clean and elegant syntax,
That actually sucks real hard to use in practice.
Performance, but only if compiled to something else...
A naive Lisp interpreter being almost the slowest style of
interpreter...
Given that one can create a data structure in LISP, and then execute it;
how would you do this without an interpreter or a JIT ??
You don't do that for anything serious.
You have an Ahead Of Time compiler that takes a file of source code and generates a file of machine code equivalent to it just as you do for any other Algol-family language. You also have the option of compiling an individual function to RAM but not saving it out to a file.
A good overview of the SoTA back then is this:
<https://dreamsongs.com/Files/Timrep.pdf>
I would expect that some PDP-10s at CMU had MacLisp installed on them.
The Franz Lisp source code was available from UCB including the compiler.
The commercial Common Lisp implementations from Franz Inc., Lucid Inc. and Harlequin came after this point. I don't know if any of them were ported
to the M88K.
The Lisp system developed by the SPICE Project at CMU initially targeted
the PERQ but later switched to running on conventional CPUs and is still
in use today in SBCL and CMUCL variants.
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it through pointers derived from p. And programs usually satisfy that requirement.
typedef struct { // string with explicit length
    int len;
    char str[0];
} varstr;
varstr *p;
char *s = "swordfish";
// initialize p from s
p = (varstr *)malloc(sizeof(varstr)+strlen(s));
p->len = strlen(s);
strncpy(p->str, s, p->len);
so in practice pointers all have to be pointers to bytes or something
that can losslessly be converted to and from them.
So you want typed pointers. Other languages have more type safety.
What kind of segmentation do you have in mind that would provide type
safety?
This evolution was certainly helped along by the horrible implementation
of segmented memory in the Intel 8086 and 286, which persuaded people
that segments are a plague to be avoided rather than a tool to make programs more reliable.
The 286 provides segments that fit the C standard. It seems that what
people found horrible about them was that they are limited to 64KB,
and that using them is slow, and that the 80286 protected mode was
completely at odds with real mode instead of an upwards-compatible
thing.
The limit could be fixed, and they would require more hardware
resources and be even slower. The limit on the number of segments is probably also a problem if you want to use them for C-standard
objects.
Concerning the performance, one can probably improve that, at the cost
of additional hardware, but I fail to see how any segment-based
hardware could be as fast (or at least close to) as flat memory with
software bounds checking.
One issue that segments as on the 80286 do not fix is dangling
references (C memory safety checkers go to considerable lengths to
deal with that problem). So the language implementation of a language
with explicit deallocation (e.g., C or Pascal) would deallocate the
segment, but, given the finite number of segment numbers, pass out the segment number again, and an access through a dangling reference to
the old segment could wreak havoc.
One problem connected to C and the 286 segments is that each pointer
would require a segment number and an offset within the segment. For
a language like Java, the segment number would be sufficient. But,
e.g., Pascal reference parameters can reference a specific field in a
record or an array element, so they would have to be represented by segment+offset, too.
Do you have any example of non-horrible segmentation that provides
memory safety? If not, do you have an idea what that would look like?
- anton
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it
through pointers derived from p. And programs usually satisfy that
requirement.
{
p = (struct foo *) malloc(42 * sizeof(struct foo));
fprintf( stream, "%p,", (void *)p );
...
if( fscanf( stream, "%p", (void **)&q ) == 1 ) {
use q
}
}
is q "derived" through p ??
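A compilable version of the question, as a sketch only: the pointer is converted to an integer, printed as hex digits, parsed back, and converted to a pointer again. The helper name `roundtrip_through_text` is hypothetical, and the code assumes the implementation provides `uintptr_t`. Whether the rebuilt pointer counts as "derived from" the original is exactly what the question above asks; the code shows only that the address bits survive the trip.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Round-trip a pointer through text and return the pointer rebuilt
   from the digits, or NULL if the text could not be parsed. */
static void *roundtrip_through_text(void *p) {
    char text[32];
    uintptr_t bits = 0;

    /* Write the address as hex digits, as one would to a file or pipe. */
    int n = snprintf(text, sizeof text, "%" PRIxPTR, (uintptr_t)p);
    assert(n > 0 && (size_t)n < sizeof text);

    /* Read the digits back; the text carries only the address bits. */
    if (sscanf(text, "%" SCNxPTR, &bits) != 1)
        return NULL;
    return (void *)bits;
}
```

On mainstream implementations the returned pointer compares equal to the original and can be dereferenced, which is why a purely object-based ("segmented") view of C memory has to say something about integer and text round trips.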
I was told that the prolog application on M88K was faster than competing
RISC processors. I remember that SPECint XLISP and M88Ksim were higher performing than several other competitors. I was told that the bit-field extract instructions had a lot to do with that.
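To make the "one extract versus two shifts" point concrete, here is a sketch in C of pulling a bit field (for example a type tag) out of a 32-bit word. The function names are hypothetical, and the code makes no claim about the exact M88K or SPARC instruction encodings; it only contrasts the two-shift sequence with the shift-and-mask operation that a single extract instruction can perform.

```c
#include <assert.h>
#include <stdint.h>

/* Two-shift form: shift left to discard the bits above the field,
   then shift right to discard the bits below it. */
static uint32_t extract_two_shifts(uint32_t word, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 32);
    return (uint32_t)(word << (32 - offset - width)) >> (32 - width);
}

/* Extract form: what a single bit-field extract instruction computes. */
static uint32_t extract_field(uint32_t word, unsigned offset, unsigned width) {
    assert(width >= 1 && offset + width <= 32);
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (word >> offset) & mask;
}
```

Both functions return the same value for every valid `offset` and `width`; the difference on real hardware is instruction count and dependency chain length.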
Though, I guess one merit of a Lisp like language is that it is a lot
easier to parse, and it could be possible to implement a fairly cheap compiler for it (in the basic case).
Usual downside is that the excessive parentheses tend to turn into a usability issue.
One other major hassle was typically a lack of C style loops (with break
or continue), but this could be addressed in theory.
EricP <ThatWouldBeTelling@thevillage.com> posted:
On 2026-Jun-20 15:07, John Levine wrote:
or Pascal, had become the "standard", we might have ended up with
computers like the Burroughs B6500 or the Intel 432.
That is one bullet we dodged!!
I doubt it. Several parallel strands of RISC research independently found that moving complexity from the hardware into the compiler made computers faster and cheaper. IBM's PL.8 compiler had excellent error checking even though it was originally targeted at the RISC 801, but somehow people always
want to turn off the error checks in the production build of their code.
I suspect that is because error checks were so badly designed.
e.g. the x86 BOUND instruction costs more to set up than it saves
because it requires 2 bounds to be set up in memory and then
read every time.
If checks are designed from a risc point of view
they should have little to no runtime costs.
For example, almost all arrays are 1-dimension, base-0 or base-1
and most array bounds are constants, so one only needs to check,
- for base-0 a single index unsigned < register or constant limit,
- for base-1 a single index != 0 and unsigned <= register or constant limit.
(It uses an unsigned compare because signed negative integers
will be treated as large positive unsigned integers and fault.)
Since the index will already be in a register, this is just a
reg-reg or reg-imm compare and possibly fault.
My 66000 has bounds checks built into the CMP instruction.
C would use the CIN check (0<=Rindex<Rcomparand)
Fortran would use the FIN check (0<Rindex<=Rcomparand)
An advantage of condition-code-less comparisons.
BGB <cr88192@gmail.com> posted:
On 6/20/2026 5:01 PM, MitchAlsup wrote:---------------
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the
application has no access permissions.
Multiple stacks is a big ask, and non-accessible memory is not so good
when dealing with an ISA where user code needs to handle the Link-Register.
Code does not need to access or look at the return address in My 66000 ISA--except for the case where one wants to walk the stack back on a
THROW() and its unstructured equivalent longjmp().
In addition, code does not need to access a GOT entry and then call
the address of an entry, one can LD directly into IP and at the
same time deposit the return address where it can't be stomped.
Only EXIT and RET can access the return address and 99% of the
time it goes directly into IP.
Can at least reduce success rate (for stomped LR being able to redirect
control without immediate CPU fault) from 100% down to 0.4%, ...
My way gets it down to 0%.
Stack canaries can also help, but then compilers (and programmers) like
to disable them for sake of the "usually fraction of a percent"
performance overhead.
Stack canaries are <unnecessary> added work to the instruction stream.
Do you have any example of non-horrible segmentation that provides
memory safety. If not, do you have an idea what that would look like?
- anton
On 21/06/2026 20:57, MitchAlsup wrote:
There is a discussion going on at the moment about "pointer providence"
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like
this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it
through pointers derived from p. And programs usually satisfy that
requirement.
{
p = (struct foo *) malloc(42 * sizeof(struct foo));
fprintf( stream, "0x16,", p );
...
if( fscanf( stream, "x16", q ) ) {
use q
}
}
is q "derived" though p ??
On Sun, 21 Jun 2026 19:02:32 GMT, MitchAlsup wrote:
I was told that the prolog application on M88K was faster than competing RISC processors. I remember that SPECint XLISP and M88Ksim were higher performing than several other competitors. I was told that the bit-field extract instructions had a lot to do with that.
XLisp doesn't use tags to encode types, it just has a C union of structs with a one byte field for the type. You can still find the source to the version used by SPECint. It doesn't provide a compiler.
It isn't a useful benchmark for anyone who was interested in running Lisp back then.
I'm not trying to defend SPARC and am happy to take your word for it that M88K was fast for the time.
On 6/21/2026 11:37 AM, MitchAlsup wrote:
EricP <ThatWouldBeTelling@thevillage.com> posted:
On 2026-Jun-20 15:07, John Levine wrote:
or Pascal, had become the "standard", we might have ended up with computers like the Burroughs B6500 or the Intel 432.
That is one bullet we dodged!!
I doubt it. Several parallel strands of RISC research independently found
that moving complexity from the hardware into the compiler made computers faster and cheaper. IBM's PL.8 compiler had excellent error checking even
though it was originally targeted at the RISC 801, but somehow people always
want to turn off the error checks in the production build of their code.
I suspect that is because error checks were so badly designed.
e.g. the x86 BOUND instruction costs more to set up than it saves
because it requires 2 bounds to be set up in memory and then
read every time.
If checks are designed from a risc point of view
they should have little to no runtime costs.
For example, almost all arrays are 1-dimension, base-0 or base-1
and most array bounds are constants, so one only needs to check,
- for base-0 a single index unsigned < register or constant limit,
- for base-1 a single index != 0 and unsigned <= register or constant limit.
(It uses an unsigned compare because signed negative integers
will be treated as large positive unsigned integers and fault.)
Since the index will already be in a register, this is just a
reg-reg or reg-imm compare and possibly fault.
My 66000 has bounds checks built into the CMP instruction.
C would use the CIN check (0<=Rindex<Rcomparand)
Fortran would use the FIN check (0<Rindex<=Rcomparand)
An advantage of condition-code-less comparisons.
Yes, although a perhaps minor quibble. You would use the compare
followed presumably by a branch on bit instruction. I believe Eric's proposal would generate a fault if the comparison failed, so a single instruction versus two for your solution. I am not sure how much the
extra instruction costs, but if it occurs on every array reference, it
might be an issue.
On 6/21/2026 1:20 PM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 6/20/2026 5:01 PM, MitchAlsup wrote:---------------
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the application has no access permissions.
Multiple stacks is a big ask, and non-accessible memory is not so good
when dealing with an ISA where user code needs to handle the Link-Register.
Code does not need to access or look at the return address in My 66000 ISA--except for the case where one wants to walk the stack back on a THROW() and its unstructured equivalent longjmp().
In addition, code does not need to access a GOT entry and then call
the address of an entry, one can LD directly into IP and at the
same time deposit the return address where it can't be stomped.
Only EXIT and RET can access the return address and 99% of the
time it goes directly into IP.
This requires a CPU that can deal with PUSH/POP mechanics in hardware.
With a Link Register, HW doesn't need to deal with this.
Then again, did see a video recently about a new interrupt-handling and system call mechanism for x86-64 (called FRED).
And the big apparent change:
Mostly makes SYSCALL behave like a normal interrupt, but drops the IDT
in favor of BaseRegister + Disp and similar.
So, seemingly:
IDT:
Push RIP and RFLAGS and similar;
Jump to entry point loaded from IDT;
Specific behavior depends on interrupt type, etc.
SYSCALL:
Copies RIP and RFLAGS to different registers;
Jump to entry point from an MSR.
New mechanism (FRED):
Pushes stuff to stack again, but more stuff to the stack;
Jump to fixed entry point with a per-category displacement;
Stack contents are more consistent.
Well, contrast the interrupt mechanism used in my ISA:
Saves SR (flags) and PC and similar to special registers;
Branches to VBR + Disp (category);
Loads some mode-state from VBR;
VBR can encode which ISA mode handles the interrupt (similar to LR).
Mode flag causes SP and SSP to swap places in the decoder.
Debatable, but stack-swapping avoids a bunch of PITA...
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
There is a discussion going on at the moment about "pointer providence"
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like
this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it through pointers derived from p. And programs usually satisfy that
requirement.
{
p = (struct foo *) malloc(42 * sizeof(struct foo));
fprintf( stream, "0x16,", p );
...
if( fscanf( stream, "x16", q ) ) {
use q
}
}
is q "derived" though p ??
Perhaps you meant pointer "provenance"? I hope we will not rely on the "careful governance and guidance of God", or on an "instance of divine intervention" to ensure pointer safety...
(Meanings of "providence" quoted from Wiktionary.)
On Wed, 17 Jun 2026 15:22:35 -0500, BGB wrote:
An idle thought here is whether there is any "better" option than
conventional register-machine designs.
As an interesting thought experiment, let's assume that a vast
amount of memory is available with access times better than
SRAM (let's suppose 1-cycle for the purposes of this thread).
Would registers even be needed in such an architecture?
Registers in high-performance CPUs give you several benefits:
1) The addresses are hard-coded in the instructions. This means that
read access can start early, that dependencies (read-after-write,
write-after-write, write-after-read) can be determined early (and used
for forwarding, and for renaming registers), and that port
requirements can be reduced.
2) They have many read and write ports.
3) Fast access time. Well, maybe. Thanks to 1) fast access time is
actually not necessary, it just means that you need fewer forwarding
paths.
Let's look at your thought experiment:
Advantage 1 is missing. Some AMD64 implementations still manage to
implement 0-cycle store-to-load-forwarding in many cases, but AFAIK
not as reliably as for registers.
Advantage 2 tends to be missing. E.g., the most extreme I have seen
up to now is 3 reads and 2 writes per cycle, and IIRC <5 total memory
accesses per cycle, on a machine that can do 8 or 10 instructions per
cycle, i.e. at least 16 register reads and 8 register writes per cycle
(maybe limited to less, but with advantage 1 mitigating that to some
extent).
Advantage 3: What would single-cycle memory access mean for d=a+b+c? It
would be compiled to
t=b+c
d=a+t
With registers this has a latency of typically 2 cycles. With
single-cycle memory access this typically has a latency of 6 cycles.
BTW, it's not just a thought experiment:
A number of IA-64 implementations have had single-cycle D-cache
access. It still had registers.
Processors like the 6502 and the 6809 have single-cycle memory access.
They still have registers (actually, accumulators and index
registers).
- anton
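One way to arrive at the 2-cycle and 6-cycle figures for d=a+b+c, offered as my reading rather than the poster's stated model: with registers each dependent add costs one cycle, while with memory operands each step pays one cycle to read its inputs, one to add, and one to write the result before the next step can read it. The function names below are hypothetical.

```c
#include <assert.h>

/* Latency of a chain of dependent adds whose operands live in registers. */
static unsigned chain_latency_regs(unsigned adds, unsigned add_cycles) {
    return adds * add_cycles;
}

/* Same chain when every step reads its operands from memory and writes
   its result back: read + add + write per step. Reads of independent
   operands within one step are assumed to overlap. */
static unsigned chain_latency_mem(unsigned adds, unsigned add_cycles,
                                  unsigned mem_cycles) {
    assert(adds > 0);
    return adds * (mem_cycles + add_cycles + mem_cycles);
}
```

With two adds, a 1-cycle add, and 1-cycle memory, `chain_latency_regs(2, 1)` gives 2 and `chain_latency_mem(2, 1, 1)` gives 6, matching the numbers quoted above.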
Robert Swindells <rjs@fdy2.co.uk> posted:
I was told that the prolog application on M88K was faster than competing
RISC processors.
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
There is a discussion going on at the moment about "pointer providence"
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like
this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it
through pointers derived from p. And programs usually satisfy that
requirement.
{
p = (struct foo *) malloc(42 * sizeof(struct foo));
fprintf( stream, "0x16,", p );
...
if( fscanf( stream, "x16", q ) ) {
use q
}
}
is q "derived" through p ??
Perhaps you meant pointer "provenance"? I hope we will not rely on the
"careful governance and guidance of God", or on an "instance of divine
intervention" to ensure pointer safety...
Has pointer safety been shown to be equivalent to the halting
problem? If so, "careful governance and guidance from God" may
indeed be required.
On 2026-06-22 13:44, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
There is a discussion going on at the moment about "pointer providence"
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed
memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like
this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it
through pointers derived from p. And programs usually satisfy that
requirement.
{
p = (struct foo *) malloc(42 * sizeof(struct foo));
fprintf( stream, "0x16,", p );
...
if( fscanf( stream, "x16", q ) ) {
use q
}
}
is q "derived" through p ??
Perhaps you meant pointer "provenance"? I hope we will not rely on the
"careful governance and guidance of God", or on an "instance of divine
intervention" to ensure pointer safety...
Has pointer safety been shown to be equivalent to the halting
problem? If so, "careful governance and guidance from God" may
indeed be required.
I would assume it is undecidable, for unrestricted programs. The aim of pointer provenance is no doubt to restrict programs to make it decidable
to some extent.
I am reminded of the person, apparently very religious, who some decades
ago posted to solicit help for reimplementing all of computing (gcc,
GNU, et cetera) on Biblical principles, because he thought Richard
Stallman was too atheistic and had tainted his products. I have not
heard how that went.
On 6/21/2026 11:37 AM, MitchAlsup wrote:
EricP <ThatWouldBeTelling@thevillage.com> posted:
On 2026-Jun-20 15:07, John Levine wrote:
or Pascal, had become the "standard", we might have ended up with
computers like the Burroughs B6500 or the Intel 432.
That is one bullet we dodged!!
I doubt it. Several parallel strands of RISC research independently found
that moving complexity from the hardware into the compiler made computers
faster and cheaper. IBM's PL.8 compiler had excellent error checking even
though it was originally targeted at the RISC 801, but somehow people always
want to turn off the error checks in the production build of their code.
I suspect that is because error checks were so badly designed.
e.g. the x86 BOUND instruction costs more to set up than it saves
because it requires 2 bounds to be set up in memory and then
read every time.
If checks are designed from a risc point of view
they should have little to no runtime costs.
For example, almost all arrays are 1-dimension, base-0 or base-1
and most array bounds are constants, so one only needs to check,
- for base-0 a single index unsigned < register or constant limit,
- for base-1 a single index != 0 and unsigned <= register or constant limit.
(It uses an unsigned compare because signed negative integers
will be treated as large positive unsigned integers and fault.)
Since the index will already be in a register, this is just a
reg-reg or reg-imm compare and possibly fault.
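The two checks described above might look like this in C. It is a sketch only: the function names are invented, the checks return a flag instead of faulting, and the limit is assumed to be no larger than INT64_MAX, so that a negative index converted to unsigned always lands above it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base-0 array: valid indices are 0 .. limit-1.
   A negative index converts to a very large unsigned value, so the
   single unsigned compare rejects it as well. */
static bool in_bounds_base0(int64_t index, uint64_t limit)
{
    return (uint64_t)index < limit;
}

/* Base-1 array: valid indices are 1 .. limit.
   Index 0 is rejected explicitly; negative indices fail the
   unsigned compare for the same reason as above. */
static bool in_bounds_base1(int64_t index, uint64_t limit)
{
    return index != 0 && (uint64_t)index <= limit;
}
```

A compiler would typically turn each of these into one compare plus a conditional branch or trap, with the limit either in a register or as an immediate.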
My 66000 has bounds checks built into the CMP instruction.
C would use the CIN check (0<=Rindex<Rcomparand)
Fortran would use the FIN check (0<Rindex<=Rcomparand)
An advantage of condition-code-less comparisons.
Yes, although a perhaps minor quibble. You would use the compare followed presumably by a branch on bit instruction. I believe Eric's proposal would generate a fault if the comparison failed, so a single instruction versus two for your solution. I am not sure how much the extra instruction costs, but if it occurs on every array reference, it might be an issue.
Actually IA-32 (since the 486) and AMD64 have a bit for turning on
unaligned traps, but unfortunately there is too much software in
libraries that performs unaligned accesses.
On 6/22/2026 3:44 AM, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
There is a discussion going on at the moment about "pointer providence"
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed
memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like
this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it
through pointers derived from p. And programs usually satisfy that
requirement.
{
p = (struct foo *) malloc(42 * sizeof(struct foo));
fprintf( stream, "0x16,", p );
...
if( fscanf( stream, "x16", q ) ) {
use q
}
}
is q "derived" through p ??
Perhaps you meant pointer "provenance"? I hope we will not rely on the
"careful governance and guidance of God", or on an "instance of divine
intervention" to ensure pointer safety...
Has pointer safety been shown to be equivalent to the halting
problem? If so, "careful governance and guidance from God" may
indeed be required.
I don't know the answer to your question, but presumably we can do
better than C does. Isn't that one of the, at least claimed, advantages
of Rust, and perhaps even Ada?
Also, I believe that had the originators
of C not allowed arithmetic on pointers (comparisons for equality would still be allowed, and array addressing would have to use subscripts)
many of the problems with C pointers wouldn't have occurred. Of course, that horse has left the barn a long time ago.
On 2026-06-22 17:59, Stephen Fuld wrote:
On 6/22/2026 3:44 AM, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
There is a discussion going on at the moment about "pointer
providence"
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
John Levine <johnl@taugh.com> writes:
C killed off every memory model other than flat byte addressed
memory.
At least in the C standard the memory is segmented into objects.
Pointers are sort of typed, but any real C program does stuff like
this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
That produces an object of a certain size, and you must only access it
through pointers derived from p. And programs usually satisfy that
requirement.
{
p = (struct foo *) malloc(42 * sizeof(struct foo));
fprintf( stream, "0x16,", p );
...
if( fscanf( stream, "x16", q ) ) {
use q
}
}
is q "derived" through p ??
Perhaps you meant pointer "provenance"? I hope we will not rely on the
"careful governance and guidance of God", or on an "instance of divine
intervention" to ensure pointer safety...
Has pointer safety been shown to be equivalent to the halting
problem? If so, "careful governance and guidance from God" may
indeed be required.
I don't know the answer to your question, but presumably we can do
better than C does. Isn't that one of the, at least claimed,
advantages of Rust, and perhaps even Ada?
Both Rust and Ada have to be restricted in certain ways in order to
ensure absence of pointer errors: Rust has to avoid "unsafe" code, and
Ada has to avoid pointer-related "unchecked" constructs and certain
undefined behavior (which does exist in Ada, but less so than in C). The
Ada subset called SPARK, together with its proof tools, is meant for
such programming, and has a feature similar to Rust "ownership" though
standard Ada does not.
Also, I believe that had the originators of C not allowed arithmetic
on pointers (comparisons for equality would still be allowed, and
array addressing would have to use subscripts) many of the problems
with C pointers wouldn't have occurred. Of course, that horse has
left the barn a long time ago.
I recently helped to debug an Ada program that now and then, but not
often, was overwriting some buffers. At one point in that program I had *cough* used pointer arithmetic *blush* instead of array indexing, for
what I felt were good reasons at the time. But it bit me. An amusing
clue to the error was that the bug happened more often when the
satellite running the program was above Russia's borders. Perhaps you
can guess reasons for that :-)
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
Actually IA-32 (since the 486) and AMD64 have a bit for turning on
unaligned traps, but unfortunately there is too much software in
libraries that performs unaligned accesses.
Interesting... what's this bit called? In which register does
it live?
Thanks!
Andy Valencia
Home page: https://www.vsta.org/andy/
Robert Swindells [2026-06-19 11:20:10] wrote:
On Fri, 19 Jun 2026 06:02:16 GMT, Anton Ertl wrote:[...]
Another architectural feature: One might think that tagging support
would help dynamically typed programming languages (e.g., Lisp), and
SPARC contains some support for that, but as one of the IIRC Franz Lisp
developers has explained in this newsgroup, they actually did not use
this feature, because the performance benefit was not big enough to
Franz Lisp doesn't use tags at all and only ran on VAX and 68k.
I guess you two aren't talking about the same "Franz Lisp".
AFAIK Anton is referring to the commercial Common Lisp compiler
associated with the Franz Inc company, marketed under the name
"Allegro".
=== Stefan
I think it is more appropriate to say the M88K was peaky--some things
it did quite well, and others "not so much".
This requires a CPU that can deal with PUSH/POP mechanics in hardware.
With a Link Register, HW doesn't need to deal with this.
A quick grep of the Intel® 64 and IA-32 Architectures Software
Developer's Manual finds the AC bit in the EFLAGS register. I happen
to have an old 486 manual and it's there, too.
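As a small illustration: AC is bit 18 of EFLAGS, and it only takes effect when the AM bit (bit 18 of CR0) is also set and the code runs at privilege level 3. The sketch below just encodes that condition on plain integer values and does not touch the real registers. The macro and function names are made up here.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_AC (UINT32_C(1) << 18) /* alignment-check flag in EFLAGS */
#define CR0_AM    (UINT32_C(1) << 18) /* alignment-mask bit in CR0 */

/* True if unaligned data accesses would raise an alignment-check
   fault for the given register values and current privilege level. */
static bool alignment_check_active(uint32_t eflags, uint32_t cr0, unsigned cpl)
{
    return (eflags & EFLAGS_AC) != 0 && (cr0 & CR0_AM) != 0 && cpl == 3;
}
```

User code can flip AC itself (e.g. via pushf/popf), but whether that has any effect depends on the operating system having set CR0.AM.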
On Sun, 21 Jun 2026 13:55:59 -0500, BGB wrote:
Though, I guess one merit of a Lisp like language is that it is a lot
easier to parse, and it could be possible to implement a fairly cheap
compiler for it (in the basic case).
Usual downside is that the excessive parentheses tend to turn into a
usability issue.
You use an editor that keeps track of them.
One other major hassle was typically a lack of C style loops (with break
or continue), but this could be addressed in theory.
It is addressed in practice.
You could run SBCL on your CPU, it has a RISC-V backend to the compiler.
I read with interest its interaction with SMEP/SMAP as well.
I have been out of the kernel game for many dog-years.
BGB <cr88192@gmail.com> posted:
On 6/20/2026 5:01 PM, MitchAlsup wrote:---------------
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the
application has no access permissions.
Multiple stacks is a big ask, and non-accessible memory is not so good
when dealing with an ISA where user code needs to handle the Link-Register.
Code does not need to access or look at the return address in My 66000
ISA--except for the case where one wants to walk the stack back on a
THROW() and its unstructured equivalent longjmp().
On 6/22/2026 9:26 AM, Niklas Holsti wrote:
On 2026-06-22 17:59, Stephen Fuld wrote:
On 6/22/2026 3:44 AM, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
There is a discussion going on at the moment about "pointer
providence"
Perhaps you meant pointer "provenance"? I hope we will not rely on the
"careful governance and guidance of God", or on an "instance of divine
intervention" to ensure pointer safety...
Has pointer safety been shown to be equivalent to the halting
problem? If so, "careful governance and guidance from God" may
indeed be required.
I don't know the answer to your question, but presumably we can do
better than C does. Isn't that one of the, at least claimed,
advantages of Rust, and perhaps even Ada?
Both Rust and Ada have to be restricted in certain ways in order to
ensure absence of pointer errors: Rust has to avoid "unsafe" code,
I like Rust's solution. You can do unsafe things - sometimes they are
just necessary - but they are not the default way of doing things, and
you have to notate them in the source code which serves to discourage
them and points people debugging errors to certain areas of the code
that are more likely to be problematic.
and Ada has to avoid pointer-related "unchecked" constructs and
certain undefined behavior (which does exist in Ada, but less so than
in C). The Ada subset called SPARK, together with its proof tools, is
meant for such programming, and has a feature similar to Rust
"ownership" though standard Ada does not.
Is programming under SPARK rules significantly harder than under
nonSPARK Ada?
Also, I believe that had the originators of C not allowed arithmetic
on pointers (comparisons for equality would still be allowed, and
array addressing would have to use subscripts) many of the problems
with C pointers wouldn't have occurred. Of course, that horse has
left the barn a long time ago.
I recently helped to debug an Ada program that now and then, but not
often, was overwriting some buffers. At one point in that program I
had *cough* used pointer arithmetic *blush* instead of array indexing,
for what I felt were good reasons at the time. But it bit me. An
amusing clue to the error was that the bug happened more often when
the satellite running the program was above Russia's borders. Perhaps
you can guess reasons for that :-)
Interesting. Perhaps it is because Russia has less "careful governance
and guidance from God" :-)
On 6/21/2026 2:56 PM, Robert Swindells wrote:
On Sun, 21 Jun 2026 13:55:59 -0500, BGB wrote:
Though, I guess one merit of a Lisp like language is that it is a lot
easier to parse, and it could be possible to implement a fairly cheap
compiler for it (in the basic case).
Usual downside is that the excessive parentheses tend to turn into a
usability issue.
You use an editor that keeps track of them.
Probably.
The main editor I use on Windows, Notepad2, has syntax highlighting and
parenthesis matching.
Normal Notepad does not.
Though, would seem that these features have become fairly common in
text-editors in Linux land.
On 2026-06-22 19:50, Stephen Fuld wrote:
On 6/22/2026 9:26 AM, Niklas Holsti wrote:
On 2026-06-22 17:59, Stephen Fuld wrote:
On 6/22/2026 3:44 AM, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
[snip]
There is a discussion going on at the moment about "pointer
providence"
Perhaps you meant pointer "provenance"? I hope we will not rely on
the "careful governance and guidance of God", or on an "instance of
divine intervention" to ensure pointer safety...
Has pointer safety been shown to be equivalent to the halting
problem? If so, "careful governance and guidance from God" may
indeed be required.
I don't know the answer to your question, but presumably we can do
better than C does. Isn't that one of the, at least claimed,
advantages of Rust, and perhaps even Ada?
Both Rust and Ada have to be restricted in certain ways in order to
ensure absence of pointer errors: Rust has to avoid "unsafe" code,
I like Rust's solution. You can do unsafe things - sometimes they are
just necessary - but they are not the default way of doing things, and
you have to notate them in the source code which serves to discourage
them and points people debugging errors to certain areas of the code
that are more likely to be problematic.
Same in Ada, mostly: some unsafe things are named "Unchecked_Xxx",
others are available only if some specific predefined packages are used, which are not needed for most safe things.
and Ada has to avoid pointer-related "unchecked" constructs and
certain undefined behavior (which does exist in Ada, but less so than
in C). The Ada subset called SPARK, together with its proof tools, is
meant for such programming, and has a feature similar to Rust
"ownership" though standard Ada does not.
Is programming under SPARK rules significantly harder than under
nonSPARK Ada?
I don't have personal experience, but my impression is that it does not
make it markedly harder than the usual restrictions on embedded,
more-or-less critical software do. SPARK is defined and supported by the
AdaCore company, not a standards group, and is evolving. The
documentation is at https://www.adacore.com/documentation?tab=spark; the
main restrictions are (quoted from
https://docs.adacore.com/live/wave/spark2014/html/spark2014_rm/introduction.html#principal-language-restrictions,
with my comments in []):
--- quote:
To facilitate formal analyses and verification, SPARK enforces a number
of global restrictions to Ada. While these are covered in more detail in
the remaining chapters of this document, the most notable restrictions are:
- Restrictions on the use of access types and values [pointers], similar
in some ways to the ownership model of the programming language Rust.
- All expressions (including function calls) are free of side effects.
- Aliasing of names is not permitted in general but the renaming of
entities is permitted as there is a static relationship between the two names. In analysis all names introduced by a renaming declaration are replaced by the name of the renamed entity. This replacement is applied recursively when there are multiple renames of an entity.
- Backward goto statements are not permitted.
- The use of controlled types is not currently permitted. [These are
types with automatic invocation of user-defined initialization and finalization operations on object creation, copying, and deletion.]
- Tasks and protected objects are permitted only if the Ravenscar
profile (or the Jorvik profile) is specified. [The main limitation in
these profiles is that the set of tasks (threads) is static, no task
ever terminates, and inter-task communication is by protected objects (monitors, synchronized objects) and not by rendez-vous.]
- Raising and handling of exceptions is not currently permitted
(exceptions can be included in a program but proof must be used to show
that they cannot be raised).
--- end quote.
Also, I believe that had the originators of C not allowed
arithmetic on pointers (comparisons for equality would still be
allowed, and array addressing would have to use subscripts) many of
the problems with C pointers wouldn't have occurred. Of course,
that horse has left the barn a long time ago.
I recently helped to debug an Ada program that now and then, but not
often, was overwriting some buffers. At one point in that program I
had *cough* used pointer arithmetic *blush* instead of array
indexing, for what I felt were good reasons at the time. But it bit
me. An amusing clue to the error was that the bug happened more often
when the satellite running the program was above Russia's borders.
Perhaps you can guess reasons for that :-)
Interesting. Perhaps it is because Russia has less "careful
governance and guidance from God" :-)
One could indeed say so, because the reason is Putin's attack on
Ukraine, as you may have guessed.
The Ada program runs a satellite-based GNSS receiver that acquires
(finds) and then tracks GNSS signals from GNSS satellites (GPS, Galileo,
and others) as those satellites rise or set. The purpose is to measure atmospheric properties from the way the atmosphere refracts the signal.
The design and/or coding error was in the transition between two stages
of the multi-stage procedure for finding and starting to track a GNSS
signal from a GNSS satellite.
So then: Russia attacks Ukraine => Ukraine defends itself with
long-distance drones => Russia jams and perturbs GNSS signals along its
borders => the satellite software often loses track of a signal it is
tracking => the satellite software often has to re-acquire signals =>
the bug manifests more often over Russia's borders.
If one favours the Ukrainian Orthodox church, which objects to this war, Russia is going against God's guidance. If one favours the Russian
Orthodox church, which blesses this war, Russia is following God's
guidance.
(The bug was not found in testing because it did not manifest on every transition between the two acquisition stages -- it manifested only when
two other dynamic program states occurred together, at the same time as
the transition, and one of these states is rather rare, at least in test conditions.)
MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:
BGB <cr88192@gmail.com> posted:
On 6/20/2026 5:01 PM, MitchAlsup wrote:---------------
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the
application has no access permissions.
Multiple stacks is a big ask, and non-accessible memory is not so good
when dealing with an ISA where user code needs to handle the Link-Register.
Code does not need to access or look at the return address in My 66000
ISA--except for the case where one wants to walk the stack back on a
THROW() and its unstructured equivalent longjmp().
What about a debugging stack trace?
Thomas Koenig <tkoenig@netcologne.de> posted:
MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:
BGB <cr88192@gmail.com> posted:
On 6/20/2026 5:01 PM, MitchAlsup wrote:---------------
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the
application has no access permissions.
Multiple stacks is a big ask, and non-accessible memory is not so good
when dealing with an ISA where user code needs to handle the Link-Register.
Code does not need to access or look at the return address in My 66000
ISA--except for the case where one wants to walk the stack back on a
THROW() and its unstructured equivalent longjmp().
What about a debugging stack trace?
The debugger runs in a separate process with access to application
Root pointer and ASID. In that process, Call-stack is RW-.
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
Thomas Koenig <tkoenig@netcologne.de> posted:
MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:
BGB <cr88192@gmail.com> posted:
On 6/20/2026 5:01 PM, MitchAlsup wrote:---------------
Tagging to make it harder to stomp the link register;
Put it somewhere it can't be stomped on !! like in memory on a page the
application has no access permissions.
Multiple stacks is a big ask, and non-accessible memory is not so good
when dealing with an ISA where user code needs to handle the Link-Register.
Code does not need to access or look at the return address in My 66000
ISA--except for the case where one wants to walk the stack back on a
THROW() and its unstructured equivalent longjmp().
What about a debugging stack trace?
The debugger runs in a separate process with access to application
Root pointer and ASID. In that process, Call-stack is RW-.
GLIBC has a function to obtain a backtrace at a current point
in time. This is called in the context of the thread that invokes
the call. It requires access to the call records on the stack
in the context of the thread (the glibc functions are backtrace(3)
and backtrace_symbols(3)).
/**
* Log a simulator stack traceback.
*/
void
c_osdep::backtrace(c_logger *lp)
{
int num_frames;
void *framelist[100];
char **strings;
num_frames = ::backtrace(framelist, sizeof(framelist)/sizeof(framelist[0]));
strings = ::backtrace_symbols(framelist, num_frames);
if (strings == NULL) {
lp->log("Unable to obtain simulator stack traceback: %s\n",
strerror(errno));
return;
}
for(int frame=0; frame < num_frames; frame++) {
lp->log("[%2.2d] %s\n", frame, strings[frame]);
}
::free(strings);
}
On 6/22/2026 7:38 AM, Niklas Holsti wrote:-------------
On 2026-06-22 13:44, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
In my case, I tended to use more conservative approaches and then only optimize based on what can be verified by the compiler within certain fundamental assumptions.
Say:
Pointer 1 points at a stack array in the local function;
Pointer 2 was derived from taking the address of a global array;
Compiler can safely assume no-alias.
Also, if two pointers were passed into a function, can also assume they don't alias with a pointer to a local array;
I am reminded of the person, apparently very religious, who some decades ago posted to solicit help for reimplementing all of computing (gcc,
GNU, et cetera) on Biblical principles, because he thought Richard Stallman was too atheistic and had tainted his products. I have not
heard how that went.
BGB <cr88192@gmail.com> posted:
On 6/22/2026 7:38 AM, Niklas Holsti wrote:-------------
On 2026-06-22 13:44, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
In my case, I tended to use more conservative approaches and then only
optimize based on what can be verified by the compiler within certain
fundamental assumptions.
Say:
Pointer 1 points at a stack array in the local function;
Pointer 2 was derived from taking the address of a global array;
Compiler can safely assume no-alias.
Also, if two pointers were passed into a function, can also assume they
don't alias with a pointer to a local array;
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the two arguments alias, it is a programming error.
-----------------
I am reminded of the person, apparently very religious, who some decades
ago posted to solicit help for reimplementing all of computing (gcc,
GNU, et cetera) on Biblical principles, because he thought Richard
Stallman was too atheistic and had tainted his products. I have not
heard how that went.
Rick...
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the two arguments alias, it is a programming error.
Hard proof that alias is impossible is harder to achieve in practice...
A softer "there is no reasonable possibility of alias" is easier to achieve.
According to BGB <cr88192@gmail.com>:
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the two arguments alias, it is a programming error.
Hard proof that alias is impossible is harder to achieve in practice...
A softer "there is no reasonable possibility of alias" is easier to achieve.
Sort of. The standard says that the compiler can assume no type punning, so that
if pointers are of different types, they can't point at the same thing (with an
exception for pointers to unions.)
Even so, C has "restrict" to tell the compiler to assume that pointers never alias, and "volatile" to assume they always do.
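To make the two qualifiers concrete, here is a minimal sketch. The function names scale_add and wait_for_flag are made up for illustration and do not come from any post in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: with "restrict" the caller promises that dst and
   src never overlap, so the compiler is free to keep values in registers
   or vectorize the loop without checking for overlap. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* Hypothetical helper: "volatile" forces each read of *flag to be a real
   load, so the polling loop cannot be collapsed into a single test. */
int wait_for_flag(volatile int *flag, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (*flag)
            return 1;
    return 0;
}
```

Passing overlapping arrays to scale_add would break the promise made by "restrict", and the behavior would then be undefined.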
Consider the Push/Pop mechanics in HW compared to FMAC in HW--which
do you think is easier ???
Now consider 16 pushed in a row versus a single instruction that performs
the same amount of work. Which one needs to translate an address more
often, which one needs to AGEN more often, and which one can access the
cache once for up to 8 registers ???
scott@slp53.sl.home (Scott Lurndal) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
What about a debugging stack trace?
The debugger runs in a separate process with access to application
Root pointer and ASID. In that process, Call-stack is RW-.
GLIBC has a function to obtain a backtrace at a current point
in time. This is called in the context of the thread that invokes
the call. It requires access to the call records on the stack
in the context of the thread (the glibc functions are backtrace(3)
and backtrace_symbols(3)).
When Thread is unExceptional it cannot access Call Stack,
when Thread is Exceptional it can.
ENTER, EXIT, and RET are exempt from the protection check.
Call Stack Pointer is not accessible to unprivileged code.
Don't see how one gets from a running application into debugger without taking an exception !?! or from running in the debugger to running in application without returning from an exception !!!
Usual downside is that the excessive parentheses tend to turn into a usability issue.
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
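A small sketch of what that assumption means in practice. The function names are hypothetical, and the value checked in the usage note assumes a 32-bit IEEE-754 float.

```c
#include <stdint.h>
#include <string.h>

/* Reads a float's bits through a uint32_t pointer. This violates the
   type-based aliasing rules, so a compiler that assumes strict aliasing
   may reorder or drop accesses around it. Shown only for contrast. */
uint32_t float_bits_cast(const float *f)
{
    return *(const uint32_t *)f;
}

/* Same intent via memcpy, which is well defined regardless of the
   aliasing assumption; compilers typically turn it into a plain move. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under those assumptions float_bits_memcpy(1.0f) yields 0x3f800000. The cast version may happen to give the same answer, but only -fno-strict-aliasing (or the memcpy form) makes that something one can rely on with gcc.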
On 6/23/2026 9:25 PM, John Levine wrote:
According to BGB <cr88192@gmail.com>:
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the two arguments alias, it is a programming error.
Hard proof that alias is impossible is harder to achieve in practice...
A softer "there is no reasonable possibility of alias" is easier to achieve.
Sort of. The standard says that the compiler can assume no type punning, so that
if pointers are of different types, they can't point at the same thing (with an
exception for pointers to unions.)
Even so, C has "restrict" to tell the compiler to assume that pointers never
alias, and "volatile" to assume they always do.
Possibly, though traditional type-based aliasing rules run into a
problem in that pointer casting can break its assumptions, and a lot of
code doesn't respect these rules (which taken purely at face value, are overly limiting).
One option though is "if enabled, assume the rules are followed unless
the compiler sees them being broken", in which case it disables TBAA
when faced with TBAA violations.
This approach seems to be moderately
effective, and allows benefiting from some of the performance advantages
of TBAA while also being more friendly to code that goes "wild west"
with things like pointer casts and "cast and dereference" patterns.
So, say, a nicer compromise (even if still breakable).
int foo1(char *s, int *t)
{
    *s=*t+1;
    return *t;
}
//assume not directly visible within same context:
int foo2(void)
{
    int i, j;
    i=4;
    j=foo1((char *)(&i), &i);
    return j;
}
What is the result of calling foo2?...
Here, foo2 breaks TBAA but in a way invisible to foo1.
For volatile, one typically needs to go a little further:
Every load and store needs to be performed explicitly;
There is a need to disallow load/store reordering;
...
Mostly because volatile may be used to access MMIO, and MMIO is more
strict than normal RAM in this area.
Though, could maybe be better if "volatile" could be broken into several subtypes depending on which particular behaviors are needed:
Weaker case: Assume aliasing happens.
May still prune non-aliasing load/store or reorder;
Normal case:
Every load/store needs to happen;
No reordering allowed.
Stronger case:
Like the above, but also needs to be synchronous between cores;
Though, this role overlaps with _Atomic.
There is also ambiguity as to how far the volatile-ness extends, but
this can be avoided by doing it at the point of cast-and-deref:
(*(volatile uint64_t *)ptr)
In this case, it applies explicitly to the deref operation rather than
the handling of the pointer before this point.
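The cast-and-deref form can be wrapped in small accessors. This is only a sketch; load_once_u64 and store_once_u64 are hypothetical names, not part of any library mentioned here.

```c
#include <stdint.h>

/* Hypothetical accessor: only this one load is volatile. The pointer
   itself stays an ordinary pointer, so other accesses through it remain
   open to normal optimization. */
uint64_t load_once_u64(const void *ptr)
{
    return *(const volatile uint64_t *)ptr;
}

/* Hypothetical accessor: the matching single volatile store. */
void store_once_u64(void *ptr, uint64_t v)
{
    *(volatile uint64_t *)ptr = v;
}
```

The pointer passed in must be suitably aligned for uint64_t and must really refer to a 64-bit object. Note that this gives no ordering guarantee between cores; that is the role that overlaps with _Atomic.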
On Sat, 20 Jun 2026 10:15:41 -0400, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
Robert Swindells [2026-06-19 11:20:10] wrote:
On Fri, 19 Jun 2026 06:02:16 GMT, Anton Ertl wrote:[...]
Another architectural feature: One might think that tagging support
would help dynamically typed programming languages (e.g., Lisp), and
SPARC contains some support for that, but as one of the IIRC Franz
Lisp developers has explained in this newsgroup, they actually did
not use this feature, because the performance benefit was not big
enough to
Franz Lisp doesn't use tags at all and only ran on VAX and 68k.
I guess you two aren't talking about the same "Franz Lisp". AFAIK Anton
is referring to the commercial Common Lisp compiler associated with the
Franz Inc company, marketed under the name "Allegro".
=== Stefan
ISTM there were at least a couple of Lisps available for the Vax. I
can't speak to Franz, but I do know at least one Vax Lisp was a BIBOP[1]
system that (generally) did not use tags.
In BIBOP, memory "pages"[2] are dedicated to a single data type. The
base address of the page is mapped to the type of the objects the page
contains, and so the objects (and pointers to them) need no type
information themselves. This allowed for full width pointers, fixnums
and floats, and for conses, boxes, and other fixed sized data types
(including user types) to avoid tagging.
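A toy sketch of the page-to-type lookup, to make the idea concrete. The page size, heap size, type names and function names are all assumptions for illustration and are not taken from any actual Vax Lisp.

```c
#include <stddef.h>
#include <stdint.h>

enum obj_type { T_FREE = 0, T_CONS, T_FIXNUM, T_FLONUM };

#define PAGE_SHIFT 12   /* assumed 4 KiB pages */
#define NUM_PAGES  16   /* toy heap of 16 pages */

static unsigned char heap[(size_t)NUM_PAGES << PAGE_SHIFT];
static unsigned char page_type[NUM_PAGES];   /* one type code per page */

/* Dedicate a whole page to one data type. */
void set_page_type(unsigned page, enum obj_type t)
{
    page_type[page] = (unsigned char)t;
}

/* Address of the object at a byte offset within a page. */
void *object_at(unsigned page, unsigned offset)
{
    return &heap[((size_t)page << PAGE_SHIFT) + offset];
}

/* The type comes from the page the pointer falls in; neither the pointer
   nor the object carries any tag bits. */
enum obj_type type_of(const void *p)
{
    size_t off = (size_t)((const unsigned char *)p - heap);
    return (enum obj_type)page_type[off >> PAGE_SHIFT];
}
```

The cost is visible in type_of: finding the type needs a table load, which is the trade against keeping pointers and numbers full width.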
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the rules in
the C standard are ideal ...
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
A naive byte copy will fill a[] with 42, a more typical version that
moves larger blocks won't. This example is really obvious (it's
why there's also memmove()) but there's plenty of more subtle ones.
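A sketch that makes the difference visible without calling memcpy() on overlapping buffers, which is undefined. naive_forward_copy and demo_overlap are hypothetical names; the first stands in for the "naive byte copy" mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical naive forward byte copy: each byte written is visible to
   later iterations, so an overlapping copy smears the first byte. */
void naive_forward_copy(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Runs the a[0] = 42 example once per strategy on zeroed arrays. */
void demo_overlap(char naive[100], char moved[100])
{
    memset(naive, 0, 100);
    memset(moved, 0, 100);
    naive[0] = 42;
    moved[0] = 42;
    naive_forward_copy(naive + 1, naive, 99); /* every element becomes 42 */
    memmove(moved + 1, moved, 99);            /* old contents shifted by one */
}
```

After demo_overlap, naive[] holds 42 in all 100 elements, while moved[] holds 42 only in elements 0 and 1 and zeros elsewhere.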
On 6/24/2026 3:17 PM, John Levine wrote:-------------------
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
A naive byte copy will fill a[] with 42, a more typical version that
moves larger blocks won't. This example is really obvious (it's
why there's also memmove()) but there's plenty of more subtle ones.
This one is why I added a "_memlzcpy()" function to my C library, whose
main purpose is to give this sort of self-overlapping copy behavior (and
to consolidate nearly every LZ77 style decompressor otherwise needing to supply their own version).
In the case of a short backwards copy, it will call "memmove()", but as
noted the behavior in the case of a short forwards copy is different.
For non-overlap cases it can just invoke "memcpy()".
BGB <cr88192@gmail.com> schrieb:
Usual downside is that the excessive parentheses tend to turn into a usability issue.
Ample fun has been made of this over time.
From: jasmerb@mist.cs.orst.edu (Bryce Jasmer)
Newsgroups: rec.humor.funny
Subject: The Strategic Defense Initiative (SDI/Star Wars)
Keywords: computer, funny
Message-ID: <137457@looking.on.ca>
Date: 23 Apr 90 10:30:08 GMT
Sender: funnyr@looking.on.ca
Posted: Mon Apr 23 11:30:08 1990
Reply-Path: mist.cs.orst.edu!jasmerb
Through some clever security hole manipulation I have been able to
break into all of the government's computers and acquire the Lisp code
to SDI. Here is the last page (tail -10) of it to prove that I actually
have the code:
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) ))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
On 24/06/2026 23:34, BGB wrote:
On 6/24/2026 3:17 PM, John Levine wrote:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
A naive byte copy will fill a[] with 42, a more typical version that
moves larger blocks won't. This example is really obvious (it's
why there's also memmove()) but there's plenty of more subtle ones.
This one is why I added a "_memlzcpy()" function to my C library,
whose main purpose is to give this sort of self-overlapping copy
behavior (and to consolidate nearly every LZ77 style decompressor
otherwise needing to supply their own version).
In the case of a short backwards copy, it will call "memmove()", but
as noted the behavior in the case of a short forwards copy is different.
For non-overlap cases it can just invoke "memcpy()".
"memmove" will not fill the array above with 42. "memmove" acts as
though it copies the source to a temporary buffer, then copies that
temporary buffer to the destination. (If you want to fill the buffer
with the value 42, "memset" is the function to use.)
How is your "_memlzcpy" defined that is different from that?
On Mon, 22 Jun 2026 18:49:40 -0400, George Neuner wrote:
On Sat, 20 Jun 2026 10:15:41 -0400, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
Robert Swindells [2026-06-19 11:20:10] wrote:
On Fri, 19 Jun 2026 06:02:16 GMT, Anton Ertl wrote:[...]
Another architectural feature: One might think that tagging support
would help dynamically typed programming languages (e.g., Lisp), and
SPARC contains some support for that, but as one of the IIRC Franz
Lisp developers has explained in this newsgroup, they actually did
not use this feature, because the performance benefit was not big
enough to
Franz Lisp doesn't use tags at all and only ran on VAX and 68k.
I guess you two aren't talking about the same "Franz Lisp". AFAIK Anton
is referring to the commercial Common Lisp compiler associated with the
Franz Inc company, marketed under the name "Allegro".
=== Stefan
ISTM there were at least a couple of Lisps available for the Vax. I
can't speak to Franz, but I do know at least one Vax Lisp was a BIBOP[1]
system that (generally) did not use tags.
Franz Lisp used BiBOP.
On 6/25/2026 2:22 AM, David Brown wrote:
On 24/06/2026 23:34, BGB wrote:
This one is why I added a "_memlzcpy()" function to my C library,
whose main purpose is to give this sort of self-overlapping copy
behavior (and to consolidate nearly every LZ77 style decompressor
otherwise needing to supply their own version).
In the case of a short backwards copy, it will call "memmove()", but
as noted the behavior in the case of a short forwards copy is different.
For non-overlap cases it can just invoke "memcpy()".
"memmove" will not fill the array above with 42. "memmove" acts as
though it copies the source to a temporary buffer, then copies that
temporary buffer to the destination. (If you want to fill the buffer
with the value 42, "memset" is the function to use.)
Yeah, this is why I created "_memlzcpy()", because the defined behavior
for "memmove()" is not what one wants for self-overlapping forward copy.
How is your "_memlzcpy" defined that is different from that?
Here:
_memlzcpy(dst+1, dst, len);
Is functionally equivalent to:
memset(dst+1, *dst, len);
But, it can do more:
_memlzcpy(dst+2, dst, len); //repeating 2-byte pattern
_memlzcpy(dst+3, dst, len); //repeating 3-byte pattern
...
So, required to work for every self-overlap distance.
Or, in the case as commonly used in an LZ77 style decompressor:
_memlzcpy(dest, dest-distance, length);
Though, there are also:
_memcpyf()
_memmovef()
_memlzcpyf()
Where the 'f' in this case means:
Allowed to be a little faster by potentially going up to 32 bytes extra.
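For reference, the semantics described above can be sketched as a strictly forward byte copy. This is not the actual "_memlzcpy()" source, which is not shown in the thread; lz_copy is a hypothetical stand-in that ignores the fast paths through memcpy() and memmove().

```c
#include <stddef.h>

/* Hypothetical stand-in for the described behavior: copy strictly
   forward one byte at a time, so bytes written early in the call are
   re-read later when dst sits just ahead of src. */
void *lz_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
    return dst;
}
```

With a buffer starting "ab", lz_copy(buf+2, buf, 6) produces "abababab", and a distance of 1 repeats the single byte like memset. A real LZ77 decoder would want wider copies for speed, which is presumably where the 'f' variants come in.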
Robert Swindells [2026-06-24 14:38:02] wrote:
On Mon, 22 Jun 2026 18:49:40 -0400, George Neuner wrote:
On Sat, 20 Jun 2026 10:15:41 -0400, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
Robert Swindells [2026-06-19 11:20:10] wrote:
On Fri, 19 Jun 2026 06:02:16 GMT, Anton Ertl wrote:[...]
Another architectural feature: One might think that tagging support
would help dynamically typed programming languages (e.g., Lisp),
and SPARC contains some support for that, but as one of the IIRC
Franz Lisp developers has explained in this newsgroup, they
actually did not use this feature, because the performance benefit
was not big enough to
Franz Lisp doesn't use tags at all and only ran on VAX and 68k.
I guess you two aren't talking about the same "Franz Lisp". AFAIK Anton
is referring to the commercial Common Lisp compiler associated with
the Franz Inc company, marketed under the name "Allegro".
=== Stefan
ISTM there were at least a couple of Lisps available for the Vax. I
can't speak to Franz, but I do know at least one Vax Lisp was a
BIBOP[1]
system that (generally) did not use tags.
Franz Lisp used BiBOP.
Side note: the BiBoP technique is largely orthogonal to the
architectural support for pointer tagging, because usually BiBoP is used
to "eliminate" the tags present inside the heap representation of
objects rather than the few tagbits stolen from pointers: the purpose of
those tagbits is usually to be able to determine the type of the object
*without* any memory access whereas BiBoP stores the corresponding info
in memory.
E.g. tagbits are most commonly used to distinguish between an immediate
small integer value and a pointer. BiBoP wouldn't help with that,
forcing the small integer to be stored in some "page of small integers"
which could have a very serious performance impact.
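A sketch of the common low-bit scheme, to show why no memory access is needed. The tag assignment (low bit 1 for a small integer, 0 for a pointer) and all names are assumptions for illustration; real systems differ in which bits they steal.

```c
#include <stdint.h>

typedef uintptr_t value;   /* hypothetical tagged machine word */

/* Assumed scheme: low bit 1 = immediate small integer, low bit 0 =
   pointer to an object that is at least 2-byte aligned. */
int is_fixnum(value v)
{
    return (int)(v & 1);
}

value make_fixnum(intptr_t n)
{
    return ((uintptr_t)n << 1) | 1;
}

/* Relies on >> of a negative intptr_t being an arithmetic shift, which
   is implementation-defined in C but what gcc and clang do. */
intptr_t fixnum_value(value v)
{
    return (intptr_t)v >> 1;
}

value make_pointer(void *p)
{
    return (uintptr_t)p;
}
```

The test in is_fixnum touches only the word itself, whereas a BiBoP lookup has to read a page table entry.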
=== Stefan
BGB wrote:
On 6/25/2026 2:22 AM, David Brown wrote:
"memmove" will not fill the array above with 42. "memmove" acts as
though it copies the source to a temporary buffer, then copies that
temporary buffer to the destination. (If you want to fill the buffer
with the value 42, "memset" is the function to use.)
Yeah, this is why I created "_memlzcpy()", because the defined behavior
for "memmove()" is not what one wants for self-overlapping forward copy.
How is your "_memlzcpy" defined that is different from that?
Here:
_memlzcpy(dst+1, dst, len);
Is functionally equivalent to:
memset(dst+1, *dst, len);
But, it can do more:
_memlzcpy(dst+2, dst, len); //repeating 2-byte pattern
_memlzcpy(dst+3, dst, len); //repeating 3-byte pattern
...
So, required to work for every self-overlap distance.
Or, in the case as commonly used in an LZ77 style decompressor:
_memlzcpy(dest, dest-distance, length);
Though, there are also:
_memcpyf()
_memmovef()
_memlzcpyf()
Where the 'f' in this case means:
Allowed to be a little faster by potentially going up to 32 bytes extra.
I'm guessing you really meant up to 31 bytes extra?
This is what my own (faster than Google's version) LZ4 decompressor uses internally.
I am using either a pair of SSE or a single AVX register (so 32 bytes in
both cases) as the copy granule. For the specific, very common case of an
overlapping copy that unrolls RLL-encoded data, I start by loading the
starting pattern into the bottom of a register, then use the pattern
length to index into a table of swizzle patterns that will generate the
required results, for any pattern up to 32 bytes long.
swizzle_table:
[0,0,0,0,0,0,0,...
[0,1,0,1,0,1,0,1,...
[0,1,2,0,1,2,0,1,2,...
[0,1,2,3,0,1,2,3,...
[0,1,2,3,4,0,1,2,3,..
etc.
Note that having 31 entries of 32 bytes each means that I'm allocating almost a KB of $L1 cache space just for this table, but when you're decompressing lots of data it pays off.
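A scalar sketch of how such a table could be built and applied to the first granule. This is not the actual decompressor code; the names are hypothetical, and the inner loop of expand_pattern only stands in for what would be a single byte-shuffle instruction in a SIMD version.

```c
#include <stdint.h>

#define GRANULE 32   /* bytes per copy granule, as described above */

/* Row len-1 holds the source index for each output byte: i mod len.
   31 rows of 32 bytes, i.e. just under 1 KB. */
static uint8_t swizzle_table[GRANULE - 1][GRANULE];

void init_swizzle_table(void)
{
    for (int len = 1; len < GRANULE; len++)
        for (int i = 0; i < GRANULE; i++)
            swizzle_table[len - 1][i] = (uint8_t)(i % len);
}

/* Scalar stand-in for the shuffle: fill one 32-byte granule by
   repeating a pattern of len bytes (1 <= len <= 31). */
void expand_pattern(uint8_t out[GRANULE], const uint8_t *pattern, int len)
{
    const uint8_t *idx = swizzle_table[len - 1];
    for (int i = 0; i < GRANULE; i++)
        out[i] = pattern[idx[i]];
}
```

For pattern lengths that do not divide 32, the next granule has to start at a different phase of the pattern; how the real code handles that is not stated here, so this sketch leaves it out.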
Terje
On 2026-Jun-25 08:39, Terje Mathisen wrote:We do have that, in the form of a masked move, but it is more efficient
BGB wrote:
On 6/25/2026 2:22 AM, David Brown wrote:
On 24/06/2026 23:34, BGB wrote:
On 6/24/2026 3:17 PM, John Levine wrote:"memmove" will not fill the array above with 42. "memmove" acts >>>> as though it copies the source to a temporary buffer, then copies
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias. >>>>>>>>I wish. Actually, by default gcc assumes (i.e., it does not >>>>>>>> prove)
that pointers to different types (except char) do not point to the >>>>>>>> same address. One has to turn that off with
-fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether >>>>>>> the rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to
write code that
is intuitively reasonable and sometimes works but isn't portable, >>>>>> e.g.:
    char a[100];
    a[0] = 42;
    memcpy(a+1, a, 99);
A naive byte copy will fill a[] with 42, a more typical version that >>>>>> moves larger blocks won't. This example is really obvious (it's >>>>>> why there's also memmove()) but there's plenty of more subtle ones. >>>>>>
This one is why I added a "_memlzcpy()" function to my C library,
whose main purpose is to give this sort of self-overlapping copy
behavior (and to consolidate nearly every LZ77 style decompressor
otherwise needing to supply their own version).
In the case of a short backwards copy, it will call "memmove()",
but as noted the behavior in the case of a short forwards copy are >>>>> different.
For non-overlap cases it can just invoke "memcpy()".
that temporary buffer to the destination. (If you want to fill >>>> the buffer with the value 42, "memset" is the function to use.)
Yeah, this is why I created "_memlzcpy()", because the defined
behavior for "memmove()" is not what one wants for self-overlapping
forward copy.
How is your "_memlzcpy" defined that is different from that?
Here:
  _memlzcpy(dst+1, dst, len);
Is functionally equivalent to:
  memset(dst+1, *dst, len);
But, it can do more:
  _memlzcpy(dst+2, dst, len); //repeating 2-byte pattern
  _memlzcpy(dst+3, dst, len); //repeating 3-byte pattern
  ...
So, required to work for every self-overlap distance.
Or, in the case as commonly used in an LZ77 style decompressor:
  _memlzcpy(dest, dest-distance, length);
Though, there are also:
  _memcpyf()
  _memmovef()
  _memlzcpyf()
Where the 'f' in this case means:
Allowed to be a little faster by potentially going up to 32 bytes extra.
I'm guessing you really meant up to 31 bytes extra?
This is what my own (faster than Google's version) LZ4 decompressor
uses internally.
I am using either a pair of SSE or a single AVX register (so 32 bytes
in both cases) as the copy granule. For the specific, very common, case
of an overlapping copy that unrolls RLL-encoded data, I start by
loading the starting pattern into the bottom of a register, then use
the pattern length to index into a table of swizzle patterns that will
generate the required results, for any pattern up to 32 bytes long.
swizzle_table:
[0,0,0,0,0,0,0,...
[0,1,0,1,0,1,0,1,...
[0,1,2,0,1,2,0,1,2,...
[0,1,2,3,0,1,2,3,...
[0,1,2,3,4,0,1,2,3,..
etc.
Note that having 31 entries of 32 bytes each means that I'm allocating
almost a KB of $L1 cache space just for this table, but when you're
decompressing lots of data it pays off.
If I had 256b, 32B registers I would like to have LDV (Load Variable)
and STV (Store Variable) instructions, which take an address, a src/dst
SIMD register, and either a scalar register or immediate byte count in
the range 0..32. LDV loads the specified number of bytes into the SIMD
register starting at the least significant byte and zero-fills any
unread ones.
These should be relatively easy to implement if one already has
unaligned SIMD LD/ST.
One might also consider LDBV/STBV variable length bit vectors 0 to 256b,
Thomas Koenig <tkoenig@netcologne.de> writes:
BGB <cr88192@gmail.com> schrieb:
Usual downside is that the excessive parentheses tend to turn into a usability issue.
Ample fun has been made of this over time.
From rec.humor.funny:
From: jasmerb@mist.cs.orst.edu (Bryce Jasmer)
Newsgroups: rec.humor.funny
Subject: The Strategic Defense Initiative (SDI/Star Wars)
Keywords: computer, funny
Message-ID: <137457@looking.on.ca>
Date: 23 Apr 90 10:30:08 GMT
Sender: funnyr@looking.on.ca
Posted: Mon Apr 23 11:30:08 1990
Reply-Path: mist.cs.orst.edu!jasmerb
Through some clever security hole manipulation if I have been able to
break into all of the government's computers and acquire the Lisp code
to SDI. Here is the last page (tail -10) of it to prove that I actually have the code:
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) ))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html
No AI was used in the composition of this message
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
A naive byte copy will fill a[] with 42, a more typical version that
moves larger blocks won't. This example is really obvious (it's
why there's also memmove()) but there's plenty of more subtle ones.
On 24/06/2026 23:34, BGB wrote:---------------------
"memmove" will not fill the array above with 42. "memmove" acts as
though it copies the source to a temporary buffer, then copies that temporary buffer to the destination. (If you want to fill the buffer
with the value 42, "memset" is the function to use.)
How is your "_memlzcpy" defined that is different from that?
David Brown <david.brown@hesbynett.no> posted:
On 24/06/2026 23:34, BGB wrote:---------------------
"memmove" will not fill the array above with 42. "memmove" acts as
though it copies the source to a temporary buffer, then copies that
temporary buffer to the destination. (If you want to fill the buffer
with the value 42, "memset" is the function to use.)
Act as though it copies twice is utterly unnecessary as overlapping
memory can simply be performed back-to-front instead of front-to-back.
How is your "_memlzcpy" defined that is different from that?
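As a rough illustration of the back-to-front point above, here is a minimal byte-at-a-time sketch of memmove-like behavior that picks the copy direction from the relative position of the two buffers. The name `sketch_memmove` is made up for this example, and it is not how any real C library implements `memmove()`; production versions move much larger blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a byte-wise sketch of memmove semantics. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Compare the addresses as integers: relational comparison of
       pointers into unrelated objects is not defined by the C standard. */
    if ((uintptr_t)d <= (uintptr_t)s) {
        /* Destination starts at or below the source: front-to-back
           never overwrites a source byte before it has been read. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts above the source: go back-to-front so the
           tail of the source is read before it is overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

With `char a[8] = "abcdefg";`, calling `sketch_memmove(a + 1, a, 6)` shifts the text up by one byte rather than smearing `a[0]` across the array, which is the difference from the naive forward byte copy in the `memcpy(a+1, a, 99)` example.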
John Levine <johnl@taugh.com> posted:
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
Why not::
memset( a, 42, 100 );
BGB wrote:
On 6/25/2026 2:22 AM, David Brown wrote:
On 24/06/2026 23:34, BGB wrote:
On 6/24/2026 3:17 PM, John Levine wrote:
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
    char a[100];
    a[0] = 42;
    memcpy(a+1, a, 99);
A naive byte copy will fill a[] with 42, a more typical version that
moves larger blocks won't. This example is really obvious (it's
why there's also memmove()) but there's plenty of more subtle ones.
This one is why I added a "_memlzcpy()" function to my C library,
whose main purpose is to give this sort of self-overlapping copy
behavior (and to consolidate nearly every LZ77 style decompressor
otherwise needing to supply their own version).
In the case of a short backwards copy, it will call "memmove()", but
as noted the behavior in the case of a short forwards copy is
different.
For non-overlap cases it can just invoke "memcpy()".
"memmove" will not fill the array above with 42. "memmove" acts as
though it copies the source to a temporary buffer, then copies that
temporary buffer to the destination. (If you want to fill the
buffer with the value 42, "memset" is the function to use.)
Yeah, this is why I created "_memlzcpy()", because the defined
behavior for "memmove()" is not what one wants for self-overlapping
forward copy.
How is your "_memlzcpy" defined that is different from that?
Here:
_memlzcpy(dst+1, dst, len);
Is functionally equivalent to:
memset(dst+1, *dst, len);
But, it can do more:
_memlzcpy(dst+2, dst, len); //repeating 2-byte pattern
_memlzcpy(dst+3, dst, len); //repeating 3-byte pattern
...
So, required to work for every self-overlap distance.
Or, in the case as commonly used in an LZ77 style decompressor:
_memlzcpy(dest, dest-distance, length);
Though, there are also:
_memcpyf()
_memmovef()
_memlzcpyf()
Where the 'f' in this case means:
Allowed to be a little faster by potentially going up to 32 bytes extra.
I'm guessing you really meant up to 31 bytes extra?
This is what my own (faster than Google's version) LZ4 decompressor
uses internally.
I am using either a pair of SSE or a single AVX register (so 32 bytes
in both cases) as the copy granule. For the specific, very common, case
of an overlapping copy that unrolls RLL-encoded data, I start by
loading the starting pattern into the bottom of a register, then use
the pattern length to index into a table of swizzle patterns that will
generate the required results, for any pattern up to 32 bytes long.
swizzle_table:
[0,0,0,0,0,0,0,...
[0,1,0,1,0,1,0,1,...
[0,1,2,0,1,2,0,1,2,...
[0,1,2,3,0,1,2,3,...
[0,1,2,3,4,0,1,2,3,..
etc.
Note that having 31 entries of 32 bytes each means that I'm allocating almost a KB of $L1 cache space just for this table, but when you're decompressing lots of data it pays off.
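For readers who want to see the shape of that table, here is a scalar C sketch of the idea as described: row `len - 1` holds the indices `i % len`, and one 32-byte granule is produced by looking each output byte up through that row. The names (`swizzle_table`, `init_swizzle_table`, `expand_pattern`) are invented for this sketch, and the inner loop is only a stand-in for whatever SIMD byte shuffle a real decompressor would use. It fills a single granule and does not show how later granules are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { GRANULE = 32, MAX_PATTERN = 31 };

/* 31 rows of 32 index bytes, just under 1 KB, as in the post above. */
static uint8_t swizzle_table[MAX_PATTERN][GRANULE];

/* Row for pattern length len is 0,1,..,len-1,0,1,.. repeated to 32 bytes. */
void init_swizzle_table(void)
{
    for (int len = 1; len <= MAX_PATTERN; len++)
        for (int i = 0; i < GRANULE; i++)
            swizzle_table[len - 1][i] = (uint8_t)(i % len);
}

/* Scalar stand-in for a byte shuffle: out[i] = pattern[index[i]]. */
void expand_pattern(uint8_t out[GRANULE], const uint8_t *pattern, int len)
{
    assert(len >= 1 && len <= MAX_PATTERN);
    const uint8_t *index = swizzle_table[len - 1];
    for (int i = 0; i < GRANULE; i++)
        out[i] = pattern[index[i]];
}
```

After `init_swizzle_table()`, expanding the 3-byte pattern `{7, 8, 9}` gives 32 bytes that repeat 7, 8, 9 from the start of the granule.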
David Brown <david.brown@hesbynett.no> posted:
On 24/06/2026 23:34, BGB wrote:---------------------
"memmove" will not fill the array above with 42. "memmove" acts as
though it copies the source to a temporary buffer, then copies that
temporary buffer to the destination. (If you want to fill the buffer
with the value 42, "memset" is the function to use.)
Act as though it copies twice is utterly unnecessary as overlapping
memory can simply be performed back-to-front instead of front-to-back.
How is your "_memlzcpy" defined that is different from that?
Andy Valencia <vandys@vsta.org> posted:
Thomas Koenig <tkoenig@netcologne.de> writes:
BGB <cr88192@gmail.com> schrieb:
Usual downside is that the excessive parentheses tend to turn into a
usability issue.
Ample fun has been made of this over time.
From rec.humor.funny:
From: jasmerb@mist.cs.orst.edu (Bryce Jasmer)
Newsgroups: rec.humor.funny
Subject: The Strategic Defense Initiative (SDI/Star Wars)
Keywords: computer, funny
Message-ID: <137457@looking.on.ca>
Date: 23 Apr 90 10:30:08 GMT
Sender: funnyr@looking.on.ca
Posted: Mon Apr 23 11:30:08 1990
Reply-Path: mist.cs.orst.edu!jasmerb
Through some clever security hole manipulation if I have been able to
break into all of the government's computers and acquire the Lisp code
to SDI. Here is the last page (tail -10) of it to prove that I actually
have the code:
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
I remember the LISP on PDP-8. One could use the character ] to mean as many )s as needed to close the lambda.
The glibc function ::backtrace can be called at any time, in any context.
Then there are the unix context functions that also allow access to
resources not normally visible to an application - getcontext(2), makecontext(3) and the setjmp/sigsetjmp functions which also
gather the thread context, including the current stack pointer.
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
/**
* Log a simulator stack traceback.
*/
void
c_osdep::backtrace(c_logger *lp)
{
int num_frames;
void *framelist[100];
char **strings;
num_frames = ::backtrace(framelist, sizeof(framelist)/sizeof(framelist[0]));
strings = ::backtrace_symbols(framelist, num_frames);
if (strings == NULL) {
lp->log("Unable to obtain simulator stack traceback: %s\n",
strerror(errno));
return;
}
for(int frame=0; frame < num_frames; frame++) {
lp->log("[%2.2d] %s\n", frame, strings[frame]);
}
::free(strings);
}
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 6/19/2026 11:59 AM, John Levine wrote:
According to David Brown <david.brown@hesbynett.no>:
Possibly the biggest millstone around the neck of computing
architectures is the C language. ...
De-facto standards are /always/ albatrosses to some extent. Things are
done that way because things are done that way - processors are designed
to run C (or C-model languages, if you like) because that's what
existing code is written in, and code is written in C (or similar
languages, or languages with a VM written in C) because that's how
existing processors work.
C killed off every memory model other than flat byte addressed memory.
Pointers are sort of typed, but any real C program does stuff like this:
p = (struct foo *) malloc(42 * sizeof(struct foo));
Fwiw, why all of the casts?
C and C++ handle void* conversions differently. You must cast
the malloc result to a pointer of the declared type when using C++.
It doesn't hurt to add the cast in C, and may help with documenting
the intention of the programmer who wrote the code.
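For comparison, a common C-only spelling of the same allocation drops the cast and takes the element size from the pointer itself. This is just a sketch of that idiom; `struct foo` and the helper name `alloc_foos` are placeholders, and it will not compile as C++ for the reason given above.

```c
#include <assert.h>
#include <stdlib.h>

struct foo { int x; };

/* Hypothetical helper: allocate an array of count struct foo objects. */
struct foo *alloc_foos(size_t count)
{
    assert(count > 0);
    /* In C, void * converts implicitly, so no cast is required.
       sizeof *p follows the declared type of p if it ever changes. */
    struct foo *p = malloc(count * sizeof *p);
    return p; /* NULL on allocation failure; the caller must check */
}
```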
GLIBC has a function to obtain a backtrace at a current point
in time. This is called in the context of the thread that invokes
the call. It requires access to the call records on the stack
in the context of the thread (the glibc functions are backtrace(3)
and backtrace_symbols(3)).
/**
* Log a simulator stack traceback.
*/
void
c_osdep::backtrace(c_logger *lp)
scott@slp53.sl.home (Scott Lurndal) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
/**
* Log a simulator stack traceback.
*/
void
c_osdep::backtrace(c_logger *lp)
{
int num_frames;
void *framelist[100];
char **strings;
num_frames = ::backtrace(framelist, sizeof(framelist)/sizeof(framelist[0]));
Where does ::backtrace get access to the number of preserved registers
on the stack and where the return address is on a per subroutine basis ??
That is: each stack frame is of a different size with return address at a different spot per subroutine.
BGB <cr88192@gmail.com> posted:
On 6/22/2026 7:38 AM, Niklas Holsti wrote:-------------
On 2026-06-22 13:44, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
In my case, I tended to use more conservative approaches and then only
optimize based on what can be verified by the compiler within certain
fundamental assumptions.
Say:
Pointer 1 points at a stack array in the local function;
Pointer 2 was derived from taking the address of a global array;
Compiler can safely assume no-alias.
Also, if two pointers were passed into a function, can also assume they
don't alias with a pointer to a local array;
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the 2 argument alias, it is a programming error.
-----------------
I am reminded of the person, apparently very religious, who some decades
ago posted to solicit help for reimplementing all of computing (gcc,
GNU, et cetera) on Biblical principles, because he thought Richard
Stallman was too atheistic and had tainted his products. I have not
heard how that went.
Rick...
--------
But if you know that you have a "page of small integers" then you can just
do address comparisons between them, the Franz Lisp compiler did this.
According to MitchAlsup <user5857@newsgrouper.org.invalid>:
John Levine <johnl@taugh.com> posted:
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
Why not::
memset( a, 42, 100 );
Jeez, it's an example.
John Levine [2026-06-25 19:19:47] wrote:
According to MitchAlsup <user5857@newsgrouper.org.invalid>:
John Levine <johnl@taugh.com> posted:
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
Why not::
memset( a, 42, 100 );
Jeez, it's an example.
It's an example, indeed, but it's a pretty bad one since using `memset`
is more clear, more concise, and actually works, whereas your example
seems very contrived.
John Levine <johnl@taugh.com> posted:
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
Why not::
memset( a, 42, 100 );
MitchAlsup wrote:
John Levine <johnl@taugh.com> posted:
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the
rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to write
code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
Why not::
memset( a, 42, 100 );
In the case of a single repeating byte, memset is of course optimal, but
the same LZ4 encoding is used to encode any repeating pattern, of
lengths from 1 and up. There is no indexed memset where the pattern is
of arbitrary length.
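To make the "repeating pattern of any length" case concrete, here is a minimal sketch of the forward, byte-at-a-time overlapping copy that LZ77-style decoders rely on. The helper name `lz_copy` is invented for this example; it is neither the `_memlzcpy()` mentioned earlier in the thread nor the fast path of any real LZ4 decoder, only the reference behavior such routines are expected to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy length bytes to dst from distance bytes
   behind dst, strictly front to back, so that bytes written earlier in
   this call are re-read later when length exceeds distance. */
void lz_copy(unsigned char *dst, size_t distance, size_t length)
{
    assert(distance > 0);
    const unsigned char *src = dst - distance;
    for (size_t i = 0; i < length; i++)
        dst[i] = src[i];
}
```

With a distance of 1 this behaves like a memset of the preceding byte; with a distance of 3 and the bytes "abc" already in the buffer, it extends them to "abcabcabc...".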
On 6/23/2026 5:54 PM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 6/22/2026 7:38 AM, Niklas Holsti wrote:-------------
On 2026-06-22 13:44, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
In my case, I tended to use more conservative approaches and then only
optimize based on what can be verified by the compiler within certain
fundamental assumptions.
Say:
Pointer 1 points at a stack array in the local function;
Pointer 2 was derived from taking the address of a global array;
Compiler can safely assume no-alias.
Also, if two pointers were passed into a function, can also assume they
don't alias with a pointer to a local array;
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the 2 argument alias, it is a programming error.
Sorry if this is way off base, but well...
What about container_of, or CONTAINING_RECORD?
scott@slp53.sl.home (Scott Lurndal) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
/**
* Log a simulator stack traceback.
*/
void
c_osdep::backtrace(c_logger *lp)
{
int num_frames;
void *framelist[100];
char **strings;
num_frames = ::backtrace(framelist, sizeof(framelist)/sizeof(framelist[0]));
Where does ::backtrace get access to the number of preserved registers
on the stack and where the return address is on a per subroutine basis ??
Scott Lurndal <scott@slp53.sl.home> schrieb:
GLIBC has a function to obtain a backtrace at a current point
in time. This is called in the context of the thread that invokes
the call. It requires access to the call records on the stack
in the context of the thread (the glibc functions are backtrace(3)
and backtrace_symbols(3)).
/**
* Log a simulator stack traceback.
*/
void
c_osdep::backtrace(c_logger *lp)
Nit: That is not glibc code, glibc code is C (it would be strange to
have a C++ runtime library for C...)
it is certainly the case that people
have used memcpy() with overlapping regions and an assumption that it
copies forward in some way.
David Brown <david.brown@hesbynett.no> writes:
it is certainly the case that people
have used memcpy() with overlapping regions and an assumption that it
copies forward in some way.
More precisely, in 2010 there was a big flamewar because a newer glibc
used backwards stride on some processors for some combinations of
source and destination addresses, and this broke a pre-existing binary
(of a Flash player IIRC).
John Levine <johnl@taugh.com> posted:
According to David Brown <david.brown@hesbynett.no>:
On 24/06/2026 07:48, Anton Ertl wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
C requires the compiler to prove that the pointers cannot alias.
I wish. Actually, by default gcc assumes (i.e., it does not prove)
that pointers to different types (except char) do not point to the
same address. One has to turn that off with -fno-strict-aliasing.
Other C compilers use the same assumption.
That's the way C is defined. It is debatable as to whether the rules in
the C standard are ideal ...
One of the less fortunate things about C is that it is easy to write code that
is intuitively reasonable and sometimes works but isn't portable, e.g.:
char a[100];
a[0] = 42;
memcpy(a+1, a, 99);
Why not::
memset( a, 42, 100 );
?????
A naive byte copy will fill a[] with 42, a more typical version that
moves larger blocks won't. This example is really obvious (it's
why there's also memmove()) but there's plenty of more subtle ones.
On 2026-Jun-26 06:24, Chris M. Thomasson wrote:
On 6/23/2026 5:54 PM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 6/22/2026 7:38 AM, Niklas Holsti wrote:-------------
On 2026-06-22 13:44, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
In my case, I tended to use more conservative approaches and then only
optimize based on what can be verified by the compiler within certain
fundamental assumptions.
Say:
Pointer 1 points at a stack array in the local function;
Pointer 2 was derived from taking the address of a global array;
Compiler can safely assume no-alias.
Also, if two pointers were passed into a function, can also assume they
don't alias with a pointer to a local array;
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the 2 argument alias, it is a programming
error.
Sorry if this is way off base, but well...
What about container_of, or CONTAINING_RECORD?
If that is what I think it is, where it casts from
a pointer to a field inside a struct back to the containing struct
by subtracting the field byte offset and changing the pointer type,
irrespective of programming language that mechanism has been used
by operating systems at least since RSX days.
It is a compact way of having structs linked to many other structures.
That macro is just a variant of the mechanism for C.
The method is used by WinNT and Linux, and I believe also by the BSD's.
GCC has a compile option, -fno-strict-aliasing, that anyone
using it and doing "illegal" pointer casting must use.
In Windows land, pointer casting at least used to be Microsoft's
recommended method and is supported by their compiler because
they use it too, extensively.
I have used it when I had complex multiple linkages between data
structures.
Say an object is in multiple double linked lists and an index tree and I
need to cast from a pointer to a list link field back to the object
containing that link field. I also often put a validity check marker for
each object type at the start of the container and Assert its
correctness. The marker is zeroed when the container is destroyed to catch
any dangling references.
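A small sketch of the pattern described above, for anyone who has not seen it. Everything here is illustrative: `CONTAINER_OF` is written out locally rather than taken from any particular kernel header, and `struct widget`, `WIDGET_MAGIC` and `widget_from_age_link` are made-up names. It shows one object sitting on two lists, the cast from a link field back to its container, and the marker check.

```c
#include <assert.h>
#include <stddef.h>

/* Step back from a member pointer to the start of the enclosing struct. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_link { struct list_link *prev, *next; };

#define WIDGET_MAGIC 0x57444754u /* arbitrary marker value for this sketch */

struct widget {
    unsigned magic;            /* validity marker at the start of the container */
    int id;
    struct list_link by_owner; /* the same object lives on two lists */
    struct list_link by_age;
};

/* Recover the widget from a pointer to its by_age link field and
   assert the marker to catch stale or mistyped pointers. */
struct widget *widget_from_age_link(struct list_link *link)
{
    struct widget *w = CONTAINER_OF(link, struct widget, by_age);
    assert(w->magic == WIDGET_MAGIC);
    return w;
}
```

Zeroing `magic` when the object is destroyed makes the assertion fire on a later dangling reference, which matches the marker use described above.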
On 6/26/2026 7:11 AM, EricP wrote:
On 2026-Jun-26 06:24, Chris M. Thomasson wrote:
On 6/23/2026 5:54 PM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 6/22/2026 7:38 AM, Niklas Holsti wrote:-------------
On 2026-06-22 13:44, Thomas Koenig wrote:
Niklas Holsti <niklas.holsti@tidorum.invalid> schrieb:
On 2026-06-21 22:15, David Brown wrote:
On 21/06/2026 20:57, MitchAlsup wrote:
In my case, I tended to use more conservative approaches and then only
optimize based on what can be verified by the compiler within certain
fundamental assumptions.
Say:
Pointer 1 points at a stack array in the local function;
Pointer 2 was derived from taking the address of a global array;
Compiler can safely assume no-alias.
Also, if two pointers were passed into a function, can also assume they
don't alias with a pointer to a local array;
C requires the compiler to prove that the pointers cannot alias.
Fortran specifies that if the 2 argument alias, it is a programming
error.
Sorry if this is way off base, but well...
What about container_of, or CONTAINING_RECORD?
If that is what I think it is, where it cast from
a pointer to a field inside a struct back to the containing struct
by subtracting the field byte offset and changing the pointer type,
irrespective of programming language that mechanism has been used
by operating systems at least since RSX days.
It is a compact way of having structs linked to many other structures.
That macro is just a variant of the mechanism for C.
The method is used by WinNT and Linux, and I believe also by the BSD's.
GCC has a compile option, -fno-strict-aliasing, that anyone
using it and doing "illegal" pointer casting must use.
In Windows land, pointer casting at least used to be Microsoft's
recommended method and is supported by their compiler because
they use it too, extensively.
I have used it when I had complex multiple linkages between data
structures.
Say an object is in multiple double linked lists and an index tree and I
need to cast from a pointer to a list link field back to the object
containing that link field. I also often put a validity check marker for
each object type at the start of the container and Assert its
correctness.
The marker is zeroed when the container is destroyed to catch
any dangling references.
Yup. You got it and basically had to use it the same way I have in the
past. It's really cool. Also, check this shit out:
#define RALLOC_ALIGN_OF(mp_type) \
offsetof( \
struct { \
char pad_RALLOC_ALIGN_OF; \
mp_type type_RALLOC_ALIGN_OF; \
}, \
type_RALLOC_ALIGN_OF \
)
;^D
a[0] = 42;
memcpy(a+1, a, 99);
A naive byte copy will fill a[] with 42, a more typical version that
moves larger blocks won't. This example is really obvious (it's
why there's also memmove()) but there's plenty of more subtle ones.
The Burroughs B3500 and medium systems successors, which is a
memory-to-memory architecture, had a number of move instructions,
several of which had architecturally defined semantics for
overlapping source and destination fields, which included
functionality similar to that you describe above. ...