I had considered proceeding on to a CISC Concertina III. However, after starting to look at that project, I found that there was a lack of opcode space in one spot.
Just the other day, though, it occurred to me that there was a possibility of improving Concertina II.
After a long period of revising it - since I was dissatisfied with the
various options for shortening the memory-reference instructions by one
bit - I decided to leave the memory-reference instructions at their full
length, and claim opcode space from somewhere less important: I replaced
the 14-bit short instructions with 13-bit short instructions.
What occurred to me was that I could instead keep the full-length
memory-reference instructions and have 14-bit short instructions, and fit
everything else into the space left by unused opcodes for 14-bit short
instructions.
In order to make that work, I had to alter the 14-bit short instructions
a bit. I re-ordered the fields in the shift instructions so that I could
put the supervisor call instruction in with them, and the 14-bit branch
instructions now come with only a three-bit field to select the condition
they can test.
That let me fit all the instructions in.
The opcode space for block headers, however, was reduced. Which is not
really that much of a bad thing; it means that now the block headers will
be pared down (to just one!) and thus greatly simplified.
John Savard
--- Synchronet 3.21d-Linux NewsLink 1.2
I smell danger--running out of OpCode space early in the design.
After the general conceptualization of the ISA you should have half of
the OpCode space available for future additions !!.
On Sat, 07 Mar 2026 19:00:02 +0000, MitchAlsup wrote:
I smell danger--running out of OpCode space early in the design. After
the general conceptualization of the ISA you should have half of the
OpCode space available for future additions !!.
Well, while it is true that only 1/64th of the opcode space for 32-bit instructions is left, my current plan is to use 1/128th for headers, and
the other 1/128th for instructions longer than 32 bits. Which means that there is still space for 511 times as many instructions as are already defined, even if I never went beyond 48 bits.
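The 511x figure above can be sanity-checked with a rough back-of-envelope
calculation. This sketch treats the whole 32-bit encoding space as already
defined, which is an approximation:

```python
# Rough check of the "511 times as many" claim. Approximation: it
# treats the entire 32-bit encoding space as already in use.
prefix_fraction = 128        # 1/128th of the 32-bit space for longer instructions
extra_bits = 48 - 32         # a 48-bit instruction carries 16 extra bits

longer_encodings = (2**32 // prefix_fraction) * 2**extra_bits   # 2**41 encodings
existing = 2**32                                                # ~all 32-bit encodings

# 2**41 / 2**32 = 512 times the existing space, i.e. 511 times *more*.
print(longer_encodings // existing - 1)
```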
quadi <quadibloc@ca.invalid> posted:
I had considered proceeding on to a CISC Concertina III. However, after
starting to look at that project, I found that there was a lack of opcode
space in one spot.
Just the other day, though, it occurred to me that there was a possibility
of improving Concertina II.
After a long period of changing it, because I was dissatisfied with the
various options for shortening the memory-reference instructions by one
bit, I decided to leave the memory-reference instructions at their full
length, and claim opcode space from somewhere less important: I replaced
14-bit short instructions by 13-bit short instructions.
What occurred to me was that I could instead keep the full-length memory-
reference instructions and have 14-bit short instructions, and fit
everything else into space left by unused opcodes for 14-bit short
instructions.
In order to make that work, I had to alter the 14-bit short instructions a
bit. I re-ordered the fields in the shift instructions, so that I could
put the supervisor call instruction in with them, and the 14-bit branch
instructions now only came with a three-bit field to select the condition
they could test.
I admire your effort.
That let me fit all the instructions in.
I smell danger--running out of OpCode space early in the design.
After the general conceptualization of the ISA you should have
half of the OpCode space available for future additions !!.
The opcode space for block headers, however, was reduced. Which is not
really that much of a bad thing; it means that now the block headers will
be pared down (to just one!) and thus greatly simplified.
John Savard
On Sun, 08 Mar 2026 01:14:42 +0000, quadi wrote:
On Sat, 07 Mar 2026 19:00:02 +0000, MitchAlsup wrote:
I smell danger--running out of OpCode space early in the design. After
the general conceptualization of the ISA you should have half of the
OpCode space available for future additions !!.
Well, while it is true that only 1/64th of the opcode space for 32-bit instructions is left, my current plan is to use 1/128th for headers, and the other 1/128th for instructions longer than 32 bits. Which means that there is still space for 511 times as many instructions as are already defined, even if I never went beyond 48 bits.
However, the lack of opcode space did cause me one problem. Previously, I had a type of header which started with the four bits 1111. This was followed by fourteen two-bit prefixes, which applied to every 16 bits remaining in the 256-bit code block.
They indicated:
00 - 17-bit instruction, starting with 0
01 - 17-bit instruction, starting with 1
10 - begin instruction with 32-bit parcel
11 - append another 32-bit instruction parcel.
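One possible reading of this prefix scheme can be sketched in code. This
is hypothetical: it assumes each 2-bit prefix tags one 16-bit unit, that
the low bit of a 00/01 prefix supplies the seventeenth instruction bit,
and that a 32-bit parcel occupies a 10-tagged unit followed by a
11-tagged one:

```python
def parse_block(prefixes, units):
    """Group the fourteen 16-bit units of a 256-bit block into instructions.

    prefixes: fourteen 2-bit values from the header;
    units: fourteen 16-bit payload values.
    (Hypothetical decoding under the assumptions stated above.)
    """
    assert len(prefixes) == 14 and len(units) == 14
    insns = []
    for p, u in zip(prefixes, units):
        if p in (0b00, 0b01):
            # Standalone 17-bit instruction; the prefix's low bit is bit 16.
            insns.append(((p & 1) << 16) | u)
        elif p == 0b10:
            # Begin a new instruction; subsequent 11-tagged units complete
            # its 32-bit parcels under this reading.
            insns.append(u)
        else:
            # 0b11: append this unit's 16 bits to the current instruction.
            insns[-1] = (insns[-1] << 16) | u
    return insns
```

A decoder can find instruction boundaries from the prefixes alone
(boundary wherever the prefix is not 11), which is what makes wide
parallel decode straightforward here.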
John Savard
quadi <quadibloc@ca.invalid> posted:
On Sun, 08 Mar 2026 01:14:42 +0000, quadi wrote:
On Sat, 07 Mar 2026 19:00:02 +0000, MitchAlsup wrote:
I smell danger--running out of OpCode space early in the design. After
the general conceptualization of the ISA you should have half of the
OpCode space available for future additions !!.
Well, while it is true that only 1/64th of the opcode space for 32-bit
instructions is left, my current plan is to use 1/128th for headers, and
the other 1/128th for instructions longer than 32 bits. Which means that
there is still space for 511 times as many instructions as are already
defined, even if I never went beyond 48 bits.
However, the lack of opcode space did cause me one problem. Previously, I
had a type of header which started with the four bits 1111. This was
followed by fourteen two-bit prefixes, which applied to every 16 bits
remaining in the 256-bit code block.
They indicated:
00 - 17-bit instruction, starting with 0
01 - 17-bit instruction, starting with 1
10 - begin instruction with 32-bit parcel
11 - append another 32-bit instruction parcel.
11 simply adds another 32-bits to the current instruction parcel.
This gives access to {16, 32, 48, 64, 80, 96, ...}-bit instructions.
This can be treeified rather easily for wide decode.
John Savard
But I decided to go with a much simpler option instead. A bit of
compressive coding is still needed, but now the scheme is simple. I just switched from 17-bit short instructions to 16-bit instructions for code
with mixed-length instructions. In some respects, the limitations of 16-
bit instructions are complentary to those of 15-bit instructions, the
ones that can occur in pairs within 32-bit instruction code, and so the
two types can be mixed in a block to somewhat mitigate their
limitations.
Mixing these two types of short instructions in a single block is... an awkward and complicated workaround.
I've decided to drop that capability, because doing so makes more opcode space available for 48-bit (and longer) instructions in the variable-
length instruction blocks. I found that certain highly desirable classes
of 48-bit instructions are made impossible otherwise.
I have now begun uploading the description of the revised Concertina II instruction set to my web site. The block headers, the 32-bit
instruction formats, and the 16-bit and 15-bit instruction formats are
now all present at
http://www.quadibloc.com/arch/ct25int.htm
Thus, the squish of opcode space that made this iteration of Concertina
II possible _also_ makes a CISC instruction set possible. However, the
short instructions and the instructions longer than 32 bits are _both_ *severely* constrained in opcode space in the CISC mode.
Given that I was able to reduce the prefix for paired short instructions
from 1111 to 11, allowing the paired short instructions to return to being
15 bits long...
and since 14-bit short instructions are possible, 11 could be the prefix
for a single short instruction.
Thus, the squish of opcode space that made this iteration of Concertina II possible _also_ makes a CISC instruction set possible. However, the short instructions and the instructions longer than 32 bits are _both_
*severely* constrained in opcode space in the CISC mode.
FWIW:
IME, a pair-encoding scheme can result in space savings over a pure
32/64/96 coding scheme while avoiding the misalignment issues of a 16/32
coding scheme, but a downside of pair encoding is that the potential
space savings are significantly reduced relative to a 16/32 scheme.
Say, for example:
An effective 16/32 scheme can get around a 20% space savings;
An effective pair-encoding implicitly drops to around 8%.
Mostly because it can only save space in cases when both instructions
can be pair encoded, versus when either instruction could be 16-bit
encoded.
Though, that said, pair encoding is an attractive option when the main
other option is 32-bit only, and one already has some mechanism in place
to deal with cracking an instruction.
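Those 20% and 8% figures are consistent with a simple probabilistic
model. The 40% compressibility fraction below is purely illustrative,
and the independence assumption is a simplification:

```python
# Toy model of code-size savings. Assume a fraction f of instructions
# have a valid short form; f = 0.4 is an illustrative guess.
f = 0.4

# 16/32 scheme: every compressible instruction shrinks from 32 to 16 bits.
savings_16_32 = f * (16 / 32)

# Pair encoding: two adjacent instructions must *both* be compressible
# for the pair to shrink from 64 to 32 bits (independence assumed).
savings_pair = (f * f) * (32 / 64)

print(savings_16_32, savings_pair)   # roughly 20% vs roughly 8%
```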
On Tue, 10 Mar 2026 13:21:15 +0000, quadi wrote:
Thus, the squish of opcode space that made this iteration of Concertina
II possible _also_ makes a CISC instruction set possible. However, the
short instructions and the instructions longer than 32 bits are _both_
*severely* constrained in opcode space in the CISC mode.
And thus I had to re-think the longer instructions in CISC mode, making
a tweak to their definitions so that important functionality was not
lost.
On Tue, 10 Mar 2026 15:15:36 +0000, quadi wrote:
On Tue, 10 Mar 2026 13:21:15 +0000, quadi wrote:
Thus, the squish of opcode space that made this iteration of Concertina
II possible _also_ makes a CISC instruction set possible. However, the
short instructions and the instructions longer than 32 bits are _both_
*severely* constrained in opcode space in the CISC mode.
And thus I had to re-think the longer instructions in CISC mode, making
a tweak to their definitions so that important functionality was not
lost.
I've also tweaked the short instructions. The original 14-bit instructions
were made to be part of the standard Concertina II instruction set. This
time, they're part of CISC mode. So what are they competing with? Other
CISC architectures!
That insight led me to switch from a five-bit opcode field, providing
only a restricted set of operate instructions, to including all the basic
operate instructions - but having the short instructions in CISC
mode only work with the first eight registers of each register bank.
John Savard
Why are you not counting the header overhead as part of the instruction
??
It seems to me that a 14-bit instruction with a 2-bit header
(descriptor) has a 16-bit footprint--and in the end that is what
matters.
On Fri, 13 Mar 2026 16:08:33 +0000, MitchAlsup wrote:
Why are you not counting the header overhead as part of the instruction
??
It seems to me that a 14-bit instruction with a 2-bit header
(descriptor) has a 16-bit footprint--and in the end that is what
matters.
I do count it, for some purposes. However, I also distinguish between
several different types of short instruction, all of which have a 16-bit
footprint, by the number of bits available to specify the instruction,
since it is in this attribute that the various short instruction formats
differ.
I mean, I could call them Type A, B, C, and D but that would be
unnecessarily confusing.
Their nominal footprints are all 16 bits, since for purposes of
branching to them, their location is deemed to be a halfword, even in
the case of 15-bit instructions, where the fields they're in within a
32-bit slot don't perfectly align with the 16-bit halfwords - the first
one extends one
bit into the second 16 bits.
On Fri, 13 Mar 2026 16:28:39 +0000, quadi wrote:
On Fri, 13 Mar 2026 16:08:33 +0000, MitchAlsup wrote:
Why are you not counting the header overhead as part of the instruction
??
It seems to me that a 14-bit instruction with a 2-bit header
(descriptor) has a 16-bit footprint--and in the end that is what
matters.
I do count it, for some purposes. However, I make a distinction, as
well,
between several different types of short instruction, all of which have
a 16-bit footprint, by the number of bits available to specify the
instruction, since it is in respect of this attribute that the various
short instruction formats differ.
I mean, I could call them Type A, B, C, and D but that would be
unnecessarily confusing.
On top of that, in addition to 14-bit, 15-bit, and 16-bit short
instructions, I have 17-bit short instructions. Their footprint, in the sequence of instructions, is 16 bits, because the first bit is in the
header instead. Except that the fact that a short instruction is in that
spot is indicated by another bit. So shall we give it an 18-bit footprint, and then the 16-bit instructions have a 17-bit footprint too, while the 15- bit and 14-bit instructions do both have 16-bit footprints.
Do you really think a compiler writer will develop code to figure out
which of the many code formats you have should be emitted in each source
code situation?
On 3/13/2026 9:33 AM, quadi wrote:
On Fri, 13 Mar 2026 16:28:39 +0000, quadi wrote:
On Fri, 13 Mar 2026 16:08:33 +0000, MitchAlsup wrote:
Why are you not counting the header overhead as part of the instruction ??
It seems to me that a 14-bit instruction with a 2-bit header
(descriptor) has a 16-bit footprint--and in the end that is what
matters.
I do count it, for some purposes. However, I make a distinction, as
well,
between several different types of short instruction, all of which have
a 16-bit footprint, by the number of bits available to specify the
instruction, since it is in respect of this attribute that the various
short instruction formats differ.
I mean, I could call them Type A, B, C, and D but that would be
unnecessarily confusing.
On top of that, in addition to 14-bit, 15-bit, and 16-bit short instructions, I have 17-bit short instructions. Their footprint, in the sequence of instructions, is 16 bits, because the first bit is in the header instead. Except that the fact that a short instruction is in that spot is indicated by another bit. So shall we give it an 18-bit footprint? Then the 16-bit instructions have a 17-bit footprint too, while the 15-bit and 14-bit instructions both have 16-bit footprints.
Do you really think a compiler writer will develop code to figure out
which of the many code formats you have should be emitted in each source code situation?
quadi <quadibloc@ca.invalid> schrieb:
On top of that, in addition to 14-bit, 15-bit, and 16-bit short
instructions, I have 17-bit short instructions.
Can these follow each other in arbitrary order?
On Sat, 14 Mar 2026 14:55:08 +0000, Thomas Koenig wrote:
quadi <quadibloc@ca.invalid> schrieb:
On top of that, in addition to 14-bit, 15-bit, and 16-bit short
instructions, I have 17-bit short instructions.
Can these follow each other in arbitrary order?
Even I - who have indeed, in some iterations of Concertina II (and,
sadly, again in the current iteration), designed some really weird ISAs -
would shrink from designing an ISA in which the answer to that question
would be "Yes".
Even I, who has indeed in Concertina II designed what has been in some
of its iterations - and which has again become in the current iteration, sadly - some really weird ISAs would shrink from designing an ISA in
which the answer to that question would be "Yes".
This way lies madness (for the compiler writer, at least).
The only viable solution would be to always use the same mode, and in
that case all the block overhead would be wasted.
On Tue, 10 Mar 2026 16:57:47 -0500, BGB wrote:
FWIW:
IME, a pair-encoding scheme can result in space savings over a pure
32/64/96 coding scheme while avoiding the misalignment issues of a 16/32
coding scheme, but a downside of pair encoding is that the potential
space savings are significantly reduced relative to a 16/32 scheme.
Say, for example:
An effective 16/32 scheme can get around a 20% space savings;
An effective pair-encoding implicitly drops to around 8%.
Mostly because it can only save space in cases when both instructions
can be pair encoded, versus when either instruction could be 16-bit
encoded.
Though, that said, pair encoding is an attractive option when the main
other option is 32-bit only, and one already has some mechanism in place
to deal with cracking an instruction.
I am aware of the issue you are raising here, and I certainly am aware
that forcing the programmer to choose shorter instructions in pairs limits the potential space savings.
So why did I use this mechanism?
For one thing, I used it in order to simplify fetching and decoding
instructions. If every instruction is 32 bits long, neither longer nor
shorter, then it's very easy to fetch a block of memory, and decode all
the instructions in it in parallel - because you already know where they begin and end.
For another, look at the way I squeezed a short instruction - which I
would prefer to be 17 bits long - into 15 bits. The register banks are
divided into four groups, and the two operands must both be registers
from the same group in a 15-bit instruction.
That shows something about the type of code I expect to execute. Code
where instructions belonging to multiple threads (of a sort, not real
threads that execute asynchronously) are interleaved, so as to make it
easier to execute the instructions simultaneously in a pipeline.
That gives a bit of flexibility in instruction ordering, so it makes it easier to pair up short instructions.
And, as further evidence that I'm aware that having to use short
instructions in pairs is a disadvantage... this, along with the desire to
use pseudo-immediates (because I do accept Mitch Alsup's reasoning that getting data almost for free from the instruction stream beats an
additional memory access, with all the overhead that entails) led me to
set up the block header mechanism (which Mitch Alsup rightly criticizes; I just felt it was the least bad way to achieve what I wanted) so that
fetching instructions to decode _remained_ as naively straightforward _as
if_ all the instructions were the same length... even when the
instructions were allowed to vary in length.
And so with what are currently the Type I, II, and IV headers, the instruction stream consists of variable length instructions; short instructions can be placed individually at any position in the instruction stream.
There's even a CISC mode now, since I've squeezed things so much that this ISA is capable, with slight tweaking, of just having plain variable length instructions without blocks. But it's just barely capable of that; in that form, the short instructions only have 14 bits to play with, so the repertoire of those instructions is limited, and therefore the potential space savings they provide is less.
Of course when block headers allow 17-bit instructions at arbitrary positions... that _would_ maximize space savings, but there's the overhead
of the space the block header takes up. So any choice that is made
involves tradeoffs.
I also have a goal of making the ISA simple to implement, so in the CISC
mode, instead of just saying "leave the register field all zeroes, and
put the immediate right after the instruction", I have said that the
pseudo-immediates aren't available in CISC mode. That avoids having to
decode anything but the leading bits of the instruction in order to
determine
where the next instruction starts.
It isn't the greatest variable-length instruction architecture; that capability is basically an afterthought appended to an architecture
intended to be used with block headers.
John Savard
On Sat, 14 Mar 2026 23:36:31 +0000, Thomas Koenig wrote:
This way lies madness (for the compiler writer, at least).
The only viable solution would be to always use the same mode, and in
that case all the block overhad would be wasted.
Not really. A compiler for a language intended for general-purpose computation could use one mode, and one generating code for embedded
systems could use another mode.
quadi <quadibloc@ca.invalid> schrieb:
On Sat, 14 Mar 2026 23:36:31 +0000, Thomas Koenig wrote:
This way lies madness (for the compiler writer, at least).
The only viable solution would be to always use the same mode, and in
that case all the block overhead would be wasted.
Not really. A compiler for a language intended for general-purpose
computation could use one mode, and one generating code for embedded
systems could use another mode.
So the space for the headers will be wasted. It can be assumed that the
vast majority of programs are larger than 256 bits.
Given that the vast majority of programs work very well on RISC, what programming languages did you have in mind for your CISC-mode, and why
is your CISC mode supposed to be better at these particular programming languages including their runtime libraries?
Runtime libraries need not be in the primary language; for example, gfortran's runtime library is written in C.
Man, I am not sure why "find something that works, and makes sense, and
stick with it" poses such a problem...
That should make clear that my goal isn't to settle for anything
that _works_, but rather to find something that _works better_.
On Sat, 14 Mar 2026 22:58:24 +0000, quadi wrote:
Even I, who has indeed in Concertina II designed what has been in some
of its iterations - and which has again become in the current iteration,
sadly - some really weird ISAs would shrink from designing an ISA in
which the answer to that question would be "Yes".
What is the Concertina II architecture, and why has it gone through so
many iterations?
The basic idea behind the Concertina II architecture has been:
Start from a 32-bit RISC-like architecture.
The basic form of a memory-reference instruction is:
(Opcode) 5 bits for a load-store operation
(Destination Register) 5 bits for one of two 32-register register banks,
one for integers, one for floats
(Index Register) 3 bits - use only seven of the 32 integer registers, so
the instruction will fit in 32 bits
(Base Register) 3 bits - as above
(Displacement) 16 bits - as is conventional for most CISC and RISC microprocessors
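The field widths above add up to exactly 32 bits (5 + 5 + 3 + 3 + 16).
A packing sketch follows; the field ordering is an assumption, since the
post gives only the widths:

```python
def encode_mem(opcode, dst, index, base, disp):
    """Pack the described memory-reference format into one 32-bit word.

    Layout (an assumed ordering; only the widths come from the post):
    opcode(5) | dst(5) | index(3) | base(3) | displacement(16) = 32 bits.
    """
    assert opcode < 32 and dst < 32 and index < 8 and base < 8 and disp < 2**16
    return (opcode << 27) | (dst << 22) | (index << 19) | (base << 16) | disp
```

Note that the 3-bit index and base fields are what confine those roles to
seven (plus "none") of the 32 integer registers.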
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
John
One of the engineers realised that since they were using 36-bit-wide
memory and CPU data path, for 4x8-bit bytes, each with parity, it was
possible to write an efficient emulator for the IBM 7090 in microcode on
the 360 model 65. More emulators were written for other 7000-series
machines, which had very varied architectures.
One of the engineers realised that since they were using 36-bit-wide
memory and CPU data path, for 4x8-bit bytes, each with parity, it was possible to write an efficient emulator for the IBM 7090 in microcode on
the 360 model 65.
You say that you have "short pointers" that provide constants from the
same block. Why do you not provide the displacement in this way? AFAICS 9
bits would be enough to have either an 8-bit pointer or an 8-bit
displacement, for a saving of 7 bits. 4 bits could be used to get the
full range of registers; the other 3 would give you room for other
instructions.
On Sun, 15 Mar 2026 02:26:22 -0500, BGB wrote:
Man, I am not sure why "find something that works, and makes sense, and
stick with it" poses such a problem...
If _that_ is your goal, may I suggest x86-64? It's highly popular, so
there is a lot of software available to run on it. As well, the companies that make processors with this ISA have large economies of scale, and so their processors get to use the most advanced process nodes, and have a
lot of effort put into optimizing their microarchitecture.
That should make clear that my goal isn't to settle for anything that _works_, but rather to find something that _works better_.
I meant more in the sense of:
Pick an encoding scheme that makes sense;
Don't just keep going back and forth with needlessly complex approaches
and never settling on anything.
I have managed to find some additional opcode space that has let me make
the paired 15-bit instructions just a bit less messy. But since I won't
be satisfied until they're not messy at all - and I am so close to
finally squeezing everything in that I want to include in the basic 32
bit instruction set - I am going to continue looking for some additional improvements for a while.
I didn't find a way to squeeze things yet further, but I found a
disastrous mistake in one part of the instruction set that showed the
same opcode space being allocated twice.
But it turned out to be easily fixable; I had added an opcode bit to one instruction, so taking that back gave me the opcode space I needed to
put everything right again.
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On Sun, 15 Mar 2026 16:53:51 -0500, BGB wrote:
I meant more in the sense of:
Pick an encoding scheme that makes sense;
Don't just keep going back and forth with needlessly complex approaches
and never settling on anything.
That is definitely what I do want to end up doing.
I have managed to find some additional opcode space that has let me make
the paired 15-bit instructions just a bit less messy. But since I won't be satisfied until they're not messy at all - and I am so close to finally squeezing everything in that I want to include in the basic 32 bit instruction set - I am going to continue looking for some additional improvements for a while.
John Savard
I have managed to find some additional opcode space that has let me make
the paired 15-bit instructions just a bit less messy. But since I won't
be satisfied until they're not messy at all - and I am so close to
finally squeezing everything in that I want to include in the basic 32
bit instruction set - I am going to continue looking for some additional improvements for a while.
Having different formats for 15-bit instructions in different block
formats isn't something I count as "messy", that's just par for the
course in Concertina II.
I meant more in the sense of:
Pick an encoding scheme that makes sense;
Don't just keep going back and forth with needlessly complex approaches
and never settling on anything.
I smell danger--running out of OpCode space early in the design.
After the general conceptualization of the ISA you should have half of
the OpCode space available for future additions !!.
So the type IV header could have...
3 bits for a decode field, to let pseudo-immediates be used;
7 bits to indicate whether each 32-bit instruction slot has a standard
32-bit instruction or a special instruction;
8 bits to indicate *which of 256 different sets of special instructions
is going to be used in this block*;
and I've still got four bits left over, maybe to make fifteen other
header types as good as this one.
But what if there are new data types that require longer instructions?
For example, today's IBM zSeries mainframes have instructions that
handle UTF-8 character strings natively in hardware.
I've decided to work out the formats for the additional header types all
this would need now, rather than leaving it for later. In order to avoid going to a 64-bit header, I've gone to an exotic length for the sixth
header type.
Originally, I used radix encoding, but I realized Chen-Ho encoding was
more appropriate.
On Tue, 17 Mar 2026 00:06:28 +0000, quadi wrote:
Originally, I used radix encoding, but I realized Chen-Ho encoding was
more appropriate.
Anyways... see what happened?
You mention the fact that, far too early in the design phase, I have used
up almost all of the available opcode space.
In a sense, that certainly is true. The opcode space for the basic instruction set of the computer, when there are no headers present to add longer instructions to the instruction set, is perhaps more than 99% allocated! Which _is_ pretty bad.
But what did I do in response?
I still have, despite recent changes to the block headers to make them
consume less opcode space, some space left to define new types of
headers. So what did I do? I defined three new types of header which have
the effect of allowing the architecture to be modified... so as to add up
to *five hundred and twenty-eight* additional instruction sets to the
ISA. These headers allow instructions from any one of those auxiliary
instruction sets to be combined with regular instructions in the same
block.
So if there's a need for instructions acting on short floating-point
numbers, or UTF-8 strings, that the basic instruction set has not covered,
it will be possible to extend the instruction set to deal with it.
There should be enough room for it to meet the demands placed on it not
merely in years to come, but even centuries or millennia. (Although, as
many different kinds of instructions on a single die!)
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
On Sat, 14 Mar 2026 22:58:24 +0000, quadi wrote:
What is the Concertina II architecture, and why has it gone through so
many iterations?
The basic idea behind the Concertina II architecture has been:
Start from a 32-bit RISC-like architecture.
The basic form of a memory-reference instruction is:
(Opcode) 5 bits for a load-store operation
(Destination Register) 5 bits for one of two 32-register register banks,
one for integers, one for floats
(Index Register) 3 bits - use only seven of the 32 integer registers, so
the instruction will fit in 32 bits
(Base Register) 3 bits - as above
(Displacement) 16 bits - as is conventional for most CISC and RISC microprocessors
There are 24 opcodes normally used, so 1/4 of the opcode space is
available for everything else.
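Taken at face value, the field widths listed above (5 + 5 + 3 + 3 + 16 = 32) can be sketched as a simple encoder/decoder. This is only an illustration: the posts give the field widths, but the actual bit positions within a Concertina II instruction word are an assumption here.

```python
# Hypothetical encoder/decoder for the 32-bit memory-reference format
# described above. Field widths come from the post (5+5+3+3+16 = 32);
# the bit ordering (opcode in the high bits) is an assumption.

def encode_memref(opcode, dest, index, base, disp):
    assert 0 <= opcode < 32 and 0 <= dest < 32   # 5-bit fields
    assert 0 <= index < 8 and 0 <= base < 8      # 3-bit fields
    assert 0 <= disp < 1 << 16                   # 16-bit displacement
    return (opcode << 27) | (dest << 22) | (index << 19) | (base << 16) | disp

def decode_memref(word):
    return {
        "opcode": (word >> 27) & 0x1F,
        "dest":   (word >> 22) & 0x1F,
        "index":  (word >> 19) & 0x7,
        "base":   (word >> 16) & 0x7,
        "disp":   word & 0xFFFF,
    }
```

The point of the exercise: the four register/opcode fields consume exactly 16 bits, which is what leaves a conventional 16-bit displacement while still fitting in 32 bits.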
I had wanted to provide all the advantages and features of popular CISC architectures too, though.
This meant trying to squeeze stuff into not enough opcode space. So, for example, on the IBM System/360, there are arithmetic instructions between two registers that only take up 16 bits.
With 32-register banks, a short instruction ought to look like this:
(Opcode) 7 bits
(Destination Register) 5 bits
(Source Register) 5 bits
But that's 17 bits long.
So what I usually did was place restrictions on the source and destination registers so that the short instructions could fit into 15 bits. A 32-bit instruction slot could start with the bits 11, consuming 1/4 of the opcode space, and contain a pair of these.
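That pairing scheme can be sketched in a few lines: a 32-bit slot whose top two bits are 11 carries two 15-bit short instructions (2 + 15 + 15 = 32). The ordering of the two halves within the word is my assumption; the posts only describe the prefix and the widths.

```python
# Sketch of packing two 15-bit short instructions into one 32-bit slot
# whose top two bits are 11, as described above.

def pack_pair(first15, second15):
    assert 0 <= first15 < 1 << 15 and 0 <= second15 < 1 << 15
    return (0b11 << 30) | (first15 << 15) | second15

def unpack_pair(word):
    assert (word >> 30) == 0b11, "not a short-instruction pair"
    return (word >> 15) & 0x7FFF, word & 0x7FFF
```

This also shows why the 2-bit prefix costs 1/4 of the opcode space: 11 claims one of the four possible top-two-bit patterns.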
But the basic memory-reference instructions take up 3/4 of the opcode
space. I still need several other 32-bit instructions!
John Savard
On 3/17/2026 1:00 AM, quadi wrote:
On Tue, 17 Mar 2026 00:06:28 +0000, quadi wrote:
Originally, I used radix encoding, but I realized Chen-Ho encoding was
more appropriate.
Anyways... see what happened?
You mention the fact that, far too early in the design phase, I have used up almost all of the available opcode space.
In a sense, that certainly is true. The opcode space for the basic instruction set of the computer, when there are no headers present to add longer instructions to the instruction set, is perhaps more than 99% allocated! Which _is_ pretty bad.
But what did I do in response?
I still have, despite recent changes to the block headers to make them consume less opcode space, some space left to define new types of headers. So what did I do? I defined three new types of header which have the effect of allowing the architecture to be modified... so as to add up to *five hundred and twenty-eight* additional instruction sets to the ISA. These headers allow instructions from any one of those auxiliary instruction sets to be combined with regular instructions in the same block.
So if there's a need for instructions acting on short floating-point numbers, or UTF-8 strings, that the basic instruction set has not covered, it will be possible to extend the instruction set to deal with it.
There should be enough room for it to meet the demands placed on it not merely in years to come, but even centuries or millennia. (Although, as Moore's Law peters out, it may not be possible to put circuitry for so
many different kinds of instructions on a single die!)
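The fractions being argued about in this subthread can be checked directly. The inputs (24 of the 32 primary opcodes used for loads/stores, 1/64 of the 32-bit space still free, 1/128 proposed for headers) are the figures quoted in the posts themselves.

```python
# Checking the opcode-space arithmetic quoted in the thread.
from fractions import Fraction

total_opcodes = 1 << 5                  # 5-bit primary opcode field: 32 codes
memref = 24                             # load/store opcodes in normal use
used = Fraction(memref, total_opcodes)
print(used)                             # 3/4 of the space for memory reference
print(1 - used)                         # 1/4 left for everything else

free = Fraction(1, 64)                  # what remains of the 32-bit space
headers = Fraction(1, 128)              # the share proposed for block headers
print(headers / free)                   # 1/2: headers take half of what's left
```

So the plan leaves exactly half of the remaining 32-bit opcode space unallocated after the headers are carved out, which is what the later posts are debating.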
So you have produced a larger, hence more expensive chip.
And you then
expect your chip's potential users to pay more for a chip which, you
admit, has features that a particular application, particularly an
embedded one, will never use.
Doesn't seem like a recipe for success. :-(
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
John
It is like, there are one of several ways things can go:
Great: PDP-11 but died due to address space limitations
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
M68K started turning into a turd,
but can't really be faulted that much,
as its design was pretty close to a direct evolution of the PDP-11;
which was quite influential.
BGB <cr88192@gmail.com> posted:
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
432 is worthy of study if only to figure out "what not to do".
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
interesting viewpoint
I remember reading the manuals on Bitsavers for several 360
emulation options. I don't recall any 7090 emulator which involved
turning memory parity off in order for it to work!
Think about it. If this is possible, the parity checks must be
implemented by microcode.
The constant struggle to settle on a single OpCode template is
indicative of your struggle--the lack of convergence should be telling
you that this goal is one thing holding your architecture back.
5 pounds of sand does not fit in a 4 pound bag !
In article <10p75jk$1dmen$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
I remember reading the manuals on Bitsavers for several 360
emulation options. I don't recall any 7090 emulator which involved
turning memory parity off in order for it to work!
Think about it. If this is possible, the parity checks must be
implemented by microcode.
The 360s were built out of hundreds of small circuit cards with discrete components on them. Those included very low-integrated circuits, with
maybe 10 transistors at most. The orders to the designers of the
different models were to microcode everything, unless they could improve price:performance by 30% or more with dedicated circuitry.
The microcoded world lasted until the RISC revolution of the 1980s, when integrated circuits were providing at least 10,000 times more transistors.
John
In article <10p75jk$1dmen$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
I remember reading the manuals on Bitsavers for several 360
emulation options. I don't recall any 7090 emulator which involved
turning memory parity off in order for it to work!
Think about it. If this is possible, the parity checks must be
implemented by microcode.
The 360s were built out of hundreds of small circuit cards with discrete components on them. Those included very low-integrated circuits, with
maybe 10 transistors at most. The orders to the designers of the
different models were to microcode everything, unless they could improve price:performance by 30% or more with dedicated circuitry.
The microcoded world lasted until the RISC revolution of the 1980s, when integrated circuits were providing at least 10,000 times more transistors.
The microcoded world lasted until the RISC revolution of the 1980s, when
integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
BGB <cr88192@gmail.com> posted:
--------------
It is like, there are one of several ways things can go:
Great: PDP-11 but died due to address space limitations
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
Brilliant: VAX but died because complexity limits perf over time
M68K started turning into a turd,
by adding too much complexity between 010 and 020
but can't really be faulted that much,
as its design was pretty close to a direct evolution of the PDP-11;
which was quite influential.
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction
bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
I guess it's probably a similar story as with the Gen-Z people and IA-64.
But, being of an older generation, IA-64 hype was going on when I was in high-school, and x86-64 also came on the scene. Some other people were
on Team IA-64, but I was on Team x86-64, mostly because as I could see
it at the time, it seemed like the writing was already on the wall for
IA-64 even back then.
In this case, I ended up being right...
IIRC, the thinking was, for IA-64:
 People already knew of its performance woes;
 It was also very much more expensive.
On the other side of things, the Athlon64 was already on the horizon;
and I ended up getting one after graduation.
It is like, there are one of several ways things can go:
 Abysmal Turd: iAPX 432
 Turd: IA-64
 Turd that flies: x86, x86-64
 Sane: ARM, RISC-V, ...
 Sane, but still died: PowerPC, MIPS, SPARC, ...
M68K started turning into a turd, but can't really be faulted that much,
as its design was pretty close to a direct evolution of the PDP-11;
which was quite influential.
Well, and go back pretty far, and my current ISA design also follows an evolution tree that reaches back to the PDP-11, despite having almost
nothing in common (and by sorta colliding with RISC-V, also sorta half-
way hybridizes it with the MIPS family).
Then again, maybe being a product of engineering rather than
naturalistic evolution sorta excludes it from the normal rules of
phylogeny. But, then again, can one prove that even nature obeys it?...
Say, for example, what if you ended up with leaf slugs that could
natively reproduce chloroplasts, would they still be purely animals, or would their then-stolen DNA and chloroplasts make them partially
descended from algae (as IIRC they had gained some DNA from the algae
via horizontal gene transfer or similar, but need to scavenge
chloroplasts from plant cells rather than making their own).
...
John Dallman <jgd@cix.co.uk> schrieb:
In article <10p75jk$1dmen$1@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:
I remember reading the manuals on Bitsavers for several 360
emulation options. I don't recall any 7090 emulator which involved
turning memory parity off in order for it to work!
Think about it. If this is possible, the parity checks must be
implemented by microcode.
The 360s were built out of hundreds of small circuit cards with discrete
components on them. Those included very low-integrated circuits, with
maybe 10 transistors at most. The orders to the designers of the
different models were to microcode everything, unless they could improve
price:performance by 30% or more with dedicated circuitry.
The microcoded world lasted until the RISC revolution of the 1980s, when
integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
On 3/17/2026 1:00 AM, quadi wrote:
So if there's a need for instructions acting on short floating-point
numbers, or UTF-8 strings, that the basic instruction set has not
covered,
it will be possible to extend the instruction set to deal with it.
There should be enough room for it to meet the demands placed on it not
merely in years to come, but even centuries or millennia. (Although, as
Moore's Law peters out, it may not be possible to put circuitry for so
many different kinds of instructions on a single die!)
So you have produced a larger, hence more expensive chip. And you then expect your chip's potential users to pay more for a chip which, you
admit, has features that a particular application, particularly an
embedded one, will never use.
Doesn't seem like a recipe for success. :-(
According to Thomas Koenig <tkoenig@netcologne.de>:
The microcoded world lasted until the RISC revolution of the 1980s, when integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
That's partly it but I think it was more that the 801 and the PL.8
compiler were developed together. They had the insight that if you decomposed complicated instructions into simpler ones, the compiler
now could optimize them and some of those instructions were
optimized away. It certainly didn't hurt that the 801's cache could
provide an instruction every cycle so it was as fast as microcode
would be.
While the early FORTRAN compilers did optimizations that are still
quite respectable, the other 1960s compilers were not very
sophisticated and the instruction sets reflected that. For example, the
360's EDIT instructions are basically the COBOL picture formatter.
Thomas Koenig <tkoenig@netcologne.de> wrote:
John Dallman <jgd@cix.co.uk> schrieb:
In article <10p75jk$1dmen$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
I remember reading the manuals on Bitsavers for several 360
emulation options. I don't recall any 7090 emulator which involved
turning memory parity off in order for it to work!
Think about it. If this is possible, the parity checks must be
implemented by microcode.
The 360s were built out of hundreds of small circuit cards with discrete components on them. Those included very low-integrated circuits, with
maybe 10 transistors at most. The orders to the designers of the
different models were to microcode everything, unless they could improve price:performance by 30% or more with dedicated circuitry.
The microcoded world lasted until the RISC revolution of the 1980s, when integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
Tanenbaum in his book about computer architecture mentions results
of Bell and Newell from 1971. IIUC Bell and Newell coded
some programs in the microcode of the 2025 (the microcode engine of the 360/25).
The claim is that such a program ran 45 times faster than a program
using the 360 instruction set. They also created a Fortran compiler/
interpreter combination with the interpreter coded in 2025
microcode. They claimed that this Fortran ran at a comparable
speed to "native" Fortran on the 360/50.
One can draw different conclusions from this. Like Tanenbaum you
can claim that there is a need for more specialized microcode.
But you can also realize that by making a saner version of the microcode
level and allowing compilers to target it (possibly via a specialized interpreter) one can gain a lot of performance. The second approach
leads to RISC-like designs.
I think that in 1970 designers knew that if a machine is simple
enough, then a hardwired design will be faster than a microcoded
one. But if for compatibility reasons one had to implement a
design that was too complex to implement directly in the available
hardware, then a microcoded design had the advantage. And the ability
to offer "the same" architecture on machines of varying sizes
was seen as a big advantage.
AFAICS at least part of the motivation for microcode was the realization
that hardware could be made simpler and perform better if matched
with appropriate software. But simpler and presumably cheaper
hardware would mean less money for hardware folks and more for
software vendors.
Microcode was a way for hardware vendors to
get a bigger part of the pie, by doing things that otherwise
software folks would do.
So, I think that technical people realized around 1971 that a
RISC-like approach could be technically superior, but for
several following years microcode had a business advantage.
On Tue, 17 Mar 2026 20:18:00 +0000, John Dallman wrote:
Think about it. If this is possible, the parity checks must be
implemented by microcode.
It is true that nearly all System/360 models, except the Model 75 and the 91/95/195, were microcoded.
And in that time period, backward compatibility was becoming
important, so new architectures had that hurdle to overcome.
On 3/17/2026 3:07 PM, John Levine wrote:
According to Thomas Koenig <tkoenig@netcologne.de>:
The microcoded world lasted until the RISC revolution of the 1980s, when integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
That's partly it but I think it was more that the 801 and the PL.8
compiler were developed together. They had the insight that if you
decomposed complicated instructions into simpler ones, the compiler
now could optimize them and some of those instructions were
optimized away. It certainly didn't hurt that the 801's cache could
provide an instruction every cycle so it was as fast as microcode
would be.
While the early FORTRAN compilers did optimizations that are still
quite respectable, the other 1960s compilers were not very
sophisticated and the instruction sets reflected that. For example, the
360's EDIT instructions are basically the COBOL picture formatter.
So the instruction set reflected the compiler's need for picture
formatting, and optimized that. :-)
On Tue, 17 Mar 2026 09:09:03 -0700, Stephen Fuld wrote:
On 3/17/2026 1:00 AM, quadi wrote:
So if there's a need for instructions acting on short floating-point
numbers, or UTF-8 strings, that the basic instruction set has not
covered,
it will be possible to extend the instruction set to deal with it.
There should be enough room for it to meet the demands placed on it not
merely in years to come, but even centuries or millennia. (Although, as
Moore's Law peters out, it may not be possible to put circuitry for so
many different kinds of instructions on a single die!)
So you have produced a larger, hence more expensive chip. And you then expect your chip's potential users to pay more for a chip which, you
admit, has features that a particular application, particularly an
embedded one, will never use.
Doesn't seem like a recipe for success. :-(
Subset implementations are possible.
But what _is_ the reasoning behind making the architecture so extensible
as to allow for totally impractical implementations? What possible gain could come from it?
Well, as I noted, new data types come into fashion.
So if people start needing instructions for handling 8-bit floats, or
UTF-8 strings... what's going to happen?
Are they going to start using totally new ISAs which have instructions for these kinds of data?
I want to avoid the need to do that. I want to provide an ISA with staying power, one that has room to grow. Despite my having squeezed the opcode space so much that there's hardly any of it left!
This helps to diminish the validity of the criticism that I shouldn't have squeezed the opcode space that much.
John Savard
On 3/17/2026 3:07 PM, John Levine wrote:
According to Thomas Koenig <tkoenig@netcologne.de>:
The microcoded world lasted until the RISC revolution of the 1980s, when integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
That's partly it but I think it was more that the 801 and the PL.8
compiler were developed together. They had the insight that if you decomposed complicated instructions into simpler ones, the compiler
now could optimize them and some of those instructions were
optimized away. It certainly didn't hurt that the 801's cache could provide an instruction every cycle so it was as fast as microcode
would be.
While the early FORTRAN compilers did optimizations that are still
quite respectable, the other 1960s compilers were not very
sophisticated and the instruction sets reflected that. For example, the 360's EDIT instructions are basically the COBOL picture formatter.
So the instruction set reflected the compiler's need for picture
formatting, and optimized that. :-)
Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:
On 3/17/2026 3:07 PM, John Levine wrote:
According to Thomas Koenig <tkoenig@netcologne.de>:
The microcoded world lasted until the RISC revolution of the 1980s, when integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
That's partly it but I think it was more that the 801 and the PL.8
compiler were developed together. They had the insight that if you
decomposed complicated instructions into simpler ones, the compiler
now could optimize them and some of those instructions were
optimized away. It certainly didn't hurt that the 801's cache could
provide an instruction every cycle so it was as fast as microcode
would be.
While the early FORTRAN compilers did optimizations that are still
quite respectable, the other 1960s compilers were not very
sophisticated and the instruction sets reflected that. For example, the
360's EDIT instructions are basically the COBOL picture formatter.
So the instruction set reflected the compiler's need for picture
formatting, and optimized that. :-)
s/compiler's/COBOL's/
quadi <quadibloc@ca.invalid> writes:
On Tue, 17 Mar 2026 20:18:00 +0000, John Dallman wrote:
Think about it. If this is possible, the parity checks must be
implemented by microcode.
It is true that nearly all System/360 models, except the Model 75 and the
91/95/195, were microcoded.
There was also the 360/44 if I am not mistaken.
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:
On 3/17/2026 3:07 PM, John Levine wrote:
According to Thomas Koenig <tkoenig@netcologne.de>:
The microcoded world lasted until the RISC revolution of the 1980s, when
integrated circuits were providing at least 10,000 times more transistors.
The 801 demonstrated (within IBM) that RISC was possible in the
second half of the 1970s. The key there were fast caches which
were fast enough to replace microcode storage. Separation of
I and D cache also played a role, of course, as did pipelining.
That's partly it but I think it was more that the 801 and the PL.8
compiler were developed together. They had the insight that if you
decomposed complicated instructions into simpler ones, the compiler
now could optimize them and some of those instructions were
optimized away. It certainly didn't hurt that the 801's cache could
provide an instruction every cycle so it was as fast as microcode
would be.
While the early FORTRAN compilers did optimizations that are still
quite respectable, the other 1960s compilers were not very
sophisticated and the instruction sets reflected that. For example, the
360's EDIT instructions are basically the COBOL picture formatter.
So the instruction set reflected the compiler's need for picture
formatting, and optimized that. :-)
s/compiler's/COBOL's/
and RPG
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
BGB <cr88192@gmail.com> posted:
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
432 is worthy of study if only to figure out "what not to do".
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
interesting viewpoint
Indeed. MIPS had software TLB[*], SPARC had register windows [**]
Alpha had a weak [***] memory ordering. All survive
for backward compatibility and are generally
eschewed for new designs.
[*] Didn't perform well in large scale SMP without
tricks like page coloring. The virtually tagged
caches were troublesome.
[**] Register windows. Need I say more?
[***] Difficult to program multithreaded apps correctly, difficult to
port software to from other more strongly ordered
architectures.
On 3/17/2026 2:25 PM, Scott Lurndal wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
---------------
They lack the obvious drawbacks of x86:
Nightmarish encoding scheme;
Mostly 2R / 2RI only;
Not enough registers / absurd levels of spill-and-fill needed;
While x86 had 8 registers, they weren't true GPRs;
All had fixed behaviors in certain instructions;
And, various edge-case limitations;
Like, being limited to 2R and excessive spill/fill is likely a worse
issue than, say:
Register windows (SPARC/etc);
Weird bit-sliced branches (MIPS).
Alpha initially lacked Byte/Half memory operations, etc, which also
wasn't great.
From a "design elegance" stance, IA-64 wasn't that bad;
In terms of practical issues, its design was lacking.
Contrast, say, major time wasters in RISC-V:
Need to do arithmetic or similar for indexed loads in RV64G;
Can be dealt with by indexed load;
Dealing with "imm/disp doesn't fit in imm12/disp12" issues;
Can be dealt with by jumbo prefixes;
...
Addressing a few of these issues makes it around 30% or so faster for a naive in-order implementation when running things like Doom and similar.
Or, ironically, a few things that x86 does have...
But, then people are left debating if it matters for OoO when running
the SPEC benchmark and similar, and there is currently a rising mindset
of "any CPU that cares about performance will be OoO, so the relative inefficiencies don't matter...".
While, seemingly ignoring the "mid range" that exists mostly in the
"budget cell-phone" space and similar, where they still care some about performance but in-order dominates.
Like, it is not a hard split between big server chips, and small microcontrollers.
But, I guess some people did some testing, and noted that for SPEC, the delta of indexed load/store dropped to closer to around 5% on an
in-order CPU when supported natively, which is maybe "not quite big
enough" for some people.
On 3/17/2026 2:25 PM, Scott Lurndal wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
BGB <cr88192@gmail.com> posted:
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
432 is worthy of study if only to figure out "what not to do".
Yeah, make it bad enough, and it becomes an example piece.
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
interesting viewpoint
Indeed. MIPS had software TLB[*], SPARC had register windows [**]
Alpha had a weak [***] memory ordering. All survive
for backward compatibility and are generally
eschewed for new designs.
[*] Didn't perform well in large scale SMP without
tricks like page coloring. The virtually tagged
caches were troublesome.
[**] Register windows. Need I say more?
[***] Difficult to program multithreaded apps correctly, difficult to
port software to from other more strongly ordered
architectures.
These features are non-ideal (except partly Software TLB, which does
have some useful merits to go along with its drawbacks).
They lack the obvious drawbacks of x86:
Nightmarish encoding scheme;
Mostly 2R / 2RI only;
Not enough registers / absurd levels of spill-and-fill needed;
quadi <quadibloc@ca.invalid> posted:
I want to avoid the need to do that. I want to provide an ISA with
staying power, one that has room to grow. Despite my having squeezed
the opcode space so much that there's hardly any of it left!
Second sentence is admirable, third says you are not on the path.
x86-64 has a funky instruction encoding, and with the 64-bit word
length, constants no longer have full-word range.
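To illustrate that point about constants (a well-known x86-64 property rather than anything from this thread): most x86-64 instructions carry at most a 32-bit immediate that is sign-extended to 64 bits, so only values in [-2^31, 2^31 - 1] can be encoded directly, and a wider constant needs a separate full-width register move (movabs) first.

```python
# Which 64-bit constants fit in x86-64's sign-extended 32-bit
# immediate field?  Anything outside this range needs movabs.

def fits_imm32(value):
    return -(1 << 31) <= value < (1 << 31)

print(fits_imm32(0x7FFFFFFF))    # True: largest directly encodable value
print(fits_imm32(0x80000000))    # False: would need movabs
print(fits_imm32(-(1 << 31)))    # True: most negative encodable value
```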
In article <10p8i54$1rmt5$1@dont-email.me>, cr88192@gmail.com (BGB)
wrote:
Sane, but still died: PowerPC, MIPS, SPARC, ...
PowerPC is not dead yet. IBM still sells a fair number of POWER
systems, and IBM i (formerly AS/400, formerly System/38) runs on it.
MIPS died of Itanium. SGI swallowed the Itanium hype and stopped MIPS development (they owned MIPS at the time). It never caught up once it
was restarted.
SPARC died by stages. Sun couldn't afford to keep development at a
high pitch, and the register windows did not help. Once Oracle owned
them, the high cost of chip development caused Oracle to cancel it.
They claimed they would support running SPARC Solaris until
2030-something, but got annoyed when asked how this would be
accomplished. You can't keep ISVs that way.
John
POWER9 - 2017
POWER10 - 2021
We are in 2026 now and I don't hear about 11.
Microarchitecture development on the high end stopped much earlier
- all high-end MIPS chips after R10K were reusing the same
microarchitecture.
Considering Oracle's financial situation at the time (very very
good) it does not look like they can't afford it. They just didn't
see a point.
In article <20260319120653.0000778b@yahoo.com>, already5chosen@yahoo.com (Michael S) wrote:
POWER9 - 2017 POWER10 - 2021 We are in 2026 now and I don't hear about
11.
Announced last July, though nobody has done a wikipedia page yet:
On Thu, 19 Mar 2026 10:46:00 +0000, John Dallman wrote:
In article <20260319120653.0000778b@yahoo.com>, already5chosen@yahoo.com
(Michael S) wrote:
POWER9 - 2017 POWER10 - 2021 We are in 2026 now and I don't hear about
11.
Announced last July, though nobody has done a wikipedia page yet:
One of your links didn't display clickably in my newsreader, so I just
Googled. At first, it seemed as if Power 11 was a new generation of
servers, but not necessarily chips, but eventually I did find that, yes,
the chips were definitely changed.
The chief distinguishing feature of Power 11 is that it offers a lot more
memory bandwidth. There appear to be multiple data buses coming from the
chips - apparently *sixteen*. Which is a figure usually associated with
things like the NEC SX-7.
John Savard
Indeed. MIPS had software TLB[*], SPARC had register windows [**]
Alpha had a weak [***] memory ordering. All survive
for backward compatibility and are generally
eschewed for new designs.
On Wed, 18 Mar 2026 16:00:41 +0000, MitchAlsup wrote:
quadi <quadibloc@ca.invalid> posted:
I want to avoid the need to do that. I want to provide an ISA with
staying power, one that has room to grow. Despite my having squeezed
the opcode space so much that there's hardly any of it left!
Second sentence is admirable, third says you are not on the path.
Since you are the expert, I can't really argue. But I will make some
points in my own defense, even so.
I've said that I've used up the opcode space. That's true of the basic opcode space when no header is used, so that there are only 32-bit instructions (and some pairs of 16-bit instructions).
Except:
Waldek Hebisch <antispam@fricas.org> schrieb:
x86-64 has funky instruction encoding and due to 64-bit word
length constants no longer have full word range,
$ cat > foo.c
unsigned long foo()
{
return 0x1234567890abcdef;
}
$ gcc -S foo.c
$ cat foo.s
.file "foo.c"
.text
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movabsq $1311768467294899695, %rax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (GNU) 16.0.0 20260111 (experimental)"
.section .note.GNU-stack,"",@progbits
$
--------------------
SPARC died by stages. Sun couldn't afford to keep development at a
high pitch, and the register windows did not help. Once Oracle owned
them, the high cost of chip development caused Oracle to cancel it.
Considering Oracle's financial situation at the time (very very good)
it does not look like they can't afford it. They just didn't see a
point.
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
BGB <cr88192@gmail.com> posted:
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction
bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
432 is worthy of study if only to figure out "what not to do".
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
interesting viewpoint
Indeed. MIPS had software TLB[*], SPARC had register windows [**]
Alpha had a weak [***] memory ordering.
for backward compatibility and are generally
eschewed for new designs.
[*] Didn't perform well in large scale SMP without
tricks like page coloring. The virtually tagged
caches were troublesome.
[**] Register windows. Need I say more?
[***] Difficult to program multithreaded apps correctly, difficult to
port software to it from other more strongly ordered
architectures.
Michael S <already5chosen@yahoo.com> posted:
--------------------
SPARC died by stages. Sun couldn't afford to keep development at a
high pitch, and the register windows did not help. Once Oracle
owned them, the high cost of chip development caused Oracle to
cancel it.
Considering Oracle's financial situation at the time (very very
good) it does not look like they can't afford it. They just didn't
see a point.
It takes cubic dollars to fund a high end design team--something
like $200M/year just in development more when considering building
product.
Waldek Hebisch <antispam@fricas.org> schrieb:
x86-64 has funky instruction encoding and due to 64-bit word
length constants no longer have full word range,
$ cat > foo.c
unsigned long foo()
{
return 0x1234567890abcdef;
}
$ gcc -S foo.c
movabsq $1311768467294899695, %rax
On 3/17/2026 12:25 PM, Scott Lurndal wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
BGB <cr88192@gmail.com> posted:
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction
bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
432 is worthy of study if only to figure out "what not to do".
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
interesting viewpoint
Indeed. MIPS had software TLB[*], SPARC had register windows [**]
Alpha had a weak [***] memory ordering.
Fwiw, SPARC in RMO mode was weak, but not as weak as a damn Alpha. At
least SPARC honored data dependent load dependencies, ala implied
consume membars.
https://en.cppreference.com/w/cpp/atomic/memory_order.html
I guess C++26 waves goodbye to it. But, still... if they make a consume actually emit, say an acquire membar ala MEMBAR #LoadStore | #LoadLoad
on SPARC, or an acquire on any other platform, I would be pissed. If at
all, consume should give a warning and say C++26 does not like it
anymore. It should be just a compiler barrier unless on something like
an Alpha. Alpha, well shit out of luck? consume membar on Alpha would
emit a mb instruction for data dependent loads. C++ says the Alpha can die?
All survive
for backward compatibility and are generally
eschewed for new designs.
[*] Didn't perform well in large scale SMP without
tricks like page coloring. The virtually tagged
caches were troublesome.
[**] Register windows. Need I say more?
[***] Difficult to program multithreaded apps correctly, difficult to
port software to from other more strongly ordered
architectures.
On Thu, 19 Mar 2026 22:32:41 GMT
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
Michael S <already5chosen@yahoo.com> posted:
--------------------
SPARC died by stages. Sun couldn't afford to keep development at a
high pitch, and the register windows did not help. Once Oracle
owned them, the high cost of chip development caused Oracle to
cancel it.
Considering Oracle's financial situation at the time (very very
good) it does not look like they can't afford it. They just didn't
see a point.
It takes cubic dollars to fund a high end design team--something
like $200M/year just in development more when considering building
product.
It's Oracle of 2017 that we are talking about.
Total revenues : $ 37.728 B
Operating income: $ 12.710 B
Net income : $ 9.335 B
What is 0.2 B for such juggernaut ?
Just to put things in proportion, in November 2016 Oracle paid $9.3
billion USD for NetSuite.
Even given all the snipped below text--unless you can get ISA finished
and get a compiler written, software ported and a microarchitecture
built--it is nothing but a mental exercise.
It's Oracle of 2017 that we are talking about.
Total revenues : $ 37.728 B
Operating income: $ 12.710 B
Net income : $ 9.335 B
What is 0.2 B for such juggernaut ?
Just to put things in proportion, in November 2016 Oracle paid $9.3
billion USD for NetSuite.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> posted:
On 3/17/2026 12:25 PM, Scott Lurndal wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
BGB <cr88192@gmail.com> posted:
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K instruction
bits in a segment, or 8K bytes. The idea was that no subroutine or
function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
432 is worthy of study if only to figure out "what not to do".
It is like, there are one of several ways things can go:
Abysmal Turd: iAPX 432
Turd: IA-64
Turd that flies: x86, x86-64
Sane: ARM, RISC-V, ...
Sane, but still died: PowerPC, MIPS, SPARC, ...
interesting viewpoint
Indeed. MIPS had software TLB[*], SPARC had register windows [**]
Alpha had a weak [***] memory ordering.
Fwiw, SPARC in RMO mode was weak, but not as weak as a damn Alpha. At
least SPARC honored data dependent load dependencies, ala implied
consume membars.
https://en.cppreference.com/w/cpp/atomic/memory_order.html
I guess C++26 waves good bye to it. But, still... if they make a consume
actually emit, say an acquire membar ala MEMBAR #LoadStore | #LoadLoad
on SPARC, or an acquire on any other platform, I would be pissed. If at
all, consume should give a warning and say C++26 does not like it
anymore. It should be just a compiler barrier unless on something like
an Alpha. Alpha, well shit out of luck? consume membar on Alpha would
emit a mb instruction for data dependent loads. C++ says the Alpha can die?
It already did.
All survive
for backward compatibility and are generally
eschewed for new designs.
[*] Didn't perform well in large scale SMP without
tricks like page coloring. The virtually tagged
caches were troublesome.
[**] Register windows. Need I say more?
[***] Difficult to program multithreaded apps correctly, difficult to
port software to from other more strongly ordered
architectures.
On 3/19/2026 6:42 PM, MitchAlsup wrote:
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> posted:
On 3/17/2026 12:25 PM, Scott Lurndal wrote:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
BGB <cr88192@gmail.com> posted:
On 3/15/2026 9:35 AM, John Dallman wrote:
In article <10p4p6g$lg6r$2@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:
Even I, who has indeed in Concertina II designed what has been in
some of its iterations - and which has again become in the current
iteration, sadly - some really weird ISAs would shrink from
designing an ISA in which the answer to that question would be
"Yes".
iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K
instruction bits in a segment, or 8K bytes. The idea was that no
subroutine or function ever needed to be bigger than that.
On one hand? "kill it with fire!".
On the other? By the time I existed, it was basically already dead...
432 is worthy of study if only to figure out "what not to do".
It is like, there are one of several ways things can go:
    Abysmal Turd: iAPX 432
    Turd: IA-64
    Turd that flies: x86, x86-64
    Sane: ARM, RISC-V, ...
    Sane, but still died: PowerPC, MIPS, SPARC, ...
interesting viewpoint
Indeed. MIPS had software TLB[*], SPARC had register windows [**]
          Alpha had a weak [***] memory ordering.
Fwiw, SPARC in RMO mode was weak, but not as weak as a damn Alpha. At
least SPARC honored data dependent load dependencies, ala implied
consume membars.
https://en.cppreference.com/w/cpp/atomic/memory_order.html
I guess C++26 waves good bye to it. But, still... if they make a consume
actually emit, say an acquire membar ala MEMBAR #LoadStore | #LoadLoad
on SPARC, or an acquire on any other platform, I would be pissed. If at
all, consume should give a warning and say C++26 does not like it
anymore. It should be just a compiler barrier unless on something like
an Alpha. Alpha, well shit out of luck? consume membar on Alpha would
emit a mb instruction for data dependent loads. C++ says the Alpha
can die?
It already did.
Well, I hope a C++ compiler does not treat a consume as an acquire
membar then! Grrrr.....
   All survive
          for backward compatibility and are generally
          eschewed for new designs.
[*] Didn't perform well in large scale SMP without
     tricks like page coloring.  The virtually tagged
     caches were troublesome.
[**] Register windows. Need I say more?
[***] Difficult to program multithreaded apps correctly, difficult to
       port software to from other more strongly ordered
       architectures.
Michael S <already5chosen@yahoo.com> schrieb:
It's Oracle of 2017 that we are talking about.
Total revenues : $ 37.728 B
Operating income: $ 12.710 B
Net income : $ 9.335 B
What is 0.2 B for such juggernaut ?
Not sure if you have ever worked for a big company...
Each major project and product line is judged on its own merit.
If the business case does not appear to be there, they will
not spend the money.
Just to put things in proportion, in November 2016 Oracle paid $9.3
billion USD for NetSuite.
They obviously thought it made sense. Let me qualify what I
wrote above. There is a saying "Where there is a will, there is a
business case." Such decisions are not always rational.
Thomas Koenig <tkoenig@netcologne.de> writes:
Michael S <already5chosen@yahoo.com> schrieb:
What is 0.2 B for such juggernaut ?
Each major project and product line is judged on its own merit.
If the business case does not appear to be there, they will
not spend the money.
I have to imagine that the board of Oracle was looking at a chip
design activity which would cost real money and play to no existing
strength of the company. Then would follow:
"Can we just run on chips made by somebody else?"
"Yes. But--"
"Thank you."
Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html
No AI was used in the composition of this message
On Thu, 19 Mar 2026 22:24:09 +0000, MitchAlsup wrote:
Even given all the snipped below text--unless you can get ISA finished
and get a compiler written, software ported and a microarchitecture built--it is nothing but a mental exercise.
I can't argue with that; it is indisputable.
But I needed to get the basis right before proceeding with the hard work.
John Savard
quadi <quadibloc@ca.invalid> posted:
On Thu, 19 Mar 2026 22:24:09 +0000, MitchAlsup wrote:
Even given all the snipped below text--unless you can get ISA finished
and get a compiler written, software ported and a microarchitecture
built--it is nothing but a mental exercise.
I can't argue with that; it is indisputable.
But I needed to get the basis right before proceeding with the hard work.
20 years ago I had similar notions. But my experience with My 66000
changed my mind.
a) you have to have a compiler that can compile most things
so that you can see the deficiencies with your ISA
b) you have to have /bin/utils/ mostly compiled to see whatever
damage you have done to yourself in terms of external variables
and functions; SBI into and out of your OS; SBI' into and out
of your HyperVisor, ...
And after using these for a while, go back and correct the ISA.
If you fix ISA too soon you cut off much of your future.
quadi <quadibloc@ca.invalid> posted:
On Thu, 19 Mar 2026 22:24:09 +0000, MitchAlsup wrote:
Even given all the snipped below text--unless you can get ISA
finished and get a compiler written, software ported and a
microarchitecture built--it is nothing but a mental exercise.
I can't argue with that; it is indisputable.
But I needed to get the basis right before proceeding with the hard
work.
20 years ago I had similar notions. But my experience with My 66000
changed my mind.
a) you have to have a compiler that can compile most things
so that you can see the deficiencies with your ISA
b) you have to have /bin/utils/ mostly compiled to see whatever
damage you have done to yourself in terms of external variables and
functions; SBI into and out of your OS; SBI' into and out of your
HyperVisor, ...
And after using these for a while, go back and correct the ISA.
If you fix ISA too soon you cut off much of your future.
But after acquisition, in 2010, they did expand the SPARC design team and
most likely greatly improved its financing. Only 7 years later they
decided to quit.
And I expect that in the decades of even larger performance
disadvantages for Sun's and Oracle's SPARCs enough customers had jumped
ship or at least decided to jump ship at the next opportunity that by
2017 not enough SPARC business was left to pay for its continued
development.
So eventually, what had killed general-purpose MIPS, Alpha, HPPA, and (contemporaneously with SPARC) IA-64 also killed SPARC: High
development costs of these architectures supported by not enough
customer interest, due to customers defecting to Linux on Intel/AMD.
Interesting that Power and S390x still persevere. For S390x I expect
that the customers are not price-sensitive, and lots of legacy code in assembly language tied to some proprietary OS makes porting extra-hard
and extra-risky.
For Power, maybe the AS/400 customers are in a
similar position.
For MIPS, HPPA and SPARC there were lots of Unix
customers for whom the jump to Linux was not that hard. For Alpha the
VMS customer base apparently was not captured with enough ties to
sustain Alpha (or later IA-64).
On 3/21/2026 12:06 AM, Anton Ertl wrote:
For Power, maybe the AS/400 customers are in a
similar position.
No, actually the opposite position. AS/400 user code uses a very high
level system (i.e. no user assembly code) that provides much of the work
and is proprietary to IBM. While that system could be, and was, ported
to a different architecture (e.g. from S/38 to Power), of course, IBM
has no incentive to port it to a generic platform.
./gforth-fast -i kernl32b.fi startup.fs onebench.fs #on SPARC
./gforth-fast onebench.fs #on AMD64
sieve bubble matrix fib fft
0.312 0.277 0.136 0.353 0.195 5GHz SPARC M8
0.020 0.020 0.012 0.030 0.012 5GHz Ryzen 8700G
More than a factor of 10 for each benchmark. My guess is that the M8
has deficiencies in indirect-branch prediction.
Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
On 3/21/2026 12:06 AM, Anton Ertl wrote:
For Power, maybe the AS/400 customers are in a
similar position.
No, actually the opposite position. AS/400 user code uses a very
high level system (i.e. no user assembly code) that provides much of
the work and is proprietary to IBM. While that system could be, and
was, ported to a different architecture (e.g. from S/38 to Power),
of course, IBM has no incentive to port it to a generic platform.
That's an interesting statement. IBM could implement AS/400 on AMD64
machines (AFAIK it uses some extra bit for tagging on their enhanced
Power, but I am sure that an implementation on ARM A64 with top-byte
ignore, or on AMD64 with similar features (don't remember the name),
would not incur too much overhead, if any). That would save them the
cost of continuing Power development.
So they probably think that they can charge AS/400 enough extra for
running on Power, that it more than makes up for the development costs
of Power. Why would AS/400 customers be willing to do that? My guess
is that the different architecture is successfully sold as a "secret
sauce" to them that justifies charging that much extra. Conversely,
if they just were given hardware with ARM, Intel or AMD CPUs and the
AS/400 (followon) OS, they would balk at the prices that IBM charges
them, even if IBM reduces these prices by their share in Power
development costs.
- anton
That's an interesting statement. IBM could implement AS/400 on AMD64
machines ...
So they probably think that they can charge AS/400 enough extra for
running on Power, that it more than makes up for the development costs
of Power. Why would AS/400 customers be willing to do that? ...
I am not sure that I follow your logic.
Is it based on the assumption that System i constitutes the bulk of IBM
POWER income? I somehow think that it does not.
According to Michael S <already5chosen@yahoo.com>:
That's an interesting statement. IBM could implement AS/400 on
AMD64 machines ...
That's probably true, give or take the fact that AS/400 and i are big
endian and AMD is little endian. POWER swings both ways, big endian
for i and little endian for p running AIX or linux.
So they probably think that they can charge AS/400 enough extra for
running on Power, that it more than makes up for the development
costs of Power. Why would AS/400 customers be willing to do that?
...
I am not sure that I follow your logic.
Is it based on assumption that System i constitutes a bulk of IBM
POWER income? I somehow think that it is does not.
IBM does not say, but I would think that most POWER machines are
running i.
If you want a linux server, there are a lot of
alternatives, mostly less expensive.
If you want i, you get it from
IBM.
They're not mutually exclusive. Recent i systems have a mode that can
run linux code, and POWER has a virtual machine hypervisor that can
run i and p virtual machines on the same system.
On Sat, 21 Mar 2026 16:21:15 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
So they probably think that they can charge AS/400 enough extra for
running on Power, that it more than makes up for the development costs
of Power. Why would AS/400 customers be willing to do that? My guess
is that the different architecture is successfully sold as a "secret
sauce" to them that justifies charging that much extra. Conversely,
if they just were given hardware with ARM, Intel or AMD CPUs and the
AS/400 (followon) OS, they would balk at the prices that IBM charges
them, even if IBM reduces these prices by their share in Power
development costs.
- anton
I am not sure that I follow your logic.
Is it based on the assumption that System i constitutes the bulk of IBM
POWER income?
According to Michael S <already5chosen@yahoo.com>:
That's an interesting statement. IBM could implement AS/400 on AMD64
machines ...
That's probably true, give or take the fact that AS/400 and i are big
endian and AMD is little endian.
POWER swings both ways, big endian for
i and little endian for p running AIX or linux.
And I know that there
used to be fairly substantial embedded sales (by Freescale). I don't
know if they get any royalties, etc., from that, or how dropping
development might affect that.
You have expressed your "distaste" of IBM marketing and their customers'
decisions before.
On Sun, 22 Mar 2026 01:12:58 -0000 (UTC)
John Levine <johnl@taugh.com> wrote:
IBM does not say, but I would think that most POWER machines are
running i.
I have no data, but if it was the case then why would IBM bother with
providing anything except the smallest POWER box?
Even the smallest box is probably huge overkill for i.
Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
On 3/21/2026 12:06 AM, Anton Ertl wrote:
For Power, maybe the AS/400 customers are in a
similar position.
No, actually the opposite position. AS/400 user code uses a very high
level system (i.e. no user assembly code) that provides much of the work
and is proprietary to IBM. While that system could be, and was, ported
to a different architecture (e.g. from S/38 to Power), of course, IBM
has no incentive to port it to a generic platform.
That's an interesting statement. IBM could implement AS/400 on AMD64 machines (AFAIK it uses some extra bit for tagging on their enhanced
Power, but I am sure that an implementation on ARM A64 with top-byte
ignore and on AMD64 with similar features (don't remember the name)
would not incur too much overhead, if any). That would save them the
cost of continuing Power development.
Yes, but to make the decision correctly requires knowledge held by IBM management, that neither you nor I have. For example, they haven't made major improvements in the Power architecture for years, so I suspect
their development team is rather small, and thus doesn't cost too much.
Is anybody still doing Alpha?
Tanenbaum in his book about computer architecture mentions results
of Bell and Newell from 1971. IIUC Bell and Newell coded
some programs in the microcode of the 2025 (the microcode engine of
the 360/25). The claim is that such programs ran 45 times faster than
programs using the 360 instruction set.
They also created a Fortran compiler/
interpreter combination with the interpreter coded in 2025
microcode. They claimed that this Fortran ran at comparable
speed to "native" Fortran on the 360/50.