Forum: War Ensemble BBS

Concertina II: Finding Happiness Through Coding

From quadi@quadibloc@ca.invalid to comp.arch on Wed Apr 22 03:31:03 2026

From Newsgroup: comp.arch

I was not happy that when I did not use a block prefix, I had to omit the
Load Medium and Store Medium instructions from the basic load/store instructions.

I searched for available opcode space.

I found a little; enough for the _other_ block prefixes. But not a full
1/16 of the opcode space which is what the Type I header needed. Where did
I find it? In the opcodes for operate instructions which the 15-bit paired short instructions don't use.

So I thought that perhaps I could shrink the requirements of the Type I header. By making use of the fact that 10 (start of 32-bit or longer instruction) can only be followed by 11 (not the start of an instruction), maybe I could replace four consecutive two-bit prefixes with one seven-bit prefix.

But alas, this fact only reduced the possibilities to 81 + 27 + 27 + 1,
which is 136, which is greater than 128.

However, if I made use of the fact that I would know if the preceding 16-bit
zone began a 32-bit instruction, and added certain other restrictions
on the allowed combinations - by insisting that all pseudo-immediates be tidily put at the end of the block - I thought I was able to squeeze it in.

This may be a step too far, so I've saved everything if I need to go back.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Apr 22 18:15:17 2026


quadi <quadibloc@ca.invalid> posted:

I was not happy that when I did not use a block prefix, I had to omit the Load Medium and Store Medium instructions from the basic load/store instructions.

Is LD Medium obtaining a sooth sayer from memory?

Is ST Medium putting a sooth sayer back in memory?

How do you know a sooth sayer fits in 2^(3+n) bytes???

I searched for available opcode space.

I found a little; enough for the _other_ block prefixes. But not a full
1/16 of the opcode space which is what the Type I header needed. Where did
I find it? In the opcodes for operate instructions which the 15-bit paired short instructions don't use.

So I thought that perhaps I could shrink the requirements of the Type I header. By making use of the fact that 10 (start of 32-bit or longer instruction) can only be followed by 11 (not the start of an instruction), maybe I could replace four consecutive two-bit prefixes with one seven-bit prefix.

But alas, this fact only reduced the possibilities to 81 + 27 + 27 + 1, which is 136, which is greater than 128.

However, if I made use of the fact that I would know if the preceding 16-bit
zone began a 32-bit instruction, and added certain other restrictions
on the allowed combinations - by insisting that all pseudo-immediates be tidily put at the end of the block - I thought I was able to squeeze it in.

This may be a step too far, so I've saved everything if I need to go back.

John Savard


From quadi@quadibloc@ca.invalid to comp.arch on Thu Apr 23 03:35:44 2026


On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

I was not happy that when I did not use a block prefix, I had to omit
the Load Medium and Store Medium instructions from the basic load/store
instructions.

Is LD Medium obtaining a sooth sayer from memory?

Is ST Medium putting a sooth sayer back in memory?

How do you know a sooth sayer fits in 2^(3+n) bytes???

No, I am not referring to one who channels the spirits of the dead.

Instead, the Medium data type refers to 48-bit floating-point values;
although not part of the IEEE 754 standard, they follow the pattern of the types defined in it. They offer a precision just above 11 decimal digits,
and an exponent range that exceeds 10 to plus or minus 99, thus
approximating the numbers pocket calculators make available.
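
As a rough check on those figures, here is a minimal sketch. The field widths are not given in the post, so the split used here (1 sign bit, 11 exponent bits, 36 stored fraction bits, i.e. Binary64 with the low 16 fraction bits dropped) is purely an assumption for illustration.

```python
import math

# Assumed field widths for a 48-bit "Medium" float (NOT stated in the post).
SIGN_BITS = 1
EXP_BITS = 11                               # same width as Binary64 (assumption)
FRAC_BITS = 48 - SIGN_BITS - EXP_BITS       # 36 stored fraction bits

def decimal_digits(frac_bits: int) -> float:
    """Decimal digits carried by frac_bits stored bits plus the hidden bit."""
    return (frac_bits + 1) * math.log10(2)

def max_decimal_exponent(exp_bits: int) -> int:
    """Approximate largest decimal exponent, floor(log10(2**(emax + 1)))."""
    emax = 2 ** (exp_bits - 1) - 1          # IEEE-style bias, 1023 for 11 bits
    return math.floor((emax + 1) * math.log10(2))

medium_digits = decimal_digits(FRAC_BITS)
medium_exp10 = max_decimal_exponent(EXP_BITS)
print(f"{medium_digits:.2f} digits, decimal exponent up to about {medium_exp10}")
```

Under this assumed layout the precision comes out slightly above 11 digits and the exponent range comfortably exceeds 10 to plus or minus 99; a different exponent width would shift both numbers.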

John Savard

From BGB@cr88192@gmail.com to comp.arch on Thu Apr 23 17:17:48 2026


On 4/22/2026 10:35 PM, quadi wrote:

On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

I was not happy that when I did not use a block prefix, I had to omit
the Load Medium and Store Medium instructions from the basic load/store
instructions.

Is LD Medium obtaining a sooth sayer from memory?

Is ST Medium putting a sooth sayer back in memory?

How do you know a sooth sayer fits in 2^(3+n) bytes???

No, I am not referring to one who channels the spirits of the dead.

Instead, the Medium data type refers to 48-bit floating-point values; although not part of the IEEE 754 standard, they follow the pattern of the types defined in it. They offer a precision just above 11 decimal digits,
and an exponent range that exceeds 10 to plus or minus 99, thus
approximating the numbers pocket calculators make available.

Ironically, I had considered an intermediate format a few times, mostly represented as the Binary64 format with the low-order bits cut off.

Mostly hadn't amounted to much.

I did end up experimenting with support for a very niche converter:
(31:0) => (63:0)
As:
(31:4), (11:4), (11:4), (11:4), (11:4), (3:0)

Currently only available in an Imm32 instruction.

Seemingly, this pattern can deal with roughly 2/3 of the FPU constants
that miss as Binary16:
Multiples of 1/3, 1/5 and similar hit with this.

It fails for patterns like 1/7, 1/9, ... or similar, which have a
different bit pattern length (pattern doesn't repeat along an 8-bit
spacing).
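
The converter described above can be sketched in software as follows. This is only an illustration of the bit layout (31:4), (11:4), (11:4), (11:4), (11:4), (3:0); the function names are hypothetical, and a "hit" here simply means the expanded value reproduces the Binary64 constant exactly.

```python
import struct

def expand_8b(imm32: int) -> int:
    """Expand Imm32 to 64 bits: (31:4), (11:4) x4, (3:0)."""
    hi28 = (imm32 >> 4) & 0xFFFFFFF     # bits 31:4 -> result bits 63:36
    pat8 = (imm32 >> 4) & 0xFF          # bits 11:4, the repeating pattern
    lo4 = imm32 & 0xF                   # bits 3:0 -> result bits 3:0
    rep = pat8 * 0x01010101             # four copies -> result bits 35:4
    return (hi28 << 36) | (rep << 4) | lo4

def double_bits(x: float) -> int:
    """Raw Binary64 bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def encode_8b(x: float):
    """Return the Imm32 that expands exactly to x, or None on a miss."""
    bits = double_bits(x)
    # The top 28 and low 4 bits of the result come straight from Imm32,
    # so there is only one candidate encoding to check.
    imm32 = ((bits >> 36) << 4) | (bits & 0xF)
    return imm32 if expand_8b(imm32) == bits else None

for value in (1 / 3, 1 / 5, 1 / 7):
    enc = encode_8b(value)
    print(value, "miss" if enc is None else hex(enc))
```

In this sketch 1/3 and 1/5 encode, while 1/7 and 1/9 do not, which matches the behaviour described above.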

Patterns like 1/7, 1/9, ... could be instead addressed with a pattern
that repeats on a multiple of 12 bits. But, this sort of thing is
getting a bit niche (would need different patterns to deal with
different fractions).

But, is a relatively affordable way to deal with this pattern; even if
it can't be crammed into a small size in the same way as simple BFP
patterns (and encoding an index into a table of possible patterns won't
save much over expressing the pattern directly).

Also, the 12-bit pattern case can be noted to miss more with patterns
that would hit with 8-bit or with binary16 (the 8-bit pattern case
mostly overlaps as well with the area covered by Binary16). A 6-bit
pattern could still overlap with Binary16's range, but would be more
limited in the fractions it can deal with.

Only really relevant for constant values though (as a live FP format,
would be worse than normal BFP).

Though, can make use of the extra bit left over from the Imm32f
encodings (which are actually stored as Imm33). More a debate though of
if it is worth the non-zero additional LUT cost to do so.

But, this combination would leave, statistically:
Imm16f: 63%
Imm6f: 25% (S.E3.M2)
Imm32fu: 71% (8% over 63%, simply Binary64 truncated to 32 bits)
Imm32fn: 88% (25% hit rate over 63%, 8-bit pattern from above)

...

While Imm32fn has a higher hit rate than Imm32fu, they have a
non-overlap, so the combined Imm32fun in this case seems to have around
a 96% hit-rate, with around 4% in the "miss" category (irrational
constants, and stuff like 1/7 which has a 3-bit repeating pattern, vs
2-bit for 1/3 and 4-bit for 1/5).

If I added the 12-bit pattern (in addition to the existing two), could
maybe push it up to around a 97% or 98% hit rate, but the 12-bit pattern
by itself has a lower hit-rate than simply truncating the Binary64 value
to 32 bits, or even Binary16. So, selecting between 8b+12b pattern would
do worse than trunc32 + 8b pattern.

But, dunno.

However, the relative usage of floating point immediate values is low
enough that this doesn't make a big impact on code density.

Not much more "low hanging fruit" for improving code density ATM, but it
seems like if I could squeeze out a few more percent on overall code
density, it could put XG3 more solidly in the lead vs RV64GC+JX (where,
right now it is pretty close and which one wins/loses depends a lot on
the program being tested).

...


From BGB@cr88192@gmail.com to comp.arch on Thu Apr 23 18:08:04 2026


On 4/23/2026 5:17 PM, BGB wrote:

On 4/22/2026 10:35 PM, quadi wrote:

On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

I was not happy that when I did not use a block prefix, I had to omit
the Load Medium and Store Medium instructions from the basic load/store instructions.

Is LD Medium obtaining a sooth sayer from memory?

Is ST Medium putting a sooth sayer back in memory?

How do you know a sooth sayer fits in 2^(3+n) bytes???

No, I am not referring to one who channels the spirits of the dead.

Instead, the Medium data type refers to 48-bit floating-point values;
although not part of the IEEE 754 standard, they follow the pattern of the
types defined in it. They offer a precision just above 11 decimal digits,
and an exponent range that exceeds 10 to plus or minus 99, thus
approximating the numbers pocket calculators make available.

Ironically, I had considered an intermediate format a few times, mostly represented as the Binary64 format with the low-order bits cut off.

Mostly hadn't amounted to much.

I did end up experimenting with support for a very niche converter:
(31:0) => (63:0)
As:
(31:4), (11:4), (11:4), (11:4), (11:4), (3:0)

Currently only available in an Imm32 instruction.

Seemingly, this pattern can deal with roughly 2/3 of the FPU constants
that miss as Binary16:
Multiples of 1/3, 1/5 and similar hit with this.

It fails for patterns like 1/7, 1/9, ... or similar, which have a
different bit pattern length (pattern doesn't repeat along an 8-bit spacing).

Patterns like 1/7, 1/9, ... could be instead addressed with a pattern
that repeats on a multiple of 12 bits. But, this sort of thing is
getting a bit niche (would need different patterns to deal with
different fractions).

But, is a relatively affordable way to deal with this pattern; even if
it can't be crammed into a small size in the same way as simple BFP
patterns (and encoding an index into a table of possible patterns won't
save much over expressing the pattern directly).

Also, the 12-bit pattern case can be noted to miss more with patterns
that would hit with 8-bit or with binary16 (the 8-bit pattern case
mostly overlaps as well with the area covered by Binary16). A 6-bit
pattern could still overlap with Binary16's range, but would be more
limited in the fractions it can deal with.

Only really relevant for constant values though (as a live FP format,
would be worse than normal BFP).

Though, can make use of the extra bit left over from the Imm32f
encodings (which are actually stored as Imm33). More a debate though of
if it is worth the non-zero additional LUT cost to do so.

But, this combination would leave, statistically:
Imm16f: 63%
Imm6f: 25% (S.E3.M2)
Imm32fu: 71% (8% over 63%, simply Binary64 truncated to 32 bits)
Imm32fn: 88% (25% hit rate over 63%, 8-bit pattern from above)

...

While Imm32fn has a higher hit rate than Imm32fu, they have a
non-overlap, so the combined Imm32fun in this case seems to have around a
96% hit-rate, with around 4% in the "miss" category (irrational
constants, and stuff like 1/7 which has a 3-bit repeating pattern, vs
2-bit for 1/3 and 4-bit for 1/5).

If I added the 12-bit pattern (in addition to the existing two), could
maybe push it up to around a 97% or 98% hit rate, but the 12-bit pattern
by itself has a lower hit-rate than simply truncating the Binary64 value
to 32 bits, or even Binary16. So, selecting between 8b+12b pattern would
do worse than trunc32 + 8b pattern.

Relevant, but failed to mention, 12-bit pattern:
(31:4), (15:4), (15:4), (15:8), (3:0)

Which is effectively S.E11.M4 apart from the bits forming the pattern.
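
A minimal software sketch of this 12-bit layout, (31:4), (15:4), (15:4), (15:8), (3:0), is given below. The function names are hypothetical, and a "hit" again means exact reproduction of the Binary64 bit pattern.

```python
import math
import struct

def expand_12b(imm32: int) -> int:
    """Expand Imm32 to 64 bits: (31:4), (15:4), (15:4), (15:8), (3:0)."""
    hi28 = (imm32 >> 4) & 0xFFFFFFF     # bits 31:4  -> result bits 63:36
    pat12 = (imm32 >> 4) & 0xFFF        # bits 15:4  -> result bits 35:24, 23:12
    top8 = (imm32 >> 8) & 0xFF          # bits 15:8  -> result bits 11:4
    lo4 = imm32 & 0xF                   # bits 3:0   -> result bits 3:0
    return (hi28 << 36) | (pat12 << 24) | (pat12 << 12) | (top8 << 4) | lo4

def double_bits(x: float) -> int:
    """Raw Binary64 bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def encode_12b(x: float):
    """Return the Imm32 that expands exactly to x, or None on a miss."""
    bits = double_bits(x)
    imm32 = ((bits >> 36) << 4) | (bits & 0xF)
    return imm32 if expand_12b(imm32) == bits else None

for value in (1 / 7, 1 / 9, math.pi):
    enc = encode_12b(value)
    print(value, "miss" if enc is None else hex(enc))
```

Here 1/7 and 1/9 encode, while an irrational constant such as pi does not. Because the pattern occupies 12 of the 28 explicit bits, only 4 mantissa bits are free ahead of it.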

Though, could maybe see if there are some other patterns that could do
better here in terms of average hit-rate.

Or, if the seeming relative success of truncation and the 8-bit pattern
is more of a "take it as good enough and leave it at that" thing.

My last "big survey of floating point constants" had failed to take into account stats for repeating bit patterns (hadn't thought of trying to go
this route at the time; had thought in terms of power-of-10 scaling, but
this was much less feasible than trying to account more directly for the repeating bit patterns within the fractions).

As noted, a repeating pattern can in principle deal with all smaller
patterns whose length divides its own:
12 can handle 2, 3, 4, 6;
8 can handle 2, 4, 8.
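
The divisibility rule can be checked with a short sketch. It uses the standard fact that the repeating bit pattern of 1/n (n odd) has a length equal to the multiplicative order of 2 modulo n; the function names are hypothetical, and the sketch ignores the rounding of the final mantissa bits.

```python
def binary_period(n: int) -> int:
    """Length of the repeating bit pattern of 1/n for odd n > 1
    (the multiplicative order of 2 modulo n)."""
    if n <= 1 or n % 2 == 0:
        raise ValueError("n must be odd and greater than 1")
    period, power = 1, 2 % n
    while power != 1:
        power = (power * 2) % n
        period += 1
    return period

def pattern_fits(n: int, width: int) -> bool:
    """True if a repeating field of `width` bits can hold 1/n's pattern."""
    return width % binary_period(n) == 0

for n in (3, 5, 7, 9):
    print(n, binary_period(n), pattern_fits(n, 8), pattern_fits(n, 12))
```

By this measure 1/3 has period 2 and 1/5 has period 4 (both divide 8 and 12), while 1/7 has period 3 and 1/9 has period 6 (both divide 12 but not 8).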

Downfall of the 12-bit pattern is that it doesn't leave enough bits in
the mantissa for the top-end value.

Though, could squeeze a few bits out of the exponent:
(31:30),
(30)?4'h0:4'hF,
(29:4),
(15:4), (15:4), (15:12), (3:0)

Effectively giving 8-bits of usable mantissa.

Or, maybe sacrifice the sign bit (almost always positive for FPU
immediate values):
1'b0, (30),
(30) ? 4'h0 : 4'hF,
(29:4),
(31) ?
{ (15:4), (15:4), (15:12) } :
{ (11:4), (11:4), (11:4), (11:8) },
(3:0)

Haven't evaluated these possibilities yet though to determine possible
effects on hit-rate...

But, dunno.

However, the relative usage of floating point immediate values is low
enough that this doesn't make a big impact on code density.

Not much more "low hanging fruit" for improving code density ATM, but it seems like if I could squeeze out a few more percent on overall code density, it could put XG3 more solidly in the lead vs RV64GC+JX (where, right now it is pretty close and which one wins/loses depends a lot on
the program being tested).

...


From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 24 05:29:12 2026


MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

quadi <quadibloc@ca.invalid> posted:

I was not happy that when I did not use a block prefix, I had to omit the
Load Medium and Store Medium instructions from the basic load/store
instructions.

Is LD Medium obtaining a sooth sayer from memory?

Is ST Medium putting a sooth sayer back in memory?

Obviously, this refers to steaks.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 24 12:01:34 2026


On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

Obviously, this refers to steaks.

In a higher-level language, one has:

Real
Intermediate
Double Precision
Extended

But in Assembler, one needs

Floating
Medium
Double
Extended

because R for Real can be confused with R for Register, and I for
Intermediate can be confused with I for Integer.

John Savard

From Robert Finch@robfi680@gmail.com to comp.arch on Fri Apr 24 10:53:19 2026


On 2026-04-24 8:01 a.m., quadi wrote:

On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

Obviously, this refers to steaks.

In a higher-level language, one has:

Real
Intermediate
Double Precision
Extended

But in Assembler, one needs

Floating
Medium
Double
Extended

because R for Real can be confused with R for Register, and I for Intermediate can be confused with I for Integer.

John Savard

What about triple and quad precision? Or extended triple precision?

For Arpl at one point the float precision could be specified a bit like bitfields are specified in ‘C’ as in:
Float:8 myvar;

Changed it though to standard types as it was undesirable to support any bit-length for floats which would have to be done with software. Now it
is just:

float byte myvar;
float quad qvar;

Can also use shorter form for some types like:
double dvar;
Instead of having to type ‘float double dvar;’

Some float approximations will supply around 7 bits which works well to
fill in the significand for the progression of 16, 32, 64, 128-bit floats.

Having a 48-bit float type likely does not save any processing time over
a 64-bit type. It is more a matter of storage space.

48-bit floats in arrays may slow down indexed addressing; scaled index
address modes are usually a power of two.
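
To illustrate the indexing point, here is a small sketch comparing the offset computation for power-of-two element sizes with that for 6-byte (48-bit) elements. The function names are hypothetical; the point is only that a 6-byte stride needs two shifts and an add (or a multiply) rather than a single shift.

```python
def offset_pow2(index: int, log2_size: int) -> int:
    """Byte offset for power-of-two element sizes: a single shift."""
    return index << log2_size

def offset_48bit(index: int) -> int:
    """Byte offset for 6-byte elements: index * 6 = index * 4 + index * 2."""
    return (index << 2) + (index << 1)

print(offset_pow2(5, 3))    # 64-bit elements, element 5
print(offset_48bit(5))      # 48-bit elements, element 5
```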


From BGB@cr88192@gmail.com to comp.arch on Fri Apr 24 10:22:45 2026


On 4/24/2026 7:01 AM, quadi wrote:

On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

Obviously, this refers to steaks.

In a higher-level language, one has:

Real
Intermediate
Double Precision
Extended

But in Assembler, one needs

Floating
Medium
Double
Extended

because R for Real can be confused with R for Register, and I for Intermediate can be confused with I for Integer.

I went with:
H: Half
F/S: Float or Single
D: Double
X: 128-bit (beyond this depends on context)

RV used Q for Binary128, but Q was more widely used for Int64 in my naming.

Int naming:
B/SB/UB: Byte
W/SW/UW: Int16 ("word")
L/SL/UL: Int32 ("long")
T/ST/UT: Int48 ("tword" / triple word), short lived
Q: Int64 ("qword")

RV had used:
B/H/{W|S}/D/Q

...

Did look into 48b Load/Store ops, but didn't stick.
Could have supported a 48-bit format mostly by using 48b Load/Store.
Other option being to fake it by using 64b ops, and MUX'ing.
Load, MUX, Store
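
A rough sketch of the "fake it with 64b ops" approach, modelling memory as a little-endian byte array. The helper names and the little-endian layout are assumptions for illustration, not something stated above.

```python
MASK48 = (1 << 48) - 1
MASK64 = (1 << 64) - 1

def load48(mem: bytearray, addr: int) -> int:
    """48-bit load done as a 64-bit load plus a mask."""
    word = int.from_bytes(mem[addr:addr + 8], "little")
    return word & MASK48

def store48(mem: bytearray, addr: int, value: int) -> None:
    """48-bit store done as Load, MUX, Store on a 64-bit word.
    The caller must ensure addr + 8 does not run past the end of mem."""
    word = int.from_bytes(mem[addr:addr + 8], "little")      # Load
    merged = (word & ~MASK48 & MASK64) | (value & MASK48)    # MUX
    mem[addr:addr + 8] = merged.to_bytes(8, "little")        # Store

mem = bytearray(range(16))
store48(mem, 6, 0x112233445566)
print(mem.hex())
```

The two bytes above the 48-bit field are read and written back unchanged, which is where the extra cost relative to a native 48-bit store comes from.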

Less efficient, but TW was super niche and hard to justify keeping it
around.

But, yeah, otherwise disrupted by a PSU failure on main PC (yesterday).
Waiting for a new PSU to show up, can't get back to "business as usual"
until then. Failed PSU was a 750W Rosewill, ordered a 750W MSI,
hopefully works... Was $30 more than another Rosewill, but hopefully
worth it (there were also much more expensive PSUs, I just didn't go for
the cheapest option in this case, but yeah).

Lots of bad luck in general yesterday.



From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 24 16:08:03 2026


On Fri, 24 Apr 2026 10:53:19 -0400, Robert Finch wrote:

What about triple and quad precision? Or extended triple precision?

There is quad precision, referred to as extended precision.
Normally, there is no 96-bit triple precision. That may, however, make an appearance when the computer is working with storage divided into 48-bit
units instead of 32/64-bit units.

John Savard

From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 24 16:12:26 2026


On Wed, 22 Apr 2026 03:31:03 +0000, quadi wrote:

This may be a step too far, so I've saved everything if I need to go
back.

While I had tried to organize the coding scheme, I decided that it was too complex to be tolerable.

The compromise of eliminating medium format floating-point loads and
stores from the default basic instruction set was not tolerable.

The compromise to the 15-bit paired instructions that preceded that was
also not tolerable.

So what to do? What I've been doing all along in this design process -
move the compromise somewhere else, and see if I can put up with it. So
now I've decided to take the 32-bit header for variable-length
instructions, and put the compromise there. This required bringing back
16-bit short instructions.

John Savard

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Apr 24 18:52:07 2026


BGB <cr88192@gmail.com> posted:

On 4/24/2026 7:01 AM, quadi wrote:

--------------------

I went with:
H: Half
F/S: Float or Single
D: Double
X: 128-bit (beyond this depends on context)

RV used Q for Binary128, but Q was more widely used for Int64 in my naming.

Int naming:
B/SB/UB: Byte
W/SW/UW: Int16 ("word")
L/SL/UL: Int32 ("long")
T/ST/UT: Int48 ("tword" / triple word), short lived
Q: Int64 ("qword")

RV had used:
B/H/{W|S}/D/Q

This is what I use. Except I have signed and unsigned integer
arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
floats.



From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Apr 24 09:28:31 2026


I was not happy that when I did not use a block prefix, I had to omit the
Load Medium and Store Medium instructions from the basic load/store
instructions.

Is LD Medium obtaining a sooth sayer from memory?
Is ST Medium putting a sooth sayer back in memory?

Obviously, this refers to steaks.

But these operations are too rare to include in usual ISAs,

=== Stefan

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 24 20:05:45 2026


Stefan Monnier <monnier@iro.umontreal.ca> schrieb:

I was not happy that when I did not use a block prefix, I had to omit the
Load Medium and Store Medium instructions from the basic load/store
instructions.

Is LD Medium obtaining a sooth sayer from memory?
Is ST Medium putting a sooth sayer back in memory?

Obviously, this refers to steaks.

But these operations are too rare to include in usual ISAs,

Well done!
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 25 01:48:04 2026


Thomas Koenig <tkoenig@netcologne.de> posted:

Stefan Monnier <monnier@iro.umontreal.ca> schrieb:

I was not happy that when I did not use a block prefix, I had to omit the
Load Medium and Store Medium instructions from the basic load/store instructions.

Is LD Medium obtaining a sooth sayer from memory?
Is ST Medium putting a sooth sayer back in memory?

Obviously, this refers to steaks.

But these operations are too rare to include in usual ISAs,

Well done!

You can sous vidé

From quadi@quadibloc@ca.invalid to comp.arch on Sat Apr 25 02:08:10 2026


On Fri, 24 Apr 2026 16:12:26 +0000, quadi wrote:

So what to do? What I've been doing all along in this design process -
move the compromise somewhere else, and see if I can put up with it. So
now I've decided to take the 32-bit header for variable-length
instructions, and put the compromise there.

Something truly evil has occurred to me. But since it _is_ so evil, I
don't think that I will do it.

Instead of removing the LM (Load Medium) and STM (Store Medium) basic memory-reference instructions...

there are another two that, under a certain circumstance, could be removed.

Under that circumstance, there would still be IB (Insert Byte) and IH
(Insert Halfword), and ULB (Unsigned Load Byte) and ULH (Unsigned Load Halfword).

But _not_ I (Insert) and UL (Unsigned Load).

In the case of a *32-bit* architecture.

So the truly evil thing would be...

1) To decide that a 32-bit version of the architecture needs to be defined;
2) To decide that it should be the default;
3) To decide that not switching modes to get at instruction set extensions should apply to the switch between 32 bits and 64 bits too, so that 64-bit code would consist _entirely_ of instruction blocks that begin with a
block header, because instructions without a header could only be in one state, that being 32-bits.

Of course, it's 3) that exposes the true evilness of this scheme. So I
don't think it's a place I want to go.

However, let's say I do want to define 32-bit operation, but _with_ a mode bit.

Then in 32-bit mode, those two opcodes would get used for an uncompromised variable-length instruction header.

Now what is 64-bit mode going to look like? That would almost force going
back to the option of demoting LM and STM. Or does it mean I have to come
up with something more devious, something truly perverse, that somehow provides a headerless header, sneaking in an invisible mode bit in the
code itself? But there's no such thing as a free bit; they're like midday meals in this regard.

The System/360 had it simple - extra instructions needed for 64-bit
operation? Just shove them in the 64-bit opcode space. But in Concertina
II, instructions longer than 32 bits are somewhat wasteful in overhead,
though I've tried... and they're _only_ available with a particular
category of headers. So I feel I need to have everything _important_ in
the basic 32-bit instruction set.

This direction of thinking suggests... that I use some of the opcode space
I still do have free... for special 64-bit instructions that are available without a header. This has been done before in previous Concertina II iterations. Emergency long instructions - inefficient because _both_ 32-bit words of the instruction have to begin with 9 or so overhead bits to indicate they belong to such an instruction... but less inefficient than adding a whole 32-bit header to the block if you just need one of them in
the block.

That way, I can add lots of extra instructions to be part of the basic headerless instruction set.

While such a capability may be a good thing in itself, though, using it as
an excuse to uglify the basic instruction set is _still_ something I would want to avoid, so I don't think it solves the problem of restoring the 32-bit
variable-length instruction header to its uncompromised glory.

John Savard

From quadi@quadibloc@ca.invalid to comp.arch on Sat Apr 25 03:43:29 2026


On Sat, 25 Apr 2026 02:08:10 +0000, quadi wrote:

This direction of thinking suggests... that I use some of the opcode
space I still do have free... for special 64-bit instructions that are available without a header. This has been done before in previous
Concertina II iterations. Emergency long instructions - inefficient
because _both_ 32-bit words of the instruction have to begin with 9 or
so overhead bits to indicate they belong to such an instruction... but
less inefficient than adding a whole 32-bit header to the block if you
just need one of them in the block.

That way, I can add lots of extra instructions to be part of the basic headerless instruction set.

Thinking about this led me to do something completely different, which required me to use a bit more opcode space for headers - but I think I
left enough where I grabbed it from to still do this as well.

This was done to add some additional flexibility to one type of variable-length
code - now the _short_ instructions, instead of the memory-reference
operate instructions, can be given the power to alter the
condition codes.

This comes at a cost, though. Now the memory-reference operate
instructions can only use the first half of each register bank as
destination registers, and the header no longer provides access to a
secondary 32-bit instruction set as well, if this option is chosen.

While this could be very useful, by allowing short instructions to be used
in cases where they previously could not, it's still somewhat perverse,
like much else I have done in this and previous iterations of Concertina
II.

John Savard

From BGB@cr88192@gmail.com to comp.arch on Sat Apr 25 00:38:35 2026


On 4/24/2026 1:52 PM, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

On 4/24/2026 7:01 AM, quadi wrote:

--------------------

I went with:
H: Half
F/S: Float or Single
D: Double
X: 128-bit (beyond this depends on context)

RV used Q for Binary128, but Q was more widely used for Int64 in my naming.

Int naming:
B/SB/UB: Byte
W/SW/UW: Int16 ("word")
L/SL/UL: Int32 ("long")
T/ST/UT: Int48 ("tword" / triple word), short lived
Q: Int64 ("qword")

RV had used:
B/H/{W|S}/D/Q

This is what I use. Except I have signed and unsigned integer
arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
floats.

It likely depends on which "tradition" one is coming from.

In my case, I was coming from SH-4 and x86.
SH-4 was B/W/L (likewise for M68K and i386 syntax in GAS).
Though, differs from M68K and "i386" in various ways
(eg, no "%" on registers, ...).
Well, and 0x1234 vs $1234 or similar.
Eg: "mov 0x1234, r10" vs "mov #$1234, %d4"
But, seems even within GAS usage, this was inconsistent.
Q/X: from x86 (though x86 also used DQ instead of X for some ops).

At present, it seems like 'X' may have been a mistake (well, along with
trying to use both sets of mnemonics and then trying to auto-detect the
ASM style).

Though, there is still the problem that there is no good or fully
reliable way to tell which ASM syntax is in use (neither
annotates it, and since both evolved from variants of GAS ASM syntax, it
is harder to tell them apart).

...

Well, and I guess one could try to argue the merits of, say:
0x1234
$1234
1234H
&H1234
#0x1234
#$1234
16'h1234
...
And, say:
(R10, 16)
16(R10)
[R10+16]
[R10,16]
...

Otherwise:
New PSU showed up, and is installed, and main PC is working again.

Decided to test the new decimal packing schemes against the "bulk
scavenged FP constants" test, results currently for this test;
Binary16 hit rate : 63.7%
Truncated to 32 bits: 66.9%
Packing, 8b-A: 73.9%
Packing, 8b-B: 62.5%
Packing, 12b : 61.3%
T32 + 8b-B + 12b: 77.2%
T32 + 8b-A: 76.9%

This is lower than my earlier estimates based on my smaller scale tests.

Where, as noted, unpacking patterns:
Fp16: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0
T32: (31:0), 32'h0
8b-A: (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)
8b-B: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
(11:4), (11:4), (11:4), (11:8), (3:0)
12: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
(15:4), (15:4), (15:12), (3:0)
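
As an illustration, the Fp16 line above can be followed literally in software. This sketch (with hypothetical function names) applies only the listed bit pattern, so it makes no attempt at special handling of zero, subnormals, Inf, or NaN; how those cases are treated is not stated here.

```python
import struct

def unpack_fp16(h: int) -> int:
    """Fp16 -> Binary64 bits: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0."""
    top2 = (h >> 14) & 0x3                      # sign and top exponent bit
    fill = 0x00 if (h >> 14) & 1 else 0x3F      # widen 5-bit exponent to 11 bits
    low14 = h & 0x3FFF                          # low 4 exponent bits + mantissa
    return (top2 << 62) | (fill << 56) | (low14 << 42)

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a Binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

for half in (0x3C00, 0x4000, 0x3800, 0x3E00, 0xC000):
    print(hex(half), bits_to_double(unpack_fp16(half)))
```

The fill bits are the inverse of the top exponent bit, which rebiases the 5-bit exponent to 11 bits without an adder.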

The T32 + 8b-A case has nearly the same hit rate, but is cheaper (and,
is also what I had already implemented experimentally).

While T32 + 8B-A + 12b could potentially give the highest hit rate, this combination would also be the most expensive. And, without the exponent trickery, the hit-rate for 12b will suck.

But, as-is, would be exclusive to XG3 (XG1/XG2/RV being limited to the
Fp16 case for FPU immediate forms).

Still debatable if worth the costs (while it is improvement in hit rate,
it is also a bit of a corner case).

...



From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 25 18:00:22 2026


BGB <cr88192@gmail.com> posted:

On 4/24/2026 1:52 PM, MitchAlsup wrote:

------------------

This is what I use. Except I have signed and unsigned integer
arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
floats.

It likely depends on which "tradition" one is coming from.

IBM 360, 1963.
------------------

Well, and I guess one could try to argue the merits of, say:
0x1234
$1234
1234H
&H1234
#0x1234
#$1234
16'h1234

Use C notation when possible.

...
And, say:
(R10, 16)
16(R10)
[R10+16]
[R10,16]
...

The [] notations tell ASM that the instruction has to be a
memory reference, the () notations do not.


From BGB@cr88192@gmail.com to comp.arch on Sat Apr 25 13:39:25 2026


On 4/25/2026 1:00 PM, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

On 4/24/2026 1:52 PM, MitchAlsup wrote:

------------------

This is what I use. Except I have signed and unsigned integer
arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
floats.

It likely depends on which "tradition" one is coming from.

IBM 360, 1963.

OK.

------------------

Well, and I guess one could try to argue the merits of, say:
0x1234
$1234
1234H
&H1234
#0x1234
#$1234
16'h1234

Use C notation when possible.

That is my preference (I usually use 0x1234 without any extra
adornment), usually...

Except that the 6502/65C816 and M68K fans seem to really like using
$1234 instead...

Stylistically, I think the 6502 ASM notation was influenced by Motorola (though differs somewhat from M68K notation).

Likewise, GAS's i386 syntax was likely influenced by the M68K ASM syntax.

...
And, say:
(R10, 16)
16(R10)
[R10+16]
[R10,16]
...

The [] notations tell ASM that the instruction has to be a
memory reference, the () notations do not.

Could be.
I think () comes mainly from the PDP/VAX/M68K style lineage...
Whereas [] were used more with Intel and ARM and similar.

As noted, I ended up preferring (Rb, Disp) over Disp(Rb), but RISC-V's standard ASM syntax went the other way.

Neither ended up using @Rb syntax though, which was used by Hitachi and
Texas Instruments, but in a way differing in the specifics from how DEC
had used it in PDP and VAX (or was some of this more due to AT&T, hard
to tell?...).

...

--- Synchronet 3.21f-Linux NewsLink 1.2

From Robert Finch@robfi680@gmail.com to comp.arch on Sat Apr 25 14:39:43 2026

From Newsgroup: comp.arch

On 2026-04-25 1:38 a.m., BGB wrote:

On 4/24/2026 1:52 PM, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

On 4/24/2026 7:01 AM, quadi wrote:

--------------------

I went with:
    H: Half
    F/S: Float or Single
    D: Double
    X: 128-bit (beyond this depends on context)

RV used Q for Binary128, but Q was more widely used for Int64 in my
naming.

Int naming:
    B/SB/UB: Byte
    W/SW/UW: Int16 ("word")
    L/SL/UL: Int32 ("long")
    T/ST/UT: Int48 ("tword" / triple word), short lived
    Q: Int64 ("qword")

RV had used:
    B/H/{W|S}/D/Q

This is what I use. Except I have signed and unsigned integer
arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
floats.

It likely depends on which "tradition" one is coming from.

In my case, I was coming from SH-4 and x86.
SH-4 was B/W/L (likewise for M68K and i386 syntax in GAS).
    Though, differs from M68K and "i386" in various ways
     (eg, no "%" on registers, ...).
     Well, and 0x1234 vs $1234 or similar.
       Eg: "mov 0x1234, r10" vs "mov #$1234, %d4"
       But, seems even within GAS usage, this was inconsistent.
Q/X: from x86 (though x86 also used DQ instead of X for some ops).

After being confused enough times, I started using Knuth's naming:
B = byte
W = wyde (16 bits)
T = tetra (32 bits)
O = octa (64 bits)
H = hexi (128 bits)
I used 'D' at one time to represent 80-bits.

For floating-point where things seem to be more standard
H = half
S = single
D = double
Q = quad

At present, it seems like 'X' may have been a mistake (well, along with trying to use both sets of mnemonics and then trying to auto-detect the
ASM style).

Though, there is still the problem that there is no good or fully
reliable way to tell which ASM syntax is in use (neither annotates it,
and since both evolved from variants of GAS ASM syntax, it is harder to
tell them apart).

...

Well, and I guess one could try to argue the merits of, say:
0x1234
$1234
1234H
&H1234
#0x1234
#$1234
16'h1234
...
And, say:
(R10, 16)
16(R10)
[R10+16]
[R10,16]
...

Otherwise:
New PSU showed up, and is installed, and main PC is working again.

Decided to test the new decimal packing schemes against the "bulk
scavenged FP constants" test, results currently for this test;
Binary16 hit rate   : 63.7%
Truncated to 32 bits: 66.9%
Packing, 8b-A: 73.9%
Packing, 8b-B: 62.5%
Packing, 12b : 61.3%
    T32 + 8b-B + 12b: 77.2%
    T32 + 8b-A: 76.9%

This is lower than my earlier estimates based on my smaller scale tests.

Where, as noted, the unpacking patterns are:
Fp16: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0
T32: (31:0), 32'h0
8b-A: (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)
8b-B: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
    (11:4), (11:4), (11:4), (11:8), (3:0)
12b:  1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
    (15:4), (15:4), (15:12), (3:0)
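
A minimal Python sketch of three of these patterns, reading each line as a Verilog-style concatenation (most significant field first) that produces a 64-bit Binary64 bit image. The function names are hypothetical, and the sketch does nothing special for zero, subnormals, Inf or NaN in the Fp16 case; whether the real decoder does is not stated here.

```python
import struct

def unpack_fp16(h):
    """Fp16: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0"""
    top2 = (h >> 14) & 0x3                     # sign and top exponent bit
    fill = 0x00 if (h >> 14) & 1 else 0x3F     # re-bias the exponent
    low14 = h & 0x3FFF                         # low exponent bits + fraction
    return (top2 << 62) | (fill << 56) | (low14 << 42)

def unpack_t32(w):
    """T32: (31:0), 32'h0 (the high half of a Binary64)"""
    return (w & 0xFFFFFFFF) << 32

def unpack_8b_a(w):
    """8b-A: (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)"""
    out = (w >> 4) & 0xFFFFFFF                 # high 28 bits
    rep8 = (w >> 4) & 0xFF                     # byte that gets repeated
    for _ in range(4):
        out = (out << 8) | rep8
    return (out << 4) | (w & 0xF)              # final 4 bits

def bits_to_double(bits):
    """Reinterpret a 64-bit integer as a Binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For example, `unpack_8b_a(0x3FB9999A)` yields `0x3FB999999999999A`, which is the Binary64 image of 0.1. That repeating-byte structure is the kind of decimal constant the 8b-A scheme appears to target.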

The T32 + 8b-A case has nearly the same hit rate, but is cheaper (and,
is also what I had already implemented experimentally).

While T32 + 8B-A + 12b could potentially give the highest hit rate, this combination would also be the most expensive. And, without the exponent trickery, the hit-rate for 12b will suck.

But, as-is, would be exclusive to XG3 (XG1/XG2/RV being limited to the
Fp16 case for FPU immediate forms).

Still debatable if worth the costs (while it is improvement in hit rate,
it is also a bit of a corner case).

...

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Sat Apr 25 21:56:22 2026

From Newsgroup: comp.arch

On Sat, 25 Apr 2026 18:00:22 +0000, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

It likely depends on which "tradition" one is coming from.

IBM 360, 1963.

You're early. The IBM System/360 was announced on April 7, 1964.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Sun Apr 26 00:28:06 2026

From Newsgroup: comp.arch

On 4/24/2026 9:53 AM, Robert Finch wrote:

On 2026-04-24 8:01 a.m., quadi wrote:

On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

Obviously, this refers to steaks.

In a higher-level language, one has:

Real
Intermediate
Double Precision
Extended

But in Assembler, one needs

Floating
Medium
Double
Extended

because R for Real can be confused with R for Register, and I for
Intermediate can be confused with I for Integer.

John Savard

What about triple and quad precision? Or extended triple precision?

For Arpl at one point the float precision could be specified a bit like bitfields are specified in ‘C’ as in:
Float:8 myvar;

Changed it though to standard types as it was undesirable to support any bit-length for floats which would have to be done with software. Now it
is just:

float byte myvar;
float quad qvar;

Can also use shorter form for some types like:
double dvar;
Instead of having to type ‘float double dvar;’

Hrrm:
char float //FP8
short float //Binary16
float //Binary32
long float //48-bit
short double //48-bit (truncated Binary64, align=2)
double //Binary64
short long double //96-bit (truncated Binary128, align=4)
long double //Binary128
long long float //192-bit (truncated Binary256, align=8)
long long double //Binary256

Then, maybe:
unsigned char float //FP8U
signed char float //FP8A

Well, or add _BitFloat(N) or similar.

Some float approximations will supply around 7 bits which works well to
fill in the significand for the progression of 16, 32, 64, 128-bit floats.

Having a 48-bit float type likely does not save any processing time over
a 64-bit type. It is more a matter of storage space.

When I experimented with it before, it exists solely as a truncated
storage format.

I did experiment with special ops to save/reload the high 48 bits of a register to memory, but then noted that in this case it is likely better
to do it as a multi-op sequence.

This is less efficient, but saves on expending hardware resources on
something that is likely to be rarely if ever used.

48-bit floats in arrays may slow down indexed addressing; scaled index address modes are usually a power of two.

Yes.

Making array access to NPOT elements fast would be a harder problem.
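
To make the storage-only idea concrete, here is a minimal Python sketch of a 48-bit truncated Binary64 array with a 6-byte stride. The helper names and the little-endian byte order are assumptions; the index scaling is written as two shifts and an add to show why a non-power-of-2 element size does not fit the usual scaled-index address modes.

```python
import struct

def store_f48(buf, index, value):
    """Store the high 48 bits of a Binary64 at element `index` (6-byte stride)."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    offset = (index << 2) + (index << 1)     # index * 6, not a single shift
    buf[offset:offset + 6] = (bits >> 16).to_bytes(6, "little")

def load_f48(buf, index):
    """Reload element `index`, zero-filling the 16 dropped fraction bits."""
    offset = (index << 2) + (index << 1)
    hi48 = int.from_bytes(buf[offset:offset + 6], "little")
    return struct.unpack("<d", struct.pack("<Q", hi48 << 16))[0]
```

Values whose low 16 fraction bits are already zero round-trip exactly; everything else is truncated toward zero on store.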

Intermediate-sized elements more often make sense for packed vectors;
usually with 3 elements in a power-of-2 package.

So, say:
3x 10b (~ S.E5.M4 | E5.M5)
3x 20b (~ S.E5.M14)
3x 42b (~ S.E8.M33)

In my case, there were special ops to help with these formats.

Though, can note that my first major use of the 3x 42b case was not in
my ISA project, but rather in my BT2 3D engine, where I had noted that
in a use-case that was not particularly computationally bound, it
actually worked out faster to store 3D coordinate vectors in a packed
form and then unpack and repack them when it was time to do math on them (where, passing 192-bit structs around was significantly more expensive
in the Windows X64 ABI than passing 128 bit structs).

Also it was a case of, with a 1024km world size, normal Binary32 failed
to give sufficient precision. Well, and for whatever reason I didn't
think to just use fixed-point (the BT3 engine instead uses fixed-point internally for coords, but uses floating-point coords for the
serialization format, *1).
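
A back-of-envelope check of the precision claim, as a Python sketch. It assumes one coordinate unit is one meter (not stated above) and that the 42-bit format has 33 explicit fraction bits, as in the S.E8.M33 layout; `ulp_at` is a hypothetical helper.

```python
import math

def ulp_at(value, fraction_bits):
    """Spacing between adjacent representable values near `value`
    for a binary float with `fraction_bits` explicit fraction bits."""
    exponent = math.floor(math.log2(abs(value)))
    return 2.0 ** (exponent - fraction_bits)

WORLD_SIZE_M = 1024 * 1000.0                  # assumption: 1 unit = 1 meter

binary32_step = ulp_at(WORLD_SIZE_M, 23)      # Binary32: 23 fraction bits
packed42_step = ulp_at(WORLD_SIZE_M, 33)      # S.E8.M33: 33 fraction bits
```

Under those assumptions, Binary32 coordinates near the far edge of the world move in steps of 1/16 of a unit, while the 42-bit format is 1024 times finer.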

*1: Well, in this case, the BT3 engine uses a serialized format
representing XML trees that I was calling ABXE, which (maybe ironically)
was originally designed for BGBCC (though, the BT3 engine was partly
built from code copy/pasted from BGBCC, but follows after some design
elements from the BT2 engine as well).

Well, as noted:
BT1 engine: Used modified Quake Maps for world spawning, so no live
entities.

Despite ending up as a Minecraft clone, it was actually spawning the
regions inside of the Quake Map. Well, more specifically, it was
natively using the Half-Life variant of the map format, which had used a different (and more sane) system for specifying texture projection onto
brush faces.

If loading up a world with no voxel regions, the BT1 would behave more
like a Quake-ish engine. Had also given it the ability to load Quake3
and Doom3 maps (mostly similar technology). Did not have the ability to
load the Half-Life 2 map format, as by this point Valve had changed the
format quite significantly.

Things differed quite significantly for the BT2 engine, which used
solely regions with no entities. Instead it had used entity-spawn
blocks, which would fire off command strings when the region was loaded
to spawn in any entities. Any associated entities would unload when the
region was unloaded, and there was no persistent live state with the
entities (rather the "quest state" was held entirely in hidden spots
within the player's inventory; and entities could be set to spawn or
despawn based on the inventory state).

Things changed again for the BT3 Engine, which instead used exclusively persistent entities (stored via the ABXE format). Essentially each
region holding the equivalent of a binary-serialized XML document
describing the state of all of the entities within the region.

Structures could be described in XML format and then spawned into the
world, along with any entities. Generally, this would involve removing
prior spawned entities of given types from a bounding box before
spawning in new ones (and then generally using a stack of BMP images to
drive block placement, with each color mapped to a specific block
type). Though, potentially, BMP pixels could also be used to drive
entity spawning as well.

It is also possible that CSG brushes could be reintroduced, but would be
out of place in a Minecraft style world. Well, or more so than using 3D
models described in SCAD scripts already is.

Ironically, if adding brush-model geometry, the most sensible way to do
it would be to do it in a similar way to Quake Brush Entities, where
each entity holds its associated brush geometry but possibly remains
mostly static. Well, as opposed to reintroducing the concept of a
"worldspawn" which doesn't make as much sense in this case (and if a worldspawn existed, it would likely make sense as some sort of "global
and always loaded" entity, whose responsibilities would probably include things like managing the sky and day/night cycle).

Well, and might slightly increase the level of abstraction by being like:
<brush_aabb mx=... ... nz=... texture=... />
Rather than as individual planes:
<brush>
<face nx=... ny=... nz=... nw=... texture=... .../>
</brush>

Well, or the other (more complex) option being to allow inline SCAD or similar.

<entity classname="func_wall">
<scad>
<![CDATA[
...
]]>
</scad>
</entity>

Well, and extending the language to support texture maps and similar. On spawn, engine would likely run the script and then generate a binary
data blob to hold the geometry (as opposed to loading a 3D model from a
file, could make it otherwise equivalent to my existing support for BMD models, *2).

Well, or use my makeshift CSG BASIC language.

*2: Also ironically, despite BMD models being described using CSG, the
storage format generally uses closed meshes with BREP rules for
collision detection.

Well, as opposed to Quake where everything had used bounding boxes
(well, though player and mob collisions had used bounding boxes, and
generally BT3 had used Minecraft-like behavior where entities are
essentially non-solid but push away from each other).

The BT1 engine also had rigid body physics (like HL2 or Doom3), but
neither the BT2 nor BT3 engine had used this. Could bring it back, but
there are surprisingly few "actually useful" cases for rigid body
physics in a 3D engine (well, unless the game is HL2 or Portal, which
used it as a gameplay element).

Well, and then there is also soft-body physics, but there are very few
use cases outside of using it for cosmetic effects.

Well, and for player and entity movement and similar, it is pretty hard
to beat the naive option of just having entities with AABBs or similar
sliding around the scene (player is effectively shaped like a sliding refrigerator, but no one really notices).

...

--- Synchronet 3.21f-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Apr 26 07:30:13 2026

From Newsgroup: comp.arch

BGB <cr88192@gmail.com> schrieb:

Stylistically, I think the 6502 ASM notation was influenced by Motorola (though differs somewhat from M68K notation).

Which is not a surprise because the 6502 was developed
by people from the 6800 (not to be confused with the
68000 :-) development team at Motorola who had
formed their own company. Compare a die shot of the 6800
https://happytrees.org/dieshots/Motorola_-_6800#/media/File:Motorola_6800_die.jpg
with one of the 6502
https://commons.wikimedia.org/wiki/File:MOS_6502_die.jpg to see
the similarity in overall layout: PLA at the top, random logic
in the middle, ALU and registers at the bottom.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Sun Apr 26 14:53:09 2026

From Newsgroup: comp.arch

On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

Which is not a surprise because the 6502 was developed by people from
the 6800 (not to be confused with the 68000 :-) development team at Motorola who had formed their own company.

Unless you mean Freescale, which was spun off by Motorola itself, I don't
know what you're referring to. A web search did not turn anything up about 68000 engineers leaving Motorola and founding their own startup.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Sun Apr 26 17:07:18 2026

From Newsgroup: comp.arch

On 26/04/2026 16:53, quadi wrote:

On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

Which is not a surprise because the 6502 was developed by people from
the 6800 (not to be confused with the 68000 :-) development team at
Motorola who had formed their own company.

Unless you mean Freescale, which was spun off by Motorola itself, I don't know what you're referring to. A web search did not turn anything up about 68000 engineers leaving Motorola and founding their own startup.

According to Wikipedia (I have no independent reference) the 6502 design
team at MOS Technology had previously worked at Motorola on the 6800.

<https://en.wikipedia.org/wiki/MOS_Technology_6502>

--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Apr 26 18:12:54 2026

From Newsgroup: comp.arch

On Sun, 26 Apr 2026 14:53:09 -0000 (UTC)
quadi <quadibloc@ca.invalid> wrote:

On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

Which is not a surprise because the 6502 was developed by people
from the 6800 (not to be confused with the 68000 :-)
development team at Motorola who had formed their own company.

Unless you mean Freescale, which was spun off by Motorola itself, I
don't know what you're referring to. A web search did not turn
anything up about 68000 engineers leaving Motorola and founding their
own startup.

John Savard

Read again.

--- Synchronet 3.21f-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Apr 26 11:49:24 2026

From Newsgroup: comp.arch

On 2026-Apr-26 11:12, Michael S wrote:

On Sun, 26 Apr 2026 14:53:09 -0000 (UTC)
quadi <quadibloc@ca.invalid> wrote:

On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

Which is not a surprise because the 6502 was developed by people
from the 6800 (not to be confused with the 68000 :-)
development team at Motorola who had formed their own company.

Unless you mean Freescale, which was spun off by Motorola itself, I
don't know what you're referring to. A web search did not turn
anything up about 68000 engineers leaving Motorola and founding their
own startup.

John Savard

Read again.

I made the same parsing error.
The smiley face can be confusing as it is also used as a close
parenthesis to the qualifier "not to be confused with 68000"
phrase, making it visually appear the qualifier continues into the
"development team at Motorola who had formed their own company".

Rephrasing:
"Which is not a surprise because the 6502 was developed by people
from the 6800 development team at Motorola who had formed their
own company (not to be confused with the 68000 :-)"

--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 26 18:28:50 2026

From Newsgroup: comp.arch

BGB <cr88192@gmail.com> posted:

On 4/24/2026 9:53 AM, Robert Finch wrote:

------------------

Hrrm:
char float //FP8
short float //Binary16
float //Binary32
long float //48-bit
short double //48-bit (truncated Binary64, align=2)
double //Binary64
short long double //96-bit (truncated Binary128, align=4)
long double //Binary128
long long float //192-bit (truncated Binary256, align=8)
long long double //Binary256

Reminds me of Jumbo Shrimp:: is it a big Shrimp or a little Jumbo ??
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Sun Apr 26 20:31:42 2026

From Newsgroup: comp.arch

On Sun, 26 Apr 2026 17:07:18 +0200, David Brown wrote:

According to Wikipedia (I have no independent reference) the 6502 design
team at MOS Technology had previously worked at Motorola on the 6800.

I knew that, but apparently I misread the sentence, as it seemed to me to
be stating that members of the 68000 design team _also_ went somewhere
else.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Sun Apr 26 20:41:04 2026

From Newsgroup: comp.arch

On Sat, 25 Apr 2026 21:56:22 +0000, quadi wrote:

On Sat, 25 Apr 2026 18:00:22 +0000, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

It likely depends on which "tradition" one is coming from.

IBM 360, 1963.

You're early. The IBM System/360 was announced on April 7, 1964.

I decided to add another header type to the architecture. This led to the diagram of the header formats being the file block360.gif, and to it being
704 pixels high.

That seemed to me to be a good omen. (The IBM 704 was a vacuum-tube
computer, IBM's first with core memory, hardware floating-point, and the
first FORTRAN compiler was developed for it, so arguably it ushered in the "modern" era of computing, as opposed to the "pioneer" era.)

But I don't let superstition rule me. It was apparent to me that the list
of block formats had become really complicated - and that it wasn't doing
what I wanted it to do.

But the new block format I had _previously_ added showed me... that there
was something that I really wanted to do that would _not_ have the unacceptable cost of 50% of the opcode space. Since I couldn't indicate whether the last instruction of the block ended there, or continued on,
with only one bit corresponding to each 16-bit area, _this_ type of block header would have to have the restriction that instructions can't cross
block boundaries.

Which means the first 16-bit area following the 16-bit header must begin
an instruction.

Therefore, I only need 14 bits, not 15 bits.
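
The counting argument can be sketched in Python as follows. This assumes a 256-bit block (one 16-bit header plus 15 sixteen-bit areas, which is what "15 bits" implies) and assumes that a set bit marks an area that begins an instruction; the bit ordering and the function names are hypothetical, not taken from the Concertina II pages.

```python
def instruction_starts(mask14, areas=15):
    """Return the 0-based indices of the areas (after the header)
    that begin an instruction."""
    if not 0 <= mask14 < (1 << (areas - 1)):
        raise ValueError("mask does not fit in 14 bits")
    starts = [0]                          # first area always begins one
    for area in range(1, areas):
        if (mask14 >> (area - 1)) & 1:    # assumed: bit k-1 describes area k
            starts.append(area)
    return starts

def instruction_lengths(mask14, areas=15):
    """Instruction lengths in bits; the last one ends at the block boundary."""
    starts = instruction_starts(mask14, areas)
    ends = starts[1:] + [areas]
    return [16 * (end - start) for start, end in zip(starts, ends)]
```

Because no instruction may cross the block boundary, the first area needs no bit of its own, and the lengths always sum to the 240 bits that follow the header.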

I just have to give up the paired 15-bit instructions, which nobody seems
to like anyways, as part of the basic block-independent instruction set.

Maybe, maybe, maybe, I am close to finding happiness, and can move on from
the preliminary design phase to fleshing out the ISA... but based on past experience, when that does actually happen, it will be a pleasant
*surprise*.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 00:34:13 2026

From Newsgroup: comp.arch

On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

Which means the first 16-bit area following the 16-bit header must begin
an instruction.

Therefore, I only need 14 bits, not 15 bits.

I just have to give up the paired 15-bit instructions, which nobody
seems to like anyways, as part of the basic block-independent
instruction set.

I have now updated the pages on the Concertina II architecture to reflect
this latest change.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 01:21:36 2026

From Newsgroup: comp.arch

On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

Maybe, maybe, maybe, I am close to finding happiness, and can move on
from the preliminary design phase to fleshing out the ISA... but based
on past experience, when that does actually happen, it will be a
pleasant *surprise*.

And, if I haven't noted it already, the reason that I can even begin to entertain the delusion that I am making some sort of progress, rather than just going around in circles, as all appearances would indicate, is
because in these last few weeks, I feel I have achieved my goal of
squeezing a very large instruction set into limited opcode space to a
greater degree than I had even hoped for previously.

So I am in a position to be more than satisfied with the extent of my
progress towards this fundamental goal of the Concertina II architecture.
All I need is to smooth down a few rough edges, so I can be satisfied that
the architecture is not too extravagantly ugly and inelegant (some degree
of ugliness and inelegancy, of course, must be tolerated as a necessary
means to achieve my unreasonably ambitious goals) and so on, and then I
should be able to move on to the next phase.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Mon Apr 27 08:14:04 2026

From Newsgroup: comp.arch

On 26/04/2026 22:31, quadi wrote:

On Sun, 26 Apr 2026 17:07:18 +0200, David Brown wrote:

According to Wikipedia (I have no independent reference) the 6502 design
team at MOS Technology had previously worked at Motorola on the 6800.

I knew that, but apparently I misread the sentence, as it seemed to me to
be stating that members of the 68000 design team _also_ went somewhere
else.

I can see how you could have made that misreading. But I'm glad you did
- it made me look up the history of the 6502 and learn a little more.
It's a device I used a lot as a kid in the BBC Micro, and one of the
first processors I programmed in assembly, so it has nostalgic interest
for me.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sun Apr 26 23:43:48 2026

From Newsgroup: comp.arch

On 4/26/2026 6:21 PM, quadi wrote:

On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

Maybe, maybe, maybe, I am close to finding happiness, and can move on
from the preliminary design phase to fleshing out the ISA... but based
on past experience, when that does actually happen, it will be a
pleasant *surprise*.

And, if I haven't noted it already, the reason that I can even begin to entertain the delusion that I am making some sort of progress, rather than just going around in circles, as all appearances would indicate, is
because in these last few weeks, I feel I have achieved my goal of
squeezing a very large instruction set into limited opcode space to a
greater degree than I had even hoped for previously.

I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
it! :-(
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 09:06:25 2026

From Newsgroup: comp.arch

On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:

I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
it! :-(

Well, nobody else may have had it as a _goal_. But others have certainly
also _achieved_ that goal, even if they had never set it. The IBM
System/360 and its descendants are a case in point.

My intention was not to have a ridiculously large instruction set that is
not comparable to those of existing computers; instead, it is to have one
that is perhaps a bit larger than any existing computer, because it
combines certain things from more than one architecture.

Specifically:

- like the 68000 and the x86, memory-reference instructions are to have
16-bit displacements.
- like most RISC architectures, the register banks will include 32
registers each. There will be no register that always contains zero, but register zero can appear to be zero for specific purposes such as indexing.
- there will be full base-index addressing, like on the System/360.
- the instruction set will combine the capabilities of the System/360 and
the Cray I.

And that was to be done, as far as possible, without making the
instructions involved longer than their counterparts on the IBM
System/360. That goal was not _strictly_ met, as it was an impossible
goal, but it was approached. 16-bit short instructions are limited in
capability compared to their counterparts on the 360; to fully equal
them, 18-bit instructions, which require the overhead of a block header,
are needed. Also, to fully equal the 48-bit string and packed decimal
instructions of the 360, 64-bit instructions are required.

A large instruction set, however, was a _requirement_, not a goal.
Squeezing it into the available opcode space provided by not allowing the
size of instructions to explode, making code significantly less compact
than on the 360... *that* was the goal.

I hope this has clarified my design philosophy.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 10:54:08 2026

From Newsgroup: comp.arch

On Mon, 27 Apr 2026 00:34:13 +0000, quadi wrote:

On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

Which means the first 16-bit area following the 16-bit header must
begin an instruction.

Therefore, I only need 14 bits, not 15 bits.

I just have to give up the paired 15-bit instructions, which nobody
seems to like anyways, as part of the basic block-independent
instruction set.

I have now updated the pages on the Concertina II architecture to
reflect this latest change.

Another side effect of this change, though, was that while I gained a
16-bit header for variable-length instructions, I lost the 32-bit header
for variable-length instructions that allowed the 17-bit short instructions.

I have figured out a reasonable way to bring that 32-bit header back at
what I felt was an acceptable cost, so the pages have now been revised to include that latest change.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From John Levine@johnl@taugh.com to comp.arch on Mon Apr 27 17:16:24 2026

From Newsgroup: comp.arch

According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:

I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
it! :-(

Oh, I dunno. Look at the IBM 7030 STRETCH.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon Apr 27 19:27:05 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:

I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
it! :-(

Well, nobody else may have had it as a _goal_. But others have certainly
also _achieved_ that goal, even if they had never set it. The IBM
System/360 and its descendants are a case in point.

My intention was not to have a ridiculously large instruction set that is not comparable to those of existing computers; instead, it is to have one that is perhaps a bit larger than any existing computer, because it
combines certain things from more than one architecture.

Specifically:

- like the 68000 and the x86, memory-reference instructions are to have
16-bit displacements.
- like most RISC architectures, the register banks will include 32
registers each. There will be no register that always contains zero, but register zero can appear to be zero for specific purposes such as indexing.
- there will be full base-index addressing, like on the System/360.
- the instruction set will combine the capabilities of the System/360 and the Cray I.

You might want to skip up to CRAY-XMP because they got the scatter/gather memory reference instructions.

And that was to be done, as far as possible, without making the
instructions involved longer than their counterparts on the IBM
System/360. That goal was not _strictly_ met, as it was an impossible
goal, but it was approached. 16-bit short instructions are limited in
capability compared to their counterparts on the 360; to fully equal
them, 18-bit instructions, which require the overhead of a block header,
are needed. Also, to fully equal the 48-bit string and packed decimal
instructions of the 360, 64-bit instructions are required.

A large instruction set, however, was a _requirement_, not a goal.
Squeezing it into the available opcode space provided by not allowing the size of instructions to explode, making code significantly less compact
than on the 360... *that* was the goal.

I hope this has clarified my design philosophy.

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 23:55:14 2026

From Newsgroup: comp.arch

On Mon, 27 Apr 2026 19:27:05 +0000, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

- the instruction set will combine the capabilities of the System/360
and the Cray I.

You might want to skip up to CRAY-XMP because they got the
scatter/gather memory reference instructions.

Actually, I might indeed, but at this stage I wasn't feeling the need to
be too specific.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Tue Apr 28 02:33:03 2026

From Newsgroup: comp.arch

On Mon, 27 Apr 2026 10:54:08 +0000, quadi wrote:

I have figured out a reasonable way to bring that 32-bit header back at
what I felt was an acceptable cost, so the pages have now been revised
to include that latest change.

I have made another change to the Concertina II ISA basic design.

This time, I thought of a nifty feature that ought to be added. So I made
this change with some trepidation, since I was worried trying to do this
would make the whole thing fall apart.

But it turned out that I did have the opcode space for headers which
allowed me to easily insert the addition, and other things also worked
well enough that it was simple to make the addition without causing
problems for other parts of the architecture.

The addition? The header that provides VLIW features can now indicate
the use of an alternate instruction set. (There is now only _one_ such
header; I have eliminated the ability to associate VLIW features with
the variable-length instructions.)

This alternate instruction set is the regular instruction set with two
modifications. The paired short instructions now take up 50% of the opcode
space instead of 25%. So there are no restrictions on which registers may
be used; but one instruction in a pair is always integer, and the other is
floating-point. In order to make room for that, the load/store
memory-reference instructions now only act on aligned operands when the
alternate instruction set is specified.
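
As a rough illustration of why a larger share of the opcode space helps, here is a back-of-envelope sketch. It assumes that a pair of short instructions shares one 32-bit word and that the pair is selected by a single fixed prefix; `payload_bits` is a hypothetical helper, not something taken from the Concertina II pages.

```python
import math

WORD_BITS = 32  # assumption: a pair of short instructions shares one 32-bit word


def payload_bits(share_of_opcode_space: float, word_bits: int = WORD_BITS) -> int:
    """Bits left for the pair once a prefix selecting this share is spent."""
    prefix_bits = math.log2(1 / share_of_opcode_space)
    if not prefix_bits.is_integer():
        raise ValueError("share must be a power of 1/2 for a single fixed prefix")
    return word_bits - int(prefix_bits)


if __name__ == "__main__":
    print(payload_bits(0.25))  # 30 bits: room for two 15-bit short instructions
    print(payload_bits(0.50))  # 31 bits: one more bit to spend on the pair
```

How the extra bit is divided between the integer and floating-point halves is not stated in the post, so the sketch stops at the total.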

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Apr 27 22:08:12 2026

From Newsgroup: comp.arch

On 4/27/2026 10:16 AM, John Levine wrote:

According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:

I don't think anyone else designing a CPU had the goal of "a very large
instruction set". But if that was your goal, I think you have achieved
it! :-(

Oh, I dunno. Look at the IBM 7030 STRETCH.

Thanks for the pointer, John. I did look at it a little. It seems like
a wild machine! Caveat - my comments below are not from an exhaustive
study of the ISA.

But the big difference between the 7030 and John's system is that for
the 7030, the huge multiplicity is in the number of data formats
supported, not the instructions. The 7030 has only two instruction
lengths, 32 and 64 bits, and, as far as I can tell, no instruction
headers for blocks of instructions. And the complexity of different
data formats seems put in various "modifier" bits in the instruction,
not in the op code. The 7030 uses the same "trick" that Mitch uses,
eliminating the subtract op code by having sign modifier bits in the non
op code part of the instruction.
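
The sign-modifier idea can be sketched in a few lines. This only illustrates the principle and is not the 7030's actual encoding: the `add` function and its `negate_operand` flag are hypothetical names standing in for a single add op code plus one modifier bit.

```python
def add(accumulator: int, operand: int, negate_operand: bool = False) -> int:
    """Single ADD operation; a modifier bit flips the operand's sign first."""
    if negate_operand:
        operand = -operand
    return accumulator + operand


if __name__ == "__main__":
    print(add(7, 3))                       # plain add
    print(add(7, 3, negate_operand=True))  # behaves as subtract
```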

So, while I certainly haven't done the count, I suspect John's ISA has
far more op codes (especially if you count the same operation in
different length instructions as different) than the 7030.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Tue Apr 28 14:18:30 2026

From Newsgroup: comp.arch

On Mon, 27 Apr 2026 22:08:12 -0700, Stephen Fuld wrote:

But the big difference between the 7030 and John's system is that for
the 7030, the huge multiplicity is in the number of data formats
supported, not the instructions. The 7030 has only two instruction
lengths, 32 and 64 bits, and, as far as I can tell, no instruction
headers for blocks of instructions. And the complexity of different
data formats seems put in various "modifier" bits in the instruction,
not in the op code.

Yes; in fairness to the IBM 7030, in the IBM 360 one doesn't count the MVC instruction as 256 different instructions.
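
The arithmetic behind that remark: MVC carries an 8-bit length field, so counting every value of that field as a separate instruction would give 256 "instructions" for one op code. A minimal sketch, with `variants` as a hypothetical helper:

```python
def variants(*field_widths_in_bits: int) -> int:
    """Distinct encodings if every modifier-field value counted as its own instruction."""
    total = 1
    for width in field_widths_in_bits:
        total *= 2 ** width
    return total


if __name__ == "__main__":
    print(variants(8))     # one 8-bit length field, as in MVC
    print(variants(4, 4))  # e.g. a register range given by two 4-bit fields
```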

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue Apr 28 17:48:09 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

On Mon, 27 Apr 2026 22:08:12 -0700, Stephen Fuld wrote:

But the big difference between the 7030 and John's system is that for
the 7030, the huge multiplicity is in the number of data formats
supported, not the instructions. The 7030 has only two instruction
lengths, 32 and 64 bits, and, as far as I can tell, no instruction
headers for blocks of instructions. And the complexity of different
data formats seems put in various "modifier" bits in the instruction,
not in the op code.

Yes; in fairness to the IBM 7030, in the IBM 360 one doesn't count the MVC instruction as 256 different instructions.

Or LDM as up to 256 instructions, either.
STM, ...

And how many instructions should one attribute to SIO ??

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Tue Apr 28 21:04:36 2026

From Newsgroup: comp.arch

On Tue, 28 Apr 2026 02:33:03 +0000, quadi wrote:

The addition? Now, in the header that provides VLIW features - now there
is only _one_ such header, I have eliminated the ability to associate
VLIW features with the variable-length instructions - it is possible to
indicate the use of an alternate instruction set.

For clarification: I reduced the number of headers which provide VLIW
features to only one somewhat earlier, not at the same time as the
alternate instruction set was added.

Now I have made another change - there was enough opcode space available
for headers that I was able to make the alternate instruction set
available with the two other types of header that do not indicate
variable-length instructions.

I am in no rush to add to the alternate instruction set special
additional instructions that start with 0110, 01110, and 011110, all that
opcode space being free in the alternate instruction set, but it's good to
know that such reserve capacity is conveniently available.
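
To put a number on that reserve: a minimal sketch, assuming the three bit strings are leading bits of the instruction word and that everything under each prefix is free. The helper names are hypothetical.

```python
from fractions import Fraction

FREE_PREFIXES = ["0110", "01110", "011110"]  # the three prefixes named above


def share(prefix: str) -> Fraction:
    """Fraction of the opcode space covered by a binary prefix."""
    return Fraction(1, 2 ** len(prefix))


def is_prefix_free(prefixes: list[str]) -> bool:
    """True if no prefix is the start of another, so the regions do not overlap."""
    return not any(
        a != b and b.startswith(a) for a in prefixes for b in prefixes
    )


if __name__ == "__main__":
    total = sum(share(p) for p in FREE_PREFIXES)
    print(is_prefix_free(FREE_PREFIXES))  # True
    print(total)                          # 7/64
```

Under those assumptions the three regions do not overlap and together cover 1/16 + 1/32 + 1/64 = 7/64 of the opcode space, a little under 11%.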

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Wed Apr 29 17:36:46 2026

From Newsgroup: comp.arch

When I changed the instruction set so that I could have variable-length
instructions in a block - with some restrictions - with a header that was
only 16 bits long, I gave up one important thing.

The paired 15-bit short instructions, while they could still be used in
instruction slots other than the first one without a header, couldn't be
used in the first one. So neither they, nor any other version of short
instructions, could be used in code that ignored block boundaries.

Well, I have now dropped one header type and added in a limited and
restricted form of paired short instructions that _can_ always be used in
any instruction slot, ignoring block boundaries.

Oops, it can only be used in the first slot because it conflicts with the
paired 15-bit short instructions! What will be the best way to fix that...

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Wed Apr 29 21:08:57 2026

From Newsgroup: comp.arch

On Wed, 29 Apr 2026 17:36:46 +0000, quadi wrote:

Oops, it can only be used in the first slot because it conflicts with
the paired 15-bit short instructions! What will be the best way to fix
that...

Finding no satisfactory way to make a paired short instruction that always
fits, I first restricted my addition to 32-bit mode, where there is no
collision. Then I removed the addition entirely, without restoring what I
had subtracted: _that_ was unimportant enough that the large area of
opcode space for headers freed up by removing it ought instead to be left
for something more badly needed, when such a thing is found.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Mon Apr 27 16:29:34 2026

From Newsgroup: comp.arch

quadi [2026-04-27 09:06:25] wrote:

On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:

I don't think anyone else designing a CPU had the goal of "a very large
instruction set". But if that was your goal, I think you have achieved
it! :-(

Well, nobody else may have had it as a _goal_. But others have certainly
also _achieved_ that goal, even if they had never set it. The IBM
System/360 and its descendants are a case in point.

Arguably the Itanic was designed with such a goal as well where the
"size" was measured as a kind of "patentability".

=== Stefan
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.arch on Thu Apr 30 10:41:01 2026

From Newsgroup: comp.arch

On Mon, 27 Apr 2026 16:29:34 -0400
Stefan Monnier <monnier@iro.umontreal.ca> wrote:

quadi [2026-04-27 09:06:25] wrote:

On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:

I don't think anyone else designing a CPU had the goal of "a very
large instruction set". But if that was your goal, I think you
have achieved it! :-(

Well, nobody else may have had it as a _goal_. But others have
certainly also _achieved_ that goal, even if they had never set it.
The IBM System/360 and its descendants are a case in point.

Arguably the Itanic was designed with such a goal as well where the
"size" was measured as a kind of "patentability".

=== Stefan

But measured by a more conventional definition of size, the IPF
instruction set is pretty small, esp. when compared to iAMD64 or ARM64.
I'd think that even POWER is at least twice as big as IPF.

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Thu Apr 30 16:41:35 2026

From Newsgroup: comp.arch

Based on the ways I found to squeeze in more opcode space, I've decided to
go back to an earlier version because I believe I can now achieve what I
had been unable to do before: have the four-bit prefix portion of a block
header for variable-length instructions, and also allow the original
15-bit paired short instructions in any instruction slot. The primary step
required to achieve this was to determine that, while most compromises for
the Load Address instruction were not acceptable, _one_ compromise _was_
acceptable, letting me grab the opcode space I needed.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Thu Apr 30 22:59:18 2026

From Newsgroup: comp.arch

Now, finally, I have been able to combine:

- Basic memory-reference instructions without some strange restriction on
addressing modes,
- Paired 15-bit short instructions that are able to appear in any slot,
- A 32-bit header for variable-length instructions that has two prefix
bits for every 16-bit slot remaining.

It has taken a long time to find just the right way to squish everything
in.
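
The bit budget of such a header can be checked with a short sketch. The 256-bit block size is an assumption (it is not stated in this post), and `header_budget` is a hypothetical helper.

```python
BLOCK_BITS = 256          # assumption: block size is not stated in this post
HEADER_BITS = 32          # from the post: a 32-bit header
SLOT_BITS = 16            # from the post: 16-bit slots
PREFIX_BITS_PER_SLOT = 2  # from the post: two prefix bits per remaining slot


def header_budget(block_bits: int = BLOCK_BITS) -> tuple[int, int, int]:
    """Return (remaining slots, bits spent on slot prefixes, header bits left over)."""
    remaining_slots = (block_bits - HEADER_BITS) // SLOT_BITS
    slot_prefix_bits = remaining_slots * PREFIX_BITS_PER_SLOT
    return remaining_slots, slot_prefix_bits, HEADER_BITS - slot_prefix_bits


if __name__ == "__main__":
    print(header_budget())  # (14, 28, 4)
```

Under that assumption, 14 slots remain after the header, their prefixes use 28 bits, and 4 bits are left over, which would be consistent with the four-bit prefix portion of the header mentioned in the previous post.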

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2
