I was not happy that when I did not use a block prefix, I had to omit the Load Medium and Store Medium instructions from the basic load/store instructions.
I searched for available opcode space.--- Synchronet 3.21f-Linux NewsLink 1.2
I found a little; enough for the _other_ block prefixes. But not a full
1/16 of the opcode space which is what the Type I header needed. Where did
I find it? In the opcodes for operate instructions which the 15-bit paired short instructions don't use.
So I thought that perhaps I could shrink the requirements of the Type I header. By making use of the fact that 10 (start of a 32-bit or longer instruction) can only be followed by 11 (not the start of an instruction), maybe I could replace four consecutive two-bit prefixes with one seven-bit prefix.
But alas, this fact only reduced the possibilities to 81 + 27 + 27 + 1, which is 136, which is greater than 128.
However, if I made use of the fact that I would know if the preceding
16-bit zone began a 32-bit instruction, and added certain other restrictions
on the allowed combinations - by insisting that all pseudo-immediates be tidily put at the end of the block - I thought I was able to squeeze it in.
This may be a step too far, so I've saved everything if I need to go back.
John Savard
quadi <quadibloc@ca.invalid> posted:
> I was not happy that when I did not use a block prefix, I had to omit
> the Load Medium and Store Medium instructions from the basic load/store
> instructions.
Is LD Medium obtaining a sooth sayer from memory?
Is ST Medium putting a sooth sayer back in memory?
How do you know a sooth sayer fits in 2^(3+n) bytes???
On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:
> quadi <quadibloc@ca.invalid> posted:
>> I was not happy that when I did not use a block prefix, I had to omit
>> the Load Medium and Store Medium instructions from the basic load/store
>> instructions.
> Is LD Medium obtaining a sooth sayer from memory?
> Is ST Medium putting a sooth sayer back in memory?
> How do you know a sooth sayer fits in 2^(3+n) bytes???
No, I am not referring to one who channels the spirits of the dead.
Instead, the Medium data type refers to 48-bit floating-point values; although not part of the IEEE 754 standard, they follow the pattern of the types defined in it. They offer a precision just above 11 decimal digits,
and an exponent range that exceeds 10 to plus or minus 99, thus
approximating the numbers pocket calculators make available.
On 4/22/2026 10:35 PM, quadi wrote:
> On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:
>> quadi <quadibloc@ca.invalid> posted:
>>> I was not happy that when I did not use a block prefix, I had to omit
>>> the Load Medium and Store Medium instructions from the basic load/store
>>> instructions.
>> Is LD Medium obtaining a sooth sayer from memory?
>> Is ST Medium putting a sooth sayer back in memory?
>> How do you know a sooth sayer fits in 2^(3+n) bytes???
> No, I am not referring to one who channels the spirits of the dead.
> Instead, the Medium data type refers to 48-bit floating-point values;
> although not part of the IEEE 754 standard, they follow the pattern of
> the types defined in it. They offer a precision just above 11 decimal digits,
> and an exponent range that exceeds 10 to plus or minus 99, thus
> approximating the numbers pocket calculators make available.
Ironically, I had considered an intermediate format a few times, mostly represented as the Binary64 format with the low-order bits cut off.
Mostly hadn't amounted to much.
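The "Binary64 with the low-order bits cut off" idea can be sketched as follows. This is only an illustration of that truncation approach, assuming a 48-bit container and simple truncation (no rounding); it is not claimed to be the field layout of the Medium type discussed earlier, and the function names are made up for the example.

```python
import math
import struct

def store_medium48(x: float) -> bytes:
    """Keep the top 48 bits of the Binary64 encoding (truncation, no rounding)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 16).to_bytes(6, "big")

def load_medium48(raw: bytes) -> float:
    """Widen a 6-byte value back to Binary64 by zero-filling the low 16 bits."""
    bits = int.from_bytes(raw, "big") << 16
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# 52 - 16 = 36 stored fraction bits, plus the hidden bit.
SIGNIFICAND_BITS = 52 - 16 + 1
DECIMAL_DIGITS = SIGNIFICAND_BITS * math.log10(2)

if __name__ == "__main__":
    packed = store_medium48(math.pi)
    print(packed.hex(), load_medium48(packed), round(DECIMAL_DIGITS, 2))
```

Under this assumption the format keeps the full Binary64 exponent range and a 37-bit significand, which works out to a little over 11 decimal digits.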
I did end up experimenting with support for a very niche converter:
(31:0) => (63:0)
As:
(31:4), (11:4), (11:4), (11:4), (11:4), (3:0)
Currently only available in an Imm32 instruction.
Seemingly, this pattern can deal with roughly 2/3 of the FPU constants
that miss as Binary16:
Multiples of 1/3, 1/5 and similar hit with this.
It fails for patterns like 1/7, 1/9, ... or similar, which have a
different bit pattern length (pattern doesn't repeat along an 8-bit spacing).
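A minimal sketch of the 8-bit repeating pattern described above, assuming the fields are concatenated most-significant first (Verilog-style). The function names are hypothetical; the "hit" test simply checks whether unpacking the candidate 32-bit immediate reproduces the original Binary64 bits.

```python
import struct
from typing import Optional

def f64_bits(x: float) -> int:
    """Raw Binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def unpack_8b_a(imm32: int) -> int:
    """(31:4), (11:4), (11:4), (11:4), (11:4), (3:0) -> 64 bits."""
    hi28 = (imm32 >> 4) & 0xFFFFFFF   # bits 31:4
    rep8 = (imm32 >> 4) & 0xFF        # bits 11:4, repeated four times
    lo4 = imm32 & 0xF                 # bits 3:0
    out = hi28
    for _ in range(4):
        out = (out << 8) | rep8
    return (out << 4) | lo4

def pack_8b_a(x: float) -> Optional[int]:
    """Return the 32-bit immediate if x is representable, else None (a miss)."""
    bits = f64_bits(x)
    imm32 = ((bits >> 36) << 4) | (bits & 0xF)
    return imm32 if unpack_8b_a(imm32) == bits else None

if __name__ == "__main__":
    for value in (1 / 3, 1 / 5, 1 / 7):
        imm = pack_8b_a(value)
        print(value, "miss" if imm is None else hex(imm))
```

With this reading, 1/3 and 1/5 hit (their fraction bits repeat with periods 2 and 4, which both divide 8), while 1/7 misses because its period is 3.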
Patterns like 1/7, 1/9, ... could be instead addressed with a pattern
that repeats on a multiple of 12 bits. But, this sort of thing is
getting a bit niche (would need different patterns to deal with
different fractions).
But, is a relatively affordable way to deal with this pattern; even if
it can't be crammed into a small size in the same way as simple BFP
patterns (and encoding an index into a table of possible patterns won't
save much over expressing the pattern directly).
Also, the 12-bit pattern case can be noted to miss more with patterns
that would hit with 8-bit or with binary16 (the 8-bit pattern case
mostly overlaps as well with the area covered by Binary16). A 6-bit
pattern could still overlap with Binary16's range, but would be more
limited in the fractions it can deal with.
Only really relevant for constant values though (as a live FP format,
would be worse than normal BFP).
Though, can make use of the extra bit left over from the Imm32f
encodings (which are actually stored as Imm33). More a debate though of
if it is worth the non-zero additional LUT cost to do so.
But, this combination would leave, statistically:
Imm16f: 63%
Imm6f: 25% (S.E3.M2)
Imm32fu: 71% (8% over 63%, simply Binary64 truncated to 32 bits)
Imm32fn: 88% (25% hit rate over 63%, 8-bit pattern from above)
...
While Imm32fn has a higher hit rate than Imm32fu, they have a
non-overlap, so the combined Imm32fun in this case seems to have around a
96% hit-rate, with around 4% in the "miss" category (irrational
constants, and stuff like 1/7 which has a 3-bit repeating pattern, vs
2-bit for 1/3 and 4-bit for 1/5, both of which fit an 8-bit spacing).
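The combined figure follows from simple addition if the two Imm32 variants cover disjoint sets of constants beyond the Binary16 baseline. A small sketch of that arithmetic, using only the percentages quoted in the post; the disjointness is the assumption being made, and the names are invented for the example.

```python
# Percentages quoted in the post (share of FP constants that can be encoded).
IMM16F = 63         # Binary16 immediates
GAIN_IMM32FU = 8    # extra hits from Binary64 truncated to 32 bits (71 - 63)
GAIN_IMM32FN = 25   # extra hits from the 8-bit repeating pattern (88 - 63)

def combined_hit_rate(base: int, *gains: int) -> int:
    """Combined coverage in percent, assuming the extra hits do not overlap."""
    return min(100, base + sum(gains))

combined = combined_hit_rate(IMM16F, GAIN_IMM32FU, GAIN_IMM32FN)
miss = 100 - combined

if __name__ == "__main__":
    print(combined, miss)
```

If the two gains overlapped, the true combined rate would be lower than this sum.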
If I added the 12-bit pattern (in addition to the existing two), could
maybe push it up to around a 97% or 98% hit rate, but the 12-bit pattern
by itself has a lower hit-rate than simply truncating the Binary64 value
to 32 bits, or even Binary16. So, selecting between 8b+12b pattern would
do worse than trunc32 + 8b pattern.
But, dunno.
However, the relative usage of floating point immediate values is low
enough that this doesn't make a big impact on code density.
Not much more "low hanging fruit" for improving code density ATM, but it seems like if I could squeeze out a few more percent on overall code density, it could put XG3 more solidly in the lead vs RV64GC+JX (where, right now it is pretty close and which one wins/loses depends a lot on
the program being tested).
...
> quadi <quadibloc@ca.invalid> posted:
>> I was not happy that when I did not use a block prefix, I had to omit the
>> Load Medium and Store Medium instructions from the basic load/store
>> instructions.
> Is LD Medium obtaining a sooth sayer from memory?
> Is ST Medium putting a sooth sayer back in memory?
Obviously, this refers to steaks.
On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:
> Obviously, this refers to steaks.
In a higher-level language, one has:
Real
Intermediate
Double Precision
Extended
But in Assembler, one needs
Floating
Medium
Double
Extended
because R for Real can be confused with R for Register, and I for Intermediate can be confused with I for Integer.
John Savard
On 4/24/2026 7:01 AM, quadi wrote:--------------------
I went with:
H: Half
F/S: Float or Single
D: Double
X: 128-bit (beyond this depends on context)
RV used Q for Binary128, but Q was more widely used for Int64 in my naming.
Int naming:
B/SB/UB: Byte
W/SW/UW: Int16 ("word")
L/SL/UL: Int32 ("long")
T/ST/UT: Int48 ("tword" / triple word), short lived
Q: Int64 ("qword")
RV had used:
B/H/{W|S}/D/Q
But these operations are too rare to include in usual ISAs,
Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
> But these operations are too rare to include in usual ISAs,
Well done!
So what to do? What I've been doing all along in this design process -
move the compromise somewhere else, and see if I can put up with it. So
now I've decided to take the 32-bit header for variable-length
instructions, and put the compromise there.
This direction of thinking suggests... that I use some of the opcode
space I still do have free... for special 64-bit instructions that are available without a header. This has been done before in previous
Concertina II iterations. Emergency long instructions - inefficient
because _both_ 32-bit words of the instruction have to begin with 9 or
so overhead bits to indicate they belong to such an instruction... but
less inefficient than adding a whole 32-bit header to the block if you
just need one of them in the block.
That way, I can add lots of extra instructions to be part of the basic headerless instruction set.
BGB <cr88192@gmail.com> posted:
> On 4/24/2026 7:01 AM, quadi wrote:--------------------
> I went with:
> H: Half
> F/S: Float or Single
> D: Double
> X: 128-bit (beyond this depends on context)
> RV used Q for Binary128, but Q was more widely used for Int64 in my naming.
> Int naming:
> B/SB/UB: Byte
> W/SW/UW: Int16 ("word")
> L/SL/UL: Int32 ("long")
> T/ST/UT: Int48 ("tword" / triple word), short lived
> Q: Int64 ("qword")
> RV had used:
> B/H/{W|S}/D/Q
This is what I use. Except I have signed and unsigned integer
arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
floats.
On 4/24/2026 1:52 PM, MitchAlsup wrote:------------------
> This is what I use. Except I have signed and unsigned integer
> arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
> floats.
It likely depends on which "tradition" one is coming from.
Well, and I guess one could try to argue the merits of, say:
0x1234
$1234
1234H
&H1234
#0x1234
#$1234
16'h1234
...
And, say:
(R10, 16)
16(R10)
[R10+16]
[R10,16]
...
BGB <cr88192@gmail.com> posted:
> On 4/24/2026 1:52 PM, MitchAlsup wrote:------------------
>> This is what I use. Except I have signed and unsigned integer
>> arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
>> floats.
> It likely depends on which "tradition" one is coming from.
IBM 360, 1963.
------------------
> Well, and I guess one could try to argue the merits of, say:
> 0x1234
> $1234
> 1234H
> &H1234
> #0x1234
> #$1234
> 16'h1234
Use C notation when possible.
> ...
> And, say:
> (R10, 16)
> 16(R10)
> [R10+16]
> [R10,16]
> ...
The [] notations tell ASM that the instruction has to be a
memory reference, the () notations do not.
On 4/24/2026 1:52 PM, MitchAlsup wrote:
> BGB <cr88192@gmail.com> posted:
>> On 4/24/2026 7:01 AM, quadi wrote:--------------------
>> I went with:
>> H: Half
>> F/S: Float or Single
>> D: Double
>> X: 128-bit (beyond this depends on context)
>> RV used Q for Binary128, but Q was more widely used for Int64 in my
>> naming.
>> Int naming:
>> B/SB/UB: Byte
>> W/SW/UW: Int16 ("word")
>> L/SL/UL: Int32 ("long")
>> T/ST/UT: Int48 ("tword" / triple word), short lived
>> Q: Int64 ("qword")
>> RV had used:
>> B/H/{W|S}/D/Q
> This is what I use. Except I have signed and unsigned integer
> arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
> floats.
It likely depends on which "tradition" one is coming from.
In my case, I was coming from SH-4 and x86.
SH-4 was B/W/L (likewise for M68K and i386 syntax in GAS).
Though, differs from M68K and "i386" in various ways
(eg, no "%" on registers, ...).
Well, and 0x1234 vs $1234 or similar.
Eg: "mov 0x1234, r10" vs "mov #$1234, %d4"
But, seems even within GAS usage, this was inconsistent.
Q/X: from x86 (though x86 also used DQ instead of X for some ops).
At present, it seems like 'X' may have been a mistake (well, along with trying to use both sets of mnemonics and then trying to auto-detect the
ASM style).
Though, there is still the problem that there is no good or fully
reliable way to tell which ASM syntax is in use (neither
annotates it, and since both evolved from variants of GAS ASM syntax, it is harder to tell them apart).
...
Well, and I guess one could try to argue the merits of, say:
0x1234
$1234
1234H
&H1234
#0x1234
#$1234
16'h1234
...
And, say:
(R10, 16)
16(R10)
[R10+16]
[R10,16]
...
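One way an assembler front end might cope with the hexadecimal spellings listed above is to accept all of them and normalize to an integer. The following is only a sketch under that assumption; the pattern table and function name are invented for the example, the Verilog-style width prefix is ignored, and nothing here reflects how any particular assembler actually does it.

```python
import re
from typing import Optional

# One regular expression per spelling; the hex digits are always group 1.
_HEX_FORMS = [
    re.compile(r"^#?0[xX]([0-9A-Fa-f]+)$"),     # 0x1234, #0x1234
    re.compile(r"^#?\$([0-9A-Fa-f]+)$"),        # $1234, #$1234
    re.compile(r"^([0-9][0-9A-Fa-f]*)[hH]$"),   # 1234H (must start with a digit)
    re.compile(r"^&[hH]([0-9A-Fa-f]+)$"),       # &H1234
    re.compile(r"^\d+'[hH]([0-9A-Fa-f_]+)$"),   # 16'h1234 (width not checked)
]

def parse_hex_literal(text: str) -> Optional[int]:
    """Return the value of a hex literal in any listed spelling, else None."""
    for pattern in _HEX_FORMS:
        match = pattern.match(text)
        if match:
            return int(match.group(1).replace("_", ""), 16)
    return None

if __name__ == "__main__":
    for spelling in ("0x1234", "$1234", "1234H", "&H1234",
                     "#0x1234", "#$1234", "16'h1234"):
        print(spelling, hex(parse_hex_literal(spelling)))
```

Accepting every spelling sidesteps the question of which syntax is in use for literals, though it does nothing for the ambiguity between operand orders or addressing notations.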
Otherwise:
New PSU showed up, and is installed, and main PC is working again.
Decided to test the new decimal packing schemes against the "bulk
scavenged FP constants" test, results currently for this test:
Binary16 hit rate:    63.7%
Truncated to 32 bits: 66.9%
Packing, 8b-A:        73.9%
Packing, 8b-B:        62.5%
Packing, 12b:         61.3%
T32 + 8b-B + 12b:     77.2%
T32 + 8b-A:           76.9%
This is lower than my earlier estimates based on my smaller scale tests.
Where, as noted, unpacking patterns:
Fp16: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0
T32:  (31:0), 32'h0
8b-A: (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)
8b-B: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
      (11:4), (11:4), (11:4), (11:8), (3:0)
12b:  1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
      (15:4), (15:4), (15:12), (3:0)
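The Fp16, T32, 8b-B and 12b patterns can be written out as plain bit concatenations. This sketch assumes the notation is Verilog-like, with fields listed most-significant first and (hi:lo) denoting a slice of the immediate; the helper names are hypothetical. As written, the Fp16 pattern only widens the exponent and does not appear to special-case zero, subnormals, Inf or NaN, and the sketch follows it literally.

```python
def concat(*fields):
    """Concatenate (value, width) pairs, most significant field first."""
    out = 0
    for value, width in fields:
        out = (out << width) | (value & ((1 << width) - 1))
    return out

def bits(x, hi, lo):
    """Slice x[hi:lo], inclusive on both ends."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)

def unpack_fp16(h):
    fill = 0x00 if bits(h, 14, 14) else 0x3F
    return concat((bits(h, 15, 14), 2), (fill, 6), (bits(h, 13, 0), 14), (0, 42))

def unpack_t32(i):
    return concat((i, 32), (0, 32))

def _exp_top(i):
    # "Exponent trickery": bit 30 selects the top five exponent bits.
    return 0x10 if bits(i, 30, 30) else 0x0F

def unpack_8b_b(i):
    r = bits(i, 11, 4)
    return concat((0, 1), (_exp_top(i), 5), (bits(i, 29, 4), 26),
                  (r, 8), (r, 8), (r, 8), (bits(i, 11, 8), 4), (bits(i, 3, 0), 4))

def unpack_12b(i):
    r = bits(i, 15, 4)
    return concat((0, 1), (_exp_top(i), 5), (bits(i, 29, 4), 26),
                  (r, 12), (r, 12), (bits(i, 15, 12), 4), (bits(i, 3, 0), 4))

if __name__ == "__main__":
    print(hex(unpack_fp16(0x3C00)))      # Binary16 1.0
    print(hex(unpack_8b_b(0x3D555555)))  # 1/3
    print(hex(unpack_12b(0x3C249242)))   # 1/7
```

Each pattern adds up to 64 bits, and under this reading the 12-bit pattern reproduces 1/7 exactly, which the 8-bit patterns cannot.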
The T32 + 8b-A case has nearly the same hit rate, but is cheaper (and,
is also what I had already implemented experimentally).
While T32 + 8b-A + 12b could potentially give the highest hit rate, this combination would also be the most expensive. And, without the exponent trickery, the hit-rate for 12b will suck.
But, as-is, would be exclusive to XG3 (XG1/XG2/RV being limited to the
Fp16 case for FPU immediate forms).
Still debatable if worth the costs (while it is an improvement in hit rate,
it is also a bit of a corner case).
...
On 2026-04-24 8:01 a.m., quadi wrote:
> On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:
>> Obviously, this refers to steaks.
> In a higher-level language, one has:
> Real
> Intermediate
> Double Precision
> Extended
> But in Assembler, one needs
> Floating
> Medium
> Double
> Extended
> because R for Real can be confused with R for Register, and I for
> Intermediate can be confused with I for Integer.
> John Savard
What about triple and quad precision? Or extended triple precision?
For Arpl at one point the float precision could be specified a bit like bitfields are specified in ‘C’ as in:
Float:8 myvar;
Changed it though to standard types as it was undesirable to support any bit-length for floats which would have to be done with software. Now it
is just:
float byte myvar;
float quad qvar;
Can also use shorter form for some types like:
double dvar;
Instead of having to type ‘float double dvar;’
Some float approximations will supply around 7 bits which works well to
fill in the significand for the progression of 16, 32, 64, 128-bit floats.
Having a 48-bit float type likely does not save any processing time over
a 64-bit type. It is more a matter of storage space.
48-bit floats in arrays may slow down indexed addressing; scaled index address modes are usually a power of two.
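The indexing cost can be illustrated with a small address-calculation sketch. It assumes a typical scaled-index mode that only supports power-of-two scales, so a 6-byte element needs an extra shift and add; the function names are made up for the example.

```python
def addr_pow2(base: int, index: int, log2_size: int) -> int:
    """Typical scaled-index mode: the scale is a single shift."""
    return base + (index << log2_size)

def addr_48bit(base: int, index: int) -> int:
    """6-byte elements: index*6 = index*4 + index*2, two shifts plus an add."""
    return base + (index << 2) + (index << 1)

if __name__ == "__main__":
    print(hex(addr_pow2(0x1000, 5, 3)))   # 64-bit elements
    print(hex(addr_48bit(0x1000, 5)))     # 48-bit elements
```

Whether the extra add matters depends on whether it fits in the address-generation path or needs a separate instruction.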
Stylistically, I think the 6502 ASM notation was influenced by Motorola (though differs somewhat from M68K notation).
Which is not a surprise because the 6502 was developed by people from
the 6800 (not to be confused with the 68000 :-) development team at Motorola who had formed their own company.
On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:
> Which is not a surprise because the 6502 was developed by people from
> the 6800 (not to be confused with the 68000 :-) development team at
> Motorola who had formed their own company.
Unless you mean Freescale, which was spun off by Motorola itself, I don't know what you're referring to. A web search did not turn anything up about 68000 engineers leaving Motorola and founding their own startup.
John Savard
On Sun, 26 Apr 2026 14:53:09 -0000 (UTC)
quadi <quadibloc@ca.invalid> wrote:
> On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:
>> Which is not a surprise because the 6502 was developed by people
>> from the 6800 (not to be confused with the 68000 :-)
>> development team at Motorola who had formed their own company.
> Unless you mean Freescale, which was spun off by Motorola itself, I
> don't know what you're referring to. A web search did not turn
> anything up about 68000 engineers leaving Motorola and founding their
> own startup.
> John Savard
Read again.
On 4/24/2026 9:53 AM, Robert Finch wrote:------------------
Hrrm:
char float //FP8
short float //Binary16
float //Binary32
long float //48-bit
short double //48-bit (truncated Binary64, align=2)
double //Binary64
short long double //96-bit (truncated Binary128, align=4)
long double //Binary128
long long float //192-bit (truncated Binary256, align=8)
long long double //Binary256
According to Wikipedia (I have no independent reference) the 6502 design
team at MOS Technology had previously worked at Motorola on the 6800.
On Sat, 25 Apr 2026 18:00:22 +0000, MitchAlsup wrote:
> BGB <cr88192@gmail.com> posted:
>> It likely depends on which "tradition" one is coming from.
> IBM 360, 1963.
You're early. The IBM System/360 was announced on April 7, 1964.
Which means the first 16-bit area following the 16-bit header must begin
an instruction.
Therefore, I only need 14 bits, not 15 bits.
I just have to give up the paired 15-bit instructions, which nobody
seems to like anyways, as part of the basic block-independent
instruction set.
Maybe, maybe, maybe, I am close to finding happiness, and can move on
from the preliminary design phase to fleshing out the ISA... but based
on past experience, when that does actually happen, it will be a
pleasant *surprise*.
On Sun, 26 Apr 2026 17:07:18 +0200, David Brown wrote:
> According to Wikipedia (I have no independent reference) the 6502 design
> team at MOS Technology had previously worked at Motorola on the 6800.
I knew that, but apparently I misread the sentence, as it seemed to me to
be stating that members of the 68000 design team _also_ went somewhere
else.
On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:
> Maybe, maybe, maybe, I am close to finding happiness, and can move on
> from the preliminary design phase to fleshing out the ISA... but based
> on past experience, when that does actually happen, it will be a
> pleasant *surprise*.
And, if I haven't noted it already, the reason that I can even begin to entertain the delusion that I am making some sort of progress, rather than just going around in circles, as all appearances would indicate, is
because in these last few weeks, I feel I have achieved my goal of
squeezing a very large instruction set into limited opcode space to a
greater degree than I had even hoped for previously.
I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
it! :-(
On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:
> Which means the first 16-bit area following the 16-bit header must
> begin an instruction.
> Therefore, I only need 14 bits, not 15 bits.
> I just have to give up the paired 15-bit instructions, which nobody
> seems to like anyways, as part of the basic block-independent
> instruction set.
I have now updated the pages on the Concertina II architecture to
reflect this latest change.
On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:
> I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
> it! :-(
Well, nobody else may have had it as a _goal_. But others have certainly also _achieved_ that goal, even if they had never set it. The IBM System/360 and its descendants are a case in point.
My intention was not to have a ridiculously large instruction set that is not comparable to those of existing computers; instead, it is to have one that is perhaps a bit larger than any existing computer, because it
combines certain things from more than one architecture.
Specifically:
- like the 68000 and the x86, memory-reference instructions are to have 16-bit displacements.
- like most RISC architectures, the register banks will include 32
registers each. There will be no register that always contains zero, but register zero can appear to be zero for specific purposes such as indexing.
- there will be full base-index addressing, like on the System/360.
- the instruction set will combine the capabilities of the System/360 and the Cray I.
And that was to be done, as far as possible, without making the
instructions involved longer than their counterparts on the IBM System/360.
That goal was not _strictly_ met, as it was an impossible goal, but
it was approached. 16-bit short instructions are limited in capability compared to their counterparts on the 360; to fully equal them, 18-bit instructions, which require the overhead of a block header, are needed. Also, to fully equal the 48-bit string and packed decimal instructions of the 360, 64-bit instructions are required.
A large instruction set, however, was a _requirement_, not a goal.
Squeezing it into the available opcode space provided by not allowing the size of instructions to explode, making code significantly less compact
than on the 360... *that* was the goal.
I hope this has clarified my design philosophy.
John Savard
quadi <quadibloc@ca.invalid> posted:
> - the instruction set will combine the capabilities of the System/360
> and the Cray I.
You might want to skip up to CRAY-XMP because they got the
scatter/gather memory reference instructions.
I have figured out a reasonable way to bring that 32-bit header back at
what I felt was an acceptable cost, so the pages have now been revised
to include that latest change.
According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
> I don't think anyone else designing a CPU had the goal of "a very large
> instruction set". But if that was your goal, I think you have achieved
> it! :-(
Oh, I dunno. Look at the IBM 7030 STRETCH.
But the big difference between the 7030 and John's system is that for
the 7030, the huge multiplicity is in the number of data formats
supported, not the instructions. The 7030 has only two instruction
lengths, 32 and 64 bits, and, as far as I can tell, no instruction
headers for blocks of instructions. And the complexity of different
data formats seems put in various "modifier" bits in the instruction,
not in the op code.
On Mon, 27 Apr 2026 22:08:12 -0700, Stephen Fuld wrote:
> But the big difference between the 7030 and John's system is that for
> the 7030, the huge multiplicity is in the number of data formats
> supported, not the instructions. The 7030 has only two instruction
> lengths, 32 and 64 bits, and, as far as I can tell, no instruction
> headers for blocks of instructions. And the complexity of different
> data formats seems put in various "modifier" bits in the instruction,
> not in the op code.
Yes; in fairness to the IBM 7030, in the IBM 360 one doesn't count the MVC instruction as 256 different instructions.
John Savard
The addition? Now, in the header that provides VLIW features - now there
is only _one_ such header, I have eliminated the ability to associate
VLIW features with the variable-length instructions - it is possible to indicate the use of an alternate instruction set.
Oops, it can only be used in the first slot because it conflicts with
the paired 15-bit short instructions! What will be the best way to fix that...
quadi [2026-04-27 09:06:25] wrote:
> On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:
>> I don't think anyone else designing a CPU had the goal of "a very
>> large instruction set". But if that was your goal, I think you
>> have achieved it! :-(
> Well, nobody else may have had it as a _goal_. But others have
> certainly also _achieved_ that goal, even if they had never set it.
> The IBM System/360 and its descendants are a case in point.
Arguably the Itanic was designed with such a goal as well where the
"size" was measured as a kind of "patentability".
=== Stefan