• Concertina II: Finding Happiness Through Coding

    From quadi@quadibloc@ca.invalid to comp.arch on Wed Apr 22 03:31:03 2026
    From Newsgroup: comp.arch

    I was not happy that when I did not use a block prefix, I had to omit the
    Load Medium and Store Medium instructions from the basic load/store instructions.

    I searched for available opcode space.

    I found a little; enough for the _other_ block prefixes. But not a full
    1/16 of the opcode space which is what the Type I header needed. Where did
    I find it? In the opcodes for operate instructions which the 15-bit paired short instructions don't use.

    So I thought that perhaps I could shrink the requirements of the Type I header. By making use of the fact that 10 (start of a 32-bit or longer instruction) can only be followed by 11 (not the start of an instruction), maybe I could replace four consecutive two-bit prefixes with one seven-bit prefix.

    But alas, this fact only reduced the possibilities to 81 + 27 + 27 + 1,
    which is 136, which is greater than 128.

    However, if I made use of the fact that I would know if the preceding
    16-bit zone began a 32-bit instruction, and added certain other restrictions
    on the allowed combinations - by insisting that all pseudo-immediates be tidily put at the end of the block - I thought I was able to squeeze it in.

    This may be a step too far, so I've saved everything if I need to go back.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Apr 22 18:15:17 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    I was not happy that when I did not use a block prefix, I had to omit the Load Medium and Store Medium instructions from the basic load/store instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    How do you know a sooth sayer fits in 2^(3+n) bytes???

  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Apr 23 03:35:44 2026
    From Newsgroup: comp.arch

    On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:

    I was not happy that when I did not use a block prefix, I had to omit
    the Load Medium and Store Medium instructions from the basic load/store
    instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    How do you know a sooth sayer fits in 2^(3+n) bytes???

    No, I am not referring to one who channels the spirits of the dead.

    Instead, the Medium data type refers to 48-bit floating-point values;
    although not part of the IEEE 754 standard, they follow the pattern of the types defined in it. They offer a precision just above 11 decimal digits,
    and an exponent range that exceeds 10 to plus or minus 99, thus
    approximating the numbers pocket calculators make available.
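
    The post does not give the field widths of the Medium format, so the
    following is only a rough sketch under assumptions: an IEEE-style layout
    with one sign bit and a hidden leading significand bit, and two
    hypothetical 48-bit splits. It checks which splits are consistent with
    "just above 11 decimal digits" and an exponent range beyond 10^99.

```python
import math

def decimal_digits(significand_bits: int) -> float:
    """Decimal digits of precision carried by a binary significand."""
    return significand_bits * math.log10(2)

def max_decimal_exponent(exponent_bits: int) -> float:
    """Approximate largest power of ten for an IEEE-style biased exponent."""
    bias = (1 << (exponent_bits - 1)) - 1   # e.g. 1023 for 11 exponent bits
    return bias * math.log10(2)

# Hypothetical splits (exponent bits, stored fraction bits); 1 sign bit each.
# These are assumptions for illustration, not taken from the post.
for exp_bits, frac_bits in [(10, 37), (11, 36)]:
    assert 1 + exp_bits + frac_bits == 48
    digits = decimal_digits(frac_bits + 1)  # +1 for the hidden leading bit
    print(exp_bits, frac_bits, round(digits, 2),
          round(max_decimal_exponent(exp_bits)))
```

    Under these assumptions both splits clear 11 digits and 10^99; a 9-bit
    exponent would not reach 10^99.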

    John Savard
  • From BGB@cr88192@gmail.com to comp.arch on Thu Apr 23 17:17:48 2026
    From Newsgroup: comp.arch

    On 4/22/2026 10:35 PM, quadi wrote:
    On Wed, 22 Apr 2026 18:15:17 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:

    I was not happy that when I did not use a block prefix, I had to omit
    the Load Medium and Store Medium instructions from the basic load/store
    instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    How do you know a sooth sayer fits in 2^(3+n) bytes???

    No, I am not referring to one who channels the spirits of the dead.

    Instead, the Medium data type refers to 48-bit floating-point values; although not part of the IEEE 754 standard, they follow the pattern of the types defined in it. They offer a precision just above 11 decimal digits,
    and an exponent range that exceeds 10 to plus or minus 99, thus
    approximating the numbers pocket calculators make available.


    Ironically, I had considered an intermediate format a few times, mostly represented as the Binary64 format with the low-order bits cut off.

    Mostly hadn't amounted to much.



    I did end up experimenting with support for a very niche converter:
    (31:0) => (63:0)
    As:
    (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)

    Currently only available in an Imm32 instruction.

    Seemingly, this pattern can deal with roughly 2/3 of the FPU constants
    that miss as Binary16:
    Multiples of 1/3, 1/5 and similar hit with this.

    It fails for patterns like 1/7, 1/9, ... or similar, which have a
    different bit pattern length (pattern doesn't repeat along an 8-bit
    spacing).
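
    As a minimal sketch of the 8-bit repeating pattern described above (the
    function names are made up for illustration, and this models only the bit
    shuffling, not any actual hardware or encoder):

```python
import struct

def bits_of(x: float) -> int:
    """Raw Binary64 bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def expand_8b(imm32: int) -> int:
    """Unpack (31:4), (11:4), (11:4), (11:4), (11:4), (3:0) into 64 bits."""
    hi = (imm32 >> 4) & 0xFFFFFFF   # bits 31:4 (28 bits)
    rep = (imm32 >> 4) & 0xFF       # bits 11:4, the repeated 8-bit pattern
    lo = imm32 & 0xF                # bits 3:0
    out = hi
    for _ in range(4):              # append the pattern four times
        out = (out << 8) | rep
    return (out << 4) | lo

def try_pack_8b(x: float):
    """Return the Imm32 that expands exactly to x, or None on a miss."""
    v = bits_of(x)
    imm32 = ((v >> 36) << 4) | (v & 0xF)   # keep top 28 and low 4 bits
    return imm32 if expand_8b(imm32) == v else None

for name, value in [("1/3", 1 / 3), ("1/5", 1 / 5), ("1/7", 1 / 7)]:
    packed = try_pack_8b(value)
    print(name, "miss" if packed is None else hex(packed))
```

    Here 1/3 and 1/5 round-trip exactly, while 1/7 misses because its bits do
    not repeat on an 8-bit spacing.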

    Patterns like 1/7, 1/9, ... could be instead addressed with a pattern
    that repeats on a multiple of 12 bits. But, this sort of thing is
    getting a bit niche (would need different patterns to deal with
    different fractions).

    But, is a relatively affordable way to deal with this pattern; even if
    it can't be crammed into a small size in the same way as simple BFP
    patterns (and encoding an index into a table of possible patterns won't
    save much over expressing the pattern directly).

    Also, the 12-bit pattern case can be noted to miss more with patterns
    that would hit with 8-bit or with binary16 (the 8-bit pattern case
    mostly overlaps as well with the area covered by Binary16). A 6-bit
    pattern could still overlap with Binary16's range, but would be more
    limited in the fractions it can deal with.



    Only really relevant for constant values though (as a live FP format,
    would be worse than normal BFP).

    Though, can make use of the extra bit left over from the Imm32f
    encodings (which are actually stored as Imm33). More a debate though of
    if it is worth the non-zero additional LUT cost to do so.


    But, this combination would leave, statistically:
      Imm16f:  63%
        Imm6f: 25% (S.E3.M2)
      Imm32fu: 71% (8% over 63%, simply Binary64 truncated to 32 bits)
      Imm32fn: 88% (25% hit rate over 63%, 8-bit pattern from above)

    ...


    While Imm32fn has a higher hit rate than Imm32fu, their gains over
    Imm16f do not overlap much, so the combined Imm32fun in this case seems
    to have around a 96% hit-rate (63% + 8% + 25%), with around 4% in the
    "miss" category (irrational constants, and stuff like 1/7 which has a
    3-bit repeating pattern, vs 2-bit for 1/3 and 4-bit for 1/5).

    If I added the 12-bit pattern (in addition to the existing two), could
    maybe push it up to around a 97% or 98% hit rate, but the 12-bit pattern
    by itself has a lower hit-rate than simply truncating the Binary64 value
    to 32 bits, or even Binary16. So, selecting between 8b+12b pattern would
    do worse than trunc32 + 8b pattern.


    But, dunno.



    However, the relative usage of floating point immediate values is low
    enough that this doesn't make a big impact on code density.


    Not much more "low hanging fruit" for improving code density ATM, but it
    seems like if I could squeeze out a few more percent on overall code
    density, it could put XG3 more solidly in the lead vs RV64GC+JX (where,
    right now it is pretty close and which one wins/loses depends a lot on
    the program being tested).

    ...



  • From BGB@cr88192@gmail.com to comp.arch on Thu Apr 23 18:08:04 2026
    From Newsgroup: comp.arch

    On 4/23/2026 5:17 PM, BGB wrote:

    If I added the 12-bit pattern (in addition to the existing two), could
    maybe push it up to around a 97% or 98% hit rate, but the 12-bit pattern
    by itself has a lower hit-rate than simply truncating the Binary64 value
    to 32 bits, or even Binary16. So, selecting between 8b+12b pattern would
    do worse than trunc32 + 8b pattern.



    Relevant, but failed to mention, 12-bit pattern:
    (31:4), (15:4), (15:4), (15:8), (3:0)

    Which is effectively S.E11.M4 apart from the bits forming the pattern.
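
    A small sketch of this 12-bit variant, in the same spirit (helper names
    are illustrative only; this just models the bit layout given above):

```python
import struct

def bits_of(x: float) -> int:
    """Raw Binary64 bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def expand_12b(imm32: int) -> int:
    """Unpack (31:4), (15:4), (15:4), (15:8), (3:0) into 64 bits."""
    hi = (imm32 >> 4) & 0xFFFFFFF   # bits 31:4 (28 bits)
    rep12 = (imm32 >> 4) & 0xFFF    # bits 15:4, the 12-bit pattern
    rep8 = (imm32 >> 8) & 0xFF      # bits 15:8, top 8 bits of the pattern
    lo = imm32 & 0xF                # bits 3:0
    out = (hi << 12) | rep12
    out = (out << 12) | rep12
    out = (out << 8) | rep8
    return (out << 4) | lo

def try_pack_12b(x: float):
    """Return the Imm32 that expands exactly to x, or None on a miss."""
    v = bits_of(x)
    imm32 = ((v >> 36) << 4) | (v & 0xF)
    return imm32 if expand_12b(imm32) == v else None

for name, value in [("1/7", 1 / 7), ("1/9", 1 / 9)]:
    packed = try_pack_12b(value)
    print(name, "miss" if packed is None else hex(packed))
```

    With this layout 1/7 and 1/9 both round-trip exactly, at the cost of
    only 4 free mantissa bits above the pattern.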

    Though, could maybe see if there are some other patterns that could do
    better here in terms of average hit-rate.

    Or, if the seeming relative success of truncation and the 8-bit pattern
    is more of a "take it as good enough and leave it at that" thing.


    My last "big survey of floating point constants" had failed to take into account stats for repeating bit patterns (hadn't thought of trying to go
    this route at the time; had thought in terms of power-of-10 scaling, but
    this was much less feasible than trying to account more directly for the repeating bit patterns within the fractions).


    As noted, a repeating pattern can in principle deal with all smaller
    patterns whose length divides the pattern length:
      12 can handle 2, 3, 4, 6;
      8 can handle 2, 4, 8.
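
    One way to see which fractions fall under which pattern width: for odd n,
    the repeating bit pattern of 1/n has a length equal to the multiplicative
    order of 2 modulo n. The sketch below (function names are illustrative)
    computes that length and checks whether it divides 8 or 12. It ignores
    rounding in the last mantissa bits, so it is a necessary condition rather
    than a guarantee of a hit.

```python
def binary_period(n: int) -> int:
    """Length of the repeating bit pattern of 1/n, for odd n >= 3."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    k, r = 1, 2 % n
    while r != 1:          # smallest k with 2**k == 1 (mod n)
        r = (r * 2) % n
        k += 1
    return k

def handled_by(width: int, n: int) -> bool:
    """True if a width-bit repeating pattern can reproduce 1/n's period."""
    return width % binary_period(n) == 0

for n in (3, 5, 7, 9, 11, 17):
    print(n, binary_period(n), handled_by(8, n), handled_by(12, n))
```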

    Downfall of the 12-bit pattern is that it doesn't leave enough bits in
    the mantissa for the top-end value.

    Though, could squeeze a few bits out of the exponent:
      (31:30),
      (30) ? 4'h0 : 4'hF,
      (29:4),
      (15:4), (15:4), (15:12), (3:0)

    Effectively giving 8 bits of usable mantissa.

    Or, maybe sacrifice the sign bit (almost always positive for FPU
    immediate values):
      1'b0, (30),
      (30) ? 4'h0 : 4'hF,
      (29:4),
      (31) ?
        { (15:4), (15:4), (15:12) } :
        { (11:4), (11:4), (11:4), (11:8) },
      (3:0)


    Haven't evaluated these possibilities yet though to determine possible
    effects on hit-rate...



  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 24 05:29:12 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    quadi <quadibloc@ca.invalid> posted:

    I was not happy that when I did not use a block prefix, I had to omit the
    Load Medium and Store Medium instructions from the basic load/store
    instructions.

    Is LD Medium obtaining a sooth sayer from memory?

    Is ST Medium putting a sooth sayer back in memory?

    Obviously, this refers to steaks.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 24 12:01:34 2026
    From Newsgroup: comp.arch

    On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

    Obviously, this refers to steaks.

    In a higher-level language, one has:

    Real
    Intermediate
    Double Precision
    Extended

    But in Assembler, one needs

    Floating
    Medium
    Double
    Extended

    because R for Real can be confused with R for Register, and I for
    Intermediate can be confused with I for Integer.

    John Savard
  • From Robert Finch@robfi680@gmail.com to comp.arch on Fri Apr 24 10:53:19 2026
    From Newsgroup: comp.arch

    On 2026-04-24 8:01 a.m., quadi wrote:
    On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

    Obviously, this refers to steaks.

    In a higher-level language, one has:

    Real
    Intermediate
    Double Precision
    Extended

    But in Assembler, one needs

    Floating
    Medium
    Double
    Extended

    because R for Real can be confused with R for Register, and I for Intermediate can be confused with I for Integer.

    John Savard

    What about triple and quad precision? Or extended triple precision?

    For Arpl at one point the float precision could be specified a bit like bitfields are specified in ‘C’ as in:
    Float:8 myvar;

    Changed it though to standard types as it was undesirable to support any bit-length for floats which would have to be done with software. Now it
    is just:

    float byte myvar;
    float quad qvar;

    Can also use shorter form for some types like:
    double dvar;
    Instead of having to type ‘float double dvar;’

    Some float approximations will supply around 7 bits which works well to
    fill in the significand for the progression of 16, 32, 64, 128-bit floats.

    Having a 48-bit float type likely does not save any processing time over
    a 64-bit type. It is more a matter of storage space.

    48-bit floats in arrays may slow down indexed addressing; scaled index
    address modes are usually a power of two.

  • From BGB@cr88192@gmail.com to comp.arch on Fri Apr 24 10:22:45 2026
    From Newsgroup: comp.arch

    On 4/24/2026 7:01 AM, quadi wrote:
    On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

    Obviously, this refers to steaks.

    In a higher-level language, one has:

    Real
    Intermediate
    Double Precision
    Extended

    But in Assembler, one needs

    Floating
    Medium
    Double
    Extended

    because R for Real can be confused with R for Register, and I for Intermediate can be confused with I for Integer.


    I went with:
    H: Half
    F/S: Float or Single
    D: Double
    X: 128-bit (beyond this depends on context)

    RV used Q for Binary128, but Q was more widely used for Int64 in my naming.

    Int naming:
    B/SB/UB: Byte
    W/SW/UW: Int16 ("word")
    L/SL/UL: Int32 ("long")
    T/ST/UT: Int48 ("tword" / triple word), short lived
    Q: Int64 ("qword")

    RV had used:
    B/H/{W|S}/D/Q

    ...

    Did look into 48b Load/Store ops, but didn't stick.
    Could have supported a 48-bit format mostly by using 48b Load/Store.
    Other option being to fake it by using 64b ops, and MUX'ing.
    Load, MUX, Store

    Less efficient, but TW was super niche and hard to justify keeping it
    around.




    But, yeah, otherwise disrupted by a PSU failure on main PC (yesterday).
    Waiting for a new PSU to show up, can't get back to "business as usual"
    until then. Failed PSU was a 750W Rosewill, ordered a 750W MSI,
    hopefully works... Was $30 more than another Rosewill, but hopefully
    worth it (there were also much more expensive PSUs, I just didn't go for
    the cheapest option in this case, but yeah).

    Lots of bad luck in general yesterday.


  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 24 16:08:03 2026
    From Newsgroup: comp.arch

    On Fri, 24 Apr 2026 10:53:19 -0400, Robert Finch wrote:

    What about triple and quad precision? Or extended triple precision?

    There is quad precision, referred to as extended precision.
    Normally, there is no 96-bit triple precision. That may, however, make an appearance when the computer is working with storage divided into 48-bit
    units instead of 32/64-bit units.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 24 16:12:26 2026
    From Newsgroup: comp.arch

    On Wed, 22 Apr 2026 03:31:03 +0000, quadi wrote:

    This may be a step too far, so I've saved everything if I need to go
    back.

    While I had tried to organize the coding scheme, I decided that it was too complex to be tolerable.

    The compromise of eliminating medium format floating-point loads and
    stores from the default basic instruction set was not tolerable.

    The compromise to the 15-bit paired instructions that preceded that was
    also not tolerable.

    So what to do? What I've been doing all along in this design process -
    move the compromise somewhere else, and see if I can put up with it. So
    now I've decided to take the 32-bit header for variable-length
    instructions, and put the compromise there. This required bringing back
    16-bit short instructions.

    John Savard
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Apr 24 18:52:07 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 4/24/2026 7:01 AM, quadi wrote:
    --------------------
    I went with:
    H: Half
    F/S: Float or Single
    D: Double
    X: 128-bit (beyond this depends on context)

    RV used Q for Binary128, but Q was more widely used for Int64 in my naming.

    Int naming:
    B/SB/UB: Byte
    W/SW/UW: Int16 ("word")
    L/SL/UL: Int32 ("long")
    T/ST/UT: Int48 ("tword" / triple word), short lived
    Q: Int64 ("qword")

    RV had used:
    B/H/{W|S}/D/Q

    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Apr 24 09:28:31 2026
    From Newsgroup: comp.arch

    I was not happy that when I did not use a block prefix, I had to omit the
    Load Medium and Store Medium instructions from the basic load/store
    instructions.
    Is LD Medium obtaining a sooth sayer from memory?
    Is ST Medium putting a sooth sayer back in memory?
    Obviously, this refers to steaks.

    But these operations are too rare to include in usual ISAs,


    === Stefan
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 24 20:05:45 2026
    From Newsgroup: comp.arch

    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    I was not happy that when I did not use a block prefix, I had to omit the
    Load Medium and Store Medium instructions from the basic load/store
    instructions.
    Is LD Medium obtaining a sooth sayer from memory?
    Is ST Medium putting a sooth sayer back in memory?
    Obviously, this refers to steaks.

    But these operations are too rare to include in usual ISAs,

    Well done!
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 25 01:48:04 2026
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
    I was not happy that when I did not use a block prefix, I had to omit the
    Load Medium and Store Medium instructions from the basic load/store
    instructions.
    Is LD Medium obtaining a sooth sayer from memory?
    Is ST Medium putting a sooth sayer back in memory?
    Obviously, this refers to steaks.

    But these operations are too rare to include in usual ISAs,

    Well done!

    You can sous vidé
  • From quadi@quadibloc@ca.invalid to comp.arch on Sat Apr 25 02:08:10 2026
    From Newsgroup: comp.arch

    On Fri, 24 Apr 2026 16:12:26 +0000, quadi wrote:

    So what to do? What I've been doing all along in this design process -
    move the compromise somewhere else, and see if I can put up with it. So
    now I've decided to take the 32-bit header for variable-length
    instructions, and put the compromise there.

    Something truly evil has occurred to me. But since it _is_ so evil, I
    don't think that I will do it.

    Instead of removing the LM (Load Medium) and STM (Store Medium) basic memory-reference instructions...

    there are another two that, under a certain circumstance, could be removed.

    Under that circumstance, there would still be IB (Insert Byte) and IH
    (Insert Halfword), and ULB (Unsigned Load Byte) and ULH (Unsigned Load Halfword).

    But _not_ I (Insert) and UL (Unsigned Load).

    In the case of a *32-bit* architecture.

    So the truly evil thing would be...

    1) To decide that a 32-bit version of the architecture needs to be defined;
    2) To decide that it should be the default;
    3) To decide that not switching modes to get at instruction set extensions should apply to the switch between 32 bits and 64 bits too, so that 64-bit code would consist _entirely_ of instruction blocks that begin with a
    block header, because instructions without a header could only be in one state, that being 32 bits.

    Of course, it's 3) that exposes the true evilness of this scheme. So I
    don't think it's a place I want to go.

    However, let's say I do want to define 32-bit operation, but _with_ a mode bit.

    Then in 32-bit mode, those two opcodes would get used for an uncompromised variable-length instruction header.

    Now what is 64-bit mode going to look like? That would almost force going
    back to the option of demoting LM and STM. Or does it mean I have to come
    up with something more devious, something truly perverse, that somehow provides a headerless header, sneaking in an invisible mode bit in the
    code itself? But there's no such thing as a free bit; they're like midday meals in this regard.

    The System/360 had it simple - extra instructions needed for 64-bit
    operation? Just shove them in the 64-bit opcode space. But in Concertina
    II, instructions longer than 32 bits are somewhat wasteful in overhead,
    though I've tried... and they're _only_ available with a particular
    category of headers. So I feel I need to have everything _important_ in
    the basic 32-bit instruction set.

    This direction of thinking suggests... that I use some of the opcode space
    I still do have free... for special 64-bit instructions that are available without a header. This has been done before in previous Concertina II iterations. Emergency long instructions - inefficient because _both_
    32-bit words of the instruction have to begin with 9 or so overhead bits to indicate they belong to such an instruction... but less inefficient than adding a whole 32-bit header to the block if you just need one of them in
    the block.

    That way, I can add lots of extra instructions to be part of the basic headerless instruction set.

    While such a capability may be a good thing in itself, though, using it as
    an excuse to uglify the basic instruction set is _still_ something I would want to avoid, so I don't think it solves the problem of restoring the
    32-bit variable-length instruction header to its uncompromised glory.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Sat Apr 25 03:43:29 2026
    From Newsgroup: comp.arch

    On Sat, 25 Apr 2026 02:08:10 +0000, quadi wrote:

    This direction of thinking suggests... that I use some of the opcode
    space I still do have free... for special 64-bit instructions that are available without a header. This has been done before in previous
    Concertina II iterations. Emergency long instructions - inefficient
    because _both_ 32-bit words of the instruction have to begin with 9 or
    so overhead bits to indicate they belong to such an instruction... but
    less inefficient than adding a whole 32-bit header to the block if you
    just need one of them in the block.

    That way, I can add lots of extra instructions to be part of the basic headerless instruction set.

    Thinking about this led me to do something completely different, which required me to use a bit more opcode space for headers - but I think I
    left enough where I grabbed it from to still do this as well.

    This was done to add some additional flexibility to one type of
    variable-length code - now the _short_ instructions, instead of the
    memory-reference operate instructions, can be given the power to alter the
    condition codes.

    This comes at a cost, though. Now the memory-reference operate
    instructions can only use the first half of each register bank as
    destination registers, and the header no longer provides access to a
    secondary 32-bit instruction set as well, if this option is chosen.

    While this could be very useful, by allowing short instructions to be used
    in cases where they previously could not, it's still somewhat perverse,
    like much else I have done in this and previous iterations of Concertina
    II.

    John Savard
  • From BGB@cr88192@gmail.com to comp.arch on Sat Apr 25 00:38:35 2026
    From Newsgroup: comp.arch

    On 4/24/2026 1:52 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/24/2026 7:01 AM, quadi wrote:
    --------------------
    I went with:
    H: Half
    F/S: Float or Single
    D: Double
    X: 128-bit (beyond this depends on context)

    RV used Q for Binary128, but Q was more widely used for Int64 in my naming.
    Int naming:
    B/SB/UB: Byte
    W/SW/UW: Int16 ("word")
    L/SL/UL: Int32 ("long")
    T/ST/UT: Int48 ("tword" / triple word), short lived
    Q: Int64 ("qword")

    RV had used:
    B/H/{W|S}/D/Q

    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

    It likely depends on which "tradition" one is coming from.


    In my case, I was coming from SH-4 and x86.
    SH-4 was B/W/L (likewise for M68K and i386 syntax in GAS).
    Though, differs from M68K and "i386" in various ways
    (eg, no "%" on registers, ...).
    Well, and 0x1234 vs $1234 or similar.
    Eg: "mov 0x1234, r10" vs "mov #$1234, %d4"
    But, seems even within GAS usage, this was inconsistent.
    Q/X: from x86 (though x86 also used DQ instead of X for some ops).


    At present, it seems like 'X' may have been a mistake (well, along with
    trying to use both sets of mnemonics and then trying to auto-detect the
    ASM style).

    Though, there is still the problem that there is no good or fully
    reliable way to tell which ASM syntax is in use (and, neither
    annotates it, and since both evolved from variants of GAS ASM syntax, it
    makes it harder).

    ...

    Well, and I guess one could try to argue the merits of, say:
    0x1234
    $1234
    1234H
    &H1234
    #0x1234
    #$1234
    16'h1234
    ...
    And, say:
    (R10, 16)
    16(R10)
    [R10+16]
    [R10,16]
    ...



    Otherwise:
    New PSU showed up, and is installed, and main PC is working again.


    Decided to test the new decimal packing schemes against the "bulk
    scavenged FP constants" test; results currently for this test:
      Binary16 hit rate   : 63.7%
      Truncated to 32 bits: 66.9%
      Packing, 8b-A       : 73.9%
      Packing, 8b-B       : 62.5%
      Packing, 12b        : 61.3%
      T32 + 8b-B + 12b    : 77.2%
      T32 + 8b-A          : 76.9%

    This is lower than my earlier estimates based on my smaller scale tests.

    Where, as noted, unpacking patterns:
      Fp16: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0
      T32:  (31:0), 32'h0
      8b-A: (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)
      8b-B: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
            (11:4), (11:4), (11:4), (11:8), (3:0)
      12b:  1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
            (15:4), (15:4), (15:12), (3:0)
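
    To illustrate how the Fp16 line works, here is a minimal sketch (helper
    names are hypothetical) that applies the bit pattern literally and
    compares it against a normal Binary16 to Binary64 conversion. It widens
    the 5-bit exponent to 11 bits by inserting six copies of the inverted
    exponent MSB, which re-biases from 15 to 1023 for normal values.

```python
import struct

def half_bits(x: float) -> int:
    """Binary16 bit pattern of x (x must be exactly representable)."""
    return struct.unpack(">H", struct.pack(">e", x))[0]

def double_bits(x: float) -> int:
    """Binary64 bit pattern of x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def expand_fp16(h: int) -> int:
    """Apply (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0 literally."""
    top2 = (h >> 14) & 0x3                  # sign and exponent MSB
    fill = 0x00 if (h >> 14) & 1 else 0x3F  # six copies of NOT exponent MSB
    low14 = h & 0x3FFF                      # low 4 exponent bits + mantissa
    return ((((top2 << 6) | fill) << 14) | low14) << 42

for x in (1.0, 2.0, 0.5, -1.5, 1000.0):
    assert expand_fp16(half_bits(x)) == double_bits(x)

# Taken literally, the pattern does not special-case zero: 0x0000 expands
# to 2**-15. Real hardware would presumably handle such cases separately;
# that part is an assumption, not something stated in the post.
print(hex(expand_fp16(0x0000)))
```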


    The T32 + 8b-A case has nearly the same hit rate, but is cheaper (and,
    is also what I had already implemented experimentally).

    While T32 + 8B-A + 12b could potentially give the highest hit rate, this combination would also be the most expensive. And, without the exponent trickery, the hit-rate for 12b will suck.


    But, as-is, would be exclusive to XG3 (XG1/XG2/RV being limited to the
    Fp16 case for FPU immediate forms).


    Still debatable if worth the costs (while it is an improvement in hit
    rate, it is also a bit of a corner case).


    ...



    John Savard


  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 25 18:00:22 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 4/24/2026 1:52 PM, MitchAlsup wrote:
    ------------------
    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

    It likely depends on which "tradition" one is coming from.

    IBM 360, 1963.
    ------------------
    Well, and I guess one could try to argue the merits of, say:
    0x1234
    $1234
    1234H
    &H1234
    #0x1234
    #$1234
    16'h1234

    Use C notation when possible.
    ...
    And, say:
    (R10, 16)
    16(R10)
    [R10+16]
    [R10,16]
    ...

    The [] notations tell ASM that the instruction has to be a
    memory reference, the () notations do not.


  • From BGB@cr88192@gmail.com to comp.arch on Sat Apr 25 13:39:25 2026
    From Newsgroup: comp.arch

    On 4/25/2026 1:00 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/24/2026 1:52 PM, MitchAlsup wrote:
    ------------------
    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

    It likely depends on which "tradition" one is coming from.

    IBM 360, 1963.

    OK.

    ------------------
    Well, and I guess one could try to argue the merits of, say:
    0x1234
    $1234
    1234H
    &H1234
    #0x1234
    #$1234
    16'h1234

    Use C notation when possible.

    That is my preference (I usually use 0x1234 without any extra
    adornment), usually...

    Except that the 6502/65C816 and M68K fans seem to really like using
    $1234 instead...

    Stylistically, I think the 6502 ASM notation was influenced by Motorola (though differs somewhat from M68K notation).

    Likewise, GAS's i386 syntax was likely influenced by the M68K ASM syntax.


    ...
    And, say:
    (R10, 16)
    16(R10)
    [R10+16]
    [R10,16]
    ...

    The [] notations tell ASM that the instruction has to be a
    memory reference, the () notations do not.


    Could be.
    I think () comes mainly from the PDP/VAX/M68K style lineage...
    Whereas [] were used more with Intel and ARM and similar.



    As noted, I ended up preferring (Rb, Disp) over Disp(Rb), but RISC-V's standard ASM syntax went the other way.

    Neither ended up using @Rb syntax though, which was used by Hitachi and
    Texas Instruments, but in a way differing in the specifics from how DEC
    had used it in PDP and VAX (or was some of this more due to AT&T, hard
    to tell?...).

    ...




  • From Robert Finch@robfi680@gmail.com to comp.arch on Sat Apr 25 14:39:43 2026
    From Newsgroup: comp.arch

    On 2026-04-25 1:38 a.m., BGB wrote:
    On 4/24/2026 1:52 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/24/2026 7:01 AM, quadi wrote:
    --------------------
    I went with:
        H: Half
        F/S: Float or Single
        D: Double
        X: 128-bit (beyond this depends on context)

    RV used Q for Binary128, but Q was more widely used for Int64 in my
    naming.

    Int naming:
        B/SB/UB: Byte
        W/SW/UW: Int16 ("word")
        L/SL/UL: Int32 ("long")
        T/ST/UT: Int48 ("tword" / triple word), short lived
        Q: Int64 ("qword")

    RV had used:
        B/H/{W|S}/D/Q

    This is what I use. Except I have signed and unsigned integer
    arithmetic {B, BS, H, HS, W, WS, D} integers and {H, S, D}
    floats.

    It likely depends on which "tradition" one is coming from.


    In my case, I was coming from SH-4 and x86.
      SH-4 was B/W/L (likewise for M68K and i386 syntax in GAS).
        Though, differs from M68K and "i386" in various ways
         (eg, no "%" on registers, ...).
         Well, and 0x1234 vs $1234 or similar.
           Eg: "mov 0x1234, r10" vs "mov #$1234, %d4"
           But, seems even within GAS usage, this was inconsistent.
      Q/X: from x86 (though x86 also used DQ instead of X for some ops).


    After being confused enough times I started using Knuth.
    B = byte
    W = wyde (16 bits)
    T = tetra (32 bits)
    O = octa (64 bits)
    H = hexi (128 bits)
    I used 'D' at one time to represent 80-bits.

    For floating-point where things seem to be more standard
    H = half
    S = single
    D = double
    Q = quad

    At present, it seems like 'X' may have been a mistake (well, along with trying to use both sets of mnemonics and then trying to auto-detect the
    ASM style).

    Though, there is still the problem that there is no good or fully
    reliable way to tell which ASM syntax is in use (neither
    annotates it, and since both evolved from variants of GAS ASM syntax,
    it is harder to tell them apart).

    ...

    Well, and I guess one could try to argue the merits of, say:
      0x1234
      $1234
      1234H
      &H1234
      #0x1234
      #$1234
      16'h1234
      ...
    And, say:
      (R10, 16)
      16(R10)
      [R10+16]
      [R10,16]
      ...



    Otherwise:
      New PSU showed up, and is installed, and main PC is working again.


    Decided to test the new decimal packing schemes against the "bulk
    scavenged FP constants" test, results currently for this test;
      Binary16 hit rate   : 63.7%
      Truncated to 32 bits: 66.9%
      Packing, 8b-A: 73.9%
      Packing, 8b-B: 62.5%
      Packing, 12b : 61.3%
        T32 + 8b-B + 12b: 77.2%
        T32 + 8b-A: 76.9%

    This is lower than my earlier estimates based on my smaller scale tests.

    Where, as noted, unpacking patterns:
      Fp16: (15:14), (14) ? 6'h00 : 6'h3F, (13:0), 42'h0
      T32: (31:0), 32'h0
      8b-A: (31:4), (11:4), (11:4), (11:4), (11:4), (3:0)
      8b-B: 1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
        (11:4), (11:4), (11:4), (11:8), (3:0)
      12:   1'b0, (30) ? 5'h10 : 5'h0F, (29:4),
        (15:4), (15:4), (15:12), (3:0)


    The T32 + 8b-A case has nearly the same hit rate, but is cheaper (and,
    is also what I had already implemented experimentally).

    While T32 + 8B-A + 12b could potentially give the highest hit rate, this combination would also be the most expensive. And, without the exponent trickery, the hit-rate for 12b will suck.


    But, as-is, would be exclusive to XG3 (XG1/XG2/RV being limited to the
    Fp16 case for FPU immediate forms).


    Still debatable if worth the costs (while it is an improvement in hit
    rate, it is also a bit of a corner case).


    ...



    John Savard



  • From quadi@quadibloc@ca.invalid to comp.arch on Sat Apr 25 21:56:22 2026
    From Newsgroup: comp.arch

    On Sat, 25 Apr 2026 18:00:22 +0000, MitchAlsup wrote:
    BGB <cr88192@gmail.com> posted:

    It likely depends on which "tradition" one is coming from.

    IBM 360, 1963.

    You're early. The IBM System/360 was announced on April 7, 1964.

    John Savard
  • From BGB@cr88192@gmail.com to comp.arch on Sun Apr 26 00:28:06 2026
    From Newsgroup: comp.arch

    On 4/24/2026 9:53 AM, Robert Finch wrote:
    On 2026-04-24 8:01 a.m., quadi wrote:
    On Fri, 24 Apr 2026 05:29:12 +0000, Thomas Koenig wrote:

    Obviously, this refers to steaks.

    In a higher-level language, one has:

    Real
    Intermediate
    Double Precision
    Extended

    But in Assembler, one needs

    Floating
    Medium
    Double
    Extended

    because R for Real can be confused with R for Register, and I for
    Intermediate can be confused with I for Integer.

    John Savard

    What about triple and quad precision? Or extended triple precision?

    For Arpl at one point the float precision could be specified a bit like bitfields are specified in ‘C’ as in:
    Float:8 myvar;

    Changed it though to standard types as it was undesirable to support any bit-length for floats which would have to be done with software. Now it
    is just:

    float byte myvar;
    float quad qvar;

    Can also use shorter form for some types like:
    double dvar;
    Instead of having to type ‘float double dvar;’


    Hrrm:
    char float //FP8
    short float //Binary16
    float //Binary32
    long float //48-bit
    short double //48-bit (truncated Binary64, align=2)
    double //Binary64
    short long double //96-bit (truncated Binary128, align=4)
    long double //Binary128
    long long float //192-bit (truncated Binary256, align=8)
    long long double //Binary256

    Then, maybe:
    unsigned char float //FP8U
    signed char float //FP8A

    Well, or add _BitFloat(N) or similar.


    Some float approximations will supply around 7 bits which works well to
    fill in the significand for the progression of 16, 32, 64, 128-bit floats.

    Having a 48-bit float type likely does not save any processing time over
    a 64-bit type. It is more a matter of storage space.


    When I experimented with it before, it exists solely as a truncated
    storage format.

    I did experiment with special ops to save/reload the high 48 bits of a register to memory, but then noted that in this case it is likely better
    to do it as a multi-op sequence.

    This is less efficient, but saves on expending hardware resources on
    something that is likely to be rarely if ever used.



    48-bit floats in arrays may slow down indexed addressing; scaled index address modes are usually a power of two.


    Yes.

    Making array access to NPOT elements fast would be a harder problem.
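As an illustration of the "truncated storage format" idea and the non-power-of-two indexing cost discussed above, here is a small sketch. It assumes the 48-bit form is simply the high 48 bits of a Binary64 value (the low 16 mantissa bits are dropped on store and read back as zero), and the helper names are made up for this example.

```python
import struct


def store_f48(x):
    """Keep the high 48 bits of the Binary64 image (big-endian byte order)."""
    return struct.pack(">d", x)[:6]


def load_f48(b):
    """Reload a 6-byte value, padding the dropped low 16 bits with zeros."""
    return struct.unpack(">d", bytes(b) + b"\x00\x00")[0]


def f48_element_offset(index):
    """Byte offset of element `index`: index * 6 done as shift-and-add,
    since a scaled-index addressing mode normally only offers powers of two."""
    return (index << 2) + (index << 1)


def read_f48(buf, index):
    """Read one element from a packed array of 48-bit floats."""
    off = f48_element_offset(index)
    return load_f48(buf[off:off + 6])
```

The shift-and-add in `f48_element_offset` is the extra work that a power-of-two element size avoids, and the pack/pad steps correspond loosely to the multi-op store and reload sequence mentioned earlier.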



    Intermediate-sized elements more often make sense for packed vectors;
    usually with 3 elements in a power-of-2 package.

    So, say:
    3x 10b (~ S.E5.M4 | E5.M5)
    3x 20b (~ S.E5.M14)
    3x 42b (~ S.E8.M33)

    In my case, there were special ops to help with these formats.




    Though, can note that my first major use of the 3x 42b case was not in
    my ISA project, but rather in my BT2 3D engine, where I had noted that
    in a use-case that was not particularly computationally bound, it
    actually worked out faster to store 3D coordinate vectors in a packed
    form and then unpack and repack them when it was time to do math on them (where, passing 192-bit structs around was significantly more expensive
    in the Windows X64 ABI than passing 128 bit structs).
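To make the 3x 42b case concrete, here is a sketch of packing three S.E8.M33 values into one 128-bit integer. Everything beyond the S.E8.M33 field widths is an assumption for illustration: the exponent bias of 127, the x/y/z field order, truncation rather than rounding, flush-to-zero for tiny values, and the function names.

```python
import struct

MAN_BITS = 33                      # S.E8.M33: 1 + 8 + 33 = 42 bits
DROP = 52 - MAN_BITS               # Binary64 mantissa bits discarded (19)


def f64_to_f42(x):
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = b >> 63
    exp = (b >> 52) & 0x7FF
    man = b & ((1 << 52) - 1)
    e = exp - 1023 + 127           # rebias from 11-bit to 8-bit exponent
    if exp == 0 or e <= 0:
        return sign << 41          # zero, subnormal, or underflow: flush to zero
    if e >= 255:
        raise OverflowError("value does not fit in S.E8.M33")
    return (sign << 41) | (e << MAN_BITS) | (man >> DROP)


def f42_to_f64(v):
    sign = (v >> 41) & 1
    e = (v >> MAN_BITS) & 0xFF
    man = v & ((1 << MAN_BITS) - 1)
    if e == 0:
        return -0.0 if sign else 0.0
    b = (sign << 63) | ((e - 127 + 1023) << 52) | (man << DROP)
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def pack_vec3(x, y, z):
    """Three 42-bit fields in a 128-bit word; the top 2 bits stay unused."""
    return f64_to_f42(x) | (f64_to_f42(y) << 42) | (f64_to_f42(z) << 84)


def unpack_vec3(v):
    mask = (1 << 42) - 1
    return tuple(f42_to_f64((v >> s) & mask) for s in (0, 42, 84))
```

With 33 mantissa bits, a coordinate around 1024 km expressed in meters still resolves to well under a millimeter, which is the kind of headroom Binary32's 23 bits could not provide in the use case described above.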

    Also it was a case of, with a 1024km world size, normal Binary32 failed
    to give sufficient precision. Well, and for whatever reason I didn't
    think to just use fixed-point (the BT3 engine instead uses fixed-point internally for coords, but uses floating-point coords for the
    serialization format, *1).

    *1: Well, in this case, the BT3 engine uses a serialized format
    representing XML trees that I was calling ABXE, which (maybe ironically)
    was originally designed for BGBCC (though, the BT3 engine was partly
    built from code copy/pasted from BGBCC, but follows after some design
    elements from the BT2 engine as well).


    Well, as noted:
    BT1 engine: Used modified Quake Maps for world spawning, so no live
    entities.

    Despite ending up as a Minecraft clone, it was actually spawning the
    regions inside of the Quake Map. Well, more specifically, it was
    natively using the Half-Life variant of the map format, which had used a different (and more sane) system for specifying texture projection onto
    brush faces.

    If loading up a world with no voxel regions, the BT1 would behave more
    like a Quake-ish engine. Had also given it the ability to load Quake3
    and Doom3 maps (mostly similar technology). Did not have the ability to
    load the Half-Life 2 map format, as by this point Valve had changed the
    format quite significantly.


    Things differed quite significantly for the BT2 engine, which used
    solely regions with no entities. Instead it had used entity-spawn
    blocks, which would fire off command strings when the region was loaded
    to spawn in any entities. Any associated entities would unload when the
    region was unloaded, and there was no persistent live state with the
    entities (rather the "quest state" was held entirely in hidden spots
    within the player's inventory; and entities could be set to spawn or
    despawn based on the inventory state).


    Things changed again for the BT3 Engine, which instead used exclusively persistent entities (stored via the ABXE format). Essentially each
    region holding the equivalent of a binary-serialized XML document
    describing the state of all of the entities within the region.

    Structures could be described in XML format and then spawned into the
    world, along with any entities. Generally, this would involve removing
    prior spawned entities of given types from a bounding box before
    spawning in new ones (and then generally using a stack of BMP images to
    drive block placement, with each color mapped to a specific block
    type). Though, potentially, BMP pixels could also be used to drive
    entity spawning as well.

    It is also possible that CSG brushes could be reintroduced, but would be
    out of place in a Minecraft style world. Well, or more so than using 3D
    models described in SCAD scripts already is.

    Ironically, if adding brush-model geometry, the most sensible way to do
    it would be to do it in a similar way to Quake Brush Entities, where
    each entity holds its associated brush geometry but possibly remains
    mostly static. Well, as opposed to reintroducing the concept of a
    "worldspawn" which doesn't make as much sense in this case (and if a
    worldspawn existed, it would likely make sense as some sort of "global
    and always loaded" entity, probably one whose responsibilities would
    include things like managing the sky and day/night cycle).

    Well, and might slightly increase the level of abstraction by being like:
    <brush_aabb mx=... ... nz=... texture=... />
    Rather than as individual planes:
    <brush>
    <face nx=... ny=... nz=... nw=... texture=... .../>
    </brush>

    Well, or the other (more complex) option being to allow inline SCAD or similar.

    <entity classname="func_wall">
    <scad>
    <![CDATA[
    ...
    ]]>
    </scad>
    </entity>

    Well, and extending the language to support texture maps and similar. On spawn, engine would likely run the script and then generate a binary
    data blob to hold the geometry (as opposed to loading a 3D model from a
    file, could make it otherwise equivalent to my existing support for BMD models, *2).

    Well, or use my makeshift CSG BASIC language.

    *2: Also ironically, despite BMD models being described using CSG, the
    storage format generally uses closed meshes with BREP rules for
    collision detection.

    Well, as opposed to Quake where everything had used bounding boxes
    (well, though player and mob collisions had used bounding boxes, and
    generally BT3 had used Minecraft-like behavior where entities are
    essentially non-solid but push away from each other).

    The BT1 engine also had rigid body physics (like HL2 or Doom3), but
    neither the BT2 nor BT3 engine had used this. Could bring it back, but
    there are surprisingly few "actually useful" cases for rigid body
    physics in a 3D engine (well, unless the game is HL2 or Portal, which
    used it as a gameplay element).

    Well, and then there is also soft-body physics, but there are very few
    use cases outside of using it for cosmetic effects.

    Well, and for player and entity movement and similar, it is pretty hard
    to beat the naive option of just having entities with AABBs or similar
    sliding around the scene (player is effectively shaped like a sliding refrigerator, but no one really notices).

    ...



  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Apr 26 07:30:13 2026
    From Newsgroup: comp.arch

    BGB <cr88192@gmail.com> schrieb:

    Stylistically, I think the 6502 ASM notation was influenced by Motorola (though differs somewhat from M68K notation).

    Which is not a surprise because the 6502 was developed
    by people from the 6800 (not to be confused with the
    68000 :-) development team at Motorola who had
    formed their own company. Compare a die shot of the 6800
    https://happytrees.org/dieshots/Motorola_-_6800#/media/File:Motorola_6800_die.jpg
    with one of the 6502
    https://commons.wikimedia.org/wiki/File:MOS_6502_die.jpg to see
    the similarity in overall layout: PLA at the top, random logic
    in the middle, ALU and registers at the bottom.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From quadi@quadibloc@ca.invalid to comp.arch on Sun Apr 26 14:53:09 2026
    From Newsgroup: comp.arch

    On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

    Which is not a surprise because the 6502 was developed by people from
    the 6800 (not to be confused with the 68000 :-) development team at
    Motorola who had formed their own company.

    Unless you mean Freescale, which was spun off by Motorola itself, I don't
    know what you're referring to. A web search did not turn anything up about 68000 engineers leaving Motorola and founding their own startup.

    John Savard
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sun Apr 26 17:07:18 2026
    From Newsgroup: comp.arch

    On 26/04/2026 16:53, quadi wrote:
    On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

    Which is not a surprise because the 6502 was developed by people from
    the 6800 (not to be confused with the 68000 :-) development team at
    Motorola who had formed their own company.

    Unless you mean Freescale, which was spun off by Motorola itself, I don't know what you're referring to. A web search did not turn anything up about 68000 engineers leaving Motorola and founding their own startup.


    According to Wikipedia (I have no independent reference) the 6502 design
    team at MOS Technology had previously worked at Motorola on the 6800.

    <https://en.wikipedia.org/wiki/MOS_Technology_6502>


  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Apr 26 18:12:54 2026
    From Newsgroup: comp.arch

    On Sun, 26 Apr 2026 14:53:09 -0000 (UTC)
    quadi <quadibloc@ca.invalid> wrote:

    On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

    Which is not a surprise because the 6502 was developed by people
    from the 6800 (not to be confused with the 68000 :-)
    development team at Motorola who had formed their own company.

    Unless you mean Freescale, which was spun off by Motorola itself, I
    don't know what you're referring to. A web search did not turn
    anything up about 68000 engineers leaving Motorola and founding their
    own startup.

    John Savard

    Read again.

  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Apr 26 11:49:24 2026
    From Newsgroup: comp.arch

    On 2026-Apr-26 11:12, Michael S wrote:
    On Sun, 26 Apr 2026 14:53:09 -0000 (UTC)
    quadi <quadibloc@ca.invalid> wrote:

    On Sun, 26 Apr 2026 07:30:13 +0000, Thomas Koenig wrote:

    Which is not a surprise because the 6502 was developed by people
    from the 6800 (not to be confused with the 68000 :-)
    development team at Motorola who had formed their own company.

    Unless you mean Freescale, which was spun off by Motorola itself, I
    don't know what you're referring to. A web search did not turn
    anything up about 68000 engineers leaving Motorola and founding their
    own startup.

    John Savard

    Read again.

    I made the same parsing error.
    The smiley face can be confusing as it is also used as a close
    parenthesis to the qualifier "not to be confused with 68000"
    phrase, making it visually appear the qualifier continues into the
    "development team at Motorola who had formed their own company".

    Rephrasing:
    "Which is not a surprise because the 6502 was developed by people
    from the 6800 development team at Motorola who had formed their
    own company (not to be confused with the 68000 :-)"


  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 26 18:28:50 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 4/24/2026 9:53 AM, Robert Finch wrote:
    ------------------

    Hrrm:
    char float //FP8
    short float //Binary16
    float //Binary32
    long float //48-bit
    short double //48-bit (truncated Binary64, align=2)
    double //Binary64
    short long double //96-bit (truncated Binary128, align=4)
    long double //Binary128
    long long float //192-bit (truncated Binary256, align=8)
    long long double //Binary256

    Reminds me of Jumbo Shrimp:: is it a big Shrimp or a little Jumbo ??
  • From quadi@quadibloc@ca.invalid to comp.arch on Sun Apr 26 20:31:42 2026
    From Newsgroup: comp.arch

    On Sun, 26 Apr 2026 17:07:18 +0200, David Brown wrote:

    According to Wikipedia (I have no independent reference) the 6502 design
    team at MOS Technology had previously worked at Motorola on the 6800.

    I knew that, but apparently I misread the sentence, as it seemed to me to
    be stating that members of the 68000 design team _also_ went somewhere
    else.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Sun Apr 26 20:41:04 2026
    From Newsgroup: comp.arch

    On Sat, 25 Apr 2026 21:56:22 +0000, quadi wrote:
    On Sat, 25 Apr 2026 18:00:22 +0000, MitchAlsup wrote:
    BGB <cr88192@gmail.com> posted:

    It likely depends on which "tradition" one is coming from.

    IBM 360, 1963.

    You're early. The IBM System/360 was announced on April 7, 1964.

    I decided to add another header type to the architecture. This led to the diagram of the header formats being the file block360.gif, and to it being
    704 pixels high.

    That seemed to me to be a good omen. (The IBM 704 was a vacuum-tube
    computer, IBM's first with core memory, hardware floating-point, and the
    first FORTRAN compiler was developed for it, so arguably it ushered in the "modern" era of computing, as opposed to the "pioneer" era.)

    But I don't let superstition rule me. It was apparent to me that the list
    of block formats had become really complicated - and that it wasn't doing
    what I wanted it to do.

    But the new block format I had _previously_ added showed me... that there
    was something that I really wanted to do that would _not_ have the unacceptable cost of 50% of the opcode space. Since I couldn't indicate whether the last instruction of the block ended there, or continued on,
    with only one bit corresponding to each 16-bit area, _this_ type of block header would have to have the restriction that instructions can't cross
    block boundaries.

    Which means the first 16-bit area following the 16-bit header must begin
    an instruction.

    Therefore, I only need 14 bits, not 15 bits.
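A small sketch of the counting argument, under assumptions the post does not spell out: a block of sixteen 16-bit areas (one header plus fifteen payload areas), and a header bit set to 1 meaning "this area begins an instruction". If that reading is right, 15 flag bits in a 16-bit header would leave only one bit to tell the header apart from ordinary instructions, which is the 50% cost mentioned above. Since the first payload area always begins an instruction, its flag can be dropped, leaving 14. The function name and bit order are hypothetical.

```python
AREAS_PER_BLOCK = 16                               # assumption: 16 areas of 16 bits
HEADER_AREAS = 1
PAYLOAD_AREAS = AREAS_PER_BLOCK - HEADER_AREAS     # 15 areas after the header
START_BITS = PAYLOAD_AREAS - 1                     # first area is implied: 14 bits


def instruction_spans(start_bits):
    """Turn a 14-bit 'begins an instruction' field into (first_area, length) pairs.

    Bit 0 of start_bits describes payload area 1, bit 13 describes area 14.
    Payload area 0 always begins an instruction, and the last instruction ends
    at the block boundary because nothing may cross into the next block.
    """
    if not 0 <= start_bits < (1 << START_BITS):
        raise ValueError("expected a 14-bit field")
    starts = [0]
    for area in range(1, PAYLOAD_AREAS):
        if (start_bits >> (area - 1)) & 1:
            starts.append(area)
    ends = starts[1:] + [PAYLOAD_AREAS]
    return [(s, e - s) for s, e in zip(starts, ends)]
```

For example, a field of all ones describes fifteen 16-bit instructions, while a field with only bit 1 set describes one 32-bit instruction followed by a single instruction spanning the remaining thirteen areas (an encoding a real decoder would presumably reject).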

    I just have to give up the paired 15-bit instructions, which nobody seems
    to like anyways, as part of the basic block-independent instruction set.

    Maybe, maybe, maybe, I am close to finding happiness, and can move on from
    the preliminary design phase to fleshing out the ISA... but based on past experience, when that does actually happen, it will be a pleasant
    *surprise*.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 00:34:13 2026
    From Newsgroup: comp.arch

    On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

    Which means the first 16-bit area following the 16-bit header must begin
    an instruction.

    Therefore, I only need 14 bits, not 15 bits.

    I just have to give up the paired 15-bit instructions, which nobody
    seems to like anyways, as part of the basic block-independent
    instruction set.

    I have now updated the pages on the Concertina II architecture to reflect
    this latest change.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 01:21:36 2026
    From Newsgroup: comp.arch

    On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

    Maybe, maybe, maybe, I am close to finding happiness, and can move on
    from the preliminary design phase to fleshing out the ISA... but based
    on past experience, when that does actually happen, it will be a
    pleasant *surprise*.

    And, if I haven't noted it already, the reason that I can even begin to entertain the delusion that I am making some sort of progress, rather than just going around in circles, as all appearances would indicate, is
    because in these last few weeks, I feel I have achieved my goal of
    squeezing a very large instruction set into limited opcode space to a
    greater degree than I had even hoped for previously.

    So I am in a position to be more than satisfied with the extent of my
    progress towards this fundamental goal of the Concertina II architecture.
    All I need is to smooth down a few rough edges, so I can be satisfied that
    the architecture is not too extravagantly ugly and inelegant (some degree
    of ugliness and inelegancy, of course, must be tolerated as a necessary
    means to achieve my unreasonably ambitious goals) and so on, and then I
    should be able to move on to the next phase.

    John Savard
  • From David Brown@david.brown@hesbynett.no to comp.arch on Mon Apr 27 08:14:04 2026
    From Newsgroup: comp.arch

    On 26/04/2026 22:31, quadi wrote:
    On Sun, 26 Apr 2026 17:07:18 +0200, David Brown wrote:

    According to Wikipedia (I have no independent reference) the 6502 design
    team at MOS Technology had previously worked at Motorola on the 6800.

    I knew that, but apparently I misread the sentence, as it seemed to me to
    be stating that members of the 68000 design team _also_ went somewhere
    else.


    I can see how you could have made that misreading. But I'm glad you did
    - it made me look up the history of the 6502 and learn a little more.
    It's a device I used a lot as a kid in the BBC Micro, and one of the
    first processors I programmed in assembly, so it has nostalgic interest
    for me.

  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sun Apr 26 23:43:48 2026
    From Newsgroup: comp.arch

    On 4/26/2026 6:21 PM, quadi wrote:
    On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

    Maybe, maybe, maybe, I am close to finding happiness, and can move on
    from the preliminary design phase to fleshing out the ISA... but based
    on past experience, when that does actually happen, it will be a
    pleasant *surprise*.

    And, if I haven't noted it already, the reason that I can even begin to entertain the delusion that I am making some sort of progress, rather than just going around in circles, as all appearances would indicate, is
    because in these last few weeks, I feel I have achieved my goal of
    squeezing a very large instruction set into limited opcode space to a
    greater degree than I had even hoped for previously.

    I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
    it! :-(
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 09:06:25 2026
    From Newsgroup: comp.arch

    On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:

    I don't think anyone else designing a CPU had the goal of "a very large instruction set". But if that was your goal, I think you have achieved
    it! :-(

    Well, nobody else may have had it as a _goal_. But others have certainly
    also _achieved_ that goal, even if they had never set it. The IBM
    System/360 and its descendants are a case in point.

    My intention was not to have a ridiculously large instruction set that is
    not comparable to those of existing computers; instead, it is to have one
    that is perhaps a bit larger than any existing computer, because it
    combines certain things from more than one architecture.

    Specifically:

    - like the 68000 and the x86, memory-reference instructions are to have
    16-bit displacements.
    - like most RISC architectures, the register banks will include 32
    registers each. There will be no register that always contains zero, but register zero can appear to be zero for specific purposes such as indexing.
    - there will be full base-index addressing, like on the System/360.
    - the instruction set will combine the capabilities of the System/360 and
    the Cray I.

    And that was to be done, as far as possible, without making the
    instructions involved longer than their counterparts on the IBM
    System/360. That goal was not _strictly_ met, as it was an impossible
    goal, but it was approached. 16-bit short instructions are limited in
    capability compared to their counterparts on the 360; to fully equal
    them, 18-bit instructions, which require the overhead of a block header,
    are needed. Also, to fully equal the 48-bit string and packed decimal
    instructions of the 360, 64-bit instructions are required.

    A large instruction set, however, was a _requirement_, not a goal.
    Squeezing it into the available opcode space provided by not allowing the
    size of instructions to explode, making code significantly less compact
    than on the 360... *that* was the goal.

    I hope this has clarified my design philosophy.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 10:54:08 2026
    From Newsgroup: comp.arch

    On Mon, 27 Apr 2026 00:34:13 +0000, quadi wrote:
    On Sun, 26 Apr 2026 20:41:04 +0000, quadi wrote:

    Which means the first 16-bit area following the 16-bit header must
    begin an instruction.

    Therefore, I only need 14 bits, not 15 bits.

    I just have to give up the paired 15-bit instructions, which nobody
    seems to like anyways, as part of the basic block-independent
    instruction set.

    I have now updated the pages on the Concertina II architecture to
    reflect this latest change.

    Another side effect of this change, though, was that while I gained a
    16-bit header for variable-length instructions, I lost the 32-bit header
    for variable-length instructions that allowed the 17-bit short
    instructions.

    I have figured out a reasonable way to bring that 32-bit header back at
    what I felt was an acceptable cost, so the pages have now been revised to include that latest change.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Mon Apr 27 17:16:24 2026
    From Newsgroup: comp.arch

    According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
    I don't think anyone else designing a CPU had the goal of "a very large
    instruction set". But if that was your goal, I think you have achieved
    it! :-(

    Oh, I dunno. Look at the IBM 7030 STRETCH.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon Apr 27 19:27:05 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:

    I don't think anyone else designing a CPU had the goal of "a very large
    instruction set". But if that was your goal, I think you have achieved
    it! :-(

    Well, nobody else may have had it as a _goal_. But others have
    certainly also _achieved_ that goal, even if they had never set it. The
    IBM System/360 and its descendants are a case in point.

    My intention was not to have a ridiculously large instruction set that
    is not comparable to those of existing computers; instead, it is to
    have one that is perhaps a bit larger than any existing computer,
    because it combines certain things from more than one architecture.

    Specifically:

    - like the 68000 and the x86, memory-reference instructions are to have
    16-bit displacements.
    - like most RISC architectures, the register banks will include 32
    registers each. There will be no register that always contains zero,
    but register zero can appear to be zero for specific purposes such as
    indexing.
    - there will be full base-index addressing, like on the System/360.
    - the instruction set will combine the capabilities of the System/360
    and the Cray I.

    You might want to skip up to CRAY-XMP because they got the
    scatter/gather memory reference instructions.

    And that was to be done, as far as possible, without making the
    instructions involved longer than their counterparts on the IBM
    System/360. That goal was not _strictly_ met, as it was an impossible
    goal, but it was approached. 16-bit short instructions are limited in
    capability compared to their counterparts on the 360; to fully equal
    them, 18-bit instructions, which require the overhead of a block
    header, are needed. Also, to fully equal the 48-bit string and packed
    decimal instructions of the 360, 64-bit instructions are required.

    A large instruction set, however, was a _requirement_, not a goal.
    Squeezing it into the opcode space that remains available when the size
    of instructions is not allowed to explode, which would make code
    significantly less compact than on the 360... *that* was the goal.

    I hope this has clarified my design philosophy.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 27 23:55:14 2026
    From Newsgroup: comp.arch

    On Mon, 27 Apr 2026 19:27:05 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:

    - the instruction set will combine the capabilities of the System/360
    and the Cray I.

    You might want to skip up to CRAY-XMP because they got the
    scatter/gather memory reference instructions.

    Actually, I might indeed, but at this stage I wasn't feeling the need to
    be too specific.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue Apr 28 02:33:03 2026
    From Newsgroup: comp.arch

    On Mon, 27 Apr 2026 10:54:08 +0000, quadi wrote:

    I have figured out a reasonable way to bring that 32-bit header back at
    what I felt was an acceptable cost, so the pages have now been revised
    to include that latest change.

    I have made another change to the Concertina II ISA basic design.

    This time, I thought of a nifty feature that ought to be added. So I made
    this change with some trepidation, since I was worried trying to do this
    would make the whole thing fall apart.

    But it turned out that I did have the opcode space for headers which
    allowed me to easily insert the addition, and other things also worked
    well enough that it was simple to make the addition without causing
    problems for other parts of the architecture.

    The addition? In the header that provides VLIW features (there is now
    only _one_ such header, as I have eliminated the ability to associate
    VLIW features with the variable-length instructions), it is now
    possible to indicate the use of an alternate instruction set.

    This alternate instruction set is the regular instruction set with two
    modifications. The paired short instructions now take up 50% of the
    opcode space instead of 25%. So there are no restrictions on which
    registers may be used; but one instruction in a pair is always integer,
    and the other is floating-point. In order to make room for that, the
    load/store memory-reference instructions now only act on aligned
    operands when the alternate instruction set is specified.
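
The 25% and 50% figures follow from how many leading bits of an instruction word are spent identifying a pair of short instructions. A minimal sketch, assuming a 32-bit instruction word and prefix lengths of two bits and one bit (both inferred from the percentages, not stated in the post):

```python
from fractions import Fraction

WORD_BITS = 32  # assumed instruction word size


def prefix_share(prefix_bits: int) -> Fraction:
    """Fraction of the opcode space claimed by a fixed prefix of this length."""
    return Fraction(1, 2 ** prefix_bits)


def payload_bits(prefix_bits: int) -> int:
    """Bits left in the word for the two short instructions."""
    return WORD_BITS - prefix_bits


for bits in (2, 1):
    print(f"{bits}-bit prefix: {float(prefix_share(bits)):.0%} of opcode space, "
          f"{payload_bits(bits)} bits for the pair")
```

With a two-bit prefix the remaining 30 bits hold exactly two 15-bit instructions; halving the prefix frees one more bit for the pair, at the cost of doubling its share of the opcode space.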

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Apr 27 22:08:12 2026
    From Newsgroup: comp.arch

    On 4/27/2026 10:16 AM, John Levine wrote:
    According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
    I don't think anyone else designing a CPU had the goal of "a very large
    instruction set". But if that was your goal, I think you have achieved
    it! :-(

    Oh, I dunno. Look at the IBM 7030 STRETCH.

    Thanks for the pointer, John. I did look at it a little. It seems like
    a wild machine! Caveat - my comments below are not from an exhaustive
    study of the ISA.

    But the big difference between the 7030 and John's system is that for
    the 7030, the huge multiplicity is in the number of data formats
    supported, not the instructions. The 7030 has only two instruction
    lengths, 32 and 64 bits, and, as far as I can tell, no instruction
    headers for blocks of instructions. And the complexity of different
    data formats seems put in various "modifier" bits in the instruction,
    not in the op code. The 7030 uses the same "trick" that Mitch uses,
    eliminating the subtract op code by having sign modifier bits in the
    non op code part of the instruction.

    So, while I certainly haven't done the count, I suspect John's ISA has
    far more op codes (especially if you count the same operation in
    different length instructions as different) than the 7030.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue Apr 28 14:18:30 2026
    From Newsgroup: comp.arch

    On Mon, 27 Apr 2026 22:08:12 -0700, Stephen Fuld wrote:

    But the big difference between the 7030 and John's system is that for
    the 7030, the huge multiplicity is in the number of data formats
    supported, not the instructions. The 7030 has only two instruction
    lengths, 32 and 64 bits, and, as far as I can tell, no instruction
    headers for blocks of instructions. And the complexity of different
    data formats seems put in various "modifier" bits in the instruction,
    not in the op code.

    Yes; in fairness to the IBM 7030, in the IBM 360 one doesn't count the
    MVC instruction as 256 different instructions.
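
The figure of 256 comes from the 8-bit length field of the System/360 SS instruction format, which stores the operand length minus one. A small illustrative decoder, with the function name `decode_mvc` and the sample bytes being hypothetical choices for this sketch:

```python
MVC_OPCODE = 0xD2  # System/360 MVC, SS format (6 bytes)


def decode_mvc(insn: bytes) -> dict:
    """Split an MVC instruction into its length and base/displacement fields."""
    if len(insn) != 6 or insn[0] != MVC_OPCODE:
        raise ValueError("not a 6-byte MVC instruction")
    first = int.from_bytes(insn[2:4], "big")   # B1 (4 bits) + D1 (12 bits)
    second = int.from_bytes(insn[4:6], "big")  # B2 (4 bits) + D2 (12 bits)
    return {
        "length": insn[1] + 1,  # the field holds length - 1, so 1..256 bytes
        "b1": first >> 12, "d1": first & 0xFFF,
        "b2": second >> 12, "d2": second & 0xFFF,
    }


# Sample encoding of MVC 0(8,1),16(2): move 8 bytes.
fields = decode_mvc(bytes.fromhex("D20710002010"))
print(fields)
```

All 256 length values share one opcode, which is why MVC counts as a single instruction.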

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue Apr 28 17:48:09 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Mon, 27 Apr 2026 22:08:12 -0700, Stephen Fuld wrote:

    But the big difference between the 7030 and John's system is that for
    the 7030, the huge multiplicity is in the number of data formats
    supported, not the instructions. The 7030 has only two instruction
    lengths, 32 and 64 bits, and, as far as I can tell, no instruction
    headers for blocks of instructions. And the complexity of different
    data formats seems put in various "modifier" bits in the instruction,
    not in the op code.

    Yes; in fairness to the IBM 7030, in the IBM 360 one doesn't count the
    MVC instruction as 256 different instructions.

    Or LDM as up to 256 instructions, either.
    STM, ...

    And how many instructions should one attribute to SIO ??

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue Apr 28 21:04:36 2026
    From Newsgroup: comp.arch

    On Tue, 28 Apr 2026 02:33:03 +0000, quadi wrote:

    The addition? In the header that provides VLIW features (there is now
    only _one_ such header, as I have eliminated the ability to associate
    VLIW features with the variable-length instructions), it is now
    possible to indicate the use of an alternate instruction set.

    For clarification: I reduced the number of headers which provide VLIW
    features to only one somewhat earlier, not at the same time as the
    alternate instruction set was added.

    Now I have made another change - there was enough opcode space available
    for headers that I was able to make the alternate instruction set
    available with the two other types of header that do not indicate
    variable-length instructions.

    I am in no rush to add to the alternate instruction set special
    additional instructions that start with 0110, 01110, and 011110, all
    that opcode space being free in the alternate instruction set, but it's
    good to know that such reserve capacity is conveniently available.
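
The size of that reserve can be worked out from the three prefixes alone. The sketch below checks that none of them is a prefix of another, so they mark disjoint regions, and sums the share of the opcode space each one covers; only the prefix strings come from the post, the helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

FREE_PREFIXES = ["0110", "01110", "011110"]  # taken from the post


def is_prefix_free(codes) -> bool:
    """True if no code is a leading substring of a different code."""
    return not any(b.startswith(a) for a, b in permutations(codes, 2))


def share(prefix: str) -> Fraction:
    """Fraction of the opcode space selected by this bit prefix."""
    return Fraction(1, 2 ** len(prefix))


total = sum(share(p) for p in FREE_PREFIXES)
print(is_prefix_free(FREE_PREFIXES), total, f"{float(total):.1%}")
```

The three regions together amount to 1/16 + 1/32 + 1/64 = 7/64 of the opcode space, a little under 11%.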

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Wed Apr 29 17:36:46 2026
    From Newsgroup: comp.arch

    When I changed the instruction set so that I could have variable-length
    instructions in a block - with some restrictions - with a header that
    was only 16 bits long, I gave up one important thing.

    The paired 15-bit short instructions, while they could still be used in
    instruction slots other than the first one without a header, couldn't
    be used in the first one. So neither they, nor any other version of
    short instructions, could be used in code that ignored block boundaries.

    Well, I have now dropped one header type and added in a limited and
    restricted form of paired short instructions that _can_ always be used in
    any instruction slot, ignoring block boundaries.

    Oops, it can only be used in the first slot because it conflicts with
    the paired 15-bit short instructions! What will be the best way to fix
    that...

    John Savard

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Wed Apr 29 21:08:57 2026
    From Newsgroup: comp.arch

    On Wed, 29 Apr 2026 17:36:46 +0000, quadi wrote:

    Oops, it can only be used in the first slot because it conflicts with
    the paired 15-bit short instructions! What will be the best way to fix
    that...

    Finding no satisfactory way to make a paired short instruction that
    always fits, I first corrected my addition to apply only in 32-bit
    mode, where there is no collision. Then I removed the addition
    entirely, without restoring what I had subtracted: _that_ was
    unimportant enough that the large area of opcode space for headers
    which removing it freed up ought instead to be left for something more
    badly needed, when such a thing is found.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Mon Apr 27 16:29:34 2026
    From Newsgroup: comp.arch

    quadi [2026-04-27 09:06:25] wrote:
    On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:
    I don't think anyone else designing a CPU had the goal of "a very large
    instruction set". But if that was your goal, I think you have achieved
    it! :-(
    Well, nobody else may have had it as a _goal_. But others have
    certainly also _achieved_ that goal, even if they had never set it. The
    IBM System/360 and its descendants are a case in point.

    Arguably the Itanic was designed with such a goal as well where the
    "size" was measured as a kind of "patentability".


    === Stefan
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Apr 30 10:41:01 2026
    From Newsgroup: comp.arch

    On Mon, 27 Apr 2026 16:29:34 -0400
    Stefan Monnier <monnier@iro.umontreal.ca> wrote:

    quadi [2026-04-27 09:06:25] wrote:
    On Sun, 26 Apr 2026 23:43:48 -0700, Stephen Fuld wrote:
    I don't think anyone else designing a CPU had the goal of "a very
    large instruction set". But if that was your goal, I think you
    have achieved it! :-(
    Well, nobody else may have had it as a _goal_. But others have
    certainly also _achieved_ that goal, even if they had never set it.
    The IBM System/360 and its descendants are a case in point.

    Arguably the Itanic was designed with such a goal as well where the
    "size" was measured as a kind of "patentability".


    === Stefan

    But measured by a more conventional definition of size, the IPF
    instruction set is pretty small, esp. when compared to iAMD64 or ARM64.
    I'd think that even POWER is at least twice as big as IPF.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Apr 30 16:41:35 2026
    From Newsgroup: comp.arch

    Based on the ways I found to squeeze in more opcode space, I've decided
    to go back to an earlier version because I believe I can now achieve
    what I had been unable to do before: have both the four-bit prefix
    portion of a block header for variable length instructions, and allow
    the original 15-bit paired short instructions in any instruction slot.
    The primary step required to achieve this was to determine that, while
    most compromises for the Load Address instruction were not acceptable,
    _one_ compromise _was_ acceptable, letting me grab the opcode space I
    needed.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Apr 30 22:59:18 2026
    From Newsgroup: comp.arch

    Now, finally, I have been able to combine:

    - Basic memory-reference instructions without some strange restriction
    on addressing modes,
    - Paired 15-bit short instructions that are able to appear in any slot,
    - A 32-bit header for variable-length instructions that has two prefix
    bits for every 16-bit slot remaining.

    It has taken a long time to find just the right way to squish everything
    in.
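
The bit budget of such a header can be sketched as follows, assuming a 256-bit block of sixteen 16-bit slots; the block size is an assumption, not something stated in this post.

```python
# Hypothetical bit budget for a 32-bit variable-length-instruction header.
BLOCK_BITS = 256           # assumed block size
SLOT_BITS = 16
HEADER_BITS = 32
PREFIX_BITS_PER_SLOT = 2   # from the post: two prefix bits per remaining slot

# Slots left in the block once the header has taken its two slots.
slots_remaining = (BLOCK_BITS - HEADER_BITS) // SLOT_BITS
# Header bits consumed by the per-slot prefixes.
prefix_field_bits = slots_remaining * PREFIX_BITS_PER_SLOT
# Whatever is left must identify the header itself.
header_opcode_bits = HEADER_BITS - prefix_field_bits
print(slots_remaining, prefix_field_bits, header_opcode_bits)
```

Under that assumption, 14 slots need 28 prefix bits, which leaves four bits to identify the header, consistent with the four-bit prefix portion of the header mentioned earlier in the thread.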

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2