• Random Idea: Densely Packed Ternary

    From BGB@cr88192@gmail.com to comp.arch on Thu Mar 19 01:59:30 2026
    From Newsgroup: comp.arch

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).

    Likely mapping here would map 0/1/-1 as 0/1/2

    Encoding scheme here is partly inspired by Chen-Ho, but this is not the
    only possibility here.

    The idea here is to try to "factor" things such that decoding can look
    at fewer than the full 7/8 bits to decode each trit.


    So, first idea, 4 trits in 7 bits:
    000abcd //abcd: all 0 or 1
    001xxxx //abcd=2 (well, put this somewhere in here)
    01yyxxx //-
    1000bcd //bcd=0|1, a=2
    1001acd //acd=0|1, b=2
    1010abd //abd=0|1, c=2
    1011abc //abc=0|1, d=2

    11000ab //cd=2
    11001ac //bd=2
    11010bc //ad=2
    11011bd //ac=2
    11100cd //ab=2
    11101ad //bc=2
    11110xx //[a/b/c/d]=0, default=2
    11111xx //[a/b/c/d]=1, default=2



    There is enough encoding space in theory, but I am having a harder time
    coming up with a good packing scheme for the 5 trit case (space is
    tighter, and the 5 factor is also being annoying).

    Though, one possible case could be:
    Most cases:
    Assume e=0|1, so just stick on another 'e' bit.
    So, leverage the 7-bit scheme as the starter.
    Then shove the 'e' trit into the remaining spaces.


    Some other cases:
    000abcde //abcde: all 0 or 1

    00100xx0 //abcde=2
    00101xx0 //-

    00110xx0 //11110xx, e=2
    00111xx0 //11111xx, e=2

    001abcd1 //abcd=0|1, e=2

    0100xxx0 //1000xxx, e=2
    0100xxx1 //1001xxx, e=2
    0101xxx0 //1010xxx, e=2
    0101xxx1 //1011xxx, e=2

    01100xx0 //11100xx, e=2
    01100xx1 //11101xx, e=2

    01110xx0 //11000xx, e=2
    01110xx1 //11010xx, e=2
    01111xx0 //11001xx, e=2
    01111xx1 //11011xx, e=2

    1xxxxxxx //punch through to 7b case, e=0|1


    But, unclear if this is the best strategy...



    Could partly allow the 4 and 5 trit cases to share some of the same
    lookup tables, but may not benefit much in the face of "combinatorial
    logic evilness" (if you MUX on the input side, then synthesis tends to
    clone all the logic).

    One option could be to put the 'e' bit on the MSB side, but this breaks
    the pattern. One other option could be to always decode the 7 bit case
    as-if it were the 8 bit case, but then ignore the 'e' trit in cases
    where only 4 trits are expected.

    Another option being to decode the 7 bit case the same as the 8-bit
    case, just the LSB is ignored (as is the 'e' trit of the output)


    Though, with any sort of DPT scheme, a lot would depend on how it is
    intended to be unpacked and used.



    Main use case I can think of ATM are mostly things like multiplying SIMD vectors by ternary values (say, to mask or sign-invert elements).

    Say, for example:
    dv = sv * (__vec4f_t) { 1.0, 1.0, -1.0, 0.0 };

    But, maybe other use-cases could exist.

    But, in normal code, even SIMD code, this sort of thing may not be
    common enough to justify a special instruction for it (vs, say, loading
    the vector into a register and doing a normal SIMD multiply).


    Granted, even in the absence of a solid use-case, or explicit CPU
    support, could still use normal RAM-based lookup tables.

    Though, the relative merit of a more complex encoding scheme is
    diminished in the case of normal RAM based lookup tables (it doesn't
    hurt per-se, but does require more complex logic to initialize the
    lookup tables, so this could arguably be a drawback).



    But, any thoughts?...

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Mar 19 11:51:50 2026
    From Newsgroup: comp.arch

    On Thu, 19 Mar 2026 01:59:30 -0500
    BGB <cr88192@gmail.com> wrote:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).


    That's not very interesting.
    It's obvious that 5t->8b is best and there are multiple ways of doing it cheaply.
    There is no denser T->B packing until 17t->27b, which is obviously
    impractical.

    An opposite is a little more interesting, i.e. packing binary data to
    ternary storage.
    There could be two separate challenges: do it in the way that is easy
    to implement on binary logic or do it in a way that is easy to
    implement in ternary logic. Not that I know which basic ops are easy in
    ternary logic.
    The most likely candidate is 11b->7t, but figuring out details can be interesting for somebody with a lot of spare time at hand.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Thu Mar 19 05:39:57 2026
    From Newsgroup: comp.arch

    On 3/19/2026 4:51 AM, Michael S wrote:
    On Thu, 19 Mar 2026 01:59:30 -0500
    BGB <cr88192@gmail.com> wrote:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).


    That's not very interesting.
    It's obvious that 5t->8b is best and there are multiple ways of doing it cheaply.
    There is no denser T->B packing until 17t->27b, which is obviously impractical.


    Granted...


    An opposite is a little more interesting, i.e. packing binary data to
    ternary storage.
    There could be two separate challenges: do it in the way that is easy
    to implement on binary logic or do it in a way that is easy to
    implement in ternary logic. Not that I know which basic ops are easy in ternary logic.
    The most likely candidate is 11b->7t, but figuring out details can be interesting for somebody with a lot of spare time at hand.


    Possible, though I realized after doing a mock-up that my original idea
    failed at one of its primary goals:
    To be significantly cheaper to decode in hardware than the naive
    polynomial...

    Or, naive: enc=va+vb*3+vc*9+vd*27

    It failed in that it would still require looking at all 7 bits of the
    4t/7b case to determine the value of each trit.


    I was then fiddling with other schemes, to hopefully find one that
    allows for cheaper decoding (namely, requiring at most 5 or 6 bits to be examined to determine the value of any given trit).

    But, at the moment, I haven't figured out a good encoding scheme that
    could achieve this goal...


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Thu Mar 19 12:46:03 2026
    From Newsgroup: comp.arch

    On 3/19/2026 5:39 AM, BGB wrote:
    On 3/19/2026 4:51 AM, Michael S wrote:
    On Thu, 19 Mar 2026 01:59:30 -0500
    BGB <cr88192@gmail.com> wrote:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).


    That's not very interesting.
    It's obvious that 5t->8b is best and there are multiple ways of doing it
    cheaply.
    There is no denser T->B packing until 17t->27b, which is obviously
    impractical.


    Granted...


    An opposite is a little more interesting, i.e. packing binary data to
    ternary storage.
    There could be two separate challenges: do it in the way that is easy
    to implement on binary logic or do it in a way that is easy to
    implement in ternary logic. Not that I know which basic ops are easy in
    ternary logic.
    The most likely candidate is 11b->7t, but figuring out details can be
    interesting for somebody with a lot of spare time at hand.


    Possible, though I realized after doing a mock-up that my original idea failed at one of its primary goals:
    To be significantly cheaper to decode in hardware than the naive polynomial...

    Or, naive: enc=va+vb*3+vc*9+vd*27

    It failed in that it would still require looking at all 7 bits of the
    4t/7b case to determine the value of each trit.


    I was then fiddling with other schemes, to hopefully find one that
    allows for cheaper decoding (namely, requiring at most 5 or 6 bits to be examined to determine the value of any given trit).

    But, at the moment, I haven't figured out a good encoding scheme that
    could achieve this goal...


    Next scheme:
    00ab0cd //abcd: all 0 or 1
    00ab10c //abc2
    00ab11d //ab2d

    010a0cd //a2cd
    011b0cd //2bcd
    010a10c //a2c2
    011b10c //2bc2

    010a11d //a22d
    011b11d //2b2d

    10xx0xx //-

    10ab10x //ab22
    110x0cd //22cd

    100a11x //a222
    101b11x //2b22
    111x00c //22c2
    111x01d //222d
    111x11x //2222

    The 7b/4t trace able to decode each trit using 4b of lookup.

    The 5t case would be encoded in a similar way to before:
    All the defined 4t cases punch through to having an extra 'e' bit (0/1);
    The undefined cases in this scheme punch through to the "e=2" cases.

    00ab0cde //abcde
    10ab0cd0 //abcd2

    10ab100e //ab22e
    11000cde //22cde
    10ab1010 //ab222
    11010cd0 //22cd2

    100a110e //a222e
    101b110e //2b22e
    100a1110 //a2222
    101b1110 //2b222

    111000ce //22c2e
    111001de //222de
    111100c0 //22c22
    111101d0 //222d2

    111011xe //2222e
    111111x0 //22222

    Which still needs 4b lookup to decode a/b/c/d, but 6b for 'e'.

    ...


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Mar 19 22:30:10 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).

    Likely mapping here would map 0/1/-1 as 0/1/2

    Consider binary NAND gate, then
    consider binary NOR gate with inverted inputs !

    All you are doing is assigning a value to a voltage.

    -----------------------------
    But, any thoughts?...

    never going to fly so I waste no time on assembling a failure.
    smatter people than me discarded it long ago in the age of tubes
    where it was "not that hard" to build trit-gates and storage.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Thu Mar 19 19:17:08 2026
    From Newsgroup: comp.arch

    On 2026-03-19 6:30 p.m., MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).

    Likely mapping here would map 0/1/-1 as 0/1/2

    Consider binary NAND gate, then
    consider binary NOR gate with inverted inputs !

    All you are doing is assigning a value to a voltage.

    -----------------------------
    But, any thoughts?...

    never going to fly so I waste no time on assembling a failure.
    smatter people than me discarded it long ago in the age of tubes
    where it was "not that hard" to build trit-gates and storage.

    Now there are quantum computers and I think they are four-state.
    So maybe they could handle ternary.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Mar 20 00:54:39 2026
    From Newsgroup: comp.arch

    On 3/19/2026 6:17 PM, Robert Finch wrote:
    On 2026-03-19 6:30 p.m., MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).

    Likely mapping here would map 0/1/-1 as 0/1/2

    Consider binary NAND gate, then
    consider binary NOR  gate with inverted inputs !

    All you are doing is assigning a value to a voltage.

    -----------------------------
    But, any thoughts?...

    never going to fly so I waste no time on assembling a failure.
    smatter people than me discarded it long ago in the age of tubes
    where it was "not that hard" to build trit-gates and storage.

    Now there are quantum computers and I think they are four-state.
    So maybe they could handle ternary.

    FWIW:
    Storing 4 states as bits is a lot easier than storing 3 states as bits
    (since each 4-state can be stored as 2 bits).

    Granted, one can store trits as 2-bits each, but this costs more bits.




    But, yeah, my post from earlier was missing some states, and fitting
    these last states in (for the 5t/8b case) in a way that doesn't wreck lookup-size is proving a little more difficult.

    But, yeah, seems if I can't fully crack this problem, this leaves the
    use of a lookup table as the more practical option.

    It works OK for the 4-trits in 7-bits case, but thus far my efforts are
    still failing for the 5 trit case...

    As noted, 4-trit pattern:
    0 0ab 0cd //abcd: all 0 or 1
    0 0ab 10c //abc2
    0 0ab 11d //ab2d

    0 10a 0cd //a2cd
    0 11b 0cd //2bcd

    0 10a 10c //a2c2
    0 11b 10c //2bc2

    0 10a 11d //a22d
    0 11b 11d //2b2d

    1 0xx 0xx //-

    1 0ab 10x //ab22
    1 00a 110 //a222
    1 01b 110 //2b22

    1 10x 0cd //22cd
    1 110 00c //22c2
    1 110 01d //222d

    1 110 110 //2222

    Can resolve any trit with a 4 bit lookup...



    Stuff falls on its face for the 5-trit / 8-bit case:

    0 0ab 0cd e //abcde: all 0 or 1

    0 0ab 10c e //abc2e
    0 0ab 11d e //ab2de

    0 10a 0cd e //a2cde
    0 11b 0cd e //2bcde

    0 10a 10c e //a2c2e
    0 11b 10c e //2bc2e
    0 10a 11d e //a22de
    0 11b 11d e //2b2de

    1 0ab 0cd 0 //abcd2
    1 00a 00c 1 //a2c22
    1 01b 00c 1 //2bc22
    1 00a 01d 1 //a22d2
    1 01b 01d 1 //2b2d2

    1 0ab 100 e //ab22e
    1 00a 110 e //a222e
    1 01b 110 e //2b22e

    1 0ab 101 0 //ab222
    1 00e 111 0 //2222e
    1 010 111 0 //22222
    1 0ab 1c1 1 //abc22

    1 100 0cd e //22cde
    1 110 00c e //22c2e
    1 110 01d e //222de

    1 101 0cd 0 //22cd2
    1 111 00c 0 //22c22
    1 111 01d 0 //222d2
    1 1a1 0cd 1 //a2cd2

    1 10a 10x 0 //a2222
    1 10b 11x 0 //2b222
    1 110 10c 0 //22c22
    1 110 11d 0 //222d2

    1 1ab 10d 1 //ab2d2
    1 1db 11c 1 //2bcd2

    Where the logic still needs to look at all 8-bits to decode each trit,
    even if it can be done with a decision tree...

    But, the decision tree can't ignore any of the bits per trit.

    The goal would have been 6 bits, but I seem to have failed after trying
    to get all the cases fit in.



    Issue I think is that with 243 states in 8 bits, there isn't any real
    way to organize things to avoid needing all of the bits to fully
    disambiguate some cases...

    Whereas, it is a little easier to fit 81 states in 7 bits.


    Having beaten at this basically all day, I think I may be defeated on
    this for the time being.

    Well, either this, or I am just not being smart enough for this level of mental puzzle.


    Annoyingly, can't use a brute force as, even though the problem looks
    simple enough, a brute force search through this combination space would likely take a very long time.

    I guess, it could be like the branch-predictor FSM where one possible
    option could be to throw a genetic algorithm at the problem. But, this
    may be a problem for another day.


    ...


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch on Fri Mar 20 11:53:52 2026
    From Newsgroup: comp.arch

    On 2026-03-20 1:17, Robert Finch wrote:
    On 2026-03-19 6:30 p.m., MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).

    Likely mapping here would map 0/1/-1 as 0/1/2

    Consider binary NAND gate, then
    consider binary NOR  gate with inverted inputs !

    All you are doing is assigning a value to a voltage.

    -----------------------------
    But, any thoughts?...

    never going to fly so I waste no time on assembling a failure.
    smatter people than me discarded it long ago in the age of tubes
    where it was "not that hard" to build trit-gates and storage.

    Now there are quantum computers and I think they are four-state.

    I don't see in what sense qubits are "four-state". AIUI the state of a
    qubit is a superposition of "0" and "1" where the "proportions" of the
    two pure values can depend (via entangling) in a complex way on the
    states of other qubits. At the end of a quantum computation the
    "read-out" value of a qubit is 0 or 1, so binary.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Mar 20 05:18:19 2026
    From Newsgroup: comp.arch

    On 3/20/2026 4:53 AM, Niklas Holsti wrote:
    On 2026-03-20 1:17, Robert Finch wrote:
    On 2026-03-19 6:30 p.m., MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    So, idea:
    Schemes for densely packing ternary values in a way that allows for
    cheap hardware decoding (assuming implementation on binary hardware).

    Likely mapping here would map 0/1/-1 as 0/1/2

    Consider binary NAND gate, then
    consider binary NOR  gate with inverted inputs !

    All you are doing is assigning a value to a voltage.

    -----------------------------
    But, any thoughts?...

    never going to fly so I waste no time on assembling a failure.
    smatter people than me discarded it long ago in the age of tubes
    where it was "not that hard" to build trit-gates and storage.

    Now there are quantum computers and I think they are four-state.

    I don't see in what sense qubits are "four-state". AIUI the state of a
    qubit is a superposition of "0" and "1" where the "proportions" of the
    two pure values can depend (via entangling) in a complex way on the
    states of other qubits. At the end of a quantum computation the "read-
    out" value of a qubit is 0 or 1, so binary.


    Yeah.


    I had missed that this was in reference to qubits, but yeah.

    Like, qubits would be represented as some sort of entangled state, and
    then the result is measured by "collapsing" the state and seeing where
    it lands.


    Meanwhile, ironically, my idea here ATM had noting to do with "actual"
    ternary (in the electronics sense), rather debating if it could make
    sense in some niche use-cases in an otherwise binary context.

    But, at present, it isn't looking all that promising.
    Meh results;
    Not much in terms of a strong use-case (where one would want such an encoding).


    Only the 4t/7b case is looking promising ATM, and even then, still not
    an obvious use-case. Using it so negate or zero SIMD elements is
    possible, but not a strong use-case.

    Even then, in the hypothetical SIMD use-case, would still need to
    compete against ops that use a 2-bit scheme, IIRC:
    00: Zero
    01: Keep
    10: Negate
    11: Bitwise NOT (more typically for integer SIMD).

    And, probably don't need to add more ops that don't have strong
    use-cases ATM. Even if ironically, my SIMD stuff is still pretty minimal
    if compared with RV-P or RV-V (both of which went a little excessive on "features").


    ...


    --- Synchronet 3.21d-Linux NewsLink 1.2