• 16-bit fp constants

    From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Apr 25 12:21:51 2026
    From Newsgroup: comp.arch

    One idea for squeezing some more frequently used constants into
    a 16-bit floating-point format is to allow some space for periodic
    binary fractions in the last few bits.

    An example of a possible encoding:

    One bit for sign.
    Four bits of exponent.
    Two bits to encode the length of the final period:
    00 for period 2
    01 for period 3
    10 for period 4
    11 for period 6
    Nine bits of mantissa

    This would allow exact encoding of constants like 0.1 (period 4),
    1./3. (period 2), 1./7 (period 3) or 1./9. (period 6). Fractions
    of two could then be encoded with 00 as the final two bits.
    Rounding would be included.

    1./25. would not work because of a period of length 20.
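
The period lengths quoted above can be checked directly: the repeating part of 1/n in binary has a length equal to the multiplicative order of 2 modulo the odd part of n. A minimal sketch, assuming Python; the helper name `binary_period` is made up for illustration and is not part of the proposal:

```python
def binary_period(denominator: int) -> int:
    """Length of the repeating block of 1/denominator in binary.

    Returns 0 when the expansion terminates (denominator is a power of two).
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # Factors of two only shift the binary point; they do not change the period.
    odd = denominator
    while odd % 2 == 0:
        odd //= 2
    if odd == 1:
        return 0
    # Multiplicative order of 2 modulo the odd part.
    period, power = 1, 2 % odd
    while power != 1:
        power = (power * 2) % odd
        period += 1
    return period


for n in (10, 3, 7, 9, 25):
    print(n, binary_period(n))
```

This gives 4, 2, 3, 6 and 20 for 1/10, 1/3, 1/7, 1/9 and 1/25, matching the figures in the post.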
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Apr 25 12:49:52 2026

    On 4/25/2026 7:21 AM, Thomas Koenig wrote:
    One idea for squeezing some more frequently used constants into
    a 16-bit floating-point format is to allow some space for periodic
    binary fractions in the last few bits.

    An example of a possible encoding:

    One bit for sign.
    Four bits of exponent.
    Two bits to encode the length of the final period:
    00 for period 2
    01 for period 3
    10 for period 4
    11 for period 6
    Nine bits of mantissa

    This would allow exact encoding of constants like 0.1 (period 4),
    1./3. (period 2), 1./7 (period 3) or 1./9. (period 6). Fractions
    of two could then be encoded with 00 as the final two bits.
    Rounding would be included.

    1./25. would not work because of a period of length 20.


    Seems like a clever idea, may need to look into this, and evaluate its
    cost and effectiveness.


    I guess it differs from my recent 32-bit packing schemes in that it
    specifies the period directly, and would need some logic to deal with
    predicting the final rounding, rather than storing it directly.

    The comparatively shorter periods are also less of an issue for a
    smaller encoding, assuming the pattern is taken from the low bits of
    the mantissa.
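
The "predicting the final rounding" step can be sketched as follows. This is only an illustration under assumptions not stated in the thread: the repeating pattern is taken to be the low `period` bits of the stored mantissa, continued indefinitely to the right, and the result is rounded to nearest-even at the target width. The function name `expand_periodic` and its parameters are hypothetical.

```python
from fractions import Fraction


def expand_periodic(mantissa: int, mantissa_bits: int,
                    period: int, target_bits: int) -> int:
    """Extend a stored mantissa with its repeating tail to target_bits.

    Assumption: the low `period` bits of `mantissa` are the pattern that
    repeats forever after the stored bits. The return value is rounded to
    nearest, ties to even. A result equal to 1 << target_bits means the
    rounding carried out of the mantissa, which the caller must fold into
    the exponent.
    """
    pattern = mantissa & ((1 << period) - 1)
    # An infinitely repeated p-bit pattern after the stored bits is worth
    # pattern / (2**p - 1) units of the last stored bit.
    tail = Fraction(pattern, (1 << period) - 1)
    exact = (mantissa + tail) * (1 << (target_bits - mantissa_bits))
    return round(exact)  # round() on a Fraction rounds half to even


# 0.1 = 1.6 * 2**-4: nine stored fraction bits 100110011, period 4.
print(hex(expand_periodic(0b100110011, 9, 4, 23)))
# 1/3 = (4/3) * 2**-2: nine stored fraction bits 010101010, period 2.
print(hex(expand_periodic(0b010101010, 9, 2, 23)))
```

Under these assumptions the two calls give the 23-bit mantissa fields that Binary32 uses for 0.1 and 1/3, including the round-up in the last bit.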


    Did a quick mock-up in my "big stats tool" (running stats for all the C
    files on one of my drives):
    32-bit format: 77% hit rate (combined 33b format).
    16-bit format: 74% hit rate (combined with Binary16 into 17b format)

    Formats in isolation:
    Binary16 : 63.7%
    16-bit packed format: 58.4%

    So, in isolation it does worse on average than Binary16, but if it and
    Binary16 are glued together, the average hit rate is higher than either
    of them.
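
The "glued together" measurement amounts to counting a constant as a hit if either format can hold it. A rough sketch follows; it is not the stats tool itself. Assumptions: the packed layout is sign, 4-bit exponent, 2-bit period, 9-bit mantissa with an exponent bias of 7 and no special exponent codes; a hit means the constant as a double equals the decoded value rounded to double; and the sample list is invented, not measured data.

```python
import struct
from fractions import Fraction

PERIODS = (2, 3, 4, 6)  # period field -> period length, as in the proposal


def fits_binary16(value: float) -> bool:
    """True if value survives a round trip through IEEE Binary16."""
    try:
        packed = struct.pack("<e", value)
    except OverflowError:  # magnitude too large for Binary16
        return False
    return struct.unpack("<e", packed)[0] == value


def periodic_values() -> set:
    """All doubles reachable from the assumed S.E4.P2.M9 layout."""
    values = set()
    for code in range(1 << 16):
        sign = -1 if code >> 15 else 1
        exponent = (code >> 11) & 0xF
        period = PERIODS[(code >> 9) & 0x3]
        mantissa = code & 0x1FF
        # Assumption: the low `period` mantissa bits repeat forever.
        pattern = mantissa & ((1 << period) - 1)
        tail = Fraction(pattern, (1 << period) - 1)
        significand = 1 + (mantissa + tail) / 512
        values.add(float(sign * significand * Fraction(2) ** (exponent - 7)))
    return values


def hit_rate(constants, *predicates) -> float:
    """Fraction of constants accepted by at least one predicate."""
    hits = sum(1 for c in constants if any(p(c) for p in predicates))
    return hits / len(constants)


PERIODIC = periodic_values()
fits_periodic = PERIODIC.__contains__

sample = [0.5, 1.0, 0.1, 1.0 / 3.0, 1000.0, 3.14159]  # made-up constants
print(hit_rate(sample, fits_binary16))
print(hit_rate(sample, fits_periodic))
print(hit_rate(sample, fits_binary16, fits_periodic))
```

The combined rate can never be lower than either individual rate, since it is the union of the two hit sets; how much higher it gets depends on how little the two sets overlap.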


    Cheaper hardware encoding could be possible if the format were designed
    to keep more in common with Binary16, say:
    0,S.E5.M10
    1,S.E5.M8.P2

    If actual 16-bits are needed, could sacrifice the sign bit as FPU
    immediate values are overwhelmingly positive.


    Then, the high-order bits are unpacked the same as for Binary16, but if
    the high bit is set it tries to unpack the repeating pattern (into a
    12-bit GCF or similar, with the last partial segment applying RNE).

    I have yet to evaluate how this would affect hit-rate though.
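
A sketch of how such a 17-bit selector format might decode, for illustration only. Assumptions beyond what the post states: the fields run selector, S, E5, M8, P2 from the most significant bit down; the exponent bias is 15 as in Binary16; the low `period` bits of M8 are the repeating pattern; and zero, subnormals, Inf and NaN get no special handling on the periodic side. The name `decode17` is hypothetical. The result is the exact value of the infinite repetition, so the rounding a real unpacker would apply is left out.

```python
import struct
from fractions import Fraction

PERIODS = (2, 3, 4, 6)  # P2 field -> period length, as in the proposal


def decode17(code: int) -> Fraction:
    """Decode a 17-bit immediate: selector bit on top of 16 payload bits."""
    low16 = code & 0xFFFF
    if not (code >> 16) & 1:
        # Selector 0: plain Binary16 (Inf/NaN would raise in Fraction()).
        (value,) = struct.unpack("<e", struct.pack("<H", low16))
        return Fraction(value)
    # Selector 1: S.E5.M8.P2, high-order bits laid out like Binary16.
    sign = -1 if (low16 >> 15) & 1 else 1
    exponent = (low16 >> 10) & 0x1F
    mantissa = (low16 >> 2) & 0xFF
    period = PERIODS[low16 & 0x3]
    pattern = mantissa & ((1 << period) - 1)
    # The repeating tail is worth pattern / (2**period - 1) mantissa LSBs.
    tail = Fraction(pattern, (1 << period) - 1)
    significand = 1 + (mantissa + tail) / 256
    return sign * significand * Fraction(2) ** (exponent - 15)


one_third = (1 << 16) | (13 << 10) | (0x55 << 2) | 0b00  # period 2
one_tenth = (1 << 16) | (11 << 10) | (0x99 << 2) | 0b10  # period 4
print(decode17(one_third), decode17(one_tenth), decode17(0x3C00))
```

With these field assignments the three example codes decode to exactly 1/3, 1/10 and 1.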



    Could maybe look into something like this, as I am currently sticking a
    16-bit FP value into an Imm17 spot, which still leaves 1 bit to work as
    a selector between regular Binary16 and a more densely packed format.
    This is also more friendly to XG1/XG2 and my RV+JX schemes (where having
    the Jumbo prefix expand a register field to 33 bits is not a thing, due
    to running out of bits; *1).

    *1: Where, despite XG3 being partly a bit-repack of XG2, they are not
    strictly 1:1 in some cases as there have ended up being some amount of
    minor encoding tweaks as well (and some features that may depend on
    which ISA one is running).

    Though, I was originally going to make a more drastic change of
    relocating the branch instructions from the F0 to the F8 block, but then
    this would have turned into a much bigger pain for the decoder (which
    needs to deal with all 3 ISA variants), so I moved the branches back to
    their original location.

  • From BGB@cr88192@gmail.com to comp.arch on Sat Apr 25 22:25:53 2026

    On 4/25/2026 12:49 PM, BGB wrote:
    On 4/25/2026 7:21 AM, Thomas Koenig wrote:
    One idea for squeezing some more frequently used constants into
    a 16-bit floating-point format is to allow some space for periodic
    binary fractions in the last few bits.

    An example of a possible encoding:

    One bit for sign.
    Four bits of exponent.
    Two bits to encode the length of the final period:
    00 for period 2
    01 for period 3
    10 for period 4
    11 for period 6
    Nine bits of mantissa

    This would allow exact encoding of constants like 0.1 (period 4),
    1./3. (period 2), 1./7 (period 3) or 1./9. (period 6).  Fractions
    of two could then be encoded with 00 as the final two bits.
    Rounding would be included.

    1./25. would not work because of a period of length 20.


    Seems like a clever idea, may need to look into this, and evaluate its
    cost and effectiveness.

    ...



    Cheaper hardware encoding could be possible if the format were designed
    to keep more in common with Binary16, say:
      0,S.E5.M10
      1,S.E5.M8.P2

    If actual 16-bits are needed, could sacrifice the sign bit as FPU
    immediate values are overwhelmingly positive.


    Then, the high-order bits are unpacked the same as for Binary16, but if
    the high bit is set it tries to unpack the repeating pattern (into a
    12-bit GCF or similar, with the last partial segment applying RNE).

    I have yet to evaluate how this would affect hit-rate though.


    Tested: switching from S.E4.M9.P2 to S.E5.M8.P2 results in the hit rate
    increasing by around 0.3% ...

    It seems the 4-bit exponent was the bigger limitation here.

    Either way, the basic strategy appears to be effective in this case.

