According to Waldek Hebisch <antispam@fricas.org>:
quadi <quadibloc@ca.invalid> wrote:
I remember having read one article in a computer magazine where someone mentioned that an unfortunate result of the transition from the IBM 7090 to the IBM System/360 was that a lot of FORTRAN programs that were able to use ordinary real numbers had to be switched over to double precision to yield acceptable results.
Note that the IBM floating point format effectively lost about 3 bits of
accuracy compared to the modern 32-bit format. I am not sure how much it
lost compared to the IBM 7090, but it looks like it was at least 5 bits.
Assuming that accuracy requirements are uniformly distributed between
20 and, say, 60 bits, we can estimate that a loss of 5 bits affected about
25% (or more) of the applications that could run using 36 bits. That is
"a lot" of programs.
But it does not mean that 36 bits are somehow magical. Simply, given a
36-bit machine, the original author had extra motivation to make sure that
the program ran in 36-bit floating point.
It's worse than that, because the 360's floating point had wobbling precision.
Depending on the number of leading zero bits in the fraction, it could lose
anywhere from 1 to 5 bits of precision compared to a rounded binary format.
Hence the badness of the result depended more than usual on the input data.
Well, IBM format had twice the range of IEEE format, so effectively one
bit moved from mantissa to exponent. Looking at representable values,
except at the low end of the range only normalized values matter. In
hex format 15/16 of values are normalized, ...
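The scale of that loss can be seen in a small sketch (Python, using exact rational arithmetic). The six-hex-digit and 24-bit significand models below are my own simplifications of S/360 single and IEEE binary32, ignoring exponent range:

```python
from fractions import Fraction

def round_sig(x, digits, base):
    """Round a positive rational x to `digits` significant base-`base` digits
    (normalized fraction in [1/base, 1), unbounded exponent)."""
    e = 0
    while x >= 1:
        x /= base
        e += 1
    while x < Fraction(1, base):
        x *= base
        e -= 1
    frac = Fraction(round(x * base**digits), base**digits)
    return frac * Fraction(base)**e

x = Fraction(1, 10)
ieee = round_sig(x, 24, 2)    # ~binary32: 24 significant bits
hexf = round_sig(x, 6, 16)    # ~S/360 single: 6 hex fraction digits

err_ieee = abs(ieee - x) / x
err_hex  = abs(hexf - x) / x
print(float(err_ieee))        # ~1.5e-8
print(float(err_hex))         # ~2.4e-7: 16x worse, i.e. ~4 bits lost here
```

For 1/10 the leading hex digit is 1, the worst case of the "wobble", so this particular value loses about 4 bits relative to binary32.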
According to Waldek Hebisch <antispam@fricas.org>:
Well, IBM format had twice the range of IEEE format, so effectively one
bit moved from mantissa to exponent. Looking at representable values,
except at the low end of the range only normalized values matter. In
hex format 15/16 of values are normalized, ...
That's the same mistake IBM made when they designed the 360's FP.
Leading fraction digits are geometrically distributed, not linearly.
(Look at a slide rule to see what I mean.)
There are on average two leading zeros so only half of the values are normalized.
Quadi, have your computer architectures included IBM 360 floating point support? There is probably more demand for that than for 36-bit these
days.
According to Waldek Hebisch <antispam@fricas.org>:
Well, IBM format had twice the range of IEEE format, so effectively one
bit moved from mantissa to exponent. Looking at representable values,
except at the low end of the range only normalized values matter. In hex
format 15/16 of values are normalized, ...
That's the same mistake IBM made when they designed the 360's FP.
Leading fraction digits are geometrically distributed, not linearly.
(Look at a slide rule to see what I mean.)
There are on average two leading zeros so only half of the values are
normalized.
No. By _definition_ a hex floating point number is normalized if and
only if its leading hex digit is different from zero.
According to Waldek Hebisch <antispam@fricas.org>:
There are on average two leading zeros so only half of the values are
normalized.
No. By _definition_ a hex floating point number is normalized if and
only if its leading hex digit is different from zero.
I wrote sloppily. On average a normalized hex FP number has two leading zeros so you lose another bit compared to binary, in addition to what you lose by no hidden bit and no rounding.
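The average can actually be computed under the log-uniform model a slide rule embodies, where P(leading hex digit = d) = log16((d+1)/d). A sketch (it comes out to 1.5 leading zero bits rather than two, though the direction of the argument stands):

```python
from math import log

def lz_bits(d):
    """Leading zero bits contributed by leading hex digit d (1..15)."""
    return 4 - d.bit_length()   # d=1 -> 3, d=2..3 -> 2, d=4..7 -> 1, else 0

# Expectation over the log-uniform ("slide rule") leading-digit distribution
expected = sum(log((d + 1) / d, 16) * lz_bits(d) for d in range(1, 16))
print(round(expected, 6))       # 1.5
```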
On Sun, 15 Feb 2026 14:37:00 +0000, John Dallman wrote:
Quadi, have your computer architectures included IBM 360 floating point
support? There is probably more demand for that than for 36-bit these
days.
Yes, in fact they have. The goal there is to facilitate data interchange
and emulation, not to provide better quality floating-point arithmetic... since, of course, it provides rather the opposite, as has been discussed
in this thread.
The original CISC Concertina I architecture went further; it had the goal
of being able to natively emulate the floating-point of just about every computer ever made.
quadi <quadibloc@ca.invalid> wrote:
On Sun, 15 Feb 2026 14:37:00 +0000, John Dallman wrote:
Quadi, have your computer architectures included IBM 360 floating point
support? There is probably more demand for that than for 36-bit these
days.
Yes, in fact they have. The goal there is to facilitate data interchange and emulation, not to provide better quality floating-point arithmetic... since, of course, it provides rather the opposite, as has been discussed in this thread.
The original CISC Concertina I architecture went further; it had the goal of being able to natively emulate the floating-point of just about every computer ever made.
That was probably already written, but since you are revising your
design it may be worth stating some facts. If you have a 64-bit
machine with convenient access to 32-bit, 16-bit and 8-bit parts,
you can store any number of bits between 4 and 64 while wasting at most
50% of storage, and still have simple access to each item. So in terms
of memory use, you are trying to avoid this 50% loss. In practice the
loss will be much smaller because:
- power-of-2 quantities are quite popular
- when a program needs a large number of items of some other size,
  the programmer is likely to use packing/unpacking routines, keeping
  data in a space-efficient packed format most of the time and unpacking
  it for processing
- a machine with fast bit-extract/bit-insert instructions can perform
  most operations quite fast even on packed data
so the possible gain in memory consumption is quite low. Given that
non-standard memory modules and support chips tend to be much more
expensive than standard ones, economically, attempting such savings
makes no sense.
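The packed-format point can be sketched concretely (Python; the 36-bit item width and little-endian layout are arbitrary choices for illustration):

```python
# Dense packing of 36-bit items into bytes, with bit-extract for random
# access to packed data.

ITEM_BITS = 36
MASK = (1 << ITEM_BITS) - 1

def pack(items):
    """Pack unsigned 36-bit items with no per-item padding."""
    acc = bits = 0
    out = bytearray()
    for v in items:
        acc |= (v & MASK) << bits
        bits += ITEM_BITS
        while bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    if bits:
        out.append(acc & 0xFF)   # flush the final partial byte
    return bytes(out)

def extract(buf, index):
    """Bit-extract item `index` without unpacking the whole buffer."""
    bit = index * ITEM_BITS
    byte, off = divmod(bit, 8)
    chunk = int.from_bytes(buf[byte:byte + 6], 'little')  # 36 bits + slack
    return (chunk >> off) & MASK

data = [0xFFFFFFFFF, 0x123456789, 0]
buf = pack(data)
print(len(buf))                                              # 14 bytes, not 24
print(all(extract(buf, i) == v for i, v in enumerate(data)))  # True
```

Three 36-bit items take 108 bits, so 14 bytes packed versus 24 bytes if each were stored in a 64-bit slot.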
Of course, there is also the question of speed. The argument above shows
that the loss of speed on access itself can be quite small. So what
remains is the speed of processing data. As long as you do processing
on power-of-2 sized items (that is, unusual sizes are limited to
storage), the loss of speed can be modest; basically, a dedicated 36-bit
machine probably can do twice as many 36-bit float operations
as a standard machine can do 64-bit operations. Practically, this
loss will be larger than the loss of storage, but it still does not look
significant enough to warrant development of a special machine.
Things are somewhat different when you want bit-accurate results
using old formats. Here, already one's-complement arithmetic has
significant overhead on a two's-complement machine. And emulating
old floating point formats is more expensive. OTOH, modern
machines are much faster than old ones. For example, a modern CPU
seems to be more than 1000 times faster than a real CDC 6600, so
even slow emulation is likely to be faster than the real machine,
which means that the emulated machine can do the work of the original
one.
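The one's-complement overhead is essentially the end-around carry, which native two's-complement hardware gets for free. A minimal sketch (Python, on 36-bit raw bit patterns; the width is illustrative):

```python
# One's-complement addition emulated with ordinary two's-complement
# integer arithmetic: the extra work is the end-around carry step.

BITS = 36
MASK = (1 << BITS) - 1           # 36 one bits

def ones_neg(a):
    """One's-complement negation: just flip every bit."""
    return a ^ MASK

def ones_add(a, b):
    """Add two 36-bit one's-complement bit patterns."""
    s = a + b
    if s > MASK:                 # carry out of the top bit...
        s = (s & MASK) + 1       # ...wraps around into the bottom bit
    return s & MASK

print(ones_add(5, ones_neg(1)))   # 4:  5 + (-1)
print(ones_add(1, ones_neg(0)))   # 1:  adding "negative zero" is harmless
```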
So to summarize: practical considerations leave rather small space
for a machine using non-power-of-two formats, and it is rather
unlikely that any design can fit there.
Of course, there is a very good reason to explore non-mainstream
approaches, namely having fun. But once you realize that
mainstream designs make their choices for good reasons,
exploring alternatives gets less fun (at least for me).
On Tue, 17 Feb 2026 20:43:35 +0000, Waldek Hebisch wrote:
But once you realize that mainstream
designs make their choices for good reasons,
exploring alternatives gets less funny (at least for me).
At one time, back in the past, the mainstream computers had word lengths
such as 12 bits, 18 bits, 24 bits, 30 bits, 36 bits, 48 bits, 60 bits...
all multiples of 6 bits.
The reason for this was that computers needed a character set with
letters, numbers, and various special characters - and a six-bit
character, with 64 possibilities, was adequate for that.
As technology advanced, and computer power became cheaper, it became
possible to think of using computers for more applications. Using an
eight-bit character allowed the use of lower-case characters, getting rid
of a limitation of the older computers that could possibly become annoying in
the future. Of course, a 7-bit character would also be enough for that -
and at least one company, ASI, actually made computers with word lengths
that were multiples of 7 bits.
Even before System/360, IBM made a computer built around a 64-bit word,
the STRETCH. It was intended to be a very powerful scientific computer,
but it also had the very rare feature of bit addressing - which a
power-of-two word length made much more practical.
Hardly any architectures provide bit addressing these days, though.
Nonetheless, a character set that includes lower case is a good reason. Since a 36-bit word works better with addressable 9-bit characters than with 6-bit ones, nothing is really lost by going to 36 bits.
Of course, there's another good reason for sticking with 32-bit or 64-bit designs: because that's what everyone else is using, standard memory
modules have data buses corresponding to such widths, possibly with extra bits for ECC.
To me, those don't seem to be enough "good reasons" to absolutely preclude
different word lengths. But there would definitely have to be a real
benefit to justify the cost and effort of using a different length. It seems
to me there is a real benefit, in that the available data sizes in the
32-bit world aren't optimized to the needs of scientific computation.
But it's quite correct to feel this real benefit isn't enough to make machines oriented around the 36-bit word length likely.
John Savard
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
- anton
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.
Well, and the secondary irony that it is mainly cost-added for FMUL,
whereas FADD almost invariably has the necessary support hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some cases.
So, FMAC is a single unit that costs more than both units taken
separately, and with a higher latency.
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in
hardware. The additional hardware cost (or the cost of trapping
and software emulation) has been the only argument against
denormals that I ever encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals
became a low cost addition. {And that has been my point--you seem
to have forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.

It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Well, and the secondary irony that it is mainly cost-added for
FMUL, whereas FADD almost invariably has the necessary support
hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some
cases.
The add stage after the multiplication tree is <essentially> 2× as
wide. FMUL needs a 108-bit 2-input adder;
FMAC needs a 160-bit 3-input adder and a 52-bit incrementer.
The multiplication tree is the same; the normalizer is larger.
So, FMAC is a single unit that costs more than both units taken separately, and with a higher latency.
Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).
On Thu, 19 Feb 2026 17:30:50 GMT
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in
hardware. The additional hardware cost (or the cost of trapping
and software emulation) has been the only argument against
denormals that I ever encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals
became a low cost addition. {And that has been my point--you seem
to have forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.

It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Well, and the secondary irony that it is mainly cost-added for
FMUL, whereas FADD almost invariably has the necessary support
hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some
cases.
The add stage after the multiplication tree is <essentially> 2× as
wide. FMUL needs a 108-bit 2-input adder
FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
The multiplication tree is the same, normalizer is larger.
So, FMAC is a single unit that costs more than both units taken separately, and with a higher latency.
Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).
Arm Inc. application processor cores have FMAC latency = 4 for the
multiplicands, but 2 for the accumulator.
Maybe we should switch to 18-bit bytes to support UNICODE.
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.

It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Well, and the secondary irony that it is mainly cost-added for FMUL,
whereas FADD almost invariably has the necessary support hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some cases.
The add stage after the multiplication tree is <essentially> 2× as wide. FMUL needs a 108-bit 2-input adder
FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
The multiplication tree is the same, normalizer is larger.
So, FMAC is a single unit that costs more than both units taken
separately, and with a higher latency.
Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).
On 2/19/2026 11:30 AM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.

It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Likely depends.
Can use the trick of bumping to the next size up and use that for
computation.
So, for Binary32 compute it as Binary64, and for Binary64 compute it as Binary128.

Neither of those work!
BGB wrote:
On 2/19/2026 11:30 AM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.

It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Likely depends.
Can use the trick of bumping to the next size up and use that for
computation.
So, for Binary32 compute it as Binary64, and for Binary64 compute it
as Binary128.
Neither of those work!
I believed this to be true but I was shown the error of my thinking by
more knowledgeable people in the 754 working group. I.e., they had a very
simple/small example where doing the calculation in the next higher
precision would still cause double rounding errors.
Also note that Mitch has stated multiple times that you need ~160
mantissa bits during FMAC double calculations.
Terje
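Terje's correction can be demonstrated concretely. For a plain binary32 multiply or add, computing in binary64 and rounding once is in fact safe (53 bits exceeds the 2×24+2 needed), but for FMA it is not. A sketch in Python, using `struct` for the binary32 rounding; the specific operands below are my own construction, not the working group's example:

```python
# Double rounding in an emulated binary32 FMA: do a*a + c in binary64,
# then round once to binary32 -- and get the wrongly rounded answer.
# All operands are exactly representable in binary32.

import struct
from fractions import Fraction

def to_f32(x):
    """Round a Python float (binary64) to binary32, round-to-nearest-even."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

a = 1.0 + 2.0**-12      # a*a = 1 + 2^-11 + 2^-24, exact in binary64
c = 2.0**-53            # tiny, but enough to break the tie

emulated = to_f32(a * a + c)       # binary64 arithmetic, one final round

exact = Fraction(a) * Fraction(a) + Fraction(c)
lo = 1.0 + 2.0**-11                # binary32 neighbours of the exact result
hi = 1.0 + 2.0**-11 + 2.0**-23
mid = (Fraction(lo) + Fraction(hi)) / 2

print(exact > mid)      # True: correct binary32 rounding is up, to hi
print(emulated == lo)   # True: the binary64 tie-to-even erased c first
```

The binary64 addition rounds `a*a + c` down to an exact tie between the two binary32 neighbours, which ties-to-even then resolves the wrong way.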