From Newsgroup: comp.arch
Thomas Koenig <tkoenig@netcologne.de> wrote:
> See https://arxiv.org/html/2504.08009v1 . Looks quite interesting...
My first impression is that the main purpose of the first author is
to promote his name. The methods have been known for many years;
usually they are used for multiprecision computations. What is
potentially new is that the FP64 performance of GPUs is so bad that
there is a gain from applying them to produce FP64 results.
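The post does not name the methods. Assuming they are multi-modular
(residue number system) techniques based on the Chinese Remainder
Theorem, which fits the later remarks on modular arithmetic and matrix
multiplication, the core idea can be sketched as below: an exact
integer matrix product is computed as several independent products
modulo small pairwise-coprime moduli, and the residues are recombined
with the CRT. The moduli 251, 241 and 239 (primes below 256, so every
residue fits in 8 bits) and all function names are hypothetical, and
this is not claimed to be the linked paper's exact scheme; producing an
FP64 result would additionally require scaling floating-point inputs to
integers, which is omitted here.

```python
from math import prod


def matmul_mod(a, b, p):
    """Product of integer matrices a (n x k) and b (k x m), reduced mod p."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) % p for j in range(m)]
            for row in a]


def crt(residues, moduli):
    """Combine residues r_i mod m_i (pairwise coprime) into x mod prod(m_i)."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)  # modular inverse, Python 3.8+
    return x % big_m


def matmul_multimodular(a, b, moduli):
    """Exact integer matmul via independent products mod each modulus + CRT.

    Correct only if every entry of the true product lies in (-M/2, M/2),
    where M is the product of the moduli.
    """
    big_m = prod(moduli)
    # One independent low-precision product per modulus (parallelizable).
    partial = []
    for p in moduli:
        a_p = [[x % p for x in row] for row in a]
        b_p = [[x % p for x in row] for row in b]
        partial.append(matmul_mod(a_p, b_p, p))
    result = []
    for i in range(len(a)):
        row = []
        for j in range(len(b[0])):
            x = crt([c[i][j] for c in partial], moduli)
            # Map [0, M) back to the symmetric range to recover negatives.
            row.append(x - big_m if x > big_m // 2 else x)
        result.append(row)
    return result


if __name__ == "__main__":
    a = [[1, -2], [3, 4]]
    b = [[5, 6], [-7, 8]]
    print(matmul_multimodular(a, b, [251, 241, 239]))  # [[19, -10], [-13, 50]]
```

Each per-modulus product needs only narrow arithmetic, which is what
makes the approach attractive on hardware with fast low-precision
integer units; the price is one full product per modulus plus the
reconstruction step.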
In a slightly different spirit, it is disappointing that modern
machines have rather poor support for modular computations.
More precisely, to get a correct n-bit modular result one needs
to do an n-bit computation, and the modular reduction must be done
as a separate step. That significantly increases the cost of such
computations. There is still a win in appropriate situations,
like matrix multiplication, but hardware with better support for
modular operations could probably do them faster.
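To make the separate reduction step concrete, here is a minimal sketch
assuming a 16-bit prime modulus (65521) and a simulated 64-bit unsigned
accumulator; both parameters and the function names are illustrative
assumptions, not something from the post. The eager version pays one
reduction per multiply-add. The delayed version lets unreduced products
pile up in the wide accumulator and reduces only before a possible
overflow, which is one reason long dot products, as in matrix
multiplication, still come out ahead.

```python
# Illustrative parameters (assumptions, not from the post).
P = 65521        # largest prime below 2**16, so residues fit in 16 bits
ACC_BITS = 64    # width of the simulated unsigned hardware accumulator


def dot_mod_eager(x, y, p=P):
    """Modular dot product that reduces after every multiply-add."""
    acc = 0
    for a, b in zip(x, y):
        acc = (acc + a * b) % p  # one reduction per term
    return acc


def dot_mod_delayed(x, y, p=P, acc_bits=ACC_BITS):
    """Modular dot product with delayed reduction.

    Inputs are assumed already reduced to [0, p). Unreduced products are
    summed in a simulated acc_bits-wide register; a reduction is done only
    when adding one more worst-case product could overflow it.
    Returns (result, number_of_reductions).
    """
    limit = 1 << acc_bits
    max_prod = (p - 1) ** 2
    if p + max_prod > limit:
        raise ValueError("accumulator too narrow for this modulus")
    acc = 0
    reductions = 0
    for a, b in zip(x, y):
        if acc + max_prod >= limit:
            acc %= p          # separate reduction step
            reductions += 1
        acc += a * b          # wide multiply-add, no reduction
    return acc % p, reductions + 1  # final reduction
```

With a 16-bit modulus and a 64-bit accumulator, roughly 2**32 products
fit before a reduction is needed, so the reduction cost is amortized
over long dot products. With a modulus close to the full word width
there is almost no room to delay, and every product again needs its
own double-width multiply and reduction.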
--
Waldek Hebisch