• Ozaki Scheme II, an interesting variant of matrix multiplication

    From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Jun 9 21:42:21 2026
    From Newsgroup: comp.arch

    See https://arxiv.org/html/2504.08009v1 . Looks quite interesting...
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Jun 10 13:00:17 2026
    From Newsgroup: comp.arch

    On 2026-Jun-09 17:42, Thomas Koenig wrote:
    > See https://arxiv.org/html/2504.08009v1 . Looks quite interesting...


    The original "abs" page format has links to other papers
    by the same authors.

    https://arxiv.org/abs/2504.08009v1

    It is often worthwhile to check what else they have written,
    but arXiv indexes on last name and first initial, so
    watch out for different authors with the same name, e.g.

    https://arxiv.org/search/cs?searchtype=author&query=Ozaki,+K

    Error Analysis of Matrix Multiplication Emulation Using Ozaki-II Scheme
    https://arxiv.org/abs/2602.02549


  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Thu Jun 11 04:01:47 2026
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> wrote:
    > See https://arxiv.org/html/2504.08009v1 . Looks quite interesting...

    My first impression is that the main purpose of the first author is to
    promote his name. The methods have been known for many years. Usually
    they are used to do multiprecision computations. What is
    potentially new is that FP64 performance of GPUs is so bad that
    there is a gain from applying them to produce an FP64 result.

    In a slightly different spirit, it is disappointing that modern
    machines have rather bad support for modular computations.
    More precisely, to get a correct n-bit modular result one needs
    to do an n-bit computation, and modular reduction must be done as a
    separate step. That significantly increases the cost of such
    computations. There is still a win in appropriate situations,
    like matrix multiplication, but hardware with better support for
    modular operations probably could do them faster.
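
    A minimal sketch of the residue approach discussed above, in Python
    (not taken from the paper): an integer matrix product is computed
    independently modulo several small pairwise-coprime moduli, with the
    reduction done as a separate step after each accumulation, and the
    exact entries are then recovered with the Chinese remainder theorem.
    The moduli and the names matmul_mod, crt_combine and matmul_crt are
    illustrative assumptions; scaling FP64 inputs to integers is omitted.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; every residue fits in 8 bits.
MODULI = [251, 253, 255, 256]


def matmul_mod(a, b, m):
    """Integer matrix product modulo m, reducing after each dot product."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Inputs are reduced first, so each factor is a small residue.
            acc = sum((a[i][k] % m) * (b[k][j] % m) for k in range(inner))
            # The modular reduction is a separate step after accumulation.
            row.append(acc % m)
        out.append(row)
    return out


def crt_combine(residues, moduli):
    """Return the x in [-M/2, M/2) congruent to residues[i] mod moduli[i]."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    x %= big_m
    # Map to the symmetric range so negative entries are recovered.
    return x - big_m if x >= big_m // 2 else x


def matmul_crt(a, b, moduli=MODULI):
    """Exact integer matrix product assembled from per-modulus products."""
    per_modulus = [matmul_mod(a, b, m) for m in moduli]
    rows, cols = len(a), len(b[0])
    return [
        [crt_combine([p[i][j] for p in per_modulus], moduli) for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    a = [[1234, -567], [89, 1000]]
    b = [[-321, 45], [678, -910]]
    print(matmul_crt(a, b))
```

    The result is exact only while every entry of the true product lies
    in [-M/2, M/2), where M is the product of the moduli; adding moduli
    widens that range at the cost of more small multiplications.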
    --
    Waldek Hebisch
  • From quadibloc@quadibloc@invalid.com (John Savard) to comp.arch on Sun Jun 21 22:57:35 2026
    From Newsgroup: comp.arch

    I've looked for more information about this.

    Although it's an efficient way to perform some FP64 computations
    through the use of a resource that only supports FP8... it has the
    limitation that it is at its most efficient only when one is working
    with square matrices.

    So it is still useful to have a computer with real FP64, even if this
    technique allows people doing scientific computations to bask in a
    subsidy from people doing AI work.

    John Savard