Forum: War Ensemble BBS

Ozaki Scheme II, an interesting variant of matrix multiplication

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Jun 9 21:42:21 2026

From Newsgroup: comp.arch

See https://arxiv.org/html/2504.08009v1 . Looks quite interesting...
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Wed Jun 10 13:00:17 2026

From Newsgroup: comp.arch

On 2026-Jun-09 17:42, Thomas Koenig wrote:

> See https://arxiv.org/html/2504.08009v1 . Looks quite interesting...

The original "abs" page format has links to other papers
by the same authors.

https://arxiv.org/abs/2504.08009v1

It is often worthwhile to check what else they have written,
but arXiv indexes on last name and first initial, so
watch out for different authors who share a name, e.g.

https://arxiv.org/search/cs?searchtype=author&query=Ozaki,+K

Error Analysis of Matrix Multiplication Emulation Using Ozaki-II Scheme
https://arxiv.org/abs/2602.02549


From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Thu Jun 11 04:01:47 2026

From Newsgroup: comp.arch

Thomas Koenig <tkoenig@netcologne.de> wrote:

> See https://arxiv.org/html/2504.08009v1 . Looks quite interesting...

My first impression is that the main purpose of the first author is to
promote his name. The methods have been known for many years. Usually
they are used to do multiprecision computations. What is
potentially new is that FP64 performance of GPUs is so bad that
there is a gain from applying them to produce an FP64 result.

In a slightly different spirit, it is disappointing that modern
machines have rather bad support for modular computations.
More precisely, to get a correct n-bit modular result one needs
to do an n-bit computation, and the modular reduction must be done as a
separate step. That significantly increases the cost of such
computations. There is still a win in appropriate situations,
like matrix multiplication, but hardware with better support for
modular operations probably could do them faster.
--
Waldek Hebisch
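
For readers who have not seen the modular technique mentioned above, here is a minimal sketch of an exact integer matrix product computed through small residues and the Chinese Remainder Theorem. It is an illustration only, not the algorithm from the linked paper: the moduli in MODULI and the helper names matmul_mod, crt_symmetric and matmul_crt are assumptions made for this example, and the inputs are plain integers rather than scaled FP64 values. It needs Python 3.8 or later for the modular inverse via pow.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; every residue fits in 8 bits (0..255).
MODULI = [251, 253, 255, 256]


def matmul_mod(a, b, m):
    """Multiply matrices a and b with entries reduced modulo m."""
    n, k, p = len(a), len(b), len(b[0])
    out = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            # Products of small residues are accumulated in a wider integer;
            # the modular reduction is then a separate step, as the post notes.
            acc = sum((a[i][t] % m) * (b[t][j] % m) for t in range(k))
            out[i][j] = acc % m
    return out


def crt_symmetric(residues, moduli):
    """Combine residues into the unique integer in [-M/2, M/2), M = prod(moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the inverse of Mi mod m
    x %= M
    return x - M if 2 * x >= M else x


def matmul_crt(a, b, moduli=MODULI):
    """Exact integer product, valid while every true entry lies in [-M/2, M/2)."""
    per_mod = [matmul_mod(a, b, m) for m in moduli]
    n, p = len(a), len(b[0])
    return [
        [crt_symmetric([c[i][j] for c in per_mod], moduli) for j in range(p)]
        for i in range(n)
    ]


if __name__ == "__main__":
    a = [[1000, -2000], [300, 45000]]
    b = [[7, -8000], [9000, 12]]
    print(matmul_crt(a, b))
```

Each per-modulus product only ever multiplies 8-bit residues, which is the part a low-precision matrix engine could take over. The result is exact only as long as the true entries stay inside the range covered by the product of the moduli, so more moduli are needed for larger inputs.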

From quadibloc@quadibloc@invalid.com (John Savard) to comp.arch on Sun Jun 21 22:57:35 2026

From Newsgroup: comp.arch

I've looked for more information about this.

Although it's an efficient way to perform some FP64 computations
through the use of a resource that only supports FP8... it has the
limitation of only working at its most efficient if one is working
with square matrices.

So it is still useful to have a computer with real FP64, even if this
technique allows people doing scientific computations to bask in a
subsidy from people doing AI work.

John Savard
