Should an ISA contain an instruction that gives Write-Permission
from the Data Cache (or L2) line outward in the memory hierarchy,
while keeping the <now shared> line resident ?? {allow}
Should an ISA contain an instruction that invalidates (without
writing back) a Data Cache (or L2) line ?? {Discard}
On 2026-May-08 19:34, MitchAlsup wrote:
Should an ISA contain an instruction that gives Write-Permission
from the Data Cache (or L2) line outward in the memory hierarchy,
while keeping the <now shared> line resident ?? {allow}
Do you mean changes a single line from using write-invalidate
protocol to write-update so any remote writes are forwarded
by the home directory to the current line owner?
In effect, blocks line movement but not updates.
Or something else?
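The distinction EricP is asking about can be shown with a toy model: one line, two caches, and a "home" that applies the line's protocol when cache 0 writes. All names and the structure are invented for illustration; no real ISA or protocol is implied.

```c
#include <stdint.h>

enum policy { WRITE_INVALIDATE, WRITE_UPDATE };

struct copy { int valid; uint32_t data; };

/* Cache 0 writes value v; the home either invalidates the remote
 * copy (classic write-invalidate behavior) or forwards the update
 * so the remote copy stays resident with the new data. */
static void home_write(struct copy c[2], enum policy p, uint32_t v)
{
    c[0].valid = 1;
    c[0].data  = v;
    if (c[1].valid) {
        if (p == WRITE_INVALIDATE)
            c[1].valid = 0;   /* remote copy dropped; next read misses */
        else
            c[1].data = v;    /* update forwarded; copy stays resident */
    }
}
```

Under write-update the line stops moving but keeps absorbing remote writes, which is the "blocks line movement but not updates" reading above.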
Should an ISA contain an instruction that invalidates (without
writing back) a Data Cache (or L2) line ?? {Discard}
Not as an unprivileged instruction, or applications could un-zero
fields that had been intentionally zeroed out but were still held
in cache.
On 2026-05-08 7:34 p.m., MitchAlsup wrote:
Should an ISA contain an instruction that gives Write-Permission
from the Data Cache (or L2) line outward in the memory hierarchy,
while keeping the <now shared> line resident ?? {allow}
Trying to fathom what is going on with this. Is it an issue with keeping
the cache coherent? Sounds like the D$ cache line was write-protected
and now it is to be made writable?
Should an ISA contain an instruction that invalidates (without
writing back) a Data Cache (or L2) line ?? {Discard}
Q+ has several I$ and D$ cache operations wrapped up in a single
instruction called ‘CACHE’. I thought it best to put these in one instruction since they are infrequently used. The instruction has the
same format as a load/store but the source/dest register is replaced by
a command code. It uses the supplied address (if an address is needed).
Turn cache on/off (D$ only)
Invalidate entire cache (I$ or D$ or both)
Invalidate cache line (I$ or D$ or both)
Invalidate TLB
Invalidate TLB entry
Both the I$ and D$ caches can be invalidated with a single instruction.
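A minimal sketch of the decode side of that idea: load/store format, with the source/dest register field carrying a command code instead. The command codes and the helper below are invented for illustration, not Q+'s actual encoding.

```c
#include <stdint.h>

/* Invented command codes standing in for the operations listed above. */
enum cache_cmd {
    CMD_DC_ON, CMD_DC_OFF,          /* turn D$ on/off            */
    CMD_INV_ALL,                    /* invalidate entire cache   */
    CMD_INV_LINE,                   /* invalidate one cache line */
    CMD_INV_TLB,                    /* invalidate whole TLB      */
    CMD_INV_TLB_ENTRY               /* invalidate one TLB entry  */
};

/* "It uses the supplied address (if an address is needed)":
 * only the per-line and per-entry commands consume the address. */
static int cmd_uses_address(enum cache_cmd c)
{
    return c == CMD_INV_LINE || c == CMD_INV_TLB_ENTRY;
}
```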
Robert Finch <robfi680@gmail.com> posted:
On 2026-05-08 7:34 p.m., MitchAlsup wrote:
Should an ISA contain an instruction that gives Write-Permission
from the Data Cache (or L2) line outward in the memory hierarchy,
while keeping the <now shared> line resident ?? {allow}
Trying to fathom what is going on with this. Is it an issue with keeping the cache coherent? Sounds like the D$ cache line was write-protected
and now it is to be made writable?
Consider the stack, and after adding a number to SP there are now
a bunch of lines that are neither accessible nor containing a useful
value.
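The stack case can be made concrete with a little arithmetic: after SP += n, the whole cache lines that fall entirely inside the freed region [sp_old, sp_old + n) hold dead data and are candidates for a Discard without writeback. The function and its signature are invented for illustration.

```c
#include <stdint.h>

/* Count the cache lines fully contained in the freed stack region
 * [sp_old, sp_old + n), given a line size in bytes. Partially
 * covered lines at either end still hold live data and are excluded. */
static unsigned dead_lines(uint64_t sp_old, uint64_t n, uint64_t line)
{
    uint64_t first = (sp_old + line - 1) / line; /* first whole line  */
    uint64_t last  = (sp_old + n) / line;        /* one past the last */
    return last > first ? (unsigned)(last - first) : 0;
}
```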
Should an ISA contain an instruction that invalidates (without
writing back) a Data Cache (or L2) line ?? {Discard}
Q+ has several I$ and D$ cache operations wrapped up in a single instruction called ‘CACHE’. I thought it best to put these in one instruction since they are infrequently used. The instruction has the
same format as a load/store but the source/dest register is replaced by
a command code. It uses the supplied address (if an address is needed).
I have the same format, a memory reference that does not need a DST register specifier, so it becomes the OpCode.
Turn cache on/off (D$ only)
Why would you want the cache turned off??
Invalidate entire cache (I$ or D$ or both)
What if the cache is 1GB in size ??? This could take a long time.
Invalidate cache line (I$ or D$ or both)
Invalidate TLB
With a coherent TLB this is unnecessary.
Invalidate TLB entry
Both the I$ and D$ caches can be invalidated with a single instruction.
That may take a long time !
On 5/8/26 7:34 PM, MitchAlsup wrote:
Should an ISA contain an instruction that gives Write-Permission
from the Data Cache (or L2) line outward in the memory hierarchy,
while keeping the <now shared> line resident ?? {allow}
Intel added Cache Line WriteBack to memory to help with memory
persistence (IIRC), which can be viewed as a reliability
assertion (data will not be lost on power failure).
There could
also be performance reasons for pushing data outward while
retaining it locally in a clean (shared) state; a remote request
for the data might have lower latency (sourcing directly from
L3, e.g., rather than an L3 coherence directory indicating where
the data is and then having to request the data from, and force a
state change at, the owner).
For L1 to L2, cache line granularity might be too fine for
'checkpointing' data from a merely parity protected L1 to an
ECC-protected L2, though My 66000's VVM (with appropriate
acceleration) might make substantial blocks fast/low overhead.
On the other hand, assigning reliability factor at a page level
might be awkward from PTE bit starvation, granularity
inflexibility, and timing.
Would this also ensure data presence in outer cache/memory on a
clean line? E.g., if applied with an L2 target when L2 is non-
inclusive (but possibly tag inclusive or at least snoop
filtering) and the line is clean, would the line be written back
if not present in L2?
If one had a mode that disallowed escape of dirty lines, this
might be used as a means to commit temporary, local values. This
seems somewhat similar to a transactional memory mechanism,
though transactional memory would typically distinguish old
dirty lines (and perhaps clean ones) allowing them to be written
back on replacement.
I also wonder if this might be used to assist in determining
what cache indexes have been replaced in L2. With lazy writeback
the timing factors may be fuzzed more. My mind does not work
well for this type of problem.
Should an ISA contain an instruction that invalidates (without
writing back) a Data Cache (or L2) line ?? {Discard}
EricP pointed out a possible security issue if OS page zeroing
could be thwarted.
This could be worked around by having such
page (or cache line) zeroing use special cases that act as if
the zeroed memory was written back to memory. Forcing a
distinction between explicit zeroing to provide a base value
and zeroing to remove access to old data may invite software
bugs when the difference is not recognized/remembered.
This is similar to the problem that data cache block allocate
had where old data (that the current thread was not permitted to
read) of a possibly different address could be read. This was
generally "solved" by defining allocation as either no-op on a
cache hit and cache block zero on a miss.
Since doing nothing on a cache hit may not have been very
beneficial (one might save a bit of cache bandwidth) and zeroing
provides other benefits, block zeroing seems to be preferred
(though I still like allocation).
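The "solved" allocate semantics above (loosely modeled on PowerPC's dcba) can be sketched with a one-line direct-mapped toy cache: a hit is a no-op, a miss installs an all-zero line with no read of old memory, so stale data from another address is never exposed. Structure and names are invented.

```c
#include <stdint.h>
#include <string.h>

#define LINE 16

struct line { int valid; uint64_t tag; uint8_t data[LINE]; };

/* Data-cache-block-allocate, per the defined-safe semantics:
 * hit -> leave the data alone; miss -> allocate a zeroed line
 * without fetching the old contents from memory. */
static void dcba(struct line *l, uint64_t addr)
{
    uint64_t tag = addr / LINE;
    if (l->valid && l->tag == tag)
        return;                  /* hit: no-op                  */
    l->valid = 1;                /* miss: install zeroed line,  */
    l->tag   = tag;              /* never exposing stale data   */
    memset(l->data, 0, LINE);
}
```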
(This also is reminiscent of the Mill's unbacked memory, which
was memory that reads as zero [providing an implicit data cache
block zero] and has no physical memory address until evicted
from last level cache.
For highly temporary data, the data
would never leave the cache; this could also allow cache as
memory as long as no cache was forced to be written back. I do
not know if unbacked memory allowed an application to release
the memory, which would be like an invalidate without
writeback.)
Optimistic updates sounds similar to transactional memory or
versioned memory.
With respect to Stefan Monnier's seeing this as undefined
behavior, I think this might be presented similarly to memory
ordering with a weaker memory model. I.e., the result of a read
would still return a previously held value, but the "version"
might be unexpected. The result is not "undefined" but timing
dependent.
I suspect one would have to be very careful about defining how
such would interact with ESM (and perhaps other memory
interaction methods).
If the memory so cleared is thread local, I do not _think_
there would be consistency issues. (I think IBM defined "local"
memory transactions which supported speculation but not system
atomicity.) Yet I feel that there might be uses for value
checkpointing (versioning) where the address is shared by
multiple threads.
Obviously, hardware could in some cases interweave versions into
a consistent order, but forcing software to handle the cases
when hardware fails sounds problematic. Explicit checkpoints
like with transactional memory, might be easier for programmers
to use correctly than a fully flexible handling of speculation.
On the other hand, finer-grained control could allow software
to exploit knowledge that is not easily observed by (or
communicated to) hardware.
I think there are opportunities for versioned memory and/or
other timing/speculation manipulation, but I do not have a clue
about what interface should be presented to software. A RISC-
like approach of cache line control instructions could provide
flexibility, but the overhead for idiom recognition should also
be considered.
Modal operation (like transactional memory or ASM) simplifies
some aspects and complicates others.
I tend to favor complexity (flexibility), so my opinion is
dangerous.
Paul Clayton <paaronclayton@gmail.com> posted:
On 5/8/26 7:34 PM, MitchAlsup wrote:
Should an ISA contain an instruction that gives Write-Permission
from the Data Cache (or L2) line outward in the memory hierarchy,
while keeping the <now shared> line resident ?? {allow}
Intel added Cache Line WriteBack to memory to help with memory
persistence (IIRC), which can be viewed as a reliability
assertion (data will not be lost on power failure).
Thank you Paul ! An interesting rationale
There could
also be performance reasons for pushing data outward while
retaining it locally in a clean (shared) state; a remote request
for the data might have lower latency (sourcing directly from
L3, e.g., rather than an L3 coherence directory indicating where
the data is and then having to request the data from, and force a
state change at, the owner).
In a directory based caching system the directory is in a position
where ANY shared cache line can be granted the Exclusive state
{minimizing transfer distance}.
For L1 to L2, cache line granularity might be too fine for
'checkpointing' data from a merely parity protected L1 to an
ECC-protected L2, though My 66000's VVM (with appropriate
acceleration) might make substantial blocks fast/low overhead.
If you care about RAS, you cannot have write back L1 caches
with that property.
On the other hand, assigning reliability factor at a page level
might be awkward from PTE bit starvation, granularity
inflexibility, and timing.
Would this also ensure data presence in outer cache/memory on a
clean line? E.g., if applied with an L2 target when L2 is non-
inclusive (but possibly tag inclusive or at least snoop
filtering) and the line is clean, would the line be written back
if not present in L2?
A whole different can of worms.....
If one had a mode that disallowed escape of dirty lines, this
might be used as a means to commit temporary, local values. This
seems somewhat similar to a transactional memory mechanism,
though transactional memory would typically distinguish old
dirty lines (and perhaps clean ones) allowing them to be written
back on replacement.
Luckily, I have a fundamental disagreement on ISA-extensions that
provide SW the illusion that "lots of places" can be in intermediate
states (i.e. TM).
I also wonder if this might be used to assist in determining
what cache indexes have been replaced in L2. With lazy writeback
the timing factors may be fuzzed more. My mind does not work
well for this type of problem.
Should an ISA contain an instruction that invalidates (without
writing back) a Data Cache (or L2) line ?? {Discard}
EricP pointed out a possible security issue if OS page zeroing
could be thwarted.
Why is the OS zeroing a page that has already been mapped into
unprivileged VAS ???
Luckily, in My 66000, this zeroing is 1 instruction {MS #0,[&page]}
and the interconnect is designed to transport the page zero in one transaction.
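The architectural effect of that single instruction, written out as ordinary code. This is a sketch only: the 4 KiB page size and the function name are assumptions, and the interesting part, carrying the whole page zero as one interconnect transaction rather than 64 per-line writes, is below the level C can show.

```c
#include <stdint.h>
#include <string.h>

#define PAGE 4096u   /* assumed page size */

/* Memory-visible effect of MS #0,[&page]: every byte of the page
 * reads as zero afterwards. The hardware collapses this into one
 * interconnect transaction; software sees only the result. */
static void page_zero(uint8_t *page)
{
    memset(page, 0, PAGE);
}
```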
This could be worked around by having such
page (or cache line) zeroing use special cases that act as if
the zeroed memory was written back to memory. Forcing a
distinction between explicit zeroing to provide a base value
and zeroing to remove access to old data may invite software
bugs when the difference is not recognized/remembered.
This is similar to the problem that data cache block allocate
had where old data (that the current thread was not permitted to
read) of a possibly different address could be read. This was
generally "solved" by defining allocation as either no-op on a
cache hit and cache block zero on a miss.
VVM is allowed to 'allocate' cache lines (CI without Read) when
a line boundary is crossed and more than 1 complete line remains
in the loop--saving interconnect BW and coherence messages.
Since doing nothing on a cache hit may not have been very
beneficial (one might save a bit of cache bandwidth) and zeroing
provides other benefits, block zeroing seems to be preferred
(though I still like allocation).
(This also is reminiscent of the Mill's unbacked memory, which
was memory that reads as zero [providing an implicit data cache
block zero] and has no physical memory address until evicted
from last level cache.
I always liked that feature. I could not work it into a more
conventional architecture, except for the 'known' program stack.
For highly temporary data, the data
would never leave the cache; this could also allow cache as
memory as long as no cache was forced to be written back. I do
not know if unbacked memory allowed an application to release
the memory, which would be like an invalidate without
writeback.)
Known stack.
Optimistic updates sounds similar to transactional memory or
versioned memory.
With respect to Stefan Monnier's seeing this as undefined
behavior, I think this might be presented similarly to memory
ordering with a weaker memory model. I.e., the result of a read
would still return a previously held value, but the "version"
might be unexpected. The result is not "undefined" but timing
dependent.
SW would consider this undefined--SW depends (way too much) on
a read returning exactly the last thing written.
I suspect one would have to be very careful about defining how
such would interact with ESM (and perhaps other memory
interaction methods).
With any kind of ATOMIC thing it is WAY better to do it correctly
and SLOW than to take ANY chance of doing it wrong.
If the memory so cleared is thread local, I do not _think_
there would be consistency issues. (I think IBM defined "local"
memory transactions which supported speculation but not system
atomicity.) Yet I feel that there might be uses for value
checkpointing (versioning) where the address is shared by
multiple threads.
Obviously, hardware could in some cases interweave versions into
a consistent order, but forcing software to handle the cases
when hardware fails sounds problematic. Explicit checkpoints
like with transactional memory, might be easier for programmers
to use correctly than a fully flexible handling of speculation.
On the other hand, finer-grained control could allow software
to exploit knowledge that is not easily observed by (or
communicated to) hardware.
I think there are opportunities for versioned memory and/or
other timing/speculation manipulation, but I do not have a clue
about what interface should be presented to software. A RISC-
like approach of cache line control instructions could provide
flexibility, but the overhead for idiom recognition should also
be considered.
Modal operation (like transactional memory or ASM) simplifies
some aspects and complicates others.
I tend to favor complexity (flexibility), so my opinion is
dangerous.
On 5/10/26 8:33 PM, MitchAlsup wrote:[...]
The OS zeros the physical page before assigning it to the new
context (or more likely assigns a zero page and does copy on
write, which is just zeroing the page).
Luckily, in My 66000, this zeroing is 1 instruction {MS #0,[&page]}
and the interconnect is designed to transport the page zero in one
transaction.
This is more flexible than having cache line and page clearing
instructions.
Paul Clayton <paaronclayton@gmail.com> writes:
On 5/10/26 8:33 PM, MitchAlsup wrote:[...]
The OS zeros the physical page before assigning it to the new
context (or more likely assigns a zero page and does copy on
write, which is just zeroing the page).
Assigning a zero page for reading is a good idea. Copying that page
on writing appears inefficient to me, because it needs to read the
zero page into cache and write it to a newly allocated page.
On 5/10/26 8:33 PM, MitchAlsup wrote:
If you care about RAS, you cannot have write back L1 caches
with that property.
Different customers may have different preferences. I seem to
recall that Intel offered the option to replicate parity-
protected L1 data cache to allow recovery (at the cost of half
the capacity).
VVM is allowed to 'allocate' cache lines (CI without Read) when
a line boundary is crossed and more than 1 complete line remains
in the loop--saving interconnect BW and coherence messages.
That may be the most common use for avoiding read-to-own, but it
is not the only use.
With any kind of ATOMIC thing it is WAY better to do it correctly
and SLOW than to take ANY chance of doing it wrong.
Yes, though slow can also motivate incorrect software. Being
able to clearly communicate the dangers also seems important
(which argues for simplicity/orthogonality).
Paul Clayton <paaronclayton@gmail.com> writes:
On 5/10/26 8:33 PM, MitchAlsup wrote:[...]
The OS zeros the physical page before assigning it to the new
context (or more likely assigns a zero page and does copy on
write, which is just zeroing the page).
Assigning a zero page for reading is a good idea. Copying that page
on writing appears inefficient to me, because it needs to read the
zero page into cache and write it to a newly allocated page.
A better approach is to do just the writes. I think that zeroing the
page on demand is a good approach, because then it is already in the
D-cache, but AFAIK Linux actually zeros physical pages ahead of time,
typically on a separate (otherwise idle) core, and just maps one of
those pages to the virtual page that needs to be written to. I wonder
why Linux does that.
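The two strategies side by side, as a sketch: take a page from a pool zeroed ahead of time (as Anton says Linux does), or zero on demand at fault time, which leaves the page warm in the faulting core's D-cache. The pool layout, sizes, and names are invented for illustration.

```c
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

#define PAGE 4096u   /* assumed page size */

struct pool { uint8_t *pages[8]; int n; };

/* Prefer a pre-zeroed page from the pool (zeroed elsewhere, so cold
 * for this core); fall back to zeroing on demand, which also warms
 * the local D-cache with the page about to be written. */
static uint8_t *get_zero_page(struct pool *p)
{
    if (p->n > 0)
        return p->pages[--p->n];    /* pool path: pre-zeroed     */
    uint8_t *pg = malloc(PAGE);     /* demand path: zero it now, */
    if (pg) memset(pg, 0, PAGE);    /* page lands in our D$      */
    return pg;
}
```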
Luckily, in My 66000, this zeroing is 1 instruction {MS #0,[&page]}
and the interconnect is designed to transport the page zero in one
transaction.
This is more flexible than having cache line and page clearing
instructions.
In what way is it more flexible? It is a page-clearing instruction.
- anton
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
Robert Finch <robfi680@gmail.com> posted:
On 2026-05-08 7:34 p.m., MitchAlsup wrote:
Should an ISA contain an instruction that gives Write-Permission
from the Data Cache (or L2) line outward in the memory hierarchy,
while keeping the <now shared> line resident ?? {allow}
Trying to fathom what is going on with this. Is it an issue with keeping
the cache coherent? Sounds like the D$ cache line was write-protected
and now it is to be made writable?
Consider the stack, and after adding a number to SP there are now
a bunch of lines that are neither accessible nor containing a useful
value.
Seems to me that the code will certainly call another function
almost immediately that will simply reuse the already
present stack cache line; prematurely invalidating it will
actually slow things down.
I see no benefit in invalidating it pre-emptively.
It would certainly cause problems for code that intentionally
uses the soi-disant "free" stack space in legal but unusual ways.