• Re: Inverted Page Tables

    From Paul Clayton@paaronclayton@gmail.com to comp.arch on Mon Feb 16 19:29:45 2026
    From Newsgroup: comp.arch

    On 2/9/26 8:56 PM, Robert Finch wrote:
    On 2026-02-09 5:27 p.m., BGB wrote:
    [snip]

    My usual thing here was:
    In mixed page-size configurations, the MMU is set to the
    smallest in-use page size;
    When set to larger page sizes, a few of the bits that were
    low-order bits with a smaller page size change to being
    high-order bits with a larger page size (all other bits
    remaining in the same place).

    This makes sense if one knows (or can predict) the page size
    early enough.
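The bit movement described above can be sketched concretely
(this is an illustration, not any specific MMU's encoding):
the bits between two page-size shifts are exactly the ones
that change role from offset bits to page-number bits as the
page grows.

```python
# Sketch: splitting a virtual address into (virtual page
# number, page offset) for a given page-size shift.

def va_split(va: int, page_shift: int) -> tuple[int, int]:
    """Return (virtual page number, page offset)."""
    return va >> page_shift, va & ((1 << page_shift) - 1)

va = 0x12345678
vpn_4k, off_4k = va_split(va, 12)    # 4 KiB pages
vpn_64k, off_64k = va_split(va, 16)  # 64 KiB pages

# Bits 12..15 are VPN bits under 4 KiB pages but offset bits
# under 64 KiB pages; all other bits keep their roles.
assert vpn_4k & 0xF == off_64k >> 12
```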

    Itanium allowed different base page sizes for different
    "segments" which were looked up in a small table, so this would
    be similar to (if a bit slower than) per ASID base page sizes.

    One could also do prediction based on the register used for the
    base address (e.g., recording the last page size encountered
    when a register, or register group, was used).

    I suspect a simple predictor could be highly accurate (100% for
    programs that only use base size pages, if the predictor is
    initialized on thread switches). The stack is likely to use base
    page size and many other accesses would have temporal locality
    with respect to the page size.
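A minimal sketch of such a predictor, indexed by base-register
group (the table size and the base page shift of 12 are
illustrative assumptions, not from any real design):

```python
BASE_SHIFT = 12  # assumed smallest page size: 4 KiB

class PageSizePredictor:
    def __init__(self, num_groups: int = 8):
        # Initialize every group to the base page size,
        # e.g. on a thread switch.
        self.table = [BASE_SHIFT] * num_groups

    def predict(self, base_reg: int) -> int:
        # Predict the last page-size shift seen for this
        # register group.
        return self.table[base_reg % len(self.table)]

    def update(self, base_reg: int, actual_shift: int) -> None:
        # Record the page size actually encountered.
        self.table[base_reg % len(self.table)] = actual_shift
```

A program that only ever touches base-size pages never updates
away from BASE_SHIFT, giving the 100% accuracy noted above.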

    [snip]
    Seemed "obvious enough".

    Yes, such is obviously workable with page size known in advance.

    The paper seems to have the idea of subdividing associativity
    if multiple page sizes are in use at the same time. I guess
    this works, if one has enough associativity.

    The skewed associativity helps to reduce conflict misses, so the
    cost of lower associativity is reduced.

    [snip]
    {On 2/9/26 8:56 PM, Robert Finch wrote:}
    I found a slide presentation on the idea.

    I am wondering how an LRU system would work with the skewed
    entries. I assume the entries would be shifting between the
    ways for LRU. So, if there is a different-size page in one of
    the ways, would it still work?

    With ordinary skewed associativity, LRU would use compressed/
    truncated timestamps, but an alternative is bit per entry NRU
    with the interesting aspect being that skewing groups different
    NRU bits. The original skewed associativity paper ("Skewed
    Associative Caches", Seznec and Bodin, 1992) proposed using
    recently used bits with periodic clearing of all of the bits and
    choosing victims randomly first among not recently used, then
    among not modified. With a TLB 'prefer keeping larger pages'
    might be part of the policy.
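A sketch of that NRU-style policy: victims come from entries
whose recently-used bit is clear, with all bits cleared when no
candidate remains. Here a deterministic smallest-page
preference stands in for the paper's random choice, as the
'prefer keeping larger pages' tweak suggested above; that tweak
is not part of the original proposal.

```python
def choose_victim(entries):
    """entries: list of {'ru': bool, 'page_shift': int};
    returns the index of the chosen victim."""
    candidates = [i for i, e in enumerate(entries) if not e["ru"]]
    if not candidates:
        # Periodic clearing, forced here when everything is
        # marked recently used.
        for e in entries:
            e["ru"] = False
        candidates = list(range(len(entries)))
    # Prefer evicting the smallest page among the candidates.
    return min(candidates, key=lambda i: entries[i]["page_shift"])
```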

    (Timestamps could be a miss count. One might also have different
    offsets for different banks or sections of banks; I am guessing
    that such might reduce wraparound issues. For an L1 TLB, this
    might not require many bits to provide a close approximation of
    LRU.)

    A slightly later paper ("A case for two-way skewed-associative
    caches", Seznec, 1993) proposed one bit per pair of cache
    lines, with the bit set when bank zero is accessed and cleared
    when bank one is accessed.
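That per-pair bit amounts to the following sketch: the bit
records which bank was touched last, and the victim is the
other bank.

```python
class PairLRU:
    def __init__(self, num_sets: int):
        # One bit per set-pair; False means bank one (or
        # nothing) was accessed last.
        self.bit = [False] * num_sets

    def touch(self, set_index: int, bank: int) -> None:
        # Set on a bank-0 access, clear on a bank-1 access.
        self.bit[set_index] = (bank == 0)

    def victim(self, set_index: int) -> int:
        # Evict from the bank that was not accessed most recently.
        return 1 if self.bit[set_index] else 0
```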

    Sharing the indexing with two ways (e.g., two address hashing
    functions in a four-way associative cache) would allow the pair
    to use true LRU, limiting the damage from something like random
    replacement. Combining this with a use bit might further improve
    replacement choices.

    One might also track L1 evictions in an L2 TLB and use a
    "sticky/second chance" bit when such are soon returned to L1 so
    that they are less likely to be evicted (since all entries
    eligible for replacement might have sticky bits set, one would
    have to be a victim, though one might choose to use cuckoo
    replacement then). A victim cache would have similar use. With a
    non-clustered TLB, the moving overhead is lower than with a data
    cache because the blocks are small.

    One interesting aspect of skewed associative caches is that one
    can implement cuckoo (a.k.a. elbow) caches that choose a victim
    in one group of ways and look for a victim in the victim's other
    ways (potentially recursive to some depth). This can approach a
    fully associative cache. Of course, it also depends on having a
    replacement policy that works well with skewed associativity.
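A sketch of that elbow/cuckoo insertion for a two-way skewed
cache. The two hash functions here are toy stand-ins for real
skewing functions, and the recursion depth is an arbitrary
bound; on failure a real design would fall back to ordinary
replacement.

```python
NUM_SETS = 8

def h0(tag): return tag % NUM_SETS
def h1(tag): return (tag // NUM_SETS) % NUM_SETS  # illustrative only

def insert(banks, tag, depth=4):
    """banks: [dict, dict], each mapping set index -> tag.
    Returns True on success."""
    hs = (h0, h1)
    for bank in (0, 1):              # fast path: either way free
        s = hs[bank](tag)
        if s not in banks[bank]:
            banks[bank][s] = tag
            return True
    bank = 0
    while depth > 0:                 # elbow: displace and relocate
        s = hs[bank](tag)
        victim = banks[bank][s]
        banks[bank][s] = tag
        tag, bank = victim, 1 - bank  # try the victim's other bank
        s = hs[bank](tag)
        if s not in banks[bank]:
            banks[bank][s] = tag
            return True
        depth -= 1
    return False
```

With toy tags 9, 3, 67, 1 (chosen to collide), the final insert
displaces 9 from bank 0 into its free bank-1 slot, so all four
entries survive despite only two ways per set, which is the
sense in which this approaches full associativity.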

    The use of overlaid skewed associativity to support multiple
    page sizes may not be especially practical since one would
    typically want different page sizes to be much larger (16 times
    or more, probably), so one is less likely to have shared bits
    for indexing. Yet I still consider it a clever and potentially
    useful mechanism. For an L2 TLB with a modest number of page
    sizes used by a thread, hash-rehash probably works well enough.

    With huge pages, there is also more benefit to a separate TLB,
    since the tag and translation address will be smaller, though
    one could dynamically share storage and use sign-extension to
    use smaller tags for some pages.

    With page size prediction, it might be practical to only index
    the two most likely page sizes.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Paul Clayton@paaronclayton@gmail.com to comp.arch on Wed Feb 18 14:03:36 2026
    From Newsgroup: comp.arch

    On 2/9/26 12:36 AM, Robert Finch wrote:
    There used to be two separate bits in my MMU project PTE to
    encode the type and a shortcut page indicator. They were
    combined into a single two-bit field: 0=PTE, 1=PTP, and
    2=shortcut. That leaves an extra unused code. I am wondering
    what to use the code for. The only thing I can think of ATM is a
    meta-data record indicator.

    By "meta-data record indicator" do you mean 'invalid' (i.e.,
    whatever information is in the other bits is managed by the OS)?

    I have also been considering PTEs that are not a power-of-two in
    size. For instance, 48-bits or possibly 56 bits. The PTEs would
    be stored in the last part of a page with a page header and
    other information at the beginning of the page. For an 8kB page
    the last (or middle) 6kB would be PTEs. IDK what the header
    would be used for, other than perhaps some sort of error
    management. That is, if the first part or the last part of the
    page was overwritten, it could generate an error. Same as a
    header on a block of memory.

    I think one might want blocks of PTEs to be within a minimum
    memory access chunk (cache block or smaller). This avoids having
    to fetch two cache blocks for a PTE (if the memory system is
    limited to aligned fetches) and allows the extra bits in the
    cache block to be applied to the PTEs in that chunk (without
    having to fetch separately from the top of the table).

    With 32-byte chunks and 48-bit PTEs, one would have 16 bits of
    chunk-local metadata in most chunks. (Special casing one chunk
    feels extra funky somehow.) 16 bits does not seem like much.
    Using all the bits saved by a smaller PTE for chunk-local
    information — with a power-of-two number of PTEs per chunk —
    might be interesting. It is not clear whether it would be
    useful to have
    policies applied to 4- or 8-page chunks.
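The chunk arithmetic above can be checked directly (chunk and
PTE sizes as given in the text):

```python
def metadata_bits(chunk_bytes: int, pte_bits: int,
                  ptes_per_chunk: int) -> int:
    """Spare bits left in an aligned chunk after packing PTEs."""
    spare = chunk_bytes * 8 - ptes_per_chunk * pte_bits
    assert spare >= 0, "PTEs do not fit in the chunk"
    return spare

# Five 48-bit PTEs in a 32-byte chunk leave 16 metadata bits;
# dropping to a power-of-two four PTEs leaves 64 bits.
assert metadata_bits(32, 48, 5) == 16
assert metadata_bits(32, 48, 4) == 64
```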

    When a chunk could be coalesced into a single page in terms of
    translation, the address portions of all but one PTE could be
    repurposed (in theory). I do not know what information one would
    want to pack into such space. One interesting possibility would
    be to increase the physical address space, though that does not
    seem that useful (because of the awkwardness of applying only to
    "large" pages) and would not use that many bits.

    Having some information be common to a group of pages could save
    a few bits, but managing the extra complexity is probably
    something OS developers would avoid. E.g., if a group of pages
    shared the most significant N bits of translation — sort of inverted-from-usual page coloring — most OSes would just have
    the allowed physical address space be N bits smaller.
    --- Synchronet 3.21b-Linux NewsLink 1.2