If anyone is interested...
I stumbled over a stash of old Electronics magazine PDF's dating back
to the 1930's up to 1990's.
In the 1973-09-13 issue on page 118 is an article by Texas Instruments announcing the first single-transistor-per-cell 4096b DRAM,
with an explanation of how it works at the semiconductor level,
plus how to design a 16 kB ECC memory board using their chip.
That is followed by an article on various microcomputer bus designs.
https://www.worldradiohistory.com/Archive-Electronics/70s/73/Electronics-1973-09-13.pdf
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
In the 1973-09-13 issue on page 118 is an article by Texas Instruments
announcing the first single-transistor-per-cell 4096b DRAM,
with an explanation of how it works at the semiconductor level,
plus how to design a 16 kB ECC memory board using their chip.
That is followed by an article on various microcomputer bus designs.
https://www.worldradiohistory.com/Archive-Electronics/70s/73/Electronics-1973-09-13.pdf
Interesting read, thanks!
Clearly, semiconductor memory was the new hot thing (pun intended)
at the time; there was also an announcement from Intel in the same issue,
promising less than 0.1 mW per bit.
Referring to our previous discussions about an early
RISC, which would certainly have required a cache:
I took a look at the TI memory handbook from 1975, at https://www.synfo.nl/datasheets/TI_1975_Memory-Data-Book.pdf, which
had 1024*1 memory chips with 150 ns write cycle time (which is
what would have been used for cache, presumably). They also had
programmable ROMs of 4096 bits with 55 ns access time. From this
figure alone, anybody can be forgiven for thinking that microcode
is the way to go...
Yes, a RISC needs a cache or it just wastes all its potential
concurrency in stalls.
Yes, a RISC needs a cache or it just wastes all its potential
concurrency in stalls.
FWIW, the original ARM did not have a cache,
Stefan Monnier <monnier@iro.umontreal.ca> writes:
Yes, a RISC needs a cache or it just wastes all its potential
concurrency in stalls.
FWIW, the original ARM did not have a cache,
Indeed, the ARM2 used in the Archimedes does not have a cache and runs
rings around contemporary CISCs (including 386 and 68020, with a small I-cache on the 68020).
It runs at 8MHz, the same speed as the first HPPA implementation
(TS-1, a board, not a chip), which does have 64K+64K cache. However,
the ARM2 does not have an MMU, while the 386 and the TS-1 have one,
and the 68020 was usually used with an MMU.
It seems to me that ARM made this clock work with DRAM without cache
by making good use of staying in the same row: In particular,
consecutive instructions usually are from the same row. In addition,
ARM includes load-multiple and store-multiple instructions that access consecutive data that usually are in the same row.
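The row-locality argument above can be sketched with a toy timing model. All the numbers here are illustrative choices, not ARM2 or DRAM datasheet figures: a full RAS+CAS cycle is charged when the row changes, and a cheaper CAS-only cycle when it repeats.

```python
# Hypothetical DRAM timing model (numbers are illustrative, not from
# any datasheet): a full RAS+CAS cycle vs. a CAS-only same-row cycle.
ROW_BITS = 9          # assume a 512-byte DRAM row
FULL_CYCLE_NS = 250   # RAS + CAS access (new row)
PAGE_CYCLE_NS = 125   # CAS-only access within the open row

def access_time(addresses):
    """Sum access time, charging the cheap cycle when the row repeats."""
    total, last_row = 0, None
    for addr in addresses:
        row = addr >> ROW_BITS
        total += PAGE_CYCLE_NS if row == last_row else FULL_CYCLE_NS
        last_row = row
    return total

# Sequential instruction fetch mostly stays in one row:
sequential = list(range(0, 64, 4))          # 16 consecutive word fetches
scattered  = [i * 4096 for i in range(16)]  # 16 fetches, new row each time
print(access_time(sequential), access_time(scattered))  # 2125 4000
```

A load-multiple of 16 registers behaves like the `sequential` case: one row open, then page-mode cycles for the rest, nearly halving the total time versus scattered accesses.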
By contrast, note that the VAX 11/780 has a 5MHz clock (and about
10 CPI) and a cache. Even if the DRAM at the time of the VAX was
somewhat slower than at the time of the Archimedes, and the VAX has an
MMU, I am sure that an ARM-like RISC with an MMU, FPU and just DRAM
would have required less implementation effort and performed better
than the VAX 11/780 if implemented with the same technology as the VAX 11/780. If you add a cache to the RISC (as the VAX 11/780 has), even
better. If you convert the VAX 11/780 microcode store into a cache,
even better. And, to combat code size, use something like ARM T32
instead of A32, and the decoder and instruction buffering for that
would still fit in the implementation budget (the VAX 11/780 also has instruction buffering and a decoder for variable-length instructions).
- anton
For a VAX memory read it had to (roughly speaking):
(1) translate virtual->physical address
(2) go through the cache (read miss)
(3) get to the SBI (there is a 1 entry store buffer on cache output)
(4) negotiate for SBI
(5) SBI take 2 cycles to transmit control and read address
(6) memory controller does its thing
(7) memory controller negotiates for SBI
(8) memory controller transmits 32B cache line (1 control + 8*4B data clocks)
(9) cache receives and saves 32B cache line
(10) cache returns 4B value to 780 core
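As a rough illustration of how those steps add up, here is a toy tally. The SBI clock period of 200 ns matches the 780's 5 MHz clock, but every per-step cycle count below is a guess chosen for illustration, not a measured 11/780 figure.

```python
# Toy model of the ten read-miss steps above; per-step cycle counts
# are illustrative guesses, not measured VAX 11/780 figures.
SBI_CYCLE_NS = 200  # the SBI ran at the 780's 5 MHz clock

miss_steps = {
    "translate": 1,        # (1) virtual->physical
    "cache_lookup": 1,     # (2) detect the miss
    "sbi_arbitrate": 2,    # (3)-(4) reach and win the bus
    "sbi_command": 2,      # (5) control + read address
    "dram_access": 3,      # (6) memory controller does its thing
    "sbi_rearbitrate": 1,  # (7) memory controller wins the bus back
    "sbi_data": 9,         # (8) 1 control + 8 data clocks for 32B
    "fill_and_return": 2,  # (9)-(10) fill line, return 4B to the core
}
total_cycles = sum(miss_steps.values())
miss_ns = total_cycles * SBI_CYCLE_NS
print(f"read miss ~{miss_ns} ns ({total_cycles} SBI cycles)")
```

Even with these optimistic counts, a miss costs on the order of twenty bus cycles, which is why the discussion below about cache line size and bank interleaving matters.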
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
For a VAX memory read it had to (roughly speaking):
(1) translate virtual->physical address
(2) go through the cache (read miss)
(3) get to the SBI (there is a 1 entry store buffer on cache output)
(4) negotiate for SBI
(5) SBI take 2 cycles to transmit control and read address
(6) memory controller does its thing
(7) memory controller negotiates for SBI
(8) memory controller transmits 32B cache line (1 control + 8*4B data clocks)
(9) cache receives and saves 32B cache line
(10) cache returns 4B value to 780 core
The VAX cache line was 8 bytes according to https://dl.acm.org/doi/pdf/10.1145/357353.357356
..
Anton Ertl wrote:
It seems to me that ARM made this clock work with DRAM without cache
by making good use of staying in the same row: In particular,
consecutive instructions usually are from the same row.
There is a copy of the ARM-1 Hardware Reference Manual from 1986 here
http://chrisacorns.computinghistory.org.uk/docs/Acorn/OEM/OEM.html
(1) The MMU (if any) was external to the cpu (ie "not their problem")
(2) It looks like the RAS and CAS DRAM signals came directly from the
ARM cpu chip which was designed to work synchronously with DRAM.
(3) There was only 1 memory bank.
(4) There was no cache
(5) There was no standard system bus to plug in IO adapters
By comparison, the VAX-780 had an MMU and a system bus,
the Synchronous Backplane Interface (SBI) onto which all memory and
IO adapter boards were hung. There were multiple memory boards.
For a VAX memory read it had to (roughly speaking):
(1) translate virtual->physical address
Thomas Koenig wrote:
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
For a VAX memory read it had to (roughly speaking):
(1) translate virtual->physical address
(2) go through the cache (read miss)
(3) get to the SBI (there is a 1 entry store buffer on cache output)
(4) negotiate for SBI
(5) SBI take 2 cycles to transmit control and read address
(6) memory controller does its thing
(7) memory controller negotiates for SBI
(8) memory controller transmits 32B cache line (1 control + 8*4B data
clocks)
(9) cache receives and saves 32B cache line
(10) cache returns 4B value to 780 core
The VAX cache line was 8 bytes according to
https://dl.acm.org/doi/pdf/10.1145/357353.357356
..
You are correct. I was thinking it was 8 words.
(That's a lot of tag overhead for such a small data cache.)
So 2*4B data clocks.
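The tag-overhead remark can be made concrete with a back-of-the-envelope calculation. The cache organization here (direct-mapped, 8 KB) and the 30-bit physical address are assumptions chosen for illustration, not the 780's actual organization.

```python
# Back-of-the-envelope tag overhead, assuming a direct-mapped 8 KB
# cache and a 30-bit physical address (both illustrative assumptions).
def tag_overhead(cache_bytes, line_bytes, paddr_bits=30):
    lines = cache_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1   # bits to index within a line
    index_bits = lines.bit_length() - 1         # bits to select the line
    tag_bits = paddr_bits - index_bits - offset_bits
    tag_store = lines * tag_bits                # total tag bits
    data_store = cache_bytes * 8                # total data bits
    return tag_store / data_store

print(f"8B lines:  {tag_overhead(8192, 8):.1%}")   # ~26.6% tag overhead
print(f"32B lines: {tag_overhead(8192, 32):.1%}")  # ~6.6% tag overhead
```

Quadrupling the line size cuts the number of tags by four, which is most of the reason small-line caches pay so heavily in tag storage.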
EricP <ThatWouldBeTelling@thevillage.com> writes:
Anton Ertl wrote:
It seems to me that ARM made this clock work with DRAM without cache
by making good use of staying in the same row: In particular,
consecutive instructions usually are from the same row.
This is called page mode; not sure if that was available when the VAX
was designed. Later (supposedly starting in 1986) fast page mode was introduced (not sure how that affected performance). The ARM1
hardware reference manual says "150 nanoseconds row access DRAM" and
"8 MIPS peak". Not sure how 150 nanoseconds and 8 MIPS go together,
given that every instruction needs a memory access.
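One way to reconcile the two figures, assuming (my assumption, not from the manual) that the 150 ns applies only to opening a new row and that a page-mode column access is markedly faster:

```python
# Back-of-the-envelope check: 8 MIPS peak implies 125 ns per
# instruction, which a 150 ns row access cannot sustain.  Assuming
# (illustrative figure, not from the ARM1 manual) a page-mode column
# access of about 60 ns, sequential fetches in an open row can.
ns_per_instr = 1e9 / 8e6        # 125.0 ns per instruction at 8 MIPS
row_access_ns = 150             # per the ARM1 manual
page_cycle_ns = 60              # assumed page-mode access time

print(ns_per_instr)                      # 125.0
print(row_access_ns > ns_per_instr)      # True: random access too slow
print(page_cycle_ns < ns_per_instr)      # True: page mode fast enough
```

So 8 MIPS would be a peak rate reachable only while fetches stay within the open row, consistent with the "seq" signal described below being used to select a fast memory mode.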
There is a copy of the ARM-1 Hardware Reference Manual from 1986 here
http://chrisacorns.computinghistory.org.uk/docs/Acorn/OEM/OEM.html
(1) The MMU (if any) was external to the cpu (ie "not their problem")
(2) It looks like the RAS and CAS DRAM signals came directly from the
ARM cpu chip which was designed to work synchronously with DRAM.
I see no RAS and CAS pins in the ARM pinout. What I see is 26 address
lines, while 13 would have been enough if the memory controller was in
the ARM1. They apparently were not very worried about pin count for
the chip, so they did not even multiplex address bus (26 pins) and
data bus (32 pins) the way many others did.
The ARM1 has a "translate" signal for telling the MMU (not existing on
the first systems) that this is a virtual address.
The ARM1 also has a "seq" output signal that indicates sequential
memory access: "It may be used, in combination with the low-order
address lines, to indicate that the next cycle can use a fast memory
mode (for example DRAM page mode) and/or to bypass the address
translation system."
(3) There was only 1 memory bank.
Page 23 says that the ARM co-processor board (it's a co-processor to
the BBC Model B, not a co-processor to the ARM) carries "2MBytes DRAM,
a bootstrap ROM, and an additional 2MBytes of DRAM on a daughter
board". On page 24 it says "IC23 to IC150 [...] ICs that make up the
4MBytes of RAM". I.e., 128 chips; sounds like at least 4 banks of RAM
to me.
Page 27 says: "RAS0..RAS3 [...] There are four banks of RAMs"
(4) There was no cache
(5) There was no standard system bus to plug in IO adapters
The co-processor board also accesses the host system through
memory-mapped I/O (the ARM1 has no I/O pins), and the BBC Micro has a
system bus with I/O.
The Archimedes A400 series (with an ARM2) has a 4-slot backplane.
By comparison, the VAX-780 had an MMU and a system bus,
the Synchronous Backplane Interface (SBI) onto which all memory and
IO adapter boards were hung. There were multiple memory boards.
For a VAX memory read it had to (roughly speaking):
(1) translate virtual->physical address
For sequential accesses, the same translation can be used in the usual
case, just as the same DRAM row can be used in the usual case. This
means that the usual case can be as fast as without translation.
In the unusual case, the translation will add latency, yes. But
that's true even with caches, unless you have a virtually addressed
cache (which is the common case for L1 these days).
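The translation-reuse idea amounts to a one-entry TLB. A minimal sketch, with the page table and hit bookkeeping made up for illustration (the 512-byte page size is the VAX's):

```python
# Sketch of reusing the last translation for same-page accesses,
# i.e. a one-entry TLB.  The page table here is made up.
PAGE_SHIFT = 9  # VAX pages were 512 bytes

class OneEntryTLB:
    def __init__(self, page_table):
        self.page_table = page_table
        self.last_vpn = None
        self.last_pfn = None
        self.hits = self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        if vpn == self.last_vpn:
            self.hits += 1            # fast path: no table walk
        else:
            self.misses += 1          # slow path: walk the table
            self.last_vpn = vpn
            self.last_pfn = self.page_table[vpn]
        return (self.last_pfn << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))

tlb = OneEntryTLB({0: 7, 1: 3})
for a in range(0, 1024, 4):           # 256 sequential accesses, 2 pages
    tlb.translate(a)
print(tlb.hits, tlb.misses)           # 254 2
```

For a sequential sweep, only the first access to each page pays for translation, which is exactly the "usual case is as fast as without translation" claim above.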
One thing that's possible if you are willing to pay for a more complex
system (as the VAX 11/780 microarchitects clearly were) is to have
separate control, address and data lines for the different banks, and
use that to access the banks alternatingly, with a bandwidth advantage.
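A toy model of that alternating-bank scheme, with made-up cycle counts, shows the bandwidth effect: while one bank is busy completing its access, the other can start.

```python
# Toy model of bank interleaving: consecutive accesses round-robin
# across banks with separate control/address/data lines.  Cycle
# counts are illustrative, not VAX 11/780 figures.
BANK_BUSY = 4  # cycles a bank is occupied per access

def stream_cycles(n_accesses, n_banks):
    """Cycles to stream n_accesses, issuing one address per cycle."""
    free_at = [0] * n_banks           # cycle at which each bank is free
    t = 0
    for i in range(n_accesses):
        b = i % n_banks
        start = max(t, free_at[b])    # wait if the bank is still busy
        free_at[b] = start + BANK_BUSY
        t = start + 1                 # next address can issue next cycle
    return max(free_at)

print(stream_cycles(16, 1), stream_cycles(16, 2))  # 64 33
```

Two banks come close to doubling streaming throughput here; the limit is the bank-busy time, not the address rate.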
But yes, in general bigger memory subsystems tend to be slower. The
VAX 11/780 compensated that partly with its cache.
- anton
Anton Ertl wrote:
EricP <ThatWouldBeTelling@thevillage.com> writes:
Yes, but as I discovered playing around with my "TTL pipelined risc-VAX" design that adds parallel buses: buses with plug-in boards need connectors, and you quickly run out of PCB edge pins.