From Newsgroup: comp.sys.ibm.ps2.hardware
http://perilsofparallel.blogspot.com/2009/01/multi-multicore-single-system-image.html
http://perilsofparallel.blogspot.com/2009/01/multi-multicore-single-system-image.html?showComment=1241145060000#c8871026021207257403
"Cameron Bahar said...
Greg,
I'm a fan of your book and I read it after working at Locus Computing
for a good number of years. I was interested to learn more about
different cluster architectures and why Locus had essentially failed,
even though we all thought it was so cool and great to have this
technology. At
Locus, working under the leadership of Jerry Popek and company we did
actually build an SSI Unix-flavored operating system for IBM called
AIX-TCF (Transparent Computing Facility). IBM ran into trouble in 1992
and moved operations to Austin, Texas, and essentially disconnected from
Locus over the years after Locus refused a buyout offer from IBM, which
was a mistake. A lot of the Locus folks ended up at
Interactive/Sun/Solaris-X86, and thus was born the Solaris X86 operating
system. Locus's demise benefited Sunsoft and Solaris X86, even though
Sun didn't leverage the X86 platform, to its own demise! It will be
interesting to see what Larry will do with Solaris.
Now to SSI and why it didn't take off. I think there are a number of
reasons. Foremost, the concept and vision are certainly attractive. We
had single system semantics across nodes and all OS services were
distributed. We could migrate processes from one node to another. We had
remote devices. We had a distributed replication filesystem (I worked on
that a little) and transparency all over the system. But I think the
main problem was that Ethernet got us effectively 4-5 Mbit/s on a
10 Mbit/s link, and we were a tightly coupled operating system passing
messages for consistency and coherency between nodes. The slow network
caused lots of problems and things didn't always converge to a stable
steady state. As you increase the number of nodes, the complexity grows,
the number of messages increases, and scalability suffers. HA was also
problematic and not well thought out, and we didn't have shared SANs to
help keep data around. So a shared nothing SSI cluster with a slow
network is dead in its tracks.
I joined Teradata after Locus and there we had a shared nothing database
with a dedicated fibre optic network with Gigabit links. Now this worked
a lot better.
The distinction between Teradata and Locus is interesting, because
Teradata is extremely successful and still leads in performance today.
What is the difference?
Teradata runs 1 application and that is a parallel database for data
mining. One function, entire system optimized for an RDBMS with a
specific workload. They have a parallel application interface but the
whole thing always runs the database and is not GENERAL PURPOSE. So they
pick a problem and solve it well.
With Locus, the idea was that the system is GENERAL PURPOSE, all apps
will run unmodified, and it's the holy grail. As I've learned over the
last 20 years, systems are built and optimized for certain workloads and
there's no one-size-fits-all system.
So HP/IBM/Intel/Tandem/Pyramid and all others bought the Kool-Aid, but
at the end of the day it just doesn't work and is not needed.
What worked was a Unix server and NFS for remote access to data. That
model WON and SSI lost. Why did this win? Because it was a loosely
coupled collection of independent nodes, and the work was partitioned
among the nodes with great autonomy between the nodes, i.e., no message
passing.
Jerry passed away last year, god rest his soul. He was a visionary and a
great teacher. He also wrote the first paper on virtualization, which
VMware is based on. At least the virtualization idea has caught on and
has spawned the cloud computing era. Again, independent nodes with
single operating systems providing service, with no message passing.
A few years ago I founded a company called ParaScale (parallel
scalability). We're building a cloud storage platform, which basically
means application-layer software that runs on top of a number of
Linux servers and federates them into a single cloud storage platform
providing file services over standard protocols. Again, the idea here is
loosely coupled, autonomous nodes all operating with the guidance of a
few master nodes that are highly available. By learning from the past
and focusing on providing a specific service (in this case storage
services) we hope to achieve our goals of reliability and scalability
and virtualization.
I enjoyed reading your multi-part post on SSI and it brought back some
good memories. We have a few Locus people working at ParaScale who are
still solving interesting distributed system problems.
Best,
Cameron
--
April 30, 2009 at 8:31 PM"