• Article: Multi-Multicore Single System Image / Cloud Computing. A Good Idea? (part 1)

    From Louis Ohland@ohland@charter.net to comp.sys.ibm.ps2.hardware on Tue Jun 23 08:38:40 2026
    From Newsgroup: comp.sys.ibm.ps2.hardware

    http://perilsofparallel.blogspot.com/2009/01/multi-multicore-single-system-image.html

    http://perilsofparallel.blogspot.com/2009/01/multi-multicore-single-system-image.html?showComment=1241145060000#c8871026021207257403

    "Cameron Bahar said...
    Greg,

    I'm a fan of your book and I read it after working at Locus Computing
    for a good number of years. I was interested to learn more about
    different cluster architectures and why Locus had essentially failed
    even though we all thought it was so cool and great to have this
    technology. At Locus, working under the leadership of Jerry Popek and
    company, we did actually build an SSI Unix-flavored operating system
    for IBM called AIX-TCF (Transparent Computing Facility). IBM ran into
    trouble in 1992 and moved operations to Austin, Texas, and essentially
    disconnected from Locus over the years after Locus refused a buyout
    offer from IBM, which was a mistake. A lot of the Locus folks ended up
    at Interactive/Sun/Solaris-X86 and thus was born the Solaris X86
    operating system. Locus's demise benefited Sunsoft and Solaris X86,
    even though Sun didn't leverage the X86 platform, to its own demise!
    It will be interesting to see what Larry will do with Solaris.

    Now to SSI and why it didn't take off. I think there are a number of
    reasons. Foremost, the concept and vision is certainly attractive. We
    had single system semantics across nodes and all OS services were
    distributed. We could migrate processes from one node to another. We
    had remote devices. We had a distributed replication filesystem (I
    worked on that a little) and transparency all over the system. But I
    think the main problem was that Ethernet got us effectively 4-5
    Mbits/sec on a 10 Mbit link and we were a tightly coupled operating
    system passing messages for consistency and coherency between nodes.
    The slow network caused lots of problems and things didn't always
    converge to a stable steady state. As you increase the number of
    nodes, the complexity grows, the number of messages increases, and
    scalability suffers. HA was also problematic and not well thought out,
    and we didn't have shared SANs to help keep data around. So a
    shared-nothing SSI cluster with a slow network is dead in its tracks.

    I joined Teradata after Locus and there we had a shared-nothing
    database with a dedicated fibre optic network with gigabit links. Now
    this worked a lot better.

    The distinction between Teradata and Locus is interesting, because
    Teradata is extremely successful and still leads in performance today.

    What is the difference?

    Teradata runs one application and that is a parallel database for data
    mining. One function, entire system optimized for an RDBMS with a
    specific workload. They have a parallel application interface but the
    whole thing always runs the database and is not GENERAL PURPOSE. So they
    pick a problem and solve it well.

    With Locus, the idea was that this is GENERAL PURPOSE, all apps will
    run unmodified, and it's the holy grail. As I've learned over the last
    20 years, systems are built and optimized for certain workloads and
    there's no one-size-fits-all system.

    So HP/IBM/Intel/Tandem/Pyramid and all the others bought the Kool-Aid,
    but at the end of the day it just doesn't work and is not needed.

    What worked was a Unix server and NFS for remote access to data. That
    model WON and SSI lost. Why did this win? Because it was a loosely
    coupled collection of independent nodes, and the work was partitioned
    among the nodes with great autonomy between them, i.e., no message
    passing.

    Jerry passed away last year, God rest his soul. He was a visionary and
    a great teacher. He also wrote the first paper on virtualization, which
    VMware is based on. At least the virtualization idea has caught on and
    has spawned the cloud computing era. Again, independent nodes with
    single operating systems providing service, no message passing.

    A few years ago I founded a company called ParaScale (parallel
    scalability). We're building a cloud storage platform, which basically
    means application-layer software that runs on top of a number of Linux
    servers and federates them into a single cloud storage platform
    providing file services over standard protocols. Again, the idea here
    is loosely coupled, autonomous nodes all operating with the guidance of
    a few master nodes that are highly available. By learning from the past
    and focusing on providing a specific service (in this case storage
    services) we hope to achieve our goals of reliability, scalability,
    and virtualization.

    I enjoyed reading your multi-part post on SSI and it brought back some
    good memories. We have a few Locus people working at ParaScale and
    still solving interesting distributed system problems.

    Best,
    Cameron
    --

    April 30, 2009 at 8:31 PM"