• Book to catch up on current AI?

    From Mike Spencer@mds@bogus.nodomain.nowhere to comp.misc on Wed Sep 10 01:40:52 2025
    From Newsgroup: comp.misc


    Nearly 40 years ago, the MIT Press published Parallel Distributed
    Processing, Vol. 1 & 2, by Rumelhart, McClelland et al. I read those
    as well as similar material published at MIT in the early 90s, wrote
    some functional toy code (on an Osborne I).

    But I haven't kept up.

    Can someone suggest books at more or less the same level of
    technicality that I might look at to catch up a bit on how neural
    nets are now constructed, trained, connected etc. to produce what is
    being called "large language models"?

    The net is of course rife, indeed inundated, with stuff on the topic.
    But the vast bulk of it falls into one of two categories. One category
    is mass media news and pop science reporting, intended to provoke "Oh,
    gee whiz" from the average person or, at best, a vague notion of the
    subject for the literate but non-technical. The other category
    is material intended for someone who has read all the technical
    literature for the last 40 years or at least has obtained a master's
    degree in AI computing/theory in the last decade. In the latter case,
    just the terminology is a barrier.

    I'm now an old guy. I'm not going to master all the math that has
    evolved since PDP, but I'd like to get a more or less caught-up handle
    on how this stuff works internally.

    Any suggestions?

    [ Yes, I had a look at some of the AI newsgroups. Moribund or
    hijacked by politics.]
    --
    Mike Spencer Nova Scotia, Canada
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David LaRue@huey.dll@tampabay.rr.com to comp.misc on Wed Sep 10 06:59:12 2025
    From Newsgroup: comp.misc

    Mike Spencer <mds@bogus.nodomain.nowhere> wrote in news:874itakjhn.fsf@enoch.nodomain.nowhere:


    > Nearly 40 years ago, the MIT Press published Parallel Distributed
    > Processing, Vol. 1 & 2, by Rumelhart, McClelland et al. I read those
    > as well as similar material published at MIT in the early 90s, wrote
    > some functional toy code (on an Osborne I).
    >
    > But I haven't kept up.
    >
    > Can someone suggest books at more or less the same level of
    > technicality that I might look at to catch up a bit on how neural
    > nets are now constructed, trained, connected etc. to produce what is
    > being called "large language models"?
    >
    > The net is of course rife, indeed inundated, with stuff on the topic.
    > But the vast bulk of it falls into one of two categories. One category
    > is mass media news and pop science reporting, intended to provoke "Oh,
    > gee whiz" from the average person or, at best, a vague notion of the
    > subject for the literate but non-technical. The other category
    > is material intended for someone who has read all the technical
    > literature for the last 40 years or at least has obtained a master's
    > degree in AI computing/theory in the last decade. In the latter case,
    > just the terminology is a barrier.
    >
    > I'm now an old guy. I'm not going to master all the math that has
    > evolved since PDP, but I'd like to get a more or less caught-up handle
    > on how this stuff works internally.
    >
    > Any suggestions?
    >
    > [ Yes, I had a look at some of the AI newsgroups. Moribund or
    > hijacked by politics.]

    Hi Mike,

    You might look up Stephane Charette's work on Darknet. He has several
    examples and technical descriptions online. His work and examples
    mainly focus on recognition of objects in still video frames. LLMs,
    from my limited understanding, are trained on massive text sets,
    assembled by grabbing everything the developers deem potentially
    useful, and those sets are used to train very large networks (a toy
    sketch of that idea follows below the link).

    He does explain the math and training concepts he found useful.

    https://www.youtube.com/c/StephaneCharette/videos
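
    To make that "massive text plus massive network" idea a little more
    concrete, here is a deliberately tiny sketch in plain Python/numpy
    (my own toy example, nothing to do with Charette's code): a
    character-level bigram model trained by gradient descent on the
    next-token-prediction objective. Real LLMs replace the single weight
    matrix with a stack of transformer layers, but the training loop is
    the same idea.

        import numpy as np

        text = "the quick brown fox jumps over the lazy dog "
        chars = sorted(set(text))
        stoi = {c: i for i, c in enumerate(chars)}     # char -> integer id
        V = len(chars)                                 # vocabulary size

        # training pairs: (current character, next character)
        xs = np.array([stoi[c] for c in text[:-1]])
        ys = np.array([stoi[c] for c in text[1:]])

        rng = np.random.default_rng(0)
        W = rng.normal(0.0, 0.1, size=(V, V))          # the model's only weights

        lr = 1.0
        for step in range(200):
            logits = W[xs]                             # one row of W per input char
            logits = logits - logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)  # softmax over next-char guesses
            loss = -np.log(probs[np.arange(len(ys)), ys]).mean()

            grad = probs.copy()                        # d(loss)/d(logits), cross-entropy
            grad[np.arange(len(ys)), ys] -= 1.0
            grad /= len(ys)
            np.add.at(W, xs, -lr * grad)               # gradient-descent weight update

        print("final loss:", round(float(loss), 3))

    Scale that loop up to billions of weights, sub-word tokens instead of
    characters, and attention layers instead of a lookup table, and you
    have, very roughly, what the LLM builders are doing.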
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.misc on Wed Sep 10 09:15:24 2025
    From Newsgroup: comp.misc

    Mike Spencer <mds@bogus.nodomain.nowhere> wrote or quoted:
    > Can someone suggest books at more or less the same level of
    > technicality that I might look at to catch up a bit on how neural
    > nets are now constructed, trained, connected etc. to produce what is
    > being called "large language models"?

    From short to long:

    "The Little Book of Deep Learning" - François Fleuret
    (GPT in section 5.3 "Attention Models")

    "Understanding Deep Learning" - Simon J.D. Prince (section 12.5
    "Transformers for natural language processing")

    "Dive into Deep Learning" - Aston Zhang, Zachary C. Lipton,
    Mu Li, and Alexander J. Smola (chapter 11 "Attention
    Mechanisms and Transformers")

    Hands-on:

    "The Complete Guide to Deep Learning with Python Keras, Tensorflow,
    And Pytorch" - Joseph G. Derek (section 15.4 "NLP Applications:
    Chatbots, Sentiment Analysis Tools") - only conditionally
    recommended due to deficiencies in source code formatting
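
    All of the chapters named above revolve around "attention", and the
    word sounds more exotic than the operation is. As a taste of the
    terminology, here is a minimal numpy sketch of scaled dot-product
    attention (my own toy example, not taken from any of these books):

        import numpy as np

        def attention(Q, K, V):
            """Each query row receives a weighted mix of the value rows,
            weighted by how well that query matches each key."""
            d = Q.shape[-1]
            scores = Q @ K.T / np.sqrt(d)              # query/key similarity
            w = np.exp(scores - scores.max(axis=-1, keepdims=True))
            w /= w.sum(axis=-1, keepdims=True)         # softmax per query
            return w @ V                               # weighted blend of the values

        rng = np.random.default_rng(0)
        Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))  # 5 tokens, 8 dims each
        print(attention(Q, K, V).shape)                # -> (5, 8)

    A transformer is, loosely, many such attention layers stacked and
    trained end to end on a next-token objective; the books above fill in
    the rest.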


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Bruce@07.013@scorecrow.com to comp.misc on Mon Sep 15 17:44:41 2025
    From Newsgroup: comp.misc

    On 10/09/2025 05:40, Mike Spencer wrote:
    > Any suggestions?

    <https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/>

    (In case you didn't recognise the name, Stephen Wolfram is the person
    behind Mathematica <https://www.wolfram.com/mathematica/>)
    --
    Bruce Horrocks
    Hampshire, England
    --- Synchronet 3.21a-Linux NewsLink 1.2