• Extracting all keys from a json string

    From Gerhard Reithofer@g.reithofer@tech-edv.co.at to comp.lang.tcl on Mon Sep 15 18:58:15 2025
    From Newsgroup: comp.lang.tcl

    Hi *,

    I'd like a method to retrieve the complete key structure from a
    json string and I'm using the json tcllib module.
    It may contain an unknown number of nested levels and json arrays.
    I have found some solutions which basically work, but I cannot
    reliably distinguish between a normal - possibly nested - json
    object and a json array.

    All examples I have found fail on json arrays.

    The extracted or created keys may eventually be used for accessing the corresponding values - json does not create specific ids for the array entries; they are a plain list from tcl's point of view.

    Has anyone tried or mastered this challenge?

    Here's a typical example:
    ---
    https://www.tech-edv.co.at/download/testdata/livedata_20250914.txt
    --
    Gerhard.Reithofer
    http://www.tech-edv.co.at
    -- new email address --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to comp.lang.tcl on Mon Sep 15 19:49:25 2025
    From Newsgroup: comp.lang.tcl

    Gerhard Reithofer <g.reithofer@tech-edv.co.at> wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a
    json string and I'm using the json tcllib module.
    It may contain an unknown number of nested levels and json arrays.
    I have found some solutions which basically work, but I cannot
    reliably distinguish between a normal - possibly nested - json
    object and a json array.

    This is because the tcllib json module does not return any typing
    information, so you have to guess as to whether you have an "object" or
    an "array" at any given level.
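    That ambiguity is easy to demonstrate; a minimal sketch with tcllib's json (the two inputs below are made up for the demo):

```tcl
package require json

# An object and an array that map to the very same Tcl value:
set obj {{"x": 1}}
set arr {["x", 1]}

puts [json::json2dict $obj]   ;# -> x 1  (a dict)
puts [json::json2dict $arr]   ;# -> x 1  (a list)
```

    Both calls return the string "x 1", so nothing in the result tells you which input was an object and which an array.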

    If you want to reliably parse unknown json and reliably know if you
    have json objects or json arrays at any given level of the JSON tree,
    your best bet is to parse the json with a newer Tdom that does preserve
    such typing in the dom tree it outputs:

    https://tdom.org/index.html/doc/trunk/doc/dom.html

    -json
    If -json is specified, the data is expected to be a valid JSON
    string (according to RFC 7159). The command returns an ordinary
    DOM document with nesting token inside the JSON data translated
    into tree hierarchy. If a JSON array value is itself an object or
    array then container element nodes named (in a default build)
    arraycontainer or objectcontainer, respectively, are inserted into
    the tree. The JSON serialization of this document (with the domDoc
    method asJSON) is the same JSON information as the data, preserving
    JSON datatypes, allowing non-unique member names of objects while
    preserving their order and the full range of JSON string values.
    JSON datatype handling is done with an additional property
    "sticking" at the doc and tree nodes. This property isn't
    contained in an XML serialization of the document. If you need to
    store the JSON data represented by a document, store the JSON
    serialization and parse it back from there. Apart from this JSON
    type information the returned doc command or handle is an ordinary
    DOM doc, which may be investigated or modified with the full range
    of the doc and node methods. Please note that the element node
    names and the text node values within the tree may be outside of
    what the appropriate XML productions allow.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Alan Grunwald@nospam.nurdglaw@gmail.com to comp.lang.tcl on Tue Sep 16 13:16:58 2025
    From Newsgroup: comp.lang.tcl

    On 15/09/2025 17:58, Gerhard Reithofer wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a
    json string and I'm using the json tcllib module.
    It may contain an unknown number of nested levels and json arrays.
    I have found some solutions which basically work, but I cannot
    reliably distinguish between a normal - possibly nested - json
    object and a json array.

    All examples I have found fail on json arrays.

    The extracted or created keys may eventually be used for accessing the corresponding values - json does not create specific ids for the array entries; they are a plain list from tcl's point of view.

    Has anyone tried or mastered this challenge?

    Here's a typical example:
    ---
    https://www.tech-edv.co.at/download/testdata/livedata_20250914.txt


    I'm by no means a JSON expert; in natural-language terms I'd describe
    myself as speaking schoolboy JSON, so I may have overlooked some
    technical subtlety. As far as I can see

    package require json
    package require http
    package require tls

    http::register https 443 {tls::socket -autoservername true}

    set tok [http::geturl https://www.tech-edv.co.at/download/testdata/livedata_20250914.txt]
    set jsStr [http::data $tok]
    http::cleanup $tok

    set jsDict [json::json2dict $jsStr]

    delivers a completely usable dictionary.

    dict get $jsDict inverters

    returns a two-element list which is analogous, in many programming
    languages, to an array with two valid indexes.
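    For what it's worth, the array elements can still be reached positionally; a self-contained sketch, using a made-up miniature of the payload discussed in this thread:

```tcl
package require json

# Made-up miniature of the "inverters" payload from this thread:
set jsStr {{"inverters": [{"serial": "123456789012"}, {"serial": "210987654321"}]}}
set jsDict [json::json2dict $jsStr]

set inverters [dict get $jsDict inverters]
puts [llength $inverters]                      ;# -> 2
puts [dict get [lindex $inverters 0] serial]   ;# -> 123456789012
```

    Each array element is itself a dict, so lindex plus dict get gets you to any field - provided you already know the value is an array rather than an object.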

    Sorry if I've missed something.


    Alan
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.tcl on Tue Sep 16 12:22:47 2025
    From Newsgroup: comp.lang.tcl

    In article <10abki9$2ift8$1@dont-email.me>,
    Alan Grunwald <nospam.nurdglaw@gmail.com> wrote:
    On 15/09/2025 17:58, Gerhard Reithofer wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a
    json string and I'm using the json tcllib module.
    It may contain an unknown number of nested levels and json arrays.
    I have found some solutions which basically work, but I cannot
    distinguish reliably between a normal - possibly nested - json
    object and a json array.

    All examples I have found fail on json arrays.

    The extracted or created keys may eventually be used for accessing the
    corresponding values - json does not create specific ids for the array
    entries; they are a plain list from tcl's point of view.

    Has anyone tried or mastered this challenge?

    Here's a typical example:
    ---
    https://www.tech-edv.co.at/download/testdata/livedata_20250914.txt


    I'm by no means a JSON expert, in natural language terms I'd describe
    myself as speaking schoolboy JSON, so I may have overlooked some
    technical subtlety. As far as I can see

    package require json
    package require http
    package require tls

    http::register https 443 {tls::socket -autoservername true}

    set tok [http::geturl
    https://www.tech-edv.co.at/download/testdata/livedata_20250914.txt]
    set jsStr [http::data $tok]
    http::cleanup $tok

    set jsDict [json::json2dict $jsStr]

    delivers a completely usable dictionary.

    dict get $jsDict inverters

    returns a two-element list which is analogous, in many programming
    languages, to an array with two valid indexes.

    Sorry if I've missed something.


    Alan

    I'm pretty sure you could do this fairly easily with the rl_json package.
    The disadvantage being that it's not a standard part of tcllib.

    https://github.com/RubyLane/rl_json
    --
    columbiaclosings.com
    What's not in Columbia anymore..
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Gerhard Reithofer@g.reithofer@tech-edv.co.at to comp.lang.tcl on Tue Sep 16 19:18:27 2025
    From Newsgroup: comp.lang.tcl

    On Mon, 15 Sep 2025, Rich wrote:

    Gerhard Reithofer <g.reithofer@tech-edv.co.at> wrote:
    Hi *,

    ...

    This is because the tcllib json module does not return any typing information, so you have to guess as to whether you have an "object" or
    an "array" at any given level.

    ...

    Thanks - this is rather powerful, but also heavyweight for that "simple" problem.

    But if I find no simple solution in short time I will come back to it
    :-)

    THX
    --
    mailto:g.reithofer@tech-edv.co.at
    http://www.tech-edv.co.at
    -- new email address --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Gerhard Reithofer@gerhard.reithofer@tech-edv.co.at to comp.lang.tcl on Tue Sep 16 20:55:52 2025
    From Newsgroup: comp.lang.tcl

    On Tue, 16 Sep 2025, Alan Grunwald wrote:
    Hi Alan,

    On 15/09/2025 17:58, Gerhard Reithofer wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a
    json string and I'm using the json tcllib module.
    It may contain an unknown number of nested levels and json arrays.

    ...

    delivers a completely usable dictionary.

    dict get $jsDict inverters

    returns a two-element list which is analogous, in many programming languages, to an array with two valid indexes.

    yes, it is a two-element list, but there is no way to use it as a tcl array or dict:

    set inv1 "[string range [dict get $jsDict inverters] 0 86] ..."
    {serial 123456789012 name {HM-400 Extra} order 0 data_age 4 poll_enabled true reachable ...

    set inv2 "[string range [dict get $jsDict inverters] 1 86] ..."
    serial 210987654321 name {HM-400 Extra} order 0 data_age 4 poll_enabled true reachable ...

    There is no key that can be used as an index; it is a plain list.
    That this item has only 2 entries is by chance - there can also be a
    single record, or hundreds of them.

    On the other side, typical json objects are:
    dict get $jsDict total =>
    Power {v 207 u W d 1} YieldDay {v 1356 u Wh d 0} YieldTotal {v 1584.98 u kWh d 3}

    These are 3 correct json objects with the keys Power, YieldDay and YieldTotal. Each of these objects represents a numeric (v)alue, a (u)nit and a number
    of (d)ecimal places ... I think ;-(
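    The object side, by contrast, is directly addressable with chained dict keys; a self-contained sketch whose payload mimics the "total" object above:

```tcl
package require json

# Payload mimicking the "total" object shown above:
set jsStr {{"total": {"Power": {"v": 207, "u": "W", "d": 1}}}}
set jsDict [json::json2dict $jsStr]

puts [dict get $jsDict total Power v]   ;# -> 207
puts [dict get $jsDict total Power u]   ;# -> W
```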

    The 2 situations - json array (in tcl a list) and json object (in tcl
    a dict or array with key/value pairs) - can be mixed.

    Still my main problem is to distinguish these 2 cases!

    Thank you very much,
    Gerhard
    --
    mailto:g.reithofer@tech-edv.co.at
    http://www.tech-edv.co.at
    -- new email address --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Gerhard Reithofer@g.reithofer@tech-edv.co.at to comp.lang.tcl on Tue Sep 16 21:50:04 2025
    From Newsgroup: comp.lang.tcl

    On Tue, 16 Sep 2025, Ted Nolan <tednolan> wrote:
    Hi Ted,

    In article <10abki9$2ift8$1@dont-email.me>,
    Alan Grunwald <nospam.nurdglaw@gmail.com> wrote:
    On 15/09/2025 17:58, Gerhard Reithofer wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a

    ...

    The disadvantage being that it's not a standard part of tcllib.

    https://github.com/RubyLane/rl_json

    unfortunately I haven't found the complete documentation online - any
    hint?

    BTW I think that this problem can be solved with various tools; I have
    only tried it with tcllib json. I'm a fan of tcl-only
    implementations because tcl is available on many platforms, and then
    you need not build anything beyond installing tcl and, if necessary,
    copying a bunch of files.

    Thank you,
    Gerhard
    --
    mailto:g.reithofer@tech-edv.co.at
    http://www.tech-edv.co.at
    -- new email address --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From saito@saitology9@gmail.com to comp.lang.tcl on Tue Sep 16 16:23:32 2025
    From Newsgroup: comp.lang.tcl

    On 9/16/2025 3:50 PM, Gerhard Reithofer wrote:

    ...

    The disadvantage being that it's not a standard part of tcllib.

    https://github.com/RubyLane/rl_json

    unfortunately I haven't found the complete documentation online - any
    hint?



    I haven't used it, but the link posted above seems to have a pretty good
    description. If you are after a how-to-use kind of manual, which would
    be nice to have, very few packages have one, and those are usually found
    in a book.

    By the way, the url to the sample file gives an error.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.tcl on Tue Sep 16 21:15:53 2025
    From Newsgroup: comp.lang.tcl

    In article <613773f5-b8a4-22f1-5c59-3cc4fd82ae70@tech-edv.co.at>,
    Gerhard Reithofer <g.reithofer@tech-edv.co.at> wrote:
    On Tue, 16 Sep 2025, Ted Nolan <tednolan> wrote:
    Hi Ted,

    In article <10abki9$2ift8$1@dont-email.me>,
    Alan Grunwald <nospam.nurdglaw@gmail.com> wrote:
    On 15/09/2025 17:58, Gerhard Reithofer wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a

    ...

    The disadvantage being that it's not a standard part of tcllib.

    https://github.com/RubyLane/rl_json

    unfortunately I haven't found the complete documentation online - any
    hint?


    The man page is at the bottom of that link, below the file listing.
    --
    columbiaclosings.com
    What's not in Columbia anymore..
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ted@loft.tnolan.com (Ted Nolan@tednolan to comp.lang.tcl on Tue Sep 16 22:32:58 2025
    From Newsgroup: comp.lang.tcl

    In article <613773f5-b8a4-22f1-5c59-3cc4fd82ae70@tech-edv.co.at>,
    Gerhard Reithofer <g.reithofer@tech-edv.co.at> wrote:
    On Tue, 16 Sep 2025, Ted Nolan <tednolan> wrote:
    Hi Ted,

    In article <10abki9$2ift8$1@dont-email.me>,
    Alan Grunwald <nospam.nurdglaw@gmail.com> wrote:
    On 15/09/2025 17:58, Gerhard Reithofer wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a

    ...

    The disadvantage being that it's not a standard part of tcllib.

    https://github.com/RubyLane/rl_json

    unfortunately I haven't found the complete documentation online - any
    hint?

    BTW I think that this problem can be solved with various tools; I have
    only tried it with tcllib json. I'm a fan of tcl-only
    implementations because tcl is available on many platforms, and then
    you need not build anything beyond installing tcl and, if necessary,
    copying a bunch of files.

    Thank you,
    Gerhard


    To follow up, assuming you have already downloaded the
    data, something like the below is possible, though
    no decent output formatting is done in this little script:


    ===================CUT HERE==================
    #!/usr/bin/env tclsh8.6

    lappend auto_path /usr/local/cluster/rl_json_0.9.12/lib

    package require rl_json
    namespace import ::rl_json::*

    proc print_array {ar} {
        set i 0
        json foreach val $ar {
            puts "Element $i"
            if {[json type $val] in {"string" "number" "boolean" "null"}} {
                puts " [json get $val]"
            } elseif {[json type $val] == "object"} {
                print_object $val
            } elseif {[json type $val] == "array"} {
                print_array $val
            } else {
                puts stderr "Ooops should not happen!"
                exit 1
            }
            incr i
        }
    }

    proc print_object {obj} {
        json foreach {key val} $obj {
            puts $key
            if {[json type $val] in {"string" "number" "boolean" "null"}} {
                puts " [json get $val]"
            } elseif {[json type $val] == "object"} {
                print_object $val
            } elseif {[json type $val] == "array"} {
                print_array $val
            } else {
                puts stderr "Ooops should not happen!"
                exit 1
            }
        }
    }

    proc main {} {
        set f [open {livedata_20250914.txt} r]
        fconfigure $f -encoding utf-8

        set data [read $f]
        close $f

        # puts $data

        set toplevel [json type $data]

        # A toplevel can only be an "object" or an "array"
        if {$toplevel == "object"} {
            print_object $data
        } else {
            print_array $data
        }
    }

    main
    ================END=============
    --
    columbiaclosings.com
    What's not in Columbia anymore..
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Tue Sep 16 16:57:29 2025
    From Newsgroup: comp.lang.tcl

    On 9/16/2025 12:50 PM, Gerhard Reithofer wrote:
    On Tue, 16 Sep 2025, Ted Nolan <tednolan> wrote:
    Hi Ted,

    In article <10abki9$2ift8$1@dont-email.me>,
    Alan Grunwald <nospam.nurdglaw@gmail.com> wrote:
    On 15/09/2025 17:58, Gerhard Reithofer wrote:
    Hi *,

    I'd like a method to retrieve the complete key structure from a

    ...

    The disadvantage being that it's not a standard part of tcllib.

    https://github.com/RubyLane/rl_json

    unfortunately I haven't found the complete documentation online - any
    hint?

    BTW I think that this problem can be solved with various tools; I have
    only tried it with tcllib json. I'm a fan of tcl-only
    implementations because tcl is available on many platforms, and then
    you need not build anything beyond installing tcl and, if necessary,
    copying a bunch of files.

    Thank you,
    Gerhard





    I posed this problem to the Claude AI, and it agrees with your finding that the type info is lost. Claude's suggestion was to pre-parse the json string to find the array keys, and it supplied the following code using regexes:

    package require json

    proc findArrayKeys {jsonString {path {}}} {
        set arrayKeys {}

        # Remove whitespace for easier parsing
        set json [string map {"\n" "" "\t" "" " " " "} $jsonString]

        # Find "key": [ patterns (arrays)
        set pattern {"([^"]+)"\s*:\s*\[}
        set start 0
        while {[regexp -start $start -indices $pattern $json match keyIndices]} {
            set key [string range $json {*}$keyIndices]
            if {$path ne ""} {
                lappend arrayKeys "$path.$key"
            } else {
                lappend arrayKeys $key
            }
            set start [expr {[lindex $match 1] + 1}]
        }

        return $arrayKeys
    }

    # Example usage
    set jsonData {{"items": ["a", "b", "c"], "single": "hello", "nested": {"subitems": ["x", "y"]}}}
    set parsed [::json::json2dict $jsonData]
    set arrayKeys [findArrayKeys $jsonData]

    puts "Array keys: $arrayKeys"

    foreach {key value} $parsed {
        if {$key in $arrayKeys} {
            puts "$key is an array: $value"
        } else {
            puts "$key is not an array: $value"
        }
    }
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Gerhard Reithofer@g.reithofer@tech-edv.co.at to comp.lang.tcl on Wed Sep 17 11:53:54 2025
    From Newsgroup: comp.lang.tcl

    On Tue, 16 Sep 2025, et99 wrote:

    On 9/16/2025 12:50 PM, Gerhard Reithofer wrote:
    On Tue, 16 Sep 2025, Ted Nolan <tednolan> wrote:
    Hi Ted,

    ...


    I posed this problem to the claude AI, and it agrees with your findings of loss of type info. Claude's suggestion was to preparse the json string to find
    array keys, and supplied the following code using regex's:

    package require json

    proc findArrayKeys {jsonString {path {}}} {
        set arrayKeys {}
        # Remove whitespace for easier parsing
        set json [string map {"\n" "" "\t" "" " " " "} $jsonString]
        # Find "key": [ patterns (arrays)
        set pattern {"([^"]+)"\s*:\s*\[}
        set start 0
        while {[regexp -start $start -indices $pattern $json match keyIndices]} {
            set key [string range $json {*}$keyIndices]
            if {$path ne ""} {
                lappend arrayKeys "$path.$key"
            } else {
                lappend arrayKeys $key
            }
            set start [expr {[lindex $match 1] + 1}]
        }
        return $arrayKeys
    }

    # Example usage
    set jsonData {{"items": ["a", "b", "c"], "single": "hello", "nested": {"subitems": ["x", "y"]}}}
    set parsed [::json::json2dict $jsonData]
    set arrayKeys [findArrayKeys $jsonData]

    puts "Array keys: $arrayKeys"

    foreach {key value} $parsed {
        if {$key in $arrayKeys} {
            puts "$key is an array: $value"
        } else {
            puts "$key is not an array: $value"
        }
    }


    Really interesting approach - it looks good.
    Apart from the fact that all entities must be re-parsed recursively, it
    could be a solution.

    The results for my examples "A" and "O" refer to array and object:
    Example: livedata_20250413.json => Array keys: inverters
    inverters A {serial 429412742452 name {HM-800 am Speicher} order 0 data_age 17 poll_enabled true re ...
    total O Power {v 201.6999969 u W d 1} YieldDay {v 1776 u Wh d 0} YieldTotal {v 1015.158997 u kWh d 3}
    hints O time_sync false radio_problem false default_password false

    Example: livedata_20250914.json => Array keys: inverters
    inverters A {serial 112183843984 name {HM-400 Extra} order 0 data_age 4 poll_enabled true reachable ...
    total O Power {v 207 u W d 1} YieldDay {v 1356 u Wh d 0} YieldTotal {v 1584.98 u kWh d 3}
    hints O time_sync false radio_problem false default_password false
    vedirect O enabled false
    huawei O enabled false
    battery O enabled false
    power_meter O enabled true Power {v 54 u W d 1}

    :-)
    --
    mailto:g.reithofer@tech-edv.co.at
    http://www.tech-edv.co.at
    -- new email address --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From saito@saitology9@gmail.com to comp.lang.tcl on Wed Sep 17 10:51:27 2025
    From Newsgroup: comp.lang.tcl

    On 9/17/2025 5:53 AM, Gerhard Reithofer wrote:
    On Tue, 16 Sep 2025, et99 wrote:

    On 9/16/2025 12:50 PM, Gerhard Reithofer wrote:
    On Tue, 16 Sep 2025, Ted Nolan <tednolan> wrote:
    Hi Ted,

    ...


    I posed this problem to the claude AI, and it agrees with your findings of loss of type info. Claude's suggestion was to preparse the json string to find
    array keys, and supplied the following code using regex's:

    package require json

    proc findArrayKeys {jsonString {path {}}} {
        set arrayKeys {}
        # Remove whitespace for easier parsing
        set json [string map {"\n" "" "\t" "" " " " "} $jsonString]
        # Find "key": [ patterns (arrays)
        set pattern {"([^"]+)"\s*:\s*\[}
        set start 0
        while {[regexp -start $start -indices $pattern $json match keyIndices]} {
            set key [string range $json {*}$keyIndices]
            if {$path ne ""} {
                lappend arrayKeys "$path.$key"
            } else {
                lappend arrayKeys $key
            }
            set start [expr {[lindex $match 1] + 1}]
        }
        return $arrayKeys
    }

    # Example usage
    set jsonData {{"items": ["a", "b", "c"], "single": "hello", "nested": {"subitems": ["x", "y"]}}}
    set parsed [::json::json2dict $jsonData]
    set arrayKeys [findArrayKeys $jsonData]

    puts "Array keys: $arrayKeys"

    foreach {key value} $parsed {
        if {$key in $arrayKeys} {
            puts "$key is an array: $value"
        } else {
            puts "$key is not an array: $value"
        }
    }


    Really interesting approach - it looks good.
    Apart from the fact that all entities must be re-parsed recursively, it
    could be a solution.


    I agree, it looks good. I am glad it will work for you.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Thu Sep 18 00:00:26 2025
    From Newsgroup: comp.lang.tcl


    Gerhard Reithofer <g.reithofer@tech-edv.co.at> writes:
    On Mon, 15 Sep 2025, Rich wrote:
    [Rich proposed tDOM]

    Thanks - this is rather powerful, but also heavyweight for that "simple" problem.

    If heavy means using a binary extension then of course yes.

    Though, if heavy means "learning" the somewhat featureful tDOM with all
    its methods and object commands and nodes and whatnot else - this is not
    needed at all for that "simple" problem. Just do

    package require tdom 0.9.6

    dom parse -json $YourJsonData doc
    set theJsonDataAsList [$doc asTypedList]
    $doc delete

    and you are finished with tDOM for that task. You have a simple Tcl list
    as result which you can analyze and extract data from with plain Tcl
    scripts.

    To provide a small example:

    package require tdom 0.9.6

    set jsondata {{
    "stringproperty": "abc",
    "objectproperty": {"one": 1, "two": "two"},
    "array": ["foo", -2.23, null, true, false, {"one": 1, "two": "two"}, [2,16,24]],
    "number": 2022,
    "null": null,
    "true": true,
    "false": false
    }}
    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete



    OBJECT {stringproperty {STRING abc} objectproperty {OBJECT {one {NUMBER 1} two {STRING two}}} array {ARRAY {{STRING foo} {NUMBER -2.23} NULL TRUE FALSE {OBJECT {one {NUMBER 1} two {STRING two}}} {ARRAY {{NUMBER 2} {NUMBER 16} {NUMBER 24}}}}} number {NUMBER 2022} null NULL true TRUE false FALSE}
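    That typed list can then be walked with plain Tcl; a minimal sketch (the proc name flattenTyped is mine, not part of tDOM):

```tcl
# Flatten a tDOM asTypedList value into {path value ...} pairs.
# NULL/TRUE/FALSE are kept as their type tags, since they carry no payload.
proc flattenTyped {typed {path {}}} {
    lassign $typed kind payload
    switch -- $kind {
        OBJECT {
            set res {}
            foreach {key sub} $payload {
                lappend res {*}[flattenTyped $sub [concat $path [list $key]]]
            }
            return $res
        }
        ARRAY {
            set res {}
            set i 0
            foreach sub $payload {
                lappend res {*}[flattenTyped $sub [concat $path [list $i]]]
                incr i
            }
            return $res
        }
        NULL - TRUE - FALSE {
            return [list [join $path .] $kind]
        }
        default {
            # STRING or NUMBER: the payload is the value itself
            return [list [join $path .] $payload]
        }
    }
}

# On a fragment of the output above:
puts [flattenTyped {OBJECT {number {NUMBER 2022} array {ARRAY {{STRING foo} NULL}}}}]
# -> number 2022 array.0 foo array.1 NULL
```

    Since OBJECT and ARRAY are explicit tags here, the array/object ambiguity of json2dict never arises.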

    But if I find no simple solution in short time I will come back to it
    :-)

    Seems you already have one. And otherwise I would not have known a
    place to get tdom 0.9.6 precompiled for a pi ... ;-)

    rolf
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Thu Sep 18 19:15:33 2025
    From Newsgroup: comp.lang.tcl

    On 9/17/2025 3:00 PM, Rolf Ade wrote:

    To provide a small example:

    package require tdom 0.9.6

    set jsondata {{
    "stringproperty": "abc",
    "objectproperty": {"one": 1, "two": "two"},
    "array": ["foo", -2.23, null, true, false, {"one": 1, "two": "two"}, [2,16,24]],
    "number": 2022,
    "null": null,
    "true": true,
    "false": false
    }}
    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete



    OBJECT {stringproperty {STRING abc} objectproperty {OBJECT {one {NUMBER 1} two {STRING two}}} array {ARRAY {{STRING foo} {NUMBER -2.23} NULL TRUE FALSE {OBJECT {one {NUMBER 1} two {STRING two}}} {ARRAY {{NUMBER 2} {NUMBER 16} {NUMBER 24}}}}} number {NUMBER 2022} null NULL true TRUE false FALSE}

    rolf

    Nice example Rolf!

    As another AI experiment, I fed that (both input and output) into Claude along with a url to the flask wiki page, and he was able, with some help from me, to create a compatible json parser.

    I ran his code which includes several examples on my rasp pi with tcl 8.6.2 - fearlessly I should add, since it has only needed to be restarted 3 times in 6 years. Current uptime 250 days! It reproduced Rolf's example exactly.

    Here's Claude's summary he wrote for me to post here:


    Subject: Pure Tcl JSON Parser with tDOM-style Type Preservation

    In collaboration with Claude (Anthropic's AI) we created a pure Tcl JSON parser that produces the same typed output as tDOM's -json option, but without requiring binary extensions.

    The solution emerged from combining Rolf's JSON example with the flask lexer from the Tcl wiki. Claude quickly grok'd the flask manual (pun intended for fellow AI enthusiasts) and we developed a parser that:

    - Preserves all JSON types and handles all number formats (including scientific notation)
    - Works on any platform (pure Tcl + flask about 500 lines, no other packages required)
    - Produces tDOM-compatible output format
    - Includes tDOM recursion limit "-maxnesting nnn" and root option "-root name"
    - Includes flask debugging "-debug yes" to trace token processing
    - Includes useful utility functions, find all with key, path nav, etc.
    - Allows json single line // and multi-line /* ... */ comments with "-strict no"
    - Still small enough to be considered a little json parser

    Useful for Raspberry Pi deployments where compiling tDOM extensions could be problematic.

    The collaboration demonstrated how AI can rapidly digest technical documentation and combine existing solutions in novel ways.

    Code and examples available if there's interest.

    --et99


    It's in the last example here (typed-json):

    https://wiki.tcl-lang.org/page/flask+a+mini%2Dflex%2Flex+proc?v=164

    (w/o the explicit version, it tends to bring up earlier versions, dunno why)

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Fri Sep 19 14:43:41 2025
    From Newsgroup: comp.lang.tcl


    et99 <et99@rocketship1.me> writes:
    On 9/17/2025 3:00 PM, Rolf Ade wrote:
    To provide a small example:
    package require tdom 0.9.6
    set jsondata {{
    "stringproperty": "abc",
    "objectproperty": {"one": 1, "two": "two"},
    "array": ["foo", -2.23, null, true, false, {"one": 1, "two": "two"}, [2,16,24]],
    "number": 2022,
    "null": null,
    "true": true,
    "false": false
    }}
    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete

    OBJECT {stringproperty {STRING abc} objectproperty {OBJECT {one
    {NUMBER 1} two {STRING two}}} array {ARRAY {{STRING foo} {NUMBER
    -2.23} NULL TRUE FALSE {OBJECT {one {NUMBER 1} two {STRING two}}}
    {ARRAY {{NUMBER 2} {NUMBER 16} {NUMBER 24}}}}} number {NUMBER 2022}
    null NULL true TRUE false FALSE}
    rolf

    Nice example Rolf!

    As another AI experiment, I fed that (both input and output) into
    Claude along with a url to the flask wiki page, and he was able, with
    some help from me, to create a compatible json parser.

    I ran his code which includes several examples on my rasp pi with tcl
    8.6.2 - fearlessly I should add, since it has only needed to be
    restarted 3 times in 6 years. Current uptime 250 days! It reproduced
    Rolf's example exactly.
    [...]

    It's in the last example here (typed-json):

    https://wiki.tcl-lang.org/page/flask+a+mini%2Dflex%2Flex+proc?v=164

    (w/o the explicit version, it tends to bring up earlier versions, dunno why)

    Fine start. Picks up my idea of converting a json file into a nested
    list with type/value information. And even provides some sample/helper
    Tcl procs to work with the datastructure.

    Though, if I'm not mistaken, the parser does not do any json unescaping
    (\n, \t etc., the other characters < 0x20, and the escaping of
    characters outside the BMP). tDOM's json parser does all this. If this
    could be added to the scripted json parser, it could be a fine Tcl-only
    solution for people who have to care about json types.
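    That missing unescaping step could be scripted roughly like this - a sketch only (the proc name is illustrative; surrogate pairs for characters outside the BMP are not handled):

```tcl
# Decode the escapes inside a JSON string value (without its quotes).
# Handles \" \\ \/ \b \f \n \r \t and BMP \uXXXX; surrogate pairs are NOT handled.
proc jsonUnescape {s} {
    set out ""
    set i 0
    set n [string length $s]
    while {$i < $n} {
        set c [string index $s $i]
        if {$c ne "\\"} {
            append out $c
            incr i
            continue
        }
        set e [string index $s [incr i]]
        switch -- $e {
            b { append out \b }
            f { append out \f }
            n { append out \n }
            r { append out \r }
            t { append out \t }
            u {
                set hex [string range $s [expr {$i + 1}] [expr {$i + 4}]]
                append out [format %c 0x$hex]
                incr i 4
            }
            default { append out $e }
        }
        incr i
    }
    return $out
}

puts [jsonUnescape {line1\nline2 \u2022 \"quoted\"}]
```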

    rolf
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Gerhard Reithofer@g.reithofer@tech-edv.co.at to comp.lang.tcl on Fri Sep 19 14:25:40 2025
    From Newsgroup: comp.lang.tcl

    On Thu, 18 Sep 2025, Rolf Ade wrote:


    Gerhard Reithofer <g.reithofer@tech-edv.co.at> writes:
    On Mon, 15 Sep 2025, Rich wrote:
    [Rich proposed tDOM]

    Thanks - this is rather powerful, but also heavyweight for that "simple" problem.

    If heavy means using a binary extension then of course yes.

    My raspberry has only tdom 0.9.3

    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete

    Results in:
    Usage domDoc <method> <args>, where method can be:
    documentElement ?objVar?
    getElementsByTagName name
    getElementsByTagNameNS uri localname
    createElement tagName ?objVar?
    ...
    deleteXPathCache ?xpathQuery?
    readlock
    writelock
    renumber

    while executing
    "$doc asTypedList"
    invoked from within

    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete

    Seems, you have already one. And otherwise I would not have known a
    place to get tdom 0.9.6 precompiled for pi ... ;-)

    Thank you,
    Gerhard
    --
    mailto:g.reithofer@tech-edv.co.at
    http://www.tech-edv.co.at
    -- new email address --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Fri Sep 19 11:29:05 2025
    From Newsgroup: comp.lang.tcl

    On 9/19/2025 5:43 AM, Rolf Ade wrote:

    et99 <et99@rocketship1.me> writes:
    On 9/17/2025 3:00 PM, Rolf Ade wrote:
    To provide a small example:
    package require tdom 0.9.6
    set jsondata {{
    "stringproperty": "abc",
    "objectproperty": {"one": 1, "two": "two"},
    "array": ["foo", -2.23, null, true, false, {"one": 1, "two": "two"}, [2,16,24]],
    "number": 2022,
    "null": null,
    "true": true,
    "false": false
    }}
    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete

    OBJECT {stringproperty {STRING abc} objectproperty {OBJECT {one
    {NUMBER 1} two {STRING two}}} array {ARRAY {{STRING foo} {NUMBER
    -2.23} NULL TRUE FALSE {OBJECT {one {NUMBER 1} two {STRING two}}}
    {ARRAY {{NUMBER 2} {NUMBER 16} {NUMBER 24}}}}} number {NUMBER 2022}
    null NULL true TRUE false FALSE}
    rolf

    Nice example Rolf!

    As another AI experiment, I fed that (both input and output) into
    Claude along with a url to the flask wiki page, and he was able, with
    some help from me, to create a compatible json parser.

    I ran his code which includes several examples on my rasp pi with tcl
    8.6.2 - fearlessly I should add, since it has only needed to be
    restarted 3 times in 6 years. Current uptime 250 days! It reproduced
    Rolf's example exactly.
    [...]

    It's in the last example here (typed-json):

    https://wiki.tcl-lang.org/page/flask+a+mini%2Dflex%2Flex+proc?v=164

    (w/o the explicit version, it tends to bring up earlier versions, dunno why)

    Fine start. Picks up my idea of converting a json file into a nested
    list with type/value information. And even provides some sample/helper
    Tcl procs to work with the datastructure.

    Though, if I'm not mistaken, the parser does not do any json unescaping
    (\n, \t etc, and the other characters < 0x20 and the escaping of
    characters outside the BMP). tDOM's json parser does all this. If this
    could be added to the scripted json parser this could be a fine Tcl-only solution for people who have to care about json types.

    rolf

    Thanks for looking at it Rolf.

    I tried this change on the last example in the code from a windows console (some blank lines added here for clarity):





    % set sampleData [typed_json::json2dict {{
        "user": {
            "name": "Alice\nwith newline\u2022 <- unicode \" imbedded quote",
            "age": 30,
            "contacts": {
                "email": "alice@example.com",
                "phone": "555-1234"
            }
        },
        "products": [
            {"name": "Widget", "price": 19.99},
            {"name": "Gadget", "price": 29.99}
        ],
        "settings": {
            "debug": true,
            "timeout": 5000
        }
    }}]

    OBJECT {user {OBJECT {name {STRING {Alice\nwith newline\u2022 <- unicode \" imbedded quote}} age {NUMBER 30} contacts {OBJECT {email {STRING alice@example.com} phone {STRING 555-1234}}}}} products {ARRAY {{OBJECT {name {STRING Widget} price {NUMBER 19.99}}} {OBJECT {name {STRING Gadget} price {NUMBER 29.99}}}}} settings {OBJECT {debug TRUE timeout {NUMBER 5000}}}}

    % typed_json::getPath $sampleData user.name

    STRING {Alice\nwith newline\u2022 <- unicode \" imbedded quote}

    % puts "user.name = |[subst -nocommands -novariables [typed_json::getValue [typed_json::getPath $sampleData user.name]]]|"

    user.name = |Alice
    with newline• <- unicode " imbedded quote|






    Is this what you're meaning here? By BMP, do you mean, the unicode Basic Multilingual Plane (BMP)? I had to look that one up :)

    I wonder if the getValue utility command might not just use [subst] as above so this would be done automatically.

    I see this in the tDOM manual:

    dom jsonEscape string
    Returns the given string argument escaped in a way that if the returned string is used literally in a JSON document it is read by any conforming JSON parser as the original string.

    I don't quite understand what this does. Is this going the reverse direction? Does tDOM actually store the value as a unicode text string?

    -et99



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Fri Sep 19 13:47:33 2025
    From Newsgroup: comp.lang.tcl

    On 9/19/2025 11:29 AM, et99 wrote:
    On 9/19/2025 5:43 AM, Rolf Ade wrote:
    And even provides some sample/helper
    Tcl procs to work with the datastructure.

    I told Claude, and he created a little manual in tcl wiki format; it's now here:

    https://wiki.tcl-lang.org/page/typed%2Djson

    Boy, I could have used his help a decade or so ago when I was still on the job :)

    -et99


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Fri Sep 19 23:48:01 2025
    From Newsgroup: comp.lang.tcl


    et99 <et99@rocketship1.me> writes:
    On 9/19/2025 5:43 AM, Rolf Ade wrote:
    et99 <et99@rocketship1.me> writes:
    On 9/17/2025 3:00 PM, Rolf Ade wrote:
    To provide a small example:
    package require tdom 0.9.6
    set jsondata {{
    "stringproperty": "abc",
    "objectproperty": {"one": 1, "two": "two"},
    "array": ["foo", -2.23, null, true, false, {"one": 1, "two": "two"}, [2,16,24]],
    "number": 2022,
    "null": null,
    "true": true,
    "false": false
    }}
    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete

    OBJECT {stringproperty {STRING abc} objectproperty {OBJECT {one
    {NUMBER 1} two {STRING two}}} array {ARRAY {{STRING foo} {NUMBER
    -2.23} NULL TRUE FALSE {OBJECT {one {NUMBER 1} two {STRING two}}}
    {ARRAY {{NUMBER 2} {NUMBER 16} {NUMBER 24}}}}} number {NUMBER 2022}
    null NULL true TRUE false FALSE}
    rolf

    Nice example Rolf!

    As another AI experiment, I fed that (both input and output) into
    Claude along with a url to the flask wiki page, and he was able, with
    some help from me, to create a compatible json parser.

    I ran his code which includes several examples on my rasp pi with tcl
    8.6.2 - fearlessly I should add, since it has only needed to be
    restarted 3 times in 6 years. Current uptime 250 days! It reproduced
    Rolf's example exactly.
    [...]

    It's in the last example here (typed-json):

    https://wiki.tcl-lang.org/page/flask+a+mini%2Dflex%2Flex+proc?v=164

    (w/o the explicit version, it tends to bring up earlier versions, dunno why)

    Fine start. Picks up my idea of converting a json file into a nested
    list with type/value information. And even provides some
    sample/helper Tcl procs to work with the datastructure.

    Though, if I'm not mistaken, the parser does not do any json unescaping
    (\n, \t etc, and the other characters < 0x20 and the escaping of
    characters outside the BMP). tDOM's json parser does all this. If this
    could be added to the scripted json parser this could be a fine Tcl-only
    solution for people who have to care about json types. rolf

    Thanks for looking at it Rolf.

    I tried this change on the last example in the code from a windows console (some blank lines added here for clarity):

    % set sampleData [typed_json::json2dict {{
    "user": {
    "name": "Alice\nwith newline\u2022 <- unicode \" imbedded quote",
    "age": 30,
    "contacts": {
    "email": "alice@example.com",
    "phone": "555-1234"
    }
    },
    "products": [
    {"name": "Widget", "price": 19.99},
    {"name": "Gadget", "price": 29.99}
    ],
    "settings": {
    "debug": true,
    "timeout": 5000
    }
    }}]

    OBJECT {user {OBJECT {name {STRING {Alice\nwith newline\u2022 <- unicode \" imbedded quote}} age {NUMBER 30} contacts {OBJECT {email {STRING alice@example.com} phone {STRING 555-1234}}}}} products {ARRAY {{OBJECT {name {STRING Widget} price {NUMBER 19.99}}} {OBJECT {name {STRING Gadget} price {NUMBER 29.99}}}}} settings {OBJECT {debug TRUE timeout {NUMBER 5000}}}}

    % typed_json::getPath $sampleData user.name

    STRING {Alice\nwith newline\u2022 <- unicode \" imbedded quote}

    % puts "user.name = |[subst -nocommands -novariables [typed_json::getValue [typed_json::getPath $sampleData user.name]]]|"

    user.name = |Alice
    with newline• <- unicode " imbedded quote|

    Is this what you're meaning here?

    Yes. The value is the un-escaped string. Your tool currently provides
    the string literally as in the json serialization (still escaped), not
    the json data.

    By BMP, do you mean, the unicode Basic Multilingual Plane (BMP)? I had
    to look that one up :)

    Yes. Sorry for being terse.

    I wonder if the getValue utility command might not just use [subst] as
    above so this would be done automatically.

    I'm afraid it is not quite that simple, although at first it seems to
    solve most of the escaping inside the BMP. But Tcl has more escape
    sequences than json. If you just [subst -nocommands -novariables],
    something like "\b" (the two characters \ and b, not \u0008) in the json
    data will be wrongly substituted.

    And then there is still the escaping of characters outside the BMP.
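
    [Editor's note: the non-BMP point is worth spelling out. JSON encodes characters above U+FFFF as UTF-16 surrogate pairs (e.g. \uD83D\uDE00 for U+1F600), so a correct unescaper must merge the two \u escapes into one code point. A sketch of just that merging step, assuming a Tcl build that can represent the resulting character (e.g. Tcl 8.7/9.0):]

    ```tcl
    # Merge a UTF-16 surrogate pair (two \uXXXX values, already parsed
    # to integers) into the single character they encode.
    proc surrogatePairToChar {hi lo} {
        if {$hi < 0xD800 || $hi > 0xDBFF || $lo < 0xDC00 || $lo > 0xDFFF} {
            error "not a high/low surrogate pair"
        }
        format %c [expr {0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00)}]
    }
    # surrogatePairToChar 0xD83D 0xDE00 encodes U+1F600
    ```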

    I see this in the tDOM manual:

    dom jsonEscape string
    Returns the given string argument escaped in a way that if the
    returned string is used literally in a JSON document it is read by
    any conforming JSON parser as the original string.

    I don't quite understand what this does. Is this going the reverse
    direction? Does tDOM actually store the value as a unicode text
    string?

    This method expects a string as argument and returns that string escaped
    as a json string - that is, the characters which would represent this
    string in an (escaped) json string.

    Dunno how to explain that better. Perhaps an example. Often, REST
    interfaces expect a small piece of json for the request (the answer may
    be long or short json). Say, the query json looks like

    {
    "credential": "mysecret",
    "question": "<user input>"
    }

    There are means in tDOM to build up a json document from scratch but for
    such a small snippet they may seem a bit cumbersome (they aren't for
    greater json vocabularies and documents but that is another story not
    told yet). It is tempting to use subst

    subst -novariables -nocommands {{
        "credential": "mysecret",
        "question": "$userinput"
    }}

    This falls short because of the escaping requirements for the string,
    and this is what [dom jsonEscape] does.
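
    [Editor's note: for readers without tDOM at hand, the effect of [dom jsonEscape] can be approximated in script. A rough sketch covering the mandatory escapes and control characters; the real command also handles further output details:]

    ```tcl
    # Approximate [dom jsonEscape]: make a Tcl string safe to splice
    # between double quotes in a JSON document.
    proc jsonEscapeApprox {s} {
        # Mandatory escapes first (backslash must be mapped too)
        set s [string map {\\ \\\\ \" \\\" \b \\b \f \\f \n \\n \r \\r \t \\t} $s]
        set out ""
        foreach ch [split $s ""] {
            scan $ch %c code
            if {$code < 0x20} {
                # Remaining control characters must become \u00XX
                append out [format {\u%04X} $code]
            } else {
                append out $ch
            }
        }
        return $out
    }
    ```

    [Escaping $userinput with such a proc before splicing it into the template is the step that makes the subst approach safe.]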

    rolf
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Sat Sep 20 00:31:39 2025
    From Newsgroup: comp.lang.tcl


    et99 <et99@rocketship1.me> writes:
    On 9/19/2025 11:29 AM, et99 wrote:
    On 9/19/2025 5:43 AM, Rolf Ade wrote:
    And even provides some sample/helper
    Tcl procs to work with the datastructure.

    I told Claude and he created a little manual in tcl wiki format, it's now here:

    https://wiki.tcl-lang.org/page/typed%2Djson

    Boy, I could have used his help a decade or so ago when I was still on the job :)

    Well, yes, nice library documentation. Should perhaps have some of
    these helper procs in tDOM (and documentation for them).

    Though, the library itself doesn't do the job (parsing json data) atm.
    The helper functions are useful if used with tDOMs asTypedList.

    What's the copyright of that helper code? I would have to review the
    code anyway since AI-generated code sometimes misses some fine but
    crucial points, as the library shows.

    rolf
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Fri Sep 19 18:26:51 2025
    From Newsgroup: comp.lang.tcl

    On 9/19/2025 2:48 PM, Rolf Ade wrote:

    et99 <et99@rocketship1.me> writes:
    On 9/19/2025 5:43 AM, Rolf Ade wrote:
    et99 <et99@rocketship1.me> writes:
    On 9/17/2025 3:00 PM, Rolf Ade wrote:
    To provide a small example:
    package require tdom 0.9.6
    set jsondata {{
    "stringproperty": "abc",
    "objectproperty": {"one": 1, "two": "two"},
    "array": ["foo", -2.23, null, true, false, {"one": 1, "two": "two"}, [2,16,24]],
    "number": 2022,
    "null": null,
    "true": true,
    "false": false
    }}
    dom parse -json $jsondata doc
    puts [$doc asTypedList]
    $doc delete

    OBJECT {stringproperty {STRING abc} objectproperty {OBJECT {one
    {NUMBER 1} two {STRING two}}} array {ARRAY {{STRING foo} {NUMBER
    -2.23} NULL TRUE FALSE {OBJECT {one {NUMBER 1} two {STRING two}}}
    {ARRAY {{NUMBER 2} {NUMBER 16} {NUMBER 24}}}}} number {NUMBER 2022}
    null NULL true TRUE false FALSE}
    rolf

    Nice example Rolf!

    As another AI experiment, I fed that (both input and output) into
    Claude along with a url to the flask wiki page, and he was able, with
    some help from me, to create a compatible json parser.

    I ran his code which includes several examples on my rasp pi with tcl
    8.6.2 - fearlessly I should add, since it has only needed to be
    restarted 3 times in 6 years. Current uptime 250 days! It reproduced
    Rolf's example exactly.
    [...]

    It's in the last example here (typed-json):

    https://wiki.tcl-lang.org/page/flask+a+mini%2Dflex%2Flex+proc?v=164

    (w/o the explicit version, it tends to bring up earlier versions, dunno why)

    Fine start. Picks up my idea of converting a json file into a nested
    list with type/value information. And even provides some
    sample/helper Tcl procs to work with the datastructure.

    Though, if I'm not mistaken, the parser does not do any json unescaping
    (\n, \t etc, and the other characters < 0x20 and the escaping of
    characters outside the BMP). tDOM's json parser does all this. If this
    could be added to the scripted json parser this could be a fine Tcl-only
    solution for people who have to care about json types. rolf

    Thanks for looking at it Rolf.

    I tried this change on the last example in the code from a windows console (some blank lines added here for clarity):

    % set sampleData [typed_json::json2dict {{
    "user": {
    "name": "Alice\nwith newline\u2022 <- unicode \" imbedded quote",
    "age": 30,
    "contacts": {
    "email": "alice@example.com",
    "phone": "555-1234"
    }
    },
    "products": [
    {"name": "Widget", "price": 19.99},
    {"name": "Gadget", "price": 29.99}
    ],
    "settings": {
    "debug": true,
    "timeout": 5000
    }
    }}]

    OBJECT {user {OBJECT {name {STRING {Alice\nwith newline\u2022 <- unicode \" imbedded quote}} age {NUMBER 30} contacts {OBJECT {email {STRING alice@example.com} phone {STRING 555-1234}}}}} products {ARRAY {{OBJECT {name {STRING Widget} price {NUMBER 19.99}}} {OBJECT {name {STRING Gadget} price {NUMBER 29.99}}}}} settings {OBJECT {debug TRUE timeout {NUMBER 5000}}}}

    % typed_json::getPath $sampleData user.name

    STRING {Alice\nwith newline\u2022 <- unicode \" imbedded quote}

    % puts "user.name = |[subst -nocommands -novariables [typed_json::getValue [typed_json::getPath $sampleData user.name]]]|"

    user.name = |Alice
    with newline• <- unicode " imbedded quote|

    Is this what you're meaning here?

    Yes. The value is the un-escaped string. Your tool currently provides
    the string literally as in the json serialization (still escaped), not
    the json data.

    By BMP, do you mean, the unicode Basic Multilingual Plane (BMP)? I had
    to look that one up :)

    Yes. Sorry for being terse.

    I wonder if the getValue utility command might not just use [subst] as
    above so this would be done automatically.

    I'm afraid it is not quite that simple, although at first it seems to
    solve most of the escaping inside the BMP. But Tcl has more escape
    sequences than json. If you just [subst -nocommands -novariables]
    something like "\b" (the two characters \ and b, not \u0008) in the json
    data will be wrongly substituted.

    And then there is still the escaping of characters outside the BMP.

    I see this in the tDOM manual:

    dom jsonEscape string
    Returns the given string argument escaped in a way that if the
    returned string is used literally in a JSON document it is read by
    any conforming JSON parser as the original string.

    I don't quite understand what this does. Is this going the reverse
    direction? Does tDOM actually store the value as a unicode text
    string?

    This method expects a string as argument and returns that string escaped
    as a json string - that is, the characters which would represent this
    string in an (escaped) json string.

    Dunno how to explain that better. Perhaps an example. Often, REST
    interfaces expect a small piece of json for the request (the answer may
    be long or short json). Say, the query json looks like

    {
    "credential": "mysecret",
    "question": "<user input>"
    }

    There are means in tDOM to build up a json document from scratch but for
    such a small snippet they may seem a bit cumbersome (they aren't for
    greater json vocabularies and documents but that is another story not
    told yet). It is tempting to use subst

    subst -novariables -nocommands {{
        "credential": "mysecret",
        "question": "$userinput"
    }}

    This falls short because of the escaping requirements for the string,
    and this is what [dom jsonEscape] does.

    rolf




    Ok, been talking this over with Claude (gosh, amazing, like talking to a real person, except he writes code like a furious demon).

    He is eager to make changes. We would have a convert proc, like so:


    I have only tested for: set foo2 {hello\u2022 \nworld}
    which worked.

    proc convertEscapes {str} {
        set result ""
        set i 0
        set len [string length $str]

        while {$i < $len} {
            if {[string index $str $i] eq "\\"} {
                set next [string index $str [expr {$i+1}]]
                switch $next {
                    n { append result \n; incr i 2 }
                    t { append result \t; incr i 2 }
                    r { append result \r; incr i 2 }
                    b { append result \b; incr i 2 }
                    f { append result \f; incr i 2 }
                    \" { append result \"; incr i 2 }
                    / { append result /; incr i 2 }
                    \\ { append result \\; incr i 2 }
                    u {
                        # Unicode escape
                        set hex [string range $str [expr {$i+2}] [expr {$i+5}]]
                        append result [format %c 0x$hex]
                        incr i 6
                    }
                    default {
                        # Unknown escape, keep as-is
                        append result \\$next
                        incr i 2
                    }
                }
            } else {
                append result [string index $str $i]
                incr i
            }
        }
        return $result
    }

    The converter would be called (optionally), and only on tokens of type STRING, which can be keys or values. Although we allow comments, we won't go as far as JSON5 and allow unquoted keys.


    As to license, well, I'm for as free to use as possible, but Claude crafted a copyright block for that purpose, which I'll add at some point, though I thought that anything in the wiki was free to use.

    Claude says he built the parser from scratch, not by copying some other parser, say one he found and translated to tcl. The flask lexer is my code, so that's not an issue, but I don't have a copyright on that either, it's just 90 lines of wiki code.

    -et99


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Fri Sep 19 20:01:01 2025
    From Newsgroup: comp.lang.tcl

    On 9/19/2025 6:26 PM, et99 wrote:

    proc convertEscapes {str} {
        set result ""
        set i 0
        set len [string length $str]
        while {$i < $len} {
            if {[string index $str $i] eq "\\"} {
                set next [string index $str [expr {$i+1}]]
                switch $next {
                    n { append result \n; incr i 2 }
                    t { append result \t; incr i 2 }
                    r { append result \r; incr i 2 }
                    b { append result \b; incr i 2 }
                    f { append result \f; incr i 2 }
                    \" { append result \"; incr i 2 }
                    / { append result /; incr i 2 }
                    \\ { append result \\; incr i 2 }
                    u {
                        # Unicode escape
                        set hex [string range $str [expr {$i+2}] [expr {$i+5}]]
                        append result [format %c 0x$hex]
                        incr i 6
                    }
                    default {
                        # Unknown escape, keep as-is
                        append result \\$next
                        incr i 2
                    }
                }
            } else {
                append result [string index $str $i]
                incr i
            }
        }
        return $result
    }


    Claude is fast, but this code is much faster, since key strings would rarely have any escapes, and probably values too. Using the variable c is probably a wash, but hey, Claude said it was ok:

    proc convertEscapes {str} {
        if {[string first "\\" $str] < 0} {return $str}
        set result ""
        set i 0
        set len [string length $str]
        while {$i < $len} {
            set c [string index $str $i]
            if {$c eq "\\"} {
                set next [string index $str [expr {$i+1}]]
                switch $next {
                    n { append result \n; incr i 2 }
                    t { append result \t; incr i 2 }
                    r { append result \r; incr i 2 }
                    b { append result \b; incr i 2 }
                    f { append result \f; incr i 2 }
                    \" { append result \"; incr i 2 }
                    / { append result /; incr i 2 }
                    \\ { append result \\; incr i 2 }
                    u {
                        # Unicode escape
                        set hex [string range $str [expr {$i+2}] [expr {$i+5}]]
                        append result [format %c 0x$hex]
                        incr i 6
                    }
                    default {
                        # Unknown escape, keep as-is
                        append result \\$next
                        incr i 2
                    }
                }
            } else {
                append result $c
                incr i
            }
        }
        return $result
    }
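
    [Editor's note: given Rolf's point downthread about a one-test test suite, a few tcltest cases over the escapes discussed in this thread would be cheap insurance. A sketch, assuming convertEscapes as defined above:]

    ```tcl
    package require tcltest
    namespace import ::tcltest::*

    test esc-1.1 {plain string passes through} {
        convertEscapes {hello}
    } {hello}

    test esc-1.2 {simple escapes} {
        convertEscapes {a\nb\tc}
    } "a\nb\tc"

    test esc-1.3 {escaped backslash stays single} {
        convertEscapes {\\n}
    } "\\n"

    test esc-1.4 {BMP unicode escape} {
        convertEscapes {\u2022}
    } "\u2022"

    # Known gap: a surrogate pair like {\uD83D\uDE00} should produce one
    # character outside the BMP; the proc as posted emits two surrogates.

    cleanupTests
    ```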

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Fri Sep 19 23:15:56 2025
    From Newsgroup: comp.lang.tcl

    On 9/19/2025 8:01 PM, et99 wrote:


    proc convertEscapes {str} {
        if {[string first "\\" $str] < 0} {return $str}


    Done, and Claude made the changes to the parser so it now defaults to converting escapes. There's a new option, -convert yes|no with default yes, on the main function json2dict. typed-json manual updated too.

    So, code is at wiki page "flask" manual at "typed-json".

    I added a copyright blurb at the top. According to discussion with their chat bot, anything Claude creates with me is mine to do with as I please, except I can't use his output to train their competitor's bots. I asked about making it public, and what if someone else trains off the public (wiki) pages. They then repeated the condition. So, I guess I stumped them on that one.


    -et99
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to comp.lang.tcl on Sat Sep 20 16:03:36 2025
    From Newsgroup: comp.lang.tcl

    et99 <et99@rocketship1.me> wrote:
    I added a copyright blurb at the top. According to discussion with
    their chat bot, anything Claude creates with me is mine to do with as
    I please, except I can't use his output to train their competitor's
    bots. I asked about making it public, and what if someone else
    trains off the public (wiki) pages. They then repeated the
    condition. So, I guess I stumped them on that one.

    I'd recommend posting a verbatim copy of whatever "license" they give
    you on the page.

    Claude is from a commercial entity, so buried inside that "license"
    that allows you to "do as you please" today will be a clause that
    allows them, at any time, to unilaterally alter the terms of the deal.

    So immortalize the terms of the deal, as you know them today, on the
    page with the rest of the code. Then you've got a record of what
    "that" bit of code's terms was, today, irrespective of what the company lawyers might decide to do in the future.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sat Sep 20 12:02:35 2025
    From Newsgroup: comp.lang.tcl

    On 9/20/2025 9:03 AM, Rich wrote:
    I'd recommend posting a verbatim copy of whatever "license" they give
    you on the page.

    Claude is from a commercial entity, so buried inside that "license"
    that allows you to "do as you please" today will be a clause that
    allows them, at any time, to unilaterally alter the terms of the deal.

    So immortalize the terms of the deal, as you know them today, on the
    page with the rest of the code. Then you've got a record of what
    "that" bit of code's terms was, today, irrespective of what the company lawyers might decide to do in the future.

    Thanks Rich. This is really tricky. I plan to add this; do you think this is good enough?

    # typed_json - JSON Parser with Type Preservation
    # Copyright (c) 2025 et99
    #
    # This software was developed with assistance from Claude AI (Anthropic).
    # Per Anthropic Consumer Terms of Service, Section 4 (as of May 1, 2025),
    # as read on September 20, 2025:
    # https://www.anthropic.com/legal/consumer-terms
    # "Subject to your compliance with our Terms, we assign to you all of our
    # right, title, and interest—if any—in Outputs."
    #
    # [Your existing license text...]

    Of course, that "subject to your compliance" can be another can of worms.

    You know, if I were just starting out in computer science today, I think I would find a different profession altogether, though I was never very good at anything else :)

    Ahhh, the good old days in the 70s and 80s when I'd go to a DECUS conference and we'd all bring some software to share with others. And then we heard those scary words: "Hi, we're from the government and we're here to help you".

    -et99


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sat Sep 20 15:30:47 2025
    From Newsgroup: comp.lang.tcl

    On 9/20/2025 12:02 PM, et99 wrote:

    # typed_json - JSON Parser with Type Preservation
    # Copyright (c) 2025 et99
    #
    # This software was developed with assistance from Claude AI (Anthropic).

    snip

    Funny, while discussing this with Claude, I mentioned how back in '21 when I was building flask, I tried to demo a JSON -> XML conversion but got stuck because a lexer alone is not enough; it needed a parser, which I wasn't prepared to build.

    Before I could blink, Claude wrote a function to do that, since with the typed-json output we had all the pieces:


    # Convert typed structure to XML format
    proc asXml {typedStructure {indent ""} {ascii no}} {
        lassign $typedStructure rootType rootData

        switch $rootType {
            "STRING" {
                # Escape XML special characters
                set escaped [string map {& &amp; < &lt; > &gt; \" &quot; ' &apos;} $rootData]

                # Optionally convert Unicode and special chars to character references
                if {$ascii} {
                    # First convert newlines and other control characters
                    set escaped [string map {\n &#xA; \r &#xD; \t &#x9;} $escaped]

                    # Then convert remaining Unicode (>127)
                    set result ""
                    foreach char [split $escaped ""] {
                        scan $char %c code
                        if {$code > 127} {
                            append result "&#x[format %X $code];"
                        } else {
                            append result $char
                        }
                    }
                    set escaped $result
                }

                return "$indent<string>$escaped</string>"
            }
            "NUMBER" {
                return "$indent<number>$rootData</number>"
            }
            "TRUE" {
                return "$indent<boolean>true</boolean>"
            }
            "FALSE" {
                return "$indent<boolean>false</boolean>"
            }
            "NULL" {
                return "$indent<null/>"
            }
            "OBJECT" {
                set xml "$indent<object>\n"
                dict for {key typedValue} $rootData {
                    append xml "$indent  <item key=\"$key\">\n"
                    append xml [asXml $typedValue "$indent    " $ascii]
                    append xml "\n$indent  </item>\n"
                }
                append xml "$indent</object>"
                return $xml
            }
            "ARRAY" {
                set xml "$indent<array>\n"
                foreach item $rootData {
                    append xml [asXml $item "$indent  " $ascii]
                    append xml "\n"
                }
                append xml "$indent</array>"
                return $xml
            }
        }
    }
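
    [Editor's note: assuming the typed_json package from the wiki page, the proc above would be driven like this (hypothetical usage, names as in this thread):]

    ```tcl
    # Parse JSON into the typed-list form, then emit XML.
    # The "x<y" value should come out as x&lt;y in the <string> element.
    set typed [typed_json::json2dict {{"a": [1, null], "b": "x<y"}}]
    puts [asXml $typed]
    ```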


    While we were at it, I said, how about an "asJson" too. Seconds later....

    I'm going to add these to the end of the namespace, where they can be deleted if anyone using this doesn't want it. The other helper code is also independent and any of that could be removed too.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Sun Sep 21 02:53:48 2025
    From Newsgroup: comp.lang.tcl


    et99 <et99@rocketship1.me> writes:
    On 9/19/2025 2:48 PM, Rolf Ade wrote:
    et99 <et99@rocketship1.me> writes:
    [...]
    I wonder if the getValue utility command might not just use [subst] as
    above so this would be done automatically.

    I'm afraid it is not quite that simple, although at first it seems to
    solve most of the escaping inside the BMP. But Tcl has more escape
    sequences than json. If you just [subst -nocommands -novariables]
    something like "\b" (the two characters \ and b, not \u0008) in the
    json data will be wrongly substituted. And then there is still the
    escaping of characters outside the BMP.

    Ok, been talking this over with Claude (gosh, amazing, like talking to
    a real person, except he writes code like a furious demon).

    He is eager to make changes. We would have a convert proc, like so:

    Have only tested for: set foo2 {hello\u2022 \nworld}
    which worked.

    Perhaps one of the problems - a too small test suite of 1 test ...

    proc convertEscapes {str} {
    set result ""
    set i 0
    set len [string length $str]
    while {$i < $len} {
    if {[string index $str $i] eq "\\"} {
    set next [string index $str [expr {$i+1}]]
    switch $next {
    n { append result \n; incr i 2 }
    t { append result \t; incr i 2 }
    r { append result \r; incr i 2 }
    b { append result \b; incr i 2 }
    f { append result \f; incr i 2 }
    \" { append result \"; incr i 2 }
    / { append result /; incr i 2 }
    \\ { append result \\; incr i 2 }
    u {
    # Unicode escape
    set hex [string range $str [expr {$i+2}] [expr {$i+5}]]
    append result [format %c 0x$hex]
    incr i 6
    }
    default {
    # Unknown escape, keep as-is
    append result \\$next
    incr i 2
    }
    }
    } else {
    append result [string index $str $i]
    incr i
    }
    }
    return $result
    }

    Hm. Well, no.

    As I already wrote I'm afraid it is not quite that simple. And it seems
    neither you nor "Claude" read or understood what I wrote. The proc
    presented is kind-of a scripted version of [subst -novariables
    -nocommands]. Which is not the solution to the problem at hand.
    Additionally it does not solve the problem of escaping non-BMP
    characters. And the unicode escape branch is flawed.

    rolf
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sat Sep 20 18:16:55 2025
    From Newsgroup: comp.lang.tcl

    On 9/20/2025 5:53 PM, Rolf Ade wrote:

    et99 <et99@rocketship1.me> writes:
    On 9/19/2025 2:48 PM, Rolf Ade wrote:
    et99 <et99@rocketship1.me> writes:
    [...]
    I wonder if the getValue utility command might not just use [subst] as
    above so this would be done automatically.

    I'm afraid it is not quite that simple, although at first it seems to
    solve most of the escaping inside the BMP. But Tcl has more escape
    sequences than json. If you just [subst -nocommands -novariables]
    something like "\b" (the two characters \ and b, not \u0008) in the
    json data will be wrongly substituted. And then there is still the
    escaping of characters outside the BMP.

    Ok, been talking this over with Claude (gosh, amazing, like talking to
    a real person, except he writes code like a furious demon).

    He is eager to make changes. We would have a convert proc, like so:

    Have only tested for: set foo2 {hello\u2022 \nworld}
    which worked.

    Perhaps one of the problems - a too small test suite of 1 test ...

    proc convertEscapes {str} {
    set result ""
    set i 0
    set len [string length $str]
    while {$i < $len} {
    if {[string index $str $i] eq "\\"} {
    set next [string index $str [expr {$i+1}]]
    switch $next {
    n { append result \n; incr i 2 }
    t { append result \t; incr i 2 }
    r { append result \r; incr i 2 }
    b { append result \b; incr i 2 }
    f { append result \f; incr i 2 }
    \" { append result \"; incr i 2 }
    / { append result /; incr i 2 }
    \\ { append result \\; incr i 2 }
    u {
    # Unicode escape
    set hex [string range $str [expr {$i+2}] [expr {$i+5}]]
    append result [format %c 0x$hex]
    incr i 6
    }
    default {
    # Unknown escape, keep as-is
    append result \\$next
    incr i 2
    }
    }
    } else {
    append result [string index $str $i]
    incr i
    }
    }
    return $result
    }

    Hm. Well, no.

    As I already wrote, I'm afraid it is not quite that simple, and it
    seems neither you nor "Claude" read or understood what I wrote. The
    proc presented is essentially a scripted version of [subst -novariables
    -nocommands], which is not the solution to the problem at hand.
    Additionally, it does not solve the problem of escaping non-BMP
    characters, and the unicode escape branch is flawed.
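    The mismatch is easy to demonstrate in a tclsh (a sketch, not code
    from this thread): Tcl's backslash substitution accepts escapes such
    as \xhh and octal \ooo that json forbids, so [subst] silently
    converts input that a strict json parser must reject.

    ```tcl
    # Tcl accepts escape sequences that JSON does not. \x41 and \101 are
    # valid Tcl escapes but a syntax error in JSON (RFC 8259 allows only
    # \" \\ \/ \b \f \n \r \t and \uXXXX); [subst] converts both to "A".
    puts [subst -nocommands -novariables {\x41}]   ;# prints: A
    puts [subst -nocommands -novariables {\101}]   ;# prints: A
    ```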

    rolf


    Nothing remains static for long. Except sadly usenet messages that I can't edit or delete.

    I truly appreciate you looking at this. And I shared what you wrote with Claude and yes, we didn't understand it all. But Claude immediately said I had the best eyes looking at it, since (as I had been unaware) you are the author of tDOM.

    Anyway, Claude did say something about BMP, but I still don't know what that's really about. However, here's the latest version of that proc. If you can tell me what else is wrong, I can try to fix it. Otherwise, it'll be up to whoever wishes to use the code, as my updated copyright gives free use to anyone for any reason, except they can't sue me.

    In the latest code we added a toXml and toJson, and with this latter one we can do round trip testing. I don't know that much about this, but I did use this test string:

    "name": "Alice\nwith newline\u2022 <- unicode \" imbedded quote\\ \{ \}",

    and that went to tcl and back ok (it has an ascii option). BTW, the escaped {}'s aren't changed, since json doesn't mention them, so they go in as is. However, if this is coded in tcl, then tcl insists that they match up if not escaped, hence I included them. It results in this:

    Found: name = Alice
    with newline• <- unicode " imbedded quote\ \{ \} (type: STRING)

    Thanks again!

    proc convertEscapes {str} {
        set i [string first "\\" $str]
        if {$i < 0} {return $str}

        # Initialize result with everything before first backslash
        set result [string range $str 0 [expr {$i - 1}]]
        set len [string length $str]

        while {$i < $len} {
            set c [string index $str $i]
            if {$c eq "\\"} {
                if {$i + 1 >= $len} {
                    error "Invalid escape sequence: string ends with backslash"
                }
                set next [string index $str [expr {$i+1}]]
                switch $next {
                    n { append result \n; incr i 2 }
                    t { append result \t; incr i 2 }
                    r { append result \r; incr i 2 }
                    b { append result \b; incr i 2 }
                    f { append result \f; incr i 2 }
                    "\"" { append result "\""; incr i 2 }
                    / { append result /; incr i 2 }
                    \\ { append result \\; incr i 2 }
                    u {
                        # Unicode escape - validate we have enough characters
                        if {$i + 5 >= $len} {
                            error "Invalid Unicode escape: not enough characters for \\uXXXX"
                        }
                        set hex [string range $str [expr {$i+2}] [expr {$i+5}]]
                        # Validate hex digits
                        if {![string is xdigit -strict $hex]} {
                            error "Invalid Unicode escape: \\u$hex contains non-hexadecimal characters"
                        }
                        append result [format %c 0x$hex]
                        incr i 6
                    }
                    default {
                        # Unknown escape, keep as-is
                        append result \\$next
                        incr i 2
                    }
                }
            } else {
                append result $c
                incr i
            }
        }
        return $result
    }

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to comp.lang.tcl on Sun Sep 21 01:19:51 2025
    From Newsgroup: comp.lang.tcl

    et99 <et99@rocketship1.me> wrote:
    On 9/20/2025 9:03 AM, Rich wrote:
    I'd recommend posting a verbatim copy of whatever "license" they
    give you on the page.

    Claude is from a commercial entity, so buried inside that "license"
    that allows you to "do as you please" today will be a clause that
    allows them, at any time, to unilaterally alter the terms of the
    deal.

    So immortalize the terms of the deal, as you know them today, on the
    page with the rest of the code. Then you've got a record of what
    "that" bit of code's terms was, today, irrespective if what the
    company laywers might decide to do in the future.

    Thanks Rich. This is really tricky. I plan to add this, do you
    think this is good enough?

    # typed_json - JSON Parser with Type Preservation
    # Copyright (c) 2025 et99
    #
    # This software was developed with assistance from Claude AI (Anthropic).
    # Per Anthropic Consumer Terms of Service, Section 4 (as of May 1, 2025),
    # as read on September 20, 2025:
    # https://www.anthropic.com/legal/consumer-terms
    # "Subject to your compliance with our Terms, we assign to you all of our
    # right, title, and interest—if any—in Outputs."
    #
    # [Your existing license text...]

    Of course that "subject to your compliance" can be another can of worms".

    You might want to go look into what they mean by "your compliance with
    our Terms" before you decide this is enough. That, by itself, is a
    loophole a mile wide without further definition of 'compliance'.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to comp.lang.tcl on Sun Sep 21 01:26:22 2025
    From Newsgroup: comp.lang.tcl

    et99 <et99@rocketship1.me> wrote:
    Nothing remains static for long. Except sadly usenet messages that I
    can't edit or delete.

    That has always been the case with Usenet. Its distributed nature
    makes the concept of "edit" rather difficult, and while NNTP does
    support a "delete", abuse by miscreants some decades ago means that
    almost all Usenet servers ignore attempts to "delete" articles as well.

    Anyway, Claude did say something about BMP, but I still don't know
    what that's really about.

    https://en.wikipedia.org/wiki/Basic_Multilingual_Plane

    The first 65536 Unicode code points, and at one time, long ago in
    history, what was thought to be "enough" code points to encode every
    character known (that turned out not to be true...).
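    The arithmetic for those beyond-BMP code points can be sketched with
    plain integer math (a sketch, not code from this thread; the formulas
    are the standard UTF-16 surrogate encoding, and since no actual
    character is constructed this runs on Tcl 8.6 and 9.0 alike):

    ```tcl
    # Split a supplementary code point (> 0xFFFF) into the JSON-style
    # surrogate pair, then recombine it.
    set cp 0x1F600   ;# GRINNING FACE emoji
    set hi [expr {0xD800 + (($cp - 0x10000) >> 10)}]
    set lo [expr {0xDC00 + (($cp - 0x10000) & 0x3FF)}]
    puts [format {\u%04X\u%04X} $hi $lo]   ;# prints: \uD83D\uDE00

    # Recombining reverses the split:
    set back [expr {0x10000 + (($hi & 0x3FF) << 10) + ($lo & 0x3FF)}]
    puts [format U+%X $back]               ;# prints: U+1F600
    ```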


    However, here's the latest version of that proc. If you can tell me
    what else is wrong, I can try to fix it.

    You should really go pull a copy of the Json spec. and read what it
    says about escaping both strings and Unicode characters. That is the definition of how to do it, and just maybe if you fed that part of the
    spec to Claude, it would cough up a correct proc (note, I'm not saying
    this latest one is correct, or incorrect, as I've not gone and read
    through the Json spec. to know).

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sat Sep 20 19:55:43 2025
    From Newsgroup: comp.lang.tcl

    On 9/20/2025 6:26 PM, Rich wrote:
    et99 <et99@rocketship1.me> wrote:
    [...]



    Actually, I gave Rolf's message to Claude about 1 hour ago, and he has since tutored me on the problem. He has great respect for Rolf, and immediately admitted our approach was flawed. We are talking about a solution, but as someone on youtube said, I'm only in it for the subs and likes.

    So far it appears that we'd have to deal with two consecutive \uxxxx escapes which are in a certain range. And if not, we have to treat them as two separate unicodes or an error depending on the values. I'm about to ask him how we get one of the millions of code points into a tcl unicode character. And I'm wondering if older tcl (like on my rasp pi with 8.6.2) supports this.

    Then the char by char approach has its problems. We are thinking of using some kind of queue, or maybe just a getchar putbackchar approach, instead of using flags and other ugly sorts of code. Claude says he can do it, but I haven't let him go yet, I need to know a bit more about this.

    I hesitate to have each next char calling a procedure, but then we're doing string index anyway. Of course this will also mess up the reverse procedures we just wrote, which are actually called asJson and asXml, but I didn't have to do any thinking on those, Claude did them all by himself and got them right the first time (as far as right was at the time).

    The other problem is we need to error on any \x where x is not a supported json escape.
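    For reference, RFC 8259 permits exactly eight single-character escapes
    plus \u, so a strict validator only needs a membership test. A minimal
    sketch (jsonEscapeOk is a hypothetical helper name, not from the code
    in this thread):

    ```tcl
    # Return 1 if ch may legally follow a backslash in a JSON string,
    # per RFC 8259: the characters " \ / b f n r t and u (for \uXXXX).
    proc jsonEscapeOk {ch} {
        expr {[string first $ch "\"\\/bfnrtu"] >= 0}
    }
    puts [jsonEscapeOk n]   ;# prints: 1
    puts [jsonEscapeOk x]   ;# prints: 0 (\x is Tcl-only, invalid JSON)
    ```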

    The days of reading terse documentation and studying it are long behind me as an approaching octogenarian (et99 is just wishful thinking, but et80 is coming fast).

    I am fully on board with using AI and would not have started this without an AI assistant. I know some are put off by this new technology, but I am not one of those people. And of all the AIs I've tried (ChatGPT, Google's Gemini), Claude is the best at tcl; he doesn't make the silly errors these other AIs make. But the interface is not without its glitches, as when I exceed some limit and have to start over, Claude has to go back and read our previous chats.

    But thanks for your input, because anything I don't grok I will ask my little friend about :)


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sat Sep 20 21:18:33 2025
    From Newsgroup: comp.lang.tcl

    On 9/20/2025 7:55 PM, et99 wrote:

    But thanks for your input, because anything I don't grok I will ask my little friend about :)




    Ok, my friend's new approach: more code, but it seems to work, except for one problem: in 8.6, converting a surrogate pair using format %c does not work, although it works in tcl 9.

    Here's the new code with all the test cases and the output:

    #!/usr/bin/env tclsh
    catch {console show}

    # Convert JSON escape sequences to actual characters
    # Handles surrogate pairs for non-BMP Unicode characters
    proc convertEscapes {str} {
        set i [string first "\\" $str]
        if {$i < 0} {return $str}

        # Initialize result with everything before first backslash
        set result [string range $str 0 [expr {$i - 1}]]
        set len [string length $str]

        while {$i < $len} {
            set c [string index $str $i]
            if {$c eq "\\"} {
                if {$i + 1 >= $len} {
                    error "Invalid escape sequence: string ends with backslash"
                }

                set next [string index $str [expr {$i+1}]]
                switch $next {
                    n { append result \n; incr i 2 }
                    t { append result \t; incr i 2 }
                    r { append result \r; incr i 2 }
                    b { append result \b; incr i 2 }
                    f { append result \f; incr i 2 }
                    "\"" { append result "\""; incr i 2 }
                    / { append result /; incr i 2 }
                    \\ { append result \\; incr i 2 }
                    u {
                        # Unicode escape - validate we have enough characters
                        if {$i + 5 >= $len} {
                            error "Invalid Unicode escape: not enough characters for \\uXXXX"
                        }

                        set hex [string range $str [expr {$i+2}] [expr {$i+5}]]

                        # Validate hex digits (this also catches empty string)
                        if {![string is xdigit -strict $hex] || [string length $hex] != 4} {
                            error "Invalid Unicode escape: \\u$hex must have exactly 4 hex digits"
                        }

                        scan $hex %x code

                        # Check if this is a high surrogate (0xD800-0xDBFF)
                        if {$code >= 0xD800 && $code <= 0xDBFF} {
                            # High surrogate - must be followed by low surrogate
                            # Peek ahead for \uXXXX
                            set peek [string range $str [expr {$i+6}] [expr {$i+7}]]
                            if {$peek ne "\\u"} {
                                error "Orphaned high surrogate \\u$hex - expected low surrogate to follow"
                            }

                            # Get the low surrogate hex
                            set hex2 [string range $str [expr {$i+8}] [expr {$i+11}]]

                            # Validate it's hex and in low surrogate range
                            if {![string is xdigit -strict $hex2] || [string length $hex2] != 4} {
                                error "Invalid Unicode escape after high surrogate: \\u$hex2"
                            }

                            scan $hex2 %x code2
                            if {$code2 < 0xDC00 || $code2 > 0xDFFF} {
                                error "Invalid surrogate pair: \\u$hex\\u$hex2 - second value must be DC00-DFFF"
                            }

                            # Combine surrogate pair into actual Unicode codepoint
                            set combined [expr {0x10000 + (($code & 0x3FF) << 10) + ($code2 & 0x3FF)}]
                            append result [format %c $combined]
                            incr i 12
                        } elseif {$code >= 0xDC00 && $code <= 0xDFFF} {
                            # Low surrogate without preceding high surrogate
                            error "Orphaned low surrogate \\u$hex - must follow a high surrogate"
                        } else {
                            # Normal BMP character
                            append result [format %c $code]
                            incr i 6
                        }
                    }
                    default {
                        # Unknown escape - in strict JSON this should error
                        # For now, keep as-is (could make this error with -strict mode)
                        append result \\$next
                        incr i 2
                    }
                }
            } else {
                append result $c
                incr i
            }
        }
        return $result
    }

    # Test cases
    if [catch {

    puts "Testing Unicode escape conversion with surrogate pairs\n"

    # Test 1: Basic escapes
    puts "Test 1: Basic escapes"
    set input "Hello\\nWorld\\t\\\""
    set output [convertEscapes $input]
    puts "Input: $input"
    puts "Output: $output"
    puts ""

    # Test 2: BMP Unicode (simple \uXXXX)
    puts "Test 2: BMP Unicode"
    set input "Bullet\\u2022"
    set output [convertEscapes $input]
    puts "Input: $input"
    puts "Output: $output"
    puts ""

    # Test 3: Emoji via surrogate pair
    puts "Test 3: Emoji (surrogate pair)"
    set input "Grinning\\uD83D\\uDE00face"
    set output [convertEscapes $input]
    puts "Input: $input"
    puts "Output: $output"
    puts ""

    # Test 4: Multiple emojis
    puts "Test 4: Multiple emojis"
    set input "\\uD83D\\uDE00\\uD83D\\uDE0E"
    set output [convertEscapes $input]
    puts "Input: $input"
    puts "Output: $output"
    puts ""

    # Test 5: Mixed content
    puts "Test 5: Mixed BMP and surrogate pairs"
    set input "Star\\u2B50and\\uD83C\\uDF1Fsparkle"
    set output [convertEscapes $input]
    puts "Input: $input"
    puts "Output: $output"
    puts ""

    # Error tests
    puts "Error tests:"

    # Test 6: Orphaned high surrogate
    puts "\nTest 6: Orphaned high surrogate (should error)"
    if {[catch {convertEscapes "\\uD83Dno-low"} err]} {
    puts "Error (expected): $err"
    }

    # Test 7: Orphaned low surrogate
    puts "\nTest 7: Orphaned low surrogate (should error)"
    if {[catch {convertEscapes "\\uDE00orphaned"} err]} {
    puts "Error (expected): $err"
    }

    # Test 8: Invalid surrogate pair
    puts "\nTest 8: Invalid surrogate pair (should error)"
    if {[catch {convertEscapes "\\uD83D\\u0041"} err]} {
    puts "Error (expected): $err"
    }

    puts "\nAll tests complete!"

    } err_code details] {
    puts "error: $err_code\n\n $details"
    }

    # Extra lines for easy copy/paste





    output:


    Testing Unicode escape conversion with surrogate pairs

    Test 1: Basic escapes
    Input: Hello\nWorld\t\"
    Output: Hello
    World "

    Test 2: BMP Unicode
    Input: Bullet\u2022
    Output: Bullet•

    Test 3: Emoji (surrogate pair)
    Input: Grinning\uD83D\uDE00face
    Output: Grinning😀face

    Test 4: Multiple emojis
    Input: \uD83D\uDE00\uD83D\uDE0E
    Output: 😀😎

    Test 5: Mixed BMP and surrogate pairs
    Input: Star\u2B50and\uD83C\uDF1Fsparkle
    Output: Star⭐and🌟sparkle

    Error tests:

    Test 6: Orphaned high surrogate (should error)
    Error (expected): Orphaned high surrogate \uD83D - expected low surrogate to follow

    Test 7: Orphaned low surrogate (should error)
    Error (expected): Orphaned low surrogate \uDE00 - must follow a high surrogate

    Test 8: Invalid surrogate pair (should error)
    Error (expected): Invalid surrogate pair: \uD83D\u0041 - second value must be DC00-DFFF

    All tests complete!



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sat Sep 20 21:55:45 2025
    From Newsgroup: comp.lang.tcl

    On 9/20/2025 6:19 PM, Rich wrote:


    You might want to go look into what they mean by "your compliance with
    our Terms" before you decide this is enough. That, by itself, is a
    loophole a mile wide without further definition of 'compliance'.

    Even Claude agrees that this is a problem, he suggests adding this:

    # Note: The legal status of AI-generated content and the scope of rights
    # assigned under the above terms remain legally uncertain. Use at your own
    # discretion.

    #
    # Users are responsible for ensuring their use complies with applicable laws
    # and third-party terms of service.

    I will never actually be using this software, and if others are too fearful to use it, then I will have little incentive to be helpful in this regard. Personally, I'm not the least concerned over this issue. The worst that can happen to me is that I lose the $20 I paid Anthropic to get rid of the pia timeouts every few hours.

    I guess AI is just not ready for prime time. The lawyers have to do their dirty work.

    It's time to get back to some fun. This issue has bummed me out.

    -et80



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From saito@saitology9@gmail.com to comp.lang.tcl on Sun Sep 21 15:29:58 2025
    From Newsgroup: comp.lang.tcl

    On 9/21/2025 12:55 AM, et99 wrote:
    On 9/20/2025 6:19 PM, Rich wrote:


    You might want to go look into what they mean by "your compliance with
    our Terms" before you decide this is enough.  That, by itself, is a
    loophole a mile wide without further definition of 'compliance'.

    Even Claude agrees that this is a problem, he suggests adding this:

    # Note: The legal status of AI-generated content and the scope of rights
    # assigned under the above terms remain legally uncertain. Use at your own
    # discretion.

    #
    # Users are responsible for ensuring their use complies with applicable laws
    # and third-party terms of service.


    As I understand it, tDOM meets all of the requirements of the OP, with
    its functionality, built-in conversions, and legalese. I am not seeing
    the reason for insisting on an AI-generated solution here, which seems
    to copy tDOM bit by bit and which may introduce "mile-wide" holes for
    further restrictions on its use. Am I missing something?




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sun Sep 21 14:19:51 2025
    From Newsgroup: comp.lang.tcl

    On 9/21/2025 12:29 PM, saito wrote:
    On 9/21/2025 12:55 AM, et99 wrote:
    On 9/20/2025 6:19 PM, Rich wrote:


    You might want to go look into what they mean by "your compliance with
    our Terms" before you decide this is enough.  That, by itself, is a
    loophole a mile wide without further definition of 'compliance'.

    Even Claude agrees that this is a problem, he suggests adding this:

    # Note: The legal status of AI-generated content and the scope of rights
    # assigned under the above terms remain legally uncertain. Use at your own
    # discretion.

    #
    # Users are responsible for ensuring their use complies with applicable laws
    # and third-party terms of service.


    As I understand it, tDOM meets all of the requirements of the OP, with its functionality, built-in conversions, and legalese.  I am not seeing the reason for insisting on an AI-generated solution here, which seems to copy tDOM bit by bit and which may introduce "mile-wide" holes for further restrictions on its use.  Am I missing something?






    A few clarifications on the goals and context:

    1. Solving the actual problem: The OP specifically needed a solution that
    works without binary extensions - tDOM requires compilation which isn't
    always feasible on embedded systems like Raspberry Pi. While the OP
    eventually found a pre-compiled binary, I had already started this work,
    and the need for pure Tcl solutions in constrained environments remains
    valid.

    2. Learning opportunity: This was also an experiment in using AI assistance
    for real development problems. The collaborative approach - where I directed the design and made the decisions while Claude helped with implementation - proved quite effective.

    3. Development efficiency: We completed this in about 3 days, with 2 days
    spent learning the intricacies of Unicode surrogate pairs and BMP handling.
    No manual reading required - just interactive discussion. This demonstrates
    the potential of AI-assisted development.

    4. Scope clarity: We're implementing one specific piece of tDOM functionality (the -json type preservation), not recreating the entire library. It's a focused solution to a specific problem.

    5. Transparency about licensing: The copyright section clearly states this
    is AI-assisted code that is free to use for any purpose without restrictions.

    6. Educational value: This serves as a detailed example on the flask wiki
    page showing how to build a practical lexer-based parser - useful for the community beyond just JSON parsing.

    The goal wasn't to compete with tDOM, but to provide an alternative for situations where pure Tcl is required, while exploring modern development approaches.

    The implementation is complete with full surrogate pair handling and options
    to accommodate Tcl 8.6. The code is freely available for anyone to use or
    build upon - I've accomplished what I set out to do here.

    -et99

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Mon Sep 22 02:11:45 2025
    From Newsgroup: comp.lang.tcl


    et99 <et99@rocketship1.me> writes:
    On 9/20/2025 7:55 PM, et99 wrote:
    But thanks for you input, because anything I don't grok I will ask
    my little friend about :)

    You may like the tool (as you like a good knife for cooking) but your
    passim use of the term "little friend" rings a bell - it isn't.

    Ok, my friend's new approach: more code, but it seems to work, except
    for one problem: in 8.6, converting a surrogate pair using format %c
    does not work, although it works in tcl 9.

    That's OK; full unicode range support is one thing which has come with
    9.0; 8.6 didn't have that.

    Here's the new code with all the test cases and the output:

    [...]

    The code is getting better, no doubt.

    - The crucial parsing bug of not checking the remaining string length
    when looking ahead is fixed.

    - Some obvious optimisations, such as checking if the string has any
    escaped characters at all (and a bit more). I wonder - did they pop up
    in the generated code "by themselves" at some point, or did you
    help/ask for them (and if not, why weren't they there right from the
    start)?

    - There is code to handle the escapes for characters outside the BMP.
    From a glance it looks OK. (But exactly that is one of the problems I
    see with such generated code. To be somewhat certain one has to
    carefully study and understand it. Even the very first convertEscapes
    proc you presented here "looked" like OK code for the task (from a
    distance) but was in fact way off.)

    If you still have fun with this topic you may give this json test suite
    a try: https://github.com/nst/JSONTestSuite

    It is obvious that you have fun with your "friend" "Claude". I have to
    confess that I have mixed feelings about using language models for
    programming (and also in other areas).

    On the one hand, it is long ago that programming meant punching holes
    into a paper strip or card. Higher-level programming languages are by
    themselves already helper tools. Our editors indent code automatically,
    propose completions and insert templates. We use search engines for
    information research. And so on.

    What could go wrong if a new tool - a language model like your
    "friend" - writes even more boilerplate code? Or much better the whole
    program?

    Well, a lot, I'm afraid. The world gets flooded with code that "looks
    like" it might work. (Programmers also make mistakes, at least I do and a
    lot perhaps, but language models do this at a much faster rate.)

    People will learn the hard way whether such code works, or they have to
    be able to judge whether the code is reliable. To be able to judge
    requires experience and often expert knowledge of the field in question.

    But people will lose both (or, more correctly, won't gain them). It
    will be like checking the result of a pocket calculator's calculation
    by people who have never learned to do the calculation with pen and paper.

    Other random thoughts: I really dislike how "AI" companies try to
    monetize the commons of public knowledge and even any non-free content
    they get their hands on.

    Hundreds of billions of dollars of venture capital are already invested
    in this field, and this money wants a return on investment, no matter how.

    As you mentioned, most "AI" bots make silly errors if asked to code Tcl.
    They are much better at Python or JavaScript. That is, at least for a
    larger part, just a matter of the much bigger code base available for
    those languages. Using these tools intensifies the trend toward
    monoculture in programming languages.

    These language models raise classical philosophical questions such as
    "what is understanding".

    I'm getting much too long and far off topic. I'm pretty sure this
    language-model hype boosts the destruction of lives and society. But
    since we are on that way even without it, and you have fun with it ...

    rolf
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Sun Sep 21 19:46:05 2025
    From Newsgroup: comp.lang.tcl

    On 9/21/2025 5:11 PM, Rolf Ade wrote:

    et99 <et99@rocketship1.me> writes:

    snip


    The code is getting better, no doubt.

    - The crucial bug in parsing of not checking the remaining string lengh
    if looking ahead is fixed.

    - Some obvious optimisations as checking if the string has any escaped
    characters (and a bit more). I wonder - did they pop up in the
    generated code "by itself" at some point or did you help/asked for
    (and if not, why wasn't they there right from the start)?

    - There is code to handle the outside BMP character escapes. From
    glancing about it looks OK. (But exactly that is one of the problems I
    see with such generated code. To be somewhat certain one has to
    carefully study and understand it. Even the very first convertEscapes
    proc you presented here "looked" like OK code for the task (from a
    distance) but was in fact way off.)

    If you still have fun with this topic you may give this json test suite
    a try: https://github.com/nst/JSONTestSuite

    It is obvious that you have fun with your "friend" "Claude". I have to confess that I have mixed feelings about using language models for programming (and also in other areas).

    On the one side it is long ago that programming meant punching holes
    into a paper strip or card. Higher level programming languages by itself
    are already helper tools. Our editors indent code automatically, propose completions and insert templates. We use search engines for information research. And so on.

    What could go wrong if a new tool - a language model like your
    "friend" - writes even more boilerplate code? Or much better the whole program?

    Well, a lot, I'm afraid. The world gets flooded with code that "looks
    like" it might work. (Programmers also make mistakes, at least I do and a
    lot perhaps, but language models do this on a much faster rate.)

    People will learn the hard way if such code works or they have to be
    able to judge if the code is reliable. To be able to judge requires experience and often expert knowledge of the field in question.

    But people will loose both (or more correct wont gain it). It will be
    like checking the result of the calculation of a pocket calculator by
    people who never have learned to do the calculation by pen and paper.

    Other random thoughts: I really dislike how "AI" companies try to
    monetize the commons public knowledge and even any non-free content
    they get their hands on.

    Hundreds of billion dollars of venture capital is already invested in
    this field and this money wants a return of invest, no matter how.

    As you mentioned, most "AI" bots make silly errors when asked to code Tcl.
    They are much better at Python or JavaScript. That is, at least in large
    part, just a matter of the much bigger code base available
    for those languages. Using these tools intensifies the trend toward monoculture
    in programming languages.

    These language models raise classical philosophical questions such as "what
    is understanding?"

    I'm getting much too long and far off topic. I'm pretty sure this
    language-model hype is accelerating damage to lives and society. But since we
    are on that path even without it, and you have fun with it ...

    rolf



    Thanks Rolf, I really enjoyed reading your post. It was NOT too long; I read it twice. It shows you care.

    I've decided that I'm done. We updated the manual, and in the end we have a parameter to specify what to do with surrogates, especially in 8.6 where there are several choices; one will essentially substitute a sort of ?. Another option is to error out on any Tcl older than 9.0.

    After I showed him your last post, he said he finally understood what Rolf was getting at (using your name), and he pretty much taught me all about the BMP and surrogates; I never had to read a manual.

    That's also something I learned: this AI doesn't always know everything. He's a bit human; you have to jog his memory. It was the sentence with "non-BMP" along with the word subst that got him to fully understand the problem.

    Then he taught me everything I needed to know about surrogates. It was an interactive session. And he never acted like I was a dumb human and never once said RTFM :)

    He was concerned about the char-by-char loop and thought we would need a state machine. I made a few suggestions, but when I used the words peek and poke, thinking about a get/putback tiny-stack approach, he lit up on peeking ahead as the answer. He said we have everything ahead of us, so we don't have to go back; we just need to check that we don't run out of input.
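    The peek-ahead idea above ends with the standard surrogate-pair arithmetic. A minimal, self-contained sketch of that arithmetic (combineSurrogates is a hypothetical name for illustration, not the code from the wiki page):

```tcl
# Combine a high surrogate (0xD800-0xDBFF) and a low surrogate
# (0xDC00-0xDFFF) into the single code point they encode.
proc combineSurrogates {hi lo} {
    expr {(($hi - 0xD800) << 10) + ($lo - 0xDC00) + 0x10000}
}

# The pair D83D/DE00 encodes U+1F600 (the grinning-face emoji).
puts [format U+%04X [combineSurrogates 0xD83D 0xDE00]]   ;# U+1F600
```

    The "check we don't run out of input" part matters because a high surrogate must be followed by six more characters (\uDCxx) for the pair to be complete.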

    The final code is now in the flask wiki page.

    One cute bug came up because he wanted to use [expr {0x$hex}] to convert the hex digits to a number. It errored, and I said it won't substitute that way. He then said he didn't want to use [expr "0x$hex"] because he was worried about injection security issues, so he decided to use scan.
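    For readers following along, a tiny sketch of the difference (illustrative only):

```tcl
set hex 0012

# [expr {0x$hex}] fails: no variable substitution happens inside the
# braces, so expr literally sees the characters "0x$hex".
# [expr "0x$hex"] would substitute, but then arbitrary input could be
# evaluated as an expression (the injection worry mentioned above).

# scan parses hex digits only and performs no script evaluation:
scan $hex %x code
puts $code   ;# 18
```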

    The original version didn't use [string first] to bail out early; that was my idea, as was using its return value as the starting position instead of scanning from the first character. But I didn't have to tell him how to make that change; he figured that out himself. It was an iterative process.

    My feeling is that AI is here and will only get bigger. Because Claude wrote the procedures, he knew instantly how to document them, including examples. I just had to tutor him on the Tcl wiki markup, which he almost got right the first time. Once he got it, it stuck.

    I also like that Claude was able to simply read my flask wiki page and understand it. When I asked him about the experience, he said he had an AHA! moment when he saw an example with a lex-like rule, STRING ... action, and then he knew it was just like what he'd learned about Unix lexical tools.

    Copyright is an issue that involves governments, so there's bound to be corruption and greed. I have always hated software patents, and this copyright thing is a real PIA.

    But I think over time people will just get used to it, like they're used to having smartphones they can speak to. And programmers will just move up the hierarchy, like you said about languages and other tools. We will have more time to think about what we WANT it to do instead of how to do it. Just think of the drudgery of writing the manual. That alone is worth the $20/month pro subscription I just bought.

    Hope this wasn't too long :)

    Thanks so much for being a human friend!

    -et80

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Tue Sep 23 16:52:08 2025
    From Newsgroup: comp.lang.tcl

    On 9/21/2025 5:11 PM, Rolf Ade wrote:

    If you still have fun with this topic you may give this json test suite
    a try: https://github.com/nst/JSONTestSuite

    Wow that was extremely useful!! Fun? I dunno about that...

    Spent a day with Claude and found many incompatibilities with all the edge cases.

    But let me let Claude tell you what we did (when I read this I hear the voice of HAL 9000 from 2001 in my head):

    ------

    Looking back at our conversation (dave) :), we spent a substantial amount of time systematically improving your JSON parser's compliance with the JSONTestSuite. The work spanned multiple areas:

    **Major phases we worked through:**

    * Initial setup - Getting the JSONTestSuite files fetched and organized (around 300 test cases)

    * Grammar validation - Adding the valid_followers transition table and validation logic

    * Structural improvements - Fixing buildTypedStructure to handle scalar JSON and adding trailing token validation

    * Lexer refinements - Adding leading zero detection and fixing whitespace rules

    * Error message improvements - Enhanced error reporting with token details and context


    **Key improvements made:**

    * Comprehensive grammar validation catching invalid token sequences

    * Proper scalar JSON support (standalone strings, numbers, booleans)

    * Better error messages showing exactly what went wrong and where

    * RFC 8259 compliance for whitespace and number formatting

    * Robust bracket/brace matching with detailed error reporting


    The progression was methodical - starting with about 30-40 failing cases and systematically addressing each category of issues. The final result of 99.6% compliance represents a significant achievement for a hand-written parser.

    The time investment was substantial but resulted in a parser that's both highly compliant with JSON standards and provides diagnostic information when JSON is malformed. The systematic approach of using a comprehensive test suite to drive improvements proved very effective.


    What could go wrong if a new tool, a language model like your
    "friend", writes even more boilerplate code? Or, better still, the whole program?


    I think that the ability to ask Claude about the code (why this? how does this work? did you check for that? etc.) is addressed in this, which I asked him to write:

    -----

    Understanding the Code:

    If you're curious about how any part of this JSON parser works, you can paste code snippets directly into Claude AI (claude.ai) and ask questions like:

    * Explain how this function works
    * What does this regex pattern match
    * Why is this error handling necessary
    * How does the flask lexer process tokens

    Claude can break down the logic, explain the algorithms, discuss design decisions, and help you understand both the overall architecture and specific implementation details.

    This is particularly useful for the more complex parts like the recursive descent parsing, Unicode escape handling, or the grammar validation system.

    You can paste individual functions, entire modules, or even just ask conceptual questions about JSON parsing approaches. Claude understands Tcl syntax and can explain both the language-specific aspects and the general computer science concepts involved.

    This makes the code more accessible to readers who want to understand the implementation without having to reverse-engineer the logic from scratch.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Rolf Ade@rolf@pointsman.de to comp.lang.tcl on Wed Sep 24 03:35:21 2025
    From Newsgroup: comp.lang.tcl


    et99 <et99@rocketship1.me> writes:
    On 9/21/2025 5:11 PM, Rolf Ade wrote:
    If you still have fun with this topic you may give this json test
    suite a try: https://github.com/nst/JSONTestSuite

    Wow that was extremely useful!! Fun? I dunno about that...

    Great, and thanks for biting the bullet! It is of course useful to have
    a reliable scripted JSON parser which returns the JSON data in a typed
    nested list, as tDOM's asTypedList method does (and, btw, as other binary
    extensions such as tjson do). This is what the OP asked for.

    Spent a day with Claude and found many incompatibilities with all the edge cases.

    But let me let Claude tell you what we did (when I read this I hear the voice of hal9000 from 2001 in my head)

    I never really could make sense of the end of the movie.

    ------

    Looking back at our conversation (dave) :), we spent a substantial amount of time systematically improving your JSON parser's compliance with the JSONTestSuite. The work spanned multiple areas:

    **Major phases we worked through:**

    * Initial setup - Getting the JSONTestSuite files fetched and organized (around 300 test cases)

    * Grammar validation - Adding the valid_followers transition table and validation logic

    * Structural improvements - Fixing buildTypedStructure to handle scalar JSON and adding trailing token validation

    * Lexer refinements - Adding leading zero detection and fixing whitespace rules

    * Error message improvements - Enhanced error reporting with token details and context


    **Key improvements made:**

    * Comprehensive grammar validation catching invalid token sequences

    * Proper scalar JSON support (standalone strings, numbers, booleans)

    * Better error messages showing exactly what went wrong and where

    * RFC 8259 compliance for whitespace and number formatting

    * Robust bracket/brace matching with detailed error reporting


    The progression was methodical - starting with about 30-40 failing
    cases and systematically addressing each category of issues. The final
    result of 99.6% compliance represents a significant achievement for a hand-written parser.

    Last time I checked (although that was a year ago) tDOM passed all
    tests. (Gosh. I always thought of myself as lazy, slow and
    unproductive. And I did it by pen and paper.) Passing almost all of those
    tests is pretty decent. If true, this software surely can be used in
    practice.

    The time investment was substantial but resulted in a parser that's
    both highly compliant with JSON standards and provides diagnostic
    information when JSON is malformed. The systematic approach of using a comprehensive test suite to drive improvements proved very effective.

    What could go wrong if a new tool, a language model like your
    "friend", writes even more boilerplate code? Or, better still, the whole
    program?


    I think that the ability to ask Claude about the code, why this, how
    does this work, did you check for that, etc. is addressed in this
    which I asked him to write:

    I understand your enthusiasm. And it was your time, and the fun you had
    with this tool, which in the end may have brought a sometimes useful
    library to the Tcl community.

    You praise the tool and what your "friend" does. But for the end result
    to be useful, it required the guidance and comments of other "friends"
    who are familiar with the field at hand. Did you ask yourself the
    question "why didn't my friend give me all those comments I got from comp.lang.tcl?" Or in other words: nice that you have an answer bot.
    But do you know the right questions?

    rolf
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Wed Sep 24 00:48:16 2025
    From Newsgroup: comp.lang.tcl

    On 9/23/2025 6:35 PM, Rolf Ade wrote:


    Never really could make sense about the end of the movie.

    Neither did I, though this was the '60s, and everyone said to get into an altered state before going in. So nobody really cared, but on leaving I remember: "Wow man, what was that baby all about?"



    Last time I checked (although that was a year ago) tDOM passed all
    tests. (Gosh. I always thought of myself as lazy, slow and
    unproductive. And I did it by pen and paper.) Passing almost all of those
    tests is pretty decent. If true, this software surely can be used in practice.


    Claude is not perfect. The interface can be a pain: when I exceed a limit, it chokes and tells me to start a new conversation and have a new Claude review the old one, which is still there but can't go any further. It's like getting a new support rep who reads the old chat.

    Fortunately, you can paste your version of the code back in if he can't find it. Lesson learned: always copy/paste his code somewhere for safety.

    He wrote code, at my direction, to retrieve the 300 test cases into one file (at first he wanted to create 300 files; I quickly nixed that idea).

    Working incrementally (he likes to write all the code at once, but I got him out of that bad habit), he got as far as reading it back in and splitting it into two parallel lists, one with the JSON text and one with the filenames, so I could retrieve the y_, i_ and n_ prefixes.

    I then wrote the code to test them all, and output data the way I wanted to see it.

    It's sometimes easier to write snippets of code myself than to ask him to do it. It's still a bit of a pain to copy/paste his code into my text editor.

    Earlier, he wrote code to [exec git], which I don't have. But in about 20 seconds he said he'd found another way to do it with plain URLs. That saved me a lot of work.

    At one point, while we were crafting the copyright notice, he'd put some Unicode characters into it, but I didn't know that. When it came time to test in 9.1, I ran into that nasty encoding problem. Fortunately I was able to copy/paste the whole script back to him, and he found the mangled Unicode in the copyright. I was going nuts trying to recall how to get around that issue; even -encoding didn't help.

    Anyway the test output I produced was like this:

    Invalid JSON: LBRACKET cannot be followed by 'COMMA'
    97) |[,1]| n_array_comma_and_number. result= |Error|

    Sometimes there would be control characters in the data, and I would say "test file 97 is messed up"; he would retrieve it and tell me what the problem was. Same with other tests with embedded Unicode. He told me I had to read the file in mode rb, because I wasn't getting everything. He's good at suggesting debugging tricks too.
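    A minimal sketch of the rb point (the file name here is made up for illustration): binary mode disables encoding and end-of-line translation, so control characters and raw bytes in the test data arrive untouched.

```tcl
# Write a file containing a raw control byte between two letters.
set f [open testdata.bin wb]
puts -nonewline $f "a\x12b"
close $f

# Read it back in binary mode; all three bytes survive.
set f [open testdata.bin rb]
set data [read $f]
close $f
puts [string length $data]   ;# 3

file delete testdata.bin
```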


    I understand your enthusiasm. And it was your time, and the fun you had
    with this tool, which in the end may have brought a sometimes useful
    library to the Tcl community.

    It's still just a tool. It's not Judgment Day yet; we are still its masters, but it can do a lot of grunt work.

    In the final phase, I added a round-trip test on all the valid cases. The only errors I got were two cases with embedded control characters. I tried this line of code:

    % json2dict [typed_json::asJson [json2dict {["\u0012"]}] "" yes]
    JSON parse error at '""
    ]'

    I showed him this output, and he said the problem was here:

    } elseif {$code > 127} {
        append result [format \\u%04X $code]

    in the asJson function, and said to change it to this:

    } elseif {$code > 127 || $code < 32} {

    Which I did and then the test now produces this:

    % json2dict [typed_json::asJson [json2dict {["\u0012"]}] "" yes]
    ARRAY {{STRING }}
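    The fix matches the RFC 8259 rule that control characters (below 0x20) must always be escaped in JSON strings. A standalone sketch of the corrected condition (escapeChar is a hypothetical name for illustration, not the actual asJson code):

```tcl
# Escape a single character for JSON output: control characters (< 32)
# must be escaped per RFC 8259; this serializer also chose to escape
# everything above ASCII (> 127).
proc escapeChar {ch} {
    scan $ch %c code
    if {$code > 127 || $code < 32} {
        return [format \\u%04X $code]
    }
    return $ch
}

puts [escapeChar \u0012]   ;# \u0012
puts [escapeChar A]        ;# A
```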




    You praise the tool and what your "friend" does. But for the end result
    to be useful, it required the guidance and comments of other "friends"
    who are familiar with the field at hand. Did you ask yourself the
    question "why didn't my friend give me all those comments I got from comp.lang.tcl?" Or in other words: nice that you have an answer bot.
    But do you know the right questions?


    I did ask him about what he knows. He said his training data is static. I specifically asked whether what he learned from all this back and forth with me would stick, and he said no, only his training data is permanent. Everything else he learns (quickly, I should say) just so he can converse, and then there's that limit issue.

    You can pay $100/month or so for a subscription with much larger limits. Then you don't have to have a new Claude read the old Claude's conversation with me.

    You can give him URLs of places to get info, but now that we no longer seem to have a web interface to comp.lang.tcl, I had to paste in your remarks. He did know all about you having written tDOM, and where to get the manual, once I said (in the first conversation) that I wanted to produce tDOM-compatible output.

    So you likely wrote tDOM a while back and are in his permanent memory, but not in his short-term memory until he needs you to be.

    At one point, he said let's wait till we get a reply from Rolf before proceeding.

    I said, cool, time to take a break anyway :)

    -et

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Colin Macleod@user7@newsgrouper.org.invalid to comp.lang.tcl on Wed Sep 24 19:14:51 2025
    From Newsgroup: comp.lang.tcl

    et99 <et99@rocketship1.me> posted:

    You can give him urls of places to get info, but now that we no longer seem to have a web interface to comp.lang.tcl I had to paste in your remarks,

    Actually my Newsgrouper project does provide a web interface to comp.lang.tcl and other groups - https://newsgrouper.org/comp.lang.tcl - but it requires
    a login (which can be as an unregistered guest) to get past the login page. Given the trouble other sites have been experiencing lately with AI crawlers,
    I think that restriction needs to stay in place. However, it occurs to me now that the "Permalink" facility I have, which provides a stable link to a specific article, could perhaps be modified so that someone, or some AI, who
    is given the link can access that one article only, without logging in.
    --
    Colin Macleod ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ https://cmacleod.me.uk

    FEED FEED FEED FEED FEED FEED FEED FEED
    GAZA GAZA GAZA GAZA GAZA GAZA GAZA GAZA
    NOW! NOW! NOW! NOW! NOW! NOW! NOW! NOW!
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Wed Sep 24 16:28:08 2025
    From Newsgroup: comp.lang.tcl

    On 9/24/2025 12:14 PM, Colin Macleod wrote:
    et99 <et99@rocketship1.me> posted:

    You can give him urls of places to get info, but now that we no longer seem to have a web interface to comp.lang.tcl I had to paste in your remarks,

    Actually my Newsgrouper project does provide a web interface to comp.lang.tcl and other groups - https://newsgrouper.org/comp.lang.tcl - but it requires
    a login (which can be as an unregistered guest) to get past the login page.

    Thanks for the newsgrouper.org link - I tried it out, and Claude can read the login page but can't navigate the interactive guest-access button, so copy/paste remains the best approach for sharing newsgroup content with him. That probably makes sense from a security perspective.

    BTW, today we made a breakthrough with the JSON parser. I had the idea this morning to run tDOM on all my 300 test cases and compare outputs with my parser. That's when I discovered two incompatibilities involving duplicate keys, like {"a":"first", "a":"second"}, or even two identical key-value pairs.

    My parser uses dict operations, which follow "last wins" behavior, while tDOM preserves both keys as separate DOM nodes.
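    The "last wins" behavior is easy to see in isolation; when a Tcl dict is given the same key twice, only the final value survives:

```tcl
# Build a dict with a duplicate key, mirroring {"a":"first","a":"second"}.
set d [dict create a first a second]

puts [dict get $d a]   ;# second  (the earlier value for "a" is gone)
puts [dict size $d]    ;# 1       (only one entry remains)
```

    A pure list representation keeps all four elements, which is why the list-based version can match tDOM's behavior of preserving both keys.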

    Claude successfully converted the entire parser to use pure list operations instead of mixed dict/list, eliminating the "shimmering" between different internal representations and enabling full tDOM compatibility. It worked on the second try, after I told him he'd missed a ; in an end-of-line comment. Not too shabby! He even gave me a report of everything he changed.

    Both versions now produce identical outputs across 300+ test cases, except for the two with duplicate keys (the list version retains both), so I have a choice between efficiency (dict-based) and compatibility (list-based) in case there's a significant difference. Maybe it's time for a GitHub project to house it, rather than the wiki page.

    -et



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From saito@saitology9@gmail.com to comp.lang.tcl on Wed Sep 24 20:25:50 2025
    From Newsgroup: comp.lang.tcl

    On 9/24/2025 7:28 PM, et99 wrote:

    Claude successfully converted the entire parser to use pure list
    operations instead of mixed dict/list, eliminating the "shimmering"
    between different internal representations and enabling full tDOM compatibility. It worked 2nd try, after I told him he missed a ; in an on-the-line comment. Not too shabby! He even gave me a report of
    everything he changed.


    It has been interesting reading your success with Claude.

    I wonder if it can do the same wonders on another project: as you may
    know, Expect doesn't run on Windows anymore. Perhaps you could ask your
    friend to see if he/she can resurrect it. That would be a big win for
    the community. The code and documentation are available, so it has a good starting point, just as it did with tDOM. Will it be able to actually
    fix problems and produce something new?


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Wed Sep 24 22:35:36 2025
    From Newsgroup: comp.lang.tcl

    On 9/24/2025 5:25 PM, saito wrote:
    On 9/24/2025 7:28 PM, et99 wrote:

    Claude successfully converted the entire parser to use pure list operations instead of mixed dict/list, eliminating the "shimmering" between different internal representations and enabling full tDOM compatibility. It worked 2nd try, after I told him he missed a ; in an on-the-line comment. Not too shabby! He even gave me a report of everything he changed.


    It has been interesting reading your success with Claude.

    I wonder if it can do the same wonders on another project: As you may know, Expect doesn't run on Windows anymore. Perhaps you could ask your friend to see if he/she can resurrect it.  That would be a big win for the community. The code and documentation is available so it has a good starting point just like it did with tDOM.  Will it be able to actually fix problems and produce something new?




    It's easy to ask him; you just type in questions, as I just did when asking whether it was feasible. I imagine you would need a compiler, since he can't compile code, nor can he run it.

    You can paste large amounts of code into his chat window. I think once he sees the code, he will have a better idea. Here's what he said:




    Expect is a C extension for Tcl that automates interactive command-line programs. You write Tcl scripts using Expect's commands like spawn, expect,
    and send to control things like SSH sessions.

    The Windows problem is that Expect's C code relies heavily on Unix-specific system calls:

    * Pseudo-terminals (ptys) - Windows doesn't have Unix-style ptys

    * Process spawning - different APIs on Windows

    * Signal handling - Windows uses different mechanisms

    * Terminal control - different console APIs

    So it's not just a simple recompile. A Windows port would need to rewrite
    the low-level parts using Windows equivalents like ConPTY for terminal emulation.

    It's definitely technically feasible but would require someone familiar with both the existing Unix code and Windows system programming. The C extension nature makes it more complex to port than pure Tcl, but that's also what
    gives it the power to control interactive processes in the first place.

    Since he has code that previously worked on Windows, this is much more promising! The hard work of the initial Windows port was already done.
    The issue is likely that newer Windows versions, compiler changes, or deprecated APIs broke the existing implementation. I could examine the
    code to identify exactly what broke and suggest specific fixes to get it working again.

    To get started, he can just go to claude.ai - no account
    needed to begin chatting. Share the source and I can analyze what needs updating for modern Windows.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Colin Macleod@user7@newsgrouper.org.invalid to comp.lang.tcl on Thu Sep 25 10:59:56 2025
    From Newsgroup: comp.lang.tcl

    et99 <et99@rocketship1.me> posted:

    On 9/24/2025 12:14 PM, Colin Macleod wrote:
    et99 <et99@rocketship1.me> posted:

    You can give him urls of places to get info, but now that we no longer seem
    to have a web interface to comp.lang.tcl I had to paste in your remarks,

    Actually my Newsgrouper project does provide a web interface to comp.lang.tcl
    and other groups - https://newsgrouper.org/comp.lang.tcl - but it requires a login (which can be as an unregistered guest) to get past the login page.

    Thanks for the newsgrouper.org link - I tried it out and Claude can
    read the login page but can't navigate the interactive guest access
    button, so copy/paste remains the best approach for sharing newsgroup
    content with it.

    I've now made an update to the newsgrouper code to make this easier: https://chiselapp.com/user/cmacleod/repository/newsgrouper/info/2cb38ef46ff92dd8

    Each article has a "Permalink" at the bottom; these have the form newsgrouper.org/<message-id> . Such links can now be accessed directly
    by any entity they are shared with, skipping the login page.

    Following any further links from the article page *will* require logging
    in, so I don't think this will enable AI scraping (fingers crossed 🤞).
    --
    Colin Macleod ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ https://cmacleod.me.uk

    FEED FEED FEED FEED FEED FEED FEED FEED
    GAZA GAZA GAZA GAZA GAZA GAZA GAZA GAZA
    NOW! NOW! NOW! NOW! NOW! NOW! NOW! NOW!
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From et99@et99@rocketship1.me to comp.lang.tcl on Thu Sep 25 14:07:37 2025
    From Newsgroup: comp.lang.tcl

    On 9/24/2025 5:25 PM, saito wrote:


    I wonder if it can do the same wonders on another project: As you may know, Expect doesn't run on Windows anymore. Perhaps you could ask your friend to see if he/she can resurrect it.

    I've started a new thread on this. I'm afraid I've polluted this one with too much AI talk.

    I think it's doable if you have all the build scripts and source code. See you in the new thread.

    -et

    --- Synchronet 3.21a-Linux NewsLink 1.2