• Fastest way to run two external processes

    From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Wed Apr 29 07:38:23 2026
    From Newsgroup: comp.lang.tcl

    I need to run two external processes (on Linux):

    pdftotext -tsv one.pdf
    pdftotext -tsv two.pdf

    For each one I need to acquire the output and post-process it.
    Both are completely independent.
    (However, once I've finished post-processing I then do some work on
    both sets of post-processed data together.)

    Each external process takes about 3 secs so it takes just over 6 secs
    to acquire the data from both processes.

    When I've done something similar in Python I've used the multiprocessing
    module and this has got my runtime close to the 3 secs.

    In my experiments with Tcl's threading I've found the threading startup overhead to be rather large.

    What is the fastest way to run two independent processes concurrently
    and acquire their outputs using Tcl?
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Wed Apr 29 08:51:17 2026
    From Newsgroup: comp.lang.tcl

    On Wed, 29 Apr 2026 07:38:23 -0000 (UTC), Mark Summerfield wrote:

    > I need to run two external processes (on Linux):
    >
    > pdftotext -tsv one.pdf
    > pdftotext -tsv two.pdf
    >
    > For each one I need to acquire the output and post-process it.
    > Both are completely independent.
    > (However, once I've finished post-processing I then do some work on
    > both sets of post-processed data together.)
    >
    > Each external process takes about 3 secs so it takes just over 6 secs
    > to acquire the data from both processes.
    >
    > When I've done something similar in Python I've used the multiprocessing
    > module and this has got my runtime close to the 3 secs.
    >
    > In my experiments with Tcl's threading I've found the threading startup
    > overhead to be rather large.
    >
    > What is the fastest way to run two independent processes concurrently
    > and acquire their outputs using Tcl?

    Here's my serial version:

    proc app::serial {pdftotext pdf1 pdf2} {
        puts serial
        set pdf1tsv [exec $pdftotext -tsv $pdf1 -]
        set pdf2tsv [exec $pdftotext -tsv $pdf2 -]
        list $pdf1tsv $pdf2tsv
    }

    This takes ~2 sec for two ~650 page PDFs.

    With some help from Gemini (after I got past non-working and slow
    solutions) I did a multiprocessing version:

    proc app::multiprocess {pdftotext pdf1 pdf2} {
        set p1 [open "|$pdftotext -tsv $pdf1 - 2>@1" r]
        try {
            set p2 [open "|$pdftotext -tsv $pdf2 - 2>@1" r]
            try {
                fconfigure $p1 -blocking 0
                fconfigure $p2 -blocking 0
                set pdf1tsv ""
                set pdf2tsv ""
                while {![eof $p1] || ![eof $p2]} {
                    append pdf1tsv [read $p1]
                    append pdf2tsv [read $p2]
                    after 1
                }
            } finally {
                close $p2
            }
        } finally {
            close $p1
        }
        list $pdf1tsv $pdf2tsv
    }

    This takes ~1 sec.
  • From meshparts@alexandru.dadalau@meshparts.de to comp.lang.tcl on Wed Apr 29 11:24:34 2026
    From Newsgroup: comp.lang.tcl

    On 29.04.2026 at 10:51, Mark Summerfield wrote:
    > This takes ~1 sec.
    So it's 2x faster, as expected.
    What's the issue?
  • From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Wed Apr 29 09:46:56 2026
    From Newsgroup: comp.lang.tcl

    On Wed, 29 Apr 2026 11:24:34 +0200, meshparts wrote:

    > On 29.04.2026 at 10:51, Mark Summerfield wrote:
    >> This takes ~1 sec.
    > So it's 2x faster, as expected.
    > What's the issue?

    When I originally asked I only had the serial approach.
    I replied to myself once I had the multiprocessing approach which
    solved the problem so that people could see it was solved.
  • From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Wed Apr 29 12:30:12 2026
    From Newsgroup: comp.lang.tcl

    * Mark Summerfield <m.n.summerfield@gmail.com>
    | With some help from Gemini (after I got past non-working and slow
    | solutions) I did a multiprocessing version:

    | proc app::multiprocess {pdftotext pdf1 pdf2} {
    |     set p1 [open "|$pdftotext -tsv $pdf1 - 2>@1" r]
    |     try {
    |         set p2 [open "|$pdftotext -tsv $pdf2 - 2>@1" r]
    |         try {
    |             fconfigure $p1 -blocking 0
    |             fconfigure $p2 -blocking 0

    Depending on the output of $pdftotext, some -encoding option might be necessary, too.

    |             set pdf1tsv ""
    |             set pdf2tsv ""
    |             while {![eof $p1] || ![eof $p2]} {
    |                 append pdf1tsv [read $p1]
    |                 append pdf2tsv [read $p2]
    |                 after 1
    |             }

    I don't like the busy-waiting loop for eof, but a solution using
    fileevents would require namespace vars or globals to collect the output
    and to signal 'done', so ymmv.

    R'
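
The fileevent approach mentioned above could look roughly like the sketch below. It is one possible shape, not code from the thread: the `collect` namespace and its procs are hypothetical names, the `sh -c` commands are stand-ins for the two pdftotext calls, and `-encoding utf-8` is an assumption that should be matched to whatever the external program emits.

```tcl
namespace eval collect {
    variable output          ;# array: key -> text collected so far
    variable pending 0       ;# number of pipelines that have not hit eof
}

# Start a command pipeline and collect its stdout under the given key.
proc collect::start {key cmd} {
    variable output
    variable pending
    set output($key) ""
    set chan [open |$cmd r]
    # utf-8 is an assumption; set it to the encoding the program writes
    chan configure $chan -blocking 0 -encoding utf-8
    chan event $chan readable [list ::collect::onReadable $key $chan]
    incr pending
}

# Called by the event loop whenever data (or eof) is available.
proc collect::onReadable {key chan} {
    variable output
    variable pending
    append output($key) [chan read $chan]
    if {[chan eof $chan]} {
        # Blocking mode makes close wait for the child; this sketch
        # ignores a non-zero exit status by catching the error.
        chan configure $chan -blocking 1
        catch {chan close $chan}
        incr pending -1
    }
}

# Run the event loop until every started pipeline has finished.
proc collect::wait {} {
    variable pending
    while {$pending > 0} {
        vwait ::collect::pending
    }
}

# Stand-in commands; replace with [list $pdftotext -tsv $pdf1 -] etc.
collect::start one [list sh -c {sleep 0.2; echo first}]
collect::start two [list sh -c {sleep 0.2; echo second}]
collect::wait
puts "one=[string trim $collect::output(one)] two=[string trim $collect::output(two)]"
```

Counting pending pipelines, rather than setting a single 'done' flag, means the wait works for any number of processes and there is no sleep between reads.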
  • From abu@user13892@newsgrouper.org.invalid to comp.lang.tcl on Thu Apr 30 00:51:16 2026
    From Newsgroup: comp.lang.tcl


    I don't understand why Threads are not used (in particular Thread Pools)

    Here's my solution. Please tell me if there's a significant speed penalty.

    # ===============================

    package require Thread

    # keep at least 3 workers alive; jobs beyond the pool's capacity are queued
    set mytpool [tpool::create -minworkers 3]

    # ..
    # Build the jobs with [list] so that $pdftotext and $pdf1..$pdf3 are
    # substituted here; inside braces they would reach the worker threads
    # unsubstituted, where those variables do not exist.
    set jobs [list \
        [list exec $pdftotext -tsv $pdf1 - 2>@1] \
        [list exec $pdftotext -tsv $pdf2 - 2>@1] \
        [list exec $pdftotext -tsv $pdf3 - 2>@1]]

    set T0 [clock milliseconds]
    set myjobIDs {}
    # schedule all jobs
    foreach job $jobs {
        lappend myjobIDs [tpool::post -nowait $mytpool $job]
    }
    # -nocomplain: RESULT may not exist yet on the first run
    unset -nocomplain RESULT
    puts "waiting for RESULT..."
    while { [llength $myjobIDs] > 0 } {
        # get the completed jobs; myjobIDs is updated with the list of the still pending jobs
        set completedJobs [tpool::wait $mytpool $myjobIDs myjobIDs]
        foreach job $completedJobs {
            puts "== Job $job completed at [expr {[clock milliseconds]-$T0}] msec"
            set RESULT($job) [tpool::get $mytpool $job]
        }
    }

    puts "Result saved in the RESULT() array"
    puts "Total processing time: [expr {[clock milliseconds]-$T0}] msec"
  • From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Thu Apr 30 14:23:58 2026
    From Newsgroup: comp.lang.tcl

    * abu <user13892@newsgrouper.org.invalid>
    | I don't understand why Threads are not used (in particular Thread Pools)

    Most probably because Mark stated in Message-ID: <10sschf$3nvs2$1@dont-email.me>

    > In my experiments with Tcl's threading I've found the threading
    > startup overhead to be rather large.

    | Here's my solution. Please tell me if there's a significant speed penalty.

    Did you compare your version to Mark's solution? This would be the best comparison when running on the same hardware...

    R'
  • From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Fri May 1 07:04:08 2026
    From Newsgroup: comp.lang.tcl

    I created a tiny test program (65 LOC; shown at the end) to
    compare timings. I did multiple timings and here're the averages:

    serial (2 LOC) 2.020 sec
    multiprocess (19 LOC) 1.055 sec
    threaded (13 LOC) 1.061 sec

    Since the difference between the multiprocess and threaded
    approaches is so small and that the threaded code is simpler
    and more appealing, I'm going to use the threaded version in
    my programs (which only ever work with two PDFs at a time)
    — so thank you "abu"!

    #!/usr/bin/env tclsh9
    # usage: time ./concurrent.tcl <s|m|t> <file1.pdf> <file2.pdf>

    package require thread

    proc main {} {
        set pdftotext [auto_execok pdftotext]
        set pdf1 [lindex $::argv 1]
        set pdf2 [lindex $::argv 2]
        switch [lindex $::argv 0] {
            s { serial $pdftotext $pdf1 $pdf2 }
            m { multiprocess $pdftotext $pdf1 $pdf2 }
            t { threaded $pdftotext $pdf1 $pdf2 }
        }
    }

    proc serial {pdftotext pdf1 pdf2} {
        puts -nonewline "serial "
        set tsv1 [exec $pdftotext -tsv $pdf1 - 2>@1]
        set tsv2 [exec $pdftotext -tsv $pdf2 - 2>@1]
        puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
    }

    proc multiprocess {pdftotext pdf1 pdf2} {
        puts -nonewline multiprocess
        set p1 [open "|$pdftotext -tsv $pdf1 - 2>@1" r]
        try {
            set p2 [open "|$pdftotext -tsv $pdf2 - 2>@1" r]
            try {
                fconfigure $p1 -blocking 0
                fconfigure $p2 -blocking 0
                set tsv1 ""
                set tsv2 ""
                while {![eof $p1] || ![eof $p2]} {
                    append tsv1 [read $p1]
                    append tsv2 [read $p2]
                    after 1
                }
            } finally {
                close $p2
            }
        } finally {
            close $p1
        }
        puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
    }

    proc threaded {pdftotext pdf1 pdf2} {
        puts -nonewline "threaded "
        set pool [tpool::create -minworkers 2]
        set job1 [tpool::post -nowait $pool "exec $pdftotext -tsv $pdf1 - 2>@1"]
        set job2 [tpool::post -nowait $pool "exec $pdftotext -tsv $pdf2 - 2>@1"]
        set job_ids [list $job1 $job2]
        while {[llength $job_ids] > 0} {
            foreach job_id [tpool::wait $pool $job_ids job_ids] {
                if {$job_id eq $job1} {
                    set tsv1 [tpool::get $pool $job_id]
                } else {
                    set tsv2 [tpool::get $pool $job_id]
                }
            }
        }
        puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
    }

    main
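
The usage line times the whole script from the shell, which also counts interpreter start-up and package loading. As a rough alternative, the averages could be taken inside Tcl with the built-in `time` command. The helper name `average_seconds` is hypothetical, and the `sh -c` command is only a stand-in for one of the procs above.

```tcl
# Average wall-clock seconds taken by a script over several runs.
proc average_seconds {script {runs 5}} {
    # [time] returns a string of the form "N microseconds per iteration"
    set usec [lindex [time {uplevel 1 $script} $runs] 0]
    return [expr {$usec / 1e6}]
}

# Stand-in workload; in the test program this could be
# {multiprocess $pdftotext $pdf1 $pdf2} or one of the other procs.
set secs [average_seconds {exec sh -c {sleep 0.1}} 3]
puts [format "%.3f sec per run" $secs]
```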
  • From Olivier@user1108@newsgrouper.org.invalid to comp.lang.tcl on Fri May 1 10:06:35 2026
    From Newsgroup: comp.lang.tcl


    Mark Summerfield <m.n.summerfield@gmail.com> posted:

    > I need to run two external processes (on Linux):
    >
    > pdftotext -tsv one.pdf
    > pdftotext -tsv two.pdf


    I am not an expert, but the construction (with Tcl 9.x):

    1) launch both processes in background

    2) check the status with ::tcl::process

    3) post-process the output of each process as soon as it has ended (*)

    seems doable, but no one has mentioned anything similar. Is this a
    construction to avoid?

    (*) with a monolithic script if it is fast, I mean no thread or
    different interpreters
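
The three steps above can be sketched as follows, assuming Tcl 8.7 or 9.x where `::tcl::process` is available. Because a background `exec` cannot return output, this sketch sends each process's stdout to a temporary file. The `launch` proc and the `sh -c` commands (stand-ins for pdftotext) are hypothetical, and the loop polls with a short sleep rather than using the event loop.

```tcl
# Keep finished processes in the status table until we have looked at them.
tcl::process autopurge false

# Start a command in the background with stdout redirected to a temp file.
# Returns a two-element list: process id and output file path.
proc launch {cmd} {
    close [file tempfile path]
    set pid [lindex [exec {*}$cmd > $path &] end]
    return [list $pid $path]
}

# 1) launch both processes in the background
set jobs [dict create]                  ;# pid -> {name path}
foreach {name cmd} {
    one {sh -c {sleep 0.2; echo first}}
    two {sh -c {sleep 0.1; echo second}}
} {
    lassign [launch $cmd] pid path
    dict set jobs $pid [list $name $path]
}

# 2) check the status, 3) handle each output as soon as its process ends
while {[dict size $jobs] > 0} {
    dict for {pid status} [tcl::process status [dict keys $jobs]] {
        if {$status eq ""} continue     ;# empty status: still running
        lassign [dict get $jobs $pid] name path
        set fd [open $path r]
        set results($name) [string trim [read $fd]]
        close $fd
        file delete $path
        dict unset jobs $pid
        # first element of the status is 0 for a clean exit
        puts "$name done, status code [lindex $status 0]"
    }
    after 10
}
tcl::process purge
```

The trade-off against the pipe-based versions is the temporary files and the polling interval. In exchange, the exit status of each process is available without closing a channel.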
  • From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Fri May 1 22:54:54 2026
    From Newsgroup: comp.lang.tcl

    * Mark Summerfield <m.n.summerfield@gmail.com>
    | I created a tiny test program (65 LOC; shown at the end) to
    | compare timings. I did multiple timings and here're the averages:

    | serial (2 LOC) 2.020 sec
    | multiprocess (19 LOC) 1.055 sec
    | threaded (13 LOC) 1.061 sec

    | Since the difference between the multiprocess and threaded
    | approaches is so small and that the threaded code is simpler
    | and more appealing, I'm going to use the threaded version in
    | my programs (which only ever work with two PDFs at a time)
    | — so thank you "abu"!

    I wonder: you stated in your initial message

    Message-ID: <10sschf$3nvs2$1@dont-email.me>
    > In my experiments with Tcl's threading I've found the threading
    > startup overhead to be rather large.

    Can you tell what is/was the difference to the current solution which
    obviously has no "startup overhead"?

    R'
  • From Emiliano@emiliano@example.invalid to comp.lang.tcl on Sat May 2 00:34:54 2026
    From Newsgroup: comp.lang.tcl

    On Wed, 29 Apr 2026 07:38:23 -0000 (UTC)
    Mark Summerfield <m.n.summerfield@gmail.com> wrote:

    > I need to run two external processes (on Linux):
    >
    > pdftotext -tsv one.pdf
    > pdftotext -tsv two.pdf

    You can use pipes and run the processes in the background, collecting
    output with the event loop. Here's a rough draft:

    proc runit {var file} {
        lassign [chan pipe] cr cw
        exec pdftotext -tsv $file - >@ $cw &
        chan close $cw
        chan configure $cr -blocking 0
        chan event $cr readable [list handle $var $cr]
    }
    proc handle {var fd} {
        global $var
        append $var [chan read $fd]
        if {[chan eof $fd]} {
            chan close $fd
            set ::done 1
        }
    }
    puts "sequential: [time {
        set out1 [exec pdftotext -tsv one.pdf -]
        set out2 [exec pdftotext -tsv two.pdf -]
        puts "one.pdf [string length $out1]"
        puts "two.pdf [string length $out2]"
    }]"
    puts "parallel: [time {
        runit out1 one.pdf
        runit out2 two.pdf
        vwait done
        vwait done
        puts "one.pdf [string length $out1]"
        puts "two.pdf [string length $out2]"
    }]"


    Regards
    --
    Emiliano
  • From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Sat May 2 06:57:06 2026
    From Newsgroup: comp.lang.tcl

    On Fri, 01 May 2026 22:54:54 +0200, Ralf Fassel wrote:

    > * Mark Summerfield <m.n.summerfield@gmail.com>
    > | I created a tiny test program (65 LOC; shown at the end) to
    > | compare timings. I did multiple timings and here're the averages:
    >
    > | serial (2 LOC) 2.020 sec
    > | multiprocess (19 LOC) 1.055 sec
    > | threaded (13 LOC) 1.061 sec
    >
    > | Since the difference between the multiprocess and threaded
    > | approaches is so small and that the threaded code is simpler
    > | and more appealing, I'm going to use the threaded version in
    > | my programs (which only ever work with two PDFs at a time)
    > | — so thank you "abu"!
    >
    > I wonder: you stated in your initial message
    >
    > Message-ID: <10sschf$3nvs2$1@dont-email.me>
    > In my experiments with Tcl's threading I've found the threading
    > startup overhead to be rather large.
    >
    > Can you tell what is/was the difference to the current solution which
    > obviously has no "startup overhead"?
    >
    > R'

    Yes, the difference was that I started out using thread::create etc.,
    rather than using tpool. I've put a new version that compares them
    all at the end. Anyone can compare timings for themselves if they
    have one or two big PDF files (the program needs two but for tests
    it is fine if it is the same one).

    On an old laptop:

    serial (2 LOC) 6.37 sec
    multiprocess (19 LOC) 3.33 sec
    thread pool (15 LOC) 3.60 sec
    threaded (22 LOC) 3.66 sec

    I've now gone back to using the multiprocess version.
    Here's the full test code.

    #!/usr/bin/env tclsh9
    # usage: time ./concurrent.tcl <s|m|p|t> <file1.pdf> <file2.pdf>

    package require thread 3

    const OPT -tsv ;# OR if not supported by older pdftotext use: -bbox
    const PDFTOTEXT [auto_execok pdftotext]

    proc main {} {
        set pdf1 [lindex $::argv 1]
        set pdf2 [lindex $::argv 2]
        switch [lindex $::argv 0] {
            s { serial $pdf1 $pdf2 }
            m { multiprocess $pdf1 $pdf2 }
            p { thread_pool $pdf1 $pdf2 }
            t { threaded $pdf1 $pdf2 }
        }
    }

    proc serial {pdf1 pdf2} {
        puts -nonewline "serial "
        set tsv1 [exec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
        set tsv2 [exec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
        puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
    }

    proc multiprocess {pdf1 pdf2} {
        puts -nonewline multiprocess
        set p1 [open "|$::PDFTOTEXT $::OPT $pdf1 - 2>@1" r]
        try {
            set p2 [open "|$::PDFTOTEXT $::OPT $pdf2 - 2>@1" r]
            try {
                fconfigure $p1 -blocking 0
                fconfigure $p2 -blocking 0
                set tsv1 ""
                set tsv2 ""
                while {![eof $p1] || ![eof $p2]} {
                    append tsv1 [read $p1]
                    append tsv2 [read $p2]
                    after 1
                }
            } finally {
                close $p2
            }
        } finally {
            close $p1
        }
        puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
    }

    proc thread_pool {pdf1 pdf2} {
        puts -nonewline "thread pool "
        set pool [tpool::create -minworkers 2]
        set job1 [tpool::post -nowait $pool \
            "exec $::PDFTOTEXT $::OPT $pdf1 - 2>@1"]
        set job2 [tpool::post -nowait $pool \
            "exec $::PDFTOTEXT $::OPT $pdf2 - 2>@1"]
        set job_ids [list $job1 $job2]
        while {[llength $job_ids] > 0} {
            foreach job_id [tpool::wait $pool $job_ids job_ids] {
                if {$job_id eq $job1} {
                    set tsv1 [tpool::get $pool $job_id]
                } else {
                    set tsv2 [tpool::get $pool $job_id]
                }
            }
        }
        puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
    }

    proc threaded {pdf1 pdf2} {
        puts -nonewline "threaded "
        set tid1 [thread::create -joinable]
        set tid2 [thread::create -joinable]
        tsv::set shared pdf1 $pdf1
        tsv::set shared pdf2 $pdf2
        tsv::set shared pdftotext $::PDFTOTEXT
        tsv::set shared opt $::OPT
        thread::send -async $tid1 {
            tsv::set shared tsv1 \
                [exec -encoding utf-8 {*}[tsv::get shared pdftotext] \
                    [tsv::get shared opt] [tsv::get shared pdf1] - 2>@1]
        }
        thread::send -async $tid2 {
            tsv::set shared tsv2 \
                [exec -encoding utf-8 {*}[tsv::get shared pdftotext] \
                    [tsv::get shared opt] [tsv::get shared pdf2] - 2>@1]
        }
        thread::release $tid1
        thread::join $tid1
        thread::release $tid2
        thread::join $tid2
        set tsv1 [tsv::get shared tsv1]
        set tsv2 [tsv::get shared tsv2]
        puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
    }

    main
  • From Ashok@apnmbx-public@yahoo.com to comp.lang.tcl on Sat May 9 16:03:16 2026
    From Newsgroup: comp.lang.tcl

    Shameless plug...

    Bit late to the topic, but the simplest way to parallelize multiple
    processes or threads and wait for completion is promises, if you do not
    mind an external package. Bit of a learning curve however.

    lappend promises [promise::pexec pdftotext pdf1.pdf pdf1.txt]
    lappend promises [promise::pexec pdftotext pdf2.pdf pdf2.txt]
    set waiter [promise::all $promises]
    # Assumes eventloop not running!
    promise::eventloop $waiter

    Timing:

    % time {demo} <- using promises
    2606403 microseconds per iteration
    % time {demo2} <- sequential exec's
    4762417 microseconds per iteration

    https://wiki.tcl-lang.org/page/promise
    https://tcl-promise.magicsplat.com/
    https://www.magicsplat.com/blog/tags/promises/

  • From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Mon May 11 11:08:06 2026
    From Newsgroup: comp.lang.tcl

    * Ashok <apnmbx-public@yahoo.com>
    | Shameless plug...

    | Bit late to the topic, but the simplest way to parallelize multiple
    | processes or threads and wait for completion is promises, if you do
    | not mind an external package. Bit of a learning curve however.
    --<snip-snip>--
    | https://tcl-promise.magicsplat.com/

    Ashok,
    since coroutines are already part of TCL, any chance of getting promises
    into the core? It would seem to me as a 'natural' addition for async
    features in TCL, and the package looks quite mature...

    R'