• src/sbbs3/websrvr.cpp

    From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Sat May 9 14:04:17 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/f7b10a614935817ba8965ec1
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: don't call destroy_session() with sentinel tls_sess value (-1)

    When TLS setup fails after add_private_key() returns an error, the code
    calls cryptDestroySession() directly and sets tls_sess = -1, then calls
    close_session_no_rb(), which would pass -1 to destroy_session(), triggering
    a spurious "Destroying a session (-1) that's not in sess_list" error.
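
A minimal sketch of the kind of guard the subject line implies, assuming the fix simply skips the destroy call when the handle is the -1 sentinel. The commit text does not show the actual change, so every name here (TLS_SESS_NONE, destroy_session_sketch, close_session_sketch, the std::set standing in for sess_list) is hypothetical:

```cpp
#include <set>

// Hypothetical stand-ins for the server's session bookkeeping.
constexpr int TLS_SESS_NONE = -1;   // sentinel: no live TLS session
static std::set<int> sess_list;     // handles currently tracked
static int spurious_errors = 0;     // counts "not in sess_list" reports

// Stand-in for destroy_session(): complains about untracked handles.
void destroy_session_sketch(int sess)
{
    if (sess_list.erase(sess) == 0)
        ++spurious_errors;
}

// Guarded close path: never hand the sentinel to the destroy routine.
void close_session_sketch(int& tls_sess)
{
    if (tls_sess != TLS_SESS_NONE) {
        destroy_session_sketch(tls_sess);
        tls_sess = TLS_SESS_NONE;
    }
}
```

With the guard in place, a session whose TLS setup already failed passes through the close path without producing the "not in sess_list" report.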

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Thu Jun 4 09:44:21 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/50258e70bf63ba7a82af7515
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: detect TLS client disconnect in session_check() (#1155)

    session_check()'s is_tls branch treated a readable socket as "connected"
    and latched session->tls_pending; once set, it returned "connected" on
    every later call without re-probing the socket. But a peer's TLS
    close_notify (and a FIN) arrive as readable bytes, so after an HTTPS
    client hung up, session_check() reported it connected forever. The
    JavaScript disconnect check in js_OperationCallback (ead5ccf16) relies on
    session_check(), so its abort never armed (offline_counter stayed 0): a
    badly-formed SSJS/XJS page that loops on mswait() without checking for
    disconnection (e.g. the webv4 user/system stats) ran forever, pinning its
    http_session thread, a MaxClients slot, and a CLOSE_WAIT socket -- a pile
    of zombie HTTPS clients in sbbsctrl/MQTT and eventual MaxClients
    exhaustion.

    Why this only bit Windows: socket_check() (xpdev) has two paths. On
    non-Windows builds it uses poll() (CFLAGS += -DPREFER_POLL, set only in
    build/Common.gmake, i.e. the GNU-make/Unix builds). poll() reports
    POLLHUP when the peer closes its end -- even while there is still buffered
    data to read -- and socket_check() returns false on POLLHUP before it
    ever runs the readable/MSG_PEEK logic. So on Unix the close was detected,
    session_check() returned false, and tls_pending never latched. Windows
    (MSBuild) does not define PREFER_POLL and uses select(), which has no
    POLLHUP equivalent: a closing TLS socket simply looks "readable"
    (MSG_PEEK returns the encrypted close_notify bytes), so the latch was set
    and the disconnect masked. The session_check() bug is platform-
    independent; poll()/POLLHUP merely hid it everywhere except Windows.
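
The difference between the two paths can be sketched as two small decision functions. This is an illustration of the ordering described above, not xpdev's socket_check() itself: both function names are hypothetical, the poll.h constants are POSIX-only, and the select-side rule (a peek that returns more than 0 bytes means "alive") is an assumption about what the readable/MSG_PEEK logic concludes.

```cpp
#include <poll.h>

// poll()-style decision: a hang-up is checked first, so it wins even
// when there are still buffered bytes to read.
bool alive_with_poll(short revents)
{
    if (revents & (POLLHUP | POLLERR | POLLNVAL))
        return false;
    return true;
}

// select()-style decision: only "readable" is visible, so the verdict
// rests on what a peek at the raw socket returns.
bool alive_with_select(bool readable, long peeked_bytes)
{
    if (!readable)
        return true;           // nothing pending; still connected
    return peeked_bytes > 0;   // 0 = orderly close, <0 = error
}
```

A closing TLS socket with an encrypted close_notify still queued is POLLIN|POLLHUP on the first path (not alive) but "readable with bytes" on the second (alive), which is the masking described above.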

    Fix: drop the tls_pending liveness latch. Use peeked_valid (a decrypted
    byte already buffered) as the readable fast-path, and when the raw socket
    is readable, probe via cryptPopData(1 byte) -- which a raw MSG_PEEK
    cannot do -- to tell apart application data (connected; the byte is
    cached in session->peeked so the next sess_recv() returns it),
    CRYPT_ERROR_TIMEOUT (connected, no app data yet) and CRYPT_ERROR_COMPLETE
    (peer closed -> disconnected). The probe is non-blocking
    (CRYPT_OPTION_NET_READTIMEOUT == 0, set at session setup) and runs in the
    session's own thread, so there is no concurrent reader. Also close the
    socket in place in recvbufsocket() when session_check() reports a
    disconnect (it previously relied on the latch returning true and the
    following sess_recv() failing).
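
The three-way classification can be sketched as below. To stay self-contained it does not link against cryptlib: the ProbeStatus values are stand-ins for CRYPT_OK, CRYPT_ERROR_TIMEOUT and CRYPT_ERROR_COMPLETE, the probe is injected as a callback in place of the one-byte cryptPopData() call, and treating any other error as "disconnected" is an assumption the commit message does not state.

```cpp
#include <functional>

// Stand-in status codes (assumption); real code would compare against
// cryptlib's CRYPT_OK, CRYPT_ERROR_TIMEOUT and CRYPT_ERROR_COMPLETE.
enum ProbeStatus { PROBE_OK, PROBE_TIMEOUT, PROBE_COMPLETE, PROBE_FAILED };

struct ProbeResult {
    ProbeStatus   status;
    int           copied;   // bytes the probe returned (0 or 1)
    unsigned char byte;     // the byte, when copied == 1
};

struct SessionSketch {
    bool          peeked_valid = false;
    unsigned char peeked = 0;
};

// Returns true when the TLS peer should be considered connected.
bool tls_connected_sketch(SessionSketch& s, bool raw_readable,
                          const std::function<ProbeResult()>& pop_one_byte)
{
    if (s.peeked_valid)
        return true;              // fast path: a decrypted byte is cached
    if (!raw_readable)
        return true;              // nothing to inspect (assumption)
    ProbeResult r = pop_one_byte();
    switch (r.status) {
    case PROBE_OK:
        if (r.copied > 0) {       // application data: cache it for the
            s.peeked = r.byte;    // next receive call
            s.peeked_valid = true;
        }
        return true;
    case PROBE_TIMEOUT:
        return true;              // connected, no application data yet
    case PROBE_COMPLETE:
        return false;             // peer closed
    default:
        return false;             // other errors: assumed disconnected
    }
}
```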

    Latch introduced in d93478b918 (famous-15-sons); the readable-as-
    connected + tls_pending set predates it (dbbfabf1b1, funky-27-foam).

    Validated on a production Windows server: CLOSE_WAIT count ~22 -> 0,
    sbbsctrl thread count 221 -> 25, and ran overnight with no zombie HTTPS
    clients.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Sun Jun 21 20:54:25 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/65643e6ca604c3520da18e50
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: bound drain_outbuf() so a dead client can't wedge the server

    drain_outbuf() spun in a SLEEP(1) loop as long as the outbuf ring buffer
    held data and the socket was still valid, with no timeout and no check of
    the terminate_server flag (the "/* ToDo: This should probably timeout
    eventually... */" note acknowledged this). When a client stops reading,
    the output thread blocks in its send and the buffer never drains, so the
    session thread spins forever. Under a distributed web scrape (many
    abandoned Alibaba/Aliyun keep-alive connections) this hung web-server
    shutdown: the "Waiting for N child threads to terminate" loop never
    completed because several http_session_thread()s were stuck in
    drain_outbuf() <- send_error().

    Bound the wait: return (not break) when terminate_server is set, or once
    the buffer has stalled for max_inactivity seconds. Returning rather than
    falling through matters - the output thread can hold outbuf_write while
    blocked in a send, so the trailing pthread_mutex_lock() would just
    re-hang; returning lets the caller close the socket, which unblocks the
    output thread.
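
The bounded wait can be sketched as follows. This is an illustration under stated assumptions rather than the server's code: the buffer fill level, clock and sleep are injected as callbacks so the loop can be exercised without threads, and DrainResult and drain_bounded are hypothetical names. "Stalled" is taken to mean that the fill level has not dropped for max_inactivity seconds.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>

enum class DrainResult { Drained, Terminated, Stalled };

DrainResult drain_bounded(const std::function<std::size_t()>& bytes_pending,
                          const std::atomic<bool>& terminate,
                          const std::function<long()>& now_seconds,
                          const std::function<void()>& sleep_once,
                          long max_inactivity)
{
    std::size_t last = bytes_pending();
    long last_progress = now_seconds();
    while (last > 0) {
        if (terminate.load())
            return DrainResult::Terminated;   // return, do not fall through
        sleep_once();
        std::size_t cur = bytes_pending();
        long now = now_seconds();
        if (cur < last)
            last_progress = now;              // the buffer is still moving
        else if (now - last_progress >= max_inactivity)
            return DrainResult::Stalled;      // dead client: give up
        last = cur;
    }
    return DrainResult::Drained;
}
```

On either early return the caller is expected to close the socket instead of taking any further lock, which is the point the message makes about returning rather than breaking out of the loop.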

    Unbounded since the original SLEEP-based drain in 00f254912d (maker-8-money).

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  • From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 13:40:10 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/f6d382c13949040c841d4465
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: add debug-level timing probes to localize webv4 login stall (#1169)

    Issue #1169 reports an exactly-90-second stall on every webv4 portal
    login/logout, logged between "Initializing User Objects" and the first
    "Adding query value" line. It was initially suspected to be related to
    #1153 (Windows exclusive user.tab locking), but the reporter confirmed
    the stall persists on a current nightly that already carries the #1153
    fix, so it is unrelated.

    Tracing the path shows js_CreateUserObjects() and its area-object
    creators only build lazy JS skeletons and take no user.tab lock, and the
    stalling request is anonymous (no user-record write at all), so the
    native "Initializing User Objects" step is an unlikely culprit. To
    localize the delay empirically, add LOG_DEBUG probes that bisect the gap
    between that log line and query-string parsing:

    - http_checkuser(): "User Objects initialized" (bounds js_CreateUserObjects)
    - check_request(): "Authorization check complete" (bounds check_ars tail)
    - respond(): "Responding to request (dynamic=%d)"
    - exec_ssjs(): "beginning JS request" / "initializing request properties"
      (brackets JS_BEGINREQUEST to catch a blocking begin-request)

    The adjacent pair of lines that straddles the 90s gap in a debug log
    localizes the offending region. Probes are tagged "#1169 timing probe"
    for easy removal once root-caused.
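
Reading such a log amounts to finding the adjacent pair of lines with the widest time gap. A small sketch of that step, assuming the log has already been parsed into (seconds, text) records; the LogLine type and widest_gap() are hypothetical helpers, not part of the server:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct LogLine {
    long        seconds;   // timestamp, in seconds
    std::string text;
};

// Returns the index i of the line that starts the widest gap, i.e. the
// gap lies between lines[i] and lines[i + 1]. Returns 0 when there are
// fewer than two lines.
std::size_t widest_gap(const std::vector<LogLine>& lines)
{
    std::size_t best = 0;
    long best_gap = -1;
    for (std::size_t i = 0; i + 1 < lines.size(); ++i) {
        long gap = lines[i + 1].seconds - lines[i].seconds;
        if (gap > best_gap) {
            best_gap = gap;
            best = i;
        }
    }
    return best;
}
```

The two probe messages on either side of the returned index name the region that contains the stall.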

    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 22:21:39 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/b0f02c4e61aa835f2b9b9e21
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: read buffered TLS request body directly (fix #1169 login stall)

    A webv4 login/logout is an HTTPS POST whose body (credentials) often arrives
    in the same TLS record as the headers, so it sits decrypted-but-unread in
    the TLS layer with nothing left on the raw socket. read_post_data() ->
    recvbufsocket() gated each read on session_check(), which since 50258e70b
    ("detect TLS client disconnect", #1155) only treats a TLS session as
    readable when a byte has been peeked (peeked_valid) - it no longer
    short-circuits on tls_pending. With the body buffered but no peeked byte,
    session_check() fell through to socket_check() on the raw socket and
    blocked for the full MaxInactivity timeout (60-90s) before the buffered
    body was finally read. That is the #1169 "login stalls ~90s at
    Initializing User Objects" symptom: POST-only (login/logout), duration ==
    MaxInactivity, no wire traffic.

    Guard the recvbufsocket() wait with tls_pending the same way sockreadline() already does for header reads: when TLS data is already buffered, read it directly instead of waiting on the raw socket. Header reads were unaffected because sockreadline() kept its own tls_pending guard; only the body read regressed.
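
The guard can be sketched as a single decision made before each read. This is an illustration only: BodyReadSketch and ready_to_read() are hypothetical, and the raw-socket wait is injected as a callback standing in for the session_check()/socket_check() path that could block for MaxInactivity.

```cpp
#include <functional>

struct BodyReadSketch {
    bool is_tls;
    bool tls_pending;   // decrypted data is already buffered in the TLS layer
};

// Returns true when the caller may go ahead and read.
bool ready_to_read(const BodyReadSketch& s,
                   const std::function<bool()>& wait_on_raw_socket)
{
    if (s.is_tls && s.tls_pending)
        return true;                 // read the buffered data directly
    return wait_on_raw_socket();     // may block up to the inactivity timeout
}
```

When the body arrived in the same TLS record as the headers, the first branch is taken and the raw-socket wait is never entered.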

    Manifests whenever the body is TLS-buffered at read time (reliably on Windows, intermittently on Linux v3.22a); absent in v3.21f, which predates 50258e70b. Verified on vert: the auth POST's "Authorization check complete" -> "Responding to request" gap went from 60s to 0s.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 23:20:54 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/659de100d04037459107de30
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: remove #1169 timing probes (issue resolved)

    Reverts the debug-level timing probes added in f6d382c13 to localize the
    webv4 login stall; #1169 is now root-caused and fixed in b0f02c4e6
    (recvbufsocket reads buffered TLS data directly instead of waiting on the
    raw socket for MaxInactivity).

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 23:20:54 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/a6cb9dffb18f6ba070113911
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: consolidate js_CreateUserObjects() branches in http_checkuser()

    The user>0 and guest (NULL user) branches differed only in the user argument and an error-log string; collapse them into a single call with a ternary for the user pointer. No functional change (the anonymous failure path now logs the same "creating user objects" message as the authenticated path).
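
The shape of that consolidation, as a sketch with hypothetical names (UserSketch and user_arg() are illustrative; the real call is js_CreateUserObjects(), whose signature is not shown in this message):

```cpp
struct UserSketch {
    unsigned number = 0;   // 0 = anonymous/guest
};

// One expression picks the argument both former branches differed on:
// the user record for a real user, a null pointer for a guest.
const UserSketch* user_arg(const UserSketch& u)
{
    return u.number > 0 ? &u : nullptr;
}
```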

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  • From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 23:20:54 2026
    https://gitlab.synchro.net/main/sbbs/-/commit/3e57627a712015d1e417056b
    Modified Files:
    src/sbbs3/websrvr.cpp
    Log Message:
    websrvr: log authenticated user logon/logoff at LOG_INFO

    http_logon()/http_logoff() logged every web logon and logoff at LOG_DEBUG, so webv4 (and HTTP-auth) user logins were invisible in the server log unless debug-level web logging was enabled - unlike the FTP (ftpsrvr.cpp:2695), mail (mailsrvr.cpp:1422/4380/4489) and terminal (answer.cpp:452) servers, which all record a successful user login at LOG_INFO.

    Log a logon at LOG_INFO when a real user authenticated (user.number > 0) and keep anonymous/Guest logons (number == 0) at LOG_DEBUG, so the constant per-request anonymous churn (bots, crawlers) stays quiet. http_logoff() already early-returns unless a user was logged in, so it moves to LOG_INFO unconditionally.
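
The level selection described above reduces to one comparison. A sketch with stand-in constants (LVL_INFO and LVL_DEBUG mirror the conventional syslog values 6 and 7; logon_level() is a hypothetical helper, not a function in websrvr.cpp):

```cpp
// Stand-ins for LOG_INFO / LOG_DEBUG, using the usual syslog numbering.
constexpr int LVL_INFO  = 6;
constexpr int LVL_DEBUG = 7;

// Authenticated users are logged at INFO; anonymous/Guest traffic
// (user number 0) stays at DEBUG so bots and crawlers remain quiet.
int logon_level(unsigned user_number)
{
    return user_number > 0 ? LVL_INFO : LVL_DEBUG;
}
```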

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>