Forum: War Ensemble BBS

src/sbbs3/websrvr.cpp

From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Sat May 9 14:04:17 2026

https://gitlab.synchro.net/main/sbbs/-/commit/f7b10a614935817ba8965ec1
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: don't call destroy_session() with sentinel tls_sess value (-1)

When TLS setup fails after add_private_key() returns an error, the code
calls cryptDestroySession() directly and sets tls_sess = -1, then calls
close_session_no_rb(), which would pass -1 to destroy_session(), triggering
a spurious "Destroying a session (-1) that's not in sess_list" error.
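A minimal C++ sketch of the guard, under a simplified model: Session, kNoSession, sess_list, destroy_tracked_session() and close_session() are hypothetical stand-ins for the real tls_sess field, destroy_session() and close_session_no_rb(), not the websrvr.cpp code. The close path only hands a real handle to the destroy routine, so an error path that already called cryptDestroySession() and left -1 behind produces no spurious complaint.

```cpp
#include <cassert>
#include <cstdio>
#include <set>

// Hypothetical model; mirrors the "-1" sentinel described in the commit.
constexpr int kNoSession = -1;

struct Session {
    int tls_sess = kNoSession;   // TLS session handle, or the sentinel
};

std::set<int> sess_list;         // handles currently tracked as live
int spurious_errors = 0;         // how often an untracked handle was destroyed

// Stand-in for a destroy routine that validates against sess_list.
bool destroy_tracked_session(int handle) {
    if (sess_list.erase(handle) == 0) {
        std::fprintf(stderr, "Destroying a session (%d) that's not in sess_list\n", handle);
        ++spurious_errors;
        return false;
    }
    return true;
}

// Stand-in for the close path: skip the sentinel, because the TLS-setup
// error path has already destroyed the session itself.
void close_session(Session& s) {
    if (s.tls_sess != kNoSession) {
        destroy_tracked_session(s.tls_sess);
        s.tls_sess = kNoSession;
    }
    assert(s.tls_sess == kNoSession);  // a closed session always holds the sentinel
}
```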

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--- SBBSecho 3.37-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Thu Jun 4 09:44:21 2026

https://gitlab.synchro.net/main/sbbs/-/commit/50258e70bf63ba7a82af7515
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: detect TLS client disconnect in session_check() (#1155)

session_check()'s is_tls branch treated a readable socket as "connected"
and latched session->tls_pending; once set, it returned "connected" on
every later call without re-probing the socket. But a peer's TLS
close_notify (and a FIN) arrive as readable bytes, so after an HTTPS
client hung up, session_check() reported it connected forever. The
JavaScript disconnect check in js_OperationCallback (ead5ccf16) relies on
session_check(), so its abort never armed (offline_counter stayed 0). A
badly formed SSJS/XJS page that loops on mswait() without checking for
disconnection (e.g. the webv4 user/system stats) therefore ran forever,
pinning its http_session thread, a MaxClients slot, and a CLOSE_WAIT
socket. The result was a pile of zombie HTTPS clients in sbbsctrl/MQTT
and, eventually, MaxClients exhaustion.

Why this only bit Windows: socket_check() (xpdev) has two paths. On
non-Windows builds it uses poll() (CFLAGS += -DPREFER_POLL, set only in
build/Common.gmake, i.e. the GNU-make/Unix builds). poll() reports
POLLHUP when the peer closes its end, even while buffered data remains
to be read, and socket_check() returns false on POLLHUP before it ever
runs the readable/MSG_PEEK logic. So on Unix the close was detected,
session_check() returned false, and tls_pending never latched. Windows
(MSBuild) does not define PREFER_POLL and uses select(), which has no
POLLHUP equivalent: a closing TLS socket simply looks "readable"
(MSG_PEEK returns the encrypted close_notify bytes), so the latch was set
and the disconnect was masked. The session_check() bug is
platform-independent; poll()/POLLHUP merely hid it everywhere except
Windows.
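To make the platform difference concrete, here is a hypothetical C++ model of the old behavior (not xpdev or websrvr.cpp code): SocketState, CheckResult, socket_check_poll(), socket_check_select() and old_session_check() are invented names that reduce socket_check()'s two paths and the tls_pending latch to booleans. The same closed TLS peer is reported as disconnected through the poll() model but latched as connected through the select() model.

```cpp
#include <cassert>

// Hypothetical model: a TLS socket whose peer has sent close_notify and
// then closed its end.
struct SocketState {
    bool readable;  // bytes (e.g. the encrypted close_notify) are waiting
    bool hup;       // what poll() would report as POLLHUP
};

struct CheckResult {
    bool connected;
    bool readable;
};

// PREFER_POLL path: POLLHUP is checked before any readable/MSG_PEEK logic.
CheckResult socket_check_poll(const SocketState& s) {
    if (s.hup) return {false, false};
    return {true, s.readable};
}

// select() path: no POLLHUP equivalent, and MSG_PEEK sees the close_notify
// bytes, so a closing TLS socket still looks connected and readable.
CheckResult socket_check_select(const SocketState& s) {
    return {true, s.readable};
}

using CheckFn = CheckResult (*)(const SocketState&);

// Old is_tls branch of session_check(), reduced to the latch behavior.
struct OldTlsSession {
    bool tls_pending = false;
};

bool old_session_check(OldTlsSession& sess, CheckFn check, const SocketState& s) {
    if (sess.tls_pending) return true;   // latched: the socket is never re-probed
    CheckResult r = check(s);
    if (r.connected && r.readable)
        sess.tls_pending = true;         // "readable" treated as "connected", then latched
    return r.connected;
}

// Usage: the same closed peer seen through each path.
void demo_closed_peer() {
    const SocketState closed{true, true};
    OldTlsSession unix_sess, win_sess;
    assert(!old_session_check(unix_sess, socket_check_poll, closed));  // disconnect seen
    assert(!unix_sess.tls_pending);
    assert(old_session_check(win_sess, socket_check_select, closed));  // masked...
    assert(old_session_check(win_sess, socket_check_select, closed));  // ...on every call
}
```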

Fix: drop the tls_pending liveness latch. Use peeked_valid (a decrypted
byte already buffered) as the readable fast path, and when the raw socket
is readable, probe with a 1-byte cryptPopData() (a raw MSG_PEEK only sees
encrypted bytes and cannot do this) to tell apart application data
(connected; the byte is cached in session->peeked so the next sess_recv()
returns it), CRYPT_ERROR_TIMEOUT (connected, no application data yet) and
CRYPT_ERROR_COMPLETE (peer closed, so disconnected). The probe is
non-blocking (CRYPT_OPTION_NET_READTIMEOUT == 0, set at session setup) and
runs in the session's own thread, so there is no concurrent reader. Also
close the socket in place in recvbufsocket() when session_check() reports
a disconnect (it previously relied on the latch returning true and the
following sess_recv() failing).
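A sketch of the fixed decision logic, assuming the raw-socket check has already been reduced to connected/readable flags and that the 1-byte cryptPopData() probe is wrapped in a callback. PopStatus, TlsSession, PopFn, new_session_check() and take_peeked() are hypothetical; PopStatus merely names the three outcomes listed above instead of using cryptlib's real status codes. There is no latch: every call re-evaluates, and a probed application byte is cached rather than lost.

```cpp
#include <cassert>
#include <functional>

// Stand-ins for the outcomes of a 1-byte, non-blocking cryptPopData() probe;
// real cryptlib status codes are deliberately not used here.
enum class PopStatus { Data, Timeout, Complete };

struct TlsSession {
    bool peeked_valid = false;   // a decrypted byte is cached
    unsigned char peeked = 0;    // ...and this is it
};

// Hypothetical probe: pops at most one decrypted byte into *out.
using PopFn = std::function<PopStatus(unsigned char* out)>;

// Model of the fixed is_tls branch: no latch, re-evaluated on every call.
// raw_connected/raw_readable stand for the raw-socket check (an assumption
// about how the real code is structured).
bool new_session_check(TlsSession& s, bool raw_connected, bool raw_readable,
                       const PopFn& pop) {
    if (s.peeked_valid) return true;   // fast path: data already decrypted
    if (!raw_connected) return false;
    if (!raw_readable) return true;    // idle but still connected
    unsigned char byte = 0;
    switch (pop(&byte)) {
    case PopStatus::Data:              // application data: keep it for the next recv
        s.peeked = byte;
        s.peeked_valid = true;
        return true;
    case PopStatus::Timeout:           // a TLS record, but no application data yet
        return true;
    case PopStatus::Complete:          // close_notify: the peer has closed
        return false;
    }
    assert(false && "unreachable");
    return false;
}

// Model of a sess_recv() that hands out the cached byte first.
bool take_peeked(TlsSession& s, unsigned char* out) {
    if (!s.peeked_valid) return false;
    *out = s.peeked;
    s.peeked_valid = false;
    return true;
}
```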

Latch introduced in d93478b918 (famous-15-sons); the readable-as-connected
check and the tls_pending assignment predate it (dbbfabf1b1, funky-27-foam).

Validated on a production Windows server: CLOSE_WAIT count ~22 -> 0,
sbbsctrl thread count 221 -> 25, and the server ran overnight with no
zombie HTTPS clients.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Sun Jun 21 20:54:25 2026

https://gitlab.synchro.net/main/sbbs/-/commit/65643e6ca604c3520da18e50
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: bound drain_outbuf() so a dead client can't wedge the server

drain_outbuf() spun in a SLEEP(1) loop as long as the outbuf ring buffer
held data and the socket was still valid, with no timeout and no check of
the terminate_server flag (the "/* ToDo: This should probably timeout eventually... */"
note acknowledged this). When a client stops reading, the output thread
blocks in its send, the buffer never drains, and the session thread spins
forever. Under a distributed web scrape (many abandoned Alibaba/Aliyun
keep-alive connections) this hung web-server shutdown: the "Waiting for N
child threads to terminate" loop never completed because several
http_session_thread()s were stuck in drain_outbuf() <- send_error().

Bound the wait: return (not break) when terminate_server is set, or once
the buffer has stalled for max_inactivity seconds. Returning rather than
falling through matters: the output thread can hold outbuf_write while
blocked in a send, so the trailing pthread_mutex_lock() would simply hang
again. Returning lets the caller close the socket, which unblocks the
output thread.
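A minimal sketch of a bounded drain loop under stated assumptions: pending(), now() and pause() are hypothetical callbacks standing in for the outbuf ring-buffer fill level, a seconds clock and the SLEEP(1) call, injected so the loop can run without real sockets. It returns a status instead of breaking out, so the caller never reaches a trailing mutex lock after a stall or a shutdown.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>

enum class DrainResult { Drained, Terminated, Stalled };

// Hypothetical model of a bounded drain_outbuf()-style wait.
//   pending() - bytes still queued in the outbuf ring buffer
//   now()     - a clock in seconds
//   pause()   - the short sleep between polls (SLEEP(1) in the original loop)
DrainResult bounded_drain(const std::function<std::size_t()>& pending,
                          const std::atomic<bool>& terminate_server,
                          long max_inactivity,
                          const std::function<long()>& now,
                          const std::function<void()>& pause) {
    assert(max_inactivity > 0);
    std::size_t last = pending();
    long last_progress = now();
    while (last > 0) {
        if (terminate_server)
            return DrainResult::Terminated;  // server shutdown: give up immediately
        if (now() - last_progress >= max_inactivity)
            return DrainResult::Stalled;     // client stopped reading
        pause();
        std::size_t cur = pending();
        if (cur < last)
            last_progress = now();           // any progress resets the stall timer
        last = cur;
    }
    return DrainResult::Drained;
}
```

A caller would treat anything other than DrainResult::Drained as "close the socket now" rather than going on to lock outbuf_write.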

Unbounded since the original SLEEP-based drain in 00f254912d (maker-8-money).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

From Rob Swindell (on Debian Linux)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 13:40:10 2026

https://gitlab.synchro.net/main/sbbs/-/commit/f6d382c13949040c841d4465
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: add debug-level timing probes to localize webv4 login stall (#1169)

Issue #1169 reports an exactly-90-second stall on every webv4 portal login/logout, logged between "Initializing User Objects" and the first
"Adding query value" line. It was initially suspected to be related to
#1153 (Windows exclusive user.tab locking), but the reporter confirmed
the stall persists on a current nightly that already carries the #1153
fix, so it is unrelated.

Tracing the path shows that js_CreateUserObjects() and its area-object
creators only build lazy JS skeletons and take no user.tab lock, and the
stalling request is anonymous (no user-record write at all), so the
native "Initializing User Objects" step is an unlikely culprit. To
localize the delay empirically, add LOG_DEBUG probes that bisect the gap
between that log line and query-string parsing:

- http_checkuser(): "User Objects initialized" (bounds js_CreateUserObjects)
- check_request(): "Authorization check complete" (bounds check_ars tail)
- respond(): "Responding to request (dynamic=%d)"
- exec_ssjs(): "beginning JS request" / "initializing request properties"
  (brackets JS_BEGINREQUEST to catch a blocking begin-request)

The adjacent pair of lines that straddles the 90s gap in a debug log
localizes the offending region. Probes are tagged "#1169 timing probe"
for easy removal once root-caused.
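The bisection step amounts to finding the widest gap between consecutive timestamped log lines. A small sketch, assuming the log has already been parsed into (seconds, text) pairs; LogLine and widest_gap() are hypothetical helpers, not part of Synchronet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct LogLine {
    double t;           // seconds since some origin (parsed from the log timestamp)
    std::string text;   // the message, e.g. a probe line
};

// Returns i such that lines[i] -> lines[i + 1] is the widest time gap,
// i.e. the pair of adjacent lines that straddles the stall.
std::size_t widest_gap(const std::vector<LogLine>& lines) {
    assert(lines.size() >= 2);
    std::size_t best = 0;
    for (std::size_t i = 1; i + 1 < lines.size(); ++i) {
        if (lines[i + 1].t - lines[i].t > lines[best + 1].t - lines[best].t)
            best = i;
    }
    return best;
}
```

Given the probe lines in request order, the returned index names the last probe printed before the stall; the code between it and the next probe holds the delay.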

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 22:21:39 2026

https://gitlab.synchro.net/main/sbbs/-/commit/b0f02c4e61aa835f2b9b9e21
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: read buffered TLS request body directly (fix #1169 login stall)

A webv4 login/logout is an HTTPS POST whose body (credentials) often arrives
in the same TLS record as the headers, so it sits decrypted-but-unread in the
TLS layer with nothing left on the raw socket. read_post_data() ->
recvbufsocket() gated each read on session_check(), which since 50258e70b
("detect TLS client disconnect", #1155) only treats a TLS session as readable
when a byte has been peeked (peeked_valid); it no longer short-circuits on
tls_pending. With the body buffered but no peeked byte, session_check() fell
through to socket_check() on the raw socket and blocked for the full
MaxInactivity timeout (60-90s) before the buffered body was finally read.
That is the #1169 "login stalls ~90s at Initializing User Objects" symptom:
POST-only (login/logout), duration == MaxInactivity, no wire traffic.

Guard the recvbufsocket() wait with tls_pending the same way sockreadline() already does for header reads: when TLS data is already buffered, read it directly instead of waiting on the raw socket. Header reads were unaffected because sockreadline() kept its own tls_pending guard; only the body read regressed.
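A hedged sketch of the guard, modeling one body read: BodyReader, wait_raw, read and read_body_chunk() are hypothetical names, with wait_raw standing in for the session_check()/socket_check() wait that could block for MaxInactivity. When tls_pending says decrypted data is already buffered, the wait is skipped entirely, the same short-circuit sockreadline() applies to header lines.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical model of one body read in a recvbufsocket()-style loop.
struct BodyReader {
    bool tls_pending;                     // decrypted bytes already buffered in the TLS layer
    std::function<bool()> wait_raw;       // waits on the raw socket (up to MaxInactivity)
    std::function<std::size_t()> read;    // pulls bytes out of the session
};

// If TLS already holds data, read it now instead of waiting for raw-socket
// activity that may never come (the body arrived with the headers).
std::size_t read_body_chunk(const BodyReader& r) {
    assert(r.read);
    if (!r.tls_pending && !r.wait_raw())
        return 0;                         // timed out or disconnected
    return r.read();
}
```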

Manifests whenever the body is TLS-buffered at read time (reliably on Windows, intermittently on Linux v3.22a); absent in v3.21f, which predates 50258e70b. Verified on vert: the auth POST's "Authorization check complete" -> "Responding to request" gap went from 60s to 0s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 23:20:54 2026

https://gitlab.synchro.net/main/sbbs/-/commit/659de100d04037459107de30
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: remove #1169 timing probes (issue resolved)

Reverts the debug-level timing probes added in f6d382c13 to localize the
webv4 login stall; #1169 is now root-caused and fixed in b0f02c4e6 (recvbufsocket reads buffered TLS data directly instead of waiting on the
raw socket for MaxInactivity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 23:20:54 2026

https://gitlab.synchro.net/main/sbbs/-/commit/a6cb9dffb18f6ba070113911
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: consolidate js_CreateUserObjects() branches in http_checkuser()

The user>0 and guest (NULL user) branches differed only in the user argument and an error-log string; collapse them into a single call with a ternary for the user pointer. No functional change apart from logging: the anonymous failure path now logs the same "creating user objects" message as the authenticated path.
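The consolidation pattern, sketched with hypothetical types (User, Recorder, create_user_objects() and init_user_objects() are stand-ins, not the js_CreateUserObjects() signature, and the error text is illustrative): a single call site picks the user pointer with a ternary, so the authenticated and anonymous paths share one error message.

```cpp
#include <cassert>
#include <string>

struct User {
    int number;   // 0 = anonymous/guest, > 0 = authenticated user
};

// Hypothetical stand-in for the object-creation call; records its input.
struct Recorder {
    const User* last_user = nullptr;
    std::string last_error;
};

bool create_user_objects(Recorder& r, const User* user, bool fail) {
    r.last_user = user;
    return !fail;
}

// One call site replaces two near-identical branches: the ternary selects
// the user pointer (NULL for guests) and the error message is shared.
bool init_user_objects(Recorder& r, const User& user, bool fail) {
    assert(user.number >= 0);
    if (!create_user_objects(r, user.number > 0 ? &user : nullptr, fail)) {
        r.last_error = "error creating user objects";
        return false;
    }
    return true;
}
```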

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

From Rob Swindell (on Windows 11)@1:103/705 to Git commit to main/sbbs/master on Tue Jun 23 23:20:54 2026

https://gitlab.synchro.net/main/sbbs/-/commit/3e57627a712015d1e417056b
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: log authenticated user logon/logoff at LOG_INFO

http_logon()/http_logoff() logged every web logon and logoff at LOG_DEBUG, so webv4 (and HTTP-auth) user logins were invisible in the server log unless debug-level web logging was enabled. By contrast, the FTP (ftpsrvr.cpp:2695), mail (mailsrvr.cpp:1422/4380/4489) and terminal (answer.cpp:452) servers all record a successful user login at LOG_INFO.

Log a logon at LOG_INFO when a real user authenticated (user.number > 0) and keep anonymous/Guest logons (number == 0) at LOG_DEBUG, so the constant per-request anonymous churn (bots, crawlers) stays quiet. http_logoff() already early-returns unless a user was logged in, so it moves to LOG_INFO unconditionally.
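A sketch of the level selection, assuming a simplified session record: WebSession, UserRec, logon() and logoff() are hypothetical, LogLevel copies syslog's LOG_INFO (6) and LOG_DEBUG (7) values, and the early-return condition in logoff() is an assumption about what "a user was logged in" means.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Priority values mirror syslog's LOG_INFO (6) and LOG_DEBUG (7).
enum LogLevel { kLogInfo = 6, kLogDebug = 7 };

struct LogEntry {
    LogLevel level;
    std::string msg;
};

struct UserRec {
    int number;          // 0 = anonymous/Guest, > 0 = real user
    std::string alias;
};

struct WebSession {
    UserRec user{0, ""};
    bool logged_in = false;
    std::vector<LogEntry> log;
};

// Authenticated users are logged at INFO; anonymous churn stays at DEBUG.
void logon(WebSession& s, const UserRec& u) {
    assert(u.number >= 0);
    s.user = u;
    s.logged_in = true;
    s.log.push_back({u.number > 0 ? kLogInfo : kLogDebug, u.alias + " logged on"});
}

// Assumed early return: only a real user's logoff reaches the log call,
// so the level can be INFO unconditionally.
void logoff(WebSession& s) {
    if (!s.logged_in || s.user.number == 0) return;
    s.log.push_back({kLogInfo, s.user.alias + " logged off"});
    s.logged_in = false;
    s.user = {0, ""};
}
```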

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>