v7.x ARCHITECTURE · ADVANCED · LESSON 41 / 41 · ~5 min read

Reading server logs as a discipline practice.

With the LaunchAgent running the daemons 24/7, the dashboard's brain produces a steady stream of log output you never have to look at — until you have to look at it. Three of v7.8's most consequential bug fixes came from realizing that a log line firing every five minutes for three weeks wasn't noise; it was a real bug everyone had stopped noticing. This lesson covers where the logs live, what the entries actually mean, and three concrete rules for reading server.log as the discipline practice it should be — not the chore it often is.

Where the logs live

File | Source | What's in it
~/Library/Logs/SwingDeck/server.log | control_server's Python logger | Application-level logs: audit cycle progress, broker calls, alert dispatch, data-provider fetches
~/Library/Logs/SwingDeck/launchd.out.log | LaunchAgent-captured stdout | Anything the Python process prints before its logger initializes; framework startup messages
~/Library/Logs/SwingDeck/launchd.err.log | LaunchAgent-captured stderr | Crashes, tracebacks, anything that bypasses the logger. Critical when the server itself fails to start

The Python logger writes structured-ish lines to server.log; launchd captures stdio separately. When something's wrong, the right order is: start with launchd.err.log (did the process even start?), then read server.log (did it run, then fail?), then tail server.log live during a reproduction attempt.
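The triage order above can be sketched as a small helper that shows the tail of each file in sequence. This is a minimal sketch, not part of the framework: the function names `last_lines` and `triage` are hypothetical, and it assumes the log directory from the table above.

```python
from pathlib import Path

# Assumed location and file names, taken from the table above.
LOG_DIR = Path.home() / "Library" / "Logs" / "SwingDeck"
# Order recommended in the lesson: did it start, then did it run and fail.
TRIAGE_ORDER = ["launchd.err.log", "server.log"]


def last_lines(path, n=20):
    """Return the last n lines of a text file, or [] if the file is missing."""
    if not path.exists():
        return []
    with path.open("r", errors="replace") as fh:
        return fh.read().splitlines()[-n:]


def triage(log_dir=LOG_DIR, n=20):
    """Yield (filename, tail) pairs in triage order."""
    for name in TRIAGE_ORDER:
        yield name, last_lines(Path(log_dir) / name, n)


if __name__ == "__main__":
    for name, tail in triage():
        print(f"== {name} ({len(tail)} lines shown)")
        for line in tail:
            print(line)
```

Reading the whole file to take its tail is fine for a quick check; for very large logs, `tail -n 20` on the command line does the same job more cheaply.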

Rule 1 — Repeated identical errors are not transient

The most expensive bug class in v7.8 was a log line that fired every five minutes for three weeks:

2026-05-15 23:50:13 [WARNING] __main__: Broker reconcile failed:
  E*Trade GET /v1/accounts/list → 404

Every five minutes. For weeks. The badge color hadn't changed; the dashboard felt healthy because everything else worked. The line was mentally classified as "transient API issue" each time it scrolled past during other debugging.

It wasn't transient. E*TRADE had quietly migrated /v1/accounts/list to require a .json suffix. The reconcile daemon was 404-ing on every poll because the URL was actually wrong. The fix was five characters: appending .json to the path.

The rule: a log line that fires every N minutes for weeks is a structural failure, not a glitch. The phrase "I've been seeing that one for a while" is the smell. Investigate immediately when you notice the repetition pattern; don't wait for downstream consequences to surface.
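One way to make the repetition pattern visible without relying on memory is to group warnings by message and report any that have been recurring for a long time. The sketch below assumes the line shape of the sample entry above (timestamp, level in brackets, logger name, message); the function name `chronic_messages` and both thresholds are illustrative choices, not values from the framework.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed line shape, based on the sample entry:
# "2026-05-15 23:50:13 [WARNING] __main__: Broker reconcile failed:"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(WARNING|ERROR)\] [^:]+: (.*)$"
)


def chronic_messages(lines, min_span=timedelta(days=7), min_count=100):
    """Return {message: (count, first_seen, last_seen)} for long-running repeats.

    A message qualifies when it appears at least min_count times and its first
    and last occurrences are at least min_span apart. Both thresholds are
    hypothetical defaults; tune them to your own baseline.
    """
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # continuation lines (indented details) do not match
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            seen[match.group(3).rstrip(": ")].append(stamp)

    chronic = {}
    for message, stamps in seen.items():
        first, last = min(stamps), max(stamps)
        if len(stamps) >= min_count and last - first >= min_span:
            chronic[message] = (len(stamps), first, last)
    return chronic
```

Passing it the lines of server.log (for example via `open(path).read().splitlines()`) returns only the messages that have been firing for a week or more, which is the set the rule says to investigate.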

Rule 2 — Identical counters across independent systems are diagnostic

The FEED DEGRADED false alarm fired because three different data providers (Tradier, Polygon Quote, E*TRADE) all showed exactly 370 failed calls in the same session. Three independent providers arriving at the same fail count on the same poll cadence isn't coincidence — it's the same set of calls failing across all three providers because they were all being asked the same impossible question (macro symbols like ^VIX sent to equity-only APIs).

The rule: if multiple independent systems report identical numbers, they're not independent failures. Same count, same cadence = one routing bug hitting multiple innocent services. Look for what's common across them, not what's wrong with each individually.
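The check itself is simple enough to automate. The sketch below groups providers by failure count and reports any nonzero count shared by two or more of them. The function name `shared_fail_counts` is hypothetical, and how the per-provider counts are collected is left out because the lesson does not describe it.

```python
from collections import defaultdict


def shared_fail_counts(fail_counts, min_providers=2):
    """Group providers that report the same nonzero failure count.

    fail_counts maps provider name -> failed calls in one session.
    Returns {count: [providers]} for counts shared by min_providers or more.
    """
    by_count = defaultdict(list)
    for provider, count in fail_counts.items():
        if count > 0:  # zero failures everywhere is the healthy case
            by_count[count].append(provider)
    return {
        count: sorted(providers)
        for count, providers in by_count.items()
        if len(providers) >= min_providers
    }


if __name__ == "__main__":
    # The three 370s come from the incident above; the Stooq figure is an
    # illustrative value added for contrast.
    session = {"Tradier": 370, "Polygon Quote": 370, "E*TRADE": 370, "Stooq": 4}
    print(shared_fail_counts(session))
```

A non-empty result is a prompt to look at what the listed providers have in common (the symbols requested, the routing layer) before debugging any one of them.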

Rule 3 — 404 is information, not noise

APIs return specific status codes for specific reasons. A response shape is itself a clue:

Status | Likely meaning | Action
401 | Auth issue (expired token, missing signature) | Re-authenticate, check token freshness
403 | Permission denied / scope insufficient | Check API scope grants in vendor portal
404 | URL doesn't exist at the vendor | Probe variants (with/without .json, with version prefix). Vendor may have deprecated quietly
429 | Rate-limited | Back off, reduce poll cadence, add jitter
500 / 502 / 503 | Vendor server error | Retry with backoff; if persistent, the vendor's status page is the right next stop
Tomcat HTML 404 | Reverse proxy / load balancer doesn't have the route | The route is genuinely gone or being WAF-gated. Probe with browser UA + curl to rule out client-stack issues

The rule: 404 specifically says "this URL is wrong." Treat it as a deprecation notice the vendor forgot to send. A Tomcat-styled HTML 404 in particular is the WAF / reverse proxy layer saying the route isn't routed — different shape from a JSON 404 the application emitted on purpose.
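The distinction between the two 404 shapes, and the ".json" probe from the table, can be sketched as two small pure functions. Both are heuristics under stated assumptions: `classify_404` and `url_variants` are hypothetical names, the classification only looks at the content type and the first characters of the body, and the example host is a placeholder.

```python
def classify_404(content_type, body):
    """Guess which layer produced a 404 from its content type and body.

    Returns "application" for a JSON-shaped response, "proxy" for an HTML
    error page, and "unknown" otherwise. This is a heuristic, not a guarantee.
    """
    ctype = (content_type or "").lower()
    text = (body or "").lstrip()
    if "json" in ctype or text.startswith(("{", "[")):
        return "application"  # the API itself answered on purpose
    if "html" in ctype or text.lower().startswith(("<!doctype", "<html")):
        return "proxy"  # a server or proxy error page, not the API
    return "unknown"


def url_variants(url):
    """Candidate URLs to probe after a 404: the original and its .json toggle."""
    base = url.rstrip("/")
    if base.endswith(".json"):
        return [base, base[: -len(".json")]]
    return [base, base + ".json"]


if __name__ == "__main__":
    # Placeholder host; only the path comes from the lesson.
    for candidate in url_variants("https://api.example.com/v1/accounts/list"):
        print(candidate)
```

The functions make no network calls, so the candidates can be probed with whatever client the daemon already uses. Version-prefix variants are left out because their exact form depends on the vendor.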

What to actually do, weekly

A 5-minute Friday ritual that catches most issues before they cascade:

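# How many warnings in the last 7 days?
# (date -v-7d is macOS syntax; assumes each entry starts with YYYY-MM-DD)
SINCE=$(date -v-7d +%Y-%m-%d)
awk -v since="$SINCE" '$1 >= since && /\[WARNING\]/' \
  ~/Library/Logs/SwingDeck/server.log | wc -l

# Group by error type (text after the logger name, up to the first colon)
awk -v since="$SINCE" '$1 >= since && /\[WARNING\]/' \
  ~/Library/Logs/SwingDeck/server.log \
  | awk -F'__main__: ' '{print $2}' \
  | cut -d':' -f1 \
  | sort | uniq -c | sort -rn | head -10

# Anything fire more than 50 times?
# > 50 = look at it. > 200 = drop everything and trace.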

Most weeks the top 3-5 entries are known harmless background warnings (Stooq DNS timeouts when their site is flaky, etc.). When a new entry appears in the top 5 — or an existing one jumps from 12 to 412 — that's the moment to trace it back to root cause.
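Comparing against a baseline is easier if last week's counts are kept somewhere. The sketch below takes two dictionaries of warning counts and flags entries that are new in the top 5 or that jumped sharply. The function name `weekly_flags`, the tenfold `jump_factor`, and the warning names in the example are all hypothetical; the 50 and 200 thresholds are the ones from the ritual above.

```python
def weekly_flags(last_week, this_week, top_n=5, jump_factor=10):
    """Compare two {warning_type: count} dicts and return entries worth tracing.

    An entry is flagged when it is absent from last week and lands in this
    week's top_n, or when its count grew by at least jump_factor. Results are
    ordered by this week's count, highest first.
    """
    ranked = sorted(this_week.items(), key=lambda kv: (-kv[1], kv[0]))
    top = {name for name, _ in ranked[:top_n]}

    flags = []
    for name, count in ranked:
        previous = last_week.get(name, 0)
        if previous == 0 and name in top:
            reason = f"new in top {top_n}"
        elif previous > 0 and count >= jump_factor * previous:
            reason = f"jumped from {previous} to {count}"
        else:
            continue  # within the known baseline

        if count > 200:
            severity = "trace now"
        elif count > 50:
            severity = "look at it"
        else:
            severity = "note"
        flags.append((name, count, severity, reason))
    return flags


if __name__ == "__main__":
    # Illustrative counts only; the warning names are made up for the example.
    last = {"Stooq DNS timeout": 40, "Quote cache miss": 12}
    this = {"Stooq DNS timeout": 38, "Quote cache miss": 412, "Broker reconcile failed": 60}
    for flag in weekly_flags(last, this):
        print(flag)
```

Known background warnings that stay near their usual count produce no output, so an empty result means the week matched the baseline.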

The real lesson

A 24/7 daemon's logs are a forensic trail. Most days nothing in them is actionable. Some days everything in them is actionable, and you only notice if you have a baseline of what "normal" looks like for your install. Reading server.log weekly isn't an optional ops chore; it's the same discipline as the Friday close ritual (lesson 10) applied to the infrastructure layer. The framework's job is to make the trade discipline easy; the logs are how you keep the framework itself honest. The work isn't catching the bug when it screams. It's noticing when the system stops screaming about something it should be screaming about.


Related: L10 — Friday close ritual · L38 — LaunchAgent 24/7 · three silent broker bugs post
