docs: bootstrap bugs + networking docs from end-to-end LAN test #1

Merged
Sponge merged 2 commits from feature/bootstrap-fixes into develop 2026-05-25 06:00:24 +00:00

Summary

Documentation work from the first end-to-end LAN test of the ubuntu-scripts bootstrap on a Multipass VM. Captures the three bugs surfaced during testing and adds networking guidance for adopters.

Changes

b34f73e — docs: capture #1 #2 #3 bootstrap bugs from first end-to-end world test

  • Creates docs/TODO.md
  • #1 BLOCKER: memory-focused-scheduler missing from bootstrap
  • #2 LOW: enable-experimental-swap k3s-restart BackOff churn
  • #3 NIT: db-dbdepl-util parallel-init race (Funcom design — leave alone)

9f2d75f — docs: expand networking docs from end-to-end LAN test findings

  • ubuntu-scripts/README.md: new troubleshooting subsections (network topology bare-metal-vs-VM, NAT options matrix, kubectl-apply-reset gotcha)
  • docs/networking.md (NEW): full design doc with topology diagrams, iptables DNAT pattern, NAT-piercing options, diagnostics
  • ubuntu-scripts/examples/UserEngine.ini.example (NEW): annotated CVar overrides for adopters

Bug fixes for the 3 bugs in docs/TODO.md have not landed yet

Branch stays open after this merge for the actual bootstrap-fix commits to follow.

Test plan

  • End-to-end install + world creation tested on Multipass dune-test VM (Ubuntu 24.04 + k3s)
  • Game-server LAN connect works (verified by the operator from his own machine)
  • No personal/private details in any committed file (NAT findings generalized to universal pattern; Sponge's specific Starlink+NETGEAR topology stays in CLAUDE.md only)
  • Case-collision audit clean across all files
Three real bugs surfaced during the first end-to-end world-creation test
on the Multipass dune-test VM (Ubuntu 24.04 + k3s, world
sh-9e14edb5d4003e4b-qmqtjz, "Dune Test — Multipass VM"). The bootstrap
got us to operators-Running and the world up to PHASE=Running with both
Survival_1 + Overmap game servers serving — but only after working around
#1 with a live kubectl patch.

User directive (verbatim, 2026-05-24): "make a feature branch, and start
dealing with #1 #2 and #3, although... pause on the 1 2 and 3 right this
second, do the stage commit push and feature branch now".

This commit creates docs/TODO.md and captures the three bugs as pending
entries on feature/bootstrap-fixes so the branch has a clear ledger to
work against when work resumes:

  #1 BLOCKER memory-focused-scheduler missing from bootstrap. Funcom's
  world-template.yaml references this custom scheduler on every game-server
  set (30+ refs); their Alpine VHDX presumably pre-deploys it, but ours doesn't.
  Game-server pods sit Pending forever with zero scheduling events because
  the default kube-scheduler ignores them. Worked around live with a
  kubectl patch rewriting every set's schedulerName to default-scheduler.
  Three fix paths captured (deploy the scheduler / sed-strip in world.sh /
  runtime kubectl patch post-apply); recommend deploying the scheduler.

  #2 LOW enable-experimental-swap k3s-restart-induced BackOff churn.
  setup/experimental_swap.sh:restart_k3s_after_swap() restarts k3s + sleeps
  120s, but in-namespace dependents (director, gateway, text-router) come
  up before Postgres is ready and crash-loop with Connection refused until
  BackOff settles. Fix: kubectl-wait for DB Ready before returning.

  #3 NIT db-dbdepl-util parallel-init race — two jobs both CREATE DATABASE
  dune, one wins, one gets UniqueViolation and lands in Error. Functionally
  harmless (migrations complete via the winning job). Funcom design — leave
  open, monitor across Funcom build updates.
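
The "sed-strip" fix path for #1 could be sketched as below. This is only an illustration, not code from world.sh: the helper name rewrite_scheduler is hypothetical, and it assumes the rendered spec writes the value unquoted as `schedulerName: memory-focused-scheduler`. It prints the rewritten spec to stdout and leaves the input file untouched.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of world.sh): print a copy of a rendered
# spec with every memory-focused-scheduler reference pointed at the
# default scheduler instead.
rewrite_scheduler() {
  local spec="$1"
  # Keep the indentation and the key; replace only the scheduler value.
  sed -E 's/^([[:space:]]*schedulerName:[[:space:]]*)memory-focused-scheduler[[:space:]]*$/\1default-scheduler/' "$spec"
}
```

Writing to stdout keeps the original file available for comparison, for example `rewrite_scheduler in.yaml > out.yaml` followed by a diff of the two files.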

Work paused per user directive — picking up router questions next.
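
The fix proposed for #2 (wait for the database instead of sleeping a fixed 120s) could be built on a small polling helper like the one below. This is a sketch under assumptions: wait_until is a hypothetical name, and the kubectl label selector and namespace in the usage comment are placeholders, not values taken from the bootstrap.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a probe command until it succeeds or the
# timeout (in seconds) has passed. Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Possible use after the k3s restart (selector and namespace are placeholders):
#   wait_until 300 5 kubectl wait --for=condition=Ready pod \
#     -l <postgres-label> -n <world-namespace> --timeout=10s
```

Polling around `kubectl wait` rather than calling it once is a deliberate choice here: right after a restart the API server may not answer yet, and the first attempts can fail for that reason alone.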
Surfaces all publicly-useful learnings from the first end-to-end world test
on the Multipass dune-test VM (Ubuntu 24.04 + k3s), generalized to a
universal "what does my topology need?" reference. No personal-host details
(no specific WAN IPs, no specific HostIds, no ISP-specific findings).

User directive (verbatim, 2026-05-24): "How much of it is 'dont mention we
did this' and how much is 'this is super helpful to others' that we can
document for the ubuntu setup process? Cause this SHOULD work bare metal,
but we are using an ubuntu VM, which either way, that much I know we
should document."

Changes:

* ubuntu-scripts/README.md — three new troubleshooting subsections:
  - "Network topology — bare-metal vs VM" — explains why VM-based installs
    need iptables DNAT (game pods use hostNetwork:true and bind to the
    VM's bridge IP, not the host's LAN IP). Includes the working iptables
    rule pattern + the persistence step.
  - "NAT options — choosing your reachability path" — matrix covering
    real-public-IPv4 / single-NAT / double-NAT / CGNAT / LAN-only, with
    detection commands and recommended fix per case. Links to
    docs/networking.md for deeper detail.
  - "kubectl apply -f /home/dune/.dune/sh-*.yaml resets your world" —
    the rendered spec ships image tags as 0-0-shipping placeholder, so
    re-applying it after update-from-downloads or kubectl patch operations
    overwrites the live-patched state. Treat the rendered YAML as a seed,
    not the live source of truth.

* docs/networking.md (NEW) — the deep design doc:
  - How HOST_DATACENTER_IP_ADDRESS announcements work
  - Game-server port architecture (UDP 7777-7889 + TCP 31982)
  - Bare-metal vs VM topology with ASCII diagrams
  - Full iptables DNAT pattern for VM installs
  - NAT-piercing options for every common residential topology
    (single-NAT, double-NAT, CGNAT, LAN-only) with concrete commands
  - Tunnel-service options (Playit.gg, Tailscale Funnel, Cloudflare
    Spectrum, self-hosted WireGuard)
  - Diagnostic commands (ss listener check, tcpdump on WAN, director
    log search, iptables counter check)
  - Common failure-mode lookup table

* ubuntu-scripts/examples/UserEngine.ini.example (NEW) — annotated example
  CVar overrides anyone can use as a starting point. Includes:
  - Sandworm tuning block
  - Sandstorm + treasure
  - World-border kill-mechanisms DISABLED (Hazard.EnableQuicksandOnIGWBorders,
    Vehicle.LevelBorders*) — for friend-server sandbox use
  - Resource yield multipliers (commented out, pointing to Funcom defaults)
  - Vehicle fuel multiplier (dw.VehiclePowerConsumptionMultiplier=0)
    flagged with the "this is Funcom-internal dw.* namespace, verify on
    your build" caveat
  - Player-death loot toggle (commented out, for PvE-friendly configs)
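
The VM DNAT pattern mentioned under the README and networking.md bullets above could look roughly like this. It is a sketch, not the rule shipped in the README: the helper name print_dnat_rules and the example VM address are assumptions, and only the port ranges (UDP 7777-7889, TCP 31982) come from this commit message. The helper prints the commands rather than running them, so they can be reviewed first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not execute) DNAT rules that forward the
# game-server ports arriving at the host to the VM's bridge IP.
print_dnat_rules() {
  local vm_ip="$1"
  # A port range with no port in --to-destination keeps the original port.
  printf 'iptables -t nat -A PREROUTING -p udp --dport 7777:7889 -j DNAT --to-destination %s\n' "$vm_ip"
  printf 'iptables -t nat -A PREROUTING -p tcp --dport 31982 -j DNAT --to-destination %s\n' "$vm_ip"
}

# Example (the address is a placeholder for the VM's bridge IP):
#   print_dnat_rules 10.0.0.5
```

Depending on the host firewall, a matching FORWARD accept rule may also be needed. On Ubuntu, rules are commonly persisted with the iptables-persistent package (`netfilter-persistent save`), which may or may not be the persistence step the README describes.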

Personal-host details (specific WAN/HostId/ISP topology) deliberately
kept out — those stayed in the session conversation and are not part of
the public release.
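
For the NAT options matrix, one detection step is classifying the address a router reports on its WAN side. The sketch below is an assumption about how that could be scripted, not the detection commands from docs/networking.md. classify_ipv4 is a hypothetical helper, it does no input validation, and it reports everything outside the CGNAT and RFC 1918 ranges as public (including loopback and link-local, which it does not special-case).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "cgnat", "private" or "public" for a
# dotted-quad IPv4 address.
classify_ipv4() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  if (( a == 100 && b >= 64 && b <= 127 )); then
    echo cgnat      # 100.64.0.0/10, the RFC 6598 shared address space
  elif (( a == 10 || (a == 172 && b >= 16 && b <= 31) || (a == 192 && b == 168) )); then
    echo private    # RFC 1918 ranges
  else
    echo public
  fi
}
```

Read against the matrix: a router WAN address that comes back as private suggests double-NAT, cgnat suggests carrier-grade NAT, and public suggests single-NAT or a real public IPv4.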
Sponge merged commit 5e7e3e4f49 into develop 2026-05-25 06:00:24 +00:00
Reference
Sponge/Dune-Awakening-Server-Tools!1