Bootstrap bugs #1 #2 #3 + UE5 server-binary item catalog (#4) #3

Merged
Sponge merged 12 commits from feature/bootstrap-bugs into develop 2026-05-26 04:00:51 +00:00

Summary

Resolves all four pending entries from docs/TODO.md via parallel-worktree agents. Verbatim entries with full resolution metadata (agent IDs, branch names, commit hashes) archived in docs/FINALIZED.md per LAW #0.

Code changes

  • #1 memory-focused-scheduler bootstrap deploy (Fix Path A). New ubuntu-scripts/bootstrap/manifests/40-memory-focused-scheduler.yaml: SA + 2 ClusterRoleBindings to standard system:kube-scheduler + system:volume-scheduler + ConfigMap with KubeSchedulerConfiguration registering memory-focused-scheduler + Deployment of registry.k8s.io/kube-scheduler:v1.31.4. setup/bootstrap.sh applies it after operator workloads with a kubectl wait --for=condition=Available --timeout=180s. Fix Path B sed/jq fallback documented in ubuntu-scripts/README.md.
  • #2 experimental_swap.sh kubectl-wait DB Ready (Fix Path A). restart_k3s_after_swap() shrinks floor sleep from 120s to 30s, adds API-server-up loop so kubectl queries don't race the systemctl restart, then blocks on kubectl rollout status statefulset/db-dbdepl-sts --timeout=180s as the real readiness gate. Skips gracefully if the statefulset isn't present yet.

Documentation changes

  • #3 — Cosmetic db-dbdepl-util Error Job (Funcom design, leave-alone disposition). ISSUES.md #3 expanded with explicit do-not-file-a-bug framing, kubectl get jobs | grep Error for spotting it, the success-signal log line, verification commands, and the upstream-Funcom-design note.

Reverse-engineering output

  • #4 — UE5 server-binary item catalog (Stream B). 9 files under DataExtract/ITEMS/ covering all 8 categories + INDEX. Tier scheme confirmed as _T0.._T6 suffix (7 tiers, zero-indexed). Schema-class counts: Weapons 145, Vehicles 377, Utility 351, Misc 399, Garments 68, Customization 109, Construction 569, Augmentations 35. Per-item template IDs are NOT in the server binary; they live in cooked client .pak content and need a follow-on extraction pass.

Housekeeping

  • DataExtract/CVAR/ reorganization (sibling to DataExtract/ITEMS/); .gitignore updated for new paths.
  • docs/FINALIZED.md created with verbatim archives; docs/TODO.md pending section now empty.

Test plan

  • Clean Ubuntu 24.04 VM: run setup.sh end-to-end and confirm memory-focused-scheduler Deployment becomes Available before world creation
  • Create a world: confirm sg-survival-1-pod-1 + sg-overmap-pod-2 schedule on first try (no manual kubectl patch)
  • Run battlegroup enable-experimental-swap: confirm director / gateway / text-router do NOT crash-loop through the k3s restart
  • Verify db-dbdepl-util Error Job still appears and ISSUES.md #3 reads clearly to a new adopter
  • Spot-check 3-4 entries from each DataExtract/ITEMS/ category file for accuracy against the binary
Mirrors the 3 bootstrap bugs from docs/TODO.md into a public-facing root-level
markdown file with adopter-friendly framing: symptom (with reproducible
output snippets), root cause, workaround that works TODAY, and the proposed
permanent fix (with the variants we considered).

Why a separate file from docs/TODO.md:

- docs/TODO.md is workflow ledger style — user-verbatim directives, pending /
  in-progress / finalized sections, internal-engineering tone
- ISSUES.md is adopter-facing — "if you hit this symptom, here's what to do"
  with copy-pasteable workarounds and concrete kubectl commands

The two complement each other; ISSUES.md links docs/TODO.md for deeper
diagnosis and docs/networking.md for NAT/reachability questions that aren't
bootstrap bugs.

Coverage:

* #1 BLOCKER — memory-focused-scheduler missing. Symptom (Pending pods with
  zero scheduler events), root cause (Funcom template references a scheduler
  we don't deploy), workaround (kubectl patch every set's schedulerName to
  default-scheduler), proposed fix (deploy the scheduler / strip from world
  template render / runtime patch in world.sh — variants A/B/C).
* #2 LOW — enable-experimental-swap k3s-restart BackOff churn. Symptom
  (RESTARTS=3-4 on bgd/sgw/tr after experimental_swap.sh), root cause
  (dependents reconnect to Postgres before DB pod finishes its own restart),
  workaround (wait it out OR force-restart dependents), proposed fix
  (kubectl wait for DB Ready in restart_k3s_after_swap()).
* #3 NIT — db-dbdepl-util parallel-init race. Symptom (Error+Completed Jobs
  side-by-side), root cause (Funcom's design — parallel jobs both try
  CREATE DATABASE dune), workaround (delete the Errored one or ignore),
  no proposed fix (Funcom's bug, leave alone).

Also includes a "Reporting new issues" footer pointing adopters to the
Forgejo issue tracker with a request template (Ubuntu version, topology,
status output, director logs).

This commit is the first new commit on feature/bootstrap-bugs after
branching from develop @ 5e7e3e4. Branch is scoped to the bug-fix work
itself — this docs commit is the public-facing surface; subsequent
commits on this branch will land the actual bootstrap manifest / world.sh
changes.
When the repo lives at a dev-folder location separate from the Steam install
folder (rather than inside the Steam folder directly), users typically create
symlinks `images` and `scripts` at the repo root pointing back at Funcom's
content. Those symlinks were NOT matched by the existing `images/` + `scripts/`
gitignore patterns (which only catch real directories).

This commit adds root-anchored patterns `/images` and `/scripts` that catch
the symlink case, alongside the existing `images/` + `scripts/` directory
patterns. Both layouts work:

  - Repo cloned INTO Steam folder       → images/ + scripts/ are real dirs → ignored
  - Repo cloned to dev location         → images + scripts are symlinks   → ignored

No behavior change for users who run the repo from inside the Steam install
folder (the documented happy path in README.md). This just adds correctness
for users who keep the repo at a separate dev location and symlink the
Funcom-IP dirs into place.
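
The difference between the two pattern styles can be checked with git check-ignore. The following is a minimal sketch run in a throwaway repository; the function name check_symlink_ignore and the funcom-content directory are hypothetical stand-ins, not part of this repo.

```shell
# Sketch: test whether a gitignore pattern matches a root-level symlink.
# Everything happens in a temporary repo that is removed afterwards.
check_symlink_ignore() {
    # $1 = gitignore pattern to test; prints "ignored" or "not ignored"
    local pattern=$1 tmp result
    tmp=$(mktemp -d) || return 1
    (
        cd "$tmp" || exit 1
        git init -q .
        mkdir funcom-content            # hypothetical stand-in for the Steam content
        ln -s funcom-content images     # symlink at the repo root
        printf '%s\n' "$pattern" > .gitignore
        if git check-ignore -q images; then
            echo "ignored"
        else
            echo "not ignored"
        fi
    )
    result=$?
    rm -rf "$tmp"
    return "$result"
}

check_symlink_ignore 'images/'   # directory-only pattern
check_symlink_ignore '/images'   # root-anchored pattern
```

A trailing slash restricts a pattern to directories, and git does not treat a symlink as a directory even when its target is one, which is why the root-anchored form is needed for the dev-location layout.
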
Move reverse-engineering CVAR output into DataExtract/CVAR/ so DataExtract/ is
organized by category. Update .gitignore to catch the CVARS-rest.md and
SETTINGS-CATALOG-*.txt firehose dumps at the new path (they were already
ignored at the old path).
restart_k3s_after_swap() previously did 'systemctl restart k3s' + sleep 120
and returned. The 120s covered k3s itself but the in-namespace dependents
(battlegroup-director, server-gateway, text-router) tried to reconnect to
Postgres before the DB pod finished its own restart-and-readiness, causing
~2 min of crash-loop BackOff that looked broken in test logs.

Now:
- Floor wait shrunk to 30s (k3s containerd/API come back inside this window).
- Wait for the k3s API server to respond before any kubectl query so we
  don't race 'connection refused' immediately after systemctl restart.
- If statefulset/db-dbdepl-sts exists in funcom-seabass-$bgname, block on
  'kubectl rollout status statefulset/db-dbdepl-sts --timeout=180s' so
  dependents only reconnect after Postgres is Ready.
- If the statefulset isn't present yet, skip the gate cleanly — the
  BackOff symptom only occurs when those deployments already exist.
- Function now takes $bgname; main() passes it through.

docs/TODO.md #2 fix path A.
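
The gating order listed above can be sketched roughly as follows. This is an illustration, not the code in experimental_swap.sh: the function name wait_for_db_ready and the API_RETRIES / API_DELAY knobs are hypothetical, and the systemctl restart plus the 30s floor wait are assumed to have already run.

```shell
# Sketch of the readiness gate that follows 'systemctl restart k3s'.
wait_for_db_ready() {
    # $1 = battlegroup name; the namespace is funcom-seabass-<name>
    local ns="funcom-seabass-$1" i=0

    # 1. Poll until the API server answers, so later queries do not hit
    #    'connection refused' right after the restart.
    until kubectl get --raw /readyz >/dev/null 2>&1; do
        i=$((i + 1))
        if [ "$i" -ge "${API_RETRIES:-60}" ]; then
            echo "API server did not come back" >&2
            return 1
        fi
        sleep "${API_DELAY:-2}"
    done

    # 2. No statefulset yet means no dependents to crash-loop: skip the gate.
    if ! kubectl -n "$ns" get statefulset/db-dbdepl-sts >/dev/null 2>&1; then
        echo "db-dbdepl-sts not present in $ns, skipping readiness gate"
        return 0
    fi

    # 3. Block until the Postgres statefulset has finished rolling out.
    kubectl -n "$ns" rollout status statefulset/db-dbdepl-sts --timeout=180s
}
```

Called as wait_for_db_ready "$bgname", the function returns non-zero if the API server never answers or the rollout times out, so the caller can decide whether to abort or only warn.
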
Expand the #3 entry from a terse nit into adopter-facing guidance:
- Reframe severity language so users do not mistake the Error Job
  for an install failure or open spurious bug reports.
- Add explicit verification command for confirming the install is
  actually healthy after settle time (kubectl get pods | grep -v
  Completed | grep -v Running should be empty), plus the battlegroup
  CR phase check as a secondary confirmation.
- Spell out the upstream-Funcom-design framing and note that we
  expect the symptom to disappear if Funcom changes their job-spawn
  pattern in a future build update, with no change required on our
  end. State explicitly that we are not patching around this in
  ubuntu-scripts/ because it would be ongoing maintenance burden
  for a cosmetic symptom.
- Demote the kubectl delete workaround from a recommended fix to a
  'if you really must' footnote, per the docs/TODO.md directive
  that calls a world.sh sweep 'a maintenance burden we don't want'.
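
The verification pipeline mentioned above can be wrapped into a small check. This is a sketch under assumptions: the function name install_is_healthy and the namespace argument are hypothetical, and --no-headers is added here so the column header line does not count as a leftover pod.

```shell
# Sketch: succeed only when every pod in a namespace is Running or Completed.
install_is_healthy() {
    # $1 = namespace to check
    local leftover
    # '|| true' keeps an empty grep result from being treated as a failure.
    leftover=$(kubectl get pods -n "$1" --no-headers \
        | grep -v Completed | grep -v Running || true)
    if [ -n "$leftover" ]; then
        printf 'pods not settled:\n%s\n' "$leftover" >&2
        return 1
    fi
    echo "all pods Running or Completed"
}
```

Run after the settle time; a non-zero exit lists the pods that are in neither state.
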
Funcom's world template hard-codes schedulerName: memory-focused-scheduler
on every game-server set. Their Alpine VHDX pre-deploys a scheduler with
that name; our clean Ubuntu install did not, so game-server pods stayed
phase: Pending forever with zero scheduling events while every other pod
in the stack scheduled fine.

Add bootstrap/manifests/40-memory-focused-scheduler.yaml — a secondary
kube-scheduler (registry.k8s.io/kube-scheduler:v1.31.4) running in
kube-system, registered under the memory-focused-scheduler name via a
KubeSchedulerConfiguration profile. Reuses the default plugin set
(NodeResourcesFit already does memory-aware scoring) and the standard
system:kube-scheduler + system:volume-scheduler ClusterRoles.

setup/bootstrap.sh applies the manifest after the operator workloads and
kubectl-waits for the Deployment to become Available before returning so
downstream world.sh sees a ready scheduler.
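
The apply-then-wait step can be sketched as below. The function name is hypothetical, and the Deployment name memory-focused-scheduler is an assumption based on the scheduler name; the actual wiring in setup/bootstrap.sh may differ.

```shell
# Sketch: apply the scheduler manifest and block until it is Available.
deploy_memory_focused_scheduler() {
    # $1 = path to 40-memory-focused-scheduler.yaml
    kubectl apply -f "$1" || return 1
    # Deployment name is assumed; kube-system is the namespace named above.
    kubectl -n kube-system wait --for=condition=Available --timeout=180s \
        deployment/memory-focused-scheduler
}
```

Returning the exit status of kubectl wait lets the caller stop before world creation if the scheduler never becomes Available.
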

README troubleshooting gains a Scheduler section explaining the design
plus a sed/jq-based fallback path for users who can't run a custom
scheduler (managed k8s, hardened policies, etc.).
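
For illustration, the sed half of such a fallback might look like the sketch below. It is not the README's exact recipe: the function name strip_custom_scheduler is hypothetical, and it assumes the rendered template carries schedulerName on its own YAML line.

```shell
# Sketch: rewrite the custom scheduler name to default-scheduler in YAML
# read from stdin, preserving indentation.
strip_custom_scheduler() {
    sed -E 's/^([[:space:]]*schedulerName:[[:space:]]*)memory-focused-scheduler[[:space:]]*$/\1default-scheduler/'
}
```

The filter reads a rendered manifest on stdin and writes the rewritten manifest to stdout, leaving every other line untouched.
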
Item-catalog derivative work from string-table analysis of the
DuneSandboxServer-Linux-Shipping binary. Categorised into the 8 buckets
requested (Weapons / Vehicles / Utility / Misc / Garments / Customization /
Construction / Augmentations) plus an INDEX summarising counts and the
canonical 7-tier scheme.

Documents the schema-level item data the server binary exposes:
UCLASS / USTRUCT / UENUM type names, DataTable row-struct shapes
(WeaponItemTableRow, ArmorItemTableRow, AugmentItemTableRow, etc.),
runtime processors, stat-enum surfaces, and the 8 explicit
tier-tagged literal IDs that surface as plain strings (WeaponsSet_T5,
WeaponsSet_T6, HeavyArmorSet_T6, LightArmorSet_T6, VehicleAssembly_*_T6).

Per-item template IDs (Pistol_T3_Maula, etc.) are NOT in the server
binary — they live in cooked client .pak content and need a follow-on
extraction pass against UnrealPak-extracted assets.
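
A quick way to tally tier-suffixed identifiers from a string dump is sketched below. The helper name count_by_tier is hypothetical; it only counts names that end in _T0 through _T6, so mid-name tiers such as Pistol_T3_Maula are deliberately not matched.

```shell
# Sketch: count identifiers per tier suffix from a list of strings on stdin.
count_by_tier() {
    grep -oE '_T[0-6]$' | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be fed from the binary with something like strings DuneSandboxServer-Linux-Shipping | count_by_tier, which prints one "suffix count" pair per tier found.
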
Implements docs/TODO.md #1 Fix Path A. Spawned in parallel worktree.
Implements docs/TODO.md #2 Fix Path A. Spawned in parallel worktree.
Implements docs/TODO.md #3 — leave-alone disposition + adopter-facing guidance.
Spawned in parallel worktree.
Implements docs/TODO.md #4 — DataExtract/ITEMS/ catalog covering schema-level
data for the 8 requested categories + 7-tier scheme. Per-item template IDs
identified as living in client .pak content, follow-on pass needed for full
enumeration. Spawned in parallel worktree.
All four entries landed via parallel-worktree agents on 2026-05-25:
- #1 memory-focused-scheduler bootstrap deploy   (merge 3aed094)
- #2 experimental_swap kubectl-wait DB Ready     (merge 3febbec)
- #3 ISSUES.md cosmetic Error Job documentation  (merge 8ea5b82)
- #4 DataExtract/ITEMS/ UE5 server-binary catalog (merge cbdc05b)

Verbatim entries preserved per LAW #0; resolution metadata (agent ID,
branch, commit, merge) appended to each FINALIZED entry. TODO.md
pending section now empty.
Sponge merged commit f2245c9fbb into develop 2026-05-26 04:00:51 +00:00
Sponge deleted branch feature/bootstrap-bugs 2026-05-26 04:00:51 +00:00
Reference
Sponge/Dune-Awakening-Server-Tools!3