docs: decompose TODO #5/#5b + execute Phase A smoke-test (exact parity with attempt-3)

Sponge commented

2026-05-26 17:55:34 +00:00

Owner

Summary

Decomposes TODO #5 / #5b (client-pak item ID enumeration + Path C strategic pivot) into 14 cascade-pickable tasks across 5 phases, then executes the entire Phase A smoke-test cascade end-to-end with exact-digit parity against the attempt-3 reference catalog.

What's in this PR

1. `docs/DECOMPOSED.md` (new): 14 tasks across Phase A (smoke-test, no new code) → Phase B (attempt-4 row-decode pipeline) → Phase C (catalog enrichment) → Phase D (cross-platform) → Phase E (hygiene). Every task carries verbatim user directives from TODO.md or STATUS.md per LAW #0.
2. Phase A executed + resolved:
   - A1: `python -m dune_extract --dry-run`, run cold after a fresh `./install-prereqs.sh`, exits 0 with all 4 self-checks green (Steam install / aesdumpster / Win64 binary / repak_cli 0.2.3). The per-row parser status correctly points to STATUS.md §Path forward to attempt 4.
   - A2: real extraction via `python -m dune_extract` reads 131,654 file entries across 37 paks (16 skipped, 0 failed) and categorizes 25,618 stems in 2.96 s wall-clock. Per-category totals match attempt-3 to the digit across all 8 categories.
   - A3: a Python-based per-category stem-set diff reports EQUAL for all 8 categories (Weapons 4303, Vehicles 8590, Garments 5205, Augmentations 194, Customization 499, Construction 3763, Misc 883, Utility 2134), with zero symmetric difference. The only drift is acceptable formatting drift (blockquote prefix on the totals line, 2-column vs 3-column DataTable inventory tables).
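A minimal sketch of the A3 check, assuming each catalog has already been loaded into a mapping from category name to a set of stems. The function names and the loading step are hypothetical, not `dune_extract`'s actual code. Comparing sets rather than files is what keeps the result insensitive to the formatting drift listed under A3: an empty symmetric difference in every category means content parity.

```python
from __future__ import annotations

from collections.abc import Mapping, Set


def diff_stem_sets(
    current: Mapping[str, Set[str]],
    reference: Mapping[str, Set[str]],
) -> dict[str, set[str]]:
    """Per category, return stems present in exactly one of the two catalogs.

    Hypothetical helper: a category missing from one side is treated as empty.
    """
    categories = set(current) | set(reference)
    return {
        cat: set(current.get(cat, ())) ^ set(reference.get(cat, ()))
        for cat in sorted(categories)
    }


def catalogs_equal(
    current: Mapping[str, Set[str]],
    reference: Mapping[str, Set[str]],
) -> bool:
    """True when every category has a zero symmetric difference."""
    return all(not delta for delta in diff_stem_sets(current, reference).values())


# Illustrative data only; these stem names are made up.
reference = {"Weapons": {"Stem_A", "Stem_B"}, "Utility": {"Stem_C"}}
current = {"Weapons": {"Stem_B", "Stem_A"}, "Utility": {"Stem_C"}}
print(catalogs_equal(current, reference))  # True (set order does not matter)

# A drifted run: one stem missing from Weapons, one extra stem in Utility.
current_bad = {"Weapons": {"Stem_A"}, "Utility": {"Stem_C", "Stem_D"}}
print(diff_stem_sets(current_bad, reference))
# {'Utility': {'Stem_D'}, 'Weapons': {'Stem_B'}}
```

Returning the per-category delta rather than a single boolean makes a failing run point straight at the stems that moved.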

Conclusion

`tools/dune-extract/` (the PR #4 scaffold) is production-ready at the stem-enumeration layer: cold install works, real extraction works, and the output is content-equivalent to the in-repo attempt-3 reference catalog. This validates the strategic pivot premise from TODO #5b ("ship tooling, not pre-extracted data"): end users can reproduce the entire current catalog by running this tool against their own legal install.

Cascade state after this PR

- TODO #5 remains `[~]` (Phase B / row-decode pipeline still pending)
- DECOMPOSED.md Phase A all `[x]`, Phases B-E all `[ ]`
- Next pickup per YOLO cascade rule (lowest grain first): B1, EncodedPakEntries blob decode (the hardest of the three RE steps; 1-3 day estimate per STATUS.md)
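The pickup rule can be sketched as below. This assumes DECOMPOSED.md tracks tasks as markdown checklist lines and reads "lowest grain first" as the earliest phase letter, then the lowest task number. Both the line format and that reading of the rule are assumptions, and `parse_tasks` / `next_pickup` are hypothetical helpers, not part of the repo.

```python
from __future__ import annotations

import re

# Assumed checklist line shape, e.g. "- [x] A1 dry-run smoke test".
# Status markers follow this PR: "[x]" done, "[~]" in progress, "[ ]" open.
TASK_LINE = re.compile(r"^\s*-\s*\[(?P<status>[ x~])\]\s*(?P<task>[A-Z]\d+)\b")


def parse_tasks(markdown: str) -> list[tuple[str, str]]:
    """Extract (task_id, status_char) pairs from checklist lines."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK_LINE.match(line)
        if match:
            tasks.append((match.group("task"), match.group("status")))
    return tasks


def task_key(task_id: str) -> tuple[str, int]:
    """Sort by phase letter, then numerically by task number (so B2 < B10)."""
    return task_id[0], int(task_id[1:])


def next_pickup(markdown: str) -> str | None:
    """Earliest task not marked [x] ([~] counts as still open), or None."""
    open_tasks = [task for task, status in parse_tasks(markdown) if status != "x"]
    return min(open_tasks, key=task_key) if open_tasks else None


checklist = """
- [x] A1 dry-run smoke test
- [x] A2 real extraction
- [x] A3 stem-set diff
- [ ] B1 EncodedPakEntries blob decode
- [ ] B2 read/decrypt/Oodle
"""
print(next_pickup(checklist))  # B1
```

Treating `[~]` as open means a half-finished task is always resumed before anything later in the cascade is started.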

Test plan for reviewer

```bash
cd tools/dune-extract
source .venv/bin/activate
python -m dune_extract --dry-run # A1
python -m dune_extract # A2 (~3 sec)
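# Byte-level spot check on one category. A3 reports formatting-only drift
# (blockquote prefix, DataTable column count), so this may print "differ"
# even when the stem sets are equal.
diff -q dune-extract-output/ITEMS-Weapons-Paths.md ../../DataExtract/ITEMS/ITEMS-Weapons-Paths.md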
```


Sponge added 3 commits

2026-05-26 17:55:34 +00:00

docs: DECOMPOSED.md — TODO #5/#5b breakdown into 14 cascade-pickable tasks 958e5de01f

Decomposes the active client-pak extraction work into five ordered phases:
- A1-A3 smoke-test the landed Path C scaffold (no new code)
- B1-B4 attempt-4 row-decode pipeline (EncodedPakEntries → AES+Oodle → UAsset parse → field-filter wire)
- C1-C3 catalog enrichment + sample re-publication
- D1-D3 cross-platform (Windows/Proton/WSL) + diff tool
- E1-E3 hygiene (verbose output, JSON/CSV parity, troubleshooting cookbook)

Each task carries verbatim user directives from docs/TODO.md #5/#5b
(LAW #0) and verbatim step text from DataExtract/ITEMS/STATUS.md
§Path forward to attempt 4. Suggested first pickup: A1 (cheapest
gate; surfaces scaffold cold-start friction without writing code).

docs: A1 resolved — dune-extract --dry-run cold smoke test passes clean c9e18da0c7

Verbatim dry-run stdout captured in DECOMPOSED.md A1 Resolved block.
All four self-checks green: Steam install auto-detected (53 paks,
42.4 GB), aesdumpster + Win64 binary + repak_cli 0.2.3 all resolved,
all 8 categories selected, per-row parser status correctly points to
STATUS.md §Path forward to attempt 4. Zero drift from
tools/dune-extract/README.md §3 quick-start. No follow-up tasks filed.
Next pickup per cascade: A2 (real run end-to-end + 25,618-stem parity).

docs: A2 + A3 resolved — Phase A complete, exact parity with attempt-3 25e1ec4548

A2 — real extraction via `python -m dune_extract` defaults: 131,654
file entries across 37 paks, 16 skipped, 0 failed, 25,618 stems
categorized in 2.964s wall. Per-category totals match attempt-3 to
the digit across all 8 categories.

A3 — Python-based per-category stem-set diff: EQUAL on all 8
categories (Weapons 4303, Vehicles 8590, Garments 5205, Augmentations
194, Customization 499, Construction 3763, Misc 883, Utility 2134),
zero symmetric difference. Acceptable formatting drift (blockquote
prefix on totals line, 3-col vs 2-col DataTable inventory tables);
no content drift.

Phase A complete. No regression tasks filed. Cascade ready for
Phase B (attempt-4 row-decode pipeline: EncodedPakEntries decode →
AES+Oodle UAsset decompress → UAsset binary parse).

Sponge merged commit 9260e53370 into develop

2026-05-26 18:00:53 +00:00

Sponge deleted branch feature/decompose-pak-extraction

2026-05-26 18:00:53 +00:00

Sponge referenced this pull request from a commit

2026-05-26 18:00:53 +00:00

Merge pull request 'docs: decompose TODO #5/#5b + execute Phase A smoke-test (exact parity with attempt-3)' (#5) from feature/decompose-pak-extraction into develop

Sponge referenced this pull request

2026-05-26 18:44:44 +00:00

dune-extract attempt-4: B1 EncodedPakEntries + FU1-4 + B2 read/decrypt/Oodle + B3 DataTable parser #6

Sponge referenced this pull request from a commit

2026-05-29 20:13:00 +00:00