docs: decompose TODO #5/#5b + execute Phase A smoke-test (exact parity with attempt-3) #5

Merged
Sponge merged 3 commits from feature/decompose-pak-extraction into develop 2026-05-26 18:00:53 +00:00

Summary

Decomposes TODO #5 / #5b (client-pak item ID enumeration + Path C strategic pivot) into 14 cascade-pickable tasks across 5 phases, then executes the entire Phase A smoke-test cascade end-to-end with exact-digit parity against the attempt-3 reference catalog.

What's in this PR

  1. docs/DECOMPOSED.md (new) — 14 tasks across Phase A (smoke-test, no new code) → Phase B (attempt-4 row-decode pipeline) → Phase C (catalog enrichment) → Phase D (cross-platform) → Phase E (hygiene). Every task carries verbatim user directives from TODO.md or STATUS.md per LAW #0.

  2. Phase A executed + resolved:

    • A1 — python -m dune_extract --dry-run cold from a fresh ./install-prereqs.sh: exit 0, all 4 self-checks green (Steam install / aesdumpster / Win64 binary / repak_cli 0.2.3), per-row parser status correctly points to STATUS.md §Path forward to attempt 4.
    • A2 — real extraction via python -m dune_extract: 131,654 file entries across 37 paks / 16 skipped / 0 failed / 25,618 stems categorized in 2.96s wall. Per-category totals match attempt-3 to the digit across all 8 categories.
    • A3 — Python-based per-category stem-set diff: EQUAL across all 8 categories (Weapons 4303, Vehicles 8590, Garments 5205, Augmentations 194, Customization 499, Construction 3763, Misc 883, Utility 2134), zero symmetric difference. Acceptable formatting drift only (blockquote prefix, 2-col vs 3-col DataTable inventory).
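The A3 check can be pictured as a per-category set comparison. The following is a minimal sketch, not the script used in this PR: the function names `diff_stem_sets` and `is_equal` and the toy stem values are hypothetical, and it assumes each catalog has already been loaded into a mapping from category name to a set of stems.

```python
from typing import Dict, Set, Tuple

StemSets = Dict[str, Set[str]]


def diff_stem_sets(reference: StemSets, candidate: StemSets) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Map each category to (stems only in reference, stems only in candidate).

    A category that is missing from one side is treated as an empty set there.
    """
    report = {}
    for category in sorted(set(reference) | set(candidate)):
        ref = reference.get(category, set())
        cand = candidate.get(category, set())
        report[category] = (ref - cand, cand - ref)
    return report


def is_equal(report: Dict[str, Tuple[Set[str], Set[str]]]) -> bool:
    """True when every category has an empty symmetric difference."""
    return all(not missing and not extra for missing, extra in report.values())


if __name__ == "__main__":
    # Toy stems for illustration only; real stems come from the extracted catalogs.
    reference = {"Weapons": {"wpn_a", "wpn_b"}, "Vehicles": {"veh_a"}}
    candidate = {"Weapons": {"wpn_b", "wpn_a"}, "Vehicles": {"veh_a"}}
    report = diff_stem_sets(reference, candidate)
    for category, (missing, extra) in report.items():
        status = "EQUAL" if not missing and not extra else "DIFF"
        print(f"{category}: {status} (missing={len(missing)}, extra={len(extra)})")
```

Comparing sets rather than raw files keeps the check insensitive to ordering and to presentation-only drift such as the blockquote prefix noted above.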

Conclusion

tools/dune-extract/ (PR #4 scaffold) is production-ready at the stem-enumeration layer. Cold install works, real extraction works, output is content-equivalent to the in-repo attempt-3 reference catalog. The strategic pivot premise from TODO #5b ("ship tooling, not pre-extracted data") is validated — end users can reproduce the entire current catalog from this tool against their own legal install.

Cascade state after this PR

  • TODO #5 remains [~] (Phase B / row-decode pipeline still pending)
  • DECOMPOSED.md Phase A all [x], Phases B-E all [ ]
  • Next pickup per YOLO cascade rule (lowest grain first): B1 — EncodedPakEntries blob decode (the hardest of the three RE steps; 1-3 day estimate per STATUS.md)

Test plan for reviewer

```bash
cd tools/dune-extract
source .venv/bin/activate
python -m dune_extract --dry-run # A1
python -m dune_extract # A2 (~3 sec)
diff -q dune-extract-output/ITEMS-Weapons-Paths.md ../../DataExtract/ITEMS/ITEMS-Weapons-Paths.md
```

Decomposes the active client-pak extraction work into five ordered phases:
- A1-A3 smoke-test the landed Path C scaffold (no new code)
- B1-B4 attempt-4 row-decode pipeline (EncodedPakEntries → AES+Oodle → UAsset parse → field-filter wire)
- C1-C3 catalog enrichment + sample re-publication
- D1-D3 cross-platform (Windows/Proton/WSL) + diff tool
- E1-E3 hygiene (verbose output, JSON/CSV parity, troubleshooting cookbook)

Each task carries verbatim user directives from docs/TODO.md #5/#5b
(LAW #0) and verbatim step text from DataExtract/ITEMS/STATUS.md
§Path forward to attempt 4. Suggested first pickup: A1 (cheapest
gate; surfaces scaffold cold-start friction without writing code).

Verbatim dry-run stdout captured in DECOMPOSED.md A1 Resolved block.
All four self-checks green: Steam install auto-detected (53 paks,
42.4 GB), aesdumpster + Win64 binary + repak_cli 0.2.3 all resolved,
all 8 categories selected, per-row parser status correctly points to
STATUS.md §Path forward to attempt 4. Zero drift from
tools/dune-extract/README.md §3 quick-start. No follow-up tasks filed.
Next pickup per cascade: A2 (real run end-to-end + 25,618-stem parity).

A2 — real extraction via `python -m dune_extract` defaults: 131,654
file entries across 37 paks, 16 skipped, 0 failed, 25,618 stems
categorized in 2.964s wall. Per-category totals match attempt-3 to
the digit across all 8 categories.

A3 — Python-based per-category stem-set diff: EQUAL on all 8
categories (Weapons 4303, Vehicles 8590, Garments 5205, Augmentations
194, Customization 499, Construction 3763, Misc 883, Utility 2134),
zero symmetric difference. Acceptable formatting drift (blockquote
prefix on totals line, 3-col vs 2-col DataTable inventory tables);
no content drift.

Phase A complete. No regression tasks filed. Cascade ready for
Phase B (attempt-4 row-decode pipeline: EncodedPakEntries decode →
AES+Oodle UAsset decompress → UAsset binary parse).
Sponge merged commit 9260e53370 into develop 2026-05-26 18:00:53 +00:00
Sponge deleted branch feature/decompose-pak-extraction 2026-05-26 18:00:53 +00:00
Reference
Sponge/Dune-Awakening-Server-Tools!5