Client-pak item extraction: Path C — tools/dune-extract/ + 25,618-stem sample catalog #4
Summary
Implements docs/TODO.md #5 — the client-pak item ID enumeration follow-on to FINALIZED #4 — via the Path C strategic pivot: ship the tooling, not pre-extracted rich data.
The tool: tools/dune-extract/ (17 files / +3,122 lines)

End-user shippable Python package. MIT-licensed (matches ubuntu-scripts/LICENSE posture). Operates entirely on the user's own legally-installed copy of Dune Awakening; never uploads or shares extracted content.

Features:
- aes_acquire.KeyHandle: key lives only in process memory, never persisted to disk, never echoed to terminal, never logged. aesdumpster-rs aes.txt side-effect scrubbed on exit. bytearray + zero-on-cleanup.
- Excluded by default: lore fields (FlavorText/LoreDescription/BackgroundStory/ etc.), Funcom-internal dev commentary (Comments/Notes/Author). Description-length + Dune-proper-noun + quoted-dialogue + em-dash-prose heuristics catch narrative prose even when it lives in a normally-allowed field. Dev items themselves are included; their dev-only commentary is not.
- Opt-in flags (--include-asset-paths / --include-lore / --include-dev-commentary) for local-only deep dives; README warns against publishing results from these flags.
- --dry-run mode validates Steam install + AES key acquisition + pak readability without extracting; users verify before committing to a full run.
- install-prereqs.sh does user-level setup (cargo + aesdumpster-rs + repak + Python venv); no sudo.
- Attribution chain in tools/dune-extract/dune_extract/third_party/README.md: AESDumpster (GHFear) → aesdumpster-rs (yuhkix) → repak (trumank).
- Per-row parser (uasset_parser.py) stubbed with NotImplementedError pointing at STATUS.md §Path forward to attempt 4. Row decode is the open workstream: 3 RE steps documented (EncodedPakEntries decode → AES+Oodle UAsset decompress → UAsset binary parse), estimated 1-3 days agent time. Runs locally per user via the tool when complete; never published by us.

The sample catalog: DataExtract/ITEMS/

Pre-baked from the same code path the tool uses, generated against game build 1968181 on 2026-05-25/26. Quick-start for adopters who can't run the tool yet.

- ITEMS-<Category>-IDs.md (8 files): 25,618 categorized item-bearing asset stems by tier + dev/test flag (T0-T6 + Unknown + Dev/Test/Cut subsection)
- ITEMS-<Category>-Paths.md (8 files): path tree + per-stem pak+directory provenance + 1,725 DataTable files inventoried per category
- ITEMS-INDEX.md: master index with counts
- STATUS.md: three-attempt history + UE5.4 FDI quirks documented + attempt-4 RE roadmap

Per-category stem counts (final state, attempt 3):
IP posture
Sits on the well-precedented fact-vs-expression doctrine (US copyright). What ships: mechanical/functional facts (names, stats, tier, schematic prereqs, dev-flag values) — same class as every major game wiki publishes routinely. What doesn't ship: narrative expression (lore prose), visual / audio assets (we never had these), or asset paths that would serve as a roadmap into binary assets we don't ship. AES black-box operates under DMCA Section 1201(f) interoperability exception + Section 117 essential-step doctrine + Sega v. Accolade (1992) precedent — same legal class as 25+ years of UE-modding tooling.
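The ships / doesn't-ship split can be pictured as a per-field policy check. The following is a minimal sketch, not the tool's actual FieldFilter: the function name `allow_field`, the `FilterPolicy` dataclass, the 160-character threshold, the proper-noun list, and the dialogue regex are all assumptions made for illustration. Only the field names and the three opt-in switches come from this PR.

```python
import re
from dataclasses import dataclass

# Field names taken from the PR description; the grouping is illustrative.
LORE_FIELDS = {"FlavorText", "LoreDescription", "BackgroundStory", "JournalEntry", "Quote"}
DEV_COMMENTARY_FIELDS = {"Comments", "Notes", "Author", "EditNotes", "DevNotes"}
ASSET_FIELDS = {"Mesh", "Icon", "Sound", "Texture", "Material"}

# ASSUMED values, chosen only to make the sketch runnable.
MAX_DESCRIPTION_CHARS = 160
PROPER_NOUNS = ("Arrakis", "Atreides", "Harkonnen", "Fremen")
QUOTED_DIALOGUE = re.compile(r'["\u201c][^"\u201d]{20,}["\u201d]')


@dataclass(frozen=True)
class FilterPolicy:
    """Mirrors the three opt-in CLI flags; everything defaults to excluded."""
    include_asset_paths: bool = False
    include_lore: bool = False
    include_dev_commentary: bool = False


def looks_like_lore(text: str) -> bool:
    """Heuristics: long text, Dune proper nouns, quoted dialogue, em-dash prose."""
    return (
        len(text) > MAX_DESCRIPTION_CHARS
        or any(noun in text for noun in PROPER_NOUNS)
        or QUOTED_DIALOGUE.search(text) is not None
        or "\u2014" in text
    )


def allow_field(name: str, value: object, policy: FilterPolicy = FilterPolicy()) -> bool:
    """Return True if the field may appear in a published catalog."""
    if name in LORE_FIELDS:
        return policy.include_lore
    if name in DEV_COMMENTARY_FIELDS:
        return policy.include_dev_commentary
    is_asset_ref = isinstance(value, str) and value.startswith("/Game/")
    if name in ASSET_FIELDS or is_asset_ref:
        return policy.include_asset_paths
    # ASSUMPTION: the prose heuristics are applied to Description only.
    if name == "Description" and isinstance(value, str) and looks_like_lore(value):
        return policy.include_lore
    return True  # mechanical facts: stats, tier, dev-flag values, ...
```

Under these assumptions, `allow_field("Damage", 42)` passes while `allow_field("FlavorText", "...")` is rejected unless the policy is built with `include_lore=True`.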
Files changed
12 commits across three sub-streams:
tools/dune-extract/)

Test plan

Verified by agents (no live game runs per project LAW):
- python3 -m py_compile clean across all 11 Python modules in dune_extract/
- FieldFilter.allow() passed 12 policy cases covering mechanical include + lore exclude + dev-commentary exclude + asset-path exclude + Description-length + proper-noun + quoted-dialogue + em-dash heuristic + opt-in flag flips
- enrichment.categorize() / get_tier() / is_dev() smoke-tested
- uasset_parser.parse_row() raises NotImplementedError with attempt-4 reference
- python -m dune_extract --version / --help clean
- python -m dune_extract --dry-run against live install reports 53 paks / 42.4 GB and all 3 external tools found
- bash -n install-prereqs.sh clean
- Co-Authored-By: Claude anywhere
- .claude/.env gitignored, aes.txt scrubbed in aes_acquire.py)

End-user validation TODO (post-merge, on a clean machine):
- tools/dune-extract/install-prereqs.sh on fresh Ubuntu: confirm Rust + aesdumpster + repak + Python venv land cleanly without sudo
- python -m dune_extract --dry-run reports Steam install + key acquisition + pak readability all OK
- python -m dune_extract produces a catalog matching the in-repo sample (stems + paths + DataTable inventory)

AES key acquired inline via aesdumpster-rs and wired into .claude/.env (gitignored). Second extraction agent hit a Funcom-specific UE5.4 pak-index variant that breaks repak / CUE4Parse / umodel identically ('Invalid FString length 4194304' at PathHashIndex parse). Workaround: mmap + regex on the unencrypted FullDirectoryIndex region; 9,030 asset stems enumerated across 8 categories + 53 dev/test/cut flagged. Per-row DataTable contents (names, descriptions, stats, schematic prereqs, dev-flag values) still pending. Four unblock paths documented.

Lift attempt-3's .extract/parse_paks.py + .extract/build_enrichment.py into a packaged, CLI-driven, end-user-shippable tool. Algorithmic logic preserved byte-for-byte (UE5.4 FDI parsing + Funcom variant quirks + category routing + dev/test heuristics); shape refactored for argparse CLI, public package API, and per-format catalog writers.
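The FullDirectoryIndex scan that this commit lifts (mmap + regex over the unencrypted index region) can be sketched roughly as below. This is an illustration under stated assumptions, not the contents of fdi_parser.py: the function name `scan_stems`, the byte pattern, and the idea of passing the region bounds as `start` / `end` offsets are hypothetical, and none of the five Funcom quirks are modelled.

```python
import mmap
import re
from pathlib import Path
from typing import Optional

# ASSUMED pattern: an ASCII asset file name followed by the string's NUL
# terminator. The real parser may anchor on different markers.
STEM_PATTERN = re.compile(rb"([A-Za-z0-9_\-]{3,128})\.uasset\x00")


def scan_stems(pak_path: Path, start: int = 0, end: Optional[int] = None) -> set[str]:
    """Collect asset stems from a byte range of a pak file without parsing it.

    start/end are hypothetical offsets bounding the unencrypted index region;
    by default the whole file is scanned. The file must be non-empty, since
    zero-length files cannot be memory-mapped.
    """
    stems: set[str] = set()
    with open(pak_path, "rb") as fh:
        # Read-only mapping: the OS pages data in on demand, so multi-GB
        # paks are never loaded into memory in one piece.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            stop = len(view) if end is None else end
            for match in STEM_PATTERN.finditer(view, start, stop):
                stems.add(match.group(1).decode("ascii"))
    return stems
```

Scanning raw bytes sidesteps the structured index parse that failed with 'Invalid FString length', at the cost of recovering names only, with no offsets or sizes.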
Layout:

  tools/dune-extract/
    LICENSE                  MIT, scoped to the tool
    README.md                End-user docs + IP-posture explainer
    install-prereqs.sh       Rust + aesdumpster-rs + repak + Python venv (user-level)
    pyproject.toml           PEP 621 metadata, console-script entry
    examples/usage.md        Invocation examples (dry-run, JSON, CSV, power-user flags)
    dune_extract/
      __init__.py            Public API exports
      __main__.py            argparse CLI + dry-run report
      steam_locator.py       Auto-detect Linux Steam install (3 canonical roots)
      aes_acquire.py         Black-box AES wrapper (in-memory KeyHandle, zero-on-exit, aesdumpster stdout capture, aes.txt side-effect scrub)
      fdi_parser.py          Lifted FDI parser; 5 Funcom quirks preserved
      enrichment.py          Lifted category routing + tier extraction + dev heuristic
      field_filter.py        Configurable include/exclude policy + Description-length / Dune-proper-noun lore heuristic
      pak_extract.py         repak wrapper (deferred; v0.1 catalog gen doesn't need it)
      catalog_writer.py      Markdown + JSON + CSV writers
      uasset_parser.py       Stub raising NotImplementedError with attempt-4 reference
      third_party/README.md  Attribution chain (AESDumpster -> aesdumpster-rs -> ours)

Field-filter defaults per user directive 2026-05-26 turn 3:

  INCLUDE: item ID, display name, short mechanical description, mechanical stats (Damage/RPM/Range/Weight/Capacity/etc.), tier, schematic prereqs, dev-flag VALUES (bIsDevOnly/bIsHidden/bIsTestItem/bIsDeprecated)

  EXCLUDE BY DEFAULT (opt-in via flag for local power-user runs):
    --include-asset-paths      Mesh/Icon/Sound/Texture/Material + /Game/ refs
    --include-lore             FlavorText/BackgroundStory/JournalEntry/Quote/...
    --include-dev-commentary   Comments/Notes/Author/EditNotes/DevNotes/...

  Dev ITEMS themselves: always included. Only the dev *commentary* fields on them get stripped.

AES black-box flow:
  1. aesdumpster (yuhkix-rs port) statically scans Win64 shipping binary
  2. stdout captured to in-process buffer, key parsed via regex
  3. aes.txt side-effect file overwritten + unlinked in temp dir
  4. KeyHandle stores key as bytearray, zeros on context-manager exit
  5. Key never touches disk, never logged, never argv-leaked except where repak forces --aes-key <HEX> (documented limitation, mitigations listed)

CLI:
  dune-extract [--client-path DIR] [--output-dir DIR] [--categories LIST]
               [--format markdown|json|csv]
               [--include-asset-paths] [--include-lore] [--include-dev-commentary]
               [--dry-run] [--verbose] [--version]

Dry-run verified against the live install: detects 53 paks / 42.4 GB, finds aesdumpster + repak + Win64 binary, reports per-row parser as in-development with attempt-4 reference. Real-run path is wired but out of scope per agent constraints.

UAsset row parsing (attempt 4) is the next surface: three steps remain documented in DataExtract/ITEMS/STATUS.md and surfaced via uasset_parser.status() for tooling integration.
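
Steps 2 to 4 of the AES black-box flow can be sketched as follows. This is a hedged illustration rather than the code in aes_acquire.py: the 0x-prefixed 64-hex-digit output format, the method names `from_tool_output`, `hex`, and `clear`, and the `scrub_file` helper are assumptions. Python strings are immutable, so the regex match and any `hex()` result are copies that cannot be zeroed; the sketch only wipes the bytearray it owns.

```python
import os
import re
from pathlib import Path

# ASSUMED output format: a 256-bit key printed as 0x followed by 64 hex digits.
HEX_KEY = re.compile(r"0x([0-9A-Fa-f]{64})")


class KeyHandle:
    """Holds an AES key in a mutable buffer and zeroes it on exit."""

    def __init__(self, key_hex: str) -> None:
        self._key = bytearray.fromhex(key_hex)
        self._cleared = False

    @classmethod
    def from_tool_output(cls, stdout: str) -> "KeyHandle":
        """Parse the key out of captured stdout (step 2 of the flow)."""
        match = HEX_KEY.search(stdout)
        if match is None:
            raise ValueError("no 256-bit key found in captured output")
        return cls(match.group(1))

    def hex(self) -> str:
        """Return the key as hex, e.g. for a tool that only accepts argv."""
        if self._cleared:
            raise RuntimeError("key already zeroed")
        return self._key.hex()

    def clear(self) -> None:
        """Overwrite the buffer in place (step 4 of the flow)."""
        for i in range(len(self._key)):
            self._key[i] = 0
        self._cleared = True

    def __enter__(self) -> "KeyHandle":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.clear()

    def __repr__(self) -> str:
        # Keeps the key out of logs and tracebacks.
        return "<KeyHandle redacted>"


def scrub_file(path: Path) -> None:
    """Overwrite a side-effect file with zeros, then unlink it (step 3)."""
    if not path.exists():
        return
    size = path.stat().st_size
    with open(path, "r+b") as fh:
        fh.write(bytes(size))
        fh.flush()
        os.fsync(fh.fileno())
    path.unlink()
```

Used as `with KeyHandle.from_tool_output(captured) as key: ...`, the buffer is zeroed even if the body raises. Overwriting before unlinking is best-effort only, since journaling or copy-on-write filesystems may keep older blocks.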