Client-pak item extraction: Path C — tools/dune-extract/ + 25,618-stem sample catalog #4

Merged
Sponge merged 12 commits from feature/client-pak-extraction into develop 2026-05-26 16:03:16 +00:00

Summary

Implements docs/TODO.md #5 — the client-pak item ID enumeration follow-on to FINALIZED #4 — via the Path C strategic pivot: ship the tooling, not pre-extracted rich data.

The tool — tools/dune-extract/ (17 files / +3,122 lines)

End-user shippable Python package. MIT-licensed (matches ubuntu-scripts/LICENSE posture). Operates entirely on the user's own legally-installed copy of Dune Awakening; never uploads or shares extracted content.

Features:

  • AES black-box via aes_acquire.KeyHandle — key lives only in process memory, never persisted to disk, never echoed to terminal, never logged. aesdumpster-rs aes.txt side-effect scrubbed on exit. bytearray + zero-on-cleanup.
  • Field filter whose defaults follow the user directive exactly: includes mechanical stats / item names / short functional descriptions / tier / schematic prereqs / dev-flag values; excludes asset paths, lore (FlavorText / LoreDescription / BackgroundStory / etc.), and Funcom-internal dev commentary (Comments / Notes / Author). Description-length, Dune-proper-noun, quoted-dialogue, and em-dash-prose heuristics catch narrative prose even when it lives in a normally allowed field. Dev items themselves are included; their dev-only commentary is not.
  • Power-user opt-in flags (--include-asset-paths / --include-lore / --include-dev-commentary) for local-only deep dives; README warns against publishing results from these flags.
  • Steam install auto-detect across the 3 canonical Linux locations.
  • --dry-run mode validates Steam install + AES key acquisition + pak readability without extracting; users verify before committing to a full run.
  • install-prereqs.sh does user-level setup (cargo + aesdumpster-rs + repak + Python venv); no sudo.
  • Attribution chain documented per MIT terms in tools/dune-extract/dune_extract/third_party/README.md: AESDumpster (GHFear) → aesdumpster-rs (yuhkix) → repak (trumank).
  • Per-row DataTable parsing (uasset_parser.py) stubbed with NotImplementedError pointing at STATUS.md §Path forward to attempt 4. Row decode is the open workstream — 3 RE steps documented (EncodedPakEntries decode → AES+Oodle UAsset decompress → UAsset binary parse), estimated 1-3 days agent time. Runs locally per user via the tool when complete; never published by us.
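
The Steam install auto-detect could look roughly like the sketch below. It is not the `steam_locator.py` API: `find_paks_dir` and `CANDIDATE_ROOTS` are hypothetical names, and only the first root is confirmed by this PR's commit messages. The other two are assumed common Linux layouts.

```python
from pathlib import Path
from typing import Optional

# Candidate Steam roots, relative to the user's home directory.
# Only the first one appears in this PR; the other two are assumptions.
CANDIDATE_ROOTS = (
    ".local/share/Steam",
    ".steam/steam",
    ".var/app/com.valvesoftware.Steam/.local/share/Steam",
)

# Path from a Steam root down to the pak directory (as quoted in the PR).
PAKS_SUBPATH = Path("steamapps/common/DuneAwakening/DuneSandbox/Content/Paks")


def find_paks_dir(home: Path) -> Optional[Path]:
    """Return the first existing Paks directory under the candidate roots."""
    for root in CANDIDATE_ROOTS:
        candidate = home / root / PAKS_SUBPATH
        if candidate.is_dir():
            return candidate
    return None
```

A caller would pass `Path.home()`. Installs on a secondary Steam library folder would not be found by this sketch and would need an explicit path.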

The sample catalog — DataExtract/ITEMS/

Pre-baked from the same code path the tool uses, generated against game build 1968181 on 2026-05-25/26. Quick-start for adopters who can't run the tool yet.

  • ITEMS-<Category>-IDs.md (8 files): 25,618 categorized item-bearing asset stems, grouped by tier and dev/test flag (T0-T6 + Unknown + Dev/Test/Cut subsection)
  • ITEMS-<Category>-Paths.md (8 files): path tree + per-stem pak+directory provenance + 1,725 DataTable files inventoried across the categories
  • ITEMS-INDEX.md: master index with counts
  • STATUS.md: three-attempt history + UE5.4 FDI quirks documented + attempt-4 RE roadmap

Per-category stem counts (final state, attempt 3):

| Category | Stems |
|----------|------:|
| Weapons | 4,304 |
| Vehicles | 8,591 |
| Garments | 5,205 |
| Augmentations | 194 |
| Customization | 499 |
| Construction | 3,763 |
| Misc | 883 |
| Utility | 2,134 |
| Total | 25,618 |
| of which dev/test/cut | 464 |

IP posture

Sits on the well-precedented fact-vs-expression doctrine (US copyright). What ships: mechanical/functional facts (names, stats, tier, schematic prereqs, dev-flag values) — same class as every major game wiki publishes routinely. What doesn't ship: narrative expression (lore prose), visual / audio assets (we never had these), or asset paths that would serve as a roadmap into binary assets we don't ship. AES black-box operates under DMCA Section 1201(f) interoperability exception + Section 117 essential-step doctrine + Sega v. Accolade (1992) precedent — same legal class as 25+ years of UE-modding tooling.

Files changed

12 commits across three sub-streams:

  • TODO #5 add + 3 status updates (attempt-1 AES-blocked, attempt-2 stems-via-regex, attempt-3 FDI-parser-cracked + strategic pivot)
  • 3 worktree merges (attempt 1 STATUS, attempt 2 + 3 catalog work)
  • 1 tool-scaffold merge (tools/dune-extract/)

Test plan

Verified by agents (no live game runs per project LAW):

  • python3 -m py_compile clean across all 11 Python modules in dune_extract/
  • FieldFilter.allow() passed 12 policy cases covering mechanical include + lore exclude + dev-commentary exclude + asset-path exclude + Description-length + proper-noun + quoted-dialogue + em-dash heuristic + opt-in flag flips
  • enrichment.categorize() / get_tier() / is_dev() smoke-tested
  • uasset_parser.parse_row() raises NotImplementedError with attempt-4 reference
  • python -m dune_extract --version / --help clean
  • python -m dune_extract --dry-run against live install reports 53 paks / 42.4 GB and all 3 external tools found
  • bash -n install-prereqs.sh clean
  • No Co-Authored-By: Claude anywhere
  • AES key never in any tracked file (.claude/.env gitignored, aes.txt scrubbed in aes_acquire.py)

End-user validation TODO (post-merge, on a clean machine):

  • Clone repo + run tools/dune-extract/install-prereqs.sh on fresh Ubuntu — confirm Rust + aesdumpster + repak + Python venv land cleanly without sudo
  • python -m dune_extract --dry-run reports Steam install + key acquisition + pak readability all OK
  • python -m dune_extract produces a catalog matching the in-repo sample (stems + paths + DataTable inventory)
  • Spot-check 3-4 entries per category against the sample
  • Verify field filter strips correctly on a real extraction (no lore, no asset paths, no dev commentary in default-flag output)
Captures user directives on client location + tool autonomy verbatim.
Follow-on to FINALIZED #4 schema catalog — extract per-item template IDs
from cooked client paks at /home/sponge/.local/share/Steam/steamapps/
common/DuneAwakening/DuneSandbox/Content/Paks/.
Follow-on pass to FINALIZED #4 confirmed all 53 .pak files in the
player-side install are fully AES-encrypted (every block + index +
footer); zero plaintext magic-byte hits across ~40 GB scanned with
repak_cli v0.2.3. Per user directive (deferral on AES-key acquisition),
extraction stops here.

DataExtract/ITEMS/STATUS.md captures the full probe log (repak info on
every priority pak, magic-byte scan across all 53, IoStore absence
confirmed), the structural blocker analysis, the tool inventory left on
the box (repak at ~/.cargo/bin/repak), and four AES-key acquisition
options for the next attempt (in-memory dump, static .rdata scan,
community key DBs, official Funcom comms).

ITEMS-INDEX.md gets one new file-list entry + a short status paragraph
pointing at STATUS.md. All existing schema catalog files are untouched
per the never-delete-info LAW.
Agent confirmed all 53 game-data paks at DuneSandbox/Content/Paks/ are
AES-encrypted (every block, not just index). Zero IDs enumerated. Probe
log + AES-key acquisition options + resumption recipe committed to
DataExtract/ITEMS/STATUS.md. INDEX additively updated to point at STATUS.

Schema catalog from FINALIZED #4 remains the only authoritative source
at this layer until AES key acquired.
User directive: extraction scope is ALL item IDs / names / data including
dev items, test items, cut content — verbatim quote landed in TODO per
LAW #0. First extraction attempt (agent ade66d405bbee5d7b, merge f4848a6)
hit AES encryption on every pak; task stays [~] in_progress pending AES
key acquisition. STATUS.md in DataExtract/ITEMS/ holds the probe log +
resumption recipe.
Attempt 1 (no AES key): blocked at file listing. Attempt 2 (AES key
recovered via aesdumpster-rs against the Win64 shipping binary): the
key is right and the pak footer parses cleanly, but Funcom ships a
non-standard UE5.4 pak-index variant that breaks every Linux tool
tested — repak_cli v0.2.3, CUE4Parse v1.2.2 built from source against
master, and Gildor's umodel (UE 1-4 only).

What IS recoverable without a working pak parser: the FullDirectoryIndex
region of each pak ships unencrypted, allowing every .uasset / .uexp
filename to be enumerated via mmap + binary regex scan. The new
ITEMS-<Category>-IDs.md companion files (Pattern A from docs/TODO #5)
ship the per-category enumeration:

  Weapons:       1,190 asset stems (T0-T6 + Tier Unknown + Dev/Test)
  Vehicles:        443 asset stems
  Garments:      2,446 asset stems
  Augmentations:   113 asset stems (all currently T6)
  Customization:   346 asset stems
  Construction:  3,576 asset stems
  Misc:            724 asset stems
  Utility:         149 asset stems
  Total:         9,030 unique item-bearing asset stems
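
The mmap + binary regex scan described above can be sketched as follows. This is an illustration, not the gitignored script that produced the catalog: the function names and the filename character class are assumptions.

```python
import mmap
import re
from pathlib import Path
from typing import Set

# Assumed filename shape: a run of path-like ASCII characters ending in
# .uasset or .uexp. The real scan may use a different pattern.
ASSET_NAME = re.compile(rb"[A-Za-z0-9_\-./]+\.(?:uasset|uexp)")


def scan_asset_names(pak_path: Path) -> Set[str]:
    """Collect every plaintext .uasset/.uexp filename found in one file."""
    if pak_path.stat().st_size == 0:
        return set()  # mmap cannot map an empty file
    with open(pak_path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The re module accepts any bytes-like buffer, including mmap.
            return {m.group(0).decode("ascii") for m in ASSET_NAME.finditer(mm)}


def stems(names: Set[str]) -> Set[str]:
    """Reduce filenames to unique stems (directory and extension dropped)."""
    return {Path(name).stem for name in names}
```

Because the file is memory-mapped read-only, the scan never loads a whole multi-gigabyte pak into Python memory and never writes to the install.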

What is still blocked: per-row DataTable contents — display names,
descriptions, exact stat values, asset references, schematic
prereqs, dev/test flags. These live inside AES-encrypted .uasset
data blocks; their decryption is blocked by the same pak-format
mismatch that breaks the index parser.

STATUS.md updated with the full Attempt 2 probe log, the format-mismatch
analysis (CUE4Parse fails on 'Invalid FString length 4194304' at the
mountpoint-FString read), and three path-forward options: wait for
community FModel patch, reverse-engineer Funcom's index variant, or
in-game memory hook.

INDEX.md updated with the new dual-layer structure (schema layer from
FINALIZED #4 + enumeration layer from this pass) and current-status
section.

No raw pak content, no AES key text, no .extract/ workspace commits.
Stream B posture preserved — summarized intelligence only, derived
from .extract/catalog/per-category.json (gitignored).
AES key recovered + applied, but Funcom ships a non-standard UE5.4 pak-index
variant that repak / CUE4Parse / umodel cannot parse (Invalid FString length
'4194304' at the PathHashIndex). Footer parses cleanly (V11 magic, EncryptionFlag=0,
all-zero EncryptionGuid) — divergence is in the index struct itself.

Workaround: mmap + binary regex on the FullDirectoryIndex region (unencrypted
plaintext in every pak). Across 35 non-media client paks (~13.4 GB scanned):
9,030 unique item-bearing asset stems across all 8 categories with tier
breakdown + 53 dev/test/cut flagged. 8 new ITEMS-<category>-IDs.md companion
files plus updated INDEX and STATUS. Per-row DataTable contents still blocked
on a working custom UE5.4 pak-index parser.
AES key acquired inline via aesdumpster-rs and wired into .claude/.env
(gitignored). Second extraction agent hit a Funcom-specific UE5.4
pak-index variant that breaks repak / CUE4Parse / umodel identically
('Invalid FString length 4194304' at PathHashIndex parse). Workaround:
mmap + regex on the unencrypted FullDirectoryIndex region — 9,030
asset stems enumerated across 8 categories + 53 dev/test/cut flagged.
Per-row DataTable contents (names, descriptions, stats, schematic
prereqs, dev-flag values) still pending. Four unblock paths documented.
Reverse-engineered Funcom's UE5.4 pak-index variant via hand-written Python parser
(`.extract/parse_paks.py`, gitignored — derivative output only ships here). The
structure-level discovery: Funcom ships standard UE5.4 FullDirectoryIndex byte
layout, but with non-standard quirks at the footer + primary-index layer that fool
stock UE-modding tools (repak_cli, CUE4Parse, umodel, ZenTools). The parser handles:

- Two footer-size variants (221 with GUID prefix vs 204 without; detected via
  `size - magic_position`)
- PathHashIndex blob in primary-index region precedes the FDI (reverse of the
  published UE5 spec — fools every stock tool with the now-infamous
  `Invalid FString length '4194304'` error)
- Bogus IndexOffset values in some tiny paks (FDI lives BEFORE the declared offset)
- Multi-block FDI structure in DuneSandbox.pak (two separate FDIs chained,
  totaling 16,463 files)
- Non-4-byte-aligned FDI positions in some smaller paks (Controller.pak, etc.)
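
As an illustration of the first quirk, the `size - magic_position` measurement might look like the sketch below. The magic constant is the standard UE pak magic; the tail window, the function names, and the way the two documented sizes are mapped to labels are assumptions, since the real parser is not part of this PR.

```python
import struct
from typing import Optional

# Standard UE pak magic, stored little-endian on disk.
PAK_MAGIC = struct.pack("<I", 0x5A6F12E1)

# The two footer sizes named in this commit message.
KNOWN_FOOTER_SIZES = {221: "with GUID prefix", 204: "without GUID prefix"}


def footer_distance(data: bytes, tail_window: int = 512) -> Optional[int]:
    """Return size - magic_position for the last magic in the tail, or None."""
    start = max(0, len(data) - tail_window)  # assumed search window
    pos = data.rfind(PAK_MAGIC, start)
    if pos < 0:
        return None
    return len(data) - pos


def classify_footer(data: bytes) -> str:
    """Map the measured distance onto one of the documented variants."""
    distance = footer_distance(data)
    if distance is None:
        return "unknown"
    return KNOWN_FOOTER_SIZES.get(distance, "unknown")
```

A real implementation would read only the last few hundred bytes of each pak rather than the whole file.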

Result: ALL 37 non-media client paks parsed cleanly, zero failures, ~2.5 seconds
total parse time.

ZenTools investigation (the original attempt-3 hypothesis) was a structural dead end:
ZenTools extracts cooked packages from IoStore container files (.ucas/.utoc) and
Dune Awakening ships pure legacy .pak with zero IoStore files. Documented in
STATUS.md for posterity.

Net deliverables — eight new ITEMS-<Category>-Paths.md companion-to-companion
enrichment files alongside the existing schema-layer (ITEMS-<Category>.md) and
attempt-2 IDs-layer (ITEMS-<Category>-IDs.md) files:

  Weapons:        4,304 stems  (was 1,190 in attempt 2)
  Vehicles:       8,591 stems  (was 443)
  Garments:       5,205 stems  (was 2,446)
  Augmentations:    194 stems  (was 113)
  Customization:    499 stems  (was 346)
  Construction:   3,763 stems  (was 3,576)
  Misc:             883 stems  (was 724)
  Utility:        2,134 stems  (was 149)
  TOTAL:         25,618 stems  (was 9,030)

Each file ships: full source-pak + directory provenance per stem, the DataTable
files backing the category, production stems grouped by tier T0-T6 + Tier Unknown,
and a heuristic-flagged Dev/Test/Archive/Prototype section. Per LAW: no info
deleted — the attempt-2 ITEMS-<Category>-IDs.md files remain as historical record.
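
The tier grouping and dev/test flagging might be approximated as below. The naming conventions here (a `_T3_`-style tier token, marker substrings) are hypothetical; the actual enrichment rules live in the gitignored scripts and are not shown in this PR.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical conventions: a T0..T6 token delimited by underscores, and
# marker substrings that flag dev/test/archive/prototype content.
TIER_RE = re.compile(r"(?:^|_)T([0-6])(?:_|$)")
DEV_MARKERS = ("dev", "test", "archive", "prototype")


def sketch_tier(stem: str) -> str:
    """Return 'T0'..'T6' when a tier token is present, else 'Unknown'."""
    match = TIER_RE.search(stem)
    return f"T{match.group(1)}" if match else "Unknown"


def sketch_is_dev(stem: str) -> bool:
    """Heuristic flag; substring matching will produce some false positives."""
    lowered = stem.lower()
    return any(marker in lowered for marker in DEV_MARKERS)


def group_stems(stems: Iterable[str]) -> Dict[str, List[str]]:
    """Group stems by tier, routing flagged stems to a Dev/Test bucket."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for stem in sorted(stems):
        key = "Dev/Test" if sketch_is_dev(stem) else sketch_tier(stem)
        groups[key].append(stem)
    return dict(groups)
```

Because this is filename-only heuristics, the authoritative flags still require the per-row DataTable decode described below.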

Still blocked: per-row DataTable contents (display names, descriptions, exact
stats, schematic prereqs, authoritative bIsDevOnly/bIsHidden flags). Requires
decoding Funcom's EncodedPakEntries blob (variable-length per-entry encoding)
plus reading + AES-decrypting + Oodle-decompressing each .uasset data block plus
parsing the UAsset binary format. STATUS.md §Path forward documents the gap.

The AES key in `.claude/.env` is NOT needed for FDI parsing (the FDI region itself
is unencrypted in Funcom's format). The parser script and pak-file-map.json
intermediate live in `.extract/` (gitignored per workspace contract).
ZenTools turned out to be a structural mismatch (it extracts cooked packages
from IoStore .ucas/.utoc containers; Dune ships pure legacy pak with zero
IoStore files). But the agent pivoted: hand-wrote a Python parser for
Funcom's UE5.4 FullDirectoryIndex variant. ~2.5 seconds across 37 non-media
paks, zero failures.

Funcom's 5 layout quirks vs stock UE5.4 FDI:
  - Two footer-size variants
  - PathHashIndex before FDI (not MountPoint form)
  - Bogus IndexOffset in some tiny paks
  - Multi-block FDIs in DuneSandbox.pak
  - Non-4-byte-aligned positions in some smaller paks

Stem coverage went from 9,030 → 25,618 (2.8x). Biggest gains: Vehicles
19.4x, Utility 14.3x, Weapons 3.6x, Garments 2.1x. Plus 1,725 DataTable
files inventoried across categories, 464 dev/test/cut heuristic-flagged
stems, full pak+directory provenance per stem, tier-grouped T0-T6.
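
The growth factors quoted above can be reproduced from the two per-category tables in these commit messages. The sketch below uses only those reported numbers; the names are illustrative.

```python
from typing import Dict

# Per-category stem counts as reported for attempt 2 and attempt 3.
ATTEMPT_2 = {
    "Weapons": 1190, "Vehicles": 443, "Garments": 2446, "Augmentations": 113,
    "Customization": 346, "Construction": 3576, "Misc": 724, "Utility": 149,
}
ATTEMPT_3 = {
    "Weapons": 4304, "Vehicles": 8591, "Garments": 5205, "Augmentations": 194,
    "Customization": 499, "Construction": 3763, "Misc": 883, "Utility": 2134,
}

# Totals as reported in the commit messages (not recomputed here).
REPORTED_TOTAL_2 = 9030
REPORTED_TOTAL_3 = 25618


def growth(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Return the per-category growth factor, rounded to one decimal."""
    return {name: round(after[name] / before[name], 1) for name in before}


factors = growth(ATTEMPT_2, ATTEMPT_3)
overall = round(REPORTED_TOTAL_3 / REPORTED_TOTAL_2, 1)
```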

Per-row DataTable CONTENTS (names, descriptions, exact stats, schematic
prereqs, authoritative bIsDevOnly flags) still pending — three more RE
steps documented in STATUS.md §Path forward to attempt 4
(EncodedPakEntries decode → AES-decrypt + Oodle-decompress UAssets →
UAsset binary parse). 1-3 days estimated.
Captures attempt-3 outcome (Funcom UE5.4 FDI parser cracked, 25,618
stems, parser+enrichment scripts survived in .extract/) plus the
verbatim user directives establishing the strategic pivot:

- ship tooling instead of pre-extracted rich data
- wiki-precedent / facts-vs-expression legal posture
- AES black-box approach (key never user-facing)
- refined field filter: no asset paths, no lore, no Funcom-internal
  dev commentary; dev items themselves are included

Plan landed as Path C: lift .extract/parse_paks.py +
build_enrichment.py into tools/dune-extract/, wrap with CLI + AES
black-box + default field filter, ship pre-baked catalog as a sample
generated by the same tool. Per-row rich data runs locally per
user, never published by us.
Lift attempt-3's .extract/parse_paks.py + .extract/build_enrichment.py
into a packaged, CLI-driven, end-user-shippable tool. Algorithmic logic
preserved byte-for-byte (UE5.4 FDI parsing + Funcom variant quirks +
category routing + dev/test heuristics); shape refactored for argparse
CLI, public package API, and per-format catalog writers.

Layout:
  tools/dune-extract/
    LICENSE                      MIT, scoped to the tool
    README.md                    End-user docs + IP-posture explainer
    install-prereqs.sh           Rust + aesdumpster-rs + repak + Python venv (user-level)
    pyproject.toml               PEP 621 metadata, console-script entry
    examples/usage.md            Invocation examples (dry-run, JSON, CSV, power-user flags)
    dune_extract/
      __init__.py                Public API exports
      __main__.py                argparse CLI + dry-run report
      steam_locator.py           Auto-detect Linux Steam install (3 canonical roots)
      aes_acquire.py             Black-box AES wrapper (in-memory KeyHandle, zero-on-exit,
                                 aesdumpster stdout capture, aes.txt side-effect scrub)
      fdi_parser.py              Lifted FDI parser — 5 Funcom quirks preserved
      enrichment.py              Lifted category routing + tier extraction + dev heuristic
      field_filter.py            Configurable include/exclude policy + Description-length /
                                 Dune-proper-noun lore heuristic
      pak_extract.py             repak wrapper (deferred; v0.1 catalog gen doesn't need it)
      catalog_writer.py          Markdown + JSON + CSV writers
      uasset_parser.py           Stub raising NotImplementedError with attempt-4 reference
      third_party/README.md      Attribution chain (AESDumpster -> aesdumpster-rs -> ours)

Field-filter defaults per user directive 2026-05-26 turn 3:
  INCLUDE — item ID, display name, short mechanical description, mechanical
    stats (Damage/RPM/Range/Weight/Capacity/etc.), tier, schematic prereqs,
    dev-flag VALUES (bIsDevOnly/bIsHidden/bIsTestItem/bIsDeprecated)
  EXCLUDE BY DEFAULT (opt-in via flag for local power-user runs):
    --include-asset-paths     Mesh/Icon/Sound/Texture/Material + /Game/ refs
    --include-lore            FlavorText/BackgroundStory/JournalEntry/Quote/...
    --include-dev-commentary  Comments/Notes/Author/EditNotes/DevNotes/...
  Dev ITEMS themselves: always included. Only the dev *commentary* fields
  on them get stripped.
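
A minimal sketch of this policy, assuming a simple per-field check. It is not the real `FieldFilter` class: the class name, the `allow(field, value)` signature, the 200-character threshold, and the decision to apply the prose heuristic only to `Description` are all assumptions, and the proper-noun check is omitted. The field names come from the lists above.

```python
from dataclasses import dataclass

ASSET_FIELDS = {"Mesh", "Icon", "Sound", "Texture", "Material"}
LORE_FIELDS = {"FlavorText", "LoreDescription", "BackgroundStory", "JournalEntry", "Quote"}
DEV_COMMENTARY_FIELDS = {"Comments", "Notes", "Author", "EditNotes", "DevNotes"}

MAX_DESCRIPTION_CHARS = 200  # assumed threshold, not taken from the tool


@dataclass
class SketchFieldFilter:
    include_asset_paths: bool = False
    include_lore: bool = False
    include_dev_commentary: bool = False

    def allow(self, field: str, value: object) -> bool:
        """Return True when the field may appear in the output."""
        text = value if isinstance(value, str) else ""
        if field in ASSET_FIELDS or "/Game/" in text:
            return self.include_asset_paths
        if field in DEV_COMMENTARY_FIELDS:
            return self.include_dev_commentary
        if field in LORE_FIELDS or self._looks_like_lore(field, text):
            return self.include_lore
        # Everything else (stats, tier, dev-flag values, ...) is kept.
        return True

    @staticmethod
    def _looks_like_lore(field: str, text: str) -> bool:
        """Crude prose heuristic: long text, quoted dialogue, or an em dash."""
        if field != "Description":
            return False
        return len(text) > MAX_DESCRIPTION_CHARS or '"' in text or "\u2014" in text
```

Dev-flag fields such as `bIsDevOnly` fall through to the final `return True`, which mirrors the rule that dev items and their flag values stay in while their commentary is stripped.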

AES black-box flow:
  1. aesdumpster-rs (yuhkix's Rust port of AESDumpster) statically scans the Win64 shipping binary
  2. stdout captured to in-process buffer, key parsed via regex
  3. aes.txt side-effect file overwritten + unlinked in temp dir
  4. KeyHandle stores key as bytearray, zeros on context-manager exit
  5. Key never touches disk, never logged, never argv-leaked except where
     repak forces --aes-key <HEX> (documented limitation, mitigations listed)
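
Steps 2 and 4 of this flow could be sketched as below. This is not the `aes_acquire.KeyHandle` implementation: the class name, `parse_key`, and the assumed `0x` + 64 hex digit output format are hypothetical.

```python
import re
from typing import Optional

# Assumed output shape: "0x" followed by 64 hex digits (a 256-bit key).
KEY_RE = re.compile(r"0x([0-9A-Fa-f]{64})")


class SketchKeyHandle:
    """Holds key bytes in a mutable buffer and zeroes them on exit."""

    def __init__(self, key: bytes) -> None:
        self._buf = bytearray(key)

    def __enter__(self) -> "SketchKeyHandle":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.wipe()  # runs on normal exit and on exceptions

    def wipe(self) -> None:
        """Overwrite the buffer in place so the key bytes do not linger."""
        for i in range(len(self._buf)):
            self._buf[i] = 0

    @property
    def key(self) -> bytearray:
        return self._buf


def parse_key(stdout_text: str) -> Optional[SketchKeyHandle]:
    """Extract the first key-shaped token from captured tool output."""
    match = KEY_RE.search(stdout_text)
    if match is None:
        return None
    return SketchKeyHandle(bytes.fromhex(match.group(1)))
```

Zeroing in Python is best-effort: the captured `str` and the intermediate `bytes` object are immutable and cannot be wiped, so only the `bytearray` copy is actually cleared.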

CLI:
  dune-extract [--client-path DIR] [--output-dir DIR] [--categories LIST]
               [--format markdown|json|csv]
               [--include-asset-paths] [--include-lore] [--include-dev-commentary]
               [--dry-run] [--verbose] [--version]
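
The usage line above maps onto `argparse` roughly as follows. The flag names come from the usage text; the comma-separated form of `--categories`, the `markdown` default, and the version string are assumptions, and `build_parser` is a hypothetical name rather than the contents of `__main__.py`.

```python
import argparse
from typing import List


def _csv_list(raw: str) -> List[str]:
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dune-extract")
    parser.add_argument("--client-path", metavar="DIR")
    parser.add_argument("--output-dir", metavar="DIR")
    # Assumed: LIST is comma-separated category names.
    parser.add_argument("--categories", metavar="LIST", type=_csv_list)
    # Assumed default; the usage line does not state one.
    parser.add_argument("--format", choices=("markdown", "json", "csv"),
                        default="markdown")
    # Opt-in flags are off unless explicitly passed.
    for flag in ("--include-asset-paths", "--include-lore",
                 "--include-dev-commentary", "--dry-run", "--verbose"):
        parser.add_argument(flag, action="store_true")
    # Placeholder version string for the sketch only.
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.0.0-sketch")
    return parser
```

For example, `build_parser().parse_args(["--dry-run", "--format", "json"])` yields a namespace with `dry_run=True` and all three `include_*` attributes left `False`.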

Dry-run verified against the live install: detects 53 paks / 42.4 GB,
finds aesdumpster + repak + Win64 binary, reports per-row parser as
in-development with attempt-4 reference. Real-run path is wired but
out of scope per agent constraints.

UAsset row parsing (attempt 4) is the next surface — three steps remain
documented in DataExtract/ITEMS/STATUS.md and surfaced via
uasset_parser.status() for tooling integration.
End-user shippable extraction tool built on attempt-3's battle-tested FDI
parser. Algorithmic logic preserved byte-for-byte from .extract/parse_paks.py
+ build_enrichment.py; everything else refactored for packaging + CLI.

Highlights:
- AES black-box via KeyHandle (bytearray + zero-on-exit, aes.txt side-effect
  scrubbed, key never persisted to disk)
- Field filter defaults exactly to user's turn-3 directive: no asset paths,
  no lore, no Funcom-internal dev commentary; dev items themselves
  included; Description heuristic catches narrative prose via length +
  proper-noun + quoted-dialogue + em-dash-context checks
- Power-user opt-in flags (--include-asset-paths / --include-lore /
  --include-dev-commentary) for local-only use, warn against publishing
- uasset_parser.py stubbed with NotImplementedError pointing at STATUS.md
  §Path forward to attempt 4 — row-decode is the open workstream
- Steam install auto-detection across 3 canonical Linux roots
- install-prereqs.sh user-level (cargo + aesdumpster-rs + repak + Python venv,
  no sudo)
- Attribution chain documented per MIT terms (AESDumpster -> aesdumpster-rs
  -> repak)
- MIT LICENSE scoped to tools/dune-extract/ matching ubuntu-scripts/ posture

17 files, +3,122 lines. Smoke-tested: py_compile clean, FieldFilter 12 cases
validated, --dry-run against live install reports 53 paks / 42.4 GB and
external tools all found.
Sponge merged commit 757bac5a8b into develop 2026-05-26 16:03:16 +00:00
Sponge deleted branch feature/client-pak-extraction 2026-05-26 16:03:16 +00:00
Reference
Sponge/Dune-Awakening-Server-Tools!4