Phase E hygiene: iron out the quirks (8 fixes + sample regen) #12
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/dune-extract-quirk-fixes"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
User goal: iron out the quirks. Closes the 8 ironable items from the post-Phase-D test+validate sweep. Doesn't fully complete E1/E2/E3 (those need broader scope) but lifts the sharpest-edged issues.
Fixes (8 + 2 bonuses)
FieldFilterrecurses into nested struct values —--include-Xflags now actually strip nested-struct keysfield_filter.pyFGameplayTagContainer/FGameplayTag/FPrimaryAssetId/FSoftClassPath/FSoftObjectPath(eliminates_raw_for the common cases)uasset_parser.pyCompressionBlockSizedecoded asstored_value >> 8(Funcom'sactual << 8encoding)pak_extract.py_CatalogJSONEncoder— tuples→arrays, sets→sorted, bytes→hex, dataclasses→dictscatalog_writer.pyitems-rows.csvgainsvaluecolumn (raw JSON-encoded value) alongsidevalue_displaycatalog_writer.py2026-05-28T07:39:06Z)catalog_writer.py--client-pathfail-fast validation — checks for paks before proceeding__main__.pycomp_method > 6as garbage (fixes spurious entries fromST_Localization_Dialogueresync)pak_extract.pyST_Localization_Dialogue.uexpinvestigated — non-standard chunk header layout; other 10 ST_* tables still parse (27,816 key→English pairs)[ ]→[x](shipped in PR #10 but the checkbox wasn't toggled)DECOMPOSED.mdSample catalog regenerated
DataExtract/ITEMS/re-emitted with all fixes baked in:ITEMS-Loot-Tables.mdshowsRequiredTags={ tags=[] }, ForbiddenTags={ tags=[] }(was=_raw_)ITEMS-INDEX.mdheader:Generated by dune-extract v0.1.0 on 2026-05-28T07:39:06Z(UTC)Validated
Deferred (still documented as known limitations)
FGameplayTagQuery(TagDictionary/QueryTokenStream) — more involved than the simple containers, deferred.ST_Localization_Dialogue— non-standard chunk header, deferred.Test plan
User: "iron out the quirks" — closes the 8 ironable items from the post-Phase-D test+validate sweep. Doesn't fully complete E1/E2/E3 (those need broader work) but lifts the sharpest-edged issues. Fixes: 1. FieldFilter recurses into nested struct values --include-asset-paths / --include-lore / --include-dev-commentary now actually strip those buckets from inside StaticData = { ... } and similar nested tagged-structs. Pseudo-dict VALUES (FText {_string_table, _key}, SoftObject, Object, raw fallbacks) are recognised by their `_`-marker keys and never recursed into. 2. Native struct decoders for common types previously emitting _raw_ - FGameplayTagContainer -> {tags:[...]} - FGameplayTag -> {tag:...} - FPrimaryAssetId -> {type, name} - FSoftClassPath / FSoftObjectPath -> {_soft_object, _sub} Top _raw_ culprits before: TagDictionary (3312), QueryTokenStream (3312), RequiredTags (1656), ForbiddenTags (1656), ItemTemplateId (714). RequiredTags / ForbiddenTags / GameplayTag / SoftObjectPath now decode. FGameplayTagQuery internals (TagDictionary / QueryTokenStream) remain _raw_ — Phase E follow-on. 3. CompressionBlockSize decoded properly as stored_value >> 8 (Funcom encodes per-block uncompressed size as actual << 8 in the in-pak header). Previously hardcoded UE-default 64 KiB — empirically the same value but now derived from the on-disk field. Future-proofs against builds where Funcom changes block size. 4. JSON output uses _CatalogJSONEncoder instead of default=str. - tuples -> arrays (was Python-repr strings) - sets -> sorted arrays - bytes -> hex strings - dataclasses -> dicts (via asdict) - Path -> str 5. items-rows.csv gains a `value` column carrying the raw JSON-encoded value (programmatic consumers) alongside the existing display-formatted column (renamed to value_display — ✓/✗ bools, truncated strings, etc., for spreadsheet humans). 6. Catalog timestamps now UTC ISO-8601 (2026-05-28T07:39:06Z) instead of local-time strftime. Two extracts from different timezones produce stable headers that diff cleanly. 7. --client-path fail-fast validation — checks the target actually contains Dune Awakening paks via _has_paks() before proceeding. Catches typos in ~10 ms instead of failing 3 seconds later. 8. Data-area walker rejects entries with comp_method > 6 as garbage. Previously the resync after a non-standard chunk (ST_Localization_Dialogue.uexp's unknown chunk header) yielded a spurious entry with comp_method=12,632,304. Now those are correctly skipped. + ST_Localization_Dialogue.uexp investigated: uses a non-standard chunk header (leading 32-byte table of offsets we don't decode). The other 10 ST_* tables parse cleanly (27,816 key->English pairs); Dialogue keys remain unresolved. Documented; cracking the format is a Phase E follow-on if dialogue text becomes important. + D3 heading [ ] -> [x] correction (shipped in PR #10 but the checkbox marker wasn't toggled). Sample catalog regen: DataExtract/ITEMS/ re-emitted with all fixes baked in. ITEMS-Loot-Tables.md shows RequiredTags/ForbiddenTags decoding to {tags=[]} (was =_raw_). ITEMS-INDEX.md header now reads "2026-05-28T07:39:06Z" UTC. Validated: - py_compile across all modules: clean - B1 / B2 / B3 probes: identical to prior pass - Localization.pak walker: 21 entries (was 22 with bogus #7); 0 entries with comp_method > 6 - FieldFilter recursion test: nested Icon/Mesh stripped when --include-asset-paths is off, kept when on - JSON encoder smoke: tuple/bytes/Path/dataclass/set all clean - --client-path /tmp (no paks): fail-fast error in ~10 ms