Computational analysis of the CDLI Proto-Elamite corpus (1,467 tablets from Susa, ~3100 BCE). Extracts testable knowledge about Proto-Elamite numerical systems, administrative technology, and livestock management.
# 1. Clone with the dataset submodule (sfu-natlang/pe-sign-value-data)
git clone --recurse-submodules https://github.com/MahmoodKhalil57/ProtoElamite
# (already cloned? run: git submodule update --init)
# 2. Verify all 9 primary claims in one command
bun run verify-all.ts
Expected output: 9/9 claims PASS in ~0.1 seconds.
The Proto-Elamite sign-value data lives in the pe-sign-value-data/ submodule, pinned to sfu-natlang/pe-sign-value-data (SFU Natural Language Lab).
Nine primary claims plus one null result (Claim 10), all reproducible, falsifiable, and independently verified:
- N45 = 100 in decimal context (not 120 as previously assumed)
- N30C = 3 × N24 (capacity system base ratio)
- Standardized measures 6.1 and 150.1 — canonical vessel sizes
- Dual-system accounting convention — parallel decimal + capacity sub-totals
- Tabular bookkeeping with cross-footing — spreadsheet precursor at ~3100 BCE
- Animal production norm of 1.0 by-product per standard-tier animal (7 data points)
- Three-tier yield hierarchy matching sheep/goat/cattle biology
- Meluhha trade reference on P008239 (me-lu-ha = Indus Valley)
- zu-M003~B authorization formula (13 tablets, standardized closing mark)
- M054+M340 is an accounting identity (honest null result — NOT a recipe)
See CLAIMS.md for full evidence and falsification criteria.
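Cross-footing (Claim 5) means that a table's row totals and column totals both add up to the same grand total. The sketch below shows that check on a small made-up grid. The `crossFoots` function and the sample numbers are illustrative assumptions, not code or data from this repository.

```typescript
// Hypothetical cross-footing check; not taken from verify-tabular.ts.
type Grid = number[][];

function sum(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0);
}

// Returns true when the recorded row totals, column totals and grand total
// are all consistent with the cell entries.
function crossFoots(
  cells: Grid,
  rowTotals: number[],
  colTotals: number[],
  grandTotal: number,
): boolean {
  if (cells.length !== rowTotals.length) return false;
  const rowsOk = cells.every((row, i) => sum(row) === rowTotals[i]);
  const colsOk = colTotals.every(
    (total, j) => sum(cells.map((row) => row[j] ?? 0)) === total,
  );
  return (
    rowsOk &&
    colsOk &&
    sum(rowTotals) === grandTotal &&
    sum(colTotals) === grandTotal
  );
}

// Made-up example: 2 rows x 3 columns (not corpus data).
const cells: Grid = [
  [3, 1, 2],
  [4, 0, 5],
];
console.log(crossFoots(cells, [6, 9], [7, 1, 7], 15)); // true
console.log(crossFoots(cells, [6, 9], [7, 1, 7], 16)); // false
```

A single wrong total fails the check, which is what makes a cross-footing claim falsifiable on any one tablet.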
The investigation did not find production recipes (metallurgical ratios, dye mordants, ceramic formulas, agricultural timings). The corpus is administrative: tablets record receipts, distributions, and censuses, not manufacturing processes. See GOAL-ASSESSMENT.md for the honest reframe.
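One way to look for recipe-like ratios is to reduce each pair of quantities to lowest terms, so that 6:4 and 30:20 count as the same ratio, and then see which ratios recur. The sketch below illustrates that idea only. It is not the code in recipe-hunt.ts, the function names are hypothetical, and the sample pairs are made up.

```typescript
// Hypothetical sketch of a scale-invariant ratio search; not recipe-hunt.ts itself.
// Assumes positive integer quantities.
function gcd(a: number, b: number): number {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Reduce a pair of quantities to lowest terms, e.g. 6:4 -> "3:2".
function reducedRatio(x: number, y: number): string {
  const g = gcd(x, y);
  return `${x / g}:${y / g}`;
}

// Count how often each reduced ratio occurs across pairs of quantities.
function countRatios(pairs: Array<[number, number]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const [x, y] of pairs) {
    const key = reducedRatio(x, y);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Made-up quantities, not corpus data.
const samplePairs: Array<[number, number]> = [[6, 4], [9, 6], [5, 1], [30, 20]];
const ratioCounts = countRatios(samplePairs);
console.log(ratioCounts.get("3:2")); // 3
console.log(ratioCounts.get("5:1")); // 1
```

A ratio that recurs at several scales would be a recipe candidate. Recurrence alone does not prove a recipe, since an accounting identity can produce the same pattern.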
| File | Purpose |
|---|---|
| README.md | This file — project entry point |
| CLAIMS.md | 10 reproducible, falsifiable claims with evidence |
| FINDINGS.md | All 25 detailed findings from the investigation |
| ROADMAP.md | Research roadmap ordered by confidence |
| GOAL-ASSESSMENT.md | What can and cannot be extracted from this corpus |
- verify-all.ts — runs all 9 primary claims end-to-end
- verify-sums.ts — summation verification (Claim 1)
- dual-system-verify.ts — dual-system accounting (Claim 4)
- verify-tabular.ts — tabular bookkeeping (Claim 5)
- verify-twins.ts — cross-tablet household matching
- corpus-verification.ts — corpus-wide summation test
- parse-corpus.ts — ATF parser with bug fixes
- analyze.ts — syllabary application and phonetic reconstruction
- classify-systems.ts — numerical system classifier
- analyze-m388.ts — M388 structural position analysis
- phase2-commodities.ts — commodity profiles + network
- phase3-cluster.ts — M147 cluster + agent tracking
- phase4-recipes.ts — syllabary expansion + Meluhha
- phase4-complex.ts — K-means sign classification
- read-tablets.ts — comprehensive tablet reader
- animal-husbandry.ts — production norms (Claim 6, 7)
- animal-dictionary.ts — M362 compound catalog
- cross-match-herds.ts — three-tablet herd management system
- m054-m340-deep.ts — M054+M340 investigation (Claim 10)
- trade-and-formulas.ts — M325 group + closing formula (Claim 9)
- huha-and-formula.ts — hu-ha name investigation
- derive-capacity.ts — capacity system derivation (Claim 2)
- magic-numbers.ts — recurring values (Claim 3)
- close-matches.ts — non-verifying tablet analysis
- recipe-hunt.ts — scale-invariant ratio hunt (null result)
- recipe-verify.ts — recipe candidate verification (null result)
- synthesis.ts — prosopography + synthesis
- N01 = 1
- N14 = 10
- N45 = 100 (confirmed — corrects prior N45=120 interpretation)
- N34 = 60
- N50 = 1000
- N08 = 0.5 (fraction)
- N24 = 1 (base unit)
- N30C = 3 × N24 (confirmed)
- N30D = 15 × N24 (best hypothesis)
- N39B = 75 × N24 (best hypothesis)
- Base ratios: 1 : 3 : 15 : 75 (successive factors ×3, ×5, ×5)
- 6.1 = 1(N24) + 2(N30C): small standard (15 tablets)
- 150.1 = 2(N39B) + 1(N24): large double-measure (44 tablets)
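The sign values listed above can be applied mechanically to a tablet entry by multiplying each sign's count by its value and summing. The sketch below is a minimal illustration. The table names, the `total` helper, and the first sample entry are assumptions made for the example, and the N30D and N39B values are the best hypotheses listed above, not confirmed values.

```typescript
// Values copied from the lists above. Names of these tables are illustrative.
const COUNT_VALUES: Record<string, number> = {
  N01: 1,
  N14: 10,
  N45: 100,
  N34: 60,
  N50: 1000,
  N08: 0.5,
};

// Capacity signs expressed in N24 units (N30D and N39B are best hypotheses).
const CAPACITY_IN_N24: Record<string, number> = {
  N24: 1,
  N30C: 3,
  N30D: 15,
  N39B: 75,
};

type SignCounts = Record<string, number>;

// Sum count x value over all signs in an entry; unknown signs are an error.
function total(counts: SignCounts, values: Record<string, number>): number {
  let result = 0;
  for (const [sign, count] of Object.entries(counts)) {
    const value = values[sign];
    if (value === undefined) throw new Error(`Unknown sign: ${sign}`);
    result += count * value;
  }
  return result;
}

// Made-up entry: 2(N45) + 3(N14) + 4(N01)
console.log(total({ N45: 2, N14: 3, N01: 4 }, COUNT_VALUES)); // 234
// The two standardized measures, as plain totals in N24 units
console.log(total({ N24: 1, N30C: 2 }, CAPACITY_IN_N24)); // 7
console.log(total({ N39B: 2, N24: 1 }, CAPACITY_IN_N24)); // 151
```

These are plain totals in N24 units. The sketch does not attempt to reproduce the 6.1 and 150.1 labels, whose notation is not defined in this file.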
| Sign | Meaning | Source |
|---|---|---|
| M362 | Animal (livestock) | Dahl 2005 |
| M362+X | Animal of household X | Dahl 2005 |
| M036 | Grain container | Dahl 2005 |
| M260–M270 | Beer vessels | Dahl 2005 |
| M269 | Milk/butter/oil containers | Dahl 2005 |
| M288 (pu₂) | Unit marker / totalizer | Our analysis |
| M157 | Document header (general accounts) | Our analysis |
| M388 | Section marker + semantic determiner | Our analysis |
| M106/M106~A | Animal by-product (capacity) | Our analysis + Dahl context |
| M309~A | Standard allocation (always qty=1) | Our analysis |
| M102~E | Standard pair (always qty=2) | Our analysis |
| M206~B | "me" phonetic (in me-lu-ha) | Desset 2022 |
| M301 | "lu" phonetic | Dahl + Desset |
| M263 | "ha" phonetic | Dahl + Desset |
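The three phonetic rows in the table are enough to show how a sign sequence maps to the reading me-lu-ha. The sketch below is illustrative only: the `transliterate` function and the "?" placeholder for unknown signs are assumptions, and it is not the implementation in analyze.ts.

```typescript
// Phonetic values taken from the table above; everything else is illustrative.
const PHONETIC: Record<string, string> = {
  "M206~B": "me",
  M301: "lu",
  M263: "ha",
};

// Join the phonetic values of a sign sequence, marking unknown signs with "?".
function transliterate(signs: string[]): string {
  return signs.map((sign) => PHONETIC[sign] ?? "?").join("-");
}

console.log(transliterate(["M206~B", "M301", "M263"])); // me-lu-ha
console.log(transliterate(["M206~B", "M999"])); // me-?
```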
- Bun runtime (v1.3+): curl -fsSL https://bun.sh/install | bash
- CDLI data: git clone https://github.com/sfu-natlang/pe-sign-value-data
Place this project directory next to pe-sign-value-data/ so scripts can find the corpus.
ProtoElamite/
├── README.md (this file)
├── CLAIMS.md (falsifiable claims)
├── FINDINGS.md (detailed findings)
├── ROADMAP.md (research roadmap)
├── GOAL-ASSESSMENT.md (honest assessment)
├── CLAUDE.md (project instructions)
├── verify-all.ts (unified pipeline — start here)
├── parse-corpus.ts (ATF parser)
├── analyze.ts (syllabary tool)
├── [20+ analysis scripts] (focused investigations)
├── corpus-parsed.json (cached corpus analysis)
└── pe-sign-value-data/ (CDLI corpus, git submodule)
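A script can confirm the corpus directory is present before it tries to parse anything. The sketch below checks both locations mentioned in this file: inside the project (the submodule layout shown above) and next to it. The `findCorpusDir` helper is hypothetical, and the repository's scripts may resolve the path differently.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: returns the first existing candidate location of the
// data directory, or null when neither is present.
function findCorpusDir(projectRoot: string): string | null {
  const candidates = [
    join(projectRoot, "pe-sign-value-data"), // submodule inside the project
    join(projectRoot, "..", "pe-sign-value-data"), // sibling directory
  ];
  for (const dir of candidates) {
    if (existsSync(dir)) return dir;
  }
  return null;
}

const corpusDir = findCorpusDir(process.cwd());
console.log(
  corpusDir ?? "pe-sign-value-data/ not found; try: git submodule update --init",
);
```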
When referring to specific claims, cite the claim number and the verification script:
"N45 = 100 in decimal context (Claim 1, verified by P008031, P008136, P008019 via
verify-all.ts)."
- CDLI corpus: Cuneiform Digital Library Initiative
- Dahl 2005: "Complex Graphemes in Proto-Elamite"
- SFU NatLang Lab: pe-sign-value-data
Research code for academic/educational use. The CDLI corpus data is under its own license (see pe-sign-value-data).