Citations and Data Sources — Indus Decipherment Pipeline

This document lists every primary, secondary, and tertiary data source used in the Glossa-Lab Indus decipherment pipeline (Phase-22 through Phase-29). Each source is cited per the author's prescribed citation format where given, otherwise per standard archaeological / linguistic conventions (Author Year, full reference, publisher, ISBN/DOI, license, URL).

If you use the Glossa-Lab Indus pipeline in academic work, please cite all of the underlying data sources below in addition to the Glossa-Lab project itself.


A. Indus seal and inscription catalogues

A.1 Mahadevan 1977 — The Indus Script: Texts, Concordance and Tables

Used in: Phase-22 baseline corpus reference; Phase-28 Mahadevan-Parpola crosswalk; Phase-29 MahadevanInscriptionLoader (1,669 inscriptions, 5,361 sign tokens).

Mahadevan, Iravatham. 1977. The Indus Script: Texts, Concordance and Tables. Memoirs of the Archaeological Survey of India, No. 77. New Delhi: Archaeological Survey of India. Pp. 825.

  • Available: Internet Archive TheIndusScript.TextConcordanceAndTablesIravathanMahadevan (full 34.6 MB OCR'd PDF, 27,021 views, 49 favorites). https://archive.org/details/TheIndusScript.TextConcordanceAndTablesIravathanMahadevan
  • Mirror: masi77indusscripttextsconcordancestablesiravathammahadevanalt_443_h (alternate upload by Murali Warrier, CC0 1.0).
  • Author's preferred citation (per RMRL Digital Library): "Mahadevan 1977" or "M77" with full reference as above.
  • Posthumous note: Iravatham Mahadevan (1930-2018) was an epigraphist, IAS officer, and India's foremost Indus script scholar. The Roja Muthiah Research Library (RMRL) maintains the canonical collection of his papers at https://rmrl.in/en/dl/research-papers/mahadevan.

BibTeX:

@book{mahadevan1977,
  author = {Mahadevan, Iravatham},
  title = {The Indus Script: Texts, Concordance and Tables},
  series = {Memoirs of the Archaeological Survey of India},
  number = {77},
  publisher = {Archaeological Survey of India},
  address = {New Delhi},
  year = {1977},
  pages = {825}
}

A.2 Joshi & Parpola 1987 — CISI Vol. 1: Collections in India

Used in: Phase-22 contact-zone reference; Phase-25 sign-list cross-checking; Phase-28 iconographic anchor figures.

Joshi, Jagat Pati & Asko Parpola (eds.). 1987. Corpus of Indus Seals and Inscriptions. 1: Collections in India. Annales Academiae Scientiarum Fennicae, Series B, vol. 239. Memoirs of the Archaeological Survey of India, vol. 86. Helsinki: Suomalainen Tiedeakatemia (Finnish Academy of Science and Letters). Pp. xxxii + 392. ISBN 951-41-0555-9. ISSN 0066-2011.

  • Cited as: "CISI 1" or "Joshi & Parpola 1987" per Parpola's prescription (Parpola 1994a, 2010, 2018).
  • Publisher contact: Tiedekirja, Helsinki (https://tiedekirja.fi/).

A.3 Shah & Parpola 1991 — CISI Vol. 2: Collections in Pakistan

Used in: Same as A.2.

Shah, Sayid Ghulam Mustafa & Asko Parpola (eds.). 1991. Corpus of Indus Seals and Inscriptions. 2: Collections in Pakistan. Annales Academiae Scientiarum Fennicae, Series B. Helsinki: Suomalainen Tiedeakatemia. Pp. xxxii + 448. ISBN 951-41-0556-7.

A.4 Parpola, Pande & Koskikallio 2010 — CISI Vol. 3.1: Mohenjo-daro and Harappa

Used in: Phase-26 find-spot map cross-checking; Phase-28 catalogue reference. Acquisition: Phase-30 priority (€220 from Tiedekirja, or via interlibrary loan).

Parpola, Asko, B. M. Pande & Petteri Koskikallio (eds.). 2010. Corpus of Indus Seals and Inscriptions. Volume 3: New material, untraced objects, and collections outside India and Pakistan. Part 1: Mohenjo-daro and Harappa. In collaboration with Richard H. Meadow & J. Mark Kenoyer. Annales Academiae Scientiarum Fennicae, Humaniora 359. Memoirs of the Archaeological Survey of India, vol. 96. Helsinki: Suomalainen Tiedeakatemia. Pp. lx + 444. ISBN 978-951-41-1040-5.

A.5 Parpola, Pande & Koskikallio 2019 — CISI Vol. 3.2: Iranian, BMAC, Indus border sites

Used in: Phase-30 target. €160 from Tiedekirja.

Parpola, Asko, B. M. Pande & Petteri Koskikallio (eds.). 2019. Corpus of Indus Seals and Inscriptions. Volume 3.2: Shahr-i Sokhta, Mundigak, Mehrgarh, Nausharo, Sibri, Dauda-damb, Chanhu-daro, Ahar, Balathal, Gilund, Kalibangan, Rojdi. In collaboration with Massimo Vidale, Alessandra Lazzari, Catherine Jarrige, Jean-François Jarrige, Gonzague Quivron, Hélène Trompetent. Annales Academiae Scientiarum Fennicae, Humaniora 383. Helsinki: Suomalainen Tiedeakatemia. Pp. 394. ISBN 978-951-41-1134-1.

A.6 Parpola & Koskikallio 2022 — CISI Vol. 3.3: Indo-Iranian Borderlands

Used in: Phase-28 OCR target (40-page front-matter only — full plates volume needs separate acquisition).

Parpola, Asko & Petteri Koskikallio (eds.). 2022. Corpus of Indus Seals and Inscriptions. Volume 3.3: Indo-Iranian Borderlands. Annales Academiae Scientiarum Fennicae, Humaniora 386. Tampere: Suomalainen Tiedeakatemia. Pp. lxxxvii + 683. ISBN 978-951-41-1153-2.

  • Review: Fuls, Andreas. 2022. "Corpus of Indus Seals and Inscriptions. Volume 3.3 Indo-Iranian Borderlands." Iranian Journal of Archaeological Studies 12(2): 139-143. https://doi.org/10.22111/ijas.2022.45467.1268
  • Includes: Potts, D. T. 2022. "The Graffiti from Tepe Yahya and the Role of the Indo-Iranian Borderlands in the Formation of the Harappan Writing System." Pp. xxvii-xxxiii in CISI 3.3.

A.7 Wells 2015 — The Archaeology and Epigraphy of Indus Writing

Used in: Phase-22 sign list reference; Phase-28 fish-family allograph membership; Phase-30 target for full integration.

Wells, Bryan K. 2015. The Archaeology and Epigraphy of Indus Writing. With technical appendices by Andreas Fuls. Oxford: Archaeopress. Pp. x + 143. ISBN paperback 978-1-78491-046-4. ISBN epublication 978-1-78491-047-1.

A.8 Wells 2006 — PhD thesis: Epigraphic Approaches to Indus Writing

Used in: Phase-22 sign list (676 graphemes); Phase-29 Wells corpus.

Wells, Bryan Kenneth. 2006. Epigraphic Approaches to Indus Writing. PhD dissertation, Harvard University, Department of Anthropology. Pp. ~400.

A.9 Fuls 2022, 2023a — Corpus of Indus Inscriptions (Mathematica Epigraphica vol. 3)

Used in: Phase-29 MathematicaEpigraphicaLoader (no-op default; not yet acquired). Expected: 5,509 inscriptions, 19,616 sign occurrences.

Fuls, Andreas. 2022 (1st ed.) / 2023 (2nd ed.). Corpus of Indus Inscriptions. Mathematica Epigraphica vol. 3. Berlin: Independently published. Pp. 582. ISBN paperback 978-1-67180-486-9.

A.10 Fuls 2023b — A Catalog of Indus Signs (Mathematica Epigraphica vol. 4)

Used in: Phase-30 target — sign list (~700 graphemes) cross-validation.

Fuls, Andreas. 2023. A Catalog of Indus Signs. Mathematica Epigraphica vol. 4. Berlin: Independently published. Pp. ~540. ISBN 979-8-398-42230-6.

A.11 Wells & Fuls — Interactive Corpus of Indus Texts (ICIT)

Used in: Phase-29 ICITCorpusLoader (no-op default; API access by request). Expected: 4,537 objects, 5,509 texts, 19,616 sign occurrences.

Wells, Bryan K. & Andreas Fuls. 2008-present. Interactive Corpus of Indus Texts (ICIT). Online database, TU Berlin. https://www.epigraphica.de/indus/.

A.12 Mahadevan 2003 — Tamil-Brahmi Epigraphy

Used in: Phase-25 typology fit (KL=0.0033 vs Indus); Phase-30 expansion target.

Mahadevan, Iravatham. 2003. Early Tamil Epigraphy: From the Earliest Times to the Sixth Century A.D. Harvard Oriental Series 62. Cambridge, MA: Harvard University Press. ISBN 978-0-674-01227-1.

Mahadevan, Iravatham. 2014. Early Tamil Epigraphy: Tamil-Brahmi Inscriptions. Revised and enlarged 2nd edition, Volume 1. Chennai: Central Institute of Classical Tamil.
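The Phase-25 typology fit listed for this source is reported as a single KL value. This document does not say which distributions were compared, so the following is only a minimal sketch of a KL divergence between two count profiles over shared categories. The function name, the additive smoothing constant, and the toy counts are assumptions made for illustration and are not taken from the pipeline.

```python
import math


def kl_divergence(p_counts, q_counts, smoothing=1e-9):
    """Return KL(P || Q) in bits for two count dicts over shared categories.

    A tiny additive smoothing term (an assumption of this sketch) keeps the
    logarithm finite when a category is missing from one profile.
    """
    keys = sorted(set(p_counts) | set(q_counts))
    p_total = sum(p_counts.get(k, 0) + smoothing for k in keys)
    q_total = sum(q_counts.get(k, 0) + smoothing for k in keys)
    kl = 0.0
    for k in keys:
        p = (p_counts.get(k, 0) + smoothing) / p_total
        q = (q_counts.get(k, 0) + smoothing) / q_total
        kl += p * math.log2(p / q)
    return kl


# Hypothetical toy profiles (e.g. counts per inscription-length bucket).
script_a = {"short": 60, "medium": 30, "long": 10}
script_b = {"short": 55, "medium": 33, "long": 12}
divergence = kl_divergence(script_a, script_b)
```

A value near zero means the two profiles are close; KL is asymmetric, so the direction of comparison should be recorded alongside any reported figure.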


B. Sumerian / Akkadian / Mesopotamian corpora

B.1 ePSD2 — Electronic Pennsylvania Sumerian Dictionary 2.7.2

Used in: Phase-29 EPSD2NamesLoader (4,848 entries: 1,222 PN, 2,068 DN, ...).

Tinney, Steve, Philip Jones, Niek Veldhuis, et al. 2017-present. electronic Pennsylvania Sumerian Dictionary 2nd Edition (ePSD2). Version 2.7.2 (released 2024-08-31). Philadelphia: University of Pennsylvania Museum of Archaeology and Anthropology, Babylonian Section. http://oracc.org/epsd2.

  • Citation prescription (per ePSD2 home page): "ePSD2 (URL accessed YYYY-MM-DD)."
  • License: CC BY-SA. The Pennsylvania Sumerian Dictionary Project, 2017-.
  • Bulk download: https://oracc.museum.upenn.edu/json/epsd2-names.zip (4.5 MB ZIP, 37 MB uncompressed gloss-qpn.json).
  • Subset shipped at: backend/glossa_lab/data/epsd2_names_subset.json (842 KB, 4,848 entries).
  • Major credits (per ePSD2 News): Niek Veldhuis (admin/ur3), Philip Jones (admin/ed3b/oakk), Steve Tinney (overall), John Carnahan (Drehem names), Jana Matuszak (DSSt). 50,000+ instances of names total.
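The citation prescription above needs an access date each time ePSD2 is cited. A minimal sketch of a helper that fills in the prescribed pattern follows; the function name and the default URL constant are illustrative assumptions, and the exact punctuation should be checked against the ePSD2 home page.

```python
from datetime import date

# URL as given in the reference entry above.
EPSD2_URL = "http://oracc.org/epsd2"


def epsd2_citation(accessed: date, url: str = EPSD2_URL) -> str:
    """Format the 'ePSD2 (URL accessed YYYY-MM-DD).' pattern."""
    return f"ePSD2 ({url} accessed {accessed.isoformat()})."


example = epsd2_citation(date(2025, 1, 2))
```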

B.2 ETCSL — Electronic Text Corpus of Sumerian Literature

Used in: Phase-29 cross-reference; Phase-30 expansion target.

Black, Jeremy A., Graham Cunningham, Jarle Ebeling, Esther Flückiger-Hawker, Eleanor Robson, Jon Taylor, Gábor Zólyomi. 1998-2006. The Electronic Text Corpus of Sumerian Literature (ETCSL). Oxford: University of Oxford, Faculty of Oriental Studies. https://etcsl.orinst.ox.ac.uk.

  • Citation prescription (per ETCSL): "[ETCSL] [Composition] (revised: date)."

B.3 CDLI — Cuneiform Digital Library Initiative

Used in: Phase-22 Meluhha tablet extraction (1,462 tablets); ongoing reference.

Englund, Robert K., Bertrand Lafont, Klaus Wagensonner, et al. 2000-present. Cuneiform Digital Library Initiative (CDLI). Berlin & Los Angeles: CDLI. https://cdli.mpiwg-berlin.mpg.de.

B.4 BDTNS — Sumerian Tablet Database (Madrid)

Used in: Phase-26 cross-reference (via ePSD2 import).

Molina, Manuel et al. 2002-present. Base de Datos de Textos Neosumerios (BDTNS). Madrid: CSIC. http://bdtns.filol.csic.es.

B.5 Vandorpe — Susa Sukkalmah Prosopography

Used in: Phase-30 contact-zone PN expansion target.

Vandorpe, Lieselot. 2015 (PhD)/2019 (publication). East Side Story: Susa under the Sukkalmah Dynasty (1930-1450 B.C.) — A prosopographical study. PhD dissertation, Ghent University. https://www.academia.edu/14695347.

B.6 De Graef — Sukkalmah Susa onomasticon

Used in: Phase-30 reference.

De Graef, Katrien. 2019. "Susa under the Sukkalmah Dynasty: A Society Between the Mountains and the Plain." Pp. 84-110 in Elam in the 2nd Millennium BC. Tübingen: Mohr Siebeck.

B.7 Steinkeller 2013 — Akkadian Susa

Used in: Phase-30 reference for Akkadian-Susa contact period.

Steinkeller, Piotr. 2013. "Puzur-Inšušinak at Susa: A Pivotal Episode of Early Elamite History Reconsidered." In Susa and Elam: Archaeological, Philological, Historical and Geographical Perspectives. Mémoires de la Délégation en Perse.


C. Decipherment hypotheses (primary)

C.1 Parpola 1994a — Deciphering the Indus Script

Used in: Phase-25 phoneme map foundation; Phase-27 anchor scoring; Phase-28 expanded phoneme map; the central Dravidian decipherment hypothesis.

Parpola, Asko. 1994a. Deciphering the Indus Script. Cambridge: Cambridge University Press. Pp. xxiii + 374. ISBN 978-0-521-43079-1.

C.2 Parpola 2010 — Coimbatore paper

Used in: Phase-27 iconographic anchors (12 anchors from figs. 5-23); Phase-28 fish-family allograph extension.

Parpola, Asko. 2010. "A Dravidian solution to the Indus script problem." Coimbatore: Kalaignar M. Karunanidhi Classical Tamil Award lecture (World Classical Tamil Conference, 25 June 2010). Pp. 39.

C.3 Parpola 1981 — Yoke-carrier

Used in: Phase-28 yoke-carrier phoneme entry (kavai).

Parpola, Asko. 1981. "On the Harappan yoke-carrier pictogram and the kavai worship." Pp. ??? in Proceedings of the 5th International Conference on Tamil Studies.

C.4 Parpola 2004, 1985, 2018 — Buffalo, Sky-Garment, Murukan

Used in: Phase-28 buffalo (erumai) phoneme entry; Phase-30 expansion target.

Parpola, Asko. 1985. The Sky-Garment: A study of the Harappan religion and its relation to the Mesopotamian and later Indian religions. Studia Orientalia 57. Helsinki: Finnish Oriental Society.

Parpola, Asko. 2004. "From archaeology to stratigraphy of Vedic syncretism: The banyan tree and the water buffalo as Harappan-Dravidian symbols of royalty." In Vedic Studies (vol. ed. ???).

Parpola, Asko. 2018. "Indus Seals and Glyptic Studies: An Overview." In Seals and Sealing in the Ancient World: Case Studies from the Near East, Egypt, the Aegean, and South Asia. Cambridge: Cambridge University Press. https://researchportal.helsinki.fi/en/publications/indus-seals-and-glyptics-studies-an-overview.

C.5 Mahadevan papers (40+ papers, RMRL Digital Library)

Used in: Phase-25-29 phoneme readings; Phase-28 Murukan signs.

Mahadevan, Iravatham. 1970-2018. Research Papers (40+ papers). Chennai: Roja Muthiah Research Library, Indus Research Centre. https://rmrl.in/en/dl/research-papers/mahadevan.

Key papers used:

    1. "Dravidian Parallels in Proto-Indian Script."
    2. "Study of the Indus Script through Bi-lingual Parallels."
    3. "Towards a Grammar of the Indus Texts: 'Intelligible to the eye, if not to the ears'." Tamil Civilisation 4(3-4): 133-143.
    4. "Murukaṉ in the Indus Script."
    5. "Meluhha and Agastya: Alpha and Omega of the Indus Script."
    6. "Vestiges of Indus Civilisation in Old Tamil."
    7. "Akam and Puram: 'Address' Signs of the Indus Script."
    8. Dravidian Proof of the Indus Script via the Rig Veda: A Case Study. Indus Research Centre Bulletin, Roja Muthiah Research Library.
  • 2017/2018. Toponyms, Directions and Tribal Names in the Indus Script. With M. V. Bhaskar. Oxford: Archaeopress.

C.6 Wells 2015 (already cited as A.7) — Dholavira reading + 17 sign decipherments

C.7 Yajnadevam 2024 — Cryptanalytic Sanskrit decipherment

Used in: Phase-30 falsification round target. Note: The lipi repository corpus data is derived from Fuls' ICIT corpus. Yajnadevam's Sanskrit readings are a separate publication.

Yajnadevam (pseudonym). 2024. "A cryptanalytic decipherment of the Indus script." ResearchGate preprint, November 2024. https://www.researchgate.net/publication/387756000_A_cryptanalytic_decipherment_of_the_Indus_script.

C.8 Mahaveer H. Muhammad — Sindhu Prakrit alphabet

Used in: Phase-30 falsification round target.

H. Muhammad, Mahaveer. 2023. The Alphabet of the Sindhu Prakrit: The decipherment of the Indus Script. Eliva Press.

H. Muhammad, Mahaveer. 2024. "The Decoded Indus Seal M-282." Preprints 2024.07.2105.v1. https://doi.org/10.20944/PREPRINTS202407.2105.V1.

H. Muhammad, Mahaveer. 2024-12-27. "The Indus Elephant Seals: The Royal Terminology of The Indus Valley Civilization." SSRN Working Paper 5073283. https://ssrn.com/abstract=5073283.

C.9 Neukart 2025 — Computational/cosmological reading

Used in: Phase-30 falsification round target.

Neukart, Florian. 2025. "Cracking the Code of the Indus Valley Civilization: A Computational Approach to Lost Knowledge." SSRN Working Paper 5141753. https://ssrn.com/abstract=5141753.

C.10 S. R. Rao 1982 — Sanskrit decipherment

Used in: Phase-30 falsification round target.

Rao, Shikaripur Ranganatha. 1982. The Decipherment of the Indus Script. Bombay: Asia Publishing.

C.11 Bonta 1996, 2010 — Indus Valley script analyses

Used in: Phase-30 reference.

Bonta, Steven. 1996. Topics in the Study of the Indus Valley Script. Brigham Young University.

Bonta, Steven. 2010. The Indus Valley Script: A New Interpretation. Pennsylvania State University.

C.12 Farmer, Sproat & Witzel 2004 — "Collapse of the Indus-Script Thesis"

Used in: Phase-15-style structural rebuttals (anti-decipherment hypothesis).

Farmer, Steve, Richard Sproat & Michael Witzel. 2004. "The Collapse of the Indus-Script Thesis: The Myth of a Literate Harappan Civilization." Electronic Journal of Vedic Studies 11(2): 19-57. http://go.nature.com/vasrw5.


D. Computational + statistical Indus papers

D.1 Rao et al. 2009 — Conditional Entropy

Used in: Phase-15 entropy benchmarks.

Rao, Rajesh P. N., Nisha Yadav, Mayank N. Vahia, Hrishikesh Joglekar, R. Adhikari, Iravatham Mahadevan. 2009. "Entropic Evidence for Linguistic Structure in the Indus Script." Science 324: 1165. https://doi.org/10.1126/science.1170391.
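For readers unfamiliar with the measure behind the entropy benchmarks, here is a minimal sketch of a plain maximum-likelihood conditional entropy over sign bigrams. It is not the estimator used in the paper or in the pipeline; the function name and the toy sequences are assumptions, and a real analysis on a small corpus would need smoothing.

```python
import math
from collections import Counter


def conditional_entropy(sequences):
    """Return H(Y | X) in bits, where Y is the sign that follows sign X."""
    pair_counts = Counter()
    first_counts = Counter()
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for (x, y), n in pair_counts.items():
        p_xy = n / total                 # joint probability of the bigram
        p_y_given_x = n / first_counts[x]  # probability of y after x
        h -= p_xy * math.log2(p_y_given_x)
    return h


# Hypothetical toy data: a rigid ordering versus a free choice after "A".
rigid = [["A", "B", "C"], ["A", "B", "C"]]
free = [["A", "B"], ["A", "C"]]
h_rigid = conditional_entropy(rigid)
h_free = conditional_entropy(free)
```

A fully rigid ordering gives zero conditional entropy, while an even two-way choice after a sign gives one bit; natural-language-like systems are expected to fall between rigid and random extremes.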

D.2 Sproat 2014 — Critique of Indus-as-language

Used in: Phase-15 falsification reference.

Sproat, Richard. 2014. "A Statistical Comparison of Written Language and Nonlinguistic Symbol Systems." Language 90(2): 457-481.

D.3 Yadav et al. 2010 — Zipf-Mandelbrot

Used in: Phase-22+ Zipf statistics.

Yadav, Nisha, Hrishikesh Joglekar, Rajesh P. N. Rao, Mayank N. Vahia, R. Adhikari, Iravatham Mahadevan. 2010. "Statistical analysis of the Indus script using n-grams." PLOS ONE 5(3): e9506. https://doi.org/10.1371/journal.pone.0009506.
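The Zipf statistics mentioned for this source start from a rank-frequency table. The sketch below builds such a table and estimates the log-log slope by ordinary least squares; it fits a plain Zipf slope rather than the full Zipf-Mandelbrot form, and the function names and toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def rank_frequency(sequences):
    """Return (rank, sign, count) rows sorted by descending count."""
    counts = Counter(sign for seq in sequences for sign in seq)
    return [(rank, sign, n)
            for rank, (sign, n) in enumerate(counts.most_common(), start=1)]


def zipf_slope(table):
    """Least-squares slope of log(count) against log(rank)."""
    if len(table) < 2:
        raise ValueError("need at least two distinct signs to fit a slope")
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(n) for _, _, n in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy corpus whose counts follow 12 / rank exactly.
toy = [["a"] * 12, ["b"] * 6, ["c"] * 4, ["d"] * 3]
table = rank_frequency(toy)
slope = zipf_slope(table)
```

On real sign counts the slope depends strongly on how rare signs and allographs are handled, so the sign list used should be stated with any reported value.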

D.4 Ashraf & Sinha 2018 — Direction detection

Used in: Phase-22+ direction normalisation.

Ashraf, Mohammed Imran & Sitabhra Sinha. 2018. "The 'handedness' of language: Directional symmetry breaking of sign usage in words." PLOS ONE 13(1): e0190735. https://doi.org/10.1371/journal.pone.0190735.

D.5 Mukhopadhyay 2019, 2023 — Sign context + structural analysis

Used in: Phase-29 reference for ePSD2 integration plan.

Ansumali Mukhopadhyay, Bahata. 2019. "Interrogating Indus inscriptions through their context, structure and compositional semantics, to understand their inner logic of message conveyance." Palgrave Communications. https://doi.org/10.2139/ssrn.3184583.

D.5b Mukhopadhyay 2020 — Tax tokens, trade licences, metrological records

Used in: Phase-33 semantic reanalysis of fish-sign interpretation. Epistemic status: HIGH — directly challenges the fish-sign = mīn phonetic reading used in Phase-29d.

Ansumali Mukhopadhyay, Bahata. 2020. "Ancient Tax Tokens, Trade Licenses and Metrological Records?: Making Sense of Indus Inscribed Objects Through Script-Internal, Contextual, Linguistic, and Ethnohistorical Lenses." Preprint. https://doi.org/10.2139/ssrn.3538764.

Key claims relevant to Glossa Lab research:

  • Fish signs (M-047, M-306 etc.) signify apotropaic eye-beads (carnelian/agate "fish-eye-beads" documented in Mesopotamian lexicons as NA4-IGI-KU6), NOT the phonetic reading mīn (fish/star) used in the Phase-29d reverse-Janabiyah search. This supports the Phase-32 T8 finding (Enmenanak signal NOT SIGNIFICANT).
  • Bird signs signify lapis lazuli / precious stones via the ancient root "Kapautaka" (pigeon-coloured, Sanskrit/Old Persian for lapis lazuli blue).
  • Sign maṇi (bead/eye/gem/amulet) connects Tamil, Sanskrit, and Akkadian maninnu (Amarna letters), providing independent support for the IVC–Mesopotamia trade network hypothesis.
  • 9 structural sign-classes (PF1, PF2, PPF, CM, PCL, NUM, MET, CROP, ENC) with semantic roles in tax-administration — consistent with Phase-30 positional grammar findings.
  • Inscriptions = logographic tax/trade records, NOT phonetic personal-name lists. This is a strong alternative to the reverse-Janabiyah phonetic reading approach.

Citation note: The PDF Ancient_Tax_Tokens_Trade_Licenses_and_Me.pdf was obtained via Academia.edu and is stored locally at C:\Users\trist\Downloads\Ancient_Tax_Tokens_Trade_Licenses_and_Me.pdf. NOT yet in the discovery database — add manually via Settings → Discovery → Import.

D.5c Mukhopadhyay 2023 — Semantic scope of Indus inscriptions (full published paper)

Used in: Phase-33 semantic reanalysis; direct follow-up and peer-reviewed expansion of D.5b (2020 preprint). Epistemic status: HIGH — open-access peer-reviewed; 15k+ accesses, 147 Altmetric score.

Ansumali Mukhopadhyay, Bahata. 2023. "Semantic scope of Indus inscriptions comprising taxation, trade and craft licensing, commodity control and access control: archaeological and script-internal evidence." Humanities and Social Sciences Communications 10: 972. https://doi.org/10.1057/s41599-023-02320-7. Open access.

Key claims relevant to Glossa Lab research:

  • Builds on the 2019 logographic/semasiographic structural analysis and the 2020 tax tokens preprint, providing archaeological cross-validation for the administrative-commercial interpretation of Indus seals and tablets.
  • Seals found near city gates (Harappa), craft workshops (Chanhu-daro), and public buildings (Mohenjo-daro) along with standardized weights → taxation.
  • Sealings on storage containers and "warehouse" chambers (Lothal) → commodity control and licensing roles consistent with Mukhopadhyay 2020.
  • Two-sided tablets: obverse = commercial license type; reverse = fee/quantity notations. Supports logographic NOT phonetic interpretation.
  • Strongly challenges any personal-name (anthroponym/toponym) reading approach; no proper nouns encoded in ISC.
  • Directly supports the Phase-32 T8 finding (Enmenanak personal-name signal NOT SIGNIFICANT under this framework).
  • Local PDF: C:\Users\trist\Downloads\s41599-023-02320-7.pdf

D.6 Daggumati & Revesz 2021 — Allograph detection via positional data mining

Used in: Phase-28 allograph family inspiration; sign-list reduction methodology.

Daggumati, Shruti & Peter Z. Revesz. 2021. "A method of identifying allographs in undeciphered scripts and its application to the Indus Valley Script." Humanities and Social Sciences Communications 8: 50. https://doi.org/10.1057/s41599-021-00713-0. Open access.

Key findings:

  • General positional data-mining method for identifying redundant (allographic) signs using sign position within inscriptions as the discriminating signal.
  • Applied to Indus Valley Script: finds 50 pairs (23 mirrored + 27 non-mirrored) that can be merged, reducing the estimated sign list significantly.
  • Shows multi-directionality of IVS: mirrored signs denote writing direction, not a semantic difference (except Type 5 grammatical marker cases).
  • Reduces sign count to ~417 unique graphemes; supports decipherment tractability.
  • Local PDF: C:\Users\trist\Downloads\s41599-021-00713-0.pdf
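The idea of using sign position as the discriminating signal can be illustrated with a small sketch. This is not Daggumati & Revesz's published procedure: the three-slot profile (initial, medial, final), the cosine similarity, the threshold, and all names below are assumptions chosen to keep the example short.

```python
import math
from collections import defaultdict
from itertools import combinations


def positional_profiles(inscriptions):
    """Count how often each sign is initial, medial, or final."""
    profiles = defaultdict(lambda: [0, 0, 0])
    for signs in inscriptions:
        last = len(signs) - 1
        for i, sign in enumerate(signs):
            # A single-sign inscription is counted as initial.
            slot = 0 if i == 0 else (2 if i == last else 1)
            profiles[sign][slot] += 1
    return dict(profiles)


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def candidate_pairs(profiles, threshold=0.95, min_count=2):
    """Return sign pairs whose positional profiles are nearly identical."""
    frequent = sorted(s for s, p in profiles.items() if sum(p) >= min_count)
    return [(a, b) for a, b in combinations(frequent, 2)
            if cosine(profiles[a], profiles[b]) >= threshold]


# Hypothetical toy inscriptions; F1/F2 and X/Y occupy the same slots.
toy = [["F1", "X", "T"], ["F2", "X", "T"], ["F1", "Y", "T"], ["F2", "Y", "T"]]
pairs = candidate_pairs(positional_profiles(toy))
```

Positional similarity alone only yields candidates: two unrelated signs can share a slot, so any pair found this way still needs visual and contextual checks before merging.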

D.6b Daggumati & Revesz 2018 — CNN-based cross-script similarity

Used in: Phase-30+ falsification context; comparative visual script analysis.

Daggumati, Shruti & Peter Z. Revesz. 2018. "Data Mining Ancient Script Image Data Using Convolutional Neural Networks." In Proceedings of the 22nd International Database Engineering & Applications Symposium (IDEAS 2018), Villa San Giovanni, Italy, June 18–20, pp. 1–6. ACM. https://doi.org/10.1145/3216122.3216163.

Key findings relevant to Glossa Lab:

  • Trained CNNs on Phoenician alphabet (22 symbols), Brahmi script (27 symbols), and Indus Valley Script (25 symbols) using 3,552 images (25×25px each).
  • Counterintuitive finding: Indus Valley Script symbols are visually marginally closer to the Phoenician alphabet than to Brahmi (the average match strengths below differ by under 0.006), despite the geographic proximity of Brahmi.
    • Phoenician avg. match strength (without duplicates): 0.6546
    • Brahmi avg. match strength (without duplicates): 0.6490
    • 14/22 Phoenician symbols uniquely mapped vs. only 13/27 Brahmi symbols.
  • Provides tentative phoneme assignments for 25 most-frequent Indus signs based on CNN-derived Phoenician nearest-neighbour matches.
  • Supports NW Semitic / Phoenician contact hypothesis (relevant to Fuls NW Semitic falsification experiments).
  • Local PDF: C:\Users\trist\Downloads\3216122.3216163.pdf

D.7 Dixit et al. 2025 — ASR-net + MI-net (Florida Tech)

Used in: Phase-30 multimodal AI integration target.

Dixit, Vaishnavi, Nushrat Hussain, Shubham Basak, Deva Atturu, Debasis Mitra, Ujjwal Bhattacharya. 2025. "Deep Learning in Archiving Indus Script and Motif Information." Journal of Computer Applications in Archaeology 8(1): 156-169. https://doi.org/10.5334/jcaa.175.

D.8 Bhaskar 2024 — Anisotropy in the Indus sign system

Used in: Phase-30 reference.

Bhaskar, M. V. 2024. "Markers and agencies of anisotropy in the Indus sign system." Indian Journal of History of Science 59: 1-27. https://doi.org/10.1007/s43539-023-00102-3.

D.9 Fuls 2013 — Positional analysis

Used in: Phase-22+ positional profiler / clustering.

Fuls, Andreas. 2013. "Positional Analysis of Indus Signs." Voprosi Epigrafiki (Epigrafika) 7(1): 253-275.

D.10 Konasukawa 2020 — Harappa stratigraphy

Konasukawa, A. 2020. "Stratigraphic study of inscribed objects from Harappa." Pp. ??? in Studies on Indus Script. Mohenjodaro: National Fund for Mohenjodaro.

D.11 Tsouparopoulou 2014 — Seal database methodology

Tsouparopoulou, Christina. 2014. "Creating an online database for the documentation of seals, sealings and seal impressions in the Ancient Near East." Studia Orientalia Electronica 2: 37-68.

D.12 Tamburini 2025 — Coupled SA for Ancient Script Decipherment

Used in: Phase-37 — CSA upgrade to Glossa Lab SA engine (k-permutations, chain coupling).

Tamburini, Fabio. 2025. "On automatic decipherment of lost ancient scripts relying on combinatorial optimisation and coupled simulated annealing." Frontiers in Artificial Intelligence 8: 1581129. https://doi.org/10.3389/frai.2025.1581129. Open access (CC BY). Code: https://github.com/ftamburin/CSA_OptMatcher

Key contributions:

  • Coupled SA (CSA): multiple SA chains running simultaneously that communicate periodically, converging to better solutions than independent parallel restarts.
  • k-permutations encoding: allows null mappings (signs left unassigned), one-to-many, and many-to-one mappings between sign sets. More expressive than bijective SA.
  • Fixed-anchor injection: partial knowledge of sign-to-phoneme mappings can be hard-coded as constraints — validates Glossa Lab's anchor approach.
  • Benchmarked on: Ugaritic→Hebrew (29/30 signs correct), Linear B→Mycenaean Greek, Romance language cognate identification. All outperform prior state-of-the-art.
  • Limitations for Indus: method targets bilingual corpora; Indus has no known bilingual text, so language identification remains the primary challenge.
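To make the ideas of null mappings, fixed anchors, and communicating chains concrete, here is a heavily simplified sketch. It is not Tamburini's CSA and does not use a true k-permutation encoding: it anneals an unconstrained sign-to-value mapping with a `None` option, keeps anchor assignments frozen, and periodically restarts the worst chain from the best state found. Every name, parameter, and the toy cost function are assumptions for illustration.

```python
import math
import random


def coupled_anneal(signs, values, cost, anchors, chains=4, steps=400,
                   sync_every=50, t0=1.0, cooling=0.99, seed=0):
    """Toy multi-chain annealer with fixed anchors and null mappings."""
    rng = random.Random(seed)
    free = [s for s in signs if s not in anchors]
    options = list(values) + [None]  # None means "sign left unassigned"

    def random_state():
        state = dict(anchors)  # anchors are injected and never changed
        for s in free:
            state[s] = rng.choice(options)
        return state

    states = [random_state() for _ in range(chains)]
    costs = [cost(s) for s in states]
    best_cost = min(costs)
    best = dict(states[costs.index(best_cost)])
    temp = t0
    for step in range(1, steps + 1):
        for i in range(chains):
            if not free:
                break
            proposal = dict(states[i])
            proposal[rng.choice(free)] = rng.choice(options)
            c = cost(proposal)
            delta = c - costs[i]
            # Metropolis rule: always accept improvements, sometimes accept
            # a worse state depending on the current temperature.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                states[i], costs[i] = proposal, c
                if c < best_cost:
                    best, best_cost = dict(proposal), c
        if step % sync_every == 0:
            # Simplified coupling: the worst chain restarts from the best state.
            worst = costs.index(max(costs))
            states[worst], costs[worst] = dict(best), best_cost
        temp *= cooling
    return best, best_cost


# Hypothetical toy problem: cost is the number of mismatches with a target.
target = {"s1": "a", "s2": "b", "s3": None, "s4": "c"}
mismatches = lambda m: sum(m[s] != target[s] for s in target)
mapping, score = coupled_anneal(list(target), ["a", "b", "c"], mismatches,
                                anchors={"s1": "a"})
```

In a real setting the cost would come from a language model or cognate-matching score rather than a known target, which is exactly what the Indus case lacks.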

E. Dravidian / South Indian comparative

E.1 Burrow & Emeneau 1984 — DEDR

Used in: Phase-25 Tamil-Brahmi typology fit; Phase-30 phoneme map expansion.

Burrow, Thomas & M. B. Emeneau. 1984. A Dravidian Etymological Dictionary (DEDR), 2nd edition. Oxford: Clarendon Press.

E.2 Krishnamurti 2003 — The Dravidian Languages

Krishnamurti, Bhadriraju. 2003. The Dravidian Languages. Cambridge: Cambridge University Press. ISBN 978-0-521-77111-5.

E.3 Balakrishnan 2019 — Journey of a Civilization: Indus to Vaigai

Used in: Phase-29 reference for IRC engagement.

Balakrishnan, R. 2019. Journey of a Civilization: Indus to Vaigai. Chennai: Roja Muthiah Research Library.

E.4 Joseph 2018 — Early Indians

Joseph, Tony. 2018. Early Indians: The Story of Our Ancestors and Where We Came From. New Delhi: Juggernaut. ISBN 978-93-86228-98-7.

E.5 Rajan & Sivananthan 2025 — TN Archaeology Dept comparative graffiti study

Used in: Phase-29 Tamil Nadu prize context.

Rajan, K. & R. Sivananthan. 2025. Indus Signs And Graffiti Marks of Tamil Nadu - A Morphological Study. Chennai: Tamil Nadu State Department of Archaeology.


F. Bahrain / Dilmun / Persian Gulf contact zone

F.1 Crawford 2001 — Saar seals

Used in: Phase-22 contact-zone reference (catalog of 95 Early Dilmun seals); Phase-30 acquisition target.

Crawford, Harriet. 2001. Early Dilmun Seals from Saar: Art and Commerce in Bronze Age Bahrain. London-Bahrain Archaeological Expedition: Saar Excavation Reports II. Ludlow: Archaeology International. Pp. 110. ISBN 0-9539561-0-5.

F.2 Laursen 2010 — Janabiyah seal #10

Used in: Phase-25 Janabiyah readout; Phase-27 reverse Janabiyah search; Phase-29 ReverseJanabiyahSearchV3 reference.

Laursen, Steffen Terp. 2010. "The westward transmission of Indus Valley sealing technology: origin and development of the 'Gulf Type' seal and other administrative technologies in Early Dilmun, c.2100-2000 BC." Arabian Archaeology and Epigraphy 21(2): 96-134. https://doi.org/10.1111/j.1600-0471.2010.00329.x.

F.3 Frenez 2018, 2020, 2024 — Indus-Bactrian contact

Frenez, Dennys. 2018. "Manufacturing and trade of Asian elephant ivory in Bronze Age Middle Asia: Evidence from Gonur Depe (Margiana, Turkmenistan)." Archaeological Research in Asia 15.

Frenez, Dennys. 2020. "Mirrored signs: Administrative and scriptorial information in the Indus Civilization clay sealings." Pp. 21-38 in Studies on the Indus Script.

F.4 Vidale & Frenez 2015 — Bactrian seals

Vidale, Massimo & Dennys Frenez. 2015. "Indus Components in the Iconography of a White Marble Cylinder Seal from Konar Sandal South (Kerman, Iran)." South Asian Studies 31(1): 144-154.

F.5 Possehl 2006 — Indus-Mesopotamia contact

Possehl, Gregory L. 2006. "Shu-ilishu's Cylinder Seal." Expedition 48(1): 42-43.

F.6 Potts 1990, 2016 — Susa archaeology

Potts, Daniel T. 1990. The Arabian Gulf in Antiquity, Vol. 1: From Prehistory to the Fall of the Achaemenid Empire. Oxford: Oxford University Press.

Potts, Daniel T. 2016. The Archaeology of Elam: Formation and Transformation of an Ancient Iranian State. 2nd ed. Cambridge: Cambridge University Press.


G. BMAC / Iranian / Bactrian contexts

G.1 Lerner 2010 — Bactria-Margiana seals

Lerner, Judith A. 2010. "Observations on the Typology and Style of Seals and Sealings from Bactria and the Indo-Iranian Borderlands." Pp. 245-266 in Coins, Art and Chronology II. Vienna: ÖAW.

G.2 Desset 2020-2022 — Linear Elamite decipherment

Used in: Phase-30 cross-civilizational target.

Desset, François et al. 2022. "The decipherment of Linear Elamite writing." Zeitschrift für Assyriologie und Vorderasiatische Archäologie.


H. Ancient DNA / population genetics

H.1 Narasimhan et al. 2019 — South Asian aDNA

Used in: Phase-30 prior on Dravidian-language hypothesis.

Narasimhan, Vagheesh M., Nick Patterson, Priya Moorjani, Nadin Rohland, Rebecca Bernardos, Swapan Mallick, Iosif Lazaridis, et al. 2019. "The formation of human populations in South and Central Asia." Science 365(6457): eaat7487. https://doi.org/10.1126/science.aat7487.

H.2 Reich 2018 — Who We Are and How We Got Here

Reich, David. 2018. Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. New York: Pantheon. ISBN 978-1-101-87034-6.

H.3 Shinde et al. 2019 — Rakhigarhi aDNA

Shinde, Vasant, Vagheesh M. Narasimhan, et al. 2019. "An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers." Cell 179(3): 729-735. https://doi.org/10.1016/j.cell.2019.08.048.


I. Comparative writing-system papers

I.3 Perna 2014 — Birth of Administration and Writing in Minoan Crete

Used in: Phase-33 comparative methodology; supports administrative-logographic interpretation; parallel evidence from a cretulae/sealing system functionally comparable to the IVC's. Epistemic status: HIGH — peer-reviewed book chapter, established Mycenologist.

Perna, Massimo. 2014. "The Birth of Administration and Writing in Minoan Crete: Some Thoughts on Hieroglyphics and Linear A." Chapter 19 in KE-RA-ME-JA: Studies Presented to Cynthia W. Shelmerdine, edited by Dimitri Nakassis, Joann Gulizio, and Sarah A. James. Prehistory Monographs 46. Philadelphia: INSTAP Academic Press. Pp. 251–259. ISBN 978-1-931534-76-5.

Key findings applicable to IVC research:

  • The Minoan cretulae system (clay sealings on storage vessels, door pegs) tracks commodity withdrawals without phonetic content — functions identically to the IVC sealings on Lothal "warehouse" containers (Mukhopadhyay 2023, D.5c).
  • Cretan Hieroglyphic (96 signs, ~350 docs) and Linear A (97 signs, ~1,500 docs, 8,000 tokens) coexisted as functionally independent systems: CH on seals, Linear A on clay tablets. Direct comparative model for IVC's seal-vs-tablet distinction.
  • 90% of Linear A tablets record only short economic entries (1–3 words). 708 flat-based nodules were shaped around folded parchment sheets — evidence that far richer content was written on perishable materials. Strongly supports: IVC inscriptions = durable fraction of a larger administrative system.
  • IVC sign count and token count are within normal range for Bronze Age logographic writing systems (not anomalously large).
  • Local PDF: C:\Users\trist\Downloads\The_Birth_of_Administration_and_Writing.pdf

I.4 Tóth 2018 — Beginnings of the Epigraphical Tradition in India

Used in: Phase-32/33 falsification context; confirms 1,500-year gap between IVC and Brāhmī; relevant to IVC→script descent questions. Epistemic status: MODERATE — doctoral dissertation summary (Eötvös Loránd University), examined by a defence committee.

Tóth, Ibolya. 2018. The Beginnings of the Epigraphical Tradition in India, with special regard to the Cultural Exchange between India and the Hellenistic World. Summary of Doctoral Dissertation. Doctoral School of Linguistics. Budapest: Eötvös Loránd University, Faculty of Humanities.

Key findings applicable to IVC research:

  • Confirms the 1,500-year gap: IVC ends ~1900 BCE; earliest confirmed Brāhmī inscriptions are Aśoka's Rock Edicts (~250 BCE). No epigraphic evidence bridges this gap. Decisive against any IVC→Brāhmī phonetic lineage claim.
  • Kharoṣṭhī derives from Aramaic (Northwest India, Achaemenid period) — NOT from IVC. If IVC encoded NW-Indian languages, the successor script tradition was externally replaced, not organically evolved.
  • Pāṇini (~4th century BCE) mentions lipikara (scribe) and yavanānī (Greek writing) — writing existed before Aśoka on perishable materials. Consistent with the Minoan parchment model (Perna 2014, I.3).
  • Brāhmī's geometric character suggests possible adaptation of a pre-existing script; connections to the IVC script proposed by Indian scholars are treated as speculative in the literature.
  • Local PDF: C:\Users\trist\Downloads\Summary_of_Doctoral_Dissertation_The_Beg.pdf

I.5 Kalyanaraman (undated) — Nine IVS Inscriptions in the Ancient Near East

Used in: Gulf corpus expansion reference; rebus decipherment alternative hypothesis. Epistemic status: LOW — unpublished working paper, no peer review; advocate of controversial rebus/Meluhha metalwork hypothesis. Use cautiously.

Kalyanaraman, S. (undated). "Nine Indus Script Inscriptions of Ancient Near East, catalogues of metalwork, lapidary work repertoire." Working paper. Available via Academia.edu and ANU Open Research Repository. hdl: 1885/162902.

Points of limited value for Glossa Lab:

  • Identifies Dilmun seals (Bahrain) and a Failaka (Kuwait) seal as bearing Indus script hieroglyphs, consistent with Laursen 2010 (F.2) and Gadd 1932 (F.7). The Dilmun seal images discussed by Elisabeth C.L. During Caspers (Proceedings of the Seminar for Arabian Studies 6, 1976, pp. 8–39) are an additional Gulf corpus reference.
  • Gadd Seal 1 from Ur (BM 120573): proposes the cuneiform text reads SAG KUSIDA (Sumerian 'chief' + Meluhha borrowing 'money-lender') with Indus bull field symbol = bharata 'metal alloy'. If correct, this would be the only bilingual IVC-cuneiform seal, highly relevant to the Gulf contact zone hypothesis. Speculative; not verified. Caution: Rebus punning methodology (Munda/Dravidian phonetic punning on pictographic readings) is not mainstream-accepted; etymological chains are long. Do NOT cite as established decipherment.
  • Local PDF: C:\Users\trist\Downloads\Nine_Indus_Script_Inscriptions_of_Ancien.pdf

I.6 Anonymous (undated) — Linguistic Lineages and Lost Scripts

Used in: Phase-32/33 cross-civilizational statistical benchmarking. Epistemic status: LOW — conference paper, no named authors, no peer review.

Anonymous. (undated). "Linguistic Lineages and Lost Scripts: A Cross-Civilizational Analysis of the Indus Valley Script." Conference paper.

Statistical findings (replicable, useful):

  • Based on Fuls (2019) A Catalogue of Indus Signs (~900 signs): Mean count 60.93, SD 129.61, Median 8, Max 867. χ² = 10,157.98 (p < 0.001) against uniform distribution. Confirms structured, non-random sign distribution consistent with a formal writing system. Replicates our Phase-22 Zipf findings with a different dataset.
  • Primary Core Signs (high-frequency) vs. Secondary Composite Signs (rare) classification mirrors the allograph-reduction methodology of Daggumati & Revesz 2021 (D.6).
  • Cross-script comparison: Oracle Bone Script (~50% visual similarity, highest), Egyptian Hieroglyphs (~30%), Proto-Elamite (~20%). Both IVC and OBS share short inscriptions (3–5 signs), core radical repetition, ideographic bases. Convergent function, not historical contact.
  • Seals-as-Geographical-Indication-tags hypothesis aligns with Mukhopadhyay 2023 (D.5c). Caution: Similarity percentages are visual estimates, not formally quantified. Dholavira "pottery" reading is speculative with no linguistic grounding.
  • Local PDF: C:\Users\trist\Downloads\LINGUISTIC_LINEAGES_AND_LOST_SCRIPTS_A_C.pdf
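
The chi-square test against a uniform distribution quoted in the statistical findings above can be reproduced in a few lines. The sketch below is illustrative only: the `counts` list is made-up toy data, not the Fuls (2019) sign frequencies, and `chi_square_uniform` is a hypothetical helper name that does not come from the paper or from the Glossa-Lab code base.

```python
from statistics import mean, median


def chi_square_uniform(counts):
    """Return (chi2, degrees_of_freedom) for observed counts vs. a uniform expectation."""
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one non-zero count")
    expected = sum(counts) / len(counts)  # uniform: every sign equally frequent
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts)
    return chi2, len(counts) - 1


# Toy, made-up sign frequencies (NOT the Fuls 2019 data): a few frequent
# signs followed by a long tail of rare ones.
counts = [120, 40, 20, 10, 5, 3, 1, 1]

chi2, dof = chi_square_uniform(counts)
print(f"mean={mean(counts):.2f} median={median(counts)}")
print(f"chi2={chi2:.2f} dof={dof}")
```

A p-value would come from comparing the statistic with a chi-square distribution on `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(chi2, dof)` if SciPy is available.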

I.7 Steele & Boyes (eds.) — Writing Around the Ancient Mediterranean (CREWS project)

Used in: Phase-33 comparative epigraphy methodology; Cypro-Minoan as IVC parallel; potmarks vs. script sign debates. Epistemic status: HIGH — Cambridge CREWS project, open access CC BY 3.0.

Steele, Philippa M. & Philip J. Boyes (eds.). (undated, ~2021). Writing Around the Ancient Mediterranean: Practices and Adaptations. CREWS (Contexts of and Relations between Early Writing Systems) project. Oxbow Books / Casemate. Open access: https://books.casematepublishing.com/Writing_Around_the_Ancient_Mediterranean.pdf License: CC BY 3.0.

Chapter 4 specifically: "Cypro-Minoan and its potmarks and vessel inscriptions as challenges to Aegean Scripts corpora." Authors from the CREWS team.

Key findings applicable to IVC research:

  • Methodological framework for distinguishing potmarks (makers' marks, possibly non-linguistic) from script inscriptions on pottery — the exact debate for IVC.
  • Methodology for building a unified sign repertory from mixed media (seals, pottery, tablets) with variant forms: directly applicable to IVC corpus construction.
  • Cypro-Minoan (~100 signs, ~230 inscriptions, 1550–1050 BCE, Cyprus trade context) is a high-quality comparandum: undeciphered, short inscriptions (avg. ~5 signs), traded objects as primary carriers. Structural profile matches IVC.
  • The CREWS methodological framework for allograph identification and sign classification mirrors the Phase-32 allograph work (Daggumati & Revesz 2021, D.6).
  • Local PDF: C:\Users\trist\Downloads\Ch_4_Cypro_Minoan_and_its_potmarks_and_v.pdf

I.8 Debugging Sign Repertory Construction — SE European signs methodology

Used in: Phase-32/33 sign repertory validation methodology. Epistemic status: MODERATE — scholarly chapter from an unidentified larger work.

Anonymous chapter (undated). "Debugging the Process of Building a Repertory of the Southeastern European Signs." Chapter 4, Part I. (~23 pages).

Key findings applicable to IVC research:

  • Explicit methodology for testing a sign repertory by using known scripts (Linear B used as benchmark) to validate the repertory construction process.
  • Southeastern European signs (Vinča culture, ~5000–3500 BCE) as another undeciphered prehistoric sign system — provides temporal and cultural comparanda outside the Near East.
  • The "debugging" approach — iterating a proposed sign list against known-answer systems before applying to unknown — is directly applicable to Phase-32/33 foundation checking. Caution: Author and publication details unknown; treat as methodological reference only.
  • Local PDF: C:\Users\trist\Downloads\Chapter_4_part_I_Debugging_the_process_o.pdf

I.9 Kalyanaraman (undated) — Fish-fin and reed-mollusc signs as Bronze Age mint

Used in: Phase-32/33 fish-sign family reassessment; alternative to Parpola mīn reading. Epistemic status: LOW — unpublished working paper, same author as I.5. Use cautiously.

Kalyanaraman, S. (undated). "Indus Script hieroglyphs 1. fish-fin 'khambhaṛā', 2. reed-mollusc 'eraka-sippi' signify Bronze Age mint." Working paper. (~25 pages).

Points of limited value for Glossa Lab:

  • Fish-fin sign: khambhaṛā → kammaṭa 'mint, coiner, coinage' — a third interpretation of the fish-sign family alongside (1) Parpola phonetic mīn and (2) Mukhopadhyay 2020 apotropaic eye-bead reading. All three agree the signs have commercial/craft significance.
  • Reed-mollusc sign: eraka 'molten metal' + sippi 'casting vessel' → smelting context. Supports the Phase-32 finding that fish-sign Enmenanak signal is NOT SIGNIFICANT under any of the three frameworks. Caution: Same methodological concerns as I.5; rebus chains are long and unverified.
  • Local PDF: C:\Users\trist\Downloads\Indus_Script_hieroglyphs_1_fish_fin_kham.pdf

I.1 Knight & Sproat 2009 — Cipher decipherment

Knight, Kevin & Richard Sproat. 2009. "Writing systems, transliteration and decipherment." NAACL Tutorial.

I.2 Robinson 2015 — Nature commentary

Robinson, Andrew. 2015. "Ancient civilization: Cracking the Indus script." Nature 526: 499-501. https://doi.org/10.1038/526499a.


J. Software / computational frameworks

J.1 PyMuPDF (fitz)

McKie, Jorj X. & contributors. 2016-present. PyMuPDF (fitz): Python bindings for the MuPDF library. https://github.com/pymupdf/PyMuPDF. License: AGPL.

J.2 Mistral AI (pixtral-12b-2409)

Used in: Phase-28 OCR.

Mistral AI Team. 2024. Pixtral-12B-2409 vision-language model. https://mistral.ai/. License: commercial API access.

J.3 ReportLab

Used in: Phase-26-29 PDF generation.

ReportLab Inc. ReportLab Open Source. https://www.reportlab.com/.


K. CC0 / Wikipedia-style sources

K.1 Wikipedia (where used for scratch reference, not as primary citation)

Wikipedia contributors. 2026. Various articles on Indus Valley Civilization, Mahadevan, Parpola, Mohenjo-daro, Harappa, Ur III, etc. License: CC BY-SA 4.0.


L. Glossa-Lab project (cite the project itself)

Pierson, Tristen Kyle & contributors. 2026. Glossa-Lab: An agentic computational linguistics research platform for statistical analysis and decipherment of ancient writing systems. BitConcepts LLC. GitHub: https://github.com/BitConcepts/glossa-lab. License: MIT (source code); CC BY 4.0 (research outputs in research/indus/).


License compatibility

| Source | License | Compatible with redistribution? |
| --- | --- | --- |
| Mahadevan 1977 (M77) | Public domain (Indian Government, ASI) | Yes |
| ePSD2 names | CC BY-SA | Yes (with attribution + ShareAlike) |
| CDLI tablets | CC BY-NC-SA 3.0 | Yes (non-commercial only) |
| Crawford 2001 PDF | Copyrighted, but freely uploaded to archive.org | Reference only |
| Wells 2006 PhD | Open access at archive.org | Yes |
| Wells 2015 | Copyrighted, Archaeopress | License purchase required |
| CISI 1, 2, 3.1, 3.2, 3.3 | Copyrighted, Suomalainen Tiedeakatemia | Reference only |
| Parpola 1994a | Copyrighted, CUP | Reference only |
| Parpola 2010 | Open conference paper | Reference + cite |
| Mahadevan papers (RMRL) | Released by RMRL with attribution | Yes (with attribution) |
| Fuls vol. 3, vol. 4 | Copyrighted, Independently published | Purchase required |
| ICIT database | Restricted (TU Berlin) | API access only |
| Yajnadevam 2024 | ResearchGate preprint | Reference only |
| Narasimhan 2019 | Science article | Subscription |
| Shinde 2019 | Cell article | Subscription |
| ETCSL | CC BY-SA | Yes |
| ICIPS / RMRL | Various | Yes (with attribution) |

How to cite the Glossa-Lab Indus pipeline

If you use any of the Phase-22 through Phase-29 pipeline outputs in academic work, please cite:

  1. The Glossa-Lab project itself (see Section L).
  2. Mahadevan 1977 if you use the M77 corpus or M77 sign codes.
  3. Parpola 1994a + Parpola 2010 if you use the phoneme map or iconographic anchors.
  4. ePSD2 (Tinney et al.) if you use the Sumerian/Akkadian names corpus (license: CC BY-SA; derivative works that incorporate the data must also be released under CC BY-SA).
  5. Joshi & Parpola 1987 + Shah & Parpola 1991 + Parpola, Pande & Koskikallio 2010, 2019, 2022 if you use any CISI sign sequences or find-spot data.
  6. Laursen 2010 if you use the Janabiyah seal #10 reading.
  7. Crawford 2001 if you use Saar seal data.
  8. Wells 2015 / Wells 2006 / Fuls 2022, 2023 if you use the Wells/Fuls sign list or ICIT corpus.
  9. CDLI for any cuneiform tablet citations.

If you use the Enmenanak or Enheduana finding from Phase-29, please note that the structural fit is suggestive but not yet statistically significant; cite the Phase-29 synthesis at reports/phase29_synthesis.md and the underlying ePSD2 + Parpola sources.


Acknowledgements (per-author attribution)

We are indebted to the following scholars whose work directly enabled this pipeline (alphabetical):

  • R. Balakrishnan — Indus Research Centre, Mahadevan's successor.

  • Dennys Frenez — Indus-BMAC contact-zone studies.

  • Andreas Fuls (TU Berlin) — ICIT database, Mathematica Epigraphica vols. 3-4, positional analysis methodology, anchor-amplification methodology.

  • Philip Jones — ePSD2 admin/oakk + admin/ed3b + Penn Sumerian Dictionary.

  • J. Mark Kenoyer — Harappa archaeological project (HARP).

  • Petteri Koskikallio — CISI Vol. 3.1, 3.2, 3.3 co-editor.

  • Steffen Terp Laursen — Bahrain/Dilmun seal corpus, Janabiyah reading.

  • Iravatham Mahadevan (1930-2018). In memoriam. The foundational M77 concordance, Tamil-Brahmi epigraphy, 40+ Indus-script papers.

  • Richard H. Meadow — HARP co-director.

  • Asko Parpola (Helsinki) — Decipherment hypothesis, CISI editor, phoneme map source. Helsinki research portal: https://researchportal.helsinki.fi/.

  • B. M. Pande — CISI Vol. 3 co-editor.

  • Daniel T. Potts — Tepe Yahya, Susa archaeology.

  • K. Rajan + R. Sivananthan — Tamil Nadu State Department of Archaeology.

  • Steve Tinney — Penn Sumerian Dictionary lead.

  • Niek Veldhuis — ePSD2 admin/ur3 + literary corpora.

  • Massimo Vidale — Indus-Iranian seal studies.

  • Bryan K. Wells — ICIT, Wells sign list, Wells 2006 PhD + Wells 2015 book.

  • Roja Muthiah Research Library (Chennai) — open access to Mahadevan's papers.

  • Tata Institute of Fundamental Research (TIFR) — Yadav, Vahia, Joglekar, Adhikari, Rao computational analyses.

  • University of Pennsylvania Museum — ePSD2 hosting.


Last updated: 2026-04-30. Maintained as part of the Glossa-Lab Indus decipherment pipeline. Any errors of attribution are the project's own. Please open an issue at https://github.com/BitConcepts/glossa-lab if you find any.


A.14 Seth & Kharakwal 2023 — Harappan Script Material from Kanmer, Gujarat

Used in: Phase-32/33 peripheral IVC corpus; site-specific sign data from Rann of Kutch. Epistemic status: HIGH — peer-reviewed chapter in 2023 academic volume, CC BY 4.0 Open Access.

Seth, Hansmukh & J. S. Kharakwal. 2023. "Harappan Script Material from Kanmer, Gujarat." In Research on Indus Civilization in the Wake of Hundred Years of Excavation at Harappa, edited by [editors]. Thiruvananthapuram: Department of Archaeology, University of Kerala. CC BY 4.0 Open Access. Local PDF: C:\Users\trist\Downloads\Indus_Script_Material_from_Kanmer_Gujara.pdf

Key value:

  • Provides inscription data from Kanmer, a coastal IVC site in Rann of Kutch, Gujarat (~2500–1800 BCE), excavated 2005–2012 by MS University Vadodara + Deccan College.
  • Kanmer is geographically peripheral (Gujarat coast, near Arabian Sea trade routes) — its sign usage pattern may reflect regional variation relevant to Phase-32 T3 bigram transition analysis.
  • 2023 publication = post-ICIT — may include inscriptions not in M77 or Wells/Fuls corpus.
  • The parent volume covers the Harappa centenary (100 years since 1924 discovery), providing historiographic breadth alongside the corpus data.
  • License: CC BY 4.0 — freely reusable with attribution.

A.13 Holdat LLC / Miller 2025 — Computational Indus Corpus

Used in: V8-V24 autonomous decipherment campaign; foundation check corpus verification; all analyses in backend/scripts/v8_autonomous_loop.py and v18_autonomous_loop.py.

Miller, William (Sr) (publishing as Holdat LLC / WILL DA BEATZ HOLDAT). 2025. Indus Valley Script: Computational Evidence for the Minimal Grammar Hypothesis. GitHub: holdatllc/indus_valley_repo. License: see repository LICENSE file.

  • Corpus: 1,670 seals, 7,002 sign tokens, 9 sites, 390 distinct sign IDs (Mahadevan M-numbers).
  • Fields: cisi_number, site, iconography, position, letters (M-number), semantic role annotations.
  • Important caveat: Independent computational study using Mahadevan M-sign numbering. Sign transcriptions have NOT been independently verified against CISI photographic plates. All sign identifications derive from Mahadevan 1977 (A.1) as processed by Miller. Use with caution for publication-grade work; cross-check against CISI plates before publishing any specific sign readings.
  • Acknowledgement: William Miller Sr authored the structural constraint model and provided the corpus CSV that powers the Holdat-based analysis phases.

F.7 Gadd 1932 — Seals of Ancient Indian Style Found at Ur

Used in: Gulf corpus acquisition plan (Laursen Table 1 nos. 16-21, 23-24); foundation check western Gulf corpus discussion.

Gadd, C. J. 1932. "Seals of Ancient Indian Style Found at Ur." Proceedings of the British Academy 18: 191-210. Archive scan: https://archive.org/download/in.gov.ignca.33779/33779_text.pdf

  • Identifies and illustrates 17 Indian-style seals from Ur excavations (1923-1930).
  • Confirmed museum numbers: BM 123208 (U.17649), Penn U.8685, BM 120228, BM 123059.

F.8 Kjærum 1983 — Failaka/Dilmun Seals

Used in: Gulf corpus (Laursen Table 1 nos. 12-13).

Kjærum, Poul. 1983. Failaka/Dilmun: The Second Millennium Settlements. Jutland Archaeological Society Publications XVII. Moesgaard. — Cat. nos. 279, 319 (Failaka seals with Indus-script inscriptions).

F.9 Kjærum 1994 — Qala'at al-Bahrain Seals

Used in: Gulf corpus (Laursen Table 1 nos. 6-7).

Kjærum, Poul. 1994. "Seals of 'Dilmun-type' from Failaka, Kuwait." In D.T. Potts, H.A. Al-Naboodah & P. Hellyer (eds.), Archaeology of the United Arab Emirates. London: Trident. — Figs. 1725, 1726 = Qala'at al-Bahrain seals.

F.10 Al-Sindi 1999 — Bahrain National Museum Seals

Used in: Gulf corpus (Laursen Table 1 nos. 8-9, 56).

Al-Sindi, Khalid M. 1999. Corpus of Seals from the Bahrain National Museum. Manama: Bahrain National Museum. — Nos. 160 (BBM 20362), 180 (BBM 18839), 182 (Saar/Karzakkan cemetery pieces).


E.6 Dravidian.py Derived Corpus (Glossa Lab, 2026)

Used in: v8/v18 autonomous loops (PDR phoneme inventories); Phase-32 T4 Tamil LM; P30-E1 Sanskrit falsification comparison.

Glossa Lab contributors. 2026. dravidian.py: Old Tamil / Proto-Dravidian phoneme inventory and attested vocabulary. File: backend/glossa_lab/data/dravidian.py GitHub: https://github.com/BitConcepts/glossa-lab

  • Data sources (all credited):
    • DEDR (Burrow & Emeneau 1984, E.1) — etymological roots and reconstructions
    • Parpola 1994a (C.1) — Dravidian rebus phoneme map
    • Parpola 2010 (C.2) — iconographic anchor phoneme assignments
    • Sangam Tamil literature (E.3) — attested Old Tamil personal names and vocabulary
    • Krishnamurti 2003 (E.2) — phonological system
  • Vocabulary: 1,740 entries (dict: Tamil word → gloss) + 2,155 attested forms
  • Corpus inscriptions: 1,297 Old Tamil inscription sequences (character level)
  • Important caveat: This is a DERIVED corpus compiled from the above sources. It is not independently peer-reviewed. Every entry is traceable to DEDR, Sangam, or Parpola. The phoneme inventories (PDR_INITIALS, PDR_MEDIALS, PDR_TERMINALS) use Classical Old Tamil forms, not phonologically reconstructed Proto-Dravidian. Label "OldTamil" or "Classical Tamil" in any publication; do NOT call it "PDR."

Citation Requirements Standard (v2, 2026-05-11)

Every data file, corpus file, and report generated by Glossa Lab MUST include citation metadata. This section defines the required format.

_citation block format (JSON files)

"_citation": {
  "primary_sources": ["A.1", "A.13"],
  "derivation": "Computed from Mahadevan M77 + Holdat corpus v2025",
  "authors_credited": [
    "Mahadevan, Iravatham (1977) — sign system",
    "Miller, William Sr / Holdat LLC (2025) — corpus transcription"
  ],
  "year_data": "1977/2025",
  "license": "M77 public domain (ASI); Holdat see repository LICENSE",
  "glossa_lab_version": "2026-05-11",
  "see_also": "CITATIONS.md sections A.1, A.13"
}

Required for every report/output file

Every reports/*.json file MUST have one of:

  1. "_citation" key with at least primary_sources and derivation
  2. "citations" array listing CITATIONS.md section IDs (e.g., ["A.1", "C.2"])
  3. Reference to foundation_check_report.json which covers the full citation audit
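
The three accepted forms can be checked mechanically. The following is a minimal sketch, not the project's actual foundation check: `has_citation_metadata` is a hypothetical helper, and because the standard does not say which key carries the reference to foundation_check_report.json, the third branch assumes it appears as a top-level string value.

```python
def has_citation_metadata(report):
    """Return True if a parsed report dict meets one of the three accepted forms."""
    # Form 1: "_citation" object with at least primary_sources and derivation.
    block = report.get("_citation")
    if isinstance(block, dict) and block.get("primary_sources") and block.get("derivation"):
        return True
    # Form 2: non-empty "citations" array of CITATIONS.md section IDs.
    citations = report.get("citations")
    if isinstance(citations, list) and citations and all(isinstance(c, str) for c in citations):
        return True
    # Form 3 (ASSUMPTION): the reference is a top-level string value that
    # mentions foundation_check_report.json; the standard names no key for it.
    return any(
        isinstance(value, str) and "foundation_check_report.json" in value
        for value in report.values()
    )


print(has_citation_metadata({"citations": ["A.1", "C.2"]}))
```

In practice a report would be loaded with `json.load` and passed to the helper; a `_citation` block that lacks `derivation` is rejected unless one of the other two forms is present.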

Required for every script

Every Python script that loads external data MUST have a docstring citing:

  • Author(s) of the data
  • CITATIONS.md section ID(s)
  • Any derivation from multiple sources
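
As an illustration, a compliant docstring and a simple way to extract the section IDs it cites might look like the sketch below. The example script text, the `cited_sections` helper, and the `SECTION_ID` pattern are all hypothetical; the pattern assumes section IDs of the form letter A to L, a dot, a number, and an optional lowercase suffix (as in D.5c).

```python
import ast
import re

# Hypothetical script source whose module docstring names the data authors,
# the CITATIONS.md section IDs, and the derivation.
EXAMPLE_SCRIPT = '''"""Load the Holdat corpus CSV and count sign tokens.

Data authors: Mahadevan, Iravatham (1977); Miller, William Sr / Holdat LLC (2025).
CITATIONS.md sections: A.1, A.13.
Derivation: M77 sign numbering as transcribed in the Holdat corpus.
"""
'''

# ASSUMPTION: section IDs look like "A.1", "A.13" or "D.5c".
SECTION_ID = re.compile(r"\b[A-L]\.\d+[a-z]?\b")


def cited_sections(source):
    """Return the CITATIONS.md section IDs mentioned in a module docstring."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    return SECTION_ID.findall(docstring)


print(cited_sections(EXAMPLE_SCRIPT))  # ['A.1', 'A.13']
```

A script with no module docstring yields an empty list, which a linter could treat as a failure of this requirement.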

Enforcement

See AGENTS.md Rule H19 (Citation Required) for enforcement policy. The foundation check (GET /api/v1/research/foundation-check) verifies that key data files have _citation metadata.


Acknowledgements additions (2026-05-11 update)

Additional acknowledgements since the last update:

  • William Miller Sr (Holdat LLC) — Independent computational analysis of 1,670 Indus seals; structural constraint model and minimal grammar hypothesis; provides the primary digital corpus for the V8-V24 distributional campaigns.
  • Iravatham Mahadevan (1930-2018) (additional note) — The dravidian.py vocabulary and the M77-based analyses are ultimately grounded in Mahadevan's 50 years of epigraphic work. His Tamil-Brahmi corpus (2003) is the only direct parallel corpus available to this project.
  • Burrow, Thomas & Emeneau, Murray Barnson — The DEDR (1984) underlies every Proto-Dravidian / Old Tamil phoneme assignment in this project.
  • Sangam poets (collectively, ~300 BCE–300 CE) — The Old Tamil attested vocabulary in dravidian.py derives from their poetry.

Last updated: June 2026. For attribution concerns contact tpierson@bitconcepts.tech — we respond within 48 hours. See also ATTRIBUTION.md.


Section I — ICIT-Scale Indus Corpus Reconstruction Sources (2026-05-14)

This section covers all sources acquired or planned for the ICIT-scale Indus corpus reconstruction project. Branch: corpus/icit-scale-reconstruction. All sources are cited per H18. Rights classes and acquisition status are tracked in glossa-corpus/indus/sources/*/provenance.yaml.

I.1 — mayig/indus-valley-script-corpus (GitHub, MIT)

  • Author: mcskware (GitHub: mayig)
  • Title: Indus Valley Script Corpus — digitization of CISI in JSON format
  • Repository: https://github.com/mayig/indus-valley-script-corpus
  • License: MIT
  • Accessed: 2026-05-14
  • Coverage: 179 Mohenjo-daro inscriptions (Parpola P-numbers); full repo may contain additional sites
  • Original corpus: Parpola, A. et al. (1987-2010). Corpus of Indus Seals and Inscriptions, Vols. 1-3. Suomalainen Tiedeakatemia, Helsinki.
  • Used in: indus_cisi.py (existing), corpus_indus_objectize.py (expanded), indus_corpus_v2.py
  • Rights gate: ML training OK (MIT); redistribution OK with attribution

I.2 — The Metropolitan Museum of Art Open Access

  • Author: The Metropolitan Museum of Art
  • Title: Met Museum Collections Open Access API
  • URL: https://collectionapi.metmuseum.org
  • License: CC0 (public domain objects)
  • Accessed: 2026-05-14 (ongoing)
  • Search endpoint: /public/collection/v1/search?hasImages=true&q=Indus+Valley
  • Object endpoint: /public/collection/v1/objects/{objectID}
  • Used in: corpus_indus_acquire_free.py, corpus_indus_objectize.py
  • Rights gate: CC0 objects: ML training OK, redistribution OK. Per-object isPublicDomain flag must be verified.
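
The two endpoints and the per-object rights check can be sketched as follows. This is an illustration under stated assumptions, not the contents of corpus_indus_acquire_free.py: the helper names are hypothetical, no network request is made, and the only response field relied on is the `isPublicDomain` flag named above.

```python
from urllib.parse import urlencode

BASE = "https://collectionapi.metmuseum.org/public/collection/v1"


def search_url(query, has_images=True):
    """Build the search URL; the response lists matching object IDs."""
    params = {"hasImages": str(has_images).lower(), "q": query}
    return f"{BASE}/search?{urlencode(params)}"


def object_url(object_id):
    """Build the URL for a single object record."""
    return f"{BASE}/objects/{object_id}"


def passes_rights_gate(record):
    """Accept a record only if the API explicitly marks it as public domain."""
    return record.get("isPublicDomain") is True


print(search_url("Indus Valley"))
```

Records that lack the flag are rejected rather than assumed to be CC0, which keeps the gate conservative.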

I.3 — Cleveland Museum of Art Open Access API

  • Author: Cleveland Museum of Art
  • Title: Cleveland Museum of Art Open Access
  • URL: https://openaccess-api.clevelandart.org
  • License: CC0 (public domain artworks)
  • Accessed: 2026-05-14 (ongoing)
  • API endpoint: https://openaccess-api.clevelandart.org/api/artworks/?q=indus
  • Sample object: https://www.clevelandart.org/art/1973.160
  • Used in: corpus_indus_acquire_free.py, corpus_indus_objectize.py
  • Rights gate: CC0 public-domain objects: ML training OK, redistribution OK. Unrestricted metadata and images via API.

I.4 — Penn Museum Collections Open Data

I.5 — Indian Culture Portal (Government of India)

I.6 — Roja Muthiah Research Library / Indus Research Centre

  • Author: Roja Muthiah Research Library, Chennai; Indus Research Centre
  • Title: RMRL Indus Script Portal + Bulletins (Mahadevan 1977 concordance based)
  • URL: https://rmrl.in/en/irc
  • Portal: https://indusscript.in
  • Bulletins: https://rmrl.in/bulletin/bulletin-No-{1-6}-{date}.pdf
  • License: RMRL research use — contact required for concordance export
  • Accessed: 2026-05-14
  • Rights class: rmrl-research — research use; redistribution requires contact
  • Note: RMRL states a new expanded concordance is in development — highest priority institutional contact.
  • Used in: corpus_indus_acquire_free.py, future concordance cooperation
  • Rights gate: Research use; contact RMRL (https://rmrl.in/en/irc) before any export or redistribution.

I.7 — Museums of India Repository

  • Author: Ministry of Culture, Government of India; C-DAC
  • Title: Museums of India Repository
  • URL: https://www.museumsofindia.gov.in/repository/
  • Endpoints:
    • Museum list: https://museumsofindia.gov.in/repository/collection/musuemList
    • Search: https://museumsofindia.gov.in/repository/search-api
  • License: Restrictive assumed; rights tracked per record
  • Rights class: india-museum-restricted — discovery and metadata reconciliation only
  • Used in: corpus_indus_acquire_free.py (metadata discovery only)
  • Rights gate: No ML training or redistribution without explicit per-record rights clearance.

I.8 — Internet Archive — Indus Script IIIF Sources

  • Author: Various (Mahadevan 1977 scan; corpus-vol-2 scan; other scans)
  • Title: Internet Archive IIIF — Indus Script OCR sources
  • URL: https://archive.org
  • Key items:
    • archive.org/details/TheIndusScript.TextConcordanceAndTablesIravathanMahadevan
    • archive.org/details/corpus-vol-2
  • IIIF manifest pattern: https://iiif.archive.org/iiif/{identifier}/manifest.json
  • License: Varies per item; treat as derivative fallback
  • Rights class: internet-archive-derivative — OCR seeding only
  • Used in: corpus_indus_acquire_free.py (IIIF manifests for OCR seed data)
  • Rights gate: Do NOT canonicalize readings without reconciliation against official editions. Do NOT include in ML training or redistribution.
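
The rights gates listed in this section can be summarized as a lookup table. The sketch below is hypothetical and does not reflect the schema of the provenance.yaml files: the three restricted class names come from the entries above, while "mit" and "cc0" are invented labels for the MIT and CC0 sources, and treating rmrl-research as closed to ML training is a conservative assumption because the entry only addresses export and redistribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsGate:
    ml_training: bool
    redistribution: bool
    note: str


# Restricted class names are taken from this section; "mit" and "cc0" are
# HYPOTHETICAL labels for the MIT and CC0 sources (I.1 to I.3).
RIGHTS_GATES = {
    "mit": RightsGate(True, True, "redistribution with attribution"),
    "cc0": RightsGate(True, True, "verify per-object public-domain status"),
    # ASSUMPTION: ML training is denied until RMRL has been contacted.
    "rmrl-research": RightsGate(False, False, "research use; contact RMRL first"),
    "india-museum-restricted": RightsGate(False, False, "metadata discovery only"),
    "internet-archive-derivative": RightsGate(False, False, "OCR seeding only"),
}


def allowed(rights_class, use):
    """Return True if `use` ("ml_training" or "redistribution") is permitted."""
    if use not in ("ml_training", "redistribution"):
        raise ValueError(f"unknown use: {use}")
    gate = RIGHTS_GATES.get(rights_class)
    if gate is None:
        return False  # unknown rights classes are denied by default
    return getattr(gate, use)


print(allowed("internet-archive-derivative", "ml_training"))  # False
```

Denying unknown classes by default means a source with a missing or misspelled rights class cannot slip into training or redistribution.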

Section I added: 2026-05-14. Branch: corpus/icit-scale-reconstruction.