This document lists every primary, secondary, and tertiary data source used in the Glossa-Lab Indus decipherment pipeline (Phase-22 through Phase-29). Each source is cited per the author's prescribed citation format where given, otherwise per standard archaeological / linguistic conventions (Author Year, full reference, publisher, ISBN/DOI, license, URL).
If you use the Glossa-Lab Indus pipeline in academic work, please cite all of the underlying data sources below in addition to the Glossa-Lab project itself.
Used in: Phase-22 baseline corpus reference; Phase-28 Mahadevan-Parpola crosswalk; Phase-29 MahadevanInscriptionLoader (1,669 inscriptions, 5,361 sign tokens).
Mahadevan, Iravatham. 1977. The Indus Script: Texts, Concordance and Tables. Memoirs of the Archaeological Survey of India, No. 77. New Delhi: Archaeological Survey of India. Pp. 825.
- Available: Internet Archive, TheIndusScript.TextConcordanceAndTablesIravathanMahadevan (full 34.6 MB OCR'd PDF, 27,021 views, 49 favorites). https://archive.org/details/TheIndusScript.TextConcordanceAndTablesIravathanMahadevan
- Mirror: masi77indusscripttextsconcordancestablesiravathammahadevanalt_443_h (alternate upload by Murali Warrier, CC0 1.0).
- Author's preferred citation (per RMRL Digital Library): "Mahadevan 1977" or "M77" with full reference as above.
- Posthumous note: Iravatham Mahadevan (1930-2018) was an epigraphist, IAS officer, and India's foremost Indus script scholar. The Roja Muthiah Research Library (RMRL) maintains the canonical collection of his papers at https://rmrl.in/en/dl/research-papers/mahadevan.
BibTeX:
@book{mahadevan1977,
author = {Mahadevan, Iravatham},
title = {The Indus Script: Texts, Concordance and Tables},
series = {Memoirs of the Archaeological Survey of India},
number = {77},
publisher = {Archaeological Survey of India},
address = {New Delhi},
year = {1977},
pages = {825}
}
Used in: Phase-22 contact-zone reference; Phase-25 sign-list cross-checking; Phase-28 iconographic anchor figures.
Joshi, Jagat Pati & Asko Parpola (eds.). 1987. Corpus of Indus Seals and Inscriptions. 1: Collections in India. Annales Academiae Scientiarum Fennicae, Series B, vol. 239. Memoirs of the Archaeological Survey of India, vol. 86. Helsinki: Suomalainen Tiedeakatemia (Finnish Academy of Science and Letters). Pp. xxxii + 392. ISBN 951-41-0555-9. ISSN 0066-2011.
- Cited as: "CISI 1" or "Joshi & Parpola 1987" per Parpola's prescription (Parpola 1994a, 2010, 2018).
- Publisher contact: Tiedekirja, Helsinki (https://tiedekirja.fi/).
Used in: Same as A.2.
Shah, Sayid Ghulam Mustafa & Asko Parpola (eds.). 1991. Corpus of Indus Seals and Inscriptions. 2: Collections in Pakistan. Annales Academiae Scientiarum Fennicae, Series B. Helsinki: Suomalainen Tiedeakatemia. Pp. xxxii + 448. ISBN 951-41-0556-7.
Used in: Phase-26 find-spot map cross-checking; Phase-28 catalogue reference. To acquire: Phase-30 priority. €220 from Tiedekirja or via ILL.
Parpola, Asko, B. M. Pande & Petteri Koskikallio (eds.). 2010. Corpus of Indus Seals and Inscriptions. Volume 3: New material, untraced objects, and collections outside India and Pakistan. Part 1: Mohenjo-daro and Harappa. In collaboration with Richard H. Meadow & J. Mark Kenoyer. Annales Academiae Scientiarum Fennicae, Humaniora 359. Memoirs of the Archaeological Survey of India, vol. 96. Helsinki: Suomalainen Tiedeakatemia. Pp. lx + 444. ISBN 978-951-41-1040-5.
- Preview PDF: Academia.edu (Petteri Koskikallio) — 17-page front-matter. https://www.academia.edu/89021072
Used in: Phase-30 target. €160 from Tiedekirja.
Parpola, Asko, B. M. Pande & Petteri Koskikallio (eds.). 2019. Corpus of Indus Seals and Inscriptions. Volume 3.2: Shahr-i Sokhta, Mundigak, Mehrgarh, Nausharo, Sibri, Dauda-damb, Chanhu-daro, Ahar, Balathal, Gilund, Kalibangan, Rojdi. In collaboration with Massimo Vidale, Alessandra Lazzari, Catherine Jarrige, Jean-François Jarrige, Gonzague Quivron, Hélène Trompetent. Annales Academiae Scientiarum Fennicae, Humaniora 383. Helsinki: Suomalainen Tiedeakatemia. Pp. 394. ISBN 978-951-41-1134-1.
Used in: Phase-28 OCR target (40-page front-matter only — full plates volume needs separate acquisition).
Parpola, Asko & Petteri Koskikallio (eds.). 2022. Corpus of Indus Seals and Inscriptions. Volume 3.3: Indo-Iranian Borderlands. Annales Academiae Scientiarum Fennicae, Humaniora 386. Tampere: Suomalainen Tiedeakatemia. Pp. lxxxvii + 683. ISBN 978-951-41-1153-2.
- Review: Fuls, Andreas. 2022. "Corpus of Indus Seals and Inscriptions. Volume 3.3 Indo-Iranian Borderlands." Iranian Journal of Archaeological Studies 12(2): 139-143. https://doi.org/10.22111/ijas.2022.45467.1268
- Includes: Potts, D. T. 2022. "The Graffiti from Tepe Yahya and the Role of the Indo-Iranian Borderlands in the Formation of the Harappan Writing System." Pp. xxvii-xxxiii in CISI 3.3.
Used in: Phase-22 sign list reference; Phase-28 fish-family allograph membership; Phase-30 target for full integration.
Wells, Bryan K. 2015. The Archaeology and Epigraphy of Indus Writing. With technical appendices by Andreas Fuls. Oxford: Archaeopress. Pp. x + 143. ISBN paperback 978-1-78491-046-4. ISBN epublication 978-1-78491-047-1.
- Available: Archaeopress eBook £15.83 (private use). Google Play Books $22. https://www.archaeopress.com/Archaeopress/Products/9781784910464
Used in: Phase-22 sign list (676 graphemes); Phase-29 Wells corpus.
Wells, Bryan Kenneth. 2006. Epigraphic Approaches to Indus Writing. PhD dissertation, Harvard University, Department of Anthropology. Pp. ~400.
- Open access: Internet Archive https://archive.org/details/epigraphicapproachestoinduswritingbryankennethwellsphd.thesis_230_Z
Used in: Phase-29 MathematicaEpigraphicaLoader (no-op default; not yet acquired). Expected: 5,509 inscriptions, 19,616 sign occurrences.
Fuls, Andreas. 2022 (1st ed.) / 2023 (2nd ed.). Corpus of Indus Inscriptions. Mathematica Epigraphica vol. 3. Berlin: Independently published. Pp. 582. ISBN paperback 978-1-67180-486-9.
- Author's preferred citation: "Fuls 2022" or "Fuls 2023" (specifying edition); see https://www.epigraphica.de/indus/.
- Available: Amazon paperback ~$45 / Kindle. Free PDF on Academia.edu https://www.academia.edu/83046175.
- Author email: andreas.fuls@tu-berlin.de.
Used in: Phase-30 target — sign list (~700 graphemes) cross-validation.
Fuls, Andreas. 2023. A Catalog of Indus Signs. Mathematica Epigraphica vol. 4. Berlin: Independently published. Pp. ~540. ISBN 979-8-398-42230-6.
- Available: Amazon paperback. Free PDF on Academia.edu https://www.academia.edu/103538728 and ResearchGate https://www.researchgate.net/publication/373522673_A_Catalog_of_Indus_Signs.
Used in: Phase-29 ICITCorpusLoader (no-op default; API access by request).
Expected: 4,537 objects, 5,509 texts, 19,616 sign occurrences.
Wells, Bryan K. & Andreas Fuls. 2008-present. Interactive Corpus of Indus Texts (ICIT). Online database, TU Berlin. https://www.epigraphica.de/indus/.
- Citation prescription: "Wells & Fuls (ICIT, accessed YYYY-MM-DD)."
- API access: by email to andreas.fuls@tu-berlin.de.
Used in: Phase-25 typology fit (KL=0.0033 vs Indus); Phase-30 expansion target.
Mahadevan, Iravatham. 2003. Early Tamil Epigraphy: From the Earliest Times to the Sixth Century A.D. Harvard Oriental Series 62. Cambridge, MA: Harvard University Press. ISBN 978-0-674-01227-1.
Mahadevan, Iravatham. 2014. Early Tamil Epigraphy: Tamil-Brahmi Inscriptions. Revised and enlarged 2nd edition, Volume 1. Chennai: Central Institute of Classical Tamil.
Used in: Phase-29 EPSD2NamesLoader (4,848 entries: 1,222 PN, 2,068 DN, ...).
Tinney, Steve, Philip Jones, Niek Veldhuis, et al. 2017-present. electronic Pennsylvania Sumerian Dictionary 2nd Edition (ePSD2). Version 2.7.2 (released 2024-08-31). Philadelphia: University of Pennsylvania Museum of Archaeology and Anthropology, Babylonian Section. http://oracc.org/epsd2.
- Citation prescription (per ePSD2 home page): "ePSD2 (URL accessed YYYY-MM-DD)."
- License: CC BY-SA. The Pennsylvania Sumerian Dictionary Project, 2017-.
- Bulk download: https://oracc.museum.upenn.edu/json/epsd2-names.zip (4.5 MB ZIP, 37 MB uncompressed gloss-qpn.json).
- Subset shipped at: backend/glossa_lab/data/epsd2_names_subset.json (842 KB, 4,848 entries).
- Major credits (per ePSD2 News): Niek Veldhuis (admin/ur3), Philip Jones (admin/ed3b/oakk), Steve Tinney (overall), John Carnahan (Drehem names), Jana Matuszak (DSSt). 50,000+ instances of names total.
Used in: Phase-29 cross-reference; Phase-30 expansion target.
Black, Jeremy A., Graham Cunningham, Jarle Ebeling, Esther Flückiger-Hawker, Eleanor Robson, Jon Taylor, Gábor Zólyomi. 1998-2006. The Electronic Text Corpus of Sumerian Literature (ETCSL). Oxford: University of Oxford, Faculty of Oriental Studies. https://etcsl.orinst.ox.ac.uk.
- Citation prescription (per ETCSL): "[ETCSL] [Composition] (revised: date)."
Used in: Phase-22 Meluhha tablet extraction (1,462 tablets); ongoing reference.
Englund, Robert K., Bertrand Lafont, Klaus Wagensonner, et al. 2000-present. Cuneiform Digital Library Initiative (CDLI). Berlin & Los Angeles: CDLI. https://cdli.mpiwg-berlin.mpg.de.
- Citation prescription: "CDLI Pxxxxxx (https://cdli.mpiwg-berlin.mpg.de/Pxxxxxx)."
- License: CC BY-NC-SA 3.0.
Used in: Phase-26 cross-reference (via ePSD2 import).
Molina, Manuel et al. 2002-present. Base de Datos de Textos Neosumerios (BDTNS). Madrid: CSIC. http://bdtns.filol.csic.es.
Used in: Phase-30 contact-zone PN expansion target.
Vandorpe, Lieselot. 2015 (PhD)/2019 (publication). East Side Story: Susa under the Sukkalmah Dynasty (1930-1450 B.C.) — A prosopographical study. PhD dissertation, Ghent University. https://www.academia.edu/14695347.
Used in: Phase-30 reference.
De Graef, Katrien. 2019. "Susa under the Sukkalmah Dynasty: A Society Between the Mountains and the Plain." Pp. 84-110 in Elam in the 2nd Millennium BC. Tübingen: Mohr Siebeck.
Used in: Phase-30 reference for Akkadian-Susa contact period.
Steinkeller, Piotr. 2013. "Puzur-Inšušinak at Susa: A Pivotal Episode of Early Elamite History Reconsidered." In Susa and Elam: Archaeological, Philological, Historical and Geographical Perspectives. Mémoires de la Délégation en Perse.
Used in: Phase-25 phoneme map foundation; Phase-27 anchor scoring; Phase-28 expanded phoneme map; the central Dravidian decipherment hypothesis.
Parpola, Asko. 1994a. Deciphering the Indus Script. Cambridge: Cambridge University Press. Pp. xxiii + 374. ISBN 978-0-521-43079-1.
- Reprint: 2009 paperback edition.
- Available: Internet Archive https://archive.org/details/decipheringindus0000parp.
Used in: Phase-27 iconographic anchors (12 anchors from figs. 5-23); Phase-28 fish-family allograph extension.
Parpola, Asko. 2010. "A Dravidian solution to the Indus script problem." Coimbatore: Kalaignar M. Karunanidhi Classical Tamil Award lecture (World Classical Tamil Conference, 25 June 2010). Pp. 39.
- Available: Helsinki research portal https://researchportal.helsinki.fi/.
Used in: Phase-28 yoke-carrier phoneme entry (kavai).
Parpola, Asko. 1981. "On the Harappan yoke-carrier pictogram and the kavai worship." Pp. ??? in Proceedings of the 5th International Conference on Tamil Studies.
Used in: Phase-28 buffalo (erumai) phoneme entry; Phase-30 expansion target.
Parpola, Asko. 1985. The Sky-Garment: A study of the Harappan religion and its relation to the Mesopotamian and later Indian religions. Studia Orientalia 57. Helsinki: Finnish Oriental Society.
Parpola, Asko. 2004. "From archaeology to stratigraphy of Vedic syncretism: The banyan tree and the water buffalo as Harappan-Dravidian symbols of royalty." In Vedic Studies (vol. ed. ???).
Parpola, Asko. 2018. "Indus Seals and Glyptic Studies: An Overview." In Seals and Sealing in the Ancient World: Case Studies from the Near East, Egypt, the Aegean, and South Asia. Cambridge: Cambridge University Press. https://researchportal.helsinki.fi/en/publications/indus-seals-and-glyptics-studies-an-overview.
Used in: Phase-25-29 phoneme readings; Phase-28 Murukan signs.
Mahadevan, Iravatham. 1970-2018. Research Papers (40+ papers). Chennai: Roja Muthiah Research Library, Indus Research Centre. https://rmrl.in/en/dl/research-papers/mahadevan.
Key papers used:
- "Dravidian Parallels in Proto-Indian Script."
- "Study of the Indus Script through Bi-lingual Parallels."
- "Towards a Grammar of the Indus Texts: 'Intelligible to the eye, if not to the ears'." Tamil Civilisation 4(3-4): 133-143.
- "Murukaṉ in the Indus Script."
- "Meluhha and Agastya: Alpha and Omega of the Indus Script."
- "Vestiges of Indus Civilisation in Old Tamil."
- "Akam and Puram: 'Address' Signs of the Indus Script."
- Dravidian Proof of the Indus Script via the Rig Veda: A Case Study. Indus Research Centre Bulletin, Roja Muthiah Research Library.
- 2017/2018. Toponyms, Directions and Tribal Names in the Indus Script. With M. V. Bhaskar. Oxford: Archaeopress.
Used in: Phase-30 falsification round target. Note: The lipi repository corpus data is derived from Fuls' ICIT corpus. Yajnadevam's Sanskrit readings are a separate publication.
Yajnadevam (pseudonym). 2024. "A cryptanalytic decipherment of the Indus script." ResearchGate preprint, November 2024. https://www.researchgate.net/publication/387756000_A_cryptanalytic_decipherment_of_the_Indus_script.
Used in: Phase-30 falsification round target.
H. Muhammad, Mahaveer. 2023. The Alphabet of the Sindhu Prakrit: The decipherment of the Indus Script. Eliva Press.
H. Muhammad, Mahaveer. 2024. "The Decoded Indus Seal M-282." Preprints 2024.07.2105.v1. https://doi.org/10.20944/PREPRINTS202407.2105.V1.
H. Muhammad, Mahaveer. 2024-12-27. "The Indus Elephant Seals: The Royal Terminology of The Indus Valley Civilization." SSRN Working Paper 5073283. https://ssrn.com/abstract=5073283.
Used in: Phase-30 falsification round target.
Neukart, Florian. 2025. "Cracking the Code of the Indus Valley Civilization: A Computational Approach to Lost Knowledge." SSRN Working Paper 5141753. https://ssrn.com/abstract=5141753.
Used in: Phase-30 falsification round target.
Rao, Shikaripur Ranganatha. 1982. The Decipherment of the Indus Script. Bombay: Asia Publishing.
Used in: Phase-30 reference.
Bonta, Steven. 1996. Topics in the Study of the Indus Valley Script. Brigham Young University.
Bonta, Steven. 2010. The Indus Valley Script: A New Interpretation. Pennsylvania State University.
Used in: Phase-15-style structural rebuttals (anti-decipherment hypothesis).
Farmer, Steve, Richard Sproat & Michael Witzel. 2004. "The Collapse of the Indus-Script Thesis: The Myth of a Literate Harappan Civilization." Electronic Journal of Vedic Studies 11(2): 19-57. http://go.nature.com/vasrw5.
Used in: Phase-15 entropy benchmarks.
Rao, Rajesh P. N., Nisha Yadav, Mayank N. Vahia, Hrishikesh Joglekar, R. Adhikari, Iravatham Mahadevan. 2009. "Entropic Evidence for Linguistic Structure in the Indus Script." Science 324: 1165. https://doi.org/10.1126/science.1170391.
Used in: Phase-15 falsification reference.
Sproat, Richard. 2014. "A Statistical Comparison of Written Language and Nonlinguistic Symbol Systems." Language 90(2): 457-481.
Used in: Phase-22+ Zipf statistics.
Yadav, Nisha, Hrishikesh Joglekar, Rajesh P. N. Rao, Mayank N. Vahia, R. Adhikari, Iravatham Mahadevan. 2010. "Statistical analysis of the Indus script using n-grams." PLOS ONE 5(3): e9506. https://doi.org/10.1371/journal.pone.0009506.
Used in: Phase-22+ direction normalisation.
Ashraf, Mohammed Imran & Sitabhra Sinha. 2018. "The 'handedness' of language: Directional symmetry breaking of sign usage in words." PLOS ONE 13(1): e0190735. https://doi.org/10.1371/journal.pone.0190735.
Used in: Phase-29 reference for ePSD2 integration plan.
Ansumali Mukhopadhyay, Bahata. 2019. "Interrogating Indus inscriptions through their context, structure and compositional semantics, to understand their inner logic of message conveyance." Palgrave Communications. https://doi.org/10.2139/ssrn.3184583.
Used in: Phase-33 semantic reanalysis of fish-sign interpretation. Epistemic status: HIGH — directly challenges the fish-sign = mīn phonetic reading used in Phase-29d.
Ansumali Mukhopadhyay, Bahata. 2020. "Ancient Tax Tokens, Trade Licenses and Metrological Records?: Making Sense of Indus Inscribed Objects Through Script-Internal, Contextual, Linguistic, and Ethnohistorical Lenses." Preprint. https://doi.org/10.2139/ssrn.3538764.
Key claims relevant to Glossa Lab research:
- Fish signs (M-047, M-306 etc.) signify apotropaic eye-beads (carnelian/agate "fish-eye-beads" documented in Mesopotamian lexicons as NA4-IGI-KU6), NOT the phonetic reading mīn (fish/star) used in Phase-29d reverse-Janabiyah search. This supports the Phase-32 T8 finding (Enmenanak signal NOT SIGNIFICANT).
- Bird signs signify lapis lazuli / precious stones via the ancient root "Kapautaka" (pigeon-coloured, Sanskrit/Old Persian for lapis lazuli blue).
- Sign maṇi (bead/eye/gem/amulet) connects Tamil, Sanskrit, and Akkadian maninnu (Amarna letters), providing independent support for the IVC–Mesopotamia trade network hypothesis.
- 9 structural sign-classes (PF1, PF2, PPF, CM, PCL, NUM, MET, CROP, ENC) with semantic roles in tax-administration, consistent with Phase-30 positional grammar findings.
- Inscriptions = logographic tax/trade records, NOT phonetic personal-name lists. This is a strong alternative to the reverse-Janabiyah phonetic reading approach.
Citation note: The PDF Ancient_Tax_Tokens_Trade_Licenses_and_Me.pdf was obtained via Academia.edu and is stored locally at C:\Users\trist\Downloads\Ancient_Tax_Tokens_Trade_Licenses_and_Me.pdf. It is NOT yet in the discovery database; add it manually via Settings → Discovery → Import.
Used in: Phase-33 semantic reanalysis; direct follow-up and peer-reviewed expansion of D.5b (2020 preprint). Epistemic status: HIGH — open-access peer-reviewed; 15k+ accesses, 147 Altmetric score.
Ansumali Mukhopadhyay, Bahata. 2023. "Semantic scope of Indus inscriptions comprising taxation, trade and craft licensing, commodity control and access control: archaeological and script-internal evidence." Humanities and Social Sciences Communications 10: 972. https://doi.org/10.1057/s41599-023-02320-7. Open access.
Key claims relevant to Glossa Lab research:
- Builds on the 2019 logographic/semasiographic structural analysis and the 2020 tax tokens preprint, providing archaeological cross-validation for the administrative-commercial interpretation of Indus seals and tablets.
- Seals found near city gates (Harappa), craft workshops (Chanhu-daro), and public buildings (Mohenjo-daro) along with standardized weights → taxation.
- Sealings on storage containers and "warehouse" chambers (Lothal) → commodity control and licensing roles consistent with Mukhopadhyay 2020.
- Two-sided tablets: obverse = commercial license type; reverse = fee/quantity notations. Supports logographic NOT phonetic interpretation.
- Strongly challenges any personal-name (anthroponym/toponym) reading approach; no proper nouns encoded in ISC.
- Directly supports the Phase-32 T8 finding (Enmenanak personal-name signal NOT SIGNIFICANT under this framework).
- Local PDF: C:\Users\trist\Downloads\s41599-023-02320-7.pdf
Used in: Phase-28 allograph family inspiration; sign-list reduction methodology.
Daggumati, Shruti & Peter Z. Revesz. 2021. "A method of identifying allographs in undeciphered scripts and its application to the Indus Valley Script." Humanities and Social Sciences Communications 8: 50. https://doi.org/10.1057/s41599-021-00713-0. Open access.
Key findings:
- General positional data-mining method for identifying redundant (allographic) signs using sign position within inscriptions as the discriminating signal.
- Applied to Indus Valley Script: finds 50 pairs (23 mirrored + 27 non-mirrored) that can be merged, reducing the estimated sign list significantly.
- Shows multi-directionality of IVS: mirrored signs denote writing direction not semantic difference (except Type 5 grammatical marker cases).
- Reduces sign count to ~417 unique graphemes; supports decipherment tractability.
- Local PDF: C:\Users\trist\Downloads\s41599-021-00713-0.pdf
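The positional idea summarised above can be illustrated with a small sketch. This is not the authors' published algorithm: it is a simplified stand-in that assumes each inscription is a list of sign IDs in reading order, reduces position to three slots (initial, medial, final), and flags sign pairs whose slot distributions are close in total variation distance. The function names, the three-slot reduction, and the `max_distance` and `min_count` thresholds are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

SLOTS = ("initial", "medial", "final")

def positional_profiles(inscriptions):
    """Return {sign: [p_initial, p_medial, p_final]} for a list of sign sequences."""
    counts = defaultdict(Counter)
    for text in inscriptions:
        last = len(text) - 1
        for i, sign in enumerate(text):
            # A one-sign inscription is counted as "initial" (arbitrary choice).
            if i == 0:
                slot = "initial"
            elif i == last:
                slot = "final"
            else:
                slot = "medial"
            counts[sign][slot] += 1
    profiles = {}
    for sign, c in counts.items():
        total = sum(c.values())
        profiles[sign] = [c[s] / total for s in SLOTS]
    return profiles

def candidate_allographs(inscriptions, max_distance=0.2, min_count=3):
    """List (sign_a, sign_b, distance) pairs with similar positional profiles."""
    profiles = positional_profiles(inscriptions)
    freq = Counter(s for text in inscriptions for s in text)
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        if freq[a] < min_count or freq[b] < min_count:
            continue  # too rare for a stable profile
        # Total variation distance between the two slot distributions.
        d = 0.5 * sum(abs(x - y) for x, y in zip(profiles[a], profiles[b]))
        if d <= max_distance:
            pairs.append((a, b, round(d, 3)))
    return pairs

# Toy corpus with invented sign IDs; "A" and "A2" stand in for a mirrored pair.
toy = [
    ["A", "X", "Z"], ["A2", "X", "Z"], ["A", "Y", "Z"],
    ["A2", "Y", "Z"], ["A", "X", "Z"], ["A2", "Y", "Z"],
]
print(candidate_allographs(toy))
```

On the toy corpus the medial signs "X" and "Y" are flagged alongside "A" and "A2", which shows why positional similarity alone only yields candidates: a shared slot is necessary for a merge but does not by itself establish that two signs are allographs.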
Used in: Phase-30+ falsification context; comparative visual script analysis.
Daggumati, Shruti & Peter Z. Revesz. 2018. "Data Mining Ancient Script Image Data Using Convolutional Neural Networks." In Proceedings of the 22nd International Database Engineering & Applications Symposium (IDEAS 2018), Villa San Giovanni, Italy, June 18–20, pp. 1–6. ACM. https://doi.org/10.1145/3216122.3216163.
Key findings relevant to Glossa Lab:
- Trained CNNs on Phoenician alphabet (22 symbols), Brahmi script (27 symbols), and Indus Valley Script (25 symbols) using 3,552 images (25×25px each).
- Counterintuitive finding: Indus Valley script symbols are visually CLOSER to Phoenician alphabet than to Brahmi, despite geographic proximity to Brahmi.
- Phoenician avg. match strength (without duplicates): 0.6546
- Brahmi avg. match strength (without duplicates): 0.6490
- 14/22 Phoenician symbols uniquely mapped vs. only 13/27 Brahmi symbols.
- Provides tentative phoneme assignments for 25 most-frequent Indus signs based on CNN-derived Phoenician nearest-neighbour matches.
- Supports NW Semitic / Phoenician contact hypothesis (relevant to Fuls NW Semitic falsification experiments).
- Local PDF: C:\Users\trist\Downloads\3216122.3216163.pdf
Used in: Phase-30 multimodal AI integration target.
Dixit, Vaishnavi, Nushrat Hussain, Shubham Basak, Deva Atturu, Debasis Mitra, Ujjwal Bhattacharya. 2025. "Deep Learning in Archiving Indus Script and Motif Information." Journal of Computer Applications in Archaeology 8(1): 156-169. https://doi.org/10.5334/jcaa.175.
Used in: Phase-30 reference.
Bhaskar, M. V. 2024. "Markers and agencies of anisotropy in the Indus sign system." Indian Journal of History of Science 59: 1-27. https://doi.org/10.1007/s43539-023-00102-3.
Used in: Phase-22+ positional profiler / clustering.
Fuls, Andreas. 2013. "Positional Analysis of Indus Signs." Voprosi Epigrafiki (Epigrafika) 7(1): 253-275.
Konasukawa, A. 2020. "Stratigraphic study of inscribed objects from Harappa." Pp. ??? in Studies on Indus Script. Mohenjodaro: National Fund for Mohenjodaro.
Tsouparopoulou, Christina. 2014. "Creating an online database for the documentation of seals, sealings and seal impressions in the Ancient Near East." Studia Orientalia Electronica 2: 37-68.
Used in: Phase-37 — CSA upgrade to Glossa Lab SA engine (k-permutations, chain coupling).
Tamburini, Fabio. 2025. "On automatic decipherment of lost ancient scripts relying on combinatorial optimisation and coupled simulated annealing." Frontiers in Artificial Intelligence 8: 1581129. https://doi.org/10.3389/frai.2025.1581129. Open access (CC BY). Code: https://github.com/ftamburin/CSA_OptMatcher
Key contributions:
- Coupled SA (CSA): multiple SA chains running simultaneously that communicate periodically, converging to better solutions than independent parallel restarts.
- k-permutations encoding: allows null mappings (signs left unassigned), one-to-many, and many-to-one mappings between sign sets. More expressive than bijective SA.
- Fixed-anchor injection: partial knowledge of sign-to-phoneme mappings can be hard-coded as constraints — validates Glossa Lab's anchor approach.
- Benchmarked on: Ugaritic→Hebrew (29/30 signs correct), Linear B→Mycenaean Greek, Romance language cognate identification. All outperform prior state-of-the-art.
- Limitations for Indus: method targets bilingual corpora; Indus has no known bilingual text, so language identification remains the primary challenge.
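The three ideas listed above (communicating chains, null mappings, fixed anchors) can be sketched in a few lines. This is a toy illustration and not the code from the CSA_OptMatcher repository: the coupling rule here is a deliberately simple swap-in (every `sync_every` steps the worst chain is overwritten with the best mapping seen so far), and the mapping is restricted to injective assignments with nulls, so the one-to-many and many-to-one cases are not covered. The function name, parameters, and the caller-supplied `score` function are assumptions.

```python
import math
import random

def coupled_sa(signs, targets, score, anchors=None, n_chains=4,
               steps=2000, sync_every=100, t_start=1.0, t_end=0.01, seed=0):
    """Maximise score(mapping) over sign -> target-or-None mappings."""
    rng = random.Random(seed)
    anchors = dict(anchors or {})
    free = [s for s in signs if s not in anchors]            # signs we may change
    options = [t for t in targets if t not in anchors.values()]

    def initial():
        mapping = {s: None for s in signs}                   # start fully unassigned
        mapping.update(anchors)                              # anchors are never touched
        return mapping

    def propose(mapping):
        new = dict(mapping)
        s = rng.choice(free)
        t = rng.choice(options + [None])                     # None = null mapping
        if t is not None:
            for other in free:
                if new[other] == t:
                    new[other] = new[s]                      # swap keeps it injective
                    break
        new[s] = t
        return new

    chains = [initial() for _ in range(n_chains)]
    scores = [score(m) for m in chains]
    best, best_score = dict(chains[0]), scores[0]
    if not free:
        return best, best_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        for i in range(n_chains):
            cand = propose(chains[i])
            cand_score = score(cand)
            delta = cand_score - scores[i]
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                chains[i], scores[i] = cand, cand_score
                if cand_score > best_score:
                    best, best_score = dict(cand), cand_score
        if (step + 1) % sync_every == 0:
            # Simplified coupling: restart the worst chain from the global best.
            worst = min(range(n_chains), key=scores.__getitem__)
            chains[worst], scores[worst] = dict(best), best_score
    return best, best_score

# Toy problem with a hidden target mapping; "s4" is meant to stay unassigned.
truth = {"s1": "a", "s2": "b", "s3": "c", "s4": None}
toy_score = lambda m: sum(m[s] == truth[s] for s in truth)
best, value = coupled_sa(list(truth), ["a", "b", "c", "d"], toy_score,
                         anchors={"s1": "a"})
print(best, value)
```

In real use the `score` function would be a corpus-derived objective rather than a comparison with a known answer; the hidden `truth` mapping exists only so the sketch can run end to end.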
Used in: Phase-25 Tamil-Brahmi typology fit; Phase-30 phoneme map expansion.
Burrow, Thomas & M. B. Emeneau. 1984. A Dravidian Etymological Dictionary (DEDR), 2nd edition. Oxford: Clarendon Press.
Krishnamurti, Bhadriraju. 2003. The Dravidian Languages. Cambridge: Cambridge University Press. ISBN 978-0-521-77111-5.
Used in: Phase-29 reference for IRC engagement.
Balakrishnan, R. 2019. Journey of a Civilization: Indus to Vaigai. Chennai: Roja Muthiah Research Library.
Joseph, Tony. 2018. Early Indians: The Story of Our Ancestors and Where We Came From. New Delhi: Juggernaut. ISBN 978-93-86228-98-7.
Used in: Phase-29 Tamil Nadu prize context.
Rajan, K. & R. Sivananthan. 2025. Indus Signs And Graffiti Marks of Tamil Nadu - A Morphological Study. Chennai: Tamil Nadu State Department of Archaeology.
Used in: Phase-22 contact-zone reference (catalog of 95 Early Dilmun seals); Phase-30 acquisition target.
Crawford, Harriet. 2001. Early Dilmun Seals from Saar: Art and Commerce in Bronze Age Bahrain. London-Bahrain Archaeological Expedition: Saar Excavation Reports II. Ludlow: Archaeology International. Pp. 110. ISBN 0-9539561-0-5.
- Available: archive.org full PDF https://ia802709.us.archive.org/34/items/EarlyDilmunSealsFromSaarH.Crawford/.
- Also: academia.edu (Srini Kalyanaraman upload) https://www.academia.edu/28086707.
Used in: Phase-25 Janabiyah readout; Phase-27 reverse Janabiyah search; Phase-29 ReverseJanabiyahSearchV3 reference.
Laursen, Steffen Terp. 2010. "The westward transmission of Indus Valley sealing technology: origin and development of the 'Gulf Type' seal and other administrative technologies in Early Dilmun, c.2100-2000 BC." Arabian Archaeology and Epigraphy 21(2): 96-134. https://doi.org/10.1111/j.1600-0471.2010.00329.x.
Frenez, Dennys. 2018. "Manufacturing and trade of Asian elephant ivory in Bronze Age Middle Asia: Evidence from Gonur Depe (Margiana, Turkmenistan)." Archaeological Research in Asia 15.
Frenez, Dennys. 2020. "Mirrored signs: Administrative and scriptorial information in the Indus Civilization clay sealings." Pp. 21-38 in Studies on the Indus Script.
Vidale, Massimo & Dennys Frenez. 2015. "Indus Components in the Iconography of a White Marble Cylinder Seal from Konar Sandal South (Kerman, Iran)." South Asian Studies 31(1): 144-154.
Possehl, Gregory L. 2006. "Shu-ilishu's Cylinder Seal." Expedition 48(1): 42-43.
Potts, Daniel T. 1990. The Arabian Gulf in Antiquity, Vol. 1: From Prehistory to the Fall of the Achaemenid Empire. Oxford: Oxford University Press.
Potts, Daniel T. 2016. The Archaeology of Elam: Formation and Transformation of an Ancient Iranian State. 2nd ed. Cambridge: Cambridge University Press.
Lerner, Judith A. 2010. "Observations on the Typology and Style of Seals and Sealings from Bactria and the Indo-Iranian Borderlands." Pp. 245-266 in Coins, Art and Chronology II. Vienna: ÖAW.
Used in: Phase-30 cross-civilizational target.
Desset, François et al. 2022. "The decipherment of Linear Elamite writing." Zeitschrift für Assyriologie und Vorderasiatische Archäologie.
Used in: Phase-30 prior on Dravidian-language hypothesis.
Narasimhan, Vagheesh M., Nick Patterson, Priya Moorjani, Nadin Rohland, Rebecca Bernardos, Swapan Mallick, Iosif Lazaridis, et al. 2019. "The formation of human populations in South and Central Asia." Science 365(6457): eaat7487. https://doi.org/10.1126/science.aat7487.
Reich, David. 2018. Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. New York: Pantheon. ISBN 978-1-101-87034-6.
Shinde, Vasant, Vagheesh M. Narasimhan, et al. 2019. "An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers." Cell 179(3): 729-735. https://doi.org/10.1016/j.cell.2019.08.048.
Used in: Phase-33 comparative methodology; supports administrative-logographic interpretation; parallel evidence for cretulae/sealing system functional to IVC. Epistemic status: HIGH — peer-reviewed book chapter, established Mycenologist.
Perna, Massimo. 2014. "The Birth of Administration and Writing in Minoan Crete: Some Thoughts on Hieroglyphics and Linear A." Chapter 19 in KE-RA-ME-JA: Studies Presented to Cynthia W. Shelmerdine, edited by Dimitri Nakassis, Joann Gulizio, and Sarah A. James. Prehistory Monographs 46. Philadelphia: INSTAP Academic Press. Pp. 251–259. ISBN 978-1-931534-76-5.
Key findings applicable to IVC research:
- The Minoan cretulae system (clay sealings on storage vessels, door pegs) tracks commodity withdrawals without phonetic content — functions identically to the IVC sealings on Lothal "warehouse" containers (Mukhopadhyay 2023, D.5c).
- Cretan Hieroglyphic (96 signs, ~350 docs) and Linear A (97 signs, ~1,500 docs, 8,000 tokens) coexisted as functionally independent systems: CH on seals, Linear A on clay tablets. Direct comparative model for IVC's seal-vs-tablet distinction.
- 90% of Linear A tablets record only short economic entries (1–3 words). 708 flat-based nodules were shaped around folded parchment sheets — evidence that far richer content was written on perishable materials. Strongly supports: IVC inscriptions = durable fraction of a larger administrative system.
- IVC sign count and token count are within normal range for Bronze Age logographic writing systems (not anomalously large).
- Local PDF: C:\Users\trist\Downloads\The_Birth_of_Administration_and_Writing.pdf
Used in: Phase-32/33 falsification context; confirms 1,500-year gap between IVC and Brāhmī; relevant to IVC→script descent questions. Epistemic status: MODERATE — doctoral dissertation summary (Eötvös Loránd University), peer-reviewed defence committee.
Tóth, Ibolya. 2018. The Beginnings of the Epigraphical Tradition in India, with special regard to the Cultural Exchange between India and the Hellenistic World. Summary of Doctoral Dissertation. Doctoral School of Linguistics. Budapest: Eötvös Loránd University, Faculty of Humanities.
Key findings applicable to IVC research:
- Confirms the 1,500-year gap: IVC ends ~1900 BCE; earliest confirmed Brāhmī inscriptions are Aśoka's Rock Edicts (~250 BCE). No epigraphic evidence bridges this gap. Decisive against any IVC→Brāhmī phonetic lineage claim.
- Kharoṣṭhī derives from Aramaic (Northwest India, Achaemenid period) — NOT from IVC. If IVC encoded NW-Indian languages, the successor script tradition was externally replaced, not organically evolved.
- Pāṇini (~4th century BCE) mentions lipikara (scribe) and yavanānī (Greek writing) — writing existed before Aśoka on perishable materials. Consistent with the Minoan parchment model (Perna 2014, I.3).
- Brāhmī's geometric character suggests possible adaptation of a pre-existing script; Indian scholars' connections to IVC are treated as speculative in the literature.
- Local PDF: C:\Users\trist\Downloads\Summary_of_Doctoral_Dissertation_The_Beg.pdf
Used in: Gulf corpus expansion reference; rebus decipherment alternative hypothesis. Epistemic status: LOW — unpublished working paper, no peer review; advocate of controversial rebus/Meluhha metalwork hypothesis. Use cautiously.
Kalyanaraman, S. (undated). "Nine Indus Script Inscriptions of Ancient Near East, catalogues of metalwork, lapidary work repertoire." Working paper. Available via Academia.edu and ANU Open Research Repository. hdl: 1885/162902.
Points of limited value for Glossa Lab:
- Identifies Dilmun seals (Bahrain) and Failaka (Kuwait) seal as bearing Indus script hieroglyphs, consistent with Laursen 2010 (F.2) and Gadd 1932 (F.7). The Dilmun seal images discussed by Elisabeth C.L. During Caspers (Proceedings of the Seminar for Arabian Studies 6, 1976, pp. 8–39) are an additional Gulf corpus reference.
- Gadd Seal 1 from Ur (BM 120573): proposes the cuneiform text reads SAG KUSIDA (Sumerian 'chief' + Meluhha borrowing 'money-lender') with Indus bull field symbol = bharata 'metal alloy'. If correct, this would be the only bilingual IVC-cuneiform seal, highly relevant to the Gulf contact zone hypothesis. Speculative; not verified. Caution: Rebus punning methodology (Munda/Dravidian phonetic punning on pictographic readings) is not mainstream-accepted; etymological chains are long. Do NOT cite as established decipherment.
- Local PDF: C:\Users\trist\Downloads\Nine_Indus_Script_Inscriptions_of_Ancien.pdf
Used in: Phase-32/33 cross-civilizational statistical benchmarking. Epistemic status: LOW — conference paper, no named authors, no peer review.
Anonymous. (undated). "Linguistic Lineages and Lost Scripts: A Cross-Civilizational Analysis of the Indus Valley Script." Conference paper.
Statistical findings (replicable, useful):
- Based on Fuls (2019) A Catalogue of Indus Signs (~900 signs): Mean count 60.93, SD 129.61, Median 8, Max 867. χ² = 10,157.98 (p < 0.001) against uniform distribution. Confirms structured, non-random sign distribution consistent with a formal writing system. Replicates our Phase-22 Zipf findings with a different dataset.
- Primary Core Signs (high-frequency) vs. Secondary Composite Signs (rare) classification mirrors the allograph-reduction methodology of Daggumati & Revesz 2021 (D.6).
- Cross-script comparison: Oracle Bone Script (~50% visual similarity, highest), Egyptian Hieroglyphs (~30%), Proto-Elamite (~20%). Both IVC and OBS share short inscriptions (3–5 signs), core radical repetition, ideographic bases. Convergent function, not historical contact.
- Seals-as-Geographical-Indication-tags hypothesis aligns with Mukhopadhyay 2023 (D.5c). Caution: Similarity percentages are visual estimates, not formally quantified. Dholavira "pottery" reading is speculative with no linguistic grounding.
- Local PDF:
C:\Users\trist\Downloads\LINGUISTIC_LINEAGES_AND_LOST_SCRIPTS_A_C.pdf
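The χ² figure reported above is a goodness-of-fit test of sign frequencies against a uniform distribution. A minimal sketch of that computation follows. The counts are made-up toy values, not the Fuls (2019) data, and the function name is illustrative rather than part of the Glossa-Lab codebase.

```python
from statistics import mean, median


def chi_square_uniform(counts):
    """Pearson chi-square statistic of observed counts vs. a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)  # every sign equally frequent under H0
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical toy frequencies (assumption): a few common signs, many rare ones.
toy_counts = [867, 120, 40, 8, 8, 3, 1, 1]

print("mean:", round(mean(toy_counts), 2))
print("median:", median(toy_counts))
print("chi2:", round(chi_square_uniform(toy_counts), 2))
print("degrees of freedom:", len(toy_counts) - 1)
```

A large statistic relative to the degrees of freedom only shows that the distribution is non-uniform; on its own it does not establish that the signs encode language.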
Used in: Phase-33 comparative epigraphy methodology; Cypro-Minoan as IVC parallel; potmarks vs. script sign debates. Epistemic status: HIGH — Cambridge CREWS project, open access CC BY 3.0.
Steele, Philippa M. & Philip J. Boyes (eds.). (undated, ~2021). Writing Around the Ancient Mediterranean: Practices and Adaptations. CREWS (Contexts of and Relations between Early Writing Systems) project. Oxbow Books / Casemate. Open access: https://books.casematepublishing.com/Writing_Around_the_Ancient_Mediterranean.pdf License: CC BY 3.0.
Chapter 4 specifically: "Cypro-Minoan and its potmarks and vessel inscriptions as challenges to Aegean Scripts corpora." Authors from the CREWS team.
Key findings applicable to IVC research:
- Methodological framework for distinguishing potmarks (makers' marks, possibly non-linguistic) from script inscriptions on pottery — the exact debate for IVC.
- Methodology for building a unified sign repertory from mixed media (seals, pottery, tablets) with variant forms: directly applicable to IVC corpus construction.
- Cypro-Minoan (~100 signs, ~230 inscriptions, 1550–1050 BCE, Cyprus trade context) is a high-quality comparandum: undeciphered, short inscriptions (avg. ~5 signs), traded objects as primary carriers. Structural profile matches IVC.
- The CREWS methodological framework for allograph identification and sign classification mirrors the Phase-32 allograph work (Daggumati & Revesz 2021, D.6).
- Local PDF:
C:\Users\trist\Downloads\Ch_4_Cypro_Minoan_and_its_potmarks_and_v.pdf
Used in: Phase-32/33 sign repertory validation methodology. Epistemic status: MODERATE — scholarly chapter from undetermined larger work.
Anonymous chapter (undated). "Debugging the Process of Building a Repertory of the Southeastern European Signs." Chapter 4, Part I. (~23 pages).
Key findings applicable to IVC research:
- Explicit methodology for testing a sign repertory by using known scripts (Linear B used as benchmark) to validate the repertory construction process.
- Southeastern European signs (Vinča culture, ~5000–3500 BCE) as another undeciphered prehistoric sign system — provides temporal and cultural comparanda outside the Near East.
- The "debugging" approach — iterating a proposed sign list against known-answer systems before applying to unknown — is directly applicable to Phase-32/33 foundation checking. Caution: Author and publication details unknown; treat as methodological reference only.
- Local PDF:
C:\Users\trist\Downloads\Chapter_4_part_I_Debugging_the_process_o.pdf
Used in: Phase-32/33 fish-sign family reassessment; alternative to Parpola mīn reading. Epistemic status: LOW — unpublished working paper, same author as I.5. Use cautiously.
Kalyanaraman, S. (undated). "Indus Script hieroglyphs 1. fish-fin 'khambhaṛā', 2. reed-mollusc 'eraka-sippi' signify Bronze Age mint." Working paper. (~25 pages).
Points of limited value for Glossa Lab:
- Fish-fin sign: khambhaṛā → kammaṭa 'mint, coiner, coinage' — a third interpretation of the fish-sign family alongside (1) Parpola phonetic mīn and (2) Mukhopadhyay 2020 apotropaic eye-bead reading. All three agree the signs have commercial/craft significance.
- Reed-mollusc sign: eraka 'molten metal' + sippi 'casting vessel' → smelting context. Supports the Phase-32 finding that fish-sign Enmenanak signal is NOT SIGNIFICANT under any of the three frameworks. Caution: Same methodological concerns as I.5; rebus chains are long and unverified.
- Local PDF:
C:\Users\trist\Downloads\Indus_Script_hieroglyphs_1_fish_fin_kham.pdf
Knight, Kevin & Richard Sproat. 2009. "Writing systems, transliteration and decipherment." NAACL Tutorial.
Robinson, Andrew. 2015. "Ancient civilization: Cracking the Indus script." Nature 526: 499-501. https://doi.org/10.1038/526499a.
McKie, Jorj X. & contributors. 2016-present. PyMuPDF (fitz): Python bindings for the MuPDF library. https://github.com/pymupdf/PyMuPDF. License: AGPL.
Used in: Phase-28 OCR.
Mistral AI Team. 2024. Pixtral-12B-2409 vision-language model. https://mistral.ai/. License: commercial API access.
Used in: Phase-26-29 PDF generation.
ReportLab Inc. ReportLab Open Source. https://www.reportlab.com/.
Wikipedia contributors. 2026. Various articles on Indus Valley Civilization, Mahadevan, Parpola, Mohenjo-daro, Harappa, Ur III, etc. License: CC BY-SA 4.0.
Pierson, Tristen Kyle & contributors. 2026. Glossa-Lab: An agentic computational linguistics research platform for statistical analysis and decipherment of ancient writing systems. BitConcepts LLC. GitHub: https://github.com/BitConcepts/glossa-lab. License: MIT (source code); CC BY 4.0 (research outputs in research/indus/).
| Source | License | Compatible with redistribution? |
|---|---|---|
| Mahadevan 1977 (M77) | Public domain (Indian Government, ASI) | Yes |
| ePSD2 names | CC BY-SA | Yes (with attribution + ShareAlike) |
| CDLI tablets | CC BY-NC-SA 3.0 | Yes (non-commercial only) |
| Crawford 2001 PDF | Copyrighted, but freely uploaded to archive.org | Reference only |
| Wells 2006 PhD | Open access at archive.org | Yes |
| Wells 2015 | Copyrighted, Archaeopress | License purchase required |
| CISI 1, 2, 3.1, 3.2, 3.3 | Copyrighted, Suomalainen Tiedeakatemia | Reference only |
| Parpola 1994a | Copyrighted, CUP | Reference only |
| Parpola 2010 | Open conference paper | Reference + cite |
| Mahadevan papers (RMRL) | Released by RMRL with attribution | Yes (with attribution) |
| Fuls vol. 3, vol. 4 | Copyrighted, Independently published | Purchase required |
| ICIT database | Restricted (TU Berlin) | API access only |
| Yajnadevam 2024 | ResearchGate preprint | Reference only |
| Narasimhan 2019 | Science article | Subscription |
| Shinde 2019 | Cell article | Subscription |
| ETCSL | CC BY-SA | Yes |
| ICIPS / RMRL | Various | Yes (with attribution) |
If you use any of the Phase-22 through Phase-29 pipeline outputs in academic work, please cite:
- The Glossa-Lab project itself (see Section L).
- Mahadevan 1977 if you use the M77 corpus or M77 sign codes.
- Parpola 1994a + Parpola 2010 if you use the phoneme map or iconographic anchors.
- ePSD2 (Tinney et al.) if you use the Sumerian/Akkadian names corpus (license: CC BY-SA; works that adapt the data must also be released under CC BY-SA).
- Joshi & Parpola 1987 + Shah & Parpola 1991 + Parpola, Pande & Koskikallio 2010, 2019, 2022 if you use any CISI sign sequences or find-spot data.
- Laursen 2010 if you use the Janabiyah seal #10 reading.
- Crawford 2001 if you use Saar seal data.
- Wells 2015 / Wells 2006 / Fuls 2022, 2023 if you use the Wells/Fuls sign list or ICIT corpus.
- CDLI for any cuneiform tablet citations.
If you use the Enmenanak or Enheduana finding from Phase-29, please
note that the structural fit is suggestive but not yet statistically
significant; cite the Phase-29 synthesis at
reports/phase29_synthesis.md and the underlying ePSD2 + Parpola sources.
We are indebted to the following scholars whose work directly enabled this pipeline (alphabetical):
- R. Balakrishnan — Indus Research Centre, Mahadevan's successor.
- Dennys Frenez — Indus-BMAC contact-zone studies.
- Andreas Fuls (TU Berlin) — ICIT database, Mathematica Epigraphica vols. 3-4, positional analysis methodology, anchor-amplification methodology.
- Philip Jones — ePSD2 admin/oakk + admin/ed3b + Penn Sumerian Dictionary.
- J. Mark Kenoyer — Harappa archaeological project (HARP).
- Petteri Koskikallio — CISI Vol. 3.1, 3.2, 3.3 co-editor.
- Steffen Terp Laursen — Bahrain/Dilmun seal corpus, Janabiyah reading.
- Iravatham Mahadevan (1930-2018) — In memoriam. The foundational M77 concordance, Tamil-Brahmi epigraphy, 40+ Indus-script papers.
- Richard H. Meadow — HARP co-director.
- Asko Parpola (Helsinki) — Decipherment hypothesis, CISI editor, phoneme map source. Helsinki research portal: https://researchportal.helsinki.fi/.
- B. M. Pande — CISI Vol. 3 co-editor.
- Daniel T. Potts — Tepe Yahya, Susa archaeology.
- K. Rajan + R. Sivananthan — Tamil Nadu State Department of Archaeology.
- Steve Tinney — Penn Sumerian Dictionary lead.
- Niek Veldhuis — ePSD2 admin/ur3 + literary corpora.
- Massimo Vidale — Indus-Iranian seal studies.
- Bryan K. Wells — ICIT, Wells sign list, Wells 2006 PhD + Wells 2015 book.
- Roja Muthiah Research Library (Chennai) — open access to Mahadevan's papers.
- Tata Institute of Fundamental Research (TIFR) — Yadav, Vahia, Joglekar, Adhikari, Rao computational analyses.
- University of Pennsylvania Museum — ePSD2 hosting.
Last updated: 2026-04-30. Maintained as part of the Glossa-Lab Indus decipherment pipeline. Any errors of attribution are the project's own. Please open an issue at https://github.com/BitConcepts/glossa-lab if you find any.
Used in: Phase-32/33 peripheral IVC corpus; site-specific sign data from Rann of Kutch. Epistemic status: HIGH — peer-reviewed chapter in 2023 academic volume, CC BY 4.0 Open Access.
Seth, Hansmukh & J. S. Kharakwal. 2023. "Harappan Script Material from Kanmer, Gujarat." In Research on Indus Civilization in the Wake of Hundred Years of Excavation at Harappa, edited by [editors]. Thiruvananthapuram: Department of Archaeology, University of Kerala. CC BY 4.0 Open Access. Local PDF:
C:\Users\trist\Downloads\Indus_Script_Material_from_Kanmer_Gujara.pdf
Key value:
- Provides inscription data from Kanmer, a coastal IVC site in Rann of Kutch, Gujarat (~2500–1800 BCE), excavated 2005–2012 by MS University Vadodara + Deccan College.
- Kanmer is geographically peripheral (Gujarat coast, near Arabian Sea trade routes) — its sign usage pattern may reflect regional variation relevant to Phase-32 T3 bigram transition analysis.
- 2023 publication = post-ICIT — may include inscriptions not in M77 or Wells/Fuls corpus.
- The parent volume covers the Harappa centenary (100 years since 1924 discovery), providing historiographic breadth alongside the corpus data.
- License: CC BY 4.0 — freely reusable with attribution.
Used in: V8-V24 autonomous decipherment campaign; foundation check corpus verification; all analyses in backend/scripts/v8_autonomous_loop.py and v18_autonomous_loop.py.
Miller, William (Sr) (publishing as Holdat LLC / WILL DA BEATZ HOLDAT). 2025. Indus Valley Script: Computational Evidence for the Minimal Grammar Hypothesis. GitHub: holdatllc/indus_valley_repo. License: see repository LICENSE file.
- Corpus: 1,670 seals, 7,002 sign tokens, 9 sites, 390 distinct sign IDs (Mahadevan M-numbers).
- Fields: cisi_number, site, iconography, position, letters (M-number), semantic role annotations.
- Important caveat: Independent computational study using Mahadevan M-sign numbering. Sign transcriptions have NOT been independently verified against CISI photographic plates. All sign identifications derive from Mahadevan 1977 (A.1) as processed by Miller. Use with caution for publication-grade work; cross-check against CISI plates before publishing any specific sign readings.
- Acknowledgement: William Miller Sr authored the structural constraint model and provided the corpus CSV that powers the Holdat-based analysis phases.
Used in: Gulf corpus acquisition plan (Laursen Table 1 nos. 16-21, 23-24); foundation check western Gulf corpus discussion.
Gadd, C. J. 1932. "Seals of Ancient Indian Style Found at Ur." Proceedings of the British Academy 18: 191-210. Archive scan: https://archive.org/download/in.gov.ignca.33779/33779_text.pdf
- Identifies and illustrates 17 Indian-style seals from Ur excavations (1923-1930).
- Confirmed museum numbers: BM 123208 (U.17649), Penn U.8685, BM 120228, BM 123059.
Used in: Gulf corpus (Laursen Table 1 nos. 12-13).
Kjærum, Poul. 1983. Failaka/Dilmun: The Second Millennium Settlements. Jutland Archaeological Society Publications XVII. Moesgaard. — Cat. nos. 279, 319 (Failaka seals with Indus-script inscriptions).
Used in: Gulf corpus (Laursen Table 1 nos. 6-7).
Kjærum, Poul. 1994. "Seals of 'Dilmun-type' from Failaka, Kuwait." In D.T. Potts, H.A. Al-Naboodah & P. Hellyer (eds.), Archaeology of the United Arab Emirates. London: Trident. — Figs. 1725, 1726 = Qala'at al-Bahrain seals.
Used in: Gulf corpus (Laursen Table 1 nos. 8-9, 56).
Al-Sindi, Khalid M. 1999. Corpus of Seals from the Bahrain National Museum. Manama: Bahrain National Museum. — Nos. 160 (BBM 20362), 180 (BBM 18839), 182 (Saar/Karzakkan cemetery pieces).
Used in: v8/v18 autonomous loops (PDR phoneme inventories); Phase-32 T4 Tamil LM; P30-E1 Sanskrit falsification comparison.
Glossa Lab contributors. 2026. dravidian.py: Old Tamil / Proto-Dravidian phoneme inventory and attested vocabulary. File: backend/glossa_lab/data/dravidian.py GitHub: https://github.com/BitConcepts/glossa-lab
- Data sources (all credited):
- DEDR (Burrow & Emeneau 1984, E.1) — etymological roots and reconstructions
- Parpola 1994a (C.1) — Dravidian rebus phoneme map
- Parpola 2010 (C.2) — iconographic anchor phoneme assignments
- Sangam Tamil literature (E.3) — attested Old Tamil personal names and vocabulary
- Krishnamurti 2003 (E.2) — phonological system
- Vocabulary: 1,740 entries (dict: Tamil word → gloss) + 2,155 attested forms
- Corpus inscriptions: 1,297 Old Tamil inscription sequences (character level)
- Important caveat: This is a DERIVED corpus compiled from the above sources. It is not independently peer-reviewed. Every entry is traceable to DEDR, Sangam, or Parpola. The phoneme inventories (PDR_INITIALS, PDR_MEDIALS, PDR_TERMINALS) use Classical Old Tamil forms, not phonologically reconstructed Proto-Dravidian. Label "OldTamil" or "Classical Tamil" in any publication; do NOT call it "PDR."
Every data file, corpus file, and report generated by Glossa Lab MUST include citation metadata. This section defines the required format.
"_citation": {
"primary_sources": ["A.1", "A.10"],
"derivation": "Computed from Mahadevan M77 + Holdat corpus v2025",
"authors_credited": [
"Mahadevan, Iravatham (1977) — sign system",
"Miller, William Sr / Holdat LLC (2025) — corpus transcription"
],
"year_data": "1977/2025",
"license": "M77 public domain (ASI); Holdat see repository LICENSE",
"glossa_lab_version": "2026-05-11",
"see_also": "CITATIONS.md sections A.1, A.13"
}
Every reports/*.json file MUST have one of:
- "_citation" key with at least primary_sources and derivation
- "citations" array listing CITATIONS.md section IDs (e.g., ["A.1", "C.2"])
- Reference to foundation_check_report.json, which covers the full citation audit
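The first two options can be checked mechanically. The sketch below is one hedged way to do that. The function names are hypothetical and this is not the project's actual foundation check. The third option (a reference to foundation_check_report.json) is not covered, because its exact representation is not specified here.

```python
import json
from pathlib import Path

# Keys that the "_citation" object must contain, per the rule above.
REQUIRED_CITATION_KEYS = ("primary_sources", "derivation")


def has_citation_metadata(report: dict) -> bool:
    """Return True if a report dict satisfies one of the two key-based options."""
    citation = report.get("_citation")
    if isinstance(citation, dict) and all(k in citation for k in REQUIRED_CITATION_KEYS):
        return True
    citations = report.get("citations")
    return isinstance(citations, list) and len(citations) > 0


def find_uncited_reports(reports_dir: str) -> list:
    """List *.json files in reports_dir that lack citation metadata or fail to parse."""
    missing = []
    for path in sorted(Path(reports_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            missing.append(path.name)
            continue
        if not isinstance(data, dict) or not has_citation_metadata(data):
            missing.append(path.name)
    return missing
```

Calling `find_uncited_reports("reports")` from the repository root would return the file names that still need metadata, assuming reports live directly under that directory.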
Every Python script that loads external data MUST have a docstring citing:
- Author(s) of the data
- CITATIONS.md section ID(s)
- Any derivation from multiple sources
See AGENTS.md Rule H19 (Citation Required) for enforcement policy.
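As an illustration of the docstring requirement, here is a minimal sketch of a loader that carries the three required items. The function name, the `sign` column, and the CSV layout are assumptions made for this example and do not describe an existing Glossa-Lab script.

```python
import csv
import io


def load_m77_sign_counts(csv_text: str) -> dict:
    """Count sign tokens in a hypothetical M77-derived CSV export.

    Data authors: Mahadevan, Iravatham (1977), sign system.
    CITATIONS.md sections: A.1.
    Derivation: single source; no merging with other corpora.
    """
    counts = {}
    # Assumes one sign token per row in a column named "sign" (hypothetical layout).
    for row in csv.DictReader(io.StringIO(csv_text)):
        sign = row["sign"]
        counts[sign] = counts.get(sign, 0) + 1
    return counts
```

Keeping the section IDs in a fixed "CITATIONS.md sections:" line makes them easy to extract with a simple text search during an audit.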
The foundation check (GET /api/v1/research/foundation-check) verifies that
key data files have _citation metadata.
Additional acknowledgements since the last update:
- William Miller Sr (Holdat LLC) — Independent computational analysis of 1,670 Indus seals; structural constraint model and minimal grammar hypothesis; provides the primary digital corpus for the V8-V24 distributional campaigns.
- Iravatham Mahadevan (1930-2018) (additional note) — The dravidian.py vocabulary and the M77-based analyses are ultimately grounded in Mahadevan's 50 years of epigraphic work. His Tamil-Brahmi corpus (2003) is the only direct parallel corpus available to this project.
- Burrow, Thomas & Emeneau, Murray Barnson — The DEDR (1984) underlies every Proto-Dravidian / Old Tamil phoneme assignment in this project.
- Sangam poets (collectively, ~300 BCE–300 CE) — The Old Tamil attested vocabulary in dravidian.py derives from their works.
Last updated: June 2026. For attribution concerns contact tpierson@bitconcepts.tech — we respond within 48 hours. See also ATTRIBUTION.md.
This section covers all sources acquired or planned for the ICIT-scale Indus
corpus reconstruction project. Branch: corpus/icit-scale-reconstruction.
All sources are cited per H18. Rights classes and acquisition status are tracked
in glossa-corpus/indus/sources/*/provenance.yaml.
- Author: mcskware (GitHub: mayig)
- Title: Indus Valley Script Corpus — digitization of CISI in JSON format
- Repository: https://github.com/mayig/indus-valley-script-corpus
- License: MIT
- Accessed: 2026-05-14
- Coverage: 179 Mohenjo-daro inscriptions (Parpola P-numbers); full repo may contain additional sites
- Original corpus: Parpola, A. et al. (1987-2010). Corpus of Indus Seals and Inscriptions, Vols. 1-3. Suomalainen Tiedeakatemia, Helsinki.
- Used in: indus_cisi.py (existing), corpus_indus_objectize.py (expanded), indus_corpus_v2.py
- Rights gate: ML training OK (MIT); redistribution OK with attribution
- Author: The Metropolitan Museum of Art
- Title: Met Museum Collections Open Access API
- URL: https://collectionapi.metmuseum.org
- License: CC0 (public domain objects)
- Accessed: 2026-05-14 (ongoing)
- Search endpoint: /public/collection/v1/search?hasImages=true&q=Indus+Valley
- Object endpoint: /public/collection/v1/objects/{objectID}
- Used in: corpus_indus_acquire_free.py, corpus_indus_objectize.py
- Rights gate: CC0 objects: ML training OK, redistribution OK. Per-object isPublicDomain flag must be verified.
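The two endpoints and the per-object rights check can be sketched as follows. The helper names are hypothetical and this is not the code in corpus_indus_acquire_free.py. The sketch only builds URLs and inspects an already-fetched record, so it makes no network calls.

```python
from urllib.parse import urlencode

MET_BASE = "https://collectionapi.metmuseum.org"


def search_url(query: str, has_images: bool = True) -> str:
    """Build the search URL for a free-text query."""
    params = {"hasImages": str(has_images).lower(), "q": query}
    return f"{MET_BASE}/public/collection/v1/search?{urlencode(params)}"


def object_url(object_id: int) -> str:
    """Build the URL for a single object record."""
    return f"{MET_BASE}/public/collection/v1/objects/{object_id}"


def passes_rights_gate(record: dict) -> bool:
    """Accept a record only if it is explicitly flagged public domain.

    A missing or non-boolean flag is treated as a failure (conservative assumption).
    """
    return record.get("isPublicDomain") is True


print(search_url("Indus Valley"))
```

Treating a missing flag as a failure keeps unverified objects out of the corpus by default.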
- Author: Cleveland Museum of Art
- Title: Cleveland Museum of Art Open Access
- URL: https://openaccess-api.clevelandart.org
- License: CC0 (public domain artworks)
- Accessed: 2026-05-14 (ongoing)
- API endpoint: https://openaccess-api.clevelandart.org/api/artworks/?q=indus
- Sample object: https://www.clevelandart.org/art/1973.160
- Used in: corpus_indus_acquire_free.py, corpus_indus_objectize.py
- Rights gate: CC0 public-domain objects: ML training OK, redistribution OK. Unrestricted metadata and images via API.
- Author: Penn Museum (University of Pennsylvania Museum of Archaeology and Anthropology)
- Title: Penn Museum Collections Open Data
- URL: https://www.penn.museum/collections/
- Data endpoint: https://www.penn.museum/collections/objects/data.php
- License: CC BY 4.0 (metadata dataset); images: noncommercial-educational (without further permission)
- Publication-quality images: formal request required (https://www.penn.museum/collections/permission-to-reproduce.php)
- Accessed: 2026-05-14
- Used in: corpus_indus_acquire_free.py, corpus_indus_objectize.py
- Rights gate: Metadata CC BY 4.0: ML OK with attribution. Images: research use; publication requires request.
- Author: Ministry of Culture, Government of India
- Title: Indian Culture Portal — Harappan / Indus Civilization materials
- URL: https://indianculture.gov.in
- Key target URLs:
- License: Government of India cultural portal — rights tracked per item
- Accessed: 2026-05-14
- Rights class: india-gov-cultural — research use; redistribution requires verification
- Used in: corpus_indus_acquire_free.py (OCR seed data only)
- Rights gate: Research use OK; no ML training or redistribution without explicit per-item rights clearance.
- Author: Roja Muthiah Research Library, Chennai; Indus Research Centre
- Title: RMRL Indus Script Portal + Bulletins (Mahadevan 1977 concordance based)
- URL: https://rmrl.in/en/irc
- Portal: https://indusscript.in
- Bulletins: https://rmrl.in/bulletin/bulletin-No-{1-6}-{date}.pdf
- License: RMRL research use — contact required for concordance export
- Accessed: 2026-05-14
- Rights class: rmrl-research — research use; redistribution requires contact
- Note: RMRL states a new expanded concordance is in development — highest priority institutional contact.
- Used in: corpus_indus_acquire_free.py, future concordance cooperation
- Rights gate: Research use; contact RMRL (https://rmrl.in/en/irc) before any export or redistribution.
- Author: Ministry of Culture, Government of India; C-DAC
- Title: Museums of India Repository
- URL: https://www.museumsofindia.gov.in/repository/
- Endpoints:
  - Museum list: https://museumsofindia.gov.in/repository/collection/musuemList
  - Search: https://museumsofindia.gov.in/repository/search-api
- License: Restrictive assumed; rights tracked per record
- Rights class: india-museum-restricted — discovery and metadata reconciliation only
- Used in: corpus_indus_acquire_free.py (metadata discovery only)
- Rights gate: No ML training or redistribution without explicit per-record rights clearance.
- Author: Various (Mahadevan 1977 scan; corpus-vol-2 scan; other scans)
- Title: Internet Archive IIIF — Indus Script OCR sources
- URL: https://archive.org
- Key items:
  - archive.org/details/TheIndusScript.TextConcordanceAndTablesIravathanMahadevan
  - archive.org/details/corpus-vol-2
- IIIF manifest pattern: https://iiif.archive.org/iiif/{identifier}/manifest.json
- License: Varies per item; treat as derivative fallback
- Rights class: internet-archive-derivative — OCR seeding only
- Used in: corpus_indus_acquire_free.py (IIIF manifests for OCR seed data)
- Rights gate: Do NOT canonicalize readings without reconciliation against official editions. Do NOT include in ML training or redistribution.
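The four named rights classes above can be expressed as a small lookup table that a pipeline step consults before ML training or redistribution. This is a sketch under stated assumptions: the table layout and the `may_use` helper are hypothetical and do not reflect the actual provenance.yaml schema. The ML-training rule for rmrl-research is an assumption, since the entry above only mentions export and redistribution.

```python
# Rule per purpose: "with_clearance" = allowed only after explicit rights clearance,
# "never" = not allowed regardless of clearance.
RIGHTS_GATES = {
    "india-gov-cultural": {
        "ml_training": "with_clearance",
        "redistribution": "with_clearance",
    },
    "rmrl-research": {
        "ml_training": "with_clearance",  # assumption: not stated in the source entry
        "redistribution": "with_clearance",
    },
    "india-museum-restricted": {
        "ml_training": "with_clearance",
        "redistribution": "with_clearance",
    },
    "internet-archive-derivative": {
        "ml_training": "never",
        "redistribution": "never",
    },
}


def may_use(rights_class: str, purpose: str, cleared: bool = False) -> bool:
    """Return True only if the purpose is permitted for this rights class.

    Unknown rights classes and unknown purposes are denied by default.
    """
    rule = RIGHTS_GATES.get(rights_class, {}).get(purpose, "never")
    return rule == "with_clearance" and cleared


print(may_use("rmrl-research", "redistribution"))
print(may_use("rmrl-research", "redistribution", cleared=True))
```

Denying by default means a newly added source cannot enter training or redistribution until someone records its rights class explicitly.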
Section I added: 2026-05-14. Branch: corpus/icit-scale-reconstruction.