
#78 - Clean up NER types #1633

Merged: reckart merged 2 commits into main from removal/78-Clean-up-NER-types on Jun 14, 2026
Conversation

@reckart reckart commented Jun 14, 2026

Member

What's in the PR

  • Drop 30 marker subtypes from the NamedEntity type system (Person, Organization, Location, Date, Time, Money, Percent, Quantity, Ordinal, Cardinal, WorkOfArt, Animal, Plant, Substance, Disease, Event, Law, Language, ContactInfo, Game, Fac, FacDesc, Gpe, GpeDesc, Norp, OrgDesc, PerDesc, Product, ProductDesc, Nationality); keep only NamedEntity with its value and identifier features
  • Delete 11 per-model CoreNLP NER tag-to-type .map files; MappingProvider falls through to BASE_TYPE=NamedEntity and records the source tag in value
  • Keep the generic NER mapping mechanism (MappingProviderFactory.createNerMappingProvider, mappingLocation parameter, ner-default-variants.map) so users can still supply custom .map files
  • Rewrite DKPro2Gate to route GATE annotation types (Person/Location/Organization/NamedEntity) off NamedEntity.value instead of instanceof on subtypes; drop the now-redundant dkproType GATE feature
  • Collapse the two GATE writer NER tests into one (oneWayNamedEntity); delete obsolete ner2002_conll.map + ner2002_ref_specific.xml; rename ner2002_ref_generic.xml → ner2002_ref.xml and update it to the expected GATE types
  • Update brat reader/writer tests and .ann reference fixtures to use NamedEntity instead of Person/Location/Organization subtypes
  • Update brat reader Javadoc example to reference NamedEntity instead of Location
  • Update CoreNLP NER test assertions from Person(...)/Location(...)/Organization(...) to NamedEntity(...)
  • Strip 30 deleted subtype entries from typesystemmapping.yaml; update brat documentation example to map (LOC|PER|ORG) → NamedEntity
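
The DKPro2Gate change above replaces an instanceof check on subtypes with a lookup on NamedEntity.value. A minimal standalone sketch of that idea follows. It is not the actual DKPro2Gate code: the class name, the method name, and the tag table (PER/PERSON, LOC/LOCATION, ORG/ORGANIZATION) are assumptions made for illustration, and the real writer may recognise a different set of tags.

```java
import java.util.Locale;
import java.util.Map;

public class NerRoutingSketch {
    // Hypothetical tag table; the tags actually recognised by DKPro2Gate may differ.
    private static final Map<String, String> GATE_TYPES = Map.of(
            "PER", "Person",
            "PERSON", "Person",
            "LOC", "Location",
            "LOCATION", "Location",
            "ORG", "Organization",
            "ORGANIZATION", "Organization");

    // Picks the GATE annotation type from the value feature alone.
    // Unknown or missing values fall back to the generic NamedEntity type.
    public static String gateTypeFor(String value) {
        if (value == null) {
            return "NamedEntity";
        }
        return GATE_TYPES.getOrDefault(value.toUpperCase(Locale.ROOT), "NamedEntity");
    }

    public static void main(String[] args) {
        System.out.println(gateTypeFor("PER"));
        System.out.println(gateTypeFor("MISC"));
    }
}
```

Because the decision depends only on the string in value, no subtype has to exist in the type system for the GATE output to keep distinguishing Person, Location and Organization.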

How to test manually

  • No specific test procedure

Automatic testing

  • PR includes unit tests

Documentation

  • PR updates documentation

@reckart reckart added this to the 3.0.0 milestone Jun 14, 2026
@reckart reckart self-assigned this Jun 14, 2026
@reckart reckart added this to Kanban Jun 14, 2026
@github-project-automation github-project-automation Bot moved this to In progress in Kanban Jun 14, 2026
@reckart reckart force-pushed the removal/78-Clean-up-NER-types branch from 338deba to 3fa9c9d Compare June 14, 2026 11:05
- Rewrite the type4:Person elements to type4:NamedEntity; xmi:ids stay stable so the view member references remain valid.
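
The fixture rewrite described in this commit note could be scripted roughly as below. This is a sketch under assumptions, not how the fixtures were necessarily updated (they may have been edited by hand); the class and method names are hypothetical. It renames only the element name, so every attribute, including xmi:id, is left exactly as it was.

```java
import java.util.regex.Pattern;

public class XmiRenameSketch {
    // Matches the opening or closing tag name "type4:Person" exactly;
    // the lookahead prevents matching longer names that merely start with it.
    private static final Pattern PERSON_TAG =
            Pattern.compile("<(/?)type4:Person(?=[\\s/>])");

    // Renames the element and leaves all attributes (xmi:id, begin, end, ...) untouched.
    public static String renamePersonElements(String xmi) {
        return PERSON_TAG.matcher(xmi).replaceAll("<$1type4:NamedEntity");
    }

    public static void main(String[] args) {
        String before = "<type4:Person xmi:id=\"42\" begin=\"0\" end=\"5\"/>"
                + "<cas:View sofa=\"1\" members=\"42\"/>";
        System.out.println(renamePersonElements(before));
    }
}
```

Since the view's members attribute refers to elements by xmi:id and not by element name, the references stay valid after the rename.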
@reckart reckart force-pushed the removal/78-Clean-up-NER-types branch from 3fa9c9d to 48f814a Compare June 14, 2026 11:07
@reckart reckart merged commit 212ebf2 into main Jun 14, 2026
5 checks passed
@reckart reckart deleted the removal/78-Clean-up-NER-types branch June 14, 2026 11:30
@github-project-automation github-project-automation Bot moved this from In progress to Done in Kanban Jun 14, 2026