#78 - Clean up NER types #1633
Merged
- Drop 30 marker subtypes from the `NamedEntity` type system (Person, Organization, Location, Date, Time, Money, Percent, Quantity, Ordinal, Cardinal, WorkOfArt, Animal, Plant, Substance, Disease, Event, Law, Language, ContactInfo, Game, Fac, FacDesc, Gpe, GpeDesc, Norp, OrgDesc, PerDesc, Product, ProductDesc, Nationality); keep only `NamedEntity` with its `value` + `identifier` features
- Delete 11 per-model CoreNLP NER tag-to-type `.map` files; `MappingProvider` falls through to `BASE_TYPE=NamedEntity` and records the source tag in `value`
- Keep the generic NER mapping mechanism (`MappingProviderFactory.createNerMappingProvider`, `mappingLocation` parameter, `ner-default-variants.map`) so users can still supply custom `.map` files
- Rewrite `DKPro2Gate` to route GATE annotation types (`Person`/`Location`/`Organization`/`NamedEntity`) off `NamedEntity.value` instead of `instanceof` on subtypes; drop the now-redundant `dkproType` GATE feature
- Collapse the two GATE writer NER tests into one (`oneWayNamedEntity`); delete obsolete `ner2002_conll.map` + `ner2002_ref_specific.xml`; rename `ner2002_ref_generic.xml` → `ner2002_ref.xml` and update it to the expected GATE types
- Update brat reader/writer tests and `.ann` reference fixtures to use `NamedEntity` instead of the `Person`/`Location`/`Organization` subtypes
- Update the brat reader Javadoc example to reference `NamedEntity` instead of `Location`
- Update CoreNLP NER test assertions from `Person(...)`/`Location(...)`/`Organization(...)` to `NamedEntity(...)`
- Strip the 30 deleted subtype entries from `typesystemmapping.yaml`; update the brat documentation example to map `(LOC|PER|ORG)` → `NamedEntity`
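The value-based routing described for `DKPro2Gate` can be pictured with a minimal, self-contained sketch. It is not the code from this PR: the class `NerValueRouting`, the method `gateTypeFor`, and the tag spellings `PER`/`LOC`/`ORG` are assumptions made for illustration, and the real writer may compare against different strings. The idea is only that the GATE annotation type is chosen from the string held in `NamedEntity.value`, with `NamedEntity` as the fallback, rather than from an `instanceof` check on a subtype.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of DKPro Core: illustrates routing on the value feature. */
final class NerValueRouting {
    /** Fallback GATE type used when the tag is missing or unknown. */
    static final String BASE_TYPE = "NamedEntity";

    // Assumed tag spellings (CoNLL-style); actual models may emit other strings.
    private static final Map<String, String> GATE_TYPES = Map.of(
            "PER", "Person",
            "LOC", "Location",
            "ORG", "Organization");

    /** Picks a GATE annotation type from the tag stored in the value feature. */
    static String gateTypeFor(String value) {
        if (value == null) {
            return BASE_TYPE;
        }
        String tag = value.trim().toUpperCase(Locale.ROOT);
        return GATE_TYPES.getOrDefault(tag, BASE_TYPE);
    }

    public static void main(String[] args) {
        // Print the routing decision for a few sample tags.
        for (String tag : new String[] {"PER", "LOC", "ORG", "MISC"}) {
            System.out.println(tag + " -> " + gateTypeFor(tag));
        }
    }
}
```

Keeping the decision in one lookup table means a tag that has no entry still produces a usable annotation, which mirrors the fall-through to `BASE_TYPE` on the mapping side.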
338deba to 3fa9c9d
- Rewrite the `type4:Person` elements to `type4:NamedEntity`; the `xmi:id` values stay stable, so the view member references remain valid.
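As a rough illustration of that fixture rewrite, the sketch below renames only the element name and leaves every attribute, including `xmi:id`, untouched. The class `XmiPersonRewrite`, the regex approach, and the sample XMI fragment are assumptions for illustration; the fixtures in this PR may well have been edited by hand or with another tool.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: renames type4:Person elements in an XMI string. */
final class XmiPersonRewrite {
    // Matches the element name in opening, self-closing, and closing tags.
    // The lookahead keeps longer names such as type4:PersonDesc untouched.
    private static final Pattern PERSON_TAG =
            Pattern.compile("(</?)type4:Person(?=[\\s/>])");

    /** Returns the XMI text with the element name swapped and attributes unchanged. */
    static String rewrite(String xmi) {
        return PERSON_TAG.matcher(xmi).replaceAll("$1type4:NamedEntity");
    }

    public static void main(String[] args) {
        // Made-up fragment: one annotation plus the view that lists it as a member.
        String xmi = "<type4:Person xmi:id=\"42\" sofa=\"1\" begin=\"0\" end=\"4\" value=\"PER\"/>\n"
                + "<cas:View sofa=\"1\" members=\"42\"/>";
        System.out.println(rewrite(xmi));
    }
}
```

Because the `members` attribute of the view refers to annotations by `xmi:id` and not by element name, the reference to `42` in the sample still resolves after the rename.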
3fa9c9d to 48f814a