KokoroAne English raw text: consider strict normalization for standalone numbers and times #711

@LemonCANDY42

Context

KokoroAne English now has a much better raw-text frontend after the Misaki lexicon work, but raw numeric text still appears to be handled mostly by tokenization plus lexicon/G2P fallback.

FluidAudio already has SayAsInterpreter for SSML <say-as>, but the public KokoroAne English raw-text path does not seem to apply a strict text-normalization pass before KokoroAneEnglishPhonemizer tokenization.

Observed / likely affected cases

Common chat-style English text can include:

  • I am 26 years old.
  • Today is June 13th.
  • The score is 3.14.
  • The current time is 1:49 PM.

In a raw-text KokoroAne path, these strings can reach punctuation tokenization or the word-level G2P path without being expanded first, which is not ideal for TTS. For example, 3.14 can be split around the "." and sound closer to "three fourteen" than "three point one four".
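To illustrate the splitting concern, here is a minimal Python sketch. `naive_tokenize` is a hypothetical stand-in that splits text into word runs and single punctuation marks; it is not the actual KokoroAneEnglishPhonemizer tokenizer, whose behavior may differ in detail.

```python
import re

def naive_tokenize(text):
    # Hypothetical stand-in: runs of word characters, or single punctuation marks.
    # This is NOT the real KokoroAne tokenizer, only an illustration of the failure shape.
    return re.findall(r"\w+|[^\w\s]", text)

print(naive_tokenize("The score is 3.14."))
# ['The', 'score', 'is', '3', '.', '14', '.']
print(naive_tokenize("The current time is 1:49 PM."))
# ['The', 'current', 'time', 'is', '1', ':', '49', 'PM', '.']
```

With a tokenizer of this shape, "3" and "14" reach G2P as two unrelated integers, which is how "three fourteen" can come out.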

Constraints / non-goals

This should probably not become a broad, locale-sensitive text-normalization system in the KokoroAne frontend. A conservative pass should avoid rewriting ambiguous or structured strings where caller intent is unclear.

Examples that should likely be left unchanged unless a larger TN design is accepted:

  • version-like strings: 1.2.3
  • separated number formats: 1,234
  • embedded digits: word26, 26word
  • loose colon numbers: 1:49
  • invalid times: 1:99 PM
  • 24-hour forms if not explicitly supported: 13:49

Conservative idea

A narrow pre-tokenization pass for KokoroAne English raw text could handle only strict standalone forms:

  • standalone cardinal integers: 26 -> twenty six
  • valid ordinals: 13th -> thirteenth
  • leading-zero digit strings: 007 -> zero zero seven
  • decimals: 3.14 -> three point one four or a variant with an explicit pause after point
  • explicit 12-hour meridiem times: 1:49 PM, 1:49 p.m. -> one forty nine p m

The implementation could reuse or share logic with SayAsInterpreter where appropriate, but keep the raw-text rules stricter than SSML because raw text has no explicit caller annotation.
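As a rough sketch of what such a pass might look like, the Python below rewrites only whitespace-delimited tokens that match one of the strict forms and leaves everything else untouched. All names here (`normalize`, `cardinal`, `ordinal`) are hypothetical and do not refer to existing FluidAudio APIs; a real implementation would presumably be in Swift and could share logic with SayAsInterpreter. The 0 to 999 range, the "oh five" reading for single-digit minutes, and dropping ":00" are assumptions of this sketch, not decisions from the issue.

```python
import re

_ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
_IRREGULAR = {"one": "first", "two": "second", "three": "third", "five": "fifth",
              "eight": "eighth", "nine": "ninth", "twelve": "twelfth"}

def cardinal(n):
    """Spell 0..999 (a deliberately small range; an assumption of this sketch)."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + (" " + _ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return _ONES[hundreds] + " hundred" + (" " + cardinal(rest) if rest else "")

def ordinal(n):
    words = cardinal(n).split(" ")
    last = words[-1]
    if last in _IRREGULAR:
        last = _IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return " ".join(words[:-1] + [last])

def _expected_suffix(n):
    if 10 <= n % 100 <= 20:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")

def _digits(text):
    return " ".join(_ONES[int(ch)] for ch in text)

def _spell_integer(text):
    if len(text) > 1 and text[0] == "0":
        return _digits(text)          # leading zeros: read digit by digit
    if len(text) > 3:
        return None                   # out of sketch range: leave unchanged
    return cardinal(int(text))

# A token must start after whitespace (or at the start of the text) and end
# before whitespace, optionally with one sentence punctuation mark in between.
_PATTERN = re.compile(
    r"(?<!\S)(?:"
    r"(?P<hour>1[0-2]|0?[1-9]):(?P<minute>[0-5]\d) (?P<mer>[AaPp])(?:[Mm]|\.[Mm]\.)"
    r"|(?P<whole>\d+)\.(?P<frac>\d+)"
    r"|(?P<ord>\d+)(?P<suffix>st|nd|rd|th)"
    r"|(?P<int>\d+)"
    r")(?=[.,!?;]?(?:\s|$))"
)

def _replace(m):
    if m.group("hour") is not None:
        minute = int(m.group("minute"))
        parts = [cardinal(int(m.group("hour")))]
        if minute:
            parts.append(("oh " if minute < 10 else "") + cardinal(minute))
        return " ".join(parts + [m.group("mer").lower(), "m"])
    if m.group("frac") is not None:
        whole = _spell_integer(m.group("whole"))
        return m.group(0) if whole is None else whole + " point " + _digits(m.group("frac"))
    if m.group("ord") is not None:
        text = m.group("ord")
        if text[0] == "0" or len(text) > 3 or m.group("suffix") != _expected_suffix(int(text)):
            return m.group(0)         # e.g. "13rd" is left unchanged
        return ordinal(int(text))
    return _spell_integer(m.group("int")) or m.group(0)

def normalize(text):
    return _PATTERN.sub(_replace, text)
```

In this sketch, `normalize("The score is 3.14.")` gives "The score is three point one four." and `normalize("The current time is 1:49 PM.")` gives "The current time is one forty nine p m.", while 1.2.3, 1,234, word26, 26word, 1:49, 1:99 PM, and 13:49 all pass through unchanged because they fail the whitespace-boundary or validity checks.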

Possible follow-up

If maintainers agree this belongs in the KokoroAne English raw-text frontend, I can prepare a small PR with tests for the supported forms plus negative tests for the ambiguous forms above.
