KokoroAne English raw text: consider strict normalization for standalone numbers and times

## Context

KokoroAne English has a much better raw-text frontend since the Misaki lexicon work, but raw numeric text still appears to be handled mostly by tokenization plus lexicon/G2P fallback.

FluidAudio already has `SayAsInterpreter` for SSML `<say-as>`, but the public KokoroAne English raw-text path does not seem to apply a strict text-normalization pass before `KokoroAneEnglishPhonemizer` tokenization.

## Observed / likely affected cases

Common chat-style English text can include:

- `I am 26 years old.`
- `Today is June 13th.`
- `The score is 3.14.`
- `The current time is 1:49 PM.`

In a raw-text KokoroAne path, these can reach word-level G2P or punctuation tokenization in forms that are awkward for TTS. For example, `3.14` can be split around the `.` and come out sounding closer to `three fourteen` than `three point one four`.
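To illustrate the kind of split described above, here is a minimal sketch using a generic word/punctuation tokenizer. The `naive_tokenize` function is hypothetical and is not KokoroAne's actual tokenizer; it only shows how a decimal can lose its meaning once `.` is treated as punctuation.

```python
import re

def naive_tokenize(text):
    # Hypothetical tokenizer: runs of word characters, or single punctuation marks.
    # This is NOT the KokoroAne tokenizer, only a stand-in for illustration.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = naive_tokenize("The score is 3.14.")
# The decimal is now three separate tokens: "3", ".", "14"
print(tokens)
```

Once `3` and `14` are separate tokens, a word-level fallback has no way to know they formed one decimal.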

## Constraints / non-goals

This should probably not become a broad, locale-sensitive text-normalization system in the KokoroAne frontend. A conservative pass should avoid rewriting ambiguous or structured strings where caller intent is unclear.

Examples that should likely be left unchanged unless a larger TN design is accepted:

- version-like strings: `1.2.3`
- separated number formats: `1,234`
- embedded digits: `word26`, `26word`
- bare colon-separated numbers with no meridiem: `1:49`
- invalid times: `1:99 PM`
- 24-hour forms if not explicitly supported: `13:49`
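One way to keep these cases untouched is to accept a candidate only when the whole string matches a strict pattern. The sketch below is a language-neutral illustration in Python, not FluidAudio code; `classify` and `expected_suffix` are hypothetical names, and the exact patterns are assumptions about what "strict" could mean here.

```python
import re

def expected_suffix(n):
    # English ordinal suffix for n: 11th, 12th, 13th are special cases.
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")

def classify(candidate):
    """Return a label for a strict standalone form, or None to leave it unchanged."""
    if re.fullmatch(r"0[0-9]+", candidate):
        return "digits"            # leading-zero strings such as 007
    if re.fullmatch(r"0|[1-9][0-9]*", candidate):
        return "cardinal"
    if re.fullmatch(r"(0|[1-9][0-9]*)\.[0-9]+", candidate):
        return "decimal"           # exactly one dot, digits on both sides
    m = re.fullmatch(r"([1-9][0-9]*)(st|nd|rd|th)", candidate)
    if m and m.group(2) == expected_suffix(int(m.group(1))):
        return "ordinal"           # rejects mismatched suffixes like 13rd
    if re.fullmatch(r"(1[0-2]|[1-9]):[0-5][0-9] ?[AaPp]([Mm]|\.[Mm]\.)", candidate):
        return "time"              # 12-hour clock with an explicit meridiem only
    return None

for text in ["1.2.3", "1,234", "word26", "26word", "1:49", "1:99 PM", "13:49"]:
    print(text, classify(text))    # each of these prints None
```

Because every pattern uses `re.fullmatch`, anything with extra structure (a second dot, a comma, attached letters, a missing meridiem) simply falls through to `None`.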

## Conservative idea

A narrow pre-tokenization pass for KokoroAne English raw text could handle only strict standalone forms:

- standalone cardinal integers: `26` -> `twenty six`
- valid ordinals: `13th` -> `thirteenth`
- leading-zero digit strings: `007` -> `zero zero seven`
- decimals: `3.14` -> `three point one four` or a variant with an explicit pause after `point`
- explicit 12-hour meridiem times: `1:49 PM`, `1:49 p.m.` -> `one forty nine p m`
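The rules above could be sketched as a single regex pass that only fires on whitespace-delimited tokens. This is a language-neutral illustration in Python, not the FluidAudio implementation. All names (`normalize`, `cardinal`, `ordinal`) are hypothetical, and several choices are assumptions: the 999,999 upper bound, `oh` for single-digit minutes, and dropping `:00` minutes.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
IRREGULAR = {"one": "first", "two": "second", "three": "third", "five": "fifth",
             "eight": "eighth", "nine": "ninth", "twelve": "twelfth"}
MAX_N = 999_999  # assumed cap; larger values are left unchanged

def cardinal(n):
    # Spells 0..999_999 with spaces, no hyphens and no "and".
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + (" " + ONES[n % 10] if n % 10 else "")
    if n < 1000:
        return ONES[n // 100] + " hundred" + (" " + cardinal(n % 100) if n % 100 else "")
    return cardinal(n // 1000) + " thousand" + (" " + cardinal(n % 1000) if n % 1000 else "")

def ordinal(n):
    words = cardinal(n).split()
    last = words[-1]
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return " ".join(words[:-1] + [last])

def spell_digits(s):
    return " ".join(ONES[int(d)] for d in s)

def expected_suffix(n):
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")

# A token must start after whitespace (or at the start of the text) and end
# before optional sentence punctuation followed by whitespace or end of text.
PATTERN = re.compile(r"""(?<!\S)(?:
    (?P<hour>1[0-2]|[1-9]):(?P<minute>[0-5][0-9])\ ?(?P<mer>[AaPp])(?:[Mm]|\.[Mm]\.)
  | (?P<dec_int>0|[1-9][0-9]*)\.(?P<dec_frac>[0-9]+)
  | (?P<ord>[1-9][0-9]*)(?P<suffix>st|nd|rd|th)
  | (?P<digits>0[0-9]+)
  | (?P<card>0|[1-9][0-9]*)
)(?=[.,!?;:]?(?:\s|$))""", re.VERBOSE)

def _replace(m):
    if m.group("hour") is not None:
        parts = [cardinal(int(m.group("hour")))]
        minute = int(m.group("minute"))
        if 0 < minute < 10:
            parts.append("oh " + ONES[minute])
        elif minute >= 10:
            parts.append(cardinal(minute))
        parts.append(m.group("mer").lower() + " m")
        return " ".join(parts)
    if m.group("digits") is not None:
        return spell_digits(m.group("digits"))
    number = m.group("dec_int") or m.group("ord") or m.group("card")
    n = int(number)
    if n > MAX_N:
        return m.group(0)  # out of supported range: leave as written
    if m.group("dec_frac") is not None:
        return cardinal(n) + " point " + spell_digits(m.group("dec_frac"))
    if m.group("ord") is not None:
        ok = m.group("suffix") == expected_suffix(n)
        return ordinal(n) if ok else m.group(0)
    return cardinal(n)

def normalize(text):
    return PATTERN.sub(_replace, text)

print(normalize("The current time is 1:49 PM."))
```

In this sketch, strings such as `1.2.3`, `1,234`, `word26`, `1:49`, `1:99 PM`, and `13:49` pass through unchanged because no alternative matches them as a whole token. One side effect worth noting: a sentence-final `p.m.` is consumed entirely, so its closing period is not preserved.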

The implementation could reuse or share logic with `SayAsInterpreter` where appropriate, but keep the raw-text rules stricter than SSML because raw text has no explicit caller annotation.

## Possible follow-up

If maintainers agree this belongs in the KokoroAne English raw-text frontend, I can prepare a small PR with tests for the supported forms plus negative tests for the ambiguous forms above.

