feat(pt-BR): add number parsing support and ordinal numeric formatting #1785

Open
Kaikygabriel wants to merge 3 commits into Humanizr:main from Kaikygabriel:main

Conversation

@Kaikygabriel

Here is a checklist you should tick through before submitting a pull request:

  • Implementation is clean
  • Code adheres to the existing coding standards; e.g. no curlies for one-line blocks, no redundant empty lines between methods or code blocks, spaces rather than tabs, etc.
  • No Code Analysis warnings
  • There is proper unit test coverage
  • If the code is copied from StackOverflow (or a blog or OSS) full disclosure is included. That includes required license files and/or file headers explaining where the code came from with proper attribution
  • There are very few or no comments (because comments shouldn't be needed if you write clean code)
  • Xml documentation is added/updated for the addition/change
  • Your PR is (re)based on top of the latest commits from the main branch (more info below)
  • Link to the issue(s) you're fixing from your PR description. Use fixes #<the issue number>
  • Readme is updated if you change an existing feature or add a new one
  • Run either build.cmd or build.ps1 and ensure there are no test failures

Enhances the pt-BR locale with improved localization and number parsing support.

Changes include:

  • Added token-based number parsing (cardinal and ordinal)
  • Added support for ordinal numeric formatting (e.g., 1º, 2º)
  • Improved time unit symbols for better clarity
  • Adjusted grammatical gender for better linguistic accuracy
  • General consistency improvements across phrases and units
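
The numeric ordinal forms mentioned above (1º, 2º) can be illustrated with a minimal sketch. This is not Humanizer's implementation: the function names `format_ordinal` and `parse_ordinal_abbreviation` are hypothetical, and the only facts taken from this PR are the º/ª suffixes and the lowercase and period-removal normalization described in the review summary.

```python
from typing import Optional

# Suffixes named in the PR walkthrough; everything else here is an assumption.
MASCULINE_SUFFIX = "º"
FEMININE_SUFFIX = "ª"


def format_ordinal(number: int, feminine: bool = False) -> str:
    """Append the gendered ordinal indicator to a number (hypothetical helper)."""
    suffix = FEMININE_SUFFIX if feminine else MASCULINE_SUFFIX
    return f"{number}{suffix}"


def parse_ordinal_abbreviation(text: str) -> Optional[int]:
    """Return the integer for inputs like '1º' or '2.ª', or None if not an ordinal."""
    # Normalize the way the review summary describes: lowercase, drop periods.
    cleaned = text.strip().lower().replace(".", "")
    if cleaned.endswith((MASCULINE_SUFFIX, FEMININE_SUFFIX)):
        digits = cleaned[:-1]
        if digits.isdigit():
            return int(digits)
    return None


print(format_ordinal(1))                   # masculine form
print(format_ordinal(2, feminine=True))    # feminine form
print(parse_ordinal_abbreviation("2.ª"))   # period is removed before parsing
```

Returning `None` for anything that is not digits plus a suffix keeps the sketch from guessing at spelled-out ordinals, which the PR handles through its token maps instead.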

No runtime code changes were made. This PR focuses only on locale improvements.
This brings the pt-BR locale closer to feature parity with the en locale while preserving natural Portuguese usage.

@coderabbitai

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Improvements
    • Portuguese (pt-BR) number parsing: more robust handling of case, periods, negatives, connector words, and ordinal abbreviations for more accurate numeric interpretation.
    • Expanded token mappings for units, tens, hundreds and ordinals, plus parse options to better handle terminal ordinals and hundred multipliers.
    • More natural clock/time expressions in Portuguese: added singular/plural articles and updated minute-offset templates for grammatical correctness.
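
As a rough illustration of the parsing behaviors listed above (case, periods, the `menos` negative prefix, and the ignored connector `e`), here is a small sketch. The map contents and the function name `parse_signed` are assumptions made for the example; the real locale file defines far more tokens, and the actual engine may combine values differently.

```python
# Tiny illustrative token map; the real pt-BR map is much larger.
CARDINAL_MAP = {"um": 1, "dois": 2, "vinte": 20, "trinta": 30}
NEGATIVE_PREFIX = "menos"   # negative prefix named in the review summary
IGNORED_TOKENS = {"e"}      # connector word named in the review summary


def parse_signed(text: str) -> int:
    """Parse a small additive phrase such as 'menos vinte e um' (sketch only)."""
    # Normalization: lowercase and remove periods before tokenizing.
    tokens = text.lower().replace(".", "").split()
    sign = 1
    if tokens and tokens[0] == NEGATIVE_PREFIX:
        sign = -1
        tokens = tokens[1:]
    # Unknown tokens raise KeyError, mirroring a strict exact-match lookup.
    values = [CARDINAL_MAP[token] for token in tokens if token not in IGNORED_TOKENS]
    return sign * sum(values)


print(parse_signed("Menos Vinte e Um."))
print(parse_signed("trinta e dois"))
```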

Walkthrough

Adds a token-map number parser under surfaces.number.parse for pt-BR and updates clock templates to use {nextArticle} with new singularArticle/pluralArticle keys.

Changes

Portuguese-Brazil Locale Updates

| Layer / File(s) | Summary |
|---|---|
| Numeric parsing configuration (`src/Humanizer/Locales/pt-BR.yml`) | Added `surfaces.number.parse` block using the token-map engine with lowercase and period-removal normalization, cardinal and ordinal token-to-value maps (units, tens, hundreds, ordinals), `menos` negative prefix, ignored token `e`, ordinal suffixes `º`/`ª`, and parse option flags for terminal ordinals, hundred multiplication, and invariant integer input. |
| Time formatting: clock phrase templates (`src/Humanizer/Locales/pt-BR.yml`) | Introduced `singularArticle: "a"` and `pluralArticle: "as"`, and updated `min40`, `min45`, `min50`, and `min55` templates to use `{nextArticle}` instead of a hardcoded `as`. |
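
The article selection described for the clock templates can be sketched as follows. This is an assumption-laden illustration, not the library's rendering code: the template text, the `HOUR_WORDS` table, and the `render_min40` function are hypothetical, while the `{nextArticle}` and `{nextHour}` placeholders and the `a`/`as` article values come from the walkthrough.

```python
# Article values from the walkthrough; the template wording is assumed.
SINGULAR_ARTICLE = "a"
PLURAL_ARTICLE = "as"
MIN40_TEMPLATE = "vinte para {nextArticle} {nextHour}"

# Feminine hour words on a 12-hour clock (illustrative table).
HOUR_WORDS = {
    1: "uma", 2: "duas", 3: "três", 4: "quatro", 5: "cinco", 6: "seis",
    7: "sete", 8: "oito", 9: "nove", 10: "dez", 11: "onze", 12: "doze",
}


def render_min40(hour: int) -> str:
    """Render 'twenty to <next hour>' for an hour in 1..12 (sketch only)."""
    next_hour = hour % 12 + 1
    # Only 'uma' is singular; every other hour takes the plural article.
    article = SINGULAR_ARTICLE if next_hour == 1 else PLURAL_ARTICLE
    return MIN40_TEMPLATE.format(nextArticle=article, nextHour=HOUR_WORDS[next_hour])


print(render_min40(1))    # next hour is two, plural article
print(render_min40(12))   # next hour is one, singular article
```

A hardcoded `as` would yield "as uma" when the next hour is one, which is the grammatical problem the placeholder is meant to avoid.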

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 In pt‑BR the leap is precise,
never cutting the notice's stride short,
Numbers learn to speak properly,
Clocks ask for articles as trimming,
A happy scribble celebrates the work.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main changes: adding number parsing support and ordinal numeric formatting to the pt-BR locale, which aligns directly with the primary modifications in the changeset. |
| Description check | ✅ Passed | The description provides relevant context about the locale enhancements, including number parsing and ordinal formatting, which directly relate to the changes made in the YAML configuration file. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05ffb8e3a4


noventa: 90
cem: 100
mil: 1000
milhão: 1000000


P1: Add missing milhões token to pt-BR cardinal map

The new token-map parser for pt-BR only includes milhão but omits the plural milhões, even though this locale’s number-to-words output uses plural million forms (for example, values like 2,000,000 are rendered with milhões). TokenMapWordsToNumberConverter resolves scale words via exact token lookup, so dois milhões is treated as unrecognized and parsing fails for common million-range inputs.
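
The failure mode described in this comment can be reproduced with a simplified exact-lookup parser. The sketch below is not `TokenMapWordsToNumberConverter`; `parse_cardinal` and both maps are hypothetical stand-ins that only assume what the comment states, namely that scale words are resolved by exact token match.

```python
from typing import Dict

# Map as described in the review: singular scale word only.
BASE_MAP: Dict[str, int] = {"um": 1, "dois": 2, "mil": 1000, "milhão": 1_000_000}
# Same map with the plural form added, as the comment suggests.
FIXED_MAP: Dict[str, int] = {**BASE_MAP, "milhões": 1_000_000}


def parse_cardinal(text: str, cardinal_map: Dict[str, int]) -> int:
    """Accumulate values, multiplying by scale words (simplified sketch)."""
    total = 0
    current = 0
    for token in text.lower().split():
        if token == "e":
            continue  # connector word is ignored
        if token not in cardinal_map:
            raise ValueError(f"unrecognized token: {token}")
        value = cardinal_map[token]
        if value >= 1000:
            # A bare scale word such as 'mil' counts as one unit of that scale.
            total += max(current, 1) * value
            current = 0
        else:
            current += value
    return total + current


print(parse_cardinal("dois milhões", FIXED_MAP))
try:
    parse_cardinal("dois milhões", BASE_MAP)
except ValueError as error:
    print(error)
```

With the singular-only map the plural token is rejected outright, while the one-line addition lets the same input parse.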


setenta: 70
oitenta: 80
noventa: 90
cem: 100


P1: Add cento/hundreds tokens needed for canonical pt-BR parsing

This cardinal map defines cem but not cento (nor other hundred words), while the same locale’s number-to-words surface emits forms like cento e um and duzentos. With token-map parsing, those words must exist in cardinalMap to be recognized, so many standard pt-BR numbers in the 101–999 range will fail to parse even though they are canonical outputs of the locale.
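
One way to catch this class of gap is a coverage check that compares the words a locale emits against the keys its parse map accepts. The sketch below is hypothetical: `missing_tokens`, the sample phrases, and the sample map are invented for illustration and are not taken from the repository's test suite.

```python
from typing import Dict, Iterable, List


def missing_tokens(
    emitted_phrases: Iterable[str],
    cardinal_map: Dict[str, int],
    ignored: Iterable[str] = ("e",),
) -> List[str]:
    """List emitted words that the parse map would not recognize (sketch only)."""
    ignored_set = set(ignored)
    missing = set()
    for phrase in emitted_phrases:
        for token in phrase.lower().split():
            if token not in ignored_set and token not in cardinal_map:
                missing.add(token)
    return sorted(missing)


# Phrases of the kind the comment says number-to-words produces.
sample_output = ["cem", "cento e um", "duzentos"]
# Map resembling the one under review: 'cem' present, other hundreds absent.
sample_map = {"um": 1, "cem": 100}

print(missing_tokens(sample_output, sample_map))
```

Running such a check over the full set of emitted words would flag `cento` and the other hundreds in one pass rather than one review comment at a time.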


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Humanizer/Locales/pt-BR.yml`:
- Around line 342-378: Add the missing numeric word keys to the cardinalMap so
parsing handles plural and hundred forms: include plural large-number keys
("milhões", "trilhões", "quadrilhões", "quintilhões"), add hundreds entries
matching number.words.cardinal.hundredsMap ("duzentos", "trezentos",
"quatrocentos", "quinhentos", "seiscentos", "setecentos", "oitocentos",
"novecentos"), and ensure "milhões" maps to 1000000 while the plural
large-number keys map to their corresponding powers of 1,000; update the
cardinalMap block (the YAML mapping named cardinalMap) to contain these keys and
their numeric values to mirror the singular forms already present (e.g.,
"milhão":1000000).
- Around line 441-444: Add Portuguese articles for nextHour in the pt-BR clock
templates: define singularArticle and pluralArticle entries in the pt-BR clock
section (same keys used by other locales) and update min40, min45, min50 and
min55 to include the nextArticle token before nextHour (use "{nextArticle}
{nextHour}" pattern) so phrases like "vinte para as duas" are produced; modify
the existing min40/min45/min50/min55 templates and add
singularArticle/pluralArticle keys accordingly in the pt-BR locale block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 51370ed1-4ba6-4b98-bdc2-fc343f3825e3

📥 Commits

Reviewing files that changed from the base of the PR and between f9292aa and 05ffb8e.

📒 Files selected for processing (1)
  • src/Humanizer/Locales/pt-BR.yml


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/Humanizer/Locales/pt-BR.yml (1)

238-238: 💤 Low value

Consider using an abbreviated symbol for week.

Other time unit symbols use short abbreviations (ms, s, min, h, d, m, a), but week uses the full word semana. While Portuguese lacks a universally accepted abbreviation for "week," using sem or sem. would maintain consistency with other symbols. If the full word is intentional for clarity, feel free to disregard.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/Humanizer/Locales/pt-BR.yml` at line 238, The pt-BR locale uses the full
word for the week symbol ("symbol: 'semana'"); change that value to a short
abbreviation (e.g., "sem" or "sem.") to match the other short unit symbols (ms,
s, min, h, d, m, a) and keep consistency across time unit symbols—update the
"symbol: 'semana'" entry accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4603d172-ce6f-4c3a-a670-d162a4f5d93c

📥 Commits

Reviewing files that changed from the base of the PR and between d25e491 and 28d6505.

📒 Files selected for processing (1)
  • src/Humanizer/Locales/pt-BR.yml


1 participant