[FEAT] sherpa-onnx.rn: expose offline CT-Transformer punctuation engine via PunctuationService #413

@brainlybai

Environment

  • expo-audio-studio version: n/a (this is a @siteed/sherpa-onnx.rn feature request)
  • Package: @siteed/sherpa-onnx.rn 1.3.1-beta.1
  • Expo SDK version: 55
  • Platform & OS version: iOS 17 + Android 14 (both affected)
  • Device: any

Note: filed against this repo because sherpa-onnx.rn lives in this monorepo. Not related to expo-audio-studio.

Description

The package already vendors the offline CT-Transformer punctuation bindings on both platforms:

  • iOS: SherpaOnnxCreateOfflinePunctuation, SherpaOfflinePunctuationAddPunct, SherpaOnnxDestroyOfflinePunctuation (C API in CSherpaOnnx)
  • Android: com.k2fsa.sherpa.onnx.OfflinePunctuation (Kotlin)

However, the RN PunctuationService / PunctuationHandler only reaches the online CNN-BiLSTM engine. A JS caller currently has no way to use the offline CT-Transformer models, even though the native code paths are compiled in and shipped with every install.

This blocks use of the sherpa-onnx-punct-ct-transformer-zh-en-vocab272727 family of models, which produce Chinese and English punctuation from a single ~70 MB int8 model. That is useful for any ASR pipeline that needs both languages from one punctuator.

Expected Behavior

A JS caller can opt into the offline engine by passing a model filename in PunctuationModelConfig, e.g.:

await PunctuationService.init({
  modelDir: '/abs/path/to/punct-ct-transformer-zh-en',
  model: 'model.int8.onnx',
});
const out = await PunctuationService.addPunctuation('你好 hello world how are you');
// → '你好, hello world. How are you?'

When model is set, the native side uses the offline CT-Transformer engine and ignores cnnBilstm / bpeVocab. When model is unset, behavior is unchanged (existing online callers see no regression).
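The selection rule above can be sketched in TypeScript. This is only an illustration of the proposed semantics: the interface shape, the `PunctuationEngine` type, and the `selectEngine` helper are hypothetical names, not the package's actual API, and treating an empty string as "unset" is an assumption.

```typescript
// Hypothetical config shape; the real PunctuationModelConfig may differ.
interface PunctuationModelConfig {
  modelDir: string;
  model?: string;     // proposed: offline CT-Transformer model filename
  cnnBilstm?: string; // existing: online CNN-BiLSTM model filename
  bpeVocab?: string;  // existing: online BPE vocabulary filename
}

// Hypothetical labels for the two engines discussed in this issue.
type PunctuationEngine = 'offline-ct-transformer' | 'online-cnn-bilstm';

// Presence of `model` opts into the offline engine; otherwise the
// existing online path is kept. An empty string counts as unset (assumption).
function selectEngine(config: PunctuationModelConfig): PunctuationEngine {
  return config.model ? 'offline-ct-transformer' : 'online-cnn-bilstm';
}
```

With this rule, a config that sets both `model` and `cnnBilstm` resolves to the offline engine, matching the "ignores cnnBilstm / bpeVocab" behavior described above.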

Actual Behavior

PunctuationService.init only accepts cnnBilstm + bpeVocab and always constructs an OnlinePunctuation. The offline OfflinePunctuation class is unreachable from JS.

Logs

n/a (missing feature, not a crash)

Configuration

// Today: only online engine reachable
await PunctuationService.init({
  modelDir,
  cnnBilstm: 'model.onnx',
  bpeVocab: 'bpe.vocab',
});

// Desired: offline CT-Transformer engine
await PunctuationService.init({
  modelDir,
  model: 'model.int8.onnx',
});

Fix

Wire the existing offline bindings through PunctuationHandler (Kotlin + Swift) and PunctuationService (TS). The change is backwards-compatible: a new optional model field opts into the offline engine, and omitting model keeps the current online behavior.
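As a rough sketch of what the TS side of that wiring might do before calling into native code, the snippet below resolves filenames against modelDir and produces a tagged payload. All names here (`InitOptions`, `NativePunctuationArgs`, `joinPath`, `toNativeArgs`) are hypothetical, and the assumption that the online path requires both cnnBilstm and bpeVocab is mine; the actual change in PR #410 may be structured differently.

```typescript
// Hypothetical options accepted from JS callers.
interface InitOptions {
  modelDir: string;
  model?: string;
  cnnBilstm?: string;
  bpeVocab?: string;
}

// Hypothetical payload handed to the native handler; field names are illustrative.
type NativePunctuationArgs =
  | { engine: 'offline'; modelPath: string }
  | { engine: 'online'; cnnBilstmPath: string; bpeVocabPath: string };

// Join a directory and a filename with exactly one slash between them.
function joinPath(dir: string, file: string): string {
  return `${dir.replace(/\/+$/, '')}/${file}`;
}

function toNativeArgs(opts: InitOptions): NativePunctuationArgs {
  if (opts.model) {
    // Offline engine: the online-only fields are ignored entirely.
    return { engine: 'offline', modelPath: joinPath(opts.modelDir, opts.model) };
  }
  // Assumption: the online engine needs both files to be named explicitly.
  if (!opts.cnnBilstm || !opts.bpeVocab) {
    throw new Error('Online engine requires both cnnBilstm and bpeVocab');
  }
  return {
    engine: 'online',
    cnnBilstmPath: joinPath(opts.modelDir, opts.cnnBilstm),
    bpeVocabPath: joinPath(opts.modelDir, opts.bpeVocab),
  };
}
```

Keeping the payload a discriminated union means the native side can branch on a single `engine` field, and existing online callers produce the same paths as before.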

PR: #410

Checklist

  • Verified offline bindings already exist in iOS CSherpaOnnx and Android Kotlin
  • Implemented end-to-end and validated in a downstream RN 0.85 / New Architecture app
  • Existing online callers remain unchanged (verified via unit test)
