Merge pull request #888 from Vonage/feature/VIDPA-1417/update_docs_with_captions_and_ind

markbackman · web-flow · commit a3fb63fa9c86 · 2026-06-15T21:54:24.000-04:00
Adding docs for new Vonage WebRTC transport features in PR #4686
diff --git a/api-reference/server/services/transport/vonage.mdx b/api-reference/server/services/transport/vonage.mdx
@@ -83,8 +83,10 @@ Before using `VonageVideoConnectorTransport`, you need:
 
 - **Vonage Video API**: Integrate with Vonage's managed WebRTC infrastructure
 - **Audio and Video I/O**: Bidirectional audio and video streaming
+- **Captions**: Receive real-time transcription frames from session participants
+- **Individual Audio Streams**: Subscribe to per-participant audio in addition to the session-level mix
 - **Participant Management**: Stream subscription and participant lifecycle events
-- **Auto-subscription**: Optionally auto-subscribe to incoming audio and video streams
+- **Auto-subscription**: Optionally auto-subscribe to incoming audio, video, and captions streams
 - **Interruption Handling**: Automatic media buffer clearing on pipeline interruptions
 
 ## Configuration
@@ -143,6 +145,16 @@ Inherits all parameters from [TransportParams](/api-reference/server/services/tr
   participants.
 </ParamField>
 
+<ParamField path="captions_in_enabled" type="bool" default="False">
+  Whether to enable captions input. When enabled, the transport will process
+  incoming transcription frames from subscribers.
+</ParamField>
+
+<ParamField path="captions_in_auto_subscribe" type="bool" default="False">
+  Whether to automatically subscribe to incoming captions streams from session
+  participants. Requires `captions_in_enabled` to be `True`.
+</ParamField>
+
 <ParamField
   path="video_in_preferred_resolution"
   type="tuple[int, int]"
@@ -171,7 +183,7 @@ Inherits all parameters from [TransportParams](/api-reference/server/services/tr
 
 ### SubscribeSettings
 
-Used with `subscribe_to_stream()` to control per-stream subscription quality when `audio_in_auto_subscribe` or `video_in_auto_subscribe` are disabled.
+Used with `subscribe_to_stream()` to control per-stream subscription quality when `audio_in_auto_subscribe`, `video_in_auto_subscribe`, or `captions_in_auto_subscribe` is disabled.
 
 <ParamField path="subscribe_to_audio" type="bool" default="True">
   Whether to subscribe to the stream's audio track.
@@ -181,6 +193,10 @@ Used with `subscribe_to_stream()` to control per-stream subscription quality whe
   Whether to subscribe to the stream's video track.
 </ParamField>
 
+<ParamField path="subscribe_to_captions" type="bool" default="False">
+  Whether to subscribe to the stream's captions track.
+</ParamField>
+
 <ParamField path="preferred_resolution" type="tuple[int, int]" default="None">
   Preferred `(width, height)` resolution for the subscribed video track. The
   server provides the closest available quality if the exact resolution is
@@ -227,7 +243,7 @@ See the [complete example](https://github.com/pipecat-ai/pipecat/blob/main/examp
 
 ### Subscribing to streams manually
 
-When `audio_in_auto_subscribe` or `video_in_auto_subscribe` is disabled, subscribe to a specific participant's stream with `subscribe_to_stream()`, passing [SubscribeSettings](#subscribesettings) to control which tracks are received and at what quality. The `streamId` is available from the `on_participant_joined` event data.
+When `audio_in_auto_subscribe`, `video_in_auto_subscribe`, or `captions_in_auto_subscribe` is disabled, subscribe to a specific participant's stream with `subscribe_to_stream()`, passing [SubscribeSettings](#subscribesettings) to control which tracks are received and at what quality. The `streamId` is available from the `on_participant_joined` event data.
 
 ```python
 from pipecat.transports.vonage.video_connector import SubscribeSettings
@@ -237,12 +253,35 @@ await transport.subscribe_to_stream(
     params=SubscribeSettings(
         subscribe_to_audio=True,
         subscribe_to_video=True,
+        subscribe_to_captions=True,
         preferred_resolution=(1280, 720),
         preferred_framerate=30,
     ),
 )
 ```
 
+### Receiving captions
+
+Enable captions to receive real-time `TranscriptionFrame` and `InterimTranscriptionFrame` frames from participants. Each frame includes the `user_id` (stream ID) of the speaker.
+
+```python
+transport = VonageVideoConnectorTransport(
+    application_id,
+    session_id,
+    token,
+    VonageVideoConnectorTransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        captions_in_enabled=True,
+        captions_in_auto_subscribe=True,
+    ),
+)
+```
+
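How a consumer might treat the two caption frame types can be sketched without the transport itself. The snippet below is a minimal, self-contained illustration, not the library's API: `Caption` is a hypothetical stand-in for the `TranscriptionFrame` / `InterimTranscriptionFrame` pair, and it assumes interim results can simply be dropped because a final result for the same `user_id` follows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Caption:
    """Hypothetical stand-in for a caption frame; not a Pipecat class."""

    user_id: str  # stream ID of the speaker
    text: str
    final: bool  # True mirrors TranscriptionFrame, False mirrors InterimTranscriptionFrame


def final_transcripts(captions):
    """Join the final caption text per speaker, skipping interim results."""
    by_speaker = defaultdict(list)
    for caption in captions:
        if caption.final:
            by_speaker[caption.user_id].append(caption.text)
    return {user_id: " ".join(parts) for user_id, parts in by_speaker.items()}


captions = [
    Caption("stream-a", "hel", final=False),
    Caption("stream-a", "hello there", final=True),
    Caption("stream-b", "hi", final=True),
    Caption("stream-a", "how are you", final=True),
]
transcripts = final_transcripts(captions)
```

In a real pipeline the same grouping would presumably live in a frame processor downstream of the transport, keyed on the `user_id` carried by each caption frame.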
+### Individual audio streams
+
+By default, audio input is received as a session-level mix of all participants. When you subscribe to a stream (either manually or via auto-subscribe), the transport also delivers per-subscriber `UserAudioRawFrame` frames with a `user_id` field identifying the source participant. This enables use cases like speaker diarization or per-participant processing.
+
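The per-participant routing described above can be illustrated with a small, self-contained sketch. `AudioChunk` is a hypothetical stand-in for the audio frames (it is not a Pipecat class), and the sketch assumes that session-mix audio carries no `user_id` while per-participant audio does; check the actual frame types before relying on that distinction.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioChunk:
    """Hypothetical stand-in for an audio frame; not a Pipecat class."""

    audio: bytes
    user_id: Optional[str] = None  # assumption: None marks the session-level mix


def split_by_participant(chunks):
    """Separate session-mix audio from per-participant audio buffers."""
    mix = bytearray()
    per_participant = defaultdict(bytearray)
    for chunk in chunks:
        if chunk.user_id is None:
            mix.extend(chunk.audio)
        else:
            per_participant[chunk.user_id].extend(chunk.audio)
    return bytes(mix), {uid: bytes(buf) for uid, buf in per_participant.items()}


chunks = [
    AudioChunk(b"\x01\x02"),
    AudioChunk(b"\x0a", user_id="stream-a"),
    AudioChunk(b"\x0b", user_id="stream-b"),
    AudioChunk(b"\x0c", user_id="stream-a"),
]
mix, per_participant = split_by_participant(chunks)
```

Keeping one buffer per `user_id` is what makes per-speaker processing such as diarization straightforward, since each buffer contains audio from a single source.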
 ## Event Handlers
 
 `VonageVideoConnectorTransport` provides event handlers for session lifecycle, participant stream management, and subscriber connectivity. Register handlers using the `@event_handler` decorator on the transport instance.