
STT WebSocket

Production endpoint for real-time speech-to-text, used by Vista clients.

| Parameter | Value |
|---|---|
| URL | `wss://api.libraxis.cloud/stt/v1/stream` |
| Auth | `x-api-key: <LIBRAXIS_API_KEY>` |
| Protocol | JSON (init → audio → interim/final → end) |

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    C->>S: init (language, sample_rate, encoding)
    S-->>C: ready (session_id, model, language)
    loop Audio streaming
        C->>S: audio (base64 PCM16)
        S-->>C: interim (text, confidence)
    end
    C->>S: end
    S-->>C: final (text, confidence, words)
    S-->>C: ended
```
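
The exchange above maps naturally onto a tagged union keyed by `type`. The sketch below models only the server-to-client fields shown on this page; `ServerMessage`, `parseServerMessage` and `describeMessage` are hypothetical client-side names, not part of a published SDK.

```typescript
// Hypothetical model of the server -> client messages documented on this page.
interface ReadyMsg { type: "ready"; session_id: string; model: string; language: string; sample_rate: number }
interface InterimMsg { type: "interim"; text: string; confidence: number; language: string }
interface WordInfo { word: string; start: number; end: number; confidence: number }
interface FinalMsg { type: "final"; text: string; confidence: number; words: WordInfo[] }
interface EndedMsg { type: "ended" }

type ServerMessage = ReadyMsg | InterimMsg | FinalMsg | EndedMsg;

const KNOWN_TYPES = new Set(["ready", "interim", "final", "ended"]);

// Parses one text frame; returns null for malformed JSON or types this sketch does not model.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const type = (data as { type?: unknown }).type;
  if (typeof type !== "string" || !KNOWN_TYPES.has(type)) return null;
  // Field-level validation is omitted for brevity; the cast trusts the server's shape.
  return data as ServerMessage;
}

// The switch is exhaustive over the union, so every message type is handled.
function describeMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "ready": return `session ${msg.session_id} (${msg.model}, ${msg.language})`;
    case "interim": return `partial: ${msg.text}`;
    case "final": return `final: ${msg.text} [${msg.words.length} words]`;
    case "ended": return "session ended";
  }
}
```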

The client opens the session with an `init` message:

```json
{
  "type": "init",
  "language": "pl",
  "sample_rate": 16000,
  "encoding": "pcm16",
  "session_id": "optional"
}
```
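
A small helper can build the `init` frame from the fields above. This is a sketch: the defaults (`pl`, `16000`, `pcm16`) are taken from the example, and `session_id` is omitted when not provided, on the assumption that leaving it out is what the `"optional"` placeholder means. `buildInitMessage` is a hypothetical name.

```typescript
interface InitOptions {
  language?: string;
  sampleRate?: number;
  encoding?: "pcm16";
  sessionId?: string; // optional; left out of the frame when undefined (assumption)
}

// Builds the JSON text of the init frame (hypothetical helper, not a published SDK call).
function buildInitMessage(opts: InitOptions = {}): string {
  const frame: Record<string, string | number> = {
    type: "init",
    language: opts.language ?? "pl",
    sample_rate: opts.sampleRate ?? 16000,
    encoding: opts.encoding ?? "pcm16",
  };
  if (opts.sessionId !== undefined) frame.session_id = opts.sessionId;
  return JSON.stringify(frame);
}
```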

The server responds with `ready`:

```json
{
  "type": "ready",
  "session_id": "sess_abc",
  "model": "whisper-large-v3",
  "language": "pl",
  "sample_rate": 16000
}
```
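
Because `ready` echoes `language` and `sample_rate`, a client can compare them with what it requested before sending audio. The sketch below only reports mismatches; whether the server ever substitutes different values is not stated on this page, and `readyMismatches` is a hypothetical name.

```typescript
interface ReadyFrame { type: "ready"; session_id: string; model: string; language: string; sample_rate: number }

// Lists human-readable differences between the requested init values and the ready reply.
function readyMismatches(ready: ReadyFrame, requested: { language: string; sampleRate: number }): string[] {
  const problems: string[] = [];
  if (ready.language !== requested.language) {
    problems.push(`language: requested ${requested.language}, got ${ready.language}`);
  }
  if (ready.sample_rate !== requested.sampleRate) {
    problems.push(`sample_rate: requested ${requested.sampleRate}, got ${ready.sample_rate}`);
  }
  return problems;
}
```

A non-empty result could be logged and followed by `end` rather than streaming audio in a format the server did not confirm.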
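
Audio is then streamed as a sequence of `audio` messages:

```json
{
  "type": "audio",
  "audio": "<base64 of PCM16LE mono @16kHz>"
}
```

  • Chunk size: 20 ms–1 s
  • Recommended: 32 kB (~1 s; 16000 samples/s × 2 bytes per sample)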
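
At 16 kHz, 16-bit mono, one second of audio is 16000 × 2 = 32000 bytes, which is where the ~32 kB recommendation comes from. The sketch below slices a PCM16LE buffer into chunks of that size, rejects chunk sizes outside the 20 ms–1 s window, and wraps each chunk in an `audio` frame. The base64 encoder is written out by hand so the example runs without Node's `Buffer`; all function names are hypothetical, and how the server treats a final chunk shorter than 20 ms is not documented here.

```typescript
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2; // PCM16, mono
const RECOMMENDED_CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE; // 32000 bytes ~ 1 s

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648 alphabet) with '=' padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = i + 1 < bytes.length ? bytes[i + 1] ?? 0 : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] ?? 0 : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;
    out += B64.charAt((n >> 18) & 63) + B64.charAt((n >> 12) & 63);
    out += i + 1 < bytes.length ? B64.charAt((n >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64.charAt(n & 63) : "=";
  }
  return out;
}

function chunkDurationMs(byteLength: number): number {
  return (byteLength / (SAMPLE_RATE * BYTES_PER_SAMPLE)) * 1000;
}

// Splits raw PCM16LE bytes into audio frames; the last chunk may be shorter than chunkBytes.
function toAudioFrames(pcm: Uint8Array, chunkBytes = RECOMMENDED_CHUNK_BYTES): string[] {
  const ms = chunkDurationMs(chunkBytes);
  if (chunkBytes % BYTES_PER_SAMPLE !== 0 || ms < 20 || ms > 1000) {
    throw new RangeError(`chunk of ${chunkBytes} bytes (${ms} ms) is outside 20 ms-1 s or splits a sample`);
  }
  const frames: string[] = [];
  for (let off = 0; off < pcm.length; off += chunkBytes) {
    const chunk = pcm.subarray(off, off + chunkBytes);
    frames.push(JSON.stringify({ type: "audio", audio: toBase64(chunk) }));
  }
  return frames;
}
```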

While audio is streaming, the server sends `interim` results:

```json
{
  "type": "interim",
  "text": "...",
  "confidence": 0.85,
  "language": "pl"
}
```
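
After the client sends `end`, the server returns the `final` result with per-word timings and confidence:

```json
{
  "type": "final",
  "text": "...",
  "confidence": 0.94,
  "words": [
    {
      "word": "...",
      "start": 0.0,
      "end": 0.5,
      "confidence": 0.96
    }
  ]
}
```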
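
One possible use of the `words` array, shown as an assumption rather than documented behavior, is flagging uncertain words for review in the UI. The 0.8 threshold and the function names are arbitrary placeholders; the page does not state the units of `start`/`end`, so the span is reported in whatever units the server uses (presumably seconds).

```typescript
interface FinalWord { word: string; start: number; end: number; confidence: number }
interface FinalFrame { type: "final"; text: string; confidence: number; words: FinalWord[] }

// Words whose confidence falls below the (arbitrary) threshold, with their timings.
function lowConfidenceWords(msg: FinalFrame, threshold = 0.8): FinalWord[] {
  return msg.words.filter((w) => w.confidence < threshold);
}

// Time covered by the word timings, in the server's units; 0 when there are no words.
function spokenSpan(msg: FinalFrame): number {
  if (msg.words.length === 0) return 0;
  const starts = msg.words.map((w) => w.start);
  const ends = msg.words.map((w) => w.end);
  return Math.max(...ends) - Math.min(...starts);
}
```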

| Message | Response | Description |
|---|---|---|
| `{"type": "pause"}` | | Pause streaming |
| `{"type": "resume"}` | | Resume streaming |
| `{"type": "end"}` | `{"type": "ended"}` | End the session |

| Limit | Value |
|---|---|
| Max connections (global) | 100 |
| Max connections per IP | 10 |
| Max connections per API key | 20 |
| Idle timeout | 300 s |
| Max session duration | 3600 s |
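
With a 300 s idle timeout and a 3600 s cap per session, a long recording has to be split across sessions. The hypothetical helper below decides when the client should wrap up. It treats "idle" as time since the last frame the client sent, which is an assumption (the page does not define what resets the idle timer), and keeps a safety margin so `end` goes out before the server closes the socket.

```typescript
const IDLE_TIMEOUT_MS = 300_000;
const MAX_SESSION_MS = 3_600_000;

type GuardAction = "continue" | "end_session" | "rotate_session";

// All timestamps are epoch milliseconds; marginMs is an arbitrary safety buffer.
function sessionGuard(startedAt: number, lastSentAt: number, now: number, marginMs = 10_000): GuardAction {
  // Close this session and open a new one with a fresh init before the hard cap.
  if (now - startedAt >= MAX_SESSION_MS - marginMs) return "rotate_session";
  // Nothing sent for almost the whole idle window: finish cleanly with end.
  if (now - lastSentAt >= IDLE_TIMEOUT_MS - marginMs) return "end_session";
  return "continue";
}
```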

| File | Role |
|---|---|
| `src-tauri/src/engines/recordings/visit_audio.rs` | `libraxis_stt::WebsocketClient` |
| `unified_ai/service_resolver/provider_registry.rs` | Provider routing |
| `src/hooks/audio/useAudioRecording/uploadController.ts` | Frontend upload controller |

| Environment | URL |
|---|---|
| Production | `wss://api.libraxis.cloud/stt/v1/stream` |
| Development | `VISTASCRIBE_URL` via `provider_registry` |
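
In the app, environment routing lives in `provider_registry.rs`; the TypeScript sketch below only mirrors the table above for illustration. It assumes `VISTASCRIBE_URL` is read from an environment map and, rather than guessing a fallback, fails when the variable is missing in development. `resolveSttUrl` is a hypothetical name.

```typescript
const PRODUCTION_STT_URL = "wss://api.libraxis.cloud/stt/v1/stream";

type EnvMap = Record<string, string | undefined>;

// Hypothetical resolver: production uses the fixed URL, development requires VISTASCRIBE_URL.
function resolveSttUrl(mode: "production" | "development", env: EnvMap): string {
  if (mode === "production") return PRODUCTION_STT_URL;
  const url = env["VISTASCRIBE_URL"];
  if (!url) throw new Error("VISTASCRIBE_URL is not set for development");
  return url;
}
```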

  • Preferred: PCM16LE, 16 kHz, mono
  • If the input is OGG/OPUS: transcode it to PCM16 in the client (`convert_to_pcm_stereo`)
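
If the client already has decoded float samples (for example from the Web Audio API, range -1..1), producing PCM16LE mono is a downmix plus a clamp-and-scale. This sketch assumes interleaved stereo input that is already at 16 kHz; decoding OGG/OPUS and resampling are not shown. `interleavedStereoToMono` and `floatToPcm16le` are hypothetical names, not the app's `convert_to_pcm_stereo`.

```typescript
// Averages interleaved L/R float samples into a mono signal.
function interleavedStereoToMono(stereo: Float32Array): Float32Array {
  const mono = new Float32Array(Math.floor(stereo.length / 2));
  for (let i = 0; i < mono.length; i++) {
    mono[i] = ((stereo[2 * i] ?? 0) + (stereo[2 * i + 1] ?? 0)) / 2;
  }
  return mono;
}

// Converts floats in [-1, 1] to signed 16-bit little-endian bytes, clamping out-of-range input.
function floatToPcm16le(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Asymmetric scaling so -1 maps to -32768 and +1 maps to 32767.
    const int16 = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(i * 2, int16, true); // true = little-endian
  }
  return bytes;
}
```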
  1. Handle `interim` and `final` messages
  2. Surface partials in the UI
  3. Send `end` when the stream completes or when the user stops speaking
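
Steps 1 and 2 can be handled with a small reducer: each `interim` replaces the on-screen partial, and `final` commits its text and clears the partial. The `TranscriptView` shape and the rendering rule (committed text followed by the current partial) are assumptions about how a UI might present partials; the reducer also tolerates more than one `final`, which the diagram does not require.

```typescript
type TranscriptEvent =
  | { type: "interim"; text: string; confidence: number }
  | { type: "final"; text: string; confidence: number }
  | { type: "ended" };

interface TranscriptView {
  committed: string[]; // text from final messages, in arrival order
  partial: string;     // latest interim text, replaced on every interim
  done: boolean;       // true once the server sends ended
}

const EMPTY_VIEW: TranscriptView = { committed: [], partial: "", done: false };

function reduceTranscript(view: TranscriptView, ev: TranscriptEvent): TranscriptView {
  switch (ev.type) {
    case "interim":
      return { ...view, partial: ev.text };
    case "final":
      return { ...view, committed: [...view.committed, ev.text], partial: "" };
    case "ended":
      return { ...view, done: true };
  }
}

// Text to render: committed segments followed by the current partial, skipping empties.
function displayText(view: TranscriptView): string {
  return [...view.committed, view.partial].filter((s) => s.length > 0).join(" ");
}
```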

Errors with `recoverable=true` should not terminate the session:

  • Retry or continue streaming
  • Log to `secure_logger` for visibility
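
The page documents only the `recoverable` flag, so the sketch below assumes an error frame like `{"type": "error", "message": "...", "recoverable": true}`. Recoverable errors are logged and streaming continues; anything else, including a missing flag, closes the session. The `log` callback stands in for `secure_logger`, whose API is not shown here, and the backoff schedule for retries is an arbitrary example.

```typescript
// Assumed error frame shape; only `recoverable` is documented on this page.
interface ErrorFrame { type: "error"; message?: string; recoverable?: boolean }

type ErrorDecision = "continue" | "close";

function handleSttError(err: ErrorFrame, log: (line: string) => void): ErrorDecision {
  const recoverable = err.recoverable === true; // missing flag treated as non-recoverable
  log(`stt error (recoverable=${recoverable}): ${err.message ?? "<no message>"}`);
  return recoverable ? "continue" : "close";
}

// Exponential backoff for retries: 500 ms, 1 s, 2 s, ... capped at maxMs.
function retryDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```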

For a minimal client using ffmpeg, see `~/.ai-collaborators/ws-stt-quickstart.md`.

Copy the sample and set `LIBRAXIS_API_KEY`.
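
A minimal fail-fast check, assuming the key comes from the `LIBRAXIS_API_KEY` environment variable and is sent as the `x-api-key` header listed at the top of this page. Note that the browser `WebSocket` constructor cannot set custom headers, so header-based auth like this applies to native or Node clients rather than code running in a web page. `sttConnection` is a hypothetical name; in Node it could be called as `sttConnection(process.env)`.

```typescript
interface SttConnection {
  url: string;
  headers: Record<string, string>;
}

// Reads the API key from an env map and fails fast if it is missing or blank.
function sttConnection(
  env: Record<string, string | undefined>,
  url = "wss://api.libraxis.cloud/stt/v1/stream",
): SttConnection {
  const key = env["LIBRAXIS_API_KEY"]?.trim();
  if (!key) throw new Error("LIBRAXIS_API_KEY is not set");
  return { url, headers: { "x-api-key": key } };
}
```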