VistaScribe

VistaScribe is a local STT server built on FastAPI. It provides speech-to-text, AI formatting, and diagnostics.

| Parameter | Value |
| --- | --- |
| Base URL | http://127.0.0.1:8237 |
| Framework | FastAPI |
| Firewall | Loopback-only |

| Endpoint | Transport | Purpose |
| --- | --- | --- |
| GET /healthz | HTTP | Liveness/readiness: Whisper availability and AI provider info |
| GET /version | HTTP | Build metadata: model paths, Light+ status, AI toggles |
| POST /transcribe | Multipart HTTP | Single-shot STT (wav/mp3/m4a/flac/webm) |
| POST /stream/transcribe | Chunked HTTP (NDJSON) | Streaming STT without sockets |
| WS /ws/transcribe | WebSocket | Bidirectional streaming STT with the lowest latency |
| POST /format | JSON | AI formatting only (Light+ → Harmony/Ollama) |
| POST /stt_and_format | Multipart HTTP | Convenience: /transcribe + /format |
| POST /demo/chat | JSON | Chat proxy for Voice & Chat Lab |

curl http://127.0.0.1:8237/healthz | jq
curl -X POST \
-F "audio=@/path/to/sample.wav" \
http://127.0.0.1:8237/transcribe

Response:

{ "text": "Pan pies czuje się dobrze..." }
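
The same single-shot request can be issued from JavaScript. The following is a minimal sketch, assuming Node 18+ (global `fetch`, `FormData`, and `Blob`). The helper names `buildTranscribeForm` and `transcribeFile` are hypothetical; only the `/transcribe` path, the `audio` field name, and the `text` response field are taken from the example above.

```javascript
const BASE_URL = 'http://127.0.0.1:8237';

// Build the multipart body; the field name "audio" matches the curl example.
// The default filename and MIME type are illustrative assumptions.
function buildTranscribeForm(bytes, filename = 'sample.wav', mimeType = 'audio/wav') {
  const form = new FormData();
  form.append('audio', new Blob([bytes], { type: mimeType }), filename);
  return form;
}

// POST the audio bytes and return the transcript text from the JSON response.
async function transcribeFile(bytes, filename) {
  const response = await fetch(`${BASE_URL}/transcribe`, {
    method: 'POST',
    body: buildTranscribeForm(bytes, filename),
  });
  if (!response.ok) {
    throw new Error(`transcribe failed: HTTP ${response.status}`);
  }
  const { text } = await response.json();
  return text;
}
```

With the server running, `transcribeFile(await fs.promises.readFile('/path/to/sample.wav'), 'sample.wav')` would resolve to the transcript string.
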

Send newline-delimited JSON (chunked):

cat <<'PAYLOAD' | curl -X POST \
-H 'Content-Type: application/x-ndjson' \
--data-binary @- \
http://127.0.0.1:8237/stream/transcribe
{"type":"chunk","audio_base64":"<PCM16 base64>","sample_rate":16000}
{"type":"chunk","audio_base64":"...","last":true}
PAYLOAD

Response (also NDJSON):

{"type":"hello","protocol":"stt-jsonl-v1"}
{"type":"ack","received_bytes":32768}
{"type":"transcript.final","text":"..."}
{"type":"stream.closed"}
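
Both directions of this exchange are plain newline-delimited JSON, so a client needs little more than a serializer and a line parser. The sketch below is illustrative only: the function names are hypothetical, and it buffers the whole response for simplicity, whereas a real streaming client would parse lines as they arrive.

```javascript
// Serialize messages as NDJSON: one JSON object per line, newline-terminated.
function toNdjson(messages) {
  return messages.map((message) => JSON.stringify(message)).join('\n') + '\n';
}

// Parse an NDJSON body into an array of event objects, skipping blank lines.
function parseNdjson(body) {
  return body
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Collect the text of every transcript.final event.
function finalTranscripts(events) {
  return events
    .filter((event) => event.type === 'transcript.final')
    .map((event) => event.text);
}
```

Passing the response shown above through `parseNdjson` yields four event objects, and `finalTranscripts` then returns the `text` of the single `transcript.final` event.
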

The WebSocket endpoint uses the same message schema as the NDJSON stream:

const ws = new WebSocket('ws://127.0.0.1:8237/ws/transcribe');

// Log every server event as parsed JSON.
ws.onmessage = (event) => console.log(JSON.parse(event.data));

ws.onopen = () => {
  // Send one audio chunk, then ask the server to flush and close.
  ws.send(JSON.stringify({
    type: 'chunk',
    audio_base64: '<PCM16 base64>',
    sample_rate: 16000,
  }));
  ws.send(JSON.stringify({ type: 'end' }));
};

Expected events: hello, ack, transcript.final, stream.closed.

| Message | Response | Description |
| --- | --- | --- |
| chunk | ack (received_bytes) | PCM16 base64 + optional sample_rate/encoding |
| flush | transcript.final | Immediate transcription |
| end | transcript.final + stream.closed | Flush and close |

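
The three client messages can be built with small helpers. The sketch below assumes Node (it uses `Buffer`), assumes input samples are floats in [-1, 1], and assumes little-endian PCM16; the byte order is not stated in this document, so verify it against the server. All function names are hypothetical.

```javascript
// Convert float samples in [-1, 1] to PCM16 and base64-encode the bytes.
// Assumption: little-endian byte order.
function floatToPcm16Base64(samples) {
  const buffer = Buffer.alloc(samples.length * 2);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    buffer.writeInt16LE(Math.round(clamped * 32767), i * 2);
  });
  return buffer.toString('base64');
}

// chunk: carries audio; the server is expected to answer with ack.
function chunkMessage(samples, sampleRate = 16000) {
  return {
    type: 'chunk',
    audio_base64: floatToPcm16Base64(samples),
    sample_rate: sampleRate,
  };
}

// flush: request an immediate transcript.final without closing the stream.
const flushMessage = () => ({ type: 'flush' });

// end: flush and close; expect transcript.final followed by stream.closed.
const endMessage = () => ({ type: 'end' });
```

Each message is sent as `ws.send(JSON.stringify(chunkMessage(samples)))` over the WebSocket, or written as one line of the NDJSON request body.
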
curl -X POST http://127.0.0.1:8237/format \
-H 'Content-Type: application/json' \
-d '{"text":"pan pies jest zdrowy.","assistive":false}'
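
A JavaScript equivalent of this request might look like the sketch below, assuming Node 18+ for the global `fetch`. The helper names are hypothetical, and because the response schema of `/format` is not documented here, the parsed JSON is returned unchanged.

```javascript
const BASE_URL = 'http://127.0.0.1:8237';

// Build the URL and fetch options for POST /format.
// The body fields "text" and "assistive" mirror the curl example.
function buildFormatRequest(text, assistive = false) {
  return {
    url: `${BASE_URL}/format`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, assistive }),
    },
  };
}

// Send the request and return the parsed JSON response as-is.
async function formatText(text, assistive = false) {
  const { url, init } = buildFormatRequest(text, assistive);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`format failed: HTTP ${response.status}`);
  }
  return response.json();
}
```
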
curl -X POST \
-F "audio=@/path/to/sample.wav" \
-F "instruction=stress important diagnoses" \
http://127.0.0.1:8237/stt_and_format
| Scenario | Recommendation |
| --- | --- |
| Lowest latency (hands-off mode) | WebSocket /ws/transcribe |
| Pure HTTP (Safari extension, service worker) | /stream/transcribe |
| Batch uploads (drag-and-drop) | /transcribe |

| Limit | Value | Configuration |
| --- | --- | --- |
| Max upload size | 20 MB | BACKEND_MAX_UPLOAD_MB |

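
A client can check a file against this limit before uploading. The sketch below is hypothetical: the document does not say whether the server counts a megabyte as 10^6 or 2^20 bytes, so it uses the smaller decimal value to stay on the safe side, and the default only mirrors the documented 20 MB.

```javascript
// Assumption: mirrors the documented default; the server's actual limit
// is whatever BACKEND_MAX_UPLOAD_MB is set to.
const DEFAULT_MAX_UPLOAD_MB = 20;

// Return true when a payload of sizeBytes fits under the limit,
// counting 1 MB as 1,000,000 bytes (the conservative interpretation).
function fitsUploadLimit(sizeBytes, maxUploadMb = DEFAULT_MAX_UPLOAD_MB) {
  return sizeBytes <= maxUploadMb * 1000 * 1000;
}
```

For example, a caller could run `fitsUploadLimit(file.size)` in a drag-and-drop handler and fall back to one of the streaming endpoints when it returns `false`.
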
| Variable | Description |
| --- | --- |
| HARMONY_BASE_URL | URL of the Harmony API |
| HARMONY_API_KEY | API key |
| ai_formatting_enabled | Toggle in the tray menu |