Complete flow from audio recording through transcription to SOAP generation. ~25 files, ~12k LOC.
START([Veterinarian starts visit])
RECORD[Audio Recording Begins]
STREAM[Real-time Audio Stream]
BUFFER[Audio Buffer Management]
subgraph "Audio Processing"
VAD[Voice Activity Detection]
SEGMENT[Audio Segmentation]
COMPRESS[Audio Compression]
subgraph "Speech Recognition"
STT_DETECT[STT Endpoint Detection]
STT_PROCESS[Speech-to-Text]
DIARIZATION[Speaker Diarization]
PREPROCESS[Text Preprocessing]
LLM_REQUEST[LLM SOAP Generation]
POST_PROCESS[Response Processing]
subgraph "Database Storage"
SAVE_AUDIO[Save Audio File]
SAVE_SEGMENTS[Save Speaker Segments]
SAVE_SOAP[Save SOAP Notes]
UPDATE_VISIT[Update Visit Status]
STT_DETECT --> STT_PROCESS
STT_PROCESS --> DIARIZATION
DIARIZATION --> PREPROCESS
PREPROCESS --> LLM_REQUEST
LLM_REQUEST --> POST_PROCESS
POST_PROCESS --> SAVE_AUDIO
SAVE_AUDIO --> SAVE_SEGMENTS
SAVE_SEGMENTS --> SAVE_SOAP
SAVE_SOAP --> UPDATE_VISIT
Supported formats:
- Recording: WebM (preferred), MP4, WAV
- Storage: Compressed OPUS for efficiency
- Processing: PCM conversion for AI services
- Cleanup: Configurable retention (7-365 days)
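The retention window above can be sketched as a small helper. This is a minimal illustration, assuming out-of-range values are clamped to the 7-365 day bounds (the source does not say whether they are clamped or rejected). `clampRetentionDays` and `retentionCutoff` are hypothetical names, not part of the documented API.

```typescript
// Bounds taken from the "Cleanup" line above.
const MIN_RETENTION_DAYS = 7;
const MAX_RETENTION_DAYS = 365;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: clamps a requested retention to the documented range.
// Assumption: out-of-range values are clamped rather than rejected.
function clampRetentionDays(days: number): number {
  if (!isFinite(days)) {
    throw new Error("Retention must be a finite number of days");
  }
  return Math.min(MAX_RETENTION_DAYS, Math.max(MIN_RETENTION_DAYS, Math.round(days)));
}

// Recordings created before the returned instant would be eligible for cleanup.
function retentionCutoff(now: Date, days: number): Date {
  return new Date(now.getTime() - clampRetentionDays(days) * MS_PER_DAY);
}
```

A caller could pass the clamped value to cleanupOldRecordings(days) so that a misconfigured setting never deletes recordings younger than a week.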
File locations:
~/Library/Application Support/Vista/
├── recordings/ # Original audio files
├── processed/ # Compressed/converted files
├── transcripts/ # Text output cache
└── temp/ # Processing workspace
| Method | Rust Command | Purpose |
|---|---|---|
| saveFile(data, visitId) | save_audio_file | Save recording blob |
| queueTranscription(recId) | queue_transcription_job | Queue for background |
| transcribeRecording(recId) | transcribe_recording_now | Immediate transcription |
| linkRecordingToVisit(recId, visitId) | link_recording_to_visit | Associate with visit |
| listPendingCaptures(limit) | list_pending_audio_captures | Get unattached recordings |
| attachCaptureToVisit(recId, visitId) | attach_capture_to_visit | Link pending to visit |
| discardPendingCapture(recId) | discard_pending_capture | Delete pending |
| cleanupOldRecordings(days) | cleanup_old_recordings | Remove old files |
| processUploadedFile(path) | process_uploaded_audio_file | Handle file upload |
[*] --> recording: MediaRecorder.start()
recording --> stopped: MediaRecorder.stop()
stopped --> saving: saveFile()
saving --> saved: DB insert success
saved --> queued: queueTranscription()
saved --> transcribing: transcribeRecording()
queued --> transcribing: Worker picks up
transcribing --> transcribed: WhisperX success
transcribing --> error: WhisperX failed
transcribed --> linked: linkRecordingToVisit()
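The recording lifecycle above can be encoded as a transition table. The following is a minimal sketch covering only the transitions listed here; the type and function names are hypothetical, and the real implementation may allow further transitions (for example a retry out of the error state) that this page does not show.

```typescript
// Recording lifecycle states, taken from the state diagram above.
type RecordingState =
  | "recording" | "stopped" | "saving" | "saved"
  | "queued" | "transcribing" | "transcribed" | "error" | "linked";

// Allowed transitions; each entry mirrors one arrow in the diagram.
const TRANSITIONS: Record<RecordingState, RecordingState[]> = {
  recording: ["stopped"],                 // MediaRecorder.stop()
  stopped: ["saving"],                    // saveFile()
  saving: ["saved"],                      // DB insert success
  saved: ["queued", "transcribing"],      // queueTranscription() or transcribeRecording()
  queued: ["transcribing"],               // worker picks up
  transcribing: ["transcribed", "error"], // WhisperX success or failure
  transcribed: ["linked"],                // linkRecordingToVisit()
  error: [],                              // no outgoing arrow listed on this page
  linked: [],
};

function canTransition(from: RecordingState, to: RecordingState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the diagram has no such arrow.
function transition(from: RecordingState, to: RecordingState): RecordingState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal recording transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data makes it straightforward to assert in tests that a recording never jumps from recording to transcribing without being saved first.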
RECORDINGS ||--o| VISITS : "belongs to"
RECORDINGS ||--o{ TRANSCRIPTION_JOBS : "has"
string visit_id FK "nullable"
string source_type "live|upload"
| Event Name | Emitter | Listener | Purpose |
|---|---|---|---|
| transcription_completed | transcribe_recording_now | useAiSuggestions | Trigger AI suggestions |
| transcription_progress | transcribe_recording_now | UI | Progress updates |
| recording_saved | save_audio_file | UI | Confirm save |
Every surface that uses useAudioRecording starts a RecordingSession via recording_start_session:
- PCM captured through usePcm16Streamer
- Funnelled into AudioCaptureCoordinator
- Chunked to fixed 40 ms packets
- Sent with push_recording_chunk / push_stream_chunk
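The fixed 40 ms chunking step can be illustrated as follows. This is a sketch under stated assumptions: 16-bit mono PCM, with the sample rate passed in because this page does not state it. `samplesPerPacket` and `chunkPcm16` are hypothetical helpers, not the coordinator's actual code.

```typescript
// Packet length from the list above.
const PACKET_MS = 40;

// Number of samples in one packet at the given sample rate.
function samplesPerPacket(sampleRate: number, packetMs: number = PACKET_MS): number {
  return Math.round((sampleRate * packetMs) / 1000);
}

// Splits a PCM buffer into full fixed-size packets and returns the leftover
// samples, which a caller would prepend to the next buffer.
function chunkPcm16(
  pcm: Int16Array,
  sampleRate: number
): { packets: Int16Array[]; remainder: Int16Array } {
  const size = samplesPerPacket(sampleRate);
  if (size <= 0) {
    throw new Error("Sample rate too low to form a packet");
  }
  const packets: Int16Array[] = [];
  let offset = 0;
  while (offset + size <= pcm.length) {
    // subarray creates a view, so no samples are copied here.
    packets.push(pcm.subarray(offset, offset + size));
    offset += size;
  }
  return { packets, remainder: pcm.subarray(offset) };
}
```

At an assumed 16 kHz sample rate, one 40 ms packet holds 640 samples (1,280 bytes of 16-bit PCM).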
File uploads (visit audio panel, emergency queue, workspace recorder, Dev Playground) use ingestSystemBlob():
blobToSystemPcm → feedSystemAudioChunk → flush coordinator → stop stream → transcript
AudioCaptureCoordinator.getTelemetry() tracks:
| Metric | Purpose |
|---|---|
| bufferDepthMs | Audio buffer depth |
| pendingSources | Active audio sources |
| droppedChunks | Chunks dropped (overflow) |
| Rule | Details |
|---|---|
| Invoke format | Strictly snake_case via invokeSnake |
| Permissions | Rust enforces use:ai / read:visits per command |
| Start | startVoiceStream yields { session_id, recording_session_id } |
| Stop | stopVoiceStream returns VoiceStreamSummary (status + transcript) |
| Immutability | Frontend never mutates session IDs |
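The snake_case rule can be illustrated with a small wrapper. The actual invokeSnake implementation is not shown on this page, so the following is only a sketch of the idea: `toSnakeCase`, `snakeKeys`, and `makeInvokeSnake` are hypothetical names, and the underlying invoke function is injected so that the example does not depend on any particular Tauri API version.

```typescript
// Shape of the injected transport; in the app this would wrap Tauri's invoke.
type InvokeFn = (command: string, args: Record<string, unknown>) => Promise<unknown>;

// Converts a camelCase key to snake_case, e.g. recordingSessionId -> recording_session_id.
function toSnakeCase(key: string): string {
  return key.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

// Shallow conversion only: nested objects keep their original keys.
function snakeKeys(args: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(args)) {
    out[toSnakeCase(key)] = args[key];
  }
  return out;
}

// Builds an invoke wrapper that always sends snake_case argument names.
function makeInvokeSnake(invoke: InvokeFn) {
  return (command: string, args: Record<string, unknown> = {}): Promise<unknown> =>
    invoke(command, snakeKeys(args));
}
```

Because the wrapper copies arguments into a new object, session IDs supplied by the caller are passed through unchanged, in line with the immutability rule.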
| Surface | Live capture | System ingest | Notes |
|---|---|---|---|
| Visit Audio Panel (AudioRecordingSection) | mic | manual uploads + emergency queue | No more process_uploaded_file polling |
| Emergency Recording Button / queue | yes | saved clip import → PCM stream | Visit gets transcript + recording_id immediately |
| Workspace recorder / Dev Playground | yes | feedSystemAudioChunk | Unchanged |
| PDF Drop Zone | n/a | n/a | Document attachments only |
recording_start_session → requires session_id, purpose, returns recording_session_id
recording_push_chunk → snake_case payload (chunk, sequence, recording_session_id)
recording_stop_session → finalizes, returns { status, transcript }
serve_audio_file → streams saved recording bytes for emergency queue import
transcribe_recording → VisitAudioService fast-path (recording_id, source_type)
queue_transcription_job → durable background job (recording_id, visit_id?)
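The recording_push_chunk payload listed above can be illustrated with a small sequencer. This is a sketch only: the field names come from the list, while encoding the chunk as a plain number array, starting the sequence at 0, and the `makeChunkSequencer` helper itself are assumptions.

```typescript
// Field names follow the recording_push_chunk entry above.
interface PushChunkPayload {
  chunk: number[];              // assumed encoding: PCM samples as plain numbers
  sequence: number;             // assumed to start at 0 and increase by 1
  recording_session_id: string; // returned by recording_start_session
}

// Returns a function that wraps each PCM packet in a payload with the next
// sequence number for the given recording session.
function makeChunkSequencer(recordingSessionId: string): (pcm: Int16Array) => PushChunkPayload {
  let nextSequence = 0;
  return (pcm: Int16Array): PushChunkPayload => {
    const chunk: number[] = [];
    for (let i = 0; i < pcm.length; i++) {
      chunk.push(pcm[i]);
    }
    return {
      chunk,
      sequence: nextSequence++,
      recording_session_id: recordingSessionId,
    };
  };
}
```

Keeping the counter inside the closure means one sequencer per session, so sequence numbers from different sessions cannot interleave.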
| Scenario | Behavior |
|---|---|
| WebAudio unavailable | blobToSystemPcm returns null → fallback to transcribeRecordingWithFallback(recordingId, filePath) |
| Engine disabled (ENGINE_ENABLED=false) | System ingest gated; uploads persist but STT via fallback helper |
| Large files | Coordinator flushes every 40ms; VITE_AUDIO_MAX_DURATION_MS (default 30 min) aborts long files |
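The scenarios in this table amount to a routing decision, which can be sketched as a pure function. The order of the checks, the `IngestRoute` labels, and the `chooseIngestRoute` helper are assumptions made for illustration; only the flags and the 30-minute default come from the table.

```typescript
// Hypothetical labels for the three outcomes described in the table.
type IngestRoute = "system_ingest" | "fallback_transcription" | "abort_too_long";

interface IngestInput {
  engineEnabled: boolean; // mirrors ENGINE_ENABLED
  pcmAvailable: boolean;  // false when blobToSystemPcm returned null
  durationMs: number;     // decoded duration of the uploaded file
  maxDurationMs?: number; // mirrors VITE_AUDIO_MAX_DURATION_MS
}

// Default limit from the table: 30 minutes.
const DEFAULT_MAX_DURATION_MS = 30 * 60 * 1000;

// Assumed check order: engine gate, then PCM availability, then duration.
function chooseIngestRoute(input: IngestInput): IngestRoute {
  const maxDurationMs =
    input.maxDurationMs === undefined ? DEFAULT_MAX_DURATION_MS : input.maxDurationMs;
  if (!input.engineEnabled) return "fallback_transcription";
  if (!input.pcmAvailable) return "fallback_transcription";
  if (input.durationMs > maxDurationMs) return "abort_too_long";
  return "system_ingest";
}
```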
- blobToUint8Array → audioApi.saveFile writes the upload to the local store and links recording_id (when a visit ID is provided)
- Emergency queue entries serialize { recordingId, fileName, mimeType, sizeBytes } into emergencyRecordingQueue (localStorage)
- useAudioTranscription: single entry point for “save and transcribe later” flows
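The emergency queue serialization can be sketched as below. The entry fields are taken from the text; treating emergencyRecordingQueue as the literal storage key, typing recordingId as a string, and the `loadQueue` / `enqueue` helpers are assumptions. Storage is abstracted behind a small interface that window.localStorage already satisfies, so the example also runs outside a browser.

```typescript
// Entry shape as listed above; the field types are assumed.
interface EmergencyQueueEntry {
  recordingId: string;
  fileName: string;
  mimeType: string;
  sizeBytes: number;
}

// Minimal subset of the Web Storage API used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumption: the queue name doubles as the storage key.
const QUEUE_KEY = "emergencyRecordingQueue";

// Reads the queue, treating missing or corrupt data as an empty queue.
function loadQueue(store: KeyValueStore): EmergencyQueueEntry[] {
  const raw = store.getItem(QUEUE_KEY);
  if (raw === null) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    return [];
  }
}

// Appends one entry and returns the new queue length.
function enqueue(store: KeyValueStore, entry: EmergencyQueueEntry): number {
  const queue = loadQueue(store);
  queue.push(entry);
  store.setItem(QUEUE_KEY, JSON.stringify(queue));
  return queue.length;
}
```

In the browser, `enqueue(window.localStorage, entry)` would be the call site. The returned length gives callers a natural place to surface queue size in the UI.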
- Extend PDFDropZone/chat attachments once audio uploads in chat are introduced
- Add Playwright coverage: drop WAV into Visit Audio panel → assert transcript renders immediately
- Document emergency queue → visit flow in .codex/audio (diagram)
| Metric | Value |
|---|---|
| Files | ~25 |
| LOC | ~12,000 |
| Tauri Commands | 16 |
| Key Hook | useAudioRecording (834 LOC) |
subgraph AudioCapture["Audio Capture Layer"]
AudioSource[Microphone Input]
Coordinator[AudioCaptureCoordinator<br/>visitId, callback, hooks]
subgraph Processing["Processing Layer"]
Normalise[ingestSystemBlob<br/>normalize PCM]
RecordingFile[recordings table<br/>WAV storage]
STT[STT Pipeline<br/>WhisperX / MLX / OpenAI]
subgraph Visit["Visit Layer"]
Transcript[visit.pending_transcript]
Merge[mergePendingTranscript]
AudioSource --> PCMChunks
PCMChunks --> Coordinator
Coordinator --> Normalise
Normalise --> RecordingFile
| Hook | LOC | Description |
|---|---|---|
| useAudioRecording | 834 | Recording state, start/stop |
| useAudioTranscription | 456 | Transcription pipeline |
| useAudioUpload | 234 | File upload handling |
| useRecordingSession | 312 | Session management |
| ID | Severity | Description |
|---|---|---|
| P2-018 | MEDIUM | Emergency queue may grow large while offline |
| P2-019 | LOW | No UI for emergency queue status |