
Audio Pipeline

Complete flow from audio recording to transcription and SOAP generation. ~25 files, ~12k LOC.

flowchart TD
START([Veterinarian starts visit])
RECORD[Audio Recording Begins]
STREAM[Real-time Audio Stream]
BUFFER[Audio Buffer Management]
subgraph "Audio Processing"
VAD[Voice Activity Detection]
SEGMENT[Audio Segmentation]
COMPRESS[Audio Compression]
end
subgraph "Speech Recognition"
STT_DETECT[STT Endpoint Detection]
STT_PROCESS[Speech-to-Text]
DIARIZATION[Speaker Diarization]
end
subgraph "AI Processing"
PREPROCESS[Text Preprocessing]
LLM_REQUEST[LLM SOAP Generation]
POST_PROCESS[Response Processing]
end
subgraph "Database Storage"
SAVE_AUDIO[Save Audio File]
SAVE_SEGMENTS[Save Speaker Segments]
SAVE_SOAP[Save SOAP Notes]
UPDATE_VISIT[Update Visit Status]
end
STOP([Visit Finalized])
START --> RECORD
RECORD --> STREAM
STREAM --> BUFFER
BUFFER --> VAD
VAD --> SEGMENT
SEGMENT --> COMPRESS
COMPRESS --> STT_DETECT
STT_DETECT --> STT_PROCESS
STT_PROCESS --> DIARIZATION
DIARIZATION --> PREPROCESS
PREPROCESS --> LLM_REQUEST
LLM_REQUEST --> POST_PROCESS
POST_PROCESS --> SAVE_AUDIO
SAVE_AUDIO --> SAVE_SEGMENTS
SAVE_SEGMENTS --> SAVE_SOAP
SAVE_SOAP --> UPDATE_VISIT
UPDATE_VISIT --> STOP

Supported formats:

  • Recording: WebM (preferred), MP4, WAV
  • Storage: Compressed OPUS for efficiency
  • Processing: PCM conversion for AI services
  • Cleanup: Configurable retention (7-365 days)

File locations:

~/Library/Application Support/Vista/
├── recordings/ # Original audio files
├── processed/ # Compressed/converted files
├── transcripts/ # Text output cache
└── temp/ # Processing workspace

| Method | Rust Command | Purpose |
|---|---|---|
| saveFile(data, visitId) | save_audio_file | Save recording blob |
| queueTranscription(recId) | queue_transcription_job | Queue for background |
| transcribeRecording(recId) | transcribe_recording_now | Immediate transcription |
| linkRecordingToVisit(recId, visitId) | link_recording_to_visit | Associate with visit |
| listPendingCaptures(limit) | list_pending_audio_captures | Get unattached recordings |
| attachCaptureToVisit(recId, visitId) | attach_capture_to_visit | Link pending to visit |
| discardPendingCapture(recId) | discard_pending_capture | Delete pending |
| cleanupOldRecordings(days) | cleanup_old_recordings | Remove old files |
| processUploadedFile(path) | process_uploaded_audio_file | Handle file upload |

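The method-to-command mapping above could be wired up roughly as follows. This is a minimal sketch, not the project's actual wrapper: `InvokeFn`, `commandFor`, and `createAudioApi` are hypothetical names, the invoke function is injected so the example runs without a Tauri runtime, and the argument keys (`recording_id`, `visit_id`) are assumptions.

```typescript
// Shape compatible with an invoke(command, args) call; injected so the
// sketch can run without a Tauri runtime. (Hypothetical type name.)
type InvokeFn = (command: string, args?: Record<string, unknown>) => Promise<unknown>;

// Method-to-command mapping copied from the table above.
const AUDIO_COMMANDS = {
  saveFile: "save_audio_file",
  queueTranscription: "queue_transcription_job",
  transcribeRecording: "transcribe_recording_now",
  linkRecordingToVisit: "link_recording_to_visit",
  listPendingCaptures: "list_pending_audio_captures",
  attachCaptureToVisit: "attach_capture_to_visit",
  discardPendingCapture: "discard_pending_capture",
  cleanupOldRecordings: "cleanup_old_recordings",
  processUploadedFile: "process_uploaded_audio_file",
} as const;

type AudioMethod = keyof typeof AUDIO_COMMANDS;

// Looks up the Rust command name for a frontend method.
function commandFor(method: AudioMethod): string {
  return AUDIO_COMMANDS[method];
}

// Two representative methods; argument keys are assumptions, not confirmed
// by the source.
function createAudioApi(invoke: InvokeFn) {
  return {
    queueTranscription: (recId: string) =>
      invoke(commandFor("queueTranscription"), { recording_id: recId }),
    linkRecordingToVisit: (recId: string, visitId: string) =>
      invoke(commandFor("linkRecordingToVisit"), {
        recording_id: recId,
        visit_id: visitId,
      }),
  };
}
```

Keeping the mapping in one constant means a renamed Rust command only has to be updated in one place.
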
stateDiagram-v2
[*] --> recording: MediaRecorder.start()
recording --> stopped: MediaRecorder.stop()
stopped --> saving: saveFile()
saving --> saved: DB insert success
saved --> queued: queueTranscription()
saved --> transcribing: transcribeRecording()
queued --> transcribing: Worker picks up
transcribing --> transcribed: WhisperX success
transcribing --> error: WhisperX failed
transcribed --> linked: linkRecordingToVisit()
linked --> [*]
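The state diagram above can be expressed as a transition table that rejects illegal moves. This is a sketch under the assumption that the diagram is exhaustive; `RecordingState`, `canTransition`, and `transition` are illustrative names, not identifiers from the codebase.

```typescript
// States taken from the diagram above.
type RecordingState =
  | "recording"
  | "stopped"
  | "saving"
  | "saved"
  | "queued"
  | "transcribing"
  | "transcribed"
  | "error"
  | "linked";

// Allowed next states per state, one entry per arrow in the diagram.
// "error" and "linked" have no outgoing arrows to another named state.
const TRANSITIONS: Record<RecordingState, readonly RecordingState[]> = {
  recording: ["stopped"],
  stopped: ["saving"],
  saving: ["saved"],
  saved: ["queued", "transcribing"],
  queued: ["transcribing"],
  transcribing: ["transcribed", "error"],
  transcribed: ["linked"],
  error: [],
  linked: [],
};

// True when the diagram contains an arrow from `from` to `to`.
function canTransition(from: RecordingState, to: RecordingState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws on a transition the diagram does not allow.
function transition(from: RecordingState, to: RecordingState): RecordingState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal recording transition: ${from} -> ${to}`);
  }
  return to;
}
```
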
erDiagram
RECORDINGS ||--o| VISITS : "belongs to"
RECORDINGS ||--o{ TRANSCRIPTION_JOBS : "has"
RECORDINGS {
string recording_id PK
string visit_id FK "nullable"
string path
string status
string created_by
int original_bytes
int final_bytes
float reduction_pct
string transcript
datetime created_at
}
TRANSCRIPTION_JOBS {
string job_id PK
string recording_id FK
string visit_id FK
string status
string source_type "live|upload"
datetime created_at
datetime completed_at
}
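For frontend code that reads these rows, the two entities could be mirrored as TypeScript types. This is a sketch only: the interface names are hypothetical, datetimes are assumed to arrive as ISO strings, and the `reductionPct` formula is an assumption about how `reduction_pct` relates to `original_bytes` and `final_bytes`.

```typescript
// Mirrors the RECORDINGS entity above; visit_id is nullable per the diagram.
interface RecordingRow {
  recording_id: string;
  visit_id: string | null;
  path: string;
  status: string;
  created_by: string;
  original_bytes: number;
  final_bytes: number;
  reduction_pct: number;
  transcript: string;
  created_at: string; // assumed ISO 8601 string
}

// Mirrors the TRANSCRIPTION_JOBS entity above.
interface TranscriptionJobRow {
  job_id: string;
  recording_id: string;
  visit_id: string;
  status: string;
  source_type: "live" | "upload";
  created_at: string; // assumed ISO 8601 string
  completed_at: string; // assumed ISO 8601 string
}

// Assumed definition: percentage of bytes saved by compression.
function reductionPct(originalBytes: number, finalBytes: number): number {
  if (originalBytes <= 0) return 0;
  return ((originalBytes - finalBytes) / originalBytes) * 100;
}
```
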

| Event Name | Emitter | Listener | Purpose |
|---|---|---|---|
| transcription_completed | transcribe_recording_now | useAiSuggestions | Trigger AI suggestions |
| transcription_progress | transcribe_recording_now | UI | Progress updates |
| recording_saved | save_audio_file | UI | Confirm save |

Every surface that uses useAudioRecording starts a RecordingSession via recording_start_session:

  1. PCM captured through usePcm16Streamer
  2. Funnelled into AudioCaptureCoordinator
  3. Chunked to fixed 40 ms packets
  4. Sent with push_recording_chunk / push_stream_chunk
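
The fixed 40 ms packetisation in step 3 can be sketched as below. The sample rate is not stated in this document, so it is a parameter here (16 kHz in the usage note is an assumption), and `samplesPerPacket` and `chunkPcm16` are hypothetical helper names rather than the coordinator's real API.

```typescript
// Number of samples in one packet of `packetMs` milliseconds.
function samplesPerPacket(sampleRateHz: number, packetMs: number): number {
  return Math.round((sampleRateHz * packetMs) / 1000);
}

// Splits mono PCM16 samples into fixed-size packets. Samples that do not
// fill a whole packet are returned as `remainder` so the caller can prepend
// them to the next buffer.
function chunkPcm16(
  samples: Int16Array,
  sampleRateHz: number,
  packetMs = 40,
): { packets: Int16Array[]; remainder: Int16Array } {
  const size = samplesPerPacket(sampleRateHz, packetMs);
  const packets: Int16Array[] = [];
  let offset = 0;
  while (offset + size <= samples.length) {
    // subarray creates a view, so no sample data is copied here.
    packets.push(samples.subarray(offset, offset + size));
    offset += size;
  }
  return { packets, remainder: samples.subarray(offset) };
}
```

At an assumed 16 kHz, one 40 ms packet holds 640 samples, so a 1500-sample buffer yields two packets and a 220-sample remainder.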

File uploads (visit audio panel, emergency queue, workspace recorder, Dev Playground) use ingestSystemBlob():

blobToSystemPcm → feedSystemAudioChunk → flush coordinator → stop stream → transcript

AudioCaptureCoordinator.getTelemetry() tracks:

| Metric | Purpose |
|---|---|
| bufferDepthMs | Audio buffer depth |
| pendingSources | Active audio sources |
| droppedChunks | Chunks dropped (overflow) |

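A consumer of these metrics might turn them into warnings along these lines. The field names come from the table above; the `CaptureTelemetry` interface, the `telemetryWarnings` helper, and the 2000 ms default threshold are assumptions made for illustration.

```typescript
// Field names follow the telemetry table; the interface name is hypothetical.
interface CaptureTelemetry {
  bufferDepthMs: number;
  pendingSources: number;
  droppedChunks: number;
}

// Returns human-readable warnings. The buffer threshold is an assumed
// default, not a value taken from the codebase.
function telemetryWarnings(t: CaptureTelemetry, maxBufferDepthMs = 2000): string[] {
  const warnings: string[] = [];
  if (t.droppedChunks > 0) {
    warnings.push(`${t.droppedChunks} chunk(s) dropped (overflow)`);
  }
  if (t.bufferDepthMs > maxBufferDepthMs) {
    warnings.push(`buffer depth ${t.bufferDepthMs} ms exceeds ${maxBufferDepthMs} ms`);
  }
  return warnings;
}
```
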
| Rule | Details |
|---|---|
| Invoke format | Strictly snake_case via invokeSnake |
| Permissions | Rust enforces use:ai / read:visits per command |
| Start | startVoiceStream yields { session_id, recording_session_id } |
| Stop | stopVoiceStream returns VoiceStreamSummary (status + transcript) |
| Immutability | Frontend never mutates session IDs |

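The snake_case rule could be enforced by converting payload keys before they reach the invoke call. The real invokeSnake implementation is not shown in this document, so the following is a sketch of the idea only; `toSnakeCase` and `snakeCaseKeys` are hypothetical helpers, and the conversion is shallow (nested objects are left untouched).

```typescript
// Converts a camelCase identifier to snake_case, e.g. recordingSessionId
// becomes recording_session_id. Keys already in snake_case pass through.
function toSnakeCase(key: string): string {
  return key.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

// Returns a copy of the payload with top-level keys converted.
// Values are not modified, so session IDs pass through unchanged.
function snakeCaseKeys(payload: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    out[toSnakeCase(key)] = payload[key];
  }
  return out;
}
```
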
| Surface | Live capture | System ingest | Notes |
|---|---|---|---|
| Visit Audio Panel (AudioRecordingSection) | mic | manual uploads + emergency queue | No more process_uploaded_file polling |
| Emergency Recording Button / queue | yes | saved clip import → PCM stream | Visit gets transcript + recording_id immediately |
| Workspace recorder / Dev Playground | yes | feedSystemAudioChunk | Unchanged |
| PDF Drop Zone | n/a | n/a | Document attachments only |

recording_start_session → requires session_id, purpose, returns recording_session_id
recording_push_chunk → snake_case payload (chunk, sequence, recording_session_id)
recording_stop_session → finalizes, returns { status, transcript }
serve_audio_file → streams saved recording bytes for emergency queue import
transcribe_recording → VisitAudioService fast-path (recording_id, source_type)
queue_transcription_job → durable background job (recording_id, visit_id?)
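
The recording_push_chunk payload (chunk, sequence, recording_session_id) implies a per-session sequence counter. One way to sketch that is shown below; `PushChunkPayload` and `createChunkPayloadFactory` are hypothetical names, and both the zero-based sequence and the plain number-array encoding of the chunk are assumptions, since the wire format is not specified here.

```typescript
// Payload keys follow the recording_push_chunk entry above.
interface PushChunkPayload {
  chunk: number[]; // assumed encoding: PCM16 samples as a plain number array
  sequence: number;
  recording_session_id: string;
}

// Returns a function that stamps each chunk with the next sequence number.
// The session ID is captured once and never changed, in line with the rule
// that the frontend does not mutate session IDs.
function createChunkPayloadFactory(
  recordingSessionId: string,
): (samples: Int16Array) => PushChunkPayload {
  let sequence = 0; // assumed to start at 0
  return (samples) => ({
    chunk: Array.from(samples),
    sequence: sequence++,
    recording_session_id: recordingSessionId,
  });
}
```
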

| Scenario | Behavior |
|---|---|
| WebAudio unavailable | blobToSystemPcm returns null → fallback to transcribeRecordingWithFallback(recordingId, filePath) |
| Engine disabled (ENGINE_ENABLED=false) | System ingest is gated; uploads persist, but STT runs via the fallback helper |
| Large files | Coordinator flushes every 40 ms; VITE_AUDIO_MAX_DURATION_MS (default 30 min) aborts long files |

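The three scenarios above amount to a small routing decision, which could be sketched as follows. `IngestContext` and `chooseIngestRoute` are hypothetical names, and the order in which the checks run is an assumption; only the conditions themselves come from the table.

```typescript
type IngestRoute = "system-ingest" | "fallback" | "abort-too-long";

interface IngestContext {
  engineEnabled: boolean; // mirrors ENGINE_ENABLED
  pcmDecoded: boolean; // false when blobToSystemPcm returned null
  durationMs: number;
  maxDurationMs: number; // mirrors VITE_AUDIO_MAX_DURATION_MS
}

// Default limit from the table: 30 minutes, expressed in milliseconds.
const DEFAULT_MAX_DURATION_MS = 30 * 60 * 1000;

// Picks the path an upload takes. Check order is assumed, not confirmed.
function chooseIngestRoute(ctx: IngestContext): IngestRoute {
  if (!ctx.engineEnabled) return "fallback"; // system ingest is gated
  if (!ctx.pcmDecoded) return "fallback"; // WebAudio unavailable
  if (ctx.durationMs > ctx.maxDurationMs) return "abort-too-long";
  return "system-ingest";
}
```
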
  1. blobToUint8Array → audioApi.saveFile writes the upload to the local store and links recording_id (when a visit ID is provided)
  2. Emergency queue entries serialize { recordingId, fileName, mimeType, sizeBytes } into emergencyRecordingQueue (localStorage)
  3. useAudioTranscription - single entry point for “save and transcribe later” flows
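
The emergency queue serialization in step 2 could look roughly like this. The entry fields come from the text; treating emergencyRecordingQueue as the storage key, storing the queue as a JSON array, and the `KeyValueStore`, `readQueue`, and `enqueueRecording` names are all assumptions. A small storage interface stands in for localStorage so the sketch also runs outside a browser.

```typescript
// Entry shape as listed in step 2.
interface EmergencyQueueEntry {
  recordingId: string;
  fileName: string;
  mimeType: string;
  sizeBytes: number;
}

// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const QUEUE_KEY = "emergencyRecordingQueue"; // assumed storage key

// Reads the queue, treating missing or corrupt data as an empty queue.
function readQueue(store: KeyValueStore): EmergencyQueueEntry[] {
  const raw = store.getItem(QUEUE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as EmergencyQueueEntry[]) : [];
  } catch {
    return [];
  }
}

// Appends an entry and returns the new queue length, which a caller could
// compare against a cap to keep the queue from growing while offline.
function enqueueRecording(store: KeyValueStore, entry: EmergencyQueueEntry): number {
  const queue = readQueue(store);
  queue.push(entry);
  store.setItem(QUEUE_KEY, JSON.stringify(queue));
  return queue.length;
}
```
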
  • Extend PDFDropZone/chat attachments once audio uploads in chat are introduced
  • Add Playwright coverage: drop WAV into Visit Audio panel → assert transcript renders immediately
  • Document emergency queue → visit flow in .codex/audio (diagram)

| Metric | Value |
|---|---|
| Files | ~25 |
| LOC | ~12,000 |
| Tauri Commands | 16 |
| Key Hook | useAudioRecording (834 LOC) |

graph TB
subgraph AudioCapture["Audio Capture Layer"]
AudioSource[Microphone Input]
PCMChunks[PCM Chunks]
Coordinator[AudioCaptureCoordinator<br/>visitId, callback, hooks]
end
subgraph Processing["Processing Layer"]
Normalise[ingestSystemBlob<br/>normalize PCM]
RecordingFile[recordings table<br/>WAV storage]
STT[STT Pipeline<br/>WhisperX / MLX / OpenAI]
end
subgraph Visit["Visit Layer"]
Transcript[visit.pending_transcript]
SOAP[visit.soap_*]
Merge[mergePendingTranscript]
end
AudioSource --> PCMChunks
PCMChunks --> Coordinator
Coordinator --> Normalise
Normalise --> RecordingFile
RecordingFile --> STT
STT --> Transcript
Transcript --> Merge
Merge --> SOAP

| Hook | LOC | Description |
|---|---|---|
| useAudioRecording | 834 | Recording state, start/stop |
| useAudioTranscription | 456 | Transcription pipeline |
| useAudioUpload | 234 | File upload handling |
| useRecordingSession | 312 | Session management |

| ID | Severity | Description |
|---|---|---|
| P2-018 | MEDIUM | Emergency queue can grow while offline |
| P2-019 | LOW | No UI for emergency queue status |