
Unified AI System

Unified AI is the central system for managing all AI services in Vista: an abstraction over multiple providers with automatic failover and load balancing.

graph TB
    subgraph "Frontend"
        UI[React UI]
        UAC[UnifiedAIClient]
    end
    subgraph "Backend (Tauri/Rust)"
        CMD[AI Commands]
        SR[Service Resolver]
        HM[Health Monitor]
    end
    subgraph "AI Providers"
        LIB[LibraxisAI Cloud<br/>Primary]
        MLX[Local MLX<br/>Secondary]
        OAI[OpenAI API<br/>Tertiary]
    end

    UI --> UAC
    UAC --> CMD
    CMD --> SR
    SR --> HM
    HM --> LIB
    HM --> MLX
    HM --> OAI

    SR -.->|Priority 1| LIB
    SR -.->|Priority 2| MLX
    SR -.->|Priority 3| OAI

| Priority | Provider | Latency | Cost | Capabilities |
|----------|----------|---------|------|--------------|
| 1 | LibraxisAI Cloud | ~200ms | Included | LLM, STT, TTS |
| 2 | Local MLX | ~100ms | Free | LLM, STT (Apple Silicon only) |
| 3 | OpenAI API | ~500ms | $$$ | LLM, STT, TTS |
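
The priority order in the table maps directly to the order of the providers vector inside the resolver shown below. A minimal sketch of that wiring (the constructor names here are illustrative assumptions, not the actual Vista API):

// Hypothetical wiring; provider constructor names are assumptions.
fn build_resolver() -> ServiceResolver {
    ServiceResolver::new(vec![
        Box::new(LibraxisProvider::new()), // Priority 1: cloud, full capabilities
        Box::new(MlxProvider::new()),      // Priority 2: local, Apple Silicon only
        Box::new(OpenAiProvider::new()),   // Priority 3: paid last resort
    ])
}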
// Service Resolver in Rust
pub struct ServiceResolver {
    providers: Vec<Box<dyn AIProvider>>, // kept in priority order
    health_cache: RwLock<HashMap<String, ProviderHealth>>,
    retry_config: RetryConfig,
}

impl ServiceResolver {
    pub async fn execute(
        &self,
        request: AIRequest,
        operation: &str, // operation label, e.g. for metrics
    ) -> Result<AIResponse, AIError> {
        for provider in &self.providers {
            // Skip providers that recently failed health checks
            if !self.is_healthy(provider.name()).await {
                continue;
            }
            match provider.execute(&request).await {
                Ok(response) => {
                    self.record_success(provider.name());
                    return Ok(response);
                }
                Err(e) => {
                    self.record_failure(provider.name(), &e);
                    // Fall through to the next provider in priority order
                }
            }
        }
        Err(AIError::AllProvidersFailed)
    }
}
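
The resolver only depends on a shared trait implemented by every provider. Its exact definition is not shown in this document, so the following is a minimal sketch of the assumed shape (using the async-trait crate to keep the trait object-safe):

// Assumed shape of the provider abstraction consumed by ServiceResolver;
// the actual trait in Vista may differ.
#[async_trait::async_trait]
pub trait AIProvider: Send + Sync {
    fn name(&self) -> &str;
    async fn execute(&self, request: &AIRequest) -> Result<AIResponse, AIError>;
}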

Use cases in Vista:

  • Generating SOAP notes from transcriptions
  • Diagnosis and treatment suggestions
  • AI assistant chat
  • Analysis of medical documentation
#[derive(Debug, Serialize, Deserialize)]
pub struct ChatRequest {
    pub messages: Vec<ChatMessage>,
    pub model: Option<String>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stream: bool,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct ChatMessage {
    pub role: String, // "system", "user", "assistant"
    pub content: String,
}
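
A SOAP-generation request could then be assembled like this (all field values are illustrative examples):

// Illustrative request; temperature and token limit are examples only.
let request = ChatRequest {
    messages: vec![
        ChatMessage { role: "system".into(), content: system_prompt },
        ChatMessage { role: "user".into(), content: transcript },
    ],
    model: None,            // let the active provider use its default model
    temperature: Some(0.2), // low temperature for factual clinical notes
    max_tokens: Some(1024),
    stream: false,
};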

Use cases:

  • Transcription of visit recordings
  • Dictating notes
  • Voice commands (future)
#[derive(Debug, Serialize, Deserialize)]
pub struct TranscriptionRequest {
    pub audio_path: String,
    pub language: String,         // "pl", "en"
    pub enable_diarization: bool, // speaker identification
    pub word_timestamps: bool,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct TranscriptionResponse {
    pub text: String,
    pub segments: Vec<Segment>,
    pub speakers: Option<Vec<Speaker>>,
    pub language_detected: String,
    pub duration_seconds: f64,
}
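
For a typical visit recording, the request could look like this (values are illustrative):

// Illustrative transcription request for a Polish-language visit recording.
let request = TranscriptionRequest {
    audio_path: "/path/to/visit.wav".into(),
    language: "pl".into(),
    enable_diarization: true, // distinguish vet and owner speech
    word_timestamps: true,    // align the transcript with the audio
};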

Use cases:

  • Reading notes aloud
  • Accessibility features
  • Synthesizing AI responses
#[derive(Debug, Serialize, Deserialize)]
pub struct SynthesisRequest {
    pub text: String,
    pub voice: Option<String>,  // voice ID
    pub speed: Option<f32>,     // 0.5 - 2.0
    pub format: Option<String>, // "mp3", "wav"
}
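
And a matching synthesis request (again, the values are examples):

// Illustrative TTS request; available voice IDs are provider-specific.
let request = SynthesisRequest {
    text: note_text,
    voice: None,                // provider default voice
    speed: Some(1.0),           // normal rate within the 0.5 - 2.0 range
    format: Some("mp3".into()),
};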

LibraxisAI is a dedicated cloud service for Vista.

Base URL: https://api.libraxis.ai/v1

POST /chat/completions       # LLM
POST /audio/transcriptions   # STT
POST /audio/speech           # TTS
GET  /health                 # Health check
// API key stored in system keychain
let api_key = keychain::get("libraxis_api_key")?;

let client = reqwest::Client::new();
let response = client
    .post("https://api.libraxis.ai/v1/chat/completions")
    .header("Authorization", format!("Bearer {}", api_key))
    .header("Content-Type", "application/json")
    .json(&request)
    .send()
    .await?;
| Tier | Requests/min | Tokens/min |
|------|--------------|------------|
| Free | 20 | 10,000 |
| Pro | 100 | 100,000 |
| Enterprise | Unlimited | Unlimited |
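
When a limit is exceeded, the API answers with HTTP 429; mapping that to AIError::RateLimitExceeded lets the resolver fall through to the next provider. A sketch of such a mapping (this helper is an assumption, not the actual Vista code):

// Sketch: map HTTP status codes to AIError so the resolver can fail over.
use reqwest::StatusCode;

fn map_status(provider: &str, status: StatusCode) -> Option<AIError> {
    match status {
        StatusCode::TOO_MANY_REQUESTS => {
            Some(AIError::RateLimitExceeded(provider.to_string()))
        }
        s if s.is_server_error() => Some(AIError::ProviderError(
            provider.to_string(),
            format!("HTTP {}", s),
        )),
        _ => None, // success and client errors are handled elsewhere
    }
}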

MLX is Apple's ML framework for Apple Silicon (M1/M2/M3).

# Installation
pip install mlx-lm mlx-whisper

# Start the LLM server
mlx_lm.server --model mlx-community/Llama-3.2-3B-Instruct-4bit --port 8080

# Start the STT server
mlx_whisper.server --model mlx-community/whisper-large-v3-mlx --port 8081
#[tauri::command]
pub async fn get_mlx_health_detailed() -> Result<MLXHealthReport, String> {
    let llm_check = check_service("http://localhost:8080/health").await;
    let stt_check = check_service("http://localhost:8081/health").await;

    // Evaluate before the status strings are moved into the report;
    // "overall" here means at least one local service is usable
    let overall_health =
        llm_check.status == "healthy" || stt_check.status == "healthy";

    Ok(MLXHealthReport {
        llm_status: llm_check.status,
        llm_response_time: llm_check.latency_ms,
        llm_model: llm_check.model,
        stt_status: stt_check.status,
        stt_response_time: stt_check.latency_ms,
        overall_health,
    })
}
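
The check_service helper used above is not shown in this document; a minimal sketch of what it could look like (the ServiceCheck fields mirror how the results are consumed above):

// Sketch of the assumed check_service helper; not the actual implementation.
struct ServiceCheck {
    status: String,
    latency_ms: u64,
    model: Option<String>,
}

async fn check_service(url: &str) -> ServiceCheck {
    let start = std::time::Instant::now();
    let status = match reqwest::get(url).await {
        Ok(resp) if resp.status().is_success() => "healthy",
        _ => "unreachable",
    };
    ServiceCheck {
        status: status.to_string(),
        latency_ms: start.elapsed().as_millis() as u64,
        model: None, // could be parsed from the /health response body
    }
}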

The OpenAI API serves as the last resort when the other providers are unavailable.

| Use Case | Model | Cost |
|----------|-------|------|
| Chat | gpt-4o-mini | $0.15/1M input |
| STT | whisper-1 | $0.006/min |
| TTS | tts-1 | $0.015/1K chars |
// User settings
interface OpenAIConfig {
  apiKey: string;          // Stored in keychain
  enableFallback: boolean; // Default: true
  preferredModel: string;  // Default: "gpt-4o-mini"
}
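
On the backend, the enableFallback flag can gate whether the OpenAI provider is appended to the chain at all. A sketch under that assumption (load_openai_config and the provider constructors are hypothetical helpers):

// Sketch: honor the user's fallback setting when building the provider chain.
// load_openai_config() and the constructors below are hypothetical.
fn providers_for_user() -> Vec<Box<dyn AIProvider>> {
    let cfg = load_openai_config(); // mirrors the OpenAIConfig settings above
    let mut providers: Vec<Box<dyn AIProvider>> = vec![
        Box::new(LibraxisProvider::new()),
        Box::new(MlxProvider::new()),
    ];
    if cfg.enable_fallback {
        providers.push(Box::new(OpenAiProvider::with_model(&cfg.preferred_model)));
    }
    providers
}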

sequenceDiagram
    participant UI as Frontend
    participant BE as Backend
    participant AI as AI Service

    UI->>BE: generate_soap(visit_id, transcript)
    BE->>BE: Load patient context
    BE->>BE: Load user preferences
    BE->>BE: Build system prompt
    Note over BE: System Prompt includes:<br/>- Patient history<br/>- Visit type<br/>- User preferences<br/>- SOAP format rules
    BE->>AI: Chat completion request
    AI->>BE: SOAP response
    BE->>BE: Parse & validate SOAP
    BE->>BE: Extract AI suggestions
    BE->>UI: SOAPNote + Suggestions
System prompt template:

You are a veterinary assistant. Your task is to generate a SOAP note
based on the visit transcription.

## Patient context
- Name: {{patient.name}}
- Species: {{patient.species}}
- Breed: {{patient.breed}}
- Age: {{patient.age}}
- Medical history: {{patient.medical_conditions}}

## User preferences
- Documentation style: {{user.docs_style}}
- Level of detail: {{user.ai_precision_level}}
- Format: {{user.format}}

## Transcription
{{transcript}}

## Task
Generate the SOAP note in the following format:
- Subjective (S): Owner's observations, history
- Objective (O): Physical examination findings
- Assessment (A): Diagnosis, clinical assessment
- Plan (P): Treatment plan, recommendations

Additionally, suggest:
- Possible follow-up tasks
- Reminders for the owner
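
The {{...}} placeholders are filled in by the backend before the request is sent. Vista's actual template engine is not specified here; a minimal sketch using plain string replacement:

// Minimal placeholder substitution; the real template engine may differ.
fn render_prompt(template: &str, patient_name: &str, transcript: &str) -> String {
    template
        .replace("{{patient.name}}", patient_name)
        .replace("{{transcript}}", transcript)
    // ...remaining placeholders are substituted the same way
}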

#[derive(Debug, thiserror::Error)]
pub enum AIError {
    #[error("All providers failed")]
    AllProvidersFailed,

    #[error("Provider {0} returned error: {1}")]
    ProviderError(String, String),

    #[error("Rate limit exceeded for {0}")]
    RateLimitExceeded(String),

    #[error("Invalid request: {0}")]
    InvalidRequest(String),

    #[error("Timeout after {0}ms")]
    Timeout(u64),

    #[error("Network error: {0}")]
    NetworkError(#[from] reqwest::Error),
}

pub struct RetryConfig {
    pub max_attempts: u32,       // Default: 3
    pub initial_delay_ms: u64,   // Default: 1000
    pub max_delay_ms: u64,       // Default: 10000
    pub backoff_multiplier: f32, // Default: 2.0
}

impl RetryConfig {
    pub fn should_retry(&self, error: &AIError, attempt: u32) -> bool {
        if attempt >= self.max_attempts {
            return false;
        }
        // Only transient failures are worth retrying
        matches!(
            error,
            AIError::Timeout(_) | AIError::NetworkError(_) | AIError::ProviderError(_, _)
        )
    }
}
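
Combined with exponential backoff, should_retry drives a loop like the following sketch (assuming AIRequest implements Clone; the loop itself is illustrative, not the actual Vista code):

// Illustrative retry loop with exponential backoff on top of RetryConfig.
pub async fn execute_with_retry(
    resolver: &ServiceResolver,
    request: AIRequest, // assumed to implement Clone
    operation: &str,
) -> Result<AIResponse, AIError> {
    let cfg = &resolver.retry_config;
    let mut delay_ms = cfg.initial_delay_ms;
    let mut attempt = 0;
    loop {
        attempt += 1;
        match resolver.execute(request.clone(), operation).await {
            Ok(response) => return Ok(response),
            Err(e) if cfg.should_retry(&e, attempt) => {
                // Wait, then grow the delay up to the configured ceiling
                tokio::time::sleep(std::time::Duration::from_millis(delay_ms)).await;
                delay_ms = (((delay_ms as f32) * cfg.backoff_multiplier) as u64)
                    .min(cfg.max_delay_ms);
            }
            Err(e) => return Err(e),
        }
    }
}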

| Metric | Description |
|--------|-------------|
| ai_request_total | Total requests per provider |
| ai_request_duration_ms | Request latency |
| ai_request_errors | Errors per provider |
| ai_tokens_used | Token consumption |
| ai_provider_health | Provider availability |
#[derive(Debug, Serialize)]
pub struct AIHealthDashboard {
    pub providers: Vec<ProviderStatus>,
    pub total_requests_today: u64,
    pub average_latency_ms: f64,
    pub error_rate: f64,
    pub primary_provider: String,
}
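
The dashboard can be exposed to the frontend as a Tauri command, in the same style as the MLX health check above (the command name and the aggregation helpers on ServiceResolver are assumptions):

// Sketch of a dashboard command; the aggregation helpers are hypothetical.
#[tauri::command]
pub async fn get_ai_health_dashboard(
    resolver: tauri::State<'_, ServiceResolver>,
) -> Result<AIHealthDashboard, String> {
    Ok(AIHealthDashboard {
        providers: resolver.provider_statuses().await,
        total_requests_today: resolver.requests_today(),
        average_latency_ms: resolver.average_latency_ms(),
        error_rate: resolver.error_rate(),
        primary_provider: "LibraxisAI Cloud".to_string(),
    })
}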

┌──────────────────────────────────────────────────────────┐
│ LOCAL PROCESSING                                         │
├──────────────────────────────────────────────────────────┤
│ ✅ Audio recording                                       │
│ ✅ Audio storage                                         │
│ ✅ Database storage                                      │
│ ✅ Local MLX inference (when available)                  │
└──────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────┐
│ CLOUD PROCESSING                                         │
├──────────────────────────────────────────────────────────┤
│ ⚠️ STT transcription → transcript text sent to cloud     │
│ ⚠️ LLM generation → transcript + context sent to cloud   │
│ ✅ No data retention by AI providers (per contract)      │
│ ✅ TLS 1.3 encryption in transit                         │
└──────────────────────────────────────────────────────────┘