VAL-9000

Self-Hosted Local AI Assistant 6/1/2025

A self-hosted AI assistant stack running entirely on local hardware — LLM inference, text-to-speech, and physical hardware interfaces, all without a single cloud API call.

GPU-accelerated local LLM inference (Ollama + Llama 3)
Text-to-speech via Kokoro TTS
Multi-source ingestion pipeline — web pages, YouTube, git repos, local files
FastAPI coordinator server orchestrating all services and hardware I/O
Physical interfaces: MECHROPAD for input, VAL-EYE for visual output
Fully containerized via Docker Compose

VAL-9000 architecture

Details

Stack

All services run containerized via Docker Compose on local hardware with NVIDIA GPU passthrough:

Ollama — GPU-accelerated local LLM inference (Llama 3)
Kokoro — local text-to-speech synthesis
OpenWebUI — web-based chat interface
Ingestion service — custom FastAPI service for feeding RAG content
Preprocessor — cleans and prepares ingested content for retrieval

Custom Personality

VAL-9000 runs a hand-authored Modelfile defining a consistent personality with Big Five and MBTI trait scores, a defined value system, and specific behavioral constraints. The result is a coherent assistant character that remains consistent across sessions rather than a generic chatbot.

RAG Ingestion Pipeline

A FastAPI ingestion service accepts content from multiple source types and normalizes it into RAG-ready text:

Web pages (main content extraction)
YouTube videos (transcript ingestion)
Git repositories
Local files

Hardware Interface

The physical presence is VAL-EYE — an ESP32-S3 with a 240x240 circular display running a real-time software-rasterized 3D scene that reacts to VAL-9000's audio output, changing visual state based on whether the assistant is idle or speaking.

← Portfolio

Isaac
Freer
Barrett

Isaac
Freer
Barrett

Isaac
Freer
Barrett

Isaac
Freer
Barrett

VAL-9000

Stack

Custom Personality

RAG Ingestion Pipeline

Hardware Interface