Complete Self-Hosted AI Stack with Ollama, LiteLLM & RAG

Easy deployment of a complete self-hosted AI stack including LLM, gateway, STT, TTS, and RAG, with GPU support.

Complete Self-Hosted AI Stack with Ollama, LiteLLM & RAG

This repository provides a streamlined solution for deploying a comprehensive, self-hosted AI stack. Designed for developers, it integrates essential components for building and running AI applications locally, offering control over your data and models. The stack is built using Docker, simplifying deployment and management, and includes robust support for GPU acceleration.

What it Does

The Complete Self-Hosted AI Stack automates the setup of a fully functional AI environment on your own infrastructure. It bundles key technologies to enable local LLM inference, API gateway functionality, speech-to-text (STT), text-to-speech (TTS), and Retrieval Augmented Generation (RAG) capabilities. This allows developers to experiment with, develop, and deploy AI agents and applications without relying on external cloud services.

Key Features

Self-Hosted Control: Maintain full ownership and privacy of your AI models and data.
Integrated Components: Includes Ollama for LLM management, LiteLLM as a unified API gateway, and RAG implementation for knowledge retrieval.
GPU Acceleration: Optimized for performance with seamless GPU support, crucial for demanding AI workloads.
Dockerized Deployment: Simplifies installation and configuration through containerization.
Extensible Architecture: Provides a foundation for custom AI agent development and integration.
STT/TTS Support: Incorporates speech processing capabilities for voice-enabled AI applications.

Who it's For

This tool is specifically tailored for AI developers , ML engineers , and researchers who require a flexible and private environment for their AI projects. It's ideal for those looking to:

Build and deploy custom AI agents.
Experiment with different LLMs locally.
Develop RAG-powered applications.
Integrate STT and TTS into their AI workflows.
Avoid vendor lock-in and manage their own AI infrastructure.
Leverage GPU resources for faster AI model processing.