THE CBORG AI PORTAL
Full-Stack AI Inference Platform
The CBorg AI Portal is a multi-model AI inference-as-a-service platform supporting productivity and research applications at Berkeley Lab. The portal provides a unified, secure gateway to cloud-based and on-premises models running on a GPU-enabled Kubernetes cluster. Serving over 1,500 chat users and 300+ API developers, CBorg has been the fastest-growing IT service at the Lab since its launch in 2024.
In 2024 I was tasked with leading the effort to create the portal. After surveying the configurations of similar portals at other UC institutions (UCSF, UCSD, and UCI), I designed a system architecture built around the LiteLLM proxy server (API access and budget control), LibreChat (user front end), and vLLM (high-performance, continuously batched inference). The portal provides access to state-of-the-art commercial cloud models (ChatGPT, Gemini, Claude, xAI) via Lab single sign-on, with a convenient self-managed API key service for advanced users developing their own LLM-powered applications.
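As a rough illustration of how a LiteLLM proxy can front both cloud and on-prem backends behind one OpenAI-compatible endpoint, consider a minimal `model_list` sketch. The model aliases, internal service hostname, and environment-variable name below are hypothetical placeholders, not the portal's actual configuration:

```yaml
# Illustrative LiteLLM proxy config fragment (names and URLs are examples only).
model_list:
  # Cloud-hosted commercial model, credentials pulled from the environment
  - model_name: example/cloud-chat
    litellm_params:
      model: anthropic/claude-sonnet-example
      api_key: os.environ/EXAMPLE_CLOUD_API_KEY
  # On-prem model served by vLLM, which exposes an OpenAI-compatible /v1 API
  - model_name: example/onprem-chat
    litellm_params:
      model: openai/example-llama-chat
      api_base: http://vllm-chat.example.svc.cluster.local:8000/v1
```

Because both routes speak the OpenAI API shape, client applications can switch between cloud and on-prem models by changing only the model alias.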
For the on-prem side of the service, I curated a collection of open-weight models selected to provide essential AI functions within the hardware constraints of the GPU cluster: e.g., 4x H100 for Llama 4 (chat completions), 2x H100 for Qwen Coder (code completions), and 4x A100 for Qwen VL (vision). In addition, I built RAG and CAG agents incorporating domain knowledge from operational and scientific divisions, including electrical safety, the Project Management Advisory Board, and the Joint BioEnergy Institute.
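The core retrieval step behind a RAG agent can be sketched in a few lines. This is a toy, self-contained illustration using bag-of-words cosine similarity over an in-memory corpus; a production agent like those described above would use dense embeddings and a vector store, and the document snippets here are invented examples:

```python
# Toy RAG retrieval sketch: rank a small in-memory corpus against a query
# by bag-of-words cosine similarity, then ground a prompt on the top hit.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Return the k documents most similar to the query.
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

# Hypothetical knowledge-base snippets (illustrative only).
docs = [
    "Lockout/tagout steps are required before servicing electrical equipment.",
    "Project charters must be reviewed by the advisory board each quarter.",
]

top = retrieve("electrical safety lockout steps", docs)
# The retrieved passage is injected as context for the LLM call.
prompt = f"Answer using only this context:\n{top[0]}\n\nQuestion: ..."
```

A CAG (cache-augmented generation) agent differs mainly in that the domain corpus is preloaded into the model's context or KV cache rather than retrieved per query.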
AI Inference - vLLM, LiteLLM, LibreChat
RAG, CAG and Agentic Bot Development
Kubernetes - Docker - Rancher
Full-Stack AI Infrastructure
Training, Community Engagement
Cross-Functional Leadership