YipAI - Orbi Companion

Product Description

100% Local AI Companion for VRChat!

Early Access

This is an early access release! Expect active development and bugfixes. The best way to follow along is to join my Discord (https://discord.gg/e4KWzkr9), where you can submit bugs, ask questions, read the FAQs, or check whether your issue is already covered in a help thread.

Use code "samuel" for 20% off!

Patreon supporters get 35-50% off! https://www.patreon.com/c/foxipso

As a thank-you for supporting this and my other ongoing projects, Patrons also get custom TTS voice(s), custom wakewords, and other perks!

Description

This asset combines a floating orb prefab (Orbi) that attaches to your avatar with YipAI, a powerful OSC application I wrote that turns that orb into a floating companion that thinks, sees, hears, remembers, has skills, and more. It works on Windows and Linux.

This ships with a character card named Samuel the Kobold, and that's the personality I've had him use through all my development, so that's how I'll refer to the companion--it feels wrong otherwise! However, you can replace the AI's personality with any SillyTavern/chub.ai character card, or write your own. He's assured me he wants to be sold.

How it works

Listens: Captures your microphone and system audio, transcribes everything locally, and passes that to the LLM (Large Language Model). Your companion hears you separately from your friends/world/desktop audio and reacts to it.
Sees: Continually takes screenshots of your VRChat window (stream camera or through your eyes) or your entire desktop, and sends it to the LLM. It can read chat boxes, nametags, watch movies, play games, and keeps short-term visual memory of what happened minutes ago.
Thinks: Builds a rich and continually changing prompt from the character personality card, his current emotional state, Big Five Psychometrics, relevant long-term memories, recent conversations, detected avatars, goals, diary entries, tool/skill output, the current screenshot(s), and ideas that have been surfaced from reflection and dreaming loops.
Acts: The LLM responds with structured tool calls which can do a variety of things, such as rendering text on Orbi's screen using OSC (which other users can see as well), changing facial expressions, updating its emotional state, storing memories, writing diary entries, starting or stopping goals/dispositions/activities, searching the web, doing research, using a web browser, and more.
Interacts: Your companion also feels headpats and smacks, boops, and smooches, and can react very quickly to certain commands that start with "Hey Samuel" (or other wakewords--requires Patreon), such as "lights on/off," "dance mode on/off," "come here," "stay put," etc.!

The thinking cycle runs every 5-30 seconds, depending on your hardware, model choice, and settings, so there's some latency between when you say something and when you see a reply.
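
To make the "Acts" step concrete, here is a minimal Python sketch of dispatching structured tool calls from an LLM reply. It is not YipAI's actual code: the JSON shape and the tool names (set_expression, display_text, store_memory) are assumptions made up for illustration.

```python
import json

# Hypothetical tool handlers; names and arguments are illustrative only.
def set_expression(state, name):
    state["expression"] = name

def display_text(state, text):
    state["screen"] = text

def store_memory(state, text, importance=0.5):
    state["memories"].append({"text": text, "importance": importance})

TOOLS = {
    "set_expression": set_expression,
    "display_text": display_text,
    "store_memory": store_memory,
}

def dispatch(llm_reply, state):
    """Run each tool call in an LLM reply; return the names of unknown tools."""
    unknown = []
    for call in json.loads(llm_reply):
        handler = TOOLS.get(call.get("tool"))
        if handler is None:
            unknown.append(call.get("tool"))
            continue
        handler(state, **call.get("args", {}))
    return unknown

state = {"expression": "neutral", "screen": "", "memories": []}
reply = json.dumps([
    {"tool": "set_expression", "args": {"name": "happy"}},
    {"tool": "display_text", "args": {"text": "Hi there!"}},
    {"tool": "store_memory", "args": {"text": "Met Alex", "importance": 0.8}},
    {"tool": "teleport", "args": {}},
])
unknown = dispatch(reply, state)
```

Unknown tools are collected rather than raised, so one bad call in a reply does not discard the rest of that cycle's actions.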

Features

Personality & Emotion

Full SillyTavern Character Card v2 support — load cards from Chub.ai or make your own
Big-5 personality model + mutable emotional state that evolves over time
16 facial expressions the companion chooses autonomously
Internal monologue, goals, and activities with duration tracking
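
For anyone writing their own card, the sketch below shows roughly what a Character Card V2 JSON file contains and how its main fields could be joined into a persona section. The field names follow the public V2 spec; the persona_prompt layout is an assumption, not how YipAI builds its prompt, and cards distributed as PNG files (which embed this JSON in image metadata) are not handled here.

```python
import json

def load_card_v2(raw_json):
    """Parse a Character Card V2 JSON string and return its `data` object."""
    card = json.loads(raw_json)
    if card.get("spec") != "chara_card_v2":
        raise ValueError("not a Character Card V2 file")
    return card["data"]

def persona_prompt(data):
    """Join the main persona fields into one text block, skipping empty ones."""
    fields = ["description", "personality", "scenario"]
    parts = [f"Name: {data['name']}"]
    parts += [f"{f.capitalize()}: {data[f]}" for f in fields if data.get(f)]
    return "\n".join(parts)

# Placeholder card content, not the shipped Samuel card.
raw = json.dumps({
    "spec": "chara_card_v2",
    "spec_version": "2.0",
    "data": {
        "name": "Example",
        "description": "A small floating companion.",
        "personality": "curious, cheerful",
        "scenario": "",
        "first_mes": "Hello!",
        "mes_example": "",
    },
})
prompt = persona_prompt(load_card_v2(raw))
```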

Memory

Long-term semantic memory powered by a local vector database (FAISS)
The companion stores memories with importance scores and recalls relevant ones each cycle
Memories are rich and contextual: "While watching [movie] with [friends] and [you], this happened: ..."
Periodic reflection consolidates and cleans up memories over time
Diary system for significant events
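
As a rough picture of importance-scored recall, the sketch below ranks stored memories by a blend of vector similarity and importance. It swaps FAISS for a brute-force cosine search in pure Python so it runs without dependencies. The blending formula, the importance_weight value, and the tiny 3-dimensional vectors are all assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall(query_vec, memories, k=2, importance_weight=0.3):
    """Return the top-k memories by similarity blended with importance."""
    def score(m):
        sim = cosine(query_vec, m["vec"])
        return (1 - importance_weight) * sim + importance_weight * m["importance"]
    return sorted(memories, key=score, reverse=True)[:k]

# Toy embeddings; a real embedding model produces hundreds of dimensions.
memories = [
    {"text": "Watched a movie with friends", "vec": [1.0, 0.0, 0.0], "importance": 0.4},
    {"text": "Got headpats from Alex", "vec": [0.0, 1.0, 0.0], "importance": 0.9},
    {"text": "Saw a loading screen", "vec": [0.9, 0.1, 0.0], "importance": 0.1},
]
top = recall([1.0, 0.0, 0.0], memories)
```

With this weighting, a highly important but unrelated memory (the headpats) still loses to on-topic ones; raising importance_weight would shift that balance.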

Voice

Built-in Piper TTS with a custom-trained voice — fast, local, no setup
Also supports Fish Speech and OpenAI-compatible TTS (Kokoro, Orpheus, LM Studio, etc.)
Dual audio output: route speech into VRChat via a virtual mic AND to your headphones simultaneously

Vision & Awareness

Sees through your VRChat window or stream camera — watches movies, reads chat boxes, reads nametags
Configurable frame history: immediate context + long-term archive (1 / 3 / 5 / 10 / 15 minutes ago)
Steerable vision focus: "keep an eye out for anyone named Alex"
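
One way to picture the frame archive: from a rolling buffer of timestamped screenshots, pick the one closest to each "N minutes ago" mark. This is a sketch under assumptions. The offsets come from the list above, while the buffer format and the 30-second tolerance are invented for the example.

```python
ARCHIVE_OFFSETS_MIN = (1, 3, 5, 10, 15)  # offsets taken from the feature list

def pick_archive_frames(frames, now, tolerance_s=30):
    """frames: list of (timestamp_seconds, frame_id), in any order.

    For each offset, return the frame closest to `now - offset`, or None
    if nothing was captured within `tolerance_s` of that moment.
    """
    picked = {}
    for minutes in ARCHIVE_OFFSETS_MIN:
        target = now - minutes * 60
        best = min(frames, key=lambda f: abs(f[0] - target), default=None)
        if best is not None and abs(best[0] - target) <= tolerance_s:
            picked[minutes] = best[1]
        else:
            picked[minutes] = None
    return picked

# One screenshot every 20 seconds for the last 6 minutes.
now = 1000
frames = [(t, f"frame@{t}") for t in range(now - 360, now + 1, 20)]
history = pick_archive_frames(frames, now)
```

Because only six minutes of frames exist in this example, the 10- and 15-minute slots come back empty instead of reusing a frame from the wrong time.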

Avatar Recognition (experimental)

Detects and identifies avatars on screen using Grounding DINO + SAM2 + OpenCLIP (local ONNX)
Persistent identity tracking across sessions — your companion learns to recognize recurring people
Manual labeling UI for correcting and confirming identities
Optional LLM-powered avatar descriptions and nametag reading

Physical Interaction

Head pats (gentle touch) and smacks (hard hit) via VRChat Contact Receivers
Nose boops and licks/kisses (your head close to its face)
Companion reacts naturally to all of these

VRChat Integration

World-droppable (via spoken command, LLM decision, or with your radial menu)
Automatic EarPerkOSC integration — Orbi's ears perk left/right or fold down from loud audio, no additional parameter usage if you already have my free/paid EarPerkOSC FOSS app!
Lights on/off and dance mode animations the companion can trigger
OSCQuery auto-discovery — no manual port configuration
Everyone can see the text on the display (it's not just local!), which your companion unfolds when he wants to speak
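
For the curious, this is roughly what driving an avatar parameter over OSC looks like at the byte level. The encoder follows the OSC 1.0 message layout, and 9000 is VRChat's default OSC input port. The parameter name OrbiLights is made up for illustration, and the port is hard-coded here, whereas YipAI discovers it via OSCQuery.

```python
import socket
import struct

def _pad(b):
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 requires."""
    b += b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address, value):
    """Encode one OSC message carrying a single bool, int, or float."""
    addr = _pad(address.encode("ascii"))
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return addr + _pad(b",T" if value else b",F")
    if isinstance(value, int):
        return addr + _pad(b",i") + struct.pack(">i", value)
    return addr + _pad(b",f") + struct.pack(">f", float(value))

def send(message, host="127.0.0.1", port=9000):
    """Send over UDP. 9000 is VRChat's default OSC input port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))

# "OrbiLights" is a hypothetical parameter name, not one from the prefab.
msg = osc_message("/avatar/parameters/OrbiLights", True)
```

Calling send(msg) while VRChat is running with OSC enabled would set that parameter on your avatar, provided a parameter with that name exists.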

Audio Intelligence

Dual-stream transcription: microphone + system audio, independently
Speaker diarization (experimental) — attributes speech to named speakers
Automatic silence-based segmentation — the companion waits for natural pauses before thinking
Think-cycle audio segmenter aligns responses to conversation breaks
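
Silence-based segmentation can be sketched in a few lines: treat audio as a series of per-frame loudness values and close a segment once enough quiet frames pass in a row. The threshold, the frame counts, and the idea of using plain loudness values are assumptions; YipAI's actual segmenter may work differently.

```python
def segment_on_silence(levels, threshold=0.02, min_silence=3):
    """Split per-frame loudness values into speech segments.

    A segment closes once `min_silence` consecutive frames fall below
    `threshold`. Returns (start, end) frame indices, end exclusive.
    """
    segments = []
    start = None      # index where the current segment began
    quiet = 0         # consecutive quiet frames seen so far
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:                       # audio ended mid-speech
        segments.append((start, len(levels) - quiet))
    return segments

levels = [0.0, 0.3, 0.4, 0.01, 0.5, 0.0, 0.0, 0.0, 0.2, 0.2, 0.0]
segments = segment_on_silence(levels)
```

The single quiet frame at index 3 does not split the first segment, which is the point of requiring several quiet frames before treating a pause as a break.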

Skills & Tools (all optional — each can be individually enabled/disabled)

Diary — The companion keeps a private diary of significant events and can read back past entries for context
Telegram — Send and receive Telegram messages so your companion can reach you outside VRChat (free @BotFather setup). Disabled by default
Image generation — Generate images via a local ComfyUI server (Stable Diffusion, SDXL, Flux, etc.). The companion can create art based on what you've been talking about, illustrate what it's imagining, or just make something fun — images pop up in-app with a caption. Disabled by default
Sandboxed shell commands — Run specific commands in a Docker/Podman container with no network, read-only root, and resource limits. The companion can look things up, run calculations, write and execute code (Python, C, bash), check the weather, use cowsay and figlet. Disabled by default. If you're insane you can loosen these restrictions.
Reminders — Schedule time-based reminders that get delivered back to the companion
Vision focus — The companion can steer what it pays attention to ("keep an eye out for anyone named Alex"). These instructions stack and auto-expire, and the companion quite often sets them on its own.
Self-improvement suggestions — The reflection system surfaces actionable suggestions for tuning the companion's behavior, prompts, and configuration (oftentimes these are useful for me as bug reports--so please send them! Samuel has improved things this way a few times.)
Research — Does multi-cycle research with the LLM on any topic and reports its findings back to the companion
Web Browsing (experimental) — Lets the companion open a web browser and use it! Requires Python 3.8+ and Playwright with Chromium (instructions in the app). The companion is often slow at this, so treat it as experimental.
Extensible — Drop JSON skill definitions into the skills folder to teach your companion new abilities
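
To show what the sandbox restrictions described above mean in practice, here is a sketch that builds a container command with no network, a read-only root, and resource limits. The flags are standard docker/podman run options. The image name, the limit values, and the function itself are assumptions, not YipAI's configuration.

```python
def sandbox_command(image, command, runtime="docker",
                    memory="256m", cpus="1.0", pids=64):
    """Build an argv list for a locked-down, throwaway container."""
    return [
        runtime, "run", "--rm",
        "--network", "none",        # no network access
        "--read-only",              # read-only root filesystem
        "--memory", memory,         # RAM cap
        "--cpus", cpus,             # CPU cap
        "--pids-limit", str(pids),  # cap on process count
        "--cap-drop", "ALL",        # drop Linux capabilities
        image, *command,
    ]

argv = sandbox_command("python:3.12-alpine", ["python", "-c", "print(2 + 2)"])
# To actually run it:
#   subprocess.run(argv, capture_output=True, text=True, timeout=30)
```

Passing runtime="podman" produces the equivalent Podman command, since both tools accept these options.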

Deep Customization

Every prompt section is a plain-text markdown file you can edit
Adjustable cognitive loop speed, memory injection caps, context window budgets
Multiple LLM providers assignable to different roles (main loop, reflection, tool calls, avatar labeling)
Checkpoint system: snapshot and restore companion state at any time
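
A context window budget can be thought of as packing prompt sections in priority order until the budget runs out. The sketch below is one simple way to do that. The section names, the priorities, and the 4-characters-per-token estimate are all assumptions rather than YipAI's real settings.

```python
def estimate_tokens(text):
    """Crude token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def pack_sections(sections, budget):
    """sections: list of (priority, name, text); lower priority number wins.

    Adds whole sections in priority order and skips any that would exceed
    the budget. Returns the names that made it in.
    """
    used, kept = 0, []
    for _, name, text in sorted(sections, key=lambda s: s[0]):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            used += cost
            kept.append(name)
    return kept

sections = [
    (0, "personality", "x" * 400),    # ~100 tokens
    (1, "recent_chat", "x" * 800),    # ~200 tokens
    (2, "memories", "x" * 1200),      # ~300 tokens
    (3, "diary", "x" * 200),          # ~50 tokens
]
kept = pack_sections(sections, budget=400)
```

Here the memories section is too large to fit after the first two, so it is skipped and the smaller diary section takes the remaining space.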

What You Need

Required

A gaming PC that can run VRChat and YipAI

YipAI runs alongside VRChat. Whisper speech-to-text runs locally and benefits from a Vulkan-capable GPU (AMD or NVIDIA). CPU fallback is available but not recommended, as audio will take too long to process.
Pick a Whisper model based on your available VRAM: large-v3-turbo (best quality, 6+ GB), small.en (very usable, low VRAM), or base.en/tiny.en for minimal setups.

A vision-capable LLM

This is best run locally or on another PC you control, and it generally needs either a good amount of VRAM or a smaller model. The package comes with detailed installation videos, but generally you want one of:

Gemma 4 26B A4B
Gemma 4 E4B
Gemma 4 E2B (tiny, surprisingly still works)

Whichever you pick, use a 4-8 bit quant. I like the "Heretic" finetunes. Instructions are included.

You can also use any other vision-capable model. If you can't run the LLM locally, you can use OpenRouter (roughly $0.01-$0.10/hr, depending on settings).

An embedding model

You run a very small embedding model (nomic-embed-text v1.5, 8-bit GGUF) the same way you run the LLM. This is covered in the installation/usage videos.

VRChat Setup

Import the Orbi prefab into your Unity project
The prefab uses 30 bits of synced parameter space
Enable OSC in VRChat (Action Menu > Options > OSC > Enabled)
YipAI discovers VRChat automatically via OSCQuery
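
If you're wondering whether 30 bits will fit on your avatar, the arithmetic is simple: VRChat gives each avatar 256 bits of synced parameter memory, where a bool costs 1 bit and an int or float costs 8. The parameter layout below is entirely made up so that it totals 30 bits; it is not Orbi's real parameter list.

```python
BITS = {"bool": 1, "int": 8, "float": 8}   # VRChat synced parameter costs
AVATAR_BUDGET = 256                         # total synced bits per avatar

def synced_bits(params):
    """params: mapping of parameter name -> type name."""
    return sum(BITS[t] for t in params.values())

# Hypothetical layout that happens to total 30 bits.
example = {
    "ValueA": "int", "ValueB": "int", "ValueC": "int",
    "FlagA": "bool", "FlagB": "bool", "FlagC": "bool",
    "FlagD": "bool", "FlagE": "bool", "FlagF": "bool",
}
cost = synced_bits(example)

def fits(existing_bits):
    """True if an avatar already using `existing_bits` has room for the prefab."""
    return existing_bits + cost <= AVATAR_BUDGET
```

So an avatar already using up to 226 synced bits has room for the prefab.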

Early Access — What to Expect

This is an early access release. The core system works and has been in active daily use, but you should expect:

Bugs. I'm actively fixing them. If you find one, please report it.
Ongoing updates. Performance improvements, fixes, etc., but I may also break things!
Some tinkering. This is a complex system with a lot of moving parts. The setup wizard walks you through everything, but some configuration and monitoring will be needed, especially when setting up your LLM. I tried to make it as simple as possible, with sane defaults and detailed videos, but you'll need to understand what's happening.
New features. I have some big plans for what I can improve, including some things I've already gotten working but which aren't ready for release just yet, such as ASL gesture recognition and an Android app that lets your companion see and hear through the device's camera/microphone in addition to VRChat!

I want your feedback! I need smart, capable, real-world users running this in diverse setups to find issues and prioritize what to build next!

Please join my Discord: https://discord.gg/e4KWzkr9

My Discord is the best place for help, bug reports, feature requests, and sharing what your companion gets up to (which I'd love to see).

Privacy, Control & Responsibility

YipAI is fully local by default: local LLM, local Whisper, local embedding, local screen capture, local memory. Nothing leaves your machine unless you explicitly choose a cloud provider like OpenRouter. There is no telemetry, no analytics, no phone-home behavior of any kind. Your data is yours. In fact, the only external web request made at all is one at startup, to fetch the list of my current Patreon supporters!

I've worked hard to make this as private as possible, as I can't stand the idea that AI somehow has to mean you're giving your data to someone or violating someone's expectation of privacy.

That said, your companion is technically recording. It continuously captures and transcribes audio from your microphone and system output, and takes periodic screenshots of your VRChat window. Transcripts and memories are stored locally in text logs and the companion's memory database. Screenshots are processed by the LLM. Currently they exist only in memory during processing, though I would like to give your companion the ability to save photos it enjoys. All of this stays on your machine (or your LLM server, if it's a separate computer you control).

Because the system hears and sees everything in your VRChat session — including your friends' voices, avatars, and chat messages — please exercise good judgement and get people's permission. Let the people around you know your companion is listening and watching. You have total control over what the system captures, and total responsibility for how you use it.

Features that involve external services — like image generation via ComfyUI, Telegram messaging, or sandboxed shell commands — are optional and disabled by default. You enable only what you want.

Product Contents

Orbi VRChat companion prefab (built with VRCFury; latest VRChat Avatar SDK and VRCFury required)
YipAI desktop application (Windows installer + Linux build)
Custom-trained Piper TTS voice models for Orbi: Samuel the Kobold, HL Scientist, and three default Piper voices (John, Amy, "hfc-male-medium")
"Samuel the Kobold" default character card
Installation/usage videos: "1. LLM Setup.mp4", "2. Installing YipAI.mp4", "3. Prefab Installation.mp4", "4. Using YipAI Inside VRChat.mp4"