AI Agent
This document describes how to use the SightLab AI Agent, an interactive, intelligent agent that can be connected to various large language models such as GPT-4 and Claude Opus. You can customize the agent's personality, use speech recognition, and leverage high-quality text-to-speech models.
The example can be found in ExampleScripts - AI_Intelligent_Agent (or ExampleScripts - Chat_GPT_Agent on older versions of SightLab).
Key Features
Interact and converse with custom AI Large Language Models in real-time VR or XR simulations.
Choose from OpenAI models (including GPT-4o and custom GPTs) or Anthropic models (such as Claude 3 Opus or Claude 3.5 Sonnet). Requires an API key.
Modify avatar appearance, animations, environment, and more. Works with most avatar libraries (Avaturn, ReadyPlayerMe, Mixamo, Rocketbox, Reallusion, etc.).
Customize the agent's personality, contextual awareness, emotional state, interactions, and more. Save your creations as custom agents.
Use speech recognition to converse using your voice or text-based input.
Choose from high-quality voices from OpenAI TTS or ElevenLabs (requires an API key).
Train the agent as it adapts using conversation history and interactions.
Works with all features of SightLab, including data collection and visualizations, transcript saving, and more.
Add to any SightLab script
Instructions
Installation
Ensure you have the required libraries installed using the Vizard Package Manager. These include:
openai (for OpenAI GPT agents)
anthropic (for Anthropic Claude agent)
elevenlabs (for ElevenLabs text-to-speech)
SpeechRecognition
sounddevice (pyaudio for older versions of SightLab)
python-vlc
numpy
You also need to install the VLC media player (for OpenAI TTS); version 3.0.20 or later appears to be required.
For ElevenLabs you may need to install ffmpeg (see below).
Note: Requires an active internet connection
If using Vizard 8 or higher, copy the contents of the "updated speech recognition files" folder to C:\Program Files\WorldViz\Vizard8\bin\lib\site-packages\speech_recognition, overwriting the __init__.py and audio.py files. You will need to do this again if you update the SpeechRecognition library.
API Keys
Obtain API keys from OpenAI (if using ChatGPT), Anthropic (if using a Claude model), and ElevenLabs (if using ElevenLabs instead of OpenAI's TTS). See below for specific information on obtaining API keys.
Create a folder named "keys" in your SightLab root directory and place these text files inside:
key.txt: Contains your OpenAI API key.
elevenlabs_key.txt: Contains your ElevenLabs API key (if using ElevenLabs).
ffmpeg_path.txt: Contains the path to the ffmpeg bin folder. On some setups this is not needed. ffmpeg download
Copy the path to the bin directory and paste it into this text file.
If using the Anthropic model, create an anthropic_key.txt file containing your Anthropic API key.
For Gemini, place a text file called gemini_key.txt containing your Gemini API key.
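For reference, the keys folder would then look something like this (only include the files for the services you actually use):
SightLab/
  keys/
    key.txt               (OpenAI API key)
    elevenlabs_key.txt    (ElevenLabs API key, if using ElevenLabs)
    anthropic_key.txt     (Anthropic API key, if using Claude)
    gemini_key.txt        (Gemini API key, if using Gemini)
    ffmpeg_path.txt       (path to the ffmpeg bin folder, if needed)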
Configuration
Open the AI_Agent_Config.py script (now located in the configs folder) and configure the following options. Add new config files if you want multiple configurations (then change the top line in the AI_Agent.py and AI_Agent_Avatar scripts where the config is imported). Note that USE_PASSTHROUGH and DISABLE_LIGHTING_AVATAR are additionally set in the AI_Agent_GUI.py script. The default config uses a male RocketBox avatar in a Home Office environment, but this can easily be changed; the included options are already listed at the top of the AI_Agent_Avatar script, so you can simply comment and uncomment the one you want.
AI_MODEL: Choose between 'CHAT_GPT' and 'CLAUDE'.
OPENAI_MODEL: Specify the OpenAI model name (e.g., "gpt-4o"). List of models here
ANTHROPIC_MODEL: Specify the Anthropic model name (e.g., "claude-3-5-sonnet-20240620"). List of models here
MAX_TOKENS: The maximum number of tokens each exchange can use. Set it to a higher number to allow longer responses (the token limit for most models is 4096; gpt-4 allows 8192).
USE_SPEECH_RECOGNITION: Toggle between speech recognition and text-based interaction.
SPEECH_MODEL: Choose OpenAI TTS or ElevenLabs.
ELEVEN_LABS_VOICE: Choose the voice for ElevenLabs.
OPEN_AI_VOICE: Choose the voice for OpenAI TTS (samples here).
USE_GUI: Choose if you want to use the SightLab GUI to select environments and options
chatgpt_prompt_file: Save prompts as text files in the "prompts" folder and reference which one to use here.
USE_PASSTHROUGH: Choose if using Mixed Reality Passthrough (select 'empty.osgb' for environment)
ENVIRONMENT: Not necessary to set if using the GUI. Use your own or find ones in sightlab_resources/environments.
AVATAR_MODEL: Add avatar model to use. Use your own or find some in the Resources folder or sightlab_resources/avatar/full_body (see here for how to get more)
BIOPAC_ON: Choose whether to connect with Biopac Acqknowledge to measure physiological responses (this can also be toggled in the GUI if using the GUI mode)
DISABLE_LIGHTING_AVATAR: Disable the avatar lighting if the avatar looks "blown out" or overly lit in certain environments.
ATTACH_FACE_LIGHT: Attach a light in front of the avatar's face if the face looks too dark.
Other configurations are available as needed, such as avatar options (see below), GUI options, maximum token size, history, and more (refer to the script for details).
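For reference, a minimal custom config might look something like the sketch below. The option names come from the list above; the values shown (the prompt file name and the environment and avatar paths) are illustrative assumptions only, so check AI_Agent_Config.py for the exact values your installation expects.
AI_MODEL = 'CHAT_GPT'                        # or 'CLAUDE'
OPENAI_MODEL = "gpt-4o"
ANTHROPIC_MODEL = "claude-3-5-sonnet-20240620"
MAX_TOKENS = 1000
USE_SPEECH_RECOGNITION = True
USE_GUI = True
USE_PASSTHROUGH = False
BIOPAC_ON = False
chatgpt_prompt_file = 'prompts/my_agent.txt'                     # hypothetical prompt file
ENVIRONMENT = 'sightlab_resources/environments/home_office.osgb' # illustrative path
AVATAR_MODEL = 'sightlab_resources/avatar/full_body/male.gltf'   # illustrative path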
Running the Script
Run AI_Agent.py or AI_Agent_GUI.py to start
Interaction
Hold either the 'c' key or the right-hand grip button to start speaking; release it to stop, and the AI agent will respond. If INTERUPT_AI is True, you can press 'c' or the right-hand grip to interrupt and speak again.
If HOLD_KEY_TO_SPEAK is False, press the 'c' key once to start speaking to the AI.
If USE_KEY_WORD is set to True, say "Agent" before each interaction.
If USE_SPEECH_RECOGNITION is set to False, press 'c' to bring up the chat window.
To stop the conversation, type "q" and click "OK" in the text chat, or say "exit" if using speech recognition (or just close the window or press Escape).
Change the role or personality of the avatar by modifying or creating a prompt file in the "prompts" folder using natural language and changing which file is referenced in the config file (an example prompt is shown after this list).
Modify options in the config files (see below)
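For example, a prompt file (a hypothetical prompts/museum_guide.txt) might contain natural-language instructions like the following (see the prompt notes under Additional Information for GPT vs. Anthropic phrasing):
"I am a friendly virtual museum guide standing in an art gallery. I answer visitors' questions about the paintings in short, conversational sentences and ask an occasional follow-up question to keep the conversation going."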
Modifying Environment and Avatar(s)
See this page for places to get new assets (works with Avaturn, ReadyPlayerMe, Reallusion, Mixamo, RocketBox, and other fbx avatar libraries)
Place environment model in Resources/environments for default location, or reference the new path
For adding new avatars see this page https://sightlab2.worldviz.com/examplestemplates/adding-avatar-agents
Modify the config file to update the environment and avatar path, as well as avatar options
Adding to Existing or Additional Scripts
First make sure to copy over the "configs", "keys" and "prompts" folders, as well as the AI_Agent_Avatar.py file
Add these lines at the top of your script
from configs.AI_Agent_Config import *
import AI_Agent_Avatar
Add these lines to have the avatar show up in the replay
avatar = AI_Agent_Avatar.avatar
sightlab.addSceneObject('avatar', avatar, avatar=True)
Add these lines to enable passthrough augmented reality
if USE_PASSTHROUGH:
    import openxr
    xr = openxr.getClient()
    passthrough = None
    if sightlab.getConfig() in ["Meta Quest Pro", "Meta Quest 3"]:
        passthrough = xr.getPassthroughFB()
    elif sightlab.getConfig() == "Varjo":
        passthrough = xr.getPassthroughVarjo()
    viz.clearcolor(viz.BLACK, 0.0)
    if passthrough:
        passthrough.setEnabled(True)
Obtaining API Keys
To use certain features of the AI Agent, you'll need to obtain API keys from the following services:
OpenAI (for ChatGPT and Open AI Text to Speech):
Visit the OpenAI website (not the ChatGPT login page): https://platform.openai.com/
Sign up for an account if you don't have one, or log in if you already do. You may also need to buy some credits, but you don't need much for this (most likely no more than a few dollars a month).
Navigate to the API section of your account (https://platform.openai.com/api-keys)
Click "Create a new secret key" and copy the key.
Paste the copied key into a text file named "key.txt" and place it in your root SightLab folder.
You might need to buy a certain amount of credits, but even $5 should be sufficient. Go to "Usage" to increase your usage limit: https://platform.openai.com/usage
You can specify whichever models are available to your account in the config file.
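If you want to verify your key outside of SightLab, a minimal sketch using the openai Python package (v1 client interface) might look like this; it assumes your key is stored in keys/key.txt as described above:
from openai import OpenAI

# Read the key from the text file created above (assumed location)
with open('keys/key.txt') as f:
    api_key = f.read().strip()

client = OpenAI(api_key=api_key)
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=50,
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)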
Eleven Labs (for ElevenLabs Text-to-Speech):
Log in to your ElevenLabs account: https://elevenlabs.io/
Click your profile icon in the top-right corner.
Click the eye icon next to the "API Key" field.
Copy your API key.
Paste the copied key into a text file named "elevenlabs_key.txt" and place it in your root SightLab folder.
Anthropic API:
Go to the Anthropic login page (Anthropic Console) and either sign up or log in.
Fill out the sign-up form with your email address and other required information. You may need to provide details about your intended use case for the API.
After submitting the form, you should receive a confirmation email. Follow the instructions in the email to verify your account.
Once your account is verified, log in to the Anthropic website using your credentials.
Navigate to the API section of your account dashboard
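Similarly, to verify an Anthropic key outside of SightLab, a minimal sketch with the anthropic Python package (assuming the key is stored in keys/anthropic_key.txt) could look like:
import anthropic

# Read the key from the text file created above (assumed location)
with open('keys/anthropic_key.txt') as f:
    api_key = f.read().strip()

client = anthropic.Anthropic(api_key=api_key)
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=50,
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(message.content[0].text)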
Gemini and Gemini Ultra
In the Package Manager command line (cmd), use: install -q -U google-generativeai
More instructions on using Gemini to come soon
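Until those instructions are added, a minimal standalone sketch with the google-generativeai package (not yet wired into SightLab; it assumes your key is in keys/gemini_key.txt and that the model name shown is available to your account) might look like:
import google.generativeai as genai

# Read the key from the text file created above (assumed location)
with open('keys/gemini_key.txt') as f:
    api_key = f.read().strip()

genai.configure(api_key=api_key)
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model name
response = model.generate_content("Say hello in one short sentence.")
print(response.text)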
Avatar Configuration Options
TALK_ANIMATION: The animation index for the avatar's talking animation.
IDLE_ANIMATION: The animation index for the avatar's idle or default pose.
AVATAR_POSITION: A list defining the avatar's starting position in the virtual environment (format: [x, y, z]).
AVATAR_EULER: A list specifying the avatar's initial rotation in Euler angles (format: [yaw, pitch, roll]).
NECK_BONE, HEAD_BONE, SPINE_BONE: String names of the bones used for the follow viewpoint (can find these by opening the avatar model in Inspector)
TURN_NECK: Boolean flag indicating whether the neck needs to be turned initially.
NECK_TWIST_VALUES: List of values defining the neck's twisting motion (format: [yaw, pitch, roll]).
USE_MOUTH_MORPH: Boolean flag to activate or deactivate mouth morphing animations during speech (not needed for Rocketbox avatars)
MOUTH_OPEN_ID: The ID number of the morph target for opening the mouth (find in Inspector).
MOUTH_OPEN_AMOUNT: The amount by which the mouth opens, typically a value between 0 and 1.
BLINKING: Boolean flag to enable or disable blinking animations.
BLINK_ID: The ID number of the morph target for blinking.
DISABLE_LIGHTING_AVATAR: Boolean flag to disable the avatar lighting if the environment lighting makes the avatar look blown out.
ATTACH_FACE_LIGHT: Boolean flag to attach a light source to the avatar's face.
FACE_LIGHT_BONE: The name of the bone to which the face light is attached if ATTACH_FACE_LIGHT is true.
MORPH_DURATION_ADJUSTMENT: Adjust this if the mouth movement continues for too long.
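For reference, a set of avatar options for a hypothetical avatar might look like the sketch below. The animation indices, bone names, and morph IDs here are avatar-specific assumptions; look up the real values by opening your avatar model in Inspector.
TALK_ANIMATION = 2                 # animation index for talking (illustrative)
IDLE_ANIMATION = 1                 # animation index for the idle pose (illustrative)
AVATAR_POSITION = [0, 0, 1.5]      # [x, y, z] starting position
AVATAR_EULER = [180, 0, 0]         # [yaw, pitch, roll], facing the user
NECK_BONE = 'Bip01 Neck'           # bone names depend on the avatar (check in Inspector)
HEAD_BONE = 'Bip01 Head'
SPINE_BONE = 'Bip01 Spine2'
TURN_NECK = False
USE_MOUTH_MORPH = True             # not needed for Rocketbox avatars
MOUTH_OPEN_ID = 0                  # morph target ID (check in Inspector)
MOUTH_OPEN_AMOUNT = 0.5
BLINKING = True
BLINK_ID = 1
DISABLE_LIGHTING_AVATAR = False
ATTACH_FACE_LIGHT = False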
Additional Information:
For prompts, add quotation marks around a GPT prompt and use "I am..." phrasing to configure the agent. For Anthropic, quotes are not needed and you can use "You are..." phrasing.
For ElevenLabs, refer to the ElevenLabs Python documentation for more details: https://github.com/elevenlabs.
You can connect "Assistants" through the openai API, but not custom GPTs.
Issues and Troubleshooting
There may be an error if your microphone is set to your VR headset but the sound output device is set to something other than the headset.
You may see an error if you are using the free version of ElevenLabs and exceed the 10,000-character limit (paid accounts get larger quotas).
ffplay error with ElevenLabs: you may need to install ffmpeg and add it to the Vizard environment path: https://www.gyan.dev/ffmpeg/builds/
mpv player error with ElevenLabs: you may need to install mpv and add it to the Vizard environment path: https://mpv.io/installation/
Tips
To give your agent an understanding of the environment it is in, take a screenshot (you can use the '/' key in SightLab; the image is saved to the 'recordings' folder), upload it to ChatGPT online, and ask it to generate a description. You can then include this description in your text prompt.
To change which button is held for speaking, modify your vizconnect file (in sightlab_utils/vizconnect_configs; see "settings.py" for which vizconnect files go with which hardware choice). Double-click to open it and go to Advanced - Events; from there, either modify the Mappings for the existing "triggerDown" and "triggerUp" events, or create new ones if there are none. See here for more information.
There is also a version of this available that runs as an education-based tool, where you can select objects in a scene and get information and labels about each item (such as paintings in an art gallery). See this page for that version.
Planned Updates:
Integration of Gemini Ultra
Loading of multiple LLMs in a single scene with multiple agents
Image recognition and visual processing
Documentation improvements on adding your own avatar
Direct interactions with the scene (i.e., the ability for the agent to pick up objects, trigger events, etc.)
Code optimizations and streamlining