Building a Voice-Driven Portfolio with ElevenLabs Real-Time API
This guide walks through creating a real-time, voice-driven conversational experience for your portfolio using ElevenLabs' Real-Time Conversational WebSocket API.
How to Create It
Build a simple frontend using React or Next.js with a clean, minimal UI. Add a microphone button that activates getUserMedia and captures live audio. Set up a backend (Node.js or any server) to store your ElevenLabs API key and create a secure session for the client. Then connect the frontend to the ElevenLabs Real-Time Conversational WebSocket API, which handles speech-to-speech interaction.
Frontend Setup
Use navigator.mediaDevices.getUserMedia({ audio: true }) to capture audio.

Backend Setup
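A minimal backend sketch: a handler that exchanges your server-side API key for a short-lived signed WebSocket URL the browser can connect to. The get-signed-url path, the agent_id parameter, and the response shape are assumptions based on ElevenLabs' Conversational AI docs; verify them against the current API reference before relying on this.

```javascript
// Hypothetical /session handler for Node.js 18+ (global fetch available).
// The raw API key stays on the server; the browser only ever sees the
// short-lived signed URL returned here.
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY || "";
const AGENT_ID = process.env.ELEVENLABS_AGENT_ID || "";

// Build the upstream URL for requesting a signed WebSocket URL
// (assumed endpoint path; check the current ElevenLabs docs).
function signedUrlEndpoint(agentId) {
  return (
    "https://api.elevenlabs.io/v1/convai/conversation/get-signed-url" +
    "?agent_id=" + encodeURIComponent(agentId)
  );
}

// Forward the signed URL to the client without exposing the API key.
async function handleSession(res) {
  const upstream = await fetch(signedUrlEndpoint(AGENT_ID), {
    headers: { "xi-api-key": ELEVENLABS_API_KEY },
  });
  const body = await upstream.text(); // expected shape: { "signed_url": "wss://..." }
  res.writeHead(upstream.status, { "Content-Type": "application/json" });
  res.end(body);
}

// Mount it with the standard http module, e.g.:
// require("http").createServer((req, res) =>
//   req.url === "/session" ? handleSession(res) : res.writeHead(404).end()
// ).listen(3001);
```

Alternatively, the backend can proxy the WebSocket itself; the signed-URL approach is simpler but means the browser talks to ElevenLabs directly once the session starts.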
How to Integrate It
The frontend continuously captures microphone audio, converts it to 16-bit PCM, and sends it through a WebSocket. The backend provides a temporary token or proxies the WebSocket so the API key never touches the browser. ElevenLabs receives the user audio, understands it, generates a response, and streams back real-time TTS audio. The frontend immediately plays these audio chunks as they arrive.
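The capture-and-convert step can be sketched as a pure helper. The Web Audio API hands you Float32 samples in the range [-1, 1] (from an AudioWorklet or ScriptProcessor callback); converting them to 16-bit PCM is just clamping and scaling. The surrounding send logic is shown as comments because the exact message shape the conversational endpoint expects is an assumption here.

```javascript
// Convert a Float32 Web Audio buffer (samples in [-1, 1]) to 16-bit PCM.
function floatTo16BitPCM(float32) {
  const pcm = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range samples can't overflow Int16.
    const s = Math.max(-1, Math.min(1, float32[i]));
    // Negative range is one step wider (-32768) than positive (32767).
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// In the audio callback, something like (shape is an assumption,
// check the conversational WebSocket docs):
//   const pcm = floatTo16BitPCM(inputBuffer);
//   socket.send(pcm.buffer); // or wrap as base64 in a JSON message
```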
Key Integration Points
How It Works
The system functions as a live, two-way audio stream. When the user speaks, raw audio is streamed directly to the ElevenLabs conversational endpoint. ElevenLabs performs live speech recognition, processes the query, and instantly streams back natural voice output. Each audio chunk is played on the frontend immediately, creating a near-instant response (usually under 1 second). The entire portfolio becomes a conversational, voice-driven experience powered by ElevenLabs' real-time API.
Technical Flow
Benefits
Technologies Used
Enjoyed this piece of writing?
You should definitely subscribe to my Substack to get notified about more posts like this.