ChatGPT Speaks Back: OpenAI Launches Advanced Voice Mode
Published On: September 25, 2024
OpenAI has officially launched its highly anticipated Advanced Voice Mode for ChatGPT, bringing more natural and fluid audio chats to its premium users. This feature, which has been in the works for several months, is now rolling out to users subscribed to the Plus, Team, or Enterprise plans. The upgrade promises a more engaging and seamless interaction with ChatGPT, marking a significant step forward in conversational AI.
The new voice feature allows users to engage in real-time voice conversations with ChatGPT. One key advantage of Advanced Voice Mode is its ability to handle interruptions—users can stop ChatGPT mid-sentence, and the model will adapt accordingly. The rollout also introduces nine different voices for users to choose from, and users can further customize their audio experience through the app’s settings.
Advanced Voice Mode makes full use of OpenAI’s multimodal capabilities powered by GPT-4o, enabling more natural back-and-forth communication. This innovation sets it apart from competitors like Google’s Gemini Live, which relies on text-to-speech (TTS) and speech-to-text (STT) engines for interaction. Unlike Gemini Live, ChatGPT’s voice feature supports direct audio input and output, making the interactions more fluid and conversational.
While the voice feature was initially announced in May, its release was delayed due to safety concerns and a high-profile controversy surrounding the use of a voice strikingly similar to that of actress Scarlett Johansson. Following legal challenges, OpenAI removed the voice, dubbed "Sky," from its product lineup and worked to refine the system. Roughly four months later, the company is confident in the polished version now being rolled out globally, though it remains unavailable in certain regions, including EU countries and the UK.
Getting started with Advanced Voice Mode
For users eager to try the new feature, the process is straightforward. Subscribers to the Plus plan, which costs $20 per month, as well as Team and Enterprise users, will receive a notification in the ChatGPT app once the feature is enabled. From there, activating voice mode involves opening the app, creating a new chat, and selecting the voice option next to the text input field. With a tap of the microphone icon, users can start speaking, and ChatGPT will respond quickly, even adjusting to user preferences like speed or accent.
In addition to more conversational exchanges, OpenAI has improved accent recognition for foreign languages and increased the overall speed of responses, making the feature ideal for tasks like language learning, bedtime stories, or job interview preparation. However, users should note that access is not unlimited. Session time limits are in place—typically around 30 minutes per session, with prompts reminding users of the remaining time.
How you can benefit from voice mode
Advanced Voice Mode opens up new possibilities for how you use ChatGPT. Whether you want help practicing a new language, preparing for a big presentation, or simply enjoying an interactive story, the voice feature offers more dynamic and immersive interactions. You can even adjust the voice to suit your mood—whether it’s fast, slow, or accented—and interrupt to ask follow-up questions without disrupting the flow.