OpenAI teased voice and image features for ChatGPT that allow users to speak to the chatbot, get voice responses, upload images, and receive images in responses.
Now, ChatGPT can “see, hear, and speak” with the latest update. The new voice and image capabilities let the AI chatbot hold voice conversations with the user, and let users show it what they are talking about.
Going multimodal had been the obvious next step for ChatGPT for a while. Tap-to-speak is the next leap for today’s generative AI chatbots. In an example, OpenAI tweeted that you can now “Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.”
This puts the power of OpenAI’s GPT-3.5 (free) and GPT-4 (Plus) models just a voice command away, positioning ChatGPT as a genuine home assistant in the vein of Alexa or Google Home.
On the phone, it could mimic the experience of Google Assistant or Apple’s Siri. In fact, if you wanted to, you could assign ChatGPT to a Shortcut on the new iPhones with the Action button and open the chatbot with a single press.
You can choose from five voices—Juniper, Sky, Cove, Ember, and Breeze. In the official blog announcement, OpenAI included a recording you can listen to in all five styles.
With this update, ChatGPT can also respond with images and accept images as input—say, for better troubleshooting—and it even lets you focus its attention on a specific area of your picture.
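For developers who eventually get API access, an image-plus-text request might look roughly like the sketch below. This only builds the JSON payload in the style of OpenAI's documented multimodal message format; the model name, the example image URL, and the exact field layout are assumptions for illustration, not details confirmed by this announcement.

```python
import json

def build_vision_payload(question: str, image_url: str) -> dict:
    """Build a chat-completion request body pairing a text question
    with an image. The model id below is an assumed placeholder."""
    return {
        "model": "gpt-4-vision-preview",  # assumed model id, not confirmed
        "messages": [
            {
                "role": "user",
                "content": [
                    # The text part of the user's turn
                    {"type": "text", "text": question},
                    # The image the model should look at
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_payload(
    "Why won't my bike seat lower?",
    "https://example.com/bike-seat.jpg",  # hypothetical image URL
)
print(json.dumps(payload, indent=2))
```

In practice this dictionary would be sent as the POST body to the chat completions endpoint with an API key; the sketch stops short of the network call so it runs without credentials.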
Plus and Enterprise users will get the update over the next two weeks. Other groups, including developers, will have to wait a while longer.