What’s Inside
How investors view voice AI, from those betting on rapid adoption to those urging a longer view
How startups like Maven AGI, Whippy and Aircall are redefining customer conversations
What the coming “Voice Turing Test” could mean for business and society
On the Fourth of July, fireworks companies faced a predictable problem: phones ringing nonstop with customers asking where to buy fireworks, how to use the products safely and whether certain items were in stock.
For most of the year, these businesses don’t need 24/7 phone support. But on this one weekend, call volume spikes so high that no amount of temporary staffing or outsourced answering services can keep pace. And once the holiday ends, the phones go silent again.
Instead of scrambling for short-term call centers, many of these companies turned to Aircall this past July. Within minutes, they deployed AI voice agents that routed calls, provided updates and reassured customers trying to find locations or check on orders.
For Tom Chen, Aircall’s chief product officer, it was proof that real-time voice is making a comeback.
“Historically, voice was the most expensive channel to support. With AI, that changes,” Chen told me. “Companies that never even offered phone support are carving out efficiencies to bring voice back, and customers love it.”
Across industries, the shift is accelerating. Businesses are under economic pressure to adopt new efficiencies. And legacy phone systems, long blamed for robotic and frustrating interactions, are being replaced by conversational AI.
Also, customers no longer tolerate rigid scripts or hold music. They expect humanlike conversations that solve problems immediately. And in such sectors as healthcare, retail and food service, where speed and personalization matter most, voice-driven systems are becoming the preferred interface.
Investors are pouring record amounts of capital into voice AI. Total equity funding for voice AI startups reached $2.1 billion in 2024, nearly seven times the $315 million raised in 2022, according to CB Insights. That pace has likely kept up through mid-2025, as multiple sources report funding in AI voice tech has already outpaced 2024’s record.
Notable rounds like ElevenLabs’ $180 million Series C in January and, more recently, Assort Health’s $76 million Series B in late September illustrate how quickly the sector is maturing.
Also, in an indicator of early-stage founder activity, 22 percent of the startups in a recent Y Combinator class were building with voice, according to Cartesia.
The surge explains why Astasia Myers of Felicis Ventures is so emphatic about voice AI. She blogged about Felicis’ investment in Assort Health, a healthcare startup using agentic AI to transform patient communication. The company reports it has cut call wait times by 89 percent and reached 98 percent resolution rates across millions of patient interactions.
For Myers, this isn’t just about one company. It’s a signal that customers are demanding AI voice today. As she told the Startup Grind AI Summit:
“With the immeasurable improvements in AI voice models, you can not only automate the task but often the NPS of the experience is better.”
Tony Wang of 500 Global echoed the excitement at the same Startup Grind event. But he urged a broader lens. He said the winners in text and image models are largely set, while voice remains an open frontier: early, imperfect and full of room for founders to explore.
“The experience is still relatively early and still relatively broken,” Wang said. “But the surface area is huge. Voice agents are a really good interim approach, and as a founder, you want to start thinking about how to go from that to something durable.”
The two investors agree on the trajectory: the opportunity is real, but maturity will take time. The difference is tempo. Myers is betting that the breakout moment is now, while Wang suggests that the payoff will come as the technology evolves.
Maven AGI co-founder and CEO Jonathan Corbin argues the payoff is already here. He said his company’s AI agents have already saved customers more than $200 million annually, cut support costs by 50 percent, and achieved satisfaction scores above 90 percent.
“What we’re really building is one brain that powers the entire customer journey,” Corbin said.
For clients like TripAdvisor, that means resolving 93 percent of customer inquiries through AI voice, a scale that he noted few human call centers could ever achieve. Corbin framed it less as replacing agents and more as amplifying them. “When you give someone the right context, they can do the work of five people. The AI makes that possible.”
Despite high expectations, Corbin knows adoption isn’t automatic. Most vendors are still early in deploying commercially viable products, and scaling enterprise-grade performance remains challenging. His view is that those who combine data context, voice nuance and automation speed will separate themselves from the noise.
While Maven tackles enterprise CX, Whippy focuses on small and mid-sized businesses. Co-founder and CTO Jack Kennedy calls the company’s product “UI-less software,” because once installed, it automates phone-based tasks that companies struggle to staff.
“Whippy just operates in the background almost as a real employee,” Kennedy said.
For pharmacies and staffing agencies, that means screening calls, scheduling and handling repetitive customer queries. In industries like recruiting — where phone-based roles can have 125 percent annual turnover — Kennedy argues AI is doing the work people don’t want to do.
Still, he doesn’t oversell. “AI is very good at some things, very bad at others,” he said. Whippy can handle high-volume, low-stakes conversations, but nuanced customer issues still require humans. It’s a reminder that while adoption is growing, the technology’s limitations remain part of the story.
Back at Aircall, Chen sees the challenges and the long-term potential of voice AI.
Voice agents are harder to deploy than text-based AI because every step, from speech-to-text to generating answers to converting them back into natural-sounding voice, introduces friction and delay. Even a fraction of a second of extra latency can destroy the illusion of a natural conversation. Add in poor telecom connections or noisy customer environments, and the reliability gap widens.
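To see why the delay compounds, consider a rough sketch in Python. The stage names follow the pipeline described above, but every number here is a made-up placeholder rather than a measurement from Aircall or any other vendor, and the response budget is likewise an assumption chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # hypothetical placeholder, not a measured value


def total_latency_ms(stages):
    """Stages in a cascaded pipeline run back to back, so their delays add up."""
    return sum(stage.latency_ms for stage in stages)


def slowest_stage(stages):
    """Return the stage contributing the most delay."""
    return max(stages, key=lambda stage: stage.latency_ms)


# Illustrative numbers only; real figures vary by vendor, model and network.
CASCADED_PIPELINE = [
    Stage("speech-to-text", 300.0),
    Stage("answer generation", 500.0),
    Stage("text-to-speech", 200.0),
]

# Assumed target for how long a caller waits before the pause feels unnatural.
RESPONSE_BUDGET_MS = 800.0

if __name__ == "__main__":
    total = total_latency_ms(CASCADED_PIPELINE)
    worst = slowest_stage(CASCADED_PIPELINE)
    print(f"Total delay: {total:.0f} ms (budget {RESPONSE_BUDGET_MS:.0f} ms)")
    print(f"Largest contributor: {worst.name}")
    if total > RESPONSE_BUDGET_MS:
        print("Over budget: the caller would likely notice the pause.")
```

With these placeholder values, no single stage exceeds the budget on its own, yet the chain as a whole does. That is the basic argument for trimming or merging steps.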
Still, Chen believes voice AI is nearing a turning point.
He describes an emerging shift toward voice-to-voice models that interpret tone and emotion directly rather than transcribing and re-synthesizing. “You can feel when someone’s frustrated or rushed,” he told me during a recent podcast recording. “Once AI can sense that in real time, we’ll have something closer to real conversation.”
That evolution, he added, will make AI voice more than a customer-service tool. It could become a universal interface for how people interact with software, letting anyone “talk” to systems as naturally as they do with each other.
“If you don’t have always-on customer communication in the next five years,” Chen said, “you’ll be at a disadvantage.”
But voice AI isn’t just an efficiency play. It’s reshaping jobs and raising new questions about its impact on society.
Corbin at Maven AGI describes AI as a force multiplier, giving agents the context to “do the work of five people.” Kennedy at Whippy sees it filling high-turnover roles that humans don’t want. Aircall’s Chen focuses on transparency, warning that customers may not always realize they’re speaking to AI.
Going forward, companies may need new safeguards to keep customer trust intact.
That tension between speed and sincerity sets up the next AI inflection point. Industry researchers now talk openly about a coming “Voice Turing Test,” the moment when AI speech becomes indistinguishable from human speech. Some analysts expect parity within two years, driven by rapid advances in prosody, breathing simulation and emotional tone modeling.
When that happens, the conversation won’t be about whether customers notice, but whether they should be told.
Companies in healthcare, finance and education are already testing new disclosure language, such as: “This call may be monitored or conducted by AI.”
For others, especially in entertainment and marketing, human-level voice synthesis may become a selling point rather than a concern.
The line between empathy and mimicry is narrowing. The same technology that makes support more personal could also blur the boundary between genuine emotion and algorithmic tone.
It’s a debate that few in tech want to lead, but it’s coming, and fast.
The fireworks story shows voice AI already delivering practical value. Investors like Myers and Wang see the same momentum; they differ only on when the technology truly hits its stride.
The operators — from Corbin’s enterprise-scale “brain” to Kennedy’s SMB automation to Chen’s pursuit of emotional voice models — show a technology that is promising and imperfect.
Whether voice AI becomes the command layer for business interactions or remains an interim bridge to something else, one thing is certain: after years of chatbots, voice is back in the conversation.
Editorial image generated using AI assistance. Image concept by the author.