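- Google has launched Gemini 3.1 Flash Live, which it describes as its highest-quality audio and voice model to date, built to improve real-time conversation. The model is optimized for speed and for the natural rhythm of dialogue, and it targets the next generation of voice-first AI applications. It is available in preview to developers through the Gemini Live API in Google AI Studio, integrated into Gemini Enterprise for Customer Experience for businesses, and used in Search Live and Gemini Live for consumers.
On the ComplexFuncBench Audio benchmark, the model scores 90.8%, ahead of its predecessor, and performs consistently on multi-step function-calling tasks. On Scale AI's Audio MultiChallenge, it scores 36.1% with "thinking" enabled. It handles complex instruction following and long-horizon reasoning well and copes with the interruptions and hesitations found in real-world audio. The model also has improved tonal understanding: it perceives acoustic details such as pitch and pace more accurately and adapts its responses when users sound frustrated or confused.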
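- Gemini 3.1 Flash Live is already deployed across several Google products, reaching developers, enterprises, and consumers. Developers can try it in preview through the Gemini Live API in Google AI Studio, enterprise customers can integrate it through Gemini Enterprise for Customer Experience, and the general public can use its real-time voice features in Search Live and Gemini Live.
The model is strong at complex task execution and outperforms its predecessor on multi-step function calling and long-horizon reasoning in particular. Its improved tonal understanding lets it recognize emotional cues such as confusion or frustration more accurately and adjust its response strategy accordingly. It also maintains a relatively high task-completion rate in noisy environments, which suits it to demanding scenarios such as customer service and voice assistants.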
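- Gemini 3.1 Flash Live lets developers build voice-first agents that can handle complex tasks. Its reasoning and task execution are stronger: it scores 90.8% on ComplexFuncBench Audio, well ahead of its predecessor. On Scale AI's Audio MultiChallenge, it scores 36.1% with thinking enabled and still carries out complex instructions despite real-world audio interference.
The model also makes conversations more natural. It recognizes acoustic features such as tone and speaking rate and adjusts how it responds, for example by giving clearer explanations when a user sounds confused. This capability is already used in Gemini Enterprise to improve interaction quality in customer-experience scenarios. The model also supports newer use cases such as voice coding, where users iterate on code quickly by speaking.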
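- Companies including Verizon and LiveKit have begun adopting Gemini 3.1 Flash Live. By improving audio understanding and response reliability, the model helps businesses build more efficient voice interaction systems. Its robustness in real-world conditions suits it to high-concurrency scenarios such as customer service and remote collaboration.
Specific partnership details have not been fully disclosed, but early cases suggest the model can improve user satisfaction and task-completion efficiency. Its open API access also lowers the barrier to entry for developers and supports faster iteration and deployment of voice AI applications.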
- Google has introduced Gemini 3.1 Flash Live, its most advanced audio and voice AI model to date, designed to make real-time dialogue more natural, reliable, and responsive. The model is optimized for voice-first applications and supports complex reasoning and task execution in dynamic audio environments. It is available across multiple Google platforms: in preview for developers via the Gemini Live API in Google AI Studio, in Gemini Enterprise for Customer Experience for business use, and in Search Live and Gemini Live for general users.
Benchmark results show significant improvements. Gemini 3.1 Flash Live scores 90.8% on ComplexFuncBench Audio, outperforming its predecessor in multi-step function calling under constraints. On Scale AI's Audio MultiChallenge, which evaluates complex instruction following and long-horizon reasoning amid real-world interruptions, it scores 36.1% with reasoning enabled. The model also shows stronger tonal understanding: it better detects acoustic cues such as pitch and pace and adapts its responses to user emotional states such as frustration or confusion. These capabilities support more intuitive voice interactions in noisy or challenging environments. Early adopters include Verizon and LiveKit, signaling enterprise interest in deploying advanced voice agents. Detailed information on specific use cases and technical architecture is limited.
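ComplexFuncBench Audio measures multi-step function calling under constraints: the agent calls one tool, reads the result, and uses it to decide the next call. As a rough illustration of that pattern only, here is a minimal Python sketch. It does not call the Gemini Live API, and everything in it is hypothetical: the `find_flights` and `book_flight` tools, the 350 budget constraint, and `scripted_model`, which stands in for the model's decisions.

```python
"""Toy multi-step function-calling loop (hypothetical tools, no real API)."""
from typing import Any, Callable, Optional


# Hypothetical tools a voice agent might expose; they return canned data.
def find_flights(origin: str, dest: str) -> list[dict[str, Any]]:
    return [{"id": "F1", "price": 420}, {"id": "F2", "price": 310}]


def book_flight(flight_id: str) -> dict[str, str]:
    return {"status": "confirmed", "flight_id": flight_id}


TOOLS: dict[str, Callable[..., Any]] = {
    "find_flights": find_flights,
    "book_flight": book_flight,
}

BUDGET = 350  # hypothetical user constraint


def scripted_model(history: list[tuple[str, Any]]) -> Optional[dict[str, Any]]:
    """Stand-in for the model: picks the next call from earlier tool results."""
    if not history:
        return {"name": "find_flights", "args": {"origin": "SFO", "dest": "JFK"}}
    last_name, last_result = history[-1]
    if last_name == "find_flights":
        affordable = [f for f in last_result if f["price"] <= BUDGET]
        if not affordable:
            return None  # constraint cannot be met; stop without booking
        cheapest = min(affordable, key=lambda f: f["price"])
        return {"name": "book_flight", "args": {"flight_id": cheapest["id"]}}
    return None  # booking done; no further calls


def run_agent(max_steps: int = 5) -> list[tuple[str, Any]]:
    """Execute calls until the model stops or max_steps is reached."""
    history: list[tuple[str, Any]] = []
    for _ in range(max_steps):
        call = scripted_model(history)
        if call is None:
            break
        # Dispatch the requested tool and record its result for the next step.
        result = TOOLS[call["name"]](**call["args"])
        history.append((call["name"], result))
    return history


if __name__ == "__main__":
    for name, result in run_agent():
        print(name, result)
```

Running it prints the search result followed by a confirmed booking for `F2`, the only flight within budget. In a real voice agent, the model rather than a hard-coded function would choose each call after receiving the previous tool result.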
Key Takeaways:
- Gemini 3.1 Flash Live improves audio AI with more natural dialogue and more reliable task execution
- Available across developer, enterprise, and consumer Google platforms for voice-first applications
- Outperforms prior models on complex-reasoning and real-world audio benchmarks
- Enhanced tonal understanding enables adaptive responses to user emotions
Source: Original Article