Jun 16 2025
Artificial Intelligence

The Rise of Multimodal Interfaces in the Workplace

Combining voice, visuals and text into an AI assistant helps small businesses upskill, boost productivity, save money and improve accessibility.

Artificial intelligence voice assistants are giving way to multimodal interfaces that offer small businesses the ability to streamline even more mundane tasks, so their employees can focus on more complex work.

While AI voice assistants can comprehend and respond to verbal commands and queries, their lack of visual or tactile modes is inherently limiting.

Of the 400 business decision-makers surveyed by AI voice platform provider Deepgram in 2023, 82% used some form of voice technology for a variety of reasons, including increasing productivity, revenues and operational efficiencies. The market for AI voice assistants is only expected to grow, and that growth will happen alongside continued advances in natural language processing, chips, graphics processing units, cloud computing and displays.

“Over the next few years, we are going to see a boom of AI assistants or AI agents — every person and organization will likely have one or multiple of these advanced AI companions,” writes Alex Velinov, CTO of Tag Digital, in his think piece, In AI We Trust.

“They will have distinct, digital human-like personas that users will interact with through multimedia interfaces that combine voice, visuals and text for a seamless conversational experience akin to communicating with another person.”


Four Benefits Multimodal Interfaces Offer Small Businesses

Velinov was responding to two AI developments: the release of OpenAI’s GPT-4o, by his estimation the first truly multimodal model, and the announcement of Google’s Project Astra, a universal, Gemini-powered assistant prototype that supports multimodal inputs.

“Today’s AI models are evolving toward more advanced and diverse data processing capabilities across text, audio and video. Over the last two years, we have seen improvements in the quality of each modality,” notes a recent McKinsey report. “For example, Google’s Gemini Live has improved audio quality and latency and can now deliver a human-like conversation with emotional nuance and expressiveness. Also, demonstrations of Sora by OpenAI show its ability to translate text to video.”

How will these advancements in multimodal interfaces impact small businesses?

  • Easy Upskilling: Multimodal interfaces that can carry on a natural dialogue and even offer visual feedback are easier for employees to navigate and learn from. With these interfaces becoming the future of the workplace, early adopters will reap dividends as the technology evolves and scales.
  • One-Stop Shops: Employees are used to interacting with a variety of applications to complete their work, but multimodal interfaces eliminate the need to transition between them. For example, the AI can understand a voice command initiating a video call and identify hand gestures used to advance a slide deck.
  • Lower Overhead: The fewer apps and pieces of associated hardware (such as cameras and microphones) that businesses need, the greater their savings.
  • Promoting Diversity: Interfaces that respond to verbal commands and hand gestures are more accessible to employees with diverse needs and thus more inclusive. In general, they allow workers to communicate via their preferred method.

Thousands of Multimodal Models To Choose From

Small businesses often have small IT teams and budgets, and about 60% of the 2,000 small and midsize business leaders interviewed for a 2023 Connected Commerce Council survey said they plan to use AI tools to save time and money.

Early multimodal interface use cases include the medical field, where clinicians are using them to process conversations with patients and analyze medical imaging to identify tumors and other issues. Human resources teams are using the interfaces to handle claims more efficiently.

Businesses have thousands of multimodal models to choose from in the Azure AI Foundry.

“Ultimately, these multimodal AI agents will make us exponentially more efficient and creative, driving human productivity and discovery in unprecedented ways,” Velinov wrote. “Our relationships and interactions with technology will be fundamentally transformed — becoming more personal, intelligent and collaborative than ever before imagined.”

alvarez/Getty Images