Multimodal

What is Multimodal?

Multimodal: Describes models that can process multiple types of content, such as text, images, audio, and video, in a single request.

Multimodal Explained

Multimodal models can understand, and in some cases generate, different content types. GPT-5.2 can analyze images, Gemini 3.0 handles video, and various models support audio. Each modality is priced differently: images are typically priced per image or by resolution, and video is typically priced per second. Multimodal capabilities enable richer applications, but a single request can combine several of these charges, so costs need careful management.
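To make the pricing point concrete, here is a minimal sketch of how the cost of one multimodal request might be estimated by adding up a separate charge per modality. The `Rates` class, the `estimate_request_cost` function, and every price in it are hypothetical placeholders chosen for illustration, not any provider's actual rates. Audio is left out because no pricing unit for it is given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-unit prices in USD (illustrative only)."""
    per_1k_text_tokens: float = 0.002  # assumed text price per 1,000 tokens
    per_image: float = 0.01            # assumed flat price per image
    per_video_second: float = 0.005    # assumed price per second of video


def estimate_request_cost(
    text_tokens: int,
    images: int,
    video_seconds: float,
    rates: Rates = Rates(),
) -> float:
    """Sum the per-modality charges for a single request."""
    text_cost = text_tokens / 1000 * rates.per_1k_text_tokens
    image_cost = images * rates.per_image
    video_cost = video_seconds * rates.per_video_second
    return text_cost + image_cost + video_cost


if __name__ == "__main__":
    # A request with 1,000 text tokens, 2 images, and 10 seconds of video.
    cost = estimate_request_cost(text_tokens=1000, images=2, video_seconds=10)
    print(f"Estimated cost: ${cost:.4f}")
```

With these assumed rates, the ten seconds of video account for most of the total, which shows how a non-text modality can dominate the cost of a request. Real pricing may also depend on resolution or provider-specific tokenization, so any real estimate should use the provider's published rates.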
