GK Question

technology hard fill_blank

The technique that enables LLMs to reason about visual inputs by combining vision and language models is called ________ Learning.

Answer: Multimodal

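Multimodal models process more than one input type (text, images, audio) jointly. Vision-language models such as CLIP and LLaVA combine an image encoder with a text model, enabling visual question answering, image captioning, and cross-modal retrieval.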
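Cross-modal retrieval can be illustrated with a small sketch. It assumes text and images have already been mapped into one shared embedding space, which is the idea behind CLIP-style models. The vectors, file names, and the helper retrieve_image below are hypothetical and hand-written for illustration; a real model would compute the embeddings with trained encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical image embeddings (made up, not produced by any real model).
IMAGE_EMBEDDINGS = {
    "dog_photo.jpg": [0.9, 0.1, 0.0],
    "cat_photo.jpg": [0.1, 0.9, 0.0],
    "car_photo.jpg": [0.0, 0.1, 0.9],
}


def retrieve_image(text_embedding, image_embeddings):
    """Return the name of the image whose embedding is closest to the text."""
    return max(
        image_embeddings,
        key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
    )


# Stand-in embedding for a text query such as "a photo of a dog" (assumed values).
query_embedding = [0.8, 0.2, 0.1]
print(retrieve_image(query_embedding, IMAGE_EMBEDDINGS))
```

Because both modalities share one vector space, the same similarity measure also works in the other direction, ranking candidate captions for a given image.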

Topic Advanced AI/ML
Exam Relevance UPSC, Banking, SSC