SALMONN: Towards Generic Hearing Abilities for Large Language Models
Hearing, which involves the perception and understanding of generic auditory information, is crucial for AI agents in real-world environments. This auditory information encompasses three primary sound types: music, audio events, and speech. Recently, text-based Large Language Model (LLM) frameworks have …