Mobile-Agent: Autonomous Multi-modal Mobile Device Agent with Visual Perception
The advent of Multimodal Large Language Models (MLLMs) has ushered in a new era of mobile device agents that can understand and interact with the world through text, images, and voice. These agents mark a significant advancement over traditional AI, …