Open-AutoGLM: Turning your phone into an AI smart assistant
Open-AutoGLM is a mobile intelligent-assistant framework built on AutoGLM. It gives the AI visual understanding, so it can analyze the content of the phone screen in real time and translate the user's natural-language commands into concrete sequences of automated operations.
Users don't need to operate the phone manually; they only need to give a command such as "search for food on Xiaohongshu" or "find WeChat contacts," and the system will plan the steps and simulate taps, swipes, and text input. For safety, the system triggers a manual confirmation or takeover mechanism whenever a sensitive operation is involved.
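To make the confirmation idea concrete, here is a minimal sketch of a gate that asks a human before a sensitive step runs. It is not taken from the Open-AutoGLM code base: the function name `guarded_execute`, the `SENSITIVE_KEYWORDS` list, and the keyword-matching rule are all assumptions chosen for illustration, and the real project may decide what counts as sensitive in a different way.

```python
from typing import Callable

# Hypothetical keyword list; the real framework may detect sensitive steps differently.
SENSITIVE_KEYWORDS = ("pay", "transfer", "delete", "password")


def guarded_execute(
    description: str,
    execute: Callable[[], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run `execute` unless the step looks sensitive and the user declines it.

    Returns True if the step ran, False if it was blocked.
    """
    lowered = description.lower()
    needs_confirmation = any(word in lowered for word in SENSITIVE_KEYWORDS)
    if needs_confirmation and not confirm(description):
        return False  # the user declined, so the step is skipped
    execute()
    return True
```

In an interactive session, `confirm` could be as simple as a function that prints the step description and reads a yes/no answer from the terminal.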
Core technology implementation
This framework automates the full process through the following components:
- Interface perception: A vision-language model (VLM) parses screen elements in real time.
- Task planning: Complex instructions are broken down into executable steps.
- Device control: Commands are executed via Android Debug Bridge (ADB), with support for remote debugging over WiFi.
- Flexible integration: Developers can integrate it into custom automation scenarios via the API.
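The way these components fit together can be sketched as a simple perceive, plan, act loop. The code below is an illustration under stated assumptions, not the project's actual API: `Action`, `run_task`, the three callbacks, and the `"Finish"` stop signal are hypothetical names introduced here, and the model and the device are passed in as plain functions so the loop can be read without either.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    name: str                       # e.g. "Launch", "Tap", or the hypothetical "Finish"
    args: dict = field(default_factory=dict)


def run_task(
    instruction: str,
    capture_screen: Callable[[], bytes],
    plan_next: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat: look at the screen, ask the planner for one step, carry it out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                          # interface perception
        action = plan_next(instruction, screenshot, history)   # task planning
        if action.name == "Finish":                            # assumed stop signal
            break
        execute(action)                                        # device control
        history.append(action)
    return history
```

Capping the loop with `max_steps` is a design choice made for this sketch: it keeps a confused planner from driving the device indefinitely.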
Model Versions and Resources
The project provides two models optimized for different language environments:
- AutoGLM-Phone-9B: Deeply optimized for Chinese application scenarios.
- AutoGLM-Phone-9B-Multilingual: Supports English and other language environments.
Model download: Hugging Face | ModelScope
Application coverage
Phone Agent is compatible with over 50 mainstream apps, covering the following core areas:
- Social and Informational: WeChat, QQ, Weibo, Zhihu, Xiaohongshu
- E-commerce and Lifestyle: Taobao, JD.com, Pinduoduo, Meituan, Ele.me, Dianping
- Travel and Tools: Didi Chuxing, Ctrip, 12306, Gaode Map
- Audio-Visual Entertainment: Douyin, Bilibili, iQiyi, NetEase Cloud Music
Run `python main.py --list-apps` to view the complete list of supported apps.
Operational Capability List
| Operation | Description |
|---|---|
| Launch | Launch the specified app |
| Tap / Double Tap | Tap or double-tap at the specified coordinates |
| Type | Enter text automatically |
| Swipe | Swipe the screen in any of four directions |
| Back / Home | Return to the previous page / return to the home screen |
| Long Press | Simulate a long press |
| Wait | Wait for the page to load |
| Take_over | Hand over to the user for manual intervention (for example, to solve CAPTCHAs) |
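Several of these operations correspond closely to standard `adb shell input` commands. The sketch below shows one possible mapping for Tap, Long Press, Swipe, Back, and Home. The helper `adb_command` and its argument names are assumptions made for illustration, and Open-AutoGLM may issue its commands differently. Launch and Type are left out because they depend on package names and on how non-ASCII text is entered, which the overview above does not describe.

```python
from typing import List, Optional

KEYCODE_HOME = 3  # Android KeyEvent.KEYCODE_HOME
KEYCODE_BACK = 4  # Android KeyEvent.KEYCODE_BACK


def adb_command(op: str, serial: Optional[str] = None, **args: int) -> List[str]:
    """Build the adb argument list for one operation (hypothetical helper)."""
    # "-s <serial>" selects a device; for a WiFi connection the serial is "host:port".
    base = ["adb"] + (["-s", serial] if serial else [])
    shell_input = base + ["shell", "input"]
    if op == "Tap":
        return shell_input + ["tap", str(args["x"]), str(args["y"])]
    if op == "Long Press":
        # A swipe that starts and ends at the same point acts as a long press.
        x, y = str(args["x"]), str(args["y"])
        duration = str(args.get("duration_ms", 1000))
        return shell_input + ["swipe", x, y, x, y, duration]
    if op == "Swipe":
        coords = [str(args[key]) for key in ("x1", "y1", "x2", "y2")]
        return shell_input + ["swipe", *coords]
    if op == "Back":
        return shell_input + ["keyevent", str(KEYCODE_BACK)]
    if op == "Home":
        return shell_input + ["keyevent", str(KEYCODE_HOME)]
    raise ValueError(f"unsupported operation: {op}")
```

With `adb` installed and a device attached, a command built this way could be run with `subprocess.run(adb_command("Tap", x=540, y=1200), check=True)`. For a device reached over WiFi, first connect with `adb connect <host>:<port>`, then pass the same `host:port` string as `serial`.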
Quick Start
Project Repository: GitHub – Open-AutoGLM
Whether you're a developer looking to build automation solutions or an AI enthusiast, Open-AutoGLM can provide you with a controllable and efficient prototype of a mobile automation assistant.