Open-AutoGLM Goes Open Source: Automated Phone Control Across 50+ Mainstream Apps


Open-AutoGLM is a mobile intelligent-assistant framework built on AutoGLM. It gives the AI visual understanding, so it can analyze the phone screen in real time and translate a user's natural-language commands into concrete sequences of automated operations.

Users do not need to operate the phone manually. They give a command such as "search for food on Xiaohongshu" or "find WeChat contacts," and the system plans the steps and simulates taps, swipes, and text input. For safety, the system triggers manual confirmation or a human takeover when a sensitive operation is involved.

The framework automates the full process through the following components:

Interface awareness: A vision-language model (VLM) parses on-screen elements in real time.
Task planning: Complex instructions are broken down into executable steps.
Device control: Commands are executed through Android Debug Bridge (ADB), including remote debugging over WiFi.
Flexible access: Developers can integrate the framework into their own automation scenarios through the API.
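The way these components fit together can be sketched as a perceive, plan, act loop. The code below is a minimal illustration and not the project's actual implementation: the names Action, run_task, capture_screen, plan_next, execute, and take_over are hypothetical, and the "Finish" stop signal is an assumption. In a real setup, capture_screen might wrap a screenshot call over ADB and plan_next might call the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One step chosen by the planner, e.g. Action("Tap", {"x": 540, "y": 1200})."""
    name: str
    args: Dict[str, object] = field(default_factory=dict)


def run_task(
    task: str,
    capture_screen: Callable[[], bytes],
    plan_next: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    take_over: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat perceive -> plan -> act until the planner says it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                  # interface awareness
        action = plan_next(task, screenshot, history)  # task planning
        if action.name == "Finish":                    # hypothetical stop signal
            break
        if action.name == "Take_over":
            take_over(action)                          # hand control to the user
        else:
            execute(action)                            # device control
        history.append(action)
    return history


if __name__ == "__main__":
    # Stubs stand in for the screenshot, the model, and the device.
    script = iter([Action("Launch", {"app": "WeChat"}),
                   Action("Take_over"),
                   Action("Finish")])
    steps = run_task("find WeChat contacts",
                     capture_screen=lambda: b"",
                     plan_next=lambda task, shot, hist: next(script),
                     execute=lambda a: print("execute:", a.name),
                     take_over=lambda a: print("waiting for user:", a.name))
    print([a.name for a in steps])
```

Passing the four stages in as callables keeps the loop testable without a phone attached, and the max_steps cap stops a planner that never reports completion.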


The project provides two models optimized for different language environments:

AutoGLM-Phone-9B: Deeply optimized for Chinese-language app scenarios.
AutoGLM-Phone-9B-Multilingual: Supports English and other languages.

Model download: Hugging Face | ModelScope

Phone Agent is compatible with over 50 mainstream apps, covering the following core areas:

Social and information: WeChat, QQ, Weibo, Zhihu, Xiaohongshu
E-commerce and lifestyle: Taobao, JD.com, Pinduoduo, Meituan, Ele.me, Dianping
Travel and tools: Didi Chuxing, Ctrip, 12306, Amap (Gaode Maps)
Audio and video entertainment: Douyin, Bilibili, iQiyi, NetEase Cloud Music

Run python main.py --list-apps to view the complete list of supported apps.

Action	Description
Launch	Launch the specified app
Tap / Double Tap	Tap or double-tap at the specified coordinates
Type	Enter text automatically
Swipe	Swipe the screen in any of four directions
Back / Home	Return to the previous page / return to the home screen
Long Press	Simulate a long press
Wait	Wait for the page to load
Take_over	Hand control to the user (for CAPTCHAs and similar cases)
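As a rough illustration of how the actions in this table could map onto ADB, the sketch below translates each action into standard adb shell input commands. It is an assumption about one possible mapping and not the project's own code: the function name adb_commands, its parameter names, the default durations, and the package com.example.app are all hypothetical.

```python
from typing import List, Optional


def adb_commands(action: str, serial: Optional[str] = None, **p) -> List[List[str]]:
    """Translate one action from the table into zero or more adb argument lists."""
    # "-s <serial>" selects a device; over WiFi the serial is "host:port".
    base = ["adb"] + (["-s", serial] if serial else [])
    shell_input = base + ["shell", "input"]

    if action == "Launch":
        # p["package"] is an Android package name (hypothetical: com.example.app).
        return [base + ["shell", "monkey", "-p", p["package"],
                        "-c", "android.intent.category.LAUNCHER", "1"]]
    if action == "Tap":
        return [shell_input + ["tap", str(p["x"]), str(p["y"])]]
    if action == "Double Tap":
        tap = shell_input + ["tap", str(p["x"]), str(p["y"])]
        return [tap, list(tap)]  # two taps in quick succession
    if action == "Type":
        # `input text` reads %s as a space; other shell characters would
        # still need escaping, and non-ASCII text may not be handled.
        return [shell_input + ["text", p["text"].replace(" ", "%s")]]
    if action == "Swipe":
        coords = [str(p[k]) for k in ("x1", "y1", "x2", "y2")]
        return [shell_input + ["swipe", *coords, str(p.get("duration_ms", 300))]]
    if action == "Long Press":
        # A swipe that starts and ends at the same point acts as a long press.
        xy = [str(p["x"]), str(p["y"])]
        return [shell_input + ["swipe", *xy, *xy, str(p.get("duration_ms", 1000))]]
    if action in ("Back", "Home"):
        return [shell_input + ["keyevent", f"KEYCODE_{action.upper()}"]]
    if action in ("Wait", "Take_over"):
        return []  # handled outside adb: sleep, or wait for the user
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    for cmd in adb_commands("Swipe", x1=540, y1=1600, x2=540, y2=400):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True). For a device reached over WiFi, run adb connect with the device's host and port first, then pass the same host:port string as serial.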

Project Repository: GitHub – Open-AutoGLM

Whether you are a developer building automation solutions or an AI enthusiast, Open-AutoGLM offers a controllable and efficient prototype of a mobile automation assistant.


December 11, 2025


Copyright Notice: This article is original content from this website. Published by Administrator on 2025-12-11, totaling 884 words.

Reprinting Notice:Unless otherwise stated, all original content on this site is published under the Creative Commons Attribution 4.0 (CC BY 4.0) license. Please indicate the source and retain the original link when reprinting. Some content on this site is compiled from publicly available information and may have been generated or optimized with the assistance of AI technology. It is for reference only and does not constitute any professional advice. Readers should make their own judgments and verifications. This site assumes no responsibility for the availability, security, or legality of third-party resources.
