Open-AutoGLM Goes Open Source: Automated Phone Control for 50+ Mainstream Apps

Open-AutoGLM: Turning your phone into an AI smart assistant

Open-AutoGLM is a mobile intelligent-assistant framework built on AutoGLM. It gives the AI visual understanding, so it can analyze the phone screen in real time and translate the user's natural-language commands into concrete sequences of automated operations.

Users don't need to operate the phone by hand. They only need to give a command such as "search for food on Xiaohongshu" or "find a WeChat contact," and the system automatically plans the steps and simulates taps, swipes, and text input. For safety, the system triggers a manual confirmation or takeover mechanism whenever a sensitive operation is involved.
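The observe, plan, act cycle described above, including the confirmation step for sensitive operations, can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the project's code: Action, run_task, and the capture_screen, decide, execute, confirm and take_over callables are hypothetical stand-ins for screenshot capture, the model call, the ADB executor and the human hooks, and the "Finish" action and sensitive flag are illustrative conventions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    """One step chosen by the model (hypothetical structure, not the project's)."""
    name: str                                   # e.g. "Tap", "Type", "Take_over", "Finish"
    args: Dict[str, Any] = field(default_factory=dict)
    sensitive: bool = False                     # assumed flag for payments, deletions, etc.


def run_task(
    instruction: str,
    capture_screen: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    take_over: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, ask the model for the next action, act; repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = capture_screen()                       # observe
        action = decide(instruction, screen, history)   # plan the next step
        if action.name == "Finish":
            break
        if action.name == "Take_over":
            take_over(action)                           # blocks until the user is done
        elif action.sensitive and not confirm(action):
            # The user declined a sensitive step: record it and stop.
            history.append(Action("Aborted", {"reason": "user declined"}))
            break
        else:
            execute(action)                             # act on the device
        history.append(action)
    return history
```

Keeping the confirmation check inside the loop, rather than inside each executor, means every sensitive step passes through the same gate no matter which operation it is.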

Core technology implementation

The framework achieves end-to-end automation through the following technical components:

  • Interface awareness: A vision-language model (VLM) parses screen elements in real time.
  • Task planning: Complex instructions are broken down into executable steps.
  • Device control: Commands are executed via Android Debug Bridge (ADB), with support for remote debugging over WiFi.
  • Flexible access: Developers can integrate the framework into custom automation scenarios through its API.
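A thin ADB layer for the device-control component might look like the following. It is a sketch that only assembles standard adb command lines (tcpip, connect, exec-out screencap -p) and runs them with Python's subprocess module; the helper names are hypothetical and are not taken from the Open-AutoGLM codebase, which may organize this differently.

```python
import subprocess
from typing import List, Optional


def adb_argv(*args: str, serial: Optional[str] = None) -> List[str]:
    """Build an adb command line; `serial` selects one device via -s."""
    argv = ["adb"]
    if serial:
        argv += ["-s", serial]
    return argv + list(args)


def enable_tcpip_argv(port: int = 5555, serial: Optional[str] = None) -> List[str]:
    # Run once while the phone is attached over USB: restarts adbd listening on TCP.
    return adb_argv("tcpip", str(port), serial=serial)


def connect_wifi_argv(host: str, port: int = 5555) -> List[str]:
    # Afterwards the device is addressed by the serial "host:port".
    return adb_argv("connect", f"{host}:{port}")


def screencap_argv(serial: Optional[str] = None) -> List[str]:
    # exec-out streams the raw PNG to stdout without tty line-ending changes.
    return adb_argv("exec-out", "screencap", "-p", serial=serial)


def run(argv: List[str]) -> bytes:
    """Execute an adb command and return its stdout (raises on non-zero exit)."""
    return subprocess.run(argv, capture_output=True, check=True).stdout
```

With a phone reachable at a (hypothetical) address 192.168.1.20, run(connect_wifi_argv("192.168.1.20")) followed by run(screencap_argv("192.168.1.20:5555")) would return the current screen as PNG bytes, ready to hand to the vision model.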

Model Versions and Resources

The project provides two models, each optimized for a different language environment:

  • AutoGLM-Phone-9B: Deeply optimized for Chinese application scenarios.
  • AutoGLM-Phone-9B-Multilingual: Supports English and other languages.

Model download: Hugging Face | ModelScope

Application coverage

The Phone Agent supports over 50 mainstream apps across the following core areas:

  • Social and Informational: WeChat, QQ, Weibo, Zhihu, Xiaohongshu
  • E-commerce and Lifestyle: Taobao, JD.com, Pinduoduo, Meituan, Ele.me, Dianping
  • Travel and Tools: Didi Chuxing, Ctrip, 12306, Gaode Map
  • Audio-visual entertainment: Douyin, Bilibili, iQiyi, NetEase Cloud Music

Run python main.py --list-apps to view the complete list of supported apps.
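To give a rough idea of how an app-listing command and the Launch operation could be backed, here is a hypothetical app registry. The package names are the publicly known Android application IDs for some of the apps listed above, not a copy of the project's own table. Launching through monkey with the LAUNCHER category is one common ADB technique, assumed here rather than confirmed as the project's method.

```python
from typing import Dict, List, Optional

# Hypothetical registry: display name -> Android package name (subset only).
APP_PACKAGES: Dict[str, str] = {
    "WeChat": "com.tencent.mm",
    "QQ": "com.tencent.mobileqq",
    "Weibo": "com.sina.weibo",
    "Zhihu": "com.zhihu.android",
    "Xiaohongshu": "com.xingin.xhs",
    "Taobao": "com.taobao.taobao",
    "JD.com": "com.jingdong.app.mall",
    "Pinduoduo": "com.xunmeng.pinduoduo",
    "Meituan": "com.sankuai.meituan",
    "Gaode Map": "com.autonavi.minimap",
    "Douyin": "com.ss.android.ugc.aweme",
    "Bilibili": "tv.danmaku.bili",
    "NetEase Cloud Music": "com.netease.cloudmusic",
}


def list_apps() -> List[str]:
    """Names an app-listing command could print, sorted alphabetically."""
    return sorted(APP_PACKAGES)


def launch_argv(app: str, serial: Optional[str] = None) -> List[str]:
    """adb command that starts the app's launcher activity via monkey."""
    package = APP_PACKAGES.get(app)
    if package is None:
        raise KeyError(f"unsupported app: {app}")
    argv = ["adb"] + (["-s", serial] if serial else [])
    # A single injected LAUNCHER event opens the app's main activity.
    return argv + ["shell", "monkey", "-p", package,
                   "-c", "android.intent.category.LAUNCHER", "1"]
```

Raising on unknown names, rather than guessing a package, lets the planner fall back to another strategy (for example, navigating from the home screen) instead of silently launching the wrong app.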

Supported Operations

  • Launch: Launch the specified app.
  • Tap / Double Tap: Tap or double-tap at the specified coordinates.
  • Type: Enter text automatically.
  • Swipe: Swipe the screen in one of four directions.
  • Back / Home: Return to the previous page / return to the home screen.
  • Long Press: Simulate a long press.
  • Wait: Wait for the page to load.
  • Take_over: Hand control to the user for manual intervention (e.g. solving CAPTCHAs).
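The pointer and key operations above map naturally onto Android's input shell command. The sketch below is an assumption about how such a mapping could look, not the project's implementation: action_to_shell and swipe_coords are hypothetical, the 1080x2400 default resolution and the 300 ms swipe and 1000 ms long-press durations are illustrative, and Launch, Wait and Take_over are omitted because none of them is a single input command. Each returned list is meant to be appended to adb shell (optionally with -s serial).

```python
from typing import Any, Dict, List, Tuple

# Standard Android key codes accepted by `input keyevent`.
KEYCODE_HOME = 3
KEYCODE_BACK = 4


def swipe_coords(direction: str, width: int, height: int) -> Tuple[int, int, int, int]:
    """Start/end points for a swipe; direction is the finger's movement."""
    cx, cy = width // 2, height // 2
    dx, dy = width // 4, height // 4
    return {
        "up": (cx, cy + dy, cx, cy - dy),
        "down": (cx, cy - dy, cx, cy + dy),
        "left": (cx + dx, cy, cx - dx, cy),
        "right": (cx - dx, cy, cx + dx, cy),
    }[direction]


def action_to_shell(action: str, args: Dict[str, Any],
                    width: int = 1080, height: int = 2400) -> List[List[str]]:
    """Translate one action into `adb shell` argument lists (one per command)."""
    if action == "Tap":
        return [["input", "tap", str(args["x"]), str(args["y"])]]
    if action == "Double Tap":
        # Two separate adb calls may be too slow for some apps to register
        # as a double tap; this only illustrates the mapping.
        tap = ["input", "tap", str(args["x"]), str(args["y"])]
        return [tap, list(tap)]
    if action == "Long Press":
        # A swipe that starts and ends at the same point acts as a press-and-hold.
        x, y = str(args["x"]), str(args["y"])
        return [["input", "swipe", x, y, x, y, str(args.get("ms", 1000))]]
    if action == "Swipe":
        x1, y1, x2, y2 = swipe_coords(args["direction"], width, height)
        return [["input", "swipe", *map(str, (x1, y1, x2, y2)), "300"]]
    if action == "Type":
        # `input text` reads %s as a space and is only reliable for plain ASCII;
        # Chinese text generally needs a different input method.
        return [["input", "text", args["text"].replace(" ", "%s")]]
    if action == "Back":
        return [["input", "keyevent", str(KEYCODE_BACK)]]
    if action == "Home":
        return [["input", "keyevent", str(KEYCODE_HOME)]]
    raise ValueError(f"not an ADB input action: {action}")
```

Expressing swipes as directions rather than raw coordinates keeps the model's output independent of screen resolution; the executor fills in coordinates for whatever device is attached.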

Quick Start

Project Repository: GitHub – Open-AutoGLM

Whether you are a developer looking to build automation solutions or an AI enthusiast, Open-AutoGLM offers a controllable, efficient prototype of a mobile automation assistant.

Copyright Notice: This article is original content from this website, published by Administrator on 2025-12-11 (884 words).
Reprinting Notice: Unless otherwise stated, all original content on this site is published under the Creative Commons Attribution 4.0 (CC BY 4.0) license. Please indicate the source and retain the original link when reprinting. Some content on this site is compiled from publicly available information and may have been generated or optimized with the assistance of AI technology. It is for reference only and does not constitute any professional advice. Readers should make their own judgments and verifications. This site assumes no responsibility for the availability, security, or legality of third-party resources.