interact_with_screen
Execute screen interaction actions
Unified interface for screen interactions: tapping, swiping, key presses, text input, element search, waiting for an element to appear, and scrolling to find an element.
Args:
action (str): Action type, one of:
- "tap": Tap screen at specified coordinates
- "swipe": Swipe screen from one position to another
- "key": Press a system key
- "text": Input text
- "find": Find UI element(s)
- "wait": Wait for element to appear
- "scroll": Scroll to find element
params (Dict[str, Any]): Parameters dictionary with action-specific values:
For "tap" action:
- x (int): X coordinate to tap
- y (int): Y coordinate to tap
For "swipe" action:
- x1 (int): Start X coordinate
- y1 (int): Start Y coordinate
- x2 (int): End X coordinate
- y2 (int): End Y coordinate
- duration (int, optional): Swipe duration in ms, defaults to 300
For "key" action:
- keycode (str | int): Key to press, given as a key name (e.g., "back", "home", "enter") or a numeric keycode
For "text" action:
- content (str): Text to input. Direct Chinese character input may fail on some devices,
so for Chinese text pass pinyin with escaped spaces instead
(e.g. "yu\ tian" for "雨天").
For "find" action:
- method (str): Search method, one of: "text", "id", "content_desc", "class", "clickable"
- value (str): Text/value to search for (not required for method="clickable")
- partial (bool, optional): Use partial matching (applies to the "text" and "content_desc" methods), defaults to True
For "wait" action:
- method (str): Search method, same options as "find"
- value (str): Text/value to search for
- timeout (int, optional): Maximum wait time in seconds, defaults to 30
- interval (float, optional): Check interval in seconds, defaults to 1.0
For "scroll" action:
- method (str): Search method, same options as "find"
- value (str): Text/value to search for
- direction (str, optional): Scroll direction, one of: "up", "down", "left", "right", defaults to "down"
- max_swipes (int, optional): Maximum swipe attempts, defaults to 5
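The required keys listed above can be checked on the caller's side before a request is sent. The following is a minimal sketch, not part of this tool: `REQUIRED_PARAMS` and `missing_params` are hypothetical names, and the table is transcribed from the Args list on the assumption that every key not marked optional is required.

```python
from typing import Any, Dict, List, Tuple

# Required keys per action, transcribed from the Args list (assumption:
# anything not marked "optional" there is required).
REQUIRED_PARAMS: Dict[str, Tuple[str, ...]] = {
    "tap": ("x", "y"),
    "swipe": ("x1", "y1", "x2", "y2"),
    "key": ("keycode",),
    "text": ("content",),
    "find": ("method",),
    "wait": ("method", "value"),
    "scroll": ("method", "value"),
}


def missing_params(action: str, params: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent from params (hypothetical helper)."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {action!r}")
    missing = [key for key in REQUIRED_PARAMS[action] if key not in params]
    # "find" needs a value unless the method is "clickable".
    if action == "find" and params.get("method") != "clickable" and "value" not in params:
        missing.append("value")
    return missing


print(missing_params("swipe", {"x1": 500, "y1": 300}))  # ['x2', 'y2']
print(missing_params("find", {"method": "clickable"}))  # []
```
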
Returns:
str: JSON string with operation result containing:
For successful operations:
{
"status": "success",
"message": "Operation-specific success message",
... (optional action-specific data)
}
For failed operations:
{
"status": "error",
"message": "Error description"
}
Special cases:
- find: Returns a list of the matching elements with their properties
- wait: Returns success once the element is found, or an error if the timeout expires
- scroll: Returns success once the element is found, or an error if it is still not found after max_swipes attempts
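Because the return value is a JSON string rather than a dict, callers usually decode it and check "status" before using anything else. A minimal sketch of that step follows; `parse_result` is a hypothetical helper, and the sample payloads are hand-written to match the shapes documented above rather than captured from a device.

```python
import json
from typing import Any, Dict


def parse_result(raw: str) -> Dict[str, Any]:
    """Decode a result string and raise if it reports an error (hypothetical helper)."""
    result = json.loads(raw)
    if result.get("status") != "success":
        raise RuntimeError(result.get("message", "unknown error"))
    return result


# Hand-written sample payloads, shaped like the documented return values.
ok = parse_result('{"status": "success", "message": "Tapped at (100, 200)"}')
print(ok["message"])

try:
    parse_result('{"status": "error", "message": "Element not found"}')
except RuntimeError as exc:
    print(f"failed: {exc}")
```
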
Examples:
# Tap by coordinates
result = await interact_with_screen("tap", {"x": 100, "y": 200})
# Swipe down
result = await interact_with_screen("swipe",
{"x1": 500, "y1": 300,
"x2": 500, "y2": 1200,
"duration": 300})
# Input text
result = await interact_with_screen("text", {"content": "Hello world"})
# Press back key
result = await interact_with_screen("key", {"keycode": "back"})
# Find element by text
result = await interact_with_screen("find",
{"method": "text",
"value": "Settings",
"partial": True})
# Wait for element to appear
result = await interact_with_screen("wait",
{"method": "text",
"value": "Success",
"timeout": 10,
"interval": 0.5})
# Scroll to find element
result = await interact_with_screen("scroll",
{"method": "text",
"value": "Privacy Policy",
"direction": "down",
"max_swipes": 8})
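The individual calls above can also be chained, stopping at the first failure. The sketch below shows one way to do that. So that it runs without a device, it defines a stand-in `interact_with_screen` stub that returns canned JSON; the stub, its messages, and the `run_steps` helper are all assumptions made for illustration, not the real tool's behavior.

```python
import asyncio
import json
from typing import Any, Dict, List, Tuple


async def interact_with_screen(action: str, params: Dict[str, Any]) -> str:
    """Stand-in stub (assumption): returns canned JSON instead of driving a device."""
    if action == "wait" and params.get("value") == "Missing":
        return json.dumps({"status": "error", "message": "Timeout waiting for element"})
    return json.dumps({"status": "success", "message": f"{action} ok"})


async def run_steps(steps: List[Tuple[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Run actions in order and stop after the first one that reports an error."""
    results: List[Dict[str, Any]] = []
    for action, params in steps:
        result = json.loads(await interact_with_screen(action, params))
        results.append(result)
        if result["status"] == "error":
            break
    return results


steps = [
    ("key", {"keycode": "home"}),
    ("wait", {"method": "text", "value": "Settings", "timeout": 10}),
    ("scroll", {"method": "text", "value": "Privacy Policy", "max_swipes": 8}),
]
results = asyncio.run(run_steps(steps))
print([r["status"] for r in results])
```

Stopping on the first error keeps later steps from acting on a screen that is not in the expected state.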