The birth of `steve`

Playwright is really useful for "manual testing". That is, when *las manos* belong to a coding agent. I had a thought: why couldn't we build something similar for macOS using the Accessibility APIs?

Stats

  • Prompts: 44
  • Tool calls: 233
  • Messages: 723
  • Assets: 0

Meta

  • Author: @mikker
  • Created At: January 24, 2026
  • Source: Codex
System prompt

AGENTS.md instructions for ~/dev/steve

  • We want the simplest change possible. We don't care about migration. Code readability matters most, and we're happy to make bigger changes to achieve it.
  • If I pass you URLs, fetch them directly. You may also use the web command line tool (~/.dotfiles/bin/web) for a text-only version.
  • mise is installed and generally used to manage language versions. Don't force it everywhere but notice it when it's there.
  • ast-grep and ripgrep are available.
  • If I ask you to commit, you decide the message. Use format "[Imperative] [description] [optional: two newlines and an extended description]"
    • Examples: "Add pricing page", "Fix whatever", "Change the universe"
  • Whenever you're done working, as the very last thing before returning to the user, run ~/.dotfiles/bin/bootledoot to notify.

Skills

A skill is a set of local instructions to follow that is stored in a SKILL.md file. Below is the list of skills that can be used. Each entry includes a name, description, and file path so you can open the source for full instructions when using a specific skill.

Available skills

  • agent-browser: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages. (file: ~/.codex/skills/agent-browser/SKILL.md)
  • agent-native-architecture: This skill should be used when building AI agents using prompt-native architecture where features are defined in prompts, not code. Use it when creating autonomous agents, designing MCP servers, implementing self-modifying systems, or adopting the "trust the agent's intelligence" philosophy. (file: ~/.codex/skills/agent-native-architecture/SKILL.md)
  • andrew-kane-gem-writer: This skill should be used when writing Ruby gems following Andrew Kane's proven patterns and philosophy. It applies when creating new Ruby gems, refactoring existing gems, designing gem APIs, or when clean, minimal, production-ready Ruby library code is needed. Triggers on requests like "create a gem", "write a Ruby library", "design a gem API", or mentions of Andrew Kane's style. (file: ~/.codex/skills/andrew-kane-gem-writer/SKILL.md)
  • beads: Tracks complex, multi-session work using the Beads issue tracker and dependency graphs, and provides persistent memory that survives conversation compaction. Use when work spans multiple sessions, has complex dependencies, or needs persistent context across compaction cycles. Trigger with phrases like "create task for", "what's ready to work on", "show task", "track this work", "what's blocking", or "update status". (file: ~/.codex/skills/beads/SKILL.md)
  • coding-tutor: Personalized coding tutorials that build on your existing knowledge and use your actual codebase for examples. Creates a persistent learning trail that compounds over time using the power of AI, spaced repetition and quizzes. (file: ~/.codex/skills/coding-tutor/SKILL.md)
  • dhh-rails-style: This skill should be used when writing Ruby and Rails code in DHH's distinctive 37signals style. It applies when writing Ruby code, Rails applications, creating models, controllers, or any Ruby file. Triggers on Ruby/Rails code generation, refactoring requests, code review, or when the user mentions DHH, 37signals, Basecamp, HEY, or Campfire style. Embodies REST purity, fat models, thin controllers, Current attributes, Hotwire patterns, and the "clarity over cleverness" philosophy. (file: ~/.codex/skills/dhh-rails-style/SKILL.md)
  • dhh-ruby-style: This skill should be used when writing Ruby and Rails code in DHH's distinctive 37signals style. It applies when writing Ruby code, Rails applications, creating models, controllers, or any Ruby file. Triggers on Ruby/Rails code generation, refactoring requests, code review, or when the user mentions DHH, 37signals, Basecamp, HEY, or Campfire style. Embodies REST purity, fat models, thin controllers, Current attributes, Hotwire patterns, and the "clarity over cleverness" philosophy. (file: ~/.codex/skills/dhh-ruby-style/SKILL.md)
  • file-todos: This skill should be used when managing the file-based todo tracking system in the todos/ directory. It provides workflows for creating todos, managing status and dependencies, conducting triage, and integrating with slash commands and code review processes. (file: ~/.codex/skills/file-todos/SKILL.md)
  • gemini-imagegen: This skill should be used when generating and editing images using the Gemini API (Nano Banana Pro). It applies when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images. (file: ~/.codex/skills/gemini-imagegen/SKILL.md)
  • git-worktree: This skill manages Git worktrees for isolated parallel development. It handles creating, listing, switching, and cleaning up worktrees with a simple interactive interface, following KISS principles. (file: ~/.codex/skills/git-worktree/SKILL.md)
  • refactor-pass: Perform a refactor pass focused on simplicity after recent changes. Use when the user asks for a refactor/cleanup pass, simplification, or dead-code removal and expects build/tests to verify behavior. (file: ~/.codex/skills/refactor-pass/SKILL.md)
  • solidify-codebase: Deep investigation and solidification pass on an existing codebase. Use when asked to audit, simplify, or future-proof a system; perform a deep cleanup/refactor pass; identify high-impact improvement opportunities; or present a vetted change list before implementing selected items. (file: ~/.codex/skills/solidify-codebase/SKILL.md)
  • swiftui-ui-patterns: Best practices and example-driven guidance for building SwiftUI views and components. Use when creating or refactoring SwiftUI UI, designing tab architecture with TabView, composing screens, or needing component-specific patterns and examples. (file: ~/.codex/skills/swiftui-ui-patterns/SKILL.md)
  • vercel-react-best-practices: React and Next.js performance optimization guidelines from Vercel Engineering. This skill should be used when writing, reviewing, or refactoring React/Next.js code to ensure optimal performance patterns. Triggers on tasks involving React components, Next.js pages, data fetching, bundle optimization, or performance improvements. (file: ~/.codex/skills/vercel-react-best-practices/SKILL.md)
  • web-design-guidelines: Review UI code for Web Interface Guidelines compliance. Use when asked to "review my UI", "check accessibility", "audit design", "review UX", or "check my site against best practices". (file: ~/.codex/skills/web-design-guidelines/SKILL.md)
  • web-interface-guidelines: Review UI/frontend code for compliance with Vercel Web Interface Guidelines. Use when the user asks for a guideline audit, accessibility/UX compliance review, or a /web-interface-guidelines style check across files or patterns. (file: ~/.codex/skills/web-interface-guidelines/SKILL.md)
  • skill-creator: Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Codex's capabilities with specialized knowledge, workflows, or tool integrations. (file: ~/.codex/skills/.system/skill-creator/SKILL.md)
  • skill-installer: Install Codex skills into $CODEX_HOME/skills from a curated list or a GitHub repo path. Use when a user asks to list installable skills, install a curated skill, or install a skill from another repo (including private repos). (file: ~/.codex/skills/.system/skill-installer/SKILL.md)

How to use skills

  • Discovery: The list above is the skills available in this session (name + description + file path). Skill bodies live on disk at the listed paths.
  • Trigger rules: If the user names a skill (with $SkillName or plain text) OR the task clearly matches a skill's description shown above, you must use that skill for that turn. Multiple mentions mean use them all. Do not carry skills across turns unless re-mentioned.
  • Missing/blocked: If a named skill isn't in the list or the path can't be read, say so briefly and continue with the best fallback.
  • How to use a skill (progressive disclosure):
    1. After deciding to use a skill, open its SKILL.md. Read only enough to follow the workflow.
    2. If SKILL.md points to extra folders such as references/, load only the specific files needed for the request; don't bulk-load everything.
    3. If scripts/ exist, prefer running or patching them instead of retyping large code blocks.
    4. If assets/ or templates exist, reuse them instead of recreating from scratch.
  • Coordination and sequencing:
    • If multiple skills apply, choose the minimal set that covers the request and state the order you'll use them.
    • Announce which skill(s) you're using and why (one short line). If you skip an obvious skill, say why.
  • Context hygiene:
    • Keep context small: summarize long sections instead of pasting them; only load extra files when needed.
    • Avoid deep reference-chasing: prefer opening only files directly linked from SKILL.md unless you're blocked.
    • When variants exist (frameworks, providers, domains), pick only the relevant reference file(s) and note that choice.
  • Safety and fallback: If a skill can't be applied cleanly (missing files, unclear instructions), state the issue, pick the next-best approach, and continue.
System prompt

<environment_context>
~/dev/steve
<approval_policy>never</approval_policy>
<sandbox_mode>danger-full-access</sandbox_mode>
<network_access>enabled</network_access>
zsh
</environment_context>

uitool — Mac UI Automation CLI

A Swift CLI for driving Mac applications via the Accessibility API. Designed for automated testing and AI agent control.

Installation

brew install uitool  # or build from source

Requires Accessibility permissions in System Settings > Privacy & Security.

Output Format

All commands output JSON to stdout. Errors go to stderr with non-zero exit.

{"ok": true, "data": ...}
{"ok": false, "error": "message"}
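The envelope above could be produced by a small helper like the following. This is a minimal sketch, not the code from the session: the `envelope` function and its parameters are hypothetical names, and it uses only Foundation's `JSONSerialization`.

```swift
import Foundation

// Hypothetical helper (not from the session): builds the {"ok": ...} envelope as a string.
func envelope(ok: Bool, data: Any? = nil, error: String? = nil) -> String {
    var payload: [String: Any] = ["ok": ok]
    if let data { payload["data"] = data }
    if let error { payload["error"] = error }
    let fallback = "{\"error\":\"encoding failed\",\"ok\":false}"
    // Reject payloads JSONSerialization cannot encode instead of crashing.
    guard JSONSerialization.isValidJSONObject(payload),
          let bytes = try? JSONSerialization.data(withJSONObject: payload, options: [.sortedKeys]),
          let text = String(data: bytes, encoding: .utf8) else {
        return fallback
    }
    return text
}

print(envelope(ok: true, data: ["pid": 123]))
print(envelope(ok: false, error: "message"))
```

Sorting the keys is a choice made here so that output stays stable across runs, which helps when an agent or a test script diffs results.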

Commands

Application Control

# List running applications
uitool apps
# → {"ok": true, "data": [{"name": "Finder", "pid": 123, "bundleId": "com.apple.finder"}, ...]}

# Focus an application
uitool focus "AppName"
uitool focus --pid 1234
uitool focus --bundle "com.example.app"

# Launch application
uitool launch "com.example.app"
uitool launch "com.example.app" --wait  # wait until responsive

# Quit application
uitool quit "AppName"
uitool quit "AppName" --force

Element Discovery

# List UI elements in focused app (tree view, default depth 3)
uitool elements
uitool elements --depth 5
uitool elements --app "AppName"

# Find elements matching criteria
uitool find "Button"                    # by role
uitool find --title "Submit"            # by title/label
uitool find --role AXButton --title "OK"
uitool find --identifier "loginButton"  # by accessibility identifier

# Get element at coordinates
uitool element-at 100 200

Element output:

{
  "ok": true,
  "data": [{
    "id": "ax://1234/5678",
    "role": "AXButton",
    "title": "Submit",
    "identifier": "submitBtn",
    "enabled": true,
    "focused": false,
    "frame": {"x": 100, "y": 200, "width": 80, "height": 24},
    "children": [...]
  }]
}

Interactions

# Click element (by reference ID from find/elements)
uitool click "ax://1234/5678"

# Click by search (finds first match)
uitool click --title "Submit"
uitool click --role AXButton --title "Cancel"

# Click at coordinates
uitool click-at 100 200
uitool click-at 100 200 --double
uitool click-at 100 200 --right

# Type text (into focused element)
uitool type "hello world"
uitool type "hello" --delay 50   # ms between keystrokes

# Press keys
uitool key enter
uitool key cmd+a
uitool key cmd+shift+p
uitool key escape

# Set element value directly
uitool set-value "ax://1234/5678" "new text"

# Scroll
uitool scroll down
uitool scroll up --amount 5
uitool scroll --element "ax://1234/5678" down

Assertions (for testing)

# Check element exists (exit 0 if found, 1 if not)
uitool exists --title "Welcome"
uitool exists --role AXWindow --title "Settings"

# Wait for element (with timeout)
uitool wait --title "Loading..." --gone --timeout 10
uitool wait --title "Results" --timeout 5

# Assert element state
uitool assert --title "Submit" --enabled
uitool assert --title "Checkbox" --checked
uitool assert --title "Input" --value "expected text"
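The `wait` commands above imply a poll-until-timeout loop. A minimal sketch of that idea follows. The `waitUntil` name, the 0.1 s poll interval, and the counter standing in for an accessibility lookup are all assumptions, not taken from the session.

```swift
import Foundation

// Hypothetical helper: polls `condition` until it returns true or `timeout` seconds elapse.
func waitUntil(timeout: TimeInterval, interval: TimeInterval = 0.1, condition: () -> Bool) -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while true {
        if condition() { return true }
        if Date() >= deadline { return false }
        Thread.sleep(forTimeInterval: interval)
    }
}

// Stand-in for an element lookup: the "element" appears on the third poll.
var polls = 0
let found = waitUntil(timeout: 2) {
    polls += 1
    return polls >= 3
}
print(found ? "found" : "timeout")
```

For `--gone`, the same loop would work with the condition inverted. A `false` result would map to exit code 3 (Timeout) from the table in the spec.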

Windows

# List windows
uitool windows
uitool windows --app "AppName"

# Window operations
uitool window focus "ax://win/123"
uitool window minimize "ax://win/123"
uitool window fullscreen "ax://win/123"
uitool window resize "ax://win/123" 800 600
uitool window move "ax://win/123" 100 100

Menus

# List menu bar items
uitool menus

# Click menu item by path
uitool menu "File" "New"
uitool menu "Edit" "Find" "Find..."

Screenshots (for debugging)

# Capture window
uitool screenshot                       # focused window → stdout (PNG)
uitool screenshot --app "AppName" -o screenshot.png
uitool screenshot --element "ax://1234" -o element.png

Global Options

--app "Name"      # Target specific app (default: frontmost)
--pid 1234        # Target by PID
--bundle "id"     # Target by bundle identifier
--timeout 5       # Default timeout for waits (seconds)
--verbose         # Include debug info in output
--quiet           # Suppress non-essential output

Exit Codes

Code  Meaning
0     Success
1     Element not found
2     App not found / not running
3     Timeout
4     Permission denied (accessibility)
5     Invalid arguments

Example: Testing a Launcher App

#!/bin/bash
set -e

# Launch app
uitool launch "com.example.launcher" --wait

# Activate with hotkey
uitool key cmd+space

# Wait for window
uitool wait --role AXWindow --title "Launcher" --timeout 3

# Type query
uitool type "safari"

# Wait for results
uitool wait --role AXList --timeout 2

# Verify result appears
uitool exists --title "Safari"

# Select first result
uitool key enter

# Verify launcher closed
uitool wait --role AXWindow --title "Launcher" --gone --timeout 2

echo "Test passed"

Notes

  • Element IDs (ax://...) are stable within a session but not across app restarts
  • Use --identifier for reliable element targeting when possible (requires app to set accessibility identifiers)
  • Coordinates are in screen space (0,0 = top-left of main display)
  • Multiple monitors: coordinates extend rightward/downward
Thought 3 times. Used 3 tool calls.
Thinking

Planning to read AGENTS.md

I'll read the AGENTS.md file next to understand the agent-related instructions before proceeding.

$ ls
Thinking

Checking agents directory

I'm verifying whether the AGENTS directory exists and contains files by listing or searching it to understand the code organization before proceeding.

$ rg --files -g 'AGENTS.md' .
Thinking

Noticing missing agents file

No AGENTS.md file was found where the instructions said to look. Planning to report the missing file and ask for next steps, mentioning the uitool note; will run the bootledoot command before finalizing the response, as required.

$ ~/.dotfiles/bin/bootledoot
Couldn’t find an AGENTS.md in ~/dev/steve (rg returned no matches). What would you like me to do next?

I want you to plan and implement the tool I just described, in this folder.

Thought 9 times. Used 9 tool calls.
Thinking

Planning initial exploration

I'm starting with listing files to inspect the repo structure before forming a detailed plan and implementation steps.

$ ls -la
total 0
drwxr-xr-x   2 mikker  staff    64 Jan 15 22:21 .
drwxr-xr-x@ 92 mikker  staff  2944 Jan 15 22:21 ..
Thinking

Identifying need for planning tool

The target folder is empty, so the tool must be implemented from scratch. I'll start by creating a clear plan for the tool since it's not straightforward.

Tool: update_plan
update_plan
{"explanation":"Set up a minimal Swift CLI project that implements the described uitool commands and JSON output format.","plan":[{"step":"Decide project layout and Swift Package Manager configuration for a CLI named uitool","status":"in_progress"},{"step":"Implement core JSON output types and command parsing with subcommands and options","status":"pending"},{"step":"Add Accessibility API wrappers for app control, element discovery, and interactions","status":"pending"},{"step":"Wire commands to functionality and add basic error handling + exit codes","status":"pending"},{"step":"Add README and usage examples aligned with the spec","status":"pending"}]}
Plan updated
Thinking

Planning Swift CLI scaffolding

I'm designing a Swift package with CLI subcommands matching the spec, implementing minimal argument parsing manually to avoid dependencies, and preparing to build core Accessibility API interactions and JSON output in a modular multi-file structure.

Defining detailed Swift CLI design and accessibility handling

I've outlined a comprehensive plan for Swift package setup, manual argument parsing with global and command-specific flags, Accessibility API integration for app and UI element management, stable element ID encoding via child-index paths, event simulation for input and scrolling, JSON output conventions, error handling with exit codes, and support for all specified commands including element traversal, clicks, typing, menus, windows, and screenshots.

Finalizing project structure and command output conventions

I'm confirming JSON output rules with exit codes for existence and assertions, extending window ID handling with distinct "ax://win/" format, detailing screenshot logic for focused windows or elements, and planning to create a Swift package with a multi-file layout separating argument parsing, Accessibility API wrappers, JSON helpers, keycode mappings, and command implementations.
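The "child-index path" idea for element IDs mentioned in the plan can be sketched as a small round-trippable value type. The layout used here (`ax://<pid>/<indexes joined by dots>`) is an assumption for illustration. The session's actual encoding is not visible in this excerpt, and `ElementID` is a hypothetical name.

```swift
import Foundation

// Hypothetical ID layout: "ax://<pid>/<child indexes joined by '.'>".
struct ElementID: Equatable {
    var pid: Int32
    var path: [Int]

    init(pid: Int32, path: [Int]) {
        self.pid = pid
        self.path = path
    }

    // Parses an ID string; returns nil for anything that does not match the assumed layout.
    init?(_ text: String) {
        let prefix = "ax://"
        guard text.hasPrefix(prefix) else { return nil }
        let parts = text.dropFirst(prefix.count).split(separator: "/", omittingEmptySubsequences: false)
        guard parts.count == 2, let pid = Int32(parts[0]) else { return nil }
        var path: [Int] = []
        for piece in parts[1].split(separator: ".") {
            guard let index = Int(piece), index >= 0 else { return nil }
            path.append(index)
        }
        self.pid = pid
        self.path = path
    }

    var string: String {
        "ax://\(pid)/" + path.map { String($0) }.joined(separator: ".")
    }
}

let id = ElementID(pid: 1234, path: [0, 2, 5])
print(id.string)
```

Resolving such an ID would mean walking the children of the app's root element index by index. This is consistent with the spec's note that IDs are stable within a session but not across app restarts, since the tree can change.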

$ mkdir -p ~/dev/steve/Sources/uitool
Thinking

Creating Package.swift

$ cat <<'EOF' > ~/dev/steve/Package.swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: \
Thinking

Starting JSON helpers implementation

$ cat <<'EOF' > ~/dev/steve/Sources/uitool/JSON.swift
import Foundation

enum UitoolExit: Int32 {
    case success = 0
    case notFound = 1
    case appNotFound = 2
    case timeout = 3
    case permissionDenied = 4
    case invalidArguments = 5
}

struct JSON {
    static func ok(_ data: Any? = nil, quiet: Bool = false) {
        guard !quiet else { return }
        let payload: [String: Any]
        if let data {
            payload = [\
Thinking

Creating AX helpers struct and file

I'm setting up a new AX.swift file to hold helper functions and a GlobalOptions struct, importing Foundation, AppKit, and ApplicationServices for accessibility features.

$ cat <<'EOF' > ~/dev/steve/Sources/uitool/AX.swift
import AppKit
import ApplicationServices
import Foundation

struct GlobalOptions {
    var appName: String?
    var pid: pid_t?
    var bundleId: String?
    var timeout: TimeInterval = 5
    var verbose = false
    var quiet = false
}

struct AXHelper {
    static func ensureTrusted() -> Bool {
        if AXIsProcessTrusted() { return true }
        let options: NSDictionary = [kAXTrustedCheckOptionPrompt.takeRetainedValue() as NSString: true]
        _ = AXIsProcessTrustedWithOptions(options)
        return AXIsProcessTrusted()
    }

    static func frontmostApp() -> NSRunningApplication? {
        NSWorkspace.shared.frontmostApplication
    }

    static func runningApp(options: GlobalOptions) -> NSRunningApplication? {
        if let pid = options.pid {
            return NSRunningApplication(processIdentifier: pid)
        }
        if let bundle = options.bundleId {
            return NSRunningApplication.runningApplications(withBundleIdentifier: bundle).first
        }
        if let name = options.appName {
            return NSRunningApplication.runningApplications(withBundleIdentifier: name).first
                ?? NSWorkspace.shared.runningApplications.first(where: { $0.localizedName == name })
        }
        return frontmostApp()
    }

    static func appElement(for app: NSRunningApplication) -> AXUIElement {
        AXUIElementCreateApplication(app.processIdentifier)
    }

    static func systemWideElement() -> AXUIElement {
        AXUIElementCreateSystemWide()
    }

    static func attribute<T>(_ element: AXUIElement, _ attr: CFString) -> T? {
        var value: CFTypeRef?
        let result = AXUIElementCopyAttributeValue(element, attr, &value)
        guard result == .success, let value else { return nil }
        return (value as AnyObject) as? T
    }

    static func boolAttribute(_ element: AXUIElement, _ attr: CFString) -> Bool? {
        if let value: NSNumber = attribute(element, attr) {
            return value.boolValue
        }
        return nil
    }

    static func frame(of element: AXUIElement) -> CGRect? {
        guard let axValue: AXValue = attribute(element, kAXFrameAttribute) else { return nil }
        var rect = CGRect.zero
        if AXValueGetType(axValue) == .cgRect {
            AXValueGetValue(axValue, .cgRect, &rect)
            return rect
        }
        return nil
    }

    static func children(of element: AXUIElement) -> [AXUIElement] {
        if let children: [AXUIElement] = attribute(element, kAXChildrenAttribute) {
            return children
        }
        return []
    }

    static func role(of element: AXUIElement) -> String? {
        attribute(element, kAXRoleAttribute)
    }

    static func title(of element: AXUIElement) -> String? {
        if let title: String = attribute(element, kAXTitleAttribute) { return title }
        if let desc: String = attribute(element, kAXDescriptionAttribute) { return desc }
        return nil
    }

    static func identifier(of element: AXUIElement) -> String? {
        attribute(element, kAXIdentifierAttribute)
    }

    static func elementInfo(element: AXUIElement, pid: pid_t, path: [Int], depth: Int) -> [String: Any] {
        var dict: [String: Any] = [:]
        dict[\
Thinking

Focusing on key codes

$ cat <<'EOF' > ~/dev/steve/Sources/uitool/KeyCodes.swift
import ApplicationServices

struct KeyCodes {
    static let mapping: [String: CGKeyCode] = [
        \
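Before a key name can be looked up in a table like `KeyCodes.mapping`, a combo string such as `cmd+shift+p` has to be split into modifiers and a final key. The following is a rough sketch of that step only. `KeyCombo`, `parseKeyCombo`, and the set of accepted modifier names are assumptions, and a real implementation would go on to map the result to `CGEventFlags` and a `CGKeyCode`.

```swift
import Foundation

// Hypothetical parse result: modifier names plus the final key name.
struct KeyCombo: Equatable {
    var modifiers: Set<String>
    var key: String
}

// Splits "cmd+shift+p" into modifiers and key; returns nil for unknown modifiers or empty input.
func parseKeyCombo(_ text: String) -> KeyCombo? {
    let known: Set<String> = ["cmd", "shift", "ctrl", "alt"]  // assumed modifier spellings
    let parts = text.lowercased().split(separator: "+").map { String($0) }
    guard let key = parts.last else { return nil }
    let modifiers = parts.dropLast()
    guard modifiers.allSatisfy({ known.contains($0) }) else { return nil }
    return KeyCombo(modifiers: Set(modifiers), key: key)
}

if let combo = parseKeyCombo("cmd+shift+p") {
    print(combo.modifiers.sorted(), combo.key)
}
```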
Thinking

Implementing command parsing helpers

Planning to build a helper struct with functions for precise mouse clicks at coordinates, fallback AX actions on elements, text typing via CGEvents, key combos, and methods to get focused windows and their identifiers.

$ cat <<'EOF' > ~/dev/steve/Sources/uitool/Commands.swift
import AppKit
import ApplicationServices
import Foundation

struct CommandContext {
    var options: GlobalOptions
}

struct Commands {
    static func apps(ctx: CommandContext) -> Int32 {
        let apps = NSWorkspace.shared.runningApplications
        let data = apps.map { app in
            [
                \
Thinking

Defining global options parsing logic

I'm planning a parseGlobalOptions function that removes recognized global flags in-place from args, capturing values like --app and --timeout, and handling missing values by returning an error to enforce argument validity before dispatching commands.

Implementing error handling and help check

I'm updating parseGlobal to return a tuple with options and error, so main can handle errors and show usage, also ensuring only known global flags are removed while preserving command-specific flags like --wait.
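The in-place flag stripping described above might look roughly like this. It is a trimmed-down sketch under stated assumptions: `Options` and `parseGlobal` are hypothetical stand-ins that handle only `--app` and `--timeout`, and the session's real `parseGlobalOptions` is truncated in this excerpt.

```swift
import Foundation

// Hypothetical, reduced options: only two of the documented global flags.
struct Options: Equatable {
    var app: String?
    var timeout: Double = 5
}

// Removes recognized global flags (and their values) from `args` in place.
// Returns the parsed options plus an error message if a value is missing or invalid.
func parseGlobal(_ args: inout [String]) -> (Options, String?) {
    var options = Options()
    var i = 0
    while i < args.count {
        let flag = args[i]
        // Leave command-specific arguments such as --wait untouched.
        guard flag == "--app" || flag == "--timeout" else { i += 1; continue }
        guard i + 1 < args.count else { return (options, "Missing value for \(flag)") }
        let value = args[i + 1]
        if flag == "--app" {
            options.app = value
        } else if let seconds = Double(value) {
            options.timeout = seconds
        } else {
            return (options, "Invalid value for \(flag)")
        }
        args.removeSubrange(i...(i + 1))  // do not advance i: the next argument shifted into place
    }
    return (options, nil)
}

var args = ["launch", "com.example.app", "--wait", "--timeout", "10"]
let (options, error) = parseGlobal(&args)
print(args, options.timeout, error ?? "no error")
```

Returning the error as part of the tuple, rather than exiting inside the parser, leaves it to the caller to decide how to report it, for example by printing usage and exiting with code 5.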

$ cat <<'EOF' > ~/dev/steve/Sources/uitool/main.swift
import Foundation

func parseGlobalOptions(_ args: inout [String]) -> (GlobalOptions, String?) {
    var options = GlobalOptions()
    var i = 0
    while i < args.count {
        let arg = args[i]
        switch arg {
        case \