Browser agent

MyCompAgent

A DOM-driven browser agent powered by Gemini native function calling, Playwright execution, approval modes, memory, and run-level debug traces.

Open GitHub Back to systems

Artifact-style board from repository surfaces, not a product screenshot.

Status: Live repository
Proof type: Browser agent
Stack: Playwright Gemini function calling memory debug logs
Source: github.com/rogue-socket/mycompagent

What it is

A DOM-to-action browser loop

MyCompAgent converts browser state into a DOM snapshot, interprets that snapshot for the model, asks Gemini for native function calls, and executes approved actions through Playwright.

Why it exists

Browser agents need controlled execution

The repo is built around the practical parts of browser control: human-in-loop modes, guardrails, persistent sessions, memory events, and evidence logs for what the model saw and did.

How it works

Snapshot, interpret, call, execute

01Snapshot

02Interpreter

03Function

04Playwright

Parse the browser into a compact DOM snapshot with visible state.
Maintain a multi-turn chat loop instead of one-shot action parsing.
Use Gemini native function calling for click, type, navigate, wait, and screenshot actions.
Record actions, LLM responses, browser state, interpreter state, memory events, snapshots, and video traces.