Autonomous Bug-Report Testing for Android

Benchmarking how well autonomous agents reproduce bugs

With Dr. Tingting Yu. Bug reports are messy; testing whether an autonomous system can reproduce them requires that both the agent and the harness judge success and failure correctly.

I’m using computer-use agents to extract APK info and evaluate the outcome of executed reports. The novel methods sit alongside the benchmark itself; both are research outputs.

Highlights

Eval + benchmarks for autonomous bug-report testing.
Computer-use agents extracting APK info & evaluating outcomes.
