
SWE Agent

Open-source Devin alternative

Coding, general purpose

About

This Devin alternative scores 12.3% on the full SWE-bench test set.
"An open source Devin getting 12.29% on 100% of the SWE Bench test set vs Devin's 13.84% on 25% of the test set!"
SWE-agent works by interacting with a specialized terminal, which allows it to:
- 🔍 Open, scroll and search through files
- ✍️ Edit specific lines w/ automatic syntax check
- 🧪 Write and execute tests
This custom-built interface is critical for good performance. Simply connecting an LM to a vanilla bash terminal does not work well.
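The "edit specific lines with automatic syntax check" behaviour listed above can be illustrated with a minimal Python sketch. It is not SWE-agent's actual implementation: the function name `edit_lines` and its return shape are hypothetical, and Python's built-in `ast.parse` stands in for whatever linter the real tool uses. The idea shown is only that an edit which breaks the syntax is rejected and the agent gets a feedback message instead.

```python
import ast
from typing import Tuple


def edit_lines(source: str, start: int, end: int, replacement: str) -> Tuple[str, str]:
    """Replace lines start..end (1-indexed, inclusive) if the result still parses.

    Returns (new_source, feedback). On a syntax error the original source is
    returned unchanged and the feedback describes the problem.
    Hypothetical helper for illustration; not part of SWE-agent's API.
    """
    lines = source.splitlines()
    if not (1 <= start <= end <= len(lines)):
        return source, f"error: line range {start}-{end} is out of bounds"

    # Build the candidate file with the requested lines swapped out.
    candidate_lines = lines[: start - 1] + replacement.splitlines() + lines[end:]
    candidate = "\n".join(candidate_lines) + "\n"

    # Assumption: a plain parse check stands in for the real linter.
    try:
        ast.parse(candidate)
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return source, f"edit rejected: {exc.msg} (line {exc.lineno})"
    return candidate, "edit applied"
```

With a source of `"def f():\n    return 1\n"`, replacing line 2 with a correctly indented `    return 2` is applied, while replacing it with an unindented `return 2` leaves the file untouched and returns a message starting with `edit rejected`.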
"Our key insight is that LMs require carefully designed agent-computer interfaces (similar to how humans like good UI design). E.g. When the LM messes up indentation, our editor prevents it and gives feedback."
SWE-agent was released by the Princeton NLP team.
SWE-agent stands out because it performs almost as well as Devin on SWE-bench.
Note that performance varies with the language model the agent uses.
The changes and innovations in SWE-agent compared to Devin are:
- The code in SWE-agent is executed locally via Docker.
- It uses an "Agent-Computer Interface" (ACI): constraining the interface makes it easier for LMs to use. Only a few commands are allowed: run code, look for code, edit code and submit changes to GitHub.
- Any code the agent writes goes through a syntax check (linter) before being submitted. If the syntax is incorrect, the agent gets feedback and is forced to redo the code.
- The agent can only read 100 lines of code at a time, rather than the entire file. This makes it easier for the language model to understand the code.
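The 100-line reading limit can be pictured as a scrolling window over the file. The sketch below is a rough illustration under stated assumptions, not SWE-agent's own code: the class name `FileViewer`, its methods, and the header format are all hypothetical. Only the window size of 100 comes from the description above.

```python
WINDOW = 100  # lines shown per view, taken from the limit described above


class FileViewer:
    """Hypothetical windowed viewer; names and output format are illustrative."""

    def __init__(self, text: str, window: int = WINDOW) -> None:
        self.lines = text.splitlines()
        self.window = window
        self.top = 0  # 0-indexed position of the first visible line

    def view(self) -> str:
        """Return the current window with line numbers and a short header."""
        visible = self.lines[self.top : self.top + self.window]
        last = self.top + len(visible)
        remaining = len(self.lines) - last
        header = (
            f"[lines {self.top + 1}-{last} of {len(self.lines)}; "
            f"{remaining} more below]"
        )
        numbered = [f"{self.top + i + 1}: {line}" for i, line in enumerate(visible)]
        return "\n".join([header] + numbered)

    def scroll_down(self) -> str:
        """Move one window down, stopping so the last window stays full."""
        max_top = max(0, len(self.lines) - self.window)
        self.top = min(self.top + self.window, max_top)
        return self.view()

    def scroll_up(self) -> str:
        """Move one window up, never past the start of the file."""
        self.top = max(0, self.top - self.window)
        return self.view()
```

The header tells the model where it is in the file and how much is left, so it can decide whether to scroll or search rather than receiving the whole file at once.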
