About
This Devin alternative scores 12.3% on the full SWE-bench benchmark.
SWE-agent works by interacting with a specialized terminal, which allows it to:
🔍 Open, scroll and search through files
✍️ Edit specific lines w/ automatic syntax check
🧪 Write and execute tests
This custom-built interface is critical for good performance. Simply connecting an LM to a vanilla bash terminal does not work well.
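To make the idea of a purpose-built file interface concrete, here is a minimal sketch of a windowed viewer that supports opening, scrolling and searching. The `FileViewer` class and its method names are hypothetical and are not SWE-agent's actual commands; the 100-line default follows the window size mentioned further down this page.

```python
from dataclasses import dataclass


@dataclass
class FileViewer:
    """Hypothetical windowed viewer; not SWE-agent's real implementation."""
    lines: list[str]
    window: int = 100  # number of lines shown per view
    top: int = 0       # zero-based index of the first visible line

    def view(self) -> str:
        """Return the visible window, prefixed with 1-based line numbers."""
        chunk = self.lines[self.top:self.top + self.window]
        return "\n".join(
            f"{self.top + i + 1}: {text}" for i, text in enumerate(chunk)
        )

    def scroll_down(self) -> None:
        """Advance by one window, clamped so the last page stays full."""
        last_top = max(0, len(self.lines) - self.window)
        self.top = min(self.top + self.window, last_top)

    def scroll_up(self) -> None:
        """Go back by one window, clamped to the start of the file."""
        self.top = max(0, self.top - self.window)

    def search(self, needle: str) -> list[int]:
        """Return the 1-based numbers of all lines containing needle."""
        return [i + 1 for i, text in enumerate(self.lines) if needle in text]


# Usage: a 250-line in-memory "file" viewed one window at a time.
viewer = FileViewer(lines=[f"x{i} = {i}" for i in range(250)])
first_page = viewer.view()
viewer.scroll_down()
second_page = viewer.view()
hits = viewer.search("x249")
```

The point of the design is that the model never receives more than one window of text per step, and it gets line numbers it can refer to in later edit commands.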
SWE-agent was released by the Princeton NLP team.
What makes SWE-agent special is that it performs almost as well as Devin on SWE-bench.
Note that performance varies with the model the agent uses.
The changes and innovations in SWE-agent compared to Devin are:
The code in SWE-agent is executed locally via Docker.
It uses an “Agent-Computer Interface” (ACI). Constraining the interface makes it easier for LMs to use; only a few commands are allowed: run code, look for code, edit code and submit changes to GitHub.
Any code the agent writes goes through a syntax check (linter) before being submitted. If the syntax is incorrect, the agent gets feedback and is forced to redo the code.
The agent can only read 100 lines of code at a time, rather than the entire file. This makes it easier for the language model to understand the code.
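The syntax-checked edit described above can be sketched as follows. This is an illustration under stated assumptions, not SWE-agent's code: `apply_edit` is a hypothetical helper, and Python's built-in `ast.parse` stands in for whatever linter the real tool runs.

```python
import ast
from typing import Optional, Tuple


def apply_edit(
    source: str, start: int, end: int, replacement: str
) -> Tuple[str, Optional[str]]:
    """Replace 1-based inclusive lines start..end, rejecting broken syntax.

    Returns (new_source, None) on success, or (original_source, message)
    when the edited file no longer parses. Hypothetical helper.
    """
    lines = source.splitlines()
    candidate_lines = lines[:start - 1] + replacement.splitlines() + lines[end:]
    candidate = "\n".join(candidate_lines) + "\n"
    try:
        ast.parse(candidate)  # syntax check only; the code is never executed
    except SyntaxError as err:
        # The file is left unchanged and the error is fed back to the agent.
        return source, f"edit rejected: {err.msg} (line {err.lineno})"
    return candidate, None


original = "def add(a, b):\n    return a + b\n"

# A broken edit is refused and the original text is kept.
unchanged, error = apply_edit(original, 2, 2, "    return a +")

# A valid edit is applied.
fixed, no_error = apply_edit(original, 2, 2, "    return a - b")
```

Returning the error message instead of raising keeps the loop simple: the agent sees the feedback as ordinary command output and can retry the edit.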