Cosine introduces Genie, the AI software developer that beats Devin from Cognition

The race to build AI software developers is far from over. Following Devin from Cognition, Cosine, the self-described human reasoning lab, has introduced Genie, which it bills as the world’s most capable AI software engineering model, scoring 30.08% on the SWE-Bench evaluation.

Genie is designed to mimic the cognitive processes of human engineers, allowing it to solve complex problems with remarkable accuracy and efficiency. “We believe that if you want a model to behave like a software developer, you need to show it how a human software developer works,” said Alistair Pullen, founder of Cosine.

The UK-based AI startup has secured $2.5 million in funding from SOMA Capital and Uphonest, with additional investment from Lakestar and Focal, and is part of Y Combinator’s W23 batch.

Positioned as the first of its kind in AI software development, Genie is trained on data that reflects the logic, workflow, and cognitive processes of human engineers.

This allows Genie to overcome the limitations of existing AI tools, which are often base models extended with additional functionality such as web browsers or code interpreters. Unlike these, Genie can tackle unforeseen problems, test solutions iteratively, and proceed logically, much like a human engineer.

Genie set a new standard on SWE-Bench, achieving a score of 30.08%, a 57% improvement over the previous best scores from Amazon Q and Factory’s Code Droid.
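A quick back-of-the-envelope check (assuming the 57% figure is a relative improvement rather than an absolute percentage-point gain) puts the previous best score at roughly 19%, in line with the high-teens results reported before Genie:

```python
# Back-of-the-envelope check of the "57% improvement" claim, assuming it is
# a relative gain (new / old - 1) rather than an absolute point difference.
genie_score = 30.08            # Genie's reported SWE-Bench score, in percent
claimed_relative_gain = 0.57   # the reported improvement over the prior best

implied_previous_best = genie_score / (1 + claimed_relative_gain)
print(f"Implied previous best score: {implied_previous_best:.2f}%")
# -> about 19.2%, consistent with the high-teens scores reported before Genie
```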

This milestone represents not only the highest score ever recorded, but also the largest single increase in the benchmark’s history. Genie’s improved thinking and planning capabilities extend beyond software development, making it a versatile tool for a variety of fields.

During its development, Genie was evaluated using SWE-Bench and HumanEval, with a strong focus on its ability to solve software development problems and retrieve the right code for tasks.

Genie achieved a score of 64.27% in retrieving the required lines of code, identifying 91,475 of 142,338 required lines. This represents significant progress, although Cosine acknowledges there is still room for improvement in this area.
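Those numbers are consistent with a simple line-level recall, i.e. the share of required lines that Genie actually retrieved; reading the metric this way is an assumption, since its exact definition isn’t spelled out here:

```python
# Reported line-retrieval figures; interpreting the score as line-level
# recall is an assumption.
retrieved_lines = 91_475
required_lines = 142_338

recall = retrieved_lines / required_lines
print(f"Line-retrieval recall: {recall:.2%}")  # -> "Line-retrieval recall: 64.27%"
```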

Developing Genie meant overcoming challenges around training models with limited context windows. Early experiments with smaller models made it clear that a long-context model was needed, which led to Genie being trained on billions of tokens. The training mix was carefully chosen to ensure mastery of the programming languages most relevant to users.
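To see why context size becomes the bottleneck, consider how little of even a mid-sized repository fits into a small context window. The figures below are purely illustrative: the 5 MB repository size and the four-characters-per-token ratio are rough assumptions, not measurements of Genie or its tokenizer:

```python
# Purely illustrative: how much of a repository fits in a given context window.
# Both numbers below are rough assumptions, not measurements of Genie.
repo_source_bytes = 5_000_000        # a mid-sized codebase, ~5 MB of source text
approx_chars_per_token = 4           # common rule-of-thumb tokenizer ratio

approx_repo_tokens = repo_source_bytes / approx_chars_per_token
for window_tokens in (8_192, 32_768, 128_000):
    share = window_tokens / approx_repo_tokens
    print(f"{window_tokens:>7}-token window fits ~{share:.1%} of the repository")
# Even generous windows hold only a slice of the code, which is why both long
# context and accurate retrieval of the right lines matter.
```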

Cosine’s approach to developing Genie also relied on self-improvement techniques: the model was exposed to imperfect scenarios and learned to correct its own mistakes, an iterative process that greatly strengthened Genie’s problem-solving skills.
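Cosine hasn’t published the mechanics of this process, but conceptually it resembles a generate-verify-correct loop: the model proposes a fix, a test harness reports failures, and the failed attempts are fed back so the model learns to recover from its own mistakes. The sketch below is purely illustrative; every function and class in it is a hypothetical placeholder, not Cosine’s actual pipeline:

```python
"""Illustrative generate-verify-correct loop; not Cosine's published pipeline.
propose_patch, run_tests, and the Trajectory record below are hypothetical
stand-ins for a code model, a test harness, and a training-data sink."""

from dataclasses import dataclass, field


@dataclass
class TestResult:
    passed: bool
    failure_log: str = ""


@dataclass
class Trajectory:
    issue: str
    attempts: list = field(default_factory=list)   # (patch, failure_log) pairs
    final_patch: str | None = None


def propose_patch(issue: str, prior_failures: list) -> str:
    # Placeholder: a real system would call a code model here, conditioning on
    # the issue text and on the failure logs of earlier attempts.
    return f"patch for {issue!r} (attempt {len(prior_failures) + 1})"


def run_tests(patch: str) -> TestResult:
    # Placeholder: a real system would apply the patch and run the test suite.
    return TestResult(passed="attempt 2" in patch, failure_log="tests failed")


def self_improvement_round(issue: str, max_attempts: int = 3) -> Trajectory:
    """Attempt a fix, feed failures back as context, and keep the whole
    trajectory (mistakes included) as potential fine-tuning data."""
    traj = Trajectory(issue=issue)
    for _ in range(max_attempts):
        patch = propose_patch(issue, traj.attempts)
        result = run_tests(patch)
        if result.passed:
            traj.final_patch = patch
            break
        traj.attempts.append((patch, result.failure_log))
    return traj


if __name__ == "__main__":
    t = self_improvement_round("fix off-by-one in pagination")
    print(f"succeeded: {t.final_patch!r} after {len(t.attempts)} failed attempt(s)")
```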

Looking ahead, Cosine plans to continue refining Genie and expanding its capabilities to more programming languages and frameworks. The company wants to develop smaller models for simpler tasks and larger ones for complex challenges, leveraging its unique dataset. Exciting future developments include fine-tuning Genie on specific codebases so it can understand large, legacy systems in less common languages.
