So there has been an announcement that has shocked the world and gone pretty viral: Devin, the first AI software engineer. Apparently he's pretty good. Apparently he's good enough to pass actual engineering interviews. Just imagine you're conducting the interview, you're the interviewer, and then this goofy-ass AI named Devin joins the chat and wants to do the interview anyway. According to the announcement, he can also do real jobs on Upwork. In this article, together with you, I want to find out what's behind the hype. Is Devin actually good enough to replace us all, and is it time for us to discuss alternative career options with the boys?

Now, the company behind Devin is Cognition Labs, and listen, man: I don't know about you, but I had never heard of that company before this announcement. They seem like a super new thing, having joined Twitter as late as January 2024. Even the content on their YouTube channel is at most a couple of hours old. To me, it seems like they just locked themselves in a basement for years with no communication to the outside world, just to launch this thing into the world without elaborating. And you've got to give it to them, it worked. They've already secured $21 million in funding, led by Founders Fund, which is a San Francisco-based fund with a fancy website, so you know it must be good.

What's especially impressive, at least to me, is Devin's claimed performance on the SWE-bench benchmark. If you don't know what that is, like me before, it's a way to evaluate large language models on real-world issues collected from open source repositories on GitHub. The model gets the codebase of some open source project, gets an issue, and is then tasked with solving that issue.
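To make that a bit more concrete, here is a minimal sketch of how a benchmark like this could be scored. It assumes an all-or-nothing rule, where a task only counts as solved if every one of its unit tests passes. The names `TaskResult`, `score_task`, and `resolved_rate` are made up for illustration and are not taken from the actual benchmark code.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical structure, for illustration)."""
    issue_id: str
    test_outcomes: list[bool]  # one entry per unit test, True means it passed


def score_task(result: TaskResult) -> int:
    """Return 1 only if every unit test passed, otherwise 0."""
    # Assumption: a task with no tests counts as unsolved,
    # because nothing proves the fix actually works.
    if not result.test_outcomes:
        return 0
    return 1 if all(result.test_outcomes) else 0


def resolved_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks that were fully solved."""
    if not results:
        return 0.0
    return sum(score_task(r) for r in results) / len(results)


if __name__ == "__main__":
    results = [
        TaskResult("issue-1", [True, True, True]),   # solved
        TaskResult("issue-2", [True, False, True]),  # one failing test, so 0
        TaskResult("issue-3", [False]),              # 0
        TaskResult("issue-4", [True, True]),         # solved
    ]
    print(f"resolved: {resolved_rate(results):.1%}")  # prints "resolved: 50.0%"
```

The thing to notice is that there is no partial credit in this sketch: a fix that passes 99 out of 100 tests scores exactly the same as one that passes none.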

The model has to generate a fix for the problem in the code, and it all boils down to writing code that passes unit tests. If all unit tests run successfully, the model gets a score of one. If the code written by the model fails a unit test at any point, it gets a score of zero. On this benchmark, Devin can solve about three times as many problems as any other model, unassisted by the way, with no human intervention. In total, about 13.8% of the problems it gets can be solved with no human help. At first, 13.8% might not seem like a lot, and it still needs a lot of help. But just remember where we were a year ago, when Midjourney wasn't even able to create hands in the images it generates, and where we are now, when that kind of works perfectly fine.

And dude, who could have been happier about this announcement than software engineers? They see the beautiful opportunities and endless possibilities that come with Devin. They see him as the gift to the community that he is, always friendly and ready to help with your coding problems. Hey, wait a minute, people are pissed off. Well, who could have thought that people don't love the technology that could one day replace them? And I say could, not will. Let's take a look at the demos they posted that show what Devin can do, and you be the judge.

So here's the entire announcement from Cognition Labs introducing Devin. There are some parts in here we've already talked about, like the job interviews it has done or the Upwork work. What we haven't talked about yet is that it has a built-in shell, code editor, and web browser. ChatGPT already had the capability to execute Python, and Devin takes that a lot further.

But much more interesting is the demo of Devin. What can it do, and what does it look like to interact with it? On the left-hand side we have the chat interface, where we can ask questions or post code, for example. On the right side we can see four things: the shell, the browser, the editor, and the planner, which is kind of like an internal list for Devin of tasks that it goes through in the code editing process. By the way, that can take up to a couple of minutes, but all things considered, that's still not super long.
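To get a feel for the planner idea, here is a toy sketch of a task list that gets worked through one step at a time, with each step running as a shell command. This is purely my own illustration and says nothing about how Devin is actually built. `Step` and `run_plan` are hypothetical names, and the rule of stopping at the first failing step is an assumption.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    """One entry in the plan (hypothetical structure, for illustration)."""
    description: str
    command: list[str]  # the shell command to run for this step
    done: bool = False


def run_plan(steps: list[Step]) -> bool:
    """Run the steps in order; stop at the first one that fails.

    Returns True only if every step completed successfully.
    """
    for step in steps:
        completed = subprocess.run(step.command, capture_output=True, text=True)
        if completed.returncode != 0:
            # Assumption: a failed step halts the plan instead of being retried.
            print(f"[failed] {step.description}: {completed.stderr.strip()}")
            return False
        step.done = True
        print(f"[done]   {step.description}")
    return True


if __name__ == "__main__":
    plan = [
        Step("Check the Python version", [sys.executable, "--version"]),
        Step("Run a tiny script", [sys.executable, "-c", "print('hello')"]),
    ]
    run_plan(plan)
```

A real agent would presumably also revise the plan based on what each step prints, which this sketch leaves out entirely.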

It looks like we can already ask Devin something like, "Hello, Devin." The problem is that it doesn't actually work. It's still behind the waitlist, so we can't actually prove that Devin is good or bad or whatever, because it's not publicly accessible yet.

But what we can tell from the demo is that this right here is the planner, where Devin has a list of tasks that it goes through sequentially: research the API documentation for Replicate, write a Python script, implement response time measurement, and so on. It can execute the code in the shell, and by doing that it can solve code problems one at a time. As you can see right here in the bottom right corner, we can also give it custom environment variables that it can use to interact with the API, so it's not limited to free-to-use APIs at all. It even has the documentation open for the websites it's writing code against. We can also interact with Devin during the whole thing and ask it more questions. That's pretty cool. As a first demo, that's really neat.

But they also posted four specific videos of what it can do. The first demo is this one:

Devin can learn how to use unfamiliar technologies. This is an image generator that produces images with hidden text inside of them. This one says, I guess, "Sarah", for example, but it's kind of hidden in the image. Basically, she asked Devin to build something like it, and it did. That's pretty impressive.

A second example is Devin's ability to contribute to mature production repositories in real-world enterprise scenarios. This is likely the best one, because Devin can learn the repository, then answer questions from senior engineers and junior engineers about it, and also kind of help with the codebase. Rarely are you going to have the case where Devin independently makes pull requests to your repo and adds features, at least right now. That is nowhere near realistic. It will even ask us right here to please provide the GitHub username and password to push the changes to the repository. I cannot think of anything that might go wrong with that.

The third example is that Devin can train and fine-tune its own AI models. That's cool, but I don't think it's too interesting, so I'm not going to go too deep into it. Essentially, maybe Devin can one day create a better Devin, and it's going to be Devin-ception.

The fourth one is that they even tried giving Devin real jobs on Upwork, and it could do those too. For me, this is one of the most impressive, but I don't know how cherry-picked this example is. For all I know, this could be the one thing it does well, right? We can't tell as long as it's not publicly accessible. But just the idea of it doing Upwork stuff seems pretty impressive.

But also, I think Jack Harington said it very well: the history of AI thus far has been making outrageous promises and then being entirely underwhelming in reality, and that is setting the bar at "close enough is good enough", which is not production ready on any job I've had. And you know what? I 100% agree. At my previous job, at my current job, no way AI would be insanely helpful. Maybe Devin will be. I even commented here that it's absolute ass for anything software related, and by that I mean ChatGPT mostly. Any AI coding tool that I've tried is not good for anything that's not super obvious. Very basic algorithmic stuff they can do.

But also, Devin can code Chrome extensions. It's not publicly accessible yet, but you can ask the creators stuff. For example, this guy asked if it could build a Chrome extension that they had built, and then the creators, Cognition Labs, sent a video of Devin doing just that. Now, is this cherry-picked? Again, very hard to tell. You can kind of see the process here, even though the resolution is terrible (I think that's because of Twitter), but you can see Devin go through the process of creating that Chrome extension.

Now, this entire thing is of course not without criticism, especially from software developers: "Meet Devin, the world's first AI developer. We've raised 21 million but can't use Devin to make a web app for onboarding, so we used Google Forms instead." Like, bro is just going pretty hard on Google Forms. There's nothing wrong with Google Forms. They're pretty solid, to be honest. But you know, understandable criticism, whatever.

What I found funny is that if you head over to the Cognition Labs website, they have three open positions: a machine learning researcher, a general application, and also a software engineer. So maybe Devin is not that good at replacing internal people yet. Apparently you still need humans and can't just offload all the work to an AI, at least not yet.

Anyway, just like with previous large language models, I think the main purpose is helping us software developers find solutions faster and being helpful in very specific coding scenarios. Calculators didn't change the need for human mathematicians. They just kind of shifted the focus of human work to be more high level. For software engineers, that could mean more application planning and software design rather than actual coding. But again, with the current state of AI, even with Devin and its 13.8% unassisted, we are far from any kind of replacement. I guess that's just my two cents, though. You're totally free to disagree with me. If you do, share your thoughts down in the comments.
