It’s been a while since GitHub released their take on the AI pair programmer. Let's see if Copilot is living up to its promise.
Will AI replace programmers? When GPT-3 from OpenAI came along, it was most visibly praised for writing human-like articles. At the same time, many programmers used GPT-3 to write HTML and CSS, or do things like program SQL database requests.
The hype for AI pair programming started gaining steam, and then GitHub’s Copilot came out. The heart of Copilot is an engine that predicts code. It’s based on OpenAI Codex (GPT-3’s kid with a special knack for programming), it's trained on public code from all of GitHub’s users, and it's controversial.
The Free Software Foundation even called it “unacceptable and unjust”, bringing to light many important issues. For example, if Copilot gives you code that’s exactly the same as someone else’s, aren’t you breaking copyright?
It’s a tricky, legally unregulated area. GitHub responded to criticism very positively, with a statement that they are “keen to engage in a discussion with developers on these topics and lead the industry in setting appropriate standards for training AI models”.
Copilot was released in July, the FSF’s criticism came in August, and now we’re nearing the end of 2021. How is Copilot doing, and has GitHub kept its promises? To find out, let’s go through several reviews published online by different tech experts.
GitHub Copilot’s Limitations 🤖
Copilot is paving the way towards AI programmers. There are at least ten alternatives, but for the time being, GitHub’s product is the most hyped up.
Nonetheless, Copilot has limitations:
- It comes as an add-on to only a few of the most popular code editors;
- It’s free, but there’s a waitlist to get access;
- It’s still a technical preview version, not a complete product.
For the time being, Copilot isn’t a great, well-rounded programmer in its own right. It’s more of a helper. It watches what you’re doing and suggests different options to push your coding along.
Sometimes it writes decent code for you. Other times it writes bad code:
“You absolutely need to review the code that Copilot generates. Treat it as though it was written by a green programmer intern who is good with Google searches but needs close supervision.” - Martin Heller, InfoWorld
The main promise of AI engines, like the one that powers Copilot, is that they get better with time. As users keep generating new data, Copilot will keep getting smarter.
At the moment, you can’t just unleash Copilot on your code and let it rip. Human review is still a critical element to ensure that AI-generated code is OK:
“That 'human in the loop' review stage is important to the responsible use of large language models because it's a way to catch a problem before [...] the code goes into production. Code licences are one issue when it comes to writing code, but AI-generated text could create all sorts of headaches, some embarrassing and some more serious.” - Mary Branscombe, ZDNet
Expert reviews of Copilot and AI pair programming 💬
Whenever there’s a lot of hype around something, it’s good to seek out level-headed people who can offer perspective on the subject.
Just by looking at the stats, it seems that Copilot is doing great. GitHub reportedly found on their platform that “about 30% of newly written code is being suggested by [...] Copilot”. Plus, half the developers who tried Copilot kept on using it regularly.
To further understand why the industry seems to like AI pair programming and GitHub Copilot, let’s see what tech experts are saying about it online.
“Feels quite magical to use” 🔮
Jeremy Howard, a founding researcher at fast.ai, sees Copilot as a blessing on one hand and a potential curse on the other.
He notes that Copilot isn’t the first AI-powered pair programmer but it’s the most powerful so far, because “it can generate entire multi-line functions and even documentation and tests, based on the full context of a file of code”.
“The code Copilot writes is not very good code”, however, “complaining about the quality of the code written by Copilot feels a bit like coming across a talking dog, and complaining about its diction. The fact that it’s talking at all is impressive enough!”.
Mr Howard explains that Copilot is kind of doomed to be inaccurate. The reason lies at the heart of how AI models like this work. Copilot doesn’t understand whether the code is good or not. But it has analyzed endless codebases similar to yours, so it can predict what your code should look like.
You’re going to have to edit Copilot’s code down, though: “Copilot’s code is verbose, and it’s so easy to generate lots of it that you’re likely to end up with a lot of code!”.
Copilot might be a good thing, but there’s a possibility that it could wreck your project with a sort of death by a thousand cuts: “For those for whom it’s a curse, they may not find that out for years, because the curse would be that they’re learning less, learning slower, increasing technical debt, and introducing subtle bugs – are all things that you might well not notice, particularly for newer developers”.
Ultimately, for Mr Howard, Copilot didn’t fulfil the promise of being an AI pair programmer right after launch (his review is from July). But it was still a huge step for code-generating language models. Plus, it was already visibly useful in some niche areas, like helping coders get comfortable with a new programming language.
If we want AI that writes code fully by itself, “we’ll need to go beyond just language models, to a more holistic solution that incorporates best practices around human-computer interaction, software engineering, testing, and many other disciplines”.
“It's not gonna take our jobs anytime soon” 🙅‍♀️
In a review from the Stack Overflow Podcast crew, we hear that Copilot is great for boilerplate code (repetitive, non-creative things you need to write to enable standard functionalities), but “if you start doing anything more complex than that, that's when you start to say, okay, you need to calm down Copilot”.
Another reviewer on the podcast thinks “it's actually very aptly named Copilot because you are supposed to be the pilot, and you're supposed to write most of the code”.
Throughout the episode, reviewers echo the sentiment that Copilot is great for those standard things that developers always have to Google. The small things that developers have written millions of times in the same way, so Copilot doesn’t have any issue knowing what the best option is.
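To make “boilerplate” concrete, here’s a hypothetical illustration (ours, not the podcast’s): the kind of small utility that developers have written in the same way countless times, which is exactly the territory where a completion model like Copilot tends to shine.

```python
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` elements.

    Classic boilerplate: it appears in thousands of codebases in nearly
    identical form, so a code-prediction model rarely gets it wrong.
    """
    return [items[i:i + size] for i in range(0, len(items), size)]


def flatten(nested):
    """Flatten one level of nesting -- another perennial Google search."""
    return [item for sub in nested for item in sub]
```

For routines like these, there is effectively one idiomatic answer, so the model’s statistical guess and the “correct” code coincide.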
One of the reviewers is convinced that Copilot is “a lot like the Gmail autocomplete”. Just like Gmail finishes phrases like 'All the best' for you, Copilot is great as an autocomplete for developers.
It seems that Copilot didn’t amaze the Stack Overflow Podcast crew with its capabilities, but it was definitely an impressive preview of the future of AI pair programming.
Just like the first one, this review is from July. So now let’s check out a fresh review from November 13, 2021.
“Productivity boost that will most likely create more jobs” 💪
Programmer / YouTuber sentdex notes that Copilot’s engine is much lighter than GPT-3 (12 billion parameters compared to 175 billion parameters), so it’s much faster to run.
GPT-3 had a 4 kB limit on its context. Copilot’s limit is 14 kB which, as sentdex shows, comes to about 400 lines of code. To be clear, Copilot still works on files bigger than that, but it can only take those 400 lines of code as its context for code prediction.
He goes on to show evidence that Copilot has some deeper understanding of the code, rather than just pattern-matching code lines against each other: in his tests, it predicted the outputs of functions with perfect accuracy.
Ultimately, sentdex sees Copilot as another layer of abstraction from code. He compares it to using Python instead of C++:
- with Python, you can build bigger things faster because you’re using bigger, ready-made blocks;
- with C++, you have to specify every tiny detail in your code and build your own blocks, so it takes longer to build big things.
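A small illustration of what “bigger, ready-made blocks” means in practice (our example, not sentdex’s): counting word frequencies takes a few lines in Python because the standard library supplies the blocks, whereas in C++ you would hand-roll the tokenization loop and the hash-map bookkeeping yourself.

```python
from collections import Counter

def word_frequencies(text):
    # str.split handles tokenization, Counter handles the tallying.
    # These two calls hide the token loop and hash-map updates that
    # a C++ version would have to spell out explicitly.
    return Counter(text.lower().split())
```

In sentdex’s framing, Copilot adds one more such layer: instead of assembling library calls yourself, you describe the intent and let the model assemble them.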
This makes Copilot a great productivity boost. Sentdex has a refreshing perspective, saying that tools like Copilot won’t kick programmers out of their jobs. Instead, AI pair programmers will be a way for software developers to build more things, get more experience, and increase their value on the market.
Another highlight of sentdex’s review is that Copilot fosters fast learning. When he wanted to check out a new code library, he just wrote a prompt and Copilot guided him through using that library. It even showed him methods he didn’t know existed.
Normally developers need to go through pages of documentation and several Stack Overflow threads when starting to use a new tool. Having an AI friend give you a quickstart guide instead of doing all of that research has to be very convenient.
When it comes to limitations, sentdex noticed that Copilot makes mistakes. It doesn’t always understand your intention. Just like with Googling, writing prompts that get Copilot to do exactly what you want is a skill that takes time to learn.
Ultimately, sentdex admits that he generally doesn’t like tools like this so he didn’t expect to like Copilot. However, he was very positively surprised and found Copilot very useful, and perfect for increasing productivity as a software developer.
Decent assistant, long way from replacement 🚧
The general consensus seems to be that Copilot is a great tool that can enhance your productivity, but it won’t solve all of your problems for you, and you’ll probably need to edit Copilot’s code quite a bit.
But, if you’re a software developer who's willing to take advice from a robot assistant, you might just:
- learn new things quicker than before,
- enhance your productivity,
- increase your value on the software development market.
Copilot, along with its biggest competitors, opened the door to mass adoption of AI programming assistants. It’s unlikely that these systems will make programmers obsolete. Plus, this technology is still limited. To replace programmers, we would need powerful Artificial General Intelligence (AGI), meaning systems that think like humans.
We’re still far from developing technology like this. And, if almost all of science fiction is to be believed, the development of AGI would cause problems much bigger than unemployment among programmers, leaving even fewer reasons to worry about it.