Originally posted on devops.
AI is buzzing again thanks to the recent release of ChatGPT, a natural language chatbot that people are using to write emails, poems, song lyrics and college essays. Early adopters have even used it to write Python code, as well as to reverse engineer shellcode and rewrite it in C. ChatGPT has sparked hope among people eager for the arrival of practical applications of AI, but it also raises the question of whether it will displace writers and developers in the same way robots and computers have replaced some cashiers, assembly-line workers and, perhaps in the future, taxi drivers.
It’s hard to say how sophisticated the AI text-creation capabilities will become as the technology ingests more and more examples of our online writing. But I see it having very limited capabilities for programming. If anything, it could end up being just another tool in the developer’s toolkit, handling tasks that don’t require the critical thinking skills software engineers bring to the table.
ChatGPT has impressed a lot of people because it does a good job of simulating human conversation and sounding knowledgeable. Developed by OpenAI, the creator of the popular text-to-image AI engine DALL-E, it is powered by a large language model trained on voluminous amounts of text scraped from the internet, including code repositories. It uses algorithms to analyze the text and humans fine-tune the training of the system to respond to user questions with full sentences that sound like they were written by a human.
But ChatGPT has flaws, and the same limitations that hamper its use for writing content also render it unreliable for creating code. Because it’s based on data, not human intelligence, its sentences can sound coherent but fail to provide critically informed responses. It also repurposes offensive content like hate speech. Answers may sound reasonable but can be highly inaccurate. For example, when asked which of two numbers, 1,000 and 1,062, was larger, ChatGPT will confidently deliver a fully reasoned answer that 1,000 is larger.
OpenAI’s website provides an example of using ChatGPT to help debug code. But its responses are generated from prior code and cannot replicate human-driven QA, which means it can produce code containing errors and bugs. OpenAI has acknowledged that ChatGPT “sometimes writes plausible-sounding but incorrect or nonsensical answers.” This is why it should not be used directly in the production of any programs.
The lack of reliability is already creating problems for the developer community. Stack Overflow, a question-and-answer website coders use to write and troubleshoot code, temporarily banned its use, saying there was such a huge volume of responses generated by ChatGPT that it couldn’t keep up with quality control, which is done by humans. As the site’s moderators put it: “Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers.”
Coding errors aside, because ChatGPT, like all machine learning tools, is trained on data suited to its intended output (in this case, text), it lacks the ability to understand the human context of computing needed to do programming well. Software engineers need to understand the intended purpose of the software they’re creating and the people who will be using it. Good software can’t be built by cobbling together programs with regurgitated code.
For example, ChatGPT cannot resolve the ambiguity in simple requirements. A person knows immediately that if one ball bounces once and comes back while another bounces and then bounces again, the second ball has traveled farther. ChatGPT struggles with this kind of nuance, and grasping it will be essential if these systems are ever to take over from developers.
It also has trouble with basic math, such as when it’s asked to determine which is greater between a negative and a positive number. ChatGPT confidently delivers a well-worded explanation, but it cannot grasp that -5 is less than 4. Imagine your thermostat going haywire because the heating kicks on at 40 degrees Celsius instead of at -5 degrees Celsius because an AI program coded it that way!
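To make that failure concrete, here is a minimal, hypothetical Python sketch (the function names and setpoint are illustrative inventions, not real thermostat firmware) of the kind of flipped-comparison bug that pattern-matched code generation could plausibly introduce:

```python
# Hypothetical illustration: a thermostat controller where the comparison
# operator is reversed, the sort of sign/ordering mistake an AI code
# generator that doesn't grasp that -5 < 4 might produce.

HEAT_SETPOINT_C = -5.0  # heating should engage BELOW this temperature


def should_heat_buggy(current_temp_c: float) -> bool:
    # Bug: the comparison is flipped, so heating engages ABOVE the
    # setpoint. It kicks on at 40 degrees C and stays off at -10.
    return current_temp_c > HEAT_SETPOINT_C


def should_heat_fixed(current_temp_c: float) -> bool:
    # Correct: heat only when the current temperature is below the setpoint.
    return current_temp_c < HEAT_SETPOINT_C


print(should_heat_buggy(40.0))   # True: heat blasting on a hot day
print(should_heat_fixed(40.0))   # False
print(should_heat_fixed(-10.0))  # True
```

A human reviewer spots this bug in seconds precisely because they understand what a thermostat is for; a model that only pattern-matches on prior code may not.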
Pre-trained AI code generation also raises some legal questions with regard to intellectual property rights; it cannot currently distinguish between code that is licensed in a restrictive or open fashion. This could expose people to licensing compliance risk if the AI borrows a prewritten line of code from a copyrighted repository. The problem has already prompted a class action lawsuit against a different OpenAI-based product called GitHub Copilot.
We need humans to create the software people rely on, but that’s not to say there couldn’t be a place for AI in software development. Just like automation is being used by security operations centers for scanning, monitoring and basic incident response, AI could serve as a programming tool for handling lower-level tasks.
This is already happening, to an extent. GitHub Copilot lets developers use OpenAI’s models to improve their code, add tests and find bugs. Amazon offers CodeWhisperer, a machine learning-powered tool designed to help increase developer productivity by generating code recommendations from natural language comments and existing code in the integrated development environment (IDE). And someone has created a Visual Studio Code extension that works with ChatGPT.
And at least one company is testing AI built for developers. DeepMind, which shares a parent company with Google, released its own code generation tool, dubbed AlphaCode, earlier this year. DeepMind published the results from simulated evaluations in competitions on the Codeforces platform in Science magazine earlier this month under the headline “Machine learning systems can program too.” Headline grammar aside, AlphaCode achieved an estimated rank within the top 54% of participants by solving problems “that require a combination of critical thinking, logic, algorithms, coding, and natural language understanding.” The abstract for the paper says: “The development of such coding platforms could have a huge impact on programmers’ productivity. It may even change the culture of programming by shifting human work to formulating problems, with machine learning being … responsible for generating and executing codes.”
Machine learning systems are becoming more advanced every day; however, they cannot think the way the human brain does. This has been the case for the past 40-plus years of study into artificial intelligence. While these systems can recognize patterns and increase productivity on simple tasks, they do not always produce code as well as humans do. Before we let computers generate code en masse, we should probably expect systems like AlphaCode to rank far higher, say within the top quarter of participants on a platform like Codeforces, though I suspect that may be too much to ask of such a system. In the meantime, machine learning can help with simple programming problems, freeing the developers of tomorrow to focus on more complex issues.
At this point, ChatGPT won’t be disrupting any field of technology, especially not software engineering. Concern about robots displacing programmers is vastly overstated. There will always be tasks that developers with human cognition can do that machines will never be capable of.