Artificial General Intelligence obsoletes Software Reverse/Re-Engineering research as we know it!?

This week I am going to sit on a panel on the broader topic of “the future of software re-engineering”, as part of WSRE 2023 -- a German workshop on software (reverse and) re-engineering. I decided to label my “impulse presentation” with the dramatic and provocative (?) title “Artificial General Intelligence obsoletes software reverse/re-engineering research as we know it!?”.

In preparing for this panel, I actually limited myself to language models such as ChatGPT and didn't really think much about AGI. So let's continue with a more appropriate title:

Language models support software reverse/re-engineering research!

This claim can be substantiated with the help of a few ChatGPT chats that I designed, executed, and interpreted. The annotated logs of the chats and my slides used at WSRE 2023 are available online. The chats are annotated to capture my “expectations” regarding my questions to the AI and my “judgments” regarding the answers by the AI.

Short summary of the chats

  1. Let the AI propose a methodology for a relatively specific research scenario in the context of software reverse engineering: some sort of architecture recovery for callbacks, to be evaluated with a methodology as in Mining Software Repositories (MSR) research. Advance the communication to determine the AI's ability to support automation of certain steps of the methodology.
  2. Let the AI propose a structured implementation (approach) of a relatively specific scenario in the context of software re-engineering: some sort of language migration (Cobol to Java). Advance the communication to determine the AI's ability to generate some central parts of the implementation meeting a technology (ANTLR) and design choice (grammar-to-grammar mapping).
  3. Let the AI propose a list of research challenges in software re- and reverse engineering while also demanding eventual specialization of a selected challenge (here: code deobfuscation) down to the level of code-based illustration and references to suitable literature. 
  4. Let the AI reflect on an AI's capabilities to carry out research on software re- and reverse engineering more or less autonomously. We want to see what limitations the AI identifies for itself and how it assesses the need for cooperation with human counterparts.
These chats revealed (or clarified or documented) certain ChatGPT capabilities and limitations, which we discuss next.
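To illustrate the kind of translation discussed in the second chat, here is a minimal sketch of a rule-by-rule (grammar-to-grammar style) mapping from Cobol to Java. The two rules and the regex-based matching are my own simplification, not ChatGPT's output; a real migration would operate on parse trees, e.g., produced by ANTLR-generated parsers.

```python
import re

# Toy sketch: map two Cobol statement patterns onto Java statements,
# in the spirit of a rule-by-rule (grammar-to-grammar) translation.
# A real migration would work on a parse tree, not on regexes.
RULES = [
    # MOVE <literal> TO <var>.  ->  <var> = <literal>;
    (re.compile(r"MOVE\s+(\S+)\s+TO\s+(\S+)\."), r"\2 = \1;"),
    # DISPLAY <expr>.  ->  System.out.println(<expr>);
    (re.compile(r"DISPLAY\s+(.+)\."), r"System.out.println(\1);"),
]

def migrate(cobol_line: str) -> str:
    """Translate a single Cobol statement to Java, if a rule matches."""
    stmt = cobol_line.strip()
    for pattern, template in RULES:
        match = pattern.fullmatch(stmt)
        if match:
            return match.expand(template)
    return "// TODO: no rule for: " + stmt

print(migrate('MOVE 42 TO WS-COUNT.'))    # WS-COUNT = 42;
print(migrate('DISPLAY "HELLO".'))        # System.out.println("HELLO");
```

Even this toy version hints at why the chat needed several iterations: the interesting part of a migration is not the easy one-to-one statements but exactly the constructs that such simple rules cannot capture.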

Identified ChatGPT capabilities

  1. The attempt to receive help from the AI on the matter of a research methodology in a specific software reverse engineering context (i.e., extraction of callbacks in Java code) is relatively successful insofar as the AI synthesizes a general methodology at the start and refines the methodology subsequently, as more parameters are inserted into the conversation.
  2. The attempt to receive help from the AI on the matter of implementing a service or product in a specific software re-engineering context (i.e., code migration from Cobol to Java) is relatively successful insofar as the AI synthesizes a general outline at the start and produces — after some trial and error — detailed code illustrations for an interesting technical part.
  3. The AI can synthesize meaningful content on the relatively abstract topic of “research challenges” in the field at hand and leverages input from the human counterpart to make its claims about (current, concrete) challenges interestingly more specific.
  4. The AI can synthesize meaningful content on the rather abstract topic (if not “meta-topic”) of “AI abilities” for carrying out research in the field at hand and engages in a deeper discussion of each challenge (e.g., NLP or domain expertise) such that a collaboration between AI and human counterpart is sketched.
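As a side note on the first capability, the notion of “extracting callbacks in Java code” can be illustrated with a toy heuristic. The sketch below is my own assumption (a line-based regex over Java source text), not the methodology from the chat; an actual study would rely on proper parsing.

```python
import re

# Toy heuristic: flag lines in Java source that look like callback
# registrations, i.e., calls to add*Listener(...) or set*Callback(...).
# A real extraction would use a Java parser rather than a regex.
CALLBACK_CALL = re.compile(r"\.\s*(add\w*Listener|set\w*Callback)\s*\(")

def find_callback_registrations(java_source: str) -> list[tuple[int, str]]:
    hits = []
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        if CALLBACK_CALL.search(line):
            hits.append((lineno, line.strip()))
    return hits

sample = """
button.addActionListener(e -> model.update());
int x = 0;
camera.setPreviewCallback(handler);
"""
for lineno, line in find_callback_registrations(sample):
    print(lineno, line)
```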


Identified ChatGPT limitations

  1. The AI, at the current stage, cannot execute the typical empirical methodology in the field, where it would need to shortlist open source repositories. That's because the AI cannot search existing online repos in any systematic manner and the available language model hasn't indexed existing online repos in a general enough manner. This can clearly be improved. The AI does understand low-level criteria for inclusion/exclusion of repos; those criteria would need to be "executed" by the AI.
  2. The AI, at the current stage, maps NL structures to code structures only unreliably. We faced non-trivial code structures in the software re-engineering context. In the recorded chat, we went through 4-5 iterations to get a code example demonstrating a certain style of translation (migration). The intermediate attempts were largely off the mark. While the AI claimed compliance with our requirements at the NL level, corresponding structures were missing or wrong in the code.
  3. The AI, at the current stage, has no usable means of citation. No matter how much one pushes the AI, one will not get reliable citations (bibliographic information). Papers are listed and links are provided, but the corresponding bibliographic information and the links are more or less "made up". (When informed about bibliographic misinformation, the AI may change the bibliographic information a bit, without ever arriving at authentic references.)
  4. The AI, at the current stage, appears to be slightly “obsessed” with belittling itself in terms of its abilities to carry out research in the field at hand. While the AI initially confirms a certain coverage of each and every relevant ability, it later belittles those abilities, essentially narrowing its view down to the position that “cooperation between AI and human counterpart” is the way to go. The AI does not allow itself to suggest that its abilities would be drastically improved (up to the point of potentially ruining the cooperation between AI and human counterpart) if it could meaningfully interact with the real world.
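The “execution” of inclusion/exclusion criteria mentioned in the first limitation can be sketched as a simple filter over repository metadata. The concrete thresholds and criteria below are placeholders of my own choosing; the field names follow GitHub's repository API, from which such metadata would be fetched in practice.

```python
# Sketch of "executing" inclusion/exclusion criteria for repository
# shortlisting. The thresholds (stars, language, archived status) are
# made-up placeholders; in practice the metadata would come from the
# GitHub search API, whereas here we filter an already-fetched list.
def shortlist(repos, min_stars=100, language="Java"):
    selected = []
    for repo in repos:
        if repo.get("archived"):                          # exclusion: archived repos
            continue
        if repo.get("language") != language:              # inclusion: target language
            continue
        if repo.get("stargazers_count", 0) < min_stars:   # inclusion: popularity
            continue
        selected.append(repo["full_name"])
    return selected

candidates = [
    {"full_name": "a/ui",  "language": "Java",  "stargazers_count": 500, "archived": False},
    {"full_name": "b/old", "language": "Java",  "stargazers_count": 900, "archived": True},
    {"full_name": "c/lib", "language": "Cobol", "stargazers_count": 300, "archived": False},
]
print(shortlist(candidates))  # ['a/ui']
```

The point of the limitation is precisely that the AI understands criteria like these at the NL level, but cannot run such a filter against the actual universe of online repositories.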


TL;DR

The AI is getting close to automating some routine steps of software reverse/re-engineering research. What’s missing at this stage is the connection from the AI to the real world — such as the AI being able to search GitHub repos or to run certain scripts on specific repositories. The AI is already useful in informing methodology design, including the initial definition of research questions (challenges). What’s missing at this stage for proper use in actual research is a more rigorous treatment of sources such as existing scholarly papers. In particular, the AI would need to be able to properly cite its sources and to work with proper bibliographic information.


Regards

Ralf 


PS: I also would like to capture my self-imposed principles of interrogation:

  1. It is not an interrogation, in fact. Rather, I want to talk to the AI in a fair and friendly manner, as if I were conversing with peers or students.
  2. I also aimed for clear and comprehensive questions, thereby reducing the risk of getting suboptimal answers due to ambiguity or implicit assumptions.
  3. I should go into each initial and follow-up question with a clear expectation. I documented some of these expectations in the recorded chats.
  4. I should also analyze each answer (response) from the AI to infer a judgment. I documented some of these judgments in the recorded chats.


PS2: A note on language models versus knowledge models:

Let me close my post by reminding us that ChatGPT and Bard are “just” language models. They are good at synthesizing “human-sounding text”, but they are not directly meant to give us search results or genuine text with clear provenance data. That’s to be taken into account before even talking about the AI also being able to interact with the real world.

I quote from a very recent report on a Google all-hands meeting on 2023-03-02:

Jack Krawczyk, the product lead for Bard, made his all-hands debut on Thursday, and answered the following question from Dory, which was viewed by CNBC.

“Bard and ChatGPT are large language models, not knowledge models. They are great at generating human-sounding text, they are not good at ensuring their text is fact-based. Why do we think the big first application should be Search, which at its heart is about finding true information?”

Krawczyk responded by immediately saying, “I just want to be very clear: Bard is not search.”

