Software plays an essential role in modern systems across all industries. However, the development, maintenance and management of software are expensive and laborious, says Professor Eldan Cohen (MIE).
Cohen is leading a team of researchers to develop novel, human-centred machine learning algorithms for source code summarization with support from the Connaught New Researcher Award.
Source code summarization is the process of automatically summarizing a snippet of code into clear and concise language.
While well-documented source code can significantly reduce the cost of maintenance, manually documenting and summarizing code is tedious and time-consuming, which often results in poorly documented code.
These summaries are meant to capture the purpose of code, helping developers understand, maintain and work with the codebase. Code summaries are particularly important in large software development projects, and generating them automatically involves both natural language processing and machine learning.
In recent years, there has been significant research into using artificial intelligence to develop automated source code summarization tools that can generate natural language summaries of code.
“Yet even state-of-the-art deep learning models are prone to mistakes in prediction, yielding summaries that do not match the provided source code. In such cases, software developers must reject the proposed summary and resort to manually documenting the code,” says Cohen.
To address this challenge, he recommends developing a human-in-the-loop technique for automated code summarization that considers the developer’s knowledge, preferences and insight to overcome and learn from model mistakes. He is developing specialized machine learning algorithms designed to overcome the limitations of existing approaches, which tend to produce candidate summaries with limited diversity or lower quality.
“We plan on doing this by creating interactive approaches where developers are presented with a small number of diverse and high-quality code summaries to choose from, reducing the risk of generating a single, incorrect summary,” says Cohen.
Human-in-the-loop code summarization lets developers take an active part in generating code summaries with machine learning algorithms, integrating human insight into the otherwise automated workflow.
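To make the idea of offering a few diverse, high-quality candidates concrete, here is a minimal Python sketch. It is not the team's algorithm, which the article does not describe. It swaps in a generic greedy heuristic (maximal marginal relevance) and assumes a summarization model has already produced candidate summaries with quality scores between 0 and 1. The function names, the `trade_off` parameter and the example scores are all hypothetical.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity: 0 means no shared words, 1 means identical word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_diverse(
    candidates: List[Tuple[str, float]], k: int = 3, trade_off: float = 0.7
) -> List[str]:
    """Greedily pick up to k summaries, rewarding a high model score
    and penalizing similarity to summaries that were already picked.

    candidates: (summary text, assumed model score in [0, 1]) pairs.
    trade_off:  weight on quality; (1 - trade_off) is the weight on diversity.
    """
    remaining = list(candidates)
    chosen: List[str] = []

    def marginal_value(item: Tuple[str, float]) -> float:
        text, score = item
        # Redundancy is the similarity to the closest summary chosen so far.
        redundancy = max((jaccard(text, c) for c in chosen), default=0.0)
        return trade_off * score - (1 - trade_off) * redundancy

    while remaining and len(chosen) < k:
        best = max(remaining, key=marginal_value)
        chosen.append(best[0])
        remaining.remove(best)
    return chosen


# Made-up candidates for a function that returns the maximum of a list.
example = [
    ("Returns the largest value in a list", 0.90),
    ("Returns the largest value in the list", 0.88),
    ("Finds the maximum element of a sequence", 0.80),
    ("Sorts a list", 0.20),
]

# The developer would be shown these options and asked to pick one.
options = select_diverse(example, k=2)
```

With these illustrative scores, the near-duplicate second candidate is skipped in favour of a differently worded one, so the developer sees two distinct options instead of two copies of the same sentence.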
The long-term goal of this work is to significantly improve the effectiveness of automatic source code summarization. By developing these human-in-the-loop approaches, the team hopes to incorporate developer input into state-of-the-art deep learning models to improve the quality of generated code summaries.
The approach is expected to have significant scholarly impact with the potential to catalyze both research and commercial activity on human-in-the-loop automation in software engineering.
Cohen is one of 49 researchers from across U of T, and one of four from U of T Engineering, supported in the latest round of the Connaught New Researcher Awards, which help early-career faculty members establish their research programs.
“Students are involved in all stages of this project and are actively involved in developing and evaluating the novel human-in-the-loop techniques for automatic source code summarization,” says Cohen. “The funds from this award will primarily go to supporting their research.”
The other three projects from U of T Engineering supported by the Connaught New Researcher Awards are:
- Margaret Chapman (ECE) – Risk-aware, adaptive and scalable algorithms for smart sewer technology in Toronto
- Christopher Lawson (ChemE) – Engineering untapped anaerobic bacteria for sustainable fuel and chemical production
- Jay Werber (ChemE) – Ultra-thin bipolar membranes for carbon dioxide removal applications