Mining information from developer chats towards building software maintenance tools

Date
2021
Publisher
University of Delaware
Abstract
Software developers are increasingly having conversations about software development via online chat services. Many of those chat communications contain valuable information, such as descriptions of code snippets and APIs, opinions on good programming practices, and causes of common errors and exceptions. Researchers have demonstrated that various software engineering tasks (e.g., recommending mentors in software projects, aiding API learning) can be supported by mining similar information from other developer communications such as email, bug reports, and Q&A forums. However, limited work has focused on investigating the availability and mining of information from developer chats.

To successfully mine developer chat communications, one has to address several challenges unique to chats. Chat community content is transient by nature. Developer chats are typically informal conversations, with rapid exchanges of messages between two or more developers, where several clarifying questions and answers are often communicated in short bursts. Chats thus contain shorter, quicker responses, often interleaved with non-information-providing messages. As a result, it is difficult to find relevant information in a large chat history, and important advice is lost over time. My thesis is that software developers' communications through online chat forums are a valuable resource, and that useful knowledge can be automatically mined from this resource to improve existing tools and build new ones that support software engineers.

The focus of this dissertation is mining information from developer chats towards building software maintenance tools: (1) As a first step towards mining, we investigated the availability of information in chats through an empirical study. We also analyzed characteristics of chat conversations that might inhibit accurate automated analysis. (2) Next, we extended an existing algorithm to automatically extract (or disentangle) conversations for analysis by researchers or automatic mining tools. (3) Assessing the quality of information is essential for extracting useful information for building effective software maintenance tools. Hence, we designed an automatic technique to identify post hoc quality conversations, i.e., conversations containing useful information for mining or reading after the conversation has ended. (4) Finally, we studied the use of online chat platforms as a resource for collecting developer opinions that could potentially help in building opinion Q&A systems, as a specialized instance of virtual assistants and chatbots for software engineers. We developed techniques for automatic identification of opinion-asking questions and extraction of participants’ answers from public online developer chats.

This dissertation takes a significant step towards opening new research directions on mining developer chats, and enables advances in areas including information retrieval from unstructured communications, enriching existing knowledge bases and community knowledge, efficient information gathering to improve code efficiency and increase developer productivity, and building or enhancing recommendation systems and virtual assistants for software engineering.
Keywords
Empirical study, Information extraction, Mining software repositories, Software developer chats, Software engineering, Software maintenance