STATISTICAL ANALYSIS OF LLVM IR COMPILATION
Date
2024-05
Publisher
University of Delaware
Abstract
LLVM has become an integral part of many compilation pipelines, in both closed-source
and open-source compilers across industry and academia. Since the open-source
LLVM project was started by Chris Lattner and Vikram Adve, it has proven to be a
versatile and efficient language representation that can be used in multiple
product environments across multiple system architectures. Because LLVM is
mature and in frequent use, developers are constantly changing it.
Users of LLVM expect robust functionality and efficiency in the compilation
pipeline without needing to write source code to take advantage of specific parts of
the pipeline. More specifically, users expect compiler optimization to preserve the
functionality of code while reducing its execution time. If compilation takes
a significant amount of time to complete, it becomes a notable bottleneck in the
development of the source code. Lowering the overall time of the LLVM pipeline
therefore requires modifying the parts of the optimization pipeline that account
for a large share of compilation time.
As such, identifying codes that trigger large compilation times in parts of the
optimization pipeline can yield insight into which parts of the pipeline are contributing
most to the compilation time. Furthermore, by considering only the LLVM Intermediate
Representation (IR) taken from a given source code, insights can be obtained
that apply to several other cases of source code with LLVM IR representations (generated
by the compiler frontend). Thus, analyzing LLVM IR and the length of time
it takes to compile can provide a straightforward way of suggesting which portions of
the LLVM optimization pipeline (invoked using opt) are responsible for unexpectedly
large compilation times.