Dataflow software pipelining for codelet model using hardware-software co-design

Software pipelining is a code mapping scheme that exploits pipelined parallelism in a loop. Compilers have applied it successfully to exploit Instruction Level Parallelism (ILP), scheduling up to a few hundred machine instructions in pipelined execution. However, rapid advances in chip technology and computer architecture have enabled the design and production of chips with thousands or even hundreds of thousands of cores, far beyond the limit of classical software pipelining. An open question is: can software pipelining be extended and applied to meet this challenge?

This work addresses that challenge by extending software pipelining beyond fine-grain, instruction-level parallelism for the Codelet Model. The extended operational semantics of the Codelet Model apply dataflow software pipelining principles, exploiting coarse-grain pipelined parallelism across loops by placing single-owner FIFO buffers on the dependencies between Codelets. The Extended Codelet Abstract Machine (xCAM) and Local Codelet Core Memory (LCCM) enable an efficient implementation of these FIFO buffers based on hardware-software co-design principles.

We extend the operational semantics of the Codelet Model in DARTS, our existing implementation of the model, by implementing efficient single-owner FIFO buffers across Codelet dependencies. A detailed case study of Cannon's algorithm for matrix multiplication shows promising improvements for the extended Codelet Model with dataflow software pipelining over the original Codelet Model.

The hardware-software co-design of these extensions is realized on the novel Intel Iris Pro graphics architecture using the OpenCL programming model. We introduce the Codelet Pipe construct, a communication channel between producer and consumer codelets that exploits architectural features such as Shared Local Memory to enable efficient dataflow software pipelining. The Application Programming Interface (API) for the Codelet Pipe lets users construct well-structured Codelet Graphs (CDG) and eases programmability. We evaluate the performance of the Codelet Pipe using a set of micro-benchmarks.
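The idea of a single-owner FIFO between a producer loop and a consumer loop can be illustrated with a small sketch. This is not the DARTS or OpenCL Codelet Pipe API, which the abstract does not specify. The names `CodeletPipe`, `run_pipelined`, and `capacity`, the round-based firing rule, and the two toy loop bodies are all assumptions made for illustration, and Python is used only for brevity.

```python
from collections import deque


class CodeletPipe:
    """Hypothetical bounded FIFO with exactly one producer and one consumer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._buf = deque()

    def can_push(self):
        return len(self._buf) < self.capacity

    def can_pop(self):
        return len(self._buf) > 0

    def push(self, token):
        assert self.can_push(), "producer fired without free space"
        self._buf.append(token)

    def pop(self):
        assert self.can_pop(), "consumer fired without a token"
        return self._buf.popleft()


def run_pipelined(n_iters, capacity):
    """Round-based simulation: in each round, every enabled codelet fires once.

    Enabling is decided from the pipe state at the start of the round, so a
    token pushed in round r is first consumed in round r + 1.
    """
    pipe = CodeletPipe(capacity)
    produced = consumed = 0
    results, trace = [], []
    while consumed < n_iters:
        fire_consumer = pipe.can_pop()
        fire_producer = produced < n_iters and pipe.can_push()
        fired = ""
        if fire_consumer:
            results.append(pipe.pop() + 1)  # toy consumer loop body
            consumed += 1
            fired += "C"
        if fire_producer:
            pipe.push(produced * produced)  # toy producer loop body
            produced += 1
            fired += "P"
        trace.append(fired)
    return results, trace


if __name__ == "__main__":
    results, trace = run_pipelined(n_iters=4, capacity=2)
    print(results)
    print(trace)
```

With four iterations and a capacity of two, the trace is `P, CP, CP, CP, C`: the two loops overlap and finish in five rounds, where running the producer loop to completion before starting the consumer loop would take eight. The bounded capacity is what lets the buffer live in a small, fast memory, at the cost of stalling the producer when the consumer falls behind.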
Codelet model, Dataflow software pipelining, FIFO, Graphics processing unit, Hardware-software co-design, Many-core architectures