Charith Mendis, Cambridge Yang, Yewen Pu, Dr.Saman Amarasinghe, Michael Carbin
Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit fine-grained data level parallelism. To exploit this parallelism, compilers employ auto-vectorization techniques to automatically convert scalar code into vector code. Larsen & Amarasinghe (2000) first introduced superword level parallelism (SLP) based vectorization, which is one form of vectorization popularly used by compilers. Current compilers employ hand-crafted heuristics and typically only follow one SLP vectorization strategy which can be suboptimal. Recently, Mendis & Amarasinghe (2018) formulated the instruction packing problem of SLP vectorization by leveraging an integer linear programming (ILP) solver, achieving superior runtime performance. In this work, we explore whether it is feasible to imitate optimal decisions made by their ILP solution by fitting a graph neural network policy. We show that the learnt policy produces a vectorization scheme which is better than industry standard compiler heuristics both in terms of static measures and runtime performance. More specifically, the learnt agent produces a vectorization scheme which has a 22.6% higher average reduction in cost compared to LLVM compiler when measured using its own cost model and achieves a geometric mean runtime speedup of 1.015× on the NAS benchmark suite when compared to LLVM’s SLP vectorizer.