Research Article | OPEN ACCESS
The Optimization of IF-conversion in Whole Function Vectorization
Jianmin Pang, Feng Yue, Zheng Shan, Chao Dai and Jiuzhen Jin
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, 450001, China
Research Journal of Applied Sciences, Engineering and Technology 2014 1:64-68
Received: February 22, 2014 | Accepted: April 09, 2014 | Published: July 05, 2014
Abstract
Many optimization methods are applied during code transformation to improve performance. When migrating SPMD programs to multi-core platforms, vectorization is a key optimization. Control flow is the main challenge for vectorization, and IF-conversion is commonly used to transform control flow into data flow. In most existing work, the vector code for both branches must be executed after IF-conversion, even when the predicates of all scalar lanes are false for one of the branches. This study proposes a code bypass technique to address this inefficiency in whole-function vectorization of code in SSA form. First, each region of consecutive instructions guarded by the same predicate is identified. Then a test is inserted to detect whether the predicates of all scalar lanes are false, followed by a jump that bypasses the region when they are. For loops, a loop mask is added to indicate which lanes are no longer alive in the loop, which helps in handling loop iterations. Experiments show that the method can improve performance by 6.8%.
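The idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch only, not the authors' implementation: it models a vector as a Python list of lanes, and the names `select`, `if_converted`, `if_converted_with_bypass`, and `masked_sum_to_n`, as well as the example computations, are hypothetical and chosen for illustration. The `executed` list records which guarded regions actually ran, and the loop example tracks an active-lane mask (the complement of the "not alive" mask the abstract describes).

```python
from typing import List

Vec = List[int]
Mask = List[bool]


def select(mask: Mask, a: Vec, b: Vec) -> Vec:
    """Per-lane blend: a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def if_converted(x: Vec, executed: List[str]) -> Vec:
    """Plain IF-conversion of `y = 2*x if x > 0 else -x`; both regions always run."""
    mask = [v > 0 for v in x]
    executed.append("then")
    then_val = [2 * v for v in x]
    executed.append("else")
    else_val = [-v for v in x]
    return select(mask, then_val, else_val)


def if_converted_with_bypass(x: Vec, executed: List[str]) -> Vec:
    """Same computation, but a guarded region is skipped when no lane needs it."""
    mask = [v > 0 for v in x]
    not_mask = [not m for m in mask]
    then_val = x  # placeholder; never selected when the region is bypassed
    else_val = x  # placeholder; never selected when the region is bypassed
    if any(mask):  # test: is the predicate true in at least one lane?
        executed.append("then")
        then_val = [2 * v for v in x]
    if any(not_mask):
        executed.append("else")
        else_val = [-v for v in x]
    return select(mask, then_val, else_val)


def masked_sum_to_n(n: Vec) -> Vec:
    """Vectorized `while n > 0: acc += n; n -= 1` with a per-lane loop mask."""
    acc = [0] * len(n)
    active = [v > 0 for v in n]  # lanes still alive in the loop
    while any(active):  # leave the loop once every lane has exited
        acc = select(active, [a + v for a, v in zip(acc, n)], acc)
        n = select(active, [v - 1 for v in n], n)
        active = [a and v > 0 for a, v in zip(active, n)]
    return acc
```

Both conversion functions return the same result for any input; they differ only in how much work is done. When every lane takes the same branch, the bypass variant records a single executed region, which is the case the proposed technique targets.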
Keywords:
Branch, IF-conversion, loop, optimization, vectorization
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459