ROSE 编译器框架/autoPar
这是一个工具,可以自动将 OpenMP 指令插入输入的串行 C/C++ 代码中。
该工具是使用 OpenMP 进行自动并行化的实现。它可以自动将 OpenMP 指令插入输入的串行 C/C++ 代码中。对于具有现有 OpenMP 指令的输入程序,当开启正确选项时,该工具将双重检查正确性。
- 源文件目前位于 rose/projects/autoParallelization。
- 一个独立的可执行程序(名为 autoPar )被生成并安装到 ROSE 的安装树中(位于 ROSE INS/bin 下)。
- 测试输入文件位于 rose/projects/autoParallelization/tests
- 您可以在 rose_build_tree/projects/autoParallelization 中通过键入 "make check" 来测试该工具。
- ROSE 手册中有一节:12.7 自动并行化 pdf
- 它用于探索语义感知的自动并行化,如我们的论文中所述。
与 ROSE 类似,autoPar 是在 BSD 许可下发布的。
autoPar 作为 ROSE 编译器的一部分发布。请按照以下步骤安装 ROSE:
- 安装
- 键入 make install-core -j8 来安装核心 ROSE 库和预构建工具。
在构建 ROSE 的过程中会创建一个可执行文件 autoPar。用户可以使用它来并行化他们的代码。autoPar 将安装到由 –prefix=ROSE 安装路径 指定的路径中。以下是部分说明。
在配置期间,您可能想要指定一个前缀路径,例如 –prefix=/home/youraccount/opt/roserev-8345。然后,按照 ROSE 安装指南构建并安装 ROSE,通常是键入 make 和 make install。
下一步是设置 ROSE 可执行文件(包括 autoPar )和共享库的搜索路径环境变量。假设您使用的是 bash,将以下几行放入一个文件(例如 set.rose)中并源代码它(键入 source set.rose)
ROSE_INS=/home/youraccount/opt/rose-rev-8345 export ROSE_INS PATH=$ROSE_INS/bin:$PATH export PATH LD_LIBRARY_PATH=$ROSE_INS/lib:$LD_LIBRARY_PATH export LD_LIBRARY_PATH
现在您可以使用 autoPar 来并行化您的代码。假设您有一个顺序文件:file.c,只需键入 autoPar -c file.c 即可。将生成一个输出文件 rose file.c。如果 autoPar 可以找到任何可并行化的内容,输出文件应该包含生成的 OpenMP 指令。
如果您不想从头开始安装所有内容,可以下载 ROSE 的 vmware 镜像,并直接使用其中安装的 ROSE 副本。
有关此方面的更多信息,请参见 ROSE_Compiler_Framework/虚拟机镜像。
在虚拟机中,您可以 cd 到
/home/demo/buildrose/projects/autoParallelization
并键入
make check
您将看到 autoPar 被用来处理输入文件。
[~build64/projects/autoParallelization/tests]../autoPar—help
AUTOPAR(1) ROSE Command-line Tools AUTOPAR(1) Name autoPar - This tool automatically inserts OpenMP directives into sequential codes. Synopsis autoPar switches files... Description This tool is an implementation of automatic parallelization using OpenMP. It can automatically insert OpenMP directives into input serial C/C++ codes. For input programs with existing OpenMP directives, the tool will double check the correctness when requested. Switches General switches --assert how Determines how a failed assertion behaves. The choices are "abort", "exit" with a non-zero value, or "throw" a Rose::Diagnostics::FailedAssertion exception. The default behavior depends on how ROSE was configured. --help; -h Show this documentation. --log config Configures diagnostics. Use "--log=help" and "--log=list" to get started. --threads n Number of threads to use for algorithms that support multi-threading. The default is 0. A value of zero means use the same number of threads as there is hardware concurrency (or one thread if the hardware concurrency can't be determined). --version; -V Shows the dotted quad ROSE version and then exits. See also --version-long, which prints much more information. --version-long Shows version information for ROSE and various dependencies and then exits. The shorter --version switch shows only the dotted quad of the ROSE library itself. autoPar's switches These switches control the autoPar tool. --[rose:autopar:]annot string Specify annotation file for semantics of abstractions --[rose:autopar:]dumpannot Dump annotation file content for debugging purposes. --[rose:autopar:]enable_debug Enable the debugging mode. --[rose:autopar:]enable_diff Compare user defined OpenMP pragmas to auto parallelization generated ones. --[rose:autopar:]enable_distance Report the absolute dependence distance of a dependence relation preventing parallelization. --[rose:autopar:]enable_patch Enable generating patch files for auto parallelization --[rose:autopar:]failure_report string Specify the report file for logging files autoPar cannot process --[rose:autopar:]keep_going Allow auto parallelization to keep going if errors happen --[rose:autopar:]no_aliasing Assuming no pointer aliasing exists. --[rose:autopar:]success_report string Specify the report file for logging files autoPar can process --[rose:autopar:]unique_indirect_index Assuming all arrays used as indirect indices have unique elements (no overlapping) ROSE builtin switches --rose:help Show the old-style ROSE help. 0.9.9.123 Friday October 13 10:16:56 2017
其他有用的 rose 标志
- -rose:skipfinalCompileStep // 跳过调用后端编译器来编译转换后的代码,这有助于解决一些错误
- --edg:no_warnings // 抑制 EDG C++ 前端的警告
测试输入文件可以在 https://github.com/rose-compiler/rose-develop/tree/master/projects/autoParallelization/tests 找到。
相应的生成测试输出文件可以在:https://github.com/chunhualiao/autoPar-demo 找到。
我们在下面提供两个示例。
输入
/* Only the inner loop can be parallelized */ void foo() { int n=100, m=100; double b[n][m]; int i,j; for (i=0;i<n;i++) for (j=0;j<m;j++) b[i][j]=b[i-1][j-1]; }
命令行和屏幕输出
../autoPar -rose:C99—edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -rose:autopar:enable_patch -I../../../../sourcetree/src/frontend/SageIII -c ../../../../sourcetree/projects/autoParallelization/tests/inner_only.c
Enabling generating patch files for auto parallelization ... Assuming all arrays used as indirect indices have unique elements (no overlapping) ... Unparallelizable loop at line:8 due to the following dependencies: 2*2 TRUE_DEP; commonlevel = 2 CarryLevel = (0,0) Is precise SgPntrArrRefExp:b[i][j]@10:14->SgPntrArrRefExp:b[i - 1][j - 1]@10:21 == -1;* 0;||* 0;== -1;||:: Automatically parallelized a loop at line:9
输出
/* Only the inner loop can be parallelized */ #include "omp.h" void foo() { int n = 100; int m = 100; double b[n][m]; int i; int j; for (i = 0; i <= n - 1; i += 1) { #pragma omp parallel for private (j) firstprivate (n,m,i) for (j = 0; j <= m - 1; j += 1) b[i][j] = b[i - 1][j - 1]; } }
另一个输入
int i,j; int a[100][100]; void foo() { for (i=0;i<100;i++) for (j=0;j<100;j++) a[i][j]=a[i][j]+1; }
输出
#include "omp.h" int i; int j; int a[100UL][100UL]; void foo() { #pragma omp parallel for private (i,j) for (i = 0; i <= 100 - 1; i += 1) { #pragma omp parallel for private (j) firstprivate (i) for (j = 0; j <= 100 - 1; j += 1) a[i][j] = (a[i][j] + 1); } }
注解文件提供了额外的程序信息,这些信息通常难以被编译器提取,例如别名、副作用信息。
多个注解文件可以在一个命令行中加载。
../autoPar --edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -I/home/liao6/rose/freshmaster/sourcetree/src/frontend/SageIII -c -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/floatArray.annot -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/funcs.annot -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/Index.annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/interp1_elem2.C
假设所有用作间接索引的数组具有唯一元素(没有重叠)...
自动并行化了第 13 行的循环。
输入
//#include <A++.h> #include "simpleA++.h" void interpolate1D(class floatArray &fineGrid,class floatArray &coarseGrid) { int _var_0,i; int _var_1; int fineGridSize = fineGrid.length(0); int coarseGridSize = coarseGrid.length(0); // Interior fine points class Range If(2,_var_1 = (fineGridSize - 2),2); class Range Ic(1,(coarseGridSize - 1),1); for (i = 1; i < _var_0; i += 1) { fineGrid.elem(i) =fineGrid.elem(i)+1; } }
注解文件
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/floatArray.annot
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/funcs.annot
输出文件
//#include <A++.h> #include "simpleA++.h" #include "omp.h" void interpolate1D(class floatArray &fineGrid,class floatArray &coarseGrid) { int _var_0; int i; int _var_1; int fineGridSize = fineGrid . length (0); int coarseGridSize = coarseGrid . length (0); // Interior fine points class Range If(2,_var_1 = fineGridSize - 2,2); class Range Ic(1,coarseGridSize - 1,1); #pragma omp parallel for private (i) firstprivate (_var_0) for (i = 1; i <= _var_0 - 1; i += 1) { fineGrid . elem (i) = fineGrid . elem (i) + 1; } }
autoPar 接受一个选项 -rose:autopar:enable_patch 来生成一个补丁文件。因此,用户首先需要检查要插入的指令。然后,用户可以将检查过的补丁应用到他们的应用程序中。
命令行
- autoPar -rose:C99 --edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -rose:autopar:enable_patch -c jacobi_seq.c
屏幕输出
Enabling generating patch files for auto parallelization ... Assuming all arrays used as indirect indices have unique elements (no overlapping) ... Automatically parallelized a loop at line:83 Unparallelizable loop at line:84 due to the following dependencies: 1*1 OUTPUT_DEP; commonlevel = 1 CarryLevel = (0,0) Is precise SgPntrArrRefExp:f[i][0]@89:6->SgPntrArrRefExp:f[i][0]@89:6 <= -1;||:: 1*1 OUTPUT_DEP; commonlevel = 1 CarryLevel = (0,0) Is precise SgPntrArrRefExp:f[i][0]@89:6->SgPntrArrRefExp:f[i][0]@89:6 <= -1;||:: Automatically parallelized a loop at line:143 Automatically parallelized a loop at line:144 Automatically parallelized a loop at line:147 Automatically parallelized a loop at line:148 Automatically parallelized a loop at line:185 Automatically parallelized a loop at line:186
生成的补丁文件:cat jacobi_seq.c.patch
diff -ar /home/liao6/workspace/autoPar/sourcetree/projects/autoParallelization/tests/jacobi_seq.c rose_jacobi_seq.c 0a1 > #include <omp.h> 82a83 > #pragma omp parallel for private (xx,yy,i,j) firstprivate (n,m) 142a143 > #pragma omp parallel for private (i,j) 143a144 > #pragma omp parallel for private (j) 146a147 > #pragma omp parallel for private (resid,i,j) reduction (+:error) 147a148 > #pragma omp parallel for private (resid,j) reduction (+:error) firstprivate (omega,ax,ay,b) 184a185 > #pragma omp parallel for private (xx,yy,temp,i,j) reduction (+:error) 185a186 > #pragma omp parallel for private (xx,yy,temp,j) reduction (+:error) firstprivate (dx,dy)
检查完补丁后,您可以将补丁应用到原始源文件
- patch jacobi_seq.c -i jacobi_seq.c.patch -o jacobi_patched.c
AutoPar 检查应用程序中预先存在的 OpenMP 指令,并验证它们是否正确地考虑了私有、归约和其他 OMP 数据共享属性。
示例输入
#include <stdio.h> #include <omp.h> int main(int argc, char *argv[]) { int N = 20; int total ; int i,j; #pragma omp parallel for for (j=0;j<N;j++) { for (i=0;i<N;i++) { total += i+j ; } } printf("%d\n", total) ; return 0; }
上面的代码包含一个真正的 OpenMP 错误,有人在尝试向他们的代码添加 OMP 注解时遇到了困难,并在以下地址提交:http://openmp.org/forum/viewtopic.php?f=3&t=821
运行 autoPar OpenMP 指令检查器会产生以下结果。
% autoPar -rose:unparse_tokens -rose:autopar:unique_indirect_index -rose:autopar:enable_diff -fopenmp -c OMPcheck.cc <<<<<<<<< user defined : #pragma omp parallel for -------- OMPcheck.cc:11 compiler generated:#pragma omp parallel for private (i) reduction (+:total) >>>>>>>>
来自 autoPar 的上述输出表明,OMP 指令缺少变量 'i' 的 OpenMP 'private' 声明,以及变量 'total' 的归约注解。
autoPar 旨在处理对原始数组进行操作的传统循环以及使用高级抽象的现代应用程序。并行化器使用以下算法:循环可能包含原始数据类型或 STL 容器类型的变量,或两者兼而有之。
1. 准备和预处理
- (可选) 读取已知抽象和语义的规范文件。
- (可选) 基于输入代码语义应用可选的自定义转换,例如将树遍历转换为内存池上的循环迭代。
- (可选) 如果假定唯一索引索引语义,则打开间接数组访问的规范化(void uniformIndirectIndexedArrayRefs() )
- indirect_loop_index = array_Y [current_loop_index] ;array_X[indirect_loop_index] ..; 变为 array_X [array_Y [current_loop_index]] ..
- 规范化循环,包括使用迭代器的循环。
- 对规范化的代码应用常量折叠以简化代码。
- 查找具有规范形式(用于 omp for)的候选数组计算循环或对单个元素进行操作的循环和函数(用于 omp task)。
2. 对于每个候选者
- 如果存在没有已知语义或副作用的函数调用,则跳过目标。
- 调用依赖关系分析和生存期分析。
- 对 OpenMP 变量进行分类(自动作用域)以使其成为私有变量、firstprivate 变量、lastprivate 变量、reduction 变量。识别对当前元素的引用,并找到与顺序无关的写入访问。
- 消除与自动作用域变量相关的依赖关系,那些仅涉及当前元素的依赖关系,以及由与顺序无关的写入访问引起的输出依赖关系。
- 如果不存在依赖关系,则插入相应的 OpenMP 结构。
算法的关键思想是捕获目标内的依赖关系,并根据各种规则尽可能地消除它们。如果不存在剩余的依赖关系,则并行化是安全的。
在第 51 页或(59/320):第 2.6 节,OpenMP 4.0(http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)中定义为
for (init-expr; test-expr; incr-expr) structured-block
init-expr 以下之一
var = lb integer-type var = lb random-access-iterator-type var = lb pointer-type var = lb
test-expr 以下之一
var relational-op b b relational-op var
relational-op 以下之一
< <= > >=
incr-expr 以下之一
++var var++ --var var-- var += incr var -= incr var = var + incr var = incr + var var = var - incr
var 以下之一
A variable of a signed or unsigned integer type. For C++, a variable of a random access iterator type. For C, a variable of a pointer type.
If this variable would otherwise be shared, it is implicitly made private in the loop construct. This variable must not be modified during the execution of the for-loop other than in incr-expr. Unless the variable is specified lastprivate on the loop construct, its value after the loop is unspecified.
lb 和 b:与 var 类型兼容的循环不变表达式。
incr:循环不变整数表达式。
依赖关系分析是并行化器决定循环是否可并行化的基础。autoPar 从循环优化器调用依赖关系分析,该优化器实现 [35, 36] 中提出的算法以有效地转换完全嵌套循环和非完全嵌套循环。扩展方向矩阵 (EDM) 依赖关系表示用于覆盖仅围绕两个语句之一的非通用循环嵌套,以处理非完全嵌套循环。对于循环内的数组访问,使用高斯消元算法来求解一组循环归纳变量的线性整数方程。
此步骤依赖于在以下位置实现的依赖关系分析
- src/midend/programTransformation/loopProcessing/depGraph
在以下位置提供了简化的接口函数
//Compute dependence graph for a loop, using ArrayInterface and ArrayAnnoation LoopTreeDepGraph* ComputeDependenceGraph(SgNode* loop, ArrayInterface*, ArrayAnnotation* annot);
依赖关系分析依赖于从副作用分析收集的变量引用信息。
分析的源代码头文件为 midend/programTransformation/loopProcessing/depInfo/DepInfoAnal.h
- bool AnalyzeStmtRefs( AstInterface& fa, const AstNodePtr& n, CollectObject<AstNodePtr> &wRefs, CollectObject<AstNodePtr> &rRefs);
提供了一些 SageInterface 函数来使开发人员更容易地使用此分析。
// Collect all read and write references within stmt, which can be a function, a scope statement, or a single statement. Note that a reference can be both read and written, like i++. bool SageInterface::collectReadWriteRefs (SgStatement *stmt, std::vector< SgNode * > &readRefs, std::vector< SgNode * > &writeRefs) //Collect unique variables which are read or written within a statement. Note that a variable can be both read and written. The statement can be either of a function, a scope, or a single line statement. bool SageInterface::collectReadWriteVariables (SgStatement *stmt, std::set< SgInitializedName * > &readVars, std::set< SgInitializedName * > &writeVars) // Collect read only variables within a statement. The statement can be either of a function, a scope, or a single line statement. void SageInterface::collectReadOnlyVariables (SgStatement *stmt, std::set< SgInitializedName * > &readOnlyVars) //Collect read only variable symbols within a statement. The statement can be either of a function, a scope, or a single line statement. void SageInterface::collectReadOnlySymbols (SgStatement *stmt, std::set< SgVariableSymbol * > &readOnlySymbols)
此步骤根据变量的 live-in(在执行循环之前)和 live-out(在执行循环之后)分析结果,对变量的数据共享属性进行分类。例如,循环内的私有变量既不是循环的 live-in,也不是 live-out,这意味着该变量在循环内立即被杀死(重新定义),然后在循环内以某种方式使用,但在循环之后在任何地方都不会被使用。所有循环索引变量也被分类为 OpenMP 私有变量以避免可能的竞态条件。另一方面,共享变量在循环开始和结束时都是活动的。Firstprivate 和 lastprivate 变量分别仅在循环开始或仅在循环结束时处于活动状态。
reduction 变量的处理方式有所不同,以最大限度地提高并行化的机会。循环内的典型 reduction 操作,例如 sum = sum + a[i],会导致循环携带的输出依赖关系、循环携带的反依赖关系和循环独立的反依赖关系。我们使用一个习语识别分析来捕获此类典型操作,并在决定循环是否可并行化时排除相关的循环携带依赖关系。
此步骤依赖于在以下位置实现的生存期分析
- src/midend/programAnalysis/defUseAnaysis/LivenessAnalysis.cpp
- 存在一个 SageInterface 函数来调用此分析
//!Get liveIn and liveOut variables for a for loop void SageInterface::getLiveVariables(LivenessAnalysis * liv, SgForStatement* loop, std::set<SgInitializedName*>& liveIns, std::set<SgIn itializedName*> & liveOuts)
算法:私有变量和 reduction 变量会导致依赖关系(被写入)。firstprivate 和 lastprivate 变量从未在循环中被写入(没有依赖关系)
live-in live-out shared Y Y no written, no dependences: no special handling, shared by default private N N written (depVars), need privatization: depVars- liveIns - liveOuts firstprivate Y N liveIns - LiveOus - writtenVariables lastprivate N Y liveOuts - LiveIns reduction Y Y depVars Intersection (liveIns Intersection liveOuts)
要识别 reduction 变量:RecognizeReduction()
- 从 https://github.com/rose-compiler/rose-develop/blob/master/src/frontend/SageIII/sageInterface/sageInterface.C 中实现为 void SageInterface::ReductionRecognition(SgForStatement* loop, std::set< std::pair <SgInitializedName*, VariantT> > & results)
/* * Algorithms: * for each scalar candidate which are both live-in and live-out for the loop body * and which is not the loop invariant variable. * Consider those with only 1 or 2 references * 1 reference * the operation is one of x++, ++x, x--, --x, x binop= expr * 2 references belonging to the same operation * operations: one of x= x op expr, x = expr op x (except for subtraction) * Also according to the specification. * x is not referenced in exp * expr has scalar type (no array, objects etc) * x: scalar only, aggregate types (including arrays), pointer types and reference types may not appear in a reduction clause. * op is not an overloaded operator, but +, *, -, &, ^ ,|, &&, || * binop is not an overloaded operator but: +, *, -, &, ^ ,| * */
我们使用一个单独的文件来存储输入代码的注释。该文件本质上指定了输入代码的语义,以便工具可以利用它来并行化代码。
一些注释文件 (*.annot) 可以在项目目录中的 https://github.com/rose-compiler/rose-develop/tree/master/projects/autoParallelization/tests 中找到。
- https://github.com/rose-compiler/rose/blob/master/projects/autoParallelization/annot/Cmath.annot 156 个 C 标准库函数
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/floatArray.annot 数组语义
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/std_vector.annot std::vector 语义
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/funcs.annot 用户函数
- https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/Index.annot
示例注释为
operator atoll(const char * val) { modify none; read{val}; alias none; } class InternalIndex { has_value { stride = this.stride; base = this.base; length = this.length; } } class Index { has_value { stride = this.stride; base = this.base; length = this.length; } } class Range { has_value { stride = this.stride; base = this.base; length = this.length; } } operator Index::Index( int lb, int l, int step) { modify none; read {lb, l, step}; alias none; restrict_value { result = { base = lb; length = l; stride = step; } }; } operator Range::Range( int lb, int ub, int step) { modify none; read {lb, ub, step}; alias none; restrict_value { result = { base = lb; length = (ub-lb+1)/step; stride = step; } }; } operator InternalIndex::InternalIndex( int lb, int l, int step) { modify none; read {lb, l, step}; alias none; restrict_value { result = { base = lb; length = l; stride = step; } }; } operator operator+ ( const InternalIndex & lhs, int x ) { modify none; read {lhs, x}; alias none; restrict_value { result = {stride = lhs.stride; base = lhs.base + x; length = lhs.length}; }; } operator operator- ( const InternalIndex & lhs, int x ) { modify none; read {lhs, x}; alias none; restrict_value { result = {stride = lhs.stride; base = lhs.base - x; length = lhs.length}; }; }
数组抽象
class floatArray { array { dimension = 6; length(i) = this.Array_Descriptor.Array_Domain.Size[i]; elem(i:dim:1:dimension) = this(i$dim); reshape(i:dim:1:dimension) = this.resize(i$dim); }; array_optimize { define { float* _pointer = this.getDataPointer(); int _size:dim:1:dimension = this.Array_Descriptor.Array_Domain.Size[dim-1]; int _stride:dim:1:dimension = this.Array_Descriptor.Array_Domain.Stride[dim-1]; int _length:dim:1:dimension = this.Array_Descriptor.Array_Domain.getLength(dim-1) } length(i) = _length$(i+1); elem(i:dim:1:dimension) = _pointer[i$1 + repeat(x,2,dimension, i$x * _stride$(x-1) * _size$(x-1))]; }; has_value { dimension = 6; length_0 = this.length(0); length_1 = this.length(1); length_2 = this.length(2); }; } operator floatArray::operator() (int index) { modify none; read {index}; alias none; restrict_value { this = { dimension = 1; } }; inline { this.elem(index) }; } operator floatArray::operator() ( const InternalIndex& index) { modify none; read {this, index}; alias { (result, this) }; restrict_value { this = { dimension = 1; }; result = {dimension = 1; length_0 = index.length;}; }; construct_array (this) { dimension = 1; length(0) = index.length; elem(i) = this.elem(i * index.stride + index.base); }; } operator floatArray::operator()(const InternalIndex& index1, const InternalIndex& index2) { modify none; read {this, index1, index2}; alias { (result, this) }; restrict_value { this = { dimension = 2; }; result = {dimension = 2; length_0 = index1.length; length_1 = index2.length}; }; construct_array (this) { dimension = 2; length(0) = index1.length; length(1) = index2.length; elem(i,j) = this.elem(i * index1.stride + index1.base, j * index2.stride + index2.base); }; } operator floatArray::operator()(const InternalIndex& index1, const InternalIndex& index2, const InternalIndex& index3) { modify none; read {this, index1, index2, index3}; alias { (result, this) }; restrict_value { this = { dimension = 3; }; result = {dimension = 3; length_0 = index1.length; length_1 = index2.length; length_2 = index3.length; }; }; construct_array (this) { dimension = 3; length(0) = index1.length; length(1) = index2.length; length(2) = index3.length; elem(i,j,k) = this.elem(i * index1.stride + index1.base, j * index2.stride + index2.base, k * index3.stride + index3.base); }; } operator floatArray::operator= (const floatArray& that) { modify {this}; read {that}; alias none; restrict_value { result = { dimension = that.dimension; length_0 = that.length_0;}; }; modify_array (this) { dimension = that.dimension; length(i) = that.length(i); elem(i:dim:1:dimension) = that.elem(i$dim); }; } operator floatArray::operator= (float val) { modify {this}; read {val}; alias none; modify_array (this) { elem(i:dim:1:dimension) = val; }; } operator floatArray::getLength( int dim) { modify none; read {dim}; alias none; inline { this.length(dim) }; }
并非所有依赖关系都会阻止我们并行化循环。在我们的考虑中,我们排除了某些无害的依赖关系。
void DependenceElimination(SgNode* sg_node, LoopTreeDepGraph* depgraph, std::vector<DepInfo>& remainings, OmpSupport::OmpAttribute* att, std::map<SgNode*, bool> & indirect_table, ArrayInterface* array_interface/*=0*/, ArrayAnnotation* annot/*=0*/)
- 来自文件 https://github.com/rose-compiler/rose/blob/master/projects/autoParallelization/autoParSupport.C
算法,消除以下依赖关系
- 由局部声明的变量引起:已经是每个迭代的私有变量
- 源变量或汇变量是线程局部变量
- 忽略具有空源/汇节点的依赖关系
- commonlevel ==0,没有共同的封闭循环
- carry level !=0,循环独立,
- 依赖关系中至少有一个数组引用,但依赖关系类型为 SCALAR_DEP 或 SCALAR_BACK_DEP
- 由自动作用域变量引起的依赖关系(private、firstprivate、lastprivate、reduction 等)
- 由一对间接索引数组引用引起的依赖关系,如果我们知道间接索引数组具有唯一的元素。
请注意,为什么 carry level 必须为 0 才能防止我们的分析中的并行化
- 这是有效的,因为我们在循环的每一级上运行 dep 分析。
- 为了考虑当前循环,仅由当前循环携带的依赖关系很重要。
- 当前循环始终是最外层循环,carry level id 为 0。
autoPar -rose:autopar:enable_debug sourcetree/projects/autoParallelization/tests/jacobi_seq.c
搜索“此 dep 关系无法消除。保存到剩余的依赖关系集中。”以查找具有无法消除的循环携带依赖关系的循环。
// Input code from Jacobi, expected to generate // #pragma omp parallel for private(i,j,xx,yy,temp) reduction(+:error) 185 for (i = 0; i < n; i++) 186 for (j = 0; j < m; j++) 187 { 188 xx = -1.0 + dx * (i - 1); 189 yy = -1.0 + dy * (j - 1); 190 temp = u[i][j] - (1.0 - xx * xx) * (1.0 - yy * yy); 191 error = error + temp * temp; 192 } Debug: ComputeDependenceGraph() dumps the dependence graph for the loop at line :185 dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:xx = - 1.0 + dx *(i - 1); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise SgVarRefExp:xx@188:2-> SgVarRefExp:xx@188:2 == 0;* 0;||* 0;== 0;||:: dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise SgVarRefExp:xx@188:2-> SgPntrArrRefExp:u[i][j]@190:13 * 0;* 0;||* 0;* 0;||:: dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise SgVarRefExp:xx@188:2-> SgVarRefExp:xx@190:26 == 0;* 0;||* 0;== 0;||:: dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise SgVarRefExp:xx@188:2-> SgVarRefExp:xx@190:31 == 0;* 0;||* 0;== 0;||:: .. dep SgExprStatement:error = error + temp * temp; SgExprStatement:error = error + temp * temp; 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise SgVarRefExp:error@191:2-> SgVarRefExp:error@191:10 == 0;* 0;||* 0;<= -1;||:: ... dep SgExprStatement:error = error + temp * temp; SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise SgVarRefExp:temp@191:25-> SgVarRefExp:temp@190:2 == 0;* 0;||* 0;<= -1;||:: dep SgExprStatement:error = error + temp * temp; SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise SgVarRefExp:error@191:2-> SgPntrArrRefExp:u[i][j]@190:13 * 0;* 0;||* 0;* 0;||:: -------------------------------------------------------- Live-in variables for loop:186 error ::m ::n Live-out variables for loop before line:193 error ::m ::n Final Live-in variables for loop: error ::m ::n Final Live-out variables for loop: error ::m ::n Debug after CollectVisibleVaribles (): 0x7f1640ef0678 ::n 0x7f1640ef07c0 ::m 0x7f1640ef1200 ::dx 0x7f1640ef1348 ::dy 0x7f1640ef2680 j 0x7f1640ef27c8 xx 0x7f1640ef2910 yy 0x7f1640ef2a58 temp 0x7f1640ef2ba0 error Debug after CollectVariablesWithDependence(): 0x7f1640ef27c8 xx 0x7f1640ef2910 yy 0x7f1640ef2a58 temp 0x7f1640ef2ba0 error Debug dump private: 0x7f1640ef27c8 xx 0x7f1640ef2910 yy 0x7f1640ef2a58 temp 0x7f1640ef2538 i 0x7f1640ef2680 j Debug dump lastprivate: Debug: A candidate used twice:error Debug dump firstprivate: Entering DependenceElimination () ---------------------------------- Eliminating a dep relation due to at least one autoscoped variables 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise SgVarRefExp:xx@188:2-> SgVarRefExp:xx@188:2 == 0;* 0;||* 0;== 0;||:: Eliminating a dep relation due to scalar dep type for at least one array variable 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise SgVarRefExp:xx@188:2-> SgPntrArrRefExp:u[i][j]@190:13 * 0;* 0;||* 0;* 0;||:: Eliminating a dep relation due to at least one autoscoped variables 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise SgVarRefExp:xx@188:2-> SgVarRefExp:xx@190:26 == 0;* 0;||* 0;== 0;||:: Eliminating a dep relation due to scalar dep type for at least one array variable 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise SgVarRefExp:yy@189:2-> SgPntrArrRefExp:u[i][j]@190:13 * 0;* 0;||* 0;* 0;||:: .. Eliminating a dep relation due to at least one autoscoped variables 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise SgVarRefExp:yy@190:44-> SgVarRefExp:yy@189:2 == 0;* 0;||* 0;<= -1;||:: ... Eliminating a dep relation due to scalar dep type for at least one array variable 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise SgPntrArrRefExp:u[i][j]@190:13-> SgVarRefExp:error@191:2 * 0;* 0;||* 0;* 0;||:: ... Eliminating a dep relation due to at least one autoscoped variables 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise SgVarRefExp:error@191:2-> SgVarRefExp:error@191:10 == 0;* 0;||* 0;<= -1;||:: Eliminating a dep relation due to at least one autoscoped variables 0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise SgVarRefExp:temp@191:18-> SgVarRefExp:temp@190:2 == 0;* 0;||* 0;<= -1;||:: Exiting DependenceElimination () ---------------------------------- Automatically parallelized a loop at line:185 debug patch generation: attaching OMP att to sg_node SgForStatement at line 185
用于编码类型(类)及其操作(运算符)的语义信息
源文件:https://github.com/rose-compiler/rose-develop/tree/master/src/midend/astUtil/annotation
// annotations are separated by ; <annot> ::= <annot1> | <annot1>;<annot> // two main categories: class and operator <annot1> ::= class <cls annot> | operator <op annot> //---------class------------ // If this is an array-like class? // e.g: class MyArray: <cls annot> ::= <clsname>:<cls annot1>; <cls annot1>::= <cls annot2> | <cls annot2> <cls annot1> //from bool TypeDescriptor:: read(istream& in) in AnnotDescriptors.C // type name can have const modifier, with qualifiers, and of reference or pointer type <clasname>::= [const] name1[::name2[...]][&|*] // array information, inheritable or not <cls annot2>::= <arr annot> | inheritable <arr annot> | has-value { <val def> } //Array information: dimension, length, element, reshape <arr annot>::= is-array{ <arr def>} | is-array{define{<stmts>}<arr def>} // define loop-invariant operations that should be executed before enumerating array elements <arr def> ::= <arr attr def> | <arr attr def> <arr def> <arr attr def> ::= <arr attr>=<expression>; <arr attr> ::= dim // at most 'dim' dimensions | len (<param>) // length of each dimension: parameter is used to specify which dimension | elem(<paramlist>) //i$x:0:dim-1 ==> $x is from 0 to dim-1 ==> list of parameters i_0, i_1, i_dim-1 | reshape(<paramlist>) //Data members for types/classes //Could have expressions <val def> ::= <name>; | <name>;<val def> | <name> = <expression> ; | <name> = <expression> ; <val def> //---------operator------------ // side effects: mod, read, alias <op annot> ::= <opdecl> : <op annot1> ; <op annot1> ::=<op annot2> | <op annot2> <op annot1> <op annot2> ::= modify <namelist> // side effects on scalar:e.g. modify none | new-array (<aliaslist>){<arr def>} // side effects on arrays: creational | modify-array (<name>) {<arr def>} // side effects on arrays: modifying only | restrict-value {<val def list>} // How to get member values from this operation's semantics( may be from parameters values) | read <namelist> // e.g.: read {a,b,c} | alias <nameGrouplist> // e.g.: alias none | allow-alias <nameGrouplist> | inline <expression> // this operator is semantically equal to another operator/expression, used to substitute an operation with another // from OperatorDescriptors.h NameGropu, OperatorAliasDescriptor <nameGrouplist> e.g: {(x,y,z), (a,b), (m,n)}
数组和数组优化(重写为原始数组)
class floatArray { array { // high level array abstraction with range from 1 to dimension dimension = 6; // max dimension length(i) = this.Array_Descriptor.Array_Domain.Size[i]; // length of each dimension //i:dim:1:dimension/ i$dim means a list of parameters i_1, i_2, .., i_dimension? // i:dim:1:dimension also defines a range variable dim, which is reused later in i$dim elem(i:dim:1:dimension) = this(i$dim); reshape(i:dim:1:dimension) = this.resize(i$dim); }; array_optimize { // low level array with range from 0 to dimension-1 define { float* _pointer = this.getDataPointer(); int _size:dim:1:dimension = this.Array_Descriptor.Array_Domain.Size[dim-1]; int _stride:dim:1:dimension = this.Array_Descriptor.Array_Domain.Stride[dim-1]; int _length:dim:1:dimension = this.Array_Descriptor.Array_Domain.getLength(dim-1) } length(i) = _length$(i+1); elem(i:dim:1:dimension) = _pointer[i$1 + repeat(x,2,dimension, i$x * _stride$(x-1) * _size$(x-1))]; }; has_value { dimension = 6; length_0 = this.length(0); length_1 = this.length(1); length_2 = this.length(2); }; } dump ()-------------- arrays: floatArray : {dimension=6; length=((i:::):[](.(.(.(this,Array_Descriptor),Array_Domain),Size),i)); elem=((i:dim:1:dimension):this($(i,dim))) } reshape = ((i:dim:1:dimension):FunctionPtrCall(.(this,resize),$(i,dim))) array optimizations: floatArray : {float* _pointer:::=FunctionPtrCall(.(this,getDataPointer)); int _size:dim:1:dimension=[](.(.(.(this,Array_Descriptor),Array_Domain),Size),+(-1dim)); int _stride:dim:1:dimension=[](.(.(.(this,Array_Descriptor),Array_Domain),Stride),+(-1dim)); int _length:dim:1:dimension=FunctionPtrCall(.(.(.(this,Array_Descriptor),Array_Domain),getLength),+(-1dim)) } {dimension=; length=((i:::):$(_length,+(1i))); elem=((i:dim:1:dimension):[](_pointer,+($(i,1)repeat(x,2,dimension,*($(i,x)$(_stride,+(-1x))$(_size,+(-1x))))))) }
函数副作用注释:读、修改和别名的变量集
operator VectorXY::VectorXY(double xx, double yy) { modify none; read {xx,yy}; } // side effects annotation operator floatArray::operator() ( const InternalIndex& index) { modify none; read {this, index}; } // Dump(): mangled_function_name: (all variables): {modified variable } floatArray::operator()_constInternalIndex& : (this,index):{} // another annotation, side effects, alias, value operator Index::Index( int lb, int l, int step) { modify none; read {lb, l, step}; alias none; restrict_value { result = { base = lb; length = l; stride = step; } }; } // Dump() read: Index::Index_int_int_int : (lb,l,step):{l,lb,step} // allow_alias: used only for a function operating on two or more parameters? tell if parameters can be alias to each other? operator interpolate2D(floatArray & fineGrid,floatArray & coarseGrid) { allow_alias none; }
has_value 和 restrict _value 注释:“this” 和 “result” 是什么?
// has_value: Only used for within class annotation, class floatArray { .. has_value { dimension = 6; length_0 = this.length(0); length_1 = this.length(1); length_2 = this.length(2); }; class Range { has_value { stride = this.stride; base = this.base; length = this.length; } } //Dump: floatArray : {dimension:6;length_0:FunctionPtrCall(.(this,length),0);length_1:FunctionPtrCall(.(this,length),1);length_2:FunctionPtrCall(.(this,length),2)} } Range : {base:.(this,base);length:.(this,length);stride:.(this,stride)} // restrict_value: infer the value of member variable from the semantics of current operations: used for operation annotations. could have inferred information for one "this" and multiple "result" depending on function parameters operator floatArray::operator() (int index) { .. restrict_value { this = { dimension = 1; } }; // this element access operator using one integer parameter can tell us that the current object (this) has a dimension value equaling to 1 } operator floatArray::operator()(const InternalIndex& index1, const InternalIndex& index2, const InternalIndex& index3) { restrict_value { this = { dimension = 3; }; result = {dimension = 3; length_0 = index1.length; length_1 = index2.length; length_2 = index3.length; }; }; }
operator sqrt( double val) { modify none; read{val}; alias none; } operator sqrt( float val) { modify none; read{val}; alias none; } operator abs(int val) { modify none; read{val}; alias none; }
- 旧技术报告:廖春华、丹尼尔·J·昆兰、杰里米·J·威尔科克和托马斯·帕纳斯。基于 STL 语义的 OpenMP 自动并行化。技术报告,劳伦斯利弗莫尔国家实验室 (LLNL),利弗莫尔,加利福尼亚州(美国),2008 年。
- 一篇研讨会论文:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,扩展自动并行化以优化多核的高级抽象,2009 年 5 月国际 OpenMP 研讨会论文集:在极端并行时代发展 OpenMP(德国德累斯顿,2009 年 6 月 3 日至 5 日)。pdf
- 论文的期刊版本:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,使用高级抽象进行现代应用程序的语义感知自动并行化,并行编程杂志,2010 年 1 月接受 pdf