跳转到内容

ROSE 编译器框架/autoPar

来自维基教科书,开放的书籍,用于开放的世界

这是一个工具,可以自动将 OpenMP 指令插入输入的串行 C/C++ 代码中。

解释 autoPar 编程工具工作原理的图表

该工具是使用 OpenMP 进行自动并行化的实现。它可以自动将 OpenMP 指令插入输入的串行 C/C++ 代码中。对于具有现有 OpenMP 指令的输入程序,当开启正确选项时,该工具将双重检查正确性。

  • 源文件目前位于 rose/projects/autoParallelization
  • 一个独立的可执行程序(名为 autoPar )被生成并安装到 ROSE 的安装树中(位于 ROSE INS/bin 下)。
  • 测试输入文件位于 rose/projects/autoParallelization/tests
  • 您可以在 rose_build_tree/projects/autoParallelization 中通过键入 "make check" 来测试该工具。
  • ROSE 手册中有一节:12.7 自动并行化 pdf
  • 它用于探索语义感知的自动并行化,如我们的论文中所述。
    • 一篇研讨会论文:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,扩展自动并行化以优化多核的高级抽象,2009 年 5 月国际 OpenMP 研讨会论文集:在极端并行时代发展 OpenMP(德国德累斯顿,2009 年 6 月 3 日至 5 日)。pdf
    • 论文的期刊版本:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,使用高级抽象进行现代应用程序的语义感知自动并行化,并行编程杂志,2010 年 1 月接受 pdf

与 ROSE 类似,autoPar 是在 BSD 许可下发布的。

安装和配置

[编辑 | 编辑源代码]

autoPar 作为 ROSE 编译器的一部分发布。请按照以下步骤安装 ROSE:

  • 安装
  • 键入 make install-core -j8 来安装核心 ROSE 库和预构建工具。

在构建 ROSE 的过程中会创建一个可执行文件 autoPar。用户可以使用它来并行化他们的代码。autoPar 将安装到由 –prefix=ROSE 安装路径 指定的路径中。以下是部分说明。

在配置期间,您可能想要指定一个前缀路径,例如 –prefix=/home/youraccount/opt/roserev-8345。然后,按照 ROSE 安装指南构建并安装 ROSE,通常是键入 make 和 make install。

下一步是设置 ROSE 可执行文件(包括 autoPar )和共享库的搜索路径环境变量。假设您使用的是 bash,将以下几行放入一个文件(例如 set.rose)中并源代码它(键入 source set.rose)

ROSE_INS=/home/youraccount/opt/rose-rev-8345
export ROSE_INS
PATH=$ROSE_INS/bin:$PATH
export PATH
LD_LIBRARY_PATH=$ROSE_INS/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

现在您可以使用 autoPar 来并行化您的代码。假设您有一个顺序文件:file.c,只需键入 autoPar -c file.c 即可。将生成一个输出文件 rose file.c。如果 autoPar 可以找到任何可并行化的内容,输出文件应该包含生成的 OpenMP 指令。

替代方案:使用虚拟机镜像

[编辑 | 编辑源代码]

如果您不想从头开始安装所有内容,可以下载 ROSE 的 vmware 镜像,并直接使用其中安装的 ROSE 副本。

有关此方面的更多信息,请参见 ROSE_Compiler_Framework/虚拟机镜像


在虚拟机中,您可以 cd 到

 /home/demo/buildrose/projects/autoParallelization

并键入

 make check

您将看到 autoPar 被用来处理输入文件。

命令行选项

[编辑 | 编辑源代码]

[~build64/projects/autoParallelization/tests]../autoPar—help

AUTOPAR(1)                           ROSE Command-line Tools                           AUTOPAR(1)

Name
    autoPar - This tool automatically inserts OpenMP directives into sequential codes.

Synopsis
    autoPar switches files...

Description
    This tool is an implementation of automatic parallelization using OpenMP. It can automatically insert OpenMP directives into input serial C/C++
    codes. For input programs with existing OpenMP directives, the tool will double check the correctness when requested.

Switches
  General switches
    --assert how
      Determines how a failed assertion behaves. The choices are "abort", "exit" with a non-zero value, or "throw" a
      Rose::Diagnostics::FailedAssertion exception. The default behavior depends on how ROSE was configured.

    --help; -h
      Show this documentation.

    --log config
      Configures diagnostics. Use "--log=help" and "--log=list" to get started.

    --threads n
      Number of threads to use for algorithms that support multi-threading. The default is 0. A value of zero means use the same number of
      threads as there is hardware concurrency (or one thread if the hardware concurrency can't be determined).

    --version; -V
      Shows the dotted quad ROSE version and then exits. See also --version-long, which prints much more information.

    --version-long
      Shows version information for ROSE and various dependencies and then exits. The shorter --version switch shows only the dotted quad of the
      ROSE library itself.

  autoPar's switches
    These switches control the autoPar tool.

    --[rose:autopar:]annot string
      Specify annotation file for semantics of abstractions

    --[rose:autopar:]dumpannot
      Dump annotation file content for debugging purposes.

    --[rose:autopar:]enable_debug
      Enable the debugging mode.

    --[rose:autopar:]enable_diff
      Compare user defined OpenMP pragmas to auto parallelization generated ones.

    --[rose:autopar:]enable_distance
      Report the absolute dependence distance of a dependence relation preventing parallelization.

    --[rose:autopar:]enable_patch
      Enable generating patch files for auto parallelization

    --[rose:autopar:]failure_report string
      Specify the report file for logging files autoPar cannot process

    --[rose:autopar:]keep_going
      Allow auto parallelization to keep going if errors happen

    --[rose:autopar:]no_aliasing
      Assuming no pointer aliasing exists.

    --[rose:autopar:]success_report string
      Specify the report file for logging files autoPar can process

    --[rose:autopar:]unique_indirect_index
      Assuming all arrays used as indirect indices have unique elements (no overlapping)

  ROSE builtin switches
    --rose:help
      Show the old-style ROSE help.

0.9.9.123                          Friday October 13 10:16:56 2017 

其他有用的 rose 标志

  • -rose:skipfinalCompileStep // 跳过调用后端编译器来编译转换后的代码,这有助于解决一些错误
  • --edg:no_warnings // 抑制 EDG C++ 前端的警告

示例用法

[编辑 | 编辑源代码]

测试输入文件可以在 https://github.com/rose-compiler/rose-develop/tree/master/projects/autoParallelization/tests 找到。

相应的生成测试输出文件可以在:https://github.com/chunhualiao/autoPar-demo 找到。

我们在下面提供两个示例。

不使用注解

[编辑 | 编辑源代码]

输入

/* Only the inner loop can be parallelized
 */
void foo()
{
 int n=100, m=100;
 double b[n][m];
 int i,j;
 for (i=0;i<n;i++)
  for (j=0;j<m;j++)
   b[i][j]=b[i-1][j-1];
}

命令行和屏幕输出

../autoPar -rose:C99—edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -rose:autopar:enable_patch -I../../../../sourcetree/src/frontend/SageIII -c ../../../../sourcetree/projects/autoParallelization/tests/inner_only.c

Enabling generating patch files for auto parallelization ...
Assuming all arrays used as indirect indices have unique elements (no overlapping) ...

Unparallelizable loop at line:8 due to the following dependencies:
2*2 TRUE_DEP; commonlevel = 2 CarryLevel = (0,0) Is precise SgPntrArrRefExp:b[i][j]@10:14->SgPntrArrRefExp:b[i - 1][j - 1]@10:21 == -1;* 0;||* 0;== -1;||::

Automatically parallelized a loop at line:9

输出

/* Only the inner loop can be parallelized
 */
#include "omp.h" 

void foo()
{
 int n = 100;
 int m = 100;
 double b[n][m];
 int i;
 int j;
 for (i = 0; i <= n - 1; i += 1) {
  
#pragma omp parallel for private (j) firstprivate (n,m,i)
  for (j = 0; j <= m - 1; j += 1) 
   b[i][j] = b[i - 1][j - 1];
 }
}

另一个输入

int i,j;
int a[100][100];
void foo()
{
 for (i=0;i<100;i++)
  for (j=0;j<100;j++)
   a[i][j]=a[i][j]+1;
}

输出

#include "omp.h" 
int i;
int j;
int a[100UL][100UL];

void foo()
{
 
#pragma omp parallel for private (i,j)
 for (i = 0; i <= 100 - 1; i += 1) {
  
#pragma omp parallel for private (j) firstprivate (i)
  for (j = 0; j <= 100 - 1; j += 1) 
   a[i][j] = (a[i][j] + 1);
 }
}

使用注解

[编辑 | 编辑源代码]

注解文件提供了额外的程序信息,这些信息通常难以被编译器提取,例如别名、副作用信息。

多个注解文件可以在一个命令行中加载。

../autoPar --edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -I/home/liao6/rose/freshmaster/sourcetree/src/frontend/SageIII -c -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/floatArray.annot -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/funcs.annot -annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/Index.annot /home/liao6/rose/freshmaster/sourcetree/projects/autoParallelization/tests/interp1_elem2.C

假设所有用作间接索引的数组具有唯一元素(没有重叠)...

自动并行化了第 13 行的循环。


输入

//#include <A++.h>
#include "simpleA++.h"

void interpolate1D(class floatArray &fineGrid,class floatArray &coarseGrid)
{
 int _var_0,i;
 int _var_1;
 int fineGridSize = fineGrid.length(0);
 int coarseGridSize = coarseGrid.length(0);
// Interior fine points
 class Range If(2,_var_1 = (fineGridSize - 2),2);
 class Range Ic(1,(coarseGridSize - 1),1);
 for (i = 1; i < _var_0; i += 1) {
  fineGrid.elem(i) =fineGrid.elem(i)+1;
 }
}

注解文件

输出文件

//#include <A++.h>
#include "simpleA++.h"
#include "omp.h"

void interpolate1D(class floatArray &fineGrid,class floatArray &coarseGrid)
{
 int _var_0;
 int i;
 int _var_1;
 int fineGridSize = fineGrid . length (0);
 int coarseGridSize = coarseGrid . length (0);
// Interior fine points
 class Range If(2,_var_1 = fineGridSize - 2,2);
 class Range Ic(1,coarseGridSize - 1,1);

#pragma omp parallel for private (i) firstprivate (_var_0)
 for (i = 1; i <= _var_0 - 1; i += 1) {
  fineGrid . elem (i) = fineGrid . elem (i) + 1;
 }
}

生成补丁

[编辑 | 编辑源代码]

autoPar 接受一个选项 -rose:autopar:enable_patch 来生成一个补丁文件。因此,用户首先需要检查要插入的指令。然后,用户可以将检查过的补丁应用到他们的应用程序中。

输入文件:https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/tests/jacobi_seq.c

命令行

  • autoPar -rose:C99 --edg:no_warnings -w -rose:verbose 0 --edg:restrict -rose:autopar:unique_indirect_index -rose:autopar:enable_patch -c jacobi_seq.c

屏幕输出

Enabling generating patch files for auto parallelization ...
Assuming all arrays used as indirect indices have unique elements (no overlapping) ...

Automatically parallelized a loop at line:83

Unparallelizable loop at line:84 due to the following dependencies:
1*1 OUTPUT_DEP; commonlevel = 1 CarryLevel = (0,0) Is precise SgPntrArrRefExp:f[i][0]@89:6->SgPntrArrRefExp:f[i][0]@89:6 <= -1;||::
1*1 OUTPUT_DEP; commonlevel = 1 CarryLevel = (0,0) Is precise SgPntrArrRefExp:f[i][0]@89:6->SgPntrArrRefExp:f[i][0]@89:6 <= -1;||::

Automatically parallelized a loop at line:143

Automatically parallelized a loop at line:144

Automatically parallelized a loop at line:147

Automatically parallelized a loop at line:148

Automatically parallelized a loop at line:185

Automatically parallelized a loop at line:186

生成的补丁文件:cat jacobi_seq.c.patch

diff -ar /home/liao6/workspace/autoPar/sourcetree/projects/autoParallelization/tests/jacobi_seq.c rose_jacobi_seq.c
0a1
> #include <omp.h>
82a83
> #pragma omp parallel for private (xx,yy,i,j) firstprivate (n,m)
142a143
> #pragma omp parallel for private (i,j)
143a144
> #pragma omp parallel for private (j)
146a147
> #pragma omp parallel for private (resid,i,j) reduction (+:error)
147a148
> #pragma omp parallel for private (resid,j) reduction (+:error) firstprivate (omega,ax,ay,b)
184a185
> #pragma omp parallel for private (xx,yy,temp,i,j) reduction (+:error)
185a186
> #pragma omp parallel for private (xx,yy,temp,j) reduction (+:error) firstprivate (dx,dy)

检查完补丁后,您可以将补丁应用到原始源文件

  • patch jacobi_seq.c -i jacobi_seq.c.patch -o jacobi_patched.c

检查正确性

[编辑 | 编辑源代码]

AutoPar 检查应用程序中预先存在的 OpenMP 指令,并验证它们是否正确地考虑了私有、归约和其他 OMP 数据共享属性。

示例输入

#include <stdio.h>
#include <omp.h>
 
int main(int argc, char *argv[]) {
 int N = 20;
 int total ;
 int i,j;
 
#pragma omp parallel for
 for (j=0;j<N;j++) {
  for (i=0;i<N;i++) {
   total += i+j ;
  }
 }
 printf("%d\n", total) ;
 return 0;
}

上面的代码包含一个真正的 OpenMP 错误,有人在尝试向他们的代码添加 OMP 注解时遇到了困难,并在以下地址提交:http://openmp.org/forum/viewtopic.php?f=3&t=821

运行 autoPar OpenMP 指令检查器会产生以下结果。

% autoPar -rose:unparse_tokens -rose:autopar:unique_indirect_index -rose:autopar:enable_diff -fopenmp -c OMPcheck.cc
<<<<<<<<<
user defined   : #pragma omp parallel for
--------
OMPcheck.cc:11
compiler generated:#pragma omp parallel for private (i) reduction (+:total)
>>>>>>>> 

来自 autoPar 的上述输出表明,OMP 指令缺少变量 'i' 的 OpenMP 'private' 声明,以及变量 'total' 的归约注解。

autoPar 旨在处理对原始数组进行操作的传统循环以及使用高级抽象的现代应用程序。并行化器使用以下算法:循环可能包含原始数据类型或 STL 容器类型的变量,或两者兼而有之。

1. 准备和预处理

  • (可选) 读取已知抽象和语义的规范文件。
  • (可选) 基于输入代码语义应用可选的自定义转换,例如将树遍历转换为内存池上的循环迭代。
  • (可选) 如果假定唯一索引索引语义,则打开间接数组访问的规范化(void uniformIndirectIndexedArrayRefs() )
    • indirect_loop_index = array_Y [current_loop_index] ;array_X[indirect_loop_index] ..; 变为 array_X [array_Y [current_loop_index]] ..
  • 规范化循环,包括使用迭代器的循环。
  • 对规范化的代码应用常量折叠以简化代码。
  • 查找具有规范形式(用于 omp for)的候选数组计算循环或对单个元素进行操作的循环和函数(用于 omp task)。

2. 对于每个候选者

  • 如果存在没有已知语义或副作用的函数调用,则跳过目标。
  • 调用依赖关系分析和生存期分析。
  • 对 OpenMP 变量进行分类(自动作用域)以使其成为私有变量、firstprivate 变量、lastprivate 变量、reduction 变量。识别对当前元素的引用,并找到与顺序无关的写入访问。
  • 消除与自动作用域变量相关的依赖关系,那些仅涉及当前元素的依赖关系,以及由与顺序无关的写入访问引起的输出依赖关系。
  • 如果不存在依赖关系,则插入相应的 OpenMP 结构。

算法的关键思想是捕获目标内的依赖关系,并根据各种规则尽可能地消除它们。如果不存在剩余的依赖关系,则并行化是安全的。

规范循环

[编辑 | 编辑源代码]

在第 51 页或(59/320):第 2.6 节,OpenMP 4.0(http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)中定义为

for (init-expr; test-expr; incr-expr) structured-block

init-expr 以下之一

var = lb
integer-type var = lb
random-access-iterator-type var = lb
pointer-type var = lb

test-expr 以下之一

var relational-op b
b relational-op var

relational-op 以下之一

<
<=
>
>=

incr-expr 以下之一

++var
var++
--var
var--
var += incr
var -= incr
var = var + incr
var = incr + var
var = var - incr

var 以下之一

A variable of a signed or unsigned integer type.
For C++, a variable of a random access iterator type.
For C, a variable of a pointer type.
If this variable would otherwise be shared, it is implicitly made private in the loop
construct. This variable must not be modified during the execution of the for-loop  other
than in incr-expr. Unless the variable is specified lastprivate on the loop
construct, its value after the loop is unspecified.

lb 和 b:与 var 类型兼容的循环不变表达式。

incr:循环不变整数表达式。

依赖关系分析

[编辑 | 编辑源代码]

依赖关系分析是并行化器决定循环是否可并行化的基础。autoPar 从循环优化器调用依赖关系分析,该优化器实现 [35, 36] 中提出的算法以有效地转换完全嵌套循环和非完全嵌套循环。扩展方向矩阵 (EDM) 依赖关系表示用于覆盖仅围绕两个语句之一的非通用循环嵌套,以处理非完全嵌套循环。对于循环内的数组访问,使用高斯消元算法来求解一组循环归纳变量的线性整数方程。


此步骤依赖于在以下位置实现的依赖关系分析

  • src/midend/programTransformation/loopProcessing/depGraph

在以下位置提供了简化的接口函数

https://github.com/rose-compiler/rose-develop/blob/master/projects/autoParallelization/autoParSupport.C

//Compute dependence graph for a loop, using ArrayInterface and ArrayAnnoation
 LoopTreeDepGraph* ComputeDependenceGraph(SgNode* loop, ArrayInterface*, ArrayAnnotation* annot);

副作用分析

[编辑 | 编辑源代码]

依赖关系分析依赖于从副作用分析收集的变量引用信息。

分析的源代码头文件为 midend/programTransformation/loopProcessing/depInfo/DepInfoAnal.h

  • bool AnalyzeStmtRefs( AstInterface& fa, const AstNodePtr& n, CollectObject<AstNodePtr> &wRefs, CollectObject<AstNodePtr> &rRefs);

提供了一些 SageInterface 函数来使开发人员更容易地使用此分析。


// Collect all read and write references within stmt, which can be a function, a scope statement, or a single statement. Note that a reference can be both read and written, like i++.
bool SageInterface::collectReadWriteRefs (SgStatement *stmt, std::vector< SgNode * > &readRefs, std::vector< SgNode * > &writeRefs)


//Collect unique variables which are read or written within a statement. Note that a variable can be both read and written. The statement can be either of a function, a scope, or a single line statement.
bool SageInterface::collectReadWriteVariables (SgStatement *stmt, std::set< SgInitializedName * > &readVars, std::set< SgInitializedName * > &writeVars)

// Collect read only variables within a statement. The statement can be either of a function, a scope, or a single line statement.
void SageInterface::collectReadOnlyVariables (SgStatement *stmt, std::set< SgInitializedName * > &readOnlyVars)

//Collect read only variable symbols within a statement. The statement can be either of a function, a scope, or a single line statement. 
void SageInterface::collectReadOnlySymbols (SgStatement *stmt, std::set< SgVariableSymbol * > &readOnlySymbols)

自动作用域/变量分类

[编辑 | 编辑源代码]

此步骤根据变量的 live-in(在执行循环之前)和 live-out(在执行循环之后)分析结果,对变量的数据共享属性进行分类。例如,循环内的私有变量既不是循环的 live-in,也不是 live-out,这意味着该变量在循环内立即被杀死(重新定义),然后在循环内以某种方式使用,但在循环之后在任何地方都不会被使用。所有循环索引变量也被分类为 OpenMP 私有变量以避免可能的竞态条件。另一方面,共享变量在循环开始和结束时都是活动的。Firstprivate 和 lastprivate 变量分别仅在循环开始或仅在循环结束时处于活动状态。

reduction 变量的处理方式有所不同,以最大限度地提高并行化的机会。循环内的典型 reduction 操作,例如 sum = sum + a[i],会导致循环携带的输出依赖关系、循环携带的反依赖关系和循环独立的反依赖关系。我们使用一个习语识别分析来捕获此类典型操作,并在决定循环是否可并行化时排除相关的循环携带依赖关系。


此步骤依赖于在以下位置实现的生存期分析

  • src/midend/programAnalysis/defUseAnaysis/LivenessAnalysis.cpp
  • 存在一个 SageInterface 函数来调用此分析
//!Get liveIn and liveOut variables for a for loop
void SageInterface::getLiveVariables(LivenessAnalysis * liv, SgForStatement* loop, std::set<SgInitializedName*>& liveIns, std::set<SgIn
itializedName*> & liveOuts)


算法:私有变量和 reduction 变量会导致依赖关系(被写入)。firstprivate 和 lastprivate 变量从未在循环中被写入(没有依赖关系)

       live-in   live-out
shared      Y      Y   no written, no dependences: no special handling, shared by default 
private      N      N   written (depVars), need privatization: depVars- liveIns - liveOuts 
firstprivate   Y      N   liveIns - LiveOus - writtenVariables
lastprivate    N      Y   liveOuts - LiveIns 
reduction     Y      Y   depVars Intersection (liveIns Intersection liveOuts)

识别 reduction 变量:RecognizeReduction()

 /* 
  * Algorithms:
  *  for each scalar candidate which are both live-in and live-out for the loop body
  *  and which is not the loop invariant variable.
  *  Consider those with only 1 or 2 references
  *  1 reference
  *   the operation is one of x++, ++x, x--, --x, x binop= expr
  *  2 references belonging to the same operation
  *   operations: one of x= x op expr, x = expr op x (except for subtraction)
  * Also according to the specification.
  * x is not referenced in exp
  * expr has scalar type (no array, objects etc)
  * x: scalar only, aggregate types (including arrays), pointer types and reference types may not appear in a reduction clause.
  * op is not an overloaded operator, but +, *, -, &, ^ ,|, &&, ||
  * binop is not an overloaded operator but: +, *, -, &, ^ ,| 
  *
  */

通过注释进行语义规范

[编辑 | 编辑源代码]

我们使用一个单独的文件来存储输入代码的注释。该文件本质上指定了输入代码的语义,以便工具可以利用它来并行化代码。

一些注释文件 (*.annot) 可以在项目目录中的 https://github.com/rose-compiler/rose-develop/tree/master/projects/autoParallelization/tests 中找到。

示例注释为

operator atoll(const char * val)
  {
  modify none; read{val}; alias none;
  }

class InternalIndex { has_value { stride = this.stride; base = this.base; length = this.length; } }
class Index { has_value { stride = this.stride; base = this.base; length = this.length; } }
class Range { has_value { stride = this.stride; base = this.base; length = this.length; } }

operator Index::Index( int lb, int l, int step)
   {
    modify none; read {lb, l, step}; alias none;
    restrict_value { result = { base = lb; length = l; stride = step; } };
   }
operator Range::Range( int lb, int ub, int step)
   {
    modify none; read {lb, ub, step}; alias none;
    restrict_value { result = { base = lb; length = (ub-lb+1)/step; stride = step; } };
   }
operator InternalIndex::InternalIndex( int lb, int l, int step)
   {
    modify none; read {lb, l, step}; alias none;
    restrict_value { result = { base = lb; length = l; stride = step; } };
   }

operator operator+ ( const InternalIndex & lhs, int x )
   {
    modify none; read {lhs, x}; alias none;
  restrict_value { result = {stride = lhs.stride; base = lhs.base + x;
                length = lhs.length}; };
   }
operator operator- ( const InternalIndex & lhs, int x )
   {
    modify none; read {lhs, x}; alias none;
  restrict_value { result = {stride = lhs.stride; base = lhs.base - x;
                length = lhs.length}; };
   }

数组抽象

class floatArray {
  array {
   dimension = 6;
   length(i) = this.Array_Descriptor.Array_Domain.Size[i];
   elem(i:dim:1:dimension) = this(i$dim);
   reshape(i:dim:1:dimension) = this.resize(i$dim);
   };
   array_optimize {
   define {
    float* _pointer = this.getDataPointer();
    int _size:dim:1:dimension = this.Array_Descriptor.Array_Domain.Size[dim-1];
    int _stride:dim:1:dimension = this.Array_Descriptor.Array_Domain.Stride[dim-1];
    int _length:dim:1:dimension = this.Array_Descriptor.Array_Domain.getLength(dim-1)
   }
   length(i) = _length$(i+1);
   elem(i:dim:1:dimension) =
      _pointer[i$1 + repeat(x,2,dimension, i$x * _stride$(x-1) * _size$(x-1))];
   };
   has_value { dimension = 6; length_0 = this.length(0);
         length_1 = this.length(1); length_2 = this.length(2); };
 }

operator floatArray::operator() (int index)
       {
          modify none; read {index};
          alias none;
          restrict_value { this = { dimension = 1; } };
          inline { this.elem(index) };
       }

operator floatArray::operator() ( const InternalIndex& index)
       {
          modify none; read {this, index};
          alias { (result, this) };
          restrict_value
            { this = { dimension = 1; };
             result = {dimension = 1; length_0 = index.length;}; };
          construct_array (this) {
            dimension = 1;
            length(0) = index.length;
            elem(i) = this.elem(i * index.stride + index.base);
          };
        }
operator floatArray::operator()(const InternalIndex& index1,
                const InternalIndex& index2)
       {
          modify none; read {this, index1, index2};
          alias { (result, this) };
          restrict_value
            { this = { dimension = 2; };
             result = {dimension = 2; length_0 = index1.length;
                  length_1 = index2.length};
            };
          construct_array (this) {
            dimension = 2;
            length(0) = index1.length; length(1) = index2.length;
            elem(i,j) = this.elem(i * index1.stride + index1.base,
                       j * index2.stride + index2.base);
          };
        }
operator floatArray::operator()(const InternalIndex& index1,
                const InternalIndex& index2,
                const InternalIndex& index3)
       {
          modify none; read {this, index1, index2, index3};
          alias { (result, this) };
          restrict_value
            { this = { dimension = 3; };
             result = {dimension = 3; length_0 = index1.length;
                  length_1 = index2.length; length_2 = index3.length; };
            };
          construct_array (this) {
            dimension = 3;
            length(0) = index1.length; length(1) = index2.length;
            length(2) = index3.length;
            elem(i,j,k) = this.elem(i * index1.stride + index1.base,
                       j * index2.stride + index2.base,
                       k * index3.stride + index3.base);
          };
        }

operator floatArray::operator= (const floatArray& that)
       {
          modify {this}; read {that}; alias none;
          restrict_value {
           result = { dimension = that.dimension; length_0 = that.length_0;};
          };
          modify_array (this) {
            dimension = that.dimension;
            length(i) = that.length(i);
            elem(i:dim:1:dimension) = that.elem(i$dim);
          };
        }
operator floatArray::operator= (float val)
       {
          modify {this}; read {val}; alias none;
          modify_array (this) {
            elem(i:dim:1:dimension) = val;
          };
        }

operator floatArray::getLength( int dim)
       {
          modify none; read {dim}; alias none;
          inline { this.length(dim) };
       }

依赖关系消除

[编辑 | 编辑源代码]

并非所有依赖关系都会阻止我们并行化循环。在我们的考虑中,我们排除了某些无害的依赖关系。

void DependenceElimination(SgNode* sg_node, LoopTreeDepGraph* depgraph, std::vector<DepInfo>& remainings, OmpSupport::OmpAttribute* att, std::map<SgNode*, bool> & indirect_table, ArrayInterface* array_interface/*=0*/, ArrayAnnotation* annot/*=0*/)

算法,消除以下依赖关系

  • 由局部声明的变量引起:已经是每个迭代的私有变量
  • 源变量或汇变量是线程局部变量
  • 忽略具有空源/汇节点的依赖关系
  • commonlevel ==0,没有共同的封闭循环
  • carry level !=0,循环独立,
  • 依赖关系中至少有一个数组引用,但依赖关系类型为 SCALAR_DEP 或 SCALAR_BACK_DEP
  • 由自动作用域变量引起的依赖关系(private、firstprivate、lastprivate、reduction 等)
  • 由一对间接索引数组引用引起的依赖关系,如果我们知道间接索引数组具有唯一的元素。


请注意,为什么 carry level 必须为 0 才能防止我们的分析中的并行化

  • 这是有效的,因为我们在循环的每一级上运行 dep 分析。
  • 为了考虑当前循环,仅由当前循环携带的依赖关系很重要。
  • 当前循环始终是最外层循环,carry level id 为 0。

带调试信息的示例

[编辑 | 编辑源代码]

autoPar -rose:autopar:enable_debug sourcetree/projects/autoParallelization/tests/jacobi_seq.c

搜索“此 dep 关系无法消除。保存到剩余的依赖关系集中。”以查找具有无法消除的循环携带依赖关系的循环。

// Input code from Jacobi, expected to generate
// #pragma omp parallel for private(i,j,xx,yy,temp) reduction(+:error)

185  for (i = 0; i < n; i++)
186   for (j = 0; j < m; j++)
187    {
188     xx = -1.0 + dx * (i - 1);
189     yy = -1.0 + dy * (j - 1);
190     temp = u[i][j] - (1.0 - xx * xx) * (1.0 - yy * yy);
191     error = error + temp * temp;
192    }


Debug: ComputeDependenceGraph() dumps the dependence graph for the loop at line :185
dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:xx = - 1.0 + dx *(i - 1); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
	SgVarRefExp:xx@188:2->
	SgVarRefExp:xx@188:2
== 0;* 0;||* 0;== 0;||::

dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
	SgVarRefExp:xx@188:2->
	SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::

dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
	SgVarRefExp:xx@188:2->
	SgVarRefExp:xx@190:26
== 0;* 0;||* 0;== 0;||::

dep SgExprStatement:xx = - 1.0 + dx *(i - 1); SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
	SgVarRefExp:xx@188:2->
	SgVarRefExp:xx@190:31
== 0;* 0;||* 0;== 0;||::

..

dep SgExprStatement:error = error + temp * temp; SgExprStatement:error = error + temp * temp; 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
	SgVarRefExp:error@191:2->
	SgVarRefExp:error@191:10
== 0;* 0;||* 0;<= -1;||::
...
dep SgExprStatement:error = error + temp * temp; SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
	SgVarRefExp:temp@191:25->
	SgVarRefExp:temp@190:2
== 0;* 0;||* 0;<= -1;||::

dep SgExprStatement:error = error + temp * temp; SgExprStatement:temp = u[i][j] -(1.0 - xx * xx) *(1.0 - yy * yy); 0x7fffb5d03db0 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
	SgVarRefExp:error@191:2->
	SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::
--------------------------------------------------------
Live-in variables for loop:186
error
::m
::n
Live-out variables for loop before line:193
error
::m
::n
Final Live-in variables for loop:
error
::m
::n
Final Live-out variables for loop:
error
::m
::n
Debug after CollectVisibleVaribles ():
0x7f1640ef0678 ::n
0x7f1640ef07c0 ::m
0x7f1640ef1200 ::dx
0x7f1640ef1348 ::dy
0x7f1640ef2680 j
0x7f1640ef27c8 xx
0x7f1640ef2910 yy
0x7f1640ef2a58 temp
0x7f1640ef2ba0 error

Debug after CollectVariablesWithDependence():
0x7f1640ef27c8 xx
0x7f1640ef2910 yy
0x7f1640ef2a58 temp
0x7f1640ef2ba0 error

Debug dump private:
0x7f1640ef27c8 xx
0x7f1640ef2910 yy
0x7f1640ef2a58 temp
0x7f1640ef2538 i
0x7f1640ef2680 j

Debug dump lastprivate:
Debug: A candidate used twice:error
Debug dump firstprivate:

Entering DependenceElimination ()
----------------------------------
Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
	SgVarRefExp:xx@188:2->
	SgVarRefExp:xx@188:2
== 0;* 0;||* 0;== 0;||::

Eliminating a dep relation due to scalar dep type for at least one array variable
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
	SgVarRefExp:xx@188:2->
	SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::

Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (2,2) Not precise
	SgVarRefExp:xx@188:2->
	SgVarRefExp:xx@190:26
== 0;* 0;||* 0;== 0;||::

Eliminating a dep relation due to scalar dep type for at least one array variable
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
	SgVarRefExp:yy@189:2->
	SgPntrArrRefExp:u[i][j]@190:13
* 0;* 0;||* 0;* 0;||::

..
Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
	SgVarRefExp:yy@190:44->
	SgVarRefExp:yy@189:2
== 0;* 0;||* 0;<= -1;||::

...

Eliminating a dep relation due to scalar dep type for at least one array variable
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_DEP; commonlevel = 2 CarryLevel = (0,0) Not precise
	SgPntrArrRefExp:u[i][j]@190:13->
	SgVarRefExp:error@191:2
* 0;* 0;||* 0;* 0;||::

...

Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
	SgVarRefExp:error@191:2->
	SgVarRefExp:error@191:10
== 0;* 0;||* 0;<= -1;||::

Eliminating a dep relation due to at least one autoscoped variables
0x7fffb5d03f80 Distance Matrix size:2*2 SCALAR_BACK_DEP; commonlevel = 2 CarryLevel = (1,1) Not precise
	SgVarRefExp:temp@191:18->
	SgVarRefExp:temp@190:2
== 0;* 0;||* 0;<= -1;||::


Exiting DependenceElimination ()
----------------------------------

Automatically parallelized a loop at line:185
debug patch generation: attaching OMP att to sg_node SgForStatement at line 185

注释编码语义

[编辑 | 编辑源代码]

用于编码类型(类)及其操作(运算符)的语义信息

源文件:https://github.com/rose-compiler/rose-develop/tree/master/src/midend/astUtil/annotation

// annotations are separated by ;
<annot> ::= <annot1> 
      | <annot1>;<annot>

// two main categories: class and operator 
<annot1> ::= class <cls annot>
     | operator <op annot>

//---------class------------
// If this is an array-like class? 
// e.g: class MyArray:
<cls annot> ::= <clsname>:<cls annot1>;
<cls annot1>::= <cls annot2> 
       | <cls annot2> <cls annot1>
//from bool TypeDescriptor:: read(istream& in) in AnnotDescriptors.C
// type name can have const modifier, with qualifiers, and of reference or pointer type
<clasname>::= [const] name1[::name2[...]][&|*]
 
// array information, inheritable or not
<cls annot2>::= <arr annot>
       | inheritable <arr annot>
       | has-value { <val def> } 

//Array information: dimension, length, element, reshape 
<arr annot>::= is-array{ <arr def>}
       | is-array{define{<stmts>}<arr def>}  // define loop-invariant operations that should be executed before enumerating array elements

<arr def> ::= <arr attr def> 
       | <arr attr def> <arr def>

<arr attr def> ::= <arr attr>=<expression>;
<arr attr> ::= dim      // at most 'dim' dimensions
       | len (<param>) // length of each dimension: parameter is used to specify which dimension
       | elem(<paramlist>) //i$x:0:dim-1 ==> $x is from 0 to dim-1 ==> list of parameters i_0, i_1, i_dim-1
       | reshape(<paramlist>)

//Data members for types/classes
//Could have expressions
<val def> ::= <name>; 
       | <name>;<val def>
       | <name> = <expression> ;
       | <name> = <expression> ; <val def>

//---------operator------------
// side effects: mod, read, alias
<op annot> ::= <opdecl> : <op annot1> ;
<op annot1> ::=<op annot2> 
       | <op annot2> <op annot1>
<op annot2> ::= modify <namelist>         // side effects on scalar:e.g. modify none
       | new-array (<aliaslist>){<arr def>} // side effects on arrays: creational 
       | modify-array (<name>) {<arr def>} // side effects on arrays: modifying only
       | restrict-value {<val def list>}  // How to get member values from this operation's semantics( may be from parameters values)
       | read <namelist>          // e.g.: read {a,b,c}
       | alias <nameGrouplist>       // e.g.: alias none
       | allow-alias <nameGrouplist>
       | inline <expression>        // this operator is semantically equal to another operator/expression, used to substitute an operation with another

// from OperatorDescriptors.h NameGropu, OperatorAliasDescriptor
<nameGrouplist> e.g: {(x,y,z), (a,b), (m,n)}

数组和数组优化(重写为原始数组)

class floatArray {

  array { // high level array abstraction with range from 1 to dimension
   dimension = 6; // max dimension
   length(i) = this.Array_Descriptor.Array_Domain.Size[i]; // length of each dimension
//i:dim:1:dimension/ i$dim means a list of parameters i_1, i_2, .., i_dimension?
// i:dim:1:dimension also defines a range variable dim, which is reused later in i$dim
   elem(i:dim:1:dimension) = this(i$dim); 
   reshape(i:dim:1:dimension) = this.resize(i$dim);
   };

   array_optimize { // low level array with range from 0 to dimension-1
   define {
    float* _pointer = this.getDataPointer();
    int _size:dim:1:dimension = this.Array_Descriptor.Array_Domain.Size[dim-1];
    int _stride:dim:1:dimension = this.Array_Descriptor.Array_Domain.Stride[dim-1];
    int _length:dim:1:dimension = this.Array_Descriptor.Array_Domain.getLength(dim-1)
   }
   length(i) = _length$(i+1);
   elem(i:dim:1:dimension) =
      _pointer[i$1 + repeat(x,2,dimension, i$x * _stride$(x-1) * _size$(x-1))];
   };

   has_value { dimension = 6; length_0 = this.length(0); 
         length_1 = this.length(1); length_2 = this.length(2); };
 }

dump ()--------------
arrays: 

 floatArray : {dimension=6; length=((i:::):[](.(.(.(this,Array_Descriptor),Array_Domain),Size),i)); elem=((i:dim:1:dimension):this($(i,dim))) } reshape = ((i:dim:1:dimension):FunctionPtrCall(.(this,resize),$(i,dim)))

array optimizations: 

 floatArray : 
{float* _pointer:::=FunctionPtrCall(.(this,getDataPointer));
int _size:dim:1:dimension=[](.(.(.(this,Array_Descriptor),Array_Domain),Size),+(-1dim));
int _stride:dim:1:dimension=[](.(.(.(this,Array_Descriptor),Array_Domain),Stride),+(-1dim));
int _length:dim:1:dimension=FunctionPtrCall(.(.(.(this,Array_Descriptor),Array_Domain),getLength),+(-1dim))
}

{dimension=; length=((i:::):$(_length,+(1i))); elem=((i:dim:1:dimension):[](_pointer,+($(i,1)repeat(x,2,dimension,*($(i,x)$(_stride,+(-1x))$(_size,+(-1x))))))) 
} 


函数副作用注释:读、修改和别名的变量集

operator VectorXY::VectorXY(double xx, double yy)
{
 modify none; read {xx,yy}; 
}


// side effects annotation 
operator floatArray::operator() ( const InternalIndex& index) 
       {
          modify none; read {this, index};
        }
// Dump(): mangled_function_name: (all variables): {modified variable }
 floatArray::operator()_constInternalIndex& : (this,index):{}

// another annotation, side effects, alias, value
operator Index::Index( int lb, int l, int step) 
   {
    modify none; read {lb, l, step}; alias none; 
    restrict_value { result = { base = lb; length = l; stride = step; } };
   }

// Dump() 
read: 
 Index::Index_int_int_int : (lb,l,step):{l,lb,step}


// allow_alias: used only for a function operating on two or more parameters? tell if parameters can be alias to each other?

operator interpolate2D(floatArray & fineGrid,floatArray & coarseGrid) 
  {
  allow_alias none;
  }

has_value 和 restrict _value 注释:“this” 和 “result” 是什么?

// has_value: Only used for within class annotation, 
class floatArray {
..
   has_value { dimension = 6; length_0 = this.length(0); 
         length_1 = this.length(1); length_2 = this.length(2); };
class Range { 
   has_value { stride = this.stride; base = this.base; length = this.length; } 
}


//Dump:
floatArray : {dimension:6;length_0:FunctionPtrCall(.(this,length),0);length_1:FunctionPtrCall(.(this,length),1);length_2:FunctionPtrCall(.(this,length),2)}
}

 Range : {base:.(this,base);length:.(this,length);stride:.(this,stride)}


// restrict_value: infer the value of member variable from the semantics of current operations: used for operation annotations. could have inferred information for one "this" and multiple "result" depending on function parameters

operator floatArray::operator() (int index) 
       { ..
 restrict_value { this = { dimension = 1; } }; // this element access operator using one integer parameter can tell us that the current object (this) has a dimension value equaling to 1
}

operator floatArray::operator()(const InternalIndex& index1,
                const InternalIndex& index2,
                const InternalIndex& index3) 
       {
         restrict_value
            { this = { dimension = 3; };
             result = {dimension = 3; length_0 = index1.length;
                  length_1 = index2.length; length_2 = index3.length; };
            };

}

C 函数注释

[编辑 | 编辑源代码]
operator sqrt( double val)
  {
   modify none; read{val}; alias none;
  }

operator sqrt( float val)
  {
   modify none; read{val}; alias none;
  }

operator abs(int val)
  {
  modify none; read{val}; alias none;
  }


出版物

[编辑 | 编辑源代码]
  • 旧技术报告:廖春华、丹尼尔·J·昆兰、杰里米·J·威尔科克和托马斯·帕纳斯。基于 STL 语义的 OpenMP 自动并行化。技术报告,劳伦斯利弗莫尔国家实验室 (LLNL),利弗莫尔,加利福尼亚州(美国),2008 年。
  • 一篇研讨会论文:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,扩展自动并行化以优化多核的高级抽象,2009 年 5 月国际 OpenMP 研讨会论文集:在极端并行时代发展 OpenMP(德国德累斯顿,2009 年 6 月 3 日至 5 日)。pdf
  • 论文的期刊版本:Chunhua Liao、Daniel J. Quinlan、Jeremiah J. Willcock 和 Thomas Panas,使用高级抽象进行现代应用程序的语义感知自动并行化,并行编程杂志,2010 年 1 月接受 pdf
华夏公益教科书