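ROSE Compiler Framework/Outliner
Basic concept: outlining replaces a consecutive block of statements with a call to a new function that contains those statements. Conceptually, outlining is the inverse of inlining.
Uses: outlining is widely used to generate kernel functions to be executed on CPUs and/or GPUs.
- It helps implement programming models such as OpenMP.
- It supports empirical tuning of a code portion by first generating a function from that portion.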
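As a minimal hand-written sketch of the concept (not output produced by ROSE), the C++ fragment below shows a loop and an outlined counterpart. The names sumInline, sumOutlined, and outlinedLoop are hypothetical and chosen only for illustration. Variables used by the loop are passed by address, similar in spirit to the generated code shown later on this page.

```cpp
#include <cstddef>

// Original form: the loop lives directly inside the function.
int sumInline(const int* data, std::size_t n)
{
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}

// Outlined form (hypothetical, written by hand): the loop has been moved
// into its own function. Variables used by the loop are passed by address
// so that updates made inside the outlined function reach the caller.
static void outlinedLoop(const int** datap, std::size_t* np, int* totalp)
{
  for (std::size_t i = 0; i < *np; ++i)
    *totalp += (*datap)[i];
}

int sumOutlined(const int* data, std::size_t n)
{
  int total = 0;
  outlinedLoop(&data, &n, &total);  // the call replaces the original loop
  return total;
}
```

Both functions compute the same result; only the location of the loop body differs.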
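ROSE provides a built-in translator, the AST outliner, which can outline a specified portion of code and generate a function from it.
The official documentation for the AST outliner is Chapter 37, "Using the AST Outliner", of the ROSE Tutorial (PDF).
There are two main ways to use the outliner.
- Command-line method: use the outline command with options to specify the outlining targets. There are two ways to specify the code portions to be outlined:
  - Use a special pragma in the input program to mark the outlining targets, then call a high-level driver routine to process the pragmas.
  - Use abstract handle strings on the command line (described in detail in Chapter 46 of the ROSE Tutorial).
- Function-call method: call the "low-level" outlining routines, which operate directly on the AST nodes to be outlined.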
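To set up ROSE, follow the instructions at https://github.com/rose-compiler/rose/wiki/How-to-Set-Up-ROSE
If you install from source, see https://github.com/rose-compiler/rose/wiki/Install-Rose-From-Source
To install only this tool, type
- make install -C tests/nonsmoke/functional/roseTests/astOutliningTests
The outliner tool will be installed as
- ROSE_INST/bin/outline
The tool rose/bin/outline relies on either 1) pragmas in the input code or 2) abstract handles given as command-line options to find the code portions to be outlined.
- Pragma: put #pragma rose_outline immediately before the code portion to be outlined in the input code
- Abstract handle: -rose:outline:abstract_handle your_handle_string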
./outline --help | more
Outliner-specific options
Usage: outline [OPTION]... FILENAME...
Main operation mode:
-rose:outline:preproc-only preprocessing only, no actual outlining
-rose:outline:abstract_handle handle_string using an abstract handle to specify an outlining target
-rose:outline:parameter_wrapper use an array of pointers to pack the variables to be passed
-rose:outline:structure_wrapper use a data structure to pack the variables to be passed
-rose:outline:enable_classic use parameters directly in the outlined function body without transferring statement, C only
-rose:outline:temp_variable use temp variables to reduce pointer dereferencing for the variables to be passed
-rose:outline:enable_liveness use liveness analysis to reduce restoring statements if temp_variable is turned on
-rose:outline:new_file use a new source file for the generated outlined function
-rose:outline:output_path the path to store newly generated files for outlined functions, if requested by new_file. The original source file's path is used by default.
-rose:outline:exclude_headers do not include any headers in the new file for outlined functions
-rose:outline:use_dlopen use dlopen() to find the outlined functions saved in new files.It will turn on new_file and parameter_wrapper flags internally
-rose:outline:copy_orig_file used with dlopen(): single lib source file copied from the entire original input file. All generated outlined functions are appended to the lib source file
-rose:outline:enable_debug run outliner in a debugging mode
-rose:outline:select_omp_loop select OpenMP for loops for outlining, used for testing purpose
- outline test.cpp // outline code portions in test.cpp; the portions are marked with the special rose_outline pragma
- outline -rose:skipfinalCompileStep -rose:outline:new_file test.cpp // skip compiling the generated rose_? file and put the generated function into a new file
With abstract handles on the command line, pragmas no longer need to be inserted into the input code:
- outline -rose:outline:abstract_handle "ForStatement<position,12>" test3.cpp // outline the for loop at line 12 of test3.cpp
- outline -rose:outline:abstract_handle "FunctionDeclaration<name,initialize>::ForStatement<numbering,2>" test2.cpp // outline the 2nd for loop inside the function named "initialize" in test2.cpp
/home/liao6/workspace/masterDevClean/buildtree/tests/roseTests/astOutliningTests/outline -rose:outline:new_file -rose:outline:temp_variable -rose:outline:exclude_headers -rose:outline:abstract_handle 'ForStatement<numbering,1>' -c /home/liao6/workspace/masterDevClean/sourcetree/tests/roseTests/astOutliningTests/complexStruct.c
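You can build your own translator that leverages the outlining support in ROSE. The programming API is defined in
- Header files: src/midend/programTransformation/astOutlining/
- Namespace: Outliner
Several functions and options are provided
- Functions: Outliner::outline(), Outliner::isOutlineable()
- Options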
Outliner.cc
namespace Outliner {
//! A set of flags to control the internal behavior of the outliner
bool enable_classic=false;
// use a wrapper for all variables or one parameter for a variable or a wrapper for all variables
bool useParameterWrapper=false; // use an array of pointers wrapper for parameters of the outlined function
bool useStructureWrapper=false; // use a structure wrapper for parameters of the outlined function
bool preproc_only_=false; // preprocessing only
bool useNewFile=false; // generate the outlined function into a new source file
bool copy_origFile=false; // when generating the new file to store outlined function, copy entire original file to it.
bool temp_variable=false; // use temporary variables to reduce pointer dereferencing
bool enable_liveness =false;
bool enable_debug=false; //
bool exclude_headers=false;
bool use_dlopen=false; // Outlining the target to a separated file and calling it using a dlopen() scheme. It turns on useNewFile.
std::string output_path=""; // default output path is the original file's directory
std::vector<std::string> handles; // abstract handles of outlining targets, given by command line option -rose:outline:abstract_handle for each
// DQ (3/19/2019): Suppress the output of the #include "autotuning_lib.h" since some tools will want to define their own supporting libraries and header files.
bool suppress_autotuning_header = false; // when generating the new file to store outlined function, suppress output of #include "autotuning_lib.h".
};
The outliner uses three functions to find the code portions to be outlined
- collectPragms() for C/C++
- collectFortranTarget() for Fortran
- collectAbstractHandles() using abstract handles
Top-level driver of the outliner: PragmaInterface.cc
- Outliner::outlineAll (SgProject* project)
  - collectPragms() for C/C++, collectFortranTarget() for Fortran, or collectAbstractHandles() using abstract handles
  - outline(SgPragmaDeclaration)
    - outline(SgStatement, func_name)
      - preprocess(s)
      - outlineBlock (s_post, func_name) // Transform.cc, the main function here!
    - deleteAST(SgPragmaDeclaration)
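Checking whether an SgNode is eligible for outlining
- Outliner::isOutlineable() src/Check.cc:251
- checkType() // only the specified SgNode types can be outlined; a list is maintained here
- Excludes SgVariableDeclaration
- The node must be enclosed in a function declaration
- Excludes template instantiation (member) function declarations
- The node must not refer to hidden types...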
There are two phases: preprocessing and the actual transformation.
- SgBasicBlock* s_post = preprocess (s);
- SgStatement * processPragma (SgPragmaDeclaration* decl) // checks whether it is an outlining pragma (#pragma rose_outline); if so, returns the next statement
- Outliner::preprocess(SgStatement);
  - SgBasicBlock * Outliner::Preprocess::preprocessOutlineTarget (SgStatement* s)
    - normalizeVarDecl()
    - createBlock()
    - Outliner::Preprocess::transformPreprocIfs
    - Outliner::Preprocess::transformThisExprs
    - Outliner::Preprocess::transformNonLocalControlFlow
    - Outliner::Preprocess::gatherNonLocalDecls(); // function declarations are copied here, e.g. test2005_179.C
Outliner::outline(stmt) --> generateFuncName(s) generates a unique function name --> Outliner::outline (stmt, func_name)
- Outliner::Transform::outlineBlock (s_post, func_name); // Transform.cc
  - Outliner::Transform::collectVars (s, syms); // collect the variables to be passed
  - Outliner::generateFunction() // generate an outlined function, src/midend/programTransformation/astOutlining/GenerateFunc.cc
    - createFuncSkeleton()
    - moveStatementsBetweenBlocks (s, func_body); // move the statements from the source basic block into the function body
    - variableHandling (syms, func, vsym_remap); // add unpacking statements
      - createParam() // create parameters
      - createUnpackDecl() // create unpacking statements: int local = parameter, from src/midend/programTransformation/astOutlining/GenerateFunc.cc
      - createPackStmt() // copy local variables back to the parameters after all local computation
    - remapVarSyms (vsym_remap, func_body); // variable substitution
  - insert() from Insert.cc // insert the outlined function and its prototype
    - insertFriendDecls()
    - insertGlobalPrototype()
      - GlobalProtoInserter::insertManually ()
        - generatePrototype()
  - generateCall() // generate a call to the outlined function
  - ASTtools::replaceStatement () // replace the original portion with the call
Call stack
#0 Outliner::generateFunction (s=0x7fffe849a990, func_name_str="OUT__1__11770__", syms=..) at ../../../sourcetree/src/midend/programTransformation/astOutlining/GenerateFunc.cc:1283
#1 0x00007ffff65bd93e in Outliner::outlineBlock (s=0x7fffe849a990, func_name_str="OUT__1__11770__") at ../../../sourcetree/src/midend/programTransformation/astOutlining/Transform.cc:310
#2 0x00007ffff6589b09 in Outliner::outline (s=0x7fffe849a990, func_name="OUT__1__11770__") at ../../../sourcetree/src/midend/programTransformation/astOutlining/Outliner.cc:166
#3 0x00007ffff65907f9 in Outliner::outline (decl=0x7fffe87a2310) at ../../../sourcetree/src/midend/programTransformation/astOutlining/PragmaInterface.cc:141
#4 0x00007ffff65911b8 in Outliner::outlineAll (project=0x7fffebc38010) at ../../../sourcetree/src/midend/programTransformation/astOutlining/PragmaInterface.cc:355
#5 0x000000000040c84f in main (argc=12, argv=0x7fffffffae38) at ../../../../../../sourcetree/tests/nonsmoke/functional/roseTests/astOutliningTests/outline.cc:51
For a C++ code block to be outlined, the outliner must check accesses to private members and add the necessary friend function declarations.
Call chain for creating them, all in Insert.cc
- Outliner::insert (SgFunctionDeclaration* func, SgGlobal* scope, SgBasicBlock* target_outlined_code )
  - insertFriendDecls (SgFunctionDeclaration* func, SgGlobal* scope, FuncDeclList_t& friends) // what is func here?
    - insertFriendDecl (const SgFunctionDeclaration* func, SgGlobal* scope, SgClassDefinition* cls_def)
      - generateFriendPrototype (const SgFunctionDeclaration* full_decl, SgScopeStatement* scope, SgScopeStatement* class_scope) Insert.cc
Algorithm of insertFriendDecls (SgFunctionDeclaration* func, SgGlobal* scope, FuncDeclList_t& friends)
For the outlined function
- Use isProtPrivMember (func) to find references to private class variables
- Use isProtPrivMember (f_ref) to find references to private class member functions
- Save the relevant class definitions into a list
If the outlined function is created in a new source file, the outliner also copies the dependent declarations into the new source file. The function used is SageInterface::appendStatementWithDependentDeclaration(func,glob_scope,func_orig,exclude_headers);
The code that calls this function is at line 636 of the source file: https://github.com/rose-compiler/rose/blob/weekly/src/midend/programTransformation/astOutlining/Transform.cc
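Variable handling
The variable handling process finds the variables used in a code block and decides how to pass them into and out of the outlined function. It relies on several program analyses to obtain the best results.
- Scope analysis (in CollectVars.cc): decides which variables should be passed as function parameters, using the visibility of each variable declaration relative to the location of the outlined function. If the original declaration is visible to the outlined function, the variable does not need to be passed as a parameter.
- collectPointerDereferencingVar: finds the variables that should be accessed through pointer dereferencing in the outlined function (in VarSym.cc): ASTtools::collectPointerDereferencingVarSyms(s,pdSyms);
- Side-effect analysis: SageInterface::collectReadOnlyVariables(s,readOnlyVars);
- Liveness analysis: SageInterface::getLiveVariables(liv, isSgForStatement(firstStmt), liveIns, liveOuts);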
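Scope analysis (implemented in CollectVars.cc, as noted above) uses the following notation for variable sets and set operations to determine which variables should be passed as function parameters:
- U: the set of variables used in the code block (s) to be outlined
- L: the local variables declared within s
- U-L: the variables that should be passed into or out of the outlined function as function parameters
- Q: the variables defined in the function enclosing s and visible at s, but not declared globally outside the enclosing function. If the outlined function is placed in the same file, global variables should not be passed as parameters.
- (U-L) Intersect Q: the variables to be passed to the outlined function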
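The set notation above can be sketched with standard C++ containers. This is only an illustration under the assumption that variables are identified by name; the real implementation works on ROSE symbol objects, and the names VarSet, setDifference, setIntersection, and variablesToPass are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical model: a variable is identified by its name.
using VarSet = std::set<std::string>;

// Returns a - b (elements of a that are not in b).
VarSet setDifference(const VarSet& a, const VarSet& b)
{
  VarSet out;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::inserter(out, out.end()));
  return out;
}

// Returns the intersection of a and b.
VarSet setIntersection(const VarSet& a, const VarSet& b)
{
  VarSet out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
  return out;
}

// (U - L) Intersect Q: the variables to pass to the outlined function.
VarSet variablesToPass(const VarSet& U, const VarSet& L, const VarSet& Q)
{
  return setIntersection(setDifference(U, L), Q);
}
```

For example, with U = {g, j, n, number, temp}, L = {j}, and Q = {n, number, start, temp, total}, the result is {n, number, temp}: the loop-local j and the global g are not passed.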
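ASTtools::collectPointerDereferencingVarSyms (): collects the variables (pdSyms) that are replaced with pointer dereferences in the outlined function
- pdSyms = useByAddressVars + Non-assignableVars + Struct/ClassVars
- Use-by-address analysis: collectVarRefsUsingAddress(s, varSetB); for example, &a
- Non-assignable variable analysis: collectVarRefsOfTypeWithoutAssignmentSupport(s,varSetB); variables whose types do not support assignment
- Class/struct variables: passing them by reference is more efficient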
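calculateVariableRestorationSet(): determines whether some variables need to be restored from their clones at the end of the outlined function. It is used only when variable cloning is turned on.
- Check each function parameter
- If isWritten && isLiveOut, the parameter should be restored: it is changed in the outlined function and used after the outlined function.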
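The restoration rule can be sketched as follows. This is a simplified model, not the ROSE implementation: variables are represented by name, a parameter is assumed to be written when it is absent from the read-only set, and the function name restorationSet is hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical model: a variable is identified by its name.
using VarSet = std::set<std::string>;

// A clone is copied back only if the outlined code may write the variable
// (assumed here: it is not in readOnlyVars) and the value is still needed
// after the outlined region (it is in liveOuts).
VarSet restorationSet(const VarSet& params,
                      const VarSet& readOnlyVars,
                      const VarSet& liveOuts)
{
  VarSet restoreVars;
  for (const std::string& v : params) {
    const bool isWritten = readOnlyVars.count(v) == 0;
    const bool isLiveOut = liveOuts.count(v) != 0;
    if (isWritten && isLiveOut)
      restoreVars.insert(v);
  }
  return restoreVars;
}
```

A variable that is written but dead after the region, or live but never written, needs no copy-back statement, which is how liveness and side-effect information reduce the generated code.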
Transform.cc
/**
* Major work of outlining is done here
* Preparations: variable collection
* Generate outlined function
* Replace outlining target with a function call
* Append dependent declarations,headers to new file if needed
*/
Outliner::Result
Outliner::outlineBlock (SgBasicBlock* s, const string& func_name_str)
{
...
SgClassDeclaration* struct_decl = NULL;
if (Outliner::useStructureWrapper)
{
struct_decl = generateParameterStructureDeclaration (s, func_name_str, syms, pdSyms, glob_scope);
ROSE_ASSERT (struct_decl != NULL);
}
std::set<SgInitializedName*> restoreVars;
calculateVariableRestorationSet (syms, readOnlyVars,liveOuts,restoreVars);
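Advanced features
Some details of outlining can be controlled through command-line options or the internal flags of the programming API.
List
- Wrap all variables into a data structure: Outliner::useStructureWrapper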
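To give a rough idea of what a structure wrapper means, the hand-written sketch below packs the addresses of the passed variables into one structure and hands a single pointer to the outlined function. It is not actual ROSE output: the structure layout, the choice to store addresses, and the names OutArgs, kNames, outlinedBody, and runWithStructWrapper are all assumptions made for illustration.

```cpp
// Hypothetical wrapper structure holding the addresses of all passed variables.
struct OutArgs {
  int* value_p;
  const char** name_p;
};

static const char* kNames[3] = {"a", "b", "c"};

// Outlined function: receives one pointer to the wrapper structure
// and reaches every variable through it.
static void outlinedBody(void* out_argv)
{
  OutArgs* args = static_cast<OutArgs*>(out_argv);
  *args->value_p = 7;
  *args->name_p = kNames[*args->value_p % 3];
}

// Caller side: fill the structure, then call the outlined function.
int runWithStructWrapper(const char** nameOut)
{
  int value = 0;
  const char* name = nullptr;
  OutArgs args = {&value, &name};
  outlinedBody(&args);
  *nameOut = name;
  return value;
}
```

Compared with the array-of-pointers wrapper shown in the examples on this page, a structure keeps the member types, so the outlined function needs only one cast.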
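Variable cloning
The option that enables this feature
- -rose:outline:temp_variable use temp variables to reduce pointer dereferencing for the variables to be passed
The purpose of this feature is to reduce pointer dereferencing in the code block so that the block can be optimized more easily. The transformation copies the values into local variables and uses those local variables in the computation. Afterwards, the values of the local variables are copied back through the pointers.
Example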
// input code
#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
int main (void)
{
// split variable declarations with their initializations, as a better demo for the outliner
const char *soupName;
int value;
#pragma rose_outline
{
value = rand();
soupName = abc_soups[value % 10];
}
printf ("Here are your %d, %s soup\n", value, soupName);
return 0;
}
// without variable cloning
#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__12274__(void **__out_argv);
int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
const char *soupName;
int value;
void *__out_argv1__12274__[2];
__out_argv1__12274__[0] = ((void *)(&value));
__out_argv1__12274__[1] = ((void *)(&soupName));
OUT__1__12274__(__out_argv1__12274__);
printf("Here are your %d, %s soup\n",value,soupName);
return 0;
}
static void OUT__1__12274__(void **__out_argv)
{
const char **soupName = (const char **)__out_argv[1];
int *value = (int *)__out_argv[0];
*value = rand(); // pointer dereferencing is used in the computation
*soupName = abc_soups[ *value % 10];
}
// With variable cloning
#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__12274__(void **__out_argv);
int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
const char *soupName;
int value;
void *__out_argv1__12274__[2];
__out_argv1__12274__[0] = ((void *)(&value));
__out_argv1__12274__[1] = ((void *)(&soupName));
OUT__1__12274__(__out_argv1__12274__);
printf("Here are your %d, %s soup\n",value,soupName);
return 0;
}
static void OUT__1__12274__(void **__out_argv)
{
const char *soupName = *((const char **)__out_argv[1]);
int value = *((int *)__out_argv[0]); // local variable, original type, (not pointer type)
value = rand(); // local variable in computation.
soupName = abc_soups[value % 10];
*((const char **)__out_argv[1]) = soupName;
*((int *)__out_argv[0]) = value;
}
Type of the local variable
SgType* local_type = NULL;
if( SageInterface::is_Fortran_language( ) )
  local_type= orig_var_type;
else if( Outliner::temp_variable || Outliner::useStructureWrapper )
// unique processing for C/C++ if temp variables are used
{
  if( isPointerDeref || ( !isPointerDeref && is_array_parameter ) )
  {
    // Liao 3/11/2015. For a parameter of a reference type, we have to specially tweak the unpacking statement
    // It is not allowed to create a pointer to a reference type. So we use a pointer to its raw type (stripped reference type) instead.
    // use pointer dereferencing for some
    if (SgReferenceType* rtype = isSgReferenceType(orig_var_type))
      local_type = buildPointerType(rtype->get_base_type());
    else
      local_type = buildPointerType(orig_var_type);
  }
  else // use variable clone instead for others
    local_type = orig_var_type;
}
else // all other cases: non-fortran, not using variable clones
{
  if( is_C_language( ) )
  {
    // we use pointer types for all variables to be passed
    // the classic outlining will not use unpacking statement, but use the parameters directly.
    // So we can safely always use pointer dereferences here
    local_type = buildPointerType( orig_var_type );
  }
  else // C++ language
  // Rich's idea was to leverage C++'s reference type: two cases:
  // a) for variables of reference type: no additional work
  // b) for others: make a reference type to them
  // all variable accesses in the outlined function will have
  // access the address of the by default, not variable substitution is needed
  {
    local_type = isSgReferenceType( orig_var_type ) ? orig_var_type
                                                    : SgReferenceType::createType( orig_var_type );
  }
}
Transform.cc: collecting the variables to restore
std::set<SgInitializedName*> restoreVars;
calculateVariableRestorationSet (syms, readOnlyVars,liveOuts,restoreVars);
dlopen
The use_dlopen option tells the outliner to use dlopen() to find and call outlined functions stored in a dynamically loaded library.
This option turns on several other options (in Outliner::validateSettings() in Outliner.cc)
- -rose:outline:exclude_headers
- useNewFile= true;
- useParameterWrapper = true;
- temp_variable = true;
Compilation and linking instructions, assuming the input file is ft.c
- outline -rose:outline:use_dlopen -I/home/liao6/workspace/outliner/build/../sourcetree/projects/autoTuning -c /path/to/ft.c
  - This step generates two files
    - rose_ft.c: the original ft.c file is transformed into this file
    - rose_ft_lib.c: the outlined functions, which go into the shared library
- Build a .so file from rose_ft_lib.c
  - gcc -I. -g -fPIC -c rose_ft_lib.c
  - gcc -g -shared rose_ft_lib.o -o rose_ft_lib.so
  - cp rose_ft_lib.so /tmp/.
- Link everything together
  - The object file should be linked with libautoTuning.a, which is built from projects/autoTuning/autotuning_lib.c; autotuning_lib.c in turn defines findFunctionUsingDlopen().
  - gcc -o a.out rose_ft.o /roseInstallPath/lib/libautoTuning.a -Wl,--export-dynamic -g -ldl -lm
A complete example using dlopen can be found at the following location
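Testing
The ROSE AST outliner has a dedicated test directory: rose/tests/nonsmoke/functional/roseTests/astOutliningTests
- Several C, C++, and Fortran test input files are provided.
- Example command-line options are given in the Makefile.am file in that test directory.
A complete command-line example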
- /home/liao6/workspace/rose/buildtree/tests/nonsmoke/functional/roseTests/astOutliningTests/outline -rose:outline:use_dlopen -rose:outline:temp_variable -I/home/liao6/workspace/rose/buildtree/../sourcetree/projects/autoTuning -rose:outline:exclude_headers -rose:outline:output_path . -c /home/liao6/workspace/rose/sourcetree/tests/nonsmoke/functional/roseTests/astOutliningTests/array1.c
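To trigger a single test, assuming the input file is named inputFile.c
- make classic_inputFile.c.passed // classic behavior
- make dlopen_inputFile.c.passed // dlopen feature
As you can see, the prefix indicates which outliner options are used.
Example inputs and outputs
As a standalone tool
Input file, using a pragma to mark the code portion to be outlined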
int main()
{
double n, start=1, total;
double unlucky=0, lucky;
double *number;
scanf("%lf",&n);
total = 9;
for(int j =1; j < n; j++)
{
total = total * 10;
start = start *10;
}
number = (double*)malloc(n * sizeof(double));
for(double i = start; i < start*10; i++)
{
double temp = i;
#pragma rose_outline
for(int j = 1; j<= n; j++)
{
number[j]=(int)temp%10;
temp = temp/10;
}
for(int k = n; k>=1; k--)
{
if(number[k] == 1 && number[k-1] == 3){
unlucky++;
break;
}
}
}
lucky = total - unlucky;
printf("there are %f lucky integers in %f digits integers", lucky, n);
return 0;
}
//------------output file is
static void OUT__1__2222__(double *np__,double **numberp__,double *tempp__);
int main()
{
double n;
double start = 1;
double total;
double unlucky = 0;
double lucky;
double *number;
scanf("%lf",&n);
total = 9;
for (int j = 1; j < n; j++) {
total = total * 10;
start = start * 10;
}
number = ((double *)(malloc(n * (sizeof(double )))));
for (double i = start; i < start * 10; i++) {
double temp = i;
OUT__1__2222__(&n,&number,&temp);
for (int k = n; k >= 1; k--) {
if (number[k] == 1 && number[k - 1] == 3) {
unlucky++;
break;
}
}
}
lucky = total - unlucky;
printf("there are %f lucky integers in %f digits integers",lucky,n);
return 0;
}
static void OUT__1__2222__(double *np__,double **numberp__,double *tempp__)
{
double *n = (double *)np__;
double **number = (double **)numberp__;
double *temp = (double *)tempp__;
for (int j = 1; j <= *n; j++) {
( *number)[j] = (((int )( *temp)) % 10);
*temp = *temp / 10;
}
}
char* type
Input
#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
int main (void)
{
// split variable declarations with their initializations, as a better demo for the outliner
int abc_numBowls;
const char *abc_soupName;
int numBowls;
const char *soupName;
#pragma rose_outline
{
abc_numBowls = rand () % 10;
abc_soupName = abc_soups[rand () % 10];
numBowls = abc_numBowls;
soupName = abc_soupName;
}
printf ("Here are your %d bowls of %s soup\n", numBowls, soupName);
printf ("-----------------------------------------------------\n");
return 0;
}
outline --edg:no_warnings -rose:verbose 0 -rose:outline:parameter_wrapper -rose:detect_dangling_pointers 1 -c input.cpp
Output file
#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__11770__(void **__out_argv);
int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
int abc_numBowls;
const char *abc_soupName;
int numBowls;
const char *soupName;
void *__out_argv1__11770__[4];
__out_argv1__11770__[0] = ((void *)(&soupName));
__out_argv1__11770__[1] = ((void *)(&numBowls));
__out_argv1__11770__[2] = ((void *)(&abc_soupName));
__out_argv1__11770__[3] = ((void *)(&abc_numBowls));
OUT__1__11770__(__out_argv1__11770__);
printf("Here are your %d bowls of %s soup\n",numBowls,soupName);
printf("-----------------------------------------------------\n");
return 0;
}
static void OUT__1__11770__(void **__out_argv)
{
int &abc_numBowls = *((int *)__out_argv[3]);
const char *&abc_soupName = *((const char **)__out_argv[2]);
int &numBowls = *((int *)__out_argv[1]);
const char *&soupName = *((const char **)__out_argv[0]);
abc_numBowls = rand() % 10;
abc_soupName = abc_soups[rand() % 10];
numBowls = abc_numBowls;
soupName = abc_soupName;
}
Using C++ member functions
Input code
int a;
class B
{
private:
int b;
inline void foo(int c)
{
#pragma rose_outline
b = a+c;
}
};
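Output code
- A friend declaration for the outlined function is added so that it can access private class members
- The this pointer is passed as a function argument so that the outlined function can reach the class object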
int a;
static void OUT__1__2386__(int *cp__,void *this__ptr__p__);
class B
{
public: friend void ::OUT__1__2386__(int *cp__,void *this__ptr__p__);
private: int b;
inline void foo(int c)
{
// //A declaration for this pointer
class B *this__ptr__ = this;
OUT__1__2386__(&c,&this__ptr__);
}
}
;
static void OUT__1__2386__(int *cp__,void *this__ptr__p__)
{
int &c = *((int *)cp__);
class B *&this__ptr__ = *((class B **)this__ptr__p__);
this__ptr__ -> b = a + c;
}
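With -rose:outline:parameter_wrapper, the result is different
- In the calling function, all arguments are wrapped into an array of pointers
- The array is unpacked in the outlined function to retrieve the arguments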
int a;
static void OUT__1__2391__(void **__out_argv);
class B
{
public: friend void ::OUT__1__2391__(void **__out_argv);
private: int b;
inline void foo(int c)
{
// //A declaration for this pointer
class B *this__ptr__ = this;
void *__out_argv1__1527__[2];
__out_argv1__1527__[0] = ((void *)(&this__ptr__));
__out_argv1__1527__[1] = ((void *)(&c));
OUT__1__2391__(__out_argv1__1527__);
}
}
;
static void OUT__1__2391__(void **__out_argv)
{
int &c = *((int *)__out_argv[1]);
class B *&this__ptr__ = *((class B **)__out_argv[0]);
this__ptr__ -> b = a + c;
}
For OpenMP implementation
See ROSE_Compiler_Framework/OpenMP_Support for more information.
Below is an example translation
/*a test C program. You can replace this content with yours, within 20,000 character limit (about 500 lines) . */
#include<stdio.h>
#include<stdlib.h>
int main(int argc, char* argv[])
{
int nthreads, tid;
#pragma omp parallel private(nthreads, tid)
{
tid = omp_get_thread_num();
printf("Hello World from thread = %d ", tid);
if(tid == 0)
{
nthreads = omp_get_num_threads();
printf("Number of threads = %d", nthreads);
}
}
return 0;
}
//------------- output code --------------
/*a test C program. You can replace this content with yours, within 20,000 character limit (about 500 lines) . */
#include<stdio.h>
#include<stdlib.h>
#include "libxomp.h"
static void OUT__1__2231__(void *__out_argv);
int main(int argc,char *argv[])
{
int status = 0;
XOMP_init(argc,argv);
int nthreads;
int tid;
XOMP_parallel_start(OUT__1__2231__,0,1,0,"/tmp/test-20191219_224253-113680.c",8);
XOMP_parallel_end("/tmp/test-20191219_224253-113680.c",17);
XOMP_terminate(status);
return 0;
}
static void OUT__1__2231__(void *__out_argv)
{
int _p_nthreads;
int _p_tid;
_p_tid = omp_get_thread_num();
printf("Hello World from thread = %d ",_p_tid);
if (_p_tid == 0) {
_p_nthreads = omp_get_num_threads();
printf("Number of threads = %d",_p_nthreads);
}
}
For generating CUDA kernels for OpenMP 4.x
Sample input and output code for the classic Jacobi example, OpenMP 4.0 version
//--------------input--------------
void jacobi( )
{
REAL omega;
int i,j,k;
REAL error,resid,ax,ay,b;
// double error_local;
// float ta,tb,tc,td,te,ta1,ta2,tb1,tb2,tc1,tc2,td1,td2;
// float te1,te2;
// float second;
omega=relax;
/*
* Initialize coefficients */
ax = 1.0/(dx*dx); /* X-direction coef */
ay = 1.0/(dy*dy); /* Y-direction coef */
b = -2.0/(dx*dx)-2.0/(dy*dy) - alpha; /* Central coeff */
error = 10.0 * tol;
k = 1;
// An optimization on top of naive coding: promoting data handling outside the while loop
// data properties may change since the scope is bigger:
#pragma omp target data map(to:n, m, omega, ax, ay, b, f[0:n][0:m]) map(tofrom:u[0:n][0:m]) map(alloc:uold[0:n][0:m])
while ((k<=mits)&&(error>tol))
{
error = 0.0;
/* Copy new solution into old */
#pragma omp target map(to:n, m, u[0:n][0:m]) map(from:uold[0:n][0:m])
#pragma omp parallel for private(j,i) collapse(2)
for(i=0;i<n;i++)
for(j=0;j<m;j++)
uold[i][j] = u[i][j];
#pragma omp target map(to:n, m, omega, ax, ay, b, f[0:n][0:m], uold[0:n][0:m]) map(from:u[0:n][0:m])
#pragma omp parallel for private(resid,j,i) reduction(+:error) collapse(2) // nowait
for (i=1;i<(n-1);i++)
for (j=1;j<(m-1);j++)
{
resid = (ax*(uold[i-1][j] + uold[i+1][j])\
+ ay*(uold[i][j-1] + uold[i][j+1])+ b * uold[i][j] - f[i][j])/b;
u[i][j] = uold[i][j] - omega * resid;
error = error + resid*resid ;
}
...
/* Error check */
if (k%500==0)
printf("Finished %d iteration with error =%f\n",k, error);
error = sqrt(error)/(n*m);
k = k + 1;
} /* End iteration loop */
printf("Total Number of Iterations:%d\n",k);
printf("Residual:%E\n", error);
printf("Residual_ref :%E\n", resid_ref);
printf ("Diff ref=%E\n", fabs(error-resid_ref));
assert (fabs(error-resid_ref) < 1E-13);
}
//----------------output-----------------
#include "libxomp.h"
#include "xomp_cuda_lib_inlined.cu"
...
__global__ void OUT__1__8714__(float omega,float ax,float ay,float b,int __final_total_iters__2__,int __i_interval__3__,float *_dev_per_block_error,float *_dev_u,float *_dev_f,float *_dev_uold)
{
int _p_i;
int _p_j;
float _p_error;
_p_error = 0;
float _p_resid;
int _p___collapsed_index__5__;
int _dev_lower;
int _dev_upper;
int _dev_loop_chunk_size;
int _dev_loop_sched_index;
int _dev_loop_stride;
int _dev_thread_num = getCUDABlockThreadCount(1);
int _dev_thread_id = getLoopIndexFromCUDAVariables(1);
XOMP_static_sched_init(0,__final_total_iters__2__ - 1,1,1,_dev_thread_num,_dev_thread_id,&_dev_loop_chunk_size,&_dev_loop_sched_index,&_dev_loop_stride);
while(XOMP_static_sched_next(&_dev_loop_sched_index,__final_total_iters__2__ - 1,1,_dev_loop_stride,_dev_loop_chunk_size,_dev_thread_num,_dev_thread_id,&_dev_lower,&_dev_upper))
for (_p___collapsed_index__5__ = _dev_lower; _p___collapsed_index__5__ <= _dev_upper; _p___collapsed_index__5__ += 1) {
_p_i = _p___collapsed_index__5__ / __i_interval__3__ * 1 + 1;
_p_j = _p___collapsed_index__5__ % __i_interval__3__ * 1 + 1;
_p_resid = (ax * (_dev_uold[(_p_i - 1) * 512 + _p_j] + _dev_uold[(_p_i + 1) * 512 + _p_j]) + ay * (_dev_uold[_p_i * 512 + (_p_j - 1)] + _dev_uold[_p_i * 512 + (_p_j + 1)]) + b * _dev_uold[_p_i * 512 + _p_j] - _dev_f[_p_i * 512 + _p_j]) / b;
_dev_u[_p_i * 512 + _p_j] = _dev_uold[_p_i * 512 + _p_j] - omega * _p_resid;
_p_error = _p_error + _p_resid * _p_resid;
}
xomp_inner_block_reduction_float(_p_error,_dev_per_block_error,6);
}
...
void jacobi()
{
float omega;
int i;
int j;
int k;
float error;
float resid;
float ax;
float ay;
float b;
// double error_local;
// float ta,tb,tc,td,te,ta1,ta2,tb1,tb2,tc1,tc2,td1,td2;
// float te1,te2;
// float second;
omega = relax;
/*
* Initialize coefficients */
/* X-direction coef */
ax = (1.0 / (dx * dx));
/* Y-direction coef */
ay = (1.0 / (dy * dy));
/* Central coeff */
b = (- 2.0 / (dx * dx) - 2.0 / (dy * dy) - alpha);
error = (10.0 * tol);
k = 1;
/* Translated from #pragma omp target data ... */
{
xomp_deviceDataEnvironmentEnter();
float *_dev_u;
int _dev_u_size = sizeof(float ) * n * m;
_dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,1,1)));
float *_dev_f;
int _dev_f_size = sizeof(float ) * n * m;
_dev_f = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)f),_dev_f_size,1,0)));
float *_dev_uold;
int _dev_uold_size = sizeof(float ) * n * m;
_dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,0,0)));
while(k <= mits && error > tol){
int __i_total_iters__0__ = (n - 1 - 1 - 1 + 1) % 1 == 0?(n - 1 - 1 - 1 + 1) / 1 : (n - 1 - 1 - 1 + 1) / 1 + 1;
int __j_total_iters__1__ = (m - 1 - 1 - 1 + 1) % 1 == 0?(m - 1 - 1 - 1 + 1) / 1 : (m - 1 - 1 - 1 + 1) / 1 + 1;
int __final_total_iters__2__ = 1 * __i_total_iters__0__ * __j_total_iters__1__;
int __i_interval__3__ = __j_total_iters__1__ * 1;
int __j_interval__4__ = 1;
int __collapsed_index__5__;
int __i_total_iters__6__ = (n - 1 - 0 + 1) % 1 == 0?(n - 1 - 0 + 1) / 1 : (n - 1 - 0 + 1) / 1 + 1;
int __j_total_iters__7__ = (m - 1 - 0 + 1) % 1 == 0?(m - 1 - 0 + 1) / 1 : (m - 1 - 0 + 1) / 1 + 1;
int __final_total_iters__8__ = 1 * __i_total_iters__6__ * __j_total_iters__7__;
int __i_interval__9__ = __j_total_iters__7__ * 1;
int __j_interval__10__ = 1;
int __collapsed_index__11__;
error = 0.0;
/* Copy new solution into old */
{
xomp_deviceDataEnvironmentEnter();
float *_dev_u;
int _dev_u_size = sizeof(float ) * n * m;
_dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,1,0)));
float *_dev_uold;
int _dev_uold_size = sizeof(float ) * n * m;
_dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,0,1)));
/* Launch CUDA kernel ... */
int _threads_per_block_ = xomp_get_maxThreadsPerBlock();
int _num_blocks_ = xomp_get_max1DBlock(__final_total_iters__8__ - 1 - 0 + 1);
OUT__2__8714__<<<_num_blocks_,_threads_per_block_>>>(__final_total_iters__8__,__i_interval__9__,_dev_u,_dev_uold);
xomp_deviceDataEnvironmentExit();
}
{
xomp_deviceDataEnvironmentEnter();
float *_dev_u;
int _dev_u_size = sizeof(float ) * n * m;
_dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,0,1)));
float *_dev_f;
int _dev_f_size = sizeof(float ) * n * m;
_dev_f = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)f),_dev_f_size,1,0)));
float *_dev_uold;
int _dev_uold_size = sizeof(float ) * n * m;
_dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,1,0)));
/* Launch CUDA kernel ... */
int _threads_per_block_ = xomp_get_maxThreadsPerBlock();
int _num_blocks_ = xomp_get_max1DBlock(__final_total_iters__2__ - 1 - 0 + 1);
float *_dev_per_block_error = (float *)(xomp_deviceMalloc(_num_blocks_ * sizeof(float )));
OUT__1__8714__<<<_num_blocks_,_threads_per_block_,(_threads_per_block_ * sizeof(float ))>>>(omega,ax,ay,b,__final_total_iters__2__,__i_interval__3__,_dev_per_block_error,_dev_u,_dev_f,_dev_uold);
error = xomp_beyond_block_reduction_float(_dev_per_block_error,_num_blocks_,6);
xomp_freeDevice(_dev_per_block_error);
xomp_deviceDataEnvironmentExit();
}
// }
/* omp end parallel */
/* Error check */
if (k % 500 == 0) {
printf("Finished %d iteration with error =%f\n",k,error);
}
error = (sqrt(error) / (n * m));
k = k + 1;
/* End iteration loop */
}
xomp_deviceDataEnvironmentExit();
}
printf("Total Number of Iterations:%d\n",k);
printf("Residual:%E\n",error);
printf("Residual_ref :%E\n",resid_ref);
printf("Diff ref=%E\n",(fabs((error - resid_ref))));
fabs((error - resid_ref)) < 1E-14?((void )0) : __assert_fail("fabs(error-resid_ref) < 1E-14","jacobi-ompacc-opt2.c",236,__PRETTY_FUNCTION__);
}
See ROSE_Compiler_Framework/OpenMP_Acclerator_Model_Implementation for details
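The generated kernel above recovers the original loop indices from a single collapsed index. The arithmetic can be sketched as follows, assuming unit-stride loops as in the Jacobi example. The function names totalIters and expandIndex are hypothetical, and the guard for empty loops is an addition that the generated code does not contain.

```cpp
#include <utility>

// Iteration count of: for (i = lower; i <= upper; i += stride), stride > 0.
// Mirrors the __i_total_iters__ expressions in the generated code; the
// early return for an empty range is an added safety check.
int totalIters(int lower, int upper, int stride)
{
  const int span = upper - lower + 1;
  if (span <= 0)
    return 0;
  return (span % stride == 0) ? span / stride : span / stride + 1;
}

// Recover (i, j) from a collapsed index for two unit-stride loops.
// jIters plays the role of __i_interval__ in the generated code.
std::pair<int, int> expandIndex(int collapsed, int jIters, int iLower, int jLower)
{
  const int i = collapsed / jIters + iLower;
  const int j = collapsed % jIters + jLower;
  return {i, j};
}
```

For instance, with i in 1..3 and j in 1..2 there are 6 collapsed iterations, and collapsed index 3 maps back to (i, j) = (2, 2).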
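List
- When Outliner::useStructureWrapper is set to true, the message "Side effect analysis error!" appears. This also happens with the outlineIfs example in the tutorial directory.
- You can ignore this warning message if your translator still works. When Outliner::useStructureWrapper is enabled, the outliner uses several analyses internally. Some of these analyses cannot handle every case, so they simply give up and notify the outliner. The outliner is designed to make conservative decisions in such cases and to generate less optimal translated code.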
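The following paper describes the internals of the AST outliner. If you use the AST outliner in your research, please cite it.
- Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas. 2009. Effective source-to-source outlining to support whole program empirical optimization. In Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing (LCPC'09).
Support for generating multithreaded kernels for CPUs and GPUs
- Chunhua Liao, Daniel J. Quinlan, Thomas Panas, and Bronis R. de Supinski. A ROSE-based OpenMP 3.0 research compiler supporting multiple runtime libraries. In Proceedings of the 6th International Conference on Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, June 14-16, 2010, Tsukuba, Japan.
- C. Liao, Y. Yan, B. R. de Supinski, D. J. Quinlan, and B. Chapman. "Early Experiences with the OpenMP Accelerator Model." In OpenMP in the Era of Low Power Devices and Accelerators, Springer, 2013, pp. 84-98.
Support for empirical tuning or autotuning
- Shirley Moore. Refactoring and automated performance tuning of computational chemistry application codes. In Proceedings of the Winter Simulation Conference, December 09-12, 2012, Berlin, Germany.
- Nicholas Chaimov, Scott Biersdorff, and Allen D. Malony. Tools for machine-learning-based empirical autotuning and specialization. International Journal of High Performance Computing Applications, vol. 27, no. 4, pp. 403-411, November 2013.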