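ROSE Compiler Framework/Inliner

The ROSE inliner inlines functions at their call sites.

Background

Inlining means that the compiler copies the code of a function definition directly into the code of the calling function, instead of creating a separate set of instructions in memory. The function call is thus replaced with a modified copy of the function body, which avoids the performance overhead of a function call. The inline keyword is only a suggestion to the compiler that inline expansion may be performed; the compiler is free to ignore it.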
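
As a minimal illustration of the idea (a hand-written sketch, not ROSE output; the function names are made up for this example), the two functions below compute the same value. The second one has each call replaced by a copy of the callee's body, with the formal parameter substituted by the actual argument:

```cpp
// Callee: small enough that the cost of a call may dominate its work.
static int square(int v) { return v * v; }

// Version that uses ordinary function calls.
int sumOfSquaresCall(int a, int b) {
  return square(a) + square(b);
}

// Hand-inlined version: each call is replaced by a copy of the body,
// with the formal parameter v replaced by the actual argument.
int sumOfSquaresInlined(int a, int b) {
  return (a * a) + (b * b);
}
```

Calling sumOfSquaresCall(3, 4) and sumOfSquaresInlined(3, 4) yields the same result; only the call overhead differs.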

User instructions

You must enable EDG 5.0 to inline C++11 code:

--enable-edg_version=5.0

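Installing the tool

Configure and build ROSE by following the instructions at https://github.com/rose-compiler/rose/wiki

inlineEverything -c [options] input.c

Usage

This is a program transformation tool that inlines function calls in your C/C++ or Fortran code.

Usage: inlineEverything -c [options] input.c

The available options include

-skip-postprocessing: skip the postprocessing that cleans up the code
-process-headers: process calls within header files
-verbose: print debugging information
-limit N: inline at most N functions, then stop
-main-only: inline only functions reachable from main()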

Source code

API and implementation

// main API
bool doInline(SgFunctionCallExp*, bool)

https://github.com/rose-compiler/rose/tree/develop/src/midend/programTransformation/astInlining

Source code of the tool

https://github.com/rose-compiler/rose/tree/develop/tests/nonsmoke/functional/roseTests/astInliningTests/inlineEverything.C
- Command-line processing is done in this source file

Algorithm

// Main inliner code.  Accepts a function call as a parameter, and inlines
// only that single function call.  Returns true if it succeeded, and false
// otherwise.  The function call must be to a named function, static member
// function, or non-virtual non-static member function, and the function
// must be known (not through a function pointer or member function
// pointer).  Also, the body of the function must already be visible.
// Recursive procedures are handled properly (when allowRecursion is set), by
// inlining one copy of the procedure into itself.  Any other restrictions on
// what can be inlined are bugs in the inliner code.
bool
doInline(SgFunctionCallExp* funcall, bool allowRecursion)

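Main steps

Eligibility check: skip anything that cannot be inlined
If the function call is used as an operand of an expression, e.g. a = func1() + func2();
- Generate a temporary variable to receive each return value, e.g. temp1 = func1(); temp2 = func2();
- Replace each function call expression with its temporary variable, e.g. a = temp1 + temp2;
- A minor optimization: if the function call is the only operand of the expression, e.g. a = func1(), no temporary variable is needed (a can be used directly, without another temporary as an intermediary).
Get the actual argument list
Copy the body of the function to be inlined
Rename the labels in the inlined function definition. goto statements that point to them are updated.
Inside the function body
- Create a local variable for each formal parameter and initialize it with the corresponding actual argument
- Build a paramMap: map each formal parameter (SgInitializedName) to its new local variable (SgVariableSymbol)
- The this pointer is handled similarly: create a local variable initialized with the caller's this pointer
- Replace references to the formal parameters in the function body with the new local variables, which hold the actual arguments // ReplaceParameterUseVisitor(paramMap).traverse(funbody_copy, postorder);
- Insert a label to mark the end of the inlined function body // rose_inline_end__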
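
The steps above can be sketched by hand on a = func1() + func2(16). This is only an approximation of what the inliner emits, written for illustration: the callee bodies, the argument value, and the variable and label names are assumptions, and the names avoid double underscores because those are reserved in hand-written C++.

```cpp
// Hypothetical callees used only for this illustration.
static int func1() { return 10; }
static int func2(int n) { return n * 2; }

// Original form: both calls are operands of a single expression.
int original() {
  int a;
  a = func1() + func2(16);
  return a;
}

// Hand-written approximation of the inlined form.
int inlinedByHand() {
  int a;
  int temp1;              // temporary that receives func1()'s return value
  {
    {
      temp1 = 10;         // "return 10;" becomes an assignment ...
      goto end1;          // ... followed by a jump to the end label
    }
  end1:                   // label marking the end of the inlined body
    ;
  }
  int temp2;              // temporary that receives func2()'s return value
  {
    int n_1 = 16;         // local variable for formal parameter n
    {
      temp2 = n_1 * 2;    // parameter use replaced by the local variable
      goto end2;
    }
  end2:
    ;
  }
  a = temp1 + temp2;      // calls replaced by the temporaries
  return a;
}
```

Both functions return the same value; the inlined form simply trades the two calls for extra blocks, temporaries, and labels.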

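Limitation of the algorithm: the generated code is not very clean

It generates new local variables and labels.

Eligibility check

What can be inlined

A named function,
a static member function, or
a non-virtual non-static member function // virtual functions are skipped. Static member functions cannot access this->data (non-static data), which is why non-static is checked to detect the case that needs a this pointer.
The qualified name must not start with "::std::" // std:: functions are skipped
The function must be known (not called through a function pointer or member function pointer). // null function reference expression
The body of the function must already be visible in the current AST. // functions whose body is null are skipped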
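
The checks listed above can be summarized as a single predicate. The sketch below is not ROSE code: CallSite and isInlinable are hypothetical names, and the struct stands in for information that ROSE reads from AST nodes.

```cpp
#include <string>

// Hypothetical summary of a call site; ROSE inspects AST nodes instead.
struct CallSite {
  std::string qualifiedName;  // e.g. "::Domain::numNode"
  bool isMemberFunction;
  bool isStatic;
  bool isVirtual;
  bool viaPointer;            // called through a (member) function pointer
  bool bodyVisible;           // definition present in the current AST
};

// Returns true when the call passes every eligibility check listed above.
bool isInlinable(const CallSite& c) {
  if (c.viaPointer) return false;                    // callee is not known
  if (!c.bodyVisible) return false;                  // no body to copy
  if (c.qualifiedName.rfind("::std::", 0) == 0)      // name starts with ::std::
    return false;
  if (c.isMemberFunction && c.isVirtual) return false;  // virtual call
  return true;
}
```

For example, a call described as {"::foo", false, false, false, false, true} passes, while the same description with the name "::std::swap" is rejected.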

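Postprocessing

inlineEverything.C has a cleanup step that tidies up the inlined code

cleanupInlinedCode(sageProject);
- Remove unused labels: SageInterface::removeUnusedLabels(top);
- Remove jumps to the next statement: SageInterface::removeJumpsToNextStatement(top);
- simpleCopyAndConstantPropagation() // in code with declarations such as "int foo = bar", where foo and bar are not modified, replace "foo" with "bar" and remove the declaration
- Remove null statements: RemoveNullStatementsVisitor().traverse(top, postorder);
- Move declarations to their first use: MoveDeclarationsToFirstUseVisitor().traverse(top, postorder); e.g. int x__2 = 7; w = x__2 + 3; ==> w = 7 + 3;
- doSubexpressionExpansionSmart() // replace all uses of a variable with its initializing expression. Requires that initname has an assign initializer. All uses of initname within its scope are replaced with copies of the initializer expression, and initname is then removed.
changeAllMembersToPublic(sageProject);

This step does a lot of things and can easily introduce bugs.

Worse, the cleanup step operates on the entire AST, including C++ header files.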
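
To make three of the cleanup passes concrete, here is a toy model of them on a flat list of statements. It is a sketch under strong assumptions: Stmt, Kind, and cleanup are hypothetical names, nesting of blocks is ignored, and the real SageInterface functions work on the ROSE AST rather than on a vector.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Kind { Goto, Label, Null, Other };

// For Goto and Label, text holds the label name; otherwise the statement text.
struct Stmt { Kind kind; std::string text; };

std::vector<Stmt> cleanup(const std::vector<Stmt>& code) {
  // 1. Drop a goto whose target label is the very next statement.
  std::vector<Stmt> pass1;
  for (std::size_t i = 0; i < code.size(); ++i) {
    bool jumpsToNext = code[i].kind == Kind::Goto && i + 1 < code.size() &&
                       code[i + 1].kind == Kind::Label &&
                       code[i + 1].text == code[i].text;
    if (!jumpsToNext) pass1.push_back(code[i]);
  }
  // 2. Collect the labels that are still the target of some goto.
  std::set<std::string> used;
  for (const Stmt& s : pass1)
    if (s.kind == Kind::Goto) used.insert(s.text);
  // 3. Drop unused labels, and null statements that do not follow a kept label
  //    (a label must still be followed by a statement).
  std::vector<Stmt> out;
  for (const Stmt& s : pass1) {
    if (s.kind == Kind::Label && used.count(s.text) == 0) continue;
    if (s.kind == Kind::Null &&
        (out.empty() || out.back().kind != Kind::Label)) continue;
    out.push_back(s);
  }
  return out;
}
```

In this flat model, the sequence "x++; goto end; end: ;" collapses to "x++;". In real inliner output the goto usually sits inside a nested block, so it is not directly followed by its label and survives the cleanup.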

 // Post-inline AST normalizations

 // DQ (6/12/2015): These functions first rename all variables (a bit heavy handed for my tastes)
 // and then (second) remove the blocks that are otherwise added to support the inlining.  The removal
 // of the blocks is the motivation for renaming the variables, but the variable renaming is
 // done everywhere instead of just where the functions are inlined.  I think the addition of
 // the blocks is a better solution than the overly aggressive renaming of variables in the whole
 // program.  So the best solution is to comment out both of these functions.  All test codes
 // pass (including the token-based unparsing tests).
 // renameVariables(sageProject);
 // flattenBlocks(sageProject);

    cleanupInlinedCode(sageProject);

// In code with declarations such as "int foo = bar", where foo and bar are
// not modified, replace "foo" with "bar" and remove the declaration

void simpleCopyAndConstantPropagation(SgNode* top) {
  FindReferenceVariablesVisitor().traverse(top, preorder);
  FindCopiesVisitor().traverse(top, preorder);
  FindUsedDeclarationsVisitor vis;
  vis.traverse(top, preorder);
  RemoveUnusedDeclarationsVisitor(vis.used_decls, set<SgFunctionDeclaration*>()).traverse(top, postorder);
}

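Testing

The test directory contains an example translator and test input files

https://github.com/rose-compiler/rose/tree/develop/tests/nonsmoke/functional/roseTests/astInliningTests

According to Makefile.am, the source of the example translator is built into an executable named "inlineEverything" in your build tree.

Translator: inlineEverything

This is the tool you can use to try inlining on sample code.

inlineEverything

The make check rule in the same Makefile.am contains sample command lines for using the tool.

To test a single input file (e.g. template_functions.C), type

make inlineEverything_template_functions.C.passed // TODO: update to the new way of triggering the test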

Command-line options

inlineEverything --help

---------------------Tool-Specific Help-----------------------------------
This is a program transformation tool to inline function calls in your C/C++ or Fortran code.
Usage: inlineEverything -c [options] input.c


The optional options include: 
 -skip-postprocessing: Skip postprocessing which cleanups code
 -process-headers:     Process calls within header files
 -verbose:            Printout debugging information

Postprocessing

--------------input----------------
bash-4.2$ cat specimen25_1.C
 template<typename T>
 void swap(T& x, T& y)
 {
   T tmp = x;
   x = y;
   y = tmp;
 }
 
int foo (int a, int b)
{
   swap(a,b);
}
 
int main()
{
}
 
----- command line  -------------
bash-4.2$ inlineEverything -c specimen25_1.C
 
-----------output: with postprocessing (cleanup) --------------
bash-4.2$ cat rose_specimen25_1.C 
template < typename T >
 void swap ( T & x, T & y )
 {
   T tmp = x;
   x = y;
   y = tmp;
 }
 
int foo(int a,int b)
{
{
    int tmp = a;
    a = b;
    b = tmp;
  }
}
 
int main()
{
}



// output without postprocessing: cleanup
template < typename T >
 void swap ( T & x, T & y )
 {
   T tmp = x;
   x = y;
   y = tmp;
 }

int foo(int a,int b)
{
{
    int &x__2 = a;
    int &y__3 = b;
    int tmp = x__2;
    x__2 = y__3;
    y__3 = tmp;
    rose_inline_end__4:
    ;
  }
}

int main()
{
}

Correctness checking

make check only checks whether the number of inlined function calls matches the expected number.

# Note: must use the name convention of specimenXX_N.C , in which N is the number of function calls inlined.   
# The specimens are named so that the number between the "_" and next "." is the number of function calls that
# we expect this specimen to inline.
inlineEverything_specimens =                    \
        specimen01_1.C   ..




inlineEverything_test_targets = $(addprefix inlineEverything_, $(addsuffix .passed, $(inlineEverything_specimens)))
TEST_TARGETS += $(inlineEverything_test_targets)
$(inlineEverything_test_targets): inlineEverything_%.passed: % inlineEverything inlineEverything.conf
        @$(RTH_RUN)                                                                                             \
                TITLE="inlineEverything $< [$@]"                                                                \
                SPECIMEN="$(abspath $<)"                                                                        \
                NINLINE="$$(echo $(notdir $<) |sed --regexp-extended 's/specimen[0-9]+_([0-9]+).*/\1/')"        \
                TRANSLATOR="$$(pwd)/inlineEverything"                                                           \
                $(srcdir)/inlineEverything.conf $@
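
The sed expression in the rule above extracts the expected inline count from the specimen file name. The same extraction can be sketched in C++ as follows; expectedInlineCount is a hypothetical helper written for illustration and is not part of ROSE or its test harness. Unlike sed, std::regex_match requires the whole name to match, which is fine for names of the form specimenXX_N.C.

```cpp
#include <regex>
#include <string>

// Returns the expected inline count encoded in a specimen file name,
// or -1 if the name does not follow the specimenXX_N.C convention.
int expectedInlineCount(const std::string& fileName) {
  static const std::regex pattern("specimen[0-9]+_([0-9]+).*");
  std::smatch m;
  if (!std::regex_match(fileName, m, pattern)) return -1;
  return std::stoi(m[1].str());  // the digits between "_" and the next "."
}
```

For specimen25_1.C this yields 1, which the test then compares against the "Test inlined N function" line printed by the tool.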


 cat inlineEverything.conf 
# Test configuration file (see "scripts/rth_run.pl --help" for details)
# Tests the inliner

# Run the tests in subdirectories for ease of cleanup.
subdir = yes

# Run the test and then make sure the output contains a certain string
cmd = ${VALGRIND} ${TRANSLATOR} -rose:verbose 0 ${SPECIMEN} -o a.out |tee ${TEMP_FILE_0}
cmd = grep "Test inlined ${NINLINE} function" ${TEMP_FILE_0}
cmd = cat -n rose_*
cmd = ./a.out

# Extra stuff that might be useful to specify in the makefile
title = ${TITLE}
disabled = ${DISABLED}
timeout = ${TIMEOUT}

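Examples

We use a series of increasingly complex examples to explain the inlining algorithm.

More example input and output files are available here:

https://github.com/chunhualiao/inliner-demo

Bare call: no input arguments and no return value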

extern int x;

void incrementX()
   {
      x++;
   }

int main()
   {
     incrementX();
     return x;
   }


//----------output, without postprocessing --------

extern int x;

void incrementX()
{
  x++;
}

int main()
{
   // the function body is copied here
  {              
    x++;
    rose_inline_end__2:   // a label for the end of a function is generated. 
    ;
  }
  
  return x;
}

//-----------output, with postprocessing for clean up
// unused label and empty statement are removed

extern int x;

void incrementX()
{
  x++;
}

int main()
{
   // the function body is copied here
  {              
    x++;
  }
  
  return x;
}

Function with a return value

// a function with a return
extern int x;


int incrementX()
{
  x++;
  return x; 
}

int main()
{
  incrementX();
  return x;
}

//---------- output without postprocessing

// a function with a return
extern int x;

int incrementX()
{
  x++;
  return x;
}

int main()
{
  {
    x++;
    {                    // a return statement is translated into a block, go to the exit point in the end
      x;
      goto rose_inline_end__2;
    }
    rose_inline_end__2:  // a label for the end of the function: the exit point
    ;
  }
  return x;
}

//-------- with postprocessing, the code looks almost the same: only the unused expression statement "x;" is removed

// a function with a return
extern int x;

int incrementX()
{
  x++;
  return x;
}

int main()
{
  {
    x++;
    {
      goto rose_inline_end__2;
    }
rose_inline_end__2:
    ;
  }
  return x;
}

Two function calls as operands of one expression

// input code -----------------------
int foo(int i) {
  return 5+i;
}

int main(int, char**) {
  int w;
  w = foo(1)+ foo(2);
  return w;
}

//--------------after inlining-----------------
// You can see that a temporary variable is used to capture the returned value of each function call.
// Then the temp variable is used to replace the original function call expression
int foo(int i)
{
  return 5 + i;
}

int main(int ,char **)
{
  int w;
  int rose_temp__4;
  {
    int i__2 = 1;
    {
      rose_temp__4 = 5 + i__2;
      goto rose_inline_end__3;
    }
    rose_inline_end__3:
    ;
  }
  int rose_temp__8;
  {
    int i__6 = 2;
    {
      rose_temp__8 = 5 + i__6;
      goto rose_inline_end__7;
    }
    rose_inline_end__7:
    ;
  }
  w = rose_temp__4 + rose_temp__8;
  return w;
}

//----- postprocessing propagates the constant arguments (i__2, i__6) but does not remove the temporaries, gotos, or labels
int foo(int i)
{
  return 5 + i;
}

int main(int ,char **)
{
  int rose_temp__4;
  {
    {
      rose_temp__4 = 5 + 1;
      goto rose_inline_end__3;
    }
    rose_inline_end__3:
      ;
  }
  int rose_temp__8;
  {
    {
      rose_temp__8 = 5 + 2;
      goto rose_inline_end__7;
    }
rose_inline_end__7:
    ;
  }
  int w = rose_temp__4 + rose_temp__8;
  return w;
}

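Function call as the only operand of an expression

Optimized transformation

Do not blindly generate a temporary variable to capture the value of the function call.
Instead, reuse the original declaration of the lhs variable directly inside the function body.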

int foo(int i) {
  return 5+i;
}

int main(int, char**) {
  int w;
  w = foo(1);
  return w;
}


//-------------after inlining ----------

int foo(int i)
{
  return 5 + i;
}

int main(int ,char **)
{
  int w;
  {
    int i__2 = 1;
    {
      w = 5 + i__2;
      goto rose_inline_end__3;
    }
rose_inline_end__3:
    ;
  }
  return w;
}


// postprocessing propagates the constant argument (i__2) but does not simplify the code any further.
int foo(int i)
{
  return 5 + i;
}

int main(int ,char **)
{
  int w;
  {
    {
      w = 5 + 1;
      goto rose_inline_end__3;
    } 
rose_inline_end__3:
    ;
  } 
  return w;
}

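Ternary (3-operand) operation

The code is normalized first: the conditional expression is rewritten as an if-else statement, and the call is then inlined inside the true branch.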

#include <stdlib.h>

int foo() {
  exit (1);
  return 0;
}

int main(int, char**) {
  int w, x = 7;
  w = x == 8 ? foo() : 0;
  return w;
}

//----------- after inlining ---------------

#include <stdlib.h>

int foo()
{
  exit(1);
  return 0;
}

int main(int ,char **)
{
  int w;
  int x = 7;
  if (x == 8) {
    int rose_temp__4;
    {
      exit(1);
      {
        rose_temp__4 = 0;
        goto rose_inline_end__2;
      }
rose_inline_end__2:
      ;
    }
    w = rose_temp__4;
  }
  else {
    w = 0;
  }
  return w;
}

Data member access function

#include <vector>
typedef int    Index_t ;

struct Domain
{
  public:
    // non-reference type
    Index_t  numNode()            { return m_numNode ; }

    void AllocateNodeElemIndexes()
    {
      Index_t numNode = this->numNode() ;
    }

#if 0  // the best inline result should look like the following
    void AllocateNodeElemIndexes_inlined()
    {
      Index_t numNode = m_numNode; // call site 1 inlined
    }
#endif

  private:
    Index_t   m_numNode ;
} domain;

//----------------------------after inlining ----------------

#include <vector>

typedef int Index_t;

struct Domain {

  // non-reference type
  inline Index_t numNode()
  {
    return (this) -> m_numNode;
  }

  inline void AllocateNodeElemIndexes()
  {

//x. split declaration + initializer into two parts
// a temporary variable to transfer value of initializer

    Index_t rose_temp__3;

//x. a new code block to embed the function body
    {
      struct Domain *this__1 = this__1;
      {
        rose_temp__3 = this__1 -> m_numNode;

//x. goto the label after function call
        goto rose_inline_end__2;
      }

//x. label after the function call
rose_inline_end__2:
      ;
    }

    Index_t numNode = rose_temp__3;
  }

  Index_t m_numNode;

} domain;

C++ template function

--------------input----------------
bash-4.2$ cat specimen25_1.C
 template<typename T>
 void swap(T& x, T& y)
 {
   T tmp = x;
   x = y;
   y = tmp;
 }
 
int foo (int a, int b)
{
   swap(a,b);
}
 
int main()
{
}
 
----- command line  -------------
bash-4.2$ inlineEverything -c -skip-postprocessing specimen25_1.C
 
-----------output: without postprocessing (no cleanup) --------------
template < typename T >
 void swap ( T & x, T & y )
 {
   T tmp = x;
   x = y;
   y = tmp;
 }

int foo(int a,int b)
{
{
    int &x__2 = a;    // a local variable for each formal parameter, initialized with the actual argument
    int &y__3 = b;

    int tmp = x__2;   // variable references are replaced with the local variables
    x__2 = y__3;
    y__3 = tmp;
    rose_inline_end__4:   // a label marking the end of the inlined function body
    ;
  }
}

int main()
{
}

Multi-level function calls

//------------input 

int foo(int x) {
  return x + 3;
}

int bar(int y) {
  return foo(y) + foo(2);
}

int main(int, char**) {
  int w;
  w = bar(1);
  return 0;
}

//--------------- output, no postprocessing ------------

int
foo (int x)
{
  return x + 3;
}

int
bar (int y)
{
  int rose_temp__4;
  {
    int x__2 = y;
    {
      rose_temp__4 = x__2 + 3;
      goto rose_inline_end__3;
    }
  rose_inline_end__3:
    ;
  }
  int rose_temp__8;
  {
    int x__6 = 2;
    {
      rose_temp__8 = x__6 + 3;
      goto rose_inline_end__7;
    }
  rose_inline_end__7:
    ;
  }
  return rose_temp__4 + rose_temp__8;
}

int
main (int, char **)
{
  int w;
  {
    int y__10 = 1;
    int rose_temp__4;
    {
      int x__2 = y__10;
      {
        rose_temp__4 = x__2 + 3;
        goto rose_inline_end__3__1;
      }
    rose_inline_end__3__1:
      ;
    }
    int rose_temp__8;
    {
      int x__6 = 2;
      {
        rose_temp__8 = x__6 + 3;
        goto rose_inline_end__7__2;
      }
    rose_inline_end__7__2:
      ;
    }
    {
      w = rose_temp__4 + rose_temp__8;
      goto rose_inline_end__11;
    }
  rose_inline_end__11:
    ;
  }
  return 0;
}
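
In the output above, the labels copied from bar into main carry an extra numeric suffix (rose_inline_end__3 becomes rose_inline_end__3__1, rose_inline_end__7 becomes rose_inline_end__7__2), so they stay unique within main. A minimal sketch of such suffix-based renaming is shown below. It assumes a simple counter owned by the caller; uniqueName is a hypothetical helper, and ROSE's actual renaming code may work differently.

```cpp
#include <string>

// Hypothetical helper: appends "__<n>" to base using a caller-owned counter,
// so that every generated name differs from the previous ones.
std::string uniqueName(const std::string& base, int& counter) {
  ++counter;
  return base + "__" + std::to_string(counter);
}
```

With a counter starting at 0, renaming rose_inline_end__3 and then rose_inline_end__7 reproduces the two label names seen in main.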

Tutorial

The official documentation on how to call the inlining API is

Chapter 36, "Calling the Inliner", of the ROSE Tutorial: http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf

Troubleshooting

TEST inlineEverything ../../../../../../sourcetree/tests/nonsmoke/functional/roseTests/astInliningTests/template_functions.C [inlineEverything_template_functions.C.passed]

inlineEverything_template_functions.C [out]: NOTE: C++11 input files to ROSE are NOT supported using EDG 4.9 configuration with GNU compilers 4.9 and greater (configure ROSE using EDG 4.12).