ROSE 编译器框架/通用数据流框架
随着 ROSE 项目的进行,我们收集了相当多的数据流分析版本。维护和使用它们很痛苦,因为它们
- 重复迭代不动点算法,
- 分散在不同的目录中,
- 使用不同的结果表示形式,以及
- 具有不同的成熟度和鲁棒性。
一项正在进行的工作是将所有数据流分析工作整合到一个单一的框架中。
快速事实
- 原始作者:Greg Bronevetsky
- 代码管理员:廖春华
- 文档
- 源代码:./src/midend/programAnalysis/genericDataflow 下的文件
- 测试:tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests
列表
- 常量传播
- 支配者分析:dominatorAnalysis.h dominatorAnalysis.C
- 活跃/死变量分析,或活性分析:liveDeadVarAnalysis.h liveDeadVarAnalysis.C
- 指针分析
函数和节点状态是运行数据流分析的两个必需参数
它们一起存储在函数状态内部 //functionState.h
functionState.h
genericDataflow/cfgUtils/CallGraphTraverse.h
函数的抽象,在内部连接到 SgFunctionDeclaration *decl
声明在 ./src/midend/programAnalysis/genericDataflow/cfgUtils/CallGraphTraverse.h 中
构造函数
- Function::Function(string name) 基于函数名
- Function::Function(SgFunctionDeclaration* sample) // 核心构造函数
- Function::Function(SgFunctionDefinition* sample)
CGFunction* cgFunc; // 调用图函数
Function func(cgFunc);
与 CFG 节点相关的任何信息。
- 它没有数据流的输入/输出概念
- 在数据流分析期间不打算演变
class NodeFact: public printable { public: // returns a copy of this node fact virtual NodeFact* copy() const=0; };
存储有关多个分析及其对应格的信息,用于给定节点(CFG 节点??)
./src/midend/programAnalysis/genericDataflow/state/nodeState.h
它还提供静态函数来
- 为所有 DataflowNode 初始化节点状态
- 为给定的 DataflowNode 检索节点状态
class NodeState { // internal types: map between analysis and set of lattices typedef std::map<Analysis*, std::vector<Lattice*> > LatticeMap; typedef std::map<Analysis*, std::vector<NodeFact*> > NodeFactMap; typedef std::map<Analysis*, bool > BoolMap; // the dataflow information Above the node, for each analysis that // may be interested in the current node LatticeMap dfInfoAbove; // IN set in a dataflow // the Analysis information Below the node, for each analysis that // may be interested in the current node LatticeMap dfInfoBelow; // OUT set in a dataflow, // the facts that are true at this node, for each analysis that // may be interested in the current node NodeFactMap facts; // Contains all the Analyses that have initialized their state at this node. It is a map because // TBB doesn't provide a concurrent set. BoolMap initializedAnalyses; // static interfaces // returns the NodeState object associated with the given dataflow node. // index is used when multiple NodeState objects are associated with a given node // (ex: SgFunctionCallExp has 3 NodeStates: entry, function body, exit) static NodeState* getNodeState(const DataflowNode& n, int index=0); // most useful interface: retrieve the lattices (could be only one) associated with a given analysis // returns the map containing all the lattices from above the node that are owned by the given analysis // (read-only access) const std::vector<Lattice*>& getLatticeAbove(const Analysis* analysis) const; // returns the map containing all the lattices from below the node that are owned by the given analysis // (read-only access) const std::vector<Lattice*>& getLatticeBelow(const Analysis* analysis) const; }
./src/midend/programAnalysis/genericDataflow/state/functionState.h
函数和节点状态的一对。
- 它提供静态函数来初始化所有函数状态并检索函数状态
class FunctionState { friend class CollectFunctions; public: Function func; NodeState state; // The lattices that describe the value of the function's return variables NodeState retState; private: static std::set<FunctionState*> allDefinedFuncs; static std::set<FunctionState*> allFuncs; static bool allFuncsComputed; public: FunctionState(Function &func): func(func), state(/*func.get_declaration()->cfgForBeginning()*/) {} // We should use this interface -------------- // 1. returns a set of all the functions whose bodies are in the project static std::set<FunctionState*>& getAllDefinedFuncs(); // 2. returns the FunctionState associated with the given function // func may be any declared function static FunctionState* getFuncState(const Function& func); ... }
FunctionState* fs = new FunctionState(func); // 从函数状态到节点状态的空
/************************************* *** UnstructuredPassInterAnalysis *** *************************************/ void UnstructuredPassInterAnalysis::runAnalysis() { set<FunctionState*> allFuncs = FunctionState::getAllDefinedFuncs(); // call a static function to get all function state s // Go through functions one by one, call an intra-procedural analysis on each of them // iterate over all functions with bodies for(set<FunctionState*>::iterator it=allFuncs.begin(); it!=allFuncs.end(); it++) { FunctionState* fState = *it; intraAnalysis->runAnalysis(fState->func, &(fState->state)); } } // runs the intra-procedural analysis on the given function, returns true if // the function's NodeState gets modified as a result and false otherwise // state - the function's NodeState bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state) { DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition(),filter); DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition(), filter); if(analysisDebugLevel>=2) Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis() function "<<func.get_name().getString()<<"()\n"; // iterate over all the nodes in this function for(VirtualCFG::iterator it(funcCFGStart); it!=VirtualCFG::dataflow::end(); it++) { DataflowNode n = *it; // The number of NodeStates associated with the given dataflow node //int numStates=NodeState::numNodeStates(n); // The actual NodeStates associated with the given dataflow node const vector<NodeState*> nodeStates = NodeState::getNodeStates(n); // Visit each CFG node for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); itS++) visit(func, n, *(*itS)); } return false; }
示例:检索活性分析的输入格
void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set<varID>& vars, string indent)
- LiveVarsLattice* liveLAbove = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeAbove(ldva).begin()));
警告:格与格值
- 根据定义,格是一组值。但是,通用数据流框架中格类型的实例也用于表示格中的单个值。对于此混淆,我们深感抱歉。我们欢迎您提出改进建议。
请参阅 ROSE 编译器框架/格
存储附加到 CFG 节点的数据流分析信息。
基本操作
- 存储什么:格值集、底部、顶部以及介于两者之间的任何内容
- 初始化:LiveDeadVarsAnalysis::genInitState()
- 创建:转移函数
- 相遇操作:格的成员函数
示例
- 活性分析:CFG 节点入口处的活跃变量集
- 常量传播:格值从无信息(底部)-> 未知 -> 常量 -> 过多信息(冲突的常量值,顶部),
// blindly add all of that_arg's values into current lattice's value set void LiveVarsLattice::incorporateVars(Lattice* that_arg) // retrieve a subset lattice information for a given expr. This lattice may contain more information than those about a given expr. Lattice* LiveVarsLattice::project(SgExpression* expr) // add lattice (exprState)information about expr into current lattice's value set: default implementation just calls meetUpdate(exprState) bool LiveVarsLattice::unProject(SgExpression* expr, Lattice* exprState)
该概念基于原始 CFG 流方向
- 上方:传入边方向
- 下方:传出边方向
输入和输出取决于问题的方向,前向与后向
- 前向方向:输入 == 上方格,输出 = 下方格
- 后向方向:输入 == 下方格,输出 = 上方格
该框架提供了一些预定义的格,可供使用。
lattice.h/latticeFull.h
- BoolAndLattice
class LiveVarsLattice : public FiniteLattice { public: std::set<varID> liveVars; // bottom is all live variables, top is the empty set, meet brings down the lattice -> union of variables. ... }; // Meet operation: simplest set union of two lattices: // computes the meet of this and that and saves the result in this // returns true if this causes this to change and false otherwise bool LiveVarsLattice::meetUpdate(Lattice* that_arg) { bool modified = false; LiveVarsLattice* that = dynamic_cast<LiveVarsLattice*>(that_arg); // Add all variables from that to this for(set<varID>::iterator var=that->liveVars.begin(); var!=that->liveVars.end(); var++) { // If this lattice doesn't yet record *var as being live if(liveVars.find(*var) == liveVars.end()) { // this if () statement gives a chance to set the modified flag. // otherwise, liveVars.insert() can be directly called. modified = true; liveVars.insert(*var); } } return modified; }
基础:Data_flow_analysis#flow.2Ftransfer_function
- IN = OUT(所有前驱节点)的并集
- OUT = GEN + (IN - KILL)
程序结构对当前格的影响(如何改变当前格)。
- 格:存储IN和OUT信息
- 传递函数内部需要额外的成员变量来存储GEN和KILL集合。
类层次结构
class IntraDFTransferVisitor : public ROSE_VisitorPatternDefaultBase { protected: // Common arguments to the underlying transfer function const Function &func; // which function are we talking about const DataflowNode &dfNode; // wrapper of CFGNode NodeState &nodeState; // lattice element state, context information? const std::vector<Lattice*> &dfInfo; // data flow information public: IntraDFTransferVisitor(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d) : func(f), dfNode(n), nodeState(s), dfInfo(d) { } virtual bool finish() = 0; virtual ~IntraDFTransferVisitor() { } }; class LiveDeadVarsTransfer : public IntraDFTransferVisitor { }; class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice> {}
template <class LatticeType> class VariableStateTransfer : public IntraDFTransferVisitor { ... }; class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice> {}; void ConstantPropagationAnalysisTransfer::visit(SgIntVal *sgn) { ROSE_ASSERT(sgn != NULL); ConstantPropagationLattice* resLat = getLattice(sgn); ROSE_ASSERT(resLat != NULL); resLat->setValue(sgn->get_value()); resLat->setLevel(ConstantPropagationLattice::constantValue); }
将程序点转换为生成器和KILL集合的函数。用于活跃性分析
- Kill (s) = {在s中被定义的变量}://
- Gen (s) = {在s中被使用的变量}
OUT = IN - KILL + GEN
- OUT初始化为IN集合,
- 传递函数将应用-KILL + GEN
class LiveDeadVarsTransfer : public IntraDFTransferVisitor { LiveVarsLattice* liveLat; // the result of this analysis bool modified; // Expressions that are assigned by the current operation std::set<SgExpression*> assignedExprs; // KILL () set // Variables that are assigned by the current operation std::set<varID> assignedVars; // Variables that are used/read by the current operation std::set<varID> usedVars; // GEN () set public: LiveDeadVarsTransfer(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d, funcSideEffectUses *fseu_) : IntraDFTransferVisitor(f, n, s, d), indent(" "), liveLat(dynamic_cast<LiveVarsLattice*>(*(dfInfo.begin()))), modified(false), fseu(fseu_) { if(liveDeadAnalysisDebugLevel>=1) Dbg::dbg << indent << "liveLat="<<liveLat->str(indent + " ")<<std::endl; // Make sure that all the lattice is initialized liveLat->initialize(); } bool finish(); // operationg on different AST nodes void visit(SgExpression *); void visit(SgInitializedName *); void visit(SgReturnStmt *); void visit(SgExprStatement *); void visit(SgCaseOptionStmt *); void visit(SgIfStmt *); void visit(SgForStatement *); void visit(SgWhileStmt *); void visit(SgDoWhileStmt *); } // Helper transfer function, focusing on handling expressions. // live dead variable analysis: LDVA, // expression transfer: transfer functions for expressions /// Visits live expressions - helper to LiveDeadVarsTransfer class LDVAExpressionTransfer : public ROSE_VisitorPatternDefaultBase { LiveDeadVarsTransfer &ldva; public: // Plain assignment: lhs = rhs, set GEN (read/used) and KILL (written/assigned) sets void visit(SgAssignOp *sgn) { ldva.assignedExprs.insert(sgn->get_lhs_operand()); // If the lhs of the assignment is a complex expression (i.e. it refers to a variable that may be live) OR // if is a known expression that is known to may-be-live // THIS CODE ONLY APPLIES TO RHSs THAT ARE SIDE-EFFECT-FREE AND WE DON'T HAVE AN ANALYSIS FOR THAT YET /*if(!isVarExpr(sgn->get_lhs_operand()) || (isVarExpr(sgn->get_lhs_operand()) && liveLat->isLiveVar(SgExpr2Var(sgn->get_lhs_operand())))) { */ ldva.used(sgn->get_rhs_operand()); } ... }
(gdb) bt #0 LDVAExpressionTransfer::visit (this=0x7fffffffcea0, sgn=0xa20320) at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:228 #1 0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069 #2 0x00002aaaadc61c04 in LiveDeadVarsTransfer::visit (this=0xaf9e00, sgn=0xa20320) at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:384 #3 0x00002aaaadbbaef0 in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgBinaryOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:316006 #4 0x00002aaaadbba04a in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgAssignOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:315931 #5 0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069 #6 0x00002aaaadbcca0a in IntraUniDirectionalDataflow::runAnalysis (this=0x7fffffffd9f0, func=..., fState=0xafbd18, analyzeDueToCallers=true, calleesUpdated=...) at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.C:282 #7 0x00002aaaadbbf444 in IntraProceduralDataflow::runAnalysis (this=0x7fffffffda00, func=..., state=0xafbd18) at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.h:74 #8 0x00002aaaadbb0966 in UnstructuredPassInterDataflow::runAnalysis (this=0x7fffffffda50) at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/analysis.C:467 #9 0x000000000040381a in main (argc=2, argv=0x7fffffffdba8) at ../../../../../sourcetree/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/liveDeadVarAnalysisTest.C:101
通用数据流框架在ROSE中工作于虚拟控制流图。
原始的虚拟CFG对于所有类型的分析可能并不理想,因为它可能包含太多与问题无关的管理节点。
因此,框架向Analysis类提供了一个过滤器参数。除非您指定自己的过滤器,否则将使用默认过滤器。
// Example filter funtion deciding if a CFGnNode should show up or not bool gfilter (CFGNode cfgn) { SgNode *node = cfgn.getNode(); switch (node->variantT()) { //Keep the last index for initialized names. This way the def of the variable doesn't propagate to its assign initializer. case V_SgInitializedName: return (cfgn == node->cfgForEnd()); // For function calls, we only keep the last node. The function is actually called after all its parameters are evaluated. case V_SgFunctionCallExp: return (cfgn == node->cfgForEnd()); //For basic blocks and other "container" nodes, keep the node that appears before the contents are executed case V_SgBasicBlock: case V_SgExprStatement: case V_SgCommaOpExp: return (cfgn == node->cfgForBeginning()); // Must have a default case: return interesting CFGNode by default in this example default: return cfgn.isInteresting(); } } // Code using the filter function int main( int argc, char * argv[] ) { SgProject* project = frontend(argc,argv); initAnalysis(project); LiveDeadVarsAnalysis ldva(project); ldva.filter = gfilter; // set the filter to be your own one UnstructuredPassInterDataflow ciipd_ldva(&ldva); ciipd_ldva.runAnalysis(); .... }
关键函数
bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated) // analysis/dataflow.C
基本任务:通过以下方式运行分析
- 初始化数据流状态:格和其他信息
- 遍历CFG:查找当前节点的后继节点
- 调用传递函数
- Analysis -> IntraProceduralAnalysis -> IntraProceduralDataflow -> IntraUnitDataflow --> IntraUniDirectionalDataflow (感兴趣级别) -> IntraBWDataflow -> LiveDeadVarsAnalysis
class Analysis {}; // an empty abstract class for any analysis class IntraProceduralAnalysis : virtual public Analysis //analysis/analysis.h , any intra procedural analysis, data flow or not { protected: InterProceduralAnalysis* interAnalysis; public: void setInterAnalysis(InterProceduralAnalysis* interAnalysis) // connection to inter procedural analysis virtual bool runAnalysis(const Function& func, NodeState* state)=0; // run this per function, NodeState stores lattices for each CFG node, etc. virtual ~IntraProceduralAnalysis(); } //No re-entry. analysis will be executed once??, data flow , intra-procedural analysis // now lattices are interested class IntraProceduralDataflow : virtual public IntraProceduralAnalysis //analysis/dataflow.h { // initialize lattice etc for a given dataflow node within a function virtual void genInitState (const Function& func, const DataflowNode& n, const NodeState& state, std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts); virtual bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated)=0; // the analysis on a function could be triggered by the state changes of function's callers, or callees. std::set<Function> visited; // make sure a function is initialized once when visited multiple times } class IntraUnitDataflow : virtual public IntraProceduralDataflow { // transfer function: operate on lattices associated with a dataflow node, considering its current state virtual bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo)=0; }; // Uni directional dataflow: either forward or backward, but not both directions! class IntraUniDirectionalDataflow : public IntraUnitDataflow { public: bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated); protected: bool propagateStateToNextNode ( const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex, const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode); std::vector<DataflowNode> gatherDescendants(std::vector<DataflowEdge> edges, DataflowNode (DataflowEdge::*edgeFn)() const); virtual NodeState*initializeFunctionNodeState(const Function &func, NodeState *fState) = 0; virtual VirtualCFG::dataflow* getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0; virtual vector<Lattice*> getLatticeAnte(NodeState *state) = 0; virtual vector<Lattice*> getLatticePost(NodeState *state) = 0; // If we're currently at a function call, use the associated inter-procedural // analysis to determine the effect of this function call on the dataflow state. virtual void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state) = 0; virtual vector<DataflowNode> getDescendants(const DataflowNode &n) = 0; virtual DataflowNode getUltimate(const Function &func) = 0; // ultimate what? final CFG node? }; class IntraBWDataflow : public IntraUniDirectionalDataflow {//BW: Backward public: IntraBWDataflow() {} NodeState* initializeFunctionNodeState(const Function &func, NodeState *fState); VirtualCFG::dataflow* getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState); virtual vector<Lattice*> getLatticeAnte(NodeState *state); virtual vector<Lattice*> getLatticePost(NodeState *state); void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state); vector<DataflowNode> getDescendants(const DataflowNode &n); // next CFG nodes, depending on the direction { return gatherDescendants(n.inEdges(), &DataflowEdge::source); } DataflowNode getUltimate(const Function &func); // the last CFG should be the start CFG of the function for a backward dataflow problem { return cfgUtils::getFuncStartCFG(func.get_definition()); } };
向前过程内数据流分析:例如,到达定义()
- 类IntraFWDataflow : public IntraUniDirectionalDataflow
用于初始化CFG节点的格/事实。它本身就是一个分析。非结构化传递
// super class: provides the driver of initialization: visit each CFG node class UnstructuredPassIntraAnalysis : virtual public IntraProceduralAnalysis { public: // call the initialization function on each CFG node bool runAnalysis(const Function& func, NodeState* state); // to be implemented by InitDataflowState virtual void visit(const Function& func, const DataflowNode& n, NodeState& state)=0; } bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state) { DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition()); DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition()); if(analysisDebugLevel>=2) Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis() function "<<func.get_name().getString()<<"()\n"; // iterate over all the nodes in this function for(VirtualCFG::iterator it(funcCFGStart); it!=VirtualCFG::dataflow::end(); it++) { DataflowNode n = *it; // The number of NodeStates associated with the given dataflow node //int numStates=NodeState::numNodeStates(n); // The actual NodeStates associated with the given dataflow node const vector<NodeState*> nodeStates = NodeState::getNodeStates(n); // Visit each CFG node for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); itS++) visit(func, n, *(*itS)); } return false; } //-------------------- derived class provide link to a concrete analysis, and visit() implementation class InitDataflowState : public UnstructuredPassIntraAnalysis { IntraProceduralDataflow* dfAnalysis; // link to the dataflow analysis to be initialized public: InitDataflowState(IntraProceduralDataflow* dfAnalysis/*, std::vector<Lattice*> &initState*/) { this->dfAnalysis = dfAnalysis; } void visit(const Function& func, const DataflowNode& n, NodeState& state); }; void InitDataflowState::visit (const Function& func, const DataflowNode& n, NodeState& state) { ... dfAnalysis->genInitState(func, n, state, initLats, initFacts); state.setLattices((Analysis*)dfAnalysis, initLats); state.setFacts((Analysis*)dfAnalysis, initFacts); .... }
CFG节点列表,通过迭代器接口访问
auto_ptr<VirtualCFG::dataflow> workList(getInitialWorklist(func, firstVisit, analyzeDueToCallers, calleesUpdated, fState));
class iterator //Declared in cfgUtils/VirtualCFGIterator.h { public: std::list<DataflowNode> remainingNodes; std::set<DataflowNode> visited; bool initialized; protected: // returns true if the given DataflowNode is in the remainingNodes list and false otherwise bool isRemaining(DataflowNode n); // advances this iterator in the given direction. Forwards if fwDir=true and backwards if fwDir=false. // if pushAllChildren=true, all of the current node's unvisited children (predecessors or successors, // depending on fwDir) are pushed onto remainingNodes void advance(bool fwDir, bool pushAllChildren); public: virtual void operator ++ (int); bool eq(const iterator& other_it) const; bool operator==(const iterator& other_it) const; bool operator!=(const iterator& it) const; ... }; void iterator::advance(bool fwDir, bool pushAllChildren) { ROSE_ASSERT(initialized); /*printf(" iterator::advance(%d) remainingNodes.size()=%d\n", fwDir, remainingNodes.size()); cout<<" visited=\n"; for(set<DataflowNode>::iterator it=visited.begin(); it!=visited.end(); it++) cout << " <"<<it->getNode()->class_name()<<" | "<<it->getNode()<<" | "<<it->getNode()->unparseToString()<<">\n";*/ if(remainingNodes.size()>0) { // pop the next CFG node from the front of the list DataflowNode cur = remainingNodes.front(); remainingNodes.pop_front(); if(pushAllChildren) { // find its followers (either successors or predecessors, depending on value of fwDir), push back // those that have not yet been visited vector<DataflowEdge> nextE; if(fwDir) nextE = cur.outEdges(); else nextE = cur.inEdges(); for(vector<DataflowEdge>::iterator it=nextE.begin(); it!=nextE.end(); it++) { DataflowNode nextN((*it).target()/* need to put something here because DataflowNodes don't have a default constructor*/); if(fwDir) nextN = (*it).target(); else nextN = (*it).source(); /*cout << " iterator::advance "<<(fwDir?"descendant":"predecessor")<<": "<< "<"<<nextN.getNode()->class_name()<<" | "<<nextN.getNode()<<" | "<<nextN.getNode()->unparseToString()<<">, "<< "visited="<<(visited.find(nextN) != visited.end())<< " remaining="<<isRemaining(nextN)<<"\n";*/ // if we haven't yet visited this node and don't yet have it on the remainingNodes list if(visited.find(nextN) == visited.end() && !isRemaining(nextN)) { //printf(" pushing back node <%s: 0x%x: %s> visited=%d\n", nextN.getNode()->class_name().c_str(), nextN.getNode(), nextN.getNode()->unparseToString().c_str(), visited.find(nextN)!=visited.end()); remainingNodes.push_back(nextN); } } } // if we still have any nodes left remaining if(remainingNodes.size()>0) { // take the next node from the front of the list and mark it as visited //visited[remainingNodes.front()] = true; visited.insert(remainingNodes.front()); } } } class dataflow : public virtual iterator {}; class back_dataflow: public virtual dataflow {}; void back_dataflow::operator ++ (int) { advance(false, true); // backward, add all children } class IntraUniDirectionalDataflow : public IntraUnitDataflow { ... virtual VirtualCFG::dataflow* getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0; }
在派生类中实现
- VirtualCFG::dataflow* IntraFWDataflow::getInitialWorklist ()
- VirtualCFG::dataflow* IntraBWDataflow::getInitialWorklist()
b是CFG中的一个基本块
- // 流入b的信息是b的所有前驱节点流出的信息的并集/连接
- // 流出S的信息是b生成的信息减去b杀死的信息。这就是作用于b的传递函数!!
bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated) { // Iterate over the nodes in this function that are downstream from the nodes added above for(; it != itEnd; it++) { DataflowNode n = *it; SgNode* sgn = n.getNode(); ... for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); ) { state = *itS; const vector<Lattice*> dfInfoAnte = getLatticeAnte(state); // IN set const vector<Lattice*> dfInfoPost = getLatticePost(state); // OUT set // OUT = IN first // transfer within the node: from IN to OUT, // Overwrite the Lattices below this node with the lattices above this node. // The transfer function will then operate on these Lattices to produce the // correct state below this node. vector<Lattice*>::const_iterator itA, itP; int j=0; for(itA = dfInfoAnte.begin(), itP = dfInfoPost.begin(); itA != dfInfoAnte.end() && itP != dfInfoPost.end(); itA++, itP++, j++) { if(analysisDebugLevel>=1){ // Dbg::dbg << " Meet Before: Lattice "<<j<<": \n "<<(*itA)->str(" ")<<endl; Dbg::dbg << " Meet After: Lattice "<<j<<": \n "<<(*itP)->str(" ")<<endl; } (*itP)->copy(*itA); /*if(analysisDebugLevel>=1){ Dbg::dbg << " Copied Meet Below: Lattice "<<j<<": \n "<<(*itB)->str(" ")<<endl; }*/ } // =================== TRANSFER FUNCTION =================== // (IN - KILL ) + GEN if (isSgFunctionCallExp(sgn)) transferFunctionCall(func, n, state); boost::shared_ptr<IntraDFTransferVisitor> transferVisitor = getTransferVisitor(func, n, *state, dfInfoPost); sgn->accept(*transferVisitor); modified = transferVisitor->finish() || modified; // =================== TRANSFER FUNCTION =================== ...// } }
这被证明对于沿着路径传播信息至关重要。不能注释掉!!
???不确定此步骤与之前的步骤(Meet Before() / Meet After)之间的区别
meetUpdate() 也在这里被调用
// Propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's // NodeState (nextNodeState). // Returns true if the next node's meet state is modified and false otherwise. bool IntraUniDirectionalDataflow::propagateStateToNextNode( const vector<Lattice*>& curNodeState, DataflowNode curNode, int curNodeIndex, const vector<Lattice*>& nextNodeState, DataflowNode nextNode) { bool modified = false; vector<Lattice*>::const_iterator itC, itN; if(analysisDebugLevel>=1){ Dbg::dbg << "\n Propagating to Next Node: "<<nextNode.getNode()<<"["<<nextNode.getNode()->class_name()<<" | "<<Dbg::escape(nextNode.getNode()->unparseToString())<<"]"<<endl; int j; for(j=0, itC = curNodeState.begin(); itC != curNodeState.end(); itC++, j++) Dbg::dbg << " Cur node: Lattice "<<j<<": \n "<<(*itC)->str(" ")<<endl; for(j=0, itN = nextNodeState.begin(); itN != nextNodeState.end(); itN++, j++) Dbg::dbg << " Next node: Lattice "<<j<<": \n "<<(*itN)->str(" ")<<endl; } // Update forward info above nextNode from the forward info below curNode. // Compute the meet of the dataflow information along the curNode->nextNode edge with the // next node's current state one Lattice at a time and save the result above the next node. for(itC = curNodeState.begin(), itN = nextNodeState.begin(); itC != curNodeState.end() && itN != nextNodeState.end(); itC++, itN++) { // Finite Lattices can use the regular meet operator, while infinite Lattices // must also perform widening to ensure convergence. if((*itN)->finiteLattice()) modified = (*itN)->meetUpdate(*itC) || modified; else { //InfiniteLattice* meetResult = (InfiniteLattice*)itN->second->meet(itC->second); InfiniteLattice* meetResult = dynamic_cast<InfiniteLattice*>((*itN)->copy()); Dbg::dbg << " *itN: " << dynamic_cast<InfiniteLattice*>(*itN)->str(" ") << endl; Dbg::dbg << " *itC: " << dynamic_cast<InfiniteLattice*>(*itC)->str(" ") << endl; meetResult->meetUpdate(*itC); Dbg::dbg << " meetResult: " << meetResult->str(" ") << endl; // Widen the resulting meet modified = dynamic_cast<InfiniteLattice*>(*itN)->widenUpdate(meetResult); delete meetResult; } } if(analysisDebugLevel>=1) { if(modified) { Dbg::dbg << " Next node's in-data modified. Adding..."<<endl; int j=0; for(itN = nextNodeState.begin(); itN != nextNodeState.end(); itN++, j++) { Dbg::dbg << " Propagated: Lattice "<<j<<": \n "<<(*itN)->str(" ")<<endl; } } else Dbg::dbg << " No modification on this node"<<endl; } return modified; }
class IntraUniDirectionalDataflow : public IntraUnitDataflow { public: protected: // propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's NodeState (nextNodeState) // return true if any state is modified. bool propagateStateToNextNode( const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex, const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode); }
向后过程内数据流分析:例如,活跃性分析(使用 --> 向后 --> 定义)
- 类IntraBWDataflow : public IntraUniDirectionalDataflow
class LiveDeadVarsAnalysis : public IntraBWDataflow { protected: funcSideEffectUses* fseu; public: LiveDeadVarsAnalysis(SgProject *project, funcSideEffectUses* fseu=NULL); // Generates the initial lattice state for the given dataflow node, in the given function, with the given NodeState void genInitState(const Function& func, const DataflowNode& n, const NodeState& state, std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts); boost::shared_ptr<IntraDFTransferVisitor> getTransferVisitor(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo) { return boost::shared_ptr<IntraDFTransferVisitor>(new LiveDeadVarsTransfer(func, n, state, dfInfo, fseu)); } bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo) { assert(0); return false; } };
关键:应用于调用点的转移函数,用于在函数边界之间执行适当的状态转移。
void IntraFWDataflow::transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state) { vector<Lattice*> dfInfoBelow = state->getLatticeBelow(this); vector<Lattice*>* retState = NULL; dynamic_cast<InterProceduralDataflow*>(interAnalysis)-> transfer(func, n, *state, dfInfoBelow, &retState, true); if(retState && !(retState->size()==0 || (retState->size() == dfInfoBelow.size()))) { Dbg::dbg << "#retState="<<retState->size()<<endl; for(vector<Lattice*>::iterator ml=retState->begin(); ml!=retState->end(); ml++) Dbg::dbg << " "<<(*ml)->str(" ")<<endl; Dbg::dbg << "#dfInfoBelow="<<dfInfoBelow.size()<<endl; for(vector<Lattice*>::const_iterator l=dfInfoBelow.begin(); l!=dfInfoBelow.end(); l++) Dbg::dbg << " "<<(*l)->str(" ")<<endl; } // Incorporate information about the function's return value into the caller's dataflow state // as the information of the SgFunctionCallExp ROSE_ASSERT(retState==NULL || retState->size()==0 || (retState->size() == dfInfoBelow.size())); if(retState) { vector<Lattice*>::iterator lRet; vector<Lattice*>::const_iterator lDF; for(lRet=retState->begin(), lDF=dfInfoBelow.begin(); lRet!=retState->end(); lRet++, lDF++) { Dbg::dbg << " lDF Before="<<(*lDF)->str(" ")<<endl; Dbg::dbg << " lRet Before="<<(*lRet)->str(" ")<<endl; (*lDF)->unProject(isSgFunctionCallExp(n.getNode()), *lRet); Dbg::dbg << " lDF After="<<(*lDF)->str(" ")<<endl; } } }
InterProceduralDataflow::InterProceduralDataflow(IntraProceduralDataflow* intraDataflowAnalysis) : InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis) // !!! NOTE: cfgForEnd() AND cfgForBeginning() PRODUCE THE SAME SgFunctionDefinition SgNode BUT THE DIFFERENT INDEXES // !!! (0 FOR BEGINNING AND 3 FOR END). AS SUCH, IT DOESN'T MATTER WHICH ONE WE CHOOSE. HOWEVER, IT DOES MATTER // !!! WHETHER WE CALL genInitState TO GENERATE THE STATE BELOW THE NODE (START OF THE FUNCTION) OR ABOVE IT // !!! (END OF THE FUNCTION). THE CAPABILITY TO DIFFERENTIATE THE TWO CASES NEEDS TO BE ADDED TO genInitState // !!! AND WHEN IT IS, WE'LL NEED TO CALL IT INDEPENDENTLY FOR cfgForEnd() AND cfgForBeginning() AND ALSO TO MAKE // !!! TO SET THE LATTICES ABOVE THE ANALYSIS
待办事项:函数定义的开始和结束问题在此处提及
最简单的形式:根本没有调用点的转移操作。
class UnstructuredPassInterDataflow : virtual public InterProceduralDataflow { public: UnstructuredPassInterDataflow(IntraProceduralDataflow* intraDataflowAnalysis) : InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis), InterProceduralDataflow(intraDataflowAnalysis) {} // the transfer function that is applied to SgFunctionCallExp nodes to perform the appropriate state transfers // fw - =true if this is a forward analysis and =false if this is a backward analysis // n - the dataflow node that is being processed // state - the NodeState object that describes the dataflow state immediately before (if fw=true) or immediately after // (if fw=false) the SgFunctionCallExp node, as established by earlier analysis passes // dfInfo - the Lattices that this transfer function operates on. The function propagates them // to the calling function and overwrites them with the dataflow result of calling this function. // retState - Pointer reference to a Lattice* vector that will be assigned to point to the lattices of // the function call's return value. The callee may not modify these lattices. // Returns true if any of the input lattices changed as a result of the transfer function and // false otherwise. bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo, std::vector<Lattice*>** retState, bool fw) { return false; } void runAnalysis(); }; // simply call intra-procedural analysis on each function one by one. void UnstructuredPassInterDataflow::runAnalysis() { set<FunctionState*> allFuncs = FunctionState::getAllDefinedFuncs(); // iterate over all functions with bodies for(set<FunctionState*>::iterator it=allFuncs.begin(); it!=allFuncs.end(); it++) { const Function& func = (*it)->func; FunctionState* fState = FunctionState::getDefinedFuncState(func); // Call the current intra-procedural dataflow as if it were a generic analysi intraAnalysis->runAnalysis(func, &(fState->state)); } }
待办事项
直接调用:在给定函数上运行过程内分析,如果函数的节点状态因此被修改则返回 true,否则返回 false 状态 - 函数的节点状态
- bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated);
- 使用更简单的参数列表进行直接调用:不可行,所有过程内分析都必须在内部设置过程间分析!
bool IntraProceduralDataflow::runAnalysis(const Function& func, NodeState* state) { // Each function is analyzed as if it were called directly by the language's runtime, ignoring // the application's actual call graph bool analyzeDueToCallers = true; // We ignore the application's call graph, so it doesn't matter whether this function calls other functions std::set<Function> calleesUpdated; return runAnalysis(func, state, analyzeDueToCallers, calleesUpdated); }
通过非结构化传递过程间数据流类调用简单的过程内分析。
int main() { SgProject* project = frontend(argc,argv); initAnalysis(project); // prepare debugging support Dbg::init("Live dead variable analysis Test", ".", "index.html"); liveDeadAnalysisDebugLevel = 1; analysisDebugLevel = 1; // basis analysis LiveDeadVarsAnalysis ldva(project); // wrap it inside the unstructured inter-procedural data flow UnstructuredPassInterDataflow ciipd_ldva(&ldva); ciipd_ldva.runAnalysis(); ..... }
示例代码
// Initialize vars to hold all the variables and expressions that are live at DataflowNode n //void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const DataflowNode& n, const NodeState& state, set<varID>& vars, string indent) void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set<varID>& vars, string indent) { LiveVarsLattice* liveLAbove = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeAbove(ldva).begin())); LiveVarsLattice* liveLBelow = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeBelow(ldva).begin())); // The set of live vars AT this node is the union of vars that are live above it and below it for(set<varID>::iterator var=liveLAbove->liveVars.begin(); var!=liveLAbove->liveVars.end(); var++) vars.insert(*var); for(set<varID>::iterator var=liveLBelow->liveVars.begin(); var!=liveLBelow->liveVars.end(); var++) vars.insert(*var); }
必须有一种方法来测试分析结果是否正确。
我们目前使用一种原始的方法来测试分析的正确性:比较pragma和格字符串输出。
两个示例翻译器测试分析正确性(比较pragma和格字符串输出)
- https://github.com/rose-compiler/rose/blob/master/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/liveDeadVarAnalysisTest.C
- https://github.com/rose-compiler/rose/blob/master/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/constantPropagationTest.C
活动性分析正确性的示例测试输入文件
- https://github.com/rose-compiler/rose/blob/master/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/test5.C
int bar(int flag) { int a =1,b,c; #pragma rose [LiveVarsLattice: liveVars=[flag, a, b]] if (flag == 0) // flag is only read here, not written! c = a; else c = b; return c; }
打开它
liveDeadAnalysisDebugLevel = 1; analysisDebugLevel = 1; // find code with if(analysisDebugLevel>=1) ...
使用浏览器检查网页转储。
firefox index.html
如何阅读跟踪文件:从头开始:信息根据访问的CFG节点排序。顺序可以是前向或后向顺序。首先检查顺序是否正确,然后检查每个访问的节点。
================================== Copying incoming Lattice 0: [LiveVarsLattice: liveVars=[b]] To outgoing Lattice 0: [LiveVarsLattice: liveVars=[]] ================================== Transferring the outgoing Lattice ... liveLat=[LiveVarsLattice: liveVars=[b]] Dead Expression usedVars=<> assignedVars=<> assignedExprs=<> #usedVars=0 #assignedExprs=0 Transferred: outgoing Lattice 0: [LiveVarsLattice: liveVars=[b]] transferred, modified=0 ================================== Propagating/Merging the outgoing Lattice to all descendant nodes ... Descendants (1): ~~~~~~~~~~~~ Descendant: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;] Propagating to Next Node: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;] Cur node: Lattice 0: [LiveVarsLattice: liveVars=[b]] Next node: Lattice 0: [LiveVarsLattice: liveVars=[a]] Next node's in-data modified. Adding... Propagated: Lattice 0: [LiveVarsLattice: liveVars=[a, b]] propagated/merged, modified=1 ^^^^^^^^^^^^^^^^^^ A real example: if (flag) c = a; else c = b; // liveness analysis, a, b are live in two branches, they are propagated backward to if-stmt ------------------ Descendants (1): // from c =a back to if-stmt (next node) ~~~~~~~~~~~~ Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;] Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;] Cur node: Lattice 0: [LiveVarsLattice: liveVars=[a]] // current node's lattice Next node: Lattice 0: [LiveVarsLattice: liveVars=[]] // next node's lattice before propagation Next node's in-data modified. Adding... Propagated: Lattice 0: [LiveVarsLattice: liveVars=[a]] // propagate a into if-stmt's lattice propagated, modified=1 ^^^^^^^^^^^^^^^^^^ ------------------ Descendants (1): // from c = b --> if-stmt ~~~~~~~~~~~~ Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;] Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;] Cur node: Lattice 0: [LiveVarsLattice: liveVars=[b]] Next node: Lattice 0: [LiveVarsLattice: liveVars=[a]] Next node's in-data modified. Adding... Propagated: Lattice 0: [LiveVarsLattice: liveVars=[a, b]] // now both a and b are propagated/ merged propagated, modified=1 ^^^^^^^^^^^^^^^^^^
提供了一个类analysisStatesToDot来生成带有格信息的CFG点图。
//AnalysisDebuggingUtils.C class analysisStatesToDOT : public UnstructuredPassIntraAnalysis { private: // LiveDeadVarsAnalysis* lda; // reference to the source analysis Analysis* lda; // reference to the source analysis void printEdge(const DataflowEdge& e); // print data flow edge void printNode(const DataflowNode& n, std::string state_string); // print date flow node void visit(const Function& func, const DataflowNode& n, NodeState& state); // visitor function public: std::ostream* ostr; analysisStatesToDOT (Analysis* l): lda(l){ }; }; namespace Dbg { //.... void dotGraphGenerator (::Analysis *a) { ::analysisStatesToDOT eas(a); IntraAnalysisResultsToDotFiles upia_eas(eas); upia_eas.runAnalysis(); } } // namespace Dbg
// Liao, 12/6/2011 #include "rose.h" #include <list> #include <sstream> #include <iostream> #include <fstream> #include <string> #include <map> using namespace std; // TODO group them into one header #include "genericDataflowCommon.h" #include "VirtualCFGIterator.h" #include "cfgUtils.h" #include "CallGraphTraverse.h" #include "analysisCommon.h" #include "analysis.h" #include "dataflow.h" #include "latticeFull.h" #include "printAnalysisStates.h" #include "liveDeadVarAnalysis.h" int numFails = 0, numPass = 0; //----------------------------------------------------------- int main( int argc, char * argv[] ) { SgProject* project = frontend(argc,argv); initAnalysis(project); // generating index.html for tracing the analysis Dbg::init("Live dead variable analysis Test", ".", "index.html"); liveDeadAnalysisDebugLevel = 1; analysisDebugLevel = 1; LiveDeadVarsAnalysis ldva(project); UnstructuredPassInterDataflow ciipd_ldva(&ldva); ciipd_ldva.runAnalysis(); // Output the dot graph ********************* Dbg::dotGraphGenerator (&ldva); return 0; }
- 难以使用生成的格,因为在格中生成了许多临时表达式对象。但通常用户并不关心它们(常量传播、指针分析)。
- 要查看问题:转到[build64/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests]
- 运行make check
- 查看分析的点图转储:run.sh test_ptr4.C_main_0x2b41e651c038_cfg.dot