Motivation

过程内分析遇到函数调用只能做保守假设，导致过多误报，而过程间分析可以跟进被调函数，因此更准确.

Call Graph Construction

A call graph is a set of call edges from call-sites to their target methods (callees)

Call Graph，程序中调用边的集合，调用边从调用点（call-site）出发到被调函数（callee），表示程序的调用关系.

确定函数调用目标（构造调用图）的分析称为控制流分析。

调用图构建算法

调用图构建算法主要有以下四种：

Class hierarchy analysis（CHA）
Rapid type analysis（RTA）
Variable type analysis（VTA）
Pointer analysis（k-CFA）

由上至下分析越来越精确，但分析成本也越来越大

Java调用指令

	静态调用	特殊调用	虚调用
指令	invokestatic	invokespecial	invokeinterface, invokevirtual
是否接受对象实例	N	Y	Y
被调方法	静态方法	构造函数、私有实例函数、父类实例函数	其他实例方法
目标方法个数	1	1	≥1（多态）
绑定时间	编译时	编译时	运行时

可以看到，处理虚调用是构造调用图的关键。

Method Dispatch

在运行时，virtualcall调用（$o^1.foo(\dots)^2$）的函数主要基于两点：

调用者实例对象类型($o^1$ 指针)，记为 $c$
调用点的函数签名，记为$m$

函数签名：包括类名、方法名、以及方法描述符（descriptor），下图为示例，函数签名表示为C.foo(P, Q, R)

可以看到方法描述符包括了函数的返回值类型和参数类型。

定义函数 $\mathrm{Dispatch}(c,m)$，该函数模拟运行时究竟调用哪个函数

$Dispatch(c, m)=\left\{ \begin{array}{ll} m' ,& \text{if } c \text{ contains non-abstract method }m' \\ & \text{that has the same name and descriptor as }m\\ Dispatch(c', m), & \text{otherwise} \end{array}\right.\\ \text{where }c'\text{ is superclass of }c$

即当virtualinvoke时，如果当前类 $c$ 中存在一个非抽象函数 $m’$ ，该函数名称和摘要与$m$相同，那么调用该函数；

否则递归，对 $c$ 的父类调用 $Dispatch$，直到找到该方法。

CHA（Class Hierarchy Analysis）

当调用发生时，解析调用类定义，根据定义类枚举其所有子类，所有子类符合要求的函数都是可能会被调用的函数。

通过检查A的所有子类，寻找被调用函数

具体算法：

def Resolve(cs): # cs为call site
	T = {} # 算法最终输出cs所有可能调用的函数集合
    m = method signature at cs
    # 若 cs 是 staticcall，直接将该函数加入集合
    if cs is a static call:
    	T = { m } 
    # 若 cs 是 specialcall，要处理三种方法，取方法m的类cm，并调用Dispatch()得到方法
    if cs is special call:
        cm = class type of m
        T = { Dispatch(cm, m) }
    # 若 cs 是virtualcall，首先获取包括cs自身类在内的所有子类，在将其dispatch()结果加入集合T中
    if cs is a virtual call:
        c = declared type of receiver variable at cs 
        for each sub_c that is a subclass of c or C itself:
            add Dispatch(sub_c, m) to T
    return T

下图为实际示例，注意，CHA只考虑声明类型，不解析实例，因此在后面B即使实例化为new B()，但是解析仍然有3种，这也看出了CHA的缺陷。

CHA特征

快
- 只考虑声明
- 忽略数据流和控制流
不准
- 导致假目标方法
IDE常用的方法（比如说 IDEA）

用CHA构造整个程序CG

从一些入口函数开始（如main()）
对于每一个到达的方法 $m$，用CHA解析其中每个call sites，即（$Resolve(cs)$）
重复直到没有新的函数需要被发现

可以用DFS或者BFS实现，以下为BFS搜索算法：

BuildCallGraph(m_entry):
	WL = [m_entry] # 工作队列
	CG = {} # 存放callgraph，边的集合
	RM = {} # 可达的方法（book）
	while WL is not empty:
		remove m from WL
	    if m ∉ RM then add m to RM
        # 通过 CHA 解析m()中的所有调用 cs
	    for cs in m:
 	    	T = Resolve(cs)
			for target_method in T:
				add (cs → target_method) to CG # 将调用加入到Callgraph中
				add target_method to WL
return CG

Interprocedural Control-Flow Graph

ICFG（Interprocedural Control-Flow Graph），在CFG基础上加上Call edges和Return edges

Call edges：从调用点（call sites）出发至被调用函数（callees）
Return edges：从被调函数return语句出发至调用点的下一条语句（return sites）

e.g.,

注意在添加两条边后，原先的边并不删去。

Interprocedural Data-Flow Analysis

过程间数据流分析即在 ICFG 上做分析

在 Transfer functions 中除了 CFG 中的 Node transfer 还加了 edge transfer：

Call edge transfer：在 call node 至 callee 的第一个node 上传递数据流（传参数）
Return edge transfer：在 return node 至 return site 上传递控制流（传返回值）

Interprocedural Constant Propagation

过程间数据流分析并没有像过程内分析那样有规范的算法，这里以常量传播为例，解释过程间分析过程：

对于非函数调用语句，仍然用node transfer处理，遇到函数调用（call expression）时：

kill掉左边的变量的值
由 Call edge transfer 将实参传递至callee enter 处，在callee上下文做过程间分析
分析至callee return时，由Return edge transfer将至传回caller上下文

关于是否保留call-to-return edge：

保留call-to-return edge，可以保留caller上下文数据变量，若没有这条边则需要把当前函数上下文变量传进被调函数（如分析 ten() 时，需要传递a, b, c），这是低效的。

注意在call expression时，先把left的值kill掉（如调用ten()是，先把b的值kill，否则会产生冲突）

总结

本节主要讲了过程间分析主要方法，即先生成Callgraph，再合成ICFG进行数据流分析，并且介绍了一个简单的生成CallGraph的方法——CHA，并且可以看到CHA有较大缺陷。