一、背景
安卓逆向的核心流程(https://bbs.kanxue.com/thread-288487.htm)里面分享了安卓逆向技能树的“树干”。“树干”上有多个“树枝”,其中有一个就是“混淆还原”,用来还原复杂难懂的代码。然后这个“树枝”上又有多个“树叶”,接下来分享的就是里面的“扁平化还原”。
大多OLLVM扁平化还原的核心思路是基于:块关系。
这篇文章从另外一个角度进行还原:状态机。
样本来自:OLLVM控制流平坦化混淆还原(https://bbs.kanxue.com/thread-286151.htm),目标函数:.init_proc
二、原理
先说一下对OLLVM扁平化的理解
1、示例
正常一个函数,会有多个基本块,逻辑清晰
intdemo(int x) {
if (x > 0) { // 入口块
return 1; // true分支块
} else {
return -1; // false分支块
}
}
函数通过OLLVM扁平化之后,逻辑变得复杂
int demo_flattened(int x) {
uint32_t state = 0xINIT; // 初始状态常量
int ret = 0;
dispatcher: // 分发器
switch (state) {
case 0xINIT: {
// 对应原来的入口 + 条件判断
if (x > 0) {
state = 0xS_TRUE; // 真分支状态常量
} else {
state = 0xS_FALSE; // 假分支状态常量
}
goto dispatcher;
}
case 0xS_TRUE: {
// 对应原函数的 true 分支:return 1;
ret = 1;
state = 0xS_RET; // 跳到“返回”状态
goto dispatcher;
}
case 0xS_FALSE: {
// 对应原函数的 false 分支:return -1;
ret = -1;
state = 0xS_RET; // 跳到“返回”状态
goto dispatcher;
}
case 0xS_RET: {
return ret;
}
}
}
2、理解
所谓的扁平化,本质上是用状态机的方式对所有基本块进行重新调度。
原本直观的层次关系被抹除,所有基本块扁平成一排,从而提高静态分析的成本。
更形象地理解:一本书本来是章节结构清晰的,扁平化后每一段后面写着下一段的编号,要去找到这个编号才能串起来。
这里先梳理一下状态机的调度过程,后续还原要用到:
state == STATE_INIT // 1、初始化状态:
dispatcher: // 分发器
if (state == STATE_XXX){ //2、比较状态
do something; //3、命中状态
state = STATE_YYY; //4、改变状态
goto dispatcher; //5、返回分发器
} else if (state == STATE_YYY){
...
}
三、还原
1、找到分发器

因为改变状态之后,都会回到分发器:goto dispatcher,这就意味着dispatcher引用有很多个
这个函数只有一个集中返回的点,这里简单粗暴定为引用最多次数就是dispatcher。
对应可以写一个找dispatcher的脚本
def find_dispatcher_ea(start_ea, end_ea):
counts = {}
ea = start_ea
while ea < end_ea:
insn = ida_ua.insn_t()
size = ida_ua.decode_insn(insn, ea)
if size > 0:
mnem = ida_ua.print_insn_mnem(ea)
if mnem == "B":
# 统计所有 B loc_xxx 的次数
target = idc.get_operand_value(ea, 0)
if target != idc.BADADDR:
counts[target] = counts.get(target, 0) + 1
ea += size
else:
ea += 1
if not counts:
return None
# 返回统计次数最多的
most_common = max(counts.items(), key=lambda x: x[1])
return most_common[0]
# 调用
dispatcher_ea = find_dispatcher_ea(start_ea, end_ea)
print("分发器地址: dispatcher_ea=0x{:X}".format(dispatcher_ea))
# 输出:
分发器地址: dispatcher_ea=0x43120
0x43120对应有29个引用

2、找到状态寄存器
状态寄存器存放状态值,分发器根据状态值进行调度,举个例子:
MOV W9, #0xD23AD535
CMP W8, W9
B.NE loc_43120 // dispatcher_ea
等价
if (state_reg != #0xD23AD535) goto loc_43120 // dispatcher_ea
在这里“状态寄存器”state_reg就是CMP W8, W9里面的W8
对应可以写一个找状态寄存的脚本:
def find_state_reg(start_ea, end_ea, dispatcher_ea):
ea = start_ea
while ea < end_ea:
insn = ida_ua.insn_t()
size = ida_ua.decode_insn(insn, ea)
if size > 0:
# MOV other_reg, cur_state
# CMP state_reg, other_reg
# B.NE loc_dispatcher
mnem = ida_ua.print_insn_mnem(ea)
if mnem == "B.NE" and idc.get_operand_value(ea, 0) == dispatcher_ea :
prev_ea = idc.prev_head(ea)
state_reg = idc.print_operand(prev_ea, 0).strip().upper()
return state_reg
ea += size
else:
ea += 1
return None
#调用:
state_reg = find_state_reg(start_ea, end_ea, dispatcher_ea)
print("状态寄存器: state_reg={}".format(state_reg))
#输出:
状态寄存器: state_reg=W8
3、找到改变状态
调度过程都是:改变状态+返回分发器
返回分发器直接找:B dispatcher_ea
改变状态有2种情况:
第1种:直接修改
MOV state_reg, next_state
B dispatcher_ea
第2种:分支修改
MOV true_reg, true_state
MOV false_reg, false_state
CSEL state_reg, true_reg, false_reg, cond
B dispatcher_ea
相当:
state_reg = cond ? true_state: false_state
B dispatcher_ea
对应的可以写一个寻找改变状态块的脚本
def find_block_ea_2_next_state_list(start_ea, end_ea, dispatcher_ea, state_reg):
results = []
ea = start_ea
while ea < end_ea:
insn = ida_ua.insn_t()
size = ida_ua.decode_insn(insn, ea)
if size > 0:
mnem = ida_ua.print_insn_mnem(ea)
# B dispatcher_ea
if mnem == "B" and idc.get_operand_value(ea, 0) == dispatcher_ea :
block_ea = ea
prev_ea = idc.prev_head(block_ea)
mnem = ida_ua.print_insn_mnem(prev_ea)
# 第1种:直接修改
# MOV state_reg, next_state
# B dispatcher_ea
if (mnem == "MOV" or mnem == "MOVK") and idc.print_operand(prev_ea, 0).strip().upper() == state_reg:
reg = idc.print_operand(prev_ea, 0).strip().upper()
next_state = find_prev_mov_imm(block_ea, reg, start_ea)
results.append({
"block_ea": block_ea,
"next_state": next_state
})
# 第2种:分支修改
# MOV true_reg, true_state
# MOV false_reg, false_state
# CSEL state_reg, true_reg, false_reg, cond
# B dispatcher_ea
# 相当
# state_reg = cond ? true_state: false_state
# B dispatcher_ea
elif mnem == "CSEL" and idc.print_operand(prev_ea, 0).strip().upper() == state_reg:
true_reg = idc.print_operand(prev_ea, 1).strip().upper() # 真分支寄存器
false_reg = idc.print_operand(prev_ea, 2).strip().upper() # 假分支寄存器
true_state = find_prev_mov_imm(prev_ea, true_reg, start_ea) # 真分支状态
false_state = find_prev_mov_imm(prev_ea, false_reg, start_ea) # 假分支状态
results.append({
"block_ea": block_ea,
"true_state": true_state,
"false_state": false_state,
})
ea += size
else:
ea += 1
return results
# 调用
block_ea_2_next_state_list = find_block_ea_2_next_state_list(start_ea, end_ea, dispatcher_ea, state_reg)
for block_ea_2_next_state in block_ea_2_next_state_list:
if "next_state" in block_ea_2_next_state:
print("block_ea=0x{:X} -> next_state=0x{:X}".format(
block_ea_2_next_state["block_ea"], block_ea_2_next_state["next_state"]))
else :
print("block_ea=0x{:X} -> true_state=0x{:X}, false_state=0x{:X}".format(
block_ea_2_next_state["block_ea"],
block_ea_2_next_state["true_state"] if block_ea_2_next_state["true_state"] is not None else 0,
block_ea_2_next_state["false_state"] if block_ea_2_next_state["false_state"] is not None else 0))
# 输出
block_ea=0x430DC -> next_state=0x665797A5
block_ea=0x43194 -> true_state=0x4E30550D, false_state=0xBEE4A4C9
block_ea=0x431EC -> true_state=0xA9D4543B, false_state=0xC7AC1F5F
block_ea=0x43244 -> next_state=0xEC74B33D
block_ea=0x43284 -> next_state=0x5338AB80
block_ea=0x432BC -> next_state=0x146E0C87
block_ea=0x43324 -> next_state=0x3455F111
block_ea=0x43360 -> next_state=0x37117E76
block_ea=0x433C4 -> next_state=0xBC34C1D0
block_ea=0x433D4 -> next_state=0xBEE4A4C9
block_ea=0x433F4 -> true_state=0xF5C370CA, false_state=0x667521E4
block_ea=0x43408 -> next_state=0x89EFF5EA
block_ea=0x4341C -> next_state=0x7986A6FB
block_ea=0x43434 -> true_state=0xE4DBC33F, false_state=0x667521E4
block_ea=0x43444 -> next_state=0xF5C370CA
block_ea=0x43454 -> next_state=0x9FAB5B41
block_ea=0x43470 -> next_state=0xC80CBB41
block_ea=0x4348C -> next_state=0xC7AC1F5F
block_ea=0x434A8 -> next_state=0x39649A15
block_ea=0x434C8 -> true_state=0xBD9FBBA, false_state=0x5338AB80
block_ea=0x434E8 -> true_state=0x146E0C87, false_state=0x1B166FED
# 示例
block_ea=0x43194 -> true_state=0x4E30550D, false_state=0xBEE4A4C9
对应的汇编:
0x4317C MOV W8, #0xA4C9
0x43180 MOV W9, #0x550D
0x43184 CMP X0, #0
0x43188 MOVK W8, #0xBEE4,LSL#16 # false_state
0x4318C MOVK W9, #0x4E30,LSL#16 # true_state
0x43190 CSEL W8, W9, W8, EQ
0x43194 B loc_43120 # block_ea
4、找到命中状态
先比较状态,再执行基本块,具体有2种情况:
第1种:没命中状态就继续调度,否则执行基本块
MOV other_reg, cur_state
CMP state_reg, other_reg
B.NE dispatcher_ea
相当:
if (state != cur_state) goto dispatcher_ea
block_ea // 执行基本块
第2种:命中状态,跳转到基本块
MOV other_reg, cur_state
CMP state_reg, other_reg
B.EQ block_begin_ea
相当:
if (state == cur_state) goto block_ea // 跳到基本块
对应的可以写一个寻找命中状态块的脚本
def find_cur_state_2_block_ea_list(start_ea, end_ea, dispatcher_ea, state_reg):
results = []
ea = start_ea
while ea < end_ea:
insn = ida_ua.insn_t()
size = ida_ua.decode_insn(insn, ea)
if size > 0:
# 第1种:没命中状态就继续调度,否则执行基本块
# MOV other_reg, cur_state
# CMP state_reg, other_reg
# B.NE dispatcher_ea
# 相当:
# if (state != cur_state) goto dispatcher_ea
# block_ea
mnem = ida_ua.print_insn_mnem(ea)
if mnem == "B.NE" and idc.get_operand_value(ea, 0) == dispatcher_ea:
prev_ea = idc.prev_head(ea)
if ida_ua.print_insn_mnem(prev_ea) == "CMP" and idc.print_operand(prev_ea, 0).strip().upper() == state_reg:
other_reg = idc.print_operand(prev_ea, 1).strip().upper()
cur_state = find_prev_mov_imm(prev_ea, other_reg, start_ea)
block_ea = idc.next_head(ea)
results.append({
"cur_state":cur_state,
"block_ea":block_ea
})
# 第2种:命中状态,跳转到基本块
# MOV other_reg, cur_state
# CMP state_reg, other_reg
# B.EQ block_ea
# 相当:
# if (state == cur_state) goto block_ea
elif mnem == "B.EQ":
prev_ea = idc.prev_head(ea)
if ida_ua.print_insn_mnem(prev_ea) == "CMP" and idc.print_operand(prev_ea, 0).strip().upper() == state_reg:
other_reg = idc.print_operand(prev_ea, 1).strip().upper()
cur_state = find_prev_mov_imm(prev_ea, other_reg, start_ea)
block_ea = idc.get_operand_value(ea, 0)
results.append({
"cur_state":cur_state,
"block_ea":block_ea
})
ea += size
else:
ea += 1
return results
# 调用
cur_state_2_block_ea_list = find_cur_state_2_block_ea_list(start_ea, end_ea, dispatcher_ea, state_reg)
for cur_state_2_block_ea in cur_state_2_block_ea_list:
print("cur_state=0x{:X} -> block_ea=0x{:X}".format(cur_state_2_block_ea["cur_state"], cur_state_2_block_ea["block_ea"]))
# 输出
cur_state=0xC7AC1F5F -> block_ea=0x433F8
cur_state=0xC80CBB41 -> block_ea=0x43448
cur_state=0xD23AD535 -> block_ea=0x43168
cur_state=0x3455F111 -> block_ea=0x43490
cur_state=0x37117E76 -> block_ea=0x434CC
cur_state=0x39649A15 -> block_ea=0x431D8
cur_state=0x7986A6FB -> block_ea=0x433D8
cur_state=0x667521E4 -> block_ea=0x43438
cur_state=0x665797A5 -> block_ea=0x43228
cur_state=0xA9D4543B -> block_ea=0x43474
cur_state=0xBC34C1D0 -> block_ea=0x434AC
cur_state=0xBEE4A4C9 -> block_ea=0x43278
cur_state=0xE4DBC33F -> block_ea=0x4340C
cur_state=0xEC74B33D -> block_ea=0x43458
cur_state=0xF5C370CA -> block_ea=0x432B0
cur_state=0xBD9FBBA -> block_ea=0x430E0
cur_state=0x1B166FED -> block_ea=0x43364
cur_state=0x146E0C87 -> block_ea=0x432F0
cur_state=0x4E30550D -> block_ea=0x433C8
cur_state=0x5338AB80 -> block_ea=0x43314
cur_state=0x89EFF5EA -> block_ea=0x43420
cur_state=0x9FAB5B41 -> block_ea=0x43348
# 例子
cur_state=0xC7AC1F5F -> block_ea=0x433F8
对应的汇编:
0x43138 MOV W9, #0xC7AC1F5F # state
0x43140 CMP W8, W9
0x43144 B.EQ loc_433F8 # if (state != cur_state) goto dispatcher_ea
0x43148 MOV W9, #0xC80CBB41 # block_ea
5、还原扁平化
OLLVM扁平化的还原,实际上是去掉分发器,从而还原基本块的关系
1、改变状态:block_ea -> next_state
2、比较状态:cur_state -> block_ea
所以只需要匹配 next_state 和 cur_state 就可以串成线
block_ea -> next_state == cur_state -> block_ea
上面说到基本块的结束有2种情况,所以还原扁平化的脚本要处理2种情况:
def find_next_block_ea(cur_state_2_block_ea_list, next_state):
for cur_state_2_block_ea in cur_state_2_block_ea_list :
if cur_state_2_block_ea["cur_state"] == next_state :
next_block_ea = cur_state_2_block_ea["block_ea"]
return next_block_ea
return None
def unflatten_block_list(block_ea_2_next_state_list, cur_state_2_block_ea_list):
for block_ea_2_next_state in block_ea_2_next_state_list :
if "next_state" in block_ea_2_next_state:
# B dispatcher_ea
# 改成
# B next_block_ea
next_state = block_ea_2_next_state["next_state"]
next_block_ea = find_next_block_ea(cur_state_2_block_ea_list, next_state)
if next_block_ea == None: continue
block_ea = block_ea_2_next_state["block_ea"]
patch_b(block_ea, next_block_ea)
else:
# CSEL state_reg, true_reg, false_reg, cond
# B dispatcher_ea
# 改成
# B.<cond> true_block_ea
# B false_block_ea
true_state = block_ea_2_next_state["true_state"]
false_state = block_ea_2_next_state["false_state"]
true_block_ea = find_next_block_ea(cur_state_2_block_ea_list, true_state)
false_block_ea = find_next_block_ea(cur_state_2_block_ea_list, false_state)
if true_block_ea == None: continue
if false_block_ea == None: continue
block_ea = block_ea_2_next_state["block_ea"]
patch_ifelse(block_ea, true_block_ea, false_block_ea)
# 调用
unflatten_block_list(block_ea_2_next_state_list, cur_state_2_block_ea_list)
还原之前:逻辑复杂,总共181行

还原之后:逻辑清晰,总共53行,是原来的29%

四、总结
基于状态机的角度进行还原,关键的步骤如下:
1、找到分发器地址:dispatcher_ea
2、找到状态寄存器:state_reg
3、找到改变状态值:block_ea :state_reg = next_state
4、找到命中状态值:if state_reg == cur_state : block_ea
5、进行扁平化还原:block_ea -> next_state == cur_state -> block_ea
实战过程遇到的情况会复杂很多,比如:
1、同个函数里面存在多个分发器;
2、“存放状态的寄存器”和“比较状态的寄存器”不是同一个;
3、比较状态不是直接判断值是否相等,而是用等价的表达式。
如果代码写得好,一键还原还挺爽的。
PS:块关系的角度,是先对所有基本块进行分类,再找出真实块之间的关联。但真实块之间的关联需要执行才知道(真机trace、模拟运行或者符号执行)。
看雪ID:GhHei
https://bbs.kanxue.com/user-home-1000535.htm
看雪·第九届安全开发者峰会(SDC 2025)
# 往期推荐
球分享
球点赞
球在看
点击阅读原文查看更多

