Countering Ollvm-Based Obfuscation

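Last week I played the PC preliminary round of the Tencent Game Security competition. The challenges weren't hard and followed the traditional CTF style. One of them used a simple obfuscation, which I removed at the time with a mix of scripts and manual work; it took a while, but I got most of it out. It also planted a seed: against heavier obfuscation, I would need an automated, extensible deobfuscation approach. Then, over the past few days, I ran into a heavily obfuscated Ollvm-based program, so I wrote an automated tool that can handle obfuscation built from multiple rules. This post summarizes that tool, along with some lessons learned and ideas for fighting back.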

Basic Concepts

For the basics of Ollvm, see the reference article listed at the end. The topic is covered well online, so I won't repeat it here.

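Briefly, here is my view. In the narrow sense, Ollvm is an open-source, LLVM-based obfuscator: it works in LLVM's pass stage and obfuscates the intermediate representation (IR). Its techniques include control-flow flattening, instruction substitution, junk-code insertion, and data obfuscation.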

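In the broader sense, though, I think it can be understood as a family of LLVM-based code obfuscation techniques. The scheme can be customized and need not follow Ollvm's own ideas. The major vendors all use different schemes, but the underlying technique never changes: transforming the IR during LLVM's pass stage.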
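To make "instruction substitution" concrete: a substitution pass rewrites a simple operator into an equivalent but longer expression. The snippet below checks three identities of the kind Ollvm's substitution pass is known for (add as `a - (-b)`, or as `(a & b) | (a ^ b)`, xor as `(~a & b) | (a & ~b)`) on random 64-bit values. It is an illustration of the idea with hypothetical helper names, not code taken from Ollvm.

```python
import random

MASK = (1 << 64) - 1  # emulate 64-bit registers (wrap-around arithmetic)


def add_sub(a, b):
    # a + b rewritten as a - (-b)
    return (a - ((-b) & MASK)) & MASK


def or_sub(a, b):
    # a | b rewritten as (a & b) | (a ^ b)
    return (a & b) | (a ^ b)


def xor_sub(a, b):
    # a ^ b rewritten as (~a & b) | (a & ~b)
    return ((~a & MASK) & b) | (a & (~b & MASK))


def check(trials=1000):
    """Compare each rewritten form against the plain operator."""
    for _ in range(trials):
        a, b = random.getrandbits(64), random.getrandbits(64)
        assert add_sub(a, b) == (a + b) & MASK
        assert or_sub(a, b) == a | b
        assert xor_sub(a, b) == a ^ b
    return True
```

The rewritten forms compute the same values, which is exactly why they survive compilation and why a deobfuscator has to recognize them as patterns rather than rely on the compiler to fold them back.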

So is there an approach that can reliably counter this broad-sense Ollvm obfuscation? Below are the approaches I've put together.

First, here are the obfuscation patterns I ran into; the rest of this post analyzes and counters them:

```
Type 1
# .tvm0:000000014000883A 51 push rcx
# .tvm0:000000014000883B 48 B9 20 C4 14 41 01 00 00 00 mov rcx, 14114C420h
# .tvm0:0000000140008845 9C pushfq
# .tvm0:0000000140008846 48 81 C1 31 C4 EB FE add rcx, 0FFFFFFFFFEEBC431h
# .tvm0:000000014000884D 9D popfq
# .tvm0:000000014000884E FF E1 jmp rcx
# .tvm0:0000000140008850 E8 db 0E8h
# .tvm0:0000000140008851 59 pop rcx
Type 2
# [0x1400171c1] 4151 => push r9
# [0x1400171c3] e801000000 => call 0x1400171c9
# .......
# [0x1400171c9] 4159 => pop r9
# [0x1400171cb] 4159 => pop r9
Type 3
# [0x140016bbc] e802000000 => call 0x140016bc3
# .......
# [0x140016bc3] 488304240b => add qword ptr [rsp], 0xb
# [0x140016bc8] c3 => ret
Type 4
# [0x140016a4e] 48b8c23c28c0ba34a35a => movabs rax, 0x5aa334bac0283cc2
# [0x140016a58] 48b9c5e8bc420860d52d => movabs rcx, 0x2dd5600842bce8c5
# [0x140016a62] 4809c8 => or rax, rcx
# [0x140016a65] 7502 => jne 0x140016a69
Type 5
# [0x140016aa7] e905000000 => jmp 0x140016ab1
# .......
# [0x140016ab1] e8f6ffffff => call 0x140016aac
# .......
# [0x140016aac] c3 => ret
Type 6
# [0x140016b1d] 7404 => je 0x140016b23
# [0x140016b1f] 7502 => jne 0x140016b23
Type 7
# [0x140016d78] e800000000 => call 0x140016d7d
# .......
# [0x140016d7d] 488304241f => add qword ptr [rsp], 0x1f
# [0x140016d82] eb07 => jmp 0x140016d8b
# .......
# [0x140016d8b] 415b => pop r11
# [0x140016d8d] 41ffd3 => call r11
# .......
# [0x140016d9c] 415b => pop r11
Type 8
# [0x140016c1a] 9c => pushfq
# [0x140016c1b] e80b000000 => call 0x140016c2b
# .......
# [0x140016c2b] 4883042417 => add qword ptr [rsp], 0x17
# [0x140016c30] c3 => ret
# ........
# [0x140016c37] 9d => popfq
Type 9
# [0x140016e46] 4989f9 => mov r9, rdi
# [0x140016e49] 4981c9f04c50ab => or r9, 0xffffffffab504cf0
# [0x140016e50] 41c1e90a => shr r9d, 0xa
# [0x140016e54] 4983e101 => and r9, 1
# [0x140016e58] 741c => je 0x140016e76
# [0x140016e5a] 750d => jne 0x140016e69
```

In IDA, the code is barely recognizable:

[Figure: the obfuscated code in IDA]
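The constants in the listing above already reveal where each junk sequence really transfers control. The sketch below (helper names are hypothetical) redoes that arithmetic with 64-bit wrap-around: Type 1 jumps to 0x140008851, the `pop rcx`, skipping the 0xE8 junk byte; Type 3's call pushes 0x140016bc1 and the `add` moves it to 0x140016bcc, which is also where the emulation trace later in this post resumes (`sub rsp, 0x20`); Type 8 lands on the `popfq` at 0x140016c37; and in Type 4 the OR of two non-zero constants clears ZF, so `jne` is always taken.

```python
MASK = (1 << 64) - 1  # 64-bit register wrap-around


def type1_target(mov_imm, add_imm):
    # mov rcx, mov_imm ; add rcx, add_imm ; jmp rcx
    return (mov_imm + add_imm) & MASK


def call_add_ret_target(call_addr, call_size, add_imm):
    # call pushes call_addr + call_size, "add [rsp], imm" adjusts it, ret pops it
    return (call_addr + call_size + add_imm) & MASK


def or_jne_taken(a, b):
    # movabs rax, a ; movabs rcx, b ; or rax, rcx ; jne ...
    # jne is taken whenever the OR result is non-zero (ZF = 0)
    return (a | b) != 0


if __name__ == "__main__":
    print(hex(type1_target(0x14114C420, 0xFFFFFFFFFEEBC431)))    # 0x140008851
    print(hex(call_add_ret_target(0x140016bbc, 5, 0xb)))         # 0x140016bcc (Type 3)
    print(hex(call_add_ret_target(0x140016c1b, 5, 0x17)))        # 0x140016c37 (Type 8)
    print(or_jne_taken(0x5aa334bac0283cc2, 0x2dd5600842bce8c5))  # True (Type 4)
```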

Range-Based Pattern Matching with IDAPython

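For simple functions with light obfuscation (say, a hundred lines of assembly inflated to five or six hundred), IDA can still recognize the function once the obfuscation is removed, so the analysis can stay in IDA. In that case, deobfuscation by range-based instruction-signature matching with IDAPython works well.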

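This kind of obfuscation has a telltale trait: it runs pointless, convoluted arithmetic on registers that aren't currently needed, then makes a short jump, with junk bytes inserted in between to break IDA's analysis. We can therefore match these instructions within a range: if all of a pattern's signature instructions occur within an allowed distance, we have located an instance of the obfuscation. With IDAPython, we scan a given address range one pattern type at a time, collect the address ranges of the obfuscated blocks into a list, and finally fill every range in that list with NOPs. Some examples follow.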

For Type 1:

```
# .tvm0:000000014000883A 51                            push    rcx
# .tvm0:000000014000883B 48 B9 20 C4 14 41 01 00 00 00 mov rcx, 14114C420h
# .tvm0:0000000140008845 9C pushfq
# .tvm0:0000000140008846 48 81 C1 31 C4 EB FE add rcx, 0FFFFFFFFFEEBC431h
# .tvm0:000000014000884D 9D popfq
# .tvm0:000000014000884E FF E1 jmp rcx
# .tvm0:0000000140008850 E8 db 0E8h
# .tvm0:0000000140008851 59 pop rcx
```

The instruction signature is:

```
push r1
...
pushfq
...
popfq
jmp r1
...
pop r1
```

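Based on this signature, we can look for push, jmp, and pop within an acceptable distance of each other and check that they use the same register operand, which pinpoints this type. Here is my IDAPython script for locating it: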

```python
import idaapi
from idc import print_insn_mnem, print_operand, get_item_size, get_bytes


def getAsmInfo(start, end):
    """Collect {addr: [mnemonic, op1, op2, size]} for every item in [start, end]."""
    dic = {}
    pointer = start
    while pointer <= end:
        opcode = print_insn_mnem(pointer)
        operand1 = print_operand(pointer, 0)
        operand2 = print_operand(pointer, 1)
        size = get_item_size(pointer)
        dic[pointer] = [opcode, operand1, operand2, size]
        pointer += size
    return dic


def deObfuscation(start, end, jmp_push_gapMax=25, jmp_pop_gapMax=5):
    asmInfo = getAsmInfo(start, end)
    addrs = sorted(asmInfo.keys())
    obfuscated_blocks = []
    processed_jmp = set()  # each "jmp reg" may close only one push...pop block

    for addr in addrs:
        opcode, op1, _, size = asmInfo[addr]
        if opcode == 'push':
            reg = op1
            # look for "jmp reg" within jmp_push_gapMax bytes after the push
            for jmp_addr in (a for a in addrs if a > addr):
                if jmp_addr - addr > jmp_push_gapMax:
                    break
                jmp_op, jmp_op1, _, _ = asmInfo[jmp_addr]
                if jmp_op == 'jmp' and jmp_op1 == reg and jmp_addr not in processed_jmp:
                    processed_jmp.add(jmp_addr)
                    # then look for "pop reg" within jmp_pop_gapMax bytes after the jmp
                    for pop_addr in (a for a in addrs if a > jmp_addr):
                        if pop_addr - jmp_addr > jmp_pop_gapMax:
                            break
                        pop_op, pop_op1, _, pop_size = asmInfo[pop_addr]
                        if pop_op == 'pop' and pop_op1 == reg:
                            obfuscated_blocks.append({'start': addr, 'end': pop_addr + pop_size})
                            break
        elif opcode == 'jmp':
            # a junk 0xE8/0xE9 byte right after the jmp makes IDA mis-decode the
            # following code: mark it as data and re-create code right after it
            nextBytes = get_bytes(addr + size, 1)
            if nextBytes and nextBytes[0] in (0xE8, 0xE9):
                idaapi.create_byte(addr + size, 1)
                idaapi.create_insn(addr + size + 1)
                print(f'[jump convert to code] {addr + size + 1:#x}')
    return obfuscated_blocks
```

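To explain the code: getAsmInfo takes two arguments, the start and end addresses of the range selected in IDA (the code block to deobfuscate). It stores the instructions in that range in a dictionary and returns it, in the following format: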

```
{addr1: [opcode, op1, op2, size], addr2: [opcode, op1, op2, size]}
# e.g. {0x1000: ['mov', 'rax', '0x1', 5]}
```

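deObfuscation takes four arguments: the start address, the end address, the maximum allowed distance in bytes from the push to the jmp, and likewise the maximum distance from the jmp to the pop.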

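It calls getAsmInfo to get that dictionary for the code block and walks it in address order. For each push, it searches for a jmp within the allowed distance, then for a pop after that jmp, and checks that all three use the same register operand. Each match is recorded, and the function returns a list of the obfuscated blocks' address ranges, in this format: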

```
[{'start': s_addr, 'end': e_addr}]
# e.g. [{'start': 0x1000, 'end': 0x2000}]
```

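With this list, you can NOP-fill the obfuscated blocks or apply other cleanup; I won't go into the details here. The code after `elif opcode == 'jmp'` keeps junk bytes from spoiling IDA's decoding of the real code that follows, which would otherwise make the matching fail and miss blocks. It does this by skipping junk opcode bytes such as 0xE8 and 0xE9 and converting the real code after them from data back to code.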
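The NOP-filling step itself is left out above. Here is a minimal sketch on a plain `bytearray`, so it can run outside IDA; `fill_nop` and its arguments are hypothetical names, and `end` is treated as exclusive, matching how deObfuscation computes it (`pop_addr + size`). Inside IDA, the equivalent would be calling `ida_bytes.patch_byte(ea, 0x90)` for every address in a block.

```python
NOP = 0x90  # single-byte x86 NOP


def fill_nop(image, base, blocks):
    """Overwrite each obfuscated block with NOPs.

    image  -- bytearray holding the code mapped at virtual address `base`
    blocks -- list of {'start': ea, 'end': ea}, with `end` exclusive
    """
    for blk in blocks:
        lo, hi = blk['start'] - base, blk['end'] - base
        if not (0 <= lo < hi <= len(image)):
            raise ValueError(f"block {blk} lies outside the image")
        image[lo:hi] = bytes([NOP]) * (hi - lo)
    return image
```

Filling with one-byte NOPs, rather than deleting bytes, keeps every later address and relative branch unchanged. Applied to the Type 1 bytes from 0x14000883A up to (but not including) 0x140008852, all 24 bytes from `push rcx` through `pop rcx` become 0x90.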

Type 2 is another example:

```
# [0x1400171c1] 4151       => push     r9
# [0x1400171c3] e801000000 => call 0x1400171c9
# .......
# [0x1400171c9] 4159 => pop r9
# [0x1400171cb] 4159 => pop r9
```

Likewise, we match push, call, pop, pop and check that the register operands are the same, searching within a range after the call. Here is the deobfuscation script:

```python
import idaapi


def deObfuscation_0(start, end, asmInfo, call_pop_gapMax=0x10):
    """Type 2: push r ; call X ; ... ; pop r ; pop r"""
    obfuscated_blocks = []
    sortedKeys = sorted(asmInfo.keys())
    for i, addr in enumerate(sortedKeys[:-1]):       # needs a following instruction
        opcode, op1, _, _ = asmInfo[addr]
        opcode_2, _, _, _ = asmInfo[sortedKeys[i + 1]]
        if opcode != 'push' or opcode_2 != 'call':   # anchor: push directly followed by call
            continue
        # within the next instructions, look for two consecutive "pop <same reg>"
        for j in range(i + 2, min(i + call_pop_gapMax, len(sortedKeys) - 1)):
            next_addr, next_addr_2 = sortedKeys[j], sortedKeys[j + 1]
            opcode_3, op3, _, _ = asmInfo[next_addr]
            opcode_4, op4, _, size = asmInfo[next_addr_2]
            if opcode_3 == 'pop' and op3 == op1 and opcode_4 == 'pop' and op4 == op1:
                endAddr = next_addr_2 + size          # end of this obfuscated block
                obfuscated_blocks.append({'start': addr, 'end': endAddr})
                try:
                    # make IDA decode the real code right after the block
                    idaapi.create_byte(endAddr, 1)
                    idaapi.create_insn(endAddr)
                except Exception:
                    print(f'error on {endAddr:#x}')
                break
    return obfuscated_blocks
```
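Why Type 2 is safe to remove: `push r9` saves r9, the call pushes a return address and jumps a few bytes ahead (it never returns), the first `pop r9` throws that return address away, and the second `pop r9` restores the saved value. A toy stack simulation (`run_type2` is a hypothetical helper, not part of the script above) makes the bookkeeping explicit:

```python
def run_type2(r9_value, call_addr=0x1400171c3, call_size=5):
    """Replay  push r9 ; call X ; ... ; X: pop r9 ; pop r9  on a toy stack.

    Returns the final r9 and the number of values left on the stack.
    """
    stack = []
    r9 = r9_value
    stack.append(r9)                     # push r9
    stack.append(call_addr + call_size)  # call pushes the return address
    r9 = stack.pop()                     # pop r9: r9 = return address (discarded next)
    r9 = stack.pop()                     # pop r9: r9 = original value again
    return r9, len(stack)
```

Both r9 and the stack depth come out unchanged, so the whole push...pop span can be dropped.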

Obfuscation involving ret/retn requires working out the value on top of the stack to know where execution (rip) continues after the ret. Take Type 5:

```
# [0x140016aa7] e905000000 => jmp      0x140016ab1
# .......
# [0x140016ab1] e8f6ffffff => call 0x140016aac
# .......
# [0x140016aac] c3 => ret
```

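This is the classic call/ret trick: the ret lands on the address right after the call, so we need the rip at the call instruction. Since our asmInfo structure records each instruction's size, all we need is: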

```
rip = addr + size
```

Here is the code that removes this type:

```python
import idaapi
from idc import get_operand_value


def deObfuscation_4(start, end, asmInfo):
    """Type 5: jmp X ; X: call nullsub_N ; nullsub_N: retn"""
    obfuscated_blocks = []
    for addr in sorted(asmInfo.keys()):
        opcode, op, _, _ = asmInfo[addr]
        if opcode != 'jmp':
            continue
        jmpAddr = get_operand_value(addr, 0)          # target of the near jmp
        if jmpAddr not in asmInfo:
            continue
        opcode_2, op2, _, size = asmInfo[jmpAddr]
        # IDA names a function consisting only of "retn" nullsub_N,
        # so call + retn shows up as "call nullsub_N"
        if opcode_2 == 'call' and op2.startswith('nullsub'):
            endAddr = jmpAddr + size                    # rip = addr + size
            obfuscated_blocks.append({'start': addr, 'end': endAddr})
            try:
                idaapi.create_byte(endAddr, 1)
                idaapi.create_insn(endAddr)
            except Exception:
                print(f'error on {op}')
    return obfuscated_blocks
```
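To see concretely where that rip comes from: an E8 (call) or E9 (jmp) rel32 instruction is 5 bytes long, and its displacement is relative to the address of the next instruction. The small decoder below (`decode_rel32_branch` is a hypothetical helper, not part of the script above) confirms the Type 5 chain: the jmp at 0x140016aa7 lands on 0x140016ab1, the call there targets the `ret` at 0x140016aac, and that `ret` returns to 0x140016ab6 = addr + size, where the real code continues.

```python
import struct


def decode_rel32_branch(addr, raw):
    """Decode an E8 (call) / E9 (jmp) rel32 instruction located at `addr`.

    Returns (target, next_ip); for a call, next_ip is the pushed return address.
    """
    raw = bytes(raw)
    if len(raw) < 5 or raw[0] not in (0xE8, 0xE9):
        raise ValueError("not an E8/E9 rel32 branch")
    rel = struct.unpack("<i", raw[1:5])[0]  # signed little-endian displacement
    next_ip = addr + 5                       # rip = addr + size
    return next_ip + rel, next_ip
```

For example, `e8f6ffffff` at 0x140016ab1 has displacement -10, so it calls 0x140016ab6 - 10 = 0x140016aac.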

Obfuscation built from convoluted arithmetic and fake conditional jumps can be removed the same way, for example Type 9:

```
# [0x140016e46] 4989f9     => mov      r9, rdi
# [0x140016e49] 4981c9f04c50ab => or r9, 0xffffffffab504cf0
# [0x140016e50] 41c1e90a => shr r9d, 0xa
# [0x140016e54] 4983e101 => and r9, 1
# [0x140016e58] 741c => je 0x140016e76
# [0x140016e5a] 750d => jne 0x140016e69
```

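This type performs some convoluted arithmetic followed by conditional jumps, but the jumps are effectively unconditional. Whether the je or the jne is the one taken can be worked out by hand or tested with Unicorn emulation; I won't go into the testing method here. In short, treat it as an unconditional jump: match the signature instructions and take the target of the fake conditional jump as the end of the obfuscated block. The deobfuscation code: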

```python
import idaapi
from idc import get_operand_value


def deObfuscation_8(start, end, asmInfo):
    """Type 9: mov r1, r2 ; or/and r1, imm ; shr r1d, n ; and r1, 1 ; je/jne"""
    obfuscated_blocks = []
    sortedKeys = sorted(asmInfo.keys())
    for i in range(len(sortedKeys) - 5):
        window = sortedKeys[i:i + 6]                  # six consecutive instructions
        ops = [asmInfo[a][0] for a in window]
        endAddr = None
        if ops[:4] == ['mov', 'or', 'shr', 'and']:
            # in these samples the OR immediate sets the tested bit,
            # so the jne/jnz edge is the one always taken
            if ops[4] in ('je', 'jz') and ops[5] in ('jne', 'jnz'):
                endAddr = get_operand_value(window[5], 0)
            elif ops[4] in ('jne', 'jnz'):
                endAddr = get_operand_value(window[4], 0)
        elif ops[:4] == ['mov', 'and', 'shr', 'and']:
            # here the AND immediate clears the tested bit, so je/jz is always taken
            if ops[4] in ('je', 'jz'):
                endAddr = get_operand_value(window[4], 0)  # target of the fake conditional jump
        if endAddr is None:
            continue
        obfuscated_blocks.append({'start': sortedKeys[i], 'end': endAddr})
        try:
            idaapi.create_byte(endAddr, 1)
            idaapi.create_insn(endAddr)
        except Exception:
            print(f'error on {endAddr:#x}')
    return obfuscated_blocks
```
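Which edge of such a fake conditional jump is real can often be decided without emulation. After `shr r32, n` and `and r, 1`, only bit n of the 32-bit value survives. An `or` immediate with that bit set forces it to 1 (ZF = 0, so `jne` is always taken); an `and` immediate with that bit clear forces it to 0 (ZF = 1, so `je` is always taken). The sketch below (`resolve_bit_test` and `brute_check` are hypothetical helpers) gives `jne` for the Type 9 sample, consistent with the emulation trace later in the post continuing at 0x140016e69, and `je` for the `and rcx, 0xffffffffcb9d6c9d ; shr ecx, 0x16` variant in the Unicorn trace snippet.

```python
MASK32 = 0xFFFFFFFF


def resolve_bit_test(op, imm, shift):
    """Branch decision for:
        mov r, x ; <op> r, imm ; shr r32, shift ; and r, 1 ; je/jne
    Returns 'je' or 'jne' when imm alone forces the tested bit, else None."""
    shift &= 31                              # a 32-bit shr masks its count to 5 bits
    bit = ((imm & MASK32) >> shift) & 1      # bit of imm that ends up in bit 0
    if op == "or" and bit == 1:
        return "jne"                         # result is 1 -> ZF = 0
    if op == "and" and bit == 0:
        return "je"                          # result is 0 -> ZF = 1
    return None                              # depends on the unknown source value


def brute_check(op, imm, shift, samples):
    """Cross-check by evaluating the sequence on concrete source values."""
    outcomes = set()
    for x in samples:
        r = (x | imm) if op == "or" else (x & imm)
        r = ((r & MASK32) >> (shift & 31)) & 1
        outcomes.add("jne" if r else "je")
    return outcomes
```

When `resolve_bit_test` returns None, the predicate really depends on runtime data and is not opaque; emulation or manual analysis is still needed in that case.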

After deobfuscation:

[Figure: the same code in IDA after deobfuscation]

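The upside of this approach is that IDAPython can patch the code directly, so after deobfuscation there is a good chance of reading the function in IDA's decompiler. It suits small plugins that strip simple obfuscation and recover functions so they can be decompiled. The downsides: it depends on IDA's analysis engine, so data and code that IDA misidentifies must be fixed by hand; it does not scale well to complex, mixed obfuscation, where writing the scripts takes a great deal of time; and in the end IDA's function recognition problems can still make the decompiled view hard to obtain, while the real instructions left after deobfuscation end up so far apart that they are harder, not easier, to read.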

Rule-Based Pattern Matching with Unicorn

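With complex, mixed obfuscation, IDA's analysis often gets in the way of deobfuscation. Instead, we can use Unicorn plus Capstone to emulate the obfuscated code and record every instruction that executes. After emulation, we run rule-based pattern matching over those instructions to identify the obfuscated blocks, and then drop them.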

Take the following obfuscated function as an example:

[Figure: the obfuscated function]

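Its obfuscation falls into the types listed earlier, so let's go straight to analyzing the function with Unicorn and Capstone. The usual approach is to emulate the function, print each instruction in the code hook, and then read the output. Let's emulate it once and look; here is the result: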

[Figure: emulation output, start and end of the trace]

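The trace runs to a full 6,000 lines, which is not practical to read directly, so we deobfuscate the collected instructions first and analyze them afterward.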
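Collecting those instructions is the job of the code hook. The sketch below is an assumption about what such a recorder could look like (the class name and the `unique` option are hypothetical, not from the original tool). It relies only on Unicorn's `UC_HOOK_CODE` callback signature `(uc, address, size, user_data)`, `Uc.mem_read`, and Capstone's `Cs.disasm`, and it receives the Unicorn and Capstone objects as parameters so the recording logic can be exercised without either library installed.

```python
class TraceRecorder:
    """Collects executed instructions as (address, mnemonic, op_str) tuples.

    hook_code follows Unicorn's UC_HOOK_CODE callback signature, so it can be
    registered with  mu.hook_add(UC_HOOK_CODE, recorder.hook_code).
    `md` is expected to be a Capstone Cs(CS_ARCH_X86, CS_MODE_64) instance,
    or any object with a compatible disasm(code, address) method.
    """

    def __init__(self, md, unique=True):
        self.md = md
        self.unique = unique    # keep only the first execution of each address
        self._seen = set()
        self.asmInfo = []       # execution-ordered trace

    def record(self, address, mnemonic, op_str):
        if self.unique and address in self._seen:
            return
        self._seen.add(address)
        self.asmInfo.append((address, mnemonic, op_str))

    def hook_code(self, uc, address, size, user_data):
        code = bytes(uc.mem_read(address, size))
        for insn in self.md.disasm(code, address):
            self.record(insn.address, insn.mnemonic, insn.op_str)
            break               # one instruction per hook invocation
```

Deduplicating by address keeps loops from bloating the trace; whether the original tool does this is not stated, so `unique=False` keeps every execution instead.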

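From the earlier findings we already know the signatures of these obfuscated blocks. And since this time we are looking at the instructions that actually executed, the junk bytes never show up, so there is no need for range-based matching to skip over them.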

As a result, the obfuscation appears in the trace as contiguous blocks, like this:

```
[0x140019b4d] e800000000 => call     0x140019b52
[0x140019b52] 4883042410 => add qword ptr [rsp], 0x10
[0x140019b57] eb01 => jmp 0x140019b5a
[0x140019b5a] 415b => pop r11
[0x140019b5c] 41ffd3 => call r11
[0x140019b62] 415b => pop r11
[0x140019b64] 7410 => je 0x140019b76
[0x140019b66] 750e => jne 0x140019b76
[0x140019b76] 4c89c9 => mov rcx, r9
[0x140019b79] 4881e19d6c9dcb => and rcx, 0xffffffffcb9d6c9d
[0x140019b80] c1e916 => shr ecx, 0x16
[0x140019b83] 4883e101 => and rcx, 1
[0x140019b87] 7409 => je 0x140019b92
```

Since the obfuscated instructions are contiguous, we can use rule-based pattern matching, like this:

```python
patterns = [
    {
        "name": "rule_1",
        "rules": [
            ("push", "r1"),
            ("call", "*"),
            ("pop", "r1"),
            ("pop", "r1"),
        ],
    },
    {
        "name": "rule_2",
        "rules": [
            ("call", "*"),
            ("add", "*, ??num"),  # matches any immediate
            ("ret", ""),
        ],
    },
    {
        "name": "rule_3",
        "rules": [
            ("movabs|movq", "r1, ??num"),
            ("movabs|movq", "r2, ??num"),
            ("or|and|xor|sub|add", "r1, r2"),
            ("je|jne", "*"),
        ],
    },
    {
        "name": "rule_4",
        "rules": [
            ("jmp", "*"),
            ("call", "*"),
            ("ret", ""),
        ],
    },
    {
        "name": "rule_5",
        "rules": [
            ("je|jne", "n1"),
            ("je|jne", "n1"),
        ],
    },
    {
        "name": "rule_6",
        "rules": [
            ("call", "*"),
            ("add", "*, ??num"),
            ("jmp", "*"),
            ("pop", "r1"),
            ("call", "r1"),
            ("pop", "r1"),
        ],
    },
    {
        "name": "rule_7",
        "rules": [
            ("pushfq", ""),
            ("call", "*"),
            ("add", "*, ??num"),
            ("ret", ""),
            ("popfq", ""),
        ],
    },
    {
        "name": "rule_8",
        "rules": [
            ("mov", "r1, r2"),
            ("or|and", "r1, ??num"),
            ("shr", "r?, *"),
            ("and", "r1, *"),
            ("je|jne", "*"),
        ],
    },
    {
        "name": "rule_9",
        "rules": [
            ("mov", "r1, r2"),
            ("or|and", "r1, ??num"),
            ("shr", "r?, *"),
            ("and", "r1, *"),
            ("je|jne", "*"),
            ("je|jne", "*"),
        ],
    },
    {
        "name": "rule_10",
        "rules": [
            ("cvtsi2sd", "*, r?"),
            ("movq", "r?, *"),
            ("movabs", "r?, *"),
            ("sub", "r?, r?"),
            ("jne", "*"),
        ],
    },
    {
        "name": "rule_11",
        "rules": [
            ("cvtsi2sd", "*, r?"),
            ("movq", "r?, *"),
            ("movabs", "r?, *"),
            ("sub", "r?, r?"),
            ("je", "*"),
            ("jmp", "*"),
        ],
    },
    {
        "name": "rule_12",
        "rules": [
            ("jmp", "*"),
        ],
    },
]
```
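The `match_patterns` function used below is not shown in the post. What follows is one possible implementation, a sketch under assumptions inferred from the rule syntax rather than the author's code: asmInfo is an execution-ordered list of `(address, mnemonic, op_str)` tuples; `a|b` in the mnemonic field lists alternatives; an operand field of `*` matches anything and `""` matches no operands; inside the operand list, `*` matches one operand, `??num` an immediate, `r?` any register, and `r1`, `r2`, `n1` bind on first use and must repeat exactly afterwards. Register detection is a crude lowercase-identifier check. Longer rules are tried first so that rule_9 is not shadowed by its prefix rule_8, and matches never overlap. The return shape, `{name: [(start_addr, [(addr, mnemonic, op_str), ...]), ...]}`, is what the filtering function below iterates over.

```python
import re


def _is_reg(tok):
    # crude check: a bare lowercase identifier such as rax, r9d, xmm0
    return re.fullmatch(r"[a-z][a-z0-9]*", tok) is not None


def _is_num(tok):
    try:
        int(tok, 0)          # accepts 0x10, 16, -0x61
        return True
    except ValueError:
        return False


def _match_operands(spec, op_str, env):
    if spec == "*":
        return True
    if spec == "":
        return op_str == ""
    specs, ops = spec.split(", "), op_str.split(", ")
    if len(specs) != len(ops):
        return False
    for s, o in zip(specs, ops):
        if s == "*":
            continue
        if s == "??num":
            if not _is_num(o):
                return False
        elif s == "r?":
            if not _is_reg(o):
                return False
        elif re.fullmatch(r"[rn]\d+", s):        # binding variable: r1, r2, n1 ...
            if s[0] == "r" and not _is_reg(o):
                return False
            if env.setdefault(s, o) != o:        # must equal its first binding
                return False
        elif s != o:
            return False
    return True


def _match_at(trace, i, rules):
    if i + len(rules) > len(trace):
        return None
    env = {}
    for (mnems, spec), (_, mnem, ops) in zip(rules, trace[i:i + len(rules)]):
        if mnem not in mnems.split("|") or not _match_operands(spec, ops, env):
            return None
    return trace[i:i + len(rules)]


def match_patterns(asmInfo, patterns):
    # longest rules first; sorted() is stable, so equal lengths keep list order
    ordered = sorted(patterns, key=lambda p: len(p["rules"]), reverse=True)
    results = {}
    i = 0
    while i < len(asmInfo):
        for pat in ordered:
            hit = _match_at(asmInfo, i, pat["rules"])
            if hit:
                results.setdefault(pat["name"], []).append((hit[0][0], hit))
                i += len(hit)                    # non-overlapping matches
                break
        else:
            i += 1
    return results
```

On the 13-instruction trace snippet above, this sketch assigns the first six instructions to rule_6, the je/jne pair to rule_5, and the last five to rule_8, so every instruction in the snippet is filtered out.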

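Combining the instructions obtained from Unicorn emulation and Capstone disassembly with these rules, pattern matching collects the information for every obfuscated block; filtering those addresses out then yields the deobfuscated assembly.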

Implementation outline:

  1. Emulate the code block with Unicorn

[Figure: Unicorn emulation setup]

  2. Record the executed instructions (asmInfo) in _hook_code

[Figure: the _hook_code callback]

  3. Run pattern matching over asmInfo
```python
def SolveObfuscation(asmInfo, patterns):
    # `patterns` is the rule list shown above (previously repeated inline here)
    results = match_patterns(asmInfo, patterns)  # pattern matching
    filterAddr = []
    # collect the address of every instruction in every block matched by any rule
    for pattern_name, matches in results.items():
        for start_addr, matched_inst in matches:
            for addr, op, args in matched_inst:
                filterAddr.append(addr)
    return filterAddr
```
  4. Finally, print the trace with the filtered addresses removed

[Figure: printing the filtered trace]
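The output code for step 4 is not reproduced either. A minimal sketch (`format_clean` is a hypothetical name) that drops every filtered address and formats the remaining instructions in the same `address => mnemonic,operands` form as the listing below:

```python
def format_clean(asmInfo, filterAddr):
    """Return trace lines whose address is not in filterAddr.

    asmInfo    -- execution-ordered list of (address, mnemonic, op_str)
    filterAddr -- addresses of instructions that belong to obfuscated blocks
    """
    drop = set(filterAddr)  # set lookup keeps this linear in the trace length
    return [f"{addr:#x} => {mnem},{ops}"
            for addr, mnem, ops in asmInfo
            if addr not in drop]
```

For instance, `print("\n".join(format_clean(asmInfo, filterAddr)))` prints the cleaned trace; an instruction without operands comes out as `ret,`, as in the listing.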

The deobfuscated result:

0x140016a8b => push,rbp
0x140016b09 => push,r14
0x140016b1c => push,rsi
0x140016b5c => fsubrp,st(5)
0x140016b76 => push,rdi
0x140016b85 => push,rbx
0x140016bcc => sub,rsp, 0x20
0x140016bed => rcr,al, 0x24
0x140016c76 => lea,rbp, [rsp + 0x20]
0x140016cf1 => ror,ecx, 1
0x140016d75 => test,r11, rcx
0x140016e69 => and,rsp, 0xfffffffffffffff0
0x140016ee9 => cmp,cx, 0x5f76
0x140016f23 => test,byte ptr [rdx + 4], 1
0x140016f40 => jne,0x140019636
0x140016f70 => je,0x140016f79
0x140016fb4 => mov,rsi, rdx
0x140017022 => punpckldq,mm1, mm1
0x140017066 => movabs,rax, 0x19c5d475aee585a5
0x1400171a5 => add,rax, qword ptr [rip + 0x24094]
0x1400172e9 => call,qword ptr [rip + 0x22721]
0x1400173da => je,0x1400173e8
0x140017499 => mov,r14, qword ptr [rsi + 8]
0x140017547 => je,0x140017551
0x140017656 => test,r14, r14
0x140017659 => je,0x140019636
0x1400176b0 => psubw,mm6, mm7
0x140017792 => mov,rdi, rax
0x140017822 => maxps,xmm5, xmm0
0x14001786a => movabs,rax, 0x66ffa2df2bcd95b2
0x1400178e7 => je,0x1400178f8
0x1400178f8 => je,0x140017908
0x140017940 => je,0x140017944
0x140017944 => movss,xmm3, xmm4
0x140017959 => je,0x140017964
0x14001797d => add,rax, qword ptr [rip + 0x238c4]
0x1400179a5 => mov,rcx, rdi
0x1400179ef => call,qword ptr [rip + 0x2201b]
0x140017a6c => fucomi,st(4)
0x140017aa9 => mov,rbx, rax
0x140017aea => cmp,ax, -0x61
0x140017b3f => call,0x140017b47
0x140017b47 => add,qword ptr [rsp], 9
0x140017b4c => ret,
0x140017bc1 => movabs,rax, 0x930c27c29917d3ce
0x140017c94 => add,rax, qword ptr [rip + 0x235b5]
0x140017d2c => mov,ecx, 0xb
0x140017d8d => call,qword ptr [rip + 0x21c7d]
0x14001deae => push,r15
0x14001df15 => je,0x14001df22
0x14001df22 => je,0x14001df2a
0x14001df2a => rol,dx, 0x37
0x14001df93 => push,r14
0x14001dfc3 => cmp,eax, edx
0x14001dfd6 => pcmpgtb,mm2, mm4
0x14001dff3 => push,r13
0x14001dfff => push,r12
0x14001e061 => push,rsi
0x14001e0d9 => rcl,al, 1
0x14001e0f8 => pabsw,mm4, mm5
0x14001e10d => push,rdi
0x14001e167 => push,rbx
0x14001e19f => call,0x14001e1a6
0x14001e1a6 => add,qword ptr [rsp], 9
0x14001e1ab => ret,
0x14001e1ad => sub,rsp, 0x20
0x14001e224 => fucomp,st(5)
0x14001e2ff => mpsadbw,xmm5, xmm4, 0x48
0x14001e40b => movabs,rbx, 0xe77c1aa196b0061b
0x14001e440 => add,rbx, qword ptr [rip + 0x1cce9]
0x14001e45a => movsxd,r14, ecx
0x14001e506 => call,0x14001e50b
0x14001e50b => add,qword ptr [rsp], 7
0x14001e510 => ret,
0x14001e512 => mov,rax, qword ptr [rbx + r14*8]
0x14001e669 => test,rax, rax
0x14001e68a => fcom,st(2)
0x14001e68c => je,0x14001e692
0x14001e6b2 => jne,0x14001f9e5
0x14001e723 => lea,rax, [r14*8]
0x14001e769 => mov,rcx, qword ptr [rip + 0x1c9c8]
0x14001e770 => je,0x14001e77f
0x14001e79f => add,rcx, rax
0x14001e7ce => movabs,rdx, 0x66b7936b1c05d2b6
0x14001e7d8 => mov,r15, qword ptr [rdx + rcx]
0x14001e7f7 => fxch,st(6)
0x14001e890 => add,rax, qword ptr [rip + 0x1c8a9]
0x14001e8b6 => movabs,rcx, 0x4ba92650902e11d8
0x14001e94c => mov,r12, qword ptr [rcx + rax]
0x14001ea03 => psllq,mm5, 0x88
0x14001ea35 => lea,rsi, [r12 + 1]
0x14001eadd => shr,dl, cl
0x14001eb17 => movabs,rax, 0x19c5d475aee585a5
0x14001eb85 => add,rax, qword ptr [rip + 0x1c744]
0x14001ec49 => psrld,mm3, 0xa2
0x14001ecdb => mov,rcx, rsi
0x14001ed08 => call,qword ptr [rip + 0x1ad02]
0x140001062 => call,0x140001069
0x140001069 => add,qword ptr [rsp], 9
0x14000106e => ret,
0x140001070 => push,rbp
0x140001071 => paddd,mm5, mm1
0x1400010f1 => sub,rsp, 0x20
0x140001199 => lea,rbp, [rsp + 0x20]
0x140001362 => and,rsp, 0xfffffffffffffff0
0x1400013d1 => mov,rdx, rcx
0x1400013df => psrlq,xmm1, xmm2
0x140001430 => movabs,rax, 0x19c5d475aee585a5
0x1400014c5 => add,rax, qword ptr [rip + 0x39c84]
0x1400014e2 => rdfsbase,r10
0x14000153c => xor,ecx, ecx
0x14000153e => je,0x140001547
0x140001575 => je,0x140001582
0x140001582 => je,0x14000158a
0x14000158a => mov,r8d, 0x4d454d45
0x1400015d8 => shr,r9, 0x75
0x14000160f => paddsb,mm7, mm2
0x14000164e => call,qword ptr [rip + 0x383bc]
0x140001741 => punpckhwd,mm6, mm4
0x140001753 => je,0x140001759
0x140001789 => pmaxub,xmm5, xmm2
0x1400017ac => mov,rsp, rbp
0x140001810 => pop,rbp
0x14000186a => ret,
0x14001ed48 => mov,rdi, rax
0x14001edd2 => psraw,mm2, mm1
0x14001ee06 => je,0x14001ee13
0x14001ee42 => movabs,rax, 0x66ffa2df2bcd95b2
0x14001ee65 => add,rax, qword ptr [rip + 0x1c46c]
0x14001eef7 => xor,ecx, 0x1ff0a0c9
0x14001ef45 => xor,r13d, r13d
0x14001efa1 => fstp,st(7)
0x14001f03e => mov,rcx, rdi
0x14001f05d => je,0x14001f06b
0x14001f083 => movq2dq,xmm1, mm3
0x14001f0cc => xor,edx, edx
0x14001f1d0 => mov,r8, rsi
0x14001f296 => call,qword ptr [rip + 0x1a774]
0x140037b40 => mov,rax, rcx
0x140037b43 => movzx,edx, dl
0x140037b46 => movabs,r9, 0x101010101010101
0x140037b50 => imul,rdx, r9
0x140037b54 => movq,xmm0, rdx
0x140037b59 => movlhps,xmm0, xmm0
0x140037b5c => cmp,r8, 0x40
0x140037b60 => jb,0x140037bd0
0x140037bd0 => cmp,r8, 0x10
0x140037bd4 => jb,0x140037c00
0x140037c00 => cmp,r8, 4
0x140037c04 => jb,0x140037c30
0x140037c06 => lea,r9, [r8 + rcx - 4]
0x140037c0b => and,r8, 8
0x140037c0f => mov,dword ptr [rcx], edx
0x140037c11 => shr,r8, 1
0x140037c14 => mov,dword ptr [r9], edx
0x140037c17 => mov,dword ptr [rcx + r8], edx
0x140037c1b => neg,r8
0x140037c1e => mov,dword ptr [r9 + r8], edx
0x140037c22 => ret,
0x14001f31c => mov,rax, rdi
0x14001f34d => movabs,rcx, 0xd82c9692ecdecfbb
0x14001f357 => nop,word ptr [rax + rax]
0x14001f3de => mov,rdx, qword ptr [rip + 0x1bd63]
0x14001f3f2 => add,rdx, r14
0x14001f438 => aesdec,xmm3, xmm0
0x14001f452 => movzx,edx, byte ptr [rcx + rdx]
0x14001f45b => xor,dl, byte ptr [r15 + r13]
0x14001f48d => mov,r8d, edx
0x14001f4b4 => xor,r8b, 0x58
0x14001f4c4 => orpd,xmm3, xmm4
0x14001f4d7 => paddd,xmm2, xmm2
0x14001f4df => xor,dl, 0xa7
0x14001f50a => mov,r9d, r13d
0x14001f546 => xor,r9b, 0x4b
0x14001f54a => and,r9b, dl
0x14001f56a => mov,edx, r13d
0x14001f56d => movups,xmm0, xmm2
0x14001f59a => xor,dl, 0xb4
0x14001f5a9 => and,dl, r8b
0x14001f638 => xor,dl, r9b
0x14001f703 => xor,dl, 0x13
0x14001f76c => mov,byte ptr [rax + r13], dl
0x14001f84d => inc,r13
0x14001f8c9 => cmp,r12, r13
0x14001f90b => jne,0x14001f360
0x14001f990 => test,dx, dx
0x14001f9e1 => mov,qword ptr [rbx + r14*8], rax
0x14001fa7e => sqrtss,xmm2, xmm5
0x14001fa82 => je,0x14001fa8c
0x14001fa8c => add,rsp, 0x20
0x14001faee => pop,rbx
0x14001fb9d => pop,rdi
0x14001fc01 => pop,rsi
0x14001fcb0 => pop,r12
0x14001fd5d => pop,r13
0x14001fdd2 => pop,r14
0x14001fdf8 => pop,r15
0x14001fe58 => ret,
0x140017e05 => movabs,r9, 0x55c1b67ba997c287
0x140017ec3 => add,r9, qword ptr [rip + 0x2338e]
0x140017eea => pminub,mm6, mm5
0x140017f34 => mov,r8d, 0xd
0x140017f69 => nop,dx
0x140017fcd => mov,rcx, rbx
0x14001804d => mov,rdx, rax
0x140018085 => psrld,xmm3, 1
0x140018110 => mov,rax, r9
0x140018171 => punpcklwd,mm7, mm6
0x1400181ce => call,qword ptr [rip + 0x2183c]
0x14001825f => test,eax, eax
0x140018285 => jne,0x140019636
0x14001828b => je,0x14001829b
0x1400182e3 => cmp,r14, rdi
0x14001830c => jne,0x140019636
0x140019692 => xor,eax, eax
0x140019712 => mov,rsp, rbp
0x1400197eb => pop,rbx
0x140019858 => fcmovne,st(0), st(0)
0x1400198e2 => pop,rdi
0x140019962 => pop,rsi
0x1400199f1 => pop,r14
0x140019a7d => fcompi,st(2)
0x140019aca => pop,rbp
0x140019b49 => psignb,mm5, mm5
0x140019b92 => ret,

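Apart from a few leftover SIMD instructions and control-flow instructions (meaningless in an emulated trace) that still have to be removed by hand, the function can now be analyzed directly. Its overall structure is very clear, which suggests the deobfuscation is quite accurate.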

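The upside of this approach is extensibility: a new obfuscation only needs new rules. The downsides are that Unicorn handles imported functions and some AVX and SIMD instructions poorly, so these must be dealt with or bypassed by hand, and the result still has to be analyzed as assembly.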

Closing remarks:

[Figure]

References

Ollvm混淆与反混淆 (Ollvm Obfuscation and Deobfuscation)