Decrypting Mirai configuration With radare2 (Part 1)
This is the second part of the three-part series about code Emulation for Reversing Malware :
Part 1 describes how to use radare2 function emulation along with an exercise of cracking password of function implemented using radare2 python scripting plugin r2pipe.
Part 2 describes how to use the feature to decode a configuration of a Mirai IOT botnet, by implementing the solution in radare python scripting capabilities.
Part 3 improves the script created in the previous by adding more features of searching for addresses of encrypted string and creating function signature to search for decryption function instead of using the hard-coded address of the function.
In the previous post we looked at how to we can use partial code emulation to decrypt a string in a binary. In this post we will take an example of a popular Linux IOT malware Mirai, the reason for choosing this particular malware is it stores its configuration like CNC server, port etc in encrypted form. Mirai botnet is cross-architecture so for this post we will reverse the x86 architecture version of the binary. The main goal of this post is to automate the configuration decryption using radare2. We will also use radare2 for static analysis of the binary and to reverse a little bit of decryption function.
To those who have not of the heard about Mirai before, it’s IOT botnet which infects the networked Linux devices once it has installed itself it then uses the infected device as DOS attacking medium. This malware can be cross-compiled across different architecture such as x64, SPARC, MIPS, ARM etc. Another dangerous aspect of this Malware is that the author has released the source code due to which there are lots script-kiddies using this malware either as its or they are developing a variant of it. This another reason if felt the need to create the script to automation the decryption of configuration. You can find the source of the malware on this GitHub repository.
#Malware File
I am using following files for this analysis, Surprisingly this sample was using the same source code which I pointed out earlier. But for the rest of the post, we will analyze the binary as if didn’t have the source code.
File SHA 1 hash | File Name |
---|---|
318998d8ad1e5a9be588ecccf9713c42788c9972 | mirai-telnet.x86 |
a7c901114b6c767562c32d30c9f6a076a380a767 | mirai-ssh.x86 |
#Find the encryption function
Malware author usually uses a very simple operation like XOR, bit shifting(SHR or SHL) or bit rotating(ROR or ROL) to encrypt strings. The algorithm is usually a combination of this operations. So to search for encryption function we can search for these instructions, radare2 has opcode search functionality you can see all the available options with /a? command.
For our particular case /at xor will find all the address at which xor instruction is present, this command search a particular family of instruction like and, or, push, etc are the other family of instruction. To get the list of all family of instruction type /atl command.
If you search for xor instruction you will find 226 hits and there were lots of instruction in which register as XORed with itself which is very common for normal code, this instruction is used to set the register to 0, we are looking for xor instruction whose source and destination operand is different. We can also do the similar search for shr and shl bit shift operation and ror and rol bit rotate operation. With lots of searches, there was a function starting at 0x804d680 which had a high concentration of these kinds of operation. Let reverse this function.
#Reversing the decryption function
Let’s try to understand the decryption function a bit about the function how it gets the input parameters, as a high-level understanding is required to understand the parameters which the function is manipulating the internally.
Open the visual mode with VV command, this will the graph of the decryption function and you can toggle through different graph view by pressing p. We will use this view to understand the detail of the decryption function.
This is a general pattern of string decryption functions that you can see here, first there is some setup of the string pointer, then the register which will be loaded with the decryption key, and then there will be the loop which will do the decryption of the string, there was also a crackme which I came across which first decrypted the decryption key and then the data was decrypted with that key, just another layer of protection but the technique remains the same. The decryption key may be passed through the function parameter or may be stored in the function local variable or in the global variable.
Before we look into reversing the function lets try to understand radare visual mode which we are using, in particular, the graph view we are currently in. This view represents the assembly code in a graph view (like IDA) but the code is collapsed instead only this connection between different node is shown either with a red/blue/green line, each of this color has a meaning. Blue color means this jump is unconditional, green means the jump is taken if the condition is true and red means the jump is taken if the condition is false. Each node is shown with a hex value of the address of the starting of the code of that particular node, the entire address is not shown, just a small part of it is displayed. There is also some alphabet shown in the square bracket in yellow color, that is the keys you have to press to read the code of that node. So in the screen above you can press gc to see the code for d6a0 node. If you do so that node will be shown as <@@@@@@>, this represents the current node whose code is displayed on the left side. This view gives us the high-level flow of the code, just by looking at this view you can see the loops and if else statement of the function. You can use arrow keys to move the graph around on the screen. Now that we know a little bit about this functionality lets use it to analyze the decryption function.
Looking at the graph we can see the pattern we discussed before, first the initialization in the first two nodes on the left side and then an unconditional jump to the loop function. There is also a conditional jump on the root node if the condition is true then the jump is taken at the end of the function, this must be some sort of validation of the parameter, if the validation is true then function cannot run the decryption and if the validation is false then the function does some additional initialization and then jumps to loop (which might be the real meat).
In this function you can see the initialization code in the left side, ebp, edi, esi and ebx registers are pushed on to stack and the value 0x1c is subtracted from esp, we see two things happening here, the registers are push on to stack to preserve their old value which we expect restored to old values by popping them back to the old values at the end of the function. Secondly, the subtracting of 0x1c value from the stack is done to allocate local variable in the function. we can verify this by switching to the node with [ga] you will see the assembly code of exit node as below.
1 | [0x804d66a] |
The values of the registers are popped back and the esp is torn down by adding 0x1c value. Next xor instruction sets the value of eax to zero and next the register al is set to the value of the first argument addressed by esp + 0x30 strange the function parameter is addressed using esp register rather then ebp register, this is the only parameter which is passed to the function according to the radare, we expected to pass as two parameters, string to obfuscate and the decryption key, but instead there is only one parameter which is of byte size integer which means that the value of parameter ranges from 0 to 31, the combination of previous two instruction do something really interesting, xor set the entire eax to zero and next set the lower 8-bit value to argument passed to the function. Next inst is lea ecx, [eax*8 + 0x8052800], lea instruction is usually used to calculate address, this instruction is using base + index*size format of addressing, eax register is the index which is passed as parameter to the function and 0x8052800 is the base address and the size is 8 byte, this instruction reveal a lot about the configuration of the bot. First of all, the configuration is an array of a data structure of size 8 byte ( maybe the structure has two members of size 4 byte) the base address of the structure and the indexing is pretty clear.
Next inst mov eax, dword [0x80526fc] loads the value at address 0x80526fc into EAX this instruction is not clear for now, next few instruction might help use. Next instruction is comparison the cmp word [ecx + 4], 0, ecx was loaded with the address of data structure and ecx + 4 is the second member of structure which is of size word (2 bytes) and is compared with zero and if it’s zero then jumps to exit node of the function. Next, let’s look at the false branch [gc] of the comparison instruction.
1 | 0x804d620 [gc] |
On this node copies of eax register is made onto other registers edi, esi and ebp register and later bit shift right operation of 8, 0x18 and 0x10 are performed respectively. EAX register as we can understand so far here is that another data in been used in this function which is not clear what it is. Lower 8 bits of EAX register(AL) are copied on the top of the stack. Moving on to the next node [gd].
On this node there are lots of xor instruction, this is probably the real meat of the function as we can see the last instruction which is a comparison instruction who’s true branch go to the starting of this same node and the false branch exist the function. The ecx register is not changed so far in the register which to recall is loaded with the base address of the struct of 8 bytes. The first instruction this node moves the value of pointing by ecx into ebx, this first member of the struct seems to be a pointer, maybe a char pointer (char*). Next the copy of the ebx register is made into eax, next eax and edx (which is 0 at this point) is added to eax and result is stored in eax and ebx is loaded with ebp which is the copy of data at address 0x80526fc(may be a key) and then xor byte [eax], bl instruction value at address eax is xored with bl register and value is stored back at the address pointed by eax register. So far we can make a safe assumption that value at address 0x80526fc is the key which is used for decryption. Rest of the instruction in this node are all based on register, no other other external data is loaded.
We should now look for the termination condition of this loop, in the end, the node eax register is compared with edx register. EDX is not changed to any other value before except for increase itself by 1, edx register was earlier set to zero we can say that edx is the counter of the loop. The counter is compared with eax register which is set to value pointed by ecx + 4 if you recall ecx register is not altered to any other value is still holds the base address of the data structure, eax register is set to the second member of the struct and then it is compared with 0xffff which means the upper 16 bits of the eax register is set to 0 which is another hint the 2nd member of struct is 2 byte, which we can conclude that 2nd member is the length of the string we want to decrypt which of 2 bytes. Lets not reversed this function any further and let us see if we can get any meaningful results out of this to understand so far if we fail we come back at this function and reverse it further.
#Emulating the decryption function
Now that we have a fair bit of understanding about the decryption function, let’s try to emulate for one string and then automate the decryption for any given string. There are a couple of conclusions that can be drawn from the reversing session before which are :
- We expected to get two parameters to the function decryption key and the data to decryption which is not the case.
- The parameter pass is in the index of the configuration the bot want to decrypt which then loads the base address of the data structure into ecx register which remains unchanged through the rest of the function.
- The decryption key is part of the function which is read from the global variable.
- Function manipulates the data structure which has two members, first member, being the char pointer to the encrypted string and 2nd member is 16-bit int member which is the length of the encrypted string.
To execute the function we need to do the following :
- Allocate a memory and copy the encrypted string to that memory.
- Create the config data structure and set the first member of the address of the encrypted string which we just copied on the stack and the 2nd member is set to the appropriate length.
- Start the execution of the emulator from the first instruction of the function and then halt the execution at the point where the ecx is loaded with the base address of the data structure and instead set the ecx register with the address of the config data structure which we created earlier.
- Then continue the execution of the function until the last instruction of the function and then print the decrypted string as a null-terminated string.
To get all the string in the binary we can use ps @@ str.* command. Since the string is encrypted, they might not be null terminated as during encryption the null value can be in the middle of the legitimate string and radare might fail to identify string and its proper length during auto-analysis. To mitigate this we will consider all the string of length 100 even if radare marks its less then 100 and copy the 100 bytes in the memory and then decrypt the string, but when we print the string we will print the string as a null-terminated string which can be less then 100 bytes. The string could be longer than 100 bytes, in that case, we will come back to the script and change length. For example, for we can consider anyone strings for emulation, I have considered string starting at address 0x08050ecd.
Below are the radare commands to execute the emulation :
- s 0x804d680: Seek to the starting of the function.
- y 100 @ 0x08050ecd: Copy 100 bytes of the string starting at address 0x08050ecd in radare clipboard.
- aei ; aeim ; aeip: Initialization of emulation engine, create the memory and set the register values. Memory created starts at address 0x100000.
- yy @ 0x100000: This command will paste the copied data start at the address 0x100000, this is the address of the string we are going to use for emulation.
- wx 0x00001000 @ 0x100064: This command creates the first member of the config data structure with the address of encrypted data. The base address of the config data structure is 0x100064. The value 0x00001000 is 4 bytes for a little-endian format for 0x100000 which is the address of the encrypted string we just copied.
- wv 100 @ 0x100068: This command sets the 2nd member of the data structure with the length of encrypted string which is the constant value 100 for our experiment, at this point, we have created the config data structure.
- aecu 0x804d694: This command starts the execution of the function till the address 0x804d694, this is the point where we want to change to value of ecx register to the address of data structure.
- ar ecx=0x100064: Change the value of ecx register with the address of the data structure we have created above.
- aecu 0x0804d6f0: This command continues the execution until the end of the function. At this point, the string is decrypted and we should be able to print a null-terminated string.
- ps @ 0x100000 : This command print the null-terminated string starting at address 0x100000.
Running this command you should be able to see the output shown below. You can even create the above commands as a macro and pass the address of the string as the parameter to the macro to see the output, copy paste the below code to create the macro.
1 | (mirai_decrypt flg, s 0x804d680, y 100 @ $0, aei, aeim, aeip, yy @ 0x100000, wx 0x00001000 @ 0x100064, wv 100 @ 0x100068, aecu 0x0804d694, ar ecx=0x100064, aecu 0x0804d6f0, ps @ 0x100000) |
The above command will create the macro named mirai_decrypt to use the macro issue the command .(mirai_decrypte
1 | import r2pipe |
This script should have the same effect as above macro you should be able to see the decrypted string printed. Next step should be to enumerate the address of all the strings and pass those address as the parameter to the emu_decrypt function. You can get the list of all strings by using ps @@ str.* command, we will use this command to get the address of all string in python code which is as follows.
1 | for str_obj in r.cmd('psj 1 @@ str.*').split('\n'): |
Once you run the script you will the output as below. As you can see there are lots of meaningful decrypted strings and I am sure that’s not all the configuration, as there are other garbage hex value been printed. Maybe radare is not able to find all the encrypted strings as they might look completely like normal binary hex value after encryption so radare doesn’t flag them as strings, we need another way to figure out the reference to this bot configuration strings.
The next problem we have to tackle the problem of finding all the address of the encrypted string, once that is done we can the python function we created above to decrypt the string. Finding those addresses was not that simple as I thought, nonetheless it helps me to discover other capabilities of radare which put to use to tackle the challenge.
The output of the decryption python script is as follow.
1 | I@MVJLLVJMVJIOxrxxxxyxzantari.duckdns.org |
#Conclusion
I think to a certain extent we have able to accomplish the goal, we were able to decrypt the bot configuration as you would have observed some domain names and other meaningful string. But the goal is not completely achieved we will address the problem of finding the reference to the encrypted string in the next post, as it requires addition reversing of the binary. We also saw how partial function emulation can be helpful and save our time on not writing the decryption function and with a little bit of understanding of encryption function you can emulate the code and get the benefits of decryption. In the next post, we will improve the above script to find all the configuration references and decrypt it.
This is GitHub link of the above python script we created, it has the full solution implemented. The sample files mentioned above are available on request to the researchers, please send me an email from your work email id to confirm your identity. Thank you for reading this post.