Hacking strace for System Call Instrumentation
I have always been fascinated by projects like Unicorn Engine and Capstone Engine, whose authors repurposed existing software development tools, such as QEMU and LLVM, for software security. While reverse engineering a binary from a MIPS WiFi router, I needed to intercept and manipulate the Syscall data exchanged between the router application and a kernel driver. I couldn’t find any open-source tool that could do that for the MIPS architecture. Then it struck me that this was the perfect opportunity to do something like the Unicorn project.
I was already using strace to monitor the syscalls made by the router binary. However, I could not modify Syscall parameter values or write a parser for the IOCTLs of the custom drivers. strace already traces Syscalls well, parses known Syscall parameters and return values, and is open source. So I thought: what if I could patch it? What if I could get a callback before and after each Syscall? This callback should allow me to trace and modify Syscall parameters and return values.
That being said, there were many other reasons for choosing this tool, which are as follows:
- strace has a huge amount of code that parses the syscall parameters and various IOCTLs. I wanted to reuse much of my team’s hard work.
- It has been around for a long time and supports many architectures, and even many embedded Linux build systems such as Yocto and Buildroot can build strace.
- Another key feature of strace is its ability to trace multi-threaded programs and forked processes.
- If I did this myself using ptrace, it would take a long time to develop, debug, and test, and it would be a lot of code to maintain. I didn’t want to recreate many things that strace already provides. I really didn’t have the patience for that.
- Finally, I only wanted to write code to advance my Reverse Engineering project. Call me selfish/lazy, but I would like to call it “not losing focus.”
#System Requirements
Before we jump into the strace source code or any sort of coding, let us define what we are trying to achieve here. Below I have listed some of the requirements of the system I initially had in mind:
- I did not want to make too many changes to the strace codebase. I wanted to create some sort of plugin architecture. This would make the project more portable across strace versions.
- The plugin will have the core logic of tracing or manipulating Syscall data, and the plugin will be compiled as a shared library.
- strace will take the plugin path as a command-line option, load the plugin, and invoke its callback functions before and after each Syscall.
- The plugin defines the following interfaces/functions:
  - Setup Callback: This function is invoked once the plugin library is loaded by strace. To export the Syscall data or to accept commands from another process, you have to initialize resources such as IPC, files, or network sockets, and this lifecycle function gives you the opportunity to initialize them.
  - Teardown Callback: This function is invoked before the plugin library is unloaded. It complements the previous function, and you are supposed to release all the resources you created in the setup phase.
  - Syscall Entry Callback: This function is invoked before every syscall. This callback is an opportunity to identify the Syscall number and its parameter types and take appropriate action.
  - Syscall Exit Callback: This function is invoked once the Syscall returns. This is where you can see the result of the syscall execution, either in a register (which one is defined by the system calling convention) or in memory pointed to by one of the syscall parameters.
#Brief introduction to ptrace syscalls
If you already know how the ptrace API works, this section can serve as a quick refresher, or you can just skip it.
The ptrace syscall subsystem provides a way for one process (the “debugger”) to observe and control the execution of another process (the “debuggee” or “tracee”). You can write your own debugger using this API; even mainstream debuggers such as GDB and LLDB use it internally.
The ptrace syscall is used in an event-loop pattern. You launch a new program (the “tracee”) under ptrace, and the kernel gives the “debugger” process control over the “tracee” process. When setting up the tracee, we specify the types of events the debugger process should be notified about. This is done with the PTRACE_SETOPTIONS request; the list of supported events can be found on the ptrace man page.
The debugger event loop blocks in a wait system call, waiting on the tracee process. The wait4 syscall returns once there is an event to process, and the debugger then makes additional ptrace calls to gather more information about the event.
This is a very high-level view of how the ptrace syscall works. We now have to find this pattern in the strace codebase.
To find the syscall entry/exit points, we need to find the function that processes these events.
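The event loop described above can be sketched in a few dozen lines. This is only a minimal illustration, not how strace itself is written: the function name `count_syscall_stops` is made up for this example, and it handles a single tracee with no forks or threads. It uses PTRACE_O_TRACESYSGOOD so that syscall stops can be told apart from ordinary signal stops.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] under ptrace and count syscall stops
 * (entries plus exits). Returns the count, or -1 on error. */
long count_syscall_stops(char *const argv[])
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Tracee: ask to be traced, then stop until the debugger is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Debugger: wait for the initial SIGSTOP of the tracee. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Report syscall stops as SIGTRAP|0x80 instead of plain SIGTRAP. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }

    long stops = 0;
    int sig = 0; /* signal to inject when resuming, 0 for none */

    for (;;) {
        /* Resume until the next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) < 0) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);
            break;
        }
        if (waitpid(child, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break; /* tracee is gone */
        if (!WIFSTOPPED(status))
            continue;

        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;                 /* a syscall entry or exit stop */
        else if (WSTOPSIG(status) != SIGTRAP)
            sig = WSTOPSIG(status);  /* forward a real signal to the tracee */
    }
    return stops;
}
```

Calling it with something like `{"true", NULL}` should return a positive number, since even a trivial program makes a handful of syscalls. The place where `stops++` happens is the spot a tracer would call its entry/exit hooks.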
#Source Code Exploration
Let’s get our hands dirty. You can download the code from the official strace repository.
We have already defined the goal in the previous section. To achieve it, we need to do the following:
- Add a command-line argument to strace that takes the library path, load the library from that path, and find the interface functions mentioned previously in the requirements section.
- Identify the point in the strace code that processes Syscall enter and exit events. It is at this point that we will invoke the syscall entry/exit callback functions from the plugin library.
- Figure out how to trace multiple threads and processes.
- Find the points at which we will call the resource setup & teardown callbacks.
The first objective is pretty easy. We have to search for the code that does command-line parsing and add one more option to it; it should be somewhere near the main function. Below is the code section where my search landed me. I have also added comments on what needs to be done.
// src/strace.c
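As a rough idea of what adding one more option looks like, here is a standalone sketch using getopt_long. It is not strace's actual option table: the `--plugin` option name and the `parse_plugin_option` function are placeholders invented for this example. The leading `+` in the option string makes parsing stop at the first non-option, so the flags of the traced command are left alone.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper: return the argument of "--plugin <path>",
 * or NULL if the option was not given. */
const char *parse_plugin_option(int argc, char *const argv[])
{
    static const struct option longopts[] = {
        /* "plugin" is a made-up option name for this sketch. */
        { "plugin", required_argument, NULL, 'p' },
        { NULL, 0, NULL, 0 }
    };
    const char *plugin_path = NULL;
    int c;

    optind = 1; /* restart scanning so the function can be called again */
    while ((c = getopt_long(argc, argv, "+", longopts, NULL)) != -1) {
        if (c == 'p')
            plugin_path = optarg; /* points into argv, no copy is made */
    }
    return plugin_path;
}
```

For an argument vector such as `strace --plugin ./hook.so ls -l`, this returns `./hook.so` and never looks at `-l`, because scanning stops at `ls`.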
Now on to the main part: the code that processes the ptrace event loop. We can start by searching for ptrace event constants such as PTRACE_EVENT_EXEC, PTRACE_EVENT_FORK, etc. (“PTRACE_EVENT_*”). These constants should be used somewhere near the wait syscall, which is the Syscall that brings these events to the debugger process. Or you could directly search for the wait syscall. Both of these code snippets will be near each other. By taking either path you will reach the next_event function.
// src/strace.c
By quickly skimming through the function code, I could conclude that it parses the tracee process's debug event information into a tcb_wait_data struct, which is also the return type of the function. The return value is then passed as a parameter to the dispatch_event function, whose name suggests that it processes events, most likely debugging events. So, I started to investigate the dispatch_event function.
While reading the function code I came across a switch block with one of its cases labelled TE_SYSCALL_STOP, which looked interesting to me. Investigating that case further, I reached the trace_syscall function, which is where I concluded my search for Syscall enter and exit. It processes both Syscall entry and exit events; it also decodes the Syscall parameters and stores them in struct tcb.
The code for that function is shown below; it looks like the right point to invoke our Syscall callbacks.
// src/strace.c
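To make the idea concrete, here is a toy model of a function that handles both entry and exit stops and calls a hook at each. Everything in it is a stand-in: `demo_tcb`, `DEMO_INSYSCALL`, and `demo_trace_syscall` are invented names that only mimic the shape of the real code, under the assumption that a per-tracee flag records whether the tracee is currently inside a syscall.

```c
#include <stddef.h>

/* Stand-in for the "tracee is inside a syscall" flag. */
#define DEMO_INSYSCALL 0x1u

/* Simplified stand-in for strace's per-tracee control block. */
struct demo_tcb {
    int pid;
    unsigned int flags;
    unsigned long scno;
    unsigned long u_arg[6];
    long u_rval;
};

typedef void (*demo_hook_fn)(struct demo_tcb *tcp);

struct demo_hooks {
    demo_hook_fn on_entry; /* called before the kernel runs the syscall */
    demo_hook_fn on_exit;  /* called after the syscall has returned */
};

/* Called once per syscall stop; the flag tells entry from exit. */
void demo_trace_syscall(struct demo_tcb *tcp, const struct demo_hooks *hooks)
{
    if (!(tcp->flags & DEMO_INSYSCALL)) {
        if (hooks->on_entry)
            hooks->on_entry(tcp);
        tcp->flags |= DEMO_INSYSCALL;
    } else {
        if (hooks->on_exit)
            hooks->on_exit(tcp);
        tcp->flags &= ~DEMO_INSYSCALL;
    }
}

/* Trivial hooks that just count how often they were called. */
int demo_entries, demo_exits;
void demo_count_entry(struct demo_tcb *tcp) { (void)tcp; demo_entries++; }
void demo_count_exit(struct demo_tcb *tcp) { (void)tcp; demo_exits++; }
```

Two consecutive calls on the same `demo_tcb` fire the entry hook first and the exit hook second, which is the pairing the plugin callbacks rely on.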
The cherry on top is that strace makes our job easier, as you can see if you read struct tcb. Many of the fields suggest that strace already extracts the Syscall value and the parameters into proper data types. This will make our patch work on any architecture that strace supports.
If it weren’t for this structure, I would have had to issue ptrace calls myself to read/write register and memory values, which would make the code architecture-dependent and would require a lot more effort.
The pid field in struct tcb identifies the thread making the Syscall. This field will fulfil our requirement to trace multiple threads/processes. The tcb struct is huge and not all the fields are relevant to our requirement, so in the code block below I have highlighted only the fields that are useful.
// src/defs.h
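Since the pid is what tells threads and processes apart, a plugin that wants to pair an exit with its entry needs some per-pid state. The following is one possible sketch, assuming a small fixed-size table is enough; the names `thread_state`, `state_for_pid`, and `forget_pid` are made up for this example.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_TRACKED 64

/* Hypothetical per-thread record kept by the plugin. */
struct thread_state {
    pid_t pid;
    unsigned long pending_scno; /* syscall seen at entry, awaiting exit */
    int in_use;
};

static struct thread_state tracked[MAX_TRACKED];

/* Find the record for pid, creating it on first use.
 * Returns NULL when the table is full. */
struct thread_state *state_for_pid(pid_t pid)
{
    struct thread_state *free_slot = NULL;

    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].in_use && tracked[i].pid == pid)
            return &tracked[i];
        if (!tracked[i].in_use && free_slot == NULL)
            free_slot = &tracked[i];
    }
    if (free_slot != NULL) {
        free_slot->in_use = 1;
        free_slot->pid = pid;
        free_slot->pending_scno = 0;
    }
    return free_slot;
}

/* Release the record when the thread exits. */
void forget_pid(pid_t pid)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].in_use && tracked[i].pid == pid)
            tracked[i].in_use = 0;
    }
}
```

The entry callback would store the syscall number in `pending_scno`, and the exit callback for the same pid would read it back.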
Next, the resource setup & teardown callback functions. We can call the resource setup function after the command-line parsing is done, and the resource teardown function can be called just before the main function returns. Loading the shared library should be a simple call to dlopen, and the callback function pointers can be found using dlsym.
OK, fellas, that is all we need. Next, we will define the plugin interface.
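A loader along these lines could look like the sketch below. The symbol names `plugin_setup` and `plugin_teardown` and the `struct plugin` wrapper are assumptions made for this example, not names taken from strace. On older glibc versions you may need to link with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

typedef int (*plugin_setup_fn)(void);
typedef void (*plugin_teardown_fn)(void);

/* Hypothetical handle for a loaded plugin. */
struct plugin {
    void *handle;
    plugin_setup_fn setup;
    plugin_teardown_fn teardown;
};

/* Load the shared library, resolve its lifecycle callbacks, and run setup.
 * Returns the result of setup, or -1 if loading or symbol lookup fails. */
int plugin_load(struct plugin *p, const char *path)
{
    memset(p, 0, sizeof(*p));

    p->handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (p->handle == NULL) {
        fprintf(stderr, "plugin: %s\n", dlerror());
        return -1;
    }

    /* The symbol names below are placeholders chosen for this sketch. */
    p->setup = (plugin_setup_fn)dlsym(p->handle, "plugin_setup");
    p->teardown = (plugin_teardown_fn)dlsym(p->handle, "plugin_teardown");
    if (p->setup == NULL || p->teardown == NULL) {
        fprintf(stderr, "plugin: missing lifecycle callbacks\n");
        dlclose(p->handle);
        memset(p, 0, sizeof(*p));
        return -1;
    }
    return p->setup();
}

/* Run teardown and unload; safe to call on a plugin that failed to load. */
void plugin_unload(struct plugin *p)
{
    if (p->handle == NULL)
        return;
    p->teardown();
    dlclose(p->handle);
    memset(p, 0, sizeof(*p));
}
```

`plugin_load` would be called right after command-line parsing and `plugin_unload` just before main returns, matching the lifecycle described above.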
#Plugin Interface
The requirements were listed at the start of this post. Below is the code that materialized those ideas.
// Function prototype for Syscall entry/exit callback
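For illustration, here is what a small plugin built against such an interface might look like. All names are hypothetical: `struct syscall_info` is a simplified view that the patched tracer would have to fill from struct tcb, and the four `plugin_*` functions mirror the four callbacks from the requirements. The entry hook clamps the third argument of one watched syscall to show where manipulation would happen; the tracer would still have to write the changed value back into the tracee.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical, simplified view of one syscall handed to the plugin. */
struct syscall_info {
    pid_t pid;
    unsigned long scno;
    unsigned long args[6];
    long rval;
};

#define PLUGIN_MAX_LEN 16UL

/* Syscall number to tamper with; no syscall is watched by default. */
unsigned long plugin_watched_scno = (unsigned long)-1;

static FILE *log_file;
static unsigned long seen;

/* Setup callback: acquire resources (here, just a log stream). */
int plugin_setup(void)
{
    log_file = stderr;
    seen = 0;
    return 0;
}

/* Teardown callback: report and release what setup acquired. */
void plugin_teardown(void)
{
    fprintf(log_file, "plugin: %lu syscalls seen\n", seen);
    log_file = NULL;
}

/* Entry callback: inspect, and optionally rewrite, the arguments. */
void plugin_syscall_entry(struct syscall_info *sc)
{
    seen++;
    if (sc->scno == plugin_watched_scno && sc->args[2] > PLUGIN_MAX_LEN)
        sc->args[2] = PLUGIN_MAX_LEN;
}

/* Exit callback: the return value is available now. */
void plugin_syscall_exit(struct syscall_info *sc)
{
    fprintf(log_file, "pid %d: syscall %lu returned %ld\n",
            (int)sc->pid, sc->scno, sc->rval);
}
```

Such a file would typically be built as a shared library with something like `cc -shared -fPIC plugin.c -o plugin.so`.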
If you are going to do anything serious with your plugin, it’s inevitable that you will have to read and write memory in the tracee process. To do this you can either use the ptrace PTRACE_POKETEXT/PTRACE_PEEKDATA requests, or you can use a more modern alternative, the process_vm_readv/process_vm_writev Syscalls. Either way, you will have to deal with architecture-dependent code. Or, you can dig further into the code base, figure out how strace does it, and see how you can reuse some of that good stuff.
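As a minimal sketch of the first option, the function below copies bytes out of a stopped tracee one word at a time with PTRACE_PEEKDATA. The names `peek_tracee_memory` and `demo_peek` are invented for this example, and this is not how strace does it internally. Note that PTRACE_PEEKDATA returns the data itself, so errno has to be cleared beforehand to detect errors.

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy len bytes from address addr in a stopped tracee into buf.
 * Returns the number of bytes copied, or -1 if nothing could be read.
 * Caveat: the last word read may extend a few bytes past addr + len. */
ssize_t peek_tracee_memory(pid_t pid, unsigned long addr, void *buf, size_t len)
{
    unsigned char *out = buf;
    size_t done = 0;

    while (done < len) {
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + done), NULL);
        if (word == -1 && errno != 0)
            return done ? (ssize_t)done : -1;

        size_t chunk = len - done < sizeof(word) ? len - done : sizeof(word);
        memcpy(out + done, &word, chunk);
        done += chunk;
    }
    return (ssize_t)done;
}

/* Demo: fork a child that stops itself, then read a string from its
 * address space. Returns 0 if the bytes read match, -1 otherwise. */
int demo_peek(void)
{
    static const char secret[] = "hello from the tracee";
    char copy[sizeof(secret)];
    int status;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* After fork, the child has the string at the same address. */
    ssize_t n = peek_tracee_memory(child, (unsigned long)secret,
                                   copy, sizeof(copy));
    int ok = n == (ssize_t)sizeof(copy) &&
             memcmp(copy, secret, sizeof(copy)) == 0;

    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return ok ? 0 : -1;
}
```

process_vm_readv can do the same copy in a single call, but it may be blocked in some sandboxed environments, which is why the ptrace variant is shown here.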
#System call number identification problem
Another interesting fact I discovered about the Linux Syscall interface while working on this project is that it is stable: the Syscall interface doesn’t change across kernel versions, which ensures unconditional backward compatibility. But across different architectures, the Syscall numbers are different. So, for example, the read Syscall number will be consistent across different Linux versions on the x86 architecture, but on other architectures, like MIPS, ARM, etc., the number might differ.
You can find the Syscall numbers for all the different architectures here. If you observe the table, you will notice that not all syscalls are present on all architectures. For example, the getxgid, getxpid & getxuid syscalls are present only on the alpha architecture.
The reason I am discussing this is that, when you start writing Syscall-specific code, using a syscall number from an internet search won’t work. That’s because strace has its own Syscall number convention.
The problem is that different syscall numbers for different architectures can make your code very architecture-dependent. To deal with this issue, strace uses a very clever technique: its developers created their own Syscall numbers. Then, for each architecture, they map the architecture-specific Syscall number to strace’s own syscall number. You can find this mapping for each architecture in the “linux/
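A simple way to avoid hard-coding numbers in your own code, independent of strace's internal table, is to write against the symbolic SYS_* constants from <sys/syscall.h>, which the C library resolves per architecture at compile time. The sketch below is only an illustration of that idea; `syscall_name` and the small table are made up for this example, and the numbers describe the architecture the code is compiled for, which must match the tracee's ABI.

```c
#include <stddef.h>
#include <sys/syscall.h>

struct scno_name {
    long nr;
    const char *name;
};

/* The SYS_* macros expand to the right number for the build target,
 * so this table needs no per-architecture edits. */
static const struct scno_name known_syscalls[] = {
    { SYS_read,   "read"   },
    { SYS_write,  "write"  },
    { SYS_openat, "openat" },
    { SYS_close,  "close"  },
    { SYS_ioctl,  "ioctl"  },
};

/* Return the name for a syscall number, or NULL if it is not in the table. */
const char *syscall_name(long nr)
{
    size_t count = sizeof(known_syscalls) / sizeof(known_syscalls[0]);

    for (size_t i = 0; i < count; i++) {
        if (known_syscalls[i].nr == nr)
            return known_syscalls[i].name;
    }
    return NULL;
}
```

A check such as `nr == SYS_ioctl` then compiles correctly on x86, MIPS, or ARM without any number ever appearing in the source.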
#Some interesting notes about strace code
I found some interesting things while navigating the code.
- The bulk of the code in strace deals with parsing Syscall parameters and return values. The parsing code can be found in the file by the “.c”.
- The defs.h file has a lot of code which can be re-used; doing more research in this file can help you find more architecture-independent code.
- The SYS_FUNC macro defines Syscall handling functions. For example, the read syscall is defined with the SYS_FUNC(read) macro, which expands to the function int sys_read(struct tcb *tcp). This handler parses the syscall parameters and prints them on the console.
- Metadata regarding each syscall is stored in the struct_sysent struct, and there is an array of such structs describing each syscall. This struct also has a handler function pointer, which points to the handler described in the previous point. Below is the code for the structure.
typedef struct sysent {
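The pattern described in the last two points (a macro that stamps out handler functions, plus a table of per-syscall metadata with a function pointer) can be illustrated with a toy version. This is a sketch modeled loosely on the description above, not strace's real definitions: `demo_tcb`, `DEMO_SYS_FUNC`, `demo_sysent`, and the field names are all illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the per-tracee control block. */
struct demo_tcb {
    long u_arg[6];
    long u_rval;
};

/* Toy counterpart of a handler-defining macro:
 * DEMO_SYS_FUNC(read) declares int demo_sys_read(struct demo_tcb *tcp). */
#define DEMO_SYS_FUNC(name) int demo_sys_##name(struct demo_tcb *tcp)

/* Illustrative per-syscall metadata entry. */
struct demo_sysent {
    unsigned int nargs;                  /* number of arguments */
    int (*sys_func)(struct demo_tcb *);  /* handler for this syscall */
    const char *sys_name;                /* printable name */
};

DEMO_SYS_FUNC(read)
{
    /* Pretend the whole requested length (third argument) was read. */
    tcp->u_rval = tcp->u_arg[2];
    return 0;
}

DEMO_SYS_FUNC(close)
{
    tcp->u_rval = 0;
    return 0;
}

static const struct demo_sysent demo_table[] = {
    { 3, demo_sys_read,  "read"  },
    { 1, demo_sys_close, "close" },
};

/* Look up a table entry by name; returns NULL if there is none. */
const struct demo_sysent *demo_lookup(const char *name)
{
    size_t count = sizeof(demo_table) / sizeof(demo_table[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(demo_table[i].sys_name, name) == 0)
            return &demo_table[i];
    }
    return NULL;
}
```

The tracer side then only needs a table lookup followed by `entry->sys_func(tcp)`, with no switch statement over syscall numbers.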
#Conclusion
I was amazed at how easy it was to implement this. I would say most of the credit goes to the good programming standards and the amazing readability of the project. The code documentation, by means of variable, function, and struct names, comments, etc., made it very easy to navigate the project.
In this post, we went from defining the system requirements to navigating the strace source to understanding how we can patch the code base to make our plugin work.
There is much more to explore in this project, like tracing file descriptor resources such as sockets, files, and IPC, but I am going to end this post here. Maybe I will cover it in another post.