Cross-Process Dylib Injection via remote_call
Instead of using opainject or injecting shellcode, you can call dlopen directly in a remote process by setting thread registers via thread_set_state. No shellcode, no intermediate libraries, just a direct function call across process boundaries.
Background
Six months ago I wrote remote_call, a way to call any function in a remote process by hijacking thread state on arm64. It suspends a thread, sets x0–x7 for arguments, points PC at the target function, sets LR to the original PC so the thread returns cleanly, and resumes.
Yesterday I told @nkhmelni about an idea — instead of using opainject in my projects (like Titanium), I could just call dlopen directly through remote_call. No shellcode, no intermediate libraries, nothing extra. He said he wants to test it out.
This morning I woke up, opened my Mac, wrote a quick example, and it just works.
The idea
The approach is dead simple:
- Find the target process PID
- task_for_pid to get the mach port
- Allocate memory in the remote process and write the dylib path there
- Get the dlopen address (same across all processes, thanks to the shared cache)
- remote_call(task, dlopen, path, RTLD_NOW), and done
No shellcode needed. remote_call sets registers directly via thread_set_state — x0/x1 for args, PC to dlopen, LR to return. The process continues running after injection like nothing happened.
remote_call
The core of the whole thing. Suspend a thread in the target process, rewrite its register state, resume it, and poll until it returns to the original PC:
#include <mach/mach.h>
#include <stdint.h>
#include <unistd.h>

static kern_return_t remote_call(mach_port_t task, uint64_t func_addr,
                                 int argc, const uint64_t *args, uint64_t *out_ret)
{
    if (!MACH_PORT_VALID(task) || func_addr == 0)
        return KERN_INVALID_ARGUMENT;
    if (argc < 0) argc = 0;
    if (argc > 8) argc = 8; // only x0–x7 carry integer arguments
    if (argc > 0 && args == NULL)
        return KERN_INVALID_ARGUMENT;

    thread_act_array_t threads = NULL;
    mach_msg_type_number_t thread_count = 0;
    kern_return_t kr = task_threads(task, &threads, &thread_count);
    if (kr != KERN_SUCCESS || thread_count == 0)
        return kr != KERN_SUCCESS ? kr : KERN_FAILURE;

    // hijack the first thread of the target
    thread_act_t thread = threads[0];
    kr = thread_suspend(thread);
    if (kr != KERN_SUCCESS)
        goto cleanup;

    arm_thread_state64_t state;
    mach_msg_type_number_t count = ARM_THREAD_STATE64_COUNT;
    kr = thread_get_state(thread, ARM_THREAD_STATE64,
                          (thread_state_t)&state, &count);
    if (kr != KERN_SUCCESS)
    {
        thread_resume(thread); // don't leave the target thread suspended
        goto cleanup;
    }
    uint64_t orig_pc = __darwin_arm_thread_state64_get_pc(state);

    // arguments in x0..x7, PC at the callee, LR back to where the thread was
    for (int i = 0; i < argc; i++)
        state.__x[i] = args[i];
    __darwin_arm_thread_state64_set_pc_fptr(state, (void *)func_addr);
    __darwin_arm_thread_state64_set_lr_fptr(state, (void *)orig_pc);
    kr = thread_set_state(thread, ARM_THREAD_STATE64,
                          (thread_state_t)&state, ARM_THREAD_STATE64_COUNT);
    thread_resume(thread);
    if (kr != KERN_SUCCESS)
        goto cleanup;

    // poll (5000 tries, 1 ms apart) until the thread returns to the original PC
    for (int i = 0; i < 5000; i++)
    {
        arm_thread_state64_t cur;
        mach_msg_type_number_t cur_count = ARM_THREAD_STATE64_COUNT;
        kr = thread_get_state(thread, ARM_THREAD_STATE64,
                              (thread_state_t)&cur, &cur_count);
        if (kr != KERN_SUCCESS)
            goto cleanup;
        if (__darwin_arm_thread_state64_get_pc(cur) == orig_pc)
        {
            if (out_ret) *out_ret = cur.__x[0]; // return value comes back in x0
            kr = KERN_SUCCESS;
            goto cleanup;
        }
        usleep(1000);
    }
    kr = KERN_FAILURE; // timed out

cleanup:
    // release the thread ports and the array returned by task_threads
    for (mach_msg_type_number_t i = 0; i < thread_count; i++)
        mach_port_deallocate(mach_task_self(), threads[i]);
    vm_deallocate(mach_task_self(), (vm_address_t)threads,
                  thread_count * sizeof(thread_act_t));
    return kr;
}
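The key trick: on arm64, dlopen follows the standard calling convention, with the first argument in x0, the second in x1, and the return value in x0. We set LR to the original PC, so when dlopen returns, the thread lands right back where it was. In my test the process doesn't crash or stall; it just keeps running with a new dylib loaded. One caveat: x0–x17 are caller-saved, so the interrupted code may see clobbered registers when it resumes. A more robust version would save the full thread state before the call and restore it afterwards.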
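To make the register mapping concrete without touching a real process, here is a small portable model. The struct fake_state and the helper prepare_call are hypothetical names made up for this sketch; they only mimic the x0–x7, LR and PC fields of arm_thread_state64_t and make no Mach calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of arm_thread_state64_t we touch. */
struct fake_state {
    uint64_t x[29]; /* general-purpose registers x0..x28 */
    uint64_t lr;    /* link register: where the callee's `ret` jumps to */
    uint64_t pc;    /* program counter: next instruction to execute */
};

/* Rewrite `st` so that resuming the thread would call func(args...) and
 * then return to wherever the thread was. Returns the original PC, which
 * is the value a poller would wait for. */
static uint64_t prepare_call(struct fake_state *st, uint64_t func,
                             const uint64_t *args, int argc)
{
    assert(st != NULL);
    assert(argc <= 0 || args != NULL);
    if (argc < 0) argc = 0;
    if (argc > 8) argc = 8; /* only x0..x7 carry integer arguments */

    uint64_t orig_pc = st->pc;
    for (int i = 0; i < argc; i++)
        st->x[i] = args[i];
    st->lr = orig_pc; /* the callee returns to the interrupted instruction */
    st->pc = func;    /* execution resumes at the callee's entry point */
    return orig_pc;
}
```

Calling prepare_call(&st, dlopen_addr, args, 2) with args holding the remote path pointer and RTLD_NOW leaves x0 and x1 set to those two values, which is the same shape remote_call writes through thread_set_state.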
The injector
The main flow ties everything together — find the process, get its mach port, write the dylib path into its address space, resolve dlopen, and fire:
#include <dlfcn.h>
#include <mach/mach.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DYLIB_PATH "/path/to/d.dylib"

int main(void)
{
    pid_t pid = find_pid_by_name("main");
    if (pid <= 0)
    {
        fprintf(stderr, "target process not found\n");
        return 1;
    }
    printf("pid %d\n", pid);

    mach_port_t task = MACH_PORT_NULL;
    kern_return_t kr = task_for_pid(mach_task_self(), pid, &task);
    if (kr != KERN_SUCCESS)
    {
        fprintf(stderr, "task_for_pid failed: %s\n", mach_error_string(kr));
        return 1;
    }
    printf("mach port: %u\n", task);

    // allocate memory in remote process and write dylib path
    const char *dylib_path = DYLIB_PATH;
    size_t path_len = strlen(dylib_path) + 1;
    mach_vm_address_t remote_str = remote_alloc(task, path_len);
    remote_write(task, remote_str, dylib_path, path_len);
    printf("remote string at 0x%llx\n", (unsigned long long)remote_str);

    // dlopen address is the same across processes (shared cache)
    uint64_t dlopen_addr = (uint64_t)dlsym(RTLD_DEFAULT, "dlopen");
    printf("dlopen at: 0x%llx\n", (unsigned long long)dlopen_addr);

    uint64_t args[2] = { remote_str, RTLD_NOW };
    uint64_t ret = 0;
    kr = remote_call(task, dlopen_addr, 2, args, &ret);
    if (kr != KERN_SUCCESS)
    {
        fprintf(stderr, "remote_call failed: %s\n", mach_error_string(kr));
        return 1;
    }
    if (ret == 0)
        fprintf(stderr, "dlopen returned NULL\n");
    else
        printf("dlopen handle: 0x%llx\n", (unsigned long long)ret);
    return 0;
}
The dlopen address is the same in every process because of the shared cache: the dyld shared cache slide is chosen once per boot, so every process (of the same architecture) maps the cache at the same address. So we just resolve dlopen locally with dlsym and use that address remotely.
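One way to sanity-check that assumption before injecting is to ask the loader which image the resolved address belongs to. The helper name image_of_symbol is made up for this sketch; dlsym and dladdr are standard. On macOS I would expect the image for dlopen to be a system library that lives in the shared cache, but treat the exact path as an assumption.

```c
#define _GNU_SOURCE /* exposes dladdr and RTLD_DEFAULT on glibc; harmless on macOS */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Resolve `name` in the current process and report which loaded image
 * contains it. Stores the address in *out_addr when out_addr is non-NULL.
 * Returns the image path (owned by the loader), or NULL on failure. */
static const char *image_of_symbol(const char *name, void **out_addr)
{
    assert(name != NULL);
    void *addr = dlsym(RTLD_DEFAULT, name);
    if (out_addr) *out_addr = addr;
    if (addr == NULL) return NULL;

    Dl_info info;
    if (dladdr(addr, &info) == 0) return NULL;
    return info.dli_fname;
}
```

In the injector this could run right after the dlsym call, printing the image path next to the address. If the path points at something outside the system libraries (for example an interposed library), the "same address everywhere" assumption no longer holds and the remote call should not be attempted.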
The dylib
The injected dylib is trivial — a constructor that runs on load:
#import <Foundation/Foundation.h>

// runs automatically when dyld loads the image
__attribute__((constructor))
static void run(void)
{
    NSLog(@"hello from dylib");
}
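The build step compiles a main.c target that isn't shown in this post. A minimal stand-in, assuming all the target has to do is stay alive and print something so you can see it keeps running after injection, could look like this. heartbeat_loop is a hypothetical name, and the real main.c may differ.

```c
#include <stdio.h>
#include <unistd.h>

/* Print a heartbeat line once per second. Runs forever when max_ticks < 0,
 * otherwise stops after max_ticks iterations. Returns the number of ticks. */
static int heartbeat_loop(int max_ticks)
{
    int tick = 0;
    while (max_ticks < 0 || tick < max_ticks) {
        printf("[main] pid %d, tick %d\n", (int)getpid(), tick);
        fflush(stdout); /* keep output visible when stdout is not a TTY */
        sleep(1);
        tick++;
    }
    return tick;
}
```

In main.c, a main function that calls heartbeat_loop(-1) keeps the process alive indefinitely. The binary has to be named main so that find_pid_by_name("main") in the injector can find it.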
Build & run
# build
xcrun clang -o inj inj.c -framework CoreFoundation
xcrun clang -o main main.c
xcrun clang -dynamiclib -framework Foundation -o d.dylib d.m
# terminal 1 — start target process
./main
# terminal 2 — inject
sudo ./inj
Output
The target process prints the dylib's message without ever knowing it loaded anything. No crash, no interruption; it just continues running. The same approach should be adaptable to iOS.
Requirements
- macOS arm64
- Run as root (or sign with the com.apple.security.cs.debugger entitlement)