Replacing hard coded paths in ELF binaries
This article happens to talk about a problem in NixOS, but it generally applies to any operating system using ELF files (which is most Linux distributions).
Recently in the NixOS chat there was a discussion about how to package a binary-only driver that has a hardcoded firmware path in it. NixOS does not follow the FHS, which means that every hardcoded path must be patched.
The motivating example
The affected fingerprint reader driver is available as a binary blob on launchpad.
It contains the driver library (usr/lib/.../libfprint-2-tod1-broadcom.so) and a bunch of firmware files (var/lib/fprint/fw).
The library contains a hardcoded path to the firmware directory, which we'd like to replace.
Binary editing and why it won't work
bbe is a binary stream editor; you can use it to replace strings in binary
files. There is a problem with this approach, though.
You can only make strings shorter, not longer. This is because strings are tightly packed in a binary file. You can observe this by opening an ELF file in a hex editor:
00020360: 4449 5350 4c41 5900 6c6f 6361 6c68 6f73  DISPLAY.localhos
00020370: 7400 7463 7000 696e 6574 0069 6e65 7436  t.tcp.inet.inet6
00020380: 0025 6875 0075 6e69 7800 2573 2564 0000  .%hu.unix.%s%d..
00020390: 2f74 6d70 2f2e 5831 312d 756e 6978 2f58  /tmp/.X11-unix/X
000203a0: 0058 444d 2d41 5554 484f 5249 5a41 5449  .XDM-AUTHORIZATI
000203b0: 4f4e 2d31 004d 4954 2d4d 4147 4943 2d43  ON-1.MIT-MAGIC-C
000203c0: 4f4f 4b49 452d 3100 0000 0000 0000 0000  OOKIE-1.........
If you wanted to, say, replace the string tcp above with a longer one, you
can't do it, as you would also overwrite the strings that come after it. You
cannot easily shift the strings around either, as they are referenced by x86
machine code, which can do arbitrary pointer arithmetic.
This makes this method unusable in NixOS, since due to the nature of the Nix store, the replacement path will almost always be longer. For example, for this binary we would need to replace
/var/lib/fprint/fw with something like /nix/store/abcdefghijklmnopqrstuvwxyz012345-libfprint2-tod1-broadcom-firmware-0.0.6
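To make the length restriction concrete, here is a minimal sketch in C of what an in-place string replacement has to do. The function name replace_in_place is made up for this illustration and is not taken from any existing tool; it simply refuses any replacement that is longer than the original and pads shorter ones with NUL bytes so that the neighbouring strings stay where they are.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Overwrites the first occurrence of the NUL-terminated string `old_str`
 * inside buf[0..len) with `new_str`.
 * Returns 0 on success, -1 if `new_str` is too long or `old_str` is absent. */
static int replace_in_place(unsigned char *buf, size_t len,
                            const char *old_str, const char *new_str)
{
    size_t old_size = strlen(old_str) + 1; /* include the terminating NUL */
    size_t new_size = strlen(new_str) + 1;

    /* A longer string would spill into whatever is packed right behind it. */
    if (new_size > old_size)
        return -1;

    for (size_t i = 0; i + old_size <= len; i++) {
        if (memcmp(buf + i, old_str, old_size) == 0) {
            memset(buf + i, 0, old_size);           /* pad the slot with NULs */
            memcpy(buf + i, new_str, new_size - 1); /* write the new string */
            return 0;
        }
    }
    return -1;
}
```

With the bytes from the hex dump above, replacing tcp with ip succeeds, while replacing it with udp6 is rejected because the extra byte would overwrite the first character of inet.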
The usual way to deal with problems like this in NixOS is to use a helper called buildFHSUserEnv, which is effectively a chroot:
buildFHSUserEnv provides a way to build and run FHS-compatible lightweight sandboxes. [...] This allows one to run software which is hard or unfeasible to patch for NixOS – 3rd-party source trees with FHS assumptions, [...]
Emphasis mine. The problem with this approach is that you cannot chroot a library.
At best you can chroot the program that uses your library, but that set of
programs may be unknown (in this particular case, there appears to be only one).
It is also an example of a global solution to a local problem, as now all other
code running inside the fprintd process, and all of its child processes, run in
this chroot, which can cause issues.
We cannot replace the string that is passed to
fopen(), but can we replace
fopen() itself? Indeed we can! At first glance replacing an entire function
looks harder than replacing a parameter to that function, but the exact opposite
is the case.
The driver library is dynamically linked, meaning it does not contain an implementation
of fopen(). Instead, it imports fopen() from glibc, the GNU C standard library.
We can use tools like
nm to inspect the data structures relevant
for dynamic linking:
$ nm --dynamic libfprint-2-tod-1-broadcom.so
000000000000faa0 T AddNewSession
000000000002bc90 T appendCallback
000000000002bf40 T appendCommandIndex
                 U asctime@GLIBC_2.2.5
[...]
                 U fopen64@GLIBC_2.2.5
[...]
0000000000042a49 D version
                 U __vfprintf_chk@GLIBC_2.3.4
                 U __vsnprintf_chk@GLIBC_2.3.4
Symbols marked T (text) are defined in the library itself. Symbols marked U (undefined) are not defined in the library and will be imported at runtime by the dynamic linker from one of the libraries listed in the dynamic section:
$ readelf --dynamic libfprint-2-tod-1-broadcom.so

Dynamic section at offset 0x40c40 contains 27 entries:
  Tag                Type       Name/Value
 0x0000000000000001 (NEEDED)   Shared library: [libfprint-2-tod.so.1]
 0x0000000000000001 (NEEDED)   Shared library: [libcrypto.so.1.1]
 0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
[...]
This means that at runtime, the dynamic linker will load the libraries marked
as NEEDED. One of them (libc.so.6) will provide the fopen64@GLIBC_2.2.5 symbol,
and the dynamic linker will put the address of that symbol somewhere where the
driver library can find it.
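The distinction between defined and imported symbols is visible directly in the symbol table entries. The following is a rough sketch, assuming a 64-bit ELF file and the Elf64_Sym definition from glibc's <elf.h>; the function name dyn_symbol_kind is made up for this example. It is a simplification: the real nm derives letters like T and D from the section a symbol lives in, while this sketch only looks at the symbol type.

```c
#include <elf.h>

/* Hypothetical helper, for illustration only.
 * Returns a rough nm-style letter for one entry of the dynamic symbol table.
 * Simplified: real nm inspects section flags and knows many more letters. */
static char dyn_symbol_kind(const Elf64_Sym *sym)
{
    /* No defining section: the dynamic linker must import this symbol. */
    if (sym->st_shndx == SHN_UNDEF)
        return 'U';

    switch (ELF64_ST_TYPE(sym->st_info)) {
    case STT_FUNC:
        return 'T'; /* code defined in this file */
    case STT_OBJECT:
        return 'D'; /* data defined in this file */
    default:
        return '?';
    }
}
```

An entry like fopen64 has st_shndx set to SHN_UNDEF, which is why it shows up as U, and it is exactly these entries that the dynamic linker resolves against the NEEDED libraries.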
The neat thing about this is that the symbol table displayed by
nm, and the
Dynamic section displayed by
readelf are documented data structures,
which means we can relatively easily modify them...
The @GLIBC_2.2.5 suffix is related to symbol versioning, which I won't cover here.
Replacing the fopen() call
The plan is:
1. Write a wrapper function, FILE* fopen_wrapper(const char* path, const char* mode);, which will inspect the path that's given to it. If it starts with /var/lib/fprint/fw, it will replace that prefix with another directory of our choosing. In any case, it will then call the real fopen() to open the file.
2. Build a shared library containing that function. Let's call it libfopen_wrapper.so.
3. Patch the driver library so that it imports fopen_wrapper instead of fopen64.
4. Patch the driver library to add a dependency on libfopen_wrapper.so.
5. (For Nix reasons, we must also patch in the directory of libfopen_wrapper.so.)
Steps 1 and 2 are trivial: we're just writing a simple C function and compiling
it into a shared library (cc -fPIC -shared stuff.c -o libfopen_wrapper.so).
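For illustration, a wrapper along these lines could look roughly like the sketch below. This is not the code from the example project. The macro names OLD_PREFIX and NEW_PREFIX, the helper rewrite_path, and the default directory /opt/firmware are all placeholders invented for this sketch; in a Nix build the replacement directory would presumably be passed in at compile time, for example with -DNEW_PREFIX=...

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef OLD_PREFIX
#define OLD_PREFIX "/var/lib/fprint/fw" /* path hardcoded in the driver */
#endif
#ifndef NEW_PREFIX
#define NEW_PREFIX "/opt/firmware" /* placeholder, override with -DNEW_PREFIX */
#endif

/* Hypothetical helper: returns a malloc'd copy of `path` with OLD_PREFIX
 * swapped for NEW_PREFIX, or NULL if the prefix does not match (or if the
 * allocation fails, in which case the caller falls back to `path`). */
static char *rewrite_path(const char *path)
{
    size_t old_len = strlen(OLD_PREFIX);
    if (strncmp(path, OLD_PREFIX, old_len) != 0)
        return NULL;

    const char *rest = path + old_len;
    size_t size = strlen(NEW_PREFIX) + strlen(rest) + 1;
    char *out = malloc(size);
    if (out != NULL)
        snprintf(out, size, "%s%s", NEW_PREFIX, rest);
    return out;
}

/* Drop-in replacement for fopen(): redirect matching paths, pass the rest
 * through unchanged, then call the real fopen(). */
FILE *fopen_wrapper(const char *path, const char *mode)
{
    char *rewritten = rewrite_path(path);
    FILE *file = fopen(rewritten != NULL ? rewritten : path, mode);

    int saved_errno = errno; /* keep fopen's error code across free() */
    free(rewritten);
    errno = saved_errno;
    return file;
}
```

Note that the driver actually imports fopen64 rather than fopen. As far as I know the two behave identically on 64-bit Linux with glibc, so the sketch calls plain fopen(); on a 32-bit target it would need to be built with large file support to match.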
Steps 4 and 5 are also trivial and handled by a tool called
patchelf, which was initially created to
patch binary blobs for Nix compatibility:
$ patchelf \
    --add-needed libfopen_wrapper.so \
    --set-rpath /path/to/the/wrapper/lib \
    libfprint-2-tod-1-broadcom.so
The only thing missing is step 3, renaming the imported function...
Renaming a symbol
At first I thought this would be trivial, since the
objcopy tool has the following flag:
--redefine-sym old=new
    Change the name of a symbol old, to new. This can be useful when one is trying link two things together for which you have no source, and there are name collisions.
... which sounds exactly like the functionality I want. Unfortunately, an ELF file
contains two symbol tables, .symtab and .dynsym. .symtab is used by the linker at
compile time; .dynsym is used by the dynamic linker at runtime.
And objcopy only edits .symtab, which is the wrong one.
But patchelf already has a
--replace-needed flag, which does something extremely
similar. We can take that code and have it manipulate the
.dynsym section instead.
The procedure to replace a symbol is:
- Iterate over the .dynsym section, which is an array of ElfXX_Sym structures. The ElfXX_Sym.st_name field references a string in the .dynstr section, which contains the name of the symbol.
- If the name matches the old symbol name (fopen64), then...
- ... add the new symbol name (fopen_wrapper) to .dynstr, and adjust st_name to reference the new string.
- (Also clear the symbol version if it exists)
In code, you can see this in my patchelf fork.
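Independent of the actual patchelf code, the core of the procedure can be sketched in a few lines of C. The sketch below assumes a 64-bit ELF file whose .dynsym and .dynstr contents have already been loaded into memory, with spare room at the end of the string table buffer. The function name rename_dynsym and its parameters are invented for this illustration. Everything that makes the real job harder is left out: growing the section inside the file, fixing up the headers that describe it, and clearing the symbol version.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Renames every symbol called `old_name` to `new_name` by appending the new
 * name to the string table and repointing st_name at it.
 * `dynstr` must follow the ELF convention of starting and ending with a NUL
 * byte. Returns the number of renamed symbols, or -1 if `dynstr` is full. */
static int rename_dynsym(Elf64_Sym *syms, size_t nsyms,
                         char *dynstr, size_t *dynstr_len, size_t dynstr_cap,
                         const char *old_name, const char *new_name)
{
    int renamed = 0;
    Elf64_Word new_off = 0; /* offset 0 is the empty string: "not added yet" */

    for (size_t i = 0; i < nsyms; i++) {
        if (syms[i].st_name >= *dynstr_len)
            continue; /* name offset outside the table, skip it */
        if (strcmp(dynstr + syms[i].st_name, old_name) != 0)
            continue;

        if (new_off == 0) {
            /* Append the new name once, including its terminating NUL. */
            size_t need = strlen(new_name) + 1;
            if (*dynstr_len + need > dynstr_cap)
                return -1;
            memcpy(dynstr + *dynstr_len, new_name, need);
            new_off = (Elf64_Word)*dynstr_len;
            *dynstr_len += need;
        }
        syms[i].st_name = new_off;
        renamed++;
    }
    return renamed;
}
```

Appending instead of overwriting is what sidesteps the length problem from the beginning of the article: symbol names are only ever reached through the st_name offset, so the old string can stay where it is and nothing else has to move.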
Then we add this newly implemented flag to our patchelf invocation:
$ patchelf \
    --replace-symbol fopen64 fopen_wrapper \
    --add-needed libfopen_wrapper.so \
    --set-rpath /path/to/the/wrapper/lib \
    libfprint-2-tod-1-broadcom.so
And now the driver library will take a detour through our fopen_wrapper every time it tries to open a file!
You can see what this looks like in terms of a Nix derivation.
I do not have the affected hardware, so I cannot test whether this actually works.
This method also assumes you know how your library opens files. There are many
glibc functions that can open files, such as openat, and many more functions
in other libraries like glib, Qt, KIO, ...
This only works with dynamically linked libraries, and only with native code (so Java and .NET IL bytecode can't be edited that way, although replacing strings in those types of files is much easier than in native code).
This method quickly becomes impractical if FHS assumptions are pervasive in the target binary.
The problem is the library, and the fix only affects the library. Neither other code running in the same process nor child processes are affected by it.
No need to mess around with user namespaces and chroots.
This approach is basically an LD_PRELOAD, but scoped to a single library
instead of the entire process.
A simpler example
The fingerprint driver is quite complex, and you can't test it without having the hardware. For this reason I've set up a contrived example for testing this method.
First, running the example program directly:
$ nix run 'sourcehut:~raphi/elf-replace-symbol#simple-bad'
simple.c: opening and printing contents of '/lib/hardcoding-paths-is-bad.txt'
fopen: No such file or directory (/lib/hardcoding-paths-is-bad.txt)
But after we inject a wrapper library:
$ nix run 'sourcehut:~raphi/elf-replace-symbol#simple-good'
simple.c: opening and printing contents of '/lib/hardcoding-paths-is-bad.txt'
fopen_wrapper.c: Replacing path '/lib/hardcoding-paths-is-bad.txt' with '/nix/store/jrmrns7msqsxkcbgml3zvq7pm0zybshq-git-2.36.0-doc/share/doc/git/git-stage.txt'
git-stage(1)
============

NAME
----
git-stage - Add file contents to the staging area
The Nix derivation is also available.
So far this is a cute proof of concept and it should be treated as such.
- Example project
- All about symbol versioning - the best resource I've found regarding this topic
- Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification