Replacing hard coded paths in ELF binaries

This article happens to talk about a problem in NixOS, but it generally applies to any operating system using ELF files (which is most Linux distributions).

Recently in the NixOS chat there was a discussion about how to package a binary-only driver that has a hardcoded firmware path in it. NixOS does not follow the FHS, which means that every hardcoded path must be patched.

The motivating example

The affected fingerprint reader driver is available as a binary blob on launchpad.

It contains the driver library (usr/lib/.../libfprint-2-tod1-broadcom.so) and a bunch of firmware files (var/lib/fprint/fw).

The library contains a hardcoded path to the firmware directory, which we'd like to replace.

Binary editing and why it won't work

bbe is a binary stream editor that you can use to replace strings in binary files. There is a problem with this approach, though.

You can only make strings shorter, not longer. This is because strings are tightly packed in a binary file. You can observe this by opening an ELF file in a hex editor:

00020360: 4449 5350 4c41 5900 6c6f 6361 6c68 6f73  DISPLAY.localhos
00020370: 7400 7463 7000 696e 6574 0069 6e65 7436  t.tcp.inet.inet6
00020380: 0025 6875 0075 6e69 7800 2573 2564 0000  .%hu.unix.%s%d..
00020390: 2f74 6d70 2f2e 5831 312d 756e 6978 2f58  /tmp/.X11-unix/X
000203a0: 0058 444d 2d41 5554 484f 5249 5a41 5449  .XDM-AUTHORIZATI
000203b0: 4f4e 2d31 004d 4954 2d4d 4147 4943 2d43  ON-1.MIT-MAGIC-C
000203c0: 4f4f 4b49 452d 3100 0000 0000 0000 0000  OOKIE-1.........

If you wanted to, say, replace the string tcp above with http+xml+soap, you can't do it, as you would also trash the strings that come after. You cannot easily shift the strings around, as those are referenced by x86 machine code, which can do arbitrary pointer arithmetic.
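To make the constraint concrete, here is a small C sketch of what an in-place string replacement has to do. The function name replace_in_place and its signature are invented for this illustration, and it is not meant to describe how any particular tool is implemented. It refuses any replacement that is longer than the string it overwrites, because the extra bytes would land on top of the next string.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite the NUL-terminated string at `offset` inside `buf` with
 * `replacement`, but only if it fits into the bytes the old string occupied.
 * Returns 0 on success, -1 if the offset is invalid, the old string is not
 * terminated inside `buf`, or the replacement is too long. */
int replace_in_place(char *buf, size_t buf_len, size_t offset,
                     const char *replacement)
{
    if (offset >= buf_len)
        return -1;

    char *start = buf + offset;
    const char *end = memchr(start, '\0', buf_len - offset);
    if (end == NULL)
        return -1;

    size_t old_len = (size_t)(end - start);
    size_t new_len = strlen(replacement);
    if (new_len > old_len)
        return -1; /* would overwrite the neighbouring string */

    memcpy(start, replacement, new_len);
    /* Pad with NUL bytes so that every following string keeps its offset. */
    memset(start + new_len, 0, old_len - new_len);
    return 0;
}
```

Applied to the bytes from the hex dump, replacing tcp with a two-letter string succeeds, while http+xml+soap is rejected because inet starts right after the terminating NUL byte of tcp.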

This makes this method unusable in NixOS, since due to the nature of the Nix store, the replacement path will almost always be longer. For example, for this binary we would need to replace

/var/lib/fprint/fw
with something like
/nix/store/abcdefghijklmnopqrstuvwxyz012345-libfprint2-tod1-broadcom-firmware-0.0.6

chroot

The usual way to deal with problems like this in NixOS is to use a helper called buildFHSUserEnv, which is effectively a chroot:

buildFHSUserEnv provides a way to build and run FHS-compatible lightweight sandboxes. [...] This allows one to run software which is hard or unfeasible to patch for NixOS – 3rd-party source trees with FHS assumptions, [...]

Emphasis mine. The problem with this approach is that you cannot chroot a library. At best you can chroot the program that uses your library, but that set of programs may be unknown (in this particular case, there appears to be only one relevant program, fprintd).

It is also an example of a global solution to a local problem, as now all other code running inside of the fprintd process, and all child processes, run in this chroot, which can cause issues.

Dynamic linking

We cannot replace the string that is passed to fopen(), but can we replace fopen() itself? Indeed we can! At first glance replacing an entire function looks harder than replacing a parameter to that function, but the exact opposite is the case.

The driver library is dynamically linked, meaning it does not contain an implementation for fopen(). Instead, it imports fopen() from glibc, the GNU C standard library.

We can use tools like readelf and nm to inspect the data structures relevant for dynamic linking:

$ nm --dynamic libfprint-2-tod-1-broadcom.so
000000000000faa0 T AddNewSession
000000000002bc90 T appendCallback
000000000002bf40 T appendCommandIndex
                 U asctime@GLIBC_2.2.5
[...]
                 U fopen64@GLIBC_2.2.5
[...]
0000000000042a49 D version
                 U __vfprintf_chk@GLIBC_2.3.4
                 U __vsnprintf_chk@GLIBC_2.3.4

The symbols marked with a T (text) are defined in the library itself. The symbols marked with a U (undefined) are not defined in the library, and will be imported at runtime by the dynamic linker:

$ readelf --dynamic libfprint-2-tod-1-broadcom.so

Dynamic section at offset 0x40c40 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libfprint-2-tod.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libcrypto.so.1.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
[...]

Which means that at runtime, the dynamic linker will load the libraries libfprint-2-tod.so.1, libcrypto.so.1.1, and libc.so.6. One of them (libc.so.6) will provide the fopen64@GLIBC_2.2.5 symbol, and the dynamic linker will put the address of that symbol somewhere where the driver library can find it.

The neat thing about this is that the symbol table displayed by nm, and the Dynamic section displayed by readelf are documented data structures, which means we can relatively easily modify them...

The @GLIBC_2.2.5 suffix is related to symbol versioning, which I won't talk about here.

Replacing the fopen() call

The plan is:

  1. Write a wrapper function, FILE* fopen_wrapper(const char* path, const char* mode);, which will inspect the path that's given to it. If it starts with /var/lib/fprint/fw, it will replace that prefix with another directory of our choosing. In any case, it will then call the real fopen() to open the file.
  2. Build a shared library containing that function. Let's call it libfopen_wrapper.so.
  3. Patch the driver library so that it imports fopen_wrapper instead of fopen64.
  4. Patch the driver library to add a dependency on libfopen_wrapper.so.
  5. (For Nix reasons, we must also patch in the directory of libfopen_wrapper.so)

Steps 1 and 2 are trivial: we're just writing a simple C function and compiling it into a shared library (cc -fPIC -shared stuff.c -o libfopen_wrapper.so).
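Steps 1 and 2 might look roughly like the following. This is a minimal sketch, not the code from the actual derivation: rewrite_path is a helper name made up for this example, NEW_PREFIX is a placeholder for whatever directory the firmware ends up in, and the 4096-byte buffer is an arbitrary choice. It calls plain fopen(), on the assumption that fopen() and fopen64() behave the same on a 64-bit glibc system.

```c
#include <stdio.h>
#include <string.h>

/* The old prefix is the one hardcoded in the driver. The new prefix is a
 * placeholder (an assumption of this sketch); a real build would pass the
 * actual directory, e.g. with -DNEW_PREFIX='"..."'. */
#define OLD_PREFIX "/var/lib/fprint/fw"
#ifndef NEW_PREFIX
#define NEW_PREFIX "/path/to/the/firmware"
#endif

/* If `path` starts with OLD_PREFIX, write the path with the prefix swapped
 * into `out` and return `out`. Otherwise, or if the result does not fit into
 * `out_size` bytes, return `path` unchanged. */
const char *rewrite_path(const char *path, char *out, size_t out_size)
{
    size_t old_len = strlen(OLD_PREFIX);

    if (strncmp(path, OLD_PREFIX, old_len) != 0)
        return path;

    int n = snprintf(out, out_size, "%s%s", NEW_PREFIX, path + old_len);
    if (n < 0 || (size_t)n >= out_size)
        return path;

    return out;
}

/* Drop-in replacement for fopen(): same signature, but the path is
 * rewritten before the real fopen() is called. */
FILE *fopen_wrapper(const char *path, const char *mode)
{
    char buf[4096];
    return fopen(rewrite_path(path, buf, sizeof buf), mode);
}
```

Compiled with the cc command above, this produces libfopen_wrapper.so. Paths that do not start with the old prefix are passed through untouched, so any other file the driver opens is unaffected.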

Steps 4 and 5 are also trivial and handled by a tool called patchelf, which was initially created to patch binary blobs for Nix compatibility:

$ patchelf \
    --add-needed libfopen_wrapper.so \
    --set-rpath /path/to/the/wrapper/lib \
    libfprint-2-tod-1-broadcom.so

The only thing missing is step 3, renaming the imported function...

Renaming a symbol

At first I thought this would be trivial, since the objcopy tool has the following flag:

--redefine-sym old=new
  Change the name of a symbol old, to new. This can be useful when one is trying
  link two things together for which you have no source, and there are name collisions.

... which sounds exactly like the functionality I want. Unfortunately, an ELF file contains two symbol tables, .symtab and .dynsym. .symtab is used by ld at link time, while .dynsym is used by the dynamic linker at runtime.

And objcopy only edits .symtab, which is the wrong one.

patchelf --replace-symbol

But patchelf already has a --replace-needed flag, which does something extremely similar. We can take that code and have it manipulate the .dynsym section instead.

The procedure to replace a symbol is:

  1. Iterate over the .dynsym section, which is an array of ElfXX_Sym structs
  2. The ElfXX_Sym.st_name field references a string in the .dynstr section, which contains the name of the symbol.
  3. If the name matches the old symbol name (fopen64), then...
  4. ... add the new symbol name (fopen_wrapper) to .dynstr, and adjust ElfXX_Sym.st_name
  5. (Also clear the symbol version if it exists)
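The core of steps 1 to 4 can be sketched in C on in-memory copies of the two sections. This is only an illustration under simplifying assumptions, not the code from the patchelf fork: the function name rename_dynsym is made up, it handles only 64-bit ELF (Elf64_Sym from <elf.h>), it assumes .dynstr ends with a NUL byte and sits in a malloc'ed buffer, and it leaves out everything a real tool also has to do, such as growing the section inside the file and clearing the symbol version (step 5).

```c
#include <elf.h>
#include <stdlib.h>
#include <string.h>

/* Rename every .dynsym entry called `old_name` to `new_name`.
 * `syms`/`nsyms` is the .dynsym array. `*strtab`/`*strtab_len` is the
 * .dynstr content in a malloc'ed buffer, which may be reallocated.
 * Returns the number of renamed symbols, or -1 if allocation fails. */
int rename_dynsym(Elf64_Sym *syms, size_t nsyms,
                  char **strtab, size_t *strtab_len,
                  const char *old_name, const char *new_name)
{
    int renamed = 0;
    size_t new_off = 0; /* 0 means "new_name not appended yet" */

    for (size_t i = 0; i < nsyms; i++) {
        if (syms[i].st_name >= *strtab_len)
            continue; /* malformed entry, skip it */

        /* st_name is an offset into the string table, not a pointer. */
        if (strcmp(*strtab + syms[i].st_name, old_name) != 0)
            continue;

        if (new_off == 0) {
            /* Append the new name instead of overwriting the old one:
             * other entries may still refer to the old string. */
            size_t add = strlen(new_name) + 1;
            char *grown = realloc(*strtab, *strtab_len + add);
            if (grown == NULL)
                return -1;
            memcpy(grown + *strtab_len, new_name, add);
            new_off = *strtab_len;
            *strtab = grown;
            *strtab_len += add;
        }

        syms[i].st_name = (Elf64_Word)new_off;
        renamed++;
    }
    return renamed;
}
```

Appending to .dynstr sidesteps the "strings can only get shorter" problem from earlier: nothing points into the middle of the table except st_name offsets and similar documented fields, so the table can grow at the end without disturbing existing references.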

In code, you can see this in my patchelf fork.

Then we add this newly implemented flag to our patchelf invocation:

$ patchelf \
    --replace-symbol fopen64 fopen_wrapper \
    --add-needed libfopen_wrapper.so \
    --set-rpath /path/to/the/wrapper/lib \
    libfprint-2-tod-1-broadcom.so

And now the driver library will take a detour through our fopen_wrapper every time it tries to open a file!

You can see what this looks like in terms of a Nix derivation.

Caveats

I do not have the affected hardware, so I cannot test whether this actually works.

This method also assumes you know how your library opens files. There are many glibc functions that can open files: fopen, open, openat. And many functions in other libraries like glib, qt, kio, ...

This only works with dynamically linked libraries, and only with native code (so Java and .NET IL bytecode can't be edited that way; however, replacing strings in those types of files is much easier than in native code).

This method quickly becomes impractical if FHS assumptions are pervasive in the target binary.

Advantages

The problem is the library, and the fix only affects the library. Neither other code running in the same process nor child processes are affected by it.

No need to mess around with user namespaces and chroots.

This approach is basically an LD_PRELOAD, but scoped to a single library instead of the entire process.

A simpler example

The fingerprint driver is quite complex, and you can't test it without having the hardware. For this reason I've set up a contrived example for testing this method.

First, running the example program directly:

$ nix run 'sourcehut:~raphi/elf-replace-symbol#simple-bad'

simple.c: opening and printing contents of '/lib/hardcoding-paths-is-bad.txt'
fopen: No such file or directory
(/lib/hardcoding-paths-is-bad.txt)

But after we inject a wrapper library:

$ nix run 'sourcehut:~raphi/elf-replace-symbol#simple-good'

simple.c: opening and printing contents of '/lib/hardcoding-paths-is-bad.txt'
fopen_wrapper.c: Replacing path '/lib/hardcoding-paths-is-bad.txt' with '/nix/store/jrmrns7msqsxkcbgml3zvq7pm0zybshq-git-2.36.0-doc/share/doc/git/git-stage.txt'
git-stage(1)
============

NAME
----
git-stage - Add file contents to the staging area

The Nix derivation is also available.

Final remarks

So far this is a cute proof of concept and it should be treated as such.