Security Implications of (Non)Reproducible Middleware Builds

phelix
Posts: 1634
Joined: Thu Aug 18, 2011 6:59 am

Post by phelix »

This keeps popping up in other discussions so I would like to discuss this in a separate thread.
phelix wrote:
biolizard89 wrote:
domob wrote:
biolizard89 wrote:Reproducible builds are very important, and I don't think Joseph or I want the project to be completely dependent on something that Tor devs were unable to do with several orders of magnitude more funding.
I agree that deterministic builds are important, but what is really stopping us from simply running the source through a Python interpreter? I thought that's how it is supposed to be done anyway. With respect to a reproducible Python interpreter: What's the point if you can simply use the one bundled and signed by a major distro? You have to trust "some" component of your OS anyway.
Whoops, totally missed this post, sorry Daniel.

Major Linux distros are working on making all their packages reproducible (particularly Debian, though also Fedora). Armory is basically working in Debian's reproducible build toolchain (unless I'm misremembering what Joseph said), so Python on Debian-based OSes is reproducible. So for Linux users, this is less of an issue. The bigger problem is making reproducible builds for non-Linux platforms. If you're a Windows user, you inherently trust Microsoft, but you may not trust a Python interpreter that you download from the Python website, and you definitely shouldn't trust a Python interpreter that's embedded in a PyInstaller-generated .exe file that a random software vendor (such as us) provides. Python is near-impossible to build for Windows reproducibly, while Go is trivially easy, judging from Tor's Gitian scripts.
Repeating myself: At this point reproducible builds are a red herring distracting us from more important things to work on.

edited: Added "at this point"
This doesn't make sense to me. In addition, I think you're underestimating the importance of reproducible builds. I do not want to be in a position where someone is trying to compromise my machine in the hopes of delivering a backdoor to our users. I assume you don't want to be in that position either.
Repeating myself again: With reproducible builds you will still need to trust a compiler and other software at some point.

Why would anybody with such security concerns not simply run the source? Do you really believe anybody would trust our "reproducible build" more than official Python interpreter versions?
How much safer do things get by reproducibly building the Python interpreter for our builds?
nx.bit - some namecoin stats
nf.bit - shortcut to this forum

somename
Posts: 80
Joined: Mon Sep 15, 2014 3:12 pm
os: windows

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by somename »

If you don't trust Python.org's checksum-ed binaries, why trust their source (and all the Python modules from authors who usually trust Python.org or Linux distro binaries)?

biolizard89
Posts: 2001
Joined: Tue Jun 05, 2012 6:25 am
os: linux

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by biolizard89 »

I would suggest watching Mike Perry and Seth Schoen's talk at CCCongress 2014, which primarily focuses on motivations for reproducible builds.

https://media.ccc.de/browse/congress/20 ... video&t=18

I will not attempt to summarize it in a paragraph (just watch the talk), but I will note that one of their major motivations is that "attackers target a project's users through its developers". Reproducible builds protect both our users and us, by making a strategy of targeting our infrastructure much less effective.
Jeremy Rand, Lead Namecoin Application Engineer
NameID: id/jeremy
DyName: Dynamic DNS update client for .bit domains.

Donations: BTC 1EcUWRa9H6ZuWPkF3BDj6k4k1vCgv41ab8 ; NMC NFqbaS7ReiQ9MBmsowwcDSmp4iDznjmEh5

josephbisch
Posts: 69
Joined: Sun Nov 23, 2014 3:34 pm
os: linux

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by josephbisch »

somename wrote:If you don't trust Python.org's checksum-ed binaries, why trust their source (and all the Python modules from authors who usually trust Python.org or Linux distro binaries)?
There is a chance that people can read the source code, understand what is going on, and identify possible issues. How many people do you think there are in the world who can read the contents of binaries and understand what is going on? I would venture to guess the number is zero in the case of something as large as CPython.

Reproducible builds (if the resulting hashes of the binaries are consistent among all the builders) tell users that either the binaries actually came from the source code, or that all the builders are lying to us in a consistent way. If the latter case isn't considered for now, then reproducible builds allow us to step back and examine the binaries by examining the source code. And since many more people can read source code, there are more eyes on the code to identify possible issues with the code.
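To make the "consistent hashes among all the builders" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the builder names, the `min_builders` threshold, and the helper functions are assumptions, and this is not how Gitian itself stores or checks signatures.

```python
import hashlib
from typing import Dict, Optional


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a binary."""
    return hashlib.sha256(data).hexdigest()


def agreed_digest(reports: Dict[str, str], min_builders: int = 3) -> Optional[str]:
    """Return the digest every builder reported, or None.

    `reports` maps a (hypothetical) builder name to the digest that
    builder published for the binary they built from source. None is
    returned if too few builders reported or if any two disagree.
    """
    if len(reports) < min_builders:
        return None
    digests = set(reports.values())
    return digests.pop() if len(digests) == 1 else None


def verify_download(binary: bytes, reports: Dict[str, str],
                    min_builders: int = 3) -> bool:
    """True only if the builders agree and the download matches them."""
    expected = agreed_digest(reports, min_builders)
    return expected is not None and sha256_hex(binary) == expected
```

If the check passes, then either the binary really came from the source, or every listed builder is lying in the same way. A mismatch does not say who is wrong, only that something needs investigating.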

The whole point is to avoid trusting individual people and groups as much as possible. Just because many people may trust Python binaries from the official website or Linux distros currently doesn't mean that we should not try to progress beyond trusting binaries where possible. Even if you trust the Python Software Foundation to not purposefully add a backdoor to the Python binaries that they publish, they can't guarantee that the build process was not somehow compromised, unless they used reproducible builds. And I do recognize that it may just not be currently possible to make something as large as Python reproducible across all platforms without some effort. So I'm not attacking the Python Software Foundation for not having reproducible builds right now. But where possible, I think reproducibility should be the goal. You mention Linux distro binaries. You may know that Debian is moving along nicely with their reproducible build effort and other distros are looking to follow in Debian's footsteps. So steps are being taken by Debian at least to make the entire archive of packages reproducible. So eventually their Python packages will be reproducible. The tricky issue is building Python reproducibly for platforms other than Linux.

Yes, you do have to trust to some extent, but maybe one day when all open source software (including OSes) is being built reproducibly, that trust will be limited to hardware for users of open source software instead of hardware and software as we do now.

Or maybe reproducible builds will all just be a passing fad. Who knows.

I also recommend the talk from CCCongress 2014.

pmc
Posts: 73
Joined: Thu Oct 03, 2013 8:50 pm
Location: Germany

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by pmc »

IMO reproducible builds of security relevant software make some sense, because less trust is required of whoever publishes pre-built binaries.

There is, however, a point where it stops making sense. That's simply because at some point you have to trust the operating system, the compiler, the libraries, the headers and everything else needed for building. And you have to trust everything that was used to build the OS, the compiler, the libraries and everything else. And so on.

I suppose you know that it's possible to insert a backdoor into a compiler that re-inserts itself into every build of that compiler's (unmodified) source code.

somename
Posts: 80
Joined: Mon Sep 15, 2014 3:12 pm
os: windows

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by somename »

josephbisch wrote: Yes, you do have to trust to some extent, but maybe one day when all open source software (including OSes) is being built reproducibly, that trust will be limited to hardware for users of open source software instead of hardware and software as we do now.

Or maybe reproducible builds will all just be a passing fad. Who knows.

I also recommend the talk from CCCongress 2014.
I understand the value, but considering the complexity of the process, waiting for a quarter or two until Gitian gets improved seems like a very attractive option...

A couple of weeks ago I spent a few hours trying to use Gitian, and it was a nightmare (because the docs are outdated and/or incomplete). I plan to return to it when I have more time, but for now I think that unless there's a no-brainer guide for NMC + Gitian, we'd end up with a reproducible build signed by only 2-3 people (which may still be worth it?).

The vid is good.

biolizard89
Posts: 2001
Joined: Tue Jun 05, 2012 6:25 am
os: linux

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by biolizard89 »

pmc wrote:IMO reproducible builds of security relevant software make some sense, because less trust is required of whoever publishes pre-built binaries.

There is, however, a point where it stops making sense. That's simply because at some point you have to trust the operating system, the compiler, the libraries, the headers and everything else needed for building. And you have to trust everything that was used to build the OS, the compiler, the libraries and everything else. And so on.

I suppose you know that it's possible to insert a backdoor into a compiler that re-inserts itself into every build of that compiler's (unmodified) source code.
You're referring to the Trusting Trust attack. That attack is possible to mitigate, by compiling the compiler with multiple compilers (including proprietary compilers) and seeing if the resulting compilers produce the same binary of the application you're building. This only works if your application can be reproducibly built. While this is a difficult problem, it does not mean that people shouldn't work on it. (There are a number of people working on this task.)
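As a rough illustration of that check, here is a toy model in Python. It is only a sketch under heavy assumptions: "compilers" are modeled as byte strings run by a pretend `run_compiler` function, the trojan marker and vendor names are made up, and real compilers are of course far messier than this.

```python
import hashlib
from typing import Dict

COMPILER_SRC = "source code of the compiler we want to trust"
APP_SRC = "source code of the application"
TROJAN = b"<trojan>"  # made-up marker for a self-propagating backdoor


def honest_output(source: str) -> bytes:
    """What a correct, deterministic compiler emits for `source`."""
    return b"BIN:" + hashlib.sha256(source.encode()).hexdigest().encode()


def run_compiler(compiler_binary: bytes, source: str) -> bytes:
    """Toy model of executing a compiler binary on some source."""
    out = honest_output(source)
    if TROJAN in compiler_binary:
        if source == COMPILER_SRC:
            return out + TROJAN        # re-infect the next compiler
        return out + b"<backdoor>"     # infect everything else
    return out


def build_app_via(bootstrap: bytes) -> str:
    """Build the compiler with `bootstrap`, then the app with the result."""
    stage1 = run_compiler(bootstrap, COMPILER_SRC)
    app = run_compiler(stage1, APP_SRC)
    return hashlib.sha256(app).hexdigest()


def compare_toolchains(bootstraps: Dict[str, bytes]) -> Dict[str, str]:
    """Digest of the final app binary for each bootstrap compiler."""
    return {name: build_app_via(b) for name, b in bootstraps.items()}
```

If one bootstrap compiler is infected and another is not, the application digests differ and the tampering shows up. If every bootstrap compiler carried the same trojan, the digests would still match, which is why using independent compilers matters.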

In addition, trusting just the compiler is better than trusting both the compiler and the person who built the application. The goal here is to raise the costs of compromising the resulting binary; decreasing the number of trusted parties is a perfectly effective way of doing this.
somename wrote:I understand the value, but considering the complexity of the process, waiting for a quarter or two until Gitian gets improved seems like a very attractive option...

A couple of weeks ago I spent a few hours trying to use Gitian, and it was a nightmare (because the docs are outdated and/or incomplete). I plan to return to it when I have more time, but for now I think that unless there's a no-brainer guide for NMC + Gitian, we'd end up with a reproducible build signed by only 2-3 people (which may still be worth it?).

The vid is good.
The Gitian documentation in the Bitcoin Core repo is reasonably complete and up-to-date. Don't use the docs from the Gitian repo, as Bitcoin's usage is a little bit different. If you have any trouble with it, please feel free to ask questions -- we have a number of people here who use Gitian (jbisch, jonasbits, midnightmagic) who may be able to help you. All three of them also hang out on #namecoin-dev, as does Luke-Jr, who also uses Gitian.

somename
Posts: 80
Joined: Mon Sep 15, 2014 3:12 pm
os: windows

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by somename »

biolizard89 wrote: The Gitian documentation in the Bitcoin Core repo is reasonably complete and up-to-date. Don't use the docs from the Gitian repo, as Bitcoin's usage is a little bit different. If you have any trouble with it, please feel free to ask questions -- we have a number of people here who use Gitian (jbisch, jonasbits, midnightmagic) who may be able to help you. All three of them also hang out on #namecoin-dev, as does Luke-Jr, who also uses Gitian.
If there were a complete, correct, and working documented procedure for Namecoin Core, I can't see why anyone would be against the effort.

I think the challenge is in the effort of creating this documentation and keeping it up to date (even Bitcoin, which has a very large community and is the earliest significant adopter of Gitian, isn't quite there yet).

biolizard89
Posts: 2001
Joined: Tue Jun 05, 2012 6:25 am
os: linux

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by biolizard89 »

somename wrote:
biolizard89 wrote: The Gitian documentation in the Bitcoin Core repo is reasonably complete and up-to-date. Don't use the docs from the Gitian repo, as Bitcoin's usage is a little bit different. If you have any trouble with it, please feel free to ask questions -- we have a number of people here who use Gitian (jbisch, jonasbits, midnightmagic) who may be able to help you. All three of them also hang out on #namecoin-dev, as does Luke-Jr, who also uses Gitian.
If there were a complete, correct, and working documented procedure for Namecoin Core, I can't see why anyone would be against the effort.

I think the challenge is in the effort of creating this documentation and keeping it up to date (even Bitcoin, which has a very large community and is the earliest significant adopter of Gitian, isn't quite there yet).
Well, I can tell you that the first time I tried following Bitcoin's instructions, it worked perfectly with no problems. This was on Linux Mint 15, I think. More recently I switched my OS to Qubes, which caused Gitian to break on my install because Qubes does weird things involving virtualization that aren't compatible with Gitian by default. Yesterday I finally got Gitian working on Qubes (thanks to suggestions from Marek Marczykowski and Joseph Bisch), and I will publish exactly what steps I followed soon. But if you're on a typical OS that can run VirtualBox, Bitcoin's instructions should work just fine. Building Namecoin instead of Bitcoin is basically the same procedure; you just have to say "namecoin" instead of "bitcoin" in a few places. If you're one of the few awesome people who use Qubes, then I agree that the documentation sucks. :)

Would it be worthwhile for Namecoin to maintain instructions for building Namecoin Core (which might differ from Bitcoin's instructions in a few strings)? The benefit is that it would be easier for people to copy/paste; the downside is that if Bitcoin updates their docs, we might have to manually resolve merge conflicts. (In many cases I'm qualified to resolve those merge conflicts. Joseph or Jonas or midnightmagic probably would be too, but I don't want to volunteer them for additional work without their consent.)

All that said, the main challenge isn't really documentation. The main challenge that led to this discussion is using only upstream projects that can be made to build reproducibly. C/C++ is pretty good at this (Bitcoin does it, Tor does it, and Jonas and I were able to get libcoin to do it). Go is also pretty good at this in cases where cgo isn't used (although Go 1.5 and higher are less convenient). Python is really, really bad at this (even just cross-compiling non-reproducibly is an absolute nightmare). I hope that Python fixes their code eventually, and Joseph has made a lot of progress on this, but I don't think it's safe to assume that Python will be reproducible soon. As a result, if we want reproducible builds, we cannot use Python (for the moment). Sadly, some of the code that we have (NMControl) is in Python. The reason for this discussion was that there is an alternative implementation (ncdns) which is in Go, and therefore is probably much easier to build reproducibly. Some people who like reproducible builds consider this a reasonably significant argument for using ncdns instead of NMControl.

somename
Posts: 80
Joined: Mon Sep 15, 2014 3:12 pm
os: windows

Re: Security Implications of (Non)Reproducible Middleware Bu

Post by somename »

biolizard89 wrote: Would it be worthwhile for Namecoin to maintain instructions for building Namecoin Core (which might differ from Bitcoin's instructions in a few strings)? The benefit is that it would be easier for people to copy/paste; the downside is that if Bitcoin updates their docs, we might have to manually resolve merge conflicts. (In many cases I'm qualified to resolve those merge conflicts. Joseph or Jonas or midnightmagic probably would be too, but I don't want to volunteer them for additional work without their consent.)

All that said, the main challenge isn't really documentation. The main challenge that led to this discussion is using only upstream projects that can be made to build reproducibly. C/C++ is pretty good at this (Bitcoin does it, Tor does it, and Jonas and I were able to get libcoin to do it). Go is also pretty good at this in cases where cgo isn't used (although Go 1.5 and higher are less convenient). Python is really, really bad at this (even just cross-compiling non-reproducibly is an absolute nightmare). I hope that Python fixes their code eventually, and Joseph has made a lot of progress on this, but I don't think it's safe to assume that Python will be reproducible soon. As a result, if we want reproducible builds, we cannot use Python (for the moment). Sadly, some of the code that we have (NMControl) is in Python. The reason for this discussion was that there is an alternative implementation (ncdns) which is in Go, and therefore is probably much easier to build reproducibly. Some people who like reproducible builds consider this a reasonably significant argument for using ncdns instead of NMControl.
I think it would be better to submit improvements upstream (to Bitcoin Core); then the easier part (adjusting the steps for Namecoin Core) could be documented by a Namecoin non-dev (or dev).

Python: I think I already pasted links to the Python-related bugs that were referred to (they have been resolved/closed). But new ones have since been discovered (https://wiki.debian.org/ReproducibleBui ... rch=Titles).
Some have already been fixed (https://bugs.debian.org/cgi-bin/bugrepo ... bug=759231) and others have existing workarounds (https://wiki.debian.org/ReproducibleBui ... ionNumbers). Perhaps we're only a quarter away from being able to produce deterministic builds on Debian (but not on Ubuntu?).

While searching for Gitian-related stuff I found this: https://www.bountysource.com/issues/841 ... to-bitcoin (a closed issue).
A few weeks back, when trying to build Namecoin Core 0.11, I encountered a related issue (https://github.com/namecoin/namecoin-core/issues/31). Maybe those who use Gitian for both Namecoin and Bitcoin would run into this issue unless they took snapshots or clones of their Gitian environments.
