If you have been watching the news during the last 2-3 years, you might occasionally have heard about reproducible builds.
What are reproducible builds?
The reproducible-builds project says:
“Reproducible builds” aim to provide a verifiable path from software source code to its compiled binary form.
In its simplest form, this means that you can build something twice and get bit-identical results. The builds should be able to happen on different machines and at different times. Given that computers are deterministic machines, that does not seem too hard. However, some build processes do nondeterministic things, such as storing the build time or random numbers in their binary, or processing directory entries in the nondeterministic order returned by the filesystem. I collected ten different causes in theunreproduciblepackage.
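To make two of these causes concrete, here is a minimal Python sketch of a packaging step that avoids them. It is an illustration only, not how rpm or OBS work internally. The helper names `source_date_epoch` and `build_archive` are made up for this example, and reading the timestamp from the `SOURCE_DATE_EPOCH` environment variable follows the convention specified by the reproducible-builds project, which is not otherwise discussed in this post.

```python
import io
import os
import tarfile


def source_date_epoch(default: int = 0) -> int:
    """Return the timestamp to embed, taken from SOURCE_DATE_EPOCH if set."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def build_archive(src_dir: str, epoch: int) -> bytes:
    """Pack the regular files in src_dir into a tar with fixed metadata.

    Hypothetical helper for illustration; it handles flat directories only.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # os.listdir() returns entries in filesystem order, so sort them.
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            info = tarfile.TarInfo(name=name)
            info.size = os.path.getsize(path)
            info.mtime = epoch   # fixed timestamp instead of "now"
            info.mode = 0o644    # fixed permissions
            # uid, gid, uname and gname keep the neutral TarInfo defaults.
            with open(path, "rb") as f:
                tar.addfile(info, f)
    return buf.getvalue()
```

Calling `build_archive(some_dir, source_date_epoch())` twice, even on different machines, should then produce the same bytes as long as the file contents are the same. Replacing `epoch` with the current time or dropping the `sorted()` call is enough to lose that property.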
Why are reproducible builds important?
There is a large number of benefits. One is that it becomes easier to detect hacked build machines that insert backdoors into build results. But while that may be important, it is hard to quantify.
I have heard that governments are starting to ask their IT suppliers for reproducible builds of their software. Consequently, those suppliers need to work with their OS vendors to fulfill such requests.
On the practical side, when building packages in OBS, it helps to have them reproducible because of automatic rebuilds. For example, when someone fixes a small typo in a library, OBS rebuilds the packages that depend on it. If those rebuilds give the same results as before, OBS will notice. It will then neither push a new version to the mirrors nor trigger rebuilds of the packages depending on that one. This can save quite a lot of build power and bandwidth.
Additionally, delta-rpms (which are used in maintenance updates) will often be more compact, because they contain only the intended changes and no unrelated random changes.
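The effect of this on the rebuild cascade can be sketched in a few lines of Python. This is a simplified model, not OBS code: the function `rebuild_cascade`, the `reverse_deps` mapping and the `output_changes` callback are all hypothetical names chosen for this example.

```python
from collections import deque


def rebuild_cascade(start, reverse_deps, output_changes):
    """Return the packages that get rebuilt after `start` has changed.

    reverse_deps:   dict mapping a package to the packages depending on it
    output_changes: callable; True if rebuilding a package yields a
                    different result than its previous build
    """
    rebuilt = []
    seen = {start}
    queue = deque([start])
    while queue:
        pkg = queue.popleft()
        for dependent in reverse_deps.get(pkg, []):
            if dependent in seen:
                continue
            seen.add(dependent)
            rebuilt.append(dependent)
            # Only a changed result is published and triggers more rebuilds.
            if output_changes(dependent):
                queue.append(dependent)
    return rebuilt


# Made-up dependency graph: two apps use lib, and tool uses app1.
deps = {"lib": ["app1", "app2"], "app1": ["tool"]}
reproducible = rebuild_cascade("lib", deps, lambda pkg: False)
unreproducible = rebuild_cascade("lib", deps, lambda pkg: True)
```

In this toy graph, the reproducible case stops after rebuilding `app1` and `app2`, while the unreproducible case also rebuilds `tool`. In a real distribution the dependency graph is much deeper, so the difference grows accordingly.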
How are openSUSE and SLE doing reproducible builds?
Work happens as part of the wider cross-distribution reproducible-builds project. I am continuously testing and fixing openSUSE Tumbleweed packages using custom scripts. This involves working with dozens of different upstream projects. Since this has been going on for two years now, many important patches have also found their way into the SLE-15 codebase.
What is the progress status?
Currently 263 of 10650 Factory packages have major reproducibility issues. Another 305 have minor issues that are already ignored by build-compare. So those are good from the OBS point of view, but they do not give perfect bit-identical results yet. There are also 31 packages that I found will fail to build 15 years from now. This timescale is relevant for maintaining an enterprise distribution, because many users pick up new versions slowly and are reluctant to do major upgrades.
However, packages built in Tumbleweed are not fully reproducible yet. This is because they contain the build time and build hostname in their rpm headers, probably because people feel that they want to know. If there is a problem with a certain build host, this information makes it easier to track down the issue. But if it were always possible to generate identical results on any machine at any time, this would not be needed. So the distribution first needs to be mostly reproducible before it can be fully reproducible.
When I started counting these stats in January 2017, there were 840 packages with major issues and 1546 packages with minor issues, so my work has already improved the situation a lot. However, I expect future progress to be slower, because all the easy fixes are done. Solving each of the remaining issues will take more effort. Maybe it will be in time for SLE-16…
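To put these numbers in proportion, the following short Python snippet computes the current shares and the reduction since January 2017. It only uses the figures quoted above. The total package count for January 2017 is not given here, so no shares are computed for that date, and the variable names are my own.

```python
total_now = 10650
major_now, minor_now = 263, 305
major_2017, minor_2017 = 840, 1546

# Share of today's Factory packages affected by each class of issue.
major_share = 100 * major_now / total_now
minor_share = 100 * minor_now / total_now

# Relative reduction in the number of affected packages since January 2017.
major_reduction = 100 * (major_2017 - major_now) / major_2017
minor_reduction = 100 * (minor_2017 - minor_now) / minor_2017

print(f"major issues: {major_share:.1f}% of packages, down {major_reduction:.1f}%")
print(f"minor issues: {minor_share:.1f}% of packages, down {minor_reduction:.1f}%")
```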
Appendix: Reproducibility stats table for openSUSE Tumbleweed: