I have forked a project’s source code on GitHub. The program takes a private key as an input and that key must never leave the client. If I want to share a pre-built executable as a release it is essential that I can prove beyond reasonable doubt that it is built from the published source.
I have learned how to publish releases using a workflow in GitHub Actions, such that GitHub itself builds the project and then prepares a release draft with the built files as well as the file hashes…
However, I noticed that the release is first drafted, and at that point I have the option to manually swap the executable and the hashes. As far as I can tell, a user will not be able to tell if I swapped a file and its corresponding hashes. Or, is there a way to tell?
One potential solution that I have found is that I can pipe the output of the hashing both to a file that is stored and also to the publicly visible logs by using “tee”. This will make it such that someone can look through the logs of the build process and confirm that the hashes match the hashes published in the release.
Like this:
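A minimal sketch of the idea, written in Python rather than as the actual workflow step (which would be a shell command piped through “tee”). The file names `wallet.bin` and `SHA256SUMS` are placeholders, not the project’s real ones, and the demo hashes a throwaway file instead of a real build output.

```python
import hashlib
import sys
import tempfile
from pathlib import Path


def hash_and_tee(artifact: Path, checksum_file: Path) -> str:
    """Hash `artifact`, echo the result to stdout, and save it to `checksum_file`."""
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    line = f"{digest}  {artifact.name}\n"  # same layout as sha256sum output
    sys.stdout.write(line)  # stdout ends up in the CI job log
    checksum_file.write_text(line)  # stored copy, to be attached to the release
    return digest


# Demo with a throwaway file standing in for the built executable.
with tempfile.TemporaryDirectory() as tmp:
    exe = Path(tmp) / "wallet.bin"  # hypothetical artifact name
    exe.write_bytes(b"abc")
    sums = Path(tmp) / "SHA256SUMS"  # hypothetical checksum file name
    logged = hash_and_tee(exe, sums)
    stored = sums.read_text().split()[0]
```

The point is only that the same digest goes to two places at once: the log, which I cannot edit by hand in the release draft, and the checksum file that ships with the release.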
I would like to know whether:
- There is already some built-in method to confirm that a file is the product of a GitHub workflow
- The GitHub Actions logs can easily be tampered with by the repo owner, and the hashes in the logs can be swapped, such that my approach is still not good enough evidence
- There is another, perhaps more standard, method to prove that the executable is built from a specific source code.
I don’t know whether GitHub Actions output can be tampered with by you, but the only actually reliable way (that I know of) to prove that your binaries correspond to a certain state of the source code is to support reproducible builds (see e.g. https://reproducible-builds.org/).
All other methods require trust (in either the developer or w.r.t. github actions towards github).
The drawback is of course, that to verify whether your binaries are good, someone needs to rebuild the software, but it is a good tool to build and maintain trust in your signed binaries, especially if they deal with sensitive information like private keys.
An important point to add for someone who hasn’t heard of reproducible builds before: The key difference to a normal build process is that it is 100% deterministic i.e. it produces exactly the same output every time.
You might think that most build processes would be like this by default, but this is not the case. Compilers and linkers often embed non-deterministic values, such as timestamps, in the final binary. For a build to be deterministic, these sources of variation must be disabled or pinned to a repeatable value (i.e. not based on the actual compile time).
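To make the timestamp problem concrete, here is a toy sketch in Python. It is not a real compiler: `build_artifact` is a made-up stand-in that gzip-compresses the source, because the gzip header happens to embed a modification time, just as many build tools embed a build time. Many real toolchains let you pin that value through the SOURCE_DATE_EPOCH environment variable described on reproducible-builds.org; here it is simply passed as a parameter.

```python
import gzip
import hashlib


def build_artifact(source: bytes, build_time: int) -> bytes:
    """Toy 'build': compress the source, embedding build_time in the gzip header."""
    return gzip.compress(source, mtime=build_time)


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


source = b"print('hello')\n"

# Two builds of the same source, one minute apart: the outputs differ.
build_a = build_artifact(source, build_time=1_700_000_000)
build_b = build_artifact(source, build_time=1_700_000_060)
differs = digest(build_a) != digest(build_b)

# Two builds with the timestamp pinned to a fixed value: identical outputs.
pinned_a = build_artifact(source, build_time=0)
pinned_b = build_artifact(source, build_time=0)
identical = digest(pinned_a) == digest(pinned_b)

print("timestamped builds differ:", differs)
print("pinned builds identical:", identical)
```

Both artifacts decompress to exactly the same source; only the embedded time changes the hash, which is why an honest rebuild of a non-reproducible project will usually not match the published checksum.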
True, while I think the page that I linked explains the concept well, it might not be easy to digest for someone who is new to software development.
But then again, if you handle cryptographic material, you’d better learn fast 😃
Yeah that site is pretty good. There’s a lot of information though. I think a good starting point is maybe this page: https://reproducible-builds.org/docs/env-variations/
Yeah, this topic would actually lend itself to an intro video which demonstrates the problem on a tiny project.
Unfortunately, given how hard reproducible builds are, they aren’t done much and aren’t talked about much. A vicious cycle. A nice short video would indeed be helpful for understanding and awareness.
All other methods require trust (in either the developer or w.r.t. github actions towards github).
Hopefully some day I will be able to create reproducible builds independently of github. But I am thinking that their workflows are reproducible builds, correct? So, anyone should be able to fork the project and run the workflow and it will build the program in the same way. I am O.K with the user needing to trust GitHub on this - it really is me who I worry about. I don’t want to tell someone that they have to trust me. I want to be able to remove blind trust from my own personal contribution. The program itself is built on top of many dependencies, so the user is also implicitly trusting a large amount of maintainers.
The drawback is of course, that to verify whether your binaries are good, someone needs to rebuild the software, but it is a good tool to build and maintain trust in your signed binaries, especially if they deal with sensitive information like private keys.
In my specific scenario I’m forking a community project (a crypto wallet) that the maintainers no longer want to maintain nor share PR access to. I’m adding a patch to fix some broken hard-coded endpoints. So what I want to be able to do is to transparently say “Here is my very simple commit that you can read, and here is the executable in case you want to download the fixed wallet but are not technically savvy enough to build it”. I don’t have any reputation in this community, nor do I share my identity. I would prefer to be able to remove the element of trust. Asking trusted members of the community to build from source and verify the checksums would be nice, but I don’t think it is such a simple thing to ask in this case.
(My instance won’t fetch content from lemmy.world, I’m not sure why… That’s why I switched to this account)
But I am thinking that their workflows are reproducible builds, correct?
A reproducible build is more than an automated build. It is a build process which enables any third party to build a binary that is bit-by-bit identical (see https://reproducible-builds.org/docs/definition/).
So if I would build a specific release/commit of your application on my PC (given an identical development environment, i.e. same version dependencies, compiler, etc.) it MUST result in a bit-by-bit identical binary to the one you built on your development machine and the one the github workflows built.
All these binaries would result in the same hash (and thus be verifiable by the same signature files).
“Here is my very simple commit that you can read, and here is the executable in case you want to download the fixed wallet but are not technically savvy enough to build it”
Other than a signed binary from a trusted developer/organization, there is (IMHO) no way for a non-tech savvy user to gauge the trustworthiness of a binary they download from the internet, and even then a signing key might have been lost or broken (see the recent Microsoft debacle w.r.t. AD signing key misuse).
Thanks a lot. I have been evangelized by you and the other commenters. I see now that reproducible builds are the solution.
I now understand better the value of reproducible builds, and the more I think about it the more I realize that it is very bad that something as sensitive as a crypto wallet executable that does not follow the reproducible build standard has been going around. I do trust that the devs are not being malicious, but it is essential to have a good way to verify. Even the original github workflow is failing to build now, and new flags need to be passed to npm while building due to some openssl changes, so I’m not sure that anyone can actually reproduce the build today and get the same hash.
I’ll read more about how to do it properly, and I’ll try to create a Reproducible Build fork if I can actually pull it off.
Reproducible builds, and then multiple parties to confirm the build. So a reproducible build, plus F-Droid building the product as well, would allow people to have confidence that they have the right thing. But if people are truly concerned about security, they should build it from the source directly and then verify the result against your reproducible build.
Thanks! I am convinced now, I will learn how to create reproducible builds.
My worry is that the build is run through npm, and I think that the dependencies rely on additional dependencies such as openssl libraries. I worry that it will be a lot of work to figure out what every npm dependency is, what libraries they depend on, and how to make sure that the correct versions can be installed and linked by someone trying to reproduce the build 10 years from now. So it looks like a difficult project, but I will read more about it and hopefully it is not as complicated as it looks!
There are some excellent blogs, articles and books on it. Basically your entire build chain has to be tooled for reproducibility, so things like Rust are very good as a foundation.
Ooh, I did not know this was one of the properties of Rust.
There’s a paper from like 30 years ago about how you can never verify an executable because you don’t know that your compiler isn’t doing something nefarious. And if you do know that somehow, you don’t know it about its compiler, and so on. Scary stuff.
Ooh, I think I found the paper!
Oof:
The actual bug I planted in the compiler would match code in the UNIX “login” command. The replacement code would miscompile the login command so that it would accept either the intended encrypted password or a particular known password. Thus if this code were installed in binary and the binary were used to compile the login command, I could log into that system as any user.
That’s it! Thanks!
I think you can even upload release files manually, independently of if you use actions or not, so it can never be guaranteed that it was built from the sources.
The only way to verify this may be to build it again and see if the result matches the published bins, but if the project does not do reproducible builds, then it may not match even if it was genuine.
I think you can even upload release files manually, independently of if you use actions or not, so it can never be guaranteed that it was built from the sources.
True, but that’s why my current idea is the following:
As part of the workflow, GitHub will build the executable, compute a few different hashes (sha256sum, md5, etc…), and those hashes will be printed out in the GitHub logs. In that same workflow, GitHub will upload the files directly to the release.
So, if someone downloads the executable, they can compute the sha256sum and check that it matches the sha256 that was computed by github during the action.
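On the user’s side, that check could look roughly like the following Python sketch. The function names and the demo file are made up for illustration; the expected digest would be copied by hand from the workflow log.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_logged_hash(path: Path, logged_hex: str) -> bool:
    """Compare the local file's digest with the hex digest copied from the log."""
    expected = logged_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)


# Demo: a throwaway file stands in for the downloaded executable.
with tempfile.TemporaryDirectory() as tmp:
    download = Path(tmp) / "wallet.bin"  # hypothetical file name
    download.write_bytes(b"abc")
    good = matches_logged_hash(
        download,
        "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD",
    )
    bad = matches_logged_hash(download, "00" * 32)

print("matches logged hash:", good)
print("matches a wrong hash:", bad)
```

This only shows that the download equals whatever was hashed in the workflow; it says nothing yet about what the workflow actually built.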
Is this enough to prove that the executable they are downloading is the same executable that GitHub built during that workflow? Since a workflow is associated with a specific push, it is possible to check the source code that was used for that workflow.
In this case, I think that the only one with the authority to fake the logs or mess with the source during the build process would be GitHub, and it would be really hard for them to do it because they would need to prepare in advance specifically for me. Once the workflow goes through, I can save the hashes too and after that both GitHub and I would need to conspire to trick the users.
So, I am trying to understand whether my idea is flawed and there is a way to fake the hashes in the logs, or if I am over-complicating things and there is already a mechanism in place to guarantee a build.
As long as maintainers can upload arbitrary files to a release, this is not enough, I think. There is no distinction between release files coming from the build process, and release files just uploaded by the maintainer.
But, if during GitHub’s build process the sha256sum of the output binary is printed, and the hash matches what is in the release, isn’t this enough to demonstrate that the binary in the release is the binary built during the workflow?
Well, kind of.
If the printed hash matches the one published in the release, and it also matches the hash of the release files, then it guarantees that the release files were produced by the GitHub build process. However, it’s very involved to verify that the released hash is the same one that was printed by the build process. This could probably be solved by having GitHub sign all release builds with their own keys. Since signing keypairs rarely change, this could be an easier way to verify.
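Spelled out, that is a three-way comparison: the digest of the file you downloaded, the digest in the published checksum file, and the digest that appears in the build log. A rough Python sketch, assuming the checksum file uses the sha256sum layout and that you have saved the log text yourself (all names and the sample log line here are hypothetical):

```python
import hashlib


def parse_checksums(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines ('<hex>  <name>') into {name: hex}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            hex_digest, name = parts
            entries[name.strip().lstrip("*")] = hex_digest.lower()
    return entries


def three_way_check(artifact: bytes, name: str, checksums_text: str, log_text: str) -> bool:
    """True only if file, published checksum, and build log all agree."""
    actual = hashlib.sha256(artifact).hexdigest()
    published = parse_checksums(checksums_text).get(name)
    printed_in_log = actual in log_text.lower()
    return published == actual and printed_in_log


# Demo data standing in for a real download, checksum file, and saved log.
artifact = b"abc"
abc_hash = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
checksums = f"{abc_hash}  wallet.bin\n"
log = f"build step output\n{abc_hash}  wallet.bin\n"

ok = three_way_check(artifact, "wallet.bin", checksums, log)
swapped = three_way_check(b"tampered", "wallet.bin", checksums, log)
print("all three agree:", ok)
print("tampered file passes:", swapped)
```

The involved part is not this comparison but getting a trustworthy copy of the log in the first place, which is where a signature from GitHub would help.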
This would verify that the binary was built during the github actions workflow, but only that. Unfortunately, there is much more to it.
First, in the build process, GitHub will use whatever build scripts and instructions the repo maintainer has specified in the GitHub Actions files. The purpose of one of the build scripts may be only to throw away the checked-out sources and download a different set from a different place. Or to add just one more dependency, or just one file, that will compromise the software. However, if you have verified yourself that the build scripts only work with reputable sources of dependencies, the repository in question, and other repositories of the maintainer that you have also inspected, then it’s not really a problem, probably.
But then there is also the question if you trust github (and because of that microsoft, but also the USA because of laws) with always building from the sources, and adding nothing more.
But then there is also the question if you trust github (and because of that microsoft, but also the USA because of laws) with always building from the sources, and adding nothing more.
Yesterday I would have said ‘blah, they would not care about my particular small project’. But since then I read the paper recommended by a user in this post, about building a compromised compiler that installs a back-door in a login program. I now think it is not so crazy to think that intelligence agencies might collude with Microsoft to insert specific back-doors that somehow allow them to break privacy-related protocols or even recover private keys. Many of these might rely on a specific fundamental principle, and so this could be recognized and exploited by a compiler. I came here for a practical answer to a simple practical situation, but I have learned a lot extra 😁
blah, they would not care about my particular small project
I think there is more to this. Maybe you are targeted because you(r project) reach someone else (the actual target, who you may not even know), but I could also imagine it happening like data mining in the past years: they are not after me or you, they are after everyone and anyone they can reach.
You might be able to adapt OpenBSD’s signify software for your purpose.
If I understand this correctly, signify would allow someone to verify that the executable was built by me. But then they would still have to trust me, because I can also sign the malicious executable.
I think that any step that facilitates verifying the build is great. If trust is required, then I should simply not release any executables if I want to remain anonymous. I would like to be able to release executables without needing to ask people to blindly trust me. I would like to be able to show them reasonably good evidence that the program is built from the source that I say it is.
Yes, there is no avoiding that. But it’s a way of saying that the executable was built by you.
Thanks. In the future I will follow the Reproducible Builds practices and use OpenBSD’s signify to sign my builds.
In the immediate situation I want to know whether there is a way to use GitHub as my trusted third-party builder. I would like to share something with people, some of whom might not have the skills to replicate the build themselves, but I would still like to be able to point them to something that is easy to understand and give them an argument.
My current argument is: “See, in the GitHub logs you can see that GitHub generated that hash internally during the workflow, and it matches the hash of the file that you have downloaded. So this way you can be sure that this build really comes from this source code, which was only changed here and there”. Of course I need to make absolutely sure that my argument is solid. I know that I’m not being malicious, but I don’t want to give them an argument of trust and then find out that I have misled them, and that it was in fact possible to fake this.
The project would have to support reproducible builds somehow. For example, supply a Makefile and a hash of the generated executable.
Check out Sigstore and other pieces of the SLSA specification.
What’s your concern? If there was a lawsuit, I believe in discovery they’d find you didn’t modify the release on GitHub, right?
No, I’m not concerned about a lawsuit. It’s something that I want to do because I think that it is important. If I want to share tools with non-tech savvy people who are unable to build them from source, I want to be able to share these without anyone needing to “trust” me. The reproducible builds standards are a very nice idea, and I will learn how to implement them.
But I still wonder whether my approach is valid or not - is printing the hash of the output executable during Github’s build process, such that it is visible in the workflow logs, very strong evidence that the executable in the release with the same hash was built by github through the transparent build process? Or is there a way a regular user would be able to fake these logs?
Okay, I see your point now. I don’t know enough about low level GitHub Actions stuff to answer.
As far as I’m aware, there is no way to fully know there wasn’t any tampering or swapping of executables that were produced by a workflow. As with most things on the internet, I believe there needs to be a degree of trust towards the original author and original owner of the repository that what they published is indeed an executable built from the original source. If there is any doubt about this, the only verifiable way to know for sure is for a potential user to build from source themselves.
I can think of schemes where a trusted third party signs the built executable with its private key, after which anyone can use the matching public key to check whether it is still the same executable, especially if a different key pair is used for every signing operation. But there are still flaws there, and it would ultimately still rely on a degree of trust in the third party.
Let’s say that I do trust GitHub as the third party. Is it possible to ask GitHub itself to sign the executable with a specific key created for a given workflow, and that only GitHub owns? Maybe it already signs it. I’ll look into it.
GitHub doesn’t do any signing at all, nor do they really care about the actual output of actions, pipelines or manual releases (all of that is outside their scope of interest).
If there’s any means of a ‘secret store’ for the build actions then you could store a keypair for signing the binaries as far as your target binary format and platforms support it (or go for something like a detached gpg-signature that can be stored with the build or in a central ‘trusted’ repository so the binary can be verified against it later).
Your users, however, would still have no easy means to verify that signature on most platforms unless they are tech-savvy. (macOS code signing / notarization and the Gatekeeper check would be an example of a platform that notifies users and even refuses to run the binary if it was tampered with.)
Besides the obvious of telling your users to build the exe, have you considered alternative distribution methods like docker?
How does a docker distribution solve this problem? Is it because the build instructions are automated by the Dockerfile?
When you make a docker image and push it to dockerhub all of the instructions it took appear there so it’s very transparent, also super easy for any person to build it themselves unlike executables, just download the Dockerfile and run a single command
Ah. Cool. I was under the impression that docker images suffered from a similar issue - that one can’t verify that the image is built from the source. I’m happy to be mistaken about that.
You could definitely do clever things to obfuscate what you’re doing, but it’s much easier to replicate building the image, as there are no external dependencies: if you have Docker installed, you can build any Docker image.