Rewrite in Rust: paketkoll - Check installed distro files for changes

Vorpal@programming.dev · 6 months ago

The standard library does have some specialisation internally for certain iterator and collection combinations. Not sure if it will optimise that one specifically, but Vec::into_iter().collect::<Vec<_>>() is optimised (it may look silly, but it comes up with functions returning impl Iterator).
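A minimal sketch of where that pattern shows up. The function names are made up for illustration, and whether the allocation is actually reused is an implementation detail of the standard library, not a documented guarantee:

```rust
/// Hypothetical helper: the caller only sees `impl Iterator`, but the
/// concrete type behind it is `std::vec::IntoIter<u32>`.
fn even_numbers(limit: u32) -> impl Iterator<Item = u32> {
    let values: Vec<u32> = (0..limit).filter(|n| n % 2 == 0).collect();
    values.into_iter()
}

/// The caller wants a Vec again, so it collects. This is the
/// "Vec -> into_iter -> collect into Vec" round trip that looks silly
/// when written out, but arises naturally across a function boundary.
fn collect_evens(limit: u32) -> Vec<u32> {
    even_numbers(limit).collect::<Vec<_>>()
}
```

Calling collect_evens(7) yields the even numbers below 7. Nothing in the caller's code needs to change to benefit if the standard library does reuse the buffer.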

Vorpal@programming.dev · 6 months ago

Please, send an email to lwn@lwn.net to report this issue to them, they usually fix things quickly.

Vorpal@programming.dev · edit-2 6 months ago

Sounds interesting! As I don’t know restic, which this is apparently based on, what are the differentiating factors between them? While I’m always on board for a rewrite in Rust in general, I’m curious whether there is anything more to it than that.

EDIT: seems this is already answered in the FAQ, my bad.

Vorpal@programming.dev · 7 months ago

With native code I mean machine code. That is indeed usually produced by C or C++, though there are some other options too, notably Rust and Go both also compile to native machine code rather than some sort of byte code. In contrast Java, C# and Python all compile to various byte code representations (that are usually much higher level and thus easier to figure out).

You could of course also have hand written assembly code, but that is rare these days outside a few specific critical functions like memcpy or media encoders/decoders.

I basically learnt as I went, googling things I needed to figure out. I was goal oriented in this case: I wanted to figure out how some particular drivers worked on a particular laptop so I could implement the same thing on Linux. I had heard of and used Ghidra briefly before (during a capture the flag security competition at university). I didn’t really want to use it here though, to ensure I could be fully in the clear legally. So I focused on tracing instead.

I did in fact write up what I found out. Be warned it is a bit on the vague side and mostly focuses on the results I found. I did plan a followup blog post with more details on the process as well as more things I figured out about the laptop, but never got around to it. In particular I did eventually figure out power monitoring and how to read the fan speed. Here is a link if you are interested to what I did write: https://vorpal.se/posts/2022/aug/21/reverse-engineering-acpi-functionality-on-a-toshiba-z830-ultrabook/

Vorpal@programming.dev · edit-2 7 months ago

The term you are looking for in general is “reverse engineering”. For software in particular you are looking at disassembly, decompilation and various forms of tracing and debugging.

As for particular software: For .NET there is ILSpy that can help you look into how things work. For native code I have used Ghidra in the past.

Native code is a lot more effort to understand. In both cases things like variable names will be gone. Most function names will be missing (even more so for native code). Type names too. For native code the types themselves will be gone, so you will have to look at what is going on and guess if something is a struct or an array. How big is the struct and what are the fields?

Left over debug or logging lines are very valuable in figuring out what something is. Often times you have to go over a piece of disassembly or decompiled code several times as your understanding of it gradually builds.

C++ code with lots of object orientation tends to be easier to figure out the big picture of than C code, as the classes and inheritance provide a more obvious pattern.

Then there is dynamic tracing (running under some sort of debugger or call tracer to see what the software does). I have not had as much success with this.

Note that I’m absolutely an amateur at reverse engineering. I thought it was interesting enough that I wanted to learn it (and I had a small project where it was useful). But I’m mostly a programmer.

I have done a lot of low level programming (C, C++, even a small amount of assembly, in recent times a lot of Rust), and this knowledge helps when reverse engineering. You need to understand how compilers and linkers lower code to machine code in order to have a fighting chance at reversing that.

Also note that there may be legal complications when doing reverse engineering, especially with regard to how you make use of the things you learned. I’m not a lawyer, this is not legal advice, etc. But check out the legal guidelines of Asahi Linux (who are working on reverse engineering M1 Macs to run Linux on them): https://asahilinux.org/copyright/ (scroll down to “reverse engineering policy”).

Now this covers (at a high level) how to figure things out. How you then patch closed source software I have no idea. Haven’t looked into that, as my interest was in figuring out how hardware and drivers worked to make open source software talk to said hardware.

Vorpal@programming.dev · 7 months ago

I have read it, it is a very good book, and the memory ordering and atomics sections are also applicable to C and C++ since all of these languages use the same memory ordering model.

Can strongly recommend it if you want to do any low level concurrency (which I do in my C++ day job). I recommended it to my colleagues too whenever they had occasion to look at such code.

I do wish there was a bit more on more obscure and advanced patterns though. Things like RCU, seqlocks etc basically get an honorable mention in chapter 10.
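For anyone wondering what "the same memory ordering model" means in practice, here is a small release/acquire sketch. The function name is made up for illustration; the C++ equivalent would use std::memory_order_release and std::memory_order_acquire on std::atomic in the same places:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Publishes `value` from one thread and reads it from another.
fn publish_and_read(value: u64) -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(value, Ordering::Relaxed);
            // Release: writes made before this store become visible to any
            // thread that observes `true` through an Acquire load.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: pairs with the Release store in the producer thread.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    // Safe to read with Relaxed: the Acquire load above ordered it.
    let seen = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    seen
}
```

Spinning like this is only for demonstration; real code would normally use a channel, a mutex with a condition variable, or thread parking.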

Vorpal@programming.dev · edit-2 7 months ago

Yes, Sweden really screwed up the first attempt at switching to Gregorian calendar. But there were also multiple countries who switched back and forth a couple of times. Or Switzerland where each administrative region switched separately.

But I think we in Sweden still “win” for worst screw up. Also, there is no good way to handle these dates without specific reference to precise location and which calendar they refer to (timestamps will be ambiguous when switching back to Julian calendar).

Vorpal@programming.dev · 7 months ago

My guess is that the relevant keyword for the choice of OpenSSL is FIPS. rustls doesn’t (or at least didn’t) have that certification, which matters if you are dealing with the US government (directly or indirectly). I believe there is an alternative backend (instead of ring) these days that does have FIPS though.

Vorpal@programming.dev · 7 months ago

Rewrite in Rust: paketkoll - Check installed distro files for changes

Vorpal@programming.dev · 7 months ago

Another aspect is that calling a cli command is way slower than a library function (in general). This is most apparent on short running commands, since the overhead is mostly fixed per command invocation rather than scaling with the amount of work or data.

As such I would at the very least keep those commands out of any hot/fast paths.
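If you want to see the difference on your own machine, something like the following rough sketch works. The helper names are made up, the "library" function is a trivial stand-in, and the program you pass to the subprocess helper is up to you (a no-op command such as `true` on Unix shows the fixed overhead most clearly):

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Stand-in for a library call that does a trivial amount of work.
fn in_process_work(input: u64) -> u64 {
    input.wrapping_mul(31).wrapping_add(7)
}

/// Times `runs` calls to the in-process function.
fn time_in_process(runs: u32) -> Duration {
    let start = Instant::now();
    let mut acc = 0u64;
    for i in 0..runs {
        // black_box keeps the optimiser from removing the loop entirely.
        acc = std::hint::black_box(in_process_work(acc ^ u64::from(i)));
    }
    start.elapsed()
}

/// Times `runs` invocations of an external program, waiting for each to exit.
/// Returns the I/O error if the program cannot be spawned.
fn time_subprocess(program: &str, runs: u32) -> std::io::Result<Duration> {
    let start = Instant::now();
    for _ in 0..runs {
        Command::new(program).status()?;
    }
    Ok(start.elapsed())
}
```

Compare time_in_process(1000) with time_subprocess("true", 1000). The exact numbers depend heavily on the system, so treat them as an order-of-magnitude indication only.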

Vorpal@programming.dev · 8 months ago

That assembly program the author compares to is waay bloated. This guy managed with 105 bytes: https://nathanotterness.com/2021/10/tiny_elf_modernized.html (that is with overlapping part of the code into the ELF header and other similar level shenanigans). ;)

All kidding aside, interesting article.

Vorpal@programming.dev · 8 months ago

The example FileDescriptorPollContext doesn’t really work. What if my runtime uses io-uring instead of polling? Those need very different interfaces to be sound. How do you abstract over that?

Vorpal@programming.dev · 8 months ago

Swedish layout. Not ideal for coding (too many things like curly and square brackets etc are under AltGr, and tilde and backtick are on dead keys).

But switching back and forth as soon as you need to write Swedish (for the letters åäö) is just too much work. And yes, in the Swedish alphabet they are separate letters, not aao with diacritics.

Vorpal@programming.dev · 8 months ago

Two tips that work for me:

After cargo add I sometimes have to run the “restart rust-analyzer” command from the VS Code command palette (exact wording may be off, I’m on my phone as of writing this comment). Much faster than cargo build.
Consider using sccache to speed up rebuilds. It helps a lot, though it uses a bit of disk space. But disk space is cheap nowadays (as long as you aren’t stuck with a laptop with a soldered SSD, in which case you know what not to buy next time).

Vorpal@programming.dev · 8 months ago

Thanks for the clear and detailed explanation!

Vorpal@programming.dev · 8 months ago

Looks cool. Absolutely not my area of knowledge let alone expertise. But I thought digital colour stuff was all about ICC profiles (that basically describe how wrong a device handles colour and how to correct for it).

I don’t see any mention of ICC profiles in the docs though? Or is this the lower building block which you would use to work with data from ICC profiles? Basically I think I’m asking: who would use this crate and for what? Image viewers/editors?

Vorpal@programming.dev · 8 months ago

Unlocking Rust's power through mentorship and knowledge spreading, with Tim McNamara :: Rustacean Station

Vorpal@programming.dev · 8 months ago

I don’t feel like rust compile times are that bad, but I’m coming from C++ where the compile times are similar or even worse. (With gcc at work a full debug build takes 40 minutes, with clang it is down to about 17.)

Rust isn’t an interpreted or byte code compiled language, and as such it is hard to compete with that. But that is comparing apples and oranges really. Better to compare with other languages that compile to machine code. C and C++ come to mind, though there are of course others that I have less experience with (Fortran, Ada, Haskell, Go, Zig, …). Rust is on par with or faster than C++ but much slower than C for sure. Both Rust and C++ have way more features than C, so this is to be expected. And of course it also depends on what you do in your code (template heavy C++ is much slower to compile than C-like C++, similarly in Rust it depends on what you use).

That said: should we still strive to optimise the build times? Yes, of course. But please put the situation into the proper perspective and don’t compare to Python (there was a quote by a python developer in the article).

Vorpal@programming.dev · 9 months ago

It all depends on what part you want to work with. But some understanding of the close to hardware aspects of rust wouldn’t hurt, comes in handy for debugging and optimising.

But I say that as someone who has a background (and job) in hard realtime C++ (writing control software for industrial vehicles). We recently did our first Rust project as a test at work though! I hope there will be more. But the question then becomes how to teach 200+ devs (over time, gradually presumably). For now it is just like 3 of us who know Rust and are pushing for this and a few more that are interested.

Vorpal@programming.dev · 9 months ago

I would indeed consider Go a bigger language, because I do indeed think in terms of the size of the runtime.

But your way of defining it also makes sense. Though in those terms I have no idea if Go is smaller or not (as I don’t know Go).

But Rust is still a small language by this definition, compared to for example C++ (which my day job still involves to a large extent). It is also much smaller than Python (much smaller standard library to learn). Definitely smaller than Haskell. Smaller than C I would argue (since there are fewer footguns to keep in mind), though C has a smaller standard library to learn.

What other languages do I know… Erlang, hm again the standard library is pretty big, so rust is smaller or similar size I would argue. Shell script? Well arguably all the Unix commands are the standard library, so that would make shell script pretty big.

So yeah, rust is still a pretty small language compared to all other languages I know. Unsafe rust probably isn’t, but I have yet to need to write any (except one line to work around AsRawFd vs AsFd mismatch between two libraries).
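For context, bridging that kind of mismatch typically looks something like the sketch below (Unix only). This is not the actual line from my code; the two "library" functions are hypothetical stand-ins, and only std::os::fd is real API here:

```rust
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, RawFd};

/// Hypothetical stand-in for a library that only hands out raw descriptors.
fn raw_fd_from_old_library(source: &impl AsRawFd) -> RawFd {
    source.as_raw_fd()
}

/// Hypothetical stand-in for a library that expects the newer `AsFd` style.
fn new_library_wants_fd(fd: impl AsFd) -> RawFd {
    fd.as_fd().as_raw_fd()
}

/// Glue between the two: the single unsafe line is the `borrow_raw` call.
fn bridge(source: &impl AsRawFd) -> RawFd {
    let raw = raw_fd_from_old_library(source);
    // SAFETY: `source` is borrowed for the whole function, so the descriptor
    // stays open for as long as `borrowed` is in use.
    let borrowed = unsafe { BorrowedFd::borrow_raw(raw) };
    new_library_wants_fd(borrowed)
}
```

The unsafe block is needed because a bare RawFd carries no lifetime, so the compiler cannot check that the descriptor outlives the borrow.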

Vorpal@programming.dev · 9 months ago

can have a nontrivial (or “thick”) runtime and doesn’t need to limit itself to “zero-cost” abstractions.

Wouldn’t that be a bigger rust rather than a smaller one?

Not an area I’m particularly interested in, given that I do embedded and hard realtime development. Rust is the best language for that now, I just wish allocations were fallible as well. And that the storage/allocator API was stabilised.
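To be fair, stable std does have a few fallible entry points today, such as Vec::try_reserve and Vec::try_reserve_exact, but most collection methods (push, collect, Box::new and so on) still abort on allocation failure. A minimal sketch with made-up function names:

```rust
use std::collections::TryReserveError;

/// Copies `src` into a fresh Vec, reporting allocation failure as an error
/// instead of aborting the process.
fn try_copy(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut out = Vec::new();
    out.try_reserve_exact(src.len())?;
    // Capacity is already reserved, so this does not need to allocate again.
    out.extend_from_slice(src);
    Ok(out)
}

/// Asks for an impossible amount of memory to show the error path.
fn try_huge() -> Result<Vec<u8>, TryReserveError> {
    let mut out: Vec<u8> = Vec::new();
    out.try_reserve(usize::MAX)?;
    Ok(out)
}
```

This only helps when every allocation on the path goes through such a method, which is exactly why it falls short of what hard realtime code would want.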

Vorpal@programming.dev · 11 months ago

Here are some I found and used in my own code:

itertools
regex
anyhow and thiserror (error handling)
indoc (indented/formatted multi line string literals)
strum (various derive macros for enums)
petgraph (for working with general graphs)
winnow is a great (and fast) parser combinator library.
bpaf, clap and xflags are three different command line argument parser libraries. Which one to use depends on the needs of the project and if you need to match the behaviour of an existing non-rust program (as I needed to in one case)

Vorpal@programming.dev · 1 year ago

Rewrite in Rust: paketkoll - Check installed distro files for changes


Unlocking Rust's power through mentorship and knowledge spreading, with Tim McNamara :: Rustacean Station


New features on lib.rs
