It’s a good thing that real open source models are getting good enough to compete with or exceed OpenAI.
I like the game, but agree with the over-tutorialed complaints. They have two difficulty modes, I wish only story mode got all the handholding. I think there’s enough obvious indicators to get you through all the game mechanics.
It has been on my list to figure out how to move to forgejo, need to do it soon before the migration process breaks or gets awful.
surely he’ll be less of a twat then. right?
Donnie Darko - Just such a great, strange movie
I guess it wasn’t bacon I hate for breakfast yesterday.
Why do you hate bacon, are you a windmill?
Things I will bet money on
It’s not a cinematic masterpiece, but it had a distinctive look and vibe, a cool soundtrack, and an interestingly strange plot. I saw it again a few years ago and remembered why I liked it as an angsty teen.
Really love arch and the AUR. I’ve been tempted to get nix set up for the rare cases when there’s no AUR package or the AUR package is unmaintained. I figure if there’s no package in the AUR or nixpkgs, it’s probably not worth running.
I’m a Unity noob and even more of a noob in Godot, but the C# development experience is so much better in Godot it’s ridiculous.
I remember when, what was it, like 6 years ago, Unity announced moving towards .NET Core. I can appreciate that’s a large effort, but they’ve made ridiculously little progress that I can see.
btop reports some gpu, network and disk information that I don’t think shows up in htop, feels a bit more comprehensive maybe? Both are fine, but I too use btop, it’s nice.
Random trivia: I think btop has been rewritten like 3-5 times now? It’s sort of an inside joke to the point that someone suggested another rewrite from C++ to Rust ( https://github.com/aristocratos/btop/issues/5 ). I guess the guy just likes writing system monitoring console apps.
Easiest shorting money I ever made.
It’s not uncommon on sensitive stories like this for the government to loop in journalists ahead of time so they can pull together background and research, with an agreed-upon embargo until some point in the future.
This wasn’t the US government telling the newspaper they couldn’t report on a story they had uncovered from their own investigation.
I guess this solves part of the mystery about why the French rioted when they raised the retirement age last year
There’s quantization, which basically compresses the model to use a smaller data type for each weight. It reduces memory requirements by half or even more.
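A rough sketch of the idea in plain numpy, not any particular library’s implementation (real quantizers like GPTQ, AWQ, or llama.cpp’s k-quants are a lot more sophisticated):

```python
import numpy as np

# Toy symmetric int8 quantization of a single weight matrix.
weights = np.random.randn(4096, 4096).astype(np.float32)  # ~64 MB at fp32

scale = np.abs(weights).max() / 127.0                      # one scale for the whole tensor
q_weights = np.round(weights / scale).astype(np.int8)      # ~16 MB, 4x smaller

# At inference time the weights get dequantized (or used directly in int8 kernels).
restored = q_weights.astype(np.float32) * scale
print("worst-case error:", np.abs(weights - restored).max())
```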
There’s also airllm, which loads a part of the model into RAM, runs those calculations, unloads that part, loads the next part, and so on. It’s a nice option, but the performance of all that loading/unloading is never going to be great, especially on a huge model like llama 405b.
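The basic trick looks roughly like this. To be clear, this is just the concept, not airllm’s actual API, and load_layer here is a made-up placeholder for “read one layer’s weights off disk”:

```python
import gc

def generate_streamed(hidden_states, num_layers, load_layer):
    # Only one transformer layer's weights are ever in memory at a time.
    for i in range(num_layers):
        layer = load_layer(i)              # pull layer i's weights from disk
        hidden_states = layer(hidden_states)
        del layer                          # drop the weights before loading the next layer
        gc.collect()
    return hidden_states
```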
Then there are some neat projects to distribute models across multiple computers like exo and petals. They’re more targeted at a p2p-style random collection of computers. I’ve run petals in a small cluster and it works reasonably well.
Is this the new “Simpsons already did it”?
Cunk already did it…
(3:40 if you want to get right to it) https://www.youtube.com/watch?v=UoSUx1xyj1E
First, a caveat/warning: you’ll need a beefy GPU to run larger models, though there are some smaller models that perform pretty well.
Adding a medium amount of extra information for you or anyone else who might want to get into running models locally.
Tools
Models
If you look at https://ollama.com/library?sort=featured you can see models
Model size is measured by parameter count. Generally, higher-parameter models are better (more “smart”, more accurate), but it’s very challenging/slow to run anything over 25b parameters on consumer GPUs. I tend to find 8-13b parameter models are a sort of sweet spot. The 1-4b parameter models are meant more for really low-power devices; they’ll give you OK results for simple requests and summarizing, but they’re not going to wow you.
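A rough rule of thumb I use (just an approximation, ignoring context/KV-cache overhead) is parameter count times bytes per weight; the bytes-per-weight part is where the quantization covered just below comes in:

```python
def approx_gb(params_billion, bits_per_weight):
    # 1 billion params at 1 byte each is roughly 1 GB
    return params_billion * bits_per_weight / 8

print(approx_gb(8, 16))   # 8b at fp16 -> ~16 GB
print(approx_gb(8, 8))    # 8b at q8   -> ~8 GB
print(approx_gb(13, 4))   # 13b at q4  -> ~6.5 GB
print(approx_gb(70, 4))   # 70b at q4  -> ~35 GB, out of reach for most consumer GPUs
```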
If you look at the ‘tags’ for the models listed below, you’ll see things like 8b-instruct-q8_0 or 8b-instruct-q4_0. The q part refers to quantization, i.e. shrinking/compressing a model, and the number after it is roughly how aggressively it was compressed. Note the size of each tag and how the size drops as the quantization gets more aggressive (smaller numbers). You can roughly think of this size number as “how much video RAM do I need to run this model”. For me, I try to aim for q8 models, or fp16 if they can fit in my GPU. I wouldn’t try to use anything below q4 quantization; there seems to be a lot of quality loss below q4. Models can run partially or even fully on a CPU, but that’s much slower. Ollama doesn’t yet support the new NPUs found in recent laptops/processors, but work is happening there.
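Once a model is pulled, talking to it is just an HTTP call to the local ollama server. A minimal sketch using only the standard library; the model tag here is just an example, swap in whatever you actually pulled:

```python
# Assumes the ollama server is running locally (it listens on port 11434 by default)
# and that you've already pulled a model tag, e.g. one of the q8_0 tags above.
import json
import urllib.request

payload = {
    "model": "llama3.1:8b-instruct-q8_0",   # example tag, use whatever you pulled
    "prompt": "In one paragraph, what does q4 vs q8 quantization trade off?",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```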