wvstolzing

wvstolzing@lemmy.ml · 1 month ago

That is a great change to the papers of the past where you have to have an affiliation to a university to get access to a paper and sometimes even that is not enough.

‘Oxford Scholarship Online’ would license different sets of books to different departments; so someone from the philosophy department couldn’t get access to books classified under sociology or history.

Imagine doing something similar at the checkout table in a ‘physical’ library.

wvstolzing@lemmy.ml · 1 month ago

Here’s another video: https://www.youtube.com/watch?v=PriwCi6SzLo (including an interview with the great Alexandra Elbakyan).

Cory Doctorow recently wrote about this in some detail (incl. helpful links): https://pluralistic.net/2024/08/16/the-public-sphere/#not-the-elsevier

wvstolzing@lemmy.ml · 1 month ago

The name of the pdf file inside the torrent is its md5 hashsum without the .pdf extension.

On libgen.rs you can see the md5 hashsum on the download page; on libgen.li you need to look at the JSON file provided at the link on the search result, as they don’t render it in the UI.
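If you want to double-check that a file pulled out of a torrent is the one you were after, here is a minimal sketch using only Python's standard library. The function names are mine, made up for illustration; the only assumption taken from the above is that the file name (with no extension) is the md5 hashsum of its contents.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_md5(path):
    """True if the file's name (minus any extension) equals its own md5 hashsum."""
    return Path(path).stem.lower() == md5_hex(path)
```

Once a file checks out, you can rename it to something readable and add the .pdf extension yourself.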

wvstolzing@lemmy.ml · 1 month ago

The torrents are alive; as long as you can get the torrent links from libgen, you have access to the files. (No need to share whole archives either, you can pick & choose).

wvstolzing@lemmy.ml · edit-2 2 months ago

Wouldn’t enabling the --system-site-packages flag during venv creation do exactly what the OP wants, provided that gunicorn is installed as a system package (e.g. with the distro’s package manager)? https://docs.python.org/3/library/venv.html

Sharing packages between venvs would be a dirty trick indeed; though sharing with system-site-packages should be fine, AFAIK.
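For what it's worth, the same thing can be done from Python through the standard library's venv module. This is only a sketch: the helper names and the target directory are placeholders of mine, and it assumes gunicorn is already installed system-wide.

```python
import venv


def make_builder():
    """Builder for a venv that can also see system-wide site-packages."""
    # Same effect as: python -m venv --system-site-packages <dir>
    return venv.EnvBuilder(system_site_packages=True, with_pip=True)


def create_env(target_dir):
    """Create the environment in target_dir (a placeholder path)."""
    make_builder().create(target_dir)
```

Packages installed inside the venv still take precedence over the system ones, so the venv stays usable for anything the distro doesn't package.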

wvstolzing@lemmy.ml · 6 months ago

Michael W. Lucas’s “Networking for System Administrators” is a great resource: https://mwl.io/nonfiction/networking#n4sa

wvstolzing@lemmy.ml · 6 months ago

That’s not a consideration in favor of grouping h/j as the ‘back keys’, and k/l as the ‘forward’ keys, though. It’s perfectly comfortable & intuitive to have the index finger on the key that goes forward.

wvstolzing@lemmy.ml · edit-2 6 months ago

Why, though? Why is it so obvious that j ‘should have’ been [edit: up]?

wvstolzing@lemmy.ml · 7 months ago

I was intrigued for a moment; installed the package; then got greeted with this – I don’t think I’ll proceed any further:

wvstolzing@lemmy.ml · 9 months ago

Sure if you drag it through the garden.

wvstolzing@lemmy.ml · edit-2 9 months ago

PyMuPDF is excellent for extracting ‘structured’ text from a pdf page — though I believe ‘pulling out relevant information’ will still be a manual task, UNLESS the text you’re working with allows parsing into meaningful units.

That’s because ‘textual’ content in a pdf is nothing other than a bunch of instructions to draw glyphs inside a rect that represents a page; utilities that come with mupdf or poppler arrange those glyphs (not always perfectly) into ‘blocks’, ‘lines’, and ‘words’ based solely on whitespace separation; the programmer who uses those utilities in an end-user facing application then has to figure out how to create the illusion (so to speak) that the user is selecting/copying/searching for paragraphs, sentences, and so on, in proper reading order.
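To make that a bit more concrete: PyMuPDF's page.get_text("words") returns tuples of the form (x0, y0, x1, y1, word, block_no, line_no, word_no). The sketch below doesn't call PyMuPDF at all; it only shows how tuples of that shape could be stitched back into lines, and the helper name and any sample tuples you feed it are invented for illustration.

```python
from itertools import groupby


def words_to_lines(words):
    """Rebuild text lines from (x0, y0, x1, y1, word, block, line, word_no) tuples."""
    # Order by block, then line, then position of the word within the line.
    ordered = sorted(words, key=lambda w: (w[5], w[6], w[7]))
    lines = []
    for _, group in groupby(ordered, key=lambda w: (w[5], w[6])):
        lines.append(" ".join(w[4] for w in group))
    return lines
```

Note that this trusts the block & line numbers the library assigned; when the layout detection gets those wrong, the 'reading order' comes out wrong too.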

PyMuPDF comes with a rich collection of convenience functions to make all that less painful, like dehyphenation, eliminating superfluous whitespace, etc.; but you still need some further processing to pick out humanly relevant info.

Built-in regex capabilities of Python can suffice for that parsing; but if not, you might want to look into NLTK tools, which apply sophisticated methods to tokenize words & sentences.
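As a rough idea of what that regex-based post-processing might look like, here is a sketch with plain re. These are naive stand-ins written for illustration, not PyMuPDF's own dehyphenation, and the sentence splitter will trip over abbreviations like 'e.g.'.

```python
import re


def dehyphenate(text):
    """Join words that were split with a hyphen at a line break."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def normalize_whitespace(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naively split on whitespace that follows '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", text)


def clean_page_text(raw):
    """Dehyphenate, tidy whitespace, then split into sentences."""
    return split_sentences(normalize_whitespace(dehyphenate(raw)))
```

If the splitter turns out to be too crude for your text, that's the point where NLTK's tokenizers would be worth the extra dependency.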

EDIT: I really should’ve mentioned some proper full text search tools. Once you have a good plaintext representation of a pdf page, you might want to feed that representation into tools like the following to index them properly for relevant info:

https://lunr.readthedocs.io/en/latest/ – this is easy to use, & set up, esp. in a python project.

… it’s based on principles that are put to use in this full-scale, ‘industrial strength’ full text search engine: https://solr.apache.org/ – it’s a bit of a pain to set up; but python can interface with it through any http client. Once you set up some kind of mapping between search tokens/keywords/tags, the plaintext page, & the actual pdf, you can get from a phrase search, for example, to a bunch of vector graphics (i.e. the pdf) relatively painlessly.
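To illustrate the kind of mapping I mean, here is a toy inverted index in plain Python. It is a stand-in to show the idea, not the lunr or Solr API; the (pdf name, page number) keys and the function names are assumptions of mine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and pull out alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of (pdf_name, page_number) refs containing it."""
    index = defaultdict(set)
    for ref, text in pages.items():
        for token in tokenize(text):
            index[token].add(ref)
    return index


def search(index, query):
    """Return the refs whose page text contains every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

A real search engine adds stemming, ranking, & proper phrase matching on top of this; the refs it returns are what get you from a search hit back to the page in the actual pdf.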

wvstolzing@lemmy.ml · 9 months ago

What I can’t quite make sense of is how ‘James’ itself is a diminutive of ‘Jacob’.

wvstolzing@lemmy.ml · 9 months ago

I believe ‘Harry’ is the Welsh version of English ‘Henry’, & German ‘Heinrich’. … At least that’s the impression I got from Shakespeare’s ‘Henriad’ plays (H. IV 1-2, & H. V)

wvstolzing@lemmy.ml · edit-2 9 months ago

Another vote for Tesseract – just to clarify the terminology, though: PDF is a fragile format best used read-only; so you really don’t want to edit a pdf, but make a new one using the same (or cleaned-up) bitmaps and a new ocr text layer.

Now, tesseract is excellent at recognizing glyphs; but especially if the scanned image is a little fuzzy, the layout detection falters; and when it falters, you get redundant line breaks, & chunks of text in the wrong order – all of which gets incredibly annoying for searching & copying purposes. So if you can spare the time, and the text requires it, you may need to mark regions (paragraphs & titles mainly) on the bitmap image manually. There exist a few frontends to Tesseract that help with a task like that; check out, e.g., https://github.com/manisandro/gImageReader - inside single paragraph blocks of text, Tesseract doesn’t get as easily confused; and the text output is in the correct reading order, & w/o redundant breaks.
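If you'd rather script it than use a frontend, a sketch along these lines could drive the tesseract command line from Python. It assumes tesseract is installed and on your PATH; the function names and file names are placeholders. Page segmentation mode 6 tells tesseract to treat the image as a single uniform block of text, which fits the case where you've already cropped the scan down to one paragraph.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", psm=None):
    """Build a tesseract call that writes <output_base>.pdf with a text layer."""
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if psm is not None:
        # e.g. psm=6: assume a single uniform block of text
        cmd += ["--psm", str(psm)]
    cmd.append("pdf")  # the 'pdf' config asks for searchable PDF output
    return cmd


def run_ocr(image_path, output_base, lang="eng", psm=None):
    """Run tesseract; raises if the binary is missing or the call fails."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, lang, psm)
    subprocess.run(cmd, check=True)
```

For example, run_ocr("page-001.png", "page-001", psm=6) would leave page-001.pdf next to the image, assuming the scan exists under that (made-up) name.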

wvstolzing@lemmy.ml · 10 months ago

Better cite Wozniak as the one who ‘made’ Apple; but anyway.

wvstolzing@lemmy.ml · edit-2 11 months ago

Yeah I keep running into similar issues when trying to build pretty much anything on windows; for stuff that can’t be ‘nicely’ configured & dependency-managed through an IDE, windows is pure pain.

It really sounds like PySide would fit your use case better. Check out this website for a great starting point: https://www.pythonguis.com/pyqt6/ – the author also has an entire book on packaging PySide programs for cross-platform distribution.

As for installing Python itself; I think I’d stick with the plain installer from python.org, and afterwards, pip. In case of dependencies that are hard to get through PyPI, I think anaconda might be worth looking at as well: https://www.anaconda.com/download

msys2 provides a package manager, & several development toolchains; it’s an easy way to get native (mingw) gcc & bash on windows; cross-platform programs rely on it heavily, because it saves them from all the ‘visual studio’ BS: https://www.msys2.org/docs/what-is-msys2/ – I believe any implementation of GTK on windows requires a mingw toolchain.

wvstolzing@lemmy.ml · 11 months ago

Am I missing something?

It’s impossible to tell without knowing which specific step failed.

Before we even get to GTK; there are some issues with python wheels under msys2; check out: https://www.msys2.org/docs/python/ – some wheels just can’t be built under msys2 due to various incompatibilities. Not being able to replace such packages with ‘pure’ python equivalents could end up being a (very annoying) roadblock.

The roadblock that I recently ran into with my simple GTK4 app was unpredictable ids on d-bus interface exports. D-bus does work under msys2; though you have to start the user session manually; d-feet and gdbus also work; though, as always, there’s a catch. On Linux I can automatically export ‘action groups’ that belong to GtkApplicationWindow widgets; & their ‘object paths’ show up predictably under the application’s path + / + the window’s id. This makes it really convenient when you want to add basic ‘remote controls’ to your widgets. Under msys2, though, I can’t figure out how to find those paths; which throws a monkey wrench, so to speak, in my ‘remote control’ implementation. Granted, d-bus is a linux-native technology; and expecting it to work w/o issues on windows is probably a bit too much.
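For reference, this is roughly how I'd compute the paths I expect to see on Linux. Treat it as a sketch from memory rather than documented behaviour: as far as I recall, the application's object path comes from the application id with dots turned into slashes (and dashes into underscores), and GTK puts a 'window' segment before the numeric window id. Both details, & the helper names, are assumptions worth verifying with d-feet.

```python
def application_object_path(application_id):
    """Object path assumed for a GApplication with the given application id."""
    return "/" + application_id.replace(".", "/").replace("-", "_")


def window_object_path(application_id, window_id):
    """Path where a window's action group is assumed to be exported on Linux."""
    return f"{application_object_path(application_id)}/window/{window_id}"
```

With a made-up id like org.example.MyApp, the first window would be expected under /org/example/MyApp/window/1.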

– apart from those, I haven’t run into any issues with GTK4 under msys2. The GTK3 packages available in their repos also work just fine.

I do agree with the others who recommend PySide, though. Their cross platform support appears to be more robust. Their documentation has been improving as well.

wvstolzing@lemmy.ml · 11 months ago

and then try to play Doom on it. https://www.youtube.com/watch?v=D5NTJSfUWDE

wvstolzing@lemmy.ml · edit-2 11 months ago

I tend to agree with this take; as a pedantic side note, though, I’m not sure that OS X was ever based on FreeBSD – they took the unix userland, sure; but from the very start (NeXTSTEP), the kernel was derived from the Mach kernel, which itself was a fork of the 4.3BSD kernel; and the core libraries were written from scratch, all in the interests of marketing “quick application development” capability to NeXT’s customers. (Actually there’s an interview with S. Jobs somewhere where he lays this out very clearly; it was the late 80s/early 90s, the heyday of object-oriented toolkits & VMs after all)

I’m sure they’ve helped themselves liberally to the FreeBSD kernel for features; though still, OS X never was ‘based on’ FreeBSD (let alone a ‘FreeBSD with a pretty coat of paint’, as people like to say).

wvstolzing@lemmy.ml · 11 months ago

Wayfire brought back the compiz self-immolating window.

Actually I wonder if they named ‘wayfire’ after that fire effect.