Regain control of your privacy with Proton (and enjoy their Black Friday / Cyber Week deals while they last!):
VPN: https://protonvpn.com/blackfriday
Mail: https://proton.me/mail/black-friday
Grab a brand new laptop or desktop running Linux: https://www.tuxedocomputers.com/en#
👏 SUPPORT THE CHANNEL:
Get access to a weekly podcast, vote on the next topics I cover, and get your name in the credits:
YouTube: https://www.youtube.com/@thelinuxexp/join
Patreon: https://www.patreon.com/thelinuxexperiment
Liberapay: https://liberapay.com/TheLinuxExperiment/
Or, you can donate whatever you want: https://paypal.me/thelinuxexp
👕 GET TLE MERCH
Support the channel AND get cool new gear: https://the-linux-experiment.creator-spring.com/
🎙️ LINUX AND OPEN SOURCE NEWS PODCAST:
Listen to the latest Linux and open source news, with more in-depth coverage, and ad-free! https://podcast.thelinuxexp.com
🏆 FOLLOW ME ELSEWHERE:
Website: https://thelinuxexp.com
Mastodon: https://mastodon.social/web/@thelinuxEXP
Pixelfed: https://pixelfed.social/TLENick
PeerTube: https://tilvids.com/c/thelinuxexperiment_channel/videos
Discord: https://discord.gg/mdnHftjkja
00:00 Intro
00:59 Sponsor: Proton
02:17 Data grabbing
05:07 Why this data matters
07:41 Laws make it worse
11:11 What you can do
14:04 Sponsor: Get a PC made to run Linux
15:07 Support the channel
Playlist on how to De-Google your life: https://www.youtube.com/playlist?list=PLqmbcbI8U55EfYUVdZfjrfyJyNHD-Bly8
#Privacy #anonymity #private

Virtually everything online now collects data. And this data doesn't just stay with the company that collected it: it's a giant repository that governments can use to track or monitor their citizens. See, in a LOT of countries, governments have the right to ask a company to provide all the data it has collected on its users. Companies usually have no choice but to comply with these requests, which is why using end-to-end and zero-access encrypted services is crucial.
For example, the US can ask any company to hand over data on a specific user, and in 2020 it made more of these requests than any other country. But other countries do exactly the same: Germany, Denmark, South Korea, France; virtually every country does this. If you want even scarier numbers: in 2022, Meta, the parent company of Facebook, Instagram, and WhatsApp, received 827K requests for user data and complied with 76% of them, or roughly 629K requests. https://www.globalsecuritymag.com/Meta-received-over-800k-user-data-requests-from-governments-in-2022.html
There are a lot of legal offensives being planned, or already implemented, in various countries, so let's look at a few.
In Russia, laws passed in 2017 banned anonymous use of online messaging apps and prohibited the use of tools that circumvent government censorship. This means that while VPNs aren't banned outright, any VPN that lets people access banned websites will itself be banned. This has happened to at least 15 VPNs, including NordVPN, ProtonVPN, and OperaVPN. https://www.hrw.org/news/2017/08/01/russia-new-legislation-attacks-internet-anonymity
In Australia, in 2021, a law was proposed to force people to attach their real name to their social media posts, ostensibly to fight online trolls, bullying, and harassment. Users would have had to provide ID before opening any social media account, which would obviously open the door to surveillance, monitoring, and censorship. https://ia.acs.org.au/article/2021/govt-wants-to-end-online-anonymity.html
In France, there's the recent SREN law. It would give the telecom watchdog the power to block websites and would require age verification tools. On top of that, it would let the government demand that web browsers and DNS providers block certain websites.
https://adguard.com/en/blog/france-web-browser-dns-blocking-law.html
In the UK, the Online Safety Bill of 2022 allows the regulator Ofcom to force websites to collect people's personal data, and to scan, restrict, and remove content that is considered harmful. The bill also requires online communication services to be moderated, which effectively means end-to-end encryption can't be enabled on them anymore. https://datainnovation.org/2022/05/the-uks-online-safety-bill-undermines-encryption-and-anonymity/
So, what can you do about this? There are plenty of ways to protect your data. First, stop using privacy-invasive operating systems. If you can't move to something like Linux, at least try to disable as much telemetry as you can in Windows or macOS, and in Android or iOS. You can also try a de-Googled, privacy-focused Android ROM on your smartphone. Leaving Chrome for a more private browser is also pretty much mandatory. The same goes for your online services: stop using Google Search, Gmail, Outlook, OneDrive, iCloud, and the like. Using a VPN is also a solid option to at least try to blur the lines.
While forensic linguistics is pretty cool, the Unabomber was actually caught because his manifesto was published, and his brother and his brother's wife recognized its unusual phrasing, such as ‘Eat your cake and have it too’.
If an author has a large body of known work, it's not too difficult to identify other writings by that same author. But if there isn't a large body of writing known to come from that individual, the best we can do is estimate the approximate age of the individual and the region where they grew up. Even that only works when the unidentified text is long enough, as in the case of the Unabomber, whose manifesto was about 30k words.
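One common stylometric technique compares frequency profiles of short character sequences. This makes it easier to see why a large body of known work matters: the more text you have, the more stable each profile becomes. The sketch below assumes character trigram counts compared with cosine similarity. It is an illustration only, not the method used in the Unabomber case, and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    return Counter(normalised[i:i + n] for i in range(len(normalised) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(unknown: str, known: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Rank candidate authors by how closely their known writing matches the unknown text."""
    target = char_ngrams(unknown, n)
    scores = [(author, cosine(target, char_ngrams(corpus, n))) for author, corpus in known.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With one-sentence samples like the test inputs below, the scores are very noisy. That is exactly the point made above: attribution only becomes reliable once each candidate's known corpus is large.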
I did simplify the whole thing, as you noticed; but note that his sister-in-law and brother identifying him is another example of the same process. David knew that expressions Ted used, like “cool-headed logicians”, were highly unusual, which is not too unlike what the socio- and forensic linguists did there.
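The "highly unusual expression" idea can be sketched as a search for phrases that two texts share but that are rare in ordinary writing. This sketch assumes word bigrams and a tiny hypothetical background corpus standing in for "ordinary writing". A real analysis would need a much larger reference corpus. The names `word_ngrams` and `rare_shared_phrases` are illustrative.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Set of word n-grams; words keep internal hyphens/apostrophes (e.g. 'cool-headed')."""
    words = re.findall(r"[a-z'-]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def rare_shared_phrases(a: str, b: str, background: list[str],
                        n: int = 2, max_doc_freq: int = 0) -> list[str]:
    """Phrases found in both a and b that occur in at most max_doc_freq background documents."""
    doc_freq: Counter = Counter()
    for doc in background:
        doc_freq.update(word_ngrams(doc, n))  # each document counts a phrase at most once
    shared = word_ngrams(a, n) & word_ngrams(b, n)
    return sorted(" ".join(gram) for gram in shared if doc_freq[gram] <= max_doc_freq)
```

Shared but common phrases such as "in any case" are filtered out because they appear in the background texts. Only the distinctive overlap survives, much like David noticing "cool-headed logicians".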
Such as a Lemmy or Facebook account? Or any other online account associated with your writing, really; we produce far more text on the internet than we realise.
And while, a priori, your accounts on different websites might look completely disconnected, once two of them are linked to the same person, linking a third one gets easier, because there is now more combined writing to compare against. Then a fourth. And so on.
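The snowball effect can be sketched with a union-find (disjoint-set) structure. Each confirmed link merges two groups of accounts, and each group pools its text, so the next comparison works with a larger sample. The class name and account IDs are hypothetical. The sketch also assumes that link decisions come from some separate similarity test, which is not shown here.

```python
class AccountLinker:
    """Groups accounts believed to share an author, using union-find (disjoint sets)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}       # account -> parent account in its set
        self.texts: dict[str, list[str]] = {}  # account -> writing collected from it

    def add(self, account: str, text: str) -> None:
        """Register an account (if new) and store a piece of its writing."""
        self.find(account)
        self.texts.setdefault(account, []).append(text)

    def find(self, account: str) -> str:
        """Return the representative of the account's group, compressing the path."""
        self.parent.setdefault(account, account)
        root = account
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[account] != root:  # point every visited node straight at the root
            next_account = self.parent[account]
            self.parent[account] = root
            account = next_account
        return root

    def link(self, a: str, b: str) -> None:
        """Record evidence (from some separate test) that a and b share an author."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def pooled_text(self, account: str) -> str:
        """All writing from every account in the same group: the growing comparison sample."""
        root = self.find(account)
        return " ".join(
            text
            for other, texts in self.texts.items()
            if self.find(other) == root
            for text in texts
        )
```

For example, after `link("lemmy:alice", "reddit:a1ice")`, calling `pooled_text("reddit:a1ice")` returns writing from both accounts. A third candidate account is then compared against more text than either account provides alone.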
A small caveat: while the corpus is bigger, so is the noise introduced by people on the other side of the world who happen to use the same patterns as the person you want to identify. Even so, I believe the ability to bulk-process text for authorship attribution has grown considerably faster than the number of potential matches.
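A back-of-the-envelope model makes the trade-off between more candidates and more signal concrete. As a simplifying assumption, take N candidate writers and suppose a false match requires agreeing by chance on k independent stylistic features, each shared with probability q. The numbers below are hypothetical and only illustrate scale.

```latex
\mathbb{E}[\text{false matches}] = N \cdot p, \qquad p = q^{k}
\quad\Longrightarrow\quad
\mathbb{E}[\text{false matches}] = N\,q^{k}

% Hypothetical values: N = 10^{9},\; q = 0.1,\; k = 12
N\,q^{k} = 10^{9} \cdot 10^{-12} = 10^{-3}
```

Doubling the candidate pool doubles the expected false matches. Each additional independent feature multiplies them by q, here cutting them tenfold. Real stylistic features are correlated, so actual figures would be less favourable than this independence assumption suggests.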
Additional signal is not noise