If you’re modeling relational data, it doesn’t seem like you can get around using a DB that uses SQL, which to me is the worst: most programmers aren’t DB experts and the SQL they output is quite often terrible.
Not to dunk on the lemmy devs, they do a good job, but they themselves know that their SQL is bad. Luckily there are community members who stepped up and are doing a great job at fixing the numerous performance issues and tuning the DB settings, but not everybody has that kind of support, nor time.
Also, the translation step from binary (program) -> text (SQL) -> binary (server), just feels quite wrong. For HTML and CSS, it’s fine, but for SQL, where injection is still in the top 10 security risks, is there something better?
Yes, there are ORMs, but some languages don’t have them (rust has diesel for example, which still requires you to write SQL) and it would be great to “just” have a DB with a binary protocol that makes it unnecessary to write an ORM.
Does such a thing exist? Is there something better than SQL out there?
No, SQL is great for relational data. There’s like decades of research into it. It’s the best.
relational databases have years of reseach into them, not the query language itself.
sql was built so people other than devs can use it, but we got stuck with it.
SQL has been around and evolving since the 70s. It is an ISO/IEC standard and is portable across a multitude of databases.
portable, my ass. excuse my french.
each system has it’s own dialect and quirks
each system has it’s own dialect and quirks
That does not mean that SQL, as specified by one of it’s standard versions, is not portable. It just means that some implementations fail to comply with the standard and/or provide their own extensions.
If an implementation fails to comply with the standard, that’s a failure on the side of the implementation, not a failure of SQL.
sql was built so people other than devs can use it, but we got stuck with it.
Not really. Being designed with UX in mind, so that it sacrifices conciseness for readability, does not make it something for “people other than devs”.
Likewise, BASIC was also developed with UX in mind, and no one in their right mind would ever claim that it’s not a programming language.
SQL is horrible as a language to read or write. There’s a million different variants, because it lacks so many basic things. And when used in other code, you generally end up string concatinating one language in another language, with all the HORRIBLE bugs something like that brings about.
Imagine Backend People said we should just write adhoc Javascript for the frontend by concatinating the “correct” code in the backend.
most programmers aren’t DB experts and the SQL they output is quite often terrible.
Isn’t that looking at it the wrong way / missing the point?
If you’re fine with simple queries, use an ORM of your tech stack. Once you have to understand querying performance and weigh usage patterns, constraints, and cache performance and size, then it’s not about SQL anymore. It’s about understanding the DBMS you use.
You may ask “why can’t I use a different language for the querying part of it”. But I haven’t seen anything better yet.
Having a common, well-understood, established, documented, inter-product compatible language has a lot of value. Using a different language to the well-established industry standard is in itself an increased hurdle to adoption.
Getting back to the original quote: I don’t think anything else would serve bad developers / non-experts any better.
If you’re fine with simple queries, use an ORM of your tech stack. Once you have to understand querying performance and weigh usage patterns, constraints, and cache performance and size, then it’s not about SQL anymore. It’s about understanding the DBMS you use.
Simple queries don’t result in simple SQL. How many joins and subqueries do you think an SQL query would require in order fulfill “Give me the top 10 artists of the 90s whose albums were nominated for the MTV awards but didn’t win”?
In Django looks something like
nineties = (date(1,1,1990), date(31, 12, 1999) album_range=Q(albums__release_date__range=nineties) artists = Artists.objects.annotate( albums_sold=Sum("albums__sales", filter=album_range)), ).filter( album_range, nominations__date__range=nineties, nominations__won=False ).order_by("-albums_sold") top_artists = artists[:10]
What if one method wants the result of that but only wants the artists’ names, but another one wanted additional or other fields? In django you could simply use
artists.only(*field_names)
and each method would provide a different set of field names. What would that look like without a capable ORM? Do you think somebody would refactor the method to add afield_names
argument? In my experience the result is a bunch of copy pasted queries that modify the query itself to add the fieldnames.Another common thing is querying related objects. Say you simply wanted to have information about the record label of the aforementioned artists while handling the artists. A many-to-one relationship (artist has one record label, record label has many artists). You could either
artist.record_label
while in your for-loop, but that would trigger an query for every artist (1+n problem). Or in django that’sartists.select_related("record_label")
and it will get all the record_labels in the same query.
If it’s a many-to-many relationship for example “festivals”, then.prefetch_related()
will first select the artists, then make a second query of festivals of those artists, andartist.festivals
would be available.An ORM like django makes that simple. SQL, does not.
So, before we even get to the DB optimisation part (which indices to create, whether a view is better or now, which storage engine to use, WAL size, yadayadayada), there’s an entire interface / language that makes writing bad code very easy.
Datalog. Basically relational, but have much simpler syntax and semantics,and is able to convey more complicated queries in a composable manner. Mainly used in Clojure-based databases where actual programs instead of strings are used.
🤔 interesting. First order logic for querying. I found CozoDB which might be useful. Still not sure about how to map objects to the DB. The tutorial is not very relatable. Lots of numbers and single letters.
Thanks for the hint though. I’ll check it out.
deleted by creator
deleted by creator
I have seen that this is still a problem, even in established enterprise companies.
deleted by creator
“If you don’t know what I know, you shouldn’t be here”
“I don’t make mistakes that end up #3 on the vulnerability list, and if you do, I don’t want you around me”
I find that a gate keeperish attitude.
Gates exist for a reason
the translation step from binary (program) -> text (SQL) -> binary (server)
Your concern about this is misguided. Inter-process communication always has to cross a barrier, by definition.
I take it http also feels wrong to you?
Binary protocols do exist, e.g. gRPC, but they incur costs of their own.
the translation step from binary (program) -> text (SQL) -> binary (server)
Also, the translation step from binary (program) -> text (SQL) -> binary (server), just feels quite wrong. For HTML and CSS, it’s fine, but for SQL, where injection is still in the top 10 security risks, is there something better?
Does this not answer your question?
I’m absolutely biased as a data engineer who loves SQL, but there are some good reasons why SQL has been the de facto standard for interacting with databases since the 80s.
One of its draws is that it’s easy to understand. I can show a stakeholder that I’m selecting “sum(sale_amount) from transactions where date=yesterday” and they understand it. Many analysts are even able to write complicated queries when they don’t know anything else about programming.
Since it’s declarative, you rarely have to think about all the underlying fuckery that lets you query something like terabytes of data in redshift in minutes.
Debugging is often pretty nice too. I can take some query that didn’t do what it was supposed to and run it over and over in a console until the output is right.
I’m absolutely biased as a data engineer who loves SQL, but there are some good reasons why SQL has been the de facto standard for interacting with databases since the 80s.
I find it funny how the people who actually have to wrangle data swear by SQL as awesome, but there are always random hacks coming out of the woodwork, who don’t even look at SQL at all, with sweeping statements claiming SQL sucks because reasons.
It’s like the most opinionated people against SQL are the ones who don’t use SQL.
I’d recommend everyone check out https://prql-lang.org/. It’s SQL, but readable and writable in a sane way.
And no, SQL is NOT readable or writable for anything involving more than a single join.
You can write selects with many joins, as long they are regular and either add a column or reduce the result set. You have to write the joins explicitly though. Just shoving all of the restrictions into the where clause will definitely confuse everybody.
SQL, where injection is still in the top 10 security risks
This is absolutely true, but it’s not what it looks like on the surface, and if you dig into the OWASP entry for this, you’ll see they talk about mitigation.
You can completely eliminate the possibility of injection attacks using well-understood technologies such as bind variables, which an ORM will usually use under the covers but which you can also use with your own queries. There are many, many database applications that have never once had a SQL injection vulnerability and never will.
The reason SQL injection is a widespread security risk, to be blunt, is that there are astonishingly large numbers of inexperienced and/or low-skill developers out there who haven’t learned how to use the tools at their disposal. The techniques for avoiding injection vulnerability are simple and have been well-documented for literally decades but they can’t help if a lousy dev decides to ignore them.
Now, a case could be made that it’d be better if instead, we were using a query language (maybe even a variant of SQL) that made injection attacks impossible. I agree in principle, but (a) I think this ends up being a lot harder than it looks if you want to maintain the same expressive power and flexibility SQL has, (b) given that SQL exists, “get bad devs to stop using SQL” doesn’t seem any more likely to succeed than “get bad devs to use bind variables,” and © I have too much faith in the ability of devs to introduce security vulnerabilities against all odds.
It’s also worth noting that the current entry on OWASP is injection generically. It includes SQL, but it also covers things like HTTP links where you concatenate in unchecked user input. SQL injection by itself may no longer be prominent enough make the list.
I’m also going to put the blame squarely on PHP for SQL injection attacks hanging on for so long, particularly when combined with MySQL. That DB didn’t support bind variables for a long time (maybe still doesn’t?). Other languages may have used a library that simulated bind variables for you. Barring that, they tended to always always always show how to use a sanitation function even if it was the first mention in the first tutorial for the top Google result for “sql [language]”. That creates a culture in the language of writing safe SQL code. Not PHP, though; the sanitation functions were there, but they never gave them the prominence that they so badly needed.