Menu

What’s wrong with SQL?

**4-june-2024**

**Understanding SQL Complexity Through Basic Math Analogy**

We all understand basic math, and we know what is meant by

**(3 * 5) * 12**

But imagine if common convention required us to write this expression as:

**Calculate (3 times 5) multiply by 12 with times applied to 3 and 5**

This would be frustrating:

♦ Why do we have to start with CALCULATE?

♦ Why do we need two operators for multiplication: TIMES and MULTIPLY?

♦ Why do we need to explain the parameters of the TIMES operator?

But this is exactly the kind of complexity SQL imposes on us. Consider, for example, the SQL query:

**Select DepId, Sex, max(Salary) as MxSal****from Employees****where DepId > 3****group by DepId, Sex****having MxSal > 2000 **

We encounter here the same frustrating language elements:

**♦ **Why do we have to start with SELECT?

♦ Why do we need two different clauses for filtering: WHERE and HAVING?

♦ Why do we need to explain the grouping with GROUP BY?

Instead, wouldn’t it be simpler to write:

**DepId, Sex, max(Salary) as MxSal****from Employees****where DepId > 3 and ****MxSal > 2000 **

**What’s wrong with SQL?**

Not much. In fact, SQL is a powerful language for gaining insights from your data. The concept of a view serves as an efficient, dynamic mechanism to define how insights are derived without the need for copying or moving data.

However, there are some issues with SQL. While it is based on Relational Algebra, its implementation is not always sound.

By adopting a different perspective on SQL, it seamlessly intertwines with Relational Algebra, allowing you to write classic SQL while simultaneously benefiting from the simplicity and insight of Relational Algebra.

This article provides a concise technical overview of how this is implemented in Colbert’s SQL, an integral component of this Relational Database Management System (RDBMS).

**Relational Algebra versus SQL**

Removing the Superfluous: SQL Simplified to Relational Algebra

In the Relational Algebra a query operation is performed on a relation while the output of this operation is again a relation upon which next operations might be performed.

Basic Relational Algebra operators are:

**♦ Selection (σ)**: Selects rows from a relation that satisfy a given predicate.**♦ Projection (π)**: Selects specific columns from a relation.**♦ Cartesian Product (×)**: Combines each tuple of one relation with every tuple of another relation.

Colbert incorporates Relational Algebra in SQL in a way that you can still use normal SQL, but at the same time you benefit from the power and simplicity of a pure Relational Algebra.

Colbert achieves this by the recognition that:

♦ **Selection (σ) **can be regarded as the SQL **Where**.

♦ **Projection (π) **can be regarded as the SQL **From***.***♦ Cartesian Product (****×****)**** ** can be regarded as the SQL *Join**.*

So what about the SQL **select **?

♦ The SQL **Select **can be regarded as an identity operator (ι)*.*

In this capacity, ** Select** simply returns its argument unchanged. Therefore, it can be omitted if desired, but it can also still be applied for compatibility reasons.

These recognitions essentially define the basic Relational Algebra statements

(**π, σ, ι**) in SQL as follows:

**♦ select **<relation> => <relation>

♦ <relation> **where **<bool-expression> => <relation>

♦ <expression-list> **from **<relation> => <relation>

♦ <relation> **join** <relation> => <relation>

These operators have increasing priority in the order listed above.

Following this syntax, Colbert enables the creation of both standard SQL statements and Colbert SQL versions, and a seamless intermingling between them. This flexibility likely enhances the development experience, enabling developers to choose a syntax that best suits their needs without constraints.

The Colbert parser initially endeavors to evaluate and execute a provided statement to a <relation>, with an <identifier> potentially yielding a <relation> if that identifier is present as a table or a view in the Data Dictionary.

The syntax definition should focus on clarity, not on performance. Therefore, Colbert optimizes queries effectively by applying operators like filtering and sorting, regardless of the syntax used, at the right location at the right time.

Colbert’s SQL considers that the projection of fields is defined in the from clause. This approach may indicate a departure from the traditional SQL syntax, where the Select clause typically handles projection.

However, this approach offers advantages in writing SQL statements by enabling the removal of superfluous language elements, facilitating easier writing and comprehension of SQL. Additionally, it’s still possible to compose standard SQL using these operators.

Furthermore, like in mathematical expressions, parentheses aren’t always necessary but can be used for clarity and to ensure the correct order of operations.

**Rewriting SQL as Relational Algebra**

Composing Standard SQL and Relational Algebra with the Same Operators

When queries are constructed using the operators defined above, which differ from standard SQL, both standard SQL and Colbert SQL can be derived to build the same queries.

Therefore, in all the following examples, all queries are valid. Colbert implements SQL with these operators and thus offers a very simple, powerful, and compatible SQL.

Standard SQL:

**Select * from Sales;**

Same queries in Colbert SQL:

** 1. Sales;**

3.

Standard SQL:

**Select * from Sales where Year > 2000;**

Same query in Colbert SQL:

** Sales where Year > 2000; **

Standard SQL:

**Select EmpId, year from Sales;**

Same query in Colbert SQL:

**EmpId, Year from Sales;**

**Join**

In Colbert, the SQL Join is regarded to be the **Cartesian Product (×)**. The **Join** operator might be preceded by one or more join specifiers, such as: **inner, left, right, outer, natural **or** cross**, provided the specifiers do not conflict.

If a specifier is used, the **Join** keyword might be omitted. The **Cross** keyword might also be omitted, as it serves no additional purpose.

With this, the following statements become valid Colbert SQL join statements:

Standard SQL:

**Select ***** ****from ****Employees natural outer join Sales****;**

Same queries in Colbert SQL:

**1. ****Employees natural outer join Sales; 2. **Employees natural outer Sales;

**Group By**

Why does SQL need GROUP BY ?

*Group by *is a rather redundant clause. It does not make sense to incorporate it because it is always clear that all the non-aggregated columns in a *for* clause are the *group by* fields.

Colbert simply treats it this way, meaning that if there is any aggregation in the *for *clause, all non-aggregated columns are considered *group by* fields.

The practice of including more fields in the GROUP BY clause than those specified in the SELECT clause in standard SQL does not make sense as it often results in ambiguity and potential errors in the query results.

*Group by* can still be employed for compatibility, of course.

Next statements are valid Colbert SQL aggregation statements:

Standard SQL:

**Select DepId, Sex, Married, count(*), max (Salary) from Employees****group by DepId, Sex, Married;**

Same Query in Colbert SQL:

**DepId, Sex, Married, count, max Salary from Employees;**

** **

For grouping without an aggregation function, the distinct clause is always available:

**Select distinct DepId from Employees ;****distinct DepId from Employees ;**

which is same as:

**DepId from (DepId, count from Employees );**

Standard SQL:

**Select Name, Salary, Salary – MaxSalary as SalDifffrom **

from Employees group by DepId) ;

Same query in Colbert SQL:

**Name, Salary, Salary – MaxSalary as SalDiff fromEmployees natural (DepId, max Salary from Employees);**

*If no alias is provided for an aggregated or calculated field, Colbert assigns a name for that field consisting of only letters and digits.*

Note that in Colbert it is also allowed to use **count **instead of **count(*) **and omit parentheses for functions with zero or one parameter.

**Order by **

Colbert allows for sorting within the **from clause** by letting an expression be followed by **asc** or **desc**.

If sorting is defined both in the **from clause** and by an** order by clause**, only the **order by clause** is considered.

Standard SQL:

**Select Year, Target as Aim from Sales**

Same query in Colbert SQL

**Year, Target desc as Aim from Sales where year > 2000;**

**Having**

The use of Having is Superfluous and Unnecessarily Complicating.

Utilizing only the ** where** clause, even in situations where traditional SQL demands

Colbert treats ** where **and

Next statements are valid Colbert SQL statements:

Standard SQL:

**Select DepId, Sex, max(Salary) as MaxSalary, count(*)****from Employees****where DepId > 3****group by DepId, Sex****having MaxSalary > 2000****order by sex desc;**

Same Query in Colbert SQL, the optimizer will split the where clause into two filterings each applied differently, just like the previous standard SQL example:

**DepId, Sex desc, max Salary, count****from Employees****where DepId > 3 and MaxSalary > 2000;**

Which is, in Colbert SQL, the same as:

**DepId, Sex desc, max Salary, count from**

Colbert SQL allows a where clause over both a non-aggregated field and an aggregated field, even if it cannot be split:

**DepId, Sex desc, max Salary, count from Employees****where DepId > 3 or MaxSalary > 2000;**

The result of the former query is the same as if the next script was executed:

**create table Temp as DepId, Sex desc, max Salary, count from Employees;**

**Other blogs**

*Colbert is a relational database management system (RDBMS) that introduces unique technology, offering innovative solutions to analytical challenges beyond conventional boundaries.More details about these innovative solutions can be found at www.colbert.nl/#articles*