4-june-2024
Understanding SQL Complexity Through Basic Math Analogy
We all understand basic math, and we know what is meant by
(3 * 5) * 12
But imagine if common convention required us to write this expression as:
Calculate (3 times 5) multiply by 12 with times applied to 3 and 5
This would be frustrating:
♦ Why do we have to start with CALCULATE?
♦ Why do we need two operators for multiplication: TIMES and MULTIPLY?
♦ Why do we need to explain the parameters of the TIMES operator?
But this is exactly the kind of complexity SQL imposes on us. Consider, for example, the SQL query:
Select DepId, Sex, max(Salary) as MxSal
from Employees
where DepId > 3
group by DepId, Sex
having MxSal > 2000
We encounter here the same frustrating language elements:
♦ Why do we have to start with SELECT?
♦ Why do we need two different clauses for filtering: WHERE and HAVING?
♦ Why do we need to explain the grouping with GROUP BY?
Instead, wouldn’t it be simpler to write:
DepId, Sex, max(Salary) as MxSal
from Employees
where DepId > 3 and MxSal > 2000
What’s wrong with SQL?
Not much. In fact, SQL is a powerful language for gaining insights from your data. The concept of a view serves as an efficient, dynamic mechanism to define how insights are derived without the need for copying or moving data.
However, there are some issues with SQL. While it is based on Relational Algebra, its implementation is not always sound.
By adopting a different perspective on SQL, it seamlessly intertwines with Relational Algebra, allowing you to write classic SQL while simultaneously benefiting from the simplicity and insight of Relational Algebra.
This article provides a concise technical overview of how this is implemented in Colbert’s SQL, an integral component of this Relational Database Management System (RDBMS).
Relational Algebra versus SQL
Removing the Superfluous: SQL Simplified to Relational Algebra
In the Relational Algebra a query operation is performed on a relation while the output of this operation is again a relation upon which next operations might be performed.
Basic Relational Algebra operators are:
♦ Selection (σ): Selects rows from a relation that satisfy a given predicate.
♦ Projection (π): Selects specific columns from a relation.
♦ Cartesian Product (×): Combines each tuple of one relation with every tuple of another relation.
Colbert incorporates Relational Algebra in SQL in a way that you can still use normal SQL, but at the same time you benefit from the power and simplicity of a pure Relational Algebra.
Colbert achieves this by the recognition that:
♦ Selection (σ) can be regarded as the SQL Where.
♦ Projection (π) can be regarded as the SQL From.
♦ Cartesian Product (×) can be regarded as the SQL Join.
So what about the SQL select ?
♦ The SQL Select can be regarded as an identity operator (ι).
In this capacity, Select simply returns its argument unchanged. Therefore, it can be omitted if desired, but it can also still be applied for compatibility reasons.
These recognitions essentially define the basic Relational Algebra statements
(π, σ, ι) in SQL as follows:
♦ select <relation> => <relation>
♦ <relation> where <bool-expression> => <relation>
♦ <expression-list> from <relation> => <relation>
♦ <relation> join <relation> => <relation>
These operators have increasing priority in the order listed above.
Colbert’s SQL considers that the projection of fields is defined in the from clause. This approach may indicate a departure from the traditional SQL syntax, where the Select clause typically handles projection.
However, this approach offers advantages in writing SQL statements by enabling the removal of superfluous language elements, facilitating easier writing and comprehension of SQL. Additionally, it’s still possible to compose standard SQL using these operators.
Furthermore, like in mathematical expressions, parentheses aren’t always necessary but can be used for clarity and to ensure the correct order of operations.
Rewriting SQL as Relational Algebra
Composing Standard SQL and Relational Algebra with the Same Operators
When queries are constructed using the operators defined above, which differ from standard SQL, both standard SQL and Colbert SQL can be derived to build the same queries.
Therefore, in all the following examples, all queries are valid. Colbert implements SQL with these operators and thus offers a very simple, powerful, and compatible SQL.
Standard SQL:
Select * from Sales;
Same queries in Colbert SQL:
1. Sales;
2. Select Sales;
3. * from Sales;
Standard SQL:
Select * from Sales where Year > 2000;
Same query in Colbert SQL:
Sales where Year > 2000;
Standard SQL:
Select EmpId, year from Sales;
Same query in Colbert SQL:
EmpId, Year from Sales;
Join
In Colbert, the SQL Join is regarded to be the Cartesian Product (×). The Join operator might be preceded by one or more join specifiers, such as: inner, left, right, outer, natural or cross, provided the specifiers do not conflict.
If a specifier is used, the Join keyword might be omitted. The Cross keyword might also be omitted, as it serves no additional purpose.
With this, the following statements become valid Colbert SQL join statements:
Standard SQL:
Select * from Employees natural outer join Sales;
Same queries in Colbert SQL:
1. Employees natural outer join Sales;
2. Employees natural outer Sales;
Group By
Why does SQL need GROUP BY ?
Group by is a rather redundant clause. It does not make sense to incorporate it because it is always clear that all the non-aggregated columns in a for clause are the group by fields.
Colbert simply treats it this way, meaning that if there is any aggregation in the for clause, all non-aggregated columns are considered group by fields.
The practice of including more fields in the GROUP BY clause than those specified in the SELECT clause in standard SQL does not make sense as it often results in ambiguity and potential errors in the query results.
Group by can still be employed for compatibility, of course.
Next statements are valid Colbert SQL aggregation statements:
Standard SQL:
Select DepId, Sex, Married, count(*), max (Salary) from Employees
group by DepId, Sex, Married;
Same Query in Colbert SQL:
DepId, Sex, Married, count, max Salary from Employees;
For grouping without an aggregation function, the distinct clause is always available:
Select distinct DepId from Employees ;
distinct DepId from Employees ;
which is same as:
DepId from (DepId, count from Employees );
Standard SQL:
Select Name, Salary, Salary – MaxSalary as SalDiff
from Employees natural join
(select DepId, max(Salary) as MaxSalary
from Employees group by DepId) ;
Same query in Colbert SQL:
Name, Salary, Salary – MaxSalary as SalDiff from
Employees natural (DepId, max Salary from Employees);
If no alias is provided for an aggregated or calculated field, Colbert assigns a name for that field consisting of only letters and digits.
Note that in Colbert it is also allowed to use count instead of count(*) and omit parentheses for functions with zero or one parameter.
Order by
Colbert allows for sorting within the from clause by letting an expression be followed by asc or desc.
If sorting is defined both in the from clause and by an order by clause, only the order by clause is considered.
Standard SQL:
Select Year, Target as Aim from Sales
where year > 2000 order by Target desc;
Same query in Colbert SQL
Year, Target desc as Aim from Sales where year > 2000;
Having
The use of Having is Superfluous and Unnecessarily Complicating.
Utilizing only the where clause, even in situations where traditional SQL demands having, produces a clear and unambiguous SQL statement.
Colbert treats where and having the same, allowing them to be interchangeable. The optimizer then determines the most effective approach to apply filtering at the right time and at the right place.
Next statements are valid Colbert SQL statements:
Standard SQL:
Select DepId, Sex, max(Salary) as MaxSalary, count(*)
from Employees
where DepId > 3
group by DepId, Sex
having MaxSalary > 2000
order by sex desc;
Same Query in Colbert SQL, the optimizer will split the where clause into two filterings each applied differently, just like the previous standard SQL example:
DepId, Sex desc, max Salary, count
from Employees
where DepId > 3 and MaxSalary > 2000;
Which is, in Colbert SQL, the same as:
DepId, Sex desc, max Salary, count from
(Employees where DepId > 3)
where MaxSalary > 2000;
Colbert SQL allows a where clause over both a non-aggregated field and an aggregated field, even if it cannot be split:
DepId, Sex desc, max Salary, count from Employees
where DepId > 3 or MaxSalary > 2000;
The result of the former query is the same as if the next script was executed:
create table Temp as
DepId, Sex desc, max Salary, count from Employees;
Select Temp where DepId > 3 or MaxSalary > 2000;
Other blogs
Colbert is a relational database management system (RDBMS) that introduces unique technology, offering innovative solutions to analytical challenges beyond conventional boundaries.
More details about these innovative solutions can be found at www.colbert.nl/#articles