Slim SQL

18 jan 2024

A Robust Improvement that maintains the Current SQL Standard while Enhancing its Power and Flexibility.

What’s wrong with SQL?

Absolutely nothing. In fact, it is a powerful language for gaining insights from your data. Moreover, the concept of a view serves as an efficient, dynamic mechanism to define how insights are derived from your data without the need for copying or moving data.
However, that being said, there are some issues with SQL. SQL is based on Relational Algebra, but its implementation is not very sound.

This article offers a concise technical overview of Colbert’s Slim SQL, an integral part of this Relational Database management System (RDBMS). It highlights the enhancements that differentiate Colbert’s Slim SQL while still permitting the utilization of traditional SQL and even intermingle them. Additionally, it delves into specific aspects like generic function definition.

Relational Algebra versus SQL

In the Relational Algebra a query operation is performed on a relation while the output of this operation is again a relation upon which next operations might be performed. Furthermore, like in mathematical expressions, parentheses aren’t always necessary but can be used for clarity and to ensure the correct order of operations. Parentheses also make the construction of Common Table Expressions (CTE) very easy and natural.
Colbert implements Relational Algebra in SQL in a way that you can still use normal SQL, but at the same time you benefit from the power and simplicity of a pure Relational Algebra.
Colbert achieves this by the recognition that:

♦ the projection (π) is actual the SQL from
♦ the selection (σ) is actual the SQL where
♦ the identity operator (ι) is actual the SQL Select

In SQL, the Select statement essentially functions as the identity operator, simply returning its argument unchanged. Therefore, it can be omitted if preferred, but it can also still be applied for compatibility reasons.

Colbert considers that the projection of fields is defined in the from clause.  This approach may indicate a departure from the traditional SQL syntax, where the Select clause typically handles projection.
However, this approach offers advantages in writing SQL statements by enabling the removal of superfluous language elements, facilitating easier writing and comprehension of SQL. Additionally, it’s still possible to compose standard SQL using these operators.

Nevertheless, this departure from standard SQL syntax could potentially impact compatibility. Therefore, it would be beneficial to have a feature that rewrites a Slim SQL statement into a standard SQL statement.

These recognitions basically define the basic SQL statement (π, σ, ι)as :

<relation> ::=
    <table-name> |
    <proj-columns> from <relation> |
    <relation> where <expression> |
    select <relation>  |
    <relation>  |
    <relation>  :  <expression>        # same as where

<table-name>  ::=  <identifier>
<alias-def>  ::=  as <identifier>
<sort-def>   ::=  asc  | desc

<column>  ::=  <expression>
<proj-column>  ::=  <column> {{<alias-def> | <sort-def>}}  |  *
<proj-columns>  ::=  {<proj-column> ^ , } 

{…}        ⇒ zero or more
{{…}}     ⇒ unique alternatives
{… ^ ; }  ⇒ delimited by ;
{…} +    ⇒ one or more
[…]        ⇒ optional
#           ⇒ rest of line is comment

{{A|B}}  <none> | A | B | AB | BA

<column-name> might be a <factor> in an <expression>

It is still possible to Compose Standard SQL using these Relational Algebra Operators.

Following this syntax, Colbert enables the creation of both standard SQL statements and Colbert’s Slim versions, and a seamless intermingling between them. This flexibility likely enhances the development experience, enabling developers to choose a syntax that best suits their needs without constraints.
 
The Colbert parser initially endeavors to evaluate and execute a provided statement to a <relation>, with an <identifier> potentially yielding a <relation> if that identifier is present as a table or a view in the Data Dictionary.
 
The syntax definition should focus on clarity, not on performance. Therefore, Colbert optimizes queries effectively by applying operators like filtering and sorting, regardless of the syntax used, at the right location at the right time.

Standard SQL:

Select * from Sales;

Same queries in Colbert SQL:

Sales;
Select Sales;
* from Sales;

 

Standard SQL:

Select * from Sales where Year > 2000;

Same query in Colbert SQL:

Sales where Year > 2000; 
Sales : Year > 2000;

 

Other valid Colbert SQL:

EmpId, Year from Sales;
Name, Salary, Salary – MaxSalary as SalDiff from
Employees natural (DepId, max Salary from Employees);

If no alias is provided for an aggregated or calculated field, Colbert assigns a name for that field consisting of only letters and digits.
In the last example parentheses simply construct Common Table Expression (CTE) 

Order by 

Additional Colbert SQL production rules concerning order by:

<relation> ::=
    <relation> order by <sort-columns> |
    <relation> by <sort-columns>                          # same as order by

<column>  ::=  <expression>
<sort-column>  ::=  <column> [<sort-def>]
<sort-columns>  ::=  {<sort-column> ^ , }

From these production rules, both the standard SQl emerges as well as a slim Colbert version. The latter allows for sorting within the from clause.

Standard SQL:

Select Year, Target as Aim from Sales
where year > 2000 order by Target desc;

Same query in Colbert SQL

Year, Target desc as Aim from Sales : year > 2000;

If sorting is defined both in the from clause and by an order by clause, only the order by clause is considered.

Join

Additional Colbert SQL production rules concerning join are:

<relation>  ::=  <relation> <join-def> <relation>
<join-def>  ::=  join  |  {<join-spec>}+  [ join ]  |  ,
<join-spec> ::=  outer | inner | left | right | natural 

Colbert does not specify the superfluous cross join as it is simply a normal join without a where clause.
With this, the following statements become valid Colbert SQL join statements:

Standard SQL:

Select * from Employees E natural outer join Sales S;

Same queries in Colbert SQL:

Employees natural outer join Sales;
Employees natural outer Sales;

Group By

Why does SQL need  GROUP BY ?

Group by is a rather redundant clause. It does not make sense to incorporate it because it is always clear that all the non-aggregated columns in a for clause are the group by fields.
Colbert simply treats it this way, meaning that if there is any aggregation in the for clause, all non-aggregated columns are considered group by fields.
The practice of including more fields in the GROUP BY clause than those specified in the SELECT clause in standard SQL does not make sense as it often results in ambiguity and potential errors in the query results.

Group by can still be employed for compatibility, of course.

Next statements are valid Colbert SQL aggregation statements:

Standard SQL:

Select DepId, Sex, Married, count(*), max (Salary) from Employees
group by DepId, Sex, Married;

Same Query in Colbert SQL:

DepId, Sex, Married, count, max Salary from Employees;


For grouping without an aggregation function, the distinct clause is always available:

Select distinct DepId from Employees ;
distinct DepId from Employees ;

which is same as:

DepId from (DepId, count from Employees );

Note that in Colbert it is also allowed to use count instead of count(*) and omit parentheses for functions with zero or one parameter.

Having

The use of Having is Superfluous and Unnecessarily Complicating.

Utilizing only the where clause, even in situations where traditional SQL demands having, produces a clear and unambiguous SQL statement.

Colbert treats where and having the same, allowing them to be interchangeable. The optimizer then determines the most effective approach to apply filtering at the right time and at the right place.

Next statements are valid Colbert SQL statements:

Standard SQL:

Select DepId, Sex, max(Salary) as MaxSalary, count(*)
from Employees
where DepId >3
group by DepId, Sex
having MaxSalary > 2000
order by sex desc;

Same Query in Colbert SQL, the optimizer will split the where clause into two filterings each applied differently, just like the previous standard SQL example:

DepId, Sex desc, max Salary, count
from Employees
where DepId >3 and MaxSalary > 2000;

Which is, in Colbert SQL, the same as:

DepId, Sex desc, max Salary, count from
(Employees where DepId >3)
where MaxSalary > 2000;

Colbert SQL allows a where clause over both a non-aggregated field and an aggregated field, even if it cannot be split:

DepId, Sex desc, max Salary, count from Employees
where DepId >3 or MaxSalary > 2000;

The result of the former query is the same as if the next script was executed:

create table Temp as
   DepId, Sex desc, max Salary, count from Employees;

Temp where DepId > 3 or MaxSalary > 2000;

Function definition, typed or generic

Defining a function in SQL can be a cumbersome task.
Colbert simplifies this process significantly by allowing the parser to handle much of the work. For instance, take the expression 5/9 * (F - 32) for converting from Fahrenheit to Celsius. It’s evident what is intended, and it’s clear that F is a parameter of type int or double. Consequently, Colbert streamlines the function creation process by only requiring the definition of the function name and the expression.

Standard SQL:

create function FToC(@fahrenheit float)
returns float
as
begin
declare @celsius float;
set @celsius = (@fahrenheit – 32) * 5/9;
return @celsius;
end;

Colbert SQL:

Create func FToC as 5/9 * (F – 32)

If no types are specified in a function definition, Colbert treats it as a generic definition. In this case, Colbert generates all conceivable functions for every combination of parameter types. The order of the formal parameters is considered to be the order of appearance in the expression.

Nevertheless, Colbert provides the flexibility to specify types, allowing users to limit the number of implementations or to define a formal parameter order different from the order in which the actual parameters appear in the expression.

The syntax definition is as follows:

<sql-message>  ::=  create function<type-def> ) ] <identifier>  as <expression>

<type-def>  ::=
    <identifier> , {<identifier> ^ , }+  |                               # specify order of parameters
    <type> =>  <type>  |
    ( {<type> ^ ,} )  => <type>   |
    ( {<named-type> ^ ,} )  => <type> 

<type>  ::=  int, double, bool, string, datetime

<named-type>  ::=
    <identifier> <type>  <type>  <identifier>               #  Id int  as well as  int Id  is valid

Standard SQL:

create function Next(@X int )
returns int 
as
begin
declare @result int ;
set @result = @X + 1;
return @result;
end;

Colbert SQL generic declaration:

Create func next as X + 1

This definition produces the same result as the following Colbert SQL script with three separate overloaded definitions:

drop next;
create func ( int => int ) next as X + 1;
create func ( double => double ) next as X + 1;
create func ( datetime => datetime ) next as X + 1;

 

Specify the formal parameter order (L, R) differently from the order of appearance in the expression (R, L):

create func (L, R) delta as round ((R – L) * 100 / L, 2);

 

Specify parameters and types:

create func (DateTime => DateTime) NextDay as D + 1;

 

Specify parameters and types, overloading is allowed: 

create func (DateTime => DateTime) NextDay as D + 1;
create func (DateTime, Int => DateTime) NextDay as D + Nr;

Scope

Well-defined and straightforward scope rules, along with an orthogonal concept, contribute to the creation of simple and straightforward SQL statements.

Colbert follows a simple scope mechanism: all relation fields, including those defined in a from clause, are in scope and can be used in any expression within a where clause, an order by clause, or in the same from clause.

When a relation with a from clause is enclosed in parentheses, only the fields defined in that from clause are in scope outside the parentheses

If no alias is provided for an aggregated or calculated field, Colbert assigns a name for that field consisting of only letters and digits.

Colbert SQL:

*, Salary – MaxSalary as SalDif, Saldif < 0 and Married as Coh
from
Employees natural
(DepID, max Salary from Employees)
by Coh desc
where SalDif <> 0 and MaxDep < 5000

This is a valid Colbert SQL statement:

  • Parentheses are used to define a subquery.
  • A natural join is performed based on the only common field, DepId.
  • Sorting is performed on a boolean calculated field
  • Filtering includes both calculated and aggregated fields.
  • The order by, from and where clauses can be applied independently.
  • The Optimizer ensures an execution path as if the SQL statement were entered using standard SQL syntax.

 

First  Last  

Colbert offers a First and Last prefix:

<relation> ::= first <relation>  |  last <relation> 

It yields the first or last record by essentially preventing the relation from responding to a search for subsequent records based on the current criteria and order.

Who is the individual in my department with the highest salary, which is less than mine?

Me.Name, Others.Name from
Employees Me outer join last (Employees Others by Salary)
using DepId on Others.Salary > Me.Salary

This is a valid Colbert SQL statement:

  • Parentheses are used to define a subquery with an order.
  • A outer join is performed based on DepId and Salary 
 

What is the name of the oldest employee ?

first Employees by Birthdate

Further SQL Enhancements

Additional Extensions that Simplify SQL Writing

Colbert offers additional extensions that simplify SQL writing. Some of them will be addressed in upcoming blogs, as they are interesting and significant enough to justify dedicated posts. 

Here, some minor yet valuable extensions will be showcased, primarily consisting of syntactic sugar.

1) ==  Compare string with identifier.
<bool>  ::=  <string> == <identifier>

Employees : Name == Anna;

2) like operator: Regex with only metacharacters * for zero or more characters and ? for exactly one character. If no metacharacter are present in a match string ‘S’,  it is considered to be ‘*S*’.
Likes operator: Same as like but the match string is an <identifier>.

Employees Name like ‘P*’; 

Employees Name like ‘*An*’;
Employees : Name like ‘An’;
Employees Name likes An;

Last three examples are all the same.

3) DateTime. Text that matches a DateTime pattern is parsed as a DateTime accordingly:

Employees where Birthdate = 3-5-1987

4)  [ ]  and :::  Special where clause. Rel[V] or Rel ::: V  are both shortcuts for: Rel where Fv = V with Fv representing the first field in the current scope that shares the same type as V.
If V is an identifier it is a shortcut for Rel where Fv likes V

 

Employees [‘Anna’] is the same as
Employees : Name = ‘Anna’  (Name is the first string field)

Employees [Anna]   is the same as
Employees Name Likes Anna;

Employees[3-5-1987] is the same as
Employees where BirthDate = 3-5-1987 (BirthDate is the first DateTime field)

5)  { }  and ???  Special where clause analogous to the previous shortcut.
Rel{V} or Rel ??? V  are both shortcuts for: Rel where Fv1 = V or Fv2 = V with Fvi representing all the fields in the current scope that shares the same type as V.

DataDictionary ??? ‘date’

Filter the DataDictionary for a record containing ‘BirthDate’ in any string field without explicitly specifying a field name.

6)  Colbert offers a simple operator called when and is defined as
V when B  =>   if B then V else null

This function proves valuable when aggregating over a subset:

Sex,
avg (Salary),
avg (Salary when Married) ,
avg (Salary when not Married)
from Employees:

7)  The Colbert Parser endeavors to evaluate and execute a provided statement to a <relation>, any other SQL statement or an expression, in that order. Therefore next statements can be executed without the use of Select . . . from Dual:

23 + 5 * 60;
create const BoBe as 28-8-1672;
BoBe;
AgeOf(28-8-1672);
DateNow – BoBe;
NextDay BoBe;

Upcoming blog

The current blog discusses certain aspects, while future posts will delve more extensively into these subjects and also present other extensions.

Contact

© Bert Vegter 2024