Structured Query Language
History and Development
Structured Query Language (SQL) is a database language for managing data in a relation DBMS. Its original inception was based on a paper by Edgar F. Codd in 1970 titled A Relational Model of Data for Large Shared Data Banks (Codd 1970). Two employees working at IBM in the 1970s, Donald D. Chamberlin and Raymond F. Boyce, developed the first version of SQL in 1974 (Chamberlin and Boyce 1974).
The first official standard of SQL was SQL-86, or SQL1, which was published in 1986 by the American National Standards Institute (ANSI). The following table shows the release dates of major SQL standards along with a brief description of the changes made in each version.
Standard | Name | Description |
---|---|---|
SQL-86 | SQL1 | First official standard of SQL |
SQL-89 | SQL2 | Added support for integrity constraints, views, and assertions |
SQL-92 | SQL2 | Added support for triggers, recursive queries, and support for procedural programming |
SQL:1999 | SQL3 | Added support for object-relational features |
SQL:2003 | SQL3 | Added support for XML, window functions, and support for regular expressions |
SQL:2006 | SQL3 | Added more XML storage features and XQuery support |
SQL:2008 | SQL3 | Added support for TRUNCATE TABLE and enhanced MERGE statements |
SQL:2011 | SQL3 | Added support for temporal data |
SQL:2016 | SQL3 | Added support for JSON |
SQL:2023 | SQL3 | Added support for Propery Graph Queries and new JSON features |
Schemas
In our Introduction to Databases we discussed the concept of a schema as a definition of the structure of a database. In SQL, a schema is a collection of database objects, such as tables, views, and indexes. A schema is owned by a database user and has the same name as the user. A database user can own multiple schemas, and a schema can be owned by multiple users. A schema can also be owned by a role, which is a collection of users. A role can own multiple schemas, and a schema can be owned by multiple roles.
There are several practical reasons for which we would want to create multiple schemas. For example, a database might be used by both a Human Resources and Healthcare Management application. Creating two separate schemas would ensure that data for each application is kept secure from unauthorized users. Multiple schemas are also used for testing and development processes. Large structural changes to an application may require a new scheme to be created. New features can be developed in the new schema while the old schema is still being used by the application.
The following command creates a new schema named MedApp
and assigns it to the user MedAdmin
.
CREATE SCHEMA MedApp AUTHORIZATION MedAdmin;
Data Types
SQL supports a wide variety of data types. The following table shows the most common data types supported by SQL.
Data Type | Description |
---|---|
CHAR(n) | Fixed-length character string. The maximum length is n characters. |
VARCHAR(n) | Variable-length character string. The maximum length is n characters. |
INT | Integer value. The maximum value is 2^31 - 1 . |
SMALLINT | Integer value. The maximum value is 2^15 - 1 . |
DECIMAL(i,j) | Fixed-point number. The maximum precision is 38 digits. The maximum scale is 38 digits. |
NUMERIC(i,j) | Fixed-point number. The maximum precision is 38 digits. The maximum scale is 38 digits. |
REAL | Floating-point number. The maximum precision is 6 digits. |
DOUBLE | Floating-point number. The maximum precision is 15 digits. |
DATE | Date value. The range is 1000-01-01 to 9999-12-31 . |
TIME | Time value. The range is 00:00:00 to 23:59:59 . |
TIMESTAMP | Date and time value. The range is 1000-01-01 00:00:00 to 9999-12-31 23:59:59 . |
CLOB(n) | Specifies columns with large text values. Maximum length specified in kilobytes (K), megabytes (M), or gigabytes (G) |
BIT(n) | Fixed-length bit string. |
BIT VARYING(n) | Variable-length bit string. |
BLOB(n) | Binary Large Object - used for images, video, and other large items. |
Creation
Creating schemas, databases, and tables is done with the CREATE
command. The following command creates a new database named RPG
.
CREATE DATABASE RPG;
When creating a new table, we must specify the name of the table and the attributes of the table. The following command creates a new table named Users
with four attributes.
CREATE TABLE Users (
user_id INT,
username VARCHAR(50),
email VARCHAR(50),
created_at TIMESTAMP
);
Constraints
Constraints allow us to add rules to our database that ensure the integrity of our data. There are several types of constraints that can be added to a table. For example, if a user is deleted, we may want to delete all of the user’s posts as well. This can be accomplished by adding a CASCADE
constraint to the DELETE
statement. We can also set a default value to each attribute. Constraints such as CHECK
and UNIQUE
can be added to ensure that the data is valid and unique. The following table shows the most common constraints supported by SQL.
Constraint | Description |
---|---|
NOT NULL | Ensures that a column cannot have a NULL value. |
UNIQUE | Ensures that all values in a column are unique. |
PRIMARY KEY | A combination of a NOT NULL and UNIQUE. |
FOREIGN KEY | Ensures that values in a column match values in another table’s column. |
CHECK | Ensures that all values in a column satisfy a specific condition. |
DEFAULT | Sets a default value for a column when no value is specified. |
INDEX | Used to create and retrieve data from the database very quickly. |
AUTO INCREMENT | Automatically generates a unique number when a new record is inserted into a table. |
When creating the Users
table above, we may want to ensure that the user_id
attribute is unique. We can do this by adding a UNIQUE
constraint to the user_id
attribute. It is also possible to have it auto increment so that we do not have to specify a value for it when inserting a new user.
CREATE TABLE Users (
user_id INT UNIQUE AUTO_INCREMENT,
username VARCHAR(50),
email VARCHAR(50),
created_at TIMESTAMP
);
The following command creates a new table named Characters
with a VARCHAR
attribute named Name
which is set to NOT NULL
.
CREATE TABLE Characters (
Name VARCHAR(50) NOT NULL
);
Constraints can also be added after the initial attribute declaration. When creating the Characters
table, if we want to state that the user_id
field should be a foreign key, we can add a FOREIGN KEY
constraint to the user_id
attribute.
CREATE TABLE Characters (
id INT UNIQUE AUTO_INCREMENT,
Name VARCHAR(50) NOT NULL,
user_id INT,
CONSTRAINT fk_user_id
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
The constraint is given the name fk_user_id
and is added to the user_id
attribute. The FOREIGN KEY
constraint states that the user_id
attribute references the user_id
attribute in the Users
table.
Retrieving Data
Retrieving data from an SQL database is done with an SFW
query, SELECT-FROM-WHERE
.
SELECT <attribute list>
FROM <table list>
WHERE <condition>
For example, we can get the experience and level of a character named Atticus
from the Characters
table with the following query.
SELECT experience, level
FROM Characters
WHERE Name = 'Atticus';
The attributes we retrieve in a query are referred to as the projection attributes
. This query SELECT~s a ~Character
from all rows of the Character
table which satisfy the selection condition of the WHERE
clause.
We can also query the e-mail addresses of all users who have a character that is a human.
SELECT email
FROM Users, Characters, Races
WHERE Users.user_id = Characters.user_id AND Characters.race_id = Races.id AND Races.name = 'Human';
The WHERE
clause in this example is an example of a join condition since it combines attributes from multiple tables. Note that there are two tables which have a user_id
attribute, so we must differentiate them by prepending the table name before the attribute name. This is how ambiguities are solved in SQL.
You can also use the AS
keyword to shorthand the table names in your query. The previous query can be rewritten as
SELECT U.username
FROM Users AS U, Characters AS C, Races AS R
WHERE U.user_id = C.user_id AND R.id = C.race_id AND R.name = 'Human';
Duplicate Return Values
The previous query returns the names of all users who have a Human
character. If a user has multiple characters that are Human
, it will return their name multiple times. If we are instead only interested in the names of users who have a Human
character, we can use the DISTINCT
keyword to remove duplicate values.
SELECT DISTINCT U.username
FROM Users AS U, Characters AS C, Races AS R
WHERE U.user_id = C.user_id AND R.id = C.race_id AND R.name = 'Human';
Tables as Sets
SQL uses some set operations from set theory. It supports the UNION
, set difference EXCEPT
, and set intersection INTERSECT
operations. The following query returns the names of all users who have a Human
character or a Gnome
character.
(SELECT DISTINCT U.username
FROM Users as U, Characters as C, Races as R
WHERE U.user_id = C.user_id AND C.race_id = R.id AND R.name = 'Human')
UNION
(SELECT DISTINCT U.username
FROM Users as U, Characters as C, Races as R
WHERE U.user_id = C.user_id AND C.race_id = R.id AND R.name = 'Gnome');
If we wanted to find the users who had both a Human
character and a Gnome
character, we could use the INTERSECT
operator instead.
(SELECT DISTINCT U.username
FROM Users as U, Characters as C, Races as R
WHERE U.user_id = C.user_id AND C.race_id = R.id AND R.name = 'Human')
INTERSECT
(SELECT DISTINCT U.username
FROM Users as U, Characters as C, Races as R
WHERE U.user_id = C.user_id AND C.race_id = R.id AND R.name = 'Gnome');
We can also use the EXCEPT
operator to find the users who have a Human
character but not a Gnome
character.
(SELECT DISTINCT U.username
FROM Users as U, Characters as C, Races as R
WHERE U.user_id = C.user_id AND C.race_id = R.id AND R.name = 'Human')
EXCEPT
(SELECT DISTINCT U.username
FROM Users as U, Characters as C, Races as R
WHERE U.user_id = C.user_id AND C.race_id = R.id AND R.name = 'Gnome');
Pattern Matching
SQL supports pattern matching with the LIKE
operator. The LIKE
operator is used in the WHERE
clause to search for a specified pattern in a column. This is different from equality operators since it allows us to search for patterns rather than exact matches. The following table shows the most common wildcards used in SQL.
Wildcard | Description |
---|---|
% | Matches any string of zero or more characters. |
_ | Matches any single character. |
[] | Matches any single character within the brackets. |
[^] | Matches any single character not within the brackets. |
The following query returns the names of all simple items in the Items
table. These can be found based on their description, since the term simple
is not explicitly mentioned in the name.
SELECT name
FROM Items
WHERE description LIKE '%simple%';
We can also query based on arithmetic ranges. For example, we might be interested in the items that are less than 100 gold.
SELECT name
FROM Items
WHERE value < 100;
Ordering
SQL allows us to order the results of our query with the ORDER BY
clause. The following query returns the names of all items in the Items
table ordered by their value.
SELECT name
FROM Items
ORDER BY value;
We can also order by multiple attributes. The following query returns the names of all items in the Items
table ordered by their value and then their name.
SELECT name
FROM Items
ORDER BY value, name;
Modifying Data
Inserting Data
We previously saw an example of inserting new data. Let’s insert a new user account to our table. If we are inserting a value for every attribute, we can omit the attribute list.
INSERT INTO Users
VALUES (7, 'Alex', 'alex.dillhoff@uta.edu', '2023-10-31 15:26:17');
If we are only inserting values for some attributes, we must specify the attribute list.
INSERT INTO Users (user_id, username, email)
VALUES (7, 'Alex', 'alex.dillhoff@uta.edu');
If we attempt to leave out a value for an attribute that is not nullable, we will get an error. While working on our database, we may have realized that some of these important attributes should always be specified. We can add a NOT NULL
constraint to these attributes to ensure that they are always specified. We will look at ways of modifying tables in the next section.
Updating Data
Updating data is a common task and is easily supported by the UPDATE
command. In an RPG, players will use items, gain experience, and level up. All of these will require modifications to existing tables. For example, if we wish to update the experience of a character, we can use the following query.
UPDATE Characters
SET experience = experience + 100
WHERE name = 'Atticus';
Deleting Data
Deleting a tuple or several tuples is straightforward in SQL. The following query deletes the user with the user_id
of 7
from the Users
table.
DELETE FROM Users
WHERE user_id = 7;
If we want to delete all tuples from a table, we can use the TRUNCATE TABLE
command. This command is faster than deleting all tuples with the DELETE
command since it does not log each deletion. However, it cannot be used if the table is referenced by a foreign key constraint.
TRUNCATE TABLE Users;
When deleting tuples from a database, it’s important to consider any foreign key constraints that the table may have. If we delete a tuple from a table that is referenced by a foreign key constraint, we may end up with orphaned tuples. For example, if we delete a user from the Users
table, we may end up with a character that has no user. We can avoid this by adding a CASCADE
constraint to the DELETE
statement. This will delete all tuples that reference the tuple we are deleting.
Nested Queries
Nested queries allow us to make more complex queries on subsets of data returned from an original query. A nested query can be placed in any of the SELECT
, FROM
, or WHERE
clauses. The query that uses the results of the nested query is called the outer query.
What if we wanted a list of users who had at least 1 character whose class was the least represented class across all characters? We could first make a query to identify which class is the least represented before using that to find the users who have a character with that class.
SELECT username
FROM Users
WHERE user_id IN (SELECT user_id
FROM Characters
WHERE class_id = (SELECT class_id
FROM Characters
GROUP BY class_id
ORDER BY COUNT(*) ASC
LIMIT 1));
The innert-most query returns the class_id
of the least represented class by counting the number of characters in each class and ordering them in ascending order. The middle query returns the user_id
of all characters whose class is the least represented class. The outer query returns the username
of all users who have a character with the least represented class.
Pay attention to the second WHERE
clause that uses the =
operator instead of IN
. This is because the nested query returns a single value and single tuple. If we used the IN
operator, we would get an error since MySQL does not support the LIMIT
and IN/ALL/ANY/SOME
operators together.
Correlated Nested Queries
Some nested queries would have to execute for each tuple in the outer query. Consider the following query which returns the name and ID of all Human characters.
SELECT C.id, C.name
FROM Characters as C
WHERE C.race_id IN (SELECT R.id
FROM Races as R
WHERE R.name = 'Human');
This query is an example of a correlated nested query since the inner query is dependent on the outer query. The inner query must be executed for each tuple in the outer query. This can be inefficient if the outer query returns a large number of tuples.
Since this query uses only an IN
operator, we can rewrite it as a single block query.
SELECT C.id, C.name
FROM Characters as C, Races as R
WHERE C.race_id = R.id AND R.name = 'Human';
Let’s revisit an earlier query to introduce the EXISTS
operator. If we want to query the user names and IDs of all users who have an elf character, we can use the following query.
SELECT U.id, U.username
FROM Users as U
WHERE EXISTS (SELECT *
FROM Characters as C, Races as R
WHERE C.user_id = U.id AND C.race_id = R.id
AND R.name = 'Elf');
We can also use the NOT EXISTS
operator to find the users who do not have an elf character. This works opposite to the EXISTS
operator.
Joined Tables
Joined tables are the result of a query that combines rows from two or more tables. The syntax itself may be easier to understand as compared to the nested queries written above. The following query returns the names of all users who have a Human
character.
SELECT U.username, C.name
FROM (Users as U JOIN Characters as C ON U.user_id = C.user_id), Races as R
WHERE C.race_id = R.id AND R.name = 'Human';
Here, the JOIN
function is used to combine both tables on the condition that the user_id
of the Users
table is equal to the user_id
of the Characters
table. The WHERE
clause is used to filter the results to only those that have a Human
character.
This type of join is also referred to as an inner join since it only returns rows that satisfy the join condition. There are several other types of joins that we can use to combine tables. If we wanted to return the same information along with all of the users who do not have a Human
character, we can use a left outer join.
SELECT U.username, C.name
FROM (Users as U LEFT OUTER JOIN
(Characters as C JOIN Races as R ON C.race_id = R.id AND R.name = 'Human')
ON U.user_id = C.user_id);
Aggregate Functions
Aggregate functions are used to perform calculations on a set of values and return a single value. The following table shows the most common aggregate functions supported by SQL.
Function | Description |
---|---|
AVG() | Returns the average value of a numeric column. |
COUNT() | Returns the number of rows that match a specified criteria. |
MAX() | Returns the maximum value of a column. |
MIN() | Returns the minimum value of a column. |
SUM() | Returns the sum of all values in a column. |
These can be used in the SELECT
clause. The following query returns the average level of all characters.
SELECT AVG(level)
FROM Characters;
These statistics can be used with more complex queries. The following query returns the name of the user with the highest level character.
SELECT U.username
FROM Users as U, Characters as C
WHERE U.user_id = C.user_id AND C.level = (SELECT MAX(level)
FROM Characters);
In the next example, the query returns the tuple of the highest level Human character.
SELECT C.*
FROM (Characters as C JOIN Races as R ON C.race_id = R.id AND R.name = 'Human')
WHERE C.level = (SELECT MAX(level)
FROM Characters);
Grouping
As we just saw, aggregate functions permit some preliminary data analysis. We can take this further using grouping. For example, we calculated above the average level of all characters. What if we wanted to compute the average level of each class? We can use the GROUP BY
clause to group the tuples by class.
SELECT C.class_id, AVG(C.level)
FROM Characters as C
GROUP BY C.class_id;
In the next example, we will run a similar query except we will also include the number of distinct users who have a character of that class.
SELECT C.class_id, AVG(C.level), COUNT(DISTINCT C.user_id)
FROM Characters as C
GROUP BY C.class_id;
WITH
Clause
Let’s say we wanted to get the actual class name along with the users that had a character with the least represented class. We can use a nested query in the FROM
clause to get the class name.
SELECT username, Classes.name
FROM Users, Classes
WHERE user_id IN (SELECT user_id
FROM Characters
WHERE class_id = (SELECT class_id
FROM Characters
GROUP BY class_id
ORDER BY COUNT(*) ASC
LIMIT 1))
AND Classes.id = (SELECT class_id
FROM Characters
GROUP BY class_id
ORDER BY COUNT(*) ASC
LIMIT 1);
This query might immediately come across as inefficient since we are making the same query multiple times. If you agree, your intuition would be right. We can use the WITH
clause to make this query more efficient.
WITH LeastUsedClass AS (
SELECT class_id
FROM Characters
GROUP BY class_id
ORDER BY COUNT(*) ASC
LIMIT 1
)
SELECT U.username, C.name
FROM Users U
JOIN Characters CH ON U.user_id = CH.user_id
JOIN Classes C ON C.id = CH.class_id
WHERE CH.class_id = (SELECT class_id FROM LeastUsedClass);
Modifying Tables
Modifying databases used in production is inevitable. Typically, you will modify an offline test version before deploying it, but the process is the same. The ALTER TABLE
command is used to modify tables. The following table shows the most common modifications that can be made to a table.
Command | Description |
---|---|
ADD | Adds a new column to the table. |
DROP | Deletes a column from the table. |
MODIFY | Changes the data type of a column. |
RENAME | Changes the name of a column. |
When we originally created the Users
table, we did not specify a NOT NULL
constraint for the username
attribute. We can add this constraint with the following command.
ALTER TABLE Users
MODIFY username VARCHAR(50) NOT NULL;
Another mistake that was made was with the name of the ID column. We can rename this column with the following command.
ALTER TABLE Users
RENAME COLUMN user_id TO id;
Scenario: Updating Foreign Keys
Let’s say we want to delete anything related to a user if that user is deleted from the Users
table. That means all characters and inventories associated with those characters should be deleted. Currently, attempting to delete a user will fail with the following error:
SQL Error [1451] [23000]: Cannot delete or update a parent row: a foreign key constraint fails (`rpg`.`Inventory`, CONSTRAINT `Iventory_ibfk_1` FOREIGN KEY (`character_id`) REFERENCES `Characters` (`id`))
It looks like the foreign key also has a typo. Let’s recreate this foreign key so that it will delete all characters and inventories associated with a user. First we drop the current key.
ALTER TABLE Inventory
DROP FOREIGN KEY Iventory_ibfk_1;
Then we add a new one.
ALTER TABLE Inventory
ADD CONSTRAINT Inventory_ibfk_1
FOREIGN KEY (character_id) REFERENCES Characters(id) ON DELETE CASCADE;
Summary
A typical SQL query consists of a SELECT
clause, a FROM
clause, and a WHERE
clause. The SELECT
clause specifies the attributes to be returned. The FROM
clause specifies the tables to be queried. The WHERE
clause specifies the conditions that must be satisfied for a tuple to be returned.
As we saw in the examples above, there are up to 6 clauses that can be used in a query. The following table shows the clauses that can be used in a query and the order in which they must appear.
Clause | Description |
---|---|
SELECT | Specifies the attributes to be returned. |
FROM | Specifies the tables to be queried. |
WHERE | Specifies the conditions that must be satisfied for a tuple to be returned. |
GROUP BY | Groups the tuples by a specified attribute. |
HAVING | Specifies the conditions that must be satisfied for a group to be returned. |
ORDER BY | Specifies the order in which the tuples are returned. |