Null is a special marker used in Structured Query Language (SQL) to indicate that a
data value does not exist in the database. Introduced by the
creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the
requirement that all true relational database management systems
(RDBMS) support a representation of "missing information
and inapplicable information". Codd also introduced the use of the
lowercase Greek omega (ω) symbol to represent Null in database
NULL is also an SQL reserved keyword
used to identify the Null special marker.
Null has been the focus of controversy and a source of debate because of its associated Three-Valued Logic (3VL), special requirements for its use in SQL joins, and the special handling required by aggregate functions and SQL grouping operators. Although special functions and predicates are provided to properly handle Nulls, opponents feel that resolving these issues introduces unnecessary complexity and inconsistency into the relational model of databases.
Null was introduced by E. F. Codd as a method of representing missing data in the relational model. Codd later reinforced his requirement that all RDBMS' support Null to indicate missing data in a two-part series published in ComputerWorld magazine. Codd also introduced a ternary (three-valued) logic, consisting of the truth values True, False, and Unknown, which is closely tied to the concept of Null. The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.
Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Null-type values has not gained widespread acceptance.
Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown. The logical result of the expression below, which compares the value 10 to Null, is Unknown:
10 = NULL -- Results in Unknown
However, certain operations on Null can return values if the value of Null is not relevant to the outcome of the operation. Consider the following example in which the OR statement is evaluated in short-circuited form:
TRUE OR NULL -- Results in True
In this case, the fact that the value on the right of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of the value on the right.
SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic (3VL). The rules governing SQL three-valued logic are shown in the tables below (p and q represent logical states)"
Basic SQL comparison operators always return Unknown when
comparing anything with Null, so the SQL standard provides for two
special Null-specific comparison predicates. The
IS NOT NULL predicates test whether
data is, or is not, Null.
Null is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific data type. Because of this, it is sometimes mandatory (or desirable) to explicitly convert Nulls to a specific data type. For example, if overloaded functions are supported by the RDBMS, SQL might not be able to automatically resolve to the correct function without knowing the data types of all parameters, including those for which Null is passed.
SQL three-valued logic is encountered in Data Manipulation Language
(DML) in comparison predicates of DML statements and queries.
WHERE clause causes the DML statement to act on
only those rows for which the predicate evaluates to True. Rows for
which the predicate evaluates to either False or Unknown are not
acted on by
DELETE DML statements, and are
SELECT queries. Interpreting
Unknown and False as the same logical result is a common error
encountered while dealing with Nulls.
The following simple example demonstrates this fallacy:
SELECT * FROM t WHERE i = NULL;
The example query above logically always returns zero rows
because the comparison of the i column with Null always
returns Unknown, even for those rows where i is Null. The
Unknown result causes the
SELECT statement to
summarily discard each and every row. (However, in practice, some
SQL tools will retrieve rows using a comparison with Null.)
CASE expressions operate under the same rules
as the DML
WHERE clause rules for Null. Because it can
be evaluated as a series of equality comparison conditions, a
CASE expression cannot check for the
existence of Null directly. A check for Null in a simple
CASE expression always results in Unknown, as in the
SELECT CASE i WHEN NULL THEN 'Is Null' -- This will never be returned WHEN 0 THEN 'Is Zero' -- This will be returned when i = 0 WHEN 1 THEN 'Is One' -- This will be returned when i = 1 END FROM t;
Because the expression
i = NULL evaluates to
Unknown no matter what value column i contains (even if it
contains Null), the string
'Is Null' will never be
CASE expression also returns
the first value for which the result of the comparison predicate
evaluates to True, including comparisons using the
IS NOT NULL comparison predicates. The
following example shows how to use a searched
expression to properly check for Null:
SELECT CASE WHEN i IS NULL THEN 'Null Result' -- This will be returned when i is NULL WHEN i = 0 THEN 'Zero' -- This will be returned when i = 0 WHEN i = 1 THEN 'One' -- This will be returned when i = 1 END FROM t;
In the searched
CASE expression, the string
'Null Result' is returned for all rows in which
i is Null.
The primary place in which SQL three-valued logic intersects
with SQL Data Definition Language (DDL)
is in the form of check constraints.
A check constraint placed on a column operates under a slightly
different set of rules than those for the DML
clause. While a DML
WHERE clause must evaluate to True
for a row, a check constraint must not evaluate to False. This
means that a check constraint will succeed if the result of the
check is either True or Unknown. The following example table with a
check constraint will prohibit any integer values from being
inserted into column i, but will allow Null to be inserted
since the result of the check will always evaluate to Unknown for
CREATE TABLE t ( i INTEGER, CONSTRAINT ck_i CHECK ( i < 0 AND i = 0 AND i > 0 ) );
In order to constrain a column to reject Nulls, the
NULL constraint can be applied, as shown in the example
NOT NULL constraint is semantically
equivalent to a check constraint
IS NOT NULL predicate.
CREATE TABLE t ( i INTEGER NOT NULL );
SQL/PSM (SQL Persistent Stored Modules) defines procedural extensions for SQL,
such as the
IF statement. However,
the major SQL vendors have historically included their own
proprietary procedural extensions. Procedural extensions for
looping and comparisons operate under Null comparison rules similar
to those for DML statements and queries. The following code
fragment, in ISO SQL standard format, demonstrates the use of Null
3VL in an
IF i = NULL THEN SELECT 'Result is True' ELSEIF NOT(i = NULL) THEN SELECT 'Result is False' ELSE SELECT 'Result is Unknown';
IF statement performs actions only for those
comparisons that evaluate to True. For statements that evaluate to
False or Unknown, the
IF statement passes control to
ELSEIF clause, and
finally to the
ELSE clause. The
result of the code above will always be the message
is Unknown' since the comparisons with Null always evaluate
SQL outer joins,
including left outer joins, right outer joins, and full outer
joins, automatically produce Nulls as placeholders for missing
values in related tables. For left outer joins, for instance, Nulls
are produced in place of rows missing from the table appearing on
the right-hand side of the
LEFT OUTER JOIN operator.
The following simple example uses two tables to demonstrate Null
placeholder production in a left outer join.
The first table (Employee) contains employee ID numbers and names, while the second table (PhoneNumber) contains related employee ID numbers and phone numbers, as shown below.
The following sample SQL query performs a left outer join on these two tables.
SELECT e.ID, e.LastName, e.FirstName, pn.Number FROM Employee e LEFT OUTER JOIN PhoneNumber pn ON e.ID = pn.ID;
The result set generated by this query demonstrates how SQL uses Null as a placeholder for values missing from the right-hand (PhoneNumber) table, as shown below.
Inner joins and cross joins, also available in standard SQL, do not generate Null placeholders for missing values in related tables.
Care must be taken when using nullable columns in SQL join
criteria. Because a Null is not equal to any other Null, Nulls in a
column of one table will not join to Nulls in the related column of
another table using the standard equality comparison operators. The
COALESCE function or
can be used to "simulate" Null equality in join criteria, and the
IS NULL and
IS NOT NULL predicates can be
used in the join criteria as well.
The following predicate tests for equality of the values A and B
and treats Nulls as being equal. The
is required since A = B returns a Null value if at least one of A
or B is Null and
NULL OR FALSE is Null itself.
IFNULL( A = B, FALSE ) OR ( A IS NULL AND B IS NULL )
Because Null is not a data value, but a marker for an unknown value, using mathematical operators on Null results in an unknown value, which is represented by Null. In the following example, multiplying 10 by Null results in Null:
10 * NULL -- Result is NULL
This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception - division by zero". Though this behavior is not defined by the ISO SQL standard many DBMS vendors treat this operation similarly. For instance, the Oracle, PostgreSQL, MySQL Server, and Microsoft SQL Server platforms all return a Null result for the following:
NULL / 0
String concatenation operations, which are
common in SQL, also result in Null when one of the operands is
The following example demonstrates the Null result returned by
using Null with the SQL
|| string concatenation
'Fish ' || NULL || 'Chips' -- Result is NULL
This is not true for all database implementations. In an Oracle RDBMS for example NULL and the empty string are considered the same thing and therefore 'Fish ' || NULL || 'Chips' results in 'Fish Chips'.
SQL defines aggregate functions to simplify server-side aggregate calculations on data. Almost all aggregate functions perform a Null-elimination step, so that Null values are not included in the final result of the calculation. This implicit Null elimination, however, can have an impact on aggregate function results.
The following example table results in different results being
returned for each column when the SQL
aggregate function is applied:
AVG aggregate function returns 233 when
applied to column i, but returns 175 when applied to
column j. The aggregate function's Null-elimination step
accounts for the difference in these results. The only aggregate
function that does not implicitly eliminate Null is the
Because SQL:2003 defines all Null markers as being unequal to
one another, a special definition was required in order to group
Nulls together when performing certain operations. SQL defines "any
two values that are equal to one another, or any two Nulls", as
This definition of not distinct allows SQL to group and
sort Nulls when the
GROUP BY clause (and other
keywords that perform grouping) are used.
Other SQL operations, clauses, and keywords use "not distinct" in their treatment of Nulls. These include the following:
PARTITION BYclause of ranking and windowing functions like
EXCEPToperator, which treat NULLs as the same for row comparison/elimination purposes
DISTINCTkeyword used in
The SQL standard does not explicitly define a default sort order
for Nulls. Instead, on conforming systems, Nulls can be sorted
before or after all data values by using the
NULLS LAST clauses of the
BY list, respectively. Not all DBMS vendors implement this
functionality, however. Vendors who do not implement this
functionality may specify different treatments for Null sorting in
B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < <= = >= > Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be implemented with a B-tree index search. (But note that IS NULL is not equivalent to = and is not indexable.)
In cases where the index enforces uniqueness, NULL values are excluded from the index and uniqueness is not enforced between NULL values. Again, quoting from the PostgreSQL documentation:
When an index is declared unique, multiple table rows with equal indexed values will not be allowed. Null values are not considered equal. A multicolumn unique index will only reject cases where all of the indexed columns are equal in two rows.
This is consistent with the SQL:2003-defined behavior of scalar Null comparisons.
Another method of indexing Nulls involves handling them as not distinct in accordance with the SQL:2003-defined behavior. For example, Microsoft SQL Server documentation states the following:
For indexing purposes, NULL values compare as equal. Therefore, a unique index, or UNIQUE constraint, cannot be created if the key values are NULL in more than one row. Select columns that are defined as NOT NULL when columns for a unique index or unique constraint are chosen.
Both of these indexing strategies are consistent with the SQL:2003-defined behavior of Nulls. Because indexing methodologies are not explicitly defined by the SQL:2003 standard, indexing strategies for Nulls are left entirely to the vendors to design and implement.
SQL defines two functions to explicitly handle Nulls:
NULLIF. Both functions are
abbreviations for searched
COALESCE function accepts a list of parameters,
returning the first non-Null value from the list:
COALESCE(value1, value2, value3, ...)
COALESCE is defined as shorthand for the following
CASE WHEN value1 IS NOT NULL THEN value1 WHEN value2 IS NOT NULL THEN value2 WHEN value3 IS NOT NULL THEN value3 ... END
Some SQL DBMSs implement vendor-specific functions similar to
COALESCE. Some systems implement an
ISNULL function, or other similar functions that are
functionally similar to
NULLIF function accepts two parameters. If the
first parameter is equal to the second parameter,
NULLIF returns Null. Otherwise, the value of the first
parameter is returned.
NULLIF is an abbreviation for the following
CASE WHEN value1 = value2 THEN NULL ELSE value1 END
Misunderstanding of how Null works is the cause of a great
number of errors in SQL code, both in ISO standard SQL statements
and in the specific SQL dialects supported by real-world database
management systems. These mistakes are usually the result of
confusion between Null and either 0 (zero) or an empty string (a
string value with a length of zero, represented in SQL as
''). Null is defined by the ISO SQL standard as
different from both an empty string and the numerical value
0, however. While Null indicates the absence of any
value, the empty string and numerical zero both represent actual
For example, a
WHERE clause or conditional
statement might compare a column's value with a constant. It is
often incorrectly assumed that a missing value would be "less than"
or "not equal to" a constant if that field contains Null, but, in
fact, such expressions return Unknown. An example is below:
SELECT * FROM sometable WHERE num <> 1; -- Rows where num is NULL will not be returned, -- contrary to many users' expectations.
Similarly, Null values are often confused with empty strings.
LENGTH function, which returns the number
of characters in a string. When a Null is passed into this
function, the function returns Null. This can lead to unexpected
results, if users are not well versed in 3-value logic. An example
SELECT * FROM sometable WHERE LENGTH(string) < 20; -- Rows where string is NULL will not be returned.
This is complicated by the fact that in some database interface programs, NULL is reported as an empty string, and empty strings may be incorrectly stored as NULL.
The ISO SQL implementation of Null is the subject of criticism, debate and calls for change. In The Relational Model for Database Management: Version 2, Codd suggested that the SQL implementation of Null was flawed and should be replaced by two distinct Null-type markers. The markers he proposed were to stand for "Missing but Applicable" and "Missing but Inapplicable", known as A-values and I-values, respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL. Others have suggested adding additional Null-type markers to Codd's recommendation to indicate even more reasons that a data value might be "Missing", increasing the complexity of SQL's logic system. At various times, proposals have also been put forth to implement multiple user-defined Null markers in SQL. Because of the complexity of the Null-handling and logic systems required to support multiple Null markers, none of these proposals have gained widespread acceptance.
Chris Date and Hugh Darwen, authors of The Third Manifesto, have suggested that the SQL Null implementation is inherently flawed and should be eliminated altogether, pointing to inconsistencies and flaws in the implementation of SQL Null-handling (particularly in aggregate functions) as proof that the entire concept of Null is flawed and should be removed from the Relational Model. Others, like author Fabian Pascal, have stated a belief that "how the function calculation should treat missing values is not governed by the relational model."
Another point of conflict concerning Nulls is that they violate the closed world assumption model of relational databases by introducing an open world assumption into it. The Closed World Assumption, as it pertains to databases, states that "Everything stated by the database, either explicitly or implicitly, is true; everything else is false." This view assumes that the knowledge of the world stored within a database is complete. Nulls, however, operate under the Open World Assumption, in which some items stored in the database are considered unknown, making the databases's stored knowledge of the world incomplete.
SQL allows three logical choices, true, false, and unknown, which means that SQL necessarily ignores the law of the excluded middle. Put simply the Law of the Excluded Middle essentially states that when given any Boolean result, the opposite of the result can be obtained by applying the logical "not" operator. This does not apply to SQL nulls, however. Under the precepts of the law of the excluded middle, a Boolean expression like the following can be simplified:
SELECT * FROM stuff WHERE ( x = 10 ) OR NOT ( x = 10 );
The law of the excluded middle allows for simplification of the WHERE clause predicate, which would result in a statement like the following:
SELECT * FROM stuff;
This will not work in SQL, since the x column could contain nulls which would result in some new rows being returned.
SELECT * FROM stuff; -- is (because of 3VL) equivalent to: SELECT * FROM stuff WHERE ( x = 10 ) OR NOT ( x = 10 ) OR x IS NULL;
Thus, to correctly simplify the first statement in SQL requires that we return all rows in which x is not null.
SELECT * FROM stuff WHERE x IS NOT NULL;
While ignoring the law of the excluded middle does introduce additional complexity to SQL logic, attempts to apply this rule to SQL's 3VL results in a false dichotomy.
The ISO SQL:1999 standard introduced the Boolean datatype to SQL. The Boolean datatype, as defined by the standard, can hold the truth values TRUE, FALSE, and UNKNOWN. Null is defined in this one instance as equivalent to the truth value UNKNOWN.
This "null equals UNKNOWN truth value" proposition introduces an inconsistency into SQL 3VL. One major problem is that it contradicts a basic property of nulls, the property of propagation. Nulls, by definition, propagate through all SQL expressions. The Boolean truth values do not have this property. Consider the following scenarios in SQL:1999, in which two Boolean truth values are combined into a compound predicate. According to the rules of SQL 3VL, and as shown in the 3VL truth table shown earlier in this article, the following statements hold:
However, because nulls propagate, treating null as UNKNOWN results in the following logical inconsistencies in SQL 3VL:
The SQL:1999 standard does not define how to deal with this inconsistency, and results could vary between implementations. Because of these inconsistencies and lack of support from vendors the SQL Boolean datatype did not gain widespread acceptance. Most SQL DBMS platforms now offer their own platform-specific recommendations for storing Boolean-type data.