SQL Zone is brought to you in partnership with:

Database consultant, working with database-related technologies since 1997. Alex is a DZone MVB and is not an employee of DZone and has posted 2 posts at DZone. View Full User Profile

10 things in MySQL that won’t work as expected

11.12.2010
| 13813 views |
  • submit to reddit

#10. Searching for a NULL

SELECT  *
FROM a
WHERE a.column = NULL

In SQL, a NULL is never equal to anything, even another NULL. This query won’t return anything and in fact will be thrown out by the optimizer when building the plan.

When searching for NULL values, use this instead:

SELECT  *
FROM a
WHERE a.column IS NULL


#9. LEFT JOIN with additional conditions

SELECT  *
FROM a
LEFT JOIN
b
ON b.a = a.id
WHERE b.column = 'something'

A LEFT JOIN is like INNER JOIN except that it will return each record from a at least once, substituting missing fields from b with NULL values, if there are no actual matching records.

The WHERE condition, however, is evaluated after the LEFT JOIN so the query above checks column after it had been joined. And as we learned earlier, no NULL value can satisfy an equality condition, so the records from a without corresponding record from b will unavoidably be filtered out.

Essentially, this query is an INNER JOIN, only less efficient.

To match only the records with b.column = 'something' (while still returning all records from a), this condition should be moved into ON clause:

SELECT  *
FROM a
LEFT JOIN
b
ON b.a = a.id
AND b.column = 'something'


#8. Less than a value but not a NULL

Quite often I see the queries like this:

SELECT  *
FROM b
WHERE b.column < 'something'
AND b.column IS NOT NULL

This is actually not an error: this query is valid and will do what’s intended. However, IS NOT NULL here is redundant.

If b.column is a NULL, then b.column < 'something' will never be satisfied, since any comparison to NULL evaluates to a boolean NULL and does not pass the filter.

It is interesting that this additional NULL check is never used for greater than queries (like in b.column > 'something').

This is because NULL go first in ORDER BY in MySQL and hence are incorrectly considered less than any other value by some people.

This query can be simplified:

SELECT  *
FROM b
WHERE b.column < 'something'
and will still never return a NULL in b.column.

#7. Joining on NULL

SELECT  *
FROM a
JOIN b
ON a.column = b.column

When column is nullable in both tables, this query won't return a match of two NULLs for the reasons described above: no NULLs are equal.

Here's a query to do that:

SELECT  *
FROM a
JOIN b
ON a.column = b.column
OR (a.column IS NULL AND b.column IS NULL)

MySQL's optimizer treats this as an equijoin and provides a special join condition, ref_or_null.

#6. NOT IN with NULL values

SELECT  a.*
FROM a
WHERE a.column NOT IN
(
SELECT column
FROM b
)

This query will never return anything if there is but a single NULL in b.column. As with other predicates, both IN and NOT IN against NULL evaluate to NULL.

This should be rewritten using a NOT EXISTS:

SELECT  a.*
FROM a
WHERE NOT EXISTS
(
SELECT NULL
FROM b
WHERE b.column = a.column
)

Unlike IN, EXISTS always evaluates to either true or false.

#5. Ordering random samples

SELECT  *
FROM a
ORDER BY
RAND(), column
LIMIT 10

This query attempts to select 10 random records ordered by column.

ORDER BY orders the output lexicographically: that is, the records are only ordered on the second expression when the values of the first expression are equal.

However, the results of RAND() are, well, random. It's infeasible that the values of RAND() will match, so ordering on column after RAND() is quite useless.

To order the randomly sampled records, use this query:

SELECT  *
FROM (
SELECT *
FROM mytable
ORDER BY
RAND()
LIMIT 10
) q
ORDER BY
column

#4. Sampling arbitrary record from a group

This query intends to select one column from each group (defined by grouper)

SELECT  DISTINCT(grouper), a.*
FROM a

DISTINCT is not a function, it's a part of SELECT clause. It applies to all columns in the SELECT list, and the parentheses here may just be omitted. This query may and will select the duplicates on grouper (if the values in at least one of the other columns differ).

Sometimes, it's worked around using this query (which relies on MySQL's extensions to GROUP BY):

SELECT  a.*
FROM a
GROUP BY
grouper

Unaggregated columns returned within each group are arbitrarily taken.

At first, this appears to be a nice solution, but it has quite a serious drawback. It relies on the assumption that all values returned, though taken arbitrarily from the group, will still belong to one record.

Though with current implementation is seems to be so, it's not documented and can be changed in any moment (especially if MySQL will ever learn to apply index_union after GROUP BY). So it's not safe to rely on this behavior.

This query would be easy to rewrite in a cleaner way if MySQL supported analytic functions. However, it's still possible to make do without them, if the table has a PRIMARY KEY defined:

SELECT  a.*
FROM (
SELECT DISTINCT grouper
FROM a
) ao
JOIN a
ON a.id =
(
SELECT id
FROM a ai
WHERE ai.grouper = ao.grouper
LIMIT 1
)

#3. Sampling first record from a group

This is a variation of the previous query:

SELECT  a.*
FROM a
GROUP BY
grouper
ORDER BY
MIN(id) DESC

Unlike the previous query, this one attempts to select the record holding the minimal id.

Again: it is not guaranteed that the unaggregated values returned by a.* will belong to a record holding MIN(id) (or even to a single record at all).

Here's how to do it in a clean way:

SELECT  a.*
FROM (
SELECT DISTINCT grouper
FROM a
) ao
JOIN a
ON a.id =
(
SELECT id
FROM a ai
WHERE ai.grouper = ao.grouper
ORDER BY
ai.grouper, ai.id
LIMIT 1
)


This query is just like the previous one but with ORDER BY added to ensure that the first record in id order will be returned.

#2. IN and comma-separated list of values

This query attempts to match the value of column against any of those provided in a comma-separated string:

SELECT  *
FROM a
WHERE column IN ('1, 2, 3')

This does not work because the string is not expanded in the IN list.

Instead, if column column is a VARCHAR, it is compared (as a string) to the whole list (also as a string), and of course will never match. If column is of a numeric type, the list is cast into the numeric type as well (and only the first item will match, at best).

The correct way to deal with this query would be rewriting it as a proper IN list

SELECT  *
FROM a
WHERE column IN (1, 2, 3)

, or as an inline view:

SELECT  *
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
) q
JOIN a
ON a.column = q.id

, but this is not always possible.

To work around this without changing the query parameters, one can use FIND_IN_SET:

SELECT  *
FROM a
WHERE FIND_IN_SET(column, '1,2,3')


This function, however, is not sargable and a full table scan will be performed on a.

#1. LEFT JOIN with COUNT(*)

SELECT  a.id, COUNT(*)
FROM a
LEFT JOIN
b
ON b.a = a.id
GROUP BY
a.id

This query intends to count number of matches in b for each record in a.

The problem is that COUNT(*) will never return a 0 in such a query. If there is no match for a certain record in a, the record will be still returned and counted.

COUNT should be made to count only the actual records in b. Since COUNT(*), when called with an argument, ignores NULLs, we can pass b.a to it. As a join key, it can never be a null in an actual match, but will be if there were no match:

SELECT  a.id, COUNT(b.a)
FROM a
LEFT JOIN
b
ON b.a = a.id
GROUP BY
a.id

P.S. In case you were wondering: no, the pictures don't have any special meaning. I just liked them.

This was idea came from "cracked"
(I just discovered cracked.com).

References
Published at DZone with permission of Alex Bolenok, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Dmitri Bichko replied on Fri, 2010/11/12 - 9:09pm

What the hell? This is all standard SQL behaviour - if this isn't what you expect, you need to spend 45 minutes reading any random "Beginning SQL" book.

Seriously, who would expect "column IN ('1, 2, 3')" to work? It's a database, not a mind reader.

 Some actlually unexpected things in MySQL:

   case-insensitive varchar searches by default
   silently inserting zeroes and empty strings into NOT NULL fields on null input
   doing a similar "conversion" on incorrect input types
   choking on spaces in SQL syntax in several random places

 (I don't know if some of these were fixed in 5, it's been a while since I've looked at MySQL)

 

Alex Bolenok replied on Sat, 2010/11/13 - 7:51am

What the hell? This is all standard SQL behaviour - if this isn't what you expect, you need to spend 45 minutes reading any random "Beginning SQL" book.

This is a random "Beginning SQL" book :)

Seriously, who would expect "column IN ('1, 2, 3')" to work?


More than one might think:

Silvio Bierman replied on Sat, 2010/11/13 - 8:32pm

This is all conforming to SQL standards and can be observed with PostgreSQL or Oracle as well. You seem to be completely unaware of ternary logic. If you want to be a database consultant you should study your primary subject a bit better. As another poster said MySQL does have some peculiarities and I will not touch it for any serious work (I prefer PostgreSQL by far).

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.