The curious cases of json_extract

2020-04-25

json_extract is a function for extract data from a JSON document both in MySQL and MariaDB. The function normally works fine, for example

set @j = '{"num": 42, "list": [1, 2, 3], "obj": {"name": "Edward Stark"}}'; 
select json_extract(@j, '$.num') as num, json_extract(@j, '$.list') as list, json_extract(@j, '$.obj') as obj;
 
+------+-----------+--------------------------+
| num  | list      | obj                      |
+------+-----------+--------------------------+
| 42   | [1, 2, 3] | {"name": "Edward Stark"} |
+------+-----------+--------------------------+

But when it comes to a single JSON string, the results of json_extract is not always as expected. We shall investigate the different cases.

Unquote

The first case is about json_extract and unquote:

select json_extract(@j, '$.obj.name') as name,
       json_unquote(json_extract(@j, '$.obj.name')) as name_unquote;           

+----------------+--------------+
| name           | name_unquote |
+----------------+--------------+
| "Edward Stark" | Edward Stark |
+----------------+--------------+

Use json_extract on a single JSON string, what we get is double quoted, we need to use a json_unquote to wrap the result. On MariaDB, we may use json_value as json_unquote wrapped json_extract for a single JSON string. This is somehow like the difference between ’->’ and ‘-»’ operators in MySQL.

To unquote a extracted JSON string is reasonable, since the string type is quoted the json specification. The only inconvenience is that the case is not found in the documents.

Despite the double quote, we have the following results on MariaDB 10.4.12:

select json_extract(@j, '$.obj.name') as name_1,
       json_value(@j, '$.obj.name') as name_2;             

+----------------+--------------+
| name_1         | name_2       |
+----------------+--------------+
| "Edward Stark" | Edward Stark |
+----------------+--------------+
    
select json_extract(@j, '$.obj.name') = json_value(@j, '$.obj.name') as comp_1,
       json_extract(@j, '$.obj.name') = 'Edward Stark' as comp_2,
       json_value(@j, '$.obj.name') = 'Edward Stark' as comp_3; 
+--------+--------+--------+
| comp_1 | comp_2 | comp_3 |
+--------+--------+--------+
|      1 |      1 |      1 |
+--------+--------+--------+

The results from json_extract and json_value are different based on the first query above. But they are both equal to the string constant ‘Edward Stark’ based on the second query. We also have similar query results on MySQL 8.0.19(json_value is not supported on this version):

select json_extract(@j, '$.obj.name') as name_1,
       json_unquote(json_extract(@j, '$.obj.name')) as name_2;
+----------------+--------------+
| name_1         | name_2       |
+----------------+--------------+
| "Edward Stark" | Edward Stark |
+----------------+--------------+
    
select json_extract(@j, '$.obj.name') = json_unquote(json_extract(@j, '$.obj.name')) as comp_1,
       json_extract(@j, '$.obj.name') = 'Edward Stark' as comp_2,
       json_unquote(json_extract(@j, '$.obj.name')) = 'Edward Stark' as comp_3; 
+--------+--------+--------+
| comp_1 | comp_2 | comp_3 |
+--------+--------+--------+
|      1 |      1 |      1 |
+--------+--------+--------+

Differences on MySQL and MariaDB

Some behaviors are different on MySQL and MariaDB when using json_extract, let’s have a look.

On MySQL 8.0.19:

select json_extract(@j, '$.obj.name') in ('Edward Stark') as in_1,
       json_extract(@j, '$.obj.name') in ('Edward Stark', 'Ned Stark') as in_2;
+------+------+
| in_1 | in_2 |
+------+------+
|    1 |    1 |
+------+------+

The above results are as expected, but on MariaDB 10.4.12:

select json_extract(@j, '$.obj.name') in ('Edward Stark') as in_1,
       json_extract(@j, '$.obj.name') in ('Edward Stark', 'Ned Stark') as in_2,
       json_extract(@j, '$.obj.name') in ('"Edward Stark"', '"Ned Stark"') as in_3,
       json_value(@j, '$.obj.name') in ('Edward Stark', 'Ned Stark') as in_4;
+------+------+------+------+
| in_1 | in_2 | in_3 | in_4 |
+------+------+------+------+
|    1 |    0 |    1 |    1 |
+------+------+------+------+

So on MariaDB, when using an expression like “json_extract(doc, x) in some_array”, you have to make sure that the elements are manually quoted if the array size is more than 1. The case seems to be a bug of an implicit conversion of JSON types to database native types.

Final Notes

When using the JSON column on MySQL and MariaDB, we should really be careful with the above cases of json_extract. With SQLAlchemy in Python, if data is a JSON column, SQLAlchemy will translate data[‘x’] into json_extract(data, ‘$.x’) . So the same Python code with SQLAlchemy, would behave differently in MySQL and MariaDB.