Can I use the MySQL function characteristic DETERMINISTIC in combination with READS SQL DATA and do I want to?

TL;DR

If the following two groups of statements are the same to you, you want the DETERMINISTIC characteristic on your FUNCTION, even if you have READS SQL DATA.

SET @id = (SELECT my_func());
SELECT * FROM my_large_table WHERE id = @id;

-- versus

SELECT * FROM my_large_table WHERE id = my_func();

(All of this is tested with MySQL 5.7.16 and some was also tested with MySQL cluster 5.6.)

First, some background

You may or may not have run into this MySQL error at one point:
You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)

You may specify certain characteristics to MySQL FUNCTIONs/PROCEDUREs when creating them:

CREATE
  [DEFINER = { user | CURRENT_USER }]
  PROCEDURE sp_name ([proc_parameter[,...]])
  [characteristic ...] routine_body

CREATE
  [DEFINER = { user | CURRENT_USER }]
  FUNCTION sp_name ([func_parameter[,...]])
  RETURNS type
  [characteristic ...] routine_body

...

characteristic:
    COMMENT 'string'
  | LANGUAGE SQL
  | [NOT] DETERMINISTIC
  | { CONTAINS SQL | NO SQL | READS SQL DATA | MODIFIES SQL DATA }
  | SQL SECURITY { DEFINER | INVOKER }

Source: http://dev.mysql.com/doc/refman/5.7/en/create-procedure.html

For instance. We can create a function that returns 4.

mysql> CREATE FUNCTION test_4() RETURNS INT
       DETERMINISTIC
       NO SQL
       RETURN 4;

mysql> select test_4()\G
*************************** 1. row ***************************
test_4(): 4

Not very useful, but it illustrates the most basic layout of a MySQL function. We added the DETERMINISTIC and NO SQL characteristics because the function always returns the same output for the same input (deterministic) and it contains no SQL statements (no sql).

DETERMINISTIC characteristic, and how it affects replication

If you happen to create such a function as a non-privileged user on a slavable MySQL server — i.e. one that creates binlogs that a MySQL slave can use to replicate the statements (using the log_bin mysqld setting) — you would run into the “You do not have the SUPER privilege and binary logging is enabled” error.

Why?

Because MySQL could use the characteristics from the functions to determine how a statement should be replicated. In this case, the characteristics might tell the replication: if this function is used in a statement, we can execute the same statement on the slave and keep consistent slave records.

To test this out, we create a few different functions and check how they would be replicated.

-- A real deterministic function.
CREATE FUNCTION f_deterministic() RETURNS INT
DETERMINISTIC RETURN 4;

-- A non-deterministic function.
CREATE FUNCTION f_non_deterministic() RETURNS FLOAT
DETERMINISTIC RETURN RAND();

-- A real deterministic function that modifies SQL data.
-- This one inserts a record into test_abc.
CREATE TABLE test_abc(id INT PRIMARY KEY AUTO_INCREMENT, value FLOAT NOT NULL);
DELIMITER ;;
CREATE FUNCTION f_modifies_sql_data() RETURNS FLOAT
DETERMINISTIC MODIFIES SQL DATA
BEGIN
  INSERT INTO test_abc (value) VALUES (5);
  RETURN 5;
END
;;
DELIMITER ;

If you run it, you get (something like) this:

mysql> SELECT f_deterministic(), f_non_deterministic(), f_modifies_sql_data();
+-------------------+-----------------------+-----------------------+
| f_deterministic() | f_non_deterministic() | f_modifies_sql_data() |
+-------------------+-----------------------+-----------------------+
|                 4 |    0.7306850552558899 |                     5 |
+-------------------+-----------------------+-----------------------+

If you use binlog-format = STATEMENT this you get this warning:
Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.

The binlog contains this row; statement based replication as expected:
SELECT `testdb`.`f_modifies_sql_data`()

For ROW and MIXED binlog formats, the binlog looks like this:

$ mysqlbinlog /var/log/mysql/mysql-bin.000005 --base64-output=decode-rows --verbose
...
### INSERT INTO `testdb`.`test_abc`
### SET
###   @1=2
###   @2=5

This was unexpected. On this MySQL 5.7.16, for the MODIFIES SQL DATA characteristic, the DETERMINISTIC property is ignored for the binlogs and instead of replicating the query, it replicates the altered rows.

Note that that is not a bad thing. This keeps things consistent even if the function is mislabeled and should've been NON DETERMINISTIC.

Lying in the characteristics

What if we lie to MySQL? And tell it that the function does not modify SQL data?

No problem for MIXED based replication. With DETERMINISTIC NO SQL the INSERT statement is still propagated in the binlog.

This appears that the MIXED/ROW based replication is unaffected by labeling/mislabeling of the function characteristics. That's one thing less to worry about. (For the tested MySQL version only! YMMV!)

Then, is there another reason to get the characteristics right?

Yes there is: query optimizations.

Query optimizations and DETERMINISTIC

Consider this table:

mysql> CREATE TABLE test_seq (id INT PRIMARY KEY NOT NULL, value FLOAT NOT NULL);
mysql> INSERT INTO test_seq VALUES (1,-1), (2, -2), (3, -3), (4, -4), (5, -5), (6, -6);
mysql> SELECT * FROM test_seq;
+----+-------+
| id | value |
+----+-------+
|  1 |    -1 |
|  2 |    -2 |
|  3 |    -3 |
|  4 |    -4 |
|  5 |    -5 |
|  6 |    -6 |
+----+-------+

What happens if we do this query:

SELECT * FROM test_seq WHERE id = f_modifies_sql_data();

Does it return 5? Yes it does. But it also inserts records into test_abc, because we told the function to do so.

mysql> DELETE FROM test_abc;
mysql> SELECT * FROM test_seq WHERE id = f_modifies_sql_data();
+----+-------+
| id | value |
+----+-------+
|  5 |    -5 |
+----+-------+

mysql> SELECT * FROM test_abc;
+----+-------+
| id | value |
+----+-------+
|  8 |     5 |
|  9 |     5 |
+----+-------+

Apparently that function was called twice. We would expect once, but a second time is okay. But what if we relabel it as NOT DETERMINISTIC?

mysql> DELIMITER ;;
mysql> CREATE FUNCTION f_modifies_sql_data_nondet() RETURNS FLOAT
       NOT DETERMINISTIC MODIFIES SQL DATA
       BEGIN
         INSERT INTO test_abc (value) VALUES (5);
         RETURN 5;
       END
       ;;
ERROR 1418 (HY000): This function has none of DETERMINISTIC, NO SQL, or READS
SQL DATA in its declaration and binary logging is enabled (you *might* want to
use the less safe log_bin_trust_function_creators variable)

Heh, a different error. Lets enable log_bin_trust_function_creators. This time it writes.

mysql> DELETE FROM test_abc;
mysql> SELECT * FROM test_seq WHERE id = f_modifies_sql_data_nondet();
mysql> SELECT COUNT(*) FROM test_abc\G
*************************** 1. row ***************************
COUNT(*): 6

That's right! Six records. One record inserted for every comparison against id. Here you clearly see the difference between DETERMINISTIC and NOT DETERMINISTIC:

When used in a where_condition, a DETERMINISTIC function is called once (maybe twice) while a NOT DETERMINISTIC function is checked for every row.

Another clear example would be: SELECT COUNT(*) FROM test_seq WHERE RAND() > 0.5; That could return any one value {0, 1, 2, 3, 4, 5, 6}. But when we wrap RAND() in a DETERMINISTIC-labelled function, the result can only be in {0, 6}. Makes sense? Yes.

DETERMINISTIC and READS SQL DATA

On to the confusing bits: the internet does not agree on whether DETERMINISTIC and READS SQL DATA can be combined. But as you might realize at this point, this can be quite a useful combination:

First your custom FUNCTION does its magic and looks up a indexed value. Then you look up the record based on that indexes value, in a potentially huge table. You don't want the function to be called for every record.

mysql> EXPLAIN SELECT * FROM test_seq WHERE id = f_modifies_sql_data();
+----+----------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | table    | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+----------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | test_seq | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+----------+-------+---------------+---------+---------+-------+------+----------+-------+

mysql> EXPLAIN SELECT * FROM test_seq WHERE id = f_modifies_sql_data_nondet();
+----+----------+------+---------------+------+---------+------+------+----------+-------------+
| id | table    | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+----------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | test_seq | ALL  | NULL          | NULL | NULL    | NULL |    6 |    16.67 | Using where |
+----+----------+------+---------------+------+---------+------+------+----------+-------------+

The difference between instant lookup and an uber-slow where lookup over all your records.

Caching is not an issue

Lastly, one could imagine that MySQL used the DETERMINISTIC characteristic to cache the result (ignoring the READS SQL DATA). This does not appear to be the case. (Again, tested with MySQL 5.7.16, YMMV! Although I would expect them not to change semantics here lightly.)

If anything, your function is called too often. It's even called for every resultant row if used in the select_expression:

mysql> DELETE FROM test_abc;
mysql> SELECT id, f_modifies_sql_data() FROM test_seq;
...

mysql> SELECT COUNT(*) FROM test_abc\G
*************************** 1. row ***************************
COUNT(*): 6

Moral of the story: yes, your function can safely be labelled DETERMINISTIC even if it READS SQL DATA.

The only things you should worry about are other non-derministic functions (RAND, UUID, ...), limits (LIMIT) and an out-of-sync database. But if your database is out of sync, you have more pressing issues to worry about. And MIXED/ROW based replication appears to handle all of that properly anyway. As we've seen, it safely replicates MODIFIES SQL DATA functions in all cases.

This matches with the statements made on Binary Logging of Stored Programs: In general, the issues described here result when binary logging occurs at the SQL statement level. If you use row-based binary logging, the log contains changes made to individual rows as a result of executing SQL statements.

That also means that the warnings/errors related to the log_bin_trust_function_creators can be safely ignored when you use anything other than STATEMENT based binary logging.

function replication optimization mysql