Deterministic function in MySQL

47.3k views Asked by At

I got confused with a seemingly simple concept. MySQL defines deterministic function as a function that:

always produces the same result for the same input parameters

So in my understanding, functions like:

CREATE FUNCTION foo (val INT) READS SQL DATA
BEGIN
   DECLARE retval INT;
   SET retval = (SELECT COUNT(*) FROM table_1 WHERE field_1 = val);
   RETURN retval;
END;

are not deterministic (there is no guarantee that delete/update/insert does not happen between 2 calls to the function). At the same time, I saw many functions which do pretty much the same, i.e. return value based on result of queries, and declared as DETERMINISTIC. It looks like I'm missing something very basic.

Could anyone clarify this issue?

Thanks.

Update Thanks for those who answered(+1); so far it looks like there is a widespread misuse of DETERMINISTIC keyword. It is still hard to believe for me that so many people do it, so I'll wait a bit for other answers.

8

There are 8 answers

3
Xint0 On BEST ANSWER

From the MySQL 5.0 Reference:

Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used. Prior to MySQL 5.0.44, the DETERMINISTIC characteristic is accepted, but not used by the optimizer.

So there you have it, you can tag a stored routine as DETERMINISTIC even if it is not, but it might lead to unexpected results or performance problems.

0
Super Kai - Kazuya Ito On

The doc says below:

A routine is considered "deterministic" if it always produces the same result for the same input parameters, and "not deterministic" otherwise.

For example, test() function below is deterministic because it always returns the same @num value for the same v1 and v2 values. For example, when v1 and v2 are 2 and 3 respectively, @num is always 5 so you should set DETERMINISTIC to test() function as shown below. The doc gives you more examples but it doesn't explain them much:

DELIMITER $$

CREATE FUNCTION test(v1 INT, v2 INT) RETURNS INT
DETERMINISTIC -- Here
BEGIN
SET @num = v1 + v2;
RETURN @num;
END$$

DELIMITER ;

And, test() function below is not deterministic because RAND() built-in function returns a randum number so you should set NOT DETERMINISTIC to test() function as shown below. *My answer explains how to enable to use NOT DETERMINISTIC which you cannot use by default getting error and NOW() and UUID() built-in functions are also not deterministic and without DETERMINISTIC and NOT DETERMINISTIC is also recognized as NOT DETERMINISTIC by default:

DELIMITER $$

CREATE FUNCTION test() RETURNS FLOAT
NOT DETERMINISTIC -- Here
BEGIN
RETURN RAND();
END$$

DELIMITER ;

Lastly, you should properly set DETERMINISTIC or NOT DETERMINISTIC to a function otherwise there are some problems according to the doc below:

However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used.

1
John Watson On

You're not missing anything. This function is non-deterministic. Declaring it deterministic won't cause your database to melt but it might affect performance. From the MySQL site: "Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices." But MySQL does not enforce or check if your declared deterministic routine is actually deterministic---MySQL trusts that you know what you are doing.

0
dooku On

Deterministic is important if you have replication turned on or may use it one day. A non-deterministic function call that causes a row change (update or insert) for instance will need to be replicated using binary (row-based) where as a deterministic function can be replicated statement based. This becomes interesting when looking at your SQL examples above, which ones will happen the same (give the same result) when replicated using statement based, and which should be replicated using the result obtained in the master (row-based). If the statements are executed with the appropriate locking and can be guaranteed to execute in the same order on the Slave then they are indeed deterministic. If the locking / statement order that the Slave uses (no concurrency, serial processing of statements in the order they are started) means the answer can be different, then the function should be non-deterministic.

0
C.Poh On

I was looking through the answers and decide to contribute a more compact and updated answer.

A deterministic function always return the same result given the same input parameters in the same state of the database. Eg POW,SUBSTR(),UCASE().

A non deterministic function does not necessarily always return the same result given the same input parameters in the same state of the database. Eg CURDATE(), RAND(), UUID().

MySQL 8.0 Reference Manual have some update on this

8.2.1.20 Function Call Optimization

MySQL functions are tagged internally as deterministic or nondeterministic. A function is nondeterministic if, given fixed values for its arguments, it can return different results for different invocations. Examples of nondeterministic functions: RAND(), UUID().If a function is tagged nondeterministic, a reference to it in a WHERE clause is evaluated for every row (when selecting from one table) or combination of rows (when selecting from a multiple-table join).MySQL also determines when to evaluate functions based on types of arguments, whether the arguments are table columns or constant values. A deterministic function that takes a table column as argument must be evaluated whenever that column changes value. Nondeterministic functions may affect query performance. For example, some optimizations may not be available, or more locking might be required. The following discussion uses RAND() but applies to other nondeterministic functions as well.

This code example from MySQL 8.0 Reference Manual. You can create the table then populate the data with 49 rows like id column 1 to 49 and col_a some strings which are unique like "AA","AB","AC" until 49 rows. You can actually do 15 rows but you need to change the 49 to 15 that's more of topic of the random function.

CREATE TABLE t (id INT NOT NULL PRIMARY KEY, col_a VARCHAR(100));

SELECT * FROM t WHERE id = POW(1,2);
SELECT * FROM t WHERE id = FLOOR(1 + RAND() * 49);

The code will help illustrate the point, the MySQL 8.0 Reference manual is trying to make. Hopefully this helps thanks !

0
Thomas Jensen On

I would like to add an example of why it can have a huge impact on performance, to declare a function for DETERMINISTIC or not:

SELECT id1, foo(id1) f1
FROM table1
ORDER BY id1
LIMIT 1;

Using MariaDB 10.6:

If foo() is declared DETERMINISTIC, then foo() is only called once for the single returned row.

If foo() is NOT declared DETERMINISTIC, then foo() is called once for every row in table1, before returning the single row. This can take ages!!

So I recommend to using "bikeman868" and "C.Poh" definitions, to decide whether a function is DETERMINISTIC or not:

A deterministic function always return the same result given the same input parameters in the same state of the database. Eg POW,SUBSTR(),UCASE().

A non deterministic function does not necessarily always return the same result given the same input parameters in the same state of the database. Eg CURDATE(), RAND(), UUID().

In addition to this, you can also consider in what way the result of the function is used; is it saved in the database or only extracted for external use.

1
bikeman868 On

I think that your routine is deterministic. The documentation is not very clear and this has led to many people being very confused about this issue, which is actually more about replication than anything else.

Consider a situation where you have replication set up between two databases. The master database keeps a log of all the stored routines that were executed including their input parameters, and ships this log to the the slave. The slave executes the same stored routines in the same order with the same input parameters. Will the slave database now contain identical data to the master database? If the stored routines create GUIDs and store these in the database then no, the master and slave databases will be different and replication will be broken.

The main purpose of the DETERMINISTIC flag is to tell MySQL whether including calls to this stored routine in the replication log will result in differences between the master database and the replicated slaves, and is therefore unsafe.

When deciding if the DETERMINISTIC flag is appropriate for a stored routine think of it like this: If I start with two identical databases and I execute my routine on both databases with the same input parameters will my databases still be identical? If they are then my routine is deterministic.

If you declare your routine is deterministic when it is not, then replicas of your main database might not be identical to the original because MySQL will only add the procedure call to the replication log, and executing the procedure on the slave does not produce identical results.

If your routine is non-deterministic then MySQL must include the affected rows in the replication log instead. If you declare your routine as non-deterministic when it is not this will not break anything, but the replication log will contain all of the affected rows when just the procedure call would have been enough and this could impact performance.

3
Jon Gilbert On

DETERMINISTIC results does not refer to different results sets being returned at different times (depending on what data has been added in the mean time). Moreover it is a reference to the result sets on different machines using the same data. If for example, you have 2 machines which run a function including uuid() or referencing server variables then these should be considered NOT DETERMINISTIC. This is useful for example in replication because the function calls are stored in the binary log (master) and then also executed by the slave. For details and examples see http://dev.mysql.com/doc/refman/5.0/en/stored-programs-logging.html

The use of DETERMINISTIC is thus (99% of the time) correct, not to be considered misuse.