Boolean short circuiting is not guaranteed

Last week I was debugging a weird bug in one of our services. In the logs we saw an SQL error “repetition-operator operand invalid”, so I was pretty sure that we have a broken SQL comparison in one of our services. The weird part was that this error happened only occasionally, seemingly randomly.

After several hours of debugging, we learned something mind boggling at that moment — boolean expressions in where statements are not short circuited. This definitely clashed with our assumptions, and teched us an important lesson to always double check our assumptions when designing a system.

The system that we debugged

In one of our services, we had a scheme where we construct rules, where each rule is activated if it matches one or more regular expression patterns. For example, the following rule would be “activated” by master, staging-alpha, or release-3, but it wouldn’t be activated by feature/new-icons.

rule:
  name: "non-feature-branches"
  patterns:
    - /^master$/
    - /^staging-.*$/
    - /^release-3$/

We were using this system to activate some features based on git branch names, as you could already guess by the previous examples. A repository has multiple rules in this context.

We encoded this information in the following SQL tables:

table repository (
  id   integer
  name string
)

table rules (
  id      integer
  repo_id integer
  name    string
)

table patterns (
  rule_id integer
  term    string
)

The following query would list rules that are activated by a branch name:

SELECT id FROM rules
WHERE rules.repo_id = ($1) AND EXISTS (
  SELECT 1 FROM patterns
  WHERE rule.id = patterns.rule_id AND ($2) ~ pattern.term
)

The bug

As stated in the intro, the SQL error was invalid regular expression: quantifier operand invalid which happens if you have an invalid regular expression in a SQL select statement. In our case, the poisonous regular expression was *, which is invalid, and should be .*.

The weird part was that the pattern was not part of the repository we were filtering, and we were totally baffled why it is raised, if it should never even be tested.

Here is our assumptions how the following query should have worked:

SELECT id FROM rules
WHERE rules.repo_id = ($1) AND EXISTS (
  SELECT 1 FROM patterns
  WHERE rule.id = patterns.rule_id AND ($2) ~ pattern.term
)

First, we filter only repositories that we are interested in with rules.repo_id = ($1)
Then, we find the patterns associated with that subset with rule.id = patterns.rule_id in the inner query
Finally, we filter that subset with the innermost regex match ($2) ~ pattern.term

Based on the previous assumption, and order of operations, even if we had a broken rule in our system, it would only affect repositories that are using that rule, but no others.

This was false. The order of operations is different in SQL, as there is no short-circuiting between AND statements in an SQL query.

How to prove that short-circuiting is not working?

We hypothesised that short-circuiting is not working, but we needed a proof. SQL explain and explain analyse helped us to prove our hypothesis.

EXPLAIN SELECT id FROM rules
        WHERE rules.repo_id = ($1) AND EXISTS (
          SELECT 1 FROM patterns
          WHERE rule.id = patterns.rule_id AND ($2) ~ pattern.term
        );

---

Nested Loop Semi Join  (cost=0.14..19.05 rows=1 width=16)
  Join Filter: (rules.id = patterns.rule_id)
  ->  Index Scan using rules_repo_id_index on rules (cost=0.14..8.16 rows=1 width=16)
        Index Cond: (repo_id = 1)
  ->  Materialize  (cost=0.00..10.88 rows=1 width=16)
        ->  Seq Scan on patterns  (cost=0.00..10.88 rows=1 width=16)
              Filter: ('master'::text ~ (term)::text)

Turns out the actual order of operations is:

First, the whole rules table is filtered based on repo_id = 1.
Then, then we find the first pattern that matches 'master'::text ~ (term)::text
Finally, the join filter rules.id = patterns.rule_id is applied to combine the results

This was obviously in direct contradiction to our basic assumption that SQL follows AND joins in order they appear, and short circuit if the first expression is false.