Press enter to see results or esc to cancel.

Optimizing OR’ed WHERE Clauses Between JOIN’ed Tables

OR conditions are normally difficult to optimize when used on different columns, a pain when the columns are of range conditions and worst when done between 2 or more different tables. Look at my example below, the original query  is trying to find rows which conditions are based on columns from the JOIN’ed tables.

If you look at the execution plan, the optimizer will always do a full table scan of the first table even though there is proper indexes on employees.hire_date and salaries.from_date. This is because it cannot know in advance which rows from the second table will match the rows from the second table, the OR conditions needs to match both tables. For the sake of brevity, observe what happens if I change the OR condition to AND instead.

As you can see, the optimizer now used the hire_date index in which for every matching row, there is about 4 rows matching on the second table.

Notice how many index reads and the full table scan the original query does. Imagine if you are working on a reporting query based on date ranges that has millions of rows for the first table, you might have to wait the next day for the results!

Because the queries need to return rows matching both tables, we can rewrite it as UNION instead separating the WHERE clauses into 2 different SELECT queries like below.

With this approach, you are actually limiting the number of rows needed to be examined immediately before being JOINed. There is no real downsize with the new query, you just have to watch out that results from both SELECT’s does not grow too large, nearly the size of the total rows of both tables. In which case, you should not be doing the latter anyway i.e. trying to filter with big date ranges especially if you have very large dataset, instead you should implement summary tables that produces the results you need.