In my previous post about package constants I mentioned the application I am working on right now. They gave me the task to speed up the application since performance was getting worse and worse. I analyzed the statspack results together with a DBA and we found two queries that together took 40% of the logical IO’s of the system. That’s a lot for only two queries, especially when you look how big our application is; we have many queries.
The two queries were both in one procedure. That must have been an off day for the original programmer . The queries were small and had an IN statement in the where clause. I simply rewrote the queries to use an EXIST and they became blazingly fast. That was strange. I attended the Tom Kyte seminar in Utrecht in 2005 and he claimed that it didn’t matter anymore. IN or EXIST, the database would see this and the optimizer would have the same execution plan for both, but not in my case. How could that be? I was running the queries in a 9R2 database.
So I wrote a (meaningless) example to show the difference between the IN and EXIST statements. The source of this test script is as follows:
set timing off set autotrace off set echo on create table my_blocked_objects (object_name varchar2(200) not null) / create table my_objects (object_name varchar2(200) not null) / create index my_blk_idx on my_blocked_objects(object_name) / create index my_obj_idx on my_objects(object_name) / insert into my_blocked_objects select owner || '.' || object_name from all_objects / insert into my_objects select user || '.' || object_name from user_objects / commit / begin dbms_stats.gather_table_stats(user,'MY_BLOCKED_OBJECTS'); dbms_stats.gather_table_stats(user,'MY_OBJECTS'); end; / set timing on set autotrace on explain select count(*) from my_objects obj where obj.object_name in (select blk.object_name from my_blocked_objects blk) / select count(*) from my_objects obj where exists (select '' from my_blocked_objects blk where blk.object_name = obj.object_name) / drop table my_blocked_objects / drop table my_objects /
The autotrace results of this script on my database was:
SQL> select count(*) 2 from my_objects obj 3 where obj.object_name in (select blk.object_name 4 from my_blocked_objects blk) 5 / COUNT(*) ---------- 3376 Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=166 Card=1 Bytes=132 ) 1 0 SORT (AGGREGATE) 2 1 HASH JOIN (Cost=166 Card=3378 Bytes=445896) 3 2 INDEX (FULL SCAN) OF 'MY_OBJ_IDX' (NON-UNIQUE) (Cost=2 6 Card=3378 Bytes=101340) 4 2 VIEW OF 'VW_NSO_1' (Cost=156 Card=27148 Bytes=2769096) 5 4 SORT (UNIQUE) (Cost=156 Card=27148 Bytes=814440) 6 5 INDEX (FULL SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (Co st=26 Card=27522 Bytes=825660) SQL> SQL> select count(*) 2 from my_objects obj 3 where exists (select '' 4 from my_blocked_objects blk 5 where blk.object_name = obj.object_name) 6 / COUNT(*) ---------- 3376 Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=30) 1 0 SORT (AGGREGATE) 2 1 INDEX (FULL SCAN) OF 'MY_OBJ_IDX' (NON-UNIQUE) (Cost=26 Card=169 Bytes=5070) 3 2 INDEX (RANGE SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (Cost= 1 Card=1 Bytes=30)
Again not the same execution plan. The IN version was less efficient compared to the EXIST version of the query. And this was not what Tom Kyte was saying during his seminar. I then tested the same script on another instance; but also a 9R2 database. That gave the following results:
SQL> select count(*) 2 from my_objects obj 3 where obj.object_name in (select blk.object_name 4 from my_blocked_objects blk) 5 / COUNT(*) ---------- 2220 Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1 Bytes=62) 1 0 SORT (AGGREGATE) 2 1 HASH JOIN (SEMI) (Cost=12 Card=2223 Bytes=137826) 3 2 TABLE ACCESS (FULL) OF 'MY_OBJECTS' (Cost=4 Card=2223 Bytes=71136) 4 2 INDEX (FAST FULL SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (C ost=5 Card=24923 Bytes=747690) SQL> SQL> select count(*) 2 from my_objects obj 3 where exists (select '' 4 from my_blocked_objects blk 5 where blk.object_name = obj.object_name) 6 / COUNT(*) ---------- 2220 Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1 Bytes=62) 1 0 SORT (AGGREGATE) 2 1 HASH JOIN (SEMI) (Cost=12 Card=2223 Bytes=137826) 3 2 TABLE ACCESS (FULL) OF 'MY_OBJECTS' (Cost=4 Card=2223 Bytes=71136) 4 2 INDEX (FAST FULL SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (C ost=5 Card=24923 Bytes=747690)
Wow the same execution plan, but a slightly higher cost (Cost=12 compared to Cost=3). How was this possible???? TKPROF output showed the same results. It took some time to figure this out; but the difference was introduced by different optimizer parameters in the database. The database with two execution plans had the following parameters:
NAME TYPE VALUE ------------------------------------ ----------- ------ object_cache_optimal_size integer 102400 optimizer_dynamic_sampling integer 0 optimizer_features_enable string 8.1.7 optimizer_index_caching integer 0 optimizer_index_cost_adj integer 10 optimizer_max_permutations integer 80000 optimizer_mode string CHOOSE
The database with one execution plan had the following parameters:
NAME TYPE VALUE ------------------------------------ ----------- ------ object_cache_optimal_size integer 102400 optimizer_dynamic_sampling integer 1 optimizer_features_enable string 9.2.0 optimizer_index_caching integer 0 optimizer_index_cost_adj integer 100 optimizer_max_permutations integer 2000 optimizer_mode string CHOOSE
For some reason the DBA’s kept the optimizer_features_enable on 8.1.7 after the migration of the database. This prevented the optimizer from using new features like optimizing IN where clauses. The argument was that it could be possible that another value of this parameter could negatively impact performance. That could be true; but no one tested that; they just took tha assumption.
Setting the parameter to 9.2.0 would have automatically optimized the two bad queries that led to this post. They would never have been a problem. I would never have found them as being problem queries. My lesson learnt is that you never should rely on assumptions; you have to test your assumptions.
Tom Kyte was right in this case. There is no difference between IN or EXIST. I believe the difference in cost between the two databases (cost=3 vs. cost=12) is hard to compare because of the difference of the optimizer_features_enable parameter.