Access Path Scientific Analysis – Access Path Identification Part II
We are continuing our series on how to become access path scientists. In previous blogs, we This series of blogs begins with a high-level overview of Oracle query optimization. We’ll also learn the difference between estimated and runtime access paths. Subsequent blogs will provide details on the various tools we can use to retrieve access path information.
The intent of this series of blogs is to show readers the various tools that can be used to display a SQL statement’s access path. Once you have identified the top resource consuming queries (discussed in a previous blog), you will need to identify the access paths the statements are taking.
The intent of this blog is to not provide you with a detailed analysis of all of the access paths available in Oracle. It is intended to show you how to identify access paths using the various tools the Oracle system provides. It is important that I provide you with some background information on access paths before we begin using the various tools that display access path information. Part one of this blog will provide you with that background information. The blog will also provide you with additional resources that will help you have a better understanding of the different access paths that are available to the optimizer.
Optimization and Access Paths
When an application submits a SQL statement to the database for execution, several steps are performed before it begins accessing data. These steps are collectively called “parsing”. During the parse phase, the database ensures that the statement is syntactically correct, that the objects it is accessing exist in the environment, determines if the statement has executed before, etc., etc.
Generating a new access path is a relatively expensive process. If the statement has been executed before, Oracle will attempt to use the same access path that was created during the earlier parse. If the statement hasn’t been executed before, or it is invalidated for some reason, Oracle will activate the optimizer to generate the statement’s access path to the data. The access path is the physical steps Oracle will take to retrieve the data and return it to the calling application.
So little is known about the inner-workings of the cost based optimizer that is often called the Oracle database’s “little black box.” The optimizer’s job is to analyze the statement being executed and determine the most efficient access path to the data for that statement. Craig Mullins provides an excellent description of a database optimizer in The DB2 Developer’s Guide “the optimizer is equivalent to an expert system. An expert system is a standard set of rules when combined with situational data can return an expert opinion.”
The cost based optimizer uses statistics generated by the DBMS_STATS procedure as the situational data when creating its expert opinion on which access path to the data is most optimal. These statistics are stored in the data dictionary and describe the objects space characteristics, data uniqueness and data distribution.
The cost-based optimizer is only as good as the statistics it uses as input. Statistics collections should be run on a regular basis to ensure that the statistics are current (representative of the data being accessed). The optimizer is then able to create a highly accurate access path that is based on the least cost. If statistics are not available, the optimizer uses a simple algorithm to calculate the statistics, which often leads to “less than optimal” access paths. In other words, Oracle guesses and it is sometimes not a very good guesser.
Object Statistics
Object statistics describe the contents of the data and how it is stored in the physcal database objects. Before we continue, it may be beneficial for me to provide you with a few examples:
If a table only contains a few hundred rows, queries may run faster if the optimizer chooses to read all of the blocks in the table as opposed to using an index. The I/O generated traversing the index blocks to get to the table row entries would be higher than if Oracle read just the blocks allocated to the table being accessed.
What if we access a table using a column with poor cardinality? Cardinality describes the uniqueness of a particular value stored in a column. Poor cardinality means that the column has many values that are the same. If our statement contains a WHERE clause that searches for a column value that is contained in 90% of the table’s rows, it is best that Oracle, once again, read each and every row in that table.
Conversely if a WHERE clause searches for a column value that appears in 1% of the table rows, it would be beneficial for the optimizer to choose an index. The optimizer also uses object statistics to determine which join method should be used and the order in which the joined tables are accessed.
Here are just a few of the object statistcs that the optimizer uses as input during access path generation:
- Number of rows in the tables being accessed
- Number of blocks allocated to the table
- Average table row length
- Number of distinct values for a column
- Number of nulls in a column
- Number of leaf blocks allocated to indexes
- Number of leveles in the B-Tree
- Clustering factor – how the order of the index matches the order of the table rows
Access Path Prediction vs The Access Path Actually Taken During Execution
It is important to understand that there can be a difference between the access path that Oracle predicts a statement will take versus the actual access path taken during execution.
Many years ago, I learned this fact the hard way. I was executing a performance analysis test plan on a database that we were migrating from Version 8 to Version 9. I kept using the Explain command (discussed in the next blog) to display the access path of a statement I was attempting to tune. The access path looked perfect to me. It was using the correct indexes, join methods, join sequence… But the transaction continued to perform poorly. Describing the statement as running “poorly” is like describing the Titanic as having a “small leak”. It ran forever.
I began to question my ability to tune, Oracle Version 9I and my chosen career path. I just couldn’t figure out why this seemingly excellent access path was affecting the statement’s performance. I decided it was time to run a SQL Trace (discussed in next blog) to generate more detailed performance information. I ran the TKPROF to format the output. I used the explain option to dump the access paths and reviewed the results. I stumbled across the titles “Row Source Operation” and “Execution Plan”. I noticed that the access paths were different. Hmm, how could that be? I began to dig into the Oracle 9i documentation for TKPROF and found that “Row Source Operation” is the access path that was taken during the statement’s execution and the “Explain Plan” is the access path that Oracle is predicting the statement will take before it executes.
Here’s an excerpt of a recent TKPROF report showing two different access paths for the same statement.If you carefully review the two sets of access paths, it should become clear that they are different.
Bind Variables
As I stated, sometimes those access paths are the same and sometimes they aren’t. The times “they aren’t” are the times that will complicate your performance tuning analysis. Oracle will change the access path for a couple of different reasons. One reason is Bind Variable Peeking.
Bind variables are used to simplify application program coding and improve the performance of the parsing process. Oracle matches statements being executed to statements that are already parsed and stored in the shared pool. If Oracle finds an exact match, it will use the statement in the shared pool. When the statement is found in the shared pool, it is described as being a soft parse. If Oracle does not find any matching SQL in the shared pool, it will perform a hard parse which requires more steps to be performed than a soft parse. As with mostly everything elese, less steps = faster performance. When Oracle looks for a matching SQL statement in the shared pool, the statements must match EXACTLY. That is, the statement must match in case, line breaks and spacing. If the SQL doesn’t match exactly, Oracle executes a hard parse.
Here’s an example of a series of SQL statements using hardcoded variables:
select first_name, last_name, dept_id from scott.emp where emp_id = 103487;
select first_name, last_name, dept_id from scott.emp where emp_id = 989566;
select first_name, last_name, dept_id from scott.emp where emp_id = 436709;Here’s the same SQL statement using bind variables:
select first_name, last_name, dept_id from scott.emp where emp_id = :empid;
select first_name, last_name, dept_id from scott.emp where emp_id = :empid;
select first_name, last_name, dept_id from scott.emp where emp_id = :empid;The first set of SQL statements will all look different to the Oracle optimizer. As a result, Oracle will be forced to hard parse each execution of the statement. One of the steps in a hard parse is to generate the statement’s access path. Regenerating a statement’s access path each time it is executed is an extremely inefficient process and can drive the utilization of “finite CPU resources” to unacceptable levels. If you execute these exact statements again with the same values for emp_id (103847, 989566 and 436709), Oracle will find the matching statements (if they haven’t been flushed or invalidated) and not have to perform a hard parse.
The second set of SQL statements highlights the use of bind variables. The application program will move the values 103847, 989566 and 436709 into the EMPID bind variable and send it to Oracle for parsing and execution. Because the bind variable makes the statement looks the same, Oracle will perform a hard parse for the first execution and a soft parse of the statement for subsequent executions. A soft parse will not regenerate the access path. This leads to faster statement execution and a reduction in resource utilization.
Using bind variables is not a panacea. If you want to learn the pros and cons of bind variables, read Jonathan Lewis’s book titled Cost-Based Oracle Fundamentals. I have provided a link in the Access Path Education paragraph at the end of this blog.
Bind Variable Peeking Potential Issues
Oracle 9I introduced bind variable peeking to improve the performance of statements using bind variables. During the SQL statement’s execution, Oracle will look at the contents of the bind variable and use that information to generate the access path. One example of where this helps is skewed data. Skewed data means that one particular value in a column may occur a few times (high cardinality), while another value may occur many times (poor cardinality). The best access path for a value that occurs only a few times could differ from the access path from the same statement that has a value in the bind variable that occurs many times. For example, DBAs often use histograms on skewed data to provide more accurate situational data to the optimizer. This often causes issues when bind peeking occurs.
A bind variable peeking problem may occur when the access path generated during the hard parse uses “an unlucky set of values” as Jonathan Lewis describes it in Cost-Based Oracle Fundamentals. That “unlucky set of values” in the bind variables could generate an optimal access path for the initial execution of that statement but not be as optimal for future executions.
If we go back to my TKPROF output, the access path changes are a result of Oracle used bind peeking to identify the contents of the bind variable. Oracle then changed the access path the statement took during execution.
Bind Variable Interpretation in SQL*PLUS
When you explain in statements in SQL*PLUS, you also need to be aware of how the tool interprets the bind variables. Bind variable interpretation will also affect access path predictions. Tom Kyte has a discussion on bind variable interpretation on his Ask Tom website.
Which Access Path Information Can You Trust?
Because the access path that Oracle predicts a statement will use often differs from the one that it actually uses, I will highlight which output you can trust and which you can’t. I’ll also provide you with a few helpful hints and tips on how to influence some of the tools to predict an access path that more closely matches the access path taken during run time.
Access Path Education
You can start your education on the different access paths that are available to the optimizer by reading Oracle’s Database Performance Tuning Guide that is provided in each Oracle release’s documentation. Before you buy third-party books on any topic, I highly suggest that you read Oracle’s documentation first.
If you want to learn how the optimizer works, I highly suggest that you read Jonathan Lewis’s book titled Cost-Based Oracle Fundamentals. It is one of the most educational and informative books I have ever read on the cost-based optimizer. Here’s my review of the book for DBAZine.
What’s Next?
In my next blog, we’ll begin our review of the various tools we can use to identify the access path a given SQL statement will take.
