Saturday, 17 August 2013

DB2 Optimizer.

 
DB2 Optimizer

The DB2 optimizer is integral to the operation of SQL statements. The optimizer, as its name implies, determines the optimal method of satisfying a SQL request. For example, consider the following statement.


              SELECT EMPNO, WORKDEPT, DEPTNAME
                FROM DSN8230, EMP, DSN8230.DEPT
              WHERE DEPTNO = WORKDEPT

This statement, whether embedded statically in an application program or executed dynamically, must be passed through the DB2 optimizer before execution.

The optimizer parses the statement and determines the following :
  • Which table must be accessed. 
  • Which columns from those tables need to be returned 
  • Which columns participate in the SQL statement’s predicates 
  • Whether there are any indexes for this combination of tables and columns 
  • What statistics are available in the DB2 catalog 

Based on this information, the optimizer analyzes the possible access paths and chooses the best one for the given query. An access path is the navigation logic used by DB2 to access the requisite data. A tablespace scan using sequential prefetch is an example of a DB2 access path.

Note : Many factors can cause the DB2 optimizer to choose the wrong access path, such as incorrect or outdated statistics in the DB2 catalog, an improper physical or logical database design, an improper use of SQL (for example, record-at-a-time processing), or bugs in the logic of the optimizer (occurs infrequently).

In addition, the optimizer does not contain optimization logic for every combination and permutation of SQL statements.

Optimizing data access in DB2

The notion of optimizing data access in the DBMS is one of the most powerful capabilities of DB2. Remember, you access DB2 data by telling DB2 what to retrieve, not how to retrieve it. Regardless of how the data is physically stored and manipulated, DB2 and SQL can still access that data.

This separation of access criteria from physical storage characteristics is called physical data independence. DB2's optimizer is the component that accomplishes this physical data independence

If you remove the indexes, DB2 can still access the data (although less efficiently). If you add a column to the table being accessed, DB2 can still manipulate the data without changing the program code. This is all possible because the physical access paths to DB2 data are not coded by programmers in application programs, but are generated by DB2

The optimizer performs complex calculations based on a host of information. To visualize how the optimizer works, picture the optimizer as performing a four-step process:

  • Receive and verify the syntax of the SQL statement. 
  • Analyze the environment and optimize the method of satisfying the SQL statement. 
  • Create machine-readable instructions to execute the optimized SQL. 
  • Execute the instructions or store them for future execution. 

The second step of this process is the most intriguing. How does the optimizer decide how to execute the vast array of SQL statements that you can send its way? The optimizer has many types of strategies for optimizing SQL. How does it choose which of these strategies to use in the optimized access paths? IBM does not publish the actual, in-depth details of how the optimizer determines the best access path, but the optimizer is a cost-based optimizer. This means the optimizer will always attempt to formulate an access path for each query that reduces overall cost. To accomplish this, the DB2 optimizer applies query cost formulas that evaluate and weigh four factors for each potential access path: the CPU cost, the I/O cost, statistical information in the DB2 system catalog, and the actual SQL statement.

The EXPLAIN command

The EXPLAIN command in DB2 describes the access path selected by the DB2 optimizer for SQL query. The information provided by EXPLAIN is invaluable for determining the following :

  • The work DB2 does “behind the scenes” to satisfy a single SQL statement 
  • Whether DB2 is using available indexes, and, if indexes are used, how DB2 is using them 
  • The order in which DB2 tables are accessed to satisfy join criteria 
  • Whether a sort is required for the SQL statement 
  • Tablespace locking requirements for a statement 
  • The performance of a SQL statement based on the access paths chosen 

To EXPLAIN a single SQL statement, precede the SQL statement with the EXPLAIN command as follows :

        EXPLAIN ALL SET QUERYNO = integer FOR SQL statement ;

It can be executed in the same way as any other SQL statement. QUERY NO, which can be set to any integer, is used for identification in the PLAN_TABLE.

The other method of issuing an EXPLAIN as a part of the BIND command. By indicating EXPLAIN (YES) when building a package or a plan, DB2 externalizes the access paths chosen for all SQL statements in that DBRM to the PLAN_TABLE.

After issuing the EXPLAIN command on your statements, you can inspect the result in PLAN_TABLE. You can use a simple SQL query to retrieve this information.

For example.

      SELECT QUERYNO, QBLOCKNO, APPLNAME, PROGNAME, PLANNO,
                  METHOD, CREATOR, TNAME, TABNO, 
        FROM Ownerid. PLAN_TABLE
                 ORDER BY APPLNAME, COLLID 

A common method of retrieving access path data from the PLAN_TABLE is to use QMF to format the results of a simple SELECT statement.


Created with Artisteer

No comments:

Post a comment