Saturday, 10 August 2013

Top 10+ SQL Optimization Tips | DB2 Performance Tips.

In today's SQL optimization tips tutorial, you will learn the top 10 DB2 performance tips that developers can do in their SQL code to ensure better performance. In reality, 90% of the runtime problems in relational database management systems (RDBMS) are typically due to 10% of application issues.

Today's DB2 optimizes plays a vital role in selecting the correct access paths to fulfill SQL requests, but it can never be 100% correct. SQL performance tuning efforts are directly related to CPU usage, I/O, and concurrency. Most of the tips discussed in this tutorial are specific to SQL coding, but all tips are related to performance and tuning.

What is SQL Optimization or DB2 SQL Performace Tuning?


The DB2 optimizer does most of the optimization work but it's not sufficient. In laymen term, SQL optimization is a collection of techniques that are used to tweak the SQL query, so that data can be retrieved in an effective way from the DB2 database.

Is DB2 SQL Query Tuning really require?


Well, the answer is yes. As a programmer, you definitely need some sort of SQL performance tuning. Poorly coded programs or improperly coded SQL statements are most often the culprits causing poor performance of an application in production.

The SQL language seems quite easy to learn and master, but there are many inefficiencies in how the DB2 optimizer handles the language when it comes to performance. Developers, testers, DBAs, and business analysts alike will all benefit from this book, which provides many tips specific to coding proper SQL for efficiency.

What are the most common causes of poor performance in an application, program, or query?


  • Poor database design
  • Poor application design
  • Poorly structured SQL queries
  • Poor/inadequate catalog statistics
  • Poor allocation of system type resources such as buffer pools, RID pools, and logging of records

These are some of the ways you can address poor optimization:

  • Rewrite queries a different way
  • Rewrite predicates in a query differently
  • Change the design of the database objects
  • Change the design of the database partitioning
  • Add a partitioning strategy to an already existing table
  • Manipulate, modify, or enhance catalog statistics to account for the addition of nonstandard statistics
  • Modify system resources

Always remember that the DB2 optimizer simply does the following:

  • Accepts SQL queries
  • Validates syntax
  • Validates against the DB2 catalog
  • Checks catalog statistics against objects in a query
  • Applies costs to all different access path scenarios

#Tips 1: Eradicate any/all Scalar Functions Coded on Columns in Predicates.

Keep in mind that scalar functions are perfectly fine when placed on columns in the Select portion of an SQL statement, but they will automatically make a predicate non-indexable and stage 2 if used on columns in the Where portion. For example:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE YEAR(HIREDATE) = 2019

Optimized SQL:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE HIREDATE BETWEEN '2018-01-01' and '2018-12-31'

Once you rewrite this statement, as shown above, DB2 can choose to use an index on the column HIREDATE if one exists. Having it coded with the YEAR function in the predicate will not allow DB2 to use an index for that column.

#Tips 2: Remove Any/All Mathematics Coded on Columns in Predicates.

There is usually a way to recode a predicate without using mathematics on a column. Using mathematics is perfectly fine when placed on columns in the Select portion of an SQL statement, but it will automatically make a predicate non-indexable if used on columns in the Where portion. For example:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE SALARY * 1.1 > 50000.00

should be coded as:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE SALARY > 50000.00 / 1.1

After rewriting the above SQL query, DB2 can choose to use an index on the column SALARY, if one exists. Having it coded with mathematics directly applied to the SALARY column does not allow DB2 to use an index on that column. Always have columns coded by themselves on one side of the operator and have all calculations on the opposite side.

#Tips 3: Code Only the Columns Needed in the Select Portion of the SQL Statement

By having columns that are not needed, the optimizer can choose Indexonly = 'N', which could force DB2 to also go to the data page in order to get specific columns requested, and this would require more I/O. It can also make sorting more expensive due to the extra columns that may be part of any sorts, which causes a larger sort file to be created and passed. In terms of sorts, size has a direct cost. A sort is expensive as its size.

Having extra columns specified can also have an effect on the optimizer's choice of join type if there are multiple tables in a query. Currently there are four types of joins in z/OS® DB2 (nested loop, merge scan, hybrid, and star), and there is a hash join in DB2 LUW. The optimizer chooses each one for various reasons. Having extra columns that are not used may keep the optimizer from making the best choice in join processing.

#Tips 4: Stay Away from Distinct if Possible

Most of the time, the Distinct function causes a sort of the final result set, which makes it one of the most expensive sorts. Distinct has always been one of the most expensive functions in the SQL language. However, as of DB2 V9, the optimizer tries to take advantage of an index to eliminate a sort for uniqueness as it currently does in optimizing with a Group By statement. Rewriting a query to obtain the same results without having Distinct in the SQL is usually more efficient. Developers today are very Distinct happy, with many coding it on all statements to ensure that no duplicates show up. But this is not efficient coding. One of the first things I always do when tuning applications is to scan source code and grab all the statements that contain Distinct, understand whether the query really gets duplicates, check to see if a sort is being done due to the Distinct, and rewrite them. With a thorough analysis of nightly batch jobs, this can easily knock time off the batch cycles. Keep in mind that having a Distinct in the query may be efficient as long as there is no sort being executed because of it.

If duplicates are to be eliminated from the result set, try:

Group By (if prior to V9), which looks to take advantage of any associated indexes to eliminate a sort for uniqueness.

Rewriting the query using an In or Exists subquery. This will work if the table causing the duplicates (due to a one to many relationship) does not have data being returned as part of the result set.

For example, provide a list of employees who are currently working on projects. Many of the employees work on multiple projects at the same time, but we only want to see them once. This:

SELECT DISTINCT E.EMPNO, E.LASTNAME
FROM EMP E,
   EMPPROJACT EP
WHERE E.EMPNO = EP.EMPNO

can also be rewritten as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E,EMPPROJACT EP
WHERE E.EMPNO = EP.EMPNO
GROUP BY E.EMPNO, E.LASTNAME

and can also be rewritten as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE EXISTS
     (SELECT 1
     FROM  EMPPROJACT EP
     WHERE E.EMPNO = EP.EMPNO)

and can also be rewritten as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE E.EMPNO IN
     (SELECT EP.EMPNO
      FROM EMPPROJACT EP)

#Tips 5: Try Rewriting an In Subquery as an Exists Subquery


The In and Exists subqueries produce the same results, but they operate very differently. Typically one will perform better than the other, depending on data distributions. For example:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE E.EMPNO IN
   (SELECT D.MGRNO
    FROM DEPARTMENT D
    WHERE D.DEPTNO LIKE 'D%')

can also be coded as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE EXISTS
    (SELECT 1
     FROM DEPARTMENT D
     WHERE D.MGRNO = E.EMPNO
      AND D.DEPTNO LIKE 'D%')

These correlated and noncorrelated subqueries have different processing advantages. DB2 V9 may now transform subqueries to whichever type it deems more efficient, especially when a subquery cannot be transformed into a join. DB2 may choose to transform a subquery into a join, handling the duplicates. This can lead to some confusion when you look at a DB2 Explain and see a join or a different type of subquery than what was originally coded.

DB2 chooses to correlate, de-correlate, or transform into a join based on cost. The IBM Data Studio tool shows any query transformations that the optimizer has done.

#Tips 6: Always Make Sure Host Variables Are Defined to Match the Columns Datatype

If a column is defined as a small integer, then the host variable that it is being compared to should be declared as the same definition (like S9(4) comp for COBOL). This has been improved dramatically as of DB2 V8 and V9, where DB2 handles different numeric datatypes and different character strings being compared more efficiently. Some development languages do not have all the same datatypes as can be defined in DB2, which caused problems in the past. But as a general rule, making the declared datatypes match the column definition ensures the most efficient performance and optimization. For example, if a column is defined as an integer datatype, then any host variable containing values for comparison in a predicate should have an integer definition (not small integer, decimal, floating point, etc.).

COBOL programmers should take advantage whenever possible of created DCLGEN host variables to ensure exact matches. DCLGEN stands for Declarations Generator and is a facility to generate DB2 SQL data structures in COBOL and many other languages. For COBOL, it will produce a declaration of the table EXEC SQL DECLARE TABLE and also the COBOL definitions of the host variables for the columns in the table.

#Tips 7: Because Or Logic Can Be Problematic to the Optimizer, Try a Different Rewrite


Split predicates out if possible. The following example clarifies this:

SELECT DEPTNO, DEPTNAME
FROM DEPT
WHERE (ADMRDEPT = 'E01'
       AND DEPTNAME LIKE 'BRANCH%')
  OR (DEPTNO = 'D01'
       AND DEPTNAME LIKE 'BRANCH%')

can also be tried as:

SELECT DEPTNO, DEPTNAME
FROM DEPT
WHERE (ADMRDEPT = 'E01' OR
      DEPTNO = 'D01')
 AND DEPTNAME LIKE 'BRANCH%'

The idea here is to see that both examples have the same logic.

Many times this will move the optimizer to indexable from non-indexable or possibly move the optimizer to choose a multi-index processing access path, or move from multi-index process. Connecting predicates with an OR creates non-Boolean term predicates, which means neither predicate can totally eliminate a row as false—and it is usually best to stay away from those. If one predicate is false, the other may be true. It is usually more efficient to connect predicates by Anding them together. But in this case, it gives the optimizer more choices.

When you have predicates that are connected by an 'OR', it is important to code the predicate that matches the most number of rows in the table first. This is because as soon as a row qualifies, predicate evaluation within the OR list stops.

When simple predicates are connected by OR logic, the resulting compound predicate will be evaluated at the higher stage 1 or 2 of the simple predicates. For example:

SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE WORKDEPT = 'D01'    -- Stage 1 Indexable
  OR EDLEVEL <> 16      -- Stage 1 Non Indexable

Because the second predicate is a stage 1 non-indexable predicate, and the other simple predicate is stage 1 indexable, the entire compound predicate becomes stage 1 non-indexable:

SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE WORKDEPT = 'D01'    -- Stage 1 Indexable
  OR YEAR(HIREDATE ) = 1990  -- Stage 2

Because the second predicate is a stage 2 non-indexable predicate, and the other simple predicate is stage 1 indexable, the entire compound predicate becomes stage 2 non-indexable.




No comments:

Post a comment