Showing posts with label SQL OPtimization.. Show all posts
Showing posts with label SQL OPtimization.. Show all posts

Saturday, 17 August 2013

Maximizing Database Performance with DB2 Optimizer: A Beginner's Guide.

DB2 Optimization
IBM DB2 Optimizer.

DB2 Optimizer is a critical component of IBM's DB2 database management system responsible for selecting the most efficient execution plan for SQL queries. This article will provide an overview of the DB2 Optimizer, including its function and operation, as well as best practices for optimizing database performance using DB2 Optimizer.

Introduction to DB2 Optimizer.

The DB2 Optimizer is a query optimization tool that determines the most efficient way to execute SQL statements. It uses a combination of heuristics and statistical information to determine the optimal execution plan for a given query.

Importance of DB2 Optimizer in database management.

The DB2 Optimizer plays a critical role in database performance, as it can greatly affect the speed and efficiency of query execution. A well-optimized query can reduce resource usage and improve response times, while a poorly optimized query can lead to slow performance and increased resource consumption.

How the DB2 Optimizer works?

The DB2 Optimizer uses a cost-based approach to determine the best query plan. It takes into account various factors such as the size of the data, the indexes available, and the distribution of the data. Based on this information, the Optimizer decides the most efficient method for executing the query, whether it be through index scans, table scans, joins, or any other operation.

One important aspect of the DB2 Optimizer is its use of statistics. Statistics provide the Optimizer with information about the distribution of data in the database, including information about the distribution of values in columns, the number of distinct values, and the distribution of NULL values. 

This information helps the Optimizer to make informed decisions about the best way to access the data. DB2 periodically collects statistics automatically, but it is also possible to manually update the statistics to provide the Optimizer with the most current information.

The DB2 Optimizer also considers the use of indexes in its plan selection. Indexes are used to quickly access specific rows in a table, and the Optimizer will often choose to use an index scan instead of a full table scan if it determines that the index will be more efficient. The Optimizer takes into account the size of the index and the number of distinct values in each column to determine the best plan.

In addition to using statistics and indexes, the DB2 Optimizer also considers the use of materialized query tables (MQTs). MQTs are pre-computed results of a query that can be stored and reused, which can significantly improve query performance. The Optimizer will consider using MQTs in its plan selection if it determines that using an MQT will be more efficient than executing the query from the underlying data.

The DB2 Optimizer also has the ability to use dynamic query optimization. This means that it can modify its plan during query execution if it determines that a different plan would be more efficient. For example, if the Optimizer determines that a joint operation would be more efficient after accessing a certain number of rows, it can switch to a join operation mid-query. This can result in significant performance improvements for complex queries.

How the Query is Optimized? 

Consider the following example to illustrate how the DB2 Optimizer works.

DB2 Table: 
 CUSTOMER_ID CUSTOMER_NAME CITY
     1           John Doe New York
     2           Jane Doe London
     3           Jack Smith Paris

Suppose we have a table named "CUSTOMERS" that contains the following data:

We want to retrieve the name and city of a customer with a specific customer ID. To do this, we can write the following SQL query:

SQL Query: 
SELECT CUSTOMER_NAME, CITY
FROM CUSTOMERS
WHERE CUSTOMER_ID = 2;

When this query is executed, the DB2 Optimizer will use the following steps to determine the best plan for executing the query:

  • Statistics Collection: The DB2 Optimizer will first collect statistics about the CUSTOMERS table to determine the size and distribution of the data. This information will be used to make informed decisions about how to access the data.
  • Plan Generation: Based on the statistics collected, the DB2 Optimizer will generate a plan for executing the query. In this case, it may determine that the best plan is to use an index scan on the "CUSTOMER_ID" column to quickly access the row with the specified customer ID.
  • Plan Execution: Finally, the DB2 Optimizer will execute the selected plan, which in this case will be an index scan on the "CUSTOMER_ID" column. This will result in the retrieval of the name and city of the customer with the specified customer ID, which in this case is "Jane Doe" and "London".
In this example, the DB2 Optimizer used a cost-based approach to determine the most efficient method of executing the query and selected the best plan based on the information about the data and the available indexes. By doing so, the Optimizer was able to greatly improve the performance of the query and retrieve the desired data quickly and efficiently.

What are the benefits of IBM DB2 Optimizer?

The IBM DB2 Optimizer provides several benefits to users of the DB2 database management system, including:
  • Improved Performance: By selecting the most efficient plan for executing a query, the DB2 Optimizer can greatly improve the performance of an application that relies on the database. This can result in faster query execution times and reduced resource utilization, leading to improved overall performance.
  • Reduced Resource Utilization: By making informed decisions about how to access data in the database, the DB2 Optimizer can reduce the number of resources required to execute a query. This can result in a reduction in CPU utilization, I/O operations, and memory usage, freeing up resources for other applications.
  • Better Data Access: The DB2 Optimizer takes into account the use of indexes and MQTs when selecting a query plan, which can result in improved data access and faster query execution times.
  • Dynamic Optimization: The ability of the DB2 Optimizer to modify its plan during query execution can result in significant performance improvements for complex queries.
  • Query Hints: The use of query hints provides a way for developers to explicitly specify the desired execution plan for a query, which can be useful in cases where the Optimizer is not making the best decisions or where a particular plan is required for a specific query.
  • Enhanced Development Productivity: By improving the performance of an application, the DB2 Optimizer can enhance the productivity of developers who are working on the application. This can result in faster development times and more efficient use of resources.

Conclusion. 

In summary, the IBM DB2 Optimizer is a powerful tool for improving the performance of a DB2 database. By making informed decisions about how to access data and executing queries in the most efficient manner possible, the Optimizer can greatly enhance the performance and functionality of an application.


Subscribe to Topictrick & Don't forget to press THE BELL ICON to never miss any updates. Also, Please visit mention the link below to stay connected with Topictrick and the Mainframe forum on - 

► Youtube
► Facebook 
► Reddit

Thank you for your support. 
Mainframe Forum™

Saturday, 10 August 2013

Top 10+ SQL Optimization Tips | DB2 Performance Tips.

In today's SQL optimization tips tutorial, you will learn the top 10 DB2 performance tips that developers can do in their SQL code to ensure better performance. In reality, 90% of the runtime problems in relational database management systems (RDBMS) are typically due to 10% of application issues.

Today's DB2 optimizes plays a vital role in selecting the correct access paths to fulfill SQL requests, but it can never be 100% correct. SQL performance tuning efforts are directly related to CPU usage, I/O, and concurrency. Most of the tips discussed in this tutorial are specific to SQL coding, but all tips are related to performance and tuning.

What is SQL Optimization or DB2 SQL Performace Tuning?


The DB2 optimizer does most of the optimization work but it's not sufficient. In laymen term, SQL optimization is a collection of techniques that are used to tweak the SQL query, so that data can be retrieved in an effective way from the DB2 database.

Is DB2 SQL Query Tuning really require?


Well, the answer is yes. As a programmer, you definitely need some sort of SQL performance tuning. Poorly coded programs or improperly coded SQL statements are most often the culprits causing poor performance of an application in production.

The SQL language seems quite easy to learn and master, but there are many inefficiencies in how the DB2 optimizer handles the language when it comes to performance. Developers, testers, DBAs, and business analysts alike will all benefit from this book, which provides many tips specific to coding proper SQL for efficiency.

What are the most common causes of poor performance in an application, program, or query?


  • Poor database design
  • Poor application design
  • Poorly structured SQL queries
  • Poor/inadequate catalog statistics
  • Poor allocation of system type resources such as buffer pools, RID pools, and logging of records

These are some of the ways you can address poor optimization:

  • Rewrite queries a different way
  • Rewrite predicates in a query differently
  • Change the design of the database objects
  • Change the design of the database partitioning
  • Add a partitioning strategy to an already existing table
  • Manipulate, modify, or enhance catalog statistics to account for the addition of nonstandard statistics
  • Modify system resources

Always remember that the DB2 optimizer simply does the following:

  • Accepts SQL queries
  • Validates syntax
  • Validates against the DB2 catalog
  • Checks catalog statistics against objects in a query
  • Applies costs to all different access path scenarios

#Tips 1: Eradicate any/all Scalar Functions Coded on Columns in Predicates.

Keep in mind that scalar functions are perfectly fine when placed on columns in the Select portion of an SQL statement, but they will automatically make a predicate non-indexable and stage 2 if used on columns in the Where portion. For example:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE YEAR(HIREDATE) = 2019

Optimized SQL:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE HIREDATE BETWEEN '2018-01-01' and '2018-12-31'

Once you rewrite this statement, as shown above, DB2 can choose to use an index on the column HIREDATE if one exists. Having it coded with the YEAR function in the predicate will not allow DB2 to use an index for that column.

#Tips 2: Remove Any/All Mathematics Coded on Columns in Predicates.

There is usually a way to recode a predicate without using mathematics on a column. Using mathematics is perfectly fine when placed on columns in the Select portion of an SQL statement, but it will automatically make a predicate non-indexable if used on columns in the Where portion. For example:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE SALARY * 1.1 > 50000.00

should be coded as:

SELECT EMPNO, LASTNAME
FROM EMP
WHERE SALARY > 50000.00 / 1.1

After rewriting the above SQL query, DB2 can choose to use an index on the column SALARY, if one exists. Having it coded with mathematics directly applied to the SALARY column does not allow DB2 to use an index on that column. Always have columns coded by themselves on one side of the operator and have all calculations on the opposite side.

#Tips 3: Code Only the Columns Needed in the Select Portion of the SQL Statement

By having columns that are not needed, the optimizer can choose Indexonly = 'N', which could force DB2 to also go to the data page in order to get specific columns requested, and this would require more I/O. It can also make sorting more expensive due to the extra columns that may be part of any sorts, which causes a larger sort file to be created and passed. In terms of sorts, size has a direct cost. A sort is expensive as its size.

Having extra columns specified can also have an effect on the optimizer's choice of join type if there are multiple tables in a query. Currently there are four types of joins in z/OS® DB2 (nested loop, merge scan, hybrid, and star), and there is a hash join in DB2 LUW. The optimizer chooses each one for various reasons. Having extra columns that are not used may keep the optimizer from making the best choice in join processing.

#Tips 4: Stay Away from Distinct if Possible

Most of the time, the Distinct function causes a sort of the final result set, which makes it one of the most expensive sorts. Distinct has always been one of the most expensive functions in the SQL language. However, as of DB2 V9, the optimizer tries to take advantage of an index to eliminate a sort for uniqueness as it currently does in optimizing with a Group By statement. Rewriting a query to obtain the same results without having Distinct in the SQL is usually more efficient. Developers today are very Distinct happy, with many coding it on all statements to ensure that no duplicates show up. But this is not efficient coding. One of the first things I always do when tuning applications is to scan source code and grab all the statements that contain Distinct, understand whether the query really gets duplicates, check to see if a sort is being done due to the Distinct, and rewrite them. With a thorough analysis of nightly batch jobs, this can easily knock time off the batch cycles. Keep in mind that having a Distinct in the query may be efficient as long as there is no sort being executed because of it.

If duplicates are to be eliminated from the result set, try:

Group By (if prior to V9), which looks to take advantage of any associated indexes to eliminate a sort for uniqueness.

Rewriting the query using an In or Exists subquery. This will work if the table causing the duplicates (due to a one to many relationship) does not have data being returned as part of the result set.

For example, provide a list of employees who are currently working on projects. Many of the employees work on multiple projects at the same time, but we only want to see them once. This:

SELECT DISTINCT E.EMPNO, E.LASTNAME
FROM EMP E,
   EMPPROJACT EP
WHERE E.EMPNO = EP.EMPNO

can also be rewritten as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E,EMPPROJACT EP
WHERE E.EMPNO = EP.EMPNO
GROUP BY E.EMPNO, E.LASTNAME

and can also be rewritten as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE EXISTS
     (SELECT 1
     FROM  EMPPROJACT EP
     WHERE E.EMPNO = EP.EMPNO)

and can also be rewritten as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE E.EMPNO IN
     (SELECT EP.EMPNO
      FROM EMPPROJACT EP)

#Tips 5: Try Rewriting an In Subquery as an Exists Subquery


The In and Exists subqueries produce the same results, but they operate very differently. Typically one will perform better than the other, depending on data distributions. For example:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE E.EMPNO IN
   (SELECT D.MGRNO
    FROM DEPARTMENT D
    WHERE D.DEPTNO LIKE 'D%')

can also be coded as:

SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE EXISTS
    (SELECT 1
     FROM DEPARTMENT D
     WHERE D.MGRNO = E.EMPNO
      AND D.DEPTNO LIKE 'D%')

These correlated and noncorrelated subqueries have different processing advantages. DB2 V9 may now transform subqueries to whichever type it deems more efficient, especially when a subquery cannot be transformed into a join. DB2 may choose to transform a subquery into a join, handling the duplicates. This can lead to some confusion when you look at a DB2 Explain and see a join or a different type of subquery than what was originally coded.

DB2 chooses to correlate, de-correlate, or transform into a join based on cost. The IBM Data Studio tool shows any query transformations that the optimizer has done.

#Tips 6: Always Make Sure Host Variables Are Defined to Match the Columns Datatype

If a column is defined as a small integer, then the host variable that it is being compared to should be declared as the same definition (like S9(4) comp for COBOL). This has been improved dramatically as of DB2 V8 and V9, where DB2 handles different numeric datatypes and different character strings being compared more efficiently. Some development languages do not have all the same datatypes as can be defined in DB2, which caused problems in the past. But as a general rule, making the declared datatypes match the column definition ensures the most efficient performance and optimization. For example, if a column is defined as an integer datatype, then any host variable containing values for comparison in a predicate should have an integer definition (not small integer, decimal, floating point, etc.).

COBOL programmers should take advantage whenever possible of created DCLGEN host variables to ensure exact matches. DCLGEN stands for Declarations Generator and is a facility to generate DB2 SQL data structures in COBOL and many other languages. For COBOL, it will produce a declaration of the table EXEC SQL DECLARE TABLE and also the COBOL definitions of the host variables for the columns in the table.

#Tips 7: Because Or Logic Can Be Problematic to the Optimizer, Try a Different Rewrite


Split predicates out if possible. The following example clarifies this:

SELECT DEPTNO, DEPTNAME
FROM DEPT
WHERE (ADMRDEPT = 'E01'
       AND DEPTNAME LIKE 'BRANCH%')
  OR (DEPTNO = 'D01'
       AND DEPTNAME LIKE 'BRANCH%')

can also be tried as:

SELECT DEPTNO, DEPTNAME
FROM DEPT
WHERE (ADMRDEPT = 'E01' OR
      DEPTNO = 'D01')
 AND DEPTNAME LIKE 'BRANCH%'

The idea here is to see that both examples have the same logic.

Many times this will move the optimizer to indexable from non-indexable or possibly move the optimizer to choose a multi-index processing access path, or move from multi-index process. Connecting predicates with an OR creates non-Boolean term predicates, which means neither predicate can totally eliminate a row as false—and it is usually best to stay away from those. If one predicate is false, the other may be true. It is usually more efficient to connect predicates by Anding them together. But in this case, it gives the optimizer more choices.

When you have predicates that are connected by an 'OR', it is important to code the predicate that matches the most number of rows in the table first. This is because as soon as a row qualifies, predicate evaluation within the OR list stops.

When simple predicates are connected by OR logic, the resulting compound predicate will be evaluated at the higher stage 1 or 2 of the simple predicates. For example:

SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE WORKDEPT = 'D01'    -- Stage 1 Indexable
  OR EDLEVEL <> 16      -- Stage 1 Non Indexable

Because the second predicate is a stage 1 non-indexable predicate, and the other simple predicate is stage 1 indexable, the entire compound predicate becomes stage 1 non-indexable:

SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE WORKDEPT = 'D01'    -- Stage 1 Indexable
  OR YEAR(HIREDATE ) = 1990  -- Stage 2

Because the second predicate is a stage 2 non-indexable predicate, and the other simple predicate is stage 1 indexable, the entire compound predicate becomes stage 2 non-indexable.




New In-feed ads