Tuesday, 21 February 2012
AWR REPORT ANALYSING
AWR (Automatic Workload Repository) Reports are very handy during performance tuning of the database. These reports provide clear picture of database side activities. Various useful data is nicely documented in different sections in readable format. It becomes necessity to understand these reports for teams involved in activities like database tuning,performance/load testing or analyzing any performance issue.
Purpose of the document is to enable the reader to understand basic AWR terminology and to use these details for understanding the database performance issues.
This document describes how to utilize AWR reports to tune,
- Physical database instance
- Identify SQLs which are performing poorly
This document is intended for:
- Performance/ Load testing team
- Oracle DBAs
- Development team
2.AWR Walk through
AWR reports are divided into various sections. Each section provides specific information about database health. We will go through each of the important sections and will try to explore information captured in these sections.
|DB Name||DB Id||Instance||Inst num||Release||RAC||Host|
Provides information about the database,
- Name of the database
- Database ID
- Instance Name and Instance Number
- Oracle Database Server release version
- Whether database is part of RAC?
- Machine name hosting the database instance
|Snap Id||Snap Time||Sessions||Cursors/Session|
|DB Time:||820.16 (mins)|
Before we go through this table, we will see,
2.1.1.How AWR reports are generated?
Starting from Oracle 10g onwards, at every 60 mins database server takes snapshot of a database. Snapshot gathers database health related information at that point of time. This information is stored in Oracle data dictionary and servives database restarts.
Each snapshot is identified by a snapshot ID. In order to generate AWR reports, 2 snapshot IDs are needed. Latest of these snapshots is compared with old snapshot and based on the comparison an AWR report gets generated.
Above table provides information about,
1.Begin Snapshot ID and End Snapshot ID along with date and time details for the snapshot
2.The Sessions column indicates total no. of database sessions active when the snapshot was taken.
3.Elapsed Time column indicates the duration in mins between 2 snapshots. In this example, the duration is approx. 60 mins
4.DB Time columns represents, total time spent by foreground processes. It includes CPU Time, IO Time and Wait time and excludes Idle wait time and time taken by background processes.
(DB time) / (Elapsed time) gives total no. of database active sessions during that period. In case of above scenario total no. of active sessions are, 820.16 (DB Time) / 59.96 (Elapsed Time), which is 13.67 sessions.
|Buffer Cache:||740M||724M||Std Block Size:||8K|
|Shared Pool Size:||264M||280M||Log Buffer:||6,148K|
The section gives information about size details for memory componets
- Buffer Cache
- Shared Pool
- Standard Block Size
- Log Buffer
From Oracle 10g onwards, database server does Automatic Memory Management (AMM) for PGA and SGA components. Based on load, database server keeps on allocating or deallocating memory assigned to different components of SGA and PGA. Due to this reason, we can observe different sizes for Buffer Cache and Shared Pool, at the beginning or end of AWR snapshot period.
|Per Second||Per Transaction|
This section provides information Per Second and Per Transaction basis. For example it shows,
1.How many Hard Parse are happening
2.How much Redo is generated
3.How many Physical Reads are happening etc
4.How many Sorts are happening
This information is proportionate to the amount of load experienced by the database server during that period.
2.3.1.How load profile information is helpful during load testing?
Ans: Load testing for application release is done to assess the release performance. Load test is first done on baseline version and then performed on the release version. It is recommended to generate AWR reports for baseline as well as for the release load test.
In order to assess performance of release test against baseline test, we can compare the load profile section of AWR for both the tests. This comparison will give slight idea about the release performance. If there are significant differences in the count of baseline and release load profile figures, then that could give some pointers for investigation in case the release load test is bad in terms of performance.
2.3.2.Is it ok to keep the snapshot frequency to default i.e. every 60 minutes?
To analyse the performance issues in detail, it is recommended to generate AWR reports with interval less than 60 min. May be one can generate AWR reports at every 10 minutes only during the load test interval, and later change the frequency to default one.
2.4.Instance Efficiency Percentages (Target 100%)
|Buffer Nowait %:||99.29||Redo NoWait %:||100.00|
|Buffer Hit %:||99.41||In-memory Sort %:||100.00|
|Library Hit %:||78.19||Soft Parse %:||90.15|
|Execute to Parse %:||81.10||Latch Hit %:||99.44|
|Parse CPU to Parse Elapsd %:||9.09||% Non-Parse CPU:||98.03|
The section provides information about hit ratios for different memory components. These ratios, tells how often a particular data is found in a respecitive memory structure. Details about hit ratios are as mentioned delow,
Buffer Hit: Shows % of times particular block was found in buffer cache, instead of reading it from disk.
Buffer Nowait: Indicates % of times data buffers were accessed directly without any wait time.
Library Hit%: Shows % of times SQL and PL/SQL found in shared pool.
In-Memory Sort %: Shows %of times Sorting operations happened in memory than in the disk (temporary tablespace).
Soft parse %: Shows % of times the SQL in shared pool is used.
Latch Hit %: Shows % of time latches are acquired without having to wait.
As per Oracle AWR report the target should be 100% for these ratios. But in reality this is not possible always. Hence the ratio above 80% is always healthy.
2.5.Shared Pool Statistics:
|Memory Usage %:||91.36||90.18|
|% SQL with executions>1:||85.32||78.53|
|% Memory for SQL w/exec>1:||95.71||97.93|
% SQL with executions>1: Shows % of SQLs executed more than 1 time. The % should be very near to value 100.
% memory for SQL w/exec>1: From the memory space allocated to cursors, shows which % has been used by cursors more than 1.
The ratio above 80% is always healthy.
2.6.Top 5 Timed Events:
|Event||Waits||Time(s)||Avg Wait(ms)||% Total Call Time||Wait Class|
|cursor: pin S wait on X||963,939||10,958||11||22.3||Concurrency|
|db file sequential read||6,301,396||9,923||2||20.2||User I/O|
|read by other session||7,022,258||5,819||1||11.8||User I/O|
|row cache lock||1,881,861||3,899||2||7.9||Concurrency|
This section shows TOP 5 wait events the processes were waiting on during the snapshot time period. These events are helpful during analysis of any database related performance bottlenecks. The
- Wait Class, column helps in classifying whether the issue is related to application or infrastructure.
- Waits, column provides information about no. of wait happens.
- Time(s), column provides information about total CPU time in seconds spent on the wait.
Important wait events and their causes are explained in detail below,
2.6.1.db file scattered read:
This event indicates wait due to full table scans or index fast full scans.
To avoid this event, identify all the tables on which FTS is happening and create proper indexes so that oracle will do Index scans instead of FTS. The index scan will help in reducing no. of IO operations.
To get an idea about tables on which FTS is happening please refer to “Segment Statistics” -> “Segments By Physical Read” section of AWR report. This section lists down both Tables and Indexes on which Physical Reads are happening. Please note that physical reads doesn’t necessarily means FTS but a possibility of FTS.
2.6.2.db file sequential read:
The event indicates that index scan is happening while reading data from table. High no. of such event may be a cause of unselective indexes i.e. oracle optimizer is not selecting proper indexes from set of available indexes. This will result in extra IO activity and will contribute to delay in SQL execution.
Generally high no. is possible for properly tuned application having high transaction activity.
2.6.3.buffer busy wait:
Indicates that particular block is being used by more than one processes at the same. When first process is reading the block the other processes goes in a wait as the block is in unshared more. Typical scenario for this event to occur is, when we have batch process which is continuously polling database by executing particular SQL repeatedly and there are more than one parallel instances running for the process. All the instances of the process will try to access same memory blocks as the SQL they are executing is the same. This is one of the situation in which we experience this event.
2.6.4. enq: TX – row lock contention:
Oracle maintenance data consistency with the help of locking mechanism. When a particular row is being modified by the process, either through Update/ Delete or Insert operation, oracle tries to acquire lock on that row. Only when the process has acquired lock the process can modify the row otherwise the process waits for the lock. This wait situation triggers this event. The lock is released whenever a COMMIT is issued by the process which has acquired lock for the row. Once the lock is released, processes waiting on this event can acquire lock on the row and perform DML operation.
2.7.Time Model Statistics:
|Statistic Name||Time (s)||% of DB Time|
|sql execute elapsed time||45,168.56||91.79|
|parse time elapsed||8,626.43||17.53|
|hard parse elapsed time||7,616.90||15.48|
|hard parse (sharing criteria) elapsed time||929.82||1.89|
|hard parse (bind mismatch) elapsed time||929.20||1.89|
|PL/SQL compilation elapsed time||731.51||1.49|
|PL/SQL execution elapsed time||440.62||0.90|
|sequence load elapsed time||409.34||0.83|
|repeated bind elapsed time||26.35||0.05|
|connection management call elapsed time||6.02||0.01|
|background elapsed time||1,774.14|
|background cpu time||448.11|
Important statistics here is the DB Time. The statistic represents total time spent in database calls. It is calculated by aggregating the CPU time and wait time of all sessions not waiting on idle event (non-idle user sessions). Since this timing is cumulative time for all non-idle sessions, it is possible that the time will exceed the actual wall clock time.
Objective of tuning oracle database is to reduce the amount of time users spend in performing some action on the database.
This time represents time taken by foreground sessions and not background sessions.
In above example, DB Time is 49,209.52 seconds. 91% of the time is being spent on “SQL execution elapsed time” i.e for SQL execution.
DB CPU represents time spent on CPU resource by foreground user processes. This time doesn’t include waiting time for CPU. DB CPU is contributing to 22% of total DB time.
Important thing to note here is, the actual wall clock time is around 3600 seconds (difference between 2 snapshots) but the DB CPU shown here is 10,932 seconds.
If the server machine (on which database server is running) has more than 1 CPUs, it is possible to have DB CPU greater than actual wall clock time. In this example the database server is hosted on machine with 8 CPUs. Hence a 1 second is divided into 8 parts. Hence DB CPU 10,932 seconds means, 10,932 (DB CPU) / 8 (CPU) = 1366 wall clock seconds.
The “Parse time elapsed” and “Hard parse elapsed time” has taken around 17% and 15% of the total DB time. “Parse time elapsed” represents time spent for Syntax and Semantic checks. Whereas “Hard parse include time” represents time spent for Syntax and Semantic checks PLUS time spent for optimizing the SQL and generating optimizer plan.
|Wait Class||Waits||%Time -outs||Total Wait Time (s)||Avg wait (ms)||Waits /txn|
The events are sorted in descending order of “Total Wait Time (s)” column.
There are over 800 distinct wait events. Oracle has grouped these wait events in 12 wait classes. These wait classes are further divided in 2 categories, Administrative Wait Class and Application Wait Class.
These wait classes gives overall information about whether the waits happening for Application or for System events.
In the example above first 2 rows show that total wait time is higher for Concurrency and “User I/O” wait classes. Though we won’t have much control on reducing concurrency, but we could aim at reducing the User IO.
High User IO means,
- From the pool of available indexes proper indexes are not being used
- FTS is happening on big tables with millions of rows
This could give pointers for further tuning.
|Event||Waits||%Time -outs||Total Wait Time (s)||Avg wait (ms)||Waits /txn|
|cursor: pin S wait on X||963,939||99.70||10,958||11||5.10|
|db file sequential read||6,301,396||0.00||9,923||2||33.33|
|read by other session||7,022,258||0.00||5,819||1||37.15|
|row cache lock||1,881,861||0.00||3,899||2||9.95|
|log file sync||235,762||0.00||2,988||13||1.25|
|gc buffer busy||835,453||0.00||752||1||4.42|
|log file parallel write||145,027||0.00||716||5||0.77|
|library cache load lock||17,095||0.00||420||25||0.09|
|enq: TX – index contention||24,548||0.00||378||15||0.13|
|latch: cache buffers chains||172,466||0.00||292||2||0.91|
|kksfbc child completion||5,527||98.72||276||50||0.03|
|gc current grant 2-way||175,219||0.00||257||1||0.93|
|library cache lock||78,169||0.00||195||2||0.41|
|enq: SQ – contention||17,467||0.00||194||11||0.09|
|gc cr grant 2-way||141,816||0.00||148||1||0.75|
|gc current block 2-way||103,694||0.00||115||1||0.55|
|enq: HW – contention||4,142||0.00||112||27||0.02|
|db file parallel write||44,214||0.00||92||2||0.23|
|db file scattered read||31,617||0.00||89||3||0.17|
|latch: shared pool||81,395||0.00||87||1||0.43|
|log file sequential read||3,652||0.00||80||22||0.02|
|latch: row cache objects||40,058||0.00||76||2||0.21|
|latch: library cache||19,479||0.00||65||3||0.10|
|direct path write||18,866||0.00||53||3||0.10|
|gc cr multi block request||49,879||0.00||50||1||0.26|
|buffer busy waits||25,287||0.00||42||2||0.13|
|latch: session allocation||26,429||0.00||39||1||0.14|
|direct path read||21,708||0.00||39||2||0.11|
|enq: TX – row lock contention||914||0.00||34||37||0.00|
|ksxr poll remote instances||1,570,668||13.71||23||0||8.31|
The wait events are sorted on “Total Wait Time (s)” column in descending order. The idle events are listed down in the end. First 10 to 15 events should be looked into because rest of the events are idle events and can be ignore. These events are related to foreground processes.
2.10.Background Wait Events:
|Event||Waits||%Time -outs||Total Wait Time (s)||Avg wait (ms)||Waits /txn|
|log file parallel write||145,047||0.00||716||5||0.77|
|db file parallel write||44,226||0.00||92||2||0.23|
|log file sequential read||3,659||0.00||81||22||0.02|
|events in waitclass Other||1,651,179||17.00||71||0||8.73|
|row cache lock||10,726||0.00||35||3||0.06|
|control file sequential read||9,724||0.00||14||1||0.05|
|latch: row cache objects||9,547||0.00||13||1||0.05|
|control file parallel write||1,497||0.00||6||4||0.01|
|gc current block 2-way||3,159||0.00||5||2||0.02|
|library cache lock||1,464||0.00||4||2||0.01|
|latch: shared pool||3,523||0.00||4||1||0.02|
|Log archive I/O||3,642||0.00||3||1||0.02|
|db file sequential read||862||0.00||3||4||0.00|
|cursor: pin S wait on X||137||100.00||2||12||0.00|
|library cache pin||47||0.00||1||12||0.00|
|os thread startup||8||0.00||0||62||0.00|
|latch: cache buffers chains||70||0.00||0||7||0.00|
|gc current grant 2-way||120||0.00||0||3||0.00|
|gc cr grant 2-way||323||0.00||0||1||0.00|
The wait events are sorted on “Total Wait Time (s)” column in descending order. The idle events are listed down in the end. First 10 to 15 events should be looked into because rest of the events are idle events and can be ignore. These events are related to background processes.
This is a very important section. This section lists down TOP SQL details based on,
- Elapsed Time
- CPU Time
- No. of Gets
- No. of Reads
- No. of Executions etc.
2.11.1.SQL ordered by Elapsed Time:
|Elapsed Time (s)||CPU Time (s)||Executions||Elap per Exec (s)||% Total DB Time||SQL Id||SQL Module||SQL Text|
|1,202||97||1||1202.04||2.44||7pusny3989p7b||XXXXXXXX||declare XXX_XXXX XXX_XXX…|
|905||89||70||12.93||1.84||ftz3ja1zr82s5||XXXXXXXX||select ID , TO_CHAR(XXXX,…|
|88||11||41,246||0.00||0.18||6qz82dptj0qr7||select l.col#, l.intcol#, l….|
|59||11||0.12||2k3bu9k1p7kyz||select /*+ INDEX(ol$ ol$signat…|
|54||16||57,158||0.00||0.11||cqgv56fmuj63x||select owner#, name, namespace…|
|43||5||0.09||9g485acn2n30m||select col#, intcol#, reftyp, …|
|37||10||20,391||0.00||0.08||39m4sx9k63ba2||select /*+ index(idl_ub2$ i_id…|
|35||2||1||35.01||0.07||77zqh2uy696by||XXXXXXX||SELECT DISTINCT XX.XXXXXX…|
The section provides TOP SQLs sorted in descending order by, Elapsed Time(s).
Elapse per Exec (s): Elapse time in seconds for per execution of the SQL.
Executions: Total no. of executions for the SQL during the two snapshot period.
Elapsed Time: This is calculated by multiplying Elapse per Exec (s) by Executions.
% Total DB Time: % of DB time utilized by the SQL.
SQL Module: Provides module detail which is executing the SQL. Process name at the OS level is displayed as SQL Module name. If the module name starts with any of the names given below, then don’t consider these SQLs for tuning purpose as these SQLs are oracle internal SQLs,
- Enterprise Manager
- emagent etc…
This section is very important in terms of tuning perspective as it provides information about SQLs which need tuning.
During load testing one should compare this section for Release load test and with baseline load test. This comparison will give difference about which SQLs performed badly during baseline and release load test.
Looking at the column, Elapse per Exec (s), one can understand which SQLs need tuning. Normally SQLs taking time above 1 second are candidate for tuning.
For other sections like, SQL ordered by Gets, SQL ordered by CPU Time etc, the SQLs appearing under SQL ordered by Elapsed Time are listed most of the time. Hence this section is worth investigating.
Below is the brief description about each section,
- SQL ordered by CPU Time: SQLs are listed based on CPU time taken and are ordered by, CPU Time (s) column.
- SQL ordered by Gets: SQLs are listed based on Buffer Reads (Buffer Gets) for the SQL and are ordered by, Buffer Gets, column.
- SQL ordered by Reads: SQLs are listed based on Physical Reads (Disk Reads) for the SQL and are ordered by, Physical Reads, column.
- SQL ordered by Executions: SQLs are listed based on total no. of executions for the SQL and are ordered by, Executions, column.
For each of these sections, to get the complete SQL Text details, click on the SQL ID listed under SQL ID column.
2.12.Instance Activity Stats – Thread Activity
|log switches (derived)||2||2.00|
This section provides information about how many Redo Log Switches are happening. The “Total“ column provides information for log switches during the snapshot period and “per hour” column provides information per Hour basis.
2 log switches per hour is optimal value as per oracle standards.
3.1.Buffer Pool Advisory
|P||Size for Est (M)||Size Factor||Buffers for Estimate||Est Phys Read Factor||Estimated Physical Reads|
The section provides estimates on, how the increase or descrease of buffer cache size will casue decrease or increase in physical reads. This information is just an estimated data and not an actual data.
Starting point here is “Size Factor” = 1.0. This gives current memory allocation for Buffer Cache. In this example, 728 MB is being allocated to buffer cache. With this setting the estimated amount of Physical Reads are 676,024,586. In case we increase the memory allocation for buffer cache to say 1440 MB (“Size Factor” = 1.98) then estimated physical reads will be 347,627,804. This means by allocating additional 712 MB for Buffer Cache, total estimated physical reads will come down by 328,396,782.
On the other hand, by reducing Buffer Cache to say 216 MB (“Size Factor” = 0.30) estimated physical reads increase to 1,038,346,123.
The statistics acts as an input to DBA in order to tune the Buffer Cache memory.
3.2.PGA Memory Advisory
|PGA Target Est (MB)||Size Factr||W/A MB Processed||Estd Extra W/A MB Read/ Written to Disk||Estd PGA Cache Hit %||Estd PGA Overalloc Count|
Similar to Buffer Pool Advisory, the statistic provides information on how the increase or decrease in PGA memory will cause increase or decrease in Estd PGA Cahce Hit %.
Starting point here is “Size Factor” = 1.0. This gives current memory allocation for PGA. In this example 3072 MB is being allocated to PGA. With this allocation the Estd PGA Cahce Hit % is 97, which is good. Hence even if we increase PGA to 3686 MB we will get 2% increase in Estd PGA Cahce Hit %. Hence it won’t be advisable to increase PGA further.
3.3.Shared Pool Advisory
|Shared Pool Size(M)||SP Size Factr||Est LC Size (M)||Est LC Mem Obj||Est LC Time Saved (s)||Est LC Time Saved Factr||Est LC Load Time (s)||Est LC Load Time Factr||Est LC Mem Obj Hits|
Similar to Buffer Pool Advisory and PGA, the statistic provides information on how the increase or decrease in Shared pool memory will cause increase or decrease in Estd LC Load Time (s).
Starting point here is “SP Size Factor” = 1.0. This gives current memory allocation for shared pool. In this example 280 MB is being allocated to shared pool. With this allocation the Estd LC Load Time (s) is 57,751. If we increase the shared pool size to 308 then Estd LC Load Time (s) will come down to value 1. Hence shared pool should be set to 308 MB.
3.4.SGA Target Advisory
|SGA Target Size (M)||SGA Size Factor||Est DB Time (s)||Est Physical Reads|
The statistic provides information on how the increase or decrease in SGA memory will cause decrease or increase in Estd Physical Reads.
Starting point here is “SGA Size Factor” = 1.0. This gives current memory allocation for SGA. In this example 1024 MB is being allocated to SGA. With this allocation the Estd Physical Reads is 844,082,787. If we increase the SGA size by 1024 MB i.e. to 2048 MB then Estd Physical Reads will come down by 328,412,156. Since there is 50% reduction in Estd Physical Reads, the SGA should be increased to 2048 MB.
3.5.Undo Segment Summary
|Undo TS#||Num Undo Blocks (K)||Number of Transactions||Max Qry Len (s)||Max Tx Concurcy||Min/Max TR (mins)||STO/ OOS||uS/uR/uU/ eS/eR/eU|
The statistic proivides information about UNDO segments.
- Min/MAX TR (mins) – Represents Minimum and Maximum Tuned Retention Minutes for Undo data. This data will help to set the UNDO_RETENTION database parameter. In this example this parameter can be set to 15.25 minutes i.e. 915 seconds.
- Max Qry Len(s) – Represents Maximum query length in seconds. In this example the max query length is 135 seconds.
- STO/ OOS – Represents count for Sanpshot Too Old and Out Of Space errors, occurred during the snapshot period. In this example, we can see 0 errors occurred during this period.
4.1.Segments by Logical Reads
|Owner||Tablespace Name||Object Name||Subobject Name||Obj. Type||Logical Reads||%Total|
The statistic displays segment details based on logical reads happened. Data displayed is sorted on “Logical Reads” column in descending order. It provides information about segments for which more logical reads are happening. Most of these SQLs can be found under section SQL Statistics -> SQL ordered by Gets.
4.2.Segments by Physical Reads
|Owner||Tablespace Name||Object Name||Subobject Name||Obj. Type||Physical Reads||%Total|
The statistic displays segment details based on physical reads happened. Data displayed is sorted on “Physical Reads” column in descending order. It provides information about segments for which more physical reads are happening.
Queries using these segments should be analysed to check whether any FTS is happening on these segments. In case FTS is happening then proper indexes should be created to eliminate FTS. Most of these SQLs can be found under section SQL Statistics -> SQL ordered by Reads.
4.3.Segments by Row Lock Waits
|Owner||Tablespace Name||Object Name||Subobject Name||Obj. Type||Row Lock Waits||% of Capture|
The statistic displays segment details based on total “Row lock waits” which happened during snapshot period. Data displayed is sorted on “Row Lock Waits” column in descending order. It provides information about segments for which more database locking is happening.
DML statements using these segments should be analysed further to check the possibility of reducing concurrency due to row locking.
4.4.Segments by ITL Waits
|Owner||Tablespace Name||Object Name||Subobject Name||Obj. Type||ITL Waits||% of Capture|
Whenver a transaction modifies segment block, it first add transaction id in the Internal Transaction List table of the block. Size of this table is a block level configurable parameter. Based on the value of this parameter those many ITL slots are created in each block.
ITL wait happens in case total trasactions trying to update same block at the same time are greater than the ITL parameter value.
Total waits happening in the example are very less, 23 is the Max one. Hence it is not recommended to increase the ITL parameter value.
4.5.Segments by Buffer Busy Waits
|Owner||Tablespace Name||Object Name||Subobject Name||Obj. Type||Buffer Busy Waits||% of Capture|
Buffer busy waits happen when more than one transaction tries to access same block at the same time. In this scenario, the first transaction which acquires lock on the block will able to proceed further whereas other transaction waits for the first transaction to finish.
If there are more than one instances of a process continuously polling database by executing same SQL (to check if there are any records available for processing), same block is read concurrently by all the instances of a process and this result in Buffer Busy wait event.
4.6.Library Cache Activity
|Namespace||Get Requests||Pct Miss||Pin Requests||Pct Miss||Reloads||Invali- dations|
“Pct Misses” should be very low. In the example Pct Misses are above 10 for some of the library cache components. This indicates that the shared pool is not sufficiently sized. In case AMM (Automatic Memory Management) is used then the DBA can increase the SGA component. In case AMM is not used then increase the SHARED_POOL memory component.