Tuesday, January 10, 2012

SOME TIPS ABOUT ASH

Oracle collects Active Session History (ASH) statistics (mostly wait statistics for different events) by sampling every active session from v$session once a second and storing the samples in a circular buffer in the SGA. ASH therefore holds only very recent session history, roughly the last 5-10 minutes. If the buffer fills up before the AWR flush (by default every hour), the MMNL process (Manageability Monitor Lite, which shows up as the "Manageability Monitor light 2" process) flushes the data to disk, where it is stored in dba_hist_active_session_history.
ASH resides in the SGA and its size is calculated as the lesser of:
  • total # of CPUs x 2MB of memory
  • 5% of the shared pool
So on a 16-CPU server with a 500MB shared pool:
  • ASH desired size: 16 x 2MB = 32MB
  • 5% of 500MB = 25MB (this is the lower value, so ASH will be 25MB; see the check below)
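
A quick way to confirm how much memory ASH actually got is to look at v$sgastat. This is just a sketch - the entry name ('ASH buffers') is what I see on 10g/11g and may vary by version:

-- Check the actual ASH buffer allocation in the SGA
SELECT pool, name, ROUND(bytes/1024/1024, 1) AS mb
FROM   v$sgastat
WHERE  name LIKE 'ASH buffers%';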
ASH collects the following (see the example query after this list):
  • SQL_ID
  • SID
  • Client ID, Service ID
  • Program, Module, Action
  • Object, File, Block
  • Wait event number, actual wait time (if session is waiting) 
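
Most of those columns can be pulled straight out of the ASH views. A minimal sketch (the &sid substitution is just for illustration; the column list assumes 10g or later):

-- Recent in-memory samples for one session; older samples live in
-- dba_hist_active_session_history once they have been flushed to the AWR
SELECT sample_time, sql_id, event, wait_time, time_waited,
       program, module, action,
       current_obj#, current_file#, current_block#
FROM   v$active_session_history
WHERE  session_id = &sid
ORDER BY sample_time DESC;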
Some hidden parameters control the behaviour of ASH (a query to list them all is at the end of this section):

To enable or disable ASH data collection:
_ash_enable=TRUE


_ash_sampling_interval = 1000 (milliseconds)

This is where the one-second sampling interval is specified. I'm tempted to reduce this to a silly level and watch the server fall on its backside.

_ash_sample_all = FALSE

Oooh, this one would be fun! Why not sample all sessions, including those that aren't active? (Hint: you would see an enormous growth in the volume of data generated, so, again, I'm kidding.)

_ash_disk_write_enable = TRUE

Whether samples are flushed to the workload repository or not. Disabling this might initially seem a good idea to save space or improve performance a little, but (just a guess) I think it would confuse the hell out of ADDM when it couldn't find any ASH samples to correlate with the AWR information.

_ash_disk_filter_ratio = 10

Only around one in every 10 in-memory samples survives the flush to disk, which is why dba_hist_active_session_history is effectively a 10-second sample rather than a 1-second one.
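
To see the full set of ASH hidden parameters and their current values on your own instance, the usual x$ksppi / x$ksppcv query works (run it as SYS; the usual caveats about undocumented parameters apply):

-- List the _ash% hidden parameters, their values and descriptions (run as SYS)
SELECT p.ksppinm   parameter_name,
       v.ksppstvl  current_value,
       p.ksppdesc  description
FROM   x$ksppi  p,
       x$ksppcv v
WHERE  p.indx = v.indx
AND    p.ksppinm LIKE '\_ash%' ESCAPE '\'
ORDER BY p.ksppinm;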

Monday, January 9, 2012

CASE STUDY ON BUFFER BUSY WAIT EVENT

PURPOSE
-------

This document discusses a rare and difficult to diagnose database performance
problem characterized by extremely high buffer busy waits that occur at
seemingly random times.  The problem persists even after traditional buffer
busy wait tuning practices are followed (typically, increasing the number of
freelists for an object).
 
SCOPE & APPLICATION
-------------------

This document is intended for support analysts and customers.  It applies to 
both Unix and Windows-based systems, although the examples here will be 
particular to a Unix-based (Solaris) system.


In addition to addressing a specific buffer busy wait performance problem,
in section II, this document presents various techniques to diagnose and
resolve this problem by using detailed data from a real-world example.  The
techniques illustrated here may be used to diagnose other I/O and performance
problems.



RESOLVING INTENSE AND "RANDOM" BUFFER BUSY WAIT PERFORMANCE PROBLEMS
--------------------------------------------------------------------


This document is composed of two sections: a summary section that broadly
discusses the problem and its resolution, and a detailed diagnostics section
that shows how to collect and analyze various database and operating system
diagnostics related to this problem.  The detailed diagnostics section is
provided to help educate the reader with techniques that may be useful in
other situations.


I.  Summary
~~~~~~~~~~~

1.  Problem Description
~~~~~~~~~~~~~~~~~~~~~~~

At seemingly random times without regard to overall load on the database, 
the following symptoms may be witnessed:

- Slow response times on an instance-wide level
- Long wait times for "buffer busy waits" in BSTAT/ESTAT or StatsPack reports
- Large numbers of sessions waiting on buffer busy waits for a group of
        objects (identified in v$session_wait)
 
Some tuning effort may have been spent in identifying the segments
involved in the buffer busy waits and rebuilding those segments with a higher
number of freelists or freelist groups (from 8.1.6 on one can dynamically add
process freelists; segments only need to be rebuilt if changing freelist
groups).  Even after adding freelists, the problem continues and is not
diminished in any way (although regular, concurrency-based buffer busy waits
may be reduced).
 
 
2.  Problem Diagnosis
~~~~~~~~~~~~~~~~~~~~~
 
     The problem may be diagnosed by observing the following:

 
        - The alert.log file shows many occurrences of ORA-600, ORA-7445 and
          core dumps during or just before the time of the problem.

        - The core_dump_dest directory contains large core dumps during the
          time of the problem. There may either be many core dumps or a few
          very large core dumps (100s of MB per core file), depending on the
          size of the SGA.

        - sar -d shows devices that are completely saturated and have high
          request queues and service times.  These devices and/or their
          controllers are part of logical volumes used for database files.

        - Buffer busy waits, write complete waits, db file parallel writes and
          enqueue waits are high (in top 5 waits, usually in that order).
          Note that in environments using Oracle Advanced Replication, the
          buffer busy waits may at times appear on segments related to
          replication (DEF$_AQCALL table, TRANORDER index, etc...).
 
 
3.  Problem Resolution
~~~~~~~~~~~~~~~~~~~~~~
 
The cause for the buffer busy waits and other related waits might be a
saturated disk controller or subsystem impacting the database's ability to read
or write blocks.  The disk/controller may be saturated because of the many
core dumps occurring simultaneously, each requiring hundreds of megabytes.  If
the alert.log or core_dump_dest directory has no evidence of core dumps, then
the source of the I/O saturation must be found.  It may be due to non-database
processes, another database sharing the same filesystems, or a poorly tuned
I/O subsystem.
 
The solution is as follows:

  1) Find the root cause for the I/O saturation (core dumps,
     another process or database, or a poorly performing I/O
     subsystem) and resolve it.

  OR,

  2) If evidence of core dumps is found:

     - Find the causes for the core dumps and resolve
       them (patch, etc).
     - Move the core_dump_dest location to a filesystem
       not shared with database files.
     - Use the following init.ora parameters to reduce
       or avoid the core dumps:

         shadow_core_dump = partial
         background_core_dump = partial

       These core dump parameters can also be set to "none",
       but this is not recommended unless the causes for the
       core dumps have been identified.
 
 
 
II.  Detailed Diagnostics with Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 
     This section provides details of the diagnostic tools and methods used
to resolve this problem and is intended to help illustrate the use of various
diagnostics that may be applied in other situations.

 
 
1.  Diagnostics
~~~~~~~~~~~~~~~
 
     The following diagnostics will help fingerprint the problem.
 
        - alert.log and trace files in the background_dump_dest, user_dump_dest,
          and core_dump_dest directories
        - iostat or sar data (on Windows, the Performance Monitor utility with
          disk counters activated) with filesystem mapping tools or techniques
          (e.g., Veritas vxprint)
        - StatsPack or BSTAT/ESTAT reports
        - V$SESSION_WAIT
 

A. ALERT.LOG file
~~~~~~~~~~~~~~~~~

The alert.log file should always be checked in any database diagnostic effort.
In this case, the following entries were created during the time of the buffer
busy wait problem:

Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s008_2731.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [859063291] [] []

Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s014_2737.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGSEGV] [Address not mapped to object] [926430458] [] []

Wed May 16 00:41:13 2001
Errors in file /db_admin/prod/bdump/mlrep_s005_930.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [1178678525] [] []

The stack traces corresponding to the core files indicated a problem during
the login process.  It was discovered that a core dump could be caused if
users accidentally entered passwords greater than 48 characters.  Furthermore,
in this web-based application, users did not receive an error and would retry
the login, producing another core dump with each click of the mouse.


B. SAR Diagnostics
~~~~~~~~~~~~~~~~~~

SAR, IOSTAT, or similar tools are critical to diagnosing this problem because
they show the health of the I/O system during the time of the problem.  The
SAR data for the example we are looking at is shown below (shown
using "sar -d -f /var/adm/sa/sa16"):

SunOS prod1 5.6 Generic_105181-23 sun4u    05/16/01

01:00:00 device        %busy   avque   r+w/s  blks/s  avwait  avserv

         sd22            100    72.4    2100    2971     0.0    87.0
         sd22,c            0     0.0       0       0     0.0     0.0
         sd22,d            0     0.0       0       0     0.0     0.0
         sd22,e          100    72.4    2100    2971     0.0    87.0

                                 /\
                                 ||
  extremely high queue values (usually less than 2 during peak)

By mapping the sd22 device back to the device number (c3t8d0) and then back to
the logical volume through to the filesystem (using "df" and Veritas'
utility /usr/sbin/vxprint), it was determined the filesystem shared the same
controller (c3) as several database files (among them were the datafiles for
the SYSTEM tablespace).

By looking within the filesystems using the aforementioned controller (c3), 
several very large (1.8 GB) core dumps were found in the core_dump_dest 
directory, produced around the time of the problem.

 

C.  StatsPack
~~~~~~~~~~~~~

During an episode of the performance problem, it is very helpful to collect
samples using StatsPack.  The interval between samples is ideally about
15 minutes, but even 1-hour intervals are acceptable.  It is very valuable to
also collect baseline samples of the same interval during similar activity
levels when there is NOT a problem.  By having the baseline, one will be able
to see how certain statistics reflect the problem.
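
As a rough sketch of the snapshot mechanics (assuming StatsPack is already installed under the PERFSTAT schema), two snapshots taken around the problem window are enough to produce a report:

CONNECT perfstat
EXECUTE statspack.snap;
-- ... wait roughly 15 minutes while the problem (or the baseline load) runs ...
EXECUTE statspack.snap;

-- Generate the report between the two snapshot IDs just taken
@?/rdbms/admin/spreport.sql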



i.  Wait Events:  During the problem episode, "buffer busy waits" and "write
complete waits" will be seen as the highest wait events.  During the baseline
load, these events are not significant.  Other significant wait events during
the problem may be "enqueue" and "db file parallel writes".  For example:

                                                                  Avg
                                                      Total Wait  wait  Waits
Event                               Waits   Timeouts   Time (cs)  (ms)   /txn
---------------------------- ------------ ---------- ----------- ----- ------
buffer busy waits                 225,559    211,961  24,377,029  1081    4.0
enqueue                            25,731     21,756   6,786,722  2638    0.5
Parallel Query Idle Wait - S        9,980      7,929   1,762,606  1766    0.2
SQL*Net message from dblink       435,845          0   1,288,965    30    7.7
db file parallel write              4,252          0   1,287,023  3027    0.1
write complete waits                5,872      5,658     581,066   990    0.1
db file sequential read         1,249,981          0     510,793     4   22.0



ii.  Statistics:  There may be evidence of DB Writer falling behind and slow
write performance as follows:

   -  low number of DBWR Timeouts
   -  the ratio (free buffers inspected)/(free buffers requested) * 100% > 5%
   -  much higher number for "dirty buffers inspected" than normal baseline

The following lists some key statistics to look at:

Statistic                          Total   per Second    per Trans
----------------------- ---------------- ------------ ------------
consistent changes                43,523         12.1          2.4     Much
free buffer inspected              6,087          1.7          0.3 <== higher
free buffer requested            416,010        115.6         23.1     than
logons cumulative                 15,718          4.4          0.9     normal
physical writes                   24,757          6.9          1.4
write requests                       634          0.2          0.0

It's important to look at workload parameters to ensure the problem isn't due
to simply more work being done.  The statistics "consistent changes",
"logons cumulative", "physical writes", and "write requests" are all compared
to a baseline (a quick check of the inspected/requested ratio is sketched below).
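
A rough way to check the (free buffer inspected)/(free buffer requested) ratio directly is to query V$SYSSTAT. These counters are cumulative since instance startup, so two samples a few minutes apart give a better picture than a single query:

-- Rough point-in-time check of the inspected/requested ratio (> 5% is suspicious)
SELECT ROUND(100 * i.value / GREATEST(r.value, 1), 2) AS pct_inspected
FROM   v$sysstat i,
       v$sysstat r
WHERE  i.name = 'free buffer inspected'
AND    r.name = 'free buffer requested';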


iii.  Tablespace I/O Summary  

The average wait times for tablespaces will be dramatically higher.


Tablespace IO Summary for DB: PROD  Instance: PROD  Snaps:    3578 -   3579

                        Avg Read                  Total Avg Wait
Tablespace        Reads   (ms)        Writes      Waits   (ms)
----------- ----------- -------- ----------- ---------- --------
BOM            482,368      7.0      18,865      3,161    205.9    very
CONF           157,288      0.6         420        779   9897.3 <= high
CTXINDEX        36,628      0.5           7          4     12.5    very
RBS                613    605.7      23,398      8,253   7694.6 <= high
SYSTEM          18,360      3.6         286         78    745.5
DB_LOW_DATA     16,560      2.6       1,335         14     24.3


Similar statistics per datafile may be seen in the "File IO Statistics" 
section of StatsPack.


iv.  Buffer Busy Wait Statistics

Buffer wait Statistics for DB: PROD  Instance: PROD  Snaps:    3578 -   3579

                                 Tot Wait    Avg
Class                    Waits  Time (cs) Time (cs)
------------------ ----------- ---------- ---------
data block             216,577 ##########       108  <== severe contention on
undo header              5,072    609,734       120      data blks
undo block               3,770    333,876        89      ("off the scale")
free list                   70     17,426       249
segment header               8         34         4



v.  Enqueues

The StatsPack data for this case study shows many SQ (sequence) enqueue waits.
SQ enqueues are held while Oracle updates the dictionary entry of the sequence
with the next value (or starting value for the next batch of sequence numbers
to cache).  The object id of the sequence is found by looking at the p2 column
of v$session_wait for 'enqueue' events and selecting from DBA_OBJECTS using the
object_id (you must convert the value in the p2 column to decimal first).

In this case it was for sys.audses$.  This means the SYSTEM tablespace is
being impacted by the I/O problems and is taking a long time to generate the
next sequence number for "audses$".
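
A minimal sketch of that drill-down (the &object_id substitution is just the p2 value returned by the first query):

-- Which object is behind the enqueue waits right now?
SELECT sid, p1, p2 object_id
FROM   v$session_wait
WHERE  event = 'enqueue';

-- Map the object_id back to a named object
SELECT owner, object_name, object_type
FROM   dba_objects
WHERE  object_id = &object_id;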

Enqueue activity for DB: MLREP  Instance: MLREP  Snaps:    3578 -   3579

Enqueue            Gets      Waits
---------- ------------ ----------
SQ                5,130      1,345  <== Drill down v$session_wait.p2 and
TX               80,735         63      DBA_OBJECTS for object_id.  In
SR                8,207          7      this case it was for sys.audses$
TM               93,225          4
CF                1,396          1



vi.  Buffer Pool Statistics

The buffer pool statistics should be compared with the baseline.  During the
performance problem write complete waits may be 10 to 100 times longer than
during the baseline load (without the problem):

Buffer Pool Sets for DB: PROD  Instance: PROD  Snaps:    3578 -   3579

                                                        Free    Write   Buffer
Set      Buffer  Consistent    Physical    Physical   Buffer Complete     Busy
 Id        Gets        Gets       Reads      Writes    Waits    Waits    Waits
--- ----------- ----------- ----------- ----------- -------- -------- --------
  1      99,132   4,060,929      97,235       2,860        0      384  148,348
  2      97,773   3,359,172      96,851       3,185        0      221      640
  3      97,320   3,486,183      96,592       3,014        0      303    1,024
  4      96,961   1,943,505      96,366       2,723        0      232      598
  5      97,182   1,508,089      96,223       2,798        0      107    5,731
 ...                                                                       /\
                                                                           ||
                                   these are 10 times greater than baseline



D.  V$SESSION_WAIT 
~~~~~~~~~~~~~~~~~~


V$SESSION_WAIT can be used to see how many sessions are impacted by the buffer 
busy wait problem in real-time using a query as follows:

SELECT count(*), event
FROM v$session_wait
WHERE wait_time = 0

 AND event NOT IN ('smon timer','pmon timer','rdbms ipc message', 
    'SQL*Net message from client')
GROUP BY event
ORDER BY 1 DESC;

Output:

  COUNT(*) EVENT
---------- ----------------------------------------------------------------
       122 buffer busy waits
        15 enqueue
         5 db file parallel write
         3 SQL*Net message from dblink
         2 latch free
         1 SQL*Net more data to client


To see the file and block numbers in more detail:


SELECT count(*) NUM_WAITERS, p1 FILE#, p2 BLK#, p3 CLASS
FROM v$session_wait
WHERE event = 'buffer busy waits'
GROUP BY p1, p2, p3;

NUM_WAITERS    FILE#   BLK#     CLASS
------------   -----  -------  ------

92             2      13487      1016
73             2      27762      1016
32             1      29697      1016

This shows that during the execution of the above query, there were 92 waiters 
for file 2, block 13487.


To find the object name and type for the objects being waited on, use the
following query (supplying the file numbers and block numbers shown above):

SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = &file
AND &blockid BETWEEN block_id AND block_id + blocks - 1;

After querying against DBA_EXTENTS and supplying the file and block numbers,
the following correlation can be made:

NUM_WAITERS   OWNER.SEGMENT_NAME       TYPE     FILE#    BLK#      CLASS
------------  -------------------    ---------  ----   -------    ------
92            SYSTEM.DEF$_TRANORDER   INDEX     2      13487      1016
73            SYSTEM.DEF$_AQCALL      TABLE     2      27762      1016
32            SYSTEM.DEF$_TRANORDER   INDEX     1      29697      1016

Normally, one would rebuild the above segments with more freelists or freelist
groups (in 8.1.6 or higher you can add process freelists without rebuilding
the segment) to reduce contention.  However, in this case more freelists won't
help.



2.  Resolving the Problem
~~~~~~~~~~~~~~~~~~~~~~~~~~

With the above diagnostics, it was possible to conclude that the core dumps
caused disk controller saturation and impeded the database from reading and
writing blocks.  By scanning back through the alert.log file and looking for
occurrences of similar core dumps, it was possible to ascertain when the
problems had appeared.  Looking at StatsPack data for the same time period
seen in the alert.log file, the buffer busy waits were indeed the highest
wait event.

Since the customer could not change the application to limit the password
length and there were no other places to put the core_dump_dest files, the
following changes were made to the init.ora file:

 shadow_core_dump = partial
 background_core_dump = partial

These changes caused the core dumps to be created AFTER the Oracle processes
detached from the SGA.  The resulting core dumps were 24 MB on average
instead of 1.8 GB.  Later occurrences of core dumps were seen in the alert.log
and cdump directory, with no buffer busy wait side effects or performance
degradation (and no I/O impact visible in sar data); this confirmed that
the init.ora changes solved the problem.