PURPOSE
-------
This document discusses a rare and difficult to diagnose database performance
problem characterized by extremely high buffer busy waits that occur at
seemingly random times. The problem persists even after traditional buffer
busy wait tuning practices are followed (typically, increasing the number of
freelists for an object).
SCOPE & APPLICATION
-------------------
This document is intended for support analysts and customers. It applies to
both Unix and Windows-based systems, although the examples here will be
particular to a Unix-based (Solaris) system.
In addition to addressing a specific buffer busy wait performance problem,
in section II, this document presents various techniques to diagnose and
resolve this problem by using detailed data from a real-world example. The
techniques illustrated here may be used to diagnose other I/O and performance
problems.
RESOLVING INTENSE AND "RANDOM" BUFFER BUSY WAIT PERFORMANCE PROBLEMS
--------------------------------------------------------------------
This document is composed of two sections; a summary section that broadly
discusses the problem and its resolution, and a detailed diagnostics section
that shows how to collect and analyze various database and operating system
diagnostics related to this problem. The detailed diagnostics section is
provided to help educate the reader with techniques that may be useful in
other situations.
I. Summary
~~~~~~~~~~~
1. Problem Description
~~~~~~~~~~~~~~~~~~~~~~~
At seemingly random times without regard to overall load on the database,
the following symptoms may be witnessed:
- Slow response times on an instance-wide level
- long wait times for "buffer busy waits" in Bstat/Estat or Statpack reports
- large numbers of sessions waiting on buffer busy waits for a group of
objects (identified in v$session_wait)
Some tuning effort may have been spent in identifying the segments
involved in the buffer busy waits and rebuilding those segments with a higher
number of freelists or freelist groups (from 8.1.6 on one can dynamically add
process freelists; segments only need to be rebuilt if changing freelist
groups). Even after adding freelists, the problem continues and is not
diminished in any way (although regular, concurrency-based buffer busy waits
may be reduced).
2. Problem Diagnosis
~~~~~~~~~~~~~~~~~~~~~
The problem may be diagnosed by observing the following:
- The alert.log file shows many occurrences of ORA-600, ORA-7445 and
core dumps during or just before the time of the problem.
- The core_dump_dest directory contains large core dumps during the
time of the problem. There may either be many core dumps or a few
very large core dumps (100s of MB per core file), depending on the
size of the SGA.
- sar -d shows devices that are completely saturated and have high
request queues and service times. These devices and/or their
controllers are part of logical volumes used for database files.
- Buffer busy waits, write complete waits, db file parallel writes and
enqueue waits are high (in top 5 waits, usually in that order).
Note that in environments using Oracle Advanced Replication, the
buffer busy waits may at times appear on segments related to
replication (DEF$_AQCALL table, TRANORDER index, etc...).
3. Problem Resolution
~~~~~~~~~~~~~~~~~~~~~~
The cause for the buffer busy waits and other related waits might be a
saturated disk controller or subsystem impacting the database's ability to read
or write blocks. The disk/controller may be saturated because of the many
core dumps occurring simultaneously requiring hundreds of megabytes each. If
the alert.log or core_dump_dest directory has no evidence of core dumps, then
the source of the I/O saturation must be found. It may be due to non-database
processes, another database sharing the same filesystems, or a poorly tuned
I/O subsystem.
The solution is as follows:
1) Find the root cause for the I/O saturation (core dumps,
another process or database, or poorly performing I/O
subsystem) and resolve it.
OR,
2) If evidence of core dumps are found:
- Find the causes for the core dumps and resolve
them (patch, etc)
- Move the core_dump_dest location to a filesystem
not shared with database files.
- Use the following init.ora parameters to reduce
or avoid the core dumps:
shadow_core_dump = partial
background_core_dump = partial
These core dump parameters can also be set to "none"
but this is not recommended unless the causes for the
core dumps have been identified.
II. Detailed Diagnostics with Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section provides details into the diagnostic tools and methods used
to resolve this problem and are intended to help illustrate the use of various
diagnostics that may be applied in other situations.
1. Diagnostics
~~~~~~~~~~~~~~~
The following diagnostics will help fingerprint the problem.
- alert.log and trace files in the background_dump_dest, user_dump_dest,
and core_dump_dest directories
- iostat or sar data (on Windows, the Performance Monitor utility with
disk counters activated) with filesystem mapping tools or techniques
(e.g., Veritas vxprint)
- StatsPack or BSTAT/ESTAT reports
- V$SESSION_WAIT
A. ALERT.LOG file
~~~~~~~~~~~~~~~~~
The alert.log file should always be checked in any database diagnostic effort.
In this case, the following entries were created during the time of the buffer
busy wait problem:
Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s008_2731.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [859063291] [] []
Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s014_2737.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGSEGV] [Address not mapped to object] [926430458] [] []
Wed May 16 00:41:13 2001
Errors in file /db_admin/prod/bdump/mlrep_s005_930.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [1178678525] [] []
The stack traces corresponding to the core files indicated a problem during
the login process. It was discovered that a core dump could be caused if
users accidentally entered passwords greater than 48 characters. Furthermore,
in this web-based application, users did not receive an error and would retry
the login, producing another core dump with each click of the mouse.
B. SAR Diagnostics
~~~~~~~~~~~~~~~~~~
SAR, IOSTAT, or similar tools are critical to diagnosing this problem because
they show the health of the I/O system during the time of the problem. The
SAR data for the example we are looking at is shown below (shown
using "sar -d -f /var/adm/sa/sa16"):
SunOS prod1 5.6 Generic_105181-23 sun4u 05/16/01
01:00:00 device %busy avque r+w/s blks/s avwait avserv
sd22 100 72.4 2100 2971 0.0 87.0
sd22,c 0 0.0 0 0 0.0 0.0
sd22,d 0 0.0 0 0 0.0 0.0
sd22,e 100 72.4 2100 2971 0.0 87.0
/\
||
extremely high queue values (usually less than 2 during peak)
By mapping the sd22 device back to the device number (c3t8d0) and then back to
the logical volume through to the filesystem (using "df" and Veritas'
utility /usr/sbin/vxprint), it was determined the filesystem shared the same
controller (c3) as several database files (among them were the datafiles for
the SYSTEM tablespace).
By looking within the filesystems using the aforementioned controller (c3),
several very large (1.8 GB) core dumps were found in the core_dump_dest
directory, produced around the time of the problem.
C. StatsPack
~~~~~~~~~~~~~
During an episode of the performance problem, it is very helpful to collect
samples using StatsPack. The interval between samples is ideally about
15 minutes, but even 1-hour intervals are acceptable. It is very valuable to
also collect baseline samples of the same interval during similar activity
levels when there is NOT a problem. By having the baseline, one will be able
to see how certain statistics reflect the problem.
i. Waits Events: During the problem episode, "buffer busy waits" and "write
complete waits" will be seen as the highest wait events. During the baseline
load, these events are not significant. Other significant wait events during
the problem may be "enqueue" and "db file parallel writes". For example:
Avg
Total Wait wait Waits
Event Waits Timeouts Time (cs) (ms) /txn
---------------------------- ------------ ---------- ----------- ----- ------
buffer busy waits 225,559 211,961 24,377,029 1081 4.0
enqueue 25,731 21,756 6,786,722 2638 0.5
Parallel Query Idle Wait - S 9,980 7,929 1,762,606 1766 0.2
SQL*Net message from dblink 435,845 0 1,288,965 30 7.7
db file parallel write 4,252 0 1,287,023 3027 0.1
write complete waits 5,872 5,658 581,066 990 0.1
db file sequential read 1,249,981 0 510,793 4 22.0
ii. Statistics: There may be evidence of DB Writer falling behind and slow
write performance as follows:
- low number of DBWR Timeouts
- the ratio (free buffers inspected)/(free buffers requested) * 100% > 5%
- much higher number for "dirty buffers inspected" than normal baseline
The following lists some key statistics to look at:
Statistic Total per Second per Trans
----------------------- ---------------- ------------ ------------
consistent changes 43,523 12.1 2.4 Much
free buffer inspected 6,087 1.7 0.3 <== higher
free buffer requested 416,010 115.6 23.1 than
logons cumulative 15,718 4.4 0.9 normal
physical writes 24,757 6.9 1.4
write requests 634 0.2 0.0
It's important to look at workload parameters to ensure the problem isn't due
to simply more work being done. The statistic "consistent changes",
"logons cumulative", "physical writes", and "write requests" are all compared
to a baseline.
iii. Tablespace I/O Summary
The average wait times for tablespaces will be dramatically higher.
Tablespace IO Summary for DB: PROD Instance: PROD Snaps: 3578 - 3579
Avg Read Total Avg Wait
Tablespace Reads (ms) Writes Waits (ms)
----------- ----------- -------- ----------- ---------- --------
BOM 482,368 7.0 18,865 3,161 205.9 very
CONF 157,288 0.6 420 779 9897.3 <= high
CTXINDEX 36,628 0.5 7 4 12.5 very
RBS 613 605.7 23,398 8,253 7694.6 <= high
SYSTEM 18,360 3.6 286 78 745.5
DB_LOW_DATA 16,560 2.6 1,335 14 24.3
Similar statistics per datafile may be seen in the "File IO Statistics"
section of StatsPack.
iv. Buffer Busy Wait Statistics
Buffer wait Statistics for DB: PROD Instance: PROD Snaps: 3578 - 3579
Tot Wait Avg
Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
data block 216,577 ########## 108 <== severe contention on
undo header 5,072 609,734 120 data blks
undo block 3,770 333,876 89 ("off the scale")
free list 70 17,426 249
segment header 8 34 4
v. Enqueues
The StatsPack data for this case study shows many SQ (sequence) enqueue waits.
SQ enqueues are held while Oracle updates the dictionary entry of the sequence
with the next value (or starting value for the next batch of sequence numbers
to cache). The object id of the sequence is found by looking at the p2 column
of v$session_wait for 'enqueue' events and selecting from DBA_OBJECTS using the
object_id (you must convert the value in the p2 column to decimal first).
In this case it was for sys.sessaud$. This means the SYSTEM tablespace is
being impacted by the I/O problems and is taking a long time to generate the
next sequence number for "sessaud$".
Enqueue activity for DB: MLREP Instance: MLREP Snaps: 3578 - 3579
Enqueue Gets Waits
---------- ------------ ----------
SQ 5,130 1,345 <== Drill down v$session_wait.p2 and
TX 80,735 63 DBA_OBJECTSfor object_id. In
SR 8,207 7 this case it was for sys.audses$
TM 93,225 4
CF 1,396 1
vi. Buffer Pool Statistics
The buffer pool statistics should be compared with the baseline. During the
performance problem write complete waits may be 10 to 100 times longer than
during the baseline load (without the problem):
Buffer Pool Sets for DB: PROD Instance: PROD Snaps: 3578 - 3579
Free Write Buffer
Set Buffer Consistent Physical Physical Buffer Complete Busy
Id Gets Gets Reads Writes Waits Waits Waits
--- ----------- ----------- ----------- ----------- -------- -------- --------
1 99,132 4,060,929 97,235 2,860 0 384 148,348
2 97,773 3,359,172 96,851 3,185 0 221 640
3 97,320 3,486,183 96,592 3,014 0 303 1,024
4 96,961 1,943,505 96,366 2,723 0 232 598
5 97,182 1,508,089 96,223 2,798 0 107 5,731
... /\
||
these are 10 times greater than baseline
D. V$SESSION_WAIT
~~~~~~~~~~~~~~~~~~
V$SESSION_WAIT can be used to see how many sessions are impacted by the buffer
busy wait problem in real-time using a query as follows:
SELECT count(*), event
FROM v$session_wait
WHERE wait_time = 0
AND event NOT IN ('smon timer','pmon timer','rdbms ipc message',
'SQL*Net message from client')
GROUP BY event
ORDER BY 1 DESC;
Output:
COUNT(*) EVENT
---------- ----------------------------------------------------------------
122 buffer busy waits
15 enqueue
15 enqueue
5 db file parallel write
3 SQL*Net message from dblink
2 latch free
1 SQL*Net more data to client
To see the file and block numbers in more detail:
SELECT count(*) NUM_WAITERS, p1 FILE#, p2 BLK#, p3 CLASS
FROM v$session_wait
WHERE event = 'buffer busy waits'
GROUP BY p1, p2, p3
NUM_WAITERS FILE# BLK# CLASS
------------ ----- ------- ------
92 2 13487 1016
73 2 27762 1016
32 1 29697 1016
This shows that during the execution of the above query, there were 92 waiters
for file 2, block 13487.
To find the object name and type for the objects being wait on, use the
following query (supplying the file numbers and block numbers shown above):
SELECT owner,segment_name,segment_type
FROM dba_extents
WHERE file_id=&file
AND &blockid BETWEEN block_id AND block_id + blocks
After querying against DBA_EXTENTS and supplying the file and block numbers,
the following correlation can be made:
NUM_WAITERS OWNER.SEGMENT_NAME TYPE FILE# BLK# CLASS
------------ ------------------- --------- ---- ------- ------
92 SYSTEM.DEF$_TRANORDER INDEX 2 13487 1016
73 SYSTEM.DEF$_AQCALL TABLE 2 27762 1016
32 SYSTEM.DEF$_TRANORDER INDEX 1 29697 1016
Normally, one would rebuild the above segments with more freelists or freelists
groups (in 8.1.6 or higher you can add process freelists without rebuilding
the segment) to reduce contention. However, in this case more freelists won't
help.
2. Resolving the Problem
~~~~~~~~~~~~~~~~~~~~~~~~~~
With the above diagnostics, it was possible to conclude that the core dumps
caused disk controller saturation and impeded the database from reading and
writing blocks. By scanning back through the alert.log file and looking for
occurrences of similar core dumps, it was possible to ascertain when the
problems had appeared. Looking at StatsPack data for the same time period
seen in the alert.log file, the buffer busy waits were indeed the highest
wait event.
Since the customer could not change the application to limit the password
length and there were no other places to put the core_dump_dest files, the
following changes were made to the init.ora file:
shadow_core_dump = partial
background_core_dump = partial
These changes caused the core dumps to be created AFTER the oracle processes
are detached from the SGA. The resulting core dumps were 24 MB on average
instead of 1.8 GB. Later occurrences of core dumps were seen in the alert.log
and cdump directory, with no buffer busy wait side effects or performance
degradation (and no I/O impact visible in sar data); this confirmed that
the init.ora changes solved the problem.