Tuesday, January 10, 2012

SOME TIPS ABOUT ASH

Oracle collects Active Session History (ASH) statistics (mostly wait statistics for different events) for all active sessions every second from v$session and stores them in a circular buffer in the SGA. ASH records therefore cover very recent session history, typically the last 5-10 minutes. If the buffer fills up before the AWR flush (by default every hour), the MMNL process (Manageability Monitor Light, which shows up as the "Manageability Monitor light 2" process) flushes the data to disk, where it is stored in dba_hist_active_sess_history.
ASH resides in the SGA and its size is calculated as the lesser of:
  • total number of CPUs x 2MB of memory
  • 5% of the shared pool
So on a 16-CPU server with a 500MB shared pool:
  • ASH desired size: 16 x 2MB = 32MB
  • 5% of 500MB = 25MB (this is the lower value, so ASH will be 25MB)
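
You can cross-check the actual allocation in v$sgastat; the ASH buffer normally shows up in the shared pool under a name like "ASH buffers" (the exact label can vary by release, so this quick check just filters loosely on ASH):

-- entry name is typically 'ASH buffers' in the shared pool; filter loosely
SELECT pool, name, ROUND(bytes/1024/1024, 1) AS mb
FROM   v$sgastat
WHERE  name LIKE '%ASH%';
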
ASH collects the following:
  • SQL_ID
  • SID
  • Client ID, Service ID
  • Program, Module, Action
  • Object, File, Block
  • Wait event number, actual wait time (if session is waiting) 
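
As a quick illustration (my own addition, not part of the list above), the in-memory samples can be queried directly from v$active_session_history; a rough "top wait events over the last five minutes" report looks something like this:

-- one row per sampled second spent WAITING over the last 5 minutes
SELECT event, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sample_time > SYSTIMESTAMP - INTERVAL '5' MINUTE
AND    session_state = 'WAITING'
GROUP  BY event
ORDER  BY samples DESC;

Since ASH samples once per second, each row counted here represents roughly one second a session spent waiting on that event.
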
Some hidden parameters control the behaviour of ASH:

To enable or disable ASH data collection-
_ash_enable=TRUE


_ash_sampling_interval = 1000 (milliseconds)
This is where the one-second sampling interval is specified. I'm tempted to reduce this to a silly level and watch the server fall on its backside.

_ash_sample_all = FALSE

Oooh, this one would be fun! Why not sample all sessions, including those that aren't active? (Hint: you would see an enormous growth in the volume of data generated, so, again, I'm kidding.)

_ash_disk_write_enable = TRUE

Controls whether samples are flushed to the workload repository or not. Disabling it might initially seem a good idea to save space or improve performance a little, but (just a guess) I think it would confuse the hell out of ADDM when it couldn't find any ASH samples to correlate with the AWR information.

_ash_disk_filter_ratio = 10

The ratio of in-memory samples written to the workload repository; with the default of 10, only one in every ten samples is persisted to disk.
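
If you want to check the current values of these hidden parameters yourself, the usual trick is to query the X$ fixed tables as SYS. This is only a sketch; the X$ structures are undocumented and can change between releases:

-- run as SYS; X$ views are undocumented
SELECT i.ksppinm AS parameter, v.ksppstvl AS value
FROM   x$ksppi i, x$ksppcv v
WHERE  i.indx = v.indx
AND    i.ksppinm LIKE '\_ash\_%' ESCAPE '\';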

Monday, January 9, 2012

CASE STUDY ON BUFFER BUSY WAIT EVENT

PURPOSE
-------

This document discusses a rare and difficult to diagnose database performance 
problem characterized by extremely high buffer busy waits that occur at 
seemingly random times.  The problem persists even after traditional buffer 
busy wait tuning practices are followed (typically, increasing the number of 
freelists for an object).  
 
SCOPE & APPLICATION
-------------------

This document is intended for support analysts and customers.  It applies to 
both Unix and Windows-based systems, although the examples here will be 
particular to a Unix-based (Solaris) system.


In addition to addressing a specific buffer busy wait performance problem, 
in section II, this document presents various techniques to diagnose and 
resolve this problem by using detailed data from a real-world example.  The 
techniques illustrated here may be used to diagnose other I/O and performance 
problems.



RESOLVING INTENSE AND "RANDOM" BUFFER BUSY WAIT PERFORMANCE PROBLEMS
--------------------------------------------------------------------


This document is composed of two sections: a summary section that broadly 
discusses the problem and its resolution, and a detailed diagnostics section 
that shows how to collect and analyze various database and operating system 
diagnostics related to this problem.  The detailed diagnostics section is 
provided to help educate the reader with techniques that may be useful in
other situations.


I.  Summary
~~~~~~~~~~~

1.  Problem Description
~~~~~~~~~~~~~~~~~~~~~~~

At seemingly random times, without regard to overall load on the database, 
the following symptoms may be witnessed:

- Slow response times on an instance-wide level
- Long wait times for "buffer busy waits" in BSTAT/ESTAT or StatsPack reports
- Large numbers of sessions waiting on buffer busy waits for a group of 
  objects (identified in v$session_wait)
 
Some tuning effort may have been spent in identifying the segments 
involved in the buffer busy waits and rebuilding those segments with a higher 
number of freelists or freelist groups (from 8.1.6 on, one can dynamically add 
process freelists; segments only need to be rebuilt if changing freelist 
groups).  Even after adding freelists, the problem continues and is not 
diminished in any way (although regular, concurrency-based buffer busy waits 
may be reduced).
 
 
2.  Problem Diagnosis
~~~~~~~~~~~~~~~~~~~~~
 
     The problem may be diagnosed by observing the following:

        - The alert.log file shows many occurrences of ORA-600, ORA-7445 and 
          core dumps during or just before the time of the problem.

        - The core_dump_dest directory contains large core dumps during the 
          time of the problem.  There may either be many core dumps or a few 
          very large core dumps (100s of MB per core file), depending on the 
          size of the SGA.

        - sar -d shows devices that are completely saturated and have high 
          request queues and service times.  These devices and/or their 
          controllers are part of logical volumes used for database files.

        - Buffer busy waits, write complete waits, db file parallel writes and 
          enqueue waits are high (in the top 5 waits, usually in that order).  
          Note that in environments using Oracle Advanced Replication, the 
          buffer busy waits may at times appear on segments related to 
          replication (DEF$_AQCALL table, TRANORDER index, etc.).
 
 
3.  Problem Resolution
~~~~~~~~~~~~~~~~~~~~~~
 
The cause of the buffer busy waits and other related waits might be a 
saturated disk controller or subsystem impacting the database's ability to 
read or write blocks.  The disk/controller may be saturated because of many 
core dumps occurring simultaneously, each requiring hundreds of megabytes.  If 
the alert.log or core_dump_dest directory has no evidence of core dumps, then 
the source of the I/O saturation must be found.  It may be due to non-database 
processes, another database sharing the same filesystems, or a poorly tuned 
I/O subsystem.

The solution is as follows:

  1) Find the root cause of the I/O saturation (core dumps, 
     another process or database, or a poorly performing I/O 
     subsystem) and resolve it.
  OR,
  2) If evidence of core dumps is found:
     -  Find the causes of the core dumps and resolve 
        them (patch, etc.)
     -  Move the core_dump_dest location to a filesystem 
        not shared with database files.
     -  Use the following init.ora parameters to reduce 
        or avoid the core dumps:
          shadow_core_dump = partial
          background_core_dump = partial
        These core dump parameters can also be set to "none", 
        but this is not recommended unless the causes of the 
        core dumps have been identified.
 
 
 
II.  Detailed Diagnostics with Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 
     This section provides details on the diagnostic tools and methods used 
to resolve this problem and is intended to help illustrate the use of various 
diagnostics that may be applied in other situations.

 
 
1.  Diagnostics
~~~~~~~~~~~~~~~
 
     The following diagnostics will help fingerprint the problem:

        - alert.log and trace files in the background_dump_dest, user_dump_dest,
          and core_dump_dest directories 
        - iostat or sar data (on Windows, the Performance Monitor utility with 
          disk counters activated) with filesystem mapping tools or techniques 
          (e.g., Veritas vxprint)
        - StatsPack or BSTAT/ESTAT reports
        - V$SESSION_WAIT
 

A. ALERT.LOG file
~~~~~~~~~~~~~~~~~

The alert.log file should always be checked in any database diagnostic effort. 
In this case, the following entries were created during the time of the buffer 
busy wait problem:

Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s008_2731.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [859063291] [] []

Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s014_2737.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGSEGV] [Address not mapped to object] [926430458] [] []

Wed May 16 00:41:13 2001
Errors in file /db_admin/prod/bdump/mlrep_s005_930.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [1178678525] [] []

The stack traces corresponding to the core files indicated a problem during 
the login process.  It was discovered that a core dump could be caused if 
users accidentally entered passwords greater than 48 characters.  Furthermore, 
in this web-based application, users did not receive an error and would retry 
the login, producing another core dump with each click of the mouse.


B. SAR Diagnostics
~~~~~~~~~~~~~~~~~~

SAR, iostat, or similar tools are critical to diagnosing this problem because 
they show the health of the I/O system during the time of the problem.  The 
SAR data for the example we are looking at is shown below (displayed 
using "sar -d -f /var/adm/sa/sa16"):

SunOS prod1 5.6 Generic_105181-23 sun4u    05/16/01

01:00:00 device        %busy   avque   r+w/s  blks/s  avwait  avserv


         sd22            100    72.4    2100    2971     0.0    87.0
         sd22,c            0     0.0       0       0     0.0     0.0
         sd22,d            0     0.0       0       0     0.0     0.0
         sd22,e          100    72.4    2100    2971     0.0    87.0

                                 /\
                                 ||
  extremely high queue values (usually less than 2 during peak)

By mapping the sd22 device back to the device number (c3t8d0) and then back to 
the logical volume through to the filesystem (using "df" and Veritas' 
utility /usr/sbin/vxprint), it was determined the filesystem shared the same 
controller (c3) as several database files (among them the datafiles for 
the SYSTEM tablespace).  

By looking within the filesystems using the aforementioned controller (c3), 
several very large (1.8 GB) core dumps were found in the core_dump_dest 
directory, produced around the time of the problem.

 

C.  StatsPack
~~~~~~~~~~~~~

During an episode of the performance problem, it is very helpful to collect 
samples using StatsPack.  The interval between samples is ideally about 
15 minutes, but even 1-hour intervals are acceptable.  It is very valuable to 
also collect baseline samples over the same interval during similar activity 
levels when there is NOT a problem.  By having the baseline, one will be able 
to see how certain statistics reflect the problem.



i.  Wait Events:  During the problem episode, "buffer busy waits" and "write 
complete waits" will be seen as the highest wait events.  During the baseline 
load, these events are not significant.  Other significant wait events during 
the problem may be "enqueue" and "db file parallel write".  For example:

                                                                  Avg
                                                      Total Wait  wait  Waits
Event                               Waits   Timeouts   Time (cs)  (ms)   /txn
---------------------------- ------------ ---------- ----------- ----- ------
buffer busy waits                 225,559    211,961  24,377,029  1081    4.0 
enqueue                            25,731     21,756   6,786,722  2638    0.5
Parallel Query Idle Wait - S        9,980      7,929   1,762,606  1766    0.2
SQL*Net message from dblink       435,845          0   1,288,965    30    7.7
db file parallel write              4,252          0   1,287,023  3027    0.1
write complete waits                5,872      5,658     581,066   990    0.1
db file sequential read         1,249,981          0     510,793     4   22.0



ii.  Statistics:  There may be evidence of DB Writer falling behind and slow 
write performance, as follows:

   -  low number of DBWR Timeouts
   -  the ratio (free buffers inspected)/(free buffers requested) * 100% > 5%
   -  a much higher number for "dirty buffers inspected" than the normal baseline


The following lists some key statistics to look at:

Statistic                          Total   per Second    per Trans
----------------------- ---------------- ------------ ------------
consistent changes                43,523         12.1          2.4     Much
free buffer inspected              6,087          1.7          0.3 <== higher
free buffer requested            416,010        115.6         23.1     than
logons cumulative                 15,718          4.4          0.9     normal
physical writes                   24,757          6.9          1.4
write requests                       634          0.2          0.0

It's important to look at workload parameters to ensure the problem isn't due 
simply to more work being done.  The statistics "consistent changes", 
"logons cumulative", "physical writes", and "write requests" are all compared 
to the baseline.
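
As an illustration (not part of the original note), the free-buffer ratio mentioned above can be checked directly against v$sysstat; over a StatsPack interval you would of course use the deltas between snapshots, but an instance-wide, since-startup sanity check might look like this:

-- ratio of free buffers inspected to free buffers requested, as a percentage
SELECT ROUND(100 * insp.value / NULLIF(req.value, 0), 2) AS pct_inspected
FROM   v$sysstat insp, v$sysstat req
WHERE  insp.name = 'free buffer inspected'
AND    req.name  = 'free buffer requested';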


iii.  Tablespace I/O Summary  

The average wait times for tablespaces will be dramatically higher.


Tablespace IO Summary for DB: PROD  Instance: PROD  Snaps:    3578 -   3579 

                        Avg Read                  Total Avg Wait
Tablespace        Reads   (ms)        Writes      Waits   (ms)  
----------- ----------- -------- ----------- ---------- --------
BOM             482,368      7.0      18,865      3,161    205.9
CONF            157,288      0.6         420        779   9897.3 <= very high
CTXINDEX         36,628      0.5           7          4     12.5
RBS                 613    605.7      23,398      8,253   7694.6 <= very high
SYSTEM           18,360      3.6         286         78    745.5
DB_LOW_DATA      16,560      2.6       1,335         14     24.3


Similar statistics per datafile may be seen in the "File IO Statistics" 
section of StatsPack.


iv.  Buffer Busy Wait Statistics

Buffer wait Statistics for DB: PROD  Instance: PROD  Snaps:    3578 -   3579

                                                                              
                                 Tot Wait    Avg                              
Class                    Waits  Time (cs) Time (cs)                           

------------------ ----------- ---------- ---------                           
data block             216,577 ##########       108  <== severe contention on 
undo header              5,072    609,734       120      data blks                     

undo block               3,770    333,876        89      ("off the scale")                     
free list                   70     17,426       249                           
segment header               8         34         4                           
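
The same per-class breakdown can also be checked in real time (outside StatsPack) from v$waitstat; note this is cumulative since instance startup rather than for a snapshot interval:

-- buffer busy wait time broken down by block class, since startup
SELECT class, count, time
FROM   v$waitstat
ORDER  BY time DESC;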



v.  Enqueues

The StatsPack data for this case study shows many SQ (sequence) enqueue waits. 
SQ enqueues are held while Oracle updates the dictionary entry of the sequence 
with the next value (or starting value for the next batch of sequence numbers 
to cache).  The object id of the sequence is found by looking at the p2 column 
of v$session_wait for 'enqueue' events and selecting from DBA_OBJECTS using that 
object_id (you must convert the value in the p2 column to decimal first).  
In this case it was for sys.audses$.  This means the SYSTEM tablespace is 
being impacted by the I/O problems and it is taking a long time to generate the 
next sequence number for "audses$".

Enqueue activity for DB: MLREP  Instance: MLREP  Snaps:    3578 -   3579      

Enqueue            Gets      Waits                                            
---------- ------------ ----------                                            
SQ                5,130      1,345  <== Drill down v$session_wait.p2 and  
TX               80,735         63      DBA_OBJECTS for object_id.  In  
SR                8,207          7      this case it was for sys.audses$                                     
TM               93,225          4                                            
CF                1,396          1            
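
To illustrate the p2 drill-down described above, a query along the following lines can be run while the enqueue waits are occurring (a rough sketch; on these releases the generic 'enqueue' wait event carries the resource id1 in p2, which for an SQ enqueue is the object_id of the sequence):

-- map the p2 (id1) of current enqueue waiters back to the object being waited on
SELECT o.owner, o.object_name, o.object_type
FROM   dba_objects o
WHERE  o.object_id IN (SELECT w.p2
                       FROM   v$session_wait w
                       WHERE  w.event = 'enqueue');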



vi.  Buffer Pool Statistics

The buffer pool statistics should be compared with the baseline.  During the 
performance problem, write complete waits may be 10 to 100 times longer than 
during the baseline load (without the problem):

Buffer Pool Sets for DB: PROD  Instance: PROD  Snaps:    3578 -   3579      

                                                        Free    Write   Buffer
Set      Buffer  Consistent    Physical    Physical   Buffer Complete     Busy
 Id        Gets        Gets       Reads      Writes    Waits    Waits    Waits
--- ----------- ----------- ----------- ----------- -------- -------- --------
  1      99,132   4,060,929      97,235       2,860        0      384  148,348
  2      97,773   3,359,172      96,851       3,185        0      221      640
  3      97,320   3,486,183      96,592       3,014        0      303    1,024
  4      96,961   1,943,505      96,366       2,723        0      232      598
  5      97,182   1,508,089      96,223       2,798        0      107    5,731
 ...                                                           /\
                                                               ||
                              these are 10 times greater than baseline



D.  V$SESSION_WAIT 
~~~~~~~~~~~~~~~~~~


V$SESSION_WAIT can be used to see how many sessions are impacted by the buffer 
busy wait problem in real-time using a query as follows:

SELECT count(*), event
FROM v$session_wait
WHERE wait_time = 0
  AND event NOT IN ('smon timer','pmon timer','rdbms ipc message', 
      'SQL*Net message from client')
GROUP BY event
ORDER BY 1 DESC;

Output:

  COUNT(*) EVENT
---------- ----------------------------------------------------------------
       122 buffer busy waits
        15 enqueue
         5 db file parallel write
         3 SQL*Net message from dblink
         2 latch free
         1 SQL*Net more data to client


To see the file and block numbers in more detail:


SELECT count(*) NUM_WAITERS, p1 FILE#, p2 BLK#, p3 CLASS
FROM v$session_wait
WHERE event = 'buffer busy waits'
GROUP BY p1, p2, p3;

NUM_WAITERS    FILE#   BLK#     CLASS
------------   -----  -------  ------
92             2      13487      1016
73             2      27762      1016
32             1      29697      1016

This shows that during the execution of the above query, there were 92 waiters 
for file 2, block 13487.


To find the object name and type for the objects being waited on, use the 
following query (supplying the file numbers and block numbers shown above):

SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = &file
  AND &blockid BETWEEN block_id AND block_id + blocks;

After querying against DBA_EXTENTS and supplying the file and block numbers, 
the following correlation can be made: 
 
NUM_WAITERS   OWNER.SEGMENT_NAME       TYPE      FILE#    BLK#      CLASS
-----------   ---------------------    ------    -----   -------    ------
92            SYSTEM.DEF$_TRANORDER    INDEX      2       13487      1016
73            SYSTEM.DEF$_AQCALL       TABLE      2       27762      1016
32            SYSTEM.DEF$_TRANORDER    INDEX      1       29697      1016

Normally, one would rebuild the above segments with more freelists or freelist 
groups (in 8.1.6 or higher you can add process freelists without rebuilding 
the segment) to reduce contention.  However, in this case more freelists won't 
help.



2.  Resolving the Problem
~~~~~~~~~~~~~~~~~~~~~~~~~~

With the above diagnostics, it was possible to conclude that the core dumps 
caused disk controller saturation and impeded the database from reading and 
writing blocks.  By scanning back through the alert.log file and looking for 
occurrences of similar core dumps, it was possible to ascertain when the 
problems had appeared.  Looking at StatsPack data for the same time period 
seen in the alert.log file, the buffer busy waits were indeed the highest 
wait event.

Since the customer could not change the application to limit the password 
length and there were no other places to put the core_dump_dest files, the 
following changes were made to the init.ora file:

 shadow_core_dump = partial
 background_core_dump = partial

These changes caused the core dumps to be created AFTER the Oracle processes 
are detached from the SGA.  The resulting core dumps were 24 MB on average 
instead of 1.8 GB.  Later occurrences of core dumps were seen in the alert.log 
and cdump directory, with no buffer busy wait side effects or performance 
degradation (and no I/O impact visible in sar data); this confirmed that 
the init.ora changes solved the problem.

Wednesday, May 4, 2011

When the agent is unable to upload files to the grid repository

When a Grid Control agent was installed during a RAC setup, an error showed up at the end of the agent configuration assistant.




The agent was started successfully, but it was not possible to force the agent to upload its XML files to the Grid Control server; that is when the dreaded error "EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet.." showed up.



The problem was basically due to a password typing error during the configuration phase.





The Troubleshooting Procedure Outline was:



Set up the Agent Home environment

export AGENT_HOME=/u01/app/oracle/product/10.2.0/agent10g

export PATH=$AGENT_HOME/bin:$PATH



Startup the Agent

emctl start agent



Clear the Agent status

emctl clearstate agent



Reset Credentials

emctl secure agent



Retry and verify synchronization operations

emctl upload agent

emctl status agent





###

### EM Agent Troubleshooting log

###



$ export AGENT_HOME=/u01/app/oracle/product/10.2.0/agent10g

$ export PATH=$AGENT_HOME/bin:$PATH



$ # Start agent

$ emctl start agent

Oracle Enterprise Manager 10g Release 10.2.0.1.0.

Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.

Starting agent ......... started.

$

$

$ # Trying to force XML files uploading

$ emctl upload agent

Oracle Enterprise Manager 10g Release 10.2.0.1.0.

Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.

---------------------------------------------------------------

EMD upload error: uploadXMLFiles skipped :: OMS version not checked yet..





$ # Clear EM Agent status

$ emctl clearstate agent

Oracle Enterprise Manager 10g Release 10.2.0.1.0.

Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.

EMD clearstate completed successfully





$ # Configuring OMS Credentials

$ emctl secure agent

Oracle Enterprise Manager 10g Release 10.2.0.1.0.

Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.

Enter Agent Registration password : *******

Agent successfully stopped... Done.

Securing agent... Started.

Requesting an HTTPS Upload URL from the OMS... Done.

Requesting an Oracle Wallet and Agent Key from the OMS... Done.

Check if HTTPS Upload URL is accessible from the agent... Done.

Configuring Agent for HTTPS in CENTRAL_AGENT mode... Done.

EMD_URL set in /u01/app/oracle/product/10.2.0/agent10g/eg6881.us.oracle.com/sysman/config/emd.properties

Securing agent... Successful.

Agent successfully restarted... Done.





$ # Retrying operations

$ emctl status agent

Oracle Enterprise Manager 10g Release 10.2.0.1.0.

Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.

---------------------------------------------------------------

Agent Version : 10.2.0.1.0

OMS Version : 10.2.0.1.0

Protocol Version : 10.2.0.0.0

Agent Home : /u01/app/oracle/product/10.2.0/agent10g/eg6881.us.oracle.com

Agent binaries : /u01/app/oracle/product/10.2.0/agent10g

Agent Process ID : 5307

Parent Process ID : 5260

Agent URL : https://eg6881.us.oracle.com:3872/emd/main

Repository URL : https://eg6876.us.oracle.com:1159/em/upload

Started at : 2008-07-22 15:40:37

Started by user : oracle

Last Reload : 2008-07-22 15:40:37

Last successful upload : 2008-07-22 15:40:58

Total Megabytes of XML files uploaded so far : 0.79

Number of XML files pending upload : 18

Size of XML files pending upload(MB) : 1.86

Available disk space on upload filesystem : 27.54%

Last successful heartbeat to OMS : 2008-07-22 15:40:43

---------------------------------------------------------------

Agent is Running and Ready





$ emctl upload agent

Oracle Enterprise Manager 10g Release 10.2.0.1.0.

Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.

---------------------------------------------------------------

EMD upload completed successfully





$ emctl status agent

Oracle Enterprise Manager 10g Release 10.2.0.1.0.

Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.

---------------------------------------------------------------

Agent Version : 10.2.0.1.0

OMS Version : 10.2.0.1.0

Protocol Version : 10.2.0.0.0

Agent Home : /u01/app/oracle/product/10.2.0/agent10g/eg6881.us.oracle.com

Agent binaries : /u01/app/oracle/product/10.2.0/agent10g

Agent Process ID : 5307

Parent Process ID : 5260

Agent URL : https://eg6881.us.oracle.com:3872/emd/main

Repository URL : https://eg6876.us.oracle.com:1159/em/upload

Started at : 2008-07-22 15:40:37

Started by user : oracle

Last Reload : 2008-07-22 15:40:37

Last successful upload : 2008-07-22 15:41:13

Total Megabytes of XML files uploaded so far : 2.65

Number of XML files pending upload : 1

Size of XML files pending upload(MB) : 0.03

Available disk space on upload filesystem : 27.54%

Last successful heartbeat to OMS : 2008-07-22 15:40:43

---------------------------------------------------------------

Agent is Running and Ready

Wednesday, July 28, 2010

HOW TO SET-UP RAC IN 10G USING OPENFILER FOR VIRTUAL STORAGE

2- NODE RAC SETUP WITH OPENFILER
PART A - Openfiler Setup

• Install Openfiler on a machine (leave 60GB free space on the hard disk)
• Log in as the root user
• Start the iSCSI target service
* # service iscsi-target start
* # chkconfig --level 345 iscsi-target on

PART B - Configuring Storage on Openfiler

• From any client machine, open a browser and access the Openfiler console (port 446).
• Open the System tab and update the network access configuration for both nodes with netmask 255.255.255.0.
• Create a single primary partition with the partition type "Physical Volume".
• Create a volume group.
• Now create volumes in the volume group (possibly three volumes: one for OCR, one for the voting disk and one for ASM).
Note: To present multiple volumes with Openfiler we would need to use multipathing, which is quite complex; that is why here we are going with a single volume.
• Edit the properties of each volume and change access to 'Allow'.

PART C - Network Configuration on Linux Nodes

• Assign the IPs.
• Edit /etc/hosts on both nodes as given below.
Oracle RAC Node 1 - (node1)

Device  IP Address     Subnet         Gateway      Purpose
------  -------------  -------------  -----------  -------------------------------------
eth0    192.9.201.183  255.255.255.0  192.9.201.1  Connects node1 to the public network
eth1    10.0.0.1       255.255.255.0               Connects node1 (interconnect) to node2
/etc/hosts
127.0.0.1 localhost.localdomain localhost
#Public network - (eth0)
192.9.201.183 node1.oracle.com node1
192.9.201.187 node2.oracle.com node2
#Private Interconnect - (eth1)
10.0.0.1 node1-priv.oracle.com node1-priv
10.0.0.2 node2-priv.oracle.com node2-priv
#Public Virtual IP (VIP Addresses) - (eth0:1)
192.9.201.184 node1-vip.oracle.com node1-vip
192.9.201.188 node2-vip.oracle.com node2-vip
#Private Storage network for Openfiler
192.9.201.182 openfiler.oracle.com openfiler
Oracle RAC Node 2 - (node2)

Device  IP Address     Subnet         Gateway      Purpose
------  -------------  -------------  -----------  -------------------------------------
eth0    192.9.201.187  255.255.255.0  192.9.201.1  Connects node2 to the public network
eth1    10.0.0.2       255.255.255.0               Connects node2 (interconnect) to node1
/etc/hosts
127.0.0.1 localhost.localdomain localhost
#Public network - (eth0)
192.9.201.183 node1.oracle.com node1
192.9.201.187 node2.oracle.com node2
#Private Interconnect - (eth1)
10.0.0.1 node1-priv.oracle.com node1-priv
10.0.0.2 node2-priv.oracle.com node2-priv
#Public Virtual IP (VIP Addresses) - (eth0:1)
192.9.201.184 node1-vip.oracle.com node1-vip
192.9.201.188 node2-vip.oracle.com node2-vip
#Private Storage network for Openfiler
192.9.201.182 openfiler.oracle.com openfiler
Note: From each node, perform the following ping tests.
For node1:
ping node2.oracle.com
ping node2
ping node2-priv.oracle.com
ping node2-priv
ping openfiler.oracle.com
ping openfiler
For node2:
ping node1.oracle.com
ping node1
ping node1-priv.oracle.com
ping node1-priv
ping openfiler.oracle.com
ping openfiler

PART D - Node Configuration

• Set kernel parameters (/etc/sysctl.conf)
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
• Configure the /etc/security/limits.conf file
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
• Configure the /etc/pam.d/login file
session required /lib/security/pam_limits.so
• Create the user and groups on both nodes
# groupadd oinstall
# groupadd dba
# groupadd oper
# useradd -g oinstall -G dba oracle
# passwd oracle
• Create the required directories and set their ownership and permissions
# mkdir -p /u01/crs1020
# mkdir -p /u01/app/oracle
# chown -R oracle:oinstall /u01/
# chmod -R 755 /u01/
• Set the environment variables
$ vi .bash_profile
ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
ORA_CRS_HOME=/u01/crs1020; export ORA_CRS_HOME
LD_ASSUME_KERNEL=2.4.19; export LD_ASSUME_KERNEL
LANG="en_US"; export LANG
• Stage all the required software
• Configure ssh
• Run the cluster verifier
• Create raw devices
 
 
 
SSH Configuration
 
On node1:  $ ssh-keygen -t rsa
           $ ssh-keygen -t dsa
 
On node2:  $ ssh-keygen -t rsa
           $ ssh-keygen -t dsa
 
On node1:  $ cd .ssh
           $ cat *.pub >> node1
 
On node2:  $ cd .ssh
           $ cat *.pub >> node2
 
On node1:  $ scp node1 node2:/home/oracle/.ssh
On node2:  $ scp node2 node1:/home/oracle/.ssh
 
On node1:  $ cat node* >> authorized_keys
On node2:  $ cat node* >> authorized_keys
 
Now test the ssh configuration.
 
 
To run the cluster verifier
 
On node1:  $ cd /…/stage…/cluster…/cluvfy
           $ ./runcluvfy.sh stage -pre crsinst -n node1,node2
 
It should come back clean.
 
PART E - iSCSI Volume Configuration on Nodes
 
Now on both nodes:
1)  Open /etc/iscsi.conf
       ……………
       DiscoveryAddress=<Openfiler hostname>
 
2)  Start the iSCSI service
       # service iscsi start
       # chkconfig --level 345 iscsi on
3)  Run iscsi-ls to see which device is mapped to the disk
4)  Now prepare a plan for creating partitions. An example is given below
 
 
SL NO   Partition Name   Raw Device Name   For What?   Approx Size
-----   --------------   ---------------   ---------   -----------
1       /dev/sdb5        raw5              OCR         1g
2       /dev/sdb6        raw6              Vote        1g
3       /dev/sdb7        raw7              ASM         8g
4       /dev/sdb8        raw8              ASM         8g
5       /dev/sdb9        raw9              ASM         8g
6       /dev/sdb10       raw10             ASM         8g
7       /dev/sdb11       raw11             ASM         8g
8       /dev/sdb12       raw12             ASM         8g
5. Now you need to create the 8 partitions as per the above table (abbreviated fdisk dialogue shown):
#fdisk /dev/sdb
……
n
e (extended)
Partition number: 1
First cylinder: <Enter>
Last cylinder: <Enter>
p
n
l (logical)
First cylinder: <Enter>
Last cylinder: +1024M
……
(repeat for the remaining logical partitions as per the table above)
6. Note the /dev/sdb* names.
7. #partprobe
8. Log in as the root user on node2 and run partprobe.
On node1, log in as the root user and create the following raw devices:
# raw /dev/raw/raw5 /dev/sdb5
# raw /dev/raw/raw6 /dev/sdb6
……………………………….
……………………………….
# raw /dev/raw/raw12 /dev/sdb12
Run ls -l /dev/sdb* and ls -l /dev/raw/raw* to confirm the above.
Repeat the same thing on node2.
On node1 as the root user:
# vi /etc/sysconfig/rawdevices
/dev/raw/raw5 /dev/sdb5
/dev/raw/raw6 /dev/sdb6
…………………………
…………………………
/dev/raw/raw12 /dev/sdb12
Restart the raw service (# service rawdevices restart).
Repeat the same thing on node2 also.
# chown -R oracle:oinstall /dev/raw/raw*
# chmod -R 755 /dev/raw/raw*
On both nodes:
# vi /etc/rc.local
chown -R oracle:oinstall /dev/raw/raw*
chmod -R 755 /dev/raw/raw*

PART F - Clusterware Installation on Nodes

$ cd …/stage/clusterware
$ ./runInstaller
Use /dev/raw/raw5 for the OCR disk
Use /dev/raw/raw6 for the voting disk
Finish the clusterware installation
PART G - RDBMS Installation for ASM
Install the 10gR2 database software for ASM in one Oracle home.
PART H - RDBMS Installation for the Database
Install the 10gR2 database software again in a different Oracle home and then create a RAC database.


If you have any queries regarding this setup, please send me an email at the following address:
rajeev108.cool@gmail.com
Wish you all the best...

DATA GUARD SETUP

Follow these steps to configure Data Guard with a broker configuration.

Assumption: Suppose we have two machines, machine1 and machine2, with the hostnames machine1.oracle.com and machine2.oracle.com respectively. To begin with, machine1 will be the primary machine and machine2 will be the standby machine.
On machine1 we have a primary database 'dg01' and we will configure a physical standby database dg02 on machine2.

Configuration on primary site:-

Step 1) Configure the database in archivelog mode and enable force logging.

SHUTDOWN IMMEDIATE;

STARTUP MOUNT;

ALTER DATABASE ARCHIVELOG;

ALTER DATABASE FORCE LOGGING;

ALTER DATABASE OPEN;
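
A quick check (my addition, not part of the original steps) to confirm the result of step 1:

-- should return ARCHIVELOG and YES
SELECT log_mode, force_logging FROM v$database;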

Step 2) Configure standby redo log files to enable real-time apply and fast failover. (Assume we have three online redo log groups.)

ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 '/u01/app/oracle/oradata/dg01/sredo04.log' SIZE 50M;

ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 '/u01/app/oracle/oradata/dg01/sredo05.log' SIZE 50M;

ALTER DATABASE ADD STANDBY LOGFILE GROUP 6 '/u01/app/oracle/oradata/dg01/sredo06.log' SIZE 50M;

ALTER DATABASE ADD STANDBY LOGFILE GROUP 7 '/u01/app/oracle/oradata/dg01/sredo07.log' SIZE 50M;

Note: The number of standby redo log groups must be at least one more than the number of online redo log groups, and the size of the standby log files must be the same as the size of the online redo log files.
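
To verify the standby redo logs just added (again, a simple check rather than part of the main steps), compare v$log with v$standby_log; the standby groups should outnumber the online groups and match their size:

-- list online and standby redo log groups side by side
SELECT 'ONLINE' AS type, group#, bytes/1024/1024 AS mb FROM v$log
UNION ALL
SELECT 'STANDBY', group#, bytes/1024/1024 FROM v$standby_log
ORDER BY 1, 2;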

Step 3) CREATE PFILE FROM SPFILE.

Now configure the following parameters for the primary database:
LOG_ARCHIVE_CONFIG='DG_CONFIG=(dg01,dg02)'

DB_UNIQUE_NAME=dg01

SERVICE_NAMES=dg01

DB_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg02','/u01/app/oracle/oradata/dg01'

LOG_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg02','/u01/app/oracle/oradata/dg01'

LOG_ARCHIVE_DEST_1='location=/u01/app/oracle/oradata/dg01/archive VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=DG01'

LOG_ARCHIVE_DEST_2='service=dg02 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=DG02'

LOG_ARCHIVE_DEST_STATE_1=enable

LOG_ARCHIVE_DEST_STATE_2=enable

STANDBY_FILE_MANAGEMENT=auto

FAL_CLIENT=dg01

FAL_SERVER=dg02

LOG_ARCHIVE_FORMAT=%t_%s_%r.arc

Step 4) Take a full database backup to a pre-configured staging directory.

mkdir /home/oracle/rman (staging directory for the RMAN backup)

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;

RMAN> BACKUP FORMAT '/home/oracle/rman/%U' DATABASE;

RMAN> BACKUP FORMAT '/home/oracle/rman/%U' CURRENT CONTROLFILE FOR STANDBY;

RMAN> SQL "ALTER SYSTEM ARCHIVE LOG CURRENT";

Step 5) Copy the following items from machine1 to machine2.

PASSWORD FILE

PARAMETER FILE

RMAN BACKUP TAKEN ABOVE

Note: The RMAN backup must be copied to the same directory on machine2, i.e. /home/oracle/rman.

Step 6) Configure net service for dg01 (primary database) as well as for dg02 (standby database).

DG01 =

(DESCRIPTION =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = machine1.oracle.com)(PORT = 1521))

)

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dg01)

(INSTANCE_NAME = dg01)

)

)


DG02 =

(DESCRIPTION =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = machine2.oracle.com)(PORT = 1521))

)

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dg02)

(INSTANCE_NAME = dg02)

)

)


Configuration on standby site:-

Step 7) Rename the parameter file and password file to initdg02.ora and orapwdg02 respectively. Now make the following parameter changes for the standby database:
LOG_ARCHIVE_CONFIG='DG_CONFIG=(dg01,dg02)'

DB_UNIQUE_NAME=dg02



SERVICE_NAMES=dg02

DB_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg01','/u01/app/oracle/oradata/dg02'

LOG_FILE_NAME_CONVERT='/u01/app/oracle/oradata/dg01','/u01/app/oracle/oradata/dg02'

LOG_ARCHIVE_DEST_1='location=/u01/app/oracle/oradata/dg02/archive VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=DG02'

LOG_ARCHIVE_DEST_2='service=dg01 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=DG01'

LOG_ARCHIVE_DEST_STATE_1=enable

LOG_ARCHIVE_DEST_STATE_2=enable

STANDBY_FILE_MANAGEMENT=auto

FAL_CLIENT=dg02

FAL_SERVER=dg01

LOG_ARCHIVE_FORMAT=%t_%s_%r.arc

Step 8) Create the required directories for the standby database.

cd /u01/app/oracle/admin

mkdir dg02

cd dg02

mkdir bdump udump adump cdump create pfile scripts

mkdir -p /u01/app/oracle/oradata/dg02/archive

Step 9) Configure net service for dg01 (primary) as well as dg02 (standby) database.

DG01 =

(DESCRIPTION =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = machine1.oracle.com)(PORT = 1521))

)

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dg01)

(INSTANCE_NAME = dg01)

)

)


DG02 =

(DESCRIPTION =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = machine2.oracle.com)(PORT = 1521))

)

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dg02)

(INSTANCE_NAME = dg02)

)

)

Step 10) Start dg02 at the nomount stage.

export ORACLE_SID=dg02
sqlplus / as sysdba
CREATE SPFILE FROM PFILE;

STARTUP NOMOUNT

EXIT

Step 11) Now create the standby database using the RMAN DUPLICATE command.

$ rman target sys/oracle@dg01

RMAN> CONNECT AUXILIARY SYS/ORACLE@DG02

RMAN> DUPLICATE TARGET DATABASE FOR STANDBY DORECOVER;

Note: Execute the above commands on machine1.
Above are the simple steps to configure a physical standby database for your primary database.
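Before moving on to the broker, it is worth confirming that redo is actually being shipped and applied (assuming managed recovery is running on dg02, which the broker will also take care of once enabled). A minimal check, my addition to the steps above, is to archive a log on the primary and then compare sequences on the standby:

-- On the primary (dg01):
ALTER SYSTEM ARCHIVE LOG CURRENT;

-- On the standby (dg02): the two values should track each other closely
SELECT MAX(sequence#) AS last_received FROM v$archived_log;
SELECT MAX(sequence#) AS last_applied  FROM v$archived_log WHERE applied = 'YES';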
As you know, the Data Guard broker makes Data Guard management easy and centralized. The following steps tell you how to configure the Data Guard broker using the command-line utility DGMGRL.
Step 1) Statically register a service named <db_unique_name>_DGMGRL with the listener for the primary as well as the standby database, on the primary and standby servers respectively.
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = dg01_DGMGRL)
(ORACLE_HOME = /u01/app/oracle/oracle/product/10.2.0/db_1)
(SID_NAME = dg01)
)
(SID_DESC =
(GLOBAL_DBNAME = dg02_DGMGRL)
(ORACLE_HOME = /u01/app/oracle/oracle/product/10.2.0/asm)
(SID_NAME = dg02)
)
)
Note: Reload the listener after modifying this setting (lsnrctl reload).

Step 2) Set the parameter DG_BROKER_START=TRUE on the primary as well as the standby side.
ALTER SYSTEM SET DG_BROKER_START=TRUE SCOPE=BOTH;








Step 3) Invoke DGMGRL and create the configuration
$ dgmgrl
DGMGRL> connect sys/oracle@dg01
DGMGRL> CREATE CONFIGURATION DGCONFIG1 AS PRIMARY DATABASE IS DG01 CONNECT IDENTIFIER IS 'DG01';
DGMGRL> ADD DATABASE DG02 AS CONNECT IDENTIFIER IS 'DG02' MAINTAINED AS PHYSICAL;
Step 4) Enable the configuration and check its status
DGMGRL> SHOW CONFIGURATION
DGMGRL> ENABLE CONFIGURATION


DGMGRL> SHOW CONFIGURATION


I JUST HOPE YOU WILL FIND SOME VALUABLE INFORMATION WHICH MAY ENHANCE YOUR KNOWLEDGE.
WISH YOU ALL THE BEST...