Showing posts with label RAC(Real Applications Cluster). Show all posts

Wednesday, February 01, 2012

Some more about RAC in detail:

Cluster commands in 10g and 11g

1. CRSCTL: The Cluster Control utility performs various administrative operations on Oracle Clusterware. It is located in $ORA_CRS_HOME/bin. Commands that change the state of the stack (start, stop, enable, disable, set) must be run as the “root” user.

a. To check the current state of all Oracle Clusterware daemons:

[root@PROD1 bin]# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

b. You can also check the state of an individual Oracle Clusterware daemon:

[root@PROD1 bin]# ./crsctl check cssd

CSS appears healthy

[root@PROD1 bin]# ./crsctl check crsd

CRS appears healthy

[root@PROD1 bin]# ./crsctl check evmd
EVM appears healthy

c. To start Oracle Clusterware:

[root@PROD1 bin]# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

d. To stop Oracle Clusterware:

[root@PROD1 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

e. To disable Oracle Clusterware:

[root@PROD1 bin]# ./crsctl disable crs

f. To enable Oracle Clusterware:

[root@PROD1 bin]# ./crsctl enable crs

g. To get the current value of a CSS parameter:

[root@PROD1 bin]# ./crsctl get css <parameter>

For example, to get the value of the misscount parameter:

[root@PROD1 bin]# ./crsctl get css misscount
60

h. To set a new value for a CSS parameter:

[root@PROD1 bin]# ./crsctl set css <parameter> <value>

For example, to set the value of the misscount parameter:

[root@PROD1 bin]# ./crsctl set css misscount 120
Configuration parameter misscount is now set to 120.

i. To unset the value of a CSS parameter:

[root@PROD1 bin]# ./crsctl unset css <parameter>

For example, to unset the value of the misscount parameter:

[root@PROD1 bin]# ./crsctl unset css misscount
Configuration parameter misscount is now undefined.

j. To list the modules available for debugging in CSS:

[root@PROD2 bin]# ./crsctl lsmodules css
The following are the CSS modules ::
CSSD
COMMCRS
COMMNS
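
The daemon checks in steps (a) and (b) lend themselves to scripting. The following is a minimal sketch, assuming the 10g-style "appears healthy" messages shown above; `crs_all_healthy` is a hypothetical helper name, and other releases word these messages differently, so the patterns would need adjusting.

```shell
# crs_all_healthy (hypothetical helper): reads `crsctl check crs` output on
# stdin and succeeds only if CSS, CRS and EVM each report "appears healthy".
crs_all_healthy() {
    awk '
        /^CSS appears healthy/ { css = 1 }
        /^CRS appears healthy/ { crs = 1 }
        /^EVM appears healthy/ { evm = 1 }
        END { exit !(css && crs && evm) }   # exit 0 only when all three were seen
    '
}
```

A possible use from cron or a monitoring wrapper: `./crsctl check crs | crs_all_healthy || echo "clusterware problem on $(hostname)"`.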

2. CRS_STAT: Reports the current state of the resources configured in the OCR.

[oracle@rac1 bin]$ ./crs_stat -t
Name            Type          Target   State    Host
------------------------------------------------------
ora….C1.inst    application   ONLINE   ONLINE   PROD1
ora….C2.inst    application   ONLINE   ONLINE   PROD2
ora….AC1.srv    application   ONLINE   ONLINE   PROD1
ora.RAC.abc.cs  application   ONLINE   ONLINE   PROD1
ora.RAC.db      application   ONLINE   ONLINE   PROD2
ora….AC1.srv    application   ONLINE   ONLINE   PROD1
ora….ice2.cs    application   ONLINE   ONLINE   PROD1
ora….AC1.srv    application   ONLINE   ONLINE   PROD1
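
In a listing like the one above, the rows that matter are those where State differs from Target. A small sketch that picks them out, assuming the five-column `crs_stat -t` layout shown here with a hyphen separator line; `crs_stat_mismatches` is a hypothetical helper name, not an Oracle tool.

```shell
# crs_stat_mismatches (hypothetical helper): reads `crs_stat -t` output on stdin
# and prints the name of every resource whose State differs from its Target.
crs_stat_mismatches() {
    awk '
        $1 == "Name" { next }            # skip the column header
        $1 ~ /^-+$/  { next }            # skip the dashed separator line
        NF >= 4 && $3 != $4 { print $1 } # Target is field 3, State is field 4
    '
}
```

For example, `./crs_stat -t | crs_stat_mismatches` prints nothing when every resource is in its target state.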


3. OCRDUMP : It dumps the contents of OCR into a text file.

[root@PROD1 bin]# ./ocrdump /home/oracle/ocr.dmp

4. OCRCHECK : It verifies the integrity of the OCR.

[root@PROD2 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 5237072
Used space (kbytes) : 9360
Available space (kbytes) : 5227712
ID : 794527192
Device/File Name : /u01/oracle/oradata/ocr
Device/File integrity check succeeded

Cluster registry integrity check succeeded
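
The space figures in this report can be turned into a usage percentage for monitoring. A minimal sketch, assuming the "Total space (kbytes)" and "Used space (kbytes)" labels appear exactly as in the output above; `ocr_used_pct` is a hypothetical helper name.

```shell
# ocr_used_pct (hypothetical helper): reads `ocrcheck` output on stdin and
# prints used space as a percentage of total space, to two decimals.
ocr_used_pct() {
    awk -F: '
        /Total space \(kbytes\)/ { total = $2 }
        /Used space \(kbytes\)/  { used  = $2 }
        END {
            if (total > 0) printf "%.2f\n", 100 * used / total
            else exit 1                  # labels not found in the input
        }
    '
}
```

With the figures shown above (9360 of 5237072 kbytes), `./ocrcheck | ocr_used_pct` would print 0.18.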

5. OCRCONFIG: Performs various administrative operations on the OCR.

Interconnect setup in RAC

The interconnect is a very important part of the cluster environment; it is one of the main arteries of the cluster. The interconnect is the physical layer between the cluster nodes: it carries the heartbeats, and cache fusion uses it as well. The interconnect must be a private connection. Crossover cables are not supported.

In day-to-day operation, a correctly configured interconnect has proven not to be the bottleneck when performance issues arise. The rest of this article focuses on how to validate that the interconnect is really being used. A DBA must be able to validate the interconnect settings in case of performance problems. The physical attachment of the interconnect is out of scope.

Although you should treat performance issues in a cluster environment the way you would in a non-cluster environment, here are some areas you can focus on. Normally the average interconnect latency over gigabit Ethernet should be below 5 ms. Latencies around 2 ms are normal.
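
The latency figure above is an average, so it has to be derived from cumulative counters. Here is a minimal sketch of the arithmetic, assuming you have sampled a cumulative block receive time in centiseconds and the matching count of blocks received (for example the "gc cr block receive time" and "gc cr blocks received" statistics in gv$sysstat on 10g; check the names and units on your release). The helper name and the sample numbers are made up for illustration.

```shell
# avg_latency_ms (hypothetical helper): $1 = cumulative receive time in
# centiseconds, $2 = number of blocks received. Prints the average in ms.
avg_latency_ms() {
    awk -v cs="$1" -v blocks="$2" 'BEGIN {
        if (blocks <= 0) exit 1           # avoid dividing by zero
        printf "%.2f\n", cs * 10 / blocks # 1 centisecond = 10 milliseconds
    }'
}

avg_latency_ms 1200 6000   # illustrative numbers only; prints 2.00
```

A result well above the 5 ms guideline would be a reason to look at the interconnect configuration before anything else.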

10g and 11g
set linesize 120
col name for a22
col ip_address for a15

select inst_id,name,ip_address,is_public from gv$configured_interconnects order by 1,2;


   INST_ID NAME                   IP_ADDRESS      IS_PUBLIC
---------- ---------------------- --------------- ---------
         1 en11                   145.72.220.10   YES
         1 en12                   145.72.220.83   NO
         2 en11                   145.72.220.20   YES
         2 en12                   145.72.220.84   NO


Which interfaces are used for the interconnect?
The previous query lists the available interfaces, but which interface is actually used for cache fusion? To make sure the interconnect is using the correct interface:
set linesize 120
col name for a22
col ip_address for a15
select inst_id,name,ip_address,is_public from gv$cluster_interconnects order by 1,2;

   INST_ID NAME                   IP_ADDRESS      IS_PUBLIC
---------- ---------------------- --------------- ---------
         1 en12                   145.72.220.83   NO
         2 en12                   145.72.220.84   NO


9i, 10g and 11g: oradebug
Before the above queries were available (10g and 11g), you needed to use oradebug to validate that the correct interface was used. This was the way to validate interconnect usage in Oracle 9i, but it is still possible in 10g and 11g as well.

SQL> connect / as sysdba
SQL> alter session set tracefile_identifier=oradebug_interc;
SQL> oradebug setmypid
SQL> oradebug ipc
SQL> exit

Now open the trace file. Because it belongs to your own session, it is written to the user dump destination (user_dump_dest; in 11g, the ADR trace directory). In the output of the oradebug ipc command you can find the IP address used for the interconnect.
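
The layout of the oradebug ipc dump differs between releases and platforms, so rather than assume a particular format, the sketch below simply lists every distinct dotted-quad string in a trace file. `list_ipv4` is a hypothetical helper name, and it relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require.

```shell
# list_ipv4 (hypothetical helper): prints each distinct dotted-quad string
# found in the file given as $1. It does not validate octet ranges.
list_ipv4() {
    grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort -u
}
```

Run it against the trace file whose name contains the oradebug_interc identifier set above, then compare the addresses it prints with the private interface you expect.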

INSTANCE FAILURE AND RECOVERY (RAC)

Oracle uses a simple heartbeat mechanism to detect the failure of an instance in a RAC cluster. The cluster manager software detects when it can no longer receive the heartbeat of another instance, and when that happens the recovery process begins. From what I can find, this heartbeat is provided by the udlm package on Sun and by MC/ServiceGuard on HP.

Fundamentally, because all instances in a cluster can read all threads of redo, a surviving member performs recovery on behalf of the failed instance. This comprises two main steps. Firstly, the Global Resource Directory must be remastered so that all resources that were mastered on the failed node are remastered to the surviving nodes. This remastering is performed by lmon. Secondly, the redo thread of the failed node is read so that changes to blocks on the failed node that were not written to disk (dirty blocks) can be applied to the datafiles. The RAC Concepts Guide outlines the following steps that are followed after node failure has occurred:

1. During the first phase of recovery, which is the GES reconfiguration, Oracle first reconfigures the GES enqueues. Then Oracle reconfigures the GCS resources. During this time, all GCS resource requests and write requests are temporarily suspended. However, processes and transactions can continue to modify data blocks as long as these processes and transactions have already acquired the necessary enqueues.

2. After the reconfiguration of enqueues that the GES controlled, a log read and the remastering of GCS resources occur in parallel. At the end of this step the block resources that need to be recovered have been identified. Only the redo of the dead instances must be read.

3. Buffer space for recovery is allocated and the resources that were identified in the previous reading of the log are claimed as recovery resources. Then, assuming that there are PIs of blocks to be recovered in other caches in the cluster database, resource buffers are requested from other instances. The resource buffers are the starting point of recovery for a particular block.

4. All resources and enqueues required for subsequent processing have been acquired and the Global Resource Directory is now unfrozen. Any data blocks that are not in recovery can now be accessed. Note that the system is already partially available.

5. The cache layer recovers and writes each block identified in step 2, releasing the recovery resources immediately after block recovery so that more blocks become available as cache recovery proceeds.

6. After all blocks have been recovered and the recovery resources have been released, the system is again fully available. Recovered blocks are available after recovery completes.

If all instances in a cluster fail then crash recovery is performed by the first instance in the cluster to start, just as in single instance.

CONCEPTS AND FUNCTIONAL OVERVIEW 

RAC is the ability to have two or more instances connect to the same database. These instances reside on separate servers, allowing each instance to make full use of the processing ability of its server. Because these instances access the same database, they need to be able to communicate with each other; this is done through physical interconnects that join the servers together. All the servers in the configuration make up the cluster.

All instances in a cluster share access to common database resources. This access needs to be coordinated between the instances in order to maintain the overall integrity of the database. To coordinate this access, RAC databases have a Global Resource Directory. The Global Resource Directory is RAC specific and is not required on single-instance systems.

Global Resource Directory(GRD)
Global Resource Directory provides an extra layer of control in order to allow all instances in a cluster to share access to shared database resources. The Global Resource Directory resides in the SGA of each instance in the cluster, with each instance maintaining its own portion. The main role of the Global Resource Directory is to ensure that access and changes to common resources is controlled between the instances in order to maintain the integrity of the database.

The Global Enqueue Service (GES) and the Global Cache Service (GCS) maintain the information in the Global Resource Directory. Although the Global Resource Directory is split amongst all instances in the cluster, the GCS and GES nominate one instance to manage all information about a particular database resource. This instance is called the resource master. Mastership is periodically checked and moved between instances where appropriate, in order to reduce interconnect traffic and resource acquisition time.

Global Cache Service(GCS)
Global Cache Service is responsible for cache fusion i.e. transmitting data blocks between the instances. The main features of this processing are:

- The lms processes are the Global Cache Service (GCS) background processes in the instance.
- Blocks can exist in more than one instance at a time.
- If an instance is asked to transfer a dirty block (a block that has been modified but not yet written to disk) to another instance in exclusive mode, it keeps a past image (PI) of the block. The past image is a copy of the block as the holding instance last knew it before transferring it to the requesting instance, and it is used for recovery. Once the most recent copy (the master copy) of the dirty block has been written to disk by dbwr, the past images can and will be discarded. Note that PIs can be used for a consistent read of a block, as this saves having to build a copy from the rollback segment. The important point is that an instance always creates a PI of a dirty block before sending it to another instance if the requesting instance wants it in exclusive mode. If an instance requests the master block for read (consistent or current), the holding instance does not need to keep a PI, as the requesting instance is not going to change the block.
- The most recent copy (the master copy or current block) of a block contains all changes made to it by transactions, regardless of which instance the change occurred on and whether or not the transactions have committed.
- A block is assigned a role, a mode and a status (clean or dirty).
- A block is held in a local role if it is held in only one SGA, and in a global role if it is held in more than one SGA.
- A block can be held in null, shared or exclusive mode. Null mode means that the instance has no access rights on the block, shared mode means that the instance can read the block, and exclusive mode means that the instance can write to the block. Many instances can hold the same block in null or shared mode, but only one can hold it in exclusive mode (as exclusive mode implies the instance wants to modify the block). To view the current mode of a block in an instance, query the status column of v$bh.

Global Enqueue Service(GES)
The Global Enqueue Service (GES) manages all non-cache fusion resource requests and tracks the status of all enqueuing mechanisms. The GES only does this for resources that are accessed by more than one instance. The primary resources that the GES controls are dictionary cache locks and library cache locks. The GES manages the inter-instance communication that occurs for these resources. These resources also exist in single instance; the difference is that in RAC they must be coordinated between all instances in the cluster.

Dictionary cache locks – The data dictionary must be consistent across all nodes. If a table definition is changed on one instance, the Global Enqueue Service ensures that the definition is propagated to the dictionary cache on all the other instances.

Library cache locks – These locks are held by sessions whilst they parse or execute SQL or PL/SQL statements. They prevent other sessions from modifying the definitions of objects in the data dictionary that are referenced by the statement currently being parsed or executed.

BACKGROUND PROCESSES IN RAC


Each instance in a RAC configuration has the same background processes as a single-instance database. However, a RAC-enabled instance has extra background processes, mainly related to the Global Cache Service (GCS) and the Global Enqueue Service (GES):

LMS processes – These are the processes for the Global Cache Service. These processes are responsible for transferring blocks between instances and maintaining the Global Resource Directory to reflect the modes and roles the blocks are held in on each instance. A block can exist in more than one instance at a time, but the Global Cache Service controls who has what version of the block thereby ensuring that the most up to date block (the master copy) is always the one that is updated. All other versions of the block will be past images or read consistent versions of the block.

LMON – Global Enqueue Service Monitor
This process monitors global enqueues and resources across the cluster and performs global enqueue recovery operations. LMON also handles recovery associated with global resources and can detect instance failure on other nodes.

LMD – Global Enqueue Service Daemon
This process manages global enqueue and global resource access. Within each instance, the LMD process manages incoming remote resource requests.

LCK – This is the lock process and makes up part of the Global Enqueue Service. It manages non-Cache Fusion resource requests such as library and dictionary cache requests.

DIAG – This is the diagnostic daemon. It captures diagnostic information related to instance process failures.

To view all the background processes (RAC and non-RAC) evident in an instance:

SQL> select * from v$bgprocess where paddr <> '00';

STORAGE SYSTEM FOR RAC 
In a RAC environment all instances need to be able to write to the same datafiles simultaneously. There are two ways to do this: use RAW devices or use a cluster filesystem.

Datafiles and tempfiles
All datafiles and tempfiles must reside on shared disks. The first instance to start will verify that it can read all datafiles identified in the controlfile. This must be done so that the first instance to start can determine whether instance or media recovery is required; this behaviour is no different from single instance. However, instances that join the cluster later can operate even if they cannot access all the files; they will simply raise an error when an attempt is made to access the file.

Control files
The control files must be on shared disks and must be accessible by all instances at startup time as determined in the parameter file.

Redo log files and archived logs
On RAC each instance must write to its own set of redo logs. This set is called a thread of redo. All threads of redo must reside on shared disks. The instance gets its thread of redo at startup time as determined by the thread parameter. If an instance cannot get its thread of redo it will fail to open. Each redo group will still be uniquely numbered at the database level and will be multiplexed or mirrored, just as in single instance. The only difference is that in RAC each redo group belongs to a thread, and only the instance specifying that thread number at startup time will write to the redo groups in that thread.

Each instance can, however, read all threads of redo. This is to facilitate instance recovery, i.e. if instance A fails then instance B will read instance A's thread of redo to recover from the failure. This must happen so that the consistency and integrity of the database is maintained if one instance fails. In order to facilitate instance recovery all redo files must reside on shared disks.

Archived log files are generated by each thread of redo. They are uniquely identified by the thread number, which we include in log_archive_format, together with the log sequence number, which is unique within each thread. Archived logs can be on a local filesystem or a shared filesystem.
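
To see why the thread number belongs in log_archive_format, consider two threads that both reach the same log sequence number. The sketch below expands a format string for each; the format `arch_%t_%s.arc` and the helper name are hypothetical, and only the %t (thread) and %s (sequence) placeholders are handled.

```shell
# expand_archive_name (hypothetical helper): substitutes a thread number and a
# log sequence number into a log_archive_format-style string.
# Only %t (thread) and %s (sequence) are handled in this sketch.
expand_archive_name() {
    fmt=$1 thread=$2 seq=$3
    printf '%s\n' "$fmt" | sed -e "s/%t/$thread/g" -e "s/%s/$seq/g"
}

expand_archive_name 'arch_%t_%s.arc' 1 145   # prints arch_1_145.arc
expand_archive_name 'arch_%t_%s.arc' 2 145   # prints arch_2_145.arc
```

Without %t both threads would produce the same file name for sequence 145, which matters as soon as the archived logs share a destination.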

Sequences
Sequences are held on disk. Even in single instance many DBAs cache sequence numbers to avoid contention for the sequence. We cache most sequences in RAC for the same reason. If you have a high cumulative wait time in v$enqueue_stat on the SQ enqueue (the sequence number enqueue) then you should consider caching the affected sequences. RAC supports both the CACHE and ORDER options for sequences.

Undo management
Undo/rollback datafiles must reside on shared storage (RAW devices or a cluster filesystem), like all other datafiles. If you use MANUAL undo then each instance must specify unique rollback segments in the instance-specific parameter file. If you use AUTOMATIC undo then each instance must specify a separate tablespace; this tablespace must be available and of type UNDO. All instances in a RAC cluster must run in the same UNDO mode, i.e. you can't have one running AUTOMATIC undo and another running MANUAL undo. If you are using AUTOMATIC undo, monitor v$undostat for statistics.

To see if the oracle home is RAC enabled issue the following SQL:

select * from dba_registry where comp_id = 'RAC';

To relink an Oracle home with RAC disabled or enabled:

cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk rac_off install
or
make -f ins_rdbms.mk rac_on install

Important RAC wait events

SQL> select event from v$system_event  where event like '%global%' order by event;

EVENT
----------------------------------------------------------------
buffer busy global CR
buffer busy global cache
ges global resource directory to be frozen - no
ges global resource directory to be unfrozen - no
global cache busy
global cache cr request
global cache domain validation - no
global cache null to s
global cache null to x
global cache open s
global cache open x
global cache s to x

buffer busy global cache

This wait event falls under the umbrella of ‘global buffer busy events’. This wait event occurs when a user is waiting for a block that is currently held by another session on the same instance and the blocking session is itself waiting on a global cache transfer.

buffer busy global CR

This wait event falls under the umbrella of ‘global buffer busy events’. It occurs when multiple CR requests for the same block are submitted from the same instance before the first request completes, so users may queue up behind it.

global cache busy

This wait event falls under the umbrella of ‘global buffer busy events’. It means that a user on the local instance is attempting to acquire a block globally while a pending acquisition or release is already in progress.

global cache cr request

This wait event falls under the umbrella of ‘global cache events’. It indicates that an instance has requested a consistent read version of a block from another instance and is waiting for the block to arrive.

global cache null to s and global cache null to x

These wait events fall under the umbrella of ‘global cache events’. They are waited for when a block was used by an instance, transferred to another instance, and then requested back again.

global cache open s and global cache open x

These wait events fall under the umbrella of ‘global cache events’. They occur when an instance has to read a block from disk into cache because the block does not exist in any instance's cache. High values on these waits may be indicative of a small buffer cache, so you may see a low cache hit ratio for your buffer cache at the same time as seeing these wait events.

global cache s to x

This wait event falls under the umbrella of ‘global cache events’. It occurs when a session converts a block from shared to exclusive mode.

To find locks in RAC
SELECT inst_id,DECODE(request,0,'Holder: ','Waiter: ')||sid sess,
id1, id2, lmode, request, type
FROM GV$LOCK
WHERE (id1, id2, type) IN
(SELECT id1, id2, type FROM gV$LOCK WHERE request>0)
ORDER BY id1, request;

Some important points

1. Cache Fusion: Cache Fusion is a parallel database architecture for exploiting clustered computers to achieve scalability for all types of applications. It is a shared cache architecture that uses the high-speed, low-latency interconnects available on clustered systems to maintain database cache coherency. Database blocks are shipped across the interconnect to the node where access to the data is needed. This is accomplished transparently to the application and users of the system. Because Cache Fusion uses at most a 3-point protocol, it scales easily to clusters with a large number of nodes.

2. The LMD and LMS processes are critical RAC processes that should not be blocked on CPU by queuing up behind other scheduled CPU events.

3. The v$ges_statistics view returns various statistics on the Global Enqueue Service.

4. The gv$lock view shows all the locks held by all the instances.

Source: http://appsoracle.blogspot.in/search/label/RAC

Sunday, January 22, 2012

More about RAC(Real Application Cluster)


RAC(Real Application Cluster)

1. When was RAC introduced?

Ans: Introduced in Oracle 9i

2. How to identify RAC instance?

Ans: show parameter cluster or use the DBMS_UTILITY.IS_CLUSTER_DATABASE function.

3. RAC advantages/features?

Ans: 1. High availability
     2. Failover
     3. Reliability
     4. Scalability
     5. Manageability
     6. Recoverability
     7. Transparency
     8. Row locking
     9. Error detection
     10. Buffer cache management
     11. Continuous operations
     12. Load balancing/sharing

4. Components in RAC?


SGA - Each instance has its own SGA
Background processes - Each instance has its own set of background processes
Datafiles - Shared by all instances, so must be placed in shared storage
Control files - Shared by all instances, so must be placed in shared storage
Online redo logfiles - Only one instance can write to them, but other instances can read them during recovery and archiving. If an instance is shut down, log switches by other instances can force the idle instance's redo logs to be archived.
Archived redo logs - Private to the instance, but other instances will need access to all required archive logs during media recovery.
Flash recovery log - Shared by all the instances, so must be placed in shared storage.
Alert log & trace files - Private to each instance, other instances never read  or write to those files.
ORACLE_HOME - It can be private to each instance or can be on shared file system.

5. Network/IPs

1. Public/Physical IP - To communicate to server.
2. Private IP - This is used for inter instance communication used by cluster and dedicated to the server nodes of a cluster
3. Virtual IP - This is used in listener configuration for load balancing/failover.

6. What is shared and What is not shared?

Shared:
1. Disk access
2. Resources that manages data.
3. All instances have common data and control files.
Not shared:
Each node has its own dedicated:
1. System memory
2. OS
3. Database instance
4. application software
5. Each instance has individual Log files and Rollback segments

7. RAC background processes

1. LMSn (Global Cache Service Processes) -
    a. LMSn handles block transfers between the holding instance's buffer cache and the requesting foreground process on the requesting instance.
    b. LMS maintains read consistency by rolling back any uncommitted transactions for blocks that are being requested by any remote instance.
    c. The number of LMS processes (the 'n' in the name, 0-9) varies with the amount of messaging traffic among the nodes in the cluster; by default there is one LMS process per pair of CPUs.
2. LMON (Global Enqueue Service Monitor) -
    It handles reconfiguration of locks and global resources whenever a node joins or leaves the cluster. Its services are also known as Cluster Group Services (CGS).
3. LMD (Global Enqueue Service Daemon) -
    It manages lock manager service requests for GCS resources and sends them to a service queue to be handled by the LMSn process. The LMD process also handles global deadlock detection and remote resource requests (requests originating from another instance).
4. LCK (Lock Process) -
    LCK manages non-cache fusion resource requests such as library and row cache requests, and lock requests that are local to the server. Because the LMS process handles the primary function of lock management, only a single LCK process exists in each instance.
5. DIAG (Diagnosability Daemon) -
    This background process monitors the health of the instance and captures diagnostic data about process failures within instances. The operation of this daemon is automated, and it updates an alert log file to record the activity that it performs.
6. GSD (Global Service Daemon) -
    This is a component in RAC that receives requests from the SRVCTL control utility to execute administrative tasks such as startup or shutdown. The command is executed locally on each node and the results are returned to SRVCTL. The GSD is installed on the nodes by default.

8. INTERNAL STRUCTURES AND SERVICES


 1. Global Resource Directory (GRD)

  1. Records current state and owner of each resource
  2. Contains convert and write queues
  3. Distributed across all instances in cluster
  4. Maintained by GCS and GES

2. Global Cache Services (GCS)

  1. Implements cache coherency for database
  2. Coordinates access to database blocks for instances

3. Global Enqueue Services (GES)

  1. Controls access to other resources (locks) including library cache and dictionary cache
  2. Performs deadlock detection


SRVCTL Utility commands

stop/start

1. srvctl start database -d <DB Name> [to start all instances of database with listeners]
2. srvctl stop database -d <DB Name>
3. srvctl stop database -d <DB Name> -o immediate
4. srvctl start database -d <DB Name> -o force
5. srvctl stop instance -d <DB Name> -i <Instance name> [individual instance]
6. srvctl stop service -d <database> [-s <service>,<service>] [-i <instance>,<instance>]
7. srvctl stop nodeapps -n <node>
8. srvctl stop asm -n <node>
9. srvctl start service -d <database> -s <service>,<service> -i <instance>,<instance>
10. srvctl start nodeapps -n <node>
11. srvctl start asm -n <node>
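
When several instances have to be stopped one at a time, it can help to generate the commands first and review them before running anything. A dry-run sketch under that assumption: `print_instance_stops` is a hypothetical helper, the database and instance names are examples, and nothing is executed.

```shell
# print_instance_stops (hypothetical helper): prints, without executing, one
# `srvctl stop instance` command per instance name given after the database name.
print_instance_stops() {
    db=$1; shift
    for inst in "$@"; do
        echo "srvctl stop instance -d $db -i $inst -o immediate"
    done
}

print_instance_stops RAC RAC1 RAC2
```

Piping the output to `sh` would run the commands in order; keeping that step manual leaves room to check each instance between stops.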

status

srvctl status database -d <database>
srvctl status instance -d <database> -i <instance>
srvctl status nodeapps -n <node>
srvctl status service -d <database>
srvctl status asm -n <node>

adding/removing

srvctl add database -d <database> -o <oracle_home>
srvctl add instance -d <database> -i <instance> -n <node>
srvctl add service -d <database> -s <service> -r <preferred_list>
srvctl add nodeapps -n <node> -o <oracle_home> -A <name|ip>/network
srvctl add asm -n <node> -i <asm_instance> -o <oracle_home>
srvctl remove database -d <database>
srvctl remove instance -d <database> -i <instance>
srvctl remove service -d <database> -s <service>
srvctl remove nodeapps -n <node>
srvctl remove asm -n <node>

nodeapps

1. VIP
2. ONS
3. GSD
4. Listener

Clusterware Components

OPROCd - (Process Monitor Daemon)
Provides basic cluster integrity services. Failure of the process causes a node restart. It runs as root.
CRSd - (CRS daemon)
Resource monitoring, failover and node recovery. Failure of the process causes the daemon to be restarted automatically.
EVMd - (Event Management)
Spawns a child process event logger and generates callouts. Failure of this process causes the daemon to be restarted automatically, with no node restart. It runs as oracle.
OCSSd - (Oracle Cluster Synchronization Services Daemon; updates the registry)
Basic node membership, group services and basic locking. Failure of this process causes a node restart, which avoids data corruption. It runs as oracle.

How to check CRS version
crsctl query crs activeversion
crsctl query crs softwareversion

Clusterware Files
Oracle Clusterware requires two files that must be located on shared storage for its operation.

1. Oracle Cluster Registry (OCR)
2. Voting Disk

Oracle Cluster Registry (OCR)

Located on shared storage. In Oracle 10.2 and above it can be mirrored, to a maximum of two copies.

Defines cluster resources, including:
1. Databases and instances (RDBMS and ASM)
2. Services and node applications (VIP, ONS, GSD)
3. Listener processes

Voting Disk (Quorum Disk / File in Oracle 9i)

1. Used to determine RAC instance membership; located on shared storage accessible to all instances.
2. Used to determine which instance takes control of the cluster in case of node failure, to avoid split brain.
3. In Oracle 10.2 and above it can be mirrored, but only to an odd number of copies (1, 3, 5 etc.).
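
The preference for an odd number of copies follows from simple majority arithmetic. This sketch assumes the usual rule that a node must be able to reach more than half of the configured voting disks; `votedisk_quorum` is a hypothetical helper name.

```shell
# votedisk_quorum (hypothetical helper): for N configured voting disks, prints
# how many a node must reach (a strict majority) and how many may be lost.
votedisk_quorum() {
    n=$1
    needed=$(( n / 2 + 1 ))              # strict majority of n
    echo "disks=$n needed=$needed tolerated_failures=$(( n - needed ))"
}

for n in 1 2 3 4 5; do votedisk_quorum "$n"; done
```

Three disks tolerate one failure and so do four, so moving to an even count adds a disk without adding any resilience.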

crsctl commands

/oracle/product/grid_home/bin/crsctl check crs
/oracle/product/grid_home/bin/crsctl stat res -t
/oracle/product/grid_home/bin/ocrcheck
/oracle/product/grid_home/bin/crsctl query css votedisk
/oracle/product/grid_home/bin/cluvfy stage -post crsinst -n all -verbose
/oracle/product/grid_home/bin/srvctl status scan_listener

More about VIP

1. To make applications highly available and to eliminate single points of failure (SPOF), Oracle 10g introduced a new feature called cluster VIPs, i.e. a virtual IP address, different from the set of in-cluster IP addresses, that is used by the outside world to connect to the database.

2. A VIP name and address must be registered in DNS along with the standard static IP information. Listeners are configured to listen on the VIPs instead of the public IP.

3. When a node is down, its VIP is automatically failed over to one of the other nodes. The node that gets the VIP will “re-ARP” to the world, indicating the new MAC address of the VIP. Clients are sent an error message immediately rather than waiting for the TCP timeout.

Friday, November 07, 2008

Using srvctl to Manage your 10g RAC Database

Oracle recommends that RAC databases be managed with srvctl, an Oracle-supplied tool that was first introduced with 9i RAC. The 10g version of srvctl is slightly different from the 9i implementation. In this article, we will look at how -- and why -- to manage your 10g databases with srvctl.

Interacting with CRS and the OCR: srvctl

srvctl is the tool Oracle recommends that DBAs use to interact with CRS and the cluster registry. Oracle does provide several tools to interface with the cluster registry and CRS more directly, at a lower level, but these tools are deliberately undocumented and intended only for use by Oracle Support. srvctl, in contrast, is well documented and easy to use. Using other tools to modify the OCR or manage CRS without the assistance of Oracle Support runs the risk of damaging the OCR.

Using srvctl

Even if you are experienced with 9i srvctl, it's worth taking a look at this section; 9i and 10g srvctl commands are slightly different.

srvctl must be run from the $ORACLE_HOME of the RAC you are administering. The basic format of a srvctl command is

srvctl <command> <target> [options]

where command is one of

enable|disable|start|stop|relocate|status|add|remove|modify|getenv|setenv|unsetenv|config

and the target, or object, can be a database, instance, service, ASM instance, or the nodeapps.
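
To make the command/target format concrete, here is a small wrapper that rejects anything outside the two lists above before calling srvctl. It is a sketch only: the name srvctl_checked, the SRVCTL override variable, and the lowercase target keywords (asm, nodeapps, and so on) as spelled here are assumptions for illustration. Not every command applies to every target, so srvctl itself remains the final judge.

```shell
# Check that a command/target pair matches the srvctl lists given above
# before handing the full command line to srvctl.
# srvctl_checked is a hypothetical helper name, not part of Oracle's tooling.
srvctl_checked() {
    case "$1" in
        enable|disable|start|stop|relocate|status|add|remove|modify|getenv|setenv|unsetenv|config) ;;
        *) echo "unknown srvctl command: $1" >&2; return 2 ;;
    esac
    case "$2" in
        database|instance|service|asm|nodeapps) ;;
        *) echo "unknown srvctl target: $2" >&2; return 2 ;;
    esac
    # SRVCTL is an assumed override so the wrapper can be tried without a cluster.
    "${SRVCTL:-srvctl}" "$@"
}
```

For example, srvctl_checked status database -d MYSID passes straight through, while a mistyped command is caught before srvctl ever runs.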

The srvctl commands are summarized in this table:



As you can see, srvctl is a powerful utility with a lot of syntax to remember. Fortunately, there are only really two commands to memorize: srvctl -help displays a basic usage message, and srvctl -h displays full usage information for every possible srvctl command.

Examples

Example 1. Bring up the MYSID1 instance of the MYSID database.

[oracle@myserver oracle]$ srvctl start instance -d MYSID -i MYSID1

Example 2. Stop the MYSID database: all its instances and all its services, on all nodes.

[oracle@myserver oracle]$ srvctl stop database -d MYSID

Example 3. Stop the nodeapps on the myserver node. NB: Instances and services also stop.

[oracle@myserver oracle]$ srvctl stop nodeapps -n myserver

Example 4. Add the MYSID3 instance, which runs on the myserver node, to the MYSID clustered database.

[oracle@myserver oracle]$ srvctl add instance -d MYSID -i MYSID3 -n myserver

Example 4b. Add a new node, the mynewserver node, to a cluster.

[oracle@myserver oracle]$ srvctl add nodeapps -n mynewserver -o $ORACLE_HOME -A 149.181.201.1/255.255.255.0/eth1

(The -A flag precedes an address specification.)

Example 5. To change the VIP (virtual IP) on a RAC node, use the command

[oracle@myserver oracle]$ srvctl modify nodeapps -n myserver -A new_address

Example 6. Find out whether the nodeapps on mynewserver are up.

[oracle@myserver oracle]$ srvctl status nodeapps -n mynewserver
VIP is running on node: mynewserver
GSD is running on node: mynewserver
Listener is not running on node: mynewserver
ONS daemon is running on node: mynewserver
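
Output like this is easy to scan by eye on one node, but in a script it helps to pull out only the components that are down. A minimal sketch, assuming the output keeps the "... is not running on node: ..." wording shown above; the function name nodeapps_down is made up for this example.

```shell
# Read `srvctl status nodeapps` output on stdin and print each component
# that is reported as not running. Returns 1 if any component is down.
# nodeapps_down is a hypothetical helper name.
nodeapps_down() {
    awk '
        / is not running on node: / {
            sub(/ is not running on node: .*/, "")   # keep only the component name
            print
            down = 1
        }
        END { exit down }
    '
}
```

Usage: srvctl status nodeapps -n mynewserver | nodeapps_down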

Example 7. Disable the ASM instance on myserver for maintenance.

[oracle@myserver oracle]$ srvctl disable asm -n myserver

Debugging srvctl

Debugging srvctl in 10g couldn't be easier. Simply set the SRVM_TRACE environment variable.

[oracle@myserver bin]$ export SRVM_TRACE=true

Let's repeat Example 6 with SRVM_TRACE set to true:

[oracle@myserver oracle]$ srvctl status nodeapps -n mynewserver
/u01/app/oracle/product/10.1.0/jdk/jre//bin/java -classpath
/u01/app/oracle/product/10.1.0/jlib/netcfg.jar:/u01/app/oracle/product/10.1.0/jdk/jre//lib/rt.jar:
/u01/app/oracle/product/10.1.0/jdk/jre//lib/i18n.jar:/u01/app/oracle/product/10.1.0/jlib/srvm.jar:
/u01/app/oracle/product/10.1.0/jlib/srvmhas.jar:/u01/app/oracle/product/10.1.0/jlib/srvmasm.jar:
/u01/app/oracle/product/10.1.0/srvm/jlib/srvctl.jar
-DTRACING.ENABLED=true -DTRACING.LEVEL=2 oracle.ops.opsctl.OPSCTLDriver status nodeapps -n
mynewserver
[main] [19:53:31:778] [OPSCTLDriver.setInternalDebugLevel:165] tracing is true at level 2 to
file null
[main] [19:53:31:825] [OPSCTLDriver.:94] Security manager is set
[main] [19:53:31:843] [CommandLineParser.parse:157] parsing cmdline args
[main] [19:53:31:844] [CommandLineParser.parse2WordCommandOptions:900] parsing 2-word
cmdline
[main] [19:53:31:866] [GetActiveNodes.create:212] Going into GetActiveNodes constructor...
[main] [19:53:31:875] [HASContext.getInstance:191] Module init : 16
[main] [19:53:31:875] [HASContext.getInstance:216] Local Module init : 19
...
[main] [19:53:32:285] [ONS.isRunning:186] Status of ora.ganges.ons on mynewserver is true
ONS daemon is running on node: mynewserver
[oracle@myserver oracle]$
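
Since the trace is long, it can be handy to turn it on for a single command and send it to a file rather than exporting SRVM_TRACE for the whole session. A sketch under two assumptions: that the trace goes to the command's stdout/stderr as in the run above, and that the names srvctl_traced and SRVCTL (an override for testing) are hypothetical.

```shell
# Run one srvctl command with tracing enabled for that invocation only,
# saving the combined output to a log file instead of the terminal.
# srvctl_traced is a hypothetical helper name.
srvctl_traced() {
    _log="$1"; shift
    # The VAR=value prefix sets SRVM_TRACE only for this one process,
    # so later srvctl calls in the same shell stay quiet.
    # SRVCTL is an assumed override so the helper can be tried without a cluster.
    SRVM_TRACE=true "${SRVCTL:-srvctl}" "$@" >"$_log" 2>&1
}
```

Usage: srvctl_traced /tmp/nodeapps_trace.log status nodeapps -n mynewserver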

Pitfalls

A little impatience when dealing with srvctl can corrupt your OCR, i.e., put it into a state where the information for a given object is inconsistent or partially missing. Specifically, the srvctl remove command provides the -f option, which forces removal of an object from the OCR. Use this option judiciously, as it can easily put the OCR into an inconsistent state.

Restoring the OCR from an inconsistent state is best done with the assistance of Oracle Support, who will guide you in using the undocumented $CRS_HOME/bin/crs_* tools to repair it. The OCR can also be restored from backup.
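
One way to stay on the safe side is to take a fresh export of the OCR before any forced removal, and to skip the removal if the export fails. The sketch below uses ocrconfig -export for that; the wrapper name with_ocr_export and the OCRCONFIG override are assumptions for illustration, and ocrconfig generally needs to be run as root, so adapt the privilege handling to your site.

```shell
# Take a logical export of the OCR, then run the given command only if
# the export succeeded. with_ocr_export is a hypothetical helper name.
with_ocr_export() {
    _dump="$1"; shift
    # OCRCONFIG is an assumed override so the helper can be tried without a cluster.
    "${OCRCONFIG:-ocrconfig}" -export "$_dump" || {
        echo "OCR export failed; not running: $*" >&2
        return 1
    }
    "$@"
}
```

Usage: with_ocr_export /tmp/ocr_before_remove.dmp srvctl remove instance -d MYSID -i MYSID3 -f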

Error messages

srvctl errors are PRK% errors, which are not documented in the 10gR1 error messages manual. However, for those with a Metalink account, they are documented on Metalink.

Conclusion

srvctl is a powerful tool that will allow you to administer your RAC easily and effectively. In addition, it provides a valuable buffer between the DBA and the OCR, making it more difficult to corrupt the OCR.

Note: Info from Natalka Roshak's blog

Cluster Ready Services and the OCR

Cluster Ready Services, or CRS, is a new feature for 10g RAC. Essentially, it is Oracle's own clusterware. On most platforms, Oracle supports vendor clusterware; in these cases, CRS interoperates with the vendor clusterware, providing high availability support and service and workload management. On Linux and Windows clusters, CRS serves as the sole clusterware. In all cases, CRS provides a standard cluster interface that is consistent across all platforms.

CRS consists of four processes (crsd, ocssd, evmd, and evmlogger) and two disks: the Oracle Cluster Registry (OCR), and the voting disk.
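
A quick way to see whether those four processes are present on a node is to look for them in the process list. This is a sketch, not an Oracle-supplied check: it assumes the daemons show up under their plain names or with a ".bin" suffix (as they commonly do on Linux), and the function name crs_daemons_report is made up for this example.

```shell
# Report which of the four CRS processes appear in a list of process
# names read from stdin. Returns 1 if any of them is missing.
# crs_daemons_report is a hypothetical helper name; the optional ".bin"
# suffix is an assumption about how the daemons are named on the host.
crs_daemons_report() {
    _names=$(cat)
    _missing=0
    for _d in crsd ocssd evmd evmlogger; do
        if printf '%s\n' "$_names" | grep -Eq "^${_d}(\.bin)?$"; then
            echo "$_d: running"
        else
            echo "$_d: NOT FOUND"
            _missing=1
        fi
    done
    return "$_missing"
}
```

Usage: ps -e -o comm= | crs_daemons_report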

CRS manages the following resources:

* The ASM instances on each node
* Databases
* The instances on each node
* Oracle Services on each node
* The cluster nodes themselves, including the following processes, or "nodeapps":
o VIP
o GSD
o The listener
o The ONS daemon

CRS stores information about these resources in the OCR. If the information in the OCR for one of these resources becomes damaged or inconsistent, then CRS is no longer able to manage that resource. Fortunately, the OCR automatically backs itself up regularly and frequently.

RAC Architecture in Brief

RAC Architecture Overview

1. A cluster is a set of 2 or more machines (nodes) that share or coordinate resources to perform the same task.

2. A RAC database is 2 or more instances running on a set of clustered nodes, with all instances accessing a shared set of database files.

3. Depending on the O/S platform, a RAC database may be deployed on a cluster that uses vendor clusterware plus Oracle's own clusterware (Cluster Ready Services), or on a cluster that solely uses Oracle's own clusterware.

Thus, every RAC sits on a cluster that is running Cluster Ready Services. srvctl is the primary tool DBAs use to configure CRS for their RAC database and processes.
