Monday, March 8, 2010

ICAS4108B Complete database back-up and recovery

Database Backup and Recovery
Most of database systems have incorporated backup and recovery tools into their interfaces and infrastructure it is a good idea to understand what the backup and recovery process involves, beyond the work flow of using the tools. With the growing dependency in the workplace on information and general, and the information in your database specifically, there has never been a time when safe backups and reliable recoveries were more important. Are you maintaining your databases properly? Do they participate in a backup and recovery routine that checks on the health of the data? If you are not 100% sure of this, then you need to find out.

Database Backup and Recovery Needs
It is not just the data files that need to be part of the backup process. You must also backup the transaction logs of the database as well. Without the transaction logs the data files are useless in a recovery event. How often you choose to perform these backup routines is really dependent on the data requirements of your company. If you do not know what those are then you need to find out either by referring to them or asking for them to be created through a research and investigation process.
Database Backup and Recovery Needs
It is not just the data files that need to be part of the backup process. You must also backup the transaction logs of the database as well. Without the transaction logs the data files are useless in a recovery event. How often you choose to perform these backup routines is really dependent on the data requirements of your company. If you do not know what those are then you need to find out either by referring to them or asking for them to be created through a research and investigation process.
Backup and Recovery and Database Failure
Failure can happen for any number of reasons. There are three main ones that recur enough to be worth incorporating into your backup and recovery plan. User error is the number one reason for data damage, loss, or corruption. Included in this type of failure is an application modifying or destroying the data on its own or through a user choice. Recovery and restore to the point in time before the corruption occurred.
This returns the data to a clean position at the cost of any other changes that were being made to the data since the point the corruption took place. Any lost work will need to be re-entered or processes repeated if necessary.
Media failure can also cause data loss or damage. Media failure can happen when the media the data files or transaction logs are stored on fail. Most databases will be stored on computer hard drives or across groups of hard drives on designated servers. Hard drives are mechanical devices, just like automobiles, and are made up of parts and pieces that work together. Mechanical devices are known for failure and will need to be replaced once, or if, the data has been retrieved from them.
Backup and Recovery and Disaster
The third reason for database failure is a disastrous or catastrophic event. This can be in the form of fire, flood, or any naturally occurring storm. It can also happen through electrical outage, a virus, or the deliberate hacking of your data. Any of these can corrupt or cause the loss of your data. The true disaster will be the lack of data backup and or the lack of a recovery plan. Without data backup recovery is impossible. And without a recovery plan there is no guarantee that your data backup will make it through the recovery process.
Purpose of Complete Database Recovery
This chapter assumes that some or all of your datafiles are lost or damaged. Typically, this situation is caused by a media failure or accidental deletion. Your goal is to return the database to normal operation by restoring the damaged files from RMAN backups and recovering all database changes.
Scope of This Chapter
This chapter explain how to use complete recovery to fix the most common database problems. This chapter makes the following assumptions:
.•You have lost some or all datafiles and your goal is to recover all changes, but you have not lost all current control files or an entire online redo log group.
.•Your database is using the current server parameter file.
.•You have the complete set of archived redo logs and incremental backups needed for recovery of your datafile backups. Every datafile either has a backup, or a complete set of online and archived redo logs goes back to the creation of a datafile with no backup.
.RMAN can handle lost datafiles without user intervention during restore and recovery. When a datafile is lost, the possible cases can be classified as follows:
.◦The control file knows about the datafile, that is, you backed up the control file after datafile creation, but the datafile itself is not backed up. If the datafile record is in the control file, then RESTORE creates the datafile in the original location or in a user-specified location. The RECOVER command can then apply the necessary logs to the datafile.
.◦The control file does not have the datafile record, that is, you did not back up the control file after datafile creation. During recovery, the database will detect the missing datafile and report it to RMAN, which will create a new datafile and continue recovery by applying the remaining logs. If the datafile was created in a parent incarnation, then it will be created during the restore or recovery phase as appropriate.
.•You are not restoring and recovering an encrypted tablespace.
.If you perform media recovery on an encrypted tablespace, then the Oracle wallet must be open when performing media recovery of this tablespace. See Oracle Database Administrator's Guide to learn about encrypted tablespaces.
.•Your database runs in a single-instance configuration.
.While RMAN can restore and recover databases in Oracle RAC and Data Guard configurations, these scenarios are beyond the scope of this manual.
.•You are using the RMAN client rather than Oracle Enterprise Manager.
.Enterprise Manager provides access to RMAN through a set of wizards. These wizards lead you through a variety of recovery procedures based on an analysis of your database, your available backups, and your data recovery objectives.
.By using Enterprise Manager, you can perform the simpler restore and recovery scenarios outlined in this chapter. You can also use more sophisticated restore and recovery techniques such as point-in-time recovery and database flashback, which allow for efficient repair of media failures and user errors. In most cases, using Enterprise Manager is simpler than using the RMAN command-line client directly
Preparing for Complete Database Recovery
While RMAN simplifies most database restore and recovery tasks, you must still plan your database restore and recovery strategy based on which database files have been lost and your recovery goal. This section contains the following topics:
•Identifying the Database Files to Restore or Recover

•Determining the DBID of the Database

•Previewing Backups Used in Restore Operations

•Validating Backups Before Restoring Them

•Restoring Archived Redo Logs Needed for Recovery
Identifying the Database Files to Restore or Recover
The techniques for determining which files require restore or recovery depend upon the type of file that is lost.
Identifying a Lost Control File
It is usually obvious when the control file of your database is lost. The database shuts down immediately when any of the multiplexed control files becomes inaccessible. Also, the database reports an error if you try to start it without a valid control file at each location specified in the CONTROL_FILES initialization parameter.
Loss of some but not all copies of your control file does not require you to restore a control file from backup. If at least one control file remains intact, then you can either copy an intact copy of the control file over the damaged or missing control file, or update the initialization parameter file so that it does not refer to the damaged or missing control file. After the CONTROL_FILES parameter references only present, intact copies of the control file, you can restart your database.
If you restore the control file from backup, then you must perform media recovery of the whole database and then open it with the OPEN RESETLOGS option, even if no datafiles need to be restored. This technique is described in "Performing Recovery with a Backup Control File".
Identifying Datafiles Requiring Media Recovery
When and how to recover depends on the state of the database and the location of its datafiles.
Identifying Datafiles with RMAN
An easy technique for determining which datafiles are missing is to run a VALIDATE DATABASE command, which attempts to read all specified datafiles. For example, start the RMAN client and run the following commands to validate the database (sample output included).
Example 17-1 BACKUP VALIDATE DATABASE
RMAN> VALIDATE DATABASE;
Starting validate at 20-OCT-06allocated channel: ORA_DISK_1channel ORA_DISK_1: SID=90 device type=DISKcould not read file header for datafile 7 error reason 4RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03002: failure of backup command at 10/20/2007 13:05:43RMAN-06056: could not access datafile 7
The output in Example 17-1 indicates that datafile 7 is inaccessible. You can then run the REPORT SCHEMA command to obtain the tablespace name and filename for datafile 7 as follows (sample output included):
RMAN> REPORT SCHEMA;
Report of database schema for database with db_unique_name RDBMS
===========================File Size(MB) Tablespace RB segs Datafile Name---- -------- -------------------- ------- ------------------------1 450 SYSTEM *** +DATAFILE/tbs_01.f2 86 SYSAUX *** +DATAFILE/tbs_ax1.f3 15 UD1 *** +DATAFILE/tbs_undo1.f4 2 SYSTEM *** +DATAFILE/tbs_02.f5 2 TBS_1 *** +DATAFILE/tbs_11.f6 2 TBS_1 *** +DATAFILE/tbs_12.f7 2 TBS_2 *** +DATAFILE/tbs_21.fList of Temporary Files=======================File Size(MB) Tablespace Maxsize(MB) Tempfile Name---- -------- -------------------- ----------- --------------------1 40 TEMP 32767 +DATAFILE/tbs_tmp1.f
Identifying Datafiles with SQLAlthough VALIDATE DATABASE is a good technique for determining whether files are inaccessible, you may want to use SQL queries to obtain more detailed information.

To determine whether datafiles require media recovery:

1-1.Start SQL*Plus and connect to the target database instance. For example, issue the following commands to connect to trgt
22.Determine the status of the database by executing the following SQL query
SELECT STATUS FROM V$INSTANCE;If the status is OPEN, then the database is open. Nevertheless, some datafiles may require media recovery.

3.Query V$DATAFILE_HEADER to determine the status of your datafiles. Run the following SQL statements to check the datafile headers:

COL FILE# FORMAT 999COL STATUS FORMAT A7COL ERROR FORMAT A10COL TABLESPACE_NAME FORMAT A10COL NAME FORMAT A30SELECT FILE#, STATUS, ERROR, RECOVER, TABLESPACE_NAME, NAMEFROM V$DATAFILE_HEADERWHERE RECOVER = 'YES'OR (RECOVER IS NULL AND ERROR IS NOT NULL);Each row returned represents a datafile that either requires media recovery or has an error requiring a restore. Check the RECOVER and ERROR columns. RECOVER indicates whether a file needs media recovery, and ERROR indicates whether there was an error reading and validating the datafile header.

If ERROR is not NULL, then the datafile header cannot be read and validated. Check for a temporary hardware or operating system problem causing the error. If there is no such problem, you must restore the file or switch to a copy.

If the ERROR column is NULL and the RECOVER column is YES, then the file requires media recovery (and may also require a restore from backup).

Note:

Because V$DATAFILE_HEADER only reads the header block of each datafile, it does not detect all problems that require the datafile to be restored. For example, this view cannot tell whether a datafile contains corrupt data blocks.
4.Optionally, query V$RECOVER_FILE to list datafiles requiring recovery by datafile number with their status and error information. For example, execute the following query:

SELECT FILE#, ERROR, ONLINE_STATUS, CHANGE#, TIMEFROM V$RECOVER_FILE;Note:

You cannot use V$RECOVER_FILE with a control file restored from backup or a control file that was re-created after the time of the media failure affecting the datafiles. A restored or re-created control file does not contain the information needed to update V$RECOVER_FILE accurately.
To find datafile and tablespace names, you can also perform useful joins using the datafile number and the V$DATAFILE and V$TABLESPACE views. For example:

COL DF# FORMAT 999COL DF_NAME FORMAT A35COL TBSP_NAME FORMAT A7COL STATUS FORMAT A7COL ERROR FORMAT A10COL CHANGE# FORMAT 99999999SELECT r.FILE# AS df#, d.NAME AS df_name, t.NAME AS tbsp_name, d.STATUS, r.ERROR, r.CHANGE#, r.TIMEFROM V$RECOVER_FILE r, V$DATAFILE d, V$TABLESPACE tWHERE t.TS# = d.TS#AND d.FILE# = r.FILE#;The ERROR column identifies the problem for each file requiring recovery.
Determining the DBID of the Database
In situations requiring the recovery of your server parameter file or control file from autobackup, you need to know the DBID. You should record the DBID along with other basic information about your database.

If you do not have a record of the DBID of your database, then you can find it in the following places without opening your database:

.•The DBID is used in forming the filename for the control file autobackup. Locate this file, and then refer to "Configuring the Control File Autobackup Format" to determine where the DBID appears in the filename
.•If you have any text files that preserve the output from an RMAN session, then the DBID is displayed by the RMAN client when it starts up and connects to your database. Typical output follows:

% rman TARGET /Recovery Manager: Release 11.1.0.6.0 - Production on Wed Jul 11 17:51:30 2007Copyright (c) 1982, 2007, Oracle. All rights reserved.connected to target database: PROD (DBID=36508508
Previewing Backupa Used in Restore Operations
You can apply RESTORE ... PREVIEW to any RESTORE operation to create a detailed list of every backup to be used in the requested RESTORE operation, as well as the necessary target SCN for recovery after the RESTORE operation is complete. This command accesses the RMAN repository to query the backup metadata, but does not actually read the backup files to ensure that they can be restored.

As an alternative to RESTORE ... PREVIEW, you can use the RESTORE ... VALIDATE HEADER command. In addition to listing the files needed for restore and recovery, the RESTORE ... VALIDATE HEADER command validates the backup file headers to determine whether the files on disk or in the media management catalog correspond to the metadata in the RMAN repository.

When planning your restore and recovery operation, use RESTORE ... PREVIEW or RESTORE ... VALIDATE HEADER to ensure that all required backups are available or to identify situations in which you may want to direct RMAN to use or avoid specific backups.

To preview backups to be used in a restore operation:
1.Run a RESTORE command with the PREVIEW option.

For example, run one of the following commands:

RESTORE DATABASE PREVIEW;RESTORE ARCHIVELOG FROM TIME 'SYSDATE-7' PREVIEW;If the report produced by RESTORE ... PREVIEW provides too much information, then specify the SUMMARY option as shown in the following example:

RESTORE DATABASE PREVIEW SUMMARY;If satisfied with the output, then stop here. If the output indicates that RMAN will request a backup from a tape that you know is temporarily unavailable, then continue with this procedure. If the output indicates that a backup is stored offsite, then skip to "Recalling Offsite Backups".

2.If needed, use the CHANGE command to set the backup status of any temporarily unavailable backups to UNAVAILABLE.

"Updating a Backup to Status AVAILABLE or UNAVAILABLE" explains how to perform this task.

3.Optionally, run RESTORE ... PREVIEW again to confirm that the restore will not attempt to use unavailable backups.
Recalling Offsite Backups
Some media managers provide status information to RMAN about which backups are offsite. An offsite backup is stored in a remote location, such as a secure storage facility, and cannot be restored unless the media manager retrieves the media.

Offsite backups are marked as AVAILABLE in the RMAN repository even though the media must be retrieved from storage before the backup can be restored. If RMAN attempts to restore a offsite backup, then the restore job fails.

You can use RESTORE ... PREVIEW to identify offsite backups. The command output indicates whether backups are stored offsite, as shown by the text at the end of the sample output in Example 17-2.

Example 17-2 RESTORE ... PREVIEW Output

BS Key Size Device Type Elapsed Time Completion Time------- ---------- ----------- ------------ ---------------9 2.25M SBT_TAPE 00:00:00 21-MAY-07 BP Key: 9 Status: AVAILABLE Compressed: NO Tag: TAG20070521T144258 Handle: 0aii9k7i_1_1 Media: 0aii9k7i_1_1 List of Archived Logs in backup set 9 Thrd Seq Low SCN Low Time Next SCN Next Time ---- ------- ---------- --------- ---------- --------- 1 1 392314 21-MAY-07 392541 21-MAY-07 1 2 392541 21-MAY-07 392545 21-MAY-07 1 3 392545 21-MAY-07 392548 21-MAY-07 1 4 392548 21-MAY-07 395066 21-MAY-07 1 5 395066 21-MAY-07 395095 21-MAY-07 1 6 395095 21-MAY-07 395355 21-MAY-07List of remote backup files
Handle: aii9k7i_1_1 Media: 0aii9k7i_1_1validation succeeded for backup pieceFinished restore at 21-MAY-07released channel: dev1You can use RESTORE ... PREVIEW RECALL to instruct the media manager to make offsite backups available.

To recall offsite backups:

1.If backups are stored offsite, then execute a RESTORE ... PREVIEW command with the RECALL option.

The following example initiates recall for the offsite archived log backups shown in Example 17-2 (sample output included):

RESTORE ARCHIVELOG ALL PREVIEW RECALL;The following sample output indicates that RMAN initiated a recall:

List of Backup Sets===================
BS Key Size Device Type Elapsed Time Completion Time------- ---------- ----------- ------------ ---------------9 2.25M SBT_TAPE 00:00:00 21-MAY-07 BP Key: 9 Status: AVAILABLE Compressed: NO Tag: TAG20070521T144258 Handle: VAULT0aii9k7i_1_1 Media: /tmp,VAULT0aii9k7i_1_1 List of Archived Logs in backup set 9 Thrd Seq Low SCN Low Time Next SCN Next Time ---- ------- ---------- --------- ---------- --------- 1 1 392314 21-MAY-07 392541 21-MAY-07 1 2 392541 21-MAY-07 392545 21-MAY-07 1 3 392545 21-MAY-07 392548 21-MAY-07 1 4 392548 21-MAY-07 395066 21-MAY-07 1 5 395066 21-MAY-07 395095 21-MAY-07 1 6 395095 21-MAY-07 395355 21-MAY-07Initiated recall for the following list of remote backup files
Handle: VAULT0aii9k7i_1_1 Media: /tmp,VAULT0aii9k7i_1_1validation succeeded for backup pieceFinished restore at 21-MAY-07released channel: dev12.Run the RESTORE ... PREVIEW command. If necessary,
2. return to the previous step until no backups needed for the restore are reported as offsite.

Validating Backups Before Restoring Them
While the procedures in "Previewing Backups Used in Restore Operations" indicate which backups will be restored, they do not verify that the backups are actually usable. You can run RMAN commands to test the availability of usable backups for any RESTORE operation, or test the contents of a specific backup for use in RESTORE operations. The contents of the backups are actually read and checked for corruption. You have the following validation options:

•RESTORE ... VALIDATE tests whether RMAN can restore a specific object from a backup. RMAN chooses which backups to use.

•VALIDATE BACKUPSET tests the validity of a backup set that you specify.
Restoring Archived Redo Logs Needed for Recovery
RMAN restore archived redo log files from backup automatically as needed to perform recovery. You can also restore archived redo logs manually to save the time needed to restore these files later during the RECOVER command, or if you want to store the restored archived redo log files in some new location.

By default, RMAN restores archived redo logs with names constructed using the LOG_ARCHIVE_FORMAT and the LOG_ARCHIVE_DEST_1 parameters of the target database. These parameters are combined in a platform-specific fashion to form the name of the restored archived log.

Restoring Archived Redo Logs to a New Location
You can override the default location for restored archived redo logs with the SET ARCHIVELOG DESTINATION command. This command manually stages archived logs to different locations while a database restore is occurring. During recovery, RMAN knows where to find the newly restored archived logs; it does not require them to be in the location specified in the initialization parameter file.
1.After connecting to the target database, ensure the database is mounted or open.

2.Perform the following operations within a RUN block:

1.Specify the new location for the restored archived redo logs using SET ARCHIVELOG DESTINATION.

2.Either explicitly restore the archived redo logs or execute commands that automatically restore the logs.

The following example explicitly restores all backup archived logs to a new location:

RUN{ SET ARCHIVELOG DESTINATION TO '/oracle/temp_restore'; RESTORE ARCHIVELOG ALL; # restore and recover datafiles as needed . . .}
The following example sets the archived log destination and
then uses RECOVER DATABASE to restore archived logs from this destination automatically:

RUN{ SET ARCHIVELOG DESTINATION TO
Restoring Archived Redo Logs to Multiple Locations
You can specify restore destinations for archived logs multiple times in one RUN block, in order to distribute restored logs among several destinations. (You cannot, however specify multiple destinations simultaneously to produce multiple copies of the same log during the restore operation.) You can use this feature to manage disk space used to contain the restored logs.

This example restores 300 archived redo logs from backup, distributing them across the directories /fs1/tmp, /fs2/tmp, and /fs3/tmp:

RUN
# Set a new location for logs 1 through 100. SET ARCHIVELOG DESTINATION TO '/fs1/tmp'; RESTORE ARCHIVELOG FROM SEQUENCE 1 UNTIL SEQUENCE 100; # Set a new location for logs 101 through 200. SET ARCHIVELOG DESTINATION TO '/fs2/tmp'; RESTORE ARCHIVELOG FROM SEQUENCE 101 UNTIL SEQUENCE 200; # Set a new location for logs 201 through 300. SET ARCHIVELOG DESTINATION TO '/fs3/tmp'; RESTORE ARCHIVELOG FROM SEQUENCE 201 UNTIL SEQUENCE 300; # restore and recover datafiles as needed . . .}When you issue a RECOVER command, RMAN finds the needed restored archived logs automatically across the destinations to which they were restored, and applies them to the datafiles.

Performing Complete Database Recovery
This section assumes that you have already performed the tasks in "Preparing for Complete Database Recovery". This section describes the basic outline of complete database recovery, which is intended to encompass a wide range of different scenarios.
About Complete Database Recovery
You use the RESTORE and RECOVER commands to restore and recover the database. During the recovery, RMAN automatically restores backups of any needed archived redo logs. If backups are stored on a media manager, then channels must be configured in advance or a RUN block with ALLOCATE CHANNEL commands must be used to enable access to backups stored there.

If RMAN restores archived redo logs to the flash recovery area during a recovery, then it automatically deletes the restored logs after they applying them to the datafiles. Otherwise, you can use the DELETE ARCHIVELOG command to delete restored archived redo logs from disk when they are no longer needed for recovery. For example, you can enter the following command:


'/oracle/temp_restore';
RESTORE DATABASE; RECOVER DATABASE; # restores and recovers logs automatically}

ICAS4107B Manage resolution of system faults on a live system


Analyse complex system problems
This unit will introduce a variety of faults and problems those are likely to encounter at some point in your career. You will learn to gather data, assess the status of a computer system, identify different types of faults, classify them, and choose the most appropriate course of action and tools. You will also learn how to document the fault-finding process.

Outcomes for this unit
After completing this learning pack you will be able to:
.Organise the collected data to enable an understanding of the status of the system
.Analyse the data to determine that there is a problem, and the nature of the problem
Activity 1: Fault Types
This activity will require you to think about a series of faults and classify them, according to their nature. Your task is to complete the chart below
Table: Fault classification