Recovering from RMAN Errors
The options for manually completing an RMAN duplication task vary depending on which phase of the duplication failed. The following examples cover Oracle 9i and 10g. Manual completion steps are outlined per phase, and most likely you’ll only need to execute a couple of them, depending on where the failure occurred.
Restore Failure
If you receive an error during the RMAN restore of the database, you need to determine what caused the problem and fix it. If very few files have been restored, it may be easier to just start over and rerun the task from the beginning. If the duplication failed after running for a long time and you would rather not start from the beginning (especially if it takes hours), you can try to recover manually and complete the process.
A likely cause, if you are going from one server to another, is missing files or, much more rarely, a bad block in a file required for the restore. You need to address whatever caused the file(s) to be missing from the restore location.
Remember that to successfully duplicate a database using RMAN’s DUPLICATE feature, ALL files required to restore the database must be present on the remote server in exactly the same location, unless you catalog files that are intentionally located in another directory. RMAN will not even start the restore process if it can’t find the backupsets in the expected location. One reason RMAN may be interrupted in this phase is that, during the copy operation, you ran out of disk space in the filesystem where you were depositing the backupsets, the current controlfile backup and the spfile backup, so RMAN can’t locate one of the backupsets or archivelog files.
$ export ORACLE_SID= (name of database cloning to)
$ rman target /
RMAN> run {
set until scn xxxxxxxx;
restore current controlfile from 'restore directory';
alter database mount;
set newname for datafile 1 to '';
set newname for datafile 2 to '';
….
restore datafile i,i2,….;
}
You need to identify the SCN from the log of the failed RMAN duplicate, and you must use SET NEWNAME for each datafile that remains to be restored, because DB_FILE_NAME_CONVERT will not work with a normal restore.
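As a rough illustration of the two tasks just described, the sketch below pulls the SCN out of a duplicate log and prints a SET NEWNAME line for each datafile still to be restored. It assumes the log echoes a line of the form "set until scn N;" as part of the script RMAN ran. The sample log text, the datafile numbers and the destination paths are hypothetical, and the generated block covers only the SET NEWNAME and RESTORE DATAFILE part of the run block.

```python
import re


def find_until_scn(log_text):
    """Return the SCN from the first 'set until scn N;' line, or None."""
    match = re.search(r"set\s+until\s+scn\s+(\d+)\s*;", log_text, re.IGNORECASE)
    return int(match.group(1)) if match else None


def build_restore_block(scn, newnames):
    """Build an RMAN run block for the datafiles still to be restored.

    newnames maps datafile number -> destination path (hypothetical values).
    """
    lines = ["run {", f"set until scn {scn};"]
    for file_no in sorted(newnames):
        lines.append(f"set newname for datafile {file_no} to '{newnames[file_no]}';")
    file_list = ",".join(str(n) for n in sorted(newnames))
    lines.append(f"restore datafile {file_list};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical log excerpt and paths, for illustration only.
sample_log = """
contents of Memory Script:
{
   set until scn  1234567;
   set newname for datafile  1 to "/u02/oradata/aux/system01.dbf";
}
"""

scn = find_until_scn(sample_log)
remaining = {
    3: "/u02/oradata/aux/undotbs01.dbf",
    4: "/u02/oradata/aux/users01.dbf",
}
print(build_restore_block(scn, remaining))
```

Review the generated text by hand before pasting it into an RMAN session, since a wrong path here restores a datafile to the wrong place.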
For Oracle 10g+ it really is best to start over if RMAN failed during the first phase or restore step. Any files that have already been restored will be skipped and the duplicate process can be restarted without manual intervention.
Phase 2 Failure
This is the controlfile creation, or switch of the datafile names, after the datafiles have been restored. Review the log files, identify the problem, and make a list of all the datafiles that have not been switched over. You can then attempt to complete this step manually by renaming each datafile if the auxiliary instance uses a different directory structure or ASM Disk Group than the target.
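One way to turn that list of unswitched datafiles into rename commands is sketched below. It assumes the rename is done with ALTER DATABASE RENAME FILE against the mounted auxiliary instance, and that the only difference between target and auxiliary is a directory prefix. The function name and all paths are hypothetical.

```python
def rename_statements(datafiles, old_prefix, new_prefix):
    """Return one ALTER DATABASE RENAME FILE statement per matching datafile.

    Files that do not start with old_prefix are skipped, so they can be
    handled by hand.
    """
    statements = []
    for path in datafiles:
        if path.startswith(old_prefix):
            new_path = new_prefix + path[len(old_prefix):]
            statements.append(
                f"alter database rename file '{path}' to '{new_path}';"
            )
    return statements


# Hypothetical list of datafiles that were not switched over.
not_switched = [
    "/u01/oradata/prod/users01.dbf",
    "/u01/oradata/prod/tools01.dbf",
]
for stmt in rename_statements(not_switched, "/u01/oradata/prod/", "/u02/oradata/aux/"):
    print(stmt)
```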
After the rename (switch) of all the datafiles that need to be renamed:
CREATE CONTROLFILE REUSE SET DATABASE RESETLOGS ARCHIVELOG…
SQL> alter database backup controlfile to trace;
Phase 3 Failure
This is a failure during the recovery of the restored datafiles, the next phase, in which each datafile is recovered to either a point in time or an SCN. Determine the cause from the log file, fix the problem, and then continue:
$ rman target / auxiliary sys/@
RMAN> run {
set until scn xxxxxxx;
recover clone database;
alter clone database open resetlogs;
}
Get the ‘UNTIL SCN’ value from the duplicate log file and connect to the target. At 10g, archivelogs are restored automatically, into the Flash Recovery Area if one is defined. After completing recovery, change the Database Identifier (DBID) using the NID utility on Windows:
$ nid target=sys/oracle
DBNEWID …….…..
Change database ID of database AUX? (Y/N)=>Y
The manual duplication process should now be complete, and you can jump down to Final Steps.
Phase 4 Failure
This is the controlfile recreation phase. Check the RMAN duplicate log and identify the reason the recovery didn’t complete. Look for:
media recovery complete
Finished recover at
Figure out what the problem is, fix the cause, and then execute the following:
CREATE CONTROLFILE REUSE SET DATABASE "AUX" RESETLOGS ARCHIVELOG…
Make sure ALL the files listed in the DATAFILE section of the RMAN duplicate log have been restored.
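The log check described for this phase can be sketched as a small helper that reports which of the two recovery messages are absent from the duplicate log. The marker strings come from the text above. The function name and the sample log lines are hypothetical.

```python
RECOVERY_MARKERS = ("media recovery complete", "Finished recover at")


def missing_recovery_markers(log_text):
    """Return the recovery markers that do not appear in the log text."""
    lowered = log_text.lower()
    return [m for m in RECOVERY_MARKERS if m.lower() not in lowered]


# Hypothetical log excerpt in which recovery never finished.
sample_log = "Starting recover at 01-JAN-10\nstarting media recovery\n"
for marker in missing_recovery_markers(sample_log):
    print("not found in log:", marker)
```

An empty result suggests recovery finished and the failure lies later. It does not prove that recovery was successful.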
Phase 5 Failure
This failure would be in the phase that opens the database with resetlogs. Check the log file again and fix the problem. Look for a line of the form “Thread x closed at log sequence y”.
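A minimal sketch of that search is shown below, assuming the message appears on one line in the form quoted above, with numeric thread and sequence values. The function name and the sample log text are hypothetical.

```python
import re

THREAD_CLOSED = re.compile(
    r"Thread\s+(\d+)\s+closed\s+at\s+log\s+sequence\s+(\d+)", re.IGNORECASE
)


def closed_threads(log_text):
    """Return (thread, sequence) pairs for every 'Thread x closed' message."""
    return [(int(t), int(s)) for t, s in THREAD_CLOSED.findall(log_text)]


# Hypothetical log excerpt, for illustration only.
sample_log = "Thread 1 closed at log sequence 42\n"
print(closed_threads(sample_log))
```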
If the resetlogs completed, determine what caused the error, fix the problem and restart the auxiliary instance. If the resetlogs did not complete successfully, determine what caused that problem and then open the clone database with resetlogs using RMAN (you can’t use SQL*Plus for this step), connecting to the target database first:
$ rman target / auxiliary sys/oracle@
RMAN> alter clone database open resetlogs;
If the duplication process failed only in phase 5, you are done and no further action is required, because the DBID will have already been changed. Otherwise, execute the NID command to change the DBID (Windows).
Final Steps
$ rman target / auxiliary sys/oracle@
RMAN> alter clone database open resetlogs;
Add any missing temp files to the newly cloned auxiliary database. Files that were manually restored to the auxiliary instance will be cataloged as datafile copies. Connect to the original target and execute:
RMAN> list copy of database;
RMAN> crosscheck copy of datafile ;
RMAN> delete expired copy of datafile ;