DB2 System Backup Recovery ** Data Sharing Considerations **
DB2 for z/OS V8 provides enhanced backup and recovery capabilities at the DB2 subsystem or data sharing group level. The purpose is to provide an easier, less disruptive way to take fast volume-level backups of an entire DB2 subsystem or data sharing group, and to recover the subsystem or data sharing group to an arbitrary point in time.
RECOVER TO CURRENT – SYSPITR LRSN
The arbitrary point in time (LRSN) chosen for recovery is specified by adding a conditional restart control record (CRCR) with the SYSPITR option to the BSDS.
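The CRCR is normally added with the DSNJU003 (change log inventory) utility. As a minimal sketch, assuming a V8-style 6-byte LRSN written as 12 hexadecimal digits, the hypothetical helper below formats the CRESTART control statement you would feed to DSNJU003 for each member's BSDS. The LRSN in the usage line is an illustrative value, not a real log point.

```python
def crcr_syspitr_statement(lrsn: int) -> str:
    """Format a DSNJU003 CRESTART statement that sets the SYSPITR log truncation point.

    Hypothetical helper. Assumes a 6-byte LRSN, written as 12 uppercase hex digits.
    """
    if not 0 <= lrsn < 1 << 48:
        raise ValueError("LRSN must fit in 6 bytes (12 hex digits)")
    return f"CRESTART CREATE,SYSPITR={lrsn:012X}"


# Illustrative LRSN only.
print(crcr_syspitr_statement(0xBA1D2C3E4F50))
# prints: CRESTART CREATE,SYSPITR=BA1D2C3E4F50
```

Zero-padding matters: a short LRSN such as `0x1` is emitted as `000000000001`, so the statement always carries the full 12-digit value.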
With the current design and implementation of RESTORE SYSTEM, you *CANNOT* recover to the absolute current point in time in a data sharing group; that is, you cannot recover to the very end of the merged log stream across members. You can only recover as far forward as MIN(last log LRSN across the *active* members).
You can set the SYSPITR LRSN to any value greater than the LRSN at which the backup was taken, and less than or equal to MIN(last log LRSN across the *active* members). Note that the value is inclusive: if a log record exists with that exact LRSN, it is kept, and DB2 truncates the log immediately after that log record.
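The range rule and the inclusive truncation can be sketched as follows. This is a simplified model, not DB2 code: LRSNs are plain integers, and the member names and values are hypothetical.

```python
def syspitr_ceiling(last_log_lrsn_by_member: dict[str, int]) -> int:
    # Furthest point you can recover to: MIN of the active members' last log LRSN.
    return min(last_log_lrsn_by_member.values())


def is_valid_syspitr(candidate: int, backup_lrsn: int,
                     last_log_lrsn_by_member: dict[str, int]) -> bool:
    # Strictly after the backup, and at or below the ceiling (inclusive).
    return backup_lrsn < candidate <= syspitr_ceiling(last_log_lrsn_by_member)


def truncate_log(log_lrsns: list[int], syspitr: int) -> list[int]:
    # Inclusive truncation: a record whose LRSN equals SYSPITR is kept.
    return [lrsn for lrsn in log_lrsns if lrsn <= syspitr]


active = {"DB1A": 0x1500, "DB2A": 0x1480, "DB3A": 0x14F0}
print(hex(syspitr_ceiling(active)))                # 0x1480
print(is_valid_syspitr(0x1480, 0x1000, active))    # True
print(is_valid_syspitr(0x1481, 0x1000, active))    # False
print([hex(x) for x in truncate_log([0x1470, 0x1480, 0x1490], 0x1480)])  # ['0x1470', '0x1480']
```

Here DB2A has the lowest last log LRSN, so it alone determines how far forward the whole group can be recovered.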
ACTIVE MEMBERS
At an absolute minimum, the logs must be truncated to the same SYSPITR LRSN for (1) all members that were up and running at the SYSPITR log truncation point, and (2) all members that were restarted after the SYSPITR log truncation point.
You cannot pick and choose truncation points per member. You must use the same common point in time (LRSN) so that the RESTORE SYSTEM utility recovers the whole system to a consistent point.
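The selection rule above can be sketched as a small planning function. It assumes a hypothetical per-member history of up periods expressed as LRSN ranges; real member histories would have to be reconstructed from the BSDS and logs. A member that was down throughout, such as one quiesced long before and never restarted, is not selected, and every selected member receives the same LRSN.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Member:
    name: str
    # Hypothetical history: one (start_lrsn, stop_lrsn) pair per up period;
    # stop_lrsn is None while the member is still up.
    up_periods: list[tuple[int, Optional[int]]] = field(default_factory=list)


def needs_syspitr(member: Member, pit: int) -> bool:
    for start, stop in member.up_periods:
        running_at_pit = start <= pit and (stop is None or stop >= pit)
        restarted_after_pit = start > pit
        if running_at_pit or restarted_after_pit:
            return True
    return False


def crcr_plan(members: list[Member], pit: int) -> dict[str, int]:
    # Every selected member gets the SAME LRSN; per-member values are not allowed.
    return {m.name: pit for m in members if needs_syspitr(m, pit)}


members = [
    Member("DB1A", [(0x0100, None)]),                    # up at the PIT
    Member("DB2A", [(0x0100, 0x1800), (0x2100, None)]),  # restarted after the PIT
    Member("DB9A", [(0x0010, 0x0080)]),                  # down throughout
]
print({name: hex(lrsn) for name, lrsn in crcr_plan(members, 0x2000).items()})
# prints: {'DB1A': '0x2000', 'DB2A': '0x2000'}
```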
DORMANT MEMBER
A *dormant* quiesced member is one that you never intend to bring up again. There is no need to add a CRCR SYSPITR LRSN to the BSDS of a dormant member, provided it was quiesced for the very last time before the BACKUP SYSTEM was taken.
Note: If you ever want to bring a dormant (sleeping, never planned to be active) member back to life but have lost its BSDS and/or active logs, the process is to re-initialise the BSDS and logs. At the first DB2 restart with the new BSDS and logs, the member begins logging at RBA 0.
EMERGENCY/AUXILIARY (NON-DORMANT) MEMBER
Some customers have an emergency/auxiliary (non-dormant) member that is brought up and down occasionally. This is no longer a good idea; it can be a very bad one. If an auxiliary member was *active* after BACKUP SYSTEM was taken and was shut down before the SYSPITR LRSN, and a CRCR SYSPITR LRSN was added to the BSDS of that member, then that member's log end LRSN comes into play. Because that value is fixed, it limits how far forward you can go on the merged log stream towards MIN(log end LRSN across the *active* members).
You should keep all non-dormant members up after taking the system-level backup; DB2 then writes "heartbeat" log records that keep the log end LRSN of each member moving forward continuously.
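The effect of an auxiliary member on the recovery ceiling can be illustrated with a simplified model: every member whose BSDS carries the CRCR contributes its log end LRSN, and the lowest value wins. The member names and LRSN values are hypothetical.

```python
def recoverable_ceiling(active_log_end: dict[str, int],
                        crcr_inactive_log_end: dict[str, int]) -> int:
    # Each member whose BSDS carries the CRCR contributes its log end LRSN.
    # Active members keep advancing (heartbeat records); a member that was
    # shut down keeps the value it had at shutdown.
    return min([*active_log_end.values(), *crcr_inactive_log_end.values()])


active = {"DB1A": 0x9000, "DB2A": 0x8F00}
auxiliary = {"DB3A": 0x5000}  # hypothetical: up after the backup, down before the PIT

print(hex(recoverable_ceiling(active, {})))         # 0x8f00
print(hex(recoverable_ceiling(active, auxiliary)))  # 0x5000
```

Had DB3A stayed up, its heartbeat records would have kept its log end LRSN moving forward with the others, and the ceiling would not be pinned at its shutdown point.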
FAILED MEMBER
A failed member that has remained down in a failed state since before the PITR point must be restarted normally and then stopped before you perform the SYSPITR LRSN truncations. No customer shop should leave a failed member down. Shops should use ARM (Automatic Restart Manager) and/or System Automation with DB2 RESTART(LIGHT) to detect a failed member and restart it as soon as possible.
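The required ordering for failed members can be sketched as a simple plan builder. The member states and step wording are hypothetical; the point is only that every failed member is restarted normally and stopped before any CRCR is added.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    STOPPED = "stopped"
    FAILED = "failed"


def pre_truncation_steps(states: dict[str, State]) -> list[str]:
    """Order the work so that no member is still FAILED when the CRCRs are added."""
    steps = []
    for name, state in sorted(states.items()):
        if state is State.FAILED:
            # Normal restart, as the text requires, followed by a clean stop.
            steps.append(f"{name}: restart normally")
            steps.append(f"{name}: stop")
    steps.append("add CRCR SYSPITR LRSN to the members that need it")
    return steps


for step in pre_truncation_steps({"DB1A": State.ACTIVE, "DB2A": State.FAILED}):
    print(step)
```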
DB2 STRUCTURES
You *MUST* force away all DB2 structures and their associated connections before attempting to restart DB2 and run the RESTORE SYSTEM utility. Otherwise, data corruption can result.
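A minimal sketch of the force-away step, assuming the standard DB2 coupling facility structure names (groupname_LOCK1, groupname_SCA, groupname_GBPn) and the z/OS `SETXCF FORCE` command. The helper and the group name `DB0G` are hypothetical; verify the command syntax against your z/OS system commands reference, and check the actual structure list (for example with `D XCF,STR`) before issuing anything.

```python
def force_commands(group: str, gbp_names: list[str]) -> list[str]:
    """Hypothetical helper: SETXCF FORCE commands for a DB2 group's CF structures.

    Connections are forced first, then the structures themselves.
    """
    structures = [f"{group}_LOCK1", f"{group}_SCA"]
    structures += [f"{group}_{gbp}" for gbp in gbp_names]
    commands = [f"SETXCF FORCE,CONNECTION,STRNAME={s},CONNAME=ALL" for s in structures]
    commands += [f"SETXCF FORCE,STRUCTURE,STRNAME={s}" for s in structures]
    return commands


for cmd in force_commands("DB0G", ["GBP0", "GBP1"]):
    print(cmd)
```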
TIME-BASED CHECKPOINTING
It is strongly recommended to use time-based checkpointing to keep the RBLP (recovery base log point) for the whole data sharing group moving forward.
The RBLP is tied to the minimum of the system checkpoint LRSNs of all members and the GBP checkpoint. Without regular checkpoints, the RBLP could lag far behind, and log apply processing would then have to start from that old point.
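The dependency can be illustrated with a simplified model of the rule as stated above; the member names and LRSN values are hypothetical.

```python
def rblp(member_checkpoint_lrsn: dict[str, int], gbp_checkpoint_lrsn: int) -> int:
    # Simplified model: RBLP = MIN(system checkpoint LRSN of every member, GBP checkpoint).
    return min([*member_checkpoint_lrsn.values(), gbp_checkpoint_lrsn])


checkpoints = {"DB1A": 0x9000, "DB2A": 0x2000}  # DB2A has not checkpointed for a long time
print(hex(rblp(checkpoints, 0x8800)))  # 0x2000: log apply would start far back

checkpoints["DB2A"] = 0x8F00  # after a time-based checkpoint on DB2A
print(hex(rblp(checkpoints, 0x8800)))  # 0x8800
```

A single member with an old checkpoint holds back the RBLP for the entire group, which is why regular time-based checkpoints on every member matter.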
QUIESCE – SET LOG SUSPEND
If you know in advance the point in time you may want to recover to (e.g. the beginning of a batch window), you can run the QUIESCE utility against a dummy table space (or any table space) to register the LRSN at that moment. As an alternative, you could use SET LOG SUSPEND to generate an LRSN value. However, neither is an essential ingredient for system-level point-in-time recovery.
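To sanity-check that a registered LRSN falls where you expect (for example, at the start of the batch window), it can help to decode it into an approximate timestamp. This sketch assumes a 6-byte LRSN taken from the high-order 6 bytes of the 8-byte TOD clock, so one LRSN unit is 16 microseconds counted from 1900-01-01. It ignores leap seconds and any installation-specific clock adjustments, so treat the result as an approximation only.

```python
from datetime import datetime, timedelta

TOD_EPOCH = datetime(1900, 1, 1)  # TOD clock epoch


def lrsn_to_utc(lrsn: int) -> datetime:
    # 6-byte LRSN = high-order 6 bytes of the TOD clock; 1 unit = 16 microseconds.
    # Approximation: leap seconds and clock adjustments are ignored.
    return TOD_EPOCH + timedelta(microseconds=lrsn * 16)


print(lrsn_to_utc(5_400_000_000))  # 1900-01-02 00:00:00 (exactly one day of LRSN units)
```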
PPRC/XRC
RESTORE SYSTEM will fail if at least one of the target volumes is a primary or secondary volume in an XRC or PPRC volume pair.
OTHER IMPORTANT CONSIDERATIONS
Recovering to a prior point in time means losing any data committed after the SYSPITR LRSN.
Authors:
John J. Campbell
Distinguished Engineer
DB2 UDB for z/OS Development
IBM Software Group - Information Management
Florence Dubois
EMEA SWAT Team - DB2 for z/OS Development
IBM Software Group - Information Management
Michael Dewert
EMEA SWAT Team - DB2 for z/OS Development
IBM Software Group - Information Management