Oracle Data Guard - Keeping your data safe !

mail

What if a standby database is open ?

Situation

Albeit following the start order, the standby database doesn't work as expected :

Details

Solution

  1. As oracle, stop the standby database in SQL*Plus :
    shutdown
    ORA-16175: cannot shut down database when media recovery is active
    shutdown immediate
    Database closed.
    Database dismounted.
    ORACLE instance shut down.
  2. Then, properly restart the standby database.
  3. Re-check the standby database status, as oracle :
    echo 'select status from v$instance;' | sqlplus / as sysdba
    STATUS
    ------------------------------------
    MOUNTED
    This is much better
  4. Check the standby database configuration :
    show configuration
    Configuration - ORACLE_SIDDR
    
      Protection Mode: MaxAvailability
      Databases:
        ORACLE_SID_PRIMARYDATACENTER - Primary database
          Warning: ORA-16817: unsynchronized fast-start failover configuration
    
        ORACLE_SID_STANDBYDATACENTER - (*) Physical standby database
          Warning: ORA-16817: unsynchronized fast-start failover configuration
    
    Fast-Start Failover: ENABLED
    
    Configuration Status:
    WARNING
    Looks like a double ORA-16817 error while FSFO is enabled means an automatic re-synchronization is on going.
  5. After a while, check the standby database configuration again :
    show configuration
    Configuration - ORACLE_SIDDR
    
      Protection Mode: MaxAvailability
      Databases:
        ORACLE_SID_PRIMARYDATACENTER - Primary database
        ORACLE_SID_STANDBYDATACENTER - (*) Physical standby database
    
    Fast-Start Failover: ENABLED
    
    Configuration Status:
    SUCCESS
mail

How to change the sys password of a database running Data Guard and FSFO ?

this is a DRAFT !

  1. disable FSFO
  2. disable Data Guard
  3. on the primary, change the sys password
  4. copy the orapwORACLE_SID password file from the primary to the standby database
  5. restart the standby database (shutdown + startup as standby)
  6. enable Data Guard (see also)
  7. create a new FSFO wallet
  8. update the wallet on the observer side
  9. make sure it's OK with switchover/failover tests (optional ?)
1. Arrêt de la primaire
2. Arrêt de la standby
3. re créer le mdp
4. copier le mdp vers standby
5. démarrer la primaire
6. démarrer la standby
7. re créer le wallet


1. désactiver FSFO
2. changer le mdp sys sur le primaire
3. redémarrer primaire pour prise en compte nouveau mdp
4. copier le fichier orapw vers le standby
5. redémarrer standby pour prise en compte nouveau mdp
6. créer nouveau wallet
7. mettre à jour le wallet côté observer
8. réactiver FSFO
mail

How to disable FSFO ?

FSFO is designed so that it automatically "fails over" from the primary to the standby server should the primary server become unavailable for any reason. However, if we're planning maintenance or any action on our Oracle servers involving the primary server to stop working as expected, FSFO will do its job and engage the failover procedure. This is why FSFO needs to be disabled prior to any maintenance action.

All actions below are executed as oracle on the primary server. There's nothing special to do on the standby server.

Don't forget to : export ORACLE_SID=ORACLE_SID

Step-by-step version :

  1. dgmgrl
  2. connect /
    Connected.
  3. show configuration
    Configuration - ORACLE_SIDDR
    
      Protection Mode: MaxAvailability
      Databases:
        ORACLE_SID_PRIMARYDATACENTER - Primary database
        ORACLE_SID_STANDBYDATACENTER - (*) Physical standby database
    
    Fast-Start Failover: ENABLED
    
    Configuration Status:
    SUCCESS
  4. disable fast_start failover;
    Disabled.
  5. show configuration
    Configuration - ORACLE_SIDDR
    
      Protection Mode: MaxAvailability
      Databases:
        ORACLE_SID_PRIMARYDATACENTER - Primary database
        ORACLE_SID_STANDBYDATACENTER - Physical standby database
    
    Fast-Start Failover: DISABLED
    
    Configuration Status:
    SUCCESS

Big-bang version :

dgmgrl / << EOC
show configuration;
disable fast_start failover;
show configuration;
EOC
Connected.
DGMGRL>
Configuration - ORACLE_SIDDR

  Protection Mode: MaxAvailability
  Databases:
    ORACLE_SID_PRIMARYDATACENTER - Primary database
    ORACLE_SID_STANDBYDATACENTER - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS

DGMGRL> Disabled.
DGMGRL>
Configuration - ORACLE_SIDDR

  Protection Mode: MaxAvailability
  Databases:
    ORACLE_SID_PRIMARYDATACENTER - Primary database
    ORACLE_SID_STANDBYDATACENTER - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

When the maintenance is over, don't forget to re-enable FSFO.

mail

How to view the Data Guard configuration ?

  1. export ORACLE_SID=MYORACLESID
  2. dgmgrl
  3. connect /
  4. show configuration
mail

How to debug primary to standby replication ?

Here are some points you should check if replication doesn't work :
  1. From the primary :

    tnsping standbyServiceName

    should return :
    TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 19-NOV-2014 17:20:00
    Copyright (c) 1997, 2011, Oracle.  All rights reserved.
    Used parameter files:
    Used TNSNAMES adapter to resolve the alias
    Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = standby.example.com)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = serviceName)))
    OK (0 msec)
    Like for any network problem, if tnsping can't reach the standby, no need to search further for the moment.
  2. Make sure the Oracle password files match between the primary and standby servers. Run this on both hosts :
    • ls -l $ORACLE_HOME/dbs/orapw$ORACLE_SID
    • paranoid mode :
      md5sum $ORACLE_HOME/dbs/orapw$ORACLE_SID
    This is likely to happen after editing ORACLE_HOME, which is done during database upgrades.
  3. Make sure the Data Guard broker is started :
    export ORACLE_SID=ORACLE_SID; echo -e 'set linesize 200\nshow parameter dg_broker_start' | sqlplus -s / as sysdba; done
    NAME               TYPE       VALUE
    ------------------ ---------- --------
    dg_broker_start    boolean    TRUE
mail

How to get status / enable / disable the synchronization process ?

Check status, on primary :
column name format a25;
column value format a10;
SELECT name, type, value FROM v$parameter WHERE name=LOWER('log_archive_dest_state_2');
will return :
NAME                        TYPE    VALUE
--------------------------- ------- -------------
log_archive_dest_state_2    2       resultValue
With resultValue being :
  • enable : synchronization is active
  • defer : no synchronization
Enable sync from primary :
ALTER system SET log_archive_dest_state_2='enable' scope=both;
will return :
System modified.
Checking the status should say enable.
Disable sync from primary :
ALTER system SET log_archive_dest_state_2='defer' scope=both;
will return :
System modified.
Checking the status should say defer.
mail

How to check the primary and standby databases are synchronized ?

When configuring / altering Data Guard, the settings may be correct while the checks described below fail. It takes a few minutes for these settings to be applied.

All commands below are run as oracle and after doing : export ORACLE_SID=ORACLE_SID

  1. On the server having the standby role (standbyServer in "normal" situation, primaryServer after a switchover / failover), make sure there is no gap in archivelogs :
    echo 'select * from v$archive_gap;' | sqlplus -s / as sysdba
    This MUST return :
    no rows selected
    Otherwise, databases are NOT in sync, no matter what commands below return (including matching sequence numbers).
  2. On both servers, get the sequence numbers :
    echo 'select thread#, sequence#, archived, applied from v$archived_log where recid in (select max(recid) from v$archived_log);' | sqlplus -s '/ as sysdba'
       THREAD#  SEQUENCE# ARCHIVED  APPLIED
    ---------- ---------- --------- ---------------------------
             1         41 YES       NO
    The values of SEQUENCE# must match between servers.
  3. On the server having the primary role (primaryServer in "normal" situation, standbyServer after a switchover / failover) :
    echo 'alter system switch logfile;' | sqlplus -s '/as sysdba'
    System altered.
  4. Then, on both servers, get the sequence number again. It must return the same incremented SEQUENCE# :
       THREAD#  SEQUENCE# ARCHIVED  APPLIED
    ---------- ---------- --------- ---------------------------
             1         42 YES       NO
mail

How to check the primary and standby databases are synchronized ? (alternate methods)

Reasons explaining why both servers are not in sync :

Sequence numbers are constantly incrementing, so, to get good results, you have to execute both queries below within the same second.

  1. On the primary database :

    select to_char(CURRENT_SCN) from v$database;

    TO_CHAR(CURRENT_SCN)
    --------------------------------------------------------------------------------
    711171630
  2. On the standby database :

    select to_char(FIRST_CHANGE#), to_char(LAST_CHANGE#), FIRST_TIME, LAST_TIME from V$STANDBY_LOG where status='ACTIVE';

    TO_CHAR(FIRST_CHANGE#)    TO_CHAR(LAST_CHANGE#)    FIRST_TIME    LAST_TIME
    ------------------------- ------------------------ ------------- ------------
    711170565                 711171630               04-JUL-16     04-JUL-16

We get identical sequence numbers, showing that the MRP0 process is running fine.

Check replication status between primary and standby :

All commands below are executed as Oracle in SQL*Plus : export ORACLE_SID=ORACLE_SID; sqlplus / as sysdba

PART 1/2 : check sequence numbers :

  1. Get the RESETLOGS_CHANGE# from the primary database :

    SELECT RESETLOGS_CHANGE# FROM V$DATABASE_INCARNATION WHERE STATUS = 'CURRENT';

    RESETLOGS_CHANGE#
    -----------------
             697696214
  2. On the target physical standby database, identify any active SRL's, then the maximum applied sequence number(s) :

    SELECT GROUP#, THREAD#, SEQUENCE# FROM V$STANDBY_LOG WHERE STATUS = 'ACTIVE' ORDER BY THREAD#, SEQUENCE#;
    SELECT THREAD#, MAX(SEQUENCE#) FROM V$LOG_HISTORY WHERE RESETLOGS_CHANGE# = 697696214 GROUP BY THREAD#;

        GROUP#    THREAD#  SEQUENCE#
    ---------- ---------- ----------
             7          1       1416
    
       THREAD# MAX(SEQUENCE#)
    ---------- --------------
             1           1415
    (When DB sync is defer (i.e. disabled), status is UNASSIGNED.)

    The value of MAX(SEQUENCE#) should be close (difference < 3) to the value of SEQUENCE#.

PART 2/2 : confirm archive logs are being shipped and applied on standby :

  1. Verify that the primary has archived a log for the thread(s). On the primary database issue the following :

    SELECT THREAD#, MAX(SEQUENCE#) FROM V$LOG_HISTORY WHERE RESETLOGS_CHANGE#=(SELECT RESETLOGS_CHANGE# FROM V$DATABASE_INCARNATION WHERE STATUS = 'CURRENT') GROUP BY THREAD#;

       THREAD# MAX(SEQUENCE#)
    ---------- --------------
             1           1415
  2. Verify that the last SEQUENCE# received and the last SEQUENCE# applied to standby database match.

    SELECT al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied" from (SELECT thread# thrd, max(sequence#) almax from v$archived_log where resetlogs_change#=(SELECT resetlogs_change# from v$database) group by thread#) al, (SELECT thread# thrd, max(sequence#) lhmax from v$log_history where first_time=(SELECT max(first_time) from v$log_history) group by thread#) lh where al.thrd = lh.thrd;

        Thread Last Seq Received Last Seq Applied
    ---------- ----------------- ----------------
            1              1415             1415
    ==> MATCH !!

OTHER METHOD (A) :

  1. STEP 1 :
    1. On the primary server, as oracle, run :
      export ORACLE_SID=MYORACLESID; sqlplus '/ as sysdba' << EOSQL
      SELECT THREAD#, MAX(SEQUENCE#) FROM V\$ARCHIVED_LOG group by thread#;
      EOSQL
      Which may return :
         THREAD# MAX(SEQUENCE#)
      ---------- --------------
              1           4560
    2. On the standby server, as oracle, run :
      export ORACLE_SID=MYORACLESID; sqlplus '/ as sysdba' << EOSQL
      SELECT THREAD#, SEQUENCE#, ARCHIVED, APPLIED
      FROM V\$ARCHIVED_LOG
      WHERE SEQUENCE# IN (SELECT MAX(SEQUENCE#) FROM V\$ARCHIVED_LOG);
      column process format a10;
      column status format a15;
      SELECT process, status, thread#, sequence#, delay_mins from V\$MANAGED_STANDBY;
      EOSQL
      Which may return :
      SQL>   2    3
         THREAD#  SEQUENCE# ARCHIVED  APPLIED
      ---------- ---------- --------- ---------------------------
               1        173 YES       YES
      
      SQL> SQL> SQL>
      PROCESS    STATUS             THREAD#  SEQUENCE# DELAY_MINS
      ---------- --------------- ---------- ---------- ----------
      ARCH       CLOSING                  1        171          0
      ARCH       CONNECTED                0          0          0
      ARCH       CLOSING                  1        172          0
      ARCH       CLOSING                  1        173          0
      MRP0       WAIT_FOR_LOG             1        174          0
      RFS        IDLE                     0          0          0
      RFS        IDLE                     0          0          0
      RFS        IDLE                     0          0          0
      RFS        IDLE                     1        174          0
      • ARCHIVED and APPLIED must be YES
      • Check the value of SEQUENCE#
      • There must be some ARCH, MRP0 and RFS processes.
  2. STEP 2 :
    1. On the primary server, as oracle, run :
      export ORACLE_SID=MYORACLESID; sqlplus '/ as sysdba' << EOSQL
      ALTER SYSTEM SWITCH LOGFILE;
      SELECT THREAD#, MAX(SEQUENCE#) FROM V\$ARCHIVED_LOG group by thread#;
      EOSQL
      Which may return :
      SQL>
      System altered.
      
      SQL>
         THREAD# MAX(SEQUENCE#)
      ---------- --------------
               1           4560
    2. On the standby server, still as oracle, run :
      export ORACLE_SID=MYORACLESID; sqlplus '/ as sysdba' << EOSQL
      SELECT THREAD#, SEQUENCE#, ARCHIVED, APPLIED
      FROM V\$ARCHIVED_LOG
      WHERE SEQUENCE# IN (SELECT MAX(SEQUENCE#) FROM V\$ARCHIVED_LOG);
      column process format a10;
      column status format a15;
      SELECT process,status,thread#,sequence#,delay_mins from V\$MANAGED_STANDBY;
      EOSQL
      Which may return :
      SQL>   2    3
         THREAD#  SEQUENCE# ARCHIVED  APPLIED
      ---------- ---------- --------- ---------------------------
               1        174 YES       YES
      
      SQL> SQL> SQL>
      PROCESS    STATUS             THREAD#  SEQUENCE# DELAY_MINS
      ---------- --------------- ---------- ---------- ----------
      ARCH       CLOSING                  1        174          0
      ARCH       CONNECTED                0          0          0
      ARCH       CLOSING                  1        172          0
      ARCH       CLOSING                  1        173          0
      MRP0       WAIT_FOR_LOG             1        175          0
      RFS        IDLE                     0          0          0
      RFS        IDLE                     0          0          0
      RFS        IDLE                     0          0          0
      RFS        IDLE                     1        175          0
      • ARCHIVED and APPLIED must be YES
      • There must be some ARCH, MRP0 and RFS processes.
      • The value of SEQUENCE# must have been incremented.
mail

How to re-synchronize a Data Guard replication ?

This article is still quite experimental and may be wrong/obsolete on several points. Consider it as a draft.
Some procedures described here are based on internal tools/scripts that may not be available.

  1. On the primary, if a full backup was made from production to preprod database server, change the sys password and copy the password file the standby :
    ALTER USER SYS IDENTIFIED BY "newPassword";
    scp /home/oracle/admin/ORACLE_SID/dbs/orapwORACLE_SID bob@standbyServer:/home/bob
  2. On both primary and standby, free some space in the database backup directory :
    find /backupDir/ORACLE_SID -maxdepth 1 -type f -delete
  3. On the primary, make a fresh backup, as root :
    /bin/su - oracle -c "/home/oracle/admgen/SavGen/SavGen.ksh ORACLE_SID"
  4. On the primary, send files to the standby :
    cd /backupDir/ORACLE_SID; find . -maxdepth 1 -type f | xargs tar cf ORACLE_SID.tar; scp ORACLE_SID.tar bob@standbyServer:/home/bob
  5. On the standby, retrieve the backup :
    cd /backupDir/ORACLE_SID; mv /home/bob/ORACLE_SID.tar .; tar xf ORACLE_SID.tar
  6. On the standby, the fun starts with SQL*Plus :
    export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
    sqlplus / as sysdba
    shutdown immediate
    startup nomount
    exit
  7. On the standby, let's continue with RMAN :
    rman target sys@ORACLE_SID_DATACENTERPRIMARY auxiliary /
    (then enter the new password)
  8. On the standby :
    duplicate target database for standby NOFILENAMECHECK dorecover;
    exit
  9. On the standby, back to SQL*Plus :
    sqlplus / as sysdba
    alter database open read only;
    shutdown immediate;
    startup nomount;
    alter database mount standby database;
    alter database recover managed standby database using current logfile disconnect;
  10. Then check the replication is OK