Oracle Fast-Start Failover - Step by step setup

mail

FSFO

Prerequisites :

Please read the preliminary notes, that are the same here than for settings up Data Guard.

Unless explicitly specified, all commands below are executed as oracle.

  1. Make sure there is no gap in archivelogs
  2. Check transmission of archivelogs

From asynchronous to synchronous mode (source) :

  • asynchronous mode : updates are committed on the primary server before the log file arrives on the standby server(s). This log transport mode is required for the Maximum Performance data protection mode.
  • synchronous mode : updates on the primary server will NOT be committed until the logs have been successfully transported to the standby server(s). This log transport mode is required for the Maximum Protection and Maximum Availability data protection modes.

On primaryServer :

Step-by-step version :

  1. dgmgrl
  2. connect /
    Connected.
  3. show database 'ORACLE_SID_PRIMARYDATACENTER' 'LogXptMode';
    	LogXptMode = 'ASYNC'
  4. show database 'ORACLE_SID_STANDBYDATACENTER' 'LogXptMode';
    	LogXptMode = 'ASYNC'
  5. edit database 'ORACLE_SID_PRIMARYDATACENTER' set property logxptmode=SYNC;
    Property "logxptmode" updated
  6. edit database 'ORACLE_SID_STANDBYDATACENTER' set property logxptmode=SYNC;
    Property "logxptmode" updated
  7. show database 'ORACLE_SID_PRIMARYDATACENTER' 'LogXptMode';
    	LogXptMode = 'sync'
  8. show database 'ORACLE_SID_STANDBYDATACENTER' 'LogXptMode';
    	LogXptMode = 'sync'
  9. edit configuration set protection mode as MAXAVAILABILITY;
    Succeeded.
  10. show configuration;
    Configuration - ORACLE_SIDDR
    
    	Protection Mode: MaxAvailability
    	Databases:
    		ORACLE_SID_PRIMARYDATACENTER - Primary database
    		ORACLE_SID_STANDBYDATACENTER - Physical standby database
    
    Fast-Start Failover: DISABLED
    
    Configuration Status:
    SUCCESS

Big-bang version :

dgmgrl / << EOC
show database '${ORACLE_SID}_PRIMARYDATACENTER' 'LogXptMode';
show database '${ORACLE_SID}_STANDBYDATACENTER' 'LogXptMode';
edit database '${ORACLE_SID}_PRIMARYDATACENTER' set property logxptmode=SYNC;
edit database '${ORACLE_SID}_STANDBYDATACENTER' set property logxptmode=SYNC;
show database '${ORACLE_SID}_PRIMARYDATACENTER' 'LogXptMode';
show database '${ORACLE_SID}_STANDBYDATACENTER' 'LogXptMode';
edit configuration set protection mode as MAXAVAILABILITY;
show configuration;
EOC
Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL>	LogXptMode = 'async'
DGMGRL>	LogXptMode = 'async'
DGMGRL> Property "logxptmode" updated
DGMGRL> Property "logxptmode" updated
DGMGRL>	LogXptMode = 'sync'
DGMGRL>	LogXptMode = 'sync'
DGMGRL> Succeeded.
DGMGRL>
Configuration - ORACLE_SIDDR

	Protection Mode: MaxAvailability
	Databases:
		ORACLE_SID_PRIMARYDATACENTER - Primary database
		ORACLE_SID_STANDBYDATACENTER - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Switchover :

We'll now test the switchover with these new parameters :

  1. Switchover the primary database from PRIMARYDATACENTER to STANDBYDATACENTER : follow this procedure.
  2. Back to initial state : follow this procedure.

The only difference you may notice from the screenshots of the procedures linked above is when running show configuration verbose. The Protection Mode is now :

	Protection Mode: MaxAvailability

The FlashBack filesystem :

Creation :

At this step, as root, we're supposed to :
lvcreate -n lv_orafra_oracle_sid -L 100g data_vg
mkfs -t ext4 /dev/data_vg/lv_orafra_oracle_sid
lvdisplay /dev/mapper/data_vg-lv_orafra_oracle_sid

mkdir /appli/oracle/ORACLE_SID/orafra
mount /dev/mapper/data_vg-lv_orafra_oracle_sid /appli/oracle/ORACLE_SID/orafra
chown -R oracle:oinstall /appli/oracle/ORACLE_SID/orafra
chmod 750 /appli/oracle/ORACLE_SID/orafra/

Then append to /etc/fstab :
/dev/data_vg/lv_orafra_oracle_sid /appli/oracle/ORACLE_SID/orafra ext4 defaults 1 2

NB : 100GB is an estimated size...

==> But this can be done more easily with the "createFs" tools.

don't forget to create the FlashBack filesystem on both servers !

Declaration :

As oracle :
sqlplus '/ as sysdba' << EOSQL
alter system set db_recovery_file_dest_size = 100G;
alter system set db_recovery_file_dest='/appli/oracle/$ORACLE_SID/orafra' scope=both;
alter system set db_flashback_retention_target=480 scope=both;
column type format a11;
show parameter db_recovery
show parameter db_flashback
EOSQL
System altered.

System altered.

System altered.


NAME                          TYPE            VALUE
----------------------------- --------------- ------------------------------
db_recovery_file_dest         string          /appli/oracle/ORACLE_SID/orafra
db_recovery_file_dest_size    big integer     100G


NAME                             TYPE       VALUE
-------------------------------- ---------- --------
db_flashback_retention_target    integer    480

don't forget to declare the FlashBack filesystem on both servers !

Activation :

  1. On primaryServer : echo 'alter database flashback on;' | sqlplus -s / as sysdba
    Database altered.
  2. On standbyServer :
    sqlplus -s '/ as sysdba' << EOSQL
    alter database recover managed standby database cancel;
    alter database flashback on;
    alter database recover managed standby database using current logfile disconnect from session;
    EOSQL
    Database altered.
    
    Database altered.
    
    Database altered.
  3. Then, on both servers :
    sqlplus -s '/ as sysdba' << EOSQL
    column name format a15;
    column flashback_on format a20;
    select name, flashback_on from v\$database;
    EOSQL
    NAME          FLASHBACK_ON
    ------------- ---------------
    ORACLE_SID    YES

Oracle trigger :

Creation :

On primaryServer only, as oracle :
sqlplus -s '/ as sysdba' << EOSQL
CREATE OR REPLACE TRIGGER start_DGservice after startup on database
DECLARE
    mirole VARCHAR(30);
    miservicenames varchar2(2000);
    miservice varchar2(100) := '$ORACLE_SID';
    milen number := 0;
BEGIN
    SELECT DATABASE_ROLE INTO mirole FROM V\$DATABASE;
    IF mirole = 'PRIMARY' THEN
        DBMS_SERVICE.START_SERVICE(miservice);
        select trim(replace(value,miservice)) into miservicenames from v\$parameter where name = 'service_names';
        milen := length(miservicenames);
        if (milen > 0) then
            if (substr(miservicenames,milen,1) = ',') then
                miservicenames:= substr(miservicenames,1,milen-1);
            end if;
        end if;
        execute immediate 'alter system set service_names='''||miservicenames||''' scope=spfile';
    end if;
END;
/
EOSQL

Activation :

Again on primaryServer only, as oracle :
  1. sqlplus / as sysdba
  2. alter system set service_names='ORACLE_SID' scope=both;
    System altered.
  3. exec dbms_service.start_service('ORACLE_SID');
    PL/SQL procedure successfully completed.
    If this returns :
    ERROR at line 1:
    ORA-44305: service ORACLE_SID is running
    ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
    ORA-06512: at "SYS.DBMS_SERVICE", line 454
    ORA-06512: at "SYS.DBMS_SERVICE", line 375
    ORA-06512: at line 1
    Try :
    exec dbms_service.stop_service('ORACLE_SID');
    PL/SQL procedure successfully completed.
    Then retry the "start" command above. This weird behavior is the reason why this whole step doesn't work from the command line with a giant heredoc
  4. set linesize 200
    COL "TRIGGER_NAME" FORMAT A20
    COL "TRIGGERING_EVENT" FORMAT A20
    select TRIGGER_NAME, TRIGGER_TYPE, TRIGGERING_EVENT, STATUS, ACTION_TYPE from dba_triggers where TRIGGER_NAME like '%DG%';
    
    TRIGGER_NAME       TRIGGER_TYPE    TRIGGERING_EVENT    STATUS     ACTION_TYPE
    ------------------ --------------- ------------------- ---------- --------------
    START_DGSERVICE    AFTER EVENT     STARTUP             ENABLED    PL/SQL

FSFO Wallet :

Creation of wallet (What's a wallet ?, more details) :

If you're working on an application having several database instances, you'll have to :
  • create 1 wallet per instance, stored in $TNS_ADMIN/ORACLE_SID/wallet (a.k.a. /home/oracle/network/admin/ORACLE_SID/wallet)
  • have 1 sqlnet.ora per instance : /home/oracle/network/admin/ORACLE_SID/sqlnet.ora, and update the environment variable, for each instance : TNS_NAME=/home/oracle/network/admin/ORACLE_SID (WTF : the env. var. is set system-wide and can't be instance-specific. I must have missed something when dealing with my DBA )
    In practice, having a single sqlnet.ora repeating the WALLET_LOCATION = ... block for each instance should do the trick.
On primaryServer only, as oracle :
  1. Pick a wallet password from the generated list :
    source /home/afh/livraison/05.00.00/functions.sh; for i in {1..5}; do generatePassword 12 oracle; done
  2. Create the wallet directory :
    walletDirectory="$TNS_ADMIN/wallet"; mkdir -m 755 -p "$walletDirectory"; mkstore -wrl "$walletDirectory" -create
    Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
    Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Enter password: walletPassword
    Enter password again: walletPassword
    TNS_ADMIN's value is /home/oracle/network/admin.
  3. Add credentials for sys@ORACLE_SID_PRIMARYDATACENTER :
    mkstore -wrl "$walletDirectory" -createCredential ORACLE_SID_PRIMARYDATACENTER sys 'sysPassword'
    Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
    Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Enter wallet password: walletPassword
    Create credential oracle.security.client.connect_string1
  4. Same job for STANDBYDATACENTER :
    mkstore -wrl "$walletDirectory" -createCredential ORACLE_SID_STANDBYDATACENTER sys 'sysPassword'
    Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
    Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Enter wallet password: walletPassword
    Create credential oracle.security.client.connect_string2
  5. Check :
    mkstore -wrl "$walletDirectory" -listCredential
    Oracle Secret Store Tool : Version 11.2.0.4.0 - Production
    Copyright (c) 2004, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Enter wallet password: walletPassword
    List credential (index: connect_string username)
    2: ORACLE_SID_STANDBYDATACENTER sys
    1: ORACLE_SID_PRIMARYDATACENTER sys

Creation / edition of sqlnet.ora :

Still as oracle on primaryServer :

  1. Does sqlnet.ora already exist ?
    ls /home/oracle/network/admin/sqlnet.ora
    We'll use that information later.
  2. cat << EOF >> /home/oracle/network/admin/sqlnet.ora
    WALLET_LOCATION =
        (SOURCE =
            (METHOD = FILE)
            (METHOD_DATA =
                (DIRECTORY = $walletDirectory)
            )
        )
    
    DIAG_ADR_ENABLED = OFF
    SQLNET.WALLET_OVERRIDE = TRUE
    EOF

Test connection with the wallet :

Let's check we can make a passwordless connection to the broker using the wallet, as oracle on primaryServer

  1. dgmgrl
  2. connect /@ORACLE_SID_PRIMARYDATACENTER
    Connected.
  3. connect /@ORACLE_SID_STANDBYDATACENTER
    Connected.
  4. If both connections succeeded, undo/comment the changes made earlier to sqlnet.ora. If the file has been created at that time, it can safely be purged : > /home/oracle/network/admin/sqlnet.ora

Technically speaking, you should disconnect from the 1st connection, leave DGMGRL, then try the 2nd connection. Because after the 1st connection succeeded, you're connected to ORACLE_SID_PRIMARYDATACENTER. No big deal anyway, since this is where we're testing from.

FSFO Observer :

On primaryServer :

  1. Ask dbaOracle to connect via SSH to primaryServer. This will create /home/dbaOracleId.
  2. Then, as root :
    • single instance :
      nni='dbaOracleId'; destinationDir="/home/$nni"; archiveName='wallet.tar'; cd /home/oracle/network/admin/wallet/; tar -cf "$destinationDir/$archiveName" *; for oraFile in /home/oracle/network/admin/tnsnames.ora /home/oracle/network/admin/sqlnet.ora; do [ -f "$oraFile" ] && cp "$oraFile" "$destinationDir/"; done; chmod 700 "$destinationDir/"*{tar,ora}; chown "$nni":users "$destinationDir/"*{tar,ora}; cd "$destinationDir"
    • multiple instances :
      nni='dbaOracleId'; destinationDir="/home/$nni"; for database in BASE1 BASE2 BASE3; do archiveName="wallet$database.tar"; cd "/home/oracle/network/admin/$database/wallet/"; tar -cf "$destinationDir/$archiveName" *; done; for oraFile in /home/oracle/network/admin/tnsnames.ora /home/oracle/network/admin/sqlnet.ora; do [ -f "$oraFile" ] && cp "$oraFile" "$destinationDir/"; done; chmod 700 "$destinationDir/"*{tar,ora}; chown "$nni":users "$destinationDir/"*{tar,ora}; cd "$destinationDir"
  3. Tell dbaOracle that files are available in his home directory on primaryServer.
  4. Wait for his feedback to continue.
  5. Once you can continue, delete the contents of the wallet directory :
    rm -f /home/oracle/network/admin/wallet/*

Enable FSFO :

As oracle on primaryServer :

Don't forget to : export ORACLE_SID=ORACLE_SID

Step-by-step version :

  1. dgmgrl
  2. connect /
    Connected.
    If this outputs, instead :
    ORA-12578: TNS:wallet open failed
    It's because you forgot to undo the edits made to /home/oracle/network/admin/sqlnet.ora at a previous step.
  3. show configuration
    Configuration - ORACLE_SIDDR
    
    	Protection Mode: MaxAvailability
    	Databases:
    		ORACLE_SID_PRIMARYDATACENTER - Primary database
    		ORACLE_SID_STANDBYDATACENTER - Physical standby database
    
    Fast-Start Failover: DISABLED
    
    Configuration Status:
    SUCCESS
  4. Enable FSFO :
    enable fast_start failover
    Enabled.
  5. Update the threshold parameter :
    edit configuration set property FastStartFailoverThreshold = '120';
    Property "faststartfailoverthreshold" updated
  6. Check : show fast_start failover;
    Fast-Start Failover: ENABLED
    
    	Threshold:          120 seconds
    	Target:             ORACLE_SID_STANDBYDATACENTER
    	Observer:           observer.mycompany.tld
    	Lag Limit:          30 seconds (not in use)
    	Shutdown Primary:   TRUE
    	Auto-reinstate:     TRUE
    	Observer Reconnect: (none)
    	Observer Override:  FALSE
    
    Configurable Failover Conditions
    	Health Conditions:
    		Corrupted Controlfile          YES
    		Corrupted Dictionary           YES
    		Inaccessible Logfile            NO
    		Stuck Archiver                  NO
    		Datafile Offline               YES
    
    	Oracle Error Conditions:
    		(none)

Big-bang version :

dgmgrl / << EOC
show configuration;
enable fast_start failover;
edit configuration set property FastStartFailoverThreshold = '120';
show fast_start failover;
EOC
Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL>
Configuration - ORACLE_SIDDR

	Protection Mode: MaxAvailability
	Databases:
		ORACLE_SID_PRIMARYDATACENTER - Primary database
		ORACLE_SID_STANDBYDATACENTER - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> Enabled.
DGMGRL> Property "faststartfailoverthreshold" updated
DGMGRL>
Fast-Start Failover: ENABLED

	Threshold:          120 seconds
	Target:             ORACLE_SID_STANDBYDATACENTER
	Observer:           observer.mycompany.tld
	Lag Limit:          30 seconds (not in use)
	Shutdown Primary:   TRUE
	Auto-reinstate:     TRUE
	Observer Reconnect: (none)
	Observer Override:  FALSE

Configurable Failover Conditions
	Health Conditions:
		Corrupted Controlfile          YES
		Corrupted Dictionary           YES
		Inaccessible Logfile            NO
		Stuck Archiver                  NO
		Datafile Offline               YES

	Oracle Error Conditions:
		(none)

Failover test :

Everything is configured : if primaryServer goes down, FSFO should automatically failover to standbyServer. Let's turn primaryServer down without notice/clean shutdown and see what happens.

  1. As oracle on primaryServer : echo 'shutdown abort' | sqlplus / as sysdba
    ORACLE instance shut down.
  2. Wait at least 2 minutes so that the database switch to standbyServer is executed and active.
  3. As oracle on standbyServer : dgmgrl
  4. connect /
    Connected.
  5. show configuration
    Configuration - ORACLE_SIDDR
    
    	Protection Mode: MaxAvailability
    	Databases:
    		ORACLE_SID_STANDBYDATACENTER - Primary database
    		Warning: ORA-16817: unsynchronized fast-start failover configuration
    
    		ORACLE_SID_PRIMARYDATACENTER - (*) Physical standby database (disabled)
    		ORA-16661: the standby database needs to be reinstated
    
    Fast-Start Failover: ENABLED
    
    Configuration Status:
    WARNING

    We can see that standbyServer has become the primary database and that the whole system has an unstable status.

Reinstatement of the database on primaryServer :

  1. As oracle on primaryServer :
    echo 'startup mount' | sqlplus -s / as sysdba
    Connected to an idle instance.
    
    Database mounted.
  2. As oracle on standbyServer :
    dgmgrl
    connect sys
    Password: sys password
    Connected.
  3. reinstate database 'ORACLE_SID_PRIMARYDATACENTER';
    • It is likely that the reinstate command fails :
      Reinstating database "ORACLE_SID_PRIMARYDATACENTER", please wait...
      Error: ORA-16653: failed to reinstate database
      
      Failed.
      Reinstatement of database "ORACLE_SID_PRIMARYDATACENTER" failed
      Or :
      Reinstating database "ORACLE_SID_PRIMARYDATACENTER", please wait...
      Error: ORA-16657: reinstatement of database in progress
      
      Failed.
      Database reinstatement for "ORACLE_SID_PRIMARYDATACENTER" in progress.

      This means the Oracle Observer is currently already reinstating the database (which is its expected behavior, FSFO RuLeZ !!! ). To make sure :

      show configuration
      Configuration - ORACLE_SIDDR
      
      	Protection Mode: MaxAvailability
      	Databases:
      		ORACLE_SID_STANDBYDATACENTER - Primary database
      		ORACLE_SID_PRIMARYDATACENTER - (*) Physical standby database
      
      Fast-Start Failover: ENABLED
      
      Configuration Status:
      SUCCESS
    • Otherwise, the reinstate should return :
      Reinstating database "ORACLE_SID_PRIMARYDATACENTER", please wait...
      Operation requires shutdown of instance "ORACLE_SID" on database "ORACLE_SID_PRIMARYDATACENTER"
      
      
      
      Reinstatement of database "ORACLE_SID_PRIMARYDATACENTER" succeeded

Back to normal : switchover from standbyServer to primaryServer :

  1. Make sure the primary and standby servers are still synchronized.
  2. Switchover, back to initial situation.
  3. You _may_ check again primary and standby servers are still synchronized.
  4. If servers are effectively still synchronized : congratulations, this FSFO configuration is finished !!!