CRS does not start GIPC error: [29] msg [gipcretConnectionRefused]

 

Table of Contents    
What to do first?
Scenario 1: Wrong IP Address
Scenario 2: Filesystem full ( 12c )
Scenario 3: Firewall ON
References
What to do first?
  • Check your disk space: # df
  • Check whether a firewall is running: # service iptables status ( this check is very important )
  • Use nslookup and ping to verify your cluster interconnect
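The first two checks can be turned into small filters over command output. A minimal bash sketch, assuming the input comes from df -P (POSIX format, one line per filesystem, so long LVM names do not wrap as in the df -k output of Scenario 2) and from iptables -S (rule listing). The function names and the 90% threshold are illustrative choices, not Oracle tools. The nslookup/ping check is left manual because the interconnect host names are site-specific.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the first triage checks (not part of Grid Infrastructure).

# full_filesystems [THRESHOLD] < "df -P" output
# Prints "<mount point> <use%>" for every filesystem at or above THRESHOLD percent.
full_filesystems() {
  local threshold=${1:-90}
  awk -v t="$threshold" 'NR > 1 {
    pct = $5
    sub(/%$/, "", pct)              # "100%" -> "100"
    if (pct + 0 >= t) print $6, $5
  }'
}

# firewall_blocking_rules < "iptables -S" output
# Prints DROP policies and REJECT/DROP rules; grep exits 1 if there are none.
firewall_blocking_rules() {
  grep -E '^-P [A-Z]+ DROP$|-j (REJECT|DROP)( |$)'
}

# Usage (as root):
#   df -P | full_filesystems 90
#   iptables -S | firewall_blocking_rules
```

Any line printed by firewall_blocking_rules is only a hint that traffic may be blocked; rules restricted to other ports or interfaces still need to be read by hand.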
Scenario 1: Wrong IP Address

Errors:
   GIPC reports error [29] msg [gipcretConnectionRefused]
   CHM reports clsu_get_private_ip failed

Check CRS status
[root@grac41 Desktop]#  crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

[root@grac41 network-scripts]# my_crs_stat_init
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------
ora.asm                        ONLINE     OFFLINE                      Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE                       
ora.crf                        ONLINE     ONLINE          grac41        
ora.crsd                       ONLINE     OFFLINE                       
ora.cssd                       ONLINE     UNKNOWN         grac41        
ora.cssdmonitor                ONLINE     ONLINE          grac41        
ora.ctssd                      ONLINE     OFFLINE                       
ora.diskmon                    OFFLINE    OFFLINE                       
ora.drivers.acfs               ONLINE     ONLINE          grac41        
ora.evmd                       ONLINE     OFFLINE                       
ora.gipcd                      ONLINE     ONLINE          grac41        
ora.gpnpd                      ONLINE     ONLINE          grac41        
ora.mdnsd                      ONLINE     ONLINE          grac41 
--> ASM, HAIP, CRSD, CTSSD, DISKMON and EVMD resources are OFFLINE!
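my_crs_stat_init appears to be the author's own wrapper script; its one-line-per-resource layout (NAME TARGET STATE SERVER STATE_DETAILS) is assumed here. The standard crsctl stat res -t -init output puts the name and the state on separate lines, so this filter does not apply to it directly. A sketch of a filter that lists resources whose TARGET is ONLINE but whose STATE is not, which is how the table above was read by eye (DISKMON drops out because its TARGET is OFFLINE):

```shell
#!/usr/bin/env bash
# not_online_resources < status table in the layout shown above
# (NAME TARGET STATE SERVER STATE_DETAILS, one resource per line).
# Prints "<resource> <state>" for every resource with TARGET=ONLINE but STATE!=ONLINE.
# Header and separator lines never have ONLINE in column 2, so they are skipped.
not_online_resources() {
  awk '$2 == "ONLINE" && $3 != "ONLINE" { print $1, $3 }'
}

# Usage (hypothetical pipeline around the author's script):
#   my_crs_stat_init | not_online_resources
```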

Check traces - ohasd trace file 
[root@grac41 ohasd]#  cat ohasd.log | grep -i failed
2014-04-22 15:09:17.966: [    AGFW][2735122176]{0:0:2} ora.cluster_interconnect.haip 1 1 received state from probe request. Old state = UNKNOWN, New state = FAILED
2014-04-22 15:09:30.292: [    GPNP][2745628416]clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result: (26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile. 
2014-04-22 15:09:30.602: [    GPNP][2717640448]clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result: (26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile. 
--> HAIP goes to FAILED status 
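grep -i failed also returns unrelated GPnP lines. To list only agent state transitions to FAILED, with time stamp and resource name, a small awk filter can be used instead. It assumes the AGFW message format shown above, where the first token starting with "ora." after the time stamp is the resource name; failed_transitions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# failed_transitions < ohasd.log
# Prints "<date> <time> <resource>" for each "New state = FAILED" agent message.
failed_transitions() {
  awk '/New state = FAILED/ {
    for (i = 3; i <= NF; i++) {
      if ($i ~ /^ora\./) {            # first ora.* token is the resource name
        sub(/:$/, "", $2)             # drop the colon after the time stamp
        print $1, $2, $i
        break
      }
    }
  }'
}

# Usage: failed_transitions < ohasd.log
```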

Look for trace files that are updated repeatedly; maybe some RAC process is trying to fix the network problem:
[grid@grac41 grac41]$ date;  find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:40 CEST 2014
2014-04-22 13:24:30.0571859790  ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610  ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:38.0881994320  ./ohasd/ohasd.log
2014-04-22 13:24:38.3523314350  ./gipcd/gipcd.log
2014-04-22 13:24:39.0876989250  ./crfmond/crfmond.log

[grid@grac41 grac41]$ date;  find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:43 CEST 2014
2014-04-22 13:24:30.0571859790  ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610  ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:43.1007044060  ./ohasd/ohasd.log
2014-04-22 13:24:43.3668374000  ./gipcd/gipcd.log
2014-04-22 13:24:43.7580328990  ./crfmond/crfmond.log

[grid@grac41 grac41]$ date;  find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:47 CEST 2014
2014-04-22 13:24:30.0571859790  ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610  ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:43.1007044060  ./ohasd/ohasd.log
2014-04-22 13:24:44.0972023860  ./crfmond/crfmond.log
2014-04-22 13:24:46.4033548850  ./gipcd/gipcd.log
--> Here we can see that ./ohasd/ohasd.log, ./gipcd/gipcd.log and ./crfmond/crfmond.log are updated every few seconds
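The find command above can be wrapped in a function and repeated automatically. A sketch assuming GNU find; it sorts on %T@ (modification time in epoch seconds) instead of the formatted %C status-change time used above, so the numeric sort does not depend on how the date string is parsed. recent_traces is a hypothetical name.

```shell
#!/usr/bin/env bash
# recent_traces [DIR] [N]: the N most recently modified files below DIR,
# oldest first, printed as "YYYY-MM-DD HH:MM:SS.fraction  path".
recent_traces() {
  local dir=${1:-.} n=${2:-5}
  find "$dir" -type f -printf '%T@ %TY-%Tm-%Td %TH:%TM:%TS  %p\n' |
    sort -n |           # numeric sort on the leading epoch timestamp
    tail -n "$n" |
    cut -d' ' -f2-      # drop the epoch column
}

# Usage, from the node's Clusterware log directory (repeat every 5 seconds):
#   while sleep 5; do date; recent_traces . 5; echo; done
```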

Use tail to see what's going on:
[grid@grac41 grac41]$ tail -f  ./gpnpd/gpnpd.log
2014-04-22 13:19:59.175: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:21:29.469: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:22:59.792: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:24:30.057: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:26:00.383: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:27:30.622: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:29:00.869: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:30:31.203: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:32:01.459: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:33:31.770: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]

[grid@grac41 grac41]$  tail -f    ./ohasd/ohasd.log
2014-04-22 13:33:42.806: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:47.817: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:52.839: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:57.848: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:03.859: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:09.874: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:15.881: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:20.900: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:25.920: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:30.934: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
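The two tails repeat at different rates: gpnpd.log reports the GIPC error roughly every 90 seconds, while ohasd.log resends the interface request to gipcd every 5 to 6 seconds. A sketch of an awk filter that prints the gap between consecutive time-stamped lines; it assumes the "YYYY-MM-DD HH:MM:SS.mmm:" prefix shown above, skips lines without it, and does not handle a rollover at midnight. log_intervals is a hypothetical name.

```shell
#!/usr/bin/env bash
# log_intervals < LOG: seconds between consecutive time-stamped lines.
log_intervals() {
  awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    split($2, t, ":")                     # "13:19:59.175:" -> 13, 19, 59.175
    now = t[1] * 3600 + t[2] * 60 + t[3]
    if (seen) printf "%.3f\n", now - prev
    prev = now
    seen = 1
  }'
}

# Usage: tail -n 10 ./gpnpd/gpnpd.log | log_intervals
```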

[grid@grac41 grac41]$ tail -f   ./crfmond/crfmond.log
[   CLWAL][467654400]clsw_Initialize: OLR initlevel [70000]
2014-04-22 13:34:49.349: [    CRFM][467654400]crfm_connstr: clsu_get_private_ip failed(7).
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_connect_to: send fail(gipcret: 13)
2014-04-22 13:34:49.458: [    CRFM][467654400]crfmctx dump follows
2014-04-22 13:34:49.458: [    CRFM][467654400]****************************
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: connection local name: tcp://0.0.0.0:45871
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: connection peer name:  tcp://192.168.1.101:61021
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: connaddr:  tcp://grac41:61021
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: ctype:  2
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: mytype:  0
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: hostname  grac41
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: myport:  
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: rhostname  
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: rport:  
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: flags:  1
2014-04-22 13:34:49.458: [    CRFM][467654400]****************************

The above traces show that clsu_get_private_ip failed to get the private IP; the connection peer is tcp://192.168.1.101

Check Network status and DNS
[root@grac41 Desktop]# ifconfig
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2  
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe89:e9a2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17148 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13307 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:22041591 (21.0 MiB)  TX bytes:1211055 (1.1 MiB)
          Interrupt:9 Base address:0xd240 

eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe6b:e2bd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17517 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13475 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:22191772 (21.1 MiB)  TX bytes:1230703 (1.1 MiB)
          Interrupt:5 Base address:0xd260 
--> Checked the public and private interfaces for errors: looks good

[root@grac41 Desktop]# nslookup grac41
Name:    grac41.example.com
Address: 192.168.1.101

[root@grac41 Desktop]# nslookup grac41int
Name:    grac41int.example.com
Address: 192.168.2.101

[root@grac41 Desktop]# nslookup 192.168.1.101
101.1.168.192.in-addr.arpa    name = grac41.example.com.

[root@grac41 Desktop]# nslookup  192.168.2.101
101.2.168.192.in-addr.arpa    name = grac41int.example.com.
--> DNS and network seem to be OK

Restart CRS

[root@grac41 Desktop]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'grac41'
CRS-2673: Attempting to stop 'ora.crf' on 'grac41'
CRS-2673: Attempting to stop 'ora.ctssd' on 'grac41'
CRS-2673: Attempting to stop 'ora.evmd' on 'grac41'
...
CRS-2673: Attempting to stop 'ora.gpnpd' on 'grac41'
CRS-2677: Stop of 'ora.gpnpd' on 'grac41' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'grac41' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Clean up /var/tmp/.oracle
# rm  /var/tmp/.oracle/*
[root@grac41 Desktop]# crsctl start crs
[root@grac41 Desktop]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
--> Problem persists

Check OS logfile
#  cat /var/log/messages
--> Nothing related

Run ocrcheck ( and ocrdump ) to check whether we can access our OCR repository
[root@grac41 Desktop]#  ocrcheck
Status of Oracle Cluster Registry is as follows :
     Version                  :          3
     Total space (kbytes)     :     262120
     Used space (kbytes)      :       4076
     Available space (kbytes) :     258044
     ID                       :  630679368
     Device/File Name         :       +OCR
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
     Cluster registry integrity check succeeded
     Logical corruption check succeeded

Query the voting disks:
[grid@grac41 grac41]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   b0e94e5d83054fe9bf58b6b98bfacd65 (/dev/asmdisk1_udev_sdf1) [OCR]
 2. ONLINE   88c2a08b4c8c4f85bf0109e0990388e4 (/dev/asmdisk1_udev_sdg1) [OCR]
 3. ONLINE   1108f9a41e814fb2bfed879ff0039dd0 (/dev/asmdisk1_udev_sdh1) [OCR]
Located 3 voting disk(s).

Debugging the GIPCD and GPnPD daemons using strace
As the GIPCD and GPnPD daemon traces get updated every 5 seconds, let's check the gipcd process with strace:
# ps -elf | egrep 'gpnpd.bin|gipcd.bin'
# strace -t -f  -p 24376   2>&1  | grep '192.168' | grep eth
[pid 24872] 09:17:28 <... ioctl resumed> 200, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("10.0.2.15")}}, {"eth1", {AF_INET, inet_addr("192.168.2.101")}}, {"eth2", {AF_INET, inet_addr("192.168.1.101")}}, {"virbr0", {AF_INET, inet_addr("192.168.122.1")}}}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
..
[pid 24872] 09:17:33 <... ioctl resumed> 200, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("10.0.2.15")}}, {"eth1", {AF_INET, inet_addr("192.168.2.101")}}, {"eth2", {AF_INET, inet_addr("192.168.1.101")}}, {"virbr0", {AF_INET, inet_addr("192.168.122.1")}}}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
..
--> Again we don't get an OS error, but the process keeps looping over the same ioctl() calls.
    It seems gipcd is not happy with the information returned by the ioctl() calls and rereads it every 5 seconds.
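Instead of grepping the raw strace stream, the interface/address pairs returned by ioctl() can be reduced to a unique list. That list shows the mapping the daemon actually sees (here eth1 = 192.168.2.101 and eth2 = 192.168.1.101), which can then be compared with the GPnP profile. A sketch assuming the strace output format shown above; newer strace releases may print the sockaddr fields differently, and strace_if_addrs is a hypothetical name.

```shell
#!/usr/bin/env bash
# strace_if_addrs < strace output
# Extracts the unique "interface IPv4-address" pairs from ifr_addr ioctl() results
# (ifr_broadaddr results are ignored).
strace_if_addrs() {
  grep -o 'ifr_name="[^"]*", ifr_addr={AF_INET, inet_addr("[^"]*")' |
    sed 's/ifr_name="\([^"]*\)", ifr_addr={AF_INET, inet_addr("\([^"]*\)")/\1 \2/' |
    sort -u
}

# Usage (as root; timeout stops strace after 15 seconds so sort can finish):
#   timeout 15 strace -f -e trace=ioctl -p "$(pgrep -o -f gipcd.bin)" 2>&1 | strace_if_addrs
```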


Check GPnP profile
[root@grac41 Desktop]#  gpnptool get > profile.xml 
Open profile.xml and extract the adapter usage:
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
   <gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
   <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
Verify with ifconfig
[root@grac41 Desktop]# ifconfig | egrep 'HWaddr|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2  
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet addr:127.0.0.1  Mask:255.0.0.0
--> eth1 is using 192.168.2.101, but according to the GPnP profile it should use 192.168.1.101
    eth2 is using 192.168.1.101, but according to the GPnP profile it should use 192.168.2.101
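The comparison between the GPnP profile and the live interface addresses can be automated. A minimal bash sketch, assuming the profile was saved with gpnptool get as above, that each gpnp:Network element lists IP= before Adapter= (as in this profile), and that the live addresses come from ip -o -4 addr show instead of ifconfig, because its one-line-per-address format is easier to parse. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare GPnP profile networks with live IPv4 addresses.

# ip_to_int A.B.C.D: dotted-quad IPv4 address as a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# check_gpnp_networks PROFILE ADDRS
#   PROFILE: output of "gpnptool get"; ADDRS: output of "ip -o -4 addr show".
# Prints one OK/MISMATCH line per <gpnp:Network .../> element; returns 1 on any mismatch.
check_gpnp_networks() {
  local profile=$1 addrs=$2 rc=0 subnet adapter cidr addr prefix mask
  while read -r subnet adapter; do
    # First IPv4 address (addr/prefix) configured on this adapter, if any.
    cidr=$(awk -v a="$adapter" '$2 == a && $3 == "inet" { print $4; exit }' "$addrs")
    if [[ -z $cidr ]]; then
      echo "MISMATCH $adapter: no IPv4 address, profile expects $subnet"
      rc=1
      continue
    fi
    addr=${cidr%/*}
    prefix=${cidr#*/}
    mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    # The adapter is correct if its address lies in the subnet named in the profile.
    if (( ($(ip_to_int "$addr") & mask) == ($(ip_to_int "$subnet") & mask) )); then
      echo "OK $adapter: $cidr is in $subnet"
    else
      echo "MISMATCH $adapter: $cidr is not in $subnet"
      rc=1
    fi
  done < <(grep -o '<gpnp:Network [^>]*>' "$profile" |
           sed -n 's/.* IP="\([^"]*\)".* Adapter="\([^"]*\)".*/\1 \2/p')
  return $rc
}

# Usage:
#   gpnptool get > profile.xml
#   ip -o -4 addr show > addrs.txt
#   check_gpnp_networks profile.xml addrs.txt
```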

Problem found:
While manually editing ifcfg-eth1 and ifcfg-eth2 ( /etc/sysconfig/network-scripts ), the HWADDR entries were filled in wrongly: the two MAC addresses were swapped

Reconfigure/restart the network and CRS
[root@grac41 network-scripts]# cat  ifcfg-eth2
HWADDR=08:00:27:89:E9:A2
IPADDR=192.168.2.101
NAME=eth2
[root@grac41 network-scripts]# cat  ifcfg-eth1 
IPADDR=192.168.1.101
NAME=eth1
HWADDR=08:00:27:6B:E2:BD

After changing the HWADDR entries to match the ifconfig output above, the network looks good:
[root@grac41 network-scripts]# service network restart
[root@grac41 network-scripts]# ifconfig | egrep 'HWaddr|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD  
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
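A swapped HWADDR can be caught before restarting the network by comparing each ifcfg-<device> file with the MAC address the kernel reports for that device in /sys/class/net/<device>/address. A sketch under these assumptions: RHEL/OEL 6 style network-scripts and bash 4 (for case conversion). Both directories are parameters, so the check can also be tried on copies; check_ifcfg_hwaddr is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_ifcfg_hwaddr [SCRIPTS_DIR] [SYSNET_DIR]
# For every ifcfg-<dev> that sets HWADDR, compare it with the MAC address the
# kernel reports in SYSNET_DIR/<dev>/address. Returns 1 on any mismatch.
check_ifcfg_hwaddr() {
  local scripts=${1:-/etc/sysconfig/network-scripts} sysnet=${2:-/sys/class/net}
  local f dev cfg_mac real_mac rc=0
  for f in "$scripts"/ifcfg-*; do
    [[ -f $f ]] || continue                       # glob matched nothing
    dev=${f##*/ifcfg-}
    cfg_mac=$(grep -m1 '^HWADDR=' "$f" | cut -d= -f2 | tr -d "\"'")
    [[ -n $cfg_mac ]] || continue                 # no HWADDR (e.g. ifcfg-lo)
    if [[ ! -r $sysnet/$dev/address ]]; then
      echo "SKIP $dev: no such device in $sysnet"
      continue
    fi
    real_mac=$(< "$sysnet/$dev/address")
    if [[ ${cfg_mac,,} == "${real_mac,,}" ]]; then  # MACs compared case-insensitively
      echo "OK $dev $real_mac"
    else
      echo "MISMATCH $dev: ifcfg HWADDR=$cfg_mac, device has $real_mac"
      rc=1
    fi
  done
  return $rc
}

# Usage (before "service network restart"):
#   check_ifcfg_hwaddr
```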

Restart CRS
[root@grac41 network-scripts]# crsctl stop crs -f
[root@grac41 network-scripts]# crsctl start crs
[root@grac41 network-scripts]# crsctl check cluster -all
**************************************************************
grac41:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
grac42:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
grac43:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

Lessons learned:
 - Carefully verify that IP addresses and network device names are consistent clusterwide
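One way to apply this lesson is to collect oifcfg iflist (interface name and subnet as seen by the OS) from every node and compare each list with a node known to be good. A sketch assuming the output has the interface in column 1 and the subnet in column 2, and that the per-node files have already been collected (for example over ssh); the helper names are hypothetical. A node with swapped interfaces, like grac41 before the fix, would be expected to show up as DIFFERS.

```shell
#!/usr/bin/env bash
# normalize_iflist FILE: sorted, unique "interface subnet" pairs.
normalize_iflist() { awk 'NF >= 2 { print $1, $2 }' "$1" | sort -u; }

# compare_iflists REF OTHER...: compare each node's list with the reference node.
# Prints OK or DIFFERS (plus the diff) per file; returns 1 if any file differs.
compare_iflists() {
  local ref=$1 f rc=0
  shift
  for f in "$@"; do
    if diff <(normalize_iflist "$ref") <(normalize_iflist "$f") > /dev/null; then
      echo "OK $f"
    else
      echo "DIFFERS $f"
      diff <(normalize_iflist "$ref") <(normalize_iflist "$f")
      rc=1
    fi
  done
  return $rc
}

# Usage (assumption: oifcfg in the PATH on each node, ssh equivalence for grid):
#   for n in grac41 grac42 grac43; do ssh "$n" oifcfg iflist > "iflist.$n"; done
#   compare_iflists iflist.grac42 iflist.grac41 iflist.grac43
```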

Scenario 2: Filesystem full ( 12c )
[root@gract1 Desktop]# crsi
*****  Local Resources: *****
Resource NAME               INST   TARGET       STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE       OFFLINE      -               STABLE
ora.crf                        1   ONLINE       OFFLINE      -               STABLE
ora.crsd                       1   ONLINE       OFFLINE      -               STABLE
ora.cssd                       1   ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                1   OFFLINE      OFFLINE      -               STABLE
ora.ctssd                      1   ONLINE       OFFLINE      -               STABLE
ora.diskmon                    1   OFFLINE      OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE       ONLINE       gract1          STABLE
ora.evmd                       1   ONLINE       OFFLINE      gract1          STARTING
ora.gipcd                      1   ONLINE       OFFLINE      -               STABLE
ora.gpnpd                      1   ONLINE       OFFLINE      -               STABLE
ora.mdnsd                      1   ONLINE       OFFLINE      gract1          STARTING
ora.storage                    1   ONLINE       OFFLINE      -               STABLE

Related client trace 
2014-08-22 10:57:07.750: [  OCRMSG][2296473152]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-08-22 10:57:07.750: [  OCRMSG][2296473152]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-22 10:57:07.750: [  OCRMSG][2296473152]prom_connect: error while waiting for connection complete [24]
2014-08-22 10:57:07.821: [  OCRMSG][2296473152]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-08-22 10:57:07.821: [  OCRMSG][2296473152]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-22 10:57:07.821: [  OCRMSG][2296473152]prom_connect: error while waiting for connection complete [24]

Root cause: the file system is full (100%), so no traces can be written
# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_oel64-lv_root
                      39603624  37798864         0 100% /
tmpfs                  4194304       272   4194032   1% /dev/shm
/dev/sda1               495844    101751    368493  22% /boot

Scenario 3: Firewall ON
*****  Cluster Resources: *****
Resource NAME               INST   TARGET    STATE        SERVER          STATE_DETAILS
--------------------------- ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1   ONLINE    OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1   ONLINE    OFFLINE      -               STABLE
ora.crf                        1   ONLINE    OFFLINE      -               STABLE
ora.crsd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssd                       1   ONLINE    OFFLINE      -               STABLE
ora.cssdmonitor                1   ONLINE    ONLINE       gract2          STABLE
ora.ctssd                      1   ONLINE    OFFLINE      -               STABLE
ora.diskmon                    1   OFFLINE    OFFLINE      -               STABLE
ora.drivers.acfs               1   ONLINE    ONLINE       gract2          STABLE
ora.evmd                       1   ONLINE    INTERMEDIATE gract2          STABLE
ora.gipcd                      1   ONLINE    ONLINE       gract2          STABLE
ora.gpnpd                      1   ONLINE    ONLINE       gract2          STABLE
ora.mdnsd                      1   ONLINE    ONLINE       gract2          STABLE
ora.storage                    1   ONLINE    OFFLINE      -               STABLE

--> CSSD doesn't become ONLINE 

Client log:
2014-08-23 11:49:21.920: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:49:42.948: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:50:10.978: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:50:46.008: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:51:28.042: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:51:28.042: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]

20665 <... connect resumed> )           = 0
20665 connect(66, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOHASD_UI_SOCKET"}, 110 <unfinished ...>
20665 <... connect resumed> )           = 0
20665 connect(73, {sa_family=AF_FILE, path="/var/tmp/.oracle/sprocr_local_conn_0_PROC"}, 110 <unfinished ...>
20665 <... connect resumed> )           = -1 ECONNREFUSED (Connection refused)

ocssd.log:
2014-08-23 12:32:58.427: [    CSSD][1279260416]clssnmvDHBValidateNCopy: node 1, gract1, has a disk HB, but no network HB, 
         DHB has rcfg 304252836, wrtcnt, 3207223, LATS 4294823390, lastSeqNo 3207220, uniqueness 1408783210, timestamp 1408789980/5988764
2014-08-23 12:32:58.427: [    CSSD][1283991296]clssnmvDHBValidateNCopy: node 1, gract1, has a disk HB, but no network HB, 
         DHB has rcfg 304252836, wrtcnt, 3207224, LATS 4294823390, lastSeqNo 3207221, uniqueness 1408783210, timestamp 1408789980/5988864
Fix: Disable the firewall ( # service iptables stop ; # chkconfig iptables off )
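The decisive ocssd.log message in this scenario is "has a disk HB, but no network HB": the voting-disk heartbeat arrives but the interconnect heartbeat does not, which suggests a problem on the network path (here the firewall) rather than on storage. A sketch of a filter that lists the affected nodes, assuming the message format shown above; nodes_without_network_hb is a hypothetical name.

```shell
#!/usr/bin/env bash
# nodes_without_network_hb < ocssd.log
# Prints "<node number> <node name>" for nodes with a disk heartbeat but no network heartbeat.
nodes_without_network_hb() {
  sed -n 's/.*clssnmvDHBValidateNCopy: node \([0-9]*\), \([^,]*\), has a disk HB, but no network HB.*/\1 \2/p' |
    sort -u
}

# Usage: nodes_without_network_hb < ocssd.log
```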
References
Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1210883.1)
Grid Infrastructure Installation root.sh Failed with “Failed to start CTSS” (Doc ID 1277307.1)
Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)
Top 5 Grid Infrastructure Startup Issues (Doc ID 1368382.1)

 
