Recently we ran into a problem where one website configured in OHS/Web Cache was being hammered by a single user (similar to a DoS attack, although no hacker was involved on this occasion), and it brought down the entire web layer. The application server behind this website stopped responding, connections started to build up on OHS/Web Cache, and eventually the tier went down, causing downtime for every website served through it.

We thought this could not, or at least should not, happen, because this site was limited to a maximum of 200 connections. But it did, and we needed to find out exactly what caused the downtime. So here you go.

The error reported in the Web Cache logs was:

[2016-08-16T15:41:11+01:00] [webcache] [ERROR:1] [WXE-11313] [logging] [ecid: ] The cache server reached the maximum number of allowed incoming connections. Listening is temporarily suspended on port 80.
[2016-08-16T15:41:16+01:00] [webcache] [ERROR:1] [WXE-11314] [logging] [ecid: ] Listening has resumed on port 80.
[2016-08-16T15:41:18+01:00] [webcache] [WARNING:1] [WXE-14023] [backend] [ecid: 35110495184610,0:1] No origin server is running.
[2016-08-16T15:41:18+01:00] [webcache] [ERROR:32] [WXE-11364] [frontend] [ecid: 35110495184610,0:1] Network error response is returned.

We had this config in place:

+ httpd.conf (Linux)
--------------------------------------
<IfModule mpm_worker_module>
StartServers 2
-> MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
-> ThreadsPerChild 25
MaxRequestsPerChild 0
AcceptMutex fcntl
LockFile "${ORACLE_INSTANCE}/diagnostics/logs/${COMPONENT_TYPE}/${COMPONENT_NAME}/http_lock"
</IfModule>


+ webcache.xml
----------------
<RESOURCELIMITS MAXINBOUNDCONNECTIONS="500" MAXCACHESIZE_MB="500"/>



+ webcache.xml (Origin Servers)
----------------------------------------
<HOST ID="h1" ISPROXY="NO" NAME="OHS-Machine1" PORT="7777" OSSTATE="ON" LOADLIMIT="100" NUMRETRY="5" PINGURL="/" PINGINTERVAL="10" SSLENABLED="NONE" ISPROXYPASSWORDPLAINTEXT="YES"/>
<HOST ID="h2" ISPROXY="NO" NAME="OHS-Machine1" PORT="4443" OSSTATE="ON" LOADLIMIT="100" NUMRETRY="5" PINGURL="/" PINGINTERVAL="10" SSLENABLED="SSL" ISPROXYPASSWORDPLAINTEXT="YES"/>
<HOST ID="h3" ISPROXY="NO" NAME="OHS-Machine1" PORT="7786" OSSTATE="ON" LOADLIMIT="100" NUMRETRY="5" PINGURL="/" PINGINTERVAL="10" SSLENABLED="NONE" ISPROXYPASSWORDPLAINTEXT="YES"/>
<HOST ID="h4" ISPROXY="NO" NAME="OHS-Machine1" PORT="7788" OSSTATE="ON" LOADLIMIT="100" NUMRETRY="5" PINGURL="/" PINGINTERVAL="10" SSLENABLED="NONE" ISPROXYPASSWORDPLAINTEXT="YES"/>
<HOST ID="h5" ISPROXY="NO" NAME="OHS-Machine1" PORT="7795" OSSTATE="ON" LOADLIMIT="100" NUMRETRY="5" PINGURL="/" PINGINTERVAL="10" SSLENABLED="NONE" ISPROXYPASSWORDPLAINTEXT="YES"/>
<HOST ID="h6" ISPROXY="NO" NAME="OHS-Machine1" PORT="7791" OSSTATE="ON" LOADLIMIT="100" NUMRETRY="5" PINGURL="/" PINGINTERVAL="10" SSLENABLED="NONE" ISPROXYPASSWORDPLAINTEXT="YES"/>
<HOST ID="h7" ISPROXY="NO" NAME="OHS-Machine1" PORT="7792" OSSTATE="ON" LOADLIMIT="200" NUMRETRY="5" PINGURL="/" PINGINTERVAL="10" SSLENABLED="NONE" ISPROXYPASSWORDPLAINTEXT="YES"/>

So what was wrong? The relevant settings are the ones highlighted above.

This is what we had configured:

LOADLIMIT="200"
MaxClients 150
MAXINBOUNDCONNECTIONS="500"

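The right configuration would have been as follows.

LOADLIMIT (or Origin Server Capacity, configured via http://OHS-Machine1:9400/webcacheadmin) should NEVER exceed MaxClients in httpd.conf. With LOADLIMIT at 200 and MaxClients at 150, OHS runs out of worker threads before Web Cache ever considers the site to be at capacity, so requests keep piling up instead of being rejected.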

LOADLIMIT="200"
MaxClients 250
MAXINBOUNDCONNECTIONS="500"
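
The rule above (no LOADLIMIT higher than MaxClients) is easy to check with a small script. What follows is only a rough sketch in Python, not an Oracle tool: the function names are made up for this post, the `<ROOT>` wrapper is a stand-in for whatever really encloses the HOST entries in webcache.xml, and it assumes httpd.conf contains a single active MaxClients line.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down samples based on the config shown in this post.
HTTPD_CONF = """\
<IfModule mpm_worker_module>
StartServers 2
MaxClients 150
ThreadsPerChild 25
</IfModule>
"""

# <ROOT> is a hypothetical wrapper so the fragment parses on its own.
WEBCACHE_XML = """\
<ROOT>
  <HOST ID="h1" NAME="OHS-Machine1" PORT="7777" LOADLIMIT="100"/>
  <HOST ID="h7" NAME="OHS-Machine1" PORT="7792" LOADLIMIT="200"/>
</ROOT>
"""


def parse_max_clients(httpd_conf: str) -> int:
    """Return the first uncommented MaxClients value found in httpd.conf text.

    Simplification: a real httpd.conf may carry one MaxClients per MPM block;
    this sketch just takes the first one it sees.
    """
    match = re.search(r"^\s*MaxClients\s+(\d+)", httpd_conf, re.MULTILINE)
    if match is None:
        raise ValueError("MaxClients not found")
    return int(match.group(1))


def over_limit_hosts(webcache_xml: str, max_clients: int):
    """Return (ID, LOADLIMIT) for every HOST whose LOADLIMIT exceeds max_clients."""
    root = ET.fromstring(webcache_xml)
    offenders = []
    for host in root.iter("HOST"):
        load_limit = int(host.get("LOADLIMIT"))
        if load_limit > max_clients:
            offenders.append((host.get("ID"), load_limit))
    return offenders


if __name__ == "__main__":
    max_clients = parse_max_clients(HTTPD_CONF)
    # With the original settings this prints: [('h7', 200)]
    print(over_limit_hosts(WEBCACHE_XML, max_clients))
```

With MaxClients raised to 250, the same check returns an empty list for these sample hosts.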

So, basically, when a particular site reaches its allowed capacity (LOADLIMIT), it should simply respond with the following error instead of bringing down the entire web tier.

[Screenshot: ohs1]