MOS note 1496775.1 describes a situation with EM12cR2 where OEM will falsely report the Oracle HTTP Server instance (ohs1) as down, even though it is up. This is due to some changes in FMW 18.104.22.168. If you don’t have any incident rules or notifications set up that would catch this event, it’s easy to miss it and not know that it is happening. I had run into this note a couple of times before but ignored it: I had never seen any open events complaining about OHS being down, so I figured I just wasn’t hitting the bug.
This morning I caught one of the events. I found myself wondering how often this had been happening — was it an issue once every couple days, every few hours, or what?
    SQL> col msg format a45
    SQL> select msg, count(*) from sysman.mgmt$events
      2  where closed_date >= sysdate - 1 and msg like '%HTTP Server instance%'
      3  group by msg;

    MSG                                             COUNT(*)
    --------------------------------------------- ----------
    CLEARED - The HTTP Server instance is up             430
    The HTTP Server instance is down                     430
Turns out it had been happening a LOT: 430 down events in the past day works out to roughly one every three to four minutes. If you’ve followed Oracle’s recommendations and set up target lifecycle status priorities (see my post on doing so) you’ve probably set your OEM targets up with “MissionCritical” priority. That means your OMS has been burning a lot of CPU processing all these up/down events on a high-priority, mission critical target, potentially delaying the processing of other events elsewhere in your environment.
Applying patch 13490778 with ORACLE_HOME set to $MW_HOME/oracle_common should resolve this issue. For best results, stop all OEM components before applying the patch and restart them once it completes.
To convince yourself that applying the patch helped, re-run that query about 15 minutes after applying the patch and you should see the count decrease.
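Because the query above counts over a sliding 24-hour window, the total only drops slowly after the patch. A per-hour breakdown should make the change easier to spot. The following is a minimal sketch, assuming the same sysman.mgmt$events view and the closed_date and msg columns used above; the event_hour and down_events aliases are just illustrative names.

```sql
-- Sketch: hourly count of "down" events for ohs1 over the last day.
-- Relies only on the view and columns already used in the earlier query;
-- event_hour and down_events are illustrative aliases, not repository names.
col event_hour format a16
select to_char(trunc(closed_date, 'HH24'), 'YYYY-MM-DD HH24:MI') as event_hour,
       count(*) as down_events
  from sysman.mgmt$events
 where closed_date >= sysdate - 1
   and msg like 'The HTTP Server instance is down%'
 group by trunc(closed_date, 'HH24')
 order by trunc(closed_date, 'HH24');
```

If the patch took effect, the hours after the restart should show few or no rows, while the hours before it keep their earlier counts.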