Tag Archives: bug

Fix: Plugin error when upgrading EM13cR1 agent to EM13cR2 (13.2) on Windows 2008 R2 x64

I ran into the following issue while attempting to upgrade the Oracle management agent on my one Windows (2008 R2, x64) server to the 13.2.0.0.0 version distributed with Oracle Enterprise Manager 13.2. The agent upgrade repeatedly failed in the “Upgrading Management Agent” step with an error message:


Exit Code :0
The version is 13.1.0.0.0
Checking for the version 13.2.0.0.0
The agent is not upgraded successfully
[...]
Plugins upgrade failed.
Plugin upgrade failed.
Check the file E:/agent13c/agent_inst/install/plugins_upgrade.txt.status on agent for plugin upgrade status.
Check latest E:/agent13c/agent_inst/install/logs/agentplugindeploy_.log
Plugins upgrade failed.0

The log file referenced in the error message contains some, but not much, additional information:


The command executed for discovery at _2016_10_12_11_48_28 is : E:\agent13c\agent_inst\bin\emctl_upgrade.bat set_discovery_root plugin oracle.sysman.si E:\agent13c\agent_13.2.0.0.0\plugins\oracle.sysman.si.discovery.plugin_13.2.1.0.0 13.2.1.0.0
Oracle Enterprise Manager Cloud Control 13c Release 2
Copyright (c) 1996, 2016 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
return value is : 65280
Install case : set_discovery_root failure...existing on error value 65280

I fixed this issue by installing the Microsoft Visual C++ 2010 Service Pack 1 Redistributable Package MFC Security Update on the server. I also had to create a new directory called “prerelogs” inside the …\agent_13.2.0.0.0\cfgtoollogs\ directory; without it, my deployment hung forever during the “Performing prerequisite checks” step. Note that “prerelogs” does not contain the letter q; do not use “prereqlogs”. The agent deployment uses this directory to extract software for running the prerequisite checks. Setting SCRATCHPATH or another extra parameter might resolve this without manually creating the directory, but creating it worked for me.
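For clarity, the directory creation amounts to a single command from a Windows prompt; the path below follows the agent base shown in the logs above, so adjust it for your own install:

mkdir E:\agent13c\agent_13.2.0.0.0\cfgtoollogs\prerelogs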

After installing the VC++ 2010 package, which did not require a reboot, and creating the prerelogs directory, I repeated the agent upgrade and it completed successfully.

This issue may or may not occur on other Windows versions; I have not heard any other reports of problems. The first release of EM13c did not seem to require this package.


Oracle PSU 12.1.0.2.160719 (patch 23054246) for Linux x86-64 requires libodbcinst

Oracle recently released patch 23054246 (DATABASE PATCH SET UPDATE 12.1.0.2.160719) for database 12.1.0.2, containing security updates from the July 2016 critical patch update advisory.

[EDIT 20160726: Oracle has documented this issue in MOS note 2163593.1]

This patch appears to have introduced a dependency on libodbcinst. During my first attempt to install this patch, I received errors while linking libsqora. The error appears as follows in OPatch logs:


[Jul 20, 2016 11:22:57 AM] The following warnings have occurred during OPatch execution:
[Jul 20, 2016 11:22:57 AM] 1) OUI-67200:Make failed to invoke "/usr/bin/make -f ins_odbc.mk isqora ORACLE_HOME=/oracle/oem/product/12.1.0/awrdb"....'/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: cannot find -lodbcinst
collect2: ld returned 1 exit status
make: *** [/oracle/oem/product/12.1.0/awrdb/odbc/lib/libsqora.so.12.1] Error 1
'
[Jul 20, 2016 11:22:57 AM] 2) OUI-67124:Re-link fails on target "isqora".
[Jul 20, 2016 11:22:57 AM] 3) OUI-67200:Make failed to invoke "/usr/bin/make -f ins_odbc.mk isqora ORACLE_HOME=/oracle/oem/product/12.1.0/awrdb"....'/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld: cannot find -lodbcinst
collect2: ld returned 1 exit status
make: *** [/oracle/oem/product/12.1.0/awrdb/odbc/lib/libsqora.so.12.1] Error 1
'
[Jul 20, 2016 11:22:57 AM] 4) OUI-67124:
NApply was not able to restore the home. Please invoke the following scripts:
- restore.[sh,bat]
- make.txt (Unix only)
to restore the ORACLE_HOME. They are located under
"/oracle/oem/product/12.1.0/awrdb/.patch_storage/NApply/2016-07-20_11-20-22AM"

After installing the unixODBC package on my SLES11 system, this error went away.
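For anyone else on SLES, installing it took a single command (run as root; the package name matches the SLES 11 repositories and may differ on other distributions):

# zypper install unixODBC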

[Update: see also Brian Peasland’s blog post “July 2016 PSU fails to make isqora” for a different workaround to this issue that does not involve installing any additional packages.]

At the time of release, Oracle’s installation requirements for database 12.1.0.2 listed the unixODBC package as an optional dependency, required only “[i]f you intend to use ODBC”. That no longer appears to hold true. At the moment Oracle has not made it clear whether patch 23054246 contains a bug that introduces the libodbcinst dependency or whether the database platform will require this library in all cases going forward.

If you have attempted patch application without libodbcinst available, the opatch apply step will fail and you will have to manually revert the patch, following the instructions that OPatch provides and/or contacting Oracle Support for guidance. In my case, I followed the instructions to revert, installed unixODBC, then attempted again to apply the patch, at which point it completed successfully as expected. If you have not yet attempted to apply this patch, I highly recommend installing unixODBC first. I have already seen two others report on Twitter that they encountered this issue, but none have yet confirmed to me that installing unixODBC resolved the problem. I believe it will.
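If you want to check a system before patching, the dynamic linker cache will tell you whether libodbcinst is already visible; this quick sketch prints nothing if the library is missing:

$ /sbin/ldconfig -p | grep odbcinst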

UPDATE: See also “BUG 24332805 – OUI-67124:RE-LINK FAILS ON TARGET “ISQORA” DURING JUL 2016 PSU APPLY” once made public.

WORKAROUND: Unable to monitor Oracle XE 11gR2 with Oracle Enterprise Manager 13c

I have recently switched to using Oracle Enterprise Manager 13c (EM13c – 13.1), replacing my previous EM12c installation.  I elected to install a clean new environment instead of an upgrade, because my old install had been upgraded repeatedly going back to the initial release of EM12c and I wanted a fresh start.

I encountered only one difficult issue during the process. When I attempted to add one production Oracle XE 11gR2 database target, EM13c could not compute the target’s dynamic properties, leaving the target broken. Since you cannot submit jobs against a broken target, this prevented me from using EM13c to back up this database.  I had no comparable issues with XE as a target under EM12c.

The key metric errors that showed during this process included:
“Metric evaluation error start – Target {oracle_database.SID.domain.com} is broken: Dynamic Category property error,Get dynamic property error,No such metadata – No valid queryDescriptor or executionDescriptor found for target [oracle_database.SID.domain.com$30]”

and for the database system target:

“Metric evaluation error start – Received an exception when evaluating sev_eval_proc for:Target name = SID.domain.com_sys, metric_name = Response, metric_column = Status; Error msg = Target encountered metric erros; at least one member in in metric error”

I enabled debugging for the agent logs and attempted again to add the XE target.  Errors showing up in the logs included:

2016-01-15 12:10:05,905 [1806:4CE3192] DEBUG – Computing of dynamic property: [ComputeVC] is done (1 msec, error=true)

2016-01-15 12:10:06,452 [1806:F917F5F8] DEBUG – Computing of dynamic property: [GetDumpDestination] is done (0 msec, error=true)

2016-01-15 12:10:06,508 [1813:6EEEAC87] DEBUG – Computing of dynamic property: [DeduceAlertLogFile] is done (1 msec, error=true)

2016-01-15 12:11:18,779 [1830:CD3A325D] DEBUG – Error was added to oracle_database.SID.domain.com$23(0|MONITORED|false|true|<UF>): Invalid Input

2016-01-15 12:11:18,779 [1831:3657AE55] DEBUG – abandoning long op “CDProps:oracle_database.SID.domain.com:ComputeVC:GENERIC_TASK:Fri Jan 15 12:11:18 EST 2016”

2016-01-15 12:11:18,780 [1830:CD3A325D] DEBUG – Error during dynamic property ComputeVC calculation: critical=true, missingCatProps=[VersionCategory], missingProps=[VersionCategory] oracle.sysman.emSDK.agent.fetchlet.exception.FetchletException: Invalid Input

2016-01-15 12:35:38,038 INFO – Finished dynamic properties computation (total time 817 ms). Number of DP computed: 19. Number of DP with error: 3. Number of dynamic properties that were added: 132.

After reviewing the logs carefully (and posting this as a question in the MOS Oracle XE forum – https://community.oracle.com/thread/3892946) I eventually narrowed the issue down to a query that EM13c runs against DBA_REGISTRY_HISTORY in a target database when added. For database versions greater than 11.2 but less than 12.1.0.2, EM13c assumes that DBA_REGISTRY_HISTORY contains a BUNDLE_SERIES column.  This column does not exist in Oracle XE 11gR2, which reports a version string of 11.2.0.2.
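You can confirm for yourself whether a target database exposes the column; here is a minimal sketch, run as any user that can read DBA_TAB_COLUMNS (the OS-authenticated connection is just for illustration):

$ sqlplus -s "/ as sysdba" <<'EOF'
select column_name
  from dba_tab_columns
 where owner = 'SYS'
   and table_name = 'DBA_REGISTRY_HISTORY'
   and column_name = 'BUNDLE_SERIES';
EOF

On XE 11gR2 this query comes back empty, which is what trips up EM13c.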

This bug should eventually get a fix as EM13c gets patched, but in the meantime if you need to monitor an Oracle XE target with EM13c, the following workaround took care of the problem for me: create a new DBA_REGISTRY_HISTORY table containing a BUNDLE_SERIES column in your monitoring user’s schema in XE.  So, as user DBSNMP on XE, I ran:

SQL> create table dba_registry_history (ACTION_TIME TIMESTAMP(6), ACTION VARCHAR2(30), NAMESPACE VARCHAR2(30), VERSION VARCHAR2(30), ID NUMBER, BUNDLE_SERIES VARCHAR2(30), COMMENTS VARCHAR2(255));

Since one cannot patch XE, the real DBA_REGISTRY_HISTORY view has no rows and so you do not need to populate any data into this new table.

After adding the table, force a recalculation of dynamic properties by running the following against the EM13c management agent on the XE server:

$ emctl reload agent dynamicproperties SID.domain.com:oracle_database
Oracle Enterprise Manager Cloud Control 13c Release 1
Copyright (c) 1996, 2015 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD recompute dynprops completed successfully

Once that completed successfully, my XE target started to show the correct status in EM13c and I could submit jobs against it.  All fixed.  I recommend deleting the DBSNMP.DBA_REGISTRY_HISTORY table once the bug gets fixed in OEM.

[EDIT 20160216: Oracle has documented this issue in MOS note EM13c: Database Target Status Shows “Dynamic Category property error” In 13c Cloud Control (Doc ID 2105001.1) and in bug 22592461 DATABASE TARGET STATUS SHOWS “DYNAMIC CATEGORY PROPERTY ERROR” IN 13C CONSOLE. Users on supported databases (e.g., not Oracle XE) should follow the resolution steps in that document instead to correct the real error.]

EM12c opatchauto, SHA256, and you

This post serves to document an issue I encountered after replacing expired SSL/TLS certificates on the server I use for Oracle Enterprise Manager 12c. To put it simply, using opatchauto to apply EM12c PSUs does not work if your WebLogic adminserver has a certificate installed that uses the SHA256 hashing algorithm.

[UPDATED 20151012: Please see this comment and this comment below, from Adam Robinson, who has provided a workaround that may work for you involving editing the opatchauto script to enable JSSE. As always, please consider workarounds requiring you to edit files as unsupported and at your own risk, but I would consider this fix superior to reverting back to the demo certificate every time you need to patch. You will need to repeat this fix every time you update OPatch in your OMS_HOME, though. Adam’s workaround does succeed in my environment.]

Error message

Expect to see the following error when running “opatchauto apply -analyze” or “opatchauto apply” against an installation with an SHA256-hashed certificate on the WLS adminserver:

oracle@omshost:/oracle/stage/21603255> opatchauto apply -analyze -property_file ~/property_file 
OPatch Automation Tool
Copyright (c) 2014, Oracle Corporation.  All rights reserved.


OPatchauto version : 11.1.0.12.3
OUI version        : 11.1.0.12.0
Running from       : /oracle/oem/Middleware12cR4/oms
Log file location  : /oracle/oem/Middleware12cR4/oms/cfgtoollogs/opatch/opatch2015-09-11_10-57-19AM_1.log

OPatchauto log file: /oracle/oem/Middleware12cR4/oms/cfgtoollogs/opatchauto/21603255/opatch_oms_2015-09-11_10-57-22AM_analyze.log



OPatchauto failed to establish JMX connection to weblogic server. This could be because of one (or) more of the following reasons:
1. Weblogic admin server URL that manages OMS application may not be right.
2. Weblogic admin server credentials (username, password) may not be right.
3. Virtual host configuration. If OMS, weblogic server are on virtual host configuration, Please make sure to add OPatchAuto.OMS_DISABLE_HOST_CHECK=true to command line and run again. (example: /oracle/oem/Middleware12cR4/oms/OPatch/opatchauto apply -analyze -property_file /home/oracle/property_file -invPtrLoc /oracle/oem/Middleware12cR4/oms/oraInst.loc  OPatchAuto.OMS_DISABLE_HOST_CHECK=true)

Please check above conditions and if error(s) still persist, Please contact Oracle support.


[ Error during Get weblogic Admin Server information Phase]. Detail: OPatchauto was not able to find right interview inputs.
OPatchauto failed: 
OPatchauto failed to establish JMX connection to weblogic server. This could be because of one (or) more of the following reasons:
1. Weblogic admin server URL that manages OMS application may not be right.
2. Weblogic admin server credentials (username, password) may not be right.
3. Virtual host configuration. If OMS, weblogic server are on virtual host configuration, Please make sure to add OPatchAuto.OMS_DISABLE_HOST_CHECK=true to command line and run again. (example: /oracle/oem/Middleware12cR4/oms/OPatch/opatchauto apply -analyze -property_file /home/oracle/property_file -invPtrLoc /oracle/oem/Middleware12cR4/oms/oraInst.loc  OPatchAuto.OMS_DISABLE_HOST_CHECK=true)

Please check above conditions and if error(s) still persist, Please contact Oracle support.

Log file location: /oracle/oem/Middleware12cR4/oms/cfgtoollogs/opatchauto/21603255/opatch_oms_2015-09-11_10-57-22AM_analyze.log

Recommended actions: Please correct the interview inputs and run opatchauto again.

OPatchauto failed with error code 231

Confirmation of the issue

To confirm this issue in your environment after receiving the preceding error message, check the hashing algorithm used on your adminserver certificate. I prefer to use the openssl commandline tool for this. If you don’t know the port used for your adminserver, you can retrieve it from the $EM_INSTANCE_BASE/em/EMGC_OMS1/emgc.properties file under AS_HTTPS_PORT. If your certificate does not show the usage of SHA256 (or another hash algorithm from the SHA-2 family) as in my example below, you may have a different problem.

oracle@omshost:~> openssl s_client -prexit -connect omshost.domain.com:7103 < /dev/null | openssl x509 -text -in /dev/stdin | grep "Signature Algorithm" 2> /dev/null
        Signature Algorithm: sha256WithRSAEncryption
    Signature Algorithm: sha256WithRSAEncryption

Workaround

To work around this issue, you need to (temporarily!) replace the certificate on your WLS adminserver. Now, whenever I need to apply a PSU release, I resecure WLS using the default demonstration certificate, apply the PSU, then replace my original certificate.

oracle@omshost:/oracle/stage/21603255> emctl secure wls -use_demo_cert
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
Securing WLS... Started.
Enter Enterprise Manager Root (SYSMAN) Password :
Securing WLS... Successful
Restart OMS using 'emctl stop oms -all' and 'emctl start oms'
oracle@omshost:/oracle/stage/21603255> emctl stop oms -all ; sleep 5 ; emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
Stopping WebTier...
WebTier Successfully Stopped
Stopping Oracle Management Server...
Oracle Management Server Successfully Stopped
Oracle Management Server is Down
Stopping BI Publisher Server...
BI Publisher Server Successfully Stopped
AdminServer Successfully Stopped
BI Publisher Server is Down
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
Starting Oracle Management Server...
Starting WebTier...
WebTier Successfully Started
Oracle Management Server Successfully Started
Oracle Management Server is Up
Starting BI Publisher Server ...
BI Publisher Server Successfully Started
BI Publisher Server is Up

[install the PSU according to the README instructions, including any post-installation steps]

oracle@omshost:/oracle/stage/21603255> emctl secure wls -wallet /oracle/oem/oemwallet
Oracle Enterprise Manager Cloud Control 12c Release 4  
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
Securing WLS... Started.
Enter Enterprise Manager Root (SYSMAN) Password : 
Securing WLS... Successful
Restart OMS using 'emctl stop oms -all' and 'emctl start oms'
If there are multiple OMSs in this environment, perform this configuration on all of them.
oracle@omshost:/oracle/stage/21603255> emctl stop oms -all ; sleep 5 ; emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
Stopping WebTier...
WebTier Successfully Stopped
Stopping Oracle Management Server...
Oracle Management Server Successfully Stopped
Oracle Management Server is Down
Stopping BI Publisher Server...
BI Publisher Server Successfully Stopped
AdminServer Successfully Stopped
BI Publisher Server is Down
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation.  All rights reserved.
Starting Oracle Management Server...
Starting WebTier...
WebTier Successfully Started
Oracle Management Server Successfully Started
Oracle Management Server is Up
Starting BI Publisher Server ...
BI Publisher Server Successfully Started
BI Publisher Server is Up

I have not noticed any other EM12c issues using SHA256-hashed certificates. With this workaround, you can both continue to use quality certificates and keep your OMS patched.

EM12c OHS, LOW strength ciphers, custom certificates, and patch 19948000 weirdness

This post documents an unusual issue I encountered with the Oracle HTTP Server (OHS) installation in my Oracle Enterprise Manager 12c R4 (12.1.0.4) environment after following MOS note 1984662.1 and applying patch 19948000 (CPUJAN2015) to my OHS home.  It also contains a workaround I found that you should consider UNSUPPORTED, UNOFFICIAL, and NOT RECOMMENDED, only for use if absolutely necessary to meet auditor requirements.  If you do not have to follow the steps I describe below, I suggest waiting for new patches and further guidance from Oracle Support. If this change breaks your system and eats all the food in the break room refrigerator, I warned you not to do it.

Like other security-conscious EM12c admins, I want to keep my installation secure, and so I watch closely when security patches become available for EM12c or its various components. Thus, when I noticed patch 19948000’s availability for OHS, which disables SSLv3, I installed it on my system, and confirmed through testing that OHS no longer permitted SSLv3 connections (test for yourself with: openssl s_client -connect host.domain.com:port -ssl3, or try my EM12c SSL security checkup script that I have blogged about previously).

As I proceeded with further hardening of my EM12c system, specifically an attempt to disable LOW and MEDIUM strength cipher suite usage as per MOS note 1477287.1, I noticed that after following the directions provided, all of my EM12c endpoints correctly rejected LOW and MEDIUM strength ciphers, with one exception.  The OMS HTTPS upload port, inexplicably, continued to permit LOW strength connections. It refused MEDIUM strength ciphers, but had no problem accepting a LOW strength DES-CBC-SHA connection over TLSv1:

$ openssl s_client -connect omshost.domain.com:4902 -cipher LOW
[...]
SSL handshake has read 4109 bytes and written 385 bytes
---
New, TLSv1/SSLv3, Cipher is DES-CBC-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DES-CBC-SHA
    Session-ID: 37BF30668DCAD2CC5D0BAC4142CC1FA1
    Session-ID-ctx:
    Master-Key: [redacted]
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1429290250
    Timeout   : 300 (sec)
---

This confused me greatly, as I had edited all configuration files as instructed, and none of my other OHS listen ports accepted this LOW strength cipher connection.  I spent quite a bit of time trying to diagnose and resolve the issue with no luck, until I eventually stumbled upon an odd fix.  If I removed or commented out the “IfDefine SSL” directives from my $GC_INSTANCE_HOME/WebTierIH1/config/OHS/ohs1/httpd_em.conf file, OHS suddenly refused LOW strength cipher connections on this port, with no apparent ill effect on the other listening ports.

$ openssl s_client -connect omshost.domain.com:4902 -cipher LOW 
CONNECTED(00000003)
2282780:error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure:s23_clnt.c:769:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 7 bytes and written 67 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
---

I have noted these IfDefine SSL directives with “HERE” in the excerpt below from my httpd_em.conf file.

##
## CAUTION: Edit only the .template version of this file!
##
##     The command
##         emctl secure [lock|unlock]
##     will replace httpd_em.conf (discarding your changes) 
##     using the httpd_em.conf.template file.
##
## This file contains virtual hosts and other directives
## required for the "Enterprise Manager Central Console"
## to function correctly.
##

#UseWebCacheIp On

<IfDefine SSL>      #### HERE
    Listen 4902
    <VirtualHost *:4902>
        <Location /empbs/upload>
            Order allow,deny
            Allow from all
        </Location>
        <Location /empbs/jobrecv>
            Order allow,deny
            Allow from all
        </Location>
        <Location /em>
            Order allow,deny
            Allow from all
        </Location>
        <Location /agent_download>
            Order allow,deny
            Allow from all
        </Location>
        <Location /xmlpserver>
            Order allow,deny
            Allow from all
        </Location>

        #DocumentRoot &ORACLE_HOME&/Apache/Apache/htdocs
        ServerName omshost.domain.com
        #Port 4902
        Timeout 900

        LogFormat "%h %l %u %t \"%r\" %>s %b [ecid: %{ECID-Context}i] [User-Agent: %{User-Agent}i]" common
        SetEnvIf Request_URI "\.(bmp|jpg|png|gif|css|js$)" no-log
        SetEnvIf Request_URI "/em/dynamicImage/*"  no-log
        CustomLog "|${ORACLE_HOME}/ohs/bin/odl_rotatelogs /oracle/oem/gc_inst1/WebTierIH1/diagnostics/logs/OHS/ohs1/em_upload_https_access_log 10M 100M" common env=!no-log

        ErrorLog "|${ORACLE_HOME}/ohs/bin/odl_rotatelogs /oracle/oem/gc_inst1/WebTierIH1/diagnostics/logs/OHS/ohs1/em_upload_https_error_log 10M 100M"
        SSLEngine on
        SSLCipherSuite HIGH
        SSLWallet file:/oracle/oem/gc_inst1/WebTierIH1/config/OHS/ohs1/keystores/upload
        SSLProtocol TLSv1

        <Files ~ "\.(cgi|shtml)$">
            SSLOptions +StdEnvVars
        </Files>
        #<Directory &ORACLE_HOME&/Apache/Apache/cgi-bin>
        #    SSLOptions +StdEnvVars
        #</Directory>
        SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
    </VirtualHost>
</IfDefine>     #### HERE
[remainder of file removed]

If I leave the IfDefine SSL statements in there, my OMS upload port accepts the weak DES-CBC-SHA cipher along with HIGH strength ciphers.  If I remove the IfDefine SSL, my OMS upload port refuses DES-CBC-SHA along with all other LOW/MEDIUM strength ciphers.

This makes no sense, given what I know of OHS and Apache-like products and the way they handle the SSLCipherSuite configuration directive.

I raised this issue on Twitter and heard back from Andrew Bulloch at Oracle, who graciously spent quite a bit of time attempting to reproduce the issue on his side and working with me to identify the situations in which this behavior occurs.  After much testing, it appears that this behavior only occurs in the following situation:

  1. The OHS installed with EM12c R4 has patch 19948000 installed, AND
  2. The administrator has installed a third party SSL certificate, replacing the demo certificate used by default, AND
  3. The OHS httpd_em.conf contains the “IfDefine SSL” directive.

If I remove my custom certificate, returning the OMS to the demo certificate, the issue disappears, then returns if I reinstall the custom certificate.

If I remove patch 19948000, the issue disappears, and does not return whether I use a custom certificate or a demo certificate.

If I remove the IfDefine SSL directive, the issue disappears, and does not return whether I use a custom certificate, a demo certificate, or whether or not I have patch 19948000 installed.

I attempted to replicate this behavior with an SSL certificate that did not come from a true certificate authority: I used OpenSSL to create a CA, created a cert, signed it, and installed it into OHS per the documentation in MOS note 1399293.1. I could not reproduce the issue this way, possibly because I used a certificate signed directly by a root CA (as with the demo certificate) rather than a certificate signed by an intermediate chain certificate that was itself signed by a root CA, as with the paid-for commercial certificate that revealed the issue. I have not had a chance to test that configuration.
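For reference, the test setup amounted to something like the following sketch; subject names and filenames are placeholders:

$ # create a throwaway root CA
$ openssl req -new -x509 -days 365 -nodes -newkey rsa:2048 \
    -keyout testca.key -out testca.crt -subj "/CN=Test Root CA"
$ # create a server key and signing request
$ openssl req -new -nodes -newkey rsa:2048 \
    -keyout oms.key -out oms.csr -subj "/CN=omshost.domain.com"
$ # sign the server cert directly with the root CA (no intermediate)
$ openssl x509 -req -days 365 -in oms.csr \
    -CA testca.crt -CAkey testca.key -CAcreateserial -out oms.crt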

Unfortunately, removing patch 19948000 means that OHS cannot refuse SSLv3 connections, and removing the custom certificate reverts the system back to the demo certificate that I do not wish to use, both of which will represent audit findings in regulated sites.

Due to this issue, I have edited my EM12c security checkup script to remove my recommendation to install patch 19948000, although I still have it installed.  For security reasons, I will leave my system in the workaround state I have described here, as I want SSLv3 disabled, and I want LOW strength cipher suites disabled, and I want to use a custom SSL certificate, but I accept the risk that I may have to undo this setup at any time to receive support or to successfully apply later patches.  You will have to make your own decisions based on your site’s audit requirements and the availability of personnel to validate your configuration and handle future patching.

I would be very interested if anyone else reading this has encountered this issue, as I do not know if my installation somehow uniquely surfaces this behavior or if the certificate vendor that we used has some strange settings on their certificates that cause confusion for OHS.

When proactive EM12c JDK upgrades bite back

You probably will not encounter this issue, but I will post this anyway to get the error message and resolution indexed by Google.

While attempting to apply patch 19513382 (EM agent bundle patch 12.1.0.4.3) to my EM12cR4 agents, I ran into multiple problems.  Initially it would not apply to any of my agents.  Bug 20134182 and the resolution described in MOS note 1952355.1 resolved the first problem (OPatch reporting that identical patches 18721761 and 18502187 already exist), but that left me with one agent I could not upgrade. Attempts to run patch plan validation within EM12c produced the following error:

PatchList : 19513382
PatchLocList : /tmp/p19513382_600000000009641_2000_0/oraagent
TargetName : [redacted]:[port]
----------------------------------------
[11_12_2014_10_00_40] Command Arguments:
/oraagent/agent12c/core/12.1.0.4.0/OPatch/opatch checkComponents -phbasedir /tmp/p19513382_600000000009641_2000_0/oraagent/19513382 -oh /oraagent/agent12c/core/12.1.0.4.0 -invPtrloc /oraagent/agent12c/core/12.1.0.4.0/oraInst.loc
 
OPatch cannot continue because it would not be able to load OUI platform dependent library from the directory "/oraagent/agent12c/core/12.1.0.4.0/oui/lib/linux64". The directory does not exist in the Oracle home.
This could be due to the following reasons.

(1) Incompatible usage of java with OUI (32/64 bit).
(2) Wrong 32-bit Oracle Home installation in 64-bit environment (or) vice-versa.
Please contact Oracle support for more details.
 
OPatch failed with error code 1
 
PREREQ_CONTEXT_HOST_NAME:[redacted]
REREQ_CONTEXT_HOME_LOCATION:/oraagent/agent12c/core/12.1.0.4.0
PREREQ_NAME: Checking if the patches are applicable.
PREREQ_DESC: Checking if the patches are applicable on the Management Agent.
PREREQ_TYPE:APPLICABILITY
PREREQ_STATUS:FAILED
PREREQ_MESG: None of these patches are applicable on the Management Agent.
PREREQ_MESG_PATCH:19513382
PREREQ_REMEDY:MANUAL
PREREQ_REMEDY_DETAILS: Remove patch(es) 19513382 from this patch plan.

I already know from the previously referenced MOS note that OPatch 11.1.0.12.3 contains bugs, so as a first debugging step I attempted to roll back the OPatch upgrade by restoring the backup copies of OPatch found in $AGENT_HOME/OPatch/backup/.  I received the same error message with OPatch 11.1.0.10.4 and 11.1.0.11.0.  I also received a similar error when I simply ran “opatch lsinv” from the command line with the older versions. So OPatch did not cause this issue.

Since the error message mentions 32-bit and 64-bit incompatibility, I needed to consider the environment.  This server runs Linux x86-64 (SLES 10 SP4), but must use a 32-bit EM agent based on the certification matrix and MOS note 1488161.1. I next checked to find my last successful patch run, which happened only a month ago, so a recent change has to have caused this problem. Going through my notes, the only recent change on this server involved upgrading the JDK used by the EM agent per MOS note 1944044.1.

Luckily I still had the old JDK available for comparison.

> java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
Java HotSpot(TM) Server VM (build 20.14-b01, mixed mode)
> file `which java`
/oraagent/agent12c/core/12.1.0.4.0/jdk/bin/java: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped

Looking at the new JDK:

> ./java -version
java version "1.6.0_85"
Java(TM) SE Runtime Environment (build 1.6.0_85-b13)
Java HotSpot(TM) 64-Bit Server VM (build 20.85-b01, mixed mode)
> file java
java: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped

There I had my problem. In upgrading the JDK, I had installed the 64-bit version of Java 1.6u85, not the 32-bit version, based on the fact that the server runs a 64-bit OS. I had not considered that a 64-bit JDK would not remain compatible with the 32-bit agent on this 64-bit system.

Surprisingly, everything about the agent seems to have worked fine, despite the 64-bit JDK.  Nothing broke until I attempted to use OPatch.

To resolve the issue, I stopped the agent and moved the original 32-bit 1.6u43 JDK back to where it belongs, followed note 1952355.1 to work around the known bugs when using OPatch 11.1.0.12.3 to apply 19513382, then successfully applied the patch.  After that I downloaded the correct 32-bit version of the 1.6u85 JDK, installed it per 1944044.1, and now OPatch works as expected.
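In shell terms, the JDK swap looked roughly like this sketch; the location of the saved 32-bit JDK is hypothetical, and the agent home matches the one shown in the logs above:

$ emctl stop agent
$ mv /oraagent/agent12c/core/12.1.0.4.0/jdk /oraagent/agent12c/core/12.1.0.4.0/jdk.x64
$ mv /oraagent/jdk1.6.0_43-32bit /oraagent/agent12c/core/12.1.0.4.0/jdk
$ emctl start agent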

Stale EM12c patch recommendations? Get patch 14822626

I don’t use the automated patching functionality provided by EM12c.  I do, however, get value out of the patch recommendations since they serve as a good reminder when I’ve missed a patch that should be applied to one of my targets.  For this reason I was disappointed when, after upgrading from EM12cR1 to EM12cR2, the patch recommendations it gave me became stale and stopped getting updated when I loaded the em_catalog.zip file.

If you DO use the automated patching functionality, you have probably already followed all of the advice and installed the required patches documented in MOS note 427577.1, “Enterprise Manager patches required for setting up Provisioning, Patching and Cloning (Deployment Procedures)”.  In that case you already have this patch installed and don’t need to read any further, but if not, read on.

After upgrading to EM12cR2, I also upgraded several databases from 10gR2 to 11gR2.  Months passed, and yet the patch recommendations EM12c gave me continued to refer to 10gR2 patches which I knew weren’t applicable as I was running 11gR2.  I tried several things, like setting EM12c to offline mode, to online mode, loading em_catalog.zip, re-running the various “Refresh From My Oracle Support” jobs, all without ever receiving fresh patch recommendations.

So to sort this out, I did what I usually do, and asked about it on Twitter.  Big thanks to Sudip Datta, Vice President of Product Management at Oracle, who pointed me to bug 14822626 and its associated patch.  The bug does not appear to be public, but MOS note 1522918.1, “12C – Patch Recommendations Not Updating After Upgrade To 12.1.0.2 Cloud Control – ‘…Patch Recommendations Computation is disabled … skipping …'” documents the problem as a known issue after upgrading from 12.1.0.1 to 12.1.0.2 and clearly matches the behavior I saw.

As soon as I applied patch 14822626, the old stale patch recommendations were cleared out, and once I loaded the current em_catalog.zip file, I had accurate patch recommendations for my environment that I can now use to make decisions going forward.

Thank you, Sudip!

Why your EM12cR2 FMW stack probably needs patch 13490778 to avoid OHS down/up events

MOS note 1496775.1 describes a situation with EM12cR2 where OEM will falsely report the Oracle HTTP Server instance (ohs1) as down, even though it is up.  This is due to some changes in FMW 11.1.1.6.  If you don’t have any incident rules or notifications set up that would catch this event, it’s easy to miss it and not know that it is happening.  I had run into this note a couple of times before but ignored it, since I had never seen any open events complaining about OHS being down, so I figured I just wasn’t hitting the bug.

This morning I caught one of the events.  I found myself wondering how often this had been happening — was it an issue once every couple days, every few hours, or what?

SQL> col msg format a45
SQL> select msg, count(*) from sysman.mgmt$events
  2  where closed_date >= sysdate - 1 and msg like '%HTTP Server instance%'
  3  group by msg;

MSG                                             COUNT(*)
--------------------------------------------- ----------
CLEARED - The HTTP Server instance is up             430
The HTTP Server instance is down                     430

Turns out it had been happening a LOT.  If you’ve followed Oracle’s recommendations and set up target lifecycle status priorities (see my post on doing so), you’ve probably set your OEM targets up with “MissionCritical” priority.  That means your OMS has been burning a lot of CPU to process all these up/down events on a mission critical target with high priority, potentially delaying processing of other events elsewhere in your environment.

Applying patch 13490778, with ORACLE_HOME set to $MW_HOME/oracle_common should resolve this issue.  For best results, stop all OEM components prior to patch application and restart them when complete.
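The patch application itself follows the usual OPatch pattern; a sketch, with a hypothetical staging path:

$ emctl stop oms -all
$ export ORACLE_HOME=$MW_HOME/oracle_common
$ cd /oracle/stage/13490778
$ $ORACLE_HOME/OPatch/opatch apply
$ emctl start oms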

To convince yourself that applying the patch helped, re-run that query about 15 minutes after applying the patch and you should see the count decrease.

Final (!) update/fix for EM12c increased redo from EM_JOB_METRICS after upgrade to R2 (bug/patch 14726136)

Update 13 Nov 2012: Patch 14726136 has been obsoleted. The note on MOS indicated that the patch caused some metrics to be calculated incorrectly. The patch has been replaced by patch 14833587. I have applied the new patch and all appears well — jobs are running and redo generation on the repository database remains where it was before the upgrade to EM12c R2.

Following up again on the EM12cR2 upgrade issue from BP1 that causes significantly increased redo logging on the repository database due to heavy insert activity in EM_JOB_METRICS. I first covered this here, with a followup here, a partial workaround here, and then a warning here.

Patch 14726136 has been re-released on MOS. The initial release caused problems for me as it prevented all OEM scheduled jobs from running, eventually causing me to rollback the patch. I am very pleased to report that the new update of the patch (from Oct 30th) applies cleanly and all of my jobs are now running on-schedule and completing successfully. EM_JOB_METRICS is showing no more than 11 inserts per second, and more than a minute is passing between the batches of inserts. My redo volume is already down significantly.

Big thanks to Oracle support and development and the EM12c team!

Patch 14726136

You may have seen the patch named in the title drop, addressing the increased redo logging after upgrading to EM12cR2.

I tried the patch Friday morning, and while it did decrease my redo logging, it also prevented any OEM scheduled jobs from running. I eventually rolled back the patch and jobs ran again.

I would be very interested to know if the patch works for you AND your jobs still run.

Partial workaround for increased redo generation on EM12cR2 repository database after upgrade from R1

Another followup to my previous posts on increased redo logging on the repository database after upgrading to EM12cR2.

On the OTN Forum thread, slohit posted a suggestion with a way to reduce the frequency of job dispatcher runs:

To reduce the frequency of inserts, it would be better to slow down the frequency of the dispatcher. This is OK only if you are fine with a lower response time between steps for your jobs. If it is only a few jobs/job steps scheduled to run together, this is all right

emctl set property -name oracle.sysman.core.jobs.minDispatcherSleepInterval -value 0.5
emctl set property -name oracle.sysman.core.jobs.linearBackoff -value 0.5

After making this change and bouncing the OMS, redo generation on my repository database decreased by about 50%. It’s still 200-300% more than before the upgrade, but it’s quite an improvement. The EM_JOB_METRICS table now only receives about 20 inserts per second compared to 60. My 13GB repository database should now only produce about 5GB of redo per day instead of 10GB.

Prior to the change, emctl get property reported these property values as null so presumably the OMS was working with whatever the built-in default values were.

In my case, a delay between job steps is a tradeoff that I can tolerate. This OMS runs simple database backups, SQL and RMAN scripts, so I do not have any time-critical multi-step jobs that could be negatively impacted by delayed response to completion of individual job steps. Your mileage may vary.

Oracle Support is continuing to investigate this bug. At least one other poster on the forum has reported that they too are experiencing this issue so I’m looking forward to finding out if this change reduces their redo generation to an acceptable rate. That same poster also reports that they do not see the issue on a new install of EM12cR2, only on a system upgraded from R1. It’s not clear to me why an upgraded system would experience this when processing the raw metric data while a fresh install would not, nor is it clear how many upgraded systems are experiencing the issue. But at least we now have a partial workaround for bug 14726136.

BUG 14726136 – INCREASED REDO LOGGING ON REPOSITORY DB AFTER UPGRADING TO OEM 12.1.0.2

Following up on my previous post about increased redo logging on the repository DB after upgrading to OEM 12cR2, we now have bug 14726136 filed.

The heavy redo activity is coming from SYSMAN inserts into the EM_JOB_METRICS table. It’s doing 60 inserts a second, in batches of 10 per LOG_TIMESTAMP, where each insert looks like:

insert into "SYSMAN"."EM_JOB_METRICS"("SOURCE","LOG_TIMESTAMP","METRIC_NAME","METRIC_QUALIFIER","METRIC_VALUE") values
('omshost.domain.com:4890_Management_Service',TO_TIMESTAMP('26-SEP-12 03.09.25.215415 PM'),'DISP_REQ_STEPS','0','25');

The 10 inserts differ only in the values of the METRIC_QUALIFIER and METRIC_VALUE columns:

'0', '25'
'0', '0'
'1', '12'
'1', '0'
'2', '25'
'2', '0'
'3', '10'
'3', '0'
'4', '10'
'4', '0'
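If you want to watch the insert rate yourself, a query like this sketch will bucket EM_JOB_METRICS rows by second (run as SYSMAN; the connect string is a placeholder, and the column names come from the insert statement above):

$ sqlplus -s sysman@emrep <<'EOF'
select to_char(log_timestamp, 'HH24:MI:SS') sec, count(*)
  from sysman.em_job_metrics
 where log_timestamp > systimestamp - interval '5' minute
 group by to_char(log_timestamp, 'HH24:MI:SS')
 order by 1;
EOF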

LogMiner shows this as a significant proportion of the total database activity, with the following extracts based on about 20 minutes (or about 200MB) of repository DB operation:

SQL> select TABLE_NAME, count(*) from v$logmnr_contents group by table_name order by 2 desc;

TABLE_NAME                       COUNT(*)
------------------------------ ----------
EM_JOB_METRICS                     201740
                                   163843
EM_METRIC_ITEMS                     13374
EM_METRIC_VALUES                    12524
MGMT_AVAILABILITY_MARKER             3845
COL_USAGE$                           2403
ROUT                                 1453
SCHEDULER$_JOB                       1067


SQL> select seg_owner, operation, count(*) from v$logmnr_contents group by seg_owner, operation order by 1;

SEG_OWNER  OPERATION           COUNT(*)
---------- ----------------- ----------
DBSNMP     DELETE                    13
DBSNMP     INSERT                    13
DBSNMP     UPDATE                     4
RCAT       DELETE                   269
RCAT       INSERT                  1372
RCAT       SELECT_FOR_UPDATE         50
RCAT       UPDATE                   884
SYS        DELETE                    88
SYS        INSERT                  1041
SYS        SELECT_FOR_UPDATE        265
SYS        UPDATE                  4802
SYSMAN     DELETE                 97325
SYSMAN     INSERT                106610
SYSMAN     INTERNAL                 196
SYSMAN     LOB_TRIM                   5
SYSMAN     LOB_WRITE                326
SYSMAN     SELECT_FOR_UPDATE       1203
SYSMAN     SEL_LOB_LOCATOR          103
SYSMAN     UNSUPPORTED            13626
SYSMAN     UPDATE                 21808
UNKNOWN    DELETE                    82
UNKNOWN    INSERT                    80
UNKNOWN    SELECT_FOR_UPDATE         65
UNKNOWN    UPDATE                    65
           COMMIT                 19792
           DPI SAVEPOINT             15
           INTERNAL              124178
           ROLLBACK                  33
           START                  19825
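For reference, the LogMiner session behind these extracts looks roughly like this sketch; the archived log name is a placeholder, and you can add as many logs as you want to mine:

$ sqlplus -s "/ as sysdba" <<'EOF'
exec dbms_logmnr.add_logfile('/oraarch/1_1234_567890.dbf', dbms_logmnr.new);
exec dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog);
-- run the v$logmnr_contents queries shown above, then:
exec dbms_logmnr.end_logmnr;
EOF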

This is not a particularly busy OMS; it’s monitoring fewer than 200 targets and only runs a single job every couple of minutes.

Followup: EM12cR2 reports Job Purge repository job as Down

This is a follow-up to EM12cR2 reports Job Purge repository job as Down. Erik at the OTN Forum has provided a fix:

EM12cR2 – “Job Purge” repository scheduler job falsely(?) reported down.

Simply run the following update connected as user SYSMAN on your repository database, and the repository jobs will all show as clear and the open incident will be closed:

UPDATE MGMT_PERFORMANCE_NAMES SET display_name = NULL,
dbms_jobname = NULL, is_dbmsjob = 'N', is_deleted = 'Y'
WHERE UPPER(DBMS_JOBNAME) LIKE 'MGMT_JOB_ENGINE.APPLY_PURGE_POLICIES%';
COMMIT;

Patch 6895422 confirmed to fix bug 13729794 (TOO_MANY_OPEN_FILES) in EM12cR2

Update 20130619: While patch 6895422 is still available, Oracle has released patch 16087066, which contains the fix from 6895422 along with a fix for bug 13583799.  The general recommendation from Oracle now is to use patch 16087066.  Also, based on a comment in this MOS post, the 12.1.0.3 EM agent will include this patch and we will no longer need to apply it manually once the next version of the agent is released.

I’ve confirmed that patch 6895422 fixes the TOO_MANY_OPEN_FILES bug encountered with 12.1.0.2.0 agents in an Oracle Enterprise Manager Cloud Control 12c Release 2 environment.

Witness the following metric value graphs of unpatched and patched agents for the “number files open” metric over seven days:

[Unpatched agent metric graph]

[Patched agent metric graph]

BUG 13729794 still impacts Enterprise Manager 12cR2

Those of us with busy servers may have run into bug 13729794, causing OEM 12c agents to eventually stop monitoring after raising a TOO_MANY_OPEN_FILES error. On 12cR1, patch 6895422 took care of this problem. Oracle’s bug database reports this bug as “fixed in 12.1.0.2.0” but in status 75 “Closed, Code fix resolution not verified.”

I had two 12cR2 agents bomb out over the weekend (nothing like logging into MOS on a day off) with TOO_MANY_OPEN_FILES. The OMS wouldn’t recognize either of them after bouncing the agent. I had to run an emctl clearstate on each of them with the agent down, then start it back up, before the 12cR2 OMS would report the agents as reachable again.
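For anyone else hitting this, the recovery sequence on each affected agent amounted to:

$ emctl stop agent
$ emctl clearstate agent
$ emctl start agent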

So it seems 12cR2 does not fix this bug, at least not on Linux x86-64. I should know in about two days whether or not this patch still fixes the problem.

EM12cR2 reports Job Purge repository job as Down

Crossposted from the OTN Forum post. According to Oracle this is a bug and a script to resolve the reported problematic job will be forthcoming.

On the Repository page (Setup -> Manage Cloud Control -> Repository), in the Repository Scheduler Jobs Status section, the “Job Purge” job is shown as down with a red arrow and blanks for both Next Scheduled Run and Last Scheduled Run. All the other jobs look good and are running successfully.

This also appears as an incident stating “DBMS Job UpDown for Job Purge exceeded the critical threshold (DOWN). Current value: DOWN”.

The framework metric reference manual for this metric indicates that this metric equates to the DBMS_JOB ‘broken’ status and suggests retrieving the job_id and re-submitting it, but the provided instructions don’t find anything useful.

SQL> sho user
USER is "SYSMAN"
SQL> select dbms_jobname from mgmt_performance_names where display_name = 'Job Purge';

DBMS_JOBNAME
--------------------------------------------------------------------------------
MGMT_JOB_ENGINE.apply_purge_policies()

SQL> select job from all_jobs where what='MGMT_JOB_ENGINE.apply_purge_policies()';

no rows selected

SQL> select count(*) from all_jobs;

COUNT(*)
----------
0

No ORA-12012 messages found in the alert/trace logs.

Repvfy also reports this as a missing scheduler job:

2. Missing DBMS_SCHEDULER jobs
------------------------------

DISPLAY_NAME                             DBMS_JOBNAME
---------------------------------------- -----------------------------------------------------
Job Purge                                MGMT_JOB_ENGINE.apply_purge_policies()

The MGMT_JOB_ENGINE package no longer contains apply_purge_policies. It seems to have moved to EM_JOB_PURGE.

According to dba_scheduler_jobs, EM_JOB_PURGE.APPLY_PURGE_POLICIES is running just fine:

SQL> select job_action, enabled, state, failure_count, last_start_date, last_run_duration
2 from dba_scheduler_jobs where job_name = 'EM_JOB_PURGE_POLICIES';

JOB_ACTION                               ENABL STATE      FAILURE_COUNT
---------------------------------------- ----- ---------- -------------
LAST_START_DATE
----------------------------------------
LAST_RUN_DURATION
----------------------------------------
EM_JOB_PURGE.APPLY_PURGE_POLICIES();     TRUE  SCHEDULED              0
14-SEP-12 05.00.00.010318 AM US/EASTERN
+000000000 00:00:09.464552

It seems to me this is some kind of metadata inconsistency between mgmt_performance_names and the DBMS scheduler, and probably isn’t something to be concerned with. Any thoughts?

EM12cR2 increased redo logging on repository database

I’ve posted this on the OTN forum here, but perhaps it’s worth a blog post.

Has anyone else noticed increased redo logging in their repository database after an upgrade to EM12cR2?

I’m running on Linux x86-64 with an 11.2.0.3 repository database. Prior to the upgrade I was running 12.1.0.1.0 with BP1, and my repository database was producing, on average, about 80-120MB of archived redo logs per hour. Since the upgrade to 12.1.0.2.0, my repository database is producing, on average, 450-500MB of archived redo logs per hour.
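For anyone wanting to compare their own numbers, a query along these lines against the repository database will bucket archived redo by hour (a sketch using v$archived_log):

$ sqlplus -s "/ as sysdba" <<'EOF'
select trunc(completion_time, 'HH24') hr,
       round(sum(blocks * block_size) / 1024 / 1024) mb
  from v$archived_log
 where completion_time > sysdate - 1
 group by trunc(completion_time, 'HH24')
 order by 1;
EOF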

Upgrading the deployed agents from 12.1.0.1.0 to 12.1.0.2.0 hasn’t seemed to help; I’ve upgraded about 10 of 15 without any apparent decrease in redo logging.

Just wondering if this is comparable to what others are seeing or if I have a local issue I should investigate.