
EM13cR2 AWR Warehouse “Error communicating with agent” during transfer step with custom certificates

I have just noticed and resolved an issue in my EM13c R2 AWR Warehouse environment that I brought upon myself. I am writing it up for anyone else who might run into it, and this also seems like a good time to release the scripts I use to generate and populate Oracle wallets for my EM13c agents.

After moving an AWRW source database from one EM13c managed server to a different EM13c managed server (same OS, same DB release), AWRW loads from that server began to fail. While debugging the issue, I first had to resolve an already-documented issue (see MOS note 2075341.1) where the source database had a NULL definition for the CAW_EXTR directory object, then fix up the data in the DBSNMP.CAW_EXTRACT_PROPERTIES table to reflect the CAW_EXTR directory. After resolving that, AWRW extracts ran successfully on the source database, but the load then hung indefinitely in the transferAWR/transferFile step of the CAW_RUN_ETL_NOW job, displaying only a cryptic error message:

[Screenshot: an unhelpful error message]

[Screenshot: a helpful error message]

I ran through many debugging steps: changing preferred credentials, bouncing the agents, and checking for firewalls blocking connectivity. None seemed to help. Eventually I realized which step I had missed in setting up the new managed server where the source database now runs: I had not generated an Oracle wallet for the agent on the new server, although I did have one for the agent on the previous, now-retired server. This mattered because I have secured the agent on my OMS host (where my AWRW repository database runs) with a custom third-party certificate. The new agent lacked a wallet containing a trusted root certificate to which it could trace the repository agent’s certificate, so the agent on the AWRW source DB host could not initiate a connection to the agent on the AWRW repository DB host.
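
One way to spot this kind of trust failure from the command line is to point openssl s_client at the repository agent's port and read back the verify return code. The sketch below only parses that output. The host name, port, and CA file in the usage comment are placeholders, and s_client only approximates what the agent does with its wallet; it does not test the wallet itself.

```shell
# Read `openssl s_client` output on stdin and print the numeric
# "Verify return code". 0 means the presented chain traced back to a
# trusted root; any other value means verification failed.
verify_return_code() {
    sed -n 's/^[[:space:]]*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical usage (host, port and CA file are placeholders):
#   echo Q | openssl s_client -connect repohost.domain.com:3872 \
#       -CAfile /tmp/root_ca.pem 2>&1 | verify_return_code
```

A result of 0 means the chain verified against the supplied root. Codes such as 19 or 20 usually mean the root certificate was not trusted or could not be found.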

I generated a wallet for the new agent, added the trusted root certificate and a certificate for the host to the wallet, stopped the agent, deployed the wallet, and started the agent. After those steps, running the AWRW load from this source database completed successfully. I believe that the missing trusted root certificate prevented the creation of a secure channel between the two agents. I probably did not need to add the host certificate to resolve this problem, but consider it a good practice anyway.
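
For readers who have not built a wallet by hand, the steps above map roughly onto the orapki commands printed by the sketch below. This is not my create_agent_wallets.sh script, only a dry run that prints the commands for one host. The WALLET_BASE directory, the CN-only distinguished name, and the certificate file names are hypothetical placeholders. With no -pwd argument, orapki should prompt for the wallet password.

```shell
# Print (do not run) the orapki commands for one agent host.
# WALLET_BASE, the DN, and the certificate file names are hypothetical.
WALLET_BASE=${WALLET_BASE:-/tmp/agent_wallets}

print_wallet_commands() {
    local host="$1"
    local dir="$WALLET_BASE/$host"
    # 1. Create an auto-login wallet
    echo "orapki wallet create -wallet $dir -auto_login"
    # 2. Generate a key pair and export a certificate signing request
    echo "orapki wallet add -wallet $dir -dn \"CN=$host\" -keysize 2048"
    echo "orapki wallet export -wallet $dir -dn \"CN=$host\" -request $dir/$host.csr"
    # 3. Add the CA's trusted root, then the signed host certificate
    echo "orapki wallet add -wallet $dir -trusted_cert -cert root_ca.cer"
    echo "orapki wallet add -wallet $dir -user_cert -cert $host.cer"
}

# Hypothetical usage:
#   print_wallet_commands server1.domain.com
```

Printing the commands first makes it easy to review them before running anything against a real wallet directory.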

If you read this far, you may find my create_agent_wallets.sh script useful for generating wallets and certificate signing requests for every agent in your environment. Its companion, import_agent_wallets.sh, populates those wallets with the signed certificates you receive from your CA.

Script to automate lock down of all EM13c agents to TLSv1.2 with EMCLI

I could not find any obvious documentation about locking down Oracle Enterprise Manager 13c management agents to forbid TLSv1 and TLSv1.1, permitting only TLSv1.2, so I went looking and found the emdpropdefs.xml file in $AGENT_HOME/agent_13.1.0.0.0/sysman/admin/ that documents the existence of the minimumTLSVersion property in emd.properties:

name='minimumTLSVersion'
modifiable='true'
defaultValue='TLSv1'
description='The oldest version of the TLS protocol which this agent should support when accepting connections or initiating connections to the OMS. Currently supported values are "TLSv1", "TLSv1.1", and "TLSv1.2".'
valueType='String'
advanced='true'
migrate='source'
filename='emd.properties'
category='Runtime Settings'
internal='true'
restartRequired='true'
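
To see what a given agent is currently configured with, you can read the property straight out of its emd.properties file. The sketch below is a minimal helper under one assumption: that the file lives at $AGENT_INSTANCE_HOME/sysman/config/emd.properties, as in a standard agent layout. When the property is absent, it reports TLSv1, the defaultValue shown in the definition above.

```shell
# Print the effective minimumTLSVersion for an emd.properties file.
# Falls back to TLSv1 (the documented defaultValue) when the property
# is not set in the file.
effective_min_tls() {
    local props="$1"
    local value
    value=$(sed -n 's/^minimumTLSVersion=//p' "$props" | tail -n 1)
    echo "${value:-TLSv1}"
}

# Hypothetical usage (path assumes a standard agent instance layout):
#   effective_min_tls "$AGENT_INSTANCE_HOME/sysman/config/emd.properties"
```

On a live agent, emctl getproperty agent -name minimumTLSVersion should report the same value.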

I tested this parameter on my OMS server agent, restarted the agent, and confirmed with my Securing Oracle Enterprise Manager 13c script that the agent no longer allowed connections using any protocol other than TLSv1.2. Next I wanted to automate this, to avoid the effort of manually changing this property on each agent and then restarting that agent, so I went directly to EMCLI, which allows EM13c admins to (among many other things) set agent properties and restart agents. I then created a script to fetch a list of all agents, check for the TLS protocols each agent permits, and then apply the change and restart the agent for every agent that I had not already locked down. I have copied this script below.

Before using the script, you must log in to EMCLI using “emcli login -username=yourusername” and provide your password. For security reasons I elected not to wrap the EMCLI login within this script; that way you do not have to trust my script to handle your password securely, as the script never sees your password. For the step that restarts your agents to work correctly, your EM13c user account needs preferred host credentials set for your agent targets that can successfully log in to the host server and restart the agent.

Here is a copy of the script, followed by the (anonymized) output from a sample run. Someday soon I will get set up on github to make it easier to retrieve my scripts, but for now you can copy and paste this. This script expects to find the emcli binary inside of the $MW_HOME/bin directory, so make sure you have $MW_HOME set before running it, or provide the full path to EMCLI within the script. It will also log you out of EMCLI when the script completes.


#!/bin/bash
#
# This script will retrieve a list of agents from your EM13c environment,
# determine if they allow connections using TLS protocol versions older
# than TLSv1.2, and then disable all protocols older than TLSv1.2.
#
# Finally it will restart each modified agent to apply the change.
#
# You need to login to EMCLI first before running this script.
#
# Released v0.1: Initial beta release 5 Oct 2016
#
#
# From: @BrianPardy on Twitter
# https://pardydba.wordpress.com/
#
# Known functional on Linux x86-64, may work on Solaris and AIX.

EMCLI=$MW_HOME/bin/emcli

# Use gegrep when it is installed (Solaris), otherwise the default grep
if [[ -x "/usr/sfw/bin/gegrep" ]]; then
    GREP=/usr/sfw/bin/gegrep
else
    GREP=$(which grep)
fi

OPENSSL=$(which openssl)

# On SuSE, use the openssl1 binary when it is present
if [[ -x "/usr/bin/openssl1" && -f "/etc/SuSE-release" ]]; then
    OPENSSL=$(which openssl1)
fi

# Non-zero when this openssl build knows the -tls1_2 option
OPENSSL_HAS_TLS1_2=$($OPENSSL s_client help 2>&1 | $GREP -c tls1_2)

# A failing sync is treated as "not logged in"
$EMCLI sync
NOT_LOGGED_IN=$?

if [[ $NOT_LOGGED_IN -gt 0 ]]; then
    echo "Login to EMCLI with \"$EMCLI login -username=USER\" then run this script again"
    exit 1
fi

for agent in $($EMCLI get_targets -targets=oracle_emd | $GREP oracle_emd | awk '{print $4}')
do
    echo
    if [[ $OPENSSL_HAS_TLS1_2 -gt 0 ]]; then
        echo -n "Checking TLSv1 on $agent... "

        # A count of 0 means no null cipher was reported, so the handshake succeeded
        OPENSSL_RETURN=$(echo Q | $OPENSSL s_client -prexit -connect "$agent" -tls1 2>&1 | $GREP Cipher | $GREP -c 0000)

        if [[ $OPENSSL_RETURN == 0 ]]; then
            echo "allows TLSv1"
        else
            echo "already forbids TLSv1"
        fi
    fi

    if [[ $OPENSSL_HAS_TLS1_2 -gt 0 ]]; then
        echo -n "Checking TLSv1.1 on $agent... "

        OPENSSL_TLS11_RETURN=$(echo Q | $OPENSSL s_client -prexit -connect "$agent" -tls1_1 2>&1 | $GREP Cipher | $GREP -c 0000)

        # Test the TLSv1.1 result here, not the TLSv1 result
        if [[ $OPENSSL_TLS11_RETURN == 0 ]]; then
            echo "allows TLSv1.1"
        else
            echo "already forbids TLSv1.1"
        fi
    fi

    # Lock down and restart only agents that still accept an older protocol
    if [[ $OPENSSL_RETURN == 0 || $OPENSSL_TLS11_RETURN == 0 ]]; then
        $EMCLI set_agent_property -agent_name="$agent" -name=minimumTLSVersion -value=TLSv1.2 -new

        echo
        echo "Restarting $agent to apply changes"
        $EMCLI restart_agent -agent_name="$agent" -credential_setname="HostCreds"
        RESTART_RETURN=$?

        if [[ $RESTART_RETURN -ne 0 ]]; then
            echo "Unable to restart agent: restart agent manually or set preferred host credentials for agent"
        fi
    fi
done

$EMCLI logout

exit 0
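
The protocol check inside the loop rests on one observation: when a handshake fails, the session that openssl s_client prints reports the null cipher 0000. The sketch below isolates that test as a small function that reads s_client output on stdin, so you can try it against a single agent before running the full script. The function name is my own for illustration.

```shell
# Read `openssl s_client` output on stdin. Print "forbidden" when a
# Cipher line reports the null cipher 0000 (no handshake completed),
# otherwise print "allowed".
classify_handshake() {
    if grep Cipher | grep -q 0000; then
        echo "forbidden"
    else
        echo "allowed"
    fi
}

# Hypothetical usage against one agent:
#   echo Q | openssl s_client -prexit -connect server1.domain.com:3872 -tls1 2>&1 | classify_handshake
```

Output with no Cipher line at all, such as a refused connection, would also be classified as "allowed", so this test assumes the agent is reachable.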

Sample (anonymized) output below. Note how the script cannot restart an agent lacking preferred host credentials. In this case, I assign preferred host credentials and then re-run the script to complete the process.


Synchronized successfully

Checking TLSv1 on server1.subdomain.domain.com:1830... already forbids TLSv1
Checking TLSv1.1 on server1.subdomain.domain.com:1830... already forbids TLSv1.1

Checking TLSv1 on server2.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server2.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server3.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server3.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server4.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server4.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server5.domain.com:1830... already forbids TLSv1
Checking TLSv1.1 on server5.domain.com:1830... already forbids TLSv1.1

Checking TLSv1 on server6.domain.com:1830... already forbids TLSv1
Checking TLSv1.1 on server6.domain.com:1830... already forbids TLSv1.1

Checking TLSv1 on server7.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server7.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server8.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server8.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server9.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server9.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server10.domain.com:1830... already forbids TLSv1
Checking TLSv1.1 on server10.domain.com:1830... already forbids TLSv1.1

Checking TLSv1 on server11.domain.com:1830... already forbids TLSv1
Checking TLSv1.1 on server11.domain.com:1830... already forbids TLSv1.1

Checking TLSv1 on server12.domain.com:1830... already forbids TLSv1
Checking TLSv1.1 on server12.domain.com:1830... already forbids TLSv1.1

Checking TLSv1 on omshost.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on omshost.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server13.domain.com:3872... allows TLSv1
Checking TLSv1.1 on server13.domain.com:3872... allows TLSv1.1
Agent Property minimumTLSVersion has been successfully updated to the value TLSv1.2.

Restarting server13.domain.com:3872 to apply changes
The Restart operation is in progress for the Agent: server13.domain.com:3872
The Agent "server13.domain.com:3872" has been restarted successfully.
---------------------
Operation Output
---------------------
Oracle Enterprise Manager Cloud Control 13c Release 1
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.Stopping agent ... stopped.Oracle Enterprise Manager Cloud Control 13c Release 1
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.Starting agent ................ started.

Checking TLSv1 on server14.domain.com:1830... allows TLSv1
Checking TLSv1.1 on server14.domain.com:1830... allows TLSv1.1
Agent Property minimumTLSVersion has been successfully updated to the value TLSv1.2.

Restarting server14.domain.com:1830 to apply changes
The Restart operation is in progress for the Agent: server14.domain.com:1830
Unable to restart agent: restart agent manually or set preferred host credentials for agent

Checking TLSv1 on server15.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server15.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server16.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server16.domain.com:3872... already forbids TLSv1.1

Checking TLSv1 on server17.domain.com:3872... already forbids TLSv1
Checking TLSv1.1 on server17.domain.com:3872... already forbids TLSv1.1
Logout successful
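
With more than a handful of agents, it helps to save the script output to a file and pull out just the agents that still accepted an older protocol. The sketch below assumes the output format shown above and a log file of your choosing; the function name and log path are placeholders.

```shell
# List (once each) the agents that a saved run reported as still
# allowing TLSv1 or TLSv1.1. $1 is the path to the saved script output.
agents_needing_lockdown() {
    awk '$1 == "Checking" && $5 == "allows" {
             sub(/\.\.\.$/, "", $4)   # strip the trailing "..." from host:port
             print $4
         }' "$1" | sort -u
}

# Hypothetical usage:
#   agents_needing_lockdown /tmp/tls_lockdown.log
```

Any agent on that list that also shows the "Unable to restart agent" message still needs a manual restart or preferred host credentials.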

How and why you should set target lifecycle status properties in EM12c

If, like me, you’re using EM12c after you were already plenty familiar with EM11g, you may have missed an important detail in the EM12c new features guide.

From New Features in Oracle Enterprise Manager Cloud Control 12c

Each target now has a lifecycle status target property which can be set to one of the following values: mission critical, production, staging, test, or development. This target property is used to specify the priority by which data from the target should be handled. When Enterprise Manager is under a heavy load, targets where the value of the lifecycle property is mission critical or production are treated with higher priority than targets where the value of the lifecycle property is staging, test, or development. Setting the priorities of your most important targets ensures that even as your data center grows and the number of managed targets grows, your most important targets continue to be treated at high priority.

You may not use some of the other new features like administration groups or lifecycle management, but it’s still very much worth your while to set the lifecycle status target property. After all, you’re more concerned about alerts and monitoring on your mission critical and other production systems than you are on the staging and test systems, so why not tell EM12c about that and gain the benefits of target prioritization?

If you have quite a few targets, stepping through them all in the GUI to set this property gets tedious. It works, but it’ll take a while. Enter emcli. Rob Zoeteweij has covered the setup of EMCLI in his blog post Installing EMCLI on EM12c, so I won’t repeat that here, other than to add that as of EM12cR2 there is no longer a JDK in the $OMS_HOME. If you’re running 12.1.0.2, amend his instructions as follows:


oracle@omshost$ export JAVA_HOME=$OMS_HOME/../jdk16/jdk
oracle@omshost$ export PATH=$JAVA_HOME/bin:$PATH
oracle@omshost$ export ORACLE_HOME=$OMS_HOME
oracle@omshost$ cd $ORACLE_HOME
oracle@omshost$ mkdir emcli
oracle@omshost$ java -jar $ORACLE_HOME/sysman/jlib/emclikit.jar client -install_dir=$ORACLE_HOME/emcli
Oracle Enterprise Manager 12c Release 2.
Copyright (c) 1996, 2012 Oracle Corporation. All rights reserved.

EM CLI client-side install completed successfully.
oracle@omshost$ $ORACLE_HOME/emcli/emcli setup -url=https://omshost.domain.com:7803/em -username=sysman
Oracle Enterprise Manager Cloud Control 12c Release 2.
Copyright (c) 1996, 2012 Oracle Corporation and/or its affiliates. All rights reserved.


Enter password

Warning: This certificate has not been identified as trusted in the local trust store
--------------------------------------
[certificate details snipped]
--------------------------------------
Do you trust the certificate chain? [yes/no] yes
Emcli setup successful

Once you have emcli running you can easily create files containing the target properties you would like to set, including the target lifecycle status, and apply them in bulk to your EM12c installation.

As an example, I will demonstrate setting the target lifecycle status property for host targets. First you need to produce a list of your host targets, formatted for eventual input to the set_target_property_value verb:

oracle@omshost$ ./emcli get_targets -noheader -script -targets=host | awk '{print $4":"$3":LifeCycle Status:"}' > /tmp/targets

As another example, from commenter Bill Korb, to produce a list of database targets, including any currently under blackout, run:

oracle@omshost$ ./emcli get_targets -noheader -format='name:script;column_separator:|;' -targets='%database%' | awk -F\| '{print $4":"$3":LifeCycle Status:"}' > /tmp/targets
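
The explicit column separator matters because awk splits on any whitespace by default, so a status containing a space shifts the target type and name into the wrong fields. The sketch below shows the same idea for the tab-separated -script output by splitting on tabs only. The sample status "Under Blackout" in the usage comment is my assumption about what such a row looks like.

```shell
# Convert tab-separated `get_targets -noheader -script` rows on stdin
# (status id, status, target type, target name) into
# "name:type:LifeCycle Status:" records. Splitting on tabs only keeps
# multi-word status values in a single field.
to_property_records() {
    awk -F'\t' '{ print $4 ":" $3 ":LifeCycle Status:" }'
}

# Hypothetical usage:
#   ./emcli get_targets -noheader -script -targets=host | to_property_records > /tmp/targets
# A row such as "<id><TAB>Under Blackout<TAB>host<TAB>prod1.domain.com"
# (status text assumed) still yields "prod1.domain.com:host:LifeCycle Status:".
```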

Edit the resulting file, appending the host or database’s lifecycle stage to each line. Be aware of the predefined lifecycle stages provided by Oracle, which are listed below in order of precedence:

  1. MissionCritical
  2. Production
  3. Stage
  4. Test
  5. Development

You can modify the names of these lifecycle stages with the modify_lifecycle_stage_name verb if you wish. Your file should now look something like:

dev1.domain.com:host:LifeCycle Status:Development
omshost.domain.com:host:LifeCycle Status:MissionCritical
prod1.domain.com:host:LifeCycle Status:Production
prod2.domain.com:host:LifeCycle Status:Production
prod3.domain.com:host:LifeCycle Status:Production
prod4.domain.com:host:LifeCycle Status:Production
stage1.domain.com:host:LifeCycle Status:Stage
test1.domain.com:host:LifeCycle Status:Test
test2.domain.com:host:LifeCycle Status:Test
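
If your host names follow a convention, you can append the stage with a short filter instead of editing every line by hand. The sketch below assumes a purely hypothetical naming scheme matching the sample above (oms*, prod*, stage*, test*, dev*); adjust the patterns to your own environment. Records that match nothing pass through unchanged so you can finish them manually.

```shell
# Append a lifecycle stage to each "name:type:LifeCycle Status:" record
# read from stdin, based on a hypothetical host naming convention.
# Unmatched records are printed unchanged for manual editing.
append_stage() {
    while IFS= read -r rec; do
        case "$rec" in
            oms*)   printf '%s\n' "${rec}MissionCritical" ;;
            prod*)  printf '%s\n' "${rec}Production" ;;
            stage*) printf '%s\n' "${rec}Stage" ;;
            test*)  printf '%s\n' "${rec}Test" ;;
            dev*)   printf '%s\n' "${rec}Development" ;;
            *)      printf '%s\n' "$rec" ;;
        esac
    done
}

# Hypothetical usage:
#   append_stage < /tmp/targets > /tmp/targets.staged
```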

Now make the call to emcli to load your target property definitions into the OMS:

oracle@omshost$ ./emcli set_target_property_value -property_records="REC_FILE" -input_file="REC_FILE:/tmp/targets" -separator=property_records="\n"

There you go. Your hosts are now updated with appropriate target lifecycle stages and the OMS will prioritize them based on these settings whenever the OMS is under high load. Repeat this for your listeners (-targets=oracle_listener), database instances (-targets=oracle_database) and so on until all of your targets have a lifecycle stage assigned. I’ve broken these out by target type for simplicity of documentation, but you can also just produce a single large file containing the lifecycle status for all of your targets and load the whole thing at once. This same technique works to assign a contact, comment, location, or any other target property you find useful.
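
Before loading a large combined file, a quick sanity check can catch lines you forgot to finish. The sketch below flags any record that does not have exactly four colon-separated fields or whose stage is not one of the five predefined names listed earlier. It assumes target names contain no colons and that you have not renamed the stages with modify_lifecycle_stage_name; the function name is my own.

```shell
# Print every record in file $1 that lacks four colon-separated fields
# or whose fourth field is not a predefined lifecycle stage.
# Returns 0 when the file is clean and 1 when any record was flagged.
check_records() {
    awk -F: '
        NF != 4 || $4 !~ /^(MissionCritical|Production|Stage|Test|Development)$/ {
            print "line " NR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Hypothetical usage:
#   check_records /tmp/targets && echo "records look OK"
```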