Chapter 5. Troubleshooting

Contents

5.1. Log Files
5.2. General Problems
5.3. Host Not Found/Could Not Determine FQDN
5.4. Connection Errors
5.5. SUSE Manager Debugging

This chapter provides tips for determining the cause of and resolving the most common errors associated with SUSE Manager. For services and support options available for your product, refer to http://www.novell.com/services/.

In addition, you may package configuration information and logs from SUSE Manager and send them to Novell for further diagnosis. Refer to Section 5.5, “SUSE Manager Debugging” for instructions.

5.1. Log Files

Virtually every troubleshooting step should start with a look at the associated log file or files. These record the activity that has taken place on the device or within the application, and you can use them to monitor performance and verify configuration. See Table 5.1, “Log Files” for the paths to all relevant log files:

[Note]

There may be numbered log files (such as /var/log/rhn/rhn_satellite_install.log.1, /var/log/rhn/rhn_satellite_install.log.2, etc.) within the /var/log/rhn/ directory. These are rotated logs: when the current rhn_satellite_install.log file reaches the size configured for logrotate(8), its contents are moved to a file with a .<NUMBER> extension. With the default numbering, the lowest number marks the most recent rotation, so rhn_satellite_install.log.1 contains the most recently rotated log, while rhn_satellite_install.log.4 contains the oldest.

Table 5.1. Log Files

Component/Task                              Log File Location
Apache Web server                           /var/log/httpd/ directory
SUSE Manager                                /var/log/rhn/ directory
SUSE Manager Installation                   /var/log/susemanager_setup.log
Database installation (Embedded Database)   /var/log/rhn/install_db.log
Database population                         /var/log/rhn/populate_db.log
SUSE Manager Synchronization Tool           /var/log/rhn/mgr-ncc-sync.log
Monitoring infrastructure                   /var/log/nocpulse/ directory
Monitoring notifications                    /var/log/notification/ directory
Task Engine (taskomatic)                    /var/log/messages
yum                                         /var/log/yum.log
zypper                                      /var/log/zypper.log
XML-RPC transactions                        /var/log/rhn/rhn_server_xmlrpc.log
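
When it is unclear which of these files was written to last, sorting a log directory by modification time can narrow the search. The following is a minimal sketch; the function name recent_logs and its defaults are illustrative assumptions, not part of SUSE Manager.

```shell
# List the N most recently modified files in a log directory, newest first.
# recent_logs is a hypothetical helper; both arguments are illustrative defaults.
recent_logs() {
    dir="${1:-/var/log/rhn}"
    count="${2:-5}"
    # ls -t sorts by modification time (newest first); -p appends "/" to
    # directories so that grep can filter them out.
    ls -tp "$dir" | grep -v '/$' | head -n "$count"
}
```

For example, `recent_logs /var/log/rhn 3` prints the three most recently changed files, which can then be inspected with tail.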

5.2. General Problems

To begin troubleshooting general problems, examine the log file or files related to the component exhibiting failures.

A common issue is a full disk. An almost sure sign of this is log output that stops abruptly. If logging stopped during a write, such as mid-word, the disk is likely full. To confirm this, run this command and check the percentages in the Use% column:

df -h
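
On a server with many mount points, the check can be narrowed to file systems above a chosen fill level. This is a sketch only: the function name full_filesystems and the 90% default are assumptions, and it expects the POSIX output format of df -P, where the fifth column is the capacity and the sixth is the mount point (mount points containing spaces are not handled).

```shell
# Print mount points whose usage is at or above a threshold (default 90%).
# Reads `df -P` style output on stdin, so it can also be run on saved output.
# full_filesystems is a hypothetical helper name.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                  # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Typical use on a live system:
#   df -P | full_filesystems 90
```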

In addition to log files, you can obtain valuable information by retrieving the status of your SUSE Manager and its various components. This can be done with the command:

/usr/sbin/spacewalk-service status

In addition, you can obtain the status of components such as the Apache Web server and the Task Engine individually. For instance, to view the status of the Apache Web server, run the command:

rcapache2 status

If the Apache Web server is not running, entries in your /etc/hosts file may be incorrect. Refer to Section 5.3, “Host Not Found/Could Not Determine FQDN” for a description of this problem and possible solutions.

To obtain the status of the Task Engine, run the command:

rctaskomatic status

To obtain the status of SUSE Manager's embedded database, if it exists, run the command:

service oracle status

To determine the version of your database schema, run the command:

rhn-schema-version

To derive the character set types of your SUSE Manager's database, run the command:

rhn-charsets

If the administrator is not getting email from SUSE Manager, confirm the correct email addresses have been set for traceback_mail in /etc/rhn/rhn.conf.

If the traceback mail is marked from susemanager@suse.de and you would like the address to be valid for your organization, include the web.default_mail_from option and appropriate value in /etc/rhn/rhn.conf.
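
To see what is currently configured for these options without opening the file, the value can be extracted on the command line. The sketch below assumes that /etc/rhn/rhn.conf uses "key = value" lines and "#" comments; conf_value is a hypothetical helper name.

```shell
# Print the value of KEY from an rhn.conf-style file ("key = value" lines).
# Lines starting with "#" are ignored; if KEY appears twice, the last one wins.
# conf_value is a hypothetical helper name.
conf_value() {
    key="$1"
    file="${2:-/etc/rhn/rhn.conf}"
    awk -v key="$key" '
        /^[[:space:]]*#/ { next }           # skip comment lines
        index($0, "=") > 0 {
            name = substr($0, 1, index($0, "=") - 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", name)
            if (name == key) {
                value = substr($0, index($0, "=") + 1)
                gsub(/^[[:space:]]+|[[:space:]]+$/, "", value)
                found = value
            }
        }
        END { if (found != "") print found }
    ' "$file"
}
```

For example, `conf_value traceback_mail` prints the configured address, and prints nothing if the option is not set.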

If importing a channel fails and you cannot recover it in any other way, run this command to delete the cache:

rm -rf temporary-directory

Next, restart the import.

If zypper up or the push capability of SUSE Manager ceases to function, it is possible that old log files may be at fault. Stop the jabberd daemon before removing these files. To do so, issue the following commands as root:

rcjabberd stop
cd /var/lib/jabberd
rm -f _db*
rcjabberd start

5.3. Host Not Found/Could Not Determine FQDN

Because SUSE Manager configuration files rely exclusively on fully qualified domain names (FQDN), it is imperative that key applications can resolve the name of the SUSE Manager server into an IP address. Red Hat Update Agent and the Apache Web server are particularly prone to this problem: the applications issue "host not found" errors, and the Web server reports "Could not determine the server's fully qualified domain name" when it fails to start.

This problem typically originates from the /etc/hosts file. You may confirm this by examining /etc/nsswitch.conf, which defines the methods and the order by which domain names are resolved. Usually, the /etc/hosts file is checked first, followed by Network Information Service (NIS) if used, followed by DNS. One of these has to succeed for the Apache Web server to start and the client applications to work.

To resolve this problem, identify the contents of the /etc/hosts file. It may look like this:

127.0.0.1 this_machine.example.com this_machine localhost.localdomain localhost

In a text editor, remove the offending machine information:

127.0.0.1 localhost.localdomain localhost

Save the file and attempt to re-run the client applications or the Apache Web server. If they still fail, explicitly identify the SUSE Manager server's IP address in the file, such as:

127.0.0.1 localhost.localdomain localhost
123.45.67.8 this_machine.example.com this_machine

Replace 123.45.67.8 with the actual IP address of the SUSE Manager server. This should resolve the problem. Keep in mind that if a specific IP address is listed, the file must be updated whenever the machine obtains a new address.
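
The offending entries can also be spotted without reading the file by eye. The following sketch lists every name that a hosts file maps to 127.0.0.1 other than localhost and its aliases; loopback_aliases is a hypothetical helper name, and the check covers only the IPv4 loopback address.

```shell
# Print names, other than localhost aliases, that a hosts file maps to 127.0.0.1.
# loopback_aliases is a hypothetical helper name.
loopback_aliases() {
    file="${1:-/etc/hosts}"
    awk '$1 == "127.0.0.1" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break            # rest of the line is a comment
            if ($i != "localhost" && $i !~ /^localhost\./) print $i
        }
    }' "$file"
}
```

Empty output means the loopback line carries only localhost names. Running `getent hosts this_machine.example.com` afterwards shows which address the name resolves to through the sources listed in /etc/nsswitch.conf.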

5.4. Connection Errors

A common connection problem, indicated by SSL_CONNECT errors, is the result of a SUSE Manager server being installed on a machine whose time had been improperly set. During the installation process, SSL certificates are created with inaccurate times. If the time on SUSE Manager is then corrected, the certificate start date and time may be set in the future, making it invalid.

To troubleshoot this, check the date and time on the clients and on SUSE Manager with the following command:

date

The results should be nearly identical for all machines and within the "notBefore" and "notAfter" validity windows of the certificates. Check the client certificate dates and times with the following command:

openssl x509 -dates -noout -in /usr/share/rhn/RHN-ORG-TRUSTED-SSL-CERT

Check the SUSE Manager server certificate dates and times with the following command:

openssl x509 -dates -noout -in /etc/httpd/conf/ssl.crt/server.crt

By default, the server certificate has a one-year life while client certificates are good for 10 years. If you find the certificates are incorrect, you can either wait for the valid start time, if possible, or create new certificates, preferably with all system times set to GMT.
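
The comparison between the system time and the certificate validity window can be scripted. This is a minimal sketch, assuming GNU date (for the -d option) and the "notBefore=" and "notAfter=" lines printed by the openssl commands above; cert_window_status is a hypothetical helper name.

```shell
# Report whether a point in time falls inside a certificate validity window.
# Reads the "notBefore=" / "notAfter=" lines from `openssl x509 -dates -noout`
# on stdin. Requires GNU date. cert_window_status is a hypothetical helper name.
cert_window_status() {
    now="${1:-$(date +%s)}"     # epoch seconds; defaults to the current time
    start=""
    end=""
    while IFS='=' read -r field value; do
        case "$field" in
            notBefore) start=$(date -d "$value" +%s) ;;
            notAfter)  end=$(date -d "$value" +%s) ;;
        esac
    done
    if [ -z "$start" ] || [ -z "$end" ]; then
        echo "no dates found"
        return 1
    fi
    if [ "$now" -lt "$start" ]; then
        echo "not yet valid"
    elif [ "$now" -gt "$end" ]; then
        echo "expired"
    else
        echo "valid"
    fi
}

# Typical use:
#   openssl x509 -dates -noout -in /etc/httpd/conf/ssl.crt/server.crt | cert_window_status
```

A result of "not yet valid" matches the situation described above, where the certificate start time lies in the future.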

The following measures can be used to troubleshoot general connection errors:

  • Attempt to connect to SUSE Manager's database at the command line using the correct connection string as found in /etc/rhn/rhn.conf:

    sqlplus username/password@sid
    
  • Ensure SUSE Manager is using Network Time Protocol (NTP) and set to the appropriate time zone. This also applies to all client systems and the separate database machine in SUSE Manager (if used with a stand-alone database).

  • Confirm the correct package, rhn-org-httpd-ssl-key-pair-MACHINE_NAME-VER-REL.noarch.rpm, is installed on SUSE Manager and the corresponding rhn-org-trusted-ssl-cert-*.noarch.rpm or raw CA SSL public (client) certificate is installed on all client systems.

  • Verify the client systems are configured to use the appropriate certificate.

  • If also using one or more SUSE Manager Proxy Servers, ensure each Proxy's SSL certificates are prepared correctly. The Proxy should have both its own server SSL key-pair and CA SSL public (client) certificate installed, since it will serve in both capacities. Refer to Chapter SSL Infrastructure (↑Client Configuration Guide) for specific instructions.

  • Make sure firewalls on the client systems are not blocking required ports.

5.5. SUSE Manager Debugging

If you have exhausted the troubleshooting steps above and need more help, contact Novell support with an aggregation of SUSE Manager's configuration parameters, log files, and database information.

SUSE Manager provides a command line tool explicitly for this purpose. Log in to your SUSE Manager server as root and execute the following command:

spacewalk-debug

It collects several pieces of information and stores them in a tarball:

Collecting and packaging relevant diagnostic information.
   Warning: this may take some time...
   * copying configuration information
   * copying logs
   * copying cobbler files
   * copying monitoring moc logs
   * copying monitoring scout logs
   * copying ssl-build
   * copying /etc/sudoers
   * copying apache, oracle, tomcat, nocpulse entries from /etc/passwd
   * copying apache, oracle, tomcat, nocpulse entries from /etc/group
   * querying RPM database (versioning of Spacewalk, etc.)
   * querying schema version, database charactersets and database
   * get diskspace available
   * get database statistics
   * get schema statistics
   * copying audit.log
   * timestamping
   * creating tarball (may take some time): /tmp/spacewalk-debug.tar.bz2
   * removing temporary debug tree
   
   Debug dump created, stored in /tmp/spacewalk-debug.tar.bz2