Showing posts with label OpenManage. Show all posts

Wednesday, February 5, 2014

Omreport fails : object not found

If you get the following message while using omreport :
 $ omreport chassis memory  
 Memory Information  
 Error : Memory object not found  
 $ omreport chassis hwperformance  
 Error! No Hardware Peformance probes found on this system.  

The first thing to do is to restart the srvadmin services :
 # srvadmin-services.sh restart  
 # service ipmi restart  

Check that the services are properly started.

If that doesn't solve the problem, you might have a semaphore issue. In my case, the Zabbix agent/scripts went haywire and didn't release their semaphores.

To list the current semaphore arrays, use the following command :
 # ipcs -s  
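To see who owns most of the arrays (Zabbix in my case), the listing can be summarized per owner. This is only a sketch: the function name is made up, and it assumes the usual Linux `ipcs -s` layout where the semid is the second column and the owner the third.

```shell
# Count semaphore arrays per owner from `ipcs -s` output read on stdin.
# Data rows have a numeric semid in column 2 and the owner in column 3;
# header and separator lines are skipped because column 2 is not numeric.
sem_arrays_by_owner() {
    awk '$2 ~ /^[0-9]+$/ {count[$3]++}
         END {for (owner in count) print count[owner], owner}' | sort -rn
}

# Usage: ipcs -s | sem_arrays_by_owner
```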

To show the current system limits :
 # ipcs -sl  

To see how many semaphore arrays are currently in use :
 # ipcs -us  
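The two numbers can also be compared in one go. The sketch below reads /proc directly instead of parsing ipcs output: the fourth field of /proc/sys/kernel/sem is SEMMNI (the maximum number of arrays), and /proc/sysvipc/sem has one header line followed by one line per array. The function names are my own, and the optional file arguments only exist to make the functions easy to try on sample files.

```shell
# Print SEMMNI, the maximum number of semaphore arrays.
# /proc/sys/kernel/sem holds four fields: SEMMSL SEMMNS SEMOPM SEMMNI.
sem_array_limit() {
    awk '{print $4}' "${1:-/proc/sys/kernel/sem}"
}

# Count the arrays currently allocated (every line after the header).
sem_array_count() {
    awk 'NR > 1 {n++} END {print n + 0}' "${1:-/proc/sysvipc/sem}"
}

# Print "used / max" and return non-zero once the limit is reached.
# Optional arguments: $1 = limits file, $2 = array list file.
sem_check() {
    used=$(sem_array_count "$2")
    max=$(sem_array_limit "$1")
    echo "semaphore arrays: $used / $max"
    [ "$used" -lt "$max" ]
}

# Usage: sem_check || echo "semaphore array limit reached"
```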

If you have reached the system limit, that almost certainly explains the omreport issue. From there, you have two possibilities :

  • You've reached the limit because something on your system is leaking semaphores (a process that doesn't release them, for example). Clean them up with the following command :
 # ipcrm -s semaphore_id  
 To remove all semaphores belonging to a particular user :  
 # ipcs -s | awk '/username/ {system("ipcrm -s " $2)}'   

Important : Stop the processes attached to the semaphores before removing them.
  • All your semaphores are legitimate, so you need to raise the system limits :
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-Setting_Semaphore_Parameters.html
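For the second case, the limit on the number of arrays is the fourth field (SEMMNI) of the kernel.sem sysctl. Here is a small sketch that builds a new kernel.sem value while keeping the other three fields untouched. The helper name and the value 256 are only examples; pick a limit that fits your system.

```shell
# Build a kernel.sem value with a new SEMMNI, keeping the other fields.
# $1: current "SEMMSL SEMMNS SEMOPM SEMMNI" string, $2: new SEMMNI
new_sem_value() {
    echo "$1" | awk -v semmni="$2" '{print $1, $2, $3, semmni}'
}

# Usage (as root), raising the array limit to 256 (example value):
#   sysctl -w kernel.sem="$(new_sem_value "$(cat /proc/sys/kernel/sem)" 256)"
# To keep the setting across reboots, add the resulting
# "kernel.sem = ..." line to /etc/sysctl.conf.
```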

Hope that helps !

Friday, October 11, 2013

Dell Firmware update fails with "mktemp: too many templates"

If you have the following error while updating a Dell server firmware (BIOS, RAID, etc) via Linux binary (*.BIN) :
 mktemp: too many templates  

Then check the binary's filename for special characters. In my case, Chrome had added a "(1)" at the end of the filename. Remove it, restart the update process and you're good to go !
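If you hit this often, the rename can be scripted. A minimal sketch, assuming the only problem is a browser-style " (N)" duplicate marker; the function name and the file name in the usage line are hypothetical.

```shell
# Strip a " (N)" duplicate-download marker from a file name.
clean_download_name() {
    printf '%s\n' "$1" | sed 's/ *([0-9][0-9]*)//g'
}

# Usage, with a hypothetical file name:
#   f='BIOS_UPDATE.BIN (1)'
#   mv -- "$f" "$(clean_download_name "$f")"
```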

Wednesday, May 22, 2013

omreport : failed to load external entity "/opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl"

If you're having the following error when executing omreport :
 I/O warning : failed to load external entity "/opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl"  
 error  
 xsltParseStylesheetFile : cannot parse /opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl  
 Error! XML Transformation failed  

Then install the srvadmin-omcommon package :
 # yum install srvadmin-omcommon  

Tuesday, May 21, 2013

DRAC Firmware update failed : Error: 30001 Method httpCgiErrorPage()

I tried to update an old DRAC4 from firmware 1.50 to 1.75 via the Linux binary and got an unpleasant surprise :
 Dell Remote Access Controller 4/P  
 The version of this Update Package is newer than the currently installed version.  
 Software application name: Dell Remote Access Controller 4/P Firmware  
 Package version: 1.75  
 Installed version: 1.50  
 Continue? Y/N:Y  
 Executing update...  
 WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.  
 THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!  
 ......................................................................................
 /tmp/duptmp.xml:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 /tmp/.dellSP-XmlResult12908-32487.M19124:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 unable to parse /tmp/.dellSP-XmlResult12908-32487.M19124  
 /tmp/.dellSP-XmlResult12908-32487.M19124:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 unable to parse /tmp/.dellSP-XmlResult12908-32487.M19124  

That doesn't look good, and of course when I tried to access the DRAC via HTTPS, I got a nice CGI error :
 Error: 30001 Method httpCgiErrorPage()  

I looked on the web and found somebody (who had contacted Dell Support) advising to shut down the server, unplug the DRAC card for a while and plug it back in... Well, explain to your CTO that you need to shut down a production server, unrack it and unplug a card just because a DRAC update failed o_O
Reference: http://lists.us.dell.com/pipermail/linux-poweredge/2008-January/034556.html

The solution that worked for me was to install the racadm Dell tool on my bastion and reset the firmware remotely.

  • First install racadm :
 # yum install srvadmin-racadm4.x86_64  
Note : This is for DRAC4, I didn't have the issue with newer DRACs.
Note 2 : You need to have the Dell OMSA repository installed on your server:
http://www.openfusion.net/linux/dell_omsa

  •  Then run the following command :
 # racadm -rDRAC_IP -i racreset  
Note : Replace DRAC_IP with your DRAC's IP address.
Note 2 : This operation will NOT erase your DRAC configuration.
  •  Wait a while, pray, and if you're as lucky as I was, the DRAC should be back online (with the original firmware version of course).
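Instead of refreshing the browser while waiting, you can poll the DRAC until it answers again. This is a generic retry sketch, not a Dell tool: the function name, the 30 attempts and the 10 second delay are arbitrary choices of mine.

```shell
# Retry a command until it succeeds.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" >/dev/null 2>&1 && return 0
        i=$((i + 1))
        # Only sleep if another attempt follows.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage: ping the DRAC every 10 seconds, up to 30 times
#   wait_until 30 10 ping -c 1 DRAC_IP && echo "DRAC is answering again"
```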
Final word : I stopped being lazy and updated the firmware via the Web GUI, which is a long and annoying process. Of course I used Internet Explorer, as I felt Murphy's law was around that day ^^

Hope that helps !

Thursday, March 28, 2013

Omreport doesn't update disk rebuild progress

Had to replace a hard drive on a Dell Server and omreport rebuild progress got stuck at 1%.

The solution is to restart the srvadmin service :

 # srvadmin-services.sh restart  

This is quite dirty but it's the only solution I found. This also happened when I changed a PERC H700 battery.

Another way to check the rebuild progress is to export the controller log with omconfig :

 # omconfig storage controller action=exportlog controller=0  

This creates a /var/log/lsi_MMDD.log file, with the rebuild progress :

 03/09/13 22:07:51: EVT#13296-03/09/13 22:07:51: 99=Rebuild complete on VD 01/1  
 03/09/13 22:07:51: EVT#13297-03/09/13 22:07:51: 100=Rebuild complete on PD 05(e0x20/s5)  
 03/09/13 22:07:51: EVT#13298-03/09/13 22:07:51: 114=State change on PD 05(e0x20/s5) from REBUILD(14) to ONLINE(18)  
 03/09/13 22:07:51: EVT#13299-03/09/13 22:07:51: 81=State change on VD 01/1 from DEGRADED(2) to OPTIMAL(3)  
 03/09/13 22:07:51: EVT#13300-03/09/13 22:07:51: 249=VD 01/1 is now OPTIMAL  

Same thing for the battery learn cycle.
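The exported log is quite verbose, so a small filter helps. A minimal sketch, assuming the event wording shown in the extract above; the function name is mine and the file name in the usage line is just an example following the lsi_MMDD.log pattern.

```shell
# Keep only rebuild and state-change events from an exported
# controller log read on stdin.
rebuild_events() {
    grep -E 'Rebuild|State change on (PD|VD)'
}

# Usage (example file name):
#   rebuild_events < /var/log/lsi_0309.log | tail -n 5
```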

Hope that helps !

Tuesday, March 26, 2013

Dell Openmanage/Omreport failed after updating to CentOS 6.4

After updating a test machine from CentOS 6.3 to 6.4, the Dell OpenManage tools stopped working entirely.
It seems that with the latest CentOS kernel (2.6.32-358.2.1.el6.x86_64), some IPMI drivers were moved from loadable kernel modules to "built-in".
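You can check this on your own kernel by looking at its build configuration, where "=y" means built-in and "=m" means loadable module. A small sketch, assuming the config file is available as /boot/config-$(uname -r) (the usual place on CentOS); the function name is made up.

```shell
# List the IPMI options compiled directly into the kernel (=y).
# $1: kernel config file (defaults to the running kernel's config)
ipmi_builtin() {
    sed -n 's/^\(CONFIG_IPMI_[A-Z_]*\)=y$/\1/p' "${1:-/boot/config-$(uname -r)}"
}

# Usage: ipmi_builtin
```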

The result is :

 # omreport chassis  
 Health   
 # srvadmin-services.sh start  
 Starting Systems Management Device Drivers:  
 Starting dell_rbu:                     [ OK ]  
 Starting ipmi driver:                   [FAILED]  
 Starting Systems Management Device Drivers:  
 Starting dell_rbu: Already started             [ OK ]  
 Starting ipmi driver:                   [FAILED]  
 Starting DSM SA Shared Services:              [ OK ]  

/var/log/messages reports :

 instsvcdrv: /etc/rc.d/init.d//dsm_sa_ipmi start command failed with status 1  

Solution : 

 # yum install OpenIPMI  

Note : There is no need to start or chkconfig the service.

You can check that the IPMI components are seen with the following command :

 # service ipmi status  
 ipmi_msghandler module in kernel.  
 ipmi_si module in kernel.  
 ipmi_devintf module loaded.  
 /dev/ipmi0 exists.  

Then start the OpenManage services :
 # srvadmin-services.sh start
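If you script this, it makes sense to start the services only once the IPMI device is there. A minimal sketch based on the /dev/ipmi0 check shown above; the function name is my own.

```shell
# Succeed if the IPMI character device exists.
# $1: device path (defaults to /dev/ipmi0)
ipmi_device_present() {
    [ -c "${1:-/dev/ipmi0}" ]
}

# Usage (as root):
#   ipmi_device_present && srvadmin-services.sh start
```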