Showing posts with label Hardware. Show all posts

Friday, February 7, 2014

Smartctl : Linux disk I/O scheduler is reset to the default CFQ

I ran into a weird issue recently : I monitor my SSD's lifetime with smartctl + Zabbix, and I realized that my I/O scheduler setting was reset to CFQ each time smartctl was executed !

 # echo noop > /sys/block/sda/queue/scheduler  
   
 # cat /sys/block/sda/queue/scheduler  
 [noop] anticipatory deadline cfq  
   
 # smartctl -A --device=sat+megaraid,0 /dev/sda  
 smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.23.2.el6.x86_64] (local build)  
 Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net  
 === START OF READ SMART DATA SECTION ===  
 ...  
 ...  
   
 # cat /sys/block/sda/queue/scheduler  
 noop anticipatory deadline [cfq]  


There is no real fix, but you can work around it by passing the generic SCSI device name (i.e. "sgX") instead of sdX.

 # echo noop > /sys/block/sda/queue/scheduler  
   
 # cat /sys/block/sda/queue/scheduler  
 [noop] anticipatory deadline cfq  
   
 # smartctl -A --device=sat+megaraid,0 /dev/sg0  
 smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.23.2.el6.x86_64] (local build)  
 Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net  
 === START OF READ SMART DATA SECTION ===  
 ...  
 ...  
   
 # cat /sys/block/sda/queue/scheduler  
 [noop] anticipatory deadline cfq  

And voilà ! The problem is not really solved, but this does the job !
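
To catch this kind of silent reset in a monitoring script, the before/after check shown above can be automated. This is only a minimal sketch : the function names and the SYSFS_ROOT override are my own (hypothetical) additions, the latter only being there so the logic can be tried against a fake directory.

```shell
#!/bin/sh
# Print the active I/O scheduler, i.e. the name shown between [brackets].
# Reads a line such as "noop anticipatory deadline [cfq]" from stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Run a command and report whether the scheduler of a disk changed meanwhile.
# $1 = disk name (e.g. sda), remaining arguments = command to run.
# SYSFS_ROOT is a hypothetical override, only useful for testing.
watch_scheduler() {
    disk=$1; shift
    file="${SYSFS_ROOT:-/sys}/block/$disk/queue/scheduler"
    before=$(active_scheduler < "$file")
    "$@" > /dev/null
    after=$(active_scheduler < "$file")
    if [ "$before" != "$after" ]; then
        echo "scheduler on $disk changed: $before -> $after"
        return 1
    fi
}
```

Example :

 # watch_scheduler sda smartctl -A --device=sat+megaraid,0 /dev/sg0  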

You can use sg_map (part of the sg3_utils package) to check the sdX -> sgX mappings :

 # sg_map -a  
 /dev/sg0 /dev/sda  
 /dev/sg1 /dev/sdb  
 /dev/sg2 /dev/scd0  
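
If you don't want to hardcode the sgX name in your monitoring script, the lookup can be scripted from the sg_map output. A small sketch, assuming the two-column output shown above (sg_for_disk is a name I made up) :

```shell
#!/bin/sh
# Print the sg device mapped to a given sd device.
# Reads "sg_map -a" style lines ("/dev/sg0 /dev/sda") from stdin.
# Exits with a non-zero status when the disk is not listed.
sg_for_disk() {
    awk -v disk="$1" '
        $2 == disk { print $1; found = 1; exit }
        END { exit !found }
    '
}
```

Example :

 # sg=$(sg_map -a | sg_for_disk /dev/sda) && smartctl -A --device=sat+megaraid,0 "$sg"  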

Wednesday, February 5, 2014

Omreport fails : object not found

If you get the following message while using omreport :
 $ omreport chassis memory  
 Memory Information  
 Error : Memory object not found  
 $ omreport chassis hwperformance  
 Error! No Hardware Peformance probes found on this system.  

The first thing to do is to restart the srvadmin services :
 # srvadmin-services.sh restart  
 # service ipmi restart  

Check that the services are properly started.

If that doesn't solve the problem, you might have a semaphore issue. In my case the Zabbix agent scripts went haywire and didn't release their semaphores.

To list the current semaphore arrays, use the following command :
 # ipcs -s  

To show the current system limits :
 # ipcs -sl  

You can use the following command to get a usage summary, including the number of semaphore arrays currently in use :
 # ipcs -us  
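
The comparison between usage and limit can also be scripted. The sketch below reads the same information from /proc instead of parsing the ipcs output : /proc/sysvipc/sem holds one header line followed by one line per array, and the 4th field of /proc/sys/kernel/sem is the maximum number of arrays (SEMMNI). The function names are mine, and the optional file arguments are only there to make testing possible.

```shell
#!/bin/sh
# Number of semaphore arrays currently allocated.
# The file has one header line, then one line per array.
sem_arrays_used() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "${1:-/proc/sysvipc/sem}"
}

# Maximum number of arrays (SEMMNI), the 4th field of kernel.sem.
sem_arrays_max() {
    awk '{ print $4 }' "${1:-/proc/sys/kernel/sem}"
}

# Succeeds when the number of arrays has reached the limit.
sem_limit_reached() {
    [ "$(sem_arrays_used "$1")" -ge "$(sem_arrays_max "$2")" ]
}
```

Example :

 # sem_limit_reached && echo "semaphore array limit reached"  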

If you have reached the system limit, that most likely explains the omreport issue. From there, you have two possibilities :

  • You've reached the limit because there is an issue on your system (semaphores not released, or any other reason). You need to clean up your semaphores with the following command :
 # ipcrm -s semaphore_id  
 To clean all semaphores from a particular user :  
 # ipcs -s | awk '/username/ {system("ipcrm -s " $2)}'   
 Important : You need to stop the attached processes before removing the semaphores.
  • All your semaphores are legitimate and you need to increase the system limits :
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-Setting_Semaphore_Parameters.html
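
Both possibilities above can be scripted. The sketch below is only an illustration : the function names are mine, and the "zabbix" user and the value 256 used in the examples are hypothetical. The first function matches the owner column of "ipcs -s" exactly, the second builds a kernel.sem line that raises only the maximum number of arrays (4th field) and keeps the three other values.

```shell
#!/bin/sh
# List the semaphore array IDs owned by a user.
# Reads "ipcs -s" output from stdin; the owner is the 3rd column,
# so this is an exact match rather than a match anywhere in the line.
semids_for_user() {
    awk -v user="$1" '$3 == user { print $2 }'
}

# Build a sysctl line that raises SEMMNI and keeps the other values.
# $1 = current "SEMMSL SEMMNS SEMOPM SEMMNI" string, $2 = new SEMMNI.
kernel_sem_line() {
    # Split the current values into $1..$4, the new SEMMNI becomes $5.
    set -- $1 "$2"
    echo "kernel.sem = $1 $2 $3 $5"
}
```

Example :

 # ipcs -s | semids_for_user zabbix | while read -r id; do ipcrm -s "$id"; done  
 # kernel_sem_line "$(cat /proc/sys/kernel/sem)" 256 >> /etc/sysctl.conf && sysctl -p  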

Hope that helps !

Friday, October 11, 2013

Dell Firmware update fails with "mktemp: too many templates"

If you get the following error while updating a Dell server's firmware (BIOS, RAID, etc.) via the Linux binary (*.BIN) :
 mktemp: too many templates  

Then check the binary's filename for special characters. In my case Chrome added a "(1)" at the end of the filename. Remove it, restart the update process and you're good to go !
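
If you download a lot of update packages, the check can be scripted. A minimal sketch, assuming the only offenders are the "(N)" markers added by the browser ; the function names and the FIRMWARE.BIN filename are made up, and the list of "plain" characters is a conservative guess of mine, not something documented by Dell.

```shell
#!/bin/sh
# Strip browser duplicate markers such as "(1)" or " (1)" from a file name.
clean_name() {
    printf '%s\n' "$1" | sed -E 's/ ?\([0-9]+\)//g'
}

# Succeeds when the name only contains letters, digits, dot, dash
# and underscore.
name_is_plain() {
    case $1 in
        *[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

Example :

 # f='FIRMWARE (1).BIN'; name_is_plain "$f" || mv -- "$f" "$(clean_name "$f")"  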

Monday, July 8, 2013

Dell DRAC Console/KVM with Chrome or Firefox

Here is a really simple trick to access your DRAC remote console (i.e. virtual KVM) with Chrome or Firefox.
This trick has only been tested with DRAC 5, 6 and 7.

Requirement : You need a working JRE.

  • Log in to your DRAC Web interface and go to "System -> Console Media"
  • Click on "Launch Virtual Console"
  • The browser will ask you to open or save a file ; save it to your hard drive
  • The downloaded file has a name of the form "viewer.jnlp(x.x.x.x@x@idrac-xxxxxxx,+xxxxxxxxx,+User-xxxxx@xxxxxxxxx)"
  • Rename the file to "viewer.jnlp" (i.e. remove the garbage data after the extension)
  • Double-click on the file and you're done.
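
On a Linux desktop the rename step can be done from a terminal. A small sketch (jnlp_name is a name I made up) that cuts the filename right after the first ".jnlp" ; note that it simply appends ".jnlp" when the name doesn't contain the extension at all.

```shell
#!/bin/sh
# Print the file name truncated right after the first ".jnlp".
jnlp_name() {
    printf '%s\n' "${1%%.jnlp*}.jnlp"
}
```

Example, run from the download directory :

 $ for f in viewer.jnlp\(*; do mv -- "$f" "$(jnlp_name "$f")"; done  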

Really easy but so handy !

Hope that helps

Wednesday, May 22, 2013

omreport : failed to load external entity "/opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl"

If you get the following error when executing omreport :
 I/O warning : failed to load external entity "/opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl"  
 error  
 xsltParseStylesheetFile : cannot parse /opt/dell/srvadmin/var/lib/openmanage/xslroot//oma/cli/about.xsl  
 Error! XML Transformation failed  

Then install the srvadmin-omcommon package :
 # yum install srvadmin-omcommon  

Tuesday, May 21, 2013

DRAC Firmware update failed : Error: 30001 Method httpCgiErrorPage()

I tried to update an old DRAC4 from firmware 1.5 to 1.75 via the Linux binary and got an unpleasant surprise :
 Dell Remote Access Controller 4/P  
 The version of this Update Package is newer than the currently installed version.  
 Software application name: Dell Remote Access Controller 4/P Firmware  
 Package version: 1.75  
 Installed version: 1.50  
 Continue? Y/N:Y  
 Executing update...  
 WARNING: DO NOT STOP THIS PROCESS OR INSTALL OTHER DELL PRODUCTS WHILE UPDATE IS IN PROGRESS.  
 THESE ACTIONS MAY CAUSE YOUR SYSTEM TO BECOME UNSTABLE!  
 ......................................................................................
 /tmp/duptmp.xml:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 /tmp/.dellSP-XmlResult12908-32487.M19124:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 unable to parse /tmp/.dellSP-XmlResult12908-32487.M19124  
 /tmp/.dellSP-XmlResult12908-32487.M19124:6: parser error : Extra content at the end of the document  
 <SVMExecution lang = "en">  
 ^  
 unable to parse /tmp/.dellSP-XmlResult12908-32487.M19124  

That doesn't look good, and of course when I tried to access the DRAC via HTTPS, I got a nice CGI error :
 Error: 30001 Method httpCgiErrorPage()  

I looked on the web, and somebody (who had contacted Dell Support) advised shutting down the server, unplugging the DRAC card for a while and plugging it back in... Well, try explaining to your CTO that you need to shut down a production server, unrack it and unplug a card just because a DRAC update failed o_O
Reference: http://lists.us.dell.com/pipermail/linux-poweredge/2008-January/034556.html

The solution that worked for me was to install the racadm Dell tool on my bastion and reset the firmware remotely.

  • First install racadm :
 # yum install srvadmin-racadm4.x86_64  
Note : This is for DRAC4, I didn't have the issue with newer DRACs.
Note 2 : You need to have the Dell OMSA repository installed on your server :
http://www.openfusion.net/linux/dell_omsa

  •  Then run the following command :
 # racadm -rDRAC_IP -i racreset  
Note : Replace DRAC_IP with your DRAC's IP address.
Note 2 : This operation will NOT erase your DRAC configuration.
  •  Wait a while, pray, and if you're as lucky as me you should be back online (with the original firmware version of course).
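
Rather than refreshing the web page while waiting, you can poll the DRAC from the bastion. A minimal sketch of a retry loop ; wait_until is a helper I made up, and the number of attempts and the delay in the example are arbitrary.

```shell
#!/bin/sh
# Retry a command until it succeeds.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
wait_until() {
    tries=$1; delay=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" > /dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

Example (30 attempts, 10 seconds apart) :

 # wait_until 30 10 ping -c 1 DRAC_IP && echo "DRAC is answering again"  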
Final word : I stopped being lazy and updated the firmware via the Web GUI, which is a long and annoying process. Of course I used Internet Explorer, as I felt Murphy's law was around that day ^^

Hope that helps !

Thursday, April 4, 2013

Choosing RAID Level / Stripe Size

Below are some interesting articles on how to choose your RAID level / stripe size.

Good literature on RAID :
http://www.fccps.cz/download/adv/frr/hdd/hdd.html

RAID Level Explained :
http://www.techrepublic.com/blog/datacenter/choose-a-raid-level-that-works-for-you/3237

RAID Stripe Explained :
http://www.anandtech.com/show/788/5

RAID Benchmarks :
https://raid.wiki.kernel.org/index.php/Performance

RAID Calculator :
http://www.z-a-recovery.com/art-raid-estimator.htm

In any case, always characterize your workload first (reads/writes, sequential/random, large/small files, number of concurrent accesses).

Wednesday, April 3, 2013

Create large partitions on Linux / Bypass the 2TB partition Limit

The default partitioning scheme (MBR based) limits partitions to 2.2TB. With modern hard drives this limit is easily reached.

In order to create a partition bigger than 2.2TB, you need to switch from an MBR to a GUID (GPT) partition table.
This can be done with the "parted" utility on Linux.
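
For the record, the 2.2TB figure comes from the MBR format itself : sector counts are stored on 32 bits, so with 512-byte sectors the maximum is 2^32 x 512 bytes. A quick sketch to compute the limit and to test a disk against it (the function names are mine, and a 64-bit shell is assumed for the arithmetic) :

```shell
#!/bin/sh
# Largest size (in bytes) addressable by an MBR partition entry :
# 2^32 sectors of the given sector size (512 bytes by default).
mbr_limit_bytes() {
    echo $(( (1 << 32) * ${1:-512} ))
}

# Succeeds when a disk of $1 bytes is too large for MBR.
# $2 = sector size in bytes (512 by default).
needs_gpt() {
    [ "$1" -gt "$(mbr_limit_bytes "${2:-512}")" ]
}
```

Example :

 # needs_gpt "$(blockdev --getsize64 /dev/sdb)" && echo "use GPT"  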

For example, if you want to create a single big partition on /dev/sdb :

 # parted /dev/sdb  
 (parted) mklabel GPT  
 (parted) mkpart partition_name fstype 1 -1  
 (parted) print  
 Model: DELL PERC H700 (scsi)  
 Disk /dev/sdb: 4000GB  
 Sector size (logical/physical): 512B/512B  
 Partition Table: gpt  
 Number Start  End   Size  File system Name Flags  
  1   1049kB 4000GB 4000GB        data  

Note : I found that the partition name and fstype are of little use here.

You can then format the partition with the filesystem of your choice or create an LVM PV.
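
The same steps can be run without the interactive prompt thanks to parted's script mode (-s). This is only a sketch under my own assumptions : the function name and the DRY_RUN switch are made up, and I use 1MiB / 100% as start / end instead of the "1 -1" values of the interactive session. Be careful, mklabel destroys the existing partition table.

```shell
#!/bin/sh
# Create one GPT partition spanning the whole disk, without prompts.
# $1 = disk, $2 = partition name ("data" by default).
# Set DRY_RUN=1 to only print the command instead of running it.
# WARNING : mklabel wipes the existing partition table of the disk.
gpt_single_partition() {
    ${DRY_RUN:+echo} parted -s "$1" mklabel gpt mkpart "${2:-data}" 1MiB 100%
}
```

Example, printing the command first :

 # DRY_RUN=1; gpt_single_partition /dev/sdb data  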

More info on GUID / MBR Limits :
http://en.wikipedia.org/wiki/GUID_Partition_Table

Parted official website :
http://www.gnu.org/software/parted/

More parted examples :
http://www.thegeekstuff.com/2011/09/parted-command-examples/

Hope that helps ! 

Thursday, March 28, 2013

Omreport doesn't update disk rebuild progress

I had to replace a hard drive on a Dell server and the rebuild progress reported by omreport got stuck at 1%.

The solution is to restart the srvadmin service :

 # srvadmin-services.sh restart  

This is quite dirty but it's the only solution I found. This also happened when I changed a PERC H700 battery.

Another way to check the rebuild progress is to export the controller log with omconfig :

 # omconfig storage controller action=exportlog controller=0  

This creates a /var/log/lsi_MMDD.log file, with the rebuild progress :

 03/09/13 22:07:51: EVT#13296-03/09/13 22:07:51: 99=Rebuild complete on VD 01/1  
 03/09/13 22:07:51: EVT#13297-03/09/13 22:07:51: 100=Rebuild complete on PD 05(e0x20/s5)  
 03/09/13 22:07:51: EVT#13298-03/09/13 22:07:51: 114=State change on PD 05(e0x20/s5) from REBUILD(14) to ONLINE(18)  
 03/09/13 22:07:51: EVT#13299-03/09/13 22:07:51: 81=State change on VD 01/1 from DEGRADED(2) to OPTIMAL(3)  
 03/09/13 22:07:51: EVT#13300-03/09/13 22:07:51: 249=VD 01/1 is now OPTIMAL  

Same thing for the battery learn cycle.
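
The exported log is quite verbose, so it helps to filter it. A minimal sketch that keeps only the rebuild and state change events, based on the wording of the lines shown above (rebuild_events is a name I made up, and other firmware versions may word these events differently) :

```shell
#!/bin/sh
# Keep only the rebuild related events of an exported controller log.
# Reads the log from stdin, or from the files given as arguments.
rebuild_events() {
    grep -E 'Rebuild|State change on (PD|VD)' "$@"
}
```

Example :

 # rebuild_events /var/log/lsi_*.log | tail -n 5  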

Hope that helps !

Tuesday, March 26, 2013

Dell Openmanage/Omreport failed after updating to CentOS 6.4

After updating a test machine from CentOS 6.3 to 6.4, the Dell OpenManage tools stopped working altogether.
It seems that with the latest CentOS kernel (2.6.32-358.2.1.el6.x86_64), some IPMI drivers were moved from kernel modules to "built-in".

The result is :

 # omreport chassis  
 Health   
 # srvadmin-services.sh start  
 Starting Systems Management Device Drivers:  
 Starting dell_rbu:                     [ OK ]  
 Starting ipmi driver:                   [FAILED]  
 Starting Systems Management Device Drivers:  
 Starting dell_rbu: Already started             [ OK ]  
 Starting ipmi driver:                   [FAILED]  
 Starting DSM SA Shared Services:              [ OK ]  

/var/log/messages reports :
 instsvcdrv: /etc/rc.d/init.d//dsm_sa_ipmi start command failed with status 1  

Solution : 

 # yum install OpenIPMI  

Note : There is no need to start or chkconfig the service.

You can check that the IPMI components are seen with the following command :

 # service ipmi status  
 ipmi_msghandler module in kernel.  
 ipmi_si module in kernel.  
 ipmi_devintf module loaded.  
 /dev/ipmi0 exists.  

Then start the OpenManage services :
 # srvadmin-services.sh start
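
If you start the services from a script, the check and the start can be chained. A small sketch, assuming that the presence of the /dev/ipmi0 character device is a good enough sign that the IPMI stack is ready (the function names are mine) :

```shell
#!/bin/sh
# Succeeds when the IPMI character device is present.
# $1 = device path (/dev/ipmi0 by default).
ipmi_ready() {
    [ -c "${1:-/dev/ipmi0}" ]
}

# Start the OpenManage services only when the IPMI device exists.
start_openmanage() {
    if ipmi_ready "$1"; then
        srvadmin-services.sh start
    else
        echo "IPMI device missing, install OpenIPMI first" >&2
        return 1
    fi
}
```

Example :

 # start_openmanage  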