Linux Guide/Monitoring
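This page is in TODO status. Anyone is free to complete it or contribute to it. At present (2010-06-11) it contains some random notes I have been collecting.
The TODO tag means "to do" (some editing tools automatically recognize "TODO" as a pending task).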
The next link provides a quick script for rescanning the SCSI bus in Linux.
In most cases there is a simpler way that works fine:
echo "- - -" > /sys/class/scsi_host/host0/scan
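The one-liner above only rescans host0. On a machine with several host adapters the same write can be repeated for every host entry. The following is a minimal sketch, not a definitive tool: the function name and its optional directory argument are hypothetical and exist only so that the loop can be tried on a scratch directory before touching the real sysfs tree.

```shell
#!/bin/bash
# Rescan every SCSI host adapter instead of only host0.
# The optional first argument (hypothetical, for dry runs) replaces the
# default sysfs directory; on a real system call it without arguments as root.
rescan_all_scsi_hosts() {
    local base="${1:-/sys/class/scsi_host}"
    local scan count=0
    for scan in "$base"/host*/scan; do
        # Skip the unexpanded glob when no host adapter is present.
        [ -e "$scan" ] || continue
        # "- - -" acts as a wildcard for channel, target and LUN.
        echo "- - -" > "$scan"
        count=$((count + 1))
    done
    echo "rescanned $count host(s)"
}
```

Calling `rescan_all_scsi_hosts` without arguments writes to every `/sys/class/scsi_host/host*/scan` file and prints how many hosts were touched.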
A slightly more complex script example for Qlogic cards:
#!/bin/bash
# Ask every QLogic HBA listed under /proc/scsi/qla2xxx to rescan.
for HBA in /proc/scsi/qla2xxx/*
do
    echo "scsi-qlascan" > "${HBA}"
done
Alternatively, iscsiadm can be used if it is available:
iscsiadm -m discovery --type sendtargets --portal <IP>
iscsiadm -m node --targetname <targetname> --portal <IP> --login
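Among the other documents available online, the Red Hat Enterprise Linux 5 Online Storage Reconfiguration Guide can also be of help.
Dmidecode reports information about the system's hardware as described in the system BIOS according to the SMBIOS/DMI standard (see sample output). This information typically includes the system manufacturer, model name, serial number, BIOS version and asset tag, as well as many other details whose level of interest and reliability varies from one manufacturer to another. It often also includes the usage status of the CPU sockets, expansion slots (e.g. AGP, PCI, ISA) and memory module slots, and the list of I/O ports (e.g. serial, parallel, USB).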
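As an illustration of the fields mentioned above, dmidecode can print single values with `-s <keyword>`. The sketch below collects a few of them into a short summary. The function name and the `DMIDECODE` override variable are hypothetical helpers introduced here so that the loop can be exercised without root access; the real command normally needs to run as root.

```shell
#!/bin/bash
# Print a short hardware identity summary using "dmidecode -s <keyword>".
# DMIDECODE is a hypothetical override for the command to run; it defaults
# to the real dmidecode binary, which usually requires root privileges.
dmi_summary() {
    local dmidecode="${DMIDECODE:-dmidecode}"
    local keyword
    for keyword in system-manufacturer system-product-name \
                   system-serial-number bios-version chassis-asset-tag; do
        # Each keyword yields one string taken from the DMI table.
        printf '%-22s %s\n' "$keyword:" "$("$dmidecode" -s "$keyword" 2>/dev/null)"
    done
}
```

Running `dmi_summary` as root prints one line per keyword. How complete and reliable the values are depends on the manufacturer, as noted above.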
What is IPMI?
The Intelligent Platform Management Interface (IPMI) specification defines a set of interfaces for platform management. It is implemented by a large number of hardware manufacturers to support system management on motherboards. The features of IPMI that most users will be interested in are sensor monitoring (e.g. CPU temperatures, fan speeds), remote power control, and serial-over-LAN (SOL).
What is FreeIPMI?
FreeIPMI provides in-band and out-of-band IPMI software based on the IPMI v1.5/2.0 specification. FreeIPMI provides tools and libraries for users to access and read IPMI sensor readings, system event log (SEL) entries, serial-over-LAN (SOL), remote power control functions, field replaceable unit (FRU) device information, and more.
More information about FreeIPMI can be found at the FreeIPMI webpage at: http://www.gnu.org/software/freeipmi/index.html
************************************************************************
~# smartctl -d cciss,0 -a /dev/cciss/c0d0
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP DH072ABAA6 Version: HPD7
Serial number: 3PD19ZMN0000983153B8
Device type: disk
Transport protocol: SAS
Local Time is: Sat Jul 19 20:09:09 2008 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 29 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 899299930
  Blocks received from initiator = 14843797
  Blocks read from cache and sent to initiator = 3793967485
  Number of read and write commands whose size <= segment size = 48565840
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 945.00
  number of minutes until next internal SMART test = 7

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count: 0
No self-tests have been logged
Long (extended) Self Test duration: 840 seconds [14.0 minutes]
************************************************************************
~# smartctl -d cciss,1 -a /dev/cciss/c0d0
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP DH072ABAA6 Version: HPD7
Serial number: 3PD19ZPV000098315CX2
Device type: disk
Transport protocol: SAS
Local Time is: Sat Jul 19 20:09:12 2008 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 30 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 920490987
  Blocks received from initiator = 14368268
  Blocks read from cache and sent to initiator = 3755437180
  Number of read and write commands whose size <= segment size = 48820139
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 945.02
  number of minutes until next internal SMART test = 8

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count: 0
No self-tests have been logged
Long (extended) Self Test duration: 840 seconds [14.0 minutes]
************************************************************************
~# smartctl -d cciss,2 -a /dev/cciss/c0d0
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP DH072ABAA6 Version: HPD7
Serial number: 3PD1A0SD000098300K39
Device type: disk
Transport protocol: SAS
Local Time is: Sat Jul 19 20:09:15 2008 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 31 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 913141941
  Blocks received from initiator = 11455509
  Blocks read from cache and sent to initiator = 3697098775
  Number of read and write commands whose size <= segment size = 49159966
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 944.93
  number of minutes until next internal SMART test = 18

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count: 0
No self-tests have been logged
Long (extended) Self Test duration: 840 seconds [14.0 minutes]
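The three listings above differ only in the drive index passed to `-d cciss,N`, so the check lends itself to a loop. The following sketch is one possible way to do it, under stated assumptions: the function names are hypothetical, the default device path and drive count are taken from the listings, and the parser only matches the `SMART Health Status:` line that SCSI/SAS drives print (ATA drives word their health line differently).

```shell
#!/bin/bash
# Extract the health verdict from smartctl output read on stdin.
# Matches the "SMART Health Status: OK" line shown in the listings above.
smart_health() {
    sed -n 's/^SMART Health Status:[[:space:]]*//p'
}

# Query the first N physical drives behind the controller and report each one.
# The default device path and the drive count of 3 are assumptions taken
# from the example output; adjust them to the actual system.
check_cciss_drives() {
    local device="${1:-/dev/cciss/c0d0}"
    local count="${2:-3}"
    local n=0 status
    while [ "$n" -lt "$count" ]; do
        # -H asks smartctl for the health status only.
        status=$(smartctl -d "cciss,$n" -H "$device" | smart_health)
        echo "cciss,$n: ${status:-unknown}"
        n=$((n + 1))
    done
}
```

Calling `check_cciss_drives` as root prints one line per drive; a drive whose output lacks the health line is reported as `unknown`.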
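Installing OMSA on Dell servers for hardware monitoring
OMSA allows monitoring the health of the RAID and the motherboard/disk/chassis temperatures, generating alerts, setting/modifying the BIOS, viewing the installed devices, and so on.
To install it under Debian:
1.- Add the following line to /etc/apt/sources.list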
deb ftp://ftp.sara.nl/pub/sara-omsa dell sara
2.- Run
apt-get update && apt-get install dellomsa
This will install OMSA in /opt/dell.
3.- To start the system
~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d -run
~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_eventmgr32d -run
Checking the health of the disks attached to controller 0
~# /etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0
The output will look similar to:
List of Physical Disks on Controller PERC 4e/Di (Embedded)

Controller PERC 4e/Di (Embedded)
ID                        : 0:0
Status                    : Ok
Name                      : Physical Disk 0:0
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SCSI
Capacity                  : 68.24 GB (73274490880 bytes)
Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : MAXTOR
Product ID                : ATLAS10K5_73SCA
Revision                  : JNZY
Serial No.                : J20KVCTK
Negotiated Speed          : 320
Capable Speed             : 320
Manufacture Day           : Not Available
Manufacture Week          : Not Available
Manufacture Year          : Not Available
SAS Address               : Not Available

ID                        : 0:1
Status                    : Ok
Name                      : Physical Disk 0:1
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SCSI
Capacity                  : 68.24 GB (73274490880 bytes)
Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : MAXTOR
Product ID                : ATLAS10K5_73SCA
Revision                  : JNZY
Serial No.                : J20KV5RK
Negotiated Speed          : 320
Capable Speed             : 320
Manufacture Day           : Not Available
Manufacture Week          : Not Available
Manufacture Year          : Not Available
SAS Address               : Not Available

ID                        : 0:2
Status                    : Ok
Name                      : Physical Disk 0:2
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SCSI
Capacity                  : 68.24 GB (73274490880 bytes)
Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : MAXTOR
Product ID                : ATLAS10K5_73SCA
Revision                  : JNZY
Serial No.                : J20KTS8K
Negotiated Speed          : 320
Capable Speed             : 320
Manufacture Day           : Not Available
Manufacture Week          : Not Available
Manufacture Year          : Not Available
SAS Address               : Not Available
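Reading such a report by eye does not scale to many servers. The sketch below filters the physical disk report down to the disks that need attention. It is an illustration only: the function name is hypothetical, and it assumes the report keeps the `Label : value` layout and the `ID`, `Status` and `Failure Predicted` field names seen in the example output.

```shell
#!/bin/bash
# Read "omreport storage pdisk" output on stdin and print one line per disk
# whose Status is not "Ok" or whose "Failure Predicted" field is not "No".
# The exit status is 1 when at least one such disk is found, 0 otherwise.
pdisk_problems() {
    awk -F '[ \t]+:[ \t]+' '
        # Emit the disk collected so far if it looks unhealthy.
        function report() {
            if (id != "" && (status != "Ok" || predicted != "No")) {
                printf "disk %s: status=%s failure_predicted=%s\n", id, status, predicted
                bad = 1
            }
        }
        # Trim surrounding whitespace; changing $0 re-splits the fields.
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
        $1 == "ID"                { report(); id = $2; status = ""; predicted = "" }
        $1 == "Status"            { status = $2 }
        $1 == "Failure Predicted" { predicted = $2 }
        END                       { report(); exit bad + 0 }
    '
}
```

A possible use is `/etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0 | pdisk_problems`, for example from a cron job that sends mail when the exit status is non-zero.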
Checking the status/configuration of the RAID
~# /etc/delloma.d/oma/bin/omreport.sh storage vdisk controller=0
This will look like:
Virtual Disk 0 on Controller PERC 4e/Di (Embedded)

Controller PERC 4e/Di (Embedded)
ID                  : 0
Status              : Ok
Name                : Virtual Disk 0
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 136.48 GB (146548981760 bytes)
Device Name         : /dev/sda
Type                : SCSI
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Direct I/O
Stripe Element Size : 64 KB
Getting a summary of the server
~# /etc/delloma.d/oma/bin/omreport.sh system summary
System Summary

------------------
Software Profile
------------------

Systems Management
Name        : Information not available.
Version     : 3.2.0
Description : Systems Management Software

Operating System
Name               : Linux
Version            : Kernel 2.6.18.2 (i686)
System Time        : Sun Nov 25 18:30:37 2007
System Bootup Time : Fri Oct 12 15:20:31 2007

--------
System
--------

System Host Name : MySuperServidor
System Location  : Please set the value

---------------------
Main System Chassis
---------------------

Chassis Information
Chassis Model       : PowerEdge 2850
Chassis Service Tag :
Chassis Lock        : Present
Chassis Asset Tag   :

Processor 1
Processor Manufacturer : Intel
Processor Family       : Xeon
Processor Version      : Model 4 Stepping 3
Current Speed          : 3200 MHz
Maximum Speed          : 3600 MHz
External Clock Speed   : 800 MHz
Voltage                : 1400 mV

Processor 2
Processor Manufacturer : Intel
Processor Family       : Xeon
Processor Version      : Model 4 Stepping 3
Current Speed          : 3200 MHz
Maximum Speed          : 3600 MHz
External Clock Speed   : 800 MHz
Voltage                : 1400 mV

Memory
Total Installed Capacity   : 2048 MB
Memory Available to the OS : 2023 MB
Total Maximum Capacity     : 16384 MB
Memory Array Count         : 1

Memory Array 1
Location           : System Board or Motherboard
Use                : System Memory
Installed Capacity : 2048 MB
Maximum Capacity   : 16384 MB
Slots Available    : 6
Slots Used         : 2
ECC Type           : Multibit ECC

Slot PCI1
Adapter        : [Not Occupied]
Type           : PCI X
Data Bus Width : 64 Bits
Speed          : 133 MHz
Slot Length    : Long
Voltage Supply : 3.3 Volts

Slot PCI2
Adapter        : [Not Occupied]
Type           : PCI X
Data Bus Width : 64 Bits
Speed          : 133 MHz
Slot Length    : Long
Voltage Supply : 3.3 Volts

Slot PCI3
Adapter        : PRO/100 S Server Adapter
Type           : PCI X
Data Bus Width : 64 Bits
Speed          : 133 MHz
Slot Length    : Short
Voltage Supply : 3.3 Volts

BIOS Information
Manufacturer : Dell Inc.
Version      : A04
Release Date : 09/22/2005

--------------
Network Data
--------------

IP Address Data
IP Address 0 : 192.168.2.2
IP Address 1 : 192.168.0.115

--------------------
Storage Enclosures
--------------------

Storage Enclosures
Name        : Backplane
Service Tag : 62P00P8
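Syntax | Brief description |
---|---|
top | Monitors and manages running processes (for example, to kill a process). Press "q" to quit and "k" to kill a process. |
htop | Similar to top, but with a friendlier, menu-based user interface. |
lsof | Shows which processes are "touching" a file or directory, and the set of files a process is accessing (this also includes any network sockets, pipes or devices). |
netstat | Provides statistics and reports about network usage and connections (established and listening). |
vmstat | Provides statistics about virtual memory usage. |
iostat | Provides statistics about reads from and writes to storage devices. |
inotifywatch, inotifywait | Modern Linux kernels can notify a process (user application) immediately of any access or change to a file. The "inotifywatch" and "inotifywait" commands wait for new event notifications from the kernel about anything related to a set of files/directories. |
strace -p <pid> | Monitors the system calls (calls to services provided by the kernel) made by a user application. |
stap | SystemTap. Monitors the kernel in real time and in great detail. A tutorial can be read here. |
oprofile and perfmon2 | Give access to the hardware performance counters; a tutorial can be browsed here. |
AMD CodeAnalyst | Graphical front end for OProfile. Introductions/tutorials can be browsed here and here. |
Intel VTune | Performance tuning on Intel hardware. |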