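In April 2016, 唐合易成 received a repair request from a customer: the alarm LEDs on two Sun 4500-M2 storage arrays were lit continuously.
Our engineer went on site to diagnose the fault:
1. The alarm LEDs on both arrays were lit continuously;
2. The midrange servers attached to the arrays all reported LUN-related and controller-related alarms;
3. The Fibre Channel switch link status showed no obvious anomaly, and the ports were normal.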
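After logging in to the storage controllers at the low level to check the details, the engineer found that the arrays had no hardware alarms and were running normally. The logs showed that at around 15:00 all in-use LUNs had switched from controller A to controller B, so none of the LUNs on the array was on its preferred path. That is what kept the controller alarm LED lit. The cause was a routine reboot of controller A. Five days later (March 4, around 04:00), controller B also went through a routine reboot, and the LUNs switched to controller A.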
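To address the problem described above, we proposed the following repair plan:
To clear the alarm LEDs promptly, so that any new problem would be noticed right away, our engineer moved every LUN back to the controller on its preferred path, taking care that switching LUN ownership online did not cause a brief interruption that would affect applications. The alarm cleared.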
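Fault log analysis:
After analyzing the collected logs, our engineer reached the following findings about this alarm event:
1. Both controllers are configured with a routine reboot schedule;
2. The host-side links all went down, which led to the controller reboots;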
Array 1:
Date/Time: 16-2-28 5:29:44 Sequence number: 5941 Event type: 400F Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller reset by its alternate Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot A Logged by: Controller in slot B
Note: controller A begins its automatic reboot.
Date/Time: 16-2-28 5:30:27 Sequence number: 5960 Event type: 2606 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine begun Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot A Logged by: Controller in slot A
Date/Time: 16-2-28 5:31:00 Sequence number: 5982 Event type: 2605 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine completed Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot A Logged by: Controller in slot A
Note: controller A's reboot is complete; the self-test and path switchover were completed in the meantime.
Date/Time: 16-3-4 4:42:15 Sequence number: 6057 Event type: 400F Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller reset by its alternate Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot A
Date/Time: 16-3-4 4:43:02 Sequence number: 6083 Event type: 2606 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine begun Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot B
Note: controller B begins its reboot.
Date/Time: 16-3-4 4:43:34 Sequence number: 6106 Event type: 2605 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine completed Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot B
Note: controller B's reboot is complete; the self-test and path switchover were completed in the meantime.
Array 2:
Date/Time: 16-2-28 4:24:53 Sequence number: 5682 Event type: 400F Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller reset by its alternate Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot A Logged by: Controller in slot B
Date/Time: 16-2-28 4:25:36 Sequence number: 5700 Event type: 2606 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine begun Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot A Logged by: Controller in slot A
Note: controller A begins its routine reboot.
Date/Time: 16-2-28 4:26:08 Sequence number: 5722 Event type: 2605 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine completed Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot A Logged by: Controller in slot A
Note: controller A's reboot is complete; the self-test and path switchover were completed in the meantime.
Date/Time: 16-3-4 4:05:33 Sequence number: 5789 Event type: 400F Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Controller reset by its alternate Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot A
Date/Time: 16-3-4 4:06:07 Sequence number: 5830 Event type: 2606 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine begun Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot B
Note: controller B begins its reboot.
Date/Time: 16-3-4 4:06:41 Sequence number: 5853 Event type: 2605 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine completed Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot B
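The event records quoted above are flattened onto single lines, which makes them hard to scan. As a rough aid, the following Python sketch splits such text back into one dictionary per event. The names `FIELDS` and `parse_events` are illustrative and are not part of any vendor tool. The sketch assumes every record starts with `Date/Time:`, uses exactly the field labels seen in the excerpts, and has had any engineer annotations removed beforehand.

```python
import re

# Field labels exactly as they appear in the quoted event records.
FIELDS = [
    "Date/Time", "Sequence number", "Event type", "Event category",
    "Priority", "Event needs attention", "Event send alert",
    "Event visibility", "Description", "Event specific codes",
    "Component type", "Component location", "Logged by",
]

# One alternation that matches any label followed by a colon.
_LABEL_RE = re.compile("(" + "|".join(re.escape(f) for f in FIELDS) + "):")


def parse_events(text):
    """Split flattened log text into a list of {label: value} dicts."""
    # re.split with a capturing group yields [prefix, label, value, label, value, ...].
    parts = _LABEL_RE.split(text)
    events = []
    for label, value in zip(parts[1::2], parts[2::2]):
        if label == "Date/Time":  # assumed to open every record
            events.append({})
        if events:
            events[-1][label] = value.strip()
    return events


# Two records copied from the Array 1 excerpt, joined on one line.
SAMPLE = (
    "Date/Time: 16-2-28 5:29:44 Sequence number: 5941 Event type: 400F "
    "Event category: Internal Priority: Informational "
    "Event needs attention: false Event send alert: false "
    "Event visibility: true Description: Controller reset by its alternate "
    "Event specific codes: 0/0/0 Component type: Controller "
    "Component location: Enclosure 99, Slot A Logged by: Controller in slot B "
    "Date/Time: 16-2-28 5:30:27 Sequence number: 5960 Event type: 2606 "
    "Event category: Internal Priority: Informational "
    "Event needs attention: false Event send alert: false "
    "Event visibility: true Description: Start-of-day routine begun "
    "Event specific codes: 0/0/0 Component type: Controller "
    "Component location: Enclosure 99, Slot A Logged by: Controller in slot A"
)

events = parse_events(SAMPLE)
for ev in events:
    print(ev["Date/Time"], ev["Event type"], ev["Description"], "|", ev["Component location"])
```

Filtering the parsed events on `Event type` (400F, 2606, 2605 in the excerpts) gives a compact reset timeline per controller slot.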
Both arrays were last rebooted in November 2013, about 850 days ago.
Array 1: Date/Time: 13-11-29 18:22:53 Sequence number: 2883 Event type: 2605 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine completed Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot B
Array 2: Date/Time: 13-11-29 18:43:22 Sequence number: 3129 Event type: 2605 Event category: Internal Priority: Informational Event needs attention: false Event send alert: false Event visibility: true Description: Start-of-day routine completed Event specific codes: 0/0/0 Component type: Controller Component location: Enclosure 99, Slot B Logged by: Controller in slot B
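The time a controller ran between two logged events can be checked directly from the timestamps. The Python sketch below is only an illustration: `parse_log_time` is a hypothetical helper, and it assumes the log timestamps are in `YY-M-D H:MM:SS` form. It measures slot B on each array from its November 2013 start-of-day completion to its March 2016 reset, so the result may differ somewhat from the rounded figure quoted above, which counts up to the time of writing.

```python
from datetime import datetime


def parse_log_time(stamp):
    """Parse a log timestamp such as '16-2-28 5:29:44' (assumed YY-M-D H:MM:SS)."""
    return datetime.strptime(stamp, "%y-%m-%d %H:%M:%S")


# Slot B timestamps copied from the log excerpts:
# (previous start-of-day completed, reset by alternate controller).
slot_b_events = {
    "Array 1": ("13-11-29 18:22:53", "16-3-4 4:42:15"),
    "Array 2": ("13-11-29 18:43:22", "16-3-4 4:05:33"),
}

uptimes = {
    array: parse_log_time(reset) - parse_log_time(last_boot)
    for array, (last_boot, reset) in slot_b_events.items()
}

for array, uptime in uptimes.items():
    print(f"{array}: slot B ran for {uptime.days} full days ({uptime}) before its reset")
```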
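In summary: the array logs show normal behavior. After about 850 days without a reboot, each array automatically rebooted controllers A and B in turn as a routine operation. This is by firmware design and requires no manual intervention. No errors were reported.
While handling the fault and analyzing the logs on these two Sun arrays, our engineer found that all existing LUNs sit on a single controller, which affects that controller's I/O performance. We recommend making better use of the controller resources by reassigning the LUNs' preferred paths.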
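To illustrate the recommendation, here is a small Python sketch that counts LUNs per controller and proposes an alternating assignment. The LUN names and the functions `owner_counts` and `alternate_owners` are hypothetical, and the data is invented for the example. A real plan would also weigh the I/O load of each LUN rather than just the count, and the change itself would be made with the array's own management tool.

```python
from collections import Counter


def owner_counts(lun_owners):
    """Count how many LUNs each controller currently owns."""
    return dict(Counter(lun_owners.values()))


def alternate_owners(lun_names, controllers=("A", "B")):
    """Propose preferred owners by alternating controllers over the sorted LUN names."""
    return {
        lun: controllers[i % len(controllers)]
        for i, lun in enumerate(sorted(lun_names))
    }


# Hypothetical layout resembling the finding: every LUN on one controller.
current = {"lun0": "A", "lun1": "A", "lun2": "A", "lun3": "A"}
proposed = alternate_owners(current)

print("current :", owner_counts(current))
print("proposed:", owner_counts(proposed))
```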


