
Sun 2500-M2 Controller Alarm: Diagnosis and Analysis

唐合易成
2016-04-18

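In April 2016, 唐合易成 received a fault report from a customer: the alarm LEDs on two Sun 2500-M2 storage arrays were lit continuously.

Our engineer went on site to diagnose the fault and found:

1. The alarm LEDs on both arrays were lit continuously.

2. The midrange servers attached to the arrays all reported alarms on the storage LUNs and on the controllers.

3. The Fibre Channel switch link status showed no obvious anomaly, and the ports were normal.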

 

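Logging in to the storage controllers at the low level showed that the arrays had no hardware alarms and were running normally. The logs show that at around 15:00 all in-use LUNs failed over from controller A to controller B. As a result, none of the LUNs on the array were on their preferred (optimal) path, which is what kept the controller alarm LED lit. The cause was a routine restart of controller A. Five days later (March 4, at about 04:00), controller B also performed a routine restart, and the LUNs moved to controller A.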

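To address the problem above, we proposed the following repair plan:

To clear the alarm LED promptly, so that any new problem would be noticed as soon as it occurred, our engineer moved every LUN back to the controller on its preferred path. This was done in a way that avoided the brief I/O interruption to applications that an online change of LUN ownership can cause. The alarm cleared.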
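The condition behind the alarm LED, LUNs being served by a controller other than their preferred one, can be illustrated with a minimal sketch. The `Lun` class, the LUN names, and the inventory below are hypothetical and are not taken from the customer's arrays; on a real array this information would come from the management software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lun:
    name: str
    preferred_owner: str  # controller the LUN is configured to use ("A" or "B")
    current_owner: str    # controller actually serving the LUN right now


def off_preferred_path(luns):
    """Return the LUNs whose current owner differs from the preferred owner."""
    return [lun for lun in luns if lun.current_owner != lun.preferred_owner]


# Hypothetical inventory after controller A restarted: everything sits on B.
luns = [
    Lun("lun0", preferred_owner="A", current_owner="B"),
    Lun("lun1", preferred_owner="A", current_owner="B"),
    Lun("lun2", preferred_owner="B", current_owner="B"),
]

for lun in off_preferred_path(luns):
    print(f"{lun.name}: on {lun.current_owner}, preferred {lun.preferred_owner}")
```

An empty result from `off_preferred_path` corresponds to the state reached after the repair, in which every LUN is back on its preferred path.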


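Fault log analysis:

After analyzing the collected logs, our engineer reached the following findings about this alarm event:

1. Both controllers are configured with a routine restart schedule.

2. The host-side links all went down, which led to the controller restart.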


Array 1:

Date/Time:  16-2-28 5:29:44

Sequence  number: 5941

Event type:  400F

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Controller reset by its alternate

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot A

Logged by:  Controller in slot B

Controller A begins its automatic restart.

 

 

Date/Time:  16-2-28 5:30:27

Sequence  number: 5960

Event type:  2606

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine begun

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot A

Logged by:  Controller in slot A

 

Date/Time:  16-2-28 5:31:00

Sequence  number: 5982

Event type:  2605

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine completed

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot A

Logged by:  Controller in slot A

Controller A finishes restarting; the self-test and path switchover were completed in between.

 

Date/Time:  16-3-4 4:42:15

Sequence  number: 6057

Event type:  400F

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Controller reset by its alternate

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot A

 

Date/Time:  16-3-4 4:43:02

Sequence  number: 6083

Event type:  2606

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine begun

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot B

Controller B begins its restart.

Date/Time:  16-3-4 4:43:34

Sequence  number: 6106

Event type:  2605

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine completed

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot B

Controller B finishes restarting; the self-test and path switchover were completed in between.
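The entries above follow a regular "Field: value" layout, so they can be turned into records for further analysis. The following is a minimal sketch of such a parser, assuming each event starts with a `Date/Time` line as in the excerpts here; the function name `parse_events` is illustrative and is not part of any Sun tool.

```python
def parse_events(text):
    """Split an event-log dump into one dict per event."""
    events = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # blank lines, headings and free-text notes carry no field
        key = " ".join(key.split())  # collapse doubled spaces inside field names
        if key == "Date/Time":
            events.append({})        # every record starts with its timestamp
        if events:
            events[-1][key] = value
    return events


sample = """
Date/Time:  16-2-28 5:29:44
Sequence  number: 5941
Event type:  400F
Description:  Controller reset by its alternate
Component  location: Enclosure 99, Slot A
Logged by:  Controller in slot B
"""

events = parse_events(sample)
print(events[0]["Event type"], "|", events[0]["Component location"])
```

Only the first colon on a line separates the field name from its value, which keeps timestamps such as `5:29:44` intact.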

 


Array 2:

 

Date/Time:  16-2-28 4:24:53

Sequence  number: 5682

Event type:  400F

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Controller reset by its alternate

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot A

Logged by:  Controller in slot B

 

Date/Time:  16-2-28 4:25:36

Sequence  number: 5700

Event type:  2606

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine begun

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot A

Logged by:  Controller in slot A

Controller A begins its routine restart.

Date/Time:  16-2-28 4:26:08

Sequence  number: 5722

Event type:  2605

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine completed

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot A

Logged by:  Controller in slot A

Controller A finishes restarting; the self-test and path switchover were completed in between.

 

Date/Time:  16-3-4 4:05:33

Sequence  number: 5789

Event type:  400F

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Controller reset by its alternate

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot A

 

Date/Time:  16-3-4 4:06:07

Sequence  number: 5830

Event type:  2606

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine begun

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot B

Controller B begins its restart.

 

Date/Time:  16-3-4 4:06:41

Sequence  number: 5853

Event type:  2605

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine completed

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot B
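Each restart in the logs follows the same three-event pattern: the controller is reset by its alternate (400F), the start-of-day routine begins (2606), and the start-of-day routine completes (2605). As a rough check on how long each controller was out of service, the sketch below measures the time from the 400F event to the matching 2605 event. The timestamps are copied from the entries above; the tuple layout and the function names are illustrative choices.

```python
from datetime import datetime

# (array, controller slot, event type, timestamp) copied from the log entries.
EVENTS = [
    ("Array 1", "A", "400F", "16-2-28 5:29:44"),
    ("Array 1", "A", "2606", "16-2-28 5:30:27"),
    ("Array 1", "A", "2605", "16-2-28 5:31:00"),
    ("Array 1", "B", "400F", "16-3-4 4:42:15"),
    ("Array 1", "B", "2606", "16-3-4 4:43:02"),
    ("Array 1", "B", "2605", "16-3-4 4:43:34"),
    ("Array 2", "A", "400F", "16-2-28 4:24:53"),
    ("Array 2", "A", "2606", "16-2-28 4:25:36"),
    ("Array 2", "A", "2605", "16-2-28 4:26:08"),
    ("Array 2", "B", "400F", "16-3-4 4:05:33"),
    ("Array 2", "B", "2606", "16-3-4 4:06:07"),
    ("Array 2", "B", "2605", "16-3-4 4:06:41"),
]


def parse_ts(text):
    """Parse the log's 'yy-m-d h:mm:ss' timestamp format."""
    return datetime.strptime(text, "%y-%m-%d %H:%M:%S")


def reset_durations(events):
    """Seconds from 'reset by alternate' (400F) to 'start-of-day completed' (2605)."""
    started, durations = {}, {}
    for array, slot, code, ts in events:
        key = (array, slot)
        if code == "400F":
            started[key] = parse_ts(ts)
        elif code == "2605" and key in started:
            elapsed = parse_ts(ts) - started.pop(key)
            durations[key] = int(elapsed.total_seconds())
    return durations


for (array, slot), seconds in reset_durations(EVENTS).items():
    print(f"{array} controller {slot}: {seconds} s")
```

With these timestamps every restart completes in 68 to 79 seconds, which fits a brief, orderly reset rather than a prolonged outage.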

 


Both arrays were last restarted in November 2013, about 850 days ago.

Array 1:

Date/Time:  13-11-29 18:22:53

Sequence  number: 2883

Event type:  2605

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine completed

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot B

 

Array 2:

Date/Time:  13-11-29 18:43:22

Sequence  number: 3129

Event type:  2605

Event  category: Internal

Priority:  Informational

Event needs  attention: false

Event send  alert: false

Event  visibility: true

Description:  Start-of-day routine completed

Event  specific codes: 0/0/0

Component  type: Controller

Component  location: Enclosure 99, Slot B

Logged by:  Controller in slot B
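The uptime can be cross-checked from the logged timestamps. Both 2013 entries are for the controller in slot B, so the sketch below pairs each with the 2016 reset of controller B on the same array. The helper name `uptime_days` is illustrative.

```python
from datetime import datetime

FMT = "%y-%m-%d %H:%M:%S"  # timestamp format used in the log excerpts


def uptime_days(booted, reset):
    """Whole days between a start-of-day completion and the next reset."""
    return (datetime.strptime(reset, FMT) - datetime.strptime(booted, FMT)).days


# Controller B: start-of-day completed in 2013, reset by alternate in 2016.
array1_b = uptime_days("13-11-29 18:22:53", "16-3-4 4:42:15")
array2_b = uptime_days("13-11-29 18:43:22", "16-3-4 4:05:33")
print(array1_b, array2_b)
```

On both arrays the result is 825 full days. The roughly 850 days quoted in the text is counted to the time of the analysis rather than to the reset itself, so it is somewhat larger.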


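In summary, the array logs show normal behavior. After about 850 days without a restart, each array automatically performed a routine restart of controllers A and B in turn. This behavior is part of the firmware (microcode) design and requires no manual intervention. No errors were logged.

While handling the fault and analyzing the logs on these two Sun arrays, our engineer found that all existing LUNs are owned by a single controller, which has some impact on controller I/O performance. We recommend making better use of the controller resources by redistributing the preferred paths of the LUNs across both controllers.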
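One way to plan such a redistribution is sketched below: LUNs are sorted by load, and each one is assigned to whichever controller currently carries less. This is a simple greedy heuristic under stated assumptions, not the procedure used on site; the LUN names and load figures are hypothetical, and real planning would use measured I/O statistics from the arrays.

```python
def balance_by_load(lun_load, controllers=("A", "B")):
    """Assign each LUN a preferred controller, heaviest LUNs first."""
    totals = {c: 0 for c in controllers}
    plan = {}
    # Sort by descending load; break ties by name so the result is repeatable.
    for name, load in sorted(lun_load.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(controllers, key=lambda c: totals[c])  # least-loaded so far
        plan[name] = target
        totals[target] += load
    return plan, totals


# Hypothetical per-LUN load figures (for example, average IOPS).
plan, totals = balance_by_load({"lun0": 500, "lun1": 300, "lun2": 200, "lun3": 100})
print(plan)
print(totals)
```

With the sample figures, controller A ends up with a total of 600 and controller B with 500, instead of one controller carrying all 1100.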








