theducks.org

12 Jul, 2013

Have you tried turning it off and on again, five times?

Posted by: alex in: Gibbering

These days I’m doing a lot of work implementing datacentre equipment, including SANs, servers and switches. I recently installed some Brocade VDX6720 switches. Pretty cool stuff, especially the way that TRILL and vLAGs work so easily. I had a loaner switch, running Network OS (NOS) 3.0.1aa, while waiting for the final switches to be delivered. When they arrived, they were running NOS 2.0.1, so I had to upgrade them. That wasn’t the smoothest of experiences.

In Brocade’s defence, I should have read the manual more closely; both of these issues are in there, if you read it all. But time is money!

1 – I couldn’t get it to download new firmware over SCP. It kept reporting file not found, and that’s hard to diagnose. That might not be Brocade’s fault, so I switched to the USB method of loading firmware. It took about 15 minutes before I re-read the manual and saw that you can only use the Brocade-branded USB key to do that. Yes, there’s a USB port on the thing, but you can only use the ONE USB key it came with to load firmware onto it (or a similar Brocade one; at least they aren’t node locked).

2 – Turns out that when Brocade says a direct upgrade from 2.0.1 to 3.0.1 is “not supported”, they mean it will almost brick your switch. You need to upgrade to 3.0.0 first. I missed that bit. One of the switches had no VLAN information on it and came back pretty quickly, while the other, which I had configured VLANs on, would crash at startup.

eAnvil rev B found
Info: panic dump has been initialized!
Exisitng reboot reason fsize = 5 rb=
Global Fan Direction is 0

The file contains no trace dump information.

Network OS ((none))

(none) console login: ********************************************************************************************************
** Crashed in OM/Worker (WaveNs::ClusterLocalObjectManager::boot(WaveNs::WaveAsynchronousContextForBootPhases*))
********************************************************************************************************

WaveNs::ClusterLocalObjectManager::boot(WaveNs::WaveAsynchronousContextForBootPhases*)
WaveNs::WaveObjectManager::bootBootSelfStep(WaveNs::PrismLinearSequencerContext*)
WaveNs::PrismLinearSequencerContext::executeCurrentStep()
WaveNs::PrismLinearSequencerContext::executeNextStep(unsigned int const&)
WaveNs::WaveObjectManager::bootBootWorkersStep(WaveNs::PrismLinearSequencerContext*)
WaveNs::PrismLinearSequencerContext::executeCurrentStep()
WaveNs::PrismLinearSequencerContext::start()
WaveNs::WaveObjectManager::bootHandler(WaveNs::PrismBootObjectManagerMessage*)
WaveNs::WaveObjectManager::PrismOperationMapContext::executeMessageHandler(WaveNs::PrismMessage*&)
WaveNs::WaveObjectManager::handlePrismMessage(WaveNs::PrismMessage*)
WaveNs::PrismThread::start()
WaveNs::PrismPosixThread::pthreadStartMethod(WaveNs::PrismPosixThread*)
/lib/libpthread.so.0 [0xc306e5c]
clone

Thu Jul 11 22:04:07 UTC 2013 :: Confd: Waiting for Dcmd to become ready...

How did I recover? Well, reading the scrollback I noticed a “Found 2(threshold 5) abnormal reboots within 3000 seconds window(threshold)” message, and wondered what would happen if I hit 5 abnormal reboots. Well, that gives you an option to “clean databases”, which fixed it good and proper.
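If you are capturing the console to a file anyway, a rough Python sketch like the one below can pull the numbers out of that scrollback message and tell you how many more abnormal reboots it would take to reach the threshold. The pattern is built purely from the one line quoted above, so treat it as an assumption that other NOS releases word the message the same way; the function name `reboots_remaining` is mine, not anything from Brocade.

```python
import re

# Pattern derived from the single message quoted in this post.
# Assumption: other firmware releases may phrase it differently.
REBOOT_MSG = re.compile(
    r"Found (?P<count>\d+)\(threshold (?P<threshold>\d+)\) abnormal reboots "
    r"within (?P<window>\d+) seconds window"
)


def reboots_remaining(line):
    """Return how many more abnormal reboots are needed to reach the
    threshold reported in `line`, or None if the line does not match."""
    match = REBOOT_MSG.search(line)
    if match is None:
        return None
    count = int(match.group("count"))
    threshold = int(match.group("threshold"))
    # Never report a negative number if the count is already past the threshold.
    return max(threshold - count, 0)


if __name__ == "__main__":
    sample = ("Found 2(threshold 5) abnormal reboots "
              "within 3000 seconds window(threshold)")
    print(reboots_remaining(sample))
```

For the line I saw, that works out to three more power cycles, hence the title of this post.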

3 – Bonus gripe here. As far as I can tell, there’s no way to configure a range of interfaces, like Cisco’s “int range” or Dell’s “int blah/0 to blah/2”. I’ve seen some people say on forums that they wouldn’t buy Brocade again because of this. A little harsh, but it seems like a pretty trivial feature to add.
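In the meantime, one possible workaround is to generate the repetitive per-interface config with a small script and paste it in. Here is a minimal Python sketch of that idea. The interface names reuse the “blah/0” placeholder from above, and the `header` template and the per-interface command are stand-ins you would have to swap for whatever your switch actually accepts; nothing here has been checked against real NOS syntax.

```python
def expand_interface_range(prefix, first, last, commands,
                           header="interface {name}"):
    """Return config lines for every interface in an inclusive port range.

    `header` and `commands` are caller-supplied placeholders (hypothetical);
    this function only repeats them, it does not validate them.
    """
    if first > last:
        raise ValueError("first must not exceed last")
    lines = []
    for port in range(first, last + 1):
        # One header line per interface, followed by the indented commands.
        lines.append(header.format(name=f"{prefix}{port}"))
        lines.extend(f" {cmd}" for cmd in commands)
    return lines


if __name__ == "__main__":
    # Placeholder names only; substitute your own interface naming and commands.
    for line in expand_interface_range("blah/", 0, 2,
                                       ["<per-interface command>"]):
        print(line)
```

It is no substitute for a proper range command, but it does save typing the same stanza out by hand for every port.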