Microsoft's Zune Meltdown: Three Lessons Developers Should Learn

to Development |

On December 31, all the 30GB Zune models turned into bricks because of a Leap Year firmware coding error. This quality assurance and testing debacle demonstrates three lessons every software developer should take to heart.

On the last day of 2008, every one of an older model of the Microsoft Zune MP3 player (the ones with 30GB of storage) locked up. The devices were back in operation again a day later, and Microsoft explained the cause of the trouble:

"A bug in the internal clock driver related to the way the device handles a leap year. The issue should be resolved over the next 24 hours as the time change moves to January 1, 2009. We expect the internal clock on the Zune 30GB devices will automatically reset tomorrow (noon, GMT)."

With that data, Zune and technical users have some idea of what happened in the "Z2K9" incident. Microsoft's Scott Hanselman wrote a very good technical analysis for programmers about the dangers of such edge cases, and apparently he's not the only one to cover the bug. (My thanks to Indrajit Chakrabarty for the pointer.)

However, aside from "how not to write code like that," there are three important things for developers and software QA professionals—and their managers—to take away from the experience.

This Was a Failure of the Software Development Process and QA Testing

It's great that the technical problem was so easily addressed ("wait a day"), but it's one heck of an embarrassment for Microsoft. I'm not talking about their PR issues per se, though Microsoft is still trying to live down the Red Ring of Death debacle with its XBox. However, Microsoft has a long history of, shall we say, a less-than-stellar reputation for quality, and they did not do themselves any favors with this incident. I feel especially sorry for the authors of the new book, How We Test Software at Microsoft (cue: pointing and giggling) and the many smart people I have met from the company. (They have great people. Really. Some of the smartest techies I've met. But somehow Microsoft doesn't seem to create a culture that demands quality.)

But the bottom line is that this problem was entirely preventable. As a London-based web developer pointed out to me, "Edge conditions such as year transitions on leap years really ought to be tested as a matter of course, and shouldn't be that difficult to do on devices where you can adjust the clock." The date problem really should have been spotted before it was checked in, he says; any sort of code review probably would have spotted the infinite loop possibility. So why wasn't it done? Why wasn't it caught?

I do understand the notion of "ship on time," and that some things get lost in the eternal desire to make a production date. Quality assurance testing is not the only victim. But this is a well-defined problem set with pretty darned obvious unit tests. (I won't be surprised if I get e-mail messages from QA Tools companies telling me that their products include such tests as a matter of course. Just post a response to this post, folks. In this context, it's fine.)

We all make mistakes. But the purpose of software engineering is to catch and fix errors before the product is released.

For further contemplation: Would your company's software development process have caught an error like this?

Failure to Learn From History: It's

Continue Reading

Print

Browse CIO Blogs

See all CIO Blogs »

Cloud computing has emerged as one of the most significant game changers to hit the technology landscape in the past 20 years. With this massive expansion of the cloud, the perception of the IT organization is shifting from a utility player to a change agent. This eBook breaks down five ways progressive organizations are using cloud-based IT Management solutions to help drive innovation and become more strategic, including: adding visibility and analytics, speeding up time-to-value, lowering costs, improving prioritization, and providing a blueprint for future cloud deployments.
Read the white paper to see how IBM helped Citigroup deliver new services and enhancements to their 200 million customers faster.
There are 3 ways to modernize legacy applications: rewrite completely, acquire packaged solutions or migrate existing code. This paper explains why it's best to migrate and how IBM® Rational® software can help.
Accommodating specific lines of business can result in a hybrid ecosystem of applications and servers. The resulting complexity of this architecture makes for an environment that is costly to maintain and difficult to change when addressing new challenges.
This whitepaper will help you to define a mobile device passcode policy. Security managers must attempt to reconcile two opposing goals. They must: 1) create a passcode policy that is strong enough to protect the device if it is lost or stolen, while: 2) not annoying users with needless length or complexity.
This whitepaper, authored by The Radicati Group, looks at the key reasons organizations should consider moving to a cloud-based archiving solution. Email archiving solutions enable organizations to store, monitor, and collect electronic data exchanged by their users to comply with internal policies and regulations.
ATERNITY will showcase a 30-minute demo on how Fortune 500 companies are leveraging its award-winning FPI Platform to deliver a user-centric approach to Proactive IT Management.
For businesses to move forward and tap into the ever-expanding universe of Internet users and network-enabled devices, it's critical to learn how to make the transition to IPv6. Learn the critical steps your organization must take to make a seamless transition-and keep your business world connected.
Learn how IT teams can protect against spear phishing tactics. Harry Sverdlove, chief technology officer of Bit9 offers a frank discussion about spear phishing - the most common technique used in today's advanced attacks.
Learn how to build a solid business case for your migration to Red Hat Enterprise Linux so you can run leaner, innovate faster, be more flexible and own the New Now.
Social media isn't about you; it's about everything around you. As you consider how your customers want to communicate with you, social media is something that can't be ignored. But what should your strategy be? Is social media "just another channel?" What kind of a plan makes sense for your contact center and for your customers? Join our experts as they share their insight and research results.
Hardware tokens were a popular method of strong authentication in past years but the cumbersome provisioning and distribution tasks, high support requirements and replacement costs have limited their growth. The additional log-in steps that hardware tokens require and the resulting user frustrations have limited adoption and make them impractical for larger scale partner and customer applications.

Newsletter Sign-Up »

Receive the latest news test, reviews and trends on your favorite technology topics

Choose a newsletter
  1. View all Newsletters | Privacy Policy