Mon, Jan 5, 2009 17:31 EST

Microsoft's Zune Meltdown: Three Lessons Developers Should Learn

Topic: Development

Blog: Developer Wisdom

Current Rating: 4 Comments: 36

On December 31, all the 30GB Zune models turned into bricks because of a Leap Year firmware coding error. This quality assurance and testing debacle demonstrates three lessons every software developer should take to heart.

On the last day of 2008, every unit of an older Microsoft Zune MP3 player model (the one with 30GB of storage) locked up. The devices were back in operation a day later, and Microsoft explained the cause of the trouble:

"A bug in the internal clock driver related to the way the device handles a leap year. The issue should be resolved over the next 24 hours as the time change moves to January 1, 2009. We expect the internal clock on the Zune 30GB devices will automatically reset tomorrow (noon, GMT)."

With that data, Zune and technical users have some idea of what happened in the "Z2K9" incident. Microsoft's Scott Hanselman wrote a very good technical analysis for programmers about the dangers of such edge cases, and apparently he's not the only one to cover the bug. (My thanks to Indrajit Chakrabarty for the pointer.)

However, aside from "how not to write code like that," there are three important things for developers and software QA professionals—and their managers—to take away from the experience.

This Was a Failure of the Software Development Process and QA Testing

It's great that the technical problem was so easily addressed ("wait a day"), but it's one heck of an embarrassment for Microsoft. I'm not talking about their PR issues per se, though Microsoft is still trying to live down the Red Ring of Death debacle with its Xbox. However, Microsoft has, shall we say, a long-standing and less-than-stellar reputation for quality, and they did not do themselves any favors with this incident. I feel especially sorry for the authors of the new book, How We Test Software at Microsoft (cue: pointing and giggling) and the many smart people I have met from the company. (They have great people. Really. Some of the smartest techies I've met. But somehow Microsoft doesn't seem to create a culture that demands quality.)

But the bottom line is that this problem was entirely preventable. As a London-based web developer pointed out to me, "Edge conditions such as year transitions on leap years really ought to be tested as a matter of course, and shouldn't be that difficult to do on devices where you can adjust the clock." The date problem really should have been spotted before it was checked in, he says; any sort of code review probably would have spotted the infinite loop possibility. So why wasn't it done? Why wasn't it caught?
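To make that infinite loop possibility concrete, here is a minimal Python sketch of how a day-count-to-year conversion can hang on the last day of a leap year. It illustrates the class of bug, not the Zune's actual firmware: the 1980 epoch, the function names, and the `max_iterations` guard (added only so the demonstration raises an error instead of hanging) are all assumptions.

```python
from datetime import date


def is_leap_year(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def day_count(d: date, start_year: int = 1980) -> int:
    """1-based number of days from Jan 1 of start_year (assumed epoch) to d."""
    return (d - date(start_year, 1, 1)).days + 1


def year_from_days_buggy(days: int, start_year: int = 1980,
                         max_iterations: int = 10_000) -> int:
    """Hypothetical buggy conversion from a day count to a year."""
    year = start_year
    iterations = 0
    while days > 365:
        iterations += 1
        if iterations > max_iterations:
            # Guard for the demo only; real firmware would simply hang here.
            raise RuntimeError("loop made no progress")
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # BUG: when days == 366 nothing changes, so the loop never exits.
        else:
            days -= 365
            year += 1
    return year


def year_from_days_fixed(days: int, start_year: int = 1980) -> int:
    """Same conversion, but every pass either returns or consumes a full year."""
    year = start_year
    while True:
        days_in_year = 366 if is_leap_year(year) else 365
        if days <= days_in_year:
            return year
        days -= days_in_year
        year += 1
```

Calling `year_from_days_buggy(day_count(date(2008, 12, 31)))` trips the guard, because day 366 of a leap year satisfies the outer loop condition but neither inner branch. The fixed version returns 2008 for the same input. A reviewer who asks "does every path through this loop make progress?" has a fair chance of spotting the problem.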

I do understand the notion of "ship on time," and that some things get lost in the eternal desire to make a production date. Quality assurance testing is not the only victim. But this is a well-defined problem set with pretty darned obvious unit tests. (I won't be surprised if I get e-mail messages from QA Tools companies telling me that their products include such tests as a matter of course. Just post a response to this post, folks. In this context, it's fine.)
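As a rough idea of what those obvious unit tests might look like, here is a short Python sketch that checks a day-count-to-year conversion at every year boundary and leap day, using the standard library's `datetime` as the reference. The `year_from_days` function, the 1980 epoch, and the range of years tested are assumptions chosen for illustration, not anything taken from the Zune code.

```python
import unittest
from datetime import date

EPOCH = date(1980, 1, 1)  # assumed epoch for this sketch


def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def year_from_days(days: int) -> int:
    """Hypothetical function under test: 1-based day count since EPOCH -> year."""
    year = EPOCH.year
    while True:
        days_in_year = 366 if is_leap_year(year) else 365
        if days <= days_in_year:
            return year
        days -= days_in_year
        year += 1


class YearBoundaryTests(unittest.TestCase):
    def check(self, day: date) -> None:
        days = (day - EPOCH).days + 1
        self.assertEqual(year_from_days(days), day.year, day.isoformat())

    def test_first_and_last_days_of_every_year(self):
        # Includes 2000 (leap) and 2100 (not leap) to cover the century rule.
        for year in range(1980, 2102):
            for day in (date(year, 1, 1), date(year, 12, 30), date(year, 12, 31)):
                with self.subTest(day=day.isoformat()):
                    self.check(day)

    def test_around_leap_day(self):
        for year in (1980, 2000, 2008, 2012):
            for day in (date(year, 2, 28), date(year, 2, 29), date(year, 3, 1)):
                with self.subTest(day=day.isoformat()):
                    self.check(day)
```

Saved to a file, this runs with `python -m unittest`. The December 31 cases in leap years are exactly the inputs that would have hung a loop that forgets day 366.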

We all make mistakes. But the purpose of software engineering is to catch and fix errors before the product is released.

For further contemplation: Would your company's software development process have caught an error like this?

Failure to Learn From History: It's


 
Tue, Jan 6, 2009 2:16 EST
Anonymous user
Posted by: EmbeddedStupidity
Rating: 50

I agree this is an epic fail. Stupid even. Epically stupid. And more stupidly, inevitable. Having worked in the embedded systems milieu for many years, I'm not surprised. I see stupid stuff almost every day. I don't want to, but it's embedded in the development geography.

For consumer devices, everything is a compromise. Limited specs, but creeping features. Limited hardware and firmware development cycles, even more limited test cycles and usually supported by primitive and often buggy tools. This is keenly balanced by unlimited marketing pressure to release a beta beast in production skin.

So, Microsoft got caught with a really stupid one. Not the first. Not the last. Embedded systems devel model isn't broken. No one has had the time to fix it enough. We're too busy re-inventing code-wheels that were perfected 20 years ago, but can't run on our current platform or the licensing cost won't fit our BOM or aren't shiny enough to fit our new shiny methodology. Oh, well, when you're building tomorrow's obsolete gadget, there's no point dwelling on the past.

 
Wed, Jan 7, 2009 4:40 EST
Anonymous user
Posted by: Anonymous
Rating: 10

It happens at Microsoft too; then stupidity follows. :) I hope small companies pay more attention to their products.

 
Tue, Jan 6, 2009 3:44 EST
Anonymous user
Posted by: Patrick
Rating: 50

The assumption is made here that the programmer thought it was tested thoroughly enough... which we don't know.

 
Tue, Jan 6, 2009 6:36 EST
Anonymous user
Posted by: Anonymous
Rating: 63.3333

It's funny how you go bash Microsoft for such an apparent error on their part when you have no idea what you're talking about. The issue was caused by a third-party OEM chip driver for the internal clock, the same one used in the Toshiba Gigabeat S Series, which, in fact, had the exact same freezing/clock issue the Zune did (since Toshiba made the first-gen Zunes for MS). But I hear no mention of Gigabeats freezing or a 'GigaGate'! You blatantly bash Zune but give no mention of other devices affected by the same erroneous code. (Find out more here: http://www.zuneboards.com/forums/zune-news/38143-cause-zune-30-leapyear-problem-isolated-4.html#post351116 )

It's because of supposed journalists such as yourself jumping on the bandwagon of MS hate and trying to pass your opinion off as truth, when MS really had nothing to do with this issue. Next time, get your facts straight before you go blasting something you have no clue about.

Thanks for wasting my time on that...

 
Tue, Jan 6, 2009 10:13 EST
Posted by: Esther Schindler
Rating: 63.3333

The fact that they work for Microsoft is irrelevant. I don't care where the developer worked. I only care that something this dumb was written and failed in the software development process.

And my comment about Microsoft lowering user expectations is less about the Zune than it is about the willingness of people to excuse the error as unimportant.

