Dell OptiPlex 9010 Ubuntu Linux Lock-Up

The History

I’ve been waiting for a while to get a new desktop computer at home. Back in 2006, I was biding my time to replace my dead desktop until Dell officially announced its Windows Vista “buy your computer now and we’ll give you Vista whenever it comes out” program, and I finally bought my OptiPlex 745 Small Form Factor PC (with a nearly top-of-the-line Conroe Core 2 Duo E6600 CPU) in October. Since then, I’ve been running that computer nearly 24 hours a day, seven days a week, and it’s held up remarkably well.

While I ran Windows XP full time when I first got it, I switched to Ubuntu full time (aside from playing through Portal when it came out) once I got my hands on Vista. That’s not meant as a condemnation of Vista, which everyone is so quick to offer, but it just worked out that way: since I was going to do a fresh install on a fresh hard drive for Vista, it was a perfect opportunity to install Ubuntu alongside. And once Ubuntu was installed, I wanted to keep using it (as I had been full time in the months between my old desktop dying and purchasing the new computer). I preferred the desktop environment and window management, and most of the apps were either the exact same ones I already used on Windows (e.g. Opera, Firefox, Pidgin) or had suitable (if not preferable) replacements, with the exception of Foobar2000 and Media Player Classic (MPC). Things worked well, too!

The Long Wait

Fast forward to late 2011, and I started to get the upgrade itch. Part of that was inspired by the desire to play the (then-)upcoming Portal 2 game, which I was confident my hardware wouldn’t be able to handle, and the other part was just my inner geek. I’d certainly gotten my money’s worth out of the system that’d passed its five-year anniversary of arriving on my desk, and technology had advanced quite a bit, mostly in terms of CPU speed and storage technology (i.e. SSDs). I knew that Intel’s roadmap had another microarchitecture—Ivy Bridge—due up around the end of 2011, or early 2012, depending on which rumors and schedule slips were in the press at a given time, so I decided I’d wait for that.

The retail availability of the first Ivy Bridge desktop CPUs came and went, and scant few desktops were announced. (My interest in building my own PCs died along with my last self-built desktop, very inconveniently timed to coincide with my last few weeks of college, hence the Dell in October 2006.) Lenovo was first to the punch with some announced ThinkCentre refreshes on Ivy Bridge, but my stellar experience with the OptiPlex 745 caused me to hold out for Dell’s OptiPlex refresh. In early-to-mid June, that finally happened, and I ordered my own OptiPlex 9010 Small Form Factor (with an i7-3770) a few days later.

The Excitement

The machine arrived on a Monday (a few days ahead of schedule, woo!), but I had Monday Night Dinner to attend, so I couldn’t do any more than open the box and plug in the RAM (Crucial Ballistix 2x 8GB DDR3 1600) and SSD (Intel 520-series 180 GB) I had bought separately. I don’t think I had time to turn the machine on until Wednesday or Thursday, at which point I set about shrinking the Windows partition (which went fine) and installing Linux (Ubuntu 12.04). Unfortunately I overcomplicated things and was a bit naive, so on my first attempt I neglected to provide an EFI partition at the beginning of the SSD, which caused me quite a bit of confusion until I realized it was necessary. Rather than trying to shift data and partitions slightly (since I assume the EFI partition is supposed to go at the beginning of the drive? Maybe that’s not true…), I just installed again from scratch, since it’s a pretty painless procedure. After that, things were peachy. The machine was fast! And I wasn’t using it enough yet to be terribly mad at Unity! But then things went south…

The Frustration

At some point while I was doing an aptitude install of some packages, the machine locked up hard. The disk activity indicator never blinked in response to key presses; I couldn’t Ctrl-Alt-F1 to another TTY; I couldn’t Alt-SysRq-<whatever> to sync the filesystem, etc. This is clearly not what you want to happen on any system, let alone a new one. The same thing happened another time when I was using the Ubuntu Software Center to install a package. It happened again when browsing a web page. It happened three more times when I was doing an rsync over the LAN.

I ran the included Dell boot diagnostics, which turned up no errors. (I actually first tried Ubuntu’s hard-drive-installed Memtest86 instance, but that wouldn’t boot, complaining about ‘linux16’ not being a valid boot option, or something.) I later put Memtest86+ 4.20 on a USB stick and ran that, and got some memory errors, but I initially suspected an incompatibility with the new Ivy Bridge chips, so I didn’t put too much thought into it. That led me to Memtest86+ 5.00 Beta 1, which claims to have added support for Ivy Bridge. Hooray! Right? It did successfully detect the SPD information for my RAM, unlike 4.20, but it still turned up memory errors. I took out the stick of RAM the results indicated was bad … and still got errors. OK, so maybe I took out the wrong one, so I swapped them, and … still got errors.

Much testing later, I finally determined what I believe to be some sort of perverse pattern of behavior. When run from a cold boot with Ethernet plugged in, Memtest86+ displays errors. If Ethernet is unplugged, Memtest86+ does not show any errors. When not run from a cold boot, Memtest86+ does not display errors either (at least, not in the place where it consistently does during all my other tests; I haven’t, for instance, let it run for hours).

Next Steps

What to do now? Some folks on the Ars Technica #linux channel are convinced that I just got a lemon, and that some lead from the Ethernet controller was laid poorly in the PCB and is interfering with the memory controller. Based on their advice, I contacted Dell support, and they’re shipping me a new system.

I’m less optimistic that the problem I’m seeing is a single-instance hardware issue. Although I have little to base my theory on, my current musing is that this is a systemic issue, possibly with the vPro implementation or the BIOS/EFI. My primary support for this is the fact that, in Windows, I’ve seen no instability during a couple-hour, ~1.5 MB/s download of Portal 2, web browsing, installing Windows Updates, etc. If you’ve ever read Matthew Garrett’s stuff, you might be familiar with the craziness that can exist in BIOS/EFI, and the poor state of any reasonable implementation, testing, or support outside of whatever happens to work in Windows. In Windows, of course, there are lots of Intel-written drivers loaded, an Intel vPro/AMT management toolkit that runs in the system tray to interface with the vPro junk, and who knows what other kind of voodoo going on. In Ubuntu, yes, there’s some Intel-written driver code in the kernel, but the 3.2.0-23 kernel I’m running at the time of this writing is a bit outdated relative to whatever fixes they might’ve put into the current stable 3.4.4 or 3.5-rc6. I think the amount of testing Intel’s desktop driver code receives on Linux is probably laughably small compared to their Windows code (which is itself probably laughably small compared to that of someone like Nvidia). I also have no idea if there’s any Linux desktop code to handle the vPro stuff. I may be in a bit over my head.

I’m not even sure how to start debugging or diagnosing this issue, which I assume will present itself again in 3-8 days, when the replacement system arrives. Writing down what I’ve done so far, so I can link others to this when I cry to them for help, is all I can think of at the moment. I’ve not been able to turn up any other useful Google results for OptiPlex 9010 and Linux or Ubuntu or Memtest86, so hopefully this at least puts the issue on the radar, in case anyone else is running into a similar problem.

(As an afterthought, I do want to mention that the OptiPlex 9010 is actually Ubuntu Certified … for pre-install only of 11.10, anyway. Dell naturally only offers that configuration in certain regions, and I’m relatively confident that gives me approximately zero traction with them for my issue…in the US…on 12.04.)

Edit: A Few Other Notes

As I was running up against my desired bedtime last night while composing this, I rushed a bit to publish, and have since thought of a few other details from my experience so far.

  1. I have not been able to reproduce a lock-up using nc -l 9999 > /dev/null or nc -l 9999 > /some/file/on/the/hard/drive with the other side of a gigabit link piping /dev/zero or /dev/urandom into netcat (using TCP or UDP). My theory is that maybe netcat is too simple an operation, so it’s not exercising the CPU or memory enough for the rest of the system to trip over it? I don’t feel especially satisfied by that explanation, though.
  2. I have not been able to reproduce a lock-up doing an rsync from my SSD to the spinny hard drive (both internal, SATA); I wanted to rule out disk activity as the culprit, so that was my attempt to isolate the symptom a bit further. I don’t know if it’s meaningful.
  3. I also got a lock-up one time immediately after I copied text from Thunderbird and was moving my mouse to paste it into Firefox. At the time, the only other active desktop process I’m aware of was Banshee playing music off a CIFS mount (though it’d been doing that with the computer otherwise idle for a few hours up to that point).
  4. Twice I’ve experienced what seems to be a drop in all my active network connections: SSH sessions unrecoverably stopped responding, music playback from a CIFS mount stopped, and a Remmina RDP session unrecoverably stopped responding. The only evidence I can find in any logs that something went wrong is: CIFS VFS: Server sark has not responded in 300 seconds. Reconnecting… Update: I’m pretty sure this was caused by my computer’s IP address changing. I noticed that my computer had progressively been marching up through the DHCP lease pool, rather than sticking to its originally allocated address like it should. It’s my belief this is due to vPro/AMT trying to poke its head up and get a DHCP lease using the same MAC address. Further investigation is merited. I also don’t know if this means anything for the bigger problem of the lockups.
  5. Special thanks to some Ubuntu guys who have come to my aid (compliments of johnf)!
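
One way to probe the theory in note 1, that netcat alone is too light a load, would be a receiver that also burns CPU on every byte it takes in. The following is only a minimal sketch in Python and was not part of the testing described above: the function names, the 64 KiB chunk size, and the use of SHA-256 as busy-work are all assumptions made for illustration.

```python
import hashlib
import socket
import threading

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary, illustrative choice)


def receive_and_hash(server_sock, result):
    """Accept one connection and hash everything received.

    Hashing is the hypothetical "extra load": unlike nc > /dev/null,
    every received byte is pushed through the CPU.
    """
    conn, _addr = server_sock.accept()
    digest = hashlib.sha256()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read means the sender closed the connection
                break
            digest.update(data)
            total += len(data)
    result["bytes"] = total
    result["sha256"] = digest.hexdigest()


def send_zeros(host, port, total_bytes):
    """Send total_bytes of zeros, similar to piping /dev/zero into netcat."""
    payload = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    return sent


def loopback_demo(total_bytes=1024 * 1024):
    """Run both ends on 127.0.0.1 as a self-check of the script."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    worker = threading.Thread(target=receive_and_hash, args=(server, result))
    worker.start()
    send_zeros("127.0.0.1", port, total_bytes)
    worker.join()
    server.close()
    return result
```

For a real test, the receiving half would run on the 9010 and the sending half on another machine across the gigabit link. The loopback helper only confirms that the script itself works; it never touches the Ethernet controller, so it cannot reproduce the lock-up on its own.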

Updated: The replacement system arrived, and it appears that maybe I’m running into a BIOS/UEFI bug?


9 Responses to “Dell OptiPlex 9010 Ubuntu Linux Lock-Up”

  1. Mathieu Trudel Says:

    Have you had any more crashes with the replacement?

    Please, file a bug on Launchpad so we can look into the issues if you get them again. Make sure you use ‘ubuntu-bug linux’, so that all the information about your system is there.

  2. p14nd4 Says:

    The replacement is scheduled to arrive between 2012-07-16 and 2012-07-23. If you share my pessimism and/or otherwise think it’s worthwhile, I can get that process started now, and then close the bug if it does get magically resolved by the replacement system. Thanks for magically stumbling across my blog and reaching out to me!

  3. Jorge Castro Says:

    This indeed does look like a hardware error, however if you want to try a newer kernel than what’s shipped you can always install it and boot into it to test: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/

    Also it’s Matthew Garrett, not Michael. :)

  4. p14nd4 Says:

    The replacement system arrived, and it appears that maybe I’m running into a BIOS/UEFI bug?

  5. Benoit Van Bleyenberghe Says:

    Hi David,
    Dell OptiPlex 9010 running Ubuntu Linux…
    This is my dream :)
    I replace my equipment and works with Ubuntu.
    I will buy this OptiPlex. Because it is Ubuntu certified…
    Does it work well for you now?
    Thank you for your presentation of problems.
    Benoît

  6. Basil Wallace Says:

    So at work we have > 50 Dell OptiPlex machines running continuous testing (basically, they check out code and run build tests)

    We were previously buying the Dell 990’s (we have 45 of them) and now have moved onto Dell 9010’s. We run Oracle 5.8, 6.2 and now Oracle 6.3.

    I came across this page because I too am having kernel panics and other lock ups with ONLY the 9010’s.

    Guess I’ll check out the UEFI side of things.

  7. BAtdelger Says:

    My workplace replaced our machines with Dell OptiPlex 9010s running Ubuntu 3 days ago. Five of them are working fine; only my PC has the freeze problem (even when I don’t touch anything). I thought the display cards were causing the problem (I use 4 screens, so I had 2 cards), so I swapped them for 2 new ones. A few hours later the same thing happened. How should I fix this problem?

  8. p14nd4 Says:

    If you’re not doing so already, I’d recommend giving Ubuntu 13.04 a try, primarily for the new kernel and other video drivers. Also, with your video cards, are you using the opensource or proprietary drivers? You could consider giving the other ones a try (and, for instance, I know NVidia has a few versions of their driver available in Ubuntu, so you could try each).

  9. J. Wandeto Says:

    Hei.
    I also bought the Dell OptiPlex 9010. But Canonical no longer supports the Ubuntu 11.10 that it came with.
    I upgraded the Ubuntu 11.10 to 12.04 but when it restarted it shows images on screen that do not make any sense and stops the starting process.
    Any body with way out?
    email me at: ndetos@yahoo.com

    Thanks
