Since the beginning of the year (or, perhaps more to the point, since the beginning of the hot season) the hard drive in my iMac G5 has been having the occasional clicking fit. This is usually the first sign of impending drive failure, but as it would usually stop and get back to work within a few seconds, I did nothing except resolve to back up more often than I usually do. Just FYI, it’s a 20″ iMac G5, the last series before the iSight model, with a 250GB Maxtor SATA drive.

The last few days it’s been hotter than usual (often around 33C during the day), and around the middle of last week the clicking started to happen more often; sometimes the drive took minutes to recover. This only happened when I was booted from a certain partition, and never when booted from the other one, so I tended to spend more time booted from the latter. I also installed Marcel Bresink’s excellent freeware Temperature Monitor, which told me that the iMac’s built-in hard drive temperature sensor showed 54C, while the drive’s own SMART sensor said it was at 70C. Which of course is somewhat beyond the usual rated operating temperature of 60C…

I finally got some free time to actually do something about it and proceeded to do a full backup of my home folder and of selected other folders to an external hard drive. I then tried to do an erase-and-zero-data operation on the internal drive, which (after 10 hours!) failed with an I/O error. And the drive temperature went up to 72C while the external sensor still said 54C! Something was very wrong.

Well, clearly this meant the drive was no longer reliable, and I proceeded to find a replacement. Only a few months earlier I’d phoned around to find a larger backup drive, finding out that nobody had anything larger than a 160GB IDE in stock, and that at an exorbitant price. SATA drives were “about to come in”. This time, too, the first stores I tried had no large drives available, until the nice people at TecMania pointed me at WAZ, where I promptly found a 320GB SATA drive for about US$210.00, not too bad for someone in a hurry. So on Saturday I was the proud owner of a new Western Digital WD3200KS, and gained several dozen GB of space.

The new drive’s power consumption specs were about 20% lower than the old Maxtor’s, so I was reasonably confident that it wouldn’t overheat as badly. Still, after installing it, I looked closely at the way the temperature sensor was mounted on the drive bracket. It turns out that the bracket on that side is a thin metal strip fixed to the drive with two mounting screws, and the sensor is glued on near the middle. However, even with the screws properly tightened, the metal strip arched out a little in the middle, so that there was a small air gap between the sensor location and the drive itself. That is clearly not a thermally optimal solution, and it might explain the huge 18C difference between the internal and external temperature readings.

I googled around and some people had indeed run into the same problem. A few had mounted external fans onto the air inlet and/or outlets, and some had even cut into the iMac cover to do so! This seemed a little radical to me, especially as it would drastically cut resale value. Another user recommended cutting off the sensor and re-gluing it onto the drive body itself, something which I actually considered doing, but I found the sensor cable would be too strained if I did so.

The actual solution I implemented is shown here:

I added the round-headed Phillips screw in the middle of the mounting bracket, which goes into the center (previously unused) hole on that side of the drive. I also spread a thin layer of thermal heatsink paste onto the mounting bracket, in the space between the two holes on each side of the sensor. The air gap was completely eliminated, and indeed after I fired the system up and restored my backups, the temperature gap between internal and external sensors was reduced to a much more reasonable 4C.

This means that the drive peaks at about 58C; still within the nominal operating range of 60C max, but uncomfortably close to the upper limit. By coincidence while I was doing this, I became aware of a Google paper (pdf) about disk failures. Very interesting; they investigated an awful lot of drives, and concluded that elevated temperature wasn’t necessarily a factor; then again, their operating temperatures were below 50C.
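For anyone who wants to play with the numbers, the arithmetic behind the readings quoted in this post can be sketched in a few lines of Python. This is only an illustration: the function names and the `RATED_MAX_C` constant are made up for this example, the values are typed in by hand, and nothing here reads real sensors or talks to Temperature Monitor.

```python
# Readings quoted in the post, in degrees Celsius, entered by hand.
RATED_MAX_C = 60  # nominal upper operating limit mentioned in the post


def sensor_gap(smart_c, bracket_c):
    """Difference between the drive's own SMART reading and the bracket sensor."""
    return smart_c - bracket_c


def headroom(smart_c, rated_max_c=RATED_MAX_C):
    """Degrees left before the rated limit; a negative value means over the limit."""
    return rated_max_c - smart_c


# Old Maxtor during the erase: SMART said 72C, the bracket sensor said 54C.
print(sensor_gap(72, 54))  # 18
print(headroom(72))        # -12

# New drive after the extra screw and thermal paste: peaks at about 58C.
print(headroom(58))        # 2
```

The point of the gap figure is that the iMac’s fan control only sees the bracket sensor, so the larger the gap, the later the fan reacts to a hot drive.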

Meanwhile, I’m monitoring the drive closely and thinking of alternative ways to make the sensor’s temperature track the drive’s temperature more closely (which would make the cooling fan kick in a little earlier). My first attempt, putting a piece of tape over the sensor to take it out of the fan’s airstream, didn’t make any appreciable difference.

Update: Another paper on disk failures just came out. Also very interesting.