Looks like my local DSL gateway/mail server/... is having serious hardware problems: Some time ago the machine started freezing under more-or-less heavy load. Because the machine was performing write-intensive hard disk operations whenever that happened, I first suspected the hard disk (Maxtor 40 GB) to be the source of the error. So I ended up copying all the stuff from the old hard disk to a new one, which took several attempts, because the machine also decided to freeze while mirroring. But in the end, the problem persisted, so I decided to test the RAM for errors. Memtest86 needed more than 10 hours to check the memory, but it did not find any problems either.

So I finally figured it might be a problem with the CPU getting too hot and started to build a custom kernel for Debian woody that included the lm-sensors patch in order to measure the CPU temperature. As the attentive reader might already have guessed, building the kernel also caused the machine to freeze ...

Now I am stuck and a bit unsure what to do next: I might end up unmounting the CPU and re-applying the heat-conductive paste between the CPU and the fan.

Did I already mention how much I appreciate hardware problems?

Update: There were actually two problems. First of all, there were incompatibilities between the Linux kernel and the mainboard. Fixing this required me to disable UDMA mode for all drives. This makes the hard disks slow as hell compared to before, but it seems to help a lot. The second problem was, as I had already suspected, that the CPU got too hot. I solved this by re-applying heat-conductive paste. The machine has survived all my tests so far, which is a good sign. Yay!
Written on 28 Sep 03 07:15 PM.