Subject: Re: SETI@home (Classic) Phenomenology
From: Randall Schulz
Date: 05/08/2004, 19:00
Newsgroups: alt.sci.seti

On Thu, 05 Aug 2004 13:49:07 +0000, Martin 53N 1W wrote:

Randall Schulz wrote:
...


It sounds like you have hardware problems causing the CPU/software to
fail intermittently.

I doubt that, really. As I said, this symptom has occurred on two
different systems. The stall always happens at the same point in a work
unit. This is too patterned to be sporadic hardware failure.

So far, my best guess is that what I'm seeing is some kind of interaction
between the SETI@home client and Ksetiwatch.


Dirt, knackered/noisy fans, or other overheating is the first guess. Bad
power feed comes next.

Again, as I said, this is a brand new system (new motherboard and new CPU).
There is no dust in the CPU heat sink. The BIOS shows nominal operating
temperatures on all three sensors.


If you've ever disturbed the CPU heatsink, it is very worthwhile
cleaning and reseating it with new thermally conductive grease.

It's a new system. The stock, boxed-CPU cooler assembly was installed
three days ago and has not been removed since.

(I found it interesting that the heat-sink compound they use is apparently
designed to be solid at room temperature but to melt at CPU operating
temperatures and hence to conform fully to the surfaces (CPU and heat
sink) between which it's situated regardless of its form and distribution
when the heat sink is installed.)


Another problem point can be the northbridge heatsink/fan. If there is a
small fan on there, refit a large passive heatsink. Better and more
reliable and a lot less noisy.

This board is a current Intel design (D865PERL) and uses a passive heat
sink, in the form of a 10x10 grid of thin, 3cm-long fins perpendicular to
the MCH chip surface.

The system is well ventilated and the ambient room temperature is cool.


Check out your system with:
Memtest86;
Your HDD manufacturer's disk diagnostics tests;
GIMPS torture test.

Memtest86 and prime95 (torture mode) reveal no problems, so far. I'll
allow prime95 to run its torture test for the rest of the day to get a
better idea of the system's reliability.


See what you find.

Computers should be _completely_ _reliable_, and consistent and
repeatable for their results.

Yes. Yes they should. In principle, they implement the essence of pure
mechanism (a.k.a. algorithm). But in practice they are not. There are just too many
moving parts (electrons and holes, that is...)


Let us know what you find.

Good luck,
Martin

The hardware is healthy (as modern, mass-produced, desktop computing
hardware goes).


Randall Schulz