Subject: Re: Algo question
From: david@djwhome.demon.co.uk
Date: 25/09/2005, 15:15
Newsgroups: alt.sci.seti

in article <1127600441.999886.123200@g49g2000cwa.googlegroups.com>,
ccristian@gmail.com wrote:

 Can anyone point me to where I can find detailed informations about
seti algorithm?

I assume you meant the SETI@Home client algorithms, not the generic Search
for Extra Terrestial Intellgence one, or even the total S@H project one.
(The general SETI process can monitor any physical parameter for
variations from random that are unlikely to be produced by natural
mechanisms, and where those variations are originating from somewhere
not under human control.)

The source code for the BOINC version is published; that is your best place
for *details* of the algorithm.  This is essentially the same as S@H classic
but may have some improvements.  Some of the server algorithms, for BOINC,
will be there, but some parts of the process are manual and some will be
adapted as time progresses.

There are also papers on the setiathome web site that cover the various
detection strategies at a high level.

At a high level the client algorithm is (excluding data communication
related aspects):

Convert to frequency domain

High pass filter *frequency* domain spectrum  (I think I have this right -
note this is not high pass filtering the time domain signal by manipulating
it in the frequency domain)

Convert back to time domain.

For each chirp rate that produces sufficiently different  results
  Multiply complex time domain signal by complex time domain chirp

  For each power of 2 data sample length from 128K down to where the result
    is not significantly different from the last for that length

      Compute set of, non-overlapping in time, power spectrum estimates

      For each run of data and each frequency within it 
         If estimated power >  22 * work unit mean power for such a combination
             Report a spike
         End if
      Next frequency / run combination

      If sufficient number of runs across beam transit time
         For each possible complete beam transit
            For each frequency
               Try to match a gaussian to the power spectrum
               If there is a sufficiently good combination of 
                         low mean square error and high peak
 
                  (Sufficiently here means that there is a fairly
                   even spread of false positives and a whole work
                   unit produces very roughly one false positive.)

                  Report a gaussian
               End if
            Next frequency
      End if

      If sufficient number of runs across transit time
         For each frequency
            If there are three sufficiently large samples for runs which
                   are equally spaced in time

               (It's possible that they are also checked for being within
                as single beam transit width.)

               Report a triplet
            End if
         Next frequency
      End if

      
      If sufficient number of runs across transit time
         For a sufficient number of intervals
             If there are a sufficient number of that interval in the
                  beam transit time
                 For each frequency
                     For each distinct set of runs with a spacing of that
                          interval  (I think the actual algorithm interpolates
                          and doesn't require runs actually exist at each step)
                         Average the power
                         If the average is sufficiently large for one 
                            starting point (relative to some mean)
                            Report a pulse

                            (The actual algorithm is more efficient than 
                             this, and it is possible that the possible
                             transit boundaries are taken into account)
                         End if
                     Next set of runs
                Next frequency
            End if
         Next inteval length
      End if
   Next frequency resolution
Next chirp correction

This is largely from memory.  The actual order of operations may differ
and some algorithms are computationally more efficient than as described.
You will need to see the source code for the exact definitions of
"sufficiently", but the basic principle is to generate about one false
positive per type per work unit and to spread the false positives, in
some way, evenly across the parameter space.