Wednesday, September 25, 2013

The Science Algorithm

The American Heritage Science Dictionary defines science as "the investigation of natural phenomena through observation, theoretical explanation, and experimentation, or the knowledge produced by such investigation. Science makes use of the scientific method, which includes the careful observation of natural phenomena, the formulation of a hypothesis, the conducting of one or more experiments to test the hypothesis, and the drawing of a conclusion that confirms or modifies the hypothesis." The Oxford English Dictionary likewise defines the scientific method (or "science algorithm") as a "method or procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses." 

Anything that subverts or undermines the scientific method is like a lie that, once told, multiplies its effect exponentially. Like a soot cloud from a chimney, the effect swirls outward, obscuring scientific truths, sometimes for years.

Scientific misconduct comes in many flavors, ranging from ethical lapses to fraud, negligence, deliberate bias, and premeditated dishonesty (trimming, forging, and cooking, three terms coined by computer pioneer Charles Babbage in 1830, which shows that we are not dealing with a new problem). Indeed, any such action that willfully compromises the integrity of scientific research introduces "glitches" or "bugs" into the algorithmic process of science. As science studies ever more subtle aspects of nature, it is increasingly important to make sure that our scientific conclusions are as free from bias as possible. Of course, all observations are biased by the transfer function of the measuring instruments and the unconscious bias of the observer, but that is a given.

A recent science controversy, Climategate, is part of a broadening debate about failures of the scientific peer review process (witness the WMO/IPCC's much-maligned 2007 Fourth Assessment Report, which carelessly used dubious data pertaining to Himalayan glaciers). Another important recent science controversy concerns scientific fraud (i.e., manufacturing data and altering, fudging, or biasing experimental results to favor a particular outcome) in research funded by commercial interests. Good old-fashioned bias (which can be additive or subtractive) seems to fall somewhere in the middle.

This is a debate about a defective process in science, not defective people. The factors leading to various "sciencegates" (and to an increase in scientific fraud generally, particularly in the biological sciences, but also in the Earth sciences) seem to be a result not only of the Internet, but also of the need for researchers to please the powerful government or corporate bodies that fund them. Scientific fraud may also be committed to serve a particular viewpoint (e.g., Intelligent Design). In contrast to the nice, tidy, synthetic papers that grace so many scientific journal pages, many of which claim use of the scientific method, science is actually a very messy, ill-defined process. Nevertheless, when the public spotlight falls on science and finds problems, true science suffers. The solution is to fix the problems, not turn off the spotlight!

A debate during Climategate about whether certain emails were stolen, or whether they were actually obtained under the Freedom of Information Act, is simply a smokescreen. Email works like this: everything you send in open (unencrypted) form across public networks is public. Whether you are a climate biaser or a bank robber, your emails are only a little more private than a small advertisement in the newspaper. Moreover, modern information tools and the Internet make it increasingly easy to commit fraud of any kind, not just scientific fraud. The Internet has created a more open society, and as a result we must all live to higher standards of accuracy and truth. This is good for society, and it is good for science. It is also really good for the environment: acts against the planet that went unchecked in the past are now on public view. This is our brave new world. We can like it or lump it; that is our choice. If we have lived in the past by the philosophy that some particular end justifies the means, that viewpoint is now much more likely to become fodder for public debate. I highly recommend a paper by Brian Martin, an Australian academic, that discusses the prevalence of, and sociological reasons for, scientific fraud (http://www.bmartin.cc/pubs/92prom.html).

Martin's paper is particularly interesting to me because decades ago, when I worked as a scientist, I came across a lot of this sort of thing, and (powered by the Internet) it seems only to have gotten worse since. To give an idea of the magnitude of the problem, school teachers now insist that essays and papers (written with the aid of the Internet) be submitted electronically so that they can be analyzed by an anti-plagiarism service; the teachers are fighting fire with fire.

Science also needs to go beyond the old algorithmic processes, and in particular address the negative influence of ideology-based government funding, as well as the power and influence of various politically motivated organizations that would seem to operate under the peer-review umbrella, but don't really. On this topic, I am always reminded of how US President G. W. Bush once said to his captive scientists and analysts regarding WMDs, "That's the wrong answer; go back and look again!"

Even blogs about science are themselves subject to bias; however, feel free to comment, by way of an impromptu peer review... Now, I wish there were bias-free blogger web cams in the Arctic to see how the polar bears are doing... it's very difficult to know whom to believe...

Tuesday, February 5, 2013

Gauss's FFT: DIT or DIF?


I may have an answer, and it has to do with the structure of the butterflies.

[Notation: the N-point sequence {X[n]} is the DFT of the N-point sequence {x[k]}; these live in the frequency and time domains, respectively.]

Assuming radix-2, if we want to combine two length-N/2 DFTs, {Y[n]} (from the even-indexed samples) and {Z[n]} (from the odd-indexed samples), into one length-N DFT, the DIT butterfly is:

DIT:          X[n]       = Y[n] + W^n Z[n],
              X[n+N/2]   = Y[n] - W^n Z[n]

where n = 0 ... N/2 - 1, and W = exp(-j2*Pi/N).

The DIF structure is the dual of the DIT structure. Namely, if we want to replace a DFT calculation of length N with two length-N/2 calculations, the data {x[k]}, k = 0 ... N-1, are split into {y[k]} and {z[k]} using the DIF butterfly:

DIF:          y[k] = x[k] + x[k+N/2],
              z[k] = (x[k] - x[k+N/2]) W^k

where k = 0 ... N/2 - 1, and, as in the DIT case, W = exp(-j2*Pi/N).
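To make the duality concrete, here is a quick NumPy sketch I put together (my own illustration, not taken from any of the references discussed here; the function names dit_combine and dif_split are mine). It combines two half-length DFTs with the DIT butterfly, splits a sequence with the DIF butterfly, and checks both against NumPy's FFT:

import numpy as np

def dit_combine(Y, Z):
    # DIT butterfly: merge the DFT Y of the even-indexed samples and the
    # DFT Z of the odd-indexed samples into one length-N DFT.
    M = len(Y)                                          # M = N/2
    W = np.exp(-2j * np.pi * np.arange(M) / (2 * M))    # twiddles W^n
    return np.concatenate([Y + W * Z, Y - W * Z])

def dif_split(x):
    # DIF butterfly: split a length-N sequence into two length-N/2
    # sequences whose DFTs are the even- and odd-indexed bins of X.
    M = len(x) // 2
    W = np.exp(-2j * np.pi * np.arange(M) / (2 * M))    # twiddles W^k
    return x[:M] + x[M:], (x[:M] - x[M:]) * W

x = np.random.rand(8)                                   # any even length works
X = np.fft.fft(x)
assert np.allclose(dit_combine(np.fft.fft(x[0::2]), np.fft.fft(x[1::2])), X)
y, z = dif_split(x)
assert np.allclose(np.fft.fft(y), X[0::2])              # even-indexed bins
assert np.allclose(np.fft.fft(z), X[1::2])              # odd-indexed bins

Note how DIT decimates the input (time) while DIF decimates the output (frequency), which is exactly the duality described above.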

In Gauss's mixed-radix algorithm, N = 4 x 3; with reference to the previously posted diagrams, we have radix-3 output butterflies such as

                   X[3]  = X[0,3] + X[1,3] W^3  + X[2,3] W^6
                   X[11] = X[0,3] + X[1,3] W^11 + X[2,3] W^22

Note that the second line of the butterfly above, X[11], was not used by Gauss, who did not bother to compute X[6] through X[11], as is appropriate for real data.
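As a sanity check (mine, not Gauss's), the following NumPy snippet verifies both butterfly lines numerically for a random real length-12 sequence; the array X4[r, m] stands in for X[r,m] above:

import numpy as np

N = 12
x = np.random.rand(N)                 # real data, as in Gauss's examples
W = np.exp(-2j * np.pi / N)

# Three length-4 DFTs of the decimated sequences x[0::3], x[1::3], x[2::3]
X4 = np.array([np.fft.fft(x[r::3]) for r in range(3)])

X = np.fft.fft(x)
assert np.isclose(X[3],  X4[0, 3] + X4[1, 3] * W**3  + X4[2, 3] * W**6)
assert np.isclose(X[11], X4[0, 3] + X4[1, 3] * W**11 + X4[2, 3] * W**22)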

Gauss's version of this butterfly reads:

                  X' = γ + (γ' cos x + j δ' sin x) + (γ'' cos 2x + j δ'' sin 2x)

where γ and δ are a confusing notation that Gauss uses three times over for the outputs of the three length-4 first-stage DFTs. Note that the cos nx and sin nx are the twiddle factors equivalent to W^n.

The above butterfly structure makes Gauss's N = 4 x 3 FFT algorithm a DIT, which also makes it of the Cooley-Tukey type. 

Since Gauss seems to have used the same methodology for all of his examples, the same conclusion, that his algorithms were DIT, applies to his other mixed-radix example, N = 3 x 4, and to the common-radix N = 6 x 6 example as well.


Monday, December 12, 2011

Gauss's 1805 mixed-radix FFT algorithm: Decimation in frequency or decimation in time? Which is it?


Does your favorite FFT algorithm have a decimation in time (DIT) structure, or could it be a decimation in frequency (DIF) algorithm?  This useful-sounding clue to the architecture of FFT algorithms is poorly described in many sources, including perhaps the most widely read (otherwise quite superb) reference work on FFTs, Oran Brigham's The Fast Fourier Transform (Prentice-Hall, 1974).  Other useful clues to FFT algorithm architecture are the radix (which may be mixed), and whether it is basically Cooley-Tukey, Sande-Tukey, mixed radix, split radix, prime factor, Winograd Fourier transform algorithm (WFTA), or a hybrid mixture. There are also Winograd's efficient small-N cyclic convolution algorithms.

Why am I interested in sorting this topic out? The very first FFT algorithm, invented by German math genius and human computer Carl Gauss in 1805, 160 years before Cooley and Tukey, may have been misidentified by Heideman, Johnson, and Burrus ("Gauss and the history of the FFT," IEEE ASSP Magazine, Oct. 1984, p. 18), all of whom are authorities in the field. They write: "...Gauss's algorithm is as general and powerful as the Cooley-Tukey common-factor algorithm and is, in fact, equivalent to a decimation-in-frequency algorithm adapted to a real data sequence." Depending on how I draw the flow graph, it seems to me to be DIT, or a mixture of DIT and DIF (if this latter case is even possible). If I've overlooked some important fact, please let me know.

Before I show the evidence for why I think Gauss's algorithm is DIT, here are some more details of what Gauss achieved so long ago, some 140 years B.C. (before computers). In Gauss's published works,* which fortunately for us contain a transcription of all of his notebooks (written in an obscure dialect called neo-Latin!), after some 50 pages of FFT theory, Gauss spends ten pages giving three numerical examples: a length-12 algorithm done two ways, as a mixed radix 4 x 3 and then, to check his results, factored the other way around, as 3 x 4; and lastly a length-36 example as radix-6, with factors 6 x 6. Gauss noted that his method "greatly reduces the tediousness of mechanical calculations; success will teach the one who tries it." By the way, Gauss stated that if the factors are themselves composite, his splitting algorithm could be applied recursively. As far as we know, this is something he never did, but the fact that he saw the advantage of applying his FFT method recursively shows he was very much ahead of his time! Gauss also did not discuss the (log2 N)/N savings to be had if the transform length is a power of 2.

Brigham (p. 178) says that the term DIT arises from the fact that alternate derivations of the algorithm are structured to appeal to the concept of sample-rate reduction, or throwing away samples. Likewise, Brigham says that the term DIF arises for reasons analogous to those for DIT. Crystal clear? Not to me.

The Cooley-Tukey radix-2 algorithm splits the input into interleaved even- and odd-indexed sets, each with half the sampling rate. This is clearly DIT!

The Sande-Tukey radix-2 algorithm computes the DFT terms in two interleaved sets, even and odd, each with effectively half of the frequency resolution. This is clearly DIF!

The Gauss mixed radix N = r1 x r2 = 4 x 3, mixed radix 3 x 4, and radix-6 FFT algorithms follow the same general DIT floorplan, as follows (for the 4 x 3 example):

1) The length-12 input sequence is split into three interleaved sequences (i.e., the samples are "decimated"), numbered 0, 3, 6, 9;  1, 4, 7, 10;  2, 5, 8, 11.

2) The DFT of each of these three sequences is computed using a length-4 algorithm (no details are given, and it is no big deal whether an efficient algorithm is used or not).

3) What follows next can be viewed in two different ways. (Gauss's calculation stopped at X6, as is appropriate for real input data.)

Either (3a) Corresponding output triples are multiplied by twiddle factors and added together.
E.g., X1 = X0,1 + W^1 X1,1 + W^2 X2,1, where the first subscript (0, 1, 2) of X refers to an output from one of the three length-4 DFTs, and W is the twiddle factor exp(-j2*Pi/12). The outputs fall naturally into a 1D array, in natural order.

Or (3b) The corresponding output triples from the length-4 DFTs are fed to four length-3 DFTs, which compute the following 2D array of X values: 0, 4, 8;  1, 5, 9;  2, 6, 10;  3, 7, 11. These outputs are in scrambled order; however, this is not enough to reclassify Gauss's algorithm as a DIF. Step 3a does exactly the same thing as 3b, but 3b cannot be made to come out in natural order without a 2D-to-1D mapping (see the sketch below).
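Here is a minimal NumPy sketch of the whole N = 12 floorplan (my own reconstruction for illustration, not Gauss's arithmetic), implementing both views 3a and 3b and checking them against NumPy's FFT:

import numpy as np

N, R1, R2 = 12, 4, 3                  # N = R1 x R2 = 4 x 3
x = np.random.rand(N)
W = np.exp(-2j * np.pi / N)

# Steps 1-2: three interleaved length-4 sequences and their DFTs
X4 = np.array([np.fft.fft(x[r::R2]) for r in range(R2)])    # shape (3, 4)

# View 3a: twiddle and sum the corresponding output triples directly;
# the outputs land in a 1D array in natural order n = 0 .. 11
X_a = np.array([sum(W**(r * n) * X4[r, n % R1] for r in range(R2))
                for n in range(N)])

# View 3b: twiddle each output triple, then feed it to a length-3 DFT;
# row m of the 2D result holds X[m], X[m+4], X[m+8] (scrambled order)
X_b = np.array([np.fft.fft(W**(np.arange(R2) * m) * X4[:, m])
                for m in range(R1)])                        # shape (4, 3)

assert np.allclose(X_a, np.fft.fft(x))
assert np.allclose(X_b.T.reshape(-1), np.fft.fft(x))        # 2D-to-1D map

Both views compute exactly the same numbers; only 3b needs the final 2D-to-1D index mapping to restore natural order, which is precisely the point made in step 3b.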

Please refer to the relevant flow graphs, attached below.  Comments?



______________
*Download free from Google Books: Gauss, Carl Friedrich [1866], “Nachlass: Theoria interpolationis methodo nova tractata,” pp. 265–327, in Carl Friedrich Gauss, Werke, Band 3, Königliche Gesellschaft der Wissenschaften, Göttingen.

Welcome to the Lair of the Algorithmage!

The name of this blog is a play on three words: algorithm, age, mage.

al·go·rithm 

noun \ˈal-gə-ˌri-thəm\ : a step-by-step procedure for solving a problem or accomplishing some end, especially by a computer

age

noun \ˈāj\ : a period in history or human progress

mage  

noun \ˈmāj\ : magician

Mage is old Middle English for magician (one skilled in magic), and is an old name for the most advanced technologists of that era. The Hacker's Dictionary: A Guide to the World of Computer Wizards (G. L. Steele Jr. et al., Harper & Row, 1983) defines magic as “something as yet unexplained, or too complicated to explain.” In a free interpretation, magic therefore refers to something algorithmic that is unexplained (unpublished), or that is too complicated (or too advanced) to explain to anyone who is not initiated into the relevant wizard/magician/technologist circle.

My favorite author, Arthur C. Clarke, once said that any sufficiently advanced technology is indistinguishable from magic. Therefore, since algorithms are a major ingredient of most modern technologies, Clarke would no doubt have agreed that sufficiently advanced algorithms are themselves tantamount to magic. The obvious corollary is that since technologists create algorithms, any sufficiently advanced technologist must be a magician, or at least be indistinguishable from one, as far as a general audience is concerned.