Thursday, April 22, 2010

A Very Brief History of Probability

by Alexander O.Z.

The importance of mathematics for measurement and predictability is not hard to understand. After all, science’s methods for dealing with chance probabilistically legitimized it and teased it away from natural philosophy (Horton, 2005). Experiments gave science ways to isolate causal relationships and to explain its proofs through mathematics and statistics. This Golden Bough of science, however, deals with probabilities and forecast likelihoods that Laplace says are relative (1795, p. 7) and that Paracelsus says are simply opinions approved by the wise (Hacking, 1975, p. 41).

Statistics ultimately came to science through the Newtonian concept that observations should be done mathematically and are combinations of reality and error (Stigler, 1999). However, not until a century later with astronomers like Gauss (1777-1855) and Laplace (1749-1827) was probability separated from gaming and applied to scientific observation (Boring, 1961; Stigler, 1999). Other sciences followed. For instance, tradition says that psychology began using statistics with Fechner’s Psychophysics (1860) and that by Ebbinghaus’ memory experiments published in 1885 psychology was firmly fixed with statistics (Stigler, 1999).

However, long before Newton, and as far back as at least ancient Greece, people cast dice for oracles and consulted some sort of probability table to derive the future or the will of the gods (Stigler, 1999). For instance, Alexander visited the famed Oracle at Delphi, devoted to the god Apollo, before campaigning against Persia (Plutarch, 75). A popular Christian story likewise tells of Roman soldiers casting lots for Jesus’ garment (Luke 23:34). The Latin writer Cicero (106-43 BC) writes about dice and the omens linked with certain combinations, for instance the Venus-throw. His suggestion that the rarity of a combination led people superstitiously to look for extra meaning could be analogous to modern p-values.

“Nothing is so uncertain as a cast of dice and yet there is no one who plays often who does not sometimes make a Venus-throw and occasionally twice or thrice in succession. Then are we, like fools, to prefer to say that it happened by the direction of Venus rather than by chance?” (Cicero, 44, paragraph 59)

However, after being separated from occultism and gambling, probability has become a powerful mathematical tool. And although traces of the history of secular probability can be seen in Archimedes’ proof of the quadrature of the parabola (Pearson, 1978), it was not until almost 2,000 years later, with the beginnings of modern astronomy and the problem of calculating the movements of the planets, that the real foundations for a theory of probability could be laid.

The Dark Ages lifted during the middle of the 16th century and mathematical ideas began to percolate through Europe. The Crusades brought the West ideas from the East, like the Hindu concept of number and some of its numerical symbols. Michael Stifel’s Arithmetica Integra (1544), Christoff Rudolff’s Die Coss (1525), and Robert Recorde’s Grounde of Artes (1540) are some of the first western publications to use mathematical symbols like +, -, and = (David, 1998).

However, the beginning of probability is a bit obscure, partly because of its connection to gambling and the characters associated with it, and partly because of political oppression. Also, as with many discoveries, the discoverer is not always the one whose name the invention bears (see Stigler’s law of eponymy, 1999). For instance, Laplace used Fourier transforms before Fourier, and Lagrange had Laplace transforms before Laplace (Stigler, 1999).

Although its actual beginnings are unknown, probability was probably transferred to the public through the mathematician Cardano (1501-1576). Even the origin of his ideas cannot be traced (David, 1998). He was, however, an acquaintance of Leonardo da Vinci, so probability-type ideas may have been known in the esoteric circles of Europe (Tabak, 2004).

Pythagoras, like Paracelsus, is esteemed for his legendary knowledge of many branches of natural science. He is said to have discovered the musical ratios of the pentatonic scale and the famous Pythagorean Theorem. He was also involved with metaphysics and various forms of occultism. Condorcet (1743-1794) believed that Pythagoras began the mathematization of nature that Descartes furthered and that Newton ultimately perfected (Baker, 1975). Baker’s Condorcet: From natural philosophy to mathematics (1975) cites Egret’s La Prerevolution as saying that the function of the European academies should be to galvanize Europe’s scientific research. In other words, academies were to liberate the commoners from beliefs not built upon rationality; such people relied upon the enlightened expertise of the learned societies to dictate what they must think (p. 48). This model of the Academy and its purpose as disseminator of knowledge did not begin with the French Revolution and the post-Enlightenment; it was also a fundamental of Plato’s Republic. Thus, men like Bacon with his New Atlantis were heavily influenced by classical philosophers, and during this age Europe’s academic minds wanted to be able to communicate and disperse their discoveries to initiate a new sort of Golden Age (Baker, 1975), a utopian lost age that Plato symbolized as the lost Atlantis in his Timaeus and that Bacon revived in his New Atlantis.

Part of ushering in this new scientific age and rational world order (e.g., that of Leibniz) was the concept of a universal scientific language. In his Novum Organum, Bacon expressed the idea of Tables of Discovery to classify all natural phenomena for quantitative measurement, so that their underlying laws could be established. Such was Newton’s concept, first laid out in his Principia, which put physics on solid scientific ground (Baker, 1975). It is in following that tradition that Condorcet hoped to bring to the moral and political sciences the rationality that Newton brought to the physical sciences (Baker, 1975).

Condorcet’s desire to bring people a rational world-view and the controversy surrounding his death in prison in 1794 probably led Pearson (1978) to make the obscure statements that he was the enemy of the Jacobins and had been driven to death, as were all men of science and especially those of probability (p. 651).

Nevertheless, regarding probability, what is certain is that Cardano (1501-1576) published Opus novum de proportionibus, in which he put forth the binomial coefficients and a binomial theorem, and also Liber de ludo aleae, published posthumously in 1663, which is said to be the first systematic conception of probabilities (Hacking, 1975; Stigler, 1999; Pearson, 1978). His error, however, was to conceive probability in such a way that in three throws of a die there was a fifty percent chance of getting any given number, instead of the correct forty-two percent (Tabak, 2004). Ultimately, Cardano was a superstitious man who apparently liked to gamble and could not shake the notion of luck. He believed that the will of the gambler played a part in the outcome even when casting a fair die. In fact, De Moivre inserted an argument against Cardano’s invocation of will in his Doctrine of Chances (Tabak, 2004).
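
The gap between fifty and forty-two percent can be checked with a little arithmetic. The sketch below assumes a fair six-sided die and reads the claim as the chance of seeing one chosen face at least once in three throws; the function name is only illustrative. Adding one sixth three times gives one half, which would produce the fifty percent figure, while the complement rule gives 1 - (5/6)^3.

```python
from fractions import Fraction
from itertools import product


def chance_of_face_in_throws(throws: int, sides: int = 6) -> Fraction:
    """Probability that a chosen face shows at least once in `throws` fair throws."""
    # Complement rule: 1 minus the chance the face is missed on every throw.
    return 1 - Fraction(sides - 1, sides) ** throws


# Cross-check by brute force: enumerate all 6**3 = 216 equally likely outcomes
# and count those in which the chosen face (here, 1) appears at least once.
outcomes = list(product(range(1, 7), repeat=3))
hits = sum(1 for roll in outcomes if 1 in roll)

exact = chance_of_face_in_throws(3)
print(exact, round(float(exact), 4))  # 91/216, about 0.42
print(hits, len(outcomes))            # 91 of 216 outcomes
```

Both routes agree on 91 out of 216, or roughly forty-two percent.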

Galileo catalogued all the possible combinations of three dice in a fragment written between 1613 and 1623 titled Sopra le Scoperte dei Dadi (David, 1998), although it’s said he cared little for the subject (Tabak, 2004). The well-known scientist Pascal (1623-1662) wrote a work on the same subject, Traité du triangle arithmétique, in 1654; it was published posthumously in 1665. Pascal’s book on probability was the field’s first full-fledged encounter with the binomial distribution (Stigler, 1999).
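
A catalogue of three-dice outcomes like Galileo’s can be reproduced by brute force. This is a minimal sketch, assuming fair dice and treating the dice as distinguishable, so that there are 6 x 6 x 6 = 216 equally likely ordered outcomes; it is not a reconstruction of Galileo’s own table.

```python
from collections import Counter
from itertools import product


def three_dice_sum_counts() -> Counter:
    """Count how many of the 216 ordered outcomes of three dice give each total."""
    return Counter(sum(roll) for roll in product(range(1, 7), repeat=3))


counts = three_dice_sum_counts()
for total in sorted(counts):
    # total, number of ways, and probability under the fair-dice assumption
    print(total, counts[total], f"{counts[total] / 216:.4f}")
```

Counting ordered outcomes matters: a total of 10 arises in 27 ways while a total of 9 arises in only 25, a difference that disappears if the dice are treated as indistinguishable.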

Tabak (2004) says that probability began with a dialogue through letters between Fermat (1601-1665) and Pascal (1623-1662) in which they discussed possible solutions to a popular gambling puzzle. Although they did not sort out a full theory of probability, nor speak of it as such, they did bring the concepts further than Galileo and Cardano. What’s important is that they found that although they could not predict an individual cast of the dice, they could predict the relative frequency with which certain casts would occur over many throws.

However, one of the first works devoted exclusively to probability was a small fifteen-page booklet by the Dutch mathematician Christiaan Huygens (1629-1695), De Ratiociniis in Ludo Aleae, published in Leiden in 1657 (Laplace, 1795). De Ratiociniis remained a principal treatise on the theory of probability for fifty years. Huygens’ influence did not vanish after his death; Jacob Bernoulli (1654-1705) admits to having absorbed Huygens’ work into his own Ars Conjectandi (1713). Ars Conjectandi was incomplete at Jacob’s death and was published posthumously by his nephew Nicholas Bernoulli (Laplace, 1795; Pearson, 1978). In this work, Bernoulli put forth his law of large numbers and his treatment of independent events (Stigler, 1999; Tabak, 2004).
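
The idea that single throws are unpredictable while long-run relative frequencies are stable can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: a fair six-sided die simulated by a pseudo-random generator, with a fixed seed so that runs are repeatable. The function name and the sample sizes are hypothetical choices.

```python
import random


def relative_frequency(face: int, throws: int, seed: int = 0) -> float:
    """Share of `throws` simulated fair-die throws that land on `face`."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    hits = sum(1 for _ in range(throws) if rng.randint(1, 6) == face)
    return hits / throws


# The relative frequency of a six should settle near 1/6 as throws increase.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(6, n), "target:", round(1 / 6, 4))
```

With few throws the observed share can be far from one sixth; with many throws it tends to sit close to it, which is the behaviour the law of large numbers describes.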

It is, however, Pitcairne’s 1693 dissertation that, as Stigler (1999) suggests, shows the first explicit use of probability (p. 216). In it, Pitcairne argued that strainers and sieves could not account for bodily secretion as previously thought. He laid out this argument using Huygens’ probability, with grains and sieves to demonstrate that such a mechanism would be ineffective. Pitcairne then attempted to apply the sieve mechanism to bodily secretion using a more complicated concept of sieve, only to find that the mechanism still fell short. He realized that he could not provide a solution for this phenomenon. He said in this same lecture that the nature of all bodies is the same (Stigler, 1999, p. 217). Ultimately, he argued against the known mechanism of secretion using a probabilistic argument.

However, just as now, not everyone wanted to derive knowledge of humankind through mathematics. Edward Eizat’s Apollo Mathematicus argued very much against extracting knowledge of man mathematically, and called Pitcairne’s approach gibberish. He claimed that sieves based on mass and shape do not account for the diversity in nature. Eizat believed that math would turn the world upside-down and overturn the ancient landmarks that our fathers have set (Stigler, 1999, p. 230). He believed that medicine should not be overtaken by Pitcairne’s speculative science of mathematics, because medicine was rational and empirical (Stigler, 1999). The confrontation between these two men laid the groundwork for modern medical science, and launched the still relevant criticism of experimental science, which is validity (Stigler, 1999).

During the 17th and 18th centuries, a symbiosis began between mathematics and the other sciences. Although there were a number of important mathematical contributions in the 18th century, Stigler (1986) says that the single most important contribution of the 18th century to statistics was Least Squares. Three scientific problems are related to the story of Least Squares during the 18th century: the shape of the earth, the motion of the moon, and the perturbations of Jupiter and Saturn. However, one of these problems, the motion of the moon, was of greater importance at the time because of its commercial and naval use. Mastery of the motion of the moon held incredible importance for sea navigation because it meant navigational freedom. In 1714, England offered a monetary reward to the discoverer of the highly prized longitude at sea (Stigler, 1986).

Mathematician Tobias Mayer (1723-1762), who collected lunar observations between April 1748 and March 1749, was the first to understand the importance of getting at the true value by averaging out the errors of multiple observations, although Stigler (1986) says he was a bit optimistic about his formulas, claiming proof of an inverse relationship between the number of observations and the error of the determined results.

However, once it moved from games to observations, and then from observations to the error in the observations (e.g., Simpson), probability really started to become useful. In other words, once the random distribution of errors could be gathered from astronomical observations, the concept of inverse probability could be brought out (Stigler, 1986). An important step in the evolution of statistics was the inverse probability of Bernoulli and De Moivre (David, 1998).

During his lifetime, De Moivre (1667-1754) became close friends with Newton (Pearson, 1978). Pearson (1978) offers an image of De Moivre sitting at a dirty coffee-house table beside a degenerate gambler while Newton makes his way through the cafe to De Moivre’s table for a discussion. De Moivre’s first paper, presented to the Royal Society in 1695, was an exposition of Newton’s theory of fluxions, which became modern differential calculus (Pearson, 1978). De Moivre’s Doctrine of Chances, first published in 1718 and issued in a posthumous third edition in 1756, credits Huygens in its introduction as an inspiration (Tabak, 2004). Although the treatise contains more insight than actual mathematical theory and formula, it does supply the concept of the bell curve. De Moivre used the bell curve a bit differently than it is used today, not conceptualizing the continuous distribution that it is modeled around today (Pearson, 1978; Tabak, 2004). Also, De Moivre is said by Pearson (1978) to have developed the concept of standard deviation, which De Moivre called the measure of dispersion.

In addition, another important statistical concept, the density function, was introduced in the 18th century by Thomas Simpson (1710-1761), and in 1757 he published a letter considered the first publication of a continuous error distribution (Stigler, 1986). Pearson (1978) suggests that Simpson’s The Nature and Laws of Chance (1740) is little more than a summary of the second edition of De Moivre’s Doctrine of Chances (1738), and even that Simpson uses some of De Moivre’s examples without citing them. This was apparently a recurring issue with Simpson, as Pearson (1978) attacks him again for having stolen De Moivre’s results and published them in his own 1742 treatise on annuities.

Stigler (1986) suggests that another main statistical product of the 18th century was the inversion of the probability analyses of Jacob Bernoulli and De Moivre. Stigler (1986) says that it was Simpson and Bayes who teased the concepts away from games and dice and focused on astronomical observations and the errors of observations, which in turn inspired Bernoulli. Bayes’ An Essay towards Solving a Problem in the Doctrine of Chances was read by Richard Price to the Royal Society in 1763. There’s an anecdotal note that Bayes may have been working on a mathematical model to trace back to the First Cause. However, partly because of a lack of appeal among their contemporaries and partly because of Bayes’ lack of publishing, it was not until Laplace that this concept was brought into broader use and application (Stigler, 1986).

Laplace (1749-1827) was a French astronomer and mathematician probably best known for his treatise Mécanique Céleste (1802). He is also widely credited with formulating the theories that later became associated with Bayes. Pearson (1978) says that Laplace drew heavily from De Moivre, Bayes, and Lagrange and does not give these three men enough acknowledgement. Laplace also speculated on the origin of the solar system and the existence of black holes. He wrote two papers that would gain him a reputation as a mathematician before he was elected to the Académie in 1773 (Stigler, 1986). In 1795 he gave lectures at the École Normale under the heading Theory of analytic probability, but didn’t publish them until 1812 (Pearson, 1978). Pearson suggests that Laplace was probably not involved in mathematics during this time because of the political climate, the same politics that may have stifled Condorcet and may have led to his death. People, Pearson (1978) suggests, were more concerned with their astrological readings than with the math behind astronomy. It was much safer to deal with the zodiac than with probabilities. For the same reasons, Laplace probably made no mention of Condorcet in his treatise of 1795 (Pearson, 1978).

However, regarding statistics, Laplace suggests that data collection and data treatment lead to varying interpretations of data. Thus, individual inclinations and history become relevant (Laplace, 1795; Pearson, 1978). Laplace (1795) also said that numerous comparisons were the only way to get to the truth, and that probability was done through a categorization process based upon sameness. He glorified Francis Bacon in his Essay (1795) for embracing the experimental method. Laplace is known to have said to Napoleon when asked of God that he had no need of that hypothesis.

By far the most famous family in the history of math and statistics is the Bernoulli family. Many generations of Bernoullis enter into the history of science and mathematics (Tabak, 2004). Another prominent mathematician from the famous Bernoulli line was Jacob (James) Bernoulli’s nephew Daniel (1700-1782). Daniel Bernoulli brought the statistics of the 18th century a bit further with his theory that the arithmetical mean of a set of observations cannot be accurate because it does not weight each data point, and the points should be weighted because they are not all equally valid observations. Bernoulli saw that astronomers remedied this by eliminating the extreme observations before calculating the mean. Bernoulli suggested that errors would accumulate equally above and below the true point, and that errors closer to the true point would be more likely while those further away would be less likely. This, he surmised, would generate a distribution of error likened to a semicircle. His claim was that by this method a truer mean could be obtained than by simply averaging all the original observations (Stigler, 1999). Bernoulli sent his paper to St. Petersburg for publication in 1778. Euler inserted a commentary before the paper was published. Stigler (1999) suggests that this may have made Bernoulli quite happy, as Euler was perhaps the most prolific mathematician of all time (p. 305).

Legendre (1752-1833) was a mathematician who wrote on various subjects, including geometry and gravitational theory, and was among those who measured the meridian arc in 1795. In 1805, Legendre published a work that introduced the concept of summing the squared error terms (Stigler, 1986). Legendre suggested that doing this struck a balance that kept the extremes from biasing the result (Stigler, 1986). Others have also laid claim to this technique: Adrain published it in late 1808 or early 1809, and Gauss in 1809. Whoever had it first, Legendre published it first. Nevertheless, that did not stop Gauss from claiming to have used the method since 1795 (Stigler, 1999).
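
The idea of choosing the answer that makes the sum of squared errors as small as possible can be sketched for the simplest modern case, fitting a straight line to paired observations. This is the textbook closed-form solution rather than Legendre’s own presentation, and the observations are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between observed ys and the line's predictions."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up observations, roughly y = 2x
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept, sum_squared_errors(xs, ys, slope, intercept))
```

Any other line through the same points, for example one with a slightly different slope, gives a larger sum of squared errors than the fitted one.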

Most notable in 19th-century statistics is probably Francis Galton’s (1822-1911) invention of regression in 1869 while researching human heredity. Stigler (1986) says that though the mathematical foundation (i.e. Least Squares) was in place in the earlier part of the 19th century, Galton had to develop the concept of bivariate distributions, which allowed for the coefficients that ultimately supported his theories of heredity. Galton did experiments with sweet-pea seeds around 1875, in which he observed that the average size of the offspring of larger seeds was less than that of their parents, and the average size of the offspring of smaller seeds was greater than that of their parents. Galton conceived that the mechanism was that offspring tended to revert towards the mean. Reversion was later changed to regression, leading to the term regression towards the mean (Cohen, 2001).
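
The sweet-pea pattern can be imitated with a small simulation. The sketch below does not use Galton’s data; it assumes standardized seed sizes (mean 0, standard deviation 1) and a hypothetical parent-offspring correlation of 0.5, with a fixed seed so that runs are repeatable.

```python
import random
import statistics


def simulate_parent_offspring(n: int, r: float = 0.5, seed: int = 1):
    """Draw n (parent, offspring) pairs of standardized sizes with correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        parent = rng.gauss(0.0, 1.0)
        # Offspring inherits a fraction r of the parent's deviation from the
        # mean; the remainder is independent noise scaled to keep variance 1.
        offspring = r * parent + (1 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((parent, offspring))
    return pairs


pairs = simulate_parent_offspring(20_000)
large = [(p, o) for p, o in pairs if p > 1.0]    # unusually large parents
small = [(p, o) for p, o in pairs if p < -1.0]   # unusually small parents

large_parent_mean = statistics.mean(p for p, _ in large)
large_offspring_mean = statistics.mean(o for _, o in large)
small_parent_mean = statistics.mean(p for p, _ in small)
small_offspring_mean = statistics.mean(o for _, o in small)

print(large_parent_mean, large_offspring_mean)
print(small_parent_mean, small_offspring_mean)
```

Under these assumptions the offspring of large parents are still above average, but less so than their parents, and the offspring of small parents are below average, but less so than theirs.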

The International Health Exhibition in 1884-85 celebrated Galton’s new concept, and Galton established a laboratory devoted to measuring human statistics (i.e. biometrics). He collected data like height, weight, and strength from a large sample of 9,337 people (Boring, 1950). After the exhibition, the laboratory continued and became the Biometric Laboratory at University College in London, chaired by Galton’s disciple Karl Pearson, who had access to Galton’s heredity data (Boring, 1950; Wachsmuth, Wilkinson, & Dallal, 2003).

Pearson had hoped to make Galton a British Wundt, but the English-speaking psychologists he was attempting to influence had chosen a German instead of a British lineage. And, except for correlation, much to Pearson’s disappointment, Galton’s research was generally ignored (Boring, 1950). Boring (1950) says that whereas Wundt wanted to evolve psychology, Galton wanted to evolve humankind. However, Boring (1950) also suggests that Galton’s work may have been fully assimilated (p. 482) by modern psychology and that he is not recognized as a psychologist because he was preoccupied with his cousin Charles Darwin’s theory of evolution and didn’t generate much at that time that seemed directly related to psychology. What psychology he was involved with was called individual psychology, and in this research he extensively documented large samples of human traits (i.e. differences) (Boring, 1950). In 1883 Galton published his research as Inquiries into Human Faculty and Its Development. In this publication Galton argued that his research demonstrated that no differences existed between the world’s various religious populations, and that in fact the only scientifically demonstrable differences between any two human groups were between men and women, where men were inferior in almost every way (Boring, 1950).

Boring (1950) says that Galton hoped to displace religious dogmas with his new scientific Creed (p. 482), by which men would seek through evolutionary means to become Supermen (cf. Nietzsche’s Zarathustra). Boring (1950) says that Galton believed that psychology was not measuring human differences that were the product of humankind’s conquest of its natural plight, but rather a set of defects and limitations that were ancestors of better generations (p. 483). Galton’s view of hereditary defects took on the connotation of religious sin (p. 483). From this doctrine came the eugenics programs of the 20th century (Boring, 1950). In 1889, Galton published Natural Inheritance, which summarized his work on correlation and regression.

Galton devised the mental test to measure and document human differences (Boring, 1950), which was the forerunner of Binet’s mental tests and Stern’s introduction of the mental quotient (1911), which in turn became the I.Q. with Terman’s Stanford Revision of the Binet Scale in 1916 (Boring, 1950). I.Q. testing and mental testing became different concepts as testing methods developed, and during WWI the United States Army implemented mental and intelligence testing to select soldiers and determine their duties (Boring, 1950).

Although Galton and Pearson are inextricably bound with the concept of correlation, in Pearson’s own 1920 memoir Notes on the History of Correlation he states that Auguste Bravais (1811-1863) is the Father of Correlation. Pearson later said there was nothing in the work of Bravais that could not be found in Gauss (Piovani, 2008). However, it was Pearson’s 1896 paper that typified the correlational methods used today (Cohen, 2001).

In 1904 Spearman developed the rank-order correlation when studying human intelligence (Piovani, 2008). Pearson viciously rejected Spearman’s idea of rank-order correlations on the basis that measurable things were true and continuous and therefore could not be ranked (Piovani, 2008). Pearson is also credited with inventing the chi-squared test for goodness of fit and the concept of degrees of freedom (Stigler, 1999).
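
Whatever the merits of that quarrel, the two coefficients are closely related in modern practice: one common way to compute a rank-order correlation is to apply the product-moment formula to the ranks of the data. The sketch below assumes there are no tied values, and its function names and data are illustrative only.

```python
def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Rank-order correlation: the product-moment formula applied to ranks."""
    return pearson_r(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]  # made-up data: increasing, but not along a line
print(pearson_r(xs, ys), spearman_rho(xs, ys))
```

For data that rise steadily but not in a straight line, the rank-order coefficient is exactly 1 while the product-moment coefficient falls short of 1, which shows what each one measures.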

One of the early 20th century statistical inventions was William Gosset’s t-distribution, published as Student’s distribution in 1908. At the time of his discovery, Gosset was an employee of the Guinness Brewery, which would not let its quality-control secrets out to the public. To get around his employer’s admonition, Gosset published the t-distribution under the pseudonym Student, which led to the common name Student’s t-distribution (Cohen, 2001). The t-distribution is used for inference when the actual population standard deviation is not known and must be estimated. It is used to test sample means and the likelihood that they are from the same population (Cohen, 2001).
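
How the estimated standard deviation enters the calculation can be shown with the simplest case, a one-sample t statistic. This is a minimal sketch with made-up measurements and a hypothetical reference mean; it computes only the statistic and its degrees of freedom, which would then be compared against a t-distribution table.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    # The population standard deviation is unknown, so it is estimated
    # from the sample (statistics.stdev divides by n - 1).
    sample_sd = statistics.stdev(sample)
    standard_error = sample_sd / math.sqrt(n)
    t = (statistics.mean(sample) - hypothesized_mean) / standard_error
    return t, n - 1


sample = [5.1, 4.9, 5.3, 5.0, 5.2]  # made-up measurements
t, df = one_sample_t(sample, 5.0)   # 5.0 is a hypothetical reference value
print(t, df)
```

The smaller the sample, the fewer the degrees of freedom and the heavier the tails of the t-distribution, so a given t value counts as weaker evidence than it would with a known standard deviation.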

The earliest statisticians, like those at Delphi, cast dice in fumy caves to predict the future or the meaning of certain events. They believed that improbable numerical combinations were communications from the gods. Similar oracles prevailed in the East, like the Chinese I Ching, which uses three coins. It could be said that to these cultures the gods resided in random chance. Hope could be put into random chance (e.g., Cardano) the same way hope could be put into a god.

Today science has methods to remove or lessen the experimental noise in its measurements, but science still explains much of its world based upon probabilities and odds. Although we have gotten very clever at disguising this, we still do this thing or that thing based upon probabilities. It’s just inescapable.

What’s important, however, is that with this powerful mathematics we can alleviate some of the chaos and obtain the illusion of some static order for the sake of measurement. Unfortunately, one never knows if chance is being eliminated when it shouldn’t be, or isn’t included when it should be. Nevertheless, statistics are the best we have to measure the world objectively, but the inferences drawn from statistics should be used with caution.
