Tuesday, December 22, 2009
Now, back to the polar bears. From the Polar Bears International site it is confirmed that the population of polar bears in the 1960s was very low, in the few thousands, and by the early 2000s was up to between 22,000 and 25,000. These numbers are also reflected on the USGS site on polar bears, where you can actually get some of the data.
If you go to the polar bear specialist group, which advises the IPCC, you can find a table of the status of the polar bear in various regions. The problem with this table is that there is a column titled "Observed or predicted trend". Hello? Why would you mix observed trends and predicted trends in a table? Just show me the data. Anyway, there is a document here which has an explanation of the projections, and possibly some data, although I haven't read the 200 pages of the document to see if it is buried in the text (there is no figure with the data...just model predictions). I'd love to find a straightforward presentation of the estimates of the current numbers of polar bears, complete with error bars to denote uncertainty.
It seems reasonable that with the decline of the arctic ice the polar bear populations can be affected, some more than others, but the role of hunting (the regulation of which caused the surge in bear numbers from the 1970s) is difficult to disentangle...I've seen unsubstantiated claims that the areas with the decrease are primarily hunting areas. I haven't confirmed this, but it could also be that ice retreat is more substantial in the more habitable areas, where there would be more people. Correlation does not equal cause and effect.
So, it is a fact that there are many more polar bears now than, say, 30-40 years ago. It can also be true, although I have difficulty tracking the data down in a readable form, that arctic ice retreat could impact polar bear numbers adversely.
One question that I have now is, if it was warmer 1000 years ago, is there evidence that there was a significant retreat of the ice back then? If that is the case, then the polar bear scare is just that...a scare. Again, many of the global warming consequences that are being reported are tied to the question of the size, extent, and effects of the Medieval Warm Period. That's why, in my opinion, it is the most important question of all.
Tuesday, December 15, 2009
The scientific method's central motivation is the ubiquity of error--the awareness that mistakes and self-delusion can creep in absolutely anywhere and that the scientist's effort is primarily expended in recognizing and rooting out error.
In stark contrast to the sciences relying on deduction and empiricism, computational science is far less visibly concerned with the ubiquity of error. At conferences and in publications, it's now completely acceptable for a researcher to simply say "here is what I did, and here are my results." Presenters devote almost no time to explaining why the audience should believe that they found and corrected errors in their computations. The core of the presentation isn't the struggle to root out error--as it would be in mature fields--but is instead a sales pitch...
Many users of scientific computing aren't even trying to follow a systematic, rigorous discipline that would in principle allow others to verify the claims they make. How dare we imagine that computational science, as routinely practiced, is reliable!
On ClimateAudit, there is an older article (2005) about the Hockey Stick plot. Ross McKitrick makes the suggestion of an audit panel,
A group of experts fully independent of the IPCC should be assembled immediately after the release of any future IPCC Reports to prepare an audit report which will be released under the imprimatur of the IPCC itself. The audit will identify the key studies on which the Report's conclusions have been based, and scrutinize those studies, with a view to verifying that, at a minimum:
- data are publicly available,
- the statistical methods were fully described, correctly implemented, and the computer code is published,
- if the findings given maximum prominence are at odds with other published evidence, good reason is provided in the text as to why these findings have been given prominence.
Any competent scientist can assess these things. My strong recommendation is that such a panel be drawn from the ranks of competent mathematicians, statisticians, physicists and computer scientists outside the climatology profession, to prevent the conflict of interest that arises because climatologists face career repercussions from publicly criticizing the IPCC. Also, participation should exclude officials from environment ministries, because of the conflict of interest entailed in the fact that environment ministries are the main financial beneficiaries of the promotion of global warming fears.
The second recommendation is for a "counter-weight panel", whose job would be to actively try to find holes in the analysis, assumptions, etc...
I'm not sure how I feel about the second one (I'll have to think about it), but the audit panel to me makes total sense. Why don't the scientific journals do this as a matter of policy?
Saturday, December 12, 2009
It's an entertaining introduction to risk analysis, and they use the word micromort, a unit of risk equal to a one-in-a-million chance of dying from a given activity. The website is www.understandinguncertainty.org.
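To make the unit concrete, here is a minimal sketch (the helper names are my own) that converts a probability of dying into micromorts, and shows that tiny independent risks combine almost exactly linearly:

```python
def to_micromorts(p_death):
    """Convert a probability of dying into micromorts (1 micromort = 1e-6 chance)."""
    return p_death * 1e6

def combined_risk(*probs):
    """Chance of dying from at least one of several independent risks."""
    survive = 1.0
    for p in probs:
        survive *= 1.0 - p
    return 1.0 - survive

print(to_micromorts(1e-6))                                 # 1.0, by definition
print(round(to_micromorts(combined_risk(1e-6, 2e-6)), 3))  # 3.0: tiny risks just add
```

This is why micromorts are handy: at these scales you can simply sum the risks of a day's activities without worrying about the cross terms.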
Thursday, December 10, 2009
Why is the scientific consensus on Anthropogenic Global Warming (AGW) different from the scientific consensus on Evolution? Let me list some of the ways:
- Evolution has many independent, very different, lines of evidence (fossils, embryology, immunology, molecular biology, paleontology, etc...). AGW has at best 50-100 different data sets, ranging from the dozen or so tree-ring series and the dozen or so ice cores to the satellite and surface temperature records. Much of our inference comes from computer simulations that very few completely understand. Much of the global warming consensus comes from a small minority who are directly involved with the data or the simulations.
- We can control aspects of evolution. With knowledge of DNA, we can make genetically modified foods, we can change the course of diseases, and breed bacteria to eat nylon. Our understanding of AGW is at such a low level that we can only possibly control the climate at the grossest level. Our lack of understanding of feedback loops prevents even the most basic possible control of the system.
- Although evolution occurs on long time scales, we can see its action on the small scale. AGW also occurs on longish time scales, but there is no short-term equivalent. This adds to our level of control with evolution, and underscores our lack of it with AGW.
- Those that are denying evolution want to replace it with something that violates not just evolution, but all of physics, chemistry, astronomy...pretty much all of science. Although the extremists in the anti-global warming camp can seem pretty anti-science, they aren't trying to replace global warming with something that violates all of science (they still might be wrong!). There is also a much more nuanced camp that admits that the planet is warmer, but perhaps it is not as special as the AGW theory would suggest, and that draconian CO2 policies are unwarranted given the uncertainties. This puts it on a very different scale than the anti-evolution group.
It is dangerous to make the comparison. It is partly a demonizing of one's opponent and, at the same time, an angelizing (is that a word? :) ) of one's own perspective: saying that the AGW deniers are just like the evolution deniers both makes the deniers seem unreasonable and, by association, implies that AGW is as solid as evolution. This latter claim, despite the claims of its proponents, is definitely hyperbole.
Wednesday, December 9, 2009
The evidence for anthropogenic (that is, human-caused) global warming is strong, comes from many sources, and has been subject to much scientific scrutiny. Plenty of data are freely available. The basic principles can be understood by just about anyone, and first- and second-order calculations can be performed by any physics grad student. Given these facts, questioning the occurrence of anthropogenic global warming seems crazy. (Predicting the details is much, much more complicated). And yet, I have seen discussions, articles, and blog posts from smart, educated people who seem to think that anthropogenic climate change is somehow called into question by the facts that (1) some scientists really, deeply believe that global warming skeptics are wrong in their analyses and should be shut out of the scientific discussion of global warming, and (2) one scientist may have fiddled with some of the numbers in making one of his plots. This is enough to make you skeptical of the whole scientific basis of global warming? Really?
I would love to go point by point in this quote and show the calculations, and I'd imagine I would get stymied once I tried to put in the water vapor feedback. I need to read more about this, because from what I've read we don't understand the magnitude, or even the sign, of the cloud feedback, and it could easily wipe out any warming caused by CO2 increases.
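As an aside, the kind of first-order calculation a physics grad student could do starts with a zeroth-order radiative balance. The constants below are standard textbook values, not numbers from the quote:

```python
# Zeroth-order radiative balance: S * (1 - albedo) / 4 = sigma * T^4,
# giving the Earth's effective temperature with no greenhouse effect.
S = 1361.0        # solar constant, W/m^2 (standard value)
albedo = 0.30     # Earth's Bond albedo (approximate textbook value)
sigma = 5.670e-8  # Stefan-Boltzmann constant, W/m^2/K^4

T_eff = (S * (1 - albedo) / (4 * sigma)) ** 0.25
print(round(T_eff, 1))  # about 255 K; the observed surface mean is ~288 K,
                        # and closing that ~33 K gap is the greenhouse effect
```

The hard part, as the post says, is everything after this: the feedbacks (water vapor, clouds) that determine how that balance shifts as CO2 increases.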
Some of the comments are very good too, like:
A. zarkov: I'm really disappointed to see you engage in the usual group think about global warming. Have you read the Wegman report? How come you don't refer people to ClimateAudit for the other side of the debate? Did you know that Michael Mann had to be forced by Congress to provide the data and codes behind the hockey stick calculation? ClimateAudit give you everything, the data and the R code they use. The other side stonewalls, and no wonder-- their results are a fraud.
The blog he refers to, climateaudit.org, is very interesting and is exactly the way the commenter says: they are all for open information. They post the data, the code, everything right up front and simply ask everyone else to do the same. Why this isn't required for all scientific publications, I don't know. Why it is not required for all high-stakes publications (ones that could result in very high-stakes policy) I don't know either. It's a travesty.
If everyone were as open about the data and the code, Climategate couldn't have happened.
One final comment on this thread:
Radford Neal: Few people ever disputed that the current temperatures are higher than those of earlier times back to four hundred years ago. The big issue has always been whether the Medieval Warm Period (usually seen as occurring around a thousand years ago) was warmer than at present, since if it was, that makes the present warming seem not so unusual and perhaps due to natural causes.
This is my point too: if it was warmer 1000 years ago, then the hysterical language of the global warming media is completely unjustified.
Phil Jones in discussing the presentation of temperature reconstructions stated that “I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) and from 1961 for Keith’s to hide the decline.” The paper in question is the Mann, Bradley and Hughes (1998) Nature paper on the original multiproxy temperature reconstruction, and the ‘trick’ is just to plot the instrumental records along with the reconstruction so that the context of the recent warming is clear. Scientists often use the term “trick” to refer to “a good way to deal with a problem”, rather than something that is “secret”, and so there is nothing problematic in this at all. As for the ‘decline’, it is well known that Keith Briffa’s maximum latewood tree ring density proxy diverges from the temperature records after 1960 (this is more commonly known as the “divergence problem”–see e.g. the recent discussion in this paper) and has been discussed in the literature since Briffa et al in Nature in 1998 (Nature, 391, 678-682). Those authors have always recommended not using the post-1960 part of their reconstruction, and so while ‘hiding’ is probably a poor choice of words (since it is ‘hidden’ in plain sight), not using the data in the plot is completely appropriate, as is further research to understand why this happens.
A must read is the article by David Holland, which outlines the problems with the hockey-stick analysis. He explains the divergence problem, and the history of all this, far better than I can summarize here.
Sunday, December 6, 2009
I am struck by a few things.
First, in the blog post he mentions:
It assumes that all things we call religion or religious impulses are essentially the same or have some common core. This faces the philosophical problem of properties and propositions in general. For example, take the property of redness. Is there something that all objects we call red have in common? And if there is, is this the same kind of thing we call religious belief?
In fact there is something common to all things red: the wavelengths of light that are reflected. I think what he is asking is whether we experience red in the same way as our friend. In fact, it is quite likely, and it is not a philosophical idea at all. It seems to me more and more that philosophy tries to handle questions that are out of reach of science (for the moment), but the solutions found in philosophy evaporate or are insubstantial once we really understand what is going on. Tom Mitchell has done some very interesting work with looking at fMRI data in his "Brains, Meaning, and Corpus Statistics" talk (talk slides on his home page).
In the work, he compares fMRI data from different individuals, and finds that he can correctly identify images and words from brain activity of one person, using the associations between the images and words derived from the brain activity of other people. This strongly suggests that the internal representations of words and concepts may be very similar between individuals. Not only that, but that we have the possibility of determining what those are and not just leave it up to philosophical ruminations.
Dawkins mentions belief in authorities as a psychological tendency that may lead to religious thinking under the right circumstances. I would further add the brain's tendency for seeing patterns where there are none as the other piece of the religious-thinking puzzle. It is evolutionarily advantageous to see tigers where there are none, as opposed to not seeing tigers where there are some. Not all errors are equally costly. Religious interpretation of experience seems to me to easily follow from these sorts of errors.
Monday, November 30, 2009
Here's a little pet peeve of mine: nothing rhymes with orange. You've heard that before, right? Orange is famous for its rhymelessness. There's even a comic strip called "Rhymes with Orange." Fine then, let me ask you something. What the heck rhymes with purple?
If you stop and think about it, you'll find that English is jam-packed with rhymeless common words. What rhymes with empty, or olive, or silver, or circle? You can even find plenty of one-syllable words like wolf, bulb, and beige. Yet orange somehow became notorious for its rhymelessness, with the curious result that people now assume its status is unique.
I was directed to the quote by Andrew Gelman's Statistical Modeling blog, and he has other posts about names and sounds.
We are in the information age, and I love the way the information in the first two links is portrayed, and I plan on playing with the infochimps site more.
Wednesday, November 18, 2009
Anyway, this particular project takes famous scientists, and puts their speech to music. The music is catchy, and really captures well the philosophies of Carl Sagan and others. It just makes me realize how much I miss Carl Sagan, which then makes me miss Stephen Jay Gould and E.T. Jaynes.
Saturday, November 14, 2009
I think the best quote which summarizes the danger and the bad thinking is:
Major General Jehad al-Jabiri is head of the Ministry of the Interior’s General Directorate for Combating Explosives. “I don’t care about Sandia or the Department of Justice or any of them,” he says. “Whether it’s magic or scientific, what I care about is it detects bombs.”
Of course, if it hasn't been shown to detect bombs by science, the other option is irrelevant.
Monday, October 12, 2009
If it is true what the article says, that the BellKor team’s algorithms used information such as "genre" and "actors", then they plainly cheated!
This information was not available in the contest. The contest data had only 4 pieces of information: people ID, movie, date, and the rating values.
If BellKor managed to get extra information about the people and movies, it is hard to understand how NetFlix overlooked it. Therefore I'm quite sure that the article got the story wrong – it is too extreme to be believable. But if the article is right, then the winning team used crooked means to get the prize and its 'solution' is not worth much. Hope it is the first option…
I remember looking at the data, and if I recall correctly, I remember a script to add to the database from IMDB. I assume this was allowed, because otherwise there would have been no way to get 10%.
Saturday, October 10, 2009
I wrote as a comment to the person posting it:
Actually, this is not a true story, nor does it capture even in the slightest way the views of Einstein on religion (see http://www.einsteinandreligion.com/).
Although it is a cute video, there are logical flaws, such as the fact that neither "hot" nor "cold" exists except as labels on "temperature", neither label having any special role.
I'm mostly disturbed by the fact that this Christian organization is trying to legitimize a flawed argument, by falsely attributing it to a very famous non-Christian, and nearly non-religious, scientist.
Wednesday, September 16, 2009
S.K. could identify some shapes (triangles, squares, etc.) when they were side-by-side, but not when they overlapped. His brain was unable to distinguish the outlines of a whole shape; instead, he believed that each fragment of a shape was its own whole. For S.K. and other patients like him, "it seems like the world has been broken into many different pieces," says Sinha.
However, if a square or triangle was put into motion, S.K. (and the other two patients) could much more easily identify it. (With motion, their success rates improved from close to zero to around 75 percent.) Furthermore, motility of objects greatly influenced the patients' ability to recognize them in images.
This is very easily interpreted using the HTM framework. It's very interesting!
from random import randint, choice

# Monty Hall simulation: always switch after the host opens a losing door.
turns = 10000
win = 0
for _ in range(turns):
    door_choices = [1, 2, 3]
    prize = randint(1, 3)
    your_first_answer = choice(door_choices)   # automatic first pick
    door_choices.remove(your_first_answer)     # get the other two
    if prize == your_first_answer:             # happens 1/3 of the time
        # host opens either of the two losing doors
        door_choices.remove(choice(door_choices))
    else:
        # host must open the one remaining losing door
        door_choices = [prize]
    your_second_answer = door_choices[0]       # always switch
    if your_second_answer == prize:
        win += 1
print("Winning percentage:", float(win) / turns * 100)  # ~66.7 when switching
Wednesday, September 9, 2009
Wednesday, September 2, 2009
Wednesday, August 12, 2009
One advantage to replacing mobile power (e.g. gasoline) with stationary power (e.g. electric company) is that one can replace the stationary power with nuclear, which you just can't do with cars.
It is claimed that it will be cheaper for end users: "In Detroit, where off-peak electricity rates are 5 cents a kilowatt-hour, it will cost about 40 cents to recharge batteries overnight." One does have to factor in the maintenance costs, battery replacement, and battery disposal costs into the cost of owning this new car.
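A quick sanity check of the quoted figure, plus a purely illustrative per-mile comparison (the efficiency and gas-price numbers below are my assumptions, not from the article):

```python
rate = 0.05         # dollars per kWh, off-peak (figure from the quote)
charge_cost = 0.40  # dollars per overnight charge (figure from the quote)
energy = charge_cost / rate
print(energy)  # 8.0 kWh drawn per charge, implied by the two quoted numbers

# Purely illustrative per-mile comparison with gasoline:
miles_per_kwh = 4.0  # assumed EV efficiency
gas_price = 2.50     # assumed dollars per gallon
mpg = 30.0           # assumed gasoline car
print(rate / miles_per_kwh)  # 0.0125 dollars per mile, electric
print(gas_price / mpg)       # about 0.083 dollars per mile, gasoline
```

The per-mile fuel cost looks very favorable, which is exactly why the hidden costs (battery replacement and disposal) are the part worth scrutinizing.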
Now, I am not against this development, as such, but it is important (as always) to be aware of both the benefits and the costs of the new technology. It is far too easy to read one side of the equation, pat oneself on the back, while ignoring the hidden costs.
Thursday, July 30, 2009
Friday, July 24, 2009
Another post here, although written in Spanish, has a number of interesting LOGO (or Python turtle) example exercises.
"For each day that the high temperature in your hometown is at least 1 degree Fahrenheit above average, as listed by Weather Underground, you owe me $25. For each day that it is at least 1 degree Fahrenheit below average, I owe you $25."
He's trying to address recent statements by some conservatives, paraphrased as "It's cold this summer here in Minneapolis, so global warming must be wrong." That's a bit of a strawman, but from the Power Line blog post, there really is this sense of local vs global perspective.
Well, it's actually a pretty silly challenge to a pretty silly statement. No serious GW skeptic I've heard contests that there is warming on a global scale, but argues against the magnitude or, more commonly, the cause of the warming (human vs not). The statistical challenge here only addresses whether there is warming, and even there it is rigged to win even if there were no real global warming, because of the urban heat island effect. Most of the thermometers started out in rural areas, or in fields outside of towns, and cities were built around them. Areas around pavement are warmer than the surrounding areas, so there would be a measured warming trend due to development, not due to atmospherics.
A better bet would involve predicting the global temperature for, say, 5 years from now (along with the uncertainty). Each side puts in their prediction, and pays $1 times the ratio of the posterior probabilities for the two models, P(M1)/P(M2). Would anyone take a bet like that?
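To sketch what such a bet might look like, suppose (purely hypothetically) that each side's prediction is a Gaussian over the 5-year global temperature anomaly. With equal prior odds, the posterior odds P(M1)/P(M2) reduce to the likelihood ratio (the Bayes factor) at the observed value. All numbers below are made up for illustration:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Hypothetical 5-year predictive distributions (degrees C of global anomaly):
mu1, sigma1 = 0.6, 0.1   # model 1: continued warming (assumed numbers)
mu2, sigma2 = 0.4, 0.1   # model 2: little or no trend (assumed numbers)

observed = 0.55  # hypothetical observed anomaly five years later

# With equal prior odds, the posterior odds equal the likelihood ratio,
# which would set the payout of the bet.
bayes_factor = gaussian_pdf(observed, mu1, sigma1) / gaussian_pdf(observed, mu2, sigma2)
print(round(bayes_factor, 2))  # 2.72: modest evidence favoring model 1
```

Notice that the payout automatically rewards both accuracy and honest uncertainty: a model that claims tiny error bars and misses pays out heavily.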
Monday, July 20, 2009
Because supporters and opponents tend to break down along partisan lines. Democrats favor sampling because the people who are traditionally hardest to count are the urban poor, minorities, and immigrants, all of whom tend to live in Democratic strongholds and vote Democratic. These groups are often undercounted because they move so frequently and do not trust government employees asking questions. Republicans, by contrast, stress that the Constitution specifies an “actual enumeration” of the population, not an estimate. They also argue that statistical sampling is inferior to counting. “Anyone familiar with public opinion polling can tell you that statistical sampling carries a margin of error,” Republican Reps. Darrell Issa and Patrick McHenry recently wrote. “And error is the enemy of a full and accurate census.”
The notion that a national count is completely error free is ridiculous. I think everyone would agree that if you do a count, that you will not get everyone. It is known that mistakes are made, omissions occur, and that some people actively avoid the census. Because the census avoidance is not random, the omissions are biased in some way. One can argue in which ways the bias points, but the bias is there.
So what is the best plan of action in this case? You want to make an estimate of the number of people in the country. "Estimate" is the correct word, even for an enumeration, given the fact that the enumeration is known to be incomplete. The best thing to do, then, is to have a public and open statistical model of the process of sampling, with independent ways to confirm the validity of the model. If the model is simple, and open, it would be difficult to argue against. Without this approach, a "strict enumeration" is really an unstated statistical model where the assumptions are very difficult to see.
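One concrete example of such an open statistical model is the classic capture-recapture idea (the Lincoln-Petersen estimator), sketched below. This is an illustration of the principle, not necessarily the Census Bureau's actual methodology:

```python
import random

random.seed(1)
N = 100000  # true population size (unknown in practice)

# Two independent "sightings" of each person: the census itself,
# and a smaller independent follow-up survey.
p_census, p_survey = 0.9, 0.5
census = {i for i in range(N) if random.random() < p_census}
survey = {i for i in range(N) if random.random() < p_survey}

m = len(census & survey)  # people counted by both
# Lincoln-Petersen estimator: N_hat = n1 * n2 / m
N_hat = len(census) * len(survey) / m
print(len(census))   # the raw count misses about 10% of people
print(round(N_hat))  # the estimate recovers roughly the true 100000
```

The model's one big assumption (independence of the two counts) is stated openly and is itself checkable, which is exactly the property a "strict enumeration" lacks.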
Sunday, July 19, 2009
In a strict way, this is an agnostic perspective. The description of the universe, as described by Laplace, does not need to use the concept of God in any way. This does not disprove the existence of God, or even deny God's existence. It merely states that the concept of God is not needed. This is the pure vision of science, and why science does not necessarily conflict with religion. However, there could be certain claims from specific religions that conflict with science. The 6000 year old Earth, part of some fundamentalist Christian beliefs, is one example. The God as the mystery in the Universe is not something that can conflict with science.
Saturday, July 18, 2009
In Chapter 10 he quotes Laplace:
"We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes."
and Mlodinow states that this is an expression of determinism. He then further states
But for Laplace's dream to hold true, several conditions must be met. First, the laws of nature must dictate a definite future, and we must know those laws. Second, we must have access to data that completely describe the system of interest, allowing no unforeseen influences. Finally, we must have sufficient intelligence or computing power to be able to decide what, given the data about the present, the laws say the future will hold.
He then criticizes it with the following three problems:
- society is not governed (as far as we know) by definite and fundamental laws in the way physics is
- like Lorenz, we cannot obtain the precise data necessary for making predictions
- human affairs are so complex that it is doubtful we'd be able to make the calculations anyway
He concludes "as a result, determinism is a poor model for the human experience." His point seems to be, in some ways, obvious and in other ways irrelevant.
Laplace was simply saying that "God" would not find anything random, because of complete knowledge. The connection between knowledge and inference, which probability theory affords, was worked out by Laplace in great detail and is known to us today as Bayesian inference. The structure of Bayesian inference describes randomness simply as the product of our ignorance of the model, the parameters, the initial conditions, the measurement details, etc... Laplace was simply saying that with perfect knowledge, there is no randomness. E.T. Jaynes would describe the "random process" as a mind-projection fallacy: you have ignorance of the system, so you attribute its unpredictable behavior to a property of the system itself. A rolled die is following Newton's Laws, deterministically, and detailed knowledge of the die and the roll and the surface should allow you to predict 100% of the time what it will do. We lack that knowledge, thus the behavior becomes unpredictable. We often then attribute that unpredictable behavior to a "random die", as if it were the die that contains the randomness and not our own ignorance.
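A toy illustration of the mind-projection fallacy: a pseudorandom "die" looks random only to an observer who lacks knowledge of its internal state. Give a second observer the seed (complete knowledge) and every roll is predictable:

```python
import random

# To someone who doesn't know the seed, these rolls look random...
die = random.Random(42)
rolls = [die.randint(1, 6) for _ in range(10)]

# ...but an observer with complete knowledge of the internal state
# predicts every roll exactly. The "randomness" was in our ignorance.
laplace = random.Random(42)
predictions = [laplace.randint(1, 6) for _ in range(10)]

print(rolls == predictions)  # True
```

The die's sequence never changed character; only the observer's knowledge did, which is exactly Jaynes's point.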
Bringing in Lorenz, and chaos theory, is irrelevant here. Lorenz's systems were completely deterministic, and it is theoretically possible for a being to know the state of the system out to a sufficient number of decimal places to achieve any particular level of uncertainty in the system. With the quantization of states, it then becomes possible to know *exactly* what state something is in. Of course, quantum mechanics is a two-edged sword in this example: it solves the chaos problem, but adds an inherent, physical, randomness to the system which is very peculiar.
The problem with Mlodinow, it seems, is that he holds human activity to be a bit too special. We are, after all, made up of atoms and would thus be governed by the laws of physics. Certainly it would be too complex to handle, for us, but Laplace was not talking about us in his quote, or at least not us right now or in the near future.
Friday, July 10, 2009
Wednesday, July 8, 2009
Imagine a woman named Linda, thirty-two years old, single, out-spoken, and very bright. In college she majored in philosophy. While a student she was deeply concerned with discrimination and social justice and participated in antinuclear demonstrations.
- Linda is active in the feminist movement: 2.1
- Linda is a bank teller and is active in the feminist movement: 4.1
- Linda is a bank teller: 6.2
They presented a group of internists with a serious medical problem: a pulmonary embolism (a blood clot in the lung). If you have that ailment, you might display one or more of a set of symptoms. Some of those symptoms, such as partial paralysis, are uncommon; others, such as shortness of breath, are probable. Which is more likely: that the victim of an embolism will experience partial paralysis or that the victim will experience both partial paralysis and shortness of breath? Kahneman and Tversky found that 91 percent of the doctors believed a clot was less likely to cause just a rare symptom than it was to cause a combination of the rare symptom and a common one. (In the doctors' defense, patients don't walk into their offices and say things like "I have a blood clot in my lungs. Guess my symptoms.")
- steak with no potatoes
- steak with potatoes
- clot with paralysis and shortness of breath
- clot with paralysis and no shortness of breath
- Linda owns an IHOP franchise
- Linda had a sex-change and is now Larry
- Linda had a sex-change and is now Larry and owns an IHOP franchise
- someone is claiming that the patient has an embolism
- the patient is claiming, or someone has measured, that she has partial paralysis
- the patient is claiming, or someone has measured, that she has shortness of breath
- they had the means to measure shortness of breath in the patient, but there was none
- they did not have the means to measure shortness of breath
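The conjunction rule behind all of these examples, P(A and B) <= P(A), can be checked with a tiny simulation. The symptom probabilities below are made up for illustration only:

```python
import random

random.seed(0)
# Made-up conditional probabilities, for illustration only.
p_paralysis = 0.1   # rare symptom, given an embolism
p_breath = 0.7      # common symptom, given an embolism

trials = 100000
n_paralysis = 0
n_paralysis_and_breath = 0
for _ in range(trials):
    has_paralysis = random.random() < p_paralysis
    has_breath = random.random() < p_breath
    if has_paralysis:
        n_paralysis += 1
        if has_breath:
            n_paralysis_and_breath += 1

# The conjunction can never be more frequent than either conjunct alone:
# every "paralysis and shortness of breath" case is also a "paralysis" case.
print(n_paralysis_and_breath <= n_paralysis)  # True
```

The code makes the logic visible: the conjunction count is incremented only inside the single-symptom branch, so it can never exceed it, no matter what the probabilities are.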
Tuesday, July 7, 2009
I've just finished the book "Euclid's Window" by Leonard Mlodinow, and really enjoyed it. The book describes the history of geometry from Euclid through Descartes, Gauss, and Einstein. During his coverage of Euclid he presents a simple proof of the Pythagorean Theorem that really resonated with me. I don't recall ever seeing a proof of it, or at least no memorable one. This one uses a minimum of jargon and formality...you just draw the picture, discuss it for a bit, and you see it!
You start with a right triangle, like:
which, by eye, you can see that the total area of the square is the area of 4 triangles (just like our original) plus the area of the inner square, which is c*c (which reminds me that I have to figure out how to do superscripts and subscripts in this blog. :) )
The second construction is nearly the same as the first, and looks like:
which, again by eye (with a little shading to make it a bit more obvious), the total area of the square is the area of 4 triangles (just like our original) plus the area of the two inner squares, which are a*a and b*b. Therefore:
for any triangle for which you can make this construction, that is, any right triangle.
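The two constructions can also be checked numerically: both big squares have side a+b, so equating the two area decompositions forces c*c = a*a + b*b. Here is my own small sketch of that argument in code:

```python
from math import hypot

def area_construction_1(a, b, c):
    # four copies of the triangle plus the tilted inner square of side c
    return 4 * (a * b / 2) + c * c

def area_construction_2(a, b):
    # four copies of the triangle plus the two inner squares a*a and b*b
    return 4 * (a * b / 2) + a * a + b * b

for a, b in [(3, 4), (5, 12), (8, 15)]:
    c = hypot(a, b)  # hypotenuse of the right triangle
    # Both constructions tile the same (a+b) by (a+b) square, so their
    # areas must agree, which forces c*c == a*a + b*b.
    assert abs(area_construction_1(a, b, c) - (a + b) ** 2) < 1e-9
    assert abs(area_construction_2(a, b) - (a + b) ** 2) < 1e-9
print("c*c = a*a + b*b for every triangle checked")
```

Subtracting the four identical triangles from both decompositions is the entire proof; the code just does that bookkeeping explicitly.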