Professor Brian Blais' Blog: July 2009

Thursday, July 30, 2009

Free will

After a discussion with a friend about Nostradamus, I realized that the existence of prophets conflicts with the idea of free will: if the future is written in such a way that we can make definite predictions years ahead of time, then the choices of people can mean nothing...they are thus not free. Perhaps this is true, but I find it interesting that Christianity (and probably other religions) has free will as a basic axiom, and yet prophets are a common and important component of the faith!

Friday, July 24, 2009

neat way to introduce programming

Just came upon this post which describes a nice analogy between programming and Dr Seuss' Sneetch star-on and star-off machines. A modified version might be useful even for older students.

Another post here, although written in Spanish, has a number of interesting LOGO (or Python turtle) example exercises.

Silly challenge to silly statement

There is a new Challenge to Global Warming Skeptics by the FiveThirtyEight statisticians, who did such a good job with the Obama-McCain forecasts. The challenge is summed up by:

"For each day that the high temperature in your hometown is at least 1 degree Fahrenheit above average, as listed by Weather Underground, you owe me $25. For each day that it is at least 1 degree Fahrenheit below average, I owe you $25."

He's trying to address recent statements by some conservatives, paraphrased as "It's cold this summer here in Minneapolis, so global warming must be wrong." That's a bit of a strawman, but from the Power Line blog post, there really is this sense of local vs global perspective.

Well, it's actually a pretty silly challenge to a pretty silly statement. No serious GW skeptic I've heard contests that there is warming on a global scale; they argue against the magnitude or, more commonly, the cause of the warming (human vs not). The statistical challenge here only addresses whether there is warming, and even then it is rigged to win even if there were no real global warming, because of the urban heat island effect. Most of the thermometers started out in rural areas, or in fields outside of towns, and cities were built around them. Areas around pavement are warmer than the surrounding areas, so there would be a measured warming trend due to development, not due to atmospherics.
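To see how even a small warm bias tilts the proposed bet, here is a rough sketch. It assumes, purely for illustration, that daily high-temperature anomalies are Gaussian; the spread of 8 degrees and the 1 degree bias are made-up numbers, and `expected_daily_payoff` is a hypothetical helper, not anything from the original challenge.

```python
import math

def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def expected_daily_payoff(bias, sigma, stake=25.0, threshold=1.0):
    """Expected dollars per day won by the person offering the bet.

    bias  : mean shift of the daily anomaly (deg F), e.g. from a heat island
    sigma : day-to-day spread of the anomaly (deg F), an assumed value
    """
    p_warm = 1.0 - normal_cdf(threshold, bias, sigma)  # at least 1 F above average
    p_cold = normal_cdf(-threshold, bias, sigma)       # at least 1 F below average
    return stake * (p_warm - p_cold)

# Made-up numbers: 8 F day-to-day spread, with and without a 1 F warm bias
no_bias = expected_daily_payoff(bias=0.0, sigma=8.0)
with_bias = expected_daily_payoff(bias=1.0, sigma=8.0)
print(f"no bias:  ${no_bias:.2f} per day")
print(f"1 F bias: ${with_bias:.2f} per day")
```

With no bias the bet is fair on average; with any warm bias in the station record, whatever its cause, the person offering the bet has a steady expected gain.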

A better bet would involve predicting the global temperature for, say, 5 years from now (along with the uncertainty). Each side puts in their prediction, and pays $1 times the ratio of the posterior probabilities for the two models, P(M₁)/P(M₂). Would anyone take a bet like that?
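A minimal sketch of how such a bet could be settled, assuming each side states its prediction as a Gaussian (a best guess plus an uncertainty) and that both models start with equal prior probability. The function names and all of the numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Probability density of a Gaussian prediction evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_odds(observed, pred1, sigma1, pred2, sigma2, prior_odds=1.0):
    """P(M1 | data) / P(M2 | data): prior odds times the likelihood ratio."""
    likelihood_ratio = (gaussian_pdf(observed, pred1, sigma1)
                        / gaussian_pdf(observed, pred2, sigma2))
    return prior_odds * likelihood_ratio

# Hypothetical predictions of the global temperature anomaly (deg C) five years out
odds = posterior_odds(observed=0.60, pred1=0.65, sigma1=0.10, pred2=0.45, sigma2=0.10)
payment = 1.0 * odds  # dollars paid to side 1 under the proposed rule
print(f"posterior odds M1:M2 = {odds:.2f}, payment = ${payment:.2f}")
```

One feature of settling the bet this way is that a confident but wrong prediction (a small stated uncertainty far from the observed value) pays out heavily, so each side has an incentive to state its uncertainty honestly.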

Monday, July 20, 2009

A quick comment on error

I read this article in the Week magazine, concerning the upcoming census. I plan to look at statistical sampling later, but I was struck by the following:

Because supporters and opponents tend to break down along partisan lines. Democrats favor sampling because the people who are traditionally hardest to count are the urban poor, minorities, and immigrants, all of whom tend to live in Democratic strongholds and vote Democratic. These groups are often undercounted because they move so frequently and do not trust government employees asking questions. Republicans, by contrast, stress that the Constitution specifies an “actual enumeration” of the population, not an estimate. They also argue that statistical sampling is inferior to counting. “Anyone familiar with public opinion polling can tell you that statistical sampling carries a margin of error,” Republican Reps. Darrell Issa and Patrick McHenry recently wrote. “And error is the enemy of a full and accurate census.”

The notion that a national count is completely error free is ridiculous. I think everyone would agree that if you do a count, you will not get everyone. It is known that mistakes are made, omissions occur, and that some people actively avoid the census. Because the census avoidance is not random, the omissions are biased in some way. One can argue in which ways the bias points, but the bias is there.
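The bias can be made concrete with a small made-up example. The two groups, their sizes, and their counting rates below are invented for illustration and are not census figures.

```python
# Hypothetical population: group name -> (true size, fraction successfully counted)
population = {
    "easy to count": (900_000, 0.99),
    "hard to count": (100_000, 0.90),
}

true_total = sum(size for size, _ in population.values())
counted = {name: size * rate for name, (size, rate) in population.items()}
counted_total = sum(counted.values())

# Share of the hard-to-count group, in reality and in the enumeration
true_share = population["hard to count"][0] / true_total
counted_share = counted["hard to count"] / counted_total

print(f"true total:    {true_total}")
print(f"counted total: {counted_total:.0f}")
print(f"hard-to-count share: true {true_share:.3f}, counted {counted_share:.3f}")
```

The enumeration is low overall, and it is low in a lopsided way: the group that is harder to count ends up with a smaller share of the counted population than it has in the real one.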

So what is the best plan of action in this case? You want to make an estimate of the number of people in the country. "Estimate" is the correct word, even for an enumeration, given the fact that the enumeration is known to be incomplete. The best thing to do, then, is to have a public and open statistical model of the process of sampling, with independent ways to confirm the validity of the model. If the model is simple, and open, it would be difficult to argue against. Without this approach, a "strict enumeration" is really an unstated statistical model where the assumptions are very difficult to see.

Sunday, July 19, 2009

Laplace and the Divine

In a previous post I used the word "God" in quotes, when referring to Laplace's view of determinism. This was done because Laplace himself did not believe in God, and I used the term as a convenience to represent a hypothetical all-knowing being. The clearest view of Laplace's perspective comes from an interaction with Napoleon. After reading Laplace's Mécanique céleste, Napoleon asked him about the lack of any reference to God in the work. Laplace responded that he had no need for that hypothesis.

In a strict way, this is an agnostic perspective. The description of the universe, as described by Laplace, does not need to use the concept of God in any way. This does not disprove the existence of God, or even deny God's existence. It merely states that the concept of God is not needed. This is the pure vision of science, and why science does not necessarily conflict with religion. However, there could be certain claims from specific religions that conflict with science. The 6000-year-old Earth, part of some fundamentalist Christian beliefs, is one example. God as the mystery in the Universe is not something that can conflict with science.

Saturday, July 18, 2009

Misunderstanding Laplace

I finished Leonard Mlodinow's "The Drunkard's Walk: How Randomness Rules Our Lives" this past week, and have a couple of thoughts related to it.

In Chapter 10 he quotes Laplace:

"We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes."

and Mlodinow states that this is an expression of determinism. He then further states:

But for Laplace's dream to hold true, several conditions must be met. First, the laws of nature must dictate a definite future, and we must know those laws. Second, we must have access to data that completely describe the system of interest, allowing no unforeseen influences. Finally, we must have sufficient intelligence or computing power to be able to decide what, given the data about the present, the laws say the future will hold.

He then criticizes it with the following three problems:

society is not governed (as far as we know) by definite and fundamental laws in the way physics is
like Lorenz, we cannot obtain the precise data necessary for making predictions
human affairs are so complex that it is doubtful we'd be able to make the calculations anyway

He concludes "as a result, determinism is a poor model for the human experience." His point seems to be, in some ways, obvious and in other ways irrelevant.

Laplace was simply saying that "God" would not find anything random, because of complete knowledge. The connection between knowledge and inference, which probability theory affords, was worked out by Laplace in great detail and is known to us today as Bayesian inference. The structure of Bayesian inference describes randomness simply as the product of our ignorance of the model, the parameters, the initial conditions, the measurement details, etc. Laplace was simply saying that with perfect knowledge, there is no randomness. E.T. Jaynes would describe the "random process" as a mind-projection fallacy: you have ignorance of the system, so you attribute its unpredictable behavior to the system itself. A rolled die is following Newton's Laws, deterministically, and detailed knowledge of the die and the roll and the surface should allow you to predict 100% of the time what it will do. We lack that knowledge, thus the behavior becomes unpredictable. We often then attribute that unpredictable behavior to a "random die", as if it were the die that contains the randomness and not our own ignorance.
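As a toy illustration of randomness as ignorance, consider a completely deterministic "die" whose face depends only on how fast it spins and how long it is in the air. The rule below is an invented stand-in for the real physics, not a model of an actual die, and the spin rates are arbitrary.

```python
from collections import Counter

def toy_die(spin_rate, flight_time):
    """Deterministic toy rule: the face is fixed by the number of face-turns completed."""
    return int(spin_rate * flight_time) % 6 + 1

# Perfect knowledge of the throw: the outcome is predicted every time.
throw = (87.3, 1.0)
prediction = toy_die(*throw)
outcome = toy_die(*throw)
print("predicted", prediction, "observed", outcome)

# Ignorance of the throw: all we know is that the spin rate lies somewhere in [50, 150).
spin_rates = [50 + k / 100 for k in range(10_000)]
faces = Counter(toy_die(rate, 1.0) for rate in spin_rates)
frequencies = {face: faces[face] / len(spin_rates) for face in range(1, 7)}
print(frequencies)  # each face comes up close to 1/6 of the time
```

Nothing in `toy_die` is random. The roughly even spread over the six faces appears only when we average over what we do not know about the throw.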

Bringing in Lorenz, and chaos theory, is irrelevant here. Lorenz's systems were completely deterministic, and it is theoretically possible for a being to know the state of the system out to a sufficient number of decimal places to provide any particular set level of uncertainty in the system. With the quantization of states, it then becomes possible to know *exactly* what state something is in. Of course, quantum mechanics is a two-edged sword in this example: it solves the chaos problem, but adds an inherent, physical, randomness to the system which is very peculiar.
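The point about Lorenz's systems can be checked numerically. The sketch below uses the standard Lorenz equations with the usual textbook parameter values (sigma = 10, rho = 28, beta = 8/3) and a hand-written fourth-order Runge-Kutta step. The starting state, step size, run length, and perturbation sizes are arbitrary choices made for illustration.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def evolve(state, steps=500, dt=0.01):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

def separation(a, b):
    """Euclidean distance between two states."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

start = (1.0, 1.0, 1.0)
reference = evolve(start)

# Same initial state, same future: the system itself contains no randomness.
print(evolve(start) == reference)

# Better knowledge of the initial state gives a smaller prediction error at the same time.
for error in (1e-4, 1e-7, 1e-10):
    perturbed = (start[0] + error, start[1], start[2])
    print(error, separation(evolve(perturbed), reference))
```

The errors grow quickly, which is the chaos, but for any fixed prediction time they can be made as small as desired by knowing the starting state to more decimal places.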

The problem with Mlodinow, it seems, is that he holds human activity to be a bit too special. We are, after all, made up of atoms and would thus be governed by the laws of physics. Certainly it would be too complex to handle, for us, but Laplace was not talking about us in his quote, or at least not us right now or in the near future.

Friday, July 10, 2009

Homeopathic "Medicine"

Homeopathic medicine "is a form of alternative medicine, first expounded by German physician Samuel Hahnemann in 1796, that treats patients with heavily diluted preparations which are thought to cause effects similar to the symptoms presented." (http://en.wikipedia.org/wiki/Homeopathy). I saw the mock video below on the skeptics blog, http://skepticblog.org/, and thought it was so amusing I had to post it here. I've known people who swear by this stuff, which is unfortunate. Treatments like these, that don't work, are dangerous and can actually kill people by diverting them from legitimate treatment. A good analysis of this is at http://www.csicop.org/si/9709/park.html.

The video, however, is quite amusing.

Wednesday, July 8, 2009

"Erroneous" Probabilistic Reasoning

I've been reading Leonard Mlodinow's "The Drunkard's Walk: How Randomness Rules Our Lives", and he describes a set of experiments which I had heard of before but never gave too much thought to. The experiments deal with people making probability assessments about a series of statements. The experiments were done by Daniel Kahneman and Amos Tversky[cite here]. It starts with a description:

Imagine a woman named Linda, thirty-two years old, single, out-spoken, and very bright. In college she majored in philosophy. While a student she was deeply concerned with discrimination and social justice and participated in antinuclear demonstrations.

They then ask for a ranking of most (1) to least (8) probable for a number of statements. The interesting three statements are:

Linda is active in the feminist movement: 2.1
Linda is a bank teller and is active in the feminist movement: 4.1
Linda is a bank teller: 6.2

This is then used to say that people do not figure probabilities correctly because "the probability that two events will both occur can never be greater than the probability that each will occur individually" (italics in original).
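The quoted rule follows directly from the product rule of probability. Writing A and B for any two statements (labels introduced here only for illustration, for example A = "Linda is a bank teller" and B = "Linda is active in the feminist movement"):

```latex
P(A \text{ and } B) \;=\; P(A)\, P(B \mid A) \;\le\; P(A),
\qquad \text{since } 0 \le P(B \mid A) \le 1 .
```

The same argument with the roles of A and B swapped gives P(A and B) ≤ P(B).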

The book reports that "even highly trained doctors make this error", with the following example.

They presented a group of internists with a serious medical problem: a pulmonary embolism (a blood clot in the lung). If you have that ailment, you might display one or more of a set of symptoms. Some of those symptoms, such as partial paralysis, are uncommon; others, such as shortness of breath, are probable. Which is more likely: that the victim of an embolism will experience partial paralysis or that the victim will experience both partial paralysis and shortness of breath? Kahneman and Tversky found that 91 percent of the doctors believed a clot was less likely to cause just a rare symptom than it was to cause a combination of the rare symptom and a common one. (In the doctor's defense, patients don't walk into their offices and say things like "I have a blood clot in my lungs. Guess my symptoms.")

Now, I haven't read past this point, or the original study, so take what I say here with a grain of salt. I wanted to put down my thoughts on these observations before going on to read the study's conclusions. Perhaps what I say now will be inconsistent with other aspects of the studies, or further data.

I do not think that one should conclude poor reasoning in these examples.

I believe there are two things going on here. One is a property of the English language, and the other is a property of human reasoning. In English, if I were to say "Do you want steak for dinner, or steak and potatoes?" one would immediately parse this as a choice between

steak with no potatoes
steak with potatoes

Although strict logic would have it otherwise, it is common in English to have the implied negative when given a choice like this. If we interpret the doctor's choice, we have:

clot with paralysis and shortness of breath
clot with paralysis and no shortness of breath

The second one is much less likely, because it would be odd to have a clot and not have a very common symptom associated with it. It is less clear in Linda's case, but I think the same reasoning applies there. What is interesting is that the error is not seen in ranking statements which have nothing to do with the given knowledge about Linda, such as:

Linda owns an IHOP franchise
Linda had a sex-change and is now Larry
Linda had a sex-change and is now Larry and owns an IHOP franchise

There might be something to being completely unrelated that changes the interpretation of the English sentence, and makes it a bit more formal, closer to the mathematical reasoning. I am not sure what types of statements would do this, but it is a bit challenging to disentangle subtle language interpretations I think.
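The two readings of the doctors' question discussed above can be compared with made-up numbers. The probabilities below are invented for illustration and are not clinical values, and treating the two symptoms as independent given a clot is just a simplifying assumption.

```python
# Hypothetical probabilities of each symptom, given a pulmonary embolism
p_paralysis = 0.05      # rare symptom (invented value)
p_short_breath = 0.90   # common symptom (invented value)

# Simplifying assumption: the symptoms are independent given the clot.
p_both = p_paralysis * p_short_breath
p_paralysis_only = p_paralysis * (1 - p_short_breath)

# Formal reading: "paralysis" (with or without shortness of breath) vs "both"
print("formal:  P(paralysis) =", p_paralysis, " P(both) =", round(p_both, 4))

# Implied-negative reading: "paralysis and NO shortness of breath" vs "both"
print("implied: P(paralysis only) =", round(p_paralysis_only, 4),
      " P(both) =", round(p_both, 4))
```

Under the formal reading the conjunction can never win, but under the implied-negative reading "both symptoms" is the more probable option whenever the second symptom is more likely present than absent.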

When reading these experiments, I recalled a description from E.T. Jaynes about people receiving the same new information, but updating their knowledge in a diverging way, due to differences in their prior information. I think something like that could be going on here. What I mean is, when doctors are asked: "Which is more likely: that the victim of an embolism will experience partial paralysis or that the victim will experience both partial paralysis and shortness of breath?" it is interpreted as:

someone is claiming that the patient has an embolism
the patient is claiming, or someone has measured, that she has partial paralysis
the patient is claiming, or someone has measured, that she has shortness of breath

I don't believe the doctors are separating the analysis of the claim of the clot, which is given information, from the other claims. As Mlodinow admits, the situation where one knows the diagnosis is practically never encountered, so the doctors are really assessing the truthfulness of the existence of the clot. Because of this, the implied negative in (2) above (i.e. paralysis with no shortness of breath) is even stronger.

Another way of looking at it is to include the knowledge of the method of reporting. Someone who is reporting information about an ailment will report all of the information accessible to them. By reporting only the paralysis, there are two possibilities concerning the person measuring the symptoms of the patient:

they had the means to measure shortness of breath in the patient, but there was none
they did not have the means to measure shortness of breath

In the first case, the doctors' probability assessment is absolutely correct: both symptoms together are more likely than just one. In the second case, the doctors are also correct: one of the sets of diagnostic results (i.e. just paralysis) is less dependable than the other set (i.e. both symptoms), thus the second one is more likely to indicate a clot or is consistent with the known clot.

It isn't that the doctors are reasoning incorrectly. They are including more information, and doing a more sophisticated inference than the strict, formal, minimalistic interpretation of the statements would lead one to do.

This analysis works well for other examples stated in the book, like "Is it more probable that the president will increase federal aid to education or that he or she will increase federal aid to education with funding freed by cutting other aid to states?".

Now I have to continue reading the book, and track down the study, to see if any of these thoughts pan out.

Tuesday, July 7, 2009

A Little Geometry

I've just finished the book "Euclid's Window" by Leonard Mlodinow, and really enjoyed it. The book describes the history of geometry through Euclid, Descartes, Gauss, and Einstein. During his coverage of Euclid he presents a simple proof of the Pythagorean Theorem that really resonated with me. I don't recall ever seeing a proof of it, or at least no memorable proof. This one uses a minimum of jargon and formality...you just draw the picture, discuss it for a bit, and you see it!

You start with a right triangle, like:

and you make two constructions, from a square with sides a+b. The first construction looks like:

where, by eye, you can see that the total area of the square is the area of 4 triangles (just like our original) plus the area of the inner square, which is c*c (which reminds me that I have to figure out how to do superscripts and subscripts in this blog. :) )

The second construction is nearly the same as the first, and looks like:

where, again by eye (with a little shading to make it a bit more obvious), you can see that the total area of the square is the area of 4 triangles (just like our original) plus the area of the two inner squares, which are a*a and b*b. Therefore:

a*a+b*b=c*c

for any triangle for which you can make this construction, which are right triangles.
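Written out as algebra, the comparison of the two constructions reads as follows. Each triangle has legs a and b, so its area is ab/2, and both large squares have side a+b:

```latex
\begin{align*}
\text{first construction:}\quad  (a+b)^2 &= 4\cdot\tfrac{1}{2}ab + c^2 \\
\text{second construction:}\quad (a+b)^2 &= 4\cdot\tfrac{1}{2}ab + a^2 + b^2 \\
\Rightarrow\quad a^2 + b^2 &= c^2
\end{align*}
```

The four triangles cancel from both sides, which is exactly what the pictures show by eye.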

Really neat!

Beginnings

Although I seem to have missed the big blog burst, I am starting one now to include my musings on various topics that interest me, from statistics and probability to physics and history.