Why there must be universal grammar

The Guardian ran an interview with Daniel Everett yesterday. Everett is a linguist most famous for his claim that universal grammar, the idea popularized by Chomsky that some rules of grammar are "hard wired" into the brain, is false. Specifically, he believes that the Pirahã language lacks recursion.

His claims are quite controversial, but one thing which is worth mentioning is that universal grammar is (for a reasonable definition of "proof") provably correct. By this I mean:

Theorem: Learning grammar is so hard that the only way humans (or anyone) can do it is if they have innate structures.

This is related to Chomsky's poverty of the stimulus argument.

It can be proven in the following way: suppose we restrict ourselves to just the subset of English sentences consisting only of nouns and verbs. "I like John" and "You are here" would be two examples. These both follow the pattern "noun verb noun". A sentence like "jump run you" is non-grammatical, because "verb verb noun" is not an acceptable pattern in English.

Now let's consider how long it would take a learner to learn these patterns. There are 2^3 = 8 possible patterns of length three, so if a learner thinks they're all possible, it will have to test out all eight of them. ("Mommy, is 'jump run you' a sentence?")

Most sentences have many more than three words of course, so a learner will need to test out the 2^4 = 16 four word patterns, the 2^5 = 32 five word patterns, etc. In general, there are 2^n possible patterns with n words, meaning that the number of tests that the learner will need to run is exponential in the number of words.
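To make the counting concrete, here is a minimal sketch that enumerates every pattern for the two-category toy grammar (nouns and verbs only). The function name is made up for illustration.

```python
from itertools import product

# The two parts of speech in the toy subset of English described above.
PARTS = ("noun", "verb")

def patterns(n):
    """Return every part-of-speech pattern of length n as a string."""
    return [" ".join(p) for p in product(PARTS, repeat=n)]

# The count doubles with every extra word.
for n in (3, 4, 5):
    print(n, len(patterns(n)))
```

Both "noun verb noun" and "verb verb noun" show up in the length-three list; the enumeration says nothing about which ones are grammatical, which is exactly what the learner would have to test.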

The Cobham-Edmonds thesis states that a problem is feasible to solve only if it can be solved in polynomial time. By that standard, any problem which takes exponential time is, in practice, unsolvable.

Why is this true? There are, depending on your definition of "part of speech", about 20 parts of speech in English. If you tested one grammar per second, it would take you about a month to learn all the five word grammars. The six word grammars would take you two years, and you would be forty before you learned all the seven word grammars. That last sentence had 22 words, and it would take you 10^21 years to test all of the 22-word grammars. The universe is only about 10^10 years old.
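The arithmetic is easy to check. This sketch assumes, as the text does, 20 parts of speech and one test per second; the function name and the 365.25-day year are my own choices.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_test(n_words, parts_of_speech=20, seconds_per_test=1.0):
    """Years needed to test every pattern of n_words, one after another."""
    n_patterns = parts_of_speech ** n_words
    return n_patterns * seconds_per_test / SECONDS_PER_YEAR

for n in (5, 6, 7, 22):
    print(f"{n} words: {years_to_test(n):.3g} years")
```

Five words comes out to roughly a tenth of a year (a bit over a month), six words to about two years, seven to about forty, and 22 words to a little over 10^21 years.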

So who knows whether all languages are recursive. But it seems unlikely that human children consider all possible grammars equally. They must use some shortcuts and those shortcuts must, by definition, be innate.

Thoughts on Sokal v. Lynch

The New York Times ran a debate between Sokal (of Sokal affair fame) and Lynch regarding the underpinnings of science, apparently sparked by Rick Perry's denial of evolution. I've read several "why science is better than religion" things like this, and none of them ever give what I see as the obvious proof, so I'd like to contribute it here.

If you have some wrong theory whose predictions come out right only 10% of the time, and you do one experiment, there's a 10% chance you'll falsely believe your theory is good. Do two independent experiments, and that probability drops to 1%. Three, four, ..., N experiments later, and the likelihood that you'll have seen all false positives is vanishingly small.
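Here is the same calculation as a small sketch. It assumes the experiments are independent, so the probabilities simply multiply; the function name is hypothetical, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def chance_all_false_positives(n_experiments, p_single=Fraction(1, 10)):
    """Probability that a wrong theory passes every one of n independent tests."""
    return p_single ** n_experiments

for n in (1, 2, 3, 10):
    print(n, chance_all_false_positives(n))
```

Each additional experiment cuts the chance of being fooled by another factor of ten.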

Another way of putting this is: the law of large numbers says that, if you do a large number of experiments, you'll tend towards the right answer. If evolution is supported by vast amounts of evidence, the probability of it being wrong is so small as to be inconsequential. This has nothing to do with experimental science; it's just a mathematical fact. QED.

I guess Prof. Lynch will tell me that the mathematical assumptions which underlie the law of large numbers are just as suspect as the assumption that the Bible is infallible. Maybe, but it strikes me that few fundamentalists are claiming that 2 + 2 = 5, indicating that much progress could be made by making clear the mathematical foundations of science.

I'll leave you with what I think is Sokal's best argument (tragically not in that op-ed):

Anyone who believes that the laws of physics are mere social conventions is invited to try transgressing those conventions from the windows of my apartment. (I live on the twenty-first floor.)

A Simple Proof: Occam's Razor

How do you know that I'm not a robot? How do you know we're not living in the Matrix?

The usual resolution is some form of Occam's razor: sure, it's possible that I'm a robot, but the simpler explanation is that I'm human, and simpler explanations are preferable.[1]

This just pushes the question back: why are simpler explanations better?

There is a straightforward proof that comes from Computer Science, of all places, which I hope to explain here.


Suppose I enter the world as a blank slate - I have a "bag" of hypotheses about how things work, and I consider them all equally probable. As I perform experiments, I disprove some of my hypotheses, while others remain. As time goes on, my bag of plausible hypotheses gets smaller and smaller.

If I eventually reach a point at which I only have two hypotheses remaining and I randomly choose one to believe, I'm 50% certain that I've got the right one. But if I randomly believe one out of a hundred possible hypotheses, I've almost certainly chosen wrong (i.e. I've probably selected a hypothesis that by luck happened to fit with all the observed data, even though it's in fact wrong).

Believe it or not, this concludes the proof.

If I have a simple hypothesis ("fire is hot") there's really only one other hypothesis that could be in my bag ("fire is not hot"), so I can rapidly determine which is the right one. If my hypothesis is complicated ("fire is hot, provided it's the first full moon of a year with zodiac symbol ...") there are tons of equally complex hypotheses, and some of them are bound to fit the data, so I'm unlikely to have chosen the right one.
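A rough sketch of the bag argument, under assumptions that are mine and not part of the argument above: exactly one hypothesis in the bag is right, and each wrong hypothesis independently survives any single experiment with probability 0.5. The function then gives the expected number of wrong hypotheses that still fit the data.

```python
def expected_wrong_survivors(n_hypotheses, n_experiments, p_survive=0.5):
    """Expected number of wrong hypotheses still in the bag.

    Assumes one hypothesis is right and each wrong one independently
    survives a single experiment with probability p_survive (a made-up
    illustrative value, not a measured one).
    """
    return (n_hypotheses - 1) * p_survive ** n_experiments

# Small bag (simple hypotheses) versus large bags (complicated ones),
# each after ten experiments.
for size in (2, 100, 1_000_000):
    print(size, expected_wrong_survivors(size, n_experiments=10))
```

With a bag of two, the wrong hypothesis is almost surely gone after ten experiments. With a bag of a million, hundreds of wrong hypotheses are expected to survive by luck, so picking one survivor at random is very likely to pick a wrong one.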


In my job, I spend some time in the back rooms at medical offices, which means I hear nurses complain about doctors, and doctors complain about patients. One conversation I had with a dietitian sticks in my memory: she was complaining about patients who expect the faddish, complicated dietary advice you hear on TV - "good" carbs, antioxidants etc. - but all she does is give people a calorie target, and recommend eating more fresh fruits and vegetables.

I told her to give her patients a brochure on Occam's razor. I doubt they've implemented my suggestion.

Postscript: This proof is a vague mishmash of the motivation for Bonferroni correction and VC theory. Any book on computational learning theory will have a better one, but you can see de Wolf's thesis for an explicit application of PAC learning to Occam's razor. You might also like my post why you will never see an eight-sided snowflake.

  1. That's not true. The usual resolution is to ignore the problem.