Pumps for the “Aha!” moment

One of the great things about being a Cantabrigian is the constant, abundant feast that is available for the mind’s consumption and delight. Tonight was no exception. I was lucky enough to snag the second-to-last ticket for a discussion between Daniel Dennett and Steven Pinker at the Brattle Theater in Harvard Square. Dennett has come out with a new book, called Intuition Pumps and Other Tools for Thinking, and this was his book signing event. There wasn’t a clear thesis to the discussion — other than that of “You should buy my book to find out more!” — but the topics Pinker and Dennett touched upon are ones dear to my heart, including the nature of consciousness, the relationship between philosophy and science, and gedankenexperiments.

The discussion explored the notion of an “intuition pump”, a phrase coined by Dennett. I really like the phrase — one of the reasons why I appreciate philosophers is that they (the good ones, anyway) have such a way of finding the right words to capture the vague floating concepts we have in the back of our minds. An intuition pump, as Dennett puts it, is a communicative device used by a person to pump out another’s intuitive response of “Aha! Now I totally see!” — without the other person actually truly seeing. More concretely, intuition pumps are commonly thought experiments or analogies that elicit a visceral feeling of understanding and/or agreement with some conclusion, even though the thought experiment/analogy may be unjustified, non-rigorous, or flawed in some way. They’re shortcuts to rigorous arguments, essentially.

There are good intuition pumps, and there are bad ones. The good ones have better correspondence with truth and reality. The bad ones obscure them. An example of a good intuition pump is the analogy that the “brain is a computer”. It pumps your intuition about computers, which are familiar to you, to develop some level of (pseudo)understanding of the brain, which includes conclusions such as (1) the brain’s operation can be decomposed into the operation of many simple units, and (2) the brain is a programmable device.

An example of a bad intuition pump is Searle’s Chinese Room thought experiment, which is used as an argument against Strong AI. It’s just one of those popular concepts in philosophy that drives me nuts, because it’s so painfully obvious that it’s not even a real argument. I don’t see why people still spill endless amounts of ink over the subject! This thought experiment is trying to pump out your intuition about what “understanding” is, and clearly understanding must be more than some mindless shuffle of symbols and cards!

One thing that I had hoped to ask Dennett was this: philosophical thinking is important to the progress of science, because it encourages asking the “big picture” questions which guide the course of scientific investigation. Crucial to this is the use of intuition pumps: when exploring the fringes of knowledge, we must rely on our intuition to reason at a meta-level, which hopefully points the way to truths discovered by rigorous science. However, as science is increasingly exploring realms that are outside everyday human experience — and hence, outside of our intuitions — what is the role of the philosopher that has little to no scientific training? The brain, quantum mechanics, economic systems — the workings of these systems are foreign and inscrutable to those who aren’t practitioners in the respective fields. How can philosopher try to pump her own intuition to ask the right questions about consciousness? About what quantum mechanics tells us about the nature of reality?

“He’s white – he has lots of money!”

China trip anecdote of the day: Matt and I had finished a hike in one part of Wulingyuan, and needed to get back to the city. We were on the opposite side of the park, it was getting dark fast, and the park bus system had appeared to (prematurely) shut down. We might as well have been wearing a sign screaming “Lost Tourists”, because a gang of taxi drivers scrambled to obtain our fare. It didn’t take long to realize that they were probably our only way back to the city, a fact they were happily aware of. So, we tried to bargain them down.

“100 kuai!”, they announced.

“50!”, we countered. But they knew they had us. Matt stepped away to look for buses one last time. Meanwhile, I tried to plead with the cab vultures.

“Please, sirs, we’re just students, we can’t afford your high rates,” I offered lamely. I expected laughter, mockery, sneering — but not what came next.

One of the cab drivers grew very serious, sidled up to me, and gave me a furtive look.

“Don’t worry,” he whispered, as if letting me in on a big secret. “Your friend over there, he’s white — he has lots of money!” I was too stunned to laugh.

“Look!” He whipped out his cell phone and started showing me pictures of white people. “They’re white! See? Americans! Happy! They have lots of money!”

Well, golly, then – can’t argue with that. We took his cab for 70 kuai. Because Matt the white guy has lots of money.

Espresso and PCP theorems

I get this view on the way to work every morning. Also, there’s the supervolcano Mt. Rainier in the distance.

Greetings from the Pacific Northwest! The lack of content recently has been due to travel; for until September I’m graciously hosted by Aram Harrow and Anup Rao at the University of Washington. I’ve been here for over a month now, and let me just mention that Seattle is amazing. Particularly, I have to say that living in the district around UW (known as “U-dub” amongst the locals) has been a blast. Every day, I have the joy of taking a leisurely walk from my apartment, strolling down University Avenue (known as “The Ave”), with its cornucopia of restaurants, cafe’s, shops, and street musicians, and then I soak in the splendor of UW’s campus — unequivocably the most beautiful campus I’ve ever seen — on the way to work. Continue reading

Nalini Joshi’s story

I got this from Jennifer Oulette’s Google+ feed, and thought it too good not to share. It’s a mini-documentary about Nalini Joshi, a professor of mathematics at the University of Sydney. She talks about her childhood in the tumultuous Burma (where Aung Sang Suu Kyi was imprisoned for the last 22 years), and her lifelong love of mathematics. It’s a very charming video.

My favorite line: “I want people to feel about maths, the way they feel about music.” I love the sincerity and earnestness she has when describing the similarities between doing mathematics and engaging in art, or music. Being both a mathematician and musician myself, I completely agree.

Also, I find myself relating to her childhood story, insofar as both our parents had to move continents amidst political upheaval in their respective countries (my parents were originally from Cambodia).

A Realization

That’s it — I’ve had it. I just can’t deal with this anymore. I’m sick and tired of trying to solve my problems with polynomials all by myself. I’ve been trying to talk it through with them via “elementary” methods. I mean, the manipulations are slow and tedious but it’s what I’ve known all my life. But let’s face it: the methods are really just caveman tools from before the 19th century. I can definitely do better, with Bezout’s Theorem and Nullstellensatz out there.

I think it’s time that I learn algebraic geometry. It’ll be better for the both of us.

“Director’s Cut” for papers

You know what would be a great idea? Authors of critically acclaimed TCS papers — such as IP = PSPACE, or On the power of unique 2-prover 1-round games, or Expander flows, geometric embeddings and graph partitioning – should release a “Director’s Cut” or “Extended Edition” version of the paper! What would be in this version? Well, obviously, it should include:

    • Behind the scenes. What went into the making of this paper? What awesome technology was used to build the theorems and lemmas (that usually have omitted proofs due to page limit constraints)? Where were critical theorems and lemmas proved? If the Foo Chow Chinese Restaurant can be famous for being filmed in the movie Rush Hour, why can’t 1369 Coffee House be known for being the site where Lemma 3.7 was proved?
    • Director’s Commentary. Yes, I’ve read the paper — but I want to hear it from the author’s point of view. They should give a voice-over commentary as the paper is being narrated. They should point out things to pay attention to, subtle miscellany, cameo appearances of techniques.
    • Bloopers. Oh, we all love bloopers. There should definitely be a gag reel highlighting failed attempts at proving a false theorem, the faux pas’s in meetings, and generally the funny moments that invariably happen during mathematical research.
    • Deleted scenes. By this, I mean all the proofs of lemmas that were omitted! Also, interesting theorems and corollaries that were proved but weren’t included because they were brutally cut down by the unforgiving scythe of reviewer’s comments.

Imagine being able to browse through a library of such Director Cuts of papers in iTunes, and downloading your favorites. I’d even pay for them!

You heard it here first. Also, authors: since you never know if your paper is going to be a great hit, I suggest recording the behind the scenes footage just in case.

An argument about min-entropy

Update: Reader “miforbes” has given a simpler, direct proof in the comments!

I was reading the beautiful paper “Kakeya sets, new mergers and old extractors” by Dvir and Wigderson, and I came across the following lemma:

Lemma: Suppose Z is a distribution over a finite universe \Omega that is \epsilon-far from having min-entropy k. In other words, for every distribution X with min-entropy at least k, the statistical distance between Z and X is at least \epsilon. Then there must exist a set T \subseteq \Omega where |T| \leq 2^k such that \Pr[Z \in T] \geq \epsilon.

This was stated without proof. It’s a very reasonable lemma! Intuitively, having large min-entropy means that your distribution is very well “spread out”. Thus, being far from having such min-entropy should mean that there is a small area where your distribution has concentrated some non-trivial amount of mass.

The problem was that I could not immediately see a proof. I won’t say how long it took me to come up with one (because it’s quite embarrassing), and I’m somewhat dissatisfied with it: it’s not as simple as I thought it would be. Does anyone know of a simple proof?

My argument goes as follows. Let’s try to prove the contrapositive. Suppose for all small sets T (where by small I mean |T| \leq 2^k), \Pr[Z \in T] < \epsilon. We want to show that this must mean there is a distribution X with min-entropy at least k that is \epsilon-close to Z.

Define the “bad points” B of the distribution Z to be the x\in\Omega such that \Pr_{x \sim Z}[x] \geq 2^{-k}. These are the points that prevent Z from having min-entropy k. Note that |B| \leq 2^k, but by our assumption this means \Pr[Z \in B] \leq \epsilon, but then this means actually |B| \leq \epsilon 2^k!

Everywhere outside of B, the distribution Z looks like it has the right min-entropy. So let’s construct the following distribution X: we first set it exactly equal to Z, but then we modify it a little bit. In particular, we take the “excess” probability mass in the B region — for each point in B we take any probability mass beyond 2^{-k} — and redistribute it throughout the rest of \mathrm{Supp}(Z) - B, ensuring that no point gets more than 2^{-k} mass. If we can do this redistribution, then by definition X will have min-entropy at least k. Finally, the statistical distance between Z and X will be less than \epsilon: we’ve only moved at most \epsilon probability mass around.

All that remains is to show that we can do this redistribution. It is easy to see that there must be a set C\subseteq\Omega that is disjoint from B, and of size 2^k. By our assumption the mass assigned to C is at most \epsilon, so the average mass in C is at most \epsilon/2^k. By Markov’s inequality this means that at least (1-2\epsilon)2^k points P\subseteq C have probability mass at most 2^{-(k+1)}. We can take the excess mass from the B region (of which there is at most \epsilon) and evenly distribute it on P, without putting more than 2^{-k} total mass on any point (this requires a very mild assumption that \epsilon < 1/4).

This concludes the proof. Now I’m sure there’s a one line version of this. What is it?!

Normalizer of normalizers

I recently got nerdsniped by the following question: Let N_G(H) be the normalizer of a subgroup H in the parent group G. Is it true that N_G(N_G(H)) = N_G(H)?

It’s not true. Here’s a counterexample: Consider the dihedral group D_n where n is a multiple of 4, which consists of all the symmetries of a n sided polygon. Denote the elements of D_n as either R_i, a rotation of i vertices, or S_j, a reflection of the polygon about the jth vertex.

Let H = \{R_i \mid i \equiv 0 \mod 4\} \cup \{S_j \mid j \equiv 0 \mod 4\} \leq D_n. Observe that H is not normal in D_n, so N_{D_n}(H) \neq D_n. Let R_k be an arbitrary rotation, and S_i\in H. Then R_k S_i R_{-k} = S_{i + 2j}, which is only in H if and only if i + 2j is divisible by 4, which is only when j is even. Let S_k be an arbitrary reflection, and S_i\in H. Then S_k S_i S_k = R_{2j - i}, and again R_{2j-i}\in H if and only if j is even. Therefore, N_{D_n}(H) = \{R_i \mid i \equiv 0 \mod 2 \} \cup \{S_j \mid j \equiv 0 \mod 2 \}, which is not equal to D_n.

On the other hand, observe that, by the same analysis, N_{D_n}(N_{D_n}(H)) = D_n. Thus  N_G(N_G(H)) \neq N_G(H) for G = D_n.

Thanks to Tanay for the clarifying discussion!

The second coming of algebra, and a surprising algorithm

This term, I’m attending Madhu Sudan’s delightful course on Algebra and Computation. It’s truly a treat — the last time he taught this course was back in ’05! Algebra seems to offer some really high quality tools (perfected over hundreds of years!) to attack many problems in TCS these days. Indeed, Michael Forbes remarked to me recently that he views these days as “the second coming of algebra to theoretical computer science” (the first being the flurry of algebraic algorithms in the 70′s). Well, if that’s the case, then all ye non-believers, you had better study up!

Madhu is also introducing the web 2.0, social media paradigm to the course, by maintaining a class blog. Add it to your feeds! So far, Madhu has solicited us for “an algorithm that surprises us most.” Check out what people find surprising, and why. As for myself, after agonizing for quite some time I eventually settled on the Moser-Tardos algorithm for the Constructive Lovász Local Lemma. My explanation/ramblings are reproduced below:

The Constructive Lovász Local Lemma, and Why I Like It

One of the most potent tools in combinatorics and theoretical computer science is the probabilistic method, where objects can be proven to exist “simply” by arguing that they probably exist. Some prominent applications of the probabilistic method include Shannon’s Noisy Coding Theorem, pseudorandomness, and circuit complexity (e.g. there exist functions with high circuit complexity). One of the main techniques of the probabilistic method works as follows: to show that an object with property P exists, it suffices to show that P is equivalent to a set of “bad” events A_1,\ldots,A_n not happening; and if an object is sampled from some appropriate distribution at random, and the probability of none of A_1,\ldots,A_n happening is strictly positive — an object with property P exists. Continue reading