Friday, May 12, 2006

Wiretapping vs. Futures Market

There is some irony in the most recent revelation about warrantless NSA wiretapping of the phone calls of tens of millions of Americans. Following 9/11, there was a flurry of ideas about how to improve the predictive capability of the intelligence community to help prevent the feared next wave of attacks. The programs we're hearing of today (the massive data-mining of communications, such as this new NSA program, the so-called Terrorist Surveillance Program, the MATRIX database, and probably others) all had their genesis in the Total Information Awareness project at DARPA. Headed up by Adm. John Poindexter, the program was intended to amass vast amounts of data and use sophisticated mining techniques to search for hints of terrorist activity. Operating on the assumption that the intelligence failures of 9/11 were due to insufficient information, its designers set out to gather all the information that exists. With perfect information, the world becomes deterministic. However, due to its Orwellian overtones (and Poindexter's very creepy logo for the Information Awareness Office), the program was (supposedly) cancelled by Congress.

Around the same time, a very different idea was proposed, called the Policy Analysis Market. The PAM was based on the idea that the intelligence failures were not due to lack of information, but to the inability to synthesize the information and analysis that already existed. The PAM was a prediction market, basically a futures market where traders bet on the outcomes of future events. In the PAM, players, who could be anyone in the world who wanted to sign up, could buy and sell contracts on predictions of the future. The price of a contract then indicates the players' aggregate belief about the probability of an outcome. But this program, too, was cancelled after public and congressional outcry, because of aversion to the idea that people who bet on a terrorist attack could profit from the deaths of others.

As Bruce Schneier wrote in Wired, the TIA may have been cancelled, but it lives on in the programs we're hearing about now. There does not, however, appear to be any successor to the PAM. It's unfortunate and ironic that the administration chose to continue the TIA, because it is almost certain to fail to be useful while trampling civil liberties, whereas the PAM would probably be much more accurate and have the side benefit of being constitutional. As Schneier says,
Finding terrorism plots is not a problem that lends itself to data mining. It's a needle-in-a-haystack problem, and throwing more hay on the pile doesn't make that problem any easier.
While it seems logical that more information is better and all information is best, reality is much different. Maybe if you were omniscient and had all the information available in the universe, then the world might actually be deterministic. But for mere mortals, it will always be a game of probability and signal detection. You will always have incomplete information and will always have some probability of both missed detections and false positives. In this case, more information can actually be worse, due to what Schneier refers to as the "base rate fallacy":

Let's look at some numbers. We'll be optimistic -- we'll assume the system has a one in 100 false-positive rate (99 percent accurate), and a one in 1,000 false-negative rate (99.9 percent accurate). Assume 1 trillion possible indicators to sift through: that's about 10 events -- e-mails, phone calls, purchases, web destinations, whatever -- per person in the United States per day. Also assume that 10 of them are actually terrorists plotting.

This unrealistically accurate system will generate 1 billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999 percent and you're still chasing 2,750 false alarms per day -- but that will inevitably raise your false negatives, and you're going to miss some of those 10 real plots.
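The arithmetic behind those figures can be reproduced in a few lines. This is only a rough sketch: the inputs are the ones given in the quote, the constant and function names are made up for illustration, and it assumes the 1 trillion indicators are a yearly total (10 events per person per day works out to about that).

```python
# Inputs taken from the quoted passage; treating them as yearly totals is an assumption.
INDICATORS_PER_YEAR = 1_000_000_000_000  # 1 trillion events to sift through
REAL_PLOTS_PER_YEAR = 10                 # events that are actual plots
FALSE_NEGATIVE_RATE = 1 / 1_000          # real plots the system misses
DAYS_PER_YEAR = 365


def false_alarm_stats(false_positive_rate):
    """Return false-alarm figures for a given false-positive rate."""
    innocent_events = INDICATORS_PER_YEAR - REAL_PLOTS_PER_YEAR
    false_alarms = innocent_events * false_positive_rate
    plots_found = REAL_PLOTS_PER_YEAR * (1 - FALSE_NEGATIVE_RATE)
    return {
        "false_alarms_per_day": false_alarms / DAYS_PER_YEAR,
        "false_alarms_per_plot_found": false_alarms / plots_found,
        # Probability that any single alarm points at a real plot.
        "chance_alarm_is_real": plots_found / (plots_found + false_alarms),
    }


optimistic = false_alarm_stats(1 / 100)       # "99 percent accurate"
absurd = false_alarm_stats(1 / 1_000_000)     # "99.9999 percent accurate"

for label, stats in (("1 in 100", optimistic), ("1 in 1,000,000", absurd)):
    print(label)
    for name, value in stats.items():
        print(f"  {name}: {value:,.10g}")
```

With these inputs the sketch gives roughly 27.4 million false alarms per day and about a billion false alarms per plot found in the optimistic case, and roughly 2,740 false alarms per day in the "absurd" case, in line with the rounded numbers in the quote. The chance that any one alarm is real is on the order of one in a billion, which is the base rate problem in a nutshell.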

There is a high cost to false positives: they waste investigators' time and also intrude on innocent people's constitutional rights. Data mining can work in certain cases, but it does not seem appropriate for uncovering terrorist activity. Prediction markets, on the other hand, are much more efficient at aggregating the opinions of a large number of people. An arbitrary number of people can trade on a particular question and many may disagree, but the simple market price of the contract represents the probability of that event occurring according to the entire market. The more people in the market, the better.
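To make the "price as probability" idea concrete, here is a minimal sketch. It assumes a simple binary contract that pays a fixed amount if the event happens and nothing otherwise, and it ignores fees, interest, and risk aversion. The function names and the example numbers are hypothetical, not details of the PAM itself.

```python
def implied_probability(price, payout=1.0):
    """Read a binary contract's price as the market's probability estimate."""
    if not 0 <= price <= payout:
        raise ValueError("price must lie between 0 and the payout")
    return price / payout


def expected_profit_per_contract(price, belief, payout=1.0):
    """Expected profit from buying one contract, given the buyer's own belief."""
    return belief * payout - price


# Hypothetical example: a contract paying 1.00 trades at 0.30.
price = 0.30
market_view = implied_probability(price)

# A trader who privately thinks the event is 45% likely expects to gain by buying.
edge = expected_profit_per_contract(price, belief=0.45)

print(f"market-implied probability: {market_view:.0%}")
print(f"expected profit per contract for the 45% believer: {edge:.2f}")
```

A trader who thinks the price is too low expects to profit by buying, and that buying pushes the price up toward their estimate. Under these simplified assumptions, that is how scattered opinions end up folded into a single number.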

When making predictions, even having more information is not always beneficial. A study by Daniel G. Goldstein and Gerd Gigerenzer found that, when inferring answers to uncertain questions, knowing a moderate amount of information led to more correct answers than knowing very little or very much. George Friedman reached a similar conclusion about the U.S. intelligence community, writing:
The American IC is much too big. It has way too many resources. It is awash in information that is not converted into intelligence that is delivered to its customers.
Huge organizations will lose information in the shuffle. The bigger they are, the more they lose.
One wonders if the vast intelligence apparatus should even exist at all. Instead, it could be split up and diversified into much smaller groups of analysts, linked by a prediction market to aggregate their findings. With all the information that's openly available on the internet, you could probably put together a very capable shop of analysts on a very small budget, at least for grand-scale, strategic intelligence. Of course, you need other types of intelligence for other uses, in particular satellite and signals intel for the faster-paced tactical world. But for the big picture, you would probably get quite a bang for the buck by gathering up a small group of smart, educated, experienced, and motivated people, giving them an internet connection and subscriptions to all the information providers they want, and letting them run with it.


At 10:32 PM, Blogger Mike said...

This comment is somewhat off-topic, but related to the news that just came out in the USA Today reports. I would say that we are already halfway down the slippery slope. According to the NY Times, the basic call info collected by the NSA included: "numbers called; time, date and direction of calls; and other details, but not the words spoken... Customers' names and addresses are not included in the companies' call records, though they could be cross-referenced to obtain personal data." Essentially, this is the same info your local police get when they pull your call records ("LUDs") during an investigation (minus the publicly available name-to-address correlation records). However, the local police can't do this willy-nilly; they must be actively investigating a crime and must have probable cause to pull your records legally. I would be willing to bet that the NSA's request fails both tests for collecting such records on the general public.


