### The Asimov data set

Dec. 11th, 2011 03:54 pm **maxwells_daemon**

I've been a fan of Isaac Asimov since I was a teenager. I read all his short stories and novels I could get my hands on, and it was probably his essays on science that did the most to get me interested in physics. So I have found it immensely cool that his name is being bandied about in so many discussions in the ATLAS combined Higgs search group.

It is all down to a paper by some of my colleagues (two of whom I am working with now, and another who helped introduce me to another piece of statistics I am working on). They dubbed a representative data set, used to calculate expected sensitivities, the "Asimov data set", and cited Asimov's short story "Franchise" (I should probably add something to Wikipedia). I remember the story well: it's about someone chosen by Multivac (a global supercomputer) as the sole voter, because his views are representative of the whole population.

Since it has come up in so many discussions over the last year, I've followed the evolution of the term: "Asimov dataset", "Asimov likelihood", "Asimov method", "Asimov distribution", or just "the Asimov". I get a little thrill each time I hear a new one (I know, I'm a real fanboy).

Despite this, what I've been doing in the Higgs group is to cross-check the asymptotic results, which use the Asimov method, using more traditional methods (known as "ensemble pseudo-experiments", or more informally as "toy Monte Carlo"). They don't rely on assumptions like large statistics (as did Multivac, or Hari Seldon, for that matter), but do require a large amount of computer time to generate many random pseudo-experiments. I developed a way to run these on the Grid. With hundreds of thousands of machines round the world, I have used 8 years of CPU time to generate 8 million toys in a few days.
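To make the comparison concrete, here is a minimal sketch of both approaches for a toy problem. It is not the ATLAS procedure: it assumes a single-bin counting experiment with an exactly known background `b` and an expected signal `s`, with no systematic uncertainties, and all function names and numbers are made up for illustration. The "Asimov" estimate evaluates the discovery test statistic once, on the representative data set `n = s + b`; the toy estimate throws many random pseudo-experiments and takes the median.

```python
import math
import random
import statistics


def q0(n, b):
    """Discovery test statistic for a single-bin counting experiment
    with known background b (illustrative model, no nuisance parameters)."""
    if n <= b:
        return 0.0  # a deficit is not evidence for a signal
    return 2.0 * (n * math.log(n / b) - (n - b))


def asimov_significance(s, b):
    """Expected significance from the Asimov data set: the single
    representative 'observation' n = s + b, with no random fluctuation."""
    return math.sqrt(q0(s + b, b))


def poisson(lam, rng):
    """Draw a Poisson-distributed count with Knuth's multiplication method.
    Fine for modest means like the ones used here; slow for large ones."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1


def median_toy_significance(s, b, n_toys, rng):
    """Expected significance from an ensemble of pseudo-experiments:
    generate n_toys random counts under signal+background, compute the
    significance of each, and take the median."""
    z_values = [math.sqrt(q0(poisson(s + b, rng), b)) for _ in range(n_toys)]
    return statistics.median(z_values)


if __name__ == "__main__":
    s, b = 10.0, 50.0          # hypothetical signal and background yields
    rng = random.Random(2011)  # fixed seed so the run is repeatable
    print("Asimov:", asimov_significance(s, b))
    print("Toys:  ", median_toy_significance(s, b, 20000, rng))
```

With these made-up yields the Asimov estimate comes out at about 1.37 standard deviations, and the toy median should land close to it. The point of the sketch is the cost: the Asimov number needs one evaluation, while the toy number needs thousands even in this trivial case, and a real analysis has to refit a full likelihood for every toy.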

This sort of thing went into the results that generated some excitement in the summer (e.g. pp. 14-16 of the EPS conference presentation). I'm not allowed to say what we will show on Tuesday, but it should be worth watching.
