Why Is Getting A Random Number So Hard?

Why Is Getting A Random Number So Hard?

Harnessing Entropy To Give Computers Free Will

Why Do We Have To Worry About Random Numbers?

On May 19th, 1984, in a filming studio in Los Angeles, California, an unassuming ice cream truck driver from Ohio named Michael Larson stepped up to be a contestant on the CBS game show TV series Press Your Luck.

The game involved a prize board, which flashed random reward values, which added to their cash total, and a spoiler icon embodied as a “whammy,” which caused contestants to lose, all blinking in a flashing display on the stage.

Contestants had to time a button press just right to hopefully land on a cash prize amount without hitting a “whammy.” A nervy contestant who had hit a dollar amount without hitting the spoiler could elect to have another run on the board, pressing their luck, as per the title.

But on that day, the game show’s host and staff would meet their match. Michael Larson had freakish luck at the board, winning prize total after prize total. Typical contestants filled out a half-hour show plus commercials and went home with a few thousand dollars at best.

Larson ran the show into overtime, requiring them to continue a single game until the next episode for the first time in the show’s history. Larson blew past the record amount ever won on the show, and still, he won more. He passed $50,000, $100,000, and eventually settled on $110,237.

It turned out that Press Your Luck’s random prize board wasn’t so random at all.

Larson had been watching the show at home and noticed a pattern in the prize board.

He taped episodes on his VCR and played them back, noting that there were a set of five patterns the board always blinked through and that there was a can’t-lose strategy for beating it simply by timing the button press.

In short, Larson had found an exploit, a “hack,” and after months of training and practicing, eagerly hopped a plane to appear on the show. They had to let him keep the prize money. There was no rule that said you couldn’t memorize a pattern.

In modern computing, the realm of cyber-security is likewise a fragile practice based on the factor of being unguessable. Many security protocols require random numbers to create keys that will be difficult for an attacker to guess.

Any security protocol requiring initiation values, digital signatures, authentications, or password generation relies on being unpredictable, and hence unguessable. There’s a document out there, RFC 4086 (1), detailing “Randomness Requirements For Security.”

In fact, randomness in cryptography has been important since WWII. The Japanese in that conflict relied on a cipher machine that used weak randomness algorithms and hence was easily cracked.

So you say, fine then, have some random numbers. You play solitaire all the time on your computer and it shuffles up a different game all the time, so what’s the big deal?

Well, it turns out that this is actually a very big deal – in fact, with time and patience, you could theoretically defeat your desktop solitaire game because it probably uses pseudo-random numbers instead of truly random numbers, just as surely as Michael Larson found patterns in a game board.

In fact, most of the games you play don’t bother generating truly random values at all. That’s because getting a truly random value out of a computer is fiendishly difficult, and not worth the trouble for idle amusement.

The Different Flavors Of Random

In the early days, scientists needing random values for simulating any real-world event did it the old fashioned way: With a set of dice, a deck of cards, or other methods familiar to any casino game. That was slow.

So in 1947, RAND Corporation published a book called “A Million Random Digits with 100,000 Normal Deviates,” compiled from a roulette wheel wired to a computer; users could close their eyes and fan through the pages, and get something good enough. It hardly made for scintillating reading, but at the time was celebrated as an important breakthrough.

But of course, the age of computation required more sophisticated randomness generators still, because let’s face it, nothing is so predictable as a set-in-type book.

The problem with this is that computers are inherently deterministic; at their base level, a bit is either switched on or off, “1” or “0,” and there is no way to tell a computer to “just take a guess.”

The next-best solution is to use a pseudo-random method, which starts with a seed value that is then plugged into a recursive algorithm to produce many values off the original value.

But how do you get a seed value in the first place?

There are a number of semi-entropic methods we use:

  • The computer’s on-board clock is measured in microseconds, so a user clicking a button or hitting a key is unlikely to be at a predetermined exact time.
  • Mouse movements. Recording the pattern of the exact X-and-Y position of the mouse pointer at a given time works as a source of pseudo-entropy, even if weak.
  • Ping times. Timing the exact microsecond a pulse of information traveling over the Internet takes to make a round-trip is also a weak variable.
  • On-board sensors. Measuring the current temperature of the processor gives a narrow range of values.

For most gaming needs, one of the above sources of limited entropy plus a host of pseudo-random algorithms with colorful names like “Mersenne Twister” and “Blum-Blum-Shub” is good enough. Yet these methods are proven insecure.

The problem isn’t just that each of these entropy sources produces a narrow range of values, it’s that generating any sort of pseudo-random values off a seed becomes nearly useless once the algorithm is known.

Mirroring the Press Your Luck story, a group of enthusiastic hackers once reverse-engineered an online poker website in 2001. The site’s method of simulating a deck shuffle used just such a method, pulling a time-stamp off a clock and running some code over it, and hey-presto, the hackers learned how to cheat the game.

They’re part of the reason modern gambling commissions in various countries oversee gambling operations to ensure high-quality randomness. For a more potent example, in the 1990s the Netscape Navigator web browser was hacked by reverse-engineering the encryption process – it, too, used a timestamp for a random value seed.

So What’s The Strongest Entropy Method In The World?

That’s actually a great question. One would want a tiny roulette wheel embedded in the circuit board. In fact, there are specialized hardware devices you can buy which use tiny little quantum events for just such a purpose.

The problem is spinning that wheel fast enough; getting just one random bit means you have a range of two values. Getting a whole block of random values takes time. And while there are, ahem, “cryptographically secure pseudorandom number generators,” we’re still dealing with pseudo-random events seeding algorithms, we’re just doing it faster with fancier mathematics trying to stay ahead of the Michael Larsons of this world.

Even Linux, widely regarded as being near the top security-wise, doesn’t have a perfect random number generation method either. Here’s a good read analyzing the process, which boils down to harvesting entropy from input events and hardware activity and feeding it through a software system.

This also illuminates another problem: harvesting entropy from a mouse and keyboard only works if you have a human there interacting with it; this isn’t practical for a bank of servers that need to encrypt high volumes of data.

But try it now at your Linux command line: cat /dev/urandom | head -c 32 | base64

One bright comp-sci professor has tried to make headway in this problem: Mads Haahr in Trinity College, Dublin, Ireland, runs random.org, a website which supplies pure, high-grade entropy via using radio receivers to pick up atmospheric noise.

You can use any number of services for free right from the website, even using automated methods to fetch some entropy for your algorithm. But only so much at a time (or risk ending up on its list of banned web hosts)! Because the entropy has to collect over time, and one user can’t just hog all the numbers for themselves.

Yes, this has been a problem before they implemented the quota. Somewhere, there’s a cricket chirping who is affecting the atmosphere enough to modulate an air current passing by a radio receiver, and we can’t encrypt too much at once because the cricket isn’t chirping fast enough!

If tapping random.org is not an option, it’s possible to buy a Modular Entropy Multiplier from any number of retailers, which is a USB pluggable gizmo generating entropy off thermal noise (temperature measurement).

That’s The Best We Get?

True perfect entropy, generated fast and cheaply, is still an open problem. The most perfect entropy is the collective will of human imagination, however, so if you’re keen to set up your own ad-hoc homemade entropy, there’s a number of creative ways to do it for fun and minor profit:

  • Set up your own radio receiver, tune it between stations so it’s getting static, and dangle a microphone in front of it connected to your device, reading the digital output from the microphone.
  • Buy an ant farm, the kind with a glass window, and point a camera at it, reading digital data from the image. May not prove effective if the ants are sluggish.
  • Pull data from an online source. Perhaps a public webcam, or a website which records results from a huge volume of traffic. The top-scoring post on Reddit.com or the top of your Twitter feed might change fast enough if you follow enough people.

But wait a minute. We’re worrying too much about this. 99.99% quality entropy is actually good enough for most purposes, at least for now. The trouble is, “for now” is a very shaky word in computer science.