Hopefully this will help those who are not conversant with probability and statistics.
A fair coin has a 50/50 chance of heads or tails on every flip, so over a large number of flips about half should come up heads.
If you flipped the coin three times and got heads each time, what are the chances that a head would appear if you flipped it again? It is still 50%.
The coin holds no memory of the previous tosses, and each toss has a 50/50 chance of coming up heads.
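For anyone who wants to check this on a computer, here is a minimal simulation sketch. The function name, the number of simulated sequences and the seed are my own choices for illustration. It simulates many runs of four fair flips, keeps only the runs that started with three heads, and reports how often the fourth flip was also heads.

```python
import random

def heads_after_three_heads(num_sequences=200_000, seed=1):
    """Estimate P(4th flip is heads | first three flips were heads)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    runs_with_three_heads = 0
    fourth_flip_heads = 0
    for _ in range(num_sequences):
        # True means heads; each flip is an independent 50/50 event.
        flips = [rng.random() < 0.5 for _ in range(4)]
        if all(flips[:3]):
            runs_with_three_heads += 1
            fourth_flip_heads += flips[3]
    return fourth_flip_heads / runs_with_three_heads

print(heads_after_three_heads())  # should land close to 0.5
```

The estimate will not be exactly 0.5 on any given run, but it should stay close to it no matter how many heads came before.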
Now, let's do an experiment. Suppose we have an LRL with an experienced operator, an observer/recorder, an assistant,
two identical boxes, and two targets: a gold coin and a junk target. A wall separates the LRL operator from where the boxes are.
The assistant goes behind the wall where the two boxes and the two targets are, places one target in each box,
and then leaves the area. The observer and the LRL operator then enter the area where the boxes are, and the LRL operator, using
his trusty LRL, selects the box he feels has the gold. The box is opened by the observer/recorder and the result is written down.
A short digression. The above is a double blind test. Neither the observer nor the LRL operator knows in advance which box has
the gold, and the assistant who put the targets in the boxes has left the area and can't communicate his actions to the other two.
Getting back to the test, pure chance predicts that, over a large number of tests, the LRL operator would guess correctly 50%
of the time. So we need to conduct the test many times to see if there is a statistically significant difference between the number of
successful outcomes and the number that pure chance predicts.
Let's say we do the test four times. There are 16 possible outcomes. Using "S" for success and "F" for failure, they are:
SSSS
SSSF
SSFS
SSFF
SFSS
SFSF
SFFS
SFFF
FSSS
FSSF
FSFS
FSFF
FFSS
FFSF
FFFS
FFFF
Note from the above list that, for a sequence of 4 tests, the probability of getting four successes by pure chance is 1/16 or 6.25%.
If instead we allowed no more than 1 failure (SSSS plus the four outcomes with exactly one F), then the probability would be 5/16 or 31.25%.
Of course, if we used sequences with more tests, the probability of getting an all-success (or nearly all-success) sequence from purely
random chance becomes even smaller.
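The counting above can be checked with a short sketch. The function name and its parameters are hypothetical, chosen just for this illustration. It lists every S/F sequence of a given length, counts those with no more than the allowed number of failures, and returns the exact fraction.

```python
from fractions import Fraction
from itertools import product

def outcome_probability(num_tests, max_failures):
    """Chance of at most max_failures failures when every test is a 50/50 guess."""
    # product("SF", repeat=4) yields SSSS, SSSF, SSFS, ... in the same order as the list above.
    outcomes = list(product("SF", repeat=num_tests))
    favorable = [o for o in outcomes if o.count("F") <= max_failures]
    return Fraction(len(favorable), len(outcomes))

print(outcome_probability(4, 0))   # 1/16
print(outcome_probability(4, 1))   # 5/16
print(outcome_probability(10, 0))  # 1/1024
```

The last line shows the effect of a longer sequence: ten straight successes by pure chance happens only about once in a thousand sequences.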
However, if we run this sequence of tests only once, then we can't rule out that an accidental quirk may have produced
the results obtained. To eliminate, or at least quantify, the probability of a quirk producing the results, we would need to run
the experiment multiple times, each time recording the results obtained.
If enough of these 4-test sequences are run (a common rule of thumb from sampling theory and the Central Limit Theorem is that 30 or more
sequences should suffice), we can estimate the operator's success rate within a suitably narrow confidence
interval and determine whether the LRL produced results that differ from random chance by a statistically significant amount.
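One way to put a number on "statistically significant" is sketched below. It assumes every individual test is an independent 50/50 guess under pure chance, so 30 sequences of 4 tests can be pooled into 120 tests. The function name and the count of 75 successes are made up for illustration and are not real results. The sketch adds up the exact binomial probabilities of doing at least that well by chance.

```python
from math import comb

def chance_tail_probability(successes, total_tests, p=0.5):
    """Probability of getting at least `successes` correct picks by chance alone."""
    return sum(
        comb(total_tests, k) * p**k * (1 - p) ** (total_tests - k)
        for k in range(successes, total_tests + 1)
    )

# Hypothetical example: 30 sequences of 4 tests each, 75 correct picks in total.
total_tests = 30 * 4
print(chance_tail_probability(75, total_tests))
```

If the printed probability falls below a threshold chosen before the experiment (5% is a common choice), the results would be hard to explain by guessing alone. If it does not, the results are consistent with pure chance.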