summaryrefslogtreecommitdiff
path: root/tests/chi-squared-tests.txt
blob: fa64ed9fb1ea57586e311bc33bd11f1ff865dc60 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
About the trandom_deviate.c and t[ne]random_chisq.c

trandom_deviate tests the functions in random_deviate.c, which provides
support routines for mpfr_[ne]random.  The tests

  * improve the code coverage of the tests by constructing random
    deviates with special properties: (a) the initial part of the
    fraction matches that of another random deviate; (b) with the
    leading bit in various positions.

  * that the values provided by mpfr_random_deviate_value at a given
    precision do not change if additional random words are added to the
    deviate;

  * that the values provided by mpfr_random_deviate_value at a given
    precision but different rounding modes are consistent.

t[ne]random_chisq.c provide chi-squared tests for mpfr_[ne]random.  Note
that there's a fair amount of code duplicated in these files, but with
enough differences to make putting the code into a separate file a
little messy.

Two types of tests are performed

  * using equal width bins at moderate precision (prec = 64) treating
    the deviates are continuous quantities;

  * assigning a bin to each possible value for the random deviates
    (within a range) at low precisions (prec = 2, 3, 4).

Only abnormally large values of chi-squared are flagged, since
abnormally low values can only result from

  * the expected variation in chi-squared results;

  * a correlation between the random deviates; but this can only happen
    if gmp's random numbers are correlated (in a rather pernicious way);
    the tests on gmp's random number generator should catch this.

The chi-squared tests have to accomplish contradictory goals:

  * reliably detect when there's a problem with the routines under test
    (the probability of a false negative is low);

  * rarely report a problem when there isn't one (the probability of a
    false positive is low);

  * complete in about 1 second.

Some attributes of the testing come to our aid:

  * the tests are executed many times on many different platforms;

  * typical coding errors in the routines for normal and exponential
    sampling lead to distributions which are far from the correct one;

  * carrying out a longer (by a factor of N) test on a bad routine
    multiplies chi-squared by about N, a result which has a vanishingly
    small probability for a good routine.

The strategy for testing is

  * use 100000 samples (with this number, the test of mpfr_nrandom takes
    about 0.3 sec); compute Q, the probability that chi-square exceeds
    the value obtained.  If Q > 0.01, the test passes; if Q < 1.0e-9,
    the test fails; otherwise:

  * repeat the test with 10 times the number of samples and compute Q
    again.  If Q > 0.001, the test passes; if Q < 1.0e-9, the test
    fails; otherwise:

  * repeat the test with 10 times the number of samples and compute Q
    again.  If Q > 0.0001, the test passes; if Q < 1.0e-9, the test
    fails; otherwise:

  * the test fails.

If the mpfr_[ne]random is OK, then the probability that the test fails
(a false positive) is about 1.0e-9.  If there's a bad error in
mpfr_[ne]random, failure will probably occur at one of the stages.  It
there's an error in mpfr_[ne]random which results in a slight deviation
from the true distribution (which I don't think is likely given the
nature of the algorithms), then there a reasonable chance that the
problem will be discovered "eventually".  Because "eventually" is not
very satisfactory, it is important that the tests be run with a large
number (e.g., 10^9) of samples whenever changes are made to
mpfr_[ne]random.

I've left the t[ne]random tests as is, even though t[ne]random_chisq are
performing the same tests in a more comprehensive fashion.  There's some
utility in having a set of really simple tests.

  --Charles Karney <charles.karney@sri.com>