
NIST 800-22 Statistical Tests Overview

QubiTO Cryptography Group
RNG QRNG NIST Security

Introduction
#

Random number generation is a critical step in many security-related tasks such as cryptography, network security, and other cybersecurity applications. Quantum random number generation is a method in which a quantum source is used to generate a random sequence of numbers or bits.

The random number or bit sequence generated by the RNG is commonly used as a key for cryptographic methods. A generator’s output must be unpredictable to ensure security; all elements of the sequence must be generated independently of each other, and the distribution of 0s and 1s must be approximately equal within tolerance.

The NIST Special Publication 800-22 provides a guide for testing for bias, correlations, and patterns in RNG output. It specifies 15 statistical tests that check the generator's output against statistical expectations. However, it does not, by itself, establish the quality, security, or randomness of the generator. Concluding that a generator is unfit for use on the basis of a statistical test alone is a mistake; equally, concluding that a generator is suitable for an application simply because the statistical tests pass is a mistake. Additional testing must be done to ensure the generator suits its intended application, in the same spirit that cryptographic schemes should be evaluated in real environments to establish their security.

A General Overview Of Random Number Generation
#

As specified before, a random number generator is a machine, programme, or system that generates a random sequence of numbers or bits. Their most prominent application is in cryptographic schemes, where the generated sequence serves as a key. A good-quality generator must produce output with sound statistical properties, generate each part of the sequence independently of the others, be unpredictable, and combine a non-deterministic source with a processing function. The use of non-deterministic sources is especially important, as deterministic sources may be reverse-engineered to recover the sequence, which would compromise the system's security. The statistical tests to be outlined are useful for checking whether a generated sequence is statistically random. This is of particular interest, as a source that appears physically random may turn out to be non-random when tested statistically.

Pseudo Random Number Generation
#

When generating a long sequence directly would be too time-consuming, a pseudo-random number generator may be preferable. A pseudo-random number generator (PRNG) takes a short sequence produced by an RNG as input, called a “seed”, and generates one or more long pseudorandom sequences by applying transformations to that seed. The NIST publication notes that PRNG output may appear more statistically random than raw RNG output: the transformations applied to the seed often smooth out biases and yield better statistical properties. PRNGs remain deterministic and are not truly random, but they can be faster and better suited to applications that require long sequences, since transforming a seed usually takes less time than generating the full sequence from the RNG itself.
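To make the seed-and-transform idea concrete, here is a deliberately toy sketch in Python (the function name `lcg` and its defaults are illustrative; the constants are the well-known Numerical Recipes LCG parameters). It is emphatically not cryptographically secure: an LCG's internal state can be recovered from a handful of outputs.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Toy linear congruential PRNG: expand a short seed into n
    pseudorandom 32-bit values via x -> (a*x + c) mod m.
    Deterministic, hence predictable and unsuitable for cryptography."""
    out, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

# the same seed always reproduces the same "random" stream
stream_a = lcg(42, 8)
stream_b = lcg(42, 8)
```

The determinism shown here is exactly why the seed itself must come from a genuinely unpredictable (ideally quantum) entropy source.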

Quantum Random Number Generation
#

Quantum random number generation (QRNG) is a process in which a non-deterministic quantum source is used to generate a random sequence. This method provides fundamental unpredictability and an entropy source well suited to generating seeds, as quantum effects and states are inherently unpredictable, as outlined in quantum physics. This adds a higher level of randomness to the generated seeds. These seeds may be generated with a quantum computer (QC). However, as QCs are currently not fault tolerant, their outputs contain errors, which, depending on the tolerance specified, may cause the statistical tests to fail, since the tests are applied to the output of the QC rather than to the underlying quantum source. These errors may be mitigated through quantum error correction and other methods.

The Purpose Of The Statistical Tests Regarding RNG Statistical Variations
#

The statistical tests analyse the statistical characteristics of the generator's outputs in order to assess their quality. The following descriptions outline what each test evaluates.

Frequency (Monobit) Test: The purpose of this test is to assess the balance of ones and zeros throughout the generated bit sequence. A perfectly random generator would be expected to produce equal numbers of zeros and ones, given a long enough sequence; the fraction of each should be near ½. All subsequent tests are conditioned on having passed this first basic test.
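As a minimal Python sketch of this test (the function name is my own), following the algorithm and the small worked example given in SP 800-22: the bits are mapped to ±1, summed, and the normalized absolute sum is converted to a P-value with the complementary error function.

```python
import math

def monobit_p_value(bits):
    """Frequency (Monobit) test: map bits to +/-1, sum them, and
    convert the normalized absolute sum into a P-value via the
    complementary error function."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))   # pass if > alpha (e.g. 0.01)

# SP 800-22's worked example: 1011010101 gives P-value ≈ 0.527089
example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
```

A P-value this far above 0.01 means the tiny example sequence shows no evidence of bias, though a 10-bit sequence is far below the lengths NIST recommends in practice.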

Frequency Test within a Block: Similarly to the Monobit test, this test evaluates the distribution of ones and zeros within M-bit blocks. Ideally, each M-bit block would contain approximately M/2 ones. If the block size M=1, the test degenerates into the Monobit test.
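A sketch of the block statistic in Python (the helper name is mine). The test computes a chi-square value from the per-block proportions of ones; turning that into a P-value additionally requires the regularized upper incomplete gamma function, igamc, which I only reference in a comment here.

```python
def block_frequency_chi2(bits, M):
    """Chi-square statistic of the Frequency Test within a Block:
    4*M * sum_i (pi_i - 1/2)^2 over the N complete M-bit blocks,
    where pi_i is the proportion of ones in block i (the incomplete
    tail of the sequence is discarded, as in SP 800-22)."""
    N = len(bits) // M
    chi2 = sum((sum(bits[i * M:(i + 1) * M]) / M - 0.5) ** 2
               for i in range(N))
    # P-value = igamc(N/2, stat/2), e.g. via scipy.special.gammaincc
    return 4 * M * chi2

# SP 800-22's worked example: 0110011010 with M = 3 gives chi2 = 1.0
bits = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0]
```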

Runs Test: This test evaluates the total number of runs in the sequence, where a run is an uninterrupted sequence of identical bits. A run of length k consists of exactly k identical bits and is bounded before and after by a bit of the opposite value. A random sequence will contain runs of both ones and zeros, but the test’s main purpose is to determine whether the oscillation between zeros and ones is too fast or too slow, either of which would indicate bias.
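A compact Python sketch (function name mine), following the run-counting rule and the worked example in SP 800-22: count the total number of runs, then compare it with the count expected for a random sequence with the same proportion of ones.

```python
import math

def runs_p_value(bits):
    """Runs test: V_n is the total number of runs (each bit change
    starts a new run). The P-value compares V_n with its expected
    value 2*n*pi*(1 - pi), where pi is the proportion of ones."""
    n = len(bits)
    pi = sum(bits) / n
    v_n = 1 + sum(bits[i] != bits[i + 1] for i in range(n - 1))
    num = abs(v_n - 2 * n * pi * (1 - pi))
    den = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(num / den)

# SP 800-22's worked example: 1001101011 gives P-value ≈ 0.147232
bits = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
```

Note that the full NIST procedure first checks that the sequence passed the Monobit-style frequency prerequisite before running this test; that guard is omitted in this sketch.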

Test for the Longest Run of Ones in a Block: This test evaluates the longest run of ones within M-bit blocks. The purpose of this test is to determine whether the length of the longest run of ones within the tested sequence is consistent with the length of the longest run of ones that would be expected in a random sequence. Intuitively, the irregularity in the longest run of ones would also indicate an irregularity in the length of the longest run of zeros. This results in only a singular test being necessary.

Binary Matrix Rank Test: The focus of the test is the rank of disjoint sub-matrices of the entire sequence. This test’s purpose is to check for linear dependence among fixed-length substrings of the original sequence.

Discrete Fourier Transform (Spectral) Test: This test focuses on the peak heights in the Discrete Fourier Transform of the sequence. The test detects periodic features in the sequence, which would indicate a deviation from the assumption of randomness. The intention is to detect whether the number of peaks exceeding the 95% threshold is significantly different from 5%.

Non-overlapping Template Matching Test: The focus of this test is the number of occurrences of pre-specified target strings. This test detects generators that produce too many occurrences of a given non-periodic (aperiodic) pattern. For this test and for the Overlapping Template Matching test of Section 2.8 within the suite, an m-bit window is used to search for a specific m-bit pattern. If the pattern is not found, the window slides one bit position. If the pattern is found, the window is reset to the bit after the found pattern, and the search resumes.

Overlapping Template Matching Test: The Overlapping Template Matching test checks the number of appearances of pre-specified target strings. This test, alongside the Non-overlapping Template Matching test, uses an m-bit window to search for the given m-bit pattern. If the pattern is not found, the window slides one bit position. When the pattern is found, the window slides only one bit before resuming the search.
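The two window-sliding disciplines described above can be sketched in a few lines of Python (the helper `count_template` is illustrative; the real NIST tests go on to compare these counts against theoretical means and variances):

```python
def count_template(bits, template, overlapping):
    """Slide an m-bit window over the sequence and count matches of
    `template`. On a hit, the non-overlapping variant jumps past the
    matched pattern, while the overlapping variant advances one bit."""
    m = len(template)
    count, i = 0, 0
    while i <= len(bits) - m:
        if bits[i:i + m] == template:
            count += 1
            i += 1 if overlapping else m
        else:
            i += 1
    return count

# in 11110111, the pattern 11 occurs 3 times non-overlapping, 5 overlapping
seq = [1, 1, 1, 1, 0, 1, 1, 1]
t = [1, 1]
non_olap = count_template(seq, t, overlapping=False)
olap = count_template(seq, t, overlapping=True)
```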

Maurer’s “Universal Statistical” Test: The focus of this test is the number of bits between matching patterns (a measure that is related to the length of a compressed sequence). The test checks the compression capability of the sequence without a loss in information. A sequence that can be compressed significantly is to be taken as non-random.

Linear Complexity Test: The focus of this test is the length of a linear feedback shift register (LFSR), which finds the shortest possible formula for regenerating the sequence. The test evaluates the complexity of the sequence, as a sequence that is not complex enough may not be considered random. Longer LFSRs imply randomness, whereas shorter ones imply non-randomness, as if the sequence is truly random, the shortest formula will be almost as long as the sequence itself, whereas shorter formulas may be predictable. This specific test is stated to be the most time-consuming in the NIST suite, which should be taken into account.
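SP 800-22 computes the LFSR length with the Berlekamp-Massey algorithm. Below is a compact GF(2) version in Python, covering only the length computation; the full test then applies a chi-square comparison of per-block lengths against their expected value, which is omitted here.

```python
def berlekamp_massey(bits):
    """GF(2) Berlekamp-Massey: return the length L of the shortest
    LFSR that reproduces `bits`."""
    n = len(bits)
    c, b = [0] * n, [0] * n   # current and previous connection polynomials
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        # discrepancy between bit i and the current LFSR's prediction
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            for j in range(n - (i - m)):
                c[(i - m) + j] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

# a period-7 stream from a 3-stage LFSR has linear complexity 3,
# far below its length -- exactly the signature of non-randomness
stream = [1, 0, 0, 1, 0, 1, 1] * 4
```

For a truly random sequence of length n, the expected linear complexity is close to n/2, so a much smaller value flags predictability.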

Serial Test: The focus of this test is the frequency of all possible overlapping m-bit patterns across the entire sequence. For a random sequence, the frequencies of the 2^m possible overlapping m-bit patterns are expected to be approximately equal, which is what the test evaluates. Random sequences are uniform; every m-bit pattern has the same chance of appearing as every other m-bit pattern. When m=1, the Serial test is the same as the Frequency (Monobit) test.
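The core counting step can be sketched as follows (the name `psi_squared` mirrors the ψ²_m statistic in SP 800-22; the full test derives its P-values from differences of these statistics at sizes m, m-1, and m-2):

```python
def psi_squared(bits, m):
    """Serial-test statistic psi^2_m: a frequency chi-square over all
    2^m overlapping m-bit patterns. The sequence is extended by its
    first m-1 bits so that every position contributes one pattern."""
    n = len(bits)
    ext = bits + bits[:m - 1]
    counts = {}
    for i in range(n):
        pat = tuple(ext[i:i + m])
        counts[pat] = counts.get(pat, 0) + 1
    return (2 ** m / n) * sum(v * v for v in counts.values()) - n

# SP 800-22's worked example: 0011011101 gives psi^2_3 = 2.8
bits = [0, 0, 1, 1, 0, 1, 1, 1, 0, 1]
```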

Approximate Entropy Test: Similarly to the Serial test, the focus of this test is the frequency of all possible overlapping m-bit patterns across the entire sequence. The test compares the frequency of overlapping blocks of two adjacent/consecutive lengths (m and m+1) to the expected result of a random sequence.

Cumulative Sums (Cusum) Test: The focus of this test is the maximal excursion (from zero) of the random walk defined by the cumulative sum of adjusted (-1, +1) digits in the sequence. The test determines whether the cumulative sum of the partial sequence occurring in the tested sequence is too large or small relative to the expected behaviour of the cumulative sum for random sequences. The cumulative sum can be taken as a random walk. For random sequences, the excursion of the random walk should be near zero, whereas, in certain types of non-random sequences, the excursion of this random walk from zero will be large.
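The excursion statistic itself is simple to sketch in Python (helper name mine; converting z into a P-value involves a sum of normal distribution terms that is omitted here):

```python
def max_excursion(bits):
    """Cusum statistic z: map bits to +/-1, accumulate the random
    walk of partial sums, and return the largest absolute distance
    the walk reaches from zero."""
    s, z = 0, 0
    for b in bits:
        s += 1 if b else -1
        z = max(z, abs(s))
    return z

# SP 800-22's worked example: 1011010111 gives z = 4
bits = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
```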

Random Excursions Test: The focus of this test is the number of cycles having exactly K visits in a cumulative sum random walk. The cumulative sum random walk is derived from partial sums after the (0,1) sequence is transferred to the appropriate (-1, +1) sequence. A cycle of a random walk consists of a sequence of steps of unit length taken at random that begins at and returns to the origin. The purpose of this test is to determine if the number of visits to a particular state within a cycle deviates from what one would expect for a random sequence. This test is actually a series of eight tests (and conclusions), one test and conclusion for each of the states: -4, -3, -2, -1 and +1, +2, +3, +4.

Random Excursions Variant Test: The focus of this test is the total number of times that a particular state is visited (i.e., occurs) in a cumulative sum random walk. The purpose of this test is to detect deviations from the expected number of visits to various states in the random walk. This test is actually a series of eighteen tests (and conclusions), one test and conclusion for each of the states: -9, -8, …, -1 and +1, +2, …, +9.
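The visit counting underlying both excursion tests can be sketched as follows (the helper `state_visits` is illustrative; the actual tests go on to compare these counts against the distributions expected for a random walk):

```python
def state_visits(bits):
    """Random Excursions Variant bookkeeping: total number of visits
    to each nonzero state of the cumulative-sum random walk obtained
    by mapping 0 -> -1 and 1 -> +1."""
    visits, s = {}, 0
    for b in bits:
        s += 1 if b else -1
        if s != 0:
            visits[s] = visits.get(s, 0) + 1
    return visits

# for 0110110101 the walk visits state -1 once, +1 four times, +2 three times
bits = [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
```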

Testing Strategy And Implementation Of Statistical Results
#

The testing and implementation of the statistical data could be done according to the following pipeline.

1. Selecting a generator: A software or hardware-based generator may be implemented; the generator has to produce a binary sequence of length n of 0s and 1s.

2. Generating a binary sequence: For a fixed sequence length n, the pre-selected generator constructs a set of m binary sequences, saving the results to a file.

3. Execution of the Statistical Suite: Invoke the NIST Statistical Test Suite using the file generated that contains the binary sequence. Select the statistical tests and relevant input parameters to be tested.

4. Examination of the P-values: An output file will be generated by the testing suite and will have a list of P-values for each statistical test done. According to this list of P-values, analysis of the generator’s quality can be made.

5. Assignment of Pass or Fail designation to the tests: For a fixed significance level, a certain percentage of the P-values are expected to indicate failure. For example, if the chosen significance level is 0.01 ($\alpha$ = 0.01), then 1% of the sequences are expected to fail. A sequence passes a statistical test when its P-value is greater than the chosen level of significance and fails otherwise. For each statistical test, the proportion of sequences that pass is computed and analysed accordingly. More in-depth analysis should be performed using additional statistical methods and procedures.

Interpretation of the Empirical Results Gained Post Testing
#

There are three scenarios available to consider after the testing has been completed and the results from the statistical suite have been recorded.

Case 1: The analysis of the P-values does not indicate a deviation from randomness.

Case 2: The analysis clearly indicates a deviation from randomness.

Case 3: The analysis is inconclusive.

The interpretation of the empirical results may be made in several ways. The NIST has adopted the following:

(1) the examination of the proportion of sequences that pass a statistical test, and

(2) the distribution of P-values to check for uniformity.

If either of the given approaches fails, additional numerical experiments should be conducted on different samples from the generator to determine whether the phenomenon was a statistical anomaly or clear evidence of non-randomness.

Proportion of Sequences Passing a Test
#

Given the empirical results from a test, the proportion of sequences deemed to have passed or failed is assessed against the chosen confidence interval, calculated according to the formula:

$$\hat{p} \pm 3 \sqrt{\frac{\hat{p}(1 - \hat{p})}{m}}$$

where $\hat{p} = 1 - \alpha$ and $m$ is the sample size (the number of tested sequences).

The range of acceptable proportions is determined using the confidence interval defined above.

If the proportion falls outside of this interval, then there is evidence that the data is non-random. The factor of 3 corresponds to three standard deviations; other multiples could also be used.
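As a quick numerical sketch of this interval (function name mine), using the example values NIST itself works with, $\alpha = 0.01$ and $m = 1000$ sequences:

```python
import math

def proportion_interval(alpha, m):
    """Acceptable range for the fraction of sequences passing a test:
    p_hat +/- 3*sqrt(p_hat*(1 - p_hat)/m), where p_hat = 1 - alpha
    and m is the number of tested sequences."""
    p_hat = 1 - alpha
    half = 3 * math.sqrt(p_hat * (1 - p_hat) / m)
    return p_hat - half, p_hat + half

# alpha = 0.01, m = 1000 sequences: roughly (0.9806, 0.9994)
lo, hi = proportion_interval(0.01, 1000)
```

So with 1000 sequences at $\alpha = 0.01$, a passing proportion below about 0.9806 would count as evidence of non-randomness.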

Uniform Distribution of P-values
#

The distribution of the P-values is examined to ensure uniformity. Uniformity may also be determined via an application of a $\chi^2$ test and the determination of a P-value corresponding to the Goodness-of-Fit Distributional Test on the P-values obtained for an arbitrary statistical test.

$$\chi^2 = \sum_{i=1}^{10} \frac{(F_i - s/10)^2}{s/10}$$

where $F_i$ is the number of P-values falling in sub-interval $i$ of $[0, 1]$ and $s$ is the sample size (the number of P-values).
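A quick sketch of this computation in Python (`uniformity_chi2` is an illustrative name; the final uniformity P-value additionally requires the regularized upper incomplete gamma function, igamc(9/2, χ²/2), available as e.g. scipy.special.gammaincc):

```python
def uniformity_chi2(p_values):
    """Chi-square statistic for the uniformity of P-values: bin the
    s P-values into 10 equal sub-intervals of [0, 1] and compare
    each bin count with the expected count s/10."""
    s = len(p_values)
    F = [0] * 10
    for p in p_values:
        F[min(int(p * 10), 9)] += 1   # a P-value of 1.0 goes in the last bin
    expected = s / 10
    # uniformity P-value = igamc(9/2, chi2/2); NIST deems the P-values
    # uniform when that value is at least 0.0001
    return sum((f - expected) ** 2 / expected for f in F)

# perfectly uniform P-values give chi2 = 0
ps = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
```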

Common causes for error
#

An incorrectly programmed statistical test: The code for each test is written for a particular problem type and exposes parameters that must be selected for each run. Even so, the programme is not necessarily easy to implement correctly, and coding errors can silently invalidate the results.

An underdeveloped (immature) statistical test: There are cases where probability or complexity theory has not been sufficiently developed to rigorously analyse a statistical test. Since many statistical tests are based on asymptotic approximations, detailed work should be done to determine how good an approximation is.

Poor mathematical routines for computing P-values: Quality mathematical software must be used to ensure good approximations whenever possible. In particular, the incomplete gamma function becomes harder to approximate as its parameter a grows, and for large enough values the P-value formulas will return incorrect results due to difficulties in the numerical approximations. The NIST publication also includes recommended values for the parameters.

Incorrect choices for input parameters: In real-life conditions, a statistical test will not provide reliable results for all seemingly valid input parameters. Constraints on the tests are made on a test-by-test basis; thus, the statistical tests are sensitive to their input parameters. Care must be taken in choosing the input parameters of the numerical experiments, namely: sequence length, sample size, block size, and template.

Conclusion
#

The NIST suite provides valuable user-side information alongside the mathematical models used for carrying out each test, but it is not an all-encompassing standard for checking an RNG’s quality in real-life circumstances. Additional tests must be conducted on the outputs of the generator, and issues that arise from the generator’s nature must be accounted for and fixed before usage in real-world applications. Nevertheless, it is a good guide for testing certain statistical qualities of the RNG under examination.

References
#

Rukhin, A., et al., “A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications”, NIST Special Publication 800-22, Revision 1a, April 2010.

©2025 QubiTO. All rights reserved.

Created by QubiTO’s Divulgation Division