scipy.stats.qmc.Sobol#

class scipy.stats.qmc.Sobol(d, *, scramble=True, bits=None, seed=None, optimization=None)[source]#

Engine for generating (scrambled) Sobol’ sequences.

Sobol’ sequences are low-discrepancy, quasi-random numbers. Points can be drawn using two methods:

  • random_base2: safely draw \(n=2^m\) points. This method guarantees the balance properties of the sequence.

  • random: draw an arbitrary number of points from the sequence. See warning below.

Parameters:
dint

Dimensionality of the sequence. Max dimensionality is 21201.

scramblebool, optional

If True, use LMS+shift scrambling. Otherwise, no scrambling is done. Default is True.

bitsint, optional

Number of bits of the generator. Control the maximum number of points that can be generated, which is 2**bits. Maximal value is 64. It does not correspond to the return type, which is always np.float64 to prevent points from repeating themselves. Default is None, which for backward compatibility, corresponds to 30.

New in version 1.9.0.

optimization{None, “random-cd”, “lloyd”}, optional

Whether to use an optimization scheme to improve the quality after sampling. Note that this is a post-processing step that does not guarantee that all properties of the sample will be conserved. Default is None.

  • random-cd: random permutations of coordinates to lower the centered discrepancy. The best sample based on the centered discrepancy is constantly updated. Centered discrepancy-based sampling shows better space-filling robustness toward 2D and 3D subprojections compared to using other discrepancy measures.

  • lloyd: Perturb samples using a modified Lloyd-Max algorithm. The process converges to equally spaced samples.

New in version 1.10.0.

seed{None, int, numpy.random.Generator}, optional

If seed is an int or None, a new numpy.random.Generator is created using np.random.default_rng(seed). If seed is already a Generator instance, then the provided instance is used.

Notes

Sobol’ sequences [1] provide \(n=2^m\) low discrepancy points in \([0,1)^{d}\). Scrambling them [3] makes them suitable for singular integrands, provides a means of error estimation, and can improve their rate of convergence. The scrambling strategy which is implemented is a (left) linear matrix scramble (LMS) followed by a digital random shift (LMS+shift) [2].

There are many versions of Sobol’ sequences depending on their ‘direction numbers’. This code uses direction numbers from [4]. Hence, the maximum number of dimension is 21201. The direction numbers have been precomputed with search criterion 6 and can be retrieved at https://web.maths.unsw.edu.au/~fkuo/sobol/.

Warning

Sobol’ sequences are a quadrature rule and they lose their balance properties if one uses a sample size that is not a power of 2, or skips the first point, or thins the sequence [5].

If \(n=2^m\) points are not enough then one should take \(2^M\) points for \(M>m\). When scrambling, the number R of independent replicates does not have to be a power of 2.

Sobol’ sequences are generated to some number \(B\) of bits. After \(2^B\) points have been generated, the sequence would repeat. Hence, an error is raised. The number of bits can be controlled with the parameter bits.

References

[1]

I. M. Sobol’, “The distribution of points in a cube and the accurate evaluation of integrals.” Zh. Vychisl. Mat. i Mat. Phys., 7:784-802, 1967.

[2]

J. Matousek, “On the L2-discrepancy for anchored boxes.” J. of Complexity 14, 527-556, 1998.

[3]

Art B. Owen, “Scrambling Sobol and Niederreiter-Xing points.” Journal of Complexity, 14(4):466-489, December 1998.

[4]

S. Joe and F. Y. Kuo, “Constructing sobol sequences with better two-dimensional projections.” SIAM Journal on Scientific Computing, 30(5):2635-2654, 2008.

[5]

Art B. Owen, “On dropping the first Sobol’ point.” arXiv:2008.08051, 2020.

Examples

Generate samples from a low discrepancy sequence of Sobol’.

>>> from scipy.stats import qmc
>>> sampler = qmc.Sobol(d=2, scramble=False)
>>> sample = sampler.random_base2(m=3)
>>> sample
array([[0.   , 0.   ],
       [0.5  , 0.5  ],
       [0.75 , 0.25 ],
       [0.25 , 0.75 ],
       [0.375, 0.375],
       [0.875, 0.875],
       [0.625, 0.125],
       [0.125, 0.625]])

Compute the quality of the sample using the discrepancy criterion.

>>> qmc.discrepancy(sample)
0.013882107204860938

To continue an existing design, extra points can be obtained by calling again random_base2. Alternatively, you can skip some points like:

>>> _ = sampler.reset()
>>> _ = sampler.fast_forward(4)
>>> sample_continued = sampler.random_base2(m=2)
>>> sample_continued
array([[0.375, 0.375],
       [0.875, 0.875],
       [0.625, 0.125],
       [0.125, 0.625]])

Finally, samples can be scaled to bounds.

>>> l_bounds = [0, 2]
>>> u_bounds = [10, 5]
>>> qmc.scale(sample_continued, l_bounds, u_bounds)
array([[3.75 , 3.125],
       [8.75 , 4.625],
       [6.25 , 2.375],
       [1.25 , 3.875]])

Methods

fast_forward(n)

Fast-forward the sequence by n positions.

integers(l_bounds, *[, u_bounds, n, ...])

Draw n integers from l_bounds (inclusive) to u_bounds (exclusive), or if endpoint=True, l_bounds (inclusive) to u_bounds (inclusive).

random([n, workers])

Draw n in the half-open interval [0, 1).

random_base2(m)

Draw point(s) from the Sobol' sequence.

reset()

Reset the engine to base state.