# Quasi-Monte Carlo submodule (scipy.stats.qmc)

This module provides Quasi-Monte Carlo generators and associated helper functions.

## Quasi-Monte Carlo

### Engines

| Engine | Description |
| --- | --- |
| `QMCEngine(d, *[, seed])` | A generic Quasi-Monte Carlo sampler class meant for subclassing. |
| `Sobol(d, *[, scramble, seed])` | Engine for generating (scrambled) Sobol’ sequences. |
| `Halton(d, *[, scramble, seed])` | Halton sequence. |
| `LatinHypercube(d, *[, centered, seed])` | Latin hypercube sampling (LHS). |
| `MultinomialQMC(pvals, *[, engine, seed])` | QMC sampling from a multinomial distribution. |
| `MultivariateNormalQMC(mean[, cov, cov_root, …])` | QMC sampling from a multivariate Normal $$N(\mu, \Sigma)$$. |
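A minimal usage sketch, assuming a SciPy version where `scipy.stats.qmc` is available (1.7+); the dimension, sample size, and seeds below are illustrative:

```python
import numpy as np
from scipy.stats import qmc

# All engines share the QMCEngine interface: construct with a
# dimension d, then draw points in [0, 1)^d with .random(n).
sobol = qmc.Sobol(d=2, scramble=True, seed=12345)
sample = sobol.random(n=8)  # shape (8, 2)

# Halton is a drop-in replacement through the same interface.
halton = qmc.Halton(d=2, scramble=True, seed=12345)
sample_h = halton.random(n=8)
```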

### Helpers

| Helper | Description |
| --- | --- |
| `discrepancy(sample, *[, iterative, method, …])` | Discrepancy of a given sample. |
| `update_discrepancy(x_new, sample, initial_disc)` | Update the centered discrepancy with a new sample. |
| `scale(sample, l_bounds, u_bounds, *[, reverse])` | Sample scaling from unit hypercube to different bounds. |
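A short sketch of the helpers; the sample, bounds, and seed are illustrative:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(7)
sample = rng.random((128, 2))  # any sample in the unit hypercube

# Centered discrepancy: a scalar uniformity measure (lower is better).
disc = qmc.discrepancy(sample)

# Map the unit-hypercube sample onto the box [-2, 0] x [5, 10].
scaled = qmc.scale(sample, l_bounds=[-2, 5], u_bounds=[0, 10])
```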

## Introduction to Quasi-Monte Carlo

Quasi-Monte Carlo (QMC) methods provide an $$n \times d$$ array of numbers in $$[0,1]$$. They can be used in place of $$n$$ points from the $$U[0,1]^{d}$$ distribution. Compared to random points, QMC points are designed to have fewer gaps and clumps. This is quantified by discrepancy measures. From the Koksma-Hlawka inequality we know that low discrepancy reduces a bound on integration error. Averaging a function $$f$$ over $$n$$ QMC points can achieve an integration error close to $$O(n^{-1})$$ for well-behaved functions.
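The fewer-gaps-and-clumps claim can be checked numerically with the module's own `discrepancy` helper; a sketch, with sizes and seeds chosen for illustration:

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 2
iid = np.random.default_rng(0).random((n, d))          # plain MC points
low = qmc.Sobol(d=d, scramble=True, seed=0).random(n)  # QMC points

# The QMC sample should score a markedly lower discrepancy.
d_iid = qmc.discrepancy(iid)
d_low = qmc.discrepancy(low)
```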

Most QMC constructions are designed for special values of $$n$$ such as powers of 2 or large primes. Changing the sample size by even one can degrade their performance, even their rate of convergence. For instance, $$n=100$$ points may give less accuracy than $$n=64$$ if the method was designed for $$n=2^m$$.
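For Sobol’ sequences in particular, `random_base2` makes the power-of-2 requirement explicit by asking for $$n = 2^m$$ points directly; a brief sketch (dimension and seed illustrative):

```python
from scipy.stats import qmc

sobol = qmc.Sobol(d=3, scramble=True, seed=42)
# random_base2 draws exactly 2**m points, the sizes Sobol' is designed for.
sample = sobol.random_base2(m=6)  # 2**6 = 64 points
```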

Some QMC constructions are extensible in $$n$$: we can find another special sample size $$n' > n$$ and often an infinite sequence of increasing special sample sizes. Some QMC constructions are extensible in $$d$$: we can increase the dimension, possibly to some upper bound, and typically without requiring special values of $$d$$. Some QMC methods are extensible in both $$n$$ and $$d$$.

QMC points are deterministic. That makes it hard to estimate the accuracy of integrals estimated by averages over QMC points. Randomized QMC (RQMC) points are constructed so that each point is individually $$U[0,1]^{d}$$ while collectively the $$n$$ points retain their low discrepancy. One can make $$R$$ independent replications of RQMC points to see how stable a computation is. From $$R$$ independent values, a t-test (or bootstrap t-test) then gives approximate confidence intervals on the mean value. Some RQMC methods produce a root mean squared error that is actually $$o(1/n)$$ and smaller than the rate seen in unrandomized QMC. An intuitive explanation is that the error is a sum of many small ones and random errors cancel in a way that deterministic ones do not. RQMC also has advantages on integrands that are singular or, for other reasons, fail to be Riemann integrable.
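The replication recipe above can be sketched as follows; the integrand, replicate count, and seeds are illustrative, and the exact value $$\int_{[0,1]^2} x_1 x_2 \, dx = 1/4$$ serves as a check:

```python
import numpy as np
from scipy.stats import qmc, t

# Integrand on [0, 1]^2 whose exact integral is 1/4.
def f(x):
    return x[:, 0] * x[:, 1]

R, m, d = 16, 8, 2  # R replicates of 2**m scrambled Sobol' points each
estimates = []
for r in range(R):
    # A different seed per replicate gives an independent scrambling.
    points = qmc.Sobol(d=d, scramble=True, seed=r).random_base2(m=m)
    estimates.append(f(points).mean())
estimates = np.asarray(estimates)

# Approximate 95% t confidence interval from the R independent values.
sem = estimates.std(ddof=1) / np.sqrt(R)
lo, hi = t.interval(0.95, df=R - 1, loc=estimates.mean(), scale=sem)
```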

(R)QMC cannot beat Bakhvalov’s curse of dimension. For any random or deterministic method, there are worst-case functions that will give it poor performance in high dimensions. A worst-case function for QMC might be 0 at all $$n$$ points but very large elsewhere. Worst-case analyses get very pessimistic in high dimensions. (R)QMC can bring a great improvement over MC when the functions on which it is used are not worst case. For instance, (R)QMC can be especially effective on integrands that are well approximated by sums of functions of some small number of their input variables at a time. That property is often a surprising finding about those functions.

Also, to see an improvement over IID MC, (R)QMC requires a bit of smoothness of the integrand: roughly, the mixed first-order derivative in each direction, $$\partial^d f/\partial x_1 \cdots \partial x_d$$, must be integrable. For instance, a function that is 1 inside the hypersphere and 0 outside of it has infinite variation in the sense of Hardy and Krause for any dimension $$d \geq 2$$.

Scrambled nets are a kind of RQMC that have some valuable robustness properties. If the integrand is square integrable, they give variance $$var_{SNET} = o(1/n)$$. There is a finite upper bound on $$var_{SNET} / var_{MC}$$ that holds simultaneously for every square integrable integrand. Scrambled nets satisfy a strong law of large numbers for $$f$$ in $$L^p$$ when $$p>1$$. In some special cases there is a central limit theorem. For smooth enough integrands they can achieve RMSE nearly $$O(n^{-3})$$. See the references for more about these properties.

The main kinds of QMC methods are lattice rules and digital nets and sequences. The theories meet up in polynomial lattice rules, which can produce digital nets. Lattice rules require some form of search for good constructions. For digital nets there are widely used default constructions.

The most widely used QMC methods are Sobol’ sequences. These are digital nets. They are extensible in both $$n$$ and $$d$$. They can be scrambled. The special sample sizes are powers of 2. Another popular method is the Halton sequence. The constructions resemble those of digital nets. The earlier dimensions have much better equidistribution properties than later ones. There are essentially no special sample sizes. They are not thought to be as accurate as Sobol’ sequences. They can be scrambled. The nets of Faure are also widely used. All dimensions are equally good, but the special sample sizes grow rapidly with dimension $$d$$. They can be scrambled. The nets of Niederreiter and Xing have the best asymptotic properties but have not shown good empirical performance.

Higher-order digital nets are formed by a digit interleaving process in the digits of the constructed points. They can achieve higher levels of asymptotic accuracy given higher smoothness conditions on $$f$$, and they can be scrambled. There is little or no empirical work showing the improved rate to be attained.

Using QMC is like using the entire period of a small random number generator. The constructions are similar and so therefore are the computational costs.

(R)QMC is sometimes improved by passing the points through a baker’s transformation (tent function) prior to using them. That function has the form $$1-2|x-1/2|$$. As $$x$$ goes from 0 to 1, this function goes from 0 to 1 and then back. It is very useful for producing a periodic function for lattice rules, and sometimes it improves the convergence rate.
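Applying the tent function is a one-liner in NumPy; a sketch using scrambled Sobol’ points, with size and seed chosen for illustration:

```python
import numpy as np
from scipy.stats import qmc

points = qmc.Sobol(d=2, scramble=True, seed=1).random_base2(m=7)

# Baker's transformation: x -> 1 - 2|x - 1/2| folds each coordinate,
# keeping the points in [0, 1] while periodizing the integrand.
folded = 1 - 2 * np.abs(points - 0.5)
```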

It is not straightforward to apply QMC methods to Markov chain Monte Carlo (MCMC). We can think of MCMC as using $$n=1$$ point in $$[0,1]^{d}$$ for very large $$d$$, with ergodic results corresponding to $$d \to \infty$$. Proposals for doing so exist, and under strong conditions an improved rate of convergence has been shown.

Returning to Sobol’ points: there are many versions, depending on what are called direction numbers. Those are the result of searches and are tabulated. A very widely used set of direction numbers is extensible in dimension up to $$d=21201$$.