Markov Chains, the Central Limit Theorem and the Metropolis-Hastings Algorithm
In the previous article I gave a generic overview of Monte Carlo methods and introduced importance sampling. We now dive deeper, giving strict definitions of some widely used yet commonly misunderstood, or rather neglected, concepts. Thereafter we explore the Metropolis-Hastings algorithm, which forms the foundation of numerous Markov Chain Monte Carlo methods.
Markov Chain Monte Carlo
Markov Chain Monte Carlo was invented soon after ordinary Monte Carlo at Los Alamos, by researchers who used early computers to carry out such simulations. The chains are classically obtained by following a variety of algorithms, with the Metropolis algorithm being the most embryonic and yet significant, as it paved the way for more elaborate algorithms such as the Metropolis-Hastings algorithm. Whereas ordinary Monte Carlo integration draws independent samples from the required distribution and forms sample averages to approximate expectations, Markov Chain Monte Carlo draws these samples by running a cleverly constructed Markov chain for a long time (Gilks, Richardson, & Spiegelhalter).
Let us at this point give strict definitions of some important terms and concepts (some definitions are extracted from Geyer, C. J.: Introduction to Markov Chain Monte Carlo):
Markov Chain: A sequence $X_1, X_2, X_3, \dots$ of random elements of some set is a Markov chain if the conditional distribution of $X_{n+1}$ given $X_1, X_2, \dots, X_n$ depends only on $X_n$. From this definition we see that in order to generate the next random variable we only need the current state and nothing before where we currently stand, greatly reducing the storage and computation required.
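To make the Markov property concrete, here is a minimal Python sketch of a two-state chain; the states and transition probabilities are invented purely for illustration. Note that generating the next state uses only the current state, never the earlier history.

```python
import random

# A hypothetical two-state "weather" chain (states and probabilities
# are illustrative, not taken from the article's references).
TRANSITIONS = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state, rng):
    """Draw the next state given ONLY the current one (Markov property)."""
    states, weights = zip(*TRANSITIONS[state])
    return rng.choices(states, weights=weights, k=1)[0]

def simulate(start, n, seed=0):
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n):
        chain.append(step(chain[-1], rng))  # depends only on chain[-1]
    return chain

chain = simulate("sunny", 10)
print(chain)
```

Because each step conditions only on the latest state, the whole history never needs to be stored or consulted.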
Reversibility: A transition probability distribution is reversible with respect to the initial distribution if, for a Markov chain $X_1, X_2, X_3, \dots$, the distribution of consecutive pairs $(X_n, X_{n+1})$ is exchangeable. Reversibility eases the application of the Central Limit Theorem (CLT), as the conditions of the CLT are much simpler when this property holds.
Stationarity: A stochastic process $\{X_t : t \in T\}$ (with $t$ an element of the parameter space $T$) is stationary if all the $X_t$ have the same mean, variance and autocorrelation structure.
Central Limit Theorem: For a stationary, irreducible, reversible Markov chain $X_1, X_2, \dots$ with $\mu = E[g(X_i)]$ and asymptotic variance $\sigma^2 = \operatorname{var}[g(X_i)] + 2\sum_{k=1}^{\infty} \operatorname{cov}[g(X_i), g(X_{i+k})]$ finite, we have $\sqrt{n}(\hat{\mu}_n - \mu) \to N(0, \sigma^2)$ in distribution, where $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} g(X_i)$ is the sample estimate. We can state it more generally as follows: for a sequence of independently, identically distributed random variables $X_1, X_2, \dots$ with common mean and finite variance, the distribution of the standardized sample mean tends to a normal distribution as $n$ tends to infinity. It is worth noting that various forms of the CLT exist, each with its own specific requirements.
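A quick numerical illustration of the i.i.d. form of the theorem (the uniform distribution and the sample sizes here are arbitrary choices of mine): standardized means of uniform draws should look approximately standard normal.

```python
import math
import random
import statistics

# Illustrative check of the i.i.d. CLT: standardized means of n
# Uniform(0, 1) draws (mean 1/2, variance 1/12) are roughly N(0, 1).
def standardized_mean(n, rng):
    xs = [rng.random() for _ in range(n)]
    return (sum(xs) / n - 0.5) / math.sqrt((1 / 12) / n)

rng = random.Random(42)
zs = [standardized_mean(200, rng) for _ in range(5000)]
print(statistics.mean(zs), statistics.stdev(zs))  # should be near 0 and 1
```

The empirical mean and standard deviation of the standardized means land close to 0 and 1, as the theorem predicts.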
Monte Carlo Estimates and Central Limit Theorem
In order to use the CLT to estimate the Monte Carlo error (not discussed here but equally significant), we need a consistent estimate of the variance, or at least a variance estimate whose asymptotic distribution is known (Geyer, 1992). Many methods of variance estimation have been proposed; most come from the time series literature and are applicable to arbitrary stationary stochastic processes, not just to Markov chains (Geyer, 1992). We now look at one such method for standard time series.
Non-overlapping Batch Means
Under the Non-overlapping Batch Means method (a batch being a subsequence of consecutive iterations of the Markov chain, the $n$ terms $X_1, X_2, \dots, X_n$ being split into batches of equal length), and assuming a stationary Markov chain, all batches of the same length have the same joint distribution (Geyer, 1992). The Central Limit Theorem also applies to each batch.
With $b$ batches of length $m$ (so that $n = bm$), the $k$-th batch mean is given by $\hat{\mu}_k = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} g(X_i)$, and the variance estimate $\hat{\sigma}^2 = \frac{m}{b-1} \sum_{k=1}^{b} (\hat{\mu}_k - \hat{\mu}_n)^2$ is then used in place of the unknown variance in our Central Limit Theorem.
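A hedged sketch of the batch-means estimator follows (the function and variable names are my own, and the AR(1) demo chain with its parameters is an invented example, not from the references): split the chain into non-overlapping batches, average each, and estimate the variance from the spread of those batch means.

```python
import math
import random

def batch_means(chain, m):
    """Non-overlapping batch means: return the overall mean estimate and
    the batch-means estimate of the CLT (asymptotic) variance."""
    b = len(chain) // m                                    # number of batches
    means = [sum(chain[k * m:(k + 1) * m]) / m for k in range(b)]
    mu_hat = sum(means) / b
    # m times the sample variance of the batch means estimates sigma^2.
    sigma2_hat = m * sum((x - mu_hat) ** 2 for x in means) / (b - 1)
    return mu_hat, sigma2_hat

# Demo on a simple AR(1) chain x_t = 0.5 x_{t-1} + N(0, 1) noise, whose
# true mean is 0 and whose asymptotic variance works out to 4.
rng = random.Random(1)
x, chain = 0.0, []
for _ in range(100_000):
    x = 0.5 * x + rng.gauss(0, 1)
    chain.append(x)

mu_hat, sigma2_hat = batch_means(chain, m=1000)
se = math.sqrt(sigma2_hat / len(chain))  # Monte Carlo standard error
print(mu_hat, sigma2_hat, se)
```

The standard error derived this way is exactly the quantity one would report alongside a Monte Carlo estimate.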
The Metropolis-Hastings algorithm, given a target density $f$ and a proposal density $q(y \mid x)$, proceeds by the following steps. Starting from an initial state $X_0$, at each iteration $t$:
1. Given the current state $X_t$, draw a proposal $Y \sim q(\cdot \mid X_t)$.
2. Compute $\alpha(X_t, Y) = \min\left(1, \frac{f(Y)\, q(X_t \mid Y)}{f(X_t)\, q(Y \mid X_t)}\right)$.
3. With probability $\alpha(X_t, Y)$ accept the proposal and set $X_{t+1} = Y$; otherwise set $X_{t+1} = X_t$.
We call $\alpha$ the acceptance probability. By performing this process we obtain the so-called Metropolis-Hastings Markov chain $X_0, X_1, X_2, \dots$, with $X_t$ approximately distributed according to the target density $f$ (which is what we may consider our ideal sample distribution) for large $t$ (Kroese, Taimre, & Botev, 2011).
The main aim of the Metropolis-Hastings algorithm is to simulate a Markov chain whose stationary distribution coincides with the target distribution.
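As a minimal illustration, the Python sketch below samples from a standard normal target using a symmetric random-walk proposal, in which case the proposal ratio cancels and the acceptance probability reduces to a ratio of target densities. The target, step size and sample counts are my own illustrative choices; note the target is supplied only up to its normalizing constant.

```python
import math
import random

def f_unnorm(x):
    """Unnormalized standard normal density (no 1/sqrt(2*pi) factor)."""
    return math.exp(-0.5 * x * x)

def metropolis_hastings(f, n, step=1.0, x0=0.0, seed=7):
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n):
        y = x + rng.gauss(0, step)        # symmetric random-walk proposal
        alpha = min(1.0, f(y) / f(x))     # acceptance probability
        if rng.random() < alpha:          # accept the move...
            x = y                         # ...otherwise stay at x
        out.append(x)
    return out

samples = metropolis_hastings(f_unnorm, 50_000)[5_000:]  # drop burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be near 0 and 1 for a standard normal target
```

The empirical mean and variance of the retained samples approach 0 and 1, the moments of the standard normal target, even though only the unnormalized density was ever evaluated.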
Though Monte Carlo methods are often more efficient than conventional numerical methods, their implementation can be difficult as well as expensive, both in computation time and in analysis. Hastings proposed a generalized method which stems from Metropolis et al. (1953). The main features of this sampling method are that:
- The computation depends on the target density $f$ only through ratios of the form $f(y)/f(x)$, where $y$ and $x$ are sample points, eliminating the need to factorize $f$ or determine its normalizing constant.
Further literature is available on Metropolis-Hastings, but what we have highlighted is sufficient at this point. Ideally, what we try to do with algorithms like Metropolis-Hastings is to produce a sample that is as close to reality as possible, thus ensuring that whichever model is proposed is tested under settings as realistic as possible.
We have by this point gathered enough theoretical knowledge to structure a strategy that may be employed for investment purposes. In the last part of the trilogy we will make use of all the concepts developed up to now, evaluating their significance.
Geyer, C. J. (1992). Practical Markov Chain Monte Carlo. Statistical Science, 7(4), 473-483.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (n.d.). Introducing Markov Chain Monte Carlo.
Kroese, D. P., Taimre, T., & Botev, Z. I. (2011). Handbook of Monte Carlo Methods. New Jersey: John Wiley & Sons.
Xu, H., & Zhang, D. (2010). Monte Carlo for Mean-Risk Optimization and Portfolio Selection. Southampton: University of Southampton.