---
title: "Function Minimization with Markov Chains"
author: "Prof. Richey and Prof. Wright"
date: "April 30, 2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

Up to this point, we have used Markov chains to create a trail of agent locations distributed according to a predefined steady-state probability distribution. Now we want to use this tool to solve other problems.

## Using Markov chains to minimize a function

Imagine a function $f(x)$ defined on an interval $(x_\text{min}, x_\text{max})$. We want to find the minimum value of $f$.

Start with a simple function:

```{r}
f <- function(x){
  x^2
}
```

Define a reasonably large number of states:

```{r}
N <- 100
```

Define an interval:

```{r}
xmin <- -3
xmax <- 3
```

Now create the $x$-values. These are the states the agent will visit.

```{r}
xvals <- seq(xmin, xmax, len=N)
```

Take a peek at the plot of the values of the function on the sequence we just defined. The minimum is easy to see. We want to find it using a Markov chain!

```{r}
plot(xvals, f(xvals), cex=.3)
```

Here's the key step: from $f(x)$, we can create a probability distribution on the xvals via
$$P(x) = e^{-f(x)/\sigma^2},$$
where $\sigma^2$ represents the "variance". In other words, $1/\sigma^2$ is our precision. **The important thing here is that $P$ is maximal where $f$ is minimal.**

Let's see what this means in practice. Create the probability function from $f(x)$. Note that the probability function depends on the choice of $\sigma^2$.

```{r}
prob <- function(x, sig2) {
  exp(-f(x)/sig2)
}
```

Let's plot it for a few values of $\sigma^2$.

```{r}
sig2 <- 1
plot(xvals, prob(xvals, sig2), cex=.3)
```

Now with a smaller, or equivalently more precise, choice of $\sigma^2$.

```{r}
sig2 <- 0.1
plot(xvals, prob(xvals, sig2), cex=.3)
```

Notice how these plots look like normal distributions. Also notice how much "tighter" the second plot is around its maximum value.

## Implement a Markov chain sampler
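Before building the sampler, here is a quick numerical check of the claim above that these plots look normal. (This chunk is a self-contained aside, not part of the sampler: it redefines `f` and `prob` locally.) With $f(x) = x^2$, we get $P(x) = e^{-x^2/\sigma^2}$, which is proportional to a normal density with mean $0$ and variance $\sigma^2/2$, so the ratio of $P(x)$ to `dnorm` should be constant in $x$.

```{r}
# Self-contained check: P(x) for f(x) = x^2 is proportional to a
# normal density with mean 0 and variance sig2/2.
f <- function(x) x^2
prob <- function(x, sig2) exp(-f(x)/sig2)
sig2 <- 0.1
xvals <- seq(-3, 3, len=100)
# if the claim holds, this ratio is the same number at every x
ratio <- prob(xvals, sig2) / dnorm(xvals, mean=0, sd=sqrt(sig2/2))
range(ratio)
```

The two endpoints of `range(ratio)` agree up to floating-point error, confirming the proportionality.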
We will start by creating a doMove function which depends on both the current state and the value of $\sigma^2$. This function will work much like our move function from the last class. The algorithm will propose a transition to a new state, and then use a probabilistic approach that might or might not move to the proposed state.

Let $P_1$ represent the current probability and $P_2$ the proposed probability:
$$
\begin{aligned}
P_1 &= e^{-\frac{f(x_1)}{\sigma^2}} \\
P_2 &= e^{-\frac{f(x_2)}{\sigma^2}}
\end{aligned}
$$

Notice how simple the ratio $P_2/P_1$ becomes. As before, we will have the rule that if $P_2 \ge P_1$, then we automatically make the transition. If $P_2 < P_1$, then make the transition with probability
$$\rho = \frac{P_2}{P_1} = e^{-\frac{f(x_2)-f(x_1)}{\sigma^2}}.$$
Notice that we can compute this key ratio without computing the probabilities directly. We only need the difference in function values.

These two steps can be combined into a single statement.

* Regardless of the values of $P_1$ and $P_2$, define $\rho$ as above. Note that if $P_2 > P_1$, then $\rho > 1$.
* Generate a uniformly distributed random number $r \in [0,1]$. If $r < \rho$, then make the transition!

## Implementation

Here's an implementation of this strategy. In this case, we will think of the state as one of the $N$ equally spaced values between `xmin` and `xmax`.

First create a function that defines a proposal transition.

```{r}
proposal <- function(currState){
  # choose a proposal transition at random
  delta <- sample(c(-1, 1), 1)/N
  propState <- currState + delta
  # avoid going off the end of the interval
  if(propState <= xmin) propState <- xmin
  if(propState >= xmax) propState <- xmax
  propState
}
```

Now we can build a move.
If you set things up right, this part almost always looks like this:

```{r}
doMove <- function(currState, sig2){
  # propose a transition
  propState <- proposal(currState)
  # get function value difference
  dFunc <- f(propState) - f(currState)
  # calculate rho and make a move based on the result
  rho <- exp(-dFunc/sig2)
  if(runif(1, 0, 1) < rho){
    # return the proposed state (this will be the new state)
    return(propState)
  }
  # return the old state (no transition)
  return(currState)
}
```

## Try it out!

Set up: define the number of steps (varies with the application), a vector to keep the states, an initial state, and a value of $\sigma^2$.

```{r}
M <- 100000
vals <- rep(0, M)
currState <- runif(1, xmin, xmax)
sig2 <- 1.0
```

Now the main loop.

```{r}
for(m in 1:M){
  vals[m] <- currState
  currState <- doMove(currState, sig2)
}
```

Take a peek at the histogram of these values. Since $f(x) = x^2$, it should look like a normal distribution with mean $0$ and variance $\sigma^2/2$ (the distribution $e^{-x^2/\sigma^2}$ is a normal with that variance). The larger $M$ is, the more normal this should look.

```{r}
hist(vals, breaks=100, xlim=c(xmin, xmax))
```

A more revealing plot is the "trace" of the Markov chain states.

```{r}
plot(vals, cex=.1)
```

The trace allows us to see how values of $x$ move around. Overall they represent a sample from the normal distribution determined by the fixed value of $\sigma^2$. The histogram peaks at the minimizer of $f(x)$, but there is a better way to identify this critical value.

# Simulated Annealing

Here is a clever idea: as the Markov chain moves along, slowly decrease $\sigma^2$, which means the probability distribution gets more tightly clustered around its maximum value. This is a way of "trapping" $x$ near the maximum of $P$, i.e. the minimum of $f$. This process is called **simulated annealing.** The key difference is that $\sigma^2$ is slowly decreased as the chain moves along.

Here is another version of the previous code. Note the only difference is that now $\sigma^2$ slowly decreases.
```{r}
M <- 100000
vals <- rep(0, M)
sig2 <- 1
decFac <- .9999
currState <- runif(1, xmin, xmax)
for(m in 1:M){
  vals[m] <- currState
  currState <- doMove(currState, sig2)
  # slowly decrease sig2
  sig2 <- sig2*decFac
}
```

**Note:** The method by which $\sigma^2$ decreases is an art, not a science! Even in this simple case, you have to play around with both the initial value of $\sigma^2$ and the factor by which it decreases each step.

The histogram:

```{r}
hist(vals, breaks=100, xlim=c(xmin, xmax))
```

The trace...this is the most interesting:

```{r}
plot(vals, cex=.1, type="l")
```

Notice now that the trace of the Markov chain funnels down to the value $x=0$, which is where $f(x) = x^2$ attains its minimum on the interval $(-3,3)$.

You might think this is a hard way to find the minimum of $x^2$. You're right, but the idea has wonderful applications in more complicated minimization problems. The key components are:

* A well-defined *state space* for the state variable $x$.
* A natural *proposal transition* on the state space.
* A function $f(x)$ on the state space that we want to minimize.

Implementation of simulated annealing can usually be broken into three steps.

* Create a *proposal* function. This very much depends on the problem under consideration.
* Create a *doMove* function. This is often very similar across applications.
* Set up an initial value of $\sigma^2$ and a method to slowly decrease $\sigma^2$ as the main loop progresses.

With any luck, and some fiddling around, you can usually design this process so that the minimum value of $f(x)$ is identified.

## Exercise: Repeat with a different function

This process can work for any function. We find a minimum of $f(x)$ by implementing the Markov chain agent model, slowly decreasing the value of $\sigma^2$, and hoping we lock onto the correct value.

Try it with the following function on the interval $(-3,3)$.

```{r}
f <- function(x) {
  (x-2)*(x-1)*x*(x-2.4)
}
```

Run your Markov chain several times and compare the output.
If you finish this before the end of class, try some other functions. Can you consistently find the minimum values?
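As a closing sketch, the three implementation steps above can be packaged into one reusable function. (This is an illustration, not part of the class notes: the name `anneal`, the proposal step size `(xmax - xmin)/N`, and the geometric cooling schedule are all choices you would tune per problem.)

```{r}
# Generic simulated-annealing sketch: minimize f on (xmin, xmax).
# Step size and cooling schedule are illustrative defaults.
anneal <- function(f, xmin, xmax, N=100, M=100000, sig2=1, decFac=0.9999){
  currState <- runif(1, xmin, xmax)
  step <- (xmax - xmin)/N
  for(m in 1:M){
    # propose a small move, clamped to the interval
    propState <- currState + sample(c(-1, 1), 1)*step
    propState <- min(max(propState, xmin), xmax)
    # accept downhill moves always, uphill moves with probability rho
    rho <- exp(-(f(propState) - f(currState))/sig2)
    if(runif(1, 0, 1) < rho){
      currState <- propState
    }
    # slowly decrease sig2
    sig2 <- sig2*decFac
  }
  currState
}

anneal(function(x) x^2, -3, 3)
```

For $f(x) = x^2$ the returned state should land near $x = 0$; swapping in the quartic from the exercise lets you compare runs without rewriting the loop each time.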