---
title: "Function Minimization with Markov chains"
author: "Prof. Richey an Prof. Wright"
date: "April 30, 2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction
Up to this point, we have used Markov chains to create a trail of agent locations distributed according to a predefined steady-state probability distribution. Now we want to use this tool to solve other problems.
## Using Markov chains to minimize a function
Imagine a function $f(x)$ defined on an interval $(x_\text{min},x_\text{max})$.
We want to find the minimum value of $f$.
Start with a simple function:
```{r}
f <- function(x){ x^2 }
```
Define a reasonably large number of states:
```{r}
N <- 100
```
Define an interval:
```{r}
xmin <- -3
xmax <- 3
```
Now create the $x$-values. These are the states the agent will visit.
```{r}
xvals <- seq(xmin, xmax,len=N)
```
Take a peek at the plot of the values of the function on the sequence we just defined.
The minimum is easy to see. We want to find it using a Markov chain!
```{r}
plot(xvals, f(xvals), cex=.3)
```
Here's the key step:
From $f(x)$, we can create an (unnormalized) probability distribution on the `xvals` via
$$ P(x) = e^{-f(x)/\sigma^2} $$
where $\sigma^2$ represents the "variance". In other words, $1/\sigma^2$ is our precision.
**The important thing here is that $P$ is maximal where $f$ is minimal.**
Let's see what this means in practice. Create the probability function from $f(x)$. Note that the probability function depends on the choice of $\sigma^2$.
```{r}
prob <- function(x, sig2) {
  exp(-f(x)/sig2)
}
```
Let's plot it for a few values of $\sigma^2$.
```{r}
sig2 <- 1
plot(xvals, prob(xvals, sig2), cex=.3)
```
Now with a smaller, or equivalently more precise, choice of $\sigma^2$.
```{r}
sig2 <- 0.1
plot(xvals, prob(xvals, sig2), cex=.3)
```
Notice how these plots look like normal distributions.
Also notice how much "tighter" the second plot is around its maximum value.
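This is more than a resemblance. For $f(x)=x^2$, the function $e^{-x^2/\sigma^2}$ is, up to a constant factor, a normal density with standard deviation $\sigma/\sqrt{2}$. Here is a quick numerical check (this comparison is our own aside; it re-creates `f` and `xvals` so the chunk runs on its own):

```{r}
# f and xvals as defined above
f <- function(x){ x^2 }
xvals <- seq(-3, 3, len=100)
sig2 <- 1
p <- exp(-f(xvals)/sig2)                    # unnormalized P(x)
q <- dnorm(xvals, mean=0, sd=sqrt(sig2/2))  # normal density with sd = sigma/sqrt(2)
# rescale both curves to peak at 1; they coincide up to roundoff
max(abs(p/max(p) - q/max(q)))
```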
## Implement a Markov chain sampler
We will start by creating a `doMove` function which depends on both the current state and the value of $\sigma^2$.
This function will work much like our move function from the last class.
The algorithm will propose a transition to a new state, and then use a probabilistic approach that might or might not move to the proposed state.
Let $P_1$ represent the current probability and $P_2$ the proposed probability:
$$ \begin{aligned}
P_1 &= e^{-\frac{f(x_1)}{\sigma^2}} \\
P_2 &= e^{-\frac{f(x_2)}{\sigma^2}}
\end{aligned} $$
Notice how simple the ratio $P_2/P_1$ becomes.
As before, we will have the rule that if $P_2 \ge P_1$, then automatically make the transition.
If $P_2 < P_1$, then make the transition with probability
$$ \rho = \frac{P_2}{P_1} = e^{-\frac{f(x_2)-f(x_1)}{\sigma^2}}. $$
Notice that we can compute this key ratio without computing the probabilities directly. We only need the difference in function values.
These two steps can be combined into a single statement.
* Regardless of the values of $P_1$ and $P_2$, define $\rho$ as above. Note that if $P_2 > P_1$, then $\rho > 1$.
* Generate a uniformly distributed random number $r \in [0,1]$. If $r < \rho$, then make the transition!
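Here is the key ratio in miniature, with arbitrary state values chosen just for illustration: the ratio computed directly from the two probabilities agrees with the one computed from the difference in function values alone.

```{r}
f <- function(x){ x^2 }          # as above
sig2 <- 0.5
x1 <- 1.2                        # current state (arbitrary)
x2 <- 0.7                        # proposed state (arbitrary)
rho_direct <- exp(-f(x2)/sig2) / exp(-f(x1)/sig2)
rho_diff   <- exp(-(f(x2) - f(x1))/sig2)
all.equal(rho_direct, rho_diff)  # TRUE
```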
## Implementation
Here's an implementation of this strategy. The state is a point in the interval `(xmin, xmax)`, and the proposal nudges it left or right by a small step of size $1/N$.
First create a function that defines a proposal transition.
```{r}
proposal <- function(currState){
  # choose a proposal transition at random
  delta <- sample(c(-1,1),1)/N
  propState <- currState + delta
  # avoid going off the end of the interval
  if(propState <= xmin)
    propState <- xmin
  if(propState >= xmax)
    propState <- xmax
  propState
}
```
Now we can build a move. If you set things up right, this part almost always looks like this:
```{r}
doMove <- function(currState, sig2){
  # propose a transition
  propState <- proposal(currState)
  # get function value difference
  dFunc <- f(propState) - f(currState)
  # calculate rho and make a move based on the result
  rho <- exp(-dFunc/sig2)
  if(runif(1,0,1) < rho){
    # return the proposed state (this will be the new state)
    return(propState)
  }
  # return the old state (no transition)
  return(currState)
}
```
## Try it out!
Set up: Define the number of steps (this varies with the application), a vector to store the states, an initial state, and a value of $\sigma^2$.
```{r}
M <- 100000
vals <- rep(0,M)
currState <- runif(1, xmin, xmax)
sig2 <- 1.0
```
Now the main loop.
```{r}
for(m in 1:M){
  vals[m] <- currState
  currState <- doMove(currState, sig2)
}
```
Take a peek at the histogram of these values. It should look roughly like a normal distribution centered at $0$: for $f(x)=x^2$ the chain samples from a density proportional to $e^{-x^2/\sigma^2}$, which is a normal with variance $\sigma^2/2$. The larger $M$ is, the more normal this should look.
```{r}
hist(vals,breaks=100,xlim=c(xmin,xmax))
```
A more revealing plot is the "trace" of the Markov chain states.
```{r}
plot(vals, cex=.1)
```
The trace allows us to see how values of $x$ move around. Overall they represent a sample from the normal distribution with the fixed value of $\sigma^2$. The histogram peaks at the minimum value of $f(x)$, but there is a better way to identify this critical value.
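One crude estimate, for comparison with what follows, is to take the mode of the sampled values. This chunk is our own sketch; it rebuilds the chain from above so that it runs on its own.

```{r}
set.seed(1)
f <- function(x){ x^2 }
xmin <- -3; xmax <- 3; N <- 100
proposal <- function(s) min(max(s + sample(c(-1,1),1)/N, xmin), xmax)
doMove <- function(s, sig2){
  p <- proposal(s)
  if(runif(1) < exp(-(f(p) - f(s))/sig2)) p else s
}
M <- 100000; sig2 <- 1
vals <- numeric(M)
currState <- runif(1, xmin, xmax)
for(m in 1:M){
  vals[m] <- currState
  currState <- doMove(currState, sig2)
}
# the mode of the sampled values estimates the minimizer (near 0 here)
d <- density(vals)
d$x[which.max(d$y)]
```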
# Simulated Annealing
Here is a clever idea: as the Markov chain moves along, slowly decrease $\sigma^2$, so that the probability distribution becomes more and more tightly clustered around its maximum. This is a way of "trapping" $x$ near the maximizer of $P$, i.e. the minimizer of $f$.
This process is called **simulated annealing**.
Here is another version of the previous code. The only difference is that $\sigma^2$ now slowly decreases as the chain moves along.
```{r}
M <- 100000
vals <- rep(0,M)
sig2 <- 1
decFac <- .9999
currState <- runif(1, xmin, xmax)
for(m in 1:M){
  vals[m] <- currState
  currState <- doMove(currState, sig2)
  # slowly decrease sig2
  sig2 <- sig2*decFac
}
```
**Note:** The method by which $\sigma^2$ decreases is an art, not a science! Even in this simple case, you have to experiment with both the initial value of $\sigma^2$ and the factor by which it decreases at each step.
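To get a feel for the schedule, compute where it ends up. With the values above, $\sigma^2$ shrinks geometrically, and after $M$ steps it has been multiplied by $0.9999^{100000} \approx e^{-10}$:

```{r}
# geometric cooling: after m steps, sig2 has been multiplied by decFac^m
sig2_0 <- 1; decFac <- 0.9999; M <- 100000
sig2_final <- sig2_0 * decFac^M
sig2_final   # about 4.5e-5
```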
The histogram:
```{r}
hist(vals,breaks=100,xlim=c(xmin,xmax))
```
The trace...this is most interesting:
```{r}
plot(vals,cex=.1,type="l")
```
Notice now that the trace of the Markov chain funnels down to the value $x=0$, which is where $f(x) = x^2$ attains its minimum on the interval $(-3,3)$.
You might think this is a hard way to find the minimum of $x^2$. You're right, but the idea has wonderful applications in more complicated minimization problems.
The key components are:
* A well defined *state space* for the state variable $x$.
* A natural *proposal transition* on the state space.
* A function $f(x)$ on the state space that we want to minimize.
Implementation of simulated annealing can usually be broken into three steps.
* Create a *proposal* function. This very much depends on the problem under consideration.
* Create a *doMove* function. This is often very similar across applications.
* Set up an initial value of $\sigma^2$ and a method to slowly decrease $\sigma^2$ as the main loop progresses.
With any luck, and some fiddling around, you can usually design this process so that the minimum value of $f(x)$ is identified.
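One way to package these steps, shown here as a sketch (the function name and defaults are our own choices, not a fixed recipe), is a driver that takes the three components as arguments:

```{r}
# A generic simulated-annealing driver (a sketch, not the only design).
anneal <- function(f, proposal, init, M=100000, sig2=1, decFac=0.9999){
  currState <- init
  for(m in 1:M){
    propState <- proposal(currState)
    if(runif(1) < exp(-(f(propState) - f(currState))/sig2))
      currState <- propState
    # slowly decrease sig2
    sig2 <- sig2 * decFac
  }
  currState   # final state: an estimate of the minimizer
}
# Example: minimize x^2 on (-3, 3) with a step of size 1/100
set.seed(42)
step <- function(s) min(max(s + sample(c(-1,1),1)/100, -3), 3)
est <- anneal(function(x){ x^2 }, step, init=runif(1,-3,3))
est
```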
## Exercise: Repeat with a different function
This process can work for any function.
We find a minimum of $f(x)$ by running the Markov chain agent model while slowly decreasing the value of $\sigma^2$, hoping the chain locks onto the minimizer.
Try it with the following function on the interval $(-3,3)$.
```{r}
f <- function(x) {
(x-2)*(x-1)*x*(x-2.4)
}
```
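Before running the chain, it is worth plotting this quartic on the interval: it has two local minima, so a run can settle into the wrong one. (This plotting chunk is our own hint; the `ylim` range is chosen so both minima are visible, and the function is renamed `fx` to avoid clobbering the `f` you just defined.)

```{r}
fx <- function(x){ (x-2)*(x-1)*x*(x-2.4) }
xs <- seq(-3, 3, len=200)
plot(xs, fx(xs), cex=.3, ylim=c(-2, 5))
```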
Run your Markov chain several times and compare the output.
If you finish this before the end of class, try some other functions. Can you consistently find the minimum values?