That said, I recently had to perform an optimization of a non-linear function (if you don't know what that means, don't worry), and NLOPT is a really fast way to do that. I nearly pulled my hair out trying to get NLOPT to play nicely with R and Julia on my computer, so I thought I'd share the end product to hopefully prevent a few meltdowns among other people who didn't grow up programming or have never taken a class in C.
Anywho, here's the story:
R has lots of ways to farm work out to existing C libraries, which are super fast. One way to perform optimization (for example, maximum likelihood) is to use C libraries that can minimize any function (for example, by taking the negative of the likelihood and minimizing it). Smart people have written these algorithms, and, if you have some non-standard estimation to do, the C libraries are a good option in R.
The nloptr package in R interfaces with NLOPT, which is a C library (a collection of different functions written in C) that has some great optimization routines. For example, the Nelder-Mead simplex algorithm was the first non-linear optimization routine on which I cut my teeth. "Non-linear" here just means (*lots of hand waving*) that the objective function doesn't have to be linear in its parameters, but the routines also work on simpler problems, such as linear regression. Everyone (well, everyone who still cares at this point) understands linear regression, so it's a good starting point for learning the NLOPT routines.
Here are the steps I had to carry out (not necessarily the ones I DID carry out, since those involved heavy drinking and therapy):
- Buy a Mac and make yourself an admin user (search Google - I did this a while ago)
- Install Xcode command line tools (see above)
- Install the NLOPT library on your mac
- Install the "nloptr" package in R
- Grossly over-code a linear regression in order to understand how to use nloptr
Let's skip to step 3:
Download the nlopt source code tarball into /Users/foo/Downloads.
In terminal, run this:
cd /Users/foo/Downloads
tar -xvf nlopt-2.3.tar.gz
cd nlopt-2.3
./configure --enable-shared --prefix="/usr"
make
sudo make install
The "./configure" statement is key: it tells your computer to set up NLOPT as a shared library under the "/usr" prefix. I have used some programs *cough* Julia *cough* that could only handle shared libraries with this particular prefix (I'm not sure if R is the same, but try this way of configuring if it doesn't work right away). The sudo command just runs the install step with administrator privileges, which is why it prompts for your password.
Now to the R. This is basic:
library("nloptr")
library("ggplot2")
#help(package="nloptr") #if you didn't know this one
Make the data and define the objective function to optimize. In this case, since we are doing linear regression, it's just the normal log-likelihood.
N=100
set.seed(8675309)
x = rnorm(N)
y = rnorm(N, 0 + .15*x)
normal_likelihood = function(beta, x, y){
  mu = beta[1] + beta[2]*x
  sigma = beta[3]
  #contribution of each individual to the log likelihood
  #(note the leading minus sign on the log(2*pi)/2 term)
  lli <- -log(2*pi)/2 - log(sigma^2)/2 - 1/(2*sigma^2)*(y-mu)^2
  logLik <- sum(lli)
  logLik
}
mystartvals = c(0, .4, 1)
normal_likelihood(beta=mystartvals, x, y)
This is the log-likelihood of the data, given the model, evaluated at the starting values:
[1] -141.9231
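As a quick sanity check (a sketch that repeats the simulated data so the snippet stands alone), the hand-rolled log-likelihood should agree with the sum of log densities from R's built-in dnorm():

```r
# sanity check: the hand-rolled normal log-likelihood should match
# the sum of log densities from dnorm(..., log = TRUE)
set.seed(8675309)
N <- 100
x <- rnorm(N)
y <- rnorm(N, 0 + .15 * x)

loglik_byhand <- function(beta, x, y) {
  mu    <- beta[1] + beta[2] * x
  sigma <- beta[3]
  sum(-log(2 * pi) / 2 - log(sigma^2) / 2 - (y - mu)^2 / (2 * sigma^2))
}

beta0   <- c(0, .4, 1)
byhand  <- loglik_byhand(beta0, x, y)
builtin <- sum(dnorm(y, mean = beta0[1] + beta0[2] * x, sd = beta0[3], log = TRUE))
all.equal(byhand, builtin)  # TRUE
```

If these two numbers disagree, the likelihood function has a bug, and the optimizer will happily optimize the wrong thing.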
Now let's have fun! One way to visualize optimization is to plot the objective function (the log-likelihood, in this case) over a set of parameter values. I'll use the ggplot2 package and the stat_contour statement to make a contour plot. I usually find contour plots extremely hard to interpret, but this one is easy. This amounts to a "grid search" for the maximum likelihood estimate, i.e. I evaluate the objective function over a grid of points and then just pick out the grid point that gives me the highest likelihood (or, equivalently, the lowest value of the deviance = -2*logLik, here).
#plotting the log likelihood (assume sigma=1)
alongb1 = seq(-.5, .5, 0.01)
alongb2 = seq(-.5, .5, 0.01)
store = matrix(nrow = length(alongb1)*length(alongb2), ncol=3)
idx = 1
for(i in alongb1){
  for(j in alongb2){
    store[idx,] = c(i, j, -2*normal_likelihood(beta=c(i, j, 1), x, y))
    idx = idx + 1
  }
}
dat = data.frame(store)
names(dat) = c("beta1", "beta2", "Dev")
str(dat)
#this is the maximum likelihood estimate along our grid of points
mindev = dat[dat$Dev==min(dat$Dev),]
z = ggplot(dat, aes(beta1, beta2, z=Dev)) +
  stat_contour(binwidth=3.84) +
  theme_bw() +
  geom_point(data=mindev, aes(beta1, beta2)) +
  geom_text(data=mindev, aes(beta1, beta2),
            label=paste0("MLE = (", round(mindev$beta1, 2), ", ", round(mindev$beta2, 2), ")"),
            vjust=2)
#grid-search MLE for (beta1, beta2), given sigma = 1
ggsave("/Users/foo/Desktop/contour.png", z)
Note here that I am setting sigma to be 1. Since I simulated the data, I know this to be true, but fixing it is also necessary to make this function plottable (how would you plot in four dimensions?). Here's the result, very nice! Now let's optimize this function! The point of optimization is to do essentially what we did with the plot, so let's break down the steps:
- set up constraints (in our grid search, we decided to look only over the range of beta1 and beta2 shown on the axes of the plot)
- evaluate that function over a range of values (in the grid search, these values were pre-defined, but during optimization these values will be chosen by the optimization algorithm according to the "shape" of the objective function).
- Pick the values of the parameters where the objective function is optimal (minimized or maximized, depending on how you set up the objective function)
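The three steps above can be sketched in one dimension with base R's optimize() (a deliberately simplified version of our problem: fix beta1 = 0 and sigma = 1, and search only over beta2, just like a one-axis slice of the grid):

```r
# one-dimensional sketch of constrain/evaluate/pick:
# fix beta1 = 0 and sigma = 1, and minimize over beta2 alone
set.seed(8675309)
N <- 100
x <- rnorm(N)
y <- rnorm(N, 0 + .15 * x)

# with beta1 = 0 and sigma = 1, the deviance is (up to additive
# constants) just the sum of squared residuals
obj <- function(b2) sum((y - b2 * x)^2)

# step 1: constrain the search to [-0.5, 0.5] (as in the grid search)
# steps 2-3: let the optimizer choose the evaluation points and pick the minimum
fit <- optimize(obj, interval = c(-0.5, 0.5))
fit$minimum  # should sit near the slope the grid search found
```

Unlike the grid search, the optimizer decides where to evaluate next based on the values it has already seen, so it needs far fewer evaluations to pin down the minimum.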
Here that is with the "nloptr" package. First, set up starting values and constraints (upper and lower bounds on beta1, beta2, and sigma). Note that most of my bounds are infinite (or negatively infinite), except for sigma, which has a lower bound of 0 (you can't have a negative standard deviation!).
#using NLOPT
#starting values (repeated from above)
mystartvals = c(0, .4, 1)
#lower bounds
mylb = c(-Inf, -Inf, 0)
#upper bounds
myub = c(Inf, Inf, Inf)
Next, define the objective function to minimize (the default is minimization in NLOPT). The arguments for the functions should be only the parameters you are trying to minimize over. In this case, that is beta1, beta2, and sigma. Thus, I create a "wrapper" function for my original log likelihood where the only argument is a vector of length three that corresponds to beta1, beta2, and sigma.
#wrapper function (avoids the need to pass x and y, and gives us a function to be minimized)
normal_likelihood_foropt = function(beta){
  #deviance = -2 * log-likelihood, so minimizing this maximizes the likelihood
  -2*normal_likelihood(beta, x=x, y=y)
}
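The wrapper relies on x and y floating around in the global environment. A tidier variant (just a sketch; make_deviance is my own name, not anything from nloptr) is a "factory" function that returns an objective with the data baked in as a closure:

```r
# closure factory: returns an objective function that carries its data
# with it, so nothing depends on global variables
make_deviance <- function(x, y) {
  function(beta) {
    mu    <- beta[1] + beta[2] * x
    sigma <- beta[3]
    -2 * sum(-log(2 * pi) / 2 - log(sigma^2) / 2 - (y - mu)^2 / (2 * sigma^2))
  }
}

set.seed(8675309)
N <- 100
x <- rnorm(N)
y <- rnorm(N, 0 + .15 * x)

obj <- make_deviance(x, y)
obj(c(0, .4, 1))  # the deviance at the starting values
```

Either way, the function handed to the optimizer takes only the parameter vector, which is exactly what nloptr expects.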
And without further ado, let's minimize!
nloptr(x0 = mystartvals,
       eval_f = normal_likelihood_foropt,
       eval_grad_f = NULL, #optional gradient function (needed for some algorithms)
       lb = mylb,
       ub = myub,
       #options list available by submitting nloptr.print.options()
       opts = list(algorithm = "NLOPT_LN_NELDERMEAD", maxeval = 1000, xtol_rel = 10e-8))
The only thing to note from the above is the "opts" part of the "nloptr" function call. I use the Nelder-Mead simplex algorithm (since it doesn't require me to find any derivatives), and set the xtol_rel value to 10e-8 (which in R's notation is 1e-7, i.e. 10 times 10 to the minus 8). This makes sure that the optimization doesn't stop until each parameter changes by less than that amount, relative to its absolute value (so it accepts larger error for very large parameter values). "maxeval" just tells the algorithm not to stop before 1000 iterations if it hasn't successfully minimized the function - easy enough! Because the Nelder-Mead algorithm does not require the objective function's gradient (derivative), it is a true lazy person's algorithm. Hence, my choice. It might also be your choice if there is no good way to calculate a derivative. One tradeoff is that it can take a while to converge (196 iterations here - good thing NLOPT is fast!).
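If you want a second opinion without the C library, base R's optim() ships its own Nelder-Mead implementation. A minimal cross-check on the same problem (a sketch that regenerates the data; the Inf guard on sigma is my addition, the standard trick for keeping optim's Nelder-Mead away from impossible values):

```r
# cross-check: base R's optim() also implements Nelder-Mead
set.seed(8675309)
N <- 100
x <- rnorm(N)
y <- rnorm(N, 0 + .15 * x)

deviance_fn <- function(beta) {
  sigma <- beta[3]
  if (sigma <= 0) return(Inf)  # keep the simplex out of sigma <= 0
  mu <- beta[1] + beta[2] * x
  -2 * sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

fit <- optim(c(0, .4, 1), deviance_fn, method = "Nelder-Mead",
             control = list(maxit = 1000, reltol = 1e-10))
round(fit$par, 4)  # should land close to nloptr's optimal controls
```

Same algorithm, different implementation, so the estimates should agree to several decimal places.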
Here's the output from nloptr.
Call:
nloptr(x0 = mystartvals, eval_f = normal_likelihood_foropt, eval_grad_f = NULL,
    lb = mylb, ub = myub, opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
    maxeval = 1000, xtol_rel = 0.0000001))

Minimization using NLopt version 2.3.0

NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because
xtol_rel or xtol_abs (above) was reached. )

Number of Iterations....: 196
Termination conditions: maxeval: 1000 xtol_rel: 0.0000001
Number of inequality constraints: 0
Number of equality constraints: 0
Optimal value of objective function: 280.557409538706
Optimal value of controls: -0.05436936 0.2180665 0.9839782

Now compare with a linear fit (using GLM so we can compare the sigma values):
> summary(glm(y ~ x, dat, family=gaussian))
...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05437 0.09956 -0.546 0.5862
x 0.21807 0.10752 2.028 0.0453
(Dispersion parameter for gaussian family taken to be 0.9879727)
...
As you can see, the results are very close for beta1 = (intercept), beta2 = x, and sigma = the square root of the dispersion parameter (sort of - the remaining gap between my sigma estimate and glm's comes down to degrees of freedom, which is for a statistics text to explain). Both are also very close to the values we found with the grid search, plotted above (assuming sigma = 1).
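That degrees-of-freedom point is easy to see directly (again regenerating the same simulated data so the snippet stands alone): the MLE of sigma^2 divides the residual sum of squares by n, while glm's gaussian dispersion divides by n - 2.

```r
# MLE of sigma^2 uses RSS/n; glm's dispersion estimate uses RSS/(n - 2)
set.seed(8675309)
N <- 100
x <- rnorm(N)
y <- rnorm(N, 0 + .15 * x)

fit <- glm(y ~ x, family = gaussian)
rss <- sum(residuals(fit)^2)

c(sigma2_mle = rss / N, dispersion = rss / (N - 2))
all.equal(rss / (N - 2), summary(fit)$dispersion)  # TRUE
```

With N = 100 the two differ by only a factor of 100/98, which is why the estimates above look so similar.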
Optimization is worth learning if for no other reason than to help understand some of the black-boxiness that typically goes into models. It's still very black-boxy, but at least, by the time you're done with this process, you've had to think about some of the assumptions your results are based on. Happy optimizing!