Monday, October 21, 2013

Julia: sometimes it's just better not to multitask

As a follow-up to my last post regarding julia, here is the faster, vectorized version of one of the functions I created last time, which makes a dataset for use in simulations:

#vectorized
function mk_data(N, dummy::Int64)
    #the dummy::Int64 argument exists only to dispatch to this vectorized method
    z::Array{Int64,1} = rand_bernoulli(N, 0.5)
    x::Array{Int64,1} = rand_bernoulli(N, 1/(1+exp(-(log(1) - log(2)*.5 + .*(log(2),z)))))
    #TheTruth is a global coefficient defined in the previous post
    y::Array{Int64,1} = rand_bernoulli(N, 1/(1+exp(-(log(1) + log(2)*.5 - TheTruth*.5 - .*(log(2),z) + .*(TheTruth,x)))))
    dat = DataFrames.DataFrame(["z"=>z,"x"=>x,"y"=>y])
end

Well, if you have a computer that can handle parallel processing (and, let's face it, if you are reading a write-up about a not-quite-yet-totally-useful programming language, you do), then you can eke just a bit more speed out of some of these functions by using more of your processors.

Here is the even faster version of the mk_data function (start julia with "julia -p n", where n is the number of cores, or "workers", you wish (and have) to make available to julia):

#vectorized, parallel
function mk_data_parallel(N, dummy::Int64)
    #better way, using vectors, parallel processing
    Z::RemoteRef = @spawn rand_bernoulli(N, 0.5)
    z::Array{Int64,1} = fetch(Z)
    X::RemoteRef = @spawn rand_bernoulli(N, 1/(1+exp(-(log(1) - log(2)*.5 + .*(log(2),z)))))
    x::Array{Int64,1} = fetch(X)
    #z and x are already local arrays here, so no extra fetch is needed below
    Y::RemoteRef = @spawn rand_bernoulli(N, 1/(1+exp(-(log(1) + log(2)*.5 - TheTruth*.5 - .*(log(2),z) + .*(TheTruth,x)))))
    y::Array{Int64,1} = fetch(Y)
    dat = DataFrames.DataFrame(["z"=>z,"x"=>x,"y"=>y]) #found this using "index(data)"
end

How much faster, you ask? Well... a little:
samples = 5000000
@time for(i in 1:20) mk_data(samples, 1);end
@time for(i in 1:20) mk_data_parallel(samples, 1);end
elapsed time: 16.037076406 seconds (15263312584 bytes allocated)
elapsed time: 14.392936369 seconds (15263391460 bytes allocated)


This function makes a nice candidate for julia's @spawn macro. @spawn works by sending the calculation off to one of the workers and then immediately starting on the next process. This involves the extra step of fetching the result back from the worker (literally, the fetch function). Think of it as sending one of your minions to do a job for you: you won't know whether it's done (or how much faster they are than you) until you fetch the results from them.
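A stripped-down sketch of the pattern (using only Base functions, which every worker already knows, so nothing needs to be loaded first):

```julia
#farm one computation out to a worker, keep computing locally meanwhile
A = @spawn sum(rand(5000000))  #returns a RemoteRef immediately
b = sum(rand(5000000))         #runs here while the worker works
a = fetch(A)                   #blocks until the worker's result arrives
```

With no workers available, @spawn just runs the job on the local process, so the sketch works either way.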
So while the Z vector is being created, julia has already started multitasking and putting together X. This works nicely when these are large vectors. Under a smaller sample (but with more iterations run, to really show the difference):

samples = 50000
@time for(i in 1:2000) mk_data(samples, 1);end
@time for(i in 1:2000) mk_data_parallel(samples, 1);end
elapsed time: 26.610033651 seconds (15298193488 bytes allocated)
elapsed time: 27.616185355 seconds (15313162140 bytes allocated)

The parallel version is slower! The @spawn macro involves a bit of overhead, since it must move things back and forth between the various workers. When that shuttling takes long relative to the time saved by having multiple processes operating simultaneously, parallel operations may not help and may even cost you efficiency. The takeaway here is that all of the whizbang gadgetry in the world can't save us time if we spend too much of it on the very human process of trudging from task to task. Efficiency is multi-dimensional. I'm looking at you, laptop!



One implementation note: in order to run code on multiple processes, each process needs to know the functions you are calling. For user-defined functions, this means placing them into a file (sample.jl) and then loading it with something like
require("sample.jl")

which automatically makes the code in that file available to all processes. The file need only contain the code that you will be calling with @spawn (or with the remotecall_fetch function shown below). My sample.jl file looks like this:

function rand_bernoulli(N, mu::Float64)
    x::Array{Int64,1} = Array(Int64,N)
    for i in 1:N
        x[i] = rand() < mu
    end
    x
end
#vectorized
function rand_bernoulli(N, mu::Array{Float64, 1})
    x::Array{Int64,1} = rand(N) .< mu
    x
end
function do_logistic(dat)
    res = GLM.glm(:(y ~ x + z), dat, Binomial(), GLM.LogitLink())
    GLM.coef(res)
end
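Once the file is loaded, any worker can run those functions. A quick session sketch (hypothetical, assuming julia was started with at least one extra process and sample.jl is on the load path):

```julia
require("sample.jl")               #defines rand_bernoulli and do_logistic on every process
R = @spawn rand_bernoulli(10, 0.5) #handed off to a worker, not run locally
draws = fetch(R)                   #an Array{Int64,1} of ten 0/1 draws
```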


A second way to do parallel processing with julia is through the remotecall or remotecall_fetch functions. The remotecall function takes a process (worker) number as an argument, so it is slightly more tunable than @spawn. Here are two examples of the function I used last time: the first is familiar and the second is parallelized. The remotecall_fetch function sends the "do_logistic" function, applied to the data frame "dat", to the 1st and 2nd processes. Because the parallelized version uses a smaller loop and more vectorization, it's a slightly more efficient process anyway.

function rep_logistic2(N, iter)
    res = Array(Float64,iter,3)
    for i in 1:iter
        #the dummy Int64 argument dispatches to the vectorized mk_data method
        dat::DataFrame = mk_data(N,1)
        res[i,1:3] = do_logistic(dat)
    end
    DataFrames.DataFrame([res], ["bz", "bx", "by"])
end
#parallel version
function rep_logistic2parallel(N, iter) #iter should be even
    res = Array(Float64,int(iter/2),3)
    res2 = Array(Float64,int(iter/2),3)
    for i in 1:int(iter/2)
        #the dummy Int64 argument dispatches to the vectorized mk_data method
        dat::DataFrame = mk_data(N,1)
        res[i,1:3] = remotecall_fetch(1,do_logistic, dat)
        res2[i,1:3] = remotecall_fetch(2,do_logistic, dat) #worker ids depend on the number of processes
    end
    DataFrames.DataFrame([res;res2], ["bz", "bx", "by"])
end
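For completeness, both versions take the same arguments, so a run would look something like this (an illustrative call with made-up sizes, not a benchmark):

```julia
#100 replications, each a logistic regression on N = 1000 observations
coefs = rep_logistic2(1000, 100)           #single process
coefsP = rep_logistic2parallel(1000, 100)  #regressions farmed out to workers 1 and 2
```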

Oops!


Turns out, I can't show any run-times comparing this last set of functions. I'm on the "nightly build" train with julia, which occasionally means that some features don't work. Before trying out the previous function, I built an updated julia that apparently has a little bug in the parallel features that were working for me in older versions. Such is life on the bleeding edge!

[Update: I've since re-built julia and restored parallel functionality. However, julia generally lets me run each parallelized function once before suffering a kernel re-start. There are still things to be worked out with parallel computing in julia. This is also one of those concrete signs that julia is not *quite* ready to become a primary language for many projects. Still... progress!]

Verdict for parallelism: sort of hot/not
