Speed of Mimi Monte Carlo and Some Other Issues

I need to run millions of model simulations, so I’ve been messing around with different ways of getting this to be fast using Mimi. One issue I’m having trouble with is whether the internal Mimi Monte Carlo set-up is faster/slower than a Monte Carlo that I code up from scratch.

Here’s the basic scenario: I run DICE with millions of different climate sensitivity values. For each run, I want to return the sum of discounted climate damages. My Monte Carlo from scratch looks something like this:

for i = 1:1000000
    update_param!(m, :t2xco2, ECS_sample[i])
    npv_damages[i] = sum(df .* m[:neteconomy, :DAMAGES])

where …
m = Mimi version of DICE,
df = discount factors,
npv_damages = an empty array to store my results,
ECS_sample = a vector of climate sensitivity values.

When I use the internal Mimi approach for a Monte Carlo, I ended up with a result that had type Array{Dict{Tuple,DataFrames.DataFrame},1}. I then had to extract my results from that to be able to carry out some calculations, but it was in a “melted form”, so I next needed to reshape the annual damages so they were isolated by the individual Monte Carlo simulations…and then I could discount and sum them up. Ultimately there seemed to be so many steps to sort out my results that I gave up.

So my basic question is (1) Is there an easy, quick way to do a Monte Carlo in Mimi and then modify the results without all of the reshaping steps (e.g. I take the mean of damages for each simulation). A related question that I couldn’t figure out is (2) how do I match the uncertain parameters used for an individual simulation to the actual results in the Mimi Monte Carlo?

Hi Frank,

Just to clarify, how did you end up with the Array{Dict{Tuple,DataFrames.DataFrame},1} result? Was that from a post_trial_func or something? I would’ve thought you would use save in your run_sim to save your results for the parameter of interest i.e. save(neteconomy.DAMAGES) and then post process those arrays after running your simulation? The parameter results would be saved in a csv, and then you could import that csv file with CSVFiles.load(DAMAGES.csv) |> DataFrame if you wanted it as a DataFrame, or skip that to get an array. Each row of the csv has the year, value, and trial number, so I guess that would require one reshape but shouldn’t be too complicated?

The Array{Dict{Tuple...} thing is what popped up after I just ran the Mimi Monte Carlo and swapped in different climate sensitivity values and accessed it with my_sim.results. I didn’t save it as a .csv because I just wanted the discounted sum of damages for each run. So saving it to a .csv would mean I would have to save something with “N” simulations x “Y” model years, then reload it, then do another Monte Carlo over each simulation to discount the damages and sum them up.

Am I correct that the Mimi Monte Carlo is mostly designed to swap in different model parameters, but probably isn’t the right tool if there’s more complicated analysis occurring within each simulation? I saw the post_trial_func thing in the documentation, but couldn’t really tell what it did.

@plevin looping you in here

Looking at the code you wrote, I see that the loop would work well, and also think that the MCS functionality would be useful in this case with the use of a post_trial_func like we use in FUND for SCC calculations. I do plan to build out the tutorials for MC soon, just have had a lot on my plate!

The my_sim.results access is not supposed to be part of the API, which is why the data structure is so wonky. If you saved your variable of interest to a .csv, why would you need to Monte Carlo over each simulation again to discount the damages and sum them up? In that second run what would Monte Carlo be achieving? It seems you could simply run this line npv_damages[i] = sum(df .* m[:neteconomy, :DAMAGES]) for each simulation in a for loop over the .csv loaded results? Regardless I think this is exactly what a post_trial_func could also help you do like we do with FUND below.

Yes, Mimi Monte Carlo is designed to swap model parameters, but the more complicated analysis within each simulation would be assumed to be done either in a post_trial_func or within the model itself. I think what you want to do could be handled in a post_trial_func, which is run for each simulation. Cora uses one within FUND to calculate the SCC after each simulation: https://github.com/fund-model/MimiFUND.jl/blob/master/src/montecarlo/run_fund_scc_mcs.jl that might be a good template for you although you won’t need to do the marginal run portion so yours would be quite a bit simpler.

Sorry if I’m missing something here, I do think that this functionality should be able to do what you want though so I want to make sure I understand the problem.

Thanks for the response. For the second Monte Carlo run I was just being sloppy with language. In my head any for loop is basically a Monte Carlo. What I meant was I would have to do the Monte Carlo with the Mimi model, then save the results as a .csv, and then do a second for loop to calculate discounted damages… when I could have just done all this in the initial Monte Carlo run.

It seems like the post_trial_func is probably the answer for most simple cases. In the SCC example you sent, if I wanted to calculate multiple SCCs with different discount rates, would I need to either (1) re-run the entire Monte Carlo each time, (2) just do the Monte Carlo, save the base/marg damages in a csv, and then calculate the SCC in my own loop/function, or (3) set up some super post_trial_func that calculates the SCC for multiple discount rates? Or is it somehow possible to run the Monte Carlo once, but the post_trial_func multiple times where I can pass it different discount rates each time? Using the Mimi Monte Carlo with options 1-3 would technically work for the actual code I’m running, but they’d be slower/less convenient than coding up everything on my own.

**Edit: I just saw in the SCC example you sent that there’s an extra scenario function you have to create to pass around that could hold discount rates? I guess that would solve the issue I mentioned above.

Hey Frank! I think this will be the perfect example of using the post trial function with a Monte Carlo simulation, and you can do multiple discount rates within it (if you are okay with the NPV for each discount rate being calculated on the same trial data, which it seems like that is your intention). I would advise against using the other scenario function feature in this case, as that is intended to be used for larger changes to the actual model parameters, and separate sampling happens for each set of scenarios.

I’ve written some pseudo code below to describe how I would imagine you doing your described calculations within the Mimi capability. Please let me know if I’m missing any steps you needed or if it doesn’t make sense!

I am super willing to help you get this set up and running for you because it really should be faster than your own code. An important advantage is that in your own code, each time you call update_param!, the model instance is decached from your model, so when you call run(m) it has to rebuild the model each time, which will make it slower (especially over millions of runs). If you use the built in Mimi Simulation capability, it will automatically swap in the sampled climate sensitivity values into the model instance so that it doesn’t have to be rebuilt.

# define your trial number
trials = 1000000    # or however many you want

# define your desired discount rates and pre compute the discount factors
discount_rates = [0.025, 0.03, 0.05]
dfs = [calculate_df(rate) for rate in discount_rates]    # need to define or replace calculate_df

# make an array to store the npv damages results to during the post trial function
npv_results = zeros(trials, length(discount_rates))    

# define your Monte Carlo simulation
mcs = @defsim begin
    t2xco2 = MyDistribution()     # what type of distribution are you sampling from? I can help you set this up if it isn't obvious

# define your post trial function; this is the required type signature, even though we won't use all of the arguments
function my_npv_calculation(mcs::Simulation, trialnum::Int, ntimesteps::Int, tup::Tuple)
    m = mcs.models[1]    # access the model after it is run for this trial
    damages = m[:neteconomy, :DAMAGES]    # access the damage values for this run
    for (i, df) in enumerate(dfs)    # loop through our precomputed discount factors
        npv_results[trialnum, i] = sum(df .* damages)    # do the npv calculation and save it to our array of results
    nothing    # return nothing

# set the model, generate trials, and run the simulation
set_models!(mcs, m)
generate_trials!(mcs, trials; filename = "ECS_sample.csv")   # providing a file name is optional; only use if you want to see the climate sensitivity values later
run_sim(mcs; post_trial_func = my_npv_calculation)

# do something with the npv_results array
println(mean(npv_results, dims=2))    # or write to a file

@ckingdon95 thank you for putting this into clear language, I couldn’t quite recall but knew the gist of it and this is an important point about speed!

I just realized I’m not sure if you are just trying to calculate the npv of damages for one model, but if your end goal is to do an SCC calculation across a base and marginal models, it could look like this:

scc_results = zeros(trials, length(discount_rates))

function my_scc_calculation(mcs::Simulation, trialnum::Int, ntimesteps::Int, tup::Tuple)
    base, marginal = mcs.models
    base_damages = base[:neteconomy, :DAMAGES]
    marg_damages = marginal[:neteconomy, :DAMAGES]
    for (i, df) in enumerate(dfs)
        scc_results[trialnum, i] = sum(df .* (marg_damages .- base_damages))

# Build the base and marginal models
base = get_dice_model()
marginal = get_dice_model_with_marginal_emissions(year)    # pseudo function...

set_models!(mcs, [base, marginal])
generate_trials!(mcs, trials; filename = "ecs_sample.csv")
run_sim!(mcs; post_trial_func = my_scc_calculation)

Hey Cora,

Thanks for taking the time to write this out, it’s super helpful! The actual stuff I’m doing is a bit more complicated than an SCC calculation. That (and the discounted damages) were just examples I was using to try and understand the Mimi Monte Carlo. But I think I have a good handle on what it involves now thanks to you and Lisa. If I give it a test I’ll let you know how it does speed wise, but I must admit I’m a bit hesitant to adopt an approach that forces so much structure on my code. Two quick questions:

(1) By any chance, has anybody tried to get the Mimi Monte Carlo set up to run in parallel?

(2) Is there an alternative to update_param! so that calling run(m) doesn’t have to rebuild the model each time?

(1) unfortunately no, not yet. Although it is high on our development priority list (to at least start scoping it out to see how easy/difficult it would be)
(2) There’s no “sanctioned” way of doing this (i.e. nothing in the public API) but the following code would do it. This would be frowned upon because it uses internal functions and dot notation to access values, and we cannot guarantee that these internal features won’t be changed at some point.

function _update_external_param!(model::Model, paramname::Symbol, value)
    param = Mimi.external_param(model.mi.md, paramname)
    param.value = value