Multiple replications#
We will extend the Experiments notebook by modifying our single_run
model wrapper function so that it runs multiple independent replications.
For this example, we will ignore common random number streams across scenarios.
The first part of the notebook is set up in a similar manner to the Experiments notebook, but we make one small modification to the Experiment
class: results collection is reset at the start of each replication. The final section modifies single_run,
creates a new wrapper function, and uses it to run multiple independent replications.
1. Imports#
import numpy as np
import pandas as pd
import simpy
import itertools
2. Notebook level variables, constants, and default values#
A useful first step when setting up a simulation model is to define the base case or as-is parameters. Here we will create a set of constant/default values for our Experiment
class, but you could also consider reading these in from a file.
# default resources
N_OPERATORS = 13
# default mean inter-arrival time (exp)
MEAN_IAT = 60 / 100
# default service time parameters (triangular)
CALL_LOW = 5.0
CALL_MODE = 7.0
CALL_HIGH = 10.0
# Seeds for arrival and service time distributions (for repeatable single run)
ARRIVAL_SEED = 42
CALL_SEED = 101
# Boolean switch to display simulation results as the model runs
TRACE = False
# run variables
RESULTS_COLLECTION_PERIOD = 1000
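As a quick sanity check on these defaults, the mean inter-arrival time of 60/100 minutes corresponds to an arrival rate of 100 calls per hour:

```python
MEAN_IAT = 60 / 100  # minutes between arrivals
arrival_rate_per_hour = 60 / MEAN_IAT
print(arrival_rate_per_hour)  # 100.0
```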
3. Distribution classes#
We will also define two distribution classes (Triangular
and Exponential
) to encapsulate the random number generation, parameters, and random seeds used in sampling. This simplifies what we need to include in the Experiment
class and, as we will see later, makes it easier to vary distributions as well as parameters.
class Triangular():
'''
Convenience class for the triangular distribution.
packages up distribution parameters, seed and random generator.
'''
def __init__(self, low, mode, high, random_seed=None):
'''
Constructor. Accepts and stores parameters of the triangular dist
and a random seed.
Params:
------
low: float
The smallest value that can be sampled
mode: float
The most frequently sampled value
high: float
The highest value that can be sampled
random_seed: int, optional (default=None)
Used with params to create a series of repeatable samples.
'''
self.rand = np.random.default_rng(seed=random_seed)
self.low = low
self.high = high
self.mode = mode
def sample(self, size=None):
'''
Generate one or more samples from the triangular distribution
Params:
--------
size: int
the number of samples to return. If size=None then a single
sample is returned.
Returns:
-------
float or np.ndarray (if size >=1)
'''
return self.rand.triangular(self.low, self.mode, self.high, size=size)
class Exponential():
'''
Convenience class for the exponential distribution.
packages up distribution parameters, seed and random generator.
'''
def __init__(self, mean, random_seed=None):
'''
Constructor
Params:
------
mean: float
The mean of the exponential distribution
random_seed: int, optional (default=None)
A random seed to reproduce samples. If set to None then a
unique stream of samples is generated.
'''
self.rand = np.random.default_rng(seed=random_seed)
self.mean = mean
def sample(self, size=None):
'''
Generate a sample from the exponential distribution
Params:
-------
size: int, optional (default=None)
the number of samples to return. If size=None then a single
sample is returned.
Returns:
-------
float or np.ndarray (if size >=1)
'''
return self.rand.exponential(self.mean, size=size)
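Both classes are thin wrappers around numpy's default_rng generator. A minimal standalone check (separate from the model) confirms that fixing the seed makes the sampled values repeatable:

```python
import numpy as np

# two generators seeded identically produce identical triangular samples
rng1 = np.random.default_rng(seed=101)
rng2 = np.random.default_rng(seed=101)
samples1 = rng1.triangular(5.0, 7.0, 10.0, size=3)
samples2 = rng2.triangular(5.0, 7.0, 10.0, size=3)
print(np.allclose(samples1, samples2))  # True
```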
4. Experiment class#
An experiment class is useful because it allows us to easily configure and schedule a large number of experiments in a loop. We set the class up so that it uses the default variables defined above, i.e. by default the model reflects the as-is process. To run a new experiment we simply override the default values.
We have now modified the experiment class to include the
init_results_variables()
method. As we are going to run experiments multiple times, we need to reset all results data collection before each run of the model; otherwise we would carry over the results from one replication to the next.
class Experiment:
'''
Parameter class for the 111 call centre simulation model
'''
def __init__(self, n_operators=N_OPERATORS, mean_iat=MEAN_IAT,
call_low=CALL_LOW, call_mode=CALL_MODE, call_high=CALL_HIGH,
arrival_seed=None, call_seed=None):
'''
The init method sets up our defaults.
'''
self.n_operators = n_operators
self.arrival_dist = Exponential(mean_iat, random_seed=arrival_seed)
self.call_dist = Triangular(call_low, call_mode, call_high,
random_seed=call_seed)
# resources: we must init resources inside of the arrivals process.
# but we will store a placeholder for them for transparency
self.operators = None
# initialise results to zero
self.init_results_variables()
# MODIFICATION: method to reset results collection
def init_results_variables(self):
'''
Initialise all of the experiment variables used in results
collection. This method is called at the start of each run
of the model
'''
# variable used to store results of experiment
self.results = {}
self.results['waiting_times'] = []
# total operator usage time for utilisation calculation.
self.results['total_call_duration'] = 0.0
5. Model code#
def trace(msg):
'''
Turns printing of simulation events on and off.
Params:
-------
msg: str
string to print to screen.
'''
if TRACE:
print(msg)
def service(identifier, args, env):
'''
simulates the service process for a call operator
1. request and wait for a call operator
2. phone triage (triangular)
3. exit system
Params:
------
identifier: int
A unique identifier for this caller
args: Experiment
The settings and input parameters for the current experiment
env: simpy.Environment
The current environment the simulation is running in.
'''
# record the time that call entered the queue
start_wait = env.now
# request an operator
with args.operators.request() as req:
yield req
# record the waiting time for call to be answered
waiting_time = env.now - start_wait
# store the results for an experiment
args.results['waiting_times'].append(waiting_time)
trace(f'operator answered call {identifier} at ' \
+ f'{env.now:.3f}')
# the sample distribution is defined by the experiment.
call_duration = args.call_dist.sample()
# schedule process to begin again after call_duration
yield env.timeout(call_duration)
# update the total call_duration
args.results['total_call_duration'] += call_duration
# print out information for patient.
trace(f'call {identifier} ended {env.now:.3f}; ' \
+ f'waiting time was {waiting_time:.3f}')
def arrivals_generator(env, args):
'''
Callers arrive with exponentially distributed inter-arrival times (IAT).
Parameters:
------
env: simpy.Environment
The simpy environment for the simulation
args: Experiment
The settings and input parameters for the simulation.
'''
# use itertools as it provides an infinite loop
# with a counter variable that we can use for unique Ids
for caller_count in itertools.count(start=1):
# MODIFICATION:the sample distribution is defined by the experiment.
inter_arrival_time = args.arrival_dist.sample()
yield env.timeout(inter_arrival_time)
trace(f'call arrives at: {env.now:.3f}')
# MODIFICATION: we pass the experiment to the service function
env.process(service(caller_count, args, env))
6. A multiple replications wrapper function#
def single_run(experiment, rc_period=RESULTS_COLLECTION_PERIOD):
'''
Perform a single run of the model and return the results
Parameters:
-----------
experiment: Experiment
The experiment/parameters to use with the model
rc_period: float, optional (default=RESULTS_COLLECTION_PERIOD)
The results collection period: the number of minutes to run the model.
'''
# results dictionary. Each KPI is a new entry.
results = {}
# reset all results variables to zero and empty
experiment.init_results_variables()
# environment is (re)created inside single run
env = simpy.Environment()
# we create simpy resource here - this has to be after we
# create the environment object.
experiment.operators = simpy.Resource(env, capacity=experiment.n_operators)
# we pass the experiment to the arrivals generator
env.process(arrivals_generator(env, experiment))
env.run(until=rc_period)
# end of run results: calculate mean waiting time
results['01_mean_waiting_time'] = \
np.mean(experiment.results['waiting_times'])
# end of run results: calculate mean operator utilisation
results['02_operator_util'] = (experiment.results['total_call_duration'] \
/ (rc_period * experiment.n_operators)) * 100.0
print('Experiment complete')
# return the results from the run of the model
return results
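To make the utilisation calculation at the end of single_run concrete, here is the same formula applied to hypothetical numbers (the values below are illustrative only, not model output):

```python
# utilisation = total operator busy time / (run length x no. of operators)
total_call_duration = 12_000.0  # hypothetical minutes spent on calls
rc_period = 1_000.0             # results collection period (minutes)
n_operators = 13
util = (total_call_duration / (rc_period * n_operators)) * 100.0
print(f'{util:.1f}%')  # 92.3%
```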
def multiple_replications(experiment,
rc_period=RESULTS_COLLECTION_PERIOD,
n_reps=5):
'''
Perform multiple replications of the model.
Params:
------
experiment: Experiment
The experiment/parameters to use with the model
rc_period: float, optional (default=RESULTS_COLLECTION_PERIOD)
results collection period.
the number of minutes to run the model to collect results
n_reps: int, optional (default=5)
Number of independent replications to run.
Returns:
--------
pandas.DataFrame
'''
# loop over single run to generate results dicts in a python list.
results = [single_run(experiment, rc_period) for rep in range(n_reps)]
# format and return results in a dataframe
df_results = pd.DataFrame(results)
df_results.index = np.arange(1, len(df_results)+1)
df_results.index.name = 'rep'
return df_results
TRACE = False
default_scenario = Experiment()
results = multiple_replications(default_scenario)
results
Experiment complete
Experiment complete
Experiment complete
Experiment complete
Experiment complete
| rep | 01_mean_waiting_time | 02_operator_util |
|---|---|---|
| 1 | 3.186282 | 94.502858 |
| 2 | 3.291959 | 92.245049 |
| 3 | 2.459465 | 93.051442 |
| 4 | 2.825698 | 91.802439 |
| 5 | 1.695967 | 88.281648 |
results.describe()
| | 01_mean_waiting_time | 02_operator_util |
|---|---|---|
| count | 5.000000 | 5.000000 |
| mean | 2.691874 | 91.976687 |
| std | 0.645812 | 2.307181 |
| min | 1.695967 | 88.281648 |
| 25% | 2.459465 | 91.802439 |
| 50% | 2.825698 | 92.245049 |
| 75% | 3.186282 | 93.051442 |
| max | 3.291959 | 94.502858 |
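Beyond describe(), a replication analysis usually reports a confidence interval for each KPI. A minimal sketch, assuming scipy is available, using the five mean waiting times reported above:

```python
import numpy as np
from scipy import stats

# mean waiting times from the five replications
waits = np.array([3.186282, 3.291959, 2.459465, 2.825698, 1.695967])
mean = waits.mean()
# 95% confidence interval half-width using the t distribution (n - 1 dof)
half_width = stats.t.ppf(0.975, df=len(waits) - 1) * stats.sem(waits)
print(f'{mean:.2f} +/- {half_width:.2f}')  # 2.69 +/- 0.80
```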
7. Multiple experiments#
The single_run
and multiple_replications
wrapper functions for the model and the Experiment
class mean that it is very simple to run multiple experiments using replication analysis. We will define three new functions for running multiple experiments:
- get_experiments() - returns a python dictionary containing a unique name for each experiment paired with an Experiment object
- run_all_experiments() - loops through the dictionary, runs all experiments, and returns the combined results.
- experiment_summary_frame() - takes the results from each scenario and formats them into a simple table.
def get_experiments():
'''
Creates a dictionary object containing
objects of type `Experiment` to run.
Returns:
--------
dict
Contains the experiments for the model
'''
experiments = {}
# base case
# we will sync scenarios by using seeds
experiments['base'] = Experiment(arrival_seed=ARRIVAL_SEED,
call_seed=CALL_SEED)
# +1 extra capacity
experiments['operators+1'] = Experiment(arrival_seed=ARRIVAL_SEED,
call_seed=CALL_SEED,
n_operators=N_OPERATORS+1)
return experiments
def run_all_experiments(experiments, rc_period=RESULTS_COLLECTION_PERIOD):
'''
Run each of the experiments for a specified results
collection period, with multiple replications per experiment.
Params:
------
experiments: dict
dictionary of Experiment objects
rc_period: float
model run length
'''
print('Model experiments:')
print(f'No. experiments to execute = {len(experiments)}\n')
experiment_results = {}
for exp_name, experiment in experiments.items():
print(f'Running {exp_name}', end=' => ')
results = multiple_replications(experiment, rc_period)
print('done.\n')
#save the results
experiment_results[exp_name] = results
print('All experiments are complete.')
# return the results from each experiment
return experiment_results
# get the experiments
experiments = get_experiments()
#run the scenario analysis
experiment_results = run_all_experiments(experiments)
Model experiments:
No. experiments to execute = 2
Running base => Experiment complete
Experiment complete
Experiment complete
Experiment complete
Experiment complete
done.
Running operators+1 => Experiment complete
Experiment complete
Experiment complete
Experiment complete
Experiment complete
done.
All experiments are complete.
experiment_results
{'base': 01_mean_waiting_time 02_operator_util
rep
1 1.673905 91.990826
2 3.670187 94.173962
3 9.841969 96.770083
4 4.813703 93.538493
5 2.588796 94.288763,
'operators+1': 01_mean_waiting_time 02_operator_util
rep
1 0.715242 85.523632
2 1.287553 87.707918
3 1.739517 92.071198
4 1.608727 86.998684
5 0.986897 88.059699}
def experiment_summary_frame(experiment_results):
'''
Mean results for each performance measure by experiment
Parameters:
----------
experiment_results: dict
dictionary of replication results.
Key identifies the experiment.
Returns:
-------
pd.DataFrame
'''
columns = []
summary = pd.DataFrame()
for sc_name, replications in experiment_results.items():
summary = pd.concat([summary, replications.mean()], axis=1)
columns.append(sc_name)
summary.columns = columns
return summary
# as well as rounding you may want to rename the cols/rows to
# more readable alternatives.
summary_frame = experiment_summary_frame(experiment_results)
summary_frame.round(2)
| | base | operators+1 |
|---|---|---|
| 01_mean_waiting_time | 4.52 | 1.27 |
| 02_operator_util | 94.15 | 88.07 |
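Following the comment above about renaming, a small sketch of one way to relabel the rows with pandas rename (the readable labels are our own suggestions, not part of the model):

```python
import pandas as pd

# illustrative summary values taken from the table above
summary = pd.DataFrame({'base': [4.52, 94.15], 'operators+1': [1.27, 88.07]},
                       index=['01_mean_waiting_time', '02_operator_util'])
readable = summary.rename(index={'01_mean_waiting_time': 'Mean waiting time (mins)',
                                 '02_operator_util': 'Operator utilisation (%)'})
print(readable)
```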