smdc_perftests.performance_tests package

Submodules

smdc_perftests.performance_tests.analyze module

Module for analyzing the test results.

Created on Thu Apr 2 14:30:51 2015

@author: christoph.paulik@geo.tuwien.ac.at

smdc_perftests.performance_tests.analyze.bar_plot(df, show=True)

Make a bar plot from the gathered results

Parameters:

df: pandas.DataFrame

Measured data

show: boolean

If True, the plot is shown.

Returns:

ax: matplotlib.axes.Axes

axes of the plot

smdc_perftests.performance_tests.analyze.esa_cci_grouping(n)
smdc_perftests.performance_tests.analyze.esa_cci_name_formatter(n)
smdc_perftests.performance_tests.analyze.prep_results(results_files, name_fm=None, grouping_f=None)

Takes a list of results file names and bundles the results into a pandas DataFrame

Parameters:

results_files: list

list of filenames to load

name_fm: function, optional

If set, a function that takes the name of a result and returns a more meaningful name. This is useful if the result names are very long or verbose.

grouping_f: function, optional

Can be used to assign groups according to the names of the results. It takes a result name and returns the group name as a string.

Returns:

df : pandas.DataFrame

Results named and possibly grouped
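The two optional callables of prep_results are plain string-to-string functions. The sketch below shows what compatible name_fm and grouping_f functions could look like; the "esa_cci_..." file naming scheme and both function names are hypothetical and are not the conventions used by esa_cci_name_formatter or esa_cci_grouping.

```python
def shorten_name(name: str) -> str:
    """Hypothetical name_fm: strip an assumed prefix and the .nc extension."""
    prefix = "esa_cci_"                      # assumed naming convention
    if name.startswith(prefix):
        name = name[len(prefix):]
    if name.endswith(".nc"):
        name = name[:-len(".nc")]
    return name


def group_by_test_type(name: str) -> str:
    """Hypothetical grouping_f: use the last underscore-separated token as group."""
    return name.rsplit("_", 1)[-1]


names = ["esa_cci_chunked_ts.nc", "esa_cci_chunked_img.nc", "esa_cci_flat_ts.nc"]
short = [shorten_name(n) for n in names]
groups = [group_by_test_type(n) for n in short]
print(short)   # ['chunked_ts', 'chunked_img', 'flat_ts']
print(groups)  # ['ts', 'img', 'ts']
```

Functions like these would be passed as prep_results(results_files, name_fm=shorten_name, grouping_f=group_by_test_type).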

smdc_perftests.performance_tests.test_cases module

This module contains functions that run tests according to specifications from SMDC Performance comparison document.

Interfaces to data should be interchangeable as long as they adhere to the interface specifications of the rsdata module.

Created on Tue Oct 21 13:37:58 2014

@author: christoph.paulik@geo.tuwien.ac.at
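Interchangeable data interfaces can be expressed as a structural type. The sketch below is a hypothetical typing.Protocol built from the method names this page mentions (get_timeseries, get_avg_image, get_data); the argument lists are assumptions pieced together from the parameter descriptions further down, not the rsdata specification itself. DummyDataset is a toy stand-in.

```python
from datetime import datetime
from typing import Any, Protocol


class PerfTestDataset(Protocol):
    """Hypothetical interface; method names from this page, signatures assumed."""

    def get_timeseries(self, gpi: int, **kwargs: Any) -> Any: ...

    def get_avg_image(self, date_start: datetime, date_end: datetime,
                      **kwargs: Any) -> Any: ...

    def get_data(self, date_start: datetime, date_end: datetime,
                 cell_id: int) -> Any: ...


class DummyDataset:
    """Toy implementation returning placeholder values instead of real data."""

    def get_timeseries(self, gpi, **kwargs):
        return [gpi] * 3

    def get_avg_image(self, date_start, date_end, **kwargs):
        return (date_end - date_start).days

    def get_data(self, date_start, date_end, cell_id):
        return {"cell": cell_id, "days": (date_end - date_start).days}


def smoke_test(ds: PerfTestDataset) -> None:
    # Any object with matching methods is accepted; no inheritance is required.
    ds.get_timeseries(42)
    ds.get_avg_image(datetime(2007, 1, 1), datetime(2007, 12, 31))
    ds.get_data(datetime(2007, 1, 1), datetime(2007, 1, 2), cell_id=1)


smoke_test(DummyDataset())
```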

class smdc_perftests.performance_tests.test_cases.SelfTimingDataset(ds, timefuncs=['get_timeseries', 'get_avg_image', 'get_data'])

Bases: object

Dataset class that times the functions of the dataset instance passed to its constructor.

Stores the results as TestResults instances in a dictionary with the timed function names as keys.

Methods

gentimedfunc(funcname) Generate a timed function that calls the function of the given dataset but returns the execution time.
gentimedfunc(funcname)

generate a timed function that calls the function of the given dataset but returns the execution time

Parameters:

funcname: string

name of the dataset function to wrap and time
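The timing wrapper idea can be sketched with getattr, a closure and time.perf_counter. This is a hypothetical SelfTimingSketch class, not the package's SelfTimingDataset: it keeps raw lists of seconds instead of TestResults instances, and SlowDataset is a toy stand-in.

```python
import time


class SelfTimingSketch:
    """Hypothetical sketch: wraps dataset methods so calls return elapsed seconds."""

    def __init__(self, ds, timefuncs=("get_timeseries", "get_avg_image", "get_data")):
        self.ds = ds
        self.measurements = {name: [] for name in timefuncs}
        for name in timefuncs:
            # Shadow each method name on this instance with its timed version.
            setattr(self, name, self.gentimedfunc(name))

    def gentimedfunc(self, funcname):
        func = getattr(self.ds, funcname)

        def timed(*args, **kwargs):
            start = time.perf_counter()
            func(*args, **kwargs)               # the data itself is discarded
            elapsed = time.perf_counter() - start
            self.measurements[funcname].append(elapsed)
            return elapsed

        return timed


class SlowDataset:
    def get_data(self, date_start, date_end, cell_id):
        time.sleep(0.01)                        # simulate an I/O delay


timed_ds = SelfTimingSketch(SlowDataset(), timefuncs=["get_data"])
elapsed = timed_ds.get_data(None, None, 7)
```

Returning the elapsed time instead of the data keeps the test loop free of timing code; the trade-off is that the read data is never checked.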

class smdc_perftests.performance_tests.test_cases.TestResults(init_obj, name=None, ddof=1)

Bases: object

Simple object that contains the test results and can be used to compare the test results to other test results.

Objects of this type can also be plotted by the plotting routines.

Parameters:

init_obj: list or string

list of measured times, or the filename of a netCDF4 file produced by to_nc of another TestResults object

ddof: int

delta degrees of freedom, used to calculate the standard deviation and variance. It is the number subtracted from the sample size n when estimating the population standard deviation and variance (see Bessel's correction, e.g. on Wikipedia, for an explanation).

Attributes

median (float): median of the measurements
n (int): sample size
stdev (float): standard deviation
var (float): variance
total (float): total time elapsed
mean (float): mean time per test run
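How ddof enters the variance and standard deviation attributes can be shown with a minimal standard-library sketch. The summarize function is hypothetical and only mirrors the attribute names above; with ddof=1 it matches statistics.variance, with ddof=0 it matches statistics.pvariance.

```python
import math
import statistics


def summarize(times, ddof=1):
    """Hypothetical helper computing the TestResults-style statistics."""
    n = len(times)
    total = math.fsum(times)
    mean = total / n
    # Divide by (n - ddof): ddof=1 applies Bessel's correction.
    var = math.fsum((t - mean) ** 2 for t in times) / (n - ddof)
    return {"n": n, "total": total, "mean": mean,
            "median": statistics.median(times),
            "var": var, "stdev": math.sqrt(var)}


times = [0.2, 0.4, 0.6]
sample_stats = summarize(times, ddof=1)      # unbiased sample variance
population_stats = summarize(times, ddof=0)  # population variance
```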

Methods

confidence_int([conf_level]) Calculate the confidence interval of the mean
to_nc(filename) Store results on disk as a netCDF4 file
confidence_int(conf_level=95)

Calculate confidence interval of the mean time measured

Parameters:

conf_level: float

desired confidence level of the interval, in percent. It is converted into the quantile needed to look up the critical value of the t distribution. The default is a 95% confidence interval.

Returns:

lower_mean : float

lower confidence interval boundary

mean : float

mean value

upper_mean : float

upper confidence interval boundary
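A t-based confidence interval of the mean is mean ± t_q · s/√n, where q = 1 − (1 − conf_level/100)/2 and t_q has n − 1 degrees of freedom. The actual method most likely obtains t_q from a statistics library; to stay standard-library only, this sketch (all function names hypothetical) computes the t quantile by Simpson integration of the density and bisection.

```python
import math
import statistics


def t_pdf(x, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x, dof, steps=2000):
    """CDF via Simpson's rule on [0, x]; the density is symmetric around 0."""
    h = x / steps
    s = t_pdf(0.0, dof) + t_pdf(x, dof)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 + s * h / 3


def t_quantile(p, dof):
    """Upper quantile (0.5 < p < 1) found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, dof) < p:       # widen the bracket until it contains p
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_int(times, conf_level=95.0, ddof=1):
    n = len(times)
    mean = statistics.fmean(times)
    var = sum((t - mean) ** 2 for t in times) / (n - ddof)
    sem = math.sqrt(var / n)                    # standard error of the mean
    q = 1 - (1 - conf_level / 100) / 2          # e.g. 95 -> 0.975
    half_width = t_quantile(q, n - 1) * sem
    return mean - half_width, mean, mean + half_width


lower, mean, upper = confidence_int([1.0, 2.0, 3.0, 4.0, 5.0])
```

For five runs the critical value is about 2.776, so the interval is noticeably wider than the normal approximation (1.96) would give.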

to_nc(filename)

store results on disk as a netCDF4 file

Parameters:

filename: string

path and filename

smdc_perftests.performance_tests.test_cases.measure(exper_name, runs=5, ddof=1)

Decorator that measures the running time of a function and calculates statistics.

Parameters:

exper_name: string

experiment name, used for plotting and saving

runs: int

number of test runs to perform

ddof: int

delta degrees of freedom, used to calculate the standard deviation and variance. It is the number subtracted from the sample size n when estimating the population standard deviation and variance (see Bessel's correction, e.g. on Wikipedia, for an explanation).

Returns:

results: dict

TestResults instance
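The decorator pattern described above can be sketched as a decorator factory that runs the wrapped function several times and summarizes the timings. measure_sketch is hypothetical and returns a plain dict rather than a TestResults instance; runs must exceed ddof to avoid a division by zero.

```python
import functools
import statistics
import time


def measure_sketch(exper_name, runs=5, ddof=1):
    """Hypothetical stand-in for measure(): times `runs` calls of the function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            times = []
            for _ in range(runs):
                start = time.perf_counter()
                func(*args, **kwargs)
                times.append(time.perf_counter() - start)
            mean = statistics.fmean(times)
            var = sum((t - mean) ** 2 for t in times) / (runs - ddof)
            return {"name": exper_name, "times": times, "mean": mean, "var": var}
        return wrapper
    return decorator


@measure_sketch("sum_of_squares", runs=3)
def sum_of_squares(n):
    return sum(i * i for i in range(n))


result = sum_of_squares(10_000)
```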

smdc_perftests.performance_tests.test_cases.read_rand_cells_by_cell_list(dataset, cell_date_list, cell_id, read_perc=1.0, max_runtime=None)

Reads data from the dataset using the get_data method. In this method the start and end datetimes are fixed for all cell IDs that are read.

Parameters:

dataset: instance

instance of a class that implements a get_data(date_start, date_end, cell_id) method

date_start: datetime

start dates which should be read.

date_end: datetime

end dates which should be read.

cell_date_list: list of tuples

time intervals to read for each cell

cell_id: int or iterable

Cell IDs to read; can also be a list of integers.

read_perc : float

percentage of the cell IDs in cell_id to read

max_runtime: int, optional

maximum runtime of the test in seconds.
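The parameter list suggests that each cell is read with its own time interval. This sketch assumes each entry of cell_date_list is a (start, end) tuple paired by position with the cell IDs; the summary sentence above instead speaks of fixed datetimes, so treat the pairing as an assumption. RecordingDataset, read_cells and the cell IDs are hypothetical.

```python
from datetime import datetime


class RecordingDataset:
    """Hypothetical stand-in that records each get_data call."""

    def __init__(self):
        self.calls = []

    def get_data(self, date_start, date_end, cell_id):
        self.calls.append((cell_id, date_start, date_end))


def read_cells(dataset, cell_date_list, cell_ids):
    """Read each cell with its own interval; entry i belongs to cell_ids[i]."""
    # zip stops at the shorter input, so extra intervals are simply unused.
    for cell_id, (date_start, date_end) in zip(cell_ids, cell_date_list):
        dataset.get_data(date_start, date_end, cell_id)


ds = RecordingDataset()
intervals = [(datetime(2007, 1, 1), datetime(2007, 1, 31)),
             (datetime(2007, 6, 1), datetime(2007, 6, 30))]
read_cells(ds, intervals, [2244, 2245])
```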

smdc_perftests.performance_tests.test_cases.read_rand_img_by_date_list(dataset, date_list, read_perc=1.0, max_runtime=None, **kwargs)

Reads image data for random dates from a list. Additional kwargs are passed to the read_img method of the dataset.

Parameters:

dataset: instance

instance of a class that implements a read_img(datetime) method

date_list: iterable

list of datetime objects

read_perc: float

percentage of datetimes out of date_list to read

max_runtime: int, optional

maximum runtime of the test in seconds.

**kwargs:

other keywords are passed to the get_avg_image method of the dataset
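Random selection by read_perc combined with a runtime limit can be sketched as below. Two assumptions: read_perc is treated as a percentage (1.0 means 1 percent of date_list, which fits defaults such as date_read_perc=0.1 with n_dates=10000 further down), and the reader is passed as a plain callable instead of a dataset. read_random_dates and its seed parameter are hypothetical.

```python
import random
import time
from datetime import datetime, timedelta


def read_random_dates(read_img, date_list, read_perc=1.0, max_runtime=None,
                      seed=None, **kwargs):
    """Hypothetical sketch; read_perc is assumed to be a percentage."""
    rng = random.Random(seed)
    n_read = max(1, int(len(date_list) * read_perc / 100))
    selection = rng.sample(list(date_list), n_read)
    start = time.perf_counter()
    done = 0
    for date in selection:
        if max_runtime is not None and time.perf_counter() - start > max_runtime:
            break                       # stop early once the time budget is used up
        read_img(date, **kwargs)        # extra keywords are forwarded unchanged
        done += 1
    return done


dates = [datetime(2007, 1, 1) + timedelta(days=i) for i in range(1000)]
seen = []
n_done = read_random_dates(seen.append, dates, read_perc=5.0, seed=0)
```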

smdc_perftests.performance_tests.test_cases.read_rand_img_by_date_range(dataset, date_list, read_perc=1.0, max_runtime=None, **kwargs)

Reads image data for random date ranges from a list. Additional kwargs are passed to the read_img method of the dataset.

Parameters:

dataset: instance

instance of a class that implements a read_img(datetime) method

date_list: iterable

list of [start, end] datetime pairs, given as a list of lists, e.g.

[[datetime(2007,1,1), datetime(2007,1,1)],   # reads one day
 [datetime(2007,1,1), datetime(2007,12,31)]]  # reads one year

read_perc: float

percentage of datetimes out of date_list to read

max_runtime: int, optional

maximum runtime of the test in seconds.

**kwargs:

other keywords are passed to the get_avg_image method of the dataset
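The list-of-lists format maps naturally onto one averaging call per [start, end] pair. In this sketch the get_avg_image(date_start, date_end) argument order is an assumption, and AvgImageDataset is a hypothetical stand-in that only counts how many daily images an inclusive range would cover.

```python
from datetime import datetime

date_list = [
    [datetime(2007, 1, 1), datetime(2007, 1, 1)],    # reads one day
    [datetime(2007, 1, 1), datetime(2007, 12, 31)],  # reads one year
]


class AvgImageDataset:
    """Hypothetical stand-in: reports the number of days an average would cover."""

    def get_avg_image(self, date_start, date_end, **kwargs):
        return (date_end - date_start).days + 1      # inclusive day count


ds = AvgImageDataset()
sizes = [ds.get_avg_image(start, end) for start, end in date_list]
```

Because the ranges can differ by orders of magnitude in length, timings from this test mix very cheap and very expensive reads.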

smdc_perftests.performance_tests.test_cases.read_rand_ts_by_gpi_list(dataset, gpi_list, read_perc=1.0, max_runtime=None, **kwargs)

Reads time series data for random grid point indices from a list. Additional kwargs are passed to the read_ts method of the dataset.

Parameters:

dataset: instance

instance of a class that implements a read_ts(gpi) method

gpi_list: iterable

list or numpy array of grid point indices

read_perc: float

percentage of points from gpi_list to read

max_runtime: int, optional

maximum runtime of the test in seconds.

**kwargs:

other keywords are passed to the get_timeseries method of the dataset
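Forwarding **kwargs lets one test function drive dataset-specific options. The variables keyword, TsDataset and read_random_ts below are hypothetical and only illustrate the pass-through; read_perc is again assumed to be a percentage.

```python
import random


class TsDataset:
    """Hypothetical stand-in with a dataset-specific keyword argument."""

    def get_timeseries(self, gpi, variables=("sm",)):
        return {v: [gpi] for v in variables}


def read_random_ts(dataset, gpi_list, read_perc=1.0, seed=None, **kwargs):
    rng = random.Random(seed)
    n_read = max(1, int(len(gpi_list) * read_perc / 100))
    chosen = rng.sample(list(gpi_list), n_read)
    # Unknown keywords are not interpreted here, only forwarded to the dataset.
    return [dataset.get_timeseries(gpi, **kwargs) for gpi in chosen]


out = read_random_ts(TsDataset(), range(200), read_perc=2.0, seed=1,
                     variables=("sm", "sm_noise"))
```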

smdc_perftests.performance_tests.test_scripts module

Module implementing the test cases specified in the performance test protocol.

Created on Wed Apr 1 10:59:05 2015

@author: christoph.paulik@geo.tuwien.ac.at

smdc_perftests.performance_tests.test_scripts.run_ascat_tests(dataset, testname, results_dir, n_dates=10000, date_read_perc=0.1, gpi_read_perc=0.1, repeats=3, cell_read_perc=10.0, max_runtime_per_test=None)

Runs the ASCAT tests given a dataset instance

Parameters:

dataset: Dataset instance

Instance of a Dataset class

testname: string

Name of the test, used for storing the results

results_dir: string

path where to store the test results

n_dates: int, optional

number of dates to generate

date_read_perc: float, optional

percentage of random selection from date_range_list read for each try

gpi_read_perc: float, optional

percentage of random selection from gpi_list read for each try

repeats: int, optional

number of repeats of the tests

cell_list: list, optional

list of possible cells to read from. if given then the read_data test will be run

max_runtime_per_test: float, optional

maximum runtime per test in seconds, if given the tests will be aborted after taking more than this time

smdc_perftests.performance_tests.test_scripts.run_equi7_tests(dataset, testname, results_dir, n_dates=10000, date_read_perc=0.1, gpi_read_perc=0.1, repeats=3, cell_read_perc=100.0, max_runtime_per_test=None)

Runs the ASAR/Sentinel 1 Equi7 tests given a dataset instance

Parameters:

dataset: Dataset instance

Instance of a Dataset class

testname: string

Name of the test, used for storing the results

results_dir: string

path where to store the test results

n_dates: int, optional

number of dates to generate

date_read_perc: float, optional

percentage of random selection from date_range_list read for each try

gpi_read_perc: float, optional

percentage of random selection from gpi_list read for each try

repeats: int, optional

number of repeats of the tests

cell_list: list, optional

list of possible cells to read from. if given then the read_data test will be run

max_runtime_per_test: float, optional

maximum runtime per test in seconds, if given the tests will be aborted after taking more than this time

smdc_perftests.performance_tests.test_scripts.run_esa_cci_netcdf_tests(test_dir, results_dir, variables=['sm'])

Function for running the ESA CCI netCDF performance tests. The tests will be run for all .nc files in test_dir.

Parameters:

test_dir: string

path to the test files

results_dir: string

path in which the results should be stored

variables: list

list of variables to read for the tests
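Collecting "all .nc files in test_dir" can be sketched with glob. This assumes a non-recursive search of the top level of test_dir and uses the file stem as a per-file test name; both choices, and find_nc_files itself, are hypothetical rather than taken from the function's implementation.

```python
import glob
import os
import tempfile


def find_nc_files(test_dir):
    """Return the .nc files directly inside test_dir, sorted by path."""
    return sorted(glob.glob(os.path.join(test_dir, "*.nc")))


with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.nc", "b.nc", "notes.txt"):
        open(os.path.join(tmp, fname), "w").close()   # create empty test files
    found = [os.path.basename(p) for p in find_nc_files(tmp)]

# Hypothetical: derive one test name per file from its stem.
test_names = [os.path.splitext(f)[0] for f in found]
```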

smdc_perftests.performance_tests.test_scripts.run_esa_cci_tests(dataset, testname, results_dir, n_dates=10000, date_read_perc=0.1, gpi_read_perc=0.1, repeats=3, cell_read_perc=10.0, max_runtime_per_test=None)

Runs the ESA CCI tests given a dataset instance

Parameters:

dataset: Dataset instance

Instance of a Dataset class

testname: string

Name of the test, used for storing the results

results_dir: string

path where to store the test results

n_dates: int, optional

number of dates to generate

date_read_perc: float, optional

percentage of random selection from date_range_list read for each try

gpi_read_perc: float, optional

percentage of random selection from gpi_list read for each try

repeats: int, optional

number of repeats of the tests

cell_list: list, optional

list of possible cells to read from. if given then the read_data test will be run

max_runtime_per_test: float, optional

maximum runtime per test in seconds, if given the tests will be aborted after taking more than this time

smdc_perftests.performance_tests.test_scripts.run_performance_tests(name, dataset, save_dir, gpi_list=None, date_range_list=None, cell_list=None, cell_date_list=None, gpi_read_perc=1.0, date_read_perc=1.0, cell_read_perc=1.0, max_runtime_per_test=None, repeats=1)

Run a complete test suite on a dataset and store the results in the specified directory

Parameters:

name: string

name of the test run, used for file naming

dataset: dataset instance

instance implementing the get_timeseries, get_avg_image and get_data methods.

save_dir: string

directory to store the test results in

gpi_list: list, optional

list of possible grid point indices, if given the timeseries reading tests will be run

date_range_list: list, optional

list of possible dates. If given, the read_avg_image and read_data tests will be run. The format is a list of lists, e.g.

[[datetime(2007,1,1), datetime(2007,1,1)],   # reads one day
 [datetime(2007,1,1), datetime(2007,12,31)]]  # reads one year

cell_list: list, optional

list of possible cells to read from. if given then the read_data test will be run

cell_date_list: list, optional

list of time intervals to read per cell. Should be as long as the cell list or longer.

gpi_read_perc: float, optional

percentage of random selection from gpi_list read for each try

date_read_perc: float, optional

percentage of random selection from date_range_list read for each try

cell_read_perc: float, optional

percentage of random selection from cell_list read for each try

max_runtime_per_test: float, optional

maximum runtime per test in seconds, if given the tests will be aborted after taking more than this time

repeats: int, optional

number of repeats for each measurement
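Preparing the inputs in the documented formats might look like the sketch below. The grid point indices, cell IDs and random date ranges are hypothetical placeholders; only the list-of-lists format for date_range_list and the length requirement on cell_date_list come from the parameter descriptions above. The final call is left as a comment because it needs a real dataset instance.

```python
import random
from datetime import datetime, timedelta

rng = random.Random(42)

# Hypothetical grid point indices; the timeseries tests run when gpi_list is given.
gpi_list = list(range(0, 100000, 10))

# [start, end] pairs in the documented list-of-lists format.
date_range_list = []
for _ in range(100):
    start = datetime(2007, 1, 1) + timedelta(days=rng.randrange(365))
    end = start + timedelta(days=rng.randrange(0, 31))
    date_range_list.append([start, end])

# Hypothetical cell IDs, with one (start, end) interval per cell.
cell_list = [2244, 2245, 2246]
cell_date_list = [(datetime(2007, 1, 1), datetime(2007, 12, 31))] * len(cell_list)

# Documented constraint: cell_date_list must be at least as long as cell_list.
assert len(cell_date_list) >= len(cell_list)

# With a real dataset instance, the call would then look like:
# run_performance_tests("demo", dataset, "results/",
#                       gpi_list=gpi_list, date_range_list=date_range_list,
#                       cell_list=cell_list, cell_date_list=cell_date_list,
#                       gpi_read_perc=1.0, date_read_perc=1.0, cell_read_perc=1.0,
#                       repeats=3)
```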

Module contents