dlutils package
Submodules
dlutils.batch_provider module
dlutils.batch_provider(data, batch_size, processor=None, worker_count=1, queue_size=16, report_progress=True)

Returns an object that produces a sequence of batches from the input data.

The input data is split into batches of size batch_size, which are processed with the function processor. Data is split and processed by separate threads and pushed into a queue, allowing continuous provision of data. The main purpose of this primitive is to provide an easy-to-use tool for parallel batch processing/generation in the background while the main thread runs the main algorithm. Batches are processed in parallel, allowing better utilization of CPU cores and disk, which may improve GPU utilization for DL tasks with a storage/IO bottleneck.

This primitive can be used in various ways. For small datasets, the data list may contain the actual dataset, while the processor function does little to no processing. For larger datasets, the data list may contain just filenames or keys, while the processor function reads the data from disk or a database.

The processor function can serve many purposes, depending on your use case:

- Reading data from disk or a database.
- Decoding data, e.g. from JPEG.
- Augmenting data: flipping, rotating, adding noise, etc.
- Concatenating data: stacking into a single ndarray, converting to a tensor, uploading to GPU.
- Generating data.

Note

Sequential order of batches is guaranteed only if the number of workers is 1 (the default); otherwise batches may be supplied out of order.
- Parameters

  data (list) – Input data; each entry in the list should be a separate data point.

  batch_size (int) – Size of a batch. If the size of data is not divisible by batch_size, the last batch will be smaller.

  processor (Callable[[list], Any], optional) – Function for processing batches. Receives a slice of the data list as input. Can return an object of any type. Defaults to None.

  worker_count (int, optional) – Number of workers; must be greater than or equal to one. To process data in parallel and fully load the CPU, worker_count should be close to the number of CPU cores. Defaults to 1.

  queue_size (int, optional) – Maximum size of the queue, i.e. the number of batches to buffer. Should be larger than worker_count. Typically, one would want this to be as large as possible to amortize all disk IO and computational costs; the downside of a large value is increased RAM consumption. Defaults to 16.

  report_progress (bool, optional) – Print a progress bar similar to tqdm. You may still use tqdm itself if you set report_progress to False; in that case, just do: for x in tqdm(batch_provider(...)): .... Defaults to True.
- Returns

  An object that produces a sequence of batches. The next() method of the iterator returns the object produced by the processor function.

- Return type

  Iterator

- Raises

  StopIteration – When all data has been iterated through. Stops the for loop.
Example

    import numpy as np
    import torch
    import torch.nn.functional as F
    from scipy import misc

    import dlutils

    def process(batch):
        images = [misc.imread(x[0]) for x in batch]
        images = np.asarray(images, dtype=np.float32)
        images = images.transpose((0, 3, 1, 2))
        labels = [x[1] for x in batch]
        labels = np.asarray(labels, np.int64)
        return torch.from_numpy(images) / 255.0, torch.from_numpy(labels)

    data = [('some_list.jpg', 1), ('of_filenames.jpg', 2), ('etc.jpg', 4), ...]  # filenames and labels

    batches = dlutils.batch_provider(data, 32, process)

    for images, labels in batches:
        result = model(images)
        loss = F.nll_loss(result, labels)
        loss.backward()
        optimizer.step()
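The "generating data" use case from the list above can look like the following minimal sketch; the seeding scheme and array shapes are illustrative, not from the library docs:

    import numpy as np

    import dlutils

    # The data list holds only integer seeds; the processor synthesizes batches.
    seeds = list(range(1024))

    def generate(batch):
        # Use the first seed of the slice to make each batch reproducible.
        rng = np.random.RandomState(batch[0])
        return rng.rand(len(batch), 3, 32, 32).astype(np.float32)

    for batch in dlutils.batch_provider(seeds, 32, generate, worker_count=4):
        pass  # consume synthetic batches of shape (32, 3, 32, 32)

With worker_count=4, batches arrive as they are generated, so their order is not guaranteed (see the Note above).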
dlutils.cache module

class dlutils.cache(function)

Bases: object

Caches the return value of a function.
Given a function with no side effects, it computes the sha256 hash of the passed arguments and uses that hash to retrieve a saved pickle.

Note

Passed arguments must be picklable.

If you change the function, or make any other change that invalidates previously saved caches, you will need to delete them manually.

Results are saved to the '.cache' folder in the current directory.
- Parameters

  function (function) – Function to be called.
Example

    @dlutils.cache
    def expensive_function(x):
        for i in range(12):
            x = x + x * x
        return x
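A short follow-up sketch of the caching behavior described above (the argument value is illustrative):

    y1 = expensive_function(3)  # computed, then pickled under './.cache'
    y2 = expensive_function(3)  # same picklable arguments: loaded from the cache
    assert y1 == y2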
dlutils.download module

Module for downloading files (including from Google Drive) and uncompressing tar.gz archives.
dlutils.download.cifar10(directory='cifar10')

Downloads the CIFAR-10 dataset.

- Parameters

  directory (str) – Directory where to save the files.
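For instance (the target directory here is illustrative):

    import dlutils

    dlutils.download.cifar10(directory='data/cifar10')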
dlutils.download.cifar100(directory='cifar100')

Downloads the CIFAR-100 dataset.

- Parameters

  directory (str) – Directory where to save the files.
dlutils.download.fashion_mnist(directory='fashion-mnist')

Downloads the Fashion-MNIST dataset.

- Parameters

  directory (str) – Directory where to save the files.
dlutils.download.from_google_drive(google_drive_fileid, directory='.', file_name=None, extract_targz=False, extract_gz=False, extract_zip=False)

Downloads a file from Google Drive.

Given the file ID, the file is downloaded from Google Drive; optionally, it can be unpacked after downloading completes.

Note

You need to share the file as "Anyone who has the link can access. No sign-in required." You can find the file ID in the link: https://drive.google.com/file/d/0B3kP5zWXwFm_OUpQbDFqY2dXNGs/view?usp=sharing

- Parameters

  google_drive_fileid (str) – File ID.

  directory (str) – Directory where to save the file.

  file_name (str, optional) – If not None, this will overwrite the file name; otherwise the filename returned by the HTTP request is used. Defaults to None.

  extract_targz (bool) – Extract tar.gz archive. Defaults to False.

  extract_gz (bool) – Decompress gz-compressed file. Defaults to False.

  extract_zip (bool) – Extract zip archive. Defaults to False.
Example

    dlutils.download.from_google_drive(directory="data/", google_drive_fileid="0B3kP5zWXwFm_OUpQbDFqY2dXNGs")
dlutils.download.from_url(url, directory='.', file_name=None, extract_targz=False, extract_gz=False, extract_zip=False)

Downloads a file from the specified URL.

Optionally, it can be unpacked after downloading completes.

- Parameters

  url (str) – File URL.

  directory (str) – Directory where to save the file.

  file_name (str, optional) – If not None, this will overwrite the file name; otherwise the filename returned by the HTTP request is used. Defaults to None.

  extract_targz (bool) – Extract tar.gz archive. Defaults to False.

  extract_gz (bool) – Decompress gz-compressed file. Defaults to False.

  extract_zip (bool) – Extract zip archive. Defaults to False.
Example

    dlutils.download.from_url("http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz", directory, extract_gz=True)
dlutils.epoch module

class dlutils.epoch.EpochRange(epoch_count, log_func=None)

Bases: object

Range for iterating over epochs.
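The class is otherwise undocumented here. A hypothetical usage sketch, assuming EpochRange iterates like range(epoch_count) and uses log_func for reporting (the exact behavior is defined in dlutils/epoch.py):

    from dlutils.epoch import EpochRange

    # Assumption: yields successive epoch indices and, if given,
    # calls log_func to report per-epoch progress.
    for epoch in EpochRange(10, log_func=print):
        pass  # run one training epoch here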
dlutils.measures module

dlutils.measures.auc(label, prediction)

dlutils.measures.f1(label, prediction, threshold)

dlutils.measures.f1_from_pr(precision, recall)

dlutils.measures.f1_from_tp_fp_fn(true_positive, false_positive, false_negative)

dlutils.measures.openset_f1(label_inlier, prediction_inlier, threshold, correctly_classified)
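These functions are listed without docstrings upstream. For reference, a sketch of the standard F1 definition that f1_from_tp_fp_fn and f1_from_pr presumably implement (the textbook formula, not necessarily the library's exact code):

    def f1_reference(true_positive, false_positive, false_negative):
        # Textbook definitions; reference sketch only.
        precision = true_positive / (true_positive + false_positive)
        recall = true_positive / (true_positive + false_negative)
        if precision + recall == 0:
            return 0.0
        # F1 is the harmonic mean of precision and recall.
        return 2 * precision * recall / (precision + recall)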
dlutils.numpy_dataset module

dlutils.progress_bar module

dlutils.random_rotation module

Random rotation matrix

dlutils.random_rotation.random_rotation(size)
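A hedged usage sketch, assuming random_rotation(size) returns a size x size rotation matrix (orthogonal with determinant 1), as the module docstring suggests:

    import numpy as np
    from dlutils.random_rotation import random_rotation

    R = random_rotation(3)
    # A rotation matrix satisfies R @ R.T == I and det(R) == 1.
    assert np.allclose(R @ R.T, np.eye(3), atol=1e-6)
    assert np.isclose(np.linalg.det(R), 1.0, atol=1e-6)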
dlutils.reader module

Utility for reading the MNIST dataset

class dlutils.reader.Cifar10(path, train=True, test=False)

Bases: object

Reads CIFAR out of binary batches.

get_images()

get_labels()
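A hypothetical usage sketch (the path assumes the standard CIFAR-10 binary batches, e.g. as fetched by dlutils.download.cifar10):

    from dlutils.reader import Cifar10

    reader = Cifar10('cifar10/cifar-10-batches-bin', train=True, test=False)
    images = reader.get_images()  # expected: ndarray of image data
    labels = reader.get_labels()  # expected: ndarray of integer class labels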
dlutils.registry module

dlutils.save_image module
dlutils.make_grid(images, nrow=8, padding=2, NCWH=False)

dlutils.save_image(images, filename, nrow=8, padding=2, NCWH=False, format=None)
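A hedged sketch of saving a batch as an image grid (the channel-first layout and the [0, 1] value range are assumptions inferred from the NCWH flag, not stated by the library):

    import numpy as np

    import dlutils

    # Hypothetical batch: 16 RGB images, channel-first, values in [0, 1].
    images = np.random.rand(16, 3, 32, 32).astype(np.float32)
    dlutils.save_image(images, 'grid.png', nrow=4, NCWH=True)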
dlutils.shuffle module

dlutils.shuffle.shuffle_ndarray(x, axis=0)

Shuffle slices of an ndarray along a specific axis.

For example, given a 4-dimensional ndarray representing a batch of images in BCHW format, one could shuffle the samples in that batch by applying shuffle_ndarray() with axis=0.

Note

The function does not return anything; it shuffles the ndarray in place.

- Parameters

  x (array_like) – ndarray to shuffle.

  axis (int, optional) – The axis over which to shuffle. Defaults to 0.
Example

    >>> a = np.asarray([[1, 5], [0, 2], [0, 1]])
    >>> a
    array([[1, 5],
           [0, 2],
           [0, 1]])
    >>> dlutils.shuffle.shuffle_ndarray(a, axis=0)
    >>> a
    array([[0, 2],
           [0, 1],
           [1, 5]])
    >>> dlutils.shuffle.shuffle_ndarray(a, axis=1)
    >>> a
    array([[2, 0],
           [1, 0],
           [5, 1]])
dlutils.shuffle.shuffle_ndarrays_in_unison(arrays, axis=0)

Shuffle slices of a list of ndarrays along a specific axis, using the same permutation for each of the arrays in the list.

Works like shuffle_ndarray(), but applies the same permutation to all arrays in the list; see the sketch below.

Note

The function does not return anything; it shuffles the ndarrays in place. All arrays in the list should have the same shape.
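A doctest-style sketch of the paired shuffle; since the permutation is random, the arrays below show one possible outcome:

    >>> a = np.asarray([[1, 5], [0, 2], [0, 1]])
    >>> b = np.asarray([[10, 50], [0, 20], [0, 10]])
    >>> dlutils.shuffle.shuffle_ndarrays_in_unison([a, b], axis=0)
    >>> a
    array([[0, 2],
           [0, 1],
           [1, 5]])
    >>> b
    array([[ 0, 20],
           [ 0, 10],
           [10, 50]])

Rows of a and b stay paired: a[i] and b[i] still correspond after the shuffle.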
dlutils.timer module

Profiling utils

dlutils.timer.timer(f)

Decorator for timing function (method) execution time.

After the function returns, it prints the string: func: <function name> took: <time in seconds> sec.
- Parameters

  f (Callable[Any]) – Function to decorate.

- Returns

  Decorated function.

- Return type

  Callable[Any]
Example

    >>> from dlutils import timer
    >>> @timer.timer
    ... def foo(x):
    ...     for i in range(x):
    ...         pass
    ...
    >>> foo(100000)
    func:'foo' took: 0.0019 sec