dareblopy package

Module contents
class dareblopy.Archive
    Bases: pybind11_builtins.pybind11_object

    __init__(*args, **kwargs)
        Initialize self. See help(type(self)) for accurate signature.

    exists(self: _dareblopy.Archive, arg0: str) → bool
        Exists

    list_directory(self: _dareblopy.Archive, arg0: str) → bool
        ListDirectory

    open(self: _dareblopy.Archive, arg0: str) → object
        Opens file

    open_as_bytes(self: _dareblopy.Archive, arg0: str) → object

    open_as_numpy_ubyte(self: _dareblopy.Archive, arg0: str, arg1: object) → numpy.ndarray[uint8]

    read_jpg_as_numpy(self: _dareblopy.Archive, filename: str, use_turbo: bool = False) → numpy.ndarray[uint8]
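A minimal usage sketch: the archive object comes from dareblopy.open_zip_archive (documented below); the archive and member names are hypothetical.

    import dareblopy as db

    archive = db.open_zip_archive('archive.zip')      # hypothetical archive
    if archive.exists('image.jpg'):
        raw = archive.open_as_bytes('image.jpg')      # file contents as a bytes object
        img = archive.read_jpg_as_numpy('image.jpg')  # decoded JPEG as a uint8 ndarray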
class dareblopy.Compression
    Bases: pybind11_builtins.pybind11_object

    Enumeration for the compression type used for tfrecords.

    Possible values:
        NONE - default
        GZIP
        ZLIB

    Example:

        record_reader = db.RecordReader('zlib_compressed.tfrecords', db.Compression.ZLIB)
        record_reader = db.RecordReader('gzip_compressed.tfrecords', db.Compression.GZIP)
        record_yielder = db.RecordYielderBasic(['test_utils/test-small-gzip-r00.tfrecords',
                                                'test_utils/test-small-gzip-r01.tfrecords',
                                                'test_utils/test-small-gzip-r02.tfrecords',
                                                'test_utils/test-small-gzip-r03.tfrecords'],
                                               db.Compression.GZIP)
        record_yielder_random = db.RecordYielderRandomized(['test_utils/test-small-gzip-r00.tfrecords',
                                                            'test_utils/test-small-gzip-r01.tfrecords',
                                                            'test_utils/test-small-gzip-r02.tfrecords',
                                                            'test_utils/test-small-gzip-r03.tfrecords'],
                                                           buffer_size=16, seed=0, epoch=0,
                                                           compression=db.Compression.GZIP)

    Members:
        NONE
        GZIP
        ZLIB
    GZIP = Compression.GZIP

    NONE = Compression.NONE

    ZLIB = Compression.ZLIB

    __init__(self: _dareblopy.Compression, arg0: int) → None

    property name
        Type: (self: handle) -> str
class dareblopy.DataType
    Bases: pybind11_builtins.pybind11_object

    Enumeration for FixedLenFeature dtype. Equivalent to tf.string, tf.float32, tf.int64.

    Note:
        uint8 is an alias for string that enables reading directly into a preallocated numpy ndarray of uint8 dtype and a given shape. This eliminates any additional copying/casting. To use it, the shape of the encoded numpy array must be known.

    Example:

        features = {
            'shape': db.FixedLenFeature([3], db.int64),
            'data': db.FixedLenFeature([], db.string)
        }

    Members:
        string
        float32
        int64
        uint8
    __init__(self: _dareblopy.DataType, arg0: int) → None

    float32 = DataType.float32

    int64 = DataType.int64

    property name
        Type: (self: handle) -> str

    string = DataType.string

    uint8 = DataType.uint8
class dareblopy.File
    Bases: pybind11_builtins.pybind11_object

    __init__(self: _dareblopy.File) → None

    get_last_write_time(self: _dareblopy.File) → int

    path(self: _dareblopy.File) → str

    read(self: _dareblopy.File, size: int = -1) → object

    seek(self: _dareblopy.File, offset: int, origin: int = 0) → int

    size(self: _dareblopy.File) → int

    tell(self: _dareblopy.File) → int
class dareblopy.FileSystem
    Bases: pybind11_builtins.pybind11_object

    __init__(self: _dareblopy.FileSystem) → None

    clear_search_paths(self: _dareblopy.FileSystem) → None
        ClearSearchPaths

    create_directory(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → fsal::Status
        CreateDirectory

    exists(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → bool
        Exists

    mount_archive(self: _dareblopy.FileSystem, arg0: _dareblopy.Archive) → fsal::Status
        AddArchive

    open(self: _dareblopy.FileSystem, location: _dareblopy.Location, mode: _dareblopy.Mode = Mode.read, lockable: bool = False) → object
        Opens file

    pop_search_path(self: _dareblopy.FileSystem) → None
        PopSearchPath

    push_search_path(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → None
        PushSearchPath

    remove(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → fsal::Status
        Remove

    rename(self: _dareblopy.FileSystem, arg0: _dareblopy.Location, arg1: _dareblopy.Location) → fsal::Status
        Rename
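A minimal sketch tying FileSystem, Location, Mode, and File together; the path is hypothetical, and treating the return value of open as a File is an assumption based on the signatures above.

    import dareblopy as db

    fs = db.FileSystem()
    fs.push_search_path(db.Location('.'))   # resolve relative paths from here

    f = fs.open(db.Location('data.bin'))    # default mode is Mode.read
    print(f.size())                         # total size in bytes
    header = f.read(16)                     # read the first 16 bytes
    f.seek(0)                               # rewind to the beginning
    everything = f.read()                   # size=-1 (default) reads to EOF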
class dareblopy.FixedLenFeature
    Bases: pybind11_builtins.pybind11_object

    Specifies a fixed-length input feature to be parsed from tfrecord examples. Equivalent to tf.FixedLenFeature.

    Variables:
        shape (TensorShape) – a TensorShape object that defines the input data shape.
        dtype (DataType) – a DataType object that defines the input data type.
        default_value (object, optional) – default value.

    Note:
        The constructor is overloaded and accepts either:
            shape (List[int]), datatype (DataType)
            shape (List[int]), datatype (DataType), default_value (object)

    Example:

        features = {
            'shape': db.FixedLenFeature([3], db.int64),
            'data': db.FixedLenFeature([], db.string)
        }
        # or
        features = {
            'shape': db.FixedLenFeature([3], db.int64),
            'data': db.FixedLenFeature([3, 32, 32], db.uint8)
        }
    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.FixedLenFeature) -> None
        __init__(self: _dareblopy.FixedLenFeature, arg0: List[int], arg1: _dareblopy.DataType) -> None
        __init__(self: _dareblopy.FixedLenFeature, arg0: List[int], arg1: _dareblopy.DataType, arg2: object) -> None

    property default_value

    property dtype

    property shape
class dareblopy.Location
    Bases: pybind11_builtins.pybind11_object

    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.Location, arg0: str) -> None
        __init__(self: _dareblopy.Location, arg0: str, arg1: fsal::Location::Options, arg2: fsal::PathType, arg3: fsal::LinkType) -> None
class dareblopy.Mode
    Bases: pybind11_builtins.pybind11_object

    Members:
        read
        write
        append
        read_update
        write_update
        append_update

    __init__(self: _dareblopy.Mode, arg0: int) → None

    append = Mode.append

    append_update = Mode.append_update

    property name
        Type: (self: handle) -> str

    read = Mode.read

    read_update = Mode.read_update

    write = Mode.write

    write_update = Mode.write_update
class dareblopy.ParsedRecordYielderRandomized
    Bases: pybind11_builtins.pybind11_object

    Generator that yields parsed records from a list of tfrecord files in a randomized way.

    ParsedRecordYielderRandomized gives slightly better performance than RecordYielderRandomized, since it reduces data copying.

    Args:
        parser (RecordParser): parser used to decode records.
        filenames (List[str]): a list of filenames of the tfrecord files.
        buffer_size (Int): size of the buffer, in number of samples. Data is read from the tfrecords into this buffer sequentially, but the order of the tfrecords is picked at random, and samples are drawn from the buffer at random. The larger the buffer and the smaller the tfrecord files, the more random the yielding. Similar to https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle
        seed (Int): seed for the random number generator.
        compression (Compression, optional): compression type. Default is Compression.NONE.
    __init__(self: _dareblopy.ParsedRecordYielderRandomized, parser: object, filenames: List[str], buffer_size: int, seed: int, epoch: int, compression: _dareblopy.Compression = Compression.NONE) → None

    __iter__(self: object) → object

    __next__(self: _dareblopy.ParsedRecordYielderRandomized) → object

    next_n(self: _dareblopy.ParsedRecordYielderRandomized, arg0: int) → list
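A minimal sketch (filenames are hypothetical; the features dict follows the FixedLenFeature example above):

    import dareblopy as db

    features = {
        'shape': db.FixedLenFeature([3], db.int64),
        'data': db.FixedLenFeature([], db.string)
    }
    parser = db.RecordParser(features)

    yielder = db.ParsedRecordYielderRandomized(
        parser,
        ['train-r00.tfrecords', 'train-r01.tfrecords'],  # hypothetical files
        buffer_size=64, seed=0, epoch=0)

    batch = yielder.next_n(32)  # a list of 32 parsed records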
class dareblopy.RecordParser
    Bases: pybind11_builtins.pybind11_object

    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.RecordParser, arg0: dict) -> None
        __init__(self: _dareblopy.RecordParser, arg0: dict, arg1: bool) -> None
        __init__(self: _dareblopy.RecordParser, arg0: dict, arg1: bool, arg2: int) -> None

    parse_example(self: _dareblopy.RecordParser, arg0: List[str]) → list

    parse_single_example(self: _dareblopy.RecordParser, arg0: str) → list

    parse_single_example_inplace(self: _dareblopy.RecordParser, arg0: str, arg1: List[object], arg2: int) → None
class dareblopy.RecordReader
    Bases: pybind11_builtins.pybind11_object

    An iterator that reads a tfrecord file and returns raw records (protobuf messages). Performs a crc32 check of the read data.

    Parameters:
        file (File) – a File object.
        filename (str) – a filename of the file.
        compression (Compression, optional) – compression type. Default is Compression.NONE.

    Note:
        The constructor is overloaded and accepts either file (File) or filename (str).

    Example:

        rr = db.RecordReader('test_utils/test-small-r00.tfrecords')
        file_size, data_size, entries = rr.get_metadata()
        records = list(rr)

        # Or for compressed records:
        rr = db.RecordReader('test_utils/test-small-gzip-r00.tfrecords', db.Compression.GZIP)
        file_size, data_size, entries = rr.get_metadata()
        records = list(rr)
    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.RecordReader, file: fsal::File, compression: _dareblopy.Compression = Compression.NONE) -> None
        __init__(self: _dareblopy.RecordReader, filename: str, compression: _dareblopy.Compression = Compression.NONE) -> None

    __iter__(self: object) → object

    __next__(self: _dareblopy.RecordReader) → object

    get_metadata(self: _dareblopy.RecordReader) → Tuple[int, int, int]
        Returns metadata of the tfrecord and checks all crc32 checksums.

        Note:
            It has to scan the whole file to collect the metadata and verify the checksums.

        Returns:
            Tuple[int, int, int] – file_size, data_size, entries, where file_size is the size of the file, data_size is the size of the data stored in the tfrecord, and entries is the number of entries.

    read_record(self: _dareblopy.RecordReader, arg0: int) → object
        Reads a record at a specific offset. In the majority of cases you won't need this method; use RecordReader as an iterator instead.
class dareblopy.RecordYielderBasic
    Bases: pybind11_builtins.pybind11_object

    Generator that yields records from a list of tfrecord files.

    Parameters:
        filenames (List[str]) – a list of filenames of the tfrecord files.
        compression (Compression, optional) – compression type. Default is Compression.NONE.

    __init__(self: _dareblopy.RecordYielderBasic, filenames: List[str], compression: _dareblopy.Compression = Compression.NONE) → None

    __iter__(self: object) → object

    __next__(self: _dareblopy.RecordYielderBasic) → object

    next_n(self: _dareblopy.RecordYielderBasic, arg0: int) → list
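A minimal iteration sketch (filenames are hypothetical):

    import dareblopy as db

    yielder = db.RecordYielderBasic(['train-r00.tfrecords', 'train-r01.tfrecords'])
    batch = yielder.next_n(32)  # a batch of 32 raw records
    for record in yielder:      # then iterate over the remainder one record at a time
        pass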
class dareblopy.RecordYielderRandomized
    Bases: pybind11_builtins.pybind11_object

    Generator that yields records from a list of tfrecord files in a randomized way.

    Parameters:
        filenames (List[str]) – a list of filenames of the tfrecord files.
        buffer_size (Int) – size of the buffer, in number of samples. Data is read from the tfrecords into this buffer sequentially, but the order of the tfrecords is picked at random, and samples are drawn from the buffer at random. The larger the buffer and the smaller the tfrecord files, the more random the yielding. Similar to https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle
        seed (Int) – seed for the random number generator.
        compression (Compression, optional) – compression type. Default is Compression.NONE.

    __init__(self: _dareblopy.RecordYielderRandomized, filenames: List[str], buffer_size: int, seed: int, epoch: int, compression: _dareblopy.Compression = Compression.NONE) → None

    __iter__(self: object) → object

    __next__(self: _dareblopy.RecordYielderRandomized) → object

    next_n(self: _dareblopy.RecordYielderRandomized, arg0: int) → list
class dareblopy.Status
    Bases: pybind11_builtins.pybind11_object

    __init__(self: _dareblopy.Status) → None

    is_eof(self: _dareblopy.Status) → bool
dareblopy.open_as_bytes(arg0: str) → object
    Opens a file as a bytes object.

    Parameters:
        filename (str) – filename
dareblopy.open_as_numpy_ubyte(filename: str, shape: object = None) → object
    Opens a file as a numpy array of type np.ubyte.

    Parameters:
        filename (str) – filename
        shape (List[Int]) – shape
dareblopy.open_zip_archive(*args, **kwargs)
    Overloaded function.

    open_zip_archive(arg0: str) -> fsal::Archive
        Opens a zip archive.

        Args:
            filename (str): filename

    open_zip_archive(arg0: fsal::File) -> fsal::Archive
dareblopy.read_jpg_as_numpy(filename: str, use_turbo: bool = False) → object
    Opens a jpeg file as a numpy array of type np.ubyte.

    Parameters:
        filename (str) – filename
        use_turbo (bool) – uses libjpeg-turbo if True
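A minimal sketch of these module-level helpers (file names and the shape are hypothetical):

    import dareblopy as db

    data = db.open_as_bytes('file.bin')                    # whole file as a bytes object
    arr = db.open_as_numpy_ubyte('file.bin')               # flat uint8 ndarray
    arr2d = db.open_as_numpy_ubyte('file.bin', [2, 2048])  # with explicit shape; must match the file size
    img = db.read_jpg_as_numpy('picture.jpg', use_turbo=True)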
Submodules

dareblopy.TFRecordsDatasetIterator module

class dareblopy.TFRecordsDatasetIterator.ParsedTFRecordsDatasetIterator(filenames, features, batch_size, buffer_size=1000, seed=None, epoch=0, compression=None)
    Bases: object

class dareblopy.TFRecordsDatasetIterator.TFRecordsDatasetIterator(filenames, batch_size, buffer_size=1000, seed=None, epoch=0, compression=None)
    Bases: object
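A minimal sketch of the high-level iterator (filenames are hypothetical, and yielding batches on iteration is an assumption based on the constructor signature):

    import dareblopy as db
    from dareblopy.TFRecordsDatasetIterator import ParsedTFRecordsDatasetIterator

    features = {
        'shape': db.FixedLenFeature([3], db.int64),
        'data': db.FixedLenFeature([], db.string)
    }

    iterator = ParsedTFRecordsDatasetIterator(
        ['train-r00.tfrecords', 'train-r01.tfrecords'],  # hypothetical files
        features, batch_size=32, buffer_size=1000, seed=0)

    for batch in iterator:  # assumed to yield batches of parsed features
        pass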
dareblopy.data_loader module

dareblopy.data_loader.data_loader(yielder, collator=None, iteration_count=None, worker_count=1, queue_size=16)
    Return an iterator that retrieves objects from yielder and passes them through collator.

    Maintains a queue of the given size and can run several worker threads. Intended for asynchronous, buffered data loading. Uses threads instead of multiprocessing, so tensors can be uploaded to the GPU in collator.

    Depending on your use case, the collator function can serve many purposes:
        - Reading data from disk or a database.
        - Decoding data, e.g. from JPEG.
        - Augmenting data: flipping, rotating, adding noise, etc.
        - Concatenating data, stacking to a single ndarray, converting to a tensor, uploading to the GPU.
        - Data generation.

    Note:
        Sequential order of batches is guaranteed only if the number of workers is 1 (the default); otherwise batches might be supplied out of order.
    Parameters:
        yielder (iterator) – input data; returns batches.
        collator (Callable, optional) – function for processing batches. Receives a batch from yielder and can return an object of any type. Defaults to None.
        worker_count (int, optional) – number of workers; should be greater than or equal to one. To process data in parallel and fully load the CPU, worker_count should be close to the number of CPU cores. Defaults to 1.
        queue_size (int, optional) – maximum size of the queue, i.e. the number of batches to buffer. Should be larger than worker_count. Typically one wants this as large as possible to amortize disk IO and computational costs; the downside of a large value is increased RAM consumption. Defaults to 16.

    Returns:
        An object that produces a sequence of batches. The next() method of the iterator returns the object produced by the collator function.

    Return type:
        Iterator

    Raises:
        StopIteration – when all data has been iterated through. Stops the for loop.
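A minimal sketch combining RecordYielderBasic with data_loader (the filename and the collator body are hypothetical):

    import dareblopy as db
    from dareblopy.data_loader import data_loader

    yielder = db.RecordYielderBasic(['train-r00.tfrecords'])  # hypothetical file

    def collator(batch):
        # Application-specific processing: decode, augment, stack to an
        # ndarray, or upload to GPU; may return an object of any type.
        return batch

    for batch in data_loader(yielder, collator, worker_count=1, queue_size=16):
        pass  # consume batches; StopIteration ends the loop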