dareblopy package

Module contents
class dareblopy.Archive
    Bases: pybind11_builtins.pybind11_object

    __init__(*args, **kwargs)
        Initialize self. See help(type(self)) for accurate signature.

    exists(self: _dareblopy.Archive, arg0: str) → bool
        Exists

    list_directory(self: _dareblopy.Archive, arg0: str) → bool
        ListDirectory

    open(self: _dareblopy.Archive, arg0: str) → object
        Opens file

    open_as_bytes(self: _dareblopy.Archive, arg0: str) → object

    open_as_numpy_ubyte(self: _dareblopy.Archive, arg0: str, arg1: object) → numpy.ndarray[uint8]

    read_jpg_as_numpy(self: _dareblopy.Archive, filename: str, use_turbo: bool = False) → numpy.ndarray[uint8]
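A minimal usage sketch: the archive object comes from dareblopy.open_zip_archive (documented below); the archive and member names are hypothetical.

    import dareblopy as db

    archive = db.open_zip_archive('archive.zip')      # hypothetical archive
    if archive.exists('image.jpg'):
        raw = archive.open_as_bytes('image.jpg')      # file contents as a bytes object
        img = archive.read_jpg_as_numpy('image.jpg')  # decoded JPEG as a uint8 ndarray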
class dareblopy.Compression
    Bases: pybind11_builtins.pybind11_object

    Enumeration for the compression type used for tfrecords.

    Possible values:
        NONE - default
        GZIP
        ZLIB

    Example:

        record_reader = db.RecordReader('zlib_compressed.tfrecords', db.Compression.ZLIB)
        record_reader = db.RecordReader('gzip_compressed.tfrecords', db.Compression.GZIP)
        record_yielder = db.RecordYielderBasic(['test_utils/test-small-gzip-r00.tfrecords',
                                                'test_utils/test-small-gzip-r01.tfrecords',
                                                'test_utils/test-small-gzip-r02.tfrecords',
                                                'test_utils/test-small-gzip-r03.tfrecords'],
                                               db.Compression.GZIP)
        record_yielder_random = db.RecordYielderRandomized(['test_utils/test-small-gzip-r00.tfrecords',
                                                            'test_utils/test-small-gzip-r01.tfrecords',
                                                            'test_utils/test-small-gzip-r02.tfrecords',
                                                            'test_utils/test-small-gzip-r03.tfrecords'],
                                                           buffer_size=16, seed=0, epoch=0,
                                                           compression=db.Compression.GZIP)

    Members:
        NONE
        GZIP
        ZLIB
    GZIP = Compression.GZIP

    NONE = Compression.NONE

    ZLIB = Compression.ZLIB

    __init__(self: _dareblopy.Compression, arg0: int) → None

    property name
        Type: (self: handle) -> str
class dareblopy.DataType
    Bases: pybind11_builtins.pybind11_object

    Enumeration for FixedLenFeature dtype. Equivalent to tf.string, tf.float32, tf.int64.

    Note:
        uint8 is an alias for string that enables reading directly into a preallocated numpy ndarray of uint8 dtype and a given shape. This eliminates any additional copying/casting. To use it, the shape of the encoded numpy array must be known.

    Example:

        features = {
            'shape': db.FixedLenFeature([3], db.int64),
            'data': db.FixedLenFeature([], db.string)
        }

    Members:
        string
        float32
        int64
        uint8
    __init__(self: _dareblopy.DataType, arg0: int) → None

    float32 = DataType.float32

    int64 = DataType.int64

    property name
        Type: (self: handle) -> str

    string = DataType.string

    uint8 = DataType.uint8
class dareblopy.File
    Bases: pybind11_builtins.pybind11_object

    __init__(self: _dareblopy.File) → None

    get_last_write_time(self: _dareblopy.File) → int

    path(self: _dareblopy.File) → str

    read(self: _dareblopy.File, size: int = -1) → object

    seek(self: _dareblopy.File, offset: int, origin: int = 0) → int

    size(self: _dareblopy.File) → int

    tell(self: _dareblopy.File) → int
class dareblopy.FileSystem
    Bases: pybind11_builtins.pybind11_object

    __init__(self: _dareblopy.FileSystem) → None

    clear_search_paths(self: _dareblopy.FileSystem) → None
        ClearSearchPaths

    create_directory(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → fsal::Status
        CreateDirectory

    exists(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → bool
        Exists

    mount_archive(self: _dareblopy.FileSystem, arg0: _dareblopy.Archive) → fsal::Status
        AddArchive

    open(self: _dareblopy.FileSystem, location: _dareblopy.Location, mode: _dareblopy.Mode = Mode.read, lockable: bool = False) → object
        Opens file

    pop_search_path(self: _dareblopy.FileSystem) → None
        PopSearchPath

    push_search_path(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → None
        PushSearchPath

    remove(self: _dareblopy.FileSystem, arg0: _dareblopy.Location) → fsal::Status
        Remove

    rename(self: _dareblopy.FileSystem, arg0: _dareblopy.Location, arg1: _dareblopy.Location) → fsal::Status
        Rename
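A minimal sketch tying FileSystem, Location, Mode, and File together; the path is hypothetical, and treating the return value of open as a File is an assumption based on the signatures above.

    import dareblopy as db

    fs = db.FileSystem()
    fs.push_search_path(db.Location('.'))   # resolve relative paths from here

    f = fs.open(db.Location('data.bin'))    # default mode is Mode.read
    print(f.size())                         # total size in bytes
    header = f.read(16)                     # read the first 16 bytes
    f.seek(0)                               # rewind to the beginning
    everything = f.read()                   # size=-1 (default) reads to EOF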
class dareblopy.FixedLenFeature
    Bases: pybind11_builtins.pybind11_object

    Specifies a fixed-length input feature to be parsed from tfrecord examples. Equivalent to tf.FixedLenFeature.

    Variables:
        shape (TensorShape) – a TensorShape object that defines the input data shape.
        dtype (DataType) – a DataType object that defines the input data type.
        default_value (object, optional) – default value.

    Note:
        The constructor is overloaded and accepts either:
            shape (List[int]), datatype (DataType)
            shape (List[int]), datatype (DataType), default_value (object)

    Example:

        features = {
            'shape': db.FixedLenFeature([3], db.int64),
            'data': db.FixedLenFeature([], db.string)
        }
        # or
        features = {
            'shape': db.FixedLenFeature([3], db.int64),
            'data': db.FixedLenFeature([3, 32, 32], db.uint8)
        }
    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.FixedLenFeature) -> None
        __init__(self: _dareblopy.FixedLenFeature, arg0: List[int], arg1: _dareblopy.DataType) -> None
        __init__(self: _dareblopy.FixedLenFeature, arg0: List[int], arg1: _dareblopy.DataType, arg2: object) -> None

    property default_value

    property dtype

    property shape
class dareblopy.Location
    Bases: pybind11_builtins.pybind11_object

    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.Location, arg0: str) -> None
        __init__(self: _dareblopy.Location, arg0: str, arg1: fsal::Location::Options, arg2: fsal::PathType, arg3: fsal::LinkType) -> None
class dareblopy.Mode
    Bases: pybind11_builtins.pybind11_object

    Members:
        read
        write
        append
        read_update
        write_update
        append_update

    __init__(self: _dareblopy.Mode, arg0: int) → None

    append = Mode.append

    append_update = Mode.append_update

    property name
        Type: (self: handle) -> str

    read = Mode.read

    read_update = Mode.read_update

    write = Mode.write

    write_update = Mode.write_update
class dareblopy.ParsedRecordYielderRandomized
    Bases: pybind11_builtins.pybind11_object

    Generator that yields parsed records from a list of tfrecord files in a randomized way.

    ParsedRecordYielderRandomized gives slightly better performance than RecordYielderRandomized, since it reduces data copying.

    Args:
        parser (RecordParser): parser used to decode records.
        filenames (List[str]): a list of filenames of the tfrecord files.
        buffer_size (Int): size of the buffer, in number of samples. Data is read from the tfrecords into this buffer sequentially, but the order of the tfrecords is picked at random, and samples are drawn from the buffer at random. The larger the buffer and the smaller the tfrecord files, the more random the yielding. Similar to https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle
        seed (Int): seed for the random number generator.
        compression (Compression, optional): compression type. Default is Compression.NONE.
    __init__(self: _dareblopy.ParsedRecordYielderRandomized, parser: object, filenames: List[str], buffer_size: int, seed: int, epoch: int, compression: _dareblopy.Compression = Compression.NONE) → None

    __iter__(self: object) → object

    __next__(self: _dareblopy.ParsedRecordYielderRandomized) → object

    next_n(self: _dareblopy.ParsedRecordYielderRandomized, arg0: int) → list
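A minimal sketch (filenames are hypothetical; the features dict follows the FixedLenFeature example above):

    import dareblopy as db

    features = {
        'shape': db.FixedLenFeature([3], db.int64),
        'data': db.FixedLenFeature([], db.string)
    }
    parser = db.RecordParser(features)

    yielder = db.ParsedRecordYielderRandomized(
        parser,
        ['train-r00.tfrecords', 'train-r01.tfrecords'],  # hypothetical files
        buffer_size=64, seed=0, epoch=0)

    batch = yielder.next_n(32)  # a list of 32 parsed records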
class dareblopy.RecordParser
    Bases: pybind11_builtins.pybind11_object

    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.RecordParser, arg0: dict) -> None
        __init__(self: _dareblopy.RecordParser, arg0: dict, arg1: bool) -> None
        __init__(self: _dareblopy.RecordParser, arg0: dict, arg1: bool, arg2: int) -> None

    parse_example(self: _dareblopy.RecordParser, arg0: List[str]) → list

    parse_single_example(self: _dareblopy.RecordParser, arg0: str) → list

    parse_single_example_inplace(self: _dareblopy.RecordParser, arg0: str, arg1: List[object], arg2: int) → None
class dareblopy.RecordReader
    Bases: pybind11_builtins.pybind11_object

    An iterator that reads a tfrecord file and returns raw records (protobuf messages). Performs a crc32 check of the read data.

    Parameters:
        file (File) – a File object.
        filename (str) – a filename of the file.
        compression (Compression, optional) – compression type. Default is Compression.NONE.

    Note:
        The constructor is overloaded and accepts either file (File) or filename (str).

    Example:

        rr = db.RecordReader('test_utils/test-small-r00.tfrecords')
        file_size, data_size, entries = rr.get_metadata()
        records = list(rr)

        # Or for compressed records:
        rr = db.RecordReader('test_utils/test-small-gzip-r00.tfrecords', db.Compression.GZIP)
        file_size, data_size, entries = rr.get_metadata()
        records = list(rr)
    __init__(*args, **kwargs)
        Overloaded function.

        __init__(self: _dareblopy.RecordReader, file: fsal::File, compression: _dareblopy.Compression = Compression.NONE) -> None
        __init__(self: _dareblopy.RecordReader, filename: str, compression: _dareblopy.Compression = Compression.NONE) -> None

    __iter__(self: object) → object

    __next__(self: _dareblopy.RecordReader) → object

    get_metadata(self: _dareblopy.RecordReader) → Tuple[int, int, int]
        Returns metadata of the tfrecord and checks all crc32 checksums.

        Note:
            It has to scan the whole file to collect the metadata and verify the checksums.

        Returns:
            Tuple[int, int, int] – file_size, data_size, entries, where file_size is the size of the file, data_size is the size of the data stored in the tfrecord, and entries is the number of entries.

    read_record(self: _dareblopy.RecordReader, arg0: int) → object
        Reads a record at a specific offset. In the majority of cases you won't need this method; use RecordReader as an iterator instead.
class dareblopy.RecordYielderBasic
    Bases: pybind11_builtins.pybind11_object

    Generator that yields records from a list of tfrecord files.

    Parameters:
        filenames (List[str]) – a list of filenames of the tfrecord files.
        compression (Compression, optional) – compression type. Default is Compression.NONE.

    __init__(self: _dareblopy.RecordYielderBasic, filenames: List[str], compression: _dareblopy.Compression = Compression.NONE) → None

    __iter__(self: object) → object

    __next__(self: _dareblopy.RecordYielderBasic) → object

    next_n(self: _dareblopy.RecordYielderBasic, arg0: int) → list
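A minimal iteration sketch (filenames are hypothetical):

    import dareblopy as db

    yielder = db.RecordYielderBasic(['train-r00.tfrecords', 'train-r01.tfrecords'])
    batch = yielder.next_n(32)  # a batch of 32 raw records
    for record in yielder:      # then iterate over the remainder one record at a time
        pass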
class dareblopy.RecordYielderRandomized
    Bases: pybind11_builtins.pybind11_object

    Generator that yields records from a list of tfrecord files in a randomized way.

    Parameters:
        filenames (List[str]) – a list of filenames of the tfrecord files.
        buffer_size (Int) – size of the buffer, in number of samples. Data is read from the tfrecords into this buffer sequentially, but the order of the tfrecords is picked at random, and samples are drawn from the buffer at random. The larger the buffer and the smaller the tfrecord files, the more random the yielding. Similar to https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle
        seed (Int) – seed for the random number generator.
        compression (Compression, optional) – compression type. Default is Compression.NONE.

    __init__(self: _dareblopy.RecordYielderRandomized, filenames: List[str], buffer_size: int, seed: int, epoch: int, compression: _dareblopy.Compression = Compression.NONE) → None

    __iter__(self: object) → object

    __next__(self: _dareblopy.RecordYielderRandomized) → object

    next_n(self: _dareblopy.RecordYielderRandomized, arg0: int) → list
class dareblopy.Status
    Bases: pybind11_builtins.pybind11_object

    __init__(self: _dareblopy.Status) → None

    is_eof(self: _dareblopy.Status) → bool
dareblopy.open_as_bytes(arg0: str) → object
    Opens a file as a bytes object.

    Parameters:
        filename (str) – filename
dareblopy.open_as_numpy_ubyte(filename: str, shape: object = None) → object
    Opens a file as a numpy array of type np.ubyte.

    Parameters:
        filename (str) – filename
        shape (List[Int]) – shape
dareblopy.open_zip_archive(*args, **kwargs)
    Overloaded function.

    open_zip_archive(arg0: str) -> fsal::Archive
        Opens a zip archive.

        Args:
            filename (str): filename

    open_zip_archive(arg0: fsal::File) -> fsal::Archive
dareblopy.read_jpg_as_numpy(filename: str, use_turbo: bool = False) → object
    Opens a jpeg file as a numpy array of type np.ubyte.

    Parameters:
        filename (str) – filename
        use_turbo (bool) – uses libjpeg-turbo if True
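A minimal sketch of these module-level helpers (file names and the shape are hypothetical):

    import dareblopy as db

    data = db.open_as_bytes('file.bin')                    # whole file as a bytes object
    arr = db.open_as_numpy_ubyte('file.bin')               # flat uint8 ndarray
    arr2d = db.open_as_numpy_ubyte('file.bin', [2, 2048])  # with explicit shape; must match the file size
    img = db.read_jpg_as_numpy('picture.jpg', use_turbo=True)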
Submodules

dareblopy.TFRecordsDatasetIterator module

class dareblopy.TFRecordsDatasetIterator.ParsedTFRecordsDatasetIterator(filenames, features, batch_size, buffer_size=1000, seed=None, epoch=0, compression=None)
    Bases: object

class dareblopy.TFRecordsDatasetIterator.TFRecordsDatasetIterator(filenames, batch_size, buffer_size=1000, seed=None, epoch=0, compression=None)
    Bases: object
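A minimal sketch of the high-level iterator (filenames are hypothetical, and yielding batches on iteration is an assumption based on the constructor signature):

    import dareblopy as db
    from dareblopy.TFRecordsDatasetIterator import ParsedTFRecordsDatasetIterator

    features = {
        'shape': db.FixedLenFeature([3], db.int64),
        'data': db.FixedLenFeature([], db.string)
    }

    iterator = ParsedTFRecordsDatasetIterator(
        ['train-r00.tfrecords', 'train-r01.tfrecords'],  # hypothetical files
        features, batch_size=32, buffer_size=1000, seed=0)

    for batch in iterator:  # assumed to yield batches of parsed features
        pass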
dareblopy.data_loader module

dareblopy.data_loader.data_loader(yielder, collator=None, iteration_count=None, worker_count=1, queue_size=16)
    Return an iterator that retrieves objects from yielder and passes them through collator.

    Maintains a queue of the given size and can run several worker threads. Intended for asynchronous, buffered data loading. Uses threads instead of multiprocessing, so tensors can be uploaded to the GPU in collator.

    Depending on your use case, the collator function can serve many purposes:
        - Reading data from disk or a database.
        - Decoding data, e.g. from JPEG.
        - Augmenting data: flipping, rotating, adding noise, etc.
        - Concatenating data, stacking to a single ndarray, converting to a tensor, uploading to the GPU.
        - Data generation.

    Note:
        Sequential order of batches is guaranteed only if the number of workers is 1 (the default); otherwise batches might be supplied out of order.
    Parameters:
        yielder (iterator) – input data; returns batches.
        collator (Callable, optional) – function for processing batches. Receives a batch from yielder and can return an object of any type. Defaults to None.
        worker_count (int, optional) – number of workers; should be greater than or equal to one. To process data in parallel and fully load the CPU, worker_count should be close to the number of CPU cores. Defaults to 1.
        queue_size (int, optional) – maximum size of the queue, i.e. the number of batches to buffer. Should be larger than worker_count. Typically one wants this as large as possible to amortize disk IO and computational costs; the downside of a large value is increased RAM consumption. Defaults to 16.

    Returns:
        An object that produces a sequence of batches. The next() method of the iterator returns the object produced by the collator function.

    Return type:
        Iterator

    Raises:
        StopIteration – when all data has been iterated through. Stops the for loop.
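A minimal sketch combining RecordYielderBasic with data_loader (the filename and the collator body are hypothetical):

    import dareblopy as db
    from dareblopy.data_loader import data_loader

    yielder = db.RecordYielderBasic(['train-r00.tfrecords'])  # hypothetical file

    def collator(batch):
        # Application-specific processing: decode, augment, stack to an
        # ndarray, or upload to GPU; may return an object of any type.
        return batch

    for batch in data_loader(yielder, collator, worker_count=1, queue_size=16):
        pass  # consume batches; StopIteration ends the loop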