Utils Module
This module contains utility functions that are commonly used throughout SCOUT.
In particular, this module contains map-reduce style utility functions for processing large chunk-compressed arrays in parallel.
A class to share memory between processes
Instantiate this class in the parent process and use in all processes.
On every platform except Linux, the mmap module supplies a buffer that NumPy accesses through numpy.frombuffer. On Linux, the memory lives in /dev/shm, which has no file on disk backing it and therefore never has to keep a consistent on-disk view of itself.
Typical use:
    shm = SharedMemory((100, 100, 100), np.float32)

    def do_something():
        with shm.txn() as a:
            a[...] = ...

    with multiprocessing.Pool() as pool:
        pool.apply_async(do_something, args)
A contextual wrapper of the shared memory
- Returns
A view of the shared memory which has the shape and dtype given at construction
- Return type
memory - array
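The mmap-plus-numpy.frombuffer approach described for SharedMemory can be sketched roughly as follows. This is a minimal illustration, not SCOUT's implementation: the class name AnonymousSharedArray is hypothetical, and it assumes an anonymous mapping (fileno -1) that, on Unix, is shared with processes forked after it is created.

```python
import mmap
from contextlib import contextmanager

import numpy as np


class AnonymousSharedArray:
    """Hypothetical stand-in for illustration; not SCOUT's SharedMemory."""

    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        nbytes = int(np.prod(self.shape)) * self.dtype.itemsize
        # Anonymous mapping: no file on disk backs this buffer.
        self._buf = mmap.mmap(-1, nbytes)

    @contextmanager
    def txn(self):
        # Build a NumPy view over the raw buffer only for the duration
        # of the "with" block, using the shape and dtype given at construction.
        view = np.frombuffer(self._buf, dtype=self.dtype).reshape(self.shape)
        yield view


shm = AnonymousSharedArray((2, 3), np.float32)
with shm.txn() as a:
    a[...] = 1.5  # writes go straight into the shared buffer
```

Because the view is created from the buffer each time txn() is entered, later transactions see whatever earlier ones wrote.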
- scout.utils.box_slice_idx(start, stop)
Creates an index tuple for a bounding box from start to stop using slices
- Parameters
start (array-like) – index of box start
stop (array-like) – index of box stop (index not included in result)
- Returns
idx – index tuple for bounding box
- Return type
tuple
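As a rough illustration of the description above, an index tuple of slices can be built by pairing each start with its stop. The function name box_slice_idx_sketch is hypothetical and the body is an assumption about the behavior, not the library's source.

```python
def box_slice_idx_sketch(start, stop):
    """Build one slice per axis; stop is exclusive, as with Python slices."""
    return tuple(slice(int(a), int(b)) for a, b in zip(start, stop))


# Bounding box covering rows 1-2 and columns 2-4 of a 2D array.
idx = box_slice_idx_sketch((1, 2), (3, 5))
```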
- scout.utils.chunk_coordinates(shape, chunks)
Calculate the global coordinates for each chunk’s starting position
- Parameters
shape (tuple) – shape of the image to chunk
chunks (tuple) – shape of each chunk
- Returns
start_coords – the starting indices of each chunk
- Return type
ndarray
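One way to compute such starting coordinates is to step along each axis by the chunk size and take the Cartesian product. The sketch below assumes row-major ordering of the result and a 2D (n_chunks, ndim) output; the actual ordering and layout returned by chunk_coordinates are not stated here, and the name chunk_coordinates_sketch is hypothetical.

```python
import itertools

import numpy as np


def chunk_coordinates_sketch(shape, chunks):
    """Return an (n_chunks, ndim) array of chunk start indices."""
    # Start positions along each axis: 0, chunk, 2 * chunk, ...
    axes = [range(0, s, c) for s, c in zip(shape, chunks)]
    return np.array(list(itertools.product(*axes)))


start_coords = chunk_coordinates_sketch((4, 6), (2, 3))
```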
- scout.utils.chunk_dims(img_shape, chunk_shape)
Calculate the number of chunks needed for a given image shape
- Parameters
img_shape (tuple) – whole image shape
chunk_shape (tuple) – individual chunk shape
- Returns
nb_chunks – a tuple containing the number of chunks in each dimension
- Return type
tuple
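The chunk count per dimension can be sketched as a ceiling division, assuming that a partial chunk at the edge of the image counts as a full chunk. That rounding rule is an assumption, and chunk_dims_sketch is a hypothetical name.

```python
import math


def chunk_dims_sketch(img_shape, chunk_shape):
    """Number of chunks per axis, rounding up for partial edge chunks."""
    return tuple(math.ceil(s / c) for s, c in zip(img_shape, chunk_shape))


nb_chunks = chunk_dims_sketch((100, 100, 50), (64, 64, 64))
```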
- scout.utils.extract_box(arr, start, stop)
Indexes arr from start to stop
- Parameters
arr (array-like or SharedMemory) – input array to index
start (array-like) – starting index of the slice
stop (array-like) – ending index of the slice. The element at this index is not included.
- Returns
box – resulting box from arr
- Return type
ndarray
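For a plain NumPy array, the extraction described above amounts to indexing with one slice per axis. The sketch below covers only that case; how a SharedMemory input is handled is not shown, and extract_box_sketch is a hypothetical name.

```python
import numpy as np


def extract_box_sketch(arr, start, stop):
    """Index arr from start (inclusive) to stop (exclusive) on every axis."""
    idx = tuple(slice(a, b) for a, b in zip(start, stop))
    return arr[idx]


vol = np.arange(24).reshape(2, 3, 4)
box = extract_box_sketch(vol, (0, 1, 1), (2, 3, 3))
```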
- scout.utils.files_in_dir(path)
Searches a path for all files
- Parameters
path (str) – The directory path to check for files
- Returns
list of all files and subdirectories in the input path (excluding . and ..)
- Return type
list
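The documented behavior (every entry except . and ..) matches what os.listdir returns, so a minimal sketch could look like this. Sorting the result and returning bare names rather than full paths are both assumptions, and files_in_dir_sketch is a hypothetical name.

```python
import os
import tempfile


def files_in_dir_sketch(path):
    """List files and subdirectories in path; os.listdir omits . and .."""
    return sorted(os.listdir(path))


# Demonstrate on a throwaway directory with one file and one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "sub"))
    entries = files_in_dir_sketch(tmp)
```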
- scout.utils.insert_box(arr, start, stop, data)
Indexes arr from start to stop and inserts data
- Parameters
arr (array-like) – input array to index
start (array-like) – starting index of the slice
stop (array-like) – ending index of the slice. The element at this index is not included.
data (array-like) – sub-array to insert into arr
- Returns
box – resulting box from arr
- Return type
ndarray
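Insertion is the write counterpart of box extraction: the same slice tuple is used on the left-hand side of an assignment. The sketch below assumes a plain NumPy array that is modified in place and returns the written region, following the "resulting box" wording above; the exact return value of insert_box is an assumption, and insert_box_sketch is a hypothetical name.

```python
import numpy as np


def insert_box_sketch(arr, start, stop, data):
    """Write data into arr[start:stop] on every axis (stop is exclusive)."""
    idx = tuple(slice(a, b) for a, b in zip(start, stop))
    arr[idx] = data  # modifies arr in place
    return arr[idx]


canvas = np.zeros((4, 4))
written = insert_box_sketch(canvas, (1, 1), (3, 3), np.ones((2, 2)))
```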
- scout.utils.pmap_chunks(f, arr, chunks=None, nb_workers=None, use_imap=False, unordered=False, chunksize=None)
Maps a function over an array in parallel using chunks
The function f should take a reference to the array, a starting index, and the chunk size. Since each subprocess handles its own indexing, any overlap between chunks should be baked into f. Caution: arr may get copied if not using memmap. Use with SharedMemory or a Zarr array to avoid copies.
- Parameters
f (callable) – function with signature f(arr, start_coord, chunks). May need to use partial to define other args.
arr (array-like) – an N-dimensional input array
chunks (tuple, optional) – the shape of chunks to use. Default tries to access arr.chunks and falls back to arr.shape
nb_workers (int, optional) – number of parallel processes to apply f with. Defaults to cpu_count
use_imap (bool, optional) – whether or not to use imap instead of starmap in order to get an iterator for tqdm. Note that this requires unpacking the input tuple manually inside of f.
- Returns
result – list of results for each chunk
- Return type
list
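To illustrate the calling convention f(arr, start_coord, chunks) and the use of partial for extra arguments, the sketch below defines an example f and applies it with a serial loop. The loop is only a stand-in to show what each call receives; it does not use multiprocessing, and the names chunk_mean and map_chunks_serial are hypothetical.

```python
import itertools
from functools import partial

import numpy as np


def chunk_mean(arr, start_coord, chunks, scale=1.0):
    """Example f: scaled mean of one chunk. f does its own indexing."""
    # Clip the stop index so edge chunks do not run past the array.
    stop = np.minimum(np.asarray(start_coord) + np.asarray(chunks), arr.shape)
    idx = tuple(slice(int(a), int(b)) for a, b in zip(start_coord, stop))
    return float(arr[idx].mean()) * scale


def map_chunks_serial(f, arr, chunks):
    """Serial stand-in: call f once per chunk start, in row-major order."""
    starts = itertools.product(
        *(range(0, s, c) for s, c in zip(arr.shape, chunks))
    )
    return [f(arr, start, chunks) for start in starts]


arr = np.arange(16, dtype=np.float64).reshape(4, 4)
# partial binds the extra keyword so f keeps the three-argument signature.
results = map_chunks_serial(partial(chunk_mean, scale=2.0), arr, (2, 2))
```

With the real pmap_chunks, the same partial(chunk_mean, scale=2.0) would presumably be passed as f, with arr ideally being a SharedMemory or Zarr array so that worker processes do not copy it.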
- scout.utils.read_voxel_size(path)
Reads the voxel size from the CSV file at path, which stores the voxel dimensions in nanometers
- Parameters
path (str) – Path to CSV file containing integer values of voxel dimensions in nanometers
- Returns
size – Physical voxel size in same order as in CSV
- Return type
tuple
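A reader for such a file might look like the sketch below. The file layout (a single row of comma-separated integers with no header) is an assumption, as is returning the values unchanged in nanometers; read_voxel_size_sketch and the file name voxel_size.csv are hypothetical.

```python
import csv
import os
import tempfile


def read_voxel_size_sketch(path):
    """Read the first CSV row as integers, keeping the order in the file."""
    with open(path, newline="") as f:
        row = next(csv.reader(f))
    return tuple(int(value) for value in row)


# Demonstrate on a throwaway file holding three voxel dimensions.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "voxel_size.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("1000,1000,2000\n")
    size = read_voxel_size_sketch(csv_path)
```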
- scout.utils.tifs_in_dir(path)
Searches input path for tif files
- Parameters
path (str) – path of the directory to check for tif images
- Returns
tif_paths (list) – list of paths to tiffs in path
tif_filenames (list) – list of tiff filenames (with the extension) in path
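The two return values can be sketched by filtering a directory listing on the file extension. Matching both .tif and .tiff case-insensitively and sorting the names are assumptions, and tifs_in_dir_sketch is a hypothetical name.

```python
import os
import tempfile


def tifs_in_dir_sketch(path):
    """Return (full paths, filenames) for the TIFF files directly in path."""
    tif_filenames = sorted(
        name for name in os.listdir(path)
        if name.lower().endswith((".tif", ".tiff"))
    )
    tif_paths = [os.path.join(path, name) for name in tif_filenames]
    return tif_paths, tif_filenames


# Demonstrate on a throwaway directory with two TIFFs and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("z000.tif", "z001.tif", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    tif_paths, tif_filenames = tifs_in_dir_sketch(tmp)
```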