Utils Module

This module contains utility functions that are commonly used throughout SCOUT.

In particular, this module contains map-reduce style utility functions for processing large chunk-compressed arrays in parallel.

class scout.utils.SharedMemory(shape, dtype)

A class to share memory between processes

Instantiate this class in the parent process and use it in all processes.

On every platform except Linux, the mmap module provides a buffer that NumPy accesses through numpy.frombuffer. On Linux, the memory lives in /dev/shm, which has no file backing it and therefore does not need to keep a consistent view of itself on disk.

Typical use:

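import multiprocessing
import numpy as np
from scout.utils import SharedMemory

shm = SharedMemory((100, 100, 100), np.float32)

def do_something():
    # Open a view of the shared buffer inside the worker
    with shm.txn() as a:
        a[...] = ...  # placeholder: assign real data here

with multiprocessing.Pool() as pool:
    result = pool.apply_async(do_something)
    result.get()  # wait for the task before the pool is torn down

txn()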

A contextual wrapper of the shared memory

Returns

A view of the shared memory which has the shape and dtype given at construction

Return type

memory - array
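
The mmap path described above can be illustrated with a small stand-in. This is a sketch under assumptions, not the SCOUT implementation: the class name MmapArray is hypothetical, and only the non-Linux strategy (an anonymous mmap wrapped by numpy.frombuffer) is shown.

```python
import mmap
from contextlib import contextmanager

import numpy as np


class MmapArray:
    """Hypothetical stand-in, not the SCOUT class: an anonymous mmap viewed as an array."""

    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        nbytes = int(np.prod(self.shape)) * self.dtype.itemsize
        # fileno=-1 requests anonymous memory with no file behind it
        self._buf = mmap.mmap(-1, nbytes)

    @contextmanager
    def txn(self):
        # Wrap the raw buffer without copying, then give it the requested shape
        view = np.frombuffer(self._buf, dtype=self.dtype).reshape(self.shape)
        yield view


shm = MmapArray((4, 4), np.float32)
with shm.txn() as a:
    a[...] = 1.5  # write through the view
with shm.txn() as b:
    total = float(b.sum())  # a second view sees the same bytes
```

Each call to txn() builds a fresh view over the same buffer, so writes made through one view are visible through the next. Nothing is copied.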

scout.utils.box_slice_idx(start, stop)

Creates an index tuple for a bounding box from start to stop using slices

Parameters
  • start (array-like) – index of box start

  • stop (array-like) – index of box stop (index not included in result)

Returns

idx – index tuple for bounding box

Return type

tuple
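
One way such an index tuple could be built is shown below. The function name box_slice_idx_sketch is hypothetical, and the body is an assumption based only on the description above.

```python
import numpy as np


def box_slice_idx_sketch(start, stop):
    # One slice per axis; stop is exclusive, as with ordinary Python slicing
    return tuple(slice(int(a), int(b)) for a, b in zip(start, stop))


idx = box_slice_idx_sketch((1, 2), (3, 5))
arr = np.arange(24).reshape(4, 6)
box = arr[idx]  # rows 1..2, columns 2..4
```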

scout.utils.chunk_coordinates(shape, chunks)

Calculate the global coordinates for each chunk’s starting position

Parameters
  • shape (tuple) – shape of the image to chunk

  • chunks (tuple) – shape of each chunk

Returns

start_coords – the starting indices of each chunk

Return type

ndarray
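
A minimal sketch of the calculation follows, assuming the result has one row per chunk and one column per axis. The real output layout is not stated above, and chunk_coordinates_sketch is a hypothetical name.

```python
import itertools

import numpy as np


def chunk_coordinates_sketch(shape, chunks):
    # Start offsets along each axis: 0, c, 2c, ... while below the axis length
    per_axis = [range(0, s, c) for s, c in zip(shape, chunks)]
    # The Cartesian product yields one starting coordinate per chunk
    return np.array(list(itertools.product(*per_axis)))


coords = chunk_coordinates_sketch((4, 6), (2, 4))
```

For a (4, 6) image with (2, 4) chunks this gives four starting points: (0, 0), (0, 4), (2, 0) and (2, 4).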

scout.utils.chunk_dims(img_shape, chunk_shape)

Calculate the number of chunks needed for a given image shape

Parameters
  • img_shape (tuple) – whole image shape

  • chunk_shape (tuple) – individual chunk shape

Returns

nb_chunks – a tuple containing the number of chunks in each dimension

Return type

tuple
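
The chunk count per dimension is presumably a ceiling division, since a partial chunk at the edge still counts as one chunk. The sketch below makes that assumption explicit; chunk_dims_sketch is a hypothetical name.

```python
def chunk_dims_sketch(img_shape, chunk_shape):
    # -(-s // c) is integer ceiling division, so edge remainders get their own chunk
    return tuple(-(-s // c) for s, c in zip(img_shape, chunk_shape))


nb_chunks = chunk_dims_sketch((100, 250, 250), (64, 64, 64))
```

Here 100 voxels need 2 chunks of 64, and 250 voxels need 4.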

scout.utils.extract_box(arr, start, stop)

Indexes arr from start to stop

Parameters
  • arr (array-like or SharedMemory) – input array to index

  • start (array-like) – starting index of the slice

  • stop (array-like) – ending index of the slice. The element at this index is not included.

Returns

box – resulting box from arr

Return type

ndarray

scout.utils.files_in_dir(path)

Searches a path for all files and subdirectories

Parameters

path (str) – The directory path to check for files

Returns

list of all files and subdirectories in the input path (excluding . and ..)

Return type

list

scout.utils.insert_box(arr, start, stop, data)

Indexes arr from start to stop and inserts data

Parameters
  • arr (array-like) – input array to index

  • start (array-like) – starting index of the slice

  • stop (array-like) – ending index of the slice. The element at this index is not included.

  • data (array-like) – sub-array to insert into arr

Returns

box – resulting box from arr

Return type

ndarray
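
The insertion can be pictured as a slice assignment over the bounding box. In this sketch, insert_box_sketch is a hypothetical name, and returning the whole modified array is an assumption, because the entry above does not make the return value clear.

```python
import numpy as np


def insert_box_sketch(arr, start, stop, data):
    # Build one slice per axis (stop exclusive) and assign in place
    idx = tuple(slice(a, b) for a, b in zip(start, stop))
    arr[idx] = data
    return arr  # assumption: return the modified array


arr = np.zeros((4, 4), dtype=np.uint8)
patch = np.full((2, 2), 7, dtype=np.uint8)
result = insert_box_sketch(arr, (1, 1), (3, 3), patch)
```

The shape of data has to match stop minus start (or broadcast to it), otherwise NumPy raises a ValueError on assignment.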

scout.utils.pmap_chunks(f, arr, chunks=None, nb_workers=None, use_imap=False, unordered=False, chunksize=None)

Maps a function over an array in parallel using chunks

The function f should take a reference to the array, a starting index, and the chunk size. Since each subprocess handles its own indexing, any overlap between chunks should be built into f. Caution: arr may get copied if not using memmap. Use a SharedMemory or Zarr array to avoid copies.

Parameters
  • f (callable) – function with signature f(arr, start_coord, chunks). May need to use partial to define other args.

  • arr (array-like) – an N-dimensional input array

  • chunks (tuple, optional) – the shape of chunks to use. Default tries to access arr.chunks and falls back to arr.shape

  • nb_workers (int, optional) – number of parallel processes to apply f with. Defaults to cpu_count

  • use_imap (bool, optional) – whether or not to use imap instead of starmap in order to get an iterator for tqdm. Note that this requires manually unpacking the input tuple inside of f.

Returns

result – list of results for each chunk

Return type

list
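
The shape of f is easiest to see in a serial stand-in. The sketch below does not call pmap_chunks or start any processes. It only shows a function with the f(arr, start_coord, chunks) signature, how partial can bind an extra argument, and the manual tuple unpacking that the use_imap note refers to. The names chunk_mean, chunk_mean_packed and scale are hypothetical.

```python
from functools import partial
from itertools import product

import numpy as np


def chunk_mean(arr, start_coord, chunks, scale=1.0):
    # Build the slice for this chunk; slicing clips at the array edge
    idx = tuple(slice(s, s + c) for s, c in zip(start_coord, chunks))
    return float(arr[idx].mean()) * scale


def chunk_mean_packed(args, scale=1.0):
    # imap-style variant: the three arguments arrive as one tuple
    arr, start_coord, chunks = args
    return chunk_mean(arr, start_coord, chunks, scale=scale)


arr = np.arange(16, dtype=np.float64).reshape(4, 4)
chunks = (2, 2)
starts = list(product(*(range(0, s, c) for s, c in zip(arr.shape, chunks))))

f = partial(chunk_mean, scale=2.0)  # bind the extra argument up front
# Serial stand-in for the parallel map: one call per chunk start
results = [f(arr, start, chunks) for start in starts]
packed = [chunk_mean_packed((arr, start, chunks), scale=2.0) for start in starts]
```

With SCOUT installed, the same f would presumably be passed as pmap_chunks(f, arr, chunks=chunks), following the signature above.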

scout.utils.read_voxel_size(path)

Reads the voxel size from the CSV file at path, which stores the voxel dimensions in nanometers

Parameters

path (str) – Path to CSV file containing integer values of voxel dimensions in nanometers

Returns

size – Physical voxel size in same order as in CSV

Return type

tuple
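
A rough sketch of such a reader is given below. The CSV layout is an assumption (a single row of comma-separated integers), the sample values are made up for illustration, and read_voxel_size_sketch is a hypothetical name.

```python
import csv
import os
import tempfile


def read_voxel_size_sketch(path):
    # Assumption: the file holds one row of integers, e.g. "2000,650,650"
    with open(path, newline="") as fh:
        row = next(csv.reader(fh))
    return tuple(int(v) for v in row)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "voxel_size.csv")
    with open(csv_path, "w", newline="") as fh:
        fh.write("2000,650,650\n")  # made-up sample values in nanometers
    size_nm = read_voxel_size_sketch(csv_path)
    size_um = tuple(v / 1000 for v in size_nm)  # convert to micrometers
```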

scout.utils.tifs_in_dir(path)

Searches input path for tif files

Parameters

path (str) – path of the directory to check for tif images

Returns

  • tif_paths (list) – list of paths to tiffs in path

  • tif_filenames (list) – list of tiff filenames (with the extension) in path
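
The two return values can be illustrated with a standard-library sketch. The matching rule (.tif or .tiff, case-insensitive) and the sorted order are assumptions, and tifs_in_dir_sketch is a hypothetical name.

```python
import os
import tempfile


def tifs_in_dir_sketch(path):
    # Assumption: match .tif and .tiff regardless of case, in sorted order
    names = sorted(
        n for n in os.listdir(path) if n.lower().endswith((".tif", ".tiff"))
    )
    paths = [os.path.join(path, n) for n in names]
    return paths, names


with tempfile.TemporaryDirectory() as tmp:
    for name in ("z0001.tif", "z0000.tif", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()  # create empty test files
    tif_paths, tif_filenames = tifs_in_dir_sketch(tmp)
```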