Walkthrough with test data
The following walkthrough illustrates how to use SCOUT on a small organoid dataset using Docker. Docker provides a cross-platform way to run SCOUT with minimal setup. The test dataset contains stitched (whole-organoid) SYTO 16, SOX2, and TBR1 images from a portion of a d35 cerebral organoid cultured similarly to the Lancaster protocol (described in the Methods section).
Before you start
This walkthrough was developed and tested on Ubuntu Linux, so other platforms may require slight changes to the commands below. Windows and Mac users may need to make the following adjustments to the walkthrough commands (see the PowerShell example after this list):
- Remove sudo from all commands
On Mac, sudo is not needed if Docker is available to non-root users. On Windows, sudo is not available.
- Replace $(pwd) with a full path to scout-data/test
The $(pwd) subshell syntax should work in Mac terminals, but it does not work in Windows PowerShell, where it must be replaced with a full path such as C:\path\to\scout-data\test.
- Remove the line breaks
The backslash line-continuation syntax used in this walkthrough may not work in Windows PowerShell. Instead, enter each command on a single line.
- Make sure Docker has permission to mount the drive with scout-data/
On Windows, Docker may only have permission to access the C: drive by default. If scout-data is placed on another drive, grant Docker access to that drive through the Docker settings, available from the Docker tray icon.
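For example, the first preprocessing command of this walkthrough (see the Preprocessing section below) would look roughly like this in PowerShell; the path is illustrative and should be replaced with your actual scout-data\test location:
docker run -v "C:\path\to\scout-data\test:/scout/data" chunglabmit/scout preprocess histogram data/dataset/color_0/ data/dataset/color0_hist.csv -s 1 -v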
Docker setup
The easiest way to ensure a similar runtime environment for SCOUT on Windows, Mac, and Linux is by using the chunglabmit/scout Docker image hosted on Docker Hub. First, you will need to install "Docker Desktop" (which is free) by following the platform-specific instructions at https://docs.docker.com/get-docker/.
Once installed, you will need to download the pre-built Docker image for SCOUT. Open a terminal and use docker pull to download it:
sudo docker pull chunglabmit/scout
The image location may change in the future. Note that the sudo keyword may not be needed on your platform for Docker commands.
Docker Desktop on Windows and Mac may restrict the amount of CPU and RAM resources that each container can use by default. You can adjust resource allocation by accessing the Docker settings through the Docker tray icon. You may also need to allow Docker access to other drives (D:, E:, etc.) if the scout-data directory is placed on a different drive. Lastly, if you want to run Jupyter notebooks with the SCOUT Docker image, you may need to follow some platform-specific networking setup (port forwarding, routing, etc.), which you can read more about at https://docs.docker.com/docker-for-windows/networking/ or https://docs.docker.com/docker-for-mac/networking/ for Windows and Mac, respectively.
Download test data
After installing the SCOUT Docker image, a small test dataset (~3 GB) can be downloaded. The test dataset is distributed as an archive called scout-data.zip, which contains two subfolders: test and results. The test folder contains all the data needed to start the SCOUT analysis from the beginning, such as raw stitched images from a microscope. The results folder contains all the intermediate results expected from completing the following walkthrough. This data is included for completeness and verification purposes and is not required to actually run SCOUT on newly acquired data.
First, download the test dataset from https://leviathan-chunglab.mit.edu/nature-2020-supplementary/scout-data.zip and unzip it. Make note of the resulting scout-data/test and scout-data/results folders.
Open a terminal (or PowerShell on Windows) and move into the scout-data/test directory:
cd path/to/scout-data/test # Replace with actual path
This folder will be mounted into the SCOUT Docker container using the -v ...:/scout/data argument to docker run throughout the following walkthrough.
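To confirm that the mount works before running the pipeline, you can list the mounted directory. This assumes the image ships with a standard ls; the --entrypoint flag overrides the image's default SCOUT entrypoint:
sudo docker run --entrypoint ls -v "$(pwd):/scout/data" chunglabmit/scout /scout/data
You should see the contents of scout-data/test, including the dataset/ folder.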
Preprocessing
The first step in the SCOUT pipeline is to estimate the overall image histograms for each channel.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess histogram \
data/dataset/color_0/ data/dataset/color0_hist.csv -s 1 -v
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess histogram \
data/dataset/color_1/ data/dataset/color1_hist.csv -s 1 -v
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess histogram \
data/dataset/color_2/ data/dataset/color2_hist.csv -s 1 -v
This will create three CSV files, one histogram per channel. Using these histograms, we can normalize the images to the range [0, 1] and apply a background threshold, as sketched below.
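Conceptually, rescaling subtracts a background threshold and divides by an upper intensity bound derived from the histogram. The following Python sketch illustrates the idea on a single slice; it is not SCOUT's exact implementation, and the interpretation of -t as a raw-intensity background threshold and -p as the upper-bound percentile is an assumption:
import numpy as np

# Illustrative rescaling sketch (not SCOUT's exact code).
img = np.random.randint(0, 4096, size=(64, 64)).astype(np.float32)  # stand-in slice

threshold = 120.0                      # assumed meaning of -t: background threshold
vmax = np.percentile(img, 99.7)        # assumed meaning of -p: upper-bound percentile
rescaled = np.clip((img - threshold) / (vmax - threshold), 0.0, 1.0)
print(rescaled.min(), rescaled.max())  # values now lie in [0, 1]
The actual rescaling commands are: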
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess rescale \
data/dataset/color_0/ data/dataset/color0_hist.csv data/dataset/color0_rescaled \
-t 120 -p 99.7 -v
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess rescale \
data/dataset/color_1/ data/dataset/color1_hist.csv data/dataset/color1_rescaled \
-t 100 -p 99.7 -v
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess rescale \
data/dataset/color_2/ data/dataset/color2_hist.csv data/dataset/color2_rescaled \
-t 100 -p 99.7 -v
This will create three new folders containing normalized TIFF images for each channel. To work with volumetric image data more easily, we then convert the 2D TIFF stacks into 3D Zarr arrays. Each Zarr array is a nested folder of chunk-compressed voxel data. By default, the chunk size is (64, 64, 64).
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess convert \
data/dataset/color0_rescaled data/dataset/syto.zarr -v
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess convert \
data/dataset/color1_rescaled data/dataset/sox2.zarr -v
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout preprocess convert \
data/dataset/color2_rescaled data/dataset/tbr1.zarr -v
This will create three new *.zarr folders, one for each channel.
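You can quickly inspect a converted array from any Python environment with the zarr package installed (presumably available inside the SCOUT container as well):
import zarr

# Open the converted volume read-only and check its layout.
arr = zarr.open("data/dataset/syto.zarr", mode="r")
print(arr.shape)   # volume dimensions
print(arr.chunks)  # (64, 64, 64) by default
print(arr.dtype)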
Nuclei Detection
Once we have the syto.zarr array, we can detect nuclei centroids using parallel processing on each image chunk. Note that the current Docker image does not support GPU acceleration; this step runs much faster when SCOUT is installed from source on a machine with a GPU.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout nuclei detect data/dataset/syto.zarr \
data/dataset/nuclei_probability.zarr data/dataset/centroids.npy \
--voxel-size data/dataset/voxel_size.csv \
--output-um data/dataset/centroids_um.npy -n 2 -v
This will create a new Zarr array, nuclei_probability.zarr, as well as two numpy arrays with nuclei centroid coordinates: centroids.npy in voxel units and centroids_um.npy in microns.
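A quick sanity check on the detected centroids, assuming numpy is available:
import numpy as np

# Each row is one detected nucleus centroid, in microns.
centroids_um = np.load("data/dataset/centroids_um.npy")
print(centroids_um.shape)  # (number of nuclei, 3)
print(centroids_um[:3])    # first few centroid positions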
Given these nuclei centroids, we can perform a seeded watershed segmentation of the nuclei probability array to obtain the shape of each detected nucleus. This operation uses some overlap between adjacent chunks to avoid artifacts in the watershed lines at chunk boundaries.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout nuclei segment \
data/dataset/nuclei_probability.zarr data/dataset/centroids.npy \
data/dataset/nuclei_foreground.zarr data/dataset/nuclei_binary.zarr -n 2 -v
This will create two new Zarr arrays, nuclei_foreground.zarr and nuclei_binary.zarr.
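The chunk-wise operation is conceptually similar to this single-chunk sketch using scikit-image; it is illustrative only, and the 0.5 foreground cutoff is an assumption rather than SCOUT's actual setting:
import numpy as np
from skimage.segmentation import watershed

# Illustrative seeded watershed on one chunk (not SCOUT's exact code).
# Stand-in probability chunk: a smooth blob centered in the chunk.
zz, yy, xx = np.indices((64, 64, 64))
prob = np.exp(-((zz - 32)**2 + (yy - 32)**2 + (xx - 32)**2) / 200.0).astype(np.float32)

seeds = np.zeros(prob.shape, dtype=np.int32)
seeds[32, 32, 32] = 1  # in practice, one marker per detected centroid

# Flood from the seeds over the inverted probability, restricted to foreground.
labels = watershed(-prob, markers=seeds, mask=prob > 0.5)
print(np.unique(labels))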
Given this binary nuclei segmentation, we can compute morphological features for each nucleus. The resulting morphological features are stored in a CSV file.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout nuclei morphology \
data/dataset/nuclei_binary.zarr data/dataset/centroids.npy \
data/dataset/nuclei_morphologies.csv -v
This will create a CSV file containing multiple morphological measurements for each segmented nucleus. Finally, we can sample the fluorescence in the other channels (SOX2 and TBR1 in this case) at each nucleus centroid.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout nuclei fluorescence \
data/dataset/centroids.npy data/dataset/nuclei_fluorescence/ \
data/dataset/sox2.zarr/ data/dataset/tbr1.zarr/ -v
This will create a folder, nuclei_fluorescence/, that contains numpy arrays with the fluorescence mean and standard deviation for each detected nucleus. The resulting mean fluorescence intensities (MFIs) are useful for gating cells into different cell types based on protein expression.
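Conceptually, gating applies a threshold to each MFI column; the 0.35 and 0.25 values in the command below correspond to SOX2 and TBR1. This sketch assumes nuclei_mfis.npy is shaped (cells, channels) with SOX2 and TBR1 in the first two columns, which may not match SCOUT's actual layout:
import numpy as np

# Illustrative gating sketch (not SCOUT's exact logic).
mfis = np.load("data/dataset/nuclei_fluorescence/nuclei_mfis.npy")
sox2_high = mfis[:, 0] > 0.35  # assumed SOX2 column and threshold
tbr1_high = mfis[:, 1] > 0.25  # assumed TBR1 column and threshold
print(sox2_high.sum(), tbr1_high.sum())
The actual gating command is: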
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout nuclei gate \
data/dataset/nuclei_fluorescence/nuclei_mfis.npy \
data/dataset/nuclei_gating.npy 0.35 0.25 -v
This will create a numpy array, nuclei_gating.npy, containing binary cell type labels for each nucleus. In this case, high SOX2 expression is used to identify neural progenitors, and high TBR1 expression is used to identify post-mitotic neurons. Cells that have low SOX2 and TBR1 expression are called "double negative" (DN). Cell types can be named in order using the following command:
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout nuclei name \
sox2 tbr1 dn -o data/dataset/celltype_names.csv -v
This will create a CSV file with names for each cell type.
Microenvironment Analysis
(Note that this analysis was formerly called "niche analysis," which is why the commands below use the niche subcommand.)
Given nuclei centroids and cell type labels, we can further describe the microenvironment around each cell. To do this, we compute the proximity to each of the non-DN cell types, as described in the Methods section.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout niche proximity \
data/dataset/centroids_um.npy data/dataset/nuclei_gating.npy \
data/dataset/niche_proximities.npy -r 25 25 -k 2 -v
This will create a numpy array with proximities to each cell type. These spatial proximities are attributes of each cell describing its local environment. The next step is to use these proximity values to further gate cells into subpopulations based on their spatial context.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout niche gate \
data/dataset/niche_proximities.npy data/dataset/niche_labels.npy \
--low 0.2 0.2 --high 0.66 0.63 -v
This will create a numpy array containing microenvironment labels for each nucleus. Here, we defined low and high proximity thresholds for SOX2 and TBR1 separately. This results in seven subpopulations (3 high, 3 mid, and 1 low), which can be named using the following command:
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout niche name \
DN SOX2 TBR1 DP MidTBR1 MidSOX2 MidInter -o data/dataset/niche_names.csv -v
This will create a CSV file with names for each microenvironment.
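To see how many cells fall into each microenvironment, a quick check with numpy (the mapping from label values to the names above is an assumption worth verifying against niche_names.csv):
import numpy as np

# Count cells per microenvironment label.
labels = np.load("data/dataset/niche_labels.npy")
values, counts = np.unique(labels, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))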
Ventricle Segmentation
Next, we turn to ventricle segmentation, which is required to calculate radial profiles in the cytoarchitecture analysis. The pretrained U-Net model assumes that each input image is of nuclear staining at 4 um pixel resolution. We therefore resize the normalized nuclei images and stack them into a single 3D TIFF.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout segment downsample \
data/dataset/color0_rescaled/ data/dataset/syto_down6x 6 6 -v -t
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout segment stack \
data/dataset/syto_down6x/ data/dataset/syto_down6x.tif -v
This will create a new folder and a 3D TIFF with 6x downsampled (in x and y) images. The 3D TIFF can be passed to the U-Net model for ventricle segmentation, which processes one 2D slice at a time.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout segment ventricle \
data/dataset/syto_down6x.tif models/unet_weights3_zika.h5 \
data/dataset/segment_ventricles.tif -t 0.5 -v
This will result in a 3D TIFF, segment_ventricles.tif, containing a binary segmentation of all ventricles. We also need a foreground segmentation to determine the overall organoid size and shape. A foreground segmentation can be computed by thresholding.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout segment foreground \
data/dataset/syto_down6x.tif data/dataset/segment_foreground.tif -v -t 0.02 -g 8 4 4
This will create another 3D TIFF, segment_foreground.tif, containing a binary segmentation of the whole organoid.
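Both segmentations can be inspected from Python; this assumes the tifffile package is installed:
import tifffile

# Load the downsampled binary segmentations.
ventricles = tifffile.imread("data/dataset/segment_ventricles.tif")
foreground = tifffile.imread("data/dataset/segment_foreground.tif")
print(ventricles.shape, ventricles.dtype)
print(foreground.mean())  # fraction of foreground voxels, if the mask is 0/1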
Cytoarchitecture Analysis
Given the ventricle segmentation, nuclei centroids, and cell type labels, radial profiles from each ventricle can be computed. First, the ventricle segmentation is turned into a polygon mesh using the marching cubes algorithm.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout cyto mesh \
data/dataset/segment_ventricles.tif data/dataset/voxel_size.csv \
data/dataset/mesh_ventricles.pkl -d 1 6 6 -g 2 -s 3 -v
This will generate a pickled Python dictionary, mesh_ventricles.pkl, containing mesh vertices, faces, and normals. Normal vectors from this mesh are then used to compute radial profiles for each cell type.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout cyto profiles \
data/dataset/mesh_ventricles.pkl data/dataset/centroids_um.npy \
data/dataset/nuclei_gating.npy data/dataset/cyto_profiles.npy -v
This will create a numpy array, cyto_profiles.npy, containing radial profiles of cell counts.
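A quick look at the computed profiles (the exact axis layout is not documented here, so treat the printed shape as the source of truth):
import numpy as np

# Radial profiles of cell counts along ventricle surface normals.
profiles = np.load("data/dataset/cyto_profiles.npy")
print(profiles.shape)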
Finally, we randomly sample from the large number of radial profiles so that profiles can be clustered across many organoids. This step isn't required in this case, but we include it for completeness.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout cyto sample 5000 \
data/dataset/cyto_sample_index.npy -i data/dataset/cyto_profiles.npy \
-o data/dataset/cyto_profiles_sample.npy -v
This will create numpy arrays containing a random sample of radial profiles and the corresponding indices into the original array of profiles. Then, we would compute clusters of cytoarchitectures across all organoids by combining sampled profiles and using the determine cyto clusters.ipynb notebook. You can access and use these notebooks by starting a Jupyter server within the SCOUT Docker container:
sudo docker run -it -v "$(pwd):/scout/data" -p 8888:8888 chunglabmit/scout jupyter --ip 0.0.0.0
Note that the positions of the -p and --ip arguments are important: -p is for Docker port forwarding, and --ip is for the Jupyter server. You can navigate to localhost:8888 in your browser and copy in the access token printed to the terminal as /?token={copy-this-text}.
For the sake of brevity, we simply provide precomputed profiles, labels, and a fitted UMAP model from our d35/d60 comparison. With these, we can classify the cytoarchitecture of all radial profiles.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout cyto classify \
data/cyto_profiles_combined.npy data/cyto_labels_combined.npy \
data/dataset/cyto_profiles.npy data/cyto_labels.npy -v \
--umap data/model_d35_d60.umap
This will create a numpy array, cyto_labels.npy, containing cytoarchitecture labels for each radial profile. Note that because the test dataset is not a full 3D dataset, the resulting radial profiles and cytoarchitecture labels obtained here may have some artifacts due to empty profiles near the top and bottom of the test volume.
We can provide appropriate names for each cytoarchitecture cluster after inspecting each cluster in the determine cyto clusters.ipynb notebook.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout cyto name \
TBR1-LowDN TBR1-HighDN Surface Artifacts DN Adjacent -o data/cyto_names.csv -v
This will create a CSV with names for each cytoarchitecture class.
Multiscale Analysis
All of the intermediate results above are used to compute multiscale features for each dataset in an analysis. Note that the following command assumes that the intermediate results are named as shown in the previous steps.
sudo docker run -v "$(pwd):/scout/data" chunglabmit/scout multiscale features data/ \
-d 1 6 6 -g 2 -v
This command will create an Excel file called organoid_features.xlsx, which is the final step in the walkthrough. Details of how to perform statistical analysis of multiple organoids can be found in the full SCOUT tutorial.
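The feature table can also be loaded programmatically; this assumes pandas (with an Excel reader such as openpyxl) is installed:
import pandas as pd

# Load the final multiscale feature table for inspection.
features = pd.read_excel("organoid_features.xlsx")  # adjust the path as needed
print(features.columns.tolist())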
Expected results
The final organoid_features.xlsx file can be inspected in Excel. For convenience, we highlight some expected results in organoid_features.xlsx here:
- TBR1 nbrhd, tbr count: 3967
- SOX2 nbrhd, sox2 count: 12670
- ventricle equivalent diameter mean (um): 48.874
- organoid volume (mm3): 0.06565 (not a full organoid dataset)
All of the intermediate results can be compared to the files provided in scout-data/results.
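For numpy outputs, a spot check against the reference copy might look like this; the exact file layout inside scout-data/results is an assumption, and stochastic steps may not match exactly:
import numpy as np

# Compare one of our intermediate results against the provided reference.
ours = np.load("scout-data/test/dataset/centroids_um.npy")
reference = np.load("scout-data/results/dataset/centroids_um.npy")
print(ours.shape == reference.shape and np.allclose(ours, reference))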