Cytoarchitectural analysis
===========================

Given the ventricle segmentation, nuclei physical coordinates, and cell-type labels, SCOUT's cytoarchitectural analysis computes radial cell profiles to identify the distribution of cellular organizations in each organoid.

Generate ventricle normals
---------------------------

The binary ventricle segmentation must be converted into a vectorized mesh in order to obtain surface normals, which determine the direction of each radial profile. This can be done using the following command:

.. code-block:: bash

    scout cyto mesh segment_ventricles.tif voxel_size.csv mesh_ventricles.pkl -d 1 6 6 -g 2 -s 2 -v -p

This command creates a new file *mesh_ventricles.pkl*, which is a pickled Python dictionary containing the vertices, normals, and faces of the ventricle mesh. The argument *-d 1 6 6* specifies the downsampling factor used to generate *segment_ventricles.tif*. The arguments *-g 2* and *-s 2* specify how much Gaussian smoothing should be applied to the binary image before meshing and what the mesh spacing should be (the minimum is 1, which gives the finest mesh). A 3D render of the ventricle mesh will be shown.

**Note:** The plotting flag *-p* will not work if mayavi has not been properly installed.

Compute radial cell profiles
-----------------------------

Given the ventricle mesh, the nuclei point cloud can be binned into radially oriented cylinders along each normal vector. To compute these radial profiles, use the following command:

.. code-block:: bash

    scout cyto profiles mesh_ventricles.pkl centroids_um.npy nuclei_gating.npy cyto_profiles.npy -v -p

This command creates a new numpy array *cyto_profiles.npy* containing the radial profiles for each cell type along each normal.

**Note:** This command defaults to using all available CPU cores. If the calculation is still too slow, try increasing the mesh spacing *-s* to reduce the number of profiles to compute.
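Before moving on to clustering, it can be helpful to sanity-check these two outputs directly in Python. The sketch below is illustrative only: the exact dictionary keys in the pickle and the axis order of the profile array are assumptions based on the descriptions above, not guaranteed by the SCOUT API.

.. code-block:: python

    import pickle
    import numpy as np

    # Load the pickled ventricle mesh; key names ('verts', 'normals', 'faces')
    # are assumed from the description above -- print them to confirm
    with open('mesh_ventricles.pkl', 'rb') as f:
        mesh = pickle.load(f)
    print(mesh.keys())

    # Load the radial profiles; one profile per mesh normal is expected,
    # so the first axis should match the number of normals in the mesh
    profiles = np.load('cyto_profiles.npy')
    print(profiles.shape)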
Setting up clustering analyses
-------------------------------

Due to the stochastic nature of organoid culture, it is difficult to assign distinct semantic labels to each radial cell profile corresponding to canonical cytoarchitectures. Instead, SCOUT aims to quantify the cytoarchitectural composition of each organoid relative to what is observed in a group of organoids. To accomplish this, SCOUT uses hierarchical clustering of radial cell profiles to assign cytoarchitectural labels to all organoids in a particular study.

Since the results of the clustering analysis depend on which organoids are included, it is useful to aggregate organoid datasets into a separate folder to make this dependency on a group of organoids explicit. We first select which datasets to include in our analysis folder using the following command:

.. code-block:: bash

    scout multiscale select /path/to/datasets/summary.csv analysis.csv Arlotta_d34 Lancaster_d35 -v

This command reads the summary.csv file, pulls out the paths to the Arlotta_d34 and Lancaster_d35 organoid datasets, and saves the selected organoids to a new analysis.csv file in the current folder (which is assumed to be the analysis folder for the Arlotta_d34 vs. Lancaster_d35 comparison in the command above). Note that this analysis.csv file can be modified to include more datasets or exclude some datasets from the analysis.

The analysis folder can be set up with links to the underlying dataset folders using the following command:

.. code-block:: bash

    scout multiscale setup analysis.csv /path/to/datasets/ -v

This command creates a new folder for each dataset in the analysis containing a symbolic link to the original dataset folder in `/path/to/datasets`. This allows convenient access to the single-cell analysis results and, more importantly, shares the previous results across multiple analyses for consistency. Using the example in the **Preprocessing** section, running this command in an `analysis` folder next to the `datasets` folder would result in the following structure:

.. code-block:: bash

    datasets/
    |   20190419_14_35_07_AA_org1_488LP13_561LP120_642LP60/
    |   20190419_15_50_16_AA_org2_488LP13_561LP120_642LP60/
    |   20190509_16_55_31_AA-orgs5.8.19_org1_488LP15_561LP140_642LP50/
    |   20190531_14_31_36_AA_org2_488LP13_561LP140_642LP60/
    |   ... possibly more datasets
    |   summary.csv
    analysis_name/
    |   analysis.csv
    |   Group1/
    |   |   20190419_14_35_07_AA_org1_488LP13_561LP120_642LP60/
    |   |   |   dataset -> ../../datasets/20190419_14_35_07_AA_org1_488LP13_561LP120_642LP60/
    |   |   20190419_15_50_16_AA_org2_488LP13_561LP120_642LP60/
    |   |   |   dataset -> ../../datasets/20190419_15_50_16_AA_org2_488LP13_561LP120_642LP60/
    |   Group2/
    |   |   20190509_16_55_31_AA-orgs5.8.19_org1_488LP15_561LP140_642LP50/
    |   |   |   dataset -> ../../datasets/20190509_16_55_31_AA-orgs5.8.19_org1_488LP15_561LP140_642LP50/
    |   |   20190531_14_31_36_AA_org2_488LP13_561LP140_642LP60/
    |   |   |   dataset -> ../../datasets/20190531_14_31_36_AA_org2_488LP13_561LP140_642LP60/

Clustering sampled profiles
----------------------------

In order to identify different types of cytoarchitectures in an unbiased manner, the radial cell profiles can be clustered into groups using hierarchical clustering. In comparative studies, some organoids may contain completely different cytoarchitectures depending on the model. For this reason, the clustering analysis must contain representative radial profiles from each organoid. Because it is too computationally expensive to concatenate all profiles together and cluster them directly, we instead randomly sample an equal number of profiles from each organoid in the analysis and cluster based on that subset. The profiles can be sampled using the following command:

.. code-block:: bash

    scout cyto sample 5000 cyto_sample_index.npy -i cyto_profiles.npy -o cyto_profiles_sample.npy -v

This command randomly samples 5000 profiles from *cyto_profiles.npy* and stores them in *cyto_profiles_sample.npy*. Another output is *cyto_sample_index.npy*, which contains the index into *cyto_profiles.npy* for each of the sampled profiles. This command should be run for each organoid, and then the sampled profiles can be combined:

.. code-block:: bash

    scout cyto combine analysis.csv -o cyto_profiles_combined.npy -s cyto_profiles_combined_samples.npy -v

This command finds the individual *cyto_profiles_sample.npy* arrays in each organoid subfolder within each group and combines them all. The profiles are concatenated in order and saved to *cyto_profiles_combined.npy*, along with a new array *cyto_profiles_combined_samples.npy*, which contains an integer label for each profile indicating which organoid the profile came from.
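Conceptually, these two steps draw a without-replacement subset of each organoid's profiles and stack the subsets with provenance labels. Below is a minimal numpy sketch of the same idea, not the SCOUT implementation itself; the example paths under `Group1/` are hypothetical, and the profile arrays are assumed to stack along their first axis.

.. code-block:: python

    import numpy as np

    # Sample 5000 profiles without replacement from one organoid
    # (mirrors `scout cyto sample`; requires at least 5000 profiles)
    rng = np.random.default_rng(0)
    profiles = np.load('cyto_profiles.npy')
    index = rng.choice(len(profiles), size=5000, replace=False)
    np.save('cyto_sample_index.npy', index)
    np.save('cyto_profiles_sample.npy', profiles[index])

    # Combine samples from several organoids (mirrors `scout cyto combine`);
    # these paths are hypothetical placeholders for the analysis subfolders
    paths = ['Group1/org1/cyto_profiles_sample.npy',
             'Group1/org2/cyto_profiles_sample.npy']
    samples = [np.load(p) for p in paths]
    combined = np.concatenate(samples, axis=0)
    organoid_labels = np.concatenate(
        [np.full(len(s), i) for i, s in enumerate(samples)])
    np.save('cyto_profiles_combined.npy', combined)
    np.save('cyto_profiles_combined_samples.npy', organoid_labels)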
If this process is run correctly, the analysis folder should resemble the structure below:

.. code-block:: bash

    analysis/
    |   analysis.csv
    |   cyto_profiles_combined.npy
    |   cyto_profiles_combined_samples.npy
    |   Group1/
    |   |   20190419_14_35_07_AA_org1_488LP13_561LP120_642LP60/
    |   |   |   dataset -> ../../datasets/20190419_14_35_07_AA_org1_488LP13_561LP120_642LP60/
    |   |   20190419_15_50_16_AA_org2_488LP13_561LP120_642LP60/
    |   |   |   dataset -> ../../datasets/20190419_15_50_16_AA_org2_488LP13_561LP120_642LP60/
    |   Group2/
    |   |   20190509_16_55_31_AA-orgs5.8.19_org1_488LP15_561LP140_642LP50/
    |   |   |   dataset -> ../../datasets/20190509_16_55_31_AA-orgs5.8.19_org1_488LP15_561LP140_642LP50/
    |   |   20190531_14_31_36_AA_org2_488LP13_561LP140_642LP60/
    |   |   |   dataset -> ../../datasets/20190531_14_31_36_AA_org2_488LP13_561LP140_642LP60/

To perform the cytoarchitecture clustering and visualization, use the Jupyter notebook "*determine cyto clusters.ipynb*"; this is where the cluster labels for the combined profiles (*cyto_labels_combined.npy*, used in the classification step below) are produced. Once the cytoarchitecture clusters have been determined, they can be named using the following command:

.. code-block:: bash

    scout cyto name name1 name2 (...) -o cyto_names.csv -v

Next, copy *cyto_names.csv* into each organoid folder:

.. code-block:: bash

    cp cyto_names.csv Group1/each_organoid_folder

Classifying cytoarchitectures
------------------------------

Once the cluster labels have been identified, all radial profiles can be classified according to those cytoarchitecture assignments. From within each individual organoid subfolder, run the following command:

.. code-block:: bash

    scout cyto classify ../../cyto_profiles_combined.npy ../../cyto_labels_combined.npy dataset/cyto_profiles.npy cyto_labels.npy -v --umap ../../model_name.umap

This command uses the combined profiles and cluster labels as a training set to classify all profiles in *cyto_profiles.npy* with a nearest-neighbor classifier. The resulting cytoarchitecture labels are saved to *cyto_labels.npy*, and the *--umap* argument specifies the pre-trained UMAP model used to embed the profiles before classification.

3D rendering with Blender
--------------------------

Using Blender 2.8, the following scripts can be used to render the ventricles, with each mesh face colored by its cytoarchitectural label, as well as the nuclei point clouds.

Export as OBJ and CSV
**********************

See the Jupyter notebook "*Export mesh and points as OBJ*". OBJ files can be imported directly into Blender. The notebook converts the nuclei physical coordinates from a numpy array into a CSV file that can be read into Blender using pure Python. The cytoarchitecture labels correspond to mesh vertices, but meshes are easier to color by face in Blender, so the notebook also uses the vertex-based labels to label each face. The resulting face labels are written to CSV so that they can also be loaded into Blender using pure Python.
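One straightforward way to convert vertex labels to face labels is a majority vote over the three vertex labels of each triangle. The sketch below illustrates that idea under stated assumptions; the notebook's actual implementation may differ, and the `'faces'` key and (n_faces, 3) layout are assumptions about the mesh pickle.

.. code-block:: python

    import csv
    import pickle
    import numpy as np
    from scipy import stats

    # Load the ventricle mesh; the 'faces' key is an assumed pickle layout
    with open('mesh_ventricles.pkl', 'rb') as f:
        mesh = pickle.load(f)
    faces = np.asarray(mesh['faces'])  # assumed (n_faces, 3) vertex indices

    # One cytoarchitecture label per vertex/profile, as described above
    vertex_labels = np.load('cyto_labels.npy')

    # Majority vote over each triangle's three vertex labels
    face_labels = stats.mode(vertex_labels[faces], axis=1).mode.ravel()

    # Write a single-row CSV in the format the Blender script below reads
    with open('face_labels.csv', 'w', newline='') as f:
        csv.writer(f).writerow(face_labels.tolist())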
Blender script
***************

In Blender, the following script creates a new material for each unique cytoarchitecture and assigns each face in the ventricle mesh to the corresponding material. Run it with the ventricle mesh selected as the active object.

.. code-block:: python

    import bpy
    import csv

    # Path to face labels
    labels_csv = 'face_labels.csv'

    def read_csv(path):
        # Face labels are stored as a single comma-separated row
        with open(path, mode='r') as f:
            line = f.readline().split('\n')[0]
            return line.split(',')

    # Load face labels
    labels = read_csv(labels_csv)
    classes = sorted(set(labels), key=int)  # sort numerically, not lexicographically
    n_classes = len(classes)
    print(f'Read {len(labels)} face labels belonging to {n_classes} classes')

    # Make a material for each class on the active object's mesh,
    # reusing any materials that already exist from a previous run
    context = bpy.context
    obj = context.object
    mesh = obj.data
    existing_material_names = [m.name for m in mesh.materials]
    class_material_index = []
    for i in range(n_classes):
        material_name = f'class {i} material'
        if material_name in existing_material_names:
            class_material_index.append(existing_material_names.index(material_name))
        else:
            class_material_index.append(len(mesh.materials))
            mesh.materials.append(bpy.data.materials.new(material_name))

    # Map each class label to its material slot index
    label_to_index = {int(c): idx for c, idx in zip(classes, class_material_index)}

    # Assign each face to the material for its label
    for f, lbl in zip(mesh.polygons, labels):
        f.material_index = label_to_index[int(lbl)]
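The nuclei point cloud exported by the notebook can be brought into Blender with a similarly small script. This is a minimal sketch, assuming a CSV with one x,y,z row per nucleus; the file name *nuclei_points.csv* is a hypothetical placeholder for whatever the export notebook produced.

.. code-block:: python

    import bpy
    import csv

    # Hypothetical file name; use the CSV written by the export notebook
    points_csv = 'nuclei_points.csv'

    # Read one x,y,z coordinate per row (assumed column layout)
    with open(points_csv, mode='r') as f:
        coords = [tuple(float(v) for v in row[:3])
                  for row in csv.reader(f) if row]

    # Build a vertex-only mesh so the point cloud can be instanced or shaded
    mesh = bpy.data.meshes.new('nuclei')
    mesh.from_pydata(coords, [], [])
    mesh.update()

    obj = bpy.data.objects.new('nuclei', mesh)
    bpy.context.collection.objects.link(obj)
    print(f'Imported {len(coords)} nuclei points')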