MERFISH - MERlin
This tutorial is for visualizing datasets from custom (home-built) MERFISH setups processed via the MERlin pipeline. Our aim was to make the BellaVista package as lightweight as possible and make it easy for users to customize the code for custom setups, analyses, and needs. Check out the napari website for further documentation and features you can implement!
MERlin pipeline outputs
Standard MERlin outputs folder organization:
%ANALYSIS_HOME%
├── ...
├── ExportBarcodes
│   └── barcodes.csv
├── ...
├── FiducialCorrelationWarp
│   └── images
│       ├── aligned_images0.tif
│       ├── ...
│       └── aligned_images406.tif
├── ...
├── RefineCellDatabases
│   └── features
│       ├── feature_data_0.hdf5
│       ├── ...
│       └── feature_data_406.hdf5
├── ...
├── codebook.csv
├── microscope_parameters.json
└── positions.csv
From these outputs, it is possible to visualize tissue images, spatial transcript locations, and cell/nuclear segmentation boundaries.
To visualize tissue images, individual fields of view (FOVs) must be stitched together. FOV images can be found in FiducialCorrelationWarp/images in the MERlin output folder. Currently, BellaVista does not include a stitching pipeline. Stitching can be done with Python packages such as NumPy and Dask, or with software such as BigStitcher. The stitched output must be a TIFF image, with a separate stitched image for each channel you wish to visualize. The stitched images should be saved in the %ANALYSIS_HOME% directory.
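As a rough illustration of the NumPy route mentioned above, the sketch below pastes equally sized FOV tiles onto one canvas using their stage positions. It is not part of BellaVista or MERlin. The function name stitch_fovs is hypothetical, and the sketch assumes that each position marks the same corner of its tile, that stage axes align with image axes (no flips or transposes), and that the micron-to-pixel scale is a single isotropic value. Real setups may need extra handling for any of these.

```python
import numpy as np


def stitch_fovs(tiles, positions_um, microns_per_pixel):
    """Paste same-sized 2D FOV tiles onto one canvas (hypothetical helper).

    tiles: list of 2D arrays, all with the same shape and dtype.
    positions_um: list of (x, y) stage positions in microns, one per tile.
        Assumed to mark the same corner of every tile.
    microns_per_pixel: scalar micron-to-pixel scale (assumed isotropic).
    """
    tiles = [np.asarray(tile) for tile in tiles]
    height, width = tiles[0].shape

    # Convert stage positions from microns to whole pixels.
    positions_px = np.rint(
        np.asarray(positions_um, dtype=float) / microns_per_pixel
    ).astype(int)

    # Shift so the smallest position maps to pixel (0, 0).
    offsets = positions_px - positions_px.min(axis=0)

    canvas = np.zeros(
        (offsets[:, 1].max() + height, offsets[:, 0].max() + width),
        dtype=tiles[0].dtype,
    )
    for tile, (x0, y0) in zip(tiles, offsets):
        # Overlapping regions are overwritten by later tiles (no blending).
        canvas[y0:y0 + height, x0:x0 + width] = tile
    return canvas
```

The resulting array would still need to be written to a TIFF file (for example with a TIFF library of your choice) and saved in the %ANALYSIS_HOME% directory, once per channel. For large datasets, a chunked approach such as Dask is likely a better fit than holding the whole canvas in memory.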
Transcript locations and cell/nuclear segmentations exported by MERlin can be processed directly by BellaVista.
To visualize your MERFISH dataset in BellaVista, you will need to create a dataset-specific JSON configuration file containing paths to the MERlin outputs for your dataset. These output files will be processed to generate visualization files for BellaVista. Creating these visualization files takes a few minutes, but they only need to be created once. For subsequent runs, create_bellavista_inputs can be set to false.
Configuration JSON file structure
{
"system": "merlin",
"data_folder": "/path/to/merlin_outs",
"create_bellavista_inputs": true,
"visualization_parameters": {
"plot_image": true,
"plot_transcripts": true,
"plot_allgenes": true,
"genes_visible_on_startup": false,
"plot_cell_seg": true
},
"input_files": {
"transcript_filename": "ExportBarcodes/barcodes.csv",
"codebook": "codebook.csv",
"images": "DAPI_z3.tif",
"microscope_parameters": "microscope_parameters.json",
"positions_list": "positions.csv",
"z_plane": 3,
"cell_segmentation": "RefineCellDatabases/features"
}
}
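If you prefer to generate the configuration file from a script rather than editing it by hand, one possible approach is to build it as a Python dictionary and serialize it with the standard json module. The helper names config_to_json and write_config are hypothetical and not part of BellaVista; the keys simply mirror the structure shown above. Note that Python's True is written out as JSON true.

```python
import json

# Mirrors the configuration structure shown above; paths are placeholders.
config = {
    "system": "merlin",
    "data_folder": "/path/to/merlin_outs",
    "create_bellavista_inputs": True,
    "visualization_parameters": {
        "plot_image": True,
        "plot_transcripts": True,
        "plot_allgenes": True,
        "genes_visible_on_startup": False,
        "plot_cell_seg": True,
    },
    "input_files": {
        "transcript_filename": "ExportBarcodes/barcodes.csv",
        "codebook": "codebook.csv",
        "images": "DAPI_z3.tif",
        "microscope_parameters": "microscope_parameters.json",
        "positions_list": "positions.csv",
        "z_plane": 3,
        "cell_segmentation": "RefineCellDatabases/features",
    },
}


def config_to_json(config):
    """Serialize the configuration dictionary to a JSON string."""
    return json.dumps(config, indent=4)


def write_config(config, path):
    """Write the configuration to a JSON file at the given path."""
    with open(path, "w") as handle:
        handle.write(config_to_json(config))


# Example usage (writes a file in the current directory):
# write_config(config, "my_dataset.json")
```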
Input file parameters (MERlin)
- transcript_filename: string
relative path to CSV file containing decoded gene transcript spatial coordinates. If None, no transcripts will be processed
- codebook: string
relative path to CSV containing map from barcode_id to gene ID. This is required to plot transcripts. If None, no transcripts will be processed
- images: string or 1D array of strings, default=None
relative path to stitched image file(s). Must be a TIFF file. If None, no images will be processed
When visualizing a single image, provide the file path as a string. For multiple images, pass them as a list of filenames. For example, use "DAPI.tif" for a single image or ["DAPI.tif", "PolyT.tif"] for multiple images
- microscope_parameters: string
relative path to JSON file containing microscope micron to pixel transform. This is required if displaying images. If None, no images will be processed
- positions_list: string
relative path to CSV file containing micron microscope positions for each FOV. This is required if displaying images. If None, no images will be processed
- z_plane: integer, default=0
z-plane of segmentations to plot. We suggest this match the z-plane of the image you are visualizing. If None, the segmentations from the first z-plane will be processed
- cell_segmentation: string, default=None
relative path to folder containing HDF5 cell segmentation boundaries. If None, no cell segmentations will be processed
- nuclear_segmentation: string, default=None
relative path to folder containing HDF5 nuclear segmentation boundaries. If None, no nuclear segmentations will be processed
Important
All input file paths must be relative paths to data_folder
Note
If you are missing some input files, remove those input file parameters from the JSON file. BellaVista will skip the preparation of visualization files for these data.
For example, if you do not have cell segmentations, the input file parameters in your JSON file might look like this:
"input_files": {
"transcript_filename": "barcodes.csv",
"codebook": "codebook.csv",
"images": "DAPI.tif",
"microscope_parameters": "microscope_parameters.json",
"positions_list": "positions.csv"
}
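The check for missing inputs can also be scripted. The sketch below drops any input_files entry whose path does not exist under data_folder and leaves non-path settings such as z_plane untouched. The function drop_missing_inputs is a hypothetical helper written for this example, not a BellaVista function.

```python
import os


def drop_missing_inputs(data_folder, input_files):
    """Return a copy of input_files without entries whose paths are missing.

    Paths are resolved relative to data_folder. Values that are neither a
    string nor a list (for example z_plane) are kept unchanged.
    """
    kept = {}
    for key, value in input_files.items():
        if not isinstance(value, (str, list)):
            kept[key] = value  # non-path setting such as z_plane
            continue
        # "images" may be a single filename or a list of filenames.
        paths = [value] if isinstance(value, str) else value
        if all(os.path.exists(os.path.join(data_folder, p)) for p in paths):
            kept[key] = value
    return kept
```

For example, calling drop_missing_inputs on a folder that has no RefineCellDatabases/features directory would return the input_files dictionary without the cell_segmentation entry.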
General parameters
- system: string
Value: "merlin"
The input is not case-sensitive, so values "merlin", "MERlin", and "MERLIN" are treated equivalently
- data_folder: string
The path to the folder where the dataset output files are stored. BellaVista visualization files will be saved in a new folder named BellaVista_output within the data_folder.
- create_bellavista_inputs: boolean, default=true
Specifies whether to generate the necessary visualization files for BellaVista. It should be set to true when loading the data for the first time. It can be set to false in later runs, as the files will already have been created.
If set to true and the visualization files already exist from a previous run, BellaVista will skip recreating those files and only generate any missing ones.
Visualization parameters
- plot_image: boolean, default=false
Display image(s)
- plot_transcripts: boolean, default=false
Plot spatial coordinates of gene transcripts
- plot_allgenes: boolean, default=true
Plot transcripts for all gene IDs. If set to false, only the gene IDs specified in selected_genes will be plotted
- genes_visible_on_startup: boolean, default=false
Controls the visibility of all gene layers at startup. If set to false, the gene layers will be hidden
Setting this option to false improves navigation performance. Gene layers can be shown later using the toggle visibility feature.
- selected_genes: 1D array of strings, default=None
Specifies the gene IDs whose transcripts will be plotted. If None, transcripts for all genes will be plotted
- plot_cell_seg: boolean, default=false
Plot cell segmentation
- plot_nuclear_seg: boolean, default=false
Plot nuclear segmentation
- transcript_point_size: float, default=1
Size of the points representing individual transcript coordinates
- contrast_limits: array of two integers, default=None
Lower and upper contrast limits for the displayed image(s), each within the range [0, 65535]
- rotate_angle: integer, default=0
Rotation angle in degrees, within the range [0, 360], by which to rotate the data
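Before launching, it can help to sanity-check the numeric parameters against the ranges listed above. The following is a minimal sketch of such a check. The function check_visualization_parameters is hypothetical and is not how BellaVista itself validates input; it only tests the documented ranges for contrast_limits and rotate_angle.

```python
def check_visualization_parameters(params):
    """Return a list of problems found in a visualization_parameters dict."""
    problems = []

    # contrast_limits: two integers, each within [0, 65535].
    limits = params.get("contrast_limits")
    if limits is not None:
        in_range = len(limits) == 2 and all(
            isinstance(v, int) and 0 <= v <= 65535 for v in limits
        )
        if not in_range:
            problems.append("contrast_limits must be two integers in [0, 65535]")

    # rotate_angle: degrees within [0, 360]; defaults to 0 when absent.
    angle = params.get("rotate_angle", 0)
    if not 0 <= angle <= 360:
        problems.append("rotate_angle must be within [0, 360]")

    return problems
```

An empty list means none of the checked values fall outside the documented ranges.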
Loading BellaVista
Once your JSON is correctly configured for your dataset, you can run BellaVista in the terminal:
Replace my_dataset.json with the filename of the JSON you created. The JSON file argument should contain the file path to your JSON file.
$ bellavista my_dataset.json
Note
It will take a few minutes to create the required data files. The terminal will print updates and show progress bars for time-consuming steps.
Once loaded, you should see a napari window displaying your data. Now, you can interactively move around the napari canvas to explore the data. Try zooming in & out, toggling layers on & off to see different spatial patterns!
Tip
To visualize a single layer, and hide all other layers, Option/Alt-click on the visibility button (the eye, to the left of the layer name). Check out Helpful napari tips in the FAQ for more tips!
Refer to the tutorial below for a step-by-step guide on running BellaVista with a sample dataset and JSON.
If you encounter any issues, please check the FAQ. If you're experiencing issues not addressed in the FAQ, please check the open issues or open a new issue in our GitHub repository. You can also leave any feedback here!
Getting Started (with sample data)
Download sample data
For this example, we processed the first FOV from a MERFISH sample dataset from the Brain Image Library [@Zhuang2020] via MERlin.
Download the MERlin-processed MERFISH mouse primary motor cortex dataset (mouse 2, sample 4, FOV 0) from Dropbox.
Load BellaVista
In the folder downloaded from Dropbox, open merlin_sample.json.
Replace the path in data_folder with the path to the downloaded folder.
Python cannot parse JSON files containing file paths with single backslashes (\). To avoid errors, use either forward slashes (/) or double backslashes (\\) when representing file paths in the JSON strings.
merlin_sample.json
{
"system": "merlin",
"data_folder": "/path/to/merlin_outs",
"create_bellavista_inputs": true,
"visualization_parameters": {
"plot_image": true,
"plot_transcripts": true,
"plot_allgenes": true,
"genes_visible_on_startup": true,
"plot_cell_seg": true,
"transcript_point_size": 0.75
},
"input_files": {
"transcript_filename": "ExportBarcodes/barcodes.csv",
"codebook": "codebook.csv",
"images": "polyT_z3.tif",
"microscope_parameters": "microscope_parameters.json",
"positions_list": "positions.csv",
"z_plane": 3,
"cell_segmentation": "RefineCellDatabases/features"
}
}
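The backslash issue with data_folder can be reproduced with Python's standard json module. The short sketch below uses a made-up Windows path purely for illustration and shows that single backslashes fail to parse, while double backslashes and forward slashes both work. It also shows one way to convert a Windows path to forward slashes with pathlib.

```python
import json
from pathlib import PureWindowsPath

# Raw strings keep the backslashes exactly as they would appear in the file.
single = r'{"data_folder": "C:\Users\me\merlin_outs"}'
double = r'{"data_folder": "C:\\Users\\me\\merlin_outs"}'
forward = '{"data_folder": "C:/Users/me/merlin_outs"}'

try:
    json.loads(single)
    parsed_single = True
except json.JSONDecodeError:
    # "\U" and "\m" are not valid JSON escape sequences.
    parsed_single = False

# Both of these parse without error.
from_double = json.loads(double)["data_folder"]
from_forward = json.loads(forward)["data_folder"]

# Convert a Windows-style path to forward slashes for use in the JSON file.
as_forward = PureWindowsPath(r"C:\Users\me\merlin_outs").as_posix()
```

Be aware that some single-backslash sequences, such as \t or \n, are valid JSON escapes, so a path containing them may parse without error but yield a tab or newline instead of the intended folder name. Using forward slashes avoids both problems.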
In the terminal, run BellaVista with the MERlin sample JSON:
The JSON file argument should contain the file path to the JSON file.
$ bellavista merlin_sample.json
Note
It will take a few minutes to create the required data files. The terminal will print updates and show progress bars for time-consuming steps.
Using this JSON file, the displayed output should look similar to this:
Note
Gene colors are assigned randomly every time BellaVista is launched. So, the gene colors displayed in your window will be different from the image above. See useful-napari-commands in the FAQ for commands to configure gene colors and other customizable visualization options.
To reproduce the same colors every time you launch BellaVista, see Creating your own figures! in the Figure Guide.
Try zooming in & out, toggling gene and cell segmentation layers on & off:
Tip
To visualize a single layer, and hide all other layers, Option/Alt-click on the visibility button (the eye, to the left of the layer name). Check out Helpful napari tips in the FAQ for more tips!
If you encounter any issues, please check the FAQ. If you're experiencing issues not addressed in the FAQ, please check the open issues or open a new issue in our GitHub repository. You can also leave any feedback here!
References
[@Zhuang2020]: Zhuang, Xiaowei, Zhang, Meng. (2020). A molecularly defined and spatially resolved cell atlas of the mouse primary motor cortex. [ Collection / Dataset ]. Brain Image Library. https://doi.org/10.35077/g.21