Export data to AWS S3

1. Import dependencies

We need to import the classes and functions that we will be using from the corresponding packages.

[ ]:

import os
import boto3
import json
from urllib.parse import quote_plus
from os.path import join, isfile, isdir
from urllib.request import urlretrieve
from anndata import read_h5ad
import scanpy as sc

from vitessce import (
    VitessceWidget,
    VitessceConfig,
    Component as cm,
    CoordinationType as ct,
    AnnDataWrapper,
)
from vitessce.data_utils import (
    optimize_adata,
    VAR_CHUNK_SIZE,
)

2. Download and process data

For this example, we need to download a dataset from the COVID-19 Cell Atlas https://www.covid19cellatlas.org/index.healthy.html#habib17.

[ ]:

adata_filepath = join("data", "habib17.processed.h5ad")
if not isfile(adata_filepath):
    os.makedirs("data", exist_ok=True)
    urlretrieve('https://covid19.cog.sanger.ac.uk/habib17.processed.h5ad', adata_filepath)

adata = read_h5ad(adata_filepath)
top_dispersion = adata.var["dispersions_norm"][
    sorted(
        range(len(adata.var["dispersions_norm"])),
        key=lambda k: adata.var["dispersions_norm"][k],
    )[-51:][0]
]
adata.var["top_highly_variable"] = (
    adata.var["dispersions_norm"] > top_dispersion
)

[ ]:

zarr_filepath = join("data", "habib17.processed.zarr")
if not isdir(zarr_filepath):
    adata = optimize_adata(
        adata,
        obs_cols=["CellType"],
        obsm_keys=["X_umap"],
        var_cols=["top_highly_variable"],
        optimize_X=True,
    )
    adata.write_zarr(zarr_filepath, chunks=[adata.shape[0], VAR_CHUNK_SIZE])

3. Create the Vitessce configuration

Set up the configuration by adding the views and datasets of interest.

[ ]:

vc = VitessceConfig(schema_version="1.0.15", name='Habib et al', description='COVID-19 Healthy Donor Brain')
dataset = vc.add_dataset(name='Brain').add_object(AnnDataWrapper(
        adata_path=zarr_filepath,
        obs_embedding_paths=["obsm/X_umap"],
        obs_embedding_names=["UMAP"],
        obs_set_paths=["obs/CellType"],
        obs_set_names=["Cell Type"],
        obs_feature_matrix_path="X",
        feature_filter_path="var/top_highly_variable"
))
scatterplot = vc.add_view(cm.SCATTERPLOT, dataset=dataset, mapping="UMAP")
cell_sets = vc.add_view(cm.OBS_SETS, dataset=dataset)
genes = vc.add_view(cm.FEATURE_LIST, dataset=dataset)
heatmap = vc.add_view(cm.HEATMAP, dataset=dataset)
vc.layout((scatterplot | (cell_sets / genes)) / heatmap);

4. Create a `boto3` resource with S3 credentials

[ ]:

s3 = boto3.resource(
    service_name='s3',
    aws_access_key_id=os.environ['VITESSCE_S3_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['VITESSCE_S3_SECRET_ACCESS_KEY'],
)

5. Upload files to S3

The .export(to='S3') method on the view config instance will upload all data objects to the specified bucket. Then, the processed view config will be returned as a dict, with the file URLs filled in, pointing to the S3 bucket files. For more information about configuring the S3 bucket so that files are accessible over the internet, visit the “Hosting Data” page of our core documentation site.

[ ]:

config_dict = vc.export(to='S3', s3=s3, bucket_name='vitessce-export-examples', prefix='test')

6. View on vitessce.io

The returned view config dict can be converted to a URL, and can be used to share the interactive visualizations with colleagues.

[ ]:

vitessce_url = "http://vitessce.io/?url=data:," + quote_plus(json.dumps(config_dict))
import webbrowser
webbrowser.open(vitessce_url)