Using Amazon Web Services S3

Data for our projects is stored on the Amazon Web Services (AWS) Simple Storage Service (S3).

Accessing data from HCP

The data from the Human Connectome Project (HCP) is provided as part of the AWS Open Data program. The HCP dataset entry in the program is provided here.

To access the processed Human Connectome Project data, use the instructions provided here.

To add your HCP credentials to your configuration, you will need to use the AWS command-line interface (CLI):

aws configure --profile hcp
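
If you prefer to work from Python, a minimal sketch along the following lines can confirm that the profile is set up correctly. It assumes boto3 is installed and that the HCP data lives in a bucket named hcp-openaccess under an HCP_1200 prefix; check the Open Data entry for the current bucket name and layout:

    import boto3

    # Use the credentials stored under the "hcp" profile by `aws configure --profile hcp`
    session = boto3.Session(profile_name="hcp")
    s3 = session.client("s3")

    # List a few objects to confirm that the credentials work
    # (bucket name and prefix are assumptions; adjust as needed)
    response = s3.list_objects_v2(
        Bucket="hcp-openaccess",
        Prefix="HCP_1200/",
        MaxKeys=5,
    )
    for obj in response.get("Contents", []):
        print(obj["Key"])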

We also have code in pyAFQ that automatically fetches and reads HCP data from S3.
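
For example, a hedged sketch of what that looks like; the import path and signature of fetch_hcp differ between pyAFQ releases, so check the documentation for your installed version:

    # In recent pyAFQ versions the fetcher lives in AFQ.data.fetch;
    # in older versions it can be imported from AFQ.data.
    from AFQ.data.fetch import fetch_hcp

    # Download the diffusion data for one example HCP subject into a local
    # BIDS-like layout, using the "hcp" AWS profile configured above.
    # The exact return value depends on the pyAFQ version.
    result = fetch_hcp([100206], profile_name="hcp")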

Uploading data to S3

Before uploading data to our S3 storage, please organize it in a 'BIDSish' format on your local hard drive. The layout should look something like this:

|    <study>
|    ├── derivatives
|    │   ├── <pipeline>
|    │   │   ├── sub-01
|    │   │   │   ├── ses-01
|    │   │   │   │   ├── anat
|    │   │   │   │   │   ├── sub-01_ses-01_aparc+aseg.nii.gz
|    │   │   │   │   │   └── sub-01_ses-01_T1w.nii.gz
|    │   │   │   │   └── dwi
|    │   │   │   │       ├── sub-01_ses-01_dwi.bvals
|    │   │   │   │       ├── sub-01_ses-01_dwi.bvecs
|    │   │   │   │       └── sub-01_ses-01_dwi.nii.gz
|    │   │   │   └── ses-02
|    │   │   │       ├── anat
|    │   │   │       │   ├── sub-01_ses-02_aparc+aseg.nii.gz
|    │   │   │       │   └── sub-01_ses-02_T1w.nii.gz
|    │   │   │       └── dwi
|    │   │   │           ├── sub-01_ses-02_dwi.bvals
|    │   │   │           ├── sub-01_ses-02_dwi.bvecs
|    │   │   │           └── sub-01_ses-02_dwi.nii.gz
|    │   │   └── sub-02
|    │   │       ├── ses-01
|    │   │       │   ├── anat
|    │   │       │   │   ├── sub-02_ses-01_aparc+aseg.nii.gz
|    │   │       │   │   └── sub-02_ses-01_T1w.nii.gz
|    │   │       │   └── dwi
|    │   │       │       ├── sub-02_ses-01_dwi.bvals
|    │   │       │       ├── sub-02_ses-01_dwi.bvecs
|    │   │       │       └── sub-02_ses-01_dwi.nii.gz
|    │   │       └── ses-02
|    │   │           ├── anat
|    │   │           │   ├── sub-02_ses-02_aparc+aseg.nii.gz
|    │   │           │   └── sub-02_ses-02_T1w.nii.gz
|    │   │           └── dwi
|    │   │               ├── sub-02_ses-02_dwi.bvals
|    │   │               ├── sub-02_ses-02_dwi.bvecs
|    │   │               └── sub-02_ses-02_dwi.nii.gz

Here, <study> is the name of the study, which will also be the name we use for the bucket on S3. Replace <pipeline> with the name of the preprocessing pipeline that you used to process the data. For example, you might use vista if you processed the data with the Vistalab tools, or dmriprep if you used dmriprep.
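
Before creating the bucket and uploading, it can help to preview the S3 keys that this layout will produce. The short Python sketch below is just a local sanity check, not part of any official tooling; the local path is a placeholder:

    from pathlib import Path

    study_dir = Path("path/to/study")  # placeholder: your local study directory

    # Each file's key on S3 will be its path relative to the study directory
    for f in sorted(study_dir.rglob("*")):
        if f.is_file():
            print(f.relative_to(study_dir).as_posix())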

Here, we will use the AWS command-line interface to upload the data (see the AWS documentation for how to get started with it).

The command reference for the S3 sub-commands is here. Specifically, to create a bucket on S3, use the mb sub-command:

aws s3 mb s3://study

This command only needs to be executed once. Then, upload the data as a sync operation:

aws s3 sync path/to/study s3://study

The use of sync (rather than aws s3 cp) means that only new or changed files are uploaded on repeated calls. You can add the --dryrun flag to preview what would be transferred without uploading anything.
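
For reference, here is a rough boto3 sketch of the same idea, uploading only files whose keys are not already in the bucket. In practice, aws s3 sync is simpler and also compares sizes and timestamps; the bucket name and local path below are placeholders:

    from pathlib import Path
    import boto3

    bucket = "study"                   # placeholder: your bucket name
    study_dir = Path("path/to/study")  # placeholder: your local study directory
    s3 = boto3.client("s3")

    # Collect the keys that are already in the bucket
    existing = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            existing.add(obj["Key"])

    # Upload only the files that are not already there
    for f in sorted(study_dir.rglob("*")):
        if f.is_file():
            key = f.relative_to(study_dir).as_posix()
            if key not in existing:
                s3.upload_file(str(f), bucket, key)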