Example Usage
This notebook demonstrates simple usage of Cloud-BIDS-Layout.
[2]:
import cloud_bids_layout as cbl
Use Cloud-BIDS-Layout to inspect a large-ish dataset
The UCLA Consortium for Neuropsychiatric Phenomics LA5c Study has 272 subjects and takes up about 80 GB of disk space. Downloading all of it to our local machine just so that pybids can index it would take a long time. Let’s try Cloud-BIDS-Layout instead.
[5]:
from datetime import datetime
print(datetime.now())
2020-07-28 13:20:08.337858
[6]:
layout = cbl.CloudBIDSLayout(
    remote_location="s3://openneuro.org/ds000030",
    download_dir="./ds000030"
)
/Users/richford/miniconda3/envs/s3bidsscratch/lib/python3.7/site-packages/bids/layout/models.py:102: FutureWarning: The 'extension' entity currently excludes the leading dot ('.'). As of version 0.14.0, it will include the leading dot. To suppress this warning and include the leading dot, use `bids.config.set_option('extension_initial_dot', True)`.
FutureWarning)
[7]:
print(datetime.now())
2020-07-28 13:41:38.366827
This took a little over 20 minutes on my laptop. I estimate it would have taken about 2.5 hours to download the data to my laptop, plus something like 15 minutes for pybids to index it.
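To double-check the timing, we can subtract the two timestamps printed above. The second half of this sketch works out the average download speed that the 2.5-hour estimate implies; it assumes decimal units (1 GB = 10**9 bytes), and the result describes that estimate, not a measured bandwidth.

```python
from datetime import datetime

# Timestamps copied from the two print(datetime.now()) outputs above
fmt = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime("2020-07-28 13:20:08.337858", fmt)
end = datetime.strptime("2020-07-28 13:41:38.366827", fmt)
elapsed = end - start
print(elapsed)  # 0:21:30.028969

# Average throughput implied by downloading ~80 GB in ~2.5 hours,
# assuming decimal units (1 GB = 10**9 bytes, 1 Mbit = 10**6 bits)
dataset_bytes = 80 * 10**9
download_seconds = 2.5 * 3600
implied_mbit_per_s = dataset_bytes * 8 / download_seconds / 10**6
print(f"{implied_mbit_per_s:.1f} Mbit/s")  # 71.1 Mbit/s
```

In other words, the full download would only beat the ~21-minute indexing run on a connection considerably faster than about 70 Mbit/s.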
You may notice that this created the ds000030 directory. Feel free to explore it: you’ll see that it downloaded all of the JSON files and created empty placeholder files for all of the other file types. This is enough for pybids to index the dataset. Later, you’ll see a method for downloading the actual data.
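The "download the JSON, stub out everything else" idea can be sketched with the standard library alone. This is only an illustration of the technique, not Cloud-BIDS-Layout's actual code: `mirror_skeleton`, the in-memory `store` dict (standing in for the S3 bucket), and the `fetch` callable are all hypothetical.

```python
import json
import os
import tempfile


def mirror_skeleton(keys, fetch, root):
    """Recreate a remote file listing under `root` (hypothetical helper).

    JSON sidecars are fetched in full so their metadata is available locally;
    every other file becomes a zero-byte placeholder with the right name.
    Returns the keys that were actually fetched.
    """
    fetched = []
    for key in keys:
        local_path = os.path.join(root, *key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with open(local_path, "wb") as f:
            if key.endswith(".json"):
                f.write(fetch(key))
                fetched.append(key)
            # Non-JSON files are left empty: only the file name is mirrored.
    return fetched


# An in-memory dict stands in for the remote bucket (illustrative contents only).
store = {
    "dataset_description.json": json.dumps({"Name": "demo", "BIDSVersion": "1.0.2"}).encode(),
    "sub-01/dwi/sub-01_dwi.json": json.dumps({"RepetitionTime": 9.0}).encode(),
    "sub-01/dwi/sub-01_dwi.nii.gz": b"large image data that is never fetched",
}

with tempfile.TemporaryDirectory() as root:
    fetched = mirror_skeleton(store, store.__getitem__, root)
    sizes = {key: os.path.getsize(os.path.join(root, *key.split("/"))) for key in store}

print(fetched)  # ['dataset_description.json', 'sub-01/dwi/sub-01_dwi.json']
print(sizes)
```

Only the small metadata files cross the network, which is why this step is so much cheaper than a full download.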
Now we have a pybids BIDSLayout instance, and we can use the familiar get() method to query our dataset. For example:
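Conceptually, a query like get(target="subject", return_type="id", datatype="dwi") parses entities out of each file path, keeps the files whose datatype matches, and returns the unique subject labels. The sketch below mimics that with a deliberately simplified regular expression; it is not how pybids parses filenames (pybids uses its own entity configuration), and `subjects_with_datatype` and the example paths are made up for illustration.

```python
import re

# Simplified BIDS layout: sub-<label>/[ses-<label>/]<datatype>/<filename>
BIDS_PATH = re.compile(
    r"sub-(?P<subject>[a-zA-Z0-9]+)/(?:ses-[a-zA-Z0-9]+/)?(?P<datatype>[a-z]+)/"
)


def subjects_with_datatype(paths, datatype):
    """Return sorted, unique subject labels that have files of `datatype`."""
    subjects = set()
    for path in paths:
        match = BIDS_PATH.search(path)
        if match and match.group("datatype") == datatype:
            subjects.add(match.group("subject"))
    return sorted(subjects)


paths = [
    "sub-01/dwi/sub-01_dwi.nii.gz",
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-02/ses-1/dwi/sub-02_ses-1_dwi.bval",
    "sub-03/func/sub-03_task-rest_bold.nii.gz",
]
print(subjects_with_datatype(paths, "dwi"))  # ['01', '02']
```

Because the placeholder files carry the real file names, this kind of name-based filtering works even though their contents are empty.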
[36]:
# Get the subject IDs for all subjects with DWI data
layout.get(target="subject", return_type="id", datatype="dwi")[:10]
[36]:
['10159',
'10171',
'10189',
'10193',
'10206',
'10217',
'10225',
'10227',
'10228',
'10235']
Now suppose we decide we want to download all files for subject sub-70086.
[34]:
files_we_want = layout.get(subject="70086", return_type="files")
print([fname.split("ds000030/")[-1] for fname in files_we_want])
['sub-70086/anat/sub-70086_T1w.json', 'sub-70086/anat/sub-70086_T1w.nii.gz', 'sub-70086/beh/sub-70086_task-stopsignaltraining_beh.json', 'sub-70086/beh/sub-70086_task-stopsignaltraining_events.tsv', 'sub-70086/dwi/sub-70086_dwi.bval', 'sub-70086/dwi/sub-70086_dwi.bvec', 'sub-70086/dwi/sub-70086_dwi.json', 'sub-70086/dwi/sub-70086_dwi.nii.gz', 'sub-70086/func/sub-70086_task-bart_bold.json', 'sub-70086/func/sub-70086_task-bart_bold.nii.gz', 'sub-70086/func/sub-70086_task-bart_events.tsv', 'sub-70086/func/sub-70086_task-bht_bold.json', 'sub-70086/func/sub-70086_task-bht_bold.nii.gz', 'sub-70086/func/sub-70086_task-bht_events.tsv', 'sub-70086/func/sub-70086_task-bht_physio.json', 'sub-70086/func/sub-70086_task-bht_physio.tsv.gz', 'sub-70086/func/sub-70086_task-pamenc_bold.json', 'sub-70086/func/sub-70086_task-pamenc_bold.nii.gz', 'sub-70086/func/sub-70086_task-pamenc_events.tsv', 'sub-70086/func/sub-70086_task-pamret_bold.json', 'sub-70086/func/sub-70086_task-pamret_bold.nii.gz', 'sub-70086/func/sub-70086_task-pamret_events.tsv', 'sub-70086/func/sub-70086_task-rest_bold.json', 'sub-70086/func/sub-70086_task-rest_bold.nii.gz', 'sub-70086/func/sub-70086_task-rest_physio.json', 'sub-70086/func/sub-70086_task-rest_physio.tsv.gz', 'sub-70086/func/sub-70086_task-scap_bold.json', 'sub-70086/func/sub-70086_task-scap_bold.nii.gz', 'sub-70086/func/sub-70086_task-scap_events.tsv', 'sub-70086/func/sub-70086_task-stopsignal_bold.json', 'sub-70086/func/sub-70086_task-stopsignal_bold.nii.gz', 'sub-70086/func/sub-70086_task-stopsignal_events.tsv', 'sub-70086/func/sub-70086_task-taskswitch_bold.json', 'sub-70086/func/sub-70086_task-taskswitch_bold.nii.gz', 'sub-70086/func/sub-70086_task-taskswitch_events.tsv']
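The split("ds000030/") idiom above works here, but it relies on a substring match. A sketch of a more robust alternative computes each path relative to the dataset root with os.path.relpath; the helper name `relative_paths` and the example paths are hypothetical.

```python
import os


def relative_paths(paths, root):
    """Express each path relative to the dataset root directory."""
    root = os.path.abspath(root)
    return [os.path.relpath(os.path.abspath(p), root) for p in paths]


root = os.path.abspath("ds000030")
files = [
    os.path.join(root, "sub-70086", "anat", "sub-70086_T1w.json"),
    os.path.join(root, "sub-70086", "dwi", "sub-70086_dwi.bval"),
]
print(relative_paths(files, root))
```

With the layout in this notebook, the equivalent call would presumably be relative_paths(files_we_want, layout.root), assuming that layout.root (the dataset root attribute of a pybids BIDSLayout) points at the local download directory.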
The CloudBIDSLayout.download_files() method takes the same arguments as BIDSLayout.get(), but it also downloads the matching files to your local drive, so that they actually exist on your system instead of as the empty placeholders we created earlier.
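Since undownloaded files are zero-byte placeholders, a quick way to see what is still missing locally is to look for empty files. This is a standard-library sketch, not part of Cloud-BIDS-Layout; `find_placeholders` is hypothetical, and it assumes genuine data files are never empty (a legitimately empty file would be reported as a placeholder too).

```python
import os
import tempfile


def find_placeholders(root):
    """Return sorted paths (relative to `root`) of zero-byte files.

    Assumes real data files are never empty, so size 0 means "not downloaded yet".
    """
    placeholders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                placeholders.append(os.path.relpath(path, root))
    return sorted(placeholders)


# Build a tiny mock dataset: one downloaded sidecar, one empty placeholder image.
with tempfile.TemporaryDirectory() as root:
    anat = os.path.join(root, "sub-01", "anat")
    os.makedirs(anat)
    with open(os.path.join(anat, "sub-01_T1w.json"), "w") as f:
        f.write('{"EchoTime": 0.003}')
    open(os.path.join(anat, "sub-01_T1w.nii.gz"), "wb").close()
    placeholders = find_placeholders(root)

print(placeholders)
```

Run against ./ds000030 after download_files(subject="70086"), the list should no longer include any sub-70086 files, while other subjects' images would still show up as placeholders.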
[37]:
files_we_got = layout.download_files(subject="70086")
print([bidsfile.path.split("ds000030/")[-1] for bidsfile in files_we_got])
['sub-70086/anat/sub-70086_T1w.json', 'sub-70086/anat/sub-70086_T1w.nii.gz', 'sub-70086/beh/sub-70086_task-stopsignaltraining_beh.json', 'sub-70086/beh/sub-70086_task-stopsignaltraining_events.tsv', 'sub-70086/dwi/sub-70086_dwi.bval', 'sub-70086/dwi/sub-70086_dwi.bvec', 'sub-70086/dwi/sub-70086_dwi.json', 'sub-70086/dwi/sub-70086_dwi.nii.gz', 'sub-70086/func/sub-70086_task-bart_bold.json', 'sub-70086/func/sub-70086_task-bart_bold.nii.gz', 'sub-70086/func/sub-70086_task-bart_events.tsv', 'sub-70086/func/sub-70086_task-bht_bold.json', 'sub-70086/func/sub-70086_task-bht_bold.nii.gz', 'sub-70086/func/sub-70086_task-bht_events.tsv', 'sub-70086/func/sub-70086_task-bht_physio.json', 'sub-70086/func/sub-70086_task-bht_physio.tsv.gz', 'sub-70086/func/sub-70086_task-pamenc_bold.json', 'sub-70086/func/sub-70086_task-pamenc_bold.nii.gz', 'sub-70086/func/sub-70086_task-pamenc_events.tsv', 'sub-70086/func/sub-70086_task-pamret_bold.json', 'sub-70086/func/sub-70086_task-pamret_bold.nii.gz', 'sub-70086/func/sub-70086_task-pamret_events.tsv', 'sub-70086/func/sub-70086_task-rest_bold.json', 'sub-70086/func/sub-70086_task-rest_bold.nii.gz', 'sub-70086/func/sub-70086_task-rest_physio.json', 'sub-70086/func/sub-70086_task-rest_physio.tsv.gz', 'sub-70086/func/sub-70086_task-scap_bold.json', 'sub-70086/func/sub-70086_task-scap_bold.nii.gz', 'sub-70086/func/sub-70086_task-scap_events.tsv', 'sub-70086/func/sub-70086_task-stopsignal_bold.json', 'sub-70086/func/sub-70086_task-stopsignal_bold.nii.gz', 'sub-70086/func/sub-70086_task-stopsignal_events.tsv', 'sub-70086/func/sub-70086_task-taskswitch_bold.json', 'sub-70086/func/sub-70086_task-taskswitch_bold.nii.gz', 'sub-70086/func/sub-70086_task-taskswitch_events.tsv']
That’s it. Once the files are downloaded, we have a local subset of the BIDS data, and we can use the standard pybids BIDSLayout to index it and do some good science.