-
Story
-
Resolution: Done
-
Critical
Identify Data Blobs (Tracy, Jeff & Michael): with a member of the team, locate wiki page data that fits the following criteria:
- Wiki page data content that is not template specific
- DICOM in NBIA (accessible via NBIA API): includes counts, modalities, Collection name, versions, licenses
- image files in pathDB
- image files in Box
- image files in Aspera Faspex
- “Pretty picture”: not all collections have one, but there’s a placeholder for a display picture, often the graphical logo of the program (like CPTAC or APOLLO) or a figure from the accompanying journal article
- Separate templates for Program, Community, Analyses, Path only?
- Links to PI shared code as tarball, notebook, or link
- Acknowledgements that submitters want us to share
- Abstract and excerpt describing the project in plaintext (“Summary section”)
- Associated citations
- “if you use this data in your work see and cite these papers and these grants”
- “Here are papers by others who used this data”
- Versions and version notes
- Curation versions and version notes on the not-for-public-use “frontend”
- who, when, what, where it went
- Oddball wiki page content
- “Excerpt” wiki sections sometimes do and sometimes do not exactly match the NBIA “summary/abstract”, and they should. Note: IDC prepared blurbs like this for the Analysis Collections that existed when they imported, and we can’t currently propagate them
- Spreadsheets on wiki pages
- participant splits by PI
- clinical/demographic/proteomic/genomic (helpful non-DICOM details)
- crosswalks between DICOM and non-DICOM imaging (e.g. Duke-Breast-Cancer-MRI or UCSF-PDGM-to-BraTS)
- Documents and PDFs on wiki pages
- Experiment parameters
- cohort description
- image acquisition longitudinal protocol
- Batched download splits due to either tech limitations or PI request
- test/train/validate
- subset and superset lists (e.g. one analysis crosses several Collections)
- other task splits
- too big so split by patientID
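
One way to picture the inventory above is as a single per-collection record. The following is a minimal sketch, assuming Python dataclasses. Every class, field, and method name in it is hypothetical and is not taken from NBIA, pathDB, or the wiki. It also sketches a check for the excerpt-versus-NBIA-summary mismatch noted in the list, under the assumption that differences in whitespace alone should not count as a mismatch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    text: str
    # Hypothetical kinds: "required" (papers/grants users are asked to cite)
    # or "third_party" (papers by others who used the data).
    kind: str = "required"


@dataclass
class VersionNote:
    version: str
    note: str
    # False for curation notes that stay on the not-for-public-use side.
    public: bool = True


@dataclass
class CollectionRecord:
    """Hypothetical container for the non-template-specific data of one collection."""
    name: str
    summary: str = ""                      # plaintext abstract/excerpt from the wiki
    nbia_summary: Optional[str] = None     # summary/abstract as held in NBIA, if any
    image_locations: List[str] = field(default_factory=list)  # e.g. "NBIA", "pathDB", "Box"
    display_picture: Optional[str] = None  # placeholder; not all collections have one
    code_links: List[str] = field(default_factory=list)       # tarball, notebook, or link
    acknowledgements: str = ""
    citations: List[Citation] = field(default_factory=list)
    versions: List[VersionNote] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)      # spreadsheets, documents, PDFs
    download_splits: List[str] = field(default_factory=list)  # e.g. "train", "test", "validate"

    def summary_matches_nbia(self) -> bool:
        """Return True when the wiki excerpt and the NBIA summary agree,
        ignoring differences in whitespace only (an assumption of this sketch)."""
        if self.nbia_summary is None:
            return False
        wiki = " ".join(self.summary.split())
        nbia = " ".join(self.nbia_summary.split())
        return wiki == nbia
```

A record whose two summaries differ only in spacing or line breaks would be reported as matching, while a collection with no NBIA summary would be flagged as a mismatch for follow-up.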