Ticket name: Infogather toward Structured Primary Diagnosis / Primary Site
Person Responsible: Tracy
Watcher: Justin K, Kirk
For reference, see the "Browse Collections/Analyses table redesign" Google doc.
Gather information on how big a task it would be to align current values with standardized elements, going from these two fields:
- Cancer Type
- Locations
to these three fields:
- Primary Diagnosis
- Primary cancer site
- Location imaged
Starting from the "Cancer Type" column of the Browse Collections table, and informed by values in the summary and clinical data on each collection page, try the following:
(1) Where the parallel is easy between a current table value and a Primary Diagnosis element in https://docs.google.com/spreadsheets/d/1KnbOO85nqbx8A3m1VLWOiUhBJY_KxkI29_JwAVgKxjs/edit?gid=0#gid=0 (which contains information about CRDC-blessed NCIT permissible values), make a note of which element it is and that the match is plausible.
(2) Where the current TCIA value is "Various", "Unknown", or otherwise too vague to assign clearly, make a note of that too.
Edge cases include:
- COVID
- "noncancer" as distinct from "healthy volunteers across this axis" as distinct from "healthy"
- special mets explorations. Give some specifics about these.
The deliverable is a report of some gestalts about the proportion of "easy" versus "tough" cases; it is not the start of swapping anything on production. It is a necessary preliminary to renaming the columns on our Browse Collections table.
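
If it helps with the tally, a first pass over the column could be sketched roughly as below. This is only a sketch under assumptions: the permissible values shown are made-up placeholders (not taken from the spreadsheet), the function names are hypothetical, and "easy" here means only an exact match after trimming and lowercasing. Anything that is neither an exact match nor obviously vague lands in a "review" bucket for a human to decide.

```python
from collections import Counter

# Made-up placeholders: replace with the permissible values copied from the
# spreadsheet. These strings are NOT real entries from that sheet.
PERMISSIBLE = {"Example Diagnosis A", "Example Diagnosis B"}

# Values the ticket calls out as too vague to assign clearly.
VAGUE = {"various", "unknown", ""}


def normalize(value: str) -> str:
    """Trim, lowercase, and collapse internal whitespace."""
    return " ".join(value.strip().lower().split())


def triage(value: str, permissible: set) -> str:
    """Label one current "Cancer Type" value as easy, tough, or review."""
    key = normalize(value)
    if key in VAGUE:
        return "tough"
    lookup = {normalize(p) for p in permissible}
    # Exact match only; near matches are left for a person to judge.
    return "easy" if key in lookup else "review"


def summarize(values: list, permissible: set) -> dict:
    """Return the proportion of values that fall in each bucket."""
    counts = Counter(triage(v, permissible) for v in values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("easy", "tough", "review")}
```

Calling summarize on the list of values copied from the "Cancer Type" column gives the rough proportions to report; the edge cases above would still need to be looked at by hand.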