README for Chapter 7
Augmenting Drug Mechanism Prediction with Text Mining
Emily K Mallory

Description: Directory contains supplemental files for Chapter 7 of dissertation.

##########################################
Directories and Files

A. auc (directory): Directory with AUC results for semantic motifs in PD and PK pathways.
B. supplemental_semantic_motif_results (directory): Directory with semantic motif files for PD and PK pathways. Includes raw results and PD/PK score comparison files.
C. Semantic Motifs for Risperidone GNBR Query (file): File with GNBR semantic motif query results for risperidone.

##########################################
File Descriptions

A. auc directory:
File names contain pathway and skeleton type. For example, PD_LINE_ProteinSmallMoleculeProtein_within_auc_lines_all.txt refers to the LINE skeleton in the PD pathway, where the entities are Protein -> SmallMolecule -> Protein. Theme1 in the AUC file is for the first edge and theme2 is for the second edge.

File format:
Column1: theme pair separated by ";". Format is Theme1;Theme2. Short codes used for theme descriptions: see the Theme descriptions section below.
Column2: Number of positive motifs
Column3: Number of negative motifs
Column4: Area under the Receiver Operating Characteristic curve (AUROC or AUC)

B. supplemental_semantic_motifs directory:
Directory contains three primary types of files: those ending in "_lines_all.txt", "_collapsed.txt", and "_comparison.txt". As with the auc directory, filenames contain the pathway type (PD or PK) and, for the comparison files, the skeleton type (e.g., LINE).

Example filenames:
edge_motifs_3_PK_within_vec_positives_lines_all.txt - detected semantic motifs
edge_motifs_3_PK_within_OUT_LINE_lines_all_comparison.txt - comparison of average semantic motifs between PD and PK for a given skeleton
edge_motifs_3_PK_within_vec_positives_lines_all_collapsed.txt - generic semantic motifs for a given pathway

File format for edge_motifs_3_(PD/PK)_within_vec_positives_lines_all.txt:
column1: skeleton type
column2: skeleton entities
column3: theme pair separated by ";". Format is Theme1;Theme2. Short codes used for theme descriptions: see the Theme descriptions section below.
column4: combined theme score
column5: pathway entity 1
column6: theme1 and score, separated by "|"
column7: pathway entity 2
column8: theme2 and score, separated by "|"
column9: pathway entity 3

File format for edge_motifs_3_(PK/PD)_within_vec_positives_lines_all_collapsed.txt:
column1: skeleton type
column2: skeleton entities
column3: theme pair separated by ";". Format is Theme1;Theme2. Short codes used for theme descriptions: see the Theme descriptions section below.
column4: count of matching semantic motifs in the "_all.txt" file
column5: average combined theme score for a combination of skeleton and theme pair

File format for edge_motifs_3_(PD/PK)_within_(skeleton)_lines_all_comparison.txt:
Filename contains pathway type and skeleton.
Columns:
skeleton: skeleton entities
theme: theme pair separated by ";". Format is Theme1;Theme2. Short codes used for theme descriptions: see the Theme descriptions section below.
score: average combined theme score for a combination of skeleton and theme pair
rank change: difference in rank between PD and PK semantic motifs
score change: difference in score between PD and PK semantic motifs
percentile change: difference in percentile between PD and PK semantic motifs

C. Semantic Motifs for Risperidone GNBR Query (file):
Semantic Motif: entity type and themes
Entity1: entity1 matching the query in GNBR.
  Entry is a combination of database identifier and common name, separated by ";"
Dependency Path 1: max dependency path score for theme1 in the query. Used for normalization and final score.
Entity2: entity2 matching the query in GNBR.
  Entry is a combination of database identifier and common name, separated by ";"
Dependency Path 2: max dependency path score for theme2 in the query. Used for normalization and final score.
Entity3: entity3 matching the query in GNBR.
  Entry is a combination of database identifier and common name, separated by ";"
Theme 1 Score: normalized score for dependency path 1 for theme1
Theme 2 Score: normalized score for dependency path 2 for theme2
Combined Score: sum of the theme1 and theme2 scores

NOTE: risperidone will show up in entity1, entity2, or entity3 as MESH:D018967;risperidone or MESH:D018967.

##########################################
Theme descriptions

Reference with description of the EBC algorithm:
Percha B, Altman RB. Learning the Structure of Biomedical Relationships from Unstructured Text. PLoS Comput Biol. 2015 Jul 28;11(7):e1004216. doi: 10.1371/journal.pcbi.1004216. eCollection 2015 Jul. PubMed PMID: 26219079; PubMed Central PMCID: PMC4517797.

Reference for the expanded GNBR themes:
Percha B, Altman RB. A global network of biomedical relationships derived from text. Bioinformatics. 2018 Aug 1;34(15):2614-2624. doi: 10.1093/bioinformatics/bty114. PubMed PMID: 29490008.

Theme Codes:
A+: agonism, activation
A-: antagonism, blocking
B: binding, ligand
E+: increases expression/production
E-: decreases expression/production
E: affects expression/production (neutral)
N: inhibits
K: metabolism, pharmacokinetics
Z: enzyme activity
O: transport, channel
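
##########################################
Example: reading a semantic motif file (illustrative sketch)

The nine-column layout described above for edge_motifs_3_(PD/PK)_within_vec_positives_lines_all.txt could be read along the following lines. This is a minimal sketch, not code from the dissertation. It assumes the files are tab-delimited and have no header row, neither of which this README states. The names MotifRow, parse_motif_row, read_motif_rows, and describe are hypothetical, and the sample row is made up purely to show the field layout.

```python
import csv
import io
from typing import NamedTuple

# Theme codes copied from the "Theme Codes" list in this README.
THEME_CODES = {
    "A+": "agonism, activation",
    "A-": "antagonism, blocking",
    "B": "binding, ligand",
    "E+": "increases expression/production",
    "E-": "decreases expression/production",
    "E": "affects expression/production (neutral)",
    "N": "inhibits",
    "K": "metabolism, pharmacokinetics",
    "Z": "enzyme activity",
    "O": "transport, channel",
}


class MotifRow(NamedTuple):
    """One row of a *_vec_positives_lines_all.txt file (hypothetical container)."""
    skeleton_type: str      # column1
    skeleton_entities: str  # column2
    themes: tuple           # column3 split on ";" -> (theme1, theme2)
    combined_score: float   # column4
    entities: tuple         # columns 5, 7, 9 -> (entity1, entity2, entity3)
    edge_scores: tuple      # columns 6, 8 -> ((theme1, score1), (theme2, score2))


def split_theme_score(field):
    """Split a 'theme|score' field (columns 6 and 8) into (theme, float score)."""
    theme, score = field.rsplit("|", 1)
    return theme, float(score)


def parse_motif_row(cols):
    """Convert a list of nine column strings into a MotifRow."""
    theme1, theme2 = cols[2].split(";")
    return MotifRow(
        skeleton_type=cols[0],
        skeleton_entities=cols[1],
        themes=(theme1, theme2),
        combined_score=float(cols[3]),
        entities=(cols[4], cols[6], cols[8]),
        edge_scores=(split_theme_score(cols[5]), split_theme_score(cols[7])),
    )


def read_motif_rows(handle, delimiter="\t"):
    """Yield MotifRow objects; the tab delimiter is an assumption.

    Rows that do not have exactly nine columns (e.g. blank lines) are skipped.
    """
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for cols in reader:
        if len(cols) == 9:
            yield parse_motif_row(cols)


def describe(theme_code):
    """Return the description for a theme code, or the code itself if unknown."""
    return THEME_CODES.get(theme_code, theme_code)


# Made-up row for illustration only; not taken from the supplemental files.
sample = (
    "LINE\tProteinSmallMoleculeProtein\tA+;B\t0.75\t"
    "ENTITY_A\tA+|0.5\tENTITY_B\tB|0.25\tENTITY_C\n"
)
rows = list(read_motif_rows(io.StringIO(sample)))
for row in rows:
    print(row.entities, [describe(t) for t in row.themes], row.combined_score)
```

To read an actual file, pass an open file handle instead of the io.StringIO object, and change the delimiter argument if the files turn out not to be tab-separated.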