Emily Mallory
emkd42@gmail.com
August 2018

README

Instructions for creating a vector space and querying drugs and reactions.

#####################################

Files in directory:

create_vector_space.py - creates a vector space given input compounds
query_vector_space.py - queries the vector space for similar difference vectors
run_vs.sh - example job submission script for running create_vector_space.py
compounds_to_add.tsv - file with new compounds for the vector space
drug_metabolite_data.tsv - curated drug metabolite SMILES strings
full_compounds_to_smiles.p - pickle file with a Python dictionary of compounds to SMILES
primary_reactions_data.tsv - reactions used for querying the vector space
uniq_smiles_final_results_maccs.p - pickle file with a list of unique SMILES in the vector space

#####################################

To compute the vector space and query a drug-metabolite pair:

1. Install Python dependencies

Libraries needed:
    rdkit
    cPickle
    numpy
    matplotlib

2. Calculate vector space

To calculate the vector space:

    python create_vector_space.py -d /path/to/output/directory -t maccs

Parameters:
    -d: output directory where all output files are stored
    -t: type of fingerprint, either maccs (MACCS) or full (unfolded fingerprint)

3. Rank reactions for a given drug-metabolite pair

    python query_vector_space.py -d /path/to/output/directory -t rank_rxn -s DM1

Parameters:
    -d: /path/to/output/directory
    -t: type of ranking (rank_drug or rank_rxn)
    -s: query input (either a drug-metabolite (DM) identifier such as DM1, or a reaction identifier)

There are two ranking options: rank_drug and rank_rxn.
rank_drug ranks drug-metabolite pairs against a query reaction.
rank_rxn ranks MetaCyc reactions against a query drug-metabolite pair.
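
#####################################

Illustration: ranking by difference-vector similarity

This README does not describe how the scripts work internally. As a rough illustration of what "querying for similar difference vectors" could look like, the sketch below builds a difference vector for a drug-metabolite pair and ranks a few reactions against it. It is an assumption-laden toy, not the code in query_vector_space.py: the 6-bit fingerprints stand in for real MACCS fingerprints, cosine similarity is an assumed metric, and every name (difference_vector, rank_by_similarity, RXN_A and so on) is hypothetical.

```python
import math


def difference_vector(substrate_fp, product_fp):
    """Return product minus substrate, element by element."""
    if len(substrate_fp) != len(product_fp):
        raise ValueError("fingerprints must have the same length")
    return [p - s for s, p in zip(substrate_fp, product_fp)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidates):
    """Sort candidate (name, vector) pairs by similarity to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 6-bit fingerprints (assumed data, standing in for MACCS bit vectors).
drug = [1, 1, 0, 0, 1, 0]
metabolite = [1, 0, 1, 0, 1, 0]
query = difference_vector(drug, metabolite)

# Hypothetical reactions, each given as (substrate, product) fingerprints.
reactions = {
    "RXN_A": difference_vector([0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0]),
    "RXN_B": difference_vector([1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1]),
    "RXN_C": difference_vector([0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0]),
}

ranking = rank_by_similarity(query, reactions)
for name, score in ranking:
    print("%s\t%.3f" % (name, score))
```

In this toy setup RXN_A makes the same bit changes as the query pair and ranks first, while RXN_C makes the opposite changes and ranks last. With real data the fingerprints would come from rdkit rather than hand-written lists.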