Since 2006, the Wishart lab at the University of Alberta has been developing and releasing freely available metabolomic databases, programs and web servers to the metabolomics community. These were developed to address persistent problems in compound identification, compound annotation, data reduction, data analysis, biological interpretation and knowledge translation. This page provides hyperlinks, synoptic descriptions and PubMed abstract links to all of the metabolomic databases and software developed by the Wishart lab and its collaborators. The management and development of these resources was initially done under the auspices of the Bioinformatics Help Desk (a Genome Canada funded core facility) and the Human Metabolome Project from 2006-2010 but it is now handled through the support of TMIC and CIHR. Please feel free to use these tools and we welcome your feedback.
The Human Metabolome Database (HMDB) is a freely available database containing detailed information about small molecule metabolites found in the human body. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education. The database is designed to contain or link three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data. HMDB contains over 41,000 metabolite entries including both water-soluble and lipid soluble metabolites as well as metabolites that would be regarded as either abundant (> 1 uM) or relatively rare (< 1 nM). Additionally, approximately 7,200 protein (and DNA) sequences are linked to these metabolite entries.
The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains over 7,800 drug entries nearly 2,200 FDA-approved small molecule drugs, 340 FDA-approved biotech (protein/peptide) drugs, 93 nutraceuticals and >5,000 experimental drugs. Additionally, more than 3,500 non-redundant protein (i.e. drug target) sequences are linked to these FDA approved drug entries. Each DrugCard entry contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data.
The Milk Composition Database (MCDB) is a freely available electronic database containing detailed information about small molecule metabolites found in cow milk. The MCDB contains a complete list of metabolite names, metabolite structures, level of verification (confirmed or probable), reference spectra (NMR, GC–MS and LC–MS) and citations for all (to the best of our knowledge) of the milk compounds that have ever been identified, quantified or reported either in this database or in the existing scientific literature. The database contains 2,355 metabolite entries including both water- and lipid-soluble metabolites as well as metabolites that would be regarded as either abundant (>1 uM) or relatively rare (> 1 nM). Each metabolite entry contains more than 90 data fields and many of them are hyperlinked to other databases (KEGG, PubChem, ChEBI, PDB, MetaCyc, DrugBank, GenBank, and UniProt) and a variety of structure and pathway viewing applets. The MCDB itself is fully searchable and supports text, mass, spectral and structure searches. The information includes literature and experimentally derived chemical and molecular/biochemistry data.
The SMPDB (Small Molecule Pathway Database) is an interactive, visual database containing more than 30 000 small molecule pathways found in humans only. The majority of these pathways are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology. It is able to do so, in part, by providing exquisitely detailed, fully searchable, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways.
FooDB is the world’s largest and most comprehensive resource on food constituents, chemistry and biology. It provides information on both macronutrients and micronutrients, including many of the constituents that give foods their flavor, color, taste, texture and aroma. It currently has data on over 26,500 food compounds and food associations.
Phenol-Explorer is the first comprehensive database on polyphenol content in foods. The database contains more than 35,000 content values for 500 different polyphenols in over 400 foods. These data are derived from the systematic collection of more than 60,000 original content values found in more than 1,300 scientific publications. Each of these publications has been critically evaluated before inclusion in the database.
Exposome-Explorer is the first database dedicated to biomarkers of exposure to environmental risk factors for diseases. It contains detailed information on the nature of biomarkers, populations and subjects where measured, samples analyzed, methods used for biomarker analyses, concentrations in biospecimens, correlations with external exposure measurements, and biological reproducibility over time. This information can be used by epidemiologists and clinicians to compare the performance and field of application of various biomarkers and to identify specific biomarkers or panels of biomarkers most useful for biomonitoring or disease etiology studies.
The Toxic Exposome Database (TEDB), also known as the Toxin and Toxin Target Database (T3DB) is a unique bioinformatics resource that combines detailed toxin data with comprehensive toxin target information. The database currently houses over 3,600 toxins described by over 41,000 synonyms, including pollutants, pesticides, drugs, and food toxins, which are linked to over 2,000 corresponding toxin target records. Altogether there are over 42,000 toxin, toxin target associations.
The E. coli Metabolome Database (ECMDB) is an expertly curated database containing extensive metabolomic data and metabolic pathway diagrams about Escherichia coli (strain K12, MG1655). This database includes significant quantities of “original” data compiled by members of the Wishart laboratory as well as additional material derived from hundreds of textbooks, scientific journals, metabolic reconstructions and other electronic databases. ECMDB currently contains 3755 small molecules with 1402 associated enzymes and 387 associated transporters. It also has 1542 metabolic pathways that are linked to 3011 metabolites. A total of 19,294 NMR and MS spectra (experimental and predicted) for 3098 different E. coli metabolites are also contained in the database.
The Yeast Metabolome Database (YMDB) is a manually curated database of small molecule metabolites found in or produced by Saccharomyces cerevisiae(also known as Baker's yeast and Brewer's yeast). This database covers metabolites described in textbooks, scientific journals, metabolic reconstructions and other electronic databases. YMDB contains metabolites arising from normal S. cerevisiae metabolism under defined laboratory conditions as well as metabolites generated by S. cerevisiae when used in baking and in the production of wines, beers and spirits. YMDB currently contains 16042 small molecules with 909 associated enzymes and 149 associated transporters.
The Urine Metabolome Database is a freely available electronic database containing detailed information about ~3100 small molecule metabolites found in human urine along with ~3900 concentration values. The data tables may be sorted and searched by concentration values and ranges. The information includes literature and experimentally derived chemical data, clinical data and molecular/biochemistry data.
The Serum Metabolome Database is a freely available electronic database containing detailed information about 4651 small molecule metabolites found in human serum along with 10895 concentration values. The data tables may be sorted and searched by concentration values and ranges. The information includes literature and experimentally derived chemical data, clinical data and molecular/biochemistry data.
The CSF Metabolome Database is a freely available electronic database containing detailed information about 468 small molecule metabolites found in human CSF along with 1650 concentration values. The data tables may be sorted and searched by concentration values and ranges. The information includes literature and experimentally derived chemical data, clinical data and molecular/biochemistry data.
The Saliva Metabolome Database is a freely available electronic database containing detailed information about many small molecule metabolites found in human saliva and many concentration values. Each metabolite entry contains more than 110 data fields and many of them are hyperlinked to other databases (KEGG, PubChem, ChEBI, Chemspider, DrugBank, PDB and Uniprot). The information includes literature and experimentally derived chemical data, clinical data and molecular/biochemistry data.
The Fecal Metabolome Database is a freely available electronic database containing detailed information about many small molecule metabolites found in human feces and many concentration values. Each metabolite entry contains more than 110 data fields and many of them are hyperlinked to other databases (KEGG, PubChem, ChEBI, Chemspider, DrugBank, PDB and Uniprot). The information includes literature and experimentally derived chemical data, clinical data and molecular/biochemistry data.
The Bovine Metabolome Database (BMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in beef and dairy cattle. The information includes literature and experimentally derived information on bovine meat, bovine serum, bovine milk, bovine urine and bovine ruminal fluid.
The Bovine Rumen Metabolome Database (RMDB) makes available tables containing the set of 246 ruminal fluid metabolites or metabolite species from the bovine ruminal fluid metabolome, along with their concentrations, related literature reference and links to their known diet associations. The data was collected by combining NMR spectroscopy, inductively coupled plasma mass-spectroscopy (ICP-MS), gas chromatography-mass spectrometry (GC-MS), direct flow injection (DFI) mass spectrometry and lipidomics with computer-aided literature mining to identify and quantify essentially all of the metabolites in bovine ruminal fluid that can be routinely detected with today’s technology.
The Alberta Food Composition Database (AFCDB) is the first comprehensive resource on food constituents, chemistry and biology dedicated to major Alberta-grown produce. It provides information on both macronutrients and micronutrients, including many of the constituents that give foods their flavor, color, taste, texture and aroma. Users can view the contents of the AFCDB from the "FoodView" (listing foods by their chemical composition) or the "ChemView" (listing chemicals by their food sources). The database contains more than 30 foods and over 1700 metabolites, some of which have been identified, quantified and reported for the very first time.
MarkerDB is a freely available electronic database that consolidates detailed information about human biomarkers into a single source. The database covers several types of biomarkers including small molecules, proteins, genetic markers and karyotypes. Several types of indications are covered including prognostic, diagnostic, predictive indications, as well as biomarkers of exposure and biomarkers for monitoring treatment progression. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education.
MetaboAnalyst is a comprehensive, Web-based tool designed for processing, analyzing, and interpreting metabolomic data. It handles most of the common metabolomic data types including compound concentration lists, spectral bin lists, peak lists, and raw MS spectra.
CFM-ID provides a method for accurately and efficiently identifying metabolites in spectra generated by electrospray tandem mass spectrometry (ESI-MS/MS). The program uses Competitive Fragmentation Modeling to produce a probabilistic generative model for the MS/MS fragmentation process and machine learning techniques to adapt the model parameters from data.
References: Metabolomics 2015 Feb; 11(1): 98–110.
Bayesil is a web system that automatically identifies and quantifies metabolites using 1D 1H NMR spectra of ultra-filtered plasma, serum or cerebrospinal fluid. The NMR spectra must be collected in a standardized fashion for Bayesil to perform optimally. Bayesil first performs all spectral processing steps, including Fourier transformation, phasing, solvent filtering, chemical shift referencing, baseline correction and reference line shape convolution automatically. It then deconvolutes the resulting NMR spectrum using a reference spectral library. This deconvolution process determines both the identity and quantity of the compounds in the biofluid mixture. Extensive testing shows that Bayesil meets or exceeds the performance of highly trained human experts.
GC-AutoFit is a web application that automatically identifies and quantifies metabolites using Gas Chromatography Mass Spectrometry (GC-MS) spectra. For optimal GC-AutoFit performance, the query GC-MS spectra should be prepared according to the instructions (How to collect GC-MS Spectra for GC-AutoFit). GC-AutoFit currently accepts .CDF and .mzXML file formats. It uses alkane standards to calculate the retention index (RI) of each peak in the sample. The extracted EI-MS spectra from each peak, along with the RIs, are then compared to reference spectra (RIs and EI-MS) in the specified library to identify and quantify the compounds. The inclusion of blank spectra is optional, however, it is useful for removing noise effects from the query spectra. Extensive testing shows that GC-AutoFit meets or exceeds the performance of highly trained human experts.
PolySearch 2.0 is an online search engine and text-mining system for identifying relationships between human diseases, genes, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch 2.0 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities.
MetATT is a easy-to-use, web-based tool designed for time-series and two-factor metabolomics data analysis. MetATT offers a number of complementary approaches including 3D interactive principal component analysis, two-way heatmap visualization, two-way ANOVA, ANOVA-simultaneous component analysis and multivariate empirical Bayes time-series analysis.
MetPA (Metabolomics Pathway Analysis) is a free and easy-to-use web application designed to perform pathway analysis and visualization of quantitative metabolomic data.
MSEA is a web-based tool to help identify and interpret patterns of metabolite concentration changes in a biologically meaningful context for human and mammalian metabolomic studies.
Receiver Operating Characteristic (ROC) curves are generally considered the method of choice for evaluating the performance of potential biomarkers. ROCCET is a freely available web-based tool designed to assist clinicians and bench biologists in performing common ROC based analyses on their metabolomic data using both classical univariate and more recently developed multivariate approaches.
MetaboMiner is a program which can be used to automatically or semi-automatically identify metabolites in complex biofluids from 2D NMR spectra. MetaboMiner is able to handle both 1H-1H total correlation spectroscopy (TOCSY) and 1H-13C heteronuclear single quantum correlation (HSQC) data. It identifies compounds by comparing 2D spectral patterns in the NMR spectrum of the biofluid mixture with specially constructed libraries containing reference spectra of approximately 500 pure compounds. Tests using a variety of synthetic and real spectra of compound mixtures showed that MetaboMiner is able to identify >80% of detectable metabolites from good quality NMR spectra.
Heatmapper is a freely available web server that allows users to interactively visualize their data in the form of heat maps through an easy-to-use graphical interface. Heatmapper is a versatile tool that allows users to easily create a wide variety of heat maps for many different data types and applications. Heatmapper allows users to generate, cluster and visualize:
- expression-based heat maps from transcriptomic, proteomic and metabolomic experiments
- pairwise distance maps
- correlation maps
- image overlay heat maps
- latitude and longitude heat maps and
- geopolitical (choropleth) heat maps.
Heatmapper offers a number of simple and intuitive customization options for easy adjustments to each heat map’s appearance and plotting parameters.
BioTransformer is a freely available software package for accurate, rapid, and comprehensive in silico metabolism prediction and compound identification. BioTransformer combines a machine learning-based approach with a knowledge-based approach to predict small molecule metabolism in human tissues (e.g. liver tissue), the human gut as well as the environment (soil and water microbiota), via its Metabolism Prediction Tool.