Secondary MS1 feature annotation
After processing LCMS data through the MS-DIAL pipeline, secondary annotations for MS1 features can be obtained by matching mass data to external databases.
What do you mean by MS1 data?
- What is MS1 data?
- MS1 data represents the initial scan in mass spectrometry, where ions are detected based on their mass-to-charge ratio (m/z). It provides an overview of all molecular features in a sample, including their m/z values and intensities, enabling detection and quantification of compounds.
- How does that differ from MS/MS (MS2) data?
- While MS1 provides a broad profile of ions without structural information, MS/MS involves selecting specific ions from the MS1 scan (precursor ions), fragmenting them, and analysing the resulting product ions. This fragmentation reveals structural details, enabling precise compound identification and differentiation of isomers.
Version 2 of the secondary annotation functions available
Please update your script files for the add_hmdb() and add_lmsd() functions for new runs.
Both functions now include additional adduct values for automated adduct-specific correction to monoisotopic masses, which is important for accurate secondary assignments.
17 May, 2025
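To illustrate what adduct-specific correction means, the sketch below converts an observed m/z to a neutral monoisotopic mass by subtracting the mass shift of the assumed adduct. The adduct list and the helper name mz_to_neutral_mass() are assumptions for illustration only; add_hmdb() and add_lmsd() may use a different adduct set and implementation.

```r
# Mass shifts (Da) for a few common singly charged adducts.
# This list is illustrative; the pipeline functions may use a different set.
adduct_shift <- c(
  "[M+H]+"   =  1.007276,
  "[M+NH4]+" = 18.033823,
  "[M+Na]+"  = 22.989218,
  "[M-H]-"   = -1.007276
)

# Hypothetical helper: convert observed m/z to neutral monoisotopic mass
mz_to_neutral_mass <- function(mz, adduct) {
  stopifnot(all(adduct %in% names(adduct_shift)))
  mz - unname(adduct_shift[adduct])
}

# A feature observed at m/z 181.0707 and assumed to be [M+H]+
neutral <- mz_to_neutral_mass(181.0707, "[M+H]+")
```

The corrected neutral mass, rather than the raw m/z, is what would then be compared against the monoisotopic masses in a database.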
Secondary annotation with HMDB
You should currently have a SummarizedExperiment object that has been pre-processed with pmp. The next stage is to match the MS1 mass data for each feature to the prepared Human Metabolome Database (HMDB) file.
Prepared HMDB database file
Download the prepared HMDB database file in RDS format here.
- Version 5.0 (November 2023) – pre-formatted and filtered to include only annotated or documented features.
Appending HMDB annotations to SummarizedExperiment objects
The first step is to load the formatted HMDB data.frame into your R session.
# Load HMDB dataset
hmdb_df <- readRDS(here::here('hmdb', 'hmdb_metabolites_detect_quant_v5_20231102.rds'))
Then, using the add_hmdb() function, we can search the HMDB annotations in the data.frame and add them to our SummarizedExperiment objects, as shown below with an example stool metabolomics object.
Parameters for add_hmdb()
| Parameter | Description |
|---|---|
| metab_SE | The SummarizedExperiment object you want to annotate with HMDB. |
| hmdb | The HMDB database object. |
| mass_tol | The mass tolerance allowed for annotation. Default: 0.002. |
| cores | The number of parallel processes to use, if desired. |
Run time
Running add_hmdb(), especially without parallel processing, can take a very long time.
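One way to pick a value for the cores argument is to check how many cores the machine reports and leave one free. This is a minimal sketch using the base parallel package; the variable names are illustrative and the best value depends on your available memory as well as core count.

```r
# Number of cores reported by the system (can be NA on some platforms)
n_detected <- parallel::detectCores()

# Leave one core free; fall back to a single core if detection fails
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
```

The resulting n_cores could then be supplied as cores = n_cores.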
Which SummarizedExperiment should I use?
If you are using the output of the pmp_preprocess() function, you should extract and annotate the glog_results. The standard practice is to use the glog-transformed data from here on.
# Extract the glog data from the pmp_preprocess output
metab_stool_glog <- metab_stool_pmp$glog_results
# Search annotations in HMDB and add to the SE objects
metab_stool_glog <- add_hmdb(metab_SE = metab_stool_glog,
                             hmdb = hmdb_df,
                             mass_tol = 0.002,
                             cores = 6)
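The role of mass_tol can be sketched with a toy example: a feature is matched to every database entry whose mass lies within the tolerance. The helper match_by_mass(), the compound names, and the masses below are all hypothetical, and this is not the implementation used by add_hmdb().

```r
# Hypothetical helper: for each feature mass, return the names of all
# database entries within an absolute mass tolerance
match_by_mass <- function(feature_mass, db_mass, db_name, mass_tol = 0.002) {
  lapply(feature_mass, function(m) {
    db_name[abs(db_mass - m) <= mass_tol]
  })
}

# Toy database and features (illustrative values only)
db <- data.frame(
  name = c("Compound A", "Compound B", "Compound C"),
  mass = c(180.0634, 180.0650, 146.0691),
  stringsAsFactors = FALSE
)
features <- c(180.0640, 146.0800)

matches <- match_by_mass(features, db$mass, db$name)
```

Here the first feature matches two candidates and the second matches none, which reflects why MS1-only annotations are treated as putative: several compounds can share a mass within tolerance.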
Secondary annotation with LIPID MAPS
After processing lipidomics data through MS-DIAL, you can enhance the annotations of MS1 features by leveraging the LIPID MAPS Structure Database (LMSD). At this point, you should have a SummarizedExperiment object containing preliminary annotations and those from HMDB. The next step involves matching the MS1 mass data of each feature to the entries in the LMSD.
Prepared LMSD database file
Download the prepared LMSD database file in RDS format here.
- Version 2022-02-16
Should I run this section?
Run this step if you have lipidomics data to process. If you are processing metabolomics data, you can skip this section.
Appending LIPID MAPS annotations to SummarizedExperiment objects
The first step is to load the formatted LMSD data.frame into your R session.
# Load LMSD dataset
lmsd_df <- readRDS(here::here('lmsd', 'LMSD_231107.rds'))
Then, using the add_lmsd() function, we can search the LIPID MAPS annotations in the data.frame and add them to our SummarizedExperiment objects, as shown below with an example stool metabolomics object.
Parameters for add_lmsd()
| Parameter | Description |
|---|---|
| metab_SE | The SummarizedExperiment object you want to annotate with LMSD. |
| lmsd | The LMSD database object. |
| mass_tol | The mass tolerance allowed for annotation. Default: 0.002. |
| cores | The number of parallel processes to use, if desired. |
Run time
Running add_lmsd(), especially without parallel processing, can take a very long time.
# Search annotations in LMSD and add to the SE objects
# Returns a list containing:
#   - all distinct LIPID MAPS entries matching each m/z within the 0.002 tolerance
#   - an aggregated data.frame of distinct lipids (agg_lmsd_df)
#   - a data.frame to replace the SummarizedExperiment rowData (metadata_lmsd_table)
lmsd_ann_list <- add_lmsd(metab_SE = metab_stool_glog,
                          lmsd = lmsd_df,
                          mass_tol = 0.002,
                          cores = 6)
# Use metadata_lmsd_table to replace the existing SE object metadata
rowData(metab_stool_glog) <- lmsd_ann_list$metadata_lmsd_table
Comparing annotations from different databases
To compare the annotations assigned by each method, use the compare_annotations_SE() function. It produces a data.frame containing only features with at least one annotation, and allows us to see whether the annotations typically agree with each other.
Parameters for compare_annotations_SE()
| Parameter | Description |
|---|---|
| metab_SE | The SummarizedExperiment object with secondary annotations. These should include HMDB for metabolomics data, and both HMDB and LMSD for lipidomics data. |
| mode | Either 'metabolomics' or 'lipidomics' depending on your dataset. |
| agg_lmsd_ann | The aggregated LMSD annotations you generated using the add_lmsd() function, i.e. lmsd_ann_list$agg_lmsd_df. Only required for mode = 'lipidomics'. |
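The idea behind the comparison can be sketched on a plain data.frame: keep features with at least one annotation, then flag whether the sources agree. The column names and values below are hypothetical and do not reflect the actual output of compare_annotations_SE().

```r
# Toy annotation table; column names are hypothetical
ann <- data.frame(
  feature = c("F1", "F2", "F3"),
  msdial  = c("Compound A", NA, NA),
  hmdb    = c("Compound A", "Compound B", NA),
  stringsAsFactors = FALSE
)

# Keep only features with at least one annotation
has_any <- !is.na(ann$msdial) | !is.na(ann$hmdb)
ann_cmp <- ann[has_any, ]

# Flag agreement where both sources give a name (NA if only one source does)
ann_cmp$agree <- ann_cmp$msdial == ann_cmp$hmdb
```

In this toy table, F3 is dropped because it has no annotation, F1 is flagged as agreeing, and F2 is left as NA because only one source annotated it.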
Keeping only annotated features
From here, we can filter our SummarizedExperiment object for features with at least one annotation.
While the other features likely represent interesting metabolites and lipids, they cannot be interpreted downstream without an annotation.
We can achieve this by providing our SummarizedExperiment object to the keep_annotated_SE() function, which outputs a filtered SummarizedExperiment object.
Parameters for keep_annotated_SE()
| Parameter | Description |
|---|---|
| metab_SE | The SummarizedExperiment object with secondary annotations. These should include HMDB for metabolomics data, and both HMDB and LMSD for lipidomics data. |
| mode | Either 'metabolomics' or 'lipidomics' depending on your dataset. |
For metabolomics data, the function will create a new rowData element called shortname, and also assign this value as the preferred row name.
- It uses the following naming hierarchy to decide an appropriate name: HMDB > MS-DIAL.
# Keep only annotated rows and generate shortname column
metab_stool_glog <- keep_annotated_SE(metab_SE = metab_stool_glog,
                                      mode = 'metabolomics')
# Save the object
saveRDS(metab_stool_glog, here::here('output', '01_Preprocessing', 'metab_stool_glog_anno.rds'))
Are the other annotations still there?
Yes! While the shorter names are succinct and useful for plotting, you can view the additional annotations at any time and alter as required.
For lipidomics data, the function will likewise create a new rowData element called shortname, and also assign this value as the preferred row name.
- It uses the following naming hierarchy to decide an appropriate name: MS-DIAL > LMSD > HMDB.
- MS-DIAL has its own lipid database that is effective and, because it can also utilise MS/MS data in its annotations, is preferred here.
# Keep only annotated rows and generate shortname column
lipid_stool_glog <- keep_annotated_SE(metab_SE = lipid_stool_glog,
                                      mode = 'lipidomics')
# Save the object
saveRDS(lipid_stool_glog, here::here('output', '01_Preprocessing', 'lipid_stool_glog_anno.rds'))
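A naming hierarchy of this kind amounts to taking the first non-missing name across sources in priority order. The sketch below shows that idea in base R; first_available() and the example vectors are hypothetical and are not taken from the pipeline's own code.

```r
# Hypothetical helper: return the first non-missing value across the
# supplied vectors, which are given in priority order
first_available <- function(...) {
  sources <- list(...)
  out <- sources[[1]]
  for (s in sources[-1]) {
    out <- ifelse(is.na(out), s, out)
  }
  out
}

# Toy annotation vectors (illustrative names only)
hmdb_name   <- c("Compound A", NA, NA)
msdial_name <- c("Compound X", "Compound Y", NA)

# Metabolomics-style ordering: HMDB > MS-DIAL
shortname <- first_available(hmdb_name, msdial_name)
```

For a lipidomics-style ordering, the same helper would be called with the MS-DIAL names first, then LMSD, then HMDB.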
Next steps
You now have a normalised, imputed dataset that has undergone secondary annotation and been filtered for annotated features. It is now time to proceed to manual curation of the annotated spectra.
Rights
- Copyright © 2024 Mucosal Immunology lab, Monash University, Melbourne, Australia.
- HMDB version 5.0: Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee BL, Berjanskii M. HMDB 5.0: the human metabolome database for 2022. Nucleic acids research. 2022 Jan 7;50(D1):D622-31.
- License: This pipeline is provided under the MIT license.
- Authors: M. Macowan and C. Pattaroni