GlycoAnnotateR

Logo

R package for annotation of glycans in mass spectrometry data.

View the Project on GitHub margotbligh/GlycoAnnotateR

1 Introduction

This is a simple tutorial for the annotation of glycans in a single LC-MS/MS file in R using XCMS and GlycoAnnotateR. For a more detailed tutorial, including correspondence analysis and isotope detection, see this tutorial.

The data used in this tutorial is an enzyme digest of fucoidan, a sulfated fucan from brown algae. The data can be downloaded from the MassIVE server (f.MSV000095410/peak/MS31_20240618_newMpyrifera_digest_SIMddMS2_2.5uL_85%EtOH_11.mzML).

2 Load required packages

library(GlycoAnnotateR)
library(tidyverse)
library(xcms)
library(ggplot2)
library(ggrepel)

3 Pre-processing the data

3.1 Import data

Now we import the mzML file as an OnDiskMSnExp object.

#read in data 
data <- readMSData(files = 'MS31_20240618_newMpyrifera_digest_SIMddMS2_2.5uL_85%EtOH_11.mzML',
                   mode = 'onDisk')

3.2 Peak picking

Peak picking is done with the CentWave algorithm, for details see Tautenhahn et al., 2008.

Important to note: each dataset will require parameters appropriate for your data! If no peaks are picked, then you will not be able to annotate any peaks! See this tutorial for pick peaking with CentWave.

#create parameter object
cwp <- CentWaveParam(ppm = 10, peakwidth = c(10, 60), integrate = 2,
                     snthresh = 1)
#pick peaks
pks <- findChromPeaks(data, cwp)

Peak picking should always be optimised and checked extensively, as this will have a massive impact on the downstream data interpretation. Here, we will check an example chromatogram for a hexasulfated trifucan oligomer We will quickly use glycoPredict from GlycoAnnotateR to get the correct m/z value.

#calculate m/z values 
#first make parameter object
param = glycoPredictParam(dp = c(3,3), #degree of polymerisation
                          polarity = 'neg', #polarity
                          adducts = 'H', # adducts
                          modifications = c('sulfate', #modifications
                                            'deoxy'),
                          double_sulfate = T, #allow double sulfation 
                          nmod_max = 3) #allow 3 modifications per monomer i.e. 2 sulfates, 1 deoxy
#make table
masses_trifucan <- glycoPredict(param)

#inspect table
#there are 82 possible ions 
head(masses_trifucan)
## # A tibble: 6 × 10
##      dp  mass formula    `IUPAC name`     glyconnect_id glytoucan_id ion      mz
##   <dbl> <dbl> <chr>      <chr>            <chr>         <chr>        <chr> <dbl>
## 1     3  456. C18H32O13  DeoxyHex3        none          none         [M-H…  455.
## 2     3  472. C18H32O14  Hex1 DeoxyHex2   none          none         [M-H…  471.
## 3     3  488. C18H32O15  Hex2 DeoxyHex1   420           G60232GL     [M-H…  487.
## 4     3  504. C18H32O16  Hex3             234           G99375FM     [M-H…  503.
## 5     3  536. C18H32O16S DeoxyHex3 Sulfa… none          none         [M-H…  535.
## 6     3  552. C18H32O17S Hex1 DeoxyHex2 … none          none         [M-H…  551.
## # ℹ 2 more variables: ion_formula <chr>, charge <chr>
#get mz value for [M-2H]-2 adduct
mz_trifucan = masses_trifucan$mz[masses_trifucan$`IUPAC name` == 'DeoxyHex3 Sulfate6'&
                                   masses_trifucan$ion == '[M-2H]-2']

#extract chromatogram +- 4 ppm
chr_trifucan <- chromatogram(pks,
                             mz = c(mz_trifucan-ppm_to_mz(mz_trifucan, 4), 
                                    mz_trifucan+ppm_to_mz(mz_trifucan, 4)))

The picked peak is indicated by the shaded grey area in the extracted ion chromatogram.

plot(chr_trifucan[[1]])

4 Annotating peaks

We will now move forward with extracting and annotating our peak table table to identify putative oligosaccharide ions. Here, we will first create the glycoPredictParam object, then make a table of potential oligosaccharide compositions. We will filter this for charges of -1 and -2, and then use it to annotate the peak table. The charge filtering may not be appropriate depending on your instrument and dataset.

4.1 Extract the peak table

We will extract a dataframe with the picked chromatographic peaks

#extract table and make dataframe
pks_df <- chromPeaks(pks) %>% as.data.frame()

#inspect table
#54 peaks were identified in this example
str(pks_df)
## 'data.frame':    54 obs. of  11 variables:
##  $ mz    : num  243 243 243 243 243 ...
##  $ mzmin : num  243 243 243 243 243 ...
##  $ mzmax : num  243 243 243 243 243 ...
##  $ rt    : num  39 38.8 39 40.4 42.7 ...
##  $ rtmin : num  36.4 36.4 36.4 36.5 36.8 ...
##  $ rtmax : num  41.9 45 45 43.5 48.8 ...
##  $ into  : num  100630 345758 345758 280727 462989 ...
##  $ intb  : num  100316 343758 343758 276292 461956 ...
##  $ maxo  : num  29487 91344 91344 63715 88028 ...
##  $ sn    : num  47 71 71 28 105 ...
##  $ sample: num  1 1 1 1 1 1 1 1 1 1 ...
head(pks_df)
##            mz    mzmin    mzmax       rt    rtmin    rtmax     into     intb
## CP01 243.0652 243.0635 243.0666 38.95620 36.39610 41.92055 100630.0 100315.5
## CP02 243.1792 243.1787 243.1795 38.82145 36.39610 45.01966 345758.0 343758.5
## CP03 243.1792 243.1785 243.1795 38.95620 36.39610 45.01966 345758.0 343758.5
## CP04 243.0873 243.0868 243.0878 40.43844 36.53084 43.53754 280726.8 276292.2
## CP05 242.9601 242.9598 242.9606 42.72905 36.80034 48.79251 462988.8 461956.4
## CP06 242.9603 242.9598 242.9610 52.00861 47.17552 62.14229 415453.0 414178.8
##          maxo  sn sample
## CP01 29486.51  47      1
## CP02 91343.73  71      1
## CP03 91343.73  71      1
## CP04 63714.87  28      1
## CP05 88027.51 105      1
## CP06 43843.45  52      1

4.2 Making the parameter object

Create the parameter object for calculation of theoretical compositions.

#create the glycoPredictParam object
gpp <- glycoPredictParam(
  #degree of polymerisation range
  dp = c(1,10), 
  #ionisation polarity
  polarity = "neg", 
  #ionisation type (affects adduct types)
  ion_type = "ESI", 
  #scan range during MS1 acquisition 
  scan_range = c(175,1400),
  #expected modifications
  modifications = c('sulfate', 'deoxy'),
  #double sulfation is possible
  double_sulfate = T,
  #maximum average number of modifications per monomer
  nmod_max = 3,
  #adducts
  adducts = "H")
gpp
## An object of class "glycoPredictParam"
## Slot "dp":
## [1]  1 10
## 
## Slot "polarity":
## [1] "neg"
## 
## Slot "scan_range":
## [1]  175 1400
## 
## Slot "pent_option":
## [1] FALSE
## 
## Slot "modifications":
## [1] "sulfate" "deoxy"  
## 
## Slot "nmod_max":
## [1] 3
## 
## Slot "double_sulfate":
## [1] TRUE
## 
## Slot "label":
## [1] "none"
## 
## Slot "ion_type":
## [1] "ESI"
## 
## Slot "format":
## [1] "long"
## 
## Slot "adducts":
## [1] "H"
## 
## Slot "naming":
## [1] "IUPAC"
## 
## Slot "glycan_linkage":
## [1] "none"
## 
## Slot "modification_limits":
## [1] "none"

4.3 Calculating compositions

We can now use parameters defined in the glycoPredictParam object to ‘predict’ (or rather calculate) all possible glycans. You can directly run glycoAnnotate to annotate your peak list using the parameters, which will call glycoPredict in the background, but here we will demonstrate the output of glycoPredict by running it separately, as we want to introduce an addition filtering for charge. You can see that the charge filtered table contains ~1000 ions, while the original table contains >6000.

#create a table of theoretical glycans and their ions
pred_df <- glycoPredict(param = gpp)

#inspect table
str(pred_df)
## tibble [6,276 × 10] (S3: tbl_df/tbl/data.frame)
##  $ dp           : num [1:6276] 1 1 1 1 1 2 2 2 2 2 ...
##  $ mass         : num [1:6276] 180 244 260 324 340 ...
##  $ formula      : chr [1:6276] "C6H12O6" "C6H12O8S" "C6H12O9S" "C6H12O11S2" ...
##  $ IUPAC name   : chr [1:6276] "Hex1" "DeoxyHex1 Sulfate1" "Hex1 Sulfate1" "DeoxyHex1 Sulfate2" ...
##  $ glyconnect_id: chr [1:6276] "228" "none" "none" "none" ...
##  $ glytoucan_id : chr [1:6276] "G49874UX" "none" "none" "none" ...
##  $ ion          : chr [1:6276] "[M-H]-" "[M-H]-" "[M-H]-" "[M-H]-" ...
##  $ mz           : num [1:6276] 179 243 259 323 339 ...
##  $ ion_formula  : chr [1:6276] "C6H11O6" "C6H11O8S" "C6H11O9S" "C6H11O11S2" ...
##  $ charge       : chr [1:6276] "-1" "-1" "-1" "-1" ...
#charge filter
pred_df_filt <- pred_df %>% filter(charge %in% c('-1', '-2'))

#inspect filtered table
str(pred_df_filt)
## tibble [995 × 10] (S3: tbl_df/tbl/data.frame)
##  $ dp           : num [1:995] 1 1 1 1 1 2 2 2 2 2 ...
##  $ mass         : num [1:995] 180 244 260 324 340 ...
##  $ formula      : chr [1:995] "C6H12O6" "C6H12O8S" "C6H12O9S" "C6H12O11S2" ...
##  $ IUPAC name   : chr [1:995] "Hex1" "DeoxyHex1 Sulfate1" "Hex1 Sulfate1" "DeoxyHex1 Sulfate2" ...
##  $ glyconnect_id: chr [1:995] "228" "none" "none" "none" ...
##  $ glytoucan_id : chr [1:995] "G49874UX" "none" "none" "none" ...
##  $ ion          : chr [1:995] "[M-H]-" "[M-H]-" "[M-H]-" "[M-H]-" ...
##  $ mz           : num [1:995] 179 243 259 323 339 ...
##  $ ion_formula  : chr [1:995] "C6H11O6" "C6H11O8S" "C6H11O9S" "C6H11O11S2" ...
##  $ charge       : chr [1:995] "-1" "-1" "-1" "-1" ...

4.4 Annotate peaks

We will use the charge filtered table to annotate our peaks at 3.5 ppm error. If you want to ‘collapse’ annotations in cases where there are multiple annotations per peak (and your peaks are therefore duplicated), use the collapse = TRUE option.

#annotate peaks
pks_df_annotated <- glycoAnnotate(data = pks_df,
                                  pred_table = pred_df_filt,
                                  error = 3.5)
## Starting annotation with predictions against data
#keep only annotated peaks in the table
pks_df_onlyannotated <- pks_df_annotated %>% drop_na(dp)

#inspect annotated peaks
#11 sulfated fucan oligosaccharides were annotated
pks_df_onlyannotated
##    dp      mass     formula         IUPAC name glyconnect_id glytoucan_id
## 1   1  244.0253    C6H12O8S DeoxyHex1 Sulfate1          none         none
## 2   2  470.0400 C12H22O15S2 DeoxyHex2 Sulfate2          none         none
## 3   2  629.9536 C12H22O21S4 DeoxyHex2 Sulfate4          none         none
## 4   2  629.9536 C12H22O21S4 DeoxyHex2 Sulfate4          none         none
## 5   2  629.9536 C12H22O21S4 DeoxyHex2 Sulfate4          none         none
## 6   3  776.0116 C18H32O25S4 DeoxyHex3 Sulfate4          none         none
## 7   3  855.9684 C18H32O28S5 DeoxyHex3 Sulfate5          none         none
## 8   3  855.9684 C18H32O28S5 DeoxyHex3 Sulfate5          none         none
## 9   4 1002.0263 C24H42O32S5 DeoxyHex4 Sulfate5          none         none
## 10  4 1081.9831 C24H42O35S6 DeoxyHex4 Sulfate6          none         none
## 11  3  935.9252 C18H32O31S6 DeoxyHex3 Sulfate6          none         none
##         ion  mz_pred ion_formula charge       mz    mzmin    mzmax        rt
## 1    [M-H]- 243.0180    C6H11O8S     -1 243.0178 243.0176 243.0179  59.18481
## 2  [M-2H]-2 234.0127 C12H20O15S2     -2 234.0124 234.0114 234.0132  86.78763
## 3  [M-2H]-2 313.9695 C12H20O21S4     -2 313.9688 313.9686 313.9690 406.92737
## 4  [M-2H]-2 313.9695 C12H20O21S4     -2 313.9688 313.9686 313.9690 405.62887
## 5  [M-2H]-2 313.9695 C12H20O21S4     -2 313.9688 313.9686 313.9690 406.60312
## 6  [M-2H]-2 386.9985 C18H30O25S4     -2 386.9984 386.9975 386.9996 471.60668
## 7  [M-2H]-2 426.9769 C18H30O28S5     -2 426.9768 426.9751 426.9791 588.57740
## 8  [M-2H]-2 426.9769 C18H30O28S5     -2 426.9768 426.9760 426.9777 639.01284
## 9  [M-2H]-2 500.0059 C24H40O32S5     -2 500.0063 500.0047 500.0078 678.05256
## 10 [M-2H]-2 539.9843 C24H40O35S6     -2 539.9852 539.9846 539.9859 798.39528
## 11 [M-2H]-2 466.9553 C18H30O31S6     -2 466.9552 466.9537 466.9563 837.27402
##        rtmin     rtmax      into      intb       maxo   sn sample
## 1   48.92726  66.35989 6848117.8 6847884.8 1215479.88 4292      1
## 2   71.87560  94.05483  520024.9  511757.4   46581.35   24      1
## 3  387.41287 420.44965 3044450.5 3014906.9  233104.44   68      1
## 4  387.41287 420.44965 3044450.5 3014906.9  233104.44   68      1
## 5  387.41287 420.44965 3044450.5 3014906.9  233104.44   68      1
## 6  459.93640 483.94111  272196.5  179606.8   30914.22    5      1
## 7  562.01033 623.75556 4376145.6 4337223.1  211686.11   78      1
## 8  623.75556 650.73222 1246159.6 1229240.2   81602.15   30      1
## 9  654.02082 696.80118 1628744.5 1614033.3   92367.78   69      1
## 10 768.32796 816.37926 5721691.4 5678748.6  411292.03  141      1
## 11 825.11310 853.21116 1084009.9 1079160.8  117393.16  114      1
##    mass_error_ppm
## 1       0.8001308
## 2       1.3653844
## 3       2.2901486
## 4       2.2933373
## 5       2.2868096
## 6       0.2535610
## 7       0.3315664
## 8       0.2715139
## 9       0.9307090
## 10      1.6383194
## 11      0.2080966

5 Extract and annotate MS2 spectra associated with annotations

5.1 Extract MS2 spectra

We will use the glycoMS2Extract function from GlycoAnnotateR to extract the spectra. This function is basically a wrapper for XCMS functions for MS2 spectra extraction, with a layer to only extract those spectra associated with annotated peaks or features

ms2 <- glycoMS2Extract(
  #object with MS1 and MS2 data 
  data_ms2 = data, 
  #processed MS1 data object (must have peaks)
  data_features = pks, 
  #annotated feature table
  #must contain an 'mz' and 'rt' column
  annotations = pks_df_onlyannotated,
  #processing level here is 'peaks' (would be features after correspondence)
  processing_level = 'peaks')

#see how many spectra were extracted in total (here = 961)
length(ms2@backend@spectraData@rownames)
## [1] 961

5.2 Process MS2 spectra

We will use the functions from Spectra to average the MS2 spectra.

#combine peaks with spectra
ms2_combine <- Spectra::combinePeaks(ms2, tolerance = 0.005, ppm = 3)

#combine spectra by peak within each file across scans
ms2_mean <- Spectra::combineSpectra(ms2_combine, f = ms2$peak_index,
                                    tolerance = 0.005, ppm = 3,  
                                    peaks = 'intersect', minProp = 0.2,
                                    intensityFun = max)
## Backend of the input object is read-only, will change that to an 'MsBackendMemory'
#see how many spectra there are after averaging - 11
length(ms2_mean)
## [1] 11

5.3 Convert to dataframe

Now we will convert MS2 data into a dataframe format so its easier to work with.

#build empty df
ms2_df <- data.frame(precursorMz = as.numeric(),
                     rt = as.numeric(),
                     mz = as.numeric(),
                     intensity = as.numeric(),
                     peak_id = as.numeric())
#fill in df
for (i in 1:length(ms2_mean)){
  mz = sprintf("%.4f",mz(ms2_mean)[i] %>% unlist()) %>% 
    as.numeric()
  intensity = intensity(ms2_mean)[i] %>% unlist()
  intensity = (intensity / max(intensity)) * 100 #normalise intensity with respect to maximum
  rt = rep(rtime(ms2_mean)[i], length(mz))
  precursorMz = rep(precursorMz(ms2_mean)[i], length(mz))
  peak_id = rep(ms2_mean$peak_id[i], length(mz)) 

  temp <- data.frame(precursorMz = precursorMz,
                     rt = rt, mz = mz, intensity = intensity,
                     peak_id = peak_id)
  ms2_df <- rbind(temp, ms2_df)
}

#inspect dataframe
head(ms2_df)
##   precursorMz       rt       mz intensity peak_id
## 1    466.9558 825.7071  67.6220  49.40483    CP54
## 2    466.9558 825.7071  96.9588   8.52363    CP54
## 3    466.9558 825.7071  97.2079  31.28682    CP54
## 4    466.9558 825.7071 293.1770  10.59588    CP54
## 5    466.9558 825.7071 307.0408  25.18584    CP54
## 6    466.9558 825.7071 347.0196 100.00000    CP54

5.4 Annotate MS2 spectra

5.4.1 Annotate precursors

We will begin by annotating the precursors.

ms2_df_annotP <- glycoAnnotate(data = ms2_df,
                               #this should generally be higher than for MS1
                               error = 5, 
                               #table for annotation
                               pred_table = pred_df_filt, 
                               #name of mz column to annotate) 
                               mz_column = 'precursorMz') 
## Starting annotation with predictions against data
#make column named 'annotation' with name, ion and dp pasted together
#separated by a colon 
ms2_df_annotP <- ms2_df_annotP %>% 
  mutate(annotations = paste(`IUPAC name`, `ion`, dp, sep = ':'))

All precursors should be assigned an annotation (as we only extracted MS2 spectra for annotated precursors).

#check all precursors annotated (statement should be FLASE)
any(is.na(ms2_df_annotP$dp))
## [1] FALSE

5.4.2 Annotate fragments

Now we can annotate the fragment ions in MS2 spectra. The glycoPredictParam parameters automatically change depending on the precursor annotation. Only fragments resulting from glycosidic bond breakage, dehydration, or loss of modified groups (e.g. desulfation) can currently be annotated by GlycoAnnotateR.

#make a vector of the precursor annotations
#the vector should be as long as the number of peaks / the length
#of the ms2 object after averaging (i.e. 11 in this case)
ms2_df_annotP_unique <- ms2_df_annotP %>% 
  distinct(precursorMz, annotations, rt, peak_id)

#rename annotations column to precursorAnnotations
names(ms2_df_annotP)[names(ms2_df_annotP) == 'annotations'] <- 'precursorAnnotations'

#remove other columns added during annotation
ms2_df_annotP <- ms2_df_annotP %>% 
  select(all_of(c(names(ms2_df), 'precursorAnnotations')))

#annotate the fragment ions based on the precursor annotations
ms2_df_annotF <- glycoMS2Annotate(
  #vector of the precursor annotations
  precursorAnnotations = ms2_df_annotP_unique$annotations,
  #separator in between name, ion and dp of precursor annotations
  precursorAnnotations_sep = ':',
  #ms2spectra dataframe with annotated precursors
  ms2spectra = as.data.frame(ms2_df_annotP),
  #ion type
  ion_type = 'ESI',
  #error
  error = 5, error_units = 'ppm',
  #allow double sulfate and dehydrations and 3 annotations per monomer
  double_sulfate = T, nmod_max = 3, dehydrations = T)

#ions with multiple annotations will automatically be 'collapsed'
#in the 'annotations' column
#ions with one annotation will not be collapsed, so we will now 
#coalesce these columns
#and also annotate sulfate
ms2_df_annotF <- ms2_df_annotF %>% 
  mutate(annotations = case_when(is.na(annotations) & !is.na(ion)~ 
                                   paste(`IUPAC name`, ion, sep = ':'),
                                 is.na(annotations) & is.na(ion) ~  NA,
                                 TRUE ~ annotations),
         annotations = case_when(mz > 96.95 & mz < 97 ~ 'HSO4-',
                                 TRUE ~ annotations))

Annotated spectra can be plotted with ggplot.

#plot spectra with annotated fragments
#you can play around with how you plot this
peakIDs <- ms2_df_annotF$peak_id %>% unique()
for(i in 1:length(peakIDs)){
  pkid <- peakIDs[i]
  df <- ms2_df_annotF %>% 
    #filter for peak id
    filter(peak_id == !!pkid)  %>% 
    #filter for 1% relative abundance
    filter(intensity > 1) %>% 
    #add new line break where there are commas for prettier plotting
    mutate(annotations = gsub('\\,', ',\n', annotations))
  
  
  
  p <-  ggplot(df) +
    #plot lines
    geom_segment(aes(x = mz, xend = mz, y = 0, yend= intensity)) +
    #add annotations
    geom_text_repel(aes(label = annotations, x = mz, y = intensity),
                    size = 2, hjust = 0.5, vjust = 0.5, nudge_y = 1,
                    segment.size = 1, segment.colour = 'lightblue', 
                    segment.linetype = 'dashed', min.segment.length = 0.2) +
    #add mz values 
    geom_text(mapping = aes(x = mz, y = intensity + 0.5, label = round(mz, 4)),
              size = 2) +
    #set labels
    labs(x = expression(italic("m/z")), y = "Normalised intensity (a.u.)",
         title = paste0(df$precursorAnnotations[1], ' at ',
                        round(df$rt[1],1), ' sec, precursor m/z=',
                        round(df$precursorMz[1], 4)))+
    theme_minimal()+
    theme(axis.title = element_text(colour = 'black', size = 8, face = 'bold'),
          axis.text =  element_text(colour = 'black', size = 6),
          plot.title =element_text(colour = 'black', size = 8, face = 'bold', 
                                   hjust = 0.5),
          panel.border = element_rect(fill = NA, colour = 'black'),
          panel.grid = element_blank())
  print(p)
}