R package for annotation of glycans in mass spectrometry data.
GlycoAnnotateR is an R package for database-free annotation of glycan compositions in mass spectrometry data. The package is designed to be flexible and to work with many different types of mass spectrometry data (e.g. LC-MS, MALDI, direct injection), as well as the output of many different data processing pipelines (e.g. XCMS, Cardinal). Please see the tutorial for detailed explanations and instructions.
This package can be installed directly from GitHub using devtools:
library(devtools)
devtools::install_github('margotbligh/GlycoAnnotateR')
Please note that Python is required for the package to function. If you do not have a local version of Python available, please follow the official instructions to download and install it.
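As a quick sanity check before installing, the sketch below looks for a Python executable on the PATH using base R only. How GlycoAnnotateR itself locates Python is not stated here, so this only confirms that some Python is discoverable; the variable names are illustrative.

```r
# Look for a Python executable on the PATH.
# Sys.which() returns an empty string for commands it cannot find.
python_paths <- Sys.which(c("python", "python3"))
python_found <- any(nzchar(python_paths))

if (!python_found) {
  message("No Python executable found on the PATH; please install Python first.")
}
```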
The ‘prediction’ or ‘calculation’ of glycan compositions is the core utility of this package. Therefore a detailed description of the arguments is provided here.
Degree of polymerisation, dp
This is always a range from the lowest to the highest DP desired (e.g. `c(1,10)` for DPs from 1 to 10). If you need only a single DP, provide that DP twice (e.g. `c(2,2)` for only DP 2).
Should pentose be included in addition to hexose, pent_option
This is a logical argument for whether pentose monomers should be included in compositions in addition to hexose monomers.
Maximum number of modifications per monomer on average, nmod_max
Calculated as the number of modifications divided by the number of monomers. Unsaturated, alditol and dehydrated are not counted. For example, for a tetramer (DP4) of deoxyhexoses with four sulfate groups, the average number of modifications per monomer is 2 (four deoxy plus four sulfate modifications over four monomers). By default `nmod_max` is 1, and the maximum allowed value is 3. Consider carefully whether you need to increase this value above the default.
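The arithmetic behind this threshold can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's own code: the helper `avg_mods_per_monomer` and the named-vector input are hypothetical.

```r
# Average number of modifications per monomer for one composition.
# `mods` is a named vector of modification counts (hypothetical format);
# unsaturated, alditol and dehydrated are not counted.
avg_mods_per_monomer <- function(dp, mods) {
  excluded <- c("unsaturated", "alditol", "dehydrated")
  counted <- mods[!(names(mods) %in% excluded)]
  sum(counted) / dp
}

# Tetramer of deoxyhexoses carrying four sulfate groups
tetramer <- c(deoxy = 4, sulfate = 4)
avg_mods_per_monomer(dp = 4, mods = tetramer)  # 2

# With the default threshold this composition would not be kept
nmod_max <- 1
avg_mods_per_monomer(4, tetramer) <= nmod_max  # FALSE
```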
Label, label
Are sugars labelled by reductive amination? Currently supported labels are `none` (default) and those given in the table below:
Label | Accepted names |
---|---|
procainamide | “procainamide”, “proca”, “procA”, “ProA” |
2-aminopyridine | “2-ap”, “2-AP”, “pa”, “PA”, “2-aminopyridine” |
2-aminobenzoic acid | “2-aa”, “2-AA”, “aba”, “ABA”, “2-aminobenzoic acid” |
2-aminobenzamide | “2-ab”, “2-AB”, “ab”, “AB”, “2-aminobenzamide” |
1-phenyl-3-methyl-5-pyrazolone | “pmp”, “PMP”, “1-phenyl-3-methyl-5-pyrazolone” |
Double sulfate, double_sulfate
Can monomers be disulfated? A logical value is required. For this option to take effect, `sulfate` must be in `modifications` and `nmod_max` must be at least 2.
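The two preconditions can be written as a small check. This is a sketch with a hypothetical helper name, and it is not claimed to be how the package validates its arguments.

```r
# TRUE only when disulfated monomers could actually be generated:
# sulfate must be among the modifications and nmod_max must be >= 2.
double_sulfate_possible <- function(modifications, nmod_max) {
  "sulfate" %in% modifications && nmod_max >= 2
}

double_sulfate_possible(c("deoxy", "sulfate"), nmod_max = 2)  # TRUE
double_sulfate_possible(c("deoxy", "sulfate"), nmod_max = 1)  # FALSE
double_sulfate_possible(c("deoxy"), nmod_max = 3)             # FALSE
```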
Glycan linkage, glycan_linkage
By default `none`. When set to `oglycan` or `nglycan`, the limits described by Cooper et al. (2021) for the GlycoMod software are implemented. The rules are listed here: https://web.expasy.org/glycomod/glycomod-doc.html
Modification limits, modification_limits
User-provided limits on monomers or modifications. Provide as a named list.
Modifications, modifications
By default, each modification can occur once per monomer, and all selected modifications may be present on the same monomer. After the modified monomers are calculated, they are filtered by the `nmod_max` term before output is returned. So, for example, for `modifications = c('deoxy', 'sulfate', 'carboxylicacid')`, the program will generate as one possible composition all three modifications on one monomer (i.e. ‘DeoxyHex1 CarboxylicAcid1 Sulfate1’). If `nmod_max` is at the default of 1, this composition will be filtered out before output is returned (as `nmod` = 3). Sulfate is the only modification that is allowed to occur twice per monomer. For this, you need to set `double_sulfate=TRUE` and `nmod_max` to at least 2.
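The generate-then-filter behaviour described above can be sketched for a single monomer (DP 1) in base R. This is a simplified illustration, not the package's implementation; the data frame layout is an assumption made for the example.

```r
modifications <- c("deoxy", "sulfate", "carboxylicacid")
nmod_max <- 1

# Each modification is either absent (0) or present (1) on the monomer
combos <- expand.grid(
  setNames(rep(list(0:1), length(modifications)), modifications)
)

# For DP 1, the average per monomer is simply the number of modifications
combos$nmod <- rowSums(combos[modifications])

# Filter by the nmod_max threshold before returning output
kept <- combos[combos$nmod <= nmod_max, ]

nrow(combos)  # 8 candidate compositions, including nmod = 3
nrow(kept)    # 4 remain: unmodified plus each single modification
```

Raising `nmod_max` to 3 in this sketch would keep all eight rows, including the triply modified monomer.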
The different modifications and their namings are summarised below:
Modification | Definition / description | IUPAC naming | GlycoCT naming | Oxford naming |
---|---|---|---|---|
carboxylicacid | Effective loss of two hydrogens and gain of one oxygen to form a carboxylic acid group on C6. The modified monomer is commonly called a ‘uronic acid’ | CarboxylicAcid | COOH | A |
sialicacid | Effective addition of C11H19N1O9 to hexose. Here, sialic acid only refers to N-Acetylneuraminic acid (Neu5Ac), the most common sialic acid. Predominantly found in complex mammalian glycans. | NeuAc | SIA | SA |
phosphate | | Phosphate | PO4 | P |
sulfate | Addition of SO3. Only modification allowed to occur twice per monomer (see options for `double_sulfate`) | Sulfate | SO4 | S |
amino | Gain of NH and loss of O, the result of replacing a hydroxyl group with an amino group. | Amino | NH2 | Am |
deoxy | One hydroxyl group is replaced by an H atom. Fucose and rhamnose are two common deoxyhexoses. NB: GlycoAnnotateR currently only considers deoxyhexoses and not deoxypentoses. | DeoxyHex | DHEX | D |
nacetyl | Addition of an N-acetyl group (net change = +C2H3N). A common example of an N-acetylated hexose is N-acetylglucosamine. Note that here, N-acetylglucosamine would be named Hex1 N-Acetyl1 in e.g. IUPAC naming. | N-Acetyl | NAc | N |
oacetyl | Acetylation of a hydroxyl group (net change = +C2H2O). | O-Acetyl | Ac | Ac |
omethyl | Addition of CH2 to a hydroxyl group. Natural modification, but can also be generated by permethylation. | O-Methyl | OMe | M |
anhydrobridge | Water loss through formation of a bridge between two hydroxyl groups. Occurs from C6 to C3, C2 or C1. Seen in e.g. carrageenans. | AnhydroBridge | ANH | B |
unsaturated | Water loss to form a C-C double bond inside a ring. Seen, for example, in ulvans, and is the target of polysaccharide lyases. | Unsaturated | UNS | U |
dehydrated | Water loss that occurs during ionisation or other reactions. | Dehydrated | Y | Y |
alditol | Reducing end monomer is opened and the aldehyde reduced to an alcohol. Commonly done before PGC-LC to reduce anomer splitting of peaks. Refers to an alditol ‘modification’ not a monomer here. | Alditol | ALD | o |
aminopentyllinker | Functional group used in synthetic chemistry. Can occur once per composition. | NH2Pent1 | NH2Pent1 | NH2Pent1 |
Scan range, scan_range
Scan range (m/z) used during acquisition. If the package is used only for prediction/computation, this can be set very wide. Compositions for which no adduct has an m/z value inside the scan range will be filtered out.
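The effect of the scan range can be sketched as a simple m/z filter. The compositions, adducts and m/z values below are illustrative placeholders rather than real package output, and the sketch drops out-of-range ions row by row, which is an assumption about the exact behaviour.

```r
scan_range <- c(175, 1400)

# Hypothetical candidate ions (placeholder names and m/z values)
ions <- data.frame(
  composition = c("A", "A", "B", "C"),
  adduct = c("[M+H]+", "[M+Na]+", "[M+H]+", "[M+Na]+"),
  mz = c(170.0, 192.0, 1500.0, 700.0)
)

# Keep ions whose m/z lies inside the scan range
in_range <- ions$mz >= scan_range[1] & ions$mz <= scan_range[2]
ions_kept <- ions[in_range, ]

# Composition "B" has no adduct inside the range, so it disappears
unique(as.character(ions_kept$composition))  # "A" "C"
```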
Polarity, polarity
Negative (`neg`) and/or positive (`pos`) ionisation polarity used during acquisition. Changes the adducts returned. See below for specific adducts generated.
Ionisation type, ion_type
ESI (`ESI`) and/or MALDI (`MALDI`) ionisation used. Changes the adducts returned (MALDI has only singly charged ions, ESI can have multiply charged ions). See below for specific adducts generated.
Naming, naming
How should compositions be named? Options are `IUPAC`, `GlycoCT` and `Oxford`. As only compositions and not structures are given, conventions could not be followed closely, but common abbreviations from the conventions are used (see modifications table above).
Adducts, adducts
Options are: `H`, `Na`, `NH4`, `K`, `Cl` and `CHOO`. The adducts generated depend on `adducts`, `polarity` and `ion_type`. The resulting adducts are summarised in the table below.
NB: n is the number of anionic groups. Where relevant, ions will be generated for every value from 2 up to n. For example, in negative mode with MALDI and Na adducts, for a composition with four sulfate groups (n = 4) the adducts will include [M-2H+1Na]-, [M-3H+2Na]- and [M-4H+3Na]-.
Adduct | Ion type | Polarity | Adducts generated |
---|---|---|---|
H | ESI | Positive | [M+H]+ |
H | ESI | Negative | [M-H]-, [M-nH]-n |
H | MALDI | Positive | [M+H]+ |
H | MALDI | Negative | [M-H]- |
Na | ESI | Positive | [M+Na]+, [M-nH+(n+1)Na]+ |
Na | ESI | Negative | [M-nH+(n-1)Na]- |
Na | MALDI | Positive | [M+Na]+, [M-nH+(n+1)Na]+ |
Na | MALDI | Negative | [M-nH+(n-1)Na]- |
NH4 | ESI | Positive | [M+NH4]+, [M-nH+(n+1)NH4]+ |
NH4 | MALDI | Positive | [M+NH4]+, [M-nH+(n+1)NH4]+ |
NH4 | MALDI | Negative | [M-nH+(n-1)NH4]- |
K | ESI | Positive | [M+K]+, [M-nH+(n+1)K]+ |
K | MALDI | Positive | [M+K]+, [M-nH+(n+1)K]+ |
K | MALDI | Negative | [M-nH+(n-1)K]- |
Cl | ESI | Negative | [M+Cl]- |
Cl | MALDI | Negative | [M+Cl]- |
CHOO | ESI | Negative | [M+CHOO]- |
CHOO | MALDI | Negative | [M+CHOO]- |
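To make the n-dependent rows concrete, the sketch below builds the labels for the negative-mode sodium series, with the number of lost protons running from 2 up to n as described in the note on anionic groups. The helper `na_negative_adducts` is hypothetical and only produces label strings; it does not compute m/z values or reproduce the package's internals.

```r
# Labels of the form [M-kH+(k-1)Na]- for k = 2, ..., n,
# where n is the number of anionic groups in the composition.
na_negative_adducts <- function(n) {
  if (n < 2) {
    return(character(0))  # series starts at two anionic groups
  }
  k <- 2:n
  sprintf("[M-%dH+%dNa]-", k, k - 1L)
}

# Composition with four sulfate groups (n = 4)
na_negative_adducts(4)
# "[M-2H+1Na]-" "[M-3H+2Na]-" "[M-4H+3Na]-"
```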