GlycoAnnotateR

R package for annotation of glycans in mass spectrometry data.

View the Project on GitHub margotbligh/GlycoAnnotateR

1 Overview

GlycoAnnotateR is an R package for database-free annotation of glycan compositions in mass spectrometry data. The package is designed to be flexible and to work with many different types of mass spectrometry data (e.g. LC-MS, MALDI, direct injection), as well as with the output of many different data processing pipelines (e.g. XCMS, Cardinal). There are currently two tutorials for annotation of LC-MS data: a 'simple' tutorial for a single data file, and a more detailed tutorial that adds further steps, including isotope detection and correspondence analysis.

2 Installation

This package can be installed directly from GitHub using devtools:

library(devtools)
devtools::install_github('margotbligh/GlycoAnnotateR')

Please note that Python is required for the package to function. If you do not have a local version of Python available, please follow the instructions to download and install it.
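A quick way to see whether a Python executable is visible from your R session is sketched below. It uses only base R and checks the PATH. This is an assumption about how you might verify your setup, since the package may locate Python in a different way.

```r
# Look for a Python executable on the PATH using base R only.
# The names "python" and "python3" are common defaults, not package requirements.
python_paths <- Sys.which(c("python", "python3"))

# Sys.which() returns an empty string for executables it cannot find
found <- python_paths[nzchar(python_paths)]

if (length(found) > 0) {
  message("Python found at: ", paste(unique(found), collapse = ", "))
} else {
  message("No Python executable found on the PATH.")
}
```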

3 Prediction parameters

The ‘prediction’ or ‘calculation’ of glycan compositions is the core utility of this package. Therefore a detailed description of the arguments is provided here.

3.1 Glycan composition parameters

  • Degree of polymerisation, dp

    This is always a range from the lowest to the highest DP desired (e.g. c(1,10) for DPs from 1 to 10). If you need only a single DP, provide that DP twice (e.g. c(2,2) for only DP 2).

  • Should pentose be included in addition to hexose, pent_option

    This is a logical argument for whether pentose monomers should be included in compositions in addition to hexose monomers.

  • Maximum number of modifications per monomer on average, nmod_max

    Calculated as the number of modifications divided by the number of monomers. Unsaturated, alditol and dehydrated are not counted. For example, for a tetramer (DP4) of deoxyhexoses with four sulfate groups, the average number of modifications per monomer is 2 (four deoxy plus four sulfate modifications over four monomers). By default nmod_max is 1, and the maximum allowed value is 3. Consider carefully whether you need to increase this value above the default.

  • Label, label

    Are sugars labelled? Unless specified, labelling is assumed to be by reductive amination. Currently supported labels are none (default) and those given in the table below:

| Label | Accepted names |
|-------|----------------|
| procainamide | “procainamide”, “proca”, “procA”, “ProA” |
| 2-aminopyridine | “2-ap”, “2-AP”, “pa”, “PA”, “2-aminopyridine” |
| 2-aminobenzoic acid | “2-aa”, “2-AA”, “aba”, “ABA”, “2-aminobenzoic acid” |
| 2-aminobenzoic acid (nonreductive) | “2-aa nonreductive”, “2-AA nonreductive”, “aba nonreductive”, “ABA nonreductive”, “2-aminobenzoic acid nonreductive” |
| 2-aminobenzamide | “2-ab”, “2-AB”, “ab”, “AB”, “2-aminobenzamide” |
| 1-phenyl-3-methyl-5-pyrazolone | “pmp”, “PMP”, “1-phenyl-3-methyl-5-pyrazolone” |
| 3-aminoquinoline (nonreductive) | “3AQ nonreductive”, “3-AQ nonreductive”, “3-aminoquinoline nonreductive” |

  • Double sulfate, double_sulfate

    Can monomers be disulfated? A logical value is required. For this option to take effect, sulfate must be included in modifications and nmod_max must be at least 2.

  • Glycan linkage, glycan_linkage

    The default is none. When set to oglycan or nglycan, the limits described by Cooper et al. (2001) for the GlycoMod software are applied. The rules are listed here: https://web.expasy.org/glycomod/glycomod-doc.html

  • Modification limits, modification_limits

    User provided limits on monomers or modifications. Provide as a named list.

  • Modifications, modifications

    By default, each modification can occur once per monomer, and all selected modifications can be present on the same monomer. After the modified monomers are calculated, compositions are filtered by nmod_max before the output is returned. For example, for modifications = c('deoxy', 'sulfate', 'carboxylicacid'), the program will generate, as one possible composition, all three modifications on one monomer (i.e. ‘DeoxyHex1 CarboxylicAcid1 Sulfate1’). If nmod_max is at the default of 1, this composition will be filtered out before the output is returned (as its nmod is 3). Sulfate is the only modification that is allowed to occur twice per monomer. For this, you need to set double_sulfate=TRUE and nmod_max to at least 2.

    The different modifications and their namings are summarised below:

| Modification | Definition / description | IUPAC naming | GlycoCT naming | Oxford naming |
|--------------|--------------------------|--------------|----------------|---------------|
| carboxylicacid | Effective loss of two hydrogens and gain of one oxygen to form a carboxylic acid group on C6. The modified monomer is commonly called a ‘uronic acid’. | CarboxylicAcid | COOH | A |
| sialicacid | Effective addition of C11H19N1O9 to hexose. Here, sialic acid refers only to N-acetylneuraminic acid (Neu5Ac), the most common sialic acid. Predominantly found in complex mammalian glycans. | NeuAc | SIA | SA |
| phosphate | | Phosphate | PO4 | P |
| sulfate | Addition of SO3. The only modification allowed to occur twice per monomer (see options for double_sulfate). | Sulfate | SO4 | S |
| amino | Gain of NH and loss of O, the result of replacing a hydroxyl group with an amino group. | Amino | NH2 | Am |
| deoxy | One hydroxyl group is replaced by an H atom. Fucose and rhamnose are two common deoxyhexoses. NB: GlycoAnnotateR currently only considers deoxyhexoses and not deoxypentoses. | DeoxyHex | DHEX | D |
| nacetyl | Addition of an N-acetyl group (net change = +C2H3N). A common example of an N-acetylated hexose is N-acetylglucosamine. Note that here, N-acetylglucosamine would be named Hex1 N-Acetyl1 in, e.g., IUPAC naming. | N-Acetyl | NAc | N |
| oacetyl | Acetylation of a hydroxyl group (net change = +C2H2O). | O-Acetyl | Ac | Ac |
| omethyl | Addition of CH2 to a hydroxyl group. Natural modification, but can also be generated by permethylation. | O-Methyl | OMe | M |
| anhydrobridge | Water loss through formation of a bridge between two hydroxyl groups. Occurs from C6 to C3, C2 or C1. Seen in e.g. carrageenans. | AnhydroBridge | ANH | B |
| unsaturated | Water loss to form a C-C double bond inside a ring. Seen, for example, in ulvans, which are the target of polysaccharide lyases. | Unsaturated | UNS | U |
| dehydrated | Water loss that occurs during ionisation or other reactions. | Dehydrated | Y | Y |
| alditol | The reducing end monomer is opened and the aldehyde reduced to an alcohol. Commonly done before PGC-LC to reduce anomer splitting of peaks. Refers here to an alditol ‘modification’, not a monomer. | Alditol | ALD | o |
| aminopentyllinker | Functional group used in synthetic chemistry. Can occur once per composition. | NH2Pent1 | NH2Pent1 | NH2Pent1 |
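
The nmod_max filter described above can be illustrated with a short sketch. The helper `average_nmod` is hypothetical and is not part of GlycoAnnotateR. It assumes that a composition is kept when its average number of modifications per monomer is less than or equal to nmod_max, and that unsaturated, alditol and dehydrated are left out of the count, as stated in the nmod_max description.

```r
# Hypothetical helper, not a GlycoAnnotateR function.
# mod_counts: named vector with the number of each modification in a composition
# dp: number of monomers in the composition
average_nmod <- function(mod_counts, dp) {
  excluded <- c("unsaturated", "alditol", "dehydrated")  # not counted
  counted <- mod_counts[!(names(mod_counts) %in% excluded)]
  sum(counted) / dp
}

# DP4 of deoxyhexoses carrying four sulfate groups: (4 + 4) / 4 = 2
tetramer <- c(deoxy = 4, sulfate = 4)
nmod <- average_nmod(tetramer, dp = 4)

# Assumed filter rule: keep the composition only if nmod <= nmod_max
nmod_max <- 1
passes_filter <- nmod <= nmod_max

# One monomer carrying deoxy, sulfate and carboxylicacid has nmod = 3
single <- average_nmod(c(deoxy = 1, sulfate = 1, carboxylicacid = 1), dp = 1)
```

With the default nmod_max of 1, both example compositions would be removed. The tetramer would be kept only with nmod_max set to 2 or more.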

3.2 Mass spec parameters

  • Scan range, scan_range

    Scan range (m/z) used during acquisition. If you only want to predict or compute compositions, this can be set very wide. A composition is filtered out when none of its adducts has an m/z value inside the scan range.

  • Polarity, polarity

    Negative (neg) and/or positive (pos) ionisation polarity used during acquisition. Changes the adducts returned. See below for specific adducts generated.

  • Ionisation type, ion_type

    ESI (ESI) and/or MALDI (MALDI) ionisation used. Changes the adducts returned (MALDI has only singly charged ions, ESI can have multiply charged). See below for specific adducts generated.
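
The scan range filter can be sketched in base R as follows. This is only an illustration of the rule that a composition is dropped when none of its adducts falls inside the scan range; it is not the package's implementation. The monoisotopic masses are standard values rather than values taken from the package, and the scan range and adduct choice are arbitrary examples.

```r
# Monoisotopic masses (Da); standard values, not taken from the package
hex_residue <- 162.0528234  # C6H10O5, one hexose unit within a chain
water       <- 18.0105647   # H2O, added once per chain
proton      <- 1.0072765    # H+
sodium_ion  <- 22.9892207   # Na+

# Unmodified hexose oligomers for dp = c(1, 10)
dp <- 1:10
neutral_mass <- dp * hex_residue + water

# m/z of two singly charged positive adducts for each composition
mz <- cbind("[M+H]+"  = neutral_mass + proton,
            "[M+Na]+" = neutral_mass + sodium_ion)
rownames(mz) <- paste0("Hex", dp)

# Keep a composition if at least one adduct lies inside the scan range
scan_range <- c(175, 1490)  # example values only
in_range <- mz >= scan_range[1] & mz <= scan_range[2]
keep <- rowSums(in_range) > 0
kept <- rownames(mz)[keep]
print(kept)
```

In this example Hex9 is kept because its [M+H]+ ion is inside the range even though its [M+Na]+ ion is not, while Hex10 is dropped because both ions lie above the upper limit.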

3.3 Output and other parameters

  • Naming, naming

    How should compositions be named? Options are IUPAC, GlycoCT and Oxford. As only compositions and not structures are given, conventions could not be followed closely, but common abbreviations from the conventions are used (see modifications table above).

  • Adducts, adducts

    Options are: H, Na, NH4, K, Cl and CHOO. The adducts generated depend on adducts, polarity and ion type. The resulting adducts are summarised in the table below.

    NB: n is the number of anionic groups. Where relevant, ions will be generated for every value from 2 to n. For example, in negative mode with MALDI and Na adducts, for a composition with four sulfate groups (n = 4) the adducts will include [M-2H+1Na]-, [M-3H+2Na]- and [M-4H+3Na]-.

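| Adduct | Ion type | Polarity | Adducts generated |
|--------|----------|----------|-------------------|
| H | ESI | Positive | [M+H]+ |
| H | ESI | Negative | [M-H]-, [M-nH]-n |
| H | MALDI | Positive | [M+H]+ |
| H | MALDI | Negative | [M-H]- |
| Na | ESI | Positive | [M+Na]+, [M-nH+(n+1)Na]+ |
| Na | ESI | Negative | [M-nH+(n-1)Na]- |
| Na | MALDI | Positive | [M+Na]+, [M-nH+(n+1)Na]+ |
| Na | MALDI | Negative | [M-nH+(n-1)Na]- |
| NH4 | ESI | Positive | [M+NH4]+, [M-nH+(n+1)NH4]+ |
| NH4 | MALDI | Positive | [M+NH4]+, [M-nH+(n+1)NH4]+ |
| NH4 | MALDI | Negative | [M-nH+(n-1)NH4]- |
| K | ESI | Positive | [M+K]+, [M-nH+(n+1)K]+ |
| K | MALDI | Positive | [M+K]+, [M-nH+(n+1)K]+ |
| K | MALDI | Negative | [M-nH+(n-1)K]- |
| Cl | ESI | Negative | [M+Cl]- |
| Cl | MALDI | Negative | [M+Cl]- |
| CHOO | ESI | Negative | [M+CHOO]- |
| CHOO | MALDI | Negative | [M+CHOO]- |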
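
As a worked illustration of the negative-mode Na series, the sketch below generates the ions from [M-2H+1Na]- up to [M-nH+(n-1)Na]- for a composition with n anionic groups and computes their m/z. The helper `sodium_adducts_neg` is hypothetical and not part of the package. The masses are standard monoisotopic values, and the Hex4 Sulfate4 composition is an arbitrary example with sulfate treated as an addition of SO3.

```r
# Hypothetical helper, not a GlycoAnnotateR function.
proton     <- 1.0072765   # H+ (Da)
sodium_ion <- 22.9892207  # Na+ (Da)

sodium_adducts_neg <- function(neutral_mass, n) {
  k <- seq_len(n)[-1]  # 2, 3, ..., n; empty when n < 2
  data.frame(
    adduct = sprintf("[M-%dH+%dNa]-", k, k - 1L),
    # Each ion loses k protons and gains (k - 1) sodium ions, so the
    # net charge is -1 and m/z equals the ion mass
    mz = neutral_mass - k * proton + (k - 1L) * sodium_ion,
    stringsAsFactors = FALSE
  )
}

# Example composition: Hex4 Sulfate4 (n = 4 anionic groups)
hex_residue <- 162.0528234  # C6H10O5
water       <- 18.0105647   # H2O
so3         <- 79.9568150   # SO3
neutral_mass <- 4 * hex_residue + water + 4 * so3

adducts <- sodium_adducts_neg(neutral_mass, n = 4)
print(adducts$adduct)
```

Each step in the series replaces one more proton with a sodium ion, so consecutive ions are separated by the mass difference between Na+ and H+.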