Fast and Accurate Disulfide Bridge Detection
Authors: Søren Heissel, Yi He, Andris Jankevics, Yuqi Shi, Henrik Molina, Rosa Viner, and Richard A. Scheltema
Correspondence: sheissel@rockefeller.edu; rosa.viner@thermofisher.com; Richard.Scheltema@liverpool.ac.uk
Graphical Abstract
A workflow for fast and accurate disulfide mapping is presented. It involves microwave-assisted acid hydrolysis (MAAH) to generate peptides, followed by Field Asymmetric Ion Mobility Spectrometry (FAIMS) to remove chemical noise and improve signal quality. Peptides are then fragmented using Electron Transfer Higher-energy Collision Dissociation (EThcD). Finally, data analysis, including False Discovery Rate (FDR) control at the cross-link level, is performed using the XlinkX node in Proteome Discoverer. This process allows for the identification of disulfide bridges within a 1-hour timeframe.
In Brief
A method for fast and accurate disulfide mapping using MAAH and FAIMS has been developed. This method generates hundreds of overlapping peptides from each protein while keeping disulfide bonds intact. Chemical noise from the hydrolysis process is removed using FAIMS, and peptides are fragmented with EThcD. Additional developments were made to the XlinkX node in Proteome Discoverer for handling such data. The method was tested using purified proteins and allows for accurate disulfide mapping in 1 hour.
Highlights
- MAAH is used for fast and nonspecific proteolysis with intact canonical disulfides.
- Background ions from MAAH are removed using FAIMS, increasing disulfide bond IDs.
- EThcD along with a novel data analysis platform allows for disulfide mapping in 1 hour.
Introduction
Recombinant expression of proteins, propelled by therapeutic antibodies, has evolved into a multibillion dollar industry. Essential for this is the quality control assessment of critical attributes, such as sequence fidelity, proper folding, and posttranslational modifications. Errors can lead to diminished bioactivity and, in the context of therapeutic proteins, an elevated risk for immunogenicity. Over the years, many techniques were developed and applied to validate proteins in a standardized and high-throughput fashion. One parameter that has been challenging to assess is disulfide bridges, covalent bonds linking two cysteine residues, which assist in correct folding and stability of proteins and thus influence their efficacy. Mass spectrometry promises to be an optimal technique for uncovering them quickly and accurately. This work presents a unique combination of sample preparation, data acquisition, and analysis facilitating the rapid and accurate assessment of disulfide bridges in purified proteins. Through microwave-assisted acid hydrolysis, proteins are digested rapidly and artifact-free into peptides with a substantial degree of overlap over the sequence. The nonspecific nature of this procedure introduces chemical background, which is efficiently removed by integrating ion mobility preceding the mass spectrometric measurement. The nonspecific nature of the digestion step additionally necessitates new developments in data analysis, for which the XlinkX node in Proteome Discoverer was extended to efficiently process the data and ensure correctness through effective false discovery rate correction. The entire workflow can be completed within 1 hour, allowing for high-throughput, high-accuracy disulfide mapping.
Protein expression describes the intricate processes through which proteins are generated, modified, and regulated in living organisms. Within the wider context of protein research and pharmaceutical applications, researchers harness these processes to direct organisms to produce desired proteins, a process called recombinant expression. Here, plasmids encoding the desired protein sequence are introduced into the host organism, which then expresses the protein in large amounts. Different organisms can be selected for protein production, each with its own benefits. For example, Escherichia coli is commonly used as it grows rapidly and represents the most cost-effective option. However, this organism lacks the capability to introduce complex posttranslational modifications (PTMs) like glycosylation, which can be essential for bioactivity. Other hosts, like Chinese hamster ovary cells or insect cells, are used in those cases, although these grow more slowly, resulting in a higher cost. The advent of recombinant expression has been instrumental in kick-starting the biopharmaceutical revolution, leading to rapid expansion that is still ongoing. Notably, between 2018 and 2022 alone, a total of 180 novel protein products secured regulatory approval. Analysts anticipate further growth due to the mounting incidence of cancer, hereditary disorders, and autoimmune diseases, complemented by the approval of numerous therapeutic interventions that modify the course of these conditions.
An essential step in protein production is quality control of the product. During this step, the protein is investigated for multiple critical quality attributes to ensure bioactivity and, in the case of biopharmaceuticals, that they cause no immunological issues. Sequence fidelity tests determine whether the proteins have the intended amino acid sequence. Problems that can occur include point mutations or, in the case of eukaryotic cells, splice variants, which can cause loss of activity. Typically, mass spectrometry (MS) approaches are used for this step. Structural assays determine whether the secondary and tertiary structure of the protein are correct, which is often essential for bioactivity. Many structural biology techniques are employed here, the most common being NMR, while size-exclusion chromatography provides information on the molecular weight and presence of protein aggregates. Finally, the presence and correct location of the required PTMs is verified, typically using MS approaches.
Disulfide bridges are covalent links between two cysteine residues, existing commonly within the same polypeptide chain (intrachain links) and less frequently between two polypeptide chains (interchain links). This PTM is facilitated in the endoplasmic reticulum by the enzyme protein disulfide isomerase and is among the most common PTMs. Present mostly in secreted proteins (a large source of biopharmaceuticals), disulfide bridges are critical to the protein, and incorrect disulfide structures may lead to degradation, loss of function, or diseases. Disulfide mapping is one of many characteristics considered as quality attributes that must be thoroughly characterized during protein production. Mismatch of the disulfide bridges (disulfide scrambling) can cause a complex impurity profile and decreased efficacy of the biopharmaceutical, which is why official guidelines call for disulfide bridge analysis during production. It is, however, not trivial to map the often-complex disulfide networks, making disulfide mapping important in both academic and industrial settings.
Several techniques within structural biology, such as X-ray crystallography and NMR, have been applied to detect disulfide bridges; however, these are costly in terms of both the material needed and the time spent on the analysis. In recent years, LC-MS has emerged as a powerful additional technique for studying disulfide dynamics. In the classic bottom-up proteomics workflow, proteins are reduced and alkylated, whereby disulfide bonds are disrupted and permanently blocked from reforming, and are then digested into peptides with proteases such as trypsin. The reduction and alkylation serve to denature the protein and increase its accessibility to the protease. After desalting, the peptide mixture is separated by liquid chromatography and finally analyzed by MS, where the peptides are fragmented along the peptide backbone in the gas phase. Disulfide bridges can be detected with LC-MS in several ways. Isotopically labeled alkylation agents can be applied to reduced and nonreduced protein samples to elucidate which cysteines are engaged in disulfide bonds and to quantify the level at which each residue exists as a free thiol, although this does not provide details on the specific disulfide pairs. Proteins may also be subjected to a stepwise reduction, in which an initial low concentration of reducing agent followed by alkylation reduces and alkylates only a subset of the disulfide bonds. This is followed by a complete reduction and alkylation employing a different alkylation agent. Disulfide bonds may also be elucidated by direct detection, where bonded peptides are identified, allowing for mapping of disulfide pairs. This strategy requires processing the proteins under nonreducing conditions, which may influence digestion efficiency unless a chaotropic agent is added for denaturation. Alkylation of free thiols may still be performed to identify nonbonded cysteines. Alternatively, disulfide bonds may be reduced on-the-fly by UV photodissociation and immediately alkylated in a postcolumn microreaction cell for disulfide mapping, or chemically reduced postcolumn.
The unambiguous determination of disulfide bridges through MS remains challenging due to several factors, such as the poor fragmentation properties of linked peptides. Additionally, only a limited set of data analysis tools is currently available that can deal with the complexity of the data (unspecific digestion and alternative fragmentation techniques) while delivering low false positive rates. Researchers commonly digest the protein(s) using trypsin, but its alkaline pH optimum makes the disulfide bridges prone to scrambling, leading to incorrect results. The aspartic protease pepsin has a lower cleavage specificity than trypsin and, with a pH optimum of 1 to 2, remains active under conditions where disulfide scrambling is less likely to occur. Pepsin has therefore been used as a more favorable alternative to conventional tryptic digestion in disulfide mapping. Proteins, however, can also be cleaved nonenzymatically by strong acid at high temperatures, either to their individual amino acid components or to peptides with optimal properties for mass spectrometric detection, a process that may be accelerated by applying microwave energy (microwave-assisted acid hydrolysis [MAAH]). MAAH carried out with TFA allows for hydrolysis in less than 10 min and is able to provide extensive sequence coverage. As the cleavage specificity is practically random, peptide generation does not depend on the protein sequence, as is the case with conventional proteolytic digestion, and a high degree of overlap over the full sequence is typically achieved. MAAH has furthermore shown potential for disulfide mapping. However, the nonspecific cleavage pattern, combined with the combinatorial analysis of peptide pairs, makes data analysis challenging.
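To give a feeling for the size of the search space created by nonspecific cleavage, the following minimal sketch counts the contiguous peptides a protein can yield and the number of peptide pairs that follow from them. The length limits (5 to 30 residues) and the function names are assumptions chosen for illustration and are not taken from this work; the count is also an upper bound, as it does not require the peptides to contain a cysteine.

```python
def count_nonspecific_peptides(protein_length: int,
                               min_len: int = 5,
                               max_len: int = 30) -> int:
    """Count contiguous peptides with a length in [min_len, max_len].

    With random cleavage, every substring of the sequence is a candidate.
    The length limits are illustrative assumptions.
    """
    total = 0
    for length in range(min_len, min(max_len, protein_length) + 1):
        # A peptide of this length can start at any of these positions.
        total += protein_length - length + 1
    return total


def count_peptide_pairs(n_peptides: int) -> int:
    """Unordered pairs of peptides, including a peptide paired with itself."""
    return n_peptides * (n_peptides + 1) // 2


if __name__ == "__main__":
    n = count_nonspecific_peptides(129)  # 129 residues, illustrative length
    print(n, count_peptide_pairs(n))
```

Because the number of candidate pairs grows with the square of the number of candidate peptides, an exhaustive pairwise search quickly becomes impractical, which motivates restricting the pairing step to a short list of candidates.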
In this work, the digestion properties of MAAH on a simple protein system, lysozyme C, are inspected, finding that suitably large, interpretable peptides and high sequence coverage are obtained while retaining the disulfide bridges. By interpreting the fragmentation spectra recorded for disulfide-bridged peptide pairs, it is shown that electron transfer higher energy dissociation (EThcD) produces highly informative fragmentation spectra of disulfide-bridged peptides resulting from this nontryptic approach. Next, the results from a therapeutically relevant protein, trastuzumab, are inspected, finding that the digestion approach results in a large amount of background noise that hides many of the disulfide-bridged peptide pairs. By integrating field asymmetric ion mobility spectrometry (FAIMS), which has previously shown potential in enhancing data quality in cross-linking mass spectrometry studies, this background is efficiently removed, bringing all signals into view. The unspecific nature of the digestion necessitated optimizations to the data analysis software XlinkX (https://www.hecklab.com/software/xlinkx/) in both the search and the false discovery rate (FDR) control; all data presented in this study were used to develop an approach that integrates an open search option. Next, data are presented in which the approach successfully detects disulfide scrambling in an experiment where peptides from lysozyme C were induced to undergo scrambling. Finally, the approach successfully detects and quantifies all disulfide bridges on a very short time scale in the therapeutically and structurally relevant proteins trastuzumab and integrin α-IIb.
Experimental Procedures
Experimental Design and Statistical Rationale
The material used consists of single batches of purified proteins and therefore no biological replicates were used. All samples were injected multiple times to ensure consistency. All results presented in this study originate from representative single replicates. In cases where various conditions are tested against controls, the controls are obtained without the parameter being tested. For the scrambling experiment, high-pH, high-temperature samples were compared to a hydrolyzed control sample that had been kept at acidic pH from hydrolysis to injection onto the mass spectrometer. For evaluation of FAIMS, the no-FAIMS sample was analyzed with the same chromatographic and mass spectrometer acquisition settings but omitting FAIMS.
Chemicals and Reagents
Chicken lysozyme C (L-7651) was purchased from Sigma-Aldrich. Trastuzumab was generously donated by a pharmaceutical company, and human integrin α-IIb was purified from human platelets. LC-MS grade TFA was purchased from Thermo Fisher Scientific.
Microwave-Assisted Acid Hydrolysis
Twenty micrograms of dry protein sample was dissolved in 40 µl 25% TFA in a 1.5 ml low-binding polypropylene vial (Eppendorf). The vial was sealed with a micro tube cap lock (Scientific Specialties) and additionally secured with tape before placing it in a bubble rack, which was placed in a 1000 ml beaker containing 200 ml demineralized water. The beaker was positioned off-center in a household microwave oven (General Electric, model PEM31DMWW), and hydrolysis was performed using the standard setting at 800 W for 15 min (except when hydrolysis time was evaluated, where 5, 7.5, 10, and 15 min were selected). Hydrolyzed protein was quickly dried by vacuum centrifugation and dissolved in LC-MS loading solvent (0.1% TFA) prior to injection on the mass spectrometer.
Assessing Disulfide Scrambling
Chicken lysozyme C was acid hydrolyzed and purified using zip-tips constructed in-house. Aliquots were dissolved in 50 mM triethylammonium bicarbonate pH 8.5 and incubated at room temperature, 37 °C, and 50 °C. Time points were taken after 1 h, 3 h, and 6 h and immediately acidified. All time point aliquots were compared to a control sample, which had been kept at low-pH conditions.
LC-MS Data Acquisition
Samples were separated by reverse-phase HPLC using a Thermo Fisher Scientific Vanquish Neo system connected to an EASY-Spray PepMap RSLC C18 column (0.075 mm × 250 mm, 2 µm particle size, 100 Å pore size) at a flow rate of 250 nl/min. The hydrolyzed samples were analyzed on an Orbitrap Eclipse Tribrid mass spectrometer coupled with a FAIMS Pro Duo interface. Reverse-phase separation was accomplished using a 20, 30, or 60 min separation gradient (plus a 10 to 20 min equilibration phase) of 4 to 40% solvent B (A: 0.1% formic acid; B: 80% acetonitrile, 0.1% formic acid). FAIMS was set at standard resolution with 3.9 L/min total carrier gas flow and a two-CV method (-50/-60 or -60/-75). Samples were analyzed using an EThcD-MS2 acquisition strategy with 20% supplemental activation collision energy. MS1 and MS2 scans were acquired in the Orbitrap at mass resolutions of 120,000 and 60,000, respectively. The MS1 scan range was set to m/z 375 to 1400, with a standard automatic gain control target, 246 ms maximum injection time, and 60 s dynamic exclusion. MS2 scans in data-dependent acquisition mode (top speed, 1.5 s per CV) were set to an automatic gain control target of 2e5, 118 ms maximum injection time, and an isolation window of 1.6 m/z. Only precursors with charge states +3 to +8 were subjected to MS2. LC-MS acquisition settings for the hydrolysis-time evaluation and the evaluation of the modification landscape can be found in Supplementary Methods.
Implementation of an Open Search Engine
An open search engine modeled after the approach described for MSFragger was implemented. To speed up fragmentation spectrum annotation, a fragment index is used, where each theoretical fragment is maintained in a sorted list that is binned on 0.01 Da intervals. This setup allows for almost instantaneous extraction of peptides matching a particular fragment. All fragments in a spectrum can then be annotated with peptide identities and a list constructed on the most likely peptide identities. To tie everything together, additional meta-data about the protein sequences, modifications, and the total peptide mass are required. For the initial identification, the hyper-score calculation integrated in Sequest is utilized. Based on this score, a top ten list of the best peptide identities is assembled and rescoring is performed with a scoring routine adapted from Olsen and Mann to obtain the final peptide identity.
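The idea of a binned fragment index can be illustrated with a minimal sketch. It is not the implementation used in XlinkX: the sorted, binned list is replaced here by a dictionary keyed on the 0.01 Da bin number, candidate peptides are ranked by a simple count of matched fragments rather than by the hyper-score, and all function names, peptide labels, and the matching tolerance are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

BIN_WIDTH = 0.01  # Da, the bin size stated in the text


def bin_of(mass):
    """Bin number of a fragment mass."""
    return math.floor(mass / BIN_WIDTH)


def build_fragment_index(peptide_fragments):
    """Map each bin to the (fragment mass, peptide) entries falling in it.

    peptide_fragments: dict of peptide label -> list of theoretical
    fragment masses (hypothetical input format).
    """
    index = defaultdict(list)
    for peptide, masses in peptide_fragments.items():
        for mass in masses:
            index[bin_of(mass)].append((mass, peptide))
    return index


def peptides_matching(index, observed_mass, tolerance=0.01):
    """Peptides with a theoretical fragment within tolerance (Da, assumed)."""
    # Pad by one bin on each side to be robust against rounding at bin edges.
    first = bin_of(observed_mass - tolerance) - 1
    last = bin_of(observed_mass + tolerance) + 1
    hits = set()
    for bin_number in range(first, last + 1):
        for mass, peptide in index.get(bin_number, ()):
            if abs(mass - observed_mass) <= tolerance:
                hits.add(peptide)
    return hits


def rank_candidates(index, observed_masses, tolerance=0.01, top_n=10):
    """Rank peptides by the number of observed fragments they explain."""
    counts = Counter()
    for observed in observed_masses:
        counts.update(peptides_matching(index, observed, tolerance))
    return counts.most_common(top_n)


if __name__ == "__main__":
    toy = {"PEPA": [100.005, 200.0], "PEPB": [100.5, 200.003]}
    print(rank_candidates(build_fragment_index(toy), [100.0, 200.0]))
```

Only the few bins around each observed fragment are visited, so the cost of a lookup does not depend on the total number of indexed fragments, which is the property that makes the open search fast.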
To investigate the engine's performance, results from a HeLa cell tryptic digest were compared to those obtained with Mascot or Sequest. Approximately 75% of the spectra identified by Mascot were also identified by this engine, along with a very low percentage of additional identifications uniquely identified by this search engine. The remaining fraction of Mascot-identified fragmentation spectra were also correctly identified by this engine but discarded during the FDR control step due to low quality. It is concluded that this linear search engine works properly. The integration into the crosslink search workflow is described in detail in the Results section Search Engine Optimization.
Data Analysis
All data analysis, unless otherwise stated, was performed in Proteome Discoverer v. 3.1 SP1. The processing workflow consists of nodes for connecting to raw files, selecting spectra with accurate precursor m/z and charge states, performing a linear peptide search using the Mascot node with a FASTA file and defined modifications (methionine oxidation, protein N-term acetylation, asparagine and glutamine deamidation), validating results with a target-decoy peptide-spectrum-match validator to control for false positives at 1%, filtering spectra using a "Spectrum confidence filter" to remove low-quality spectra, and finally using the XlinkX node in Proteome Discoverer for crosslink detection.
The first stage, spectrum preprocessing, involves several steps:
1. Cluster peak removal: noisy peaks clustering around the real peak are eliminated.
2. Precursor peak removal: peaks above the precursor mass minus the mass of two water molecules are removed to avoid a negative impact on the hyper-score.
3. Immonium ion removal: these ions are removed based on their accurate masses.
4. Spectrum filtering: the spectrum is filtered to retain the top 20 peaks per 100 Da (TopX = 20).
5. Neutral loss peak removal: nonannotated neutral loss peaks are removed based on mass differences.
These steps yield clean and interpretable fragmentation spectra.
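The spectrum filtering step above can be sketched as follows. This is an illustration only: it assumes that "top" refers to peak intensity and that the 100 Da windows are fixed and start at m/z 0, neither of which is specified here, and the function name is hypothetical.

```python
def top_peaks_per_window(peaks, top_n=20, window=100.0):
    """Keep the top_n most intense peaks in each m/z window.

    peaks: list of (mz, intensity) tuples.
    Returns the retained peaks sorted by m/z.
    """
    by_window = {}
    for mz, intensity in peaks:
        # Fixed windows: [0, 100), [100, 200), ... (assumed layout).
        by_window.setdefault(int(mz // window), []).append((mz, intensity))

    kept = []
    for window_peaks in by_window.values():
        window_peaks.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(window_peaks[:top_n])
    return sorted(kept)


if __name__ == "__main__":
    spectrum = [(101.0, 5.0), (150.0, 9.0), (199.0, 1.0), (250.0, 2.0)]
    print(top_peaks_per_window(spectrum, top_n=2))
```

Filtering per window rather than over the whole spectrum keeps low-intensity but informative fragments in sparsely populated m/z regions.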
The second stage focuses on identifying linear peptides from the available fragmentation spectra, as these peptides are easier to identify than cross-linked pairs. After FDR control at the peptide-spectrum-match, peptide, and protein levels, this step provides information for calibrating precursor masses. A maximum acceptable mass deviation is estimated after calibration using interquartile range fences. The effect of applying m/z calibration is dramatic: without calibration, false positives remain even after stringent FDR control, whereas with calibration only the correct disulfide bridges remain. For the presented data, a development version of XlinkX/PD capable of identifying linear peptides was used. However, similar results with regard to mass calibration and removal of confidently identified spectra can be obtained by using the mass recalibration option and any search engine for linear peptides available in Proteome Discoverer.
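A minimal sketch of mass calibration followed by interquartile range fences is given below. The details are assumptions made for illustration: the calibration is reduced to subtracting the median precursor mass error (in ppm) of the confidently identified linear peptides, and the fence multiplier k = 1.5 is the conventional Tukey value rather than a setting taken from XlinkX.

```python
import statistics


def calibrate(ppm_errors):
    """Center the mass errors by subtracting their median (assumed scheme)."""
    shift = statistics.median(ppm_errors)
    return [error - shift for error in ppm_errors]


def iqr_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def within_fences(value, fences):
    """True if a mass error lies inside the accepted deviation window."""
    lower, upper = fences
    return lower <= value <= upper


if __name__ == "__main__":
    errors = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # made-up ppm errors
    fences = iqr_fences(calibrate(errors))
    print(fences, within_fences(20.0, fences))
```

Deriving the accepted window from the data itself means that a well-calibrated run automatically receives a tighter tolerance, which removes candidate pairs that match only by chance.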
In the final stage, cross-linked (or disulfide-bridged) peptides are identified. Confidently identified linear peptide spectra are removed, and an open search strategy is employed to identify the possible sequence of the first peptide (PeptideA) in the remaining spectra. High sequence coverage for the correct PeptideA was achieved for all spectra supported by intense fragment ions. This information is used to create a list of possible peptide identities, sorted by the hyper-score used in the open search, which places the correct peptide within the top 8 in >95% of the cases. Enforcing the presence of at least one “cross-linker” cleavage product improves the accuracy significantly, with the correct identity located within the top 3 in over 95% of the cases. The second peptide (PeptideB) is identified by calculating the mass difference between the precursor and the summed mass of PeptideA and the crosslinker, followed by a classical search approach using a scoring routine. Because only the top 10 PeptideA identities are considered, search time is dramatically reduced from in excess of 24 h to less than 5 min per raw file.
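The mass calculation for PeptideB can be written out as a short sketch. It assumes that the disulfide bridge is treated as a "crosslinker" with a mass of minus two hydrogen atoms (H(-2)) and that peptide masses are neutral monoisotopic masses; the function names and example masses are hypothetical.

```python
PROTON = 1.007276466     # mass of a proton in Da
HYDROGEN = 1.00782503207  # monoisotopic mass of a hydrogen atom in Da

# A disulfide bond forms with the loss of two hydrogen atoms, H(-2).
DISULFIDE_DELTA = -2 * HYDROGEN


def neutral_mass(precursor_mz, charge):
    """Neutral precursor mass from the observed m/z and charge state."""
    return (precursor_mz - PROTON) * charge


def peptide_b_mass(precursor_mz, charge, peptide_a_mass,
                   linker_delta=DISULFIDE_DELTA):
    """Mass PeptideB must have, given a candidate PeptideA.

    precursor = PeptideA + PeptideB + linker, solved for PeptideB.
    """
    return neutral_mass(precursor_mz, charge) - peptide_a_mass - linker_delta


if __name__ == "__main__":
    # Made-up example: PeptideA of 1000 Da bridged to PeptideB of 1500 Da.
    mz = (1000.0 + 1500.0 + DISULFIDE_DELTA) / 3 + PROTON
    print(round(peptide_b_mass(mz, 3, 1000.0), 4))
```

With the required PeptideB mass known, the second search only needs to consider peptides within the accepted mass deviation of that value, instead of all possible pairs.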
FDR Correction
In the final stage of the analysis, the complete list of identified cross-linked peptide spectra, along with their corresponding peptide pairs, is compiled into a CSM table. However, next to the canonical disulfide bridges, this table contains numerous false positives. To eliminate these, a multistage strategy is employed. Initially, peptide identities for both PeptideA and PeptideB are identified using a database containing authentic protein sequences and another containing decoy sequences. Only the top-scoring pair is retained for analysis. If either of the peptides originates from the decoy database, the identification is flagged as a decoy. To establish the appropriate score cut-off, the complete list is sorted, and the number of decoy identifications is calculated for each potential cut-off. This step eliminates more than half of the false positives, although the remaining fraction necessitates further curation. The results from the CSM table are then condensed into a crosslinks table, grouping peptide pairs based on their positions in the protein, including PTMs and missed cleavages. Similar to the CSM table, the crosslinks table is FDR controlled based on the maximum score among all entries for each crosslink, and the highest scoring CSM is reported for each identified unique pair of cross-linked sites. These filtering steps effectively eliminate false positive identifications, which are common due to the nonspecific nature of the digestion and the significant expansion of the search space (up to 50-100 times). Consequently, only canonical disulfide bridges remain, with just one false positive identification.
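The two levels of filtering described above can be sketched as follows. This is a simplified illustration rather than the XlinkX implementation: the FDR is estimated as the ratio of decoy to target identifications above a cut-off, which is a common estimator but an assumption here, and the record layout (dictionaries with site_a, site_b, score, and is_decoy keys) and function names are hypothetical.

```python
def score_cutoff(matches, max_fdr=0.01):
    """Lowest score at which the estimated FDR is still within max_fdr.

    matches: list of (score, is_decoy) tuples.
    Returns None if no cut-off satisfies the requested FDR.
    """
    ordered = sorted(matches, key=lambda match: match[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR for accepting everything at or above this score.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def collapse_to_crosslinks(csms):
    """Group CSMs by linked site pair and keep the highest-scoring CSM."""
    best = {}
    for csm in csms:
        # Order-independent key: (A, B) and (B, A) are the same crosslink.
        key = tuple(sorted((csm["site_a"], csm["site_b"])))
        if key not in best or csm["score"] > best[key]["score"]:
            best[key] = csm
    return best


if __name__ == "__main__":
    csms = [
        {"site_a": 30, "site_b": 115, "score": 5.0, "is_decoy": False},
        {"site_a": 115, "site_b": 30, "score": 8.0, "is_decoy": False},
        {"site_a": 6, "site_b": 64, "score": 2.0, "is_decoy": True},
    ]
    crosslinks = collapse_to_crosslinks(csms)
    matches = [(c["score"], c["is_decoy"]) for c in crosslinks.values()]
    print(score_cutoff(matches, max_fdr=0.01))
```

Applying the same cut-off procedure first to the CSM table and then to the collapsed crosslinks table mirrors the multistage strategy: a crosslink is judged by its best-scoring spectrum, so many weak matches cannot add up to a confident identification.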
Disulfide Bridge Scrambling during MAAH
The identified disulfide-bonded peptides were evaluated to uncover the degree to which scrambling of disulfide bridges (i.e., nonbiologically relevant disulfide bridges induced by incorrect folding or sample preparation issues) occurs. This attribute is critical if the workflow is to be used as a QC tool for biopharmaceutical proteins. To this end, an acid hydrolysate of lysozyme C was incubated at pH 8.5 at room temperature (25 °C), 37 °C, and 50 °C for 1, 3, and 6 h, and the degree of disulfide scrambling was evaluated. These samples were compared to a control acid hydrolysate, which had not been incubated at higher pH. As expected, only the four correct bridges were identified in the control sample. At room temperature, however, three scrambled bridges were observed after 6 h of incubation. At 37 °C and 50 °C, after just 1 h of incubation, 4 and 17 scrambled bridges were observed, respectively. The number of scrambled disulfide bridges rapidly increased with longer incubation times. After 3 h at 37 °C, a total of 20 scrambled disulfide bridges were identified, which is comparable to the number identified after just 1 h at 50 °C. It was noted that the intensities of the canonical disulfide bridges dropped significantly after incubation at pH 8.5. Peptides representing noncanonical links increased rapidly in intensity at 37 °C and at 50 °C, reaching intensities comparable to those of the canonical links within just 1 h.
Disulfide Mapping of Protein Standards
To further highlight the capability of the presented approach, the disulfide bridges of proteins with clear clinical relevance and varying complexity in their disulfide landscapes were characterized. The first, trastuzumab, is a mAb used to treat patients with HER2-positive breast cancer. The second, human integrin α-IIb, is expressed by a wide variety of cell types, including T cells (NKT cells), NK cells, fibroblasts, and platelets. Integrins are involved in cell adhesion, participate in cell-surface-mediated signaling, and play a critical role in platelet aggregation.
For trastuzumab, a total of nine disulfide bridges are described, indicated in schematic form. Using standard settings, seven bridges were detected. The disulfide bridges at Cys229 and Cys232 were missing, which can be explained by their close spacing, resulting in peptide pairs connected by two disulfide bridges. To account for this, a disulfide bridge (i.e., H(-2)) was added as a variable modification, resulting in the positive identification of these disulfide bridges in a single peptide pair. As the disulfide bridges at positions 229-229 and 232-232 are identified from a single peptide pair, an occupancy rate cannot be calculated for each individually, but the combination has an occupancy of 97%, making it likely that both are present in the vast majority of cases. The remaining calculated occupancy rates are high for all disulfide bridges except one: the bridge Cys370-Cys428 is occupied in 86% of the cases. It was previously shown that, although removal of this disulfide bridge affects antibody stability, it does not change the antibody structure to such an extent that intact immunoglobulin G can no longer be formed, and the antibody remains active. This is consistent with its role in stabilizing two already connected beta sheets. Hexamerization can be affected, potentially modulating complement activation, although it was previously observed that disconnected disulfide bridges can spontaneously reform in the presence of plasma.
For integrin α-IIb, nine disulfide bridges are also described, indicated in schematic form. With standard settings, eight were successfully detected, all at high occupancy rates. The disulfide bridge Cys911-Cys916 was not found in this analysis, and previous reports likewise provide no evidence of its existence. A lower occupancy is detected for Cys504-Cys515, which maintains a small loop. To verify whether this observation is correct, occupancy rates were compared with those previously reported. This study reports Cys521-Cys576 with the lowest occupancy, while the rest of the disulfide bridges correlate very well on occupancy rates.
Analysis Time Reduction
Sample throughput is a major concern in many proteomics laboratories, both industrial and academic. Given the significant speed increase in data analysis and the extreme repetition rates of identifications, it was reasoned that data acquisition time can be reduced. To this end, the capabilities on lysozyme C were evaluated using a 20-min and a 30-min gradient rather than 60 min, which would allow for disulfide mapping on a much shorter time scale. Considering the drastic increase in data quality attributed to FAIMS, it was hypothesized that a shorter gradient would still provide extensive linkage information. Using lysozyme C, all four disulfide bridges were covered using the 20-min, 30-min, and 60-min gradient with 79, 135, and 217 CSMs, respectively. These results demonstrate that it is possible to go from intact protein to complete disulfide mapping results within a single hour.
Conclusions and Discussion
This paper presents an efficient and precise sample preparation, data acquisition, and analysis approach for the accurate detection of disulfide bridges. In earlier studies, researchers attempted to identify disulfide bridges through indirect means like differential alkylation or through direct detection as "cross-linked" peptides. Typically, these methods involved analyzing samples under unfavorable conditions. For instance, they employed traditional enzymatic reactions, where digestion occurred at a slightly alkaline pH. While this approach allowed the use of existing tools tailored for proteomics workflows, it had a drawback: it left proteins susceptible to rearrangement of their disulfide bridges, forming nonbiologically relevant connections between cysteines. To overcome this, several studies report reduced scrambling when performing digestions at pH 5 to 6, by applying proteases with a lower pH optimum such as pepsin, or by adding an oxidizing agent. Most enzymes used in proteomics experiments, however, require higher temperatures and longer incubation times to ensure complete digestion, and so far these approaches have not gained traction for the detection of disulfide bridges. In contrast, the MAAH that is central to this approach efficiently hydrolyzes proteins in highly acidic conditions, preventing such scrambling. MAAH, however, produces a highly abundant background, which must be removed to retain the high dynamic range required for disulfide identification. Adding FAIMS as a gas-phase filtering device increases the number of identified CSMs 5-fold and enables identification of all S-S links in the systems investigated. EThcD has previously shown great potential for disulfide analysis, and as this fragmentation strategy produces not only backbone fragments but also intense ions from the intact linear peptides, the spectra were rich in information.
The required search engine optimizations were implemented in the XlinkX node in PD v. 3.1 SP1. This data analysis pipeline has gone through continuous development cycles over the last 7 years and will continue to do so. In the near future, its functionality will be extended to encompass a broader set of options. One of these will be the extension of XlinkX/PD to the extraction of higher-order cross-linked peptides (more than two peptides). Currently, XlinkX/PD is restricted to two peptides cross-linked together (a dipeptide). For disulfide-bridged peptides, more peptides need to be supported. A famous example is insulin, where the disulfide bridges intertwine around closely spaced positions. For the current implementation, a workaround enables their detection.
The comprehensive protocol not only accelerates sample preparation, data acquisition, and data analysis but also provides precise and dependable identification of cross-linked peptides in intricate samples. This makes it an invaluable tool for high-throughput quality control applications, which are an increasingly important part of the biopharmaceutical pipeline. With the biopharmaceutical market currently estimated at 450 billion dollars annually and anticipated to exceed 1 trillion dollars by 2030, this protocol can become a key component in the biopharmaceutical pipeline. Disulfide mapping is also a critical attribute in detailed protein structure characterization. As most knowledge about disulfide bridges currently comes from much more complicated classical structural biology experiments like X-ray crystallography, complementary information provided by mass spectrometric analyses can be beneficial. Most proteins analyzed by the classical techniques are overexpressed recombinant proteins that are very often alkylated (alkylation improves the resolution of these imaging techniques), which can result in disulfide mapping errors due to scrambling or alkylation. MS can provide disulfide mapping for more complex and endogenous samples.
Data Availability
Appropriate raw files and search results have been submitted to ProteomeXchange and can be accessed through accession: PXD046855. For potential review of the data, please use user-name: "reviewer_pxd046855@ebi.ac.uk" and password: "DhS5mphm". The updated XlinkX node for Proteome Discoverer 3.1 will be made available through a service pack.
Supplemental data—This article contains supplemental data.
Acknowledgments
Thanks are extended to Dr Lu Wang of the Allen and Frances Adler Laboratory of Blood and Vascular Biology for the generous donation of the purified integrin α-IIb.
Funding and Additional Information
R. A. S. acknowledges that this work is part of the research program NWO TA with project number 741.018.201, financed by the Dutch Research Council (NWO). R. A. S. further acknowledges funding through the European Union Horizon 2020 program INFRAIA project Epic-XS (Project 823839). S. H. and H. M. acknowledge that instrumentation at the Proteomics Resource Center, The Rockefeller University is funded by the Sohn Conferences Foundation, and the Leona M. and Harry B. Helmsley Charitable Trust.
Author Contributions
S. H., Y. H., H. M., R. V., and R. A. S. methodology; S. H., Y. H., Y. S., and R. V. investigation; R. A. S. and A. J. formal analysis; S. H., Y. H., A. J., Y. S., H. M., R. V., and R. A. S. writing-original draft; A. J. and R. A. S. software; S. H., H. M., R. V., and R. A. S. conceptualization; S. H., Y. H., A. J., Y. S., H. M., R. V., and R. A. S. writing-review and editing.
Conflict of Interest
Y. H., Y. S., and R. V. are employees of Thermo Fisher Scientific, the manufacturer of the Orbitrap and the Proteome Discoverer platforms used in this work. The other authors declare no competing interests.
Abbreviations
The abbreviations used are: CSM, crosslink spectrum match; ETD, electron transfer dissociation; EThcD, electron transfer higher energy dissociation; FAIMS, field asymmetric ion mobility spectrometry; FDR, false discovery rate; MAAH, microwave-assisted acid hydrolysis; MS, mass spectrometry; PTM, posttranslational modification; XlinkX/PD, XlinkX node embedded in Proteome Discoverer.