Robust Standards in Proteomic Research
Sigma-Aldrich and Experimental Therapeutics Centre
Develop standards that extend beyond the normal controls used in most experiments today.
Saturday, November 01, 2008
By J.L. Turner (PhD), J.J. Walters (PhD), J. Wildsmith (MSc, MBA), H. S. Duewel (PhD), Sigma-Aldrich Corporation & Manfred R. Raida (PhD), Experimental Therapeutics Centre.
Proteomics, the large-scale study of proteins expressed by the genome, impacts the pharmaceutical development workflow. Protein studies are relevant to areas ranging from protein- or peptide-based biomarkers to peptide-based drug delivery systems and protein-based therapeutics.
Chemical proteomics, which is the analysis of proteins interacting with small molecules (such as ligands, drugs and lead candidates), allows early-stage identification of the target protein, of off-targets and of possible toxic side effects, and it can reveal alternative uses for already established drugs.
Analyzing the interaction of drugs on cellular systems in vivo rather than using recombinant single proteins therefore provides a deeper understanding of the biology as it allows early optimization of the molecules.
The development of enabling technologies such as imaging techniques, mass spectrometry (MS) and protein separation techniques has increased the importance of proteomics. This consistently calls for the highest quality reagents in order to optimize the performance of such technologies.
New platform-based techniques require benchmarking or a link to existing knowledge and technologies. This benchmarking must take place at several stages such as:
• Sample analyte preparation techniques,
• Data acquisition,
• Data analysis and interpretation, and
• The reporting of data and how this data is re-interpreted by the scientific community.

Without standardization, proteomic platforms cannot achieve the necessary level of confidence required to be an integral part of the drug development workflow. Proteomic workflows, while being diverse, hold an obvious potential.
Proteomics enables researchers to correctly identify proteins, characterize the post-translational modifications (PTMs) of proteins, ascertain how proteins interact with one another and other biologics, and to accurately quantify protein concentrations.
The use of proteomic standards is essential to provide confidence in platform performance, to enable accurate data interpretation by the community, and to facilitate cross platform comparisons and optimization. Proteomic workflows also have the unrecognized need for biological and technical repetition of the analysis and, within these repeats, the use of global standards to compare results.
Discovery and validation of potential biomarkers is especially reliant on global standards that allow comparison and normalization of the experiments over an extended timeframe and between different laboratories and instruments.
Proteomics Workflow
A proteomics workflow is best described as a set of modules, which can be divided into four areas:
• Sample preparation,
• Sample fractionation/separation,
• Sample analysis and data acquisition, and
• Data analysis including statistical analysis, and data reporting.
The modules consist of a variety of techniques that can be assembled in different ways to generate the required and optimal workflow to address the specific biological questions being posed. The modules used for one experiment are largely user-defined and are dependent upon a number of factors such as the amount of verification/validation required, user experience and the instruments available for analysis.
Besides user experience, a major determinant of experimental success is making the correct choice of workflow. At present, no single universal workflow exists for all applications, and the experienced user needs to be up-to-date with current developments in the field.
New techniques supplementing the four modules such as chromatographic or electrophoretic techniques, sample preparation or manipulation, mass spectrometric methods and bioinformatics approaches to data handling, have contributed greatly to the recent growth and relevance of proteomics.
Recent and ongoing developments (for the past 20 years) in the field of MS have placed it at the crux of modern day proteomics. A real shift in proteomic productivity is attributed to the pairing of sample preparation, chromatography, electrophoresis, mass spectrometry and bioinformatics into platform technologies.
The basic premise of these platform technologies remains similar even though the exact methods and instruments used often vary widely between institutions, instrument vendors and even between adjacent laboratories.
As Andrew Anderson, director of Strategic Partnerships at Advanced Chemistry Development, succinctly wrote in the August edition of PharmaAsia this year: "An overarching, commercially available informatics platform for drug discovery does not yet exist. Rather, current vendor or in-house applications typically specialize in supporting only one or a few functions of the drug discovery process".
Studies are underway comparing and contrasting new platforms/methodologies of proteomic analysis to decrease the time required for sample analysis, and to determine the relative efficiencies between individual users and among labs.
Drug development researchers and government regulatory agencies often share expertise and laboratory infrastructure to speed validation and standardization of new methods to assess dose and controls.
This includes the development and standardization of mass spectrometric-based proteomic platforms for robust protein identification, characterization (including post-translational modifications [PTMs]), quantitation, and interaction.
Additionally, these platforms must be optimized to achieve high sensitivity and dynamic range in order to support proteome-wide identification of protein modifications and the evaluation of drug safety.

Protein Identification
With technological advances in data collection platforms, the amount of data collected has increased dramatically. This is apparent with the protein identification workflow, whether in a directed (such as 2D gel spots and affinity purified proteins) or a shotgun approach (such as whole proteome analysis).
As a result, automated data analysis (including very advanced statistical models and algorithms) has not only become a crucial part of the proteomic workflow but often determines (and even controls) the results of the experiment.
If done naively, the analysis can grossly skew the results. For example, there have been prominent publications reporting over 10,000 identified proteins, the majority of which were later determined to be false positives.
Protein identification by MS-based methods is far more complex than was considered a few years ago. Compounding the situation is the apparently infinite number of cell-, tissue-, developmental- and physiological-dependent protein variants that can exist, resulting at a minimum from PTMs, splice variants and mutations.
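One widely used safeguard against inflated identification lists, not described in this article, is a target-decoy search: spectra are searched against the real (target) sequences plus reversed or shuffled (decoy) sequences, and the number of decoy hits above a score threshold is used to estimate how many target hits are false. The following is a minimal sketch of that estimate. The function names, the (score, is_decoy) data layout and the simple decoys/targets estimator are assumptions made for illustration, not the method of any particular search engine.

```python
from typing import List, Optional, Tuple

Match = Tuple[float, bool]  # (score, is_decoy); hypothetical layout


def decoy_fdr(matches: List[Match], threshold: float) -> float:
    """Estimate the false discovery rate among target matches that
    score at or above `threshold`, as decoy count / target count."""
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing can be false
    return decoys / targets


def score_cutoff_for_fdr(matches: List[Match],
                         max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR does not
    exceed `max_fdr`, or None if no observed score qualifies."""
    best = None
    # Scan every observed score from high to low. The estimate is not
    # monotonic in the threshold, so the scan does not stop early.
    for threshold in sorted({score for score, _ in matches}, reverse=True):
        if decoy_fdr(matches, threshold) <= max_fdr:
            best = threshold
    return best
```

With a list of scored matches, `score_cutoff_for_fdr(matches, 0.01)` would give the most permissive score cutoff at an estimated 1% false discovery rate under these assumptions.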
The proteomic field would gain greater acceptance and validity if there was an accepted way to answer the question of what constitutes identification of a protein. Standardization of how data is analyzed, the specific criteria for protein identification, and how that data is reported, should all be focus areas for improvement.
As different computational approaches to identifying proteins become available in the protein identification workflow, special attention must be paid to their use and validity. Organizations such as the Human Proteome Organization (HUPO), the Association of Biomolecular Resource Facilities (ABRF), the National Institute of Standards and Technology (NIST) and the American Society for Biochemistry and Molecular Biology (ASBMB) are making strides in this area, although an accepted consensus has yet to be adopted.
Protein Quantitation
Beyond 'simple' protein identification lies the burgeoning area of protein quantitation. The original goal of proteomics was to identify and quantify all proteins in a cell at a certain state. For several years quantitation was simply overlooked in favor of protein identification.
MS is widely considered the heir apparent to the enzyme-linked immunosorbent assay (ELISA) technique as the gold standard of protein quantitation. However, while protein quantitation by MS can be extremely reliable, there are potential pitfalls that can lead to erroneous results.
Even with the extreme sensitivity and specificity of multiple reaction monitoring on triple quadrupole mass spectrometers, nearly every MS-based quantitative assay must also be supported by orthogonal techniques. Several factors contribute to the delay in the acceptance of MS as a robust quantitative tool. These include the complexity of the workflow, experimental variation due to biological variation and sample preparation methods, and difficulty in obtaining reliable standards.
The isotope-labeled standards used in small molecule analysis, for example, are readily synthesized and accepted as the most desirable internal standard for quantitation as well as for studying drug metabolism. However, similar and widespread standards for proteomic workflows are generally lacking and more difficult to obtain.
In recent years several approaches towards global standards for reliable quantitation over many experiments have been developed, which are mainly based on stable isotope labeling techniques. Nevertheless, these techniques have not reached the level of ease required for daily incorporation into the workflow.
Questions abound relative to the ideal approach toward protein and peptide quantitation - should labeling be done at the protein or peptide level? What should the label be? How is the label created? At what stage is the standard introduced in the workflow? These questions contribute to the complex issues a researcher must address when performing an experiment for protein quantitation.
A well-characterized quantitative proteomic standard would allow the researcher to assess the efficiency of their workflow and to take the necessary steps for experimental optimization. Ideally, this standard would be composed of an isotopically labeled species of interest with a dynamic range consisting of several orders of magnitude.
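As a rough illustration of how an isotopically labeled standard supports quantitation, the sketch below applies the usual isotope-dilution arithmetic: a known amount of the labeled (heavy) species is spiked into the sample, and the endogenous (light) amount is inferred from the ratio of the two signals. The function names and units are hypothetical, and the calculation assumes the light and heavy forms respond identically in the instrument, which a real assay would need to verify. The second helper expresses a concentration range in orders of magnitude.

```python
import math


def endogenous_amount(light_area: float, heavy_area: float,
                      heavy_spiked_fmol: float) -> float:
    """Estimate the endogenous (light) amount in fmol from the
    light/heavy peak-area ratio and the known heavy spike amount.
    Assumes equal instrument response for both forms."""
    if heavy_area <= 0:
        raise ValueError("heavy standard signal must be positive")
    return (light_area / heavy_area) * heavy_spiked_fmol


def orders_of_magnitude(lowest: float, highest: float) -> float:
    """Dynamic range covered between two positive amounts,
    expressed as log10(highest / lowest)."""
    if lowest <= 0 or highest <= 0:
        raise ValueError("amounts must be positive")
    return math.log10(highest / lowest)
```

For instance, a light peak half the size of the peak from a 50 fmol heavy spike would correspond to an estimated 25 fmol of endogenous peptide under these assumptions.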
Protein Characterization
Proteomics has a clearly defined role in the analytical characterization of proteins. It is also vital at each step of a recombinant protein product pipeline - not only for novel product development, but also for quality control, assessment, and improvement.
The complete analytical characterization of an antibody, for example, includes confirmation of the primary amino acid sequence, exact intact and reduced molecular weights (MW), glycosylation profile, and identification of other potential molecular modifications.
MS plays a critical role in every one of these measurements. Standard reagents and protocols are needed in order to qualify individual workflows and the instrumentation utilized.
With the emergence of protein-based therapeutics, MS is also an important tool for protein characterization during large-scale protein production runs. MS can be employed to ensure that the same protein chemical signature is achieved on a consistent basis.
Protein Post-Translational Modifications
PTMs play an important role in normal cellular processes. Many PTMs occur in a cell or in a pathway to control function and signal transduction such as the control of transcription and translation.
Most of these modifications have not been analyzed in detail due to the lack of technical approaches and suitable standards to optimize the workflow and verify the results. It stands to reason that changes in PTMs can drastically alter the way a cell functions and are often the causative agent for diseases.
The importance of PTMs is now realized and as a result several new fields of proteomics research have been developed, such as phosphoproteomics, glycomics and lipidomics. To achieve a better understanding of the role PTMs play in normal and diseased cells, they must be identified, their site-specific location within the protein must be determined, the stoichiometry of the modifications must be ascertained, and the chemical makeup of the modification must be confirmed.
Performing this work reliably requires very specific standards for each type of modification. For example, a phosphorylated standard could be a protein or a peptide, having multiple phosphorylation sites (1-4), and be amenable to all areas of the workflow including the numerous enrichment protocols available.
Multiple glycosylation standards such as the following may be required: free sugars, glycoproteins and glycopeptides. Regardless of the application, the workflow will be complex and fraught with potential pitfalls.
Without the appropriate standard reagents and standardization of the workflow, intra- and inter-laboratory variability will remain a major hurdle. Adding the quantitation aspect to PTM analysis also complicates the workflow to such an extent that there is no reliable common approach available today.
The complexities surrounding the standardization of workflows, protocols, and platforms are in no way trivial. In studying the PTM of a protein in response to a biological stimulus, there are three fractionation techniques that are commonly used to enrich phosphorylated species.
These include immobilized metal affinity chromatography (IMAC), titanium dioxide (TiO2) enrichment, and chemical conjugation through phosphoramidate chemistry (PAC) or β-elimination (Figure 1).

There have been recent introductions of chromatographic-based methods to enrich phosphorylated peptides. Comparison of different methods, such as IMAC with TiO2, shows that these techniques have different selectivity and specificity toward mono- and multi-phosphorylated peptides.
Developments towards a generic method are ongoing but they have not produced the features required. There is also a subset of techniques for each of these methods. The IMAC enrichment of phosphorylated species, for example, has been demonstrated using numerous metals, a variety of chelators, and a variety of resins.
Phosphopeptide enrichment kits are also available from a variety of manufacturers including some lab homebrew kits. Unfortunately, besides literature precedent, word-of-mouth communication and prior experience, it is difficult to discern the best technique, especially since only a few fully optimized protocols exist for any one particular workflow.
Proteomic Standards and Drug Development
Whether the focus of the drug discovery is biomarker discovery, lead optimization, chemical proteomics (target discovery) or candidate selection, proteomic platforms provide researchers the ability to identify, quantitate, and characterize proteins, as well as delineate protein-protein interactions.
With these evolving platforms, modern day proteomics represents a major research tool. While these proteomic platforms are the most widely employed techniques, there are some specialized areas of emergent interest requiring platform standardization. These include MS-based proteomic imaging, enzyme kinetics study, and media characterization.
In the increasingly complex area of protein research, never have standards and standardized techniques been more important, especially when putative therapeutic candidates are being scrutinized for use in preventing or combating human disease.
The risks surrounding drug development are high, especially if something goes wrong. These risks include both the upfront research and development cost and the potential negative side effects or the litigation that inevitably ensues.
The importance of these risks cannot be stressed enough and the process of drug discovery through commercialization is only effective when there is a system of checks and balances at every level.
The drug development process is adequately defined; however, drug development programs may differ for each potential therapeutic, with the exact testing and submission requirements being shaped by many unique factors. For example, the biological effects that a small chemical drug can elicit, its mechanism of action and potential metabolites can be widely different from those of a protein therapeutic such as an antibody.

Discovery
Discovery or differential proteomic analysis is a complex area as the biologics of interest are often present in intricate matrices such as serum, urine or saliva. These sample matrices add another dimension of complexity, as they require an advanced level of component separation prior to identification and quantitation.
Methods for separating complex mixtures, including liquid chromatography (such as C18, SCX, HILIC and monolithic) and gel techniques (such as 1D, 2D and microfluidic), need to be controlled, standardized, and benchmarked.
Biomarker discovery and consecutive validation is one of many emerging demands placed on proteomic workflows. Proteins, or differentially modified proteins due to a disease, will likely become the predominant tool for early disease diagnostics and improved treatments.
The biomarker workflow becomes even more complicated in the analysis of biological fluids such as serum, urine and tears. Putative biomarkers are present in very low concentrations alongside high-abundance proteins. Removal of these high-abundance proteins by established techniques such as immunoaffinity separations carries with it the risk of losing the interesting low-abundance diagnostic proteins and peptides.
Standardized workflows with global interlaboratory standards are required for successful discovery and validation of biomarkers. Later in the validation phase, the use of global standards such as stable isotope labeled peptides is a requirement in order to compare between experiments over many years and between different instruments.
It is imperative that the sources of variation at each step in the process, such as scale, sample, instrument, researcher and laboratory, are controlled in order for data to be considered comparable. Using a fully characterized standard is the best method for controlling all of these potential variables and minimizing downstream failures.
Common Standards
A widely adopted standard would be ideal for normalization, enabling researchers to qualify the evaluation and to determine the level of day-to-day and lab-to-lab variation. This standard should be amenable to all the 'accepted' workflows and be included through the entire procedure to support the level of data required.
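One simple way to put numbers on day-to-day and lab-to-lab variation is the coefficient of variation (CV) of repeated measurements of the same standard. The sketch below is only an illustration: the function names and the dictionary layout (lab name mapped to repeated measurements of one standard) are hypothetical, and a real study would likely use a fuller variance-components analysis.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def cv_percent(values: List[float]) -> float:
    """Coefficient of variation in percent: sample standard
    deviation divided by the mean. Needs at least two values."""
    return 100.0 * stdev(values) / mean(values)


def variation_summary(
    runs_by_lab: Dict[str, List[float]]
) -> Tuple[Dict[str, float], float]:
    """Return (within-lab CV per lab, between-lab CV).

    Within-lab CV reflects run-to-run (for example day-to-day)
    spread inside one lab. Between-lab CV is computed from the
    per-lab mean values, so it needs at least two labs."""
    within = {lab: cv_percent(runs) for lab, runs in runs_by_lab.items()}
    lab_means = [mean(runs) for runs in runs_by_lab.values()]
    between = cv_percent(lab_means)
    return within, between
```

Calling `variation_summary` on measurements of a shared standard gives each lab its own repeatability figure alongside a single figure for agreement between labs.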
A review of the literature reveals numerous studies that may have benefited from a universally defined standard for testing. Even if the results of the experiment remain perfectly valid, at a minimum, the use of a standard would increase confidence in the data, facilitate an extension of conclusions, or potentially demonstrate the bias(es) of a new analytical technique.
A universal standard also instills the confidence to perform multiple experiments across platforms, labs, or even borders, and to know the expected outcome. It also improves the probability that other labs will be able to reliably re-create results on a given platform.
It is important to note that peer reviewed journals (such as Proteomics and Molecular and Cellular Proteomics) have initiated steps to ensure that data has been properly collected. Nevertheless, without standards, researchers are largely dependent upon another's familiarity with the workflow that is used to generate their data.
While experience and familiarity develop over time, the use of an appropriate well-defined standard can reveal experimental inaccuracies and biases previously unseen.
The need for standards has also been exemplified by several publications from Eugene Kolker's group at the Biatech Institute. Initially beginning with 18 commercially available proteins, which served as a standard to optimize database search algorithms and minimize error rates, his group has since expanded the complexity and value of their mixture.
These more complex standards have been demonstrated by the Kolker group in a 2007 Proteomics article to add value in the assessment of protein identification for MS. In a similar vein, HUPO has produced a standard, consisting of 20 recombinant proteins, which has served as a training tool for its members.
Besides the use of standards, the publication of raw data, especially MS/MS data, becomes important. Releasing the many MS/MS spectra in a standardized format, after removing the experimental context that might interfere with the intellectual property (IP) position, would greatly benefit many laboratories.
The already available PeptideAtlas, which contains thousands of protein identifications based on MS/MS spectra, allows researchers worldwide to validate their findings and plan MRM experiments for biomarker discovery and validation.
These data serve as a computational standard and, as the collection grows, findings in the protein discovery field can be validated and false positive findings, which often lead to unnecessary costs in development, can be avoided at an early stage.
ABRF Study
An example of experimental biases was demonstrated in a study published by the Proteomics Standards Research Group (sPRG) of the ABRF in 2006. The sPRG provided a study sample consisting of an equimolar mixture of 49 human proteins to over 100 participating labs, and asked them simply to identify the protein components of the sample using whatever methods of analysis or fractionation they chose.
The results varied widely (Figure 2): 66% of the respondents were able to identify at least 30 of the 49 proteins, while 8% identified fewer than 10 of the proteins.

Based upon the results, the sPRG theorized that success in protein identification is largely user-dependent rather than method- or platform-dependent. The study noted that valid results are possible in laboratories that do not have the most sophisticated instruments, but instead spend time optimizing experimental variables.
The results of the study have also surprised many participants. Some labs, never having used a complex standard, realized how well (or how poorly) they were performing in comparison to other labs. This benchmarking could drive quality standards in proteomics.
The results demonstrate that, even with widely employed proteomic techniques, accurate identification of the proteins and modifications present within a complex mixture requires diligence by the researcher to ensure that a previously validated experimental workflow is being employed.
The analysis and identification of proteins by MS, while extremely powerful, is also extremely complex, as concluded by this ABRF study. Use of standards for platform validation aids researchers in establishing confidence levels surrounding their workflow.
Standardized Protocols
Standards are available in a variety of different formats and vary in protein number, source, and purity of each protein. While many labs consistently make use of well-characterized proteins such as bovine serum albumin to measure platform performance, complex standards are required for more intricate and comprehensive studies.
Ruedi Aebersold and his group published a comprehensive evaluation in Molecular & Cellular Proteomics in 2006 comparing and analyzing different methods for phosphopeptide enrichment (Figure 1).
The study, which utilized a previously undefined Drosophila melanogaster cell lysate for analysis, worked well for one module of the workflow. However, it failed to show any of the unknown up- and down-stream biases that may have been revealed using a known phosphopeptide standard.
For example, using a phosphopeptide standard, a researcher may ask how well the instrument of choice detects phosphorylated species. Is the sample container or pipette tip non-selectively binding certain peptides? Do the reagents introduced in the chemical conjugation step introduce bias in the analytical technique downstream?
This study has been widely cited by scientists within the community who are actively involved in the study of phosphorylation and using a similar phosphorylation analysis workflow. However, the lack of standardized protocols may hinder further discoveries, as there is no way of qualifying a re-creation of the adapted workflows.
Designing Standards
Standards need to be developed that extend beyond the normal controls used in most experiments today. These standards may be used to determine sample loss, to demonstrate sample normalization, to examine the extent of bias in a particular separation technique, and to examine the confidence limits or the detection limits of an analytical method.
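As a simple illustration of using a standard to determine sample loss, the sketch below compares the amount of a standard measured at the end of a workflow with the known amount spiked in at the start, and then uses that recovery to correct an analyte measurement. The function names are hypothetical, and the correction assumes the analyte is lost in the same proportion as the standard, which would need to be checked for a given workflow.

```python
def recovery_fraction(spiked_amount: float, measured_amount: float) -> float:
    """Fraction of the spiked standard that survived the workflow
    (1.0 means no loss)."""
    if spiked_amount <= 0:
        raise ValueError("spiked amount must be positive")
    return measured_amount / spiked_amount


def loss_corrected(measured_analyte: float, recovery: float) -> float:
    """Scale an analyte measurement up by the standard's recovery.
    Assumes analyte and standard are lost in equal proportion."""
    if recovery <= 0:
        raise ValueError("recovery must be positive")
    return measured_analyte / recovery
```

If 80 of 100 units of the spiked standard are recovered, a measured analyte value of 40 units would be corrected to roughly 50 units under this assumption.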
The modules being used in the proteomics workflow should largely define the makeup of the standard employed. Once defined, these standards should be introduced as early as possible in the experimental protocol.
Design of a standard for testing of a particular workflow or module needs to reflect a strategic balance between defining and producing a standard of appropriate complexity, and the costs and time associated with its creation and analysis.
The nature of the standard should reflect the needs of the researcher and their needed confidence in the data. The confidence in the data should not be subjective as these parameters are measurable with the use of appropriate standardization.
Many standards that are designed to fit into the proteomics workflows are available as commercial products, although many labs still choose to produce and work with homebrews. Commercial standards have the added value of being produced on a large scale and thereby defray production and starting material costs while at the same time decreasing overall variability.
Commercial standards also have the added value of being thoroughly 'vetted' by the community. Given the subtle nuances of many proteomics experiments, commercially available standards may need to be customized in order to suit every desired experiment.
As realization of the importance of standards to test, optimize, compare and validate these workflows becomes prevalent, so should the availability and variety of commercially available standards. A collaborative effort between manufacturers, government agencies, and the research community at large will be required to standardize proteomics and facilitate its coming of age.