CHI's Molecular Med Monthly Articles

The Current State of Proteomic Technology

A Larger Task Ahead
If there is a single phrase that can capture the challenge that lies ahead in understanding the proteome and all of its functions, it is that genes were easy. By their very nature, the simplicity of their chemical make-up, and their ability to serve as templates for exact copies of themselves, they have been ideal subjects for study, readily amenable to automation and extremely high throughput levels of analysis. These features are what enabled the Human Genome Project to be completed years ahead of schedule.

Unfortunately, the proteome will not yield itself so easily. First, gene expression analysis, aggressively pursued by the industry over the past few years, provides limited information on the proteins that the genes encode, primarily because:

Messenger RNA levels do not always correlate with protein levels. In fact, the correlation coefficient is less than 0.5
One mRNA does not necessarily code for one protein, due to alternative splicing between exons
Proteins are subject to post-translational modification such as proteolytic processing, phosphorylation and glycosylation. Each modification may affect function in a unique way
Proteins have varying half-lives
Proteins can be compartmentalized into different cellular locations (for example, surface vs. internalized) in such a way as to affect their activity
Some proteins may not be functionally relevant until they are assembled into large complexes
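To make the first point concrete, the following is a minimal sketch of how a correlation coefficient between mRNA and protein abundance could be computed. It assumes the Pearson coefficient is the measure intended, and the abundance values are hypothetical numbers chosen for illustration, not measurements from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical abundances (arbitrary units) for six genes; illustration only.
mrna_levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
protein_levels = [2.0, 1.0, 4.0, 3.0, 6.0, 2.0]

r = pearson_r(mrna_levels, protein_levels)
print(f"mRNA/protein correlation: {r:.2f}")
```

With these made-up values the coefficient comes out at roughly 0.42, the kind of weak agreement that makes transcript levels a poor stand-in for protein levels.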

Second, proteins require more delicate handling than DNA, because they can easily unfold when coming in contact with the wrong surface or environment. Third, strand complementarity makes identification of DNA a simple task, whereas proteins must be detected using mass spectrometric analysis in conjunction with sophisticated software or using molecules (such as antibodies) that specifically recognize their molecular structure.

Despite these complexities, there is a compelling reason to pursue the human proteome, because doing so will bring the industry that much closer to understanding the molecular basis of disease. The number of companies dedicated to sorting out the function and activity of proteins on a genomic scale has grown substantially in the past few years. Most have been established to capitalize on a particular set of technologies, which are used to generate revenue through collaboration or sale of products and databases. Only a handful are actually engaged in drug discovery programs of their own.

An issue facing all players in the proteomics field is that the technology is far from the level of precision available to users and purveyors of genomic technology. Some will be incorporating incremental improvements, along with automation, to vastly scale up (or industrialize) the output from traditional methodologies such as 2D gel electrophoresis, while others are seeking alternative technologies, such as microarrays, that would be accessible to a broader market and provide for more precise, quantitative analysis.

Industrialization of Mature Technologies
The workhorse of proteomics, 2D gel electrophoresis, might at first glance appear to be the ideal method for annotating and tracking the proteome. By separating a crude sample of proteins in one dimension by isoelectric point, and the second dimension by molecular weight, 2D gels are able to resolve thousands of proteins in a single experiment. However, the method has several limitations:

Membrane proteins may be excluded due to poor solubility in the prepared sample
Staining methods have a limited dynamic range and most sample preparation methods make it difficult to detect low abundance proteins
Spot positions can vary significantly, complicating the task of gel registration and making it difficult to compare protein expression levels from different experiments
Identification of proteins on the gel is difficult, requiring extensive computational analysis of mass spectra and comparison with genomic databases
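The two-dimensional separation described above can be pictured with a short sketch that maps a protein's isoelectric point and molecular weight to a position on an idealized gel. The pH window (3 to 10), the molecular weight window (10 to 200 kDa) and the assumption that migration in the second dimension is linear in the logarithm of molecular weight are illustrative choices, not parameters taken from any particular system.

```python
import math

def gel_position(pi, mw_kda, ph_range=(3.0, 10.0), mw_range_kda=(10.0, 200.0)):
    """Return (x, y) in [0, 1] on an idealized 2D gel, or None if off the gel.

    x follows the isoelectric point (first dimension); y follows
    log10(molecular weight), with the largest proteins at y = 0 (top).
    The default ranges are hypothetical and chosen for illustration.
    """
    ph_lo, ph_hi = ph_range
    mw_lo, mw_hi = mw_range_kda
    if not (ph_lo <= pi <= ph_hi and mw_lo <= mw_kda <= mw_hi):
        return None  # the protein would not be resolved on this gel
    x = (pi - ph_lo) / (ph_hi - ph_lo)
    log_span = math.log10(mw_hi) - math.log10(mw_lo)
    y = (math.log10(mw_hi) - math.log10(mw_kda)) / log_span
    return (x, y)

# Two hypothetical proteins: one resolved, one too basic for the pH window.
print(gel_position(5.5, 45.0))
print(gel_position(11.2, 30.0))
```

A protein outside either window simply does not appear, which is one way to see why a single gel never captures the whole proteome.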

Incremental Improvements
To address these limitations, many companies and academic groups are forging ahead with incremental improvements meant to increase the reliability and throughput of 2D gel analysis. These efforts are focused on four critical areas:

Sample preparation or fractionation methods to access low abundance or hydrophobic proteins
Improving reproducibility of spot positions and gel-to-gel comparison
Mass spectrometry instrumentation for the identification and quantification of protein spots
Bioinformatics for interpretation of mass spectra

Generally, sample pre-fractionation can simplify the sample being analyzed and thus improve the resolution of downstream analysis. One method, called laser capture microdissection, enables one to precisely select cells from a tissue section (diseased or normal) for subsequent lysis and preparation. This provides significantly cleaner results than conventional tissue sample preparation. Other common forms of pre-fractionation (affinity columns, ion exchange, HPLC) can improve the identification of low abundance proteins.

Improved sample preparation methods have been developed to increase the yield of membrane proteins on the gel, which are otherwise undercounted, while automation in sample handling has been shown to increase both yield and sensitivity at the point of mass spectral analysis (see "Impact of Automation on Proteomics" by Keith Ashman).

When comparing the proteomes of two cell states (e.g. diseased vs. normal), gel-to-gel variability in spot position and protein yield often places the results of such experiments in question. Differential labeling enables one to analyze both states on a single gel, thus enabling direct comparison of protein levels. In this method, cells are treated with normal media, or media enriched in ¹⁵N. Corresponding proteins from each state will migrate to the same location on the gel, but analysis by mass spectrometry will distinguish the metabolically labeled peptides and thus quantify the two sets of proteins separately. This can have significant impact on reproducibility when comparing experiments.
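The arithmetic behind metabolic labeling can be sketched as follows. The example assumes complete ¹⁵N incorporation, counts one backbone nitrogen per residue plus the side-chain nitrogens of the standard amino acids, and uses the ¹⁵N/¹⁴N mass difference of about 0.997 Da. The peptide and the peak intensities are hypothetical.

```python
# Mass difference between 15N and 14N in daltons.
N15_MINUS_N14_DA = 0.997035

# Side-chain nitrogens beyond the single backbone nitrogen of each residue.
EXTRA_SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def nitrogen_count(peptide):
    """Number of nitrogen atoms in a peptide given as one-letter codes."""
    for residue in peptide:
        if residue not in STANDARD_RESIDUES:
            raise ValueError(f"unknown residue: {residue!r}")
    return sum(1 + EXTRA_SIDE_CHAIN_N.get(residue, 0) for residue in peptide)

def n15_mass_shift(peptide):
    """Mass shift (Da) of a fully 15N-labeled peptide versus the unlabeled one."""
    return nitrogen_count(peptide) * N15_MINUS_N14_DA

def relative_abundance(light_intensity, heavy_intensity):
    """Ratio of unlabeled to labeled peak intensity for one peptide pair."""
    if heavy_intensity <= 0:
        raise ValueError("heavy intensity must be positive")
    return light_intensity / heavy_intensity

# Hypothetical peptide and peak intensities, for illustration only.
peptide = "LGNK"
print(nitrogen_count(peptide), round(n15_mass_shift(peptide), 3))
print(relative_abundance(3000.0, 1500.0))
```

Because the two forms of a peptide differ only by this predictable shift, the mass spectrometer sees them as a pair of peaks whose intensity ratio reports the relative protein level in the two cell states.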

An analogous differential labeling technique uses isotope coded affinity tags (ICAT) that chemically modify peptide cysteines with a normal- or deuterium-labeled biotin reagent. Samples are pooled, purified by avidin chromatography and quantified as described above, but there is no need for metabolic labeling. Both differential labeling techniques permit combined samples to be pre-fractionated prior to separation, without losing information on their relative quantities.

New fluorescent stains (such as Sypro) have improved both the dynamic range of protein detection and protein quantification in 2D gels. But fluorescent stains have only marginally improved the detection of low abundance proteins from unfractionated lysates.

One of the most significant improvements in 2D gel pattern reproducibility, if not one of the simplest, came with the introduction of immobilized pH gradient (IPG) strips. IPG strips standardize the first dimension of separation, which is usually fraught with artifacts and variability. After the first dimension is run, the strips are laid on a polyacrylamide gel to separate the molecules by molecular weight. Gels become easier to compare from one experiment to the next.

Software can compensate to a large degree for the variability in spot positions. Several companies have introduced software that facilitates automated analysis of 2D gels, by aligning spots on one gel with those on another gel, and integrating the intensity of each spot. These include:

Melanie (Geneva Bioinformatics and BioRad Laboratories)
ImageMaster (Amersham Pharmacia Biotech)
Phoretix 2D (Phoretix International)
Gellab (Scanalytics)
Kepler (Large Scale Proteomics)
Z3 (Compugen)
GD Impressionist (GeneData)

Some of these products use rather sophisticated algorithms to register each gel. Z3, for example, borrows from moving image processing. The challenge is to automate the entire process of spot identification and integration, to substantially increase throughput. Compugen claims to have improved throughput 10-fold over standard 2D gel image processing. GD Impressionist adds another layer of statistical analysis over other gel programs, such as Kepler or Melanie, to help researchers visualize the data for analysis.
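The basic idea of aligning spots between gels and comparing their intensities can be illustrated with a deliberately simple sketch. It uses greedy nearest-neighbour matching within a distance tolerance, which is an assumption made for illustration and not the algorithm used by any of the products named above. The spot coordinates and intensities are hypothetical.

```python
import math

def match_spots(reference, sample, max_distance=5.0):
    """Pair each reference spot with the nearest unused sample spot.

    Spots are (x, y, intensity) tuples. A pair is accepted only if the
    spots lie within max_distance of each other. Returns a list of
    (reference_index, sample_index) pairs.
    """
    pairs = []
    unused = set(range(len(sample)))
    for i, (rx, ry, _) in enumerate(reference):
        best, best_d = None, max_distance
        for j in sorted(unused):
            sx, sy, _ = sample[j]
            d = math.hypot(sx - rx, sy - ry)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            unused.remove(best)
            pairs.append((i, best))
    return pairs

def intensity_ratios(reference, sample, pairs):
    """Sample/reference intensity ratio for every matched spot pair."""
    return {i: sample[j][2] / reference[i][2] for i, j in pairs}

# Hypothetical spot lists from two gels (x, y, integrated intensity).
reference_gel = [(10.0, 10.0, 100.0), (50.0, 40.0, 200.0), (80.0, 90.0, 50.0)]
sample_gel = [(52.0, 41.0, 400.0), (11.0, 9.0, 100.0), (120.0, 120.0, 30.0)]

pairs = match_spots(reference_gel, sample_gel)
ratios = intensity_ratios(reference_gel, sample_gel, pairs)
print(pairs, ratios)
```

Real gels distort non-uniformly, so a fixed tolerance like this breaks down quickly; that is the gap the more sophisticated registration algorithms are meant to close.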

After a spot has been located, identification of proteins usually requires spot excision, enzymatic digestion, purification and deposition on a substrate for matrix assisted laser desorption ionization (MALDI) for subsequent mass spectrometric analysis. One potentially significant improvement involves bypassing staining, excision and enzymatic digestion altogether. An interleaved polyvinylidene difluoride (PVDF)/trypsin coated membrane permits transfer and digestion of proteins in a single step directly from the gel. MALDI-mass spectrometric analysis is then conducted directly from this membrane. Long analysis times and consumption of large amounts of computer memory make this method prohibitive for routine use, but advancements in mass spectrometry and computation may make it more feasible in the near future.

Mass spectrometry, to a significant degree, has been the driving force behind progress in proteomics. The two most important types of mass spectrometry for proteomics are MALDI and electrospray ionization (ESI), both capable of identifying and quantifying large biological molecules. Identification of proteins with mass spectrometry requires a marriage between instrumentation and computation, in which a peptide mass "fingerprint" must be matched up with the theoretical mass spectrum of any of a large number of proteins derived from a genomic database. The problem gets even more complicated when trying to identify a protein from an organism whose genome sequence is not yet complete. Innovative approaches to this complex search problem have greatly improved protein identification. Examples of mass spectrum analysis software include ProFound, Mascot and PeptIdent2. Advances in mass spectrometry, completion of genome projects and improved database searching software have been critical to growth in proteomic research.
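The matching step can be sketched in a few lines: digest each candidate sequence in silico, compute the peptide masses, and count how many observed masses fall within a tolerance. This is a minimal illustration under stated assumptions (trypsin cleaving after K or R except before P, neutral monoisotopic masses, a simple hit count as the score). The packages named above use more sophisticated scoring, and the two-entry database and the observed masses here are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056

def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS

def fingerprint_score(observed_masses, sequence, tolerance=0.5):
    """Number of observed masses explained by the sequence's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )

def best_match(observed_masses, database, tolerance=0.5):
    """Name of the database entry with the highest fingerprint score."""
    return max(
        database,
        key=lambda name: fingerprint_score(observed_masses, database[name], tolerance),
    )

# Hypothetical two-protein database and observed peak list.
database = {"protA": "AGKPLRMK", "protB": "GGGKAAAR"}
observed = [277.15, 640.40]
print(best_match(observed, database))
```

When the genome is incomplete, the true protein may simply be absent from the database, and no amount of scoring refinement can recover it; that is the complication the text refers to.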

Industrialization
The assembly of proteomic databases approaching the scope of the Human Genome Project will require an industrialized approach to gathering data on protein expression. Oxford Glycosciences (OGS), Large Scale Biology (LSB) and Proteome Sciences have concentrated on industrializing all aspects of proteomic technologies (including 2D gels) and increasing throughput. They are doing this by integrating many of the improvements mentioned above with their own technology in automation and analysis.

Oxford Glycosciences is arguably the leader in this area, running hundreds of gels and sequencing thousands of protein samples using mass spectrometry per week. This level of throughput requires a high level of automation and a large investment in expensive robotics, mass spectrometry equipment and experienced personnel. The scale of the company's technology and ambitions is highlighted by its agreement in 1998 with Incyte Pharmaceuticals to develop and commercialize proteomics databases for human, animal, plant and microbial organisms. According to Christopher Ashton of OGS, "since the proteome is essentially infinite, only those with significant throughput capabilities are going to make an impact. OGS is constantly scaling up."

But even OGS is not putting all of its bets on the longevity of 2D electrophoresis. The company is developing protein microarrays as well (in part through a collaboration with Cambridge Antibody Technology) and has stated that it is not wedded to any particular technology in proteomics.

Large Scale Biology's efforts are also centered on scaling up traditional technologies and, like OGS, "turbocharging" them with its own proprietary improvements in sample preparation, gel staining, image analysis, robotic spot picking, mass spectrometry, and database mining. In collaboration with pharmaceutical companies such as Genentech, F. Hoffmann-La Roche, Eli Lilly, Novartis, Pfizer and Aventis, the company is developing two large databases of protein expression: the Molecular Anatomy and Pathology (MAP) database, which tabulates protein expression in normal and diseased tissue, and the Molecular Effects of Drugs (MED) database, which contains information on protein expression in response to drugs.

A third company taking the industrialized approach to proteomics is Proteome Sciences, which has developed an integrated system with improvements in sample preparation, advanced 2D gel electrophoresis, image analysis, mass spectrometry, bioinformatics and methods to greatly improve sensitivity of detection on a gel. The company is applying its technologies to identify new diagnostic markers and therapeutic targets, and is developing proteomic databases in several major research areas: oncology, neurology, cardiovascular disease, diabetes, obesity and solid organ transplant rejection.

While OGS, LSB and Proteome Sciences are putting together integrated programs with the goal of mapping out large portions of the proteome and pursuing their own drug discovery initiatives, other companies are positioning themselves as vendors of integrated proteomics technology or its components. Key providers of integrated systems based on 2D gel electrophoresis include Amersham Pharmacia Biotech, Applied Biosystems, GPC Biotech, BioRad and Genomic Solutions.

Seeking Alternatives to Traditional Proteomics
The 2D electrophoresis technique is unparalleled in its ability to resolve over 10,000 proteins in a single experiment. Although enormous strides have been taken in industrializing 2D electrophoretic analysis, the industry is looking for new techniques that may replace 2D gels, warts and all, with something more amenable to automation and high throughput analysis. This is especially critical for smaller companies and late-comers who do not have the time or resources to build proteomic factories like the Big Three (OGS, LSB and Proteome Sciences).

A number of chromatography techniques (such as LC-MS/MS or 2D chromatography-MS/MS) are being developed to provide alternative, liquid phase separations before mass spectrometric analysis, but relatively poor resolution of complex mixtures has kept these methods from rivaling the power of 2D solid phase separation on gels.

Potentially, the most revolutionary new technologies are those that impart the equivalent of PCR amplification on proteins. Two of these technologies, discussed in this report, are phage display and Profusion. In particular, Profusion has the capability of taking any pool of mRNA, translating it into protein/mRNA fusion products, and following up any kind of selection protocol on the protein with further amplification. The company that is commercializing this technology, Phylos, is seeking to combine the method with other powerful screening techniques such as microarrays.

Protein microarrays, or protein chips as they are sometimes called, are an emerging alternative to 2D gel based methods. By providing an addressable array of spots, with analytes detected directly by methods such as mass spectrometry or fluorescence, protein chips eliminate much of the irreproducibility and complexity of 2D gel analysis.

The Future is Looking Very Small
Like their predecessor the DNA chip, protein chips will be able to significantly increase the rate of analysis, measuring hundreds to thousands of samples simultaneously for information on protein expression and protein-protein interactions. But the manufacturers of protein chips will not have the benefit of working with a simple and robust molecule such as DNA. Proteins are notoriously sensitive to their environment and to surfaces they come in contact with, and can easily unravel if not treated under the right conditions.

One company, Zyomyx, is using its expertise in surface chemistry to deposit proteins onto a solid surface without disrupting their structure. The nature of the chip surface and methodologies for the attachment and monitoring of proteins are considered by the company to be key differentiating and proprietary features of its technology. The company aims to produce very dense microarrays of up to 10,000 addressable spots per square centimeter. Zyomyx intends to market its chips as part of an integrated system, including ultra-high throughput protein dispensers and a biochip reader, but the protein chips will also work in conventional DNA chip readers.
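To put that density in perspective, a quick calculation gives the center-to-center spacing it implies. The sketch assumes the spots are laid out on a regular square grid, which the company's actual layout may not follow.

```python
import math

def spot_pitch_um(spots_per_cm2):
    """Center-to-center spacing in micrometers for a square grid of spots."""
    if spots_per_cm2 <= 0:
        raise ValueError("density must be positive")
    spots_per_side = math.sqrt(spots_per_cm2)
    return 10_000.0 / spots_per_side  # 1 cm = 10,000 micrometers

print(spot_pitch_um(10_000))
```

At 10,000 spots per square centimeter the grid is 100 by 100, so each spot and its margin must fit in a square 100 micrometers on a side.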

Ciphergen was the earliest entry in the protein chip market, and is now commercially the most advanced. The Ciphergen ProteinChip can accept a crude biological sample, capturing a subset of hundreds or thousands of proteins on a chromatographic surface, or only a few proteins based on specific interaction with an antibody or receptor bound to the chip surface. Multiplexing at this stage is limited to 96 samples analyzed in parallel.

There are a couple of key advantages to this technology that make it one to watch closely as the protein chip field develops. These are:

Analysis by mass spectrometry straight from the chip eliminates the need to preserve native conformation after binding to the surface

The ability to mine for previously unknown proteins. If 2D gel electrophoresis is the ideal "open" system, able to capture virtually all proteins, known and unknown, the ProteinChip chromatographic surface is a "semi-open" system, also able to capture virtually all proteins, known and unknown, based on very broad characteristics, such as overall charge or hydrophobicity.

Packard Bioscience has introduced an innovation in chip formatting that circumvents many of the problems associated with fixing proteins to a solid surface. The company uses what it calls a hydrogel chip, in which the proteins are attached within a hydrophilic polymer matrix on a glass slide. The porous and aqueous nature of the matrix permits proteins to interact more naturally with the sample being tested.

Oxford Glycosciences has formed several partnerships to assemble the components of its own protein microarray system. The company has selected the Packard hydrogel technology for its chip format, and has formed a collaboration with Cambridge Antibody Technology to gain access to a genome-scale antibody library. The antibodies will serve as the capture proteins immobilized within the gel matrix. According to the company, it is unlikely that "global" chips able to capture an entire proteome will be developed. Instead, their protein chips will be developed around specific applications.

Biacore has developed a proprietary system to measure not only endpoint levels of protein or ligand binding on a chip, but also kinetic data. The method is based on surface plasmon resonance (SPR), in which binding of an analyte to the surface changes the refractive index at the surface/solution interface. Measurement of protein or ligand binding can be conducted optically in real time. Millennium Pharmaceuticals and Biacore have recently teamed up to develop a higher density protein chip based on SPR technology.
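The kind of kinetic information a real-time binding measurement can yield is often described with a simple 1:1 binding model, in which the response rises toward an equilibrium level during association and decays exponentially during dissociation. The sketch below assumes that model. The rate constants, concentration and maximum response are hypothetical values, and nothing here describes Biacore's own analysis software.

```python
import math

def association_response(t, conc, ka, kd, rmax):
    """Response at time t (s) during association for a 1:1 binding model.

    conc: analyte concentration (M); ka: association rate constant (1/(M*s));
    kd: dissociation rate constant (1/s); rmax: response at saturation.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # equilibrium response at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))

def dissociation_response(t, r0, kd):
    """Response at time t (s) after analyte is removed, starting from r0."""
    return r0 * math.exp(-kd * t)

# Hypothetical parameters: KD = kd / ka = 10 nM, analyte injected at 10 nM.
ka, kd, rmax = 1e5, 1e-3, 100.0
conc = 1e-8
for t in (0.0, 300.0, 1000.0):
    print(t, round(association_response(t, conc, ka, kd, rmax), 2))
```

Fitting curves of this shape to the measured signal is what turns an optical trace into association and dissociation rate constants, information that an endpoint measurement cannot provide.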

Protein microarrays will permit researchers to scan thousands of proteins in a variety of proteomic experiments, including differential expression, response to drugs, protein-protein interactions and identification of disease biomarkers. So far, they have proven to be very quantitative and, by virtue of their addressable arrays, to give results that are much easier to compare between experiments than those from 2D gels. Commercialization of protein arrays also promises rapid development toward real applications in clinical and point-of-care diagnostics, which would be impossible with more complex proteomic technologies that require electrophoresis or chromatography. One disadvantage of the microarray approach is that generally it is a "closed" system: you can only measure proteins for which you have a capturing agent (such as an antibody). Chromatography and electrophoresis approaches, on the other hand, are more amenable to the discovery of novel proteins. One exception is the Ciphergen chip, which can be prepared with chromatographic surfaces.

Complementary Technologies Enabling Chips
The development of protein microarrays will be facilitated by several complementary technologies. In particular, it will be necessary to systematically produce a large number and variety of capturing agents that are fixed to the array in order to pick up the desired proteins from the sample. For the most part, those agents will be antibodies generated by the following methods:

Profusion (Phylos): mRNA/antibody fusions are selected for binding to a target, and then amplified. Capable of producing antibody libraries with 10¹⁰ members.
Phage Display (Dyax, Cambridge Antibody Technology): phages displaying a combinatorial set of antibodies are selected for binding to a target, and then amplified by reinfection of E. coli. Capable of producing antibody libraries with 10⁹ members.
Biodisplay (Biovation): synthetic antibody libraries are selected on a target, and the identity of the binding antibodies determined by cleaving off a molecular barcode and analyzing by mass spectrometry.
HuCAL (MorphoSys): synthetic antibody libraries cloned into E. coli

Both Profusion and phage display can be used not only to generate capturing antibodies, but also as a source of protein sample derived from cellular mRNA.

Detection of proteins bound to the microarray is also a critical enabling step. In some cases, they are detected by mass spectrometry. Ciphergen employs surface enhanced laser desorption and ionization (SELDI) followed by mass spec to read the bound molecules directly off the chip. Eventually it will be necessary, if smaller players are going to gain access to proteomic tools, to find solutions that do not require expensive equipment for detection (a MALDI-TOF mass spectrometer can cost between $200,000 and $350,000).

Molecular Staging has adapted its rolling circle amplification (RCA) technology to provide a correlate of PCR for proteins. Immuno-RCA involves an antibody-DNA conjugate that is capable of generating long concatenated (and fluorescently labeled) copies of the attached DNA. Antibodies that bind to the chip surface, whether directly on the target or on another antibody in a sandwich assay, can amplify the signal from the bound protein to such a degree that it is possible to detect even a single molecule. The ability to anchor the amplified DNA to the protein target makes immuno-RCA an ideal application for protein chip signal detection. The company is actively developing microarray applications for diagnostics in allergy and other areas in partnership with Axcell Biosciences.

Beating a Path to New Drugs
One of the primary objectives of proteomics is to map out the complex network of biological pathways in normal and diseased states. Protein-protein interactions define the nodes of these pathways, and thus many technologies have been developed to identify these associations. These include:

Yeast two hybrid assays based on fusing a "bait" protein to one domain of a transcription factor and a "prey" protein to another domain of the transcription factor. If the two proteins interact, the two domains are brought together to form an active transcription factor necessary for the survival of the cell. Only cells with positive interactions survive. Variations of this technology are marketed by a number of companies. Examples include Curagen’s PathCalling, GPC Biotech PathCode, Myriad Genetics ProNet and Hybrigenics two+one hybrid system.

E. coli Dimerization Detection System developed by Morphochem, a membrane based protein-protein interaction assay that uses dual protein-signal domain hybrids analogous to the yeast two hybrid method.

Phage display can be employed to identify interactions. Axcell Biosciences is using this technique to create an Interfunctional Proteomic Database to construct protein interaction pathways.

Biacore's surface plasmon resonance chip can be used to identify protein interactions and kinetics in real time. Four receptors can be applied to a surface and tested for interaction with other proteins. The company is working on denser arrays.

Other protein microarrays such as Ciphergen ProteinChips can be used to capture proteins that interact with a target immobilized on the surface.

Fluorescence Resonance Energy Transfer (FRET) and Bioluminescence Resonance Energy Transfer (BRET2), used by Packard Biosignal to detect interacting partners through the transfer of energy from a donor to an acceptor molecule attached to each protein. The interaction is read as emitted fluorescence.

Proximol developed by Cambridge Antibody Technology uses a free radical enzymatic reaction to label molecules in proximity to a given protein.
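Of the methods listed above, resonance energy transfer lends itself to a short worked example, because its sensitivity to proximity follows a well-known relationship: transfer efficiency falls off with the sixth power of the donor-acceptor distance relative to the Förster radius of the dye pair. The sketch below applies that standard formula. The 5 nm Förster radius is a hypothetical value for illustration, not a property of any product named here.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Energy transfer efficiency E = 1 / (1 + (r / R0)^6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Forster radius > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)

# Hypothetical dye pair with a Forster radius of 5 nm.
for r in (2.5, 5.0, 10.0):
    print(r, round(fret_efficiency(r, 5.0), 3))
```

Efficiency is 50% at the Förster radius and drops below 2% at twice that distance, which is why a signal is a strong indication that the two labeled proteins are in direct contact or very nearly so.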

Any method requiring a fusion construct, such as yeast two hybrid, suffers from the possibility of presenting false positives or omitting interactions through false negatives. The risk in using these methods is that the fusion molecule may not be properly folded, or it may obscure critical interaction surfaces. The fact that millions of potential interactions can be screened in a single yeast two hybrid experiment explains why so many companies have used this technology despite its limitations and questions of accuracy.

Alternative methods for identifying protein-protein interaction, such as Tandem Affinity Purification (described in this report), provide more gentle isolation of protein complexes under conditions more likely to preserve native structure. Although results can be cleaner, they cannot match the throughput of selection-based technologies such as yeast two hybrid or phage display.

Post-Proteomics
Companies are taking position at the end stages of drug discovery in the hopes that industry-wide efforts in gene expression, protein expression, protein-protein interaction and other proteomic studies will yield many disease targets that must have their function verified. But to become a marketable solution for the industry, they must significantly increase the scale of functional experiments such as animal models and cell assays that, historically, have not been easily scaled.

Lexicon Genetics’ proprietary technologies enable it to produce large numbers of knockout mice for the purpose of identifying the phenotypic consequences of a particular gene or protein, potentially validating it as a therapeutic target. In 1999, Lexicon was the fourth most active genomic company in terms of collaborations (15). The company’s deals in 2000 have included several collaborations with large pharmaceutical companies such as American Home Products, Biogen, Boehringer Ingelheim, Bristol Myers Squibb and GD Searle. An expanded agreement with Millennium Pharmaceuticals triples the number of knockout mice to be generated for that company.

Cell biology is also looking less traditional these days. Companies such as Automated Cell and Cellomics have developed live cell assays that fully automate sample handling and quantify cellular characteristics such as motility, proliferation and morphology. The ability to track the behavior of individual cells over time permits data gathering on functional behavior not available in any other kind of assay. This functional assay technology is amenable to high throughput analysis, and therefore can occupy a niche complementary to many proteomic technologies focused on identification of potential therapeutic targets.

Diagnostics is the Earliest Beneficiary of Proteomics
The first to benefit from proteomics will be those who commercialize the many diagnostic markers that will arise from this work. Certainly before a therapeutic drug is developed, and even before detailed biological pathways are determined, many markers indicative of disease will be discovered. Early work in proteomics is already yielding a large number of markers that can be commercialized in diagnostic assays, while at the same time being used to gauge effectiveness of new drugs in development. The development of diagnostic markers will pave the way for the effective administration (and market access) of the novel therapeutics that will follow.

Several proteomics deals are beginning to bear fruit in the diagnostics arena. In one example, Oxford Glycosciences and Pfizer filed a joint patent application in April 2000 for the discovery of potential clinical markers for Alzheimer’s Disease.

Toxicology studies also stand to benefit early in the development of proteomics technology. A prime example is in Large Scale Biology’s Molecular Effects of Drugs (MED) database, which is accumulating information on perturbations caused by therapeutic treatments. In other studies by Novartis, 2D gel profiles indicated that treatment with cyclosporin A (an immunosuppressive agent) caused a dangerous reduction of calbindin in kidney tubules, suggesting a mechanism for the known toxic effect of cyclosporin A of intratubular calcification.

A Compelling Need for Another Industry-Wide Initiative
In 1980 there was serious consideration by Congress of putting together an organized effort called the Human Protein Index. It was recognized then that proteins direct function, and that the full annotation of human proteins would be an enormous benefit to life science research. But 1980 was a different era. Most did not foresee the enormous commercial benefit of such a task, as was later seen with the Human Genome Project, nor did it seem entirely feasible with the tools available at the time. With the introduction of automated sequencing, decoding the genome took center stage.

Now the attention is shifting back to the proteome. Many pharmaceutical companies and government agencies that collectively laid out billions of dollars toward gene sequencing and expression studies are recognizing that no amount of money or effort on genes alone is going to provide the industry with what it needs to produce novel therapeutics. An understanding of the mechanisms underlying disease, more directly mediated by proteins, is required. Because the task is so large and current efforts are so scattered, it would make sense to mobilize all the players in a coordinated effort to solve the proteome: government agencies, investors, private companies and technology incubators. The technologies described here are preparing the way for, dare we say, a Human Proteome Project.

For more information on this report, please contact:

Cindy Ohlman
CHA Advances Reports
PH: 781-547-0202
cohlman@advancesreports.com
www.advancesreports.com

Cambridge Healthtech Institute | 250 First Avenue | Suite 300 | Needham, MA 02494
Phone: 781-972-5400 | Fax: 781-972-5425
chi@healthtech.com