Antibodies are among the most frequently used tools in basic science research and in clinical assays, but there are no universally accepted guidelines or standardized methods for determining the validity of these reagents. Furthermore, for commercially available antibodies, it is clear that what is on the label does not necessarily correspond to what is in the tube. To validate an antibody, it must be shown to be specific, selective, and reproducible in the context for which it is to be used. In this review, we highlight the common pitfalls when working with antibodies, common practices for validating antibodies, and levels of commercial antibody validation for seven vendors. Finally, we share our algorithm for antibody validation for immunohistochemistry and quantitative immunofluorescence.
Keywords: antibody, validation, immunohistochemistry, immunofluorescence

Antibodies are among the most commonly used research tools, routinely employed for Western blot (WB), immunoprecipitation (IP), enzyme-linked immunosorbent assays (ELISA), quantitative immunofluorescence (QIF), and immunohistochemistry (IHC). They are also important tools in clinical management, with extensive use in both laboratory medicine (ELISA assays and flow cytometry) and anatomic pathology (IHC). In anatomic pathology, IHC serves as a diagnostic, prognostic, and predictive method, and IHC readings directly influence the management of patients in the clinical setting. For example, the assessment of estrogen receptor α (ER-α) and human epidermal growth factor receptor 2 (HER2) by IHC in breast cancer patients is the definitive test to determine whether or not a patient will receive therapies that can cost as much as $100,000 per year. Thus, in the clinic, as well as in the research laboratory, careful, accurate validation of antibody reagents is critical for correct results.
The influence of antibody-based tests on clinical decisions has led to a number of publications that have highlighted the unmet need for standardization of such assays and development of antibody validation guidelines (1–8). Although many groups have enunciated the need, there are no universally accepted guidelines for best practice in antibody-based tests. There are a number of books on the topic by world leaders such as Clive Taylor and David Dabbs, and recently, an ad hoc group published a set of “recommendations” (2). However, these works focus on the clinical aspects of IHC, often using subjective criteria and often not taking advantage of recent scientific advances that allow more quantitative evaluation of antibodies. Conversely, there are other groups that have done biologically rigorous evaluation of antibodies using surface plasmon resonance (9) and even X-ray crystallography of antibodies bound to their antigens (10), methods that are unachievable in a routine research or clinical setting. The wide range of rigor and methodology in what is used for validation is probably responsible for the lack of consensus on a single method for antibody validation. Here we present an overview of antibody validation approaches and the pitfalls associated with failures of validation. This work specifically focuses on assessment of prognostic and predictive cancer-related biomarkers on formalin-fixed, paraffin-embedded (FFPE) tissue.
The FDA defines validation as “the process of demonstrating, through the use of specific laboratory investigations, that the performance characteristics of an analytical method are suitable for its intended analytical use” (www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070107.pdf). For antibodies, one must demonstrate that they are specific, selective, and reproducible in the context for which they are used. When it comes to IHC, standardization can be quite challenging due to the number of pre-analytical, analytical, and post-analytical factors known to influence staining in FFPE tissue. Variable time to fixation, inadequate fixation period, differences in fixative used, and tissue processing can all affect tissue antigenicity (5,11). Antibody clone and dilution, antigen retrieval, detection system, and interpretation of results using different cutoff points are also important variables that regulate IHC measurements (3,12) (V.K.A., unpublished data). Here we focus on analytical factors and highlight the importance of proper antibody validation, especially for IHC or QIF use.
A recent editorial by Michel et al. (13) emphasizes the lack of target specificity for 49 antibodies against 19 subtypes of G protein–coupled receptors, calling for more stringent antibody validation criteria. Examples highlighted by the authors included double-knockout mice lacking the M2 and M3 subtypes of muscarinic receptors still staining positive for M2 and M3 receptor antibodies (14), and triple-knockout mice for the three α1-adrenoceptor subtypes demonstrating staining patterns similar to those of wild-type mice (15).
Determining the specificity of an antibody is in part dependent on the type of immunogen: synthetic peptide or purified protein. Synthetic peptides provide the advantage of knowing the amino acid sequence to which the antibody binds; however, these peptides do not necessarily recapitulate the 3-D structure or post-translational modifications of the native protein (16). As a result, antibodies generated against a synthetic peptide may not work well when a protein is in its native conformation with intact 3-D structure. Such antibodies may not be useful for IP or IHC experiments, but may bind the protein of interest after it is fully denatured for SDS WB. The opposite can also be the case, especially if the immunogen was the purified protein, where the antibody works well for proteins in their native conformation, but not when denatured. Thus, WB cannot serve as an absolute standard for antibody binding in IHC or other assays where the antigen is in its native conformation.
The native versus denatured conformation issue is further complicated by the methods used to fix tissue. Epitopes that are not exposed in native proteins can be exposed in fixed tissue and vice versa, even though the proteins may not be truly denatured. Thus, an antibody could recognize one epitope in fresh tissue, but when applied to fixed tissue recognize another epitope (17,18). A representative example is an epitope on BCL-2 (amino acids 41–54) that is exposed when BCL-2 is present in the cytoplasm but is inaccessible in the nuclear compartment, most likely due to interaction with other proteins (19). However, when BCL-2 is phosphorylated at sites close to the epitope, the epitope becomes available, as seen when the protein is extracted from cells or denatured by SDS (20).
The issue of epitope specificity is further complicated by the choice of monoclonal versus polyclonal antibodies. Polyclonal antibodies represent a pool of antibodies against the immunogen and typically show a higher probability of detection in a range of different conditions, whereas a monoclonal antibody is more likely to work in only one set of conditions. Polyclonal preparations may nevertheless contain clones of superior affinity; perhaps the best example of this is the FDA-approved Herceptest antibody from Dako, which is the industry standard for HER2 testing in breast cancer (21). Monoclonal antibodies are more pure, since they are produced from single clones of fused cells producing immunoglobulins. However, these clones may be grown in host animals where the ascites fluid containing the secreted antibody is collected (16,18); therefore, these antibody preparations may be contaminated by antibodies other than the monoclonal antibody of interest. A study by Spicer et al. demonstrated that seven of 20 monoclonal antibody preparations (35%) they analyzed had staining patterns localized to the Golgi cisternae unrelated to the antigenic specificity of the antibody, and that five of these cross-reactive antibodies failed to even stain the antigen of interest (22). Antibody validation efforts in our lab have shown a similar lack of specificity, even among monoclonal antibodies. Figure 1 provides two examples of nonspecific antibodies tested in our lab. In Figure 1A, a mouse polyclonal antibody against HoxA1 (B01P; Abnova, Taipei City, Taiwan) was used to probe lysates from ten cell lines by WB. The expected molecular weight is 37 kD, and a band of this size was seen in the CaCo2 lysates (denoted by the arrow). All of the lysates show several bands above and below this expected molecular weight at a lower signal level, such that if this antibody had been validated only against a cell line transfected to overexpress HoxA1, these bands would have been missed.
This level of noise seen by WB raises concerns about nonspecificity. Figure 1B provides a representative example of the staining pattern seen with this HoxA1 antibody on breast carcinoma tissue. The predominantly cytoplasmic staining for a homeobox transcription factor indicates that this antibody should not be used for IHC or IF methods. Figure 1C provides an example of a monoclonal antibody for phospho-4EBP1 (rabbit MAb, clone 236B4; Cell Signaling Technology, Beverly, MA, USA) showing a band of the expected molecular weight (denoted by the arrow) as well as numerous additional bands at higher molecular weights. Figure 1D is a representative example of the nuclear staining observed when using this antibody for IHC on FFPE lung carcinoma tissue, although the expected localization for p-4EBP1 is cytoplasmic.
(A) Cell lysates were denatured and separated by SDS PAGE, transferred to nitrocellulose and blotted with a mouse polyclonal antibody against HoxA1. A band of the expected molecular weight is seen in CaCo2 lysate (marked by the arrow). Note the numerous bands in all cell lines at unexpected molecular weights. (B) A representative example of IHC staining on FFPE breast carcinoma tissue for HoxA1. Note the cytoplasmic staining for a nuclear transcription factor. (C) Western blot for phospho-4-EBP1. Arrow denotes band at the expected molecular weight; again, all lysates show numerous bands at unexpected molecular weights. (D) A representative example of IHC staining on FFPE lung carcinoma tissue for p-4EBP1. Note the predominantly nuclear localization for a protein expected to localize to the cytoplasm and nucleus.
Poor correlations between antibody lots are cause for concern, as demonstrated by a recent study from our lab on antibodies against the Met tyrosine kinase receptor (23). Strikingly, two different lots of the monoclonal 3D4 Met antibody used to stain an array of 688 breast cancer cases showed opposite staining patterns—one nuclear and one membranous and cytoplasmic—with a regression between the two lots yielding an R² value of 0.038 (23). Another example of non-reproducible antibodies was demonstrated by Grimsey et al. (2008) with antibodies supposedly targeting the cannabinoid CB1 receptor. The authors tested multiple lots of antibodies from various commercial sources by both WB and IF against HEK cells transfected with HA-tagged CB1 receptor. Only two of the six antibodies tested showed specific membranous staining that co-localized with detection using an anti-HA antibody. Additionally, WB analysis of HEK cell lysates demonstrated that the antibodies with poor IF specificity detected proteins of incorrect molecular weight, no proteins at all, or proteins present in wild-type HEK cells, which do not even express the CB1 receptor (24).
Some examples of non-reproducibility can be very subtle. Figure 2 shows two examples of what appears to be specific staining with expected localization. However, staining under the same conditions on a serial section of the TMA (representing the same patient tissues) shows an extremely poor correlation. VEGF is a secreted growth factor, and as such, we expected cytoplasmic staining on patient tissues with the VG-1 clone (Figure 2A). Staining of a serial section of the same tissue microarray with the same lot of antibody was not reproducible, with an R² value of 0.016 (Figure 2B). In a second example, serial sections of the same patient tissue stained in two separate runs showed a high level of positivity in one run but were negative for VEGF in the other (Figure 2C and 2D).
(A) IHC on FFPE lung carcinoma tissue with VG-1 (1:50 dilution) demonstrates specific staining of expected localization. (B) Serial sections of our lung TMA stained with VG-1 (1:50) do not correlate with each other. (C,D) IHC staining of serial sections of the same patient tissue showing a high level of positivity in Run 1 (C) is negative for VEGF in Run 2 (D). Inset represents cytokeratin staining of the tumor for reference.
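The lot-to-lot and run-to-run comparisons above are typically quantified as the R² of a linear regression between scores obtained on serial sections of the same array. As a minimal sketch, assuming entirely hypothetical QIF scores (the function and data below are illustrative, not from the studies cited):

```python
# Illustrative sketch (hypothetical data): quantify reproducibility between
# two antibody lots as the R^2 of a linear fit between QIF scores measured
# on serial sections of the same tissue microarray spots.
def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return (cov * cov) / (var_x * var_y)  # square of Pearson's r

# Hypothetical QIF scores for the same six spots stained with two lots
lot_a = [12.1, 45.3, 33.2, 8.7, 60.4, 25.0]
lot_b = [11.8, 47.0, 30.9, 9.5, 58.2, 26.3]

print(round(r_squared(lot_a, lot_b), 3))  # near 1 indicates reproducible lots
```

A reproducible antibody should yield an R² close to 1 between lots or runs; values like the 0.038 and 0.016 reported above indicate essentially no relationship between the two measurements.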
WB is widely used to determine an antibody’s specificity and is an appropriate first validation step if the antibody recognizes the denatured antigen. The first indication that the antibody is specific for the selected target is a single band at the known molecular weight for the target. The presence of multiple bands, or bands not at the proper molecular weight, could represent the same target with different post-translational modifications, breakdown products, or splice variants. However, such observations should raise concerns about using the antibody for further experiments. Major et al. (2006) published their database AbMiner as a resource for information, including the immunogen, vendor, and antigen, on over 600 commercially available monoclonal antibodies that the group validated via WB against pooled cell lysates from each of the NCI-60 cell lines. An antibody was considered validated if it produced a band (or bands) of the expected molecular weight(s) for the target protein (25).
While this type of antibody validation is a useful first step, it only guarantees that a given antibody will provide accurate results for WB analysis. If the goal is to use the antibody for IHC or IF, then the user must demonstrate that the antibody is also able to specifically recognize its target in those applications. Antibodypedia is another online portal for sharing antibody validation data for WB, IHC, and IF. Only commercially available antibodies are submitted, and in addition to a validation score, the inclusion of original experimental data is required (26).
Antibody specificity has also been evaluated using blocking peptides, especially for IHC (27). These peptides are the sequences used to generate the antibody and are incubated with the antibody in great excess. The antibody with and without the blocking peptide is then used to stain tissue known to express the target of interest. If the antibody is specific, the addition of the blocking peptide will result in loss of staining on the tissue. An example using this technique was recently published (27) for the validation of phospho-specific antibodies for ERα. Although this method demonstrates that the antibody is specific for the immunogen from which it was generated, it does not prove selectivity of the antibody, since off-target binding activity of the antibody will also be inhibited by pre-adsorption with the blocking peptide. So while blocking peptides can prove that an antibody is bad (when nonspecific staining persists in the presence of the peptide), they cannot prove that an antibody is good. Blocking peptides have been used with a number of the G protein–coupled receptor antibodies, which were then found not to be selective upon more stringent validation (13). Thus, we do not typically include blocking peptides as part of our antibody validation process.
The key to proving antibody specificity is often the correct use of controls. A negative control, such as omitting the primary antibody in an IHC or QIF experiment, is valuable but insufficient. A better negative control is a cell line or tissue that is known not to express the protein of interest. Knockout cells thus provide the best negative controls. Similarly, non-expressing cells transfected with the protein of interest provide the best positive controls. Since these reagents are often beyond the reach of many labs, there are other approaches that can be used to obtain comparable control results. Often there are readily available cell lines that have been biologically proven not to express a specific protein of interest. For example, H1650 cells are PTEN-null and thereby make a good negative control for PTEN antibodies. Similarly, overexpression can be valuable as a positive control. For example, A431 cells overexpress wild-type EGFR and can be used as a positive control for anti-EGFR antibodies. An alternative to knockout cell controls is siRNA or shRNA knockdown controls, as will be discussed with our Rimm Lab Algorithm for antibody validation.
Finally, another type of quality control slide has been developed by Sompuram et al. that couples peptides to glass slides that mimic the epitope of the native antigen and are therefore specific controls for a given monoclonal antibody (28). The drawback to this type of control is that it only works for monoclonal antibodies, as the antibody must interact with a known epitope in order to generate the appropriate peptide controls.
An important criterion for validation and standardization is antibody reproducibility. This means staining with the same antibody over time, with different lots, on different days, as well as comparison of a new antibody either to a previously validated antibody or to a second independent means of measuring the target that would yield similar results (29). There are not many studies on the reproducibility of antibodies, since this is often assumed when using new lots of the same antibody or new aliquots of an antibody used previously by the same lab. Our lab published one such example of this type of evaluation (23), and similar work to show assay reproducibility has been done for HER2 by Gustavson and colleagues (30).
A second issue related to reproducibility arises when a new reagent is introduced and there is a desire to compare the new antibody to previously validated standards. This sort of study is seen more commonly in the literature. For example, van der Vegt et al. compared HER2 IHC staining of 283 breast adenocarcinomas on tissue microarrays with the 4B5 rabbit monoclonal antibody to the previously established CB11 mouse monoclonal antibody. Additionally, both antibodies were compared to the corresponding FISH scores for each case. The authors found no significant differences in sensitivity, specificity, or predictive values between the two antibodies, validating that 4B5 could be used to assess HER2 expression (31). A second example by Zafrani et al. compared antibodies for ER and PR with IHC to biochemical determination using enzyme immunoassay and found significant links between the two methods (32). A similar study by Sasano et al. validated the use of an aromatase monoclonal antibody by correlating the IHC score with biochemical activity as determined by product isolation assays (33). Sometimes this is done in an attempt to prove superiority of a new reagent. In those cases, investigators must compare the reagents in the setting of intended use with outcome or response to therapy, not the previous assay, as the criterion standard. A good example of this is the work of Cheang et al. that showed the ER antibody SP-1 to be more prognostic and predictive than the then-current standard 1D5 (34), although this conclusion was later disputed by Brock and colleagues (35).
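Comparisons of this kind are typically summarized by the sensitivity, specificity, and predictive values of the new antibody computed from a 2×2 concordance table against the reference assay (e.g., FISH for HER2). A minimal sketch, with entirely hypothetical counts (not the van der Vegt et al. data):

```python
# Illustrative sketch (hypothetical counts): summarize agreement between a
# new IHC antibody and a reference assay via a 2x2 concordance table.
def concordance_stats(tp, fp, fn, tn):
    """Return sensitivity, specificity, and predictive values.

    tp: new antibody positive, reference positive
    fp: new antibody positive, reference negative
    fn: new antibody negative, reference positive
    tn: new antibody negative, reference negative
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }

# Hypothetical counts for a new antibody scored against FISH status
stats = concordance_stats(tp=52, fp=4, fn=3, tn=224)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Comparable values for the new and established antibodies, each scored against the reference, would support the claim of equivalence; proof of superiority additionally requires outcome or response data as the criterion standard, as noted above.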
It is estimated that there are over 180 antibody companies that produce over 350,000 antibodies for the research and clinical markets (www.antibodyresource.com/onlinecomp.html). So it is not surprising that there are commercially available antibodies against a huge collection of target proteins. When one buys a kit to isolate DNA, it is generally safe to assume it will not end up purifying protein. However, the same is not true for antibodies. Just because the manufacturer claims an antibody is specific for protein Z does not necessarily mean that it will only bind protein Z and not a range of other proteins (29,36). In 2005, Ramos-Vara noted that the information about antibodies targeting the same antigen varied significantly depending on the manufacturer (16). In fact, it is now clear that the responsibility for proof of specificity lies with the purchaser, not the vendor. Different vendors provide different levels of validation, depending on their approach to the balance between making a profit and providing high quality. For this review, we sought to compare the level of validation and amount of information provided by seven different companies (referred to herein simply as Companies 1–7) for the randomly selected molecule AMPKβ1. The companies are not revealed, since only 7 of over 180 were chosen at random; furthermore, the level of validation at some companies varies by product. When available, we compared the validation protocol on the datasheet for phospho-specific antibodies, as these require another level of specificity and validation (37,38).
In general, we found three levels of validation. The least validation was observed for Company 1’s phospho-AMPKβ1 polyclonal antibody. No information was readily available on its website or datasheet describing any antibody validation criteria. The datasheet itself contained minimal information about the antibody, including a brief background on the target with corresponding references, and recommended applications and starting dilutions for WB, IF, and ELISA. However, no examples of successful use in any of these applications are provided, and none of the references cited had used this antibody. The datasheet also includes the animal host and identifies the immunogen as a synthetic peptide, but does not include the exact peptide sequence, although the peptide is available for purchase for use as a blocking peptide. The datasheet also cautions that this antibody “may” cross-react with correspondingly phosphorylated AMPK β2.
A moderate level of validation was observed for Companies 2–5. These companies also did not provide any in-depth descriptions of antibody validation procedures. Datasheets all included background on the target, as well as information on the immunogen. Three of the four companies included the complete sequence used, and the fourth identified the region surrounding the phosphorylation site as the sequence used. These companies also provided recommended applications with starting dilutions, and all provided at least one example of the antibody successfully identifying its target in one of the recommended applications. WBs—either with transfected cells expressing the target or with pre-incubation of the antibody with blocking peptide—were the most commonly shown antibody validation examples.
The highest levels of validation were seen for Companies 6 and 7. Company 6 describes its validation procedures for phospho-specific antibodies as including WB analysis in (i) multiple cell lines, (ii) peptide and phospho-peptide competition experiments, and (iii) analysis of site-directed mutants. It also strives to ensure lot-to-lot consistency by combining two or more pre-qualified individual lots to create each batch, minimizing batch-to-batch variability. Company 7 describes its stringent validation protocol for all antibodies, which includes testing of each antibody by WB, IP, IHC, IF, flow cytometry, and ELISA. It verifies the specificity and reproducibility of its antibodies by using appropriate kinase-specific activators or inhibitors when available; testing against a large panel of cell lines with known target expression; phosphatase treatment to confirm phospho-specificity; comparison to isotype controls; verification in transfected cells, knockout cells, and siRNA-treated cells; using blocking peptides to eliminate all signal; verifying correct subcellular localization or treatment-induced translocation; and comparing new antibody lots to previous lots. It also provides optimal dilutions and buffers and specifies both positive and negative control cell lines. For IHC, Company 7’s antibodies are tested on paraffin-embedded cell pellets, including pellets created after the cell lines are subjected to treatments known to induce signaling changes or treated with siRNA to block expression of the target. Tissue is treated with phosphatase to additionally test phospho-specificity on FFPE tissue. Xenografts of cell lines with known target expression, or treated to modulate expression, are paraffin-embedded and then stained.
The datasheets for these companies include everything seen with Companies 2–5, plus additional examples of the successful use of their antibodies. For example, Company 6 includes representative data for all validated applications with recommended antibody concentrations: flow cytometry; immunocytochemistry on a positive control cell line, including incubation with either the phospho-peptide or the non–phospho-peptide, demonstrating loss of signal only with the phospho-peptide; IHC of a positive control tissue; and a WB of a positive control cell line lysate, again with detection after pre-incubation with the phospho- or non–phospho-specific peptides, demonstrating a single band of the expected molecular weight with signal absent only after incubation with the phospho-specific peptide.
Antibodies purchased from Company 1 (which provided the minimal amount of information and no examples of successful use of the product) would require extensive validation by a researcher to demonstrate that the results describe only the target of interest; the company provides little in the way of assurances that it will work. Companies 2–5 all provide more information on their respective antibodies and include at least one example of the product successfully being used for at least one of their recommended applications with corresponding positive controls. While this is clearly preferable to Company 1’s approach, these antibodies would still need to be validated by the researcher for target specificity in the application of interest and for lot-to-lot reproducibility. Company 6 describes the validation steps it uses to determine that its phospho-specific antibodies are both specific and reproducible. Furthermore, for each recommended application, an example of successful use is provided with corresponding inhibition by phospho-specific blocking peptides. Company 7 provides what we consider the gold standard for antibody validation. The extensive in-house testing described provides a high level of confidence that the antibodies it provides will work for all the applications that are recommended for the particular antibody. It also provides a greater level of confidence that results obtained will be specific for the described target and will be reproducible among antibody lots. Even though Companies 6 and 7 provide extensive validation, the researcher is still obliged to confirm that the product gives specific and reproducible results in the cell lines or tissues of interest in the lab. Indeed, even the best companies cannot control what happens after the product leaves their door. Issues during shipping, inappropriate storage on or after arrival in the laboratory, and antibody contamination during usage are all potential sources of error in antibody-based testing. 
Thus, vigilant and comprehensive controls should be included with each assay.
There are no uniform or enforceable standards for antibody validation. Unlike drugs—whose sale is prohibited without FDA approval—there is no federal agency governing what can be sold into the antibody-based assay market. In the future, we may see further FDA clearance of antibodies or more rigorous labeling and regulation of reagents to be used in clinical testing. However, to date, there is no universal standard. We therefore present our lab’s approach (the Rimm Lab Algorithm) to the validation issue. Our approach is not sanctioned or approved by any governing body or trade association, and thus we are not so bold as to call it a “recommendation.” Instead, we provide what we feel is a compromise between rigor and lab economics that results in a level of evidence sufficient for data dissemination. Our algorithm for antibody validation (Figure 3) is especially focused on the end-use application of IHC or QIF on paraffin-embedded tissue, but could be equally valid, or modified, for other antibody-based assays.