J Immunol Methods. 2016 Oct;437:53-63. doi: 10.1016/j.jim.2016.08.003.

Evaluation of highly sensitive immunoassay technologies for quantitative measurements of sub-pg/mL levels of cytokines in human serum.




The past decade has witnessed the introduction of many new technologies that aim to detect low-abundance molecules in complex biological samples. These technologies are based on ligand binding assays (LBA) and employ a diverse array of techniques aimed at increasing assay sensitivity. For example, amplification of the detection signal and/or enhancement of the antibody-ligand interactions in the assay are often used to improve sensitivity compared to conventional enzyme-linked immunosorbent assays (ELISA). These enhancements led to the development of a new wave of assays that claim sensitivities reaching sub-nanogram and even sub-picogram per mL levels of analytes. However, the sensitivity levels reported by technology manufacturers are typically based on measurements of a pure recombinant protein in a buffer sample and often significantly overestimate the assay’s sensitivity for an endogenous analyte of interest in complex biological samples. It is not unusual for LBAs capable of quantifying very low levels of recombinant protein in a buffer sample to fail to detect the much higher expected levels of the corresponding endogenous counterpart in biological samples.


Estimating an assay’s sensitivity for the endogenous analyte of interest is challenging. Endogenous analytes may be subject to a variety of posttranslational modifications, molecular interactions with matrix components, degradation and other factors, making it virtually impossible to develop adequate standards that embody all these variations. Therefore, we introduced a new measure of assay sensitivity based on the frequency of endogenous analyte detection (FEAD), defined as the proportion of evaluated samples that have detectable levels of the analyte. Unlike the analytical assay sensitivity, which is based on the detection of a recombinant protein, FEAD represents the assay’s ability to detect endogenous analyte in biological samples. Low FEAD values indicate that an assay is not capable of detecting an endogenous analyte in biological samples and, therefore, may be considered not sensitive enough for the intended application. Caution should be taken when interpreting high FEAD values, as these may be artificially inflated and may not accurately reflect the assay sensitivity if there are significant non-specific interactions in the assay. For example, non-specific background values that fall within the assay range do not represent endogenous analyte measurements yet will be counted in FEAD. Therefore, it is important to ensure that the biomarker measurements are specific in order to adequately assess the assay sensitivity for the endogenous analyte.
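The FEAD definition above can be illustrated with a minimal sketch. It assumes that a sample counts as detected when a value was reported and that value is at or above a lower limit of quantification; the function name `fead`, the `lloq` parameter, and the example readings are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def fead(measurements: Sequence[Optional[float]], lloq: float) -> float:
    """Return the percentage of samples with a detectable analyte level.

    Assumption: a sample is "detectable" when a value was reported
    (not None) and that value is at or above `lloq`.
    """
    if not measurements:
        raise ValueError("at least one sample is required")
    detected = sum(1 for m in measurements if m is not None and m >= lloq)
    return 100.0 * detected / len(measurements)


# Hypothetical readings in pg/mL; None marks a sample with no reportable value.
readings = [0.8, None, 0.05, 1.2, 0.3]
print(fead(readings, lloq=0.1))  # 3 of 5 samples are detectable -> 60.0
```

Note that this calculation cannot distinguish specific signal from non-specific background, which is why a high value on its own does not establish sensitivity.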


Ensuring the specificity of biomarker assays is easier said than done because it requires an independent validated reference method or adequate standards that can be used to assess the specificity and accuracy of the biomarker results. It is rather common for two assays with comparable sensitivities per the manufacturer’s specifications to yield very different results, and it may be challenging to decide which result is closer to the truth without validated standards or another independent validated test. One workaround for this problem is to compare the biomarker results on several technology platforms, ideally including more established platforms such as ELISA, and to look for common trends in the results between the platforms. The underlying assumption in such an approach is that the majority of the evaluated platforms should yield comparable biomarker trends and the results should correlate across different platforms.


We performed a comprehensive comparison of nine established and emerging LBA technologies with the primary focus on assay sensitivity and relative accuracy of biomarker measurements. To minimize potential bias toward a single analyte or assay, up to four different cytokine assays (IL-2, IL-17a, IL-6, and TNF-α), representing common cytokines that are typically detected in either the pg/mL or sub-pg/mL range, were evaluated on each platform. To achieve these extreme sensitivity levels, an LBA must be highly selective in capturing a cytokine from a complex mixture of other proteins in biological samples and/or amplify the signal representing the cytokine of interest. All evaluated platforms employed one or both of these approaches in order to improve the assay sensitivity. For example, improved immunocapture on the Milliplex and AMMP ViBE platforms is achieved by using beads instead of the microtiter wells of a traditional ELISA. Effective binding is also accomplished on the Ella platform by taking advantage of a large surface area and nanoflow fluidics. Other platforms such as high sensitivity ELISA, V-Plex and Imperacer are primarily based on improved detection signal using tyramide signal amplification, electrochemiluminescence (ECL), and polymerase chain reaction (PCR), respectively. Platforms such as Erenna and Simoa combine very sensitive digital counting of single molecules with improved capture efficiency on beads.


To ensure that assays yield consistent results, inter-assay and intra-assay precision were assessed using pooled human serum samples spiked with supernatants from mitogen-stimulated PBMC cultures. Most of the cytokine assays across the evaluated technology platforms had acceptable assay precision, with automated platforms such as Simoa and Ella showing marginally better precision. Next, we evaluated the analytical assay sensitivity using recombinant protein standards for each assay and platform. While this measure provides an estimate of assay sensitivity, its utility is limited because the standards in each assay can be different and, therefore, may not be comparable across platforms. Also, the analytical sensitivity parameter is based on a recombinant protein in buffer, which may not be representative of the assay’s ability to detect the endogenous analyte, as noted earlier. Therefore, we evaluated the FEAD values for each assay and platform using 40 individual serum samples. As expected, the majority of assays with better analytical sensitivity also showed higher FEAD values (Fig. 1). However, there were several surprising findings as well. Some assays that were less sensitive by analytical sensitivity yielded unexpectedly high FEAD values (e.g., Milliplex IL-17a results), while others with sub-pg/mL analytical sensitivity (e.g., V-Plex IL-2) failed to detect endogenous analyte in biological samples.



Fig. 1. Correlation of FEAD and analytical sensitivity values. The regression line with the 95% confidence intervals is shown as solid and dotted lines, respectively. Data points proven to be outliers by the pairwise correlation analysis presented later were excluded from the linear regression.


The high FEAD values in less sensitive assays were especially questionable, suggesting that not all measurements may be specific to the measured analyte. Therefore, we evaluated the relative accuracy of biomarker results by performing pairwise correlations of the measured biomarker values across the evaluated technologies. Not surprisingly, the suspected non-specific results (e.g., in Milliplex IL-17a, IL-2, and TNF-α) had poor correlation with other assays. The correlation between platforms such as Simoa and Erenna, on the other hand, was excellent, lending confidence in the ability of these platforms to measure cytokines at low levels. Another interesting observation was that biomarker levels on some platforms differed 10- to 100-fold (e.g., Milliplex or Imperacer IL-2 results vs Simoa IL-2 results). This was not surprising because the results from these platforms did not correlate well with those from more sensitive platforms, raising questions about what is actually being measured in these assays.
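The pairwise comparison described above can be sketched as follows. The choice of the Pearson coefficient, the function names, and the platform labels and values are all assumptions made for illustration; the text does not state which correlation statistic was used.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient for two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(
    results: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of platforms measured on the same samples."""
    return {
        (p, q): pearson(results[p], results[q])
        for p, q in combinations(results, 2)
    }


# Hypothetical concentrations (pg/mL) for the same four samples.
results = {
    "platform_a": [1.0, 2.0, 3.0, 4.0],
    "platform_b": [2.0, 4.0, 6.0, 8.0],
    "platform_c": [3.0, 1.0, 4.0, 2.0],
}
for pair, r in pairwise_correlations(results).items():
    print(pair, round(r, 2))
```

In this toy data, platform_b reports twice the concentration of platform_a yet correlates perfectly with it, which illustrates why correlation assesses relative rather than absolute accuracy.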



Fig. 2. Numbers in the table correspond to FEAD values. Passed: both FEAD and correlation passed; Failed: either FEAD or correlation failed; Maybe: all other in-between cases; NA = Not Assessed. The ELISA performance for IL-2 could not be determined due to failure of the calibrator standards during the evaluation.
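The decision rule in the Fig. 2 legend can be written as a small function. This is only a sketch: the three per-criterion labels ("pass", "borderline", "fail"), the use of None for a criterion that was not assessed, and the choice to return NA whenever either criterion is missing are assumptions, since the legend does not give the underlying thresholds.

```python
from typing import Optional

VALID_CALLS = {"pass", "borderline", "fail"}


def overall_call(fead_call: Optional[str], corr_call: Optional[str]) -> str:
    """Combine per-criterion calls into the overall category from the legend.

    Each argument is "pass", "borderline", "fail", or None (not assessed).
    """
    for call in (fead_call, corr_call):
        if call is not None and call not in VALID_CALLS:
            raise ValueError(f"unknown call: {call!r}")
    # Assumption: a missing criterion makes the overall result NA.
    if fead_call is None or corr_call is None:
        return "NA"
    if "fail" in (fead_call, corr_call):
        return "Failed"
    if fead_call == "pass" and corr_call == "pass":
        return "Passed"
    return "Maybe"


print(overall_call("pass", "pass"))        # Passed
print(overall_call("pass", "fail"))        # Failed
print(overall_call("pass", "borderline"))  # Maybe
print(overall_call(None, "pass"))          # NA
```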


In addition to the correlation analysis, the accuracy of quantitative assessments was also evaluated using assay parallelism evaluations, which are performed by serially diluting the serum samples and comparing the results that fall within the analytical sensitivity. As expected, the platforms that performed best in sensitivity assessments also showed good parallelism.
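One common way to summarize parallelism is the percent coefficient of variation (%CV) of the dilution-corrected concentrations: if the assay is parallel, multiplying each measured value by its dilution factor should recover roughly the same concentration at every dilution. The sketch below assumes this summary; the function name, the `lower_limit` cutoff, and the example values are hypothetical, and the text does not state which parallelism statistic or acceptance limit was applied.

```python
from statistics import mean, stdev
from typing import Sequence


def parallelism_cv(
    dilution_factors: Sequence[float],
    measured: Sequence[float],
    lower_limit: float,
) -> float:
    """Return the %CV of dilution-corrected concentrations.

    Only dilutions whose measured value is at or above `lower_limit`
    are used, since values below the assay's limit are not quantifiable.
    """
    if len(dilution_factors) != len(measured):
        raise ValueError("one measured value is required per dilution")
    corrected = [
        d * m for d, m in zip(dilution_factors, measured) if m >= lower_limit
    ]
    if len(corrected) < 2:
        raise ValueError("at least two quantifiable dilutions are required")
    return 100.0 * stdev(corrected) / mean(corrected)


# Hypothetical two-fold serial dilution of one serum sample (pg/mL).
factors = [2, 4, 8, 16]
values = [10.0, 5.2, 2.4, 0.05]  # the last point falls below the limit
print(round(parallelism_cv(factors, values, lower_limit=0.1), 1))
```

A low %CV indicates that the endogenous analyte dilutes in the same way as the calibrator, while a high or trending value points to matrix interference or non-specific signal.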


Upon completion of the analysis, data for each criterion were ranked in order to assess the relative performance of each platform. The Simoa (Quanterix) and Erenna (Singulex) platforms had the strongest performance based on these rankings. These platforms measure a single cytokine, use beads to improve cytokine capture efficiency and have proprietary algorithms to improve detection. Milliplex, which is also bead-based but can measure multiple cytokines, suffered from possible cross-reactivity of the antibody pairs used and scored low in the correlation and parallelism criteria. Signal amplification platforms such as the high sensitivity ELISA, V-Plex and Imperacer were unable to measure all of the cytokines and have the drawback of amplifying background signal. Among the emerging technologies (Ella, AMMP ViBE and BAT), Ella scored consistently well on each criterion. It is likely that AMMP ViBE and BAT, being rather nascent, require further optimization of the reagents, assay conditions and level of operator skill.
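The ranking step can be sketched as rank aggregation: rank the platforms within each criterion, then average the ranks. The averaging rule, the tie handling, the function names, and the scores below are assumptions for illustration only; the text does not specify how the per-criterion ranks were combined.

```python
from typing import Dict, List, Tuple


def rank_scores(
    scores: Dict[str, float], higher_is_better: bool = True
) -> Dict[str, float]:
    """Rank platforms on one criterion (1 = best); ties share the mean rank."""
    ordered = sorted(
        scores.items(), key=lambda kv: kv[1], reverse=higher_is_better
    )
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of platforms tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


def mean_rank(criteria: List[Tuple[Dict[str, float], bool]]) -> Dict[str, float]:
    """Average each platform's rank over all criteria (lower is better)."""
    per_criterion = [rank_scores(s, hib) for s, hib in criteria]
    platforms = per_criterion[0].keys()
    return {
        p: sum(r[p] for r in per_criterion) / len(per_criterion)
        for p in platforms
    }


# Hypothetical scores: FEAD in % (higher is better), %CV (lower is better).
fead_pct = {"platform_a": 95.0, "platform_b": 60.0, "platform_c": 95.0}
cv_pct = {"platform_a": 8.0, "platform_b": 25.0, "platform_c": 12.0}
print(mean_rank([(fead_pct, True), (cv_pct, False)]))
```

Every platform must have a score for every criterion in this sketch; handling platforms that could not be assessed on a criterion would need an explicit rule.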


However, the purpose of this assessment was not to promote one technology over another, but to demonstrate the relative strengths of each platform. For example, platforms such as the V-Plex (MSD), Milliplex (Merck Millipore) and Biochip Array Technology (RANDOX) have multiplexing capabilities not present on the evaluated Simoa or Erenna assays, and platforms such as the High Sensitivity ELISA (eBioscience/R&D Systems) are easier to use than some of the more complex technologies. While our assessment was limited to a few cytokines, it is our hope that researchers will use a systematic approach such as the one we have described to make the best decisions when purchasing instrument platforms in pursuit of their research goals.