Clin Pharmacol Ther. 2017 May;101(5):667-674. DOI: 10.1002/cpt.526

Development and validation of algorithms for the detection of statin myopathy signals from electronic medical records.


SL Chan1, MY Tham2, SH Tan2, C Loke2, BPQ Foo2, Y Fan2,3, PS Ang2, LR Brunham1,4 and C Sung2,5

1Translational Laboratory in Genetic Medicine, Agency for Science, Technology and Research, Singapore

2Vigilance and Compliance Branch, Health Products Regulation Group, Health Sciences Authority, Singapore

3Genome Institute of Singapore, Singapore

4Department of Medicine, Center for Heart and Lung Innovation, University of British Columbia, Canada

5Duke-NUS Medical School, Singapore



The purpose of this study was to develop and validate sensitive algorithms to detect hospitalized statin-induced myopathy (SIM) cases from electronic medical records (EMRs). We developed four algorithms on a training set of 31,211 patient records from a large tertiary hospital. We determined the performance of these algorithms against manually curated records. The best algorithm used a combination of elevated creatine kinase (>4× the upper limit of normal (ULN)), discharge summary, diagnosis, and absence of statin in discharge medications. This algorithm achieved a positive predictive value of 52-71% and a sensitivity of 72-78% on two validation sets of >30,000 records each. Using this algorithm, the incidence of SIM was estimated at 0.18%. This algorithm captured three times more rhabdomyolysis cases than spontaneous reports (95% vs. 30% of manually curated gold standard cases). Our results show the potential power of utilizing data and text mining of EMRs to enhance pharmacovigilance activities.
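As a rough illustration of how a rule of this kind could be expressed, the sketch below flags an admission when creatine kinase exceeds 4× ULN and at least one of the other three signals is present, then scores the flags against manually curated labels. The abstract lists the four components but does not say how they were combined, so the AND/OR logic here is an assumption, and the `Admission` fields, `flag_sim`, and `ppv_and_sensitivity` are hypothetical names rather than the authors' implementation.

```python
from dataclasses import dataclass

CK_ULN_MULTIPLE = 4.0  # threshold stated in the abstract: CK > 4x ULN


@dataclass
class Admission:
    """One hospitalization record (all field names are hypothetical)."""
    peak_ck: float              # peak creatine kinase during the admission (U/L)
    ck_uln: float               # the laboratory's upper limit of normal (U/L)
    summary_mentions_sim: bool  # text-mining hit in the discharge summary
    diagnosis_coded: bool       # myopathy or rhabdomyolysis diagnosis recorded
    statin_at_discharge: bool   # a statin appears in the discharge medications


def flag_sim(a: Admission) -> bool:
    """Flag a possible SIM case.

    Assumed combination rule (not given in the source): elevated CK AND
    at least one supporting signal.
    """
    ck_elevated = a.peak_ck > CK_ULN_MULTIPLE * a.ck_uln
    supporting = (
        a.summary_mentions_sim
        or a.diagnosis_coded
        or not a.statin_at_discharge  # statin absent at discharge
    )
    return ck_elevated and supporting


def ppv_and_sensitivity(flags, gold):
    """Return (PPV, sensitivity) of boolean flags against curated labels."""
    tp = sum(1 for f, g in zip(flags, gold) if f and g)
    fp = sum(1 for f, g in zip(flags, gold) if f and not g)
    fn = sum(1 for f, g in zip(flags, gold) if not f and g)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity
```

Positive predictive value is the fraction of flagged admissions that are true cases, and sensitivity is the fraction of true cases that are flagged. Both are measured against the manually curated records, which is how the reported 52-71% and 72-78% figures should be read.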



Statins are commonly used cholesterol-lowering agents, but they are occasionally associated with muscle toxicity, ranging from asymptomatic elevations of creatine kinase to muscle weakness to rhabdomyolysis, an extensive breakdown of muscle tissue that can lead to serious renal complications and death. A recurring challenge for pharmacovigilance programs is to comprehensively characterize the scope and types of adverse reactions to drugs. Adverse event reports submitted to regulatory authorities by companies and health care professionals are subject to an unknown and variable degree of underreporting. Mining of electronic medical records (EMRs) therefore offers a complementary pathway to identify cases, one that could be more efficient and provide more accurate estimates of incidence. Here we developed an algorithm to detect statin myopathy cases by combining laboratory data, discharge summaries, diagnoses and discharge prescriptions, using EMRs from a 1230-bed tertiary healthcare institution in Singapore. Mining of EMRs has its own set of challenges: inconsistency of drug and laboratory orders (misspellings and multiple ways of writing prescriptions for the same drug or laboratory test), limited drug history prior to a hospitalization, lack of diagnostic coding when an adverse drug reaction is not the primary reason for hospitalization, varying amounts of data in the record for different patients, and the different ways that clinicians describe an adverse event in free text. There are also limitations to the interpretation of entirely retrospective, observational data. Nonetheless, the work described in this paper is one successful example of using big data analytics to extract meaningful information from EMRs.
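The problem of misspelled or inconsistently written drug orders can be illustrated with a small fuzzy-matching sketch. It is only an illustration under stated assumptions: the source does not describe how drug names were normalized, and the `STATINS` list, the `match_statin` helper, and the 0.85 similarity cutoff are all hypothetical choices. Matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib
import re

# Illustrative list of statin generic names (an assumption, not from the source).
STATINS = [
    "atorvastatin",
    "simvastatin",
    "rosuvastatin",
    "pravastatin",
    "lovastatin",
    "fluvastatin",
]


def match_statin(order_text, cutoff=0.85):
    """Return the statin name that a free-text order most likely refers to.

    Each alphabetic token in the order is compared with the reference list;
    the first token whose best match meets the cutoff decides the result.
    Returns None when no token is close enough to any statin name.
    """
    for token in re.findall(r"[a-z]+", order_text.lower()):
        hits = difflib.get_close_matches(token, STATINS, n=1, cutoff=cutoff)
        if hits:
            return hits[0]
    return None
```

With these settings, `match_statin("Atorvastatn 40mg ON")` resolves to `"atorvastatin"`, while an unrelated order such as `"paracetamol 500 mg"` returns `None`. A real pipeline would also need to handle brand names and abbreviations, which this sketch does not attempt.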


While randomized controlled trials (RCTs) remain the gold standard for generating the most rigorous tests of hypotheses, societies’ capacity and resources to conduct RCTs to answer the myriad questions facing healthcare systems are limited. EMRs and other sources of healthcare data, such as claims and billing data, disease registries and even patient wearable devices, provide real world data. These can be mined to improve the efficiency of carrying out regulatory responsibilities, especially in post-marketing surveillance of rarer events. With more accurate estimates and near real-time detection of cases, more timely action can be taken to safeguard public health. This was envisioned when the US Food and Drug Administration’s Sentinel Initiative and Japan’s Pharmaceutical and Medical Devices Agency’s MIHARI were conceived (1,2). Guidance and strategic directions from the US Food and Drug Administration (3,4), as well as the formation of international collaborative networks such as the Observational Health Data Sciences and Informatics (OHDSI, 5), underscore the potential value that can be unlocked from real world healthcare data to add to the robustness of evidence for the safety and efficacy of drugs and devices.


The ability to find more verifiable cases of adverse drug reactions and to estimate incidence, as was achieved in this paper, is the first step in building an effective pharmacovigilance program. These cases can then be fed into predictive modelling to identify the variables associated with increased risk. Prediction methods could range from simple regression to machine learning. In addition, predictive models can incorporate new information as it becomes available to continually improve prediction. More importantly, these models can offer insights on preventive interventions, feeding findings back to the healthcare system and changing practice, to realize the full potential of “Learning Healthcare Systems” (6).



  4. Gottlieb S. Statement from FDA Commissioner Scott Gottlieb, M.D., on Administration’s request for new FDA funding to promote innovation and broaden patient access through competition [Internet]. 2018. Available from:
  6. Foley T, Fairmichael F. The Potential of Learning Health Systems [Internet]. 2015. Available from: