We developed an insomnia classification algorithm by querying an electronic medical records (EMR) database of 314,292 patients who received care at Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), or both, between 1992 and 2010. The algorithm combined structured variables (such as International Classification of Diseases, 9th Revision [ICD-9] codes, prescriptions, and laboratory observations) with unstructured variables (such as text mentions of sleep and psychiatric disorders in clinical narrative notes). Classification performance was highest when the algorithm combined structured variables (billing codes for insomnia, common psychiatric conditions, and joint disorders) with unstructured variables (mentions of sleep disorders and psychiatric disorders). The algorithm identified insomnia patients better than billing codes alone (area under the receiver operating characteristic curve [AUROC] 0.83, 95% confidence interval [CI] 0.76–0.90, vs. 0.55, 95% CI 0.51–0.58). When applied to the full 314,292-patient population, the algorithm classified 36,810 patients as having insomnia, fewer than 17% of whom had a billing code for insomnia. In conclusion, an insomnia classification algorithm that incorporates clinical notes outperforms one based solely on billing codes, and it can identify large cohorts of insomnia patients more accurately, comprehensively, and quickly than traditional methods.
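To illustrate what the AUROC comparison measures, the sketch below computes AUROC with the standard rank-based (Mann-Whitney) definition: the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical toy values, not data from this study, and this is not necessarily how the authors computed AUROC or its confidence intervals. The example shows one plausible reason a binary billing-code flag scores poorly: it produces many ties, whereas a graded score built from several structured and unstructured variables can rank patients more finely.

```python
from itertools import product


def auroc(scores, labels):
    """Rank-based AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy reference labels (1 = insomnia on chart review, 0 = no insomnia).
labels = [1, 1, 1, 0, 0, 0]
# Billing-code-only score: 1 if an insomnia code is present, else 0 (hypothetical).
billing = [1, 0, 0, 0, 0, 0]
# Hypothetical graded score combining codes with note mentions.
combined = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]

print(auroc(billing, labels))   # 6/9, about 0.667
print(auroc(combined, labels))  # 8/9, about 0.889
```

In practice, a library routine such as scikit-learn's `roc_auc_score` gives the same value for this definition; the explicit pairwise loop is used here only to make the definition visible.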