Algorithm development to estimate pregnancy data from Electronic Health Records

A Use Case from Argentina

Carolina Mengoni Goñalons

Buenos Aires City Government | University of Buenos Aires

Juliana Reves Szemere

Buenos Aires City Government | National University of San Martín | National Pedagogical University

María Cristina Nanton

Buenos Aires City Government | University of Buenos Aires

Information and Health Statistics Management Office

Ministry of Health of the City of Buenos Aires

Organizational chart of the Health Information and Statistics Management under the Ministry of Health of the City of Buenos Aires. The chart shows a hierarchy starting with the Undersecretariat of Health Planning, followed by the General Directorate of Health Information Systems, which oversees four areas: Health Informatics; Software Development; Technology, Telecommunications and Support; and Health Information and Statistics Management.

Map of South America, zooming in on the City of Buenos Aires.

Why does our office need to detect and estimate pregnancies and delivery dates?

Diagram showing the flow of information from health centers to the information system to electronic health records.

  • Free-text input by healthcare professionals
  • Rich, high-value data
  • Longitudinal pregnancy monitoring
  • Critical obstetric metrics

Background

Use of structured data to estimate pregnancy outcomes:

Screenshot of the journal article titled 'Development and evaluation of MADDIE: Method to Acquire Delivery Date Information from Electronic Health Records,' published in the International Journal of Medical Informatics, Volume 145, January 2021. Authors listed are Silvia P. Canelón, Heather H. Burris, Lisa D. Levine, and Mary Regina Boland.

Aim: Develop an algorithm that infers patient delivery dates (PDDs) and delivery-specific details from Electronic Health Records (EHRs) with high accuracy, enabling pregnancy-level outcome studies in women’s health.

Tools

R-based, open-source ecosystem for scalable health data processing

Image with the logos of the tools GitLab, tidyverse, dplyr, dbplyr, tidyr, lubridate, stringr, janitor and agiseR. Above them, a screen capture of the structure of some of GOGIES' DataWarehouse tables, and a diagram of the SIGEHOS database infrastructure being ingested into the DataWarehouse.

agiseR is an internal R package developed by our team to facilitate access to local databases and automate reporting workflows.

Automated Detection of Pregnancies

Records and pregnancies

Diagram illustrating patient timelines. Each horizontal line represents an individual patient, with black-outlined circles representing health records and red segments indicating pregnancy intervals.


Automated Detection of Pregnancies

Extraction of Gestational Age (GA) values

Diagram showing a patient's timeline. Black-outlined circles represent electronic health records from which a Gestational Age value could be extracted, while grey circles indicate that no GA was obtained.

This patient timeline shows health records that may or may not contain information on gestational age (GA): black circles are records with a GA value, grey circles are records without one.
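The extraction step can be illustrated with a minimal sketch. It is written in Python for illustration only (the team's pipeline is R-based), and the pattern is an assumption: it supposes that notes mark gestational age with "EG" or "edad gestacional" followed by weeks and, optionally, days. The name `extract_ga_days` and the accepted range of weeks are hypothetical.

```python
import re
from typing import Optional

# Assumed notation (not taken from the source): "EG 12+3", "EG: 30",
# "edad gestacional 28.2". Real clinical notes will need more patterns.
GA_PATTERN = re.compile(
    r"""
    \b(?:EG|edad\s+gestacional)\s*:?\s*     # hypothetical GA marker
    (?P<weeks>\d{1,2})                      # completed weeks
    (?:\s*(?:\+|\.|,)\s*(?P<days>[0-6]))?   # optional extra days
    (?!\d)                                  # do not stop inside a longer number
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_ga_days(note: str) -> Optional[int]:
    """Return gestational age in days, or None if the note has no usable GA."""
    match = GA_PATTERN.search(note)
    if match is None:
        return None
    weeks = int(match.group("weeks"))
    days = int(match.group("days") or 0)
    if not 1 <= weeks <= 44:  # assumed plausibility bounds
        return None
    return weeks * 7 + days
```

Records for which the function returns `None` correspond to the grey circles in the timeline: they are kept, but carry no GA information.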

Automated Detection of Pregnancies

Clustering of records

Diagram showing a patient's timeline. Grey-outlined circles represent non-informative records. Coloured circles represent records referring to different pregnancies, one colour per pregnancy.

This patient’s health records belong to different pregnancies (differently coloured circles). Our algorithm identifies them by comparing the difference in GA values between records with the time gap between those records.
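One possible reading of this comparison, sketched in Python for illustration (the team's pipeline is R-based): two consecutive records belong to the same pregnancy when the change in GA roughly matches the time elapsed between them. The function name `cluster_records` and the 21-day tolerance are assumptions, not values from our implementation.

```python
from datetime import date
from typing import List, Tuple

Record = Tuple[date, int]  # (record date, GA in days)


def cluster_records(records: List[Record], tolerance_days: int = 21) -> List[List[Record]]:
    """Group one patient's GA-bearing records into candidate pregnancies."""
    clusters: List[List[Record]] = []
    for rec_date, ga in sorted(records):
        if clusters:
            prev_date, prev_ga = clusters[-1][-1]
            elapsed = (rec_date - prev_date).days
            ga_change = ga - prev_ga
            # Same pregnancy if GA advanced by about as much as the calendar did.
            if abs(ga_change - elapsed) <= tolerance_days:
                clusters[-1].append((rec_date, ga))
                continue
        # Otherwise the record starts a new candidate pregnancy.
        clusters.append([(rec_date, ga)])
    return clusters


records = [
    (date(2022, 1, 10), 70),
    (date(2022, 3, 7), 126),
    (date(2023, 5, 2), 56),
    (date(2023, 6, 13), 98),
]
pregnancies = cluster_records(records)
```

In this example the first two records are 56 days apart and their GA differs by 56 days, so they are grouped together. The third record shows a lower GA more than a year later, so it opens a second group.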

Automated Detection of Pregnancies

Delimiting of detected pregnancy

Diagram showing a patient's timeline. Coloured segments represent the pregnancies reconstructed by the algorithm from the patient's records. Below, bold text indicates variables that characterise the pregnancy, while regular text indicates operations performed.

As a result of the algorithm, two pregnancies have been reconstructed from the patient's records (differently coloured segments). Below, bold text indicates variables that characterise the pregnancy, while regular text indicates operations performed.
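A minimal sketch of how a cluster of records could be turned into pregnancy boundaries, in Python for illustration (the team's pipeline is R-based). It assumes the start date is obtained by subtracting the GA from the date of the latest record, and that the estimated delivery date falls 280 days (40 weeks) after that start. Both rules, and the name `delimit_pregnancy`, are assumptions; the variables actually computed are the ones shown in the diagram.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Record = Tuple[date, int]  # (record date, GA in days)

FULL_TERM_DAYS = 280  # 40 weeks, a common convention; assumed here


def delimit_pregnancy(cluster: List[Record]) -> Dict[str, date]:
    """Estimate start and delivery dates from the latest record of a cluster."""
    last_date, last_ga = max(cluster)  # latest record by date
    start = last_date - timedelta(days=last_ga)
    return {
        "estimated_start": start,
        "estimated_delivery": start + timedelta(days=FULL_TERM_DAYS),
        "last_record": last_date,
    }


summary = delimit_pregnancy([(date(2022, 1, 10), 70), (date(2022, 3, 7), 126)])
```

Using the latest record is a design choice for this sketch; a median over all records in the cluster would be less sensitive to a single mistyped GA.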

Automated Detection of Pregnancies

Updating

Flowchart illustrating how a new medical record is classified in relation to pregnancy detection. It begins with a new record and asks whether it matches someone with a detected pregnancy. If not, it's marked as a new pregnancy. If yes, it checks if the pregnancy is already known. If not, it's also a new pregnancy. If yes, and based on comparing changes in gestational age versus record date, the record is classified as new information about an ongoing pregnancy, indicating either end of pregnancy or higher gestational age.
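The decision flow can be sketched as follows, in Python for illustration (the team's pipeline is R-based). The sketch assumes that "already known" means the new record's GA is consistent with the latest record of a detected pregnancy. The name `classify_record`, the data layout and the 21-day tolerance are hypothetical, and the further split between end of pregnancy and higher gestational age is left out.

```python
from datetime import date
from typing import Dict, List, Tuple

Pregnancy = Tuple[date, int]  # (date of latest record, GA in days at that record)


def classify_record(
    patient_id: str,
    rec_date: date,
    ga_days: int,
    detected: Dict[str, List[Pregnancy]],
    tolerance_days: int = 21,
) -> str:
    """Classify a new GA-bearing record against previously detected pregnancies."""
    known = detected.get(patient_id)
    if not known:
        # No detected pregnancy for this patient.
        return "new_pregnancy"
    for last_date, last_ga in known:
        elapsed = (rec_date - last_date).days
        # Compare the change in GA with the change in record date.
        if elapsed >= 0 and abs((ga_days - last_ga) - elapsed) <= tolerance_days:
            return "update_ongoing_pregnancy"
    # The patient is known, but this record does not fit any known pregnancy.
    return "new_pregnancy"


detected = {"A": [(date(2022, 3, 7), 126)]}
first = classify_record("B", date(2022, 4, 4), 60, detected)
second = classify_record("A", date(2022, 4, 4), 154, detected)
third = classify_record("A", date(2023, 5, 2), 56, detected)
```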

Project’s impact

Pregnancy information available at patient level, processed efficiently for secondary uses of health data.

Primary healthcare

  • Alert rules for specific conditions
  • Result indicators for appropriate checkups
  • Managing conditions alongside a pregnancy

Healthcare cost recovery

  • Generation of tracer indicators

Mandatory reportable diseases

  • Differentiate pregnant patients for the Epidemiological Bulletin

Thank you for your time!