Data Management to Data Science – It’s Time!
Over 70% of clinical trial data today comes from sources other than case report forms. This non-case report form data includes continuous data such as wearable data, lab data, patient-reported data and ECG data, to name a few. Generally speaking, these are all collected as source without transcription or entry into another system. Today, we have reached a critical point where we can actively make choices about how data management will evolve. Otherwise, outside forces will drive change. After years of discussion, the time has come for the transition from data management to data science.
As reported by the Society for Clinical Data Management in its recently released innovation white paper, “The 5Vs of Collecting Clinical Data,” industry must rethink its approach and understand how the 5Vs of data are reshaping clinical data management.
What are the 5Vs?
Let’s start by defining the 5Vs of data. Volume, variety, velocity, veracity, and value are the five elements of data that we see today in clinical trials.
Volume
The volume of data in a Phase III trial has increased exponentially over the past decade, rising from data points counted in the millions to data points in the billions. This directly relates to increases in wearable/sensor data, real-world data, and increased biomarker data, to name a few.
Variety
The variety of data collected today in clinical trials has increased. Significantly, this has been accentuated by the pandemic and the rising adoption of tools and technologies by sponsors to drive more patient-centric approaches. This means that more and more direct capture of data as electronic source (eSource) is occurring in clinical trials today.
With this variety of data, the authors of the 5Vs white paper indicate that: “This means that feedback on the data quality and integrity of this variety of eSources needs to be provided at the time of data generation. After data is generated, CDM will rarely be able to send a query to request its correction.”
Velocity
It goes without saying that the speed and intensity of the data we collect today have also increased tremendously. Wearables, sensors, and other continuous data sources are increasing in use within certain therapeutic areas. This increase in velocity requires data management to evolve into data science. Applying data aggregation, analytics and trend/outlier assessments is key for the management of this data velocity.
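To make the idea of an outlier assessment concrete, here is a minimal Python sketch. It is an illustration only, not a method taken from the 5Vs white paper or from any specific product: the function name, the sample heart-rate values, and the 3.5 cutoff are all assumptions chosen for the example. It flags readings using a modified z-score based on the median, which is less distorted by extreme values than a mean-based score.

```python
from statistics import median


def flag_outliers(readings, threshold=3.5):
    """Return the indices of readings whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not hide itself by inflating the
    spread. The 3.5 default is a common rule of thumb, assumed here.
    """
    med = median(readings)
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # At least half the readings are identical; no usable spread to score against.
        return []
    return [
        i for i, x in enumerate(readings)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical resting heart-rate samples from a wearable (beats per minute).
heart_rate = [72, 75, 71, 74, 73, 190, 72, 70]
print(flag_outliers(heart_rate))  # prints [5], the index of the 190 bpm reading
```

In practice a check like this would run over a rolling window per patient and feed a review listing rather than a manual query, but those details depend on the study and the data source.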
Veracity
The 5Vs white paper indicates that: “Often, veracity is associated with the key attributes of data integrity and ALCOA+ (Attributable, Legible, Contemporaneous, Original, Accurate, Complete, Consistent, Enduring, and Available). Veracity also can be associated with some of the attributes of data quality such as data conformity, credibility, and reliability. [Because of this, organizations] must establish proactive measures to secure the authenticity and security of the data. This is becoming critical in the world of e-Source and Real World Data, where data can rarely be corrected and where anonymization is increasingly challenging and critical.”
Value
In today’s clinical trial environment, we are swimming in data. Are these data valuable? Are the edit checks and logic checks that we program adding any value? The 5Vs white paper encourages the maximization of the relative value of a specific data point. In moving from traditional data management to data science, the fifth V, value, asks industry to recognize that the value of data goes beyond integrity and quality. When planning your clinical trials, it is important to consider and understand the full potential and value of data. Clinical data science sits at the intersection of ensuring that all stakeholders get what they need to be successful.
What does it mean to move from data management to data science?
Sponsors and CROs Need to Embrace a True eSource Model Across all Data
What if you could tap into a network of nearly 2,000 research sites that are already technology enabled, patient-driven, quality minded and are leaders in diversity, equity and inclusion? And, better yet, what if you could harness the data that these sites (including home health care providers and other nontraditional players like pharmacy chains) are already collecting without requiring an additional data collection tool?
Stop Forcing Unnecessary and Expensive Technology On Sites
Many research sites have already embraced technology to run studies and collect data. Sponsors can actively review and monitor the data collected using this technology. However, many sponsors spend countless hours and massive amounts of money on third-party tools that require sites to re-enter data they’ve already collected. In fact, site users are the only users in the entire research process who have no say in the technology they use in their day-to-day jobs.
Protocol-Driven eSource
Sponsors and CROs can utilize CRIO to define study-specific eSource templates. These are derived directly from the protocol. Significantly, these templates are ordered in line with the schedule of assessments. The CRIO eSource templates then form the foundation of what is pushed out to both pre-existing and new CRIO sites. Finally, these eSource templates will serve the purpose of collecting protocol-compliant data.
Beyond this, sites are still able to create and include their own site-specific procedures in addition to what is being provided by the sponsor. This approach ensures a protocol-compliant collection of data at the time of the patient encounter. It also includes real-time validation checks to interrogate the data at the time of entry. This approach leads to better quality, fewer post-entry queries, enhanced protocol compliance and a host of process efficiencies. Accordingly, this affords sites more time to focus on the critical aspects of the trial such as recruitment, retention, patient follow-up, and study drug accountability.
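As a rough illustration of what a validation check at the time of entry can look like, consider the Python sketch below. It is not CRIO's implementation or API. The field names, the limits, and the RangeCheck and validate_entry helpers are hypothetical and exist only to show the pattern: each check runs as the record is saved, and any problem is returned to the person entering the data immediately rather than weeks later as a query.

```python
from dataclasses import dataclass


@dataclass
class RangeCheck:
    """A hypothetical edit check: the field must be present and within limits."""
    field: str
    low: float
    high: float

    def run(self, record):
        value = record.get(self.field)
        if value is None:
            return f"{self.field}: value is missing"
        if not self.low <= value <= self.high:
            return f"{self.field}: {value} is outside {self.low}-{self.high}"
        return None  # the check passed


# Illustrative limits only; a real study would derive these from the protocol.
CHECKS = [
    RangeCheck("systolic_bp", 70, 220),
    RangeCheck("temperature_c", 34.0, 42.0),
]


def validate_entry(record, checks=CHECKS):
    """Run every check against one visit record and collect the messages."""
    return [msg for check in checks if (msg := check.run(record)) is not None]


# A record entered during a visit, with one implausible value and one gap.
for message in validate_entry({"systolic_bp": 300}):
    print(message)
```

Because the messages appear while the patient is still in the room, the site can confirm or correct the value at the source, which is the point at which correction is still possible.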
With 2,000 sites using CRIO eSource across nearly 5,000 active protocols, CRIO now has a large database to quantify the impact on site performance. Our case studies and quantitative research prove that sites can reduce protocol deviations by 40% and lower the risk of a negative FDA audit finding by 70% when using CRIO. In addition, we have demonstrated that sites that use CRIO outperform non-CRIO sites in enrollment in 2 out of 3 trials, with a median outperformance of 40%. And finally, across the entire CRIO site network, we are observing racial diversity that is twice the industry average.
Use eSource to Power the Evolution from Data Management to Data Science
CRIO’s system can automatically send site eSource data directly into a sponsor-facing application called Reviewer EDC, which lets the clinical team review the data remotely. Because of the system’s built-in edit checks at time of capture, the data are more likely to enter the system accurately and completely. Because the eSource is now effectively the same as the eCRF data, there is no need for traditional onsite monitoring or source data verification.
Within this framework, sponsors and CROs should embrace a centralized clinical data monitoring approach. This provides for a single primary workflow for clinical oversight and review of data as they are entered. In this workflow, monitoring teams can review data, issue and close queries, track source data changes, lock data, and medically code verbatim terms. This streamlines the data review and management approach, eliminating further redundant data review. The figure below depicts how the traditional five workflows can be simplified into a reimagined workflow.
Sponsors need to reimagine the clinical trial process and leverage what is in plain sight:
CRIO’s dual approach of addressing the needs of sites and sponsors creates value for both stakeholders, reducing the number of redundant and administratively burdensome systems forced on sites and driving efficiencies in cost and time.
Data Management Must Evolve to Be Data Science
To establish a cohesive central clinical data monitoring team, we must dismantle departmental barriers and segmentation. It is critical to align the varied roles and functions of medical coders, clinical data monitors and data scientists under a unified approach. When done correctly, this will support the evolution towards central data monitoring and data science as the primary approach for data oversight and monitoring.
Ultimately, while each organization may take a slightly different approach to organizing the best model for ensuring data quality oversight, the Society for Clinical Data Management’s white paper, “How to Create a Clinical Data Science Organization,” encourages organizations to have clinical data professionals be an integral part of the monitoring approach.
Develop an Integrated Clinical Data Management Plan
At the end of the day, it’s all about managing your clinical data. Through the use of tools like CRIO, sponsors and CROs can embrace an integrated approach to managing clinical data. By developing and maintaining a plan that brings together the needed elements of clinical data monitoring during study start-up, study conduct and study closeout, sponsors can outline and manage a more streamlined approach. Understanding and outlining all of the needed data review, patient safety and data oversight in one comprehensive plan is key. An integrated plan allows for enhanced and direct oversight over the critical elements of patient oversight and safety monitoring. This integrated approach further strengthens the use of CRIO for the direct capture of patient data. Altogether, CRIO eliminates the redundant layers of data review by rolling patient data review into a single integrated role.
Additionally, through this approach, data managers can be liberated from unnecessary form-by-form review. Data professionals will continue to focus on the data management plan and support the new clinical data monitoring process through the provision of listings, data analytics, trends and outliers. CRIO enables much more accurate and contemporaneous access to site source data. This allows data professionals to run better analytics and augment their intelligence.
Significantly, sponsors don’t have to invent a new process. Instead, they can use a process that is already available and in plain sight. The world’s largest data management and data science society, SCDM, further encourages and endorses this process. It’s not a challenge of invention, discovery or even innovation. It’s a challenge of re-imagination!