The 4 Challenges of Working With Data and How Cloud Can Help Solve Them
This article will discuss issues that adversely affect health care data and analytics, their role in improving patient treatment, and how cloud services such as Amazon Web Services (AWS) can help health care providers address these issues. Because that's what I work with, I will focus on large genomics data sets. However, I am aware that these are only a small fraction of the data that health care providers will use in the future.
Below are the four areas that I believe will be the most challenging in the future of health care data: sharing, standardization, security, and scaling. While this column focuses on health care, many aspects apply to other industries as well.
Scalability
This is especially relevant to genomics, where costs have fallen and the speed at which genomic data can be generated has increased. Both pure storage (for raw and processed data) and compute power (which transforms that raw data into something usable) must keep pace. Consider also the U.S. federal mandate for hospitals to adopt electronic health records (EHRs): the volume of electronic data has grown and will continue to grow.
Additionally, many other types of biological data, including images, audio, and streaming data from handheld devices, are now creating large and varied datasets. Scale is a problem: how can I run my queries, searches, and algorithms against these huge datasets? Traditional storage, database, and compute systems are not able to keep up.
AWS and other cloud technologies offer a solution to this problem. The core value of cloud services lies in their ability to scale quickly and on demand, and cloud storage and compute costs continue to drop. The cloud reduces upfront capital expenditures, and the pay-per-use model means you pay only for the capacity you actually use, which makes cost decisions more efficient.
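To illustrate the pay-per-use trade-off, here is a minimal sketch comparing buying servers up front with renting capacity on demand. Every price, node count, and usage figure is a hypothetical assumption chosen for illustration, not an actual AWS rate, and the function names are made up for this example.

```python
# Hypothetical comparison of upfront (capital) spending vs. pay-per-use spending.
# All prices and usage figures are illustrative assumptions, not real AWS rates.


def upfront_cost(hardware_cost: float, yearly_upkeep: float, years: int) -> float:
    """Total cost of buying servers up front and running them for `years`."""
    return hardware_cost + yearly_upkeep * years


def pay_per_use_cost(hourly_rate: float, nodes: int, hours_per_year: float, years: int) -> float:
    """Total cost of renting `nodes` machines only for the hours actually used."""
    return hourly_rate * nodes * hours_per_year * years


def cheaper_option(upfront: float, on_demand: float) -> str:
    """Name the cheaper of the two approaches."""
    return "pay-per-use" if on_demand < upfront else "upfront"


if __name__ == "__main__":
    # Assumed workload: a genomics pipeline that needs 50 nodes,
    # but only for about 400 hours a year (bursty batch runs).
    owned = upfront_cost(hardware_cost=500_000, yearly_upkeep=60_000, years=3)
    rented = pay_per_use_cost(hourly_rate=2.0, nodes=50, hours_per_year=400, years=3)
    print(f"upfront: ${owned:,.0f}  pay-per-use: ${rented:,.0f}  -> {cheaper_option(owned, rented)}")
    # prints "upfront: $680,000  pay-per-use: $120,000  -> pay-per-use"
```

With the bursty workload assumed here, renting wins. If the same 50 nodes ran around the clock (8,760 hours a year), the same functions would favor buying up front, so the answer depends on how steadily the capacity is used.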
Sharing
Another U.S. federal mandate requires interoperability of health systems, which allows data to be shared about patients who are treated at multiple hospitals. Research also requires constant sharing of data and giving others access to it, including data sets that could complement, validate, or enhance your research. Multiple terabytes (or soon petabytes!) cannot easily be copied across the Internet to many different locations, which makes a central repository, where the data lives in one place and collaborators come to it, the practical option.
Cloud services are a good fit for data sharing. Securely sharing data takes a lot of infrastructure and setup (see more below), and a cloud provider such as AWS still requires proper configuration. However, the cloud removes the need to purchase physical resources, which would further complicate things, and it lets you consolidate sharing across multiple secured environments.
As data becomes more abundant, data movement becomes a major bottleneck: it is harder to buy bandwidth and build more network capacity than it is to give more people access to the data where it already resides.
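A back-of-the-envelope calculation shows why moving data is the bottleneck. This sketch assumes decimal units and a sustained link speed; the `efficiency` factor is a hypothetical knob for protocol overhead and contention, and the data sizes and link speeds are illustrative, not measurements.

```python
def transfer_days(terabytes: float, gigabits_per_second: float, efficiency: float = 1.0) -> float:
    """Days needed to move `terabytes` of data over a link of the given speed.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    `efficiency` is an assumed fraction of the nominal bandwidth actually achieved.
    """
    bits = terabytes * 10**12 * 8                                   # bytes -> bits
    seconds = bits / (gigabits_per_second * 10**9 * efficiency)    # at effective rate
    return seconds / 86_400                                         # seconds per day


if __name__ == "__main__":
    for tb, gbps in [(100, 1), (1000, 1), (1000, 10)]:
        print(f"{tb} TB over {gbps} Gbps: {transfer_days(tb, gbps):.1f} days")
    # 100 TB over 1 Gbps: 9.3 days
    # 1000 TB over 1 Gbps: 92.6 days
    # 1000 TB over 10 Gbps: 9.3 days
```

Even at full, uninterrupted line rate, a petabyte takes months over a 1 Gbps link, and every collaborator who needs their own copy repeats that cost. Granting access to one shared copy avoids it.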
Standardization
Health care data is very complex, and a lot of effort has gone into standardizing data and how it is classified and shared. A simple example shows why this matters: two doctors at different hospitals need to exchange information about the same patient, which only works if both systems describe that information in the same way. Many industries require this kind of standardization effort, but health care is arguably the most difficult, because neither the human body nor the environment we live in is simple.
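The two-doctor example can be sketched as a translation step: each hospital keeps its own local names for lab tests, and both map them to a shared code system such as LOINC (Logical Observation Identifiers Names and Codes) before exchanging results. The local test names, the mapping tables, `LabResult`, and `to_standard` are hypothetical; the two LOINC codes are real examples but should be checked against the official LOINC database before any real use.

```python
from dataclasses import dataclass

# Hypothetical local test names used by two different hospitals.
# LOINC codes shown are examples; verify against the official LOINC table.
HOSPITAL_A_TO_LOINC = {
    "GLU": "2345-7",  # Glucose [Mass/volume] in Serum or Plasma
    "HGB": "718-7",   # Hemoglobin [Mass/volume] in Blood
}
HOSPITAL_B_TO_LOINC = {
    "Glucose, serum": "2345-7",
    "Hemoglobin": "718-7",
}


@dataclass(frozen=True)
class LabResult:
    """A lab result expressed with a shared (LOINC) code instead of a local name."""
    loinc_code: str
    value: float
    unit: str


def to_standard(local_name: str, value: float, unit: str, mapping: dict[str, str]) -> LabResult:
    """Translate a hospital-specific test name into a LOINC-coded result."""
    try:
        code = mapping[local_name]
    except KeyError:
        raise ValueError(f"no LOINC mapping for local test {local_name!r}") from None
    return LabResult(code, value, unit)


if __name__ == "__main__":
    a = to_standard("GLU", 95.0, "mg/dL", HOSPITAL_A_TO_LOINC)
    b = to_standard("Glucose, serum", 95.0, "mg/dL", HOSPITAL_B_TO_LOINC)
    print(a == b)  # True: both hospitals now describe the same test identically
```

Raising an error for unmapped names, rather than passing the local name through, keeps unrecognized tests from silently entering the shared record.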
There are many standards for health care data, depending on the topic or use. For coding laboratory results, the Logical Observation Identifiers Names and Codes (LOINC) standard is used. For diagnos