Data

The Shale Network is a project funded by the National Science Foundation to help scientists and citizens publish data about water resources that may be affected by shale gas development. The project was started in November 2011 by scientists from Penn State, the University of Pittsburgh, Dickinson College, and elsewhere in Pennsylvania.

"Our goal is to find, organize, and upload data for water resources for online publication." The Shalenetwork

Making Knowledge from Numbers

Since 2010, watershed associations have shown rapidly growing interest in collecting stream data in the region of Marcellus Shale development. Toward that end, members of the ShaleNetwork steering committee, Candie Wilderman and Julie Vastine (ALLARM, Dickinson College), have been working since 2010 to develop a volunteer-based Marcellus Monitoring protocol. Already, over 700 volunteers across the Shale region are collecting data. In addition to community-based data, data are being collected by county, state, and federal agencies, and by members of colleges, universities, and the gas industry. We have identified a great need to organize, use, and interpret these continuously accumulating data. The ShaleNetwork is developing a database of PA waters in the gas production region as a mechanism to bring together this network of research and citizen scientists and to understand the water quantity and quality data: to "make knowledge from the numbers."

ShaleNetwork Data Focus

The focus of the database is streamwater data, with lesser emphasis on groundwater, flowback water, production water, and formation water. We are collecting data from private water wells. Such sampling establishes “pre-drilling” water quality. Because these data are considered private, they are generally not available for use in the public realm. However, we have a strong relationship with the Marcellus Shale Coalition, an industry group that may allow us access to some groundwater data. Furthermore, several groundwater monitoring projects are being conducted by university researchers and, if time allows, these data will be included in the water quality database. We plan to devote part of the second and third workshops to identifying the best means of engaging watershed stakeholders in collecting meaningful groundwater quality data. These data would establish water quality conditions in areas of significant natural gas development for future data synthesis efforts.

Quality Assurance and Quality Control

We accept project data from associations, groups, or agencies in all formats. Data destined for the ShaleNetwork database will generally be stored first as Excel files. Pitt and Penn State will assemble these files for data entry into the CUAHSI Hydrologic Information System (HIS). The servers at Penn State and Pitt are secure and are backed up nightly.

The ShaleNetwork database team will accept data, organize them, and place them online so that citizens and researchers can understand whether development of shale gas is affecting water quality and quantity. Assessing the quality of data is very difficult, regardless of whether the data are collected by scientists or volunteers. Our overall philosophy will be to accept data and place them online for the community of citizens and researchers to evaluate. Levels of data quality will be indicated within the database as a metadata field. Not all data in the database will be of the same quality.

Quality assurance refers to measures taken to ensure that data meet quality standards; quality control refers to the actions implemented to achieve that quality. Our data quality objectives are that the data be credible and of sufficient value to support a timely response to problems. We accept data from volunteers who have received training from service providers in the state of PA such as ALLARM. This training generally consists of close examination of the monitoring manuals, laboratory training on equipment, and field training including chemical monitoring, flow measurement, and visual assessment. Meters for measurement of total dissolved solids (TDS) or conductivity are calibrated with standard solutions before each use and are stored according to manufacturer specifications between uses. Volunteers generally work with ALLARM to pass a split-sample quality control test annually. Specifically, monitors generally use TDS meters to test waters and then collect an extra set of water samples to send with their data to the ALLARM lab. At the lab, the water is tested using the monitors’ equipment as well as laboratory equipment, and the results are compared to the volunteer data. If precision is acceptable, the volunteers have passed quality control and can continue monitoring to provide data. If precision is not acceptable (outside limits), ALLARM re-trains the volunteers. All methods are documented. ALLARM is responsible for QA/QC with volunteer groups as appropriate.
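The split-sample comparison described above can be illustrated with a short Python sketch. It assumes precision is expressed as a relative percent difference (RPD) between the volunteer reading and the laboratory reading; both the metric and the 20 percent limit are hypothetical choices made for illustration, not ALLARM's documented acceptance criteria.

```python
def relative_percent_difference(field_value, lab_value):
    """Relative percent difference (RPD) between two measurements of one sample."""
    mean = (field_value + lab_value) / 2.0
    if mean == 0:
        return 0.0
    return abs(field_value - lab_value) / mean * 100.0


def passes_split_sample(field_value, lab_value, limit_percent=20.0):
    # limit_percent is a hypothetical acceptance limit chosen for this sketch.
    return relative_percent_difference(field_value, lab_value) <= limit_percent


# Hypothetical TDS readings in mg/L: volunteer meter vs. laboratory instrument.
print(passes_split_sample(210.0, 200.0))  # RPD is about 4.9 percent
print(passes_split_sample(300.0, 200.0))  # RPD is 40 percent
```

Under these assumed limits, the first pair would pass and the second would trigger re-training.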

ShaleNetwork Database Technologies

The ShaleNetwork database is growing to include data from published and unpublished sources, from citizen scientists, from county, state, and federal agencies, from industry, and from researchers. Metadata within the database indicate the source of the data. The database will be easily accessible through the CUAHSI HydroDesktop utility. HydroDesktop accesses many databases other than the ShaleNetwork, and many of these databases will also be useful for understanding water quality and quantity issues with respect to the Devonian shale gas plays. The ShaleNetwork team is working with other entities such as the Susquehanna River Basin Commission to help provide their data appropriately, either within the ShaleNetwork database or at least accessible by HydroDesktop.

We are using an existing standard for time series data: the Observations Data Model (ODM) (Horsburgh et al., 2008). We are also following the CUAHSI-HIS data and metadata standards available in the same reference. For GIS coverages and spatial data, we are following the Federal Geographic Data Committee’s (FGDC) US Federal Metadata standard, the Content Standard for Digital Geospatial Metadata (CSDGM), Version 2.

Publishing and Accessing Data with CUAHSI HIS

The CUAHSI HIS (Hydrologic Information System) is designed to manage and to publish data collected at fixed points, such as wells and surface water sampling stations. CUAHSI and the ShaleNetwork are establishing a HydroServer to host the ShaleNetwork database. CUAHSI already interacts heavily with the USGS, and USGS data are already a part of the CUAHSI HIS, or can be “ingested” easily. One of our collaborators, Jim Campbell (USGS PA Water Science Center), is facilitating our interactions in regard to USGS data for PA. Furthermore, CUAHSI has also worked with several state agencies around the country to “ingest” state data into the HIS.

We are using this service because we have data from various sources that we want to make accessible online in a standard way so that everyone can access and use them; this is called publishing the data. To do this, CUAHSI is going to build the ShaleNetwork a HydroServer. A HydroServer is essentially a Windows XP computer running HIS software designed for data publication. We are using HydroServer Lite because it does not require a commercial license. HydroServer Lite uses the Observations Data Model for data storage and a WaterOneFlow web service for data publication.

For some datasets that we receive from data providers, we are using the application ODM Data Loader (ODMDL) to load our data files into an ODM database. This method is appropriate for data that result from a completed project or study and will not need periodic updating. For data that are being continuously updated (e.g., data streaming from sensors in the field) we will use the ODM Streaming Data Loader (also free software). ODMDL accepts input data in table format (Excel, CSV, or tab-separated) and is designed to load data into ODM tables either one table at a time or as a bulk load from a single file into multiple tables at once. Once data are loaded into an ODM database, we can inspect them using the application ODM Tools, which provides query and visualization tools. If the data look good, we can then publish them with a WaterOneFlow web service. This service essentially hooks directly into an ODM database to publish data from that database. Data published this way are delivered in a standard output format called WaterML.
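As a rough illustration of the table format mentioned above, the following Python sketch turns a few spreadsheet-style records into CSV text with ODM-style column headers. The column names are taken from fields of the ODM DataValues table, but the exact set of columns ODMDL requires for a given file is an assumption here, as are the record keys, site and variable codes, and default IDs, all of which are hypothetical.

```python
import csv
import io
from datetime import datetime

# ODM-style column headers; which of these ODMDL needs is an assumption.
ODM_COLUMNS = ["DataValue", "LocalDateTime", "UTCOffset", "SiteCode",
               "VariableCode", "SourceID", "QualityControlLevelID"]


def to_odm_csv(records, utc_offset=-5, source_id=1, qc_level_id=0):
    """Return CSV text with one row per record (defaults are hypothetical)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ODM_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Parse the timestamp so malformed dates fail here, not at load time.
        when = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M")
        writer.writerow({
            "DataValue": float(rec["value"]),
            "LocalDateTime": when.strftime("%Y-%m-%d %H:%M:%S"),
            "UTCOffset": utc_offset,
            "SiteCode": rec["site"],
            "VariableCode": rec["variable"],
            "SourceID": source_id,
            "QualityControlLevelID": qc_level_id,
        })
    return buffer.getvalue()


# Made-up example records, not real monitoring data.
sample = [
    {"site": "SITE_A", "variable": "TDS", "timestamp": "2012-06-01 09:30", "value": "212"},
    {"site": "SITE_A", "variable": "TDS", "timestamp": "2012-06-08 09:45", "value": "198.5"},
]
print(to_odm_csv(sample))
```

Writing every dataset to one consistent set of columns before loading keeps the quality control level and source attached to each value as metadata.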

The next step after publishing is to register our WaterOneFlow web service so that people can discover and access our data. Registration is done at HIS Central, a website maintained by the CUAHSI HIS team. Our service will then be discoverable along with dozens of other web services registered with the system (including services for the USGS NWIS and the EPA STORET datasets). HIS Central is the largest single catalog of the nation’s water data. HIS Central includes a free and open source desktop application called HydroDesktop, which allows users to search for data across all registered data sources at once. Once we have registered our service and can see our data show up in HydroDesktop, we have completed the data publication process.
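Once a service is registered, client software retrieves its time series as WaterML. The Python sketch below shows how such a document might be read with the standard library. It assumes the WaterML 1.1 namespace and element layout (this page does not say which WaterML version the service will use), and the embedded XML is a hand-written, heavily simplified sample rather than real ShaleNetwork output.

```python
import xml.etree.ElementTree as ET

# Assumed WaterML 1.1 namespace; the service's actual version may differ.
WML_NS = "{http://www.cuahsi.org/waterML/1.1/}"

# Hand-written, simplified sample document with made-up values.
SAMPLE_WATERML = """
<timeSeriesResponse xmlns="http://www.cuahsi.org/waterML/1.1/">
  <timeSeries>
    <values>
      <value dateTime="2012-06-01T09:30:00">212</value>
      <value dateTime="2012-06-08T09:45:00">198.5</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""


def read_values(waterml_text):
    """Return (dateTime, value) pairs for every <value> element."""
    root = ET.fromstring(waterml_text.strip())
    pairs = []
    for node in root.iter(WML_NS + "value"):
        pairs.append((node.get("dateTime"), float(node.text)))
    return pairs


for timestamp, reading in read_values(SAMPLE_WATERML):
    print(timestamp, reading)
```

Because every registered service returns the same format, one reader of this kind can in principle handle data from any source in the catalog.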