Making Knowledge from Numbers
Interest among watershed associations in collecting stream data in the region of Marcellus Shale development has boomed since 2010. Toward that end, members of the ShaleNetwork steering committee, Candie Wilderman and Julie Vastine (ALLARM, Dickinson College), have been working over the same period to develop a volunteer-based Marcellus Monitoring protocol. Already, over 700 volunteers across the Shale region are collecting data. In addition to community-based data, data are being collected at the county, state, and federal agency levels, and by members of colleges, universities, and the gas industry. We have identified a great need to organize, utilize, and interpret the continuously accumulating data. The ShaleNetwork is working to develop a database of PA waters in the gas production region as a mechanism to pull together this network of research and citizen scientists and to understand the water quantity and quality data – to “make knowledge from the numbers”.
What are we doing now?
We are currently working on the following:
i) identifying and collecting data from groups collecting water data or from publications of data for any time period from the region of extraction of natural gas from Devonian shale in the PA-WV-NY-NJ-VA-OH region, with an emphasis on PA;
ii) working to create a sustainable network among these groups by arranging an annual meeting to discuss and organize this data;
iii) working with the Consortium of Universities for the Advancement of Hydrologic Sciences, Inc. (CUAHSI) to organize the data that we are collecting into a water database that can be used to establish background concentrations and to assess impacts across the shale gas play region;
iv) training two graduate students in database development and use for betterment of communities impacted by shale gas extraction;
v) facilitating ALLARM to help community groups in organizing, collecting, and interpreting water data;
vi) evaluating hydrogeochemical data using geographic information systems (GIS) that incorporate population and economic data in order to evaluate the potential for public health risk factors.
List of Analytes Included So Far in ShaleNetwork Database*
Common water quality measurements: pH, Na, K, Mg, Ca, Sulfate, Chloride, Bromide, NH4, Nitrate, Nitrite, Total N, Acidity, Alkalinity, BOD, COD, Hardness, TDS, Specific conductance, TSS
Trace elements: Al, As, Ba, Be, B, Cd, Cr, Co, Cu, Fe, Pb, Li, Mg, Mn, Hg, Mo, Ni, Se, Ag, Sr, Th, U, Zn
Naturally occurring radioactive material measurements: Gross alpha, Gross beta, Ra-226, Ra-228
Organic constituents: Acetophenone, Benzene, Bis(2-ethylhexyl) phthalate, Ethylbenzene, Ethylene glycol, Methanol, Methylene blue active substances, Naphthalene, Oil and grease, Phenolics, Toluene, Xylenes
*Not all analytes are included for every measurement
CUAHSI HIS
The CUAHSI HIS (Hydrologic Information System) is designed to manage and to publish data collected at fixed points, such as wells and surface water sampling stations. CUAHSI and the ShaleNetwork are establishing a HydroServer to host the ShaleNetwork database. CUAHSI already interacts heavily with the USGS, and USGS data are already a part of the CUAHSI HIS, or can be “ingested” easily. One of our collaborators, Jim Campbell (USGS PA Water Science Center), is facilitating our interactions in regard to USGS data for PA. Furthermore, CUAHSI has also worked with several state agencies around the country to “ingest” state data into the HIS.
What can you do to help if you have data?
We have begun inputting data into our database. We have input published data, new data, online data, and unpublished data. We seek datasets from community groups, researchers, government agencies, industry, and river commissions. If you would like to submit a database or begin working with us, please fill out this form or email the information or an inquiry to the Director of the ShaleNetwork, Sue Brantley:
Your name:
Your email address:
Name of your organization:
What specific measurements are you making?
Where are you making these measurements?
What protocol or quality control are you using?
How long have you been making measurements?
In what form are you keeping your data?
What help do you need with your data?
How soon are you interested in beginning to share your data?
More Background on the ShaleNetwork Database
The focus of the database will be streamwater data with lesser emphasis on groundwater, flowback water, production water, and formation water. We are collecting data from private water wells. Such sampling establishes “pre-drilling” water quality. Because these data are considered to be private, they are generally not available for use in the public realm. However, we have a strong relationship with the Marcellus Shale Coalition, an industry group that may allow us access to some groundwater data. Furthermore, several groundwater monitoring projects are being conducted by university researchers and, if time allows, these data will be included in the water quality database. We plan to devote some of the time of the second and third workshops to identifying the best means of engaging watershed stakeholders in collecting meaningful groundwater quality data, so that water quality conditions in areas of significant natural gas development are established for future data synthesis efforts.
Quality Assurance and Quality Control
We accept project data from associations, groups, or agencies in all formats. Data that will be input into the ShaleNetwork database will generally be stored first as Excel files. Pitt and Penn State will assemble these files for data entry into the CUAHSI Hydrologic Information System (HIS). The servers at PSU and Pitt are secure and are backed up nightly.
The ShaleNetwork database team will accept data and organize it and place it online so that citizens and researchers can understand whether development of shale gas is affecting water quality and quantity. Assessment of the quality of data is very difficult, regardless of whether data is collected by scientists or volunteers. Our overall philosophy will be to accept data and place it online for the community of citizens and researchers to evaluate. Levels of data quality will be indicated within the database as a metadata field. Not all data in the database will be of the same quality.
Quality assurance refers to measures taken to ensure data meets data quality standards; quality control means the actions implemented to achieve quality. Data quality objectives include that the data must be credible and of sufficient value for timely response to problems. We accept data from volunteers who have received training from service providers in the state of PA such as ALLARM. This training generally consists of close examination of the monitoring manuals, laboratory training on equipment, and field training including chemical monitoring, flow measurement, and visual assessment. Meters for measurement of total dissolved solids (TDS) or conductivity are calibrated with standard solutions before each use and are stored according to manufacturer specifications between uses. Volunteers generally work with ALLARM to pass a split sample quality control test annually. Specifically, monitors generally use TDS meters to test waters and then collect an extra set of water samples to send with their data to the ALLARM lab. At the lab, the water is tested using the monitors’ equipment as well as laboratory equipment, and the results are compared to the volunteer data. If precision is acceptable, volunteers have passed quality control and can continue monitoring to provide data. If precision is not acceptable (outside limits), ALLARM re-trains volunteers. All methods are documented. ALLARM is responsible for QC/QA with volunteer groups as appropriate.
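The split sample comparison described above can be sketched in a few lines of Python. This is an illustration only, not ALLARM's actual procedure: the use of relative percent difference as the precision measure, the 20% acceptance limit, and the names SplitSample and passes_split_sample are all assumptions made for the example, since the acceptance limits are not stated here.

```python
from dataclasses import dataclass


@dataclass
class SplitSample:
    site: str
    volunteer_tds: float  # mg/L, reading reported by the volunteer
    lab_tds: float        # mg/L, reading obtained at the lab on the split sample


def relative_percent_difference(a: float, b: float) -> float:
    """Return |a - b| as a percentage of the mean of a and b."""
    mean = (a + b) / 2.0
    if mean == 0:
        return 0.0  # both readings are zero, so they agree exactly
    return abs(a - b) / mean * 100.0


def passes_split_sample(sample: SplitSample, limit_pct: float = 20.0) -> bool:
    """True if volunteer and lab readings agree within limit_pct.

    The 20% default is a hypothetical limit chosen for illustration.
    """
    rpd = relative_percent_difference(sample.volunteer_tds, sample.lab_tds)
    return rpd <= limit_pct
```

With these assumptions, a volunteer reading of 100 mg/L against a lab reading of 120 mg/L would pass, while 100 mg/L against 150 mg/L would fall outside the limit and prompt re-training.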
ShaleNetwork Database, the nitty-gritty
The ShaleNetwork database is growing to include data from published and unpublished sources, from citizen scientists, from county, state, and federal agencies, from industry, and from researchers. Metadata within the database indicate the source of the data. The database will be easily accessible through the CUAHSI HydroDesktop utility. HydroDesktop accesses many databases other than the ShaleNetwork, and many of these databases will also have utility for understanding water quality and quantity issues with respect to the Devonian shale gas plays. The ShaleNetwork team is working with other entities such as the Susquehanna River Basin Commission to help provide their data appropriately, either within the ShaleNetwork database or at least accessible by HydroDesktop.
We are using an existing standard for time series data: the Observations Data Model (ODM) (Horsburgh et al., 2008). We are also following the CUAHSI-HIS data and metadata standards available in the same reference. For GIS coverages and spatial data, we are following the Federal Geographic Data Committee’s (FGDC) US Federal Metadata standard, the Content Standard for Digital Geospatial Metadata (CSDGM), Version 2.
Publishing and Accessing Data with CUAHSI HIS
We are using this service because we have data from various sources that we want to make accessible online in a standard way so that everyone can access and use it; this is called publishing the data. To do this, CUAHSI is going to build the ShaleNetwork a HydroServer. A HydroServer is essentially a Windows XP computer running HIS software designed for data publication. We are using HydroServer Lite because it does not require a commercial license. HydroServer Lite uses the Observations Data Model for data storage and a WaterOneFlow web service for data publication.
For some datasets that we are given by data providers, we are using the application ODM Data Loader (ODMDL) to load our data files into an ODM database. This method is appropriate for data that are the result of a project or study that has been completed and will not need periodic updating. For data that are being continuously updated (e.g., data streaming from sensors in the field) we will use the ODM Streaming Data Loader (also free software). ODMDL accepts input data in table format (Excel, CSV, or tab-separated) and is designed to load data into ODM tables either one table at a time or in bulk, from a single file into multiple tables at once. Once data are loaded into an ODM database, we can look at the data using the application ODM Tools, which provides query and visualization tools. If the data look good, we can then publish the data with a WaterOneFlow web service. This service essentially hooks directly into an ODM database to publish data from that database. Data published this way are delivered in a standard output format called WaterML.
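As a rough illustration of the table-format input described above, the following Python sketch turns a few field readings into a CSV table. It is not the ODMDL tool itself and does not call it. The column names are assumptions modeled on fields of the ODM DataValues table and should be checked against the ODMDL documentation before use; the helper names to_odm_row and write_csv, the site code, and the variable code are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed column names, modeled on the ODM DataValues table; verify before use.
COLUMNS = ["DataValue", "LocalDateTime", "UTCOffset",
           "DateTimeUTC", "SiteCode", "VariableCode"]


def to_odm_row(site_code, variable_code, value, local_time, utc_offset_hours):
    """Build one table row, deriving the UTC time from local time and offset."""
    local = datetime.strptime(local_time, "%Y-%m-%d %H:%M")
    utc = local - timedelta(hours=utc_offset_hours)
    return {
        "DataValue": value,
        "LocalDateTime": local.strftime("%Y-%m-%d %H:%M:%S"),
        "UTCOffset": utc_offset_hours,
        "DateTimeUTC": utc.strftime("%Y-%m-%d %H:%M:%S"),
        "SiteCode": site_code,
        "VariableCode": variable_code,
    }


def write_csv(rows):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical reading: TDS measured at 09:30 local time, five hours behind UTC.
row = to_odm_row("SITE01", "TDS", 142.0, "2012-05-01 09:30", -5)
csv_text = write_csv([row])
```

Keeping the local time, the offset, and the UTC time together in each row means that readings from different providers can be compared on one timeline without guessing the time zone.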
The next step after publishing is to register our WaterOneFlow web service. Registering is necessary in order for people to find and access our data (data discovery). We register the service at HIS Central, a website maintained by the CUAHSI HIS team. Our service will then be discoverable along with dozens of other web services registered with the system (including services for the USGS NWIS and the EPA STORET datasets). HIS Central is the largest single catalog of the nation’s water data. HIS Central includes a free and open source desktop application called HydroDesktop, which allows users to search for data across all registered data sources at once. Once we’ve registered our service and can see our data show up in HydroDesktop, we have completed the data publication process.
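Because published data are delivered as WaterML, which is XML, they can also be read with ordinary XML tools. The Python sketch below parses a small hand-written fragment and pulls out the time-stamped values. The fragment is a simplified stand-in written for this example, not output from a real service, and the namespace and element names are assumptions loosely modeled on WaterML 1.1; a real response carries much more metadata. The function name read_values is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace, modeled on WaterML 1.1; check against a real response.
NS = {"wml": "http://www.cuahsi.org/waterML/1.1/"}

# Simplified, hand-written fragment used only for illustration.
SAMPLE = """<timeSeriesResponse xmlns="http://www.cuahsi.org/waterML/1.1/">
  <timeSeries>
    <values>
      <value dateTime="2012-05-01T09:30:00">142.0</value>
      <value dateTime="2012-06-01T10:00:00">155.5</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""


def read_values(waterml_text):
    """Return a list of (dateTime, value) pairs found in the document."""
    root = ET.fromstring(waterml_text)
    return [
        (element.get("dateTime"), float(element.text))
        for element in root.iterfind(".//wml:value", NS)
    ]


readings = read_values(SAMPLE)
```

Applications such as HydroDesktop handle this step for the user; the sketch only shows that a standard output format lets any program read the data in the same way, regardless of who supplied it.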

