CLIMATE DATABASE PROJECT: A STRATEGY FOR IMPROVING INFORMATION ACCESS ACROSS RESEARCH SITES
Donald L. Henshaw
U.S. Forest Service Pacific Northwest Research Station, 3200 SW Jefferson, Corvallis, OR 97331
Maryan Stubbs and Barbara J. Benson
Center for Limnology, University of Wisconsin-Madison, Madison, WI 53706
Karen Baker
Scripps Institution of Oceanography, University of California-San Diego, La Jolla, CA 92093
Darrell Blodgett
Forest Soils Laboratory, University of Alaska, Fairbanks, AK 99775
John H. Porter
Department of Environmental Sciences, University of Virginia, Charlottesville, VA 22903
Presented August 9, 1997, Albuquerque, New Mexico, at the workshop on
Data and Information Management in the Ecological Sciences: A Resource Guide
Abstract. To facilitate intersite research among the network of Long-Term Ecological Research sites, information managers are exploring strategies for linking individual site information systems. A prototype to provide climatic summaries dynamically has been developed and serves as one model for improving access to data across sites. Individual sites maintain local climate data in local information systems while a centralized site continually updates and provides access to all sites’ data through a common database. Common distribution report formats have been established to meet specific needs of climate data users.
Keywords: data access, data exchange, intersite data, climate data
INTRODUCTION
Information Managers associated with the Long-Term Ecological Research (LTER) program have developed a basic Network Information System (NIS) with a primary goal of facilitating intersite research (Stafford et al. 1994). To accommodate the needs of various intersite studies and synthesis efforts within the LTER, it is considered critical to develop dynamic systems for providing comparable data from multiple LTER sites. Improving access and adding query capability to intersite data using network information servers is a major component of current NIS development (Brunt 1996). With each site operating its own information management system, the LTER NIS will employ a variety of strategies in linking these individual systems (Porter et al. 1997).
Climate data are collected at all LTER sites and are frequently requested. Synthesis groups need ready access to climatic summaries from multiple sites. A NIS prototype to provide climatic summaries dynamically has been developed and serves as one model for improving access to data across sites. This approach allows individual sites to maintain their climate data in local information systems while a centralized site continually updates, and provides access to, all sites' data through a common database.
BACKGROUND
A standards document developed by the LTER Climate Committee (Greenland 1986) established baseline meteorological measurements to characterize each LTER site. Standardized measurements provide a basis for coordinating meteorological measurements at two or more sites and enable intersite comparisons. More recently, a project to conduct climatic analyses of the LTER sites (CLIMDES) gathered individual site temperature and precipitation data (1960-1990) and created on-line monthly summaries for each site (Greenland et al. 1997). While the CLIMDES project satisfied an immediate need for access to monthly site climate data, its structure provided no way to maintain and update these summaries or to satisfy frequent requests for daily climate data. Most of the LTER sites had their climate data available on the World-Wide Web (WWW), but the data sets were sometimes difficult to find and were formatted and aggregated differently from site to site.
The NSF-funded XROOTS project requires intersite climate data to synthesize belowground productivity using root biomass data from multiple sites. An XROOTS climate workshop (Bledsoe et al. 1996) explored the idea of distributing data in report formats suited to users' needs, independent of the format in which the data are stored. Two monthly distribution report formats were recommended to accommodate both spreadsheet (V-One) and database (V-Many) users (see Table 1).
OVERVIEW
As part of the LTER Information Managers' NIS development, the LTER climate database project (ClimDB) has developed a prototype that uses the WWW to harvest daily climate data, in a standardized exchange format, from a subgroup of LTER sites. The harvested data are stored in a centralized relational database. Climate variables include daily minimum, maximum, and mean air temperature and daily precipitation. Applications have initially been developed to generate the two XROOTS monthly distribution formats from this centralized database of daily values. Additionally, a webpage (http://www.fsl.orst.edu/climhy) has been created to provide access to the daily and monthly climate data and to permit queries by LTER site, weather station, and date.
SPECIFIC EXCHANGE AND DISTRIBUTION FORMATS
Each of the five sites participating in the prototype development provided climate data files in a standardized daily exchange format at an Internet address (URL). For this model, the site files could be either static or produced by a dynamic script. A comma-delimited format was agreed upon after discussions revealed the diversity of approaches, opinions, and needs among sites. For instance, a date can be stored as a single 8-character field, as separate comma-delimited year, month, and day fields, or as a Julian day. It is important to note that there is no single "right" exchange format. The primary criterion is that individual sites can easily "filter" local site data into the exchange format. The standardized daily exchange format agreed upon is as follows:
Site, station, date, value1, flag1, value2, flag2, value3, flag3, value4, flag4
where,
site the three-letter LTER site code
station that site’s name for the weather station
date 8-character field, yyyymmdd
value1, flag1 mean air temperature and corresponding flag
value2, flag2 maximum air temperature and corresponding flag
value3, flag3 minimum air temperature and corresponding flag
value4, flag4 precipitation and corresponding flag
All temperature values are reported in degrees Celsius and precipitation in millimeters. Each value has a corresponding data quality flag where flags are coded as follows:
G or blank   value is good
E value is estimated
Q value is questionable
M value is missing
T trace value (for precipitation only)
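As an illustration, a harvesting site might check each incoming record against this format before loading it. The Python sketch below assumes only the field order, the yyyymmdd date, and the flag codes described above; the function name, the variable names (mean_temp_c and so on), and the choice to reject malformed lines with ValueError are hypothetical. Whitespace around fields is stripped so that space-aligned files, like the example that follows, still parse.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Flag codes from the format description; a blank flag is read as "G".
VALID_FLAGS = {"G", "E", "Q", "M", "T"}
# Hypothetical names for the four value/flag pairs, in exchange-format order.
VARIABLES = ("mean_temp_c", "max_temp_c", "min_temp_c", "precip_mm")


@dataclass
class DailyRecord:
    site: str      # three-letter LTER site code
    station: str   # the site's own station name
    date: str      # yyyymmdd
    values: Dict[str, Tuple[Optional[float], str]]  # name -> (value, flag)


def parse_exchange_line(line: str) -> DailyRecord:
    """Parse one comma-delimited daily exchange record of 11 fields."""
    fields = [f.strip() for f in line.rstrip("\r\n").split(",")]
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}: {line!r}")
    site, station, date = fields[0], fields[1], fields[2]
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"date must be yyyymmdd: {date!r}")
    values = {}
    for i, name in enumerate(VARIABLES):
        raw_value, raw_flag = fields[3 + 2 * i], fields[4 + 2 * i]
        flag = raw_flag or "G"
        if flag not in VALID_FLAGS:
            raise ValueError(f"unknown flag {raw_flag!r} for {name}")
        if flag == "T" and name != "precip_mm":
            raise ValueError("trace flag T applies to precipitation only")
        # An empty value (e.g. a missing day) is stored as None.
        values[name] = (float(raw_value) if raw_value else None, flag)
    return DailyRecord(site, station, date, values)
```

Strict rejection keeps bad records out of the central database; a production harvester might instead log and skip them.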
Here is a brief example of the daily format from the Andrews Forest (AND) site’s Primary Meteorological Station (PRIMET) aligned for readability:
AND,PRIMET,19960101,6.8, ,10.8,Q,4.5, , 0.0,T
AND,PRIMET,19960102,5.3, ,10.6,Q,0.8, , 4.3,
AND,PRIMET,19960103,7.7, , 9.7, ,4.1, ,20.6,
AND,PRIMET,19960104,4.2, , 6.7, ,2.4, ,11.4,
AND,PRIMET,19960105,4.8,E, 7.4,E,2.7,E, ,M
AND,PRIMET,19960106,5.7,E, 9.7,E,1.3,E, ,M
Daily climate data are harvested automatically from the participating sites using a simple script that calls the www line-mode browser. An example of the harvest command line for the Andrews Forest climate data is:
www -n -source http://www.fsl.orst.edu/lter/webmast/and_clim.txt >and.dat
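The same harvest step could be written with Python's standard-library urllib.request in place of the www browser. This is a swapped-in technique, not the project's script; the harvest function name and its dropping of blank lines are assumptions.

```python
import urllib.request
from pathlib import Path


def harvest(url: str, out_path: str, timeout: float = 60.0) -> int:
    """Fetch one site's daily exchange file and save it locally.

    Plays the role of `www -n -source URL > out_path`; blank lines are
    dropped (an assumption, not part of the original harvest command).
    Returns the number of lines written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    Path(out_path).write_text("".join(line + "\n" for line in lines),
                              encoding="utf-8")
    return len(lines)
```

For example, harvest("http://www.fsl.orst.edu/lter/webmast/and_clim.txt", "and.dat") would mirror the command above, provided that URL were still being served. A nightly job could loop over one such call per participating site.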
Data are stored in a relational database at the centralized site. Application programs produce the two monthly distribution tables (see Table 1). A webpage allows the user to query for daily data in addition to providing the two monthly tables. Monthly summary values are displayed along with the number of valid daily values included in each summary. Missing and questionable values are excluded from the summaries. Listing the number of valid daily values used to calculate a monthly value gives the user a basis for judging that value's reliability and is a valuable addition to any distribution format.
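A minimal sketch of how monthly summaries and their valid-day counts could be computed inside a central relational database, here SQLite through Python's sqlite3 module. The one-row-per-variable table layout and all column names are hypothetical. The filter follows the rule above (missing and questionable values excluded, estimated values kept); two further assumptions are that trace (T) precipitation counts as valid and that temperature summaries are averages of the daily values while precipitation is summed.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical layout: one row per site, station, date, and variable.
SCHEMA = """
CREATE TABLE daily (
    site     TEXT NOT NULL,
    station  TEXT NOT NULL,
    date     TEXT NOT NULL,  -- yyyymmdd
    variable TEXT NOT NULL,  -- e.g. 'mean_temp_c', 'precip_mm'
    value    REAL,           -- NULL when the value is missing
    flag     TEXT NOT NULL   -- G, E, Q, M, or T
);
"""

# Excludes missing (M) and questionable (Q) values; keeps estimated (E)
# values. Keeping trace (T) values is an assumption. Precipitation is
# summed; temperatures are averaged over the valid days (assumption).
MONTHLY_SQL = """
SELECT site, station,
       substr(date, 1, 4) AS year,
       substr(date, 5, 2) AS month,
       variable,
       CASE WHEN variable = 'precip_mm' THEN SUM(value)
            ELSE AVG(value) END AS summary,
       COUNT(value) AS n_valid
FROM daily
WHERE flag NOT IN ('M', 'Q') AND value IS NOT NULL
GROUP BY site, station, substr(date, 1, 4), substr(date, 5, 2), variable
ORDER BY site, station, year, month, variable
"""

Row = Tuple[str, str, str, str, Optional[float], str]


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_daily(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """rows: (site, station, date, variable, value, flag) tuples."""
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO daily VALUES (?, ?, ?, ?, ?, ?)", rows)


def monthly_summaries(conn: sqlite3.Connection) -> List[tuple]:
    """Return (site, station, year, month, variable, summary, n_valid) rows."""
    return conn.execute(MONTHLY_SQL).fetchall()
```

Note that a month with no valid values produces no row at all here; a fuller version would report it with a count of zero.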
Table 1. Examples of the two monthly distribution tables (V-One and V-Many) are shown for the Andrews Forest (AND) site’s Primary Meteorological Station (PRIMET). The "#" indicates the number of valid daily values (including estimated values) that were used in calculating the monthly summary value.
V-One displays one variable per table and is primarily intended for use in spreadsheets. These two abbreviated examples show mean monthly air temperature and total precipitation.
V-One
AND PRIMET Avg_mean_air_temp_c
Year  Jan   #     Feb   #     Mar   #     Apr   #     May   #     . . . Nov   #     Dec   #
1991  0.1   31    5.8   28    4.5   31    6.9   30    10.0  31    . . . 6.5   30    3.2   31
1992  3.3   29    5.8   29    8.1   30    10.0  30    15.0  31    . . . 5.0   30    1.0   31
1993  -0.6  31    0.6   28    6.0   31    7.7   30    13.2  31    . . . -0.8  30    -0.2  30

AND PRIMET Totl_precip_mm
Year  Jan   #     Feb   #     Mar   #     Apr   #     May   #     . . . Nov   #     Dec   #
1991  232   31    208   28    221   31    242   30    195   31    . . . 451   30    214   31
1992  160   31    201   29    40    31    290   30    20    31    . . . 377   30    419   31
1993  242   31    95    28    354   31    394   30    237   31    . . . 103   30    278   31
V-Many displays many variables per table and is primarily intended for use in relational databases. This example includes all four prototype variables of monthly mean, minimum, and maximum air temperature and total monthly precipitation.
V-Many
AND PRIMET
Year  Month Mean  #     Max   #     Min   #     Ppn   #
1991  Jan   0.1   31    5.3   31    -3.0  31    232   31
1991  Feb   5.8   28    12.4  28    2.1   28    208   28
1991  Mar   4.5   31    11.2  31    0.3   31    221   31
1991  Apr   6.9   30    13.3  30    2.6   30    242   30
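Both distribution layouts are pivots of the same set of monthly summaries. The sketch below assumes those summaries arrive as (year, month, value, valid-count) tuples and builds V-One and V-Many rows as lists of strings. The function names and the empty-cell convention for months with no valid data are hypothetical, and writing the rows out (for example comma- or tab-delimited) is left to the caller.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# One monthly summary: (year, month 1-12, summary value or None, valid-day count)
Monthly = Tuple[int, int, Optional[float], int]


def _fmt(value: Optional[float], decimals: int) -> str:
    # Empty cell when no valid daily values were available.
    return "" if value is None else f"{value:.{decimals}f}"


def v_one_rows(monthly: Iterable[Monthly], decimals: int = 1) -> List[List[str]]:
    """One variable per table: a row per year, a value/# column pair per month."""
    by_year: Dict[int, Dict[int, Tuple[Optional[float], int]]] = defaultdict(dict)
    for year, month, value, n_valid in monthly:
        by_year[year][month] = (value, n_valid)
    rows = [["Year"] + [c for m in MONTHS for c in (m, "#")]]
    for year in sorted(by_year):
        row = [str(year)]
        for month in range(1, 13):
            value, n_valid = by_year[year].get(month, (None, 0))
            row += [_fmt(value, decimals), str(n_valid)]
        rows.append(row)
    return rows


def v_many_rows(monthly_by_var: Dict[str, Iterable[Monthly]],
                columns: List[Tuple[str, str, int]]) -> List[List[str]]:
    """Many variables per table: a row per year and month.

    columns: (variable name, column label, decimals) in display order.
    """
    table: Dict[Tuple[int, int], Dict[str, Tuple[Optional[float], int]]] = defaultdict(dict)
    for var, monthly in monthly_by_var.items():
        for year, month, value, n_valid in monthly:
            table[(year, month)][var] = (value, n_valid)
    rows = [["Year", "Month"] + [c for _, label, _ in columns for c in (label, "#")]]
    for year, month in sorted(table):
        row = [str(year), MONTHS[month - 1]]
        for var, _, decimals in columns:
            value, n_valid = table[(year, month)].get(var, (None, 0))
            row += [_fmt(value, decimals), str(n_valid)]
        rows.append(row)
    return rows
```

Keeping the "#" count next to every value in both layouts preserves the reliability information regardless of which table a user downloads.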
METADATA
Every meteorological station will be described in a central metadata database. An entity-relationship diagram (see Figure 1) shows the proposed schema for the metadata database. LTER site-level information, individual station descriptions, and specific measurement documentation form the three major entities. Standardized web forms will be used to collect this information from participating sites, and metadata term definitions will be made available on the central webpage. Metadata will be critical in intersite studies for evaluating key differences in site descriptions and methodology.
Figure 1. Proposed schema for the metadata database.
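One way the three entities could be expressed as relational tables is sketched below with SQLite. Only the entity names (site, station, measurement documentation) come from the text; every column and key choice is a hypothetical placeholder, since the proposed schema itself appears only in Figure 1. Foreign keys tie each station to a site and each documented measurement to a station.

```python
import sqlite3

# Hypothetical columns; the text names only the three entities.
METADATA_SCHEMA = """
CREATE TABLE site (
    site_code   TEXT PRIMARY KEY,   -- three-letter LTER site code, e.g. 'AND'
    site_name   TEXT NOT NULL
);
CREATE TABLE station (
    site_code   TEXT NOT NULL REFERENCES site(site_code),
    station     TEXT NOT NULL,      -- the site's own station name, e.g. 'PRIMET'
    description TEXT,
    PRIMARY KEY (site_code, station)
);
CREATE TABLE measurement (
    site_code   TEXT NOT NULL,
    station     TEXT NOT NULL,
    variable    TEXT NOT NULL,      -- hypothetical name, e.g. 'precip_mm'
    method      TEXT,               -- instrument and methodology notes
    PRIMARY KEY (site_code, station, variable),
    FOREIGN KEY (site_code, station) REFERENCES station(site_code, station)
);
"""


def create_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite enforces foreign keys only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(METADATA_SCHEMA)
    return conn
```

With foreign keys enforced, a station cannot be registered for an unknown site, which is the kind of consistency standardized web forms would rely on.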
CONCLUSIONS
With an increasing focus on intersite activities within the LTER program, the LTER Information Managers are developing a Network Information System to facilitate intersite research. This LTER NIS prototype for climate data will serve as a model for other efforts to integrate intersite data sets. The approach accommodates the diversity of information management systems across the LTER network: data sets remain distributed across multiple sites but are accessible in common distribution formats from a central site. Specially formatted distribution reports have been established to meet the specific needs of climate data users, and the design is extensible, so additional formats can be added as the need arises.
ACKNOWLEDGMENTS
The authors would like to acknowledge contributions from the North Temperate Lakes LTER site for participating in the development of this prototype and for supporting the centralized database and web pages. Contributions from the H. J. Andrews Experimental Forest, the Bonanza Creek Experimental Forest, Palmer Station, and the Virginia Coast Reserve LTER sites are also recognized for participating in the development of this prototype. LTER sites are funded in whole or in part by the National Science Foundation. We also wish to acknowledge Caroline Bledsoe for her strong support and continued interest in this project.
LITERATURE CITED
Bledsoe, C., J. Hastings, and R. Nottrott. 1996. Xclimate workshop, Davis, California, USA [Online]. Available: http://www.lternet.edu/documents/reports/Xroots/aclim.htm [1997, September 18].
Brunt, J. W. 1996. Developing an LTER Network Information System for the 21st century [Online]. Available: http://www.lternet.edu/is/ [1997, September 18].
Greenland, D. 1986. Standardized meteorological measurements for Long-Term Ecological Research sites. Bulletin of the Ecological Society of America 67:275-277.
Greenland, D., T. Kittel, B. P. Hayden, and D. S. Schimel. 1997. A climatic analysis of Long-Term Ecological Research sites [Online]. Available: http://lternet.edu/documents/Publications/climdes/index.html [1997, September 18].
Porter, J., D. L. Henshaw, and S. G. Stafford. 1997. Research Metadata in Long-Term Ecological Research (LTER). In Proceedings of the Second IEEE Metadata Conference, Silver Spring, Maryland, USA [Online]. Available: http://computer.org/conferen/proceed/meta97/list_papers.html [1997, September 18].
Stafford, S. G., J. W. Brunt, and W. K. Michener. 1994. Integration of scientific information management and environmental research. Pages 3-19 in S. G. Stafford, J. W. Brunt, and W. K. Michener, editors. Environmental Information Management and Analysis: Ecosystem to global scales. Taylor & Francis, Bristol, Pennsylvania, USA.