The Andrews LTER Information Management System
14 March 2014
Information System: The Andrews LTER Information Management Team has developed an information system to support the collection, documentation, management, and archival of a rich and diverse collection of Andrews LTER and other environmental data. The central component of the information system is the Forest Science Data Bank (FSDB), a long-term data repository initiated in 1983 (Stafford et al. 1984, 1988, Henshaw and Spycher 1999, Henshaw et al. 2002), which is supported by the Andrews LTER in partnership with the U.S. Forest Service Pacific Northwest Research Station (PNW) and the OSU College of Forestry (COF). The FSDB includes over 200 active and legacy study databases and features a highly structured metadata database. Of these, approximately 160 are active LTER study databases, including all "signature" data: the core Andrews LTER databases comprising all key long-term and ongoing data collections. Ongoing long-term data sets, such as meteorological station, stream gauging station, stream chemistry, fish population, permanent vegetation plot, and decomposition plot data, are collectively managed by LTER PIs, staff, and the IM Team. Established workflow paths ensure frequent data updates, quality control, and archival into the FSDB and the LTER NIS. Data contributions from individual LTER PIs and graduate students require more specific planning and interaction between the PI and the IM Team, and software tools are used to facilitate capture of these data. In addition to continuing updates of existing data, five new LTER study databases are added each year on average. Other components of the information system include a generic, metadata-driven quality control system, an administrative interface for LTER members, data submission tools, and dynamic web pages for discovery of and access to data and informational products.
The information system manages study databases and research publications, and extensions to include the image library and Andrews-related museum collections are being considered.
FSDB study data and metadata: The FSDB contains "signature" and other LTER-related data sets from the Andrews Forest. All LTER data sets are routinely placed on-line under the terms of our data access policy. The FSDB has also opportunistically captured other important data sets from OSU and the Forest Service, and continues to house significant legacy data collections that are not available on-line due to priority status or quality control issues. Metadata are established in compliance with the LTER metadata standard, the Ecological Metadata Language (EML), and follow LTER "best practice" recommendations. Software tools map elements from the relational metadata database into EML, and similarly map ESRI metadata from the FGDC spatial standard into EML. EML metadata are regularly harvested from the Andrews LTER into the central metadata repository (Metacat) at the LTER Network Office (LNO), which ensures that Andrews data are available for network-wide data searches. In January 2014, the LNO rolled out its newly developed data archive, the Provenance Aware Synthesis Tracking Architecture (PASTA). Data sets uploaded into PASTA have passed a set of quality checks that ensure congruence between data and metadata. All Andrews LTER data sets will be uploaded into PASTA. EML files are easily mapped into the NBII Biological Data Profile metadata standard using stylesheet software at the LNO and are discoverable through the NBII clearinghouse. Recent improvements to our data archive system include the ability to manage data and metadata versions together: each data set (typically a CSV file) is archived with its corresponding EML metadata file, and the most current version is available for download. For some of our larger databases, such as climate and hydrology, data are accessed through an interactive application that allows querying by individual probe and date range. Downloads are tracked through a minimal user registration system.
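The data-metadata congruence idea can be illustrated with a minimal sketch. The function below compares the attribute names declared in an EML document against the header row of its companion CSV file; the file layout and element names are simplified assumptions for illustration, not the actual PASTA congruency checker.

```python
import csv
import xml.etree.ElementTree as ET

def check_congruence(eml_path, csv_path):
    """Compare attributeName elements declared in an EML file against
    the header row of its companion CSV file (a simplified version of
    the kind of check a congruency checker performs)."""
    tree = ET.parse(eml_path)
    declared = [e.text.strip() for e in tree.iter("attributeName")]
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return {
        # documented in the metadata but absent from the data file
        "missing_in_csv": sorted(set(declared) - set(header)),
        # present in the data file but undocumented in the metadata
        "undocumented": sorted(set(header) - set(declared)),
    }
```

A data set passes this sketch's check when both lists come back empty; any mismatch would be reported to the data provider before the upload is accepted.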
Quality control system: This system consists of a set of simple procedures that provide generic, metadata-driven data validation. A desktop control program reads the relevant metadata for any given data table and generates appropriate validation code. The control program then executes the generated code and records any discrepancies between the data and its metadata description in an error report. Validation includes checks of the primary key for nulls and duplicates (entity integrity), checks against listed numeric ranges or enumerated codes (domain integrity), and database rules. Rules are typically specific to individual databases and often have been "discovered" with the help of database owners. Generic rules are employed in time-series contexts, but most rules are shared across databases only occasionally. PASTA takes a similar approach with its "EML congruency checker", which provides EML-driven metadata and data validation.
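The entity- and domain-integrity checks described above can be sketched as follows. The metadata record and field names here are illustrative assumptions, not the FSDB's actual schema; a real run would read the rules from the relational metadata database rather than a hard-coded dictionary.

```python
# Illustrative metadata record for one data table (hypothetical layout).
METADATA = {
    "primary_key": ["SITECODE", "DATE"],
    "attributes": {
        "SITECODE": {"enum": {"GSWS01", "GSWS02"}},   # enumerated codes
        "DATE": {},
        "TEMP_C": {"min": -30.0, "max": 45.0},        # numeric range
    },
}

def validate(rows, meta):
    """Metadata-driven validation: entity integrity (primary-key nulls
    and duplicates) and domain integrity (ranges and enumerated codes).
    Returns (row_number, message) tuples for the error report."""
    errors, seen = [], set()
    for i, row in enumerate(rows, start=1):
        key = tuple(row.get(k, "") for k in meta["primary_key"])
        if any(v == "" for v in key):
            errors.append((i, "null in primary key"))
        elif key in seen:
            errors.append((i, "duplicate primary key"))
        seen.add(key)
        for name, rules in meta["attributes"].items():
            val = row.get(name, "")
            if "enum" in rules and val not in rules["enum"]:
                errors.append((i, f"{name}: code {val!r} not in list"))
            if "min" in rules and val != "":
                x = float(val)
                if not (rules["min"] <= x <= rules["max"]):
                    errors.append((i, f"{name}: {x} out of range"))
    return errors
```

Database-specific rules (e.g., "flow must be zero when the gauge is flagged dry") would be appended to the generated checks in the same fashion.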
Administrative interface: An improved administrative interface has been implemented that allows interactive site-member submission of study metadata, management of personnel profiles, and management of research projects, including an online project application form. This interface is designed to improve the efficiency of IM operations by reducing the staff time dedicated to updating study metadata and personal information. Recent extensions have enabled the entry of publication citations. All site publications are entered through the interface rather than being managed in a separate citation management system, which allows immediate online display and PDF availability.
Data submission: The IM team has developed a web page that provides instructions and other references to facilitate the submission of study data from site PIs, graduate students, and other researchers. Instructions assist a data provider in entering study metadata through the administrative interface and in describing spatial entities. A spreadsheet template is also provided to capture specific entity and attribute information. Desktop software tools import the Excel template into the metadata framework and allow further editing of the metadata. The information system draws upon a local controlled vocabulary for both place and theme keywords and a reference list of common units of measurement to promote consistency of data set descriptions and to avoid redundant descriptions of site locations.
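A minimal sketch of the template-import step might look like the following: rows from the entity/attribute spreadsheet are loaded into metadata records, with each unit checked against the reference unit list. The column names and unit vocabulary here are illustrative assumptions, not the actual Andrews template layout.

```python
# Hypothetical reference list of standard units (illustrative only).
STANDARD_UNITS = {"meter", "celsius", "litersPerSecond", "dimensionless"}

def import_attributes(template_rows):
    """Map spreadsheet-template rows into attribute metadata records,
    flagging units that are not in the reference unit list."""
    attributes, problems = [], []
    for row in template_rows:
        unit = row.get("unit", "")
        if unit and unit not in STANDARD_UNITS:
            problems.append(f"{row['attributeName']}: nonstandard unit {unit!r}")
        attributes.append({
            "name": row["attributeName"],
            "definition": row.get("definition", ""),
            "unit": unit,
        })
    return attributes, problems
```

Flagged units would be resolved with the data provider (or added to the reference list) before the metadata are loaded, which is how a shared unit list keeps data set descriptions consistent across studies.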
Web pages: Web page development has been an important activity for the site because the web site serves as an organizational framework for the display of research products and is a primary source of site information for both local and broader research-community users. Andrews LTER personnel maintain and update extensive web pages describing the Andrews Forest, ongoing LTER and collaborative research, personnel, site data sets and associated metadata, publication lists with links to scanned documents, education and outreach, and other current events and activities. All site web pages are generated dynamically with web integration software, taking advantage of metadata tables that describe content, page templates, and navigation bars. A web site search engine is employed, and various interfaces permit additional searching for data and publications using either simple search strings or established relations with researcher, place, or theme keywords. Social media (Facebook, Twitter, RSS feed) have been incorporated into the web presence. Google Analytics has been added to the web pages to track visitor numbers, user access flow, and highly trafficked pages.