Information Management

Information Management plays a major role at the H. J. Andrews Experimental Forest Long-Term Ecological Research (LTER) site. Intensive forest ecosystem research, conducted on the Andrews Forest since the 1950s, has resulted in many diverse, long-term ecological databases and a strong commitment to information management. Hundreds of ecological databases are managed through the Forest Science Data Bank (FSDB), which has long been an essential component of the Andrews LTER and is jointly sponsored by the OSU College of Forestry and the US Forest Service Pacific Northwest Research Station (PNW). The FSDB is managed through an information management system that supports the collection, quality control, archival, and long-term accessibility of collected data and associated metadata. The Andrews LTER maintains an extensive bibliography and image library, and personnel and keyword databases are managed for integrated web searches.

 

Learn about how the Andrews Forest LTER manages information:

 


Andrews LTER Information Management Summary

Information Management Philosophy and Objectives

The mission of Andrews LTER Information Management is to ensure that all LTER research data will be archived and openly available for the future. The primary goals are to 1) preserve high-quality and well-documented data collections that are both secure and accessible, 2) serve the Andrews and broader community through the development and management of informational products and tools, and 3) provide leadership and participation in relevant committees and activities at both the site and LTER Network level. The following objectives illustrate how aspects of the primary goals are achieved.

  • Assure preservation of high quality metadata and data products through the direction and maintenance of a long-term data repository that adheres to LTER standards and best practices and provides security through regular maintenance and backup procedures.
  • Maintain and adhere to a Data Access Policy in compliance with LTER standards to assure that data and accompanying metadata are freely and publicly available electronically within two years of collection at both the local site and within the LTER Network Information System (NIS).
  • Assure regular contributions of research data into the long-term data repository through strong integration of the IM Team with site science including 1) participation and regular reporting at site executive meetings and monthly meetings, 2) annual reviews of Information Management, and 3) regular interactions and trainings conducted with co-PIs and community members.
  • Develop and maintain the Andrews LTER web pages and associated user interfaces that provide access to data and metadata, research publications, programs and projects, site and personnel information, education and outreach programs, community events, and other site information.
  • Provide leadership to the LTER Network in the development of standards and best practices for the NIS and participation in community projects and systems that promote the discovery, use, and integration of LTER data, both within the network and throughout the broader research community.

People and Institutions

Information Management is an essential component of the Andrews LTER program and benefits from an institutional partnership between the Oregon State University College of Forestry (COF) and the USFS Pacific Northwest Research Station (PNW). The current Andrews LTER Information Management Team reflects this long-term partnership:

  • Suzanne Remillard (IM Team Leader/Databases & System Development, LTER/COF)
  • Stephanie Schmidt (IM Programming/Databases & Streaming System Development, PNW)
  • Adam Kennedy (Site System Administrator/Wireless Communications, LTER/COF)
  • Hans Luh (System Admin/Programmer, 0.25 FTE, LTER/COF)

Two other field technicians (1 PNW, 1 LTER/COF) serve IM roles in supporting data loggers and field computers used in routine data collection, describing collection methods, and providing data. A PNW administrative assistant tracks Andrews publications and maintains the LTER bibliography. NSF supplements are used on occasion to contract for specific application development.

Impacts of Information Management Activities

Perhaps the greatest impact is demonstrated through the persistence and growth of the Forest Science Data Bank. The FSDB was established in 1980 and has been largely funded and operated by LTER personnel since the mid-1990s. The FSDB has opportunistically acquired non-LTER data and includes well over 250 databases, with more than 170 databases online (mostly LTER). A stable computing environment and an information system with desirable features, such as adherence to national metadata standards, have allowed the FSDB to expand its LTER data resource holdings into a regional data center. Holdings now include key US Forest Service Research data (e.g., Research Natural Areas and Experimental Forests), USFS campaign data (e.g., Demonstration of Ecosystem Management Options (DEMO) and Mount St. Helens), National Forest System data (Young Stand Study), OSU COF data (e.g., OSU McDonald Forest), and the Long-Term Permanent Vegetation Plot Network (OSU, PNW, UW). NSF-funded grants in ecosystem informatics, such as the IGERT and summer institute (EISI) programs, have broadened campus-wide perspectives on information management and cyberinfrastructure issues, and Andrews data have been essential in student projects (e.g., quality control of high-volume streaming data, visualization software on species diversity).

Access to long-term data is critical for ecological researchers to understand and identify processes, patterns, and drivers of complex ecosystem and population dynamics, and to create models and analyses to project potential future states. Data products archived and maintained by the IM team, as well as tools developed for discovery, have enabled researchers to test ecological hypotheses and answer questions that would not otherwise be possible.

Significant Results from our Efforts to Improve our Information Management System

  • Metadata enhancements and improvements to the Ecological Metadata Language (EML)-generation program, a generic program that builds EML files for all data sets from our relational metadata database, following best-practices guidelines.
  • An archival system to manage versioning of data sets and versioning of associated EML files, thus improving our ability to access previous versions of data sets.
  • A quality control workflow for streaming sensor data to assure immediate posting of provisional data.
  • An efficient and standardized system of handling all data streaming through the new communication network (includes developing generic database handling programs using file and attribute naming conventions).
  • Refining the web-based login process to greatly simplify access to Andrews data.
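The versioning idea noted above can be illustrated with a minimal Python sketch. This is not the actual FSDB archival system; the filename pattern and dataset code are hypothetical, standing in for whatever convention the repository uses to keep every revision retrievable:

```python
# Hypothetical sketch of dataset versioning: each archived revision gets a
# monotonically increasing version suffix, and older versions stay on disk.
import re

def next_version(existing, base):
    """Given archive names like 'MS001_v7.csv', return the next version's name."""
    pattern = re.compile(re.escape(base) + r"_v(\d+)\.csv$")
    versions = [int(m.group(1)) for name in existing
                if (m := pattern.search(name))]
    return f"{base}_v{max(versions, default=0) + 1}.csv"

archive = ["MS001_v1.csv", "MS001_v2.csv", "MS001_v3.csv"]
newfile = next_version(archive, "MS001")   # "MS001_v4.csv"
```

A scheme like this keeps prior versions of both data and EML files addressable by name, which is what makes "access to previous versions" cheap to support.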

Outreach and Training

The Andrews IM team conducts yearly training and outreach for graduate students and other researchers, including "Metadata Parties" with PIs to collect key research metadata for core data sets, demonstrate metadata requirements, and provide training on the use of our resources and tools. Our recent training workshops have been held virtually (on Zoom), which allows us to record the sessions and share them on our website for those who were unable to attend. The team meets one-on-one with students and researchers to help them understand the importance of managing their data and contributing to the long-term records of the Andrews LTER.


Andrews LTER Data Life Cycle

The Andrews Forest LTER Information Management System (AIMS) supports the complete data life cycle of a rich and diverse collection of Andrews Forest LTER and other environmental data. The IM team uses AIMS to standardize data curation from design and data collection through processing, validation, documentation, publication, access, and analysis of research data. Ultimately, datasets are uploaded to the Environmental Data Initiative (EDI) data repository, which serves as a node to DataONE, where well-described Earth observational data are easily discovered. The Data Observation Network for Earth (DataONE) data life cycle framework (i.e., Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) is used to illustrate the components of our system.

PLAN. Information management continues to be an important and unifying theme at the Andrews Forest LTER, and the availability of long-term data provides incentive for researchers to conduct further research at the site. A representative from the IM Team serves as a regular member of the Andrews Forest LTER Executive Committee and participates in new proposal planning. The IM Team works with site leadership to establish awareness and priority for all LTER-collected data. Data contributions from Andrews Forest researchers and graduate students require specific planning with the IM Team. Individual consultations begin with the design of the study database and continue through data collection, quality control, and archival of data. When planning new research efforts, researchers understand the value and importance of early interaction with the IM Team to ensure smooth and efficient archiving of data and curation in long-term data repositories (Stafford 1993). To this end, IM training workshops for graduate students and researchers are conducted annually as a means of assuring data contributions and providing IM education.

Andrews Forest LTER data include long-term datasets such as meteorological station and distributed meteorological collections, stream gauging station measurements, stream chemistry, and permanent vegetation plot data. These specific long-term datasets are collectively managed by LTER PIs, staff, and the IM Team. Additionally, many PI-managed data collections, including phenology measurements of vegetation and birds, associated air temperature and other distributed understory air temperature collections, canopy processes, aquatic ecology, and vertebrate populations are archived. Planned new collections include studies on biotic-abiotic effects on ecosystem properties, species response to abiotic drivers, and interpretation of science, values, and decisions.

The IM Team has designed specific applications within AIMS for adding new and updating ongoing study data. A comprehensive SQL relational metadata database serves as the driver for these applications that broadly apply to all data to perform quality control (QC) checks, data versioning and Ecological Metadata Language (EML) generation. Detailed workflows have been developed that serve as documentation of key processing routines for long-term climate, stream discharge and chemistry, and vegetation data. These workflows provide a clear path for data processing from field collection into archival formats, provide necessary provenance for data construction, and buffer the site against disruption from changes in personnel.

COLLECT. Data collection efforts are continual and widespread throughout the forest. Many studies still collect data manually, but web-based applications, data loggers, field recorders, and radio telemetry are becoming more common. A wireless communication backbone installed across the forest collects and transmits high-temporal-resolution data from data loggers located at meteorological and gauging stations back to a base station at Andrews Headquarters (Henshaw et al. 2008). Radio telemetry at 5.8 GHz provides near real-time data and is particularly useful given the remoteness and limited accessibility of most sites during the winter. A dynamic system using the GCE Data Toolbox in MATLAB provides initial QC, web access, and near real-time graphics of streaming hydro-meteorological data. This system pre-screens data and flags potential errors in the provisional data. Problem data are quickly identified, and the IM team is alerted as problems occur, enabling technicians to give rapid attention to the issue. This pre-screening improves efficiency in delivering final data products for public access and builds user confidence in the data streams. Through a locally developed web application, field technicians enter notes and comments to further document other problems that are discovered. The notes are used to assign method and event codes in the data. Standard naming conventions are applied on more than 60 data loggers across all hydro-meteorological datasets to ensure efficient data management. This system and the wireless capabilities have the capacity to accommodate more data streams in the future. The Andrews Forest has adopted the best practices for managing streaming sensor data documented by the EnviroSensing cluster (Gries et al. 2014) within the Earth Science Information Partners (ESIP).
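The pre-screening step described above can be sketched in a few lines. The site's actual implementation is the GCE Data Toolbox in MATLAB; this Python sketch only illustrates the kind of range-based flagging involved, and the flag codes and temperature thresholds here are illustrative assumptions, not the site's conventions:

```python
# Minimal sketch of range-based pre-screening for streaming sensor data.
# Flag codes and thresholds are illustrative; the Andrews system uses the
# GCE Data Toolbox in MATLAB rather than this Python code.

def prescreen(readings, lo, hi):
    """Flag each (timestamp, value) pair: P = provisional pass,
    Q = questionable (out of range), M = missing observation."""
    flagged = []
    for ts, value in readings:
        if value is None:
            flag = "M"                 # sensor dropout or transmission gap
        elif lo <= value <= hi:
            flag = "P"                 # within plausible range, provisional
        else:
            flag = "Q"                 # out of range, alert a technician
        flagged.append((ts, value, flag))
    return flagged

airtemp = [("2024-10-01T00:00", 8.4), ("2024-10-01T00:05", None),
           ("2024-10-01T00:10", 57.2)]
result = prescreen(airtemp, lo=-30.0, hi=45.0)
```

Flagged provisional values can be posted immediately for web access while the questionable readings are routed to technicians, which is what allows rapid attention to sensor problems.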

ASSURE. A metadata-driven QC system, consisting of a set of procedures that provide generic data validation for any dataset, provides another example of an efficient software tool that relies directly on the metadata. In this case, a desktop control program uses the relevant metadata to validate that the attributes for each table in a dataset are properly described in the metadata. Problems are recorded in an error report, and validation includes checks against illegal null values or duplicate records (entity integrity), checks against listed numeric ranges for extreme values and against enumerated domains for undefined codes (domain integrity), and special database checks that are pre-determined in discussion with the PI for individual datasets. This QC system provides valuable metadata checks for researchers that serve to identify data inconsistencies not discovered in earlier stages of QC. The resulting cleaned dataset is thus prepared for near-seamless delivery into the EDI data repository, which requires each dataset to pass a series of additional congruency checks verifying that data tables are consistent with, and ingestible from, the EML metadata.
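The entity- and domain-integrity checks described above can be sketched as follows. This is a simplified illustration, not the site's desktop control program; the metadata structure, column names, and site codes are all hypothetical:

```python
# Hypothetical sketch of metadata-driven validation: attribute metadata
# drives the null, duplicate, range, and enumerated-domain checks.

def validate(rows, key, metadata):
    """Return a list of error strings; an empty list means the table passed."""
    errors, seen = [], set()
    for i, row in enumerate(rows):
        k = tuple(row[c] for c in key)
        if k in seen:                                 # entity integrity: duplicates
            errors.append(f"row {i}: duplicate key {k}")
        seen.add(k)
        for col, meta in metadata.items():
            v = row.get(col)
            if v is None:
                if not meta.get("nullable", False):   # entity integrity: illegal nulls
                    errors.append(f"row {i}: {col} is null")
                continue
            if "range" in meta:                       # domain integrity: extreme values
                lo, hi = meta["range"]
                if not lo <= v <= hi:
                    errors.append(f"row {i}: {col}={v} outside [{lo}, {hi}]")
            if "codes" in meta and v not in meta["codes"]:  # undefined codes
                errors.append(f"row {i}: {col}={v!r} not an enumerated code")
    return errors

meta = {"sitecode": {"codes": {"GSWS01", "GSWS02"}},
        "airtemp":  {"range": (-30.0, 45.0)}}
rows = [{"sitecode": "GSWS01", "airtemp": 8.4},
        {"sitecode": "GSWS09", "airtemp": 99.0}]
errs = validate(rows, key=("sitecode",), metadata=meta)
```

Because the checks are driven entirely by the metadata records, the same program can validate any table that has been described in the metadata database, which is the efficiency the text emphasizes.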

DESCRIBE. The IM team has focused on improving efficiency in order to manage documentation for increasing volumes of data collected by the LTER. The SQL relational metadata database is tailored to accommodate all necessary elements within the LTER metadata standard, EML. Metadata content for all study datasets, including detailed entity (data table) and attribute (variable) descriptions, is established within the metadata database, and metadata templates and software tools are used to facilitate adding information. EML files are easily generated from a locally developed application that maps elements from the metadata database into EML using style-sheet transformation scripts, and new metadata content is instantly incorporated into newly generated EML files. A similar EML generation program has been used to map ESRI ArcGIS metadata from the federal FGDC spatial standard into EML descriptions of spatial entities. Our EML documentation adheres to LTER EML “best practice” recommendations and assures a standardized approach for consistency with other LTER sites. These applications are easily modified to add new EML elements or adhere to new EML versions, such as the recently released EML version 2.2. A data versioning system assures that all versions of both the EML metadata and associated datasets are archived, and that new versions are immediately generated for public access.
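To illustrate the mapping from metadata-database rows into EML elements, here is a simplified sketch. It emits only a bare attributeList fragment, not schema-valid EML (a real attribute also carries measurementScale and other required elements), and it is not the site's style-sheet transformation; the row fields are assumed for the example:

```python
# Illustrative sketch: turn attribute rows from a metadata database into an
# EML-style <attributeList> fragment using only the standard library.
# Simplified; real EML attributes require additional elements.
import xml.etree.ElementTree as ET

def attribute_list(attrs):
    """Build an EML-style <attributeList> element from attribute metadata rows."""
    alist = ET.Element("attributeList")
    for a in attrs:
        attr = ET.SubElement(alist, "attribute")
        ET.SubElement(attr, "attributeName").text = a["name"]
        ET.SubElement(attr, "attributeDefinition").text = a["definition"]
        ET.SubElement(attr, "unit").text = a.get("unit", "dimensionless")
    return alist

rows = [{"name": "AIRTEMP_MEAN", "definition": "Mean air temperature",
         "unit": "celsius"}]
xml_text = ET.tostring(attribute_list(rows), encoding="unicode")
```

The point of the design is the same as in the text: the XML is regenerated from the database on demand, so edits to metadata content flow into new EML files without hand-editing XML.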

The IM Team facilitates the collection of study metadata by providing webpage descriptions of the data submission process. A web-based administrative interface allows any researcher associated with a study to enter and revise descriptive metadata for that data, relieving the information manager of this effort. “Metadata writing parties,” where PIs and students come together and collectively use the software tools under Information Manager guidance, have proven to be effective in collecting and improving titles, abstracts, methods, and other study metadata.

PRESERVE. The central component of AIMS is the Forest Science Data Bank (FSDB), a long-term data repository initiated in 1983 (Stafford et al. 1984, 1988, Henshaw and Spycher 1999, Henshaw et al. 2002), supported by the Andrews Forest in partnership with the U.S. Forest Service Pacific Northwest Research Station (PNW) and the OSU College of Forestry (COF). The FSDB stores complete collections of datasets and current and historic publications from the LTER as well as from pre-LTER research at the H.J. Andrews Experimental Forest (data collection started in 1948). The highly structured metadata database includes a data catalog with all associated metadata and publication citation information for these collections. All Andrews Forest LTER datasets, including key long-term and ongoing data collections, are curated in the FSDB, published online on the Andrews Forest LTER webpage, and uploaded to EDI, where a Digital Object Identifier (DOI) is assigned. Short-term data products are published to EDI within two years, and ongoing data products are uploaded on regular intervals (typically annually or biennially). A table that lists all datasets from the site that have been deposited into EDI is provided as a Supplementary Document with this proposal.

In addition to the EDI repository, the Andrews Forest LTER website provides access to datasets. Beyond local preference and familiarity, one key advantage of continuing to provide data locally is the value-added capabilities available for accessing and subsetting large datasets. Local features include filters to download desired subsets or to request subsets of data at specific time intervals, thereby speeding download and avoiding transfer of extremely large datasets. We provide the dataset DOI, assigned by EDI, in our local online data citation through a regularly run script that harvests the current version's DOI from EDI and inserts it into our metadata database.
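A harvesting script of this kind might read the current DOI from EDI's PASTA API. The endpoint shape below follows EDI's public "read data package DOI" service; the package scope, identifier, revision, and the SQL comment are illustrative, not taken from the site's actual script:

```python
# Sketch of harvesting the current DOI for a data package from the EDI
# PASTA API. Package identifiers and the update statement are illustrative.
from urllib.request import urlopen

PASTA = "https://pasta.lternet.edu/package/doi/eml"

def doi_url(scope, identifier, revision):
    """Build the PASTA request URL for one data package revision's DOI."""
    return f"{PASTA}/{scope}/{identifier}/{revision}"

def harvest_doi(scope, identifier, revision):
    """Fetch the DOI string for one package revision (makes a network call)."""
    with urlopen(doi_url(scope, identifier, revision)) as resp:
        return resp.read().decode("utf-8").strip()

# Example (network): doi = harvest_doi("knb-lter-and", 4341, 33)
# The harvested DOI would then be written back to the metadata database,
# e.g.: UPDATE dataset SET doi = ? WHERE packageid = ?   -- hypothetical table
```

Running such a script on a schedule keeps the locally displayed citation in sync with whatever revision is current in EDI, without manual edits.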

DISCOVER. All Andrews Forest LTER data webpages are dynamic. Webpages use web integration software to display metadata and provide access to the data. A website search engine permits simple search strings to find data and publications and allows additional searches using person, place or theme keywords. Our locally developed theme keyword list has been mapped to, and includes elements from, the LTER Controlled Vocabulary in the EML document. The primary Andrews Forest LTER website is in a Drupal content management system that pulls personnel, publication, and database information from our metadata schema in an XML file to allow a comprehensive text searching mechanism from the main search box. Both EDI and DataONE employ search capabilities to locate desired data based on the EML metadata document. Basic infrastructure spatial data have been moved into ArcGIS Open Data Hub for standardization, easy access, and direct import into an ArcGIS system.

The Andrews Forest LTER data access policy is compliant with the LTER network data policy. Both policies were revised in 2017, and include three sections: data release, data access, and data use agreement. Contributions of data are required when any LTER funding is involved and are expected for all approved site research projects. Andrews Forest LTER researchers make every effort to release data in a timely fashion and with attention to accurate and complete metadata. Datasets are released to the public domain under the Creative Commons Attribution 4.0 International Public License. Data and information derived from publicly funded research in the Andrews will be made available online within two years of collection. Some data may be restricted due to documented institutional or legal requirements of the owner, but these occurrences are rare and exceptional. Primary observations collected for core research activities directly or partially supported by LTER funding receive the highest priority for data release. Other types of data including affiliated studies or legacy data are released as resources permit.

INTEGRATE. The DataONE life cycle term “integration” refers to creating homogeneous datasets that can be readily analyzed by combining data from disparate sources. Given the specific needs of the Andrews Forest, the IM team has focused on approaches that integrate and improve processing efficiency for datasets of similar data types. For example, there is a common data structure for climate data. Similarly, data from several large, long-term vegetation growth and mortality studies are being reorganized into standard formats. This is cost-effective in that it encourages creation of standard field collection forms, which simplify data processing and enable calculations of summary data. Additionally, automation of field data collection is being used to reduce the time for manual data entry and correction. Efficiencies in management of climate and vegetation data are essential given their time-consuming nature and inherent complications in properly documenting and processing. Analogously, increasing volumes of streaming sensor data require standardized approaches in data management (Campbell et al. 2013).

Data integration not only streamlines processing and management of these data collections but allows the use of web-based applications to more easily access and analyze the datasets. For example, several of our long-term datasets contain multiple sites and parameters. While these datasets could be structured into tens of individual databases, our use of standard formats and local web features allows users to efficiently select, filter or subset data from these large datasets:

  • Meteorological station data for all 7 primary and secondary benchmark stations, from 1957 to the present, includes daily and high-resolution measurements in 32 data tables with multiple parameters. Over 30 million meteorological observations are added each year (MS001).
  • Stream discharge data from 10 watershed gauging stations, from 1949 to the present, with over 10 million stream discharge observations added each year (HF004).
  • Permanent vegetation plot tree data, some data collected as early as 1910, includes regular tree measurements for 380 plots on 8 watersheds plus over 180 reference stands and represents over 130,000 tagged trees that have been repeatedly measured over time (TV010).

 

ANALYZE. The IM team provides tools for data analysis, such as routines for calculating or aggregating data across many datasets. The previously described web-based applications that provide value-added features for accessing the large datasets also enable analysis of the data. Examples of these applications include:

  • GLITCH, which allows users to filter high temporal resolution climate entities for a selected station, sensor, date range, and requested output interval.
  • FLOW, which recalculates stream discharge for user specifications, similar to GLITCH.
  • Generic AIMS tools that take advantage of entity metadata to allow any dataset table to be subset using its primary key (i.e., site code and date) as an aid to users downloading data for analysis.
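The generic subset-by-primary-key idea in the last item can be sketched as follows; the field names, site codes, and discharge values are illustrative, not the AIMS implementation:

```python
# Minimal sketch of subsetting a dataset table on its primary key
# (site code and date) before download. Field names are illustrative.
from datetime import date

def subset(rows, sitecode, start, end):
    """Keep rows matching the site code with dates in [start, end]."""
    return [r for r in rows
            if r["sitecode"] == sitecode and start <= r["date"] <= end]

rows = [{"sitecode": "GSWS01", "date": date(2023, 10, 1), "q": 0.42},
        {"sitecode": "GSWS01", "date": date(2024, 2, 14), "q": 1.87},
        {"sitecode": "GSWS02", "date": date(2024, 2, 14), "q": 0.91}]
picked = subset(rows, "GSWS01", date(2024, 1, 1), date(2024, 12, 31))
```

Because every table carries entity metadata naming its primary key, one generic routine like this can serve any dataset, sparing users from downloading an entire multi-decade table to analyze one site-year.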

 

View our schema of the FSDB LTER Integrated Metadata System (FLIMSY) [Printer Ready PDF].


System Administration

The Andrews Forest LTER has agreements with COF for computer system administration, backup of production servers, and other information technology services including system administration support for LTER-related campus computer servers, production and development web servers, production and development database servers, shared file server directories, two tape backup servers, and cloud storage. With the improved internet capabilities at the field station, large data volumes (e.g., imagery) are now backed up directly to OSU servers and cloud storage, removing a previous vulnerability in the data archiving process.


October 2024