Information Management

Information Management plays a major role at the H. J. Andrews Experimental Forest Long-Term Ecological Research (LTER) site. Intensive forest ecosystem research, conducted on the Andrews Forest since the 1950s, has resulted in many diverse, long-term ecological databases and a strong commitment to information management. Hundreds of ecological databases are managed through the Forest Science Data Bank (FSDB), which has long been an essential component of the Andrews LTER and is jointly sponsored by the OSU College of Forestry and the US Forest Service PNW. The FSDB is managed through an information management system that supports the collection, quality control, archival, and long-term accessibility of collected data and associated metadata. The Andrews LTER also maintains an extensive bibliography and image library, and manages personnel and keyword databases for integrated web searches.

 

Learn about how the Andrews Forest LTER manages information:

 


Andrews LTER Information Management Summary

Information Management Philosophy and Objectives

The mission of Andrews LTER Information Management is to ensure that all LTER research data will be archived and openly available for the future. The primary goals are to 1) preserve high-quality and well-documented data collections that are both secure and accessible, 2) serve the Andrews and broader community through the development and management of informational products and tools, and 3) provide leadership and participation in relevant committees and activities at both the site and LTER Network level. The following objectives illustrate how aspects of the primary goals are achieved.

  • Assure preservation of high quality metadata and data products through the direction and maintenance of a long-term data repository that adheres to LTER standards and best practices and provides security through regular maintenance and backup procedures.
  • Maintain and adhere to a Data Access Policy in compliance with LTER standards to assure that data and accompanying metadata are freely and publicly available electronically within two years of collection at both the local site and within the LTER Network Information System (NIS).
  • Assure regular contributions of research data into the long-term data repository through strong integration of the IM Team with site science including 1) participation and regular reporting at site executive meetings and monthly meetings, 2) annual reviews of Information Management, and 3) regular interactions and trainings conducted with co-PIs and community members.
  • Develop and maintain the Andrews LTER web pages and associated user interfaces that provide access to data and metadata, research publications, programs and projects, site and personnel information, education and outreach programs, community events, and other site information.
  • Provide leadership to the LTER Network in the development of standards and best practices for the NIS and participation in community projects and systems that promote the discovery, use, and integration of LTER data, both within the network and throughout the broader research community.

People and Institutions

Information Management is an essential component of the Andrews LTER program and benefits from an institutional partnership between the Oregon State University College of Forestry (COF) and the USFS Pacific Northwest Research Station (PNW). The current Andrews LTER Information Management Team reflects this long-term partnership:

  • Don Henshaw (IM Programming Lead and Databases, PNW)
  • Suzanne Remillard (Databases and Information Management System Administration, LTER/COF)
  • Adam Kennedy (System Administrator/Wireless Communications, LTER/COF)
  • Hans Luh (Programmer, .25 FTE, LTER/COF)

Two other field technicians (1 PNW, 1 LTER/COF) serve IM roles in supporting data loggers and field computers used in routine data collection, describing collection methods, and providing data. A PNW administrative assistant tracks Andrews publications and maintains the LTER bibliography. NSF supplements are used on occasion to contract for specific application development.

Impacts of Information Management Activities

Perhaps the greatest impact is demonstrated through the persistence and growth of the Forest Science Data Bank. The FSDB was established in 1980 and has been largely funded and operated by LTER personnel since the mid-1990s. The FSDB has opportunistically acquired non-LTER data and includes well over 250 databases with more than 170 databases on-line (mostly LTER). A stable computing environment and information system with desirable features such as adherence to national metadata standards have allowed the FSDB to expand its LTER data resource holdings into a regional data center. Holdings now include key US Forest Service Research data (e.g., Research Natural Areas and Experimental Forests), USFS campaign data (e.g., Demonstration of Ecosystem Management Options (DEMO) and Mount St. Helens), National Forest System data (Young Stand Study), OSU CoF data (e.g., OSU MacDonald Forest), and the Long-Term Permanent Vegetation Plot Network (OSU, PNW, UW). NSF-funded grants in ecosystem informatics such as the IGERT and summer institute (EISI) programs have broadened campus-wide perspectives on information management and cyberinfrastructure issues, and Andrews data has been essential in student projects (e.g., quality control of high-volume streaming data, visualization software on species diversity). There have been over six thousand documented downloads of data from FSDB in the past three years.

The IM Team has recently been concentrating on improving our information management system to satisfy LTER Network expectations for placing metadata and data into the Network data repository, PASTA, and advancing Network functionality. To do this, we have prioritized updates and review of all data sets, including reviewing and updating the website for necessary improvements. An ongoing project has been streamlining the processing of major data collections, including several climate and vegetation study data sets, by standardizing workflows and improving our capability to run quality control checks on streaming data in near real-time.

Significant Results from our Efforts to Improve our Information Management System

  • Metadata enhancements and improvements to the Ecological Metadata Language (EML)-generation program, which is a generic program to build EML files for all data sets, following best practices guidelines, from our relational metadata database.
  • An archival system to manage versioning of data sets and versioning of associated EML files, thus improving our ability to access previous versions of data sets.
  • A quality control workflow for streaming sensor data to assure immediate posting of provisional data.
  • An efficient and standardized system of handling all data streaming through the new communication network (includes developing generic database handling programs using file and attribute naming conventions).
  • A refined web-based login process that greatly simplifies access to Andrews data.

Outreach and Training

The Andrews IM team conducts yearly training and outreach for graduate students, IGERT students, and Eco-Informatics Summer Institute students. This includes holding "Metadata Parties" with PIs as a means of collecting key research metadata for core data sets, demonstrating metadata requirements, and training on the use of our administrative interface. The team meets one-on-one with students and researchers to help them understand the importance of managing their data and contributing to the long-term records of the Andrews LTER. Team members have conducted training and outreach internationally and nationally at conferences and workshops.


Andrews LTER Information Management System

Information System: The Andrews LTER Information Management Team has developed an information system to support the collection, documentation, management, and archival of a rich and diverse collection of Andrews LTER and other environmental data. The central component of the information system is the Forest Science Data Bank (FSDB), a long-term data repository initiated in 1983 (Stafford et al. 1984, 1988, Henshaw and Spycher 1999, Henshaw et al. 2002), which is supported by the Andrews LTER in partnership with the U.S. Forest Service Pacific Northwest Research Station (PNW) and the OSU College of Forestry (COF). The FSDB includes over 200 active and legacy study databases and features a highly-structured metadata database. The FSDB includes approximately 160 active LTER study databases including all "signature" data. Signature data refers to core Andrews LTER databases including all key long-term and ongoing data collections. Ongoing long-term data sets such as meteorological station, stream gauging station, stream chemistry, fish population, permanent vegetation plot, and decomposition plot data are collectively managed by LTER PIs, staff and the IM Team. Workflow paths are established to assure data updates, quality control, and archival into the FSDB and the LTER Data Portal on a frequent basis. Data contributions from individual LTER PIs and graduate students require more specific planning and interaction between the PI and IM Team, and software tools are used to facilitate capture of these data. In addition to continuing updates of existing data, five new LTER study databases are added each year on average. Other components of the information system include a generic, metadata-driven quality control system, an administrative interface for LTER members, data submission tools, and dynamic web pages for discovery and access to data and informational products. 
The information system manages study databases and research publications, and extensions to include the image library and Andrews-related museum collections are being considered.

FSDB study data and metadata: The FSDB contains "signature" and other LTER-related data sets from the Andrews Forest. All LTER data sets are routinely placed on-line based on the terms of our data access policy. The FSDB has also opportunistically captured other important data sets from OSU and the Forest Service, and continues to house significant legacy data collections that are not available on-line due to priority status or quality control issues. Metadata are established in compliance with the LTER metadata standard, the Ecological Metadata Language (EML), and follow LTER "best practice" recommendations. Software tools are used to map elements from the relational metadata database into EML, and similarly to map ESRI metadata from the FGDC spatial standard into EML. All data are regularly harvested into the Provenance Aware Synthesis Tracking Architecture (PASTA) of the Environmental Data Initiative (EDI), previously referred to as the LTER Network Information System (NIS). Data sets uploaded into PASTA have passed a set of quality checks, assuring congruence between data and metadata. EML files are easily mapped into the NBII Biological Data Profile metadata standard using stylesheet software and are discoverable through the NBII clearinghouse. Our data archive system includes the ability to manage data versions and metadata versions together. Each data set is created (typically as a CSV file) with its corresponding EML metadata file, and the most current version is available for download. For some of our larger databases, like climate and hydrology, data are accessed using an interactive application that allows querying by individual probe and date range. Downloads are tracked through a minimal user registration system. View our Schema of FSDB LTER Integrated Metadata System (FLIMSY) [Printer Ready PDF].

Quality control system: This system consists of a set of simple procedures that provide generic metadata-driven data validation. A desktop control program reads the relevant metadata for validating any given data table and generates appropriate validation code. The control program executes the generated code and records any problems in the metadata description of the data table in an error report. Validation includes checks of the primary key for nulls and duplicates (entity integrity), checks against listed numeric ranges or enumerated codes (domain integrity), and database rules. Rules are typically specific to individual databases and often have been "discovered" with the help of database owners. Generic rules are employed in time-series contexts, but most rules are only shared occasionally. PASTA uses a similar approach, the "EML congruency checker", which provides EML-driven metadata and data validation.
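The entity and domain integrity checks described above can be illustrated with a minimal Python sketch. It is not the FSDB control program: the source describes a desktop program that generates and then executes validation code, whereas this sketch applies the checks directly. The class AttributeMeta, the function validate_table, and the column names, ranges, and flag codes in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeMeta:
    """Hypothetical metadata record describing one column of a data table."""
    name: str
    minimum: Optional[float] = None    # lower bound of the listed numeric range
    maximum: Optional[float] = None    # upper bound of the listed numeric range
    codes: Optional[frozenset] = None  # enumerated codes, if the column has any


def validate_table(rows, primary_key, attributes):
    """Check a table (list of dicts) against its metadata; return error messages."""
    errors = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Entity integrity: primary key columns must be non-null and unique.
        key = tuple(row.get(col) for col in primary_key)
        if any(value is None for value in key):
            errors.append(f"row {i}: null value in primary key {primary_key}")
        elif key in seen_keys:
            errors.append(f"row {i}: duplicate primary key {key}")
        else:
            seen_keys.add(key)

        # Domain integrity: numeric ranges and enumerated codes.
        for attr in attributes:
            value = row.get(attr.name)
            if value is None:
                continue  # missing values are not range-checked in this sketch
            if attr.codes is not None and value not in attr.codes:
                errors.append(f"row {i}: {attr.name}={value!r} is not an allowed code")
            if attr.minimum is not None and value < attr.minimum:
                errors.append(f"row {i}: {attr.name}={value} is below {attr.minimum}")
            if attr.maximum is not None and value > attr.maximum:
                errors.append(f"row {i}: {attr.name}={value} is above {attr.maximum}")
    return errors


# Made-up example data: the second row repeats a key and is out of range,
# the third row has a null key column and an unknown flag code.
attributes = [
    AttributeMeta("airtemp", minimum=-40.0, maximum=50.0),
    AttributeMeta("flag", codes=frozenset({"A", "Q"})),
]
rows = [
    {"site": "S1", "date": "2017-01-01", "airtemp": 3.2, "flag": "A"},
    {"site": "S1", "date": "2017-01-01", "airtemp": 99.0, "flag": "A"},
    {"site": None, "date": "2017-01-02", "airtemp": 1.0, "flag": "Z"},
]
errors = validate_table(rows, ["site", "date"], attributes)
for message in errors:
    print(message)
```

Because the checks are driven entirely by the metadata records, the same function can validate any table whose columns are described this way; database-specific rules would need to be layered on separately.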

Administrative interface: An improved administrative interface has been implemented that allows interactive site member submission of study metadata, managing of personnel profiles, and managing research projects including an online project application form. This interface is designed to improve the efficiency of IM operations by reducing the amount of staff time dedicated to the update of study metadata and personal information. Recent extensions have enabled the entry of publication citations. All site publications are entered through the interface rather than being managed in a separate citation management system, which allows immediate display online and PDF availability.

Data submission: The IM team has developed a web page to provide instructions and other references to facilitate submission of study data from site PIs, graduate students, and other researchers. Instructions are available to assist a data provider in entering study metadata using the administrative interface and describing spatial entities. A spreadsheet template is also provided to capture specific entity and attribute information. Desktop software tools allow the import of the template (Excel) into the metadata framework and allow additional editing of the metadata. The information system draws upon a local controlled vocabulary for both place and theme keywords and a reference list of common units of measurement to promote consistency of data set descriptions and to avoid redundant descriptions of site locations.

Web pages: Web page development has been an important activity for the site as it acts as an organizational framework for the display of research products and is a primary source of site information for both local and broader research community users. Andrews LTER personnel maintain and update extensive web pages describing the Andrews Forest, ongoing LTER and collaborative research, personnel, site data sets and associated metadata, publication lists with links to scanned documents, education and outreach, and other current events and activities. A recent website redesign moved the framework from dynamic integration of metadata tables to a Drupal content management framework (MySQL). To provide coherence of the FSDB metadata system with the Andrews primary webpages, XML files of publications and personnel are created nightly from the IM Team-managed SQLServer database and ingested into the Drupal content management framework. Additionally, an XML file of the databases is created and ingested to allow searching from the main Drupal webpage. The data catalog serves all database webpages, which are generated dynamically with ASP.NET from metadata tables in our SQLServer database.
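As a rough illustration of the nightly export step, the following Python sketch writes rows from a database table into an XML document that a content management system could ingest. It makes several assumptions: SQLite stands in for the SQLServer database, and the table name, column names, XML element names, and sample rows are all hypothetical rather than taken from the Andrews system.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_publications(conn):
    """Build an XML string from a hypothetical 'publication' table."""
    root = ET.Element("publications")
    cursor = conn.execute(
        "SELECT pub_id, title, year FROM publication ORDER BY pub_id"
    )
    for pub_id, title, year in cursor:
        pub = ET.SubElement(root, "publication", id=str(pub_id))
        ET.SubElement(pub, "title").text = title
        ET.SubElement(pub, "year").text = str(year)
    return ET.tostring(root, encoding="unicode")


# In-memory stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publication (pub_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO publication (pub_id, title, year) VALUES (?, ?, ?)",
    [(1, "Example title A", 2015), (2, "Example title B & more", 2016)],
)
xml_text = export_publications(conn)
print(xml_text)
```

Building the document with an XML library rather than string concatenation means characters such as "&" in titles are escaped automatically, which matters when the receiving system parses the file strictly.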


System Administration

System administration and hardware at Oregon State: The COF Forestry Computing Resources (FCR) provides system administration support for LTER campus computer servers through agreements with LTER and PNW. Production and development web servers (IIS, UNIX, and LINUX), production and development database servers (MS SQLServer), shared file server directories, and two tape backup servers are directly used by the LTER and supported through FCR. Refer to the FCR description of network systems for more information.

System administration and hardware at the Andrews site: The on-site Andrews LTER system administrator maintains the site Local Area Network (LAN), local web server, wireless LAN, spread spectrum and radio telemetry communication network, telephone communications, and local personal computers. A wireless LAN is installed with access points linking the conference room and classroom, dormitories, cafeteria, shop, and director's residence to the wired LAN with a wireless bridge.

Backup policies: General backup procedures are maintained and implemented through agreements with OSU College of Forestry. In general, campus web servers (including IIS, UNIX, and Linux used by LTER) and file servers (Windows) are backed up nightly. For these systems a full backup is done once each month, and a "level" backup is done once a week. A level backup catches what changed since the last full backup. Then, on all remaining days, an incremental backup is performed. Backups are kept for 6 months. See the COF backup policy for more information. A T1 line to OSU campus allows nightly backup of Andrews on-site web and file servers to COF. On-site servers are also mirrored to provide an immediate local backup.
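The full, level, and incremental rotation described above can be sketched in Python as follows. The source does not say which day of the month or week each backup runs, so the defaults here (full on the 1st, level on Sunday) are assumptions. The restore_chain helper additionally assumes that each incremental backup captures changes since the previous backup of any kind, which the source does not state. Both function names are hypothetical.

```python
from datetime import date, timedelta


def backup_type(day, full_day=1, level_weekday=6):
    """Classify the backup run on a given date.

    full_day: day of month for the monthly full backup (assumed: the 1st).
    level_weekday: weekday for the weekly level backup (assumed: Sunday;
    date.weekday() numbers Monday as 0 and Sunday as 6).
    """
    if day.day == full_day:
        return "full"
    if day.weekday() == level_weekday:
        return "level"
    return "incremental"


def restore_chain(target):
    """List the backups needed to restore the state as of the target date.

    Walks backward from the target: incrementals are needed until the most
    recent level backup, and that level backup covers everything back to
    the most recent full backup.
    """
    chain = []
    have_level = False
    day = target
    while True:
        kind = backup_type(day)
        if kind == "full":
            chain.append((day, kind))
            break
        if not have_level:
            chain.append((day, kind))
            have_level = kind == "level"
        day -= timedelta(days=1)
    return list(reversed(chain))


for day, kind in restore_chain(date(2017, 7, 5)):
    print(day.isoformat(), kind)
```

Under these assumptions, a restore never needs more than one full backup, one level backup, and at most six incrementals, which is the practical benefit of inserting the weekly level backup between monthly fulls.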

Legato NetWorker's backup module for SQL Server is used to back up MS SQL Server databases. A full backup is performed every night. Additionally, space is provided for DB managers to perform SQL backups as needed throughout the day. For example, if a database were undergoing a major change, the DB manager could use the SQL Backup Tools within MS SQL Server to back up the database by hand before making the change. We also have the ability to perform a backup using NetWorker's tools at any time. Backups are kept for 2 months.

The backup server is a Dell 2900 system with a RAID to store backup indexes. Backups are routed to a Qualstar XLS tape library that holds 245 tapes, and each tape holds 1.5 Terabytes native. Inside the library are 6 LTO-5 tape drives. Backups are grouped according to our network design (home directories, group directories, web servers, database servers, email, UNIX servers, and special backup needs). COF is currently in the process of reviewing backup strategies to address changes in storage architectures, such as Storage Area Networks.
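The figures above imply a total native capacity for the tape library, which the short Python calculation below works out. It assumes every slot holds a tape written to its full native capacity and ignores compression; the function name is hypothetical.

```python
TAPE_SLOTS = 245           # tapes held by the library (from the text)
NATIVE_TB_PER_TAPE = 1.5   # native capacity per tape in terabytes (from the text)


def library_capacity_tb(slots=TAPE_SLOTS, tb_per_tape=NATIVE_TB_PER_TAPE):
    """Total native capacity in terabytes if every slot holds a full tape."""
    return slots * tb_per_tape


print(library_capacity_tb())  # 367.5
```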

Non-electronic storage: Paper record storage is greatly reduced from historic levels, but raw data collection records including field and lab data forms, check sheets, and recording charts are stored in the FSL fire-proof vault. Legacy documents, charts, computer printouts, and individual scientist storage boxes are also stored here. While all chart recorded data has been digitized, scanning of these long-term paper documents into digital formats is ongoing. Similarly, a publication reprint library is being reduced in scope and all LTER publications have been scanned. Original photographic slides and aerial photos are inventoried and stored in six fire-proof cabinets, and scanning will proceed when resources are available.


July 2017