Information Management plays a major role at the H. J. Andrews Experimental Forest Long-Term Ecological Research (LTER) site. Intensive forest ecosystem research, conducted on the Andrews Forest since the 1950's, has resulted in many diverse, long-term ecological databases and a strong commitment to information management. Hundreds of ecological databases are managed through the Forest Science Data Bank (FSDB), which has long been an essential component of the Andrews LTER and is jointly sponsored by the OSU College of Forestry and the US Forest Service PNW. The FSDB is managed through an information management system that supports the collection, quality control, archival and long-term accessibility of collected data and associated metadata. The Andrews LTER maintains an extensive bibliography and image library, and personnel and keyword databases are managed for integrated web searches.
Learn about how the Andrews Forest LTER manages information:
- Andrews LTER Information Management Summary
- Andrews LTER Information Management System
- System Administration
Andrews LTER Information Management Summary
Information Management Philosophy and Objectives
The mission of Andrews LTER Information Management is to ensure that all LTER research data will be archived and openly available for the future. The primary goals are to 1) preserve high-quality and well-documented data collections that are both secure and accessible, 2) serve the Andrews and broader community through the development and management of informational products and tools, and 3) provide leadership and participation in relevant committees and activities at both the site and LTER Network level. The following objectives illustrate how aspects of the primary goals are achieved.
- Assure preservation of high quality metadata and data products through the direction and maintenance of a long-term data repository that adheres to LTER standards and best practices and provides security through regular maintenance and backup procedures.
- Maintain and adhere to a Data Access Policy in compliance with LTER standards to assure that data and accompanying metadata are freely and publicly available electronically within two years of collection at both the local site and within the LTER Network Information System (NIS).
- Assure regular contributions of research data into the long-term data repository through strong integration of the IM Team with site science including 1) participation and regular reporting at site executive meetings and monthly meetings, 2) annual reviews of Information Management, and 3) regular interactions and trainings conducted with co-PIs and community members.
- Develop and maintain the Andrews LTER web pages and associated user interfaces that provide access to data and metadata, research publications, programs and projects, site and personnel information, education and outreach programs, community events, and other site information.
- Provide leadership to the LTER Network in the development of standards and best practices for the NIS and participation in community projects and systems that promote the discovery, use, and integration of LTER data, both within the network and throughout the broader research community.
People and Institutions
Information Management is an essential component of the Andrews LTER program and benefits from an institutional partnership between the Oregon State University College of Forestry (COF) and the USFS Pacific Northwest Research Station (PNW). The current Andrews LTER Information Management Team reflects this long-term partnership:
- Don Henshaw (IM Programming Lead and Databases, PNW)
- Suzanne Remillard (Databases and Information Management System Administration, LTER/COF)
- Adam Kennedy (System Administrator/Wireless Communications, LTER/COF)
- Hans Luh (Programmer, .25 FTE, LTER/COF)
Two other field technicians (1 PNW, 1 LTER/COF) serve IM roles in supporting data loggers and field computers used in routine data collection, describing collection methods, and providing data. A PNW administrative assistant tracks Andrews publications and maintains the LTER bibliography. NSF supplements are used on occasion to contract for specific application development.
Impacts of Information Management Activities
Perhaps the greatest impact is demonstrated through the persistence and growth of the Forest Science Data Bank. The FSDB was established in 1980 and has been largely funded and operated by LTER personnel since the mid-1990s. The FSDB has opportunistically acquired non-LTER data and includes well over 250 databases with more than 170 databases on-line (mostly LTER). A stable computing environment and information system with desirable features such as adherence to national metadata standards have allowed the FSDB to expand its LTER data resource holdings into a regional data center. Holdings now include key US Forest Service Research data (e.g., Research Natural Areas and Experimental Forests), USFS campaign data (e.g., Demonstration of Ecosystem Management Options (DEMO) and Mount St. Helens), National Forest System data (Young Stand Study), OSU CoF data (e.g., OSU MacDonald Forest), and the Long-Term Permanent Vegetation Plot Network (OSU, PNW, UW). NSF-funded grants in ecosystem informatics such as the IGERT and summer institute (EISI) programs have broadened campus-wide perspectives on information management and cyberinfrastructure issues, and Andrews data has been essential in student projects (e.g., quality control of high-volume streaming data, visualization software on species diversity). There have been over six thousand documented downloads of data from FSDB in the past three years.
The IM Team has recently been concentrating on improving our information management system to satisfy LTER Network expectations for placing metadata and data into the Network data repository, PASTA, and advancing Network Information System (NIS) functionality. To do this, we have prioritized updates and review of all data sets, including reviewing and updating the website for necessary improvements. An ongoing project has been streamlining the processing of major data collections, including several climate and vegetation study data sets, by standardizing workflows and improving our capability to run quality control checks on streaming data in near real-time.
Significant Results from our Efforts to Improve our Information Management System
- Metadata enhancements and improvements to the Ecological Metadata Language (EML)-generation program, which is a generic program to build EML files for all data sets, following best practices guidelines, from our relational metadata database.
- An archival system to manage versioning of data sets and versioning of associated EML files, thus improving our ability to access previous versions of data sets.
- A quality control workflow for streaming sensor data to assure immediate posting of provisional data.
- An efficient and standardized system of handling all data streaming through the new communication network (includes developing generic database handling programs using file and attribute naming conventions).
- A refined web-based login process that greatly simplifies access to Andrews data.
Outreach and Training
The Andrews IM team conducts yearly training and outreach to graduate students, IGERT, and Eco-Informatics Summer Institute students, including conducting "Metadata Parties" with PIs as a means of collecting key research metadata for core data sets, demonstrating metadata requirements and training on the use of our administrative interface. The team meets one-on-one with students and researchers to help them understand the importance of managing their data and contributing to the long-term records of the Andrews LTER. Team members have conducted training and outreach internationally and nationally at conferences and workshops.
Andrews LTER Information Management System
Information System: The Andrews LTER Information Management Team has developed an information system to support the collection, documentation, management, and archival of a rich and diverse collection of Andrews LTER and other environmental data. The central component of the information system is the Forest Science Data Bank (FSDB), a long-term data repository initiated in 1983 (Stafford et al. 1984, 1988, Henshaw and Spycher 1999, Henshaw et al. 2002), which is supported by the Andrews LTER in partnership with the U.S. Forest Service Pacific Northwest Research Station (PNW) and the OSU College of Forestry (COF). The FSDB includes over 200 active and legacy study databases and features a highly-structured metadata database. The FSDB includes approximately 160 active LTER study databases including all "signature" data. Signature data refers to core Andrews LTER databases including all key long-term and ongoing data collections. Ongoing long-term data sets such as meteorological station, stream gauging station, stream chemistry, fish population, permanent vegetation plot, and decomposition plot data are collectively managed by LTER PIs, staff and the IM Team. Workflow paths are established to assure data updates, quality control, and archival into the FSDB and the LTER NIS on a frequent basis. Data contributions from individual LTER PIs and graduate students require more specific planning and interaction between the PI and IM Team, and software tools are used to facilitate capture of these data. In addition to continuing updates of existing data, five new LTER study databases are added each year on average. Other components of the information system include a generic, metadata-driven quality control system, an administrative interface for LTER members, data submission tools, and dynamic web pages for discovery and access to data and informational products. 
The information system manages study databases and research publications, and extensions to include the image library and Andrews-related museum collections are being considered.
FSDB study data and metadata: The FSDB contains "signature" and other LTER-related data sets from the Andrews Forest. All LTER data sets are routinely placed on-line based on the terms of our data access policy. The FSDB has also opportunistically captured other important data sets from OSU and the Forest Service, and continues to house significant legacy data collections that are not available on-line, due to priority status or quality control issues. Metadata are established in compliance with the LTER metadata standard, the Ecological Metadata Language (EML), and follow LTER "best practice" recommendations. Software tools are used to map elements from the relational metadata database into EML, and similarly map ESRI metadata from the FGDC spatial standard into EML. EML metadata are regularly harvested from the Andrews LTER into the central metadata repository (Metacat) at the LTER Network Office (LNO), which assures that Andrews data is available for network-wide data searches. In January 2014, the LNO rolled out their newly developed data archive, Provenance Aware Synthesis Tracking Architecture (PASTA). Data sets uploaded into PASTA have passed a set of quality checks that assure congruence between data and metadata. All Andrews LTER data sets will be uploaded into PASTA. EML files are easily mapped into the NBII Biological Data Profile metadata standard using stylesheet software at the LNO and are discoverable through the NBII clearinghouse. Recent improvements to our data archive system include the ability to manage data version and metadata version together. Each data set is created (typically as a CSV file) together with its corresponding EML metadata file, and the most current version is available for download. For some of our larger databases, like climate and hydrology, data is accessed using an interactive application that allows the querying of an individual probe and date range. Downloads are tracked through a minimal user registration system.
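The mapping from relational metadata into EML can be pictured with a small sketch. This is not the site's EML-generation program: the record layout, the `build_eml` function, and the choice of the EML 2.1.0 namespace are assumptions made for illustration, and the output is a simplified fragment rather than a complete, schema-validated EML document.

```python
import xml.etree.ElementTree as ET

# EML 2.1.0 namespace; which EML version the site targets is an assumption.
EML_NS = "eml://ecoinformatics.org/eml-2.1.0"


def build_eml(record):
    """Map a hypothetical metadata record (a dict standing in for rows read
    from a relational metadata database) to a simplified EML fragment."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": record["package_id"], "system": record["system"]},
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = record["title"]

    # One <creator> element per person listed for the study.
    for person in record["creators"]:
        creator = ET.SubElement(dataset, "creator")
        name = ET.SubElement(creator, "individualName")
        ET.SubElement(name, "givenName").text = person["given"]
        ET.SubElement(name, "surName").text = person["sur"]

    # Keywords drawn from a controlled vocabulary go into one <keywordSet>.
    keyword_set = ET.SubElement(dataset, "keywordSet")
    for keyword in record["keywords"]:
        ET.SubElement(keyword_set, "keyword").text = keyword

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are made up for the example.
    example = {
        "package_id": "example.1.1",
        "system": "example",
        "title": "Example air temperature data set",
        "creators": [{"given": "Jane", "sur": "Doe"}],
        "keywords": ["climate", "air temperature"],
    }
    print(build_eml(example))
```

Because the fragment is generated entirely from the record, a change in the metadata database only requires regenerating the file, which is the general appeal of a metadata-driven approach.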
Quality control system: This system consists of a set of simple procedures that provide generic metadata-driven data validation. A desktop control program reads the relevant metadata for validating any given data table and generates appropriate validation code. The control program executes the generated code and records any problems in the metadata description of the data table in an error report. Validation includes checks of the primary key for nulls and duplicates (entity integrity), checks versus listed numeric ranges or enumerated codes (domain integrity), and database rules. Rules are typically specific to individual databases and often have been "discovered" with the help of database owners. Generic rules are employed in time-series contexts, but most rules are only shared occasionally. PASTA uses a similar approach, the "EML congruency checker", that provides EML-driven metadata and data validation.
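A minimal sketch of metadata-driven validation follows. It covers only the entity-integrity and domain-integrity checks described above, and it interprets the metadata directly rather than generating validation code as the control program does. The `AttributeMeta` and `TableMeta` classes and the `validate` function are hypothetical names, not part of the FSDB software.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttributeMeta:
    """Hypothetical metadata describing one column of a data table."""
    name: str
    minimum: Optional[float] = None      # lowest allowed numeric value
    maximum: Optional[float] = None      # highest allowed numeric value
    codes: Optional[frozenset] = None    # allowed enumerated codes


@dataclass
class TableMeta:
    """Hypothetical metadata describing one data table."""
    name: str
    primary_key: tuple
    attributes: list = field(default_factory=list)


def validate(rows, meta):
    """Return a list of error messages for rows (dicts) checked against meta."""
    errors = []
    seen_keys = set()
    for number, row in enumerate(rows, start=1):
        # Entity integrity: the primary key must be non-null and unique.
        key = tuple(row.get(column) for column in meta.primary_key)
        if any(value is None for value in key):
            errors.append(f"row {number}: null in primary key")
        elif key in seen_keys:
            errors.append(f"row {number}: duplicate primary key {key}")
        else:
            seen_keys.add(key)

        # Domain integrity: numeric ranges and enumerated codes.
        for attr in meta.attributes:
            value = row.get(attr.name)
            if value is None:
                continue  # missing values are not range-checked here
            if attr.codes is not None and value not in attr.codes:
                errors.append(f"row {number}: {attr.name}={value!r} not a listed code")
            if attr.minimum is not None and value < attr.minimum:
                errors.append(f"row {number}: {attr.name}={value} below minimum")
            if attr.maximum is not None and value > attr.maximum:
                errors.append(f"row {number}: {attr.name}={value} above maximum")
    return errors
```

Database-specific rules, such as relationships between columns, would need a further mechanism that this sketch does not attempt.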
Administrative interface: An improved administrative interface has been implemented that allows interactive site member submission of study metadata, managing of personnel profiles, and managing research projects including an online project application form. This interface is designed to improve the efficiency of IM operations by reducing the amount of staff time dedicated to the update of study metadata and personal information. Recent extensions have enabled the entry of publication citations. All site publications are entered through the interface rather than being managed in a separate citation management system, which allows immediate display online and PDF availability.
Data submission: The IM team has developed a web page to provide instructions and other references to facilitate submission of study data from site PIs, graduate students, and other researchers. Instructions are available to assist a data provider in entering study metadata using the administrative interface and describing spatial entities. A spreadsheet template is also provided to capture specific entity and attribute information. Desktop software tools allow the import of the template (Excel) into the metadata framework and allow additional editing of the metadata. The information system draws upon a local controlled vocabulary for both place and theme keywords and a reference list of common units of measurement to promote consistency of data set descriptions and to avoid redundant descriptions of site locations.
Web pages: Web page development has been an important activity for the site as it acts as an organizational framework for the display of research products and is a primary source of site information for both local and broader research community users. Andrews LTER personnel maintain and update extensive web pages describing the Andrews Forest, ongoing LTER and collaborative research, personnel, site data sets and associated metadata, publication lists with links to scanned documents, education and outreach, and other current events and activities. All site web pages are written dynamically with web integration software taking advantage of metadata tables that describe content, page templates and navigation bars. A web site search engine is employed and various interfaces permit additional searching for data and publications using either simple search strings or established relations with researcher, place or theme keywords. Social media (Facebook, Twitter, RSS feed) have been incorporated into the web presence. Google Analytics software has been added to the web pages to track visitor numbers, user access flow, and highly trafficked pages.
System administration and hardware at Oregon State: The COF Forestry Computing Resources (FCR) provides system administration support for LTER campus computer servers through agreements with LTER and PNW. Production and development web servers (IIS, UNIX, and Linux), production and development database servers (MS SQL Server), shared file server directories, and two tape backup servers are directly used by the LTER and supported through FCR. Refer to the FCR description of network systems for more information.
System administration and hardware at the Andrews site: The on-site Andrews LTER system administrator maintains the site Local Area Network (LAN), local web server, wireless LAN, spread spectrum and radio telemetry communication network, telephone communications, and local personal computers. A wireless LAN is installed with access points linking the conference room and classroom, dormitories, cafeteria, shop, and director's residence to the wired LAN with a wireless bridge.
Backup policies: General backup procedures are maintained and implemented through agreements with OSU College of Forestry. In general, campus web servers (including IIS, UNIX, and Linux used by LTER) and file servers (Windows) are backed up nightly. For these systems a full backup is done once each month, and a "level" backup is done once a week. A level backup catches what changed since the last full backup. Then, on all remaining days, an incremental backup is performed. Backups are kept for 6 months. See the COF backup policy for more information. A T1 line to OSU campus allows nightly backup of Andrews on-site web and file servers to COF. On-site servers are also mirrored to provide an immediate local backup.
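The monthly full, weekly level, and daily incremental rotation can be sketched as follows. The specific days (full backup on the first of the month, level backup on Sundays) and the function names are assumptions for illustration only. The sketch also assumes that each incremental backup captures changes since the most recent backup of any kind.

```python
from datetime import date, timedelta


def backup_type(day, full_day_of_month=1, level_weekday=6):
    """Classify the nightly backup for a given date.

    full_day_of_month and level_weekday (6 = Sunday) are assumed values;
    the actual schedule is set by the COF backup policy.
    """
    if day.day == full_day_of_month:
        return "full"
    if day.weekday() == level_weekday:
        return "level"
    return "incremental"


def restore_chain(target):
    """List the (date, type) backups needed to restore to the target date.

    The chain is the last full backup, the most recent level backup after it
    (which holds everything changed since that full backup), and every
    incremental backup after that point.
    """
    history = []
    day = target
    while True:
        kind = backup_type(day)
        history.append((day, kind))
        if kind == "full":
            break
        day -= timedelta(days=1)
    history.reverse()  # oldest first, starting with the full backup

    level_positions = [i for i, (_, kind) in enumerate(history) if kind == "level"]
    if not level_positions:
        return history
    return [history[0]] + history[level_positions[-1]:]


if __name__ == "__main__":
    for day, kind in restore_chain(date(2014, 1, 15)):
        print(day.isoformat(), kind)
```

The weekly level backup keeps the restore chain short: at most one full backup, one level backup, and six incrementals are needed under these assumptions, rather than up to a month of incrementals.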
Legato NetWorker's backup module for SQL Server is used to back up MS SQL Server databases. A full backup is performed on a regular schedule every night. Additionally, space is provided for DB managers to perform SQL backups as needed throughout the day. For example, if a database were undergoing a major change, the DB manager could use the SQL Backup Tools within MS SQL Server to back up the database by hand before making the change. We also have the ability to perform a backup using NetWorker's tools at any time. Backups are kept for 2 months.
The backup server is a Dell 2900 system with a RAID to store backup indexes. Backups are routed to a Qualstar XLS tape library that holds 245 tapes, and each tape holds 1.5 Terabytes native. Inside the library are 6 LTO-5 tape drives. Backups are grouped according to our network design (home directories, group directories, web servers, database servers, email, UNIX servers, and special backup needs). COF is currently in the process of reviewing backup strategies to address changes in storage architectures, such as Storage Area Networks.
Non-electronic storage: Paper record storage is greatly reduced from historic levels, but raw data collection records including field and lab data forms, check sheets, and recording charts are stored in the FSL fire-proof vault. Legacy documents, charts, computer printouts, and individual scientist storage boxes are also stored here. While all chart-recorded data has been digitized, scanning of these long-term paper documents into digital formats is ongoing. Similarly, a publication reprint library is being reduced in scope and all LTER publications have been scanned. Original photographic slides and aerial photos are inventoried and stored in six fire-proof cabinets, and scanning will proceed when resources are available.