Digital Preservation in North Carolina
Apr 21, 2017
As outlined in North Carolina’s General Statutes 132 and 121, the State Archives of North Carolina (SANC) serves dual purposes within the state: to provide guidance concerning the preservation and management of government records to state, county, city, and state university officials; and to collect, preserve, and provide access to historically significant archival materials relating to North Carolina. SANC’s collections include everything from early 17th-century documents and maps to digital media and electronic records produced by current state agencies, including digital audiovisual and geospatial materials.
Digital Preservation in North Carolina
The North Carolina Digital Repository is a collaborative effort between the State Archives (SANC) and the State Library of NC (SLNC) to store, preserve, and provide access to the permanently valuable electronic records and publications of state government. The electronic records program began in 2001 with the transfer of 5 GB from the administration of Governor Jim Hunt. This transfer of the Governor’s email server led to a partnership between SANC and SLNC in 2005 on the Access to State Government Information Initiative (ASGII) grant, which allowed the state to begin addressing the challenges of preserving and providing access to electronic records of all formats. It was natural that the sister divisions of the DNCR would collaborate on the digital preservation of public records and publications and coordinate the relevant policies and procedures. The North Carolina Digital Repository itself was established in 2007 with the purchase of dedicated storage space as part of the Electronic Mail Capture and Preservation (EMCAP) grant, an NHPRC grant in which North Carolina partnered with Pennsylvania and Kentucky to begin investigating the management of email.
In the 10 years since its establishment, the North Carolina Digital Repository has continued to expand. As part of the GeoMapp grant, a National Digital Information Infrastructure and Preservation Program (NDIIPP) grant from the Library of Congress led by North Carolina with partners in Kentucky, Montana, and Utah, SANC was able to purchase additional offsite backup storage in Asheville in 2008. The Repository has collected over 75 TB of electronic records and archival material, and that number continues to grow as information is increasingly created and stored in a digital environment. The materials collected by the Digital Repository include:
- Born-digital state agency records scheduled for permanent digital retention in the State Archives, including email
- Born-digital local government tax records scheduled for permanent digital retention in the State Archives (pilot program)
- Born-digital Special Collections materials accessioned into the State Archives
- Selected digitized copies of analog materials previously accessioned into the State Archives
- Publications managed by the State Library
These materials are housed and managed on local servers with geographically dispersed backups. Currently, these collections are managed manually by SANC and SLNC staff as they are transferred to our custody, though we anticipate that this model will not be sustainable in the long term, given the volume of data the repository already manages and the anticipated volume of future transfers.
Developing the Matrix
In 2014, SANC and SLNC became pre-launch beta testers for the new ArchivesDirect service. ArchivesDirect provides access to Archivematica through DuraCloud as a possible solution for automating some of the preservation actions needed to ingest and provide access to incoming electronic records. The first edition of a testing matrix was developed as part of the testing process. The matrix listed required and preferred system features, and offered a rating scale to determine how well the tool met current needs and preferences. At the end of the testing period, it was determined that the ArchivesDirect service did not meet the North Carolina Digital Repository’s needs at that time.
In 2015, SLNC received a grant to test another preservation system, Preservica. Staff wanted to apply the testing criteria used with Archivematica to Preservica, but found the tests difficult to replicate due to vague language and scoring in the testing documentation. Instead of replicating the tests between systems, staff revised the testing criteria, condensing them to the core system requirements for preservation. The rewritten matrix added objective scoring and repeatable tests that could be run across systems. Objective and repeatable testing, we hoped, would help us communicate our feedback to vendors and IT staff more efficiently, as well as help us justify the cost of the preservation solution, should we decide to purchase it. We wanted the matrix to clearly answer three things about each core requirement for a digital preservation system:
- What do we mean?
- How do we prove it?
- How do we score it?
We decided that all test scores would be pass/fail, and that section and total system scores would be based on the average of individual test scores; the closer to 1 a total system score was, the more fully it passed the tests. Additionally, we created a standard suite of testing files for use in the evaluation of each digital preservation platform.
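The scoring scheme described above can be illustrated with a short sketch. This is not the matrix itself: the `CheckResult` fields, the section names, and the sample tests are hypothetical, and the sketch assumes that a pass counts as 1, a fail as 0, and that the total system score averages over all individual tests rather than over section averages.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One row of a hypothetical testing matrix."""
    section: str      # requirement area (hypothetical names below)
    requirement: str  # "What do we mean?"
    procedure: str    # "How do we prove it?"
    passed: bool      # "How do we score it?" pass = 1, fail = 0


def average(values):
    """Plain arithmetic mean; refuses an empty list."""
    values = list(values)
    if not values:
        raise ValueError("no test results to score")
    return sum(values) / len(values)


def section_scores(results):
    """Average the pass/fail scores of the tests within each section."""
    by_section = {}
    for r in results:
        by_section.setdefault(r.section, []).append(1 if r.passed else 0)
    return {name: average(scores) for name, scores in by_section.items()}


def system_score(results):
    """Average over every individual test (an assumption of this sketch)."""
    return average(1 if r.passed else 0 for r in results)


# Made-up sample results for a single system under test.
results = [
    CheckResult("Ingest", "Accepts a transfer of mixed formats",
                "Ingest the standard suite of testing files", True),
    CheckResult("Ingest", "Identifies file formats on ingest",
                "Compare reported formats against a known list", False),
    CheckResult("Fixity", "Generates checksums on ingest",
                "Compare stored checksums with precomputed values", True),
    CheckResult("Fixity", "Detects altered files",
                "Modify a stored file and run a fixity check", True),
]

print(section_scores(results))  # {'Ingest': 0.5, 'Fixity': 1.0}
print(system_score(results))    # 0.75
```

Because every test is pass/fail, each score falls between 0 and 1, and the same file suite and procedures can be rerun against another system to produce directly comparable numbers.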
We hope the testing matrix will allow us and other institutions to provide clear, repeatable feedback to vendors, developers, and IT staff, so that systems better meet our needs and best practices. The matrix will help justify the purchase of digital preservation software to administrators and allow the State Archives of North Carolina to keep up with the increasing amount of data being ingested annually. Additionally, when we presented the matrix at BPE 2016 in Sacramento, CA, we asked other archival institutions to provide feedback and to collaborate on the matrix so that we could continue to improve testing. So far, the matrix has been shared with two other states that are in the process of testing digital preservation systems.
At the end of our testing period with Preservica, we had not completely finished the matrix, but we have been using the completed matrix in our current testing of LibSafe/LibNova. Based on this testing period, we will be editing and updating the testing matrix in order to improve it for any future tests, as we have found that some of the tests are more software-specific than we had anticipated.
For more information on the testing matrix, or to collaborate on its development, please contact Camille Tyndall Watson at firstname.lastname@example.org.