Archivists Who Code

  • 1.  Preservica OPEX Generation

    Posted 10-03-2022 01:44 PM
    Hi,

    I was talking to Brian Thomas about this group at the Best Practices Exchange Conference recently and I'd love to resurrect it. As a start, I have a Preservica problem. I want to bulk upload roughly a hard drive's worth of files and then link each file to its existing ArchivesSpace archival object. The problem is automating that. Preservica Support gave me information about their OPEX ingest function, but I need to create the OPEX files and insert them at the right directory level while also querying the ArchivesSpace API by component identifier, because that's the only link I have to the ArchivesSpace object.

    My questions are:
    1. How to get the Archival Object ID via the ArchivesSpace API with only the component identifier field as an access point
    2. The best way to generate the OPEX metadata. I've been making it in a spreadsheet then converting it to XML
    3. Inserting the right metadata file in the right directory for ingest. 
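
    For question 1, a minimal sketch of one possible approach, assuming the instance exposes the standard ArchivesSpace backend API with its find_by_id/archival_objects endpoint and that the hosting provider allows API logins. The base URL, repository ID, and sample component identifier are placeholders, not values from this thread.

```python
import json
import urllib.parse
import urllib.request


def login(base_url, username, password):
    """Log in and return a session token (POST /users/:username/login)."""
    url = f"{base_url}/users/{urllib.parse.quote(username)}/login"
    data = urllib.parse.urlencode({"password": password}).encode()
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.load(resp)["session"]


def find_by_component_id_url(base_url, repo_id, component_id):
    """Build the lookup URL for one component identifier."""
    query = urllib.parse.urlencode({"component_id[]": component_id})
    return f"{base_url}/repositories/{repo_id}/find_by_id/archival_objects?{query}"


def extract_refs(response):
    """Pull archival object URIs out of a find_by_id response body."""
    return [hit["ref"] for hit in response.get("archival_objects", [])]


def lookup(base_url, repo_id, component_id, session):
    """Return the URIs of archival objects matching a component identifier."""
    request = urllib.request.Request(
        find_by_component_id_url(base_url, repo_id, component_id),
        headers={"X-ArchivesSpace-Session": session},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_refs(json.load(resp))
```

    The last path segment of each returned URI is the archival object ID. Component identifiers are not guaranteed to be unique, so it is worth checking for zero or multiple matches before linking anything.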

    Attached are the Python scripts I have so far:
    opexTransform.py reads the spreadsheet and converts it to XML by cheating and just making a text file that happens to be XML.
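
    As an alternative to writing the XML as plain text, here is a hedged sketch that builds it with the standard library's ElementTree, which handles escaping. This is not the attached opexTransform.py. The namespace and element names follow Preservica's OPEX documentation as I understand it and should be checked against your version; the CSV column names and the "code" identifier type are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET

# Assumption: OPEX v1.2 namespace; confirm against your Preservica release.
OPEX_NS = "http://www.openpreservationexchange.org/opex/v1.2"
ET.register_namespace("opex", OPEX_NS)


def q(tag):
    """Qualify a tag name with the OPEX namespace."""
    return f"{{{OPEX_NS}}}{tag}"


def build_opex(title, description, identifiers, security="open"):
    """Return an OPEX metadata document as a string.

    identifiers maps an identifier type name to its value.
    """
    root = ET.Element(q("OPEXMetadata"))
    props = ET.SubElement(root, q("Properties"))
    ET.SubElement(props, q("Title")).text = title
    ET.SubElement(props, q("Description")).text = description
    ET.SubElement(props, q("SecurityDescriptor")).text = security
    ids = ET.SubElement(props, q("Identifiers"))
    for id_type, value in identifiers.items():
        ET.SubElement(ids, q("Identifier"), {"type": id_type}).text = value
    return ET.tostring(root, encoding="unicode")


def opex_from_csv(csv_file, identifier_type="code"):
    """Map each row's filename to its OPEX XML.

    Hypothetical columns: filename, title, description, aspace_ref.
    """
    return {
        row["filename"]: build_opex(
            row["title"], row["description"], {identifier_type: row["aspace_ref"]}
        )
        for row in csv.DictReader(csv_file)
    }
```

    A spreadsheet exported to CSV can be passed in as an open file, for example opex_from_csv(open("metadata.csv", newline="")).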
    patternDir.py reads the original file directory and copies the directory structure and only the preservation file formats, ignoring any access or vendor-created admin files.
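
    For readers without the attachment, a rough sketch of what a script like patternDir.py might do, plus a helper for question 3. This is not the attached script. The extension list is hypothetical, and the naming rule (a file's metadata sits beside it as filename.ext.opex) is my understanding of the OPEX convention and should be confirmed with Preservica's documentation.

```python
import shutil
from pathlib import Path

# Hypothetical list of preservation formats; adjust to local policy.
PRESERVATION_EXTS = {".tif", ".tiff", ".wav", ".pdf"}


def copy_preservation_files(src, dst, extensions=PRESERVATION_EXTS):
    """Mirror the directory tree under src into dst, copying only files
    whose extension is in extensions. dst should not be inside src.
    Returns the list of copied file paths."""
    src, dst = Path(src), Path(dst)
    copied = []
    for path in sorted(src.rglob("*")):
        rel = path.relative_to(src)
        if path.is_dir():
            (dst / rel).mkdir(parents=True, exist_ok=True)
        elif path.suffix.lower() in extensions:
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(target)
    return copied


def sidecar_path(file_path):
    """Return where the OPEX metadata for a file would go:
    the same directory, with '.opex' appended to the full file name."""
    file_path = Path(file_path)
    return file_path.with_name(file_path.name + ".opex")
```

    Writing each generated XML string to sidecar_path(copied_file) would place the metadata at the same directory level as the file it describes.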

    I have nothing for the ArchivesSpace part. I'm more comfortable with the ArchivesSpace database, but my new institution hosts their instance so I don't have access to it.

    ------------------------------
    Corinne Chatnik
    Union College, Shaffer Library
    Schenectady, NY

    chatnikc@union.edu
    ------------------------------


  • 2.  RE: Preservica OPEX Generation

    Posted 10-06-2022 04:06 PM
    Hi Corinne,
    Your attachments didn't come through. In my experience, you need to add .txt to the file extension so the platform doesn't filter them out.

    I've got nothing on the ASpace API, but maybe I'm just not quite understanding. You want to extract the archival object ID from ASpace and insert it at the item level in Preservica like in this section?

    I am thinking about this like a data merge: what is the point at which both ASpace and Preservica data will match? My guess would be filename.

    Using stream of consciousness to work out my understanding: if you can harvest the ASpace info from a collection and generate a spreadsheet from it that lists the AO name and the AO identifier, then you can use a spreadsheet tool like pandas or Excel to merge your two spreadsheets of data. In that case you also need the Identifier type name from Preservica for the OPEX file.

    (Screenshot of an OPEX file I have post-ingest.)
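
    The merge idea can be sketched in plain Python as a lookup on a shared key column. This is only an illustration: the column names below are hypothetical, and pandas' DataFrame.merge or a spreadsheet lookup formula would do the same job.

```python
def merge_on_key(aspace_rows, file_rows, key):
    """Join two lists of dict rows on a shared column.

    Returns (merged, unmatched): merged rows combine both sources,
    unmatched holds file rows with no ASpace counterpart.
    """
    by_key = {row[key]: row for row in aspace_rows}
    merged, unmatched = [], []
    for row in file_rows:
        match = by_key.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            merged.append({**match, **row})
    return merged, unmatched


# Hypothetical sample data: one list harvested from ASpace, one from the file inventory.
aspace = [{"component_id": "MS-001-003", "ao_uri": "/repositories/2/archival_objects/123"}]
files = [
    {"component_id": "MS-001-003", "filename": "MS-001-003.tif"},
    {"component_id": "MS-001-999", "filename": "MS-001-999.tif"},
]
merged, unmatched = merge_on_key(aspace, files, "component_id")
```

    Reviewing the unmatched list before generating any OPEX files shows which files still lack an archival object to link to.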

    If all that matches up, I have something that may be adaptable but is not necessarily created for that purpose.

    ------------------------------
    Brian Thomas
    ------------------------------