Storage API

From Archivematica
Revision as of 16:18, 4 June 2013 by Hbecker (talk | contribs) (→‎Initial Research: More on how Archivematica touches FS)

This is the discussion page for the Archivematica Storage API (Issue #5158), requirements, and proposed implementations.

Goals

  • Get transfers from more locations (e.g. FTP, NFS, HTTP)
  • Store AIPs more flexibly (e.g. LOCKSS, FEDORA)
    • be able to break AIPs into smaller chunks to meet external storage requirements, and store metadata about the chunks
  • Configure where to store the transfer backlog, quarantine, etc.
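
As a rough illustration of the chunking goal above, the following sketch splits a packaged AIP into fixed-size pieces and records metadata about each piece so it can be verified and reassembled later. The function names (`split_into_chunks`, `join_chunks`), the `.partNNNN` naming, and the metadata fields are all hypothetical; nothing here is an existing Archivematica interface, and a real target such as LOCKSS may impose its own requirements.

```python
import hashlib
from pathlib import Path


def split_into_chunks(source, dest_dir, chunk_size):
    """Split `source` into files of at most `chunk_size` bytes.

    Returns one metadata dict per chunk (hypothetical layout:
    index, filename, size, sha256).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    chunks = []
    with source.open("rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            chunk_path = dest_dir / f"{source.name}.part{index:04d}"
            chunk_path.write_bytes(data)
            chunks.append({
                "index": index,
                "filename": chunk_path.name,
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            })
            index += 1
    return chunks


def join_chunks(chunks, chunk_dir, dest):
    """Reassemble chunks in index order, verifying each checksum first."""
    chunk_dir = Path(chunk_dir)
    with Path(dest).open("wb") as out:
        for chunk in sorted(chunks, key=lambda c: c["index"]):
            data = (chunk_dir / chunk["filename"]).read_bytes()
            if hashlib.sha256(data).hexdigest() != chunk["sha256"]:
                raise ValueError(f"checksum mismatch in {chunk['filename']}")
            out.write(data)
```

The metadata list is what would need to be stored alongside the AIP record; where it is stored is left open here.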

Initial Research

Goal: Look at all the places Archivematica currently accesses the filesystem, and categorize them.

Categories:

  • Transfers
    • dashboard (details unclear?) puts the transfer in a watched directory
  • Quarantine
    • copied to watchedDirectories/quarantined/
  • Backlog transfer
    • Send to backlog (MicroServiceChainLink abd6d60c-d50f-4660-a189-ac1b34fafe85)
    • Where are transfers retrieved from the backlog?
  • Currently Processing
    • Anything initiated by putting files in a watchedDirectory, anything being processed by a MicroServiceChain
    • touched everywhere, throughout the client scripts, both from Python code and from shell commands run as client scripts
    • Probably best to keep local
    • Already set up to be movable via %sharedDirectory%, as long as the folder structure inside %sharedDirectory% is preserved
  • AIP Storage
    • done in one place: src/MCPClient/lib/clientScripts/storeAIP.py
  • Uploaded DIPs?
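
The point about %sharedDirectory% in the "Currently Processing" category can be illustrated with a small sketch: if every processing path is written relative to the placeholder, relocating the shared directory only changes the value substituted in. The helper name `resolve_path` and the example locations are hypothetical, and the sketch assumes the placeholder expands to a directory path ending in a slash; the actual substitution code in Archivematica is not shown here.

```python
def resolve_path(template, shared_directory):
    """Expand the %sharedDirectory% placeholder in a path template.

    Assumes (for this sketch) that the placeholder stands for a
    directory path with exactly one trailing slash.
    """
    base = shared_directory.rstrip("/") + "/"
    return template.replace("%sharedDirectory%", base)


# The same template resolves under two hypothetical mount points.
template = "%sharedDirectory%watchedDirectories/quarantined/"
print(resolve_path(template, "/mnt/local/sharedDirectory"))
print(resolve_path(template, "/mnt/nfs/sharedDirectory/"))
```

Because only the prefix changes, the layout inside the shared directory has to stay the same, which is the condition noted above.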

Ways Archivematica can touch the filesystem:

  • Python's open(), shutil.{move|copy|rmtree}
    • mostly just in currently processing
  • Python's os module (checking whether a file/directory exists, creating directories, removing files)
  • cp, mv, mkdir, rm, chmod run as client scripts
    • mostly just processing, or moving within processing dirs
    • create transfer backup (MicroServiceChainLink 478512a6-10e4-410a-847d-ce1e25d8d31c)
    • Check for 'move to processing directory' links that fetch from quarantine or backlog
      • Usually their own chainlinks, so should be straightforward to change
  • dashboard configs (eg. AIP storage location, transfer source)
    • dashboard.components.main.models.py: SourceDirectory, StorageDirectory
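
One possible way to use the inventory above: route the scattered open()/os/shutil/cp/mv calls through a single small interface, so that a non-local backend could later be substituted for transfers, backlog, or AIP storage. The sketch below is only an assumption about what such an interface might look like; `StorageBackend` and `LocalBackend` are hypothetical names and not part of any proposed or existing Archivematica API.

```python
import os
import shutil
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface covering the filesystem operations listed above."""

    @abstractmethod
    def exists(self, path): ...

    @abstractmethod
    def makedirs(self, path): ...

    @abstractmethod
    def copy(self, src, dst): ...

    @abstractmethod
    def move(self, src, dst): ...

    @abstractmethod
    def remove(self, path): ...


class LocalBackend(StorageBackend):
    """Local-disk implementation using only os and shutil."""

    def exists(self, path):
        return os.path.exists(path)

    def makedirs(self, path):
        os.makedirs(path, exist_ok=True)

    def copy(self, src, dst):
        # Directories need copytree; single files use copy2 to keep metadata.
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)

    def move(self, src, dst):
        shutil.move(src, dst)

    def remove(self, path):
        # Remove a directory tree, or a single file or symlink.
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
```

Steps that already live in their own chainlinks (quarantine, backlog, AIP storage) would be the easiest callers to switch to such an interface; processing-time code could keep using the local filesystem directly.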