Difference between revisions of "Storage API"

(→‎Initial Research: More categories)
(→‎Initial Research: More on how Archivematica touches FS)

Revision as of 17:18, 4 June 2013

This is the discussion page for the Archivematica Storage API (Issue #5158), requirements, and proposed implementations.

Goals

  • Get transfers from more locations (e.g. FTP, NFS, HTTP)
  • Store AIPs more flexibly (e.g. LOCKSS, FEDORA)
    • Be able to break AIPs into smaller chunks to meet external storage requirements, and store metadata about the chunks
  • Configure where to store the transfer backlog, the quarantine location, etc.
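
As a rough illustration of the chunking goal above, the sketch below splits a packaged file into fixed-size pieces and records a name, size, and checksum for each piece so that the original can be reassembled and verified later. It is a sketch under assumptions, not a proposed implementation: the function name split_into_chunks, the ".partNNNN" naming scheme, and the metadata fields are all hypothetical and do not exist in Archivematica.

```python
import hashlib
import os


def split_into_chunks(path, chunk_size, out_dir):
    """Split the file at `path` into pieces of at most `chunk_size` bytes.

    Hypothetical helper, not an Archivematica function. Returns a list of
    dicts describing each chunk, in the order the chunks must be joined.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    chunks = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break  # end of file reached
            # Assumed naming scheme: <original name>.part0000, .part0001, ...
            name = "%s.part%04d" % (base, index)
            with open(os.path.join(out_dir, name), "wb") as dst:
                dst.write(data)
            chunks.append({
                "index": index,
                "name": name,
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            })
            index += 1
    return chunks
```

The returned list is the "metadata about the chunks"; where it would be stored (database, METS, a sidecar file) is left open here.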

Initial Research

Goal: Look at all the places Archivematica currently accesses the filesystem, and categorize them.

Categories:

  • Transfers
    • dashboard.... ? , puts in watched directory
  • Quarantine
    • copied to watchedDirectories/quarantined/
  • Backlog transfer
    • Send to backlog (MicroServiceChainLink abd6d60c-d50f-4660-a189-ac1b34fafe85)
    • Where are transfers retrieved from the backlog?
  • Currently Processing
    • Anything initiated by putting files in a watchedDirectory, anything being processed by a MicroServiceChain
    • Touched everywhere, in all the client scripts, both from Python code and from shell commands run as client scripts
    • Probably best to keep local
    • Already set up to be movable with %sharedDirectory%, as long as the folder structure inside %sharedDirectory% is preserved
  • AIP Storage
    • done in one place: src/MCPClient/lib/clientScripts/storeAIP.py
  • Uploaded DIPs?
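
The %sharedDirectory% note under "Currently Processing" can be pictured with a small sketch: paths are stored with a placeholder and only resolved against a concrete root at run time, which is why the processing area stays relocatable as long as the layout beneath it is unchanged. The function name resolve_shared_path and the example root are hypothetical, and the sketch assumes the placeholder stands for the root including its trailing slash.

```python
SHARED_DIRECTORY_PLACEHOLDER = "%sharedDirectory%"


def resolve_shared_path(template, shared_directory):
    """Replace the %sharedDirectory% placeholder with a concrete root.

    Hypothetical helper. Assumes the placeholder represents the root
    directory *including* its trailing slash.
    """
    root = shared_directory.rstrip("/") + "/"
    return template.replace(SHARED_DIRECTORY_PLACEHOLDER, root)


# The same stored template resolves under whatever root is configured.
quarantine = resolve_shared_path(
    "%sharedDirectory%watchedDirectories/quarantined/",
    "/mnt/shared",  # example root only, an assumption
)
```

Moving the shared directory then means changing only the configured root, not every stored path.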

Ways Archivematica can touch the filesystem:

  • Python's open(), shutil.{move|copy|rmtree}
    • mostly just in currently processing
  • Python's os module (checking if file/directory exists, create directory, remove file)
  • cp, mv, mkdir, rm, chmod as client scripts
    • mostly just processing, or moving within processing dirs
    • create transfer backup (MicroServiceChainLink 478512a6-10e4-410a-847d-ce1e25d8d31c)
    • Check for 'move to processing directory' that fetches from quarantine, backlog
      • Usually their own chainlinks, so should be straightforward to change
  • dashboard configs (e.g. AIP storage location, transfer source)
    • dashboard.components.main.models.py SourceDirectory, StorageDirectory
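
One possible shape for a storage layer, given the access patterns listed above, is a single object that all filesystem calls go through, so that a non-local backend could later offer the same methods. The sketch below wraps only the local operations named in the list (existence checks, directory creation, copy, move, remove). The class name LocalStorage and its method names are hypothetical, and it is written for Python 3, which may not match the Python version Archivematica runs on.

```python
import os
import shutil


class LocalStorage(object):
    """Hypothetical wrapper that routes filesystem access through one place.

    All paths are given relative to `root`; paths that would resolve
    outside of it are rejected.
    """

    def __init__(self, root):
        self.root = os.path.abspath(root)

    def _full(self, relative_path):
        full = os.path.abspath(os.path.join(self.root, relative_path))
        if os.path.commonpath([self.root, full]) != self.root:
            raise ValueError("path escapes storage root: %r" % relative_path)
        return full

    def exists(self, relative_path):
        return os.path.exists(self._full(relative_path))

    def makedirs(self, relative_path):
        os.makedirs(self._full(relative_path), exist_ok=True)

    def copy(self, src, dst):
        shutil.copy(self._full(src), self._full(dst))

    def move(self, src, dst):
        shutil.move(self._full(src), self._full(dst))

    def remove(self, relative_path):
        full = self._full(relative_path)
        if os.path.isdir(full):
            shutil.rmtree(full)  # directories are removed recursively
        else:
            os.remove(full)
```

A backend for FTP or another remote store would need to provide the same five methods. Whether that is the right granularity for the Storage API is an open question for this page.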