Improvements/Reporting

From Archivematica

Latest revision as of 11:45, 17 October 2017

Synopsis

This project is being sponsored by UCLA Library and NYPL Special Collections but more collaborators are welcome! Please get in touch on the community user forum.

Archivematica is designed to be very AIP-centric, but users want to be able to find files that share common characteristics (such as file type, date ingested, or the tools used to generate their metadata) across AIPs. For example, a user might want to know how many PDFs were ingested across all AIPs between January and June 2017. Alternatively, before creating a SIP, a user might want to run a file identification report that lists the file types in the transfer, as a high-level scan of what it contains.

This improvement was proposed in 2015 and some work was done (see Status 2015 and Analysis 2015).

A list of example searches and expected responses is on the Research data management page (section "METS parsing").

The goal of this project is to be able to query the Archivematica database to find the total number of files in the storage service that have a given PRONOM format identifier (fmt/id; see https://www.nationalarchives.gov.uk/PRONOM/Default.aspx).
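
To make the goal concrete, here is a minimal Python sketch that counts files per PRONOM ID from the PREMIS metadata embedded in a single AIP's METS file, rather than from the database. It assumes PREMIS 2.x-style `formatRegistry` elements whose `formatRegistryName` is `PRONOM`; the function names and the sample document are hypothetical, and this is not how Archivematica or metsrw actually implement the query.

```python
"""Count files per PRONOM format ID (PUID) in one METS document.

Assumption: PREMIS object metadata is embedded in the METS file with
<premis:formatRegistry> elements (PREMIS 2.x style). Tags are matched by
local name so the sketch is not tied to one PREMIS namespace version.
"""
from collections import Counter
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def count_pronom_ids(mets_xml: str) -> Counter:
    """Return a Counter mapping each PRONOM PUID to its number of files."""
    root = ET.fromstring(mets_xml)
    counts = Counter()
    for elem in root.iter():
        if local_name(elem.tag) != "formatRegistry":
            continue
        name = key = None
        for child in elem:
            if local_name(child.tag) == "formatRegistryName":
                name = (child.text or "").strip()
            elif local_name(child.tag) == "formatRegistryKey":
                key = (child.text or "").strip()
        # Only count identifiers that come from the PRONOM registry.
        if name == "PRONOM" and key:
            counts[key] += 1
    return counts


def _techmd(puid: str) -> str:
    """Build one hypothetical PREMIS object wrapped in a METS techMD."""
    return f"""
  <mets:amdSec>
    <mets:techMD>
      <mets:mdWrap MDTYPE="PREMIS:OBJECT">
        <mets:xmlData>
          <premis:object>
            <premis:objectCharacteristics>
              <premis:format>
                <premis:formatRegistry>
                  <premis:formatRegistryName>PRONOM</premis:formatRegistryName>
                  <premis:formatRegistryKey>{puid}</premis:formatRegistryKey>
                </premis:formatRegistry>
              </premis:format>
            </premis:objectCharacteristics>
          </premis:object>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>"""


SAMPLE_METS = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    'xmlns:premis="info:lc/xmlns/premis-v2">'
    + "".join(_techmd(p) for p in ["fmt/276", "fmt/276", "fmt/43"])
    + "</mets:mets>"
)

print(count_pronom_ids(SAMPLE_METS))  # Counter({'fmt/276': 2, 'fmt/43': 1})
```

Because `Counter` objects can be added, summing the results for several METS files (for example `sum(counters, Counter())`) gives a cross-AIP total per PUID.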

User story

As a repository manager, I want to be able to search across AIPs to find how many files of a certain type I have. Ideally, I'd like to be able to narrow my search by date or other parameters.

Development tasks 2017

  • Rebase search dev PR - done
  • Evaluate whether this allows us to remove Tastypie in favour of Django REST framework (yes/no/maybe/both) - done; the answer is both
  • Evaluate whether the search branch can answer the clients’ PRONOM questions:
    • How many files do I have in a given PRONOM fmt/id?
      • How many of those were ingested between date1 and date2?
  • Produce sample input (cURL command) and output (JSON response)
  • Combine METS reader/writer work
  • Modify / make new PR
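
The two PRONOM questions in the task list (files per fmt/id, optionally restricted to an ingest date range) and the idea of a JSON response can be sketched as a small aggregation over per-file records. The `FileRecord` fields, the `count_files` helper, and the response keys are all hypothetical; they do not describe the real search branch's endpoint or response format.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file row; field names are illustrative only."""
    aip_uuid: str
    puid: str        # PRONOM format ID, e.g. "fmt/276"
    ingested: date   # date the containing AIP was stored


def count_files(records: Iterable[FileRecord], puid: str,
                start: Optional[date] = None,
                end: Optional[date] = None) -> dict:
    """Count files with `puid`, optionally ingested within [start, end] inclusive."""
    matching = [
        r for r in records
        if r.puid == puid
        and (start is None or r.ingested >= start)
        and (end is None or r.ingested <= end)
    ]
    # A JSON-serializable summary, shaped like a possible API response.
    return {
        "puid": puid,
        "start": start.isoformat() if start else None,
        "end": end.isoformat() if end else None,
        "file_count": len(matching),
        "aip_count": len({r.aip_uuid for r in matching}),
    }


SAMPLE_RECORDS = [
    FileRecord("aip-1", "fmt/276", date(2017, 2, 14)),
    FileRecord("aip-1", "fmt/276", date(2017, 2, 14)),
    FileRecord("aip-2", "fmt/276", date(2017, 8, 1)),
    FileRecord("aip-2", "fmt/43", date(2017, 8, 1)),
]

if __name__ == "__main__":
    # "How many fmt/276 files were ingested between January and June 2017?"
    result = count_files(SAMPLE_RECORDS, "fmt/276",
                         date(2017, 1, 1), date(2017, 6, 30))
    print(json.dumps(result, indent=2))
```

Leaving `start` and `end` as `None` answers the first question (all files in a fmt/id); supplying both answers the date-range follow-up.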

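Status 2015

  • Initial storage service work: Pull Request (https://github.com/artefactual/archivematica-storage-service/pull/89/)
  • METS reader-writer support for PREMIS: Pull Request (https://github.com/artefactual-labs/mets-reader-writer/pull/20/)
  • PyPREMIS: Repository (https://github.com/uchicago-library/uchicagoldr-premiswork)
  • Server for METS reader-writer: Pull Request (https://github.com/artefactual-labs/mets-reader-writer/pull/21/)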

Analysis 2015

  • This will require improvements to METS reader-writer (metsrw) so that it can support different queries
  • METS can contain multiple metadata standards; metsrw should handle these in an easily extensible way
    • Since it is the METS reader-writer, other standards should be supported as plugins of some sort
    • Prefer to use existing libraries (e.g. PyPREMIS) or thin wrappers around existing libraries
  • By default, metsrw should include PREMIS (as Archivematica uses it) and Dublin Core