AIP packaging and compression

From Archivematica

Latest revision as of 09:26, 5 June 2012

Requirements

Must have:

  • Separate compression and packaging (BagIt) functions
  • Processing time
  • Cross-platform tool availability for unpacking
  • Must be able to include empty directories (excludes using zip?)
  • File date tags preserved
  • Support removing/adding individual files (updating METS metadata)

Nice to have:

  • Ubiquitous format
  • LZMA is the preferred compression algorithm
  • Tools used most by other repositories

See Issue 928 and Issue 927
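
As a rough illustration of the first requirement, the sketch below keeps packaging and compression as two separate steps: an uncompressed tar is written first, and the result is then wrapped in an .xz (LZMA) container. This is only a sketch under assumptions, not the Archivematica implementation: tar is used here as a stand-in packaging format (no BagIt structure is created), and the function names, paths and the fixed timestamp are hypothetical. It also checks that an empty directory and a file modification time survive the round trip.

```python
import lzma
import os
import shutil
import tarfile
import tempfile


def package_tar(src_dir: str, tar_path: str) -> None:
    """Packaging step only: write an uncompressed tar of src_dir."""
    with tarfile.open(tar_path, "w") as tar:  # mode "w" applies no compression
        # A recursive add stores empty directories and modification times.
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def compress_lzma(in_path: str, out_path: str) -> None:
    """Compression step only: wrap any file in an .xz (LZMA) container."""
    with open(in_path, "rb") as src, lzma.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def demo() -> dict:
    """Build a tiny hypothetical AIP, package it, compress it, and inspect it."""
    work = tempfile.mkdtemp()
    aip = os.path.join(work, "aip")
    os.makedirs(os.path.join(aip, "objects", "empty"))
    mets = os.path.join(aip, "METS.xml")
    with open(mets, "w") as fh:
        fh.write("<mets/>")
    os.utime(mets, (1338888360, 1338888360))  # fixed mtime so it can be checked

    tar_path = os.path.join(work, "aip.tar")
    xz_path = tar_path + ".xz"
    package_tar(aip, tar_path)
    compress_lzma(tar_path, xz_path)

    with tarfile.open(xz_path, "r:xz") as tar:
        members = {m.name: m for m in tar.getmembers()}
    result = {
        "empty_dir_kept": members["aip/objects/empty"].isdir(),
        "mtime_kept": members["aip/METS.xml"].mtime == 1338888360,
    }
    shutil.rmtree(work)
    return result
```

Calling demo() returns a dictionary with the two checks. Note that this layout does not by itself address the requirement to add or remove individual files, since the whole tar has to be decompressed before it can be changed.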

Notes On Specific Formats

ZIP

info from: http://en.wikipedia.org/wiki/Zip_%28file_format%29

The maximum size for both the archive file and the individual files inside it is 4,294,967,295 bytes (2^32−1 bytes, or 4 GiB) for standard ZIP, and 18,446,744,073,709,551,615 bytes (2^64−1 bytes, or 16 EiB) for ZIP64.
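
On the question raised in the requirements of whether ZIP rules out empty directories: the format can hold directory entries, which are entries whose name ends in "/" and which carry no data. The sketch below writes one with Python's zipfile module and reads it back. The entry names are hypothetical, and whether every unpacking tool recreates such entries on extraction is not verified here.

```python
import io
import zipfile


def zip_with_empty_dir() -> list:
    """Write a file and an empty directory entry, then list the archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("aip/METS.xml", "<mets/>")
        # A name ending in "/" is stored as a directory entry with no data.
        zf.writestr(zipfile.ZipInfo("aip/objects/empty/"), "")
    with zipfile.ZipFile(buf) as zf:
        return [(info.filename, info.is_dir()) for info in zf.infolist()]
```

The returned list pairs each entry name with a flag saying whether it is a directory, so the empty directory shows up as its own entry.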

ZIP 64

The original zip format had a 4 GiB limit on various things (uncompressed size of a file, compressed size of a file and total size of the archive), as well as a limit of 65535 entries in a zip archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations, raising the limit to 16 EiB (2^64 bytes).

The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista does. Likewise, some libraries, such as DotNetZip and IO::Compress::Zip in Perl, support ZIP64. Java's built-in java.util.zip supports ZIP64 as of Java 7.
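
The limits quoted above can be turned into a quick check of whether an AIP is likely to need ZIP64. The helper below is a hypothetical, deliberately conservative sketch: it compares the entry count and the total uncompressed size against the classic limits, whereas a real ZIP writer decides per entry based on compressed sizes and offsets.

```python
ZIP32_MAX_BYTES = 2**32 - 1    # 4,294,967,295 bytes (4 GiB limit)
ZIP32_MAX_ENTRIES = 65535      # entry limit of the original format
ZIP64_MAX_BYTES = 2**64 - 1    # 18,446,744,073,709,551,615 bytes (16 EiB limit)


def needs_zip64(file_sizes) -> bool:
    """Rough check: True if the entry count or total size exceeds classic ZIP limits."""
    sizes = list(file_sizes)
    return len(sizes) > ZIP32_MAX_ENTRIES or sum(sizes) > ZIP32_MAX_BYTES
```

For example, needs_zip64([3 * 2**30, 2 * 2**30]) is true because the two files together exceed 4 GiB, while a single 1 KiB file stays within the classic limits.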



info from: http://manpages.ubuntu.com/manpages/lucid/man1/zip.1.html

A companion program (unzip(1)) unpacks zip archives. The zip and unzip(1) programs can work with archives produced by PKZIP (supporting most PKZIP features up to PKZIP version 4.6), and PKZIP and PKUNZIP can work with archives produced by zip (with some exceptions, notably streamed archives, but recent changes in the zip file standard may facilitate better compatibility). zip version 3.0 is compatible with PKZIP 2.04 and also supports the Zip64 extensions of PKZIP 4.5 which allow archives as well as files to exceed the previous 2 GB limit (4 GB in some cases). zip also now supports bzip2 compression if the bzip2 library is included when zip is compiled. Note that PKUNZIP 1.10 cannot extract files produced by PKZIP 2.04 or zip 3.0. You must use PKUNZIP 2.04g or unzip 5.0p1 (or later versions) to extract them.