AIP packaging and compression

From Archivematica
Latest revision as of 09:26, 5 June 2012

Requirements

Must have:

  • Separate compression and packaging (BagIt) functions
  • Processing time
  • Cross-platform tool availability for unpacking
  • Must be able to include empty directories (excludes using zip?)
  • File date tags preserved
  • Support removing/adding individual files (updating METS metadata)

Nice to have:

  • Ubiquitous format
  • LZMA is the preferred compression algorithm
  • Tools used most by other repositories

See Issue 928 and Issue 927
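
As an illustration of the first requirement, the sketch below keeps packaging and compression as two separate steps, using Python's standard tarfile and lzma modules. The choice of tar as the container and the function names package, compress and unpack are assumptions made for this example, not a decision recorded on this page. Tar is used here only because it stores empty directories and file modification times, which makes those two requirements easy to check after a round trip.

```python
import lzma
import os
import shutil
import tarfile


def package(src_dir, tar_path):
    """Packaging step only: write an uncompressed tar of src_dir."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def compress(tar_path, xz_path):
    """Compression step only: LZMA-compress an existing file into .xz."""
    with open(tar_path, "rb") as src, lzma.open(xz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def unpack(xz_path, dest_dir):
    """Reverse both steps: decompress the stream, then extract the tar."""
    os.makedirs(dest_dir, exist_ok=True)
    with lzma.open(xz_path, "rb") as src:
        # "r|" reads the tar as a forward-only stream from the LZMA reader.
        with tarfile.open(fileobj=src, mode="r|") as tar:
            tar.extractall(dest_dir)
```

Because the two steps are independent, the compression algorithm can be swapped without touching the packaging code. Updating a single file (for example the METS metadata) would still require decompressing first, so this sketch does not address that requirement.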

Notes On Specific Formats

ZIP

info from: http://en.wikipedia.org/wiki/Zip_%28file_format%29

The maximum size for both the archive file and the individual files inside it is 4,294,967,295 bytes (2^32−1 bytes, or 4 GiB) for standard ZIP, and 18,446,744,073,709,551,615 bytes (2^64−1 bytes, or 16 EiB) for ZIP64.
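
The requirements list leaves open whether zip can hold empty directories. One way to check is sketched below with Python's standard zipfile module, which records a directory as an entry whose name ends in "/". The helper name zip_tree is hypothetical, and the result only shows what this one library does; other zip tools may drop or ignore such entries.

```python
import os
import zipfile


def zip_tree(src_dir, zip_path):
    """Zip src_dir, adding every directory explicitly so empty ones survive."""
    root = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            # Writing a directory path creates an entry ending in "/".
            zf.write(dirpath, os.path.relpath(dirpath, root))
            for name in filenames:
                path = os.path.join(dirpath, name)
                zf.write(path, os.path.relpath(path, root))
```

Calling zipfile.ZipFile(zip_path).namelist() afterwards lists the directory entries, and extractall() recreates them on disk.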

ZIP 64

The original zip format had a 4 GiB limit on several values (the uncompressed size of a file, the compressed size of a file and the total size of the archive), as well as a limit of 65,535 entries in a zip archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations, raising the size limit to 16 EiB (2^64 bytes).
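
The limits quoted above can be written out as constants, which makes it easy to estimate in advance whether an AIP would need ZIP64. The function needs_zip64 below is a hypothetical helper for illustration. It is a rough, conservative check that compares uncompressed sizes against the limits and ignores compression and header overhead, so it may report that ZIP64 is needed when a compressed archive would still fit.

```python
ZIP32_MAX_BYTES = 2**32 - 1    # 4,294,967,295 bytes (4 GiB limit)
ZIP32_MAX_ENTRIES = 2**16 - 1  # 65,535 entries
ZIP64_MAX_BYTES = 2**64 - 1    # 18,446,744,073,709,551,615 bytes (16 EiB limit)


def needs_zip64(file_sizes):
    """Return True if the given uncompressed file sizes exceed classic ZIP limits."""
    return (
        len(file_sizes) > ZIP32_MAX_ENTRIES
        or any(size > ZIP32_MAX_BYTES for size in file_sizes)
        or sum(file_sizes) > ZIP32_MAX_BYTES
    )
```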

The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista does. Likewise, some libraries, such as DotNetZip and IO::Compress::Zip in Perl, support ZIP64. Java's built-in java.util.zip supports ZIP64 as of Java 7.



info from: http://manpages.ubuntu.com/manpages/lucid/man1/zip.1.html

A companion program (unzip(1)) unpacks zip archives. The zip and unzip(1) programs can work with archives produced by PKZIP (supporting most PKZIP features up to PKZIP version 4.6), and PKZIP and PKUNZIP can work with archives produced by zip (with some exceptions, notably streamed archives, but recent changes in the zip file standard may facilitate better compatibility). zip version 3.0 is compatible with PKZIP 2.04 and also supports the Zip64 extensions of PKZIP 4.5 which allow archives as well as files to exceed the previous 2 GB limit (4 GB in some cases). zip also now supports bzip2 compression if the bzip2 library is included when zip is compiled. Note that PKUNZIP 1.10 cannot extract files produced by PKZIP 2.04 or zip 3.0. You must use PKUNZIP 2.04g or unzip 5.0p1 (or later versions) to extract them.