[nSLUG] tar + compression

George N. White III gnwiii at gmail.com
Tue Jan 25 15:04:57 AST 2011


On Tue, Jan 25, 2011 at 11:28 AM, Peter Dobcsanyi <petrus at ftml.net> wrote:

> Using tar with various compression methods.
>
> System: Ubuntu 10.10
> CPU:    Pentium(R) 4 CPU 3.20GHz
> source: Django's mercurial repo with build/ and built docs/ (i.e. mainly text)

Large repositories are tricky -- a single large compressed archive is
painful to work with.  I'd prefer an archive of smaller compressed
chunks so you can extract a portion of the archive more quickly.
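Something along these lines (the directory tree here is a made-up
stand-in, since I don't have the Django repo at hand): one small gzipped
archive per top-level subdirectory instead of one big archive, so a
single piece can be pulled out without decompressing everything.

```shell
set -e
# Stand-in for the repository from the post ("django"):
mkdir -p repo/docs repo/src
echo "hello" > repo/docs/a.txt
echo "world" > repo/src/b.txt

# One compressed archive per top-level subdirectory:
for d in repo/*/; do
    name=$(basename "$d")
    tar czf "/tmp/$name.tar.gz" -C repo "$name"
done
ls -l /tmp/docs.tar.gz /tmp/src.tar.gz
```

Then restoring just docs/ is a single small `tar xzf /tmp/docs.tar.gz`
rather than a walk through the whole archive.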


> tar commands with running times:
>
>    tar cf /tmp/d.tar django   0.10s user 0.90s system 29% cpu 3.404 total
>
> tar czf /tmp/d.tar.gz django   16.95s user 1.78s system 101% cpu 18.387 total
> tar cjf /tmp/d.tar.bz2 django  81.88s user 2.26s system 100% cpu 1:23.57 total
> tar cJf /tmp/d.tar.xz django   166.25s user 3.80s system 100% cpu 2:48.91 total

What about memory requirements?  There is a parallel xz implementation
that should help when running on newer multi-core CPUs.
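As a rough sketch: later xz releases (5.2 and up, so newer than the
tools discussed here) also grew a built-in --threads option, which gets
you the same effect without a separate tool.  The 1 MB input below is
just a stand-in for the real tarball.

```shell
set -e
# Stand-in input file (in place of the real d.tar):
head -c 1000000 /dev/zero > /tmp/demo.tar

# -T0 uses one worker thread per core; -k keeps the input; -f overwrites.
xz -T0 -k -f /tmp/demo.tar

# Verify the result decompresses cleanly:
xz -t /tmp/demo.tar.xz
```

Note that threaded xz splits the input into blocks, so the speedup only
shows up on inputs considerably larger than this toy file.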

> results:
>
> -rw------- 1 peter peter 184176640 2011-01-25 10:23 /tmp/d.tar         100.00%
> -rw------- 1 peter peter 128148318 2011-01-25 10:21 /tmp/d.tar.gz       69.58%
> -rw------- 1 peter peter 115184718 2011-01-25 10:20 /tmp/d.tar.bz2      62.54%
> -rw------- 1 peter peter  93680468 2011-01-25 10:17 /tmp/d.tar.xz       50.86%
>
> My conclusion:  gzip is a good compromise between time and compression ratio.
>                YMMV

The tradeoffs depend on the application.  If you plan to store many
compressed files for a long time, the smaller size can represent
significant cost savings in storage capacity and transfer times, so the
smallest size wins.
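If you want to feel out that time/size compromise on your own data, a
quick test along these lines works (the repetitive sample input is made
up; substitute your real files):

```shell
set -e
# Made-up, highly compressible sample input:
yes "some repetitive text" | head -c 2000000 > /tmp/sample.txt

# Compress the same input at a fast, default, and best gzip level:
for lvl in 1 6 9; do
    gzip -c -"$lvl" /tmp/sample.txt > "/tmp/sample.$lvl.gz"
done
ls -l /tmp/sample.*.gz
```

Wrap each gzip call in `time` to see where the extra seconds buy you
meaningful bytes -- on text, -9 is smallest but -1 is much faster.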

-- 
George N. White III <aa056 at chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia
