Package lzw

A stream-friendly, simple compression library built around iterators. See compress and decompress for the easiest way to get started.

Modeled after the TIFF implementation of LZW, as described at http://www.fileformat.info/format/tiff/corion-lzw.htm

In an even-nuttier-shell, lzw compresses input bytes with integer codes. Starting with codes 0-255, which code for themselves, plus two control codes, we work our way through a stream of bytes. When we encounter a pair of codes c1, c2, we add another entry to our code table, using the lowest available code and the value value(c1) + value(c2)[0] (that is, the bytes of c1 followed by the first byte of c2).
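The table-building rule above can be sketched in a few lines. This is a minimal illustration, not the package's Encoder: the name lzw_encode_sketch is hypothetical, the control codes are reserved but never emitted, and the table is allowed to grow without the reset a real 12-bit implementation would need.

```python
CLEAR_CODE = 256
END_OF_INFO_CODE = 257


def lzw_encode_sketch(data):
    """Yield integer LZW codes for a bytes object (illustrative only)."""
    # Codes 0-255 code for themselves; 256 and 257 are reserved control codes.
    table = {bytes([i]): i for i in range(256)}
    next_code = END_OF_INFO_CODE + 1  # lowest available code
    prefix = b""
    for i in range(len(data)):
        byte = data[i:i + 1]
        candidate = prefix + byte
        if candidate in table:
            # Keep extending the match while the table already knows it.
            prefix = candidate
        else:
            # Emit the longest known match, then record match + next byte.
            yield table[prefix]
            table[candidate] = next_code
            next_code += 1
            prefix = byte
    if prefix:
        yield table[prefix]


print(list(lzw_encode_sketch(b"ABABAB")))  # [65, 66, 258, 258]
```

Six input bytes become four codes here because "AB" is added to the table as code 258 after its first appearance and is then reused twice.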

Of course, there are details :)

The Details

Our control codes are CLEAR_CODE (256) and END_OF_INFO_CODE (257).

When dealing with bytes, codes are emitted as variable length bit strings packed into the stream of bytes.

Code points are written with varying width (by default between DEFAULT_MIN_BITS = 9 and DEFAULT_MAX_BITS = 12 bits).

Code points are stored with their MSB in the most significant bit available in the output byte.
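To make the MSB-first packing concrete, here is a small sketch that packs codes at a single fixed width. It is an assumption-laden simplification: pack_codes_sketch is a hypothetical helper, not BitPacker, and it does not model how or when the package widens codes from 9 towards 12 bits.

```python
def pack_codes_sketch(codes, width=9):
    """Pack integer codes into bytes, MSB first, at one fixed bit width."""
    bits = []
    for code in codes:
        # The most significant bit of each code is written first.
        for shift in range(width - 1, -1, -1):
            bits.append((code >> shift) & 1)
    # Zero-pad the tail so the total bit count is a multiple of 8.
    bits.extend([0] * (-len(bits) % 8))
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Two 9-bit codes occupy 18 bits, so the result is three bytes with
# six padding bits at the end.
print(pack_codes_sketch([256, 65]))  # b'\x80\x10@'
```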

>>> import lzw
>>>
>>> mybytes = lzw.readbytes("README.txt")
>>> lessbytes = lzw.compress(mybytes)
>>> newbytes = b"".join(lzw.decompress(lessbytes))
>>> oldbytes = b"".join(lzw.readbytes("README.txt"))
>>> oldbytes == newbytes
True

Version: 0.01

Author: Joe Bowers

License: MIT License

Classes
  ByteEncoder
Takes a stream of uncompressed bytes and produces a stream of compressed bytes, usable by ByteDecoder.
  ByteDecoder
Decodes a stream of compressed bytes by combining bit-unpacking with codepoint interpretation; suitable for use with bytes generated by ByteEncoder.
  BitPacker
Translates a stream of lzw codepoints into a variable width packed stream of bytes, for use by BitUnpacker.
  BitUnpacker
An adaptive-width bit unpacker, intended to decode streams written by BitPacker into integer codepoints.
  Decoder
Uncompresses a stream of lzw code points, as created by Encoder.
  Encoder
Given an iterator of bytes, returns an iterator of integer codepoints, suitable for use by Decoder.
  PagingEncoder
UNTESTED.
  PagingDecoder
UNTESTED.
Functions
 
compress(plaintext_bytes)
Given an iterable of bytes, returns a (hopefully shorter) iterable of bytes that you can store in a file or pass over the network or what-have-you, and later use to get back your original bytes with decompress.
 
decompress(compressed_bytes)
Given an iterable of bytes that were the result of a call to compress, returns an iterator over the uncompressed bytes.
 
unpackbyte(b)
Given a one-byte long byte string, returns an integer.
 
filebytes(fileobj, buffersize=1024)
Convenience for iterating over the bytes in a file.
 
readbytes(filename, buffersize=1024)
Opens a file named by filename and iterates over the filebytes found therein.
 
writebytes(filename, bytesource)
Convenience for emitting the bytes we generate to a file.
 
inttobits(anint, width=None)
Produces an array of booleans representing the given argument as an unsigned integer, MSB first.
 
intfrombits(bits)
Given a list of boolean values, interprets them as a binary encoded, MSB-first unsigned integer (with True == 1 and False == 0) and returns the result.
 
bytestobits(bytesource)
Breaks a given iterable of bytes into an iterable of boolean values representing those bytes as unsigned integers.
 
bitstobytes(bits)
Interprets an indexable list of booleans as bits, MSB first, to be packed into a list of integers from 0 to 255, MSB first, with LSBs zero-padded.
Variables
  __status__ = 'Development'
  __email__ = 'joerbowers@gmail.com'
  __url__ = 'http://www.joe-bowers.com/static/lzw'
  CLEAR_CODE = 256
  END_OF_INFO_CODE = 257
  DEFAULT_MIN_BITS = 9
  DEFAULT_MAX_BITS = 12
  __package__ = 'lzw'
Function Details

compress(plaintext_bytes)

Given an iterable of bytes, returns a (hopefully shorter) iterable of bytes that you can store in a file or pass over the network or what-have-you, and later use to get back your original bytes with decompress. This is the best place to start using this module.

unpackbyte(b)

Given a one-byte long byte string, returns an integer. Equivalent to struct.unpack("B", b)[0].
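As a quick illustration of the struct equivalence, the sketch below uses only the standard library. The name unpackbyte_sketch is hypothetical. Note that struct.unpack always returns a tuple, so the single element has to be taken out to get a plain integer.

```python
import struct


def unpackbyte_sketch(b):
    """Return the integer value of a one-byte byte string."""
    # struct.unpack returns a one-element tuple for the "B" format.
    (value,) = struct.unpack("B", b)
    return value


print(unpackbyte_sketch(b"A"))  # 65
```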

filebytes(fileobj, buffersize=1024)

Convenience for iterating over the bytes in a file. Given a file-like object (with a read(int) method), returns an iterator over the bytes of that file.
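A buffered byte iterator of this kind might look like the sketch below. It is not the package's code: filebytes_sketch is a hypothetical name, and yielding one-byte bytes objects (rather than integers) is an assumption about what callers expect.

```python
import io


def filebytes_sketch(fileobj, buffersize=1024):
    """Yield the contents of a file-like object one byte at a time."""
    while True:
        # Read in chunks so large files are never loaded all at once.
        chunk = fileobj.read(buffersize)
        if not chunk:
            break
        for i in range(len(chunk)):
            # Slicing keeps each item a bytes object of length one.
            yield chunk[i:i + 1]


print(list(filebytes_sketch(io.BytesIO(b"abc"), buffersize=2)))
# [b'a', b'b', b'c']
```

Any object with a read(int) method works, which is why an in-memory io.BytesIO can stand in for a real file here.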

readbytes(filename, buffersize=1024)

Opens a file named by filename and iterates over the filebytes found therein. Will close the file when the bytes run out.

writebytes(filename, bytesource)

Convenience for emitting the bytes we generate to a file. Given a filename, opens and truncates the file, dumps the bytes from bytesource into it, and closes it.

inttobits(anint, width=None)

Produces an array of booleans representing the given argument as an unsigned integer, MSB first. If width is given, pads the MSBs to the given width (but will NOT truncate overflowing results).

>>> import lzw
>>> lzw.inttobits(304, width=16)
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]

intfrombits(bits)

Given a list of boolean values, interprets them as a binary encoded, MSB-first unsigned integer (with True == 1 and False == 0) and returns the result.

>>> import lzw
>>> lzw.intfrombits([ 1, 0, 0, 1, 1, 0, 0, 0, 0 ])
304
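The two conversions are inverses of each other, which the following sketch tries to make explicit. Both functions are hypothetical re-implementations written for illustration and may differ from the package in edge cases (for instance, how zero is represented when no width is given).

```python
def inttobits_sketch(anint, width=None):
    """Return the bits of a non-negative integer, MSB first."""
    bits = []
    while anint:
        bits.append(anint & 1)  # collected LSB first, reversed below
        anint >>= 1
    if width is not None:
        # Pads with zeros only; a negative count adds nothing, so
        # values wider than `width` are never truncated.
        bits.extend([0] * (width - len(bits)))
    bits.reverse()
    return bits


def intfrombits_sketch(bits):
    """Interpret truthy/falsy values as an MSB-first unsigned integer."""
    result = 0
    for bit in bits:
        result = (result << 1) | (1 if bit else 0)
    return result


print(inttobits_sketch(304, width=16))
# [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
print(intfrombits_sketch(inttobits_sketch(304, width=16)))  # 304
print(len(inttobits_sketch(304, width=4)))  # 9
```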

bytestobits(bytesource)

Breaks a given iterable of bytes into an iterable of boolean values representing those bytes as unsigned integers.

>>> import lzw
>>> [ x for x in lzw.bytestobits(b"\x01\x30") ]
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]

bitstobytes(bits)

Interprets an indexable list of booleans as bits, MSB first, to be packed into a list of integers from 0 to 255, MSB first, with LSBs zero-padded. Note that this padding behavior means that a round-trip through bytestobits(bitstobytes(x)) may not yield what you expect if len(x) % 8 != 0.

Does *NOT* pack the returned values into a bytearray or the like.

>>> import lzw
>>> lzw.bitstobytes([0, 0, 0, 0, 0, 0, 0, 0, "Yes, I'm True"]) == [ 0x00, 0x80 ]
True
>>> lzw.bitstobytes([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]) == [ 0x01, 0x30 ]
True
True
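The padding caveat is easiest to see with a bit list whose length is not a multiple of 8. The sketch below uses hypothetical stand-ins (bitstobytes_sketch and bytestobits_sketch) rather than the package's own functions, under the assumption that padding is applied to the final partial byte only.

```python
def bitstobytes_sketch(bits):
    """Pack MSB-first bits into a list of integers from 0 to 255."""
    result = []
    for start in range(0, len(bits), 8):
        chunk = list(bits[start:start + 8])
        # Zero-pad the least significant end of a short final chunk.
        chunk.extend([0] * (8 - len(chunk)))
        value = 0
        for bit in chunk:
            value = (value << 1) | (1 if bit else 0)
        result.append(value)
    return result


def bytestobits_sketch(values):
    """Expand integers from 0 to 255 into MSB-first bits."""
    return [(v >> shift) & 1 for v in values for shift in range(7, -1, -1)]


nine_bits = [0, 0, 0, 0, 0, 0, 0, 0, 1]
packed = bitstobytes_sketch(nine_bits)
unpacked = bytestobits_sketch(packed)
print(packed)         # [0, 128]
print(len(unpacked))  # 16, not 9: seven padding zeros came along
```

A caller that needs the exact original bit count has to carry it separately, since the padding zeros are indistinguishable from real data.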