An iterator over all lines of the given files. Like the fileinput module in the standard library, but faster and written in C.
Unlike fileinput, input files are not completely read into memory; it can handle files of any size.
In addition, a replacement for the standard fileinput.FileInput legacy class is provided.
This package has no external dependencies. It has been tested in Python 2.6; support for Python 3.1 is still experimental.
Extract the source distribution in some temporary directory, and execute:
    python setup.py build
    python setup.py install
It uses the very same test suite as the standard fileinput module. To run the tests:
    cd test
    python test_multifileiter.py
This package implements a basic multiple file iterator written in C, a thin Python wrapper on top of it, and a replacement for the standard fileinput.FileInput class.
Example:
    from multifileiter.fileinput import FileInput

    fi = FileInput(list_of_file_names)
    # iterate over every line in every file
    for line in fi:
        process_line(line)
If you want to rewrite the input files with new content:
    fi = FileInput(list_of_file_names, inplace=True)
    for line in fi:
        new_line = process_line(line)
        fi.output.write(new_line)
The output attribute points to the file currently being written. If you want the legacy FileInput behavior (anything printed or written to sys.stdout goes to the output file), use replace_stdout=True.
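For comparison, the legacy behavior referred to here is what the standard library's fileinput does when inplace is true: sys.stdout is redirected to the file being rewritten. The sketch below uses only the standard library (Python 3 syntax) and does not exercise multifileiter itself; the helper name upper_case_in_place is made up for illustration.

```python
import fileinput


def upper_case_in_place(paths):
    """Rewrite each file in upper case, relying on stdout redirection."""
    # Standard library FileInput, not the multifileiter replacement.
    with fileinput.FileInput(files=paths, inplace=True) as reader:
        for line in reader:
            # While inplace is active, print() writes to the file
            # being rewritten instead of the terminal.
            print(line.upper(), end="")
```

With replace_stdout left at its default of False, the same loop would presumably write through fi.output.write() instead of print().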
Class LegacyFileInput implements the same interface as the standard library's FileInput. You may monkey-patch the standard fileinput module to gain the speed of the new module without modifying any legacy code. Just execute this at the start of your program:
    # monkey-patch stdlib's fileinput
    import multifileiter.fileinput
    import fileinput
    fileinput.FileInput = multifileiter.fileinput.LegacyFileInput

.. note::

   In addition to the :class:`FileInput` class, the :mod:`fileinput` standard module exposes many global functions. Using those global functions with this version of :class:`FileInput` may work, or may not. I don't like their global nature; they were never tested with this module, and using them is not supported. Use at your own risk.
class multifileiter.fileinput.MultiFileIter (files=None, mode="r")
files is any iterable yielding either strings or file-like objects. The iterable is consumed lazily. Strings are considered file names and the corresponding file is opened using the mode parameter. The string '-' is special-cased and represents sys.stdin. Other objects are assumed to be file-like objects; only their next(), name() and close() methods are called (the latter two being optional).
MultiFileIter implements the iterator protocol. next() returns each line from its input files.
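The behavior described above can be approximated in pure Python. This is only an illustrative sketch in Python 3 syntax, not the package's C implementation: the function name iter_lines is hypothetical, and closing only the files the function opened itself is a simplifying assumption.

```python
import sys


def iter_lines(files, mode="r"):
    """Yield every line from every item in `files`, one file at a time."""
    for item in files:  # the iterable is consumed lazily, item by item
        if isinstance(item, str):
            if item == "-":
                # '-' stands for standard input; it is never closed here.
                stream, must_close = sys.stdin, False
            else:
                # Any other string is treated as a file name.
                stream, must_close = open(item, mode), True
        else:
            # Anything else is assumed to be file-like (assumption of
            # this sketch: the caller remains responsible for closing it).
            stream, must_close = item, False
        try:
            for line in stream:
                yield line
        finally:
            if must_close:
                stream.close()
```

Because it is a generator, only one line is held at a time and the next file is not opened until the previous one is exhausted.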
MultiFileIter objects have these methods:
class multifileiter.fileinput.FileInput (files=None, inplace=0, backup="", mode="r", openhook=None, replace_stdout=False)
FileInput extends MultiFileIter, adding support for writing files.
files and mode are passed to the base class.
When inplace is true, each input file is renamed, and a new file with the original name is created for writing (it may be accessed through the output property). In addition, if replace_stdout is true, standard output (sys.stdout) is redirected to that file too.
backup is the extension added to the original file names; '.bak' is used if not specified.
openhook is a function used instead of the builtin open function to open the files; it must take two positional arguments, filename and mode.
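As an illustration of the two-argument hook signature, here is a hook that forces UTF-8 decoding. The sketch is written in Python 3 and demonstrated against the standard library's fileinput.FileInput, which documents the same (filename, mode) convention; that multifileiter calls the hook in exactly the same way is assumed from the description above, not verified. The names open_utf8 and read_all are hypothetical.

```python
import fileinput
import io


def open_utf8(filename, mode):
    """Open hook: takes filename and mode, returns an open file object."""
    return io.open(filename, mode, encoding="utf-8")


def read_all(paths, hook):
    """Collect every line from `paths`, opening each file through `hook`."""
    # Standard library FileInput is used here as a stand-in.
    reader = fileinput.FileInput(files=paths, openhook=hook)
    try:
        return [line for line in reader]
    finally:
        reader.close()
```

A call such as read_all(["notes.txt"], open_utf8) (with a file name of your own) returns the decoded lines as a list.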
FileInput objects have these attributes:
output
The file currently being written (or None when inplace is false)
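The rename-then-rewrite scheme used for inplace can be sketched with the standard library alone. This is a simplified illustration in Python 3, not the package's code: the function name rewrite_in_place is hypothetical, and keeping the backup file afterwards is an assumption of the sketch.

```python
import os


def rewrite_in_place(path, transform, backup=".bak"):
    """Rewrite `path` line by line, keeping the original as path + backup."""
    backup_path = path + backup
    if os.path.exists(backup_path):
        # os.rename does not overwrite an existing file on Windows.
        os.remove(backup_path)
    # The original file is renamed; a new file takes its name.
    os.rename(path, backup_path)
    with open(backup_path, "r") as source, open(path, "w") as output:
        for line in source:
            # `output` plays the role of the output attribute.
            output.write(transform(line))
    return backup_path
```

For example, rewrite_in_place(name, str.upper) leaves an upper-cased file under the original name and the untouched original next to it with the '.bak' extension.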
The issue tracker is located at http://code.google.com/p/multifileiter/issues
Alternatively you may contact the author: Gabriel A. Genellina <ggenellina@yahoo.com.ar>
This package is Copyright 2010 Gabriel A. Genellina, and licensed under the MIT license: http://opensource.org/licenses/mit-license.php
See license.txt for details.