Utility functions for MRJob that have no external dependencies.
boto’s file iterator splits data by buffer size rather than by newline. This wrapper reassembles the chunks into lines.
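A minimal sketch of the regrouping, assuming the input is an iterable of byte chunks. The name `buffer_chunks_to_lines` is hypothetical. Chunks accumulate in a buffer, every complete line is yielded, and a trailing partial line is yielded at the end.

```python
from typing import Iterable, Iterator


def buffer_chunks_to_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Regroup arbitrary byte chunks into newline-terminated lines.

    Hypothetical sketch: the last line is yielded even if it has no
    trailing newline.
    """
    buf = b''
    for chunk in chunks:
        buf += chunk
        # Emit every complete line currently in the buffer.
        start = 0
        while True:
            end = buf.find(b'\n', start)
            if end == -1:
                break
            yield buf[start:end + 1]
            start = end + 1
        # Keep only the unfinished tail for the next chunk.
        buf = buf[start:]
    if buf:
        yield buf
```

For example, `list(buffer_chunks_to_lines([b'ab', b'c\nde\n', b'f']))` returns `[b'abc\n', b'de\n', b'f']`.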
Return a stream of the uncompressed data read from a bz2-compressed file object.
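One way to do this incrementally with the standard library's `bz2.BZ2Decompressor`. This is a sketch: the `bufsize` parameter is an assumption, and concatenated bz2 streams are handled by starting a fresh decompressor each time one stream ends.

```python
import bz2
from typing import BinaryIO, Iterator


def bunzip2_stream(fileobj: BinaryIO, bufsize: int = 1 << 20) -> Iterator[bytes]:
    """Yield decompressed chunks of a bz2-compressed file object."""
    decomp = bz2.BZ2Decompressor()
    while True:
        data = fileobj.read(bufsize)
        if not data:
            return
        while data:
            out = decomp.decompress(data)
            if out:
                yield out
            if decomp.eof:
                # Leftover bytes belong to the next concatenated stream.
                data = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
            else:
                data = b''
```

When a file-like object is more convenient than a generator, `bz2.open(fileobj)` from the standard library is a simpler alternative.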
Build a command line that works in a shell.
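A sketch, assuming a POSIX shell (not `cmd.exe`) and a hypothetical `cmd_line(args)` signature: each argument is quoted with `shlex.quote()`, so the shell splits the string back into the original arguments.

```python
import shlex
from typing import Iterable


def cmd_line(args: Iterable[object]) -> str:
    """Join args into one string that a POSIX shell splits back into args."""
    # shlex.quote() leaves safe tokens alone and single-quotes everything else.
    return ' '.join(shlex.quote(str(arg)) for arg in args)
```

For example, `cmd_line(['ls', '-l', 'my file'])` returns `"ls -l 'my file'"`.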
Resolve ~ (home dir) and environment variables in path.
If path is None, return None.
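A minimal sketch using only `os.path`. Expanding variables before `~` (an ordering assumption) means a variable whose value starts with `~` is also resolved.

```python
import os
from typing import Optional


def expand_path(path: Optional[str]) -> Optional[str]:
    """Expand environment variables, then ~, in path; pass None through."""
    if path is None:
        return None
    return os.path.expanduser(os.path.expandvars(path))
```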
Get the name of the directory the tar at archive_path extracts into.
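A sketch that reads the member names with `tarfile` and returns their common top-level component. The name `archive_top_dir` is hypothetical, and the sketch assumes the archive has exactly one top-level entry; otherwise it raises `ValueError`.

```python
import tarfile


def archive_top_dir(archive_path: str) -> str:
    """Return the single top-level name that a tarball extracts into."""
    tops = set()
    # Mode 'r' (the default) detects gzip/bzip2 compression automatically.
    with tarfile.open(archive_path) as tar:
        for name in tar.getnames():
            parts = [p for p in name.split('/') if p not in ('', '.')]
            if parts:
                tops.add(parts[0])
    if len(tops) != 1:
        raise ValueError('%s does not extract into a single directory' % archive_path)
    return tops.pop()
```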
Return the file extension, including the leading dot. A multi-part extension is returned whole:
>>> file_ext('foo.tar.gz')
'.tar.gz'
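A sketch of `file_ext()` that reproduces the doctest above. Two behaviors are assumptions, not documented here: only the base name is inspected, and leading dots of hidden files such as `.bashrc` do not start an extension.

```python
import os


def file_ext(filename: str) -> str:
    """Return everything from the first '.' of the base name, or ''."""
    base = os.path.basename(filename)
    # Assumption: leading dots mark hidden files, not extensions.
    stripped = base.lstrip('.')
    dot = stripped.find('.')
    return '' if dot == -1 else stripped[dot:]
```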
Generate a hash (currently MD5) of the repr of the object.
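A sketch with a hypothetical name, `hash_object`. Because only `repr()` is hashed, objects with equal reprs collide by design. Dicts with the same items in a different insertion order have different reprs, so they hash differently. MD5 serves here as a fingerprint, not for security.

```python
import hashlib


def hash_object(obj: object) -> str:
    """Return the hex MD5 digest of repr(obj)."""
    return hashlib.md5(repr(obj).encode('utf-8')).hexdigest()
```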
Set up a null handler for the given logger, to suppress “no handlers could be found” warnings.
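A sketch using the standard `logging.NullHandler`, assuming the logger is identified by name (`None` selects the root logger). On Python 3.2+, a record with no handler anywhere in the hierarchy goes to `logging.lastResort` (stderr) instead of triggering that message, and attaching a `NullHandler` suppresses that as well.

```python
import logging
from typing import Optional


def log_to_null(name: Optional[str] = None) -> None:
    """Attach a NullHandler to the named logger (root logger if None)."""
    logging.getLogger(name).addHandler(logging.NullHandler())
```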
Set up logging.
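The parameter table for this function did not survive. The sketch below shows one common way to set up logging to a stream. Every parameter here (`name`, `stream`, `fmt`, `level`, `debug`) and the default format string are hypothetical.

```python
import logging
from typing import Optional, TextIO


def log_to_stream(name: Optional[str] = None, stream: Optional[TextIO] = None,
                  fmt: Optional[str] = None, level: Optional[int] = None,
                  debug: bool = False) -> logging.Logger:
    """Attach a StreamHandler to a logger; all parameters are hypothetical."""
    if level is None:
        level = logging.DEBUG if debug else logging.INFO
    if fmt is None:
        fmt = '%(levelname)s %(name)s: %(message)s'
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(fmt))
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```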
Duplicate the behavior of OptionParser, but capture the strings required to reproduce the same values. See optparse.py, lines 1414-1548 (Python 2.6.5).
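The core idea can be sketched by walking the argument list and asking the parser which option each string belongs to, using the public `OptionParser.get_option()` method. `save_option_strings` is a hypothetical name. This simplified version skips three forms that optparse itself accepts: clustered short options (`-abc`), values glued to short options (`-ofoo`), and abbreviated long options.

```python
import optparse
from collections import defaultdict
from typing import Dict, List, Sequence


def save_option_strings(parser: optparse.OptionParser,
                        args: Sequence[str]) -> Dict[str, List[str]]:
    """Map each option dest to the raw argument strings that set it."""
    saved: Dict[str, List[str]] = defaultdict(list)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == '--':
            break  # everything after '--' is positional
        if not arg.startswith('-') or arg == '-':
            i += 1  # positional argument
            continue
        if arg.startswith('--'):
            opt_str, has_value, _ = arg.partition('=')  # '--opt=value'
        else:
            opt_str, has_value = arg, ''
        option = parser.get_option(opt_str)
        if option is None or option.dest is None:
            i += 1  # unsupported form, or an option without a dest
            continue
        # Value-taking options consume nargs following strings,
        # unless the value was attached with '='.
        n_values = option.nargs if option.takes_value() and not has_value else 0
        saved[option.dest].extend(args[i:i + 1 + n_values])
        i += 1 + n_values
    return dict(saved)
```

Concatenating the saved strings and parsing them again should yield the same option values as the original arguments.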
Given a dictionary mapping OptionGroup and OptionParser objects to lists of strings representing option dests, populate the objects with options from indexed_options (generated by scrape_options_and_index_by_dest()) in alphabetical order by long option name. This function primarily exists to serve scrape_options_into_new_groups().
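A sketch of the population step. The name `populate_option_groups` is hypothetical. Options are sorted by `Option.get_opt_string()`, which returns the first long option string when there is one. Passing an existing `Option` instance to `add_option()` registers that same object with the new container.

```python
import optparse
from typing import Dict, List, Union

Container = Union[optparse.OptionParser, optparse.OptionGroup]


def populate_option_groups(
    assignments: Dict[Container, List[str]],
    indexed_options: Dict[str, List[optparse.Option]],
) -> None:
    """Add each container's assigned options, sorted by option string."""
    for container, dests in assignments.items():
        options = [opt for dest in dests for opt in indexed_options.get(dest, [])]
        # get_opt_string() prefers the first long option, e.g. '--verbose'.
        for option in sorted(options, key=lambda opt: opt.get_opt_string()):
            container.add_option(option)
```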
Reads a file.
Stream input the way Hadoop would.
You can redefine stdin for ease of testing; stdin can be any iterable that yields lines (e.g. a list).
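A sketch of Hadoop-style input streaming. What "the way Hadoop would" covers is not spelled out above, so the following behaviors are assumptions: `'-'` means stdin, globs are expanded, directories are read recursively, and `.gz`/`.bz2` files are decompressed. `read_input` and `_read_path` are hypothetical names.

```python
import bz2
import glob
import gzip
import os
import sys
from typing import Iterable, Iterator, Optional


def _read_path(path: str) -> Iterator[bytes]:
    """Yield lines from one concrete file, or from a directory recursively."""
    if os.path.isdir(path):
        for name in sorted(os.listdir(path)):
            yield from _read_path(os.path.join(path, name))
    elif path.endswith('.gz'):
        with gzip.open(path, 'rb') as f:
            yield from f
    elif path.endswith('.bz2'):
        with bz2.open(path, 'rb') as f:
            yield from f
    else:
        with open(path, 'rb') as f:
            yield from f


def read_input(path: str, stdin: Optional[Iterable] = None) -> Iterator:
    """Yield input lines for path; '-' reads stdin (any iterable of lines)."""
    if path == '-':
        yield from (sys.stdin.buffer if stdin is None else stdin)
        return
    matches = sorted(glob.glob(path))
    if not matches:
        raise IOError('No such file or directory: %r' % path)
    for match in matches:
        yield from _read_path(match)
```

Passing a list as `stdin` lets tests feed input without touching the real standard input.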
Like eval, but with nearly everything in the environment blanked out, so that it’s difficult to cause mischief.
globals and locals are optional dictionaries mapping names to values for those names (just like in eval()).
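A sketch that restricts `__builtins__` to a small whitelist. The whitelist contents are an assumption. As the description says, this makes mischief difficult, not impossible: attribute tricks such as walking `__subclasses__()` can still escape, so this is not a sandbox for untrusted input.

```python
from typing import Any, Mapping, Optional

# Assumed whitelist; anything not listed (open, __import__, ...) is hidden.
SAFE_BUILTINS = {'True': True, 'False': False, 'None': None,
                 'set': set, 'range': range, 'len': len}


def safe_eval(expr: str, globals: Optional[Mapping[str, Any]] = None,
              locals: Optional[Mapping[str, Any]] = None) -> Any:
    """eval() with builtins restricted to SAFE_BUILTINS."""
    env = {'__builtins__': dict(SAFE_BUILTINS)}
    if globals:
        env.update(globals)
    return eval(expr, env, locals)
```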
Context manager that saves os.environ and restores it when the block exits.
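A sketch built on `contextlib.contextmanager`, with the hypothetical name `save_environ`. The `finally` clause restores the snapshot even when the block raises. Variables added inside the block are removed, and changed ones revert.

```python
import contextlib
import os
from typing import Iterator


@contextlib.contextmanager
def save_environ() -> Iterator[None]:
    """Snapshot os.environ on entry and restore it on exit, even on error."""
    saved = os.environ.copy()
    try:
        yield
    finally:
        os.environ.clear()
        os.environ.update(saved)
```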
Scrapes optparse options from OptionParser and OptionGroup objects and builds a dictionary of dest_var: [option1, option2, ...]. This function primarily exists to serve scrape_options_into_new_groups().
An example return value: {'verbose': [<verbose_on_option>, <verbose_off_option>], 'files': [<file_append_option>]}
| Parameters: | parsers_and_groups (OptionParser or OptionGroup) – Parsers and groups to scrape option objects from |
|---|---|
| Returns: | dict of the form {dest_var: [option1, option2, ...], ...} |
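A sketch of how scrape_options_and_index_by_dest() might work, using the public `option_list` and `option_groups` attributes of optparse containers. Also scraping a parser's own groups is an assumption. Options without a dest, such as the automatic `--help`, are skipped.

```python
import optparse
from collections import defaultdict
from typing import Dict, List, Union


def scrape_options_and_index_by_dest(
        *parsers_and_groups: Union[optparse.OptionParser, optparse.OptionGroup]
) -> Dict[str, List[optparse.Option]]:
    """Index every option in the given parsers/groups by its dest."""
    by_dest: Dict[str, List[optparse.Option]] = defaultdict(list)
    for container in parsers_and_groups:
        options = list(container.option_list)
        # Assumption: a parser's own groups are scraped along with it.
        for group in getattr(container, 'option_groups', []):
            options.extend(group.option_list)
        for option in options:
            if option.dest is not None:  # e.g. --help has no dest
                by_dest[option.dest].append(option)
    return dict(by_dest)
```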
Put the options from the OptionParser and OptionGroup objects in source_groups into the keys of assignments, placing each option according to the dest names listed in the values of assignments. An example:
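A self-contained sketch of scrape_options_into_new_groups(): index the source options by dest, then add each assigned dest's options to its target container, sorted by option string. Returning the populated containers is an assumption. The options are moved as the same `Option` objects, and the new parser can then parse them.

```python
import optparse
from collections import defaultdict
from typing import Dict, Iterable, List, Union

Container = Union[optparse.OptionParser, optparse.OptionGroup]


def scrape_options_into_new_groups(
    source_groups: Iterable[Container],
    assignments: Dict[Container, List[str]],
) -> List[Container]:
    """Copy options from source_groups into the (empty) keys of assignments."""
    # Index the source options by dest.
    by_dest: Dict[str, List[optparse.Option]] = defaultdict(list)
    for container in source_groups:
        options = list(container.option_list)
        for group in getattr(container, 'option_groups', []):
            options.extend(group.option_list)
        for option in options:
            if option.dest is not None:
                by_dest[option.dest].append(option)
    # Add each assigned dest's options to its new container.
    for target, dests in assignments.items():
        chosen = [opt for dest in dests for opt in by_dest.get(dest, [])]
        for option in sorted(chosen, key=lambda opt: opt.get_opt_string()):
            target.add_option(option)
    return list(assignments)
```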
Return the given datetime.timedelta, without microseconds.
Useful for printing datetime.timedelta objects.
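A one-line sketch with a hypothetical name: rebuild the timedelta from its `days` and `seconds` fields only. Because `timedelta` normalizes negative values so that `microseconds` is never negative, dropping it rounds toward negative infinity. For example, minus one microsecond becomes minus one second.

```python
import datetime


def strip_microseconds(delta: datetime.timedelta) -> datetime.timedelta:
    """Return delta with its microseconds component dropped."""
    return datetime.timedelta(days=delta.days, seconds=delta.seconds)
```

For example, `str(datetime.timedelta(seconds=3725, microseconds=500))` is `'1:02:05.000500'`, while the stripped value prints as `'1:02:05'`.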
Tar and gzip the given dir to a tarball at out_path.
If we encounter symlinks, include the actual file, not the symlink.
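A sketch using `tarfile.open(..., dereference=True)`, which stores the files that symlinks point to rather than the links themselves. Storing entries relative to the directory, so the tarball has no enclosing top-level directory, is an assumption about layout.

```python
import os
import tarfile


def tar_and_gzip(dir_path: str, out_path: str) -> None:
    """Write dir_path's contents to a gzipped tarball at out_path."""
    # dereference=True follows symlinks and archives their targets.
    with tarfile.open(out_path, 'w:gz', dereference=True) as tar:
        for name in sorted(os.listdir(dir_path)):
            # Store paths relative to dir_path (an assumed layout choice).
            tar.add(os.path.join(dir_path, name), arcname=name)
```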
Extract the contents of a tar or zip file at archive_path into the directory dest.
dest will be created if it doesn’t already exist.
tar files can be gzip compressed, bzip2 compressed, or uncompressed. Files within zip files can be deflated or stored.
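A sketch with the hypothetical name `unarchive`, covering the formats listed above. `tarfile` in mode `'r'` detects gzip and bzip2 compression on its own, and `zipfile` reads both deflated and stored members. Where the running Python supports tarfile extraction filters, the sketch uses `filter='data'` to reject unsafe members such as absolute paths.

```python
import os
import tarfile
import zipfile


def unarchive(archive_path: str, dest: str) -> None:
    """Extract a tar (plain, gzip or bzip2) or zip archive into dest."""
    os.makedirs(dest, exist_ok=True)  # create dest if it doesn't exist
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as tar:
            if hasattr(tarfile, 'data_filter'):
                # Extraction filters exist on 3.12+ and recent backports.
                tar.extractall(dest, filter='data')
            else:
                tar.extractall(dest)
    else:
        raise ValueError('%s is neither a tar nor a zip file' % archive_path)
```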