mrjob.compat - Hadoop version compatibility

Utility functions for compatibility with different versions of Hadoop.

mrjob.compat.get_jobconf_value(variable, default=None)

Get the value of a jobconf variable from the runtime environment.

For example, an MRJob could use get_jobconf_value('map.input.file') to get the name of the file a mapper is reading input from.

If the name of the jobconf variable is different in different versions of Hadoop (e.g. in Hadoop 0.21, map.input.file is mapreduce.map.input.file), we’ll automatically try all variants before giving up.

Return default if that jobconf variable isn’t set.
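The lookup described above can be sketched roughly as follows. This is a minimal illustration, not mrjob's actual implementation. It assumes that Hadoop Streaming exposes jobconf variables to tasks as environment variables with dots replaced by underscores. The names lookup_jobconf and _VARIANTS and the environ parameter are hypothetical, and the table holds only the one pair of names mentioned above.

```python
import os

# Hypothetical, deliberately tiny table of equivalent jobconf names.
# Only the pair mentioned in the text is listed here.
_VARIANTS = [
    ('map.input.file', 'mapreduce.map.input.file'),
]


def lookup_jobconf(variable, default=None, environ=None):
    """Return the value of a jobconf variable, trying known name variants."""
    if environ is None:
        environ = os.environ

    # Start with the name as given, then add any known variants.
    candidates = [variable]
    for group in _VARIANTS:
        if variable in group:
            candidates.extend(name for name in group if name != variable)

    for name in candidates:
        # Assumption: dots become underscores in the task environment.
        env_name = name.replace('.', '_')
        if env_name in environ:
            return environ[env_name]

    # None of the variants is set.
    return default
```

Passing a plain dict as environ makes the sketch easy to try outside a running Hadoop task.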

mrjob.compat.supports_combiners_in_hadoop_streaming(version)

Return True if this version of Hadoop Streaming supports combiners (i.e. >= 0.20.203), otherwise False.

mrjob.compat.supports_new_distributed_cache_options(version)

Return True if this version of Hadoop should use -files and -archives instead of -cacheFile and -cacheArchive, otherwise False.

mrjob.compat.translate_jobconf(variable, version)

Translate the jobconf variable name variable to the name used by Hadoop version version. If it’s not a variable we recognize, return it unchanged.
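One way to picture the translation is a table of renamed variables keyed by version. The sketch below is an assumption-laden illustration, not mrjob's code. The names _RENAMES, parse_version, and translate_name are hypothetical, the table lists only the map.input.file pair mentioned earlier, and it assumes the new name applies to 0.21 and every later version.

```python
# Hypothetical table of (old name, new name, first version using the new name).
_RENAMES = [
    ('map.input.file', 'mapreduce.map.input.file', (0, 21)),
]


def parse_version(version):
    """Turn a string such as '0.20.203' into a tuple of ints: (0, 20, 203)."""
    return tuple(int(part) for part in version.split('.'))


def translate_name(variable, version):
    """Return the name of *variable* appropriate for Hadoop *version*."""
    for old, new, since in _RENAMES:
        if variable in (old, new):
            # Assumption: the new name applies from `since` onward.
            return new if parse_version(version) >= since else old

    # Unrecognized variables are returned unchanged.
    return variable
```

Note that the sketch translates in both directions, so a new-style name passed with an older version comes back as the old-style name.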

mrjob.compat.uses_generic_jobconf(version)

Return True if this version of Hadoop should use -D instead of -jobconf, otherwise False.
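To show how a caller might act on this check, here is a small sketch that builds command-line arguments for a set of jobconf settings. The helper jobconf_args is hypothetical and is not part of mrjob. Its use_generic flag stands in for the result of uses_generic_jobconf(version), so the sketch makes no claim about which versions return True.

```python
def jobconf_args(jobconf, use_generic):
    """Build command-line arguments for a dict of jobconf settings.

    use_generic is expected to be the boolean returned by a check such as
    uses_generic_jobconf(version).
    """
    switch = '-D' if use_generic else '-jobconf'

    args = []
    # Sort the keys so the output is deterministic.
    for key, value in sorted(jobconf.items()):
        args.extend([switch, '%s=%s' % (key, value)])
    return args
```

For example, jobconf_args({'mapred.reduce.tasks': 2}, True) yields the pair '-D', 'mapred.reduce.tasks=2', while passing False yields the same setting behind -jobconf.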

mrjob.compat.version_gte(version, cmp_version_str)

Return True if version >= cmp_version_str.
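Version strings need to be compared component by component rather than as plain strings. The sketch below illustrates the idea under the assumption that versions are purely numeric and dot-separated. It is not mrjob's implementation, and the names parse_version and version_at_least are hypothetical.

```python
def parse_version(version):
    """Turn a string such as '0.20.203' into a tuple of ints: (0, 20, 203)."""
    return tuple(int(part) for part in version.split('.'))


def version_at_least(version, cmp_version_str):
    """Return True if version >= cmp_version_str, compared numerically."""
    # Tuples compare element by element, so 203 > 3 is handled correctly.
    # Note: in this sketch '0.20' compares lower than '0.20.0', because a
    # shorter tuple sorts before a longer one with the same prefix.
    return parse_version(version) >= parse_version(cmp_version_str)
```

A plain string comparison would get '0.20.203' versus '0.20.3' wrong, because the character '2' sorts before '3'. Comparing tuples of integers avoids that. Version strings with non-numeric parts would need extra handling that this sketch does not attempt.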
