mrjob.conf - parse and write config files

“mrjob.conf” is the name of both this module and the global config file for mrjob.

Reading and writing mrjob.conf

mrjob.conf.find_mrjob_conf()

Look for mrjob.conf, and return its path. Places we look:

  • The location specified by MRJOB_CONF
  • ~/.mrjob.conf
  • ~/.mrjob (deprecated)
  • mrjob.conf in any directory in PYTHONPATH (deprecated)
  • /etc/mrjob.conf

Return None if we can’t find it. Print a warning if its location is deprecated.
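The search order above can be sketched in plain Python. This is a minimal, hypothetical illustration (`find_conf_sketch` is not part of mrjob). It assumes two things for testability: the environment and the file-existence check can be passed in, and `~` is taken from `HOME` instead of `os.path.expanduser()`. It also uses `warnings.warn` as a stand-in for mrjob's printed warning.

```python
import os
import warnings


def find_conf_sketch(environ=None, exists=os.path.exists):
    """Hypothetical sketch of the lookup order used by find_mrjob_conf().

    Assumption: `environ` and `exists` are injectable, and ~ is read from
    environ['HOME'] rather than expanded with os.path.expanduser().
    """
    environ = os.environ if environ is None else environ
    candidates = []  # (path, is_deprecated), highest priority first

    if environ.get('MRJOB_CONF'):
        candidates.append((environ['MRJOB_CONF'], False))

    home = environ.get('HOME')
    if home:
        candidates.append((os.path.join(home, '.mrjob.conf'), False))
        candidates.append((os.path.join(home, '.mrjob'), True))

    # mrjob.conf in any directory on PYTHONPATH (deprecated location).
    for dirname in environ.get('PYTHONPATH', '').split(os.pathsep):
        if dirname:
            candidates.append((os.path.join(dirname, 'mrjob.conf'), True))

    candidates.append(('/etc/mrjob.conf', False))

    for path, is_deprecated in candidates:
        if exists(path):
            if is_deprecated:
                # Stand-in for the warning the real function prints.
                warnings.warn('using deprecated mrjob.conf location: %s' % path)
            return path
    return None
```

Because the candidates are checked in priority order, a file named by `MRJOB_CONF` shadows `~/.mrjob.conf`, which in turn shadows `/etc/mrjob.conf`.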

mrjob.conf.load_mrjob_conf(conf_path=None)

Deprecated since version 0.3.3.

Load the entire data structure in mrjob.conf, which should look something like this:

{'runners': {
    'emr': {'OPTION': VALUE, ...},
    'hadoop': {'OPTION': VALUE, ...},
    'inline': {'OPTION': VALUE, ...},
    'local': {'OPTION': VALUE, ...},
}}

Returns None if we can’t find mrjob.conf.

Parameters:
  • conf_path (str) – an alternate place to look for mrjob.conf. If this is False, we’ll always return None.

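To make the nested structure concrete, here is a minimal sketch that parses a config with that shape. mrjob.conf is normally written in YAML. The sketch uses JSON, which mrjob also accepts, so that it needs only the standard library. `load_conf_sketch` and `SAMPLE_CONF` are hypothetical names, and the option values are illustrative. Unlike the real function, the sketch does not search for the file when `conf_path` is None.

```python
import json
import os

# A JSON-encoded mrjob.conf with the {'runners': {alias: {...}}} shape.
SAMPLE_CONF = """
{"runners": {
    "emr": {"python_bin": "python3"},
    "local": {"python_bin": "python3", "cmdenv": {"TZ": "UTC"}}
}}
"""


def load_conf_sketch(conf_path):
    """Hypothetical stand-in for load_mrjob_conf(): parse a JSON config.

    Returns None if conf_path is None/False or the file doesn't exist.
    """
    if not conf_path or not os.path.exists(conf_path):
        return None
    with open(conf_path) as f:
        return json.load(f)
```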
mrjob.conf.load_opts_from_mrjob_conf(runner_alias, conf_path=None, already_loaded=None)

Load a list of dictionaries representing the options in a given mrjob.conf for a specific runner. Returns [(path, values)]. If conf_path is not found, return [(None, {})].

Parameters:
  • runner_alias (str) – String identifier of the runner type, e.g. emr, local, etc.
  • conf_path (str) – an alternate place to look for mrjob.conf. If this is False, we’ll always return {}.
  • already_loaded (list) – list of mrjob.conf paths that have already been loaded
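The `[(path, values)]` return shape can be illustrated with a minimal, hypothetical sketch (`load_opts_sketch` is not mrjob's code). It assumes a JSON-encoded file, takes an explicit path instead of searching for one, ignores `already_loaded`, and follows the `[(None, {})]` convention from the summary line when the file is missing.

```python
import json
import os


def load_opts_sketch(runner_alias, conf_path):
    """Hypothetical sketch of load_opts_from_mrjob_conf().

    Returns [(path, values)] for one runner, or [(None, {})] if the
    config file can't be found.
    """
    if not conf_path or not os.path.exists(conf_path):
        return [(None, {})]
    with open(conf_path) as f:
        conf = json.load(f)
    # A runner with no section (or an empty one) contributes no options.
    values = (conf.get('runners') or {}).get(runner_alias) or {}
    return [(conf_path, values)]
```

Returning a list of pairs, rather than one dict, keeps track of which file each set of options came from.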

Combining options

Combiner functions take the values to combine as separate arguments, with later values taking precedence over earlier ones. None values are always ignored.

mrjob.conf.combine_values(*values)

Return the last value in values that is not None.

The default combiner; good for simple values (booleans, strings, numbers).
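A minimal sketch of the “last non-None value wins” rule, using the hypothetical name `combine_values_sketch`. Only None is skipped, so falsy values such as False or 0 can still override earlier settings.

```python
def combine_values_sketch(*values):
    """Hypothetical sketch of combine_values(): last non-None value wins."""
    for value in reversed(values):
        if value is not None:
            return value
    return None
```

For example, `combine_values_sketch(True, False, None)` returns False: a later, explicit False overrides an earlier True.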

mrjob.conf.combine_lists(*seqs)

Concatenate the given sequences into a list. Ignore None values.

Generally this is used for a list of commands we want to run; the “default” commands get run before any commands specific to your job.
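A minimal sketch of this concatenation (the name `combine_lists_sketch` is hypothetical). Earlier sequences come first, so “default” entries precede job-specific ones.

```python
def combine_lists_sketch(*seqs):
    """Hypothetical sketch of combine_lists(): concatenate, skipping None."""
    result = []
    for seq in seqs:
        if seq is not None:
            result.extend(seq)
    return result
```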

mrjob.conf.combine_dicts(*dicts)

Combine zero or more dictionaries. Values from dicts later in the list take precedence over values earlier in the list.

If you pass in None in place of a dictionary, it will be ignored.
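A minimal sketch of this merge, with the hypothetical name `combine_dicts_sketch`. `dict.update()` applied left to right gives later dictionaries precedence on shared keys.

```python
def combine_dicts_sketch(*dicts):
    """Hypothetical sketch of combine_dicts(): later dicts win on shared keys."""
    result = {}
    for d in dicts:
        if d is not None:
            result.update(d)
    return result
```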

mrjob.conf.combine_cmds(*cmds)

Take zero or more commands to run on the command line, and return the last one that is not None. Each command should be either a list containing the command plus switches, or a string, which will be parsed with shlex.split().

Returns either None or a list containing the command plus arguments.
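A minimal sketch of this behavior, using the hypothetical name `combine_cmds_sketch`. The sketch picks the last non-None command and then normalizes it to a list, splitting strings with `shlex.split()` so quoted arguments stay together.

```python
import shlex


def combine_cmds_sketch(*cmds):
    """Hypothetical sketch of combine_cmds(): last non-None command, as a list."""
    for cmd in reversed(cmds):
        if cmd is not None:
            if isinstance(cmd, str):
                return shlex.split(cmd)
            return list(cmd)
    return None
```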

mrjob.conf.combine_cmd_lists(*seqs_of_cmds)

Concatenate the given commands into a list. Ignore None values, and parse strings with shlex.split().

Returns a list of lists (each sublist contains the command plus arguments).
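A minimal sketch (with the hypothetical name `combine_cmd_lists_sketch`) that concatenates several lists of commands and normalizes each command to a list of arguments.

```python
import shlex


def combine_cmd_lists_sketch(*seqs_of_cmds):
    """Hypothetical sketch of combine_cmd_lists()."""
    result = []
    for seq in seqs_of_cmds:
        if seq is None:
            continue
        for cmd in seq:
            # Strings are split shell-style; lists are copied as-is.
            result.append(shlex.split(cmd) if isinstance(cmd, str) else list(cmd))
    return result
```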

mrjob.conf.combine_envs(*envs)

Combine zero or more dictionaries containing environment variables.

Environment variables from dictionaries later in the list take priority over those earlier in the list. For variables whose names end in PATH, we prepend the new value (followed by a colon) rather than overwriting the old one.

If you pass in None in place of a dictionary, it will be ignored.
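A minimal sketch of the merge rule, using the hypothetical name `combine_envs_sketch`. The `sep` parameter is an addition for illustration. For variables ending in PATH, the later value is placed in front of the earlier one; all other variables are simply overwritten.

```python
def combine_envs_sketch(*envs, sep=':'):
    """Hypothetical sketch of combine_envs(); `sep` is an illustrative addition."""
    result = {}
    for env in envs:
        if env is None:
            continue
        for key, value in env.items():
            if key.endswith('PATH') and result.get(key):
                # Prepend so the later setting is searched first.
                result[key] = value + sep + result[key]
            else:
                result[key] = value
    return result
```

Passing `sep=os.pathsep` gives the platform-specific variant described for combine_local_envs().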

mrjob.conf.combine_local_envs(*envs)

Same as combine_envs(), except that paths are combined using the local path separator (e.g. ; on Windows rather than :).

mrjob.conf.combine_paths(*paths)

Return the last value in paths that is not None, resolving ~ (home dir) and environment variables.
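A minimal sketch with the hypothetical name `combine_paths_sketch`. It selects the last non-None path and then expands it with the standard `os.path.expandvars()` and `os.path.expanduser()`.

```python
import os


def combine_paths_sketch(*paths):
    """Hypothetical sketch of combine_paths(): last non-None path, expanded."""
    for path in reversed(paths):
        if path is not None:
            return os.path.expanduser(os.path.expandvars(path))
    return None
```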

mrjob.conf.combine_path_lists(*path_seqs)

Concatenate the given sequences into a list. Ignore None values. Resolve ~ (home dir) and environment variables, and expand globs that refer to the local filesystem.
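A minimal sketch of this function, using the hypothetical name `combine_path_lists_sketch`. Two behaviors are assumptions, not taken from the text above: any path containing `://` (such as an `s3://` URI) is treated as non-local and left unexpanded, and a local pattern that matches nothing is kept as-is.

```python
import glob
import os


def combine_path_lists_sketch(*path_seqs):
    """Hypothetical sketch of combine_path_lists()."""
    result = []
    for seq in path_seqs:
        if seq is None:
            continue
        for path in seq:
            path = os.path.expanduser(os.path.expandvars(path))
            if '://' in path:
                # Assumption: URIs are not on the local filesystem, so no globbing.
                result.append(path)
            else:
                # Assumption: keep a pattern that matches nothing unchanged.
                result.extend(sorted(glob.glob(path)) or [path])
    return result
```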
