fileutils - Filesystem helpers

Virtually every Python programmer has used Python for wrangling disk contents, and fileutils collects solutions to some of the most commonly-found gaps in the standard library.

Creating, Finding, and Copying

Python’s os, os.path, and shutil modules provide good coverage of file wrangling fundamentals, and these functions help close a few remaining gaps.

boltons.fileutils.mkdir_p(path)[source]

Creates a directory and any parent directories that may need to be created along the way, without raising errors for any existing directories. This function mimics the behavior of the mkdir -p command available in Linux/BSD environments, but also works on Windows.
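As a rough illustration of the behavior described above, the following sketch uses only the standard library. The helper name mkdir_p_sketch is hypothetical, and this is not necessarily how boltons implements mkdir_p.

```python
import os
import tempfile


def mkdir_p_sketch(path):
    """Create path and any missing parents; do nothing if it already exists."""
    # exist_ok=True suppresses the error for a directory that is already there.
    # A FileExistsError is still raised if path exists but is not a directory.
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "a", "b", "c")
    mkdir_p_sketch(nested)
    mkdir_p_sketch(nested)  # second call is a no-op rather than an error
    created = os.path.isdir(nested)
```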

boltons.fileutils.iter_find_files(directory, patterns, ignored=None, include_dirs=False, max_depth=None)[source]

Returns a generator that yields file paths under a directory, matching patterns using glob syntax (e.g., *.txt). Also supports ignored patterns.

Parameters:
  • directory (str) – Path that serves as the root of the search. Yielded paths will include this as a prefix.

  • patterns (str or list) – A single pattern or list of glob-formatted patterns to find under directory.

  • ignored (str or list) – A single pattern or list of glob-formatted patterns to ignore.

  • include_dirs (bool) – Whether to include directories that match patterns, as well. Defaults to False.

  • max_depth (int) – Traverse up to this level of subdirectory, e.g., 0 for the specified directory only, 1 for the directory and one level of subdirectories.

For example, finding Python files in the current directory:

>>> _CUR_DIR = os.path.dirname(os.path.abspath(__file__))
>>> filenames = sorted(iter_find_files(_CUR_DIR, '*.py'))
>>> os.path.basename(filenames[-1])
'urlutils.py'

Or, Python files while ignoring emacs lockfiles:

>>> filenames = iter_find_files(_CUR_DIR, '*.py', ignored='.#*')
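
To make the pattern and ignored-pattern matching more concrete, here is a minimal sketch built on os.walk and fnmatch. The name find_files_sketch is hypothetical, and the sketch leaves out include_dirs and max_depth. It illustrates the general technique and is not boltons' actual implementation.

```python
import fnmatch
import os
import tempfile


def find_files_sketch(directory, patterns, ignored=None):
    """Yield paths under directory whose basename matches any glob pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    if ignored is None:
        ignored = []
    elif isinstance(ignored, str):
        ignored = [ignored]
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Ignored patterns take precedence over the search patterns.
            if any(fnmatch.fnmatch(name, pat) for pat in ignored):
                continue
            if any(fnmatch.fnmatch(name, pat) for pat in patterns):
                yield os.path.join(root, name)


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "pkg"))
    for rel in ("a.py", ".#a.py", "notes.txt", os.path.join("pkg", "b.py")):
        open(os.path.join(root, rel), "w").close()
    found = sorted(
        os.path.relpath(p, root)
        for p in find_files_sketch(root, "*.py", ignored=".#*")
    )
```

Here found contains a.py and pkg/b.py, while the lockfile .#a.py is skipped even though it also matches *.py.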

boltons.fileutils.copytree(src, dst, symlinks=False, ignore=None)

The copytree function is an exact copy of the standard library’s shutil.copytree(), with one key difference: it will not raise an exception if part of the tree already exists. It achieves this by using mkdir_p().

As of Python 3.8, you may pass shutil.copytree() the dirs_exist_ok=True flag to achieve the same effect.

Parameters:
  • src (str) – Path of the source directory to copy.

  • dst (str) – Destination path. Existing directories accepted.

  • symlinks (bool) – If True, copy symlinks rather than their contents.

  • ignore (callable) – A callable that takes a path and directory listing, returning the files within the listing to be ignored.

For more details, check out shutil.copytree() and shutil.copy2().
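For readers on Python 3.8 or later, the standard-library route mentioned above might look like the following. The directory and file names are made up for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "src")
    dst = os.path.join(root, "dst")
    os.makedirs(os.path.join(src, "sub"))
    with open(os.path.join(src, "sub", "data.txt"), "w") as f:
        f.write("hello")
    # Part of the destination tree already exists before the copy.
    os.makedirs(os.path.join(dst, "sub"))

    # Without dirs_exist_ok=True this call raises FileExistsError.
    shutil.copytree(src, dst, dirs_exist_ok=True)

    with open(os.path.join(dst, "sub", "data.txt")) as f:
        copied = f.read()
```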

Atomic File Saving

Ideally, the road to success should never put current progress at risk. And that’s exactly why atomic_save() and AtomicSaver exist.

Both expose the same API as a writable file. All output is saved to a temporary file, and when the file is closed, the old file is replaced by the new file in a single system call, portable across all major operating systems. No more partially written or partially overwritten files.

boltons.fileutils.atomic_save(dest_path, **kwargs)[source]

A convenient interface to the AtomicSaver type. Example:

>>> try:
...     with atomic_save("file.txt", text_mode=True) as fo:
...         _ = fo.write('bye')
...         1/0  # will error
...         fo.write('bye')
... except ZeroDivisionError:
...     pass  # at least our file.txt didn't get overwritten

See the AtomicSaver documentation for details.

class boltons.fileutils.AtomicSaver(dest_path, **kwargs)[source]

AtomicSaver is a configurable context manager that provides a writable file which will be moved into place as long as no exceptions are raised within the context manager’s block. These “part files” are created in the same directory as the destination path to ensure atomic move operations (i.e., no cross-filesystem moves occur).

Parameters:
  • dest_path (str) – The path where the completed file will be written.

  • overwrite (bool) – Whether to overwrite the destination file if it exists at completion time. Defaults to True.

  • file_perms (int) – Integer representation of file permissions for the newly created file. By default, if the destination path already exists, the permissions are copied from the existing file; otherwise the user’s configured umask is respected, usually resulting in octal 0644 or 0664.

  • text_mode (bool) – Whether to open the destination file in text mode (i.e., 'w' not 'wb'). Defaults to False (wb).

  • part_file (str) – Name of the temporary part_file. Defaults to dest_path + .part. Note that this argument is just the filename, and not the full path of the part file. To guarantee atomic saves, part files are always created in the same directory as the destination path.

  • overwrite_part (bool) – Whether to overwrite the part_file, should it exist at setup time. Defaults to False, which results in an OSError being raised on pre-existing part files. Be careful of setting this to True in situations when multiple threads or processes could be writing to the same part file.

  • rm_part_on_exc (bool) – Remove part_file on exception cases. Defaults to True, but False can be useful for recovery in some cases. Note that resumption is not automatic and by default an OSError is raised if the part_file exists.

Practically, the AtomicSaver serves a few purposes:

  • Avoiding overwriting an existing, valid file with a partially written one.

  • Providing a reasonable guarantee that a part file only has one writer at a time.

  • Optional recovery of partial data in failure cases.
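The write-to-a-part-file-then-move pattern behind these guarantees can be sketched with the standard library alone. The helper atomic_write_sketch below is hypothetical and heavily simplified: it always uses text mode, ignores permissions, and does not fsync. It should be read as an illustration of the idea rather than as AtomicSaver's implementation.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write_sketch(dest_path):
    """Yield a text file; move it over dest_path only if the block succeeds."""
    # The part file lives next to the destination, so the final move
    # never crosses a filesystem boundary.
    part_path = dest_path + ".part"
    # Mode "x" fails with FileExistsError if the part file already exists.
    part_file = open(part_path, "x")
    try:
        yield part_file
    except BaseException:
        part_file.close()
        os.remove(part_path)  # discard the partial output
        raise
    else:
        part_file.close()
        os.replace(part_path, dest_path)  # swap the finished file into place


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "file.txt")
    with open(target, "w") as f:
        f.write("hello")
    try:
        with atomic_write_sketch(target) as fo:
            fo.write("bye")
            1 / 0  # simulate a failure partway through writing
    except ZeroDivisionError:
        pass
    with open(target) as f:
        after_failure = f.read()

    with atomic_write_sketch(target) as fo:
        fo.write("bye")
    with open(target) as f:
        after_success = f.read()
```

After the failed block the destination still reads hello; only the block that completes without an exception replaces it with bye.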

boltons.fileutils.atomic_rename(src, dst, overwrite=False)[source]

Rename src to dst, replacing dst if overwrite is True.

boltons.fileutils.replace(src, dst)[source]

Similar to os.replace() in Python 3.3+, this function will atomically create or replace the file at path dst with the file at path src.

On Windows, this function uses the ReplaceFile API for maximum possible atomicity on a range of filesystems.
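For comparison, this is how the standard library's os.replace(), which the description above refers to, behaves when the destination already exists. The file names are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "new.txt")
    dst = os.path.join(root, "current.txt")
    with open(src, "w") as f:
        f.write("new contents")
    with open(dst, "w") as f:
        f.write("old contents")

    # dst now holds what src held, and src no longer exists.
    os.replace(src, dst)

    with open(dst) as f:
        result = f.read()
    src_still_exists = os.path.exists(src)
```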

File Permissions

Linux, BSD, Mac OS, and other Unix-like operating systems all share a simple, foundational file permission structure that is commonly complicit in accidental access denial, as well as file leakage. FilePerms was built to increase clarity and cut down on permission-related accidents when working with files from Python code.

class boltons.fileutils.FilePerms(user='', group='', other='')[source]

The FilePerms type is used to represent standard POSIX filesystem permissions:

  • Read

  • Write

  • Execute

Across three classes of user:

  • Owning (u)ser

  • Owner’s (g)roup

  • Any (o)ther user

This class assists with computing new permissions, as well as working with numeric octal 777-style and rwx-style permissions. Currently it only considers the bottom 9 permission bits; it does not support sticky bits or more advanced permission systems.

Parameters:
  • user (str) – A string in the ‘rwx’ format, omitting the characters for any permissions not granted to the owning user.

  • group (str) – A string in the ‘rwx’ format, omitting the characters for any permissions not granted to the owning group.

  • other (str) – A string in the ‘rwx’ format, omitting the characters for any permissions not granted to other users (world).

There are many ways to use FilePerms:

>>> FilePerms(user='rwx', group='xrw', other='wxr')  # note character order
FilePerms(user='rwx', group='rwx', other='rwx')
>>> int(FilePerms('r', 'r', ''))
288
>>> oct(288)[-3:]  # last three octal digits
'440'

See also the FilePerms.from_int() and FilePerms.from_path() classmethods for useful alternative ways to construct FilePerms objects.
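The mapping from rwx-style strings to the bottom nine permission bits can be sketched with the constants in the standard stat module. The helper perms_to_int_sketch is hypothetical and only mirrors the arithmetic shown above. It is not FilePerms itself.

```python
import stat

# Permission bits for each class of user, keyed by the rwx character.
_BITS = {
    "user": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def perms_to_int_sketch(user="", group="", other=""):
    """Combine rwx-style strings into a single integer mode."""
    mode = 0
    for who, chars in (("user", user), ("group", group), ("other", other)):
        for ch in chars:  # character order within a string does not matter
            mode |= _BITS[who][ch]
    return mode


mode = perms_to_int_sketch("r", "r", "")
octal = format(mode, "03o")  # three-digit octal form of the mode
```

With read permission for user and group only, mode is 288, which is 440 in octal, matching the FilePerms example.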

Miscellaneous

class boltons.fileutils.DummyFile(path, mode='r', buffering=None)[source]