fileutils - Filesystem helpers
Virtually every Python programmer has used Python for wrangling disk contents, and fileutils collects solutions to some of the most commonly found gaps in the standard library.
Creating, Finding, and Copying
Python’s os, os.path, and shutil modules provide good coverage of file wrangling fundamentals, and these functions help close a few remaining gaps.
- boltons.fileutils.mkdir_p(path)
Creates a directory and any parent directories that may need to be created along the way, without raising errors for any existing directories. This function mimics the behavior of the mkdir -p command available in Linux/BSD environments, but also works on Windows.
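For illustration, the "create parents, tolerate existing directories" behavior can be sketched on top of os.makedirs(). The name mkdir_p_sketch is hypothetical, and this is not necessarily how boltons implements mkdir_p(); on Python 3.2+, os.makedirs(path, exist_ok=True) gives much the same result directly.

```python
import errno
import os


def mkdir_p_sketch(path):
    """Create path and any missing parents; an existing directory is fine.

    Hypothetical helper for illustration, not the boltons implementation.
    """
    try:
        os.makedirs(path)
    except OSError as exc:
        # Only swallow "already exists" when what exists is a directory.
        # A regular file in the way, or a permissions error, is re-raised.
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            return
        raise
```

Calling it twice on the same path is a no-op the second time, while calling it on a path occupied by a regular file still raises OSError.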
- boltons.fileutils.iter_find_files(directory, patterns, ignored=None, include_dirs=False, max_depth=None)
Returns a generator that yields file paths under a directory, matching patterns using glob syntax (e.g., *.txt). Also supports ignored patterns.
- Parameters:
directory (str) – Path that serves as the root of the search. Yielded paths will include this as a prefix.
patterns (str or list) – A single pattern or list of glob-formatted patterns to find under directory.
ignored (str or list) – A single pattern or list of glob-formatted patterns to ignore.
include_dirs (bool) – Whether to also include directories that match patterns. Defaults to False.
max_depth (int) – Traverse up to this level of subdirectory. For example, 0 searches the specified directory only, and 1 searches the directory and one level of subdirectories.
For example, finding Python files in the directory containing the current module:
>>> _CUR_DIR = os.path.dirname(os.path.abspath(__file__))
>>> filenames = sorted(iter_find_files(_CUR_DIR, '*.py'))
>>> os.path.basename(filenames[-1])
'urlutils.py'
Or, finding Python files while ignoring Emacs lockfiles:
>>> filenames = iter_find_files(_CUR_DIR, '*.py', ignored='.#*')
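The search described above can be sketched with os.walk() and fnmatch. The name iter_find_files_sketch is hypothetical, and the matching rules here (patterns tested against each basename, subdirectories pruned once max_depth is reached) are assumptions made for illustration rather than a copy of the boltons implementation.

```python
import fnmatch
import os


def iter_find_files_sketch(directory, patterns, ignored=None,
                           include_dirs=False, max_depth=None):
    """Yield paths under directory whose basenames match any glob in patterns.

    Hypothetical helper that mirrors the documented parameters.
    """
    # Accept either a single pattern string or a list of patterns.
    if isinstance(patterns, str):
        patterns = [patterns]
    if ignored is None:
        ignored = []
    elif isinstance(ignored, str):
        ignored = [ignored]

    def wanted(name):
        return (any(fnmatch.fnmatch(name, pat) for pat in patterns)
                and not any(fnmatch.fnmatch(name, ign) for ign in ignored))

    for root, dirs, files in os.walk(directory):
        # Depth 0 is `directory` itself, 1 its immediate subdirectories, etc.
        rel = os.path.relpath(root, directory)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        candidates = (dirs + files) if include_dirs else files
        for name in candidates:
            if wanted(name):
                # os.walk roots start with `directory`, so it stays a prefix.
                yield os.path.join(root, name)
        if max_depth is not None and depth >= max_depth:
            dirs[:] = []  # prune in place so os.walk descends no further
```

Pruning dirs in place is the standard way to stop a top-down os.walk() from descending, which avoids visiting subdirectories that could never produce results.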
- boltons.fileutils.copytree(src, dst, symlinks=False, ignore=None)
This function is an exact copy of the built-in shutil.copytree(), with one key difference: it will not raise an exception if part of the tree already exists. It achieves this by using mkdir_p(). As of Python 3.8, passing dirs_exist_ok=True to shutil.copytree() achieves the same effect.
- Parameters:
src (str) – Path of the source directory to copy.
dst (str) – Destination path. Existing directories are accepted.
symlinks (bool) – If True, copy symlinks as symlinks rather than copying the files they point to.
ignore (callable) – A callable that takes a path and directory listing, returning the files within the listing to be ignored.
For more details, check out shutil.copytree() and shutil.copy2().
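As a sketch of the standard-library route mentioned above (Python 3.8+), the following passes dirs_exist_ok=True to shutil.copytree() together with an ignore callable of the documented shape. The helper names skip_compiled and merge_tree are hypothetical, chosen only for this example.

```python
import shutil


def skip_compiled(path, names):
    """An ``ignore`` callable: receives a directory path and its listing and
    returns the subset of names to leave out of the copy."""
    return {name for name in names
            if name.endswith('.pyc') or name == '__pycache__'}


def merge_tree(src, dst):
    """Copy src into dst, tolerating directories that already exist in dst.

    Hypothetical helper; dirs_exist_ok requires Python 3.8 or newer.
    """
    return shutil.copytree(src, dst, symlinks=False, ignore=skip_compiled,
                           dirs_exist_ok=True)
```

Files already present in dst but absent from src are left alone, and files present in both are overwritten by the copy from src.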
Atomic File Saving
Ideally, the road to success should never put current progress at risk. And that’s exactly why atomic_save() and AtomicSaver exist.
These tools expose the same API as a writable file. All output is saved to a temporary file, and when that file is closed, the new file replaces the old one in a single system call, portably across all major operating systems. No more partially written or partially overwritten files.
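The write-then-swap pattern described above can be sketched with tempfile.mkstemp() and os.replace(). This is a simplified illustration, not AtomicSaver itself: the name atomic_write_sketch is hypothetical, and unlike AtomicSaver it does not copy the previous file's permissions (mkstemp creates the temporary file with mode 0600).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write_sketch(dest_path, text_mode=True):
    """Write to a temp file beside dest_path; move it into place on success.

    Hypothetical helper illustrating the technique, not boltons' AtomicSaver.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Same directory means same filesystem, so the final move is a rename.
    fd, part_path = tempfile.mkstemp(dir=dest_dir, suffix='.part')
    try:
        with os.fdopen(fd, 'w' if text_mode else 'wb') as fo:
            yield fo
            fo.flush()
            os.fsync(fo.fileno())  # push the data to disk before the swap
        # On POSIX, a same-filesystem replace is atomic.
        os.replace(part_path, dest_path)
    except BaseException:
        # Any error inside the block leaves dest_path untouched.
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
```

If the block raises, the destination keeps its previous contents and the temporary file is cleaned up, much like AtomicSaver's default rm_part_on_exc=True behavior.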
- boltons.fileutils.atomic_save(dest_path, **kwargs)
A convenient interface to the AtomicSaver type. Example:
>>> try:
...     with atomic_save("file.txt", text_mode=True) as fo:
...         _ = fo.write('bye')
...         1/0  # will error
...         fo.write('bye')
... except ZeroDivisionError:
...     pass  # at least our file.txt didn't get overwritten
See the AtomicSaver documentation for details.
- class boltons.fileutils.AtomicSaver(dest_path, **kwargs)
AtomicSaver is a configurable context manager that provides a writable file, which will be moved into place as long as no exceptions are raised within the context manager’s block. These “part files” are created in the same directory as the destination path to ensure atomic move operations (i.e., no cross-filesystem moves occur).
- Parameters:
dest_path (str) – The path where the completed file will be written.
overwrite (bool) – Whether to overwrite the destination file if it exists at completion time. Defaults to True.
file_perms (int) – Integer representation of file permissions for the newly created file. By default, if the destination path already exists, the permissions of the previous file are copied; if it does not, the user’s configured umask is respected, usually resulting in octal 0644 or 0664.
text_mode (bool) – Whether to open the destination file in text mode (i.e., 'w' not 'wb'). Defaults to False ('wb').
part_file (str) – Name of the temporary part file. Defaults to dest_path + .part. Note that this argument is just the filename, not the full path of the part file. To guarantee atomic saves, part files are always created in the same directory as the destination path.
overwrite_part (bool) – Whether to overwrite the part file, should it exist at setup time. Defaults to False, which results in an OSError being raised on pre-existing part files. Be careful setting this to True when multiple threads or processes could be writing to the same part file.
rm_part_on_exc (bool) – Whether to remove the part file when an exception occurs. Defaults to True, but False can be useful for recovery in some cases. Note that resumption is not automatic, and by default an OSError is raised if the part file exists.
Practically, the AtomicSaver serves a few purposes:
Avoiding overwriting an existing, valid file with a partially written one.
Providing a reasonable guarantee that a part file only has one writer at a time.
Optional recovery of partial data in failure cases.
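One common way to get the "one writer at a time" guarantee for a part file is to create it with O_CREAT | O_EXCL, so a second creator fails with an OSError. The sketch below shows that general technique; the name open_part_file_sketch is hypothetical, and this is an assumption about how such a guarantee can be built, not a description of AtomicSaver's internals.

```python
import os


def open_part_file_sketch(part_path, overwrite_part=False):
    """Open part_path for binary writing, refusing to clobber an existing one.

    Hypothetical helper illustrating exclusive creation.
    """
    flags = os.O_WRONLY | os.O_CREAT
    if overwrite_part:
        # Reuse and truncate an existing part file (risky with many writers).
        flags |= os.O_TRUNC
    else:
        # O_EXCL makes os.open raise FileExistsError (an OSError subclass)
        # if the part file already exists, e.g. from another writer.
        flags |= os.O_EXCL
    flags |= getattr(os, 'O_BINARY', 0)  # only defined on Windows
    fd = os.open(part_path, flags, 0o666)  # the umask still applies
    return os.fdopen(fd, 'wb')
```

Because the existence check and the creation happen in one system call, two processes racing to create the same part file cannot both succeed.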
- boltons.fileutils.atomic_rename(src, dst, overwrite=False)
Rename src to dst, replacing dst if overwrite is True.
- boltons.fileutils.replace(src, dst)
Similar to os.replace() in Python 3.3+, this function will atomically create or replace the file at path dst with the file at path src. On Windows, this function uses the ReplaceFile API for maximum possible atomicity on a range of filesystems.
File Permissions
Linux, BSD, Mac OS, and other Unix-like operating systems all share a simple, foundational file permission structure that is commonly complicit in accidental access denial, as well as file leakage. FilePerms was built to increase clarity and cut down on permission-related accidents when working with files from Python code.
- class boltons.fileutils.FilePerms(user='', group='', other='')
The FilePerms type is used to represent standard POSIX filesystem permissions:
Read
Write
Execute
Across three classes of user:
Owning (u)ser
Owner’s (g)roup
Any (o)ther user
This class assists with computing new permissions, as well as working with numeric octal 777-style and rwx-style permissions. Currently it only considers the bottom 9 permission bits; it does not support sticky bits or more advanced permission systems.
- Parameters:
user (str) – A string in the ‘rwx’ format, omitting characters for which owning user’s permissions are not provided.
group (str) – A string in the ‘rwx’ format, omitting characters for which owning group permissions are not provided.
other (str) – A string in the ‘rwx’ format, omitting characters for which other/world permissions are not provided.
There are many ways to use FilePerms:
>>> FilePerms(user='rwx', group='xrw', other='wxr')  # note character order
FilePerms(user='rwx', group='rwx', other='rwx')
>>> int(FilePerms('r', 'r', ''))
288
>>> oct(288)[-3:]  # XXX Py3k
'440'
See also the FilePerms.from_int() and FilePerms.from_path() classmethods for useful alternative ways to construct FilePerms objects.
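The arithmetic behind the FilePerms examples (for instance, 'r' for user and group giving 288, or octal 440) can be sketched with plain bit shifts over the bottom 9 bits. The helpers rwx_to_int and int_to_rwx are hypothetical and independent of the FilePerms class; they only illustrate the encoding.

```python
# Bit value of each permission letter within one 3-bit class.
_PERM_BITS = {'r': 4, 'w': 2, 'x': 1}


def rwx_to_int(user='', group='', other=''):
    """Combine three 'rwx'-style strings into a 9-bit permission integer.

    Character order does not matter, matching the 'xrw' example above.
    """
    total = 0
    for shift, spec in ((6, user), (3, group), (0, other)):
        for ch in spec:
            total |= _PERM_BITS[ch] << shift
    return total


def int_to_rwx(mode):
    """Split the bottom 9 bits of mode into (user, group, other) strings.

    Higher bits (file type, setuid, sticky) are ignored.
    """
    parts = []
    for shift in (6, 3, 0):
        bits = (mode >> shift) & 0o7
        parts.append(''.join(ch for ch, bit in _PERM_BITS.items() if bits & bit))
    return tuple(parts)
```

For example, rwx_to_int('r', 'r', '') is 0o440 (288), and int_to_rwx(0o754) is ('rwx', 'rx', 'r'). An integer produced this way can be passed to os.chmod(), and int_to_rwx(os.stat(path).st_mode) reads an existing file's bits.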