Os Path Walk

Python's os.walk method generates the file names in a directory tree by walking the tree either top-down or bottom-up. The old os.path.walk function was a challenge for many to use because it required passing a callback function into the walk.

When you use a scripting language like Python, one thing you will find yourself doing over and over again is walking a directory tree and processing files. While there are many ways to do this, Python's standard library offers a function, os.walk, that makes this process a breeze.

Here's a really simple example that walks a directory tree, printing out the name of each directory and the files contained:

    import os

    # Set the directory you want to start from
    rootDir = '.'
    for dirName, subdirList, fileList in os.walk(rootDir):
        print('Found directory: %s' % dirName)
        for fname in fileList:
            print('\t%s' % fname)

os.walk takes care of the details, and on every pass of the loop, it gives us three things:

  • dirName: The next directory it found.
  • subdirList: A list of sub-directories in the current directory.
  • fileList: A list of files in the current directory.

Let's say we have a directory tree whose top level contains test.py, file2a.jpeg, and two sub-directories: subdir1 (holding file1a.txt) and subdir2.

The code above will produce the following output:

    Found directory: .
        file2a.jpeg
        test.py
    Found directory: ./subdir1
        file1a.txt
    Found directory: ./subdir2

By default, Python will walk the directory tree in a top-down order (a directory will be passed to you for processing), then Python will descend into any sub-directories. We can see this behaviour in the output above; the parent directory (.) was printed first, then its 2 sub-directories.

Sometimes we want to traverse the directory tree bottom-up (files at the very bottom of the directory tree are processed first), then we work our way up the directories. We can tell os.walk to do this via the topdown parameter:

    for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
        print('Found directory: %s' % dirName)
        for fname in fileList:
            print('\t%s' % fname)

Which gives us this output:

    Found directory: ./subdir1
        file1a.txt
    Found directory: ./subdir2
    Found directory: .
        file2a.jpeg
        test.py

Now we get the files in the sub-directories first, then we work our way up the directory tree.
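To see the two orderings side by side without touching real files, here is a minimal sketch that builds a throwaway tree with tempfile and records the order in which os.walk visits each directory. The helper name walk_order and the directory name "inner" are made up for this illustration; they are not part of the original example.

```python
import os
import tempfile


def walk_order(root, topdown=True):
    """Return directory paths (relative to root) in the order os.walk visits them."""
    return [os.path.relpath(dirpath, root)
            for dirpath, dirnames, filenames in os.walk(root, topdown=topdown)]


# Build a small throwaway tree: root/subdir1/inner and root/subdir2
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "subdir1", "inner"))
    os.makedirs(os.path.join(root, "subdir2"))

    top_down = walk_order(root, topdown=True)
    bottom_up = walk_order(root, topdown=False)

print(top_down)   # the root "." is always the first entry
print(bottom_up)  # the root "." is always the last entry
```

The order of sibling directories depends on the file system, but a parent always comes before its children in the top-down list and after them in the bottom-up list.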

The examples so far have simply walked the entire directory tree, but os.walk allows us to selectively skip parts of the tree.

For each directory os.walk gives us, it also provides a list of sub-directories (in subdirList). If we modify this list, we can control which sub-directories os.walk will descend into. Let's tweak our example above so that we skip the first sub-directory.

    for dirName, subdirList, fileList in os.walk(rootDir):
        print('Found directory: %s' % dirName)
        for fname in fileList:
            print('\t%s' % fname)
        # Remove the first entry in the list of sub-directories
        if len(subdirList) > 0:
            del subdirList[0]

This gives us the following output:

    Found directory: .
        file2a.jpeg
        test.py
    Found directory: ./subdir2

We can see that the first sub-directory (subdir1) was indeed skipped.

This only works when the directory tree is being traversed top-down. In a bottom-up traversal, sub-directories are processed before their parent directory, so by the time we could modify subdirList, they would already have been processed!

It's also important to modify subdirList in-place, so that the code calling us will see the changes. If we instead rebound the name to a fresh list, for example with subdirList = subdirList[1:], we would create a new list of sub-directories, one that the calling code wouldn't know about.
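One common way to change the list in-place is slice assignment, which replaces the contents of the existing list object rather than creating a new one. The sketch below assumes a hypothetical helper, walk_skipping, and a temporary tree containing only subdir1 and subdir2.

```python
import os
import tempfile


def walk_skipping(root, skip_names):
    """Walk root top-down, never descending into directories named in skip_names."""
    visited = []
    for dirName, subdirList, fileList in os.walk(root):
        # Slice assignment changes the very list os.walk is holding,
        # so the pruning takes effect on the rest of the walk.
        subdirList[:] = [d for d in subdirList if d not in skip_names]
        visited.append(os.path.relpath(dirName, root))
    return visited


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "subdir1"))
    os.makedirs(os.path.join(root, "subdir2"))
    result = sorted(walk_skipping(root, {"subdir1"}))

print(result)  # ['.', 'subdir2']
```

Filtering by name like this is often more robust than deleting by position, since the order of entries in the list is not guaranteed.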

For a more comprehensive tutorial on Python's os.walk method, check out the recipe Recursive File and Directory Manipulation in Python. Or, to look at traversing directories in another way (using recursion), check out the recipe Recursive Directory Traversal in Python: Make a list of your movies!.

Overview

In an earlier post, OS.walk in Python, I described how to use os.walk and showed some examples on how to use it in scripts.

In this article, I will show how to use the os.walk() function to walk a directory tree, and the fnmatch module to match file names.

What is OS.walk?

It generates the file names in a directory tree by walking the tree either top-down or bottom-up.

For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

dirpath # is a string, the path to the directory.

dirnames # is a list of the names of the subdirectories in dirpath (excluding ‘.’ and ‘..’).

filenames # is a list of the names of the non-directory files in dirpath.

Note that the names in the lists contain no path components.

To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name). For more information, please see the Python Docs.
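As a small illustration of building full paths from the 3-tuple, the sketch below collects the path of every file under a directory. The helper name list_full_paths and the temporary tree are assumptions made for this example.

```python
import os
import tempfile


def list_full_paths(top):
    """Return the full path of every file under top, built with os.path.join."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # name has no path components; dirpath supplies them
            paths.append(os.path.join(dirpath, name))
    return paths


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "subdir1"))
    open(os.path.join(top, "subdir1", "file1a.txt"), "w").close()
    full_paths = list_full_paths(top)
    starts_with_top = all(p.startswith(top) for p in full_paths)

print(full_paths)
```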

What is fnmatch?

The fnmatch module compares file names against glob-style patterns such as used by Unix shells.

These are not the same as the more sophisticated regular expression rules. It’s purely a string matching operation.

If you find it more convenient to use a different pattern style, for example regular expressions, then simply use regex operations to match your filenames. http://www.doughellmann.com/PyMOTW/fnmatch/
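For comparison, here is a rough sketch of matching file names with the re module instead of fnmatch. The pattern and the helper name match_names are illustrative choices, not something prescribed by either module.

```python
import re

# Hypothetical pattern: names ending in .jpg or .jpeg, in any letter case
IMAGE_RE = re.compile(r".*\.jpe?g$", re.IGNORECASE)


def match_names(names, regex):
    """Return the names that the compiled regular expression matches."""
    return [n for n in names if regex.match(n)]


names = ["file2a.jpeg", "test.py", "file1a.txt", "PHOTO.JPG"]
matched = match_names(names, IMAGE_RE)
print(matched)  # ['file2a.jpeg', 'PHOTO.JPG']
```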

What does it do?

The fnmatch module is used for wild-card pattern matching.

Simple Matching

fnmatch() compares a single file name against a pattern and returns a boolean indicating whether or not they match. The comparison is case-sensitive when the operating system uses a case-sensitive file system.
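A few quick calls show the behaviour. The file names are sample values only; fnmatchcase() is the module's variant that is case-sensitive regardless of the operating system.

```python
import fnmatch

is_py = fnmatch.fnmatch("test.py", "*.py")
is_txt_py = fnmatch.fnmatch("file1a.txt", "*.py")
two_chars = fnmatch.fnmatch("file2a.jpeg", "file??.*")  # ? matches one character
strict = fnmatch.fnmatchcase("TEST.PY", "*.py")         # always case-sensitive

print(is_py)      # True
print(is_txt_py)  # False
print(two_chars)  # True
print(strict)     # False
```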

Filtering

To test a sequence of filenames, you can use fnmatch.filter(). It returns a list of the names that match the pattern argument.
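A short sketch of filtering a list of names follows; the names themselves are sample values, and "setup.py" is made up for the example.

```python
import fnmatch

names = ["file2a.jpeg", "test.py", "file1a.txt", "setup.py"]
py_files = fnmatch.filter(names, "*.py")
print(py_files)  # ['test.py', 'setup.py']
```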

Find all mp3 files

This script will search for *.mp3 files from the rootPath (“/”)
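The original listing is not reproduced here, so what follows is only a minimal sketch of such a search, combining os.walk with fnmatch.filter. The helper name find_files is an assumption made for illustration.

```python
import fnmatch
import os


def find_files(rootPath, pattern):
    """Yield the full path of every file under rootPath whose name matches pattern."""
    for root, dirs, files in os.walk(rootPath):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(root, filename)


# Example use. Walking "/" can take a long time; directories that cannot
# be read are silently skipped by os.walk unless an onerror handler is given.
# for path in find_files("/", "*.mp3"):
#     print(path)
```

Writing the search as a generator means results can be printed as they are found instead of waiting for the whole walk to finish.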

Search computer for specific files

This script uses 'os.walk' and 'fnmatch' with filters to search the hard drive for all image files.
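Since the script itself is not shown, here is one possible sketch of the idea: walk the tree once and apply several fnmatch patterns to each directory's file list. The pattern list and the helper name find_images are assumptions; extend or trim the patterns as needed.

```python
import fnmatch
import os

# Assumed set of image patterns; adjust to taste
IMAGE_PATTERNS = ["*.jpg", "*.jpeg", "*.png", "*.gif"]


def find_images(rootPath, patterns=IMAGE_PATTERNS):
    """Return full paths of files under rootPath matching any of the patterns."""
    matches = []
    for root, dirs, files in os.walk(rootPath):
        for pattern in patterns:
            for filename in fnmatch.filter(files, pattern):
                matches.append(os.path.join(root, filename))
    return matches


# Example use (searching from "/" may take a while):
# for path in find_images("/"):
#     print(path)
```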

There are many other (and faster) ways to do this, but now you understand the basics of it.
