I find myself searching for and (re-)implementing this on a regular basis. os.walk is often given as a starting point, but a few more lines are still needed to wrap it up.
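To see what those remaining lines have to wrap up, it helps to look at what os.walk itself yields. This is a minimal sketch that assumes a small throwaway tree built with tempfile, so the output is predictable. Each directory produces one (root, dirnames, filenames) tuple, and the names in it are bare, not full paths.

```python
import os
import tempfile

# Assumed sample tree, created only for this illustration:
#   <top>/a.txt
#   <top>/sub/b.txt
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(top, rel), "w") as f:
            f.write("x")

    # One tuple per directory, visited top-down by default.
    # relpath() only shortens the temp paths for display.
    walked = [(os.path.relpath(root, top), sorted(dirs), sorted(files))
              for root, dirs, files in os.walk(top)]

for entry in walked:
    print(entry)
# ('.', ['sub'], ['a.txt'])
# ('sub', [], ['b.txt'])
```

Joining root with each file name is exactly the step the functions below add.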
Update
The original version contained a mistake that caused it to walk through sub-folders several times. The code below is correct.
Here is a version that gets all files from a starting point and below. Note that this is not very efficient if you just need to iterate through them, but I will come back to this in the next example.
The following function returns a list of all files, each including its path, so you can start processing them directly (e.g. open, read, etc.).
```python
import os


def get_all_files(path):
    '''
    Gets list of all files including path starting from given path recursively.

    Usage:
        file_names = get_all_files('.')
        for file_name in file_names:
            ...

    @path the path to start from.
    '''
    file_names = []
    for root, sub_folders, files in os.walk(path):
        # Turn the bare file names into full paths.
        file_names += [os.path.join(root, file_name) for file_name in files]
    return file_names
```
The biggest disadvantage is that the complete list has to be constructed and kept in memory before you can do anything with it. Constructing it can take a while on a large tree, which may be frustrating for end users. If you don’t need all files, or may stop after processing just a few (for whatever reason), the rest of the walk is wasted work. On top of that, the complete list stays in memory, which is not required when you process files one by one.
Note that there are perfectly legitimate situations where you need all files up front; it depends entirely on your use case.
Sample usage:

```python
file_names = get_all_files('.')
for file_name in file_names:
    ...
```
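One of those legitimate situations is when you need the total count or a global ordering, which a full list gives you for free. The sketch below repeats get_all_files so it runs on its own and assumes a small sample tree created with tempfile; the file names are made up for the example.

```python
import os
import tempfile


def get_all_files(path):
    """Return a list with the path of every file under `path`, recursively."""
    file_names = []
    for root, sub_folders, files in os.walk(path):
        file_names += [os.path.join(root, file_name) for file_name in files]
    return file_names


with tempfile.TemporaryDirectory() as top:
    # Hypothetical tree: two files at the top, one in a sub-folder.
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("b.txt", "a.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(top, rel), "w") as f:
            f.write("x")

    file_names = get_all_files(top)
    # With the whole list in hand, counting and sorting are one-liners.
    total = len(file_names)
    relative = sorted(os.path.relpath(p, top) for p in file_names)

print(total)     # 3
print(relative)  # ['a.txt', 'b.txt', 'sub/c.txt'] on POSIX
```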
A more elegant and efficient solution for iterating is a generator function (a function that uses yield). The file system is then traversed lazily, only as far as needed. One possible implementation is the following:
```python
import os


def get_all_files_iter(path):
    '''
    Iterator for listing all files starting from given path recursively.

    Usage:
        for file_name in get_all_files_iter(path):
            ...

    @path the path to start from.
    '''
    for root, sub_folders, files in os.walk(path):
        for file_name in files:
            # Hand out one full path at a time instead of building a list.
            yield os.path.join(root, file_name)
```
Sample usage:

```python
for file_name in get_all_files_iter('.'):
    ...
```
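The payoff shows up when you stop early. This sketch assumes a sample tree of 50 folders with one file each (built with tempfile) and adds a `visited` list purely as instrumentation, which is not part of the function above. Because os.walk is itself lazy, taking the first three files with itertools.islice only visits the top folder plus three sub-folders.

```python
import itertools
import os
import tempfile

visited = []  # instrumentation only: records each directory os.walk yields


def get_all_files_iter(path):
    for root, sub_folders, files in os.walk(path):
        visited.append(root)
        for file_name in files:
            yield os.path.join(root, file_name)


with tempfile.TemporaryDirectory() as top:
    # Hypothetical tree: 50 folders, one file in each.
    for i in range(50):
        folder = os.path.join(top, f"dir{i:02d}")
        os.makedirs(folder)
        with open(os.path.join(folder, "data.txt"), "w") as f:
            f.write("x")

    # Stop after three files; the other 47 folders are never scanned.
    first_three = list(itertools.islice(get_all_files_iter(top), 3))

print(len(first_three))  # 3
print(len(visited))      # 4: the top folder plus three sub-folders
```

The list-based get_all_files would have walked all 51 directories before returning anything.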
The disadvantage is that you don’t know upfront how many entries there are to process, so reporting progress is rather cumbersome.
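One possible workaround, sketched here under the assumption that walking the tree twice is acceptable, is a cheap first pass that only counts files, followed by the lazy pass that does the real work. The helper `count_files` is hypothetical, and the four-file tree is created with tempfile just for the example.

```python
import os
import tempfile


def get_all_files_iter(path):
    for root, sub_folders, files in os.walk(path):
        for file_name in files:
            yield os.path.join(root, file_name)


def count_files(path):
    """Hypothetical helper: first pass that counts files without storing paths."""
    return sum(len(files) for _, _, files in os.walk(path))


with tempfile.TemporaryDirectory() as top:
    for i in range(4):
        with open(os.path.join(top, f"f{i}.txt"), "w") as f:
            f.write("x")

    total = count_files(top)  # pass 1: count only
    progress = []
    for done, file_name in enumerate(get_all_files_iter(top), start=1):
        # ... process file_name here ...
        progress.append(f"{done}/{total}")  # pass 2: report against the total

print(progress)  # ['1/4', '2/4', '3/4', '4/4']
```

The trade-off is a second traversal, and files created or deleted between the two passes will make the total slightly off. If that matters, reporting only the running count (without a total) avoids the first pass altogether.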
Enjoy!