Remove non-ascii characters in file names

Today someone asked me to help with some files on his Linux box that had non-ASCII characters in their names. The problem was that some apps couldn’t read those files (Unicode still remains a mystery for some). Since I am not a bash-minded person, I thought of Python (2.x) first (it works on Windows as well). The following script removes any non-ASCII characters from the names of the .rar files in the current directory.

WARNING: Beware of files whose names become identical once the non-ASCII characters are removed, as one file may overwrite the other!

import os

# u"." makes os.listdir return unicode file names (Python 2.x)
for name in os.listdir(u"."):
    if os.path.isfile(name) and name.endswith(u'.rar'):
        # keep only the ASCII characters (code points below 128)
        new_name = u"".join(c for c in name if ord(c) < 128)
        if name != new_name:
            print u"Renaming", name.encode('utf8'), u"to", new_name.encode('utf8')
            os.rename(name, new_name)
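
For what it's worth, here is a rough sketch of how the same idea might look in Python 3, with a guard against the overwriting problem from the warning. The names strip_non_ascii and rename_to_ascii are made up for this sketch, and skipping a file on a name collision is just one possible policy, not the only sensible one. In Python 3, os.listdir returns text names when given a str path, so no u prefix is needed.

```python
import os

def strip_non_ascii(name):
    """Return name with every character above code point 127 removed."""
    return "".join(c for c in name if ord(c) < 128)

def rename_to_ascii(directory=".", suffix=".rar"):
    """Rename matching files in directory; return the (old, new) pairs renamed.

    A file is skipped (not overwritten) if its cleaned name already exists.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        old_path = os.path.join(directory, name)
        if not (os.path.isfile(old_path) and name.endswith(suffix)):
            continue
        new_name = strip_non_ascii(name)
        if new_name == name:
            continue  # nothing to strip
        new_path = os.path.join(directory, new_name)
        if os.path.exists(new_path):
            # ascii() keeps the message printable on any console
            print("Skipping", ascii(name), "because", ascii(new_name), "already exists")
            continue
        os.rename(old_path, new_path)
        renamed.append((name, new_name))
    return renamed
```

Calling rename_to_ascii(".", ".rar") would mimic the script above for the current directory, and the returned list shows what was actually renamed.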

 

Note that the u"." is essential so that you get Unicode file names back. A plain "." gives you regular byte strings, which are a pain to work with. If you would rather stick to bash (if you don’t like Python for some reason), I came up with the following script:

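for f in *.rar; do
    # keep only letters, digits, dot, underscore and hyphen ("-" goes last so tr does not read ".-_" as a range)
    new=$(printf '%s' "$f" | LC_ALL=C tr -cd 'a-zA-Z0-9._-')
    # skip names that are already clean; "--" protects names that start with "-"
    [ "$f" != "$new" ] && mv -- "$f" "$new"
done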
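
One thing to keep in mind: the bash loop is stricter than the Python script. It keeps only an allowlist of characters (letters, digits, dot, underscore, hyphen), so spaces and other ASCII punctuation are dropped as well. A rough Python equivalent of that filter, just for comparison (keep_allowlisted is a name made up for this sketch):

```python
import string

# Characters the bash loop is meant to keep: letters, digits, dot, underscore, hyphen.
ALLOWED = set(string.ascii_letters + string.digits + "._-")

def keep_allowlisted(name):
    """Drop every character that is not in ALLOWED (hypothetical helper)."""
    return "".join(c for c in name if c in ALLOWED)

name = "Caf\u00e9 del Mar (2010).rar"
# Allowlist filter, like the bash loop: spaces and parentheses go too.
print(keep_allowlisted(name))                        # CafdelMar2010.rar
# ASCII-only filter, like the Python script: only the accented letter goes.
print("".join(c for c in name if ord(c) < 128))      # Caf del Mar (2010).rar
```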

 

Note that most scripts deal with the data IN the files, but not with the file names themselves. I hope some other people can use this as well.

2 thoughts on “Remove non-ascii characters in file names”

  1. I could benefit from this code if it could be run as a Windows executable with an “inc. sub folders” feature… Thanks anyway, I will keep looking.

  2. Thanks. This is quite useful. To the earlier comment: the Python script works on Windows too. I tested it on Python 2.7, which is supported by py2exe, so you can generate a Windows exe from this script and then distribute the exe.
    Take care to install the 32-bit version of Python 2.7, because py2exe does not support 64-bit.
