Today someone asked me to help with some files that had non-ASCII characters in their names on his Linux box. The problem was that those files couldn't be read by some apps (Unicode still remains a mystery to some). Since I am not a bash-minded person, I thought of Python (2.x) first (it works on Windows as well). The following script will remove any non-ASCII characters from file names.
WARNING: Beware if you have files that may end up with the same name after stripping, as those files may be overwritten!
import os

for file in os.listdir(u"."):
    # Only touch regular .rar files in the current directory
    if os.path.isfile(file) and file.endswith(u'.rar'):
        # Keep only the ASCII characters (code points below 128)
        new_file = "".join(i for i in file if ord(i) < 128)
        if file != new_file:
            print u"Renaming", file.encode('utf8'), u"to", new_file.encode('utf8')
            os.rename(file, new_file)
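To see what the filtering expression does in isolation, here is a quick standalone check (the file name is a made-up example):

name = u"r\u00e9sum\u00e9-photos.rar"  # hypothetical name with two non-ASCII characters
ascii_only = "".join(i for i in name if ord(i) < 128)
print ascii_only  # prints: rsum-photos.rar

Note that it simply drops the offending characters rather than transliterating them, which is why the name collisions mentioned in the warning above are possible.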
Note that the u"." is essential so that you get the Unicode file names back. A plain "." will give you regular byte strings, which are a pain-in-the-butt to work with. Sticking to bash (if you don't like Python for some reason), I've come up with the following script:
for f in *.rar; do
    mv "$f" "$(echo "$f" | tr -cd 'a-zA-Z0-9._-')"
done

Note that the dash has to come last in the tr character class; placed in the middle it would be read as a range rather than a literal dash.
Note that most scripts deal with the data IN the files, but not with the file names themselves. I hope some other people can use this as well.
I could benefit from this code if it could be run as a Windows executable with an "include subfolders" feature… Thanks anyway, I will keep looking.
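Recursing into subfolders is a small change to the original script; here is a sketch using os.walk (untested, and the same overwrite warning applies):

import os

for root, dirs, files in os.walk(u"."):
    for name in files:
        if name.endswith(u'.rar'):
            # Strip non-ASCII characters from the file name only, not the path
            new_name = "".join(i for i in name if ord(i) < 128)
            if name != new_name:
                os.rename(os.path.join(root, name), os.path.join(root, new_name))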
Thanks. This is quite useful. To the earlier comment, the Python script works on Windows too. I tested it on Python 2.7, which is supported by py2exe, so you can generate a Windows exe out of this script and then distribute the exe.
Take care to install the 32-bit version of Python 2.7, because py2exe does not support 64-bit.
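For anyone going that route, a minimal py2exe setup script looks roughly like this, assuming the rename script was saved as rename_ascii.py (the file name is a placeholder):

# setup.py -- build with: python setup.py py2exe
from distutils.core import setup
import py2exe  # registers the py2exe command with distutils

setup(console=['rename_ascii.py'])  # 'rename_ascii.py' is a placeholder name

Running python setup.py py2exe should then produce the executable in a dist subfolder.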