Today someone asked me to help with some files that had non-ASCII characters in their names on his Linux box. The problem was that those files couldn’t be read by some apps (Unicode still remains a mystery for some). Since I am not a bash-minded person, I thought of Python (2.x) first (it works on Windows as well). The following script removes any non-ASCII characters from file names.
WARNING: Beware if you have files whose names end up identical after stripping, as some files may be overwritten!
import os

for file in os.listdir(u"."):
    if os.path.isfile(file) and file.endswith(u'.rar'):
        new_file = "".join(i for i in file if ord(i) < 128)
        if file != new_file:
            print u"Renaming", file.encode('utf8'), u"to", new_file.encode('utf8')
            os.rename(file, new_file)
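Since Python 2 is long retired, here is a rough sketch of the same idea in Python 3 (the function names strip_non_ascii and rename_non_ascii are my own, not from the original script). In Python 3, str is Unicode by default, so the u"" prefixes disappear; I’ve also added a guard so an existing file is never overwritten:

```python
import os

def strip_non_ascii(name):
    """Drop every character outside the 7-bit ASCII range."""
    return "".join(ch for ch in name if ord(ch) < 128)

def rename_non_ascii(directory="."):
    """Rename all .rar files in `directory` to ASCII-only names,
    skipping any rename that would overwrite an existing file."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.endswith(".rar"):
            new_name = strip_non_ascii(name)
            new_path = os.path.join(directory, new_name)
            if new_name != name and not os.path.exists(new_path):
                print("Renaming", name, "to", new_name)
                os.rename(path, new_path)
```

Call rename_non_ascii() from (or with) the directory containing the archives.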
Note that the u"." in the Python 2 script is essential so that you get Unicode file names back; a plain "." would give you regular byte strings, which are a pain in the butt to work with. If you’d rather stick to bash (if you don’t like Python for some reason), I’ve come up with the following script:
for f in *.rar; do
    mv "$f" "$(printf '%s' "$f" | tr -cd 'a-zA-Z0-9._-')"
done
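Given the overwrite warning above, a collision-safe variant might look like the sketch below; it refuses to move a file when the stripped name already exists. The scratch directory and demo file names are just for illustration (note that the `-` must come last in the tr set, otherwise `.-_` is read as a byte range):

```shell
# Demo in a scratch directory so nothing real gets touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'café.rar' 'plain.rar'

for f in *.rar; do
    new=$(printf '%s' "$f" | tr -cd 'a-zA-Z0-9._-')
    # Skip unchanged names and refuse to clobber an existing file.
    if [ "$f" != "$new" ] && [ ! -e "$new" ]; then
        mv -- "$f" "$new"
    fi
done

ls
```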
Note that most scripts deal with the data IN the files, but not with the file names themselves. I hope some other people can use this as well.