Today someone asked me to help with some files on his Linux box that had non-ASCII characters in their names. The problem was that those files couldn’t be read by some apps (Unicode still remains a mystery for some). Since I am not a bash-minded person, I thought of Python (2.x) first (it works on Windows as well). The following script removes any non-ASCII characters from file names.
WARNING: Beware of files that may end up with the same name, as they may be overwritten!
import os

for file in os.listdir(u"."):
    if os.path.isfile(file) and file.endswith(u'.rar'):
        # keep only the characters in the 7-bit ASCII range
        new_file = u"".join(i for i in file if ord(i) < 128)
        if file != new_file:
            print u"Renaming", file.encode('utf8'), u"to", new_file.encode('utf8')
            os.rename(file, new_file)
Note that the u"." is essential so that you get Unicode file names back. A plain "." will give you regular byte strings, which are a pain in the butt.
Sticking to bash (if you don’t like Python for some reason), I came up with the following script:
for f in *.rar; do
    # keep only ASCII letters, digits, dot, underscore and hyphen
    mv -- "$f" "$(printf '%s' "$f" | tr -cd 'a-zA-Z0-9._-')"
done
Note that most scripts deal with the data IN the files, but not with the file names themselves. I hope some other people can use this as well.
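The warning above about files ending up with the same name can be handled by planning the renames before touching the disk. Here is a minimal sketch of that idea, written in Python 3 syntax rather than the 2.x used above; the helper names (ascii_name, unique_name, plan_renames, rename_in) and the "_1", "_2" suffix scheme are my own assumptions, not part of the original script.

```python
import os


def ascii_name(name):
    """Drop every character outside the 7-bit ASCII range."""
    return "".join(ch for ch in name if ord(ch) < 128)


def unique_name(name, taken):
    """Return name, or name with a numeric suffix if it is already taken."""
    if name not in taken:
        return name
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    counter = 1
    while True:
        candidate = "%s_%d%s%s" % (stem, counter, dot, ext)
        if candidate not in taken:
            return candidate
        counter += 1


def plan_renames(names, existing=None):
    """Map old names to collision-free ASCII names without touching the disk."""
    taken = set(names if existing is None else existing)
    plan = {}
    for old in sorted(names):
        new = ascii_name(old)
        if new == old:
            continue  # already pure ASCII, nothing to do
        new = unique_name(new, taken)
        taken.add(new)
        plan[old] = new
    return plan


def rename_in(directory=".", suffix=".rar"):
    """Apply the plan to the matching files in one directory (hypothetical helper)."""
    entries = os.listdir(directory)
    names = [n for n in entries
             if n.endswith(suffix) and os.path.isfile(os.path.join(directory, n))]
    for old, new in plan_renames(names, existing=entries).items():
        os.rename(os.path.join(directory, old), os.path.join(directory, new))
```

Calling plan_renames on a list of names first lets you print and inspect the mapping; rename_in only renames once every target name is known to be free.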
I hope this will help some people who, like me, are searching for an answer. Several of the steps were mentioned on the Python for .NET mailing list by other people as well, but I haven’t seen a step-by-step guide. It is not my intention to duplicate other posts in that sense, but rather to have an all-in-one post.
Here is how I got VS2010 and .NET 4.0 working with revision 122 of Python.NET, with Python 2.6 installed.
- Get the sources (tarball from sourceforge or directly from SVN)
- Open the pythonnet.sln with VS2010 and convert to 2010 format (will happen automagically)
- [updated] Add the constructorbinding.cs to the Python.Runtime.csproj project (see also this post in the PythonNET mailing list)
- Change the target framework to 4. Repeat the following steps for EACH project
Right-click on the project name and select “Properties”
Select the “Application” tab on the left (if not selected yet)
Change the “Target framework” to “.NET Framework 4”
- Open the buildclrmodule.bat and change the following line (it occurs 2 times!) from
%windir%\Microsoft.NET\Framework\v2.0.50727\ilasm /nologo /quiet /dll %ILASM_EXTRA_ARGS% /include=%INCLUDE_PATH% /output=%OUTPUT_PATH% %INPUT_PATH%
to
%windir%\Microsoft.NET\Framework\v4.0.30319\ilasm /nologo /quiet /dll %ILASM_EXTRA_ARGS% /include=%INCLUDE_PATH% /output=%OUTPUT_PATH% %INPUT_PATH%
- Open the clrmodule.il and change the lines with the version number in the following piece of code
.assembly extern mscorlib
.publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
- Recompile the whole solution, ignore the deprecation warnings.
Now you have all necessary files under the pythonnet folder where you have the sources. You need clr.pyd, python.exe and Python.Runtime.dll.
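The buildclrmodule.bat edit from the steps above could also be scripted instead of done by hand. The following is only a sketch in Python 3 syntax; the function names are my own, and it assumes the only thing that has to change in the file is the framework folder name (v2.0.50727 to v4.0.30319), exactly as shown in the step.

```python
OLD_VERSION = "v2.0.50727"
NEW_VERSION = "v4.0.30319"


def retarget_ilasm(text, old=OLD_VERSION, new=NEW_VERSION):
    """Return the patched text and the number of replacements made."""
    count = text.count(old)
    return text.replace(old, new), count


def patch_file(path="buildclrmodule.bat"):
    """Rewrite the batch file in place; returns how many lines were retargeted."""
    # latin-1 round-trips every byte, and newline="" keeps the CRLF line endings
    with open(path, "r", encoding="latin-1", newline="") as handle:
        text = handle.read()
    patched, count = retarget_ilasm(text)
    if count:
        with open(path, "w", encoding="latin-1", newline="") as handle:
            handle.write(patched)
    return count
```

If patch_file returns anything other than 2 for the file from revision 122, the file probably differs from the one described here and is worth checking by hand.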