Setting up Django i18n, l10n

Anything related to i18n/l10n seems to be quirkier than it looks at first glance. Python (2.x) itself and its handling of Unicode are a story apart.

This time I was looking into building a small website that has to provide its UI in different languages. As this is one of the things you want to have in place right away, I started experimenting with adding i18n and l10n support.

The first step is easy: the project already had the proper settings. Then for .py files it is rather straightforward to add, e.g. (for forms):

from django import forms
from django.utils.translation import ugettext_lazy as _  # gettext_lazy in Django 4.0+
# inside your form class:
city_name = forms.CharField(required=False, label=_('City:'))

For the .html templates, something like:

{% load i18n %}

{% trans "Hello there!" %}

Then create a ‘conf/locale’ folder under the project folder (if you don’t, makemessages will complain), and then run python manage.py makemessages -l ru

Edit the resulting django.po file (under conf/locale/ru/LC_MESSAGES/) and add translations for your messages.

Warning: don’t forget to edit the following header field, which is generated EMPTY even though you passed the language as a parameter! Otherwise the file will not be used properly.

"Language: ru\n"

Then compile your nice and shiny translations: python manage.py compilemessages

Now all your messages should be available. At least in theory. But there is another trick that is easy to miss in the Django documentation/tutorials: you HAVE TO specify the location of the message files explicitly in your settings.py (the LOCALE_PATHS setting), otherwise your texts will keep coming up in the default (en) language no matter how hard you try. E.g.:

import os  # at the top of settings.py
LOCALE_PATHS = (os.path.join(os.path.dirname(__file__), 'conf', 'locale').replace('\\', '/'),)
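To see why nothing complains when the path is wrong, here is a minimal sketch using Python’s standard gettext module, which Django’s translation machinery is built on. It writes a tiny compiled catalog (the job compilemessages normally hands to GNU msgfmt) into a conf/locale/ru/LC_MESSAGES/django.mo layout, then looks a message up twice: once in the right directory and once in a wrong one. The write_mo() and lookup() helpers and the temporary project folder are hypothetical, not Django APIs, and the sketch is Python 3.

```python
import gettext
import os
import struct
import tempfile


def write_mo(path, messages):
    """Hypothetical stand-in for msgfmt: write a minimal GNU .mo catalog.

    `messages` maps msgid -> msgstr; the '' entry holds the catalog header.
    """
    keys = sorted(messages)                      # msgids are stored sorted
    originals = [k.encode('utf-8') for k in keys]
    translations = [messages[k].encode('utf-8') for k in keys]
    n = len(keys)
    orig_table = 7 * 4                           # tables start after the 7-word header
    trans_table = orig_table + 8 * n             # each entry is (length, offset)
    data_start = trans_table + 8 * n
    entries, data = [], b''
    for s in originals + translations:
        entries.append((len(s), data_start + len(data)))
        data += s + b'\0'                        # strings are NUL-terminated
    with open(path, 'wb') as f:
        # magic, revision, count, table offsets, hash table size/offset (none)
        f.write(struct.pack('<7I', 0x950412de, 0, n, orig_table, trans_table, 0, 0))
        for length, offset in entries:
            f.write(struct.pack('<2I', length, offset))
        f.write(data)


def lookup(locale_dir, text, language='ru'):
    # fallback=True: if no catalog is found, quietly return the original text
    t = gettext.translation('django', localedir=locale_dir,
                            languages=[language], fallback=True)
    return t.gettext(text)


project = tempfile.mkdtemp()                     # hypothetical project folder
locale_dir = os.path.join(project, 'conf', 'locale')
mo_dir = os.path.join(locale_dir, 'ru', 'LC_MESSAGES')
os.makedirs(mo_dir)
write_mo(os.path.join(mo_dir, 'django.mo'), {
    '': 'Language: ru\nContent-Type: text/plain; charset=UTF-8\n',
    'Hello there!': 'Привет!',
})

found = lookup(locale_dir, 'Hello there!')                        # 'Привет!'
missed = lookup(os.path.join(project, 'locale'), 'Hello there!')  # 'Hello there!', no error
```

With fallback=True, gettext hands back the untranslated string when it finds no catalog, which is exactly the silent “everything stays English” behavior; with fallback=False it raises an error instead.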

Well, after all this it seems to work. But it took quite some searching and poking around to get here. I can imagine that after several rounds this becomes obvious, but you don’t get any errors, warnings, or whatsoever; it just doesn’t do what you want it to do. Well, I hope it will do it for you now :).

Happy Djangoing!

A catch with the clean_xxx methods in Django forms

As I get more into Django stuff, I also seem to step into rookie mistakes. Here are some (beginner) notes.

To validate a particular field in Django, you can define a clean_<fieldname>() method on the form class, e.g.:

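from django import forms

class MyForm(forms.Form):
    first_field = forms.CharField()
    second_field = forms.CharField()

    def clean_first_field(self):
        first_field = self.cleaned_data['first_field']
        # ... validate first_field here, raise forms.ValidationError if invalid ...
        return first_field  # the returned value is stored back in cleaned_data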

Well, so far so good. But there is a catch which cost me another sleepless hour last night: when clean_first_field() is called, the self.cleaned_data dictionary does not contain second_field yet!!

Try the following:

def clean_first_field(self):
    first_field = self.cleaned_data['first_field']
    second_field = self.cleaned_data['second_field']  # KeyError: second_field is not cleaned yet!

Apparently Django calls the clean_xxx() methods in the same order in which the fields are declared, and self.cleaned_data only contains the values UP TO AND INCLUDING the field being ‘cleaned’ (and only those that passed validation), but NOT those of the fields declared AFTER it. Watta…

This means that if you want to check, e.g., first_field and second_field against each other (yes, there are other ways of doing that as well, but when you’re only so far through the Django book you don’t know about them :)), you have to do it in the clean method of the LAST of those fields, where the values of that field and all PREVIOUSLY declared fields are available. So the code becomes:

def clean_second_field(self):
    first_field = self.cleaned_data.get('first_field')  # declared before second_field, so already cleaned (None if it failed)
    second_field = self.cleaned_data['second_field']  # this method runs for second_field itself
    # ... compare first_field and second_field here ...
    return second_field

Of course you should have more error-checking logic; the examples above are simplified for illustration.

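Note that the higher-level, form-wide clean() method is called only after all the clean_xxx() methods have run, so by then cleaned_data contains every field that passed validation. That makes it the usual place for checks that involve several fields.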
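The ordering can be illustrated with a toy model in plain Python (3). This is not Django’s code: run_cleaning() is a hypothetical helper standing in for the form’s cleaning loop; it ignores validation errors and passes cleaned_data to each hook directly instead of via self. It records which fields each clean_xxx() hook and the form-level clean() can see.

```python
def run_cleaning(field_order, raw_data, field_hooks, form_hook=None):
    """Toy model (hypothetical, not Django code) of a form's cleaning order."""
    cleaned_data = {}
    for name in field_order:                 # fields in declaration order
        cleaned_data[name] = raw_data[name]  # stands in for the field's own clean()
        hook = field_hooks.get(name)         # the form's clean_<name>(), if defined
        if hook is not None:
            cleaned_data[name] = hook(cleaned_data)  # return value replaces the entry
    if form_hook is not None:
        form_hook(cleaned_data)              # form-level clean() runs after all fields
    return cleaned_data


visible = {}  # which keys each hook could see in cleaned_data


def make_field_hook(name):
    def hook(cleaned_data):
        visible[name] = sorted(cleaned_data)
        return cleaned_data[name]
    return hook


def form_clean(cleaned_data):
    visible['clean'] = sorted(cleaned_data)


result = run_cleaning(
    ['first_field', 'second_field'],
    {'first_field': 'a', 'second_field': 'b'},
    {'first_field': make_field_hook('first_field'),
     'second_field': make_field_hook('second_field')},
    form_hook=form_clean,
)
print(visible)
# {'first_field': ['first_field'], 'second_field': ['first_field', 'second_field'], 'clean': ['first_field', 'second_field']}
```

In this model, a first_field hook that reads cleaned_data['second_field'] raises KeyError, the same symptom as in the real form.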

Working with unicode in Python (again)

It has been quite some time since the last post, but this doesn’t mean I haven’t done anything interesting :). It’s just that there was so much interesting stuff going on that I didn’t have any time to write.

Anyway, this time I (again) stumbled upon a Unicode problem in some Python code that was supposed to handle Unicode perfectly well, since it even started with:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

It ran perfectly fine in Eclipse, but to my huge surprise I got errors when running the unit tests from the command line in a terminal. Whaaat? It just worked!

Well, declaring your source as UTF-8 is not enough, of course: that declaration only tells Python how to decode the source file itself, not how to treat your strings at run time. There are several things to check when you get “UnicodeDecodeError: 'ascii' codec can't decode byte … in position …: ordinal not in range(128)” kinds of errors. To my surprise, Googling around didn’t bring me much luck, so here are my findings for next time :).
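One typical trigger in Python 2 is mixing a byte string that contains non-ASCII bytes with a unicode string: Python then implicitly decodes the bytes with the default ASCII codec. A minimal Python 3 sketch that performs that decode explicitly reproduces the same message (the Cyrillic sample word is just an example):

```python
# UTF-8 bytes of a Cyrillic word; the first byte is 0xd0, which is outside ASCII
utf8_bytes = u'привет'.encode('utf-8')

try:
    utf8_bytes.decode('ascii')  # what Python 2 does implicitly when mixing str and unicode
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
```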

First of all, make absolutely sure you haven’t forgotten the u prefix on string literals that contain non-ASCII text. Yep, just like that you screw up the rest of the Unicode support: Python (OK, I admit, I use 2.5.4) treats a plain 'string' literal as a regular byte string, not as unicode. So write u'string' instead! (In Python 3 every plain string literal is already Unicode.)
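In Python 2, a plain literal in a UTF-8 source file holds the encoded bytes, not characters. A short Python 3 sketch, where those bytes are spelled out as a bytes object, shows why that matters: length and slicing work on bytes, so a Cyrillic word “grows” and can be cut in the middle of a character.

```python
text = u'привет'              # 6 code points (what u'...' gives you in Python 2)
raw = text.encode('utf-8')    # 12 bytes (what a plain '...' literal holds in a UTF-8 Python 2 file)

print(len(text), len(raw))    # 6 12
print(raw[:1])                # b'\xd0' -- only half of the first character
```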

Second, when doing file operations, don’t forget that you don’t get Unicode by default: files contain bytes. Consider the following:

message = u'unicode message'

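Well, guess what. As soon as the message contains non-ASCII characters, you get a problem when writing it to a file: Python 2 tries to convert it with the default ASCII codec and fails. So the solution is to encode it explicitly before writing, something like this: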

encoded_message = message.encode('utf-8')  # unicode -> UTF-8 byte string, safe to write

But that’s only half of the problem. At some point you will be reading this data back, and most probably you would like to get your beloved unicode thingy back. Just doing the following will hardly help, since readline() gives you the raw UTF-8 bytes as a plain str:

file_handle = open(full_name, 'r')
line = file_handle.readline()

The following will save your day:

file_handle = open(full_name, 'r')
line = file_handle.readline().decode('utf-8')
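Put together, the write/read round trip can be sketched as follows. This is Python 3, where the file has to be opened in binary mode ('wb'/'rb') to get the same byte-oriented behavior that plain 'w'/'r' gives in Python 2; the file location and sample text are assumptions.

```python
import os
import tempfile

message = u'unicode message: привет'
full_name = os.path.join(tempfile.mkdtemp(), 'message.txt')  # hypothetical location

# Writing: encode unicode -> UTF-8 bytes, then write the bytes
with open(full_name, 'wb') as file_handle:
    file_handle.write(message.encode('utf-8') + b'\n')

# Reading: readline() returns bytes; decode them to get unicode back
with open(full_name, 'rb') as file_handle:
    line = file_handle.readline().decode('utf-8')

print(line.rstrip('\n') == message)  # True
```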

Voilà. I hope this saves somebody some frustration, even if that somebody is me a few months later :).


Comments from Andy (thanks!):

There is probably nothing new for you in what I am saying below; however, in my experience it covers 99% of Unicode-related errors.

A Unicode string is a sequence of code points in the range 0 to 0x10FFFF. An encoding is a way of serializing this sequence so that it can be represented in memory, written to a file, sent over a socket, etc.
Encoding a Unicode string is needed _at least_ because its ‘as is’ byte representation is not portable due to byte-order issues.
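The byte-order point is easy to see with a single code point. In this Python 3 sketch, the two UTF-16 variants serialize U+20AC (EURO SIGN) with the bytes in opposite order, plain 'utf-16' prepends a byte-order mark so a reader can tell which order was used, and UTF-8 has no byte-order question at all:

```python
euro = u'\u20ac'  # EURO SIGN: one code point

little = euro.encode('utf-16-le')  # bytes AC 20: low byte first
big = euro.encode('utf-16-be')     # bytes 20 AC: high byte first
with_bom = euro.encode('utf-16')   # BOM (FF FE or FE FF, platform byte order) + 2 bytes
utf8 = euro.encode('utf-8')        # bytes E2 82 AC on every platform

print(little.hex(), big.hex(), utf8.hex())  # ac20 20ac e282ac
```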

It is _recommended_ that you work with Unicode strings internally, provided the language/API supports Unicode.
It is a _must_ that you encode strings that are to be consumed by another program.
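A minimal sketch of that rule of thumb (often called the “Unicode sandwich”): decode at the input boundary, work with unicode inside, encode at the output boundary. It uses io.open() with an explicit encoding, a swapped-in alternative to the manual encode()/decode() calls above that is available in Python 2.6+ and 3; shout_file() and the temporary files are hypothetical.

```python
import io
import os
import tempfile


def shout_file(src, dst):
    """Hypothetical example: uppercase a UTF-8 text file."""
    with io.open(src, 'r', encoding='utf-8') as f:  # input boundary: bytes -> unicode
        text = f.read()
    result = text.upper()                            # inside: work with unicode only
    with io.open(dst, 'w', encoding='utf-8') as f:  # output boundary: unicode -> bytes
        f.write(result)


folder = tempfile.mkdtemp()
src = os.path.join(folder, 'in.txt')
dst = os.path.join(folder, 'out.txt')
with io.open(src, 'w', encoding='utf-8') as f:
    f.write(u'привет, мир\n')
shout_file(src, dst)
```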