A catch with the clean_xxx methods in Django forms

As I am getting more into Django stuff, I also seem to step into rookie mistakes. Here are some (beginner) notes.

To validate a particular field in Django, you can define a clean_<fieldname>() method on the form class, e.g.

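from django import forms

class MyForm(forms.Form):
    first_field = forms.CharField()
    second_field = forms.CharField()

    def clean_first_field(self):
        first_field = self.cleaned_data['first_field']
        # ... validate first_field here, raise forms.ValidationError if it is wrong ...
        return first_field  # clean_xxx() must return the (cleaned) value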

Well, so far so good. But there is a catch which cost me another sleepless hour last night. When clean_first_field() is called, the ‘self.cleaned_data’ dictionary does not contain ‘second_field’ yet!!

Try the following:

def clean_first_field(self):
    first_field = self.cleaned_data['first_field']
    second_field = self.cleaned_data['second_field'] # This will result in a KeyError exception!

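Apparently Django cleans the fields in the order in which they are declared on the form, and calls each clean_xxx() method right after the corresponding field itself has been cleaned. So ‘self.cleaned_data’ only contains the values UP TO AND INCLUDING the field being ‘cleaned’ (and only those that passed validation), but NOT the fields declared AFTER it. Watta…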

This means that if you want to check e.g. first_field and second_field against each other (yes, there are other ways of doing that as well, but when you’re only so far through the Django book you don’t know about them :)), you have to do the check in the clean_xxx() method of the LAST field involved, where the values of that field and of all PREVIOUSLY declared fields are available. So the code becomes:

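def clean_second_field(self):
    first_field = self.cleaned_data.get('first_field') # This is OK, first_field is declared before second_field (None if it failed validation)
    second_field = self.cleaned_data['second_field'] # As this is called for second_field you can get its value as well.
    # ... compare first_field and second_field here ...
    return second_field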

Of course you should have more error-checking logic; the examples above are simplified for illustration.

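Note that the higher-level, form-wide clean() method is called only after all fields and their clean_xxx() methods have been processed, so by then cleaned_data contains every field that passed validation.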

Working with unicode in Python (again)

It has been quite some time since the last post, but this does not mean I haven’t done anything interesting :). It is just that it was so interesting that I didn’t have any time to write anything.

Anyway, this time I stumbled (again) upon a unicode problem in some Python code which was supposed to handle unicode perfectly well, since it even started with


#!/usr/bin/env python
# -*- coding: UTF-8 -*-

It went perfectly fine when running in Eclipse, but to my huge surprise I got problems when running the unit tests from the command line in a terminal. Whaaat? It just worked!

Well, declaring your source as UTF-8 is of course not enough: that declaration only tells Python how to read the source file itself. There are several things to check when you get “UnicodeDecodeError: ‘ascii’ codec can’t decode byte … in position …: ordinal not in range(128)”-kind of errors. To my surprise, googling around didn’t bring me much luck, so here are my findings for the next time :).

First of all, make absolutely sure you haven’t forgotten the ‘u’ prefix before string literals that contain non-ASCII text. Yep, just like that you screw up the rest of the unicode support. Python 2 (ok, I admit, I use 2.5.4) treats 'string' as a regular byte string (str), not as a unicode object. So write u'string' instead!

Second, when doing file operations, don’t forget that you don’t get unicode handling by default. Consider the following:


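message = u'unicode message: café'
file_handle = open(full_name, 'w')
file_handle.write(message)  # raises UnicodeEncodeError: the 'ascii' codec can't encode 'é'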

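Well, guess what. As soon as the message contains non-ASCII characters, you get a UnicodeEncodeError when writing it away: Python 2 silently tries to encode the unicode object with the default ‘ascii’ codec. So the solution is to encode the message explicitly: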


encoded_message = message.encode('utf-8')  # unicode -> UTF-8 encoded byte string
file_handle.write(encoded_message)
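For comparison, the same idea in Python 3 (not the Python 2.5 used in this post), where str is always unicode and encoded data is a separate bytes type. This is only a sketch: the temporary file behind full_name exists just to make it runnable anywhere.

```python
import os
import tempfile

message = 'unicode message: caf\u00e9'  # str is unicode in Python 3; ends in 'é'

# Hypothetical location, only so the example runs anywhere.
full_name = os.path.join(tempfile.mkdtemp(), 'message.txt')

# Encode explicitly and write the resulting bytes in binary mode ('wb').
encoded_message = message.encode('utf-8')
with open(full_name, 'wb') as file_handle:
    file_handle.write(encoded_message)

with open(full_name, 'rb') as file_handle:
    raw = file_handle.read()
print(raw)  # b'unicode message: caf\xc3\xa9' ('é' became two bytes)
```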

But that’s only half of the problem. At some point you will be reading this data back, and most probably you would like to get your beloved unicode thingy back. Just doing the following will hardly help, because readline() returns the raw UTF-8 bytes as a plain str:


file_handle = open(full_name, 'r')
line = file_handle.readline()

The following will save your day:


file_handle = open(full_name, 'r')
line = file_handle.readline().decode('utf-8')
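Reading back in Python 3 can be done the same way, by decoding the bytes explicitly, or by passing encoding= to open() so the file object decodes for you (in Python 2, codecs.open(full_name, 'r', 'utf-8') plays a similar role). A sketch, with a temporary file standing in for the real one:

```python
import os
import tempfile

# Hypothetical file prepared with UTF-8 content, only so the example runs.
full_name = os.path.join(tempfile.mkdtemp(), 'message.txt')
with open(full_name, 'wb') as file_handle:
    file_handle.write('unicode message: caf\u00e9\n'.encode('utf-8'))

# Option 1: read bytes and decode them explicitly.
with open(full_name, 'rb') as file_handle:
    line = file_handle.readline().decode('utf-8')

# Option 2: let the file object decode, with an explicit encoding.
with open(full_name, 'r', encoding='utf-8') as file_handle:
    line2 = file_handle.readline()

print(line == line2)  # True
```

Passing the encoding explicitly is worth it: without it, Python 3's open() falls back to a locale-dependent default, so the same code can behave differently depending on the environment it runs in.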

Voilà. I hope this saves somebody some frustration, even if that somebody is me a few months from now :).

Enjoy!


Comments from Andy (thanks!):

Probably there is nothing new for you in what I am saying below; however, in my experience, it covers 99% of unicode-related errors.

A unicode string is a sequence of code points in the range 0 to 0x10FFFF. An encoding is a way of serializing this sequence into bytes, so that it can be stored in memory, written to a file, sent over a socket etc.
Encoding a unicode string is needed _at least_ because its ‘as is’ in-memory byte representation is not portable, e.g. due to byte order issues.
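To make the byte-order point concrete, here is a small Python 3 sketch (an illustration, not part of Andy's comment): the same four code points serialized as UTF-8 and as little- and big-endian UTF-16.

```python
text = 'caf\u00e9'  # 4 code points: c, a, f, é (U+00E9)

# The same code points give different bytes under different encodings.
utf8 = text.encode('utf-8')
utf16_le = text.encode('utf-16-le')
utf16_be = text.encode('utf-16-be')

print(len(text))  # 4
print(utf8)       # b'caf\xc3\xa9'
print(utf16_le)   # b'c\x00a\x00f\x00\xe9\x00'
print(utf16_be)   # b'\x00c\x00a\x00f\x00\xe9'

# Decoding with the matching encoding restores the original string;
# reading little-endian bytes as big-endian silently yields other characters.
print(utf16_le.decode('utf-16-le') == text)  # True
print(utf16_le.decode('utf-16-be') == text)  # False
```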

It is _recommended_ that you work with unicode strings internally, provided the language/API supports unicode.
It is a _must_ that you encode strings that will be consumed by another program.
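The "unicode internally, encode at the boundaries" advice could be sketched in Python 3 like this; process() and its upper-casing step are hypothetical stand-ins for real work.

```python
def process(raw: bytes, encoding: str = 'utf-8') -> bytes:
    # Decode once, where data enters the program.
    text = raw.decode(encoding)
    # Work with unicode text internally (hypothetical processing step).
    result = text.upper()
    # Encode once, where data leaves for another program.
    return result.encode(encoding)


print(process('caf\u00e9'.encode('utf-8')))  # b'CAF\xc3\x89'
```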