C++11 vs Python – not quite there yet

It is not exactly a fair comparison, but here goes. While rereading the C++ Primer to refresh my C++11 knowledge over the Christmas break (OK, after the kids went to sleep), I came across the chapter 16 example that illustrates the new C++11 features (lambdas), strings and STL usage. It does nothing more than count words in a text file. This is a rather classic example, and I happen to use it in the Python course I teach my colleagues. Of course I couldn't resist comparing C++11 and Python, since C++11 is trying to become more high-level and closer to languages like Python. Here is what I got.

What it does

It reads words from a text file (hard-coded file name, basic error handling provided) and counts the number of occurrences of each word in the file, ignoring upper/lower case. For brevity, a word is defined as anything delimited by whitespace, so “end” and “end.” count as different words. So be it.
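A minimal sketch of this word definition, assuming "whitespace" means what Python's `str.split()` with no arguments splits on (any run of spaces, tabs or newlines). The helper `count_words` is hypothetical: it lowercases the text, splits it, and tallies each token, which is exactly why “end” and “end.” end up as separate entries.

```python
def count_words(text):
    """Hypothetical helper: count whitespace-delimited, case-folded words."""
    counts = {}
    # split() with no arguments splits on any run of whitespace
    # and never yields empty strings.
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("The end.\nthe END")
# counts["the"] == 2, counts["end."] == 1, counts["end"] == 1
```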

Quick and dirty summary

C++11 does the job in a whopping 54 effective lines of code (ELOC), or 42 if lines containing only a single curly brace are not counted. Readability could be better, and simple constructs seem to take quite some time to get right.

Python does it in just 17 ELOC. The code is straightforward and clean.
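The exact ELOC rules behind these figures are not spelled out. One plausible rule, and it is only an assumption, is to skip blank lines and comment-only lines, and optionally lines holding a lone curly brace. A sketch of such a counter (`count_eloc` is a hypothetical name; it recognizes only whole-line comments, not block comments or trailing comments):

```python
def count_eloc(source, comment_prefix, skip_lone_braces=False):
    """Hypothetical ELOC counter: non-blank lines that are not pure comments.

    comment_prefix is language specific ("//" for C++, "#" for Python);
    a tuple of prefixes also works, since str.startswith accepts one.
    """
    eloc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefix):
            continue  # comment-only line
        if skip_lone_braces and stripped in ("{", "}"):
            continue  # line holding a single curly brace
        eloc += 1
    return eloc


cpp_snippet = "// demo\nint main()\n{\n    return 0;\n}\n"
print(count_eloc(cpp_snippet, "//"))                         # 4
print(count_eloc(cpp_snippet, "//", skip_lone_braces=True))  # 2
```

A rule of this kind would account for the two C++ figures above, though the precise numbers depend on how the author actually counted.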


Here is what it looks like

C++11 code

The following code was compiled with GNU c++ 4.6.3-1ubuntu5 using the following command:
# c++ word_count.cpp -o word_count -std=c++0x

// Counts words in a text file in C++ (using C++11)

#include <iostream>
#include <iomanip>   // for 'setw'
#include <fstream>
#include <string>
#include <vector>
#include <set>
#include <map>
#include <algorithm> // for 'transform', 'count', 'for_each'
#include <cctype>    // for 'tolower'
#include <cstdlib>   // for 'EXIT_FAILURE'

using namespace std;

// Wrapper needed because tolower takes and returns an int (and is
// overloaded in <locale>), so it cannot be handed to transform as is.
// The cast to unsigned char avoids undefined behavior for negative chars.
char to_lower2(char ch) { return tolower(static_cast<unsigned char>(ch)); }
string & to_lower(string & st);

int main()
{
    const char * file_name = "word_count.txt";
    ifstream file_in(file_name);

    if (!file_in.is_open())
    {
        cerr << "Cannot open file '" << file_name << "'. Aborting.\n";
        return EXIT_FAILURE;
    }

    // Read words into vector, converting each one to lowercase.
    vector<string> words;
    string item;

    while (file_in >> item)
        words.push_back(to_lower(item));

    cout << "Words: \n";
    for_each(words.begin(), words.end(),
             [](const string & word) { cout << word << " "; } );

    // Put all (already lowercase) words in a set to get the distinct ones.
    set<string> words_set(words.begin(), words.end());

    // Perform actual counting: scan the vector once per distinct word.
    map<string, int> words_map;
    set<string>::iterator si;
    for (si = words_set.begin(); si != words_set.end(); ++si)
        words_map[*si] = count(words.begin(), words.end(), *si);

    // Report results (std::set iterates in sorted order).
    cout << "\n\nOccurrences:\n";
    for (si = words_set.begin(); si != words_set.end(); ++si)
        cout << setw(16) << left << *si << ": " << words_map[*si] << endl;

    return 0;
}

string & to_lower(string & st)
{
    transform(st.begin(), st.end(), st.begin(), to_lower2);
    return st;
}
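The C++ version counts by taking each distinct word from the set and scanning the whole vector with `std::count`, so the work grows with (distinct words) × (total words). The Python version below makes a single pass with a dictionary instead. A small Python sketch contrasting the two strategies (both function names are hypothetical) and checking that they agree:

```python
def count_per_distinct(words):
    """Mirrors the C++ approach: one full scan of `words` per distinct word."""
    counts = {}
    for w in sorted(set(words)):    # like std::set: distinct and sorted
        counts[w] = words.count(w)  # like std::count: a linear scan
    return counts


def count_single_pass(words):
    """Single pass: bump a running tally for each word as it is seen."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


words = "the cat saw the other cat".split()
print(count_single_pass(words) == count_per_distinct(words))  # True
```

The single-pass idea carries over to C++ as `++words_map[word]` for each word, because `std::map::operator[]` value-initializes a missing `int` to 0.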

Python code
The following code was run using Python 2.7.3.

#!/usr/bin/env python

# Counts words in a text file in Python.

words = []
words_count = {}
file_name = 'word_count.txt'

try:
    with open(file_name) as file_in:
        # Get the words, lowercased and split on whitespace.
        for line in file_in:
            words += line.lower().split()
        print "Words:"
        for word in words:
            print "%s" % word,
        # Count the words.
        for word in words:
            words_count[word] = words_count.get(word, 0) + 1
    print "\n\nOccurrences:"
    for word, count in words_count.iteritems():
        print "%-16s: %s" % (word, count)

except IOError:
    print "Failed to open file '%s'. Aborting." % file_name
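One visible difference in output: `std::set` and `std::map` iterate in sorted key order, while a Python 2.7 dict iterates in arbitrary order, so the Python "Occurrences" list comes out unsorted. A sketch, assuming sorted output is wanted, that uses `collections.Counter` (in the standard library since Python 2.7) and `sorted()`; `format_report` is a hypothetical helper reproducing the `"%-16s: %s"` layout:

```python
from collections import Counter


def format_report(words):
    """Hypothetical helper: sorted 'word: count' lines, like the C++ output."""
    counts = Counter(w.lower() for w in words)
    return ["%-16s: %s" % (word, counts[word]) for word in sorted(counts)]


for line in format_report("the End the end".split()):
    print(line)
```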

Some (biased) conclusions
Although this is not a fair comparison, C++ still feels clumsy for simple string and file manipulation. It improves by providing proper containers and lambda support (and hey, regular expressions are also in the box!), but it still feels rather unwieldy and loses to something like Python on readability. Too much chatty boilerplate has to be written for simple manipulations.
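On the regular-expression remark: the whitespace rule used here makes “end” and “end.” different words. If that were not wanted, a regular expression could pick out words directly. A sketch in Python, where the pattern is purely an assumption (a run of word characters, optionally joined by apostrophes as in "don't") and `regex_words` is a hypothetical helper:

```python
import re

# Assumption: a "word" is a run of word characters (letters, digits,
# underscore), optionally joined by apostrophes, e.g. "don't".
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def regex_words(text):
    """Hypothetical helper: lowercase the text and extract regex words."""
    return WORD_RE.findall(text.lower())


print(regex_words("The end. THE END!"))  # ['the', 'end', 'the', 'end']
```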