Rob Knight's blog

A blog

Drupal on Docker

At the time of writing, the current version of Docker is 0.3.2 – future versions may change some of the details in this post, and I’ll try to keep it up-to-date when this happens.

Docker is a new open source tool from dotCloud, which simplifies and improves the process of creating and managing Linux containers (LXC). If you’re unfamiliar with Linux containers, the easiest way to think of them is as extremely lightweight VMs – they allow Linux operating systems to host other copies of Linux, safely sharing access to resources without the overhead of something like Xen or VirtualBox. This is useful because there are many cases where full-blown VMs are not necessary, and performance and flexibility can be gained by using containers instead. (On the negative side, containers are Linux-only; both the host and guest operating systems must be Linux-based).

This would be great if containers were easy to use. Unfortunately, they’re not; at least, not as easy to use as they could be. Using containers is a bit like trying to use git with only low-level commands like update-index and read-tree, without familiar tools like add, commit and merge. Docker provides that layer of “porcelain” for LXC, turning containers into something that is much easier to think about, manipulate and distribute.

To get started, we need an environment to run Docker in. If you already run Ubuntu as your primary OS then this process is probably unnecessary and you can skip straight to installing Docker. For everyone else (including me), the easiest solution is to install an Ubuntu VM (LXC requires modern Linux kernels for host systems, and Ubuntu has the best tooling at present, though Arch is also supported). To get this running, install Vagrant (documentation here) and create a Vagrantfile as follows:

Vagrant::Config.run do |config|
  config.vm.box = "raring"
  config.vm.box_url = "http://cloud-images.ubuntu.com/raring/current/raring-server-cloudimg-vagrant-amd64-disk1.box"
  config.vm.forward_port 80, 8880
  config.vm.share_folder("v-root", "/vagrant", ".")
  config.vm.customize ["modifyvm", :id, "--memory", 512]
end

This should get you a VM running Ubuntu 13.04. Run vagrant up and your new Ubuntu environment will download and boot; you can then use vagrant ssh to log in. From there, follow the Docker installation instructions. Once installed, you can run docker commands (enter docker at the command line to get a list of available commands).
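Condensed into a console session, the bootstrap looks roughly like this (a sketch only; the Docker installation itself is covered by the instructions linked above):

$ vagrant up       # downloads the Ubuntu box on the first run, then boots the VM
$ vagrant ssh      # log in to the new Ubuntu guest
$ docker           # once Docker is installed, this prints the list of available commands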

Docker Basics

We’re now ready to start doing interesting stuff.

Before we can go any further, we need a base image for our guest operating system. This has to be a Linux distro (non-Linux operating systems are not supported) but it doesn’t have to be the same version as the host, or even the same distro. In this example, let’s pull a CentOS guest image:

$ docker pull centos

This pulls the image from the Docker repository (more about this later). Let’s start a container based on this image:

$ docker run -i -t centos /bin/bash

You should now be logged in to your container as root. From the shell prompt, we can do any of the things we’d normally do, such as installing applications. Remember, however, that the CentOS base image we downloaded is very minimal, containing only the bare essentials. We’ll probably want to install some additional packages:

$ yum install ruby-irb
# ...yum output...
$ irb
irb(main):001:0> puts "Hello, docker"
Hello, docker
=> nil
irb(main):002:0>

So far, so good. Now let’s quit the bash prompt and see what happens:

$ exit

What happened to our container? Run docker ps to get a list of running containers and it shows nothing.

$ docker ps
ID          IMAGE       COMMAND     CREATED     STATUS      COMMENT     PORTS

Run docker ps -a (“a” for “all”) and you can see your container, along with an exit code indicating that it’s no longer running.

$ docker ps -a
ID             IMAGE        COMMAND      CREATED          STATUS      COMMENT     PORTS
969373734016   centos:6.4   /bin/bash    2 minutes ago    Exit 0

Unlike VMs, containers don’t need to boot up or shut down a whole OS. Unless you’re running a process in it, your container isn’t taking up any resources apart from some disk space. Once our bash process has finished, the container stops.

What about the changes made to the filesystem – the Ruby package we installed, for instance? That’s still there, and will remain until the container is deleted (using docker rm). We can get back to our container by “restarting” it – this takes whatever command you ran originally (in our case, /bin/bash) and runs it again. docker restart will start your container in the background, so you need to run docker attach to start interacting with it. Voila, our bash prompt returns!
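Put together, the stop-and-restart cycle looks like this, using the container ID reported by docker ps -a (yours will differ):

$ docker ps -a                   # find the ID of the stopped container
$ docker restart 969373734016    # re-runs the original command (/bin/bash) in the background
$ docker attach 969373734016     # attach to the running container; the bash prompt returns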

So far, we’ve seen how to download a base image, start a container, make some changes in it, exit, and restart. What about creating images? If we always have to start from a minimal base image then we’re going to waste a lot of time installing things. Let’s say that you want to use docker for testing LAMP-stack applications – it would be really useful to have a base image that included Apache, PHP and MySQL. Fortunately, that’s very easy to do. Let’s start a new container:

$ docker run -i -t centos /bin/bash

And, inside our container, install our key dependencies:

$ yum install httpd php php-common php-cli php-pdo php-mysql php-xml php-mbstring mysql mysql-server

After these packages have installed, we can exit from the container and use a new docker command – commit:

$ exit
$ docker ps -a
ID             IMAGE        COMMAND      CREATED          STATUS      COMMENT     PORTS
1ae6304e2514   centos:6.4   /bin/bash    5 minutes ago    Exit 0
$ docker commit 1ae6304e2514 LAMP
bd2f18527e54

What this does is save our container as a new image, which can be used as the starting point for new containers in future. The number which is printed after docker commit is the ID of our new image. We can see it in the list of images:

$ docker images
REPOSITORY          TAG                 ID                  CREATED
centos              6.4                 539c0211cd76        5 weeks ago
centos              latest              539c0211cd76        5 weeks ago
LAMP                latest              bd2f18527e54        9 seconds ago

So, we’ve got a new base image, called LAMP, which comes with PHP, MySQL and Apache pre-installed. Let’s start a container with it:

$ docker run -i -t -p :80 LAMP /bin/bash
$ php -v
PHP 5.3.3 (cli) (built: Feb 22 2013 02:51:11)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

Great, PHP is installed and working. Did you see the -p :80 parameter in the docker run command? That tells docker to forward port 80 from the container to the host. This means that if we run Apache on port 80 inside the container, we’ll be able to connect to it on port 80 on the host system. If you’re running the host OS inside Vagrant then the Vagrantfile earlier in this post will forward that back to port 8880 on your main system.
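Once Apache is running inside the container (next step), you can check the forwarding from outside it. A rough example, assuming curl is available:

$ curl -I http://localhost:80      # from the Ubuntu host running Docker
$ curl -I http://localhost:8880    # from your main system, if the Docker host is the Vagrant VM above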

Within the container, start Apache:

/sbin/service httpd start

Now, hit the server (http://localhost:80 if you’re running Docker natively, http://localhost:8880 if you’re running it inside Vagrant). You should see something like this:

Apache test page

OK, this is great, but nobody wants to see the Apache test page. We need a way of deploying some code to our container. Given that we have shell access to our container, we could just download some code using wget or scp or any number of other tools. Right now, the approved way of downloading code to a Docker instance is to use docker insert (I say “right now” because insert is a recently-added command, and it’s possible that its behaviour may change). To download Drupal core:

$ docker insert LAMP http://ftp.drupal.org/files/projects/drupal-7.22.tar.gz /root/drupal.tar.gz
Downloading 3183014/3183014 (100%)
26d9b3e0438f631b2f030eedffb6b14216261c9e4ecab035e9123ebf4e3460e7

This downloads the file into our LAMP image. The long hash code which is printed to the screen is the ID of the new image which is created as a result (you can’t modify an image directly, except by creating a new image which inherits from it – think of this as being like a new git commit).

Having to use a 64-character ID every time we want to use our image would get annoying, so let’s ‘tag’ it:

$ docker tag 26d9b3e0438f631b2f030eedffb6b14216261c9e4ecab035e9123ebf4e3460e7 LAMP drupal
$ docker images
LAMP                latest              bd2f18527e54        22 hours ago
LAMP                drupal              26d9b3e0438f        9 seconds ago

Now we can fire up a container based on our new image, extract Drupal, and configure MySQL and Apache:

$ docker run -i -t -p :80 LAMP:drupal /bin/bash
# Extract Drupal to /var/www/html
$ tar zxf /root/drupal.tar.gz --strip=1 -C /var/www/html
# Slight hack needed to get MySQL to start
$ echo "NETWORKING=yes" > /etc/sysconfig/network
$ /sbin/service mysqld start
Initializing MySQL database:  Installing MySQL system tables...
OK
# MySQL will now print some messages
...
Starting mysqld:                                           [  OK  ]
# Let's create a MySQL DB
$ mysql -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.69 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database drupal;
Query OK, 1 row affected (0.00 sec)
mysql> grant all privileges on drupal.* to 'drupal'@'localhost' identified by 'drupal';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
mysql> exit
Bye
# Create the Drupal settings file and point to the DB we just created
$ cp /var/www/html/sites/default/default.settings.php /var/www/html/sites/default/settings.php
# Edit the database settings so that the DB name is 'drupal', username is 'drupal' and password is 'drupal'
$ vi /var/www/html/sites/default/settings.php
# Set up a files directory and some basic file permissions
$ mkdir /var/www/html/sites/default/files
$ chown apache:apache /var/www/html/sites/default/files
# Fire up Apache
$ /sbin/service httpd start
Starting httpd:                                            [  OK  ]

Remember when we hit the Apache test page earlier? Go back to the URL we used then and add /install.php on the end, so you get http://localhost/install.php if you’re running Docker locally, or http://localhost:8880/install.php if you’re running Docker inside Vagrant. If everything worked, you get this:

Drupal installation screen

From here, you can complete your Drupal installation. Once you’ve finished, you can commit your new image, and then you have a base image with Drupal pre-installed. The end!

Some questions

That took a long time. Can we automate some of it?

Yes! In a real scenario, logging in and typing in shell commands is a painful way to create an image. Docker supports “build” files which are simple scripts that can be used to perform steps such as running commands, importing files and exposing forwarded ports. So long as you don’t kick off any long-running processes, a build file typically takes only a few seconds to run and will create your image for you.
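I won’t cover the build file syntax in detail here, but a minimal sketch for the LAMP image above might look roughly like this (treat the directives and the build command as illustrative, since the feature is new and still evolving):

# Build file ("Dockerfile") - illustrative sketch
FROM centos
RUN yum -y install httpd php php-common php-cli php-pdo php-mysql php-xml php-mbstring mysql mysql-server
EXPOSE 80

Then, from the directory containing the build file:

$ docker build .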

Can I share my image?

Yes! The Docker repository is a place where you can push images you’ve created, and download images created by others. The Docker index provides a web front end to search for images. The image created in the above example is uploaded here.
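Pushing an image up to the repository looks roughly like this (the username “example” is hypothetical, and the exact commands may vary between Docker versions):

$ docker login                          # sign in to (or create) an account on the Docker index
$ docker tag bd2f18527e54 example/lamp  # name the LAMP image under your account
$ docker push example/lamp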

What can I actually use Docker for?

This is a big question. Right now, Docker is new and there will be use cases that nobody has thought of, or had time to investigate. But some of the obvious ideas are:

  • Deployment: If you can build an image that works on your desktop, you can run it on a server too. Because the image includes everything – a full copy of the OS and all dependencies (PHP, MySQL and Apache in our example, but it could be anything) – you don’t need to worry about which versions are running in which environments. You just need a server that can run Docker.
  • Testing: If we can script the construction of containers, and we can script Docker itself (using shell scripts, or Rakefiles, or whatever), then we could build an automated testing process using Docker to create our test environments. Say you want to know if your application works in all versions of CentOS, or works across CentOS, Ubuntu and Arch; you could have Docker build files for each distro version, and run these every time you want to create images with the latest version of your application for testing (see the sketch after this list).
  • Isolation: Because processes inside the container are isolated from the host, it’s a great way of running code safely. If your application requires several processes to run, you could put each one inside a different container. The containers could even be running different distributions, meaning that your battle-tested enterprise service can run in a RHEL 6 container and your bleeding-edge NodeJS application can run in an Ubuntu 13.04 container.
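Here is a rough sketch of that multi-distro testing idea, assuming one build file per distro kept in per-distro directories (the layout and names are hypothetical, and the exact build invocation may differ):

# assumes build/centos, build/ubuntu and build/arch each contain a Docker build file
for distro in centos ubuntu arch; do
  (cd build/$distro && docker build .)
done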

Ultimately, Docker could change how we see “applications”, at least ones that are deployed on servers in the cloud. Instead of your application being a thin layer of code that sits on top of multiple layers of services over which you have little control, you could package everything in a known working combination, and ship that instead.

Explaining Architectural Tiers in Drupal

Like most web frameworks, Drupal has three easily-identifiable architectural layers:

  • Data, including the DB abstraction layer and some contrib modules such as Views
  • Logic, consisting mostly of module code
  • Presentation, represented by the theme layer

When thinking about Drupal, it’s therefore pretty easy to apply some familiar (outside of Drupal-world) concepts and principles. And the three-tier architecture is popular precisely because it’s a good model that is widely applicable. The three-tier approach can easily become the MVC model, which most other web frameworks explicitly adopt as their fundamental architecture, though the variation in quality of these “MVC” frameworks suggests that they don’t always adopt it correctly.

So, at a trivial level, it’s possible to describe Drupal as having a three-tier architecture. But there are two important things that this fails to capture:

The Presentation-Abstraction-Control model (or Hierarchical MVC)

Drupal doesn’t just have a “flat” MVC model: it has a “hierarchical” MVC model more similar to the PAC model (I apologise for the use of these buzzwords, but I think it’s better to try to use existing terms that some people might be familiar with than to create new ones). In the PAC model, different components have their own Presentation-Abstraction-Control triads, and these form part of the overall output. This contrasts with a “flat” MVC model which involves a controller manipulating a model in order to create some kind of data structure which is then handed over to the view for complete rendering. Below is a conceptual diagram of a block within a page, which contains a view, which in turn contains two nodes. Each is rendered separately, and the output is incorporated within the output of its container.

Hierarchical MVC, showing a page containing a block, which contains a view, which contains two nodes

As an example, consider a Drupal page which contains some blocks. Each block is rendered separately, with its own model (data), view (theme) and controller (module implementation of hook_block()). The block returns its finished product to the page, which simply incorporates it. Under flat MVC, the block might return a data structure that the higher-level view must transform into HTML.

So, whilst there are three layers in each of these processes, what we actually have is a series of decoupled MVC triads. Many MVC frameworks have to invent new concepts in order to squeeze this idea into their otherwise “flat” MVC implementation (e.g. Action Helpers in Zend Framework).

Modules are Vertical, not Horizontal

A module actually represents a vertical “slice” of the three-layer framework. A module can define a data schema, provide business logic, and define and implement theme functions necessary to present the data and logic to the user. Modules (or groups of tightly-related modules) provide whole “free-standing” units of functionality. They also (often) provide APIs allowing other modules to access that functionality, and plug their own behaviours into it.

Image showing how Drupal modules consist of a vertical slice of the three-tier architecture

This can break some assumptions about three-layer architectures. For example, one assumption might be that any part of the logic layer can talk directly to any part of the data layer. But if we already have some logic for accessing that part of the data layer (an API implemented by a module) then it’s really important that this always be used – we should never update, say, the node table directly with a SQL query when node_save() does a much better job and ensures that other modules get a chance to react to your changes via hook_nodeapi() (or its Drupal 7 replacements).

In “Domain Driven Design” terminology, we might say that each module represents a “bounded context”, an area of your application which should only be accessed via exposed interfaces rather than, say, direct mucking about with the data layer. In Drupal terms, this means that we only make changes to a module’s data using the APIs provided by that module itself. Direct changes, made either by other modules or by external processes, can result in inconsistent behaviour, because the module won’t even know that they have happened.

If treating modules as vertical “slices” of an application, and banning direct updates to the data owned by each “slice”, sounds like a restriction with no real gain, consider the benefits. Drupal sites are complex, interwoven lattices of data and functionality. Well-written modules expose a lot of their internal behaviour to the rest of the system, making it easy for other modules to rely on the original module’s behaviour. So we can write a module that always performs a certain operation whenever some other module performs a data update (say, something that emails a list whenever a piece of content is updated). This relies on the module providing hooks for other modules to react to, which is standard practice in well-written modules. It also relies on the module’s API always being used. In our previous example, if someone changes the content by writing directly to the DB, no hooks will be fired and our email will not be sent. The whole architecture is undermined, and it becomes impossible to build our new functionality. The cost of using the correct APIs is more than outweighed by the ease with which functionality can be added in future developments.

Neither of the two major concepts I’ve outlined above is really explained by referring to Drupal as a three-tier architecture. Although it’s a true description of Drupal as a “framework”, it’s not a sufficient description of how applications built with Drupal really operate. Part of the challenge of communicating how Drupal works to a non-Drupal audience is explaining the real benefits of following the basic patterns of Drupal implementation as a matter of principle, without taking short-cuts to “easier” solutions that undermine much of the existing functionality, never mind future developments.

Google: The IE6 of Search

It seems my prediction that 2011 might herald some innovation in the web search field is not as far-fetched as I originally assumed. This post on TechCrunch contains a lengthy list of Google’s flaws, and it chimes with my experience of using Google too. This provides supporting evidence to the thesis that Google search is stagnating; it isn’t keeping up with the increase in both breadth and depth of content on the web, and is failing to stay ahead of the efforts of spammers and “content mills” who profit from filling Google’s results with their own low-quality links.

Google is often described as a near-monopoly in search, but Google is a multi-sided platform which also places it in the role of near-monopsony consumer of web content. To put it another way, most websites now aim to make their content appealing to Google’s search algorithms and largely ignore the rest. “SEO” has become near-synonymous with “getting a good Google rank”.

We’re now so used to thinking of Google as the only way to find things on the web that when we fail to find something with Google, we’re apt to think that this must be because the thing we’re searching for simply doesn’t exist, or it’s our fault for being unable to craft the right combination of keywords necessary to coax the genie from the bottle. We don’t blame Google, we blame the web itself.

Alternatives exist, but adopting them requires conscious effort. The two main non-IE browsers – Firefox and Chrome – both have Google as their default search engine. Google is the default on iPhones and, unsurprisingly, Android devices too. In fact, Google’s attempts at gaining substantial market share in the browser and mobile device marketplaces can be seen partly as an attempt to ensure that their search engine remains the default choice. Mozilla is paid handsomely to ensure that Google remains a central part of the Firefox user experience.

If we have accounts on any of Google’s plethora of free services (and there are hundreds of millions of such users) then we’re always signed in to our Google accounts; Google gets some valuable tracking data from us, and gives us personalised search results in return.

Does this remind you of anything? Looked at a certain way, it’s not too different from the situation that existed around 5 years ago with IE6. Like Google, IE quickly came to be seen as a near-monopoly; web developers treated IE6’s behaviour as the standard to adhere to; sites that didn’t work in IE6 were regarded as “wrong” even if they were standards-compliant, and if sites performed poorly few non-techies would have blamed the browser software; alternatives existed, but were difficult to obtain (at least until March 2010), and Microsoft used all of their commercial muscle to ensure that millions of people encountered IE as the default option; bundling of IE with other Microsoft products, or the “enhancement” of Microsoft web properties when using IE pushed people towards the browser. All of these explain how a powerful incumbent can gain a dominant position in the market and, by using their strength in adjacent markets, and relying on the dependable habits of ordinary users, they can maintain that dominance even when the quality of the product stagnates or declines.

Of course, in theory, Google’s grip on the search marketplace should not be as strong as Microsoft’s grip on the browser and OS markets was. Changing your search engine is easier than changing your OS or your browser. But the psychological grip that Google has on its users may be greater than the grip IE6 had. We rely on Google because we trust their search results. By and large, Google might send us to some pretty dull websites that don’t contain what we want, but it’s unlikely to send us to something positively dangerous, containing viruses or phishing scams (though clearly this is not always the case). We trust Google to guide us along safe passageways across the web, and we might not trust a new search engine in the same way. Even if Google is under-performing, the psychological benefit of the trusted brand may outweigh this in the minds of users.

In my opinion, Google’s dominance of search cannot be sustained forever. The arms race against the spammers makes Google’s job very difficult; for every technical improvement they make to their search algorithms, the spammers fight back. And the spammers are more varied and agile; Google is a monolith in comparison. Meanwhile, newer search engines such as DuckDuckGo and Blekko are much less troubled by spam simply because nobody is really targeting their platforms. They have the same advantages that Macs have over Windows when it comes to viruses: there are very few Mac viruses because there’s much less to gain by writing them. It seems highly plausible that alternative search engines will be able to offer noticeably better results than Google does, and that this gap may grow. Over time, the pressure to switch away from Google may become significant. Right now, it’s just the early adopters who are thinking about this, but these are the same kinds of people who were using Opera or Firefox 1.0 when IE6 was at the peak of its market share. It will probably take most of a decade for anyone to overturn Google’s entrenched position in the market, but right now it feels like the opportunity to start the process has opened up.

The Year Ahead: 2011

As is traditional at this time of year, thoughts turn to the year ahead; the days are getting longer, and new possibilities seem slightly less remote than they did before. For people in the tech industry, the beginning of a new year is a traditional opportunity to make some predictions about innovations likely to happen in the next 12 months, and I’m no different.

2010

Ahead of my thoughts on the year ahead, I’d like to reflect on 2010. For me, 2010 has been a good year, in which I’ve moved to London, set up in business and have gained some great experiences working with some very smart people on bigger and better projects than I’ve worked on before. I’ve also been very lucky that a lot of my personal interests have become close to mainstream in the tech industry, which has given me a way to pay the bills and do things I enjoy at the same time.

2011

What follows is a selection of technology areas that I think are likely to undergo some important changes in the coming year:

Information security

This is a huge problem area and none of the problems within it are easily solved, but the first step in solving a problem is admitting that it exists. 2010 has confronted the world with the evidence, and 2011 will be the year in which we start to admit that our current solutions don’t work. The Gawker hack, Firesheep and several other high-(and low-)profile incidents have made it clear that we’re just not safe relying on weak or non-existent encryption and weak password-based authentication. This should be obvious, and is obvious to many people, but I suspect that this belief is about to go mainstream. Most people rightly ignore scare stories about “hackers” stealing their personal data, but every year the risks get greater as we rely ever more on our outdated infrastructure to support rich digital interactions, and the scale of these risks will force people to take them seriously.

Personal Data Stores

The Personal Data Store is a concept which may help to address my previous point. A PDS provides a secure, trusted repository for personal data of all kinds, and mechanisms to allow selective access to this data to third parties. It replaces the current chaotic mess in which the personal data of individuals is frequently collected by stealth, stolen, traded or assumed incorrectly without the knowledge or control of the individual. With Personal Data Stores, we may have the chance to control who knows what about us, and on what terms we grant that knowledge. A PDS might serve as proof of identity, membership, ownership or certification, a means of managing “social graph” information and both personal and business relationship data, and a repository for the data and documents that we create in our daily lives, either as a by-product or a direct creative effort. And all of this could be done with greater security and control than available at present. Too good to be true? We might find out soon enough.

Education

Bit of a broad topic, I admit, but education – particularly adult education – is ripe for change. In the UK, student tuition fees are likely to rise considerably, to the point where the assumption that a university education pays off – in narrow financial terms, at least – may no longer hold for a significant number of people (indeed, the “graduate premium” is already negative for some). This should lead to increased demand for alternatives to university education, and this is probably a good thing. The great expansion of university education over the past decade has been a good thing too, but the challenge we’re facing now is to ensure that the education people receive is actually serving their interests. Too many people are graduating and finding that their qualifications are not taken seriously by employers, and some universities don’t seem to think that their role involves much more than giving students a basic grounding in a topic and a certificate at the end of it. To me, universities have to be places where people strive to break their personal boundaries and discover the boundaries of their field; anything less than that is a waste. University education cannot remain as the sole “acceptable” way of preparing oneself for professional work, and the neglect and stigmatisation of “vocational” study will hopefully be overcome by new modes of education that don’t require fees that reduce the return on investment to zero or below.

Search

This is my wildcard pick. Surely everyone knows that Google has the search business sewn up, and Bing only keeps up some semblance of competition due to Microsoft’s willingness to keep funding it? This all seems pretty undeniably true. But the very lack of change in the search market could be a sign of stagnation. When it comes to Google, the question “what have you done for me lately?” yields answers that are not particularly impressive. The Android OS is nice, but it’s neither as polished as iOS nor as open as MeeGo. Most of Google’s recent “innovations” have been expensive flops – technically interesting but lacking any vital spark of usefulness. Google is sitting on a massive pile of cash, and a lesser company might see that as a reason to pay out a hefty dividend to their shareholders. But not Google; they’ve made some acquisitions, but mostly they’re just sitting on it. Google now faces a challenge even more difficult than becoming #1 in the search market: what do you do when you’ve won, and everyone knows it? Microsoft faced this problem in the operating system market over a decade ago, and whilst they continue to turn an extremely healthy profit, nobody sees them as a vital force any more, and their operating system dominance continues mostly because of vendor lock-in rather than technical superiority. But if lock-in was easy for Microsoft to achieve in the OS market, it should be much more difficult to achieve in the search market. With cloud computing, the cost of competing with Google’s infrastructure is coming down, and the spread of new types of device and new types of search could easily provide opportunities for competitors to sneak in. Finally, I take the mere existence of Duck Duck Go to be evidence of the fact that it’s still possible for new “traditional” search engines to appear even as old ones finally die.

So, that’s my set of predictions for 2011. I could have made some easier ones – 2011 will see more Wikileaks clones, more open source software being adopted by enterprise heavyweights, more Scala and less Java – but those are hardly worth placing bets on. There are other notions which I hold more in hope than expectation – better politics, economic renaissance, Liverpool FC to start playing decent football again – and which therefore can’t really count as predictions. Overall, I’m optimistic about 2011, if only because I see in the continued progress of technology the possibility to solve more and greater problems. Here’s hoping it’s a good year for all!

Re-launch on Drupal 7 - First Impressions

With the final Drupal 7 release only days away, it’s about time for me to get up to speed with the latest and greatest release. I’ve had my head down on a massive Drupal 6 project lately, and this has kept me from spending too much time with Drupal 7. Upgrading this site has been a crash course in Drupal 7’s many changes.

First impressions of Drupal 7

Compared to previous upgrades, going from D6 to D7 has been remarkably pain-free. Not all of the old modules have D7 equivalents yet, but a considerable number of them do – more than enough for a fairly basic site like this one. Great work has been done by many module maintainers in ensuring that the long wait for module upgrades that plagued D6 isn’t being repeated. Unsurprisingly, all of the Drupal core upgrades went without a hitch.

Drupal 7 also feels a lot friendlier. The new admin themes (Seven is my favourite) are pleasant to look at and easy to use, and the garish nightmare of Garland everywhere is thankfully banished. The admin screens have been rationalised, and a handy toolbar is provided which makes navigation around the admin screens easy, although I’m not convinced that it’s better than the admin_menu module was in D6. Commonly-used options are easy to get to and the whole system feels a lot more “joined up” than previous releases did – little things like being able to get to the configuration or permissions screens for a module directly from the Modules page contribute to a sense of a cohesive system.

Since my main interest is as a developer, I also had to get my hands dirty with some module and theme code. So far, this has resulted in the Github module, a small-but-useful module that lets you easily display the “fork me on Github” ribbon on your site (I’m working on some other Github-based ideas too), and the currently-unpublished Slide Box module (which you’ll see in action when you reach the bottom of this post). Neither module gets beyond the surface of Drupal 7’s APIs, but the basics of Drupal module development don’t seem to have changed dramatically – and all of the changes I’ve seen have been improvements. Again, small API changes add up to a sense of a much better-designed system. I can already tell that the new database abstraction layer is going to make module development easier.

Theming has changed too, although I’ve spent less time looking at this and, as a result, don’t have anything dramatic to say. For the site re-launch, I decided to take an off-the-shelf Drupal theme (Dingus) and modify it to suit my needs. Drupal 7’s theming feels a bit more modular and decoupled than Drupal 6’s did, and that’s a good thing in the long-term.

Ready for the big time?

So is Drupal 7 “ready”? Well, I’m still seeing some occasional PHP warnings, and some of the modules I’m using only worked if I took the latest version from CVS, but overall Drupal 7 feels a lot more stable than I expected. If I were starting a big project now, I’d give serious consideration to starting with D7 rather than D6, depending on when I expected to launch. The benefits of a nicer UI and improved APIs offset some of the trouble that can be expected from bleeding-edge code, and once developers get used to the new improved APIs it will be hard to go back to the old ones. All in all, I’m impressed.

System Features Module for Drupal

I’ve published the very first version of a new module for Drupal, the System Features Integration module (Github). This module integrates “system objects” (basically, modules and, eventually, themes) with Features, allowing you to store module enablement/disablement status and module weight in a feature.

Why would you want to do this?

The simple answer is that this could be really useful for automated deployments. Consider a situation where you’ve deployed version 1.0 of a Drupal-based site, and you’ve got version 2.0 running on a development or staging server. Version 2.0 includes several new modules and some adjustments to module weights. Without Features integration, upgrading v1.0 to v2.0 would involve enabling the modules and then manually adjusting the module weights, perhaps by editing the database by hand or by writing a custom module update script in a .install file. With Features integration, just install the new feature module and revert the feature and – bingo! – your new modules are enabled and weights adjusted. You can even use it to disable v1.0 modules that are no longer needed.

There are, of course, other ways to do this, and your mileage may vary. Install profiles can enable modules and adjust weights upon installation, though they’re not really any help during an upgrade. Drush can painlessly enable and disable modules from the command-line, and is easily scriptable, though module weight adjustments are tricky. Obviously there’s still plenty of other work involved in an upgrade or deployment, but this module is another step towards totally features-driven development and deployment; it moves one more aspect of configuration out of the database and ad-hoc update scripts, and in to features.
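For comparison, here is roughly what the command-line route looks like (module and feature names are hypothetical; drush pm-enable, pm-disable and features-revert are the relevant commands):

$ drush pm-enable new_module -y        # enable a module from the command line
$ drush pm-disable old_module -y       # disable one that is no longer needed
$ drush features-revert my_feature -y  # with System Features Integration, this also applies module status and weights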

Future plans for the module include support for enabling/disabling themes, and for more complex module enablement settings. In the meantime, I’d love to hear from people using the module, or those who have useful suggestions about how it could be improved.

What Does a Drupal Architect Do, and What Do Architects on Drupal Projects Do?

Drupal occupies a strange place in the web framework landscape. It’s not a pure framework, like Ruby on Rails, Symfony or Zend Framework. Nor is it just a CMS or blogging product with the ability to host plugins, like WordPress. It’s somewhere in between, giving the developer a fully functional CMS as a platform but providing many of the flexible basic services and abstractions that would be expected in a more generic “framework”.

For this reason, talking about “architecture” in Drupal can be confusing. From one perspective, Drupal is the architecture. To consider this properly, let’s unpack the metaphor:

When we talk about system architecture, we’re talking about the bits of the project that aren’t going to change, or will change only very slightly, in response to feedback and information gathered during the project itself. The architecture is the stuff that we can rely upon to remain true for a long time. In this sense, the metaphor with building architecture is very accurate: when planning a new house or office block, we begin by figuring out where the walls, foundations and windows will go, and how the heating, power and water will be provided – we don’t worry about what colour to paint the walls, or whether to have carpet or wooden floors (in a Drupal project, this kind of thing is handled by the theme). And on a Drupal project, the main fundamental part is, well, Drupal. It defines how the user permissions system works, how content is stored in the database, the entire operation of the presentation layer, and with a few additional modules it can also define a lot about how integration with other systems works too.

So if Drupal gives us an architecture to work with already, do we need architects on Drupal projects? My answer is a tentative yes – there’s clearly a need for architects to design the overall system, of which Drupal may be only a part. And the other parts of the system, which are built from much lower levels of abstraction (say, on top of a Java framework) will need architects to design them. But when working with Drupal itself, the job of the architect is different from these other cases. If an architect tries to plan how a Drupal site should operate from first principles, he will be wasting a lot of his time since many of these decisions have already been taken – and tested in the real world – by others in the community. The job of an architect on a Drupal project is not to design but to understand how to get Drupal to do exactly what is needed in the most efficient way possible.

An architect with little knowledge of Drupal may look at the requirements for his system and say “aha, we need a service for querying large amounts of information from an external data store”. A more experienced Drupal architect would say “aha, we need to use Views, Schema and Table Wizard”. The experienced architect knows that contact forms can be done easily via Webforms and don’t require a custom module. This seems obvious to experienced Drupal developers, but it really is a quite strange concept to developers and architects from other backgrounds who are used to building applications from extremely flexible basic components.

Consider an organisation that needs to have multiple contact forms, allowing users to submit different kinds of information on each. Using something like Zend Framework, this would involve writing code to generate each form using Zend_Form, enforce validation rules using Zend_Validate_* classes and ultimately send the submissions via Zend_Mail. Now, each of those components is well-architected and de-coupled, and each can be subclassed and replaced easily. Zend_Mail has various implementors for different mail transports and encodings – it’s a very flexible library. But the amount of effort required to implement even a very simple user story with ZF is considerably greater than the effort required to do the same with Drupal, because Drupal provides pre-built architecture for this very common web pattern. In ZF, each form will require a separate form class which contains all of the form element definitions, labels, error messages and validation rules – and that’s before we consider the possibility of administrative users adding extra form fields without any code changes. In Drupal, the entire set of user stories is handled by the Webforms module, with little or no code to be written at all. Thus the Drupal architect’s greatest asset is the knowledge of the patterns already implemented by others, rather than the ability to produce designs of how to re-implement this system from first principles (which would probably end up looking something like the Zend Framework example, and would take as long – or longer – to code).

This example explains why Drupal is different. Other frameworks value – correctly, in some cases – flexibility over functionality. Adherence to the formal structures of object-oriented programming (or perhaps that should be called class-oriented programming) is prevalent here. This can make these web frameworks extremely flexible, with every component pluggable and replaceable, with intricate inheritance trees providing – in theory – code reusability. But in practice, Drupal’s approach of providing the developer with a set of prefabricated components that satisfy 90% of the likely user stories is simply more productive. Reusability is achieved by having modules that implement large chunks of functionality, with the potential to inject new functionality or override behaviour via Drupal’s aspect-oriented hook system. Whilst this might feel limiting to a design purist, it is an extremely pragmatic way of building high-functioning websites; quite simply, it’s easier to tweak the pre-fabricated components than it is to build new ones. When properly understood, Drupal allows architects and developers to skip ahead in huge leaps, building on top of the components already provided, focusing all efforts on the new and unique parts of a project, where those efforts are truly needed. This does mean accepting a limited scope for the architect, but this limited scope can be traded off against faster delivery and more time for in-depth consideration of the newest – and, by definition, riskiest – parts of a project. Architects can still add value and get satisfaction from solving the truly difficult problems.

So, my answer to my original question is that Drupal projects do need architects. But they need to understand that they’re not designing the whole system from the ground up – they’re designing only those parts that aren’t already there, and their greatest asset when doing so is their understanding of how the existing parts work.

Royal Mail Project Hits the Headlines, and Capgemini Launches "Immediate"

I haven’t blogged or tweeted much about work lately, and the main reason for this is that I have been working on a project that has been under wraps until yesterday.

The project has now been announced: I’ve been working with Capgemini on the migration of Royal Mail to Drupal. It’s a hugely exciting project – the site will ultimately be one of Europe’s largest Drupal sites – and it’s also great to be bringing Drupal into an enterprise setting with one of the world’s biggest enterprise consulting firms. When I started developing with Drupal almost five years ago, it was to enable me to build richly interactive, community-oriented sites for small businesses, charitable and non-profit organisations. I never imagined that 5 years later I’d be doing the same thing for one of the UK’s biggest and most recognisable brands.

For those outside the UK, it must be understood that Royal Mail is huge. It’s one of the UK’s largest employers, has a network of thousands of post offices, many thousands more post boxes and has a legal requirement to deliver to every single postal address in the country. I pass two post boxes just walking to the bus stop on my way into work in the morning. There’s something exciting about working on a site for a business that plays such a large role in so many people’s lives.

However, Royal Mail isn’t the sole focus – or even the main focus – of the project. What Capgemini is building is a fully-fledged framework of components for delivery of enterprise-scale web projects, with Drupal as a central component. This framework is called Immediate, and working on making this a great product, suitable for many customers, is where the main focus of my efforts is going. This means making use of great Drupal technologies such as Features, Drush (& make files) and CTools exportables to build a packageable system. It also means integrating with a wide variety of third-party providers for e-commerce, identity and authentication, CRM and more.

For such large-scale sites, performance is critical and we have great support from David Strauss and Four Kitchens. Pressflow, memcached, APC and a highly-tuned MySQL server all go into the mix, and Zeus provides an excellent reverse proxy and load balancing solution. No doubt many interesting performance challenges await as the site goes fully live, by which point it will be one of the most heavily-trafficked Drupal sites in the world.

It’s really great to see the good reception this project has had from the Drupal community, and hopefully this latest success will spur on even greater adoption of Drupal within the enterprise world.

Digital Economy Bill: It’s the Numbers, Stupid

Since my previous post on the Digital Economy Bill, Cory Doctorow has written another post, this time accusing Lib Dem MPs of “stand[ing] back” and allowing the Digital Economy Bill to proceed to the “wash-up”, the Parliamentary process by which bills that ran out of time before the dissolution of Parliament are nodded through. Now, I worship at Cory’s altar as much as any other geek, but I think he’s wrong on this.

As I said in my previous post, the government has the numbers to do what it likes so long as it retains the support of its own back-bench MPs. If they have the support of the Tories, government bills are virtually unassailable. This is how ID cards, the DNA database, 28-day-detention-without-charge and, for that matter, the Iraq war have been approved by Parliament. The Lib Dems voted against them, which is about as much as you can do when Labour outnumbers you 7:1 and, with Tory support, 10:1. (If you’re wondering how the Lib Dems are that badly outnumbered despite getting 22% of the votes at the last election, you’ll want to consider how the electoral system works). The Lib Dems aren’t “standing back” as there’s actually no real way of stopping the government once it has decided to do something.

Worse, by saying that the Lib Dems are supporting the Labour/Tory consensus, Cory is letting the real culprits off the hook: the massed ranks of Labour back-benchers, with whom true power and responsibility lies. The Lib Dems have repeatedly called for further scrutiny and debate on this issue: David Heath first called for the second reading of the Bill to be held urgently, and Don Foster has made it clear that the Lib Dems are against the web blocking provisions and against the use of disconnection as a punishment for file-sharing – through negotiation, Lib Dems have already ensured that further legislation will be required before anyone gets disconnected, and that this must follow at least a year of studies considering alternatives, and a full consultation process. Since that legislation will have to occur on the other side of the general election, after which the current government may have either lost office or be forced into power-sharing with the Lib Dems, there’s a fair chance that disconnection will never happen.

Now, it’s certainly possible to push further. The clauses relating to disconnection and web blocking could be dropped from the Bill before it is passed. But it is not in the power of the Lib Dems to make this happen, due to the aforementioned Parliamentary arithmetic. We can be pretty sure that the Lib Dems will be voting against the Bill when it comes up, but we can also be pretty sure that they’ll lose due to Labour’s back-benchers supporting the government. Tom Watson is an honourable exception, but he doesn’t seem to have the support of many of his Labour colleagues.

There’s one final roll of the dice, though. Because of the rapid speed at which the government is pushing the Bill through, it has to go forward as part of the “wash-up” process. This is normally reserved for uncontroversial legislation that simply ran out of time before the election, but Labour have rarely stopped to worry about procedural niceties. In the wash-up, the parties come to an agreement about what to allow through, then – as I understand it – hold a series of votes which are effectively formalities, nodding legislation through. There is a chance that Labour can be spooked into dropping the controversial clauses in order to get the Bill through wash-up, or risk losing the whole package. This can only happen if Labour are worried about their own support on the back benches, which means that this is where pressure should be directed. I honestly can’t understand why Cory is putting the emphasis on the actions of the 60-odd Lib Dem MPs (many of whom have publicly said they’ll vote against the Bill anyway!) when there are 400+ Labour MPs, many of whom are in very marginal constituencies and will have to take complaints from their constituents, particularly those raised in an organised manner, very seriously right now. And if we want to put real pressure on them in Labour/Lib Dem marginal seats, it might be worth mentioning that, actually, the Lib Dems are the [relatively] good guys in this, and Labour are the party which created this Bill and are forcing it through Parliament.

So far as I can tell, the Lib Dems are doing about as much as they can. It’s not enough to stop the Bill, because there aren’t enough Lib Dem MPs to do that. Follow the numbers and you can see where the battle over this bill is really being fought.

Digital Economy Bill: Have I Got This Right?

As anyone with an awareness of politics or technology issues will be aware, the British government has recently been attempting to pass the Digital Economy Bill. This is a wide-ranging piece of legislation, covering issues from digital radio to copyright infringement on the internet and much more besides. As the legislation has evolved, it has acquired – apparently at the behest of the government in the form of Lord Mandelson – greater powers to punish those accused of copyright infringement, and this is where the current controversy lies.

Now, first of all it should be pointed out that there are differing views on copyright itself. At the extremes lie the views that copyright is essentially wrong, as it prevents the totally free flow of ideas, and correspondingly the view that copyright should be absolute, giving the owner of a copyrighted work considerable powers to enforce their control over those works. In the middle, most of us accept that copyright provides a useful incentive to people to create things, by granting them a temporary monopoly on their creations, enabling them to profit from the sale of licensed copies of their work, but also believe that this needs to be balanced by rights of ‘fair use’, allowing others to share, remix and discuss these works in freedom.

The government, it is fair to say, have shown themselves to be on the side of the rights-owners, those who wish to maintain or extend their powers to enforce control over copyrighted works. The Digital Economy Bill provides new powers for rights-owners to seek the blocking of websites which they accuse of facilitating the sharing of copyrighted works, and to seek “technical measures” against individuals they accuse of sharing copyrighted works.

At this point, your views on the matter may diverge based on how much you know about the technology. As a self-confessed geek, I have to admit to knowing quite a lot, which gives me grave doubts about the feasibility of blocking websites or employing “technical measures” against individuals. Importantly, it is often hard to block websites in isolation. Many sites exist on “shared hosting” accounts, which means that if one website on a particular server is being blocked, other – perfectly innocent – sites on the same server might get blocked too. This means that hosting providers – the people who operate the network infrastructure – have to be super-cautious about any threat of web blocking, because there is a risk that their customers may end up as collateral damage. As a result, hosting providers often act on the mere threat of web blocking, simply taking down websites that are alleged to contain infringing content on the basis of nothing more than a solicitor’s letter. By creating a further threat, of state-enforced web blocking, the power shifts further away from individuals running websites and towards those who wish to threaten them. The Liberal Democrats (full disclosure: I’m a paid-up member) and Conservatives jointly acted to specify a proper, legal process for this in an amendment to the Bill; whilst an improvement over the government’s original proposals, it still leaves site operators under threat of site blocking. After a resolution at the party conference, Liberal Democrat Lords attempted to introduce a new, better amendment, but for reasons that I do not fully understand, this amendment was not adopted in the Lords. What we now have is better than the government’s original proposals, but still not good enough.

On the second point, “technical measures” against individuals, the situation is even less clear. “Technical measures” relates to the use of “throttling” to limit the amount of data that can be transferred over an internet connection, effectively degrading the service to the point of unusability and, if that is not judged to have had an effect, enforced disconnection from the internet can follow. This is an even more serious problem than web blocking, because it is almost guaranteed to create collateral damage. If one member of a household is accused of sharing copyrighted works, the rest of the household can be made to suffer for it. There is a good case to believe that this is a violation of natural justice and will, I imagine, end up being challenged on Human Rights grounds. The Open Rights Group (full disclosure: I’m a paid-up member) has done great work in campaigning against this part of the Bill (and others!), and that campaign is rapidly approaching its moment of truth: the Bill is due to be voted on on April 6th and, at present, “technical measures” are still very much part of the Bill.

Matters are further complicated by the Labour government’s abuse of Parliamentary procedure. They are attempting to pass this legislation with the barest minimum of debate in the House of Commons; the Bill will pass without a committee or report stage and will be made law as part of the ‘wash-up’ process that exists to fast-track pending legislation once a general election has been called. Given that this is not an emergency bill, and that there still exists substantial disagreement over its contents, this can fairly be called abuse of the procedure.

However, it’s important not to let the unusual circumstances obscure the Parliamentary reality: Labour have a considerable majority and the Conservatives are sympathetic to the Bill – it would have passed even if it had been debated for months. The Iraq War, ID cards, 28-days detention, Control Orders, the exemption of MPs expenses from the Freedom of Information Act, the one-sided Extradition Act, the DNA database, the Legislative and Regulatory Reform Act, restriction of trial by jury and countless other smaller but no less pernicious pieces of government business have been passed by sheer weight of Labour’s numbers, often with Conservative support or sympathy. The Liberal Democrats have voted against all of these, but this has never succeeded in preventing the legislation, on the simple basis that Liberal Democrat MPs comprise fewer than 10% of the total in the Commons (despite receiving 22% of the vote at the last General Election). In some cases, Lib Dem amendments have succeeded in taking some of the sharp edges off Labour’s legislative flails, and when Labour’s back benches have remembered their consciences it has been possible to defeat the government – on a grand total of six occasions in the last five years.

But does that mean that the Lib Dems are doing enough? Well, I’m not sure. Certainly the Lib Dems have taken the most sensible position of the three main parties, opposing web blocking and disconnection. At this point, I’m getting mixed messages about how effective that opposition has been; Don Foster MP says that for people to be forcibly disconnected from the internet, further legislation will be required in the next Parliament and that the Lib Dems will oppose it when it comes up; Jim Killock of ORG says that this isn’t good enough, as it will likely be passed when it does. He is probably right, given that the legislation will take the form of “Statutory Instruments”, which are normally approved with little scrutiny. This is a favourite trick of Labour’s – pass the uncontroversial bits when everyone is paying attention, but give the Secretary of State (Mandelson, in this case) the power to add the worst bits back in via SIs when nobody’s paying much attention. Jim wants to avoid things getting to that stage and seems to be hoping that moral pressure from the Lib Dems might persuade the government to drop the whole idea. However, given the previous record (enumerated above), I somehow doubt that this will work. A better hope is that Mandelson might not be around in a year’s time and if the election is as close as the polls currently predict, the Lib Dems might end up holding the balance of power, a much stronger position from which to block the disconnection powers from coming into use.

In any case, pressure from the Lib Dems looks like our best hope right now and we should certainly be pursuing it, along with a vigorous campaign by ORG members and supporters. With the General Election bearing down on us, it might just be possible to spook enough Labour backbenchers into pressuring their own side into dropping the worst parts of the Digital Economy Bill before it passes into law. A cynic might remark that if two million people marching didn’t stop the Iraq war, the few hundred of us outside Parliament a few days ago are unlikely to stop this Bill, but that’s no reason to give up on campaigning.

My question is: have I got this right? Is there more to this than I’ve realised, and is there some nuance of Parliamentary procedure which actually makes the actions of the Lib Dems more important than I’ve realised? Is there more that we can do?