Matteo Caprari

generating SVG charts with couchdb

posted on 08 Dec 2009

In this article I describe how I got couchdb to produce SVG charts using _list functions.

This post is long, so I'll report the results first:

group_level=1: yearly averages (svg, png)
group_level=2: monthly averages (svg, png)
group_level=3: daily values (svg, png)
 

Now go and read how I did it:

  1. generate some test data
  2. upload test data to couchdb
  3. create and manage a design document with couchapp
  4. write a simple view with map/reduce
  5. write a _list function and render the charts!
  6. Conclusions

Apache CouchDB is a document-oriented database server, accessible via a RESTful JSON API. It has some advanced features, such as the ability to write 'views' in a map/reduce fashion and to further transform the results using javascript. It's a young but very promising project.

Try this at home

You can browse or download all code discussed here. All comments and corrections are welcome.

Generate some test data

To get started with this exploration we need some data to render, and a quick way to visualize it before our application is ready. This Python script generates a series of data points that simulate the balance of someone's bank account over time: a salary comes in every 30 days and a small random amount is spent every day.
# test_data.py. Usage: python test_data.py <simulation_length>
# Prints one "day balance" pair per line for the given number of days.
import sys
import random

days = int(sys.argv[1])
savings = 10000
pay = 2000
for i in range(days):
	# the salary comes in every 30 days
	if i % 30 == 0:
		savings = savings + pay
	# spend a random amount between 2 and pay/16 + 2 every day
	savings = savings - random.randint(0, pay // 16) - 2
	print(i, savings)
Use the script to generate a sample set with 3000 points:
$ python test_data.py 3000 > test_data.txt
$ cat test_data.txt
0 11947
1 11882
2 11813
...
Our final output will be similar to a line chart made with some bash and gnuplot:
#!/bin/sh
# gnuplot.sh generates a plot of a series piped in stdin
# (printf is used because not every /bin/sh supports echo -e)
(printf 'set terminal png size 750, 500\nplot "-" using 1:2 with lines notitle\n'
cat -
echo "e") | gnuplot
$ cat test_data.txt | sh gnuplot.sh > test_data.png

Upload test data to couchdb

We need our data in JSON format so that it can be uploaded to couchdb. This Python script converts each input line to a JSON object, and each object will become a document in couchdb. All objects are collected in the 'docs' array, which makes the output compatible with the couchdb bulk document API. The script also adds a tag to each document, so it's easier to upload and manage multiple datasets.
# data_to_json.py. Usage: python data_to_json.py <tag> < test_data.txt
# Builds JSON output suitable for couchdb bulk operations (_bulk_docs)
import sys
import json
import datetime

start_date = datetime.datetime(2000, 1, 1)
tag = sys.argv[1]
docs = []
for line in sys.stdin:
	day, value = line.split()
	# day N of the simulation becomes the date start_date + N days
	datestr = (start_date + datetime.timedelta(days=int(day))).strftime("%Y-%m-%d")
	docs.append({"tag": tag, "date": datestr, "amount": int(value)})

# one document per line, wrapped in {"docs":[...]}
print('{"docs":[')
print(',\n'.join(json.dumps(doc, separators=(', ', ':')) for doc in docs))
print(']}')
$ cat test_data.txt | python data_to_json.py test-data > test_data.json
$ cat test_data.json
{"docs":[
{"tag":"test-data", "date":"2000-01-01", "amount":11896},
{"tag":"test-data", "date":"2000-01-02", "amount":11876},
....
{"tag":"test-data", "date":"2008-03-17", "amount":18703},
{"tag":"test-data", "date":"2008-03-18", "amount":18643}
]}
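If you would rather keep everything in JavaScript, the same conversion can be sketched as a small function. This is a hypothetical helper (`toBulkDocs` is my name, not a couchdb API) that assumes the same "day value" input lines and the same 2000-01-01 start date as data_to_json.py, and returns the `{"docs": [...]}` object that `_bulk_docs` expects.

```javascript
// Hypothetical helper: turn "day value" lines into a _bulk_docs request body.
// Assumes day 0 corresponds to 2000-01-01, like data_to_json.py.
function toBulkDocs(lines, tag) {
  const startMs = Date.UTC(2000, 0, 1); // months are 0-based in Date.UTC
  const dayMs = 24 * 60 * 60 * 1000;
  const docs = lines
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [day, value] = line.split(/\s+/);
      // work in UTC so daylight saving time never shifts the date
      const date = new Date(startMs + Number(day) * dayMs);
      return {
        tag: tag,
        date: date.toISOString().slice(0, 10), // "yyyy-mm-dd"
        amount: Number(value),
      };
    });
  return { docs: docs };
}

// Example: the first and last lines of a 3000-day simulation
const body = toBulkDocs(["0 11947", "2999 18643"], "test-data");
console.log(JSON.stringify(body));
```

`JSON.stringify(body)` produces a string that can be posted to `_bulk_docs` in the same way as test_data.json.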
Create a new database with name svg-charts-demo
$ curl -i -X PUT http://localhost:5984/svg-charts-demo/
HTTP/1.1 201 Created
...
{"ok":true}
Upload the test data
$ curl -i -H "Content-Type: application/json" -d @test_data.json -X POST http://localhost:5984/svg-charts-demo/_bulk_docs
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
....
Verify that 3000 documents are in the database.
$ curl http://localhost:5984/svg-charts-demo/_all_docs?limit=0
{"total_rows":3000,"offset":3000,"rows":[]}

Create and manage a design document with couchapp

Design documents are special couchdb documents that contain application code such as views and lists. CouchApp is a set of scripts that makes it easy to create and manage design documents. In most cases installing couchapp is a matter of one command. If you have any problems or want to know more, see the chapter Managing Design Documents in CouchDB: The Definitive Guide.
$ easy_install -U couchapp
These commands create a new couchapp called svg-charts and push it to couchdb:
$ couchapp generate svg-charts

$ ls svg-charts/
_attachments  _id  couchapp.json  lists  shows  updates  vendor  views

$ couchapp push svg-charts http://localhost:5984/svg-charts-demo/
[INFO] Visit your CouchApp here:
http://localhost:5984/svg-charts-demo/_design/svg-charts/index.html

Write a simple view with map/reduce

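This view will let us group the test data by year, month or day and see the average for each group. With couchapp, the map and reduce functions of the by_date view go in views/by_date/map.js and views/by_date/reduce.js.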
// views/by_date/map.js
// key is an array of strings representing a date: [year, month, day]
// value is the amount field of each doc (a number)
function(doc) {
	// dates are stored in the doc as 'yyyy-mm-dd'
	emit(doc.date.split('-'), doc.amount);
}
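To see what the map function produces without a server, you can call it with a stand-in for couchdb's `emit`. This is a minimal sketch for local experimentation: the `emitted` array and the local `emit` are assumptions of the harness, since inside couchdb `emit` is provided by the view server.

```javascript
// Local stand-in for couchdb's emit(): collects {key, value} pairs.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map function from map.js, assigned to a variable so it can be called
const map = function (doc) {
  // dates are stored in the doc as 'yyyy-mm-dd'
  emit(doc.date.split("-"), doc.amount);
};

map({ tag: "test-data", date: "2000-01-01", amount: 11896 });
console.log(JSON.stringify(emitted));
// prints [{"key":["2000","01","01"],"value":11896}]
```

Because months and days are zero-padded in the stored dates, the string keys sort in chronological order, which is what makes grouping by key prefix meaningful.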
// views/by_date/reduce.js
// this reduce function returns an object
// {tot:total_value_for_group, count:elements_in_the_group}
// clients can then do tot/count to get the average for the group
// Keys are arrays [year, month, day], so count will always be 1 when group_level=3
function(keys, values, rereduce) {
	var result = {tot:0, count:0};
	if (rereduce) {
		// values are {tot, count} objects returned by earlier reduce calls
		for (var i = 0; i < values.length; i++) {
			result.tot += values[i].tot;
			result.count += values[i].count;
		}
	}
	else {
		// values are the amounts emitted by the map function
		result.tot = sum(values);
		result.count = values.length;
	}
	return result;
}
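The interplay between the reduce function and group_level can be simulated in plain JavaScript. This is a simplified model, not couchdb's actual implementation (couchdb reduces B-tree nodes and calls rereduce on partial results in an order you don't control): rows are grouped by the first `level` elements of their key, each group is reduced, and a rereduce over partial results is checked to give the same totals. `groupRows` and the local `sum` are hypothetical names; inside couchdb, `sum` is provided by the JavaScript view server.

```javascript
// Stand-in for the sum() helper that couchdb's JavaScript view server provides
function sum(values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// The reduce function from reduce.js, restated with let/const
function reduce(keys, values, rereduce) {
  const result = { tot: 0, count: 0 };
  if (rereduce) {
    // values are {tot, count} objects from earlier reduce calls
    for (const v of values) {
      result.tot += v.tot;
      result.count += v.count;
    }
  } else {
    // values are the amounts emitted by the map function
    result.tot = sum(values);
    result.count = values.length;
  }
  return result;
}

// Simplified model of group_level: consecutive rows whose keys share the
// same first `level` elements form one group (rows must be sorted by key)
function groupRows(rows, level) {
  const groups = [];
  for (const row of rows) {
    const groupKey = row.key.slice(0, level);
    const last = groups[groups.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(groupKey)) {
      last.rows.push(row);
    } else {
      groups.push({ key: groupKey, rows: [row] });
    }
  }
  // couchdb passes keys as [key, docid] pairs on the first (non-rereduce) pass
  return groups.map((g) => ({
    key: g.key,
    value: reduce(g.rows.map((r) => [r.key, r.id]), g.rows.map((r) => r.value), false),
  }));
}

// Three emitted rows, already sorted by key as couchdb stores them
const rows = [
  { key: ["2000", "12", "31"], id: "a", value: 100 },
  { key: ["2001", "01", "01"], id: "b", value: 200 },
  { key: ["2001", "01", "02"], id: "c", value: 400 },
];

// group_level=1: one row per year, averaged by the client as tot/count
const yearly = groupRows(rows, 1);
for (const r of yearly) {
  console.log(r.key.join("-"), r.value.tot / r.value.count);
}
// prints "2000 100" then "2001 300"

// rereducing two partial results gives the same totals as one reduce
const partials = [reduce(null, [200], false), reduce(null, [400], false)];
const combined = reduce(null, partials, true);
console.log(JSON.stringify(combined));
// prints {"tot":600,"count":2}
```

Carrying tot and count separately, rather than an average, is what makes the rereduce step correct: averages of averages would weight a 30-day month the same as a 31-day one.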
Update the design document and test the different groupings
$ couchapp push svg-charts http://localhost:5984/svg-charts-demo/
Call the view with group_level=1 to get the data grouped by year
$ curl http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=1
{"rows":[
{"key":["2000"],"value":{"tot":4247068,"count":366}},
...
{"key":["2008"],"value":{"tot":1529286,"count":78}}
]}
Call the view with group_level=2 to get the data grouped by month
$ curl http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=2
{"rows":[
{"key":["2000","01"],"value":{"tot":343578,"count":31}},
{"key":["2000","06"],"value":{"tot":345282,"count":30}},
...
Call the view with group_level=3 to get the data grouped by day. As all the keys are different at the third level, this returns a single row for each document.
$ curl -s http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=3
{"rows":[
{"key":["2000","01","01"],"value":{"tot":11896,"count":1}},
{"key":["2000","01","04"],"value":{"tot":11747,"count":1}},
...
Same as above but limiting the response to a range of days
$ curl -s 'http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=3&startkey=\["2008","01","01"\]&endkey=\["2008","01","04"\]'
{"rows":[
{"key":["2008","01","01"],"value":{"tot":20050,"count":1}},
{"key":["2008","01","02"],"value":{"tot":20019,"count":1}},
{"key":["2008","01","03"],"value":{"tot":19974,"count":1}},
{"key":["2008","01","04"],"value":{"tot":19878,"count":1}}
]}
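Escaping the JSON keys by hand for curl is error-prone. When calling the view from code, it is simpler to JSON-encode `startkey` and `endkey` and let `URLSearchParams` do the percent-encoding. A minimal sketch, assuming the same database, design document and view names used above; `viewUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: build a view URL with JSON-encoded key parameters
function viewUrl(base, params) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    // couchdb expects key, startkey and endkey as JSON values
    const isKey = name === "key" || name === "startkey" || name === "endkey";
    query.set(name, isKey ? JSON.stringify(value) : String(value));
  }
  return base + "?" + query.toString();
}

const url = viewUrl(
  "http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date",
  { group_level: 3, startkey: ["2008", "01", "01"], endkey: ["2008", "01", "04"] }
);
console.log(url);
```

The printed URL can be passed to curl as-is (the brackets are already percent-encoded, so curl's URL globbing does not get in the way) or requested from any HTTP client.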

Write a _list function and render the charts!

// lists/chart-line.js
// renders the rows of a view (here by_date) as an SVG line chart
function(head, req) {
	start({"headers":{"Content-Type" : "image/svg+xml"}});
	
	// some utility functions that print svg elements
	// (JavaScript string literals cannot span lines, so long tags are concatenated)
	function svg(width, height) {
		return '<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"'+
		' style="fill:black"'+
		' width="'+width+'" height="'+height+'">\n';
	}
	function line(x1, y1, x2, y2, color) {
		return '<line x1="'+x1+'" y1="'+y1+'" x2="'+x2+'" y2="'+y2+'"'+
			' style="stroke-width: 0.2; stroke:'+color+'"/>\n';
	}
	function rect(x, y, width, height, color, fill) {
		return '<rect x="'+x+'" y="'+y+'" width="'+width+'" height="'+height+'"'+
			' style="fill:'+fill+'; stroke:'+color+'"/>\n';
	}
	function text(x, y, text) {
		return '<text x="'+x+'" y="'+y+'" font-size="11"'+
			' font-family="sans-serif">'+text+'</text>\n';
	}
	
	// import query parameters (they arrive as strings)
	var x_size = parseInt(req.query.width, 10) || 750;
	var y_size = parseInt(req.query.height, 10) || 500;
	var level = parseInt(req.query.group_level, 10);
	
	// find max and min values
	// collect values and labels
	var y_max = null;
	var y_min = null;
	var values = [];
	var labels = [];
	var count = 0;
	var row;
	while ((row = getRow())) {
		// plot the average of each group
		var value = Math.ceil(row.value.tot/row.value.count);
		if (y_max==null || value>y_max) { y_max=value; }
		if (y_min==null || value<y_min) { y_min=value; }
		values[count] = value;
		labels[count] = row.key.join('-');
		count++;
	}
	
	// free space surrounding the actual chart
	// (defined before the scaling factors, which depend on it)
	var pad = Math.round(y_size/12);
	
	// calculate scaling factors
	var in_width = x_size-(2*pad);
	var in_height = y_size-(2*pad);
	var in_x_scale = in_width/count;
	// avoid dividing by zero when all values are equal
	var in_y_scale = in_height/((y_max-y_min) || 1);
	
	send('<?xml version="1.0"?>');
	send(svg(x_size, y_size));
	
	// background box
	send(rect(1,1, x_size, y_size, '#C6F1C7', '#C6F1C7'));
	
	// chart container box
	send(rect(pad,pad, in_width, in_height, 'black','white'));

	// draw labels and grid
	var y_base = y_size - pad;
	var lastx = 0;
	var lasty = 0;
	for(var i=0; i<count; i++) {
		var x = pad+Math.round(i*in_x_scale);
		// vertical grid line and date label, only if far enough from the last one drawn
		if (i==0 || x-lastx > (30+12*level)) {
			send(line(x, y_base+(pad/2), x, pad,'gray'));
			send(text(x+3, y_base + (pad/2), labels[i]));
			lastx = x;
		}
		var y = Math.round(y_base - ( (values[i]-y_min) * in_y_scale));
		// horizontal grid line and value label, only if at least 15px above the last one drawn
		if (i==0 || lasty-y > 15) {
			send(line(5, y, pad+in_width, y,'gray'));
			send(text(5, y-2, values[i]));
			lasty = y;
		}
	}
	// draw the actual chart
	send('<polyline style="stroke:black; stroke-width: '+ (4-level) +'; fill: none;" points="');
	for(var i=0; i<count; i++) {
		if (i>0) send(',\n');
		var x = pad+Math.round(i*in_x_scale);
		var y = Math.round(y_base - ( (values[i]-y_min) * in_y_scale));
		send( x + ' ' + y);
	}
	send('"/>');
	
	send('</svg>');
}
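A list function only talks to couchdb through `start`, `send` and `getRow`, so it can be exercised outside the server by providing stand-ins for those three calls. This is a minimal sketch of such a harness, under the assumption that the stubs only need to mimic what the list function relies on (`getRow` returns one row at a time and then a falsy value); `runList` is a hypothetical name, and to keep the example short it runs a tiny list function in the same style instead of chart-line.

```javascript
// Hypothetical harness: run a couchdb-style list function against fixed rows.
function runList(listFn, rows, query) {
  const chunks = [];
  let response = null;
  let next = 0;
  // Stand-ins for the view server functions, defined on globalThis because
  // list functions call them as free (global) functions.
  globalThis.start = (resp) => { response = resp; };
  globalThis.send = (chunk) => { chunks.push(String(chunk)); };
  globalThis.getRow = () => (next < rows.length ? rows[next++] : null);
  listFn({ total_rows: rows.length, offset: 0 }, { query: query });
  return { headers: response ? response.headers : {}, body: chunks.join("") };
}

// A tiny list function: prints the average of each row as an SVG <text>
const averages = function (head, req) {
  start({ headers: { "Content-Type": "image/svg+xml" } });
  send('<svg xmlns="http://www.w3.org/2000/svg">');
  let row;
  let y = 20;
  while ((row = getRow())) {
    const avg = Math.ceil(row.value.tot / row.value.count);
    send('<text x="5" y="' + y + '">' + row.key.join("-") + ": " + avg + "</text>");
    y += 15;
  }
  send("</svg>");
};

// Rows shaped like the group_level=1 output of the by_date view
const out = runList(
  averages,
  [
    { key: ["2000"], value: { tot: 4247068, count: 366 } },
    { key: ["2008"], value: { tot: 1529286, count: 78 } },
  ],
  { group_level: "1" }
);
console.log(out.headers["Content-Type"]);
console.log(out.body);
```

The same approach should work for chart-line itself: assign the function to a variable, pass it to `runList` with a few rows, and write `out.body` to an .svg file to inspect the result in a browser.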
Push the design document again with couchapp, then execute the list function 'chart-line' against the view 'by_date'. Use different group_level settings to obtain different charts:
$ curl "http://localhost:5984/svg-charts-demo/_design/svg-charts/\
_list/chart-line/by_date?group_level=3" > chart-line_level-3.svg

$ curl "http://localhost:5984/svg-charts-demo/_design/svg-charts/\
_list/chart-line/by_date?group_level=2" > chart-line_level-2.svg

$ curl "http://localhost:5984/svg-charts-demo/_design/svg-charts/\
_list/chart-line/by_date?group_level=1" > chart-line_level-1.svg
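The list function also reads optional `width` and `height` query parameters (defaulting to 750 and 500 pixels), which travel on the same URL as group_level. A minimal sketch that builds the three chart URLs plus a smaller variant; `listUrl` is a hypothetical helper and the 375x250 size is only an example.

```javascript
// Hypothetical helper: URL of the chart-line list applied to the by_date view
function listUrl(options) {
  const base =
    "http://localhost:5984/svg-charts-demo/_design/svg-charts/_list/chart-line/by_date";
  const query = new URLSearchParams();
  query.set("group_level", String(options.group_level));
  // width and height are read by the list function through req.query
  if (options.width !== undefined) query.set("width", String(options.width));
  if (options.height !== undefined) query.set("height", String(options.height));
  return base + "?" + query.toString();
}

for (const level of [1, 2, 3]) {
  console.log(listUrl({ group_level: level }));
}
// a half-size version of the yearly chart
console.log(listUrl({ group_level: 1, width: 375, height: 250 }));
```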

Conclusions

It worked.

I didn't expect to use a single list function for all grouping levels. I'm particularly happy with how it worked out, especially considering that the whole thing is about 100 lines of code.

The output isn't too nice, but I think it can be made presentable with some effort and under 500 lines of code.

Couchdb is always a pleasure to work with, and it goes a long way in minimizing "Time To something Done".
