Polipo — a completely unscientific benchmark

Here are a few benchmarks. I compare GNU wget speaking directly to the network with wget going through Squid and with wget going through Polipo.

I make no claim about the validity of these results; they were only made to satisfy my idle curiosity. In fact, these benchmarks are completely invalid for quite a few reasons:

On the other hand, the tests were roughly reproducible: they were run three times with two hours between the runs; with two exceptions, the times remained within 5% of each other. The client machine had plenty of memory, and all the binaries were already in the system's cache.

You should also remember that batch performance is a meaningless measure. Stability, functionality and latency are what really count.

Local ethernet

In this test, we get one, two or three disjoint subtrees of our local web server in parallel with the command wget -r; all the pages requested are static. Every time, the proxies are restarted with an empty cache and (in the case of Polipo) no information about the capabilities of the server. Squid was using its default cache implementation; Polipo was run with no on-disk cache.
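A harness for this kind of run could look roughly like the sketch below. Only wget -r comes from the description above; the extra wget options (-nv, --delete-after, --no-proxy, -e), the proxy addresses (the usual default ports, 3128 for Squid and 8123 for Polipo) and the function names are assumptions made for illustration, and this is not the script that produced the numbers here.

```python
import subprocess
import time

# Assumed proxy addresses: default listening ports on the local machine.
PROXIES = {
    "direct": None,
    "squid": "http://localhost:3128/",
    "polipo": "http://localhost:8123/",
}

def wget_command(url, proxy=None):
    """Build a recursive wget command, optionally routed through a proxy."""
    cmd = ["wget", "-r", "-nv", "--delete-after"]
    if proxy is None:
        cmd.append("--no-proxy")
    else:
        cmd += ["-e", "use_proxy=yes", "-e", f"http_proxy={proxy}"]
    cmd.append(url)
    return cmd

def time_parallel(urls, proxy=None):
    """Start one wget per URL in parallel; return elapsed real time in seconds."""
    start = time.monotonic()
    procs = [subprocess.Popen(wget_command(u, proxy)) for u in urls]
    for p in procs:
        p.wait()
    return time.monotonic() - start
```

A two-client run through Polipo would then be something like time_parallel(["http://server.example/a/", "http://server.example/b/"], PROXIES["polipo"]), with the proxy restarted by hand before each run; the host name and paths are placeholders.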

All times are real time in seconds (smaller is better).

                                       1 client  2 clients  3 clients
Wget                                        1.2        1.7        2.1
Wget through Squid                          0.8        1.4        1.8
Wget through Polipo (4 MB/8 MB)             0.7        1.3        1.7
Wget through Polipo (400 kB/800 kB)         5.5        5.6        4.0

Wget exhibits a mild inefficiency when speaking to a local server. The two proxies have similar performance when given enough memory.

When very short on memory, Polipo doesn't manage to keep full objects in core. As the server is very fast, it sometimes overtakes the client, and causes Polipo to discard data that the client hasn't read yet; Polipo then needs to fetch the data again (some segments of some resources were fetched as many as three times). With three clients speaking to only one server, this effect becomes less noticeable.

Remote host: static pages

In this test, we do the same with a host on the other side of the Atlantic; all pages under consideration are static. The test was done at a time that was the middle of the night locally and late evening on the other side.

All times are real time in seconds (smaller is better).

                       1 client  2 clients  3 clients
Wget                        8.8        8.8        8.8
Wget through Squid          9.1        9.1        9.1
Wget through Polipo         8.5        8.9       12.8

All three implementations yield similar results for one and two clients.

With three clients, Wget and Squid are using three connections where Polipo is only using two. The particular transatlantic link being used is limited by TCP's congestion avoidance algorithm: the throughput is proportional to the number of connections being used. As the clients are not aggressive enough to allow Polipo to do any significant amount of pipelining, Polipo ends up with almost exactly 2/3 of the throughput of the other two setups.
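The 2/3 figure can be checked against the three-client column of the table above. The sketch assumes the simple model stated in the text (throughput proportional to the number of connections, so transfer time inversely proportional to it); the variable names are mine.

```python
# Measured times from the table above (seconds, three clients).
t_direct = 8.8    # Wget, three connections
t_polipo = 12.8   # Wget through Polipo, two connections

# Assumed model: throughput scales linearly with the number of connections,
# so going from three connections to two multiplies the time by 3/2.
predicted = t_direct * 3 / 2
throughput_ratio = t_direct / t_polipo

print(f"predicted Polipo time: {predicted:.1f} s")
print(f"measured throughput ratio: {throughput_ratio:.2f}")
```

The model predicts about 13.2 s against the 12.8 s measured, and the measured throughput ratio comes out at about 0.69, close to 2/3.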

This run is network-bound, and the server does not overtake the clients. Changing Polipo's memory allocation does not result in any measurable difference (i.e. Polipo is just as fast with 400 kB of memory as with 8 MB).

Remote host: dynamic pages

All times are real time in seconds (smaller is better).

                       1 client  2 clients  3 clients
Wget                        8.2        8.4        8.3
Wget through Squid          8.2        8.5        7.7
Wget through Polipo         4.9        4.5        4.5

The remote host is sending dynamically generated pages with some static images. Polipo requests HTTP/1.1 chunked encoding and keeps the connection up; both Wget and Squid are HTTP/1.0 implementations, and they need to close the connection whenever they receive a dynamically generated page.
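To see why chunked encoding lets the connection stay up, here is a minimal decoder sketch. Each chunk carries its own length, so the receiver knows where a dynamically generated body ends without waiting for the server to close the connection. This is an illustration of the wire format only, not code from Polipo; the function name and the sample bytes are made up, chunk extensions are ignored and trailer fields are skipped.

```python
def decode_chunked(data):
    """Decode an HTTP/1.1 chunked body.

    Returns (body, rest), where rest is whatever follows the body,
    for instance the start of the next response on the same connection.
    """
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The chunk size is hexadecimal; drop any ";extension" suffix.
        size_field = data[pos:eol].split(b";")[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    # Skip optional trailer fields up to the blank line that ends the message.
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break
    return bytes(body), data[pos:]

# Made-up bytes: one chunked body followed by the start of a second response.
wire = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\nHTTP/1.1 200 OK\r\n"
body, rest = decode_chunked(wire)
print(body, rest)
```

An HTTP/1.0 client has no such framing for a body of unknown length, so the end of the connection is the only end-of-body marker it can rely on.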

Again, changing Polipo's memory allocation does not result in a measurable difference.

Conclusions

In the arbitrary and random situations created in this completely unscientific test, Polipo performs decently. In all tests where Polipo's behaviour is directly comparable to that of Squid, performance is roughly the same.

The only major slowdown is the last run in test 1; the effect is well understood, and it is due to a situation that does not happen under real usage (servers pushing data faster than clients can pull it). A comparison with Squid's behaviour is meaningless, as Squid is not designed to run in 400 kB of core.

There is a minor slowdown in test 2, where other implementations use three connections and Polipo only two. Polipo was explicitly designed to limit the number of connections used on the server side, and the current version limits that number to 2 per server (5 for broken servers); a future version might allow this number to change dynamically.

Note: since version 0.9, Polipo does allow this figure to be changed by setting the serverSlots variable. However, there is still no provision for having this value change dynamically. If you have any ideas for a suitable algorithm, please drop me a note.
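The idea of a fixed number of slots per server can be sketched as below. This is only an illustration of the concept, assuming one counting semaphore per (host, port) pair; it is not how Polipo implements serverSlots, and the class and method names are hypothetical.

```python
import threading
from urllib.parse import urlsplit

class ServerSlots:
    """Cap concurrent connections per server (illustrative, not Polipo's code)."""

    def __init__(self, slots=2):
        self._slots = slots
        self._lock = threading.Lock()
        self._sems = {}

    def _sem_for(self, url):
        # One semaphore per (host, port); port 80 is assumed when none is given.
        parts = urlsplit(url)
        key = (parts.hostname, parts.port or 80)
        with self._lock:
            if key not in self._sems:
                self._sems[key] = threading.BoundedSemaphore(self._slots)
            return self._sems[key]

    def try_acquire(self, url):
        """Take a slot without blocking; return False if the server is full."""
        return self._sem_for(url).acquire(blocking=False)

    def release(self, url):
        """Give a slot back once the connection is idle or closed."""
        self._sem_for(url).release()

# Three clients asking for the same server with two slots: the third must wait.
slots = ServerSlots(slots=2)
granted = [slots.try_acquire("http://www.example.org/a/") for _ in range(3)]
print(granted)
```

With two slots, the third request to the same server is refused until a slot is released, which is the situation behind the three-client column of test 2.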

Software used

Client side:

Server side:
