Polipo FAQ

General questions
Troubleshooting
Features
Configuring Polipo
Performing exotic tasks
Development-related questions
Polipo internals

General questions

Who is maintaining Polipo?

Juliusz Chroboczek, unfortunately, in his copious free time.

How do I get help?

If the fine manual and this FAQ didn't solve your problem, and you have checked the list of known bugs in Polipo, feel free to ask on the Polipo-users mailing list.

As this list carries fairly moderate traffic (usually between 0 and 3 messages a day, with occasional bursts of activity), you should feel free to subscribe. You are welcome to send mail to the list even if you're not subscribed, in which case you should mention that you want to be CC'd with replies.

You may browse the list without subscribing from SourceForge by HTTP (slow and unreliable), from Gmane by HTTP (faster) and from Gmane by NNTP (even faster, but then, I'm using a smart newsreader and a poor web browser).

You should not contact the developers personally, unless you have good reasons to want your query to remain confidential. If you do, please make sure to include the word ‘‘polipo’’ somewhere in the subject line.

How do I report a bug?

You can either send mail to the Polipo-users mailing list (no subscription required) or submit an issue to the issue tracker. If you do both, please mention the issue number in your mail.

What systems does Polipo run on?

Polipo is developed on recent versions of Linux (with glibc) and tested with both gcc and clang.

Polipo is fairly portable C89 code, and should work on any POSIX-ish system built in the last 30 years. Older versions of Polipo have been successfully tested on at least the following systems:

Linux with glibc 2 (Alpha, x86, x86_64, PPC), libc 5 (x86) and uclibc (MIPS on OpenWRT);
FreeBSD since 4.2 (x86);
NetBSD since 1.6.1 (m68k and x86);
OpenBSD (various versions);
Mac OS X since 10.2 (10.1 doesn't work);
SVR4 (Solaris/Sparc 2.9 and HP/UX B.11.11);
Microsoft Windows native API (various versions);
Microsoft Windows with Cygwin (various versions).

If you're lucky, you might find a Windows binary in my download area.

Polipo has been reported to be slowish under Cygwin — please use the native Windows port instead. Under SVR4 and native Windows, Polipo has a small memory leak due to a deficiency in the putenv library function. Please use a Free Unix system if possible.

Polipo should be easy to port to any 32- or 64-bit machine with a Unix-like system, a half-decent C89 or C99 compiler, and either the poll or select system call. The system calls writev and readv are nice to have, but not strictly necessary.

Are there any caveats for Windows users?

Yes, Polipo for Windows usually requires some manual configuration. Please see this document for more information.

You might also find useful hints in this thread.

Which browsers does Polipo work with?

Any compliant HTTP/1.0 or HTTP/1.1 client that can speak to an HTTP proxy should be fine. This includes every browser known to me.

Which web sites does Polipo work with?

Polipo aims at being a compliant HTTP/1.1 proxy. It should work with any web site that complies with either HTTP/1.1 or the older HTTP/1.0. (Polipo does not support the long-obsolete HTTP/0.9.)

Since a large number of web sites do not comply with relevant standards, Polipo includes a number of workarounds for broken sites, and can optionally be configured to include even more of those at a slight cost in performance.

A well-tuned web site will send replies that contain hints allowing caches to fine-tune their behaviour. Please see this caching tutorial if you want to make your web site cooperate with Polipo and other standards-compliant caches.

If Polipo is free, do you request my first-born child?

Polipo is Free and Open Source Software, and comes with absolutely no strings attached, not even the rather mild conditions required by typical GNU-style software or the even milder conditions required by the older (four-pronged) BSD license. Please read the Polipo distribution conditions and see for yourself.

Troubleshooting

Polipo says ‘‘Couldn't bind: Address already in use’’

Polipo is already running.

When I try to access a remote host, polipo just sits there

If the local webserver (http://localhost:8123/) works fine, but polipo just hangs when accessing remote hosts, there might be a DNS problem. Try running

$ polipo dnsUseGethostbyname=true

If that works around the problem, read DNS in the Polipo manual for more information about what's going on.

Polipo says ‘‘Falling back to using system resolver’’

Usually, Polipo will say something like

DNS: recv failed: Connection refused (111)
Falling back to using system resolver.

This means that Polipo couldn't contact a name server. Please point Polipo at a working recursive name server by editing /etc/resolv.conf, or, if that is impossible, by setting the variable dnsNameServer. Please see DNS in the Polipo manual for more information.
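As a quick diagnostic, a small helper along the following lines lists the name servers that the resolver configuration currently names. This is only a sketch: the function name list_nameservers is made up for this FAQ and is not part of Polipo.

```shell
# list_nameservers [FILE]
# Print the address of every "nameserver" entry in a resolv.conf-style
# file (default: /etc/resolv.conf), one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

If the list is empty, or names a server that refuses queries, either fix the file or start Polipo with an explicit server, for instance polipo dnsNameServer=192.0.2.53 (the address is a placeholder).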

Why does my S key stop working after installing Polipo for Windows?

Some users report that after installing Polipo for Windows, their S key stops working until they reboot the computer. We've been unable to duplicate the issue. It appears to be related to the version of the Nullsoft Installer and some combination of installed software in the system. If you have this issue, please install the Polipo binary manually.

Polipo says ‘‘Couldn't parse server headers’’

Either the server is buggy, and it speaks HTTP incorrectly, or polipo is buggy, and it cannot parse perfectly good HTTP. Polipo will ignore the incorrect header, which should work around the issue.

Polipo says ‘‘Couldn't parse URL’’

Was that an FTP URL? Polipo is an HTTP proxy that can tunnel HTTPS; it is not an FTP or FTP-over-HTTP proxy.

Reconfigure your browser so that it doesn't use a proxy for FTP connections.

Polipo says ‘‘Uncacheable object’’

That's normal. A resource was decorated with metadata that causes compliant HTTP/1.1 proxies (including Polipo) to request it every time. Polipo is logging the URL in case you want to include it in your forbidden URLs file.

Polipo says ‘‘Restarting pipeline’’

That's normal. Polipo decided to pipeline requests to a given server, and it later turned out that this was a bad decision. Polipo is recovering.

Polipo says ‘‘"Not changed" reply with no ETag’’

That's okay. The server is violating RFC 2616, Section 10.3.5. Polipo will work around the problem.

Polipo says ‘‘Server ignored conditional request’’

That's okay. Polipo tried to find out whether an object had been superseded (to revalidate it using an if-modified-since request) and the server replied with the full object rather than just informing Polipo there was no change. If the server keeps doing that, Polipo will switch to using a different, slightly slower validation method (HEAD revalidation).

Polipo says ‘‘Persistent reply with no content-length’’

That's sort of okay. The server is sending a combination of headers that doesn't make sense but that is not explicitly forbidden by RFC 2616. Polipo will make a reasonable guess about the meaning of the server's reply (it will assume that the reply is not persistent).

The site I'm trying to access doesn't work with Polipo

The most common cause for such issues is a site that provides incorrect cache control information, and hence causes Polipo to serve stale data to the client.

You can work around most such issues by setting dontCacheRedirects and dontCacheCookies to true, and creating a file ~/.polipo-uncachable (or whatever you set uncachableFile to) with the following contents

\.(php[345]?|[sp]html|cgi|pl|py|[aj]sp)$
\?
/cgi-bin/

Note that doing this will slow down Polipo quite a bit.
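Before handing such a list to Polipo, it can be useful to check which URLs it would catch. The helper below is a rough sketch: it assumes that grep -E interprets the patterns the same way Polipo does, and the name matches_uncachable is invented for this example.

```shell
# matches_uncachable FILE URL
# Succeed (exit status 0) if URL matches at least one of the extended
# regular expressions listed in FILE, one pattern per line.
matches_uncachable() {
    printf '%s\n' "$2" | grep -Eq -f "$1"
}
```

For example, matches_uncachable ~/.polipo-uncachable 'http://example.org/index.php' && echo uncachable.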

(Thanks to hondza for providing this answer.)

Features

Can Polipo do https proxying?

Yes.

Can Polipo do FTP proxying?

No.

Can Polipo use a SOCKS gateway?

Yes. SOCKS4a and SOCKS5 with hostnames are supported. Please check the variables socksParentProxy and socksProxyType.

Can Polipo behave as a transparent proxy?

Polipo is transparent if you set the following in your config file:

maxAge = 0
maxExpiresAge = 0

But that's probably not what you meant — please see the next question.

Can Polipo behave as an intercepting proxy?

No.

Interception proxying (sometimes confusingly called ‘‘transparent’’ proxying) is a technique that intercepts client connections at the network layer in order to redirect them to an application-layer proxy.

Interception proxying is a fundamentally broken design (see for example this posting and RFC 3143, Section 2.2.2), and will not be supported by Polipo. If you want to use interception proxying in order to avoid manually configuring your clients, please ask your browser vendor to provide a proper protocol for client auto-configuration. If you want to use interception proxying for any other reason, you're probably doing something wrong.

(Or you're a fascist pig with a read-only mind.)

Is Polipo an anonymising proxy?

There's no such thing as an anonymising proxy (but see below about tor). Some proxies, however, have some features that make client identification somewhat less precise.

Every server that you access can find out the IP address of the machine where the connection comes from. When you use a proxy, this is the IP address of the machine running the proxy; thus, a shared proxy makes it slightly more difficult to find out which client is accessing a given server.

Polipo does not use the non-standard X-Forwarded-For header that gives the client's address out. By default, Polipo does not include the Via header that tells servers the name of every proxy being used (but check the disableVia variable).

There are, however, many other elements of the HTTP/HTML suite that give up information about you. The most obvious are HTTP headers, notably cookies, Accept-Language and User-Agent; all of these can be censored by Polipo.

The Javascript client side scripting language can also be used to disclose information about the client; the only solution is to disable Javascript in your browser.

All of the tweaks suggested above will break some sites. And remember: no matter what measures you take, you will not be anonymous; always assume that your local law enforcement agency, your boss, your significant other and your mother know which sites you have been visiting.

Is it possible to run Polipo together with Privoxy?

Yes. In order to get the privacy enhancements of Privoxy and much (but not all) of the performance of Polipo, you should put Polipo upstream of Privoxy.

In other words, you should:

point your web browser at Privoxy (localhost:8118);
point Privoxy at Polipo (put forward / localhost:8123 in the Privoxy config file);
use no parent proxy in Polipo.

Is it possible to use Polipo with tor?

(Tor is a volunteer-run network of anonymising proxies.)

Yes. Set socksParentProxy to localhost:9050.

Is Polipo secure?

Only the paranoid survive — Andy Grove

It depends on your threat model and on how you configure Polipo.

By default, Polipo allows anyone on the allowedClients network to connect and access the configuration interface and the list of cached pages. You can close that loophole by setting disableLocalInterface.

By default, access to Polipo is only allowed from the local machine. If you change allowedClients to allow remote access, Polipo relies on your routers to prevent address spoofing. Setting authCredentials does not improve security if you don't control your routers, as HTTP Basic authentication is vulnerable to sniffing. I myself leave allowedClients at the default value and use ssh tunnels when sharing proxies.
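One way to set up such an ssh tunnel is sketched below. The function name polipo_tunnel is made up, and it assumes that Polipo listens on the same port (8123 unless you say otherwise) on the remote host's loopback interface.

```shell
# polipo_tunnel HOST [PORT]
# Forward local PORT (default 8123) to the same port on HOST's loopback
# interface. HOST is any ssh destination; -N means "run no remote command".
polipo_tunnel() {
    port=${2:-8123}
    ssh -N -L "$port:localhost:$port" "$1"
}
```

After running polipo_tunnel proxy.example.org & (a placeholder host name), point your browser at localhost:8123. The remote Polipo only ever sees connections from its own machine, so allowedClients can stay at its default.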

A serious security bug was found in Polipo 0.9.8. This bug could only be exploited by people who were allowed to access the proxy. This has been fixed in 0.9.9.

During its history, four buffer overflows have been found (and fixed) in Polipo. One was only present on obsolete systems (with the old definition of snprintf), the other three were buffer overflows while reading, and therefore most probably impossible to exploit.

Configuring Polipo

What filesystem should I use for Polipo's on-disk cache?

Any filesystem that is able to efficiently handle large numbers of small files should do.

Under Linux, reiserfs/tails provides the best space/time compromise. Other good choices are ext4, reiserfs/notails and XFS. Avoid ext2 and ext3 without hashed directories. Make sure that the filesystem is mounted with the relatime option, or, even better, noatime (but make sure you know what this option implies).

Under BSD Unix, FFS (UFS) with small frags has good space usage, but access time of large directories is pretty bad. FreeBSD's hashed directories only partially solve the problem.

The currently fashionable copy-on-write filesystems (ZFS, btrfs, f2fs) should provide excellent performance, but I haven't tested them myself.

No idea about Windows.

Is it okay to point Polipo's on-disk cache to an NFS-mounted directory?

This should in principle work with NFSv3 except if your NFS implementation is buggy (but see below about Linux). If you run the proxy (polipo) and the expiry process (polipo -x) on different hosts, you might (if your LAN is very noisy, your NFS implementation very primitive, and you're very unlucky) get I/O errors and broken connections, but no data corruption should happen.

NFSv2 is definitely not safe, unless both the proxy and the expiry process are run on the same host.

NFSv3 is not safe if the NFS client is running a Linux version earlier than 2.6.5.

More generally, any filesystem should work as long as:

open(O_CREAT|O_EXCL) works reliably, and
the filesystem supports Unix read-after-delete semantics.
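The two conditions above can be smoke-tested from the shell, roughly as follows. This is a sketch under stated assumptions: the function name check_cache_fs is invented, the exclusive-creation step relies on the shell's noclobber option (which common shells typically implement with O_CREAT|O_EXCL), and a test run on a single host says nothing about reliability across several NFS clients.

```shell
# check_cache_fs DIR
# Rough single-host smoke test of the two properties listed above.
check_cache_fs() {
    f="$1/polipo-fs-test.$$"
    rm -f "$f"

    # Exclusive creation: under noclobber (set -C), ">" refuses to reuse
    # an existing regular file, so the second attempt must fail.
    ( set -C; : > "$f" ) 2>/dev/null || { echo "cannot create $f"; return 1; }
    if ( set -C; : > "$f" ) 2>/dev/null; then
        echo "exclusive creation not enforced"
        rm -f "$f"
        return 1
    fi

    # Read-after-delete: data must stay readable through an open
    # descriptor after the file has been unlinked.
    echo data > "$f"
    exec 3< "$f"
    rm -f "$f"
    read -r line <&3
    exec 3<&-
    if [ "$line" != data ]; then
        echo "read-after-delete failed"
        return 1
    fi
    echo ok
}
```

Run it as check_cache_fs /var/cache/polipo (as a user allowed to write there); it prints ok when both checks pass and leaves no file behind.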

If you use a filesystem that does not maintain last-modified time correctly, you might want to set the variable preciseExpiry to true. This will make expiry much slower.

How do I move Polipo's on-disk cache to a different machine?

Just copy the contents of /var/cache/polipo. You don't need to preserve atimes, but you should preserve mtimes. You should send Polipo a USR2 signal both before and after you perform the copy.

You can also merge two caches by simply copying one over the other.
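The procedure above might be wrapped up as follows. This is a minimal sketch, not an official tool: the name copy_cache is made up, the destination is assumed to be reachable as a local path (for instance a mounted directory), and the signals go to the Polipo running on the machine where you run the function. To copy over the network, use any tool that preserves modification times.

```shell
# copy_cache SRC DEST
# Copy the contents of the cache directory SRC into DEST, preserving
# modification times (cp -p), with a USR2 signal before and after.
copy_cache() {
    killall -USR2 polipo
    mkdir -p "$2" && cp -Rp "$1"/. "$2"/
    status=$?
    killall -USR2 polipo
    return $status
}
```

Since merging two caches is just copying one over the other, copy_cache /var/cache/polipo /mnt/other/polipo (the second path is a placeholder) works whether or not the destination already holds a cache.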

Is it okay to share a single on-disk cache between multiple polipi?

That's not what Polipo has been designed for, but it turns out to work reasonably well.

The cache will not become corrupted (except if you're running NFSv2). If two polipi try to access the same object at the same time, they will complain loudly and fail to save the new data on disk.

How does Polipo react to stepping the clock? to clock skew?

Polipo doesn't like your system clock to change by more than a few seconds. If you ever need to step your system clock, you may want to stop Polipo before the change and restart it afterwards.

If you step the system clock backwards by a large amount, polipo will become confused about the dates of the files in the disk cache and start serving stale data to clients. You can work around this problem either by manually purging the on-disk cache (rm -r), or by shift-clicking reload in your browser. (Stepping the clock forwards does not have this problem.)

All of the algorithms that polipo uses are safe with respect to clock skew between proxy and server. In other words, if your system clock is wildly off (but you don't step it), polipo will react by fetching more data than necessary from the network, never by serving stale data.

You can avoid the inconveniences described above by using an NTP client to keep the system time accurate. I'm running ntpd on my servers and desktop machines, and chrony on my laptops.

What can I do to make Polipo faster?

The default configuration of Polipo is carefully tuned to balance size and speed. The simplest way to make Polipo faster is to give it more memory; see Memory usage in the manual for information on doing that.

If you're using a lot of regular expressions in your /etc/forbidden file — don't do that. Using domains is okay, although even that is not as fast as I'd like.

If you're on a fast network, you may also improve Polipo's I/O performance by recompiling it with a larger CHUNK_SIZE.

$ make EXTRA_DEFINES="-DCHUNK_SIZE=8192"

The default value is 4096 on 32-bit architectures, and 8192 on 64-bit ones. 8192 and 16384 are good values (there's hardly any benefit beyond that). Note that doing that decreases Polipo's ability to allocate memory in a flexible manner, and you should increase Polipo's chunk memory to compensate.

If your Polipo is limited by the speed of the disk, you may be able to make it feel faster by playing with the value of idleTime. See Asynchronous Writing in the manual for more information.

If you are limited by the speed of the network, you may get Polipo's cache to be more effective by serving stale data; if you do that, you will sometimes need to hit the reload button to see the fresh contents of a page. See the description of the variables cacheIsShared, relaxTransparency and mindlesslyCacheVary in the manual.

What can I do to make Polipo use less memory?

The simplest way to decrease Polipo's memory usage is to give it less memory; please see Memory usage in the manual for information on doing that.

If you decrease Polipo's chunk memory, you may want to recompile it with a smaller CHUNK_SIZE.

$ make EXTRA_DEFINES="-DCHUNK_SIZE=2048"

2048 is a good value. (While Polipo will work with 1024 and even 512 byte chunks, I don't recommend going beneath 2048.)

If you're using a lot of regular expressions in your /etc/forbidden file — don't do that.

What can I do to make the Polipo binary smaller?

Polipo is fairly small out of the box. If you're building a single-floppy system using Polipo, or burning Polipo into a router's ROM, you might want to go to some extra effort in order to make Polipo's binary smaller.

Of course, you will want to run strip(1) on the Polipo binary.

Many of Polipo's features can be compiled out if not needed; please see the Makefile for details.

You may also want to recompile Polipo with assertions disabled by defining the macro NDEBUG.

For example, if you're using gcc, you might want to say

$ make CDEBUGFLAGS="-Os -Wall" EXTRA_DEFINES="-DNDEBUG -DNO_IPv6 -DNO_STANDARD_RESOLVER -DNO_REDIRECTOR -DNO_FORBIDDEN" all

You might also want to compress the resulting binary using something like upx, but only do that if you understand its effect on virtual memory (swap) usage.

Performing exotic tasks

Most of the answers in this section only apply to Unix systems. Windows-specific contributions are welcome.

How do I find which sites take the most space in the on-disk cache?

$ du /var/cache/polipo/ | sort -n | tail

How do I cause polipo to revalidate an object in the cache?

You need to send polipo a request for the object with a Cache-Control: no-cache header.

With Netscape/Mozilla: go to the page and hit shift-reload.

With other tools: make sure that http_proxy is set and use one of the following:

$ curl -I -H 'Cache-Control: no-cache' http://... > /dev/null
$ wget --header='Cache-Control: no-cache' -O /dev/null http://...
$ squidclient -s -p 8123 -r http://...

How do I purge a complete web site from the on-disk cache?

$ killall -USR1 polipo
$ rm -r /var/cache/polipo/www.microsoft.com/
$ killall -USR2 polipo
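If you do this often, the same three steps fit in a small function. The sketch below uses a made-up name, purge_site, and adds one precaution of its own: it refuses host names that are empty or could point outside the cache directory.

```shell
# purge_site HOST [CACHE_DIR]
# Remove HOST's directory from the on-disk cache (default location
# /var/cache/polipo), using the same three steps as above.
purge_site() {
    host=$1
    cache=${2:-/var/cache/polipo}
    # Refuse empty names and anything that could escape the cache directory.
    case $host in
        ''|.|..|*/*) echo "purge_site: bad host name: '$host'" >&2; return 1 ;;
    esac
    killall -USR1 polipo
    rm -r "$cache/$host/"
    killall -USR2 polipo
}
```

For example: purge_site www.microsoft.com.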

How do I purge a single object from the on-disk cache?

There is currently no automated way of performing this operation. You can do it by hand, by identifying and removing the relevant file in the cache:

$ killall -USR1 polipo
$ grep -il '^X-Polipo-Location: http://www.pps.jussieu.fr/~jch/software/polipo/^M' /var/cache/polipo/www.pps.jussieu.fr/*
/var/cache/polipo/www.pps.jussieu.fr/pA2oquORVPZEXdYJ7cXWOQ==
$ rm /var/cache/polipo/www.pps.jussieu.fr/pA2oquORVPZEXdYJ7cXWOQ==
$ killall -USR2 polipo

Type ^V^M in order to get a ^M onto the command line.
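If typing a literal ^M is inconvenient, the lookup can be done with awk instead, which also avoids treating the dots in the URL as regular-expression wildcards. This is a sketch: find_cached_object is a made-up name, and it assumes that the X-Polipo-Location line sits in the header section of the cache file, before the first empty line.

```shell
# find_cached_object URL DIR
# Print the name of every file in DIR whose header section contains an
# "X-Polipo-Location: URL" line (compared case-insensitively, like grep -i).
find_cached_object() {
    want="X-Polipo-Location: $1"
    for f in "$2"/*; do
        [ -f "$f" ] || continue
        WANT=$want awk '
            { sub(/\r$/, "") }    # strip the trailing CR (^M)
            $0 == "" { exit }     # empty line: end of the header section
            tolower($0) == tolower(ENVIRON["WANT"]) { found = 1; exit }
            END { exit !found }
        ' "$f" && printf '%s\n' "$f"
    done
    return 0
}
```

Send Polipo the USR1 signal first, remove the file that the function prints, then send USR2, exactly as in the manual procedure above.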

How do I use Polipo to provide IPv4 client access to an IPv6-only network?

If you've got an IPv6-only network, the most convenient solution to get access to the IPv4 Internet would be to set up routing through a NAT box, either natively (that's what we do on our mesh network) or else using a set of IPv4-in-IPv6 tunnels (or an IPv4-over-IPv6 VPN).

A more pedestrian solution is to get your TCP client applications to tunnel through an instance of Polipo running on a double-stack host.

Install Polipo on a double-stack host. Set proxyAddress to ::, then configure both allowedClients and your firewall suitably. Do not use authCredentials — it is insecure, and should not be used except in very particular situations.

You should then make sure that tunnelAllowedPorts includes at least 22, 443, 873 and 5223.

On every client, configure your web clients, your Jabber clients, your rsync clients, etc. to use Polipo as an HTTP proxy. For command line clients, this can be done with:

$ export http_proxy=http://polipo.example.org:8123
$ export https_proxy=http://polipo.example.org:8123
$ export RSYNC_PROXY=polipo.example.org:8123

On systems using OpenSSH, you will want to install socat and create a script ssh-polipo with the following contents:

#!/bin/sh
exec ssh -o 'ProxyCommand socat - PROXY:polipo.example.org:%h:%p,proxyport=8123' "$@"

If you are running an IPv6-only network, I definitely want to hear from you.

Development-related questions

How do I apply a patch?

When someone publishes a fix on the Polipo-users list, it usually comes in the form of a patch, a plain text file with the extension ‘.patch’ or sometimes ‘.diff’.

A patch describes the differences between a released version of polipo and the fixed version. Modifying the released version in order to get the fixed version is called applying the patch, which is done with a program called patch. You first need to untar the polipo sources, change to the directory where the sources are, and then invoke patch:

$ cd polipo-0.9.4/
$ patch -p1 < ../polipo-fix.patch

If patch complains that it cannot find the file to patch, try using -p0 instead of -p1. If patch complains that it cannot recognise the patch format, you're probably trying to use an SVR4 version of the patch utility with a patch in unified diff format; please install GNU patch first (or upgrade to a Free Unix system).

How do I get your current development version of polipo?

$ git clone git://git.torproject.org/git/polipo
$ cd polipo/
$ gitk &

Alternatively, check the GitHub mirror of Polipo.

What do you mean by ‘‘run it under valgrind’’?

Valgrind is a (rather amazing) memory debugger for Linux. If you send me a bug report that I cannot reproduce, I may ask you to try to reproduce it under valgrind.

In order to do that, you will need to install valgrind on your system. You should then recompile Polipo with debugging:

$ make clean
$ make CDEBUGFLAGS='-g -Wall'

You should then run Polipo under valgrind:

$ valgrind ./polipo

and send me any error messages produced.

Polipo internals

What do the data in the ‘‘servers?’’ display mean?

In order to make sound decisions about pipelining and PMM (Poor Man's Multiplexing), Polipo needs to cache a certain amount of data about the servers it accesses; the contents of the server cache can be displayed on http://localhost:8123/polipo/servers?. This information is mostly useful for people developing Polipo; however, if you're interested, read on.

The ‘‘servers?’’ display looks like so:

Server              Version  Persistent  Pipeline  Connections            rtt    rate
www.pps.jussieu.fr  1.1      yes         unknown   1/2                    0.008
www.kde.org         1.1      yes         yes       2/2                    0.135  181176
slashdot.org        1.1      no                    0/4                    0.217  33681
ez.no               1.1      no                    0/2          (1 lies)  0.268

‘‘Server’’ is the name of the server. ‘‘Version’’ can be ‘‘1.0’’, ‘‘1.1’’ or ‘‘unknown’’, and is the version of HTTP that the server claims to speak.

‘‘Persistent’’ specifies whether the server does persistent connections, and can be one of ‘‘yes’’, ‘‘no’’ or ‘‘unknown’’. If ‘‘Version’’ is ‘‘1.1’’ and ‘‘Persistent’’ is ‘‘yes’’, then ‘‘Pipeline’’ specifies whether the server can reliably do pipelining; it can be one of ‘‘yes’’, ‘‘no’’, ‘‘unknown’’ or, if a pipelining probe is currently in progress, ‘‘probing’’.

‘‘Connections’’ specifies the number of connections to this server. It is of the form ‘‘m/n’’, where m is the number of connections currently open, and n the maximum number of connections that polipo will use when speaking to this server.

If the server failed to respond to Polipo's standard validation method (‘‘If-Modified-Since’’ and ‘‘If-None-Match’’ preconditions), the server is marked as a ‘‘liar’’, and Polipo switches to using the ‘‘HEAD’’ method for validation. This is noted as ‘‘(n lies)’’, where n is (a linearly decaying measure of) the number of times the server lied to a precondition.

Finally, ‘‘rtt’’ is an estimate of the time the server takes to respond to requests in seconds (the server's round-trip time), and ‘‘rate’’ is an estimate of the transfer rate from the server, in bytes per second. Both are exponentially smoothed averages, and are absent if not measured yet.
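For readers unfamiliar with exponential smoothing, here is what such an average looks like in a few lines of awk. This illustrates the general technique only: the function name smooth and the smoothing factor are arbitrary, and nothing here says which factor Polipo actually uses.

```shell
# smooth ALPHA
# Read one sample per line on standard input and print the exponentially
# smoothed average: avg = ALPHA * sample + (1 - ALPHA) * avg.
smooth() {
    awk -v alpha="$1" '
        NR == 1 { avg = $1; next }                  # first sample seeds the average
        { avg = alpha * $1 + (1 - alpha) * avg }    # later samples are blended in
        END { if (NR > 0) printf "%.3f\n", avg }
    '
}
```

For instance, printf '0.1\n0.3\n' | smooth 0.5 prints 0.200: the new sample pulls the estimate halfway from 0.1 towards 0.3. A larger ALPHA reacts faster to change, a smaller one smooths more.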

Why do you implement your own DNS resolver?

Polipo doesn't use the standard stub resolver (gethostbyname and getaddrinfo) but instead implements its own DNS resolver. There are two reasons for that:

gethostbyname and getaddrinfo are blocking interfaces: using one of those would mean that the whole of Polipo would hang while a DNS lookup is in progress;
neither gethostbyname nor getaddrinfo returns the DNS TTL, and obeying the TTL is a MUST according to RFC 2616, Section 15.3.
