
CGIProxy 1.0

HTTP Proxy in a CGI Script

(c) 1996, 1998 by James Marshall, james@jmarshall.com

For the latest, see http://www.jmarshall.com/tools/cgiproxy/

------------------------------------------------------------------------
This CGI script acts as an HTTP Proxy.  Through it, you can retrieve
any resource that is Web-accessible from the server it runs on.  This is
useful when your own access is limited, but you can reach a server that
can in turn reach others that you can't.  No user info is sent to the
target server, so you can set up your own anonymous proxy like The
Anonymizer (http://www.anonymizer.com/).

Whenever an HTML resource is retrieved, it's modified so that all URLs
in it point back through the same proxy.  Form submission is supported.
Once you're using the proxy, you can (almost) forget it's there.
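
As a rough illustration of the URL rewriting described above, here is a
minimal sketch in Python (the script itself is Perl, and this is not its
actual code).  PROXY_PREFIX, ATTR_RE, and proxify_html are hypothetical
names, and the sketch assumes only href, src, and action attributes need
converting; the real script may handle more tags or handle them
differently.

```python
import re
from urllib.parse import urljoin

# Hypothetical location of the proxy script; adjust for a real install.
PROXY_PREFIX = "http://www.yourserver.com/path/nph-proxy.cgi/"

# Matches href=, src=, or action= followed by a quoted or unquoted value.
ATTR_RE = re.compile(
    r'''(\b(?:href|src|action)\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^\s>]+))''',
    re.IGNORECASE,
)

def proxify_html(html, base_url, proxy_prefix=PROXY_PREFIX):
    """Rewrite URL attributes so they point back through the proxy."""
    def replace(match):
        # Exactly one of the three value groups matched (it may be empty).
        raw = next((g for g in match.group(2, 3, 4) if g is not None), "")
        # Resolve relative links against the page's own URL first.
        absolute = urljoin(base_url, raw)
        # Leave non-HTTP links (mailto:, ftp:, ...) untouched.
        if not absolute.lower().startswith("http://"):
            return match.group(0)
        return f'{match.group(1)}"{proxy_prefix}{absolute}"'
    return ATTR_RE.sub(replace, html)
```

Note that the unquoted-value branch stops at the first whitespace, so a
regex approach like this one cuts off any unquoted URL that contains a
space.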

The owner of the proxy can choose to forward only text resources. This
may help when bandwidth is limited.
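
One plausible way to make the text-only decision is to look at the
Content-Type header of the response.  The Python sketch below assumes
that approach; is_text_resource is a hypothetical helper, and this
README does not say how the Perl script actually decides.

```python
def is_text_resource(content_type):
    """Return True if a Content-Type header value names a text/* type."""
    # Drop parameters such as "; charset=iso-8859-1" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/")
```

With a check like this, a response typed image/gif would be refused
while text/html and text/plain would pass through.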

Requires Perl 5, but can run in Perl 4 with a simple modification.

The original seed for this was a program I wrote for Rich Morin's
article in the June 1996 issue of Unix Review; online version at
http://www.cfcl.com/tin/P/9606.html.

------------------------------------------------------------------------
LEGAL DISCLAIMER

Censorship is a controversial subject, and some governments and
companies have rules about what information you should have access
to.  If you use my software to bypass rules that have been imposed on
you, you assume all legal risks and responsibilities involved.  I'm
providing the software as a demonstration and teaching tool, and for
when legitimate access is needed to non-accessible servers.  I won't
encourage you to break any rules, because I would get in trouble if I
did.  I can't prevent you from using this software in illegitimate ways,
but I believe the value of it as a teaching tool is far too great to
let a few miscreants ruin it for everybody.

------------------------------------------------------------------------
TO INSTALL:

To run this, your server must support Non-Parsed Header (NPH) CGI
scripts.  Most servers do, but not all.
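
For readers unfamiliar with NPH: an NPH script sends the complete HTTP
response itself, status line included, and the server passes it through
without adding or parsing headers.  The Python sketch below shows the
general shape of such a response; build_nph_response is a hypothetical
helper, not code from the distribution.

```python
def build_nph_response(body, content_type="text/html", status="200 OK"):
    """Build a complete HTTP response, status line included, as bytes."""
    head = (
        f"HTTP/1.0 {status}\r\n"          # an ordinary CGI script omits this
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"                            # blank line ends the headers
    )
    return head.encode("ascii") + body
```

A script would write the returned bytes straight to standard output,
e.g. with sys.stdout.buffer.write(...).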

Quick answer:  Unpack the two scripts and call startproxy.cgi.

Longer answer: 

  1. Unpack the distribution. 
  2. Install the two scripts like any other CGI scripts (set permissions 
     and path to Perl interpreter). Rename them if you want; be sure 
     the $proxyname setting in startproxy.cgi matches the main script.
  3. Be sure the main script is an NPH script.  In Apache and related 
     servers, do this by simply starting the filename with "nph-". 
  4. Set any options in the main script:
       . To restrict forwarded data to text only, set $textonly=1. 
       . If this proxy must itself go through another HTTP proxy, set
           $ENV{'http_proxy'} and $ENV{'no_proxy'}. 
       . To use Perl 4 instead of Perl 5, see the instructions by the 
           "use Socket" line. 

------------------------------------------------------------------------
TO USE:

Just call the main script with the full URL to retrieve in the PATH_INFO, 
e.g. http://www.yourserver.com/path/nph-proxy.cgi/http://www.slashdot.org/ 
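
The mapping between a target URL and its proxied form is plain string
concatenation.  A minimal Python sketch follows; both function names are
hypothetical, and it assumes the server hands PATH_INFO to the script
unchanged, which some servers may not do.

```python
def proxied_url(proxy_script_url, target_url):
    """Append the target URL to the proxy script's URL as extra path info."""
    return proxy_script_url.rstrip("/") + "/" + target_url

def target_from_path_info(path_info):
    """Recover the target URL from a PATH_INFO value like '/http://host/'."""
    # Assumes the server did not alter PATH_INFO (e.g. by merging slashes).
    return path_info.lstrip("/")
```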

Once you've gotten a page through the proxy, everything it links to
will automatically go through the proxy.  You can use startproxy.cgi to
correctly launch the main script, but it's not needed.

------------------------------------------------------------------------
LIMITS 'N' BUGS:

Cookies aren't supported. If there's demand, it shouldn't be hard to
add them.

URLs generated by JavaScript or similar mechanisms won't be re-proxy'ed
correctly.  If there are normal tags I'm not converting, please let me
know.

If you're using this to get around filters, it may not work because the
target URL is still present in the proxy'ed URL; it depends on the
filter.  To fix this, the program could be modified to use an encoded
form of the target URL, instead of the plaintext URL, in the PATH_INFO.
Is there demand?
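
To make the idea concrete, here is one possible encoding, sketched in
Python.  It is not something the current script does.  URL-safe base64
is an assumption chosen for illustration, and encode_target and
decode_target are hypothetical names.

```python
import base64

def encode_target(url):
    """Encode a URL as URL-safe base64 so it is not readable in the path."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")

def decode_target(token):
    """Reverse encode_target and return the original URL."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
```

Base64 only hides the URL from simple substring matching; a filter that
decodes the path would still see the target.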

It's possible to construct HTML tags that make the program convert a
URL when it shouldn't.  Also, URLs will be truncated if they have spaces
in them.  To make the program bulletproof would slow it down greatly.

I didn't check the spec on HTTP proxies when I wrote this (sometime in
1996).  It's possible the protocol is violated. Actually, this whole
concept is a violation of the proxy concept, so I'm not too worried.  If
any protocol violations cause you problems, let me know.

Only HTTP is supported so far.

------------------------------------------------------------------------
Last Modified: August 3, 1998
http://www.jmarshall.com/tools/cgiproxy/

