
CGIProxy 1.1

HTTP Proxy in a CGI Script

(c) 1996, 1998-1999 by James Marshall, james@jmarshall.com

For the latest, see http://www.jmarshall.com/tools/cgiproxy/

------------------------------------------------------------------------
This CGI script acts as an HTTP Proxy.  Through it, you can retrieve
any resource that is Web-accessible from the server it runs on.  This is
useful when your own access is limited, but you can reach a server that
can in turn reach others that you can't.  By default, no user info 
(except browser type) is sent to the target server, so you can set up 
your own anonymous proxy like The Anonymizer (http://www.anonymizer.com/).

Whenever an HTML resource is retrieved, it's modified so that all URLs
in it point back through the same proxy, including form submissions.
Once you're using the proxy, you can (almost) forget it's there.
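
The rewriting step can be pictured with a small sketch.  This is not the
script's own code (CGIProxy is written in Perl); it is a minimal Python
illustration, assuming the PATH_INFO layout shown in the TO USE example
and assuming only double-quoted href, src, and action attributes are
handled.  The names PROXY_URL, to_proxy_url, and rewrite_html are
hypothetical.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical proxy location, taken from the example URL in this README.
PROXY_URL = "http://www.yourserver.com/path/nph-proxy.cgi"

# Matches double-quoted href/src/action attributes only (a simplification).
ATTR_RE = re.compile(r'\b(href|src|action)\s*=\s*"([^"]*)"', re.IGNORECASE)


def to_proxy_url(absolute_url):
    """Turn http://host/path into PROXY_URL/http/host/path."""
    parts = urlsplit(absolute_url)
    tail = parts.path or "/"
    if parts.query:
        tail += "?" + parts.query
    if parts.fragment:
        tail += "#" + parts.fragment
    return f"{PROXY_URL}/{parts.scheme}/{parts.netloc}{tail}"


def rewrite_html(html, base_url):
    """Point every matched URL in html back through the proxy."""
    def repl(match):
        attr, url = match.group(1), match.group(2)
        absolute = urljoin(base_url, url)  # resolve relative links first
        if urlsplit(absolute).scheme != "http":
            return match.group(0)          # leave non-HTTP URLs untouched
        return f'{attr}="{to_proxy_url(absolute)}"'
    return ATTR_RE.sub(repl, html)
```

Form submissions are covered because the action attribute is rewritten
the same way as links.  A pattern this simple misses unquoted attributes
and URLs built by scripts.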

Configurable options include cookie support, text-only support (to
save bandwidth), simple ad filtering, and custom encoding of target
URLs.

Requires Perl 5, but can run in Perl 4 with a simple modification.

The original seed for this was a program I wrote for Rich Morin's
article in the June 1996 issue of Unix Review; online version at
http://www.cfcl.com/tin/P/9606.html.

------------------------------------------------------------------------
LEGAL DISCLAIMER

Censorship is a controversial subject, and some governments and
companies have rules about what information you should have access
to.  If you use my software to bypass rules that have been imposed on
you, you assume all legal risks and responsibilities involved.  I'm
providing the software as a demonstration and teaching tool, and for
when legitimate access is needed to non-accessible servers.  I won't
encourage you to break any rules, because I would get in trouble if I
did.  I can't prevent you from using this software in illegitimate ways,
but I believe the value of it as a teaching tool is far too great to
let a few miscreants ruin it for everybody.

------------------------------------------------------------------------
TO INSTALL:

To run this, your server must support Non-Parsed Header (NPH) CGI
scripts.  Most servers do, but not all.
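
For readers unfamiliar with NPH: an NPH script writes the complete HTTP
response itself, status line included, and the server passes it through
without parsing or adding headers.  The sketch below shows what such a
response looks like.  It is a Python illustration only, not the script's
Perl code, and build_nph_response is a hypothetical helper name.

```python
def build_nph_response(body, content_type="text/html", status="200 OK"):
    """Return the full byte stream an NPH script would write to stdout.

    A regular CGI script would emit only headers such as Content-Type and
    let the server supply the status line; an NPH script supplies both.
    """
    head = (
        f"HTTP/1.0 {status}\r\n"            # status line written by the script
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"                              # blank line ends the headers
    )
    return head.encode("ascii") + body
```

A script would write the result to standard output in binary mode, for
example with sys.stdout.buffer.write(build_nph_response(b"<html></html>")).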

Quick answer:  Unpack the two scripts and call startproxy.cgi.

Longer answer: 

  1. Unpack the distribution. 
  2. Install the two scripts like any other CGI scripts (set permissions 
     and path to Perl interpreter). Rename them if you want; be sure 
     the $proxyname setting in startproxy.cgi matches the main script.
  3. Be sure the main script is an NPH script.  In Apache and related 
     servers, do this by simply starting the filename with "nph-". 
  4. Set any options in the main script:
       . To restrict forwarded data to text only, set $TEXT_ONLY=1.
       . To support cookies, set $SUPPORT_COOKIES=1.
       . To filter ads and ad-related cookies, set $FILTER_ADS=1.  You
           can customize this behavior with a few related settings.
       . To customize the encoding format for target URLs, modify the 
           &proxy_encode() and &proxy_decode() routines.
       . If this proxy uses another HTTP proxy, set $ENV{'http_proxy'} and
           $ENV{'no_proxy'}.
       . To use Perl 4 instead of Perl 5, see the instructions by the 
           "use Socket" line.
       . To hide which browser you're using, set $ENV{'USER_AGENT'}.
       . If your server is a Windows machine, comment out the "alarm(600);" 
           line.
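
As an illustration of the text-only option, a filter of this kind could
decide what to forward by looking at the Content-Type of each response.
How the script actually applies $TEXT_ONLY is not described here, so the
rule below (forward only "text/*" media types) is an assumption, written
in Python rather than the script's Perl, with the hypothetical name
should_forward.

```python
TEXT_ONLY = True  # stands in for $TEXT_ONLY=1; the logic below is assumed


def should_forward(content_type, text_only=TEXT_ONLY):
    """Return True if a response with this Content-Type should be forwarded."""
    if not text_only:
        return True
    # Drop parameters such as "; charset=iso-8859-1" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/")
```

Under this rule, images and other binary data are not forwarded, which
is where the bandwidth saving comes from.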

------------------------------------------------------------------------
TO USE:

Run startproxy.cgi to start a browsing session.  Once you've gotten a page 
through the proxy, everything it links to will automatically go through 
the proxy. 

Or, you can call the main script directly with the correctly encoded 
target URL in PATH_INFO, e.g.
  http://www.yourserver.com/path/nph-proxy.cgi/http/www.slashdot.org/ 
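
The encoding in that example puts the scheme, then the host and path,
after the script name.  A minimal Python sketch of that mapping follows.
It is inferred from the single example above and is not the Perl
&proxy_encode() and &proxy_decode() routines themselves; the function
names are reused here only for illustration.

```python
def proxy_encode(url):
    """'http://www.slashdot.org/' -> 'http/www.slashdot.org/'."""
    scheme, sep, rest = url.partition("://")
    if not sep or not rest:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{scheme}/{rest}"


def proxy_decode(path_info):
    """'/http/www.slashdot.org/' -> 'http://www.slashdot.org/'."""
    scheme, sep, rest = path_info.lstrip("/").partition("/")
    if not sep or not rest:
        raise ValueError(f"cannot decode PATH_INFO: {path_info!r}")
    return f"{scheme}://{rest}"
```

Appending "/" plus the encoded form to the script's URL gives the address
to request, and decoding the PATH_INFO the server hands to the script
recovers the target URL.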

------------------------------------------------------------------------
LIMITS 'N' BUGS:

HTTP Basic authentication isn't supported.

URLs generated by JavaScript or similar mechanisms won't be rewritten to
go back through the proxy.  JavaScript in general may not work as expected.

If you browse to many sites that set cookies, CGIProxy may drop some of
the cookies, but I haven't seen this happen yet.

It's possible to construct HTML tags that cause the proxy to convert text
as if it were a URL when it shouldn't.  Also, URLs get truncated if they
have spaces in them.  To make the program bulletproof would slow it down
greatly.

I didn't check the spec on HTTP proxies when I wrote this (sometime in
1996).  It's possible the protocol is violated. Actually, this whole
concept is a violation of the proxy concept, so I'm not too worried.  If
any protocol violations cause you problems, let me know.

Only HTTP is supported so far.

------------------------------------------------------------------------
Last Modified: March 9, 1999

