
CGIProxy 1.2  (released September 11, 1999)

HTTP Proxy in a CGI Script

(c) 1996, 1998-1999 by James Marshall, james@jmarshall.com

For the latest, see http://www.jmarshall.com/tools/cgiproxy/

------------------------------------------------------------------------
This CGI script acts as an HTTP or FTP proxy.  Through it, you can
retrieve any resource that is accessible from the server this runs on.
This is useful when your own access is limited, but you can reach a server
that can in turn reach others that you can't.  By default, no user info
(except browser type) is sent to the target server, so you can set up your
own anonymous proxy like The Anonymizer (http://www.anonymizer.com/).

IMPORTANT NOTE ABOUT ANONYMOUS BROWSING:
Anonymity is mostly supported, but is not bulletproof.  For example,
malicious JavaScript or embedded objects on pages can send your
identity to any server.  CGIProxy was originally made for indirect
browsing more than anonymity, but since people are using it for
anonymity, I'll make it as anonymous as possible.  Suggestions
welcome.  For better anonymity, browse with JavaScript turned off.

Whenever an HTML resource is retrieved, it's modified so that all URLs
in it point back through the same proxy, including form submissions.
Once you're using the proxy, you can (almost) forget it's there.
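
As a rough sketch of that rewriting idea (this is NOT the script's
actual code; the routine names are made up for illustration), the
following handles only double-quoted absolute http/ftp URLs in
href, src, and action attributes, and uses the URL layout shown in
the TO USE example.  Relative URLs are left alone here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: "http://host/path" -> "<proxy_base>/http/host/path"
sub through_proxy {
    my ($url, $proxy_base) = @_;
    (my $encoded = $url) =~ s{^(\w+)://}{$1/};
    return "$proxy_base/$encoded";
}

# Hypothetical helper: point absolute URLs in a page back at the proxy.
sub proxify_html {
    my ($html, $proxy_base) = @_;
    $html =~ s{\b(href|src|action)\s*=\s*"((?:http|ftp)://[^"]*)"}
              {my ($attr, $url) = ($1, $2);
               $attr . '="' . through_proxy($url, $proxy_base) . '"'}gie;
    return $html;
}

my $page = '<a href="http://slashdot.org/">news</a>';
print proxify_html($page, 'http://www.yourserver.com/path/nph-proxy.cgi'), "\n";
```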

Configurable options include cookie support, text-only support (to
save bandwidth), simple ad filtering, custom encoding of target
URLs, and more.

Requires Perl 5, but can run in Perl 4 with a simple modification.

The original seed for this was a program I wrote for Rich Morin's
article in the June 1996 issue of Unix Review; online version at
http://www.cfcl.com/tin/P/9606.html.

------------------------------------------------------------------------
LEGAL DISCLAIMER:

Censorship is a controversial subject, and some governments and
companies have rules about what information you should have access
to.  If you use my software to bypass rules that have been imposed on
you, you assume all legal risks and responsibilities involved.  I'm
providing the software as a demonstration and teaching tool, and for
when legitimate access is needed to non-accessible servers.  I won't
encourage you to break any rules, because I would get in trouble if I
did.  I can't prevent you from using this software in illegitimate ways,
but I believe the value of it as a teaching tool is far too great to
let a few miscreants ruin it for everybody.

------------------------------------------------------------------------
TO INSTALL:

To run this, your server must support Non-Parsed Header (NPH) CGI
scripts.  Most servers do, but not all.
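
One way to check for NPH support is to install a tiny test script
first.  The sketch below is not part of the distribution; the routine
name is made up, and you'd save it under an NPH-style name for your
server (e.g. starting with "nph-" on Apache).  An NPH script has to
print the complete HTTP response itself, status line included.  If
your browser shows the plain text "NPH works", the server is passing
the output through unparsed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a full HTTP response, starting with the status line.
sub nph_response {
    my ($body) = @_;
    return "HTTP/1.0 200 OK\r\n"
         . "Content-Type: text/plain\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

$| = 1;            # don't buffer output
binmode STDOUT;    # keep line endings intact on Windows
print nph_response("NPH works\n");
```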

Quick answer:  Unpack the two scripts and call startproxy.cgi.

Longer answer: 

  1. Unpack the distribution. 
  2. Install the two scripts like any other CGI scripts (set permissions 
     and path to Perl interpreter). Rename them if you want; be sure 
     the $proxyname setting in startproxy.cgi matches the main script.
  3. Be sure the main script is an NPH script.  In Apache and related 
     servers, do this by simply starting the filename with "nph-". 
  4. Set any options in the main script:
       . To restrict forwarded data to text only, set $TEXT_ONLY=1.
       . To support cookies, set $SUPPORT_COOKIES=1.
           You can specify which cookies are allowed and which are not, with
           @ALLOWED_COOKIE_SERVERS and @BANNED_COOKIE_SERVERS.
       . To filter ads and ad-related cookies, set $FILTER_ADS=1.
           You can customize this behavior with a few related settings.
       . To remove embedded scripts from pages, set $REMOVE_SCRIPTS=1.
           This helps with anonymity somewhat. It also removes most popup ads.
       . To customize the encoding format for target URLs, modify the 
           &proxy_encode() and &proxy_decode() routines.
       . If this proxy uses another HTTP proxy, set $ENV{'http_proxy'} and
           $ENV{'no_proxy'}.
       . To use Perl 4 instead of Perl 5, see the instructions by the 
           "use Socket" line.
       . Other minor config is possible; see the user configuration section.
       . If your server is a Windows machine, comment out the "alarm(600);" 
           line.
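
For illustration, a set of edits in the user configuration section of
the main script might look like the fragment below.  The variable
names are the ones listed above; the values, host names, and the exact
format of the two proxy settings are assumptions, so check the
comments in the script itself.

```perl
# Illustrative values only.
$TEXT_ONLY       = 0;   # 1 = forward text only, to save bandwidth
$SUPPORT_COOKIES = 1;   # 1 = pass cookies through
$FILTER_ADS      = 1;   # 1 = filter ads and ad-related cookies
$REMOVE_SCRIPTS  = 1;   # 1 = strip embedded scripts from pages

# Only needed if this proxy must itself go through another HTTP proxy.
# Placeholder hosts; the value format here is an assumption.
$ENV{'http_proxy'} = 'http://upstream-proxy.example.com:8080/';
$ENV{'no_proxy'}   = 'localhost,.example.com';
```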

------------------------------------------------------------------------
TO USE:

Run startproxy.cgi to start a browsing session.  Once you've gotten a page 
through the proxy, everything it links to will automatically go through 
the proxy. 

Or, you can call the main script directly with the correctly encoded 
target URL in PATH_INFO, e.g.
  http://www.yourserver.com/path/nph-proxy.cgi/http/slashdot.org/ 
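
Judging only from the example above, the default layout puts the
scheme first, then the host and path.  A minimal sketch of that
mapping follows.  The routine names are made up; the script's own
&proxy_encode() and &proxy_decode() may well differ, especially if
you've customized them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "http://slashdot.org/" -> "http/slashdot.org/"
sub encode_target {
    my ($url) = @_;
    $url =~ s{^(\w+)://}{$1/};
    return $url;
}

# PATH_INFO "/http/slashdot.org/" -> "http://slashdot.org/"
sub decode_target {
    my ($path_info) = @_;
    $path_info =~ s{^/?(\w+)/}{$1://};
    return $path_info;
}

my $proxy = 'http://www.yourserver.com/path/nph-proxy.cgi';
print "$proxy/", encode_target('http://slashdot.org/'), "\n";
print decode_target('/http/slashdot.org/'), "\n";
```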

------------------------------------------------------------------------
LIMITS 'N' BUGS:

Anonymity is NOT PERFECT!! For example, malicious JavaScript on a page can
  send your identity to any server. For best anonymity, turn JavaScript off.

HTTP Basic authentication isn't supported.

URLs generated by JavaScript or similar mechanisms won't be re-proxy'ed
correctly.  JavaScript in general may not work as expected.

If you browse to many sites with cookies, CGIProxy may drop some, but I 
haven't seen this happen yet.

To save CPU time, I took some shortcuts with URL-handling.  I doubt these
  will ever affect anything, but tell me if you have problems. (The
  shortcuts are listed in the source code.) 

I didn't check the spec on HTTP proxies when I wrote this (sometime in
1996).  It's possible the protocol is violated. Actually, this whole
concept is a violation of the proxy concept, so I'm not too worried.  If
any protocol violations cause you problems, let me know.

Only HTTP and FTP are supported so far.

========================================================================

CH'CH'CH'CH'CHANGES:
--------------------

1.2, released September 11, 1999:
---------------------------------

The internal structure was rearranged in a big way, to support multiple
  protocols more cleanly.  Previously, HTTP was ingrained throughout; now 
  it's more modular.

FTP is now supported.

@ALLOWED_COOKIE_SERVERS lets you accept cookies only from certain servers.

@BANNED_COOKIE_SERVERS and @ALLOWED_COOKIE_SERVERS are now lists of
  Perl patterns (regular expressions) to match, rather than literal host
  names.  This lets you allow or forbid whole sets of servers rather
  than listing each server individually.  For more information on
  Perl patterns, read the Perl documentation.  nph-proxy.cgi has a note 
  in the user config section that may help enough.
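
To show the difference between a pattern and a literal host name,
here is a small sketch.  The list entries are hypothetical examples,
and matches_any() is a made-up helper, not how the script itself
applies the lists.  Note that dots are escaped and the pattern is
anchored with "$"; an unescaped "." matches any character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical entries: one whole domain, one exact host.
my @ALLOWED_COOKIE_SERVERS = ('\.example\.com$', '^www\.example\.org$');

# Made-up helper: true if the host matches any pattern in the list.
sub matches_any {
    my ($host, @patterns) = @_;
    foreach my $pattern (@patterns) {
        return 1 if $host =~ /$pattern/i;
    }
    return 0;
}

foreach my $host ('shop.example.com', 'www.example.org', 'example.com.evil.net') {
    my $verdict = matches_any($host, @ALLOWED_COOKIE_SERVERS) ? 'allowed' : 'not allowed';
    print "$host: $verdict\n";
}
```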

You can remove scripts from HTML pages by setting $REMOVE_SCRIPTS=1.
  This helps with anonymity somewhat by removing some JavaScript (but not
  all!).  It also removes most popup ads.  :)

The HEAD method is now supported more cleanly.

Rare net_path form of relative URL (i.e. like "//host.com/path/etc")
  is now supported, for completeness and safety.
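
A net_path URL keeps the scheme of the page it appears on and
replaces everything after it.  A minimal sketch of that one case
(the routine name is made up, and this is not the script's own
URL-handling code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Resolve "//host/path" against a base URL by borrowing the base's scheme.
# Anything that isn't a net_path is returned unchanged.
sub resolve_net_path {
    my ($rel, $base) = @_;
    return $rel unless $rel =~ m{^//};
    my ($scheme) = $base =~ m{^([A-Za-z][A-Za-z0-9+.-]*):}
        or return $rel;    # base has no scheme; nothing to borrow
    return "$scheme:$rel";
}

print resolve_net_path('//host.com/path/etc', 'http://www.example.com/a/b.html'), "\n";
```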

The default lists of cookie and ad servers are a bit better.


1.1, released March 9, 1999:
----------------------------

The whole format of the target URL in PATH_INFO was restructured.  It
  can be encoded however the user wishes.  This gets around PATH_INFO 
  clashes in various servers, solving most problems regarding server 
  incompatibilities I've heard about.

Cookies are now optionally supported (but off by default).

Banner ads can be filtered out.  Only a simple set of URL patterns is
  filtered out by default, but it's easy to add more entries to
  @BANNED_IMAGE_URL_PATTERNS.

Cookies from ad servers are filtered out (at least the main ones).  Again,
  the default list in @BANNED_COOKIE_SERVERS is simple, but you can
  easily add more.

Binary files are no longer getting messed up on Windows.

More HTTP headers are fixed to point back through the proxy.

Under some conditions, extra processes would hang around for hours
  and drag down the system.  Alex Freed added a timeout to solve this
  for now.  I can't reproduce the problem, so any info is appreciated.
  [9-9-1999: It may be a bug in older Apaches, fixed by upgrading to 
    Apache 1.3.6 or better.  Julian Haight reports the same problem with 
    other scripts on Apache 1.3.3, but not with Apache 1.3.6.]

Internally: code was cleaned up, URL-parsing was improved, and
  relative URL calculation was redone.



1.0, released August 3, 1998:
-----------------------------

Initial release.


========================================================================

Last Modified: September 11, 1999
http://www.jmarshall.com/tools/cgiproxy/

