===========================================
skipfish - web application security scanner
===========================================

  http://code.google.com/p/skipfish/

  * Written and maintained by Michal Zalewski <lcamtuf@google.com>.
  * Copyright 2009, 2010 Google Inc, all rights reserved.
  * Released under terms and conditions of the Apache License, version 2.0. 

--------------------
1. What is skipfish?
--------------------

Skipfish is an active web application security reconnaissance tool. It prepares 
an interactive sitemap for the targeted site by carrying out a recursive crawl 
and dictionary-based probes. The resulting map is then annotated with the 
output from a number of active (but hopefully non-disruptive) security checks. 
The final report generated by the tool is meant to serve as a foundation for 
professional web application security assessments.

Why should I bother with this particular tool?

A number of commercial and open source tools with analogous functionality are 
readily available (e.g., Nikto, Nessus); stick to the one that suits you best. 
That said, skipfish tries to address some of the common problems associated 
with web security scanners. Specific advantages include:

  * High performance: 500+ requests per second against responsive Internet 
    targets, 2000+ requests per second on LAN / MAN networks, and 7000+ requests
    per second against local instances have been observed, with a very modest
    CPU, network, and memory footprint. This can be attributed to:

    - A multiplexing, single-threaded, fully asynchronous network I/O and data 
      processing model that eliminates memory management, scheduling, and IPC 
      inefficiencies present in some multi-threaded clients.

    - Advanced HTTP/1.1 features such as range requests, content 
      compression, and keep-alive connections, as well as forced response size 
      limiting, to keep network-level overhead in check.

    - Smart response caching and advanced server behavior heuristics are 
      used to minimize unnecessary traffic.

    - Performance-oriented, pure C implementation, including a custom 
      HTTP stack. 

  * Ease of use: skipfish is highly adaptive and reliable. The scanner 
    features:

    - Heuristic recognition of obscure path- and query-based parameter 
      handling schemes.

    - Graceful handling of multi-framework sites where certain paths obey 
      completely different semantics, or are subject to different filtering
      rules.

    - Automatic wordlist construction based on site content analysis.

    - Probabilistic scanning features to allow periodic, time-bound 
      assessments of arbitrarily complex sites. 

  * Well-designed security checks: the tool is meant to provide accurate and 
    meaningful results:

    - Three-step differential probes are preferred to signature checks 
      for detecting vulnerabilities.

    - Ratproxy-style logic is used to spot subtle security problems: 
      cross-site request forgery, cross-site script inclusion, mixed content
      issues, MIME- and charset mismatches, incorrect caching directives, etc.

    - Bundled security checks are designed to handle tricky scenarios: 
      stored XSS (path, parameters, headers), blind SQL or XML injection, or
      blind shell injection.

    - Report post-processing drastically reduces the noise caused by any 
      remaining false positives or server gimmicks by identifying repetitive
      patterns. 

That said, skipfish is not a silver bullet, and may be unsuitable for certain 
purposes. For example, it does not satisfy most of the requirements outlined in 
WASC Web Application Security Scanner Evaluation Criteria (some of them on 
purpose, some out of necessity); and unlike most other projects of this type, 
it does not come with an extensive database of known vulnerabilities for 
banner-type checks.

-----------------------------------------------------
2. Most curious! What specific tests are implemented?
-----------------------------------------------------

A rough list of the security checks offered by the tool is outlined below.

  * High risk flaws (potentially leading to system compromise):

    - Server-side SQL injection (including blind vectors, numerical 
      parameters).
    - Explicit SQL-like syntax in GET or POST parameters.
    - Server-side shell command injection (including blind vectors).
    - Server-side XML / XPath injection (including blind vectors).
    - Format string vulnerabilities.
    - Integer overflow vulnerabilities. 

  * Medium risk flaws (potentially leading to data compromise):

    - Stored and reflected XSS vectors in document body (minimal JS XSS 
      support present).
    - Stored and reflected XSS vectors via HTTP redirects.
    - Stored and reflected XSS vectors via HTTP header splitting.
    - Directory traversal (including constrained vectors).
    - Assorted file POIs (server-side sources, configs, etc).
    - Attacker-supplied script and CSS inclusion vectors (stored and 
      reflected).
    - External untrusted script and CSS inclusion vectors.
    - Mixed content problems on script and CSS resources (optional).
    - Incorrect or missing MIME types on renderables.
    - Generic MIME types on renderables.
    - Incorrect or missing charsets on renderables.
    - Conflicting MIME / charset info on renderables.
    - Bad caching directives on cookie setting responses. 

  * Low risk issues (limited impact or low specificity):

    - Directory listing bypass vectors.
    - Redirection to attacker-supplied URLs (stored and reflected).
    - Attacker-supplied embedded content (stored and reflected).
    - External untrusted embedded content.
    - Mixed content on non-scriptable subresources (optional).
    - HTTP credentials in URLs.
    - Expired or not-yet-valid SSL certificates.
    - HTML forms with no XSRF protection.
    - Self-signed SSL certificates.
    - SSL certificate host name mismatches.
    - Bad caching directives on less sensitive content. 

  * Internal warnings:

    - Failed resource fetch attempts.
    - Exceeded crawl limits.
    - Failed 404 behavior checks.
    - IPS filtering detected.
    - Unexpected response variations.
    - Seemingly misclassified crawl nodes. 

  * Non-specific informational entries:

    - General SSL certificate information.
    - Significantly changing HTTP cookies.
    - Changing Server, Via, or X-... headers.
    - New 404 signatures.
    - Resources that cannot be accessed.
    - Resources requiring HTTP authentication.
    - Broken links.
    - Server errors.
    - All external links not classified otherwise (optional).
    - All external e-mails (optional).
    - All external URL redirectors (optional).
    - Links to unknown protocols.
    - Form fields that could not be autocompleted.
    - All HTML forms detected.
    - Password entry forms (for external brute-force).
    - Numerical file names (for external brute-force).
    - User-supplied links otherwise rendered on a page.
    - Incorrect or missing MIME type on less significant content.
    - Generic MIME type on less significant content.
    - Incorrect or missing charset on less significant content.
    - Conflicting MIME / charset information on less significant content.
    - OGNL-like parameter passing conventions. 

Along with a list of identified issues, skipfish also provides summary 
overviews of document types and issue types found; and an interactive sitemap, 
with nodes discovered through brute-force denoted in a distinctive way.

-----------------------------------------------------------
3. All right, I want to try it out. What do I need to know?
-----------------------------------------------------------

First and foremost, please do not be evil. Use skipfish only against services 
you own, or have permission to test.

Keep in mind that all types of security testing can be disruptive. Although the 
scanner is designed not to carry out disruptive malicious attacks, it may 
accidentally interfere with the operations of the site. You must accept the 
risk, and plan accordingly. Run the scanner against test instances where 
feasible, and be prepared to deal with the consequences if things go wrong.

Also note that the tool is meant to be used by security professionals, and is 
experimental in nature. It may return false positives or miss obvious security 
problems - and even when it operates perfectly, it is simply not meant to be a 
point-and-click application. Do not rely on its output at face value.

How to run the scanner?

To compile it, simply unpack the archive and try make. Chances are, you will 
need to install libidn first.

Next, you need to copy the desired dictionary file from dictionaries/ to 
skipfish.wl. Please read dictionaries/README-FIRST carefully to make the right 
choice. This step has a profound impact on the quality of scan results later on.
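
For instance, assuming the wordlist you settled on is named complete.wl (a 
placeholder name here - check dictionaries/ for the files actually shipped), 
the build and setup steps boil down to:

$ make
$ cp dictionaries/complete.wl skipfish.wl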

Once you have the dictionary selected, you can try:

$ ./skipfish -o output_dir http://www.example.com/some/starting/path.txt

Note that you can provide more than one starting URL if so desired; all of them 
will be crawled.

In the example above, skipfish will scan the entire www.example.com (including 
services on other ports, if linked to from the main page), and write a report 
to output_dir/index.html. You can then view this report with your favorite 
browser (JavaScript must be enabled). The index.html file is static; actual 
results are stored as a hierarchy of JSON files, suitable for machine 
processing if need be.
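
If you prefer to poke at the raw data instead of the browser view, simply 
listing the output directory is a reasonable first step (the exact file layout 
and naming may differ between versions):

$ find output_dir -type f | head -n 20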

Some sites may require authentication; for simple HTTP credentials, you can try:

$ ./skipfish -A user:pass ...other parameters...

Alternatively, if the site relies on HTTP cookies instead, log in with your 
browser or a simple curl script, and then provide skipfish with a session 
cookie:

$ ./skipfish -C name=val ...other parameters...

Other session cookies may be passed the same way, one per -C option.
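
As a rough sketch of the curl approach - the /login path and the user / pass 
field names below are made up, so substitute whatever your application actually 
uses - you could capture the session cookie and hand it to skipfish like this:

$ curl -s -i -d 'user=me&pass=secret' http://www.example.com/login \
  | grep -i set-cookie
$ ./skipfish -C "session=1234abcd" -N ...other parameters...

(-N, described below, keeps the scanner from accepting cookie changes that 
would clobber the session.)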

Certain URLs on the site may log out your session; you can combat this in two 
ways: by using the -N option, which causes the scanner to reject attempts to 
set or delete cookies; or with the -X parameter, which prevents matching URLs 
from being fetched:

$ ./skipfish -X /logout/logout.aspx ...other parameters...

The -X option is also useful for speeding up your scans by excluding /icons/, 
/doc/, /manuals/, and other standard, mundane locations along these lines. In 
general, you can use -X, plus -I (only spider URLs matching a substring) and -S 
(ignore links on pages where a substring appears in response body) to limit the 
scope of a scan any way you like - including restricting it only to a specific 
protocol and port:

$ ./skipfish -I http://example.com:1234/ ...other parameters...
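
Putting these switches together - the path names and the marker string below 
are purely illustrative - a tightly scoped scan might look like:

$ ./skipfish -I http://www.example.com/app/ -X /icons/ -X /manuals/ \
  -S "Rate this page" -o output_dir http://www.example.com/app/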

Another useful scoping option is -D - allowing you to specify additional hosts 
or domains to consider in-scope for the test. By default, all hosts appearing 
in the command-line URLs are added to the list - but you can use -D to broaden 
these rules, for example:

$ ./skipfish -D test2.example.com -o output-dir http://test1.example.com/

...or, for a domain wildcard match, use:

$ ./skipfish -D .example.com -o output-dir http://test1.example.com/

In some cases, you do not want to actually crawl a third-party domain, but you 
trust the owner of that domain enough not to worry about cross-domain content 
inclusion from that location. To suppress warnings, you can use the -B option, 
for example:

$ ./skipfish -B .google-analytics.com -B .googleapis.com ...other parameters...

By default, skipfish sends minimalistic HTTP headers to reduce the amount of 
data exchanged over the wire; some sites examine User-Agent strings or header 
ordering to reject unsupported clients, however. In such a case, you can use -b 
ie or -b ffox to mimic one of the two popular browsers.

When it comes to customizing your HTTP requests, you can also use the -H option 
to insert any additional, non-standard headers; or -F to define a custom 
mapping between a host and an IP (bypassing the resolver). The latter feature 
is particularly useful for not-yet-launched or legacy services.
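
A quick, illustrative example of these request-shaping options - the header 
name and the IP mapping are invented, and the name=val / host=IP syntax should 
be double-checked against the summary printed by ./skipfish -h:

$ ./skipfish -b ffox -H X-Scan-Marker=skipfish \
  -F staging.example.com=192.0.2.10 ...other parameters...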

Some sites may be too big to scan in a reasonable timeframe. If the site 
features well-defined tarpits - for example, 100,000 nearly identical user 
profiles as a part of a social network - these specific locations can be 
excluded with -X or -S. In other cases, you may need to resort to other 
settings: -d limits crawl depth to a specified number of subdirectories; -c 
limits the number of children per directory; and -r limits the total number of 
requests to send in a scan.
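
For example, to cap a sprawling site at five directory levels, 100 children per 
directory, and 200,000 requests in total (all three numbers are arbitrary - 
tune them to the site at hand):

$ ./skipfish -d 5 -c 100 -r 200000 ...other parameters...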

An interesting option is available for repeated assessments: -p. By specifying 
a percentage between 1 and 100%, it is possible to tell the crawler to follow 
fewer than 100% of all links, and try fewer than 100% of all dictionary 
entries. This - naturally - limits the completeness of a scan, but unlike most 
other settings, it does so in a balanced, non-deterministic manner. It is 
extremely useful when you are setting up time-bound, but periodic assessments 
of your infrastructure. Another related option is -q, which sets the initial 
random seed for the crawler to a specified value. This can be used to exactly 
reproduce a previous scan to compare results. Randomness is relied upon most 
heavily in the -p mode, but also for making a couple of other scan management 
decisions elsewhere.
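
A sketch of such a periodic, reproducible setup - the percentage and the seed 
below are arbitrary examples:

$ ./skipfish -p 25 -q 12345678 -o weekly_scan http://www.example.com/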

Some particularly complex (or broken) services may involve a very high number 
of identical or nearly identical pages. Although these occurrences are by 
default grayed out in the report, they still use up some screen real estate and 
take a while to process at the JavaScript level. In such extreme cases, you may 
use the
-Q option to suppress reporting of duplicate nodes altogether, before the 
report is written. This may give you a less comprehensive understanding of how 
the site is organized, but has no impact on test coverage.

In certain quick assessments, you might also have no interest in paying any 
particular attention to the desired functionality of the site - hoping to 
explore non-linked secrets only. In such a case, you may specify -P to inhibit 
all HTML parsing. This limits the coverage and takes away the ability for the 
scanner to learn new keywords by looking at the HTML, but speeds up the test 
dramatically. Another similarly crippling option that reduces the risk of 
persistent effects of a scan is -O, which inhibits all form parsing and 
submission steps.
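
A minimal "fast and dumb" probe along these lines, skipping both HTML parsing 
and form submission, could be invoked as:

$ ./skipfish -P -O -o output_dir http://www.example.com/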

By default, skipfish complains loudly about all MIME or character set 
mismatches on renderable documents, and classifies many of them as "medium 
risk"; this is because, if any user-controlled content is returned, the 
situation could lead to cross-site scripting attacks in certain browsers. On 
some poorly designed and maintained sites, this may contribute too much noise; 
if so, you may use -J to mark these issues as "low risk" unless the scanner can 
explicitly see its own user input being echoed back on the resulting page. 
This may miss many subtle attack vectors, though.

Some sites that handle sensitive user data care about SSL - and about getting 
it right. Skipfish may optionally assist you in figuring out problematic mixed 
content scenarios - use the -M option to enable this. The scanner will complain 
about situations such as http:// scripts being loaded on https:// pages - but 
will disregard non-risk scenarios such as images.

Likewise, certain pedantic sites may care about cases where caching is 
restricted on the HTTP/1.1 level, but no explicit HTTP/1.0 caching directive is 
given; specifying -E on the command line causes skipfish to log all such cases 
carefully.

Lastly, in some assessments that involve self-contained sites without extensive 
user content, the auditor may care about any external e-mails or HTTP links 
seen, even if they have no immediate security impact. Use the -U option to have 
these logged.

Dictionary management is a special topic, and - as mentioned - is covered in 
more detail in dictionaries/README-FIRST. Please read that file before 
proceeding. Some of the relevant options include -W to specify a custom 
wordlist, -L to suppress auto-learning, -V to suppress dictionary updates, -G 
to limit the keyword guess jar size, -R to drop old dictionary entries, and -Y 
to inhibit expensive $keyword.$extension fuzzing.
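
As an illustration only (custom.wl stands in for whatever wordlist you 
prepared), a scan with a custom dictionary, no keyword auto-learning, and no 
extension brute-forcing might look like:

$ ./skipfish -W custom.wl -L -Y -o output_dir http://www.example.com/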

Skipfish also features a form auto-completion mechanism in order to maximize 
scan coverage. The values should be non-malicious, as they are not meant to 
implement security checks - but rather, to get past input validation logic. You 
can define additional rules, or override existing ones, with the -T option (-T 
form_field_name=field_value, e.g. -T login=test123 -T password=test321 - 
although note that -C and -A are a much better method of logging in).

There is also a handful of performance-related options. Use -g to set the 
maximum number of connections to maintain, globally, to all targets (it is 
sensible to keep this under 50 or so to avoid overwhelming the TCP/IP stack on 
your system or on the nearby NAT / firewall devices); and -m to set the per-IP 
limit (experiment a bit: 2-4 is usually good for localhost, 4-8 for local 
networks, 10-20 for external targets, 30+ for really lagged or non-keep-alive 
hosts). You can also use -w to set the I/O timeout (i.e., skipfish will wait 
only so long for an individual read or write), and -t to set the total request 
timeout, to account for really slow or really fast sites.

Lastly, -f controls the maximum number of consecutive HTTP errors you are 
willing to see before aborting the scan; and -s sets the maximum length of a 
response to fetch and parse (longer responses will be truncated).
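
Putting these knobs together - every value below is just a plausible starting 
point for a well-behaved external target, not a recommendation:

$ ./skipfish -g 50 -m 10 -w 10 -t 30 -f 50 -s 400000 ...other parameters...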

--------------------------------
4. But seriously, how to run it?
--------------------------------

A standard, authenticated scan of a well-designed and self-contained site 
(warns about all external links, e-mails, mixed content, and caching header 
issues):

$ ./skipfish -MEU -C "AuthCookie=value" -X /logout.aspx -o output_dir \
  http://www.example.com/

Five-connection crawl, but no brute-force; pretending to be MSIE and caring 
less about ambiguous MIME or character set mismatches:

$ ./skipfish -m 5 -LVJ -W /dev/null -o output_dir -b ie http://www.example.com/

Brute force only (no HTML link extraction), trusting links within example.com 
and timing out after 5 seconds:

$ ./skipfish -B .example.com -O -o output_dir -t 5 http://www.example.com/

For a short list of all command-line options, try ./skipfish -h.

----------------------------------------------------
5. How to interpret and address the issues reported?
----------------------------------------------------

Most of the problems reported by skipfish should be self-explanatory, assuming 
you have a good grasp of the fundamentals of web security. If you need a quick 
refresher on some of the more complicated topics, such as MIME sniffing, you 
may enjoy our comprehensive Browser Security Handbook as a starting point:

  http://code.google.com/p/browsersec/

If you still need assistance, there are several organizations that put a 
considerable effort into documenting and explaining many of the common web 
security threats, and advising the public on how to address them. I encourage 
you to refer to the materials published by OWASP and the Web Application 
Security Consortium, amongst others:

  * http://www.owasp.org/index.php/Category:Principle
  * http://www.owasp.org/index.php/Category:OWASP_Guide_Project
  * http://www.webappsec.org/projects/articles/

Although I am happy to diagnose problems with the scanner itself, I regrettably 
cannot offer any assistance with the inner workings of third-party web 
applications.

---------------------------------------
6. Known limitations / feature wishlist
---------------------------------------

Below is a list of features currently missing in skipfish. If you wish to 
improve the tool by contributing code in one of these areas, please let me know:

  * Buffer overflow checks: after careful consideration, I suspect there is 
    no reliable way to test for buffer overflows remotely. Much like the actual 
    fault condition we are looking for, proper buffer size checks may also
    result in uncaught exceptions, 500 messages, etc. I would love to be proved
    wrong, though. 

  * Fully-fledged JavaScript XSS detection: several rudimentary checks are 
    present in the code, but no proper script engine is built in to evaluate 
    expressions and DOM access. 

  * Variable length encoding character consumption / injection bugs: these 
    problems seem to be largely addressed at the browser level at this point,
    so they were a much lower priority at the time of this writing. 

  * Security checks and link extraction for third-party, plugin-based content 
    (Flash, Java, PDF, etc). 

  * Password brute-force and numerical filename brute-force probes. 

  * Search engine integration (vhosts, starting paths). 

  * VIEWSTATE decoding. 

  * NTLM and digest authentication. 

  * Proxy support: somewhat incompatible with performance control features 
    currently employed by skipfish; but in the long run, should be provided as
    a last-resort option. 

  * Scan resume option. 

  * Standalone installation (make install) support. 

  * Config file support. 

-------------------------------------
7. Oy! Something went horribly wrong!
-------------------------------------

There is no web crawler so good that there wouldn't be a web framework to one 
day set it on fire. If you encounter what appears to be bad behavior (e.g., a 
scan that takes forever and generates too many requests, completely bogus nodes 
in scan output, or outright crashes), please recompile the scanner with:

$ make clean debug

...and re-run it this way:

$ ./skipfish [...previous options...] 2>logfile.txt

You can then inspect logfile.txt to get an idea what went wrong; if it looks 
like a scanner problem, please scrub any sensitive information from the log 
file and send it to the author.

If the scanner crashed, please recompile it as indicated above, and then type:

$ ulimit -c unlimited
$ ./skipfish [...previous options...] 2>logfile.txt
$ gdb --batch -ex back ./skipfish core

...and be sure to send the author the output of that last command as well.

-----------------------
8. Credits and feedback
-----------------------

Skipfish is made possible thanks to the contributions of, and valuable feedback 
from, Google's information security engineering team.

If you have any bug reports, questions, suggestions, or concerns regarding the 
application, the author can be reached at lcamtuf@google.com.