===========================================
skipfish - web application security scanner
===========================================

  http://code.google.com/p/skipfish/

* Written and maintained by Michal Zalewski <lcamtuf@google.com>.

* Copyright 2009, 2010 Google Inc, rights reserved.

* Released under terms and conditions of the Apache License, version 2.0.

--------------------
1. What is skipfish?
--------------------

Skipfish is an active web application security reconnaissance tool. It prepares
an interactive sitemap for the targeted site by carrying out a recursive crawl
and dictionary-based probes. The resulting map is then annotated with the
output from a number of active (but hopefully non-disruptive) security checks.
The final report generated by the tool is meant to serve as a foundation for
professional web application security assessments.

Why should I bother with this particular tool?

A number of commercial and open source tools with analogous functionality are
readily available (e.g., Nikto, Nessus); stick to the one that suits you best.
That said, skipfish tries to address some of the common problems associated
with web security scanners. Specific advantages include:

* High performance: 500+ requests per second against responsive Internet
  targets, 2000+ requests per second on LAN / MAN networks, and 7000+ requests
  per second against local instances have been observed, with a very modest
  CPU, network, and memory footprint. This can be attributed to:

  - A multiplexing, single-threaded, fully asynchronous network I/O and data
    processing model that eliminates the memory management, scheduling, and
    IPC inefficiencies present in some multi-threaded clients.

  - Advanced HTTP/1.1 features such as range requests, content
    compression, and keep-alive connections, as well as forced response size
    limiting, to keep network-level overhead in check.

  - Smart response caching and advanced server behavior heuristics,
    used to minimize unnecessary traffic.

  - A performance-oriented, pure C implementation, including a custom
    HTTP stack.

* Ease of use: skipfish is highly adaptive and reliable. The scanner
  features:

  - Heuristic recognition of obscure path- and query-based parameter
    handling schemes.

  - Graceful handling of multi-framework sites where certain paths obey
    completely different semantics, or are subject to different filtering
    rules.

  - Automatic wordlist construction based on site content analysis.

  - Probabilistic scanning features to allow periodic, time-bound
    assessments of arbitrarily complex sites.

* Well-designed security checks: the tool is meant to provide accurate and
  meaningful results:

  - Three-step differential probes are preferred to signature checks
    for detecting vulnerabilities.

  - Ratproxy-style logic is used to spot subtle security problems:
    cross-site request forgery, cross-site script inclusion, mixed content
    issues, MIME and charset mismatches, incorrect caching directives, etc.

  - Bundled security checks are designed to handle tricky scenarios:
    stored XSS (path, parameters, headers), blind SQL or XML injection, or
    blind shell injection.

  - Report post-processing drastically reduces the noise caused by any
    remaining false positives or server gimmicks by identifying repetitive
    patterns.

That said, skipfish is not a silver bullet, and may be unsuitable for certain
purposes. For example, it does not satisfy most of the requirements outlined in
the WASC Web Application Security Scanner Evaluation Criteria (some of them on
purpose, some out of necessity); and unlike most other projects of this type,
it does not come with an extensive database of known vulnerabilities for
banner-type checks.

-----------------------------------------------------
2. Most curious! What specific tests are implemented?
-----------------------------------------------------

A rough list of the security checks offered by the tool is outlined below.

* High risk flaws (potentially leading to system compromise):

  - Server-side SQL injection (including blind vectors, numerical
    parameters).
  - Explicit SQL-like syntax in GET or POST parameters.
  - Server-side shell command injection (including blind vectors).
  - Server-side XML / XPath injection (including blind vectors).
  - Format string vulnerabilities.
  - Integer overflow vulnerabilities.

* Medium risk flaws (potentially leading to data compromise):

  - Stored and reflected XSS vectors in document body (minimal JS XSS
    support present).
  - Stored and reflected XSS vectors via HTTP redirects.
  - Stored and reflected XSS vectors via HTTP header splitting.
  - Directory traversal (including constrained vectors).
  - Assorted file POIs (server-side sources, configs, etc).
  - Attacker-supplied script and CSS inclusion vectors (stored and
    reflected).
  - External untrusted script and CSS inclusion vectors.
  - Mixed content problems on script and CSS resources (optional).
  - Incorrect or missing MIME types on renderables.
  - Generic MIME types on renderables.
  - Incorrect or missing charsets on renderables.
  - Conflicting MIME / charset info on renderables.
  - Bad caching directives on cookie-setting responses.

* Low risk issues (limited impact or low specificity):

  - Directory listing bypass vectors.
  - Redirection to attacker-supplied URLs (stored and reflected).
  - Attacker-supplied embedded content (stored and reflected).
  - External untrusted embedded content.
  - Mixed content on non-scriptable subresources (optional).
  - HTTP credentials in URLs.
  - Expired or not-yet-valid SSL certificates.
  - HTML forms with no XSRF protection.
  - Self-signed SSL certificates.
  - SSL certificate host name mismatches.
  - Bad caching directives on less sensitive content.

* Internal warnings:

  - Failed resource fetch attempts.
  - Exceeded crawl limits.
  - Failed 404 behavior checks.
  - IPS filtering detected.
  - Unexpected response variations.
  - Seemingly misclassified crawl nodes.

* Non-specific informational entries:

  - General SSL certificate information.
  - Significantly changing HTTP cookies.
  - Changing Server, Via, or X-... headers.
  - New 404 signatures.
  - Resources that cannot be accessed.
  - Resources requiring HTTP authentication.
  - Broken links.
  - Server errors.
  - All external links not classified otherwise (optional).
  - All external e-mails (optional).
  - All external URL redirectors (optional).
  - Links to unknown protocols.
  - Form fields that could not be autocompleted.
  - All HTML forms detected.
  - Password entry forms (for external brute-force).
  - Numerical file names (for external brute-force).
  - User-supplied links otherwise rendered on a page.
  - Incorrect or missing MIME type on less significant content.
  - Generic MIME type on less significant content.
  - Incorrect or missing charset on less significant content.
  - Conflicting MIME / charset information on less significant content.
  - OGNL-like parameter passing conventions.

Along with a list of identified issues, skipfish also provides summary
overviews of document types and issue types found; and an interactive sitemap,
with nodes discovered through brute-force denoted in a distinctive way.

-----------------------------------------------------------
3. All right, I want to try it out. What do I need to know?
-----------------------------------------------------------

First and foremost, please do not be evil. Use skipfish only against services
you own, or have permission to test.

Keep in mind that all types of security testing can be disruptive. Although the
scanner is designed not to carry out malicious attacks, it may accidentally
interfere with the operations of the site. You must accept the risk, and plan
accordingly. Run the scanner against test instances where feasible, and be
prepared to deal with the consequences if things go wrong.

Also note that the tool is meant to be used by security professionals, and is
experimental in nature. It may return false positives or miss obvious security
problems - and even when it operates perfectly, it is simply not meant to be a
point-and-click application. Do not rely on its output at face value.

Running the tool against vendor-supplied demo sites is not a good way to
evaluate it, as they usually approximate vulnerabilities very imperfectly; we
made no effort to accommodate these cases.

Lastly, the scanner is simply not designed for dealing with rogue and
misbehaving HTTP servers - and offers no guarantees of safe (or sane) behavior
there.

--------------------------
4. How to run the scanner?
--------------------------

To compile it, simply unpack the archive and try make. Chances are, you will
need to install libidn first.

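On a Debian-style system, the steps above might look like the following sketch; the package and archive names are assumptions and may differ on your distribution:

```shell
# Illustrative build sequence - package and archive names are assumptions,
# not part of the skipfish distribution.
sudo apt-get install libidn11-dev   # the libidn dependency mentioned above
tar xzf skipfish.tgz                # unpack the downloaded archive
cd skipfish
make                                # builds the ./skipfish binary
```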
Next, you need to copy the desired dictionary file from dictionaries/ to
skipfish.wl. Please read dictionaries/README-FIRST carefully to make the right
choice. This step has a profound impact on the quality of scan results later on.

Once you have the dictionary selected, you can try:

$ ./skipfish -o output_dir http://www.example.com/some/starting/path.txt

Note that you can provide more than one starting URL if so desired; all of them
will be crawled.

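For example, a sketch with two starting URLs (the paths are illustrative):

```shell
# Multiple starting URLs - both are crawled in a single scan and reported to
# the same output directory. The /app1/ and /app2/ paths are placeholders.
./skipfish -o output_dir \
    http://www.example.com/app1/ \
    http://www.example.com/app2/
```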
In the example above, skipfish will scan the entire www.example.com site
(including services on other ports, if linked to from the main page), and write
a report to output_dir/index.html. You can then view this report with your
favorite browser (JavaScript must be enabled). The index.html file is static;
actual results are stored as a hierarchy of JSON files, suitable for machine
processing if need be.

Some sites may require authentication; for simple HTTP credentials, you can try:

$ ./skipfish -A user:pass ...other parameters...

Alternatively, if the site relies on HTTP cookies instead, log in with your
browser or a simple curl script, and then provide skipfish with a session
cookie:

$ ./skipfish -C name=val ...other parameters...

Other session cookies may be passed the same way, one per -C option.

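As a sketch of the curl route: log in once with curl, saving cookies to a jar, then hand the session cookie to skipfish via -C. The login URL, form field names, and the "SessionID" cookie name below are all placeholders:

```shell
# Log in once, saving cookies to a Netscape-format jar. The URL and form
# fields are placeholders for your site's real login endpoint:
#   curl -s -c cookies.txt -d 'user=me&pass=secret' http://www.example.com/login
#
# Lines in the resulting jar are tab-separated; fields 6 and 7 hold the
# cookie name and value. Simulate one such line here for illustration:
printf 'www.example.com\tFALSE\t/\tFALSE\t0\tSessionID\tabc123\n' > cookies.txt

# Assemble the name=value pair expected by skipfish's -C option:
SESSION=$(awk '$6 == "SessionID" { print $6 "=" $7 }' cookies.txt)
echo "$SESSION"    # prints: SessionID=abc123
```

The resulting value would then be passed along as ./skipfish -C "$SESSION" ...other parameters...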
Certain URLs on the site may log out your session; you can combat this in two
ways: by using the -N option, which causes the scanner to reject attempts to
set or delete cookies; or with the -X parameter, which prevents matching URLs
from being fetched:

$ ./skipfish -X /logout/logout.aspx ...other parameters...

The -X option is also useful for speeding up your scans by excluding /icons/,
/doc/, /manuals/, and other standard, mundane locations along these lines. In
general, you can use -X, plus -I (only spider URLs matching a substring) and -S
(ignore links on pages where a substring appears in the response body) to limit
the scope of a scan any way you like - including restricting it to a specific
protocol and port:

$ ./skipfish -I http://example.com:1234/ ...other parameters...

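For instance, a sketch combining these scoping options (the excluded paths and the -S marker string are just examples):

```shell
# Hypothetical scoping example: spider only URLs under http://example.com:1234/,
# skip boilerplate directories, and ignore links on any page whose body
# contains a chosen marker string.
./skipfish -I http://example.com:1234/ \
           -X /icons/ -X /doc/ -X /manuals/ \
           -S "Do not index" \
           -o output_dir http://example.com:1234/
```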
Another useful scoping option is -D - allowing you to specify additional hosts
or domains to consider in-scope for the test. By default, all hosts appearing
in the command-line URLs are added to the list - but you can use -D to broaden
these rules, for example:

$ ./skipfish -D test2.example.com -o output-dir http://test1.example.com/

...or, for a domain wildcard match, use:

$ ./skipfish -D .example.com -o output-dir http://test1.example.com/

In some cases, you do not want to actually crawl a third-party domain, but you
trust the owner of that domain enough not to worry about cross-domain content
inclusion from that location. To suppress warnings, you can use the -B option,
for example:

$ ./skipfish -B .google-analytics.com -B .googleapis.com ...other parameters...

By default, skipfish sends minimalistic HTTP headers to reduce the amount of
data exchanged over the wire; some sites examine User-Agent strings or header
ordering to reject unsupported clients, however. In such a case, you can use
-b ie or -b ffox to mimic one of the two popular browsers.

When it comes to customizing your HTTP requests, you can also use the -H option
to insert any additional, non-standard headers; or -F to define a custom
mapping between a host name and an IP address (bypassing the resolver). The
latter feature is particularly useful for not-yet-launched or legacy services.

Some sites may be too big to scan in a reasonable timeframe. If the site
features well-defined tarpits - for example, 100,000 nearly identical user
profiles as a part of a social network - these specific locations can be
excluded with -X or -S. In other cases, you may need to resort to other
settings: -d limits crawl depth to a specified number of subdirectories; -c
limits the number of children per directory; and -r limits the total number of
requests to send in a scan.

An interesting option is available for repeated assessments: -p. By specifying
a percentage between 1 and 100%, it is possible to tell the crawler to follow
fewer than 100% of all links, and try fewer than 100% of all dictionary
entries. This - naturally - limits the completeness of a scan, but unlike most
other settings, it does so in a balanced, non-deterministic manner. It is
extremely useful when you are setting up time-bound, but periodic assessments
of your infrastructure. Another related option is -q, which sets the initial
random seed for the crawler to a specified value. This can be used to exactly
reproduce a previous scan to compare results. Randomness is relied upon most
heavily in the -p mode, but also for making a couple of other scan management
decisions elsewhere.

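A sketch of such a repeatable partial scan - the percentage and seed values are arbitrary illustrations:

```shell
# Hypothetical time-bound assessment: follow roughly 20% of links and
# dictionary entries, with a fixed random seed so a later run with the same
# -q value can reproduce this scan for comparison.
./skipfish -p 20 -q 42 -o output_dir http://www.example.com/
```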
Some particularly complex (or broken) services may involve a very high number
of identical or nearly identical pages. Although these occurrences are by
default grayed out in the report, they still use up some screen real estate
and take a while to process on the JavaScript level. In such extreme cases,
you may use the -Q option to suppress reporting of duplicate nodes altogether
before the report is written. This may give you a less comprehensive
understanding of how the site is organized, but has no impact on test coverage.

In certain quick assessments, you might also have no interest in paying any
particular attention to the desired functionality of the site - hoping to
explore non-linked secrets only. In such a case, you may specify -P to inhibit
all HTML parsing. This limits coverage and takes away the scanner's ability to
learn new keywords by looking at the HTML, but speeds up the test dramatically.
Another similarly crippling option that reduces the risk of persistent effects
of a scan is -O, which inhibits all form parsing and submission steps.

By default, skipfish complains loudly about all MIME or character set
mismatches on renderable documents, and classifies many of them as "medium
risk"; this is because, if any user-controlled content is returned, the
situation could lead to cross-site scripting attacks in certain browsers. On
some poorly designed and maintained sites, this may contribute too much noise;
if so, you may use -J to mark these issues as "low risk" unless the scanner
explicitly sees its own user input being echoed back on the resulting page.
This may miss many subtle attack vectors, though.

Some sites that handle sensitive user data care about SSL - and about getting
it right. Skipfish may optionally assist you in figuring out problematic mixed
content scenarios - use the -M option to enable this. The scanner will complain
about situations such as http:// scripts being loaded on https:// pages - but
will disregard non-risk scenarios such as images.

Likewise, certain pedantic sites may care about cases where caching is
restricted on the HTTP/1.1 level, but no explicit HTTP/1.0 caching directive
is given. Specifying -E on the command line causes skipfish to log all such
cases carefully.

Lastly, in some assessments that involve self-contained sites without extensive
user content, the auditor may care about any external e-mails or HTTP links
seen, even if they have no immediate security impact. Use the -U option to have
these logged.

Dictionary management is a special topic, and - as mentioned - is covered in
more detail in dictionaries/README-FIRST. Please read that file before
proceeding. Some of the relevant options include -W to specify a custom
wordlist, -L to suppress auto-learning, -V to suppress dictionary updates, -G
to limit the keyword guess jar size, -R to drop old dictionary entries, and -Y
to inhibit expensive $keyword.$extension fuzzing.

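As a sketch of combining these, the wordlist file name below is a placeholder:

```shell
# Hypothetical dictionary setup: use a custom wordlist, suppress auto-learning
# of new keywords (-L), and suppress dictionary updates (-V), so the wordlist
# file is left untouched by the scan.
./skipfish -W custom.wl -L -V -o output_dir http://www.example.com/
```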
Skipfish also features a form auto-completion mechanism in order to maximize
scan coverage. The values should be non-malicious, as they are not meant to
implement security checks - but rather, to get past input validation logic. You
can define additional rules, or override existing ones, with the -T option (-T
form_field_name=field_value, e.g. -T login=test123 -T password=test321 -
although note that -C and -A are a much better method of logging in).

There is also a handful of performance-related options. Use -g to set the
maximum number of connections to maintain, globally, to all targets (it is
sensible to keep this under 50 or so to avoid overwhelming the TCP/IP stack on
your system or on nearby NAT / firewall devices); and -m to set the per-IP
limit (experiment a bit: 2-4 is usually good for localhost, 4-8 for local
networks, 10-20 for external targets, 30+ for really lagged or non-keep-alive
hosts). You can also use -w to set the I/O timeout (i.e., skipfish will wait
only so long for an individual read or write), and -t to set the total request
timeout, to account for really slow or really fast sites.

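As a sketch, a scan tuned for a responsive external target along the lines suggested above - the specific values are illustrative, not recommendations:

```shell
# Hypothetical tuning for an external target: at most 40 connections globally,
# 15 per IP, a 10-second I/O timeout, and a 30-second total request timeout.
./skipfish -g 40 -m 15 -w 10 -t 30 -o output_dir http://www.example.com/
```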
Lastly, -f controls the maximum number of consecutive HTTP errors you are
willing to see before aborting the scan; and -s sets the maximum length of a
response to fetch and parse (longer responses will be truncated).

--------------------------------
5. But seriously, how to run it?
--------------------------------

A standard, authenticated scan of a well-designed and self-contained site
(warns about all external links, e-mails, mixed content, and caching header
issues):

$ ./skipfish -MEU -C "AuthCookie=value" -X /logout.aspx -o output_dir \
    http://www.example.com/

Five-connection crawl, but no brute-force; pretending to be MSIE and caring
less about ambiguous MIME or character set mismatches:

$ ./skipfish -m 5 -LVJ -W /dev/null -o output_dir -b ie http://www.example.com/

Brute force only (no HTML link extraction), trusting links within example.com
and timing out after 5 seconds:

$ ./skipfish -B .example.com -P -o output_dir -t 5 http://www.example.com/

For a short list of all command-line options, try ./skipfish -h.

----------------------------------------------------
6. How to interpret and address the issues reported?
----------------------------------------------------

Most of the problems reported by skipfish should be self-explanatory, assuming
you have a good grasp of the fundamentals of web security. If you need a quick
refresher on some of the more complicated topics, such as MIME sniffing, you
may enjoy our comprehensive Browser Security Handbook as a starting point:

  http://code.google.com/p/browsersec/

If you still need assistance, there are several organizations that put
considerable effort into documenting and explaining many of the common web
security threats, and advising the public on how to address them. I encourage
you to refer to the materials published by OWASP and the Web Application
Security Consortium, amongst others:

* http://www.owasp.org/index.php/Category:Principle
* http://www.owasp.org/index.php/Category:OWASP_Guide_Project
* http://www.webappsec.org/projects/articles/

Although I am happy to diagnose problems with the scanner itself, I regrettably
cannot offer any assistance with the inner workings of third-party web
applications.

---------------------------------------
7. Known limitations / feature wishlist
---------------------------------------

Below is a list of features currently missing in skipfish. If you wish to
improve the tool by contributing code in one of these areas, please let me
know:

* Buffer overflow checks: after careful consideration, I suspect there is
  no reliable way to test for buffer overflows remotely. Much like the actual
  fault condition we are looking for, proper buffer size checks may also
  result in uncaught exceptions, 500 messages, etc. I would love to be proved
  wrong, though.

* Fully-fledged JavaScript XSS detection: several rudimentary checks are
  present in the code, but there is no proper script engine built in to
  evaluate expressions and DOM access.

* Variable length encoding character consumption / injection bugs: these
  problems seem to be largely addressed on the browser level at this point,
  so they were a much lower priority at the time of this writing.

* Security checks and link extraction for third-party, plugin-based content
  (Flash, Java, PDF, etc).

* Password brute-force and numerical filename brute-force probes.

* Search engine integration (vhosts, starting paths).

* VIEWSTATE decoding.

* NTLM and digest authentication.

* Proxy support: somewhat incompatible with the performance control features
  currently employed by skipfish; but in the long run, it should be provided
  as a last-resort option.

* Scan resume option.

* Standalone installation (make install) support.

* Config file support.

-------------------------------------
8. Oy! Something went horribly wrong!
-------------------------------------

There is no web crawler so good that there wouldn't be a web framework to one
day set it on fire. If you encounter what appears to be bad behavior (e.g., a
scan that takes forever and generates too many requests, completely bogus nodes
in scan output, or outright crashes), please first check this page:

  http://code.google.com/p/skipfish/wiki/KnownIssues

If you can't find a satisfactory answer there, recompile the scanner with:

$ make clean debug

...and re-run it this way:

$ ./skipfish [...previous options...] 2>logfile.txt

You can then inspect logfile.txt to get an idea of what went wrong; if it looks
like a scanner problem, please scrub any sensitive information from the log
file and send it to the author.

If the scanner crashed, please recompile it as indicated above, and then type:

$ ulimit -c unlimited
$ ./skipfish [...previous options...] 2>logfile.txt
$ gdb --batch -ex back ./skipfish core

...and be sure to send the author the output of that last command as well.

-----------------------
9. Credits and feedback
-----------------------

Skipfish is made possible thanks to the contributions of, and valuable feedback
from, Google's information security engineering team.

If you have any bug reports, questions, suggestions, or concerns regarding the
application, the author can be reached at lcamtuf@google.com.