1.76b: Major clean-up of dictionary instructions.

Steve Pinkham 2010-11-21 07:43:07 -05:00
parent 088136e95e
commit 806e8eedea
4 changed files with 120 additions and 181 deletions


@ -1,3 +1,8 @@
Version 1.76b:
--------------
- Major clean-up of dictionary instructions.
Version 1.75b:
--------------


@ -20,7 +20,7 @@
#
PROGNAME = skipfish
VERSION = 1.74b
VERSION = 1.76b
OBJFILES = http_client.c database.c crawler.c analysis.c report.c
INCFILES = alloc-inl.h string-inl.h debug.h types.h http_client.h \


@ -1,195 +1,129 @@
This directory contains four alternative, hand-picked Skipfish dictionaries.
PLEASE READ THIS FILE CAREFULLY BEFORE PICKING ONE. Understanding the basic
concepts of dictionary management in this scanner is critical to getting
good results in your work.
----------------
Dictionary modes
----------------
The basic modes you should be aware of (in order of request cost):
1) Orderly crawl with no DirBuster-like brute-force at all. In this mode, the
scanner will not discover non-linked resources such as /admin,
/index.php.old, etc:
./skipfish -W /dev/null -LV [...other options...]
This mode is very fast, but *NOT* recommended for general use because of
limited coverage. Use only where absolutely necessary.
2) Orderly scan with minimal extension brute-force. In this mode, the scanner
will not discover resources such as /admin, but will discover cases such as
/index.php.old:
cp dictionaries/extensions-only.wl dictionary.wl
./skipfish -W dictionary.wl -Y [...other options...]
This method is only slightly more request-intensive than #1 and is therefore
generally recommended in cases where time is of the essence. The cost is about
90 requests per fuzzed location.
3) Directory OR extension brute-force only. In this mode, the scanner will only
try fuzzing the file name, or the extension, at any given time - but will
not try every possible ${filename}.${extension} pair from the dictionary.
cp dictionaries/complete.wl dictionary.wl
./skipfish -W dictionary.wl -Y [...other options...]
This method has a cost of about 1,700 requests per fuzzed location, and is
recommended for rapid assessments, especially when working with slow
servers.
4) Normal dictionary fuzzing. In this mode, every ${filename}.${extension}
pair will be attempted. This mode is significantly slower, but offers
superior coverage, and should be your starting point.
cp dictionaries/XXX.wl dictionary.wl
./skipfish -W dictionary.wl [...other options...]
Replace XXX with:
minimal - recommended starter dictionary, mostly focusing on backup
and source files, under 50,000 requests per fuzzed location.
medium - more thorough dictionary, focusing on common frameworks,
under 100,000 requests.
complete - all-inclusive dictionary, over 150,000 requests.
This mode is recommended when doing thorough assessments of reasonably
responsive servers.
As should be obvious, the -W option points to a dictionary to be used; the
scanner updates the file based on scan results, so please always make a
target-specific copy - do not use the master file directly, or it may be
polluted with keywords not relevant to other targets.
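For example, assuming the minimal wordlist is chosen, a target-specific copy
could be prepared and used as follows (the copy name, the report directory
given to -o, and the target URL are placeholders, shown only to make the
example complete):
cp dictionaries/minimal.wl target-example.wl
./skipfish -W target-example.wl -o target-out http://example.com/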
Additional options supported by the aforementioned modes:
-L - do not automatically learn new keywords based on site content.
This option should normally not be used, since *not* using it
significantly improves the coverage of minimal.wl.
-G num - specifies jar size for keyword candidates selected from the
content; up to <num> candidates are kept and tried during
brute-force checks; when one of them results in a unique
non-404 response, it is promoted to the dictionary proper.
-V - prevents the scanner from updating the dictionary file with
newly discovered keywords and keyword usage stats (i.e., all
new findings are discarded on exit).
-Y - inhibits full ${filename}.${extension} brute-force: the scanner
will only brute-force one component at a time. This greatly
improves scan times, but reduces coverage.
-R num - purges all dictionary entries that had no non-404 hits for
the last <num> scans. Prevents dictionary creep in repeated
assessments, but use with care!
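As an illustration, a scan that fuzzes one component at a time, keeps up to
100 keyword candidates, and purges entries with no hits in the last 3 scans
might be invoked like this (the numeric values, output directory, and target
URL are placeholders, not recommendations):
./skipfish -W dictionary.wl -Y -G 100 -R 3 -o output-dir http://example.com/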
-----------------------------
Dictionary management basics:
-----------------------------
1) Each dictionary may consist of a number of extensions, and a number of
"regular" keywords. Extensions are considered just a special subset of
the keyword list.
2) Use -W to specify the dictionary file to use. The dictionary may be
custom, but must conform to the following format (a short example appears
after this list):
type hits total_age last_age keyword
...where 'type' is either 'e' or 'w' (extension or wordlist); 'hits'
is the total number of times this keyword resulted in a non-404 hit
in all previous scans; 'total_age' is the number of scan cycles this
word is in the dictionary; 'last_age' is the number of scan cycles
since the last 'hit'; and 'keyword' is the actual keyword.
Do not duplicate extensions as keywords - if you already have 'html' as
an 'e' entry, there is no need to also create a 'w' one.
There must be no empty lines, malformed lines, or comments in the wordlist
file. Extension keywords must have no leading dot (e.g., 'exe', not '.exe'),
and keywords must not be URL-encoded (e.g., 'Program Files', not
'Program%20Files'). No keyword should exceed 64 characters.
If you omit -W in the command line, 'skipfish.wl' is assumed. This
file does not exist by default; this is by design.
3) The scanner will automatically learn new keywords and extensions based on
any links discovered during the scan; and will also analyze pages and
extract words to use as keyword candidates.
A capped number of candidates is kept in memory (you can set the jar size
with the -G option) in FIFO mode, and these are used for brute-force attacks.
When a particular candidate results in a non-404 hit, it is promoted to
the "real" dictionary; other candidates are discarded at the end of the
scan.
You can inhibit this auto-learning behavior by specifying -L in the
command line.
4) Keyword hit counts and age information will be updated at the end of the
scan. This can be prevented with -V.
5) Old dictionary entries with no hits for a specified number of scans can
be purged by specifying the -R <cnt> option.
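To illustrate the format described in point 2) above, a small hand-made
dictionary might look like this (the keywords and counter values are made up
for illustration):
e 2 11 0 bak
e 0 3 3 zip
w 4 11 1 admin
w 0 5 5 backup
Here, 'bak' and 'zip' are tried as extensions, while 'admin' and 'backup' are
tried as stand-alone keywords.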
----------------------------------------------
Dictionaries are used for the following tasks:
----------------------------------------------
1) When a new directory, or a file-like query or POST parameter is discovered,
the scanner attempts passing all possible <keyword> values to discover new
files, directories, etc.
2) The scanner also tests all possible <keyword>.<extension> pairs. Note that
this results in several orders of magnitude more requests, but is the only
way to discover files such as 'backup.tar.gz', 'database.csv', etc.
In some cases, you might want to inhibit this step. This can be achieved
with the -Y switch.
3) For any non-404 file or directory discovered by any other means, the scanner
also attempts all <node_filename>.<extension> combinations, to discover,
for example, entries such as 'index.php.old'. This behavior is independent
of the -Y option, since it is much less request-intensive.
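As a purely hypothetical illustration, assume the scanner just discovered a
directory /img/ containing logo.gif, and the dictionary holds the keyword
'backup' and the extension 'zip'. The probes generated by the tasks above
would then include:
/img/backup         <- task 1: keyword alone
/img/backup.zip     <- task 2: keyword.extension pair (inhibited by -Y)
/img/logo.gif.zip   <- task 3: discovered file name plus extension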
----------------------
Supplied dictionaries:
----------------------
1) Empty dictionary (-).
Simply create an empty file, then load it via -W. If you use this option
in conjunction with -L, this essentially inhibits all brute-force testing,
and results in an orderly, link-based crawl.
If -L is not used, the crawler will still attempt brute-force, but only
based on the keywords and extensions discovered when crawling the site.
This means it will likely learn keywords such as 'index' or extensions
such as 'html' - but may never attempt probing for 'log', 'old', 'bak', etc.
Both these variants are very useful for lightweight scans, but are not
particularly exhaustive.
2) Extension-only dictionary (extensions-only.wl).
This dictionary contains about 90 common file extensions, and no other
keywords. It must be used in conjunction with -Y (otherwise, it will not
behave as expected).
This is often a better alternative to a null dictionary: the scanner will
still limit brute-force primarily to file names learned on the site, but
will know about extensions such as 'log' or 'old', and will test for them
accordingly.
3) Basic extensions dictionary (minimal.wl).
This dictionary contains about 25 extensions, focusing on common entries
most likely to spell trouble (.bak, .old, .conf, .zip, etc); and about 1,700
hand-picked keywords.
This is useful for quick assessments where no obscure technologies are used.
The principal scan cost is about 42,000 requests per fuzzed directory.
Using it without -L is recommended, as the list of extensions does not
include standard framework-specific cases (.asp, .jsp, .php, etc), and
these are best learned on the fly.
** This dictionary is strongly recommended for your first experiments with
** skipfish, as it's reasonably lightweight.
You can also use this dictionary with the -Y option enabled, approximating the
behavior of most other security scanners; in this case, it will send only
about 1,700 requests per directory, and will look for the 25 secondary
extensions only on otherwise discovered resources.
4) Standard extensions dictionary (default.wl).
This dictionary contains about 60 common extensions, plus the same set of
1,700 keywords. The extensions cover most of the common, interesting web
resources.
This is a good starting point for assessments where scan times are not
a critical factor; the cost is about 100,000 requests per fuzzed
directory.
In -Y mode, it behaves nearly identically to minimal.wl, but will test a
greater set of extensions on otherwise discovered resources at a relatively
minor expense.
5) Complete extensions dictionary (complete.wl).
Contains about 90 common extensions and 1,700 keywords. These extensions
cover a broader range of media types, including some less common programming
languages, image and video formats, etc.
Useful for comprehensive assessments, at a cost of over 150,000 requests per
fuzzed directory.
In -Y mode, this dictionary offers the best coverage of all three wordlists
at a relatively low cost.
Of course, you can customize these dictionaries as you see fit. For example,
it might be a good idea to downgrade file extensions that are unlikely to
occur, given the technologies used by your target host, to regular 'w'
records.
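For instance, if the target is known not to serve JSP at all, the
corresponding 'e' entry could be rewritten as a plain keyword (the counter
values shown are illustrative):
e 1 5 2 jsp   ->   w 1 5 2 jsp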
Whichever option you choose, be sure to make a *copy* of the dictionary, and
load that copy, not the original, via -W. The specified file will be
overwritten with site-specific information unless -V is used - and you
probably want to keep the original around.
----------------------------------
Bah, these dictionaries are small!
----------------------------------
Keep in mind that web crawling is not password guessing; it is exceedingly
unlikely for web servers to have directories or files named 'henceforth',
'abating', or 'witlessly'. Because of this, using 200,000+ entry English
wordlists, or similar data sets, is largely pointless.
More importantly, doing so often leads to reduced coverage or unacceptable
scan times; with a 200k wordlist and 80 extensions, trying all combinations
for a single directory would take 30-40 hours against a slow server; and even
with a fast one, at least 5 hours is to be expected.
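A rough back-of-the-envelope calculation (the request rates are illustrative
assumptions):
200,000 keywords x 80 extensions = 16,000,000 requests per fuzzed directory;
at ~120 requests per second, that works out to roughly 37 hours, and even at
~800 requests per second, it still exceeds 5 hours.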
DirBuster uses a unique approach that seems promising at first sight - basing
its wordlists on how often a particular keyword appeared in URLs seen on the
Internet. This is interesting, but comes with two gotchas:
- Keywords related to popular websites and brands are heavily
overrepresented; DirBuster wordlists have 'bbc_news_24', 'beebie_bunny',
and 'koalabrothers' near the top of their list, but it is pretty unlikely
these keywords would be of any use in real-world assessments of a typical
site, unless it happens to be BBC or Disney.
- Some of the most interesting security-related keywords are not commonly
indexed, and may appear, say, on no more than a few dozen or a few thousand
crawled websites in the Google index. But that does not make 'AggreSpy' or
'.ssh/authorized_keys' any less interesting - in fact, you might care
about them a whole lot more.
Tread carefully; poor wordlists are one of the reasons why some web security
scanners perform worse than expected. You will almost always be better off
narrowing down or selectively extending the supplied set (and possibly
contributing back your changes upstream!), than importing a giant
wordlist scored elsewhere.