This directory contains four alternative, hand-picked Skipfish dictionaries. PLEASE READ THESE INSTRUCTIONS CAREFULLY BEFORE PICKING ONE. This is *critical* to getting decent results in your scans. ------------------------ Key command-line options ------------------------ Skipfish automatically builds and maintain dictionaries based on the URLs and HTML content encountered while crawling the site. These dictionaries are extremely useful for subsequent scans of the same target, or for future assessments of other platforms belonging to the same customer. Exactly one read-write dictionary needs to be specified for every scan. To create a blank one and feed it to skipfish, you can use this syntax: $ touch new_dict.wl $ ./skipfish -W new_dict.wl [...other options...] In addition, it is almost always beneficial to seed the scanner with any number of supplemental, read-only dictionaries with common keywords useful for brute-force attacks against any website. Supplemental dictionaries can be loaded using the following syntax: $ ./skipfish -S supplemental_dict1.wl -S supplemental_dict2.wl \ -W new_dict.wl [...other options...] Note that the -W option should be used with care. The target file will be modified at the end of the scan. The -S dictionaries, on the other hand, are purely read-only. If you don't want to create a dictionary and store discovered keywords, you can use -W- (an alias for -W /dev/null). You can and should share read-only dictionaries across unrelated scans, but a separate read-write dictionary should be used for scans of unrelated targets. Otherwise, undesirable interference may occur. With this out of the way, let's quickly review the options that may be used to fine-tune various aspects of dictionary handling: -L - do not automatically learn new keywords based on site content. The scanner will still use keywords found in the specified dictionaries, if any; but will not go beyond that set. This option should not be normally used in most scanning modes; if supplied, the scanner will not be able to discover and leverage technology-specific terms and file extensions unique to the architecture of the targeted site. -G num - change jar size for keyword candidates. Up to candidates are randomly selected from site content, and periodically retried during brute-force checks; when one of them results in a unique non-404 response, it is promoted to the dictionary proper. Unsuccessful candidates are gradually replaced with new picks, and then discarded at the end of the scan. The default jar size is 256. -R num - purge all entries in the read-write that had no non-404 hits for the last scans. This option prevents dictionary creep in repeated assessments, but needs to be used with care: it will permanently nuke a part of the dictionary. -Y - inhibit full ${filename}.${extension} brute-force. In this mode, the scanner will only brute-force one component at a time, trying all possible keywords without any extension, and then trying to append extensions to any otherwise discovered content. This greatly improves scan times, but reduces coverage. Scan modes 2 and 3 shown in the next section make use of this flag. -------------- Scanning modes -------------- The basic dictionary-dependent modes you should be aware of (in order of the associated request cost): 1) Orderly crawl with no DirBuster-like brute-force at all. In this mode, the scanner will not discover non-linked resources such as /admin, /index.php.old, etc: $ ./skipfish -W- -L [...other options...] This mode is very fast, but *NOT* recommended for general use because the lack of dictionary bruteforcing will limited the coverage. Use only where absolutely necessary. 2) Orderly scan with minimal extension brute-force. In this mode, the scanner will not discover resources such as /admin, but will discover cases such as /index.php.old (once index.php itself is spotted during an orderly crawl): $ touch new_dict.wl $ ./skipfish -S dictionaries/extensions-only.wl -W new_dict.wl -Y \ [...other options...] This method is only slightly more request-intensive than #1, and therefore, is a marginally better alternative in cases where time is of essence. It's still not recommended for most uses. The cost is about 100 requests per fuzzed location. 3) Directory OR extension brute-force only. In this mode, the scanner will only try fuzzing the file name, or the extension, at any given time - but will not try every possible ${filename}.${extension} pair from the dictionary. $ touch new_dict.wl $ ./skipfish -S dictionaries/complete.wl -W new_dict.wl -Y \ [...other options...] This method has a cost of about 2,000 requests per fuzzed location, and is recommended for rapid assessments, especially when working with slow servers or very large services. 4) Normal dictionary fuzzing. In this mode, every ${filename}.${extension} pair will be attempted. This mode is significantly slower, but offers superior coverage, and should be your starting point. $ touch new_dict.wl $ ./skipfish -S dictionaries/XXX.wl -W new_dict.wl [...other options...] Replace XXX with: minimal - recommended starter dictionary, mostly focusing on backup and source files, about 60,000 requests per fuzzed location. medium - more thorough dictionary, focusing on common frameworks, about 140,000 requests. complete - all-inclusive dictionary, over 210,000 requests. Normal fuzzing mode is recommended when doing thorough assessments of reasonably responsive servers; but it may be prohibitively expensive when dealing with very large or very slow sites. ---------------------------- More about dictionary design ---------------------------- Each dictionary may consist of a number of extensions, and a number of "regular" keywords. Extensions are considered just a special subset of the keyword list. You can create custom dictionaries, conforming to this format: type hits total_age last_age keyword ...where 'type' is either 'e' or 'w' (extension or keyword), followed by a qualifier (explained below); 'hits' is the total number of times this keyword resulted in a non-404 hit in all previous scans; 'total_age' is the number of scan cycles this word is in the dictionary; 'last_age' is the number of scan cycles since the last 'hit'; and 'keyword' is the actual keyword. Qualifiers alter the meaning of an entry in the following way: wg - generic keyword that is not associated with any specific server-side technology. Examples include 'backup', 'accounting', or 'logs'. These will be indiscriminately combined with every known extension (e.g., 'backup.php') during the fuzzing process. ws - technology-specific keyword that are unlikely to have a random extension; for example, with 'cgi-bin', testing for 'cgi-bin.php' is usually a waste of time. Keywords tagged this way will be combined only with a small set of technology-agnostic extensions - e.g., 'cgi-bin.old'. NOTE: Technology-specific keywords that in the real world, are always paired with a single, specific extension, should be combined with said extension in the 'ws' entry itself, rather than trying to accommodate them with 'wg' rules. For example, 'MANIFEST.MF' is OK. eg - generic extension that is not specific to any well-defined technology, or may pop-up in administrator- or developer-created auxiliary content. Examples include 'bak', 'old', 'txt', or 'log'. es - technology-specific extension, such as 'php', or 'cgi', that are unlikely to spontaneously accompany random 'ws' keywords. Skipfish leverages this distinction by only trying the following brute-force combinations: /some/path/wg_keyword ('index') /some/path/ws_keyword ('cgi-bin') /some/path/wg_extension ('old') /some/path/ws_extension ('php') /some/path/wg_keyword.wg_extension ('index.old') /some/path/wg_keyword.ws_extension ('index.php') /some/path/ws_keyword.ws_extension ('cgi-bin.old') To decide between 'wg' and 'ws', consider if you are likely to ever encounter files such as ${this_word}.php or ${this_word}.class. If not, tag the keyword as 'ws'. Similarly, to decide between 'eg' and 'es', think about the possibility of encountering cgi-bin.${this_ext} or zencart.${this_ext}. If it seems unlikely, choose 'es'. For your convenience, all legacy keywords and extensions, as well as any entries detected automatically, will be stored in the dictionary with a '?' qualifier. This is equivalent to 'g', and is meant to assist the user in reviewing and triaging any automatically acquired dictionary data. Other notes about dictionaries: - Do not duplicate extensions as keywords - if you already have 'html' as an 'e' entry, there is no need to also create a 'w' one. - There must be no empty or malformed lines, or comments, in the wordlist file. Extension keywords must have no leading dot (e.g., 'exe', not '.exe'), and all keywords should be NOT url-encoded (e.g., 'Program Files', not 'Program%20Files'). No keyword should exceed 64 characters. - Any valuable dictionary can be tagged with an optional '#ro' line at the beginning. This prevents it from being loaded in read-write mode. - Tread carefully; poor wordlists are one of the reasons why some web security scanners perform worse than expected. You will almost always be better off narrowing down or selectively extending the supplied set (and possibly contributing back your changes upstream!), than importing a giant wordlist scored elsewhere.