Web Analysis Module
The web analysis module examines discovered web assets to identify technologies, extract content, and prepare targets for vulnerability testing.
Why Web Analysis?
Before scanning for vulnerabilities, you need to understand what you're scanning:
Target Validation: Not all subdomains run web servers. HTTP probing identifies which hosts actually serve web content on which ports.
Technology Fingerprinting: Different technologies have different vulnerabilities. Knowing that a target runs WordPress vs Django changes your attack approach.
Attack Surface Mapping: URL collection reveals:
Hidden endpoints not linked in the UI
API routes
Admin panels
Legacy code paths
Input Discovery: Parameters and endpoints found here become test targets for vulnerability scanning. Without URL collection, scanners only see the pages they happen to reach and can miss much of the attack surface.
Efficiency: Analyzing JavaScript and crawling once, then reusing results for multiple vulnerability tests, is more efficient than having each scanner crawl independently.
Module Overview
webprobe_simple: Probe ports 80/443 (tools: httpx)
webprobe_full: Probe uncommon web ports (tools: httpx)
screenshot: Capture web screenshots (tools: nuclei)
virtualhosts: Virtual host discovery (tools: VhostFinder)
urlchecks: URL collection, passive + active (tools: urlfinder, katana, JSA)
url_gf: URL pattern classification (tools: gf, urless)
url_ext: File extension sorting (tools: custom)
jschecks: JavaScript analysis (tools: subjs, xnLinkFinder, mantra, jsluice)
fuzz: Directory fuzzing (tools: ffuf)
cms_scanner: CMS detection (tools: CMSeeK)
wordlist_gen: Custom wordlist generation (tools: custom)
wordlist_gen_roboxtractor: Robots.txt wordlist (tools: roboxtractor)
password_dict: Password dictionary generation (tools: pydictor)
iishortname: IIS shortname scanning (tools: shortscan, sns)
graphql_scan: GraphQL endpoint detection (tools: nuclei, GQLSpection)
grpc_reflection: gRPC reflection probing (tools: grpcurl)
param_discovery: Parameter discovery (tools: arjun)
websocket_checks: WebSocket auditing (tools: custom)
Configuration Options
HTTP Probing
webprobe_simple - Standard Port Probing
Identifies live web servers on standard ports (80, 443).
How It Works:
Information Extracted:
Status codes
Page titles
Web server type
Technologies detected
Content length
Redirect locations
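As a rough illustration of the probing step, the sketch below expands a host list into the candidate URLs a prober would request on ports 80 and 443. The function name `build_probe_urls` and the port-to-scheme mapping are assumptions made for this example; they are not taken from reconFTW or httpx.

```python
from typing import Dict, Iterable, List

# Assumed mapping for illustration: plain HTTP on 80, TLS on 443.
DEFAULT_PORTS: Dict[int, str] = {80: "http", 443: "https"}


def build_probe_urls(hosts: Iterable[str], ports: Dict[int, str] = DEFAULT_PORTS) -> List[str]:
    """Return one candidate URL per (host, port) pair, skipping blank entries."""
    urls = []
    for host in hosts:
        host = host.strip()
        if not host:
            continue
        for port, scheme in sorted(ports.items()):
            urls.append(f"{scheme}://{host}:{port}")
    return urls
```

Each candidate URL would then be requested, and only hosts that answer are kept as live web servers.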
Output:
Sample Output (webs_info.txt):
Configuration:
webprobe_full - Uncommon Port Probing
Probes an extended list of ports commonly used for web services.
Ports Checked:
Output:
Configuration:
Screenshots
screenshot - Web Screenshot Capture
Captures screenshots of all discovered web servers for visual analysis.
How It Works:
Output:
Change Detection:
reconFTW creates SHA256 hashes of screenshots to detect visual changes between scans:
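A minimal sketch of hash-based change detection, assuming screenshots are stored as PNG files in one directory. The function names and the file layout are hypothetical; reconFTW's own implementation may differ.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def hash_screenshots(directory: str) -> Dict[str, str]:
    """Map each PNG file name in `directory` to the SHA256 hex digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).glob("*.png"))
    }


def changed_screenshots(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return names that are new or whose digest differs from the previous scan."""
    return sorted(name for name, digest in current.items() if previous.get(name) != digest)
```

Comparing digests is cheap, but any pixel-level difference (a timestamp on the page, for example) counts as a change, so flagged screenshots still need a manual look.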
Configuration:
Virtual Hosts
virtualhosts - Virtual Host Discovery
Discovers virtual hosts by fuzzing the HTTP Host header.
How It Works:
Why It's Useful: Many servers host multiple websites on the same IP. Virtual host fuzzing reveals:
Hidden admin panels
Development sites
Internal applications
Additional attack surface
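The idea behind Host header fuzzing can be sketched as follows: send the same request to one IP with different Host values, then flag any candidate whose response differs from a baseline obtained with a random, nonexistent host name. The `Response` type and both function names are assumptions for this example, and the comparison (status code plus body length) is deliberately simple.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Response:
    status: int
    length: int


def build_request(host_header: str, path: str = "/") -> str:
    """Build a raw HTTP/1.1 request whose Host header carries the candidate name."""
    return f"GET {path} HTTP/1.1\r\nHost: {host_header}\r\nConnection: close\r\n\r\n"


def interesting_vhosts(baseline: Response, results: Dict[str, Response]) -> List[str]:
    """Return candidates whose status or length differs from the baseline response."""
    return sorted(
        name
        for name, resp in results.items()
        if (resp.status, resp.length) != (baseline.status, baseline.length)
    )
```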
Output:
Configuration:
URL Collection
urlchecks - Full URL Extraction
Collects URLs from multiple sources for full coverage.
Passive Sources:
Wayback Machine
Common Crawl
AlienVault OTX
URLScan.io
Active Sources:
Katana web crawler
JavaScript parsing
Sitemap analysis
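Combining passive and active sources comes down to merging several URL lists, removing duplicates, and discarding anything outside the target's scope. The sketch below shows one way to do that; the function names and the scope rule (root domain plus its subdomains) are assumptions for this example.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def in_scope(url: str, root_domain: str) -> bool:
    """True when the URL's host is the root domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    root = root_domain.lower()
    return host == root or host.endswith("." + root)


def merge_urls(sources: Iterable[Iterable[str]], root_domain: str) -> List[str]:
    """Combine URL lists from several sources, dropping duplicates and out-of-scope hosts."""
    seen = set()
    for source in sources:
        for url in source:
            url = url.strip()
            if url and in_scope(url, root_domain):
                seen.add(url)
    return sorted(seen)
```

Matching on `"." + root` rather than a bare suffix keeps look-alike hosts such as `evil-example.com` out of the results.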
How It Works:
Output:
Configuration:
url_gf - URL Pattern Classification
Classifies URLs by potential vulnerability patterns using gf patterns.
Patterns Detected:
xss: Potential XSS parameters
sqli: SQL injection candidates
ssrf: SSRF-prone URLs
redirect: Open redirect parameters
rce: Command injection candidates
lfi: Local file inclusion
ssti: Template injection
idor: Insecure direct object reference
debug_logic: Debug/admin endpoints
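Pattern classification can be pictured as matching each URL's parameter names against per-category lists. The sketch below is an approximation: the `PATTERNS` table is a hypothetical, heavily shortened stand-in and is not the content of the real gf pattern files.

```python
from typing import Dict, List, Set
from urllib.parse import parse_qsl, urlsplit

# Hypothetical parameter names per category, for illustration only.
PATTERNS: Dict[str, Set[str]] = {
    "xss": {"q", "search", "message"},
    "sqli": {"id", "order", "sort"},
    "ssrf": {"url", "dest", "callback"},
    "redirect": {"next", "redirect", "return"},
    "lfi": {"file", "path", "template"},
}


def classify(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs under every category whose parameter names appear in the query string."""
    matches: Dict[str, List[str]] = {name: [] for name in PATTERNS}
    for url in urls:
        query = urlsplit(url).query
        params = {key.lower() for key, _ in parse_qsl(query, keep_blank_values=True)}
        for name, candidates in PATTERNS.items():
            if params & candidates:
                matches[name].append(url)
    return matches
```

A URL can land in several categories at once, which is intended: each category feeds a different vulnerability test.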
Output:
Sample XSS Pattern Match:
Configuration:
url_ext - File Extension Sorting
Organizes URLs by file extension for targeted analysis.
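A minimal sketch of extension sorting, assuming the extension is taken from the URL path only (query strings are ignored). The function name `group_by_extension` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List
from urllib.parse import urlsplit


def group_by_extension(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs by the lowercase extension of their path; no extension maps to ''."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        suffix = PurePosixPath(urlsplit(url).path).suffix.lower().lstrip(".")
        groups[suffix].append(url)
    return dict(groups)
```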
Categories:
Configuration:
JavaScript Analysis
jschecks - Full JS Analysis
Extracts secrets, endpoints, and sensitive information from JavaScript files.
What It Finds:
API keys and tokens
AWS credentials
Internal endpoints
Hardcoded passwords
Debug information
Hidden functionality
Tools Used:
subjs: JS file discovery
xnLinkFinder: Endpoint extraction
mantra: Secret patterns
jsluice: Advanced JS parsing
nuclei: JS secret templates
sourcemapper: Source map extraction
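To give a feel for what these tools look for, the sketch below scans JavaScript text with two regular expressions: one for AWS access key IDs (which begin with `AKIA`) and one for quoted, path-like strings. Both patterns are simplified assumptions for this example and are far less thorough than the rules shipped with the tools listed above.

```python
import re
from typing import Dict, List

# AWS access key IDs start with AKIA followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Quoted strings that look like absolute paths, e.g. "/api/v1/users".
ENDPOINT = re.compile(r"""["'](/[A-Za-z0-9_\-./]+)["']""")


def scan_js(source: str) -> Dict[str, List[str]]:
    """Return candidate AWS key IDs and path-like endpoints found in JavaScript text."""
    return {
        "aws_key_ids": sorted(set(AWS_KEY_ID.findall(source))),
        "endpoints": sorted(set(ENDPOINT.findall(source))),
    }
```

Regex hits are only candidates; a matched key still has to be validated before it is reported as a leaked credential.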
How It Works:
Output:
Sample Secrets Found:
Configuration:
Directory Fuzzing
fuzz - Web Directory Fuzzing
Discovers hidden directories, files, and endpoints.
How It Works:
Wordlists:
Primary: $fuzz_wordlist
Custom wordlist generated from the target
Output:
Sample Findings:
Configuration:
CMS Detection
cms_scanner - CMS Identification
Identifies content management systems and their versions.
CMS Detected:
WordPress
Joomla
Drupal
Magento
Shopify
And 170+ more...
Information Extracted:
CMS type and version
Installed plugins/themes
Known vulnerabilities
Configuration issues
Output:
Sample Output:
Configuration:
Advanced Analysis
iishortname - IIS Shortname Scanner
Exploits the IIS shortname vulnerability to discover hidden files and directories.
How It Works:
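Windows IIS servers may expose 8.3 format (short) filenames: requests containing tilde (~) wildcards produce different responses depending on whether a matching short name exists, revealing: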
Hidden directories
Backup files
Configuration files
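For context, the sketch below approximates how a long file name maps to its 8.3 alias, which is the form a shortname scanner recovers. It is a simplification under stated assumptions: it only handles the common `~<index>` form, and the function name `short_name` is hypothetical.

```python
def short_name(filename: str, index: int = 1) -> str:
    """Approximate the 8.3 alias Windows generates for a long file name.

    Simplified: keeps the first six characters of the base name (spaces and
    extra dots removed), appends ~<index>, and truncates the extension to
    three characters. Windows uses a different, hash-based scheme once
    several names collide; that case is not modelled here.
    """
    base, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        base, ext = filename, ""
    clean = "".join(ch for ch in base.upper() if ch not in " .")
    alias = f"{clean[:6]}~{index}"
    return f"{alias}.{ext.upper()[:3]}" if ext else alias
```

Because only six characters of the name and three of the extension survive, scanner output usually needs follow-up guessing or fuzzing to recover the full file name.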
Output:
Sample Output:
Configuration:
graphql_scan - GraphQL Endpoint Detection
Discovers and analyzes GraphQL endpoints.
What It Checks:
/graphql
/graphiql
/api/graphql
Custom endpoints
Analysis:
Introspection enabled?
Schema extraction
Query suggestions
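The introspection check can be sketched as: POST a small introspection query to a candidate endpoint and see whether schema data comes back. The code below only builds the request body and inspects an already-decoded JSON response; the function names are assumptions for this example, and the HTTP call itself is left out.

```python
import json
from typing import Any, Dict

# A minimal introspection query; servers with introspection disabled
# typically return an "errors" entry instead of schema data.
INTROSPECTION_QUERY = "{ __schema { queryType { name } types { name } } }"


def introspection_payload() -> str:
    """JSON body to POST to a candidate GraphQL endpoint."""
    return json.dumps({"query": INTROSPECTION_QUERY})


def introspection_enabled(response: Dict[str, Any]) -> bool:
    """True when the decoded JSON response contains schema data."""
    data = response.get("data") or {}
    return isinstance(data, dict) and isinstance(data.get("__schema"), dict)
```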
Output:
Configuration:
param_discovery - Parameter Discovery
Discovers hidden parameters on web endpoints.
How It Works:
Output:
Sample Output:
Configuration:
Wordlist Generation
wordlist_gen - Custom Wordlist Creation
Generates target-specific wordlists from discovered content.
Sources:
JavaScript content
HTML content
URL paths
Parameter names
Domain-specific terms
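As an illustration of two of these sources (URL paths and parameter names), the sketch below pulls words out of collected URLs. The tokenization rule (a letter followed by at least two more letters or digits) and the function name are assumptions for this example.

```python
import re
from typing import Iterable, List
from urllib.parse import parse_qsl, urlsplit

# A word starts with a letter and is at least three characters long.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]{2,}")


def words_from_urls(urls: Iterable[str]) -> List[str]:
    """Collect lowercase words from URL paths and parameter names, without duplicates."""
    words = set()
    for url in urls:
        parts = urlsplit(url)
        words.update(w.lower() for w in WORD.findall(parts.path))
        for key, _ in parse_qsl(parts.query, keep_blank_values=True):
            words.update(w.lower() for w in WORD.findall(key))
    return sorted(words)
```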
Output:
Configuration:
wordlist_gen_roboxtractor - Robots.txt Analysis
Extracts historical disallowed paths from Wayback Machine.
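The core of this step is parsing `Disallow` entries out of robots.txt bodies. The sketch below does that for any number of snapshots passed in as strings; fetching the snapshots from an archive is out of scope here, and the function name is hypothetical.

```python
from typing import Iterable, List


def disallowed_paths(robots_snapshots: Iterable[str]) -> List[str]:
    """Collect unique Disallow paths from one or more robots.txt bodies."""
    paths = set()
    for body in robots_snapshots:
        for line in body.splitlines():
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            field, _, value = line.partition(":")
            if field.strip().lower() == "disallow" and value.strip():
                paths.add(value.strip())
    return sorted(paths)
```

Paths that appear only in older snapshots are often the most interesting ones, since they may point to content that was removed from robots.txt but not from the server.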
How It Works:
Output:
Configuration:
password_dict - Password Dictionary Generation
Generates target-specific password lists based on the domain name.
How It Works:
The function takes the first part of the domain (e.g., "target" from "target.com") and generates password variations using:
Leetspeak transformations (a→4, e→3, etc.)
Common suffixes (123, !, 2024, etc.)
Length constraints
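The three ingredients above can be sketched in a few lines. The substitution table, suffix list, and length bounds below are assumptions chosen for this example; the rules the actual tool applies may differ.

```python
from itertools import product
from typing import List

# Assumed substitutions and suffixes, for illustration only.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}
SUFFIXES = ["", "123", "!", "2024"]


def leet_variants(word: str) -> List[str]:
    """Every combination of keeping or substituting each leetspeak-eligible letter."""
    options = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word.lower()]
    return ["".join(combo) for combo in product(*options)]


def password_candidates(domain: str, min_len: int = 6, max_len: int = 12) -> List[str]:
    """Build candidates from the first domain label, filtered by length."""
    base = domain.split(".", 1)[0]
    candidates = {variant + suffix for variant in leet_variants(base) for suffix in SUFFIXES}
    return sorted(c for c in candidates if min_len <= len(c) <= max_len)
```

The number of variants doubles with every substitutable letter, so the length constraints also serve to keep the list at a usable size.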
Output:
Sample Output (for target.com):
Configuration:
Use Cases:
Password spraying attacks (with authorization)
Testing default/weak credential policies
Generating custom wordlists for brute-force
grpc_reflection - gRPC Reflection Probing
Discovers gRPC services with reflection enabled.
What is gRPC Reflection?
gRPC reflection allows clients to query a server for available services and methods without prior knowledge. When enabled (often for debugging), it exposes the entire API surface.
How It Works:
Output:
Sample Output:
Configuration:
Security Implications:
Exposed reflection reveals internal API structure
Service names may reveal business logic
Combined with protobuf enumeration, reflection allows full API mapping
Requirements:
grpcurl installed
Network access to gRPC ports
Output Summary
webs/webs.txt: Live web servers
webs/webs_info.txt: Detailed probe results
webs/url_extract.txt: All discovered URLs
screenshots/: Web screenshots
gf/*.txt: Pattern-classified URLs
js/js_secrets.txt: JavaScript secrets
fuzzing/: Directory fuzzing results
webs/cms_scanner.txt: CMS detection results
Best Practices
Rate Limiting: Respect target resources with appropriate rate limits
Scope Filtering: Use the -x flag to exclude out-of-scope URLs
Screenshot Review: Visual inspection often reveals interesting assets
JS Analysis Priority: JavaScript often contains the most valuable secrets
Custom Wordlists: Generated wordlists improve fuzzing effectiveness
Next Steps
Vulnerability Module - Test for security issues
Output Interpretation - Understand results