Web Analysis Module
The web analysis module examines discovered web assets to identify technologies, extract content, and prepare targets for vulnerability testing.
Module Overview
| Function | Description | Tools |
| --- | --- | --- |
| webprobe_simple | Probe ports 80/443 | httpx |
| webprobe_full | Probe uncommon web ports | httpx |
| screenshot | Capture web screenshots | nuclei |
| virtualhosts | Virtual host discovery | VhostFinder |
| urlchecks | URL collection (passive + active) | urlfinder, katana, JSA |
| url_gf | URL pattern classification | gf, urless |
| url_ext | File extension sorting | custom |
| jschecks | JavaScript analysis | subjs, xnLinkFinder, mantra, jsluice |
| fuzz | Directory fuzzing | ffuf |
| cms_scanner | CMS detection | CMSeeK |
| wordlist_gen | Custom wordlist generation | custom |
| wordlist_gen_roboxtractor | Robots.txt wordlist | roboxtractor |
| password_dict | Password dictionary generation | pydictor |
| iishortname | IIS shortname scanning | shortscan, sns |
| graphql_scan | GraphQL endpoint detection | nuclei, GQLSpection |
| grpc_reflection | gRPC reflection probing | grpcurl |
| param_discovery | Parameter discovery | arjun |
| websocket_checks | WebSocket auditing | custom |
Configuration Options
HTTP Probing
webprobe_simple - Standard Port Probing
Identifies live web servers on standard ports (80, 443).
How It Works:
Information Extracted:
Status codes
Page titles
Web server type
Technologies detected
Content length
Redirect locations
Output:
Sample Output (webs_info.txt):
Configuration:
webprobe_full - Uncommon Port Probing
Probes an extended list of ports commonly used for web services.
Ports Checked:
Output:
Configuration:
Screenshots
screenshot - Web Screenshot Capture
Captures screenshots of all discovered web servers for visual analysis.
How It Works:
Output:
Change Detection:
reconFTW creates SHA256 hashes of screenshots to detect visual changes between scans:
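The hash comparison can be sketched as follows. This is a minimal illustration, not reconFTW's actual implementation: the function names, the `*.png` filter, and the dictionary-based comparison are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def hash_screenshots(directory: str) -> dict[str, str]:
    """Map each PNG file name in `directory` to the SHA256 hex digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).glob("*.png"))
    }


def changed_screenshots(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return names that are new or whose digest differs from the previous scan."""
    return sorted(
        name for name, digest in current.items()
        if previous.get(name) != digest
    )
```

Because the digest covers the raw file bytes, any pixel-level difference (a rotating banner, a timestamp) counts as a change, so flagged screenshots still need a manual look.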
Configuration:
Virtual Hosts
virtualhosts - Virtual Host Discovery
Discovers virtual hosts by fuzzing the HTTP Host header.
How It Works:
Why It's Useful: Many servers host multiple websites on the same IP. Virtual host fuzzing reveals:
Hidden admin panels
Development sites
Internal applications
Additional attack surface
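Host header fuzzing usually works by comparing each candidate's response against a baseline response for a host name that should not exist. The sketch below covers only that comparison step. The `Fingerprint` fields, the `tolerance` value, and both function names are assumptions for illustration and do not describe VhostFinder's actual algorithm.

```python
from typing import NamedTuple


class Fingerprint(NamedTuple):
    status: int   # HTTP status code of the response
    length: int   # response body length in bytes


def candidate_hosts(words: list[str], domain: str) -> list[str]:
    """Build Host header values such as 'admin.example.com' from a wordlist."""
    return [f"{word.strip().lower()}.{domain}" for word in words if word.strip()]


def looks_like_vhost(baseline: Fingerprint, response: Fingerprint, tolerance: int = 50) -> bool:
    """Flag a response that differs from the baseline returned for a nonexistent host name."""
    if response.status != baseline.status:
        return True
    return abs(response.length - baseline.length) > tolerance
```

A candidate whose status code or body length departs from the baseline is worth a manual check; identical responses usually mean the server ignored the Host header.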
Output:
Configuration:
URL Collection
urlchecks - Full URL Extraction
Collects URLs from multiple sources for full coverage.
Passive Sources:
Wayback Machine
Common Crawl
AlienVault OTX
URLScan.io
Active Sources:
Katana web crawler
JavaScript parsing
Sitemap analysis
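Combining passive and active sources comes down to merging the lists, dropping out-of-scope hosts, and removing duplicates. The following is a minimal sketch of that merge step under those assumptions; `in_scope` and `merge_urls` are hypothetical helpers, not reconFTW functions.

```python
from urllib.parse import urlsplit


def in_scope(url: str, domain: str) -> bool:
    """True when the URL's host is the domain itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def merge_urls(passive: list[str], active: list[str], domain: str) -> list[str]:
    """Combine both source lists, drop out-of-scope entries, and deduplicate in order."""
    seen: set[str] = set()
    merged: list[str] = []
    for url in passive + active:
        url = url.strip()
        if url and url not in seen and in_scope(url, domain):
            seen.add(url)
            merged.append(url)
    return merged
```

Matching on `"." + domain` rather than a bare suffix keeps look-alike hosts such as `notexample.com` out of the results.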
How It Works:
Output:
Configuration:
url_gf - URL Pattern Classification
Classifies URLs by potential vulnerability patterns using gf patterns.
Patterns Detected:
| Pattern | Description |
| --- | --- |
| xss | Potential XSS parameters |
| sqli | SQL injection candidates |
| ssrf | SSRF-prone URLs |
| redirect | Open redirect parameters |
| rce | Command injection candidates |
| lfi | Local file inclusion |
| ssti | Template injection |
| idor | Insecure direct object reference |
| debug_logic | Debug/admin endpoints |
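The classification idea can be illustrated by matching query parameter names against per-pattern lists. This is a simplified sketch: real gf pattern files are regex-based and far larger, and the parameter lists below are hypothetical, shortened examples.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical, heavily shortened parameter lists; real gf pattern files are far larger.
PATTERNS = {
    "xss": {"q", "search", "keyword"},
    "sqli": {"id", "order", "sort"},
    "ssrf": {"url", "dest", "callback"},
    "redirect": {"next", "return", "redirect"},
    "lfi": {"file", "path", "include"},
}


def classify(url: str) -> list[str]:
    """Return the pattern names whose parameter list matches any query parameter in the URL."""
    params = {
        name.lower()
        for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)
    }
    return sorted(label for label, names in PATTERNS.items() if params & names)
```

A URL can land in several categories at once, which is why each pattern gets its own output file rather than a single label per URL.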
Output:
Sample XSS Pattern Match:
Configuration:
url_ext - File Extension Sorting
Organizes URLs by file extension for targeted analysis.
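Sorting by extension can be sketched as grouping URLs on the suffix of their path, ignoring the query string. This is an illustrative sketch only; `group_by_extension` is a hypothetical helper, and the extensions reconFTW actually tracks are set in its configuration.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def group_by_extension(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by the lowercase extension of their path; URLs without one are skipped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        ext = PurePosixPath(urlsplit(url).path).suffix.lower().lstrip(".")
        if ext:
            groups[ext].append(url)
    return dict(groups)
```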
Categories:
Configuration:
JavaScript Analysis
jschecks - Full JS Analysis
Extracts secrets, endpoints, and sensitive information from JavaScript files.
What It Finds:
API keys and tokens
AWS credentials
Internal endpoints
Hardcoded passwords
Debug information
Hidden functionality
Tools Used:
subjs: JS file discovery
xnLinkFinder: Endpoint extraction
mantra: Secret patterns
jsluice: Advanced JS parsing
nuclei: JS secret templates
sourcemapper: Source map extraction
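Secret detection in JavaScript is largely regex matching over file contents. The sketch below shows the idea with two patterns. The AWS access key ID shape (`AKIA` followed by 16 uppercase letters or digits) is widely documented; the generic API key rule and the function name are hypothetical, and none of this reproduces the rule sets shipped with mantra, jsluice, or nuclei.

```python
import re

# Illustrative patterns only; the "generic_api_key" rule is a hypothetical example.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"""api[_-]?key["']?\s*[:=]\s*["']([A-Za-z0-9_-]{16,})["']""",
        re.IGNORECASE,
    ),
}


def find_secrets(js_source: str) -> list[tuple[str, str]]:
    """Return (pattern name, matched text) pairs found in JavaScript source."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(js_source):
            findings.append((name, match.group(0)))
    return findings
```

Regex hits are candidates, not confirmed secrets: documentation placeholders and test keys match the same patterns, so each finding should be verified before it is reported.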
How It Works:
Output:
Sample Secrets Found:
Configuration:
Directory Fuzzing
fuzz - Web Directory Fuzzing
Discovers hidden directories, files, and endpoints.
How It Works:
Wordlists:
Primary:
$fuzz_wordlist
Custom generated from target
Output:
Sample Findings:
Configuration:
CMS Detection
cms_scanner - CMS Identification
Identifies content management systems and their versions.
CMS Detected:
WordPress
Joomla
Drupal
Magento
Shopify
And 170+ more...
Information Extracted:
CMS type and version
Installed plugins/themes
Known vulnerabilities
Configuration issues
Output:
Sample Output:
Configuration:
Advanced Analysis
iishortname - IIS Shortname Scanner
Exploits the IIS shortname vulnerability to discover hidden files and directories.
How It Works:
Windows IIS servers may expose 8.3 format (short) filenames through differences in how they respond to tilde (~) wildcard requests, revealing:
Hidden directories
Backup files
Configuration files
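To show what an 8.3 short name looks like, the sketch below approximates how Windows derives one from a long file name. It is deliberately simplified: it ignores names that already fit the 8.3 format, leading-dot names, and the hash-based aliases Windows switches to after several collisions. The `short_name` function is a hypothetical helper, not part of shortscan or sns.

```python
import re


def _clean(part: str) -> str:
    """Uppercase and drop characters that this simplified rule does not keep."""
    return re.sub(r"[^A-Z0-9_\-]", "", part.upper())


def short_name(filename: str, index: int = 1) -> str:
    """Approximate the 8.3 alias Windows generates for a long file name."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    stem, ext = _clean(stem), _clean(ext)[:3]
    alias = f"{stem[:6]}~{index}"
    return f"{alias}.{ext}" if ext else alias
```

A scanner that recovers an alias such as `BACKUP~1.ZIP` only learns the first six characters and a three-character extension, so the full name still has to be guessed or fuzzed.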
Output:
Sample Output:
Configuration:
graphql_scan - GraphQL Endpoint Detection
Discovers and analyzes GraphQL endpoints.
What It Checks:
/graphql
/graphiql
/api/graphql
Custom endpoints
Analysis:
Introspection enabled?
Schema extraction
Query suggestions
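The introspection check amounts to sending a query on the `__schema` field, which is defined by the GraphQL specification, and seeing whether schema data comes back. The sketch below builds the request body and interprets a response; the function names are assumptions, and it does not show how nuclei or GQLSpection perform the check.

```python
import json

# Minimal introspection query; __schema is part of the GraphQL specification.
INTROSPECTION_QUERY = "{ __schema { queryType { name } types { name } } }"


def introspection_payload() -> str:
    """JSON body to POST to a candidate endpoint such as /graphql."""
    return json.dumps({"query": INTROSPECTION_QUERY})


def introspection_enabled(response_body: str) -> bool:
    """True when the response contains schema data rather than only errors."""
    try:
        data = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    payload = data.get("data")
    if not isinstance(payload, dict):
        return False
    return isinstance(payload.get("__schema"), dict)
```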
Output:
Configuration:
param_discovery - Parameter Discovery
Discovers hidden parameters on web endpoints.
How It Works:
Output:
Sample Output:
Configuration:
Wordlist Generation
wordlist_gen - Custom Wordlist Creation
Generates target-specific wordlists from discovered content.
Sources:
JavaScript content
HTML content
URL paths
Parameter names
Domain-specific terms
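For two of these sources, URL paths and parameter names, the extraction can be sketched as tokenizing each URL and collecting the unique words. This is an illustration under assumed rules (split on non-alphanumeric characters, minimum length of three); `words_from_urls` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit, parse_qsl


def words_from_urls(urls: list[str], min_length: int = 3) -> list[str]:
    """Collect unique lowercase tokens from URL paths and query parameter names."""
    words: set[str] = set()
    for url in urls:
        parts = urlsplit(url)
        tokens = re.split(r"[^A-Za-z0-9]+", parts.path)
        tokens += [name for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
        words.update(t.lower() for t in tokens if len(t) >= min_length)
    return sorted(words)
```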
Output:
Configuration:
wordlist_gen_roboxtractor - Robots.txt Analysis
Extracts historical robots.txt disallowed paths from the Wayback Machine.
How It Works:
Output:
Configuration:
password_dict - Password Dictionary Generation
Generates target-specific password lists based on the domain name.
How It Works:
The function takes the first part of the domain (e.g., "target" from "target.com") and generates password variations using:
Leetspeak transformations (a→4, e→3, etc.)
Common suffixes (123, !, 2024, etc.)
Length constraints
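The three steps above can be sketched as follows. This is not pydictor's implementation: the substitution table extends the two mappings named above with two assumed ones (i→1, o→0), the capitalized variants are an added assumption, and the length bounds are placeholder defaults.

```python
from itertools import product

LEET = {"a": "4", "e": "3", "i": "1", "o": "0"}  # a and e from the text; i and o are assumed
SUFFIXES = ["", "123", "!", "2024"]               # the listed suffixes plus "no suffix"


def leet_variants(word: str) -> set[str]:
    """Every combination of keeping or substituting each mappable character."""
    choices = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word.lower()]
    return {"".join(combo) for combo in product(*choices)}


def password_candidates(domain: str, min_len: int = 6, max_len: int = 12) -> list[str]:
    """Generate candidates from the first domain label, filtered by length."""
    base = domain.split(".")[0]
    variants = leet_variants(base)
    bases = variants | {v.capitalize() for v in variants}
    return sorted(
        candidate
        for candidate in (b + s for b in bases for s in SUFFIXES)
        if min_len <= len(candidate) <= max_len
    )
```

The number of leetspeak variants doubles with every mappable character, so long base words produce large lists quickly; the length filter is one way to keep the output manageable.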
Output:
Sample Output (for target.com):
Configuration:
Use Cases:
Password spraying attacks (with authorization)
Testing default/weak credential policies
Generating custom wordlists for brute-force
grpc_reflection - gRPC Reflection Probing
Discovers gRPC services with reflection enabled.
What is gRPC Reflection?
gRPC reflection allows clients to query a server for available services and methods without prior knowledge. When enabled (often for debugging), it exposes the entire API surface.
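A reflection probe with grpcurl uses its `list` verb, which asks the server to enumerate its services. The sketch below builds that command line and parses the result. The helper names are hypothetical, and how reconFTW itself invokes grpcurl is not shown here.

```python
def grpcurl_list_command(host: str, port: int, plaintext: bool = True) -> list[str]:
    """Build the argv for `grpcurl ... list`, which enumerates services via reflection."""
    command = ["grpcurl"]
    if plaintext:
        command.append("-plaintext")  # skip TLS; omit for TLS-enabled ports
    command += [f"{host}:{port}", "list"]
    return command


def parse_services(output: str) -> list[str]:
    """grpcurl prints one fully qualified service name per line."""
    return [line.strip() for line in output.splitlines() if line.strip()]
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)`. A non-zero exit status typically means reflection is disabled or the port does not speak gRPC.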
How It Works:
Output:
Sample Output:
Configuration:
Security Implications:
Exposed reflection reveals internal API structure
Service names may reveal business logic
Combined with protobuf enumeration, reflection enables full API mapping
Requirements:
grpcurl installed
Network access to gRPC ports
Output Summary
| File | Description |
| --- | --- |
| webs/webs.txt | Live web servers |
| webs/webs_info.txt | Detailed probe results |
| webs/url_extract.txt | All discovered URLs |
| screenshots/ | Web screenshots |
| gf/*.txt | Pattern-classified URLs |
| js/js_secrets.txt | JavaScript secrets |
| fuzzing/ | Directory fuzzing results |
| webs/cms_scanner.txt | CMS detection results |
Best Practices
Rate Limiting: Respect target resources with appropriate rate limits
Scope Filtering: Use the -x flag to exclude out-of-scope URLs
Screenshot Review: Visual inspection often reveals interesting assets
JS Analysis Priority: JavaScript often contains the most valuable secrets
Custom Wordlists: Generated wordlists improve fuzzing effectiveness
Next Steps
Vulnerability Module - Test for security issues
Output Interpretation - Understand results