Web Analysis Module

The web analysis module examines discovered web assets to identify technologies, extract content, and prepare targets for vulnerability testing.


Why Web Analysis?

Before scanning for vulnerabilities, you need to understand what you're scanning:

  1. Target Validation: Not all subdomains run web servers. HTTP probing identifies which hosts actually serve web content, and on which ports.

  2. Technology Fingerprinting: Different technologies have different vulnerabilities. Knowing that a target runs WordPress vs Django changes your attack approach.

  3. Attack Surface Mapping: URL collection reveals:

    • Hidden endpoints not linked in the UI

    • API routes

    • Admin panels

    • Legacy code paths

  4. Input Discovery: Parameters and endpoints found here become test targets for vulnerability scanning. Without URL collection, scanners can miss much of the attack surface.

  5. Efficiency: Analyzing JavaScript and crawling once, then reusing results for multiple vulnerability tests, is more efficient than having each scanner crawl independently.


Module Overview

Function                    Purpose                             Tools
--------------------------  ----------------------------------  ------------------------------------
webprobe_simple             Probe ports 80/443                  httpx
webprobe_full               Probe uncommon web ports            httpx
screenshot                  Capture web screenshots             nuclei
virtualhosts                Virtual host discovery              VhostFinder
urlchecks                   URL collection (passive + active)   urlfinder, katana, JSA
url_gf                      URL pattern classification          gf, urless
url_ext                     File extension sorting              custom
jschecks                    JavaScript analysis                 subjs, xnLinkFinder, mantra, jsluice
fuzz                        Directory fuzzing                   ffuf
cms_scanner                 CMS detection                       CMSeeK
wordlist_gen                Custom wordlist generation          custom
wordlist_gen_roboxtractor   Robots.txt wordlist                 roboxtractor
password_dict               Password dictionary generation      pydictor
iishortname                 IIS shortname scanning              shortscan, sns
graphql_scan                GraphQL endpoint detection          nuclei, GQLSpection
grpc_reflection             gRPC reflection probing             grpcurl
param_discovery             Parameter discovery                 arjun
websocket_checks            WebSocket auditing                  custom


Configuration Options


HTTP Probing

webprobe_simple - Standard Port Probing

Identifies live web servers on standard ports (80, 443).

How It Works:

Information Extracted:

  • Status codes

  • Page titles

  • Web server type

  • Technologies detected

  • Content length

  • Redirect locations
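
The fields listed above can be pulled out of a probe record with a few lines of Python. This is a minimal sketch, assuming the probe results are available as JSON Lines in the shape produced by httpx's JSON output; the sample record and the `summarize_probe` helper are hypothetical and are not part of reconFTW.

```python
import json

# Hypothetical record; the key names assume httpx's JSON output format.
SAMPLE_LINE = json.dumps({
    "url": "https://app.example.com",
    "status_code": 200,
    "title": "Login",
    "webserver": "nginx",
    "tech": ["Bootstrap", "jQuery"],
    "content_length": 5120,
    "location": "",
})


def summarize_probe(line: str) -> dict:
    """Extract the documented fields from one JSON record, tolerating missing keys."""
    record = json.loads(line)
    return {
        "url": record.get("url", ""),
        "status": record.get("status_code"),
        "title": record.get("title", ""),
        "server": record.get("webserver", ""),
        "technologies": record.get("tech", []),
        "length": record.get("content_length"),
        "redirect": record.get("location", ""),
    }
```

Calling `summarize_probe(SAMPLE_LINE)` returns a flat dictionary that is easier to filter, for example by status code or detected technology.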

Output:

Sample Output (webs_info.txt):

Configuration:


webprobe_full - Uncommon Port Probing

Probes an extended list of ports commonly used for web services.

Ports Checked:

Output:

Configuration:


Screenshots

screenshot - Web Screenshot Capture

Captures screenshots of all discovered web servers for visual analysis.

How It Works:

Output:

Change Detection:

reconFTW creates SHA256 hashes of screenshots to detect visual changes between scans.
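
The idea of hash-based change detection can be sketched as follows. This is an illustration only, not reconFTW's own code: the function names are hypothetical, and it assumes the screenshots are PNG files in a single directory.

```python
import hashlib
from pathlib import Path


def hash_screenshots(directory: Path) -> dict[str, str]:
    """Map each PNG file name in `directory` to the SHA256 hex digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.glob("*.png"))
    }


def changed_screenshots(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return file names that are new or whose digest differs from the previous scan."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )
```

Because the comparison is byte-exact, any pixel difference (a timestamp, a rotating banner) counts as a change, so flagged screenshots still need a manual look.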

Configuration:


Virtual Hosts

virtualhosts - Virtual Host Discovery

Discovers virtual hosts by fuzzing the HTTP Host header.

How It Works:

Why It's Useful: Many servers host multiple websites on the same IP. Virtual host fuzzing reveals:

  • Hidden admin panels

  • Development sites

  • Internal applications

  • Additional attack surface
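
Host header fuzzing boils down to three steps: build candidate names, send the same request with a different Host value, and compare each response with a baseline. The sketch below shows those pieces as pure functions. It is not how VhostFinder is implemented; the function names and the length tolerance are assumptions chosen for illustration.

```python
def candidate_hosts(words: list[str], domain: str) -> list[str]:
    """Turn wordlist entries into candidate virtual host names under `domain`."""
    return [f"{word}.{domain}" for word in words]


def build_request(host: str, path: str = "/") -> bytes:
    """Build a raw HTTP/1.1 request whose Host header is the candidate name."""
    return (
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")


def differs_from_baseline(
    baseline: tuple[int, int], response: tuple[int, int], tolerance: int = 20
) -> bool:
    """Compare (status, body_length) pairs.

    A candidate is interesting if its status differs from the baseline or its
    body length differs by more than `tolerance` bytes (an arbitrary threshold).
    """
    status_changed = response[0] != baseline[0]
    length_changed = abs(response[1] - baseline[1]) > tolerance
    return status_changed or length_changed
```

The baseline would normally come from a request with a random, non-existent Host value sent to the same IP address.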

Output:

Configuration:


URL Collection

urlchecks - Full URL Extraction

Collects URLs from multiple sources for full coverage.

Passive Sources:

  • Wayback Machine

  • Common Crawl

  • AlienVault OTX

  • URLScan.io

Active Sources:

  • Katana web crawler

  • JavaScript parsing

  • Sitemap analysis
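
Whatever the sources, the results have to be merged, de-duplicated, and restricted to the target. A minimal sketch of that step, assuming each source yields a plain list of URLs; `in_scope` and `merge_urls` are hypothetical helpers, not reconFTW functions.

```python
from urllib.parse import urlsplit


def in_scope(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def merge_urls(passive: list[str], active: list[str], domain: str) -> list[str]:
    """Combine both source lists, keeping first-seen order and dropping
    duplicates, blank entries, and out-of-scope hosts."""
    seen: set[str] = set()
    merged: list[str] = []
    for url in [*passive, *active]:
        url = url.strip()
        if url and url not in seen and in_scope(url, domain):
            seen.add(url)
            merged.append(url)
    return merged
```

Matching on `"." + domain` rather than a bare suffix keeps look-alike hosts such as `notexample.com` out of the result.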

How It Works:

Output:

Configuration:


url_gf - URL Pattern Classification

Classifies URLs by potential vulnerability patterns using gf patterns.

Patterns Detected:

Pattern       Description
------------  --------------------------------
xss           Potential XSS parameters
sqli          SQL injection candidates
ssrf          SSRF-prone URLs
redirect      Open redirect parameters
rce           Command injection candidates
lfi           Local file inclusion
ssti          Template injection
idor          Insecure direct object reference
debug_logic   Debug/admin endpoints
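
To show what pattern classification means in practice, the sketch below tags a URL by the names of its query parameters. It is a simplified stand-in: gf itself matches regular expressions against each line, and the parameter lists here are illustrative assumptions, far shorter than the real pattern files.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative parameter names only; not the contents of the real gf patterns.
PATTERNS = {
    "xss": {"q", "search", "query"},
    "sqli": {"id", "order", "sort"},
    "ssrf": {"url", "uri", "dest"},
    "redirect": {"next", "redirect", "return"},
    "lfi": {"file", "path", "page"},
}


def classify(url: str) -> list[str]:
    """Return the sorted pattern labels whose parameter names appear in the URL."""
    query = urlsplit(url).query
    params = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
    return sorted(label for label, names in PATTERNS.items() if params & names)
```

A URL can land in several categories at once, which is why each pattern gets its own output file rather than one label per URL.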

Output:

Sample XSS Pattern Match:

Configuration:


url_ext - File Extension Sorting

Organizes URLs by file extension for targeted analysis.
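
Grouping by extension is simple enough to sketch directly. The helper below is hypothetical and only illustrates the idea; it ignores the query string and skips URLs whose path has no extension.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def group_by_extension(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by the lowercase file extension of their path component."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        suffix = PurePosixPath(urlsplit(url).path).suffix.lower().lstrip(".")
        if suffix:  # directories and extension-less paths are skipped
            groups[suffix].append(url)
    return dict(groups)
```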

Categories:

Configuration:


JavaScript Analysis

jschecks - Full JS Analysis

Extracts secrets, endpoints, and sensitive information from JavaScript files.

What It Finds:

  • API keys and tokens

  • AWS credentials

  • Internal endpoints

  • Hardcoded passwords

  • Debug information

  • Hidden functionality

Tools Used:

  • subjs: JS file discovery

  • xnLinkFinder: Endpoint extraction

  • mantra: Secret patterns

  • jsluice: Advanced JS parsing

  • nuclei: JS secret templates

  • sourcemapper: Source map extraction
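
Secret detection in JavaScript is largely pattern matching. The sketch below shows the principle with two regular expressions: the widely documented `AKIA` prefix format of AWS access key IDs, and a generic `apiKey = "..."` assignment. Both patterns and the `find_secrets` helper are assumptions for illustration; the tools listed above ship much larger and more precise rule sets.

```python
import re

# Two illustrative patterns; real tools use far more extensive rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"""(?i)api[_-]?key["']?\s*[:=]\s*["']([A-Za-z0-9_\-]{16,})["']"""
    ),
}


def find_secrets(js_source: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every pattern hit in the source."""
    findings = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(js_source):
            findings.append((label, match.group(0)))
    return findings
```

Regex hits are candidates, not confirmed secrets: placeholder values and documentation examples match just as well as live keys.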

How It Works:

Output:

Sample Secrets Found:

Configuration:


Directory Fuzzing

fuzz - Web Directory Fuzzing

Discovers hidden directories, files, and endpoints.

How It Works:

Wordlists:

  • Primary: $fuzz_wordlist

  • Custom generated from target

Output:

Sample Findings:

Configuration:


CMS Detection

cms_scanner - CMS Identification

Identifies content management systems and their versions.

CMS Detected:

  • WordPress

  • Joomla

  • Drupal

  • Magento

  • Shopify

  • And 170+ more...

Information Extracted:

  • CMS type and version

  • Installed plugins/themes

  • Known vulnerabilities

  • Configuration issues

Output:

Sample Output:

Configuration:


Advanced Analysis

iishortname - IIS Shortname Scanner

Exploits IIS shortname vulnerability to discover hidden files/directories.

How It Works:

Windows IIS servers may expose 8.3 format (short) filenames through differences in how they respond to requests containing the tilde (~) character, revealing:

  • Hidden directories

  • Backup files

  • Configuration files
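
To make the 8.3 format concrete, the sketch below derives the typical short alias for a long filename. It is a deliberate simplification: `short_name` is a hypothetical helper, and it ignores names that already fit the 8.3 format, invalid characters, leading dots, and the hash-based aliases Windows switches to after several collisions.

```python
def short_name(filename: str, index: int = 1) -> str:
    """Approximate the 8.3 alias of a long filename.

    Takes the first six characters of the base name (dots and spaces removed),
    appends '~' plus an index, and keeps at most three extension characters.
    """
    base, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        base, ext = filename, ""
    base = base.replace(".", "").replace(" ", "")
    alias = f"{base[:6].upper()}~{index}"
    return f"{alias}.{ext[:3].upper()}" if ext else alias
```

For example, `short_name("backup_database.zip")` gives `BACKUP~1.ZIP`, which is the kind of partial name a shortname scanner recovers; the full name still has to be guessed from that prefix.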

Output:

Sample Output:

Configuration:


graphql_scan - GraphQL Endpoint Detection

Discovers and analyzes GraphQL endpoints.

What It Checks:

  • /graphql

  • /graphiql

  • /api/graphql

  • Custom endpoints

Analysis:

  • Introspection enabled?

  • Schema extraction

  • Query suggestions
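
The checks above can be sketched as three small helpers: build the candidate endpoint URLs, build a standard introspection request body, and decide from the response whether introspection is enabled. The helper names are hypothetical and the sketch does not send any requests; it only shows the data involved.

```python
import json

COMMON_PATHS = ["/graphql", "/graphiql", "/api/graphql"]

# A small introspection query; __schema is defined by the GraphQL specification.
INTROSPECTION_QUERY = "{ __schema { queryType { name } types { name } } }"


def candidate_endpoints(base_url: str, extra_paths: tuple[str, ...] = ()) -> list[str]:
    """Join the base URL with the common paths plus any custom ones."""
    base = base_url.rstrip("/")
    return [base + path for path in [*COMMON_PATHS, *extra_paths]]


def introspection_body() -> str:
    """JSON body to POST with Content-Type: application/json."""
    return json.dumps({"query": INTROSPECTION_QUERY})


def introspection_enabled(response_text: str) -> bool:
    """True if the response is JSON containing a data.__schema object."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    data = payload.get("data") if isinstance(payload, dict) else None
    return isinstance(data, dict) and isinstance(data.get("__schema"), dict)
```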

Output:

Configuration:


param_discovery - Parameter Discovery

Discovers hidden parameters on web endpoints.

How It Works:

Output:

Sample Output:

Configuration:


Wordlist Generation

wordlist_gen - Custom Wordlist Creation

Generates target-specific wordlists from discovered content.

Sources:

  • JavaScript content

  • HTML content

  • URL paths

  • Parameter names

  • Domain-specific terms
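
As an illustration of two of these sources (URL paths and parameter names), the sketch below tokenizes a list of URLs into a sorted, de-duplicated wordlist. `words_from_urls` and the minimum word length are assumptions, not reconFTW's actual implementation.

```python
import re
from urllib.parse import parse_qsl, urlsplit


def words_from_urls(urls: list[str], min_length: int = 3) -> list[str]:
    """Collect lowercase words from URL path segments and query parameter names."""
    words: set[str] = set()
    for url in urls:
        parts = urlsplit(url)
        # Split the path on anything that is not a letter or digit.
        tokens = re.split(r"[^A-Za-z0-9]+", parts.path)
        tokens += [name for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
        words.update(token.lower() for token in tokens if len(token) >= min_length)
    return sorted(words)
```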

Output:

Configuration:


wordlist_gen_roboxtractor - Robots.txt Analysis

Extracts historical disallowed paths from robots.txt snapshots archived in the Wayback Machine.
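
The core of the extraction step is reading `Disallow` rules out of a robots.txt body. A minimal sketch, assuming the file content has already been fetched; `disallowed_paths` is a hypothetical helper and does not reproduce roboxtractor's behavior.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the unique, non-empty Disallow paths in order of appearance."""
    paths: list[str] = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow":
            path = value.strip()
            if path and path not in paths:
                paths.append(path)
    return paths
```

Running this over several archived snapshots and merging the results surfaces paths that were once hidden from crawlers but have since been removed from the live file.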

How It Works:

Output:

Configuration:


password_dict - Password Dictionary Generation

Generates target-specific password lists based on the domain name.

How It Works:

The function takes the first part of the domain (e.g., "target" from "target.com") and generates password variations using:

  • Leetspeak transformations (a→4, e→3, etc.)

  • Common suffixes (123, !, 2024, etc.)

  • Length constraints
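
The three ingredients above can be combined as in the sketch below. It is not pydictor's algorithm: the function names are hypothetical, the `i`, `o`, and `s` substitutions and the length limits are assumptions added for illustration, and only the suffixes named above are used.

```python
from itertools import product

# a->4 and e->3 come from the text above; the other mappings are assumptions.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}
SUFFIXES = ["", "123", "!", "2024"]


def leet_variants(word: str) -> set[str]:
    """Every combination of original and substituted characters for `word`."""
    options = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word.lower()]
    return {"".join(combo) for combo in product(*options)}


def password_candidates(domain: str, min_len: int = 6, max_len: int = 12) -> list[str]:
    """Variants of the first domain label plus suffixes, filtered by length."""
    base = domain.split(".", 1)[0]
    candidates = {variant + suffix for variant in leet_variants(base) for suffix in SUFFIXES}
    return sorted(c for c in candidates if min_len <= len(c) <= max_len)
```

The number of variants doubles with every substitutable character, so long base words need a tighter length window or a smaller substitution table.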

Output:

Sample Output (for target.com):

Configuration:

Use Cases:

  • Password spraying attacks (with authorization)

  • Testing default/weak credential policies

  • Generating custom wordlists for brute-force


grpc_reflection - gRPC Reflection Probing

Discovers gRPC services with reflection enabled.

What is gRPC Reflection?

gRPC reflection allows clients to query a server for available services and methods without prior knowledge. When enabled (often for debugging), it exposes the entire API surface.
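
When reflection is enabled, listing services returns one fully qualified service name per line. The sketch below post-processes such a listing and drops the reflection service itself, leaving the application services that are actually exposed. It assumes the text came from a `grpcurl ... list` call; `exposed_services` and the sample service names are hypothetical.

```python
REFLECTION_PREFIX = "grpc.reflection."


def exposed_services(list_output: str) -> list[str]:
    """Return service names from a listing, excluding the reflection service."""
    services = [line.strip() for line in list_output.splitlines() if line.strip()]
    return [name for name in services if not name.startswith(REFLECTION_PREFIX)]


# Hypothetical listing for illustration.
SAMPLE_LISTING = """\
grpc.reflection.v1alpha.ServerReflection
shop.v1.OrderService
shop.v1.PaymentService
"""
```

Here `exposed_services(SAMPLE_LISTING)` keeps only the two `shop.v1` services.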

How It Works:

Output:

Sample Output:

Configuration:

Security Implications:

  • Exposed reflection reveals internal API structure

  • Service names may reveal business logic

  • Combined with protobuf enumeration, reflection allows the full API to be mapped

Requirements:

  • grpcurl installed

  • Network access to gRPC ports


Output Summary

File                   Content
---------------------  -------------------------
webs/webs.txt          Live web servers
webs/webs_info.txt     Detailed probe results
webs/url_extract.txt   All discovered URLs
screenshots/           Web screenshots
gf/*.txt               Pattern-classified URLs
js/js_secrets.txt      JavaScript secrets
fuzzing/               Directory fuzzing results
webs/cms_scanner.txt   CMS detection results


Best Practices

  1. Rate Limiting: Respect target resources with appropriate rate limits

  2. Scope Filtering: Use -x flag to exclude out-of-scope URLs

  3. Screenshot Review: Visual inspection often reveals interesting assets

  4. JS Analysis Priority: JavaScript often contains the most valuable secrets

  5. Custom Wordlists: Generated wordlists improve fuzzing effectiveness

