Web Analysis Module

The web analysis module examines discovered web assets to identify technologies, extract content, and prepare targets for vulnerability testing.


Module Overview

Function                  | Purpose                           | Tools
--------------------------|-----------------------------------|-------------------------------------
webprobe_simple           | Probe ports 80/443                | httpx
webprobe_full             | Probe uncommon web ports          | httpx
screenshot                | Capture web screenshots           | nuclei
virtualhosts              | Virtual host discovery            | VhostFinder
urlchecks                 | URL collection (passive + active) | urlfinder, katana, JSA
url_gf                    | URL pattern classification        | gf, urless
url_ext                   | File extension sorting            | custom
jschecks                  | JavaScript analysis               | subjs, xnLinkFinder, mantra, jsluice
fuzz                      | Directory fuzzing                 | ffuf
cms_scanner               | CMS detection                     | CMSeeK
wordlist_gen              | Custom wordlist generation        | custom
wordlist_gen_roboxtractor | Robots.txt wordlist               | roboxtractor
password_dict             | Password dictionary generation    | pydictor
iishortname               | IIS shortname scanning            | shortscan, sns
graphql_scan              | GraphQL endpoint detection        | nuclei, GQLSpection
grpc_reflection           | gRPC reflection probing           | grpcurl
param_discovery           | Parameter discovery               | arjun
websocket_checks          | WebSocket auditing                | custom


Configuration Options


HTTP Probing

webprobe_simple - Standard Port Probing

Identifies live web servers on standard ports (80, 443).

How It Works:

Information Extracted:

  • Status codes

  • Page titles

  • Web server type

  • Technologies detected

  • Content length

  • Redirect locations
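
The fields listed above can be pulled out of JSON-lines probe output with a few lines of Python. This is a minimal sketch, not reconFTW's own code: the key names (`status_code`, `title`, `webserver`, `tech`, `content_length`, `location`) are assumed to follow httpx's JSON output, and the sample records are invented for illustration.

```python
import json

def summarize_probe(json_lines):
    """Turn httpx-style JSON lines into compact per-host summaries."""
    summaries = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        summaries.append({
            "url": record.get("url"),
            "status": record.get("status_code"),
            "title": record.get("title", ""),
            "server": record.get("webserver", ""),
            "tech": record.get("tech", []),
            "length": record.get("content_length"),
            "redirect": record.get("location", ""),
        })
    return summaries

# Invented sample records; real output depends on the flags passed to the prober.
sample = [
    '{"url": "https://example.com", "status_code": 200, "title": "Home", '
    '"webserver": "nginx", "tech": ["PHP"], "content_length": 1234}',
    '{"url": "http://example.com", "status_code": 301, '
    '"location": "https://example.com/"}',
]
for row in summarize_probe(sample):
    print(row["url"], row["status"], row["server"] or "-", row["redirect"] or "-")
```

Missing keys fall back to empty values, so a record that only carries a redirect still produces a complete row.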

Output:

Sample Output (webs_info.txt):

Configuration:


webprobe_full - Uncommon Port Probing

Probes an extended list of ports commonly used for web services.

Ports Checked:

Output:

Configuration:


Screenshots

screenshot - Web Screenshot Capture

Captures screenshots of all discovered web servers for visual analysis.

How It Works:

Output:

Change Detection:

reconFTW creates SHA256 hashes of screenshots to detect visual changes between scans:
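
As an illustration of the idea (a sketch under assumptions, not reconFTW's actual implementation), the comparison could look like the following. The function names, the `*.png` layout, and the demo files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_screenshots(directory):
    """Map each PNG file name in `directory` to the SHA-256 of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).glob("*.png"))
    }

def diff_hashes(previous, current):
    """Classify screenshots as new, removed, or changed between two scans."""
    return {
        "new": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "changed": sorted(
            name for name in set(previous) & set(current)
            if previous[name] != current[name]
        ),
    }

# Demo with throwaway files standing in for two consecutive scans.
with tempfile.TemporaryDirectory() as scan_dir:
    shot = Path(scan_dir) / "example.com.png"
    shot.write_bytes(b"first render")
    before = hash_screenshots(scan_dir)
    shot.write_bytes(b"second render")
    (Path(scan_dir) / "new.example.com.png").write_bytes(b"fresh")
    after = hash_screenshots(scan_dir)

changes = diff_hashes(before, after)
print(changes)
```

A byte-level hash flags any difference in the image file, including ones a human would not notice, so a reported change is a prompt for review rather than proof of a meaningful visual difference.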

Configuration:


Virtual Hosts

virtualhosts - Virtual Host Discovery

Discovers virtual hosts by fuzzing the HTTP Host header.

How It Works:

Why It's Useful: Many servers host multiple websites on the same IP. Virtual host fuzzing reveals:

  • Hidden admin panels

  • Development sites

  • Internal applications

  • Additional attack surface
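
The core of Host-header fuzzing is a comparison against a baseline: request the same IP with a Host value that should not exist, then flag candidate names whose response differs. The sketch below shows only that comparison step with made-up response data; `Response`, `find_vhosts`, and the tolerance parameter are hypothetical names, and VhostFinder's real heuristics may differ.

```python
from typing import NamedTuple

class Response(NamedTuple):
    status: int
    length: int

def find_vhosts(baseline, responses, tolerance=0):
    """Return candidate host names whose response differs from the baseline.

    `responses` maps each Host header value that was tried to the response
    the server returned for it.
    """
    hits = []
    for host, resp in responses.items():
        same_status = resp.status == baseline.status
        same_size = abs(resp.length - baseline.length) <= tolerance
        if not (same_status and same_size):
            hits.append(host)
    return sorted(hits)

# Baseline: the reply to a random Host value that should not exist.
baseline = Response(status=404, length=512)
observed = {
    "admin.example.com": Response(200, 8342),
    "dev.example.com": Response(404, 512),
    "intranet.example.com": Response(302, 0),
}
vhosts = find_vhosts(baseline, observed)
print(vhosts)
```

Raising `tolerance` helps when the default page echoes the requested host name, which makes response sizes vary by a few bytes.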

Output:

Configuration:


URL Collection

urlchecks - Full URL Extraction

Collects URLs from multiple sources for full coverage.

Passive Sources:

  • Wayback Machine

  • Common Crawl

  • AlienVault OTX

  • URLScan.io

Active Sources:

  • Katana web crawler

  • JavaScript parsing

  • Sitemap analysis
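
Whatever the sources, their results have to be merged, deduplicated, and restricted to the target. A minimal sketch of that step, assuming each source yields a plain list of URLs (the function name and sample URLs are hypothetical):

```python
from urllib.parse import urlsplit

def merge_urls(sources, scope_domain):
    """Merge URL lists, dropping duplicates and out-of-scope hosts."""
    seen = set()
    merged = []
    for urls in sources:
        for url in urls:
            url = url.strip()
            host = (urlsplit(url).hostname or "").lower()
            in_scope = host == scope_domain or host.endswith("." + scope_domain)
            if url and in_scope and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

passive = ["https://example.com/old/login", "https://cdn.other.net/lib.js"]
active = ["https://example.com/old/login", "https://api.example.com/v1/users?id=1"]
all_urls = merge_urls([passive, active], "example.com")
print(all_urls)
```

Matching on the parsed hostname rather than on a substring keeps look-alike hosts such as `example.com.evil.net` out of the result.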

How It Works:

Output:

Configuration:


url_gf - URL Pattern Classification

Classifies URLs by potential vulnerability patterns using gf patterns.

Patterns Detected:

Pattern     | Description
------------|---------------------------------
xss         | Potential XSS parameters
sqli        | SQL injection candidates
ssrf        | SSRF-prone URLs
redirect    | Open redirect parameters
rce         | Command injection candidates
lfi         | Local file inclusion
ssti        | Template injection
idor        | Insecure direct object reference
debug_logic | Debug/admin endpoints
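
To make the classification idea concrete, here is a simplified sketch that buckets URLs by query parameter name. Real gf patterns are regular-expression definitions and far more thorough; the parameter hints below are hypothetical examples chosen for illustration only.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical parameter-name hints, not the actual gf pattern files.
PATTERN_HINTS = {
    "xss": {"q", "search", "s", "keyword"},
    "sqli": {"id", "order", "sort"},
    "ssrf": {"url", "uri", "dest", "host"},
    "redirect": {"next", "redirect", "return", "url"},
    "lfi": {"file", "path", "page", "include"},
}

def classify(urls):
    """Place each URL in every bucket whose hints match one of its parameters."""
    buckets = {name: [] for name in PATTERN_HINTS}
    for url in urls:
        query = urlsplit(url).query
        params = {key.lower() for key, _ in parse_qsl(query, keep_blank_values=True)}
        for name, hints in PATTERN_HINTS.items():
            if params & hints:
                buckets[name].append(url)
    return buckets

urls = [
    "https://example.com/search?q=test",
    "https://example.com/item?id=5",
    "https://example.com/go?url=http://x",
    "https://example.com/static/app.js",
]
buckets = classify(urls)
for name, matches in buckets.items():
    print(name, len(matches))
```

A URL can land in several buckets at once (the `url` parameter is a candidate for both SSRF and open redirect), which matches how pattern-based triage is normally used: as a list of leads, not findings.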

Output:

Sample XSS Pattern Match:

Configuration:


url_ext - File Extension Sorting

Organizes URLs by file extension for targeted analysis.
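
A rough sketch of extension-based grouping, assuming the input is a flat list of URLs (the function name is hypothetical and the real category list is not reproduced here):

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_extension(urls):
    """Group URLs by the lowercase file extension of their path component."""
    groups = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path              # ignores query string and fragment
        ext = posixpath.splitext(path)[1].lstrip(".").lower()
        if ext:
            groups[ext].append(url)
    return dict(groups)

sample_urls = [
    "https://example.com/app.JS?v=2",
    "https://example.com/backup.zip",
    "https://example.com/about",
    "https://example.com/js/vendor.js",
]
by_ext = group_by_extension(sample_urls)
print(by_ext)
```

Parsing the URL first matters: taking the extension from the raw string would treat `app.JS?v=2` as having the extension `JS?v=2`.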

Categories:

Configuration:


JavaScript Analysis

jschecks - Full JS Analysis

Extracts secrets, endpoints, and sensitive information from JavaScript files.

What It Finds:

  • API keys and tokens

  • AWS credentials

  • Internal endpoints

  • Hardcoded passwords

  • Debug information

  • Hidden functionality

Tools Used:

  • subjs: JS file discovery

  • xnLinkFinder: Endpoint extraction

  • mantra: Secret patterns

  • jsluice: Advanced JS parsing

  • nuclei: JS secret templates

  • sourcemapper: Source map extraction
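
Secret detection in these tools is largely pattern matching over JavaScript source. The sketch below illustrates the approach with two patterns only: the AWS access key ID shape (`AKIA`/`ASIA` followed by 16 uppercase letters or digits) and a generic, hypothetical `apiKey = "..."` rule. It is not the rule set of mantra, jsluice, or nuclei.

```python
import re

# Two illustrative rules; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"""(?i)\bapi[_-]?key\b\s*[:=]\s*["']([A-Za-z0-9_\-]{16,})["']"""
    ),
}

def scan_js(source):
    """Return (line number, rule name, matched text) for each suspected secret."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings

# Invented sample; the AWS value is the placeholder used in AWS documentation.
sample_js = '''
const cfg = { apiKey: "abcd1234efgh5678ijkl" };
const aws = "AKIAIOSFODNN7EXAMPLE";
fetch("/internal/api/v2/users");
'''
findings = scan_js(sample_js)
for lineno, name, text in findings:
    print(lineno, name, text)
```

Pattern matches are candidates only; each hit still needs manual validation before it is reported as a leaked credential.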

How It Works:

Output:

Sample Secrets Found:

Configuration:


Directory Fuzzing

fuzz - Web Directory Fuzzing

Discovers hidden directories, files, and endpoints.

How It Works:

Wordlists:

  • Primary: $fuzz_wordlist

  • Custom generated from target

Output:

Sample Findings:

Configuration:


CMS Detection

cms_scanner - CMS Identification

Identifies content management systems and their versions.

CMS Detected:

  • WordPress

  • Joomla

  • Drupal

  • Magento

  • Shopify

  • And 170+ more...

Information Extracted:

  • CMS type and version

  • Installed plugins/themes

  • Known vulnerabilities

  • Configuration issues

Output:

Sample Output:

Configuration:


Advanced Analysis

iishortname - IIS Shortname Scanner

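Exploits the IIS short-name (8.3) disclosure weakness to discover hidden files and directories.

How It Works:

Windows IIS servers may leak 8.3-format (short) filenames: crafted requests containing a tilde (~) wildcard produce different responses depending on whether a matching name exists. This can expose: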

  • Hidden directories

  • Backup files

  • Configuration files

Output:

Sample Output:

Configuration:


graphql_scan - GraphQL Endpoint Detection

Discovers and analyzes GraphQL endpoints.

What It Checks:

  • /graphql

  • /graphiql

  • /api/graphql

  • Custom endpoints

Analysis:

  • Introspection enabled?

  • Schema extraction

  • Query suggestions
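
The introspection check reduces to sending a small `__schema` query to each candidate path and inspecting the JSON reply. The sketch below covers building the candidates and payload and interpreting a response, with no network calls; the helper names are hypothetical and this is not how nuclei or GQLSpection are implemented.

```python
import json

INTROSPECTION_QUERY = "query { __schema { queryType { name } types { name } } }"
COMMON_PATHS = ["/graphql", "/graphiql", "/api/graphql"]

def candidate_endpoints(base_url, extra_paths=()):
    """Combine a base URL with the common paths plus any custom ones."""
    base = base_url.rstrip("/")
    return [base + path for path in [*COMMON_PATHS, *extra_paths]]

def introspection_payload():
    """JSON body to POST with Content-Type: application/json."""
    return json.dumps({"query": INTROSPECTION_QUERY})

def introspection_enabled(response_body):
    """True if the reply contains a schema object under data.__schema."""
    try:
        data = json.loads(response_body)
    except ValueError:
        return False
    payload = data.get("data") if isinstance(data, dict) else None
    return isinstance(payload, dict) and isinstance(payload.get("__schema"), dict)

endpoints = candidate_endpoints("https://example.com/", extra_paths=["/v1/graphql"])
print(endpoints)
print(introspection_enabled('{"data": {"__schema": {"queryType": {"name": "Query"}}}}'))
print(introspection_enabled('{"errors": [{"message": "introspection is disabled"}]}'))
```

Checking for the `__schema` object rather than for an HTTP 200 avoids false positives from servers that return 200 with an error payload.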

Output:

Configuration:


param_discovery - Parameter Discovery

Discovers hidden parameters on web endpoints.

How It Works:

Output:

Sample Output:

Configuration:


Wordlist Generation

wordlist_gen - Custom Wordlist Creation

Generates target-specific wordlists from discovered content.

Sources:

  • JavaScript content

  • HTML content

  • URL paths

  • Parameter names

  • Domain-specific terms
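
As a simplified illustration covering two of the sources above (URL paths and parameter names), a wordlist can be derived by tokenizing collected URLs. The function name and the length threshold are assumptions for this sketch, not reconFTW settings.

```python
import re
from urllib.parse import urlsplit, parse_qsl

def words_from_urls(urls, min_length=3):
    """Extract unique lowercase path tokens and parameter names from URLs."""
    words = set()
    for url in urls:
        parts = urlsplit(url)
        tokens = re.split(r"[^A-Za-z0-9]+", parts.path)
        tokens += [key for key, _ in parse_qsl(parts.query, keep_blank_values=True)]
        for token in tokens:
            token = token.lower()
            # Skip very short tokens and pure numbers such as IDs.
            if len(token) >= min_length and not token.isdigit():
                words.add(token)
    return sorted(words)

collected = [
    "https://example.com/api/v2/user-profile?account_id=7&debug=",
    "https://example.com/admin/export.csv",
]
wordlist = words_from_urls(collected)
print(wordlist)
```

Parameter names are kept whole (`account_id`) while path segments are split on punctuation, since each form tends to be reused differently across an application.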

Output:

Configuration:


wordlist_gen_roboxtractor - Robots.txt Analysis

Extracts historical disallowed paths from Wayback Machine.
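
Once archived copies of robots.txt have been retrieved, turning them into a wordlist is a matter of collecting the `Disallow` values. A minimal sketch of that parsing step, assuming the snapshots are already available as strings (fetching from the Wayback Machine is left out, and this is not roboxtractor's code):

```python
def disallowed_paths(robots_snapshots):
    """Collect unique Disallow paths across several robots.txt snapshots."""
    paths = set()
    for text in robots_snapshots:
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()   # drop trailing comments
            field, _, value = line.partition(":")
            if field.strip().lower() == "disallow":
                value = value.strip()
                # An empty value or a bare "/" adds nothing to a wordlist.
                if value and value != "/":
                    paths.add(value)
    return sorted(paths)

snapshots = [
    "User-agent: *\nDisallow: /admin/\nDisallow: /tmp/ # scratch\n",
    "User-agent: *\nDisallow: /admin/\nDisallow: /old-backup/\nDisallow:\nAllow: /public/\n",
]
historic_paths = disallowed_paths(snapshots)
print(historic_paths)
```

Paths that appear only in older snapshots are often the most interesting, because they may point to content that was hidden from crawlers and later forgotten.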

How It Works:

Output:

Configuration:


password_dict - Password Dictionary Generation

Generates target-specific password lists based on the domain name.

How It Works:

The function takes the first part of the domain (e.g., "target" from "target.com") and generates password variations using:

  • Leetspeak transformations (a→4, e→3, etc.)

  • Common suffixes (123, !, 2024, etc.)

  • Length constraints
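
The three ingredients above can be sketched in plain Python. This is an illustration of the technique, not pydictor's implementation: only the a→4 and e→3 mappings and the 123, !, and 2024 suffixes come from the description, while the remaining mappings, the empty suffix, capitalization, and the length bounds are assumptions.

```python
from itertools import product

LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}  # i, o, s are assumed
SUFFIXES = ["", "123", "!", "2024"]                        # "" keeps the bare word

def leet_variants(word):
    """Yield every combination of plain and leetspeak characters."""
    choices = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word.lower()]
    for combo in product(*choices):
        yield "".join(combo)

def password_candidates(domain, min_len=6, max_len=12):
    """Build candidates from the first label of `domain`, within length bounds."""
    base = domain.split(".")[0]
    results = []
    for variant in leet_variants(base):
        for form in (variant, variant.capitalize()):
            for suffix in SUFFIXES:
                candidate = form + suffix
                if min_len <= len(candidate) <= max_len:
                    results.append(candidate)
    return list(dict.fromkeys(results))   # de-duplicate, keep order

candidates = password_candidates("target.com")
print(len(candidates), candidates[:4])
```

The number of leetspeak variants doubles with every substitutable character, so long base words should be paired with tight length limits.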

Output:

Sample Output (for target.com):

Configuration:

Use Cases:

  • Password spraying attacks (with authorization)

  • Testing default/weak credential policies

  • Generating custom wordlists for brute-force


grpc_reflection - gRPC Reflection Probing

Discovers gRPC services with reflection enabled.

What is gRPC Reflection?

gRPC reflection allows clients to query a server for available services and methods without prior knowledge. When enabled (often for debugging), it exposes the entire API surface.
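
With grpcurl, the usual way to ask a server for its services is the `list` verb, for example `grpcurl -plaintext host:port list`. The sketch below builds that command line and parses a listing without contacting any server; the helper names and the sample output are hypothetical, and reconFTW's exact invocation is not reproduced here.

```python
def grpcurl_list_command(host, port, plaintext=True):
    """Build the argv for listing services through server reflection."""
    argv = ["grpcurl"]
    if plaintext:
        argv.append("-plaintext")        # skip TLS for cleartext gRPC ports
    argv += [f"{host}:{port}", "list"]
    return argv

def application_services(listing):
    """Return service names, leaving out gRPC's own infrastructure services."""
    services = [line.strip() for line in listing.splitlines() if line.strip()]
    return [
        name for name in services
        if not name.startswith(("grpc.reflection.", "grpc.health."))
    ]

# Invented example of what a reflection-enabled server might return.
sample_listing = """\
grpc.reflection.v1alpha.ServerReflection
billing.InvoiceService
users.AdminService
"""
command = grpcurl_list_command("10.0.0.5", 50051)
exposed = application_services(sample_listing)
print(" ".join(command))
print(exposed)
```

The returned argv list can be passed to `subprocess.run` directly, which avoids shell quoting issues with host names taken from scan data.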

How It Works:

Output:

Sample Output:

Configuration:

Security Implications:

  • Exposed reflection reveals internal API structure

  • Service names may reveal business logic

  • Combined with protobuf enumeration, reflection allows the full API to be mapped

Requirements:

  • grpcurl installed

  • Network access to gRPC ports


Output Summary

File                 | Content
---------------------|--------------------------
webs/webs.txt        | Live web servers
webs/webs_info.txt   | Detailed probe results
webs/url_extract.txt | All discovered URLs
screenshots/         | Web screenshots
gf/*.txt             | Pattern-classified URLs
js/js_secrets.txt    | JavaScript secrets
fuzzing/             | Directory fuzzing results
webs/cms_scanner.txt | CMS detection results


Best Practices

  1. Rate Limiting: Respect target resources with appropriate rate limits

  2. Scope Filtering: Use -x flag to exclude out-of-scope URLs

  3. Screenshot Review: Visual inspection often reveals interesting assets

  4. JS Analysis Priority: JavaScript often contains the most valuable secrets

  5. Custom Wordlists: Generated wordlists improve fuzzing effectiveness

