Methodology

How we collect and analyze threats

Overview

TroyanosYVirus uses a network of **honeypots** (decoy systems) distributed across five locations on three continents (Europe, Asia and North America) to attract, detect and analyze cyberattacks in real time. A honeypot is a computer system configured to look like a legitimate, attractive target for attackers. When an attacker interacts with the honeypot, all of their actions are logged and analyzed, providing valuable intelligence about:

- **Attack vectors** currently in use
- **Malicious IPs** and their geolocation
- **Credentials** tested in brute-force attacks
- **Malware** deployed by attackers
- **Commands** executed after gaining access

Infrastructure

Our network consists of **5 T-Pot honeypot servers** located in:

- 🇫🇷 **Paris**, France (Western Europe)
- 🇩🇪 **Frankfurt**, Germany (Central Europe)
- 🇵🇱 **Warsaw**, Poland (Eastern Europe)
- 🇸🇬 **Singapore**, Singapore (Asia-Pacific)
- 🇨🇦 **Toronto**, Canada (North America)

This geographic distribution allows us to capture attacks from different regions and obtain a global view of the threat landscape.

T-Pot: Multi-Honeypot Platform

We use **T-Pot**, an all-in-one honeypot platform developed by Deutsche Telekom Security. T-Pot integrates multiple specialized honeypots:

- **Cowrie**: Medium/high interaction SSH/Telnet honeypot. Captures credentials, commands and malware.
- **Dionaea**: Captures malware exploiting vulnerabilities in SMB, HTTP, FTP, MySQL, etc.
- **Honeytrap**: Low interaction honeypot for capturing generic network attacks.
- **Conpot**: Simulates industrial control systems (ICS/SCADA).
- **Mailoney**: SMTP honeypot for capturing spam and phishing.
- **ADBHoney**: Emulates Android Debug Bridge to detect attacks on Android devices.
- **Glutton**: SSH proxy honeypot that forwards connections to Cowrie.
- **Heralding**: Captures credentials from multiple protocols (FTP, SSH, Telnet, HTTP, etc.).

Data Flow

1. **Capture**: Honeypots log all attacker interactions: IPs, ports, credentials, commands, payloads.
2. **Aggregation**: Data is sent to a central server where it is normalized and stored in Elasticsearch.
3. **Enrichment**: IPs are enriched with geolocation, ASN/ISP data, and reputation from external sources.
4. **Analysis**: Risk scores are calculated, patterns identified and threats classified.
5. **Visualization**: Processed data is exposed through the web and API for public query.
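
As an illustration of the aggregation and enrichment steps, here is a minimal sketch of how a raw honeypot record could be normalized and enriched. The input field names (`eventid`, `src_ip`, `timestamp`) follow Cowrie's JSON log output. The `Event` schema, the function names and the `geo_lookup` callback are assumptions made for this example and do not describe our production pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    """Normalized honeypot event (hypothetical common schema)."""
    src_ip: str
    honeypot: str
    event_type: str
    timestamp: str
    enrichment: dict = field(default_factory=dict)


def normalize_cowrie(raw: dict) -> Event:
    """Map one Cowrie JSON log record onto the common schema."""
    return Event(
        src_ip=raw["src_ip"],
        honeypot="cowrie",
        event_type=raw["eventid"],
        timestamp=raw["timestamp"],
    )


def enrich(event: Event, geo_lookup: Callable[[str], dict]) -> Event:
    """Attach geolocation/ASN data returned by a caller-supplied lookup."""
    event.enrichment.update(geo_lookup(event.src_ip))
    return event


if __name__ == "__main__":
    raw = {
        "eventid": "cowrie.login.failed",
        "src_ip": "203.0.113.7",  # documentation address, not a real attacker
        "timestamp": "2024-01-01T00:00:00.000000Z",
    }
    # Stand-in for a real GeoIP/ASN database lookup (assumption).
    event = enrich(normalize_cowrie(raw), lambda ip: {"country": "ZZ", "asn": 64500})
    print(event)
```

Passing the lookup in as a function keeps the sketch independent of any particular geolocation or reputation provider.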

Risk Score Calculation

Each detected IP receives a **risk score** (0-100) based on multiple factors:

Factors

- **Attack volume** (25%): Total number of malicious events
- **Attack diversity** (20%): Different types of attacks performed
- **Honeypots affected** (15%): Number of honeypots that detected the IP
- **Associated malware** (20%): If the IP deployed known malware
- **Recency** (10%): Time since last activity
- **External reputation** (10%): Data from blacklists and OSINT sources
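
The weights above can be combined as a simple weighted sum. The sketch below assumes each factor has already been normalized to a value between 0.0 and 1.0; how each factor is normalized is not specified here, so that step, the factor key names and the `risk_score` function are assumptions made for illustration only.

```python
from typing import Mapping

# Weights taken from the factor list above; they sum to 1.0.
WEIGHTS = {
    "attack_volume": 0.25,
    "attack_diversity": 0.20,
    "honeypots_affected": 0.15,
    "associated_malware": 0.20,
    "recency": 0.10,
    "external_reputation": 0.10,
}


def risk_score(factors: Mapping[str, float]) -> int:
    """Combine factor values (assumed pre-normalized to 0.0-1.0) into a 0-100 score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Clamp to [0, 1]; a missing factor counts as 0 (assumption).
        value = min(max(factors.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total * 100)


if __name__ == "__main__":
    example = {
        "attack_volume": 0.8,
        "attack_diversity": 0.5,
        "honeypots_affected": 0.4,   # e.g. seen by 2 of 5 honeypots
        "associated_malware": 1.0,
        "recency": 1.0,
        "external_reputation": 0.0,
    }
    print(risk_score(example))  # 66
```

In this example the contributions are 20 + 10 + 6 + 20 + 10 + 0, giving a score of 66.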

Scale

- **0-39 (Low)**: Minimal or sporadic activity
- **40-59 (Medium)**: Moderate activity, possible scanning
- **60-79 (High)**: Significant activity, active attacks
- **80-100 (Critical)**: Severe threat, multiple attack vectors
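
Mapping a score to its level is a threshold lookup over the ranges above. The function name `risk_level` is hypothetical and the sketch assumes integer scores.

```python
def risk_level(score: int) -> str:
    """Return the level name for a 0-100 risk score, using the ranges listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "Critical"
    if score >= 60:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    for value in (12, 40, 79, 80):
        print(value, risk_level(value))
```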

Data Quality

We implement multiple controls to ensure data quality:

- **Deduplication**: We remove duplicate events from the same attack
- **IP validation**: We verify that IPs are public and valid
- **Noise filtering**: We exclude known benign scans (Shodan, Censys, etc.)
- **Malware verification**: Hashes are checked against VirusTotal
- **Continuous updates**: Data is updated every minute
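
The first three controls can be sketched with Python's standard `ipaddress` module. The event field names, the deduplication key and the `benign_ips` allowlist are assumptions made for this example; the criteria used in practice may differ.

```python
import ipaddress
from typing import Iterable


def is_public_ip(value: str) -> bool:
    """True only for syntactically valid, globally routable addresses."""
    try:
        return ipaddress.ip_address(value).is_global
    except ValueError:
        return False


def clean_events(events: Iterable[dict], benign_ips: frozenset = frozenset()) -> list:
    """Drop invalid/private sources, allowlisted scanners and exact duplicates."""
    seen = set()
    kept = []
    for event in events:
        ip = event["src_ip"]
        # IP validation and noise filtering (benign_ips is a hypothetical allowlist).
        if not is_public_ip(ip) or ip in benign_ips:
            continue
        # Deduplication key is an assumption: same source, port, type and timestamp.
        key = (ip, event["dest_port"], event["event_type"], event["timestamp"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(event)
    return kept


if __name__ == "__main__":
    base = {"src_ip": "8.8.8.8", "dest_port": 22,
            "event_type": "login_failed", "timestamp": "2024-01-01T00:00:00Z"}
    events = [base, dict(base), {**base, "src_ip": "10.0.0.5"}]
    print(len(clean_events(events)))  # 1
```

Here the duplicate record and the private address `10.0.0.5` are discarded, leaving a single event.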

Limitations

It's important to understand honeypot data limitations:

- **Geographic bias**: Honeypot location affects which attacks are captured
- **Targeted attacks**: Very sophisticated attacks may detect honeypots
- **Volume vs. impact**: High attack volume doesn't always mean greater danger
- **Attribution**: IPs may be proxies, VPNs or compromised machines
- **False positives**: Some legitimate scans may be classified as attacks

Ethical Considerations

We follow ethical principles in our research:

- **No offensive interaction**: Our systems are passive; we never attack back
- **Public data**: We only publish IPs that performed verifiable malicious activity
- **Privacy**: We don't collect personal data from legitimate users
- **Responsibility**: Data is provided "as is" without warranties
- **Legitimate use**: Data is intended for research and defense, not retaliation

Frequently Asked Questions

Is the data real-time?

Data is updated every minute. The live map shows attacks from the last few seconds.

Can I use the data to block IPs?

Yes, but we recommend combining with other sources. Consider context and false positive risk.
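
One way to act on this advice is to block an IP only when its risk score is high and at least one independent source also lists it. The sketch below is an assumption about how a consumer might do that; the function name, the input shapes and the default thresholds are hypothetical.

```python
from typing import Iterable, Mapping, Set


def build_blocklist(
    scores: Mapping[str, int],
    other_sources: Iterable[Set[str]],
    min_score: int = 80,
    min_confirmations: int = 1,
) -> Set[str]:
    """Keep IPs with a high risk score that other feeds also report."""
    sources = list(other_sources)
    blocklist = set()
    for ip, score in scores.items():
        # Count how many independent feeds contain this IP.
        confirmations = sum(ip in source for source in sources)
        if score >= min_score and confirmations >= min_confirmations:
            blocklist.add(ip)
    return blocklist


if __name__ == "__main__":
    # Documentation addresses used as placeholders.
    scores = {"198.51.100.1": 90, "198.51.100.2": 95, "198.51.100.3": 50}
    other_feed = {"198.51.100.1", "198.51.100.3"}
    print(build_blocklist(scores, [other_feed]))  # {'198.51.100.1'}
```

Requiring a confirmation trades some coverage for a lower false positive rate; the thresholds should be tuned to the environment being protected.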

Do you offer an API to access the data?

We are working on a public API. Contact us if you need early access for research.

How can I contribute?

You can report false positives, share data from your honeypots, or collaborate on project development.

Have questions about our methodology? Contact us.
