Methodology

How we collect and analyze threats

Overview

TroyanosYVirus uses a distributed network of **honeypot** sensors (decoy systems) deployed across multiple geographic locations to attract, detect and analyze cyber attacks in real time. A honeypot is a computer system configured to appear as a legitimate and attractive target for attackers. When an attacker interacts with the honeypot, all their actions are logged and analyzed, providing valuable intelligence about:

- **Attack vectors** currently in use
- **Malicious IPs** and their geolocation
- **Credentials** tested in brute force attacks
- **Malware** deployed by attackers
- **Commands** executed after gaining access

Infrastructure

Our infrastructure is a **distributed network of T-Pot honeypot sensors** deployed across multiple geographic locations, covering different regions of the world.

Europe

Multiple countries

North America

Multiple countries

Asia-Pacific

Multiple countries

The geographic distribution of our sensors allows us to capture attacks from different regions and obtain a global view of the threat landscape. For operational security reasons, we do not disclose the exact number or specific locations of our sensors.

T-Pot: Multi-Honeypot Platform

We use **T-Pot**, an all-in-one honeypot platform developed by Deutsche Telekom Security. T-Pot integrates multiple specialized honeypots:

Cowrie

Medium/high interaction SSH/Telnet honeypot. Captures credentials, commands and malware.

Dionaea

Captures malware exploiting vulnerabilities in SMB, HTTP, FTP, MySQL, etc.

Honeytrap

Low interaction honeypot for capturing generic network attacks.

Conpot

Simulates industrial control systems (ICS/SCADA).

Mailoney

SMTP honeypot for capturing spam and phishing.

ADBHoney

Emulates Android Debug Bridge to detect attacks on Android devices.

Glutton

SSH proxy honeypot that forwards connections to Cowrie.

Heralding

Captures credentials from multiple protocols (FTP, SSH, Telnet, HTTP, etc.).

Data Flow

Capture

Honeypots log all attacker interactions: IPs, ports, credentials, commands, payloads.

→

Aggregation

Data is sent to a central server where it is normalized and stored in Elasticsearch.

→

Enrichment

IPs are enriched with geolocation, ASN/ISP data, and reputation from external sources.

→

Analysis

Risk scores are calculated, patterns identified and threats classified.

→

Visualization

Processed data is exposed through the web and API for public query.
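The first stages of this flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` schema, the raw field names (`ip`, `port`, `sensor_type`) and the dictionary-based geolocation lookup are hypothetical and do not describe the platform's actual storage format or enrichment sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical normalized event; not the platform's real schema."""
    src_ip: str
    dst_port: int
    honeypot: str
    country: Optional[str] = None  # filled in by the enrichment stage


def normalize(raw: dict) -> Event:
    # Aggregation stage: map a raw honeypot record onto a common schema.
    # The raw keys used here are assumed for illustration.
    return Event(
        src_ip=raw["ip"].strip(),
        dst_port=int(raw["port"]),
        honeypot=raw["sensor_type"].lower(),
    )


def enrich(event: Event, geo_lookup: dict) -> Event:
    # Enrichment stage: attach a country code if the lookup knows the IP.
    # A real deployment would query a geolocation database instead.
    event.country = geo_lookup.get(event.src_ip)
    return event


raw_record = {"ip": " 203.0.113.5 ", "port": "22", "sensor_type": "Cowrie"}
event = enrich(normalize(raw_record), {"203.0.113.5": "NL"})
```

Keeping each stage as a small function that takes and returns an event mirrors the capture, aggregation, enrichment sequence described above and makes each step easy to test in isolation.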

Risk Score Calculation

Each detected IP receives a **risk score** (0-100) based on multiple factors:

Factors

- **Attack volume** (25%): Total number of malicious events
- **Attack diversity** (20%): Different types of attacks performed
- **Honeypots affected** (15%): Number of honeypots that detected the IP
- **Associated malware** (20%): If the IP deployed known malware
- **Recency** (10%): Time since last activity
- **External reputation** (10%): Data from blacklists and OSINT sources
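One plausible way to combine these factors is a weighted sum. The sketch below uses the weights listed above, but the rest is an assumption: the text does not say how each factor is normalized or whether the combination is linear. Here each factor is assumed to be pre-scaled to a sub-score between 0.0 and 1.0, and the names `WEIGHTS` and `risk_score` are hypothetical.

```python
# Weights taken from the factor list above; they add up to 100%.
WEIGHTS = {
    "attack_volume": 0.25,
    "attack_diversity": 0.20,
    "honeypots_affected": 0.15,
    "associated_malware": 0.20,
    "recency": 0.10,
    "external_reputation": 0.10,
}


def risk_score(factors: dict) -> int:
    """Combine per-factor sub-scores into a 0-100 score.

    Assumption: every sub-score is already normalized to 0.0-1.0.
    Missing factors count as 0.0 and out-of-range values are clamped.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(factors.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total * 100)


# An IP with maximal volume and known malware, but nothing else.
example = risk_score({"attack_volume": 1.0, "associated_malware": 1.0})
```

With this weighting, attack volume and associated malware together account for 45 of the 100 points, so neither factor alone can push an IP into the upper half of the range.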

Scale

- **0-39, Low**: Minimal or sporadic activity
- **40-59, Medium**: Moderate activity, possible scanning
- **60-79, High**: Significant activity, active attacks
- **80-100, Critical**: Severe threat, multiple attack vectors
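The scale maps directly onto a small lookup function. The thresholds come from the scale above; the function name `risk_level` is hypothetical and integer scores are assumed.

```python
def risk_level(score: int) -> str:
    """Translate a 0-100 risk score into the level names of the scale."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "Critical"
    if score >= 60:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"
```

Checking thresholds from the top down keeps each boundary in exactly one place, so a score of 59 is Medium and 60 is High.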

Data Quality

We implement multiple controls to ensure data quality:

- **Deduplication**: We remove duplicate events from the same attack
- **IP validation**: We verify that IPs are public and valid
- **Noise filtering**: We exclude known benign scans (Shodan, Censys, etc.)
- **Malware verification**: Hashes are checked against VirusTotal
- **Continuous updates**: Data is updated every minute
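Two of these controls, IP validation and deduplication, are easy to illustrate with Python's standard library. This is a sketch, not the platform's implementation: the event keys (`src_ip`, `dst_port`, `honeypot`, `timestamp`) and the 60-second window are assumptions, since the text does not define what counts as a duplicate event.

```python
import ipaddress


def is_public_ip(value: str) -> bool:
    """Return True only for syntactically valid, globally routable addresses."""
    try:
        return ipaddress.ip_address(value).is_global
    except ValueError:
        # Not a parseable IPv4 or IPv6 address.
        return False


def deduplicate(events: list, window_seconds: int = 60) -> list:
    """Keep the first event per (source, port, honeypot) in each time window.

    Assumption: timestamps are seconds since the epoch. Fixed windows are
    used for simplicity, so two events that straddle a window boundary are
    both kept even if they are only seconds apart.
    """
    seen = set()
    unique = []
    for event in events:
        bucket = int(event["timestamp"] // window_seconds)
        key = (event["src_ip"], event["dst_port"], event["honeypot"], bucket)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

Private, loopback and reserved ranges all fail the `is_global` check, which filters out events that could only originate from misconfiguration or internal traffic.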

Limitations

It is essential to understand the inherent limitations of honeypot data for its correct interpretation:

- **Geographic bias**: Sensor location affects which attacks are captured and which remain outside detection scope
- **Targeted attacks**: Very sophisticated or targeted attacks may detect and evade honeypots, being underrepresented in the data
- **Volume vs. impact**: High attack volume from an IP does not always mean greater danger; it may be low-impact automated scanning
- **Attribution**: Detected IPs may be proxies, VPNs, Tor nodes, compromised machines or CDN infrastructure. The presence of an IP does not imply that its owner is responsible for the malicious activity
- **False positives**: Some legitimate security scans (researchers, search engines, monitoring services) may be incorrectly classified as attacks
- **Automated data**: The entire collection, enrichment and scoring process is automated, which may introduce systematic errors
- **No definitive attribution**: Data from this platform should not be used as the sole source for attributing criminal activity to any person or entity

Ethical Considerations

We follow ethical principles in our research:

- **No offensive interaction**: Our systems are passive; we never attack back
- **Public data**: We only publish IPs that performed verifiable malicious activity
- **Privacy**: We don't collect personal data from legitimate users
- **Responsibility**: Data is provided "as is" without warranties
- **Legitimate use**: Data is intended for research and defense, not retaliation

Frequently Asked Questions

Is the data real-time?

Data is updated every minute. The live map shows attacks from the last few seconds.

Can I use the data to block IPs?

Yes, but we recommend combining it with other sources. Consider the context and the risk of false positives before blocking.

Do you offer an API to access the data?

We are working on a public API. Contact us if you need early access for research.

How can I contribute?

You can report false positives, share data from your honeypots, or collaborate on project development.

Have questions about our methodology? Contact us.