
# External information sources for proactive detection of incidents

## Feeds of malware URLs

Blacklists of websites that have been observed hosting malware or exploit kits. By correlating such lists with logs from local network monitoring, such as proxy logs, it is possible to detect connections that may have resulted in a malware infection. While all major browsers have built-in URL blacklists, their content is provided by the vendors, so using additional sources can increase coverage; this especially applies to less common malware or to information from non-public sources that may not be available in the vendors' blacklists. Additionally, once a malware URL is discovered, the logs can be searched retroactively to find any past connections to the malicious website. Another common use of this data is detecting malware hosted in the constituency, which should trigger a remediation process. Since malware is often hosted on compromised websites abused by criminals, typically only full URLs can be used for detection, as the corresponding domains and IP addresses may also host harmless content.
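
As a minimal sketch of this correlation, the following Python script matches URLs from a plain-text feed (one URL per line, as offered by URLhaus, for example) against a proxy log; the field index of the requested URL is an assumption and must be adapted to the local log format.

```python
#!/usr/bin/env python3
"""Match proxy-log URLs against a malware-URL feed (minimal sketch)."""
import sys

URL_FIELD = 6  # assumed position of the requested URL in the proxy log


def load_feed(path):
    """Load a one-URL-per-line feed, skipping blanks and '#' comments."""
    with open(path) as fh:
        return {line.strip() for line in fh
                if line.strip() and not line.startswith("#")}


def main(feed_path, log_path):
    bad_urls = load_feed(feed_path)
    with open(log_path) as log:
        for line in log:
            fields = line.split()
            if len(fields) > URL_FIELD and fields[URL_FIELD] in bad_urls:
                # a hit suggests a client may have fetched malware
                print(f"possible infection: {line.rstrip()}")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```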

### Evaluation

Timeliness: Fair; most of the data is collected automatically by crawlers scanning websites on a large scale; opportunistic malware campaigns on popular websites are discovered quickly, but more targeted attacks or less-visited sites can take a significant amount of time to appear in such feeds, if they are detected at all.

Accuracy: Low; significant false-positive ratio; by the time a report is received, many of the URLs no longer serve malicious content.

Ease of use: Fair; using bulk feeds for detection or blocking can be challenging due to the volume of the data and quality issues; however, using this information for early identification of compromised machines in one's own constituency is very straightforward and is recommended even for teams with fewer resources.

Data volume: High; global blacklists contain a large number of entries, and a single compromised website is often reported multiple times, since the malware is hosted at several different URLs.

Completeness: Poor; usually very little information is provided besides URLs and detection times.

### Examples of sources

#### Abuse.ch

- Website: https://urlhaus.abuse.ch/browse
- Provider: Community
- Access: Public

#### DNS-BH - Malware Domain Blocklist

- Website: http://dns-bh.sagadc.org
- Provider: For-profit entity
- Access: Public

#### hpHosts by Malwarebytes

- Website: http://hosts-file.net
- Provider: For-profit entity
- Access: Public

#### Clean MX

- Website: https://support.clean-mx.com/clean-mx/viruses.php
- Provider: For-profit entity
- Access: Registration

#### URLVir Active Malicious Hosts (NoVirusThanks)

- Website: http://www.urlvir.com/
- Provider: For-profit entity
- Access: Public

#### Netlab 360

- Website: https://netlab.360.com
- Provider: For-profit entity
- Access: Public

#### Facebook ThreatExchange

- Website: https://developers.facebook.com/programs/threatexchange
- Provider: For-profit entity
- Access: Invite required

#### VXvault

- Website: http://vxvault.net/ViriList.php
- Provider: Individual
- Access: Public

## Feeds of phishing sites

Lists of recently reported or active phishing URLs. A significant part of this data comes from user reports; however, there is also a large ecosystem of entities dedicated to finding and taking down phishing sites. By correlating such lists with logs from local network monitoring, such as proxy logs, it is possible to detect a possible theft of credentials. While all major browsers have built-in URL blacklists, their content is provided by the vendors, so using additional sources can increase coverage; this especially applies to more targeted phishing campaigns and to information from non-public sources that may not be available in the vendors' blacklists. Additionally, once a phishing URL is discovered, the logs can be searched retroactively to find any past connections to the malicious website. Another common use of this data is detecting phishing pages hosted in the constituency, which should trigger a remediation process. Since phishing pages are often hosted on compromised websites abused by criminals, typically only full URLs can be used for detection, as the corresponding domains and IP addresses may also host harmless content.
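
The retroactive search mentioned above can be as simple as scanning archived logs for the reported URL. The sketch below assumes gzip-compressed proxy logs under a placeholder path; both the path and the log format are assumptions to adapt locally.

```python
#!/usr/bin/env python3
"""Retroactively search archived proxy logs for a phishing URL (sketch)."""
import glob
import gzip
import sys


def search(phish_url, log_glob="/var/log/proxy/access.log-*.gz"):
    # placeholder path; adapt the glob to the local log archive layout
    for path in sorted(glob.glob(log_glob)):
        with gzip.open(path, "rt", errors="replace") as fh:
            for line in fh:
                if phish_url in line:
                    # any hit may mean credentials were submitted;
                    # follow up with the affected user
                    print(f"{path}: {line.rstrip()}")


if __name__ == "__main__":
    search(sys.argv[1])
```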

### Evaluation

Timeliness: Good; phishing databases often obtain their data from a community of users (both individuals and companies), which means that there is variance in reporting times; wide-scale phishing campaigns are usually discovered quickly.

Accuracy: Low; providers vet reports with varying levels of rigour; by the time a report is received, many of the URLs no longer serve malicious content.

Ease of use: Fair; using bulk feeds for detection or blocking can be challenging due to the volume of the data and quality issues; however, using this information for early identification of compromised machines in one's own constituency is very straightforward and is recommended even for teams with fewer resources.

Data volume: Medium; most phishing sites are reported or verified manually, which keeps the volume of these feeds limited.

Completeness: Fair; apart from the URL and timestamp, providers often share the name of the targeted brand and the current status of the phishing site.

### Examples of sources

#### DNS-BH - Malware Domain Blocklist

- Website: http://dns-bh.sagadc.org
- Provider: For-profit entity
- Access: Public

#### hpHosts by Malwarebytes

- Website: http://hosts-file.net
- Provider: For-profit entity
- Access: Public

#### OpenPhish

- Website: https://openphish.com
- Provider: For-profit entity
- Access: Public

#### PhishTank

- Website: https://www.phishtank.com
- Provider: For-profit entity
- Access: Public

#### APWG

- Website: https://apwg.org
- Provider: Non-profit
- Access: Registration

## Feeds of botnet command and control servers

Data on command and control servers used by malware, usually domains or IP addresses. This information is obtained by analysing individual malware samples or tracking the infrastructure used by threat actors. Addresses of command and control servers are very good network IoCs and can be used for real-time detection and blocking, but also for identification of infected machines by correlating them with network activity logs, for example netflow.
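
A sketch of such a correlation, assuming the flows have been exported to CSV (for example with `nfdump -o csv`); the `src_ip` and `dst_ip` column names are placeholders that must match the actual export schema.

```python
#!/usr/bin/env python3
"""Flag netflow records whose destination appears in a C&C feed (sketch)."""
import csv
import sys


def load_c2(path):
    """One address per line, '#' comments, as in typical plain-text dumps."""
    with open(path) as fh:
        return {l.strip() for l in fh if l.strip() and not l.startswith("#")}


def main(c2_path, flows_csv):
    c2 = load_c2(c2_path)
    with open(flows_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            if row.get("dst_ip") in c2:  # assumed column name
                print("likely infected host:",
                      row.get("src_ip"), "->", row["dst_ip"])


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```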

### Evaluation

Timeliness: Fair; new addresses are often added after manual analysis, which can take hours or days; some sources provide data from automated tracking of specific botnets, and this information can be close to realtime.

Accuracy: Good; C&C servers are usually verified before being added to a blacklist.

Ease of use: Excellent; C&C addresses can be easily correlated with network logs using existing tools.

Data volume: Low; the number of C&C servers is much smaller than for other types of malicious infrastructure.

Completeness: Fair; sufficient for detection and blocking (domains or IP addresses and a malware name); some sources provide additional malware-specific details that can be used for in-depth investigations.

### Examples of sources

#### Abuse.ch

- Website: https://abuse.ch
- Provider: Individual
- Access: Public

#### Bambenek C&C

- Website: https://osint.bambenekconsulting.com/feeds
- Provider: Individual
- Access: Public

#### Cybercrime tracker

- Website: http://cybercrime-tracker.net
- Provider: Individual
- Access: Public

#### Spamhaus BCL

- Website: https://www.spamhaus.org
- Provider: For-profit entity
- Access: Public

#### Malware Corpus Tracker

- Website: https://tracker.h3x.eu
- Provider: Individual
- Access: Public

#### Viriback Tracker

- Website: http://tracker.viriback.com
- Provider: Individual
- Access: Public

#### Netlab 360

- Website: https://netlab.360.com
- Provider: For-profit entity
- Access: Public

## Feeds of infected machines (bots)

IP addresses of machines infected with malware. The primary use of these reports is the identification of compromised machines in the constituency for remediation purposes. If the team does not have the authority to clean up infections, this data is used for notification. Notifications are implemented by sending email reports to the responsible entities (common for national CSIRTs), by putting users in so-called "walled gardens" (possible for ISPs), or by other means suitable for the specific environment.
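
A minimal sketch of the notification step: group incoming bot reports by the responsible network and show what would be sent. The prefix-to-contact mapping and the report columns (`ip`, `malware`) are placeholders; a real deployment would draw contacts from a constituency database or whois data.

```python
#!/usr/bin/env python3
"""Group infected-machine reports by network owner for notification (sketch)."""
import csv
import ipaddress
import sys
from collections import defaultdict

# hypothetical mapping of constituency prefixes to abuse contacts
CONTACTS = {
    ipaddress.ip_network("192.0.2.0/24"): "abuse@example.net",
    ipaddress.ip_network("198.51.100.0/24"): "security@example.org",
}


def contact_for(ip):
    addr = ipaddress.ip_address(ip)
    for net, mail in CONTACTS.items():
        if addr in net:
            return mail
    return None


def main(report_csv):
    per_contact = defaultdict(list)
    with open(report_csv, newline="") as fh:
        for row in csv.DictReader(fh):  # assumed columns: ip, malware
            mail = contact_for(row["ip"])
            if mail:
                per_contact[mail].append(row)
    for mail, rows in per_contact.items():
        print(f"would notify {mail}: {len(rows)} infected host(s)")


if __name__ == "__main__":
    main(sys.argv[1])
```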

### Evaluation

Timeliness: Good; update frequency varies from daily to hourly, or even realtime.

Accuracy: Excellent; this type of data is mostly collected through sinkholes, which have a low false-positive ratio, as they should not receive any legitimate connections; the main challenges are NAT and DHCP, which complicate identification of the actual infected machines.

Ease of use: Good; existing workflows and tools are well suited for handling these sources; this also applies to large-scale notifications (especially for national teams).

Data volume: Depends on the constituency size; this can be one of the biggest data sources, especially when a large botnet is taken down and sinkholed; national teams may receive hundreds of thousands of reports per day or more; since the majority of infections affect home users, corporate networks receive a relatively small number of reports; the amount of data also varies over the day, following general daytime computer usage.

Completeness: Fair; generally sufficient for notification (IP address, timestamp, ports); the main issue is the naming of malware families and botnets, as providers do not use a common taxonomy, or in some cases provide no name at all, which makes proper identification a challenge.

### Examples of sources

#### Shadowserver

- Website: https://www.shadowserver.org
- Provider: Non-profit
- Access: Network owners

#### Spamhaus

- Website: https://www.spamhaus.org
- Provider: For-profit entity
- Access: Public

#### Netlab 360

- Website: https://netlab.360.com
- Provider: For-profit entity
- Access: Public

#### Cymru CAP

- Website: https://www.team-cymru.com/CSIRT-AP.html
- Provider: For-profit entity
- Access: National CSIRTs

#### Microsoft GSP

- Website: https://www.microsoft.com/en-us/securityengineering/gsp
- Provider: For-profit entity
- Access: National CSIRTs

#### n6

- Website: https://n6.cert.pl/en
- Provider: National CSIRT
- Access: Network owners

#### CERT-Bund Reports

- Website: https://www.bsi.bund.de/EN/Topics/IT-Crisis-Management/CERT-Bund/CERT-Reports/reports_node.html
- Provider: National CSIRT
- Access: National CSIRTs

## Feeds with information on sources of abuse (spam, attacks, scanning)

Information on hosts responsible for various kinds of malicious activity on the internet, including sending spam, performing port scans, and making exploitation attempts. One of the main ways of collecting this information is server honeypots listening on public IPv4 addresses. This data often comes in the form of various blacklists, sometimes aggregated by entire networks. The original purpose of such blacklists was filtering; however, given the ease with which attackers can change infrastructure, the effectiveness of this approach is questionable. Nevertheless, these sources are valuable for detecting compromised machines or bad actors in one's own constituency. There are also services that aggregate abuse reports from multiple sources to simplify access to the data.
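
A sketch of the aggregation-and-check approach: merge several plain-text blacklists and report any entry that falls inside the own address space. The feed URLs and prefixes below are placeholders; real feeds differ in format, licensing, and update frequency.

```python
#!/usr/bin/env python3
"""Merge abuse blacklists and check them against own netblocks (sketch)."""
import ipaddress
import urllib.request

FEEDS = [
    "https://example.org/blacklist1.txt",  # placeholder feed URLs
    "https://example.org/blacklist2.txt",
]
CONSTITUENCY = [ipaddress.ip_network("192.0.2.0/24")]  # placeholder prefix


def fetch(url):
    with urllib.request.urlopen(url, timeout=30) as resp:
        for line in resp.read().decode(errors="replace").splitlines():
            line = line.strip()
            if line and not line.startswith(("#", ";")):
                yield line.split()[0]  # many lists put the address first


def main():
    merged = set()
    for url in FEEDS:
        merged.update(fetch(url))
    for entry in merged:
        try:
            addr = ipaddress.ip_address(entry)
        except ValueError:
            continue  # skip non-IP entries (CIDRs, domains)
        if any(addr in net for net in CONSTITUENCY):
            print("reported abusive host in constituency:", addr)


if __name__ == "__main__":
    main()
```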

### Evaluation

Timeliness: Fair; most sources provide data aggregated daily; however, some providers update more frequently, even in realtime.

Accuracy: Good; typically, there is a small risk of false positives.

Ease of use: Good; coordinating teams have automation tools available that can handle this type of information well; for individual organisations, a major challenge might be the large number of different sources of this data, with different formats and access mechanisms.

Data volume: For non-corporate networks with a large number of users, the number of reports can be very high; reports of abuse and of other malicious activity related to the same addresses often overlap, since threat actors commonly use compromised machines for further attacks, spamming, proxies, etc.

Completeness: Poor; apart from classification and IP addresses, there is usually little additional detail.

### Examples of sources

#### Antihacker Alliance

- Website: https://anti-hacker-alliance.com
- Provider: Community
- Access: Public

#### GreenSnow

- Website: https://greensnow.co
- Provider: Non-profit
- Access: Public

#### Nothink

- Website: http://www.nothink.org
- Provider: Individual
- Access: Public

#### Turris greylist

- Website: https://project.turris.cz
- Provider: Non-profit
- Access: Public

#### FireHOL

- Website: https://firehol.org
- Provider: Community
- Access: Public

#### Stop Forum Spam

- Website: https://www.stopforumspam.com
- Provider: Non-profit
- Access: Public

#### UCEPROTECT

- Website: http://www.uceprotect.net/
- Provider: Non-profit
- Access: Public

#### Netlab 360

- Website: https://netlab.360.com
- Provider: For-profit entity
- Access: Public

#### Cymru CAP

- Website: https://www.team-cymru.com/CSIRT-AP.html
- Provider: For-profit entity
- Access: Public

#### n6

- Website: https://n6.cert.pl/en
- Provider: National CSIRT
- Access: Network owners

## Information sharing platforms

Systems that facilitate the exchange of IoCs, advisories, and other threat intelligence. One of the main benefits of such platforms is that they aggregate a large amount of information and provide convenient ways to access it. Depending on the platform, most of the content may come from the vendor, individual researchers, or CSIRTs. Some users contribute their original findings, but a significant part may be information obtained elsewhere that is imported into the platform for correlation and easier access. One of the common use cases for information sharing platforms is exporting IoCs in bulk and using them for realtime detection. Analysts may also want to browse individual advisories and follow up on the relevant ones with investigations to determine whether the constituency has been affected by a similar attack. While almost all of these platforms are provided as an online service, MISP is an exception, as it allows self-hosting and uses a federated sharing model.
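
As an example of the bulk-export use case, the sketch below pulls recent detection-ready network attributes from a MISP instance with the PyMISP client; the instance URL and API key are placeholders.

```python
#!/usr/bin/env python3
"""Bulk export of recent network IoCs from MISP via PyMISP (sketch)."""
from pymisp import PyMISP  # pip install pymisp

misp = PyMISP("https://misp.example.org", "YOUR_API_KEY", ssl=True)

# attributes flagged for detection (to_ids) published within the last 7 days
result = misp.search(controller="attributes",
                     type_attribute=["ip-dst", "domain", "url"],
                     to_ids=1,
                     publish_timestamp="7d")

for attr in result.get("Attribute", []):
    print(attr["type"], attr["value"])
```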

### Evaluation

Timeliness: Good; depends on how active the user community is; information from other sources is usually imported within hours to days.

Accuracy: N/A; depends on the particular contribution; in general, there is no platform-wide verification of the data, and the consumer must understand the trustworthiness of different contributors.

Ease of use: Fair; the features offered by these platforms come with additional complexity; in general, personnel need additional training to take full advantage of these sources.

Data volume: Data aggregation means that the total number of entries can be high, and keeping up with new contributions can be challenging for analysts; on the other hand, the amount of data should not be a problem in the case of automated processing.

Completeness: Varies; platforms allow rich contextual information to be added, but the actual level of detail depends on the contributor; usually the amount of data is sufficient for understanding the context and using the data for proactive detection.

### Examples of sources

#### Anomali ThreatStream

- Website: https://www.anomali.com/community
- Provider: For-profit entity
- Access: Public

#### IBM X-Force Exchange

- Website: https://exchange.xforce.ibmcloud.com/
- Provider: For-profit entity
- Access: Registration

#### MISP

- Website: https://www.misp-project.org/ (software)
- Provider: Community
- Access: Public

#### AlienVault OTX

- Website: https://otx.alienvault.com
- Provider: Community
- Access: Public

## Network indicators of compromise for monitoring

Feeds of indicators that describe patterns in network traffic corresponding to known attacks, botnet communication, etc., specifically tailored for use in NIDSes. While commercial IDS vendors provide their own feeds, it is usually possible to add custom rules. The de facto standard for network indicators is the Snort rule language, which balances expressive power with a design that allows analysing multiple gigabits of traffic per second in realtime. Most rules focus on characteristic elements of the payload or application-protocol headers, as these are less likely to be changed by threat actors than IP addresses. Nevertheless, IPs or domains, especially those of C&C servers, can be used as indicators as well.
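
For illustration, a hypothetical rule in Snort syntax that matches a characteristic URI of an imagined C&C check-in; the pattern, message, and SID are made up, but the structure follows common community rules.

```
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS ( \
    msg:"Possible ExampleBot C&C check-in (illustrative rule)"; \
    flow:to_server,established; \
    content:"/gate.php?id="; http_uri; \
    classtype:trojan-activity; sid:1000001; rev:1;)
```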

### Evaluation

Timeliness: Good; commercial feeds are frequently updated, typically on a daily basis; others are usually less timely.

Accuracy: Fair; varies between providers; open source feeds tend to have higher false positive rates in comparison with commercial offerings.

Ease of use: Excellent; typically these feeds can be easily imported in IDSes.

Data volume: The number of rules is low enough not to require any special consideration when importing into IDSes.

Completeness: Good; rules include a classification of detected events and references to external analyses and taxonomies (for example, CVE identifiers).

### Examples of sources

#### Emerging Threats

- Website: https://rules.emergingthreats.net
- Provider: For-profit entity
- Access: Public

#### Snort Community

- Website: https://www.snort.org/downloads
- Provider: For-profit entity
- Access: Registration

## Malware intelligence

Services that provide information from static and dynamic analysis of malware samples and other related intelligence, going beyond a plain sandbox service. These services usually offer access to a large repository of analysis results and extensive query capabilities to facilitate investigations, research, and tracking of particular malware families. One common method for finding new malware samples relevant to the constituency is matching YARA signatures against newly observed samples or against historical data.
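
A minimal sketch of such signature-based hunting using the yara-python bindings; the rule, its strings, and the sample path are toy placeholders, since real hunting rules target family-specific strings or code patterns.

```python
#!/usr/bin/env python3
"""Compile a YARA rule and match it against a sample (sketch)."""
import yara  # pip install yara-python

RULE = r"""
rule example_family_marker
{
    strings:
        $s1 = "example-campaign-marker" ascii  // hypothetical string
        $s2 = { 6A 40 68 00 30 00 00 }         // hypothetical byte pattern
    condition:
        any of them
}
"""

rules = yara.compile(source=RULE)
matches = rules.match("/path/to/sample.bin")  # placeholder path
for match in matches:
    print("matched rule:", match.rule)
```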

### Evaluation

Timeliness: Good; information is collected continuously; results of automated analyses are available within minutes of submission; malware samples and other types of intelligence may depend on submissions by the community; however, for large-scale crimeware campaigns they are usually available within hours of appearing in the wild.

Accuracy: Fair; varies: automated analyses, especially sandboxes, can produce data that is not suitable as indicators; most of the information requires an analyst's interpretation.

Ease of use: Fair; personnel must have some understanding of malware analysis to use such services and interpret results; more advanced functionality requires in-depth expertise.

Data volume: N/A; services are backed by large data repositories, however teams search and access only the information that is needed for tracking particular threats (for example specific malware families) or relevant to an investigation; full datasets are not shared.

Completeness: Excellent; these types of sources can provide very detailed analysis reports, with a lot of contextual information and observables that can be used for further pivoting.

### Examples of sources

#### Hybrid Analysis

- Website: https://www.hybrid-analysis.com
- Provider: For-profit entity
- Access: Public

#### MalShare

- Website: https://malshare.com
- Provider: Community
- Access: Public

#### VT Enterprise (VirusTotal)

- Website: https://www.virustotal.com/gui/services-overview
- Provider: For-profit entity
- Access: Paid

#### mwdb.cert.pl

- Website: https://mwdb.cert.pl
- Provider: National CSIRT
- Access: Registration

## Feeds of defaced websites

Lists of compromised websites with modified content. This is an important source for detecting defacements before the affected entity reports them to the team, so that the remediation process can be triggered as soon as possible.

### Evaluation

Timeliness: Good; the primary source of these reports is user submissions, so timeliness may vary; however, reports usually arrive within hours.

Accuracy: Fair; entries may be verified manually by the provider, which ensures a certain level of trustworthiness; otherwise, such reports must be treated with care, as false reports are common.

Ease of use: Good; the information is straightforward to handle and does not require advanced tooling to process.

Data volume: The number of defacements relevant to the constituency is typically small, which means that each case can be verified and investigated.

Completeness: Fair; reports contain the URL of the affected page; other details might include a mirrored copy of the defaced page or some information about the threat actor.

### Examples of sources

#### Clean MX

- Website: https://support.clean-mx.com/clean-mx/portals
- Provider: For-profit entity
- Access: Public

#### Zone-H

- Website: http://www.zone-h.org
- Provider: Community
- Access: Public

## Feeds of vulnerable services

Lists of network services that either have a known vulnerability or whose exposure on a public address may pose a security risk (exposing them is against good practice). The data is obtained by large-scale scanning of the IP space (usually IPv4, although relevant subsets of IPv6 can also be scanned). This information is very valuable for teams, since early identification of vulnerable assets is key to preventing intrusions, and it may also reveal machines that are already compromised. Services that are not strictly vulnerable but are misconfigured in a way that makes them prone to abuse as DDoS reflectors also fall into this category.
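
A sketch of triaging such a report: keep only rows whose IP address falls within the own constituency. The CSV column names (`timestamp`, `ip`, `port`, `tag`) are assumptions modelled on Shadowserver-style reports; check the actual schema of the feed in use.

```python
#!/usr/bin/env python3
"""Filter a vulnerable-services report down to own netblocks (sketch)."""
import csv
import ipaddress
import sys

CONSTITUENCY = [ipaddress.ip_network("192.0.2.0/24"),   # placeholder
                ipaddress.ip_network("2001:db8::/32")]  # prefixes


def in_constituency(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CONSTITUENCY)


def main(report_csv):
    with open(report_csv, newline="") as fh:
        # assumed columns: timestamp, ip, port, tag (vulnerability name)
        for row in csv.DictReader(fh):
            if in_constituency(row["ip"]):
                print(row.get("timestamp"), row["ip"],
                      row.get("port"), row.get("tag"))


if __name__ == "__main__":
    main(sys.argv[1])
```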

### Evaluation

Timeliness: Poor; depends on the scanning frequency; popular services may be scanned even daily, less common ones only occasionally.

Accuracy: Good; providers typically take care to avoid false positives in the results, however some degree of verification is still needed.

Ease of use: Good; existing workflows and tools are well suited for handling these sources; this also applies to large-scale notifications (especially for national teams).

Data volume: Proportional to the number of publicly accessible hosts in the constituency; for large networks, the number of daily reports can be in the thousands, and for national teams it can exceed hundreds of thousands.

Completeness: Fair; reports contain the scan time, IP address, and name of the vulnerability; some providers offer additional details about the vulnerable service and other services running on the same host.

### Examples of sources

#### Shadowserver

- Website: https://www.shadowserver.org
- Provider: Non-profit
- Access: Network owners

#### Shodan

- Website: https://shodan.io
- Provider: For-profit entity
- Access: Public

## Sector-specific advisories

Information concerning entities in a particular sector, either because specific technologies are affected (for example, aviation systems) or because an attack is targeting the sector. This type of information typically takes the form of advisories, possibly with technical data such as IoCs attached. The information is usually shared between companies, regulators, and other entities that are part of the sector, with less involvement of public sources or major security vendors than in other categories. ISACs and both open and closed informal sharing groups also play an important role in this exchange.

### Evaluation

Timeliness: Fair; attacks and vulnerabilities can be disclosed with a significant delay; this depends on the victim's willingness to share information or on the vendor's processes.

Accuracy: Good; typically, only verified IoCs and vulnerabilities are shared.

Ease of use: Fair; information often comes in multiple formats and must be processed manually.

Data volume: Low.

Completeness: Good; usually sufficient information on the threat, context and mitigation steps is provided to make these reports actionable.

### Examples of sources

#### Cybersecurity & Infrastructure Security Agency (CISA)

- Website: https://www.us-cert.gov/ics
- Provider: National CSIRT
- Access: Public