How to Conduct and Detect Data Exfiltration
Is your organization taking proactive measures to prevent a data breach? From 2020 to 2021, the number of data breaches in the U.S. increased by 10%. According to the Identity Theft Resource Center, more than 500 data breach incidents occurred in 2020. The average cost of a data breach in 2021 was $9.05 million for U.S. organizations, according to IBM’s Cost of a Data Breach Report.
Unfortunately, the impact does not stop there. The residual costs caused by damage to brand reputation, subsequent legal action, and the detrimental effect on business operations often exceed the direct cost associated with addressing and containing the data breach. The collapse of certificate authority DigiNotar in 2011 is perhaps the most well-known example of an organization going bankrupt from the fallout of a data breach. DigiNotar failed to adequately protect its systems, allowing an attacker to issue hundreds of fraudulent certificates. Browser vendors responded by revoking trust in DigiNotar’s certificates, which effectively blacklisted the organization from the internet.
Most organizations would agree that data protection and security is imperative. However, despite the overwhelming amount of evidence and widespread understanding of how to implement effective security measures, organizations continue to neglect best practices and make drastically poor decisions when it comes to information security.
The two predominant reasons are not knowing where to start and the cost of implementing a successful security program. Given these concerns, where can you start to take proactive measures to protect your organization that are both effective and cost efficient?
Proactive security testing is an important part of developing a sophisticated information security program and can be an inexpensive place to start. It can help to determine the feasibility of different attack vectors in order to expose and address weaknesses within your organization. It also tests the ability of network defenders to successfully detect and respond to security incidents. A critical function of information security is to effectively prevent sensitive data from being accessed by external or unauthorized parties. After an attacker breaches an organization, they will often attempt to steal data by exfiltrating it out of the organization’s network. This post will explore the data exfiltration process, including how you can detect data exfiltration and what measures your organization should take to prevent it.
What is Data Exfiltration?
Data exfiltration is the process of transmitting data across network boundaries, typically from an organization’s intranet to the internet. It is commonly achieved by attackers after they establish a foothold in an organization’s network. Sophisticated attackers can often remain undetected in enterprise networks for a remarkable amount of time, even while actively hunting for valuable data. Once an attacker decides they have gathered enough data, they will attempt to exfiltrate it. An attacker’s primary objective while exfiltrating data is to get as much out of the network in as little time as possible, often implementing techniques to minimize their chance of being detected.
Many enterprise network defense mechanisms are aimed primarily at preventing attackers from entering a network. It is much less common for organizations to implement defenses aimed at preventing sensitive data from leaving their networks. For example, most enterprise firewalls filter ingress (incoming) traffic but are not configured to sufficiently monitor and filter egress (outgoing) traffic. A major obstacle that organizations face in the effort to prevent data exfiltration is the inability to know exactly how data will traverse network boundaries. Current detection suites generally focus on identifying specific attributes of sensitive data, such as filenames, extensions, and keywords – methods that attackers can easily circumvent by encrypting the data or obfuscating the exfiltration channel. In order to better understand how to prevent data exfiltration, we will further examine the attack surface and methodology in the next sections of this post.
Figure 01 – The path of exfiltrated data as it flows out of an organization’s network
The Process of Exfiltrating Data
If an attacker is able to gain unauthorized access to a system within an organization’s network without being detected, in many cases it will be trivial for them to transmit data out of that network. To accomplish this, attackers typically establish a shell – a communication channel that enables remote interaction with a machine – between the host they have compromised and their server. A remote shell makes it much easier to load external libraries or third-party tools onto the compromised host. There are also mechanisms built into many common operating systems (e.g. PowerShell or Windows Management Instrumentation (WMI)) that provide attackers with a pre-installed toolset for controlling a host remotely, as well as for exfiltrating data.
When an attacker is ready to exfiltrate data out of a network, they will configure a server on the internet to listen for a connection using a predetermined protocol (e.g. Hypertext Transfer Protocol (HTTP)). The attacker will then establish a connection between the compromised host and their server, ensuring the same protocol is used by both machines. Next, the attacker instructs the compromised host to send the specified data to the awaiting server. The time it takes to transmit the data depends on factors like the size of the file being transferred, the uplink speed, and the capabilities of both the hosts. Once the data transmission is complete, attackers will usually move the stolen data off of the server that received it in order to ensure its integrity. They may also take the server offline for operational security purposes.
Attackers sometimes perform anti-forensics on compromised hosts in an attempt to hide their activity and remove any evidence that data was exfiltrated. They may go as far as patching a vulnerability that they exploited to gain access to a machine in the first place.
Exfiltration Methods and Detection
The following sections outline a few methods for exfiltrating data, offered from the view of an external attacker, as well as how you can detect these exfiltration methods.
Hypertext Transfer Protocol (HTTP)
The Hypertext Transfer Protocol is an application-layer protocol for transmitting data between a client and a server. HTTP is used by web browsers to access websites and communicate with web servers. It is a reliable web technology that can be used for a wide variety of functions. Attackers often use HTTP to exfiltrate data because of how common it is in most networks. The high volume of HTTP traffic traversing enterprise networks allows data to blend in more easily while being exfiltrated.
The structure of HTTP communication provides a lot of benefits to attackers. It can be used to facilitate command and control with a compromised host using traffic that resembles innocuous web browsing, such as reading the news or shopping on Amazon. It also enables large data transfers directly between two hosts with easy validation and integrity checks.
The standard way to send large files or chunks of data to a server over HTTP is by submitting a POST request. The data is placed in the body of the request and sent to a specific URL on a web server that is designed to handle the request. There is generally no limit to the amount of data that can be transferred using this method, except for those imposed by the web server. If a file is too big for a server to handle in one POST request, it can be split up and sent in multiple requests. Properly configured web servers will automatically reconstruct the data as it is received.
In order to remain stealthy, attackers can configure web servers to only respond to requests that meet certain conditions. For example, a server can drop all requests that do not contain a specific user agent string (known only by the attacker). Similarly, all GET requests can be dropped, or even redirected to the target organization’s own website.
Because HTTP is a plain text protocol, attackers typically compress, encode, or encrypt data before exfiltrating it to minimize the risk of detection.
Most end users’ web browsing activity is fairly consistent, if not predictable. If at any point an end user submits a POST request to a web server, it is likely to be after an initial session with that web server has been established. An example would be submitting a GET request to an application’s login page before actually logging in with a POST request. When attackers use HTTP to exfiltrate data from a compromised host, POST requests can often be observed going to a web server with which the host has had no prior communication. Attackers usually try to limit the exposure of their infrastructure in an effort to stay undetected. The less interaction a compromised host has with the attacker’s infrastructure, the more difficult it may be for an organization’s security team to effectively respond to the exfiltration. As a result, POST requests to servers that have not been seen before may be an indicator of suspicious activity. However, this method is prone to false positives because there are a number of legitimate use cases where that activity may occur (e.g. application programming interfaces (APIs), XMLHttpRequests). A better approach would be whitelisting domain names and IP addresses that are deemed necessary for business operations and granting access to new sites on a per-user-request basis.
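To make the "POST with no prior contact" heuristic concrete, here is a minimal Python sketch. It assumes proxy logs have already been parsed into records with a client address, an HTTP method, and a destination host. The HttpEvent schema, the host names, and the allowlist contents are all hypothetical, and a real deployment would need persistent state and far more tuning.

```python
from dataclasses import dataclass


@dataclass
class HttpEvent:
    """One proxy-log record (hypothetical schema, not a real product's format)."""
    client: str   # internal host that made the request
    method: str   # HTTP method, e.g. "GET" or "POST"
    host: str     # destination host name


def flag_first_contact_posts(events, allowlist):
    """Return POST events sent to a host the client has never contacted before.

    Events are assumed to be in chronological order. Hosts on the
    allowlist are never flagged.
    """
    seen = set()      # (client, host) pairs already observed
    flagged = []
    for ev in events:
        key = (ev.client, ev.host)
        if ev.method == "POST" and key not in seen and ev.host not in allowlist:
            flagged.append(ev)
        seen.add(key)
    return flagged


events = [
    HttpEvent("10.0.0.5", "GET", "intranet.example.com"),
    HttpEvent("10.0.0.5", "POST", "intranet.example.com"),   # normal login flow
    HttpEvent("10.0.0.5", "POST", "api.partner.example"),    # allowlisted API
    HttpEvent("10.0.0.5", "POST", "files.unknown.example"),  # no prior contact
]
suspicious = flag_first_contact_posts(events, allowlist={"api.partner.example"})
print([ev.host for ev in suspicious])
```

Only the last request is flagged in this sample. The allowlist check is what keeps the legitimate API call from becoming a false positive.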
Another method for detecting suspicious HTTP activity is to monitor the duration of TCP sessions between clients and remote servers, as well as the amount of data exchanged between the two hosts during a session. Normal web browsing activity generally consists of the server sending the client the majority of data, including webpages, scripts, and images. If the client is sending a disproportionate amount of data to the server, it may indicate that a compromised host is attempting to exfiltrate data. For example, if a TCP session lasts for longer than 30 seconds and the client sends the server a stream of HTTP data greater than 10 MB, then the connection may be considered suspicious and should be flagged. However, this method could also produce a large number of false positives due to the HTTP keep-alive mechanism that allows connections to persist for longer durations. Similar rules can be tailored to better suit an organization’s environment and needs.
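The example rule above (a session longer than 30 seconds in which the client sends more than 10 MB) can be sketched as follows. The TcpSession schema and the sample addresses are assumptions made for illustration. The thresholds are simply the ones quoted in the text and would need tuning for a real environment.

```python
from dataclasses import dataclass

MAX_DURATION_S = 30                  # seconds, from the example rule above
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB, from the example rule above


@dataclass
class TcpSession:
    """Summary of one TCP session (hypothetical flow-log schema)."""
    client: str
    server: str
    duration_s: float
    bytes_sent: int       # client -> server
    bytes_received: int   # server -> client


def upload_ratio(session):
    """Fraction of all bytes in the session that the client sent."""
    total = session.bytes_sent + session.bytes_received
    return session.bytes_sent / total if total else 0.0


def is_suspicious(session):
    """Apply the example rule: long-lived session with a large client upload."""
    return (session.duration_s > MAX_DURATION_S
            and session.bytes_sent > MAX_UPLOAD_BYTES)


# Typical browsing: the server sends most of the data.
browsing = TcpSession("10.0.0.5", "198.51.100.20", 45.0, 40_000, 6_000_000)
# Possible exfiltration: the client sends almost everything.
upload = TcpSession("10.0.0.5", "203.0.113.9", 120.0, 250_000_000, 12_000)

for s in (browsing, upload):
    print(s.server, is_suspicious(s), round(upload_ratio(s), 3))
```

The upload ratio is not part of the rule itself. It is included as an additional signal an analyst might review, since a long keep-alive session with a low ratio is far more likely to be ordinary browsing.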
Domain Name System (DNS)
The Domain Name System is the component of the internet that maps domain names to the numerical IP addresses that are required for routing internet traffic to its proper location. Computers are designed to automatically query DNS servers in the background during normal user operations. For example, when a user types “www.example.com” in a web browser, their computer retrieves the website without ever displaying its hosted IP address.
Figure 02 – Simplified depiction of a DNS query and subsequent response
DNS tunneling is the process of transmitting data using DNS queries and responses. It can be used to transfer files or facilitate command and control with a compromised host, especially in environments where other methods may be closely monitored. If an attacker can issue external DNS queries from within an organization’s network, they will likely be able to exfiltrate data by tunneling it over DNS.
The first step of DNS tunneling is creating DNS records that will point any queries for a specific domain name to a server under the attacker’s control. After setting up and testing the records, the attacker will configure the server to act as an authoritative DNS server to itself and to listen for incoming DNS queries. They can then begin exfiltrating data by issuing virtually unlimited queries from a compromised host. Each DNS query will contain a chunk of stolen data in the subdomain portion of the request. For example, to exfiltrate credit card numbers along with their expiration dates and security codes, a string of queries may include subdomains that look like this:
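The exact format depends on the attacker’s tooling. As a purely illustrative sketch, the Python snippet below lists a few hypothetical query names and shows how a defender might match them. It assumes a made-up layout of card number, expiration date, and security code joined by hyphens, uses well-known test card numbers, and uses a placeholder domain.

```python
import re

# Hypothetical layout: <card number>-<MMYY expiry>-<security code>.<attacker domain>
CARD_LABEL = re.compile(r"^(\d{13,19})-(\d{4})-(\d{3,4})$")

# Illustrative query names only; the card numbers are public test values.
queries = [
    "4111111111111111-0125-123.exfil.example.com",
    "5555555555554444-0826-456.exfil.example.com",
    "www.example.com",
]


def looks_like_card_record(qname):
    """Return True if the leftmost label matches the assumed card layout."""
    first_label = qname.split(".", 1)[0]
    return CARD_LABEL.match(first_label) is not None


hits = [q for q in queries if looks_like_card_record(q)]
print(hits)
```

A pattern match like this only catches unencoded data in one specific layout, which is why the volume and entropy indicators discussed next are generally more useful.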
Numerous open source tools are available to make this process easier for attackers, most of which implement server-side parsing techniques that automatically reconstruct or format data as it is received.
A common indicator of DNS tunneling is a high volume of DNS queries to the same second-level domain, each containing a unique subdomain and originating from the same host, in a very short period of time. There are very few legitimate situations where this behavior should occur.
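This indicator can be sketched as a sliding-window counter over DNS logs. The log tuple format, the 60-second window, and the threshold of 50 unique names are assumptions chosen for illustration. The helper that extracts the second-level domain is deliberately naive and simply keeps the last two labels.

```python
from collections import defaultdict, deque

WINDOW_S = 60               # sliding window length in seconds (arbitrary)
MAX_UNIQUE_SUBDOMAINS = 50  # unique names tolerated per window (arbitrary)


def registered_domain(qname):
    """Naive second-level domain: the last two labels of the query name.

    This is wrong for suffixes such as co.uk; a real system would consult
    the Public Suffix List instead.
    """
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def find_tunneling_candidates(queries, window_s=WINDOW_S,
                              threshold=MAX_UNIQUE_SUBDOMAINS):
    """Flag (client, domain) pairs with too many unique names in one window.

    queries: iterable of (timestamp_s, client_ip, qname), sorted by time.
    """
    recent = defaultdict(deque)   # (client, domain) -> deque of (ts, qname)
    flagged = set()
    for ts, client, qname in queries:
        key = (client, registered_domain(qname))
        window = recent[key]
        window.append((ts, qname.lower()))
        # Discard queries that have fallen out of the time window.
        while window and ts - window[0][0] > window_s:
            window.popleft()
        if len({name for _, name in window}) > threshold:
            flagged.add(key)
    return flagged


# 100 unique subdomains in about 10 seconds from one host, plus normal lookups.
tunnel = [(i * 0.1, "10.0.0.5", f"chunk{i}.exfil.example") for i in range(100)]
normal = [(i * 10, "10.0.0.7", "www.example.com") for i in range(5)]
flagged = find_tunneling_candidates(sorted(tunnel + normal))
print(flagged)
```

Counting unique names rather than raw queries keeps a host that repeatedly resolves the same name from being flagged.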
It is likely that data being exfiltrated will contain special characters if an attacker is targeting anything other than credit card numbers. If special characters exist in the data, it has to be encoded before it can be sent in a DNS query because domain names can only contain certain characters. Base32 is often used to encode the data because its entire character set can be used in domain names. Most encoding schemes produce strings with a high degree of entropy (unless the underlying data is highly repetitive). If multiple DNS queries are observed with the same subdomain length, as well as a high degree of entropy, then there is a strong likelihood that the host issuing the queries is compromised. Any traffic that satisfies these characteristics should be flagged and immediately trigger a temporary block of the domain being queried.
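One way to approximate "a high degree of entropy" is the Shannon entropy of the characters in each subdomain label. The sketch below is a rough illustration, not a production detector. The 3.5 bits-per-character threshold, the minimum of three queries, and the sample domain are assumptions, and the fake payload is just a Base32-encoded hash standing in for encoded data.

```python
import base64
import hashlib
import math
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # bits per character; arbitrary cut-off for illustration


def shannon_entropy(text):
    """Shannon entropy of the characters in text, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_encoded_tunnel(qnames, min_queries=3):
    """True if all leftmost labels share one length and all exceed the threshold."""
    labels = [q.split(".", 1)[0] for q in qnames]
    if len(labels) < min_queries:
        return False
    same_length = len({len(label) for label in labels}) == 1
    high_entropy = all(shannon_entropy(label) > ENTROPY_THRESHOLD
                       for label in labels)
    return same_length and high_entropy


def fake_chunk(seed):
    """Stand-in for an encoded data chunk: Base32 of a SHA-256 digest (52 chars)."""
    digest = hashlib.sha256(seed.encode()).digest()
    return base64.b32encode(digest).decode().rstrip("=").lower()


tunnel_queries = [f"{fake_chunk(str(i))}.exfil.example.com" for i in range(5)]
normal_queries = ["www.example.com", "mail.example.com", "intranet.example.com"]

print(looks_like_encoded_tunnel(tunnel_queries))
print(looks_like_encoded_tunnel(normal_queries))
```

Base32 uses a 32-character alphabet, so its entropy can never exceed 5 bits per character. Short, human-chosen labels such as "www" or "mail" score far lower, which is what makes the comparison useful.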
DNS tunneling is inherently conspicuous but continues to succeed as an exfiltration method because many organizations implement few, if any, monitoring systems for outbound DNS queries. Many detection signatures for common DNS tunneling tools have been developed over time and should be added to an organization’s monitoring suite to reduce the chance of a successful attack.
File Transfer Protocol (FTP)
The File Transfer Protocol is a network protocol used for transferring files between a client and a server. Although it does not provide any sort of integrity protection, FTP is generally a reliable protocol for transferring large files.
In order to exfiltrate data over FTP, an attacker must be able to authenticate to an external FTP server from a compromised host within an organization’s network. Many enterprise networks lack firewall rules preventing outbound connections, allowing attackers to easily connect back to their own infrastructure. In addition, most operating systems are shipped with a native FTP client, so attackers do not have to install any supplemental tools on a compromised host.
FTP, like HTTP, is a plain text protocol. A major drawback of plain text protocols is that credentials are also sent across the wire in the clear. Anyone who is able to capture the traffic of a host authenticating to an FTP server would also be able to gain access to that server. Attackers can employ numerous techniques to retain a strong degree of operational security while transferring data over FTP. One such technique is to configure an FTP server with write-only permissions (also called a “blind drop” server) that permits anonymous uploads but prohibits all other actions such as listing directory contents and retrieving files. This allows attackers to avoid using credentials altogether.
Other common operational security measures can be applied to FTP servers as well. For example, an attacker can set up an FTP service that only accepts incoming connections during specific times, or from specific IP addresses.
Because FTP is a plain text protocol, most network monitoring solutions or intrusion detection systems should be able to detect sensitive data being exfiltrated over it. Detection becomes much more difficult when the data is obfuscated or encrypted before being sent over the network.
In an enterprise network, transmitting encrypted data over an unencrypted channel can be a strong indicator of compromise; or, at the very least, an indicator that suspicious activity could be occurring and should be further investigated. Flagging and temporarily blocking encrypted data being sent over an unencrypted FTP connection may help stop data exfiltration attempts.
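A common heuristic for spotting encrypted content on a cleartext channel is byte-level entropy: encrypted data looks close to uniformly random, while plain text does not. The sketch below assumes the payload bytes have already been extracted from the FTP data stream. The 7.5 bits-per-byte threshold is an arbitrary illustration, and compressed files (archives, images) also score high, so a hit is only a reason to investigate.

```python
import hashlib
import math
from collections import Counter

HIGH_ENTROPY_BITS = 7.5  # bits per byte; arbitrary cut-off for illustration


def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (maximum 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(payload):
    """True if the payload's byte distribution is close to uniformly random."""
    return byte_entropy(payload) > HIGH_ENTROPY_BITS


# Plain text sample versus a deterministic stand-in for ciphertext
# (4096 bytes of chained SHA-256 output, which behaves like random data).
plain = b"The quick brown fox jumps over the lazy dog. " * 100
random_like = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                       for i in range(128))

print(round(byte_entropy(plain), 2), looks_encrypted(plain))
print(round(byte_entropy(random_like), 2), looks_encrypted(random_like))
```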
A whitelisting approach is most likely the best course of action for preventing data exfiltration over FTP. A thorough whitelisting approach would require inspecting outbound traffic at the network perimeter and dropping any FTP packets that are not destined for servers on an approved whitelist. If a user needs FTP access to a new server not on the list, they can contact a system administrator in order to get the server approved.
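The decision logic of such a whitelist can be sketched in a few lines of Python. In practice this rule would live in a firewall or proxy configuration rather than in application code. The approved addresses below are documentation-range placeholders, and the sketch only considers the FTP control channel on port 21, not the separate data connections that FTP opens.

```python
import ipaddress

FTP_CONTROL_PORT = 21

# Hypothetical approved servers (placeholder documentation addresses).
APPROVED_FTP_SERVERS = [
    ipaddress.ip_network("203.0.113.10/32"),
    ipaddress.ip_network("198.51.100.0/28"),
]


def allow_outbound(dst_ip, dst_port, approved=APPROVED_FTP_SERVERS):
    """Return True if an outbound connection should be permitted."""
    if dst_port != FTP_CONTROL_PORT:
        return True  # not FTP control traffic; outside the scope of this rule
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in network for network in approved)


print(allow_outbound("203.0.113.10", 21))   # approved server
print(allow_outbound("192.0.2.55", 21))     # unapproved FTP destination
print(allow_outbound("192.0.2.55", 443))    # not covered by the FTP rule
```

Approving a new server on request then amounts to adding one entry to the list.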
The exfiltration methods presented above share one major disadvantage for attackers: the protocols they rely on lack encryption by default. Sophisticated attackers are often very wary of organizations conducting deep packet inspection (DPI) on their egress traffic. Realistically, the chance that an attacker would use a plain text protocol to exfiltrate easily identifiable information such as credit card or social security numbers is low.
Figure 03 – Unencrypted request to www.example.com over HTTP
Detection methods that rely on analyzing packet content are almost entirely ineffective when encryption is used. For example, HTTPS data is virtually impossible to read without network perimeter devices that are capable of decrypting SSL/TLS streams.
Figure 04 – Encrypted request to www.example.com over HTTPS
If network restrictions prevent an attacker from establishing an encrypted communication channel, they will encrypt data locally on the compromised host and exfiltrate it over an unencrypted protocol instead. Any network monitoring service or intrusion detection system located upstream would not be able to read the encrypted data. As previously stated, encrypted data streams being sent over unencrypted channels should warrant an investigation. Unless an organization has full visibility into the traffic leaving their network, there can be no certainty of what might be hiding under a layer of encryption.
Protecting Your Organization
After reviewing the HTTP, DNS, and FTP data exfiltration techniques, the next step is to ensure your organization is utilizing the detection methods discussed in this post. MindPoint Group offers exfiltration testing as a service to help organizations better understand their attack surface and test their detection capabilities at a large scale. We work with organizations to identify their “crown jewels” – what an internal or external attacker would most likely be after – and design data exfiltration tests that are tailored to those specific needs: credit card numbers, personally identifiable information (PII), source code, and much more. Our secure, comprehensive testing simulates realistic data exfiltration scenarios using a wide variety of techniques in addition to the ones outlined above.