Using SiLK and Mothra to Identify Data Exfiltration via the Domain Name Service

A variety of modern network threats involve data theft through abuse of network services, which is termed data exfiltration. To track such threats, analysts monitor data transfers out of the organization’s network, particularly data transfers occurring through network services not primarily intended for bulk transfer. One such service is the Domain Name System (DNS), which is essential for many other Internet services. Unfortunately, attackers can manipulate DNS to exfiltrate data covertly.

This SEI blog post focuses on how the DNS protocol can be abused to exfiltrate data by adding bytes of data onto DNS queries or making repeated queries that contain data encoded into the fields of the query. The post also examines the general traffic analytic we can use to identify this abuse and applies several available tools to implement the analytic. The aggregate size of DNS packets can provide a ready indicator of DNS abuse. However, because the DNS protocol has grown from a simple address resolution mechanism to distributed database support for network connectivity, interpreting the aggregate size requires understanding the context of queries and responses. By understanding the volume of DNS traffic, both in isolation and in aggregate, analysts can better match outgoing queries and incoming responses.
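To make the size indicator concrete, the following is a minimal sketch (not part of the original analytic) of how the IP-layer size of a single-question DNS query grows with the length of the queried name. It assumes UDP over IPv4 with no IP options and no EDNS0 additional record; the object and function names are illustrative only.

```scala
object DnsQuerySize {
  // Fixed sizes in bytes (assumptions: IPv4 without options, UDP transport, no EDNS0).
  val Ipv4Header = 20
  val UdpHeader = 8
  val DnsHeader = 12
  val QuestionFixed = 4 // QTYPE (2 bytes) + QCLASS (2 bytes)

  /** Wire length of a QNAME: one length byte per label plus the label bytes,
    * followed by a single zero byte for the root label. */
  def qnameLength(name: String): Int = {
    val labels = name.stripSuffix(".").split('.').filter(_.nonEmpty)
    labels.map(_.length + 1).sum + 1
  }

  /** IP-layer size of a one-question DNS query for the given name. */
  def queryIpBytes(name: String): Int =
    Ipv4Header + UdpHeader + DnsHeader + qnameLength(name) + QuestionFixed
}
```

Under these assumptions, `DnsQuerySize.queryIpBytes("www.example.com")` evaluates to 61, while a name carrying a 40-character encoded label, such as `"a" * 40 + ".example.com"`, evaluates to 98. Queries that carry encoded payloads therefore tend to be longer than ordinary lookups.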

The data used in this blog post is the CIC-BELL-DNS-EXF 2021 data set, as published in conjunction with the paper Lightweight Hybrid Detection of Data Exfiltration using DNS based on Machine Learning by Samaneh Mahdavifar et al.

The Role of DNS

DNS supports several types of queries. These queries are described in a variety of Internet Engineering Task Force (IETF) Request for Comments (RFC) documents. The query types include the following:

  • A and AAAA queries for the IP address corresponding to a domain name (e.g., “which address corresponds to www.example.com?” with a response giving an IP address)
  • pointer record (PTR) queries for the name corresponding to an IP address (e.g., “which name corresponds to this address?” with a response of “www.example.com”)
  • name server (NS), mail exchange (MX), and service locator (SRV) queries for the identity of key servers in a given domain
  • start of authority (SOA) queries for information about addresses on which the queried server may speak authoritatively
  • certificate (CERT) queries for encryption certificates pertaining to the server’s covered domains
  • text record (TXT) queries for additional information (as configured by the network administrator) in a text format

A given DNS query packet will request information on a given domain from a specific server, but the response from that server may include several resource records. The size of the response will depend on how many resource records are returned and the type of each record.

Once analysts understand the reasons for monitoring DNS traffic and the context needed for interpreting the monitoring results, they can then determine what information is desired from the monitoring. This blog post assumes the analyst wants to track external hosts that may be receiving exfiltrated information.

Overview of the Analytic for Identifying Data Exfiltration

The analytic covered in this blog post assumes that the networks of interest are covered by traffic sensors that produce network flow records or at least packet captures that can be aggregated into network flow records. There are several tools available to generate these flow records. Once produced, the flow records are archived in a flow repository or appropriate database tables, depending on the analysis tool suite.

The approach taken in this analytic is, first, to aggregate DNS traffic associated with external destinations acting like servers and, second, to profile the traffic for these destinations. The first step (association) involves identifying DNS traffic (either by service port or by actual examination of the application protocol), then identifying the external destinations involved. The second step (profiling) examines how many sources are communicating with each of the destinations, the aggregate byte count, packet count, and other revealing information as described in the following sections.
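The two steps can be sketched in plain Scala, independent of either tool suite. This is an illustration under stated assumptions rather than the implementation used later in the post: the `Flow` and `Profile` case classes are hypothetical simplifications of a flow record, DNS is identified by destination port 53 only, and the input is assumed to already contain only outbound flows to external destinations.

```scala
object DnsDestinationProfile {
  // Hypothetical, simplified flow record; real flow records carry many more fields.
  final case class Flow(sIP: String, dIP: String, dPort: Int, bytes: Long, packets: Long)

  // Per-destination summary produced by the profiling step.
  final case class Profile(dIP: String, sources: Int, flows: Int, bytes: Long, packets: Long)

  /** Step 1 (association): keep DNS flows by service port and group them by destination.
    * Step 2 (profiling): count distinct sources and flows, and total the bytes and packets.
    * Results are ordered by descending byte count. */
  def profile(flows: Seq[Flow]): Seq[Profile] =
    flows
      .filter(_.dPort == 53)
      .groupBy(_.dIP)
      .map { case (dIP, fs) =>
        Profile(dIP, fs.map(_.sIP).distinct.size, fs.size,
                fs.map(_.bytes).sum, fs.map(_.packets).sum)
      }
      .toSeq
      .sortBy(p => -p.bytes)
}
```

A destination that receives a large byte total from very few sources stands out in this profile, which is the pattern the tool-specific analytics in the following sections look for.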

Several different tools can be used for this analysis. This blog post will discuss two sets of SEI-developed tools:

  • The System for Internet-Level Knowledge (SiLK) is a collection of traffic analysis tools developed to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets. SiLK is ideally suited for analyzing traffic on the backbone or border of a large, distributed enterprise or mid-sized ISP.
  • Mothra is a collection of Apache Spark libraries that support analysis of network flow records in Internet Protocol Flow Information Export (IPFIX) format with deep packet inspection fields.

Each of the following sections will present an analytic for detecting exfiltration via DNS queries in the corresponding tool set.

Implementing the Analytic via SiLK

Figure 1 below presents a series of SiLK commands to implement an analytic to detect exfiltration. The first command applies a filter to normal, benign DNS traffic, isolating DNS traffic (identified by protocol recognition as indicated by the application label of 53) that comes from the internal network (identified by its classless inter-domain routing [CIDR] block) and consists of relatively long (70 bytes or more) packets. The output of the filter is then summarized by destination address and transport protocol, counting bytes, flow records, and packets for each combination of address and protocol. The resulting counts are only shown if the accumulated bytes are 500 or more. After applying the analytic to benign DNS data, it is applied in the second sequence to DNS data containing compressed data for exfiltration.

Figure 1: SiLK Analytic and Results

The results in Figure 1 show that the network talks to a primary DNS server, a secondary DNS server, and a public server. In the benign case, the data is mainly directed to the primary DNS server and the public server. In the exfiltration case, the data is mainly directed to the primary DNS server and the secondary DNS server. This shift of destination, in isolation, is not enough to make the exfiltration traffic suspicious or provide a basis for moving beyond suspicion into investigation. In the benign case, there is a notable fraction of the traffic directed to the public DNS server. In the traffic labeled as abusive, this fraction is lessened, and the fraction to a private DNS server (the exfiltration target) is increased. Unfortunately, given the limited nature of SiLK flow records, security analysts have a hard time extracting more from this traffic. To go further, more DNS-specific fields are required. These fields are provided by deep packet inspection (DPI) data in expanded flow records in IPFIX format. While SiLK cannot process IPFIX flow records, other tools such as Mothra and databases can.
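The shift in destination fractions described above can be quantified with a small sketch. It assumes per-destination byte totals have already been extracted from the summarized flow output; the map keys are placeholders for whatever destination addresses the analyst observes, and the object and function names are illustrative only.

```scala
object DestinationShare {
  /** Fraction of the total bytes sent to each destination. */
  def shares(bytesByDest: Map[String, Long]): Map[String, Double] = {
    val total = bytesByDest.values.sum.toDouble
    if (total == 0.0) Map.empty[String, Double]
    else bytesByDest.map { case (dest, bytes) => dest -> bytes / total }
  }

  /** Change in share per destination (other minus baseline).
    * A destination absent from one capture counts as a share of 0 there. */
  def shift(baseline: Map[String, Long], other: Map[String, Long]): Map[String, Double] = {
    val b = shares(baseline)
    val o = shares(other)
    (b.keySet ++ o.keySet)
      .map(dest => dest -> (o.getOrElse(dest, 0.0) - b.getOrElse(dest, 0.0)))
      .toMap
  }
}
```

A positive value marks a destination that receives a larger share of DNS bytes in the second capture than in the baseline. As the text notes, such a shift alone is a reason to look closer, not evidence of exfiltration.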

Implementing the Analytic via Mothra

The code sample below shows the analytic implemented in Spark using the Mothra libraries. These libraries allow definition and loading of data frames with network flow record data in either SiLK or IPFIX format. A data frame is a collection of data organized into named columns. Data frames can be manipulated by Spark functions to isolate flows of interest and to summarize those flows. Defining the data frames involves identifying the columns and the data to populate the columns. In the code sample, the data frames are defined by the spark.read.fields function and populated by data from either the captured benign traffic or the captured exfiltration traffic via Mothra’s ipfix function. Together, these functions establish the data frame named data.

The result data frame is built from data through a series of filtering and summarization functions. The initial filter restricts it to traffic labeled as DNS traffic, followed by another filter that ensures the records contain DNS resource record queries or responses. The select function that follows isolates specific record features for summarization: time, traffic source and destination, byte and packet volumes, DNS names, DNS flags, and DNS resource record types. The groupBy function generates the summarization for each unique DNS name and resource record type combination. The agg function specifies that the summarization contain the count of flow records, the counts of source and destination IP addresses, and the totals for bytes and packets. The filter function (after the summarization) restricts output to just those showing a bytes-per-packet ratio of more than 70 with fewer than three entries in the DNS name list. This last filter excludes summarizations of traffic that is large only because of the length of the response list rather than the length of individual queries.

This filtering and summarization process creates a profile of large DNS requests and responses (separated by DNS flag values). The use of DNS names as a grouping value allows the analytic to distinguish repeated queries to similar domains. The counts of source and destination IP addresses allow the analyst to distinguish repeated traffic to a few locations from unusual traffic to multiple locations or from multiple sources.

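// Intended for spark-shell with the Mothra libraries loaded,
// where `spark` and the $"..." column syntax are already in scope.
val data_dir = ".../path/to/data"
import org.cert.netsa.mothra.datasources._
import org.cert.netsa.mothra.datasources.ipfix.IPFIXFields
import org.apache.spark.sql.functions._

// In dnsIDBenign.sc:
val data_file = s"$data_dir/light_benign.ipfix"
// In dnsIDAbuse.sc:
// val data_file =
//   s"$data_dir/light_compressed.ipfix"

// Define the columns (default flow fields plus DNS DPI fields)
// and load the IPFIX records into the data frame.
val data = {
  spark.read.fields(
    IPFIXFields.default, IPFIXFields.dpi.dns
  ).ipfix(data_file)
}

val result = {
  data
    // Keep DNS traffic that carries at least one DNS resource record.
    // The second condition and the first five selected columns were lost
    // from the published listing; they are reconstructed from the prose
    // and the field names are assumptions.
    .filter(($"silkAppLabel" === 53) &&
            (size($"dnsRecordList") > 0))
    .select(
       $"startTime",
       $"sourceIPAddress",
       $"destinationIPAddress",
       $"octetCount",
       $"packetCount",
       $"dnsRecordList.dnsRRType" as "dnsRRType",
       $"dnsRecordList.dnsQueryResponse" as "dnsQR",
       $"dnsRecordList.dnsResponseCode" as "dnsResponse",
       $"dnsRecordList.dnsName" as "dnsName")
     // Summarize per DNS name list and resource record type list.
     .groupBy($"dnsName", $"dnsRRType")
     .agg(count($"*") as "flows",
          countDistinct($"sourceIPAddress") as "#sIP",
          countDistinct($"destinationIPAddress") as "#dIP",
          sum($"octetCount") as "bytes",
          sum($"packetCount") as "packets")
 //    .filter($"packets" > 20)
     // Keep long packets (more than 70 bytes per packet on average)
     // with fewer than three names in the DNS name list.
     .filter($"bytes"/$"packets" > 70)
     .filter(size($"dnsName") < 3)
     // Assumed from the output below: sort by byte volume, largest first.
     .orderBy($"bytes".desc)
}

// Assumed from the output below: print the top 10 rows without truncation.
result.show(10, false)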

The code sample below shows the output of dnsIDExfil.sc on benign and on compressed data, the data sets used in the preceding SiLK discussion. The presence of multicast (224/8 and 239/8 CIDR blocks) and RFC1918 private addresses (192.168/16 CIDR blocks) is due to this data coming from an artificial collection environment instead of live Internet traffic capture.

Contrasting the benign output against the abuse output, we see a smaller number of lookup addresses being queried in the abuse results and a much quicker drop-off in the number of queries per host. In the benign results, there are six DNSNames that are queried repeatedly; in the abuse results, there are two. All of the queries shown are PTR (reverse, RRType=12) queries, and all are going to the same server. In the high-volume DNSName queries, the maximum average packet length is slightly larger for the abuse data than for the benign data (81 vs. 78). Taken together, these differences show a slow-and-steady release of additional data as part of the DNS data transfer, which reflects the file transfer taking place.

dnsIDBenign.sc output:
|dnsName                              |dnsRRType|flows|#sIP|#dIP|bytes |packets|
|[]          |[12]     |2835 |1   |1   |416539|5901   |
|[]       |[12]     |982  |1   |1   |242585|3125   |
|[]       |[12]     |895  |1   |1   |134756|1836   |
|[]        |[12]     |901  |1   |1   |133490|1844   |
|[]       |[12]     |757  |1   |1   |112173|1533   |
|[]         |[12]     |635  |1   |1   |91734 |1288   |
|[]         |[12]     |315  |1   |1   |45438 |640    |
|[_ipps._tcp.local., _ipp._tcp.local.]|[12, 12] |122  |32  |1   |13161 |136    |
|[]      |[12]     |74   |1   |1   |11328 |152    |
|[]       |[12]     |31   |1   |1   |4666  |64     |
only showing top 10 rows

dnsIDAbuse.sc output:
|dnsName                              |dnsRRType|flows|#sIP|#dIP|bytes |packets|
|[]          |[12]     |1260 |1   |1   |191398|2696   |
|[]         |[12]     |255  |1   |1   |130725|1615   |
|[]       |[12]     |416  |1   |1   |63606 |866    |
|[]       |[12]     |388  |1   |1   |57686 |788    |
|[]        |[12]     |379  |1   |1   |56492 |781    |
|[]       |[12]     |340  |1   |1   |50738 |694    |
|[]         |[12]     |125  |1   |1   |17750 |250    |
|[]      |[12]     |32   |1   |1   |4736  |64     |
|[_ipps._tcp.local., _ipp._tcp.local.]|[12, 12] |46   |30  |1   |4467  |51     |
|[_ipp._tcp.local., _ipps._tcp.local.]|[12, 12] |13   |9   |1   |1782  |19     |
only showing top 10 rows
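The 81 vs. 78 comparison can be checked from the bytes and packets columns of the two tables. The sketch below copies those columns from the rows shown above and treats rows with at least 1,000 packets as “high-volume”; that cutoff is an assumption made for illustration, not a value stated in the analytic.

```scala
object AvgPacketLength {
  // (bytes, packets) pairs copied from the ten rows of each output table above.
  val benign: Seq[(Long, Long)] = Seq(
    (416539L, 5901L), (242585L, 3125L), (134756L, 1836L), (133490L, 1844L),
    (112173L, 1533L), (91734L, 1288L), (45438L, 640L), (13161L, 136L),
    (11328L, 152L), (4666L, 64L))
  val abuse: Seq[(Long, Long)] = Seq(
    (191398L, 2696L), (130725L, 1615L), (63606L, 866L), (57686L, 788L),
    (56492L, 781L), (50738L, 694L), (17750L, 250L), (4736L, 64L),
    (4467L, 51L), (1782L, 19L))

  /** Largest bytes-per-packet ratio among rows with at least `minPackets`
    * packets, rounded to the nearest integer. Assumes at least one row qualifies. */
  def maxAvgLength(rows: Seq[(Long, Long)], minPackets: Long = 1000L): Long =
    math.round(
      rows.filter { case (_, packets) => packets >= minPackets }
          .map { case (bytes, packets) => bytes.toDouble / packets }
          .max)
}
```

With the assumed cutoff, `AvgPacketLength.maxAvgLength(AvgPacketLength.benign)` gives 78 and `AvgPacketLength.maxAvgLength(AvgPacketLength.abuse)` gives 81, matching the figures quoted in the discussion.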

Understanding Data Exfiltration

Whichever form of tooling is used, analysts often need an understanding of the data transfers from their network. Repetitive queries for DNS resolution should be rather rare, since caching should eliminate many of these repetitions. As repetitive queries for resolution are identified, several groups of hosts may be found:

  • Hosts that generate repetitive queries not indicative of exfiltration of data are likely to exist, characterized by very consistent query size, periodic timing, and the use of expected name servers.
  • Hosts that generate repetitive queries with unusual name servers or timing may require further investigation.
  • Hosts that generate repetitive queries with unusual name servers or query sizes should be examined carefully to identify potential exfiltration.
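The grouping above can be expressed as a simple triage rule. This is only a sketch: the three Boolean observations are hypothetical inputs that an analyst would derive from a host’s repetitive-query profile, and because unusual name servers appear in two of the groups, the sketch assumes a host is assigned to the most severe group it matches.

```scala
object HostTriage {
  sealed trait Verdict
  case object LikelyBenign extends Verdict
  case object InvestigateFurther extends Verdict
  case object ExamineForExfiltration extends Verdict

  // Hypothetical per-host observations; how each flag is derived is left to the analyst.
  final case class Observed(
    unusualNameServer: Boolean,
    unusualTiming: Boolean,
    unusualQuerySize: Boolean)

  /** Assigns the most severe matching group (an assumption of this sketch). */
  def triage(o: Observed): Verdict =
    if (o.unusualNameServer || o.unusualQuerySize) ExamineForExfiltration
    else if (o.unusualTiming) InvestigateFurther
    else LikelyBenign
}
```

For example, a host with consistent query sizes, periodic timing, and expected name servers maps to `LikelyBenign`, while a host whose only anomaly is irregular timing maps to `InvestigateFurther`.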

The impact of these hosts on network security will vary depending on the range and criticality of assets these hosts access, but some of the traffic may demand immediate response.

What Might a Security Analyst Want to Know

This post is part of a series addressing a simple question: What might a security analyst want to know at the start of each shift about the network? In each post we will discuss one answer to this question and application of a variety of tools that may implement that answer. Our goal is to provide some key observations that help analysts monitor and defend their networks, focusing on useful ongoing measures, rather than those specific to one event, incident, or issue.

We will not focus on signature-based detection, since there are a number of sources for such, including intrusion detection systems (IDS)/intrusion prevention systems (IPS) and antivirus products. The tools used in these articles will primarily be part of the CERT/NetSA Analysis Suite, but we will include other tools if helpful. Previous posts examined tools for tracking software updates and proxy bypass.

Our approach will be to highlight a given analytic, discuss the motivation behind the analytic, and provide the application as a worked example. The worked example, by intention, is illustrative rather than exhaustive. The decision of what analytics to deploy, and how, is left to the reader.

If there are specific behaviors that you wish to suggest, please send them by email to netsa-help@cert.org with “SOC Analytics Idea” in the subject line.
