Exploiting Intra-Packet Dependency for Fine-Grained Protocol

Slide Note

This study focuses on inferring protocol formats from packet payloads by utilizing intra-packet dependency. It addresses challenges in network management, quality of service control, and intrusion detection systems due to the lack of protocol specifications. The design goals aim for robust, independent, and efficient protocol format inference, applicable to both byte-level and bit-level protocols.

kyma_28 Follow

Uploaded on Mar 04, 2025 | 7 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Exploiting Intra-Packet Dependency for Fine-Grained Protocol Format Inference Qun Huang1, Patrick P. C. Lee1 and Zhibin Zhang2 1The Chinese University of Hong Kong, Hong Kong 2Institute of Computing Technology, Chinese Academy of Sciences, China Networking 15 1

Motivation Application-layer protocols for network management Quality-of-service control Intrusion detection systems In-depth profiling resource usage of applications Challenge lack of protocol specifications most protocol specifications are proprietary Goal: Infer message format of a protocol from packet payloads 2

Problem Network protocol format A sequence of fields Fields have different length Each field is defined with a set of values Protocol format inference Output: protocol format Applications Field 1: offset, length, values Field 2: offset, length, values Field 3: offset, length, values Inference Input: Packet Payloads Training Set Traffic Classification 3

Design Goals Fine-grain inference Work for both byte-level and bit-level protocols Protocol independent Do not depend on any protocol specifications Robust Remain efficient even for noisy training data Simple parameterization Employ a few easy-understood parameters Fast execution 4

Related Work Common sequences in payloads CGS [Ma et al. 2006]; SANTaClass [Tongaonkar et al. 2013] Statistical features ACAS [Haffner et al. 2005]; NBC [Bonfiglio et al. 2007]; KISS [Finamore et al. 2010]; ProWord [Zhang et al. 2013] Limitation: not efficient for bit-level protocols Other features Printable characters Discoverer[Cui et al. 07] N-grams ProDecoder [Dimitropoulos et al. 12] Program execution log Polyglot [Caballero et al. 07]; AutoFormat [Lin et al. 07]; dynamic taint analysis [Wondracek et al. 07]; Netzob [Bossert et al. 14] Limitation: generality or accuracy 5

Our Work ProGraph: infer message format of network protocols with all the design goals Intra-packet dependency as payload features Graphical model to characterize intra-packet dependency A systematic approach to construct graphical models Graph theory techniques Information theory techniques Experiments on real-world traces 6

ProGraph Payload Feature Intra-packet dependency as payload features: Values in a field determines values in other fields Example DNS (first 32 bits) Offset 0 Session ID 16 17 21 22 23 24 25 26 27 28 Q R A A T C R D R A A D C D Opcode Z Response code Query (0) / Response (1) Whether response supports recursion bit 16 is 1 -> bit 24 is 0 or 1 bit 16 is 0 -> bit 24 is 0 7

Graphical Model Characterize intra-packet dependency of a protocol Nodes: adjacent bits or bytes in the payload Directed edges: dependency across nodes Legitimate values with nodes Legitimate pairs with edges ( bits 18:19:20 -> bits 29:30:30 ) : 0 -> 0, 2, 3, 5 4 -> 0, 5 5 -> 0, 5 Bit 16 {0, 1} Bits 18:19:20 {0, 4, 5} Bit 23 {0, 1} Bit 21 {0, 1} Bit 24 {0, 1} Bits 29:30:31 {0, 2, 3, 5} Bit 28 {0} Bit 17 {0} Bit 22 {0, 1} Bit 25 {0} Bit 26 {0} Bit 27 {0, 1} 8

Model Construction Input Training set: a collection of packet payloads of target protocols L: Length of payload considered (16 bytes in this work) B: Initial length for each node in bits 1 (bit-level inference) or 8 (byte-level inference) Output A graphical model Initialization Initial graph with L/B nodes and no edges Training set implies empirical probability distributions For every node, draw the distribution on observed values For every node pair, draw the joint distribution on observed value pairs 9

Model Update Initial node/pair distributions Initial graph topology Each node represents one bit or one byte No edges Iteratively update the graphical model with the distributions Refine training set Training Set (Input) Node/Pair Distributions Updated Distributions Add/delete Add/delete Edges Edges Merge Merge Nodes Nodes Assign Assign Values/Pairs Values/Pairs Filter Filter Noise Noise Graphical Model Updated Topology Graph Topology Output 10

Add/Delete Edges Add/delete Add/delete Edges Edges Merge Merge Nodes Nodes Assign Assign Values/Pairs Values/Pairs Filter Filter Noise Noise Problem: quantify dependency between two nodes Strong dependency: add edges Weak dependency: remove edges Our approach: Information gain ratio Higher information gain ratio -> stronger dependency Edge direction From nodes with less values to nodes with more values From nodes with lower offset to nodes with higher offset 11

Example: Add/Delete Edges DNS (bits 16 to 31) 16 17 21 22 23 24 25 26 27 28 Q R A A T C R D R A A D C D Opcode Z Response code Initial Graph Add/Delete Edges Bit 17 Bit 18 Bit 19 Bit 20 Bit 21 Bit 22 Bit 23 Bit 16 Bit 25 Bit 26 Bit 27 Bit 28 Bit 29 Bit 30 Bit 31 Bit 24 12

Merge Nodes Add/delete Add/delete Edges Edges Merge Merge Nodes Nodes Assign Assign Values/Pairs Values/Pairs Filter Filter Noise Noise Nodes are merged to form fields Observations - nodes of the same field have: (1) high dependency with each other (2) high dependency with similar other nodes Our approach: graph analysis Observation (1) -> clustering analysis Observation (2) -> neighboring analysis 13

Clustering Analysis Add/delete Add/delete Edges Edges Merge Merge Nodes Nodes Assign Assign Values/Pairs Values/Pairs Filter Filter Noise Noise Goal: partition nodes into clusters consists of adjacent nodes forms a dense sub-graph Definition: density of a cluster Sum of edge dependency / number of nodes in the cluster Formulation: maximize sum of densities of clusters Solve with dynamic programming 14

Example: Clustering Analysis DNS (bits 16 to 31) 16 17 21 22 23 24 25 26 27 28 Q R A A T C R D R A A D C D Opcode Z Response code Initial Graph Add/Delete Edges Clustering Analysis Bit 17 Bit 18 Bit 19 Bit 20 Bit 21 Bit 22 Bit 23 Bit 16 Bit 25 Bit 26 Bit 27 Bit 28 Bit 29 Bit 30 Bit 31 Bit 24 15

Neighboring Analysis Add/delete Add/delete Edges Edges Merge Merge Nodes Nodes Assign Assign Values/Pairs Values/Pairs Filter Filter Noise Noise Neighbors of a node u Incoming cluster: cluster with incoming node of u Outgoing cluster: cluster with outgoing node of u Two nodes have similar neighbors: Their incoming and outgoing clusters differ at most one element We merge nodes, if They are in the same cluster, and They have similar neighbors 16

Example: Merge Nodes DNS (bits 16 to 31) 16 17 21 22 23 24 25 26 27 28 Q R A A T C R D R A A D C D Opcode Z Response code Initial Graph Add/Delete Edges Clustering Analysis Neighboring Analysis Merge Bit 17 Bit 18 Bit 19 Bit 20 Bit 21 Bit 22 Bit 23 Bit 16 Bits 18:19:20 Merge Bit 25 Bit 26 Bit 27 Bit 28 Bit 29 Bit 30 Bit 31 Bit 24 Bits 30:31 17

Assign Values/Pairs Add/delete Add/delete Edges Edges Merge Merge Nodes Nodes Assign Assign Values/Pairs Values/Pairs Filter Filter Noise Noise Challenge: Observed values from training set may include noisy values Observation Values introduced by noise have low probability Our approach Discard values with probability lower than a threshold Thresholds are adaptively selected with minimum description length technique [Agrawal et al. 98] 18

Filter Noise Add/delete Add/delete Edges Edges Merge Merge Nodes Nodes Assign Assign Values/Pairs Values/Pairs Filter Filter Noise Noise Eliminate noise packet payloads Examine each payload in training set A payload is a noise, if it contains non-legitimate value for some node, or non-legitimate value pair for some edge 19

Application: Traffic Classification Distinguish protocol packets from non-protocol packets For a payload, travel each node and its outgoing edges Examine whether node values and edge pairs are legitimate Classify protocol packets based on functionalities A specific field indicate the functionalities of packets Determine values of other fields Source node: zero indegree, non-zero outdegree Equivalent Bit 16 {0, 1} Bits 18:19:20 {0, 4, 5} Source node 0 -> 0, 2, 3, 5 4 -> 0, 5 5 -> 0, 5 Bit 23 {0, 1} Bit 21 {0, 1} Bit 24 {0, 1} Bits 29:30:31 {0, 2, 3, 5} Bit 28 {0} Bit 17 {0} Bit 22 {0, 1} Bit 25 {0} Bit 26 {0} Bit 27 {0, 1} 20

Experimental Results Validate ProGraph by evaluating its efficiency in traffic classification Two real-world datasets from two cities mainland China >4 billion packets, 2TB traffic in total Protocols HTTP (byte-level), BitTorrent (byte-level), DNS (bit-level) Uniformly sample 1% as training set The remaining form validation set Metrics Accuracy: false negative rate andfalse positive rate Throughput: model construction and traffic classification 21

Accuracy of ProGraph Compare with longest common subsequence (LCS) approaches False negative rate False positive rate LCS approaches: high false positive rate (LCS-baseline) or high false negative rate (LCS-refined) ProGraph achieves much lower false negative rate and false positive rate for all the three protocols 22

Robust to Noisy Packets Uniformly inject non-protocol packets as noisy packets into training set False negative rate False positive rate ProGraph remains low error rate (<6%) in the face of noisy training set 23

Throughput Model Construction Throughput Traffic Classification Throughput Throughput is stable in the face of noise DNS is much slower than HTTP and BitTorrent Faster than existing approaches (ProDecoder [Wang et al. 12], ProWord [Zhang et al. 14]) 24

Bit-Level VS Byte-Level False negative rate (results for HTTP are overlapped) False positive rate Byte-level inference for bit-level protocols have high error rate Bit-level inference for byte-level protocols remain accurate 25

Case Study: WeChat WeChat: one of the most popular mobile messaging application Training set: capture in some controlled experiments Include both TCP and UDP traffic, we focus on UDP traffic here 4,077 packets, some of which may be noise Model Validate with recorded operations in the controlled experiments Apply to real-world datasets Results: 1.36 M packets for 810 MB traffic volume Byte 1 indicates the functionalities (source node) full-duplex VoIP chats half-duplex walkie-talkie chats heartbeats 26 More results can be found in our another work (IWQoS 15)

Conclusions Propose ProGraph to infer message formats of network protocols Exploit intra-packet dependency as payload features Propose a graphical model to characterize intra-packet dependency Propose a systematic approach to construct model from training set Fine-grained: works for both byte-level and bit-level protocols Protocol independence Simple parameters Robust to noise Fast execution Application: traffic classification Distinguish protocol packets Classify based on functionalities Case Study: WeChat 27

Exploiting Intra-Packet Dependency for Fine-Grained Protocol

Download Presentation

Presentation Transcript

Related

More Related Content