Investigation of encrypted traffic
As more Internet traffic is encrypted, driven by privacy concerns and the need to protect data from third parties such as marketers, service providers and other prying eyes, behavioural profiling of network sessions will become increasingly necessary. Many products already claim to profile network activity, to varying degrees, to assist with behaviour detection. Research in this area is active and growing among vendors, law enforcement, bad guys and others.
I reviewed a report which indicated that, because the data was encrypted, it was impossible to determine anything useful. This is not always the case, but I have seen this conclusion many times in reports and investigations dealing with encrypted or unidentified data. Aside from the marketing which says that if one's Internet sessions are encrypted then one is safe (nothing could be further from the truth), many network administrators do not understand behavioural profiling or have not had much experience with it. Behavioural profiling of networks can be very complex, and research in this area is relatively new. To give some insight into how one might profile network sessions, and to show how behavioural profiling can be used to extract information, I decided to walk through a simple example and answer a simple question. Specifically, what are the differences between an encrypted network session where one is watching a program or video (the user providing no input) and an interactive network session (the user providing input)? I used the SSH protocol to illustrate.
I used video over SSH to watch a program. The program was approximately 24 minutes in duration and was hosted on a server at my ISP. There were no problems watching the program: it didn’t pause or stop, and it was just like watching a typical television program (in fact I watched it on my flat screen TV). I used a device to capture the traffic between the server hosting the program and my home for the entire duration of the program. Finally, I captured an interactive SSH session in which I was logged into a server at my ISP, doing some coding and running some shell commands.
Attempting to look at the actual data in either of these captures is useless. Since the data is encrypted, knowing what was transmitted without access to the session keys is close to impossible, if not impossible. That being stated, what behavioural characteristics can we observe to tell us what might be going on?
I separated each of the two captures by direction, which gave me four capture files: video received, video transmitted, interactive data received and interactive data transmitted.
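The split by direction can be pictured with a minimal sketch. It assumes each captured packet has already been reduced to a small record of timestamp, source, destination and length; the `Packet` record, the `split_by_direction` helper and the example addresses are all illustrative assumptions, not the tooling used for these captures.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record extracted from a capture."""
    timestamp: float  # seconds since the start of the capture
    src: str          # source address
    dst: str          # destination address
    length: int       # bytes on the wire


def split_by_direction(packets, local_addr):
    """Return (received, transmitted) packet lists relative to local_addr."""
    received = [p for p in packets if p.dst == local_addr]
    transmitted = [p for p in packets if p.src == local_addr]
    return received, transmitted


# Example with made-up addresses: one packet out, one packet in.
sample = [
    Packet(0.0, "10.0.0.2", "203.0.113.5", 100),
    Packet(0.1, "203.0.113.5", "10.0.0.2", 1400),
]
rx, tx = split_by_direction(sample, "10.0.0.2")
```

Running this once per capture gives the four data sets described above: received and transmitted for the video session, and received and transmitted for the interactive session.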
Bandwidth
| | Received | Transmitted | Ratio (transmitted / received) |
| Video | 193.2 MB | 7.0 MB | 0.036 |
| Interactive | 0.59 MB | 0.58 MB | 0.98 |
Looking at the table above, the video session received far more data than it transmitted, whereas the interactive session transmitted and received similar amounts. Analysis of most video streams, and of flows where downloading is occurring, will yield similar results: the ratio of received to transmitted data will be high (equivalently, the transmitted-to-received ratio shown in the table will be low). Interactive sessions tend to have a more balanced ratio of transmitted to received data than a video session. This of course depends on what the user is doing in the interactive session, but typically this has been the case in my experience.
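The ratio column can be reproduced from the byte totals. The sketch below is only an illustration of the arithmetic; the function name `tx_rx_ratio` is an assumption, and the inputs are the megabyte totals from the table.

```python
def tx_rx_ratio(received, transmitted):
    """Transmitted divided by received; both values in the same unit."""
    if received == 0:
        raise ValueError("no data received, ratio is undefined")
    return transmitted / received


# Totals in MB, taken from the bandwidth table.
video = tx_rx_ratio(received=193.2, transmitted=7.0)
interactive = tx_rx_ratio(received=0.59, transmitted=0.58)

print(f"video:       {video:.3f}")        # 0.036
print(f"interactive: {interactive:.2f}")  # 0.98
```

Inverting the video figure gives roughly 27.6 MB received for every MB transmitted, which is the "high" received-to-transmitted ratio typical of streaming and downloads.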
Inter-packet timing
Another interesting metric is the time difference, or delta, between two consecutive packets. When watching a video or listening to music, the delta between two packets tends to be small in comparison to an interactive session. There are a few reasons for this. Since the video is being viewed, it is important that the data arrives in a timely manner so that the video does not ‘freeze’ while being watched. Some software attempts to write the video data to disk in advance of viewing to help mitigate this problem, but that leaves an exposure where a savvy individual can obtain a copy of the video by simply making a copy of the temporary file. As a result, newer software tends to keep the data in memory and not write it to disk. The result is the need to ensure a smooth delivery of data, minimizing both the delay between packets and the variation in that delay (known as jitter).
| | Received (seconds) | | | Transmitted (seconds) | | |
| | Maximum | Mean | Std Dev. | Maximum | Mean | Std Dev. |
| Video | 3.065 | 0.021 | 0.094 | 3.051 | 0.014 | 0.076 |
| Interactive | 4028.555 | 3.568 | 88.736 | 4028.544 | 2.162 | 69.137 |
I wrote a simple Python script which takes a capture file as input, calculates the inter-packet timing for each pair of consecutive packets, and then outputs, among other information, the results you see in the table above. The Maximum field is the largest time between packets, the Mean is the average time between packets, and the standard deviation is a measure of how far the inter-packet times spread out from the mean. For those that don’t know or wish to have a refresher in standard deviation, here is a good place to start. However, most languages and spreadsheets have functions to calculate this for you if you do not wish to learn the math. In simple terms, and using our specific example, if all the packets had exactly the same time between them then the standard deviation would be 0. The greater the difference in timing between packets, the greater the standard deviation will be.
Notice that the standard deviation is much higher for the interactive session than for the video session. Sessions that stream data tend to have a low standard deviation for inter-packet timing. If you think about it this makes sense: in an interactive session you can walk away from the computer, or the program could be waiting for input from the user, so data transmission will fluctuate more.
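The three statistics in the table can be computed along these lines. This is a minimal sketch and not the original script: it assumes the packet timestamps (in seconds) have already been extracted from the capture file, and it assumes the population standard deviation (`statistics.pstdev`), since the variant used for the table is not stated. The function name `inter_packet_stats` is hypothetical.

```python
import statistics


def inter_packet_stats(timestamps):
    """Return max, mean and standard deviation of inter-packet deltas.

    timestamps: packet arrival times in seconds for one direction.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two packets to compute a delta")
    ordered = sorted(timestamps)
    # Delta between each packet and the one that follows it.
    deltas = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return {
        "max": max(deltas),
        "mean": statistics.mean(deltas),
        # Assumption: population standard deviation.
        "stdev": statistics.pstdev(deltas),
    }


# Evenly spaced packets: every delta is identical, so the deviation is 0.
steady = inter_packet_stats([0.0, 0.5, 1.0, 1.5])

# Bursty packets: a long pause inflates both the maximum and the deviation.
bursty = inter_packet_stats([0.0, 1.0, 4.0])
```

The `steady` example illustrates the point about identical gaps giving a standard deviation of 0, while `bursty` shows how a single long gap pulls the figures up, as in the interactive session.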
Bandwidth, inter-packet timing, and statistics such as standard deviation and mean are just a few things that can be used to narrow down what a particular subject’s activities might be. In corporate or law enforcement investigations, profiling network behaviour can be a useful tool to determine if you need to spend more time on the investigation or if you have the right target. Using our example above, suppose a corporation wants to determine which employees are watching streaming videos. A scan of the network data reveals an individual who has encrypted sessions, but these sessions show a transmit / receive ratio that is in line with typical interactive sessions and not video sessions. If the standard deviation of the inter-packet timing is also higher for these sessions, then you can rule that person out as an individual of interest immediately. This has the advantage of focusing your investigation, avoids encroaching on privacy unnecessarily, and saves time by allowing you to focus on the users whose network sessions have characteristics that fit the behaviour you are looking for.
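The triage described above might be sketched as a simple rule that combines the two metrics. The function name and both thresholds (`max_tx_rx_ratio`, `max_stdev_s`) are hypothetical values chosen only so that the two sessions in this article fall on opposite sides; real thresholds would need to be calibrated against known traffic.

```python
def looks_like_streaming(rx_bytes, tx_bytes, rx_delta_stdev,
                         max_tx_rx_ratio=0.1, max_stdev_s=1.0):
    """Rough first-pass check for a streaming-style session.

    rx_bytes, tx_bytes: totals for the session (same unit for both).
    rx_delta_stdev: standard deviation of received inter-packet times, seconds.
    The default thresholds are illustrative assumptions, not measured limits.
    """
    if rx_bytes == 0:
        return False
    lopsided = (tx_bytes / rx_bytes) <= max_tx_rx_ratio
    steady = rx_delta_stdev <= max_stdev_s
    return lopsided and steady


# Figures from the two tables in this article (MB and seconds).
video = looks_like_streaming(193.2, 7.0, 0.094)         # True
interactive = looks_like_streaming(0.59, 0.58, 88.736)  # False
```

A rule like this is only a filter for deciding where to look next; a session that passes it still needs proper examination before any conclusion is drawn.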
For those of you who feel comfortable because the data is ‘encrypted’, that can be a false sense of security. These are two of the many metrics and techniques that can be applied to the data. This area has active research and there are many products that will do this type of analysis in an automated fashion. For those interested in this, although older now, this is a great paper where an experiment was conducted to determine what movie people were watching even though the movie data was encrypted. The authors used behavioural data to fingerprint the movies, then applied the fingerprints to encrypted transmitted data.
Anyone in the digital forensics community will have heard 




