
Archive for the ‘Behavioural Profiling - Network’ Category

Tor and plausible deniability

February 18th, 2010 Clear2Go

Once again I have been experimenting with the Tor network, and in doing so I have set up some Tor nodes. I have received a few notifications that my computer ‘may be infected’, and for a brief period Google required me to enter a CAPTCHA to confirm I am human. These are expected minor nuisances when running a Tor exit node. My main reason for setting up Tor this time is to better understand what happens to behavioural and static detection when a Tor exit node is present.

If you want privacy or anonymity on the Internet, there are many things you can do: proxies, Tor, encrypted tunnels, compromised systems and many other techniques are available. None of these guarantees anonymity or privacy, but each helps, and the more you do the better. There are caveats, of course. In several cases while consulting, I have come across clients who thought they were anonymous but in fact were not as anonymous as they believed. When you are trying to remain anonymous, monitoring techniques and system checks really help.

I’ve realized that running a Tor exit node, but not using it yourself, gives you a measure of anonymity. I’ve always known this intuitively, but I’ve realized that it is even better than I thought. Say you are an evil person doing something evil on the Internet. Suppose your service provider were tracking your activities under a law enforcement warrant, or laws required all service providers to record and retain their customers’ Internet activity for a period of time. The provider would then be recording the traffic of every connection that selected your Tor node as its exit node, not just yours.

If they accused you of illegal activity, you could plausibly say, ‘That was not me; it must have been someone using my Tor node.’ While this would not guarantee that the criminal escapes, it would significantly increase the cost of the investigation: more investigation time, and more forensics to prove that the suspect is the criminal. Add anti-forensics on the terminals and systems used for the crime and the cost of the investigation rises again, forcing investigators to assess whether it is worth the time, money, and resources required.

If countries deploy retention laws like those described above, it will only be a matter of time before they have to outlaw services such as Tor to make those laws effective at catching serious criminals. From the Tor network’s perspective, such laws might help increase the number of Tor nodes on the Internet, which is a good thing for Tor.

I wonder whether lawmakers consider these questions when proposing such laws?

Identifying the anonymous in today’s digital world

January 28th, 2010 Clear2Go

http://www.flickr.com/photos/solarider/2255744829/

A few years ago, I was having a discussion with an acquaintance who was involved in an investigation. One individual they were tracking kept changing his mobile phone every few days. Each new mobile was typically pay-as-you-go or stolen, and the personal information connected to it was either false or unavailable. Yet each time he switched, the investigators were able to determine his new number very quickly. How they did this impressed me at the time, and I use the logic to this day.

Over the course of the investigation, they were able to determine whom this individual contacted. A few of those contacts did not routinely change their mobile numbers. By watching the calling patterns of these unchanging numbers, the investigators could quickly spot a new number that suddenly started calling each of the static numbers in a similar pattern. This of course requires access to mobile network data, but it worked. Even though this individual thought he was not being tracked, his efforts to remain anonymous were, unknown to him, ineffective. As a side note, there is software that will search for and detect these types of calling patterns automatically. The same logic can easily be applied to an Internet connection.
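The calling-pattern idea can be sketched in a few lines of Python. This is a minimal illustration, not the investigators’ actual method or software. It assumes a hypothetical call graph that maps each number to the set of numbers it called. Each candidate is scored by Jaccard similarity (my own choice of metric) between its calls to the ‘static’ numbers and the old number’s calls to them. The 0.5 threshold and all the numbers are made up.

```python
from typing import Dict, List, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def likely_replacements(
    old_number: str,
    call_graph: Dict[str, Set[str]],
    static_numbers: Set[str],
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[float, str]]:
    """Rank numbers whose calls to the static numbers resemble old_number's calls."""
    # Only contacts that never change their numbers are useful as anchors.
    fingerprint = call_graph.get(old_number, set()) & static_numbers
    candidates = []
    for number, contacts in call_graph.items():
        if number == old_number:
            continue
        score = jaccard(fingerprint, contacts & static_numbers)
        if score >= threshold:
            candidates.append((score, number))
    # Highest similarity first.
    return sorted(candidates, reverse=True)
```

For example, if the old number called static numbers A, B and C, a new number that also calls exactly A, B and C scores 1.0. A number that only shares A scores about 0.33 and is dropped.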

A more common example: if you are ever pulled over by a police officer and you don’t have your license with you, the officer will not only ticket you for not carrying it but will most likely ask for your full name and birth date. The birth date helps assure the officer that when they go back to the cruiser to search on their laptop, the records they obtain are actually yours and not those of someone else with the same name. How many people named Michael Dundas are there in Canada? I am not sure, but the number of them with the exact same birth date is far smaller, which greatly lowers the probability of a false positive. The same logic can be applied to social networking, and there is interesting research in this area, including research on Twitter.

The EFF recently published a post on information theory and privacy. In it, they discuss the concept of entropy and how it applies to information and privacy. The post touches on some of the math, but if you are interested, it is a good explanation of why you may not be anonymous even when you think you are and have taken precautions. Even if you skip the math, their example of how the ‘User-Agent’ header transmitted by your browser can narrow you down to one of 1500 people gives newcomers to information theory and anonymity a good perspective.
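The arithmetic behind the EFF’s framing, and behind the name-plus-birth-date example, fits in a short sketch. Observing a fact shared by a fraction p of people reveals −log2(p) bits, and each fact shrinks the pool of candidates. The helper names are my own. The multiplication assumes the facts are statistically independent, which real attributes often are not.

```python
import math

def surprisal_bits(probability: float) -> float:
    """Bits of identifying information revealed by a fact shared by this fraction of people."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

def remaining_candidates(population: int, *probabilities: float) -> float:
    """Expected number of people still matching after observing several facts.

    Assumes the facts are independent (a simplification).
    """
    remaining = float(population)
    for p in probabilities:
        remaining *= p
    return remaining

# A browser User-Agent shared by roughly 1 in 1500 people (the EFF example).
print(round(surprisal_bits(1 / 1500), 2))  # prints 10.55
```

By the same measure, a birth date (roughly 1 in 365) adds about 8.5 bits. That is why a name plus a birth date narrows a records search so effectively.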

A simple and common network attack

December 17th, 2009 Clear2Go

In working with large companies such as service providers and financial and manufacturing institutions, I have come across many common and simple attacks. I will discuss one that I came across recently while planning a project. It is not a new attack: I and most other security professionals have encountered it many times, and it has been around for years. What amazes me is that, however simple, common, and old the attack is, I usually find it going undetected on most networks.

Before walking through the attack, let me describe the steps I used to handle it. There are many papers, books, courses and posts by security professionals on how to effectively detect and respond to attacks, covering the proper methodology, decision points and other variables. These methods vary in application, complexity and point of view. For example, the steps taken by a first responder will differ from those taken by a security architect designing a system. For the purposes of this post, I’ve chosen a simple set of steps:

  • Detection
  • Investigation
  • Scope
  • Assessment
  • Mitigation

Detection

I was working on a particular server and router. I was planning a side project and wanted to check the router and server configurations to ensure they would support it. While checking the server, I issued a command (netstat) to list the current connections to the server.
[Image: sanitized netstat output]

What immediately jumped out at me was the SSH connection highlighted above in red. Although SSH is permitted to this system, only three people have access, and all of them connect from the same ISP. This connection did not come from the ISP’s netblocks. Someone could have been travelling and accessed the system remotely, but I was confident that no one with access was in China (where the IP is registered). Besides the source address, the source port (‘36948’ in the output) kept changing every few seconds, indicating that new connections were being spawned.
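The check I did by eye can be scripted. Below is a minimal sketch, assuming the netstat output has already been parsed into (local port, remote address) pairs. The allowed netblocks are hypothetical documentation ranges standing in for the ISP’s real ones.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Hypothetical stand-ins for the ISP's netblocks (RFC 5737 documentation ranges).
ALLOWED_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def unexpected_ssh_peers(
    connections: Iterable[Tuple[int, str]],
    allowed=ALLOWED_NETBLOCKS,
    ssh_port: int = 22,
) -> List[str]:
    """Return remote addresses connected to the local SSH port from outside `allowed`.

    `connections` holds (local_port, remote_ip) pairs, e.g. parsed from netstat output.
    """
    suspicious = set()
    for local_port, remote_ip in connections:
        if local_port != ssh_port:
            continue  # only SSH sessions matter here
        addr = ipaddress.ip_address(remote_ip)
        if not any(addr in net for net in allowed):
            suspicious.add(remote_ip)
    return sorted(suspicious)
```

A set is used so that an attacker opening many connections, as here, is reported only once.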

Investigation

After observing the constant connection attempts, a quick look at the server logs and some basic filtering revealed the following:

Nov 16 00:45:05 serverA sshd[5423]: Invalid user admin from 218.108.234.208
Nov 16 00:45:05 serverA sshd[5424]: input_userauth_request: invalid user admin
Nov 16 00:45:06 serverA sshd[5423]: Failed password for invalid user admin from 218.108.234.208 port 36910 ssh2
Nov 16 00:45:10 serverA sshd[5425]: Invalid user test from 218.108.234.208
Nov 16 00:45:10 serverA sshd[5426]: input_userauth_request: invalid user test
Nov 16 00:45:11 serverA sshd[5425]: Failed password for invalid user test from 218.108.234.208 port 38556 ssh2
Nov 16 00:45:14 serverA sshd[5427]: Invalid user guest from 218.108.234.208
Nov 16 00:45:14 serverA sshd[5428]: input_userauth_request: invalid user guest
Nov 16 00:45:16 serverA sshd[5427]: Failed password for invalid user guest from 218.108.234.208 port 40196 ssh2
Nov 16 00:45:19 serverA sshd[5429]: Invalid user webmaster from 218.108.234.208
Nov 16 00:45:19 serverA sshd[5430]: input_userauth_request: invalid user webmaster
Nov 16 00:45:22 serverA sshd[5429]: Failed password for invalid user webmaster from 218.108.234.208 port 41776 ssh2
Nov 16 00:45:31 serverA sshd[5434]: Invalid user oracle from 218.108.234.208
Nov 16 00:45:31 serverA sshd[5435]: input_userauth_request: invalid user oracle
Nov 16 00:45:33 serverA sshd[5434]: Failed password for invalid user oracle from 218.108.234.208 port 45829 ssh2
Nov 16 00:45:36 serverA sshd[5436]: Invalid user library from 218.108.234.208
Nov 16 00:45:36 serverA sshd[5437]: input_userauth_request: invalid user library
Nov 16 00:45:38 serverA sshd[5436]: Failed password for invalid user library from 218.108.234.208 port 47647 ssh2
Nov 16 00:45:41 serverA sshd[5438]: Invalid user info from 218.108.234.208
Nov 16 00:45:41 serverA sshd[5439]: input_userauth_request: invalid user info
Nov 16 00:45:43 serverA sshd[5438]: Failed password for invalid user info from 218.108.234.208 port 49440 ssh2
Nov 16 00:45:46 serverA sshd[5440]: Invalid user shell from 218.108.234.208
Nov 16 00:45:46 serverA sshd[5441]: input_userauth_request: invalid user shell
Nov 16 00:45:48 serverA sshd[5440]: Failed password for invalid user shell from 218.108.234.208 port 51218 ssh2
Nov 16 00:45:51 serverA sshd[5442]: Invalid user linux from 218.108.234.208
Nov 16 00:45:51 serverA sshd[5443]: input_userauth_request: invalid user linux
Nov 16 00:45:53 serverA sshd[5442]: Failed password for invalid user linux from 218.108.234.208 port 52953 ssh2
Nov 16 00:45:56 serverA sshd[5444]: Invalid user unix from 218.108.234.208
Nov 16 00:45:56 serverA sshd[5445]: input_userauth_request: invalid user unix
Nov 16 00:45:59 serverA sshd[5444]: Failed password for invalid user unix from 218.108.234.208 port 54704 ssh2
Nov 16 00:46:02 serverA sshd[5446]: Invalid user webadmin from 218.108.234.208
Nov 16 00:46:02 serverA sshd[5447]: input_userauth_request: invalid user webadmin
Nov 16 00:46:04 serverA sshd[5446]: Failed password for invalid user webadmin from 218.108.234.208 port 56994 ssh2
Nov 16 00:46:13 serverA sshd[5451]: Invalid user test from 218.108.234.208
Nov 16 00:46:13 serverA sshd[5452]: input_userauth_request: invalid user test
Nov 16 00:46:16 serverA sshd[5451]: Failed password for invalid user test from 218.108.234.208 port 60988 ssh2
Nov 16 00:46:24 serverA sshd[5456]: Invalid user admin from 218.108.234.208
Nov 16 00:46:24 serverA sshd[5457]: input_userauth_request: invalid user admin
Nov 16 00:46:27 serverA sshd[5456]: Failed password for invalid user admin from 218.108.234.208 port 36482 ssh2
Nov 16 00:46:30 serverA sshd[5458]: Invalid user guest from 218.108.234.208
Nov 16 00:46:30 serverA sshd[5459]: input_userauth_request: invalid user guest
Nov 16 00:46:32 serverA sshd[5458]: Failed password for invalid user guest from 218.108.234.208 port 38285 ssh2
Nov 16 00:46:35 serverA sshd[5460]: Invalid user master from 218.108.234.208
Nov 16 00:46:35 serverA sshd[5461]: input_userauth_request: invalid user master
Nov 16 00:46:37 serverA sshd[5460]: Failed password for invalid user master from 218.108.234.208 port 39898 ssh2
Nov 16 00:47:20 serverA sshd[5489]: Invalid user admin from 218.108.234.208
Nov 16 00:47:20 serverA sshd[5490]: input_userauth_request: invalid user admin
Nov 16 00:47:23 serverA sshd[5489]: Failed password for invalid user admin from 218.108.234.208 port 54777 ssh2
Nov 16 00:47:26 serverA sshd[5491]: Invalid user admin from 218.108.234.208
Nov 16 00:47:26 serverA sshd[5492]: input_userauth_request: invalid user admin
Nov 16 00:47:28 serverA sshd[5491]: Failed password for invalid user admin from 218.108.234.208 port 56536 ssh2
Nov 16 00:47:31 serverA sshd[5493]: Invalid user admin from 218.108.234.208
Nov 16 00:47:31 serverA sshd[5494]: input_userauth_request: invalid user admin
Nov 16 00:47:33 serverA sshd[5493]: Failed password for invalid user admin from 218.108.234.208 port 58262 ssh2
Nov 16 00:47:36 serverA sshd[5495]: Invalid user admin from 218.108.234.208
Nov 16 00:47:36 serverA sshd[5496]: input_userauth_request: invalid user admin
Nov 16 00:47:38 serverA sshd[5495]: Failed password for invalid user admin from 218.108.234.208 port 60006 ssh2
Nov 16 00:47:52 serverA sshd[5503]: Invalid user test from 218.108.234.208
Nov 16 00:47:52 serverA sshd[5504]: input_userauth_request: invalid user test
Nov 16 00:47:54 serverA sshd[5503]: Failed password for invalid user test from 218.108.234.208 port 36914 ssh2
Nov 16 00:47:57 serverA sshd[5505]: Invalid user test from 218.108.234.208
Nov 16 00:47:57 serverA sshd[5506]: input_userauth_request: invalid user test
Nov 16 00:47:59 serverA sshd[5505]: Failed password for invalid user test from 218.108.234.208 port 38498 ssh2
Nov 16 00:48:04 serverA sshd[5507]: Invalid user webmaster from 218.108.234.208
Nov 16 00:48:04 serverA sshd[5508]: input_userauth_request: invalid user webmaster
Nov 16 00:48:06 serverA sshd[5507]: Failed password for invalid user webmaster from 218.108.234.208 port 40506 ssh2
Nov 16 00:48:09 serverA sshd[5509]: Invalid user user from 218.108.234.208
Nov 16 00:48:09 serverA sshd[5510]: input_userauth_request: invalid user user
Nov 16 00:48:11 serverA sshd[5509]: Failed password for invalid user user from 218.108.234.208 port 42147 ssh2
Nov 16 00:48:14 serverA sshd[5511]: Invalid user username from 218.108.234.208
Nov 16 00:48:14 serverA sshd[5512]: input_userauth_request: invalid user username
Nov 16 00:48:16 serverA sshd[5511]: Failed password for invalid user username from 218.108.234.208 port 43771 ssh2
Nov 16 00:48:19 serverA sshd[5513]: Invalid user username from 218.108.234.208
Nov 16 00:48:19 serverA sshd[5514]: input_userauth_request: invalid user username
Nov 16 00:48:21 serverA sshd[5513]: Failed password for invalid user username from 218.108.234.208 port 45636 ssh2
Nov 16 00:48:24 serverA sshd[5515]: Invalid user user from 218.108.234.208
Nov 16 00:48:24 serverA sshd[5516]: input_userauth_request: invalid user user
Nov 16 00:48:26 serverA sshd[5515]: Failed password for invalid user user from 218.108.234.208 port 47217 ssh2
Nov 16 00:48:35 serverA sshd[5520]: Invalid user admin from 218.108.234.208
Nov 16 00:48:35 serverA sshd[5521]: input_userauth_request: invalid user admin
Nov 16 00:48:37 serverA sshd[5520]: Failed password for invalid user admin from 218.108.234.208 port 50752 ssh2
Nov 16 00:48:40 serverA sshd[5522]: Invalid user test from 218.108.234.208
Nov 16 00:48:40 serverA sshd[5523]: input_userauth_request: invalid user test
Nov 16 00:48:42 serverA sshd[5522]: Failed password for invalid user test from 218.108.234.208 port 52460 ssh2
Nov 16 00:49:05 serverA sshd[5536]: Invalid user danny from 218.108.234.208
Nov 16 00:49:05 serverA sshd[5537]: input_userauth_request: invalid user danny
Nov 16 00:49:07 serverA sshd[5536]: Failed password for invalid user danny from 218.108.234.208 port 32852 ssh2
Nov 16 00:49:10 serverA sshd[5538]: Invalid user sharon from 218.108.234.208
Nov 16 00:49:10 serverA sshd[5539]: input_userauth_request: invalid user sharon
Nov 16 00:49:12 serverA sshd[5538]: Failed password for invalid user sharon from 218.108.234.208 port 34547 ssh2
Nov 16 00:49:15 serverA sshd[5540]: Invalid user aron from 218.108.234.208
Nov 16 00:49:15 serverA sshd[5541]: input_userauth_request: invalid user aron
Nov 16 00:49:17 serverA sshd[5540]: Failed password for invalid user aron from 218.108.234.208 port 36174 ssh2
Nov 16 00:49:20 serverA sshd[5542]: Invalid user alex from 218.108.234.208
Nov 16 00:49:20 serverA sshd[5543]: input_userauth_request: invalid user alex
Nov 16 00:49:22 serverA sshd[5542]: Failed password for invalid user alex from 218.108.234.208 port 37737 ssh2
Nov 16 00:49:25 serverA sshd[5544]: Invalid user brett from 218.108.234.208
Nov 16 00:49:25 serverA sshd[5545]: input_userauth_request: invalid user brett
Nov 16 00:49:27 serverA sshd[5544]: Failed password for invalid user brett from 218.108.234.208 port 39340 ssh2
...............

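From the server logs, we can determine:

  • The attack started at 00:45.
  • It is a dictionary attack: the attacker is sequencing through personal names as well as common Unix account IDs.
  • In the excerpt above, each password attempt fails 1 to 3 seconds after the username is rejected, and a new ID is tried roughly every 5 seconds.
  • The source port changes with every connection, climbing steadily and wrapping around (for example from 60988 back to 36482). This is enough to fool basic firewall and IPS technologies.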
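The observations above can be pulled out of the logs automatically. Below is a minimal sketch that parses the ‘Invalid user’ lines in the format shown. Syslog timestamps omit the year, so the default of 2009 is an assumption. The optional ‘port N’ suffix is accepted in case a newer sshd appends it.

```python
import re
import statistics
from collections import Counter
from datetime import datetime

# Matches sshd lines such as:
#   Nov 16 00:45:05 serverA sshd[5423]: Invalid user admin from 218.108.234.208
INVALID_USER = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ sshd\[\d+\]: "
    r"Invalid user (?P<user>\S+) from (?P<ip>\S+)(?: port \d+)?$"
)

def summarize_invalid_users(lines, year=2009):
    """Summarize 'Invalid user' attempts; the year is supplied because syslog omits it."""
    attempts = []
    for line in lines:
        m = INVALID_USER.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            attempts.append((ts, m["user"], m["ip"]))
    if not attempts:
        return None
    times = [t for t, _, _ in attempts]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "first_seen": times[0],
        "sources": Counter(ip for _, _, ip in attempts),
        "usernames": [user for _, user, _ in attempts],
        "median_gap_seconds": statistics.median(gaps) if gaps else None,
    }
```

The median gap is used instead of the mean so that occasional pauses in the attack (like the 43-second gap after 00:46:37) do not distort the rate.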

Scope

What other systems on the network, if any, are under attack? To determine this quickly, I logged onto an aggregation point and captured traffic corresponding to the attack in progress for a few minutes. Next, I ran a command to filter the captured data and show which servers were being attacked.

$ tcpdump -n -r ./sshBfAttack-ispView.cap "src net 218.108.234.0/24 and tcp[tcpflags] & (tcp-syn) != 0" | awk '{print $5}' | awk -F. '{print $1"."$2"."$3"."$4}' | sort -u
reading from file ./sshBfAttack-ispView.cap, link-type EN10MB (Ethernet)
xxx.x0.0.25
xxx.x0.0.4
xxx.x0.0.43
xxx.x0.12.100
xxx.x0.12.101
xxx.x0.12.103
xxx.x0.12.136
xxx.x0.12.142
xxx.x0.12.20
xxx.x0.12.29
$

We now have a list of current targets. The filter above is simple and makes some basic assumptions; for example, it only counts SYN-flagged packets coming from the attacker’s /24. Several filters were run on the traffic to confirm the scope of the attack, but for the purposes of this post the concept is what matters. The type and parameters of the filters you use will depend on the type of attack, its direction and other factors.
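The same idea can be expressed in Python over tcpdump’s text output. This is a rough sketch, not a replacement for the command above. It assumes the ‘IP src.port > dst.port: Flags [S]’ layout of `tcpdump -n` output, which differs between tcpdump versions. Unlike the awk pipeline, it keeps only pure SYNs (dropping SYN-ACKs, shown as ‘S.’) and sorts targets numerically instead of as text.

```python
import ipaddress
import re

# Assumed tcpdump -n text format, e.g.
#   00:45:05.123456 IP 203.0.113.7.36910 > 192.0.2.25.22: Flags [S], seq 1, win 5840, length 0
PACKET = re.compile(
    r" IP (?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > (?P<dst>\d+\.\d+\.\d+\.\d+)\.\d+: "
    r"Flags \[(?P<flags>[^\]]*)\]"
)

def syn_targets(lines, attacker_net):
    """Unique destinations of pure SYN packets sent from attacker_net, sorted numerically."""
    net = ipaddress.ip_network(attacker_net)
    targets = set()
    for line in lines:
        m = PACKET.search(line)
        if not m or m["flags"] != "S":  # "S." would be a SYN-ACK
            continue
        if ipaddress.ip_address(m["src"]) in net:
            targets.add(ipaddress.ip_address(m["dst"]))
    return [str(ip) for ip in sorted(targets)]
```

Numeric sorting puts xxx.x0.0.4 before xxx.x0.0.25, which is easier to read than the lexicographic order `sort -u` produces.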

Assessment / mitigation

What most people fear when assessing an attack is that their response will cause false positives: an action that causes a valid request to be denied, for example. For a company such as an Internet service provider, a financial institution or any business that makes money using the Internet, this could be detrimental. How a company mitigates or handles an attack depends on many factors. The type of attack, its behaviour, the risk of stopping it and the risk of letting it proceed are just some of the questions that need to be asked and answered.

For this specific attack:

  • The servers being attacked contained no financial or personal data that was at risk to anyone.
  • One of the servers controls some password authentication features
  • The attack is external and coming from a specific IP address.
  • The service under attack is really not required for external access.

The solution was to deploy an access control list on the routers that denies connections to that service from external sources. This effectively mitigated the attack.

Conclusion and thoughts

What amazes me is that dictionary attacks like this one, against whatever service, are very common. Every step I have outlined here can and should be automated, yet in so many cases it is not. I know many organizations that have spent thousands of dollars on projects, vendor equipment, security audits, and consultants, yet when you look at their network, this simple, well-known attack is still present and goes undetected.
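As one illustration of automating the detection step, here is a minimal sliding-window sketch. It flags a source address once it reaches a threshold of failed logins within a time window. The threshold and window values are hypothetical. Timestamps are assumed to be seconds, fed in chronological order (for example, from parsed sshd log lines).

```python
from collections import defaultdict, deque

class FailedLoginDetector:
    """Flag a source once it reaches `threshold` failures within `window_seconds`."""

    def __init__(self, threshold: int = 10, window_seconds: float = 60.0):
        self.threshold = threshold          # hypothetical default
        self.window_seconds = window_seconds  # hypothetical default
        self._events = defaultdict(deque)   # source IP -> recent failure timestamps
        self.flagged = set()

    def observe(self, source_ip: str, timestamp: float) -> bool:
        """Record one failure; return True only the first time the source is flagged."""
        events = self._events[source_ip]
        events.append(timestamp)
        # Drop failures that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.threshold and source_ip not in self.flagged:
            self.flagged.add(source_ip)
            return True
        return False
```

Returning True only once per source makes it easy to hook the detector to a single alert or ticket rather than one per failed password.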

Has your company spent time and money on security solutions such as audits, penetration tests, and security products? If you looked at your network, or asked your security folks whether the attack described here would be automatically detected, reported, investigated and mitigated if it were present on your network, would the answer be ‘yes’? If not, why not?

Nov 15 10:38:00 flashpoint sshd[2924]: Invalid user webmaster from 200.87.171.78
Nov 15 10:38:00 flashpoint sshd[2925]: input_userauth_request: invalid user webmaster
Nov 15 10:38:02 flashpoint sshd[2924]: Failed password for invalid user webmaster from 200.87.171.78 port 53724 ssh2
Nov 15 10:38:18 flashpoint sshd[2933]: Invalid user sales from 200.87.171.78
Nov 15 10:38:18 flashpoint sshd[2934]: input_userauth_request: invalid user sales
Nov 15 10:38:20 flashpoint sshd[2933]: Failed password for invalid user sales from 200.87.171.78 port 54139 ssh2
Nov 15 10:38:24 flashpoint sshd[2935]: Invalid user admin from 200.87.171.78
Nov 15 10:38:24 flashpoint sshd[2936]: input_userauth_request: invalid user admin
Nov 15 10:38:26 flashpoint sshd[2935]: Failed password for invalid user admin from 200.87.171.78 port 54247 ssh2
Nov 15 10:38:30 flashpoint sshd[2937]: Invalid user andrea from 200.87.171.78
Nov 15 10:38:30 flashpoint sshd[2938]: input_userauth_request: invalid user andrea
Nov 15 10:38:32 flashpoint sshd[2937]: Failed password for invalid user andrea from 200.87.171.78 port 54347 ssh2
Nov 15 10:38:40 flashpoint sshd[2939]: Invalid user backup from 200.87.171.78
Nov 15 10:38:40 flashpoint sshd[2940]: input_userauth_request: invalid user backup
Nov 15 10:38:41 flashpoint sshd[2939]: Failed password for invalid user backup from 200.87.171.78 port 54462 ssh2
Nov 15 10:38:45 flashpoint sshd[2941]: Invalid user guest from 200.87.171.78
Nov 15 10:38:45 flashpoint sshd[2942]: input_userauth_request: invalid user guest
Nov 15 10:38:47 flashpoint sshd[2941]: Failed password for invalid user guest from 200.87.171.78 port 54613 ssh2
Nov 15 10:38:51 flashpoint sshd[2943]: Invalid user guest1 from 200.87.171.78
Nov 15 10:38:51 flashpoint sshd[2944]: input_userauth_request: invalid user guest1
Nov 15 10:38:53 flashpoint sshd[2943]: Failed password for invalid user guest1 from 200.87.171.78 port 54697 ssh2
Nov 15 10:38:57 flashpoint sshd[2945]: Invalid user guest2 from 200.87.171.78
Nov 15 10:38:57 flashpoint sshd[2946]: input_userauth_request: invalid user guest2
Nov 15 10:38:59 flashpoint sshd[2945]: Failed password for invalid user guest2 from 200.87.171.78 port 54798 ssh2
Nov 15 10:39:04 flashpoint sshd[2947]: Invalid user guest3 from 200.87.171.78
Nov 15 10:39:04 flashpoint sshd[2948]: input_userauth_request: invalid user guest3

Investigation of encrypted traffic

November 23rd, 2009 Clear2Go

As more and more Internet traffic becomes encrypted, driven by privacy concerns and the need to protect data from third parties, prying eyes, marketers, service providers and others, behavioural profiling of network sessions will become increasingly necessary. Many products already claim to do behavioural profiling of network activity to varying degrees to assist with detection. Vendors, law enforcement, bad guys and others are also researching this area more and more actively.

I reviewed a report which stated that, because the data was encrypted, it was impossible to determine anything useful. This is not always the case, but I have seen this conclusion many times in reports and investigations dealing with encrypted or unidentified data. Aside from marketing claims that encrypted Internet sessions make you safe (nothing could be further from the truth), many network administrators do not understand, or have little experience with, behavioural profiling. Behavioural profiling of networks can be very complex, and research in this area is relatively new. To show how one might profile network sessions and use behavioural profiling to extract information, I decided to walk through a simple example and answer a simple question. What are the differences between an encrypted session where someone is watching a program or video (the user provides no input) and an interactive session where the user is providing input? I used the SSH protocol to illustrate.

I streamed video over SSH to watch a program. The program was approximately 24 minutes long and was hosted on a server at my ISP. There were no problems watching it. It didn’t pause or stop, and it was just like watching a typical television program (in fact, I watched it on my flat-screen TV). I used a device to capture the traffic between the server hosting the program and my home for the program’s entire duration. Finally, I captured an interactive SSH session in which I was logged into a server at my ISP, doing some coding and running shell commands.

Looking at the actual payload of either capture is useless. Because the data is encrypted, determining what was transmitted without access to the session keys is close to, if not, impossible. That said, what behavioural characteristics can we observe that might tell us what is going on?

I split each of the two captures by direction, which gave me four capture files: video received, video transmitted, interactive received and interactive transmitted.
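Splitting a capture by direction amounts to comparing each packet’s addresses against the client’s address. Below is a minimal sketch. The `Packet` record is hypothetical; in practice it would come from whatever pcap parser you use.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class Packet:
    """Hypothetical per-packet record, e.g. produced by a pcap parser."""
    timestamp: float  # seconds
    src: str
    dst: str
    length: int  # bytes captured

def split_by_direction(
    packets: Iterable[Packet], client_ip: str
) -> Tuple[List[Packet], List[Packet]]:
    """Return (received, transmitted) from the client's point of view."""
    received: List[Packet] = []
    transmitted: List[Packet] = []
    for p in packets:
        if p.dst == client_ip:
            received.append(p)
        elif p.src == client_ip:
            transmitted.append(p)
        # Packets not involving the client are ignored.
    return received, transmitted

def total_bytes(packets: Iterable[Packet]) -> int:
    """Sum of packet lengths, used for the bandwidth comparison."""
    return sum(p.length for p in packets)
```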

Bandwidth

              Received   Transmitted   Ratio (transmitted / received)
Video         193.2 MB   7.0 MB        0.036
Interactive   0.59 MB    0.58 MB       0.98

In the table above, the video session received far more data than it transmitted, whereas the interactive session transmitted and received similar amounts. Analysis of most video streams and other downloading flows will yield similar results: far more data is received than transmitted, so the ratio of transmitted to received data is low. Interactive sessions tend to have a more balanced ratio. This of course depends on what the user is doing in the interactive session, but in my experience it has typically been the case.

Inter-packet timing

Another interesting metric is the time difference, or delta, between consecutive packets. When someone is watching a video or listening to music, the delta between packets tends to be small compared to an interactive session. There are a few reasons for this. Because the video is viewed as it arrives, the data must arrive in a timely manner so the video does not ‘freeze’. Some software writes the video data to disk ahead of viewing to help mitigate this problem. However, that leaves an exposure: a savvy individual can obtain a copy of the video simply by copying the temporary file. As a result, newer software tends to keep the data in memory rather than writing it to disk. This creates a need for smooth delivery of data, minimizing variation in the delay between packets (known as jitter).

              Received (seconds)               Transmitted (seconds)
              Maximum    Mean    Std Dev.      Maximum    Mean    Std Dev.
Video         3.065      0.021   0.094         3.051      0.014   0.076
Interactive   4028.555   3.568   88.736        4028.544   2.162   69.137

I wrote a simple Python script that takes a capture file as input, calculates the inter-packet timing for each pair of consecutive packets and then outputs, among other information, the results in the table above. Maximum is the largest time between packets, Mean is the average time between packets, and the standard deviation measures how widely the inter-packet times spread around that mean. For those who don’t know or want a refresher on standard deviation, here is a good place to start. However, most languages and spreadsheets have functions to calculate it for you if you do not wish to learn the math. In simple terms, using our example: if every pair of packets had exactly the same time between them, the standard deviation would be 0. The more the timing between packets varies, the greater the standard deviation.
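The original script isn’t published here, so what follows is only a minimal sketch of the same calculation. It reads packet timestamps from a classic libpcap file (not pcapng, not the nanosecond variant) and reports the maximum, mean and standard deviation of the gaps. Whether the original used the population or the sample standard deviation is not stated. This sketch uses the population form (`statistics.pstdev`); for captures with thousands of packets the difference is negligible.

```python
import statistics
import struct
from typing import List, Tuple

def read_pcap_timestamps(data: bytes) -> List[float]:
    """Extract packet timestamps (seconds) from a classic microsecond pcap file's bytes."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian machine
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    timestamps = []
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        timestamps.append(ts_sec + ts_usec / 1_000_000)
        offset += 16 + incl_len  # record header plus captured bytes
    return timestamps

def inter_packet_stats(timestamps: List[float]) -> Tuple[float, float, float]:
    """Return (maximum, mean, population std dev) of gaps between consecutive packets."""
    if len(timestamps) < 2:
        raise ValueError("need at least two packets")
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return max(deltas), statistics.mean(deltas), statistics.pstdev(deltas)
```

Usage, with a hypothetical file name: `inter_packet_stats(read_pcap_timestamps(open("video-received.pcap", "rb").read()))`.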

Notice that the standard deviation is much higher for the interactive session than for the video session. Sessions that stream data tend to have a low standard deviation of inter-packet timing. This makes sense: in an interactive session you can walk away from the computer, or the program may be waiting for input from the user, so data transmission fluctuates more.

Bandwidth, inter-packet timing, and statistics such as the mean and standard deviation are just a few of the measures that can narrow down what a particular subject’s activities might be. In corporate or law enforcement investigations, profiling network behaviour can help determine whether you need to spend more time on an investigation or whether you have the right target. Using our example above, suppose a corporation wants to determine which employees are watching streaming video. A scan of the network data reveals an individual with encrypted sessions, but these sessions show a transmit/receive ratio in line with typical interactive sessions rather than video sessions. If the standard deviation of the inter-packet timing is also high for these sessions, you can immediately rule that individual out. This focuses your investigation and avoids encroaching on privacy unnecessarily. It also saves time by letting you concentrate on users whose network sessions fit the behaviour you are looking for.
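The screening described above can be sketched as a simple rule combining the two metrics. The thresholds here (a transmitted/received ratio of at most 0.2 and a gap standard deviation of at most 1 second) are hypothetical. They were picked only to separate the two sessions in the tables and have not been validated against other traffic.

```python
def tx_rx_ratio(tx_bytes: float, rx_bytes: float) -> float:
    """Transmitted-to-received ratio, as in the bandwidth table."""
    if rx_bytes <= 0:
        raise ValueError("received byte count must be positive")
    return tx_bytes / rx_bytes

def looks_like_streaming(
    tx_bytes: float,
    rx_bytes: float,
    gap_stdev_seconds: float,
    max_ratio: float = 0.2,      # hypothetical threshold
    max_gap_stdev: float = 1.0,  # hypothetical threshold, in seconds
) -> bool:
    """Heuristic: streaming is download-heavy and has steady inter-packet timing."""
    return (tx_rx_ratio(tx_bytes, rx_bytes) <= max_ratio
            and gap_stdev_seconds <= max_gap_stdev)

# Figures from the two tables above (received direction for the timing).
print(looks_like_streaming(7.0, 193.2, 0.094))   # True: video session
print(looks_like_streaming(0.58, 0.59, 88.736))  # False: interactive session
```

Requiring both conditions reduces false positives: a bulk download would pass the ratio test, but interactive work would fail both.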

If you feel comfortable because your data is ‘encrypted’, that comfort may be a false sense of security. These are just two of the many metrics and techniques that can be applied to the data. This is an area of active research, and many products perform this type of analysis automatically. For those interested, this is a great paper, although older now, describing an experiment that determined which movie people were watching even though the movie data was encrypted. The researchers used behavioural data to fingerprint the movies, then matched the fingerprints against the encrypted transmitted data.

Associated Press analysis on news propagation of Michael Jackson’s death

August 15th, 2009 Clear2Go

A leaked confidential memo from the Associated Press explains a three-part plan to control the news AP produces. The plan aims to stop websites, blogs, Twitter, and anyone else from ‘scraping’ the content and using it without permission.

While I think they will have a tough fight on their hands, and I doubt their plan will be accepted today, the analysis in the confidential memo is interesting. In particular, I like its analysis of how the news of Michael Jackson’s death propagated and how Wikipedia, Google, and Twitter were the main beneficiaries of the traffic.

Michael Jackson died suddenly on June 25, and within 30 minutes,
the news absorbed 25 percent of all web traffic. Online news
sites logged an astounding 4.2 million visitors a minute,
according to the delivery network Akamai.

Two of the biggest beneficiaries of that traffic bonanza were
Twitter and Wikipedia, a couple of digital natives that would
have been viewed as very unlikely news competitors even a few
months ago. Indeed, a new pattern of consumption was validated
in the confusing minutes that followed the first reports of
Jacko's death: Users shared; they searched and they clicked
on Wikipedia.

In the course of only a few hours on the first day of the story,
the Michael Jackson page on Wikipedia received 1.8 million
visits.  By Friday, the total reached 5 million visits.

For those with long Internet memories, the new routine of
Twitter-to-Google-to-Wikipedia contrasts sharply with the
behavior of users in August of 1997, when millions loaded
and reloaded bookmarked news sites to get updates on the
death of Princess Diana, another celebrity icon of similar
magnitude.

I have to agree with their behavioural analysis of news consumers. I myself saw a tweet about Michael Jackson’s death on my PDA. Next, I searched Twitter and clicked on the links that looked relevant. Twitter is my main source of news: from it, I can decide which news tweets, if any, interest me. If something does, I can investigate further via other tweets, links and/or Google.

The memo goes on to explain AP’s three-step approach to regaining control of the news from consumers. AP issued a press release on its “News Registry” to help “protect content”, which is one of the steps in the memo. However, the confidential memo is much more revealing and ‘colourful’, if you are interested.

DNS analysis – Part I

February 15th, 2009 Clear2Go 2 comments

I have been investigating DNS lately. I set up a capture of all DNS queries leaving my house for approximately six days. Three people in my house use the Internet in one way or another. Using some quick scripts I wrote, I extracted the names that were queried. I then fed this data into some graphing software and created a couple of visualizations. The first is a standard word tag visualization, where the larger the word, the more often it appears in the dataset.
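The original scripts are not shown, so here is a rough sketch of the extraction and word-counting steps. It assumes the capture yields raw DNS query payloads (the UDP data for port 53, with IP and UDP headers already stripped). It reads the queried name out of the DNS wire format and counts the individual labels for a word tag visualization. The function names, the `IGNORED_LABELS` set, and the sample names are hypothetical choices for illustration, not the author’s code.

```python
import struct
from collections import Counter

DNS_HEADER_LEN = 12  # fixed-size DNS header that precedes the question section

def extract_qname(payload: bytes) -> str:
    """Return the name from the first question of a raw DNS query message.

    Assumes the question name is uncompressed, which is normal for queries.
    """
    labels = []
    offset = DNS_HEADER_LEN
    while True:
        length = payload[offset]
        if length == 0:            # a zero-length label terminates the name
            break
        if length & 0xC0:          # compression pointer: not expected in a query
            raise ValueError("compressed name in question section")
        offset += 1
        labels.append(payload[offset:offset + length].decode("ascii").lower())
        offset += length
    return ".".join(labels)

# Hypothetical list of labels too generic to be interesting in a word cloud.
IGNORED_LABELS = {"www", "com", "net", "org", "ca"}

def word_counts(qnames):
    """Count individual labels across all query names for a word tag visualization."""
    counts = Counter()
    for name in qnames:
        counts.update(label for label in name.split(".")
                      if label and label not in IGNORED_LABELS)
    return counts

def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal A-record query, used here only to exercise the parser."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD flag set, 1 question
    qname = b"".join(bytes([len(part)]) + part.encode("ascii")
                     for part in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

payloads = [build_query(n) for n in ("twitter.com", "search.twitter.com", "www.facebook.com")]
names = [extract_qname(p) for p in payloads]
print(word_counts(names).most_common(3))
```

Dropping generic labels such as `www` and `com` keeps them from dominating the cloud; which labels to ignore is a judgment call for your own data.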

What can you learn from a visualization such as this? Could you build a profile of the people in this house just from their DNS queries? And if you can, what does it tell you? Twitter is obviously used in the house, as ‘twitter’ has the largest number of references. ‘Sandvine’ also appears often. There are references to ‘mac’ and ‘apple’. ‘facebook’ is also large relative to the others. There are queries to ‘thepiratebay’. What do these all mean? What can we infer from them, and are our inferences accurate?

Here is the same dataset, using the full query names, visualized as a bubble graph.
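A bubble graph counts whole names rather than individual words. The sketch below tallies full query names and converts the top counts into bubble radii. It assumes area, not radius, should be proportional to the query count, which is why it takes a square root. `bubble_sizes`, its parameters, and the sample queries are hypothetical and do not reflect the author’s data.

```python
import math
from collections import Counter

def bubble_sizes(qnames, top_n=10, max_radius=100.0):
    """Map the most frequently queried full names to bubble radii.

    The radius is scaled by the square root of the count so that bubble
    area, rather than radius, is proportional to the number of queries.
    """
    counts = Counter(name.lower().rstrip(".") for name in qnames)
    top = counts.most_common(top_n)
    if not top:
        return {}
    largest = top[0][1]
    return {name: max_radius * math.sqrt(count / largest) for name, count in top}

# Hypothetical sample, not the captured data.
queries = ["twitter.com"] * 9 + ["search.twitter.com"] * 4 + ["DC-2.sandvine.com"]
for name, radius in bubble_sizes(queries).items():
    print(f"{name:22s} {radius:6.1f}")
```

Scaling by radius alone would make a name queried four times as often look sixteen times larger. The square root avoids that distortion.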

From this visualization, ‘twitter.com’ and ‘search.twitter.com’ receive the most queries, so it is safe to say that at least one person in this residence probably has an active Twitter account. ‘DC-2.sandvine.com’ suggests that someone regularly looks up what is probably a domain controller for ‘sandvine.com’. If you inferred from this that someone here is a Sandvine employee, you would be correct. None of those servers can actually be reached without a VPN, but because of the way DNS works, lookups for such names often leak outside the VPN.

Over the next few weeks, I will work with this data and the graphs above, along with other tools and DNS vectors, to determine what else can be inferred from DNS alone.

Obama bandwidth – upward trend in bandwidth requirements

January 24th, 2009 Clear2Go No comments

Here are two graphs showing inbound HTTP traffic on a link in a small service provider’s network. The first graph is from Jan 19th, 2009, the day before Obama’s inauguration. The second is from Jan 20th, 2009, the day of the inauguration. Between 11:00 and 12:30 you can clearly see an abnormal increase in bandwidth because the inauguration was broadcast live over the Internet, and this is just HTTP, not counting other streaming protocols that might have been used.
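One simple way to pick out such an abnormal window automatically is to compare each time slot on the event day with the same slot on a baseline day and flag slots that exceed it by some ratio. This is a minimal sketch. The 30-minute slots, the 1.5× threshold, the `flag_spikes` name, and the Mbit/s figures are all hypothetical and are not readings from the graphs.

```python
def flag_spikes(baseline, event, ratio=1.5):
    """Return the time slots where event-day traffic is at least `ratio` times the baseline."""
    spikes = []
    for slot, base in baseline.items():
        current = event.get(slot)
        if current is None or base <= 0:
            continue  # no comparable sample for this slot
        if current / base >= ratio:
            spikes.append(slot)
    return spikes

# Hypothetical inbound HTTP rates in Mbit/s per 30-minute slot.
jan19 = {"10:30": 40.0, "11:00": 42.0, "11:30": 41.0, "12:00": 43.0, "12:30": 44.0, "13:00": 45.0}
jan20 = {"10:30": 41.0, "11:00": 95.0, "11:30": 120.0, "12:00": 110.0, "12:30": 70.0, "13:00": 46.0}
print(flag_spikes(jan19, jan20))
```

A single prior day is a noisy baseline. Averaging the same slot over several comparable days would make the comparison more robust.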

You can clearly see the increase in bandwidth on this one link during the inauguration. This is not the first time this has happened. Twitter’s inauguration data shows the same trend for its microblogging service.

As the Internet increasingly becomes the medium for information, bandwidth requirements will keep increasing and will spike when these types of events occur. Service providers need to manage bandwidth effectively, ensure fairness and privacy, and deploy appropriate infrastructure to support the growing demand over the next 5-10 years.

Consider my family over the last 3 years. We hardly watch television, and any shows we do watch, we watch via the Internet. We listen to the radio via the Internet. We get all our information and news via the Internet. We communicate almost exclusively via the Internet.

Behavioural profiling … the next level

November 12th, 2008 Clear2Go No comments

Most people know that behavioural profiling is becoming standard practice. Just by watching communication between mobile phones, communication between systems, and where people connect to on the Internet, you can glean a great deal of valuable information about a target. Johnny Long wrote a book about similar ways to profile targets through information gathering. Behavioural profiling can be used to find botnets, DDoS attacks, phishing, and other malicious activity. It has good uses.

The next level: Google.org has a site that indirectly tracks flu trends by correlating search terms with the location where each search was performed, along with other information. Its accuracy appears to approach that of the Centers for Disease Control, and it can lead official reporting by up to two weeks. This is cool stuff.
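To illustrate what a lead of up to two weeks means, the sketch below correlates a weekly search-volume series with a weekly case-count series at several offsets and reports the offset with the highest Pearson correlation. This is not the model Google.org used. It only shows the general idea of correlating search volume with case reports and measuring the lead. The function names and both series are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (sd_x * sd_y)

def best_lead(searches, cases, max_lead):
    """Find how many periods search volume best leads reported cases.

    For each lead L, search volume at week t is compared with cases at week t + L.
    Returns (lead, correlation).
    """
    best = (0, pearson(searches, cases))
    for lead in range(1, max_lead + 1):
        r = pearson(searches[:len(searches) - lead], cases[lead:])
        if r > best[1]:
            best = (lead, r)
    return best

# Hypothetical weekly series: reported cases trail flu-related searches by two weeks.
searches = [1, 2, 4, 8, 7, 5, 3, 2, 1, 1]
cases = [0, 0] + searches[:-2]
print(best_lead(searches, cases, max_lead=4))
```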

Security and State requirements

October 7th, 2007 Clear2Go No comments

Lately, my team and I have been trying to solve some more difficult security problems around detecting certain malware. Detection of malicious activity used to be possible with minimal state.

Now, every time we discover a new piece of malware and consider possible detection mechanisms, we keep running into the same problem: many of our proposed solutions require significant resources to detect it.
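One common way to cap the cost of stateful detection is a fixed-size state table that evicts the least recently seen flow when it fills up. The sketch below shows that trade-off in Python. Memory stays bounded, but an evicted flow loses its accumulated state, so slow behaviour spread over a long window can be missed. The `FlowStateTable` class, the per-flow counters it keeps, and the flow keys are hypothetical illustrations, not our detection engine.

```python
from collections import OrderedDict

class FlowStateTable:
    """Per-flow detection state held in a fixed-size table.

    When the table is full, the least recently seen flow is evicted, so memory
    stays bounded at the cost of forgetting older flows.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.evictions = 0
        self._flows = OrderedDict()

    def update(self, flow_key, payload_len):
        state = self._flows.get(flow_key)
        if state is None:
            if len(self._flows) >= self.capacity:
                self._flows.popitem(last=False)  # drop the least recently seen flow
                self.evictions += 1
            state = {"packets": 0, "bytes": 0}
            self._flows[flow_key] = state
        else:
            self._flows.move_to_end(flow_key)    # mark as most recently seen
        state["packets"] += 1
        state["bytes"] += payload_len
        return state

    def __contains__(self, flow_key):
        return flow_key in self._flows

    def __len__(self):
        return len(self._flows)

# Hypothetical flow keys: (source, destination, destination port).
table = FlowStateTable(capacity=2)
for key, size in [(("10.0.0.5", "198.51.100.7", 80), 512),
                  (("10.0.0.6", "203.0.113.9", 6667), 64),
                  (("10.0.0.5", "198.51.100.7", 80), 1400),
                  (("10.0.0.7", "192.0.2.44", 443), 300)]:
    table.update(key, size)
print(len(table), table.evictions)
```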

Anyone else having similar issues? Would love to hear your opinion.

Dynamic Botnets

September 22nd, 2007 Clear2Go No comments

This is a research paper / tutorial I wrote a few months back. It covers one of the many BotNets that were detected and tracked by my team. The goal of the paper was to show how a typical Dynamic BotNet communicates, the implications these BotNets can have for ISPs, why traditional detection and mitigation are not enough to stop them, and why behavioural detection, not just simple static signatures, is needed to detect and mitigate this type of malicious software.
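The paper itself is not reproduced here, so this example does not show its method. It is one generic behavioural check that a static payload signature cannot express. Automated check-ins tend to arrive at very regular intervals, so the sketch flags a host/destination pair whose inter-arrival gaps have a low coefficient of variation. `looks_like_beacon`, its thresholds, and the timestamps are hypothetical values chosen for illustration.

```python
import statistics

def looks_like_beacon(timestamps, min_events=5, max_cv=0.1):
    """Flag a host-to-destination connection series with very regular timing.

    Uses the coefficient of variation (std dev / mean) of inter-arrival gaps:
    near-constant gaps suggest automated check-ins rather than human browsing.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return False
    return statistics.pstdev(gaps) / mean_gap <= max_cv

# Hypothetical connection times in seconds for two host/destination pairs.
print(looks_like_beacon([0, 60, 120, 181, 240, 300]))   # regular check-ins
print(looks_like_beacon([0, 5, 200, 210, 900, 905]))    # bursty, human-like
```

Because the check relies on timing rather than on packet contents or a particular server address, it keeps working when payloads or destinations change. Real traffic would still need jitter tolerance and allow-listing of legitimate periodic services.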