Archive for the ‘monitoring’ Category

Tor and plausible deniability

February 18th, 2010 Clear2Go 2 comments

Once again I have been experimenting with the Tor network.  In doing so I have set up some Tor nodes. I have received a few notifications that my computer ‘may be infected’. For a brief period, Google asked me to enter a CAPTCHA to confirm I am human.  These are all expected minor nuisances when running Tor as an exit node. My main reason for setting up Tor this time is to obtain a better understanding of what happens to behavioural and static detection when a Tor exit node is present.

If you want privacy or anonymity on the Internet, there are many things you can do. Proxies, Tor, encrypted tunnels, compromised systems, and many other techniques are available.  None of these will guarantee you anonymity or privacy, but they each help, and the more you can do the better.  There are caveats of course, and in several cases while consulting I have come across scenarios where a client thought they were anonymous but were in fact not as anonymous as they thought.  When you are trying to be anonymous, the use of monitoring techniques and system checks really helps.

I’ve realized that running a Tor exit node but not using it yourself gives you anonymity.  I’ve always known this inherently, but I’ve realized that it is even better than I thought.  Say you are an evil person doing something evil on the Internet.  If your activities were being tracked by your service provider due to a warrant from law enforcement or laws were put in place that required all service providers to track and retain your Internet surfing activities for a period of time, they would be recording the surfing habits of every connection that selected your Tor node as its exit node.

If they accused you of illegal activity, you could easily say, ‘That was not me; it must have been someone using my Tor node.’  While this is not a guarantee the criminal would not get caught, it would increase the cost of the investigation significantly: more investigation time and more forensics are needed to prove that the suspect is the criminal.  Add in anti-forensics on the terminals and systems you use for the crime and the costs of the investigation will increase again, forcing them to assess if it is worth the time, money, and resources required.

If countries are going to deploy retention laws similar to the above, it will only be a matter of time before they will have to outlaw services such as Tor in order to make the laws effective at catching the serious criminals.  From a Tor network perspective, these laws might help increase the node count of the Tor network on the Internet, which is a good thing for the network.

I wonder if lawmakers consider these questions when suggesting these laws?

Investigation of encrypted traffic

November 23rd, 2009 Clear2Go No comments

As the traffic on the Internet becomes more and more encrypted due to privacy concerns, the need to protect data from third parties, prying eyes, marketers, service providers and others, behavioural profiling of network sessions will become more and more necessary.  Already, there are many products that claim to do behavioural profiling of network activity in varying degrees to assist with behaviour detection.  There is more and more active research in this area by vendors, law enforcement, bad guys and others.

I reviewed a report where it was indicated that because the data was encrypted it was impossible to determine anything useful.  This is not always the case, but I have seen this conclusion in reports and investigations many times when dealing with encrypted or unidentified data.  Aside from the marketing which says that if one's Internet sessions are encrypted then one is safe (nothing could be further from the truth), many network administrators do not understand, or have not had much experience with, behavioural profiling.  Behavioural profiling of networks can be very complex, and research is relatively new in this area.  To give some insight into how one might profile network sessions and show how one can use behavioural profiling to extract information, I decided to walk through a simple example and answer a simple question.  Specifically, what are the differences between an encrypted network session where one is watching a program or video (user providing no input), compared to an interactive type of network session where one is interacting (providing input)?  I used the SSH protocol to illustrate.

I used video over SSH to watch a program.  The program was approximately 24 minutes in duration and was hosted on a server at my ISP.   There were no problems watching the program, it didn’t pause or stop, and it was just like watching a typical television program (in fact I watched it on my flat screen TV).  I used a device to capture the traffic between the server hosting the program and my home for the entire duration of the program.  Finally, I captured an interactive SSH session which was me logged into a server at my ISP, where I was doing some coding and some shell commands.

Attempts to look at the actual data of either of these captures will be useless.  Since the data is encrypted, knowing what was transmitted without access to the session keys is close to impossible, if not impossible.  That being stated, what behaviour characteristics can we observe to tell us what might be going on?

I separated each of the two captures by direction, which gave me four capture files: video received, video transmitted, interactive data received, and interactive data transmitted.

Bandwidth

              Received    Transmitted   Ratio (transmitted / received)
Video         193.2 MB    7.0 MB        0.036
Interactive   0.59 MB     0.58 MB       0.98

Looking at the table above, the video session has a much larger amount of data received than transmitted, compared to the interactive session where a similar amount of data is transmitted and received.  Analysis of most video streaming and flows where downloading is occurring will yield similar results.  The ratio of received to transmitted data will be high.  Interactive sessions tend to have a more balanced ratio of transmitted to received data compared to a video session.  This of course depends on what the user is doing in the interactive session, but typically this has been the case in my experience.
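
As a rough illustration of the bandwidth comparison, here is a minimal Python sketch. It is not the tooling used for the captures above; the `Packet` record, the `direction_totals` and `tx_rx_ratio` names, and the idea of identifying direction by a single local address are all assumptions made for the example. The two printed values simply redo the division from the table.

```python
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record; not tied to any capture library."""
    timestamp: float  # seconds since the start of the capture
    src: str          # source address
    dst: str          # destination address
    length: int       # bytes on the wire


def direction_totals(packets: Iterable[Packet], local: str) -> tuple[int, int]:
    """Return (received_bytes, transmitted_bytes) as seen from `local`."""
    received = transmitted = 0
    for pkt in packets:
        if pkt.dst == local:
            received += pkt.length
        elif pkt.src == local:
            transmitted += pkt.length
    return received, transmitted


def tx_rx_ratio(received: float, transmitted: float) -> float:
    """Transmitted divided by received, the ratio shown in the table above."""
    return transmitted / received


print(round(tx_rx_ratio(193.2, 7.0), 3))  # video: 0.036
print(round(tx_rx_ratio(0.59, 0.58), 2))  # interactive: 0.98
```

A ratio near 1 points toward a balanced, interactive-style exchange, while a ratio near 0 points toward a mostly one-way download or stream.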

Inter-packet timing

Another interesting metric is the time difference or delta between two packets.  When watching a video or listening to music, the delta between two packets tends to be small in comparison to an interactive type of session.  There are a few reasons for this.  Since the video is being viewed, it is important to ensure that the data arrives in a timely manner so as to not have the video ‘freeze’ while being watched.   Some software attempts to write the video data to disk in advance of viewing to help mitigate this problem, but that leaves an exposure where a savvy individual can obtain a copy of the video by simply making a copy of the temporary file.  As a result, newer software tends to attempt to keep the data in memory and not write it to disk.  The result is the need to ensure a smooth delivery of data, minimizing the variation in delay between packets (known as jitter).

                  Received (seconds)               Transmitted (seconds)
              Maximum    Mean    Std Dev.      Maximum    Mean    Std Dev.
Video         3.065      0.021   0.094         3.051      0.014   0.076
Interactive   4028.555   3.568   88.736        4028.544   2.162   69.137

I wrote a simple Python script which takes a capture file as input, calculates the inter-packet timing for each pair of consecutive packets and then outputs, among other information, the results you see in the table above.  The Maximum field is the largest time between packets, the mean is the average time between packets, and the standard deviation is a measure of how ‘different’ the inter-packet times are from the ‘normal’.  For those that don’t know or wish to have a refresher in standard deviation, here is a good place to start. However, most languages and spreadsheets have functions to calculate this for you if you do not wish to learn the math.  In simple terms and using our specific example, if all the packets had the exact same time between them then the standard deviation would be 0.  The greater the difference in timing between packets, the greater the standard deviation will be.
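
The script itself is not reproduced here, so the following is only a minimal sketch of the calculation it describes, using Python's standard `statistics` module. It assumes the packet timestamps have already been pulled out of the capture file into a list of seconds; the `inter_packet_stats` name and the choice of the population standard deviation are assumptions.

```python
import statistics
from typing import Sequence


def inter_packet_stats(timestamps: Sequence[float]) -> dict[str, float]:
    """Maximum, mean and standard deviation of the gaps between packets.

    `timestamps` are packet arrival times in seconds, in capture order.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two packets to compute a delta")
    # Delta between each packet and the one that follows it.
    deltas = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "maximum": max(deltas),
        "mean": statistics.fmean(deltas),
        # Population standard deviation; whether the original script used
        # the population or the sample form is not stated.
        "std_dev": statistics.pstdev(deltas),
    }
```

Feeding it evenly spaced timestamps such as `[0.0, 1.0, 2.0, 3.0]` gives a standard deviation of 0, which matches the description above.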

Notice that the standard deviation is much higher for the interactive session than the video session.  Sessions that stream data tend to have a low standard deviation for inter-packet timing.  If you think about it this makes sense: in an interactive session you can walk away from the computer, or the program could be waiting for input from the user, so data transmission will fluctuate more.

Bandwidth, inter-packet timing, and methods such as standard deviation and mean are just a few things that can be used to narrow down what a particular subject's activities might be.  In corporate or law enforcement investigations, profiling network behaviour can be a useful tool to determine if you need to spend more time on the investigation or if you have the right target.  Using our example above,  suppose a corporation wants to determine which employees are watching streaming videos.  A scan of the network data reveals an individual who has encrypted sessions, but these sessions show a transmit / receive ratio that is in line with typical interactive sessions and not video sessions.  If the standard deviation of the inter-packet timing is also higher for these sessions, then you can rule them out as an individual of interest immediately.  This has the advantage of focusing your investigation, not encroaching on privacy unnecessarily,  and saving time by allowing you to focus on the users that have network sessions with characteristics that fit the behaviour you are looking for.
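
The triage described above could be sketched as a simple check on the two metrics. This is an illustration only: the function name and both thresholds are hypothetical guesses rather than measured cut-offs, and a real deployment would need to calibrate them against known traffic.

```python
def looks_like_streaming(tx_rx_ratio: float, delta_std_dev: float,
                         max_ratio: float = 0.1,
                         max_std_dev: float = 1.0) -> bool:
    """Rough triage: True when a session resembles one-way streaming.

    tx_rx_ratio   -- transmitted bytes divided by received bytes
    delta_std_dev -- standard deviation of inter-packet times, in seconds
    Both thresholds are illustrative guesses, not measured cut-offs.
    """
    return tx_rx_ratio <= max_ratio and delta_std_dev <= max_std_dev


# Values taken from the two tables above (received direction).
print(looks_like_streaming(0.036, 0.094))   # video session
print(looks_like_streaming(0.98, 88.736))   # interactive session
```

Sessions that fail the check can be set aside early, which keeps the investigation focused on the ones that fit the behaviour of interest.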

For those of you that feel comfortable because the data is ‘encrypted’, it can be a false sense of security.  These are two of the many metrics and techniques that can be used on the data.  This area has active research and there are many products that will do this type of analysis in an automated fashion.  For those interested in this, although older now, this is a great paper where an experiment was conducted to determine what movie people were watching even though the movie data was encrypted.  They used behavioural data to fingerprint the movies, then applied the fingerprints to encrypted transmitted data.

What is my daughter up to on the Internet, part I

October 25th, 2009 Clear2Go No comments

My daughter has recently become much more interested in some of the social networking sites such as Facebook and YouTube. This is a little concerning for my wife and me. We encourage her to use technology as much as possible, but at the same time there is an inherent risk. There is software you can purchase and install that will download the latest list of bad sites and look for questionable URLs and even questionable pictures, but I didn’t want to move to this level just yet.  She is not running Windows.

The problem became how could I use some standard networking tools to passively monitor what she is up to on the Internet? I made some basic assumptions.  First, I am only interested in HTTP for now.  Second, I want to extract the sites she visits and do not care about the data that is returned at this point.

We have a Linux box that acts as our gateway to the Internet, so that seemed like the best place to deploy the solution. The first thing was to create a regular expression (regex) that will examine each packet that leaves our internal network and look for commands from the HTTP protocol specification. Any packets matching this will be saved for future analysis. The regex I created is:

^([Gg][Ee][Tt]|[Pp][Oo][Ss][Tt])|([Hh][Ee][Aa][Dd])|([Pp][Uu][Tt])|([Dd][Ee][Ll][Ee][Tt][Ee])|([Tt][Rr][Aa][Cc][Ee])|([Oo][Pp][Tt][Ii][Oo][Nn][Ss])|([Cc][Oo][Nn][Nn][Ee][Cc][Tt])\x20*[Hh][Tt][Tt][Pp]\x2f\x31\x2e

This regex looks for any packet that begins with an HTTP 1.x command such as GET, POST, HEAD, PUT, DELETE, TRACE, OPTIONS, or CONNECT.  The command is separated by a space and then contains the HTTP version number, HTTP/1.  I am aware the regex could be made more optimal.  I chose to not worry about it as this format makes it easier to explain and understand if you are not familiar with regular expressions.  For those with DPI experience, there are more complex and accurate ways to detect HTTP.  For example, ipoque, the company that initiated opendpi.org, released some “demo code” that shows some of the ways deep packet inspection (DPI) works.  You can run the demo code on any pre-saved capture files you have and it will attempt to inform you of the protocols that are in the capture file.   If you look at their code for HTTP detection, they have a multi-stage approach that looks at both sides of the flow to determine if the protocol is in fact HTTP.  Any vendors selling DPI equipment today should be taking this type of approach to protocol detection when possible.  However, for the purposes of determining what an individual is doing, I feel this is overkill.  If the situation was a company that was ‘suspicious’ of an employee and just wanted to investigate, simple solutions are better.  If criminal activity was found and the data goes to court, you want to be able to explain how you gathered the data, why it is valid and what it means.  Keep the explanation as simple as possible in these potential circumstances.
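
A minimal sketch, using Python's `re` module rather than ngrep's own regex engine, of a more tightly grouped variant of the expression above. Because alternation binds more loosely than concatenation, the expression as written anchors only the GET/POST branch to the start of the packet and applies the trailing version check only to the CONNECT branch; the variant below wraps all eight methods in one group. The names `METHODS`, `HTTP_REQUEST` and `is_http_request` are illustrative, and the `\S+` for the request target is an assumption about what sits between the method and the version.

```python
import re

METHODS = ("GET", "POST", "HEAD", "PUT", "DELETE", "TRACE", "OPTIONS", "CONNECT")

# One group around every method, then a space, the request target
# (assumed to contain no whitespace), another space and the HTTP/1.x marker.
HTTP_REQUEST = re.compile(
    rb"^(?:" + b"|".join(m.encode() for m in METHODS) + rb") \S+ HTTP/1\.",
    re.IGNORECASE,
)


def is_http_request(payload: bytes) -> bool:
    """True when the payload starts with an HTTP 1.x request line."""
    return HTTP_REQUEST.match(payload) is not None
```

Calling `is_http_request(b"GET /index.html HTTP/1.1\r\n")` returns `True`, while a server response such as `b"HTTP/1.1 200 OK\r\n"` returns `False`.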

The only two missing pieces are restricting the capture to packets egressing from a particular computer (in this case my daughter's) and to TCP.  This can be accomplished by adding a Berkeley Packet Filter (BPF) on ngrep which will pre-process the packets prior to the application of the regular expression.  The final command I deployed was:

ngrep -O ./httpWatch1.cap -d eth1 -tq -Wbyline "^([Gg][Ee][Tt]|[Pp][Oo][Ss][Tt])|([Hh][Ee][Aa][Dd])|([Pp][Uu][Tt])|([Dd][Ee][Ll][Ee][Tt][Ee])|([Tt][Rr][Aa][Cc][Ee])|([Oo][Pp][Tt][Ii][Oo][Nn][Ss])|([Cc][Oo][Nn][Nn][Ee][Cc][Tt])\x20*[Hh][Tt][Tt][Pp]\x2f\x31\x2e" "src host 10.1.1.40 and tcp"

This records to a file called httpWatch1.cap all packets that arrive on my internal interface eth1 where an HTTP 1.x command is encountered and the packet is TCP and originates from my daughter's computer.  The screenshot below of the first few packets shows what you can expect throughout the file.

[Screenshot: first few packets of the HTTP capture]

I let it capture for approximately 8 days.  In the next few days I will post how to take the data in this file and manipulate it to extract the information I am looking for.

Categories: Forensics, monitoring Tags:

Tracking with Local Shared Objects (LSO)

September 15th, 2009 Clear2Go No comments

There has been lots of discussion lately about Flash websites using Local Shared Objects (LSO) to track users' selections, browsing habits, and other information.  One of the advantages for websites has been that until now LSOs have not been well known.  From my basic searching they have been around since at least 2004 and probably earlier.  A user may configure their browser to remove or delete all ‘cookies’, but LSOs stay.  According to some, many of the top websites use them.

I tried a little experiment to see how LSOs are stored.  The directory in which they are stored varies depending upon your operating system.  I use Linux as my primary O/S, where the default directory for LSOs is ~/.macromedia/Flash_Player.

[Screenshot: clean Macromedia directory]

Under the ‘Flash_Player’ there are two directories and under each of these directories are the security configuration and the binary installer for the Flash Air application.  Nothing interesting.  Next, I started Firefox and went to youtube.com and selected a video.  After the video completed, I took another look at the ~/.macromedia/Flash_Player directory.

[Screenshot: Macromedia directory after visiting YouTube, part 1]

Under ~/.macromedia/Flash_Player we now have two new directories, macromedia.com and #SharedObjects.  If we descend the macromedia.com directory, we find 3 nested single directories called support, flashplayer, and sys respectively.  Under the ‘sys’ directory we find a binary file called settings.sol and a subdirectory called #s.ytimg.com, named for a domain owned by Google.  The #s.ytimg.com directory contains a separate settings.sol which is binary.

[Screenshot: Macromedia directory after visiting YouTube, part 2]

Under the #SharedObjects directory, there is a single oddly named directory ‘3BJH4AW6’, then a directory for the website ‘s.ytimg.com’, a domain owned by Google.  Below this are two files entitled videostats.sol and soundData.sol, both containing binary data.
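
To take the same inventory without clicking through the directories by hand, a short Python sketch using only `pathlib` could list every .sol file below the Flash Player directory. The `find_lso_files` name is hypothetical, and the sketch only locates the files; it does not attempt to parse the .sol format.

```python
from pathlib import Path


def find_lso_files(flash_dir: Path) -> list[Path]:
    """Return every .sol file below `flash_dir`, sorted by path."""
    return sorted(p for p in flash_dir.rglob("*.sol") if p.is_file())
```

On a Linux system laid out as described above, calling `find_lso_files(Path.home() / ".macromedia" / "Flash_Player")` would be expected to return the settings.sol, videostats.sol and soundData.sol files.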

I haven’t investigated the format or contents of the .sol files, but it is obviously where the metadata is stored.  I may try to investigate the format or see if anyone else has already figured it out, as I am curious.  The bigger question in my mind is how one properly erases this data.  There is a Firefox add-on called BetterPrivacy which will do just that. It can be configured to delete LSOs on request or remove all the LSOs when you shut down Firefox.  I installed BetterPrivacy and tried it.  Sure enough, upon shutting down Firefox I was greeted with this window:

[Screenshot: BetterPrivacy confirmation window]

Selecting OK put my ~/.macromedia/Flash_Player directory back to its original state with no LSOs or website directories present.  For the normal user that should suffice.  However, these are files and they have been deleted.  Most people should know that deleted files are typically still recoverable.   Deleted files can be recovered from file systems such as NTFS (Windows) and ext2/ext3 (*nix).  In the case of ext3, it is a journaling file system and the default file system installed on most *nix platforms today.  Without getting into the details in this post, this effectively means that even if you wipe a file it can potentially still be recovered.

If you carry around sensitive information on your laptop, I recommend you create an encrypted volume on your hard drive using a package such as TrueCrypt or PGP.  In the case of my system, I formatted the encrypted file system to be ext2.  This means there is no journaling.  This has the disadvantage of being less ‘recoverable’ but it has the advantage that if you wipe a file with ‘wipe’, ‘shred’ or some other wiping software it is unlikely to be recovered.  Next, I point my ~/.macromedia directory to the encrypted file system.

[Screenshot: directories linked to the encrypted file system]

You can see the ~/mndData file, which is the TrueCrypt filesystem.  ~/.macromedia is symbolically linked to the encrypted filesystem.  For those interested, you can see that my Evolution (~/.evolution), Google Desktop (~/.google), Firefox cache and bookmarks (~/.mozilla), IM client (~/.purple) and Skype (~/.Skype) all write to the encrypted file system.  You have to be able to mount ~/mndData to get at any of the email, browser cache, bookmarks, IM conversations and now LSOs.  It isn’t foolproof, but it offers another layer of protection so that client data remains unviewable in the event of my laptop being stolen.
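
The relocation step can be sketched in Python as a move followed by a symbolic link. This is only an illustration of the idea, not the commands actually used on this system; the `relocate_and_link` name is hypothetical and the sketch assumes the encrypted volume is already mounted at the path passed in.

```python
import os
import shutil
from pathlib import Path


def relocate_and_link(original: Path, encrypted_mount: Path) -> Path:
    """Move `original` onto `encrypted_mount` and leave a symlink behind.

    Returns the new location. Assumes the encrypted volume is already mounted.
    """
    target = encrypted_mount / original.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # Move the directory and its contents onto the encrypted volume.
    shutil.move(str(original), str(target))
    # Applications keep using the old path, which now points at the new one.
    os.symlink(target, original, target_is_directory=True)
    return target
```

Note that when the move crosses file systems, the old copy is only deleted, not wiped, so anything that was already written to the unencrypted disk may still be recoverable from it.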

DNS forensics and working with service providers

May 29th, 2009 Clear2Go No comments

I had the privilege yesterday of speaking to some law enforcement personnel and forensics experts.  The topic was on DNS forensics, the SSL server_name option, and working with service providers.  I enjoyed the opportunity.   I really like talking about network forensics, and being surrounded by smart people that are experts in their field. It also allows me to practice my public speaking which is always good.

The DNS section of the presentation was based on my earlier two posts on DNS analysis which are here and here.   The SSL server_name option was based on my post that is here.  I have never really posted about “Working with service providers”, but I have been engaged with service providers all over the world for almost 5 years consistently, so I spoke about my experiences and thoughts.

The presentation slides are here.

Categories: Forensics, law enforcement, monitoring Tags:

DNS analysis – Part I

February 15th, 2009 Clear2Go 2 comments

I have been doing some investigation into DNS lately.   I set up to capture all DNS queries that left my house for approximately six days.  There are three people in my house that use the internet in one way or another.  Using some quick scripts I wrote, I extracted the queries that were asked of the DNS.  Using some graphical software, with this data as input, I created a couple of visualizations.  First, a standard word tag visualization, where the larger the word the more references are associated with the word in a particular dataset.
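
The counting behind a word tag visualization can be sketched in a few lines of Python. This is not the script used for the dataset above; it assumes the query names have already been extracted into a list of strings, and the `label_counts` name and the decision to split each name on dots are illustrative choices.

```python
from collections import Counter
from typing import Iterable


def label_counts(query_names: Iterable[str]) -> Counter:
    """Count how often each dot-separated label appears across DNS queries."""
    counts: Counter = Counter()
    for name in query_names:
        # Normalise case and drop the trailing root dot, if present.
        labels = name.lower().rstrip(".").split(".")
        counts.update(label for label in labels if label)
    return counts
```

The resulting counts are what a tag cloud tool would scale its words by, so a label such as ‘twitter’ that appears in many queries ends up drawn largest.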

What can you learn from a visualization such as this?  Could you build a profile of the persons in this house just from their DNS queries?  And if you can, what does it tell you?  Twitter is obviously used in the house as the largest number of references are made to ‘twitter’. ‘Sandvine’ is also used often.  There are references to ‘mac’ and ‘apple’.  ‘facebook’ also is large relative to the others.  There are queries to ‘thepiratebay’. What do these all mean?  What can we infer from them, and are we accurate with our inferences?

Using the same dataset with full queries, here it is visualized as a bubble graph.

From this visualization, ‘twitter.com’ and ‘search.twitter.com’ receive most of the queries, so it is safe to say that at least one individual in this residence probably has an active Twitter account.  The ‘DC-2.sandvine.com’ entry shows that someone regularly looks up what is probably a domain controller for ‘Sandvine.com’.  If from this you were to infer an employee of Sandvine, well, you’d be correct.  You cannot actually get to any of those servers without using a VPN, but due to the way DNS works, queries for them often leak.

Over the next few weeks, I will be working with this data, the graphs above, with other tools and DNS vectors to determine what  else can be inferred from just DNS.

Obama bandwidth – upward trend in bandwidth requirements

January 24th, 2009 Clear2Go No comments

Here are two graphs showing inbound HTTP from a link off a small service provider's network.  The first graph is Jan 19th, 2009, the day prior to Obama’s inauguration.  The second graph is Jan 20th, 2009, the day of the inauguration.  If you look at 11:00 – 12:30 you can clearly see the abnormal bandwidth increase due to this being broadcast live over the Internet, and this is just HTTP, not other streaming protocols that might have been used.

You can clearly see the increase in bandwidth on this one link during the Inauguration.  This has happened before. Twitter has inauguration data that shows the same trend for their micro blog service.

As the Internet becomes more and more the medium for information, bandwidth is going to constantly increase and spike when these types of events occur. Service providers need to effectively manage the bandwidth, ensuring fairness and privacy, and deploying appropriate infrastructure to support the trending increase in bandwidth over the next 5-10 years.

I look at my family over the last 3 years. We hardly watch television and any shows we do watch, we watch via the Internet. We listen to the radio via the internet. We get all information and news via the Internet. We communicate almost exclusively via the internet.

Bandwidth requirements for a basic audio stream

January 1st, 2009 Clear2Go No comments

I signed up and started periodically using last.fm in Feb 2006. I stopped in August 2006 and didn’t go back to it until this past December. If you are wondering how I know that in such detail, it has to do with how last.fm keeps a profile on you, but I’ll save that for another post. I have found that the selection of music it picks for me has greatly improved since I first signed up.

There are different encoding formats for video and audio that affect the bandwidth and timing requirements for the transmission of streaming content. Ignoring the technical details around this for now, if an end user decides to stream audio from a service such as last.fm, how much bandwidth do they require to listen to that single stream? To test this, I selected a track that was approximately 120 seconds in length and captured the audio stream while it played. The track played fine with no delays or problems. I captured the audio stream in two places: on the laptop where the song was being played, and on my service provider’s network at the demarcation point between my service provider and their upstream service provider. Capturing the same stream at two points allowed me to compare both captures for issues such as dropped packets or other anomalies. My provider actually has two upstream providers, but a quick check of the BGP routing table showed all the data for last.fm coming from just one of the upstream providers.

Comparison of the two streams showed only 2 packets were lost between entry into my service provider and receipt of the packets on my PC (kudos to my service provider). Bandwidth requirements for a 120 second song were approximately 0.157 Mb/s. That single song consumed approximately 2.1 MB of data, which is pretty consistent with a typical decent quality MP3 file (depending on encoding).

[Figure: service provider stream summary]

[Figure: local PC stream summary]

Using simple math, if a service provider has 5000 subscribers and we assume that at peak 1% are listening to streaming audio in their home via one of the many services available on the Internet, that is 50 concurrent streams at 0.157 Mb/s each, or a minimum of 7.85 Mb/s of bandwidth the service provider must allocate just for the subscribers listening to streaming audio. This does not include services such as web browsing, online gaming, watching video, downloading, or any other of the tasks that can be done over the Internet. The demand to have more bits per second to the home is going to constantly increase. Whether service providers are able to keep up with this demand is a subject of debate.
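
The arithmetic above can be written out as a small Python sketch. The function and parameter names are illustrative; the inputs are the figures quoted in this post.

```python
def peak_stream_bandwidth(subscribers: int, active_fraction: float,
                          per_stream_mbps: float) -> float:
    """Aggregate rate in Mb/s for the subscribers streaming at peak."""
    return subscribers * active_fraction * per_stream_mbps


# 5000 subscribers, 1% listening at peak, 0.157 Mb/s per audio stream.
print(round(peak_stream_bandwidth(5000, 0.01, 0.157), 2))  # 7.85
```

Changing the assumed fraction to 5% of subscribers, for example, scales the result linearly to five times the rate.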
Categories: monitoring Tags:

Covert Monitoring of IM

October 3rd, 2008 Clear2Go No comments

More news articles from stv.tv and EFF have been published about China working with Skype to search chat conversations for key words, following the investigative work done by a Toronto-based researcher. Although this is not a good thing for Skype, I wonder about other IM platforms such as MSN. I have friends in China that also use MSN regularly. If China has a policy to monitor IM transmissions for Skype, logic would dictate that they are doing the same with MSN and other chat programs as well.

Categories: monitoring Tags:

Chinese monitor Skype transmissions

October 2nd, 2008 Clear2Go No comments

This is not a surprise. There have been suggestions of Skype being monitored before. A research paper by Nart Villeneuve about the Chinese monitoring of Skype messaging has been published as well as a news article about the paper.

Just because something is encrypted does not mean it is secure. The fundamental problem is that of control. When businesses outsource their data storage or processing to third parties, or one uses social networking sites, it may no longer be your data. Even if it is your data, you have given up some if not all control of the data. Deleting data such as a record, audio file or photograph does not mean it is actually deleted. Chances are very high the data is never really deleted and can be brought back. Try deleting your Facebook profile for a week, then re-create it. You’ll find everything comes back, just as you left it.

Categories: monitoring Tags: