Archive - Security RSS Feed

DNS forensics and working with service providers

I had the privilege yesterday of speaking to some law enforcement personnel and forensics experts. The topics were DNS forensics, the SSL server_name option, and working with service providers. I enjoyed the opportunity. I really like talking about network forensics and being surrounded by smart people who are experts in their field. It also lets me practice my public speaking, which is always good.

The DNS section of the presentation was based on my two earlier posts on DNS analysis, which are here and here. The SSL server_name section was based on my post that is here. I have never really posted about “Working with service providers”, but I have been engaged with service providers all over the world consistently for almost 5 years, so I spoke about my experiences and thoughts.

The presentation slides are here.

Using DNS to determine when someone is home — DNS analysis, Part II

Last month, I did a quick write up on a DNS trace that I had extracted.  The trace was all the DNS queries that left my house over a few days.  Using that same trace, I noticed that there were many queries to the domain of my employer.   This in itself was not unusual, but one particular query caught my eye:

2009-02-08 05:34:02.680383 IP 216.240.7.12.58684 > 208.67.222.222.53: 30554+ A? ap-1.sandvine.com. (35)
2009-02-08 05:34:03.037603 IP 208.67.222.222.53 > 216.240.7.12.58684: 30554 1/0/0 A 216.16.234.191 (51)

This query happened every 10-20 minutes. Tracing it back, I realized it was coming from my mobile phone. This got me thinking: could one determine when I was or was not home with just access to a DNS trace? To answer that, I did a bit of investigation of the address ap-1.sandvine.com.

mike@Janel:~/investigation/homeDns$ dig @ns1.domainmonger.com ap-1.sandvine.com

; <<>> DiG 9.5.0-P2 <<>> @ns1.domainmonger.com ap-1.sandvine.com
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36335
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;ap-1.sandvine.com. IN A

;; ANSWER SECTION:
ap-1.sandvine.com. 60 IN A 216.16.234.191

;; AUTHORITY SECTION:
sandvine.com. 60 IN NS ns1.domainmonger.com.
sandvine.com. 60 IN NS ns2.domainmonger.com.

;; Query time: 92 msec
;; SERVER: 216.98.150.33#53(216.98.150.33)
;; WHEN: Sun Apr 12 12:29:19 2009
;; MSG SIZE rcvd: 100

mike@Janel:~/investigation/homeDns$

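From the output above, the record for ap-1.sandvine.com has a TTL of 60 seconds, so it should be refreshed every 60 seconds. Since the queries show up only every 10-20 minutes, my mobile evidently ignores the refresh interval requested by the DNS. While interesting to know, it doesn’t help answer my question.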

I extracted all queries to ap-1.sandvine.com, the timestamp for each and quickly plotted them with gnuplot.  Next, I pulled my calendar and daily logs and added notes to the graph. The y-axis is irrelevant.  The red dots show when the queries were made and the green arrows and notes are my comments based on my calendar and logs.
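The extraction step can be sketched roughly as follows. This is not the script used for the graph, just a minimal Python illustration that assumes the trace is tcpdump text output in the format shown earlier. The function names and the 30-minute gap threshold are assumptions chosen for illustration (the queries arrive every 10-20 minutes, so a longer silence is treated as the phone being away).

```python
import re
from datetime import datetime, timedelta

# Matches outbound A queries in tcpdump text output, for example:
# 2009-02-08 05:34:02.680383 IP 216.240.7.12.58684 > 208.67.222.222.53: 30554+ A? ap-1.sandvine.com. (35)
QUERY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) IP \S+ > \S+\.53: "
    r"\d+\+? A\? (\S+?)\.? \(\d+\)"
)


def query_times(lines, name):
    """Return the timestamps of all A queries for `name` (hypothetical helper)."""
    times = []
    for line in lines:
        match = QUERY_RE.match(line)
        if match and match.group(2).lower() == name.lower():
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def presence_windows(times, max_gap=timedelta(minutes=30)):
    """Group timestamps into (start, end) windows.

    Consecutive queries closer together than `max_gap` are treated as one
    continuous period on the home network. The threshold is an assumption.
    """
    windows = []
    for t in sorted(times):
        if windows and t - windows[-1][1] <= max_gap:
            windows[-1][1] = t  # extend the current window
        else:
            windows.append([t, t])  # start a new window
    return [(start, end) for start, end in windows]
```

Feeding the lines of the trace to query_times and the result to presence_windows gives a list of periods during which the phone was probably on the home network. The gaps between those periods are the candidate "not home" times.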

A third party could easily determine when I was or was not home with a high degree of certainty.    With mobile phones now having wi-fi capabilities and connecting to the local wireless network it becomes trivial to use them as a vector to determine when someone is home or not.  I ran the same analysis on my wife’s mobile and got similar results (I didn’t add them to the chart here).

Obviously you could use other protocols and do a much more detailed analysis and correlation (or just execute standard physical surveillance), but DNS is good in that it is required for the Internet, a standard, and is not encrypted.  This was a relatively simple exercise and reasonably cost effective.   I am not a lawyer, but I suspect based on the ongoing privacy debate and  some recent court decisions that DNS queries executed by an individual or a business might be considered ‘public’ with no expectation of privacy.  I’d argue that with access to DNS information from a particular entity, one could glean interesting information from a competitive company.

Anti-Forensics – not as easy as once thought

image by wchulseiee (http://www.flickr.com/photos/wchulseiee/2427418216/)

My laptop is pretty secure. I am not silly enough to think that it is 100% secure or that no one could get into it, but relative to most laptops out there it’s not too bad. There are weaknesses due to time or software requirements, but I think I am aware of most of them. I don’t encrypt the operating system (yet), but all data partitions are encrypted. It has been configured with the goal that all sensitive data and metadata (web browser, IM, video, audio, cache, bookmarks) is encrypted.

Once data is no longer required, it is stored on the servers at the office and then wiped off the encrypted drives at regular intervals. All metadata is wiped from the encrypted drives each weekend, which leaves at most one week of metadata, assuming an attacker can get into the encrypted drives to view it. The main reason for all this is to protect customer data. Like others in my industry, I work with institutions and their data. In many cases that data can be politically, financially, or reputationally sensitive if it were to get into the wrong hands. Should my laptop ever be stolen, I want to at least make it difficult for an attacker to gain access to the data in a reasonable period of time.

Imagine my surprise when I was re-configuring my laptop and discovered that my deleted file metadata had somehow been reset to write to an unencrypted area of my drive. The following is a partial view of the files I discovered. The files went back as far as November 2008.

Trash Meta Directory on laptop

These are standard text files with information about each file that was deleted.   The information includes the original file location as well as a timestamp indicating when the file was deleted.

Trash meta data file details

Even though the actual data files were not present, there is a lot of information here. Just from working with the data contained in the files above, one could easily determine the names of files worked on, their importance, the directory structure of the encrypted partitions, the date each file was deleted, and more. You could very easily put together a timeline of a customer, the projects being worked on, and the dates of project activity. That is useful information that could be sold, or used to a competing company’s or party’s advantage in court, in a bid, or for a competitive product or service.
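To illustrate how little effort such a timeline takes, here is a minimal Python sketch. It assumes the metadata files follow the freedesktop.org Trash layout (a `[Trash Info]` section with URL-encoded `Path` and `DeletionDate` keys, stored as `*.trashinfo` files). That format is an assumption on my part, as are the function names; the files on another system may look different.

```python
import configparser
from datetime import datetime
from pathlib import Path
from urllib.parse import unquote


def parse_trashinfo(text):
    """Return (deletion_time, original_path) from one metadata file.

    Assumes the freedesktop.org '[Trash Info]' format.
    """
    # Interpolation is disabled because URL-encoded paths contain '%'.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    info = parser["Trash Info"]
    deleted = datetime.strptime(info["DeletionDate"], "%Y-%m-%dT%H:%M:%S")
    return deleted, unquote(info["Path"])


def build_timeline(info_dir):
    """Return all (deletion_time, original_path) pairs, oldest first."""
    entries = [
        parse_trashinfo(path.read_text())
        for path in Path(info_dir).glob("*.trashinfo")
    ]
    return sorted(entries)
```

Sorting by deletion time and grouping by the leading directories of each path is already enough to reconstruct which customers and projects were active in which weeks.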

There is a lot of ‘negativity’ with Anti-Forensics lately, especially in the forensics community.   Although I understand and appreciate the problems and concerns they have, I believe anti-forensics is necessary and a good thing.   It all depends on who is using it and why.  Needless to say, I have fixed the problem with my laptop, and ‘double checked’ my drive encryption and scripts to ensure correct execution.

DNS analysis – Part I

I have been doing some investigation into DNS lately.   I set up to capture all DNS queries that left my house for approximately six days.  There are three people in my house that use the internet in one way or another.  Using some quick scripts I wrote, I extracted the queries that were asked of the DNS.  Using some graphical software, with this data as input, I created a couple of visualizations.  First, a standard word tag visualization, where the larger the word the more references are associated with the word in a particular dataset.
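The counting behind a word tag visualization like this can be sketched in a few lines of Python. This is only an illustration, not the scripts mentioned above; the function name and the list of ignored labels are assumptions.

```python
from collections import Counter


def label_counts(query_names, ignore=("www", "com", "net", "org")):
    """Count how often each DNS label appears across the queried names.

    `ignore` is an assumed list of labels too common to be interesting.
    """
    counts = Counter()
    for name in query_names:
        # Drop the trailing root dot and split the name into its labels.
        for label in name.rstrip(".").lower().split("."):
            if label and label not in ignore:
                counts[label] += 1
    return counts
```

The resulting counts map directly onto font sizes in a tag cloud: the more queries contain a label, the larger the word.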

What can you learn from a visualization such as this?  Could you build a profile of the persons in this house just from their DNS queries?  And if you can, what does it tell you?  Twitter is obviously used in the house as the largest number of references are made to ‘twitter’. ‘Sandvine’ is also used often.  There are references to ‘mac’ and ‘apple’.  ‘facebook’ also is large relative to the others.  There are queries to ‘thepiratebay’. What do these all mean?  What can we infer from them, and are we accurate with our inferences?

Using the same dataset with full queries, here it is visualized as a bubble graph.

From this visualization, ‘twitter.com’ and ‘search.twitter.com’ receive most of the queries, making it safe to say there is probably at least one individual in this residence with an active Twitter account. The queries for ‘DC-2.sandvine.com’ show that someone regularly looks up what is probably a domain controller for ‘Sandvine.com’. If you were to infer from this that an employee of Sandvine lives here, you’d be correct. You cannot actually reach any of those servers without using a VPN, but because of the way DNS works, the lookups often leak.

Over the next few weeks, I will be working with this data and the graphs above, along with other tools and DNS vectors, to determine what else can be inferred from just DNS.

TLS/SSL data leakage

If you ask most people about TLS or SSL, they understand that it has something to do with ‘securing’ information that is on the Internet. People with a networking background will understand it as an encrypted session which encrypts everything above layer 5, effectively user data. In the case of HTTP, this would include the URL that a user was requesting, such as https://www.tdcanadatrust.com. I was looking at a network capture file recently, and was shocked to find the name of the server I was accessing, specifically www.tdcanadatrust.com, in the clear in the initial client hello packet at the start of the SSL session.

You can see the server name in the SSL client hello packet. The hello packet is the first part of the initial SSL handshake sequence when an application attempts to establish an SSL session.

Using Wireshark, and digging a little deeper, I found it is classified as an ‘Extension’ labeled ‘server_name’

It appears to be one of the acceptable extensions for SSL.  A quick check of the RFC revealed that it is an optional addition that applications such as a browser can add to the SSL negotiation process.

<snip>
.2. Extended Server Hello

The extended server hello message format MAY be sent in place of the
server hello message when the client has requested extended
functionality via the extended client hello message specified in
Section 2.1.

……

In order to provide the server name, clients MAY include an extension
of type “server_name” in the (extended) client hello.  The
“extension_data” field of this extension SHALL contain
“ServerNameList” where:

struct {
NameType name_type;
select (name_type) {
</snip>

As it turns out, this functionality was added to permit virtual hosting of SSL/TLS enabled sites. Without it, every site requires a unique IP address. With that reasoning, I expect it to become commonplace in the future. One can argue that, given the destination IP address of a network flow (which is not encrypted), determining which site a user is visiting is trivial when each IP address is mapped to a single SSL application. Therefore adding this extended server_name option is no different, and hence there are no added privacy concerns. While I agree with this, the extension makes the automation of statistics and monitoring of network flows much easier.
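To show how easy that automation is, here is a minimal Python sketch that pulls the host name out of the raw bytes of a client hello. It assumes the whole client hello sits in a single TLS record and only looks at the first entry of the server name list. The function name is hypothetical; the field layout follows the TLS record, handshake, and server_name extension formats.

```python
import struct


def extract_server_name(record: bytes):
    """Return the host_name from a TLS ClientHello record, or None."""
    # Record header: content type (1), version (2), length (2).
    if len(record) < 5 or record[0] != 0x16:  # 0x16 = handshake
        return None
    pos = 5
    # Handshake header: message type (1), length (3).
    if len(record) < pos + 4 or record[pos] != 0x01:  # 0x01 = client hello
        return None
    pos += 4
    pos += 2 + 32  # client_version and random
    try:
        pos += 1 + record[pos]  # session_id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len  # cipher_suites
        pos += 1 + record[pos]  # compression_methods
        if pos + 2 > len(record):
            return None  # client sent no extensions
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # ServerNameList: list length (2), name_type (1),
                # name length (2), then the name itself.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if name_type == 0:  # host_name
                    return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None  # truncated or malformed record
    return None
```

Run against the first payload of each flow to port 443, a function like this yields a per-flow list of visited sites without touching any encrypted data.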

The main point to keep in mind is that although your data is still encrypted, TLS/SSL still reveals the sites you visit.

Device security and encryption

The title of this article doesn’t really do it justice. It is a good article that gives a high-level understanding of the concept of Trusted Booting of a device. It is a good read for individuals in or working with law enforcement and digital forensics. As this type of technology becomes more and more mainstream, it will become much more difficult to surreptitiously obtain access to, or data from, devices without the owner’s cooperation.

Detecting BotNets

I am at a conference this week speaking and participating.  An individual asked about BotNet detection for a particular product.  Specifically they highlighted several vendors that have solutions that detect BotNets.  I find this concept amusing and more of a marketing or positioning type of thing.  Stating  “We detect BotNets” is like stating “We do security”.

Anyone doing security research or investigations today realizes that almost all attacks are part of a botnet. Most botnets are really just advanced shell programs that allow you to deploy whatever exploit or attack you want. Botnet software usually takes care of the control, messaging, encryption requirements, and exploit updates, and allows the author to use other code or ‘plugins’ to create the BotNet behaviour they wish.

It is rare that you will detect a bot directly; rather, you will detect the existence of a bot via the behaviour it exhibits. Examples of this behaviour are spam, a D/DoS attack, or a phishing scam for personal data. It is important when assessing security vendors that you go deeper than “We detect Botnets”. How do you detect Botnets? How do you ensure it is a valid BotNet and not just a P2P application? The answers to questions such as these will tell you if a vendor honestly understands and knows security. Good security vendors will give answers that highlight that the product detects bots and/or botnets via intelligence gathered from behaviour patterns, subscriber or network history, chronology and external data.

Obama bandwidth – upward trend in bandwidth requirements

Here are two graphs showing inbound HTTP on a link off a small service provider’s network. The first graph is Jan 19th, 2009, the day prior to Obama’s inauguration. The second graph is Jan 20th, 2009, the day of the inauguration. If you look at 11:00 – 12:30 you can clearly see the abnormal bandwidth increase due to the event being broadcast live over the Internet, and this is just HTTP, not other streaming protocols that might have been used.

You can clearly see the increase in bandwidth on this one link during the Inauguration.  This has happened before. Twitter has inauguration data that shows the same trend for their micro blog service.

As the Internet becomes more and more the medium for information, bandwidth is going to constantly increase, and spike when these types of events occur. Service providers need to manage the bandwidth effectively, ensuring fairness and privacy, and deploying appropriate infrastructure to support the trending increase in bandwidth over the next 5-10 years.

I look at my family over the last 3 years. We hardly watch television and any shows we do watch, we watch via the Internet. We listen to the radio via the internet. We get all information and news via the Internet. We communicate almost exclusively via the internet.

Extracting audio from last.fm

Since I have been listening to last.fm lately and just recently pulled a capture file for analysis, I was wondering if audio extraction would work in the case of an investigation. Turns out using the procedure I wrote back in Oct works well. The end result is a directory of files containing the streamed audio from last.fm which can be played as standard mp3 files.

Bandwidth requirements for a basic audio stream

I signed up and started periodically using last.fm in Feb 2006. I stopped in August 2006 and didn’t go back to it until this past December. If you are wondering how I know that in such detail, it has to do with how last.fm keeps a profile on you, but I’ll save that for another post. I have found that the selection of music it picks for me has greatly improved since I first signed up.

There are different encoding formats for video and audio that affect the bandwidth and timing requirements for the transmission of streaming content. Ignoring the technical details around this for now, if an end user decides to stream audio from a service such as last.fm, how much bandwidth do they require to listen to that single stream? To test this, I selected a track that was approximately 120 seconds in length and captured the audio stream while it played. The track played fine with no delays or problems. I captured the audio stream in two places: on the laptop where the song was being played, and on my service provider’s network at the demarcation point between my service provider and their upstream service provider. Capturing the same stream at two points allowed me to compare both captures for issues such as dropped packets or other anomalies. My provider actually has two upstream providers, but a quick check of the BGP routing table showed all the data for last.fm coming from just one of them.

Comparison of the two streams showed only 2 packets were lost between entry into my service provider and receipt of the packets on my PC (kudos to my service provider). Bandwidth requirements for a 120-second song were approximately 0.157 Mb/s. That single song consumed approximately 2.1 MB of data, which is pretty consistent with a typical decent quality MP3 file (depending on encoding).

Service Provider stream summary

Local PC stream summary

Using simple math, if a service provider has 5000 subscribers and we assume that at peak 1% are listening to streaming audio in their home via one of the many services available on the Internet, that is a minimum of 7.85 Mb/s of bandwidth the service provider must provide just for the subscribers listening to streaming audio. This does not include services such as web browsing, online gaming, watching video, downloading, or any of the other tasks that can be done over the Internet. The demand for more bits per second to the home is going to constantly increase. Whether service providers are able to keep up with this demand is a subject of debate.
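The arithmetic is simple enough to write down as a small Python sketch, which makes it easy to try other subscriber counts or listening rates. The function name is an assumption; the inputs are the figures from this post.

```python
def aggregate_stream_mbps(subscribers, peak_fraction, per_stream_mbps):
    """Bandwidth needed if `peak_fraction` of subscribers stream at once."""
    concurrent_listeners = subscribers * peak_fraction
    return concurrent_listeners * per_stream_mbps


# 5000 subscribers, 1% listening at peak, 0.157 Mb/s per audio stream.
required = aggregate_stream_mbps(5000, 0.01, 0.157)
print(f"{required:.2f} Mb/s")  # prints "7.85 Mb/s"
```

Raising the peak listening rate to 10% with the same per-stream figure scales the result tenfold, which gives a sense of how quickly the requirement grows as streaming becomes the norm.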