I have been investigating DNS lately. I set up a capture of every DNS query that left my house over approximately six days. Three people in my house use the internet in one way or another. Using some quick scripts I wrote, I extracted the names that were queried. With this data as input to some graphing software, I created a couple of visualizations. The first is a standard tag cloud, in which the larger a word appears, the more often it occurs in the dataset.
What can you learn from a visualization such as this? Could you build a profile of the people in this house just from their DNS queries? And if you can, what does it tell you? Twitter is obviously used in the house, since ‘twitter’ has the largest number of references. ‘Sandvine’ also appears often. There are references to ‘mac’ and ‘apple’. ‘facebook’ is also large relative to the others. There are queries for ‘thepiratebay’. What do these all mean? What can we infer from them, and how accurate are our inferences?
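The word counts behind a tag cloud like this can be approximated in a few lines. This is not the script used for this post, only a minimal sketch: it assumes the queried names are already available as a list of strings, and the `STOP_LABELS` set and the `label_counts` function are hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical stoplist of labels that carry little meaning in a tag cloud.
# The original scripts may have filtered differently, or not at all.
STOP_LABELS = {"com", "net", "org", "www"}


def label_counts(queries):
    """Count how often each DNS label appears across a list of query names."""
    counts = Counter()
    for name in queries:
        # Normalize case and drop the trailing root dot ("Twitter.com." -> "twitter.com").
        labels = name.lower().rstrip(".").split(".")
        counts.update(
            label for label in labels if label and label not in STOP_LABELS
        )
    return counts


# Made-up sample input, not the captured data.
sample = ["twitter.com", "search.twitter.com", "www.facebook.com", "Twitter.com."]
print(label_counts(sample).most_common(3))
```

Each label's count would then drive the font size of the corresponding word.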
Here is the same dataset, this time using the full query names, visualized as a bubble graph.

In this visualization, ‘twitter.com’ and ‘search.twitter.com’ receive the most queries, so it is safe to say that at least one person in this residence probably has an active Twitter account. ‘DC-2.sandvine.com’ shows that someone regularly looks up what is probably a domain controller for ‘Sandvine.com’. If you were to infer from this that an employee of Sandvine lives here, you would be correct. You cannot actually reach any of those servers without a VPN, but because of the way DNS works, the queries themselves often leak.
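The same kind of reasoning can be done directly on the counts. The sketch below is an illustration under assumptions rather than the tooling used here: it counts full query names, which is the sort of number that would size the bubbles, and then lists the names seen under a given domain, which is how an internal-looking host such as a domain controller would stand out. The function names `full_name_counts` and `names_under` are hypothetical, and the sample list is made up.

```python
from collections import Counter


def full_name_counts(queries):
    """Count occurrences of each full query name, ignoring case and the root dot."""
    counts = Counter()
    for name in queries:
        normalized = name.lower().rstrip(".")
        if normalized:
            counts[normalized] += 1
    return counts


def names_under(counts, suffix):
    """Return the counted names that equal `suffix` or are subdomains of it."""
    suffix = suffix.lower().strip(".")
    return {
        name: count
        for name, count in counts.items()
        if name == suffix or name.endswith("." + suffix)
    }


# Made-up sample input, not the captured data.
sample = [
    "twitter.com",
    "search.twitter.com",
    "DC-2.sandvine.com",
    "dc-2.sandvine.com.",
    "twitter.com",
]
counts = full_name_counts(sample)
print(counts.most_common(2))
print(names_under(counts, "sandvine.com"))
```

Matching on `"." + suffix` rather than on the bare suffix avoids counting unrelated names that merely end in the same characters.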
Over the next few weeks, I will be working with this data and the graphs above, along with other tools and DNS vectors, to determine what else can be inferred from DNS alone.
