What happens when you post a link on Twitter?
Posting a link on Twitter provokes a flurry of activity within a short space of time. This is an analysis of what happened when I posted a link to my blog.
Immediately
I was tailing the server log in preparation for this and was surprised at just how fast the first hits turned up: they were almost simultaneous with hitting the "send" button. Twitter and Showyou both pick up the robots.txt.
199.59.149.169 - - [12/Oct/2012:15:58:25 +0100] "GET /robots.txt HTTP/1.1" 200 24 "-" "Twitterbot/1.0"
173.192.79.101 - - [12/Oct/2012:15:58:25 +0100] "GET /robots.txt HTTP/1.1" 200 24 "-" "ShowyouBot (http://showyou.com/crawler)"
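Every entry quoted in this post is a line in Nginx's default "combined" log format (client IP, identity, user, timestamp, request line, status, bytes sent, referrer, user agent). If you want to follow along with your own logs, a minimal Python parser for that format might look like the sketch below. It assumes the stock `combined` format with no custom fields; the `Hit` and `parse_line` names are purely illustrative.

```python
import re
from typing import NamedTuple, Optional

# Nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    ip: str
    time: str
    method: str
    path: str
    status: int
    size: int
    referrer: str
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format line; return None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # The request field looks like "GET /robots.txt HTTP/1.1".
    parts = m["request"].split()
    method, path = (parts[0], parts[1]) if len(parts) >= 2 else ("", m["request"])
    size = 0 if m["size"] == "-" else int(m["size"])
    return Hit(m["ip"], m["time"], method, path, int(m["status"]), size,
               m["referrer"], m["agent"])
```

Fed the Twitterbot line above, this yields a `Hit` with IP `199.59.149.169`, path `/robots.txt`, status 200 and a 24-byte body.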
The fact that Showyou turned up at the same time as Twitter suggests to me that they have firehose access.
Five seconds
Within five seconds we have another hit on the robots.txt - this time from Butterfly.
74.112.131.128 - - [12/Oct/2012:15:58:27 +0100] "GET /robots.txt HTTP/1.1" 200 24 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
Twitterbot - presumably having checked the robots.txt for clearance - is now back to check the page itself.
199.59.149.169 - - [12/Oct/2012:15:58:29 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Twitterbot/1.0"
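The robots.txt check that a well-behaved bot like Twitterbot performs before fetching a page can be sketched with Python's standard `urllib.robotparser`. The log only tells us the file is 24 bytes long, so the contents below are an assumption: a permissive "allow everything" file, which happens to be exactly 24 bytes with Unix line endings. The `allowed` helper is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the real robots.txt isn't shown, only its size (24 bytes).
# An empty Disallow line means "nothing is disallowed".
ROBOTS_TXT = "User-agent: *\nDisallow:\n"


def allowed(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

With the assumed file, `allowed("Twitterbot/1.0", "/2012/10/12/webkit-notification-api/")` is `True`; a file containing `Disallow: /private/` would make the same call for `/private/` return `False`.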
TweetmemeBot now turns up to check the page, and for the first time we see a 499 status code. This is an Nginx-specific code indicating that the client closed the connection before the server could send its response.
89.151.116.53 - - [12/Oct/2012:15:58:29 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (compatible; TweetmemeBot/2.11; +http://tweetmeme.com/)"
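Because 499s are only ever recorded in Nginx's own logs, counting them per user agent is a quick way to spot which fetchers habitually hang up early. A small sketch, assuming combined-format lines in which no field contains a literal double quote (Nginx escapes them in the log); `aborted_by_agent` is a hypothetical name.

```python
from collections import Counter


def aborted_by_agent(lines):
    """Count Nginx 499 responses (client closed the connection) per user agent."""
    counts = Counter()
    for line in lines:
        # Splitting a combined-format line on double quotes gives:
        # [prefix, request, ' status size ', referrer, ' ', user agent, '']
        fields = line.split('"')
        if len(fields) < 7:
            continue  # not a combined-format line
        status = fields[2].split()[0]
        if status == "499":
            counts[fields[5]] += 1
    return counts
```

Run over this section of the log, it would attribute one 499 to TweetmemeBot, two to UnwindFetchor and one to Butterfly.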
UnwindFetchor makes a HEAD request, which lets it examine the response headers and decide whether it wants to GET the full page content. It doesn't even complete this.
50.18.21.225 - - [12/Oct/2012:15:58:30 +0100] "HEAD /2012/10/12/webkit-notification-api/ HTTP/1.1" 499 0 "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
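What a crawler does with those headers is up to the crawler, and UnwindFetchor's actual rules aren't visible from the log. As a purely hypothetical illustration, a HEAD-then-GET policy might only bother fetching successful HTML responses below some size limit. In practice the status and headers would come from a request made with `urllib.request.Request(url, method="HEAD")` or similar; the `worth_fetching` function and its 1 MB limit are assumptions.

```python
from typing import Mapping


def worth_fetching(status: int, headers: Mapping[str, str], max_bytes: int = 1_000_000) -> bool:
    """Hypothetical policy: GET the body only for a 200 HTML response that isn't too large."""
    if status != 200:
        return False
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if not lowered.get("content-type", "").startswith("text/html"):
        return False
    length = lowered.get("content-length")
    # No Content-Length (e.g. a chunked response): fetch anyway.
    return length is None or (length.isdigit() and int(length) <= max_bytes)
```

The appeal of the pattern is that a HEAD response costs the fetcher (and the server) almost nothing compared with downloading a large image or video it was never going to index.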
Ten seconds
We now get our first unidentified user agent (the string stops short after "compatible"), and the reverse DNS shows it is the first of several visits from hosts on Amazon's compute cloud.
23.22.39.206 - - [12/Oct/2012:15:58:31 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (compatible"
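A reverse DNS lookup on its own can be faked by whoever controls the IP's PTR record, so a stricter check is "forward-confirmed" reverse DNS: look up the hostname for the IP, then check that the hostname resolves back to the same IP. This is the same two-step check Google describes for verifying Googlebot. Below is a sketch using the standard `socket` functions (IPv4 only, because of `gethostbyname_ex`); the resolvers are injectable so it can be exercised without network access, and `confirmed_hostname` is a hypothetical name.

```python
import socket
from typing import Callable, List, Optional, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def confirmed_hostname(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> Optional[str]:
    """Return the PTR hostname for ip only if that name resolves back to ip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        # socket.herror / socket.gaierror: no PTR record, or the name doesn't resolve.
        return None
    return hostname if ip in addresses else None
```

Calling `confirmed_hostname("23.22.39.206")` performs real DNS lookups; a `None` result means either there is no PTR record (as with the IE7 request further down) or the record doesn't check out.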
UnwindFetchor is now back with a second HEAD request, which, again, it doesn't seem to complete.
50.18.21.225 - - [12/Oct/2012:15:58:31 +0100] "HEAD /2012/10/12/webkit-notification-api/ HTTP/1.1" 499 0 "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
ShowyouBot is now back to check the page. Despite fetching the robots.txt at the same moment as Twitterbot, it has taken two seconds longer to come back for the page.
173.192.79.101 - - [12/Oct/2012:15:58:31 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "ShowyouBot (http://showyou.com/crawler)"
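The "two seconds longer" comes from comparing each bot's robots.txt hit with its first page request. That comparison can be automated by parsing Nginx's `$time_local` timestamps with `datetime.strptime`. A sketch, assuming combined-format lines and grouping by user agent string; `robots_to_page_delay` is a hypothetical helper, and `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import datetime

# Nginx $time_local, e.g. "12/Oct/2012:15:58:25 +0100"
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def robots_to_page_delay(lines, page_path):
    """Seconds between each user agent's first robots.txt fetch and its first request for page_path."""
    robots, page = {}, {}
    for line in lines:
        fields = line.split('"')
        if len(fields) < 7 or "[" not in fields[0]:
            continue  # not a combined-format line
        stamp = fields[0].split("[", 1)[1].rstrip("] ")
        when = datetime.strptime(stamp, TIME_FORMAT)
        request = fields[1].split()
        if len(request) < 2:
            continue
        path, agent = request[1], fields[5]
        # setdefault keeps only the first occurrence for each agent.
        if path == "/robots.txt":
            robots.setdefault(agent, when)
        elif path == page_path:
            page.setdefault(agent, when)
    return {agent: (page[agent] - robots[agent]).total_seconds()
            for agent in robots if agent in page}
```

Applied to the four Twitterbot and ShowyouBot lines, this gives 4 seconds for Twitterbot (15:58:25 to 15:58:29) and 6 for ShowyouBot (15:58:25 to 15:58:31): the two-second gap described above.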
Butterfly is now back with two requests: the first one doesn't complete, and it is immediately back for the content.
74.112.131.128 - - [12/Oct/2012:15:58:32 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.128 - - [12/Oct/2012:15:58:32 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
Fifteen seconds
Next up we have JS-Kit, which was a "real-time web platform". Visiting the URL listed in the user agent brings up a page informing us that their services were discontinued as of 1st October 2012. It's now 11 days later, and it seems that nobody has told their URL Resolver.
204.236.188.39 - - [12/Oct/2012:15:58:36 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 11159 "-" "JS-Kit URL Resolver, http://js-kit.com/"
Yahoo! Slurp checks the robots.txt and then makes a HEAD request for the page. This is the first search engine on the scene.
72.30.142.218 - - [12/Oct/2012:15:58:55 +0100] "GET /robots.txt HTTP/1.0" 200 24 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
72.30.142.218 - - [12/Oct/2012:15:58:55 +0100] "HEAD /2012/10/12/webkit-notification-api/ HTTP/1.0" 200 0 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
This next request purports to be from Internet Explorer 7, but it hasn't requested any of the page's assets, so I think it's safe to say this is a spoofed user agent, especially as the reverse DNS lookup fails.
65.52.0.51 - - [12/Oct/2012:15:58:58 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
One minute
Finally we get something that looks like it could actually be a genuine visit. It's the current version of Chrome, and all the assets required to fully render the page have been requested. The reverse DNS lookup of this IP address suggests it's genuine.
91.240.174.38 - - [12/Oct/2012:15:59:42 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "http://t.co/FSMLWM7D" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4"
91.240.174.38 - - [12/Oct/2012:15:59:42 +0100] "GET /static/css/main.css?_=04642b8df8 HTTP/1.1" 200 3257 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4"
91.240.174.38 - - [12/Oct/2012:15:59:42 +0100] "GET /static/js/main.js?_=04642b8df8 HTTP/1.1" 200 24534 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4"
91.240.174.38 - - [12/Oct/2012:15:59:42 +0100] "GET /static/img/logo.png HTTP/1.1" 200 23884 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4"
91.240.174.38 - - [12/Oct/2012:15:59:43 +0100] "GET /static/js/prettify.js HTTP/1.1" 200 57688 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4"
91.240.174.38 - - [12/Oct/2012:15:59:43 +0100] "GET /favicon.ico HTTP/1.1" 200 614 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4"
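The rule of thumb used to tell this visitor apart from the spoofed IE7 request is simple: a real browser fetches the page's CSS, JavaScript and images with the page as the referrer, while most bots fetch only the HTML. A sketch of that heuristic over combined-format lines follows. The asset suffixes are an assumption based on the files this blog serves, and `likely_humans` is a hypothetical name.

```python
# Assumption: the asset types this blog's pages pull in.
ASSET_SUFFIXES = (".css", ".js", ".png")


def likely_humans(lines, page_path):
    """Return IPs that fetched page_path and at least one asset with that page as referrer."""
    fetched_page, fetched_asset = set(), set()
    for line in lines:
        fields = line.split('"')
        if len(fields) < 7:
            continue  # not a combined-format line
        ip = fields[0].split()[0]
        request = fields[1].split()
        if len(request) < 2:
            continue
        # Drop cache-busting query strings such as ?_=04642b8df8.
        path = request[1].split("?", 1)[0]
        referrer = fields[3]
        if path == page_path:
            fetched_page.add(ip)
        elif path.endswith(ASSET_SUFFIXES) and referrer.endswith(page_path):
            fetched_asset.add(ip)
    return fetched_page & fetched_asset
```

It is only a heuristic: a headless browser would pass it, and a returning reader with the assets already cached would fail it.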
Two minutes
The next entry is my own URL resolver for my Twitter archiver. This runs every 30 minutes so it's just by chance that it's arrived on the scene so quickly.
127.0.0.1 - - [12/Oct/2012:16:00:06 +0100] "HEAD /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 0 "-" "t.co resolver on behalf of @decadecity [Currently attempting to resolove: http://t.co/FSMLWM7D ]"
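The resolver's own code isn't shown here, so the following is only a hypothetical sketch of the same idea in Python's standard library: send a HEAD request to the t.co link, identifying yourself in the user agent as the log entry above does, and let `urlopen` follow the redirects. `geturl()` on the response then gives the URL it ended up at. The helper names are invented, and the 30-minute scheduling is left out.

```python
from urllib.request import Request, urlopen


def build_head_request(short_url: str, owner: str) -> Request:
    """Build a HEAD request whose user agent says who is asking and what for."""
    agent = f"t.co resolver on behalf of {owner} [Currently attempting to resolve: {short_url} ]"
    return Request(short_url, method="HEAD", headers={"User-Agent": agent})


def resolve(short_url: str, owner: str, timeout: float = 10.0) -> str:
    """Return the final URL after redirects (makes real network requests)."""
    with urlopen(build_head_request(short_url, owner), timeout=timeout) as response:
        return response.geturl()
```

For example, `resolve("http://t.co/FSMLWM7D", "@decadecity")` would be expected to land on the blog post, which is the request that shows up above as a HEAD from 127.0.0.1.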
PaperLiBot comes in, first checking with a HEAD request and then making a GET for the content.
37.59.18.41 - - [12/Oct/2012:16:00:19 +0100] "HEAD /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 0 "-" "Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)"
37.59.18.43 - - [12/Oct/2012:16:00:20 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)"
Five minutes
Next we have what seem to be two genuine visitors - one with Chrome on a Mac and the other on an iPhone. The reverse DNS for the first appears to be domestic broadband. The second has no reverse DNS entry.
82.41.28.142 - - [12/Oct/2012:16:03:42 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "http://t.co/FSMLWM7D" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
82.41.28.142 - - [12/Oct/2012:16:03:42 +0100] "GET /static/css/main.css?_=04642b8df8 HTTP/1.1" 200 3257 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
82.41.28.142 - - [12/Oct/2012:16:03:43 +0100] "GET /static/js/main.js?_=04642b8df8 HTTP/1.1" 200 24534 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
82.41.28.142 - - [12/Oct/2012:16:03:43 +0100] "GET /static/img/logo.png HTTP/1.1" 200 23884 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
82.41.28.142 - - [12/Oct/2012:16:03:43 +0100] "GET /static/js/prettify.js HTTP/1.1" 200 57688 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
82.41.28.142 - - [12/Oct/2012:16:03:43 +0100] "GET /favicon.ico HTTP/1.1" 200 614 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
86.129.79.206 - - [12/Oct/2012:16:04:21 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "http://iconfactory.com/twitterrific/#iPhone" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8L1"
86.129.79.206 - - [12/Oct/2012:16:04:21 +0100] "GET /static/css/main.css?_=04642b8df8 HTTP/1.1" 200 3257 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8L1"
86.129.79.206 - - [12/Oct/2012:16:04:21 +0100] "GET /static/img/logo.png HTTP/1.1" 200 23884 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8L1"
86.129.79.206 - - [12/Oct/2012:16:04:21 +0100] "GET /static/js/main.js?_=04642b8df8 HTTP/1.1" 200 24534 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8L1"
86.129.79.206 - - [12/Oct/2012:16:04:22 +0100] "GET /static/js/prettify.js HTTP/1.1" 200 57688 "http://blog.decadecity.net/2012/10/12/webkit-notification-api/" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8L1"
Ten minutes
Coming in fairly late to the party is Kimengi. Nine Connections Pro is listed on their website as still being in private beta, so I'm guessing they don't really need the real-time info just yet. Again they use the HEAD -> GET request pattern.
176.34.78.244 - - [12/Oct/2012:16:06:19 +0100] "HEAD /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 0 "-" "Kimengi/nineconnections.com"
176.34.78.244 - - [12/Oct/2012:16:06:20 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Kimengi/nineconnections.com"
This request sends no user agent at all, but the reverse DNS reveals it's from Walmart - yes: the Walmart. They appear to have a social media mining product.
38.113.234.180 - - [12/Oct/2012:16:07:04 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 11159 "-" "-"
This uses a Mac user agent string but there are no assets requested and the reverse DNS is for a host on Amazon's compute cloud so it's fair to assume it's not a real person.
184.73.14.223 - - [12/Oct/2012:16:07:07 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko)"
Moreoverbot arrives next - a good, informative user agent string, this.
70.39.246.37 - - [12/Oct/2012:16:07:11 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.0" 200 3458 "-" "Mozilla/5.0 Moreoverbot/5.1 (+http://www.moreover.com; webmaster@moreover.com)"
Twenty five minutes
The strawberryj.am resolver comes in next. It's another beta service that seems to give information on who is sharing links.
23.22.31.36 - - [12/Oct/2012:16:24:44 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 11159 "-" "Mozilla/5.0 (compatible; strawberryj.am url expander)"
One and a half hours
Two simultaneous visits from Flipboard proxies - both on Amazon's compute cloud. The URL in the user agent returns a 404.
23.20.211.56 - - [12/Oct/2012:17:22:01 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
184.72.70.11 - - [12/Oct/2012:17:22:01 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (FlipboardProxy/0.0.5; +http://flipboard.com/browserproxy)"
Three hours
TweetedTimes is a "real-time personalized newspaper generated from your Twitter account".
100.43.81.8 - - [12/Oct/2012:19:39:45 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (compatible; TweetedTimes Bot/1.0; +http://tweetedtimes.com)"
Eight hours
Last, but not least, Google decides to join the party.
66.249.66.66 - - [13/Oct/2012:00:42:53 +0100] "GET /2012/10/12/webkit-notification-api/ HTTP/1.1" 200 3458 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"