Thursday, October 29, 2009

Cost of YouTube Service on Amazon S3

YouTube hit a milestone recently: it now serves 1 billion videos per day. I was wondering how much it would cost Google to serve these videos. It must be a lot, but how much?

Since we don't know Google's hosting costs, let's calculate it a different way, from the published rates of the Amazon S3 service. Note that this calculation focuses primarily on storage and bandwidth costs.

Here are some stats to work with:
  • average video size on YouTube is 10 MB (maybe slightly dated)
  • videos served per month: 30 billion
  • 20 hours of video uploaded each minute, or 864,000 hours of video per month
  • average size of 1 hour of video = 0.5 GB (calculated at a bit rate of 1150 kbps for VCD-quality video)
Data transferred out per month to view the videos is:

30 billion * 10 MB = 300 billion MB = 292,968,750 GB

From the Amazon S3 calculator, this comes to $29 million/month.

Data transferred in per month from uploaded videos is:
0.5 GB/hour * 864,000 hours = 432,000 GB

From the Amazon S3 calculator, it is $43,200/month.

Data transfer out dominates the data transfer-in costs.

Storing the newly uploaded video each month costs about $66,000/month. (The cost of holding previously uploaded videos is not included since I don't have that data.)

The grand total is around $29,366,380/month, or about $352 million per year. The real cost to Google would be lower than this (perhaps by 10-20%), but $352 million seems to be the upper limit for this year. This is quite similar to the estimates made in the Credit Suisse report.
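
If you want to play with the numbers yourself, here is a rough back-of-the-envelope sketch in Python. The per-GB rates are my assumptions approximating the published S3 pricing at the time (about $0.15/GB-month for storage, $0.10/GB for transfer in, and roughly $0.10/GB as a blended rate for transfer out at this enormous volume); plug in the current calculator rates for a more accurate figure.

# Back-of-the-envelope YouTube-on-S3 estimate.
# The rates below are assumptions approximating 2009 S3 pricing, not official figures.
STORAGE_RATE = 0.15   # $/GB-month
IN_RATE      = 0.10   # $/GB transferred in
OUT_RATE     = 0.10   # $/GB transferred out, blended rate assumed at this volume

videos_per_month  = 30e9             # 30 billion views
avg_video_size_gb = 10 / 1024.0      # 10 MB expressed in GB
upload_hours      = 864000           # hours of video uploaded per month
gb_per_hour       = 0.5

out_gb  = videos_per_month * avg_video_size_gb
in_gb   = upload_hours * gb_per_hour
monthly = out_gb * OUT_RATE + in_gb * IN_RATE + in_gb * STORAGE_RATE

print(f"Transfer out: {out_gb:,.0f} GB -> ${out_gb * OUT_RATE:,.0f}")
print(f"Transfer in:  {in_gb:,.0f} GB -> ${in_gb * IN_RATE:,.0f}")
print(f"New storage:  {in_gb:,.0f} GB -> ${in_gb * STORAGE_RATE:,.0f}")
print(f"Total: about ${monthly:,.0f}/month, or ${monthly * 12:,.0f}/year")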

Cost of Amazon S3 vs Apple's MobileMe

I have been using Apple's MobileMe as a backup solution for a while and was wondering if Amazon S3 would work out cheaper than MobileMe. With a Firefox extension such as S3Fox, it is just as easy to use Amazon S3 for backup.

MobileMe costs $99/year for 20GB storage and 200GB of data transfer per month. Using the Amazon Calculator, I entered the following numbers:

  • Storage: 20 GB-months, i.e., using the full storage of 20GB per month
  • Data transfer in: 100GB/month, and data transfer out: 100GB/month. This assumes my data transfer is split equally between input and output, which is usually not the case, but let's assume it to keep the calculation simple.

The Amazon S3 cost comes to $3 for storage/month and $27 for data-transfer/month, for a grand total of $30/month, or $360/year, a whopping 3.6 times the cost of MobileMe.

More than the storage, it is the data-transfer costs that add up at Amazon. So if you were to go easy on the data transfer (a typical use case for backup), the costs would come down.

What is the break-even point? After tweaking the numbers, I found that if you store approximately 20GB and your data transfer is approximately 40GB/month (in and out combined), then the prices of Amazon S3 and Apple's MobileMe are the same. Beyond this, MobileMe is a better value.
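
Here is a small sketch of that break-even calculation. The per-GB rates are assumptions chosen so that they reproduce the $3 storage and $27 transfer figures above ($0.15/GB-month storage, $0.10/GB in, $0.17/GB out), so treat them as illustrative rather than current pricing.

# Find the monthly transfer (split evenly in/out) at which S3 matches MobileMe's $99/year.
# Rates are assumptions that reproduce the figures above, not current AWS pricing.
STORAGE_RATE = 0.15   # $/GB-month
IN_RATE      = 0.10   # $/GB
OUT_RATE     = 0.17   # $/GB

storage_gb       = 20
mobileme_monthly = 99 / 12.0

def s3_monthly(transfer_gb):
    half = transfer_gb / 2.0
    return storage_gb * STORAGE_RATE + half * IN_RATE + half * OUT_RATE

transfer_gb = 0
while s3_monthly(transfer_gb) < mobileme_monthly:
    transfer_gb += 1

print(f"Break-even at roughly {transfer_gb} GB/month of transfer "
      f"(S3 about ${s3_monthly(transfer_gb):.2f}/month vs MobileMe ${mobileme_monthly:.2f}/month)")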

In other words, Amazon S3 is a better value if you are mainly interested in using it as a storage device and your data transfer is small. On the other hand, if your data-transfer needs are high (e.g., hosting a popular video), MobileMe is the better value.


Sunday, October 25, 2009

Kindle App on the iPhone

Recently, I was very excited about buying the new Kindle with international wireless. After doing a little bit of research, I stumbled upon the Kindle App for iPhone. Guess what, it is an excellent app. You can easily buy books from the Kindle store and they appear on your iPhone. The app was so easy to use that I wonder if it is really necessary to buy the Kindle reader at all.

Granted, the Kindle reader has a bigger screen and its battery lasts longer than the iPhone's, but the iPhone screen is good enough for me. If you intend to read a book continuously for a long time, the Kindle may be easier on your eyes, but for a quick read to kill time, the iPhone is good enough.

One thing I found lacking in the Kindle app for iPhone is a settings button: there is no way to prevent the app from using the Wi-Fi/3G/GPRS network for internet access when you don't want it to.

Tuesday, September 15, 2009

Google's Fast Flip or Fast Flop?

Fast Flip is an evolutionary step ahead in browsing for information. To judge whether it is going to be a winner, we need to determine whether Fast Flip is going to be a win-win-win for readers, publishers, and Google itself.


Who is the target audience?

Fast Flip is similar to an RSS feed reader except that you can also see the page as the website intended you to see it. If you don't care about the visual factor and any formatting is good enough for you, then Fast Flip provides no additional advantage. If you already have a feed reader, would you use Fast Flip? Not likely. So the target audience is mostly consumers who do not use an RSS feed reader - and that is the majority of internet users.


Is it a win-win-win?

Consumers are certainly pressed for time. They would prefer any service that eliminates the need to visit multiple websites. That's why news aggregators such as Yahoo News and Google News are popular: they give readers a quick overview of the day's news. If users want to know more about an article, they click. So what's the problem that Fast Flip is trying to solve? Is it that people are frustrated by slow websites? That was a problem a decade ago; most people have broadband now. So the claim that websites are slow doesn't seem to make sense, or does it? We will revisit this a little later.

There is one interesting use case where the slowness becomes apparent. If there is an extremely popular article that everybody wants to read now but the source website can't handle the traffic, then a service such as Fast Flip backed by Google can enable users to read the story without experiencing any delay. In this case, it's a win-win-win for readers, publishers, and Google. But this is such a rare event. Clearly, Google is not banking on this use case for its service.

However, generalizing the above idea raises an interesting thought. What if the publishers "outsource" hosting of their content to Google Fast Flip - not as in web hosting, but by letting Google cache an image of their web page? Since most readers flip through information, why waste bandwidth and hosting resources on consumers who simply spend 20 seconds or less on a website? They don't even look at the ads, let alone click on them. The users who click through Fast Flip, however, are the truly interested ones. To the publishers, this is the subset of readers that are of high value - they come to the site and stay longer. Displaying ads to this audience can command a much higher premium. Even though the participating websites will see a decline in traffic, the quality of the traffic will increase and thus drive their ad revenues higher. Given that Google is sharing revenue with participating websites, revenue is still coming in, just slightly less of it. This is the cost of "outsourcing" the content to Google, but it will be offset by a reduction in infrastructure costs. Overall, it will benefit the publishers, but only if they transfer a significant chunk of their traffic to Google Fast Flip.

In short, Google wants to keep the low value but large number of visitors on Fast Flip, but send the fewer but higher value visitors to the participating website. Is this a good deal for the publishers?

Now let's revisit the "problem" of a website being slow. Given that most consumers spend no more than 20 seconds on a web page, a 10-second staggered wait to fully load the page is not uncommon these days. That's 50% of a visitor's attention to a website, and that is indeed too long! Fast Flip removes that wait, effectively giving readers back half of their time, and that is huge. So assuming the content is deep enough, users will indeed opt for Fast Flip. I can even see the standard Google search results being replaced by Fast Flip one day.

Thus Fast Flip seems to be a win-win-win for all: consumers, publishers, and Google. It does seem to make business sense. Will it work out? Only time will tell.

Thursday, August 27, 2009

2 Great Tools to Check Global Accessibility to Your Website

Would you like to know how long it takes for your customers/readers across the world to reach your website? Any techie would know that you can use the ping tool from your machine. But how do you reach the four corners of the world to do the ping test?

Try just-ping.com.

It is a web tool that pings your website from all around the world, from 38 locations at the time of this writing.

Next is just-traceroute.com, again from the same people.

This has remote terminals across the world and allows you to see the route from a remote machine to your website. This is particularly useful for debugging latency problems, and DNS problems as well.

Both tools will give you a glimpse into the performance of your website as seen by your customers. Of course, network latency is only part of the total performance, but it is an important one and cannot be ignored.
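
If you just want a quick baseline from your own machine before reaching for these tools, a few lines of Python can time a TCP connection to your site. This is only a sketch (a rough stand-in for ping when ICMP is blocked), and the hostname below is a placeholder.

import socket, time

HOST, PORT, ATTEMPTS = "www.example.com", 80, 5   # placeholder host; use your own site

for i in range(ATTEMPTS):
    start = time.time()
    try:
        # Time how long the TCP handshake takes, a rough proxy for network latency.
        with socket.create_connection((HOST, PORT), timeout=5):
            elapsed_ms = (time.time() - start) * 1000
            print(f"connect {i + 1}: {elapsed_ms:.1f} ms")
    except OSError as exc:
        print(f"connect {i + 1}: failed ({exc})")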

Friday, August 21, 2009

Superb free book for Software Architects

If you are a software architect or want to be one, I highly recommend this superb free book from Microsoft. It is a real treasure trove of information for architecting software systems. It is not an academic book; rather, it incorporates wisdom from the field.

Application Architecture Guide 2.0

Although it includes information on how to build applications on the .NET platform, its utility is not restricted to just .NET. The principles and guidelines can be applied to non-Microsoft platforms too.

Wednesday, July 29, 2009

The disruptive power of URL shorteners

The popularity of URL shorteners has taken off ever since Twitter launched, and we are seeing short URLs everywhere. In fact, nobody would ever want to send a regular link again - even via email. Why? Simply because you get tracking when you shorten a URL. For example, bit.ly gives you the click count on a URL, which gives you an idea of the reach of your network.

However, this blog post is not about the benefit of shortening the URL. Instead, I see in the URL shorteners the power to disrupt the status quo in the search industry. Yes, that's right! Google can be toppled! By these tiny URL shortener companies. I hope Google is reading this.

How, you may ask? Here's how. Have you noticed the little "retweet" button becoming increasingly ubiquitous?

This button is a service provided by Tweetmeme and helps readers do a 1-click tweet of the web page, and I have used this on my website at www.mashedge.com.

Guess what happens when you add this button to your web page? The very first step in generating the button is to shorten the URL via a service like bit.ly. So the URL shortener is the first one to see your new web page or blog entry as soon as it is born. Meanwhile, it can be hours or days before this web page shows up in search results. This is the gap that the URL shorteners can exploit to provide a unique service, and a lucrative one too. Thus, the URL shorteners are becoming the gatekeepers to new web pages. They have new information that nobody else has, and with it comes the power to disrupt.

Now, a real-time search service (probably there is one already that I am not aware of?) can be built on top of this stream of brand new URLs popping up all over the web. And nobody can beat this service in terms of freshness, not Twitter and not Google. When the results of this real-time service are aggregated with the conventional search engine results, you will have a new order in the search engines. Unless of course, Google buys up all the URL shorteners.

And the search service can be built at a fraction of the cost of Google. You don't need armies of robots to discover these new URLs; they are being offered to the URL shorteners on a golden platter.
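
To make the idea concrete, here is a toy sketch (not any real shortener's API) of why a shortener sits on such a fresh stream: every shorten request is effectively a notification that a URL exists, timestamped at the moment someone first cares about it, and the store can be queried by recency.

import time, hashlib

class ToyShortener:
    """Minimal in-memory URL shortener that doubles as a freshness index."""

    def __init__(self):
        self._by_code = {}   # short code -> (url, first_seen timestamp)

    def shorten(self, url):
        code = hashlib.sha1(url.encode()).hexdigest()[:7]
        # Record only the first time anyone shortened this URL.
        self._by_code.setdefault(code, (url, time.time()))
        return code

    def freshest(self, n=10):
        # URLs ordered by when they were first seen: a ready-made real-time feed.
        entries = sorted(self._by_code.values(), key=lambda e: e[1], reverse=True)
        return [url for url, _ in entries[:n]]

shortener = ToyShortener()
shortener.shorten("http://www.mashedge.com/some-brand-new-post")
print(shortener.freshest())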

So, if the URL shorteners don't get into the search business, they will very quickly be acquired by one of the search companies. Expect these acquisitions to have a huge price tag!


Wednesday, July 22, 2009

Twitter vs Google: Twitter Winning

I am simply blown away by the power of real-time search in Twitter. Provided you choose to follow the right people, you can get a wealth of high-quality relevant real-time information from Twitter.

Google prides itself on providing relevant and fresh results, but the little blue bird is beating the gorilla hands down. So why is this? It's because millions of users are constantly searching their favorite corners of the web and tweeting about interesting articles. The collective search power of humans is trumping the computer algorithms of Google.

The question arises: can this be automated? Yes, parts of it can, as with RSS feeds: search engines can be informed in real time about the existence of new content. So why are Google and other search engines lagging? The answer is that a human being is the ultimate judge of quality and relevance, and that cannot be automated. Whenever someone tweets about a particular story, that user has voted on its quality. The tweeter has put his or her reputation on the line by recommending the link; that's why it is a high-quality recommendation. The relevant users pick it up and it spreads like wildfire. It's the user community that is making the difference for Twitter and putting it over the top.

By the way, my use of Google search has decreased after discovering the power of Twitter! Has yours too?

Some useful links
  1. Official Twitter 101 for business
  2. Top 10 Twitter tips for Beginners (from PC Magazine)
  3. Technorati's @twitter_tips (articles from the blogosphere that are tagged as twitter_tips)
  4. Alltop twitter Stories (Keep track of the latest stories on Twitter)
  5. 105 Twitter apps for PR (although it is directed at PR people, this is a useful list for all audiences).

Saturday, April 18, 2009

IE Browser caching issues

Internet Explorer (IE) caches web pages differently from Firefox and Safari, so website developers must take extra care to ensure that their sites work as expected in IE.

Here are some tips:

  1. Make sure you set the expiration date for every page. If yours is a dynamic website, set the expiration date to 'now' so that the browser is forced to refetch the page.
  2. If you are using AJAX, make sure your XMLHttp requests that use the 'get' method use a URL that changes with time (a sketch of both tips follows below).
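
Here is a minimal sketch, using Python's built-in http.server, of both ideas: sending no-cache/expiry headers on dynamic responses, and appending a timestamp to GET URLs so IE cannot serve a stale cached copy. The handler and the URL are illustrative, not from any particular site.

from http.server import BaseHTTPRequestHandler, HTTPServer
from email.utils import formatdate
import time

class NoCacheHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"The time is {time.ctime()}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Tip 1: expire the page immediately so IE refetches it every time.
        self.send_header("Cache-Control", "no-cache, no-store, must-revalidate")
        self.send_header("Expires", formatdate(time.time(), usegmt=True))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def cache_busted(url):
    # Tip 2: vary the URL on every AJAX 'get' so IE cannot return a cached response.
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}_ts={int(time.time() * 1000)}"

print(cache_busted("/api/latest"))   # e.g. /api/latest?_ts=1256800000000
# HTTPServer(("", 8000), NoCacheHandler).serve_forever()   # uncomment to run the server
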
I have added links to good web pages that discuss this issue in this web book chapter: Mashedge Chapter: Browser Compatibility.

Monday, March 23, 2009

Why Google wins

Ever wonder how Google maintains its #1 position among search engines? Here's one reason: they send their robots to find places where stuff changes.

I first put up my website www.mashedge.com for a limited alpha test over a month ago. The first day I put it up, Googlebot was there, and it has been a regular visitor since. Only in the last week have I seen the MSN bot come by. There has been no trace of the Yahoo bot even now. If Microsoft and Yahoo want to catch up with Google, they had better get their bots working.

Saturday, March 7, 2009

Setting up sendmail relay with Airtel India as the ISP

This post is meant for those users who meet these requirements:

  1. You have obtained a static IP address from Airtel India.
  2. You have set up your host correctly - see my previous post.
  3. You plan to host a mailserver on your host with the IP address from step 1.
  4. You plan to use sendmail as the MTA (mail transport agent).
In order for sendmail to successfully relay messages to your ISP, and then on to the recipients, there are a number of steps to configure sendmail correctly. Some of these are required to eliminate spam (a very honorable goal).

First, you need to ask Airtel to enable your sendmail daemon to connect to their sendmail daemon on port 25. This is accomplished by sending an email to their customer service at care.karnataka@airtel.in with cc: to d.blr@airtel.in. (Note that these e-mail addresses will be different if you are in a different region of India.) This step is mainly to get written confirmation from you that you will not send spam and that you have proper anti-virus measures in place on your host.

Second, once port 25 is enabled, go to relay.airtelbroadband.in and register the e-mail address for which you need the relay. Let's say the webserver on your host will send mail from the email address foo@blah.com; then do the following:
  1. Log in using the DSL userid and DSL password that Airtel gave you.
  2. In the relay addresses form, add "foo@blah.com". In the password field, enter an auth password of your choice for this email address, let's say "mypwd". Only emails from this address will be relayed; everything else will be rejected by Airtel. Note that this email address should be the login account of your mailserver on your mailserver machine. In case you want to change this address later, you can log in to relay.airtelbroadband.in with "foo@blah.com" and "mypwd" and then update the email address and/or password.
You should see that the email has been successfully registered. This completes the setup at Airtel ISP.

Now for the important part on your side.

  1. cd to /etc/mail
  2. Edit the authinfo file and add this line:
AuthInfo:relay.airtelbroadband.in "U:foo@blah.com" "I:foo@blah.com" "P:mypwd" "M:DIGEST-MD5 LOGIN PLAIN"
  3. Edit sendmail.mc and add the lines below. For more info about this, see this sendmail chapter at Mashedge:
dnl # Route all outbound mail through Airtel's relay host
define(`SMART_HOST', `esmtp:relay.airtelbroadband.in')dnl
dnl # Look up the relay credentials added to /etc/mail/authinfo above
FEATURE(authinfo)dnl
define(`confAUTH_OPTIONS', `A')dnl
dnl # Authentication mechanisms trusted and offered for SMTP AUTH
TRUST_AUTH_MECH(`EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
define(`confAUTH_REALM',`blah.com')dnl
dnl # TLS certificate settings; point these at your own certificate files
define(`confCACERT_PATH', `/')dnl
define(`confCACERT', `ca-bundle.crt')dnl
define(`confSERVER_CERT', `sendmail.pem')dnl
define(`confSERVER_KEY', `sendmail.pem')dnl
define(`confCLIENT_CERT', `sendmail.pem')dnl


After making these changes, run the following:

  • make -C /etc/mail (This will regenerate all the files required for sendmail)
  • /usr/sbin/service sendmail restart
This completes the setup.

You can test whether sendmail is actually relaying by sending one test email as foo@blah.com from your host. Check /var/log/maillog for any errors.
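
If you prefer to script the test, a few lines of Python using smtplib will hand a message to the local sendmail daemon. This is a sketch: the addresses are the placeholders from above, and it assumes sendmail is listening on localhost port 25.

import smtplib
from email.message import EmailMessage

# Placeholder addresses from the setup above; replace with your own.
msg = EmailMessage()
msg["From"] = "foo@blah.com"
msg["To"] = "someone@example.com"
msg["Subject"] = "sendmail relay test"
msg.set_content("Test message relayed via relay.airtelbroadband.in")

# Hand the message to the local sendmail daemon, which relays it via the smart host.
with smtplib.SMTP("localhost", 25) as smtp:
    smtp.send_message(msg)

print("Message handed to local sendmail; check /var/log/maillog for the relay status.")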

Monday, March 2, 2009

Sendmail from hosts with static IP addresses

Configuring sendmail to relay mails is straightforward. Look at this chapter on sendmail at Mashedge for suggestions by folks on the internet. The implicit assumption is that your host is allowed to send e-mail.

So, why would your host not be allowed to send mail? Due to the proliferation of spam, e-mail providers (such as Google, Yahoo, etc.) will refuse to accept mail from sendmail on a host unless both the DNS lookup and the reverse DNS lookup for that host match.

Let's say you have a domain name "blahblah.com" and you get a static IP address a.b.c.d from your ISP.

You fix the DNS by adding the "A" entries using the domain management tools provided by the vendor from whom you purchased the domain name. When you do "nslookup blahblah.com", you should see that "blahblah.com" resolves to "a.b.c.d". This is the easy part.

The next part is the reverse DNS. Here, a lookup via the Unix command "dig -x a.b.c.d" should resolve to "blahblah.com". Instead, by default it will resolve to a generic hostname belonging to your ISP. Unless this part is fixed, you will NOT be able to send mail via sendmail from your host "blahblah.com". Fixing this is not in your control; you must contact your ISP and ask them to set the reverse (PTR) record.
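
A quick way to check both directions is a few lines of Python. The hostname below is a placeholder, and this sketch only reports whether the forward and reverse lookups agree.

import socket

HOST = "blahblah.com"   # placeholder; use your own domain

ip = socket.gethostbyname(HOST)   # forward (A record) lookup
try:
    reverse_name = socket.gethostbyaddr(ip)[0]   # reverse (PTR) lookup
except socket.herror:
    reverse_name = "<no PTR record>"

print(f"{HOST} -> {ip} -> {reverse_name}")
if reverse_name.rstrip(".").lower() == HOST.lower():
    print("Forward and reverse DNS match; mail servers should be satisfied.")
else:
    print(f"Mismatch: ask your ISP to set the PTR record for {ip}")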

If you attempt to send mail without fixing the reverse DNS, your SMTP provider will reject the e-mail with an error like 550 5.7.1 (IP name may be forged). Yikes! See here for more details about relaying denied.

Sunday, March 1, 2009

Open Source for your PC - Slowly advancing forward.

After using an Apple Mac for years, I reluctantly bought a PC (Sony VAIO NS VGN-NS25G) with Windows Vista on it! I was forced to because accounting software that I planned to use (Tally) runs only on Windows.

Not wanting to fork out more money for other Microsoft programs, I explored the open source alternatives out there.

Here are the recommendations:

  1. OpenOffice is quite an excellent replacement for Microsoft Office.
  2. Mozilla's Thunderbird with Lightning and Google Contacts add-ons is a no-brainer replacement for Microsoft Outlook.

OpenOffice is getting major traction. Even some big banks in India have begun to use OpenOffice, and it is really easy to download and set up.

It is easy to configure Thunderbird to fetch e-mail from your e-mail provider. Using Google mail with IMAP was quite trivial. On the whole it took me about 3-4 hours to download and set up my laptop PC.

However, can my grandfather or father do this? Not likely. Can a non-computer professional set up their own PC with open source software? Not likely.

While all the great pieces of open source software exist and together can easily give Microsoft a run for its money, what is lacking is a comprehensive one-click solution to install and set up all the various pieces of open source software into a complete, workable solution on a PC running Windows.

Tuesday, February 24, 2009

Connecting the domain name and your static IP address for your website

So, I got a static IP address from my ISP. Here are the main steps to follow in order for users to find your website using your domain name.

1. Assign the IP address to your modem-router. Set up port forwarding in the modem-router to point to the webserver machine. Here I assume the modem-router is also the DHCP server. Port forwarding is easy if the modem-router is in the same subnet as the webserver. It is a simple one-step port forwarding.

2. Secure your modem. This is extremely important because there are always hackers trying to find easy entry into systems. Change the default modem-router password. Block all ports except the ones you want to expose (i.e., ports 80 and 443). Enable the system log for monitoring. You really need to be paranoid about this. You can use tools such as the open source nmap to find security holes.

3. If there is another router between the modem-router and your server, there will be additional complications. Multi-router port forwarding can be complex, but this depends on the kind of router you have. You may need to turn the 2nd router into an access point.

4. Change the DNS records. Update the old "A" entries to use your new IP address.

Within an hour or so, users should be able to reach your webserver via the domain name.
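
Once the "A" records propagate, a quick sanity check can confirm that the name resolves to your static IP and that the forwarded ports answer. This is a rough sketch; the domain and the expected IP are placeholders.

import socket

DOMAIN      = "blahblah.com"   # placeholder domain
EXPECTED_IP = "a.b.c.d"        # placeholder for the static IP your ISP assigned

resolved = socket.gethostbyname(DOMAIN)
status = "matches" if resolved == EXPECTED_IP else "does NOT match the expected IP"
print(f"{DOMAIN} resolves to {resolved} ({status})")

for port in (80, 443):
    try:
        with socket.create_connection((DOMAIN, port), timeout=5):
            print(f"port {port}: reachable")
    except OSError:
        print(f"port {port}: not reachable (check port forwarding and firewall rules)")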

Friday, February 13, 2009

Welcome

This is a blog about all things technical.