Google Analytics and Bigtable

Another tidbit I found curious in the Google Bigtable paper was the massive size of the Google Analytics data set stored in Bigtable.

The paper says that 250 terabytes of Google Analytics data are stored in Bigtable. That’s more than all the images for Google Earth (71T). It is the second largest data set in Bigtable, behind only the 850T of the Google crawl.

Why is it so big? I had assumed Google Analytics maintained only summary data for each website. That would be a very small amount of data, nowhere near 250T.

Instead, it appears Google Analytics permanently keeps all the information about user behavior on every site that uses it, online and available for various analyses. That would explain 250T of data.

