Popularity Characterization and Modelling for User-generated Videos
Islam, Muhammad Aminul
User-generated content systems such as YouTube have become highly popular. It is difficult to understand and predict content popularity in such systems. Characterizing and modelling content popularity can provide deeper insights into system design trade-offs and enable prediction of system behaviour in advance. Borghol et al. collected two datasets of YouTube video weekly view counts over eight months in 2008/09, namely a “recently-uploaded” dataset and a “keyword-search” dataset, and analyzed the popularity characteristics of the videos in the recently-uploaded dataset, including the video popularity evolution over time. Based on the observed characteristics, they developed a model that can generate synthetic video weekly view counts whose characteristics with respect to video popularity evolution match those observed in the recently-uploaded dataset. For this thesis, new weekly view count data was collected over two months in 2011 for the videos in the recently-uploaded and keyword-search datasets of Borghol et al. This data was used to evaluate the accuracy of the Borghol et al. model when used to generate synthetic view counts for a much longer time period than the eight-month period previously considered. Although the model yielded distributions of total (lifetime) video view counts that match the empirical distributions, significant differences between the model and empirical data were observed. These differences appear to arise because of particular popularity characteristics that change over time rather than being week-invariant as assumed in the model. This thesis also characterizes how video popularity evolves beyond the eight-month period considered by Borghol et al., and studies the characteristics of the keyword-search dataset with respect to content popularity, popularity evolution, and sampling biases.
Finally, the thesis studies the popularity characteristics of the videos in the recently-uploaded and keyword-search datasets for which additional view count data could not be collected, owing to the removal of these videos from YouTube.
Degree: Master of Science (M.Sc.)
Committee: Eager, Derek L.
Copyright Date: January 2013
Video on Demand
Home Box Office
Digital Video Recorder
Organisation for Economic Co-operation and Development
Internet Protocol Television
Content Distribution Network
Online Social Network
Cumulative Distribution Function
Complementary Cumulative Distribution Function
Maximum Likelihood Estimation