Analysis and simulation of emotional states in Internet communities

Paweł Weroński, Julian M. Sienkiewicz 1, Janusz A. Hołyst 1

1. Warsaw University of Technology, Centre of Excellence for Complex Systems Research, Koszykowa 75, Warszawa 00-662, Poland

Abstract

The continuous growth of the Internet has created a need to understand the rules that govern Internet communities. In this work the SnEA Blogs06 dataset was analysed. It is a collection of 1215 distinct time series (discussion threads) consisting of 240592 comments in total. The text of each comment was classified by a Naive Bayes classifier, which yields a pair of observables: the subjective probability Psub and the positive probability Ppos.

The first aim of this work was to verify the quality of the data and to reject the hypothesis that the signal is merely statistical noise produced by a badly trained classifier. To this end, the time series were treated as Markov chains and the transition matrix was studied using the definition of Pointwise Mutual Information. This analysis clearly indicated a distinct, non-trivial signal structure in the time series, induced by strong correlations. Next, the mean observable values of the threads were calculated and compared with a model that assumed pure randomness of the signal (a global shuffling procedure); the two distributions showed no relation. It was also observed that, in most cases, threads with a given mean observable value have a very narrow range of variance. The conclusion was that there are classes of threads within which the threads have the same or very similar observable distributions. The last aim of the analysis was to determine whether the signal within a thread is stationary; the results were positive. These conclusions led to a model that was used to simulate the time series, and the studies performed showed that the theoretical approach is highly correlated with the original data. The presented models are able to precisely determine the observable distribution in each time series using the first n comments. The conclusions drawn from the simulation data were used to propose a few rules, named emotional interaction, that determine the observable distribution in a thread.
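
The Markov-chain step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the continuous observables were turned into discrete states, so the equal-width binning in `discretize`, the choice of two bins, and the function names are all assumptions made for illustration. For a transition from state a to state b, the Pointwise Mutual Information is taken here as log2[ p(a, b) / (p(a) p(b)) ], where p(a, b) is the empirical frequency of the consecutive pair. Values near zero are what uncorrelated noise would give, while large positive or negative values point to structure.

```python
import math
from collections import Counter


def discretize(values, n_bins=2):
    """Map probabilities in [0, 1] to integer states 0..n_bins-1.

    Equal-width binning is an assumption; the abstract does not
    specify how the observables were discretized.
    """
    return [min(int(v * n_bins), n_bins - 1) for v in values]


def transition_pmi(states):
    """Return PMI (in bits) for every observed transition a -> b."""
    pairs = list(zip(states[:-1], states[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)   # state at step t
    second_counts = Counter(b for _, b in pairs)  # state at step t + 1
    pmi = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = first_counts[a] / n
        p_b = second_counts[b] / n
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Hypothetical toy thread: Ppos values that tend to persist.
toy_ppos = [0.9, 0.8, 0.7, 0.9, 0.2, 0.1, 0.3, 0.2]
toy_pmi = transition_pmi(discretize(toy_ppos))
for transition, value in sorted(toy_pmi.items()):
    print(transition, round(value, 3))
```

In this toy thread the self-transitions (0, 0) and (1, 1) get positive PMI and the switch (1, 0) gets negative PMI, which is the kind of pattern that would distinguish a correlated signal from classifier noise.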

 

Related papers

Presentation: Poster at 5 Ogólnopolskie Sympozjum "Fizyka w Ekonomii i Naukach Społecznych", by Julian M. Sienkiewicz
See On-line Journal of 5 Ogólnopolskie Sympozjum "Fizyka w Ekonomii i Naukach Społecznych"

Submitted: 2010-10-14 07:51
Revised:   2010-11-19 17:06