Monday, October 14, 2013

You know that thing I used to hate?...I love it now!: Expert Evolution in Social Media

It took me a long time to really appreciate some of the most successful people that have ever lived, such as Steve Jobs or Coco Chanel. Despite early adversity (Jobs was given up for adoption at birth; Chanel was raised in an orphanage), these individuals changed their reality and transformed the world we know.

Have you ever felt that the same thing can have different meanings at different points in your life? For example, I used to dislike Apple products due to their closed architecture, their lack of customization, all the Steve Jobs fanboys, etc. Now my opinion has shifted, and I am considering purchasing a MacBook Pro.

This paper: "From Amateurs to Connoisseurs: Modeling the Evolution of User Expertise through Online Reviews" from www2013, models precisely how users' perceptions on products (particularly beer!) change through time. Note that perceptions can be compared to level of expertise.
The work considers that users evolve on their own "personal clock" (i.e., each user is modelled on their own time-scale), so some users may already be very experienced when they provide their first review, while other users may never become experts. It is also assumed that users with similar levels of expertise will rate products in similar ways, even if their ratings are temporally far apart; e.g., a user from 2013 will rate things similarly to a user from 1999 if both users are at the same "noob level" (note that what remains constant throughout time is what the "noob level" means.)

The work models the level of experience of each user as a series of latent parameters that are constrained to be monotonically non-decreasing as a function of time. Therefore users can only become more experienced (or at least stay at the same level of experience...this clearly does not cover cases related to the German word "verlernen," i.e., forgetting something you have already learned!) Users evolve as they rate more products, i.e., the work considers that the very act of consuming products causes users' tastes to change.
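In symbols (my notation, not necessarily the paper's): if user u wrote review i before review j, the model requires e_{u,i} \leq e_{u,j}, where e_{u,i} denotes u's experience level at the time of review i.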
In particular, the authors consider that users move from one level of expertise to another depending on how they rate particular products. Given that the paper works with a beer-rating dataset, the authors consider that expert users will give higher ratings to the hoppiest beers, i.e., the strong ales (confused about what hops in beer mean? check this article out.) It is assumed that liking strong ales is an acquired taste.
The figure below shows the ratings different users have given to different beers. We see how the Firestone XV, a strong ale, is one of the beers rated the highest, which the authors consider corresponds to an expert rating. The figure also shows how biases can exist for different beers given the level of expertise of the user.

The approach is conceptually simple: different recommender systems are created for different stages of user evolution.
A classic latent factor recommender system would look something like the following: 
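Roughly, in the usual notation (a global offset \alpha, user and item bias terms \beta_u and \beta_i, and latent factor vectors \gamma_u and \gamma_i):

rec(u, i) = \alpha + \beta_u + \beta_i + \langle \gamma_u, \gamma_i \rangle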

The authors then create a combined model that uses a different recommender system for each stage of the user's life cycle:
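Roughly, as I read the paper, every parameter is additionally indexed by the experience level e_{u,i} that user u has reached when rating item i:

rec(u, i) = \alpha(e_{u,i}) + \beta_u(e_{u,i}) + \beta_i(e_{u,i}) + \langle \gamma_u(e_{u,i}), \gamma_i(e_{u,i}) \rangle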


Latent factor recommender models are a type of collaborative filtering algorithm which assumes that, for a specific domain (e.g., action films or beer), there exists a set of factors that influence the rating a specific item receives. These factors are not obvious, and it is difficult to predict the actual impact they have on a rating.
The first goal of latent factor recommender models is to infer these latent factors from the data by using mathematical techniques.

In the authors' approach, it is considered that users at a certain experience level will be influenced by certain types of factors in their ratings. For example, for a novice user the fact that one beer is cheaper than another might play a big role in how much the user says they like it, while a more experienced user might be influenced more by the beer's texture. This is why the authors fit a different recommender model for each experience level of the user.

Latent Factor models map users and items into a latent feature space.
A user's feature vector denotes the user's affinity to each of the features.  The product or item's feature vector represents how much the item itself  is related to the features. A rating is approximated  by the dot product of the user feature vector and the item feature vector.
Specifically, consider that we have a set U of users and a set D of items. Let \mathbf{R}, of size |U| \times |D|, be the matrix that contains all the ratings that the users have given to the items.
The first task of this method is to discover the K latent features. At the end of the day, we want to find two matrices P (a |U| \times K matrix) and Q (a |D| \times K matrix) such that their product approximates \mathbf{R}:
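\mathbf{R} \approx \mathbf{P} \times \mathbf{Q}^T = \hat{\mathbf{R}}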

The following image shows how each rating is composed of the dot product of the corresponding user and item feature vectors:

Note that the transposed matrix Q^T corresponds to the feature vectors of the items.
Each row of P represents the strength of the associations between a user and the features. Similarly, each row of Q represents the strength of the associations between an item and the features. To predict the rating of an item d_j by a user u_i, we calculate the dot product of the two vectors corresponding to u_i and d_j:


\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^K{p_{ik}q_{kj}}




The task is now to obtain P and Q. One common way to approach this problem is to first initialize the two matrices with some values, calculate how `different' their product is from R, and then try to minimize this difference iteratively. This method is called gradient descent, and its purpose is to find a local minimum of the difference.
This difference, usually called the error between the estimated rating and the real rating, can be calculated by the following equation for each user-item pair:

e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = (r_{ij} - \sum_{k=1}^K{p_{ik}q_{kj}})^2

The squared error is considered because the estimated rating can be either higher or lower than the real rating.
To minimize the error, we need to know in which direction to modify the values of p_{ik} and q_{kj}; in other words, we need the gradient at the current values. Thus we differentiate the above equation with respect to each of these two variables separately:


\frac{\partial}{\partial p_{ik}}e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij})(q_{kj}) = -2 e_{ij} q_{kj}
\frac{\partial}{\partial q_{kj}}e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij})(p_{ik}) = -2 e_{ij} p_{ik}

With the gradient,  the update rules for both p_{ik} and q_{kj} can be formulated as:


p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}}e_{ij}^2 = p_{ik} + 2\alpha e_{ij} q_{kj}
q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}}e_{ij}^2 = q_{kj} + 2\alpha e_{ij} p_{ik}

\alpha is a constant (the learning rate) that determines the rate of approach to the minimum. Usually the value of \alpha is small, to avoid the risk of overshooting the minimum and oscillating around it.
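To make this concrete, here is a minimal sketch of the whole procedure in Python (my own illustrative implementation, not the paper's code). It assumes a small dense matrix R in which zeros mark missing ratings; the function and variable names are mine:

import numpy as np

def matrix_factorization(R, K=2, alpha=0.002, steps=5000):
    """Factorize R (|U| x |D|) into P (|U| x K) and Q (|D| x K) by
    stochastic gradient descent; zeros in R mark unobserved ratings."""
    num_users, num_items = R.shape
    rng = np.random.default_rng(42)
    P = rng.random((num_users, K))
    Q = rng.random((num_items, K))
    for _ in range(steps):
        for i in range(num_users):
            for j in range(num_items):
                if R[i, j] > 0:                      # observed ratings only
                    e_ij = R[i, j] - P[i] @ Q[j]     # error of current estimate
                    p_old = P[i].copy()
                    P[i] += 2 * alpha * e_ij * Q[j]  # p'_ik = p_ik + 2*alpha*e_ij*q_kj
                    Q[j] += 2 * alpha * e_ij * p_old # q'_kj = q_kj + 2*alpha*e_ij*p_ik
    return P, Q

# Toy usage: four users, four beers, 0 = missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
P, Q = matrix_factorization(R)
print(np.round(P @ Q.T, 2))  # approximates R on the observed entries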

Ok...
Now that latent factor models are (hopefully) clear...let's get back to the authors' problem.
As mentioned previously, the authors consider that for each evolution stage of a user there is a recommender system that can adequately capture the user's taste at that point in their life.
So their problem comes down to:
-Fitting the parameters of each recommender system.
-Fitting each user's experience progression (i.e., being able to state that when a user reviewed item X, they were at a certain experience level.)

Note that once we know the experience level at which each user contributed each of their reviews, fitting the parameters of each recommender model is a cinch (it is basically the same procedure as for classic latent factor recommenders.)

The question now is: how do we fit each user's experience progression? Well, if we assume that users gain experience monotonically as they rate more products, we can fit the experience levels using dynamic programming.
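To give a flavor of how such a dynamic program might look, here is a minimal illustrative sketch (my own, not the authors' code). It assumes that for a single user we already have a matrix errors, where errors[t, e] is the squared error of the user's t-th chronological review under the level-e recommender:

import numpy as np

def fit_experience(errors):
    """Return the monotonically non-decreasing sequence of experience
    levels that minimizes the user's total error."""
    T, E = errors.shape
    cost = np.empty((T, E))   # cost[t, e]: best total error up to review t, ending at level e
    back = np.zeros((T, E), dtype=int)
    cost[0] = errors[0]
    for t in range(1, T):
        for e in range(E):
            prev = int(np.argmin(cost[t - 1, :e + 1]))  # levels never decrease
            cost[t, e] = cost[t - 1, prev] + errors[t, e]
            back[t, e] = prev
    levels = [int(np.argmin(cost[-1]))]                 # best final level
    for t in range(T - 1, 0, -1):                       # backtrack
        levels.append(int(back[t, levels[-1]]))
    return levels[::-1]

# Toy usage: three reviews, two experience levels.
errors = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.7, 0.1]])
print(fit_experience(errors))  # -> [0, 1, 1]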


The two steps (fitting each user's experience level, and fitting the parameters of each recommender) are repeated until convergence.



Sunday, June 16, 2013

Studying the Thesis of PhD Heroes: Munmun De Choudhury

Given that I am in the process of beginning to write my PhD thesis, I am currently reviewing the PhD theses of doctors who have been more than successful in their careers; these are people I admire and find inspirational for my own PhD path: my PhD heroes.
I have decided to create blog posts describing some of the main contributions these PhD theses made and the new ways of thinking these doctors brought in. I will begin this series with the PhD dissertation of Munmun De Choudhury, currently working at Microsoft Research; she has several publications in top conferences such as CSCW, CHI, ICWSM, and WWW, among others. Munmun is indeed one of my main true PhD heroes.

Her thesis focused on designing frameworks and computational models to obtain a detailed understanding of how communication happens in online social networks. Online communication patterns were considered to have two main components: the actual message discussed, and the channel or media used to discuss it. Work before Dr. De Choudhury's thesis focused more on studying network structure and dynamics; little emphasis was given to providing tools that could characterize the type of messages present in an online community, or to providing insightful observational studies on large-scale social communication datasets.

In particular, her research explored 3 main areas: (1) how information is diffused in an online social network, analyzing in particular how user influence, and the fact that many similar users may be talking to each other (homophily), affect information spread; (2) how communication dynamics in online communities can be modeled, focusing particularly on external and internal communication factors; (3) how the "interestingness" of conversations can be modeled and measured, in particular detecting interesting conversations and identifying the features that turn them into interesting content.

Providing means to explore and analyze the dynamics and impact of our online social communications is important because social media has been shown to help originate and amplify real-world events; think, e.g., of the election protests in Iran or the earthquake in Haiti. Social media also enables viral marketing and collaboration within corporations, and can help users find experts, or even people who can help them connect with others.

In the following we begin exploring in detail each of the main themes discussed in her thesis.

Measuring the Interestingness of a Conversation: The work considered that a conversation was interesting when it made participants return to the conversation and continue commenting and posting. Such behavior is observed frequently on YouTube, where users who have already watched a video keep returning to it to comment and respond to others.
The work considered that people will participate in and return to a conversation when the theme of the conversation is engaging and/or interesting people are participating in the discussion. They predicted that users will return to a conversation when they: (a) find the whole conversation theme interesting; (b) see comments by people who are well known in the community; (c) observe an engaging dialogue between two or more people (an absorbing back and forth).
Additionally, interesting conversations will be propagated throughout the network; we will observe things like: users seeking out other users who participated in interesting conversations; interesting themes appearing in other conversations in the community; and users who participated in the interesting conversations searching for other conversations about the same theme.
Themes are defined as sets of salient topics associated with conversations at different points in time.

Interesting users are defined as users who receive a wide variety of comments from others after they comment; users who tend to participate in conversations that are currently popular in the community; and users who tend to comment and engage in conversations with other interesting users.

Theme modeling: Within the modeling of themes, an idea I found interesting in this thesis is that, while there was a focus on modeling which themes were present in a conversation in a given time period, there was also an emphasis on normalizing the amount of content associated with a theme based on time and on co-participation. This helped identify themes that were not merely temporally popular or interesting due to an external event, as well as themes that certain users tended to comment on frequently not so much because the conversations around the theme were interesting, but because those users probably had a passion for the subject.

Information Diffusion:
(post in progress...come back soon!:)