Friday, September 04, 2015

Tag Me Maybe?  Maintaining Friendships and Influencing Algorithms

[Caption: Example of Facebook Post with People tagged.]

With the “@” sign, we are now able to include others in our lives more easily. Tagging or mentioning people in content lets you share the story with someone in front of everyone, demonstrating publicly both your friendship and the shared interests that created it. We interviewed 120 participants to uncover how people perceive tagging.

Tagging To Build Stronger Relationships.

Most (79%) saw tagging as a way to build strong relationships, citing: “it made people feel special to have someone making time for them to tag them.” Tagged posts generated automatically by companies were not perceived as positively, because people did not connect them with genuine human connections. As designers we need to help companies better engage with their online clients, one option being encouraging more humanized interactions. The discussion matters given the recent lawsuit over people’s names appearing in tagged ads.

Tagging to influence Facebook’s algorithms, and distribute more surprising content.

Our interviewees (61%) saw tagging as a way to expose audiences to information outside of what they usually saw on Facebook. There was a notion that Facebook kept showing friends the same type of content. Tagging broke this by letting people reach new social graphs at once (e.g., the friends of the taggees), and share with them information outside their norm. One woman said, “I like to tag people that I know are interested in something and whose audience will also care. But my interest is in creating a crossover. So I involve audiences that follow people in dance, but I share with them something totally different, such as poetry.”

Another told a folk story of how tags influenced Facebook’s algorithms and helped give her content more viewers: “If I tag someone and then he comments, the post becomes ‘active’ and it’ll appear at the top of the News Feed. So I’ll sometimes tag someone directly in the post or in the comments. In either case, if it makes him comment, the post will be pushed to the top.”

Recommendation algorithms have tried to filter opinions, people and items that are different to us, limiting the diffusion of information. Our study highlights how people appear to value “things outside the norm”. Our interviewees had a need to distribute fresh information that could free audiences from algorithmic biases. Our research on tagging thus raises the question — would social media benefit from more official digital structures tailored for targeting new audiences and sharing with them strange information? Audiences could be bombarded with more unwanted content, but it could also facilitate “serendipitous” discoveries. 

To read more, check out our Hypertext 2015 research paper:
Tag me Maybe: Perceptions of Public Targeted Sharing on Facebook,
with Saiph Savage, Andres Monroy-Hernandez, Kasturi Bhattacharjee, Tobias Hollerer.

Tuesday, March 31, 2015

#FixIT@UNAM: How We Got Over A Hundred Latinas In Our Hackathon

[Image Caption: Group photo of #FixIT: Latina Hackathon @ UNAM]


  • Organized  a large scale Latina Hackathon #FixIT with almost 300 registered females and 147 female Hackers on February 26-27 2015 at the National Autonomous University of Mexico (UNAM).
  • The Hackathon had a large female turnout because we created a space welcoming for Latinas.
  • Hackers designed innovative smart home appliances using Arduinos, and collaborated with other very diverse females, such as high school students, young professionals, graduate and undergraduates in Computer Engineering, Mechatronics, Industrial Design, Philosophy, Political Science, among others.
  • The Hackathon's organization  had powerful team work between Google Anita Borg Scholars, US and Mexican universities (UNAM, UCSB, UC Berkeley, UCD, Georgia Tech) and non-profits (Major League Hacking, SocialTIC).

Many organizations have been fighting to bring diversity into IT. But it has been a sloooow loooong progress. I don't like slow! I decided to take things into my own hands :) So I did what you always do when you’re a little guy facing a terrible future with long odds and little hope of success: I teamed up with my friends.

I was very fortunate to have been selected to join the planning committee of Google Anita Borg Scholars. This new organization has the mission to increase the number of women participating in computer science worldwide. I started to work closely with Miray Kas and Sarah Safir. These two women gave me the motivation and inspiration to start dreaming up, and making a reality, the idea of Fixing IT: rapidly bringing diversity. It gave me the drive to stay up multiple nights with friends further devising a plan.

Online I got introduced to many new great people, from other Googlers who had careers focused in fostering diversity, to 19 year old UNAM students, like Juan Pablo Flores, who organized large scale Hackathons throughout their country. I’ve worked with some of the strongest teams in industry, academia, and even sport (Olympic medalists!) But I’ve never seen anything like this.
Starting from literally nothing, we went to having a fast moving plan: We would do a Hackathon! A female Hackathon, a Latina Hackathon. We would be fighting to incorporate two minorities at once!

[Image Caption: Latinas in 
#FixIT hacking away.]

Why a Hackathon?
We saw Hackathons as a fast way to turn anyone into creators, not just consumers of, technology. We wanted to create a space that would empower these minorities to build and present their own visions of what technology looks like.

We were a strong motivated team, but in a certain sense we were also just newbies, running around telling everyone we were going to create this new awesome more inclusive hackathon. You can ask anyone, I was constantly inviting everyone to join and participate in my Hackathon. The problem was that we didn't really know any latinas interested in tech; we didn't have a space to hold the Hackathon. Heck, we didn't even have a budget. We had no funds to even run it.  Even I began to doubt myself. It was a rough period.

But December hit and suddenly things started moving.  Suddenly all the work we had done started coming together. All the folks we talked  to about it suddenly began getting really involved and getting others involved. Everything started snowballing. It happened so fast.

So... How did we make it happen?

[Image Caption: UNAM Professors helping to explain Hackathon dynamics.
The professors helped to secure the spaces, 
prepare bootcamps, and connect with students.]

A. Professors from the National Autonomous University of Mexico (UNAM) got involved.
[Note that UNAM is one of the largest public universities in the world, and also one with a high academic ranking.] The professors played a key role in the Hackathon's success because they helped us secure a space where we could run the Hackathon. From interviews with Latinas we found that: 1) some Latinas did not attend Hackathons because they lacked their own equipment; 2) some parents only allowed their daughters to participate in the Hackathon if it happened during the school week and was part of a school activity; 3) some Latinas felt insecure about their technical skills and this made them shy away from hackathons.
This led us to create a Hackathon that covered the needs of these girls and made it easy for them to participate.
1. The UNAM professors helped us secure computer labs so that participants would not need to bring their own equipment.

2. The professors also gave us access to their network of high school and college professors (UNAM also has high schools!) These professors kindly invited their Latina students to our event; many physically drove them there; and some even turned it into an official school activity.

[Image Caption: Latinas with Mentors
@ #FixIT bootcamp.]

3. The UNAM professors and UNAM senior students helped us to prepare a bootcamp. We felt the bootcamp would give participants more confidence in themselves and help them thrive. The bootcamp helped participants, regardless of their programming knowledge or skill, to start designing and building smart houses using Arduino. UNAM also gave us many mentors who helped participants throughout the hackathon. These mentors were very engaged students who also helped participants feel more confident.

From interviews, we also suddenly realized that this might be one of the first large scale Latina Hackathons in the region. It then made sense to try to help participants build a network, and inspire and help each other to succeed. We thus decided to have talks for the Hackathon. Female professors, such as Dr. Cindy Rubio from UC Davis, and graduate students, such as Deana Brown from Georgia Tech, all came down to UNAM to give inspiring talks on their research, career paths, and how they managed to succeed. Here again the UNAM professors managed to secure for us a large auditorium for the talks.
**For all of this thanks to Prof. Norma Elva Chavez and Dr. Jesus Savage. 

B. Amazing Google Anita Borg Scholars Support
The network of Google Anita Borg Scholars, specifically Miray Kas,  connected me with a group of very passionate Googlers, e.g., Rohan Lamba,  who had the vision of creating large scale hackathons across the world. The group helped cover all of the costs of the hackathon, and helped immensely with the planning and logistics. Rohan also came to the hackathon at UNAM and was up with me since 4 am the day of the Hackathon planning and hacking away!  Thanks to Google, our Hackathon gave participants:

1) Awesome Swag (Tshirts, bags, stickers, Tech Girls are Superheroes book! **This was actually written and created by another Google Anita Borg Scholar, Jenine Beekhuyzen!)
2) Kits to build and design Smart Houses using arduino.
3) Lots of great delicious Mexican food!
4) Awesome Prizes! (This was important to motivate participants.)

The Google Anita Borg Scholar network also gave us inspiring speakers and mentors. Both Dr. Cindy Rubio and Deana Brown were Anita Borg Scholars. They came down to UNAM to give inspiring strong talks.

[Image Caption: Dr. Cindy Rubio was the
keynote speaker of #FixIT]

C. Awesome UNAM Students
Our Hackathon had a large turnout, perhaps one of the largest Latina Hackathons to date. How did this happen? I believe it helped us immensely to listen to the needs of these young Latinas. Undergraduate students from UNAM connected us with large networks of females who explained to us what it would take to have them attend the hackathon. We created spaces that facilitated their participation and they came! We had participation from schools and universities across the globe. Some examples were Universidad Nacional Autonoma de Mexico (UNAM); Instituto Politecnico Nacional (IPN); ITAM; Tec de Monterrey; Georgia Tech; University of California Santa Barbara; UC Berkeley; UC Davis; Oxford.

The biggest takeaway from this is that perhaps the success of our Hackathon came not necessarily because we had big tech players on our side. We succeeded because there was this enormous mental shift in the students, professors, and just everyone involved. All of them were thinking of ways they could help to recruit and bring their Latina friends to our event. Often really clever, ingenious ways. Students made videos. They designed ads. They bought billboards. They announced massively and repeatedly on social media. They had rallies. Students saw it as their responsibility to help.
Here a big shoutout to Juan Pablo Flores and Alejandra Monroy!

D. The Designers
Laura Ballroom and Ariadna Gómez Dessavre were the designers of the Hackathon. Ari made the awesome #FixIT logo; and Laura designed the tshirts, stickers and certificates.

The designers were key for the event because they created products that were used and adopted by participants. This helped to create a collective identity for the event and build a community. I loved seeing on social media people share selfies with their Hackathon tshirts and feel proud to wear them.

I loved seeing the senior female student add our sticker to her laptop (along with the multiple other stickers she proudly shows to represent all the tech events she has attended) and the young high school student add our sticker to her bare laptop (representing that this is her first tech event, though she seems likely to attend more now :)

I believe we won this fight, we were able to have a hackathon with hundreds of latinas; rapidly bringing diversity into technology, because everyone made themselves the hero of their own story. Everyone took it as their job to have an event where more audiences could be included; where more voices could be empowered to design and construct technology. They threw themselves into it. They did whatever they could think of to do. They didn’t stop to ask anyone for permission.

We, the people decided to make a difference. We decided to make it our responsibility to do this work. To change who is empowered to participate in a Hackathon. Who is empowered to define technology. Let's not forget that we do have the power to change our reality.
"Juntas Creando y Transformando la Realidad"
[This was the main quote for #FixIT, it reads in Spanish: Together Creating and Transforming Reality]

[Image Caption: #FixIT was a success because we worked with the community to 
make Latinas feel welcomed] 

This text is a remix of Aaron Swartz's inspiring speech: "How We Stopped SOPA." We miss you Aaron.

Monday, November 17, 2014

Visualizing Online Audiences

         Caption: Social Spread Interface lets people select audiences based on their social connections

We tend to think that the larger our online audience, the more comments and interactions our content will receive. Yet this is not always the case.

The presence of other people can create a diffusion of responsibility. People don't feel as pressured to take action when they feel they share responsibility with others. Paradoxically, when someone poses a question to her entire network, her friends are less likely to respond than if she posed the question to a small targeted audience. Directing content to specific groups of people can help users harvest richer online interactions.

Many tech-savvy users use different sharing mechanisms to engage in selective sharing, directing content to specific predefined audiences. These users usually define lists of people with particular interests or social ties (e.g., coworkers.) They then post content contextualized so that it is relevant to the interests of the people in each list.
But keeping lists up to date can be hard and time-consuming. Lists also do not work for more dynamic interactions, based on the location or popularity of the targeted users. For example, someone organizing a social rally might only want to target the friends who are in town on a particular day. Or a person who just wrote an article on gay rights might want help from their friends who are most influential on the topic to promote their article.
Another technique involves selecting individuals to target on-the-fly and only sharing the content or message to them. This type of behavior allows for a more dynamic selective sharing experience that is context-driven. This practice is usually referred to as targeted sharing.

Finding the right people at the right time is hard, especially when we are interacting in large communities where it is hard to keep track of everyone's interests and traits. Previous work, including Facebook's graph search, used list-based interfaces to recommend people with a certain expertise, interest or trait. But these systems do not let people easily explore and compare the different characteristics of the recommended individuals. However, these characteristics can play an important role when people are deciding whether or not to select a person for a particular collaboration or interaction. People want ways through which they can understand the space of users they can target to interact. List based interfaces don't let people have quick overviews of their possible targeted audience. Nor do they let people easily zoom in and compare specific users. But to have rich sharing experiences people need to understand their audience: an overview and its details. For instance, a person wanting to post on LGBT might need to understand that half of her interested audience is from Russia. This might mean that to better engage with them, she should include some LGBT issues happening in their country.

 Interactive visualization tools can enable effective audience targeting by prompting a user to learn about their audience and to understand their different interests. To explore these ideas, we designed Hax. Hax is a tool that provides a query interface and multiple visualizations to support users in dynamically choosing audiences for their targeted sharing tasks. We study how users engaged with this tool in the context of sharing and connecting with an audience on Facebook.

 We believe the data modeling techniques that work for content categorization and information retrieval can be adapted to mine people's interests and retrieve audiences relevant to users' diverse needs. But, while specialized data modeling algorithms exist that can correctly categorize data, they rarely fully capture the complex and ever-changing decision-making process for targeting an audience. We therefore opt to integrate data visualizations that incorporate a human-in-the-loop approach. We designed different data visualizations that highlight specific traits, or social signals, of relevant individuals in order to aid users in their audience targeting tasks.

Our exploration begins with the three social signals listed below. We briefly define the signal and the reasons for considering it. We decided to begin with these signals as previous work identified they play an important role in targeting audiences:

Shared interests: This signal captures the personal thematic interests of each community member. Many researchers and practitioners view collaborations as a process that aggregates personal interests into collective choices through self-interested bargaining. We believe this bargaining process can be facilitated by making users aware of the personal interests of others, and how they relate to the collaboration task they are promoting.

Location: This signal holds information about the countries, states, and cities where community members live. Collaborations supported by computers have traditionally provided users with the luxury of interacting with others without having to worry about their location. However, location does play an important role when interacting and organizing events within the physical world (e.g., a social rally) as others' spatial-temporal constraints can determine how much a person will engage in the activity.

Social Spread: This signal holds information about the type of friends and social ties community members have. This signal is important because it can aid members in recognizing prospective newcomers who can help keep the community alive and active. Additionally, the social connections of a member can also help in the spread of the community's messages and visions. Members could thus use this signal to identify the users whose social connectivity would help them the most in distributing certain content.

      Caption: Zoomed in Version of our Location Based Interface (top,) Transparent Interface (middle,) and Social Spread (bottom)

For more, see our full paper to be presented at COOP2014,  Visualizing Targeted Audiences co-authored by Saiph Savage, Angus Forbes, Carlos Toxtli, Grant McKenzie, and Shloka Desai

Monday, October 14, 2013

You know that thing I used to hate?...I love it now!: Expert Evolution in Social Media

It took me a long time to really appreciate some of the most successful people that have ever lived, such as Steve Jobs or Coco Chanel. Despite the adversity of being born orphans, these individuals changed their reality and transformed the world that we know.

Have you ever felt that the same thing can have different meanings at different points of your life? For example, I used to dislike Apple products, due to their closed architecture, lack of customization, all the related Steve Jobs fanboys etc. Now my opinion has shifted, and I am considering purchasing a MacBook Pro.

This paper: "From Amateurs to Connoisseurs: Modeling the Evolution of User Expertise through Online Reviews" from WWW 2013, models precisely how users' perceptions of products (particularly beer!) change through time. Note that perceptions can be compared to level of expertise.
The work considers that users evolve on their own "personal clock" (i.e., each user is modelled on its own time-scale), so some users may be really experienced when they provide their first review, while other users may never become experts. It is also assumed that users with similar levels of expertise will rate products in similar ways, even if their ratings are temporally far apart, e.g., a user from 2013 will rate things similarly to a user from 1999, if both users are at the same "noob level" (note that what will remain constant throughout time is what the "noob level" means.)

The work models the level of experience of each user as  a series of latent parameters that are constrained to be monotonically non-decreasing as a function of time. Therefore users can only become more experienced (or stay at least at the same level of experience...this clearly does not consider cases related to the German word of "Verlernen," where you forget something you have already learned!) Users evolve as they rate more products, i.e. the work considers that the very act of consuming products will cause users' tastes to change.
In particular the authors consider that users move from one level of expertise to another, depending on how they rate particular products. Given that the paper works with a beer rating dataset, the authors consider that expert users will rate the hoppiest beers higher, i.e. the strong Ales (confused about what hops in beer mean? Check this article out.) It is assumed that liking strong Ales is an acquired taste.
The figure below shows the ratings different users have given to different beers. We see how the Firestone XV, a strong ale, is one of the beers that was rated the highest, and they consider this corresponds to an expert rating. The figure also shows how biases can exist for different beers given the level of expertise of the user.

The approach is somewhat simple: different recommender systems are created for different stages of user evolution.
A classic latent factor recommender system would look something like the following: 

The authors create a sort of feature vector, that has different recommender systems for each stage of the user's life cycle:

Latent factor recommender models are collaborative filtering recommendation algorithms, which consider that for specific domains (e.g., action films or beer) there exists a set of factors that influence the rating a specific item receives. These factors are not obvious, and it is difficult to predict the actual impact they have on a rating.
The first goal of the latent factor recommender models is to infer these latent factors from the data by using mathematical techniques. 

In the authors' approach, users at a certain experience level are assumed to be influenced by certain types of factors in their ratings. E.g., for a novice user, the fact that a certain beer is cheaper than another might play a big role in how much the user says they like the beer, whereas a more experienced user might be more influenced by the beer's texture. This is why the authors consider a different recommender model for each experience level of the user.

Latent Factor models map users and items into a latent feature space.
A user's feature vector denotes the user's affinity to each of the features.  The product or item's feature vector represents how much the item itself  is related to the features. A rating is approximated  by the dot product of the user feature vector and the item feature vector.
Specifically, consider that we have a set U of users, and a set D of items. Let \mathbf{R} of size |U| \times |D| be the matrix that contains all the ratings that the users have given to the items.
The first task of this method is to discover the K latent features. At the end of the day, we want to find two matrices U (a |U| \times K matrix) and M (a |D| \times K matrix) such that their product approximates \mathbf{R}:

\mathbf{R} \approx U \times M^T = \hat{\mathbf{R}}

  The following image shows how the rating of items is composed of the dot product of matrices with these feature vectors:

Note that the M transposed matrix corresponds to the feature vectors of items.
Each row of U represents the strength of the associations between a user and the features. Similarly, each row of M represents the strength of the associations between an item and the features. To get the prediction of a rating of an item d_j by u_i, we calculate the dot product of the two vectors corresponding to u_i and d_j. Writing p_i for the row of U that belongs to u_i and q_j for the row of M that belongs to d_j (so q_{kj} is the entry of M transposed for feature k and item d_j):

\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^K{p_{ik}q_{kj}}

The task is now to obtain U and M. One common way to approach this problem is to first initialize the two matrices with some values, calculate how "different" their product is from R, and then try to minimize this difference iteratively. This method is called gradient descent. Its purpose is to find a local minimum of the difference.
This difference, usually called the error between the estimated rating and the real rating, can be calculated by the following equation for each user-item pair:

e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = (r_{ij} - \sum_{k=1}^K{p_{ik}q_{kj}})^2

The squared error is considered because the estimated rating can be either higher or lower than the real rating.
To minimize the error, we need to know in which direction we have to modify the values of p_{ik} and q_{kj}. In other words, we need to know the gradient at the current values. Thus  we differentiate the above equation with respect to these two variables separately:

\frac{\partial}{\partial p_{ik}}e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij})(q_{kj}) = -2 e_{ij} q_{kj}
\frac{\partial}{\partial q_{kj}}e_{ij}^2 = -2(r_{ij} - \hat{r}_{ij})(p_{ik}) = -2 e_{ij} p_{ik}

With the gradient,  the update rules for both p_{ik} and q_{kj} can be formulated as:

p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}}e_{ij}^2 = p_{ik} + 2\alpha e_{ij} q_{kj}
q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}}e_{ij}^2 = q_{kj} + 2\alpha e_{ij} p_{ik}

\alpha is a constant that helps determine the rate of approaching the minimum. Usually the value of \alpha is small, to avoid overshooting the minimum and oscillating around it.
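
To make the update rules concrete, here is a minimal Python sketch of the procedure described above. It is only an illustration, not the authors' code: the function names (factorize, predict, total_squared_error), the use of None for a missing rating, the random initialization, and the default values of alpha and epochs are all my own assumptions.

```python
import random


def predict(U, M, i, j):
    """Predicted rating of item j by user i: dot product of row i of U and row j of M."""
    return sum(p * q for p, q in zip(U[i], M[j]))


def total_squared_error(R, U, M):
    """Sum of squared errors e_ij^2 over the known ratings only."""
    return sum(
        (R[i][j] - predict(U, M, i, j)) ** 2
        for i in range(len(R))
        for j in range(len(R[0]))
        if R[i][j] is not None
    )


def factorize(R, K, alpha=0.005, epochs=500, seed=0):
    """Fit U (users x K) and M (items x K) so that U times M-transposed approximates R.

    R is a list of rows; None marks a rating we do not know (an assumption
    of this sketch). alpha is the learning rate from the update rules.
    """
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    # Start from small random values, as described in the text.
    U = [[rng.uniform(0, 1) for _ in range(K)] for _ in range(n_users)]
    M = [[rng.uniform(0, 1) for _ in range(K)] for _ in range(n_items)]
    for _ in range(epochs):
        for i in range(n_users):
            for j in range(n_items):
                if R[i][j] is None:
                    continue
                e = R[i][j] - predict(U, M, i, j)  # e_ij
                for k in range(K):
                    p, q = U[i][k], M[j][k]  # keep the old values for both updates
                    U[i][k] = p + 2 * alpha * e * q
                    M[j][k] = q + 2 * alpha * e * p
    return U, M
```

Calling factorize on a small ratings table and then predict(U, M, i, j) for a cell that was None gives an estimate for a rating the user never provided. Practical implementations usually also add a regularization term to avoid overfitting, which this sketch leaves out to stay close to the equations above.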

Now that latent factor models are clear (hopefully)...let's get back to the authors' problem.
As we mentioned previously, the authors consider that for each evolution stage of a user, there will be a recommender system that can adequately capture the user's taste at that point in his life.
So their problem comes down to:
-Fitting the parameters of their recommender systems.
-Fitting each user's experience progression (i.e., being able to state that when the user reviewed item X, they were at a certain experience level.)

Note that once we know the experience level at which each user contributed each of their reviews, fitting the parameters of each recommender model is a cinch (it's basically the same procedure explained above for classic latent factor recommendations.)

The question now is, how do we fit the user's experience progression? Well if we assume that users gain experience monotonically, as they rate more products, we  can fit experience using dynamic programming.
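
As a rough illustration of that dynamic programming step, here is a small Python sketch for a single user. It assumes we already have a table cost[t][e] holding the error that the level-e recommender makes on the user's t-th review (reviews in time order); the function name fit_experience and the shape of this table are my own assumptions, and the paper's actual objective may contain extra terms.

```python
def fit_experience(cost):
    """Pick one experience level per review so that levels never decrease
    over time and the total cost is as small as possible.

    cost[t][e] is the error of the level-e recommender on review t
    (a hypothetical precomputed table; must contain at least one review).
    """
    T, E = len(cost), len(cost[0])
    # best[t][e]: smallest total cost of reviews 0..t when review t is at level e.
    best = [row[:] for row in cost]
    # back[t][e]: level chosen for review t-1 in that best solution.
    back = [[0] * E for _ in range(T)]
    for t in range(1, T):
        arg = 0  # cheapest previous level among levels 0..e
        for e in range(E):
            if best[t - 1][e] < best[t - 1][arg]:
                arg = e
            best[t][e] = cost[t][e] + best[t - 1][arg]
            back[t][e] = arg
    # Walk backwards from the cheapest final level to recover the sequence.
    e = min(range(E), key=lambda level: best[T - 1][level])
    levels = [e]
    for t in range(T - 1, 0, -1):
        e = back[t][e]
        levels.append(e)
    levels.reverse()
    return levels
```

Because the previous review may only sit at the same level or a lower one, the inner loop just keeps a running minimum over levels 0..e, so the whole fit takes time proportional to the number of reviews times the number of levels.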

The two steps (fitting each user's experience level, and fitting the parameters of each recommender) are repeated until convergence.

Sunday, June 16, 2013

Studying the Thesis of PhD Heroes: Munmun De Choudhury

Given that I am in the process of beginning to write my PhD thesis, I am currently reviewing the PhD theses of doctors that have been more than successful in their careers; these are people I admire and find inspirational for my own PhD path: my PhD heroes.
I have decided to create blog posts that describe some of the main contributions that these PhD theses had, the new ways of thinking that these doctors brought in. I will begin this series with the PhD dissertation of Munmun De Choudhury, currently working at Microsoft Research; she has several publications in top conferences such as CSCW, CHI, ICWSM, WWW, among others. Munmun is indeed one of my main true PhD heroes.

 Her thesis focused on designing frameworks and computational models to obtain a detailed understanding of how communication happens in online social networks. It was considered that online communication patterns are divided  in two main forms: the actual message discussed, and the channel or media used to discuss the message.  Work before Dr. De Choudhury's thesis focused  more on studying the network structure and dynamics, and little emphasis was given to providing tools that could characterize the type of messages present in an online community, providing insightful  observational studies on large-scale social communication datasets.

In particular, her research explored 3 main areas: (1) how information is diffused in an online social network, analyzing in particular how the influence of users and the fact that you can have very many similar users talking to each other, affects information spread; (2) how communication dynamics in online communities can be modeled, particularly focused on external and internal communication factors; (3) how "interestingness" of conversations can be modeled and measured , in particular focusing on detecting interesting conversations and identifying the features that turn them into interesting content.

Providing means to explore and analyze the dynamics and impact of our online social communications is important because social media has been shown to originate and create real world revolutions; think, e.g., of the elections in Iran or the earthquake in Haiti. Social media also enables viral marketing and collaborations in corporations, and can help users find experts, or even people that can help them connect with others.

In the following we begin exploring in detail each of the main themes discussed in her thesis.

Measuring the Interestingness of a Conversation: The work considered that a conversation was interesting when it made participants return to the conversation and continue commenting and posting. Such behavior is observed frequently on YouTube, when users have already watched the video, yet they return to the video to comment and respond to others.
The work considered that people will participate and return to conversation when the theme of the conversation is engaging and/or interesting people are participating in the discussion. They predicted that users will return to a conversation, when they:  (a)  find the whole conversation theme interesting;  (b) see comments by people that are well known in the community; (c) observe an engaging dialogue between two or more people (an absorbing back and forth between two people).
Additionally conversations that are interesting will be propagated throughout the network; we will observe things like: users will seek other users who participated in interesting conversations; interesting themes will tend to be present in other conversations in the community; users who participated in the interesting conversations will search for other similar conversations about the same theme.
Themes are defined as sets of salient topics associated with conversations at different points in time.

Interesting users are defined as users who receive a wide variety of comments from others after they comment; users who tend to participate in conversations that are currently popular in the community; and users who tend to comment and engage in conversations with other interesting users.

Theme modeling: Within the modeling of themes, an idea that I found interesting in this thesis is that, while there was a focus on modeling which themes were present in a conversation in a given time period, there was also an emphasis on normalizing the amount of content associated with a theme based on time and on co-participation. This helped separate out themes that were only temporarily popular due to an external event, as well as themes that certain users commented on frequently not because the conversations around the theme were interesting, but because those users probably had a passion for the subject.

Information Diffusion:
(post in progress...come back soon!:)

Wednesday, November 14, 2012

Another layman's explanation of: Expert Evolution in Online Social Networks

I was recently reading a very interesting paper titled: Evolution of Experts in Question Answering Communities by Aditya Pal, Shuo Chang and Joseph Konstan, and thought I would share the paper and try to explain it in layman's terms.
There has been a vast amount of work done on detecting experts in Question Answering Communities. Typically this analysis is done through either graph based methods or feature based methods. Graph based methods tend to analyze the link structure of a user in an online social network to find authoritative users. They analyze things such as: how many other people is the user "friends" with? Feature based methods, on the other hand, analyze the characteristics of the users: how many best answers does the user have? What language style does he use? And so on.
The work we are analyzing seeks to identify experts, but then performs a temporal analysis to study how experts evolve in a community and how they influence the community's dynamics. The online community studied is Stack Overflow. To identify experts, the authors used two approaches. In one of them, they counted the number of positive votes a user's answers and questions have received (a user gets a positive vote when his/her answer is helpful to the community, or when his/her question is interesting or relevant to someone in the community) and labeled the top 10% of users with the highest number of votes as experts.
To analyze how experts evolve and how a community can be influenced in time by the answers and social interactions of experts, the authors performed the following:
  1. The questions and answers of the community were divided into bi-weekly buckets, where the first bucket holds the questions and answers of the first two weeks of the Stack Overflow data they had collected, the second bucket holds the questions and answers created in weeks 3-4, and so on.
  2. For each user it is then possible to calculate, per bucket (that is, for every 2 weeks), the number of questions, answers and best answers he/she has given.
  3. For each user, a relative time series is computed for each data type he/she has generated (questions, answers and best answers). This relative time series is constructed so that the contribution of a user can be valued relative to the contribution of other users. To do this, the mean and standard deviation of each data type are calculated in each time bucket. (Recall that a bucket holds the number of answers, questions and best answers that different users have given in that particular time period, so for each type of variable we can calculate the mean and standard deviation.) It is then possible to normalize a data point in the time bucket as:
    X'_b = (X_b - Mean_b) / StandardDeviation_b

    where X_b represents the number of answers a particular user has generated in time bucket b, X'_b is its normalized value, Mean_b represents the mean of the number of answers different users have given in time bucket b, and StandardDeviation_b is the corresponding standard deviation.
  4. After this step, each user is associated with 3 relative time series: the time series of their answers, questions and best answers. From the answers and best answers time series, a point-wise ratio between best answers and answers is then calculated. This point-wise ratio indicates the probability of a user's answers being selected as the best answer.
    The following figure shows an interesting plot where we see how the likelihood of an expert and an average user receiving the votes for best answer changes over time.

What we notice is that the likelihood of receiving the best answer increases significantly over time for experts in comparison to average users. Initially the likelihood of receiving a best answer is the same for both experts and average users. The authors believe that this occurs, because when a new person, who happens to be an expert, joins the community, other users are wary of marking the answers of newcomers as the best. But as the expert gains reputation, the rest of the community members become more and more comfortable in marking their answers as the best.
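To make the bucketing and normalization steps more concrete, here is a minimal Python sketch. It is only my own illustration, not the authors' code: the function names, the toy users ("ana", "bob") and the dates are all made up. I also make two simplifying assumptions: users with no activity in a bucket are simply left out of that bucket's mean and standard deviation, and the best answer ratio is computed on raw counts so that it reads as a probability.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

BUCKET_DAYS = 14  # bi-weekly buckets, as described in the post


def bucket_index(timestamp, start):
    """Return the 0-based two-week bucket that a timestamp falls into."""
    return (timestamp - start).days // BUCKET_DAYS


def counts_per_bucket(events, start):
    """events: iterable of (user, timestamp). Returns {bucket: {user: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, ts in events:
        counts[bucket_index(ts, start)][user] += 1
    return counts


def relative_series(counts):
    """Normalize each user's count against the users active in the same bucket."""
    relative = {}
    for bucket, per_user in counts.items():
        values = list(per_user.values())
        mu, sigma = mean(values), pstdev(values)
        relative[bucket] = {
            # (X_b - Mean_b) / StandardDeviation_b; 0.0 when everyone is equal
            user: (x - mu) / sigma if sigma > 0 else 0.0
            for user, x in per_user.items()
        }
    return relative


def best_answer_ratio(best_counts, answer_counts):
    """Point-wise ratio of best answers to answers, per bucket and user."""
    return {
        bucket: {user: best_counts[bucket][user] / n for user, n in per_user.items()}
        for bucket, per_user in answer_counts.items()
    }


# Hypothetical toy data: (user, time the answer was posted).
start = datetime(2012, 1, 1)
answers = [
    ("ana", datetime(2012, 1, 2)), ("ana", datetime(2012, 1, 5)),
    ("ana", datetime(2012, 1, 9)), ("bob", datetime(2012, 1, 3)),
    ("ana", datetime(2012, 1, 20)), ("bob", datetime(2012, 1, 21)),
]
best_answers = [("ana", datetime(2012, 1, 2)), ("ana", datetime(2012, 1, 5))]

answer_counts = counts_per_bucket(answers, start)
best_counts = counts_per_bucket(best_answers, start)
relative = relative_series(answer_counts)
ratio = best_answer_ratio(best_counts, answer_counts)
```

In the first bucket "ana" gave 3 answers and "bob" gave 1, so "ana" ends up one standard deviation above the bucket mean and "bob" one below. That is the sense in which the series is relative.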
The next interesting thing the authors analyzed was the likelihood of a user asking a question. It was seen that, in general, expert users do not ask questions. They found that the overall question to answer ratio among experts was 1/15 !!! To compare the time series of questions and answers, the authors computed an aggregate time series of the number of questions and answers of experts, and then normalized each time series so that it has mean = 0 and standard deviation = 1. From these two resulting series (questions and answers) a cross-covariance was computed. The cross-covariance gives us information about just how similar two signals are, as a function of a time lag applied to one of them. The authors found that the optimal time lag was zero for the majority of expert users, which indicates that the likelihood of an expert asking a question and the likelihood of an expert answering one vary simultaneously.
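For readers who want to see what "finding the optimal time lag" looks like in practice, here is a small pure-Python sketch. The two series are invented numbers (not data from the paper), and the function names and the sign convention (a positive lag means the second series trails the first) are my own choices.

```python
from statistics import mean, pstdev


def standardize(series):
    """Rescale a series to mean 0 and standard deviation 1."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


def cross_covariance(x, y, lag):
    """Average product of x[t] and y[t + lag] over the overlapping positions."""
    n = len(x)
    return sum(x[t] * y[t + lag] for t in range(n) if 0 <= t + lag < n) / n


def best_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest cross-covariance."""
    xs, ys = standardize(x), standardize(y)
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_covariance(xs, ys, lag))


# Hypothetical bi-weekly counts for one expert: they rise and fall together.
questions = [1, 3, 2, 5, 4, 6, 2, 1]
answers = [16, 44, 31, 75, 58, 92, 29, 15]

lag = best_lag(questions, answers, max_lag=3)
```

Here `lag` comes out as 0, because the two toy series peak in the same buckets. If the answers consistently peaked one bucket after the questions, the best lag would be 1 instead.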

Friday, July 27, 2012

Layman's Explanation of Online LDA

Topic Modeling!
LDA stands for Latent Dirichlet Allocation, and it is a type of topic modeling algorithm. The purpose of LDA is to learn the representation of a fixed number of topics, and given this number of topics learn the topic distribution that each document in a collection of documents has. For example, if we were given the following sentences:
A: I spent the day at the beach tanning.
B: I ate Mexican Tacos and Guacamole.
C: I love tanning on Mexican beaches while eating quesadillas and tacos under the sun.

LDA might say something like:
Sentence A is 100% about Topic 1
Sentence B is 100% Topic 2
Sentence C is 30% Topic 1, 70% Topic 2

where LDA also discovers that:
Topic 1: 30% beach, 15% tanning, 10% sun, … (where we notice that topic 1 represents things related to the beach)
Topic 2: 40% Mexican, 10% Tacos, 10% Guacamole, 10% Quesadilla , … (where we notice that topic 2 represents things related to Mexico.)

LDA learns how topics and documents are represented as follows:

1) First, the number of topics to discover is selected. (This is similar to specifying the number of clusters we wish a clustering algorithm to consider.)

2) Once the number of topics is selected, LDA goes through each of the words in each of the documents and randomly assigns the word to one of the K topics. After this step we have topic representations (how the words are distributed in each topic) and documents represented in terms of topics (just like the example above, where we said that Sentence or Document C is 30% about Topic 1 and 70% about Topic 2). Now, the thing is, the assignment of words to topics was done at random, so of course the representation obtained is not very accurate. To improve this representation, LDA analyzes, per document,
the percentage of words within the document that were assigned to a particular topic. And for each word in the document, LDA analyzes, over all the documents, the percentage of times that particular word has been assigned to a particular topic. LDA will therefore be calculating:

1) p(topic t | document d) = percentage of words in a document d that are currently assigned to topic t.
2) p(word w | topic t) = percentage of times the word w was assigned to topic t over all documents.

LDA will decide to move a word w from topic A to topic B when:
p(topic A | document d) * p(word w | topic A)< p(topic B | document d) * p(word w |topic B)
After a while, LDA "converges" to a better state, where the topic representations, and the documents represented in terms of these topics, are reasonably good.
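The procedure above can be sketched in a few lines of Python. Please take this as a toy illustration and not as a reference implementation of LDA: instead of the hard "move when the product is larger" comparison, it picks the new topic at random with probability proportional to p(topic | document) * p(word | topic), which is how sampling-based implementations usually do it. The smoothing constants `alpha` and `beta`, the function name and the tokenized sentences are all my own assumptions.

```python
import random
from collections import Counter


def toy_lda(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """docs: list of token lists. Returns (topic mix per document, word counts per topic)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Random start: every word occurrence gets a random topic.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [Counter(z) for z in assignments]        # words of doc d in topic t
    topic_word = [Counter() for _ in range(num_topics)]  # times word w sits in topic t
    topic_total = [0] * num_topics                       # total words in topic t
    for doc, z in zip(docs, assignments):
        for w, t in zip(doc, z):
            topic_word[t][w] += 1
            topic_total[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the word out of its current topic before re-deciding.
                old = assignments[d][i]
                doc_topic[d][old] -= 1
                topic_word[old][w] -= 1
                topic_total[old] -= 1

                # Weight for each topic: p(topic | document) * p(word | topic),
                # smoothed so that no topic ever has probability exactly zero.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = new
                doc_topic[d][new] += 1
                topic_word[new][w] += 1
                topic_total[new] += 1

    doc_mix = [
        [(doc_topic[d][t] + alpha) / (len(doc) + alpha * num_topics)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return doc_mix, topic_word


# The three example sentences, reduced by hand to their content words.
docs = [
    ["beach", "tanning"],
    ["mexican", "tacos", "guacamole"],
    ["tanning", "mexican", "beach", "quesadillas", "tacos", "sun"],
]
doc_mix, topic_word = toy_lda(docs, num_topics=2)
```

Each row of `doc_mix` is the topic mix of one document (the "30% Topic 1, 70% Topic 2" of the example). With only three tiny documents the topics found will be noisy, so don't expect the neat percentages from the example.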

Now that we have understood the underlying principle of how LDA works, we will discuss online LDA.
The problem with LDA is that the posterior probability we need to calculate in order to reassign words to topics is very difficult to compute. Therefore researchers use approximation techniques to estimate this posterior probability.
Generally, algorithms for approximating this posterior probability are based either on sampling approaches or on optimization approaches. Sampling approaches are typically based on Markov Chain Monte Carlo (MCMC) sampling. MCMC approximates the posterior probability distribution by randomly drawing values from a complex distribution of interest. The "Markov Chain" part of the name comes from the fact that the previously sampled values (previous states) affect the generation of the next random sample value (in other words, the transition probabilities between sample values are a function of the most recent sample value).
Optimization approaches, on the other hand, are typically based on variational inference. Variational Inference can be seen as a deterministic alternative to MCMC: it replaces MCMC's random sampling with optimization. Variational Inference seeks to optimize a simplified parametric distribution so that it is close, in Kullback-Leibler divergence, to the posterior. The following picture intends to show how variational inference defines a subfamily of distributions, where the goal is to find the point in the subfamily that is closest to P(z|x). Closeness is measured using the Kullback-Leibler divergence, which is a non-symmetric measure of the difference between two probability distributions P and Q.
Variational Inference has been shown to be as accurate as MCMC, but FASTER, and this has made Variational Inference very popular for large datasets.
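To get a feel for the Kullback-Leibler divergence, and for what "finding the closest point in a subfamily" means, here is a tiny Python sketch with made-up discrete distributions. Real variational inference optimizes over a continuous family of distributions; picking the best out of three hand-written candidates is only meant to show the idea.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical "true" posterior and a small family of simpler candidates.
posterior = [0.7, 0.2, 0.1]
candidates = [
    [0.4, 0.4, 0.2],
    [0.6, 0.3, 0.1],
    [1 / 3, 1 / 3, 1 / 3],
]

# Non-symmetric: swapping the arguments gives a different number.
forward = kl_divergence(posterior, candidates[0])
backward = kl_divergence(candidates[0], posterior)

# Variational inference looks for the candidate q that minimizes KL(q || posterior).
closest = min(candidates, key=lambda q: kl_divergence(q, posterior))
```

Here `forward` and `backward` differ, and `closest` is `[0.6, 0.3, 0.1]`, the candidate that looks most like the posterior.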

Now, despite the benefits Variational Inference brings, large scale data analysis can still be difficult. What many groups have done is to use batch variational inference, which constantly iterates between analyzing each observation and updating dataset-wide variational parameters. But in really big datasets each iteration can become very costly and impractical...and this is where Online LDA comes to the rescue!
Online LDA is based on online stochastic optimization, which has been shown to produce good parameter estimates dramatically faster than batch algorithms on large datasets.
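The flavor of online stochastic optimization can be shown with a deliberately simple stand-in problem: estimating a single number (a mean) from a stream of data, one small batch at a time, instead of sweeping over the whole dataset on every iteration. This is not online LDA itself. The step size schedule rho_t = (tau0 + t)^(-kappa) is the form I recall from the online LDA literature, and the parameter values and names below are my own assumptions.

```python
import random


def online_estimate(stream, batch_size=10, tau0=1.0, kappa=0.7):
    """Blend noisy mini-batch estimates into a running global estimate."""
    estimate = 0.0
    batch = []
    t = 0
    for x in stream:
        batch.append(x)
        if len(batch) == batch_size:
            rho = (tau0 + t) ** (-kappa)            # step size shrinks over time
            batch_estimate = sum(batch) / len(batch)
            # Move the global estimate a little towards what this batch says.
            estimate = (1 - rho) * estimate + rho * batch_estimate
            batch = []
            t += 1
    return estimate


rng = random.Random(0)
data = [rng.gauss(5.0, 1.0) for _ in range(5000)]  # hypothetical data stream
estimate = online_estimate(data)
```

Each batch is touched only once and then thrown away, which is what makes this kind of update attractive for very large collections. In online LDA the blended quantity is the set of topic parameters, updated from a mini-batch of documents, instead of a single mean.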
Online stochastic optimization in LDA is about finding a balance between exploiting the knowledge already gained about a particular topic assignment and exploring new topic assignments. Note: Images from Stanford University and Princeton University