Webology, Volume 5, Number 4, December, 2008


Virtual polling data: A social network analysis on a student government election

Shane Tilton
Ohio University of Zanesville, 1425 Newark Road, Elson Hall 250, Zanesville, OH 43701, (740) 453-0762. E-mail: tiltons (at) ohio.edu

Received September 2, 2008; Accepted December 20, 2008


This paper examines the ability of online social networks to predict election outcomes in a connected society, in this case a university. Facebook represents a new phenomenon in networking within a university. These network constructs allow communication to occur rapidly and can influence the opinion of the student body. It is the conglomeration of previous information and communication technologies (ICTs) wrapped up under a simple graphical user interface (GUI) that allows the student body to communicate quickly and has allowed online social networks to dominate collegiate culture. Collegiate culture exists in a duality of the real world and this new online social network, and student governance is reflected in both of these realms. Student governance is as close to political power as most students get within the confines of the university, and it is just as complex as the network structure present in Facebook. As on Facebook, students within the collegiate experience must successfully navigate the internal network to survive and become leaders in the community. Given these similarities, the research question that frames the rest of the paper is: "Could Facebook be used to estimate the results of a student election?" To answer this question, the research used a hierarchical linear matrix based on the hierarchical linear model developed by Raudenbush & Bryk. The final analysis of the matrix showed that it correctly predicted the finishing place of 21 of the 27 candidates across the elections studied. It predicted the candidate's final percentage of votes received to within half the standard deviation of the Estimated Polling Percentage (.072722) for 12 of the 27 candidates.


Facebook; Elections; Political; Virtual engagement; Hierarchical linear model


In my current work, I am looking at online social networks and their impact on the overall social environment they serve. Facebook represents a new phenomenon in networking within a university. It is an "online social utility" that allows users within the network to observe and communicate with others within the network. This construct is a conglomeration of previous information and communication technologies (ICTs) wrapped up under a graphical user interface (GUI). It allows users to post messages as on a discussion board. Blogs are also built into the functionality of the utility. Photos can be easily uploaded and viewed. Groups are formed and populated through a simple linking process: one clicked link allows a user to join any group. One can even create "personas" within different groups online. The main persona acts as the avatar for the user and is defined by the profile the user creates. Profiles are multilayered constructs created by a multi-mediated process. The base layer of the profile is basic demographic information: hometown, birthday, major and college form the real-world structure that defines the individual. It is the other customizable categories within the system that create points of connection between the real life and the virtual life of the network. These points of connection come from the concept of perception leading to reality. Groups within the online social network create "sub-universes" that maintain a new social reality (James, 1890 via Flaherty, 1999, 3).

Online social networks have come to dominate collegiate culture. As of 2006, more than 7 million students from 2,600 colleges and universities used Facebook to interact with peers at their own university or at connected colleges, high schools, workplaces and/or geographic regions. Facebook, along with other online social networks, has allowed users to connect with others in ways that go beyond simple classroom interactions (Epstein, 2006). Users can express themselves through their pictures, their textual discussions with others on the network and their blogs. It is through this free flow of information that one creates one's status in the virtual world. One could argue that this position of status within the network is similar in nature to the development and maintenance of political power in the "real world."

Student governance is as close to political power as most students get within the confines of the university, and it is just as complex as the network structure present in Facebook. Student senate, one of the key units of student governance, exists within the overall environment of the twenty-first century university governance structure (Gayle, Tewarie & White, 2003). The impulse that drives change within the university conflicts with the rules, policies and budget of the university. University leadership must interact on multiple levels, from legislators down to individual departments, to maintain the cohesion of the college. As on Facebook, students within the collegiate experience must successfully navigate the internal network to survive and become leaders in the community. Given these similarities, the research question that frames the rest of the project is: "Could Facebook be used to estimate the results of a student election?"

This year has shown a need for a gauge of student opinion. Students have used both traditional and alternative methods to show their feelings towards the administration. Traditional research methods have their place in studying this social problem, and surveys are often used to measure student opinion. However, student surveys are often administered in the classroom or some other type of "laboratory setting," so they can fail to take into account the social environment of the university. A social network analysis can account for that environment by using an online social network as a proxy.

To make this research relevant to other research involving online social networks, it is important to understand the level of saturation the network has in the real-world environment of the university it serves. At the time of this writing, 71% of the undergraduate students who attend the university had a Facebook profile (Facebook, 2007). According to a survey I am currently conducting, about 46% of the students at this university who have a Facebook account would be considered active users of the network. Active users are classified as those who use the social network for more than 30 minutes in a given week and also post at least one piece of information onto the virtual network. Another point to consider when looking at how the network connects to the real world is the national adoption of the online social network. Since Facebook has a significant presence at many colleges and universities in the United States (over 250 colleges and universities have their own Facebook network), it seems that Facebook would represent a fair reflection of college life in the virtual realm.
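The active-user rule above can be expressed as a small predicate. This is a minimal Python sketch, not the survey instrument: the function names are hypothetical, counting posts per week is an assumption, and the sample respondents are invented for illustration rather than taken from the survey.

```python
def is_active_user(minutes_per_week, items_posted_per_week):
    """Active: more than 30 minutes of use in the week and at least one post."""
    return minutes_per_week > 30 and items_posted_per_week >= 1


def active_share(users):
    """users: list of (minutes_per_week, items_posted_per_week) pairs."""
    if not users:
        return 0.0
    return sum(is_active_user(m, p) for m, p in users) / len(users)


# Hypothetical respondents, not the survey data cited above.
sample = [(45, 2), (20, 5), (60, 0), (31, 1)]
print(active_share(sample))  # 0.5
```

Note that both conditions must hold: heavy use without posting, or posting with little use, does not count as active under this definition.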

Review of Literature

Online social networking has increasingly become a point of focus on most college campuses. Sites like Facebook (www.facebook.com), MySpace (www.myspace.com) and Friendster (www.friendster.com), among others, allow individuals to connect with others who share similar interests, hobbies and media tastes. A person's entry point to these networks is their profile page, which contains the pictures they post; a list of their favorite music, television shows and the causes they support; and demographic data, such as their name, birthday and hometown. Groups are created by these users, with a central theme revolving around one or two interests. The list of groups a person belongs to acts as "an avatar" that represents the individual's support system. This avatar is one of the symbols that other users searching the site can use to decide whether to include that person in their group of friends. In Facebook, the other users who search the site are students, faculty, staff and alumni from the same university or college. Individual colleges and universities are the breakpoints: the web site replaces the generic www prefix with the individual school's name. This was done to protect students' identity from outside searching (Jones & Soltren, 2005).

The representation of the individual on these sites is a point of concern for college and university administrations. The University of Michigan has required all student-athletes to sign a statement that they will present a positive image on social networking sites (Woo, 2006). Michigan believes that the online self should meet the same "high standard of honor and dignity" that is expected of the "real world" athlete. Its Department of Intercollegiate Athletics has laid out guidelines of consequences for bad behavior on Facebook.

However, in the process of researching this paper, I found limited discussion of Facebook's role in creating and reinforcing groups within the online environment. There is little academic work examining online social networks. A 2005 survey of academic community members found that 90% of undergraduates participated in a social network community, primarily Facebook, MySpace and Friendster, and many of the points of analysis in that research focused on the personal displays of information presented on the individual's profile (Stutzman, 2006). Donath and Boyd (2004) examine the decontextualized nature of Facebook, with personal reflection as the centerpoint of the analysis. But that research did not go beyond a simple analysis of personal space in online social networks.

Much of the research in the field focuses on the personal elements of space and not the "syntax of interaction" (Liu, Maes & Davenport, 2006). Current research looks mainly at the creation of virtual identity as the means of interacting within the virtual environment of Facebook. Borrowing from Sherry Turkle (2000), the mediated user has the means to create their own flow through their understanding of the online world and the social environment. There is a heavy focus on the personal development and profile creation of the individual. This theme undervalues one of the major reasons for the popularity of Facebook. Christine Hine (2003) looks at the individual in the context of the online environment: it is not just the creation of the profile that defines the individual in the online world, but how they position themselves in that world.

One of the key issues not fully studied in the current body of work is the notion that Facebook represents an offline-online dynamic. A criticism of the current body of work is that it tends to present Facebook in a vacuum. Few researchers look at "real-world" interaction for contextual clues about what is happening in the Facebook medium. Ellison, Steinfield and Lampe (2006) examine this dynamic in some detail when looking at the Facebook phenomenon. They briefly describe the attention given to students who joined certain groups (e.g., "Alliance for Unethical Journalism"): these students were placed in a negative light because they failed to account for a possible audience outside the campus, and the humor they were presenting was lost on that larger audience. When Facebook is examined through an academic lens, much of the research explores how users are embedded within the network. These networks are framed within a unique, geographically bound target audience. But the audience is not framed in the research as a connected network, merely as a series of intra-dependent, highly visible profiles within the network. The previous research also fails to explore the "fragmented self" (Turkle, 2000), that is to say the presentation of a concentration of parts of the individual within the social network. Many researchers treat the profile as the "authentic self," where what is displayed in the profile is the "true self."

When the online-offline dynamic is explored in the literature, the presentation is incomplete. The three main forces that typically affect the dynamics of social networks are size, feeling of community and relevance, and these forces are constantly in flux (Stutzman, 2006). Researchers tend to focus on the relevance aspect of social networking and use profile development as a proxy (e.g., Johns, 2006). Relevance should instead be related to the online-offline dynamic, with real-world interaction as the proxy for this force. Christine Hine (2000) looks at this dynamic through an ethnographic analysis of the trial of Louise Woodward, a British nanny charged with killing one of the children in her care. Much of her research looks at the formation of identity through a group dynamic. It is this level of research that is missing in current studies of online social networking.


The analysis in this research takes the form of a hierarchical linear model (HLM) (Raudenbush & Bryk, 1986). The model enables the complex information of "multi-level social processes" to be analyzed constructively through cross-tabulation of the relevant information, without ignoring key influences that can affect the overarching questions of the research. Parameter variance across all of the subgroups within the model should help not only to observe the differences between the groups but also to indicate the direction of the online social network. The correlational relationships will be expressed in a linear model that reflects the parameter and sampling variables in the research. The within-group model would indicate which candidate should win within a given group, while the between-group models would compensate for natural variations that occur within the online social network.
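For readers unfamiliar with the technique, the generic two-level form of an HLM in Raudenbush & Bryk's notation is sketched below. The paper does not report its own model equations, so this is only an illustrative assumption: $i$ indexes candidates, $j$ indexes position groups, $Y_{ij}$ is a candidate-level outcome such as vote share, $X_{ij}$ is a candidate-level metric and $W_j$ is a group-level characteristic.

```latex
\begin{aligned}
\text{Level 1 (within group):}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2 (between groups):}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
& \beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}
\end{aligned}
```

In this generic form, the level-1 equation plays the role of the within-group model described above, and the level-2 equations play the role of the between-group models that absorb variation across groups.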

This model addresses two key hypotheses that the researcher believes are the overarching themes dictating student governance and online social networking. These hypotheses are:

H1. Those who are most active in the realm of Facebook will be more likely to win a student election.

"Most active in the realm of Facebook" will be defined for the purpose of this research as the candidate who updates their status on Facebook the most, connects with others on the network the most and logs onto the network the most. There will be three vectors in the study used to test this hypothesis.

H2. Those who have a higher level of virtual engagement will be more likely to win a student election.

For virtual engagement, the research will look at the number of artifacts (textual and visual) that the candidate presents on the network, how many users react to those artifacts and in what way the users interact with them. There will be four vectors looking at virtual engagement in this study.

The link this study is trying to show is that candidates who are active in an online social network will be more likely to connect with voters in the real world, and that this connection should help the candidate get elected.

This model works under the assumption that the structure of the online social network acts as a proxy for the voting habits of the real-world population, and therefore the indicators present on the online social network will be the same as those in the real world. HLM will address these hypotheses within the structure of the model. The model will create an Estimated Polling Percentage for each candidate. Each candidate will act as an independent variable, while the metrics will act as the dependent variables.

To determine the Estimated Polling Percentage (EPP) of each candidate, a series of ten metrics was created. This level of complexity was developed to account for the fluctuations that exist within an online social network, and the metrics attempt to stabilize the data for the purpose of estimating voter response. The metrics were created by combining the proxies that exist in a social network (Barnes, 1962) and the proxies present in political polling (Wu & Weaver, 1997). They are:

  1. Total Membership (TM): This refers to the total membership of either the candidate's Facebook group page or the candidate's party Facebook page. This index is a proxy for the support structure behind the candidate. The Facebook group acts as a "virtual campaign office," and the number of members translates into a type of political capital.
  2. Unique Voter Membership Percentage (UVMP): This refers to the unique membership of either the candidate's Facebook group page or the candidate's party Facebook page. A unique voting member is a Facebook member who can vote for the candidate and is not a member of an opposing candidate's Facebook group. This metric is calculated by dividing the number of unique voters by the total membership of the Facebook group. When used in conjunction with the total membership index, this index acts as a pseudo-double weight for how well connected the candidate is with the rest of the virtual student body.
  3. Related Group Radius (RGR): This refers to the relationship defined by the degrees of connection present within the Facebook group page with regard to the friends in the candidate's profile. This index is the average number of the candidate's friends who are members of the five related groups associated with the candidate's group page(s). It is used as a proxy for the "distance" between two people on the network. Liben-Nowell et al. (2005) used a similar index in their analysis of LiveJournal, an online social network that uses blogs as the main method of connecting users.
  4. Present Group Artifacts (PGA): This refers to the textual postings and pictures present on the Facebook group page associated with the candidate and on the network within the past month. This is a proxy for the symbolic processing that occurs within the superstructure of the online social network. Those within the network who have "media literacy" and mastery of mediated skills via representational tools (e.g., textual persuasive rhetoric in the form of mass messages and/or pictorial narratives) can influence those connected to this node (the group page) on the network (Greeno, 1997).
  5. Degrees of Separation between Candidates (DSC): This is a statistic based on the average number of members who separate a candidate from their opponents, following J.A. Barnes's analysis of social networks. This metric is a proxy for the impact of the "less-bounded social systems" that exist within the online social network. It also observes how ties (chains of influence) affect individuals and their relationships with real-world events (Barnes, 1961). If there are only two candidates in a given group, this value will be the same for both candidates.
  6. Awareness of the Candidate (AC): This is the number of the candidate's friends who belong to the candidate's Facebook group divided by the candidate's total number of friends. This metric is a proxy for how aware people are of the candidate and of the fact that he or she is running for office. It is assumed that the friends connected to the candidate would be aware that the candidate is running for office, so the bigger point of the study is how many people who are not directly friends of the candidate are aware of the candidate and the fact that he or she is running. This metric compensates for candidates who have a large number of friends, which "concentrates" awareness of the candidate.
  7. Event Matrix (EM): This metric is used to determine the cross-relational data between the activities of the group that supports the candidate on Facebook and the impact of those activities in the real world. To quantify this measurement, an algorithm developed by Matczynski (2006) will be used. Matczynski used a multi-dimensional concept called "FeatureVector" to normalize variations within individual profiles, which allowed the researcher to compare profiles within the network. This distance metric can be modified to fit the context of this study and will be used to determine the real-world "reach" of events the candidate hosts online. The formula relating candidate $A$ to event $B$, given attendance across all of the events $X_{B_1} + X_{B_2} + X_{B_3} + \dots + X_{B_n}$, is $f(A) = 1 - \frac{\sum X_B}{\left(\sum X_B \cap \sum B_{A_n}\right)\left(\sum X_B \cup \sum B_{A_n}\right)}$.
  8. Trackbacks (TB): This metric measures the relevance of the candidate within the network and to the real-world collegiate environment. TrackBacks are traditionally the means by which different web sites post messages to one another, not just to inform each other about citations but also to alert one another to related resources. For this research, trackbacks are calculated from how often the candidate is referenced in the real world through online articles written about the candidate (weighted 2), about the party (.5), about the position (.75) or about an issue addressed by the candidate, e.g. the university's environmental policy (.2) (Marlow, 2004).
  9. Action During 24-Hour Period Before the Election (ABE): This metric looks at the amount of activity (adding friends, committing to events and posting) conducted by the candidate within the last 24 hours before the election. This is a proxy for last-minute campaigning.
  10. Virtual Engagement of the Candidate (VEC): This final metric is designed to replicate the development of a common language within the online social network, the personalities that exist on the network and how well the candidate is able to navigate the long, complex and embedded environment that is Facebook. The metric is a modified version of the Dreyfus metric of interacting with members on the network. Dreyfus suggests that the sense of community evident on the Internet is only a kind of residue left from embodied, linguistic, social processes (Dreyfus, 2001). Therefore, the modified Dreyfus formula (the posts created by the candidate, plus pictures of the candidate, plus pictures posted by the candidate, with this sum multiplied by the candidate's response rate to personal wall posts) is used to quantify the residue present on the network. Note that this metric relates only to the candidate's campaign and may not truly represent the candidate's virtual engagement within the entire online social network.
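Several of the metrics above reduce to simple ratios, weighted sums or graph distances. The following Python sketch shows one possible reading of metrics 2, 3, 5, 6, 8 and 10. All function names and input layouts are hypothetical, and the breadth-first-search reading of DSC (counting intermediaries on shortest friend paths) is an assumption rather than the author's documented procedure. The Event Matrix is left out because its set operations are underspecified in the text.

```python
from collections import deque
from statistics import mean

# Metric 8 weights, as listed in the text: candidate 2, party .5, position .75, issue .2.
TRACKBACK_WEIGHTS = {"candidate": 2.0, "party": 0.5, "position": 0.75, "issue": 0.2}


def unique_voter_membership_pct(members, opponent_members, eligible_voters):
    """UVMP: share of group members who can vote and are in no opposing group."""
    if not members:
        return 0.0
    unique = {m for m in members
              if m in eligible_voters and m not in opponent_members}
    return len(unique) / len(members)


def related_group_radius(common_friends_per_group):
    """RGR: mean count of the candidate's friends found in each related group."""
    return mean(common_friends_per_group)


def degrees_of_separation(graph, candidate, opponents):
    """DSC (assumed reading): mean number of intermediaries on the shortest
    friend path from the candidate to each reachable opponent, via BFS."""
    dist = {candidate: 0}
    queue = deque([candidate])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    gaps = [dist[o] - 1 for o in opponents if o in dist]
    return mean(gaps) if gaps else float("inf")


def awareness_of_candidate(friends_in_group, total_friends):
    """AC: candidate's friends who joined the group / candidate's total friends."""
    return friends_in_group / total_friends if total_friends else 0.0


def trackback_score(article_counts):
    """TB: weighted count of online articles, keyed by article subject."""
    return sum(TRACKBACK_WEIGHTS[topic] * n for topic, n in article_counts.items())


def virtual_engagement(posts, pictures_of, pictures_posted, wall_response_rate):
    """VEC: (posts + pictures of candidate + pictures posted) * wall response rate."""
    return (posts + pictures_of + pictures_posted) * wall_response_rate


# Hypothetical friendship graph: A - x - y - B
graph = {"A": ["x"], "x": ["A", "y"], "y": ["x", "B"], "B": ["y"]}
print(degrees_of_separation(graph, "A", ["B"]))  # 2 (intermediaries x and y)
```

Returning infinity when no opponent is reachable is one design choice among several; a real analysis would need to decide how disconnected candidates should count.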

To fit the data into the HLM, the ten metrics are first transformed into a series of Z-scores. This is done by treating everyone running in the student elections as the universe and computing Z-scores across all of the metrics, which allows for a larger number of cases and should narrow the variability within groups. Each Z-score is then transformed back into its corresponding percentile within its metric group. The candidate's modified Z-scores across all of the metrics are added together to create a virtual polling number (VPN), and the candidate's VPN is then divided by the sum of the VPNs in the candidate's position group. Position groups, for the purpose of analysis, are defined by the office the candidate is running for (i.e., those running for president are in one group). This quotient is the estimated polling percentage of the candidate. This figure should be the percentage of votes the candidate will receive on election day, plus or minus half the standard deviation of the EPP. Half of the standard deviation is used as the margin of error to correct for the transformation of the Z-score and to account for the "shift" within the matrix (Marlow, 2004).
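The aggregation steps above (Z-score, percentile, VPN, EPP, margin of error) can be sketched as follows. This is a minimal Python illustration under stated assumptions that the paper does not specify: Z-scores use the population standard deviation, each Z-score is converted to a percentile with the standard normal CDF, and the margin of error uses the population standard deviation of the EPPs. The function names and the example data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def z_scores(values):
    """Standardize one metric across every candidate in the election."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def estimated_polling_percentages(metrics, groups):
    """metrics: metric name -> raw values, one per candidate (same order as groups).
    groups: position-group label for each candidate.
    Returns each candidate's EPP: VPN divided by the group's VPN total."""
    to_percentile = NormalDist().cdf  # assumption: z-score -> percentile via normal CDF
    vpn = [0.0] * len(groups)
    for values in metrics.values():
        for i, z in enumerate(z_scores(values)):
            vpn[i] += to_percentile(z)
    group_totals = {}
    for g, v in zip(groups, vpn):
        group_totals[g] = group_totals.get(g, 0.0) + v
    return [v / group_totals[g] for g, v in zip(groups, vpn)]


def margin_of_error(epps):
    """Half the standard deviation of the EPPs (population SD assumed)."""
    return pstdev(epps) / 2


# Hypothetical example: two position groups with two candidates each.
metrics = {"TM": [120, 80, 40, 60], "AC": [0.30, 0.20, 0.10, 0.40]}
groups = ["president", "president", "treasurer", "treasurer"]
epp = estimated_polling_percentages(metrics, groups)
print([round(e, 3) for e in epp], round(margin_of_error(epp), 3))
```

Because each VPN is divided by its group total, the EPPs within a position group always sum to 1, which is what lets them be read as predicted vote shares.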


Data was recorded during a thirty-six-hour period before the student senate election at a given university, in a series of 1,340 .html files. Candidates were broken down by the position they were running for, whether they had a Facebook profile and whether they had a personal and/or party Facebook group page. The original n for the project was 27, as the body of study was the president, vice president, treasurer and anybody running for a college senator position. The other positions (green senators, off-campus senators and the members of the student activities commission) were not included in the matrix because they were deemed too massive for one person to study. Including those groups would have increased the n to 63 and could have added up to another 1,000 .html files to sort through.

The 27 candidates were broken down into three subgroups. First, there were three pairs within the matrix in which only one candidate had a Facebook profile; these were placed in subgroup C. The recorded data from the group pages were then cross-referenced against each of the 21 remaining candidates' profile pages. This second step was needed because access to some of the candidates' pages was limited until the researcher was added as a friend. During this step it was discovered that two candidates in two separate races still did not grant access to their profile pages. Therefore, four candidates (the two who did not grant access and their opponents) were placed into subgroup B. The remaining 17 were placed in subgroup A. In terms of creating an EPP, it was determined that subgroup A would reflect the virtual community more accurately than subgroup B, which in turn would reflect it more accurately than subgroup C.
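The subgroup assignment described above amounts to a decision rule applied to each race. The sketch below is a hypothetical Python rendering: the field names `has_profile` and `profile_visible` are invented for illustration, and the rule generalizes the paper's pairs to races of any size.

```python
def assign_subgroup(race):
    """Return the subgroup ('A', 'B' or 'C') shared by every candidate in one race.
    race: list of dicts with boolean 'has_profile' and 'profile_visible'
    entries (hypothetical field names)."""
    if not all(c["has_profile"] for c in race):
        return "C"  # some candidate in the race has no Facebook profile
    if not all(c["profile_visible"] for c in race):
        return "B"  # a profile stayed closed even after a friend request
    return "A"      # every candidate's data is fully observable


open_race = [{"has_profile": True, "profile_visible": True}] * 2
closed_race = [{"has_profile": True, "profile_visible": True},
               {"has_profile": True, "profile_visible": False}]
print(assign_subgroup(open_race), assign_subgroup(closed_race))  # A B
```

Checking for a missing profile before checking visibility mirrors the order of the steps in the text: subgroup C was separated out first, then subgroup B.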

After the data was compiled, the calculated raw data was placed into the matrix and cross-tabulated by candidate and subgroup (Table 1), and each candidate's virtual engagement metric was entered into the matrix. The data was then transformed into z-scores (Table 2) and finally into percentages (Table 3).

With respect to the hypotheses presented, the candidates with the highest level of virtual engagement won their respective races. The final analysis of the matrix showed that it predicted each candidate's finishing place 21 out of 27 times across all candidates, 19 out of 21 times in subgroups A and B, and 15 out of 17 times in subgroup A. It predicted the candidate's final percentage of votes received, within half the standard deviation of the EPP (.072722), 12 out of 27 times across all candidates, 12 out of 21 times in subgroups A and B, and 10 out of 17 times in subgroup A (Table 4).
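The two accuracy measures above (correct finishing place, and an EPP within the margin of error of the actual vote share) can be computed as in this Python sketch. The function names and sample numbers are hypothetical; only the margin of .072722 comes from the text. Tied scores share a place in this sketch, which is an assumption.

```python
def placements(scores, groups):
    """1-based finishing place of each candidate within its position group;
    a higher score means a better place, and tied scores share a place."""
    return [1 + sum(1 for h, t in zip(groups, scores) if h == g and t > s)
            for g, s in zip(groups, scores)]


def evaluate(epp, actual_share, groups, margin=0.072722):
    """Return (correct places, vote shares predicted within +/- margin)."""
    predicted = placements(epp, groups)
    actual = placements(actual_share, groups)
    place_hits = sum(p == a for p, a in zip(predicted, actual))
    share_hits = sum(abs(e - a) <= margin for e, a in zip(epp, actual_share))
    return place_hits, share_hits


# Hypothetical two-race example (not the study's data).
groups = ["president", "president", "treasurer", "treasurer"]
print(evaluate([0.60, 0.40, 0.45, 0.55], [0.55, 0.45, 0.60, 0.40], groups))  # (2, 2)
```

The two measures are independent: a candidate can be placed correctly while the predicted share misses by more than the margin, and vice versa.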

The research used the HLM to analyze the data. Therefore, it is important to note that the model can only be used to attempt to predict the ordinal results of the election. The method crafted by this research cannot be used to determine the significance of the results, nor can the effect size of the vectors be determined in this model. The only thing the model can establish is whether there is a correlation between the aggregation of the vectors and the overall result of the election. Based on the final analysis of the data, there is a .7778 level of correlation between the study's projected finish of the candidates and the actual final results of the election. When the data was fully observable for all of the candidates within the same group (subgroup A), the level of correlation increased to .8823. Therefore, it seems that this model represents a starting point for attempting to use online social networks to predict the elections of a connected social group (the university).
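As a check on the arithmetic, the two figures reported above coincide with the placement hit rates given in the preceding paragraph: 21/27 ≈ .7778 and 15/17 ≈ .8824 (reported as .8823, apparently truncated rather than rounded). The short Python snippet below recomputes every reported rate from the counts in the text; nothing beyond those counts is assumed.

```python
# Reported counts (correct, total) taken from the results paragraph.
reported = {
    "placement, all candidates": (21, 27),
    "placement, subgroups A and B": (19, 21),
    "placement, subgroup A": (15, 17),
    "vote share, all candidates": (12, 27),
    "vote share, subgroups A and B": (12, 21),
    "vote share, subgroup A": (10, 17),
}

rates = {label: round(hits / total, 4) for label, (hits, total) in reported.items()}
for label, rate in rates.items():
    print(f"{label}: {rate}")
```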

It is also important to note that, due to the open nature of the network, it was deemed necessary to anonymize the candidates, as those individuals fell under the protection of the institutional review board of the local university hosting the study. As part of the research, no names could be released in association with the publication of this research.


It would be very easy to discuss how this model was effective in picking out nearly ninety percent of the winners of this particular election. However, the research methodology needs more fine-tuning to strengthen this model. The HLM presented in this study should be considered only a starting point for other researchers to modify and manipulate. The vectors developed within this model were a best guess at how the political system works online. The results of a research project such as this could help uncover important essentials of the online community. What must be made clear is that this research is essentially a pilot study of how an online social network serving a connected social ecology can be used to determine the outcome of an election with the right combination of vectors, proxies and statistics. I believe that this research represents a good starting point, but it must be refined in order to be effective and more universal for other researchers studying the connection between online social networks and the social ecology they serve.


For the purpose of future research, three critical points should be noted. First, this matrix focuses solely on candidates with Facebook profiles, as opposed to election issues. The reason for this focus is the unstable nature of the online social network: artifacts made online exist for only a short period of time. The matrix can measure trends that develop during these short windows, but only as they relate to individual candidates and their respective parties. It is much harder to construct a matrix around an issue with a social network analysis.

Second, one of the metrics (Degrees of Separation between Candidates) was ineffective in measuring the "social separation" between the candidates as described by Barnes. A better metric should be developed to show the distance between the candidate and the student body: in terms of social distance, the further the candidate is from the student body, the less likely the student body is to vote for the candidate. Unfortunately, no proxy was found that better quantifies this phenomenon.

Third, it is important when using this model to have enough researchers to analyze the information. The more than 1,300 .html files were cross-referenced with the tables created for this project, and some of the information was coded based on its relationship to either the candidate or the party; in all, this project took well over eighty work hours to analyze the recorded data. Therefore, to be able to give a prediction before the election, one researcher must be assigned to every position group. If this ratio is met, the model should be able to be developed within a two- to four-hour period.



Table 1. Raw Scores and Grouping Data Matrix Virtual Polling Data

Table 2. Z-Score Data Matrix

Table 3. Percentage Data Matrix & Estimated Polling Percentage Results

Table 4. Final Data Analysis

Bibliographic information of this paper for citing:

Tilton, Shane (2008). "Virtual polling data: A social network analysis on a student government election." Webology, 5(4), Article 64. Available at: http://www.webology.org/2008/v5n4/a64.html


Copyright © 2008, Shane Tilton.