Webology, Volume 5, Number 1, March, 2008
Faculty of Information and Media Studies, University of Western Ontario, London, Ontario, Canada N6A 5B7. E-mail: iajiferu (at) uwo.ca
Received March 5, 2008; Accepted March 30, 2008
The objective of this exploratory study is to determine the delinking practices of webmasters of colleges and universities in Canada and the US. An online questionnaire was created and all 92 webmasters of Canadian colleges and universities were invited to complete it, while the invitation was extended to a sample of 500 of their counterparts in the US. A total of 123 webmasters responded to the survey, with about 88% of them indicating that they have had to remove external links from their web sites. According to most of the webmasters, delinking is done as often as required, with most removing 20 or fewer external links a year. Reasons for delinking a web site are categorized into obvious (e.g., broken/dead links, and content of linked site is no longer current or relevant) and unobvious (e.g., political/philosophical difference with the owner of the linked site, and request from the webmaster of the linked site). Measures to minimize delinking are discussed.
Delinking; Link analysis; Universities; Colleges; Canada; US
Informetrics is the quantitative study of information production, storage, retrieval, dissemination, and utilization (Wolfram, 2000). The major areas of study within informetrics used to include: statistical aspects of language; characteristics of authors; characteristics of publication sources; citation and co-citation analysis; scientific productivity indicators; information growth and obsolescence; and document/information resource usage (Tague-Sutcliffe, 1992). However, since the rise in popularity of the Internet, researchers have tried to extend the applications of informetrics methods to the World Wide Web. Such applications have often been referred to as webometrics (Bjorneborn & Ingwersen, 2001).
A major thrust of webometrics studies is the analogy between a citation to a paper-based publication and an inlink to a web site (i.e. a link pointing to the site, also called a "sitation") (Smith, 2004; Vaughan & Shaw, 2003; Prime, Bassecoulard & Zitt, 2002; Rousseau, 1997). This has led some researchers to extend the comparisons to: motivations for citing and those for hyperlinking (Kim, 2000; Thelwall, 2003; Wilkinson et al., 2003; Bar-Ilan, 2004; Kousha & Horri, 2004); co-citations among scholarly publications and co-sitations among similar web sites (Larson, 1996; Faba-Perez et al., 2004; Thelwall & Wilkinson, 2004); and journal impact factors and impact factors of web sites (Ingwersen, 1998; Smith, 1999; Thelwall, 2000; Vaughan & Hysen, 2002).
One major difference between citation and sitation is that once a scholarly publication is cited, it cannot be uncited, whereas a link to a web site can be removed. This phenomenon is known as delinking. The number of inlinks to a web site can increase the traffic to that site (Lohse & Spiller, 2003) and has also been factored into the algorithms used by some search engines, such as Google, to rank web pages retrieved in response to a query (Brin & Page, 1998). Hence, delinking can decrease the traffic to a site as well as lower its rank in a set of pages retrieved by a search engine.
The main objective of this study is to investigate the extent of this phenomenon which has not attracted much attention from webometricians. Apart from a linked site being dead, we would like to find out what the other reasons for delinking are. For example, in 2002, the Authors Guild urged its members to remove links on their web sites to Amazon.com because it was offering used editions of current books (Associated Press, 2002). It is not known if the members heeded this advice.
We shall be reviewing two sets of studies: those investigating the reasons for hyperlinking in the first place (before we then investigate why these links are later removed); and those investigating retraction of published articles, which is the closest phenomenon in print publication to delinking.
There have been many studies investigating the motivations for hyperlinking with most situated in the academic environment. These include: those classifying link creation motivations among schools in a discipline (Chu, 2005), among university web sites in a country or region (Thelwall, 2003; Wilkinson et al., 2003; Bar-Ilan, 2004) or from universities in one country to those in another country (Kousha & Horri, 2004); and those examining motivations for hyperlinking to open access journal articles or hybrid of research-related web sites (Smith, 2004; Kousha & Thelwall, 2006), or in electronic articles (Kim, 2000). Content/context analysis was used to determine motivations for hyperlinking in all these studies except in the one by Kim where authors of electronic papers were interviewed to determine why they included links to external sites. Also, though these studies used different classification schemes for the hyperlink motivations, a common theme in all of them is that while a large percentage of the links were made for teaching or research reasons, a good percentage were also made for administrative/professional service and navigational purposes (e.g., link to resources, directories, subject indexes, personal home pages, etc.).
One hyperlink motivation study that was conducted outside the academic environment was by Vaughan, Gao and Kipp (2006). They examined links to 280 North American information technology companies, and listed the motivations for linking as acknowledging corporate sponsors, link from customers, online directory, link from job advertisements, link to business partners, news articles about products or companies, list of products or clients or companies, and others (e.g. weblog, bookmark, product review, newsletter, etc.).
According to Nath, Marcus and Druss (2006), reasons for retracting articles could be categorized as misconduct (falsification, fabrication, or plagiarism) or unintentional error (mistakes in sampling, procedures, or data analysis; failure to reproduce findings; accidental omission of information about methods or data analysis). However, studies have shown that some retracted articles continued to be cited as valid work after being retracted (Budd et al., 1999; Neale et al., 2007), though at a reduced rate after retraction (Garfield & Welljams-Dorof, 1990; Pfeifer & Snodgrass, 1990; Whitley et al., 1994).
For this exploratory study, colleges and universities in Canada and the United States were chosen because of the availability of lists of their URLs. A list of URLs for American colleges and universities can be found at http://www.clas.ufl.edu/au/ while a similar one for the Canadian ones can be found at http://www.aucc.ca/can_uni/our_universities/index_e.html
An online questionnaire containing five simple questions was created on SurveyMonkey.com. Two separate collectors were also created, one to collect responses from American webmasters and one for Canadian webmasters. In the case of the Canadian web sites, each of the 92 sites was visited to obtain the e-mail address of the webmaster. A short e-mail message was then sent to these webmasters inviting them to fill out the online questionnaire. In a few cases, the web site did not have an e-mail address for the webmaster but an online form for comments/suggestions. In such cases, the form was used to send the invitation to the webmaster. In the case of the American web sites, it was not feasible to visit them all as there are over 1500 of them. Given that the sites are arranged in alphabetical order by the name of the university/college, a systematic sample of 500 web sites was taken. Each of these web sites was then visited for the e-mail address or form to be used in inviting the webmaster to fill out the online questionnaire.
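The systematic sampling step described above can be illustrated with a minimal sketch. The source does not report the sampling interval, the starting point, or the exact size of the US list, so the list of 1500 placeholder names, the random start, and the function name `systematic_sample` are all assumptions made for illustration.

```python
import random


def systematic_sample(items, sample_size, seed=None):
    """Pick evenly spaced items from an ordered list, from a random start.

    Hypothetical helper: the interval and start used in the study
    are not reported, so a fractional interval is assumed here.
    """
    n = len(items)
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and len(items)")
    step = n / sample_size              # sampling interval (may be fractional)
    rng = random.Random(seed)
    start = rng.random() * step         # random offset within the first interval
    # Clamp the index defensively against floating-point rounding.
    return [items[min(int(start + i * step), n - 1)] for i in range(sample_size)]


# Placeholder for the alphabetical list of US sites ("over 1500" in the text).
sites = [f"site-{i:04d}" for i in range(1500)]
sample = systematic_sample(sites, 500, seed=1)
```

Because the list is ordered alphabetically, evenly spaced selection spreads the sample across the whole alphabet, which a block of the first 500 names would not.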
Responses were collected over a one-month period, with 25 responses (i.e. a 27.2% response rate) received from Canadian webmasters and 98 (i.e. a 19.6% response rate) from the American webmasters.
There was not much disparity between the responses of the American webmasters and those of their Canadian counterparts. Hence, a combined analysis is presented below.
Of the 123 respondents, about 88% have had occasion to remove external links from their web sites (see Figure 1). Of the few that were yet to remove external links, two indicated that they were new at their posts and were not sure of the past delinking practice of their sites. Another webmaster indicated that as a general policy, they do not link to external sites except for ticket purchasing sites; hence, removing external links is not a common practice for them. One other webmaster mentioned that they often correct broken links but he could not recall ever having to remove a link after it had been approved and posted. Of those who have delinked before, about 59% remove an average of 20 or fewer external links in a year, about 2% remove more than 100 external links, while about 24% could not give an estimate (see Figure 2).
In terms of how often delinking is done, almost two-thirds of the webmasters claimed to do it as often as required, especially whenever notified of a problem with an existing link or when re-designing a site (see Figure 3). According to one webmaster, "our procedure is very casual for removing external links. We usually are working on a page or content for other reasons when we discover external links that should be removed." About 26% of the webmasters claimed to delink on a regular schedule, ranging from daily to annually, with monthly and quarterly being the two most popular frequencies.
As done in retraction studies, the reasons for delinking can be categorized into two, namely obvious (i.e. apparent), and unobvious. In the first category, and as expected, the number one reason given by webmasters for delinking is dead/broken/incorrect links (see Table 1). One pair of reasons has to do with the content of the site being linked to. About 66% of the webmasters (71 respondents) indicated that they remove links if the content is no longer relevant, especially if the content has been changed. The other reason is if the content is no longer current. According to one webmaster, "delinking is important to keep the site fresh and up-to-date. Like collection development in a library - get rid of an out-dated book and replace it with a more current one" while another stated that "most external links are related to a news story or an event. Once the event has passed, the link is removed." Another pair of reasons has to do with the linking page. Links are automatically removed if the linking page is removed, while links may also be removed if the content of the linking page is modified.
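The response rates quoted above follow directly from the counts of invitations and responses. A short check is sketched below; the combined rate is derived here from the same counts and is not a figure reported in the source, and the function name `response_rate` is illustrative.

```python
def response_rate(responses, invited):
    """Return the response rate as a percentage."""
    return 100.0 * responses / invited


canada_rate = response_rate(25, 92)              # Canadian webmasters
us_rate = response_rate(98, 500)                 # American webmasters
overall_rate = response_rate(25 + 98, 92 + 500)  # derived, not stated in the text

print(f"Canada: {canada_rate:.1f}%  US: {us_rate:.1f}%  Overall: {overall_rate:.1f}%")
```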
| Reason for Delinking | Number of Webmasters |
| --- | --- |
| Linked site is dead | 99 (91.7%) |
| Content of linked site is no longer current | 68 (63.0%) |
| Content of linked site is no longer relevant | 71 (65.7%) |
| Political/philosophical difference with owner of linked site | 15 (13.9%) |
| Linking page is removed | 53 (49.1%) |
| Linking page is modified | 55 (50.9%) |
In the unobvious category, we have "political or philosophical difference with the owner of the linked site", and those labelled "others" in Table 1. These other reasons include: if the linked site's design or content is found to be of poor quality; if the content of the site linked to may be construed as an endorsement/partnership where none exists; if requested to remove the link by the public relations department of the owner of the linking site; or if requested to remove the link by the owner/webmaster of the linked site.
In this exploratory study, we have established that webmasters, especially those at colleges and universities in Canada and the US, do delink on a regular basis, and that reasons for delinking include dead links, outdated sites, poor site design, and low-quality content. Given that most web search engines use the number of inlinks as one of the criteria in ranking search output, a site owner/webmaster should try to minimize the risk of the site being delinked by other sites by ensuring that the site is well designed, rich in content, and up-to-date, and that if the address of the site (or of any of its pages) changes, users are re-routed from the old address to the new address.
It should be noted that the reasons given above for delinking are not specific to a particular web site but pertain to the general practice of webmasters. To determine reasons for delinking from particular web sites (and possibly how frequently these sites are being delinked from), one may need to select a set of web sites and determine the external inlinks to each of them using a commercial search engine or a specially designed web crawler. Then, for a certain period of time, say 2 years, and at each interval, say monthly, the external inlinks to these sites will have to be determined again. For each site, the external inlinks for two successive periods will have to be compared to see the inlinks that have been added or removed. In the case of inlinks removed, i.e. delinks, it would first have to be ascertained if they are dead. If they are not, then an e-mail will have to be sent to the webmasters of these delinks to find out the reasons for delinking. The data collected from this kind of study can also be used to develop a model for the growth of inlinks to a site over time. It would be interesting to see if the rate of delinking is significant enough to warrant including a delinking parameter in such a model.
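The comparison of inlinks across successive periods proposed above could be sketched as a set difference between snapshots. This is only an illustration under stated assumptions: the snapshot labels and URLs are invented placeholders, the function names are hypothetical, and the later steps (checking whether a removed inlink is dead, and contacting webmasters) are not implemented.

```python
def diff_inlinks(previous, current):
    """Compare two snapshots of inlink URLs for one target site."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),    # new inlinks this period
        "removed": sorted(previous - current),  # candidate delinks
    }


def track_inlink_changes(snapshots):
    """Diff each snapshot against the one before it.

    `snapshots` is a chronological list of (label, inlink URLs) pairs.
    Returns a list of (label, added, removed) tuples.
    """
    changes = []
    for (_, prev), (label, curr) in zip(snapshots, snapshots[1:]):
        d = diff_inlinks(prev, curr)
        changes.append((label, d["added"], d["removed"]))
    return changes


# Invented monthly snapshots of pages linking to one hypothetical site.
snapshots = [
    ("2008-01", {"http://a.example/links", "http://b.example/resources", "http://c.example/"}),
    ("2008-02", {"http://a.example/links", "http://c.example/", "http://d.example/partners"}),
    ("2008-03", {"http://a.example/links", "http://d.example/partners"}),
]
changes = track_inlink_changes(snapshots)
```

Each "removed" entry is only a candidate delink: the linking page may itself be dead, which is why the procedure described above checks that before asking the webmaster for a reason.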
Another issue related to delinking that is worth investigating in the future is the stability of authority sites. In a field of study, a site is considered an authority if it is pointed to by many other sites (Kleinberg, 1999). Based on this criterion, one could produce a ranked list of authority sites in a field. In the face of delinking, it would be interesting to study the stability of the ranks of authority sites in a field over time.
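One possible way to quantify the stability of such a ranked list, offered only as a sketch, is to rank the same sites by inlink count at two points in time and compute a rank correlation. Spearman's coefficient is chosen here for illustration and is not a method named in the source; the inlink counts are invented, and ties are broken alphabetically as a simplification.

```python
def rank_sites(inlink_counts):
    """Rank sites from most to fewest inlinks (1 = top authority).

    Ties are broken alphabetically by site name, a simplifying assumption.
    """
    ordered = sorted(inlink_counts, key=lambda s: (-inlink_counts[s], s))
    return {site: pos for pos, site in enumerate(ordered, start=1)}


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same sites."""
    if ranks_a.keys() != ranks_b.keys():
        raise ValueError("both rankings must cover the same sites")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two sites")
    d_squared = sum((ranks_a[s] - ranks_b[s]) ** 2 for s in ranks_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented inlink counts for four hypothetical sites at two points in time.
counts_t1 = {"a.example": 120, "b.example": 95, "c.example": 60, "d.example": 40}
counts_t2 = {"a.example": 110, "b.example": 50, "c.example": 70, "d.example": 45}

rho = spearman_rho(rank_sites(counts_t1), rank_sites(counts_t2))
```

A value near 1 would indicate that the authority ranking changed little between the two periods, while lower values would suggest that delinking (or new linking) reshuffled it.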