Webology, Volume 5, Number 3, September, 2008


Tag Gardening for Folksonomy Enrichment and Maintenance


Isabella Peters
Research Scholar, Institute for Language & Information, Department of Information Science, Heinrich-Heine-University Duesseldorf, Germany, Universitaetsstr. 1, 40223 Duesseldorf, Germany.
E-mail: isabella.peters (at) uni-duesseldorf.de

Katrin Weller
Research Scholar, Institute for Language & Information, Department of Information Science, Heinrich-Heine-University Duesseldorf, Germany, Universitaetsstr. 1, 40223 Duesseldorf, Germany.
E-mail: weller (at) uni-duesseldorf.de

Received August 7, 2008; Accepted September 25, 2008


Abstract

As social tagging applications continuously gain in popularity, it becomes more and more accepted that models and tools for (re-)organizing tags are needed. Some first approaches are already practically implemented. Recently, activities to edit and organize tags have been described as "tag gardening". We discuss different ways to subsequently revise and reedit tags and thus introduce different "gardening activities"; among them models that allow gradually adding semantic structures to folksonomies and/or that combine them with more complex forms of knowledge organization systems. Moreover, power tags are introduced as tag gardening candidates and the personal tag repository TagCare is presented.

Keywords

Social tagging; Folksonomy; Tag gardening; Emergent semantics; Power tags; Tagcare; Knowledge organization system; Knowledge representation; Personomy



Introduction

This article is an extended and revised version of the paper presented in (Weller & Peters, 2008). It is primarily a discussion paper that provides an overview of techniques which can be applied to optimize tags in a folksonomy data set, followed by our own suggestions for addressing the problems introduced.

Social tagging functionalities are by now a common feature of most social software applications (e.g., video or photo sharing platforms, social networking and social bookmarking tools). Folksonomies are used to organize various types of resources such as scientific articles (Hotho et al., 2006b), references, books (Spiteri, 2006), bookmarks, pictures (Beaudoin, 2007), videos (Paolillo & Penumarthy, 2007), audio files, blog posts (Brooks & Montanez, 2006), discussions, events, places (Kennedy et al., 2007), people (Farrell et al., 2007) etc. They have been widely accepted by (Web) users as well as by a considerably large scientific community - although several shortcomings of folksonomies have been pointed out (Peters, 2006; Peters & Stock, 2007). These critiques are mainly based on comparisons of folksonomies with traditional knowledge organization systems (KOS, like thesauri, classification systems etc.) and professional indexing techniques. Yet, the boundaries between structured KOS and folksonomies are not at all solid but rather blurred. This means, among other things, that folksonomies can adopt some of the principal guidelines available for traditional KOS and may gradually be enriched with some elements of vocabulary control and semantics. On the other hand, folksonomies provide a useful basis for the stepwise creation of semantically richer KOS and for the refinement of existing classifications, thesauri or ontologies (Weller, 2007).

One of the basic questions regarding the enhanced use of folksonomies is: how to combine the dynamics of freely chosen tags with the steadiness and complexity of controlled vocabularies? It appears that a gradual refinement of folksonomy tags and a stepwise application of additional structure to folksonomies is a promising approach. Some platforms already provide different features to actually manipulate, revise and edit folksonomy tags (e.g., Delicious' tag bundles or the search engine RawSugar; Begelman et al., 2006). Theoretical approaches for structural enhancement of folksonomies are discussed under such diverse headlines as "emergent semantics" (Zhang et al., 2006), "ontology maturing" (Braun et al., 2007), "semantic upgrades" or "semantic enrichments" (Angeletou et al., 2007).

Lots of research in this regard currently deals with developing different algorithms to restructure folksonomies automatically (Grahl et al., 2007; Brooks & Montanez, 2006; Begelman et al., 2006). Jäschke et al. (2006) discuss the implementation of automatically mined tag recommendations for information indexing and information retrieval. The examined tagging systems are BibSonomy (Hotho et al., 2006b), a social bookmarking service for scientific articles and URLs, and Last.fm, a collaborative filter system for music recommendations. The tag recommendations are either based on related tags determined by the FolkRank algorithm (Hotho et al., 2006a) or they are based on the most popular tags of the whole database. The evaluation of both the indexing and search results shows that "the more tags of the recommendation are regarded, the better the recall and the worse the precision will be. [. . .] Finally, using the most popular tags as recommendation gives very poor results in both precision and recall" (Jäschke et al., 2007, 511).

Laniado et al. (2007a; 2007b) analyse the impact of query expansion via WordNet synsets. Automatically determined WordNet branches of related terms for the query tag(s) are presented to the user, which he can then choose manually for query expansion. The evaluation of this approach shows that problems occur with query expansion via WordNet and the search results. As only 8% of the used index tags can be found in WordNet (Laniado et al., 2007b, 194) the search engine is not able to retrieve relevant documents because the comparison of index tags and search tags (from the user and from WordNet) is impossible. This problem is due to the high variability in building compound terms. Nevertheless, the authors found that query expansion via WordNet is very helpful for merging synonyms (Laniado et al., 2007b, 200). Al-Khalifa and Davis (2007b) discover that folksonomies are very appropriate sources for the extraction of new terms which can then be implemented in existing ontologies. The authors state that this is due to the "latent (implicit) semantics embedded in the tags" (Al-Khalifa & Davis, 2007b). A subsequent study of Al-Khalifa and Davis (2007a) can verify this assumption. They found out that the following semantic relations exist between tags: 1) same (spelling variants or acronyms), 2) synonymy, 3) broader term, 4) narrower term, 5) related term (of a comparable thesaurus descriptor), and 6) related (an unspecific relation). The knowledge about these relations can be used for the refinement and expansion of ontologies or KOS. Angeletou et al. (2007) mostly confirm the findings of Al-Khalifa and Davis (2007a). And Kipp (2006c) can demonstrate that relations between tags do exist which were defined as associative relations in KOS, but were valuable sources for more fine-grained semantic relations in both folksonomies and KOS.

Another study by Al-Khalifa et al. (2007) reports the development of the tool "FolksAnnotation", which is used for information indexing and information retrieval. The tool provides the user with tag suggestions based on terms and relations of existing ontologies, which he can use during indexing and searching. The evaluation of this system leads the authors to the following statement: "These results demonstrate that semantic search outperforms folksonomy search in our sample test, this is because folksonomy search, even if the folksonomy keywords were produced by humans, is analogous to keyword search and therefore limited" (Al-Khalifa et al., 2007). Finally, Sen et al. (2006) show that tagging users can be influenced by the tags which are already assigned to a resource. Thus, users can be steered in a specific direction by providing particular tags or by suggesting a particular level of tag specificity. The authors conclude: "a new tagging system might be seeded by its designers with a large set of tags of the preferred type. Our results suggest that users would tend to follow the pre-seeded tag distribution" (Sen et al., 2006, 190).

The results of the few former studies show that tag manipulation activities support users during indexing and searching and that structured folksonomies are able to enhance recall but fail in enhancing the precision of search results. This is primarily due to the lack of linguistic processing of the tags which has to be performed in advance of the semantic disambiguation of tags. Another problem of the automatic development or extraction of tag relations is the differentiation of the various associative relations or the allocation of somehow related tags. We discuss "tag gardening" as a mainly manual activity, performed by the users to manage folksonomies and gain better retrieval results, which can be supported by certain automatic processes.

Tag Gardening - Revision and Maintenance of Folksonomies

The image of "tag gardening" has been introduced in a blog post by James Governor (Governor, 2006). By now it is used to describe processes of manipulating and re-engineering folksonomy tags in order to make them more productive and effective. Along the lines of this, we now specify different "gardening activities" which are relevant for the maintenance of folksonomies and their effective usage in the course of time. These activities are to some extent based on common procedures for building classical KOS (e.g., Aitchison et al., 2004).

To discuss the different gardening activities, we first have to imagine a document collection indexed with a folksonomy. This folksonomy now becomes our garden, each tag being a different plant. Currently, most folksonomy gardens are rather savage: different types of plants all grow wildly (see Figure 1). Some receive high attention, others almost none. Some are useful for the community and retrieval tasks, others are not, as they are highly personal or rather inappropriate for indexing purposes (e.g., the tag "me" or tags with spelling mistakes). Indeed, folksonomies have been criticized for being a "mess" (Tanasescu & Streibel, 2007). First approaches to make them more easily accessible and navigable are, for example, tag clouds, computations of related tags by co-occurrence, or tag recommendations.

Figure 1: Tag cloud (source: www.flickr.com) and savaged folksonomy garden

In the long run, improvement of folksonomies will be needed on different levels: (a) Document collection vs. single document level: should the whole collection of all tags of a folksonomy be edited in total, or does one only want to change the tags of a single document. (b) Personal vs. collaborative level: We may distinguish tag gardening performed individually by single users for the personal tags they use within a system (personomy level; Hotho et al., 2006a), and situations that enable the whole user community of a certain platform collectively or collaboratively to edit and maintain all tags in use (folksonomy level). (c) Intra- and cross-platform level: Usually, a folksonomy is defined as the collection of tags within one platform or system. Yet, for some cases the use of consistent tags across different platforms will be useful.

Basic Formatting

One basic problem of folksonomies is that there is no guarantee for correct spelling or consistent formatting of tags (Guy & Tonkin, 2006). The very first activity in tag gardening would thus be weeding: Tag weeding is the process of removing "bad tags" (e.g., tags with spelling mistakes). Elimination of spam tags (Heymann et al., 2007) should be the simplest form of tag weeding, and can probably even be performed automatically (to keep the image of gardening, this automatic spam removal could be characterized as using pesticides). As in real gardens, the identification of weed is not always easy. For example, one has to consider which tags may be removed from the whole folksonomy (e.g., cuss words) and which should only be removed manually from certain documents.

Furthermore, due to the nature of most folksonomy tools which do not allow adding multi word concepts as tags, we end up with inconsistent makeshifts such as "semanticweb", "semanticWeb" or "semantic_web" (Tonkin, 2006). In this case, to make the folksonomy more consistent, a decision would be needed about which forms are preferred, and which should be removed as weed (of course on the document level, these tags should not be removed completely but replaced by the preferred terms - the same also holds for typing errors). In social tagging applications it seems more feasible to provide some general formatting guidelines to the tagging community in advance, than to manually re-edit tags. Alternatively, we may treat these spelling variants as synonyms (see below).
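The detection of such spelling variants can be supported automatically. The following is a minimal sketch, assuming that variants differ only in letter case and in the separators used between words; the function names `variant_key` and `group_variants` are hypothetical and chosen for illustration only. Deciding which form becomes the preferred one remains a manual gardening task.

```python
import re
from collections import defaultdict


def variant_key(tag: str) -> str:
    """Collapse case and common word separators so that variants share one key."""
    # Assumption: whitespace, underscores and hyphens only separate words.
    return re.sub(r"[\s_\-]+", "", tag).lower()


def group_variants(tags):
    """Group raw tags by their normalized key, keeping first-seen order."""
    groups = defaultdict(list)
    for tag in tags:
        key = variant_key(tag)
        if tag not in groups[key]:
            groups[key].append(tag)
    return dict(groups)


raw_tags = ["semanticweb", "semanticWeb", "semantic_web", "folksonomy"]
groups = group_variants(raw_tags)

# Groups with more than one member are candidates for choosing a preferred form.
candidates = {key: forms for key, forms in groups.items() if len(forms) > 1}
print(candidates)
```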

Similar problems arise with the handling of different word forms, e.g. singular and plural forms or nouns and corresponding verb forms. For example, the reduction to only singular nouns may be useful for enhancing recall e.g. in publication databases (if both singular and plural forms are allowed one would miss documents tagged with "thesauri" if searching for "thesaurus"), while it would bring about loss of information in other cases, e.g. for photo databases (where one may for example explicitly want to look for a photo showing more than one cow and would therefore need the plural "cows").

Such formatting problems can be addressed automatically with methods of Natural Language Processing (NLP) (Peters & Stock, 2007). Here, problems may arise with user generated tags which cannot be found in (multi-lingual) thesauri or dictionaries. Laniado, Eynard, and Colombetti (2007b) report that "only about 8% of the different tags used are contained in the lexicon" (Laniado, Eynard, & Colombetti, 2007b, 194). These unknown tags cannot be directly edited with methods of NLP. Their processing should be handed over to the users themselves. Additionally, a folksonomy based system would profit from editing functionalities which allow users to delete or edit the tags assigned to single documents and (carefully) remove certain tags from the whole system. Furthermore, some formatting guidelines may be provided to or discussed by the users (particularly for closed user groups like in corporate tagging systems) - although it has to be kept in mind that this is somehow contradictory to the intention of "free for all" tagging systems. Generally, tag gardening activities should be mainly established on top of existing folksonomies and not in advance.

Tag Popularity

Tag clouds are a common entry point to folksonomies. They display the most popular tags in different font sizes according to their degree of popularity.

In some cases, the most popular tags are too general to render precise and useful retrieval results, e.g. ending up in enormously large hit lists (Paolillo & Penumarthy, 2007; Muller, 2007; Kipp, 2006b). In this case, it might be necessary to explicitly seed new, more specific tags into the tag garden. These little seedlings will sometimes require specific attention and care, so that they do not get lost among the bigger plants. An inverse tag cloud (showing rarely used tags in bigger font sizes) can be used to display some very rarely used tags and provide an additional access point to the document collection. First approaches of other visualisations of tag clouds are provided in (Hassan-Montero & Herrero-Solana, 2006), where the authors propose similarity calculations (e.g., Jaccard-Sneath Coefficient and Vector Space Model) for visualising semantic neighbourhood of tags. This new display should enhance browsing facilities of tag clouds.
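The difference between a regular and an inverse tag cloud can be illustrated with a small sketch. It assumes a simple linear mapping from tag frequency to font size; the function name `font_sizes`, the point sizes and the tag counts are hypothetical and not taken from any of the cited systems.

```python
def font_sizes(tag_counts, min_pt=10, max_pt=32, inverse=False):
    """Map tag frequencies linearly to font sizes; invert to highlight rare tags."""
    lowest, highest = min(tag_counts.values()), max(tag_counts.values())
    span = highest - lowest
    sizes = {}
    for tag, count in tag_counts.items():
        # Relative popularity in [0, 1]; use 0.5 if all tags are equally frequent.
        weight = (count - lowest) / span if span else 0.5
        if inverse:
            weight = 1.0 - weight
        sizes[tag] = round(min_pt + weight * (max_pt - min_pt))
    return sizes


counts = {"web2.0": 120, "blog": 60, "tagcare": 2}  # invented example data
regular = font_sizes(counts)
inverted = font_sizes(counts, inverse=True)
print(regular)
print(inverted)
```

In the inverted variant the rarely used tag receives the largest font size, so that newly seeded tags do not disappear behind the most popular ones.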

Vocabulary Control, Tag Clustering and Hierarchical Structures

After the formatting problems have been solved, the actual challenges of the "vocabulary problem" (Furnas et al., 1987) begin: In folksonomies (a) synonyms are not bound together (thus, someone searching a photo-portal for pictures of "bicycles" would also have to use the tags "bike" and probably even translations to other languages for comprehensive searching and higher recall); (b) homonyms are not distinguished (searching for "jaguar" will retrieve pictures of the animal as well as the car); and (c) there are no explicit relations to enable semantic navigation between search- or index-terms (e.g., a search for photos of "cats" cannot automatically be broadened to include "siamese", "european shorthair", "birman" etc.). This lack of vocabulary control is the price for facile usability, flexibility and representation of active and dynamic language. Yet, the additional and subsequent editing of folksonomies may be the key for allowing free tagging as well as basic control functionalities over the vocabulary in use. Folksonomy users become more and more aware of these effects - which is the basis for introducing gardening techniques to enable the user to improve their tags.

For our garden this means that we have some plants that look alike but are not the same (homonyms), some plants which can be found in different variations and are sometimes difficult to recognize as one species (synonyms), and others which are somehow related or should be combined. Thus, we have to apply some garden design or landscape architecture to tame our savage garden. We may use labels for the homonyms, and establish flower beds as well as paths between them and pointers or sign posts to show us the way along the synonyms, hierarchies and other semantic interrelations (see Figure 2). We need some additional structure and direct accessibility to provide additional forms of (semantic) navigation (besides tag clouds, most popular tags and combinations of tags-user-document co-occurrences).

Figure 2: Tag Garden with nicely arranged flower beds aka collection of synonyms and division of homonyms

Within classical KOS, homonyms are often distinguished by additional specifications (e.g., "bank (finance)" vs. "bank (river)") or unique identifiers (e.g., notations in classification systems). Synonyms can be interlinked to form a set of synonyms; sometimes "preferred terms" are chosen which have to be used exclusively to represent the whole set. In folksonomies, synonyms can be detected by comparison with thesauri or lexical databases. The tag gardening system may check the tags in the background; in case of a match it informs the user that he currently uses several synonyms during indexing and asks him whether they should be bound together. A preferred tag may subsequently be chosen by the user, which he then uses constantly and solely for indexing - while the system has noticed and saved the synonym connection. The advantage of this approach is that the user only uses the preferred tag for indexing but the system may add the identified synonyms to the resource. In this way, the user indexes consistently within his personomy but simultaneously allows other users to access the document via several tags.
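The binding of synonyms to a preferred tag can be sketched as a small data structure. This is only an illustration under simplifying assumptions: the class name `SynonymRing` and its methods are hypothetical, the synonym sets are entered by hand rather than detected via a thesaurus, and each tag belongs to at most one set.

```python
class SynonymRing:
    """Stores user-confirmed synonym sets together with their preferred tag."""

    def __init__(self):
        self._preferred = {}  # any member tag -> preferred tag
        self._members = {}    # preferred tag -> set of all bound tags

    def bind(self, preferred, *synonyms):
        """Record that the given synonyms belong to the preferred tag."""
        members = self._members.setdefault(preferred, {preferred})
        members.update(synonyms)
        for tag in members:
            self._preferred[tag] = preferred

    def index_tags(self, user_tag):
        """Tags the system attaches to a resource when the user assigns user_tag."""
        preferred = self._preferred.get(user_tag, user_tag)
        return sorted(self._members.get(preferred, {preferred}))


ring = SynonymRing()
ring.bind("bicycle", "bike", "velo")

# The user indexes with the preferred tag only; the system adds the synonyms.
print(ring.index_tags("bicycle"))
print(ring.index_tags("jaguar"))  # unbound tags are stored unchanged
```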

Some folksonomy systems already provide functionalities to derive "clusters" and "related tags", which mainly rely on information about co-occurrence and term frequencies. For example, the photo-sharing community Flickr provides a clustering function to distinguish homonyms (e.g., http://www.flickr.com/tags/jaguar/clusters). Much research concentrates on automatic clustering and different clustering algorithms (Begelman et al., 2006; Schmitz, 2006; Grahl et al., 2007). Methods for automatically distinguishing homonyms - sometimes also referred to as "tag ambiguity" - in folksonomies by context information (users, documents, tags) are also being developed (Au Yeung et al., 2007). Even automatic approaches for "converting a tag corpus into a navigable hierarchical taxonomy" (Heymann & Garcia-Molina, 2006) can be found.

Besides these automatic approaches, options for individual manual manipulation of tags are needed. This is particularly useful for personal tag management, where categories, taxonomies and cross-references of tags can be built and maintained for individually customized resource management. Delicious already offers a simple model for grouping different tags manually under different headlines, called "tag bundles".

Folksonomies typically include implicit relations between tags (Peters & Weller, 2008) which should be made explicit in order to obtain gradually enriched semantics. A first approach could be the "tagging of tags" and their interrelations as discussed by Tanasescu and Streibel (2007).

Interactions with other Knowledge Organization Systems

Some of the problems discussed above can also be approached by combining Folksonomies with other, more complex Knowledge Organization Systems which then act as fertilizers.

Behind the scenes of a folksonomy system, thesauri or ontologies may be used for query expansion and query disambiguation (Au Yeung et al., 2007). Search queries over folksonomy tags may be (automatically) enhanced with semantically related terms, derived e.g. from an ontology. For example, WordFlickr expands query terms with the help of relational structures in the WordNet Thesaurus to perform enhanced queries in the Flickr database (Kolbitsch, 2007). Users submitting a query to WordFlickr may choose which types of relations (e.g., synonyms, hypernyms, hyponyms, holonyms or meronyms) should be used for expanding the query. Thus, if a user searches for "shoes" the query may be expanded with the hyponyms "slippers" and "trainers" to retrieve pictures tagged with these subtypes of shoes from the Flickr collection.
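The principle of relation-based query expansion can be shown with a toy example. The sketch below does not use WordNet or the WordFlickr implementation; it assumes a hand-made thesaurus dictionary and a tiny invented photo index, and the names `TOY_THESAURUS`, `expand_query` and `search` are hypothetical.

```python
# Hand-made stand-in for a thesaurus; relation names follow the text above.
TOY_THESAURUS = {
    "shoes": {
        "hyponyms": ["slippers", "trainers"],
        "hypernyms": ["footwear"],
    },
}


def expand_query(term, relations, thesaurus=TOY_THESAURUS):
    """Return the query term plus all terms reachable via the chosen relations."""
    expanded = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def search(index, terms):
    """Return resources tagged with at least one of the terms (Boolean OR)."""
    wanted = set(terms)
    return sorted(res for res, tags in index.items() if tags & wanted)


photo_index = {
    "photo1": {"slippers", "home"},
    "photo2": {"shoes", "red"},
    "photo3": {"cat"},
}

plain_hits = search(photo_index, ["shoes"])
expanded_hits = search(photo_index, expand_query("shoes", ["hyponyms"]))
print(plain_hits, expanded_hits)
```

Letting the user pick the relation types, as in the `relations` argument, keeps control over how far the query is broadened.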

Furthermore, an ontology can be used for the tag recommendation process (which is by now based on co-occurrences). In an ontology-based approach, the nature of the suggested tags could be made explicit, which would help the user to judge their appropriateness. For example, if a user types the tag "Dublin", an ontology-based system might also suggest using the broader terms "Dublin County Borough" and "Ireland" or the synonym "Baile Átha Cliath"; another user choosing the tag "folksonomy" might be provided with the information that a folksonomy is_used_by "social software" and can then decide whether this tag should be added (Weller, 2007).

Fertilizing folksonomies with existing KOS is a promising approach to enable semantic enrichment. The key factor for success will be the availability of enough appropriately structured vocabularies. Angeletou et al. (2007) are developing algorithms to automatically map folksonomy tags to ontologies which are currently available on the Web to make semantic relations between tags explicit: "we can already conclude that it is indeed possible to automate the semantic enrichment of folksonomy tag spaces by harvesting online ontologies" (Angeletou et al., 2007).

Distinguishing Different Tag Qualities and Purposes

Another peculiarity of folksonomies is that tags can be intended to fulfil different purposes. We do not only find tags referring to the documents' content, but also to its author, origin, data format etc., as well as tags which are intended for personal (e.g., "toread") and interpersonal (e.g., "@peter") work management (Kipp, 2006a). Currently, all these tags are handled without distinction in folksonomy systems. That means, in our garden we have all different plants used for different purposes wildly mixed up (e.g., economic plants mixed with ornamental plants and medical herbs) - which of course makes it hard to find exactly what we need. Thus, our garden would need some additional structuring and - most of all - labelling. We need a way to distinguish the different tag qualities and label the tags accordingly (we may even decide to have different gardens, one for agriculture next to a flower garden and a herb garden and probably even a vineyard). While the clusters and hierarchies as discussed above focus on the meaning of tags and should represent some kind of real-world knowledge in a structured way, this additional level regards the different purposes of tags. In practice, this distinguishing of different tag qualities might be done in the form of facets, categories or fields.

For each document different fields may be provided for tagging according to the different tag functionalities, e.g. one for content-descriptive tags, one for formal tags (or more specific one for the author, one for the document's file type, etc.), and one for organizational tags (e.g., task organization, reference to projects). Alternatively, complex naming conventions could be established to specify the purpose of non-content-descriptive tags. Certain conventions are already coming up to use specific formats for labelling different tag purposes (like the "@" in "@name"-tags which are attached to documents to be forwarded to a colleague or friend).

As manually adding this information increases the cognitive costs for users, a (supportive) automatic extraction of tag purposes is preferable. Some purposes may be automatically derived from the tags' syntax. The "@" is a strong indicator of interpersonal work management since it indicates that the tagged resource is intended for a particular person. Tag combinations of "to+verb" are indicators of personal task management and can be interpreted as future activities.
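Such syntax-based rules could look roughly as follows. This is a minimal sketch: the function name `tag_purpose`, the category labels and the short verb list `ASSUMED_VERBS` are assumptions made for illustration. A verb list (or a proper lexical resource) is needed because a naive "starts with to" rule would also match tags such as "tools" or "tokyo".

```python
import re

# Hypothetical verb list; a real system would consult a lexical resource.
ASSUMED_VERBS = {"read", "do", "buy", "watch", "write", "check"}


def tag_purpose(tag: str) -> str:
    """Guess the purpose of a tag from its syntax alone."""
    # "@name": the resource is meant for a particular person.
    if tag.startswith("@") and len(tag) > 1:
        return "interpersonal"
    # "to" + verb (optionally separated by "_" or "-"): a future activity.
    match = re.fullmatch(r"to[_\-]?([a-z]+)", tag.lower())
    if match and match.group(1) in ASSUMED_VERBS:
        return "personal-task"
    return "unclassified"


for example in ["@peter", "toread", "to_read", "tools", "web2.0"]:
    print(example, "->", tag_purpose(example))
```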

At this stage, we have our garden weed-free, with nicely arranged flower beds and walking paths between different areas and now with different areas for differently used plants plus little information panels providing information about what the plants can be used for.

Personal and Cross-Platform Tag Maintenance

Manual Tag gardening is rather difficult to be done collaboratively by a community and within a shared tag collection - particularly if the system is not explicitly collaborative (the community is aiming to agree on a shared representation of a topic, like in Wikipedia) but rather profits from the collective (the representation of the topic is the sum of primarily individual ideas) participation of a community (Vander Wal, 2008).

At least for small communities (e.g., single working groups) the use of shared guidelines for tagging behaviour may be useful. In the context of small groups, collaborative tag gardening tools may be considered. If specific communication channels are provided, the community may agree on a shared structure for their tagging vocabulary. Yet, there is always the danger that one user destroys the arrangements another one has made, or that someone regards certain tags as weed while others consider them pretty flowers. Thus, in larger communities the manual editing and weeding of tags should rather be done on a personal level by each user or should be carried out for the community by some sort of 'folksonomy administrator'. This administrator or moderator could be represented by the owner of the tagging system or by several users who gained the administrator status because of providing high-quality content or contributing meaningfully. They would act as central gardeners within the tag garden and are responsible for any kind of gardening activity. In this way, tag gardeners (whether the user himself or the administrator) enhance the performance of the tags. The collection of these individual approaches can be used for computing and providing tag recommendations - both during the tagging and the information retrieval process.

As folksonomies may consist of thousands of tags and even a personomy may comprise hundreds of tags (see Figure 3), the administrator or the user needs some support during the tag gardening activities. Which tag should be considered first? Are any tags somehow related and may I use this connection for something? These are questions a user or administrator is faced with. Thus, it makes sense to choose some tag candidates from the whole folksonomy which can be used as tag gardening starting points. They should be extracted automatically given the huge number of tags, while the actual tag gardening activity has to be carried out by the administrator/user intellectually. We propose so-called "power tags" as tag candidates.

Power Tags as Tag Gardening Candidates

Power tags depend on the tag distributions according to the tags' popularity and can be determined on the document level or on the platform level. We assume that two different tag distributions may appear: a) the well-known power law, a Lotka-like curve (see Figure 5) and b) an inverse-logistic distribution (Stock, 2006) (see Figure 4). A Lotka-like power law (Egghe & Rousseau, 1990, p.293) has the form $f(x) = C / x^{a}$, where $C$ is a constant, $x$ is the rank of the tag relative to the resource, and $a$ is a value ranging normally from about 1 to about 2. The inverse-logistic distribution follows the formula $f(x) = e^{-C'(x-1)^{b}}$, where $e$ is the Euler number, $x$ is the rank of the tag, $C'$ is a constant and the exponent $b$ is approximately 3.
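The two curve shapes can be compared numerically. The following sketch evaluates both formulas for the first seven ranks; the parameter values (C = 1, a = 1, C' = 0.01, b = 3) are chosen purely for illustration and are not fitted to any data set.

```python
import math


def power_law(rank, c, a):
    """Lotka-like power law: f(x) = C / x^a."""
    return c / rank ** a


def inverse_logistic(rank, c_prime, b):
    """Inverse-logistic distribution: f(x) = e^(-C' * (x - 1)^b)."""
    return math.exp(-c_prime * (rank - 1) ** b)


ranks = range(1, 8)
lotka = [power_law(r, 1.0, 1.0) for r in ranks]
trunk = [inverse_logistic(r, 0.01, 3) for r in ranks]

# The power law drops sharply after rank 1, whereas the inverse-logistic
# curve stays close to its maximum for the first few ranks.
for r, p, t in zip(ranks, lotka, trunk):
    print(f"rank {r}: power law {p:.3f}, inverse-logistic {t:.3f}")
```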

Both distributions have a shared characteristic, the "long tail" on the right hand side of the curve. But they differ fundamentally in their beginnings. The power law shows only few tags on the left hand side (the number depends on the exponent; see Figure 5), reflecting that the users' tagging activities are concentrated on these tags, whereas the inverse-logistic distribution shows several tags applied with similar frequency at the beginning of the curve (see Figure 4). We can call these tags the "long trunk" in analogy to the "long tail" metaphor (Peters & Stock, 2007). Power tags are those tags which are located between the beginning of the curve and its turning point or a particular threshold value, respectively (see example below).

Figure 3: Personomy of the user "MarmaladeToday's" for 1,157 bookmarks (retrieved 2008-03-05 from Delicious)

The actual processing of power tags for tag gardening and tag recommendation works as follows: the first step is to determine power tags on the resource level. This calculation is carried out for each and every resource of the collaborative information service. Since these power tags are - in the sense of collective intelligence - the most important tags in giving meaning to the document, they should be chosen for the following process. Next, each of the n power tags is examined with regard to its relationships to the other tags of the whole database - in other words, a calculation of co-occurrence is carried out for the power tags. This calculation again produces specific tag distributions, in which we can determine power tags as well. These new power tags are now the candidate tags for the emergence of semantics, since their connection to the power tags of the first step seems to be very fruitful. It is in this connection that semantic relations between the two sets of power tags will be most obvious. Let us explain the procedure with an example from the social bookmarking service Delicious (see Figure 4).

Figure 4: Inverse logistic distribution for determination of power tags (retrieved 2008-05-15 from Delicious)

Figure 4 shows the tag distribution for the bookmarked URL www.readwriteweb.com which follows the inverse logistic distribution with C' ≈ 0.16 and b = 3. Here, we consider all tags up to the turning point of the curve as power tags - thus, from "web2.0" to "technology". Those seven tags are our power tags for which we examine co-occurrences with all other tags in our test set. Co-occurring tags for "web2.0" are for example "tools", "social", "blog", "socialnetworking", "community", "web", "webdesign", and "education". Those co-occurring tags follow a power law distribution (Figure 5) with approximately a = 1. The first n tags (say n = 2) are now considered as power tags of the next step and are the basis for the intellectual tag gardening activities or may be suggested as related tags for information retrieval or information indexing. What is more, these candidate tags can also be used for exploiting the hidden semantics in folksonomies via semantic relations. Let us explain the idea by means of Figure 5.

Figure 5: Power law distribution for co-occurring power tags

Figure 5 shows that the tag "web2.0" is frequently combined with the tags "blog", "web", and "technology", in decreasing order. Accordingly, the basic assumption for tag gardening is that tags which often co-occur are either similar in meaning or linked by a meaningful relationship. Thus, to name an example of these power tags with n = 2, the tags "web" and "web2.0" form a (taxonomic) hierarchical relation in which "web" is the superordinate concept and "web2.0" is the subconcept. A part-of hierarchy can be found between the tags "web2.0" and "blog", since blogs are parts of the Web 2.0. In order to achieve richer semantics, we would thus establish these newly detected hierarchical connections as paradigmatic relations within the folksonomy (which in turn grows to become a more formal KOS).
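
The two-step procedure can be illustrated with a minimal Python sketch. It is not the implementation used for the figures: the toy posts, the function names and the parameters k and n are assumptions made for illustration, and the cut-off is a fixed number of top-ranked tags rather than the turning point of a fitted distribution.

```python
from collections import Counter

# Hypothetical toy data: each post is (bookmarked resource, tags assigned by one user).
POSTS = [
    ("readwriteweb.com", {"web2.0", "blog", "technology"}),
    ("readwriteweb.com", {"web2.0", "blog", "web"}),
    ("readwriteweb.com", {"web2.0", "news"}),
    ("example.org", {"web2.0", "web", "tools"}),
    ("example.org", {"web", "blog", "web2.0"}),
]


def top_tags(counts, n):
    """Return the n most frequent tags; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:n]]


def resource_power_tags(posts, resource, k):
    """Step 1: the k most frequently assigned tags of a single resource."""
    counts = Counter(tag for url, tags in posts if url == resource for tag in tags)
    return top_tags(counts, k)


def cooccurrence_candidates(posts, power_tag, n):
    """Step 2: the n tags co-occurring most often with power_tag in the whole data set."""
    counts = Counter()
    for _, tags in posts:
        if power_tag in tags:
            counts.update(tags - {power_tag})
    return top_tags(counts, n)
```

With this toy data, resource_power_tags(POSTS, "readwriteweb.com", 2) yields "web2.0" and "blog", and cooccurrence_candidates(POSTS, "web2.0", 2) yields "blog" and "web". Whether a candidate pair stands in a hierarchical, part-of or other relation still has to be decided intellectually.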

For communities, the use of (semi-)automatic tools may generally be an appropriate solution for tag gardening and tag suggestions (e.g., for query expansion). Tag candidates (based on power tags) may be the basis for further user discussions regarding their inclusion or their semantic relations. Thus, they are a proper means for supporting semantic enrichment of folksonomies.

Personal Tag Repository

On the personomy level, we mainly need cross-platform solutions for tag maintenance and gardening. Someone using different Web 2.0 tools in parallel might want to use his own terminology consistently across the different platforms. Some approaches already address parts of this need. The search engine "MyTag" (Braun et al., 2008) allows users to search different platforms (YouTube, Flickr, and Delicious) simultaneously. What is more, users may restrict the search to the tags of their different platform-dependent personomies and rank the search results according to their personomy tags. Tagsahoy! follows the same approach and lets users search their tags with one search engine on different platforms (Delicious, Flickr, Gmail, Squirl, LibraryThing, and Connotea). For every website the user visits, the Mozilla Firefox add-on "Tags Everywhere" displays the tag cloud for this website extracted from Delicious.

While these services concentrate on searching the personal tags or on recommending used tags, they disregard the users' need for tagging consistently across different Web 2.0 services. Currently, a user who wants to retrieve documents from his collection has to be aware of the exact spelling variants and synonymous expressions he has used within different platforms (e.g., whether he has used the term "Web_20", "web20" or even "social_web" for tagging documents in bookmarking systems).

A potential solution would be a personal tag repository, an individual controlled vocabulary to be used independently of the different platforms. We envision a small tool which helps a user to collect, maintain and garden his very own tagging vocabulary. The user should be able to collect all tags he has used within different folksonomy systems (ideally with additional information on how often a single tag has been used in the different systems) and should then create his own vocabulary hierarchy, synonym collections and cross-references to related terms.
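
As a rough illustration of such a repository, the following Python sketch stores per-platform usage counts and maps spelling variants to one preferred tag. The class TagRepository and its methods are hypothetical and only serve to make the idea concrete; they do not describe an existing tool.

```python
from collections import defaultdict


class TagRepository:
    """Hypothetical personal tag repository for one user."""

    def __init__(self):
        self.usage = defaultdict(dict)  # tag -> {platform: usage count}
        self.preferred = {}             # variant -> preferred tag

    def record(self, platform, tag, count):
        """Store how often the user applied a tag on one platform."""
        self.usage[tag][platform] = count

    def add_synonyms(self, preferred, *variants):
        """Declare variants as synonymous spellings of the preferred tag."""
        for variant in variants:
            self.preferred[variant] = preferred

    def variants_of(self, tag):
        """All spellings to search for when the user queries with tag."""
        target = self.preferred.get(tag, tag)
        linked = {v for v, p in self.preferred.items() if p == target}
        return sorted({target} | linked)

    def total_usage(self, tag):
        """Usage count summed over all variants and all platforms."""
        return sum(
            sum(self.usage.get(variant, {}).values())
            for variant in self.variants_of(tag)
        )
```

If "Web_20" and "social_web" are registered as variants of "web20", a query with any of the three spellings expands to all of them, so the user no longer needs to remember which variant he used on which platform.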

Motivated by this, we are developing TagCare (Golov et al., 2008) to help users to apply the same tags uniformly in different platforms.

TagCare

TagCare currently supports Flickr, Bibsonomy and Delicious; i.e., it allows a user to import his personal tags from these platforms into TagCare and to maintain them all in one place. TagCare has been developed in PHP, with JavaScript and Ajax for the user interface. It is realized as a web application (a browser plug-in is under development) and uses a MySQL database to store the information. It collects tagging data from the mentioned applications via their APIs, using the API implementations phpFlickr and php-delicious; to integrate Bibsonomy, a separate implementation (phpBibsonomy) had to be written. The database includes three types of data: users, tags, and tag interrelations (with the latter two always being assigned to exactly one user).

When a user signs up for TagCare, he provides his login data for the different social tagging services he uses and can then import his tags from Flickr, Bibsonomy or Delicious. Basic statistics are provided on how often the user has applied single tags in total for all services and separately for Bibsonomy and Delicious (these statistics are based on usage data on how often a user has applied a certain tag, as submitted by the services' APIs; currently, Flickr does not provide this type of information). The statistics can be displayed as a tag cloud or as a ranking of the most frequently or least frequently used tags. This can help to detect tags which have been used too often (and thus have become too general) or very seldom (which may indicate that they should be bundled with others). Basic editing functionalities for tags comprise renaming and deleting tags as well as directly creating new terms in TagCare. In future, users should be enabled to predefine their preferred spelling variants (e.g., preferring singular over plural, preferring British English over American English spelling, or separating compound words by underscore or camelCase). Coupled with an underlying dictionary or even software for speech pattern recognition, the tool could warn the user if he deviates from the favored settings (e.g., using a plural form although singular is preferred).
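
One of the envisioned spelling checks, the preferred way of writing compound tags, can be sketched in Python without a dictionary. The function names and the two style labels are assumptions for illustration and are not part of TagCare; note that the sketch also lower-cases the words of a tag, so a capitalized tag is reported as a deviation.

```python
import re


def split_compound(tag):
    """Split a tag written with underscores or in camelCase into lower-case words."""
    words = []
    for chunk in tag.split("_"):
        words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", chunk))
    return [word.lower() for word in words]


def format_compound(words, style):
    """Join words according to the user's preferred compound style."""
    if not words:
        return ""
    if style == "underscore":
        return "_".join(words)
    if style == "camelCase":
        return words[0] + "".join(word.capitalize() for word in words[1:])
    raise ValueError(f"unknown style: {style}")


def check_tag(tag, style):
    """Return None if the tag follows the preferred style, else a suggested spelling."""
    suggestion = format_compound(split_compound(tag), style)
    return None if suggestion == tag else suggestion
```

For a user who prefers underscores, check_tag("socialNetworking", "underscore") suggests "social_networking", while check_tag("social_networking", "underscore") returns None. Checks for singular versus plural or for British versus American spelling would additionally require the dictionary mentioned above.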

The advanced editing options in TagCare concern the organization of tags. Knowledge relations between concepts are the structures that add semantics to a tag collection. The fundamental types are hierarchical relations. Hierarchies can easily be established in TagCare by drag and drop. Currently, TagCare does not distinguish between is_a and part_of hierarchies. Furthermore, a relation of equivalence is of importance, as it interlinks synonyms and quasi-synonyms, i.e. words that have exactly or almost the same meaning or can be regarded as being the same within a certain context. In TagCare, synonyms can be interlinked via a pre-defined synonymy relation. Finally, one may label two tags as being generally related terms. This unspecified relation should in future be complemented by some more specific semantic relations, such as "is_opposite_of", and by further tag interrelations which can be freely named by the user.
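
The three relation types can be represented in a very small data structure. The following Python sketch is an assumption about how such relations might be stored, not a description of TagCare's PHP and MySQL implementation; the class name and the relation labels "narrower", "synonym" and "related" are chosen for illustration only.

```python
from collections import defaultdict

# Relations that hold in both directions; "narrower" is directed.
SYMMETRIC = {"synonym", "related"}


class TagRelations:
    """Hypothetical store for one user's tag interrelations."""

    def __init__(self):
        self._links = defaultdict(set)  # (relation, source tag) -> target tags

    def add(self, relation, source, target):
        """Link two tags; symmetric relations are stored in both directions."""
        self._links[(relation, source)].add(target)
        if relation in SYMMETRIC:
            self._links[(relation, target)].add(source)

    def targets(self, relation, tag):
        """Tags directly linked to tag by the given relation."""
        return set(self._links.get((relation, tag), ()))

    def descendants(self, tag):
        """All tags below tag in the hierarchy (no is_a / part_of distinction)."""
        seen, stack = set(), [tag]
        while stack:
            for child in self.targets("narrower", stack.pop()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen
```

Because relation names are plain strings, a more specific relation such as "is_opposite_of" or a relation freely named by the user could be added without changing the structure; only the set of symmetric relations would need to be extended where appropriate.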

So far, TagCare provides the basis for collecting, editing and structuring tags independently of any platform on a personomy level. While this currently means importing tags from different social software services, the next steps will go the other way round: to enable directly searching social software collections via tags from TagCare as well as directly tagging documents from within TagCare. Open questions regarding tagging and editing from within TagCare concern the transfer and representation of edited or structured tags in the underlying platforms like Bibsonomy.

Probably, at a later stage, such a tool could also be used for the exchange of terminologies within communities. Each user could take a walk in other users' tag gardens - and probably bring home some cuttings or seeds for his own one. A rather distant vision could then be the merging of two different personomies.

Conclusions and Future Work

This article provides an overview of activities which help to maintain and enhance folksonomies. We have discussed formatting guidelines, vocabulary control, distinguishing different tag qualities and combinations with other KOS as major activities for improving social tagging systems. Automatic and manual approaches (based on power tags as tag gardening candidates) should be combined. In future, we expect more and more of these aspects to be integrated into existing tagging systems. Still missing is an evaluation of the tag gardening activities in terms of enhancing precision, affecting recall and revealing linguistic problems in folksonomies. This is part of our future work.

Furthermore, our future work comprises the integration of tagging activities into collaborative ontology engineering processes and the critical investigation of semantic relations as a means for gradually enriching folksonomies. The personal tag repository TagCare is currently under development. Evaluation of its functionalities and usability tests are planned.

Acknowledgements

We would like to thank the anonymous referees and our colleagues from the Department of Information Science for their helpful advice. Special thanks go to Evgeni Golov who is concerned with the technical development of TagCare.

References


Bibliographic information of this paper for citing:

Peters, Isabella & Weller, Katrin (2008).   "Tag gardening for folksonomy enrichment and maintenance."   Webology, 5(3), Article 58. Available at: http://www.webology.org/2008/v5n3/a58.html


Copyright © 2008, Isabella Peters & Katrin Weller.