Webology, Volume 6, Number 2, June, 2009


Ensuring the discoverability of digital images for social work education: an online "tagging" survey to test controlled vocabularies

Ellen Daly
Knowledge & Information Assistant, Institute for Research & Innovation in Social Services, 151 West George Street, Glasgow, G2 2JJ. E-mail: ellen.daly (at) iriss.org.uk

Neil Ballantyne
Acting Director, Institute for Research & Innovation in Social Services, 151 West George Street, Glasgow, G2 2JJ. E-mail: neil.ballantyne (at) iriss.org.uk

Received June 15, 2009; Accepted June 25, 2009

Abstract

The digital age has transformed access to all kinds of educational content, not only in text-based formats but also as digital images and other media. As learning technologists and librarians begin to organise these new media into digital collections for educational purposes, older problems associated with cataloguing and classifying non-text media have re-emerged. At the heart of this issue is the problem of describing complex and highly subjective images in a reliable and consistent manner. This paper reports the findings of research designed to test the suitability of two controlled vocabularies for indexing, and thereby improving the discoverability of, images stored in the Learning Exchange, a repository for social work education and research. An online survey asked respondents to "tag" a series of images, and their responses were mapped against the two controlled vocabularies. Findings showed that a large proportion of user-generated tags could be mapped to the controlled vocabulary terms (or their equivalents). The implications of these findings for indexing and discovering content are discussed in the context of a wider review of the literature on "folksonomies" (or user tagging) versus taxonomies and controlled vocabularies.

Keywords

Indexing; Digital images; Controlled vocabularies; Folksonomies; Tagging; Taxonomies

Introduction

The use of images in teaching and learning is increasing, and not just in subject domains that are intrinsically visual or that have traditionally placed images at the core of the curriculum. Green's (2006) study of 404 faculty[1] from 12 subject areas across the visual arts, sciences and social sciences highlights the many ways in which faculty are using images to enrich and enliven classroom-based and online learning: 'Using images has clearly made teaching easier for many faculty. For others the effect goes much further; indeed, the potential for digital images to "revolutionize" teaching is enormous.' (p.99).

The appropriate use of images in education can actively engage learners and enhance learning through the interplay of memory, emotion and the construction of meaning. In a review of the literature on learning with text and pictures, Levin (1989, p.83) concluded that 'pictures interact with text to produce levels of comprehension and memory that exceed what is produced by text alone', although this finding applied only to the selective and meaningful use of images in an educational context, not to images used for arbitrary decorative effect. Multimedia learning theorists, building on Paivio's (1971, 1986) dual coding theory, have argued that well-designed multimedia combining instructional text and images can enhance learning by exploiting the separate channels humans use to process verbal and visual material (Mayer & Moreno, 2003). In addition, a renewed focus on the role of emotion in learning, emerging from affective neuroscience (Immordino-Yang & Damasio, 2007), and a continuing interest in the use of media to support 'authentic learning' (Lombardi & Oblinger, 2007) suggest that images and other media may play an increasingly important role in teaching and learning.

In the United Kingdom, a range of educational image services, such as Joint Information Systems Committee (JISC) Digital Media (www.jiscdigitalmedia.ac.uk), the Arts and Humanities Data Services (AHDS) Visual Arts service (http://ahds.ac.uk/visualarts/)[2], the Focusing Images for Learning & Teaching (FILTER) project (www.filter.ac.uk), the Scran Trust (www.scran.ac.uk) and the Educational Image Gallery (http://edina.ac.uk/eig/), attest to a significant investment in the use of visual learning tools in education.

Whilst it seems clear that digital images are in demand and have a valuable role to play in education, Green (2006) identifies a number of issues that need to be resolved before they can be deployed effectively, one of which is that 'Users must be able to regularly and efficiently find the best image for the job with accompanying metadata attesting to its identity, authenticity and integrity and enabling its citation.' (p.99).

Indexing images

Effective delivery of images to users depends on well-organised collections that render images discoverable. Storing digital images in a repository facilitates discoverability through descriptive information, or metadata, about the images, which can be used to create an 'index' allowing the images to be searched and retrieved. This metadata can include information about the intellectual property of the image, its technical format and size, its date of creation and, importantly, what the image is about. Describing what an image is about can be difficult because, unlike text, images do not describe themselves (Baxter & Anderson, 1996; Arms, 1999). The interpretation of an image is subjective, relying on an individual's perception, and images can be multidisciplinary and context-dependent. The meaning of an image can be influenced by its intended use and intended user, including the user's area of study, educational level, or cultural, social and historical awareness (Evans & Shabajee, 2002). To complicate the issue further, images can convey both concrete and abstract concepts. Shatford-Layne (1994) differentiates between the 'ofness' and 'aboutness' of an image: a picture of a person crying, for example, is 'of' a person crying but might be 'about' sorrow. The subjective nature of images can make indexing them a complex process. Three distinct approaches to indexing images demonstrate the different ways this can be tackled: professional indexing, content-based image retrieval and user tagging.
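As a minimal sketch of how metadata becomes a searchable 'index', the Python below builds an inverted index that maps each subject keyword to the images carrying it, then answers an AND query. The record IDs, titles and keywords are hypothetical, and a production repository would use a full search engine rather than this in-memory structure.

```python
from collections import defaultdict

# Hypothetical metadata records: each image has an ID, a title and subject keywords.
records = {
    "img-001": {"title": "Grandmother and grandchild", "subjects": ["older people", "children", "happiness"]},
    "img-002": {"title": "Hospital corridor", "subjects": ["mental health problems"]},
    "img-003": {"title": "Playground", "subjects": ["children", "play"]},
}

def build_index(records):
    """Map each lower-cased subject keyword to the set of image IDs that carry it."""
    index = defaultdict(set)
    for image_id, metadata in records.items():
        for subject in metadata["subjects"]:
            index[subject.lower()].add(image_id)
    return index

def search_index(index, *keywords):
    """Return IDs of images indexed under ALL the given keywords (an AND query)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []

index = build_index(records)
print(search_index(index, "children"))               # ['img-001', 'img-003']
print(search_index(index, "children", "happiness"))  # ['img-001']
```

A query for 'kids' returns nothing here, because retrieval succeeds only when the query matches an indexing term; that mismatch between users' words and indexing terms is the problem this paper investigates.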

Professional indexing

Traditionally, images have been indexed using text-based, or 'concept-based', approaches, in which cataloguers manually assign keywords to images. These keywords are drawn from a taxonomy, a form of classification scheme 'designed to group related things together, so that if you find one thing within a category, it is easier to find other related things in that category' (Lambe, 2007). Taxonomies are semantic and provide a fixed or controlled vocabulary, from which 'ambiguous, alternate or less precise terms are excluded' (Lambe, 2007). When a taxonomy is organised in dictionary format, it is called a thesaurus. Text-based indexing tools for the classification of visual materials include the Art and Architecture Thesaurus (AAT), the Thesaurus for Graphic Materials 1 (TGM1), Visual Resources Association Core (VRA Core) and Iconclass. A limitation of text-based indexing is that it is time-consuming and expensive, as it requires training and expertise. In addition, studies have identified a variety of attributes and levels at which an image can be described (Enser & Macgregor, 1993; Jorgensen, 1995, 1996, 1998; Armitage & Enser, 1997), making it difficult for formal classification tools to capture the wide range of descriptive needs (Jorgensen, 1999).
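Thesauri of the kind named above typically distinguish preferred terms from non-preferred 'entry' terms that point to them, and record broader/narrower relationships between preferred terms. The sketch below models that structure with plain Python dictionaries; the terms and relationships are hypothetical and are not drawn from AAT, TGM1, Iconclass or the SCIE vocabulary.

```python
# Hypothetical thesaurus fragment (not taken from any real vocabulary).
PREFERRED = {"children", "young people", "people", "happiness", "anxiety", "emotions"}

# Non-preferred entry terms -> preferred term (a "use" reference).
USE = {"kids": "children", "child": "children", "joy": "happiness",
       "teenagers": "young people", "worry": "anxiety"}

# Preferred term -> its broader term (BT).
BROADER = {"children": "people", "young people": "people",
           "happiness": "emotions", "anxiety": "emotions"}

def preferred_term(term):
    """Resolve a candidate keyword to its preferred term, or None if it is not in the vocabulary."""
    term = term.strip().lower()
    if term in PREFERRED:
        return term
    return USE.get(term)

def broader_chain(term):
    """List the broader terms above a preferred term, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain

print(preferred_term("Kids"))         # children
print(broader_chain("children"))      # ['people']
print(preferred_term("barbed wire"))  # None
```

The entry-term table is what lets a controlled vocabulary absorb some of the variation in users' language: a cataloguer, or a search system, can redirect 'kids' to 'children' instead of treating it as a miss.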

Inter-indexer consistency for images

Given the subjective nature of image indexing, research studies have focused on the degree to which indexers agree when describing images. Studies of inter-indexer consistency include Markey (1984), who found low consistency among 39 participants asked to index 100 art history images, though Markey notes that the 'use of inexperienced indexers and non-subject specialists in this study may have diminished indexer consistency scores'. Wells-Angerer (2005) examined the effect of indexer subject knowledge (expert, knowledgeable, novice) on retrieval success when searching online art museum collections, using 30 participants to assign terms to 10 works of art. The study found that terms assigned by the most knowledgeable indexers produced the best retrieval. Beaudoin's (2008) study also found that image indexing experience and subject expertise influenced the way participants indexed images. Experienced image indexers applied the most terms to images, followed by subject experts and lastly subject novices. Co-occurrence of terms also followed this pattern, suggesting better inter-indexer consistency among indexers with experience and expertise.

Balasubramanian et al. (2004) examined the effect of different types of describing task on inter-indexer consistency. Part of the study compared the overlap of indexing terms among participants who were asked to supply a file name for an image, keywords to describe it, and the words they would use to describe it to a friend over the phone. Average term overlap between participants was highest for file names (59%), followed by keywords (57%), and lowest for sentence-level description (43%), indicating a high degree of agreement between users' vocabularies for describing images and in the keywords used to search for them.
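Inter-indexer consistency is often quantified with Hooper's measure (shared terms divided by all distinct terms used by either indexer) or Rolling's measure (twice the shared terms divided by the total number of terms assigned). The studies above may have used other formulas, and Balasubramanian et al.'s exact definition of 'term overlap' is not given here, so the sketch below is an illustration with invented keywords, assuming a simple average over all pairs of participants, rather than a reconstruction of any study's method.

```python
from itertools import combinations

def hooper(terms_a, terms_b):
    """Hooper's measure: shared terms / (shared + unique to A + unique to B).
    Two empty term sets are treated as fully consistent (an arbitrary choice)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rolling(terms_a, terms_b):
    """Rolling's measure: 2 * shared terms / (terms from A + terms from B)."""
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def average_pairwise(term_sets, measure=hooper):
    """Mean consistency over every pair of indexers (assumed aggregation)."""
    pairs = list(combinations(term_sets, 2))
    if not pairs:
        raise ValueError("need at least two indexers")
    return sum(measure(a, b) for a, b in pairs) / len(pairs)

# Invented keywords from three participants describing the same image.
participants = [
    {"beach", "sunset", "sea"},
    {"beach", "sea", "holiday"},
    {"sunset", "sea"},
]
print(round(hooper(participants[0], participants[1]), 2))   # 0.5
print(round(rolling(participants[0], participants[1]), 2))  # 0.67
print(round(average_pairwise(participants), 3))             # 0.472
```

Because matching is on exact strings, near-synonyms such as 'sea' and 'ocean' would count as disagreement, which is one reason measured consistency for images tends to be low.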

Content-Based Image Retrieval

Content-Based Image Retrieval (CBIR) refers to the application of 'computer vision' to analyse the content of images (e.g. colours, shapes and textures). Although CBIR systems can query large image databases quickly and automatically, it has been argued that the meaning of an image cannot be defined in terms of its physical properties. Hare et al. (2006) call this the 'semantic gap', arguing that:

'The representations one can compute from raw image data cannot be readily transformed to high-level representations of the semantics that the images convey and in which users typically prefer to articulate their queries'. (p. 2)

Enser (2000) also stresses the importance of semantic representation in image indexing and argues that meaning is 'a property ascribed by human analysis of the image' (p. 200).
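A colour histogram is a classic low-level CBIR feature. The sketch below represents images as plain lists of RGB pixels (an assumption, to avoid depending on an imaging library), quantises each pixel into coarse colour bins and compares images by histogram intersection. It also illustrates the semantic gap described above: the score depends only on colour proportions, so two invented 'images' containing the same colours in a different arrangement score as identical, and nothing in the computation refers to what either picture depicts.

```python
from collections import Counter

def colour_histogram(pixels, bins_per_channel=4):
    """Quantise each (r, g, b) pixel (0-255 per channel) into coarse bins and
    return a normalised histogram as {bin: fraction of pixels}."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of the smaller share in each colour bin."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Tiny invented "images": the same blue and green pixels in a different order, and an all-red one.
sky_over_grass = [(30, 90, 220)] * 6 + [(40, 160, 40)] * 4
grass_over_sky = [(40, 160, 40)] * 4 + [(30, 90, 220)] * 6
red_image = [(220, 30, 30)] * 10

print(histogram_intersection(colour_histogram(sky_over_grass),
                             colour_histogram(grass_over_sky)))  # 1.0
print(histogram_intersection(colour_histogram(sky_over_grass),
                             colour_histogram(red_image)))       # 0.0
```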

User tagging

Tagging describes a practice where users assign their own keywords to information resources. These tags 'are used to enable the organisation of information within a personal information space, but are also shared', enabling other web users to browse and search them (Macgregor & McCulloch, 2006, p. 294). Popular services with tagging facilities include Delicious (www.delicious.com) and Furl (www.furl.net) for tagging web pages, Connotea (www.connotea.org) for references, Technorati (www.technorati.com) for blogs, and Flickr (www.flickr.com) for images. Tagging is characteristic of 'Web 2.0', which describes the trend towards more user-generated content on the internet.

Collectively, tags form a 'folksonomy', a term coined by Vander Wal (2007) by combining the words 'folks' and 'taxonomy' to describe a user-generated taxonomy. It has been argued that a strength of tagging is that it reflects the real language of users and that tags are 'by definition, the very terms that real users might be expected to use in future when searching for this information' (Hammond et al., 2005), whereas controlled vocabularies can become 'rigid, stale and distant from the vernacular of users' (Rosenfeld, 2005). Tags are fast and simple to create, requiring none of the training, expense or subject knowledge of professional indexing. However, tags have limitations: they can often be 'ambiguous, overly personalised and inexact', and may be misspelt, mix languages or consist of multiple words (Guy & Tonkin, 2006). Folksonomies often contain personal and temporal tags such as 'todo' and 'toread'. Such tags indicate 'a dynamic relationship between document and user, and between subject and task' (Kipp & Campbell, 2006), but also that tagging is primarily 'for personal use rather than public benefit' (Golder & Huberman, 2006). Synonyms (different words with identical or very similar meanings) and homonyms (identical words with different meanings) are also prevalent. Unlike a taxonomy, a folksonomy has no hierarchical relationships between tags, which means there is no way to express links between related or similar tags.

It has been argued that the context in which tagging takes place is important, as the limitations of tags and folksonomies can have a bigger impact in less casual services such as organisation-wide document repositories (Merholz, 2004). However, it has also been suggested that tags can usefully be incorporated into controlled vocabularies to enrich them for a more user-centred approach to indexing (Rosenfeld, 2005; Matusiak, 2006).
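A folksonomy can be modelled as a collection of (user, resource, tag) assignments. The sketch below, using invented users, images and tags, counts tags per resource after light normalisation (lower-casing and trimming whitespace). It shows two limitations discussed above: 'kids' and 'children' remain separate entries because a flat folksonomy has no way to link synonyms, and a personal tag such as 'toread' sits alongside descriptive ones.

```python
from collections import Counter, defaultdict

# Invented tagging events: (user, resource, tag).
assignments = [
    ("u1", "img-003", "Children"),
    ("u2", "img-003", "children "),
    ("u3", "img-003", "kids"),
    ("u1", "img-003", "toread"),
    ("u2", "img-001", "grandmother"),
]

def folksonomy(assignments):
    """Count how many times each normalised tag was applied to each resource."""
    tags_by_resource = defaultdict(Counter)
    for _user, resource, tag in assignments:
        tags_by_resource[resource][tag.strip().lower()] += 1
    return tags_by_resource

f = folksonomy(assignments)
print(f["img-003"].most_common())
# [('children', 2), ('kids', 1), ('toread', 1)]
```

Normalisation merges 'Children' and 'children ', but only a controlled vocabulary or a hand-built synonym list could merge 'kids' with 'children'.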

Several studies, like the one described in this paper, have also focused on the overlaps between tags and controlled vocabularies. Lin et al. (2006) compared tags assigned to 45 medical-related journal articles in Connotea and terms from the Medical Subject Headings (MeSH) used to index them in the PubMed database. Results showed only 11% of tags matched MeSH terms. Similarly, Bruce (2008) compared the overlap between tags given to articles in CiteULike, a website for bookmarking bibliographic citations of scholarly research, and descriptors taken from a controlled vocabulary in the Education Resources Information Center (ERIC) database. Findings showed a 7.6% overlap between tags and descriptors, though Bruce points out that only exact matches were considered and further research could usefully include investigation of spelling variations and semantic analysis between tags and controlled terms.
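Bruce's point that only exact matches were counted can be illustrated with a small calculation. The sketch computes the share of tags that appear verbatim in a descriptor list, then repeats the comparison after normalising case, whitespace and hyphens. The tags, descriptors and normalisation rules are invented for illustration and are not those used by Lin et al. or Bruce.

```python
import re

def normalise(term):
    """Lower-case, treat hyphens/underscores as spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[-_]", " ", term.lower())).strip()

def overlap_rate(tags, descriptors, normaliser=None):
    """Fraction of tags that match a descriptor, optionally after normalising both sides."""
    norm = normaliser or (lambda t: t)
    vocabulary = {norm(d) for d in descriptors}
    return sum(norm(t) in vocabulary for t in tags) / len(tags)

descriptors = ["Child Protection", "Older People", "Foster Care"]
tags = ["Child Protection", "child-protection", "older  people", "fostering", "elderly"]

print(overlap_rate(tags, descriptors))             # 0.2
print(overlap_rate(tags, descriptors, normalise))  # 0.6
```

Even after normalisation, 'fostering' and 'elderly' still fail to match: these are the variant-form and semantic mismatches that the present study categorised by hand.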

The Library of Congress (LC) launched a pilot project to make over 3000 of their photographs available online and to invite the public to tag them. One of the aims of the project was to collect 'user-centric, relevant terms that have the potential to increase retrieval of items in the Library's collection' (Springer et al., 2008, p.2). The response was 'overwhelmingly positive and beneficial' (Springer et al., 2008) and between January and October 2008, 67,176 tags were added by 2,518 unique Flickr users. The LC's plans for future tag analysis include the comparison of . . .

'. . . tags used by Flickr members against terms / references found in vocabulary lists used primarily to describe photos at LC like Thesaurus for Graphic Materials . . . incorporating popular concepts or variants into our controlled vocabularies might be a way to derive benefit from this kind of user-generated data' (Springer et al., 2008, p. 24).

The Learning Exchange image collection

The Learning Exchange (www.iriss.org.uk/openlx) is a digital library of learning resources for the social services workforce, developed and maintained by the Institute for Research & Innovation in Social Services (IRISS). The resources include information sheets, official publications, interactive learning resources, video clips, multimedia case studies, radio broadcasts and podcasts, all of which may be used for non-commercial and educational purposes. Virtually all the resources are freely available on the open internet, but they are gathered in one accessible place, professionally catalogued in line with the Dublin Core metadata standard and described using keywords from the Social Care Institute for Excellence's (SCIE) controlled vocabulary. The SCIE controlled vocabulary is subject-specific and focuses on social work services, social care and related topics.
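For readers unfamiliar with Dublin Core, the sketch below builds a simple record for one image using Python's standard xml.etree.ElementTree and the Dublin Core element namespace. The paper does not describe the Learning Exchange's actual record format, so the wrapper element and all values below, including the subject keywords and rights statement, are hypothetical; 'StillImage' is a term from the DCMI Type Vocabulary.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (element, value) pair."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root

# Hypothetical values for one photograph; repeated 'subject' elements carry the index terms.
record = dublin_core_record([
    ("title", "Grandmother reading with grandchild"),
    ("creator", "IRISS"),
    ("type", "StillImage"),
    ("format", "image/jpeg"),
    ("subject", "older people"),
    ("subject", "children"),
    ("rights", "Non-commercial educational use"),
])
print(ET.tostring(record, encoding="unicode"))
```

In this layout the controlled-vocabulary keywords live in repeated dc:subject elements, which is where a search index would read them from.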

In the course of developing multimedia learning objects, which involved filming case studies, IRISS also amassed a potentially useful collection of still photographs. With the increasing use and importance of images in education, it was decided that these photographs would make a valuable addition to the Learning Exchange.

An online image tagging survey was designed to test the suitability of the SCIE controlled vocabulary and a subset of the Thesaurus for Graphic Materials 1 (TGM1) to describe images in the Learning Exchange. The rationale for including the TGM1 subset was that it contained a broader range of terms relating to emotional and mental states than the SCIE vocabulary and could be useful in describing the images of people in the Learning Exchange image collection.


The online survey

The survey consisted of a set of thirty randomised images, typical of those to be included in the Learning Exchange, presented in a simple click-through interface. Each image was displayed with three fields in which participants could type their own keywords or 'tags'. There was no word limit for the fields because, like the vocabularies of other disciplines, the social care vocabulary contains multiword concepts, for example 'black and minority ethnic people', 'young offenders', 'child protection' and 'mental health problems'.

Each field could be edited, so participants could revise their tags if they wished. At least one tag per image was required before participants could move on to the next image in the set. As an incentive, participants who provided the maximum of three tags for each of the 30 images were entered into a prize draw to win an iPod Shuffle.

The survey was distributed to two electronic mailing lists in the JISCMail community: the Social Policy and Social Work (SWAP) mailing list, with 260 subscribers, and the IRISS list, with approximately 700. Subscribers included social workers, social work managers and social work educators, primarily from the UK. The survey was live for ten days, during which 191 unique users tagged at least one image. The majority of participants were from the UK, though there were also participants from Canada, America, Australia and New Zealand.

Tag categories

The survey captured the natural language of the community, and the tags were varied and diverse. Twelve categories were devised to group the tags. Rather than discount all tags that were not exact matches to terms in the two controlled vocabularies, it was decided that the categories would also reflect cases where a tag was related to a controlled term. In other words, if a tag was a variant of, or semantically equivalent to, a controlled term, it was analysed rather than ignored. The rationale was to investigate not only whether the spellings of tags exactly matched controlled terms but also whether the meaning of a tag could be captured by the controlled terms. Descriptions of categories and examples are provided in Table 1.

Table 1: Category descriptions and examples
| Category description | Examples | Category code |
| Tag has an exact match to a SCIE term(s) | User tag given: children; SCIE term: children | 1 |
| Tag has an exact match to a TGM1 subset term(s) | User tag given: smiling; TGM1 term: smiling | 2 |
| Tag has an exact match in both vocabularies | User tag given: happiness; SCIE term: happiness; TGM1 term: happiness | 3 |
| Tag is a variant form of term(s) in SCIE | User tag given: aggressive; SCIE term: aggression | 4 |
| Tag is a variant form of term(s) in TGM1 subset | User tag given: sleep; TGM1 term: sleeping | 5 |
| Tag is a variant form of term(s) in both vocabularies | User tag given: anxious; SCIE term: anxiety; TGM1 term: anxiety | 6 |
| Tag is semantically equivalent to SCIE term(s) | User tag given: domestic abuse; SCIE term: domestic violence | 7 |
| Tag is semantically equivalent to TGM1 subset term(s) | User tag given: distressed person; TGM1 term: distress | 8 |
| Tag is semantically equivalent to term(s) in both vocabularies | User tag given: joy; SCIE term: happiness; TGM1 term: happiness | 9 |
| Tag is discounted | User tags given: fe, w, 12345 | 10 |
| Tag does not appear in either vocabulary list | User tag given: barbed wire | 11 |

Tags were initially categorised by one of the Learning Exchange cataloguers. The robustness of the categorisation was evaluated using an inter-indexer consistency test, described below.
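The categorisation in Table 1 was done by hand. As a rough, hypothetical illustration of the decision order it implies, the sketch below assigns codes 1 to 11: a discard rule first, then exact matches, then a crude suffix-stripping check for variant forms, then a synonym table standing in for the cataloguer's judgement of semantic equivalence. The vocabularies contain only the Table 1 example terms (the real SCIE vocabulary and TGM1 subset are far larger), and the suffix list, synonym table and discard rule are all invented simplifications.

```python
# Tiny stand-in vocabularies containing only the Table 1 example terms.
SCIE = {"children", "aggression", "anxiety", "happiness", "domestic violence"}
TGM1 = {"smiling", "sleeping", "anxiety", "happiness", "distress"}

# Hypothetical stand-in for the cataloguer's judgement of semantic equivalence.
SYNONYMS = {"joy": "happiness", "domestic abuse": "domestic violence",
            "distressed person": "distress", "kids": "children"}

# Crude suffix list, checked in order (illustrative only, not a real stemmer).
SUFFIXES = ("ing", "ive", "ion", "ious", "iety", "y", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_phrase(phrase):
    return " ".join(stem(w) for w in phrase.split())

SCIE_STEMS = {stem_phrase(t) for t in SCIE}
TGM1_STEMS = {stem_phrase(t) for t in TGM1}

def code_for(in_scie, in_tgm, base):
    """base+1 for SCIE only, base+2 for TGM1 only, base+3 for both, None for neither."""
    if in_scie and in_tgm:
        return base + 3
    if in_scie:
        return base + 1
    if in_tgm:
        return base + 2
    return None

def categorise(tag):
    """Return a Table 1 category code (1-11) for a one- or two-word tag."""
    tag = " ".join(tag.lower().split())
    letters = tag.replace(" ", "")
    if len(letters) < 3 or not letters.isalpha():
        return 10  # discounted (hypothetical rule for fragments such as 'fe', 'w', '12345')
    checks = [
        (0, tag, SCIE, TGM1),                           # codes 1-3: exact match
        (3, stem_phrase(tag), SCIE_STEMS, TGM1_STEMS),  # codes 4-6: variant form
        (6, SYNONYMS.get(tag), SCIE, TGM1),             # codes 7-9: semantic equivalent
    ]
    for base, candidate, scie, tgm in checks:
        code = code_for(candidate in scie, candidate in tgm, base)
        if code is not None:
            return code
    return 11  # appears in neither vocabulary

for example in ["children", "aggressive", "joy", "barbed wire"]:
    print(example, categorise(example))
```

The sketch makes plain why the semantic categories were the hardest to agree on: exact and variant matches follow mechanical rules, whereas the synonym table encodes a judgement that a second cataloguer might not share.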

Results

The survey generated a total of 3980 entries, which were recorded in a database. These included tags of one and two words (e.g. child protection, community care, older people) and longer phrases, including whole sentences. The initial analysis described in this paper focuses on the 2643 tags of one and two words.

As Table 2 shows, 46.7% of tags could not be mapped to either vocabulary list (category 11). Overall, only 10% of tags matched controlled terms exactly, 11.7% were judged to be variant forms and 30.4% to be semantically equivalent to controlled terms. Taken together, these three groups (categories 1 to 9) mean that 52.1% of tags could be mapped to controlled terms. Table 2 provides a detailed breakdown of all the categories.

Table 2: Categories assigned to one- and two-word tags (n=2643)
| Category description | Category code | No. of tags in category | Percentage of sample (n=2643) |
| Tag has an exact match to a SCIE term(s) | 1 | 235 | 8.9% |
| Tag has an exact match to a TGM1 subset term(s) | 2 | 24 | 0.9% |
| Tag has an exact match in both vocabularies | 3 | 6 | 0.2% |
| Tag is a variant form of term(s) in SCIE | 4 | 248 | 9.3% |
| Tag is a variant form of term(s) in TGM1 subset | 5 | 36 | 1.3% |
| Tag is a variant form of term(s) in both vocabularies | 6 | 27 | 1% |
| Tag is semantically equivalent to SCIE term(s) | 7 | 732 | 27.7% |
| Tag is semantically equivalent to TGM1 subset term(s) | 8 | 47 | 1.8% |
| Tag is semantically equivalent to term(s) in both vocabularies | 9 | 25 | 0.9% |
| Tag is discounted | 10 | 29 | 1% |
| Tag does not appear in either vocabulary list | 11 | 1234 | 46.7% |
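The grouped percentages can be recomputed directly from the counts in Table 2. The short calculation below sums the exact-match (1 to 3), variant (4 to 6) and semantic-equivalence (7 to 9) categories, together with the SCIE-only (1, 4, 7) and TGM1-subset-only (2, 5, 8) groups, and divides by the 2643 one- and two-word tags.

```python
# Tag counts per category code, copied from Table 2.
counts = {1: 235, 2: 24, 3: 6, 4: 248, 5: 36, 6: 27,
          7: 732, 8: 47, 9: 25, 10: 29, 11: 1234}
total = sum(counts.values())  # 2643

def share(codes):
    """Percentage of all tags falling in the given category codes."""
    return 100 * sum(counts[c] for c in codes) / total

groups = {
    "exact (1-3)": (1, 2, 3),
    "variant (4-6)": (4, 5, 6),
    "semantic (7-9)": (7, 8, 9),
    "mapped (1-9)": range(1, 10),
    "SCIE only (1, 4, 7)": (1, 4, 7),
    "TGM1 subset only (2, 5, 8)": (2, 5, 8),
}
for label, codes in groups.items():
    print(f"{label}: {share(codes):.1f}%")
```

Rounded to one decimal place these give about 10.0%, 11.8%, 30.4% and 52.2%, with 46.0% for SCIE alone and 4.0% for the TGM1 subset alone, agreeing with the figures reported in the text to within 0.1 percentage point.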

Inter-indexer consistency

A randomly selected sample of 10% of each tag category, a total of 269 tags, was given to a second Learning Exchange cataloguer to categorise. Overall agreement between the two cataloguers was 54.2%. As expected, agreement was highest for exact matches (100%), followed by variant forms (53.1%), and lowest for semantic equivalents (23.1%).
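The sample size of 269 is consistent with taking 10% of each category in Table 2 and rounding up, since the rounded-up shares sum to exactly 269; the paper does not state its rounding rule, so this is an assumption. The sketch below computes those sizes, draws a stratified random sample from placeholder tag lists, and calculates simple percentage agreement between two cataloguers; the five agreement codes in the example are invented, not the study's data.

```python
import random

# Category sizes from Table 2.
counts = {1: 235, 2: 24, 3: 6, 4: 248, 5: 36, 6: 27,
          7: 732, 8: 47, 9: 25, 10: 29, 11: 1234}

def sample_sizes(counts, percent=10):
    """Assumed allocation: percent of each category, rounded up (integer ceiling division)."""
    return {code: (n * percent + 99) // 100 for code, n in counts.items()}

def stratified_sample(tags_by_code, sizes, seed=0):
    """Draw the given number of tags at random, without replacement, from each category."""
    rng = random.Random(seed)
    return {code: rng.sample(tags_by_code[code], sizes[code]) for code in sizes}

def percent_agreement(codes_a, codes_b):
    """Percentage of items to which both cataloguers assigned the same category."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both cataloguers must code the same items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)

sizes = sample_sizes(counts)
print(sum(sizes.values()))                                     # 269
print(percent_agreement([1, 4, 7, 7, 11], [1, 4, 11, 7, 11]))  # 80.0
```

Simple percentage agreement does not correct for agreement expected by chance; Cohen's kappa is a common alternative when that matters.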

Discussion

This study was not an inter-indexer comparison between tags given by users to a set of images and terms given by a professional indexer from a controlled vocabulary. Studies have shown that this form of inter-indexer comparison has limitations related to subject knowledge and indexing experience. However, the online survey did provide a way of capturing the language of the community for describing a set of images and comparing it to a controlled vocabulary. The aim was to measure the extent to which the indexing language of the Learning Exchange was capable of accommodating users' language to describe images typical of those to be included in the image collection.

The survey found that 46.7% of tags could not be mapped to controlled terms. Only 10% of tags exactly matched controlled terms, a proportion similar to the 11% reported by Lin et al. (2006) and the 7.6% reported by Bruce (2008). This research therefore seems to confirm that exact matches between user language and indexing language are infrequent. Variant forms made up 11.7% of the tag sample, and many more tags (30.4%) were judged to be semantically equivalent to controlled terms. Inter-indexer agreement was considerably higher for variant forms (53.1%) than for semantic equivalents (23.1%), suggesting that judgements of semantic equivalence were much more subjective and relied heavily on individual interpretation. The results also show that more tags matched the SCIE vocabulary than the TGM1 subset: 46% of tags could be matched to SCIE terms only (categories 1, 4 and 7, covering exact matches, variant forms and semantic equivalents), compared with only 4% to the TGM1 subset only (categories 2, 5 and 8). As the SCIE vocabulary is much larger than the subset, this result is not surprising. On this evidence, there is no support for using the emotion-related terms in the TGM1 subset to index images in the repository.

The findings of this study raise several issues. Firstly, if, as the survey suggests, exact matches between the search terms users employ to retrieve images and the SCIE terms used to index them are so rare, what happens to all the variant forms and semantically equivalent terms typed into the search box? Although it is reasonably simple for indexers to judge when a tag is a variant of a controlled term or shares its meaning, search engines lack that human insight. However, some search engines, including the one used by the Learning Exchange, can handle search queries that are variant forms of indexing terms. The repository uses a stemming algorithm, which retrieves words with different endings that share the same 'stem' (e.g. a search for 'run' will also retrieve 'running'). It also has a predictive text feature, shown in Figure 1: when a user begins to type a search string, the system displays similar and related terms.
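The paper does not say which stemming algorithm or autocomplete mechanism the Learning Exchange uses. As a hedged illustration of the two features, the sketch below uses a toy suffix-stripping stemmer (far cruder than established stemmers such as Porter's) so that 'run' retrieves resources indexed under 'running' and 'runs', and a sorted term list searched with the standard bisect module to offer prefix completions as a user types. The resource IDs and index terms are hypothetical.

```python
import bisect

def toy_stem(word):
    """Strip one of a few suffixes, then undouble a final consonant ('runn' -> 'run').
    A toy rule set for illustration only."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if len(word) > 3 and word[-1] == word[-2] and word[-1] not in "aeiouls":
        word = word[:-1]
    return word

# Hypothetical index terms attached to three resources.
resources = {"res-1": ["running", "young people"], "res-2": ["runs"], "res-3": ["walking"]}

def build_stem_index(resources):
    """Map each word stem to the set of resources whose index terms contain it."""
    stem_index = {}
    for rid, terms in resources.items():
        for term in terms:
            for word in term.split():
                stem_index.setdefault(toy_stem(word), set()).add(rid)
    return stem_index

def stem_search(stem_index, query):
    """Return resources matching the stems of ALL query words."""
    results = None
    for word in query.split():
        hits = stem_index.get(toy_stem(word), set())
        results = hits if results is None else results & hits
    return sorted(results or [])

def complete(sorted_terms, prefix, limit=5):
    """Predictive text: up to `limit` terms from a sorted, lower-case list starting with `prefix`."""
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if len(matches) == limit or not term.startswith(prefix):
            break
        matches.append(term)
    return matches

stem_index = build_stem_index(resources)
print(stem_search(stem_index, "run"))  # ['res-1', 'res-2']
VOCAB = sorted(["child protection", "children", "children's rights", "community care", "carers"])
print(complete(VOCAB, "chi"))  # ['child protection', 'children', "children's rights"]
```

Stemming only helps with variant forms that share a stem; it cannot connect semantically equivalent terms such as 'domestic abuse' and 'domestic violence'.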

Figure 1: Predictive text feature in Learning Exchange search box

In addition, a 'did you mean ...?' prompt suggests alternative terms closely related to the search query if the query contains misspellings or words similar to those in the index. These features go some way towards bridging the gap between the user's language and that of the index for variant forms. One way to avoid mismatches between users' and indexers' language altogether is to expose the SCIE taxonomy so that users can browse terms and click on them to retrieve content, rather than keyword-searching for it.
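A 'did you mean ...?' prompt can be approximated with fuzzy string matching against the index vocabulary. The sketch below uses difflib.get_close_matches from Python's standard library; the vocabulary slice and the 0.8 similarity cutoff are illustrative assumptions, and the paper does not describe how the Learning Exchange's own prompt is implemented.

```python
import difflib

# Hypothetical slice of the index vocabulary.
INDEX_TERMS = ["child protection", "children", "dementia", "domestic violence", "older people"]

def did_you_mean(query, terms=INDEX_TERMS, cutoff=0.8):
    """Suggest the closest index term for a query that is not itself an index term."""
    query = query.strip().lower()
    if query in terms:
        return None  # exact hit: no suggestion needed
    matches = difflib.get_close_matches(query, terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("demntia"))            # dementia
print(did_you_mean("domestic violance"))  # domestic violence
print(did_you_mean("barbed wire"))        # None
```

Lowering the cutoff produces more suggestions but also more irrelevant ones; character-level similarity catches misspellings, not differences in meaning.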

One reading of the survey results is that, because so many tags could not be mapped to controlled terms, the SCIE taxonomy is inadequate for indexing images. As highlighted in the literature, there are strong arguments in support of tagging: it is an easier, cheaper, more democratic and more inclusive way of describing resources than professional indexing. In this case, however, using a controlled vocabulary to index images has one key benefit. The Learning Exchange differs from tag-driven social bookmarking sites in context and purpose: it provides resources to a professional community for professional rather than social purposes. The repository is specific to the social services discipline and contains text, audio and multimedia resources related to that subject. Moreover, this discipline, like many others, has a professional language shared and understood by its workforce. This language, expressed in the SCIE vocabulary, is key to describing the social care concepts, policies, conditions and legislation communicated by the repository content. As the Exchange contains resources to support social services education and practice, it makes sense to describe its resources, including images, in the shared language of that workforce.

Conclusion

The findings of this study have shed light on the complex issues involved in indexing images so that users can discover them. Although it can be argued that the Learning Exchange lends itself to more formal indexing with a controlled vocabulary, this study has stimulated plans to incorporate tagging into the repository for a more Web 2.0 approach. Firstly, once the image collection is live, analysis of the search terms users employ to retrieve images will enable a reappraisal of the suitability of the SCIE controlled vocabulary. Secondly, analysis of user search terms will open up the possibility of indexing with popular tags in addition to controlled terms. Thirdly, exposing a tag cloud or folksonomy of user search queries would give searchers another way to explore and access content, a feature that could be particularly useful for communities from other disciplines and for users unfamiliar with the SCIE vocabulary. There is also the potential to give users a personal space in which to gather and tag their favourite repository content and to upload their own. More immediately, there are plans to expose the SCIE taxonomy so that users can browse rather than keyword-search for content. Ultimately, the practices used to describe content in the Learning Exchange will be informed by the needs of its users.
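A tag cloud usually scales each term's display size by how often it occurs. As a sketch of that idea applied to logged search queries, the function below maps query frequencies to font sizes on a logarithmic scale, so that a handful of very common queries do not dwarf the rest. The query log, the 10 to 32 point size range and the choice of log scaling are all assumptions for illustration.

```python
import math
from collections import Counter

def tag_cloud(queries, min_size=10, max_size=32):
    """Return {term: font size in points}, scaled by log frequency of normalised queries."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    if not counts:
        return {}
    lo, hi = math.log(min(counts.values())), math.log(max(counts.values()))
    span = hi - lo or 1.0  # avoid division by zero when all counts are equal
    return {
        term: round(min_size + (math.log(n) - lo) / span * (max_size - min_size))
        for term, n in counts.items()
    }

# Hypothetical search log.
log = ["dementia"] * 8 + ["child protection"] * 4 + ["Carers", "carers "]
print(tag_cloud(log))  # {'dementia': 32, 'child protection': 21, 'carers': 10}
```

With linear scaling, 'carers' would be barely readable next to 'dementia'; the logarithm compresses the range so that less frequent queries remain visible in the cloud.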

Notes

  1. Academic staff of a university
  2. AHDS was funded from 1996 to March 31st 2008.


Bibliographic information of this paper for citing:

Daly, Ellen, & Ballantyne, Neil (2009). "Ensuring the discoverability of digital images for social work education: an online 'tagging' survey to test controlled vocabularies." Webology, 6(2), Article 69. Available at: http://www.webology.org/2009/v6n2/a69.html


Copyright © 2009, Ellen Daly & Neil Ballantyne.
