Are tags from Mars and descriptors from Venus? A study on the ecology of educational resource metadata

In this study, over a period of six months, we gathered empirical data from more than 200 users on a learning resource portal with a social bookmarking and tagging feature. Our aim was to study the interrelation of conventional metadata and social tags on the one hand, and their interaction with the environment on the other, where the environment can be understood as the repository, its resources and all stakeholders, including the managers, metadata indexers and the whole community of users. We found an interplay between tags and descriptors and showed how tags can enrich and add value to multilingual controlled vocabularies in various ways. We also showed that, even if many tags can be seen as redundant in terms of the existing LOM, some of them can become a useful source of metadata for repository owners, and help them better understand users’ needs and demands.


Introduction
A conceptual model and taxonomy for social tagging systems was presented by Marlow et al. [1], who argue that tagging is motivated both by personal needs and sociable interests, e.g. attracting attention, self presentation, future retrieval, contribution and sharing. Vander Wal [2] observed that tagging could be used to compensate for missing terms in a taxonomy, while Lin et al. [3] and Al-Khalifa et al. [4] explored the overlap of tags with controlled vocabularies and automatic indexing. Sen et al. [5] studied the quality of tags and tagclouds, Farooq et al. [6] focus on folksonomies adding intellectual value to a tagging system, and Heymann et al. [7] observe that tags are present in the page text of 50% of annotated pages and in 16% of the titles.
The quality of tags, like metadata, can be evaluated from two different perspectives: the validity of the metadata in describing the resources, and their usefulness in terms of searchability, i.e. the extent to which the metadata supports the retrieval of resources [8]. In this study, we are interested in how useful the user-generated tags can be for the learning resource "metadata ecology". This term can be used to describe the interrelation of conventional metadata (e.g. LRE Application profile) [9] and social tags on the one hand, and their interaction with the environment on the other, where the environment can be understood as the repository, its resources and stakeholders, such as the managers, metadata indexers and the whole community of users. In the remaining part of this Section, we describe the context of the study and the data set. In Section 2 we present the results of a number of studies with different stakeholders in the learning resource economy, including end-users, librarians/expert indexers and repository owners. Section 3 provides a discussion on the findings, whereas Section 4 concludes with possible future work.

Context and Method
The portal under consideration is the Learning Resource Exchange (LRE) (http://lreforschools.eun.org) developed by European Schoolnet and its partners in the MELT project. At the time of the data gathering (Jan 31 2009), a version of the LRE federation of repositories was made available to a restricted number of schools with more than 30 000 open educational resources and nearly 90 000 assets from 19 content providers in Europe and elsewhere [9]. These resources exist in different languages and conform to different national and local curricula. A common Learning Resource Exchange Application Profile [10] is used by content providers which makes the use of classification keywords from the LRE Thesaurus mandatory [11]. This Thesaurus currently exists in 17 languages.

Figure 1 shows the front page of the LRE portal (hereafter referred to as the portal). The portal offers different categories of searches: "Explicit search" and "Browse by category", which take advantage of multilingual metadata. "Community browsing", on the other hand, takes advantage of other users' behaviour. This includes: the use of tagclouds and tags; social navigation features such as "most bookmarked resources"; and "Personal search" where users can search the resources they have previously saved in their Favourites by using tags. The data set was gathered using a logging scheme for users' attention metadata, details of which can be found in [12]. The current data is a snapshot from a six-month period. From July 2008 to January 2009, primary and secondary school teachers from Austria, Belgium, Hungary, Finland, Estonia, United Kingdom, Slovenia, Sweden, France, Germany and Greece became involved in the pilot test. In total this meant 234 users, of whom 77 used the bookmarking and tagging tool. Table 1 shows the number of bookmarks and tags produced by the users, and the amount of attention metadata that tags generated.

Results
We first look at how teachers tag and interact with tags on the portal. Then, we provide two different evaluations on tags, one by expert indexers and another one by a focus group of learning resource repository owners.

How do users tag?
The basic dataset on users' tags is presented in Table 2. Out of all users, 33% added bookmarks and tags. In total, 1857 distinct resources were bookmarked 2490 times out of more than 30 000 learning resources made available. On average, each resource had 1.3 bookmarks; however, in reality, 80% of resources had only one bookmark. The remaining 20% accumulated 53% of all bookmarks. Each bookmark had an average of 3.7 tags (Table 2). When we look at the tags per resource, we find each resource has an average of 5 tags. However, the top 39% of resources had 70% of tags and the remaining 61% of resources had fewer than five tags (18% had only one tag). There were 3832 distinct tags applied 9219 times. On average, each tag was used 2.4 times. 15% of tags were used more than average; these tags comprised 59% of all tags applied. There were three tags that were applied more than a hundred times, namely "english" (257), "interactive" (161) and "Vocabulary" (126). Each user who bookmarked (77) added an average of 28 bookmarks. The top 28% of users were responsible for 85% of all bookmarks, whereas 72% of users were below the average. An average user applied 118 tags to bookmarks. We find, however, that 29% of users added over 92% of all tags, whereas 71% of users were below average.
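The per-resource, per-bookmark and per-tag averages above follow directly from the reported totals. The short sketch below recomputes them as a sanity check; the variable names are ours, and only the four totals quoted in the text are used.

```python
# Totals quoted in the text (Table 2 itself is not reproduced here).
distinct_resources = 1857  # resources bookmarked at least once
bookmarks = 2490           # bookmark actions
distinct_tags = 3832       # distinct tag strings
tag_applications = 9219    # times a tag was attached to a bookmark

bookmarks_per_resource = bookmarks / distinct_resources
tags_per_bookmark = tag_applications / bookmarks
tags_per_resource = tag_applications / distinct_resources
uses_per_tag = tag_applications / distinct_tags

for label, value in [
    ("bookmarks per resource", bookmarks_per_resource),
    ("tags per bookmark", tags_per_bookmark),
    ("tags per resource", tags_per_resource),
    ("uses per distinct tag", uses_per_tag),
]:
    print(f"{label}: {value:.1f}")
```

Rounded to one decimal, the four ratios come out as 1.3, 3.7, 5.0 and 2.4, matching the averages stated above.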
As the LRE portal is made available to teachers from European countries and its interface is available in multiple languages, it is normal that users tag in multiple languages. With the help of the LRE Multilingual Thesaurus, we verified the language of the applied tags in a sample (n=3738). Table 3 shows the languages that were used for tagging. 29% of the tags were in English, although very few users had English as their mother tongue. We found a medium correlation (r=0.57) between the language of the content and the language of the tags. The tagging behaviour in a multilingual context is further studied in [13]. We ran a database query against all the tags and the multilingual Thesaurus terms. We found that 11.3% of distinct user-generated tags exist in the LRE multilingual Thesaurus. We call these "Thesaurus tags", as they are end-user generated, but they also exist in the Thesaurus. Counted by tag applications rather than distinct tags, "Thesaurus tags" account for 30.6% of all tags applied (i.e. the same tag added to many resources). On average, these tags were reused 11.8 times compared to other tags which were reused on average 2.4 times. In the following evaluations we see that the popularity of these terms is repeated (e.g. Table 5). It is interesting that, especially in a multilingual context, such a high percentage of overlap exists between natural language and controlled vocabularies. The authors of [5] report that the folksonomy set overlapped with the indexer set on average 19.5%. Table 4 shows that 58% of all users had clicked on tags while searching for resources, whereas 42% never used tags. This means that more people use tags for retrieval than actually add tags (33%). For resource discovery, we were interested in whether all the tags were used in a similar way. Out of more than 3800 distinct tags, our logging analyses show that 419 tags generated 2631 clicks of attention metadata, i.e. clickstream.
On average, each tag received 6.9 clicks; however, in reality, the top 14% of tags that were above average generated 76% of the clickstream. In Table 5, in the middle column, we find the tags that generated the most clickstream. There were three end-user added tags that rose above others (english, interactive, animation), which also probably constitute the "wish list" of the users of an international learning resource portal.

Table 5. Most added and clicked on tags on the LRE portal. "Add to LOM" shows the tags that expert indexers most often voted to add to LOM. * indicates the potential "Thesaurus tags" and ** indicates tags that were not added by the end-users, but by project staff.

As for Community browsing, we find that not only tags attract clickstream, but bookmarks are also used for social navigation. Among registered users, the tagcloud accounts for 22% of all search actions, personal bookmarks for 5%, and a further 2% come from clicking on other users' bookmarks. This shows that tags are used, to a small extent, to discover other users' resources, and also for Personal searches.
Lastly, we asked whether the tags that were added a lot by users also received users' attention. In other words, does the offer of tags by teachers match the demand by teachers? We devised a measure for "attractive tags" which compares the amount of clickstream on a tag to how many times it had been added by teachers. If the ratio is above one (1), the tag has generated more clickstream than tag applications, and we call the tag "attractive". If the ratio equals one, there is an equal amount of both, and a ratio below one indicates that the offer of the tag exceeds the demand for it. We found that 21% of tags were "attractive" (Figure 2) and 24% had an equal demand and offer. 55% of tags received fewer clicks than there were tags applied to resources. Language-wise, within the "attractive" and "equal" tags, 28% are in a language other than English.
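The "attractive tags" measure described above can be sketched as a ratio of clicks to tag applications. The function names and the sample counts below are hypothetical and serve only to illustrate the three cases; they are not taken from the study's logs.

```python
from collections import Counter


def attractiveness(clicks: int, applications: int) -> float:
    """Ratio of clickstream to tag applications for one tag."""
    if applications <= 0:
        raise ValueError("a tag must have been applied at least once")
    return clicks / applications


def classify(clicks: int, applications: int) -> str:
    """Label a tag as 'attractive', 'equal' or 'low demand'."""
    if applications <= 0:
        raise ValueError("a tag must have been applied at least once")
    # Compare the integer counts directly to avoid float equality tests.
    if clicks > applications:
        return "attractive"
    if clicks == applications:
        return "equal"
    return "low demand"


# Hypothetical (clicks, applications) counts per tag.
sample = {"tag_a": (10, 4), "tag_b": (3, 3), "tag_c": (1, 5), "tag_d": (0, 2)}

labels = {tag: classify(c, a) for tag, (c, a) in sample.items()}
summary = Counter(labels.values())
print(labels)
print(summary)
```

Tags that were clicked but never applied by end-users fall outside this sketch, since the ratio is undefined for them.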

What do expert indexers think of tags?
Out of the original dataset, we took a sample of ten learning resources with user-generated tags that a) had a high number of tags and/or b) offered some variety in terms of discipline and type of resources. This data was used in order to obtain feedback from 15 expert indexers who work with metadata and classification of resources in a learning resource repository or portal. The details of these evaluations are reported in [14]. The ten resources included 23 Thesaurus terms as descriptors and 88 tags. We asked the indexers to evaluate the usefulness of end-user created tags as descriptors of learning resources.
In general, we detected that expert indexers were positive towards tags; they were evaluated as being suitable (i.e. clear and unambiguous) as indexing keywords (average 30%) and were actually added to the original LOM description (average 26%). The "Thesaurus tags" featured prominently (43%) among tags that expert indexers voted above average on the question "Would you want to revise the original LOM description of the resource and, if so, which of the following terms might you adopt" (Table 5, right column). Especially in the case where the original indexing was poor or limited, for example, due to too broad indexing, participants in the study indicated that they would be prepared to adopt these "Thesaurus tags". Examples of these Thesaurus tags in our analyses are: chemistry, culture, Európa, Europe, grammar, information, kemia, kultúra, reading, szobor, thermodynamics, vocabulary.
There were also potential Thesaurus tags: tags that have an almost identical spelling to Thesaurus terms. These cannot be identified automatically, however, but require human intervention. Examples are tags such as "english", which could be mapped to the Thesaurus term "English language", or "french" to "French language".

What do repository managers think of tags?
A focus group with five learning resource repository or portal managers was run to better understand how they perceived the value of tags to them. The results are reported in detail in [14]. One of the activities was a small case study where a repository manager analysed the added value of tags to existing Learning Object Metadata (LOM). The case in question is the Tiger Leap Foundation's repository, which is part of the LRE federation. The study comprised 84 bookmarks on 63 distinct resources to which users from different European countries had added tags. The tags were compared with the existing LOM, its keywords, LRE Thesaurus terms and other classification information such as curriculum topics.
In 25% of the cases the tags provided additional value for the repository. Tags, for example, described the content of the resource more clearly (tags 'Australia' and 'USA' added for the resource "English-speaking countries", or 'culture', 'nature' added for a resource titled "Scotland"). Even if our sample size is very small, the results point in the same direction as previous studies, e.g. [5] compared tags with the page text and back and forward link page text, and found that in 20% of the cases tags provided search data not provided by other sources.
Moreover, we found that in 49% of the cases, the information that the tags provided was already reflected in existing keywords, LRE Thesaurus terms or in other classification information, and in 26% of the cases tags included somewhat redundant information, which already existed in other elements of the LOM description. The following redundancy was observed with elements of the LOM description:
• LOM 5.2: resource type. Examples: photo, picture; exercises, games; simulations; quiz, web quest
• LOM 5.7: the age group being addressed (e.g. young learners)
• LOM 1.3: the language of the resource (English).

Discussion
In this study we have focused on the interplay of tags and Learning Object Metadata descriptions that takes place on the learning resource portal. We have looked at the issue from multiple points of view, namely that of end-users, expert indexers and repository managers. We have shown a number of levels where possibilities for interplay exist. A number of interesting issues arise. We have found that a third of tag applications by the end-users are actually descriptors that exist in the LRE Multilingual Thesaurus. These "Thesaurus tags" by users can be used to improve the semantic interoperability of tags. First, they have the potential to be used as a "bridge" between existing descriptors and tags, and thus enhance the semantic interoperability within and across languages. One example is the resource "Change of State" in Figure 2, which has tags by end-users as well as the classification terms by the expert indexer. Table 6, on the other hand, shows the Thesaurus "descriptor 195" representing the concept of "chemistry" with its language equivalences. As we can now observe, the tag "kemia" is actually a "Thesaurus tag". Thanks to the multilingual Thesaurus, we can first of all recognise the similarity between a "Thesaurus tag" and the descriptor, and then assign properties to these tags from the Thesaurus, e.g. the tag "kemia" is related to the concept of "descriptor 195" and its language is Finnish. A similar idea of connecting tags to existing ontologies has been presented in [15], although the difference is that in our case, we use the resource and its existing descriptors as a proxy for the semantic link between the descriptor and tag, and that this process can be automated to take place at the back-end without being intrusive to the user.
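A minimal sketch of how a "Thesaurus tag" might be recognised and given a concept and a language is shown below. It relies on plain case-insensitive string matching against a tiny excerpt containing only the two terms for descriptor 195 mentioned in the text; the data layout, the `Match` type and the function names are assumptions made for illustration, not the LRE implementation.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    descriptor_id: int
    language: str  # ISO 639-1 code


# Illustrative excerpt: descriptor id -> {language code: term}.
# Only the two terms mentioned in the text are included.
THESAURUS: Dict[int, Dict[str, str]] = {
    195: {"en": "chemistry", "fi": "kemia"},
}


def build_index(thesaurus: Dict[int, Dict[str, str]]) -> Dict[str, List[Match]]:
    """Map each case-folded term to the descriptors and languages using it."""
    index: Dict[str, List[Match]] = {}
    for descriptor_id, terms in thesaurus.items():
        for language, term in terms.items():
            index.setdefault(term.casefold(), []).append(
                Match(descriptor_id, language)
            )
    return index


def lookup(tag: str, index: Dict[str, List[Match]]) -> List[Match]:
    """Return every descriptor/language pair whose term equals the tag."""
    return index.get(tag.strip().casefold(), [])


INDEX = build_index(THESAURUS)
print(lookup("Kemia", INDEX))    # matches descriptor 195 as a Finnish term
print(lookup("english", INDEX))  # no exact match in this excerpt
```

The lookup returns a list because the same spelling can be a valid term in more than one language; the descriptors already attached to the resource could then be used to pick the most plausible candidate.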

Fig. 2. Learning resource "Change of State" with tags (e.g. "kemia") and indexing terms "sciences" and "physical sciences" from the multilingual Thesaurus.
The information gained from the link between the "Thesaurus tag" and descriptor can be used in various ways. It can be used, for example, in the tagcloud to show different translations of the tag "kemia". For retrieval purposes, the system could infer that other resources indexed with the "descriptor 195" are also relevant. Here, the user will get a chance to retrieve learning resources in multiple languages, thanks to the inter-language connection that the multilingual Thesaurus offers. Moreover, these links open up new options to navigate across multilingual resources as, for example, we could imagine displaying all the tags that are related to the "descriptor 195" to create a multilingual chemistry tagcloud. Secondly, the "Thesaurus tags" can be suitable descriptors to be added to the original LOM description of the learning resource, particularly in cases where the original indexing has been poor or limited. In our example of "Change of State", we know from the Thesaurus hierarchies that the "descriptor 195" is a narrower term of the existing indexing term "physical sciences". As the "Thesaurus tag" narrows down the current classification of the learning resource in question, we can automatically add it as a new classification term for the resource.
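The automatic enrichment step described above can be sketched as follows: a tag's concept is added to a resource's classification only if one of its broader terms is already used to index that resource. The `BROADER` map is a hypothetical excerpt; the link from "chemistry" to "physical sciences" comes from the text, while the link from "physical sciences" to "sciences" is an assumption added to show the upward walk.

```python
from typing import Dict, List

# Hypothetical excerpt of the hierarchy: term -> its broader term.
BROADER: Dict[str, str] = {
    "chemistry": "physical sciences",  # stated in the text (descriptor 195)
    "physical sciences": "sciences",   # assumed for illustration
}


def ancestors(term: str, broader: Dict[str, str]) -> List[str]:
    """Walk up the hierarchy and collect all broader terms of `term`."""
    found: List[str] = []
    current = broader.get(term)
    while current is not None and current not in found:  # guard against cycles
        found.append(current)
        current = broader.get(current)
    return found


def enrich(classification: List[str], tag_concepts: List[str],
           broader: Dict[str, str]) -> List[str]:
    """Add tag concepts that narrow down an existing classification term."""
    result = list(classification)
    for concept in tag_concepts:
        if concept in result:
            continue
        if any(term in classification for term in ancestors(concept, broader)):
            result.append(concept)
    return result


change_of_state = ["sciences", "physical sciences"]
print(enrich(change_of_state, ["chemistry"], BROADER))
```

A concept whose broader terms do not appear in the existing classification is left out, so the sketch only ever refines the indexing that an expert has already chosen.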
Thirdly, the area of intra-language equivalence within the multilingual Thesaurus could be improved with tags, as in our evaluations they have been identified as a good source for non-descriptors. A non-descriptor provides the intra-language equivalence that facilitates access to resources that are indexed with Thesaurus terms that do not translate well to the language that the end-user uses. For example, the tag "esl" (= "English second language") or "efl" (= "English foreign language") could be expressed in Thesaurus terms as "English language" + "foreign language". When the user types a text search "efl", not only tagged resources would be retrieved, but also the ones with the above descriptors. In this way the gap between natural language and controlled language could be reduced. The same could also apply to gathering better scope notes, which deal with the meaning of terms and help the user to understand the term better. Especially in a multilingual context, where some differences occur from one language/culture to another, this feature is useful to understand cultural differences.
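The "efl" example can be sketched as a simple query expansion: a search term that is a known non-descriptor is expanded into its descriptor combination, so that both tagged and indexed resources are returned. The resource records and function name are hypothetical, and reading "+" as "all listed descriptors must be present" is our assumption.

```python
from typing import Dict, List, Set

# Non-descriptor -> descriptor combination, following the example in the text.
NON_DESCRIPTORS: Dict[str, Set[str]] = {
    "efl": {"English language", "foreign language"},
    "esl": {"English language", "foreign language"},
}

# Hypothetical resources with end-user tags and Thesaurus descriptors.
RESOURCES = [
    {"id": "r1", "tags": {"efl"}, "descriptors": set()},
    {"id": "r2", "tags": set(),
     "descriptors": {"English language", "foreign language"}},
    {"id": "r3", "tags": set(), "descriptors": {"English language"}},
]


def search(query: str, resources: List[dict],
           non_descriptors: Dict[str, Set[str]]) -> List[str]:
    """Return ids of resources matching the query by tag or by expansion."""
    term = query.strip().casefold()
    required = non_descriptors.get(term)
    hits = []
    for resource in resources:
        tagged = term in {tag.casefold() for tag in resource["tags"]}
        # Assumption: "+" means every listed descriptor must be present.
        indexed = required is not None and required <= resource["descriptors"]
        if tagged or indexed:
            hits.append(resource["id"])
    return hits


print(search("efl", RESOURCES, NON_DESCRIPTORS))
```

In this sample, r1 is found through its tag and r2 through the expanded descriptors, while r3 is excluded because it carries only one of the two descriptors.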
Lastly, in the area of interplay between the tags and Thesaurus, the Thesaurus enrichment should be noted. Tags can help to define, verify and enrich, and then redefine a number of relationships in thesauri. Our evaluations have shown that tags can help identify areas in the Thesaurus where descriptors are not sufficient and thus need enrichment.
We have also shown that tags can yield important information for the repository owners. In the case study we were able to show that, even if many tags were redundant because of the LOM description, some tags clarified the content better. Also, the clickstream generated from the users' attention can indicate "users' demand" and thus help the repository manager to display a large number of potentially relevant resources. Seeing the popularity of some tags in the tagcloud (e.g. English, interactivity), the repository managers could also possibly take advantage of the other elements of LOM (e.g. type, language, classification keyword) to create new navigation paths à la tagcloud, which seem to be very attractive for users. Lastly, we note that tags interplay with end-users by allowing them to create their own "ego-scape" of resources by using tags in a way that Marlow et al. [1] call "self presentation". This enhances the personal retrieval of resources and thus allows users to claim more ownership of resources. This type of "ego-scape" can further be used by other users to discover resources.

Conclusion and future work
This study has helped us to better understand the "metadata ecology", a term that can be used to describe the interrelation of conventional metadata (e.g. LRE Application profile) and social tags on the one hand, and their interaction with the environment on the other, where the environment can be understood as the repository, its resources and stakeholders. We found interplay between tags and descriptors, and we showed that tags can enrich and add value to multilingual controlled vocabularies such as the multilingual LRE Thesaurus. We also showed that tags can become a useful source of metadata for repository owners, as well as help them better understand users' needs and demands through appraisal of "attractive tags".
Having established in this study that not all the tags are as far from the Thesaurus descriptors as Mars is from Venus, future work should particularly focus on improving the link between tags and terminological knowledge base such as the LRE thesaurus. Tags have been created in a specific cultural context where educational language is used, and thus are valuable as a way to reduce the gap between natural and controlled languages. Moreover, further work should focus on the inherent