Folksonomies: the role of folksonomies in groupware systems

An article, posted more than 18 years ago filed in information, folksonomies, tags, tagging, hype, social bookmarking & retrieval.

In recent times, a huge number of websites have begun to generate extraordinary amounts of new content through weblogs, wikis and other social tools. Websites supporting this type of content generation are gaining hype and emerging rapidly on the internet. An increasing amount of this valuable content is generated not by trained writers and experts, but by untrained non-experts. Not only is original content being generated; community efforts are also generating vast amounts of content-describing data (metadata). While this was formerly the task of trained professionals making use of well-specified taxonomies, large numbers of non-professionals are now annotating content, producing what are called “folksonomies”. In this essay we try to clarify what folksonomies bring to the information architecture community and what they entail in collaborative work environments.

A piece of content that describes other content is often referred to as a keyword or tag. Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. Users add tags to online items, such as images, videos, bookmarks and text. While most people seem to be tagging for themselves, the combined effort may result in a folksonomy (Golder & Huberman, 2005).

So what is a folksonomy? The word is a fusion of 'folks' and 'taxonomy'. The first use of the term has been attributed to Vander Wal, who later defined it as follows:

“Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others). Folksonomy is created from the act of tagging by the person consuming the information.” - Vander Wal (2007)

A folksonomy is a distributed classification system resulting from the combined effort of independent individuals, typically the users of the resource. Systems supporting the creation of folksonomies may allow users to add tags to online items, such as images, videos, bookmarks and text. When these tags are shared, the description of the item is open for refinement by the original tagger or by others.

Explaining folksonomies

Folksonomy and Taxonomy

Taxonomy is the practice and science of classification. In a taxonomy, a hierarchical structure is followed and objects are classified in categories that are created by professionals and authorities. The most common example of a taxonomy is Melvil Dewey's classification of books for libraries, in which he divided books into 10 broad areas and then several hundred sub-areas, using numbers as codes. For example, books on Africa’s geography are in Dewey Decimal category 916 and books on South America’s in 918, but both are subsumed by the 900 category, covering all topics in geography (Golder & Huberman, 2005). It is important to realize that taxonomies are not value-free: they were developed under the constraint of limited physical space, and knowledge about the domain is almost never complete when a classification is being developed (Weinberger, 2006).

A folksonomy is a user-generated classification. In a folksonomy there is no predefined set of hierarchically ordered terms that users have to follow. People make up the words themselves. The power is with the people here.

Through tags, people can place documents, videos or any other data in different groups, each defined by a single tag or a combination of tags. Unlike in a hierarchical system, one such document can be reached from multiple directions.
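To make this grouping concrete, here is a minimal sketch in Python of how a tag-based lookup might work. The item names, the tags and the helper functions build_tag_index and find are hypothetical illustrations, not taken from any of the systems discussed in this essay.

```python
from collections import defaultdict

def build_tag_index(tagged_items):
    """Map each tag to the set of items that carry it."""
    index = defaultdict(set)
    for item, tags in tagged_items.items():
        for tag in tags:
            index[tag].add(item)
    return index

def find(index, *tags):
    """Return the items that carry every one of the given tags."""
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result

# Hypothetical collection: each item carries a free-form set of tags.
items = {
    "report.pdf": {"folksonomy", "research", "to_read"},
    "talk.mov": {"folksonomy", "video"},
    "notes.txt": {"research", "to_read"},
}
index = build_tag_index(items)

by_one_tag = find(index, "folksonomy")             # a group defined by a single tag
by_two_tags = find(index, "folksonomy", "to_read")  # a group defined by a combination
```

The same item ("report.pdf") is reachable through "folksonomy", "research" or "to_read", which is the "multiple directions" property a single folder hierarchy does not offer.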

Folksonomies at work

A number of websites allow people to publicly tag content and share it with other users. The two most commonly cited examples of folksonomies at work on the internet are Flickr and Del.icio.us. We will discuss these briefly.

Flickr

Flickr allows people to share their pictures with family, friends and other people on the internet. Other people can comment on these pictures and assign tags to them. Flickr provides rapid access to images tagged with the most popular keywords.

Flickr was one of the first websites to implement tag clouds. A tag cloud is a visual depiction of the tags used on a website. Often, more frequently used tags are depicted in a larger font or otherwise emphasized, while the displayed order is generally alphabetical.

Figure 1: Screenshot of a tag cloud (Flickr; February 8, 2007)
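The sizing rule behind a tag cloud can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear scaling, the font sizes and the function name tag_cloud are our own choices and not a description of how Flickr computes its cloud.

```python
from collections import Counter

def tag_cloud(tags, min_size=10, max_size=32):
    """Return (tag, font_size) pairs in alphabetical order.

    Font size grows linearly with how often a tag is used
    (an assumed rule; real sites may scale differently).
    """
    counts = Counter(tags)
    if not counts:
        return []
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    cloud = []
    for tag in sorted(counts):
        if span == 0:
            size = min_size  # all tags equally frequent
        else:
            size = min_size + (counts[tag] - lowest) * (max_size - min_size) / span
        cloud.append((tag, round(size)))
    return cloud

# Hypothetical tag usage: "sweden" used 5 times, "stockholm" 3 times, "kth" once.
usage = ["sweden"] * 5 + ["stockholm"] * 3 + ["kth"]
cloud = tag_cloud(usage)
```

The result lists the tags alphabetically, with the most frequently used tag receiving the largest font size.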

Tags in Flickr enable free browsing of vast amounts of images. A photo in Flickr may, for example, have the tags “kth”, “stockholm” and “sweden”. While viewing the image, clicking on the tag “sweden” will display all images that the individual has tagged with the term “sweden”.

Del.icio.us

Del.icio.us deals with electronic bookmarks. Instead of the hierarchical folder structure used to organize links in most web browsers, it allows users to tag their links. Since del.icio.us is an online service, it adds further value for users: they can access their bookmarks from other computers as well. While all bookmarks are shared by default, privacy levels can be set.

The ‘Popular’ page on del.icio.us shows how many people have bookmarked the same web pages with the same tag. It also shows what other people found interesting. Two features, organizing your personal bookmarks by tag and being able to use them from other computers, make the website interesting for the members of del.icio.us. That these users are in the meantime also creating metadata collaboratively is a bonus for the community.

Strengths & weaknesses

Each technology has advantages and disadvantages, and folksonomies are no different. An advantage for one person can be a disadvantage for another, because people search for information in different ways. Some may find it very easy to find information through a tagging system, while others may get lost browsing all the loosely structured tags.

One of the advantages of this new way of tagging is that users are completely free in how they use it; there is no authority to adhere to. It is simple to use and users can apply it straight away. But this advantage has its drawbacks. There is no one to stop people from entering irrelevant metadata. But who is to decide whether something is relevant? In a folksonomy there are no predefined categories; usually the metadata is defined by individual users. When a user tries to find information through other users’ tags, the results can be inaccurate and irrelevant. Sometimes the terms used are ambiguous. Often-heard examples of this ambiguity are polysemes, synonyms, and small variations in spelling (Golder & Huberman, 2005).

Polysemes are words with different, but related, meanings. In contrast to homonyms, words with different and unrelated meanings, polysemes make search queries hard to refine because the multiple meanings are related.

Synonyms, different words that have the same meaning or refer to the same thing, are a common problem on folksonomy-driven websites because all tags are defined by individual users. The same thing, the same information or the same picture can be given entirely different tags.

A problem comparable to that of synonyms is that of minor variations. Tags can be case-sensitive, so a search may return different information depending on whether the tag is typed in lower case or upper case. The same holds for singular and plural forms. Users can also misspell tags when they create them, so that when someone else tries to find the same information, the results will not match (Regli, 2006).
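One way a tagging system might soften the case and plural variations is to normalize tags before comparing them. The Python sketch below is a deliberately crude illustration: the function names normalize_tag and group_variants and the plural rule are our own assumptions, and misspellings are not handled at all.

```python
from collections import defaultdict

def normalize_tag(tag):
    """Fold case, trim whitespace and crudely strip a plural 's'."""
    tag = tag.strip().lower()
    # Assumed, very rough plural rule: drop a final "s" from longer words,
    # but leave words ending in "ss" (such as "glass") untouched.
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag

def group_variants(tags):
    """Group the tags as typed by users under their normalized form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return dict(groups)

# Hypothetical tags as entered by different users.
variants = group_variants(["Photo", "photos", "PHOTO ", "sweden", "Sweden"])
```

A rule this simple will also merge tags that should stay apart, so it trades one kind of error for another; it only shows where such normalization would sit.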

One may counter these arguments by asking “what terms would a user use?” A search in a system with a controlled vocabulary does not return hits when the search query contains errors; where a folksonomy is used, however, such errors may be forgiven.

But most importantly: “Non-trivial and important metadata are captured through these folksonomies” (Mathes, 2006).

In a folksonomy there is no branching tree structure, no hierarchy. This makes it easy to understand. There is no need to go through a hierarchy to look for information. Through tags and results there is a free path from the starting point to the document one is searching for: in a tagging system you can go anywhere you want to (Weinberger, 2006). And although folksonomies do not necessarily lead to the best information at all times (Weinberger, 2006; assuming that taxonomies do), the fact that they allow for free browsing of information may lead to unexpected, valuable discoveries (Mathes, 2004).

A folksonomy is actually more content-like than a taxonomy: adding tags can improve a search engine’s results (an increased number of hits). The drawback, however, is that it may also lead to more false positives, which are far less common in a system exploiting a hierarchy. Organization seems to be lacking in a folksonomy. This could be a problem, and users may lose their way when looking for information. For example, the tags “apple” and “fruit” are at the same level. Because there is no hierarchy, there could be hundreds of tags at the same level. It can be a huge chaos (Weinberger, 2006). Clustering, such as applied in Flickr, seems however to be quite capable of creating rough organizations on the fly.

Another problem of folksonomies is that uncommon things are hard to find. Finding common things like photos and news is easy, but less popular topics are very hard to find, since everything relies so much on users tagging the material. In this situation normal search engines, or solutions based on taxonomies, may be a better choice.

Looking at folksonomies only from an information retrieval point of view, however, is actually limiting. Mathes (2004) notes that tagging allows for more creative uses as well, such as grouping documents not by their content but by the judged value of that content: “to_read” is an example of such a tag. Furthermore, many tagging systems allow users to discover like-minded individuals and to define themselves by showing what their interests are. These may actually be among the most important incentives for some.

Alternatives and variations

The use of folksonomies is not a goal in itself; it is simply a means to an end: increasing the findability of information. In this section we want to shed some light on other means of improving retrieval and on variations on the theme of folksonomies. By understanding the alternatives better, the question “why folksonomies?” may be answered more wisely.

No tagging

If we are simply considering retrieving our documents, why do we need tags (or keywords)? Especially when we talk about digital written documents, what does the tag add to the set of words that already make up the document? Today's search engines do a reasonably good job of retrieving valuable information. Search engines also deal with a much larger part of the web than social bookmarking sites cover. Search engines crawl every site they can crawl, while social bookmarking sites only deal with those sites added to their database. The latter approach has one advantage: tagging is instantaneous. One does not have to wait for some search engine to come across a page (Fox, 2006).

The limited scope of the social bookmarking site del.icio.us (still probably one of the most popular social bookmarking websites) is reflected in its most popular tags: “blog design software music reference tools web2.0 programming”. One might start to wonder how well keyword-only search would work if these sites covered all documents, or at least a broader subset of them. Mita (2005, comments section) questions why tagging works at this moment: does it not work because the websites people gather tags on are used mainly by like-minded groups of people? We might start to wonder whether it is the folksonomy that makes the social bookmarking websites work. Would a website that just gathers link submissions from like-minded people, combined with the power of an up-to-date search engine, not work just as well?

Another take on this subject is that today's search engines are capable of doing tag-like things by investigating relations between pages. The words in a hyperlink on one page are treated as tags for the page it links to.
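As a rough illustration of this idea, the Python sketch below collects the words of each link's text and files them as tag-like keywords under the page the link points to. It uses the standard library HTML parser; the class name AnchorTextCollector, the example page and the URL are hypothetical, and real search engines are of course far more elaborate.

```python
from collections import defaultdict
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect the words inside <a> elements, keyed by the link target."""

    def __init__(self):
        super().__init__()
        self.tags_by_url = defaultdict(set)
        self._href = None  # target of the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only text that appears inside a link counts as a "tag".
        if self._href:
            for word in data.lower().split():
                self.tags_by_url[self._href].add(word)

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

# Hypothetical page linking to a hypothetical URL.
page = '<p>See this <a href="http://example.org/fika">Swedish coffee break</a> guide.</p>'
collector = AnchorTextCollector()
collector.feed(page)
collector.close()
link_tags = collector.tags_by_url["http://example.org/fika"]
```

Here the linked page ends up described by words chosen by the author of the linking page, which is what makes link text behave somewhat like a tag.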

Suggesting tags

Many who object to the power of tagging do so because every folksonomy is very ill-defined compared to a well-designed taxonomy. To overcome these objections some have come up with alternatives, either by using a real taxonomy as a back-up or by suggesting, and thus promoting, the use of tags that are already in use.

Based on formal taxonomies

Consistency can also be promoted by suggesting classifications from formal taxonomies when users enter tags (e.g. Lothian, 2006). Without limiting their freedom to use other tags, users are encouraged to add formalized tags as well. While these attempts may limit errors such as typing errors, one may wonder how bad such errors actually are (see our 'Strengths & weaknesses' section).

Based on previous tags

The social bookmarking website del.icio.us gives, when adding new bookmarks, suggestions for tags. These suggestions are based on tags given by people who bookmarked this page earlier. By suggesting tags a more consistent set of tags might be the result, promoting a more stable set of words that can be used for tagging.
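A suggestion mechanism of this kind can be sketched as a simple frequency count over the tags that earlier bookmarkers gave to the same page. The Python below is an assumption-laden illustration: the function name suggest_tags, the example data and the ranking rule (most frequent first, ties broken alphabetically) are ours, and we do not claim this is how del.icio.us actually ranks its suggestions.

```python
from collections import Counter

def suggest_tags(previous_taggings, limit=3):
    """Suggest the tags most often given to a page by earlier bookmarkers.

    previous_taggings: one list of tags per earlier bookmarker.
    Ties are broken alphabetically so the result is predictable.
    """
    counts = Counter(tag for tags in previous_taggings for tag in tags)
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return ranked[:limit]

# Hypothetical tags given by three earlier bookmarkers of the same page.
earlier = [
    ["python", "programming"],
    ["python", "tutorial"],
    ["python", "programming", "reference"],
]
suggestions = suggest_tags(earlier, limit=2)
```

Because suggestions feed back into what later users choose, even a simple rule like this tends to reinforce whichever tags happened to be used first.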

The downside may be that due to a less diverse set of tags, the findability may decrease, since there are fewer variations, like synonyms, that allow for a positive match. Another downside as noted by Guy and Tonkin (2006) is that promoting a certain convention, namely the one settled by previous bookmarkers, might be considered to be “no more obvious to them than the formal taxonomies that the folksonomy replaced.”

Setting rules

Somewhat related to suggesting tags, but on a higher level, is setting out rules. Some have suggested (Guy, Powell & Day, 2004) that the introduction of rules may help improve the quality of the metadata. Translated to tagging this may lead to a kind of tag etiquette (Good, 2006).

Guy et al. (2004) suggest that good archiving demands well defined functional requirements (how does your organisation want to use the information), as well as external functional requirements (how can others make use of this information), how it shall be built into an application, and finally content rules.

Conclusion

Whether folksonomies, as we see them being used now, will survive the Web 2.0 hype is hard to predict. It is hard to compare a powerful search engine that searches billions of pages to a website that only has tagged references to a limited set of external sources. What will happen if a social bookmarking site has to deal with a more comparable number of sites? Thorough quantitative research is required to answer these questions. But as noted, tagging may be about more than simply information retrieval. These social aspects should also be taken into account when studying folksonomies.

One may also wonder whether we are not sticking to an ageing paradigm of obtaining information, namely entering keywords. Do we want to keep searching for documents, or will systems in the future enable documents to 'find us'? Maybe tagging, and the resulting folksonomies, is just an acceptable technology for today, but not for tomorrow. Tags do not lend themselves to computer processing, reasoning, etc. The technology to support this is still in its infancy, but improving. Or should we rephrase that earlier sentence as “Tags do not yet lend...”?

This article was co-authored by Maarten Brouwers, Ather Nawaz, Xu Zhao