
Why do tweets get tagged as the wrong language?

07 February, 2015 by Tom

Occasionally you might notice tweets in socialbearing.com and other Twitter tools being tagged in the wrong language.

Twitter automatically tags each tweet based entirely on the words it contains, not on any other indicator such as your location, country of origin or geo-location. This is the most reliable method of language identification: so many Twitter users and accounts are multilingual that inferring a tweet's language from anything else would be impossible. It does mean, however, that Twitter occasionally gets things wrong.

Incorrect language identification happens most often when a tweet contains only a handful of words, too few to identify a language reliably. For example, a tweet consisting of the word ‘haha’ is often tagged as ‘Wikang Tagalog’ (Tagalog), a major language of the Philippines. The Twitter API returns such a tweet with the ISO 639-1 code ‘tl’.

Figure: Twitter tags some tweets with the wrong language
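As a rough illustration of why very short tweets are the risky ones, the sketch below flags tweets whose language tag probably should not be trusted because they contain too few words. It assumes each tweet is a dictionary with the `text` and `lang` fields that the Twitter API returns; the `MIN_WORDS` threshold and the function names are hypothetical choices made for this example, not anything Twitter or Social Bearing publishes.

```python
import re

# Hypothetical threshold: tweets with fewer word tokens than this are
# treated as too short for their language tag to be trusted.
MIN_WORDS = 4

def count_words(text):
    """Count word tokens, ignoring URLs, @mentions and #hashtags."""
    cleaned = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    return len(re.findall(r"\w+", cleaned))

def is_lang_tag_doubtful(tweet, min_words=MIN_WORDS):
    """Return True when the tweet has too few words to trust its 'lang' tag."""
    return count_words(tweet["text"]) < min_words

# Invented sample tweets, shaped like the objects the API returns.
tweets = [
    {"text": "haha", "lang": "tl"},
    {"text": "Thanks everyone for coming along to the meetup last night", "lang": "en"},
]

for tweet in tweets:
    status = "doubtful" if is_lang_tag_doubtful(tweet) else "ok"
    print(tweet["lang"], status)
```

A check like this cannot say which language a short tweet is really in; it only marks the tag as one to treat with caution.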

 

Twitter is particularly poor at identifying Welsh tweets. Rather than tagging them with the wrong language, the Twitter API returns them as ‘und’ (undetermined, i.e. unidentified). For example, a search for the word ‘Diolch’ (‘Thank you’) returns the majority of tweets as unidentified:

Figure: Tweets in Welsh are largely unidentified by Twitter
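To put a number on how many results come back unidentified, a minimal sketch like the following tallies the `lang` field across a set of search results. The sample data is invented for illustration and is not real search output; `lang_breakdown` is a hypothetical helper name, and treating a missing `lang` field as ‘und’ is an assumption of this example.

```python
from collections import Counter

def lang_breakdown(tweets):
    """Return {lang_code: share_of_tweets}; a missing tag is counted as 'und'."""
    counts = Counter(tweet.get("lang", "und") for tweet in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}

# Invented sample data standing in for a search on 'Diolch'.
sample = [
    {"text": "Diolch!", "lang": "und"},
    {"text": "Diolch yn fawr", "lang": "und"},
    {"text": "Diolch yn fawr iawn i bawb", "lang": "und"},
    {"text": "Diolch means thank you in Welsh", "lang": "en"},
]

shares = lang_breakdown(sample)
print(f"unidentified: {shares.get('und', 0):.0%}")
```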

While there’s nothing Social Bearing can do to fix Twitter’s incorrect tagging, an ‘Unidentified’ language option has been added so that visitors can at least see which tweets Twitter could not identify.
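One way such an ‘Unidentified’ option could be wired up is a lookup from language code to display label, sketched below. This is not Social Bearing's actual code: the `LANGUAGE_LABELS` table and the `language_label` function are hypothetical, and only a handful of codes are listed.

```python
# Hypothetical label table; a real tool would list many more codes.
LANGUAGE_LABELS = {
    "en": "English",
    "tl": "Wikang Tagalog",
    "und": "Unidentified",
}

def language_label(code):
    """Map a language code to a display label.

    A missing or empty code is treated as 'und'. A code that is not in
    the table is shown as-is rather than being mislabelled.
    """
    if not code:
        code = "und"
    return LANGUAGE_LABELS.get(code, code)

for code in ("tl", "und", None, "fr"):
    print(repr(code), "->", language_label(code))
```

Showing unknown codes unchanged, rather than folding them into ‘Unidentified’, keeps the filter honest: only tweets Twitter itself could not identify end up under that label.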


