People who are not used to language tagging often hesitate when choosing a tag for a given combination of language, region, script, etc. The results of this hesitation are visible on the Web: people tag Japanese as jp, for instance (the proper subtag for the Japanese language is ja; jp is for the country), or use unregistered subtags because they did not find the valid ones. The purpose of this short text is to explain how to find a subtag registered in the Language Subtag Registry. There are of course many ways to do so, and you are welcome to report better ones.

The first thing to try is probably Richard Ishida's Language Subtag Registry Search. You enter text that appears in the Description or Comments field of the registry, and the corresponding subtags are displayed. For instance, for Japanese, it will correctly report ja (and the script Jpan, which indicates the mix of Han, Hiragana and Katakana).

A more powerful, but probably less user-friendly, method is to use the registry directly. Since its canonical form is better suited to computer programs than to humans (for instance, Unicode characters are reported as XML escapes, like &#xE7; for ç), it may be better to use one of the many unofficial forms, automatically computed from the official one and available for various environments. For instance, you may load the text version and use the Find function of your Web browser (Control-F in Firefox). Say that you are not sure of the proper subtag for the Canadian Aboriginal script: searching for "canadian" that way quickly turns up the subtag Cans.
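The same search can be done by a small program. The following Python sketch assumes the registry's record layout (records separated by lines containing "%%", fields written as "Name: value", continuation lines starting with whitespace). The SAMPLE text is a hand-written excerpt in that layout, not a verbatim copy of the registry, and the function names parse_records and search are made up for this illustration.

```python
import html

# Hand-written excerpt in the registry's layout (an assumption for this
# sketch; load the real registry file instead for actual use).
SAMPLE = """\
Type: language
Subtag: ja
Description: Japanese
Suppress-Script: Jpan
%%
Type: script
Subtag: Cans
Description: Unified Canadian Aboriginal
  Syllabics
%%
Type: region
Subtag: JP
Description: Japan
%%
Type: language
Subtag: pro
Description: Old Proven&#xE7;al (to 1500)
"""


def parse_records(text):
    """Split the text into records; each record maps a field name to a list of values."""
    records, record, last = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: store the record collected so far.
            if record:
                records.append(record)
            record, last = {}, None
        elif line[:1] in (" ", "\t") and last is not None:
            # Continuation line: fold it into the previous field's value.
            record[last][-1] += " " + html.unescape(line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            # html.unescape turns escapes such as &#xE7; into real characters.
            record.setdefault(last, []).append(html.unescape(value.strip()))
    if record:
        records.append(record)
    return records


def search(records, word):
    """Return (type, subtag) pairs whose Description or Comments contain word."""
    word = word.lower()
    hits = []
    for rec in records:
        texts = rec.get("Description", []) + rec.get("Comments", [])
        if any(word in text.lower() for text in texts):
            # Some record types use a "Tag" field instead of "Subtag".
            name = (rec.get("Subtag") or rec.get("Tag") or ["?"])[0]
            hits.append((rec["Type"][0], name))
    return hits


records = parse_records(SAMPLE)
print(search(records, "canadian"))
print(search(records, "japan"))
```

Note that searching for "japan" returns both the language ja and the region JP, which is exactly the confusion described at the beginning of this text: the Type field tells you which one you are looking at.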

Both Richard Ishida's Web service and the above method have a limit: they only use information that is in the registry. If the relationship between a common name and its tag is not in the registry, you will not find it. For instance, if you want to identify "British English", you have to realize that this is done by constructing the tag en-GB (not en-UK) from subtags in the registry. Similarly, if you want to write texts in Alsatian and search for this word in the registry, you won't find anything.
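To illustrate how a tag is built from registered subtags, here is a minimal Python sketch that splits a tag and looks up each part. The three sets are tiny hand-picked excerpts standing in for the registry, check_tag is a hypothetical helper, and only the simple language, script and region positions are handled (no variants, extensions or private use).

```python
# Tiny stand-ins for the registry's language, script and region records
# (an assumption for this sketch; the real registry is far larger).
LANGUAGES = {"en", "fr", "ja", "gsw"}
SCRIPTS = {"Latn", "Jpan", "Cans"}
REGIONS = {"GB", "US", "FR", "JP", "CA"}


def check_tag(tag):
    """Return a list of problems; an empty list means every subtag was found."""
    parts = tag.split("-")
    problems = []
    # The first subtag is the language; it is conventionally lowercase.
    if parts[0].lower() not in LANGUAGES:
        problems.append(f"unknown language subtag: {parts[0]}")
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            # Four letters: a script, conventionally written in title case.
            if part.title() not in SCRIPTS:
                problems.append(f"unknown script subtag: {part}")
        elif len(part) == 2 and part.isalpha():
            # Two letters after the language: a region, conventionally uppercase.
            if part.upper() not in REGIONS:
                problems.append(f"unknown region subtag: {part}")
        else:
            problems.append(f"subtag not handled by this sketch: {part}")
    return problems


for candidate in ("en-GB", "en-UK", "jp", "ja-Jpan-JP"):
    print(candidate, check_tag(candidate))
```

With these excerpts, en-GB passes while en-UK and jp are reported, because UK is not a registered region subtag and jp is not a registered language subtag.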

In such cases you have to use external tools, but please check their results against the registry with the tools mentioned above. A good search tool is Wikipedia. The English-language Wikipedia displays the ISO codes for most of the languages it covers. Since most subtags in the registry are based on ISO standards, this works most of the time. For instance, the article on Alsatian shows the language code gsw (Alemannic), which, even if it is broader than the Alsatian dialect, is a good start.

For languages (but not scripts or dialects), another very useful source is Ethnologue, which is published by SIL International, the registration authority for ISO 639-3. It has a search function that lets you use words that are not in the formal standard. For example, searching for "Alsatian", you'll find http://www.ethnologue.com/show_language.asp?code=gsw, which carries the comment: "Called 'Schwyzerdütsch' in Switzerland, and 'Alsatian' in France". But be careful: Ethnologue displays only 3-letter codes, while the registry uses a 2-letter code whenever one is available. For instance, French is fr, not fra.
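The 3-letter versus 2-letter rule can be sketched as a simple lookup. The table below is a hand-written excerpt containing only a few well-known pairs, and the function name is made up for this illustration; the result should still be checked against the registry.

```python
# Hand-written excerpt: 3-letter codes that also have a 2-letter code.
# (An assumption for this sketch; a complete table would come from ISO 639.)
THREE_TO_TWO = {"fra": "fr", "jpn": "ja", "eng": "en", "deu": "de"}


def registry_language_subtag(code3):
    """Prefer the 2-letter code when one exists, else keep the 3-letter code."""
    code3 = code3.lower()
    return THREE_TO_TWO.get(code3, code3)


print(registry_language_subtag("fra"))
print(registry_language_subtag("gsw"))
```

French comes out as fr, while gsw is returned unchanged because it has no 2-letter equivalent.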

TODO: endonyms and exonyms