Web-LangTag/find-subtags.xml


<page title="Find a subtag for my language / dialect / script">
<p>Very often, people who are not used to <a href="whatare.html">language
tagging</a> hesitate before choosing a tag for a given
combination of language, region, script, etc. You can find the result
of these hesitations on the Web, with people tagging, for instance,
<wikipedia name="Japanese language">Japanese</wikipedia> as <code>jp</code> (the proper
subtag for the Japanese <em>language</em> is <code>ja</code>;
<code>jp</code> is for the country) or using subtags that are not
registered
because they did not find the valid
ones. The purpose of this small text is to explain how to find a
subtag registered in the <em>Language Subtag Registry</em>. There are
many ways to do so, of course, and you are free to report
better ways.</p>
<p>The first thing to try is probably to use <a
href="http://people.w3.org/rishida/utils/subtags/">Richard Ishida's
Language Subtag Registry Search</a>. You can enter text which appears
in the Description or Comments field of the registry and the
corresponding subtags will be displayed. For instance, for Japanese,
it will correctly report <code>ja</code> (and the script
<code>Jpan</code>, which indicates the mix of
<wikipedia name="Kanji">Han</wikipedia>, <wikipedia>Hiragana</wikipedia> and
<wikipedia>Katakana</wikipedia>).</p>
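<p>The kind of search described above can be sketched in a few lines of Python. This is only an illustration, not the code behind Richard Ishida's tool: the sample records are a hand-written excerpt in the registry's record format, and the names <code>parse_records</code> and <code>search</code> are made up for this example. Folded (continued) lines, which the real registry uses, are not handled.</p>

```python
# Illustrative excerpt in the registry's record format; records are separated
# by "%%". A real search would read the full registry file instead.
SAMPLE_REGISTRY = """\
Type: language
Subtag: ja
Description: Japanese
%%
Type: script
Subtag: Jpan
Description: Japanese (alias for Han + Hiragana + Katakana)
%%
Type: region
Subtag: JP
Description: Japan
%%
Type: script
Subtag: Cans
Description: Unified Canadian Aboriginal Syllabics
"""


def parse_records(text):
    """Split registry text into records, each a list of (field, value) pairs."""
    records = []
    for chunk in text.split("%%"):
        fields = []
        for line in chunk.strip().splitlines():
            name, sep, value = line.partition(":")
            if sep:
                fields.append((name.strip(), value.strip()))
        if fields:
            records.append(fields)
    return records


def search(text, query):
    """Return (type, subtag) pairs for the records whose Description or
    Comments field contains the query, ignoring case."""
    needle = query.lower()
    hits = []
    for fields in parse_records(text):
        searchable = [v for k, v in fields if k in ("Description", "Comments")]
        if any(needle in value.lower() for value in searchable):
            record = dict(fields)
            hits.append((record.get("Type"), record.get("Subtag")))
    return hits


print(search(SAMPLE_REGISTRY, "japanese"))
```

<p>With this sample, the search for "japanese" reports the language <code>ja</code> and the script <code>Jpan</code>, but not the region <code>JP</code>, whose description is only "Japan".</p>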
<p>A more powerful, but probably less user-friendly, method is to use
the registry directly. Since its canonical form is better suited to
computer programs than to humans (for instance, non-ASCII
<wikipedia>Unicode</wikipedia> characters are written as XML escapes,
like &amp;#xE7;), it may be better to use one of the <a
href="registries.html/">many unofficial forms</a>, automatically
computed from the official one, and available for
various environments. For instance, you may load the <a
href="registries/language-subtag-registry-utf8">text version</a> and use the Find
function of your Web browser (Control-F in
<wikipedia>Firefox</wikipedia>). Say that you are not sure of the
proper subtag for the <wikipedia name="Canadian Aboriginal Syllabics">Canadian
Aboriginal script</wikipedia>: searching for "canadian" that way quickly turns up
the subtag <code>Cans</code>.</p>
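<p>If you prefer to read the canonical form anyway, the escapes are easy to decode. Here is a minimal sketch in Python, assuming the escapes are ordinary numeric character references like &amp;#xE7;. The sample line is made up for the illustration and the name <code>to_readable</code> is hypothetical. Note that <code>html.unescape</code> also decodes named references, which is more than this task strictly needs.</p>

```python
import html

# Illustrative line in the escaped style of the canonical form (not copied
# from the registry). "\x26" is the ampersand character that starts the escape.
ESCAPED_LINE = "Description: Proven\x26#xE7;al"


def to_readable(line):
    """Replace character references with the characters they stand for."""
    return html.unescape(line)


print(to_readable(ESCAPED_LINE))
```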
<p>Both Richard Ishida's Web service and the above method share a
limitation: they only use information that is in the registry. If the relationship between common
names and the tags is not in the registry, you will not find it. For
instance, if you want to identify "<wikipedia name="British English">British English</wikipedia>", you have to realize
that this is done by constructing the tag <code>en-GB</code> (not <code>en-UK</code>) from
subtags in the registry. Similarly, if you want to
write texts in <wikipedia name="Alsatian language">Alsatian</wikipedia> and search for this word
in the registry, you won't find anything.</p>
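<p>Constructing a tag from registered subtags, as for <code>en-GB</code>, can be sketched as follows. The three sets are tiny hand-picked subsets standing in for the registry, and <code>make_tag</code> is a hypothetical helper, not part of any library. Subtags are compared in their conventional case here, although tags are case-insensitive in practice.</p>

```python
# Tiny illustrative subsets of registered subtags; a real check would load
# the full lists from the registry.
LANGUAGES = {"en", "fr", "ja"}
SCRIPTS = {"Cans", "Jpan", "Latn"}
REGIONS = {"CA", "FR", "GB", "JP"}


def make_tag(language, script=None, region=None):
    """Join subtags in language-script-region order, rejecting any subtag
    that is missing from the sets above."""
    parts = []
    for kind, value, known in (
        ("language", language, LANGUAGES),
        ("script", script, SCRIPTS),
        ("region", region, REGIONS),
    ):
        if value is None:
            continue
        if value not in known:
            raise ValueError(f"{value!r} is not a registered {kind} subtag")
        parts.append(value)
    return "-".join(parts)


print(make_tag("en", region="GB"))  # en-GB
try:
    make_tag("en", region="UK")
except ValueError as error:
    print(error)  # 'UK' is not a registered region subtag
```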
<p>In such cases you have to use external tools, but please check their results
against the registry, with the tools mentioned above. A good search
tool is
<wikipedia>Wikipedia</wikipedia>. The English-language Wikipedia
displays the <wikipedia>ISO</wikipedia> codes for most of the
languages it covers. Since most subtags in the registry are based
on ISO standards, this works most of the time. For instance, the article on Alsatian will
show the language code <code>gsw</code>
(<wikipedia name="Alemannic German">Alemannic</wikipedia>), which, even if it is broader than the Alsatian dialect, is a good start.</p>
<p>For languages (but not scripts or dialects), another very useful
source is <a href="http://www.ethnologue.com/">Ethnologue</a>, which is
published by SIL International, the registration authority for <wikipedia>ISO 639-3</wikipedia>. It
has a <a href="http://www.ethnologue.com/site_search.asp">search function</a> that allows you to use words that are not in the
formal standard. For example, if you
search for "Alsatian", you'll find
<code><a href="http://www.ethnologue.com/show_language.asp?code=gsw">http://www.ethnologue.com/show_language.asp?code=gsw</a></code>, where there is the
comment: "Called 'Schwyzerd&#xFC;tsch' in Switzerland, and 'Alsatian' in France". But be careful:
Ethnologue displays only three-letter codes, while the registry
uses two-letter codes whenever they are available. For instance, <wikipedia name="French language">French</wikipedia> is
<code>fr</code>, not <code>fra</code>.</p>
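<p>The conversion from a three-letter code found in Ethnologue to the subtag used by the registry can be sketched like this. The mapping is a small hand-written subset, and <code>registry_subtag</code> is a hypothetical name. Falling back to the three-letter code assumes that it is registered as such, so the result should still be checked against the registry.</p>

```python
# Illustrative subset: three-letter codes that also have a two-letter code,
# which is the form the registry uses.
THREE_TO_TWO = {"deu": "de", "eng": "en", "fra": "fr", "jpn": "ja"}


def registry_subtag(three_letter_code):
    """Prefer the two-letter code when one is known; otherwise keep the
    three-letter code unchanged."""
    code = three_letter_code.lower()
    return THREE_TO_TWO.get(code, code)


print(registry_subtag("fra"))  # fr
print(registry_subtag("gsw"))  # gsw
```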
<p>TODO: endonyms and exonyms</p>
</page>