<page title="Find a subtag for my language / dialect / script">
<p>Very often, people who are not used to <a href="whatare.html">language
tagging</a> hesitate before choosing a tag for a given
combination of language, region, script, etc. You can find the result
of these hesitations on the Web, with people tagging, for instance,
<wikipedia name="Japanese language">Japanese</wikipedia> as <code>jp</code> (the proper
subtag for the Japanese <em>language</em> is <code>ja</code>;
<code>JP</code> is the region subtag for the country) or using subtags that are not
registered because they did not find the valid
ones. The purpose of this short text is to explain how to find a
subtag registered in the <em>Language Subtag Registry</em>. There are
many ways to do so, of course, and you are free to report
better ones.</p>
<p>The first thing to try is probably <a
href="http://people.w3.org/rishida/utils/subtags/">Richard Ishida's
Language Subtag Registry Search</a>. You can enter text that appears
in the Description or Comments field of the registry and the
corresponding subtags will be displayed. For instance, for Japanese,
it will correctly report <code>ja</code> (and the script
<code>Jpan</code>, which indicates the mix of
<wikipedia name="Kanji">Han</wikipedia>, <wikipedia>Hiragana</wikipedia> and
<wikipedia>Katakana</wikipedia>).</p>
<p>A more powerful, but probably less user-friendly, method is to use
the registry directly. Since its canonical form is better suited to
computer programs than to humans (for instance,
<wikipedia>Unicode</wikipedia> characters are written as XML escapes,
like &amp;#xE7;), it may be better to use one of the <a
href="registries.html/">many unofficial forms</a>, automatically
computed from the official one and available for
various environments. For instance, you may load the <a
href="registries/language-subtag-registry-utf8">text version</a> and use the Find
function of your Web browser (Control-F in
<wikipedia>Firefox</wikipedia>). Suppose you are not sure of the
proper subtag for the <wikipedia name="Canadian Aboriginal Syllabics">Canadian
Aboriginal script</wikipedia>: searching for "canadian" this way quickly turns up
the subtag <code>Cans</code>.</p>
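<p>The same search can be scripted. The following is a minimal sketch,
not an official tool: it assumes the registry's plain-text record format
(records separated by <code>%%</code> lines, one <code>Name: value</code> field per
line) and runs on a hand-abridged three-record sample rather than the
real file. The names <code>parse_records</code> and <code>find_subtags</code> are
made up for this example, and folded (continuation) lines are not handled.</p>

```python
# Hand-abridged sample in the registry's record format; the real file
# holds thousands of records with more fields (Added, Comments, ...).
SAMPLE = """\
Type: language
Subtag: ja
Description: Japanese
%%
Type: script
Subtag: Cans
Description: Unified Canadian Aboriginal Syllabics
%%
Type: region
Subtag: JP
Description: Japan
"""


def parse_records(text):
    """Split registry text into a list of {field name: [values]} dicts."""
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            # A field line looks like "Name: value"; skip anything else.
            name, sep, value = line.partition(":")
            if sep:
                record.setdefault(name.strip(), []).append(value.strip())
        if record:
            records.append(record)
    return records


def find_subtags(records, word):
    """Return (type, subtag) pairs whose Description contains word."""
    needle = word.lower()
    hits = []
    for record in records:
        descriptions = record.get("Description", [])
        if any(needle in text.lower() for text in descriptions):
            # Some records of the real registry carry "Tag" instead of "Subtag".
            names = record.get("Subtag") or record.get("Tag") or [""]
            hits.append((record.get("Type", [""])[0], names[0]))
    return hits


records = parse_records(SAMPLE)
print(find_subtags(records, "canadian"))  # [('script', 'Cans')]
```

<p>To search the real registry, read the downloaded text version into a
string and pass it to <code>parse_records</code> instead of the sample.</p>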
<p>Both Richard Ishida's Web service and the method above share a
limitation: they only use information that is in the registry. If the relationship between a common
name and its tag is not in the registry, you will not find it. For
instance, if you want to identify "<wikipedia name="British English">British English</wikipedia>", you have to know
that this is done by constructing the tag <code>en-GB</code> (not <code>en-UK</code>) from
subtags in the registry. Similarly, if you want to
write texts in <wikipedia name="Alsatian language">Alsatian</wikipedia> and search for this word
in the registry, you won't find anything.</p>
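<p>Constructing a tag from subtags can be sketched as follows. This is
an illustration under stated assumptions, not a validator: the
<code>REGISTERED</code> table is a tiny hand-picked sample standing in for the
registry, the helper <code>make_tag</code> is hypothetical, and only the
language, script and region positions are covered.</p>

```python
# Tiny hand-picked sample; a real check would load the whole registry.
REGISTERED = {
    "language": {"en", "fr", "ja"},
    "script": {"Jpan", "Cans", "Latn"},
    "region": {"GB", "FR", "JP"},
}


def make_tag(language, script=None, region=None):
    """Join subtags in language-script-region order after checking them."""
    # Conventional case: language lower, script Title, region UPPER.
    parts = [("language", language.lower())]
    if script:
        parts.append(("script", script.title()))
    if region:
        parts.append(("region", region.upper()))
    for kind, subtag in parts:
        if subtag not in REGISTERED[kind]:
            raise ValueError(f"{subtag!r} is not a registered {kind} subtag")
    return "-".join(subtag for kind, subtag in parts)


print(make_tag("en", region="gb"))  # en-GB
try:
    make_tag("en", region="UK")
except ValueError as error:
    print(error)  # 'UK' is not a registered region subtag
```

<p>Rejecting <code>UK</code> here mirrors the point above: a tag is only
valid if each of its subtags is in the registry.</p>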
<p>In such cases you have to use external tools, but please check their results
against the registry, with the tools mentioned above. A good search
tool is
<wikipedia>Wikipedia</wikipedia>. The English-language Wikipedia
displays the <wikipedia>ISO</wikipedia> codes for most of the
languages it covers. Since most subtags in the registry are based
on ISO standards, this works most of the time. For instance, the article on Alsatian will
show the language code <code>gsw</code>
(<wikipedia name="Alemannic German">Alemannic</wikipedia>), which, even if it is broader than the Alsatian dialect, is a good start.</p>
<p>For languages (but not scripts or dialects), another very useful
source is <a href="http://www.ethnologue.com/">Ethnologue</a>, which is
published by the <wikipedia>ISO 639-3</wikipedia> registration authority. It
has a <a href="http://www.ethnologue.com/site_search.asp">search function</a> that allows you to use words that are not in the
formal standard. For example,
searching for "Alsatian", you'll find
<code><a href="http://www.ethnologue.com/show_language.asp?code=gsw">http://www.ethnologue.com/show_language.asp?code=gsw</a></code>, which carries the
comment: "Called 'Schwyzerdütsch' in Switzerland, and 'Alsatian' in France". But be careful:
Ethnologue displays only 3-letter codes, while the registry
uses the 2-letter code whenever one exists. For instance, <wikipedia name="French language">French</wikipedia> is
<code>fr</code>, not <code>fra</code>.</p>
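<p>The 3-letter versus 2-letter rule can be sketched as a simple lookup.
The table below is a hand-picked sample of four pairs, assumed here for
illustration only, and <code>registry_subtag</code> is a hypothetical helper;
the result should still be checked against the registry.</p>

```python
# Hand-picked sample of 3-letter codes that have a 2-letter equivalent.
TWO_LETTER = {"eng": "en", "fra": "fr", "jpn": "ja", "deu": "de"}


def registry_subtag(code):
    """Prefer the 2-letter code when one exists, else keep the 3-letter one."""
    code = code.lower()
    return TWO_LETTER.get(code, code)


print(registry_subtag("fra"))  # fr
print(registry_subtag("gsw"))  # gsw (no 2-letter code exists)
```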
<p>TODO: endonyms and exonyms</p>
</page>