<page title="Tag wisely">
<p>(Most of the content on this page comes directly from
<wikipedia name="Request for Comments">RFC</wikipedia> 5646.)</p>
<p>For the same body of text, you may have several possible
tags. Interoperability is best served when all users use the same
language tag for the same language. The rules here are intended to
help in that respect.</p>
<p>Subtags should only be used where they add useful distinguishing
information; extraneous subtags interfere with the meaning,
understanding, and processing of language tags. In particular, the
<code>Suppress-Script</code> fields in the registry should be honored: for
instance, <code>fr</code> (<wikipedia name="French language">French</wikipedia>) has a
<code>Suppress-Script: Latn</code> because the overwhelming majority
of French texts are in the <wikipedia>Latin script</wikipedia>. Therefore, tagging text in French as
<code>fr-Latn</code> is useless and confusing. A simple
<code>fr</code> is enough. In the unlikely event that you encounter French
text in the <wikipedia>Arabic script</wikipedia>, you can add a subtag for the script:
<code>fr-Arab</code>. (This is especially important because the former
standard, RFC 3066, had no subtags for scripts, so
older applications may have trouble handling them.)</p>
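<p>The <code>Suppress-Script</code> rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the <code>SUPPRESS_SCRIPT</code> table is a small hand-copied excerpt standing in for the real registry, the function name <code>drop_suppressed_script</code> is hypothetical, and the code assumes that the script subtag, if present, directly follows the language subtag (it ignores extended language subtags).</p>

```python
# Hand-copied excerpt standing in for the registry (an assumption of this sketch):
# maps a language subtag to the value of its Suppress-Script field.
SUPPRESS_SCRIPT = {"fr": "Latn", "de": "Latn", "he": "Hebr"}


def drop_suppressed_script(tag):
    """Return tag without a script subtag that only repeats Suppress-Script."""
    subtags = tag.split("-")
    if len(subtags) == 1:
        return tag  # nothing follows the language subtag
    language = subtags[0].lower()
    candidate = subtags[1]
    # Script subtags are exactly four letters long.
    is_script = len(candidate) == 4 and candidate.isalpha()
    if is_script and SUPPRESS_SCRIPT.get(language) == candidate.title():
        # The script adds no information: keep the language and what follows.
        return "-".join([subtags[0]] + subtags[2:])
    return tag
```

<p>With this table, <code>fr-Latn</code> is reduced to <code>fr</code>, whereas <code>fr-Arab</code> is left untouched because the script is not the suppressed one.</p>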
<p>Use as precise a tag as possible, but no more specific than is
justified. Avoid using subtags that are not important for
distinguishing content in an application. For example, <code>de</code>
might suffice for tagging an email written in
<wikipedia name="German language">German</wikipedia>, whereas <code>de-CH-1996</code>, although
legal, is probably unnecessarily precise for such a task.</p>
<p>But do not be too vague: the primary language subtag might not be
sufficient to give all the information necessary to understand the
text. For
example, the tag <code>az</code> (for
<wikipedia name="Azerbaijani language">Azerbaijani</wikipedia>) is probably insufficient in the
absence of context, because this language has no dominant script. A person fluent in
one script might not be able to read the other, even though the text
might be identical. Content tagged as <code>az</code> is most probably written
in just one script and thus might not be intelligible to a reader
familiar with the other script. A more specific tag such as <code>az-Latn</code>,
<code>az-Cyrl</code> or <code>az-Arab</code> is probably necessary.</p>
<p>If a tag or subtag has a <code>Preferred-Value</code> field in its registry
entry, then the value of that field should be used to form the
language tag. For example, use <code>he</code> for <wikipedia
name="Hebrew language">Hebrew</wikipedia> in preference to
<code>iw</code>.</p>
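<p>As a minimal sketch of the <code>Preferred-Value</code> rule, the Python function below replaces a deprecated primary language subtag with its preferred value. The <code>PREFERRED_VALUE</code> table is a small hand-copied excerpt standing in for the registry, and the function name <code>use_preferred_language</code> is hypothetical. The sketch handles only the primary language subtag, although the registry also records preferred values for other kinds of subtags and for whole tags.</p>

```python
# Hand-copied excerpt standing in for the registry (an assumption of this sketch):
# maps a deprecated language subtag to the value of its Preferred-Value field.
PREFERRED_VALUE = {"iw": "he", "in": "id", "ji": "yi"}


def use_preferred_language(tag):
    """Replace a deprecated primary language subtag with its Preferred-Value."""
    subtags = tag.split("-")
    primary = subtags[0].lower()
    # Fall back to the original subtag when the table has no preferred value.
    subtags[0] = PREFERRED_VALUE.get(primary, subtags[0])
    return "-".join(subtags)
```

<p>Here <code>iw</code> becomes <code>he</code> and <code>iw-IL</code> becomes <code>he-IL</code>, while a tag such as <code>fr</code> is returned unchanged.</p>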
<p>Validity of a tag is not everything. A tag may be both valid and
meaningless. This is unavoidable with a generative system like the
language subtag mechanism. For example, <code>ar-Cyrl-AQ</code>
(<wikipedia>Arabic</wikipedia> written in the <wikipedia name="Cyrillic alphabet">Cyrillic
script</wikipedia>, as used in <wikipedia>Antarctica</wikipedia>) is
perfectly valid but should nevertheless be avoided because it has no
relationship to reality (not a single document has
these characteristics).</p>
</page>