forked from bortzmeyer/Web-LangTag
<page title="Tag wisely">
<p>(Most of the content on this page comes directly from
<wikipedia name="Request for Comments">RFC</wikipedia> 5646.)</p>

<p>For the same body of text, you may have several possible
tags. Interoperability is best served when all users use the same
language tag for the same language. The rules here are intended to
help in that respect.</p>

<p>Subtags should only be used where they add useful distinguishing
information; extraneous subtags interfere with the meaning,
understanding, and processing of language tags. In particular, the
<code>Suppress-Script</code> fields in the registry should be obeyed: for
instance, <code>fr</code> (<wikipedia name="French language">French</wikipedia>) has a
<code>Suppress-Script: Latn</code> field because the overwhelming majority
of French texts are in the <wikipedia>Latin script</wikipedia>. Therefore, tagging text in French as
<code>fr-Latn</code> is useless and confusing. A simple
<code>fr</code> is enough. In the unlikely case that you meet French
texts in the <wikipedia>Arabic script</wikipedia>, you can add a subtag for the script:
<code>fr-Arab</code>. (This is especially important since the earlier
standard, RFC 3066, did not have subtags for scripts, and therefore
old applications will have trouble handling them.)</p>
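
<p>The <code>Suppress-Script</code> rule can be sketched in a few lines of
Python. This is only an illustration: the function name
<code>drop_suppressed_script</code> and the one-entry table are assumptions made
for the example, and a real tool would read the IANA Language Subtag Registry
rather than hard-code it. The sketch also assumes that the script subtag, if
present, directly follows the language subtag.</p>

```python
# Hand-copied excerpt for illustration only; a real tool would parse the
# IANA Language Subtag Registry instead of hard-coding entries.
SUPPRESS_SCRIPT = {"fr": "Latn"}


def drop_suppressed_script(tag):
    """Remove the script subtag when the registry says to suppress it."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # Simplification: a script subtag, if present, is the second subtag
    # and consists of exactly four letters.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if SUPPRESS_SCRIPT.get(language, "").lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)


if __name__ == "__main__":
    for tag in ("fr-Latn", "fr-Arab", "fr-Latn-CA"):
        print(tag, "becomes", drop_suppressed_script(tag))
```

<p>With this table, <code>fr-Latn</code> is reduced to <code>fr</code>, while
<code>fr-Arab</code> is left alone because the Arabic script is not the
suppressed one.</p>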

<p>Use as precise a tag as possible, but no more specific than is
justified. Avoid using subtags that are not important for
distinguishing content in an application. For example, <code>de</code>
might suffice for tagging an email written in
<wikipedia name="German language">German</wikipedia>, whereas <code>de-CH-1996</code>, although
legal, is probably unnecessarily precise for such a task.</p>

<p>But do not be too vague: the primary language subtag might not be
sufficient to give all the information necessary to understand the
text. For
example, the tag <code>az</code> (for
<wikipedia name="Azerbaijani language">Azerbaijani</wikipedia>) is probably insufficient in the
absence of context, because this language has no dominant script. A person fluent in
one script might not be able to read the other, even though the text
might be identical. Content tagged as <code>az</code> is most probably written
in just one script and thus might not be intelligible to a reader
familiar only with another script. <code>az-Latn</code>,
<code>az-Cyrl</code> or <code>az-Arab</code> is probably necessary.</p>
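
<p>A tagging tool could warn about such under-specified tags. The sketch below
is a rough illustration: the set <code>NO_DOMINANT_SCRIPT</code> and the function
<code>lacks_needed_script</code> are hypothetical names, and the set would have
to be maintained by hand, since this information is a matter of judgment and
not something the sketch reads from the registry.</p>

```python
# Hypothetical, hand-maintained set of languages with no dominant script.
# Only "az" comes from the text above; extend it with care.
NO_DOMINANT_SCRIPT = {"az"}


def lacks_needed_script(tag):
    """Return True if the tag names such a language without a script subtag."""
    subtags = tag.split("-")
    if subtags[0].lower() not in NO_DOMINANT_SCRIPT:
        return False
    # Simplification: a script subtag, if present, is the second subtag
    # and consists of exactly four letters.
    has_script = (
        len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha()
    )
    return not has_script


if __name__ == "__main__":
    for tag in ("az", "az-Latn", "az-AZ", "fr"):
        print(tag, "needs a script subtag:", lacks_needed_script(tag))
```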

<p>If a tag or subtag has a <code>Preferred-Value</code> field in its registry
entry, then the value of that field should be used to form the
language tag. For example, use <code>he</code> for <wikipedia
name="Hebrew language">Hebrew</wikipedia> in preference to
<code>iw</code>.</p>
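
<p>Applying <code>Preferred-Value</code> to the primary language subtag might
look like the following sketch. The function name
<code>apply_preferred_value</code> and the one-entry table are assumptions for
the example; the registry also carries <code>Preferred-Value</code> fields on
other kinds of subtags and on whole tags, which this sketch does not handle.</p>

```python
# Hand-copied excerpt for illustration only; a real tool would parse the
# IANA Language Subtag Registry instead of hard-coding entries.
PREFERRED_VALUE = {"iw": "he"}


def apply_preferred_value(tag):
    """Replace a deprecated primary language subtag by its preferred value."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # Subtags without an entry in the table are kept unchanged.
    subtags[0] = PREFERRED_VALUE.get(language, subtags[0])
    return "-".join(subtags)


if __name__ == "__main__":
    for tag in ("iw", "iw-IL", "fr"):
        print(tag, "becomes", apply_preferred_value(tag))
```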

<p>Validity of a tag is not everything. A tag may be both valid and
meaningless. This is unavoidable with a generative system like the
language subtag mechanism. So, <code>ar-Cyrl-AQ</code>
(<wikipedia>Arabic</wikipedia> written with the <wikipedia name="Cyrillic alphabet">Cyrillic
script</wikipedia>, as used in <wikipedia>Antarctica</wikipedia>) is
perfectly valid but should nevertheless be avoided because it bears no
relation to reality (there is not a single document with
these characteristics).</p>
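
<p>To see why a validator cannot catch this, consider the simplified check
below. It is a sketch under stated assumptions: the three sets are tiny
hand-copied excerpts standing in for the registry, <code>is_valid</code> is a
hypothetical name, and only the language-script-region shape is handled
(no extended language, variant, extension or private-use subtags). Each subtag
is looked up on its own, so nothing ever asks whether the combination makes
sense.</p>

```python
# Tiny hand-copied excerpts standing in for the real registry.
LANGUAGES = {"ar", "fr"}
SCRIPTS = {"Arab", "Cyrl", "Latn"}
REGIONS = {"AQ", "FR"}


def is_valid(tag):
    """Check a language[-script][-region] tag, one subtag at a time."""
    parts = tag.split("-")
    language, rest = parts[0], parts[1:]
    if language.lower() not in LANGUAGES:
        return False
    # Optional script subtag: four characters, looked up in title case.
    if rest and len(rest[0]) == 4:
        if rest[0].title() not in SCRIPTS:
            return False
        rest = rest[1:]
    # Optional region subtag: two characters, looked up in upper case.
    if rest and len(rest[0]) == 2:
        if rest[0].upper() not in REGIONS:
            return False
        rest = rest[1:]
    # Anything left over is outside the shape this sketch understands.
    return not rest


if __name__ == "__main__":
    print("ar-Cyrl-AQ valid:", is_valid("ar-Cyrl-AQ"))
```

<p>The check accepts <code>ar-Cyrl-AQ</code> because every subtag is registered
and in the right position; judging whether the tag is meaningful remains the
tagger's job.</p>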
</page>