(Most of the content on this page comes directly from
For the same body of text, you may have several possible tags. Interoperability is best served when all users use the same language tag for the same language. The rules here are intended to help in that respect.
Subtags should only be used where they add useful distinguishing
information; extraneous subtags interfere with the meaning,
understanding, and processing of language tags. In particular, the
Suppress-Script fields in the registry should be obeyed: for
instance, fr has Suppress-Script: Latn (because the overwhelming
majority of French texts are in the Latin script), so fr-Latn is
useless and confusing. A simple fr is enough. In the unlikely case
that you meet French texts in the Arabic script, tag them fr-Arab.
(This is especially important since the former standard, RFC 3066,
did not have subtags for scripts, and therefore old applications
will have trouble handling them.)
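The Suppress-Script rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SUPPRESS_SCRIPT table is a small hand-copied excerpt rather than the full registry, the function name drop_suppressed_script is invented for this example, and the parsing assumes the script subtag, if present, comes directly after the primary language subtag.

```python
# Hand-copied excerpt of Suppress-Script fields (assumption: a real
# application would load the complete IANA registry instead).
SUPPRESS_SCRIPT = {"fr": "Latn", "en": "Latn", "de": "Latn", "ru": "Cyrl"}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag when it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # Simplified model: a script subtag is four letters and sits in
    # second position (extended language subtags are not handled).
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if SUPPRESS_SCRIPT.get(language) == subtags[1].title():
            del subtags[1]
    return "-".join(subtags)


print(drop_suppressed_script("fr-Latn"))  # the redundant script is dropped
print(drop_suppressed_script("fr-Arab"))  # an unusual script is kept
```

A script that differs from the Suppress-Script value is left alone, since in that case it carries real information.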
Use as precise a tag as possible, but no more specific than is
justified. Avoid using subtags that are not important for
distinguishing content in an application. For example, de might
suffice for tagging an email written in German; de-CH-1996, while
legal, is probably unnecessarily precise for such a task.
But do not be too vague: the primary language subtag might not be
sufficient to give all the information necessary to understand the
text. For example, the tag az (for Azerbaijani) is too vague on its
own: a text tagged az most probably is written in just one script
and thus might not be intelligible to a reader familiar with a
different script. az-Latn, az-Cyrl or az-Arab are probably necessary.
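A check for tags that are too vague might look like the following sketch. The MULTI_SCRIPT table and the function name is_too_vague are assumptions made for this example; the table lists only az with the three scripts mentioned above, and a real application would have to decide for itself which languages need a script subtag.

```python
# Illustrative table (assumption): languages for which a bare primary
# subtag does not tell the reader which script the text uses.
MULTI_SCRIPT = {"az": ("Latn", "Cyrl", "Arab")}


def is_too_vague(tag: str) -> bool:
    """Return True if the tag names a multi-script language without a script."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    if language not in MULTI_SCRIPT:
        return False
    # Simplified model: a script subtag is four letters in second position.
    has_script = (
        len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha()
    )
    return not has_script


for candidate in ("az", "az-AZ", "az-Latn"):
    print(candidate, is_too_vague(candidate))
```

Note that a region subtag alone, as in az-AZ, does not resolve the ambiguity in this model; only a script subtag does.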
If a tag or subtag has a Preferred-Value field in its registry
entry, then the value of that field should be used to form the
language tag. For example, use he rather than the deprecated iw
for Hebrew.
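As a minimal sketch of this rule, the snippet below replaces a deprecated primary language subtag with its Preferred-Value. The PREFERRED_VALUE table is a short hand-copied excerpt, not the full registry, and the function name apply_preferred_value is invented here; the sketch also ignores Preferred-Value fields on other kinds of subtags and on whole tags.

```python
# Hand-copied excerpt of deprecated primary language subtags and their
# Preferred-Value (assumption: a real tool would read the full registry).
PREFERRED_VALUE = {"iw": "he", "in": "id", "ji": "yi"}


def apply_preferred_value(tag: str) -> str:
    """Replace a deprecated primary language subtag with its preferred form."""
    subtags = tag.split("-")
    primary = subtags[0].lower()  # primary subtags are conventionally lowercase
    subtags[0] = PREFERRED_VALUE.get(primary, primary)
    return "-".join(subtags)


print(apply_preferred_value("iw"))     # he
print(apply_preferred_value("iw-IL"))  # he-IL
```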
Validity of a tag is not everything. A tag may be both valid and
meaningless. This is unavoidable with a generative system like the
language subtag mechanism. So, ar-Cyrl-AQ
(