Here is a regular expression for parsing language tags in their newer form, following the syntax defined in RFC 5646. It was written by Addison Phillips (addison - at - amazon.com) for the Java programming language.

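     static final String langtag_ex =
     // Alternative 1: a tag that consists only of private-use subtags ("x" plus 1-8 character subtags)
     "(\\A[xX]([\\x2d]\\p{Alnum}{1,8})*\\z)"
       // Alternative 2: primary language subtag, 2-8 letters
       + "|(((\\A\\p{Alpha}{2,8}(?=\\x2d|\\z)){1}"
       // up to three extended language subtags, 3 letters each
       + "(([\\x2d]\\p{Alpha}{3})(?=\\x2d|\\z)){0,3}"
       // optional script subtag, 4 letters
       + "([\\x2d]\\p{Alpha}{4}(?=\\x2d|\\z))?"
       // optional region subtag, 2 letters or 3 digits
       + "([\\x2d](\\p{Alpha}{2}|\\d{3})(?=\\x2d|\\z))?"
       // any number of variant subtags: a digit plus 3 characters, or 5-8 characters
       + "([\\x2d](\\d\\p{Alnum}{3}|\\p{Alnum}{5,8})(?=\\x2d|\\z))*)"
       // any number of extensions: a single letter other than x, then its alphanumeric subtags
       + "(([\\x2d]([a-wyzA-WYZ](?=\\x2d))([\\x2d](\\p{Alnum}{2,8})+)*))*"
       // optional trailing private-use sequence introduced by "x"
       + "([\\x2d][xX]([\\x2d]\\p{Alnum}{1,8})*)?)\\z";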
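
As a rough usage sketch, the expression above could be compiled once and applied to candidate tags as shown here. The class name `LangTagCheck`, the `LANGTAG` constant, and the `isWellFormed` helper are illustrative names, not part of the original. The check is purely syntactic: it does not look any subtag up in the IANA registry.

```java
import java.util.regex.Pattern;

class LangTagCheck {
    // The expression from above, copied unchanged.
    static final String langtag_ex =
        "(\\A[xX]([\\x2d]\\p{Alnum}{1,8})*\\z)"
          + "|(((\\A\\p{Alpha}{2,8}(?=\\x2d|\\z)){1}"
          + "(([\\x2d]\\p{Alpha}{3})(?=\\x2d|\\z)){0,3}"
          + "([\\x2d]\\p{Alpha}{4}(?=\\x2d|\\z))?"
          + "([\\x2d](\\p{Alpha}{2}|\\d{3})(?=\\x2d|\\z))?"
          + "([\\x2d](\\d\\p{Alnum}{3}|\\p{Alnum}{5,8})(?=\\x2d|\\z))*)"
          + "(([\\x2d]([a-wyzA-WYZ](?=\\x2d))([\\x2d](\\p{Alnum}{2,8})+)*))*"
          + "([\\x2d][xX]([\\x2d]\\p{Alnum}{1,8})*)?)\\z";

    // Hypothetical helper constant: compile once and reuse, since Pattern is immutable.
    static final Pattern LANGTAG = Pattern.compile(langtag_ex);

    // Hypothetical helper: true if the whole string matches the expression.
    static boolean isWellFormed(String tag) {
        return tag != null && LANGTAG.matcher(tag).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "en", "en-US", "zh-Hant-TW", "de-1996", "x-private", // accepted by the expression
            "en_US", "en-"                                       // rejected by the expression
        };
        for (String tag : samples) {
            System.out.println(tag + " -> " + isWellFormed(tag));
        }
    }
}
```

The expression already carries its own `\A` and `\z` anchors, so `matches()` and `find()` give the same answer here. `matches()` is used only to make the whole-string intent explicit.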