Network Working Group                                      D. Ewell, Ed.
Internet-Draft                                                Consultant
Intended status: Informational                          February 2, 2008
Expires: August 5, 2008


                 Update to the Language Subtag Registry
                       draft-ietf-ltru-4645bis-04

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 5, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Ewell                    Expires August 5, 2008                 [Page 1]

Internet-Draft   Update to the Language Subtag Registry    February 2008

Abstract

   This memo defines the procedure used to update the IANA Language
   Subtag Registry in conjunction with the publication of RFC 4646bis
   [RFC EDITOR NOTE: replace with actual RFC number], for use in forming
   tags for identifying languages.  As an Internet-Draft, it also
   contained a complete replacement of the contents of the Registry to
   be used by IANA in updating it.  To prevent confusion, this material
   was removed before publication.

Table of Contents

   1.  Introduction
   2.  Updating the Registry
     2.1.  Starting Point
     2.2.  New Language Subtags
     2.3.  Modified Language Subtags
     2.4.  New Region Subtags
     2.5.  Grandfathered and Redundant Tags
     2.6.  Additional Changes
   3.  Updated Registry Contents
   4.  Security Considerations
   5.  IANA Considerations
   6.  Changes
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Acknowledgements
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   [RFC4646] provided for a Language Subtag Registry and described its
   format.  The initial contents of the Registry and rules for
   determining them were specified in [RFC4645].

   [draft-ietf-ltru-4646bis-11] expands on [RFC4646] by adding support
   for approximately 7,200 language subtags based on [ISO639-3] alpha-3
   code elements, and seven region subtags based on [ISO3166-1]
   exceptionally reserved code elements.  This memo describes the
   process of updating the Registry to include these additional subtags,
   and to make secondary changes to the Registry that result from adding
   the new subtags.

   In its initial phase as an Internet-Draft, this memo also contained a
   complete replacement of the contents of the Language Subtag Registry
   to be used by the Internet Assigned Numbers Authority (IANA) in
   updating it.
   This content was deleted from this memo prior to publication as an
   RFC.

   The format of the Language Subtag Registry, and the definition and
   intended purpose of each of the fields, are described in
   [draft-ietf-ltru-4646bis-11].  The Registry is expected to change
   over time, as new subtags are registered and existing subtags are
   modified or deprecated.  The process of updating the Registry is
   described in Section 3 of [draft-ietf-ltru-4646bis-11].  In its
   Internet-Draft phase, this memo did not define the permanent contents
   of the Registry and should not be represented as having done so.

   Many of the subtags defined in the Language Subtag Registry are based
   on code elements defined in [ISO639-1], [ISO639-2], [ISO639-3],
   [ISO3166-1], [ISO15924], and [UN_M.49].  The Registry is not a mirror
   of the code lists defined by these standards and should not be used
   as one.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Updating the Registry

   This section describes the process for determining the updated
   contents of the Language Subtag Registry.

2.1.  Starting Point

   The version of the Language Subtag Registry that was current at the
   time of IESG approval of this memo served as the starting point for
   this update.  This version was created according to the process
   described in [RFC4645] and maintained according to the process
   described in [RFC4646].

   The source data for [ISO639-3] used for this update consisted of
   three files, available from the official site of the ISO 639-3
   Registration Authority.  [RFC EDITOR NOTE: these files may be updated
   before approval of this memo.]

   o  [iso-639-3_20080114] is a list of all language code elements in
      [ISO639-3], including the alpha-3 code element and reference name
      for each code element.  For example, the entry for the Dari
      language contained the code element "prs" and the name "Dari"
      (among other information).

   o  [iso-639-3_Name_Index_20080114] is a list containing all names
      associated with each language according to [ISO639-3], including
      both inverted and non-inverted forms where appropriate.  A code
      element may have more than one entry in this file; the reference
      name and its inverted form are usually, but not always, given in
      the first entry.  For example, this file contained an entry for
      the code element "prs" with the name "Dari" (twice) and another
      entry with the names "Eastern Farsi" and "Farsi, Eastern".

   o  [iso-639-3-macrolanguages_20080116] is a list of all alpha-3 code
      elements for languages that are encompassed by a macrolanguage in
      [ISO639-3], together with the alpha-3 code element for the
      macrolanguage.  For example, a line containing the code elements
      "fas" and "prs" indicated that the macrolanguage "Persian"
      encompasses the individual language "Dari".  (Note that these
      alpha-3 code elements may not have corresponded directly to
      subtags in the Registry, which uses 2-letter subtags derived from
      [ISO639-1] when possible.)

   Language code elements that were already retired in [ISO639-3] prior
   to IESG approval of this memo were not listed in these files, and
   consequently were not considered in this update.

   The values of the File-Date field, the Added date for each new subtag
   record, and the Deprecated date for each existing grandfathered or
   redundant tag deprecated by this update are set to a date as near as
   practical to the date of IESG approval of this memo.  [RFC EDITOR
   NOTE: these dates are initially set to 2029-09-09 for easy
   recognition, and MUST be updated during AUTH48.]

2.2.  New Language Subtags

   For each language in [ISO639-3] that was not already represented by a
   language subtag in the Language Subtag Registry, a new subtag was
   added to the Registry, using the [ISO639-3] code element as the value
   for the Subtag field and each of the non-inverted [ISO639-3] names as
   a separate Description field.  The [ISO639-3] reference name was
   represented by the first Description field.

   If the language was encompassed by a macrolanguage, as determined by
   [iso-639-3-macrolanguages_20080116], a Macrolanguage field was added
   for the encompassed language, with a value equal to the subtag of the
   macrolanguage.

   All subtags were added to the Registry maintaining alphabetical order
   within each type of subtag: all 2-letter "language" subtags first,
   then all 3-letter "language" subtags.  Some existing records were
   moved to ensure this order.

2.3.  Modified Language Subtags

   For each language in [ISO639-3] that was already represented by a
   language subtag in the Language Subtag Registry, Description fields
   were added as necessary to reflect all non-inverted names listed for
   that language in [iso-639-3_Name_Index_20080114].  Any existing
   Description fields reflecting inverted names were removed.

   The order of Description fields was adjusted to ensure that the
   reference name from [ISO639-3] was listed first, followed by other
   names from [ISO639-3] in the order presented by that standard,
   followed by any other names already existing in the Registry.  In
   some cases this resulted in a reordering of Description fields for
   existing entries, even when no new values were added.

   For each language that was encompassed by a macrolanguage in
   [ISO639-3], a Macrolanguage field was added, with a value equal to
   the subtag of the macrolanguage.

2.4.  New Region Subtags

   [draft-ietf-ltru-4646bis-11] expands the scope of region subtags by
   adding subtags based on code elements defined as "exceptionally
   reserved" in [ISO3166-1].  These code elements are reserved by the
   ISO 3166 Maintenance Agency "at the request of national ISO member
   bodies, governments and international organizations."

   At the time of IESG approval of this memo, ISO 3166/MA had defined
   nine exceptionally reserved code elements, all of which were added to
   the Language Subtag Registry except for the following:

   o  "FX" (Metropolitan France) was already present in the Language
      Subtag Registry because it was an assigned [ISO3166-1] code
      element from 1993 to 1997.  This subtag is deprecated with a
      Preferred-Value of "FR"; for stability reasons, Section 3.4, item
      1 of [draft-ietf-ltru-4646bis-11] states that this "deprecated"
      status cannot be changed.

   o  "UK" (United Kingdom) was not added because it is associated with
      the same UN M.49 code (826) as the existing region subtag "GB".
      Section 3.4, item 14 (D) of [draft-ietf-ltru-4646bis-11] states
      that a new region subtag is not added if it carries the same
      meaning as an existing region subtag.

2.5.  Grandfathered and Redundant Tags

   As stated in [draft-ietf-ltru-4646bis-11], "grandfathered" and
   "redundant" tags are complete tags in the Language Subtag Registry
   that were registered under [RFC1766] or [RFC3066] and remain valid.
   Grandfathered tags cannot be generated from a valid combination of
   subtags, while redundant tags can be.

   Under certain conditions, registration of a subtag under
   [draft-ietf-ltru-4646bis-11] may cause a grandfathered tag to be
   reclassified as redundant.  It may also enable the creation of a
   generative tag with the same meaning as a grandfathered or redundant
   tag; in that case, the grandfathered or redundant tag is marked as
   Deprecated, and the generative tag (including the new subtag) becomes
   its Preferred-Value.
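
   The deprecation step described above can be pictured as a small
   record update.  The following Python sketch is illustrative only: the
   dictionary layout, the function name deprecate_tag, and the
   Description value are assumptions made for this example and are not
   part of the Registry format or of any IANA tooling.  The date is the
   placeholder value that this memo uses until approval.

```python
from typing import Optional

# Placeholder date used in this memo; the real date is set at approval time.
PLACEHOLDER_DATE = "2029-09-09"


def deprecate_tag(record: dict, preferred_value: Optional[str],
                  date: str = PLACEHOLDER_DATE) -> dict:
    """Return a copy of a grandfathered or redundant record marked Deprecated.

    `record` is a hypothetical in-memory form of one Registry record,
    keyed by field name. The original record is left untouched.
    """
    updated = dict(record)
    updated["Deprecated"] = date
    # A Preferred-Value is recorded only when a generative tag with the
    # same meaning exists; some tags are deprecated without one.
    if preferred_value is not None:
        updated["Preferred-Value"] = preferred_value
    return updated


# Illustrative record; the Description value is assumed for the example.
record = {"Type": "grandfathered", "Tag": "i-ami", "Description": ["Amis"]}
updated = deprecate_tag(record, "ami")
```

   Returning a copy rather than modifying the record in place keeps the
   earlier state of the record available for comparison, which may be
   convenient when reviewing an update of this size.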
   As a result of adding the new subtags in this update, the following
   grandfathered tags were deprecated, with the indicated generative tag
   serving as the Preferred-Value:

      i-ami (Preferred-Value: ami)

      i-bnn (Preferred-Value: bnn)

      i-pwn (Preferred-Value: pwn)

      i-tao (Preferred-Value: tao)

      i-tay (Preferred-Value: tay)

      i-tsu (Preferred-Value: tsu)

      zh-cmn (Preferred-Value: cmn)

      zh-cmn-Hans (Preferred-Value: cmn-Hans)

      zh-cmn-Hant (Preferred-Value: cmn-Hant)

      zh-gan (Preferred-Value: gan)

      zh-hakka (Preferred-Value: hak)

      zh-min (no Preferred-Value; see below)

      zh-min-nan (Preferred-Value: nan)

      zh-wuu (Preferred-Value: wuu)

      zh-xiang (Preferred-Value: hsn)

   The tag "zh-min", originally registered under [RFC1766], is a special
   case: it represents a small class of Chinese languages, but is not a
   true macrolanguage.  The string "min" could never be used to tag
   these languages, since the [ISO639-3] code element "min" is assigned
   to an individual language (Minangkabau) that is not related to
   Chinese ("zh").  Because "zh-min" is not believed to represent a
   useful linguistic entity for tagging purposes, it was deprecated
   without a Preferred-Value.
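
   For a consumer of the Registry, the list above amounts to a lookup
   table from a deprecated tag to its replacement.  The following Python
   sketch shows one way such a table might be used.  It is an assumption
   made for illustration, not a procedure defined by this memo: the
   names PREFERRED_VALUE, is_deprecated, and replacement_for are
   hypothetical, and only a few of the entries listed above are
   included.

```python
from typing import Optional

# Partial table drawn from the list above, keyed by lowercase tag because
# language tags are compared without regard to case. None marks a tag
# that was deprecated without a Preferred-Value.
PREFERRED_VALUE = {
    "i-ami": "ami",
    "i-bnn": "bnn",
    "zh-cmn-hans": "cmn-Hans",
    "zh-hakka": "hak",
    "zh-min-nan": "nan",
    "zh-min": None,
}


def is_deprecated(tag: str) -> bool:
    """True if the tag appears in this (partial) table of deprecated tags."""
    return tag.lower() in PREFERRED_VALUE


def replacement_for(tag: str) -> Optional[str]:
    """Return the Preferred-Value for a deprecated tag, or None if the tag
    has no Preferred-Value or is not in the table."""
    return PREFERRED_VALUE.get(tag.lower())
```

   Keeping is_deprecated separate from replacement_for lets a caller
   distinguish "zh-min", which is deprecated but has no replacement,
   from a tag that is simply absent from the table.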
   The following grandfathered and redundant sign-language tags were
   deprecated, with the indicated generative tag serving as the
   Preferred-Value:

      sgn-BE-fr (Preferred-Value: sfb)

      sgn-BE-nl (Preferred-Value: vgt)

      sgn-BR (Preferred-Value: bzs)

      sgn-CH-de (Preferred-Value: sgg)

      sgn-CO (Preferred-Value: csn)

      sgn-DE (Preferred-Value: gsg)

      sgn-DK (Preferred-Value: dsl)

      sgn-ES (Preferred-Value: ssp)

      sgn-FR (Preferred-Value: fsl)

      sgn-GB (Preferred-Value: bfi)

      sgn-GR (Preferred-Value: gss)

      sgn-IE (Preferred-Value: isg)

      sgn-IT (Preferred-Value: ise)

      sgn-JP (Preferred-Value: jsl)

      sgn-MX (Preferred-Value: mfs)

      sgn-NI (Preferred-Value: ncs)

      sgn-NL (Preferred-Value: dse)

      sgn-NO (Preferred-Value: nsl)

      sgn-PT (Preferred-Value: psr)

      sgn-SE (Preferred-Value: swl)

      sgn-US (Preferred-Value: ase)

      sgn-ZA (Preferred-Value: sfs)

   No change was made to the Description field(s) for any of the
   grandfathered or redundant tags.  For example, the redundant tag
   "sgn-US" continues to carry the Description "American Sign Language".
   The sign language tags registered prior to [RFC4646] remain an
   exception to the general principle that the meaning of a
   non-grandfathered tag can be derived from its component subtags.

   In previous versions of the Registry, grandfathered tags that had
   been deprecated as a result of adding an ISO 639-based language
   subtag included a Comments field, with a value of the form "replaced
   by ISO code xxx", where "xxx" represented the new language subtag.
   These comments duplicated the information contained within the
   Preferred-Value field, and were deleted as part of this update.  No
   changes were made to other Comments fields.

2.6.  Additional Changes

   For consistency with the handling of alternative names in language
   subtags, Description fields for script subtags taken from [ISO15924]
   that represent alternative names were converted to multiple
   Description fields.  For example, the Description "Han (Hanzi, Kanji,
   Hanja)" was converted to four separate Description fields.  Some
   Description fields for script subtags contained parenthetical
   material that was explanatory, rather than identifying alternative
   names; these fields were not altered.

   This situation does not apply to region subtags taken from
   [ISO3166-1] and [UN_M.49] because those standards do not provide
   freely available alternative names for code elements.

   The capitalization of the Subtag field for the redundant tag
   "yi-latn" was changed to "yi-Latn" for consistency with the
   capitalization conventions described in Section 2.1 of
   [draft-ietf-ltru-4646bis-11].

3.  Updated Registry Contents

   The remainder of this section specified the updated set of records
   for the Language Subtag Registry.  This material was deleted before
   publication of this memo, to avoid any potential confusion with the
   Registry itself.

   The IANA Language Subtag Registry can be found at under "Language
   Tags".

4.  Security Considerations

   For security considerations relevant to the Language Subtag Registry
   and the use of language tags, see [draft-ietf-ltru-4646bis-11].

5.  IANA Considerations

   In its initial phase as an Internet-Draft, this memo contained a
   complete replacement of the contents of the Language Subtag Registry
   to be used by IANA in updating it.  As an RFC, it contains a pointer
   to the Registry, which is maintained by IANA.
   The Language Subtag Registry can be found at under "Language Tags".

   For details on the procedures for the format and ongoing maintenance
   of this Registry, see [draft-ietf-ltru-4646bis-11].

6.  Changes

   [Editor's Note: This section is provided for the convenience of
   reviewers and will be removed from the final document.]

   This memo is a new work, not an incremental update of [RFC4645].  The
   procedure for populating the original Language Subtag Registry,
   specified by the earlier [RFC4646], is included by reference to
   [RFC4645].  Therefore, no changes from [RFC4645] are listed in this
   section.

   Changes between draft-ietf-ltru-4645bis-03 and this version are:

   o  Updated Registry data to reflect new contents of Language Subtag
      Registry and new [ISO639-3] data.

   o  Updated reference to [draft-ietf-ltru-4646bis-11].

   o  Clarified that language code elements already retired from
      [ISO639-3] are not included in the updated Registry contents in
      this memo.  (K. Karlsson)

   Changes between draft-ietf-ltru-4645bis-02 and
   draft-ietf-ltru-4645bis-03 were:

   o  Removed procedures for adding extended language subtags and Prefix
      fields.

   o  Added procedure for new region subtags based on [ISO3166-1]
      exceptionally reserved code elements.

   o  Added instructions for IANA to convert the updated Language Subtag
      Registry contents in this document to UTF-8.

   o  Updated Registry data to reflect new contents of Language Subtag
      Registry and new [ISO639-3] data.

   o  Updated reference to [ISO3166-1].

   Changes between draft-ietf-ltru-4645bis-01 and
   draft-ietf-ltru-4645bis-02 were:

   o  Updated Registry data to reflect new contents of Language Subtag
      Registry and new [ISO639-3] data.

   o  Changed "placeholder" date to 2029-09-09 for easy recognition and
      to reduce confusion.

   o  Added procedures to populate the Macrolanguage field.

   Changes between draft-ietf-ltru-4645bis-00 and
   draft-ietf-ltru-4645bis-01 were:

   o  Changed procedure to incorporate data from new "Language Names
      Index" file and to ensure that the [ISO639-3] reference name is
      the first Description field.

   o  Removed some exceptions as a result of improved [ISO639-3] draft
      data.

   o  Removed section dealing with special handling of "(generic)" and
      "(specific)" strings within language names.

   o  Added an explanation of the process of converting a compound
      Description field for a script subtag to multiple Description
      fields.

   o  Removed Comments fields of the form "replaced by ISO code xxx",
      which provided no additional information beyond the
      Preferred-Value field.

   o  Clarified that the included Registry contents are a complete
      replacement for the existing Registry, not a set of deltas.
      (J. Cowan)

   o  Changed "included within" a macrolanguage to use "encompassed by"
      throughout.  (J. Cowan)

   o  Provided better explanation for deprecation of "zh-min" in
      Section 2.5.  (J. Cowan)

   o  Added Changes and Acknowledgements sections.

7.  References

7.1.  Normative References

   [ISO639-3]
              International Organization for Standardization, "ISO
              639-3:2007.  Codes for the representation of names of
              languages - Part 3: Alpha-3 code for comprehensive
              coverage of languages, first edition", 2007.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [draft-ietf-ltru-4646bis-11]
              Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying
              Languages", December 2007.

   [iso-639-3-macrolanguages_20080116]
              International Organization for Standardization, "ISO 639-3
              Macrolanguage Mappings", January 2008.

   [iso-639-3_20080114]
              International Organization for Standardization, "ISO 639-3
              Code Set", January 2008.

   [iso-639-3_Name_Index_20080114]
              International Organization for Standardization, "ISO 639-3
              Language Names Index", January 2008.

7.2.  Informative References

   [ISO15924]
              International Organization for Standardization, "ISO
              15924:2004.  Information and documentation -- Codes for
              the representation of names of scripts", January 2004.

   [ISO3166-1]
              International Organization for Standardization, "ISO
              3166-1:2006.  Codes for the representation of names of
              countries and their subdivisions -- Part 1: Country
              codes", November 2006.

   [ISO639-1]
              International Organization for Standardization, "ISO
              639-1:2002.  Codes for the representation of names of
              languages -- Part 1: Alpha-2 code", 2002.

   [ISO639-2]
              International Organization for Standardization, "ISO
              639-2:1998.  Codes for the representation of names of
              languages -- Part 2: Alpha-3 code, first edition", 1998.

   [RFC1766]  Alvestrand, H., "Tags for the Identification of
              Languages", RFC 1766, March 1995.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC3066]  Alvestrand, H., "Tags for the Identification of
              Languages", RFC 3066, January 2001.

   [RFC4645]  Ewell, D., "Initial Language Subtag Registry", RFC 4645,
              September 2006.

   [RFC4646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 4646, September 2006.

   [UN_M.49]  Statistics Division, United Nations, "Standard Country or
              Area Codes for Statistical Use", UN Standard Country or
              Area Codes for Statistical Use, Revision 4 (United Nations
              publication, Sales No. 98.XVII.9), June 1999.

Appendix A.  Acknowledgements

   This memo is a collaborative work of the Language Tag Registry Update
   (LTRU) Working Group.
   All of its members have made significant contributions to this memo
   and to its predecessor, [RFC4645].  Specific contributions to this
   memo were made by Stephane Bortzmeyer, John Cowan, Mark Davis, Martin
   Duerst, Frank Ellermann, Kent Karlsson, and Addison Phillips.

   This document was written with the xml2rfc tool described in
   [RFC2629].

Author's Address

   Doug Ewell (editor)
   Consultant

   Email: doug@ewellic.org
   URI:   http://www.ewellic.org

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).