2010/05/16 01:25

    Translation: 红猎人 (zengsai@gmail.com)

Source code representation

Source code is Unicode text encoded in UTF-8. The text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document will use the term character to refer to a Unicode code point.

Each code point is distinct; for instance, upper and lower case letters are different characters.


Implementation restriction: For compatibility with other tools, a compiler may disallow the NUL character (U+0000) in the source text.
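The restriction says a compiler *may* reject NUL; it does not require it. As one illustration, the standard library's `go/scanner` package reports an error when it meets a NUL byte. The sketch below tokenizes two small inputs and collects whatever errors the scanner reports. The helper name `scanErrors` and the file name `example.go` are hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// scanErrors (a hypothetical helper) tokenizes src with go/scanner
// and returns every error message the scanner reports.
func scanErrors(src string) []string {
	var msgs []string
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), func(pos token.Position, msg string) {
		msgs = append(msgs, fmt.Sprintf("%s: %s", pos, msg))
	}, 0)

	// Scan until EOF; errors are delivered to the handler above.
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
	}
	return msgs
}

func main() {
	fmt.Println(len(scanErrors("x := 1\n"))) // 0
	// Non-empty: the scanner flags the NUL byte in this input.
	fmt.Println(scanErrors("x := \x00 1\n"))
}
```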

The following terms are used to denote specific Unicode character classes:

unicode_char   = /* an arbitrary Unicode code point */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Digit" */ .

Section 4.5, "General Category-Normative", of The Unicode Standard 5.2 defines a set of character categories. Go treats characters in categories Lu, Ll, Lt, Lm, and Lo as Unicode letters, and characters in category Nd as Unicode digits.
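Go's standard `unicode` package exposes these categories as range tables (`unicode.Lu`, `unicode.Ll`, `unicode.Lt`, `unicode.Lm`, `unicode.Lo`, `unicode.Nd`), so the two character classes can be sketched directly. The helper names `isUnicodeLetter` and `isUnicodeDigit` are hypothetical. One caveat: the package's tables follow whatever Unicode version is reported by `unicode.Version`, which is newer than 5.2, so results for recently added characters may differ from a strictly 5.2-based reading.

```go
package main

import (
	"fmt"
	"unicode"
)

// isUnicodeLetter (hypothetical helper): categories Lu, Ll, Lt, Lm, Lo.
func isUnicodeLetter(r rune) bool {
	return unicode.In(r, unicode.Lu, unicode.Ll, unicode.Lt, unicode.Lm, unicode.Lo)
}

// isUnicodeDigit (hypothetical helper): category Nd only.
func isUnicodeDigit(r rune) bool {
	return unicode.Is(unicode.Nd, r)
}

func main() {
	// 'Ж' is Lu, '中' is Lo, '_' is Pc, '٣' (U+0663) is Nd,
	// and 'Ⅻ' (U+216B) is Nl, so it is neither a letter nor a digit.
	for _, r := range []rune{'a', 'Ж', '中', '_', '7', '٣', 'Ⅻ'} {
		fmt.Printf("%q letter=%v digit=%v\n", r, isUnicodeLetter(r), isUnicodeDigit(r))
	}
	fmt.Println("Unicode tables:", unicode.Version)
}
```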

Letters and digits

The underscore character _ (U+005F) is considered a letter.

letter        = unicode_letter | "_" .
decimal_digit = "0" ... "9" .
octal_digit   = "0" ... "7" .
hex_digit     = "0" ... "9" | "A" ... "F" | "a" ... "f" .
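The four productions above translate almost one-to-one into predicates on runes. The sketch below is an illustration, not the compiler's own code, and the function names are hypothetical. It relies on `unicode.IsLetter`, which tests the Unicode letter category L (the union of Lu, Ll, Lt, Lm, and Lo). Note that `decimal_digit` covers only ASCII "0" through "9", unlike the broader `unicode_digit`.

```go
package main

import (
	"fmt"
	"unicode"
)

// letter = unicode_letter | "_" .
func isLetter(r rune) bool {
	return unicode.IsLetter(r) || r == '_'
}

// decimal_digit = "0" ... "9" .  (ASCII only, unlike unicode_digit)
func isDecimalDigit(r rune) bool { return '0' <= r && r <= '9' }

// octal_digit = "0" ... "7" .
func isOctalDigit(r rune) bool { return '0' <= r && r <= '7' }

// hex_digit = "0" ... "9" | "A" ... "F" | "a" ... "f" .
func isHexDigit(r rune) bool {
	return isDecimalDigit(r) || ('A' <= r && r <= 'F') || ('a' <= r && r <= 'f')
}

func main() {
	fmt.Println(isLetter('_'), isLetter('π'), isLetter('1'))             // true true false
	fmt.Println(isDecimalDigit('٣'), isOctalDigit('8'), isHexDigit('f')) // false false true
}
```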

