2010/05/16 01:28

    Translation: 红猎人 (zengsai@gmail.com)

Lexical elements

Comments

There are two forms of comments:

  1. Line comments start with the character sequence // and continue through the next newline. A line comment acts like a newline.
  2. General comments start with the character sequence /* and continue through the character sequence */. A general comment that spans multiple lines acts like a newline, otherwise it acts like a space.


Comments do not nest.
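The following minimal Go sketch, assuming a current Go toolchain, exercises both comment forms. The helper names sum and twoStatements are illustrative only. The line `a := 2 /* ... */ b := 3` is deliberately unidiomatic (gofmt would split it). It compiles only because a general comment spanning lines acts like a newline, so a semicolon is inserted after the literal 2.

```go
package main

import "fmt"

// sum adds two integers. This whole line is a line comment.
func sum(a, b int) int {
	return a /* a one-line general comment acts like a space */ + b
}

// twoStatements relies on a multi-line general comment acting like a newline:
// a semicolon is inserted after the literal 2, so "b := 3" starts a new statement.
func twoStatements() int {
	a := 2 /* this comment spans
	two lines */ b := 3
	return a + b
}

func main() {
	/* Comments do not nest: this "/*" is ordinary comment text,
	   and the first closing delimiter ends the whole comment. */
	fmt.Println(sum(1, 2), twoStatements()) // 3 5
}
```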


Tokens

Tokens form the vocabulary of the Go language. There are four classes: identifiers, keywords, operators and delimiters, and literals. White space, formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns (U+000D), and newlines (U+000A), is ignored except as it separates tokens that would otherwise combine into a single token. Also, a newline may trigger the insertion of a semicolon. While breaking the input into tokens, the next token is the longest sequence of characters that form a valid token.

Semicolons

The formal grammar uses semicolons ";" as terminators in a number of productions. Go programs may omit most of these semicolons using the following two rules:

  1. When the input is broken into tokens, a semicolon is automatically inserted into the token stream at the end of a non-blank line if the line's final token is

    • an identifier
    • an integer, floating-point, character, or string literal
    • one of the keywords break, continue, fallthrough, or return
    • one of the operators and delimiters ++, --, ), ], or }
  2. To allow complex statements to occupy a single line, a semicolon may be omitted before a closing ")" or "}".

To reflect idiomatic use, code examples in this document elide semicolons using these rules.


Identifiers

Identifiers name program entities such as variables and types. An identifier is a sequence of one or more letters and digits. The first character in an identifier must be a letter.

identifier = letter { letter | unicode_digit } .

Some identifiers are predeclared.

Keywords

The following keywords are reserved and may not be used as identifiers.


break        default      func         interface    select
case         defer        go           map          struct
chan         else         goto         package      switch
const        fallthrough  if           range        type
continue     for          import       return       var

Operators and Delimiters

The following character sequences represent operators, delimiters, and other special tokens:

+    &     +=    &=     &&    ==    !=    (    )
-    |     -=    |=     ||    <     <=    [    ]
*    ^     *=    ^=     <-    >     >=    {    }
/    <<    /=    <<=    ++    =     :=    ,    ;
%    >>    %=    >>=    --    !     ...   .    :
     &^          &^=
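Several of the less familiar operators in the table appear together in the small sketch below: &^ (AND NOT, also called bit clear), &^=, <-, := and .... The function names clearBits and sum are hypothetical.

```go
package main

import "fmt"

// clearBits returns x with every bit that is set in mask cleared (&^ is AND NOT).
func clearBits(x, mask uint8) uint8 { return x &^ mask }

// sum takes a variadic parameter, declared with the ... token.
func sum(nums ...int) int {
	total := 0 // := declares and initializes in one step
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	flags := uint8(0x0F)
	flags &^= 0x05 // 1111 AND NOT 0101 = 1010

	ch := make(chan int, 1)
	ch <- sum(1, 2, 3) // <- in a send statement
	v := <-ch          // <- as the receive operator

	fmt.Println(clearBits(0x0F, 0x03), flags, v) // 12 10 6
}
```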

Integer literals

An integer literal is a sequence of digits representing an integer constant. An optional prefix sets a non-decimal base: 0 for octal, 0x or 0X for hexadecimal. In hexadecimal literals, letters a-f and A-F represent values 10 through 15.

int_lit     = decimal_lit | octal_lit | hex_lit .
decimal_lit = ( "1" ... "9" ) { decimal_digit } .
octal_lit   = "0" { octal_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { hex_digit } .

Floating-point literals

A floating-point literal is a decimal representation of a floating-point constant. It has an integer part, a decimal point, a fractional part, and an exponent part. The integer and fractional part comprise decimal digits; the exponent part is an e or E followed by an optionally signed decimal exponent. One of the integer part or the fractional part may be elided; one of the decimal point or the exponent may be elided.

float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
decimals  = decimal_digit { decimal_digit } .
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals .

072.40  // == 72.40
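Each alternative of float_lit can be tried directly as a constant declaration. The values below are arbitrary examples chosen to cover every form in the grammar, including the 072.40 case, where the leading 0 does not make the literal octal.

```go
package main

import "fmt"

// One untyped floating-point constant for each shape allowed by float_lit.
const (
	f1 = 0.          // decimals "."
	f2 = 72.40       // decimals "." decimals
	f3 = 072.40      // still decimal: == 72.40
	f4 = 2.71828
	f5 = 1.e+0       // decimals "." exponent
	f6 = 6.67428e-11 // decimals "." decimals exponent
	f7 = 1E6         // decimals exponent, no decimal point
	f8 = .25         // "." decimals, no integer part
	f9 = .12345E+5   // "." decimals exponent
)

func main() {
	fmt.Println(f1, f2, f3, f4, f5, f6, f7, f8, f9)
}
```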

Imaginary literals

An imaginary literal is a decimal representation of the imaginary part of a complex constant. It consists of a floating-point literal or decimal integer followed by the lower-case letter i.

imaginary_lit = (decimals | float_lit) "i" .

011i  // == 11i
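Adding an imaginary literal to an ordinary constant produces a complex constant, and the built-ins real and imag take complex values apart. The sketch assumes a current Go toolchain. In current Go, 011i is still read as decimal 11i for backward compatibility.

```go
package main

import "fmt"

func main() {
	c := 3 + 4i // an integer constant plus an imaginary literal
	fmt.Println(real(c), imag(c)) // 3 4

	// imag extracts the imaginary part of each literal.
	fmt.Println(imag(011i), imag(.25i), imag(1E6i)) // 11 0.25 1e+06
}
```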

Character literals

A character literal represents an integer constant, typically a Unicode code point, as one or more characters enclosed in single quotes. Within the quotes, any character may appear except single quote and newline. A single quoted character represents itself, while multi-character sequences beginning with a backslash encode values in various formats.

The simplest form represents the single character within the quotes; since Go source text is Unicode characters encoded in UTF-8, multiple UTF-8-encoded bytes may represent a single integer value. For instance, the literal 'a' holds a single byte representing a literal a, Unicode U+0061, value 0x61, while 'ä' holds two bytes (0xc3 0xa4) representing a literal a-dieresis, U+00E4, value 0xe4.

Several backslash escapes allow arbitrary values to be represented as ASCII text. There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the corresponding base.

Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. Hexadecimal escapes satisfy this condition by construction. The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
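The sketch below writes U+0061 in each of the four numeric escape forms to confirm that they denote the same value. It also shows the largest octal escape. Out-of-range literals appear only in comments, because they would stop the example from compiling.

```go
package main

import "fmt"

func main() {
	// \x (2 hex digits), \ (3 octal digits), \u (4 hex), \U (8 hex): all U+0061.
	for _, r := range []rune{'\x61', '\141', '\u0061', '\U00000061'} {
		fmt.Print(r == 'a', " ")
	}
	fmt.Println()

	// Octal escapes cover 0..255, so '\377' is the largest.
	fmt.Println('\377', '\xff', '\u00e4' == 'ä') // 255 255 true
	// '\400' (256) would not compile, nor would a surrogate half such as '\uD800'.
}
```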

After a backslash, certain single-character escapes represent special values:


\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000b vertical tab
\\   U+005c backslash
\'   U+0027 single quote  (valid escape only within character literals)
\"   U+0022 double quote  (valid escape only within string literals)

All other sequences starting with a backslash are illegal inside character literals.


char_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
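The single-character escapes in the table evaluate to the listed code points. This sketch prints them as integers and shows \" inside a string literal, where it is valid.

```go
package main

import "fmt"

func main() {
	fmt.Println('\a', '\b', '\f', '\n', '\r', '\t', '\v') // 7 8 12 10 13 9 11
	fmt.Println('\\', '\'')                                // 92 39
	// \' is accepted only in character literals, \" only in string literals.
	fmt.Println("say \"hi\"", len("\"")) // say "hi" 1
}
```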

String literals

A string literal represents a string constant obtained from concatenating a sequence of characters. There are two forms: raw string literals and interpreted string literals.

Raw string literals are character sequences between back quotes ``. Within the quotes, any character is legal except back quote. The value of a raw string literal is the string composed of the uninterpreted characters between the quotes; in particular, backslashes have no special meaning and the string may span multiple lines.
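A raw literal keeps backslashes and newlines verbatim, so its length counts every character between the back quotes. In the small sketch below, the path-like string is an arbitrary example.

```go
package main

import "fmt"

func main() {
	raw := `C:\new\table` // \n and \t are not escapes here
	multi := `line one
line two` // the newline inside the quotes becomes part of the value
	fmt.Println(len(raw), raw)                              // 12 C:\new\table
	fmt.Println(len(multi), multi == "line one\nline two") // 17 true
}
```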

Interpreted string literals are character sequences between double quotes "". The text between the quotes, which may not span multiple lines, forms the value of the literal, with backslash escapes interpreted as they are in character literals (except that \' is illegal and \" is legal). The three-digit octal (\nnn) and two-digit hexadecimal (\xnn) escapes represent individual bytes of the resulting string; all other escapes represent the (possibly multi-byte) UTF-8 encoding of individual characters. Thus inside a string literal \377 and \xFF represent a single byte of value 0xFF=255, while ÿ, \u00FF, \U000000FF and \xc3\xbf represent the two bytes 0xc3 0xbf of the UTF-8 encoding of character U+00FF.
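The byte-versus-character distinction just described can be verified with len, which counts bytes, and the % x verb of fmt, which prints those bytes in hex. A minimal sketch:

```go
package main

import "fmt"

func main() {
	// Octal and \x escapes each produce a single byte.
	fmt.Println(len("\377"), "\377" == "\xFF") // 1 true

	// The other forms produce the 2-byte UTF-8 encoding of U+00FF.
	for _, s := range []string{"ÿ", "\u00FF", "\U000000FF", "\xc3\xbf"} {
		fmt.Printf("%d % x\n", len(s), s) // 2 c3 bf
	}
}
```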

string_lit             = raw_string_lit | interpreted_string_lit .
raw_string_lit         = "`" { unicode_char } "`" .
interpreted_string_lit = `"` { unicode_value | byte_value } `"` .

`abc`  // same as "abc"
`\n
\n`    // same as "\\n\n\\n"
"Hello, world!\n"

These examples all represent the same string:


"日本語"                                 // UTF-8 input text
`日本語`                                 // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // The explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // The explicit Unicode code points
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // The explicit UTF-8 bytes

If the source code represents a character as two code points, such as a combining form involving an accent and a letter, the result will be an error if placed in a character literal (it is not a single code point), and will appear as two code points if placed in a string literal.

