--------------------- PatchSet 4009 Date: 2007/01/25 08:25:08 Author: amosjeffries Branch: squid3-ipv6 Tag: (none) Log: syncing with HEAD. Members: doc/rfc/draft-coar-cgi-v11-04.txt:1.1->1.1.6.1(DEAD) --- squid3/doc/rfc/draft-coar-cgi-v11-04.txt Wed Feb 14 13:38:35 2007 +++ /dev/null Wed Feb 14 13:37:19 2007 @@ -1,1904 +0,0 @@ - - - -INTERNET-DRAFT David Robinson -draft-coar-cgi-v11-04.txt Apache Software Foundation -Expires 18 April 2004 Ken A.L. Coar - IBM Corporation - 19 October 2003 - - - The Common Gateway Interface (CGI) Version 1.1 - - -Status of this Memo - - This document is an Internet-Draft and is in full conformance with - all provisions of Section 10 of RFC2026. - - Internet-Drafts are working documents of the Internet Engineering - Task Force (IETF), its areas, and its working groups. Note that - other groups may also distribute working documents as - Internet-Drafts. - - Internet-Drafts are draft documents valid for a maximum of six months - and may be updated, replaced, or obsoleted by other documents at any - time. It is inappropriate to use Internet-Drafts as reference - material or to cite them other than as 'work in progress'. - - The list of current Internet-Drafts can be accessed at - http://www.ietf.org/ietf/1id-abstracts.txt. - - The list of Internet-Draft Shadow Directories can be accessed at - http://www.ietf.org/shadow.html. - - Distribution of this document is unlimited. Please send comments to - the authors, or via the CGI-WG mailing list; see the project Web page - at . - -Abstract - - The Common Gateway Interface (CGI) is a simple interface for running - external programs, software or gateways under an information server - in a platform-independent manner. Currently, the supported - information servers are HTTP servers. - - The interface has been in use by the World-Wide Web since 1993. This - specification defines the 'current practice' parameters of the - 'CGI/1.1' interface developed and documented at the U.S. 
National - Centre for Supercomputing Applications. This document also defines - the use of the CGI/1.1 interface on UNIX(R) and other, similar - systems. - - - -Robinson & Coar Expires 18 April 2004 [Page 1] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -Contents - - 1 Introduction 4 - 1.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 4 - 1.2 Requirements . . . . . . . . . . . . . . . . . . . . . . . 4 - 1.3 Specifications . . . . . . . . . . . . . . . . . . . . . . 4 - 1.4 Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 - - 2 Notational Conventions and Generic Grammar 5 - 2.1 Augmented BNF . . . . . . . . . . . . . . . . . . . . . . 5 - 2.2 Basic Rules . . . . . . . . . . . . . . . . . . . . . . . 6 - 2.3 URL Encoding . . . . . . . . . . . . . . . . . . . . . . . 7 - - 3 Invoking the Script 8 - 3.1 Server Responsibilities . . . . . . . . . . . . . . . . . 8 - 3.2 Script Selection . . . . . . . . . . . . . . . . . . . . . 8 - 3.3 The Script-URI . . . . . . . . . . . . . . . . . . . . . . 9 - 3.4 Execution . . . . . . . . . . . . . . . . . . . . . . . . 10 - - 4 The CGI Request 10 - 4.1 Request Meta-Variables . . . . . . . . . . . . . . . . . . 10 - 4.1.1 AUTH_TYPE . . . . . . . . . . . . . . . . . . . . . 11 - 4.1.2 CONTENT_LENGTH . . . . . . . . . . . . . . . . . . 11 - 4.1.3 CONTENT_TYPE . . . . . . . . . . . . . . . . . . . 12 - 4.1.4 GATEWAY_INTERFACE . . . . . . . . . . . . . . . . . 13 - 4.1.5 PATH_INFO . . . . . . . . . . . . . . . . . . . . . 13 - 4.1.6 PATH_TRANSLATED . . . . . . . . . . . . . . . . . . 14 - 4.1.7 QUERY_STRING . . . . . . . . . . . . . . . . . . . 15 - 4.1.8 REMOTE_ADDR . . . . . . . . . . . . . . . . . . . . 15 - 4.1.9 REMOTE_HOST . . . . . . . . . . . . . . . . . . . . 16 - 4.1.10 REMOTE_IDENT . . . . . . . . . . . . . . . . . . . 16 - 4.1.11 REMOTE_USER . . . . . . . . . . . . . . . . . . . . 16 - 4.1.12 REQUEST_METHOD . . . . . . . . . . . . . . . . . . 16 - 4.1.13 SCRIPT_NAME . . . . . . 
. . . . . . . . . . . . . . 17 - 4.1.14 SERVER_NAME . . . . . . . . . . . . . . . . . . . . 17 - 4.1.15 SERVER_PORT . . . . . . . . . . . . . . . . . . . . 17 - 4.1.16 SERVER_PROTOCOL . . . . . . . . . . . . . . . . . . 18 - 4.1.17 SERVER_SOFTWARE . . . . . . . . . . . . . . . . . . 18 - 4.1.18 Protocol-Specific Meta-Variables . . . . . . . . . 18 - 4.2 Request Message-Body . . . . . . . . . . . . . . . . . . . 19 - 4.3 Request Methods . . . . . . . . . . . . . . . . . . . . . 20 - 4.3.1 GET . . . . . . . . . . . . . . . . . . . . . . . . 20 - 4.3.2 POST . . . . . . . . . . . . . . . . . . . . . . . 20 - 4.3.3 HEAD . . . . . . . . . . . . . . . . . . . . . . . 20 - 4.3.4 Protocol-Specific Methods . . . . . . . . . . . . . 20 - 4.4 The Script Command Line . . . . . . . . . . . . . . . . . 21 - - 5 NPH Scripts 21 - 5.1 Identification . . . . . . . . . . . . . . . . . . . . . . 21 - - -Robinson & Coar Expires 18 April 2004 [Page 2] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - 5.2 NPH Response . . . . . . . . . . . . . . . . . . . . . . . 22 - - 6 CGI Response 22 - 6.1 Response Handling . . . . . . . . . . . . . . . . . . . . 22 - 6.2 Response Types . . . . . . . . . . . . . . . . . . . . . . 22 - 6.2.1 Document Response . . . . . . . . . . . . . . . . . 23 - 6.2.2 Local Redirect Response . . . . . . . . . . . . . . 23 - 6.2.3 Client Redirect Response . . . . . . . . . . . . . 23 - 6.2.4 Client Redirect Response with Document . . . . . . 24 - 6.3 Response Header Fields . . . . . . . . . . . . . . . . . . 24 - 6.3.1 Content-Type . . . . . . . . . . . . . . . . . . . 24 - 6.3.2 Location . . . . . . . . . . . . . . . . . . . . . 25 - 6.3.3 Status . . . . . . . . . . . . . . . . . . . . . . 26 - 6.3.4 Protocol-Specific Header Fields . . . . . . . . . . 26 - 6.3.5 Extension Header Fields . . . . . . . . . . . . . . 27 - 6.4 Response Message-Body . . . . . . . . . . . . . . . . . . 27 - - 7 System Specifications 27 - 7.1 AmigaDOS . . . . . . . . . 
. . . . . . . . . . . . . . . . 27 - 7.2 UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 - 7.3 EBCDIC/POSIX . . . . . . . . . . . . . . . . . . . . . . . 28 - - 8 Implementation 29 - 8.1 Recommendations for Servers . . . . . . . . . . . . . . . 29 - 8.2 Recommendations for Scripts . . . . . . . . . . . . . . . 29 - - 9 Security Considerations 30 - 9.1 Safe Methods . . . . . . . . . . . . . . . . . . . . . . . 30 - 9.2 Header Fields Containing Sensitive Information . . . . . . 30 - 9.3 Data Privacy . . . . . . . . . . . . . . . . . . . . . . . 30 - 9.4 Information Security Model . . . . . . . . . . . . . . . . 30 - 9.5 Script Interference with the Server . . . . . . . . . . . 30 - 9.6 Data Length and Buffering Considerations . . . . . . . . . 31 - 9.7 Stateless Processing . . . . . . . . . . . . . . . . . . . 31 - 9.8 Relative Paths . . . . . . . . . . . . . . . . . . . . . . 32 - 9.9 Non-parsed Header Output . . . . . . . . . . . . . . . . . 32 - - 10 Acknowledgements 32 - - 11 References 32 - - 12 Authors' Addresses 34 - - - - - - - - - -Robinson & Coar Expires 18 April 2004 [Page 3] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -1 Introduction - -1.1 Purpose - - The Common Gateway Interface (CGI) [21] allows an HTTP [2], [8] - server and a CGI script to share responsibility for responding to - client requests. The client request comprises a Universal Resource - Identifier (URI) [1], a request method and various ancillary - information about the request provided by the transport protocol. - - The CGI defines the abstract parameters, known as meta-variables, - which describe the client's request. Together with a concrete - programmer interface this specifies a platform-independent interface - between the script and the HTTP server. 
- - The server is responsible for managing connection, data transfer, - transport and network issues related to the client request, whereas - the CGI script handles the application issues, such as data access - and document processing. - -1.2 Requirements - - The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', - 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY' and 'OPTIONAL' in this - document are to be interpreted as described in RFC 2119 [5]. - - An implementation is not compliant if it fails to satisfy one or more - of the 'must' requirements for the protocols it implements. An - implementation that satisfies all of the 'must' and all of the - 'should' requirements for its features is said to be 'unconditionally - compliant'; one that satisfies all of the 'must' requirements but not - all of the 'should' requirements for its features is said to be - 'conditionally compliant'. - -1.3 Specifications - - Not all of the functions and features of the CGI are defined in the - main part of this specification. The following phrases are used to - describe the features that are not specified: - - 'system defined' - The feature may differ between systems, but must be the same for - different implementations using the same system. A system will - usually identify a class of operating-systems. Some systems are - defined in section 7 of this document. New systems may be defined - by new specifications without revision of this document. - - 'implementation defined' - - - -Robinson & Coar Expires 18 April 2004 [Page 4] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - The behaviour of the feature may vary from implementation to - implementation; a particular implementation must document its - behaviour. 
- -1.4 Terminology - - This specification uses many terms defined in the HTTP/1.1 - specification [8]; however, the following terms are used here in a - sense which may not accord with their definitions in that document, - or with their common meaning. - - 'meta-variable' - A named parameter which carries information from the server to the - script. It is not necessarily a variable in the operating- - system's environment, although that is the most common - implementation. - - 'script' - The software that is invoked by the server according to this - interface. It need not be a standalone program, but could be a - dynamically-loaded or shared library, or even a subroutine in the - server. It might be a set of statements interpreted at run-time, - as the term 'script' is frequently understood, but that is not a - requirement and within the context of this specification the term - has the broader definition stated. - - 'server' - The application program that invokes the script in order to - service requests from the client. - -2 Notational Conventions and Generic Grammar - -2.1 Augmented BNF - - All of the mechanisms specified in this document are described in - both prose and an augmented Backus-Naur Form (BNF) similar to that - used by RFC 822 [6]. Unless stated otherwise, the elements are - case-sensitive. This augmented BNF contains the following - constructs: - - name = definition - The name of a rule and its definition are separated by the equals - character ('='). Whitespace is only significant in that - continuation lines of a definition are indented. - - "literal" - Double quotation marks (") surround literal text, except for a - literal quotation mark, which is surrounded by angle-brackets ('<' - - - -Robinson & Coar Expires 18 April 2004 [Page 5] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - and '>'). - - rule1 | rule2 - Alternative rules are separated by a vertical bar ('|'). 
- - (rule1 rule2 rule3) - Elements enclosed in parentheses are treated as a single element. - - *rule - A rule preceded by an asterisk ('*') may have zero or more - occurrences. The full form is 'n*m rule' indicating at least n - and at most m occurrences of the rule. n and m are optional - decimal values with default values of 0 and infinity respectively. - - [rule] - An element enclosed in square brackets ('[' and ']') is optional, - and is equivalent to '*1 rule'. - - N rule - A rule preceded by a decimal number represents exactly N - occurrences of the rule. It is equivalent to 'N*N rule'. - -2.2 Basic Rules - - This specification uses a BNF-like grammar defined in terms of - characters. Unlike many specifications which define the bytes - allowed by a protocol, here each literal in the grammar corresponds - to the character it represents. How these characters are represented - in terms of bits and bytes within a system is either - system-defined or specified in the particular context. The single - exception is the rule 'OCTET', defined below. - - The following rules are used throughout this specification to - describe basic parsing constructs. - - alpha = lowalpha | hialpha - lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | - "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | - "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | - "y" | "z" - hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | - "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | - "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | - "Y" | "Z" - digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | - "8" | "9" - alphanum = alpha | digit - OCTET = - - - -Robinson & Coar Expires 18 April 2004 [Page 6] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - CHAR = alpha | digit | separator | "!" | "#" | "$" | - "%" | "&" | "'" | "*" | "+" | "-" | "." 
| "`" | - "^" | "_" | "{" | "|" | "}" | "~" | CTL - CTL = - SP = - HT = - NL = - LWSP = SP | HT | NL - separator = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | - "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" | - "}" | SP | HT - token = 1* - quoted-string = <"> *qdtext <"> - qdtext = and CTLs but including LWSP> - TEXT = - - Note that newline (NL) need not be a single control character, but - can be a sequence of control characters. A system MAY define TEXT to - be a larger set of characters than . - -2.3 URL Encoding - - Some variables and constructs used here are described as being - 'URL-encoded'. This encoding is described in section 2 of RFC 2396 - [3]. In a URL-encoded string an escape sequence consists of a - percent character ("%") followed by two hexadecimal digits, where the - two hexadecimal digits form an octet. An escape sequence represents - the graphic character that has the octet as its code within the - US-ASCII [20] coded character set, if it exists. Currently there is - no provision within the URI syntax to identify which character set - non-ASCII codes represent, so CGI handles this issue on an ad-hoc - basis. - - Note that some unsafe (reserved) characters may have different - semantics when encoded. The definition of which characters are - unsafe depends on the context; see section 2 of RFC 2396 [3], updated - by RFC 2732 [11], for an authoritative treatment. These reserved - characters are generally used to provide syntactic structure to the - character string, for example as field separators. In all cases, the - string is first processed with regard to any reserved characters - present, and then the resulting data can be URL-decoded by replacing - "%" escapes by their character values. - - To encode a character string, all reserved and forbidden characters - are replaced by the corresponding "%" escapes. The string can then - be used in assembling a URI. 
The reserved characters will vary from - context to context, but will always be drawn from this set: - - - -Robinson & Coar Expires 18 April 2004 [Page 7] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | - "," | "[" | "]" - - The last two characters were added by RFC 2732 [11]. In any - particular context, a sub-set of these characters will be reserved; - the other characters from this set MUST NOT be encoded when a string - is URL-encoded in that context. Other basic rules used to describe - URI syntax are: - - hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" - | "c" | "d" | "e" | "f" - escaped = "%" hex hex - unreserved = alpha | digit | mark - mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")" - -3 Invoking the Script - -3.1 Server Responsibilities - - The server acts as an application gateway. It receives the request - from the client, selects a CGI script to handle the request, converts - the client request to a CGI request, executes the script and converts - the CGI response into a response for the client. When processing the - client request, it is responsible for implementing any protocol or - transport level authentication and security. The server MAY also - function in a 'non-transparent' manner, modifying the request or - response in order to provide some additional service, such as media - type transformation or protocol reduction. - - The server MUST perform translations and protocol conversions on the - client request data required by this specification. Furthermore, the - server retains its responsibility to the client to conform to the - relevant network protocol even if the CGI script fails to conform to - this specification. - - If the server is applying authentication to the request, then it MUST - NOT execute the script unless the request passes all defined access - controls. 
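As a rough illustration of the decoding order described in section 2.3 (deal with the reserved characters first, then replace the "%" escapes), a minimal Python sketch might look like the following. The function names and the choice of "&" as the field separator are assumptions made for this example only, as is the mapping of each octet to the ISO-8859-1 character with that code; the draft itself leaves non-ASCII handling ad hoc.

```python
import re

# "%" followed by exactly two hexadecimal digits (rule: escaped = "%" hex hex)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def url_decode(text):
    """Replace every "%" hex hex escape with the character for that octet.

    Assumption: the octet value is used directly as a character code
    (US-ASCII for values below 128, ISO-8859-1 above that).
    """
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


def split_then_decode(encoded, separator="&"):
    """Split on a reserved separator first, then decode each field.

    Decoding before splitting would turn an escaped "%26" into a
    separator and change the structure of the string.
    """
    return [url_decode(field) for field in encoded.split(separator)]
```

With this ordering, `split_then_decode("a%26b&c")` yields two fields, the first of which contains a literal "&".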
- -3.2 Script Selection - - The server determines which CGI script is to be executed based on a - generic-form URI supplied by the client. This URI includes a - hierarchical path with components separated by "/". For any - particular request, the server will identify all or a leading part of - this path with an individual script, thus placing the script at a - particular point in the path hierarchy. The remainder of the path, - if any, is a resource or sub-resource identifier to be interpreted by - - - -Robinson & Coar Expires 18 April 2004 [Page 8] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - the script. - - Information about this split of the path is available to the script - in the meta-variables, described below. Support for non-hierarchical - URI schemes is outside the scope of this specification. - -3.3 The Script-URI - - The mapping from client request URI to choice of script is defined by - the particular server implementation and its configuration. The - server may allow the script to be identified with a set of several - different URI path hierarchies, and therefore is permitted to replace - the URI by other members of this set during processing and generation - of the meta-variables. The server - - 1. MAY preserve the URI in the particular client request; or - - 2. MAY select a canonical URI from the set of possible values for - each script; or - - 3. can implement any other selection of URI from the set. - - From the meta-variables thus generated, a URI, the 'Script-URI', can - be constructed. This MUST have the property that if the client had - accessed this URI instead, then the script would have been executed - with the same values for the SCRIPT_NAME, PATH_INFO and QUERY_STRING - meta-variables. The Script-URI has the structure of a generic URI as - defined in section 3 of RFC 2396 [3], with the exception that object - parameters and fragment identifiers are not permitted. 
The various - components of the Script-URI are defined by some of the - meta-variables (see below); - - script-URI = "://" ":" - "?" - - where is found from SERVER_PROTOCOL, , - and are the values of the respective - meta-variables. The SCRIPT_NAME and PATH_INFO values, URL-encoded - with ";", "=" and "?" reserved, give and . - See section 4.1.5 for more information about the PATH_INFO - meta-variable. - - The scheme and the protocol are not identical as the scheme - identifies the access method in addition to the protocol. For - example, a resource accessed using Transport Layer Security (TLS) [7] - would have a request URI with a scheme of https when using the HTTP - protocol [16]. CGI/1.1 provides no generic means for the script to - reconstruct this, and therefore the Script-URI as defined includes - - - -Robinson & Coar Expires 18 April 2004 [Page 9] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - the base protocol used. However, a script MAY make use of - scheme-specific meta-variables to better deduce the URI scheme. - - Note that this definition also allows URIs to be constructed which - would invoke the script with any permitted values for the path-info - or query-string, by modifying the appropriate components. - -3.4 Execution - - The script is invoked in a system defined manner. Unless specified - otherwise, the file containing the script will be invoked as an - executable program. The server prepares the CGI request as described - in section 4; this comprises the request meta-variables (immediately - available to the script on execution) and request message data. The - request data need not be immediately available to the script; the - script can be executed before all this data has been received by the - server from the client. The response from the script is returned to - the server as described in sections 5 and 6. 
- - In the event of an error condition, the server can interrupt or - terminate script execution at any time and without warning. That - could occur, for example, in the event of a transport failure between - the server and the client; so the script SHOULD be prepared to handle - abnormal termination. - -4 The CGI Request - - Information about a request comes from two different sources; the - request meta-variables and any associated message-body. - -4.1 Request Meta-Variables - - Meta-variables contain data about the request passed from the server - to the script, and are accessed by the script in a system defined - manner. Meta-variables are identified by case-insensitive names; - there cannot be two different variables whose names differ in case - only. Here they are shown using a canonical representation of - capitals plus underscore ("_"). A particular system can define a - different representation. - - meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" | - "CONTENT_TYPE" | "GATEWAY_INTERFACE" | - "PATH_INFO" | "PATH_TRANSLATED" | - "QUERY_STRING" | "REMOTE_ADDR" | - "REMOTE_HOST" | "REMOTE_IDENT" | - "REMOTE_USER" | "REQUEST_METHOD" | - "SCRIPT_NAME" | "SERVER_NAME" | - "SERVER_PORT" | "SERVER_PROTOCOL" | - - - -Robinson & Coar Expires 18 April 2004 [Page 10] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - "SERVER_SOFTWARE" | scheme | - protocol-var-name | extension-var-name - protocol-var-name = ( protocol | scheme ) "_" var-name - scheme = alpha *( alpha | digit | "+" | "-" | "." ) - var-name = token - extension-var-name = token - - Meta-variables with the same name as a scheme, and names beginning - with the name of a protocol or scheme (e.g. HTTP_ACCEPT) are also - specified. The number and meaning of these variables may change - independently of this specification. (See also section 4.1.18.) - - The server MAY define additional implementation-specific extension - meta-variables, whose names SHOULD be prefixed with "X_". 
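The naming rules above can be illustrated with a small classifier. This is only a sketch: the function name, the returned labels and the default protocol list ("HTTP" only) are assumptions introduced for the example, not part of the draft.

```python
# The seventeen request meta-variable names listed in the grammar above.
STANDARD_NAMES = frozenset({
    "AUTH_TYPE", "CONTENT_LENGTH", "CONTENT_TYPE", "GATEWAY_INTERFACE",
    "PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING", "REMOTE_ADDR",
    "REMOTE_HOST", "REMOTE_IDENT", "REMOTE_USER", "REQUEST_METHOD",
    "SCRIPT_NAME", "SERVER_NAME", "SERVER_PORT", "SERVER_PROTOCOL",
    "SERVER_SOFTWARE",
})


def classify_meta_variable(name, protocols=("HTTP",)):
    """Return a hypothetical label for a meta-variable name.

    Names are case-insensitive, so they are folded to the canonical
    capitals-plus-underscore form before comparison.
    """
    canonical = name.upper()
    if canonical in STANDARD_NAMES:
        return "standard"
    if any(canonical.startswith(prefix + "_") for prefix in protocols):
        return "protocol-specific"
    if canonical.startswith("X_"):
        return "extension (X_ prefix)"
    return "extension (unprefixed)"
```

For instance, `classify_meta_variable("http_accept")` and `classify_meta_variable("HTTP_ACCEPT")` give the same answer, reflecting that two variables cannot differ in case only.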
- - This specification does not distinguish between zero-length (NULL) - values and missing values. For example, a script cannot distinguish - between the two requests http://host/script and http://host/script? - as in both cases the QUERY_STRING meta-variable would be NULL. - - meta-variable-value = "" | 1* - - An optional meta-variable may be omitted (left unset) if its value is - NULL. Meta-variable values MUST be considered case-sensitive except - as noted otherwise. The representation of the characters in the - meta-variables is system defined; the server MUST convert values to - that representation. - -4.1.1 AUTH_TYPE - - The AUTH_TYPE variable identifies any mechanism used by the server to - authenticate the user. It contains a case-insensitive value defined - by the client protocol or server implementation. - - For HTTP, if the client request required authentication for external - access, then the server MUST set the value of this variable from the - 'auth-scheme' token in the request Authorization header field. - - AUTH_TYPE = "" | auth-scheme - auth-scheme = "Basic" | "Digest" | extension-auth - extension-auth = token - - HTTP access authentication schemes are described in RFC 2617 [9]. - -4.1.2 CONTENT_LENGTH - - The CONTENT_LENGTH variable contains the size of the message-body - attached to the request, if any, in decimal number of octets. If no - - - -Robinson & Coar Expires 18 April 2004 [Page 11] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - data is attached, then NULL (or unset). - - CONTENT_LENGTH = "" | 1*digit - - The server MUST set this meta-variable if and only if the request is - accompanied by a message-body entity. The CONTENT_LENGTH value must - reflect the length of the message-body after the server has removed - any transfer-codings or content-codings. - -4.1.3 CONTENT_TYPE - - If the request includes a message-body, the CONTENT_TYPE variable is - set to the Internet Media Type [10] of the message-body. 
- - CONTENT_TYPE = "" | media-type - media-type = type "/" subtype *( ";" parameter ) - type = token - subtype = token - parameter = attribute "=" value - attribute = token - value = token | quoted-string - - The type, subtype and parameter attribute names are not case- - sensitive. Parameter values may be case sensitive. Media types and - their use in HTTP are described in section 3.7 of the HTTP/1.1 - specification [8]. - - There is no default value for this variable. If and only if it is - unset, then the script MAY attempt to determine the media type from - the data received. If the type remains unknown, then the script MAY - choose to assume a type of application/octet-stream or it may reject - the request with an error (as described in section 6.3.3). - - Each media-type defines a set of optional and mandatory parameters. - This may include a charset parameter with a case-insensitive value - defining the coded character set for the message-body. If the - charset parameter is omitted, then the default value should be - derived according to whichever of the following rules is the first to - apply: - - 1. There MAY be a system-defined default charset for some - media-types. - - 2. The default for media-types of type "text" is ISO-8859-1 [8]. - - 3. Any default defined in the media-type specification. - - 4. The default is US-ASCII. - - - -Robinson & Coar Expires 18 April 2004 [Page 12] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - The server MUST set this meta-variable if an HTTP Content-Type field - is present in the client request header. If the server receives a - request with an attached entity but no Content-Type header field, it - MAY attempt to determine the correct content type, otherwise it - should omit this meta-variable. - -4.1.4 GATEWAY_INTERFACE - - The GATEWAY_INTERFACE variable MUST be set to the dialect of CGI - being used by the server to communicate with the script. Syntax: - - GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 
1*digit - - Note that the major and minor numbers are treated as separate - integers and hence each may be incremented higher than a single - digit. Thus CGI/2.4 is a lower version than CGI/2.13 which in turn - is lower than CGI/12.3. Leading zeros MUST be ignored by the script - and MUST NOT be generated by the server. - - This document defines the 1.1 version of the CGI interface. - -4.1.5 PATH_INFO - - The PATH_INFO variable specifies a path to be interpreted by the CGI - script. It identifies the resource or sub-resource to be returned by - the CGI script, and is derived from the portion of the URI path - hierarchy following the part that identifies the script itself. - Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot - contain path-segment parameters. A PATH_INFO of "/" represents a - single void path segment. - - PATH_INFO = "" | ( "/" path ) - path = lsegment *( "/" lsegment ) - lsegment = *lchar - lchar = - - The value is considered case-sensitive and the server MUST preserve - the case of the path as presented in the request URI. The server MAY - impose restrictions and limitations on what values it permits for - PATH_INFO, and MAY reject the request with an error if it encounters - any values considered objectionable. That MAY include any requests - that would result in an encoded "/" being decoded into PATH_INFO, as - this might represent a loss of information to the script. Similarly, - treatment of non US-ASCII characters in the path is system defined. - - URL-encoded, the PATH_INFO string forms the extra-path component of - the Script-URI (see section 3.3) which follows the SCRIPT_NAME part - of that path. 
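The version ordering described for GATEWAY_INTERFACE in section 4.1.4 (major and minor compared as separate integers, leading zeros ignored) can be sketched in Python as follows. The function name is hypothetical, and raising ValueError on malformed input is a choice made for this example rather than anything the draft requires.

```python
import re

# GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit
_VERSION = re.compile(r"CGI/([0-9]+)\.([0-9]+)")


def parse_gateway_interface(value):
    """Return (major, minor) as integers for a GATEWAY_INTERFACE value."""
    match = _VERSION.fullmatch(value)
    if match is None:
        raise ValueError("not a CGI version string: %r" % (value,))
    # int() discards leading zeros, which the script MUST ignore.
    return int(match.group(1)), int(match.group(2))
```

Because the result is a tuple of integers, ordinary tuple comparison gives the ordering in the text: `parse_gateway_interface("CGI/2.4") < parse_gateway_interface("CGI/2.13")` holds, whereas a plain string or decimal comparison would get it wrong.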
- - - -Robinson & Coar Expires 18 April 2004 [Page 13] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -4.1.6 PATH_TRANSLATED - - The PATH_TRANSLATED variable is derived by taking the PATH_INFO - value, parsing it as a local URI in its own right, and performing any - virtual-to-physical translation appropriate to map it onto the - server's document repository structure. The set of characters - permitted in the result is system defined. - - PATH_TRANSLATED = * - - This is the file location that would be accessed by a request for - - "://" ":" - - where is the scheme for the original client request and - is a URL-encoded version of PATH_INFO, with ";", "=" and - "?" reserved. For example, a request such as the following: - - http://somehost.com/cgi-bin/somescript/this%2eis%2ethe%2epath%3binfo - - would result in a PATH_INFO value of - - /this.is.the.path;info - - An internal URI is constructed from the scheme, server location and - the URL-encoded PATH_INFO: - - http://somehost.com/this.is.the.path%3binfo - - This would then be translated to a location in the server's document - repository, perhaps a filesystem path something like this: - - /usr/local/www/htdocs/this.is.the.path;info - - The result of the translation is the value of PATH_TRANSLATED. - - The value of PATH_TRANSLATED is derived in this way irrespective of - whether it maps to a valid repository location. The server MUST - preserve the case of the extra-path segment unless the underlying - repository supports case-insensitive names. If the repository is - only case-aware, case-preserving, or case-blind with regard to - document names, the server is not required to preserve the case of - the original segment through the translation. - - The translation algorithm the server uses to derive PATH_TRANSLATED - is implementation defined; CGI scripts which use this variable may - suffer limited portability. 
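Since the translation algorithm is implementation defined, the following Python sketch shows only one possible derivation, modelled on the worked example above. The function name and the `document_root` parameter are assumptions for illustration; a real server would apply its own mapping and would also have to guard against ".." segments, which this sketch does not do.

```python
import posixpath
from urllib.parse import quote, unquote


def path_translated(path_info, document_root):
    """Derive a PATH_TRANSLATED value from PATH_INFO (illustrative only)."""
    if not path_info:
        return ""  # a NULL PATH_INFO gives a NULL PATH_TRANSLATED
    # Step 1: URL-encode PATH_INFO to form the path of the internal URI.
    # quote() escapes ";", "=" and "?" (and some other characters too).
    internal_path = quote(path_info, safe="/")
    # Step 2: resolve the internal URI as the server would: decode it and
    # map it under the document repository root.
    decoded = unquote(internal_path)
    return posixpath.join(document_root, decoded.lstrip("/"))
```

With the draft's example values, `path_translated("/this.is.the.path;info", "/usr/local/www/htdocs")` gives the filesystem path shown above.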
- - - - -Robinson & Coar Expires 18 April 2004 [Page 14] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - The server SHOULD set this meta-variable if the request URI includes - a path-info component. If PATH_INFO is NULL, then the - PATH_TRANSLATED variable MUST be set to NULL (or unset). - -4.1.7 QUERY_STRING - - The QUERY_STRING variable contains a URL-encoded search or parameter - string; it provides information to the CGI script to affect or refine - the document to be returned by the script. - - The URL syntax for a search string is described in section 3 of RFC - 2396 [3]. The QUERY_STRING value is case-sensitive. - - QUERY_STRING = query-string - query-string = *uric - uric = reserved | unreserved | escaped - - When parsing and decoding the query string, the details of the - parsing, reserved characters and support for non US-ASCII characters - depends on the context. For example, form submission from an HTML - document [15] uses application/x-www-form-urlencoded encoding, in - which the characters "+", "&" and "=" are reserved, and the ISO - 8859-1 encoding may be used for non US-ASCII characters. - - The QUERY_STRING value provides the query-string part of the - Script-URI. (See section 3.3). - - The server MUST set this variable; if the Script-URI does not include - a query component, the QUERY_STRING MUST be defined as an empty - string (""). - -4.1.8 REMOTE_ADDR - - The REMOTE_ADDR variable MUST be set to the network address of the - client sending the request to the server. - - REMOTE_ADDR = hostnumber - hostnumber = ipv4-address | ipv6-address - ipv4-address = 1*3digit "." 1*3digit "." 1*3digit "." 1*3digit - ipv6-address = hexpart [ ":" ipv4-address ] - hexpart = hexseq | ( [ hexseq ] "::" [ hexseq ] ) - hexseq = 1*4hex *( ":" 1*4hex ) - - The format of IPv6 addresses is defined in RFC 2373 [12]. 
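A script that wants to check a REMOTE_ADDR value could lean on Python's standard `ipaddress` module, as in the sketch below. Note the assumption involved: the module is stricter than the grammar above (for example, it rejects an IPv4 component larger than 255, which `1*3digit` would allow), so this is an approximation of the rule rather than a transcription of it. The function name is hypothetical.

```python
import ipaddress


def remote_addr_family(value):
    """Return "ipv4" or "ipv6" for a REMOTE_ADDR value.

    Raises ValueError if the value is not a numeric network address,
    for instance when a hostname was supplied instead.
    """
    address = ipaddress.ip_address(value)
    return "ipv4" if address.version == 4 else "ipv6"
```

An IPv6 address with an embedded IPv4 tail, such as `::ffff:192.0.2.1`, is reported as "ipv6", matching the `hexpart [ ":" ipv4-address ]` form.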
- - - - - - - -Robinson & Coar Expires 18 April 2004 [Page 15] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -4.1.9 REMOTE_HOST - - The REMOTE_HOST variable contains the fully qualified domain name of - the client sending the request to the server, if available, otherwise - NULL. Fully qualified domain names take the form as described in - section 3.5 of RFC 1034 [14] and section 2.1 of RFC 1123 [4]. Domain - names are not case sensitive. - - REMOTE_HOST = "" | hostname | hostnumber - hostname = *( domainlabel "." ) toplabel [ "." ] - domainlabel = alphanum [ *alphahypdigit alphanum ] - toplabel = alpha [ *alphahypdigit alphanum ] - alphahypdigit = alphanum | "-" - - The server SHOULD set this variable. If the hostname is not - available for performance reasons or otherwise, the server MAY - substitute the REMOTE_ADDR value. - -4.1.10 REMOTE_IDENT - - The REMOTE_IDENT variable MAY be used to provide identity information - reported about the connection by an RFC 1413 [17] request to the - remote agent, if available. The server may choose not to support - this feature, or not to request the data for efficiency reasons, or - not to return available identity data. - - REMOTE_IDENT = *TEXT - - The data returned may be used for authentication purposes, but the - level of trust reposed in it should be minimal. - -4.1.11 REMOTE_USER - - The REMOTE_USER variable provides a user identification string - supplied by the client as part of user authentication. - - REMOTE_USER = *TEXT - - If the client request required HTTP Authentication [9] (e.g. the - AUTH_TYPE meta-variable is set to "Basic" or "Digest"), then the - value of the REMOTE_USER meta-variable MUST be set to the user-ID - supplied. - -4.1.12 REQUEST_METHOD - - The REQUEST_METHOD meta-variable MUST be set to the method which - should be used by the script to process the request, as described in - section 4.3.
- - - -Robinson & Coar Expires 18 April 2004 [Page 16] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - REQUEST_METHOD = method - method = "GET" | "POST" | "HEAD" | extension-method - extension-method = "PUT" | "DELETE" | token - - The method is case sensitive. The HTTP methods are described in - section 5.1.1 of the HTTP/1.0 specification [2] and section 5.1.1 of - the HTTP/1.1 specification [8]. - -4.1.13 SCRIPT_NAME - - The SCRIPT_NAME variable MUST be set to a URI path (not URL-encoded) - which could identify the CGI script (rather than the script's - output). The syntax is the same as for PATH_INFO (section 4.1.5). - - SCRIPT_NAME = "" | ( "/" path ) - - The leading "/" is not part of the path. It is optional if the path - is NULL; however, the variable MUST still be set in that case. - - The SCRIPT_NAME string forms some leading part of the path component - of the Script-URI derived in some implementation defined manner. No - PATH_INFO segment (see section 4.1.5) is included in the SCRIPT_NAME - value. - -4.1.14 SERVER_NAME - - The SERVER_NAME variable MUST be set to the name of the server host - to which the client request is directed. It is a case-insensitive - hostname or network address. It forms the host part of the - Script-URI. The syntax for an IPv6 address in a URI is defined in - RFC 2373 [12]. - - SERVER_NAME = server-name - server-name = hostname | ipv4-address | ( "[" ipv6-address "]" ) - - A deployed server can have more than one possible value for this - variable, where several HTTP virtual hosts share the same IP address. - In that case, the server uses the contents of the Host header field - to select the correct virtual host. - -4.1.15 SERVER_PORT - - The SERVER_PORT variable MUST be set to the TCP/IP port number on - which this request is received from the client. This value is used - in the port part of the Script-URI.
- - SERVER_PORT = server-port - server-port = 1*digit - - - -Robinson & Coar Expires 18 April 2004 [Page 17] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - Note that this variable MUST be set, even if the port is the default - port for the scheme and could otherwise be omitted from a URI. - -4.1.16 SERVER_PROTOCOL - - The SERVER_PROTOCOL variable MUST be set to the name and version of - the application protocol used for this CGI request. This is not - necessarily the same as the protocol version used by the server in - its communication with the client. - - SERVER_PROTOCOL = HTTP-Version | "INCLUDED" | extension-version - HTTP-Version = "HTTP" "/" 1*digit "." 1*digit - extension-version = protocol [ "/" 1*digit "." 1*digit ] - protocol = token - - 'protocol' is a version of the scheme part of the Script-URI, and is - not case sensitive. By convention, 'protocol' is in upper case. The - protocol may not be identical to the scheme of the request; for - example, the request may have scheme "https", whilst the protocol is - "HTTP". - - A well-known value for SERVER_PROTOCOL which the server MAY use is - "INCLUDED", which signals that the current document is being included - as part of a composite document, rather than being the direct target - of the client request. The script should treat this as an HTTP/1.0 - request. - -4.1.17 SERVER_SOFTWARE - - The SERVER_SOFTWARE meta-variable MUST be set to the name and version - of the information server software making the CGI request (and - running the gateway). It SHOULD be the same as the server - description reported to the client, if any. - - SERVER_SOFTWARE = 1*( product | comment ) - product = token [ "/" product-version ] - product-version = token - comment = "(" *( ctext | comment ) ")" - ctext = - -4.1.18 Protocol-Specific Meta-Variables - - The server SHOULD set meta-variables specific to the protocol and - scheme for the request. 
Interpretation of protocol-specific - variables depends on the protocol version in SERVER_PROTOCOL. The - server MAY set a meta-variable with the name of the scheme to a - non-NULL value if the scheme is not the same as the protocol. The - presence of such a variable indicates to a script which scheme is - - - -Robinson & Coar Expires 18 April 2004 [Page 18] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - used by the request. - - Meta-variables with names beginning with "HTTP_" contain values read - from the client request header fields, if the protocol used is HTTP. - The HTTP header field name is converted to upper case, has all - occurrences of "-" replaced with "_" and has "HTTP_" prepended to - give the meta-variable name. The header data can be presented as - sent by the client, or can be rewritten in ways which do not change - its semantics. If multiple header fields with the same field-name - are received then the server MUST rewrite them as a single value - having the same semantics. Similarly, a header field that spans - multiple lines must be merged onto a single line. The server MUST, - if necessary, change the representation of the data (for example, the - character set) to be appropriate for a CGI meta-variable. - - The server is not required to create meta-variables for all the - header fields that it receives. In particular, it SHOULD remove any - header fields carrying authentication information, such as - 'Authorization'; or that are available to the script in other - variables, such as 'Content-Length' and 'Content-Type'. The server - MAY remove header fields that relate solely to client-side - communication issues, such as 'Connection'. - -4.2 Request Message-Body - - Request data is accessed by the script in a system-defined method; - unless defined otherwise, this will be by reading the 'standard - input' file descriptor or file handle. 
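As a rough sketch of reading request data from standard input, the helper below reads at most CONTENT_LENGTH bytes, in line with the rule in section 4.2 that a script must not read past that count. The function name read_request_body and its environ and stream parameters are assumptions made for illustration and testing; a real script would normally just use os.environ and the binary standard input.

```python
import io
import os
import sys


def read_request_body(environ=None, stream=None):
    """Read the request body, never consuming more than CONTENT_LENGTH bytes."""
    environ = os.environ if environ is None else environ
    stream = sys.stdin.buffer if stream is None else stream

    raw_length = environ.get("CONTENT_LENGTH", "")
    if not raw_length:
        # CONTENT_LENGTH is NULL: no request-body was supplied.
        return b""

    length = int(raw_length)  # raises ValueError on a malformed value
    if length < 0:
        raise ValueError("CONTENT_LENGTH must not be negative")

    # Any extension data after the body is deliberately left unread.
    return stream.read(length)


body = read_request_body({"CONTENT_LENGTH": "5"}, io.BytesIO(b"helloEXTRA"))
empty = read_request_body({}, io.BytesIO(b"ignored"))
```

Passing the environment and stream explicitly keeps the helper easy to exercise without a web server; section 9.6 additionally suggests bounding the accepted length before reading.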
- - Request-Data = [ request-body ] [ extension-data ] - request-body = OCTET - extension-data = *OCTET - - A request-body is supplied with the request if the CONTENT_LENGTH is - not NULL. The server MUST make at least that many bytes available - for the script to read. The server MAY signal an end-of-file - condition after CONTENT_LENGTH bytes have been read or it MAY supply - extension data. Therefore, the script MUST NOT attempt to read more - than CONTENT_LENGTH bytes, even if more data is available. However, - it is not obliged to read any of the data. - - For non-parsed header (NPH) scripts (section 5), the server SHOULD - attempt to ensure that the data supplied to the script is precisely - as supplied by the client and is unaltered by the server. - - As transfer-codings are not supported on the request-body, the server - MUST remove any such codings from the message-body, and recalculate - the CONTENT_LENGTH. If this is not possible (for example, because of - - - -Robinson & Coar Expires 18 April 2004 [Page 19] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - large buffering requirements), the server SHOULD reject the client - request. It MAY also remove content-codings from the message-body. - -4.3 Request Methods - - The Request Method, as supplied in the REQUEST_METHOD meta-variable, - identifies the processing method to be applied by the script in - producing a response. The script author can choose to implement the - methods most appropriate for the particular application. If the - script receives a request with a method it does not support it SHOULD - reject it with an error (see section 6.3.3). - -4.3.1 GET - - The GET method indicates that the script should produce a - document based on the meta-variable values. By convention, the GET - method is 'safe' and 'idempotent' and SHOULD NOT have the - significance of taking an action other than producing a document.
- - The meaning of the GET method may be modified and refined by - protocol-specific meta-variables. - -4.3.2 POST - - The POST method is used to request the script perform processing and - produce a document based on the data in the request message-body, in - addition to meta-variable values. A common use is form submission in - HTML [15], intended to initiate processing by the script that has a - permanent effect, such as a change in a database. - - The script MUST check the value of the CONTENT_LENGTH variable before - reading the attached message-body, and SHOULD check the CONTENT_TYPE - value before processing it. - -4.3.3 HEAD - - The HEAD method requests the script to do sufficient processing to - return the response header fields, without providing a response - message-body. The script MUST NOT provide a response message-body - for a HEAD request. If it does, then the server MUST discard the - message-body when reading the response. - -4.3.4 Protocol-Specific Methods - - The script MAY implement any protocol-specific method, such as - HTTP/1.1 PUT and DELETE; it SHOULD check the value of SERVER_PROTOCOL - when doing so. - - - - -Robinson & Coar Expires 18 April 2004 [Page 20] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - The server MAY decide that some methods are not appropriate or - permitted for a script, and may handle the methods itself or return - an error to the client. - -4.4 The Script Command Line - - Some systems support a method for supplying an array of strings to - the CGI script. This is only used in the case of an 'indexed' HTTP - query, which is identified by a 'GET' or 'HEAD' request with a URI - query string that does not contain any unencoded "=" characters.
For - such a request, the server SHOULD treat the query-string as a - search-string and parse it into words, using the rules - - search-string = search-word *( "+" search-word ) - search-word = 1*schar - schar = unreserved | escaped | xreserved - xreserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "," | - "$" - - After parsing, each search-word is URL-decoded, optionally encoded in - a system defined manner and then added to the argument list. - - If the server cannot create any part of the argument list, then the - server MUST NOT generate any command line information. For example, - the number of arguments may be greater than operating system or - server limits, or one of the words may not be representable as an - argument. - - The script SHOULD check to see if the QUERY_STRING value contains an - unencoded "=" character, and SHOULD NOT use the command line - arguments if it does. - -5 NPH Scripts - -5.1 Identification - - The server MAY support NPH (Non-Parsed Header) scripts; these are - scripts to which the server passes all responsibility for response - processing. - - This specification provides no mechanism for an NPH script to be - identified on the basis of its output data alone. By convention, - therefore, any particular script can only ever provide output of one - type (NPH or CGI) and hence the script itself is described as an 'NPH - script'. A server with NPH support MUST provide an implementation- - defined mechanism for identifying NPH scripts, perhaps based on the - name or location of the script. - - - - -Robinson & Coar Expires 18 April 2004 [Page 21] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -5.2 NPH Response - - There MUST be a system defined method for the script to send data - back to the server or client; a script MUST always return some data. - Unless defined otherwise, this will be the same as for conventional - CGI scripts. - - Currently, NPH scripts are only defined for HTTP client requests. 
An - (HTTP) NPH script MUST return a complete HTTP response message, - currently described in section 6 of the HTTP specifications [2], [8]. - The script MUST use the SERVER_PROTOCOL variable to determine the - appropriate format for a response. It MUST also take account of any - generic or protocol-specific meta-variables in the request as might - be mandated by the particular protocol specification. - - The server MUST ensure that the script output is sent to the client - unmodified. Note that this requires the script to use the correct - character set (US-ASCII [20] and ISO 8859-1 [21] for HTTP) in the - header fields. The server SHOULD attempt to ensure that the script - output is sent directly to the client, with minimal internal and no - transport-visible buffering. - - Unless the implementation defines otherwise, the script MUST NOT - indicate in its response that the client can send further requests - over the same connection. - -6 CGI Response - -6.1 Response Handling - - A script MUST always provide a non-empty response, and so there is a - system defined method for it to send this data back to the server. - Unless defined otherwise, this will be via the 'standard output' file - descriptor. - - The script MUST check the REQUEST_METHOD variable when processing the - request and preparing its response. - - The server MAY implement a timeout period within which data must be - received from the script. If a server implementation defines such a - timeout and receives no data from a script within the timeout period, - the server MAY terminate the script process. - -6.2 Response Types - - The response comprises a message-header and a message-body, separated - by a blank line. The message-header contains one ore more header - fields. The body may be NULL. 
- - - -Robinson & Coar Expires 18 April 2004 [Page 22] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - generic-response = 1*header-field NL [ response-body ] - - The script MUST return one of either a document response, a local - redirect response or a client redirect (with optional document) - response. In the response definitions below, the order of header - fields in a response is not significant (despite appearing so in the - BNF). The header fields are defined in section 6.3. - - CGI-Response = document-response | local-redir-response | - client-redir-response | client-redirdoc-response - -6.2.1 Document Response - - The CGI script can return a document to the user in a document - response, with an optional error code indicating the success status - of the response. - - document-response = Content-Type [ Status ] *other-field NL - response-body - - The script MUST return a Content-Type header field. A Status header - field is optional, and status 200 'OK' is assumed if it is omitted. - The server MUST make any appropriate modifications to the script's - output to ensure that the response to the client complies with the - response protocol version. - -6.2.2 Local Redirect Response - - The CGI script can return a URI path and query-string - ('local-pathquery') for a local resource in a Location header field. - This indicates to the server that it should reprocess the request - using the path specified. - - local-redir-response = local-Location NL - - The script MUST NOT return any other header fields or a message-body, - and the server MUST generate the response that it would have produced - in response to a request containing the URL - - scheme "://" server-name ":" server-port local-pathquery - -6.2.3 Client Redirect Response - - The CGI script can return an absolute URI path in a Location header - field, to indicate to the client that it should reprocess the request - using the URI specified. 
- - client-redir-response = client-Location *extension-field NL - - - -Robinson & Coar Expires 18 April 2004 [Page 23] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - The script MUST NOT provide any other header fields, except for - server-defined CGI extension fields. For an HTTP client request, the - server MUST generate a 302 'Found' HTTP response message. - -6.2.4 Client Redirect Response with Document - - The CGI script can return an absolute URI path in a Location header - field together with an attached document, to indicate to the client - that it should reprocess the request using the URI specified. - - client-redirdoc-response = client-Location Status Content-Type - *other-field NL response-body - - The Status header field MUST be supplied and MUST contain a status - value of 302 'Found'. The server MUST make any appropriate - modifications to the script's output to ensure that the response to - the client complies with the response protocol version. - -6.3 Response Header Fields - - The response header fields are either CGI or extension header fields - to be interpreted by the server, or protocol-specific headers to be - included in the response returned to the client. At least one CGI - field MUST be supplied; each CGI field MUST NOT appear more than once - in the response. The response header fields have the syntax: - - header-field = CGI-field | other-field - CGI-field = Content-Type | Location | Status - other-field = protocol-field | extension-field - protocol-field = generic-field - extension-field = generic-field - generic-field = field-name ":" [ field-value ] NL - field-name = token - field-value = *( field-content | LWSP ) - field-content = *( token | separator | quoted-string ) - - The field-name is not case sensitive. A NULL field value is - equivalent to a field not being sent. Note that each header field in - a CGI-Response MUST be specified on a single line; CGI/1.1 does not - support continuation lines.
Whitespace is permitted between the ":" - and the field-value (but not between the field-name and the ":"), and - also between tokens in the field-value. - -6.3.1 Content-Type - - The Content-Type response field sets the Internet Media Type [10] of - the entity body. - - - - -Robinson & Coar Expires 18 April 2004 [Page 24] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - Content-Type = "Content-Type:" media-type NL - - If an entity body is returned, the script MUST supply a Content-Type - field in the response. If it fails to do so, the server SHOULD NOT - attempt to determine the correct content type. The value SHOULD be - sent unmodified to the client, except for any charset parameter - changes. - - Unless it is otherwise system-defined, the default charset assumed by - the client for text media-types is ISO-8859-1 if the protocol is HTTP - and US-ASCII otherwise. Hence the script SHOULD include a charset - parameter. See section 3.4.1 of the HTTP/1.1 specification [8] for a - discussion of this issue. - -6.3.2 Location - - The Location header field is used to specify to the server that the - script is returning a reference to a document rather than an actual - document. It is either an absolute URI (with fragment), indicating - that the client is to fetch the referenced document, or a local URI - path (with query string), indicating that the server is to fetch the - referenced document. - - Location = local-Location | client-Location - client-Location = "Location:" fragment-URI NL - local-Location = "Location:" local-pathquery NL - fragment-URI = absoluteURI [ "#" fragment ] - fragment = *uric - local-pathquery = abs-path [ "?" 
query-string ] - abs-path = "/" path-segments - path-segments = segment *( "/" segment ) - segment = *pchar - pchar = unreserved | escaped | extra - extra = ":" | "@" | "&" | "=" | "+" | "$" | "," - - The syntax of an absoluteURI is incorporated into this document from - that specified in RFC 2396 [3] and RFC 2732 [11]. A valid - absoluteURI always starts with the name of scheme followed by ":"; - scheme names start with a letter and continue with alphanumerics, - "+", "-" or ".". The local URI path and query must be an absolute - path, and not a relative path or NULL, and hence must start with a - "/". - - Note that any message-body attached to the request (such as for a - POST request) may not be available to the resource that is the target - of the redirect. - - - - - -Robinson & Coar Expires 18 April 2004 [Page 25] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -6.3.3 Status - - The Status header field contains a 3-digit integer result code that - indicates the level of success of the script's attempt to handle the - request. - - Status = "Status:" status-code SP reason-phrase NL - status-code = "200" | "302" | "400" | "501" | 3digit - reason-phrase = *TEXT - - Status code 200 'OK' indicates success, and is the default value - assumed for a document response. Status code 302 'Found' is used - with a Location header field and response message-body. Status code - 400 'Bad Request' may be used for an unknown request format, such as - a missing CONTENT_TYPE. Status code 501 'Not Implemented' may be - returned by a script if it receives an unsupported REQUEST_METHOD. - - Other valid status codes are listed in section 6.1.1 of the HTTP - specifications [2], [8], and also the IANA HTTP Status Code Registry - [18], and can be used in addition to or instead of the ones listed - above. The script SHOULD check the value of SERVER_PROTOCOL before - using HTTP/1.1 status codes. 
The script MAY reject with error 405 - 'Method Not Allowed' HTTP/1.1 requests made using a method it does - not support. - - Note that returning an error status code does not have to mean an - error condition with the script itself. For example, a script that - is invoked as an error handler by the server should return the code - appropriate to the server's error condition. - - The reason-phrase is a textual description of the error to be - returned to the client for human consumption. - -6.3.4 Protocol-Specific Header Fields - - The script MAY return any other header fields that relate to the - response message defined by the specification for the SERVER_PROTOCOL - (HTTP/1.0 [2] or HTTP/1.1 [8]). The server MUST translate the header - data from the CGI header syntax to the HTTP header syntax if these - differ. For example, the character sequence for newline (such as - UNIX's US-ASCII LF) used by CGI scripts may not be the same as that - used by HTTP (US-ASCII CR followed by LF). - - The script MUST NOT return any header fields that relate to - client-side communication issues and could affect the server's - ability to send the response to the client. The server MAY remove - any such header fields returned by the client. It SHOULD resolve any - conflicts between headers returned by the script and headers that it - - - -Robinson & Coar Expires 18 April 2004 [Page 26] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - would otherwise send itself. - -6.3.5 Extension Header Fields - - The server may define additional implementation-specific CGI header - fields, whose field names SHOULD begin with "X-CGI-". It MAY ignore - (and delete) any unrecognised header fields with names beginning - "X-CGI-". - -6.4 Response Message-Body - - The response message-body is an attached document to be returned to - the client by the server. 
The server MUST read all the data provided - by the script, until the script signals the end of the message-body - by way of an end-of-file condition. The message-body SHOULD be sent - unmodified to the client, except for HEAD requests or any required - transfer-codings, content-codings or charset conversions. - - response-body = *OCTET - -7 System Specifications - -7.1 AmigaDOS - - Meta-Variables - Meta-variables are passed to the script in identically named - environment variables. These are accessed by the DOS library - routine GetVar(). The flags argument SHOULD be 0. Case is - ignored, but upper case is recommended for compatibility with - case-sensitive systems. - - The current working directory - The current working directory for the script is set to the - directory containing the script. - - Character set - The US-ASCII character set [20] is used for the definition of - meta-variables, header fields and values; the newline (NL) - sequence is LF; servers SHOULD also accept CR LF as a newline. - -7.2 UNIX - - For UNIX compatible operating systems, the following are defined: - - Meta-Variables - Meta-variables are passed to the script in identically named - environment variables. These are accessed by the C library - routine getenv() or variable environ. - - - -Robinson & Coar Expires 18 April 2004 [Page 27] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - The command line - This is accessed using the argc and argv arguments to main(). - The words have any characters which are 'active' in the Bourne - shell escaped with a backslash. - - The current working directory - The current working directory for the script SHOULD be set to the - directory containing the script. - - Character set - The US-ASCII character set [20], excluding NUL, is used for the - definition of meta-variables, header fields and CHAR values; TEXT - values use ISO-8859-1. The PATH_TRANSLATED value can contain any - 8-bit byte except NUL.
The newline (NL) sequence is LF; servers - should also accept CR LF as a newline. - -7.3 EBCDIC/POSIX - - For POSIX compatible operating systems using the EBCDIC character - set, the following are defined: - - Meta-Variables - Meta-variables are passed to the script in identically named - environment variables. These are accessed by the C library - routine getenv(). - - The command line - This is accessed using the argc and argv arguments to main(). - The words have any characters which are 'active' in the Bourne - shell escaped with a backslash. - - The current working directory - The current working directory for the script SHOULD be set to the - directory containing the script. - - Character set - The IBM1047 character set [19], excluding NUL, is used for the - definition of meta-variables, header fields, values, TEXT strings - and the PATH_TRANSLATED value. The newline (NL) sequence is LF; - servers should also accept CR LF as a newline. - - media-type charset default - The default charset value for text (and other - implementation-defined) media types is IBM1047. - - - - - - - -Robinson & Coar Expires 18 April 2004 [Page 28] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -8 Implementation - -8.1 Recommendations for Servers - - Although the server and the CGI script need not be consistent in - their handling of URL paths (client URLs and the PATH_INFO data, - respectively), server authors may wish to impose consistency. So the - server implementation should specify its behaviour for the following - cases: - - 1. define any restrictions on allowed path segments, in particular - whether non-terminal NULL segments are permitted; - - 2. define the behaviour for "." or ".." path segments; i.e. - whether they are prohibited, treated as ordinary path segments - or interpreted in accordance with the relative URL - specification [3]; - - 3.
define any limits of the implementation, including limits on - path or search string lengths, and limits on the volume of - header fields the server will parse. - -8.2 Recommendations for Scripts - - If the script does not intend processing the PATH_INFO data, then it - should reject the request with 404 Not Found if PATH_INFO is not - NULL. - - If the output of a form is being processed, check that CONTENT_TYPE - is "application/x-www-form-urlencoded" [15] or "multipart/form-data" - [13]. If CONTENT_TYPE is blank, the script can reject the request - with a 415 'Unsupported Media Type' error, where supported by the - protocol. - - When parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME the script - should be careful of void path segments ("//") and special path - segments ("." and ".."). They should either be removed from the path - before use in OS system calls, or the request should be rejected with - 404 'Not Found'. - - When returning header fields, the script should try to send the CGI - headers as soon as possible, and should send them before any HTTP - headers. This may help reduce the server's memory requirements. - - - - - - - - -Robinson & Coar Expires 18 April 2004 [Page 29] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - -9 Security Considerations - -9.1 Safe Methods - - As discussed in the security considerations of the HTTP - specifications [2], [8], the convention has been established that the - GET and HEAD methods should be 'safe' and 'idempotent' (repeated - requests have the same effect as a single request). See section 9.1 - of RFC 2616 [8] for a full discussion. - -9.2 Header Fields Containing Sensitive Information - - Some HTTP header fields may carry sensitive information which the - server should not pass on to the script unless explicitly configured - to do so. 
For example, if the server protects the script using the - Basic authentication scheme, then the client will send an - Authorization header field containing a username and password. The - server validates this information and so it should not pass on the - password via the HTTP_AUTHORIZATION meta-variable without careful - consideration. This also applies to the Proxy-Authorization header - field and the corresponding HTTP_PROXY_AUTHORIZATION meta-variable. - -9.3 Data Privacy - - Confidential data in a request should be placed in a message-body as - part of a POST request, and not placed in the URI or message headers. - On some systems, the environment used to pass meta-variables to a - script may be visible to other scripts or users. In addition, many - existing servers, proxies and clients will permanently record the URI - where it might be visible to third parties. - -9.4 Information Security Model - - For a client connection using TLS, the security model applies between - the client and the server, and not between the client and the script. - It is the server's responsibility to handle the TLS session, and thus - it is the server which is authenticated to the client, not the CGI - script. - - This specification provides no mechanism for the script to - authenticate the server which invoked it. There is no enforced - integrity on the CGI request and response messages. - -9.5 Script Interference with the Server - - The most common implementation of CGI invokes the script as a child - process using the same user and group as the server process. It - should therefore be ensured that the script cannot interfere with the - - - -Robinson & Coar Expires 18 April 2004 [Page 30] - -INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 - - - server process, its configuration, documents or log files. 
- - If the script is executed by calling a function linked in to the - server software (either at compile-time or run-time) then precautions - should be taken to protect the core memory of the server, or to - ensure that untrusted code cannot be executed. - -9.6 Data Length and Buffering Considerations - - This specification places no limits on the length of the message-body - presented to the script. The script should not assume that - statically allocated buffers of any size are sufficient to contain - the entire submission at one time. Use of a fixed length buffer - without careful overflow checking may result in an attacker - exploiting 'stack-smashing' or 'stack-overflow' vulnerabilities of - the operating system. The script may spool large submissions to disk - or other buffering media, but a rapid succession of large submissions - may result in denial of service conditions. If the CONTENT_LENGTH of - a message-body is larger than resource considerations allow, scripts - should respond with an error status appropriate for the protocol - version; potentially applicable status codes include 503 'Service - Unavailable' (HTTP/1.0 and HTTP/1.1), 413 'Request Entity Too Large' - (HTTP/1.1), and 414 'Request-URI Too Large' (HTTP/1.1). - - Similar considerations apply to the server's handling of the CGI - response from the script. There is no limit on the length of the - header or message-body returned by the script; the server should not - assume that statically allocated buffers of any size are sufficient - to contain the entire response. - -9.7 Stateless Processing - - The stateless nature of the Web makes each script execution and - resource retrieval independent of all others even when multiple - requests constitute a single conceptual Web transaction. Because of - this, a script should not make any assumptions about the context of - the user-agent submitting a request. 
In particular, scripts should - examine data obtained from the client and verify that they are valid, - both in form and content, before allowing them to be used for - sensitive purposes such as input to other applications, commands, or - operating system services. These uses include (but are not limited - to) system call arguments, database writes, dynamically evaluated - source code, and input to billing or other secure processes. It is - important that applications be protected from invalid input - regardless of whether the invalidity is the result of user error, - logic error, or malicious action. - - Authors of scripts involved in multi-request transactions should be - particularly cautious about validating the state information; - undesirable effects may result from the substitution of dangerous - values for portions of the submission which might otherwise be - presumed safe. Subversion of this type occurs when alterations are - made to data from a prior stage of the transaction that were not - meant to be controlled by the client (e.g., hidden HTML form - elements, cookies, embedded URLs, etc.). - -9.8 Relative Paths - - The server should be careful of ".." path segments in the request - URI. These should be removed or resolved in the request URI before - it is split into the script-path and extra-path. Alternatively, when - the extra-path is used to find the PATH_TRANSLATED, care should be - taken to prevent the path resolution from providing translated paths - outside an expected path hierarchy. - -9.9 Non-parsed Header Output - - If a script returns non-parsed header output, to be interpreted by - the client in its native protocol, then the script must address all - security considerations relating to that protocol.
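The path precautions in sections 8.2 and 9.8 above can be illustrated with a minimal sketch. It is not part of the draft: the function names has_unsafe_segments and translate_within_root, and the example root "/srv/www", are hypothetical. The check is purely lexical and does not resolve symbolic links, so a real server would likely need further checks.

```python
import posixpath
from typing import Optional


def has_unsafe_segments(path: str) -> bool:
    """Return True if path has a void ("//") or special ("." or "..") segment.

    Hypothetical helper: a script could reject such a PATH_INFO with
    404 'Not Found' rather than pass it to OS system calls.
    """
    if "//" in path:
        return True
    return any(segment in (".", "..") for segment in path.split("/"))


def translate_within_root(doc_root: str, extra_path: str) -> Optional[str]:
    """Map extra_path under doc_root, or return None if it would escape.

    Lexical check only (an assumption of this sketch): ".." segments are
    collapsed with normpath, but symbolic links are not resolved.
    """
    root = posixpath.normpath(doc_root)
    candidate = posixpath.normpath(
        posixpath.join(root, extra_path.lstrip("/"))
    )
    prefix = root.rstrip("/") + "/"
    if candidate == root or candidate.startswith(prefix):
        return candidate
    return None
```

A caller might reject the request when has_unsafe_segments() returns True, or when translate_within_root() returns None, instead of trying to repair the path.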
- -10 Acknowledgements - - This work is based on the original CGI interface that arose out of - discussions on the 'www-talk' mailing list. In particular, Rob - McCool, John Franks, Ari Luotonen, George Phillips and Tony Sanders - deserve special recognition for their efforts in defining and - implementing the early versions of this interface. - - This document has also greatly benefited from the comments and - suggestions made by Chris Adie, Dave Kristol and Mike Meyer; also David - Morris, Jeremy Madea, Patrick McManus, Adam Donahue, Ross Patterson - and Harald Alvestrand. - -11 References - - [1] Berners-Lee, T., 'Universal Resource Identifiers in WWW: A - Unifying Syntax for the Expression of Names and Addresses of - Objects on the Network as used in the World-Wide Web', RFC 1630, - CERN, June 1994. - - [2] Berners-Lee, T., Fielding, R. T. and Frystyk, H., 'Hypertext - Transfer Protocol -- HTTP/1.0', RFC 1945, MIT/LCS, UC Irvine, - May 1996. - - [3] Berners-Lee, T., Fielding, R. and Masinter, L., 'Uniform - Resource Identifiers (URI): Generic Syntax', RFC 2396, MIT/LCS, - UC Irvine, Xerox Corporation, August 1998. - - [4] Braden, R. (Editor), 'Requirements for Internet Hosts -- - Application and Support', STD 3, RFC 1123, IETF, October 1989. - - [5] Bradner, S., 'Key words for use in RFCs to Indicate Requirement - Levels', BCP 14, RFC 2119, Harvard University, March 1997. - - [6] Crocker, D.H., 'Standard for the Format of ARPA Internet Text - Messages', STD 11, RFC 822, University of Delaware, August 1982. - - [7] Dierks, T. and Allen, C., 'The TLS Protocol Version 1.0', RFC - 2246, Certicom, January 1999. - - [8] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., - Leach, P.
and Berners-Lee, T., 'Hypertext Transfer Protocol -- - HTTP/1.1', RFC 2616, UC Irvine, Compaq/W3C, Compaq, W3C/MIT, - Xerox, Microsoft, W3C/MIT, June 1999. - - [9] Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S., - Leach, P., Luotonen, A. and Stewart, L., 'HTTP Authentication: - Basic and Digest Access Authentication', RFC 2617, Northwestern - University, Verisign Inc., AbiSource, Inc., Agranat Systems, - Inc., Microsoft Corporation, Netscape Communications - Corporation, Open Market, Inc., June 1999. - - [10] Freed, N. and Borenstein, N., 'Multipurpose Internet Mail - Extensions (MIME) Part Two: Media Types', RFC 2046, Innosoft, - First Virtual, November 1996. - - [11] Hinden, R., Carpenter, B. and Masinter, L., 'Format for Literal - IPv6 Addresses in URL's', RFC 2732, Nokia, IBM, AT&T, December - 1999. - - [12] Hinden, R. and Deering, S., 'IP Version 6 Addressing - Architecture', RFC 2373, Nokia, Cisco Systems, July 1998. - - [13] Masinter, L., 'Returning Values from Forms: - multipart/form-data', RFC 2388, Xerox Corporation, August 1998. - - [14] Mockapetris, P., 'Domain Names - Concepts and Facilities', STD - 13, RFC 1034, ISI, November 1987. - - [15] Raggett, D., Le Hors, A. and Jacobs, I. (eds), 'HTML 4.01 - Specification', W3C Recommendation December 1999, - http://www.w3.org/TR/html401/. - - [16] Rescorla, E., 'HTTP Over TLS', RFC 2818, RTFM, May 2000. - - [17] St. Johns, M., 'Identification Protocol', RFC 1413, US - Department of Defense, February 1993. - - [18] 'HTTP Status Code Registry', - http://www.iana.org/assignments/http-status-codes, IANA. - - [19] IBM National Language Support Reference Manual Volume 2, - SE09-8002-01, March 1990. - - [20] 'Information Systems -- Coded Character Sets -- 7-bit American - Standard Code for Information Interchange (7-Bit ASCII)', ANSI - INCITS.4-1986 (R2002).
- - [21] 'Information technology -- 8-bit single-byte coded graphic - character sets -- Part 1: Latin alphabet No. 1', ISO/IEC - 8859-1:1998. - - [22] 'The Common Gateway Interface', - http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, University of Illinois. - - -12 Authors' Addresses - - David Robinson - Apache Software Foundation - Email: drtr@apache.org - - Ken A. L. Coar - MeepZor Consulting - 7824 Mayfaire Crest Lane, Suite 202 - Raleigh, NC 27615-4875 - USA - Tel: +1 (919) 254 4237 - Fax: +1 (919) 254 5420 - Email: Ken.Coar@Golux.com