--------------------- PatchSet 5045 Date: 2007/07/11 05:32:06 Author: amosjeffries Branch: docs Tag: (none) Log: Convert prog-guide.sgml to Doxygen Recipe files. TODO - pull API and per-function documentation down into the actual source files linking by API name Members: doc/Programming-Guide/01_Main.dox:1.1->1.1.2.1 doc/Programming-Guide/02_CodingConventions.dox:1.1->1.1.2.1 doc/Programming-Guide/03_MajorComponents.dox:1.1->1.1.2.1 doc/Programming-Guide/04_ExternalPrograms.dox:1.1->1.1.2.1 doc/Programming-Guide/05_TypicalRequestFlow.dox:1.1->1.1.2.1 doc/Programming-Guide/07_MainLoop.dox:1.1->1.1.2.1 doc/Programming-Guide/08_ClientStreams.dox:1.1->1.1.2.1 doc/Programming-Guide/09_ClientRequests.dox:1.1->1.1.2.1 doc/Programming-Guide/10_DelayPools.dox:1.1->1.1.2.1 doc/Programming-Guide/11_StorageManager.dox:1.1->1.1.2.1 doc/Programming-Guide/12_StorageInterface.dox:1.1->1.1.2.1 doc/Programming-Guide/13_ForwardingSelection.dox:1.1->1.1.2.1 doc/Programming-Guide/14_IPCacheAndFQDNCache.dox:1.1->1.1.2.1 doc/Programming-Guide/15_ServerProtocols.dox:1.1->1.1.2.1 doc/Programming-Guide/16_Timeouts.dox:1.1->1.1.2.1 doc/Programming-Guide/17_Events.dox:1.1->1.1.2.1 doc/Programming-Guide/18_AccessControls.dox:1.1->1.1.2.1 doc/Programming-Guide/19_AuthenticationFramework.dox:1.1->1.1.2.1 doc/Programming-Guide/20_ICP.dox:1.1->1.1.2.1 doc/Programming-Guide/21_NetDB.dox:1.1->1.1.2.1 doc/Programming-Guide/22_ErrorPages.dox:1.1->1.1.2.1 doc/Programming-Guide/23_CallbackDataAllocator.dox:1.1->1.1.2.1 doc/Programming-Guide/24_RefCountDataAllocator.dox:1.1->1.1.2.1 doc/Programming-Guide/25_CacheManager.dox:1.1->1.1.2.1 doc/Programming-Guide/26_HTTPHeaders.dox:1.1->1.1.2.1 doc/Programming-Guide/27_MiscOther.dox:1.1->1.1.2.1 doc/Programming-Guide/Groups.dox:1.1->1.1.2.1 doc/Programming-Guide/doxy.footer.html:1.1->1.1.2.1 doc/Programming-Guide/doxy.header.html:1.1->1.1.2.1 doc/Programming-Guide/prog-guide.sgml:1.10->1.10.14.1(DEAD) --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/01_Main.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,55 @@ +/** \mainpage Squid 3.x Developer Programming Guide + +\section Abstract Abstract + +\par + Squid is a WWW Cache application developed by the National Laboratory + for Applied Network Research and members of the Web Caching community. + Squid is implemented as a single, non-blocking process based around + a BSD select() loop. This document describes the operation of the Squid + source code and is intended to be used by others who wish to customize + or improve it. + + +\section Introduction Introduction + +\par + The Squid source code has evolved more from empirical + observation and tinkering, rather than a solid design + process. It carries a legacy of being "touched" by + numerous individuals, each with somewhat different techniques + and terminology. + +\par + Squid is a single-process proxy server. Every request is + handled by the main process, with the exception of FTP. + However, Squid does not use a "threads package" such has + Pthreads. While this might be easier to code, it suffers + from portability and performance problems. Instead Squid + maintains data structures and state information for each + active request. + +\par + The code is often difficult to follow because there are no + explicit state variables for the active requests. Instead, + thread execution progresses as a sequence of "callback + functions" which get executed when I/O is ready to occur, + or some other event has happened. 
As a callback function + completes, it is responsible for registering the next + callback function for subsequent I/O. + +\par + Note there is only a pseudo-consistent naming scheme. In + most cases functions are named like \c moduleFooBar() . + However, there are also some functions named like + \c module_foo_bar() . + +\par + Note that the Squid source changes rapidly, and while we + do make some effort to document code as we go some parts + of the documentation may be left out. If you find any + inconsistencies, please feel free to notify + \link http://www.squid-cache.org/Support/contact.dyn the Squid Developers \endlink + . + + */ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/02_CodingConventions.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,26 @@ +/** +\page 01_CodeConventions Coding Conventions + +\section Infrastructure Infrastructure + +\par + Most custom types and tools are documented in the code or the relevant + portions of this manual. Some key points apply globally however. + +\section FWT Fixed width types + +\par + If you need to use specific width types - such as + a 16 bit unsigned integer, use one of the following types. To access + them simply include "config.h". + +\code + int16_t - 16 bit signed. + u_int16_t - 16 bit unsigned. + int32_t - 32 bit signed. + u_int32_t - 32 bit unsigned. + int64_t - 64 bit signed. + u_int64_t - 64 bit unsigned. +\endcode + + */ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/03_MajorComponents.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,372 @@ +/** +\page 02_MajorComponents Overview of Squid Components + +\par Squid consists of the following major components + +\section ClientSideSocket Client Side Socket + +\par + Here new client connections are accepted, parsed, and + reply data sent. Per-connection state information is held + in a data structure called ConnStateData. Per-request + state information is stored in the clientSocketContext + structure. With HTTP/1.1 we may have multiple requests from + a single TCP connection. +\todo DOCS: find out what has replaced clientSocketContext since it seems to not exist now. + +\section ClientSideRequest Client Side Request +\par + This is where requests are processed. We determine if the + request is to be redirected, if it passes access lists, + and setup the initial client stream for internal requests. + Temporary state for this processing is held in a + clientRequestContext. +\todo DOCS: find out what has replaced clientRequestContext since it seems not to exist now. + +\section ClientSideReply Client Side Reply +\par + This is where we determine if the request is cache HIT, + REFRESH, MISS, etc. This involves querying the store + (possibly multiple times) to work through Vary lists and + the list. Per-request state information is stored + in the clientReplyContext. + +\section ClientStreams Client Streams +\par + These routines implement a unidirectional, non-blocking, + pull pipeline. They allow code to be inserted into the + reply logic on an as-needed basis. For instance, + transfer-encoding logic is only needed when sending a + HTTP/1.1 reply. + +\section ServerSide Server Side +\par + These routines are responsible for forwarding cache misses + to other servers, depending on the protocol. Cache misses + may be forwarded to either origin servers, or other proxy + caches. Note that all requests (FTP, Gopher) to other + proxies are sent as HTTP requests. 
+\par + gopher.c is somewhat + complex and gross because it must convert from the Gopher + protocol to HTTP. Wais and Gopher don't receive much + attention because they comprise a relatively insignificant + portion of Internet traffic. + +\section StorageManager Storage Manager +\par + The Storage Manager is the glue between client and server + sides. Every object saved in the cache is allocated a + StoreEntry structure. While the object is being + accessed, it also has a MemObject structure. +\par + Squid can quickly locate cached objects because it keeps + (in memory) a hash table of all StoreEntry's. The + keys for the hash table are MD5 checksums of the objects + URI. In addition there is also a storage policy such + as LRU that keeps track of the objects and determines + the removal order when space needs to be reclaimed. + For the LRU policy this is implemented as a doubly linked + list. +\par + For each object the StoreEntry maps to a cache_dir + and location via sdirno and sfileno. For the "ufs" store + this file number (sfileno) is converted to a disk pathname + by a simple modulo of L2 and L1, but other storage drivers may + map sfilen in other ways. A cache swap file consists + of two parts: the cache metadata, and the object data. + Note the object data includes the full HTTP reply---headers + and body. The HTTP reply headers are not the same as the + cache metadata. +\par + Client-side requests register themselves with a StoreEntry + to be notified when new data arrives. Multiple clients + may receive data via a single StoreEntry. For POST + and PUT request, this process works in reverse. Server-side + functions are notified when additional data is read from + the client. + +\section RequestForwarding Request Forwarding + +\section PeerSelection Peer Selection +\par + These functions are responsible for selecting one (or none) + of the neighbor caches as the appropriate forwarding + location. + +\section AccessControl Access Control +\par + These functions are responsible for allowing or denying a + request, based on a number of different parameters. These + parameters include the client's IP address, the hostname + of the requested resource, the request method, etc. Some + of the necessary information may not be immediately available, + for example the origin server's IP address. In these cases, + the ACL routines initiate lookups for the necessary + information and continues the access control checks when + the information is available. + +\section AuthenticationFramework Authentication Framework +\par + These functions are responsible for handling HTTP + authentication. They follow a modular framework allow + different authentication schemes to be added at will. For + information on working with the authentication schemes See + the chapter Authentication Framework. + +\section NetworkCommunication Network Communication +\par + These are the routines for communicating over TCP and UDP + network sockets. Here is where sockets are opened, closed, + read, and written. In addition, note that the heart of + Squid (comm_select() or comm_poll()) exists here, + even though it handles all file descriptors, not just + network sockets. These routines do not support queuing + multiple blocks of data for writing. Consequently, a + callback occurs for every write request. +\todo DOCS: decide what to do for comm_poll() since its either obsolete or uses other names. + +\section FileDiskIO File/Disk I/O +\par + Routines for reading and writing disk files (and FIFOs). 
+ Reasons for separating network and disk I/O functions are + partly historical, and partly because of different behaviors. + For example, we don't worry about getting a "No space left + on device" error for network sockets. The disk I/O routines + support queuing of multiple blocks for writing. In some + cases, it is possible to merge multiple blocks into a single + write request. The write callback does not necessarily + occur for every write request. + +\section Neighbors Neighbors +\par + Maintains the list of neighbor caches. Sends and receives + ICP messages to neighbors. Decides which neighbors to + query for a given request. File: neighbors.c. + +\section FQDNCache IP/FQDN Cache +\par + A cache of name-to-address and address-to-name lookups. + These are hash tables keyed on the names and addresses. + ipcache_nbgethostbyname() and fqdncache_nbgethostbyaddr() + implement the non-blocking lookups. Files: ipcache.c, + fqdncache.c. + +\section CacheManager Cache Manager +\par + This provides access to certain information needed by the + cache administrator. A companion program, cachemgr.cgi + can be used to make this information available via a Web + browser. Cache manager requests to Squid are made with a + special URL of the form +\code + cache_object://hostname/operation +\endcode + The cache manager provides essentially "read-only" access + to information. It does not provide a method for configuring + Squid while it is running. +\todo DOCS: get cachemgr.cgi documenting + +\section NetworkMeasurementDB Network Measurement Database +\par + In a number of situation, Squid finds it useful to know the + estimated network round-trip time (RTT) between itself and + origin servers. A particularly useful is example is + the peer selection algorithm. By making RTT measurements, a + Squid cache will know if it, or one if its neighbors, is closest + to a given origin server. The actual measurements are made + with the pinger program, described below. The measured + values are stored in a database indexed under two keys. The + primary index field is the /24 prefix of the origin server's + IP address. Secondly, a hash table of fully-qualified host + names that have data structures with links to the appropriate + network entry. This allows Squid to quickly look up measurements + when given either an IP address, or a host name. The /24 prefix + aggregation is used to reduce the overall database size. File: + net_db.c. + +\section Redirectors Redirectors +\par + Squid has the ability to rewrite requests from clients. After + checking the \link AccessControls access controls \endlink , + but before checking for cache hits, + requested URLs may optionally be written to an external + redirector process. This program, which can be highly + customized, may return a new URL to replace the original request. + Common applications for this feature are extended access controls + and local mirroring. File: redirect.c. + +\section ASN Autonomous System Numbers +\par + Squid supports Autonomous System (AS) numbers as another + access control element. The routines in asn.c + query databases which map AS numbers into lists of CIDR + prefixes. These results are stored in a radix tree which + allows fast searching of the AS number for a given IP address. + +\section ConfigurationFileParsing Configuration File Parsing +\par + The primary configuration file specification is in the file + cf.data.pre. A simple utility program, cf_gen, + reads the cf.data.pre file and generates cf_parser.c + and squid.conf. 
cf_parser.c is included directly + into cache_cf.c at compile time. +\todo DOCS: get cf.data.pre documenting +\todo DOCS: get squid.conf documenting +\todo DOCS: get cf_gen documenting and linking. + +\section Callback Data Allocator +\par + Squid's extensive use of callback functions makes it very + susceptible to memory access errors. Care must be taken + so that the callback_data memory is still valid when + the callback function is executed. The routines in cbdata.c + provide a uniform method for managing callback data memory, + canceling callbacks, and preventing erroneous memory accesses. +\todo DOCS: get callback_data (object?) linking or repalcement named. + +\section RefCountDataAllocator Refcount Data Allocator +\since Squid 3.0 +\par + Manual reference counting such as cbdata uses is error prone, + and time consuming for the programmer. C++'s operator overloading + allows us to create automatic reference counting pointers, that will + free objects when they are no longer needed. With some care these + objects can be passed to functions needed Callback Data pointers. +\todo DOCS: get cbdata documenting and linking. + +\section Debugging Debugging +\par + Squid includes extensive debugging statements to assist in + tracking down bugs and strange behavior. Every debug statement + is assigned a section and level. Usually, every debug statement + in the same source file has the same section. Levels are chosen + depending on how much output will be generated, or how useful the + provided information will be. The \em debug_options line + in the configuration file determines which debug statements will + be shown and which will not. The \em debug_options line + assigns a maximum level for every section. If a given debug + statement has a level less than or equal to the configured + level for that section, it will be shown. This description + probably sounds more complicated than it really is. + File: debug.c. Note that debugs() itself is a macro. +\todo DOCS: get debugs() documenting as if it was a function. + +\section ErrorGeneration Error Generation +\par + The routines in errorpage.c generate error messages from + a template file and specific request parameters. This allows + for customized error messages and multilingual support. + +\section EventQueue Event Queue +\par + The routines in event.c maintain a linked-list event + queue for functions to be executed at a future time. The + event queue is used for periodic functions such as performing + cache replacement, cleaning swap directories, as well as one-time + functions such as ICP query timeouts. + +\section FiledescriptorManagement Filedescriptor Management +\par + Here we track the number of filedescriptors in use, and the + number of bytes which has been read from or written to each + file descriptor. + + +\section HashtableSupport Hashtable Support +\par + These routines implement generic hash tables. A hash table + is created with a function for hashing the key values, and a + function for comparing the key values. + +\section HTTPAnonymization HTTP Anonymization +\par + These routines support anonymizing of HTTP requests leaving + the cache. Either specific request headers will be removed + (the "standard" mode), or only specific request headers + will be allowed (the "paranoid" mode). + +\section DelayPools Delay Pools +\par + Delay pools provide bandwidth regulation by restricting the rate + at which squid reads from a server before sending to a client. 
They + do not prevent cache hits from being sent at maximal capacity. Delay + pools can aggregate the bandwidth from multiple machines and users + to provide more or less general restrictions. + +\section ICPSupport Internet Cache Protocol +\par + Here we implement the Internet Cache Protocol. This + protocol is documented in the RFC 2186 and RFC 2187. + The bulk of code is in the icp_v2.c file. The + other, icp_v3.c is a single function for handling + ICP queries from Netcache/Netapp caches; they use + a different version number and a slightly different message + format. +\todo DOCS: get RFCs linked from ietf + +\section IdentLookups Ident Lookups +\par + These routines support \link http://www.ietf.org/rfc/rfc931.txt RFC 931 \endlink + "Ident" lookups. An ident + server running on a host will report the user name associated + with a connected TCP socket. Some sites use this facility for + access control and logging purposes. + +\section MemoryManagement Memory Management +\par + These routines allocate and manage pools of memory for + frequently-used data structures. When the \em memory_pools + configuration option is enabled, unused memory is not actually + freed. Instead it is kept for future use. This may result + in more efficient use of memory at the expense of a larger + process size. + +\section MulticastSupport Multicast Support +\par + Currently, multicast is only used for ICP queries. The + routines in this file implement joining a UDP + socket to a multicast group (or groups), and setting + the multicast TTL value on outgoing packets. + +\section PresistentConnections Persistent Server Connections +\par + These routines manage idle, persistent HTTP connections + to origin servers and neighbor caches. Idle sockets + are indexed in a hash table by their socket address + (IP address and port number). Up to 10 idle sockets + will be kept for each socket address, but only for + 15 seconds. After 15 seconds, idle socket connections + are closed. + +\section RefreshRules Refresh Rules +\par + These routines decide whether a cached object is stale or fresh, + based on the \em refresh_pattern configuration options. + If an object is fresh, it can be returned as a cache hit. + If it is stale, then it must be revalidated with an + If-Modified-Since request. + +\section SNMPSupport SNMP Support +\par + These routines implement SNMP for Squid. At the present time, + we have made almost all of the cachemgr information available + via SNMP. + +\section URNSupport URN Support +\par + We are experimenting with URN support in Squid version 1.2. + Note, we're not talking full-blown generic URN's here. This + is primarily targeted toward using URN's as an smart way + of handling lists of mirror sites. For more details, please + see \link http://squid.nlanr.net/Squid/urn-support.html URN Support in Squid \endlink + . + +\section ESI ESI +\par + ESI is an implementation of Edge Side Includes (\link http://www.esi.org http://www.esi.org \endlink.) + ESI is implemented as a client side stream and a small + modification to client_side_reply.c to check whether + ESI should be inserted into the reply stream or not. + + */ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/04_ExternalPrograms.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,41 @@ +/** +\page 03_ExternalPrograms External Programs + +\section dnsserver dnsserver +\par + Because the standard gethostbyname(3) library call + blocks, Squid must use external processes to actually make + these calls. 
Typically there will be ten dnsserver + processes spawned from Squid. Communication occurs via + TCP sockets bound to the loopback interface. The functions + in dns.c are primarily concerned with starting and + stopping the dnsservers. Reading and writing to and from + the dnsservers occurs in the IP and FQDN cache modules. + +\section pinger pinger +\par + Although it would be possible for Squid to send and receive + ICMP messages directly, we use an external process for + two important reasons: + \li Because squid handles many filedescriptors simultaneously, + we get much more accurate RTT measurements when ICMP is + handled by a separate process. + \li Superuser privileges are required to send and receive + ICMP. Rather than require Squid to be started as root, + we prefer to have the smaller and simpler pinger + program installed with setuid permissions. + +\section unlinkd unlinkd +\par + The unlink(2) system call can cause a process to block + for a significant amount of time. Therefore we do not want + to make unlink() calls from Squid. Instead we pass them + to this external process. + +\section redirector redirector +\par + A redirector process reads URLs on stdin and writes (possibly + changed) URLs on stdout. It is implemented as an external + process to maximize flexibility. + + */ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/05_TypicalRequestFlow.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,72 @@ +/** +\page 05_TypicalRequestFlow Flow of a Typical Request + +\par +\li A client connection is accepted by the client-side socket + support and parsed, or is directly created via + clientBeginRequest. + +\li The access controls are checked. The client-side-request builds + an ACL state data structure and registers a callback function + for notification when access control checking is completed. + +\li After the access controls have been verified, the request + may be redirected. + +\li The client-side-request is forwarded up the client stream + to GetMoreData which looks for the requested object in the + cache, and or Vary: versions of the same. If is a cache hit, + then the client-side registers its interest in the + StoreEntry. Otherwise, Squid needs to forward the request, + perhaps with an If-Modified-Since header. + +\li The request-forwarding process begins with protoDispatch. + This function begins the peer selection procedure, which + may involve sending ICP queries and receiving ICP replies. + The peer selection procedure also involves checking + configuration options such as \em never_direct and + \em always_direct. + +\li When the ICP replies (if any) have been processed, we end + up at protoStart. This function calls an appropriate + protocol-specific function for forwarding the request. + Here we will assume it is an HTTP request. + +\li The HTTP module first opens a connection to the origin + server or cache peer. If there is no idle persistent socket + available, a new connection request is given to the Network + Communication module with a callback function. The + comm.c routines may try establishing a connection + multiple times before giving up. + +\li When a TCP connection has been established, HTTP builds a + request buffer and submits it for writing on the socket. + It then registers a read handler to receive and process + the HTTP reply. + +\li As the reply is initially received, the HTTP reply headers + are parsed and placed into a reply data structure. As + reply data is read, it is appended to the StoreEntry. 
+ Every time data is appended to the StoreEntry, the + client-side is notified of the new data via a callback + function. The rate at which reading occurs is regulated by + the delay pools routines, via the deferred read mechanism. + +\li As the client-side is notified of new data, it copies the + data from the StoreEntry and submits it for writing on the + client socket. + +\li As data is appended to the StoreEntry, and the client(s) + read it, the data may be submitted for writing to disk. + +\li When the HTTP module finishes reading the reply from the + upstream server, it marks the StoreEntry as "complete". + The server socket is either closed or given to the persistent + connection pool for future use. + +\li When the client-side has written all of the object data, + it unregisters itself from the StoreEntry. At the + same time it either waits for another request from the + client, or closes the client connection. + +*/ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/07_MainLoop.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,131 @@ +/** +\page 7_MainLoop The Main Loop: comm_select() +\par + At the core of Squid is the select(2) system call. + Squid uses select() or poll(2) to process I/O on + all open file descriptors. Hereafter we'll only use + "select" to refer generically to either system call. +\par + The select() and poll() system calls work by + waiting for I/O events on a set of file descriptors. Squid + only checks for \em read and \em write events. Squid + knows that it should check for reading or writing when + there is a read or write handler registered for a given + file descriptor. Handler functions are registered with + the commSetSelect function. For example: +\code + commSetSelect(fd, COMM_SELECT_READ, clientReadRequest, conn, 0); +\endcode + In this example, fd is a TCP socket to a client + connection. When there is data to be read from the socket, + then the select loop will execute +\code + clientReadRequest(fd, conn); +\endcode +\todo DOCS: find out if poll() is still used and get it linking to docs. + +\par + The I/O handlers are reset every time they are called. In + other words, a handler function must re-register itself + with commSetSelect() if it wants to continue reading or + writing on a file descriptor. The I/O handler may be + canceled before being called by providing NULL arguments, + e.g.: +\code + commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0); +\endcode +\par + These I/O handlers (and others) and their associated callback + data pointers are saved in the fde data structure: +\code + struct _fde { + ... + PF *read_handler; + void *read_data; + PF *write_handler; + void *write_data; + close_handler *close_handler; + DEFER *defer_check; + void *defer_data; + }; +\endcode + read_handler and write_handler are called when + the file descriptor is ready for reading or writing, + respectively. The close_handler is called when the + filedescriptor is closed. The close_handler is + actually a linked list of callback functions to be called. +\todo DOCS: make _fde code example a grab straight from the current source file + +\par + In some situations we want to defer reading from a + filedescriptor, even though it has data for us to read. + This may be the case when data arrives from the server-side + faster than it can be written to the client-side. Before + adding a filedescriptor to the "read set" for select, we + call defer_check (if it is non-NULL). 
If defer_check + returns 1, then we skip the filedescriptor for that time + through the select loop. +\todo DOCS: update name defer_check to current one used in code. + +\par + These handlers are stored in the FD_ENTRY structure + as defined in comm.h. fd_table[] is the global + array of FD_ENTRY structures. The handler functions + are of type PF, which is a typedef: +\code + typedef void (*PF) (int, void *); +\endcode + The close handler is really a linked list of handler + functions. Each handler also has an associated pointer + (void *data) to some kind of data structure. +\todo DOCS: update FD_ENTRY (macro?) linking to current details. +\todo DOCS: get fd_table[] documenting and linking properly. + +\par + comm_select() is the function which issues the select() + system call. It scans the entire fd_table[] array + looking for handler functions. Each file descriptor with + a read handler will be set in the fd_set read bitmask. + Similarly, write handlers are scanned and bits set for the + write bitmask. select() is then called, and the return + read and write bitmasks are scanned for descriptors with + pending I/O. For each ready descriptor, the handler is + called. Note that the handler is cleared from the + FD_ENTRY before it is called. + +\par + After each handler is called, comm_select_incoming() + is called to process new HTTP and ICP requests. +\todo DOCS: what has replaced comm_select_incoming() +\par + Typical read handlers are + httpReadReply(), + diskHandleRead(), + icpHandleUdp(), + and ipcache_dnsHandleRead(). + Typical write handlers are + commHandleWrite(), + diskHandleWrite(), + and icpUdpReply(). + The handler function is set with commSetSelect(), with the + exception of the close handlers, which are set with + comm_add_close_handler(). +\todo DOCS: what has replaced httpReadReply() as callback. +\todo DOCS: what has replaced ipcache_dnsHandleRead() as callback. +\todo DOCS: what has replaced icpUdpReply() as callback + +\par + The close handlers are normally called from comm_close(). + The job of the close handlers is to deallocate data structures + associated with the file descriptor. For this reason + comm_close() must normally be the last function in a + sequence to prevent accessing just-freed memory. +\todo DOCS: what has replaced comm_close() as close callback handler + +\par + The timeout and lifetime handlers are called for file + descriptors which have been idle for too long. They are + further discussed in \link 8_CLientStreams.dyn Client Streams \endlink + . + + */ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/08_ClientStreams.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,124 @@ +/** +\page 8_ClientStreams Client Streams + +\todo DOCS: this seems all to be a better fit as inline code comments. + +\section Introduction Introduction +\par + A clientStream is a uni-directional loosely coupled pipe. Each node + consists of four methods - read, callback, detach, and status, along with the + stream housekeeping variables (a dlink node and pointer to the head of + the list), context data for the node, and read request parameters - + readbuf, readlen and readoff (in the body). +\par + clientStream is the basic unit for scheduling, and the clientStreamRead + and clientStreamCallback calls allow for deferred scheduled activity if desired. +\par + Theory on stream operation: + \li Something creates a pipeline. At a minimum it needs a head with a + status method and a read method, and a tail with a callback method and a + valid initial read request. 
+ \li Other nodes may be added into the pipeline. + \li The tail-1th node's read method is called. + For each node going up the pipeline, the node either: + \li satisfies the read request, or + \li inserts a new node above it and calls clientStreamRead, or + \li calls clientStreamRead +\todo DOCS: make the above list nested. + +\par + There is no requirement for the Read parameters from different + nodes to have any correspondence, as long as the callbacks provided are + correct. +\par + The first node that satisfies the read request MUST generate an + httpReply to be passed down the pipeline. Body data MAY be provided. + \li On the first callback a node MAY insert further downstream nodes in + the pipeline, but MAY NOT do so thereafter. + \li The callbacks progress down the pipeline until a node makes further + reads instead of satisfying the callback (go to 4) or the end of the + pipe line is reached, where a new read sequence may be scheduled. + +\section ImplementationNotes Implementation notes +\par + ClientStreams have been implemented for the client side reply logic, + starting with either a client socket (tail of the list is + clientSocketRecipient) or a custom handler for in-squid requests, and + with the pipeline HEAD being clientGetMoreData, which uses + clientSendMoreData to send data down the pipeline. +\par + Client POST bodies do not use a pipeline currently, they use the + previous code to send the data. This is a TODO when time permits. + +\section WhatsInANode Whats in a node +\todo ClientStreams: These details should really be codified as a class which all ClientStream nodes inherit from. +\par Each node must have: + \li read method - to allow loose coupling in the pipeline. (The reader may + therefore change if the pipeline is altered, even mid-flow). + \li callback method - likewise. + \li status method - likewise. + \li detach method - used to ensure all resources are cleaned up properly. + \li dlink head pointer - to allow list inserts and deletes from within a node. + \li context data - to allow the called back nodes to maintain their private information. + \li read request parameters - For two reasons: + \li To allow a node to determine the requested data offset, length and target buffer dynamically. Again, this is to promote loose coupling. + \li Because of the callback nature of squid, every node would have to keep these parameters in their context anyway, so this reduces programmer overhead. + +\section MethodDetails Method details +\par + The first parameter is always the 'this' reference for the client stream - a clientStreamNode *. + +\subsection Read Read +\par Parameters: + \li clientHttpRequest * - superset of request data, being winnowed down over time. MUST NOT be NULL. + \li offset, length, buffer - what, how much and where. + +\par Side effects: +\par + Triggers a read of data that satisfies the httpClientRequest + metainformation and (if appropriate) the offset,length and buffer + parameters. + +\subsection Callback Callback +\par Parameters: + \li clientHttpRequest * - superset of request data, being winnowed down over time. MUST NOT be NULL. + \li httpReply * - not NULL on the first call back only. Ownership is passed down the pipeline. Each node may alter the reply if appropriate. + \li buffer, length - where and how much. + +\par Side effects: +\par + Return data to the next node in the stream. The data may be returned immediately, + or may be delayed for a later scheduling cycle. 
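+
+\par
+ As a hedged illustration of the Read/Callback contract above, the following
+ sketch shows a minimal pass-through node callback. The exact clientStreamNode,
+ HttpReply and clientStreamCallback signatures are assumptions for illustration
+ only and must be checked against clientStream.h before use.
+
+\code
+ /* Hypothetical pass-through node: hands whatever the upstream node
+  * delivered straight to the next node downstream.  On the first call
+  * 'rep' carries the reply whose ownership passes down the pipeline;
+  * later calls carry body data only. */
+ static void
+ myNodeCallback(clientStreamNode * node, clientHttpRequest * http,
+                HttpReply * rep, const char *buf, size_t len)
+ {
+     /* a real node would inspect or transform the reply or body here */
+     clientStreamCallback(node, http, rep, buf, len);
+ }
+\endcode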
+
+\subsection Detach Detach
+\par Parameters:
+ \li clientHttpRequest * - MUST NOT be NULL.
+
+\par Side effects:
+ \li Removes this node from a clientStream. The stream infrastructure handles the removal. This node MUST have cleaned up all context data, UNLESS scheduled callbacks will take care of that.
+ \li Informs the previous node in the list of this node's detachment.
+
+\subsection Status Status
+\par Parameters:
+ \li clientHttpRequest * - MUST NOT be NULL.
+
+\par Side effects:
+
+\par
+ Allows nodes to query the upstream nodes for:
+ \li stream ABORTS - request cancelled for some reason. upstream will not accept further reads().
+ \li stream COMPLETION - upstream has completed and will not accept further reads().
+ \li stream UNPLANNED COMPLETION - upstream has completed, but not at a pre-planned location (used for keepalive checking), and will not accept further reads().
+ \li stream NONE - no special status, further reads permitted.
+
+\subsection Abort Abort
+\par Parameters:
+ \li clientHttpRequest * - MUST NOT be NULL.
+
+\par Side effects:
+
+\par
+ Detaches the tail of the stream. CURRENTLY DOES NOT clean up the tail node data -
+ this must be done separately. Thus Abort may ONLY be called by the tail node.
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/09_ClientRequests.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,6 @@
+/**
+\page 9_ClientRequests Processing Client Requests
+
+\todo DOCS: write this section. Or at least find a place in the code autodocs to write it.
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/10_DelayPools.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,52 @@
+/**
+\page 10_DelayPools Delay Pools
+
+\section Introduction Introduction
+\par
+ A DelayPool is a Composite used to manage bandwidth for any request
+ assigned to the pool by an access expression. DelayIds are used
+ to manage the bandwidth on a given request, whereas a DelayPool
+ manages the bandwidth availability and assigned DelayIds.
+
+\section ExtendingDelayPools Extending Delay Pools
+\par
+ A CompositePoolNode is the base type for all members of a DelayPool.
+ Any child must implement the RefCounting primitives, as well as five
+ delay pool functions:
+ \li stats() - provide cachemanager statistics for itself.
+ \li dump() - generate squid.conf syntax for the current configuration of the item.
+ \li update() - allocate more bandwidth to all buckets in the item.
+ \li parse() - accept squid.conf syntax for the item, and configure for use appropriately.
+ \li id() - return a DelayId entry for the current item.
+
+\par
+ A DelayIdComposite is the base type for all DelayIds. Concrete
+ DelayIds must implement the refcounting primitives, as well as two
+ delay id functions:
+ \li bytesWanted() - return the largest number of bytes that this delay id allows by policy.
+ \li bytesIn() - record the use of bandwidth by the request(s) that this delayId is monitoring.
+
+\par
+ Composite creation is currently under design review, so see the
+ DelayPool class and follow the parse() code path for details.
+
+\section NeatExtensions Neat things that could be done.
+\par
+ With the composite structure, some neat things have become possible.
+ For instance:
+
+\subsection DynamicPools Dynamically defined pool arrangements
+\par
+ For instance, an aggregate (class 1) combined with the
+ per-class-C-net tracking of a
+ class 3 pool, without the individual host tracking.
+ This differs from a class 3 pool with -1/-1 in the host bucket, because no memory
+ or cpu would be used on hosts, whereas with a class 3 pool, they are
+ allocated and used.
+
+\subsection PerRequestLimits Per-request bandwidth limits
+\par
+ A delayId that contains its own bucket could limit each request
+ independently to a given policy, with no aggregate restrictions.
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/11_StorageManager.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,48 @@
+/**
+\page 11_StorageManager Storage Manager
+
+\section Introduction Introduction
+\par
+ The Storage Manager is the glue between client and server
+ sides. Every object saved in the cache is allocated a
+ StoreEntry structure. While the object is being
+ accessed, it also has a MemObject structure.
+
+\par
+ Squid can quickly locate cached objects because it keeps
+ (in memory) a hash table of all StoreEntry's. The
+ keys for the hash table are MD5 checksums of the objects
+ URI. In addition there is also a storage policy such
+ as LRU that keeps track of the objects and determines
+ the removal order when space needs to be reclaimed.
+ For the LRU policy this is implemented as a doubly linked
+ list.
+
+\par
+ For each object the StoreEntry maps to a cache_dir
+ and location via sdirno and sfileno. For the "ufs" store
+ this file number (sfileno) is converted to a disk pathname
+ by a simple modulo of L2 and L1, but other storage drivers may
+ map sfileno in other ways. A cache swap file consists
+ of two parts: the cache metadata, and the object data.
+ Note the object data includes the full HTTP reply---headers
+ and body. The HTTP reply headers are not the same as the
+ cache metadata.
+
+\par
+ Client-side requests register themselves with a StoreEntry
+ to be notified when new data arrives. Multiple clients
+ may receive data via a single StoreEntry. For POST
+ and PUT requests, this process works in reverse. Server-side
+ functions are notified when additional data is read from
+ the client.
+
+\section ObjectStorage Object Storage
+\par
+\todo DOCS: write section about object storage
+
+\section ObjectRetrieval Object Retrieval
+\par
+\todo DOCS: write section about object retrieval
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/12_StorageInterface.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,1087 @@
+/**
+\page 12_StorageInterface Storage Interface
+
+\section Introduction Introduction
+\par
+ Traditionally, Squid has always used the Unix filesystem (UFS)
+ to store cache objects on disk. Over the years, the
+ poor performance of UFS has become very obvious. In most
+ cases, UFS limits Squid to about 30-50 requests per second.
+ Our work indicates that the poor performance is mostly
+ due to the synchronous nature of open() and unlink()
+ system calls, and perhaps thrashing of inode/buffer caches.
+
+\par
+ We want to try out our own, customized filesystems with Squid.
+ In order to do that, we need a well-defined interface
+ for the bits of Squid that access the permanent storage
+ devices. We also require tighter control of the replacement
+ policy by each storage module, rather than a single global
+ replacement policy.
+
+\section BuildStructure Build structure
+\par
+ The storage types live in squid/src/fs/ . Each subdirectory corresponds
+ to the name of the storage type. When a new storage type is implemented,
+ configure.in must be updated to autogenerate a Makefile in
+ squid/src/fs/$type/ from a Makefile.in file.
+ +\par + configure will take a list of storage types through the + --enable-store-io parameter. This parameter takes a list of + space seperated storage types. For example, + --enable-store-io="ufs coss" . + +\par + Each storage type must create an archive file + in squid/src/fs/$type/.a . This file is automatically linked into + squid at compile time. + +\par + Each storefs must export a function named storeFsSetup_$type(). + This function is called at runtime to initialise each storage type. + The list of storage types is passed through store_modules.sh + to generate the initialisation function storeFsSetup(). This + function lives in store_modules.c. + +\par + An example of the automatically generated file: + +\code + /* automatically generated by ./store_modules.sh ufs coss + * do not edit + */ + #include "squid.h" + + extern STSETUP storeFsSetup_ufs; + extern STSETUP storeFsSetup_coss; + void storeFsSetup(void) + { + storeFsAdd("ufs", storeFsSetup_ufs); + storeFsAdd("coss", storeFsSetup_coss); + } +\endcode + + +\section InitStorageType Initialization of a storage type +\par + Each storage type initializes through the storeFsSetup_$type() + function. The storeFsSetup_$type() function takes a single + argument - a storefs_entry_t pointer. This pointer references + the storefs_entry to initialise. A typical setup function is as + follows: +\code + void + storeFsSetup_ufs(storefs_entry_t *storefs) + { + assert(!ufs_initialised); + storefs->parsefunc = storeUfsDirParse; + storefs->reconfigurefunc = storeUfsDirReconfigure; + storefs->donefunc = storeUfsDirDone; + ufs_state_pool = memPoolCreate("UFS IO State data", sizeof(ufsstate_t)); + ufs_initialised = 1; + } +\endcode + +\par + There are five function pointers in the storefs_entry which require + initializing. In this example, some protection is made against the + setup function being called twice, and a memory pool is initialised + for use inside the storage module. + +\par + Each function will be covered below. + + +\subsection done done +\par +\code + typedef void + STFSSHUTDOWN(void); +\endcode + +\par + This function is called whenever the storage system is to be shut down. + It should take care of deallocating any resources currently allocated. +\include src/typedefs.h +\skip typedef void STFSPARSE.* +\skip typedef void STFSRECONFIGURE.* +\code + typedef void STFSPARSE(SwapDir *SD, int index, char *path); + typedef void STFSRECONFIGURE(SwapDir *SD, int index, char *path); +\endcode + +\par + These functions handle configuring and reconfiguring a storage + directory. Additional arguments from the cache_dir configuration + line can be retrieved through calls to strtok() and GetInteger(). + +\par STFSPARSE + has the task of initialising a new swapdir. It should + parse the remaining arguments on the cache_dir line, initialise the + relevant function pointers and data structures, and choose the + replacement policy. STFSRECONFIGURE deals with reconfiguring an + active swapdir. It should parse the remaining arguments on the + cache_dir line and change any active configuration parameters. The + actual storage initialisation is done through the STINIT function + pointer in the SwapDir. + +\par +\code + struct _SwapDir { + char *type; /* Pointer to the store dir type string */ + int cur_size; /* Current swapsize in kb */ + int low_size; /* ?? */ + int max_size; /* Maximum swapsize in kb */ + char *path; /* Path to store */ + int index; /* This entry's index into the swapDir array */ + int suggest; /* Suggestion for UFS style stores (??) 
*/ + size_t max_objsize; /* Maximum object size for this store */ + union { /* Replacement policy-specific fields */ + #ifdef HEAP_REPLACEMENT + struct { + heap *heap; + } heap; + #endif + struct { + dlink_list list; + dlink_node *walker; + } lru; + } repl; + int removals; + int scanned; + struct { + unsigned int selected:1; /* Currently selected for write */ + unsigned int read_only:1; /* This store is read only */ + } flags; + STINIT *init; /* Initialise the fs */ + STNEWFS *newfs; /* Create a new fs */ + STDUMP *dump; /* Dump fs config snippet */ + STFREE *freefs; /* Free the fs data */ + STDBLCHECK *dblcheck; /* Double check the obj integrity */ + STSTATFS *statfs; /* Dump fs statistics */ + STMAINTAINFS *maintainfs; /* Replacement maintainence */ + STCHECKOBJ *checkob; /* Check if the fs will store an object, and get the FS load */ + /* These two are notifications */ + STREFOBJ *refobj; /* Reference this object */ + STUNREFOBJ *unrefobj; /* Unreference this object */ + STCALLBACK *callback; /* Handle pending callbacks */ + STSYNC *sync; /* Sync the directory */ + struct { + STOBJCREATE *create; /* Create a new object */ + STOBJOPEN *open; /* Open an existing object */ + STOBJCLOSE *close; /* Close an open object */ + STOBJREAD *read; /* Read from an open object */ + STOBJWRITE *write; /* Write to a created object */ + STOBJUNLINK *unlink; /* Remove the given object */ + } obj; + struct { + STLOGOPEN *open; /* Open the log */ + STLOGCLOSE *close; /* Close the log */ + STLOGWRITE *write; /* Write to the log */ + struct { + STLOGCLEANOPEN *open; /* Open a clean log */ + STLOGCLEANWRITE *write; /* Write to the log */ + void *state; /* Current state */ + } clean; + } log; + void *fsdata; /* FS-specific data */ + }; +\endcode + +\section OperationOfStorageModules Operation of a Storage Module +\par + Squid understands the concept of multiple diverse storage directories. + Each storage directory provides a caching object store, with object + storage, retrieval, indexing and replacement. + +\par + Each open object has associated with it a storeIOState object. The + storeIOState object is used to record the state of the current + object. Each storeIOState can have a storage module specific data + structure containing information private to the storage module. + +\par +\code + struct _storeIOState { + sdirno swap_dirn; /* SwapDir index */ + sfileno swap_filen; /* Unique file index number */ + StoreEntry *e; /* Pointer to parent StoreEntry */ + mode_t mode; /* Mode - O_RDONLY or O_WRONLY */ + size_t st_size; /* Size of the object if known */ + off_t offset; /* current _on-disk_ offset pointer */ + STFNCB *file_callback; /* called on delayed sfileno assignments */ + STIOCB *callback; /* IO Error handler callback */ + void *callback_data; /* IO Error handler callback data */ + struct { + STRCB *callback; /* Read completion callback */ + void *callback_data; /* Read complation callback data */ + } read; + struct { + unsigned int closing:1; /* debugging aid */ + } flags; + void *fsstate; /* pointer to private fs state */ + }; +\endcode + +\par + Each SwapDir has the concept of a maximum object size. This is used + as a basic hint to the storage layer in first choosing a suitable + SwapDir. The checkobj function is then called for suitable + candidate SwapDirs to find out whether it wants to store a + given StoreEntry. A maxobjsize of -1 means 'any size'. + +\par + The specific filesystem operations listed in the SwapDir object are + covered below. 
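+
+\par
+ Before walking through each entry point, here is a rough, non-authoritative
+ sketch of how a hypothetical "foofs" module might wire itself into a SwapDir
+ from its STFSPARSE handler. The storeFooDir* / storeFoo* helpers, the
+ foofs_dir_t type, the xstrdup()/xcalloc() utilities and the exact SwapDir
+ member set are illustrative assumptions only; check the current store
+ headers before copying any of this.
+
+\code
+ static void
+ storeFooDirParse(SwapDir * sd, int index, char *path)
+ {
+     sd->index = index;
+     sd->path = xstrdup(path);
+     sd->max_size = GetInteger() << 10;  /* cache_dir size argument is in MB; store KB */
+
+     /* per-directory housekeeping */
+     sd->init = storeFooDirInit;
+     sd->newfs = storeFooDirNewfs;
+     sd->statfs = storeFooDirStats;
+     sd->maintainfs = storeFooDirMaintain;
+
+     /* object I/O entry points */
+     sd->obj.create = storeFooCreate;
+     sd->obj.open = storeFooOpen;
+     sd->obj.close = storeFooClose;
+     sd->obj.read = storeFooRead;
+     sd->obj.write = storeFooWrite;
+     sd->obj.unlink = storeFooUnlink;
+
+     /* private, filesystem-specific state */
+     sd->fsdata = xcalloc(1, sizeof(foofs_dir_t));
+ }
+\endcode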
+ +\subsection initfs initfs +\par +\include src/typedefs.h +\skip .*STINIT.* +\code + typedef void STINIT(SwapDir *SD); +\endcode + +\par + Initialise the given SwapDir. Operations such as verifying and + rebuilding the storage and creating any needed bitmaps are done + here. + + +\subsection newfs newfs +\par +\code + typedef void STNEWFS(SwapDir *SD); +\endcode + +\par + Called for each configured SwapDir to perform filesystem + initialisation. This happens when '-z' is given to squid on the + command line. + +\subsection dumpfs dumpfs +\par +\code + typedef void STDUMP(StoreEntry *e, SwapDir *SD); +\endcode + +\par + Dump the FS specific configuration data of the current SwapDir + to the given StoreEntry. Used to grab a configuration file dump + from the cachemgr interface. + +\remark Note: The printed options should start with a space character to + separate them from the cache_dir path. + +\subsection freefs freefs +\par +\code + typedef void STFREE(SwapDir *SD); +\endcode + +\par + Free the SwapDir filesystem information. This routine should + deallocate SD->fsdata. + + +\subsection doublecheckfs doublecheckfs +\par +\code + typedef int STDBLCHECK(SwapDir *SD, StoreEntry *e); +\endcode + +\par + Double-check the given object for validity. Called during rebuild if + the '-S' flag is given to squid on the command line. Returns 1 if the + object is indeed valid, and 0 if the object is found invalid. + +\subsection statfs statfs +\par +\code + typedef void STSTATFS(SwapDir *SD, StoreEntry *e); +\endcode + +\par + Called to retrieve filesystem statistics, such as usage, load and + errors. The information should be appended to the passed + StoreEntry e. + +\subsection maintainfs maintainfs + +\code + typedef void STMAINTAINFS(SwapDir *SD); +\endcode + +\par + Called periodically to replace objects. The active replacement policy + should be used to timeout unused objects in order to make room for + new objects. + +\subsection callback callback + +\code + typedef void + STCALLBACK(SwapDir *SD); +\endcode + +\par + This function is called inside the comm_select/comm_poll loop to handle + any callbacks pending. + +\subsection sync sync + +\code + typedef void + STSYNC(SwapDir *SD); +\endcode + +\par + This function is called whenever a sync to disk is required. This + function should not return until all pending data has been flushed to + disk. + + +\subsection parse-reconfigure parse/reconfigure + +\subsection checkobj checkobj + +\code + typedef int + STCHECKOBJ(SwapDir *SD, const StoreEntry *e); +\endcode + +\par + Called by storeDirSelectSwapDir() to determine whether the + SwapDir will store the given StoreEntry object. If the + SwapDir is not willing to store the given StoreEntry + -1 should be returned. Otherwise, a value between 0 and 1000 should + be returned indicating the current IO load. A value of 1000 indicates + the SwapDir has an IO load of 100%. This is used by + storeDirSelectSwapDir() to choose the SwapDir with the + lowest IO load. + +\subsection referenceobj referenceobj + +\code + typedef void + STREFOBJ(SwapDir *SD, StoreEntry *e); +\endcode + +\par + Called whenever an object is locked by storeLockObject(). + It is typically used to update the objects position in the replacement + policy. + +\subsection unreferenceobj unreferenceobj + +\code + typedef void + STUNREFOBJ(SwapDir *SD, StoreEntry *e); +\endcode + +\par + Called whenever the object is unlocked by storeUnlockObject() + and the lock count reaches 0. 
It is also typically used to update the + objects position in the replacement policy. + +\subsection createobj createobj + +\code + typedef storeIOState * + STOBJCREATE(SwapDir *SD, StoreEntry *e, STFNCB *file_callback, STIOCB *io_callback, void *io_callback_data); +\endcode + +\par + Create an object in the SwapDir *SD. file_callback is called + whenever the filesystem allocates or reallocates the swap_filen. + Note - STFNCB is called with a generic cbdata pointer, which + points to the StoreEntry e. The StoreEntry should not be + modified EXCEPT for the replacement policy fields. + +\par + The IO callback should be called when an error occurs and when the + object is closed. Once the IO callback is called, the storeIOState + becomes invalid. + +\par + STOBJCREATE returns a storeIOState suitable for writing on + sucess, or NULL if an error occurs. + +\subsection openobj openobj + +\code + typedef storeIOState * + STOBJOPEN(SwapDir *SD, StoreEntry *e, STFNCB *file_callback, STIOCB *io_callback, void *io_callback_data); +\endcode + +\par + Open the StoreEntry in SwapDir *SD for reading. Much the + same is applicable from STOBJCREATE, the major difference being + that the data passed to file_callback is the relevant store_client. + +\subsection closeobj closeobj + +\code + typedef void + STOBJCLOSE(SwapDir *SD, storeIOState *sio); +\endcode + +\par + Close an opened object. The STIOCB callback should be called at + the end of this routine. + +\subsection readobj readobj + +\code + typedef void + STOBJREAD(SwapDir *SD, storeIOState *sio, char *buf, size_t size, off_t offset, STRCB *read_callback, void *read_callback_data); +\endcode + +\par + Read part of the object of into buf. It is safe to request a read + when there are other pending reads or writes. STRCB is called at + completion. + +\par + If a read operation fails, the filesystem layer notifies the + calling module by calling the STIOCB callback with an + error status code. + +\subsection writeobj writeobj + +\code + typedef void + STOBJWRITE(SwapDir *SD, storeIOState *sio, char *buf, size_t size, off_t offset, FREE *freefunc); +\endcode + +\par + Write the given block of data to the given store object. buf is + allocated by the caller. When the write is complete, the data is freed + through free_func. + +\par + If a write operation fails, the filesystem layer notifies the + calling module by calling the STIOCB callback with an + error status code. + +\subsection unlinkobj unlinkobj + +\code + typedef void STOBJUNLINK(SwapDir *SD, StoreEntry *e); +\endcode + +\par + Remove the StoreEntry e from the SwapDir SD and the + replacement policy. + + +\section StoreIOCalls Store IO calls + +\par + These routines are used inside the storage manager to create and + retrieve objects from a storage directory. + +\subsection storeCreate storeCreate() + +\code + storeIOState * + storeCreate(StoreEntry *e, STIOCB *file_callback, STIOCB *close_callback, void * callback_data) +\endcode + +\par + storeCreate is called to store the given StoreEntry in + a storage directory. + +\par + callback is a function that will be called either when + an error is encountered, or when the object is closed (by + calling storeClose()). If the open request is + successful, there is no callback. The calling module must + assume the open request will succeed, and may begin reading + or writing immediately. + +\par + storeCreate() may return NULL if the requested object + can not be created. In this case the callback function + will not be called. 
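+
+\par
+ A hedged sketch of the write path, tying storeCreate() to the storeWrite()
+ and storeClose() calls described below. The mySwapOutDone/myStartSwapOut
+ names are hypothetical, and the use of xfree() as the write free_func is an
+ assumption for illustration; DISK_OK is defined with the STIOCB callback
+ further down.
+
+\code
+ /* called on error, or once the object has been closed */
+ static void
+ mySwapOutDone(void *data, int errorflag, storeIOState * sio)
+ {
+     StoreEntry *e = data;
+     if (errorflag != DISK_OK) {
+         /* the object could not be swapped out; release or retry here */
+     }
+ }
+
+ static void
+ myStartSwapOut(StoreEntry * e, char *block, size_t len)
+ {
+     storeIOState *sio = storeCreate(e, NULL, mySwapOutDone, e);
+     if (sio == NULL)
+         return;            /* no SwapDir was willing to store the object */
+     storeWrite(sio, block, len, 0, xfree);  /* xfree releases 'block' */
+     storeClose(sio);       /* completion is reported via mySwapOutDone */
+ }
+\endcode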
+\subsection storeOpen storeOpen()
+
+\code
+ storeIOState *
+ storeOpen(StoreEntry *e, STFNCB * file_callback, STIOCB * callback, void *callback_data)
+\endcode
+
+\par
+ storeOpen is called to open the given StoreEntry from
+ the storage directory it resides on.
+
+\par
+ callback is a function that will be called either when
+ an error is encountered, or when the object is closed (by
+ calling storeClose()). If the open request is
+ successful, there is no callback. The calling module must
+ assume the open request will succeed, and may begin reading
+ or writing immediately.
+
+\par
+ storeOpen() may return NULL if the requested object
+ cannot be opened. In this case the callback function
+ will not be called.
+
+\subsection storeRead storeRead()
+
+\code
+ void
+ storeRead(storeIOState *sio, char *buf, size_t size, off_t offset, STRCB *callback, void *callback_data)
+\endcode
+
+\par
+ storeRead() is more complicated than the other functions
+ because it requires its own callback function to notify the
+ caller when the requested data has actually been read.
+ buf must be a valid memory buffer of at least size
+ bytes. offset specifies the byte offset where the
+ read should begin. Note that with the Swap Meta Headers
+ prepended to each cache object, this offset does not equal
+ the offset into the actual object data.
+
+\par
+ The caller is responsible for allocating and freeing buf.
+
+\subsection storeWrite storeWrite()
+
+\code
+ void
+ storeWrite(storeIOState *sio, char *buf, size_t size, off_t offset, FREE *free_func)
+\endcode
+
+\par
+ storeWrite() submits a request to write a block
+ of data to the disk store.
+ The caller is responsible for allocating buf, but since
+ there is no per-write callback, this memory must be freed by
+ the lower filesystem implementation. Therefore, the caller
+ must specify the free_func to be used to deallocate
+ the memory.
+
+\par
+ If a write operation fails, the filesystem layer notifies the
+ calling module by calling the STIOCB callback with an
+ error status code.
+
+\subsection storeUnlink storeUnlink()
+
+\code
+ void
+ storeUnlink(StoreEntry *e)
+\endcode
+
+\par
+ storeUnlink() removes the cached object from the disk
+ store. There is no callback function, and the object
+ does not need to be opened first. The filesystem
+ layer will remove the object if it exists on the disk.
+
+\subsection storeOffset storeOffset()
+
+\code
+ off_t storeOffset(storeIOState *sio)
+\endcode
+
+\par
+ storeOffset() returns the current _on-disk_ offset. This is used to
+ determine how much of an object's memory can be freed to make way for
+ other in-transit and cached objects. You must make sure that the
+ storeIOState->offset refers to the on-disk offset, or undefined
+ results will occur. For reads, this returns the current offset of
+ successfully read data, not including queued reads.
+
+
+\section Callbacks Callbacks
+
+\subsection STIOCB STIOCB callback
+
+\code
+ void
+ stiocb(void *data, int errorflag, storeIOState *sio)
+\endcode
+
+\par
+ The stiocb function is passed as a parameter to
+ storeOpen(). The filesystem layer calls stiocb
+ either when an I/O error occurs, or when the disk
+ object is closed.
+
+\par
+ errorflag is one of the following:
+\code
+ #define DISK_OK (0)
+ #define DISK_ERROR (-1)
+ #define DISK_EOF (-2)
+ #define DISK_NO_SPACE_LEFT (-6)
+\endcode
+
+\par
+ Once the stiocb function has been called,
+ the sio structure should not be accessed further.
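+
+\par
+ A similarly hedged sketch of the read path, combining storeOpen() and
+ storeRead() with the STRCB read-completion callback described in the next
+ subsection. The myReadDone/myCloseDone/myStartSwapIn names and the buffer
+ handling are illustrative assumptions only.
+
+\code
+ static void
+ myReadDone(void *data, const char *buf, size_t len)
+ {
+     /* 'len' bytes of object data are now available in 'buf';
+      * schedule the next storeRead() here if more data is wanted. */
+ }
+
+ static void
+ myCloseDone(void *data, int errorflag, storeIOState * sio)
+ {
+     /* called on I/O error or after storeClose(); 'sio' is now invalid */
+ }
+
+ static void
+ myStartSwapIn(StoreEntry * e, char *buf, size_t bufsize)
+ {
+     storeIOState *sio = storeOpen(e, NULL, myCloseDone, e);
+     if (sio == NULL)
+         return;            /* the object could not be opened */
+     storeRead(sio, buf, bufsize, 0, myReadDone, e);
+ }
+\endcode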
+ +\subsection STRCB STRCB callback + +\code + void + strcb(void *data, const char *buf, size_t len) +\endcode + +\par + The strcb function is passed as a parameter to + storeRead(). The filesystem layer calls strcb + after a block of data has been read from the disk and placed + into buf. len indicates how many bytes were + placed into buf. The strcb function is only + called if the read operation is successful. If it fails, + then the STIOCB callback will be called instead. + + +\section StateLogging State Logging + +\par + These functions deal with state + logging and related tasks for a squid storage system. + These functions are used (called) in store_dir.c. + +\par + Each storage system must provide the functions described + in this section, although it may be a no-op (null) function + that does nothing. Each function is accessed through a + function pointer stored in the SwapDir structure: +\code + struct _SwapDir { + ... + STINIT *init; + STNEWFS *newfs; + struct { + STLOGOPEN *open; + STLOGCLOSE *close; + STLOGWRITE *write; + struct { + STLOGCLEANOPEN *open; + STLOGCLEANWRITE *write; + void *state; + } clean; + } log; + .... + }; +\endcode + +\subsection log.open log.open() + +\code + void + STLOGOPEN(SwapDir *); +\endcode + +\par + The log.open() function, of type STLOGOPEN, + is used to open or initialize the state-holding log + files (if any) for the storage system. For UFS this + opens the swap.state files. + +\par + The log.open() function may be called any number of + times during Squid's execution. For example, the + process of rotating, or writing clean logfiles closes + the state log and then re-opens them. A squid -k reconfigure + does the same. + +\subsection log.close log.close() + +\code + void + STLOGCLOSE(SwapDir *); +\endcode + +\par + The log.close function, of type STLOGCLOSE, is + obviously the counterpart to log.open. It must close + the open state-holding log files (if any) for the storage + system. + +\subsection log.write log.write() + +\code + void + STLOGWRITE(const SwapDir *, const StoreEntry *, int op); +\endcode + +\par + The log.write function, of type STLOGWRITE, is + used to write an entry to the state-holding log file. The + op argument is either SWAP_LOG_ADD or SWAP_LOG_DEL. + This feature may not be required by some storage systems + and can be implemented as a null-function (no-op). + +\subsection log.clean.start() log.clean.start() + +\code + int + STLOGCLEANSTART(SwapDir *); +\endcode + +\par + The log.clean.start function, of type STLOGCLEANSTART, + is used for the process of writing "clean" state-holding + log files. The clean-writing procedure is initiated by + the squid -k rotate command. This is a special case + because we want to optimize the process as much as possible. + This might be a no-op for some storage systems that don't + have the same logging issues as UFS. + +\par + The log.clean.state pointer may be used to + keep state information for the clean-writing process, but + should not be accessed by upper layers. + +\subsection log.clean.nextentry log.clean.nextentry() + +\code + StoreEntry * + STLOGCLEANNEXTENTRY(SwapDir *); +\endcode + +\par + Gets the next entry that is a candidate for the clean log. + Returns NULL when there is no more objects to log. + +\subsection log.clean.write log.clean.write() + +\code + void + STLOGCLEANWRITE(SwapDir *, const StoreEntry *); +\endcode + +\par + The log.clean.write()/ function, of type STLOGCLEANWRITE, + writes an entry to the clean log file (if any). 
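+
+\par
+ As a sketch only (the myFsDir* names are hypothetical and not from any
+ real storage module), a storage system typically wires its state-logging
+ implementations into the function pointers shown above at initialization
+ time, roughly like this:
+\code
+ static STLOGOPEN myFsDirLogOpen;
+ static STLOGCLOSE myFsDirLogClose;
+ static STLOGWRITE myFsDirLogWrite;
+ static STLOGCLEANOPEN myFsDirLogCleanOpen;
+ static STLOGCLEANWRITE myFsDirLogCleanWrite;
+
+ static void
+ myFsDirRegisterLog(SwapDir *sd)
+ {
+     /* hooks that are not needed can point at no-op functions */
+     sd->log.open = myFsDirLogOpen;
+     sd->log.close = myFsDirLogClose;
+     sd->log.write = myFsDirLogWrite;
+     sd->log.clean.open = myFsDirLogCleanOpen;
+     sd->log.clean.write = myFsDirLogCleanWrite;
+     sd->log.clean.state = NULL;
+ }
+\endcode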
+
+\subsection log.clean.done log.clean.done()
+
+\code
+ void
+ STLOGCLEANDONE(SwapDir *);
+\endcode
+
+\par
+ Indicates the end of the clean-writing process and signals
+ the storage system to close the clean log, and rename or
+ move them to become the official state-holding log ready
+ to be opened.
+
+
+\section ReplacementPolicyImplementation Replacement Policy Implementation
+
+\par
+The replacement policy can be updated during STOBJREAD/STOBJWRITE/STOBJOPEN/
+STOBJCLOSE as well as STREFOBJ and STUNREFOBJ. Care should be taken to
+only modify the relevant replacement policy entries in the StoreEntry.
+The responsibility of replacement policy maintenance has been moved into
+each SwapDir so that the storage code can have tight control of the
+replacement policy. Cyclic filesystems such as COSS require this tight
+coupling between the storage layer and the replacement policy.
+
+\section RemovalPolicyAPI Removal policy API
+
+\par
+ The removal policy is responsible for determining in which order
+ objects are deleted when Squid needs to reclaim space for new objects.
+ Such a policy is used by an object store for maintaining the stored
+ objects and determining what to remove to reclaim space for new objects
+ (together they implement a replacement policy).
+
+\subsection API API
+
+\par
+ It is implemented as a modular API where a storage directory or
+ memory store creates a policy of choice for maintaining its objects,
+ and policy modules register themselves to be used through this API.
+
+\subsubsection createRemovalPolicy createRemovalPolicy()
+
+\code
+ RemovalPolicy *policy = createRemovalPolicy(const char *type, const char *args)
+\endcode
+
+\par
+ Creates a removal policy instance where object priority can be
+ maintained.
+
+\par
+ The returned RemovalPolicy instance is cbdata registered.
+
+\subsubsection policy.Free policy.Free()
+
+\code
+ policy->Free(RemovalPolicy *policy)
+\endcode
+
+\par
+ Destroys the policy instance and frees all related memory.
+
+\subsubsection policy.Add policy.Add()
+
+\code
+ policy->Add(RemovalPolicy *policy, StoreEntry *, RemovalPolicyNode *node)
+\endcode
+
+\par
+ Adds a StoreEntry to the policy instance.
+
+\par
+ node is a pointer to where policy specific data can be stored
+ for the store entry, currently the size of one (void *) pointer.
+
+\subsubsection policy.Remove policy.Remove()
+\code
+ policy->Remove(RemovalPolicy *policy, StoreEntry *, RemovalPolicyNode *node)
+\endcode
+
+\par
+ Removes a StoreEntry from the policy instance out of
+ policy order. For example when an object is replaced
+ by a newer one or is manually purged from the store.
+
+\par
+ node is a pointer to where policy specific data is stored
+ for the store entry, currently the size of one (void *) pointer.
+
+\subsubsection policy.Referenced policy.Referenced()
+\code
+ policy->Referenced(RemovalPolicy *policy, const StoreEntry *, RemovalPolicyNode *node)
+\endcode
+
+\par
+ Tells the policy that a StoreEntry is going to be referenced. Called
+ whenever an entry gets locked.
+
+\par
+ node is a pointer to where policy specific data is stored
+ for the store entry, currently the size of one (void *) pointer.
+
+\subsubsection policy.Dereferenced policy.Dereferenced()
+\code
+ policy->Dereferenced(RemovalPolicy *policy, const StoreEntry *, RemovalPolicyNode *node)
+\endcode
+
+\par
+ Tells the policy that a StoreEntry is no longer being referenced. Called
+ when an access to the entry has finished.
+
+\par
+ node is a pointer to where policy specific data is stored
+ for the store entry, currently the size of one (void *) pointer.
+
+\subsubsection policy.WalkInit policy.WalkInit()
+\code
+ RemovalPolicyWalker *walker = policy->WalkInit(RemovalPolicy *policy)
+\endcode
+
+\par
+ Initiates a walk of all objects in the policy instance.
+ The objects are returned in an order suitable for use
+ as the reinsertion order when rebuilding the policy.
+
+\par
+ The returned RemovalPolicyWalker instance is cbdata registered.
+
+\note The walk must be performed as an atomic operation
+ with no other policy actions intervening, or the outcome
+ will be undefined.
+
+\subsubsection walker.Next walker.Next()
+\code
+ const StoreEntry *entry = walker->Next(RemovalPolicyWalker *walker)
+\endcode
+
+\par
+ Gets the next object in the walk chain.
+
+\par
+ Returns NULL when there are no further objects.
+
+\subsubsection walker.Done walker.Done()
+\code
+ walker->Done(RemovalPolicyWalker *walker)
+\endcode
+
+\par
+ Finishes a walk of the maintained objects and destroys the
+ walker.
+
+\subsubsection policy.PurgeInit policy.PurgeInit()
+\code
+ RemovalPurgeWalker *purgewalker = policy->PurgeInit(RemovalPolicy *policy, int max_scan)
+\endcode
+
+\par
+ Initiates a search for removal candidates. Search depth is indicated
+ by max_scan.
+
+\par
+ The returned RemovalPurgeWalker instance is cbdata registered.
+
+\note The walk must be performed as an atomic operation
+ with no other policy actions intervening, or the outcome
+ will be undefined.
+
+\subsubsection purgewalker.Next purgewalker.Next()
+\code
+ StoreEntry *entry = purgewalker->Next(RemovalPurgeWalker *purgewalker)
+\endcode
+
+\par
+ Gets the next object to purge. The purgewalker will remove each
+ returned object from the policy.
+
+\par
+ It is the policy's responsibility to verify that the object
+ isn't locked or otherwise prevented from being removed. What this
+ means is that the policy must not return objects where
+ storeEntryLocked() is true.
+
+\par
+ Returns NULL when there are no further purgeable objects in the policy.
+
+\subsubsection purgewalker.Done purgewalker.Done()
+
+\code
+ purgewalker->Done(RemovalPurgeWalker *purgewalker)
+\endcode
+
+\par
+ Finishes a walk of the maintained objects, destroys the
+ walker and restores the policy to its normal state.
+
+\subsubsection policy.Stats policy.Stats()
+
+\code
+ policy->Stats(RemovalPolicy *policy, StoreEntry *entry)
+\endcode
+
+\par
+ Appends statistics about the policy to the given entry.
+
+\subsection SourceLayout Source layout
+
+\par
+ Policy implementations reside in src/repl/<name>/, and a make in
+ such a directory must result in an object archive src/repl/<name>.a
+ containing all the objects implementing the policy.
+
+\subsection InternalStructures Internal structures
+
+\subsubsection RemovalPolicy RemovalPolicy
+
+\code
+ typedef struct _RemovalPolicy RemovalPolicy;
+ struct _RemovalPolicy {
+     char *_type;
+     void *_data;
+     void (*add)(RemovalPolicy *policy, StoreEntry *);
+     ... /* see the API definition above */
+ };
+\endcode
+
+\par
+ The _type member is mainly for debugging and diagnostic purposes, and
+ should be a pointer to the name of the policy (same name as used for
+ creation).
+
+\par
+ The _data member is for storing policy specific information.
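+
+\par
+ For example (a sketch with hypothetical helper names and declarations
+ omitted), a policy's Add implementation would typically recover its
+ private state through _data and keep its per-entry handle in the node:
+\code
+ typedef struct _MyPolicyData MyPolicyData;   /* hypothetical private state */
+ static void *mypolicy_link(MyPolicyData *, StoreEntry *);
+
+ static void
+ mypolicy_add(RemovalPolicy *policy, StoreEntry *entry, RemovalPolicyNode *node)
+ {
+     MyPolicyData *mine = (MyPolicyData *) policy->_data;
+     /* link the entry into the policy's own ordering and remember the
+      * per-entry handle in the node (one void * of space) */
+     node->data = mypolicy_link(mine, entry);
+ }
+\endcode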
+
+\subsubsection RemovalPolicyWalker RemovalPolicyWalker
+
+\code
+ typedef struct _RemovalPolicyWalker RemovalPolicyWalker;
+ struct _RemovalPolicyWalker {
+     RemovalPolicy *_policy;
+     void *_data;
+     StoreEntry *(*next)(RemovalPolicyWalker *);
+     ... /* see the API definition above */
+ };
+\endcode
+
+
+\subsubsection RemovalPolicyNode RemovalPolicyNode
+
+\code
+ typedef struct _RemovalPolicyNode RemovalPolicyNode;
+ struct _RemovalPolicyNode {
+     void *data;
+ };
+\endcode
+
+\par
+ Stores policy specific information about an entry. Currently
+ there is only space for a single pointer, but plans are to
+ maybe later provide more space here to allow simple policies
+ to store all their data "inline" to preserve some memory.
+
+\subsection PolicyRegistration Policy Registration
+
+\par
+ Policies are automatically registered in the Squid binary from the
+ policy selection made by the user building Squid. In the future this
+ might get extended to support loadable modules. All registered
+ policies are available to object stores which wish to use them.
+
+\subsection PolicyInstanceCreation Policy instance creation
+
+\par
+ Each policy must implement a "create/new" function "RemovalPolicy *
+ createRemovalPolicy_<name>(char *arguments)". This function
+ creates the policy instance and populates it with at least the API
+ methods supported. Currently all API calls are mandatory, but the
+ policy implementation must make sure to NULL fill the structure prior
+ to populating it in order to assure future API compatibility.
+
+\par
+ It should also populate the _data member with a pointer to policy
+ specific data.
+
+\subsection Walker Walker
+
+\par
+ When a walker is created the policy populates it with at least the API
+ methods supported. Currently all API calls are mandatory, but the
+ policy implementation must make sure to NULL fill the structure prior
+ to populating it in order to assure future API compatibility.
+
+\subsection DesignNotes Design notes/bugs
+
+\par
+ The RemovalPolicyNode design is incomplete/insufficient. The intention
+ was to abstract the location of the index pointers from the policy
+ implementation to allow the policy to work on both on-disk and memory
+ caches, but unfortunately the purge method for HEAP based policies
+ needs to update this, and it is also preferable if the purge method
+ in general knows how to clear the information. I think the agreement
+ was that the current design of tightly coupling the two together
+ on one StoreEntry is not the best design possible.
+
+\par
+ It is debatable whether the design of having the policy index control
+ the clean index writes is the correct approach. Perhaps not. A more
+ appropriate design would probably be to do the store indexing
+ completely outside the policy implementation (i.e. using the hash
+ index), and only ask the policy to dump its state somehow.
+
+\par
+ The Referenced()/Dereferenced() calls are today mapped to lock/unlock,
+ which is an approximation of when they are intended to be called.
+ However, the real intention is to have Referenced() called whenever
+ an object is referenced, and Dereferenced() only called when the
+ object has actually been used for anything good.
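+
+\par
+ To tie the "Policy instance creation" rules above together, a skeleton
+ creation function might look roughly like this ("mypolicy" and the
+ mypolicy_* helpers are hypothetical and their declarations are omitted;
+ the member names follow the API description above, and the walker usage
+ mirrors the WalkInit/PurgeInit calls already documented):
+\code
+ RemovalPolicy *
+ createRemovalPolicy_mypolicy(char *arguments)
+ {
+     /* the returned instance is expected to be cbdata registered */
+     RemovalPolicy *policy = cbdataAlloc(RemovalPolicy);
+     /* NULL fill first, for future API compatibility */
+     memset(policy, 0, sizeof(*policy));
+     policy->_type = (char *) "mypolicy";
+     policy->_data = mypolicy_create_state(arguments);
+     policy->Free = mypolicy_free;
+     policy->Add = mypolicy_add;
+     policy->Remove = mypolicy_remove;
+     policy->Referenced = mypolicy_referenced;
+     policy->Dereferenced = mypolicy_dereferenced;
+     policy->WalkInit = mypolicy_walk_init;
+     policy->PurgeInit = mypolicy_purge_init;
+     policy->Stats = mypolicy_stats;
+     return policy;
+ }
+\endcode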
+ + */ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/13_ForwardingSelection.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,8 @@ +/** +\page 13_ForwardingSelection Forwarding Selection + +\section Infrastructure Infrastructure + +\todo Write documentation about Forwarding Selection + + */ --- /dev/null Mon Jul 23 00:19:27 2007 +++ squid3/doc/Programming-Guide/14_IPCacheAndFQDNCache.dox Mon Jul 23 00:19:27 2007 @@ -0,0 +1,78 @@ +/** +\page 14_IPCacheAndFQDNCache IP Cache and FQDN Cache + +\section Introduction Introduction + +\par + The IP cache is a built-in component of squid providing + Hostname to IP-Number translation functionality and managing + the involved data-structures. Efficiency concerns require + mechanisms that allow non-blocking access to these mappings. + The IP cache usually doesn't block on a request except for + special cases where this is desired (see below). + +\section DataStructures Data Structures + +\par + The data structure used for storing name-address mappings + is a small hashtable (static hash_table *ip_table), + where structures of type ipcache_entry whose most + interesting members are: +\code + struct _ipcache_entry { + char *name; + time_t lastref; + ipcache_addrs addrs; + struct _ip_pending *pending_head; + char *error_message; + unsigned char locks; + ipcache_status_t status:3; + } +\endcode + +\section ExternalOverview External Overview + +\par + Main functionality is provided through calls to: + +\par ipcache_nbgethostbyname(const char *name, IPH *handler, void *handlerdata) + Where name is the name of the host to resolve, + handler is a pointer to the function to be called when + the reply from the IP cache (or the DNS if the IP cache + misses) and handlerdata is information that is passed + to the handler and does not affect the IP cache. + +\par ipcache_gethostbyname(const char *name,int flags) + is different in that it only checks if an entry exists in + it's data-structures and does not by default contact the + DNS, unless this is requested, by setting the flags + to IP_BLOCKING_LOOKUP or IP_LOOKUP_IF_MISS. + +\par ipcache_init() + is called from mainInitialize() + after disk initialization and prior to the reverse fqdn + cache initialization + +\par ipcache_restart() + is called to clear the IP + cache's data structures, cancel all pending requests. + Currently, it is only called from mainReconfigure. + + +\section InternalOperation Internal Operation + +\par + Internally, the execution flow is as follows: On a miss, + ipcache_getnbhostbyname checks whether a request for + this name is already pending, and if positive, it creates + a new entry using ipcacheAddNew with the IP_PENDING + flag set . Then it calls ipcacheAddPending to add a + request to the queue together with data and handler. Else, + ipcache_dnsDispatch() is called to directly create a + DNS query or to ipcacheEnqueue() if all no DNS port + is free. ipcache_call_pending() is called regularly + to walk down the pending list and call handlers. LRU clean-up + is performed through ipcache_purgelru() according to + the ipcache_high threshold. 
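+
+\par
+ A non-blocking lookup from a calling module therefore looks roughly like
+ the sketch below. The exact IPH handler prototype is an assumption here
+ (a callback receiving the ipcache_addrs result and the opaque
+ handlerdata); check the ipcache source for the real signature.
+\code
+ static void
+ myLookupDone(const ipcache_addrs *ia, void *data)
+ {
+     if (ia == NULL) {
+         /* the lookup failed */
+         return;
+     }
+     /* pick one of the returned addresses and continue the request */
+ }
+
+ static void
+ myLookupStart(const char *host, void *context)
+ {
+     /* the handler is called from the IP cache, or after a DNS query
+      * on a cache miss */
+     ipcache_nbgethostbyname(host, myLookupDone, context);
+ }
+\endcode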
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/15_ServerProtocols.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,22 @@
+/**
+\page 15_ServerProtocols Server Protocols
+
+\section HTTP HTTP
+\todo Write Documentation about HTTP
+
+\section HFTP FTP
+\todo Write Documentation about FTP
+
+\section Gopher Gopher
+\todo Write Documentation about Gopher
+
+\section Wais Wais
+\todo Write Documentation about Wais
+
+\section SSL SSL
+\todo Write Documentation about SSL
+
+\section Passthru Passthru
+\todo Write Documentation about Passthru
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/16_Timeouts.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,8 @@
+/**
+\page 16_Timeouts Timeouts
+
+\section Infrastructure Infrastructure
+
+\todo Write documentation on Timeouts
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/17_Events.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,8 @@
+/**
+\page 17_Events Events
+
+\section Infrastructure Infrastructure
+
+\todo Write documentation on Events
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/18_AccessControls.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,8 @@
+/**
+\page 18_AccessControls Access Controls
+
+\section Infrastructure Infrastructure
+
+\todo Write documentation on ACL Access Controls
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/19_AuthenticationFramework.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,420 @@
+/**
+\page 19_AuthenticationFramework Authentication Framework
+
+\par
+ Squid's authentication system is responsible for reading
+ authentication credentials from HTTP requests and deciding
+ whether or not those credentials are valid. This functionality
+ resides in two separate components: Authentication Schemes
+ and Authentication Modules.
+
+\par
+ An Authentication Scheme describes how Squid gets the
+ credentials (i.e. username, password) from user requests.
+ Squid currently supports two authentication schemes: Basic
+ and NTLM. Basic authentication uses the WWW-Authenticate
+ HTTP header. The Authentication Scheme code is implemented
+ inside Squid itself.
+
+\par
+ An Authentication Module takes the credentials received
+ from a client's request and tells Squid if they
+ are valid. Authentication Modules are implemented
+ externally from Squid, as child helper processes.
+ Authentication Modules interface with various types of
+ authentication databases, such as LDAP, PAM, NCSA-style
+ password files, and more.
+
+\section AuthenticationSchemeAPI Authentication Scheme API
+
+\subsection DefinitionOfAuthenticationScheme Definition of an Authentication Scheme
+
+\par
+ An auth scheme in squid is the collection of functions required to
+ manage the authentication process for a given HTTP authentication
+ scheme. Existing auth schemes in squid are Basic and NTLM. Other HTTP
+ schemes (see for example RFC 2617) have been published and could be
+ implemented in squid. The terms auth scheme and auth module are
+ interchangeable. An auth module is not to be confused with an
+ authentication helper, which is a scheme specific external program used
+ by a specific scheme to perform data manipulation external to squid.
+ Typically this involves comparing the browser submitted credentials with
+ those in the organization's user directory.
+
+\par
+ Auth modules SHOULD NOT perform access control functions. Squid has
+ advanced caching access control functionality already. Future work in
+ squid will allow an auth scheme helper to return group information for a
+ user, to allow Squid to more seamlessly implement access control.
+
+\subsection Functions Function typedefs
+
+\par
+ Each function related to the general case of HTTP authentication has
+ a matching typedef. There are some additional function types used to
+ register/initialize, deregister/shutdown and provide stats on auth
+ modules:
+
+\par typedef int AUTHSACTIVE();
+ The Active function is used by squid to determine whether
+ the auth module has successfully initialised itself with
+ the current configuration.
+
+\par typedef int AUTHSCONFIGURED();
+ The configured function is used to see if the auth module
+ has been given valid parameters and is able to handle
+ authentication requests if initialised. If configured
+ returns 0 no other module functions except
+ Shutdown/Dump/Parse/FreeConfig will be called by Squid.
+
+\par typedef void AUTHSSETUP(authscheme_entry_t *);
+ Functions of type AUTHSSETUP are used to register an
+ auth module with squid. The registration function MUST be
+ named "authSchemeSetup_SCHEME" where SCHEME is the auth_scheme
+ as defined by RFC 2617. Only one auth scheme registered in
+ squid can provide functionality for a given auth_scheme.
+ (i.e. only one auth module can handle Basic, only one can
+ handle Digest and so forth). The Setup function is responsible
+ for registering the functions in the auth module into the
+ passed authscheme_entry_t. The authscheme_entry_t should
+ never be NULL; if it is, the auth module should log an
+ error and do nothing. The other functions can have any
+ desired name that does not collide with any statically
+ linked function name within Squid. It is recommended to
+ use names of the form "authe_SCHEME_FUNCTIONNAME" (for
+ example authenticate_NTLM_Active is the Active() function
+ for the NTLM auth module).
+
+\par typedef void AUTHSSHUTDOWN(void);
+ Functions of type AUTHSSHUTDOWN are responsible for
+ freeing any resources used by the auth modules. The shutdown
+ function will be called before squid reconfigures, and
+ before squid shuts down.
+
+\par typedef void AUTHSINIT(authScheme *);
+ Functions of type AUTHSINIT are responsible for allocating
+ any needed resources for the authentication module. AUTHSINIT
+ functions are called after each configuration takes place,
+ before any new requests are made.
+
+\par typedef void AUTHSPARSE(authScheme *, int, char *);
+ Functions of type AUTHSPARSE are responsible for parsing
+ authentication parameters. The function currently needs a
+ scheme scope data structure to store the configuration in.
+ The passed scheme's scheme_data pointer should point to
+ the local data structure. Future development will allow
+ all authentication schemes direct access to their configuration
+ data without a locally scoped structure. The parse function
+ is called by Squid's config file parser when an auth_param
+ scheme_name entry is encountered.
+
+\par typedef void AUTHSFREECONFIG(authScheme *);
+ Functions of type AUTHSFREECONFIG are called by squid
+ when freeing configuration data. The auth scheme should
+ free any memory allocated that is related to parse data
+ structures. The scheme MAY take advantage of this call to
+ remove scheme local configuration dependent data (i.e. cached
+ user details that are only relevant to a config setting).
+ +\par typedef void AUTHSDUMP(StoreEntry *, const char *, authScheme *); + Functions of type AUTHSDUMP are responsible for writing + to the StoreEntry the configuration parameters that a user + would put in a config file to recreate the running + configuration. + +\par typedef void AUTHSSTATS(StoreEntry *); + Functions of type AUTHSSTATS are called by the cachemgr + to provide statistics on the authmodule. Current modules + simply provide the statistics from the back end helpers + (number of requests, state of the helpers), but more detailed + statistics are possible - for example unique users seen or + failed authentication requests. + +\par + The next set of functions + work on the data structures used by the authentication schemes. + +\par typedef void AUTHSREQFREE(auth_user_request_t *); + The AUTHSREQFREE function is called when a auth_user_request is being + freed by the authentication framework, and scheme specific data was + present. The function should free any scheme related data and MUST set + the scheme_data pointer to NULL. Failure to unlink the scheme data will + result in squid dying. + +\par typedef char *AUTHSUSERNAME(auth_user_t *); + Squid does not make assumptions about where the username + is stored. This function must return a pointer to a NULL + terminated string to be used in logging the request. Return + NULL if no username/usercode is known. The string should + NOT be allocated each time this function is called. + +\par typedef int AUTHSAUTHED(auth_user_request_t *); + The AUTHED function is used by squid to determine whether + the auth scheme has successfully authenticated the user + request. If timeouts on cached credentials have occurred + or for any reason the credentials are not valid, return + false. + +\par + The next set of functions perform the actual + authentication. The functions are used by squid for both + WWW- and Proxy- authentication. Therefore they MUST NOT + assume the authentication will be based on the Proxy-* + Headers. + +\par typedef void AUTHSAUTHUSER(auth_user_request_t *, request_t *, ConnStateData *, http_hdr_type); + Functions of type AUTHSAUTHUSER are called when Squid + has a request that needs authentication. If needed the auth + scheme can alter the auth_user pointer (usually to point + to a previous instance of the user whose name is discovered + late in the auth process. For an example of this see the + NTLM scheme). These functions are responsible for performing + any in-squid routines for the authentication of the user. + The auth_user_request struct that is passed around is only + persistent for the current request. If the auth module + requires access to the structure in the future it MUST lock + it, and implement some method for identifying it in the + future. For example the NTLM module implements a connection + based authentication scheme, so the auth_user_request struct + gets referenced from the ConnStateData. + +\par typedef void AUTHSDECODE(auth_user_request_t *, const char *); + Functions of type AUTHSDECODE are responsible for decoding the passed + authentication header, creating or linking to a auth_user struct and for + storing any needed details to complete authentication in AUTHSAUTHUSER. + +\par typedef int AUTHSDIRECTION(auth_user_request_t *); + Functions of type AUTHSDIRECTION are used by squid to determine what + the next step in performing authentication for a given scheme is. The + following are the return codes: + + \li -2 = error in the auth module. Cannot determine request direction. 
+ \li -1 = the auth module needs to send data to an external helper. + Squid will prepare for a callback on the request and call the + AUTHSSTART function. + \li 0 = the auth module has all the information it needs to + perform the authentication and provide a succeed/fail result. + \li 1 = the auth module needs to send a new challenge to the + request originator. Squid will return the appropriate status code + (401 or 407) and call the registered FixError function to allow the + auth module to insert it's challenge. + +\par typedef void AUTHSFIXERR(auth_user_request_t *, HttpReply *, http_hdr_type, request_t *); + Functions of type AUTHSFIXERR are used by squid to add scheme + specific challenges when returning a 401 or 407 error code. On requests + where no authentication information was provided, all registered auth + modules will have their AUTHSFIXERR function called. When the client + makes a request with an authentication header, on subsequent calls only the matching + AUTHSFIXERR function is called (and then only if the auth module + indicated it had a new challenge to send the client). If no auth schemes + match the request, the authentication credentials in the request are + ignored - and all auth modules are called. + +\par typedef void AUTHSFREE(auth_user_t *); + These functions are responsible for freeing scheme specific data from + the passed auth_user_t structure. This should only be called by squid + when there are no outstanding requests linked to the auth user. This includes + removing the user from any scheme specific memory caches. + +\par typedef void AUTHSADDHEADER(auth_user_request_t *, HttpReply *, int); + +\par typedef void AUTHSADDTRAILER(auth_user_request_t *, HttpReply *, int); + These functions are responsible for adding any authentication + specific header(s) or trailer(s) OTHER THAN the WWW-Authenticate and + Proxy-Authenticate headers to the passed HttpReply. The int indicates + whether the request was an accelerated request or a proxied request. For + example operation see the digest auth scheme. (Digest uses a + Authentication-Info header.) This function is called whenever a + auth_user_request exists in a request when the reply is constructed + after the body is sent on chunked replies respectively. + +\par typedef void AUTHSONCLOSEC(ConnStateData *); + This function type is called when a auth_user_request is + linked into a ConnStateData struct, and the connection is closed. If any + scheme specific activities related to the request or connection are in + progress, this function MUST clear them. + +\par typedef void AUTHSSTART(auth_user_request_t * , RH * , void *); + This function type is called when squid is ready to put the request + on hold and wait for a callback from the auth module when the auth + module has performed it's external activities. + +\subsection DataStructures Data Structures + +\par + This is used to link auth_users into the username cache. + Because some schemes may link in aliases to a user, the + link is not part of the auth_user structure itself. + +\code +struct _auth_user_hash_pointer { + /* first two items must be same as hash_link */ + char *key; + auth_user_hash_pointer *next; + auth_user_t *auth_user; + dlink_node link; /* other hash entries that point to the same auth_user */ +}; +\endcode + +\par + This is the main user related structure. It stores user-related data, + and is persistent across requests. It can even persistent across + multiple external authentications. 
One major benefit of preserving this + structure is the cached ACL match results. This structure, is private to + the authentication framework. + +\code +struct _auth_user_t { + /* extra fields for proxy_auth */ + /* this determines what scheme owns the user data. */ + auth_type_t auth_type; + /* the index +1 in the authscheme_list to the authscheme entry */ + int auth_module; + /* we only have one username associated with a given auth_user struct */ + auth_user_hash_pointer *usernamehash; + /* we may have many proxy-authenticate strings that decode to the same user*/ + dlink_list proxy_auth_list; + dlink_list proxy_match_cache; + struct { + unsigned int credentials_ok:2; /*0=unchecked,1=ok,2=failed*/ + } flags; + long expiretime; + /* IP addr this user authenticated from */ + struct IN_ADDR ipaddr; + time_t ip_expiretime; + /* how many references are outstanding to this instance*/ + size_t references; + /* the auth scheme has it's own private data area */ + void *scheme_data; + /* the auth_user_request structures that link to this. Yes it could be a splaytree + * but how many requests will a single username have in parallel? */ + dlink_list requests; +}; +\endcode + +\par + This is a short lived structure is the visible aspect of the + authentication framework. + +\code +struct _auth_user_request_t { + /* this is the object passed around by client_side and acl functions */ + /* it has request specific data, and links to user specific data */ + /* the user */ + auth_user_t *auth_user; + /* return a message on the 401/407 error pages */ + char *message; + /* any scheme specific request related data */ + void *scheme_data; + /* how many 'processes' are working on this data */ + size_t references; +}; +\endcode + +\par + The authscheme_entry struct is used to store the runtime + registered functions that make up an auth scheme. An auth + scheme module MUST implement ALL functions except the + following functions: oncloseconnection, AddHeader, AddTrailer.. + In the future more optional functions may be added to this + data type. + +\code +struct _authscheme_entry { + char *typestr; + AUTHSACTIVE *Active; + AUTHSADDHEADER *AddHeader; + AUTHSADDTRAILER *AddTrailer; + AUTHSAUTHED *authenticated; + AUTHSAUTHUSER *authAuthenticate; + AUTHSDUMP *dump; + AUTHSFIXERR *authFixHeader; + AUTHSFREE *FreeUser; + AUTHSFREECONFIG *freeconfig; + AUTHSUSERNAME *authUserUsername; + AUTHSONCLOSEC *oncloseconnection; /*optional*/ + AUTHSDECODE *decodeauth; + AUTHSDIRECTION *getdirection; + AUTHSPARSE *parse; + AUTHSINIT *init; + AUTHSREQFREE *requestFree; + AUTHSSHUTDOWN *donefunc; + AUTHSSTART *authStart; + AUTHSSTATS *authStats; +}; +\endcode + +\par + For information on the requirements for each of the + functions, see the details under the typedefs above. For + reference implementations, see the squid source code, + /src/auth/basic for a request based stateless auth module, + and /src/auth/ntlm for a connection based stateful auth + module. + +\subsection HowToAddAuthenitcationSchemes How to add a new Authentication Scheme + +\par + Copy the nearest existing auth scheme and modify to receive the + appropriate scheme headers. Now step through the acl.c MatchAclProxyUser + function's code path and see how the functions call down through + authenticate.c to your scheme. Write a helper to provide you scheme with + any backend existence it needs. Remember any blocking code must go in + AUTHSSTART function(s) and _MUST_ use callbacks. 
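+
+\par
+ As a sketch of the registration requirement above, a scheme called "foo"
+ (hypothetical, with the authenticate_foo_* helper declarations omitted)
+ would provide something along these lines, filling in the
+ authscheme_entry fields listed earlier:
+\code
+ void
+ authSchemeSetup_foo(authscheme_entry_t *authscheme)
+ {
+     if (authscheme == NULL) {
+         /* should never happen; log an error and do nothing */
+         return;
+     }
+     authscheme->Active = authenticate_foo_Active;
+     authscheme->parse = authenticate_foo_parse;
+     authscheme->init = authenticate_foo_init;
+     authscheme->freeconfig = authenticate_foo_freeconfig;
+     authscheme->dump = authenticate_foo_dump;
+     authscheme->authAuthenticate = authenticate_foo_authenticate;
+     authscheme->authenticated = authenticate_foo_authenticated;
+     authscheme->decodeauth = authenticate_foo_decode;
+     authscheme->getdirection = authenticate_foo_direction;
+     authscheme->authFixHeader = authenticate_foo_fixHeader;
+     authscheme->authUserUsername = authenticate_foo_username;
+     authscheme->FreeUser = authenticate_foo_freeUser;
+     authscheme->requestFree = authenticate_foo_requestFree;
+     authscheme->authStart = authenticate_foo_start;
+     authscheme->authStats = authenticate_foo_stats;
+     authscheme->donefunc = authenticate_foo_shutdown;
+     /* oncloseconnection, AddHeader and AddTrailer are optional and
+      * left NULL here */
+ }
+\endcode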
+
+\subsection HowToHookInNewFunctions How to "hook in" new functions to the API
+
+\par
+ Start off by figuring out the code path that will result in
+ the function being called, and what data it will need. Then
+ create a typedef for the function, and add an entry to the
+ authscheme_entry struct. Add a wrapper function to
+ authenticate.c (or if appropriate cf_cache.c) that calls
+ the scheme specific function if it exists. Test it. Test
+ it again. Now port to all the existing auth schemes, or at
+ least add a setting of NULL for the function for each
+ scheme.
+
+\section AuthenticationModuleInterface Authentication Module Interface
+
+\subsection BasicAuthenticationModules Basic Authentication Modules
+
+\par
+Basic authentication provides a username and password. These
+are written to the authentication module processes on a single
+line, separated by a space:
+\code
+ username password
+\endcode
+
+\par
+ The authentication module process reads username, password pairs
+ on stdin and returns either "OK" or "ERR" on stdout for
+ each input line.
+
+\par
+ The following simple perl script demonstrates how the
+ authentication module works. This script allows any
+ user named "Dirk" (without checking the password)
+ and allows any user that uses the password "Sekrit":
+
+\code
+#!/usr/bin/perl -w
+$|=1; # no buffering, important!
+while (<>) {
+ chop;
+ ($u,$p) = split;
+ $ans = &check($u,$p);
+ print "$ans\n";
+}
+
+sub check {
+ local($u,$p) = @_;
+ return 'ERR' unless (defined $p && defined $u);
+ return 'OK' if ('Dirk' eq $u);
+ return 'OK' if ('Sekrit' eq $p);
+ return 'ERR';
+}
+\endcode
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/20_ICP.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,8 @@
+/**
+\page 20_ICP ICP
+
+\section Infrastructure Infrastructure
+
+\todo Write documentation on ICP
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/21_NetDB.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,8 @@
+/**
+\page 21_NetDB NetDB
+
+\section Infrastructure Infrastructure
+
+\todo Write documentation on NetDB
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/22_ErrorPages.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,8 @@
+/**
+\page 22_ErrorPages Error Pages
+
+\section Infrastructure Infrastructure
+
+\todo Write documentation on Error Pages
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/23_CallbackDataAllocator.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,303 @@
+/**
+\page 23_CallbackDataAllocator Callback Data Allocator
+
+\par
+ Squid's extensive use of callback functions makes it very
+ susceptible to memory access errors. To address this, all callback
+ functions make use of a construct called "cbdata". This allows
+ functions doing callbacks to verify that the caller is still
+ valid before making the callback.
+
+\par
+ Note: cbdata is intended for callback data and is tailored specifically
+ to make callbacks less dangerous, leaving as few windows of error as
+ possible. It is not suitable or intended as a generic reference-counted
+ memory allocator.
+
+\section API API
+
+\subsection CBDATA_TYPE CBDATA_TYPE
+\code
+ CBDATA_TYPE(datatype);
+\endcode
+
+\par
+ Macro that defines a new cbdata datatype. Similar to a variable
+ or struct definition. Scope is always local to the file/block
+ where it is defined and all calls to cbdataAlloc for this type
+ must be within the same scope as the CBDATA_TYPE declaration.
+ Allocated entries may be referenced or freed anywhere with no
+ restrictions on scope.
+
+\subsection CBDATA_GLOBAL_TYPE CBDATA_GLOBAL_TYPE
+\code
+ /* Module header file */
+ extern CBDATA_GLOBAL_TYPE(datatype);
+
+ /* Module main C file */
+ CBDATA_GLOBAL_TYPE(datatype);
+\endcode
+
+\par
+ Defines a global cbdata type that can be referenced anywhere in
+ the code.
+
+\subsection CBDATA_INIT_TYPE CBDATA_INIT_TYPE
+\code
+ CBDATA_INIT_TYPE(datatype);
+ /* or */
+ CBDATA_INIT_TYPE_FREECB(datatype, FREE *freehandler);
+\endcode
+
+\par
+ Initializes the cbdata type. Must be called prior to the first use of
+ cbdataAlloc() for the type.
+
+\par
+ The freehandler is called when the last known reference to an
+ allocated entry goes away.
+
+\subsection cbdataAlloc cbdataAlloc
+\code
+ pointer = cbdataAlloc(datatype);
+\endcode
+
+\par
+ Allocates a new entry of a registered cbdata type.
+
+\subsection cbdataFree cbdataFree
+\code
+ cbdataFree(pointer);
+\endcode
+
+\par
+ Frees an entry allocated by cbdataAlloc().
+
+\note If there are active references to the entry then the entry
+ will be freed when the last reference is removed. However,
+ cbdataReferenceValid() will return false for those references.
+
+\subsection cbdataReference cbdataReference
+\code
+ reference = cbdataReference(pointer);
+\endcode
+
+\par
+ Creates a new reference to a cbdata entry. Used when you need to
+ store a reference in another structure. The reference can later
+ be verified for validity by cbdataReferenceValid().
+
+\note The reference variable is a pointer to the entry, in all
+ aspects identical to the original pointer. But semantically it
+ is quite different. It is best if the reference is thought of
+ and handled as a "void *".
+
+\subsection cbdataReferenceDone cbdataReferenceDone
+\code
+ cbdataReferenceDone(reference);
+\endcode
+
+\par
+ Removes a reference created by cbdataReference().
+
+\note The reference variable will be automatically cleared to NULL.
+
+\subsection cbdataReferenceValid cbdataReferenceValid
+\code
+ if (cbdataReferenceValid(reference)) {
+     ...
+ }
+\endcode
+
+\par
+ cbdataReferenceValid() returns false if a reference is stale (refers to an
+ entry freed by cbdataFree()).
+
+\subsection cbdataReferenceValidDone cbdataReferenceValidDone
+\code
+ void *pointer;
+ bool cbdataReferenceValidDone(reference, &pointer);
+\endcode
+
+\par
+ Removes a reference created by cbdataReference() and checks
+ it for validity. A temporary pointer to the referenced data
+ (if valid) is returned in the &pointer argument.
+
+\par
+ Meant to be used on the last dereference, usually to make
+ a callback.
+
+\code
+ void *cbdata;
+ ...
+ if (cbdataReferenceValidDone(reference, &cbdata))
+     callback(..., cbdata);
+\endcode
+
+\note The reference variable will be automatically cleared to NULL.
+
+\section Examples Examples
+\par
+ Here you can find some examples of how to use cbdata, and why it is
+ needed.
+
+\subsection AsyncOpWithoutCBDATA Asynchronous operation without cbdata, showing why cbdata is needed
+\par
+ For an asynchronous operation with callback functions, the normal
+ sequence of events in programs NOT using cbdata is as follows:
+
+\code
+ /* initialization */
+ type_of_data our_data;
+ ...
+ our_data = malloc(...);
+ ...
+ /* Initiate an asynchronous operation, with our_data as callback_data */
+ fooOperationStart(bar, callback_func, our_data);
+ ...
+ /* The asyncronous operation completes and makes the callback */ + callback_func(callback_data, ....); + /* Some time later we clean up our data */ + free(our_data); +\endcode + +\par + However, things become more interesting if we want or need + to free the callback_data, or otherwise cancel the callback, + before the operation completes. In constructs like this you + can quite easily end up with having the memory referenced + pointed to by callback_data freed before the callback is invoked + causing a program failure or memory corruption: + +\code + /* initialization */ + type_of_data our_data; + ... + our_data = malloc(...); + ... + /* Initiate a asyncronous operation, with our_data as callback_data */ + fooOperationStart(bar, callback_func, our_data); + ... + /* ouch, something bad happened elsewhere.. try to cleanup + * but the programmer forgot there is a callback pending from + * fooOperationsStart() (an easy thing to forget when writing code + * to deal with errors, especially if there may be many different + * pending operation) + */ + free(our_data); + ... + /* The asyncronous operation completes and makes the callback */ + callback_func(callback_data, ....); + /* CRASH, the memory pointer to by callback_data is no longer valid + * at the time of the callback + */ +\endcode + +\subsection AsyncOpWithCBDATA Asyncronous operation with cbdata + +\par + The callback data allocator lets us do this in a uniform and + safe manner. The callback data allocator is used to allocate, + track and free memory pool objects used during callback + operations. Allocated memory is locked while the asyncronous + operation executes elsewhere, and is freed when the operation + completes. The normal sequence of events is: + +\code + /* initialization */ + type_of_data our_data; + ... + our_data = cbdataAlloc(type_of_data); + ... + /* Initiate a asyncronous operation, with our_data as callback_data */ + fooOperationStart(..., callback_func, our_data); + ... + /* foo */ + void *local_pointer = cbdataReference(callback_data); + .... + /* The asyncronous operation completes and makes the callback */ + void *cbdata; + if (cbdataReferenceValidDone(local_pointer, &cbdata)) + callback_func(...., cbdata); + ... + cbdataFree(our_data); +\endcode + +\subsection AsynchronousOpCancelledByCBDATA Asynchronous operation cancelled by cbdata + +\par + With this scheme, nothing bad happens if cbdataFree gets called + before fooOperantionComplete(...). + +\code + /* initialization */ + type_of_data our_data; + ... + our_data = cbdataAlloc(type_of_data); + ... + /* Initiate a asyncronous operation, with our_data as callback_data */ + fooOperationStart(..., callback_func, our_data); + ... + /* foo */ + void *local_pointer = cbdataReference(callback_data); + .... + /* something bad happened elsewhere.. cleanup */ + cbdataFree(our_data); + ... + /* The asyncronous operation completes and tries to make the callback */ + void *cbdata; + if (cbdataReferenceValidDone(local_pointer, &cbdata)) + /* won't be called, as the data is no longer valid */ + callback_func(...., cbdata); + +\endcode + +\par + In this case, when cbdataFree is called before + cbdataReferenceValidDone, the callback_data gets marked as invalid. + When the callback_data is invalid before executing the callback + function, cbdataReferenceValidDone will return 0 and + callback_func is never executed. 
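+
+\par
+ For completeness, the callee side of the hypothetical fooOperationStart()
+ used in these examples is what actually holds the reference. A sketch
+ (single outstanding operation only; the callback type is invented for
+ the example):
+\code
+ typedef void FOOCB(void *data);      /* hypothetical callback type */
+
+ static FOOCB *foo_callback;
+ static void *foo_callback_data;
+
+ void
+ fooOperationStart(int fd, FOOCB *callback, void *callback_data)
+ {
+     foo_callback = callback;
+     /* keep a cbdata reference so validity can be checked later */
+     foo_callback_data = cbdataReference(callback_data);
+     /* ... start the asynchronous work here ... */
+ }
+
+ void
+ fooOperationComplete(void)
+ {
+     void *cbdata;
+     /* only call back if the callback data is still valid; the
+      * reference is released either way */
+     if (cbdataReferenceValidDone(foo_callback_data, &cbdata))
+         foo_callback(cbdata);
+ }
+\endcode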
+
+\subsection AddingCBDATAType Adding a new cbdata registered type
+
+\par
+ To add new module specific data types to the allocator one uses the
+ macros CBDATA_TYPE and CBDATA_INIT_TYPE. These create a local cbdata
+ definition (file or block scope). Any cbdataAlloc calls must be made
+ within this scope. However, cbdataFree might be called from anywhere.
+\code
+ /* First the cbdata type needs to be defined in the module. This
+  * is usually done at file scope, but it can also be local to a
+  * function or block.
+  */
+ CBDATA_TYPE(type_of_data);
+
+ /* Then in the code somewhere before the first allocation
+  * (can be called multiple times with only a minimal overhead)
+  */
+ CBDATA_INIT_TYPE(type_of_data);
+ /* Or if a free function is associated with the data type. This
+  * function is responsible for cleaning up any dependencies etc.
+  * referenced by the structure and is called on cbdataFree or
+  * when the last reference is deleted by cbdataReferenceDone /
+  * cbdataReferenceValidDone
+  */
+ CBDATA_INIT_TYPE_FREECB(type_of_data, free_function);
+\endcode
+
+\subsection AddingGlobalCBDATATypes Adding a new cbdata registered data type globally
+
+\par
+ To add new global data types that can be allocated from anywhere
+ within the code, one has to add them to the cbdata_type enum in
+ enums.h, and a corresponding CREATE_CBDATA call in
+ cbdata.c:cbdataInit(). Or alternatively add a CBDATA_GLOBAL_TYPE
+ definition to globals.h as shown below and use CBDATA_INIT_TYPE at
+ the appropriate location(s) as described above.
+
+\code
+ extern CBDATA_GLOBAL_TYPE(type_of_data); /* CBDATA_UNDEF */
+\endcode
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/24_RefCountDataAllocator.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,154 @@
+/**
+\page 24_RefCountDataAllocator Reference Counting Data Allocator (C++ Only)
+
+\note This is only available in Squid 3.x C++ code.
+
+\par
+ Manual reference counting such as cbdata uses is error prone,
+ and time consuming for the programmer. C++'s operator overloading
+ allows us to create automatic reference counting pointers, that will
+ free objects when they are no longer needed. With some care these
+ objects can be passed to functions needing Callback Data pointers.
+
+\section API API
+\par
+ There are two classes involved in the automatic refcounting - a
+ RefCountable class that provides the mechanics for reference
+ counting a given derived class, and a RefCount class that is the
+ smart pointer, handles const correctness, and tells the RefCountable
+ class of references and dereferences.
+
+\subsection RefCountable RefCountable
+\par
+ The RefCountable base class defines one abstract function -
+ deleteSelf(). You must implement deleteSelf() for each concrete
+ class. deleteSelf() is a workaround for 'operator delete' not
+ being virtual. deleteSelf() typically looks like:
+\code
+ void deleteSelf() const { delete this; }
+\endcode
+
+\subsection RefCount RefCount
+\par
+ The RefCount template class replaces pointers as parameters and
+ variables of the class being reference counted. Typically one creates
+ a typedef to aid users.
+
+\code
+ class MyConcrete : public RefCountable {
+ public:
+     typedef RefCount<MyConcrete> Pointer;
+     void deleteSelf() const {delete this;}
+ };
+\endcode
+ Now, one can pass objects of MyConcrete::Pointer around.
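+
+\par
+ For example (illustrative only), callers then hold and pass the object
+ through the smart pointer rather than through a raw MyConcrete *:
+\code
+ void doSomethingWith(MyConcrete::Pointer p);   /* copies share ownership */
+
+ void
+ useMyConcrete(void)
+ {
+     /* the new object is immediately owned by a smart pointer */
+     MyConcrete::Pointer p = new MyConcrete();
+     doSomethingWith(p);
+     /* the object is destroyed via deleteSelf() once the last Pointer
+      * goes out of scope or is assigned NULL */
+ }
+\endcode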
+
+\subsection CBDATA CBDATA
+\par
+ To make a refcounting CBDATA class, you need to overload new and delete,
+ include a macro in your class definition, and ensure that everyone
+ who would call you directly (not as a cbdata callback, but as a normal
+ use) holds a RefCount<> smart pointer to you.
+
+\code
+ class MyConcrete : public RefCountable {
+ public:
+     typedef RefCount<MyConcrete> Pointer;
+     void * operator new(size_t);
+     void operator delete (void *);
+     void deleteSelf() const {delete this;}
+ private:
+     CBDATA_CLASS(MyConcrete);
+ };
+
+ ...
+ /* In your .cc file */
+ CBDATA_CLASS_INIT(MyConcrete);
+
+ void *
+ MyConcrete::operator new (size_t)
+ {
+     CBDATA_INIT_TYPE(MyConcrete);
+     MyConcrete *result = cbdataAlloc(MyConcrete);
+     /* Mark result as being owned - we want the refcounter to do the
+      * delete call
+      */
+     cbdataReference(result);
+     return result;
+ }
+
+ void
+ MyConcrete::operator delete (void *address)
+ {
+     MyConcrete *t = static_cast<MyConcrete *>(address);
+     cbdataFree(address);
+     /* And allow the memory to be freed */
+     cbdataReferenceDone (t);
+ }
+\endcode
+
+\par
+ When no RefCount smart pointers exist, the object's
+ delete method will be called. This will run the object destructor,
+ freeing any foreign resources it holds. Then cbdataFree
+ will be called, marking the object as invalid for all the cbdata
+ functions that it may have queued. When they all return, the actual
+ memory will be returned to the pool.
+
+\subsection UsingRefCounter Using the Refcounter
+\par
+ Allocation and deallocation of refcounted objects (including those of
+ the RefCount template class) must be done via new() and delete(). If a
+ class that will hold an instance of a RefCount <foo> variable
+ does not use delete(), you must assign NULL to the variable before
+ it is freed. Failure to do this will result in memory leaks. You HAVE
+ been warned.
+
+\par
+ Never call delete or deleteSelf on a RefCountable object. You will
+ create a large number of dangling references and squid will segfault
+ eventually.
+
+\par
+ Always create at least one RefCount smart pointer, so that the
+ reference counting mechanism will delete the object when it is no
+ longer needed.
+
+\par
+ Do not pass RefCount smart pointers outside the squid memory space.
+ They will invariably segfault when copied.
+
+\par
+ If, in a method, all other smart pointer holding objects may be deleted
+ or may set their smart pointers to NULL, then you will be deleted
+ partway through the method (and thus crash). To prevent this, assign
+ a smart pointer to yourself:
+
+\code
+ void
+ MyConcrete::aMethod(){
+     /* This holds a reference to us */
+     Pointer aPointer(this);
+     /* This is a method that may mean we don't need to exist anymore */
+     someObject->someMethod();
+     /* This prevents aPointer being optimised away before this point,
+      * and must be the last line in our method
+      */
+     aPointer = NULL;
+ }
+\endcode
+
+\par
+ Calling methods via smart pointers is easy: just dereference via ->
+\code
+ void
+ SomeObject::someFunction() {
+     myConcretePointer->someOtherMethod();
+ }
+\endcode
+
+\par
+ When passing RefCount smart pointers, always pass them as their
+ native type, never as '*' or as '&'.
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/25_CacheManager.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,8 @@
+/**
+\page 25_CacheManager Cache Manager
+
+\section Infrastructure Infrastructure
+
+\todo Write documentation for Cache Manager
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/26_HTTPHeaders.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,235 @@
+/**
+\page 26_HTTPHeaders HTTP Headers
+
+\par Files:
+ \li HttpHeader.c
+ \li HttpHeaderTools.c
+ \li HttpHdrCc.c
+ \li HttpHdrContRange.c
+ \li HttpHdrExtField.c
+ \li HttpHdrRange.c
+
+\par
+ The HttpHeader class encapsulates methods and data for HTTP header
+ manipulation. HttpHeader can be viewed as a collection of HTTP
+ header-fields with such common operations as add, delete, and find.
+ Compared to an ascii "string" representation, HttpHeader performs
+ those operations without rebuilding the underlying structures from
+ scratch or searching through the entire "string".
+
+\section General General remarks
+\par
+ HttpHeader is a collection (or array) of HTTP header-fields. A header
+ field is represented by an HttpHeaderEntry object. HttpHeaderEntry is
+ an (id, name, value) triplet. Meaningful "Id"s are defined for
+ "well-known" header-fields like "Connection" or "Content-Length".
+ When Squid fails to recognize a field, it uses a special "id",
+ HDR_OTHER. Ids are formed by capitalizing the corresponding HTTP
+ header-field name and replacing dashes ('-') with underscores ('_').
+\par
+ Most operations on HttpHeader require a "known" id as a parameter. The
+ rationale behind the latter restriction is that a Squid programmer should
+ operate on "known" fields only. If a new field is being added to
+ header processing, it must be given an id.
+
+\section LifeCycle Life cycle
+\par
+ HttpHeader follows a common pattern for object initialization and
+ cleaning:
+
+\code
+ /* declare */
+ HttpHeader hdr;
+
+ /* initialize (as an HTTP Request header) */
+ httpHeaderInit(&hdr, hoRequest);
+
+ /* do something */
+ ...
+
+ /* cleanup */
+ httpHeaderClean(&hdr);
+\endcode
+
+\par
+ Prior to use, an HttpHeader must be initialized. A
+ programmer must specify if a header belongs to a request
+ or reply message. The "ownership" information is used mostly
+ for statistical purposes.
+
+\par
+ Once initialized, the HttpHeader object must be,
+ eventually, cleaned. Failure to do so will result in a
+ memory leak.
+
+\par
+ Note that there are no methods for "creating" or "destroying"
+ a "dynamic" HttpHeader object. Looks like headers are
+ always stored as a part of another object or as a temporary
+ variable. Thus, dynamic allocation of headers is not needed.
+
+\section HeaderManipulation Header Manipulation
+\par
+ The most common operations on HTTP headers are testing
+ for a particular header-field (httpHeaderHas()),
+ extracting field-values (httpHeaderGet*()), and adding
+ new fields (httpHeaderPut*()).
+
+\par
+ httpHeaderHas(hdr, id) returns true if at least one
+ header-field specified by "id" is present in the header.
+ Note that using HDR_OTHER as an id is prohibited.
+ There is usually no reason to know if there are "other"
+ header-fields in a header.
+
+\par
+ httpHeaderGet<Type>(hdr, id) returns the value
+ of the specified header-field. The "Type" must match the
+ header-field type. If a header-field is not present, a "null"
+ value is returned. "Null" values depend on field-type, of
+ course.
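+
+\par
+ For instance (a short sketch; the getter name follows the
+ httpHeaderGet<Type> pattern and the id follows the naming rule
+ above), testing for and extracting a Content-Length field might look
+ like:
+\code
+ /* returns -1 (the integer "null" value) when the field is absent */
+ int
+ myGetContentLength(const HttpHeader *hdr)
+ {
+     if (!httpHeaderHas(hdr, HDR_CONTENT_LENGTH))
+         return -1;
+     return httpHeaderGetInt(hdr, HDR_CONTENT_LENGTH);
+ }
+\endcode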
+
+\par
+ Special care must be taken when several header-fields with
+ the same id are present in the header. If the HTTP protocol
+ allows only one copy of the specified field per header
+ (e.g. "Content-Length"), httpHeaderGet<Type>()
+ will return one of the field-values (chosen semi-randomly).
+ If the HTTP protocol allows for several values (e.g. "Accept"),
+ a "String List" will be returned.
+
+\par
+ It is prohibited to ask for a List of values when only one
+ value is permitted, and vice-versa. This restriction prevents
+ a programmer from processing one value of a header-field
+ while ignoring other valid values.
+
+\par
+ httpHeaderPut<Type>(hdr, id, value) will add a
+ header-field with a specified field-name (based on "id")
+ and field_value. The location of the newly added field in
+ the header array is undefined, but it is guaranteed to be
+ after all fields with the same "id", if any. Note that old
+ header-fields with the same id (if any) are not altered in
+ any way.
+
+\par
+ The value being put using one of the httpHeaderPut()
+ methods is converted to and stored as a String object.
+
+\par Example:
+
+\code
+ /* add our own Age field if none was added before */
+ int age = ...
+ if (!httpHeaderHas(hdr, HDR_AGE))
+     httpHeaderPutInt(hdr, HDR_AGE, age);
+\endcode
+
+\par
+ There are two ways to delete a field from a header. To
+ delete a "known" field (a field with "id" other than
+ HDR_OTHER), use the httpHeaderDelById() function.
+ Sometimes, it is convenient to delete all fields with a
+ given name ("known" or not) using the httpHeaderDelByName()
+ method. Both methods will delete ALL fields specified.
+
+\par
+ The httpHeaderGetEntry(hdr, pos) function can be used
+ for iterating through all fields in a given header. Iteration
+ is controlled by the pos parameter. Thus, several
+ concurrent iterations over one header are possible. It is also
+ safe to delete the "current" entry while iterating:
+
+\code
+ /* delete all entries with a given name */
+ HttpHeaderPos pos = HttpHeaderInitPos;
+ HttpHeaderEntry *e;
+ while ((e = httpHeaderGetEntry(hdr, &pos))) {
+     if (!strCaseCmp(e->name, name))
+         ... /* delete entry */
+ }
+\endcode
+
+\note httpHeaderGetEntry() is a low level function
+ and must not be used if high level alternatives are available.
+ For example, to delete an entry with a given name, use the
+ httpHeaderDelByName() function rather than the loop
+ above.
+
+\section HeaderIO I/O and Headers
+\par
+ To store a header in a file or socket, pack it using the
+ httpHeaderPackInto() method and a corresponding
+ "Packer". Note that httpHeaderPackInto will pack only
+ header-fields; request-lines and status-lines are not
+ prepended, and CRLF is not appended. Remember that neither
+ of them is a part of the HTTP message header as defined by the
+ HTTP protocol.
+
+\section AddingNewHeaderFieldIDs Adding new header-field ids
+\par
+ Adding new ids is simple. First, add a new HDR_ entry to the
+ http_hdr_type enumeration in enums.h. Then describe the new
+ header-field's attributes in the HeadersAttrs array located
+ in HttpHeader.c. The last attribute specifies the field
+ type. Six types are supported: integer (ftInt), string
+ (ftStr), date in RFC 1123 format (ftDate_1123),
+ cache control field (ftPCc), range field (ftPRange),
+ and content range field (ftPContRange). Squid uses the
+ type information to convert the internal binary representation
+ of fields to their string representation (httpHeaderPut
+ functions) and vice-versa (httpHeaderGet functions).
+
+\par
+ Finally, add the new id to one of the following arrays:
+ GeneralHeadersArr, EntityHeadersArr,
+ ReplyHeadersArr, RequestHeadersArr. Use the HTTP
+ specs to determine the applicable array. If your header-field
+ is an "extension-header", its place is in ReplyHeadersArr
+ and/or in RequestHeadersArr. You can also use
You can also use
+    EntityHeadersArr for "extension-header"s that can be
+    used both in replies and requests. Header fields other
+    than "extension-header"s must go to one and only one of
+    the arrays mentioned above.
+
+\par
+    Also, if the new field is a "list" header, add it to the
+    ListHeadersArr array. A "list" header-field is one
+    that is defined (or can be defined) using the "#" BNF
+    construct described in the HTTP specs. Essentially, a field
+    that may have more than one valid field-value in a single
+    header is a "list" field.
+
+\par
+    In most cases, if you forget to include a new field id in
+    one of the required arrays, you will get a run-time assertion.
+    For rarely used fields, however, it may take a long time
+    for an assertion to be triggered.
+
+\par
+    There is virtually no limit on the number of fields supported
+    by Squid. If the current mask sizes cannot fit all the ids (you
+    will get an assertion if that happens), simply enlarge the
+    HttpHeaderMask type in typedefs.h.
+
+\section Efficiency A Word on Efficiency
+\par
+    httpHeaderHas() is a very cheap (fast) operation
+    implemented using a bit mask lookup.
+
+\par
+    Adding new fields is somewhat expensive if they require
+    complex conversions to a string.
+
+\par
+    Deleting existing fields requires a scan of all the entries
+    and a comparison of their "id"s (faster) or "names" (slower) with
+    the one specified for deletion.
+
+\par
+    Most of the operations are faster than their "ascii string"
+    equivalents.
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/27_MiscOther.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,489 @@
+/**
+\page 27_MiscOther Miscellaneous Other Details
+
+\section FileFormats File Formats
+
+\subsection swap.state swap.state
+\note This information is current as of version 2.2.STABLE4.
+
+\par
+    A swap.state entry is defined by the storeSwapLogData
+    structure, and has the following elements:
+\code
+    struct _storeSwapLogData {
+        char op;
+        int swap_file_number;
+        time_t timestamp;
+        time_t lastref;
+        time_t expires;
+        time_t lastmod;
+        size_t swap_file_sz;
+        u_short refcount;
+        u_short flags;
+        unsigned char key[MD5_DIGEST_CHARS];
+    };
+\endcode
+
+\par op
+    Either SWAP_LOG_ADD (1) when an object is added to
+    the disk storage, or SWAP_LOG_DEL (2) when an object is
+    deleted.
+
+\par swap_file_number
+    The 32-bit file number which maps to a pathname. Only
+    the low 24 bits are relevant. The high 8 bits are
+    used as an index to an array of storage directories, and
+    are set at run time because the order of storage directories
+    may change over time.
+
+\par timestamp
+    A 32-bit Unix time value that represents the time when
+    the origin server generated this response. If the response
+    has a valid Date: header, this timestamp corresponds
+    to that time. Otherwise, it is set to the Squid process time
+    when the response is read (as soon as the end of the headers is
+    found).
+
+\par lastref
+    The last time that a client requested this object.
+    Strictly speaking, this time is set whenever the StoreEntry
+    is locked (via storeLockObject()).
+
+\par expires
+    The value of the response's Expires: header, if any.
+    If the response does not have an Expires: header, this
+    is set to -1. If the response has an invalid (unparseable)
+    Expires: header, it is also set to -1. There are some cases
+    where Squid sets expires to -2. This happens for the
+    internal "netdb" object and for FTP URL responses.
+
+\par lastmod
+    The value of the response's Last-modified: header, if any.
+    This is set to -1 if there is no Last-modified: header,
+    or if it is unparseable.
+
+\par swap_file_sz
+    This is the number of bytes that the object occupies on
+    disk. It includes the Squid "swap file header".
+
+\par refcount
+    The number of times that this object has been accessed (referenced).
+    Since it is a 16-bit quantity, it is susceptible to overflow
+    if a single object is accessed 65,536 times before being replaced.
+
+\par flags
+    A copy of the StoreEntry flags field. Used as a sanity
+    check when rebuilding the cache at startup. Objects that
+    have the KEY_PRIVATE flag set are not added back to the cache.
+
+\par key
+    The 128-bit MD5 hash for this object.
+
+\note storeSwapLogData entries are written in native machine
+    byte order. They are not necessarily portable across architectures.
+
+\section StoreSwapMeta Store "swap meta" Description
+\par
+    "swap meta" refers to a section of meta data stored at the beginning
+    of an object that is stored on disk. This meta data includes information
+    such as the object's cache key (MD5), URL, and part of the StoreEntry
+    structure.
+
+\par
+    The meta data is stored using a TYPE-LENGTH-VALUE format. That is,
+    each chunk of meta information consists of a TYPE identifier, a
+    LENGTH field, and then the VALUE (which is LENGTH octets long).
+
+\subsection Types Types
+\par
+    As of Squid-2.3, the following TYPES are defined (from enums.h):
+
+\par STORE_META_VOID
+    Just a placeholder for the zeroth value. It is never used
+    on disk.
+
+\par STORE_META_KEY_URL
+    This represents the case when we use the URL as the cache
+    key, as Squid-1.1 does. Currently we don't support using
+    a URL as a cache key, so this is not used.
+
+\par STORE_META_KEY_SHA
+    For a brief time we considered supporting SHA (secure
+    hash algorithm) as a cache key. Nobody liked it, and
+    this type is not currently used.
+
+\par STORE_META_KEY_MD5
+    This represents the MD5 cache key that Squid currently uses.
+    When Squid opens a disk file for reading, it can check that
+    this MD5 matches the MD5 of the user's request. If not, then
+    something went wrong and this is probably the wrong object.
+
+\par STORE_META_URL
+    The object's URL. This also may be matched against a user's
+    request for cache hits to make sure we got the right object.
+
+\par STORE_META_STD
+    This is the "standard metadata" for an object. Really
+    it is just this middle chunk of the StoreEntry structure:
+\code
+    time_t timestamp;
+    time_t lastref;
+    time_t expires;
+    time_t lastmod;
+    size_t swap_file_sz;
+    u_short refcount;
+    u_short flags;
+\endcode
+
+\par STORE_META_STD_LFS
+    Updated version of STORE_META_STD, with support for
+    >2GB objects. As STORE_META_STD except that the swap_file_sz
+    is a squid_file_sz (64-bit integer) instead of a size_t.
+
+\par STORE_META_HITMETERING
+    Reserved for future hit-metering (RFC 2227) stuff.
+
+\par STORE_META_VALID
+\todo Document STORE_META_VALID
+
+\par STORE_META_VARY_HEADERS
+    Information about the Vary header relation on this object.
+
+\par STORE_META_OBJSIZE
+    The object size, if it is known.
+
+\section ImplementationNotes Implementation Notes
+\par
+    When writing an object to disk, we must first write the meta data.
+    This is done with a couple of functions. First, storeSwapMetaBuild()
+    takes a StoreEntry as a parameter and returns a tlv linked
+    list. Second, storeSwapMetaPack() converts the tlv list
+    into a character buffer that we can write.
+
+\note MemObject has a member called swap_hdr_sz.
+    This value is the size of that character buffer; the size of the
+    swap file meta data. The StoreEntry has a member named
+    swap_file_sz that represents the size of the disk file.
+    Thus, the size of the object "content" is
+\code
+    StoreEntry->swap_file_sz - MemObject->swap_hdr_sz;
+\endcode
+\note The swap file content includes the HTTP reply headers and the HTTP reply body (if any).
+
+\par
+    When reading a swap file, there is a similar process to extract
+    the swap meta data. First, storeSwapMetaUnpack() converts a
+    character buffer into a tlv linked list. It also tells us
+    the value for MemObject->swap_hdr_sz.
+
+\section leakFinder leakFinder
+\par
+    src/leakfinder.c contains some routines useful for debugging
+    and finding memory leaks. It is not enabled by default. To enable
+    it, use
+\code
+    configure --enable-leakfinder ...
+\endcode
+
+\par
+    The module has three public functions: leakAdd,
+    leakFree, and leakTouch. Note that these are actually
+    macros that insert __FILE__ and __LINE__ arguments into the calls
+    to the real functions.
+
+\par
+    leakAdd should be called when a pointer is first created.
+    Usually this follows immediately after a call to malloc or some
+    other memory allocation function. For example:
+\code
+    ...
+    void *p;
+    p = malloc(100);
+    leakAdd(p);
+    ...
+\endcode
+
+\par
+    leakFree is the opposite. Call it just before releasing
+    the pointer memory, such as a call to free. For example:
+\code
+    ...
+    leakFree(foo);
+    free(foo);
+    return;
+\endcode
+\note leakFree aborts with an assertion if you give it a pointer that was never added with leakAdd.
+
+\par
+    The definition of a leak is memory that was allocated but never
+    freed. Thus, to find a leak we need to track the pointer between
+    the time it got allocated and the time when it should have been
+    freed. Use leakTouch to accomplish this. You can sprinkle
+    leakTouch calls throughout the code where the pointer is
+    used. For example:
+\code
+void
+myfunc(void *ptr)
+{
+    ...
+    leakTouch(ptr);
+    ...
+}
+\endcode
+\note leakTouch aborts with an assertion if you give it
+    a pointer that was never added with leakAdd, or if the
+    pointer was already freed.
+
+\par
+    For each pointer tracked, the module remembers the filename, line
+    number, and time of last access. You can view this data with the
+    cache manager by selecting the leaks option. You can also
+    do it from the command line:
+\code
+% client mgr:leaks | less
+\endcode
+
+\par
+    The way to identify possible leaks is to look at the time of last
+    access. Pointers that haven't been accessed for a long time are
+    candidates for leaks. The filename and line numbers tell you where
+    that pointer was last accessed. If there is a leak, then the bug
+    occurs somewhere after that point in the code.
+
+\section MemPools MemPools
+\par
+    MemPools are a pooled memory allocator running on top of malloc(). Their
+    purpose is to reduce memory fragmentation and provide detailed statistics
+    on memory consumption.
+
+\par
+    Preferably all memory allocations in Squid should be done using MemPools
+    or one of the types built on top of it (i.e. cbdata).
+
+\note Usually it is better to use cbdata types as these give you additional
+    safeguards in references and typechecking. However, for high-usage pools
+    where the extra functionality of cbdata is not required, directly using a
+    MemPool might be the way to go.
+
+\subsection PublicAPI Public API
+\par
+    This section describes the public API.
+
+\subsubsection memPoolCreate memPoolCreate
+\code
+    MemPool * pool = memPoolCreate(char *name, size_t element_size);
+\endcode
+
+\par
+    Creates a MemPool of elements with the given size.
+
+\subsubsection memPoolAlloc memPoolAlloc
+\code
+    type * data = memPoolAlloc(pool);
+\endcode
+
+\par
+    Allocates one element from the pool.
+
+\subsubsection memPoolFree memPoolFree
+\code
+    memPoolFree(pool, data);
+\endcode
+
+\par
+    Frees an element allocated by memPoolAlloc().
+
+\subsubsection memPoolDestroy memPoolDestroy
+\code
+    memPoolDestroy(&pool);
+\endcode
+
+\par
+    Destroys a memory pool created by memPoolCreate() and resets pool to NULL.
+
+\par
+    Typical usage could be:
+\code
+    ...
+    myStructType *myStruct;
+    MemPool * myType_pool = memPoolCreate("This is cute pool", sizeof(myStructType));
+    myStruct = memPoolAlloc(myType_pool);
+    myStruct->item = xxx;
+    ...
+    memPoolFree(myType_pool, myStruct);
+    memPoolDestroy(&myType_pool);
+\endcode
+
+\subsubsection memPoolIterate memPoolIterate
+\code
+    MemPoolIterator * iter = memPoolIterate(void);
+\endcode
+
+\par
+    Initialise iteration through all of the pools.
+
+\subsubsection memPoolIterateNext memPoolIterateNext
+\code
+    MemPool * pool = memPoolIterateNext(MemPoolIterator * iter);
+\endcode
+
+\par
+    Gets the next pool pointer; returns a NULL pointer when there are no more
+    pools.
+
+\code
+    MemPoolIterator *iter;
+    iter = memPoolIterate();
+    while ( (pool = memPoolIterateNext(iter)) ) {
+        ... handle(pool);
+    }
+    memPoolIterateDone(&iter);
+\endcode
+
+\subsubsection memPoolIterateDone memPoolIterateDone
+\code
+    memPoolIterateDone(MemPoolIterator ** iter);
+\endcode
+
+\par
+    Should be called when finished iterating through the pools.
+
+\subsubsection memPoolSetChunkSize memPoolSetChunkSize
+\code
+    memPoolSetChunkSize(MemPool * pool, size_t chunksize);
+\endcode
+
+\par
+    Allows you to tune the chunk size of the pool. Objects are allocated in
+    chunks instead of individually. This conserves memory and reduces
+    fragmentation. Because of that, memory can also only be freed in chunks.
+    There is therefore a tradeoff between memory conservation due to chunking
+    and free-memory fragmentation.
+\par
+    As a general guideline, increase the chunk size only for pools that keep
+    very many items for a relatively long time.
+
+\subsubsection memPoolSetIdleLimit memPoolSetIdleLimit
+\code
+    memPoolSetIdleLimit(size_t new_idle_limit);
+\endcode
+
+\par
+    Sets an upper limit, in bytes, on the amount of free RAM kept in pools.
+    This is not a strict upper limit, but a hint. When MemPools are over this
+    limit, totally free chunks are immediately considered for release.
+    Otherwise only chunks that have not been referenced for a long time are
+    checked.
+
+\subsubsection memPoolGetStats memPoolGetStats
+\code
+    int inuse = memPoolGetStats(MemPoolStats * stats, MemPool * pool);
+\endcode
+
+\par
+    Fills a MemPoolStats struct with statistical data about the pool. Returns
+    the number of objects in use, i.e. allocated.
+
+\code
+    struct _MemPoolStats {
+        MemPool *pool;
+        const char *label;
+        MemPoolMeter *meter;
+        int obj_size;
+        int chunk_capacity;
+        int chunk_size;
+
+        int chunks_alloc;
+        int chunks_inuse;
+        int chunks_partial;
+        int chunks_free;
+
+        int items_alloc;
+        int items_inuse;
+        int items_idle;
+
+        int overhead;
+    };
+
+    /* object to track per-pool cumulative counters */
+    typedef struct {
+        double count;
+        double bytes;
+    } mgb_t;
+
+    /* object to track per-pool memory usage (alloc = inuse+idle) */
+    struct _MemPoolMeter {
+        MemMeter alloc;
+        MemMeter inuse;
+        MemMeter idle;
+        mgb_t gb_saved;  /* account Allocations */
+        mgb_t gb_osaved; /* history Allocations */
+        mgb_t gb_freed;  /* account Free calls */
+    };
+\endcode
+
+\subsubsection memPoolGetGlobalStats memPoolGetGlobalStats
+\code
+    int pools_inuse = memPoolGetGlobalStats(MemPoolGlobalStats * stats);
+\endcode
+
+\par
+    Fills a MemPoolGlobalStats struct with statistical data about the overall
+    usage of all pools. Returns the number of pools that have at least one
+    object in use, i.e. the number of dirty pools.
+
+\code
+    struct _MemPoolGlobalStats {
+        MemPoolMeter *TheMeter;
+
+        int tot_pools_alloc;
+        int tot_pools_inuse;
+        int tot_pools_mempid;
+
+        int tot_chunks_alloc;
+        int tot_chunks_inuse;
+        int tot_chunks_partial;
+        int tot_chunks_free;
+
+        int tot_items_alloc;
+        int tot_items_inuse;
+        int tot_items_idle;
+
+        int tot_overhead;
+        int mem_idle_limit;
+    };
+\endcode
+
+\subsubsection memPoolClean memPoolClean
+\code
+    memPoolClean(time_t maxage);
+\endcode
+
+\par
+    Main cleanup handler. For MemPools to stay within their upper idle limits,
+    this function needs to be called periodically, preferably at some
+    constant rate, e.g. from a Squid event. It looks through all pools and
+    chunks, cleans up internal states and checks for releasable chunks.
+
+\par
+    Between calls to this function, objects are placed onto an internal
+    cache instead of being returned to their home chunks, mainly for speedup
+    purposes. During that time the state of a chunk is not known; it is not
+    known whether the chunk is free or in use. This call returns all objects
+    to their chunks and restores consistency.
+
+\par
+    Should be called relatively often, as it sorts chunks into a suitable
+    order so as to reduce free-memory fragmentation and increase chunk
+    utilisation.
+
+\par
+    The maxage parameter instructs it to release all totally idle chunks that
+    have not been referenced for maxage seconds.
+
+\par
+    A suitable frequency for cleanup is in the range of a few tens of seconds
+    to a few minutes, depending on memory activity.
+    Several functions above call memPoolClean internally to operate on
+    consistent states.
+
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/Groups.dox Mon Jul 23 00:19:27 2007
@@ -0,0 +1,64 @@
+/**
+ * \defgroup POD POD Classes
+ *
+ * \par
+ * Classes which encapsulate POD (plain old data) in such a way
+ * that they can be used as POD themselves and passed around Squid.
+ * These objects should have a formal API for safe handling of their
+ * content, but it MUST NOT depend on anything external other than itself
+ * or the standard C++ libraries.
+ */
+
+/**
+ * \defgroup Modules Module Classes
+ *
+ * \par
+ * Between the abstract component level and the practical POD types
+ * sits a layer of objects that are relatively static within the code.
+ * They form the codepath along which POD flows, rather than moving
+ * themselves. They can combine into a process path along which POD data
+ * flows to produce the desired outcome.
+ * They should provide an API for interfacing with other objects.
+ */
+
+/**
+ * \defgroup libsquid Squid Library
+ *
+ * \par
+ * These objects are provided publicly through libsquid.la
+ */
+
+/**
+ * \defgroup Tests Unit Testing
+ *
+ * \par
+ * Any good application has a set of tests to ensure it stays
+ * in a good condition. Squid tends to use cppunit tests.
+ * \par
+ * It is preferable to have automated tests for units of functionality. There
+ * is a boilerplate for tests in "src/tests/testBoilerplate.[cc|h]". New
+ * tests need to be added to src/Makefile.am to build and run them during
+ * "make check". To add a new test script, just copy the references to
+ * testBoilerplate in Makefile.am, adjusting the name, and likewise copy the
+ * source files. If you are testing an already tested area you may be able
+ * to just add new test cases to an existing script. For example, to test the
+ * store some more, just edit tests/testStore.h and add a new unit test method
+ * name.
+ */
+
+/**
+ * \defgroup callback Event Callback Functions
+ *
+ * \par
+ * Squid uses events to process asynchronous actions.
+ * These methods are registered as callbacks to receive notice whenever a
+ * specific event occurs.
+ */
+
+/**
+ * \defgroup ExternalPrograms External Programs
+ *
+ * \par
+ * Squid uses external programs to assist in some critical activities.
+ * By nature these activities cannot be allowed to delay Squid.
+ */
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/doxy.footer.html Mon Jul 23 00:19:27 2007
@@ -0,0 +1 @@
+
--- /dev/null Mon Jul 23 00:19:27 2007
+++ squid3/doc/Programming-Guide/doxy.header.html Mon Jul 23 00:19:27 2007
@@ -0,0 +1 @@
+
--- squid3/doc/Programming-Guide/prog-guide.sgml Mon Jul 23 00:19:27 2007 +++ /dev/null Mon Jul 23 00:19:27 2007 @@ -1,3989 +0,0 @@ - -
-Squid Programmers Guide -Squid Developers -$Id: prog-guide.sgml,v 1.10 2006/05/20 13:36:41 squidadm Exp $ - - -Squid is a WWW Cache application developed by the National Laboratory -for Applied Network Research and members of the Web Caching community. -Squid is implemented as a single, non-blocking process based around -a BSD select() loop. This document describes the operation of the Squid -source code and is intended to be used by others who wish to customize -or improve it. - - - - - - -Introduction - -

- The Squid source code has evolved more from empirical - observation and tinkering, rather than a solid design - process. It carries a legacy of being ``touched'' by - numerous individuals, each with somewhat different techniques - and terminology. - -

- Squid is a single-process proxy server. Every request is - handled by the main process, with the exception of FTP. - However, Squid does not use a ``threads package'' such has - Pthreads. While this might be easier to code, it suffers - from portability and performance problems. Instead Squid - maintains data structures and state information for each - active request. - -

- The code is often difficult to follow because there are no - explicit state variables for the active requests. Instead, - thread execution progresses as a sequence of ``callback - functions'' which get executed when I/O is ready to occur, - or some other event has happened. As a callback function - completes, it is responsible for registering the next - callback function for subsequent I/O. - -

- Note there is only a pseudo-consistent naming scheme. In - most cases functions are named like - Note that the Squid source changes rapidly, and some parts - of this document may become out-of-date. If you find any - inconsistencies, please feel free to notify . - -Conventions - -

- Function names and file names will be written in a courier - font, such as Coding Conventions - -Infrastructure - -

- Most custom types and tools are documented in the code or the relevant - portions of this manual. Some key points apply globally however. - -Fixed width types -

- If you need to use specific width types - such as - a 16 bit unsigned integer, use one of the following types. To access - them simply include "config.h". - - int16_t - 16 bit signed. - u_int16_t - 16 bit unsigned. - int32t - 32 bit signed. - u_int32_t - 32 bit unsigned. - int64_t - 64 bit signed. - u_int64_t - 64 bit unsigned. - - -Unit tests -

- It is preferrable to automated tests for units of functionality. There - is a boilerplate for tests in "src/tests/testBoilerplate.[cc|h]". New - tests need to be added to src/Makefile.am to build and run them during - "make check". To add a new test script, just copy the references to - testBoilerplate in Makefile.am adjusting the name, and likewise copy the - source files. If you are testing an already tested area you may be able - to just add new test cases to an existing script. I.e. to test the store - some more just edit tests/testStore.h and add a new unit test method - name, - -Overview of Squid Components - -

-Squid consists of the following major components - -Client Side Socket - -

- Here new client connections are accepted, parsed, and - reply data sent. Per-connection state information is held - in a data structure called Client Side Request -

- This is where requests are processed. We determine if the - request is to be redirected, if it passes access lists, - and setup the initial client stream for internal requests. - Temporary state for this processing is held in a - Client Side Reply -

- This is where we determine if the request is cache HIT, - REFRESH, MISS, etc. This involves querying the store - (possibly multiple times) to work through Vary lists and - the list. Per-request state information is stored - in the Client Streams -

- These routines implement a unidirectional, non-blocking, - pull pipeline. They allow code to be inserted into the - reply logic on an as-needed basis. For instance, - transfer-encoding logic is only needed when sending a - HTTP/1.1 reply. - -Server Side -

- These routines are responsible for forwarding cache misses - to other servers, depending on the protocol. Cache misses - may be forwarded to either origin servers, or other proxy - caches. Note that all requests (FTP, Gopher) to other - proxies are sent as HTTP requests. Storage Manager - -

- The Storage Manager is the glue between client and server - sides. Every object saved in the cache is allocated a - - Squid can quickly locate cached objects because it keeps - (in memory) a hash table of all - For each object the - Client-side requests register themselves with a Request Forwarding - -Peer Selection - -

- These functions are responsible for selecting one (or none) - of the neighbor caches as the appropriate forwarding - location. - -Access Control - -

- These functions are responsible for allowing or denying a - request, based on a number of different parameters. These - parameters include the client's IP address, the hostname - of the requested resource, the request method, etc. Some - of the necessary information may not be immediately available, - for example the origin server's IP address. In these cases, - the ACL routines initiate lookups for the necessary - information and continues the access control checks when - the information is available. - -Authentication Framework - -

- These functions are responsible for handling HTTP - authentication. They follow a modular framework allow - different authentication schemes to be added at will. For - information on working with the authentication schemes See - the chapter Authentication Framework. - -Network Communication - -

- These are the routines for communicating over TCP and UDP - network sockets. Here is where sockets are opened, closed, - read, and written. In addition, note that the heart of - Squid (File/Disk I/O - -

- Routines for reading and writing disk files (and FIFOs). - Reasons for separating network and disk I/O functions are - partly historical, and partly because of different behaviors. - For example, we don't worry about getting a ``No space left - on device'' error for network sockets. The disk I/O routines - support queuing of multiple blocks for writing. In some - cases, it is possible to merge multiple blocks into a single - write request. The write callback does not necessarily - occur for every write request. - -Neighbors - -

- Maintains the list of neighbor caches. Sends and receives - ICP messages to neighbors. Decides which neighbors to - query for a given request. File: IP/FQDN Cache - -

- A cache of name-to-address and address-to-name lookups. - These are hash tables keyed on the names and addresses. - Cache Manager - -

- This provides access to certain information needed by the - cache administrator. A companion program, - cache_object://hostname/operation - - The cache manager provides essentially ``read-only'' access - to information. It does not provide a method for configuring - Squid while it is running. - -Network Measurement Database - -

- In a number of situation, Squid finds it useful to know the - estimated network round-trip time (RTT) between itself and - origin servers. A particularly useful is example is - the peer selection algorithm. By making RTT measurements, a - Squid cache will know if it, or one if its neighbors, is closest - to a given origin server. The actual measurements are made - with the Redirectors - -

- Squid has the ability to rewrite requests from clients. After - checking the access controls, but before checking for cache hits, - requested URLs may optionally be written to an external - Autonomous System Numbers - -

- Squid supports Autonomous System (AS) numbers as another - access control element. The routines in Configuration File Parsing - -

- The primary configuration file specification is in the file - Callback Data Allocator - -

- Squid's extensive use of callback functions makes it very - susceptible to memory access errors. Care must be taken - so that the Refcount Data Allocator (C++ Only) - -

- Manual reference counting such as cbdata uses is error prone, - and time consuming for the programmer. C++'s operator overloading - allows us to create automatic reference counting pointers, that will - free objects when they are no longer needed. With some care these - objects can be passed to functions needed Callback Data pointers. - -Debugging - -

- Squid includes extensive debugging statements to assist in - tracking down bugs and strange behavior. Every debug statement - is assigned a section and level. Usually, every debug statement - in the same source file has the same section. Levels are chosen - depending on how much output will be generated, or how useful the - provided information will be. The Error Generation - -

- The routines in Event Queue - -

- The routines in Filedescriptor Management - -

- Here we track the number of filedescriptors in use, and the - number of bytes which has been read from or written to each - file descriptor. - - -Hashtable Support - -

- These routines implement generic hash tables. A hash table - is created with a function for hashing the key values, and a - function for comparing the key values. - -HTTP Anonymization - -

- These routines support anonymizing of HTTP requests leaving - the cache. Either specific request headers will be removed - (the ``standard'' mode), or only specific request headers - will be allowed (the ``paranoid'' mode). - -Delay Pools - -

- Delay pools provide bandwidth regulation by restricting the rate - at which squid reads from a server before sending to a client. They - do not prevent cache hits from being sent at maximal capacity. Delay - pools can aggregate the bandwidth from multiple machines and users - to provide more or less general restrictions. - -Internet Cache Protocol - -

- Here we implement the Internet Cache Protocol. This - protocol is documented in the RFC 2186 and RFC 2187. - The bulk of code is in the Ident Lookups - -

- These routines support RFC 931 ``Ident'' lookups. An ident - server running on a host will report the user name associated - with a connected TCP socket. Some sites use this facility for - access control and logging purposes. - -Memory Management - -

- These routines allocate and manage pools of memory for - frequently-used data structures. When the Multicast Support - -

- Currently, multicast is only used for ICP queries. The - routines in this file implement joining a UDP - socket to a multicast group (or groups), and setting - the multicast TTL value on outgoing packets. - -Persistent Server Connections - -

- These routines manage idle, persistent HTTP connections - to origin servers and neighbor caches. Idle sockets - are indexed in a hash table by their socket address - (IP address and port number). Up to 10 idle sockets - will be kept for each socket address, but only for - 15 seconds. After 15 seconds, idle socket connections - are closed. - -Refresh Rules - -

- These routines decide whether a cached object is stale or fresh, - based on the SNMP Support - -

- These routines implement SNMP for Squid. At the present time, - we have made almost all of the cachemgr information available - via SNMP. - -URN Support - -

- We are experimenting with URN support in Squid version 1.2. - Note, we're not talking full-blown generic URN's here. This - is primarily targeted toward using URN's as an smart way - of handling lists of mirror sites. For more details, please - see . - -ESI -

- ESI is an implementation of Edge Side Includes (.) - ESI is implemented as a client side stream and a small - modification to client_side_reply.c to check whether - ESI should be inserted into the reply stream or not. - -External Programs - -dnsserver - -

- Because the standard pinger - -

- Although it would be possible for Squid to send and receive - ICMP messages directly, we use an external process for - two important reasons: - - Because squid handles many filedescriptors simultaneously, - we get much more accurate RTT measurements when ICMP is - handled by a separate process. - Superuser privileges are required to send and receive - ICMP. Rather than require Squid to be started as root, - we prefer to have the smaller and simpler - -unlinkd - -

- The redirector - -

- A redirector process reads URLs on stdin and writes (possibly - changed) URLs on stdout. It is implemented as an external - process to maximize flexibility. - -Flow of a Typical Request - -

- - - A client connection is accepted by the - The access controls are checked. The client-side-request builds - an ACL state data structure and registers a callback function - for notification when access control checking is completed. - - - After the access controls have been verified, the request - may be redirected. - - The client-side-request is forwarded up the client stream - to - The request-forwarding process begins with - When the ICP replies (if any) have been processed, we end - up at - The HTTP module first opens a connection to the origin - server or cache peer. If there is no idle persistent socket - available, a new connection request is given to the Network - Communication module with a callback function. The - - When a TCP connection has been established, HTTP builds a - request buffer and submits it for writing on the socket. - It then registers a read handler to receive and process - the HTTP reply. - - - As the reply is initially received, the HTTP reply headers - are parsed and placed into a reply data structure. As - reply data is read, it is appended to the - As the client-side is notified of new data, it copies the - data from the StoreEntry and submits it for writing on the - client socket. - - - As data is appended to the - When the HTTP module finishes reading the reply from the - upstream server, it marks the - When the client-side has written all of the object data, - it unregisters itself from the - -Callback Functions - -The Main Loop: - At the core of Squid is the - The - commSetSelect(fd, COMM_SELECT_READ, clientReadRequest, conn, 0); - - In this example, - clientReadRequest(fd, conn); - - -

- The I/O handlers are reset every time they are called. In - other words, a handler function must re-register itself - with - commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0); - - -

- These I/O handlers (and others) and their associated callback - data pointers are saved in the - struct _fde { - ... - PF *read_handler; - void *read_data; - PF *write_handler; - void *write_data; - close_handler *close_handler; - DEFER *defer_check; - void *defer_data; - }; - - - In some situations we want to defer reading from a - filedescriptor, even though it has data for us to read. - This may be the case when data arrives from the server-side - faster than it can be written to the client-side. Before - adding a filedescriptor to the ``read set'' for select, we - call - These handlers are stored in the - typedef void (*PF) (int, void *); - - The close handler is really a linked list of handler - functions. Each handler also has an associated pointer - - - After each handler is called, - Typical read handlers are - - The close handlers are normally called from - The timeout and lifetime handlers are called for file - descriptors which have been idle for too long. They are - further discussed in a following chapter. - - -Client Streams -Introduction -

A clientStream is a uni-directional loosely coupled pipe. Each node -consists of four methods - read, callback, detach, and status, along with the -stream housekeeping variables (a dlink node and pointer to the head of -the list), context data for the node, and read request parameters - -readbuf, readlen and readoff (in the body). -

clientStream is the basic unit for scheduling, and the clientStreamRead -and clientStreamCallback calls allow for deferred scheduled activity if desired. -

Theory on stream operation: - -Something creates a pipeline. At a minimum it needs a head with a -status method and a read method, and a tail with a callback method and a -valid initial read request. -Other nodes may be added into the pipeline. -The tail-1th node's read method is called. -for each node going up the pipeline, the node either: - -satisfies the read request, or -inserts a new node above it and calls clientStreamRead, or -calls clientStreamRead - -

There is no requirement for the Read parameters from different -nodes to have any correspondence, as long as the callbacks provided are -correct. -The first node that satisfies the read request MUST generate an -httpReply to be passed down the pipeline. Body data MAY be provided. -On the first callback a node MAY insert further downstream nodes in -the pipeline, but MAY NOT do so thereafter. -the callbacks progress down the pipeline until a node makes further -reads instead of satisfying the callback (go to 4) or the end of the -pipe line is reached, where a new read sequence may be scheduled. - -Implementation notes -

ClientStreams have been implemented for the client side reply logic, -starting with either a client socket (tail of the list is -clientSocketRecipient) or a custom handler for in-squid requests, and -with the pipeline HEAD being clientGetMoreData, which uses -clientSendMoreData to send data down the pipeline. -

client POST bodies do not use a pipeline currently, they use the -previous code to send the data. This is a TODO when time permits. - -Whats in a node -

Each node must have: - -read method - to allow loose coupling in the pipeline. (The reader may -therefore change if the pipeline is altered, even mid-flow). -callback method - likewise. -status method - likewise. -detach method - used to ensure all resources are cleaned up properly. -dlink head pointer - to allow list inserts and deletes from within a -node. -context data - to allow the called back nodes to maintain their -private information. -read request parameters - For two reasons: - -To allow a node to determine the requested data offset, length and -target buffer dynamically. Again, this is to promote loose coupling. -Because of the callback nature of squid, every node would have to -keep these parameters in their context anyway, so this reduces -programmer overhead. - - - -Method details -

The first parameter is always the 'this' reference for the client -stream - a clientStreamNode *. -Read -

Parameters: - -clientHttpRequest * - superset of request data, being winnowed down -over time. MUST NOT be NULL. -offset, length, buffer - what, how much and where. - -

Side effects: -

Triggers a read of data that satisfies the httpClientRequest -metainformation and (if appropriate) the offset,length and buffer -parameters. -Callback -

Parameters: - -clientHttpRequest * - superset of request data, being winnowed down -over time. MUST NOT be NULL. -httpReply * - not NULL on the first call back only. Ownership is -passed down the pipeline. Each node may alter the reply if appropriate. -buffer, length - where and how much. - -

Side effects: -

Return data to the next node in the stream. The data may be returned immediately, -or may be delayed for a later scheduling cycle. -Detach -

Parameters: - -clienthttpRequest * - MUST NOT be NULL. - -

Side effects: - -Removes this node from a clientStream. The stream infrastructure handles -the removal. This node MUST have cleaned up all context data, UNLESS scheduled -callbacks will take care of that. -Informs the prev node in the list of this nodes detachment. - -Status -

Parameters: - -clienthttpRequest * - MUST NOT be NULL. - -

Side effects: -

Allows nodes to query the upstream nodes for : - -stream ABORTS - request cancelled for some reason. upstream will not -accept further reads(). -stream COMPLETION - upstream has completed and will not accept further -reads(). -stream UNPLANNED COMPLETION - upstream has completed, but not at a -pre-planned location (used for keepalive checking), and will not accept -further reads(). -stream NONE - no special status, further reads permitted. - - -Abort -

Parameters: - -clienthttpRequest * - MUST NOT be NULL. - -

Side effects: -

Detachs the tail of the stream. CURRENTLY DOES NOT clean up the tail node data - -this must be done separately. Thus Abort may ONLY be called by the tail node. - - -Processing Client Requests - -

- To be written... - - -Delay Pools -Introduction -

A DelayPool is a Composite used to manage bandwidth for any request - assigned to the pool by an access expression. DelayId's are a used - to manage the bandwith on a given request, whereas a DelayPool - manages the bandwidth availability and assigned DelayId's. -Extending Delay Pools - -

A CompositePoolNode is the base type for all members of a DelayPool. - Any child must implement the RefCounting primitives, as well as five - delay pool functions: - - stats() - provide cachemanager statistics for itself. - dump() - generate squid.conf syntax for the current configuration - of the item. - update() - allocate more bandwith to all buckets in the item. - parse() - accept squid.conf syntax for the item, and configure - for use appropriately. - id() - return a DelayId entry for the current item. - -

A DelayIdComposite is the base type for all delay Id's. Concrete - Delay Id's must implement the refcounting primitives, as well as two - delay id functions: - - bytesWanted() - return the largest amount of bytes that this - delay id allows by policy. - bytesIn() - record the use of bandwidth by the request(s) that - this delayId is monitoring. - -

Composite creation is currently under design review, so see the - DelayPool class and follow the parse() code path for details. - -Neat things that could be done. - -

With the composite structure, some neat things have become possible. - For instance: - - Dynamically defined pool arrangements - for instance an - aggregate (class 1) combined with the per-class-C-net tracking of a - class 3 pool, without the individual host tracking. This differs - from a class 3 pool with -1/-1 in the host bucket, because no memory - or cpu would be used on hosts, whereas with a class 3 pool, they are - allocated and used. - Per request bandwidth limits - a delayId that contains it's own - bucket could limit each request independently to a given policy, with - no aggregate restrictions. - - - -Storage Manager - -Introduction - -

- The Storage Manager is the glue between client and server - sides. Every object saved in the cache is allocated a - - Squid can quickly locate cached objects because it keeps - (in memory) a hash table of all - For each object the - Client-side requests register themselves with a Object storage - -

- To be written... - -Object retrieval - -

- To be written... - - -Storage Interface - -Introduction - -

- Traditionally, Squid has always used the Unix filesystem (UFS) - to store cache objects on disk. Over the years, the - poor performance of UFS has become very obvious. In most - cases, UFS limits Squid to about 30-50 requests per second. - Our work indicates that the poor performance is mostly - due to the synchronous nature of - We want to try out our own, customized filesystems with Squid. - In order to do that, we need a well-defined interface - for the bits of Squid that access the permanent storage - devices. We also require tighter control of the replacement - policy by each storage module, rather than a single global - replacement policy. - -Build structure - -

- The storage types live in squid/src/fs/ . Each subdirectory corresponds - to the name of the storage type. When a new storage type is implemented - configure.in must be updated to autogenerate a Makefile in - squid/src/fs/$type/ from a Makefile.in file. - -

- configure will take a list of storage types through the - - Each storage type must create an archive file - - Each storefs must export a function named - An example of the automatically generated file: - - - /* automatically generated by ./store_modules.sh ufs coss - * do not edit - */ - #include "squid.h" - - extern STSETUP storeFsSetup_ufs; - extern STSETUP storeFsSetup_coss; - void storeFsSetup(void) - { - storeFsAdd("ufs", storeFsSetup_ufs); - storeFsAdd("coss", storeFsSetup_coss); - } - - - -Initialization of a storage type - -

- Each storage type initializes through the - void - storeFsSetup_ufs(storefs_entry_t *storefs) - { - assert(!ufs_initialised); - storefs->parsefunc = storeUfsDirParse; - storefs->reconfigurefunc = storeUfsDirReconfigure; - storefs->donefunc = storeUfsDirDone; - ufs_state_pool = memPoolCreate("UFS IO State data", sizeof(ufsstate_t)); - ufs_initialised = 1; - } - - -

- There are five function pointers in the storefs_entry which require - initializing. In this example, some protection is made against the - setup function being called twice, and a memory pool is initialised - for use inside the storage module. - -

- Each function will be covered below. - - -done - -

- - typedef void - STFSSHUTDOWN(void); - - -

- This function is called whenever the storage system is to be shut down. - It should take care of deallocating any resources currently allocated. - - - - typedef void STFSPARSE(SwapDir *SD, int index, char *path); - typedef void STFSRECONFIGURE(SwapDir *SD, int index, char *path); - - -

- These functions handle configuring and reconfiguring a storage - directory. Additional arguments from the cache_dir configuration - line can be retrieved through calls to strtok() and GetInteger(). - -

- - - struct _SwapDir { - char *type; /* Pointer to the store dir type string */ - int cur_size; /* Current swapsize in kb */ - int low_size; /* ?? */ - int max_size; /* Maximum swapsize in kb */ - char *path; /* Path to store */ - int index; /* This entry's index into the swapDir array */ - int suggest; /* Suggestion for UFS style stores (??) */ - size_t max_objsize; /* Maximum object size for this store */ - union { /* Replacement policy-specific fields */ - #ifdef HEAP_REPLACEMENT - struct { - heap *heap; - } heap; - #endif - struct { - dlink_list list; - dlink_node *walker; - } lru; - } repl; - int removals; - int scanned; - struct { - unsigned int selected:1; /* Currently selected for write */ - unsigned int read_only:1; /* This store is read only */ - } flags; - STINIT *init; /* Initialise the fs */ - STNEWFS *newfs; /* Create a new fs */ - STDUMP *dump; /* Dump fs config snippet */ - STFREE *freefs; /* Free the fs data */ - STDBLCHECK *dblcheck; /* Double check the obj integrity */ - STSTATFS *statfs; /* Dump fs statistics */ - STMAINTAINFS *maintainfs; /* Replacement maintainence */ - STCHECKOBJ *checkob; /* Check if the fs will store an object, and get the FS load */ - /* These two are notifications */ - STREFOBJ *refobj; /* Reference this object */ - STUNREFOBJ *unrefobj; /* Unreference this object */ - STCALLBACK *callback; /* Handle pending callbacks */ - STSYNC *sync; /* Sync the directory */ - struct { - STOBJCREATE *create; /* Create a new object */ - STOBJOPEN *open; /* Open an existing object */ - STOBJCLOSE *close; /* Close an open object */ - STOBJREAD *read; /* Read from an open object */ - STOBJWRITE *write; /* Write to a created object */ - STOBJUNLINK *unlink; /* Remove the given object */ - } obj; - struct { - STLOGOPEN *open; /* Open the log */ - STLOGCLOSE *close; /* Close the log */ - STLOGWRITE *write; /* Write to the log */ - struct { - STLOGCLEANOPEN *open; /* Open a clean log */ - STLOGCLEANWRITE *write; /* Write to the log */ - void *state; /* Current state */ - } clean; - } log; - void *fsdata; /* FS-specific data */ - }; - - - - -Operation of a storage module - -

- Squid understands the concept of multiple diverse storage directories. - Each storage directory provides a caching object store, with object - storage, retrieval, indexing and replacement. - -

- Each open object has associated with it a - - struct _storeIOState { - sdirno swap_dirn; /* SwapDir index */ - sfileno swap_filen; /* Unique file index number */ - StoreEntry *e; /* Pointer to parent StoreEntry */ - mode_t mode; /* Mode - O_RDONLY or O_WRONLY */ - size_t st_size; /* Size of the object if known */ - off_t offset; /* current _on-disk_ offset pointer */ - STFNCB *file_callback; /* called on delayed sfileno assignments */ - STIOCB *callback; /* IO Error handler callback */ - void *callback_data; /* IO Error handler callback data */ - struct { - STRCB *callback; /* Read completion callback */ - void *callback_data; /* Read complation callback data */ - } read; - struct { - unsigned int closing:1; /* debugging aid */ - } flags; - void *fsstate; /* pointer to private fs state */ - }; - - -

- Each - The specific filesystem operations listed in the SwapDir object are - covered below. - -initfs - -

- - typedef void - STINIT(SwapDir *SD); - - -

- Initialise the given newfs - -

- - typedef void - STNEWFS(SwapDir *SD); - - -

- Called for each configured dumpfs - -

- - typedef void - STDUMP(StoreEntry *e, SwapDir *SD); - - -

- Dump the FS specific configuration data of the current freefs - -

- - typedef void - STFREE(SwapDir *SD); - - -

- Free the fsdata/. - - -doublecheckfs - -

- - typedef int - STDBLCHECK(SwapDir *SD, StoreEntry *e); - - -

- Double-check the given object for validity. Called during rebuild if - the '-S' flag is given to squid on the command line. Returns 1 if the - object is indeed valid, and 0 if the object is found invalid. - - -statfs - -

- - typedef void - STSTATFS(SwapDir *SD, StoreEntry *e); - - -

- Called to retrieve filesystem statistics, such as usage, load and - errors. The information should be appended to the passed - maintainfs - -

- - typedef void - STMAINTAINFS(SwapDir *SD); - - -

- Called periodically to replace objects. The active replacement policy - should be used to timeout unused objects in order to make room for - new objects. - -callback - -

- - typedef void - STCALLBACK(SwapDir *SD); - - -

- This function is called inside the comm_select/comm_poll loop to handle - any callbacks pending. - - -sync - -

- - typedef void - STSYNC(SwapDir *SD); - - -

- This function is called whenever a sync to disk is required. This - function should not return until all pending data has been flushed to - disk. - - -parse/reconfigure - -

- -checkobj - -

- - typedef int - STCHECKOBJ(SwapDir *SD, const StoreEntry *e); - - -

- Called by referenceobj - -

- - typedef void - STREFOBJ(SwapDir *SD, StoreEntry *e); - - -

- Called whenever an object is locked by unreferenceobj - -

- - typedef void - STUNREFOBJ(SwapDir *SD, StoreEntry *e); - - -

- Called whenever the object is unlocked by createobj - -

- - typedef storeIOState * - STOBJCREATE(SwapDir *SD, StoreEntry *e, STFNCB *file_callback, STIOCB *io_callback, void *io_callback_data); - - -

- Create an object in the - The IO callback should be called when an error occurs and when the - object is closed. Once the IO callback is called, the - openobj - -

- - typedef storeIOState * - STOBJOPEN(SwapDir *SD, StoreEntry *e, STFNCB *file_callback, STIOCB *io_callback, void *io_callback_data); - - -

- Open the closeobj - -

- - typedef void - STOBJCLOSE(SwapDir *SD, storeIOState *sio); - - -

- Close an opened object. The readobj - -

- - typedef void - STOBJREAD(SwapDir *SD, storeIOState *sio, char *buf, size_t size, off_t offset, STRCB *read_callback, void *read_callback_data); - - -

- Read part of the object of into - If a read operation fails, the filesystem layer notifies the - calling module by calling the writeobj - -

- - typedef void - STOBJWRITE(SwapDir *SD, storeIOState *sio, char *buf, size_t size, off_t offset, FREE *freefunc); - - -

- Write the given block of data to the given store object. - If a write operation fails, the filesystem layer notifies the - calling module by calling the unlinkobj - -

- - typedef void STOBJUNLINK(SwapDir *, StoreEntry *); - - -

- Remove the Store IO calls - -

- These routines are used inside the storage manager to create and - retrieve objects from a storage directory. - -storeCreate() - -

- - storeIOState * - storeCreate(StoreEntry *e, STIOCB *file_callback, STIOCB *close_callback, void * callback_data) - - -

- - - storeOpen() - -

- - storeIOState * - storeOpen(StoreEntry *e, STFNCB * file_callback, STIOCB * callback, void *callback_data) - - -

- - - storeRead() - -

- - void - storeRead(storeIOState *sio, char *buf, size_t size, off_t offset, STRCB *callback, void *callback_data) - - -

- - The caller is responsible for allocating and freeing storeWrite() - -

- - void - storeWrite(storeIOState *sio, char *buf, size_t size, off_t offset, FREE *free_func) - - -

- - If a write operation fails, the filesystem layer notifies the - calling module by calling the storeUnlink() - -

- - void - storeUnlink(StoreEntry *e) - - -

- storeOffset() - -

- - off_t storeOffset(storeIOState *sio) - - - - -

- offset/ refers to the ondisk offset, or undefined - results will occur. For reads, this returns the current offset of - successfully read data, not including queued reads. - - -Callbacks - - - - void - stiocb(void *data, int errorflag, storeIOState *sio) - - -

- The - - #define DISK_OK (0) - #define DISK_ERROR (-1) - #define DISK_EOF (-2) - #define DISK_NO_SPACE_LEFT (-6) - - -

- Once the The - - void - strcb(void *data, const char *buf, size_t len) - - -

- The State Logging - -

- These functions deal with state - logging and related tasks for a squid storage system. - These functions are used (called) in - Each storage system must provide the functions described - in this section, although it may be a no-op (null) function - that does nothing. Each function is accessed through a - function pointer stored in the - struct _SwapDir { - ... - STINIT *init; - STNEWFS *newfs; - struct { - STLOGOPEN *open; - STLOGCLOSE *close; - STLOGWRITE *write; - struct { - STLOGCLEANOPEN *open; - STLOGCLEANWRITE *write; - void *state; - } clean; - } log; - .... - }; - - - - - void - STLOGOPEN(SwapDir *); - - -

- The - The - - void - STLOGCLOSE(SwapDir *); - - -

- The - - void - STLOGWRITE(const SwapDir *, const StoreEntry *, int op); - - -

- The - - int - STLOGCLEANSTART(SwapDir *); - - -

- The - The - - StoreEntry * - STLOGCLEANNEXTENTRY(SwapDir *); - - -

- Gets the next entry that is a candidate for the clean log. - -

- Returns NULL when there is no more objects to log - - - - void - STLOGCLEANWRITE(SwapDir *, const StoreEntry *); - - -

- The - - void - STLOGCLEANDONE(SwapDir *); - - -

- Indicates the end of the clean-writing process and signals - the storage system to close the clean log, and rename or - move them to become the official state-holding log ready - to be opened. - -Replacement policy implementation - -

-The replacement policy can be updated during STOBJREAD/STOBJWRITE/STOBJOPEN/ -STOBJCLOSE as well as STREFOBJ and STUNREFOBJ. Care should be taken to -only modify the relevant replacement policy entries in the StoreEntry. -The responsibility of replacement policy maintainence has been moved into -each SwapDir so that the storage code can have tight control of the -replacement policy. Cyclic filesystems such as COSS require this tight -coupling between the storage layer and the replacement policy. - - -Removal policy API - -

- The removal policy is responsible for determining in which order - objects are deleted when Squid needs to reclaim space for new objects. - Such a policy is used by a object storage for maintaining the stored - objects and determining what to remove to reclaim space for new objects. - (together they implements a replacement policy) - -API -

- It is implemented as a modular API where a storage directory or - memory creates a policy of choice for maintaining it's objects, - and modules registering to be used by this API. - -createRemovalPolicy() - -

- - RemovalPolicy policy = createRemovalPolicy(cons char *type, cons char *args) - - -

- Creates a removal policy instance where object priority can be - maintained - -

- The returned RemovalPolicy instance is cbdata registered - -policy.Free() - -

- - policy->Free(RemovalPolicy *policy) - - -

- Destroys the policy instance and frees all related memory. - -policy.Add() - -

- - policy->Add(RemovalPolicy *policy, StoreEntry *, RemovalPolicyNode *node) - - -

- Adds a StoreEntry to the policy instance. - -

- datap is a pointer to where policy specific data can be stored - for the store entry, currently the size of one (void *) pointer. - -policy.Remove() - -

- - policy->Remove(RemovalPolicy *policy, StoreEntry *, RemovalPolicyNode *node) - - -

- Removes a StoreEntry from the policy instance out of - policy order. For example when an object is replaced - by a newer one or is manually purged from the store. - -

- datap is a pointer to where policy specific data is stored - for the store entry, currently the size of one (void *) pointer. - -policy.Referenced() - -

- - policy->Referenced(RemovalPolicy *policy, const StoreEntry *, RemovalPolicyNode *node) - - -

- Tells the policy that a StoreEntry is going to be referenced. Called - whenever a entry gets locked. - -

- node is a pointer to where policy specific data is stored - for the store entry, currently the size of one (void *) pointer. - -policy.Dereferenced() - -

- - policy->Dereferenced(RemovalPolicy *policy, const StoreEntry *, RemovalPolicyNode *node) - - -

- Tells the policy that a StoreEntry has been referenced. Called when - an access to the entry has finished. - -

- node is a pointer to where policy specific data is stored - for the store entry, currently the size of one (void *) pointer. - -policy.WalkInit() - -

- - RemovalPolicyWalker walker = policy->WalkInit(RemovalPolicy *policy) - - -

- Initiates a walk of all objects in the policy instance. - The objects is returned in an order suitable for using - as reinsertion order when rebuilding the policy. - -

- The returned RemovalPolicyWalker instance is cbdata registered - -

- Note: The walk must be performed as an atomic operation - with no other policy actions intervening, or the outcome - will be undefined. - -walker.Next() - -

- - const StoreEntry *entry = walker->Next(RemovalPolicyWalker *walker) - - -

- Gets the next object in the walk chain - -
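- Put together, a complete walk might look like the following sketch. rebuild_insert() is a hypothetical consumer, and Done() is described below:

    RemovalPolicyWalker *walker = policy->WalkInit(policy);
    const StoreEntry *entry;
    while ((entry = walker->Next(walker)) != NULL)
        rebuild_insert(entry);      /* re-add entries in the order returned */
    walker->Done(walker);
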

- Returns NULL when there are no further objects. - -walker.Done() - -

- - walker->Done(RemovalPolicyWalker *walker) - - -

- Finishes a walk of the maintained objects and destroys the - walker. - -policy.PurgeInit() - -

- - RemovalPurgeWalker *purgewalker = policy->PurgeInit(RemovalPolicy *policy, int max_scan) - - -

- Initiates a search for removal candidates. Search depth is indicated - by max_scan. - -

- The returned RemovalPurgeWalker instance is cbdata registered - -

- Note: The walk must be performed as an atomic operation - with no other policy actions intervening, or the outcome - will be undefined. - -purgewalker.Next() - -

- - StoreEntry *entry = purgewalker->Next(RemovalPurgeWalker *purgewalker) - - -

- Gets the next object to purge. The purgewalker will remove each - returned object from the policy. - -

It is the policy's responsibility to verify that the object - isn't locked or otherwise prevented from being removed. What this - means is that the policy must not return objects where - storeEntryLocked() is true. -
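- A complete purge pass might therefore look like the sketch below. The 256-entry scan depth, the byte target, the swap_file_sz member and the storeRelease() call are assumptions for the example; Done() is described below:

    const size_t needed_bytes = 64 * 1024;      /* arbitrary reclaim target for the example */
    size_t released = 0;
    RemovalPurgeWalker *pw = policy->PurgeInit(policy, 256);    /* scan at most 256 entries */
    StoreEntry *entry;
    while (released < needed_bytes && (entry = pw->Next(pw)) != NULL) {
        released += entry->swap_file_sz;        /* assumed on-disk size member */
        storeRelease(entry);                    /* assumed call that releases the object */
    }
    pw->Done(pw);
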

- Returns NULL when there are no further purgeable objects in the policy. - -purgewalker.Done() - -

- - purgewalker->Done(RemovalPurgeWalker *purgewalker) - - -

- Finishes a walk of the maintained objects, destroys the - walker and restores the policy to its normal state. - -policy.Stats() - -

- - policy->Stats(RemovalPolicy *policy, StoreEntry *entry) - - -

- Appends statistics about the policy to the given entry. - -Source layout - -

- Policy implementations reside in src/repl/<name>/, and a make in - such a directory must result in an object archive src/repl/<name>.a - containing all the objects implementing the policy. - -Internal structures - -RemovalPolicy - -

- - typedef struct _RemovalPolicy RemovalPolicy; - struct _RemovalPolicy { - char *_type; - void *_data; - void (*add)(RemovalPolicy *policy, StoreEntry *); - ... /* see the API definition above */ - }; - - -

- The _type member is mainly for debugging and diagnostics purposes, and - should be a pointer to the name of the policy (same name as used for - creation) - -

- The _data member is for storing policy specific information. - -RemovalPolicyWalker - -

- - typedef struct _RemovalPolicyWalker RemovalPolicyWalker; - struct _RemovalPolicyWalker { - RemovalPolicy *_policy; - void *_data; - StoreEntry *(*next)(RemovalPolicyWalker *); - ... /* see the API definition above */ - }; - - -RemovalPolicyNode - -

- - typedef struct _RemovalPolicyNode RemovalPolicyNode; - struct _RemovalPolicyNode { - void *data; - }; - - - Stores policy specific information about an entry. Currently - there is only space for a single pointer, but there are plans to - later provide more space here to allow simple policies - to store all their data "inline" to preserve some memory. - -Policy registration - -

- Policies are automatically registered in the Squid binary from the - policy selection made by the user building Squid. In the future this - might get extended to support loadable modules. All registered - policies are available to object stores that wish to use them. - -Policy instance creation - -

- Each policy must implement a "create/new" function (named - createRemovalPolicy_<name> for policy <name>) which allocates and - returns a populated RemovalPolicy instance. - It should also populate the _data member with a pointer to policy - specific data. - -Walker - -

- When a walker is created the policy populates it with at least the API - methods supported. Currently all API calls are mandatory, but the - policy implementation must make sure to NULL-fill the structure prior - to populating it in order to ensure future API compatibility. - -Design notes/bugs - -

- The RemovalPolicyNode design is incomplete/insufficient. The intention - was to abstract the location of the index pointers from the policy - implementation to allow the policy to work on both on-disk and memory - caches, but unfortunately the purge method for HEAP based policies - needs to update this, and it is also preferable if the purge method - in general knows how to clear the information. I think the agreement - was that the current design of tightly coupling the two together - on one StoreEntry is not the best design possible. - -

- It is debated whether having the policy index control the - clean index writes is the correct approach. Perhaps not. A - more appropriate design is probably to do the store indexing - completely outside the policy implementation (i.e. using the hash - index), and only ask the policy to dump its state somehow. -

- The Referenced/Dereferenced() calls are today mapped to lock/unlock, - which is an approximation of when they are intended to be called. - However, the real intention is to have Referenced() called whenever - an object is referenced, and Dereferenced() only called when the - object has actually been used for anything good. - - -Forwarding Selection - -

- To be written... - - -IP Cache and FQDN Cache - - Introduction - -

- The IP cache is a built-in component of squid providing - Hostname to IP-Number translation functionality and managing - the involved data-structures. Efficiency concerns require - mechanisms that allow non-blocking access to these mappings. - The IP cache usually doesn't block on a request except for - special cases where this is desired (see below). - - Data Structures - -

- The data structure used for storing name-address mappings - is a small hashtable (static hash_table *ip_table), - where structures of type ipcache_entry whose most - interesting members are: - - - struct _ipcache_entry { - char *name; - time_t lastref; - ipcache_addrs addrs; - struct _ip_pending *pending_head; - char *error_message; - unsigned char locks; - ipcache_status_t status:3; - } - - - - External overview - -

- Main functionality - is provided through calls to: - - - ipcache_nbgethostbyname(const char *name, IPH *handler, - void *handlerdata) - - - ipcache_gethostbyname(const char *name, int flags) - is different in that it only checks whether an entry exists in - the cache's data structures and does not by default contact the - DNS unless this is explicitly requested through the flags. - ipcache_init() sets the cache up at startup, while - ipcache_restart() is called to clear the IP - cache's data structures and cancel all pending requests.
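- As an illustration of the non-blocking interface, a caller might look like the sketch below. This assumes the classic handler typedef IPH(const ipcache_addrs *, void *); the names lookup_done and our_cbdata, and the ipcache_addrs member names mentioned in the comment, are assumptions:

    static void
    lookup_done(const ipcache_addrs *addrs, void *data)
    {
        if (addrs == NULL)
            return;             /* lookup failed or was negatively cached */
        /* otherwise use the addresses, e.g. addrs->in_addrs[addrs->cur],
         * together with the callback data passed as the third argument below */
    }

    ...
    ipcache_nbgethostbyname("www.example.com", lookup_done, our_cbdata);
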

- Internally, the execution flow is as follows: On a miss, - -Server Protocols -HTTP - -

- To be written... - -FTP - -

- To be written... - -Gopher - -

- To be written... - -Wais - -

- To be written... - -SSL - -

- To be written... - -Passthrough - -

- To be written... - - -Timeouts - -

- To be written... - - -Events - -

- To be written... - - -Access Controls - -

- To be written... - - -Authentication Framework - -

- Squid's authentication system is responsible for reading - authentication credentials from HTTP requests and deciding - whether or not those credentials are valid. This functionality - resides in two separate components: Authentication Schemes - and Authentication Modules. - -

- An Authentication Scheme describes how Squid gets the - credentials (i.e. username, password) from user requests. - Squid currently supports two authentication schemes: Basic - and NTLM. Basic authentication uses the standard HTTP Basic - scheme, in which the username and password are sent base64-encoded - in the (Proxy-)Authorization request header. - An Authentication Module takes the credentials received - from a client's request and tells Squid if they - are valid. Authentication Modules are implemented - externally from Squid, as child helper processes. - Authentication Modules interface with various types of - authentication databases, such as LDAP, PAM, NCSA-style - password files, and more. - -Authentication Scheme API - -Definition of an Authentication Scheme - -

An auth scheme in squid is the collection of functions required to - manage the authentication process for a given HTTP authentication - scheme. Existing auth schemes in squid are Basic and NTLM. Other HTTP - schemes (see for example RFC 2617) have been published and could be - implemented in squid. The term auth scheme and auth module are - interchangeable. An auth module is not to be confused with an - authentication helper, which is a scheme specific external program used - by a specific scheme to perform data manipulation external to squid. - Typically this involves comparing the browser submitted credentials with - those in the organization's user directory. - -

Auth modules SHOULD NOT perform access control functions. Squid has - advanced caching access control functionality already. Future work in - squid will allow an auth scheme helper to return group information for a - user, to allow Squid to more seamlessly implement access control. - - Function typedefs - -

Each function related to the general case of HTTP authentication has - a matching typedef. There are some additional function types used to - register/initialize, deregister/shutdown and provide stats on auth - modules: - - - - The Active function is used by squid to determine whether - the auth module has successfully initialised itself with - the current configuration. - - The configured function is used to see if the auth module - has been given valid parameters and is able to handle - authentication requests if initialised. If configured - returns 0 no other module functions except - Shutdown/Dump/Parse/FreeConfig will be called by Squid. - - functions of type AUTHSSETUP are used to register an - auth module with squid. The registration function MUST be - named "authSchemeSetup_SCHEME" where SCHEME is the auth_scheme - as defined by RFC 2617. Only one auth scheme registered in - squid can provide functionality for a given auth_scheme. - (I.e. only one auth module can handle Basic, only one can - handle Digest and so forth). The Setup function is responsible - for registering the functions in the auth module into the - passed authscheme_entry_t. The authscheme_entry_t will - never be NULL. If it is NULL the auth module should log an - error and do nothing. The other functions can have any - desired name that does not collide with any statically - linked function name within Squid. It is recommended to - use names of the form "authe_SCHEME_FUNCTIONNAME" (for - example authenticate_NTLM_Active is the Active() function - for the NTLM auth module. - - Functions of type AUTHSSHUTDOWN are responsible for - freeing any resources used by the auth modules. The shutdown - function will be called before squid reconfigures, and - before squid shuts down. - - Functions of type AUTHSINIT are responsible for allocating - any needed resources for the authentication module. AUTHSINIT - functions are called after each configuration takes place - before any new requests are made. - - Functions of type AUTHSPARSE are responsible for parsing - authentication parameters. The function currently needs a - scheme scope data structure to store the configuration in. - The passed scheme's scheme_data pointer should point to - the local data structure. Future development will allow - all authentication schemes direct access to their configuration - data without a locally scope structure. The parse function - is called by Squid's config file parser when a auth_param - scheme_name entry is encountered. - - Functions of type AUTHSFREECONFIG are called by squid - when freeing configuration data. The auth scheme should - free any memory allocated that is related to parse data - structures. The scheme MAY take advantage of this call to - remove scheme local configuration dependent data. (Ie cached - user details that are only relevant to a config setting). - - Functions of type AUTHSDUMP are responsible for writing - to the StoreEntry the configuration parameters that a user - would put in a config file to recreate the running - configuration. - - Functions of type AUTHSSTATS are called by the cachemgr - to provide statistics on the authmodule. Current modules - simply provide the statistics from the back end helpers - (number of requests, state of the helpers), but more detailed - statistics are possible - for example unique users seen or - failed authentication requests.

The next set of functions - work on the data structures used by the authentication - schemes. - - The AUTHSREQFREE function is called when an auth_user_request is being - freed by the authentication framework, and scheme specific data was - present. The function should free any scheme related data and MUST set - the scheme_data pointer to NULL. Failure to unlink the scheme data will - result in squid dying. - - Squid does not make assumptions about where the username - is stored. The AUTHSUSERNAME function must return a pointer to a - NULL-terminated string to be used in logging the request. Return - NULL if no username/usercode is known. The string should - NOT be allocated each time this function is called. - - The AUTHED function is used by squid to determine whether - the auth scheme has successfully authenticated the user - request. If timeouts on cached credentials have occurred - or for any reason the credentials are not valid, return - false.

The next set of functions perform the actual - authentication. The functions are used by squid for both - WWW- and Proxy- authentication. Therefore they MUST NOT - assume the authentication will be based on the Proxy-* - Headers. - - Functions of type AUTHSAUTHUSER are called when Squid - has a request that needs authentication. If needed the auth - scheme can alter the auth_user pointer (usually to point - to a previous instance of the user whose name is discovered - late in the auth process. For an example of this see the - NTLM scheme). These functions are responsible for performing - any in-squid routines for the authentication of the user. - The auth_user_request struct that is passed around is only - persistent for the current request. If the auth module - requires access to the structure in the future it MUST lock - it, and implement some method for identifying it in the - future. For example the NTLM module implements a connection - based authentication scheme, so the auth_user_request struct - gets referenced from the ConnStateData. - - Functions of type AUTHSDECODE are responsible for decoding the passed - authentication header, creating or linking to a auth_user struct and for - storing any needed details to complete authentication in AUTHSAUTHUSER. - - Functions of type AUTHSDIRECTION are used by squid to determine what - the next step in performing authentication for a given scheme is. The - following are the return codes: - - - -2 = error in the auth module. Cannot determine request direction. - -1 = the auth module needs to send data to an external helper. - Squid will prepare for a callback on the request and call the - AUTHSSTART function. - 0 = the auth module has all the information it needs to - perform the authentication and provide a succeed/fail result. - 1 = the auth module needs to send a new challenge to the - request originator. Squid will return the appropriate status code - (401 or 407) and call the registered FixError function to allow the - auth module to insert it's challenge. - - - Functions of type AUTHSFIXERR are used by squid to add scheme - specific challenges when returning a 401 or 407 error code. On requests - where no authentication information was provided, all registered auth - modules will have their AUTHSFIXERR function called. When the client - makes a request with an authentication header, on subsequent calls only the matching - AUTHSFIXERR function is called (and then only if the auth module - indicated it had a new challenge to send the client). If no auth schemes - match the request, the authentication credentials in the request are - ignored - and all auth modules are called. - - These functions are responsible for freeing scheme specific data from - the passed auth_user_t structure. This should only be called by squid - when there are no outstanding requests linked to the auth user. This includes - removing the user from any scheme specific memory caches. - - - - These functions are responsible for adding any authentication - specific header(s) or trailer(s) OTHER THAN the WWW-Authenticate and - Proxy-Authenticate headers to the passed HttpReply. The int indicates - whether the request was an accelerated request or a proxied request. For - example operation see the digest auth scheme. (Digest uses a - Authentication-Info header.) This function is called whenever a - auth_user_request exists in a request when the reply is constructed - after the body is sent on chunked replies respectively. 
- - This function type is called when an auth_user_request is - linked into a ConnStateData struct, and the connection is closed. If any - scheme specific activities related to the request or connection are in - progress, this function MUST clear them. - - This function type is called when squid is ready to put the request - on hold and wait for a callback from the auth module when the auth - module has performed its external activities. - - - - Data Structures - -

This is used to link auth_users into the username cache. - Because some schemes may link in aliases to a user, the - link is not part of the auth_user structure itself. - - -struct _auth_user_hash_pointer { - /* first two items must be same as hash_link */ - char *key; - auth_user_hash_pointer *next; - auth_user_t *auth_user; - dlink_node link; /* other hash entries that point to the same auth_user */ -}; - - -

This is the main user-related structure. It stores user-related data, - and is persistent across requests. It can even persist across - multiple external authentications. One major benefit of preserving this - structure is the cached ACL match results. This structure is private to - the authentication framework. - - -struct _auth_user_t { - /* extra fields for proxy_auth */ - /* this determines what scheme owns the user data. */ - auth_type_t auth_type; - /* the index +1 in the authscheme_list to the authscheme entry */ - int auth_module; - /* we only have one username associated with a given auth_user struct */ - auth_user_hash_pointer *usernamehash; - /* we may have many proxy-authenticate strings that decode to the same user*/ - dlink_list proxy_auth_list; - dlink_list proxy_match_cache; - struct { - unsigned int credentials_ok:2; /*0=unchecked,1=ok,2=failed*/ - } flags; - long expiretime; - /* IP addr this user authenticated from */ - struct IN_ADDR ipaddr; - time_t ip_expiretime; - /* how many references are outstanding to this instance*/ - size_t references; - /* the auth scheme has its own private data area */ - void *scheme_data; - /* the auth_user_request structures that link to this. Yes it could be a splaytree - * but how many requests will a single username have in parallel? */ - dlink_list requests; -}; - -

This short-lived structure is the visible aspect of the - authentication framework. - - -struct _auth_user_request_t { - /* this is the object passed around by client_side and acl functions */ - /* it has request specific data, and links to user specific data */ - /* the user */ - auth_user_t *auth_user; - /* return a message on the 401/407 error pages */ - char *message; - /* any scheme specific request related data */ - void *scheme_data; - /* how many 'processes' are working on this data */ - size_t references; -}; - -

- The authscheme_entry struct is used to store the runtime - registered functions that make up an auth scheme. An auth - scheme module MUST implement ALL functions except the - following functions: oncloseconnection, AddHeader, AddTrailer.. - In the future more optional functions may be added to this - data type. - - -struct _authscheme_entry { - char *typestr; - AUTHSACTIVE *Active; - AUTHSADDHEADER *AddHeader; - AUTHSADDTRAILER *AddTrailer; - AUTHSAUTHED *authenticated; - AUTHSAUTHUSER *authAuthenticate; - AUTHSDUMP *dump; - AUTHSFIXERR *authFixHeader; - AUTHSFREE *FreeUser; - AUTHSFREECONFIG *freeconfig; - AUTHSUSERNAME *authUserUsername; - AUTHSONCLOSEC *oncloseconnection; /*optional*/ - AUTHSDECODE *decodeauth; - AUTHSDIRECTION *getdirection; - AUTHSPARSE *parse; - AUTHSINIT *init; - AUTHSREQFREE *requestFree; - AUTHSSHUTDOWN *donefunc; - AUTHSSTART *authStart; - AUTHSSTATS *authStats; -}; - - -
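- As an illustration, a scheme's setup function registers its implementation by filling in this table, roughly as in the sketch below. The "foo" scheme name and the fooAuth* helper functions are invented for the example, and xstrdup() is assumed available; only the member names come from the structure above:

    void
    authSchemeSetup_foo(authscheme_entry_t * authscheme)
    {
        if (authscheme == NULL)
            return;     /* promised never to happen; log an error and do nothing */
        authscheme->typestr = xstrdup("foo");
        authscheme->Active = fooAuthActive;
        authscheme->parse = fooAuthParse;
        authscheme->init = fooAuthInit;
        authscheme->decodeauth = fooAuthDecode;
        authscheme->getdirection = fooAuthDirection;
        authscheme->authAuthenticate = fooAuthAuthenticate;
        authscheme->authenticated = fooAuthAuthenticated;
        authscheme->authUserUsername = fooAuthUsername;
        /* ... the remaining mandatory members ... */
        authscheme->oncloseconnection = NULL;   /* optional members may stay NULL */
        authscheme->AddHeader = NULL;
        authscheme->AddTrailer = NULL;
    }
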

For information on the requirements for each of the - functions, see the details under the typedefs above. For - reference implementations, see the squid source code, - /src/auth/basic for a request based stateless auth module, - and /src/auth/ntlm for a connection based stateful auth - module. - -How to add a new Authentication Scheme - -

Copy the nearest existing auth scheme and modify it to receive the - appropriate scheme headers. Now step through the acl.c MatchAclProxyUser - function's code path and see how the functions call down through - authenticate.c to your scheme. Write a helper to provide your scheme with - whatever back-end support it needs. Remember any blocking code must go in - AUTHSSTART function(s) and _MUST_ use callbacks. - -How to ``hook in'' new functions to the API - -

Start off by figuring out the code path that will result in - the function being called, and what data it will need. Then - create a typedef for the function, and add an entry to the - authscheme_entry struct. Add a wrapper function to - authenticate.c (or if appropriate cf_cache.c) that calls - the scheme specific function if it exists. Test it. Test - it again. Now port to all the existing auth schemes, or at - least add a setting of NULL for the function for each - scheme. - -Authentication Module Interface - -Basic Authentication Modules - -

-Basic authentication provides a username and password. These -are written to the authentication module processes on a single -line, separated by a space: - - username password - -

-The authentication module process reads username, password pairs -on stdin and returns either ``OK'' or ``ERR'' on stdout for -each input line. - -

-The following simple perl script demonstrates how the -authentication module works. This script allows any -user named ``Dirk'' (without checking the password) -and allows any user that uses the password ``Sekrit'': - - -#!/usr/bin/perl -w -$|=1; # no buffering, important! -while (<>) { - chop; - ($u,$p) = split; - $ans = &check($u,$p); - print "$ans\n"; -} - -sub check { - local($u,$p) = @_; - return 'ERR' unless (defined $p && defined $u); - return 'OK' if ('Dirk' eq $u); - return 'OK' if ('Sekrit' eq $p); - return 'ERR'; -} - - - -ICP - -

- To be written... - - -Network Measurement Database - -

- To be written... - - -Error Pages - -

- To be written... - - -Callback Data Allocator - -

- Squid's extensive use of callback functions makes it very - susceptible to memory access errors. To address this all callback - functions make use of a construct called "cbdata". This allows - functions doing callbacks to verify that the caller is still - valid before making the callback. - -

- Note: cbdata is intended for callback data and is tailored specifically - to make callbacks less dangerous, leaving as few windows for errors as - possible. It is not suitable or intended as a generic reference-counted - memory allocator. - -API - -CBDATA_TYPE - -

- - CBDATA_TYPE(datatype); - - -

- Macro that defines a new cbdata datatype. Similar to a variable - or struct definition. Scope is always local to the file/block - where it is defined and all calls to cbdataAlloc for this type - must be within the same scope as the CBDATA_TYPE declaration. - Allocated entries may be referenced or freed anywhere with no - restrictions on scope. - -CBDATA_GLOBAL_TYPE - -

- - /* Module header file */ - extern CBDATA_GLOBAL_TYPE(datatype); - - /* Module main C file */ - CBDATA_GLOBAL_TYPE(datatype); - -

- Defines a global cbdata type that can be referenced anywhere in - the code. - -CBDATA_INIT_TYPE - -

- - CBDATA_INIT_TYPE(datatype); - /* or */ - CBDATA_INIT_TYPE_FREECB(datatype, FREE *freehandler); - - -

- Initializes the cbdatatype. Must be called prior to the first use of - cbdataAlloc() for the type. - -

- The freehandler is called when the last known reference to an - allocated entry goes away. - -cbdataAlloc - -

- - pointer = cbdataAlloc(datatype); - - -

- Allocates a new entry of a registered cbdata type. - -cbdataFree - -

- - cbdataFree(pointer); - - -

- Frees an entry allocated by cbdataAlloc(). -

- Note: If there are active references to the entry then the entry - will be freed when the last reference is removed. However, - cbdataReferenceValid() will return false for those references. - -cbdataReference - -

- - reference = cbdataReference(pointer); - - -

- Creates a new reference to a cbdata entry. Used when you need to - store a reference in another structure. The reference can later - be verified for validity by cbdataReferenceValid(). - -

- Note: The reference variable is a pointer to the entry, in all - aspects identical to the original pointer. But semantically it - is quite different. It is best if the reference is thought of - and handled as a "void *". - -cbdataReferenceDone - -

- - cbdataReferenceDone(reference); - - -

- Removes a reference created by cbdataReference(). - -

- Note: The reference variable will be automatically cleared to NULL. - -cbdataReferenceValid - -

- - if (cbdataReferenceValid(reference)) { - ... - } - - -

- cbdataReferenceValid() returns false if a reference is stale (refers to an - entry freed by cbdataFree). - -cbdataReferenceValidDone - -

- - void *pointer; - bool cbdataReferenceValidDone(reference, &pointer); - - -

- Removes a reference created by cbdataReference() and checks - it for validity. A temporary pointer to the referenced data - (if valid) is returned in the &pointer argument. - -

- Meant to be used on the last dereference, usually to make - a callback. - - void *cbdata; - ... - if (cbdataReferenceValidDone(reference, &cbdata)) - callback(..., cbdata); - -

- Note: The reference variable will be automatically cleared to NULL. - -Examples - -

- Here you can find some examples of how to use cbdata, and why it is - needed. - -Asynchronous operation without cbdata, showing why cbdata is needed - -

- For a asyncronous operation with callback functions, the normal - sequence of events in programs NOT using cbdata is as follows: - - /* initialization */ - type_of_data our_data; - ... - our_data = malloc(...); - ... - /* Initiate a asyncronous operation, with our_data as callback_data */ - fooOperationStart(bar, callback_func, our_data); - ... - /* The asyncronous operation completes and makes the callback */ - callback_func(callback_data, ....); - /* Some time later we clean up our data */ - free(our_data); - - However, things become more interesting if we want or need - to free the callback_data, or otherwise cancel the callback, - before the operation completes. In constructs like this you - can quite easily end up with having the memory referenced - pointed to by callback_data freed before the callback is invoked - causing a program failure or memory corruption: - - /* initialization */ - type_of_data our_data; - ... - our_data = malloc(...); - ... - /* Initiate a asyncronous operation, with our_data as callback_data */ - fooOperationStart(bar, callback_func, our_data); - ... - /* ouch, something bad happened elsewhere.. try to cleanup - * but the programmer forgot there is a callback pending from - * fooOperationsStart() (an easy thing to forget when writing code - * to deal with errors, especially if there may be many different - * pending operation) - */ - free(our_data); - ... - /* The asyncronous operation completes and makes the callback */ - callback_func(callback_data, ....); - /* CRASH, the memory pointer to by callback_data is no longer valid - * at the time of the callback - */ - -Asyncronous operation with cbdata - -

- The callback data allocator lets us do this in a uniform and - safe manner. The callback data allocator is used to allocate, - track and free memory pool objects used during callback - operations. Allocated memory is locked while the asyncronous - operation executes elsewhere, and is freed when the operation - completes. The normal sequence of events is: - - /* initialization */ - type_of_data our_data; - ... - our_data = cbdataAlloc(type_of_data); - ... - /* Initiate a asyncronous operation, with our_data as callback_data */ - fooOperationStart(..., callback_func, our_data); - ... - /* foo */ - void *local_pointer = cbdataReference(callback_data); - .... - /* The asyncronous operation completes and makes the callback */ - void *cbdata; - if (cbdataReferenceValidDone(local_pointer, &cbdata)) - callback_func(...., cbdata); - ... - cbdataFree(our_data); - - - -Asynchronous operation cancelled by cbdata - -

- With this scheme, nothing bad happens if - /* initialization */ - type_of_data our_data; - ... - our_data = cbdataAlloc(type_of_data); - ... - /* Initiate a asyncronous operation, with our_data as callback_data */ - fooOperationStart(..., callback_func, our_data); - ... - /* foo */ - void *local_pointer = cbdataReference(callback_data); - .... - /* something bad happened elsewhere.. cleanup */ - cbdataFree(our_data); - ... - /* The asyncronous operation completes and tries to make the callback */ - void *cbdata; - if (cbdataReferenceValidDone(local_pointer, &cbdata)) - /* won't be called, as the data is no longer valid */ - callback_func(...., cbdata); - - - In this case, when Adding a new cbdata registered type - -

- To add new module specific data types to the allocator one uses the - macros CBDATA_TYPE and CBDATA_INIT_TYPE. These creates a local cbdata - definition (file or block scope). Any cbdataAlloc calls must be made - within this scope. However, cbdataFree might be called from anywhere. - - - /* First the cbdata type needs to be defined in the module. This - * is usually done at file scope, but it can also be local to a - * function or block.. - */ - CBDATA_TYPE(type_of_data); - - /* Then in the code somewhere before the first allocation - * (can be called multiple times with only a minimal overhead) - */ - CBDATA_INIT_TYPE(type_of_data); - /* Or if a free function is associated with the data type. This - * function is responsible for cleaning up any dependencies etc - * referenced by the structure and is called on cbdataFree or - * when the last reference is deleted by cbdataReferenceDone / - * cbdataReferenceValidDone - */ - CBDATA_INIT_TYPE_FREECB(type_of_data, free_function); - - -Adding a new cbdata registered data type globally - -

- To add new global data types that can be allocated from anywhere - within the code one has to add them to the cbdata_type enum in - enums.h, and a corresponding CREATE_CBDATA call in - cbdata.c:cbdataInit(). Or alternatively add a CBDATA_GLOBAL_TYPE - definition to globals.h as shown below and use CBDATA_INIT_TYPE at - the appropriate location(s) as described above. - - - extern CBDATA_GLOBAL_TYPE(type_of_data); /* CBDATA_UNDEF */ - - - -Refcount Data Allocator (C++ Only) - -

- Manual reference counting such as cbdata uses is error prone - and time consuming for the programmer. C++'s operator overloading - allows us to create automatic reference counting pointers that will - free objects when they are no longer needed. With some care these - objects can be passed to functions needing Callback Data pointers. - - API - -

- There are two classes involved in the automatic refcounting: the - RefCountable base class and the RefCount template class. - -RefCountable -

- The RefCountable base class defines one abstract function - - - void deleteSelf() const {delete this;} - - - RefCount - -

- The RefCount template class replaces pointers as parameters and - variables of the class being reference counted. Typically one creates - a typedef to aid users. - - class MyConcrete : public RefCountable { - public: - typedef RefCount<MyConcrete> Pointer; - void deleteSelf() const {delete this;} - }; - - Now, one can pass objects of type MyConcrete::Pointer around. - - CBDATA -

- To make a refcounting CBDATA class, you need to overload new and delete, - include a macro in your class definition, and ensure that everyone - who would call you directly (not as a cbdata callback, but as a normal - use), holds a RefCount<> smart pointer to you. - - class MyConcrete : public RefCountable { - public: - typedef RefCount<MyConcrete> Pointer; - void * operator new(size_t); - void operator delete (void *); - void deleteSelf() const {delete this;} - private: - CBDATA_CLASS(MyConcrete); - }; - - ... - /* In your .cc file */ - CBDATA_CLASS_INIT(MyConcrete); - - void * - MyConcrete::operator new (size_t) - { - CBDATA_INIT_TYPE(MyConcrete); - MyConcrete *result = cbdataAlloc(MyConcrete); - /* Mark result as being owned - we want the refcounter to do the - * delete call - */ - cbdataReference(result); - return result; - } - - void - MyConcrete::operator delete (void *address) - { - MyConcrete *t = static_cast<MyConcrete *>(address); - cbdataFree(address); - /* And allow the memory to be freed */ - cbdataReferenceDone (t); - } - - - When no RefCount<MyConcrete> smart pointers exist, the object's - delete method will be called. This will run the object destructor, - freeing any foreign resources it holds. Then cbdataFree - will be called, marking the object as invalid for all the cbdata - functions that it may have queued. When they all return, the actual - memory will be returned to the pool. - - Using the Refcounter - -

- Allocation and deallocation of refcounted objects (including those of - the RefCount template class) must be done via new() and delete(). If a - class that will hold an instance of a RefCount <foo> variable - does not use delete(), you must assign NULL to the variable before - it is freed. Failure to do this will result in memory leaks. You HAVE - been warned. - -

- Never call delete or deleteSelf on a RefCountable object. You will - create a large number of dangling references and squid will segfault - eventually. - -

- Always create at least one RefCount smart pointer, so that the - reference counting mechanism will delete the object when it's not - needed. - -
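- A minimal illustration, using the MyConcrete example from above (someOtherMethod() is assumed, as in the later examples in this section):

    MyConcrete::Pointer p = new MyConcrete();   /* reference count becomes 1 */
    p->someOtherMethod();                       /* use the object through the smart pointer */
    p = NULL;       /* last smart pointer dropped: the object is deleted automatically */
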

- Do not pass RefCount smart pointers outside the squid memory space. - They will invariably segfault when copied. - -

- If, in a method, all other smart pointer holding objects may be deleted - or may set their smart pointers to NULL, then you will be deleted - partway through the method (and thus crash). To prevent this, assign - a smart pointer to yourself: - - void - MyConcrete::aMethod(){ - /* This holds a reference to us */ - Pointer aPointer(this); - /* This is a method that may mean we don't need to exist anymore */ - someObject->someMethod(); - /* This prevents aPointer being optimised away before this point, - * and must be the last line in our method - */ - aPointer = NULL; - } - - -

- Calling methods via smart pointers is easy: just dereference via -> - - void - SomeObject::someFunction() { - myConcretePointer->someOtherMethod(); - } - -

- When passing RefCount smart pointers, always pass them as their - native type, never as '*' or as '&'. - - -Cache Manager - -

- To be written... - - -HTTP Headers - -

- - General remarks - -

- - Most operations on an HttpHeader object are performed through the - httpHeader*() functions described below. - -Life cycle - -

- - /* declare */ - HttpHeader hdr; - - /* initialize (as an HTTP Request header) */ - httpHeaderInit(&hdr, hoRequest); - - /* do something */ - ... - - /* cleanup */ - httpHeaderClean(&hdr); - - -

- Prior to use, an HttpHeader must be initialized with httpHeaderInit(), - and cleaned with httpHeaderClean() once it is no longer needed. - Once initialized, the header can be manipulated with the functions - described in the following sections. - Note that there are no methods for "creating" or "destroying" - a "dynamic" HttpHeader object. - -Header Manipulation - -

- The most common operations on HTTP headers are testing - for a particular header-field (e.g. with httpHeaderHas()), and getting - or putting field values. - - Special care must be taken when several header-fields with - the same id are present in the header. If the HTTP protocol - allows only one copy of the specified field per header - (e.g. "Content-Length"), only a single value may be requested. - It is prohibited to ask for a list of values when only one - value is permitted, and vice versa. This restriction prevents - a programmer from processing one value of a header-field - while ignoring other valid values. -

- - A value is put into the header using one of the httpHeaderPut*() - functions. Example: - - - /* add our own Age field if none was added before */ - int age = ... - if (!httpHeaderHas(hdr, HDR_AGE)) - httpHeaderPutInt(hdr, HDR_AGE, age); - -

- There are two ways to delete a field from a header. To - delete a "known" field (a field with "id" other than - - The - /* delete all fields with a given name */ - HttpHeaderPos pos = HttpHeaderInitPos; - HttpHeaderEntry *e; - while ((e = httpHeaderGetEntry(hdr, &pos))) { - if (!strCaseCmp(e->name, name)) - ... /* delete entry */ - } - - - Note that I/O and Headers - -

- To store a header in a file or socket, pack it into a buffer - using httpHeaderPackInto() and a Packer. - -Adding new header-field ids - -

- Adding new ids is simple. First add new HDR_ entry to the - http_hdr_type enumeration in enums.h. Then describe a new - header-field attributes in the HeadersAttrs array located - in - Finally, add new id to one of the following arrays: - - Also, if the new field is a "list" header, add it to the - - In most cases, if you forget to include a new field id in - one of the required arrays, you will get a run-time assertion. - For rarely used fields, however, it may take a long time - for an assertion to be triggered. - -

- There is virtually no limit on the number of fields supported - by Squid. If current mask sizes cannot fit all the ids (you - will get an assertion if that happens), simply enlarge - HttpHeaderMask type in A Word on Efficiency - -

- - Adding new fields is somewhat expensive if they require - complex conversions to a string. - -

- Deleting existing fields requires scan of all the entries - and comparing their "id"s (faster) or "names" (slower) with - the one specified for deletion. - -

- Most of the operations are faster than their "ascii string" - equivalents. - -File Formats - - -NOTE: this information is current as of version 2.2.STABLE4. - -

-A swap log entry is defined by the following structure: - -struct _storeSwapLogData { - char op; - int swap_file_number; - time_t timestamp; - time_t lastref; - time_t expires; - time_t lastmod; - size_t swap_file_sz; - u_short refcount; - u_short flags; - unsigned char key[MD5_DIGEST_CHARS]; -}; - - - -Store ``swap meta'' Description -

-``swap meta'' refers to a section of meta data stored at the beginning -of an object that is stored on disk. This meta data includes information -such as the object's cache key (MD5), URL, and part of the StoreEntry -structure. - -

-The meta data is stored using a TYPE-LENGTH-VALUE format. That is, -each chunk of meta information consists of a TYPE identifier, a -LENGTH field, and then the VALUE (which is LENGTH octets long). - -Types - -

-As of Squid-2.3, the following TYPES are defined (from - - time_t timestamp; - time_t lastref; - time_t expires; - time_t lastmod; - size_t swap_file_sz; - u_short refcount; - u_short flags; - - -2GB objects. As STORE_META_STD except that the swap_file_sz - is a squid_file_sz (64-bit integer) instead of size_t. - - - - -Implementation Notes - -

-When writing an object to disk, we must first write the meta data. -This is done with a couple of functions. First, -Note that the - StoreEntry->swap_file_sz - MemObject->swap_hdr_sz; - -Note that the swap file content includes the HTTP reply headers -and the HTTP reply body (if any). - -

-When reading a swap file, there is a similar process to extract -the swap meta data. First, swap_hdr_sz/. - -leakFinder - -

-src/leakfinder.c contains some routines useful for debugging -and finding memory leaks. It is not enabled by default. To enable -it, use - -configure --enable-leakfinder ... - - -

-The module has three public functions: leakAdd, -leakFree, and leakTouch. Note that these are actually -macros that insert __FILE__ and __LINE__ arguments to the real -functions. -

-leakAdd should be called when a pointer is first created. -Usually this follows immediately after a call to malloc or some -other memory allocation function. For example: - - ... - void *p; - p = malloc(100); - leakAdd(p); - ... - - -

-leakFree is the opposite. Call it just before releasing -the pointer memory, such as a call to free. For example: - - ... - leakFree(foo); - free(foo); - return; - -NOTE: leakFree aborts with an assertion if you give it a -pointer that was never added with leakAdd. - - -

-The definition of a leak is memory that was allocated but never -freed. Thus, to find a leak we need to track the pointer between -the time it got allocated and the time when it should have been -freed. Use leakTouch to accomplish this. You can sprinkle -leakTouch calls throughout the code where the pointer is -used. For example: - -void -myfunc(void *ptr) -{ - ... - leakTouch(ptr); - ... -} - -NOTE: leakTouch aborts with an assertion if you give it -a pointer that was never added with leakAdd, or if the -pointer was already freed. - -

-For each pointer tracked, the module remembers the filename, line -number, and time of last access. You can view this data with the -cache manager by selecting the leaks option. You can also -do it from the command line: - -% client mgr:leaks | less - - -

-The way to identify possible leaks is to look at the time of last -access. Pointers that haven't been accessed for a long time are -candidates for leaks. The filename and line numbers tell you where -that pointer was last accessed. If there is a leak, then the bug -occurs somewhere after that point of the code. - -MemPools - -

-MemPools are a pooled memory allocator running on top of malloc(). Its -purpose is to reduce memory fragmentation and provide detailed statistics -on memory consumption. -

-Preferably all memory allocations in Squid should be done using MemPools -or one of the types built on top of it (i.e. cbdata). - -

-Note: Usually it is better to use cbdata types, as these give you additional -safeguards in references and typechecking. However, for high usage pools where -the extra functionality of cbdata is not required, directly using a MemPool -might be the way to go. - -Public API - -

-This section defines the public API. - -createMemPool - -

- - MemPool * pool = memPoolCreate(char *name, size_t element_size); - - -

- Creates a MemPool of elements with the given size. - -memPoolAlloc - -

- - type * data = memPoolAlloc(pool); - - -

- Allocate one element from the pool - -memPoolFree - -

- - memPoolFree(pool, data); - - -

- Frees an element allocated by memPoolAlloc(). - -memPoolDestroy - -

- - memPoolDestroy(&pool); - - -

- Destroys a memory pool created by memPoolCreate() and resets the pool pointer to NULL. -

- Typical usage could be: - - ... - myStructType *myStruct; - MemPool * myType_pool = memPoolCreate("This is cute pool", sizeof(myStructType)); - myStruct = memPoolAlloc(myType_pool); - myStruct->item = xxx; - ... - memPoolFree(myType_pool, myStruct); - memPoolDestroy(&myType_pool); - -memPoolIterate - -

- - MemPoolIterator * iter = memPoolIterate(void); - - -

- Initialise iteration through all of the pools. - -memPoolIterateNext - -

- - MemPool * pool = memPoolIterateNext(MemPoolIterator * iter); - - -

- Gets the next pool pointer; returns NULL when there are no more pools. -

- - MemPoolIterator *iter; - MemPool *pool; - iter = memPoolIterate(); - while ( (pool = memPoolIterateNext(iter)) ) { - ... handle(pool); - } - memPoolIterateDone(&iter); - - -memPoolIterateDone - -

- - memPoolIterateDone(MemPoolIterator ** iter); - - -

- Should be called after iteration through all the pools has finished. - -memPoolSetChunkSize - -

- - memPoolSetChunkSize(MemPool * pool, size_t chunksize); - - -

- Allows you to tune the chunk size of the pool. Objects are allocated in chunks - instead of individually. This conserves memory and reduces fragmentation. - Because of that, memory can also only be freed in chunks. Therefore - there is a tradeoff between memory conservation due to chunking and free - memory fragmentation. - As a general guideline, increase chunk size only for pools that keep very - many items for a relatively long time. - -memPoolSetIdleLimit - -

- - memPoolSetIdleLimit(size_t new_idle_limit); - - -

- Sets an upper limit, in bytes, on the amount of free RAM kept in pools. This is - not a strict upper limit, but a hint. When MemPools are over this limit, - totally free chunks are immediately considered for release. Otherwise - only chunks that have not been referenced for a long time are checked. - -memPoolGetStats - -

- - int inuse = memPoolGetStats(MemPoolStats * stats, MemPool * pool); - - -

- Fills the MemPoolStats struct with statistical data about the pool. Returns - the number of objects in use, i.e. currently allocated.
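- As an example, this can be combined with the iterator API above to report per-pool usage; a sketch (the debug() section and level are arbitrary; label and obj_size are members of the MemPoolStats struct shown below):

    MemPoolIterator *iter = memPoolIterate();
    MemPool *pool;
    while ((pool = memPoolIterateNext(iter)) != NULL) {
        MemPoolStats stats;
        int inuse = memPoolGetStats(&stats, pool);
        debug(63, 1) ("%-24s %d objects of %d bytes in use\n",
            stats.label, inuse, stats.obj_size);
    }
    memPoolIterateDone(&iter);
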

- - struct _MemPoolStats { - MemPool *pool; - const char *label; - MemPoolMeter *meter; - int obj_size; - int chunk_capacity; - int chunk_size; - - int chunks_alloc; - int chunks_inuse; - int chunks_partial; - int chunks_free; - - int items_alloc; - int items_inuse; - int items_idle; - - int overhead; - }; - - /* object to track per-pool cumulative counters */ - typedef struct { - double count; - double bytes; - } mgb_t; - - /* object to track per-pool memory usage (alloc = inuse+idle) */ - struct _MemPoolMeter { - MemMeter alloc; - MemMeter inuse; - MemMeter idle; - mgb_t gb_saved; /* account Allocations */ - mgb_t gb_osaved; /* history Allocations */ - mgb_t gb_freed; /* account Free calls */ - }; - - -memPoolGetGlobalStats - -

- - int pools_inuse = memPoolGetGlobalStats(MemPoolGlobalStats * stats); - - -

- Fills the MemPoolGlobalStats struct with statistical data about overall - usage for all pools. Returns the number of pools that - have at least one object in use, i.e. the number of dirty pools.

- - struct _MemPoolGlobalStats { - MemPoolMeter *TheMeter; - - int tot_pools_alloc; - int tot_pools_inuse; - int tot_pools_mempid; - - int tot_chunks_alloc; - int tot_chunks_inuse; - int tot_chunks_partial; - int tot_chunks_free; - - int tot_items_alloc; - int tot_items_inuse; - int tot_items_idle; - - int tot_overhead; - int mem_idle_limit; - }; - - -memPoolClean - -

- - memPoolClean(time_t maxage); - - -

- Main cleanup handler. For MemPools to stay within their upper idle limits, - this function needs to be called periodically, preferably at some - constant rate, e.g. from a Squid event. It looks through all pools and - chunks, cleans up internal states and checks for releasable chunks.

- Between calls to this function, objects are placed onto an internal - cache instead of being returned to their home chunks, mainly for speed. - During that time the state of a chunk is not known; it is not - known whether the chunk is free or in use. This call returns all objects - to their chunks and restores consistency.

- Should be called relatively often, as it sorts chunks into a suitable - order so as to reduce free memory fragmentation and increase chunk - utilisation.

- The maxage parameter instructs it to release all totally idle chunks that - have not been referenced for maxage seconds.

- A suitable frequency for cleanup is in the range of a few tens of seconds to - a few minutes, depending on memory activity. - Several of the functions above call memPoolClean internally to operate on - consistent state. -
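- For illustration, the periodic call could be driven from Squid's event queue. A minimal sketch, in which the wrapper name, the 15 second interval and the 10 minute maxage are assumptions rather than recommendations:

    static void
    memPoolCleanEvent(void *unused)
    {
        memPoolClean(10 * 60);      /* release chunks idle for ten minutes or more */
        /* re-schedule ourselves via the normal event interface */
        eventAdd("memPoolCleanEvent", memPoolCleanEvent, NULL, 15.0, 1);
    }

    /* once, during startup */
    eventAdd("memPoolCleanEvent", memPoolCleanEvent, NULL, 15.0, 1);
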