Ideas for Async-IO model
Summary:
- select() is not used for disk I/O. Use a short select timeout if there are
I/O operations pending, to keep things flowing quickly.
- Never have more than one outstanding I/O operation on one FD
- Multiple writes should be joined into one larger write
- I/O buffers locked by the I/O functions. Some kind of lock-count needed.
All locking/unlocking done in the main thread when queueing operations
or receiving completed operations.
- Try to balance the I/O load on the available disks.
- Make larger composite I/O operations, to avoid unnecessary inter-thread
communication.
- Use some kind of internal "file" to keep state info. Not based on UNIX
FD. A "file" does not necessarily have a filesystem file connected to
it at all times.
- One FD for each client, to avoid file pointer issues
- Disk cache filedescriptors are expendable. It is allowable to scrap "idle" cache
filedescriptors if we run low on filedescriptors. The filedescriptor in
itself does not contain any stateful information.
- The main code should NOT care about disk filedescriptors. This is
only a matter for disk I/O routines.
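The lock-count idea for I/O buffers could be sketched as follows. This is only
an illustration under assumptions: the io_buffer type and both function names
are hypothetical and not taken from Squid. Because all locking and unlocking
happens in the main thread, the counter itself needs no mutex.

```c
#include <stddef.h>

/* Hypothetical I/O buffer carrying a lock count. Only the main thread
 * touches lock_count, so it is a plain integer without a mutex. */
typedef struct {
    char  *data;
    size_t size;
    int    lock_count;  /* queued I/O operations still using this buffer */
} io_buffer;

/* Main thread: called when an operation using buf is queued. */
static void io_buffer_lock(io_buffer *buf)
{
    buf->lock_count++;
}

/* Main thread: called when a completed operation is received.
 * Returns 1 when nothing references the buffer any more, so it may be
 * freed or reused; returns 0 while other operations still hold it. */
static int io_buffer_unlock(io_buffer *buf)
{
    if (buf->lock_count > 0)
        buf->lock_count--;
    return buf->lock_count == 0;
}
```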
I/O operations:
- open read-only
- open write-only
- open write-only-append
- read up to 32K. Less if only header is wanted. More if both header and
object wanted.
- write any amount. Queue writes up to 32K before scheduling a write.
- open readonly+stat+read(+close if object small enough)
- open writeonly+write(+close if object small enough)
- write+close
- abort? (may not be needed)
- mmap()+write() for sending data to the client?
- fsync on close? (may be a good idea in order to limit buffer cache).
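Queueing writes up to 32K before scheduling a write might look roughly like
the sketch below. The write_queue type and its functions are hypothetical
names chosen for illustration; only the 32K threshold comes from the list
above.

```c
#include <stddef.h>
#include <string.h>

#define WRITE_QUEUE_LIMIT (32 * 1024)  /* 32K threshold from the notes */

/* Hypothetical per-"file" queue that joins small writes into one large one. */
typedef struct {
    char   buf[WRITE_QUEUE_LIMIT];
    size_t used;
} write_queue;

/* Append up to len bytes. Returns the number of bytes accepted, which is
 * less than len when the queue fills up; the caller keeps the remainder
 * for the next round. */
static size_t write_queue_append(write_queue *q, const char *data, size_t len)
{
    size_t room = WRITE_QUEUE_LIMIT - q->used;
    size_t n = len < room ? len : room;

    memcpy(q->buf + q->used, data, n);
    q->used += n;
    return n;
}

/* True once 32K is queued and a single large write should be scheduled. */
static int write_queue_should_flush(const write_queue *q)
{
    return q->used >= WRITE_QUEUE_LIMIT;
}
```

A real implementation would also flush on close, which is where the combined
write+close operation fits in.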
Balancing I/O load for swapouts:
- Use the disk with the most available space and an idle thread. The
definition of idle can be that no requests are queued on the thread.
- If there is no idle thread then put the request on a global swapout queue.
Threads check this global queue before idling. The chance of a stall
(swapout requests pending while all threads idle) caused by not having
a signal tied to this queue is close to non-existent, as requests are
only queued here if all threads have requests queued. If it should occur,
it is non-fatal, as the next request gets things going again.
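The selection rule above can be sketched as a small function. The disk_state
type and its fields are assumptions made for this example; a return value of
-1 stands for "no idle thread, use the global swapout queue".

```c
#include <stddef.h>

/* Hypothetical per-disk bookkeeping kept by the main thread. */
typedef struct {
    long free_kb;  /* available space on this disk */
    int  queued;   /* requests currently queued on its thread */
} disk_state;

/* Returns the index of the disk with the most free space among those whose
 * thread is idle (nothing queued), or -1 if no thread is idle. In the -1
 * case the caller would put the request on the global swapout queue. */
static int pick_swapout_disk(const disk_state *disks, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (disks[i].queued != 0)
            continue;  /* thread busy, not a candidate */
        if (best < 0 || disks[i].free_kb > disks[best].free_kb)
            best = (int)i;
    }
    return best;
}
```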
Limiting queue lengths:
- If the amount of pending I/O data gets above a certain limit, then bypass
the disk.
- If the swapout queue gets above a certain limit, then bypass the disk.
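Both limits can be folded into one check made before a swapout is queued. The
structure, field names and tunables below are hypothetical; the notes do not
say what the limits should be.

```c
#include <stddef.h>

/* Hypothetical counters and tunables for limiting queue lengths. */
typedef struct {
    size_t pending_bytes;       /* I/O data queued but not yet completed */
    size_t swapout_queued;      /* requests on the global swapout queue */
    size_t max_pending_bytes;   /* assumed configurable limit */
    size_t max_swapout_queued;  /* assumed configurable limit */
} io_limits;

/* Returns 1 if the object should bypass the disk (not be swapped out),
 * because either limit has been exceeded. */
static int should_bypass_disk(const io_limits *l)
{
    return l->pending_bytes > l->max_pending_bytes
        || l->swapout_queued > l->max_swapout_queued;
}
```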
Thread communication:
- Use one thread pool for each cache_dir. Runtime configurable number of
threads for each cache_dir.
- Use one I/O queue per disk or thread. If I/O operations to one FD should
be isolated to one thread then one queue per thread is needed, but due
to thread communication races it may be more efficient to have one queue
per disk, and allow filedescriptors to be shared between the threads. The
combined operations listed above should effectively cut down the number of
inter-thread operations per swapin/swapout to one for most objects
(open-stat-read-close, or open-write-close).
- Use a configurable number of "done" queues. Initially one, but it should
be possible to increase the number if lock contention gets to be a problem.
- Keep mutex locks for as short a period of time as possible.
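One way to keep the mutex hold time short on a "done" queue is to let the main
thread detach the whole list under the lock and process it afterwards. The
sketch below uses POSIX threads; the io_result and done_queue types and the
function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical completed-operation record. */
typedef struct io_result {
    struct io_result *next;
    int               status;
} io_result;

/* Hypothetical "done" queue shared by worker threads and the main thread. */
typedef struct {
    pthread_mutex_t lock;
    io_result      *head;
} done_queue;

static void done_queue_init(done_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
}

/* Worker thread: publish a completed operation. The lock covers only
 * two pointer assignments. */
static void done_queue_push(done_queue *q, io_result *r)
{
    pthread_mutex_lock(&q->lock);
    r->next = q->head;
    q->head = r;
    pthread_mutex_unlock(&q->lock);
}

/* Main thread: detach the whole list in one short critical section.
 * The results are then processed without holding the lock. Note that
 * the list comes back newest first. */
static io_result *done_queue_take_all(done_queue *q)
{
    pthread_mutex_lock(&q->lock);
    io_result *list = q->head;
    q->head = NULL;
    pthread_mutex_unlock(&q->lock);
    return list;
}
```

If completion order matters, the main thread can reverse the detached list
before walking it, still without holding the lock.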
Why mmap should not be used
- High overhead for setting up and tearing down mmap() entries
- No read-ahead done by the OS. Most OSes only demand-page mmap()ed files.
- mmap()+write() makes it even harder for people to change Squid to process
body content when data is sent down to clients.
Maybe this does not matter; such processing should be done before storage,
so that the result of the processing is what gets cached.
Guidelines for body-transforming implementations
- Always do the processing before storage
- Do not post-process objects when sent to the client from cache. Future
versions of Squid may make it impossible to change the object body of a
cached object, as it may be sent directly from disk to the waiting client
and never even be seen by the main Squid process.
- Store rewritten objects using a special store key format. Do not store
rewritten objects using the same store key as non-rewritten objects. This
is important for adequate peering relationships.
- Change client_side.c to look up rewritten objects where appropriate.
- When peering, do so using store digests and Squids using the same kind
of rewrites. Look up store digests in the same way as local storage is
queried, with the exception that non-modified objects are looked up even
if local storage never contains non-rewritten objects.
Written by Henrik Nordström
<hno@squid-cache.org>,
last changed 1998-09-20