Section 17.6
Files

Operating systems devote a lot of code to organizing files efficiently on a disk drive. A file is an abstract entity, a fiction maintained by the operating system, whereas a sector is more concrete, since it can be identified as a particular place on the disk surface. Many files are longer than one sector, so several sectors must be used to store a file. Disk drives and their controllers know nothing of files, nor do they care what data the operating system stores inside the sectors, which frees the OS to structure files in any way it wants.

Some early microcomputer systems stored files as consecutive sectors on the same track, spilling over to the adjacent track if the file was too long. This method, called contiguous allocation, makes it difficult to move files around or to enlarge an existing file, since the enlarged file would clash with the next file on the disk.

Fig. 17.6.1 shows this rigid system.


Fig. 17.6.1: Storing files in consecutive sectors under contiguous allocation
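
The difficulty of enlarging a file under contiguous allocation can be seen in a toy model. The sketch below is only an illustration: the class ContiguousDisk, its methods, and the 20-sector disk are invented for this example and do not describe any particular operating system.

```python
# Toy model of contiguous allocation. All names are hypothetical.

class ContiguousDisk:
    def __init__(self, total_sectors):
        self.total_sectors = total_sectors
        self.files = {}  # name -> (first_sector, length_in_sectors)

    def _is_free(self, start, length, ignore=None):
        """True if sectors start..start+length-1 overlap no file other than `ignore`."""
        if start + length > self.total_sectors:
            return False
        for name, (first, size) in self.files.items():
            if name == ignore:
                continue
            if start < first + size and first < start + length:
                return False
        return True

    def create(self, name, length):
        """Place the file in the first gap that is long enough (first fit)."""
        for start in range(self.total_sectors - length + 1):
            if self._is_free(start, length):
                self.files[name] = (start, length)
                return start
        raise OSError("no contiguous run of free sectors is long enough")

    def grow(self, name, extra):
        """Extend a file in place; fail if the sectors that follow are taken."""
        first, size = self.files[name]
        if not self._is_free(first, size + extra, ignore=name):
            return False
        self.files[name] = (first, size + extra)
        return True


disk = ContiguousDisk(total_sectors=20)
disk.create("a", 5)        # occupies sectors 0-4
disk.create("b", 5)        # occupies sectors 5-9
print(disk.grow("a", 1))   # False: sector 5 already belongs to "b"
print(disk.grow("b", 3))   # True: sectors 10-12 are free
```

File "a" cannot grow by even one sector without first moving "b", which is the rigidity described above.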

The advantages of this system are that it is simple, easy to implement, and very fast.

Most large computers and many modern personal computers use a much more complicated and flexible method. Taking their cue from virtual memory pages, these systems break a file into sector-sized pieces and store them anywhere on the disk where there is free space. To know which sectors belong to which files, the operating system either maintains tables relating sector addresses to positions in the file or chains the sectors together in linked lists. In the linked-list scheme, the directory entry that stores the file's name also stores the address of the first sector of the file. To find the second sector, the operating system reads the first sector and decodes part of its contents (512 bytes in a typical sector) as the pointer to the next sector.

Fig. 17.6.2 shows how this might look.


Fig. 17.6.2: Several files stored as linked lists of sectors;
only the beginning of the linked list for file 1 is shown
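
The linked-list scheme can be sketched in a few lines. The layout used here is an assumption made for illustration: the first 4 bytes of each 512-byte sector hold the number of the next sector, the value 0xFFFFFFFF marks the end of the chain, and the disk is modeled as a Python dictionary. Real file systems differ in all of these details.

```python
import struct

SECTOR_SIZE = 512          # sector size used in the text
POINTER_SIZE = 4           # assumed: first 4 bytes hold the next sector number
END_OF_FILE = 0xFFFFFFFF   # assumed end-of-chain marker
PAYLOAD = SECTOR_SIZE - POINTER_SIZE

def write_file(disk, free_sectors, data):
    """Store data in sectors taken from free_sectors; return the first sector."""
    chunks = [data[i:i + PAYLOAD] for i in range(0, len(data), PAYLOAD)] or [b""]
    sectors = [free_sectors.pop(0) for _ in chunks]
    for i, (sector, chunk) in enumerate(zip(sectors, chunks)):
        nxt = sectors[i + 1] if i + 1 < len(sectors) else END_OF_FILE
        # Each sector = pointer to the next sector + zero-padded data.
        disk[sector] = struct.pack("<I", nxt) + chunk.ljust(PAYLOAD, b"\0")
    return sectors[0]

def read_file(disk, first_sector, length):
    """Follow the chain from first_sector and return the first `length` bytes."""
    out = bytearray()
    sector = first_sector
    while sector != END_OF_FILE:
        raw = disk[sector]
        (sector,) = struct.unpack("<I", raw[:POINTER_SIZE])  # decode the pointer
        out += raw[POINTER_SIZE:]
    return bytes(out[:length])

disk = {}                           # sector number -> 512 bytes
free_sectors = [7, 2, 19, 4]        # deliberately scattered free sectors
data = b"x" * 1200                  # needs three 508-byte payloads

# The directory entry stores the first sector (and, here, the file length).
directory = {"file1": (write_file(disk, free_sectors, data), len(data))}
first, length = directory["file1"]
print(first, sorted(disk))          # 7 [2, 7, 19]
print(read_file(disk, first, length) == data)   # True
```

Note that the file's sectors (7, then 2, then 19) are neither adjacent nor in ascending order; only the pointers hold the file together.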

Linked-list allocation gives the ultimate in flexibility and disk utilization, but performance degrades over time. When files are deleted, their sectors are reclaimed and placed on a free sector list, which serves as a pool from which the operating system can draw disk space when it needs it. Over time, these free sectors become sprinkled over the entire disk, and files built from them are likewise scattered, so reading a file involves many seeks and slows down. Minimizing this effect is one of the tasks of operating system designers.
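
A small simulation shows how the free sector list leads to scattered files. It is a sketch under assumed conditions: a disk of 32 sectors with 8 sectors per track, a free list that hands out sectors from the front and receives reclaimed sectors at the back, and a "seek" counted whenever two successive sectors of a file lie on different tracks. The function names are invented for this example.

```python
SECTORS_PER_TRACK = 8   # assumed geometry, for illustration only

def track_of(sector):
    return sector // SECTORS_PER_TRACK

def seeks_needed(sectors):
    """Count how often the head changes track while reading sectors in order."""
    return sum(1 for a, b in zip(sectors, sectors[1:]) if track_of(a) != track_of(b))

free_list = list(range(32))   # a fresh disk: free sectors are in order
files = {}                    # file name -> ordered list of its sectors

def allocate(name, count):
    files[name] = [free_list.pop(0) for _ in range(count)]

def delete(name):
    free_list.extend(files.pop(name))   # reclaimed sectors rejoin the pool

# Fill the disk with eight four-sector files, then delete every other one.
for i in range(8):
    allocate(f"f{i}", 4)
for i in range(0, 8, 2):
    delete(f"f{i}")

allocate("big", 16)
print(files["big"])                    # sectors 0-3, 8-11, 16-19, 24-27
print(seeks_needed(files["big"]))      # 3
print(seeks_needed(list(range(16))))   # 1, for the same file stored contiguously
```

The 16-sector file ends up spread over four tracks instead of two, so reading it takes three track changes rather than one.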

One solution is to reorganize, or defragment, the disk every so often: write all files to tape or another disk, and then write them back to the original disk, using only consecutive sectors and adjacent tracks. This takes time and must be done when the disk is not needed by programs.
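
The copy-out and copy-back procedure can be sketched as follows. This is a simplified illustration, not a real defragmenter: the disk is a dictionary from sector number to contents, `files` maps each file name to the ordered list of sectors it occupies, and the function name defragment is invented for this example.

```python
def defragment(disk, files, total_sectors):
    """Rewrite every file into consecutive sectors; return the new free list."""
    # Step 1: copy every file's contents off the disk (the "tape" in the text).
    backup = {name: [disk[s] for s in sectors] for name, sectors in files.items()}

    # Step 2: write the files back one after another into consecutive sectors.
    disk.clear()
    next_sector = 0
    for name, contents in backup.items():
        files[name] = list(range(next_sector, next_sector + len(contents)))
        for sector, content in zip(files[name], contents):
            disk[sector] = content
        next_sector += len(contents)

    # Everything past the last file is now one contiguous free region.
    return list(range(next_sector, total_sectors))


disk = {9: "a0", 2: "a1", 14: "a2", 0: "b0", 7: "b1"}
files = {"a": [9, 2, 14], "b": [0, 7]}
free_list = defragment(disk, files, total_sectors=16)
print(files)       # {'a': [0, 1, 2], 'b': [3, 4]}
print(free_list)   # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Every sector of every file is read once and written once, which is why the operation is slow on a large disk.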

Another method of reducing seek time is to keep all sectors of a given file close to each other. A cylinder is the group of all tracks, on the several surfaces of a disk drive, that lie at the same distance from the center (see Fig. 17.6.3).


Fig. 17.6.3: A cylinder is an imaginary grouping of all the tracks at the
same distance from the center of the disk drive on all the surfaces

Suppose a large file is being written. When the current track fills up, the computer switches to another read-write head, an electronic operation that is almost instantaneous, and continues writing on the next track in the same cylinder. Of course, it may run out of room even then, in which case it can move to the next cylinder and continue there. 4.3BSD UNIX clumps cylinders into cylinder groups to further isolate parts of the disk and minimize seek time. Each cylinder group has its own private free sector list.
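
The order in which a cylinder is filled can be expressed as simple arithmetic. The sketch below assumes a drive with 4 heads and 8 sectors per track; these numbers and the function name locate are chosen only for illustration. Consecutive pieces of a file are numbered 0, 1, 2, ... and mapped so that the sector number changes fastest, then the head, and the cylinder last.

```python
HEADS = 4               # assumed number of surfaces (one head per surface)
SECTORS_PER_TRACK = 8   # assumed sectors on each track

def locate(block):
    """Map the n-th piece of a file to (cylinder, head, sector)."""
    sector = block % SECTORS_PER_TRACK
    head = (block // SECTORS_PER_TRACK) % HEADS
    cylinder = block // (SECTORS_PER_TRACK * HEADS)
    return cylinder, head, sector

print(locate(7))    # (0, 0, 7): last sector of the first track
print(locate(8))    # (0, 1, 0): same cylinder, next head, no arm movement
print(locate(31))   # (0, 3, 7): last sector of cylinder 0
print(locate(32))   # (1, 0, 0): only now must the arm move

# Count arm movements (cylinder changes) while writing 64 consecutive pieces.
cylinders = [locate(b)[0] for b in range(64)]
arm_moves = sum(1 for a, b in zip(cylinders, cylinders[1:]) if a != b)
print(arm_moves)    # 1
```

With this ordering the arm moves once per 32 sectors written; if each track change required a move, the same 64 sectors would need seven.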

Very large files such as huge databases may not fit entirely on one disk drive, in which case they may be continued on another drive. Specialized database management software helps the operating system with their maintenance.