At bottom, information in a computer is stored as a series of bits, which can be grouped into larger units such as bytes or “words” that represent particular numbers or characters. In order to be stored and retrieved, a collection of such binary data must be given a name and certain attributes that describe how the information can be accessed. This named entity is the file.
Files and the Operating System
Files can be discussed at three levels: the physical layout, the operating system, and the application program. At the physical level, a file is stored on a particular medium. (See floppy disk, hard disk, CD-ROM, and tape drives.) On disk devices a file takes up a certain number of sectors, which are portions of concentric tracks. (On tape, files are usually stored as contiguous segments or “blocks” of data.)
The file system is the facility of the operating system that organizes files (see operating system). For example, on DOS and older Windows PCs, there is a file allocation table (FAT) that records, for each cluster, which cluster comes next in the same file, so that the clusters of each file form a linked list (each cluster consists of a fixed number of sectors, varying with the overall size of the disk). When the operating system is asked to access a file, it can go through the table and find the clusters belonging to that file, read the data, and send it to the requesting application. Modern file systems further organize files into groups called folders or directories, which can be nested several layers deep. Such a hierarchical file system makes it easier for users to organize the dozens of applications and thousands of files found on today’s PCs. For example, a folder called Book might have a subfolder for each chapter, which in turn contains folders for text and illustrations relating to that chapter.
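The table lookup described above can be illustrated with a toy model. This is a minimal sketch, not the on-disk FAT format: the list-based table, the END_OF_CHAIN marker, and the function names are all assumptions made for illustration.

```python
# Toy model of a file allocation table: fat[i] holds the number of the
# cluster that follows cluster i in the same file, or END_OF_CHAIN.
END_OF_CHAIN = -1  # hypothetical marker; real FAT variants use reserved values


def cluster_chain(fat, first_cluster):
    """Return the clusters that make up one file, in order."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            # A damaged table could loop forever; stop instead.
            raise ValueError("loop in cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def read_file(fat, clusters, first_cluster):
    """Concatenate the data stored in each cluster of the chain."""
    return b"".join(clusters[c] for c in cluster_chain(fat, first_cluster))


# A tiny "disk" of six clusters. The file starts at cluster 2 and is
# scattered across clusters 2 -> 5 -> 3.
fat = [END_OF_CHAIN, END_OF_CHAIN, 5, END_OF_CHAIN, END_OF_CHAIN, 3]
clusters = [b"", b"", b"Hel", b"ld", b"", b"lo wor"]

chain = cluster_chain(fat, 2)
data = read_file(fat, clusters, 2)
```

Here the file's data is not contiguous on the "disk," yet following the chain from the first cluster recovers it in the right order.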
Besides storing and retrieving files, the modern file system sets characteristics or attributes for each file. Typical attributes include write (the file can be changed), read-only (the file can be accessed but not changed), and archive (which determines whether the file needs to be included in the next backup). In multi-user operating systems such as UNIX there are also attributes that indicate ownership (that is, who has certain rights with regard to the file). Thus a file may be executable (able to be run as a program) by anyone, but writable (changeable) only by someone who has “superuser” status (see also data security).
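As a rough illustration, a program can inspect and change such attributes through the operating system. The sketch below uses Python's standard os and stat modules on a temporary file; the helper name describe_permissions is hypothetical, and the permission bits behave as described only on UNIX-like systems (Windows honors little more than the read-only flag).

```python
import os
import stat
import tempfile


def describe_permissions(path):
    """Summarize the owner's permission bits for a file."""
    mode = os.stat(path).st_mode
    return {
        "readable": bool(mode & stat.S_IRUSR),
        "writable": bool(mode & stat.S_IWUSR),
        "executable": bool(mode & stat.S_IXUSR),
        "symbolic": stat.filemode(mode),  # e.g. a string like "-r--------"
    }


# Create a scratch file to experiment on.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example")
    path = tmp.name

os.chmod(path, stat.S_IRUSR)  # owner may read; nobody may write
read_only = describe_permissions(path)

os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # owner may read and write
read_write = describe_permissions(path)

os.remove(path)
```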
The current generation of file systems for PCs includes additional features that promote efficiency and particularly data integrity. Windows NT and its successors, including 2000 and XP, come standard with NTFS, the “New Technology File System,” which includes journaling, or the keeping of a record of all transactions affecting the system (such as deleting or adding a file). In the event of a mishap such as a power failure, the transactions can be “replayed” from the journal, ensuring that the file system reflects the actual current status of all files. NTFS also uses “metadata” that describes each file or directory. Database principles can thus be applied to organizing and retrieving files at a higher level.
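The idea of replaying a journal can be sketched in a few lines. This is a deliberately simplified model and not how NTFS is implemented: the set of file names standing in for the file system, the ("add" / "delete", name) entry format, and the function names are assumptions for illustration only.

```python
def apply_entry(state, entry):
    """Apply one journal entry to the set of file names in `state`."""
    operation, name = entry
    if operation == "add":
        state.add(name)
    elif operation == "delete":
        state.discard(name)  # discard: deleting a missing file is harmless
    else:
        raise ValueError(f"unknown journal operation: {operation}")


def replay(checkpoint, journal):
    """Rebuild the current state from the last known-good checkpoint."""
    state = set(checkpoint)
    for entry in journal:
        apply_entry(state, entry)
    return state


# State last written safely to disk, plus the transactions logged since.
checkpoint = {"a.txt"}
journal = [("add", "b.txt"), ("delete", "a.txt"), ("add", "c.txt")]

recovered = replay(checkpoint, journal)
# Every operation here is idempotent, so replaying the same journal
# again after a second crash would give the same result.
recovered_again = replay(recovered, journal)
```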
Linux (modeled on UNIX) uses a single file system hierarchy that incorporates all devices in the system. (The network file system, NFS, effectively extends the hierarchy to all machines on the local network.) The popular Linux ext3 file system also includes journaling.
Files and Applications
The ultimate organization of data in a file depends on the application. A typical approach is to define a data record with various fields. The program might have a loop that repeatedly requests a record from the file, processes it in some way, and repeats until the operating system tells it that it has reached the end of the file. This is sequential access; a program can also be set up for random access, which means that an arbitrary record can be requested and that request will be translated into the correct physical location in the file. The two approaches can be combined in ISAM (Indexed Sequential Access Method), where the records are stored sequentially but key fields are indexed so a particular record can be retrieved directly.
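The three access styles can be compared using fixed-length records. The sketch below is illustrative only: the record layout (a 4-byte id followed by a 16-byte name), the function names, and the in-memory file are assumptions, and the dictionary index is a much simplified stand-in for a real ISAM index.

```python
import io
import struct

# Hypothetical record layout: unsigned 4-byte id, then a 16-byte name field.
RECORD = struct.Struct("<I16s")


def write_records(f, rows):
    """Append each (id, name) pair as one fixed-length record."""
    for rec_id, name in rows:
        f.write(RECORD.pack(rec_id, name.encode("ascii")))


def decode(chunk):
    rec_id, raw = RECORD.unpack(chunk)
    return rec_id, raw.rstrip(b"\x00").decode("ascii")


def read_sequential(f):
    """Yield every record in order until the end of the file."""
    f.seek(0)
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:  # end of file reached
            break
        yield decode(chunk)


def read_random(f, n):
    """Fetch record number n directly by computing its byte offset."""
    f.seek(n * RECORD.size)
    return decode(f.read(RECORD.size))


def build_index(f):
    """Map each record id (the key field) to its position in the file."""
    return {rec_id: pos for pos, (rec_id, _) in enumerate(read_sequential(f))}


f = io.BytesIO()  # stands in for a file on disk
write_records(f, [(101, "ada"), (205, "grace"), (307, "linus")])

all_records = list(read_sequential(f))  # sequential access
third = read_random(f, 2)               # random access by position
index = build_index(f)
by_key = read_random(f, index[205])     # indexed access by key
```

Because every record has the same length, the position of record n is simply n times the record size, which is what makes random access cheap.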
Since files such as graphics (images), sound, and formatted word processing documents can be read and used only by particular applications, files are often given names with extensions that describe their format. When a Windows user sees, for example, a Microsoft Word document, the filename will have a .DOC extension (as in chapter.doc) and will be shown with an icon registered by the application for such files. Further, a file association will be registered so that when a user opens such a file the Word program will run and load it.
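A file association amounts to a lookup from extension to application. The following is a minimal sketch of that idea, not how Windows stores its associations; the table contents and the function name application_for are hypothetical.

```python
import os

# Hypothetical association table: extension -> application name.
ASSOCIATIONS = {
    ".doc": "Microsoft Word",
    ".txt": "Text editor",
}


def application_for(filename, associations=ASSOCIATIONS):
    """Return the application registered for a file's extension, if any."""
    extension = os.path.splitext(filename)[1].lower()  # ".DOC" -> ".doc"
    return associations.get(extension)


word_file = application_for("chapter.doc")
upper_case = application_for("CHAPTER.DOC")
unknown = application_for("notes")  # no extension, so no association
```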
From a user interface point of view, the use of the file as the main unit of data has been criticized as not corresponding to the actual flow of most kinds of work. While from the computer’s point of view, the user is opening, modifying, and saving a succession of separate files, the user often thinks in terms of working with documents (which may have components stored in a number of separate files). Thus, many office software applications offer a document-oriented or project-oriented view of data that hides or minimizes the details of individual files (see document model).