File System in Operating System
Files are managed by the operating system. The part of the operating system that deals with files is called the File System. The File System describes how files are named, accessed, used, protected and implemented in the operating system.
A file is a collection of related information. A file may contain a program or data. A program may be stored as a source file, an object file or in some other form. Data can be numeric, alphabetic or alphanumeric. Many kinds of information can therefore be stored in a file, e.g. source or object programs, numeric or textual data, graphical images, movie clips and sounds. The structure of a file is defined according to its type, e.g. the structure of a file containing a graphical image differs from that of a file containing a source program or plain text.
Every file in the operating system has a name. The naming rules differ from one operating system to another. The kind of information stored in a file determines the type of the file, e.g. a text file contains text whereas a graphics file contains a picture.
Apart from the file name and file type, other important properties are the creation date and time of a file, creator/owner of the file, size of the file and its permission or attributes.
An important consideration in the design of an operating system is whether it supports file types or not. If the operating system knows or recognizes the file type and the data contained in it then the file can be manipulated in a better manner, otherwise, the data or information stored in the file is of no use to the operating system.
If an operating system recognizes different file types, one disadvantage is obvious: if the operating system defines 20 different file types, it must contain the code to support each of them. Another problem arises for files whose type is not supported. Many operating systems use this approach, e.g. Windows 95/98.
Another technique adopted by some operating systems is to have no file types at all. The UNIX operating system uses this technique and treats each file as a sequence of bytes. This gives flexibility but less support, as each program must include its own code to interpret the bytes of a file as the structures it expects.
Files are stored on disks. Disk systems have a well-defined block size determined by the size of a sector. Disk I/O is done in units of one block and all blocks are of the same size. As disk space is always allocated in blocks, some portion of the last block of each file is usually wasted. If the block size is 512 bytes, then a file of 1949 bytes will be allocated four blocks (2048 bytes) and the last 99 bytes will be wasted.
The bytes wasted in order to keep everything in units of blocks (instead of bytes) are called Internal Fragmentation. All such systems suffer from internal fragmentation, and the larger the block size, the greater the internal fragmentation.
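The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function names `blocks_needed` and `internal_fragmentation` are made up for the illustration.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of whole blocks needed to hold file_size bytes (rounded up)."""
    return (file_size + block_size - 1) // block_size


def internal_fragmentation(file_size: int, block_size: int) -> int:
    """Bytes allocated to the file but not used by it."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# The example from the text: a 1949-byte file on 512-byte blocks.
blocks = blocks_needed(1949, 512)
wasted = internal_fragmentation(1949, 512)
print(blocks, wasted)
```

Trying the same file with a larger block size, e.g. `internal_fragmentation(1949, 4096)`, shows how the waste grows with the block size.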
A directory structure provides a mechanism for organizing files in the file system.
Many systems use two directory structures:
- Device Directory
- File Directory
The Device Directory describes all files on a device. It records the physical properties of each file, e.g. its location and size.
The File Directory describes the logical properties of files, e.g. file name, file type, the file owner's name and the file's permissions. For the physical properties of a file, the File Directory can point to the Device Directory.
The type of information about files stored in a directory differs from operating system to operating system. Information that can be kept in a file directory includes:
· File Name
Contains the name of the file.
· File Type
Contains information about the type of the file (used where the system supports files of different types).
· Location
Location of the file.
· Size
Size of the file in bytes.
· Current Position
Position of the pointer in the file.
· Protection
Contains the file's protection information, i.e. who can read, write or execute the file.
· Usage Count
A value indicating the usage of the file.
· Date & Time
Creation or last modification date and time of the file.
· Process Identification
Identification number of the process when the file is executed.
To record this information for each file, the operating system may take 16 to 1000 bytes. So, a system having a large number of files will have a directory size of hundreds of thousands of bytes.
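One way to picture such a directory entry is as a record with one field per item in the list above. The sketch below is illustrative only: the class name, field names, types and defaults are assumptions, not the layout of any particular operating system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DirectoryEntry:
    """Hypothetical in-memory form of one file directory entry."""
    name: str
    file_type: str
    location: int                       # assumed: number of the file's first block
    size: int                           # size in bytes
    current_position: int = 0           # read/write pointer within the file
    protection: str = "rw-"             # assumed encoding of read/write/execute
    usage_count: int = 0
    modified: Optional[datetime] = None
    process_id: Optional[int] = None    # set only while the file is in use


entry = DirectoryEntry(name="notes.txt", file_type="text", location=120, size=1949)
print(entry)
```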
The directory can be organized in many ways. Whatever the organization, we should be able to insert entries, delete them, search for entries that match some criteria and list all the entries of a directory.
Linear List of Directory Entries
A linear list of directory entries requires a linear search to find a particular entry. This approach is simple but time-consuming: to create a new file we must first search the directory to be sure that no existing file has the same name. The real disadvantage of a linear list of directory entries is the sequential search needed to find a file.
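A minimal sketch of this organization, assuming each entry is just a (name, location) pair; the class `LinearDirectory` and its methods are hypothetical names chosen for the example.

```python
class LinearDirectory:
    """Directory kept as an unsorted list of (name, location) pairs."""

    def __init__(self):
        self.entries = []

    def find(self, name):
        # Sequential search: in the worst case every entry is examined.
        for index, (entry_name, _location) in enumerate(self.entries):
            if entry_name == name:
                return index
        return -1

    def create(self, name, location):
        # The whole list is searched first to make sure the name is unused.
        if self.find(name) != -1:
            raise FileExistsError(name)
        self.entries.append((name, location))


directory = LinearDirectory()
directory.create("a.txt", 10)
directory.create("b.txt", 25)
print(directory.find("b.txt"))
```

Appending is cheap, but both `find` and `create` pay for a scan of the list, which is the cost the paragraph above describes.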
Sorted List of Directory Entries
A sorted list of directory entries allows a binary search, which is faster. The list must be kept sorted, however, and this requirement complicates the creation and deletion of files.
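The trade-off can be sketched with Python's standard `bisect` module, which performs the binary search. The class `SortedDirectory` is a hypothetical illustration that stores only file names.

```python
import bisect


class SortedDirectory:
    """Directory kept as a list of names in sorted order."""

    def __init__(self):
        self.names = []

    def find(self, name):
        # Binary search: about log2(n) comparisons.
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return i
        return -1

    def create(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            raise FileExistsError(name)
        # Inserting in the middle shifts all later entries.
        self.names.insert(i, name)

    def delete(self, name):
        i = self.find(name)
        if i == -1:
            raise FileNotFoundError(name)
        # Deleting also shifts all later entries.
        del self.names[i]


directory = SortedDirectory()
for name in ["c.txt", "a.txt", "b.txt"]:
    directory.create(name)
print(directory.names)
```

Searching is fast, but every creation or deletion has to move the entries that follow the affected position, which is the complication mentioned above.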
Hash Table
Another data structure used is a hash table. In a hash table the search time is short, and insertion and deletion of files are also simple. The problem is the fixed size of the hash table, e.g. a hash table of 64 entries converts file names into integers from 0 to 63. To add a 65th entry, the hash table has to be enlarged, and all the existing entries have to be hashed again so that each one lands in its correct slot in the new table.
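The resizing problem can be made concrete with a small sketch. It assumes one entry per slot with linear probing on collisions and a deliberately simple hash function; `slot_for` and `HashDirectory` are hypothetical names, and deletion is left out to keep the example short.

```python
def slot_for(name: str, table_size: int) -> int:
    # Assumed toy hash: sum of the name's bytes, reduced to 0 .. table_size - 1.
    return sum(name.encode()) % table_size


class HashDirectory:
    def __init__(self, size=64):
        self.slots = [None] * size

    def _probe(self, name):
        """Index of name's slot, or of the first free slot on its probe path.

        Returns None when the table is full and the name is absent.
        """
        size = len(self.slots)
        start = slot_for(name, size)
        for step in range(size):
            i = (start + step) % size
            if self.slots[i] is None or self.slots[i][0] == name:
                return i
        return None

    def _grow(self):
        # Doubling the table changes slot_for() for every name,
        # so all existing entries have to be inserted again.
        old_entries = [entry for entry in self.slots if entry is not None]
        self.slots = [None] * (2 * len(self.slots))
        for name, location in old_entries:
            self.insert(name, location)

    def insert(self, name, location):
        i = self._probe(name)
        if i is None:
            self._grow()
            i = self._probe(name)
        self.slots[i] = (name, location)

    def lookup(self, name):
        i = self._probe(name)
        if i is None or self.slots[i] is None:
            return None
        return self.slots[i][1]


directory = HashDirectory(64)
for n in range(65):              # the 65th insert forces the table to grow
    directory.insert(f"file{n}", n)
print(len(directory.slots))
```

Real file systems use other strategies as well, such as chaining several entries per slot; the sketch only illustrates why a full fixed-size table forces a rehash.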
There are two ways in which the information stored in a file can be accessed. One is called Sequential Access and the other is called Random or Direct Access.
In early operating systems, the access method was sequential. A process reads the bytes or records of the file in order, starting from the first and reading them one by one. For a storage medium like a tape drive, the sequential access method is used.
A file whose records can be accessed in any order is called a Random or Direct Access file. Random or direct access became possible when information started being stored on Direct Access Storage Devices (DASD), e.g. hard disks and floppy diskettes. In random or direct access, the records of a file are stored on the basis of some key, so any record can be found directly from its key.
Random Access files are used in many applications, especially in database systems, e.g. when an airline customer wants to reserve a seat on a particular flight, the reservation program should be able to access the record of that flight without checking the records of other flights.
In some older operating systems, files are classified as Sequential or Random Access when they are created, but in modern operating systems all files are automatically Random Access files.
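The difference between the two access methods can be sketched with fixed-size records, using the record number as the key. The record layout (a flight number plus a 16-byte destination), the sample flights and the function names are assumptions made up for the example; an in-memory `io.BytesIO` stands in for a disk file.

```python
import io
import struct

# Assumed record layout: 4-byte flight number + 16-byte destination name.
RECORD = struct.Struct("<I16s")


def write_records(f, flights):
    for number, destination in flights:
        f.write(RECORD.pack(number, destination.encode()))


def decode(data):
    number, destination = RECORD.unpack(data)
    return number, destination.rstrip(b"\0").decode()


def read_sequential(f, index):
    """Reach record `index` by reading every record before it."""
    f.seek(0)
    data = b""
    for _ in range(index + 1):
        data = f.read(RECORD.size)
    return decode(data)


def read_direct(f, index):
    """Jump straight to record `index` using its computed byte offset."""
    f.seek(index * RECORD.size)
    return decode(f.read(RECORD.size))


flights = io.BytesIO()
write_records(flights, [(101, "Lahore"), (202, "Karachi"), (303, "Dubai")])
print(read_direct(flights, 2))
```

Both functions return the same record; the direct version does so without touching the records of the other flights.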
Directories are used in Operating Systems in order to keep track of files.
Operations that can be performed on a directory are:
· Search
A directory contains many files, so the operating system should be able to search a directory for files that match specified criteria, e.g. “dir” and “find”.
· Create File
The operating system should be able to create new files and add them to the directory, e.g. “edit” and “vi”.
· Delete File
The operating system should be able to remove files that are no longer needed from the directory, e.g. “del”, “erase” and “rm”.
· List Directory
The operating system should be able to list the files and directories present in a directory, e.g. “dir” and “ls”.
· Backup
The operating system should provide a backup facility so that the important files in a directory can be copied to some other device, e.g. tape. If an important file is then lost for some reason, a copy of it is still available on the other device.
The Single-Level Directory is the simplest directory structure. In a single-level directory, all the files are present in the same directory. This structure has limitations: as all files are present in one directory, each file must have a unique name. Another problem is that as the number of files increases, it becomes difficult to manage them.
The real problem in the single-level directory is the confusion between the file names of different users, as all the users keep their files in the same directory. The solution to this problem is to have a separate directory for each user. On large systems, directory organization is logical rather than physical.
In the Two-Level directory structure, each user has his own directory where he creates and stores his files. The user's own directory is also called the User File Directory or UFD. When the user logs in, the system's Master File Directory (MFD) is searched. The MFD is indexed by user account number, and each entry points to a UFD.
When a user tries to find a file, only that user's file directory is checked for the existence of the file. In this way different users can have files with the same name, as each user's file is created in his own directory.
To create a file for a user, the operating system searches only that user's directory to confirm that the new file name is unique in it. Normally a user can create or delete files only in his own directory.
A special system program is used when necessary to create or delete user directories. This program creates or deletes the user directory and adds the entry in the Master File Directory. Only authorized persons should be allowed to use this program.
There are advantages as well as disadvantages of a two-level directory structure. The advantage is that each user is completely independent, i.e. no other user can access his files. The disadvantage appears when a user wishes to access the files of other users.
A two-level directory is like a tree of height 2. The root of the tree is the Master File Directory. Its descendants are the User File Directories, and the descendants of the User File Directories are the files. So, files are the leaves of the tree, and a path name is defined by a user name and a file name. The sequence of directories searched when a file is named is called the Search Path.
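The MFD/UFD relationship can be sketched with nested dictionaries. The class `TwoLevelDirectory`, its method names and the sample users are assumptions for illustration; a real system would index the MFD by account number and store directory entries rather than file contents.

```python
class TwoLevelDirectory:
    def __init__(self):
        # Master File Directory: user name -> User File Directory (UFD).
        self.mfd = {}

    def add_user(self, user):
        # Stands in for the special system program that creates a UFD.
        self.mfd.setdefault(user, {})

    def create(self, user, name, data=""):
        ufd = self.mfd[user]
        # Uniqueness is checked only inside this user's own directory.
        if name in ufd:
            raise FileExistsError(f"{user}/{name}")
        ufd[name] = data

    def lookup(self, user, name):
        # A path name is the pair (user name, file name).
        return self.mfd[user].get(name)


fs = TwoLevelDirectory()
fs.add_user("alice")
fs.add_user("bob")
fs.create("alice", "notes.txt", "alice's notes")
fs.create("bob", "notes.txt", "bob's notes")   # same name, different UFD
print(fs.lookup("bob", "notes.txt"))
```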
A two-level directory can be viewed as a tree of height 2. A tree-structured directory generalizes the same idea to a tree of any height. Its benefit is that users can create their own sub-directories and organize their files in them.
The file system of MS-DOS is based on Tree-Structured Directories. There is a root directory in the tree and every file in the file system has a unique pathname.
Pathnames can be of two types.
- Absolute Path
- Relative Path
An Absolute Path begins at the root and follows a path down to the specified file, giving the directory names on the path, e.g. “/usr/test/clear” or “\turboc\bin\tc.exe”.
A Relative Path begins at the current directory. In a relative pathname, the path to a file is defined starting from the current directory instead of the root, e.g. “test/clear” or “bin\tc.exe”.
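How a relative path is turned into an absolute one can be sketched with Python's standard `posixpath` module. The helper name `resolve` and the choice of “/usr” as the current directory are assumptions for the example.

```python
import posixpath


def resolve(path, current_dir):
    """Return the absolute form of `path`, interpreting it from current_dir."""
    if posixpath.isabs(path):
        # An absolute path already starts at the root.
        return posixpath.normpath(path)
    # A relative path is appended to the current directory.
    return posixpath.normpath(posixpath.join(current_dir, path))


print(resolve("test/clear", "/usr"))
print(resolve("/usr/test/clear", "/home"))
```

With the current directory set to “/usr”, both calls name the same file. The standard `ntpath` module applies the same idea to backslash-separated paths such as “\turboc\bin\tc.exe”.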
Sharing of files and directories is not possible in Tree-Structured directories, but files and directories can be shared in Acyclic-Graph Directories.
In an acyclic graph, a shared file or directory appears in the file system in two or more places at the same time. A shared file or directory does not mean two or more copies. If two copies of the same file are present in different directories, then changing the copy in one directory does not change the copy in the other directory. With a shared file, there is only one actual file, so any change made to it is visible to all the users sharing it. Likewise, a new file created in or copied to a shared directory is immediately visible to all the users sharing that directory.
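The difference between sharing and copying can be sketched in memory: two directory entries that refer to the same object behave like a shared file, while a copy is a separate object. The `File` class, the directory dictionaries and the file names are hypothetical.

```python
class File:
    def __init__(self, data):
        self.data = data


alice_dir = {}
bob_dir = {}

report = File("draft 1")
alice_dir["report.txt"] = report
bob_dir["shared-report.txt"] = report               # shared: same File object
bob_dir["copy-of-report.txt"] = File(report.data)   # copy: a separate object

# Alice edits the file through her own directory entry.
alice_dir["report.txt"].data = "draft 2"

print(bob_dir["shared-report.txt"].data)    # the shared entry sees the change
print(bob_dir["copy-of-report.txt"].data)   # the copy does not
```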
Although the Acyclic-Graph directory structure is more flexible than a simple tree structure, it is also more complex.
A problem in the use of the acyclic-graph structure is making sure that there are no cycles. When we allow users to create subdirectories and files in a two-level directory, a tree-structured directory is formed, and if we continue to add new files and directories to a tree-structured directory, it retains its tree properties. But when we add links to the tree-structured directory, the result is a simple graph structure, in which cycles become possible.
In an acyclic graph we want to avoid searching a shared section twice, for performance reasons: if we have already searched a directory for a file, searching it again is a waste of time. If cycles are allowed to exist, the problem is worse: a poorly designed algorithm may loop, searching the cycle over and over without terminating.
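One common way to keep a search from visiting a shared directory twice, or from looping around a cycle, is to remember which directories have already been searched. The sketch below assumes directories are dictionaries and anything else is a file; `find_file` is a hypothetical helper, not an algorithm taken from a specific file system.

```python
def find_file(directory, target, visited=None):
    """Search a directory graph for a file name, visiting each directory once."""
    if visited is None:
        visited = set()
    if id(directory) in visited:
        return False                 # already searched: skip shared or cyclic link
    visited.add(id(directory))
    for name, child in directory.items():
        if isinstance(child, dict):
            if find_file(child, target, visited):
                return True
        elif name == target:
            return True
    return False


root = {"docs": {"x.txt": "data"}}
root["docs"]["back-to-root"] = root   # a link that creates a cycle

print(find_file(root, "x.txt"))
print(find_file(root, "missing.txt"))  # still terminates despite the cycle
```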
Garbage Collection
In garbage collection the whole file system is traversed and everything that can be accessed is marked. A second pass then collects all the entries that were not marked in the first pass, so that their space can be freed.
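The two passes can be sketched as a mark phase followed by a sweep phase. The representation is assumed for the example: directories are dictionaries, files are `FileNode` objects, and `all_objects` stands in for the file system's list of everything that has been allocated.

```python
class FileNode:
    def __init__(self, name):
        self.name = name


def mark(directory, marked):
    """First pass: mark everything reachable from `directory`."""
    if id(directory) in marked:
        return
    marked.add(id(directory))
    for child in directory.values():
        if isinstance(child, dict):
            mark(child, marked)
        else:
            marked.add(id(child))


def collect_garbage(root, all_objects):
    """Second pass: return every allocated object that was not marked."""
    marked = set()
    mark(root, marked)
    return [obj for obj in all_objects if id(obj) not in marked]


root, docs, kept = {}, {}, FileNode("kept.txt")
root["docs"] = docs
docs["kept.txt"] = kept

orphan_dir, orphan_file = {}, FileNode("lost.txt")
orphan_dir["self"] = orphan_dir       # a cycle that nothing else points to
orphan_dir["lost.txt"] = orphan_file

garbage = collect_garbage(root, [root, docs, kept, orphan_dir, orphan_file])
print([getattr(obj, "name", "<directory>") for obj in garbage])
```

The orphaned directory refers to itself, so a simple reference count would never drop to zero for it; the marking pass finds it unreachable all the same.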
So, the Acyclic-Graph structure is easy to use as long as it stays free of cycles. When cycles do exist, algorithms like garbage collection can be used to deal with the unreachable entries they leave behind, but garbage collection is a lengthy and time-consuming process.
Files stored in a computer need to be protected from physical damage (reliability) as well as from unauthorized access (protection).
For reliability, we take backups of the files, so that in case of hardware failure (e.g. errors in reading or writing, power failures) or software problems (e.g. bugs in the file system software) we have a copy of the important files at some other place.
Files can be protected in many ways. One way used in single-user systems is to remove the floppy disk from the computer so that no other user can copy the files.
Files can be protected in a better way if we provide controlled access. In this mechanism, access is permitted or denied depending upon several factors, one of which is the type of access requested.
The access types (permissions) available in Unix/Linux are:
| Access type | Meaning | Symbol |
|---|---|---|
| read | Read from the file | r |
| write | Write or re-write the file | w |
| execute | Execute the file | x |
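On Unix/Linux these three access types are stored as bits, one set of r, w and x for the owner, one for the group and one for everyone else. The sketch below decodes such a value; `rwx_string` is a hypothetical helper and the mode `0o754` is just an example value, while `stat.filemode` and the `stat.S_I*` constants are part of Python's standard library.

```python
import stat


def rwx_string(mode: int) -> str:
    """Render the nine permission bits as owner/group/other rwx triples."""
    flags = "rwxrwxrwx"
    return "".join(
        flag if mode & (1 << (8 - position)) else "-"
        for position, flag in enumerate(flags)
    )


mode = 0o754   # owner: rwx, group: r-x, others: r--
print(rwx_string(mode))
print(stat.filemode(stat.S_IFREG | mode))   # standard-library equivalent
print(bool(mode & stat.S_IWOTH))            # may others write the file?
```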
Similarly, the access rights available in Novell NetWare are:
| Right | Meaning | Symbol |
|---|---|---|
| Supervisory | Contains all the rights | S |
| Read | Read the file/folder | R |
| Write | Write to the file/folder | W |
| Create | Create the file/folder | C |
| Erase | Erase the file/folder | E |
| Modify | Rename the file/folder or change its attributes | M |
| File Scan | See the list of files/folders | F |
| Access Control | Assign rights to other users | A |
Permissions in Windows 2000 on Users/Groups are:
- Full Control: All permissions, including changing permissions and taking ownership
- Modify: Modify the file/folder
- Read & Execute: Read or execute the file
- List Folder Contents: See the names of the files and subfolders in a folder
- Read: Read the file
- Write: Write to the file/folder
Whereas, permissions in Windows 2000 on Folders, Subfolders and Files are:
- Traverse Folder/Execute File: Move through folders/Execute the file
- List Folder/Read Data: See the folder listing/Read data from the file
- Read Attributes: View the attributes of a file/folder
- Read Extended Attributes: View the extended attributes of a file/folder
- Create Files/Write Data: Create files in a folder/Write data to the file
- Create Folders/Append Data: Create subfolders/Append data to the file
- Write Attributes: Change the attributes of a file/folder
- Write Extended Attributes: Change the extended attributes of a file/folder
- Delete Subfolders and Files: Delete the subfolders and files inside a folder
- Delete: Delete the file/folder
- Read Permissions: Read the permissions of a file/folder
- Change Permissions: Modify/Change the permissions of a file/folder
- Take Ownership: Take ownership of the file/folder
Normally protection is provided at the lower level: higher-level operations such as copying or printing a file are built on the lower-level read operation, so a user who has access to read a file can also copy and print it.
Protection of directories is different from the protection of files. Directory protection may cover the creation and deletion of files in a directory, or even restrict the listing of files in a directory.
Similarly, protection of users/groups can differ from the protection of files or directories, as in the case of Windows 2000.