UNIX/Linux & C Programming: Chapter n: Low-level I/O, & Files & Directories
(system calls, data structures, inodes, hard & symbolic links)



Coverage:

(low-level I/O) [UPE] §2.7 (pp. 65-70) and §§7.1-7.2 (pp. 201-214), and [USP] §2.4 (pp. 26-29) and §§4.1-4.6 (pp. 91-128)

(files & directories) [UPE] Chapters 2 and 7 (pp. 214-219) and [USP] Chapters 4 (pp. 102-107, 119-128) and 5 (pp. 145-173)


Overview I/O

  • file descriptors
    • 0 (STDIN_FILENO): stdin
    • 1 (STDOUT_FILENO): stdout
    • 2 (STDERR_FILENO): stderr
  • file descriptor table
    • resides in user program area
    • entry removed per close
  • system file table
    • resides in kernel area
    • shared by all processes
    • one entry per active open; fork?
    • each entry contains an offset for the open file
    • entry deleted when pointer count becomes 0
  • in-memory inode table
    • resides in kernel area
    • entry deleted when pointer count becomes 0
  • standard I/O buffer: flushed (i.e., write called) when full
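The relationship between the file descriptor table and the system file table can be checked with a small experiment. The sketch below assumes a POSIX system; the function name offsets_follow_model and the temporary file name are invented for illustration. It relies on two facts: a descriptor made with dup shares the system file table entry (and so the offset) of the original, while a second open of the same file gets an entry of its own.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if the observed offsets match the model, 0 if they do not,
 * and -1 if a system call fails.  The file name is a hypothetical choice. */
int offsets_follow_model(void)
{
    char path[] = "/tmp/offsetdemoXXXXXX";
    int fd1 = mkstemp(path);              /* first open: one system file table entry */
    if (fd1 == -1)
        return -1;
    if (write(fd1, "abcdef", 6) != 6) {   /* moves the offset of that entry to 6 */
        close(fd1);
        unlink(path);
        return -1;
    }

    int fd2 = dup(fd1);                   /* new descriptor, same table entry */
    int fd3 = open(path, O_RDONLY);       /* second open: a separate entry */
    int result = -1;
    if (fd2 != -1 && fd3 != -1) {
        off_t shared = lseek(fd2, 0, SEEK_CUR);    /* offset seen through fd2 */
        off_t separate = lseek(fd3, 0, SEEK_CUR);  /* offset seen through fd3 */
        result = (shared == 6 && separate == 0);
    }

    if (fd3 != -1)
        close(fd3);
    if (fd2 != -1)
        close(fd2);
    close(fd1);
    unlink(path);
    return result;
}
```

Both opens still end up at the same in-memory inode, which is why fd3 can read what was written through fd1.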



(regenerated from [USP] Fig. 4.2, p. 120)


Context switching (library vs. system calls)

  • develop iterative versions of UNIX cat
    • (version 1: catv1.c) using only standard I/O and library functions getchar/putchar
    • (version 2: catv2.c) using only standard I/O and library functions fread/fwrite with a buffer of size 1
    • (version 3: catv3.c) using the system calls read/write with a buffer of size 1
  • compile (to cat1, cat2, and cat3) and time these programs (using /usr/bin/time) on a large input stream redirecting the output to the system's trashcan (/dev/null); what do you notice?
  • experimentation
    • modify versions 2 (catv2.c) and 3 (catv3.c) so that they take the buffer size as a command-line argument and call these version 4 (catv4.c) and version 5 (catv5.c), respectively
    • increase the buffer size
    • what do you notice?
    • at what buffer size does the performance begin to remain constant?
    • now time the UNIX cat command. what can you infer?
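One possible shape for the core of version 5 (system calls with a caller-chosen buffer size) is sketched below. This is not the course's catv5.c; the function name copy_fd is an assumption made for the example. Each pass through the outer loop costs at least two system calls, so a larger bufsize means fewer switches into the kernel for the same input.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy everything readable from descriptor in to descriptor out through a
 * heap buffer of bufsize bytes.  Returns 0 on success and -1 on error. */
int copy_fd(int in, int out, size_t bufsize)
{
    if (bufsize == 0)
        return -1;
    char *buf = malloc(bufsize);
    if (buf == NULL)
        return -1;

    int status = 0;
    while (status == 0) {
        ssize_t nread = read(in, buf, bufsize);
        if (nread == 0)
            break;                        /* end of input */
        if (nread == -1) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            status = -1;
            break;
        }
        ssize_t done = 0;
        while (done < nread) {            /* write may accept fewer bytes than asked */
            ssize_t nw = write(out, buf + done, (size_t)(nread - done));
            if (nw == -1) {
                if (errno == EINTR)
                    continue;
                status = -1;
                break;
            }
            done += nw;
        }
    }
    free(buf);
    return status;
}
```

A main function would parse the buffer size from argv[1] and call copy_fd(STDIN_FILENO, STDOUT_FILENO, size); timing that program for sizes 1, 64, 4096, and so on is the experiment described above.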


Self-exercise: code the UNIX cp command

  • here's a solution (courtesy William Kimball, CPS 445, Winter 2006), which, by default (i.e., without requiring the -p option), only preserves the owner's permissions on the source in the target; how to copy while preserving the permissions of group and other?
  • here's another solution (courtesy Matt Mize, CPS 445, Winter 2006), which does not preserve permissions (i.e., it behaves like cp without any options)
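One way to approach the question about group and other permissions is sketched below; it is not either of the student solutions mentioned above, and the name copy_with_mode is made up for the example. The mode passed to open is filtered by the process umask, so the sketch reads the source's mode with fstat and applies it to the target with fchmod, which the umask does not affect.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to dst and give dst the same rwx bits (user, group, other) as src.
 * Returns 0 on success and -1 on error. */
int copy_with_mode(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;

    struct stat sb;
    if (fstat(in, &sb) == -1) {
        close(in);
        return -1;
    }

    /* Create the target readable and writable by the owner only for now. */
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (out == -1) {
        close(in);
        return -1;
    }

    int status = 0;
    char buf[4096];
    ssize_t n;
    while (status == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {
            ssize_t nw = write(out, buf + done, (size_t)(n - done));
            if (nw == -1) {
                status = -1;
                break;
            }
            done += nw;
        }
    }
    if (status == 0 && n == -1)
        status = -1;                      /* read failed */

    /* fchmod is not masked by umask, so group and other bits carry over. */
    if (status == 0 &&
        fchmod(out, sb.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) == -1)
        status = -1;

    if (close(out) == -1)
        status = -1;
    close(in);
    return status;
}
```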


Restart library

  • [USP] example 2.8 on p. 29: close
  • [USP] example 2.9 on p. 29: r_close


Character and block special files in UNIX

  • files which represent hardware devices are located in /dev
  • character special file
    $ ls -l /devices/pci@1e,600000/ide@d/dad@0,0:a,raw
    crw-r-----   1 root     sys      136,  8 Feb 19 17:47 /devices/pci@1e,600000/ide@d/dad@0,0:a,raw
    
  • block special file
    $ ls -l /devices/pci@1e,600000/ide@d/dad@0,0:a
    brw-r-----   1 root     sys      136,  8 Feb 19 02:33 /devices/pci@1e,600000/ide@d/dad@0,0:a
    
  • read and write devices just as we do files
  • for more information, see Chapter 7: `Device and Network Interfaces' of the UNIX manual ($ man -s 7 intro)


I/O recap

  • what is the difference between a library function and a system call? C as opposed to UNIX
  • what is the difference between a FILE* and a file descriptor?
    • UNIX as opposed to C
    • both referred to as a handle
    • former involves another level of indirection (handle to handle)
  • read, write, open, close
  • hint on opening files: creat(path, mode) is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode); creat exists because that combination of flags is so common


select and poll

  • handling I/O from multiple sources
  • non-blocking I/O
  • select: organizes parameters by type of condition (e.g., read ready or write ready)
  • poll: organizes parameters by file descriptor
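A minimal sketch of the poll interface follows, assuming two input descriptors; the function name wait_readable is invented for the example. Each struct pollfd names one descriptor and the events of interest, which is the "organized by file descriptor" layout mentioned above. With select, the same request would instead be expressed as one fd_set per condition.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd_a or fd_b to have data (or an
 * end-of-file condition) to read.  Returns the ready descriptor, or -1 on
 * timeout or error. */
int wait_readable(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };

    int n = poll(fds, 2, timeout_ms);     /* number of entries with revents set */
    if (n <= 0)
        return -1;                        /* 0 means timeout, -1 means error */

    for (int i = 0; i < 2; i++)
        if (fds[i].revents & (POLLIN | POLLHUP))
            return fds[i].fd;
    return -1;
}
```

A caller can then read from the returned descriptor without blocking on the other one.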


Disk statistics

  • du (disk usage) command displays disk usage statistics
    • displays the amount of space (in 512-byte blocks) occupied by a directory plus all of its files and subdirectories
    • -a option displays an entry for each file in a file hierarchy
      $ du -a  /usr/users/guest
      1047    ./eps
      3       ./permsrec.p
      76      ./printscreen.ps
      93      ./menu.ps
      1       ./.Xdefaults
      1247    .
      
    • du -k shows usage in 1k blocks

  • df (disk free) command displays amount of free disk space on all disks:
    $ df
    Filesystem          Total    kbytes   kbytes     
    node               kbytes      used     free   used    Mounted on
    /dev/rz2a           19743     10387     7382    58%    /
    /dev/rz2g          294518    248466    16601    94%    /usr
    
    notice that each file system is mounted starting from / (root); makes the presence of multiple file systems (and devices) transparent to the user
  • quota -v


File access (3 types)

  • r read (view and copy access; permits ls and od)
  • w write (modify and erase access; permits rm, vi, cp, and mv)
  • x execute (allows one to execute the file); interpreted slightly differently for directories, where it means searchable (i.e., access to any files within the directory to which permission has been granted)


File permissions, owners, and groups

  • what are the permissions on /bin/ls?
  • file access permissions (modes) are changed with the chmod (change mode) command: $ chmod permissions <file(s)>
    • permissions can be set for:



      (regenerated from [USP] Fig. 4.1, p. 105)

        u user (owner)
        g group
        o others
        a all users

    • 3 types of access:
        r read (view and copy access)
        w write (modify and erase access)
        x execute (allows cd access to directories)

    • use octal permission specification; the rwx access of each class are bits which are on (1) or off (0) and can be specified to chmod as 3 separate digits:
      $ chmod 644 index.html
      $ chmod 700 script.ksh
      
    • use abbreviations; permission can be modified by referencing a class and adding (+), subtracting (-), or setting (=) permissions (general format) chmod [ugoa][+-=][rwx] <file(s)>
      $ chmod u=rw,g=r,o=r index.html
      $ chmod u=rwx script.ksh
      $ chmod a+r README
      
  • to change the owner of a file use the chown command: chown <new-owner> <file(s)>
  • users can belong to groups to facilitate file sharing; students enrolled in CPS 444 are members of the cps444 group
  • to change the group of a file use the chgrp command: chgrp <new-group> <file(s)>
  • note: recursive option -R to chmod, chown, or chgrp changes the permissions, owner, or group, respectively, for all files in the directory and all subdirectories
  • umask command
    • with no arguments, gets the user's file creation mask
    • with an argument, sets the user's file creation mask
    • use octal code
    • specify the denied permissions (opposite of chmod)
  • investigate the id command
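The effect of the file creation mask can be observed from C with the umask function. The sketch below, with the invented name mode_after_umask, creates a file requesting one mode under a given mask and reports the permission bits the file actually receives; the bits set in the mask are the ones denied.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path requesting mode `requested` while the file creation mask is
 * `mask`.  Returns the permission bits the new file received, or -1 on error
 * (including the case where path already exists).  The file is removed again
 * and the previous mask is restored. */
int mode_after_umask(const char *path, mode_t requested, mode_t mask)
{
    mode_t old = umask(mask);             /* umask returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    umask(old);
    if (fd == -1)
        return -1;

    struct stat sb;
    int result = (fstat(fd, &sb) == -1) ? -1 : (int)(sb.st_mode & 0777);
    close(fd);
    unlink(path);
    return result;
}
```

For instance, requesting 0666 under a mask of 022 is expected to yield 0644, since the result is requested & ~mask.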


Relevant accessor/modifier functions, and structs

  • ref. [USP] Chapter 5
  • functions
    • chdir
    • getcwd
    • fpathconf
    • pathconf; do not assume PATH_MAX will be defined; use #ifndef, #define, and #endif
    • sysconf
    • opendir
    • readdir
      • returns a pointer to static storage
      • is not thread-safe
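      • use readdir_r (the reentrant variant; note that recent glibc deprecates it, since readdir is safe when threads do not share a DIR stream)
    • closedir
    • rewinddir
    • stat; if passed a symbolic link, returns information about the file referred to by the link
    • lstat; if passed a symbolic link, returns information about the link itself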
  • stat struct
    • various fields
    • functions for determining the type of a file


Inodes

  • structure containing bookkeeping information for the file
  • schematic of inode struct
    • does not contain filename, why?
    • why are indirect pointers necessary?


  • (regenerated from [USP] Fig. 5.3, p. 160)



    (regenerated from [USP] Fig. 5.4, p. 163)

  • ls -i prints the inode number in the first column of the listing


File links: hard vs. soft

  • UNIX filesystem is a tree
  • including links makes it a DAG (Directed Acyclic Graph)
  • links are filenames which refer to the same file
  • used to reduce redundancy
  • changes made through one name affect the other
  • both refer to the same physical file on the disk
  • created with the ln command or the link, symlink functions
  • two types of links
    • hard links
    • soft links (also sometimes called symbolic links or symlinks)
  • note:
    • rm does not actually erase a file (i.e., inode struct not freed); rather it removes the link to the file (rationale behind name of C function and UNIX command unlink)
    • when the last link to a file is removed, UNIX reclaims the file space (i.e., the blocks pointed to by the inode list are returned to the free list)


Hard links

  • a direct pointer to an inode
  • indistinguishable from the original file name (changes made through any one link are reflected in all of them)
  • a file is accessible as long as a hard link to it remains
  • created with the ln command (and no options) or the link function
  • example:
    $ pwd
    /home/lucy/dir1
    $ ln ../dir2/proga prog1
    
    creates a second (hard link) file reference to proga

  • another example: ln /dirA/name1 /dirB/name2



    (regenerated from [USP] Fig. 5.5, p. 165)

  • cannot be made across distinct filesystems
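The behaviour of hard links can be confirmed from C with link, stat, and unlink. The sketch below, under the hypothetical name hard_link_demo, creates a file, adds a second name for it, checks that both names share one inode with a link count of 2, then removes the first name and checks that the data is still reachable through the second.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the observations match the description of hard links,
 * 0 if they do not, and -1 if a system call fails.  Both names must not
 * exist beforehand and must be on the same filesystem. */
int hard_link_demo(const char *name1, const char *name2)
{
    struct stat s1, s2;
    char c = 0;
    int result = -1;

    int fd = open(name1, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    if (write(fd, "x", 1) != 1) {
        close(fd);
        unlink(name1);
        return -1;
    }
    close(fd);

    if (link(name1, name2) == -1) {       /* second directory entry, same inode */
        unlink(name1);
        return -1;
    }

    if (stat(name1, &s1) == 0 && stat(name2, &s2) == 0) {
        int same = s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino &&
                   s2.st_nlink == 2;
        unlink(name1);                    /* removes one name, not the data */
        fd = open(name2, O_RDONLY);
        if (fd != -1) {
            int readable = (read(fd, &c, 1) == 1 && c == 'x');
            int one_left = (fstat(fd, &s2) == 0 && s2.st_nlink == 1);
            close(fd);
            result = same && readable && one_left;
        }
    } else {
        unlink(name1);
    }
    unlink(name2);                        /* last link gone: space reclaimed */
    return result;
}
```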


Symbolic (soft) links

  • a symbolic link (or symlink or soft link) is a pointer to a file name
  • if the original file is moved or removed, the link becomes invalid (a dangling link)
  • deleting the link does not delete the original file
  • usually made to directories to create shortcuts
  • created with the -s option to ln or the symlink function
  • example: ln -s ../dir2/proga prog1; same effect as above command, except that a symbolic (soft) link is created
  • another example: ln -s /dirA/name1 /dirB/name2



    (regenerated from [USP] Fig. 5.8, p. 170)


  • displayed in the ls listing as a file type of l
    lrwxrwxrwx  1 lucy cps444 104 Sep 12 19:54 note2
    
    notice the 1 in the link field (symbolic links show up with a trailing @ in the output of ls -F)
  • note: mv is implemented as ln and rm; a new link is established and then the old link removed
  • helpful for creating shortcuts to directories or files, especially when setting up pages served on the web
  • can be made across distinct filesystems
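The contrast with hard links shows up when the same kind of experiment is run with symlink. In the sketch below (the function name symlink_demo is an assumption), lstat sees the link itself, stat follows it to the target, readlink returns the stored path name, and removing the target leaves a dangling link.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the observations match the description of symbolic links,
 * 0 if they do not, and -1 if setup fails.  Neither name may exist yet. */
int symlink_demo(const char *target, const char *linkname)
{
    struct stat sb;
    char buf[256];

    int fd = open(target, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    close(fd);
    if (symlink(target, linkname) == -1) {
        unlink(target);
        return -1;
    }

    int is_link = lstat(linkname, &sb) == 0 && S_ISLNK(sb.st_mode);
    int follows = stat(linkname, &sb) == 0 && S_ISREG(sb.st_mode);

    /* The link's contents are a path name, not an inode number. */
    ssize_t len = readlink(linkname, buf, sizeof buf);
    int has_name = len == (ssize_t)strlen(target) &&
                   memcmp(buf, target, (size_t)len) == 0;

    unlink(target);                       /* the link now points at nothing */
    int dangling = stat(linkname, &sb) == -1 && lstat(linkname, &sb) == 0;

    unlink(linkname);
    return is_link && follows && has_name && dangling;
}
```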


Editor examples

  • editing a file which has a hard link to it
    • with basic editor (does not create a backup of file); drew diagrams of directory entries and inodes
    • with editor which creates a backup (.bak) file; how does the diagram change?
  • editing a file which has a symbolic link to it
    • with basic editor; drew diagrams of directory entries and inodes
    • with backup editor: self-exercise
    • how can we change our editor so that a symbolic or hard link to a file it modifies always references the new contents? open the existing file with the O_WRONLY and O_TRUNC flags; this causes the same inode to be reused
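The two saving strategies can be compared by watching inode numbers. The sketch below is an illustration under assumptions, not the behaviour of any particular editor: save_in_place truncates and rewrites the existing file, while save_with_backup renames the old file to a backup name and creates a fresh file. All three function names are invented here.

```c
#include <fcntl.h>
#include <stdio.h>      /* rename */
#include <sys/stat.h>
#include <unistd.h>

/* Inode number of path, or -1 if stat fails. */
long long inode_of(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == -1 ? -1 : (long long)sb.st_ino;
}

/* Overwrite the existing file: the inode, and so every hard link to it,
 * now holds the new contents. */
int save_in_place(const char *path, const char *text, size_t len)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, text, len);
    if (close(fd) == -1 || n != (ssize_t)len)
        return -1;                        /* short writes treated as errors here */
    return 0;
}

/* Keep the old inode under the backup name and create a new inode under the
 * original name: other hard links keep referring to the old contents. */
int save_with_backup(const char *path, const char *backup,
                     const char *text, size_t len)
{
    if (rename(path, backup) == -1)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, text, len);
    if (close(fd) == -1 || n != (ssize_t)len)
        return -1;
    return 0;
}
```

Calling inode_of before and after each save shows which strategy keeps the inode and which one replaces it.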


od (octal dump) command

  • display an ascii, decimal, hexadecimal, or octal dump of a file
  • by default, takes each 16-bit word and converts it to octal
  • examples:
    $ od -c # as printable characters
    $ od -b # bytewise, but in octal
    $ od -cb 
    $ od -d # decimal
    $ od -x # hex
    
  • 7-digit #s down the lhs are positions in the file (i.e., the ordinal # (starting from 0) of the next character shown) in octal
  • emphasis on octal is remnant of the PDP-11
  • also see xxd command
  • see ascii(7)


File `types' and `names'

  • no such concept as a file type in UNIX
  • how does the file command do its job? uses clues (e.g., binary `magic' #)
    • examples:
      $ od /bin/ls | head -2
      0000000 077505 046106 000402 000400 000000 000000 000000 000000
      0000020 000002 000002 000000 000001 000001 007010 000000 000064
      $ od /bin/od | head -2
      0000000 077505 046106 000402 000400 000000 000000 000000 000000
      0000020 000002 000002 000000 000001 000001 006014 000000 000064
      $ od 20060125/v3cat | head -2
      0000000 077505 046106 000402 000400 000000 000000 000000 000000
      0000020 000002 000002 000000 000001 000001 002320 000000 000064
      
    • 0000000 077505 046106 000402 000400 000000 000000 000000 000000 on sun131-32
  • if filename is not stored in the inode, where is a file's name located?
    • try an od on a directory
    • notice anything interesting?
    • compare ordering of filenames to that in output of ls -l
    • a directory consists of 16-byte chunks
      • first 2 hold the inode #
        • the only connection between name of a file and its contents
        • this is why each filename in a directory is called a link; links name of file to inode and, thus, data
        • rm removes directory entries, not inodes
        • an inode # of 0 for a file in a directory indicates that the link has been removed, but not necessarily the contents of the file
      • last 14 hold the filename, padded with ASCII NULLs; filename size limits?
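Returning to the magic number clue: in the od listings above, the first two words 077505 046106 are octal for the bytes 0x7f, 'E', 'L', 'F' on that machine, which is the signature at the start of an ELF executable. A minimal sketch of the kind of check a tool like file might make is shown below; the function name has_elf_magic is invented, and the real file command consults a much larger table of signatures.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if path begins with the 4-byte ELF magic number, 0 if it does
 * not, and -1 if the file cannot be opened or read. */
int has_elf_magic(const char *path)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    unsigned char head[4];

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = read(fd, head, sizeof head);
    close(fd);
    if (n == -1)
        return -1;

    /* Files shorter than 4 bytes cannot carry the signature. */
    return n == 4 && memcmp(head, magic, sizeof magic) == 0;
}
```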


Question to investigate

  • why can we not run od or xxd on directories we own?
  • places to look for information:
    • section 4 of the UNIX manual covers `File Formats'
    • dir_ufs(4), dir(4)


Set-uid program

  • for instance, passwd
  • how can you change your password when you do not have write permission on the password file?
    $ ls -l /etc/passwd 
    -rw-r--r--   1 root     sys          580 Aug  1  2005 /etc/passwd
    
  • set-uid: grants permission to run the file and, while doing so, to have all of the owner's privileges
    • sounds dangerous
    • indicated with s bit(s) in permission list
  • $ ls -l /bin/passwd
    -r-sr-sr-x   1 root     sys        27220 Jan 22  2005 /bin/passwd*
    
    /bin/passwd is said to be `set-uid to root'
  • root is superuser on the system
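From C, the s bit shown by ls corresponds to the S_ISUID flag in st_mode, and a running process can compare its real and effective user ids. The sketch below uses hypothetical function names (is_setuid, running_setuid) to show both checks.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the set-user-ID bit is on for path, 0 if it is off,
 * and -1 if stat fails. */
int is_setuid(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) == -1)
        return -1;
    return (sb.st_mode & S_ISUID) != 0;
}

/* Returns 1 if this process runs with an effective uid that differs from its
 * real uid, as happens while executing a set-uid program owned by another
 * user; returns 0 otherwise. */
int running_setuid(void)
{
    return getuid() != geteuid();
}
```

On the system in the listing above, is_setuid("/bin/passwd") would be expected to return 1, and inside passwd the effective uid would be root's while the real uid stays that of the invoking user.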


Login process

  • your login name is just a mnemonic for your uid
  • when you enter your password during login, the system encrypts it (e.g., using crypt) and compares the result to the value stored (in encrypted form) in the appropriate field of your entry in the passwd file
  • if they agree, you are permitted to access the system


Things to do

  • find the entry in the passwd file which corresponds to your account
  • because of the shared file systems you will have to do some exploring
  • once you find your entry, record your uid and gid
  • see the following manpages for help:
    • passwd(1)
    • passwd(4)
    • finger(1)
    • ypcat(1)
    • ypmatch(1)
    • getent(1)
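The same lookup can be done programmatically with getpwuid, which consults whatever passwd database the system is configured to use (local file or a network service). The sketch below, with the made-up name describe_uid, formats the login name, uid, and gid for a given uid.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "name uid=N gid=M" for the given uid into buf (at most size bytes).
 * Returns 0 on success and -1 if no passwd entry is found. */
int describe_uid(uid_t uid, char *buf, size_t size)
{
    struct passwd *pw = getpwuid(uid);    /* points to static storage */
    if (pw == NULL)
        return -1;
    snprintf(buf, size, "%s uid=%ld gid=%ld",
             pw->pw_name, (long)pw->pw_uid, (long)pw->pw_gid);
    return 0;
}
```

Calling describe_uid(getuid(), buf, sizeof buf) reports the entry for the account running the program.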


find command

    walk a file hierarchy to find files and directories, and optionally execute a command line on all files found
    $ find . -name wc.c -print
    $ find . -name "*.c" -print
    $ find ~ -name "*.c" -print
    $ find . -name "sf[1-9].cpp" -print
    $ find . -type d -print
    $ find ~ -type d -print
    $ find $HOME -type f -print
    $ find . -name "*.c" -exec chmod 660 {} \;
    $ find . -name "*" -type f -exec chmod 400 {} \;
    $ find . -name "*" -type d -exec chmod 500 {} \;
    $ find . -name .DS_Store -exec rm {} \;
    $ find . -name .DS_Store -delete
    $ find . -name ".*rc" -print
    $ find . \( -name "*~" -o -name "*.bak" \) -exec rm {} \;
    
    can we solve the file renaming problem with the find command? E.g., $ find home -name route.c -exec mv {} route.cpp \;
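For comparison, the core of a command such as find dir -name pattern -print can be sketched in C with opendir, readdir, lstat, and fnmatch. The function name find_name is invented for this example, and the sketch leaves out almost everything the real find does (other tests, -exec, error reporting).

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Walk the hierarchy rooted at dir and print every path whose last component
 * matches the shell pattern.  Returns the number of matches, or -1 if dir
 * itself cannot be opened.  The starting directory's own name is not tested. */
long find_name(const char *dir, const char *pattern)
{
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return -1;

    long matches = 0;
    struct dirent *e;
    while ((e = readdir(dp)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[4096];
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                     /* path too long for the buffer: skip */

        if (fnmatch(pattern, e->d_name, 0) == 0) {
            puts(path);
            matches++;
        }

        struct stat sb;
        /* lstat, so symbolic links to directories are not followed. */
        if (lstat(path, &sb) == 0 && S_ISDIR(sb.st_mode)) {
            long sub = find_name(path, pattern);
            if (sub > 0)
                matches += sub;
        }
    }
    closedir(dp);
    return matches;
}
```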


Accounts

  • su (allows you to become superuser or another user)
    • root assumed if no user specified
    • passing a - (e.g., su -) changes the environment to what would be expected if the user actually logged in as the specified user; otherwise environment inherited
  • wheel group (allows execution of su command)
  • adduser


References

    [C] C Language for Experienced Programmers, Version 2.0.0, AT&T, 1988.
    [UPE] B.W. Kernighan and R. Pike. The UNIX Programming Environment. Prentice Hall, Upper Saddle River, NJ, Second edition, 1984.
    [USP] K.A. Robbins and S. Robbins. UNIX Systems Programming: Concurrency, Communication, and Threads. Prentice Hall, Upper Saddle River, NJ, Second edition, 2003.
