What are directories, if everything on Linux is a file?



























Beginners very often hear the phrase "Everything is a file on Linux/Unix". What are directories, then? How are they different from files?

















      files directory














      edited Sep 10 '18 at 2:36 by muru
      asked Sep 10 '18 at 2:29 by Sergiy Kolodyazhnyy






















          1 Answer














          Note: originally this was written to support my answer to "Why is the current directory in the ls command identified as linked to itself?", but I felt that this is a topic that deserves to stand on its own, hence this Q&A.



          Understanding Unix/Linux filesystem and files: Everything is an inode



          Essentially, a directory is just a special file that contains a list of entries (names) and their IDs (inode numbers).



          Before we begin the discussion, it's important to make a distinction between a few terms and understand what directories and files really represent. You may have heard the expression "Everything is a file" for Unix/Linux. Well, what users often understand as a file is this: /etc/passwd, an object with a path and a name. In reality, a name (be it of a directory, a file, or whatever else) is just a string of text, a property of the actual object. That object is called an inode; it is identified by its inode number (I-number) and stored on disk in the inode table. The kernel also keeps in-memory inode tables for open files, but that's not our concern for now.



          Unix's notion of a directory is as Ken Thompson put it in a 1989 interview:




          ...And then some of those files, were directories which just contained name and I-number.




          An interesting observation can be made from Dennis Ritchie's talk in 1972 that




          "...directory is actually no more than a file, but its contents are controlled by the system, and the contents are names of other files. (A directory is sometimes called a catalog in other systems.)"




          ...but there's no mention of inodes anywhere in the talk. However,
          the 1971 manual on format of directories states:




          The fact that a file is a directory is indicated by a bit in the flag word of its i-node entry.



          Directory entries are 10 bytes long. The first word is the i-node of the file represented by the entry, if non-zero; if zero, the entry is empty.




          So it has been there since the beginning.



          Directory and inode pairing is also explained in "How are directory structures stored in UNIX filesystem?". A directory itself is a data structure, more specifically a list of entries (filenames and inode numbers), where each inode number points to a record describing that object (permissions, type, owner, size, etc.). So each directory contains its own inode number (the . entry), and then filenames and their inode numbers. The most famous is inode #2, which is the / directory. (Note, though, that /dev and /run are virtual filesystems; since they are the root directories of their own filesystems, they also have inode 2. That is, an inode number is unique within its own filesystem, but with multiple filesystems attached you get non-unique inode numbers.) The diagram borrowed from the linked question probably explains it more succinctly:



          [Figure: Directory-iNode-Block]



          All that information stored in the inode can be accessed via stat() system calls, as per Linux man 7 inode:




          Each file has an inode containing metadata about the file. An
          application can retrieve this metadata using stat(2) (or related calls), which returns a stat structure, or statx(2), which returns a statx structure.




          Is it possible to access a file knowing only its inode number (ref1, ref2)? On some Unix implementations it is possible, but it bypasses permission and access checks, so Linux does not implement it; you have to traverse the filesystem tree (via find <DIR> -inum 1234, for example) to find the filename that corresponds to an inode.



          At the source-code level, the inode structure is defined in the Linux kernel source and is also adopted by many filesystems that work on Unix/Linux operating systems, including the ext3 and ext4 filesystems (the Ubuntu default). An interesting detail: with data being just blocks of information, the Linux inode structure carries an i_pipe field (initialized in the inode_init_always function) for the case where an inode is a pipe. Yes, sockets and pipes are technically also files: anonymous files, which may not have a filename on disk. FIFOs and Unix-domain sockets do have filenames on the filesystem.



          An inode may be unique, but the names that point to it need not be. If foo has inode 123 and we create a hard link to it called foobar, foobar will point to inode 123 as well. The inode itself records which blocks of disk space hold the data. And that's technically how you can have . linked to the directory's filename. Well, almost: you can't create hard links to directories on Linux yourself, but filesystems can allow hard links to directories in a very disciplined way, which in practice means that only . and .. exist as hard links to directories.
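The foo/foobar case can be checked with a short sketch. It assumes a writable /tmp on a filesystem that supports hard links; the function name hard_link_shares_inode and the temporary directory name are invented for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: creates foo and a hard link foobar in a fresh
 * temporary directory, then compares the inodes behind both names.
 * Returns 1 if both names share one inode whose link count is 2,
 * 0 if they do not, and -1 if any step fails. */
int hard_link_shares_inode(void)
{
    char dir[] = "/tmp/hardlink-demo-XXXXXX";   /* assumed scratch location */
    char foo[64], foobar[64];
    struct stat a, b;
    int result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(foo, sizeof foo, "%s/foo", dir);
    snprintf(foobar, sizeof foobar, "%s/foobar", dir);

    FILE *f = fopen(foo, "w");                  /* create an empty file */
    if (f != NULL) {
        fclose(f);
        if (link(foo, foobar) == 0) {           /* second name, same inode */
            if (stat(foo, &a) == 0 && stat(foobar, &b) == 0) {
                printf("foo: inode %ju, foobar: inode %ju, links: %ju\n",
                       (uintmax_t)a.st_ino, (uintmax_t)b.st_ino,
                       (uintmax_t)a.st_nlink);
                result = (a.st_ino == b.st_ino && a.st_dev == b.st_dev &&
                          a.st_nlink == 2);
            }
            unlink(foobar);
        }
        unlink(foo);
    }
    rmdir(dir);                                 /* clean up the scratch dir */
    return result;
}
```

The comparison includes st_dev because an inode number only identifies a file within one filesystem.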



          Directory Tree



          Filesystems implement a directory tree using one of the tree data structures. In particular,




          • ext3 and ext4 use HTree

          • xfs uses B+ Tree

          • zfs uses hash tree


          The key point here is that directories themselves are nodes in a tree, and subdirectories are child nodes, with each child having a link back to the parent node. Thus, the link count of a directory's inode is at least 2 for a bare directory (its name in the parent directory and its own . entry), and each additional subdirectory adds one more link through its .. entry:



          # new directory has link count of 2
          $ stat --format=%h .
          2
          # Adding subdirectories increases link count
          $ mkdir subdir1
          $ stat --format=%h .
          3
          $ mkdir subdir2
          $ stat --format=%h .
          4
          # Count of links for root
          $ stat --format=%h /
          25
          # Count of subdirectories plus / itself (one less than the link count)
          $ find / -maxdepth 1 -type d | wc -l
          24


          Ian D. Allen's course page shows a simplified, very clear diagram:



          WRONG - names on things      RIGHT - names above things
          =======================      ==========================

              R O O T            --->         [etc,bin,home]   <-- ROOT directory
             /   |   \                         /    |      \
          etc   bin   home       --->  [passwd]  [ls,rm]  [abcd0001]
           |   /   \    \                 |       /    \       |
           |  ls   rm  abcd0001  --->     |   <data>  <data>  [.bashrc]
           |               |              |                      |
          passwd       .bashrc   --->  <data>                 <data>


          The only thing in the RIGHT diagram that's incorrect is that files aren't technically considered to be on the directory tree itself: adding a file has no effect on the link count:



          $ mkdir subdir2
          $ stat --format=%h .
          4
          # Adding files makes no difference
          $ cp /etc/passwd passwd.copy
          $ stat --format=%h .
          4


          Accessing directories as if they were files



          To quote Linus Torvalds:




          The whole point with "everything is a file" is not that you have some random filename (indeed, sockets and pipes show that "file" and "filename" have nothing to do with each other), but the fact that you can use common tools to operate on different things.




          Considering that a directory is just a special case of a file, naturally there have to be APIs that allow us to open/read/write/close them in a similar fashion to regular files.



          That's where the dirent.h C header comes into play. It defines the dirent structure, which you can find in man 3 readdir:



          struct dirent {
              ino_t          d_ino;       /* Inode number */
              off_t          d_off;       /* Not an offset; see below */
              unsigned short d_reclen;    /* Length of this record */
              unsigned char  d_type;      /* Type of file; not supported
                                             by all filesystem types */
              char           d_name[256]; /* Null-terminated filename */
          };


          Thus, in your C code you declare a pointer, struct dirent *entry_p. When you open a directory with opendir() and start reading it with readdir(), each call returns a pointer to the next entry, which you store in entry_p. Of course, each entry contains the fields defined in the dirent template shown above.
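A minimal sketch of that opendir()/readdir() loop might look like the following. The function name list_directory is an assumption made for this example; only d_ino and d_name are used, since those are the two fields every implementation has to provide.

```c
#include <dirent.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: prints "inode<TAB>name" for every entry in path.
 * Returns the number of entries read, or -1 on error. */
int list_directory(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry_p;
    int count = 0;

    if (dir == NULL) {
        perror(path);
        return -1;
    }
    for (;;) {
        errno = 0;                       /* lets us tell "end" from "error" */
        entry_p = readdir(dir);
        if (entry_p == NULL)
            break;
        printf("%ju\t%s\n", (uintmax_t)entry_p->d_ino, entry_p->d_name);
        count++;
    }
    int saved_errno = errno;             /* closedir() may change errno */
    closedir(dir);
    return saved_errno != 0 ? -1 : count;
}
```

Calling list_directory(".") from main() gives output similar to ls -ia. On Linux the . and .. entries normally show up in the listing as well.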



          A practical example of how this works can be found in my answer to "How to list files and their inode numbers in the current working directory".



          Note that the POSIX manual on fdopendir states that "[t]he directory entries for dot and dot-dot are optional", and the readdir manual states that struct dirent is only required to have the d_name and d_ino fields.



          Note on "writing" to directories: writing to a directory is modifying its "list" of entries. Hence, creating or removing a file is directly associated with directory write permissions, and adding/removing files is the writing operation on said directory.
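One way to see that the "list" belongs to the system rather than to the program: on Linux, a directory can be opened read-only like any other file, but opening it for writing, or calling read() on it as a byte stream, fails with EISDIR. Entries only change through calls such as open() with O_CREAT, mkdir(), link(), and unlink(). The sketch below assumes Linux behavior, and both function names are invented for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: tries to open a directory for writing.
 * Returns the errno reported by open(), or 0 if the open succeeded. */
int try_open_dir_for_writing(const char *path)
{
    int fd = open(path, O_WRONLY);

    if (fd == -1)
        return errno;                    /* EISDIR is expected on Linux */
    close(fd);
    return 0;
}

/* Hypothetical helper: opens a directory read-only, then tries to read()
 * raw bytes from it. Returns the errno reported by the failing call,
 * or 0 if the read succeeded. */
int try_read_dir_as_bytes(const char *path)
{
    char buf[64];
    int fd = open(path, O_RDONLY);       /* opening read-only is allowed */

    if (fd == -1)
        return errno;
    ssize_t n = read(fd, buf, sizeof buf);
    int saved_errno = (n == -1) ? errno : 0;
    close(fd);
    return saved_errno;
}
```

So the write permission bit on a directory governs whether those entry-changing calls are allowed, not whether a program may write bytes into the directory.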



























          • 2





            I refuse to accept sockets are files ;) Would "everything is accessible as a file" be more accurate?

            – Rinzwind
            Sep 10 '18 at 6:53











          • @Rinzwind Well, the phrase "everything is accessible as a file" is accurate. Regular files have open() and read(); sockets have connect() and read(), too. What would be more accurate is that a "file" is really organized "data" stored either on disk or in memory, and some files are anonymous: they don't have a filename. Users usually think of files in terms of that icon on the desktop, but that's not the only thing that exists. See also unix.stackexchange.com/a/116616/85039

            – Sergiy Kolodyazhnyy
            Sep 10 '18 at 7:01











          • Well the question was more about if a directory was a file. And it is. Sockets could almost be a separate question along with FIFO Named Pipes.

            – WinEunuuchs2Unix
            Sep 10 '18 at 20:01











          • Well, I got an answer about pipes so far: askubuntu.com/a/1074550/295286 Maybe FIFOs will be next

            – Sergiy Kolodyazhnyy
            Sep 12 '18 at 9:40











          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "89"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1073802%2fwhat-are-directories-if-everything-on-linux-is-a-file%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          20














          Note: originally this was written to support my answer for Why is the current directory in the ls command identified as linked to itself? but I felt that this is a topic that deserves to stand on its own, and hence this Q&A.



          Understanding Unix/Linux filesystem and files: Everything is an inode



          Essentially, a directory is just a special file, which contains list of entries and their ID.



          Before we begin the discussion, it's important to make a distinction between a few terms and understand what directories and files really represent. You may have heard the expression "Everything is a file" for Unix/Linux. Well, what users often understand as file is this: /etc/passwd - An object with a path and a name. In reality, a name (be it a directory or file, or whatever else) is just a string of text - a property of the actual object. That object is called inode or I-number, and stored on disk in the inode table. Open programs also have inode tables, but that's not our concern for now.



          Unix's notion of a directory is as Ken Thompson put it in a 1989 interview:




          ...And then some of those files, were directories which just contained name and I-number.




          An interesting observation can be made from Dennis Ritchie's talk in 1972 that




          "...directory is actually no more than a file, but its contents are controlled by the system, and the contents are names of other files. (A directory is sometimes called a catalog in other systems.)"




          ...but there's no mention of inodes anywhere in the talk. However,
          the 1971 manual on format of directories states:




          The fact that a file is a directory is indicated by a bit in the flag word of its i—node entry.



          Directory entries are 10 bytes long. The first word is the i—node of the file represented by the entry, if non—zero; if zero, the entry is empty.




          So it has been there since the beginning.



          Directory and inode pairing is also explained in How are directory structures stored in UNIX filesystem?. a directory itself is a data structure, more specifically: a list of objects (files and inode numbers) pointing to lists about those objects (permissions, type, owner, size, etc.). So each directory contains its own inode number, and then filenames and their inode numbers. Most famous is the inode #2 which is / directory. (Note, though that /dev and /run are virtual filesystems, so since they are root folders for their filesystem, they also have inode 2; i.e. an inode is unique on its own fileystem, but with multiple filesystems attached, you have non-unique inodes ). the diagram borrowed from the linked question probably explains it more succinctly:



          Directory-iNode-Block



          All that information stored in the inode can be accessed via stat() system calls, as per Linux man 7 inode:




          Each file has an inode containing metadata about the file. An
          application can retrieve this metadata using stat(2) (or related calls), which returns a stat structure, or statx(2), which returns a statx structure.




          Is it possible to access a file only knowing its inode number ( ref1 , ref2 )? On some Unix implementations it is possible but it bypasses permission and access checks, so on Linux it's not implemented, and you have to traverse the filesystem tree (via find <DIR> -inum 1234 for example) to get a filename and its corresponding inode.



          On the source code level, it's defined in the Linux kernel source and is also adopted by many filesystems that work on Unix/Linux operating systems, including ext3 and ext4 filesystems (Ubuntu default). Interesting thing: with data being just blocks of information, Linux actually has inode_init_always function that can determine if an inode is a pipe (inode->i_pipe). Yes, sockets and pipes are technically also files - anonymous files, which may not have a filename on disk. FIFOs and Unix-Domain sockets do have filenames on filesystem.



          Data itself may be unique, but inode numbers aren't unique. If we have a hard link to foo called foobar, that will point to inode 123 as well. This inode itself contains information as to what actual blocks of disk space are occupied by that inode. And that's technically how you can have . being linked to the directory filename. Well,almost: you can't create hardlinks to directories on Linux yourself, but filesystems can allow hard links to directories in a very disciplined way, which makes a constraint of having only . and .. as hard links.



          Directory Tree



          Filesystems implement a directory tree as one of the tree datastructures. In particular,




          • ext3 and ext4 use HTree

          • xfs uses B+ Tree

          • zfs uses hash tree


          Key point here is that directories themselves are nodes in a tree, and subdirectories are child nodes, with each child having a link back to the parent node. Thus, for a directory link the inode count is minimum 2 for a bare directory (link to parent .. and link to self .), and each additional subdirectory is an extra link/node:



          # new directory has link count of 2
          $ stat --format=%h .
          2
          # Adding subdirectories increases link count
          $ mkdir subdir1
          $ stat --format=%h .
          3
          $ mkdir subdir2
          $ stat --format=%h .
          4
          # Count of links for root
          $ stat --format=%h /
          25
          # Count of subdirectories, minus .
          $ find / -maxdepth 1 -type d | wc -l
          24


          The diagram found on Ian D. Allen's course page shows a simplified very clear diagram:



          WRONG - names on things      RIGHT - names above things
          ======================= ==========================

          R O O T ---> [etc,bin,home] <-- ROOT directory
          / | / |
          etc bin home ---> [passwd] [ls,rm] [abcd0001]
          | / | / |
          | ls rm abcd0001 ---> | <data> <data> [.bashrc]
          | | | |
          passwd .bashrc ---> <data> <data>


          The only thing in the RIGHT diagram that's incorrect is that files aren't technically considered being on the directory tree itself: Adding a file has no effects on the links count:



          $ mkdir subdir2
          $ stat --format=%h .
          4
          # Adding files doesn't make difference
          $ cp /etc/passwd passwd.copy
          $ stat --format=%h .
          4


          Accessing directories as if they're file



          To quote Linus Torvalds:




          The whole point with "everything is a file" is not that you have some random filename (indeed, sockets and pipes show that "file" and "filename" have nothing to do with each other), but the fact that you can use common tools to operate on different things.




          Considering that a directory is just a special case of a file, naturally there have to be APIs that allow us to open/read/write/close them in a similar fashion to regular files.



          That's where dirent.h C library comes into place, which defines the dirent structure, which you can find in man 3 readdir:



             struct dirent {
          ino_t d_ino; /* Inode number */
          off_t d_off; /* Not an offset; see below */
          unsigned short d_reclen; /* Length of this record */
          unsigned char d_type; /* Type of file; not supported
          by all filesystem types */
          char d_name[256]; /* Null-terminated filename */
          };


          Thus, in your C code you have to define struct dirent *entry_p, and when we open a directory with opendir() and start reading it with readdir(), we'll be storing each item into that entry_p structure. Of course, each item will contain the fields defined in the template for dirent shown above.



          The practical example of how this works can be found in my answer on How to list files and their inode numbers in the current working directory.



          Note that the POSIX manual on fdopen states that "[t]he directory entries for dot and dot-dot are optional" and readdir manual states struct dirent is only required to have d_name and d_ino fields.



          Note on "writing" to directories: writing to a directory is modifying its "list" of entries. Hence, creating or removing a file is directly associated with directory write permissions, and adding/removing files is the writing operation on said directory.






          share|improve this answer





















          • 2





            I refuse to accept sockets are files ;) Would "everything is accessible as a file" be more accurate?

            – Rinzwind
            Sep 10 '18 at 6:53











          • @Rinzwind Well, the phrase "everything is accessible as a file" is accurate. Regular files have open() and read() sockets have connect() and read(),too. What would be more accurate is that "file" is really organized "data" stored on either disk or memory, and some files are anonymous - they don't have filename. Usually users think of files in terms of that icon on the desktop, but that's not the only thing that exists. See also unix.stackexchange.com/a/116616/85039

            – Sergiy Kolodyazhnyy
            Sep 10 '18 at 7:01











          • Well the question was more about if a directory was a file. And it is. Sockets could almost be a separate question along with FIFO Named Pipes.

            – WinEunuuchs2Unix
            Sep 10 '18 at 20:01











          • Well, I got an answer about pipes so far: askubuntu.com/a/1074550/295286 Maybe FIFOs will be next

            – Sergiy Kolodyazhnyy
            Sep 12 '18 at 9:40
















          20














          Note: originally this was written to support my answer for Why is the current directory in the ls command identified as linked to itself? but I felt that this is a topic that deserves to stand on its own, and hence this Q&A.



          Understanding Unix/Linux filesystem and files: Everything is an inode



          Essentially, a directory is just a special file, which contains list of entries and their ID.



          Before we begin the discussion, it's important to make a distinction between a few terms and understand what directories and files really represent. You may have heard the expression "Everything is a file" for Unix/Linux. Well, what users often understand as file is this: /etc/passwd - An object with a path and a name. In reality, a name (be it a directory or file, or whatever else) is just a string of text - a property of the actual object. That object is called inode or I-number, and stored on disk in the inode table. Open programs also have inode tables, but that's not our concern for now.



          Unix's notion of a directory is as Ken Thompson put it in a 1989 interview:




          ...And then some of those files, were directories which just contained name and I-number.




          An interesting observation can be made from Dennis Ritchie's talk in 1972 that




          "...directory is actually no more than a file, but its contents are controlled by the system, and the contents are names of other files. (A directory is sometimes called a catalog in other systems.)"




          ...but there's no mention of inodes anywhere in the talk. However,
          the 1971 manual on format of directories states:




          The fact that a file is a directory is indicated by a bit in the flag word of its i—node entry.



          Directory entries are 10 bytes long. The first word is the i—node of the file represented by the entry, if non—zero; if zero, the entry is empty.




          So it has been there since the beginning.



          Directory and inode pairing is also explained in How are directory structures stored in UNIX filesystem?. a directory itself is a data structure, more specifically: a list of objects (files and inode numbers) pointing to lists about those objects (permissions, type, owner, size, etc.). So each directory contains its own inode number, and then filenames and their inode numbers. Most famous is the inode #2 which is / directory. (Note, though that /dev and /run are virtual filesystems, so since they are root folders for their filesystem, they also have inode 2; i.e. an inode is unique on its own fileystem, but with multiple filesystems attached, you have non-unique inodes ). the diagram borrowed from the linked question probably explains it more succinctly:



          Directory-iNode-Block



          All that information stored in the inode can be accessed via stat() system calls, as per Linux man 7 inode:




          Each file has an inode containing metadata about the file. An
          application can retrieve this metadata using stat(2) (or related calls), which returns a stat structure, or statx(2), which returns a statx structure.




          Is it possible to access a file only knowing its inode number ( ref1 , ref2 )? On some Unix implementations it is possible but it bypasses permission and access checks, so on Linux it's not implemented, and you have to traverse the filesystem tree (via find <DIR> -inum 1234 for example) to get a filename and its corresponding inode.



          On the source code level, it's defined in the Linux kernel source and is also adopted by many filesystems that work on Unix/Linux operating systems, including ext3 and ext4 filesystems (Ubuntu default). Interesting thing: with data being just blocks of information, Linux actually has inode_init_always function that can determine if an inode is a pipe (inode->i_pipe). Yes, sockets and pipes are technically also files - anonymous files, which may not have a filename on disk. FIFOs and Unix-Domain sockets do have filenames on filesystem.



          Data itself may be unique, but inode numbers aren't unique. If we have a hard link to foo called foobar, that will point to inode 123 as well. This inode itself contains information as to what actual blocks of disk space are occupied by that inode. And that's technically how you can have . being linked to the directory filename. Well,almost: you can't create hardlinks to directories on Linux yourself, but filesystems can allow hard links to directories in a very disciplined way, which makes a constraint of having only . and .. as hard links.



          Directory Tree



          Filesystems implement a directory tree as one of the tree datastructures. In particular,




          • ext3 and ext4 use HTree

          • xfs uses B+ Tree

          • zfs uses hash tree


          Key point here is that directories themselves are nodes in a tree, and subdirectories are child nodes, with each child having a link back to the parent node. Thus, for a directory link the inode count is minimum 2 for a bare directory (link to parent .. and link to self .), and each additional subdirectory is an extra link/node:



          # new directory has link count of 2
          $ stat --format=%h .
          2
          # Adding subdirectories increases link count
          $ mkdir subdir1
          $ stat --format=%h .
          3
          $ mkdir subdir2
          $ stat --format=%h .
          4
          # Count of links for root
          $ stat --format=%h /
          25
          # Count of subdirectories, minus .
          $ find / -maxdepth 1 -type d | wc -l
          24


          The diagram found on Ian D. Allen's course page shows a simplified very clear diagram:



          WRONG - names on things      RIGHT - names above things
          ======================= ==========================

          R O O T ---> [etc,bin,home] <-- ROOT directory
          / | / |
          etc bin home ---> [passwd] [ls,rm] [abcd0001]
          | / | / |
          | ls rm abcd0001 ---> | <data> <data> [.bashrc]
          | | | |
          passwd .bashrc ---> <data> <data>


          The only thing in the RIGHT diagram that's incorrect is that files aren't technically considered being on the directory tree itself: Adding a file has no effects on the links count:



          $ mkdir subdir2
          $ stat --format=%h .
          4
          # Adding files doesn't make difference
          $ cp /etc/passwd passwd.copy
          $ stat --format=%h .
          4


          Accessing directories as if they're file



          To quote Linus Torvalds:




          The whole point with "everything is a file" is not that you have some random filename (indeed, sockets and pipes show that "file" and "filename" have nothing to do with each other), but the fact that you can use common tools to operate on different things.




          Considering that a directory is just a special case of a file, naturally there have to be APIs that allow us to open/read/write/close them in a similar fashion to regular files.



          That's where dirent.h C library comes into place, which defines the dirent structure, which you can find in man 3 readdir:



             struct dirent {
          ino_t d_ino; /* Inode number */
          off_t d_off; /* Not an offset; see below */
          unsigned short d_reclen; /* Length of this record */
          unsigned char d_type; /* Type of file; not supported
          by all filesystem types */
          char d_name[256]; /* Null-terminated filename */
          };


          Thus, in your C code you have to define struct dirent *entry_p, and when we open a directory with opendir() and start reading it with readdir(), we'll be storing each item into that entry_p structure. Of course, each item will contain the fields defined in the template for dirent shown above.



          The practical example of how this works can be found in my answer on How to list files and their inode numbers in the current working directory.



          Note that the POSIX manual on fdopen states that "[t]he directory entries for dot and dot-dot are optional" and readdir manual states struct dirent is only required to have d_name and d_ino fields.



          Note on "writing" to directories: writing to a directory is modifying its "list" of entries. Hence, creating or removing a file is directly associated with directory write permissions, and adding/removing files is the writing operation on said directory.



























          • I refuse to accept sockets are files ;) Would "everything is accessible as a file" be more accurate?

            – Rinzwind
            Sep 10 '18 at 6:53











          • @Rinzwind Well, the phrase "everything is accessible as a file" is accurate. Regular files have open() and read(); sockets have connect() and read(), too. What would be more accurate is that a "file" is really organized "data" stored either on disk or in memory, and some files are anonymous: they don't have a filename. Usually users think of files in terms of that icon on the desktop, but that's not the only thing that exists. See also unix.stackexchange.com/a/116616/85039

            – Sergiy Kolodyazhnyy
            Sep 10 '18 at 7:01











          • Well the question was more about if a directory was a file. And it is. Sockets could almost be a separate question along with FIFO Named Pipes.

            – WinEunuuchs2Unix
            Sep 10 '18 at 20:01











          • Well, I got an answer about pipes so far: askubuntu.com/a/1074550/295286 Maybe FIFOs will be next

            – Sergiy Kolodyazhnyy
            Sep 12 '18 at 9:40












































          edited Feb 3 at 23:14

























          answered Sep 10 '18 at 2:29









          Sergiy Kolodyazhnyy

























