find

Submitted by brock on Tue, 2018.07.24 - 23:56

Notes on using the Linux find command.

List First Level Of Directories In Current Directory

This will list just the bare directory names in the current directory. The shell expands * to the entries in the current directory, and -maxdepth 0 stops find from descending into them.

find * -maxdepth 0 -type d

Note that the list will be the bare names of the directories and not include the path.
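This behaviour is easy to check on a throwaway tree (the names alpha, beta and notes.txt here are invented for the demonstration):

```shell
# Build a scratch tree: two directories (one containing a subdirectory) and a file.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p alpha/inner beta
touch notes.txt

# The shell expands * to alpha, beta and notes.txt; -maxdepth 0 keeps find
# from descending into the directories, and -type d filters out the file.
find * -maxdepth 0 -type d
# prints:
# alpha
# beta
```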

This will list the directory names in a directory pointed to by the /path/to/directory value.

find /path/to/directory -maxdepth 1 -type d

Note that the list will include /path/to/directory itself (the starting point is a directory at depth 0), and that each of the other names will be prefixed with the /path/to/directory/ path. The -maxdepth option is given before -type because GNU find warns when a global option follows a test.

Count Number Of Files or Directories In Directory Tree

This is how to use find to count the number of files in the current directory tree.

find . -type f | wc -l

This is how to use find to count the number of files in a specific directory tree.

find /path/to/top/of/directory/tree -type f | wc -l
  • The -type f parameter tells find to look for files only.
  • The . parameter tells find to start looking in the current directory and, by default, all directories below the current directory.
  • The output of find is a list of the files found which is formatted as one file per line.
  • The | pipes the output of the find command to the wc word count command.
  • The -l parameter on the wc command tells wc to count lines rather than words.
  • The output is the count of the files found by the find command.
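The steps above can be checked on a small made-up tree:

```shell
# Three files spread over two directory levels; the pipeline should report 3.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p sub
touch one.txt two.txt sub/three.txt

find . -type f | wc -l
# prints: 3
```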

This is how to use find to count the number of directories in the current directory tree.

find . -type d | wc -l

This is how to use find to count the number of directories in a specific directory tree.

find /path/to/top/of/directory/tree -type d | wc -l

Note that this includes the starting directory in the directory count.

To exclude the starting directory from the count, append /* to the end of the starting directory path.

find /path/to/top/of/directory/tree/* -type d | wc -l
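The difference between the two forms can be seen on a scratch tree (directory names invented):

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p top/a/b

find top -type d | wc -l      # prints 3: top, top/a and top/a/b
find top/* -type d | wc -l    # prints 2: top itself is excluded
```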

Separate Checksum File For Each Directory Tree Under Current Directory

In a certain directory there is a series of directories that are snapshots of separate websites.

The requirement is to generate a checksum file for the directory tree under the top website directory for all of the files under that website directory tree.

This checksum file should contain sufficient information to verify against the existing directory structure below the top directory.

/
`--website.snapshots
     |-- website-1.com
     |    `-- 2013
     |-- website-2.ca
     |    `-- 2013
     |-- website-3.net
     |    `-- 2013
     `-- website-4.biz
           `-- 2013

This code will generate a bare list of the top level of directories within the current directory.

find * -maxdepth 0 -type d

This code will generate a check sum file containing check sums for all of the files under the mintocomm directory (an example directory name), appending them to test.mintocomm.sha256sum.

find mintocomm -type f -print0 | xargs -0 sha256sum >>test.mintocomm.sha256sum

This code does generate the check sum file under each website directory with the proper information. Note that it will also attempt to generate the check sum for the .sha256sum file it is creating, which usually generates an error when the directory tree check sums are verified.

find * -maxdepth 0 -type d -exec sh -c 'find "{}" -type f -print0 | xargs -0 sha256sum >> "{}"/"{}".sha256sum' \; 

This version will avoid trying to generate the check sum for the .sha256sum file that is being generated. Otherwise, you will generally get a check sum FAILED when verifying the check sums on a directory tree. Note that it will exclude all files that have a suffix of .sha256sum.

find * -maxdepth 0 -type d -exec sh -c 'find "{}" -type f \( -iname "*" ! -iname "*.sha256sum" \) -print0 | xargs -0 sha256sum >> "{}"/"{}".sha256sum' \;
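A sketch of the whole cycle on a throwaway copy of the layout (the site names are invented, and the redundant -iname test from the command above is dropped, leaving only the exclusion):

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p site-1.com/2013 site-2.net/2013
echo "page one" > site-1.com/2013/index.html
echo "page two" > site-2.net/2013/index.html

# One check sum file per top-level directory, skipping the file being written.
find * -maxdepth 0 -type d -exec sh -c 'find "{}" -type f ! -iname "*.sha256sum" -print0 | xargs -0 sha256sum >> "{}"/"{}".sha256sum' \;

# Verify one of the trees; every line should end in ": OK".
sha256sum -c site-1.com/site-1.com.sha256sum
```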

Verify Directory Tree Using The Checksum Created Above

The first step is to navigate to the top directory holding the website directories.

cd /path/to/top/website.snapshots/directory/

Use the following commands on each of the website directories under the website.snapshots directory.

This will verify the check sums and print out the number of files that were OK.

sha256sum -c top.website.directory.name/top.website.directory.name.sha256sum | grep -c OK

This will verify the check sums and print out the number of files that FAILED.

sha256sum -c top.website.directory.name/top.website.directory.name.sha256sum | grep -c FAIL

This will verify the check sums and print out the names of files that FAILED.

sha256sum -c top.website.directory.name/top.website.directory.name.sha256sum | grep FAIL

Note that the sha256sum actually prints the word FAILED but these commands are only looking for the FAIL portion.

Separate Archive File For Each Directory Tree Under Current Directory

For the same directory structure as shown above for generating check sum files for directory trees under the current directory, generate a separate archive file (.tar.bz2) for each directory tree.

The archive files should be created in the current directory and NOT under the website directory trees.

The normal Linux tar command to create an archive of a directory tree is :

tar -cjvf name.of.archive.file.tar.bz2 path.to.top.directory.of.directory.tree

This is the command used previously to create check sum files for each of the website directory trees. It can be modified to perform a backup of the same website directory trees.

find * -maxdepth 0 -type d -exec sh -c 'find "{}" -type f -print0 | xargs -0 sha256sum >> "{}"/"{}".sha256sum' \; 

In bash you can add the following to the end of your filename in order to add a timestamp in the form YYYY.MM.DD-HH.MM like 2012.10.22-14.41

$(date '+%Y.%m.%d-%H.%M')
OR
$(date "+%Y.%m.%d-%H.%M")

For an archive that we are generating with a script / command, it is nice if existing archives are NOT overwritten. A filename with a timestamp is quite useful for preventing that.

find * -maxdepth 0 -type d -exec sh -c 'tar -cjvf "{}"-$(date "+%Y.%m.%d-%H.%M").tar.bz2 "{}"' \; 
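A sketch on a scratch tree (names invented); -v is dropped here to keep the output quiet, and the timestamp keeps re-runs from overwriting earlier archives:

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p site-1.com/2013 site-2.net/2013
echo data > site-1.com/2013/file.txt
echo data > site-2.net/2013/file.txt

# One timestamped .tar.bz2 per top-level directory, created in the
# current directory and NOT inside the website trees.
find * -maxdepth 0 -type d -exec sh -c 'tar -cjf "{}"-$(date "+%Y.%m.%d-%H.%M").tar.bz2 "{}"' \;

ls *.tar.bz2    # two archives, one per directory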

Verifying The .tar.bz2 Archives Created Above

Note that some versions of tar do not have the -d, --diff or --compare options.

The diff or compare only compares the contents of the archive against the files of the same name on the file system. It does not examine files on the file system that are not in the archive. It will report any differences in file size, mode, owner, modification date and contents.

If everything is OK, there will be NO messages output.

Messages are only output if there is a difference between the archive contents and the files on the file system.

If the -v or -vv verbose options are used, and if everything is OK, the output will only contain a list of the files that have been processed. Basically it is a list of the contents of the archive. -v gives you the paths and file names. -vv gives you more information, like the ls -l command.

tar --compare --file=archive.file.name.tar.bz2
tar -v --compare --file=archive.file.name.tar.bz2
tar -vv --compare --file=archive.file.name.tar.bz2
tar --diff --file=archive.file.name.tar.bz2
tar -v --diff --file=archive.file.name.tar.bz2
tar -vv --diff --file=archive.file.name.tar.bz2
tar -d --file=archive.file.name.tar.bz2
tar -v -d --file=archive.file.name.tar.bz2
tar -vv -d --file=archive.file.name.tar.bz2
  • --diff or --compare can be used to compare the files in the archive against the files on the file system.
  • Without the -v or -vv there is no real output unless a difference between the files in the archive and those on the file system is identified.
  • -v will give you a verbose output that includes the path and filename of every file that is checked.
  • -vv gives you a more verbose output that includes the path, filename, size, permissions, etc. (like ls -l would) for each file that is checked.
  • Note that presence of the file on the -v and -vv output without any warning / error indicates that the file is good.
  • --file= is the name of the archive file.
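The points above can be demonstrated with a scratch archive (names invented): the compare is silent while the tree is untouched, and complains once a file changes.

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p site/2013
echo original > site/2013/page.html
tar -cjf site.tar.bz2 site

# Untouched tree: no output, exit status 0.
tar --compare --file=site.tar.bz2

# Change a file, then compare again: tar reports the difference and
# exits non-zero.
echo changed >> site/2013/page.html
tar --compare --file=site.tar.bz2 || echo "difference detected"
```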

The following command does the archive to file system compares for all of the archive files created by the previous command for all of the website directories. There is no indication of which archives are being processed.

find * -maxdepth 0 -type f -name '*.tar.bz2' -exec sh -c 'tar --compare --file="{}"' \;

The following command does the archive to file system compares for all of the archive files created by the previous command for all of the website directories. The name of the archive being processed is displayed before each compare is performed.

find * -maxdepth 0 -type f -name '*.tar.bz2' -exec sh -c 'echo "{}"; tar --compare --file="{}"' \;

Note that using the * in conjunction with the -maxdepth 0 option results in bare file names.

Note that using the . in conjunction with the -maxdepth 1 option results in the file names being prefixed with the ./ path.

Check Sum The Archive Files Before Burning To CD, DVD or Blu-Ray

In order to be able to test if the archive files have been burnt properly to the backup media or to test if they are in good condition when restoring, it is a good idea to generate the check sums for the archive files. Remember to burn the check sum file to the backup media with the archive files.

sha256sum *.tar.bz2 >website.archives-$(date "+%Y.%m.%d-%H.%M").sha256sum

Verify The Archive Files With The sha256sum File

Note that this example uses a .sha256sum file created by the command above.

The result should be a list of the archive file names with an OK beside each one.

If there is a problem with a check sum, FAILED will be displayed beside the archive file name.

sha256sum -c website.archives-2013.03.20-20.18.sha256sum

Determining The Depth Of A Directory Tree

The following three (3) commands will print out the maximum depth and the name of the directory that is at that depth.

This is currently the most complex of the commands available and makes use of several Linux utility programs such as find, tr, wc, awk, tail, sort, etc.

This command will determine the directory depth of a directory tree starting from the current directory. On the line displayed, the first number is the depth and it is followed by the directory path that is at that depth.

Note that the directory listed may not be the only one at that depth. The one displayed is usually the one that was the last one detected at that depth.

find . -type d -exec bash -c 'echo $(tr -cd / <<< "$1"|wc -c):$1' -- {} \;  | sort -n | tail -n 1 | awk -F: '{print $1, $2}'

5 ./d.1/d.2.c/d.3.c/d.4.c/d.5.c.1

This command will determine the directory depth of a directory tree starting from the root "/" directory.

This one might NOT be a good idea. This one will take a long time as it will process all directories on your computer system.

find / -type d -exec bash -c 'echo $(tr -cd / <<< "$1"|wc -c):$1' -- {} \;  | sort -n | tail -n 1 | awk -F: '{print $1, $2}'

This command will determine the directory depth of a directory tree for a specific directory. Note that this may not be what you expect.

  • If you specify the directory explicitly (e.g. /website.snapshots/2013/), then the depth will be measured from the root "/" directory for all directories below the specified directory.
  • If you specify a relative directory (e.g. ../../website.snapshots/2013/), then the depth will include the relative (../../ which is 2 levels) and specific directory entries in the path for all directories below the specified directory.

find /path/to/specific/directory -type d -exec bash -c 'echo $(tr -cd / <<< "$1"|wc -c):$1' -- {} \;  | sort -n | tail -n 1 | awk -F: '{print $1, $2}'

A simpler command which just prints out the maximum depth of the directory tree below the current directory and reduces the need for several Linux utility programs. The -type d test keeps files from inflating the result, since a file in the deepest directory would otherwise report one level more.

find . -type d -printf '%d\n' | sort -n  | tail -1

5
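For example, a freshly created three-level tree reports a depth of 3 (restricting to directories with -type d so files do not inflate the count):

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p a/b/c

# %d prints each directory's depth relative to the starting point;
# sorting numerically and taking the last line gives the maximum.
find . -type d -printf '%d\n' | sort -n | tail -1
# prints: 3
```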

There is also this command which will display how many directories are at each depth below the current directory. On each line, the first number is the quantity and the second number is the depth. The -mindepth 1 option excludes the starting directory itself, which would otherwise be reported at depth 0.

find . -mindepth 1 -type d | awk -F'/' '{print NF - 1}' | sort | uniq -c | sort -nr

      6 2
      4 3
      4 1
      3 4
      1 5

Test tree :

.
|-- a.1
|   `-- a.2
|       `-- a.3
|-- b.1
|   `-- b.2
|-- c.1
`-- d.1
    |-- d.2
    |   `-- d.3
    |       `-- d.4
    |-- d.2.a
    |-- d.2.b
    |   `-- d.3.b
    |       `-- d.4.b
    `-- d.2.c
        `-- d.3.c
            `-- d.4.c
                |-- d.5.c
                `-- d.5.c.1

Reason For Using SHA256 Check Sum

The SHA256 Check Sum provides better detection of errors than the MD5 Check Sum.

The SHA 256 Check Sum provides better tamper detection.

Reason For Archiving Website Directories

Due to the limited directory depth available on optical media such as CD, DVD and Blu-Ray, it is usually not possible to burn a website directly to these media.

Due to the limited file name support on optical media such as CD, DVD and Blu-Ray, it is usually not possible to burn a website directly to these media.

Due to the flexibility of the tar archive program, the website directory can easily be archived with all of its filenames, permissions and directories intact.

Tar archives can also be mounted as read only file systems on Linux.

Needless to say, archives also occupy less space than the original files due to compression that is applied when creating the archive file.

Be wary when making a direct backup of the website files to optical media as many disk burning programs will try to be helpful by modifying file / directory names and lopping off any directories beyond the allowed depth. This is not helpful when it comes time to recover your data from the archive.

Limitations On Optical Media - CD, DVD, Blu-Ray

Some information from Wikipedia :

Directory depth limit - ISO 9660

The restrictions on filename length (8 characters plus 3 character extension for level 1) and directory depth (8 levels, including the root directory) are a more serious limitation of the ISO 9660 file system.

Many CD authoring applications attempt to get around the filename length by truncating filenames automatically, but do so at the risk of breaking applications that rely on a specific file structure.

The Rock Ridge extension works around the 8 directory depth limit by folding paths.

In practice however, few drivers and OSes care about the directory depth, so this rule is often ignored.

ISO Level 2 allows longer names, but since the length of the complete directory entry is stored in a single byte, the entry is limited to 255 bytes. This puts an upper limit on the filename of just over 200 characters, depending on what directory extensions are used.

Limit on number of directories - ISO 9660

Another limitation, less well known, is the number of directories.

The ISO image has a structure called "path table". For each directory in the image, the path table provides the identifier of its parent directory. The problem is that the directory identifier is a 16-bit number, limiting its range from 1 to 65,535. The content of each directory is written also in a different place, making the path table redundant, and suitable only for fast searching. Some operating systems (e.g., Windows) use it, while others (e.g., Linux) do not.

If an ISO image or disk consists of more than 65,535 directories, it will be readable in Linux, while in early Windows versions all files from the additional directories will be visible, but show up as empty (zero length). Current Windows versions appear to handle this correctly.

A popular application using ISO format, mkisofs, aborts if there is a path table overflow. Nero Burning ROM (for Windows) and Pinnacle Instant CD/DVD do not check whether the problem occurs, and will produce an invalid ISO file or disk without warning. Also, isovfy cannot easily report this problem.

This is the only place in the ISO format where a 16-bit number is used, causing such limitations.

Rock Ridge Interchange Protocol RRIP Extensions to ISO 9660

The RRIP extensions are, briefly:

  • Longer file names (up to 255 bytes) and fewer restrictions on allowed characters (support for lowercase, etc.)
  • UNIX-style file modes, user ids and group ids, and file timestamps
  • Support for Symbolic links and device files
  • Deeper directory hierarchy (more than 8 levels)
  • Efficient storage of sparse files

Joliet File System

Joliet File System is an extension to the ISO 9660 files system that is mainly aimed at extending the length of file names.

The specification only allows filenames to be up to 64 Unicode characters in length. However, the documentation for genisoimage states filenames up to 103 characters in length do not appear to cause problems.

Universal Disk Format UDF

Introduced in 1995, it is replacing ISO 9660 for DVDs and newer optical disc formats.

There are multiple versions of UDF implemented. Care must be taken in choosing one that is compatible with the platform that is being targeted. See the Wikipedia page noted above for a table that has some compatibility information. The table looks a little dated as of this writing.

Universal Disk Format (UDF) is a profile of the specification known as ISO/IEC 13346 and ECMA-167 and is an open vendor-neutral file system for computer data storage for a broad range of media.

  • Max. file size 16 EB
  • Max. filename length 255 bytes (path 1023 bytes)
  • Max. volume size ?
  • Allowed characters in filenames Any Unicode except NUL
  • Dates recorded creation, archive, modification (mtime), attribute modification (ctime), access (atime)
  • File system permissions POSIX

Some Collected Figures

  • A general rule when creating a data CD using the Joliet file system is:
    • File naming = 64 characters (max)
    • Directory structure = 8 levels deep (max)
    • Total characters = 128 (max) path+filename
  • ISO 9660:
    • File name = 8.3 or 8.3 MSDOS or 30 characters if Advanced Option is used
    • Directory name = same applies
    • Total characters = 128 (max) path+filename (not clear on this but it seems about right)
  • Here is the "actual" UDF FS description:
    • UDF Directory Limits
      • Directory Size — 2^64-1 bytes
      • Directory Depth — unlimited
      • Sub-directories per Directory — 2^16-1 sub-directories
      • Directory Name — 256 bytes
    • UDF File Limits
      • Logical Block Size — 2^32-1 partitions * Maximum Partition Size
      • File Extent Size — 2^30-1 bytes
      • File Size — 2^64-1 bytes
      • File Name — 256 bytes

    And here is a description of NTFS file/path names from Microsoft:

    In the Windows API, the maximum length for a path is MAX_PATH, which is defined as 260 characters. A path is structured as follows: drive letter, colon, backslash, components separated by backslashes, and a null-terminating character. For example, the maximum path on the D drive is D:\<256 chars>NUL.

ext4 Linux File System

This is for comparison with the Optical Storage capabilities. This is the environment that the web site directory trees are normally stored in on the httpd web server.

  • In ext3 a directory can have at most 32,000 subdirectories. In ext4 this limit increased to 64,000.
  • Max. filename length : 255 bytes (characters)
  • Max. volume size : 1 EiB
  • Allowed characters in filenames : All bytes except NUL ('\0') and '/'
  • Max. file size : 16 TiB (for 4k block filesystem)
  • Max. number of files : 4 billion (specified at filesystem creation time)
  • Dates recorded : modification (mtime), attribute modification (ctime), access (atime), delete (dtime), create (crtime)

Note On Units

  • exbibyte (EiB) = 2^60 = 1,152,921,504,606,846,976 Bytes
  • exabyte (EB) = 10^18 = 1,000,000,000,000,000,000 Bytes
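Both figures can be confirmed with shell arithmetic, which is 64-bit in bash:

```shell
# 1 EiB: 2 to the 60th power, via a left shift.
echo $((1 << 60))
# prints: 1152921504606846976

# 1 EB: 10 to the 18th, written as 10^9 times 10^9 to stay within POSIX arithmetic.
echo $((1000000000 * 1000000000))
# prints: 1000000000000000000
```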

Count The Number Of Lines In Each File In A Directory (Individually)

[me@cherry sample.data]$ find . -type f | xargs wc -l | sort -k 2
  507 ./201108150800.txt
  507 ./201108150840.txt
  117 ./201108150848.txt
   63 ./201108151800.txt
   67 ./201108152000.txt
 1261 total

Count the number of lines in files in a directory tree

Except in the directories under "./public", "./modules" and "./templates".

Reference : how-can-i-count-the-number-of-lines-in-my-all-my-files-in-this-directory

 find . -type f | grep -v "^./public" | grep -v "^./modules" | grep -v "^./templates" | xargs cat | wc -l
  1. make a list of all files under current directory with find . -type f
  2. filter out files from "exclude" dirs with grep -v
  3. xargs will read the list of files from stdin and pass all of the files as arguments to cat.
  4. cat will print all files to stdout
  5. wc will count lines.
If you want to count lines in every file individually, change xargs cat |wc -l to xargs wc -l

Looking for Files NOT owned by a specific user

Reference Page : stackoverflow -- Looking for Files NOT owned by a specific user

Recursively look through directories to find files NOT owned by a particular user.

The find(1) utility has primaries that can be negated ("reversed") using the "!" operator. On the prompt one must however escape the negation with a backslash as it is a shell metacharacter.

find . \! -user foo -print

Alternately :

find . \! -user root | xargs ls -al
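As a quick sanity check, in a scratch directory where everything was just created by the current user, the negated test matches nothing (id -un supplies the current user name; the backslash before ! is only needed where the shell treats ! specially):

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch mine.txt

# All entries here belong to the current user, so this prints nothing.
find . ! -user "$(id -un)"
```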

Another suggestion was :

Piping the output to xargs -I{} -P3 -- ${cmdhere} {} can have ${cmdhere} operate on each file in parallel, where ${cmdhere} is a standard Unix utility such as chmod, chown, stat or ls.

But that has not been tested by me, yet.

Some Examples From About.com

Reference Page : About.com -- Linux / Unix Command: find

EXAMPLES

find /home -user joe

Find every file under the directory /home owned by the user joe.

find /usr -name '*stat'

Find every file under the directory /usr whose name ends in "stat". The pattern is quoted so that the shell does not expand it before find sees it.

find /var/spool -mtime +60

Find every file under the directory /var/spool that was modified more than 60 days ago.

find /tmp -name core -type f -print | xargs /bin/rm -f

Find files named core in or below the directory /tmp and delete them. Note that this will work incorrectly if there are any filenames containing newlines, single or double quotes, or spaces.

find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f

Find files named core in or below the directory /tmp and delete them, processing filenames in such a way that file or directory names containing single or double quotes, spaces or newlines are correctly handled. The -name test comes before the -type test in order to avoid having to call stat(2) on every file.

find . -type f -exec file '{}' \;

Runs `file' on every file in or below the current directory. Notice that the braces are enclosed in single quote marks to protect them from interpretation as shell script punctuation. The semicolon is similarly protected by the use of a backslash, though ';' could have been used in that case also.

find /       \( -perm -4000 -fprintf /root/suid.txt '%#m %u %p\n' \) , \
             \( -size +100M -fprintf /root/big.txt  '%-10s %p\n' \)

Traverse the filesystem just once, listing setuid files and directories into /root/suid.txt and large files into /root/big.txt.

find $HOME  -mtime 0

Search for files in your home directory which have been modified in the last twenty-four hours. This command works this way because the time since each file was last modified is divided by 24 hours and any remainder is discarded. That means that to match -mtime 0, a file will have to have a modification in the past which is less than 24 hours ago.

find . -perm 664

Search for files which have read and write permission for their owner, and group, but which other users can read but not write to. Files which meet these criteria but have other permissions bits set (for example if someone can execute the file) will not be matched.

find . -perm -664

Search for files which have read and write permission for their owner and group, and which other users can read, without regard to the presence of any extra permission bits (for example the executable bit). This will match a file which has mode 0777, for example.

find . -perm /222

Search for files which are writable by somebody (their owner, or their group, or anybody else).

find . -perm /220
find . -perm /u+w,g+w
find . -perm /u=w,g=w

All three of these commands do the same thing, but the first one uses the octal representation of the file mode, and the other two use the symbolic form. These commands all search for files which are writable by either their owner or their group. The files don't have to be writable by both the owner and group to be matched; either will do.

find . -perm -220
find . -perm -g+w,u+w

Both these commands do the same thing; search for files which are writable by both their owner and their group.

find . -perm -444 -perm /222 ! -perm /111
find . -perm -a+r -perm /a+w ! -perm /a+x

These two commands both search for files that are readable for everybody (-perm -444 or -perm -a+r), have at least one write bit set (-perm /222 or -perm /a+w), but are not executable for anybody (! -perm /111 and ! -perm /a+x respectively).
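A small illustration of the exact, - and / forms with two invented files:

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch exact.txt extra.txt
chmod 664 exact.txt    # rw-rw-r--
chmod 764 extra.txt    # rwxrw-r-- : the 664 bits plus owner-execute

find . -type f -perm 664     # exact match: only ./exact.txt
find . -type f -perm -664    # all of the 664 bits set: both files
find . -type f -perm /111    # any execute bit set: only ./extra.txt
```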

Count The Number Of Files In Each Directory Under A Directory Tree

This version reports the number of entries (files and subdirectories, but not dot files) in each subdirectory separately and does not give a grand total.

find . -type d -print0 | while read -d '' -r dir; do files=("$dir"/*); printf "%5d files in directory %s\n" "${#files[@]}" "$dir"; done
find put.top.directory.path.here -type d -print0 | while read -d '' -r dir; do files=("$dir"/*); printf "%5d files in directory %s\n" "${#files[@]}" "$dir"; done

How Do I Count All The Files Recursively Through Directories

Reference : how-do-i-count-all-the-files-recursively-through-directories

find . -maxdepth 1 -type d | while read -r dir; do printf "%s:\t" "$dir"; find "$dir" -type f | wc -l; done
Note:

Replace spaces with underscores :

find . -print0 | xargs -0 /usr/local/bin/prename "s/\s+/_/g"

It appears that directory renaming is getting ahead of file renaming
but the glob of directory names still has the old version of the
directory name...

Rename directories first and then files?

Also have to add the ’ character to the set of quote characters to be removed.

Apply Perl rename Regex To All Files In Directory Tree

find . -type f -print0 | xargs -0 /usr/local/bin/prename "s/\s+/_/g"

find . -type f -print0 | xargs -0 /usr/local/bin/prename "s/#//g"

find . -type f -print0 | xargs -0 /usr/local/bin/prename "s/\,//g"

find . -type f -print0 | xargs -0 /usr/local/bin/prename "s/[\(\)]//g"

find . -type f -print0 | xargs -0 /usr/local/bin/prename "s/&/and/g"

find . -type f -print0 | xargs -0 /usr/local/bin/prename "s/[\'\"\’]//g"

find . -type f -print0 | xargs -0 /usr/local/bin/prename "s/[Cc]\+\+/Cpp/g"
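When the Perl rename utility (the /usr/local/bin/prename path above is local to this system) is not installed, the space-to-underscore case can be sketched in plain bash. Note that ${f// /_} turns each space into one underscore, whereas s/\s+/_/g collapses a run of whitespace into a single underscore:

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch "my notes.txt"

# Rename every regular file whose name contains a space.
find . -type f -name '* *' -print0 |
while IFS= read -r -d '' f; do
    mv -- "$f" "${f// /_}"
done

ls
# prints: my_notes.txt
```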

find ".c" and ".h" files in directory tree

find . -name '*.[ch]' -type f 

*.[ch] represents all *.h and *.c files

-type f finds regular files

note : the "." does not need to be escaped here because -name takes a shell glob pattern, not a regular expression, so the "." matches a literal period.
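A quick check with made-up file names:

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch main.c util.h notes.txt

# [ch] is a glob character class: a single c or h after the literal dot.
find . -name '*.[ch]' -type f | sort
# prints:
# ./main.c
# ./util.h
```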