*nix: find the largest files/directories within a directory

An article, posted more than 6 years ago, filed in how i do it, unix, command line, terminal, sort, linux, macos & osx.

Every now and then I find myself searching my notes in Notational Velocity (or, currently, a fork of it) for this little snippet:

du -hsx * | sort -rh | head -100

It’s a variation of a snippet I found somewhere, but I had hardly invested any time in understanding what it actually does. Let’s decompose it from head to tail, i.e. from head back to du.

head

head -100

head simply limits the output to its first 100 lines. Not much more to explain here.
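A quick way to see the limit at work, as a sketch: -100 is the older shorthand that GNU and BSD head still accept, while POSIX spells the same option -n 100, which is what this uses. The 250 input lines are arbitrary.

```shell
#!/bin/sh
# 250 numbered lines go in; head keeps only the first 100.
seq 1 250 | head -n 100 | wc -l      # counts 100 lines
seq 1 250 | head -n 100 | tail -n 1  # the last line that survives is 100
```

To see only the ten largest entries instead, the snippet becomes du -hsx * | sort -rh | head -n 10.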

sort

sort sorts. By default it compares whole lines as plain text, but adding ‘-h’ makes it sort by “human-readable numbers” (e.g. 5M > 6K); with ‘-n’ instead, only the numbers themselves are compared, so 6K would end up > 5M. The ‘-r’ option reverses the sort order, which is ascending by default.
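To see the difference between ‘-n’ and ‘-h’ without running du at all, here is a small sketch that feeds three made-up sizes straight into sort, reversed as in the snippet:

```shell
#!/bin/sh
# Three made-up sizes, one per line.
printf '5M\n6K\n1G\n' | sort -rn   # 6K, 5M, 1G: only 6 > 5 > 1 is compared
printf '5M\n6K\n1G\n' | sort -rh   # 1G, 5M, 6K: the K/M/G suffixes count too
```

With ‘-rn’ the 6K entry would top the list, which is exactly the mistake ‘-h’ avoids.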

du

du by default walks a directory recursively and reports the disk usage of every subdirectory in it. Passing ‘-s’ tells it to print just one summarized total per argument; since the shell expands * to every (non-hidden) entry in the current directory, that gives one line per file or directory. The ‘-x’ option keeps it from crossing into other file systems (useful when you want to discover what is filling up this particular file system), and ‘-h’ makes the sizes human readable (so you don’t get overly focused on a file that is just a megabyte in size).
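A sketch to watch ‘-s’ at work on a throwaway tree (the directory names big and small are made up). It relies on head -c, which both GNU and macOS head provide; random bytes are used so that a compressing file system can’t shrink the test files away.

```shell
#!/bin/sh
# Build a throwaway tree with made-up names.
demo=$(mktemp -d)
mkdir -p "$demo/big/nested" "$demo/small"
head -c 2097152 /dev/urandom > "$demo/big/nested/blob"  # about 2 MB
head -c 4096 /dev/urandom > "$demo/small/note"          # about 4 KB

(
  cd "$demo" || exit 1
  echo "du -h . : one line per directory, recursing into big/nested"
  du -h .
  echo "du -hs * : one summarized line per argument"
  du -hs *
  echo "the full snippet: largest entry first"
  du -hsx * | sort -rh | head -100
)

rm -rf "$demo"  # clean up the throwaway tree
```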

Powered by |

The pipe character (‘|’) takes the output of the previous command and passes it on to the following command (instead of displaying it).
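To make the hand-off visible, this sketch runs the same kind of chain on three made-up du-style lines (size, a tab, a name), once with pipes and once with temporary files standing in for them. Both print the same two lines; the piped version simply skips the intermediate files and lets the commands run side by side.

```shell
#!/bin/sh
# Piped: printf's output feeds sort, and sort's output feeds head.
printf '5M\tphotos\n6K\tnotes\n1G\tdocker\n' | sort -rh | head -2

# The same hand-offs spelled out with temporary files instead of pipes.
tmp=$(mktemp)
printf '5M\tphotos\n6K\tnotes\n1G\tdocker\n' > "$tmp"
sort -rh "$tmp" > "$tmp.sorted"
head -2 "$tmp.sorted"
rm -f "$tmp" "$tmp.sorted"
```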

Bonus: Where I typically find insanely large files

Ubuntu

  • Check your backup folders
  • Check your log files (don’t forget your application logs)
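On Ubuntu the snippet works just as well with an explicit path. A sketch for the log side, assuming the system logs under /var/log; backup folders and application logs live wherever your own setup puts them, so swap in those paths as needed.

```shell
#!/bin/sh
# /var/log holds Ubuntu's system logs. Some files there are readable by root only,
# so prefix du with sudo to include them; here the permission errors are just hidden.
du -hsx /var/log/* 2>/dev/null | sort -rh | head -20
```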

MacOS

Well, the Photos library alone already takes up 180GB worth of photos, some of which could be deleted.

Make sure to check ~/Library/com.docker.docker if you are a developer … I sometimes drop the entire directory, as I only use Docker for development support. ~/.rbenv is also often overlooked as a container of years of outdated gems.
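One catch when running the snippet in your home directory: * does not match names that start with a dot, so ~/.rbenv never shows up in the list. A sketch that adds hidden entries with the .[!.]* pattern (it skips . and .., and also the rare name that starts with two dots):

```shell
#!/bin/sh
# Run from the home directory, where ~/.rbenv and friends live.
cd ~ || exit 1
# * skips dot-names; .[!.]* adds them back.
# 2>/dev/null hides errors, e.g. permission denials or a pattern that matches nothing.
du -hsx * .[!.]* 2>/dev/null | sort -rh | head -100
```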

###

