Disk space

From CsWiki

Storage Policy

Backups

It is important to understand that none of our main file systems are backed up to other devices. This means that in case of a hardware failure, data will be lost.

Snapshots

Some file systems have snapshots, which allow the file system to be rolled back to a certain point in time. Snapshots reside on the same file system as the data itself. CSE storage systems have RAID protection that guards against a single disk failure, but because there is no backup, data will be lost if multiple disks or an entire disk shelf fail.

The main storage systems are:

Home directories: These directories (/cs/usr/login, accessible as "~") reside on a highly capable file server with snapshots. They have the lowest chance of failure but are relatively limited in space.
Storage for labs: These are directories that reside under /cs/labs. They provide snapshots and are the volume of choice for large amounts of data. Some labs have directories under /cs/snapless, which are intended for large volumes of volatile data and therefore have no snapshots. Lab users who intend to run I/O-intensive jobs should consult the CSE System Group first.

Lab storage limits

Each faculty member is allocated up to 2 terabytes of space, which should be enough for most users. Anyone requiring more space will be asked to pay a yearly fee that covers the cost of the additional storage.

Other filesystems can have varying snapshot policies. Contact the CSE System Group if you want to better understand their status.

Data Sets

Many labs have large amounts of data that are shared with other labs. To avoid each lab downloading such data to its own lab directory, which would leave multiple copies of the same data in different labs, the repository /cs/dataset was created. Labs that need to download such shared data sets should send a request to system@cs.huji.ac.il with the links and any passwords needed to download the data.

Additional recommendations

To store code safely, we suggest using our GitHub service. The GitHub virtual machine and its data are backed up on two different file systems, which provides a high level of resilience.

Another easy backup option is the unlimited storage that Google provides with your HUJI mail account. Some students and faculty members use this option and are pleased with the results. Please note that these backups are solely the responsibility of the user and are in no way supported by the CSE System Group.

Snapshots do not perform well under heavy write loads. If your lab needs to write a lot of data or perform heavy reads, please contact us for further assistance.

Disk Quotas

Users are allocated disk quota as follows:

Account type          Group     Home quota
undergraduate         stud      6G
graduate              grad      7G
doctoral              phd       9G
post-doctoral         pdoc      35G
visiting lecturer     visitor   50G
faculty               staff     25G
guest                 guest     6G
external graduate     cgrad     7G
external phd          cphd      9G

To see how much of your quota you are using, type nquota. Note that the data shown by nquota is updated only once every few minutes.
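The figures from nquota can lag by a few minutes, so right after a cleanup it can help to measure usage directly. The sketch below rests on two assumptions: that the "G" in the table above means GiB, and that the total reported by du is a reasonable approximation of what the quota system counts. quota_usage_percent is a hypothetical helper name, not an installed CSE tool, and it rounds the result down to a whole percent.

```shell
# Hypothetical helper: estimate what percentage of a quota DIR is using.
# ASSUMPTION: QUOTA_GB is in GiB (1G = 1024*1024 KiB). du's total is only an
# approximation of what the quota system counts.
quota_usage_percent() {
    dir=$1
    quota_gb=$2
    # du -sk prints "<KiB><TAB><path>"; keep only the number.
    used_kb=$(du -sk -- "$dir" 2>/dev/null | cut -f1)
    # Fail cleanly if du could not measure DIR (e.g. it does not exist).
    [ -n "$used_kb" ] || return 1
    quota_kb=$((quota_gb * 1024 * 1024))
    # Integer arithmetic: the result is rounded down.
    echo $((used_kb * 100 / quota_kb))
}
```

For example, a graduate account (7G) would run quota_usage_percent ~ 7.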

Snapshots

Snapshots are short-term online backups. There are three types of snapshots: hourly, daily and weekly. Snapshots rotate. For example, if N hourly snapshots are defined for a file system, then each time a new one is created, the oldest one (which would now be the (N+1)th) is permanently deleted. Note that "online" means that the snapshots are part of the file system itself. If a file system becomes unavailable due to a file server failure, its snapshots are unavailable as well.

Snapshots are available in the special .snapshot directory. You can access them with the snapshot utility or directly (e.g. "cd .snapshot"). Note that the .snapshot directory does not appear in directory listings (even "ls -a" won't show it), but it exists and you can "cd" into it.
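To find earlier versions of a single file without browsing snapshots by hand, you can loop over the snapshot directories. This minimal sketch assumes that each directory has its own hidden .snapshot subdirectory containing one subdirectory per snapshot. Snapshot names such as hourly.0 are only illustrative. snapshot_versions is a hypothetical helper; it is not the snapshot utility mentioned above.

```shell
# Hypothetical helper: print the path of every snapshot copy of FILE.
# ASSUMPTION: FILE's directory contains a hidden .snapshot/ directory with one
# subdirectory per snapshot. Returns 1 if no snapshot copy is found.
snapshot_versions() {
    file=$1
    dir=$(dirname -- "$file")
    base=$(basename -- "$file")
    rc=1
    for snap in "$dir"/.snapshot/*/; do
        # If the glob matched nothing, $snap is the literal pattern; skip it.
        [ -d "$snap" ] || continue
        if [ -e "$snap$base" ]; then
            printf '%s\n' "$snap$base"
            rc=0
        fi
    done
    return "$rc"
}
```

For example, snapshot_versions ~/thesis/draft.tex prints the path of each snapshot copy. Restore the one you want with cp -p under a new name, so that the current file is not overwritten.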

The current snapshot schedule is as follows:

File system type    Path                          Hourly   Daily   Weekly
Home                ~                             6        14      4
Graduate Labs       /cs/labs/supervisor/login     24       7       4

Disk space shortage

Logging in without quota

If there is insufficient disk space, some window managers might not start at all. When this happens, you will receive a message about insufficient quota, or an error message saying that the Xsession terminated too quickly. In this case, to be able to log in and clean up your quota, you will have to log in using xterm.

There are two ways to log in with xterm:

  1. When the chooser appears, choose XTerm instead of the normal window manager you use.
  2. Before logging in (i.e. before entering your username and password), click the Session button and choose Failsafe Terminal.

After you've logged in, you'll get a simple xterm window. There you can start cleaning up your quota as described in the following sections. When finished, type exit and log in again.

Note: Without a window manager, you'll have to place the mouse cursor over the xterm window to type commands into it.

Basic cleanup

Some basic utilities to manage disk space:

  1. Use the cleanup script to remove files that can be safely deleted:
    cleanup
  2. Use du to find how space is distributed within your home directory:
    du ~ | sort -rn
    Note that this will also show you the space taken by files beginning with '.' which do not normally show in directory listings.
    A more elaborate usage of the du utility is:
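    find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 du -sckx -- | sort -n
    This prints the size (in KiB) of every file and directory in the current directory, including hidden ones, sorted by size, with the total appearing last. Using -print0 with xargs -0 handles file names containing spaces or quotes.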
  3. Graduate students can avoid creating large dot directories in their home directory by installing software in their /cs/labs directory using a virtual environment, e.g.:
    virtualenv /cs/labs/einstein/shainde/venv-01
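Item 3 above is about keeping large dot directories out of your home quota. To see which hidden entries actually take up the space, you can combine find and du. This is a minimal sketch. largest_dotdirs is a hypothetical helper name, sizes are in KiB, and only entries directly under the given directory are measured.

```shell
# Hypothetical helper: list the N largest hidden entries directly under DIR.
# Sizes are in KiB (du -k). Hidden entries are included even though ls hides them.
largest_dotdirs() {
    dir=${1:-$HOME}
    n=${2:-10}
    # -mindepth 1 skips DIR itself; -name '.*' keeps only hidden entries.
    find "$dir" -mindepth 1 -maxdepth 1 -name '.*' -exec du -sk {} + |
        sort -rn | head -n "$n"
}
```

For example, largest_dotdirs ~ 5 shows the five largest hidden entries in your home directory. These are good candidates for moving to lab storage.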

Other Tips for freeing some disk space

  • Compress directories and files that aren't used regularly, using the command:
     tar cvzf <archive file> <list of files>
    and then remove the original files and directories:
    rm -r <list of files>
    or do both in one command (GNU tar):
    tar --remove-files -cvzf <archive file> <list of files>
    To decompress those files later, use the command:
    tar xvzf <archive file>
  • If you want to view the disk-space used by a specific directory (and its descendants) type:
     du -sh <directory>
  • Graduate students should move large dot directories to their lab storage and leave symbolic links in their place, e.g.:
    for d in .cache .config .local .ntprofile.V2; do
        # Create the link only if the move succeeded.
        mv ~/"$d" /cs/labs/supervisor/login/ && ln -s /cs/labs/supervisor/login/"$d" ~/"$d"
    done
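The compress-and-remove tip above can be made a little safer: delete the originals only after the archive has been written and then read back successfully. This is a minimal sketch that uses only standard tar options. archive_then_remove is a hypothetical helper name, not an installed CSE utility.

```shell
# Hypothetical helper: archive FILE..., verify the archive, then delete the originals.
archive_then_remove() {
    archive=$1
    shift
    [ "$#" -gt 0 ] || { echo "usage: archive_then_remove ARCHIVE FILE..." >&2; return 2; }
    # Refuse to overwrite an existing archive.
    if [ -e "$archive" ]; then
        echo "$archive already exists" >&2
        return 1
    fi
    tar -czf "$archive" -- "$@" || return 1
    # Listing the whole archive decompresses it, which catches truncation or corruption.
    tar -tzf "$archive" > /dev/null || return 1
    rm -rf -- "$@"
}
```

For example, cd ~ && archive_then_remove old-project.tar.gz old-project replaces the old-project directory with a compressed archive.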