


1. Getting started
2. First steps in CHARMM
3. Manipulating structures
4. Energy minimization
5. Introduction to molecular dynamics
6. Molecular dynamics of homogeneous systems
7. Molecular dynamics of solvated biomolecules
8. Computing NMR properties
9. Distance restraints


CHARMM Tutorial


1. Getting started

1.1 Introduction

Scope and goals of the course (Übungen)

The course is intended to familiarize you with important concepts of biomolecular simulation, energy minimization and molecular dynamics. This is done by carrying out practical examples using the simulation program CHARMM. The exercises may also serve as a tutorial on how to use CHARMM, though the emphasis is on the principles of the methods, not the details of the functionality provided by the program. Topics were selected according to the principle "less is more". Thus, the course does not cover every aspect of CHARMM, nor does it introduce you to all of its capabilities. Instead, the knowledge you will hopefully have acquired after working through the examples and exercises should enable you to use CHARMM (or another biomolecular simulation package) intelligently in your own work, relying on the (terse) CHARMM documentation and the testcases that come with the (academic) distribution of the program for further help.

The accompanying lecture deals in some detail with the principles of biomolecular NMR-spectroscopy and how techniques of biomolecular simulation are related to and used in structure determination of biological macromolecules by NMR. In the course (Übungen), however, the emphasis will be on simulation methods.

Why CHARMM?

Short answer: It's the program I know best.

Long answer: There are a number of reasons that make CHARMM very suitable for use in an introductory course (which is not to say that it is not equally well suited for research applications!). It is one single program, making it easier to use and to learn than a suite of smaller programs that often have to be "forced" to interact properly with each other. The user interface is fairly uniform; in addition, the (academic) version of CHARMM is quite up-to-date with respect to the latest methodological developments. Obviously, for a number of tasks there are programs which handle a particular problem much more elegantly or are easier to use. In real life, one might, therefore, prefer program XYZ to CHARMM in such instances. However, in the context of this course we decided that it is best to use one program for all applications rather than making you (re)learn several programs.

The proper reference for CHARMM (Chemistry at HARvard Molecular Mechanics) is: B. R. Brooks et al. J. Comput. Chem. 1983, 4, 187-217. Currently, a new version of CHARMM, containing bug-fixes and new features, is released approximately twice a year. To obtain CHARMM, one has to contact Prof. Martin Karplus (e-mail: marci@tammy.harvard.edu or marci@brel.u-strasbg.fr).

With respect to computer graphics ...

"Less is more" was also the reason why no introduction to a high-powered graphics program is given in this course. This was a somewhat problematic decision to make since experimental structure determination requires the use of a good graphics package. However, WHATIF, the package used most in our group, has too steep a learning curve; we would have to spend a few days just getting acquainted with the program. Thus, we shall rely on the very primitive built-in graphics of CHARMM to look at a structure when this is more instructive than a bunch of numbers.

However, if you are already familiar with one of the graphics programs used in this group (WHATIF, RasMol, Insight, MolMol) then the parts requiring graphics can of course also be done using this program.

This manual

tries to provide a link between the theoretical concepts taught in the main lecture (Vorlesung) and the documentation that comes with CHARMM. It assumes that you are familiar with basic concepts of structural molecular biology, e.g., that you know a bit about the properties of the peptide bond or that you know something about the properties of amino acids. (What you find in any introductory textbook of biochemistry or molecular biology on the subject is more than enough!) Similarly, this is not the place to explain the theory behind the methods; however, practical ramifications are mentioned. For example, it is not derived why a molecular dynamics simulation corresponds to the microcanonical ensemble of statistical mechanics; however, it is emphasized that one can use conservation of energy to gauge the quality of the computation. The CHARMM documentation explains most commands in reasonable detail; however, it is intended as a reference and not as an introduction. This guide, therefore, attempts to fill the role of a tutorial. In future, improved versions of this document, proper references and suggestions for further reading should be added. In particular, some examples are based on the old CHARMM course outline and/or the testcases that come with the program.

A final remark

I hope that you will find the examples and exercises interesting and instructive. Remember that there are no stupid questions --- so do ask if anything is unclear.

1.2 Introduction to Unix and some useful utilities

This section is a crash course on those elements and commands of Unix which one needs to work with CHARMM. Here is a brief overview of what follows and why it is useful. (i) CHARMM is not intended for interactive use (as opposed to, e.g., Word); instead, one prepares a command file, a script, which tells CHARMM what operations to perform. Thus, one has to know how to manipulate files ("Dateien"), in particular, how to copy (cp), rename (mv) and organize (mkdir, cd, rm, ln, mv) them. In addition, for the exercises you regularly have to modify (edit) the example input files to achieve the desired results. My text editor of choice is emacs, which for our purposes also has the advantage of offering an interactive tutorial. A text editor is the more powerful equivalent of the Notepad tool under Windows. (If you are already familiar with a different editor (vi, nedit, pico, ...) and if we have it installed on our machines, USE IT!) (ii) CHARMM produces a lot of output, but sometimes only a very small subset of the reported results is of interest. That's what the advanced Unix utilities grep and awk are made for. (iii) I will want to look at your results. Please send them to me by e-mail. Unless you are familiar with one of the many mail utilities under Unix, I recommend that you use the mail facilities under emacs that can be invoked from the menubar. (iv) Finally, one needs to plot data. For this we shall use gnuplot; its use will be explained when it's actually needed.

File system concepts and commands

After logging into the system, there should be at least one command or terminal window. Make sure that the cursor is over it (or click on it); then you are ready to type commands in this window. Depending on the computer you work on, the lines on which you can type commands contain a % or $ sign (plus possibly some other text, like the machine name and/or your username). Whenever the system (or rather this command window) is ready to accept input from you, it will display this so-called prompt character, i.e., the % or the $. Don't type if the prompt does not reappear after a command; this is the case, for example, when you start an editor or netscape. The best way to start netscape (which you may be using to read this documentation) is to type


netscape &

which (because of the & ending the command line) puts the command "into the background" and returns the command prompt to you. Please refer to any (introductory) book on Unix for further details or ask.
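The effect of the trailing & can be tried out with any command that takes a moment to finish. The following is a minimal sketch, assuming a Bourne-type shell such as sh or bash (the $ prompt); sleep merely stands in for a long-running program like netscape, and the variable names are made up for illustration.

```shell
# Start a slow command in the background; the shell does not wait for it.
sleep 1 &
bg_pid=$!        # $! holds the process ID of the most recent background job
echo "started background job with PID $bg_pid"

# The shell accepts further commands while the job is still running.
echo "the shell is still usable"

# wait blocks until the background job has finished and returns its exit status.
wait "$bg_pid"
bg_status=$?
echo "background job finished with exit status $bg_status"
```

Interactively you would rarely call wait; it is used here only so that the example ends in a defined state.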

This section contains a brief introduction to the Unix directory tree, as well as the pwd, mkdir, rmdir, cd, cp, mv, ln, rm commands and simple wildcard operations. Please refer in the following to the diagram handed out. Unix files ("Dateien") are organized hierarchically in directories ("Verzeichnisse"). Programs, such as CHARMM, a text editor, or a word processor; a list of data, a text file, a word-processor document (e.g., the file from which this guide was printed); computer code in any programming language --- all these (and more) are files. A directory is best thought of as a "container" of files; usually one puts related files in a directory. For example, many people put their word-processor documents (files) in one directory and their programs in a separate directory. A directory can contain other directories, which are then often called sub-directories. This immediately leads to the hierarchy mentioned above. All commands introduced in this section have to do with changing between directories, with moving and copying files within a directory or between directories, as well as with deleting files. Within a directory, a file can be referred to simply by its name. To refer to a file in a different directory, you have to specify the so-called path ("Pfad") in addition to the filename. The path makes clear the relation of one directory to another. Before these concepts lead to total confusion, let's try out a few things that hopefully will make them clear.

Your immediate starting point is your so-called home directory. It is where you are now. To get more information, type pwd and hit the enter (return) key ("Eingabetaste"). Depending on the machine, you will see something like /usr/people/<uname> or /home/<uname>, where <uname> is your account name, i.e., the name (NOT the password!) you have to type at the login prompt. The command name pwd is the abbreviation of print working directory. Try to type Pwd or pwD or PWD. It won't work. Unix commands are case sensitive, and so are file and directory names! The true starting point of the directory hierarchy under Unix is the directory /, the so-called root directory. A working directory name of /usr/people/<uname> implies that there is a directory usr in the / directory; in usr there is a directory people, and in the directory people there is a directory <uname>. The slashes ("/") serve to separate levels (hierarchies) of directories. Your home directory is the place where you will always find yourself after login. If you ever get "lost" in a directory hierarchy, you can return to the home directory by typing cd. We shall return to the cd command shortly.

Next, let's see whether there is anything in your home directory. To list your files and directories, type ls (followed by enter/return). You should see something like:


ls
datadir unit1 unit1-examples unit3 unit5 unit7 unit9 
pdb unit2 unit4 unit6 unit8

Most likely, the actual arrangement of names will be different. Unfortunately, ls did not tell you whether these names stand for files or directories. Repeat the command, but add the -l option (this is a lowercase L, not the digit 1!), i.e., type ls -l.


ls -l
total 11 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 datadir 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:51 pdb 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit1 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit1-examples
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit2 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit3 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit4 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit5 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit6 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit7 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit8 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit9

Now you have gotten a lot of information. Your output may look somewhat different, but it should have the same overall structure. The information we are looking for is the very first letter of each line containing a file/directory name. A 'd' indicates that the name is that of a directory; if it were a file, you would see a hyphen ('-') instead. The various directories contain the material for this course, and you should leave most of them alone at this point. You see that there is a directory 'unit?' corresponding to each major chapter ('unit') of this tutorial. Consequently, we should look into the directory unit1 since this is the first chapter (unit 1). We shall also need the directory unit1-examples later on. To do this, we need to change into the directory, which is done with the cd (change directory) command. Type cd unit1. To verify where you are, type pwd and then ls -l. This time you should see something similar to


ls -l
total 5 
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file1 
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file2 
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file3 
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file4 
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file5

showing you that there are five files (note the initial '-') in directory unit1. This is a good place to try out getting back into the home directory by typing cd without any additional argument. Use pwd and ls to verify that you are indeed back, then cd unit1 to get back.
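The distinction between 'd' and '-' in the first column of ls -l can also be checked directly. The sketch below works in a throw-away directory created with mktemp -d, so it does not touch your course directories; the names unit1 and notes.txt are placeholders, and cut and the [ -d ... ] / [ -f ... ] tests are utilities not otherwise covered in this section.

```shell
# Work in a scratch directory so nothing in the home directory is changed.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

mkdir unit1          # a directory
touch notes.txt      # an empty regular file

ls -l                # the long listing: first character is d or -

# Extract the first character of the listing line for each name.
dir_flag=$(ls -ld unit1 | cut -c1)       # -d lists the directory itself, not its content
file_flag=$(ls -l notes.txt | cut -c1)
echo "unit1 starts with '$dir_flag', notes.txt starts with '$file_flag'"

# The shell can ask the same question without parsing ls output.
for name in unit1 notes.txt; do
    if [ -d "$name" ]; then
        echo "$name is a directory"
    elif [ -f "$name" ]; then
        echo "$name is a regular file"
    fi
done
```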

To create directories of your own, use the make directory command, mkdir <dirname>, where <dirname> is the name you want to give to the directory. Before we try this out, a few hints concerning names of files and directories. Under Unix, there is practically no restriction on the length of a file or directory name. Names are case sensitive, and they can contain or consist exclusively of digits. So filename, directoryname, DirectoryName, dir1, DIR1, 111 and file.name.with.some.explanations are all valid and would refer to distinct files or directories. You have to be careful with respect to certain characters, e.g.


*?"'~;<>/\()[]{} 

which have a special meaning under Unix, and you should not use them as part of a name. (This is a simplification. Many of the commands discussed and in particular the special symbols just mentioned are not directly handled by Unix, the operating system, but by the (command) shell. However, this differentiation is initially of little importance.) You may have seen in Windows95 that one can embed spaces in descriptive filenames, e.g. to use "This is my masters thesis" for the Word document containing your Diplomarbeit. You can do the same in Unix using a trick, but in general such filenames are simply a hassle (in Unix and in Windows95, should anything go wrong), so I won't show you how to do it. If you like such a long name, then use "This_is_my_masters_thesis" which will cause no problems.

Back to mkdir. Let's create two directories in unit1, where you should currently be (check with pwd). Type mkdir dir1 dir2 (or choose names of your own), then verify with ls -l that there are indeed two new directories. Now cd dir1 (change directory into dir1). Obviously, it's empty (as you can see by ls -l) since it was just created. Now we encounter an interesting problem. How do we get back from dir1 to unit1? Well, there are several ways --- let's start with the one that makes clear the general method of moving between directories: Type pwd, which, as you already know, shows you the full path ("Pfad") of the directory you are currently in. Most likely, you will see /usr/people/<uname>/unit1/dir1 as the path. When you typed cd dir1 in unit1, Unix knew that what you really wanted to do was to cd /usr/people/<uname>/unit1/dir1. The reverse reasoning leads to the slow way of stepping up in the directory hierarchy. Type cd /usr/people/<uname>/unit1. Verify that you are indeed back in unit1, then cd to dir1 again. Having to type a long pathname every time just to get back one level in a directory hierarchy is tedious. Fortunately, there is an abbreviation for your home directory, the character "~". You can, therefore, get back to unit1 by typing cd ~/unit1. Good, but it gets even simpler. Go back to dir1. Instead of typing ls -l, type ls -al. The option -a (all) shows you truly everything that is in a directory. You should see something like:


ls -al
total 2 
drwxrwxr-x 2 stefan stefan 1024 Apr 11 14:46 . 
drwxrwxr-x 4 stefan stefan 1024 Apr 11 14:46 ..

The -a option revealed two otherwise hidden directories, named "." and "..". Just as the "~" is a shorthand for your home directory, the "." is a shorthand for the current directory and ".." is a shorthand for the directory immediately above in the hierarchy. Let's try this out. First, type cd ., that is cd followed by a space and a dot. Use pwd to verify that you are still (or again) in dir1. While that was not too helpful, cd .. is. Now you are back in unit1. You can carry that concept further. From dir1, type cd ../.. (two dots, a slash, two dots). Typing pwd, you see that you are in your home directory; i.e., you moved up two levels in the directory hierarchy from dir1. (Remember, you could have achieved this more easily with just cd and no argument.)
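The navigation steps just described can be replayed in a scratch directory. This is only a sketch: the directory tree is created under a temporary location (via mktemp -d and mkdir -p, which creates missing parent directories), so the topmost level plays the role of your home directory, and the variable names are invented for the example.

```shell
# Build a small tree: <sandbox>/unit1/dir1
sandbox=$(mktemp -d)
mkdir -p "$sandbox/unit1/dir1"
cd "$sandbox/unit1/dir1" || exit 1

cd .                                  # "." is the current directory: nothing changes
after_dot=$(basename "$(pwd)")
echo "after 'cd .'     we are in: $after_dot"

cd ..                                 # ".." is the directory one level up
after_dotdot=$(basename "$(pwd)")
echo "after 'cd ..'    we are in: $after_dotdot"

cd dir1 || exit 1
cd ../..                              # two levels up in one step
after_two_up=$(pwd)
echo "after 'cd ../..' we are in: $after_two_up"
```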

Position yourself in directory unit1. It contains five files (file1, file2 etc.) and two directories (dir1 and dir2, or whatever names you chose). Let's look at what is in these files; you do this by typing cat file1 etc., e.g.


cat file1
This is file1

Not very original content, but useful to see what happens to files when we alter their names, move them around or copy them, which we are going to do next. (The cat command can actually do quite a lot of things, but we can't go into details here.) Let's start with renaming a file. For Unix, this is a special case of moving a file, so the command is called mv. Try it out by typing mv file1 newfile1. When you do an ls, you will see that file1 has vanished and that there is a new file newfile1. With cat, you can convince yourself that newfile1 has the same content as file1. The mv command can also move files between different directories, e.g. mv newfile1 dir1. Now newfile1 has vanished from unit1; however, there is now a file newfile1 in dir1. Alternatively, you can move a file and change its name in one step, i.e., mv file2 dir1/newfile2. Convince yourself that newfile2 in dir1 has the content of the previous file2 in unit1 (from which it has disappeared). To copy a file, use the command cp instead of mv. Otherwise, the syntax is exactly the same for most things. Try out what cp file3 newfile3, cp file3 dir1 and cp file3 dir1/newfile3 do.
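The difference between mv (the original name disappears) and cp (the original stays) can be summarized in a few lines. The sketch below recreates two of the example files in a scratch directory; it uses echo ... > file to give them content, a form of output redirection that is explained later in this chapter.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir dir1
echo "This is file1" > file1
echo "This is file3" > file3

mv file1 newfile1            # rename: file1 is gone, newfile1 has its content
mv newfile1 dir1             # move into a directory, keeping the name
cp file3 dir1/newfile3       # copy under a new name: file3 itself remains

ls . dir1                    # list both the current directory and dir1
moved_content=$(cat dir1/newfile1)
copied_content=$(cat dir1/newfile3)
echo "dir1/newfile1 contains: $moved_content"
echo "dir1/newfile3 contains: $copied_content"
```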

Since we created a lot of copies in the last step, we should get rid of (remove) some of the duplicates with the rm command. Be warned that in Unix there is no "undo" or "undelete" option for the rm command. We now have copies of file3 in both unit1 and dir1; the command rm newfile3 dir1/file3 dir1/newfile3 deletes all of them. Like the mkdir command, rm accepts more than one argument. To remove a directory, use rmdir instead of rm. Try it out and (in unit1) execute rmdir dir1 dir2. You will get an error saying that dir1 is not empty, and ls shows you that while dir2 is gone, dir1 with all its files is still there. In other words, deleting a directory requires two steps: (i) remove the content of the directory, then (ii) remove the directory itself. Do that now for dir1. This is admittedly tedious, but it introduces a margin of safety.
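The two-step removal of a non-empty directory looks as follows in a scratch directory. This is a sketch with placeholder names; the if ... then construct is only used to record whether rmdir succeeded, and the error message printed by the first rmdir is expected.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir dir1 dir2
touch dir1/newfile1          # dir1 is not empty, dir2 is

# rmdir removes only empty directories: dir2 goes, dir1 is refused.
if rmdir dir1 dir2; then
    first_try=ok
else
    first_try=refused
fi
echo "first attempt: $first_try"
ls

# Step (i): remove the content. Step (ii): remove the directory itself.
rm dir1/newfile1
if rmdir dir1; then
    second_try=ok
else
    second_try=refused
fi
echo "second attempt: $second_try"
```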

Finally, I want to introduce you to a slightly obscure, but useful cousin of cp and mv, the creation of symbolic links with the ln -s command. It is sometimes advantageous to have in a directory something which for all practical purposes behaves as if it were a file in this directory, but which in reality just points to a real file in a different directory. Let's look at a simple example. Recreate dir1 and cd into it. It's now empty again. Now we make a symbolic link to file3 in unit1. In dir1 type


ln -s ../file3 link2file3
ls -l
total 0 
lrwxrwxrwx 1 stefan stefan 8 Apr 11 15:38 link2file3 -> ../file3
cat link2file3
This is file3

The ordering of names in the ln -s command is crucial. The name of the real file comes first, then the name of the link. Don't omit the -s in the command, otherwise the result will be unexpected. It's not terribly important that you know exactly how to use this command at this stage, but you need to be aware of its existence, since in setting up your home directory I made heavy use of this feature. Change into directory ~/pdb. When you do a ls -l, you will see that the "files" are symbolic links to files in a location that does not belong to your workspace. This has several advantages. For all practical purposes, each of you has a private copy of these datafiles, yet the files exist physically only once, which saves some disk space. You can read the files via the links, but you cannot change their content by mistake. The same concept was used for the content of the other ~/units and the ~/datadir directories. You cannot change the content of a file pointed to by a symbolic link if that file is outside of your workspace and you have no permission to write to it; however, you can make a copy, which you then can edit. Later, you will have to do this for the various CHARMM input scripts to modify them for the exercises. However, the symbolic link is still present and guarantees you an unmodified copy in case you mess up or delete a file by mistake. Let's see how that works. Go back to ~/unit1/dir1. Copy link2file3 to file3 in dir1 (cp link2file3 file3). Look at the differences between file3 and link2file3 with ls -l, but note that their content is identical. Now delete the content of dir1 (rm link2file3 file3). Directory dir1 is empty, but file3 in unit1, to which link2file3 pointed, is still there. Thus, should you delete by mistake a symbolic link in one of your other directories (which you should avoid!), the original files will still be there.
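The link experiment described above can be condensed into one short session. The sketch below rebuilds the situation in a scratch directory (so file3 here is a stand-in created on the spot, not the course file) and uses the shell test [ -L name ], which is true only for symbolic links.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
echo "This is file3" > file3
mkdir dir1
cd dir1 || exit 1

ln -s ../file3 link2file3          # real file first, then the name of the link
ls -l
link_content=$(cat link2file3)     # reading goes through the link to ../file3
echo "read via link: $link_content"

cp link2file3 file3                # cp copies the content: dir1/file3 is a real file
if [ -L link2file3 ] && [ ! -L file3 ]; then
    echo "link2file3 is a symbolic link, file3 is an independent copy"
fi

rm link2file3 file3                # removes only the link and the copy
cd ..
original_content=$(cat file3)      # the original in the parent directory is untouched
echo "original still says: $original_content"
```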

So far, we have always manipulated either a single file (cp fil1 fil2) or listed each file explicitly (rm fil1 fil2). Often, however, you want to operate on a group of files simultaneously. One does this with the help of the so-called wildcard operators (one possible translation would be "Platzhalter"). The most important ones, which should be sufficient for you, are the * and the ?. To try what they do, reuse or recreate the directory dir1 in ~/unit1. Then, in ~/unit1/dir1 type


touch f1 f2k1 fil1 file1 fiiiiiiiiiiiiiiiiiiiiiiiiile1 f2 fil2 

(You don't have to remember the touch command; here it's used to create files having a name but no content (i.e., empty files), so that we have something to experiment with.) Then try out what


ls
ls f?
ls f*1
ls f*
ls *

do.

One sees that ? replaces exactly one character in a filename (or directory name); thus, ls f? listed f1 and f2, but nothing else. (In the unit1 directory, ls file? would match all remaining filenames, file3, file4 and file5.)

By comparison, * replaces zero, one or more characters in a filename or directory name. Therefore, ls f*1 matched f1, f2k1, fil1, file1, as well as fiiiiiiiiiiiiiiiiiiiiiiiiile1; however, it did not match f2 or fil2. In our example, both ls f* and ls * match all filenames.

As useful as wildcards are, there are some limitations and dangers. First, before you do anything "final", such as rm f*, check with ls f* that you indeed select only the files that you want to delete --- maybe you only wanted to get rid of f*1? Second, it must be unambiguous what the wildcard operation should do. A command like ls f* is completely unambiguous. However, a command such as cp f? fil? is not, and cp will complain with an error that often may not make too much sense (you may want to try what happens). Nevertheless, there is one very useful combination of wildcards and cp or mv. If you give cp (or mv) more than two arguments, the last one has to be a directory. In this case all other files are copied (moved) to that directory. To demonstrate this, create a directory in dir1, e.g., subdir1. The command mv f? subdir1 moves the two files f1 and f2 into subdir1.
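A safe habit is to let the shell show what a pattern expands to before using it with rm or mv. The sketch below does this with echo in a scratch directory; the filenames are a shortened version of the experiment above, and the variable names are invented for the example.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch f1 f2k1 fil1 file1 f2 fil2

# echo prints the names a pattern expands to, without touching the files.
one_char=$(echo f?)        # f followed by exactly one character
ends_in_1=$(echo f*1)      # f, then anything (or nothing), then 1
echo "f?  expands to: $one_char"
echo "f*1 expands to: $ends_in_1"

# With more than two arguments, the last argument of mv must be a directory.
mkdir subdir1
mv f? subdir1              # moves f1 and f2 into subdir1
ls subdir1
```

Since the shell performs the expansion, mv never sees the pattern itself, only the resulting list of names followed by subdir1.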

Editing text --- emacs

Emacs is the text editor I use regularly, so I recommend that you do the same in this course. The program has an interactive tutorial, which one can work through in 30 minutes to an hour; afterwards, you should be fairly well prepared to use it. Start emacs by typing


emacs &

on the command line (remember, the & puts the command in the background and you can continue to use the command window for other things). Once the program has started (this may be slow!) in a separate window, move the cursor over the new window and/or click on it, and type Ctrl-h t to start the tutorial. (While pressing the Ctrl key type an h, release the Ctrl key and type t. On a German keyboard, the Ctrl key is the "Strg-Taste"!) Work through the tutorial; in addition, look at the facilities offered by the menu bar by clicking on it with the mouse, just as you would do in a Windows program. When you are done, continue to read, but don't quit emacs!

If you are accustomed to Word or some other word-processing program under Windows, please note the following difference when editing or writing textfiles with emacs (or any texteditor). A word-processor breaks the line automatically for you ("automatischer Zeilenumbruch"). Emacs does not do this and when a line gets too long, you have to hit the Enter key to start a new line. This is actually as it should be for our purposes since a command in CHARMM normally is one line of text (so you don't want the editor to split it into two lines because it thinks the line is too long!). You can have commands in CHARMM that are longer than one line, but you have to mark that specifically (you need to put a '-' at the end of the line that is continued on the next).

Once emacs has started, it is quite fast; starting, however, is slow as you may have experienced. One should, therefore, not start emacs to edit a single file, save the file and then quit emacs again; instead, one starts one emacs session and does all editing operations in it (remember that you can split the window or open a second window (frame) to look at more than one file simultaneously!). Exit emacs only before you log out. Remember that C-x C-f reads a file, C-x C-s saves a file, C-x k kills a buffer (file) which doesn't interest you anymore, and C-x i inserts the content of a file at the cursor position of the current buffer. To save a buffer under a different name, type C-x C-w; emacs then prompts you for the filename (you can even write to a different directory). C-x b switches between buffers, C-x C-b gives you a list of all buffers. All of these commands are also accessible from the menubar.

There is one very useful command that is not explained in the tutorial, the search and replace function of emacs. To test it, write a short file in emacs (it should contain several lines) and deliberately write one word wrong repeatedly. Alternatively, there is a short file in ~/unit1-examples/replacement.txt, which you can use to work with. Let's assume you have written Curs instead of Kurs as I did in replacement.txt. Position the cursor at the beginning of the file and start search and replace by M-%. You are prompted for the search string, type Curs followed by the Return key, then for the replacement string, Kurs followed by the Return key. Immediately, the first occurrence of Curs is highlighted (or, at least, the cursor has moved there) and emacs asks you what to do. Hitting Space or y replaces Curs by Kurs and the next instance of Curs is searched for. Hitting Backspace or n instead skips the replacement and emacs moves on to the next occurrence of Curs. When all occurrences of Curs have been visited, the command ends and emacs tells you how many replacements were made. Make sure to try it. You can also type ? to get a list of all options that you can type when emacs prompts you whether to replace a word or not.

Since you now know a little bit about emacs, it makes sense to also use it to send (and read) e-mails whenever you need to do so. Remember, you should send me mail using the username course0; your account names are course1, course2 etc. Just activate the respective functionality from the menubar. New items appear on the menubar, allowing you to carry out the most important operations without the need for knowing the abbreviations of the command. In addition, in the help menu you always find an item "Describe-mode" which gives you a terse description of what special commands are available. Please do not use this account to send mail outside the group. You can do this, but (i) these accounts expire as soon as the course is over and (ii) you do not have true privacy in these accounts.

The command (terminal) window

The editing commands you just learned for emacs also help you to use the terminal window more effectively. Start to type some command at the prompt. With C-f and C-b you can move back and forth on the line, inserting or deleting text where needed when you make a typo. Next, type C-p and you will see the last command you executed from this window. Typing C-p again gives you the command before that, etc. C-p and C-n go back and forth through the history of commands. Finally, you do not always have to type a full filename (directory name). Type ls and the first character of a filename in your current directory. Then hit the Tab key. The filename gets completed as far as this is unambiguously possible. Whenever the autocompletion cannot continue, you hear a slight beep and/or your terminal window blinks. Then you have to give an additional character. (This is the same mechanism available in emacs when you want to read a file.)

"Redirection" and "pipes"

To run CHARMM, one needs to use a feature of Unix called input/output redirection. The file ~/unit1-examples/sample.output is an example of actual CHARMM output. Look at it with emacs. The content of the file will become clear over the next days. You see that CHARMM produces lots of detailed information. Frequently, one is only interested in a subset of the data and would like to have it in a more compact form. Two fairly advanced Unix utilities, grep and awk, described in the next subsection, can help you accomplish exactly this. To use them effectively, one also needs to be familiar with the concept of input/output redirection including "pipes".

Switch back to (i.e., make active) a command window (by moving the mouse over it and/or clicking on it). You know that the cat command allows you to look at the content of a file. Try using it to look at sample.output. While you see the content of the file, it flashes by too quickly to read. Of course, we have emacs, but let's pretend we don't. We need to put the output of cat into a program that allows us to view a long text one page at a time. One such program is more. Connecting two commands on the commandline with the symbol "|", the so-called pipe, tells Unix to hand over (redirect) the output of the first command as input to the second command. Try this out by typing


cat sample.output | more

The output of cat is stopped after the first page. By hitting the spacebar key you can scroll through the document screen by screen; hit q to quit. You have just redirected the output of one program (cat) into another (more). As another example, let's look at a directory that contains really a lot of files. Compare what the following two commands do.


ls -l /usr/bin
ls -l /usr/bin | more

To give you a third example, there is a small utility in Unix that counts the words in a file, wc. Pipe the output of cat sample.output into wc; it tells you the number of lines (first number), number of words (second number), and the number of characters in sample.output (third number). Once you are familiar with more Unix utilities, you will get accustomed to building chains of commands, i.e.,


command1 | command2 | command3 | ... | lastcommand
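As a concrete instance of such a chain, the sketch below counts lines and words of a small text file. The file sample.txt and its content are made up for the example (it is not the CHARMM output file), wc -l and wc -w restrict wc to the line and word count, and tr -d ' ' is appended only to strip the padding blanks that some versions of wc print.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
printf 'one two\nthree\nfour five six\n' > sample.txt

# Two commands: cat hands its output to wc, which prints lines, words, characters.
cat sample.txt | wc

# Three commands: cat -> wc -> tr
n_lines=$(cat sample.txt | wc -l | tr -d ' ')
n_words=$(cat sample.txt | wc -w | tr -d ' ')
echo "$n_lines lines, $n_words words"    # prints "3 lines, 6 words"
```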

One can also redirect the output of a command to a file with the symbol ">". Try the following two commands:


cat sample.output > sample.copy
cat sample.output | wc > sample.words

and look at the two new files (sample.copy, sample.words) in the editor. The first one effectively copied sample.output to sample.copy and just illustrates the concept of redirection to a file. (We could have achieved the same effect with cp sample.output sample.copy.) The second is an example of how to capture the output of a command for later reuse. A command can also read input from a file. This is done by the operator "<". Both input and output redirection will be used when we work with CHARMM, which is usually started by


charmm < input.file > output.file

CHARMM reads what it has to do from file input.file (or whatever name you have given to the script) and writes the output to file output.file (this is how sample.output was generated).
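If CHARMM is not at hand, the same combination of "<" and ">" can be tried with any command that reads from the keyboard by default. The sketch below uses wc as a stand-in; the file names demo.input and demo.result are invented for this illustration.

```shell
# Create a small stand-in input file (hypothetical; not a CHARMM script).
printf 'line one\nline two\nline three\n' > demo.input

# wc -l reads its input from demo.input (<) and writes the line count
# to demo.result (>), the same pattern as: charmm < input.file > output.file
wc -l < demo.input > demo.result

# Look at what was captured.
cat demo.result
```

Nothing appears on the screen when the wc line runs, since its output went into demo.result; only the final cat shows the count.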

Some of the examples used are a little contrived, since both more and wc accept a filename as an argument, so one does not have to pipe the output of cat into them (cat sample.output | wc is accomplished more easily by wc sample.output).

grep and awk

Having covered the basics of redirection that you will need to work with CHARMM, we turn our attention to the grep and awk utilities. grep searches for the occurrences of a string or a (search) pattern in a file (or a group of files, if you use wildcards) and prints the lines containing the string/pattern. The much more powerful awk scans a file for occurrences of a certain string or search pattern, then decomposes every line found into elements and allows you to manipulate them. The two commands are invoked as follows


grep <searchpattern> file
awk 'commands' file
awk -f script file

where file is always the name of the file that is worked on. Here is an example. When CHARMM prints out energies it has computed, it puts the string ENER in front of each output line. Let's use grep to search for the string ENER in sample.output. Type


grep ENER sample.output

You will see several lines that contain the string ENER somewhere in them. Not all of them are interesting; one even contains the word GENERATE, which most likely has nothing to do with the energy of the system. This is where search patterns are useful. From looking at the previous output, it appears that the most interesting lines are the ones that begin with the string ENER. You can tell grep to look specifically for these lines by putting a "^" in front of ENER, i.e.,


grep ^ENER sample.output

This time you only get lines beginning with ENER. Strings, or better patterns, like ^ENER are referred to as regular expressions. They make grep (and awk) so extremely powerful; consult a textbook on Unix to learn more about them.
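The difference between the two searches can be reproduced on a small stand-in file. The lines below are made up to resemble CHARMM output and are not taken from sample.output. The pattern is quoted here; the quotes are harmless and guard against shells that give "^" a special meaning.

```shell
# Stand-in for a few lines of CHARMM output (made-up text and numbers).
printf '%s\n' \
  ' CHARMM>    GENERATE MAIN SETUP' \
  'ENER ENR:  Eval#     ENERgy' \
  'ENER>        0    -10.94400' \
  ' CHARMM>    ENERGY' \
  'DYNA>        0      0.00000' > demo.output

# Every line that contains ENER anywhere, including GENERATE and ENERGY.
grep ENER demo.output

# Only the lines that begin with ENER.
grep '^ENER' demo.output
```

The first command prints four lines of the stand-in file, the second only the two that start with ENER.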

Try out another example. Skimming through sample.output, you see that there are many lines starting with DYNA (these are the output of a molecular dynamics simulation). In a later unit, we shall study in detail the content of the lines starting with DYNA>. To extract these lines from sample.output, use


grep 'DYNA>' sample.output

The single quotes are necessary since the character ">" would confuse Unix otherwise (remember, it's a redirection operator). The quotes tell Unix to ignore its special meaning. The command does what we want it to do, but again there is too much output for one screen. Thus, you can either pipe the output into more (| more) or redirect it to a file (> file), which you can then inspect with emacs.

The grep command let us extract the lines of interest from sample.output. The third item on each line is the simulation time in picoseconds, the following number is the total energy at this step of molecular dynamics. Assume now that you would like to plot energy as a function of simulation time. You can tell most plotting programs to select certain columns of data from a file and to ignore others, but the presence of the string DYNA> in each line may cause problems. What we really would like to do is to extract all lines from sample.output as we did with the above grep command, but to just print the third and fourth item. This would be a file that any plotting program can easily handle. One can accomplish all this with a single command using awk. Try


awk '/DYNA>/ {print $3, $4;}' sample.output | more

Instead of piping the output into more, you might want to redirect it to a file, e.g., > extract1.dat. You see that we indeed select the same lines as with the corresponding grep command, but only the third and fourth elements of each line are printed. In the above command, the text within single quotes is the set of commands for awk; these operate on sample.output, and the result is piped into more. awk works roughly as follows: Like grep, it scans the file for a string or regular expression. The search string is put between a pair of slashes, /string/, in the above example /DYNA>/. Each line containing the string can then be manipulated. The program automatically breaks up a line into elements. By default, any run of spaces or tabs is considered to separate two elements. E.g., consider the first line that is matched by the search pattern /DYNA>/


DYNA> 0  0.00000  -10.94400  427.49675  -438.44075  386.56990

The first element of this line is DYNA>, the second 0, the third 0.00000, etc. You can refer to the elements with the variables $1, $2, $3, etc. (The variable $0 is set to the full line matched by the search pattern.) awk is a complete programming language; here we just use a single command, print. We want to print the third and fourth element of each matched line, and this is exactly what print $3, $4; does. (Commands inside the braces are separated by ";" or by a line break; the ";" after the last command is optional.)
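To see the numbering of the elements in action without sample.output, here is a minimal sketch on a stand-in file. The first line is the one quoted above; the other two lines and the file name demo.dyna are invented for illustration.

```shell
# Stand-in file in the layout of the DYNA> line quoted above
# (the second and third lines contain invented numbers).
printf '%s\n' \
  'DYNA>        0      0.00000    -10.94400    427.49675   -438.44075    386.56990' \
  'DYNA>       10      0.01000    -10.90000    425.00000   -435.90000    384.00000' \
  'DYNA PROP>         12.00000     -9.00000' > demo.dyna

# Label the third and fourth element of every line containing DYNA>.
# The DYNA PROP> line is skipped because it does not contain the string DYNA>.
awk '/DYNA>/ {print "time:", $3, "energy:", $4;}' demo.dyna
```

The comma in the print command inserts a single space between the items, which is why the output is easy for a plotting program to read once the labels are left out.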

There are a few more simple things you should know about awk. (i) Often, all awk commands are placed in a file. I did this for you and created the command file extract1.awk; take a look at it with emacs. Aside from lines that are comments (those beginning with #), you see the command used above:


/DYNA>/ {print $3, $4;} 

When there is a command file (a script) for awk, it is invoked as


awk -f extract1.awk sample.output | more

The result is the same as specifying the command(s) (in single quotes) directly on the command line. (ii) You can search for more than one pattern in the same run, and of course execute more than one command for each line matched, i.e.


/pattern1/ {command1a; command1b; command1c;}
/pattern2/ {command2a; command2b;}

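awk reads the file one line at a time. Each line is tested against pattern1 and, if it matches, command1a, command1b and command1c are executed; the same line is then tested against pattern2 and, if it matches, command2a and command2b are executed. Only then does awk move on to the next line, so the output appears in the order of the lines in the file, not grouped by pattern. To try this out, uncomment the line containing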


/DCNTRL>/ {print "# ", $0;} 

in extract1.awk. When you now execute awk, there is one additional line of output. For the line matching /DCNTRL>/, the whole line ($0) is printed; before it we place the two characters "# ". A line starting with "#" is ignored by the plotting program we are going to use later on (gnuplot). The line following DCNTRL> in sample.output reports which molecular dynamics method was used by CHARMM. Thus, the first line printed by the sequence of commands above adds a "title" to the data file. (iii) Finally, as a last example, let me introduce you to the getline command of awk, which is very useful for extracting data from CHARMM output. In sample.output, you find blocks of lines of output, each of which begins with


DYNA>
DYNA PROP>
DYNA INTERN>
DYNA EXTERN>
DYNA PRESS>

So far, we have obtained items only from the lines beginning with DYNA>. The other entries are of course also of interest. For example, the second number in the line beginning with DYNA EXTERN> is the electrostatic energy computed for this molecular dynamics step. Assume that you want to create a datafile in which each line consists of the time and the total energy as before, plus the electrostatic energy as third number. We cannot search first for DYNA> and then for DYNA EXTERN> since it would be difficult to print all quantities of interest on a single line. Take a look at extract2.awk, which shows you one possible solution:


/DYNA>/ { 
  time=$3; 
  energy=$4;
  getline;
  getline; 
  getline; 
  elec=$4; 
  print time,energy,elec; 
}

As before, we only search for DYNA>. For each line matched, several commands are carried out. First, instead of printing them immediately, the contents of $3 and $4 are stored in the variables time and energy. Then getline is executed three times. This command replaces the current line ($0) by the one immediately following it in the file. Thus, after three getlines, we are on the line beginning with DYNA EXTERN>. (You should look at sample.output simultaneously.) Each line read by getline is decomposed as usual. Because DYNA and EXTERN> are separated by a space, they count as $1 and $2, so the second number on this line, the electrostatic energy, is $4. It is stored in the variable elec; then time, energy and elec are printed.
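The following sketch combines a title rule with the getline approach and runs it on a stand-in file. The file names demo.awk, demo_md.output and demo_md.dat, the title text, and all numbers except those of the first DYNA> line are made up; the real sample.output and extract2.awk may differ in detail.

```shell
# Stand-in for two blocks of molecular dynamics output (invented numbers).
cat > demo_md.output <<'EOF'
 DCNTRL> made-up description of the dynamics method
DYNA>        0      0.00000    -10.94400    427.49675   -438.44075    386.56990
DYNA PROP>          1.00000      2.00000      3.00000
DYNA INTERN>        4.00000      5.00000      6.00000
DYNA EXTERN>      -50.00000   -400.00000      0.00000
DYNA PRESS>         7.00000      8.00000      9.00000
DYNA>       10      0.01000    -10.90000    425.00000   -435.90000    384.00000
DYNA PROP>          1.10000      2.10000      3.10000
DYNA INTERN>        4.10000      5.10000      6.10000
DYNA EXTERN>      -51.00000   -401.50000      0.00000
DYNA PRESS>         7.10000      8.10000      9.10000
EOF

# awk script with two pattern/command pairs.
cat > demo.awk <<'EOF'
# Title line: copy the DCNTRL> line, prefixed with "# ".
/DCNTRL>/ { print "# ", $0; }
# Data lines: time and total energy from DYNA>, electrostatic
# energy from the DYNA EXTERN> line three lines further down.
/DYNA>/ {
  time = $3;
  energy = $4;
  getline; getline; getline;
  elec = $4;
  print time, energy, elec;
}
EOF

# Run the script and keep the result in a data file.
awk -f demo.awk demo_md.output > demo_md.dat
cat demo_md.dat
```

The data file starts with the "#" title line, followed by one line per block with time, total energy and electrostatic energy. Counting a fixed number of getlines is simple but relies on the block layout never changing; if a line were missing, $4 would silently be taken from the wrong line.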

Let me stress one more time that these examples barely scratch the surface of what awk can do. I hope, however, that they gave you some idea of the usefulness of this utility. In my experience, when you do serious work with CHARMM, it pays to familiarize yourself with awk. Other similar programs exist (e.g., Tcl or Perl), but I have found awk easiest to use for what one usually needs (a quick method of taking data of interest buried in the CHARMM output and printing them in a form that another program can easily use).

Getting additional help

During this course, you can obviously always ask me. There are copies of a list of important emacs commands; there is also a book on emacs. Further, we have one introductory text and one reference book on Unix, and a book on awk; feel free to look at them, but please don't take them away! Finally, Unix has an online help system, the so-called man(ual) pages. Try them out by typing man cp. You may be surprised how many more options even this "simple" command has. The information found in man pages is usually complete, but in a very terse format. Depending on the machine, you can either only go downwards in the man page (enter for one line, space for a page; similarly to more), or use emacs-type commands to move around, in particular, M-v to go back up again.

Summary of commands introduced in this unit

man : man <command> gives a short summary of the command and all its options. If you are not exactly sure of the name of a command, try man -k <string>, where <string> is what you think the name of the command might be. (Helps sometimes...)

pwd : print working directory. Shows you where you currently are in the directory hierarchy.

cd : change directory. Without argument, it puts you back into your home directory. Normally, it takes a single argument, which must be a valid specification of a directory.

ls : list files and directories. The following options (which can also be combined) are useful: -l (long), -a (all), -CF (prints a short form which makes clear what is a directory and what is a file). Often used without argument, the command also takes one or more arguments (including wildcards). In this case, only files that match the arguments are shown.

cp : copy one file to another (cp file1 file2); both file1 and file2 can contain path information. A second useful form is cp file1 file2 ... file10 dir, which copies file1, file2, ..., file10 to directory dir. Instead of explicitly giving all files to be copied, one may use wildcards.

mv : move and/or rename a file or directory. The syntax is very similar to that of the cp command.

mkdir : make directory. Takes one or more arguments (but obviously no wildcards...)

rmdir : remove directory. Takes one or more arguments including wildcards. The directories need to be empty, otherwise the command fails.

rm : remove files. Takes one or more arguments including wildcards. Caution: Once a file is deleted, it cannot be recovered.

cat : concatenate. Has a number of uses. The simplest one (cat file1) prints the content of file1 to the screen.

more : takes a filename as the argument and displays the content of this file one page at a time. Also very useful in combination with redirection.

grep : grep <string|pattern> file prints all lines of file containing string or matching pattern.

awk : A complete programming language that is particularly well suited to manipulate strings and rearrange the content of files. Similarly to grep, awk scans a file for all lines that contain certain strings or patterns (there can be more than one). All matching lines can be manipulated very easily. By omitting a search pattern, one can operate on every line in a file.

> : Redirection symbol which redirects the output of a command to a file. If the file exists, its content is overwritten.

< : Redirection symbol which makes a command read its input from a file rather than from the keyboard.

| : Redirection symbol ("pipe") which redirects the output of a command directly into another command.

* : Wildcard operator replacing zero, one or more characters and/or digits.

? : Wildcard operator replacing exactly one character or digit.
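
The difference between the two wildcards can be checked with a few empty files. This is only a sketch: the directory wildcard_demo and the file names are made up, and touch (which creates empty files) is not covered in this unit.

```shell
# Create a scratch directory with a few empty, made-up files.
mkdir -p wildcard_demo
touch wildcard_demo/run1.out wildcard_demo/run2.out \
      wildcard_demo/run10.out wildcard_demo/run1.inp

# "*" stands for any number of characters: matches run1.out, run2.out, run10.out.
ls wildcard_demo/run*.out

# "?" stands for exactly one character: matches run1.out and run2.out only.
ls wildcard_demo/run?.out
```

run10.out is missing from the second listing because two characters ("10") sit where "?" allows exactly one.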


Imprint (as stipulated by Austrian law, MedienG 2005): O. Steinhauser / S. Boresch, Institut für Computergestützte Biologische Chemie, Währinger Strasse 17, 1090 Wien, Austria