Scope and goals of the course (Übungen)
The course is intended to familiarize you with important concepts of biomolecular
simulation, energy minimization and molecular dynamics. This is done by carrying
out practical examples using the simulation program CHARMM. The exercises may
also serve as a tutorial on how to use CHARMM though the emphasis is on the
principles of the methods, not the details of the functionality provided by
the program. Topics were selected according to the principle "less is more". Thus, the course
does not cover every aspect of CHARMM, nor does it introduce you to all of its capabilities.
Instead, the knowledge you hopefully will have acquired after working through
the examples and exercises should enable you to use CHARMM (or another biomolecular
simulation package) intelligently in your own work, relying on the (terse)
CHARMM documentation and the testcases that come with the (academic) distribution
of the program for further help.
The accompanying lecture deals in some detail with the principles of biomolecular
NMR-spectroscopy and how techniques of biomolecular simulation are related
to and used in structure determination of biological macromolecules by NMR.
In the course (Übungen), however, the emphasis will be on simulation methods.
Why CHARMM?
Short answer: It's the program I know best.
Long answer: There are a number of reasons that make CHARMM very suitable
for use in an introductory course (which is not to say that it is not equally
well suited for research applications!). It is one single program, making it
easier to use and to learn than a suite of smaller programs that often have
to be "forced" to interact properly with each other. The user interface
is fairly uniform; in addition, the (academic) version of CHARMM is quite up-to-date
with respect to the latest methodological developments. Obviously, for a number
of things programs exist which handle a particular problem much more elegantly
or are easier to use. In real life, one might, therefore, prefer program XYZ
to CHARMM in such instances. However, in the context of this course we decided
that it is best to use one program for all applications rather than making
you (re)learn several programs.
The proper reference for CHARMM (Chemistry at HARvard Molecular Mechanics)
is: B. R. Brooks et al. J. Comput. Chem. 1983, 4, 187-217. Currently, a new
version of CHARMM, containing bug-fixes and new features, is released approximately
twice a year. To obtain CHARMM, one has to contact Prof. Martin Karplus (e-mail:
marci@tammy.harvard.edu or marci@brel.u-strasbg.fr).
With respect to computer graphics ...
Less is more also was the reason why no introduction to a high-powered
graphics program is given in this course. This was a somewhat problematic decision
to make since experimental structure determination requires the use of a good
graphics package. However, WHATIF, the package used most in our group, has
a learning curve that is too steep: we would have to spend a few days just
getting acquainted with the program. Thus, we shall rely on the very primitive
built-in graphics of CHARMM to look at a structure when this is more instructive
than a bunch of numbers.
However, if you are already familiar with one of the graphics programs
used in this group (WHATIF, RasMol, Insight, MolMol) then the parts requiring
graphics can of course also be done using this program.
This manual
tries to provide a link between the theoretical concepts taught in the
main lecture (Vorlesung) and the documentation that comes with CHARMM. It assumes
that you are familiar with basic concepts of structural molecular biology,
e.g., that you know a bit about the properties of the peptide bond or that
you know something about the properties of amino acids. (What you find in any
introductory textbook of biochemistry or molecular biology on the subject is
more than enough!) Similarly, this is not the place to explain the theory behind
the methods; however, practical ramifications are mentioned. For example, it
is not derived why a molecular dynamics simulation corresponds to the microcanonical
ensemble of statistical mechanics; however, it is emphasized that one can use
conservation of energy to gauge the quality of the computation. The CHARMM
documentation explains most commands in reasonable detail; however, it is intended
as a reference and not as an introduction. This guide, therefore, attempts
to fill the role of a tutorial. Future, improved versions of this document
should add proper references and suggestions for further reading. In particular,
some examples are based on the old CHARMM course outline and/or the testcases
that come with the program.
A final remark
I hope that you will find the examples and exercises interesting and instructive.
Remember that there are no stupid questions --- so do ask if anything is unclear.
This section is a crash-course about those elements/commands of Unix which
one needs to work with CHARMM. Here is a brief overview of what follows and
why it is useful. (i) CHARMM is not intended for interactive use (as opposed
to, e.g., Word); instead, one prepares a command file, a script, which tells
CHARMM what operations to perform. Thus, one has to know how to manipulate
files ("Dateien"), in particular, how to copy (cp), rename (mv) and organize
(mkdir, cd, rm, ln, mv) them. In addition, for the exercises you regularly
have to modify (edit) the example input files to achieve the desired results.
My texteditor of choice is emacs, which for our purposes also has the advantage
of offering an interactive tutorial. A texteditor is the more powerful equivalent
of the Notepad tool under Windows. (If you are already familiar with a different
editor (vi, nedit, pico, ...) and if we have it installed on our machines, USE
IT!) (ii) CHARMM produces a lot of output, but sometimes only a very small
subset of the results reported is of interest. That's what the advanced Unix
utilities grep and awk are made for. (iii) I will want to look at your results.
Please send them to me by e-mail. Unless you are familiar with one of the many
mail utilities under Unix, I recommend that you use the mail facilities under
emacs that can be invoked from the menubar. (iv) Finally, one needs to plot
data. For this we shall use gnuplot; its use will be explained when it's actually
needed.
File system concepts and commands
After logging into the system, there should be at least one command or
terminal window. Make sure that the cursor is over it (or click at it), then
you are ready to type commands in this window. Depending on the computer you
work on, the lines on which you can type commands contain a % or $
sign (plus possibly some other text, like the machine name and/or your username).
Whenever the system (or rather this command window) is ready to accept input
from you, it will display this so-called prompt character, i.e., the %
or the $. Don't type if the prompt does not reappear after a command;
this is the case, for example, when you start an editor or netscape. The best
way to start netscape (which you may be using to read this documentation) is
to type
netscape &
which (because of the & ending the command line) puts the command "into
the background" and returns the command prompt to you. Please refer to any
(introductory) book on Unix for further details or ask.
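The effect of the trailing & can be tried out without netscape. The following is a minimal sketch, assuming a Bourne-compatible shell (sh, bash); sleep is only a harmless stand-in for a long-running program, and the variable names bgpid and status are chosen for illustration.

```shell
# Start a "long-running" program in the background; sleep stands in
# for netscape or emacs here.
sleep 1 &
bgpid=$!      # the shell stores the process id of the background job in $!

# The shell does not wait for sleep to finish, so this line runs at once.
echo "prompt is back while job $bgpid is still running"

# Optionally wait for the background job and record how it ended.
wait "$bgpid"
status=$?
echo "background job finished with status $status"
```

Without the &, the echo line would only run after sleep had finished.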
This section contains a brief introduction to the Unix directory tree,
as well as the pwd, mkdir, rmdir, cd, cp, mv, ln, rm commands and simple wildcard
operations. Please refer in the following to the diagram handed out. Unix files
("Dateien") are organized hierarchically in directories ("Verzeichnisse").
Programs, such as CHARMM, a text-editor, or a word-processor; a list of data,
a text file, a word-processor document (e.g., the file from which this guide
was printed); computer code in any programming language --- all these (and
more) are files. A directory is best thought of as a "container" of files;
usually one puts related files in a directory. For example, many people put
their word processor documents (files) in one directory and their programs
in a separate directory. A directory can contain other directories, which are
then often called sub-directories. This immediately leads to the hierarchy
mentioned above. All commands introduced in this section have to do with changing
between directories, with moving and copying files within a directory or between
directories, as well as with deleting files. Within a directory, a file can
be referred to simply by its name. From a given directory, referring to a file
in a different directory requires specifying the so-called path ("Pfad") in addition to
the filename. The path makes clear the relation of one directory to another.
Before these concepts lead to total confusion, let's try out a few things that
hopefully will make them clear.
Your immediate starting point is your so-called home directory. It is where
you are now. To get more information, type pwd and hit the enter (return) key
("Eingabetaste"). Depending on the machine, you will see something like /usr/people/<uname>
or /home/<uname>, where <uname> is your account name, i.e. the
name (NOT the password!) you have to type at the login prompt. The command
is the abbreviation for print working directory = pwd. Try to type Pwd or pwD
or PWD. It won't work. Unix commands are case sensitive, and so are file and
directory names! The true starting point of the directory hierarchy under Unix
is the directory /, the so-called root directory. A working directory name
of /usr/people/<uname> implies that there is a directory usr in the /
directory; in usr there is a directory people, and in the directory people
there is a directory <uname>. The slashes ("/") serve to separate levels
(hierarchies) of directories. Your home directory is the place where you will
always find yourself after login. If you ever get "lost" in a directory hierarchy,
you can return to the home directory by typing cd. We shall return to the cd
command shortly.
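The steps just described can be repeated as a short shell session. This is a minimal sketch, assuming a Bourne-compatible shell (sh, bash); the variable names start and home_now are only for illustration, and the home directory printed will of course be your own.

```shell
# Remember where we started so that we can return at the end.
start=$(pwd)

# An absolute path begins at the root directory "/".
cd /
pwd                      # prints: /

# cd without an argument always returns to the home directory.
cd
home_now=$(pwd)
echo "home directory: $home_now"

# Go back to where we started.
cd "$start"
```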
Next, let's see whether there is anything in your home directory. To list
your files and directories, type ls (followed by enter/return). You should
see something like:
ls
datadir unit1 unit1-examples unit3 unit5 unit7 unit9
pdb unit2 unit4 unit6 unit8
Most likely, the actual arrangement of names will be different. Unfortunately,
ls did not tell you whether these names stand for files or directories. Repeat
the command, but add the -l option (this is a lowercase L, not the digit 1!),
i.e., type ls -l.
ls -l
total 11
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 datadir
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:51 pdb
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit1
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit1-examples
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit2
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit3
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit4
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit5
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit6
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit7
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit8
drwxrwxr-x 2 stefan stefan 1024 Apr 11 13:50 unit9
Now you have gotten a lot of information. Your output may look somewhat
different, but it should have the same overall structure. The information
we are looking for is the very first letter of each line containing a file/directory
name. A 'd' indicates that the name is that of a directory; if it were a file,
you would see a hyphen ('-') instead. The various directories contain the material
for this course, and you should leave most of them alone at this point. You
see that there is a directory 'unit?' corresponding to each major chapter ('unit')
of this tutorial. Consequently, we should look into the directory unit1 since
this is the first chapter (unit 1). We shall also need the directory unit1-examples
later on. To do this, we need to change into the directory, which is done by
the change directory = cd command. Type cd unit1. To verify where you are,
type pwd and then ls -l. This time you should see something similar to
ls -l
total 5
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file1
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file2
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file3
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file4
-rw-rw-r-- 1 stefan stefan 14 Apr 11 13:59 file5
showing you that there are five files (note the initial '-') in directory
unit1. This is a good place to try out getting back into the home directory
by typing cd without any additional argument. Use pwd and ls to verify that
you are indeed back, then cd unit1 to get back.
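To see the difference between the 'd' and the '-' without touching your course directories, one can build a tiny example in a throw-away directory. This is a sketch under a few assumptions: the scratch directory comes from mktemp, the names unit1 and file1 merely mimic the ones above, and cut is used only to pick out the first character of the ls -l line.

```shell
# Work in a throw-away directory so that nothing in your account is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir unit1                      # a directory
echo "This is file1" > file1     # a regular file with one line of text

# The first character of each entry is 'd' for a directory, '-' for a file.
ls -l

# Pick out that first character explicitly for each name.
dir_flag=$(ls -ld unit1 | cut -c1)
file_flag=$(ls -ld file1 | cut -c1)
echo "unit1: $dir_flag   file1: $file_flag"
```

The last line prints d for unit1 and - for file1.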
To create directories of your own use the make (a) directory, mkdir <dirname>,
command, where <dirname> is the name you want to give to the directory.
Before we try this out, a few hints concerning names of files and directories.
Under Unix, there is practically no restriction on the length of a file or directory name.
Names are case sensitive, and they can contain or consist exclusively of numbers.
So filename, directoryname, DirectoryName, dir1, DIR1, 111 and file.name.with.some.explanations
are all valid and would refer to distinct files or directories. You have to
be careful with respect to certain characters, e.g.
*?"'~;<>/\()[]{}
which have a special meaning under Unix, and you should not use them as
part of a name. (This is a simplification. Many of the commands discussed and
in particular the special symbols just mentioned are not directly handled by
Unix, the operating system, but by the (command) shell. However, this differentiation
is initially of little importance.) You may have seen in Windows95 that one
can embed spaces in descriptive filenames, e.g. to use "This is my masters thesis"
for the Word document containing your Diplomarbeit. You can do the same in
Unix using a trick, but in general such filenames are simply a hassle (in Unix
and in Windows95, should anything go wrong), so I won't show you how to do
it. If you like such a long name, then use "This_is_my_masters_thesis" which
will cause no problems.
Back to mkdir. Let's create two directories in unit1, where you should
currently be (check with pwd). Type mkdir dir1 dir2 (or choose names of your
own), then verify with ls -l that there are indeed two new directories. Now
cd dir1 (change directory into dir1). Obviously, it's empty (as you can see
by ls -l) since it was just created. Now we encounter an interesting problem.
How do we get back from dir1 to unit1? Well, there are several ways --- let's
start with the one that makes clear the general method of moving between directories:
Type pwd, which, as you already know, shows you the full path ("Pfad") of
the directory you are currently in. Most likely, you will see /usr/people/<uname>/unit1/dir1
as the path. When you typed cd dir1 in unit1, Unix knew that what you really
wanted to do was to cd /usr/people/<uname>/unit1/dir1. The reverse reasoning
leads to the slow way of stepping up in the directory hierarchy. Type cd /usr/people/<uname>/unit1.
Verify that you are indeed back in unit1, then cd to dir1 again. Having to
type a long pathname every time just to get back one level in a directory hierarchy
is tedious. Fortunately, there is an abbreviation for your home directory,
the character "~". You can, therefore, get back to unit1 by typing
cd ~/unit1. Good, but it gets even simpler. Go back to dir1. Instead
of typing ls -l, type ls -al. The option -a (all) shows you truly everything
that is in a directory. You should see something like:
ls -al
total 2
drwxrwxr-x 2 stefan stefan 1024 Apr 11 14:46 .
drwxrwxr-x 4 stefan stefan 1024 Apr 11 14:46 ..
The -a option revealed two otherwise hidden directories, named "." and
"..". Just as the "~" is a shorthand for your home directory, the
"." is a shorthand for the current directory and ".." is a shorthand for
the directory immediately above in the hierarchy. Let's try this out. First,
type cd ., that is cd followed by a space and a dot. Use pwd to verify that
you are still (or again) in dir1. While that was not too helpful, cd .. is.
Now you are back in unit1. You can carry that concept further. From dir1, type
cd ../.. Typing pwd, you see that you are in your home directory; i.e., you
moved up two directory hierarchies from dir1. (Remember, this you could have
achieved more easily with just cd and no argument.)
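The shorthands "." and ".." can be checked in a scratch copy of the unit1/dir1 layout. This is a minimal sketch; the scratch directory and the variable names (here, up_one, up_two) are assumptions made for the example, and basename is used only to print the last component of the path.

```shell
# Build a scratch copy of the layout unit1/dir1 and start inside dir1.
scratch=$(mktemp -d)
mkdir -p "$scratch/unit1/dir1"
cd "$scratch/unit1/dir1"

cd .                             # "." is the current directory: nothing changes
here=$(basename "$(pwd)")
echo "after 'cd .':     $here"   # dir1

cd ..                            # ".." is the directory one level up
up_one=$(basename "$(pwd)")
echo "after 'cd ..':    $up_one" # unit1

cd dir1
cd ../..                         # two levels up from dir1
up_two=$(pwd)
echo "after 'cd ../..': $up_two" # the scratch directory itself
```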
Position yourself in directory unit1. It contains five files (file1, file2
etc.) and two directories (dir1 and dir2, or whatever names you chose). Let's
look at what is in these files; you do this by typing cat file1 etc., e.g.
cat file1
This is file1
Not a very original content, but useful to see what happened to files when
we alter their names, move them around or copy them, which we are going to
do next. (The cat command can actually do quite a lot of things, but we can't
go into details here). Let's start with renaming a file. For Unix, this is
a special case of moving a file, so the command is called mv. Try it out by
typing mv file1 newfile1. When you do a ls, you will see that file1 has vanished
and that there is a new file newfile1. With cat, you can convince yourself
that newfile1 has the same content as file1. The mv command can also move files
between different directories, e.g. mv newfile1 dir1. Now newfile1 has vanished
from unit1; however, there is now a file newfile1 in dir1. Alternatively, you
could also move and rename a file in one step, e.g., mv file2 dir1/newfile2. Convince
yourself that newfile2 in dir1 has the content of the previous file2 in unit1
(from which it has disappeared). To copy a file, use the command cp instead
of mv. Otherwise, the syntax is exactly the same for most things. Try out what
cp file3 newfile3, cp file3 dir1 and cp file3 dir1/newfile3 do.
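The whole sequence of mv and cp commands can be replayed safely in a scratch directory. The sketch below assumes a throw-away directory from mktemp that plays the role of unit1; the file contents are recreated with echo so that the example is self-contained.

```shell
# A scratch directory plays the role of unit1.
scratch=$(mktemp -d)
cd "$scratch"
mkdir dir1
echo "This is file1" > file1
echo "This is file2" > file2
echo "This is file3" > file3

mv file1 newfile1          # rename within the same directory
mv newfile1 dir1           # move into dir1, keeping the name
mv file2 dir1/newfile2     # move and rename in one step

cp file3 newfile3          # copy under a new name
cp file3 dir1              # copy into dir1, keeping the name
cp file3 dir1/newfile3     # copy into dir1 under a new name

ls . dir1                  # list both directories
cat dir1/newfile2          # prints: This is file2
```

After mv the original name is gone, whereas after cp the original file3 is still present next to its copies.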
Since we created a lot of copies in the last step, we should get rid of
(remove) some of the duplicates with the rm command. Be warned that in Unix
there is no "undo" or "undelete" option for the rm command. In both unit1
and dir1 we have copies of file3; the command rm newfile3 dir1/file3 dir1/newfile3
deletes all of these copies (the original file3 stays). Similar to the mkdir command, rm accepts more than one
argument. To remove a directory, use rmdir instead of rm. Try it out and (in
unit1) execute rmdir dir1 dir2. You will get an error saying that dir1 is not
empty, and ls shows you that while dir2 is gone, dir1 with all its files is
still there. In other words, deleting a directory requires two steps: (i) remove
the content of the directory, then (ii) the directory itself. Do that now for
dir1. This is admittedly tedious, but it introduces a margin of safety.
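The two-step removal of a non-empty directory looks as follows in a scratch directory. This is a sketch, not a transcript: the exact wording of the rmdir error message differs between systems, so it is not reproduced here.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir dir1 dir2
echo "This is file3" > dir1/file3

# rmdir removes the empty dir2 but refuses to remove the non-empty dir1.
rmdir dir1 dir2 || echo "rmdir reported an error for dir1"
[ -d dir1 ] && echo "dir1 is still there"
[ -d dir2 ] || echo "dir2 is gone"

# Step (i): remove the content; step (ii): remove the directory itself.
rm dir1/file3
rmdir dir1
```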
Finally, I want to introduce you to a slightly obscure, but useful cousin
of cp and mv, the creation of symbolic links with the ln -s command. It is
sometimes advantageous to have in a directory something which for all practical
purposes behaves as if it were a file in this directory, but which in reality
just points to a real file in a different directory. Let's look at a simple
example. Recreate dir1 and cd into it. It's now empty again. Now we make a
symbolic link to file3 in unit1. In dir1 type
ln -s ../file3 link2file3
ls -l
total 0
lrwxrwxrwx 1 stefan stefan 8 Apr 11 15:38 link2file3 -> ../file3
cat link2file3
This is file3
The ordering of names in the ln -s command is crucial. The name of the
real file comes first, then the name of the link. Don't omit the -s in the
command, otherwise the result will be unexpected. It's not terribly important
that you know exactly how to use this command at this stage, but you need to
be aware of its existence, since in setting up your home directory I made heavy
use of this feature. Change into directory ~/pdb. When you do a ls -l,
you will see that the "files" are symbolic links to files in a location that
does not belong to your workspace. This has several advantages. For all practical
purposes, each of you has a private copy of these datafiles, yet the files
exist physically only once, which saves some disk space. You can read the files
via the links, but you cannot change their content by mistake. The same concept
was used for the content of the other ~/units and the ~/datadir
directories. You cannot change the content of a file pointed to by a symbolic
link if that file is outside of your workspace and you have no permission to
write to it; however, you can make a copy, which you then can edit. Later,
you will have to do this for the various CHARMM input scripts to modify them
for the exercises. However, the symbolic link is still present and guarantees
you an unmodified copy in case you mess up or delete a file by mistake. Let's
see how that works. Go back to ~/unit1/dir1. Copy link2file3 to file3
in dir1 (cp link2file3 file3). Look at the differences between file3 and link2file3
with ls -l, but note that their content is identical. Now delete the content
of dir1 (rm link2file3 file3). Directory dir1 is empty, but file3 in unit1,
to which link2file3 pointed, is still there. Thus, should you delete by mistake
a symbolic link in one of your other directories (which you should avoid!),
the original files will still be there.
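The link experiment can also be run in one go in a scratch directory. The sketch below assumes a throw-away copy of the unit1/dir1 layout; the test [ -L name ], which is true for symbolic links, is used here in addition to ls -l to tell the link and the real copy apart.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/unit1/dir1"
cd "$scratch/unit1"
echo "This is file3" > file3

cd dir1
ln -s ../file3 link2file3      # real file first, then the name of the link
ls -l                          # the link line starts with 'l'
cat link2file3                 # prints: This is file3

cp link2file3 file3            # cp follows the link and makes a real copy
[ -L link2file3 ] && echo "link2file3 is a symbolic link"
[ -L file3 ] || echo "file3 in dir1 is an ordinary file"

rm link2file3 file3            # removes the link and the copy only
cat ../file3                   # the original is untouched: This is file3
```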
So far, we have always manipulated either a single file (cp fil1 fil2)
or listed each file explicitly (rm fil1 fil2). Often, however, you want to
operate on a group of files simultaneously. One does this with the help of
the so-called wildcard operators (one possible translation would be "Platzhalter").
The most important ones, which should be sufficient for you, are the * and the
?. To try what they do, reuse or recreate the directory dir1 in ~/unit1.
Then, in ~/unit1/dir1 type
touch f1 f2kl fil1 file1 fiiiiiiiiiiiiiiiiiiiiiiiiile1 f2 fil2
(You don't have to remember the touch command; here it's used to create
files having a name but no content (i.e., empty files), so that we have something
to experiment with.) Then try out what
ls
ls f?
ls f*1
ls *
do.
One sees that
- ? replaces exactly one letter or number in a filename (or directory
name); thus, ls f? listed f1 and f2, but nothing else. (In the unit1 directory,
ls file? would match all remaining filenames, file3, file4 and file5.)
- *, by comparison, replaces zero, one or more letters (and/or numbers) in a filename
or directory name. Therefore, ls f*1 matched f1, f2k1, fil1, file1, as well
as fiiiiiiiiiiiiiiiiiiiiiiiiile1; however, it did not match f2 or fil2. In
our example, both ls f* and ls * match all filenames.
As useful as wildcards are, there are some limitations and dangers. First,
before you do anything "final", such as rm f*, check with ls f* that you
indeed select only the files that you want to delete --- maybe you only wanted
to get rid of f*1? Second, it must be unambiguous what the wildcard operation
should do. A command like ls f* is completely unambiguous. However, a command
such as cp f? fil? is not, and cp will complain with an error that often may
not make too much sense (you may want to try what happens). Nevertheless, there
is one very useful combination of wildcards and cp or mv. If you give cp (or
mv) more than two arguments, the last one has to be a directory. In this case
all other files are copied (moved) to that directory. To demonstrate this,
create a directory in dir1, e.g., subdir1. The command mv f? subdir1 moves
the two files f1 and f2 into subdir1.
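A safe way to check what a wildcard selects is to let the shell expand it for echo before using it with rm or mv. The following sketch uses a scratch directory and a shortened version of the long filename; set -- followed by $# is just one way to count the matches, and the variable names are chosen for illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch f1 f2k1 fil1 file1 fiiiiile1 f2 fil2

# echo shows what the shell expands a pattern to before any command sees it.
echo f?                      # prints: f1 f2
echo f*1                     # every name starting with f and ending in 1

# Count the matches of each pattern.
set -- f?;  n_q=$#
set -- f*1; n_star1=$#
set -- f*;  n_all=$#
echo "f? matches $n_q, f*1 matches $n_star1, f* matches $n_all files"

# With more than two arguments, the last argument of mv must be a directory.
mkdir subdir1
mv f? subdir1
ls subdir1                   # f1 and f2 are now in subdir1
```

The counts are 2, 5 and 7: f? selects only f1 and f2, f*1 leaves out f2 and fil2, and f* selects everything.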
Editing text --- emacs
Emacs is the text editor I use regularly, so I recommend that you do the
same in this course. The program has an interactive tutorial, which one can
work through in 30 minutes to an hour; afterwards, you should be fairly well
prepared to use it. Start emacs by typing
emacs &
on the command line (remember, the & puts the command in the background
and you can continue to use the command window for other things). Once the
program has started (this may be slow!) in a separate window, move the cursor
over the new window and/or click on it, and type Ctrl-h t to start the tutorial
(While pressing the Ctrl key type an h, release the Ctrl key and type t. On
a German keyboard, the Ctrl key is the "Strg-Taste"!). Work through the tutorial;
in addition look at the facilities offered by the menu bar by clicking at it
with the mouse, just as you would do in a Windows program. When you are done,
continue to read, but don't quit emacs!
If you are accustomed to Word or some other word-processing program under
Windows, please note the following difference when editing or writing textfiles
with emacs (or any texteditor). A word-processor breaks the line automatically
for you ("automatischer Zeilenumbruch"). Emacs does not do this and when
a line gets too long, you have to hit the Enter key to start a new line. This
is actually as it should be for our purposes since a command in CHARMM normally
is one line of text (so you don't want the editor to split it into two lines
because it thinks the line is too long!). You can have commands in CHARMM that
are longer than one line, but you have to mark that specifically (you need
to put a '-' at the end of the line that is continued on the next).
Once emacs has started, it is quite fast; starting, however, is slow as
you may have experienced. One should, therefore, not start emacs to edit a
single file, save the file and then quit emacs again; instead, one starts one
emacs session and does all editing operations in it (remember that you can
split the window or open a second window (frame) to look at more than one file
simultaneously!). Exit emacs only before you log out. Remember that C-x C-f
reads a file, C-x C-s saves a file, C-x k kills a buffer (file) which doesn't
interest you anymore, and C-x i inserts the content of a file at the cursor
position of the current buffer. To save a buffer under a different name, type
C-x C-w; emacs then prompts you for the filename (you can even write to a different
directory). C-x b switches between buffers, C-x C-b gives you a list of all
buffers. All of these commands are also accessible from the menubar.
There is one very useful command that is not explained in the tutorial,
the search and replace function of emacs. To test it, write a short file in
emacs (it should contain several lines) and deliberately write one word wrong
repeatedly. Alternatively, there is a short file in ~/unit1-examples/replacement.txt,
which you can use to work with. Let's assume you have written Curs instead
of Kurs as I did in replacement.txt. Position the cursor at the beginning of
the file and start search and replace by M-%. You are prompted for the
search string, type Curs followed by the Return key, then for the replacement
string, Kurs followed by the Return key. Immediately, the first occurrence
of Curs is highlighted (or, at least, the cursor has moved there) and emacs
asks you what to do. Hitting Space or y replaces Curs by Kurs and the next
instance of Curs is searched for. Hitting Backspace or n instead skips the
replacement and emacs moves on to the next occurrence of Curs. When all occurrences
of Curs have been visited, the command ends and emacs tells you how many replacements
were made. Make sure to try it. You can also type ? to get a list of all options
that you can type when emacs prompts you whether to replace a word or not.
Since you now know a little bit about emacs, it makes sense to also use
it to send (and read) e-mails whenever you need to do so. Remember, you should
send me mail using the username course0; your account names are course1, course2
etc. Just activate the respective functionality from the menubar. New items
appear on the menubar, allowing you to carry out the most important operations
without the need for knowing the abbreviations of the command. In addition,
in the help menu you always find an item "Describe-mode" which gives you
a terse description of what special commands are available. Please do not use
this account to send mail outside the group. You can do this, but (i) these
accounts expire as soon as the course is over and (ii) you do not have true
privacy in these accounts.
The command (terminal) window
The editing commands you just learned for emacs also help you to use the
terminal window more effectively. Start to type some command at the prompt.
With C-f and C-b you can move back and forth on the line, inserting or deleting
text where needed when you make a typo. Next, type C-p and you will see the
last command you executed from this window before. Typing C-p again gives you
the previous to last command etc. C-p and C-n go back and forth through the
history of commands. Finally, you do not always have to type a full filename
(directory name). Type ls and the first character of a filename in your current
directory. Then hit the Tab key. The filename gets completed as much as this
is unambiguously possible. Whenever the autocompletion cannot continue, you
hear a slight beep and/or your terminal window blinks. Then you have to give
an additional character. (This is the same mechanism available in emacs when
you want to read a file.)
"Redirection" and "pipes"
To run CHARMM, one needs to use a feature of Unix called input/output redirection.
The file ~/unit1-examples/sample.output is an example of actual CHARMM
output. Look at it with emacs. The content of the file will become clear over
the next days. You see that CHARMM produces lots of detailed information. Frequently,
one is only interested in a subset of data and would like to have them in a
more compact form. Two fairly advanced Unix utilities, grep and awk, described
in the next subsection, can help you accomplish exactly this. To use them effectively,
one also needs to be familiar with the concept of input / output redirection
including "pipes".
Switch back to (i.e., make active) a command window (by moving the mouse
over it and/or clicking on it). You know that the cat command allows you to
look at the content of a file. Try using it to look at sample.output. While
you see the content of the file, it flashes by too quickly to read. Of course,
we have emacs, but let's pretend we don't. We need to put the output of cat
into a program that allows us to view a long text one page at a time. One such
program is more. Connecting two commands on the commandline with the symbol
"|", the so-called pipe, tells Unix to hand over (redirect) the output of
the first command as input to the second command. Try this out by typing
cat sample.output | more
The output of cat is stopped after the first page. By hitting the spacebar
key you can scroll through the document screen by screen; hit q to quit. You
have just redirected the output of one program (cat) into another (more). As
another example, let's look at a directory that contains really a lot of files.
Compare what the following two commands do.
ls -l /usr/bin
ls -l /usr/bin | more
To give you a third example, there is a small utility in Unix that counts
the words in a file, wc. Pipe the output of cat sample.output into wc; it tells
you the number of lines (first number), number of words (second number), and
the number of characters in sample.output (third number). Once you are familiar
with more Unix utilities, you will get accustomed to building chains of commands,
i.e.,
command1 | command2 | command3 | ... | lastcommand
One can also redirect the output of a command to a file with the symbol
">". Try the following two commands:
cat sample.output > sample.copy
cat sample.output | wc > sample.words
and look at the two new files (sample.copy, sample.words) in the editor.
The first one effectively copied sample.output to sample.copy and just illustrates
the concept of redirection to a file. (We could have achieved the same effect
with cp sample.output sample.copy.) The second is an example of how to capture
the output of a command for later reuse. A command can also read input from
a file. This is done by the operator "<". Both input and output redirection
will be used when we work with CHARMM, which is usually started by
charmm < input.file > output.file
CHARMM reads what it has to do from file input.file (or whatever name you
have given to the script) and writes the output to file output.file (this is
how sample.output was generated).
Some of the examples above are a little contrived, since both more and
wc accept a filename as an argument, so there is no need to pipe the
output of cat into them (cat sample.output | wc is accomplished more
easily by wc sample.output).
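To see the three redirection symbols side by side, here is a minimal sketch. It does not use sample.output; instead it creates a tiny stand-in file, demo.output, whose name and content are invented for this illustration. The last command combines "<" and ">" in the same way as the charmm call shown above.

```shell
# Create a small, made-up stand-in for sample.output (three lines).
printf 'line one\nline two\nline three\n' > demo.output

# Pipe: the output of cat becomes the input of wc -l, which counts lines.
cat demo.output | wc -l

# Input and output redirection combined: wc reads demo.output via "<"
# and its result is written to demo.count via ">".
wc -l < demo.output > demo.count

# Look at the captured result (the file reports 3 lines).
cat demo.count
```

Note that with "<" wc never sees a filename, so it prints only the number.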
grep and awk
Having shown you the basics of redirection that you will need to work with CHARMM,
we turn our attention to the grep and awk utilities. grep searches for the
occurrences of a string or a (search) pattern in a file
(or a group of files, if you use wildcards) and prints the lines containing
the string/pattern. The much more powerful awk scans a file for occurrences
of a certain string or search pattern, then decomposes every line found into
elements and allows you to manipulate them. The two commands are invoked as
follows
grep <searchpattern> file
awk 'commands' file
awk -f script file
where file is always the name of the file that is worked on. Here is
an example. When CHARMM prints out energies it has computed, it puts the string
ENER in front of each output line. Let's use grep to search for the string
ENER in sample.output. Type
grep ENER sample.output
You will see several lines that contain the string ENER somewhere in them.
Not all of them are interesting; one even contains the word GENERATE, which
most likely has nothing to do with the energy of the system. This is where
search patterns are useful. From looking at the previous output, it appears
that the most interesting lines are the ones that begin with the string ENER.
You can tell grep to look specifically for these lines by putting a "^" in
front of ENER, i.e.,
grep ^ENER sample.output
This time you only get lines beginning with ENER. Strings, or better patterns,
like ^ENER are referred to as regular expressions. They are what makes grep (and
awk) so powerful; consult a textbook on Unix to learn more about them.
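The difference between a plain string and an anchored pattern can be tried out on a small file. The sketch below uses demo_ener.out, a made-up four-line file that only imitates the situation described above; it is not real CHARMM output. I put the pattern in single quotes as a precaution, since some shells give "^" a special meaning.

```shell
# Made-up file: two lines start with ENER, two merely contain it.
cat > demo_ener.out << 'EOF'
GENERATE main
ENER> 0 -10.94400
 some other text mentioning ENER
ENER> 1 -11.20000
EOF

# Plain string: every line containing ENER anywhere, including GENERATE.
grep ENER demo_ener.out

# Anchored pattern: only the lines that begin with ENER.
grep '^ENER' demo_ener.out
```

The first command prints all four lines, the second only the two that start with ENER.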
Try out another example. Skimming through sample.output, you see that there
are many lines starting with DYNA (these are the output from a molecular dynamics
simulation). In a later unit, we shall study in detail the content of the
lines starting with DYNA>. To extract these lines from sample.output, use
grep 'DYNA>' sample.output
The single quotes are necessary since the character ">" would confuse
Unix otherwise (remember it's a redirection operator). The quotes tell Unix
to ignore its special meaning. The command does what we want it to do, but
again there is too much output for one screen. Thus, you can either pipe the
output into more (| more) or redirect it to a file (> file), which you
can then view with emacs.
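Here is a short sketch of why the quotes matter, again on an invented file (demo_dyna.out) rather than on sample.output. The unquoted variant is only described in a comment, because running it would destroy the input file.

```shell
# Made-up file with two DYNA> lines and one DYNA PROP> line.
printf 'DYNA> 0 0.00000 -10.94400\nDYNA PROP> 1.0 2.0\nDYNA> 1 0.00100 -10.95000\n' > demo_dyna.out

# Quoted: ">" is part of the search string; the trailing "> file"
# outside the quotes is an ordinary output redirection.
grep 'DYNA>' demo_dyna.out > demo_dyna.lines

# Count the extracted lines.
wc -l < demo_dyna.lines

# Do NOT run the unquoted form, grep DYNA> demo_dyna.out:
# the shell would read it as "grep DYNA" with its output redirected
# into demo_dyna.out, overwriting the very file you wanted to search.
```

Only the two lines containing DYNA> end up in demo_dyna.lines; the DYNA PROP> line does not match.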
The grep command let us extract the lines of interest from sample.output.
The third item on each line is the simulation time in picoseconds, the following
number is the total energy at this step of molecular dynamics. Assume now that
you would like to plot energy as a function of simulation time. You can tell
most plotting programs to select certain columns of data from a file and to
ignore others, but the presence of the string DYNA> in each line may cause
problems. What we really would like to do is to extract all lines from sample.output
as we did with the above grep command, but to just print the third and fourth
item. This would be a file that any plotting program can easily handle. One
can accomplish all this with a single command using awk. Try
awk '/DYNA>/ {print $3, $4;}' sample.output | more
Instead of piping the output into more, you might want to redirect
it to a file, e.g., > extract1.dat. You see that we indeed select the same
lines as with the corresponding grep command, but only the third and fourth
elements of each line are printed. In the above command, the text within
single quotes contains the commands for awk; these operate on sample.output, and the
result is piped into more. awk works roughly as follows: Similarly to grep,
it scans the file for a string or regular expression. The search string is
put between a pair of slashes, /string/, in the above example /DYNA>/. Each
line containing the string can then be manipulated. The program automatically
breaks up a line into elements. By default, any run of spaces or tabs is considered
to separate two elements. E.g., consider the first line that is matched by the search pattern
/DYNA>/
DYNA> 0 0.00000 -10.94400 427.49675 -438.44075 386.56990
The first element of this line is DYNA>, the second 0, the third 0.00000
etc. You can refer to the elements with the variables $1, $2,
$3 etc. (The variable $0 is set to the full line matched by the
search pattern.) awk is a complete programming language; here we just use a
single command, print. We want to print the third and fourth element of each
matched line, and this is exactly what print $3, $4; does. (Commands
within the braces are separated by ";" or a newline; the ";" after the last command is optional.)
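The field splitting described above can be checked on a small file. In the sketch below, demo_md.out is an invented stand-in for sample.output: its first line is the DYNA> line quoted above, while the remaining lines and all file names are made up for the illustration.

```shell
# Stand-in for sample.output; only the first line is taken from the text above.
cat > demo_md.out << 'EOF'
DYNA> 0 0.00000 -10.94400 427.49675 -438.44075 386.56990
DYNA PROP> 1.0 2.0 3.0
DYNA> 1 0.00100 -10.95000 426.00000 -436.95000 385.00000
EOF

# For every line containing DYNA>, print field 3 (time) and field 4 (energy),
# and redirect the result to a data file instead of piping it into more.
awk '/DYNA>/ {print $3, $4}' demo_md.out > demo_extract1.dat

cat demo_extract1.dat
```

The resulting file contains one "time energy" pair per line, with the DYNA> label stripped, which is the form a plotting program can read directly.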
There are a few more simple things you should know about awk. (i) Often,
all awk commands are placed in a file. I did this for you and created the command
file extract1.awk; take a look at it with emacs. Aside from lines that are
comments (those beginning with #), you see the command used above:
/DYNA>/ {print $3, $4;}
When there is a command file (a script) for awk, it is invoked as
awk -f extract1.awk sample.output | more
The result is the same as specifying the command(s) (in single quotes)
directly on the command line. (ii) You can search for more than one pattern
in the same run, and of course execute more than one command for each line
matched, i.e.
/pattern1/ {command1a; command1b; command1c;}
/pattern2/ {command2a; command2b;}
awk reads the file once, line by line. Each line is tested first against
pattern1 and then against pattern2; command1a, command1b and command1c are
executed if it matches pattern1, and command2a and command2b if it matches
pattern2. To try this out, uncomment the line containing
/DCNTRL>/ {print "# ", $0;}
in extract1.awk. When you now execute awk, there is one additional line
of output. For the line containing DCNTRL>, the whole line ($0)
is printed; before it we place the two characters "# ". A line starting
with "#" is ignored by the plotting program we are going to use later
on (gnuplot). The line following DCNTRL> in sample.output reports which
molecular dynamics method was used by CHARMM. Thus, the first line printed
by the sequence of commands above adds a "title" to the data file. (iii)
Finally, as a last example let me introduce you to the getline command of awk,
which is very useful in connection with extracting data from CHARMM output.
In sample.output, you find blocks of lines of output, each of which begins
with
DYNA>
DYNA PROP>
DYNA INTERN>
DYNA EXTERN>
DYNA PRESS>
So far, we have obtained items only from the lines beginning with DYNA>.
The other entries are of course also of interest. For example, the second number
in the line beginning with DYNA EXTERN> is the electrostatic energy computed
for this molecular dynamics step. Assume that you want to create a datafile
in which each line consists of the time and the total energy as before, plus
the electrostatic energy as third number. We cannot search first for DYNA>
and then for DYNA EXTERN> since it would be difficult to print all quantities
of interest on a single line. Take a look at extract2.awk, which shows you
one possible solution:
/DYNA>/ {
time=$3;
energy=$4;
getline;
getline;
getline;
elec=$4;
print time,energy,elec;
}
As before, we only search for DYNA>. For each line matched, several
commands are carried out. First, instead of printing them immediately, the
contents of $3 and $4 are stored in variables time and energy.
Then getline is executed three times. This command replaces the currently matched
line ($0) by the one immediately following it. Thus, after three getlines,
we are now on the line beginning with DYNA EXTERN>. (You should look at sample.output
simultaneously.) Each line read by getline is decomposed as usual; thus, we
can access the electrostatic energy as $4. It is stored in variable
elec; then time, energy and elec are printed.
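To make the getline logic concrete, the sketch below runs it on an invented file, demo_blocks.out, which contains two blocks with the five line types listed above. All numbers except those on the first DYNA> line are made up. As a second variant, which is my own alternative and not what extract2.awk does, the values from the DYNA> line are simply remembered in variables and printed once the DYNA EXTERN> line of the same block is reached. This variant does not depend on DYNA EXTERN> being exactly three lines below DYNA>.

```shell
# Invented stand-in for sample.output with two output blocks.
cat > demo_blocks.out << 'EOF'
DYNA> 0 0.00000 -10.94400 427.49675 -438.44075 386.56990
DYNA PROP> 1.0 2.0 3.0
DYNA INTERN> 4.0 5.0 6.0
DYNA EXTERN> -20.0 -300.5 0.0
DYNA PRESS> 0.0 0.0 0.0
DYNA> 1 0.00100 -10.95000 426.00000 -436.95000 385.00000
DYNA PROP> 1.1 2.1 3.1
DYNA INTERN> 4.1 5.1 6.1
DYNA EXTERN> -21.0 -301.5 0.0
DYNA PRESS> 0.0 0.0 0.0
EOF

# Variant A: the getline approach described above.
awk '/DYNA>/ {
    time = $3; energy = $4;
    getline; getline; getline;   # now on the DYNA EXTERN> line
    elec = $4;
    print time, energy, elec;
}' demo_blocks.out > demo_a.dat

# Variant B (an alternative, not from extract2.awk): two patterns,
# with variables carrying the values from one line to the next.
awk '/DYNA>/        { time = $3; energy = $4 }
     /DYNA EXTERN>/ { print time, energy, $4 }' demo_blocks.out > demo_b.dat

# Both variants should give the same data file.
cmp demo_a.dat demo_b.dat && cat demo_a.dat
```

Note that the pattern /DYNA>/ does not match the line DYNA EXTERN>, since there the ">" does not directly follow DYNA; this is what keeps the two patterns of variant B apart.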
Let me stress one more time that these examples barely scratch the surface
of what awk can do. I hope, however, that they gave you some idea as to the
usefulness of this utility. In my experience, when you do serious work with
CHARMM, it pays to familiarize yourself with awk. Other similar programs exist
(e.g., Tcl or Perl), but I have found awk easiest to use for what one usually
needs (a quick method of taking data of interest buried in the CHARMM output
and printing them in a form so that another program can easily use them).
Getting additional help
During this course, you can obviously always ask me. There are copies of
a list with important commands in emacs; there is also a book on emacs. Further,
we have one introductory text and one reference book on Unix, and a book on
awk; feel free to look at them, but please don't take them away! Finally, Unix
has an online help system, the so-called man(ual) pages. Try them out by typing
man cp. You may be surprised how many more options even this "simple" command
has. The information found in man-pages is usually complete, but in a very
terse format. Depending on the machine, you can either only go downwards in
the manpage (enter for one line, space for a page; similarly to more), or use
emacs-style commands to move around, in particular M-v to go back up again.
Summary of commands introduced in this unit
man : man <command> gives a short summary of the command and all
its options. If you are not exactly sure of the name of a command, try man -k <string>,
where <string> is what you think the name of the command might be. (Helps
sometimes...)
pwd : print working directory. Shows you where you currently are in the
directory hierarchy.
cd : change directory. Without argument, it puts you back into your home
directory. Normally, it takes a single argument, which must be a valid specification
of a directory.
ls : list files and directories. The following options (which can also be combined)
are useful: -l (long), -a (all), -CF (prints a short form which makes clear
what is a directory and what is a file). Often used without argument, the command
also takes one or more arguments (including wildcards). In this case, only
files that match the arguments are shown.
cp : copy one file to another (cp file1 file2), both file1 and file2 can
contain path information. A second useful form is cp file1 file2 ... file10 dir,
which copies file1, file2, ..., file10 to directory dir. Instead of explicitly
giving all files to be copied, one may use wildcards.
mv : move and/or rename a file or directory. The syntax is very similar
to that of the cp command.
mkdir : make directory. Takes one or more arguments (but obviously no wildcards...)
rmdir : remove directory. Takes one or more arguments including wildcards.
The directories need to be empty, otherwise the command fails.
rm : remove files. Takes one or more arguments including wildcards. Caution:
Once a file is deleted, it cannot be recovered.
cat : concatenate. Has a number of uses. The simplest one (cat file1)
prints the content of file1 to the screen.
more : takes a filename as the argument and displays the content of this
file one page at a time. Also very useful in combination with redirection.
grep : grep <string|pattern> file prints all lines of file containing
string or matching pattern.
awk : A complete programming language that is particularly well suited
to manipulate strings and rearrange the content of files. Similarly to grep,
awk scans a file for all lines that contain certain strings or patterns (there
can be more than one). All matching lines can be manipulated very easily. By
omitting a search pattern, one can operate on every line in a file.
> : Redirection symbol which redirects the output of a command to a
file. If the file exists, its content is overwritten.
< : Redirection symbol which makes a command read its input from a file
rather than from the keyboard.
| : Redirection symbol ("pipe") which redirects the output of a command
directly into another command.
* : Wildcard operator replacing zero, one or more characters (including digits).
? : Wildcard operator replacing exactly one character (or digit).
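The difference between the two wildcards is easiest to see on a few empty files. The following sketch creates them in a scratch directory; the directory and file names (demo_wild, run1.out, etc.) are invented for this illustration.

```shell
# Scratch directory with four empty, made-up files.
mkdir -p demo_wild
touch demo_wild/run1.out demo_wild/run2.out demo_wild/run10.out demo_wild/run1.inp

# "?" stands for exactly one character: matches run1.out and run2.out,
# but not run10.out (two characters between "run" and ".out").
ls demo_wild/run?.out

# "*" stands for zero or more characters: matches all three .out files.
ls demo_wild/run*.out
```

In both cases run1.inp is not listed, since the part of the name after the wildcard must still end in .out.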