O for Morons

Tutorial version 1.1

O for Morons - a Beginner's Guide

Written by Gerard J. Kleywegt

Department of Molecular Biology

University of Uppsala

Uppsala, Sweden

E-mail: "gerard@xray.bmc.uu.se"

(c) 1994 - G.J. Kleywegt

Version 0.1 @ 931222 - lay-out of the document

Version 0.2 @ 931223 - chapters 1 and 2; part of 10.3

Version 0.3 @ 931227 - chapters 3, 4, 5 and 6

Version 0.4 @ 931228 - chapters 7 and 8; more macros

Version 0.5 @ 931229 - first, almost complete version

Version 0.6 @ 931230 - appendix 10.2 and 10.6; put into MacWriteII

Version 0.7 @ 931231 - checked spelling; cleaned up

Version 0.8 @ 940102 - "beta-test version"

Version 0.9 @ 940114 - changes, corrections etc.

Version 1.0 @ 940115 - this-will-have-to-do version

Version 1.1 @ 940211 - changes after Protein Engineering course

0.0 - Contents

0.0 Contents

0.1 Preamble

0.2 Notes for instructors

1.0 Getting started

1.1 Let's gO !

1.2 Saving your work

1.3 Well, well, what's all this then ?

1.4 My first protein structure !

1.5 Connecting the dots

1.6 Shake, rattle 'n' roll

1.7 Question time !

2.0 Properties, structure and paint

2.1 More detail

2.2 Structure

2.3 Painting

2.4 It's the Spanish Inquisition !

3.0 What's on the menu today ?

3.1 What's a menu ?

3.2 Fast food

3.3 A la carte

3.4 The customiser is king

3.5 Connecting the dots differently

3.6 What's up, Doc ?

4.0 Again and again and again ...

4.1 Recipes

4.2 What's the ID ?

4.3 What if ?

4.4 Symbolically speaking

4.5 Unix speaking

4.6 Do-it-yourself !

4.7 Inter-course fun

4.8 Tell me why

5.0 What a super position !

5.1 All lipocalins are equal ...

5.2 Match-makers

5.3 Operator, what's the number ?

5.4 Advanced match

5.5 How did you do ?

5.6 Always asking questions

6.0 Superficial voids

6.1 Speleology

6.2 Surfaces

6.3 More superficiality

6.4 What now ?

7.0 Get out the yard-stick !

7.1 What's your angle ?

7.2 Neighbours

7.3 Interactive

7.4 The main chain stays mainly in the plane

7.5 Making flippy floppy

7.6 Walk on the wild side chain

7.7 Answer me, please !

8.0 Teenage, mutant, Ninja proteins

8.1 I think I'm having a fit !

8.2 More mutations

8.3 Cleaning up

8.4 Insertions

8.5 Fine-tuning

8.6 Reconstructing a structure

8.7 One more question, Your Honour

9.0 Fancy pictures

9.1 Sketch it !

9.2 On the table

9.3 Selections

9.4 Your own drawings

9.5 Nice pictures

10.0 Appendices

10.1 Index of O commands discussed in this tutorial

10.2 Inverted index

10.3 Frequently Asked Questions

10.3.1 Why does mutate_insert/replace distort my molecule ?

10.3.2 Is there an easy way to select fancy colours in O ?

10.3.3 Why do the dials move so fast on my SGI ?

10.3.4 What is the best way to backup a molecule ?

10.3.5 Is there a split-screen stereo, and if so, how do I access it ?

10.3.6 When I try to display a map, I get an error condition #43.

10.3.7 How do I reset the parameters for major menu X ?

10.3.8 When I display my CA trace, some of the bonds are missing.

10.3.9 Can I display an RNA backbone ?

10.3.10 How can I centre on a particular spot in space ?

10.3.11 How do I mutate_insert after the last residue ?

10.3.12 How do I mutate_insert before the first residue ?

10.3.13 Why does RSC give crazy values for some residue types ?

10.3.14 Why does O ignore some of the commands in my macro ?

10.3.15 Why doesn't O draw all bonds in my ligand ?

10.3.16 Does O work with nucleic acids ?

10.3.17 How can I give my waters/ligands a different chain-id ?

10.3.18 How can I get H-bonds between protein and ligand ?

10.3.19 Can I get a LOG file from O ?

10.3.20 What does this INST error mean ?

10.3.21 Can I ID the atom at the active centre from a macro ?

10.3.22 How should I contour cavities and surfaces ?

10.3.23 How can I connect the two S-gamma atoms in a disulfide ?

10.3.24 Why doesn't sphere_atom work in my ODL file ?

10.4 Macros

10.4.1 date.omac

10.4.2 colour_code.omac

10.4.3 edb.omac

10.4.4 all_on_off.omac

10.4.5 bad_flip.omac

10.4.6 acid_base.omac

10.4.7 cnos_colours.omac

10.4.8 ball_and_stick.omac

10.4.9 set_prefs.omac

10.4.10 paint_restype.omac

10.4.11 save_view.omac

10.4.12 yasspa.omac

10.4.13 rainbow.omac

10.4.14 nice_residue_colours.omac

10.4.15 sketch_setup.omac

10.5 Other O commands

10.6 Selected datablocks

10.6.1 .message_template

10.6.2 .id_template

10.6.3 .molec_obj_integer

10.6.4 .molec_obj_real

10.6.5 .o-version

10.6.6 .active_centre

10.6.7 .menu

10.6.8 .moving_atom

10.6.9 .colour_names

10.6.10 .error_messages

10.6.11 .dial_real

10.6.12 .timestamp

10.6.13 .symbols

10.6.14 .cpk_radii

10.6.15 .gs_real

10.6.16 .active_colour

10.6.17 .menu_minor_name

10.6.18 .trig_real

10.6.19 .torsion_information

10.6.20 file_o_save

10.6.21 file_o_backup

10.6.22 .solid_hbonds

10.6.23 .lsq_integer

10.6.24 .refi_dict

10.6.25 .menu_integer

10.6.26 .menu_real

0.1 - Preamble

This document was written for use in the "Computers, Graphics & Databases in Molecular Biology" module of the Protein Engineering course of Winter 1994. This document assumes use of O release 5.9.2 on an SGI/Unix workstation.

O is a macromolecular crystallographic modelling program. It can be used to look at biomacromolecular structures, to analyse them, to compare them, to modify them and to build them from scratch (using crystallographic data). The program is described in:

* T.A. Jones, J.Y. Zou, S.W. Cowan & M. Kjeldgaard, "Improved Methods for the Building of Protein Models in Electron Density Maps and the Location of Errors in these Models", Acta Crystallographica, A47 (1991) 110-119

* T.A. Jones & M. Kjeldgaard, "O -- the manual", 161 pp., Uppsala (1993)

Some of the algorithms implemented in O are described in earlier papers, referenced in both documents. Others are described in these documents themselves.

This tutorial is no substitute for the O Manual; they are complementary documents. The O Manual explains every command in turn; this tutorial (hopefully) guides you through a subset of the O commands in a more or less logical order. Also, the tutorial will not teach you how to use O in a biomacromolecular crystallography context, i.e. you will not learn how to build a protein structure from an MIR map (we may write a tutorial for that later).

What you WILL learn is how to draw, analyse and compare protein structures. Extension to other types of molecule (RNA, DNA, ...) is not entirely trivial, but should not be impossible once you master the basics of O.

The tutorial contains nine instructional chapters and an appendix. Working through chapter 1 should take about an hour; chapters 2, 3, 4, 6 and 9 take ~2 to 3 hours each and chapters 5, 7 and 8 take ~3 to 4 hours each. Questions that test material you have just been introduced to are included throughout each chapter and at its end. Try to answer them (there is not always a single correct answer) and write down your answers, so your assistant can check them.

0.2 - Notes for instructors

In the Uppsala Protein Engineering course, the first three days are reserved for an introduction to computers (Unix), graphics (O) and databases (PDB, nucleotide sequences) in Molecular Biology. In the 1994 course, we have room for 32 students using 8 SGI Indys. The students are divided into 16 pairs, 8 of which work in the graphics lab at any given time (the others have other assignments); the groups switch once a day.

The program for this course is as follows (practical O1 is chapter 1 of the tutorial, O2 is chapters 2 and 3, and O3 is chapters 3 and 4; the other chapters are done later on in the course):

Day 1 a.m.: four thirty-minute talks about the following subjects:

* Introduction to Unix

* Introduction to O

* Databases in Molecular Biology

* The Protein Data Bank

Day 1 p.m.: two-hour practical (Unix exercise and O1) for each group of eight pairs

Day 2 a.m.:

* four-hour practical (O2) for eight pairs

* two 90-minute database practicals for two groups of two pairs

Day 2 p.m.:

* four-hour practical (O2) for eight pairs

* two 90-minute database practicals for two groups of two pairs

Day 3 a.m.:

* four-hour practical (PDB exercise and O3) for eight pairs

* two 90-minute database practicals for two groups of two pairs

Day 3 p.m.:

* four-hour practical (PDB exercise and O3) for eight pairs

* two 90-minute database practicals for two groups of two pairs

We work with four teaching assistants (TAs) for every O practical; another TA takes care of all database practicals (using Macintoshes with access to the Internet). The coordinator is present during all practicals and can jump in where needed.

The students who take the course are not expected to know anything at all about computers, Unix, graphics, O, proteins, crystallography, NMR, etc. This is why some basic principles of protein structure have to be introduced briefly in this part of the course. Protein crystallographers who are new to O and use this tutorial will probably want to skip drawing a partial Ramachandran plot by hand ...

Students need to have a work directory; they should be supplied with the coordinates of chain A of P2 myelin protein (a very poor and unrefined model !) and, separately, with those of the fatty-acid ligand inside chain A. Students should have paths (alternative: aliases) to reach the C-shell scripts "run" and "ono". Executables of VOIDOO, MOLEMAN, MAPMAN, ODBMAN and O2D are needed for some exercises. Also, the students must be able to run GhostScript or GhostView, or some other PostScript viewer (printing PostScript files is optional).

Please send questions, suggestions, comments, complaints etc. to

gerard@xray.bmc.uu.se

1 - Getting started

In this chapter you will learn how to run the O program, how to create a new O database file, how to import a molecule into the program/database, how to display a molecule and how to save your database.

New O commands:

Backup_DB

Ca_zone

Centre_ID

Centre_Xyz

Centre_atom

Centre_zone

Directory

End_object

Molecule_nam

Object_name

On_off

Sam_atom_in

Sam_list_seq

Save_DB

Stop

*

Your notes:

1.1 - Let's gO !

During the course, you can run the O program by typing "ono" at the Unix prompt:

unix > ono

Throughout this tutorial, lines starting with " unix > " indicate that you are in the Unix environment, where you have to use Unix commands. Lines starting with " O > " indicate that you are within O, which means that you have to type valid O commands. Things that you have to type in are shown in bold typeface. The output has sometimes been edited to conserve space; this is usually indicated by a series of three dots (...) on a line by itself. Occasionally, actions you have to take with the mouse, function keys or dials will be indicated in curly brackets, for instance: { click "On_off" }. Note that each chapter starts with an ultra-brief description of what you will learn, plus a list of new O commands that you will be taught how to use. There is white space next to these commands which you are encouraged to use for making your own notes (for instance, the syntax of the command, an evaluation of how useful the command is, or a description in your own words of what the command actually does) !

The first time you start the program (make sure that you are in your work directory when you do this !), two soft links to shared directories will be created automatically in your work directory:

unix > ono

... Run 4d_ono

... Link to odat directory not found

... Making a soft link to the odat directory for you

... Link to omac directory not found

... Making a soft link to the omac directory for you

... Executing /nfs/taj/alwyn/o/bin/4d_ono

... For gerard on rigel at Thu Dec 23 15:37:56 MET 1993

The first one, called "odat", contains many files that all users frequently need. The other one, "omac", contains lots of useful O files made by other O users.

Since this is the first time that you are using O, you have to put some general information into the database. You do this by giving O the names of two general files (which reside in the "odat" directory):

* menu.o a file which contains the names of all O commands

* startup.o a file which contains lots of general information

Once you have told O the names of these files, it asks you again for a file name. Since you don't have to supply any more files at this stage, just hit the key marked "Enter" (also known as the "Return", "Carriage Return" or "Linefeed" key):

O > Use of this program implies acceptance of conditions

O > described in Appendix 1 of the O manual

O > O version 5.9.2 , Tue Nov 23 12:20:13 MET 1993

O > Define an O file (terminate with blank): menu.o

O > ...file is formatted

O > menu.o file for O version 5.9

O > Last modified 10-Aug-1993

O > Define an O file (terminate with blank): startup.o

O > ...file is formatted

O > startup.o file for O version 5.9

O > Last modified 8-oct-1993

O > Define an O file (terminate with blank):

Now O finds out that it is missing an important file, which defines which atoms are connected (and, therefore, which bonds have to be drawn in the pictures of proteins that you are going to make). O says "Enter file name", and then suggests a file name enclosed in [square brackets]. This is a general mechanism in O: often when the program asks you a question, it will suggest an answer, printed in between square brackets. Such a value (it can be a file name, but also a number, or a series of numbers) is called a "default (value)". If you want to use this default, all you have to do is hit the Return key. In this case, the default file name is fine, so accept it:

O > Enter file name [ /nfs/taj/alwyn/o/data/all.dat]:

O > Maximum inter-residue link distance = 2.00

O > There were 23 residues.

O > 175 atoms.

This particular file contains information about 23 amino-acid residue types. If you use this file to display a protein structure, all bonded atoms will get lines drawn between them. Inter-residue peptide bonds (between the carbonyl carbon of a residue i and the amide nitrogen of residue i+1) will be drawn if the distance between these two atoms is less than 2.0 Å.
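The 2.0 Å rule for inter-residue peptide bonds is a simple distance test. Below is a minimal Python sketch of that test, assuming the cutoff is applied to the straight-line (Euclidean) distance between C(i) and N(i+1); the function name `peptide_link` and the coordinates are made up for illustration and are not part of O.

```python
import math

# Cutoff taken from O's message "Maximum inter-residue link distance = 2.00"
MAX_LINK_DISTANCE = 2.0  # Å


def peptide_link(c_of_residue_i, n_of_residue_next, cutoff=MAX_LINK_DISTANCE):
    """Return True if a bond C(i)-N(i+1) would be drawn under the 2.0 Å rule.

    Both arguments are (x, y, z) tuples in Å. The test is "strictly less than",
    as described in the text (an assumption about O's exact comparison).
    """
    return math.dist(c_of_residue_i, n_of_residue_next) < cutoff


# Hypothetical coordinates: a real peptide C-N bond is about 1.33 Å long
print(peptide_link((10.0, 10.0, 10.0), (11.33, 10.0, 10.0)))  # True
print(peptide_link((10.0, 10.0, 10.0), (13.0, 10.0, 10.0)))   # False (3.0 Å apart)
```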

The final thing that O wants to know is whether you want to use the display. The default answer is "Yes"; only on rare occasions will your actual answer be "No". In this case, accept the default:

O > Do you want to use the display? [Yes]:

O > Graphics board GL4DXG-4.0

O > Making visibility data structures.

O > O > Trackball on (F7KEY)

Oops-a-daisy: all of a sudden you have an extra window on the screen ! This window is black and contains some information about the version of O that you are using. All the drawings that you are going to make will appear in this window. It is therefore called the "graphics window". The window in which you have been typing is called the "terminal window". In the terminal window, you see " O > ". This is called a "prompt"; it reminds you that you are actually "talking" to O, rather than to Unix. If the prompt doesn't appear, hit the Return key.

1.2 - Saving your work

You probably do not want to type all the file names again the next time you use the program. And later, when your database contains protein atom coordinates etc., you will want O to "remember" all this information in between sessions. However, the database exists only in the memory of the program while the program is active; as soon as the program is stopped, all information is lost, unless you have stored it on a disk in a file.

This is what you are going to do now. The first O command that you will learn is called "Save_db" (note the special character "_" in between the words "Save" and "db"; where is this character on your keyboard ?). The fact that this is the first command that you learn does indeed mean that this is the most important command ! If you type this command, O will make a "carbon copy" of your current database in a real file on a real disk in your current directory. The first time that you use this command, O needs to know what you want to call this file. You may call the file anything you like, but you are advised to use a short, meaningful name, and to add ".o" to that name, so that when you list the contents of your directory you will easily remember that this is an O database file. Do yourself a favour and do NOT use any special characters (*, %, $, #, spaces, etc.) in the names of any files; just use a-z, A-Z, 0-9 and the dot (.), minus (-) and underscore (_) signs:

O > save_db

As1> File_O_save is not defined.

As1> Enter file name [ binary.o]: p2.o
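The file-naming advice above (only a-z, A-Z, 0-9, dot, minus and underscore, plus a ".o" extension for database files) can be checked mechanically. Here is a small Python sketch; the helper names are invented for this example.

```python
import re

# Characters the tutorial recommends for file names: a-z, A-Z, 0-9, '.', '-' and '_'
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name):
    """True if the name uses only the recommended characters (and is not empty)."""
    return SAFE_NAME.fullmatch(name) is not None


def looks_like_o_database(name):
    """True for safe names that follow the '.o' convention, e.g. 'p2.o'."""
    return is_safe_filename(name) and name.endswith(".o")


for candidate in ["p2.o", "p2_backup.o", "my p2.o", "p2*.o", "p2a.pdb"]:
    print(candidate, is_safe_filename(candidate), looks_like_o_database(candidate))
```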

The second command that you will learn is called "Backup_db". It does exactly the same as "Save_db", but to a different file. Sometimes, software or hardware problems may occur which lead to corruption of your database file (the one created with the "Save_db" command). In that case, it's good to have another copy of this file with a different name:

O > backup_db

As3> File_O_Backup is not defined.

As3> Enter file name [ backup.o]: p2_backup.o

Since the database is a sort of file system, you can list its contents. The command to do this is called "Directory". When you type this command, O wants to know which datablocks ("files") you want to have listed. In this case there is no default value indicated in square brackets, but if you just hit the Return key, you will get a listing of all of them. You may also type an asterisk (*) which O takes to mean "all datablocks" (the asterisk is called a "wildcard").

O > directory

Heap> Which param blocks :

Heap> .MENU_MAJOR_NAME C W 76

Heap> .MENU_MINOR_NAME C W 758

...

Heap> .OBJECT_OBJ_DISP_ATOMS T W 720

Heap> .OBJECT_VIS_DISP_ATOMS I W 20

Heap> 159 data blocks used, space for 2000

Heap> 1438 integer/real units used, space for 1000000

Heap> 1282 character units used, space for 200000

Heap> 23820 text units used, space for 500000

Each datablock ("file") has a name (e.g., ".MENU_MAJOR_NAME"), a type (I, R, C or T), a read/write-flag (R or W) and a size. The type can be Integer (0, 1, 2, -6342 and 853526 are examples of integer numbers), Real (0.0, 3.14 or -9823.87, for example), Character (which may be any text of up to 6 characters) or Text (which may be any text of up to 72 characters). The read/write-flag is R for datablocks which will not be saved when you save your database to a file (temporary datablocks); it is W for all your normal datablocks which will be saved. The datablock size of R and I datablocks tells you how many numbers a datablock contains. For example, the datablock ".MENU_INTEGER" contains three integer numbers, and ".MENU_REAL" contains two real numbers. For C-type datablocks, the size is the number of six-character strings in the datablock. For T-type datablocks, the size is the number of characters in the datablock plus one.

The datablock names may contain up to 25 characters (no spaces); they are printed by O in UPPERCASE, but O recognises them irrespective of how you type them. In other words, ".TIMESTAMP" and ".timestamp" are identical datablocks. This is a general feature of O: commands, parameters etc. may be typed in lowercase, UPPERCASE or MiXeD cAsE. There is one exception, namely names of files. The reason for this is that Unix DOES consider "p2.o", "p2.O", "P2.o" and "P2.O" to be four DIFFERENT names.
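The four columns of a directory line, and the case-insensitivity of datablock names, can be captured in a few lines of Python. This is a sketch for reading O's printed output, not anything O provides; it only handles the per-datablock lines (not the summary lines at the bottom), and the character-capacity rule simply restates the size conventions described above.

```python
from dataclasses import dataclass


@dataclass
class Datablock:
    name: str    # stored in UPPERCASE, because O ignores case in datablock names
    kind: str    # I (integer), R (real), C (character) or T (text)
    saved: bool  # True for flag W (saved with the database), False for R (temporary)
    size: int


def parse_directory_line(line):
    """Parse a line such as 'Heap> .MENU_MAJOR_NAME C W 76'."""
    fields = line.split()
    if fields and fields[0] == "Heap>":
        fields = fields[1:]
    name, kind, flag, size = fields  # raises ValueError for summary lines
    return Datablock(name.upper(), kind.upper(), flag.upper() == "W", int(size))


def text_capacity(block):
    """Characters of text a C or T datablock can hold; None for numeric blocks."""
    if block.kind == "C":
        return 6 * block.size   # size counts six-character strings
    if block.kind == "T":
        return block.size - 1   # size is the number of characters plus one
    return None


block = parse_directory_line("Heap> p2a_date T W 25")
print(block.name, text_capacity(block))
```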

At the bottom of the directory list, you see how many datablocks there are in your current database (and for how many there is room inside the computer's memory).

Now stop the program by issuing the command "Stop":

O > stop

As1> Saved

As1> Graphics released.

Note that O automatically saves your database when you stop the program. Check that the database files exist in your directory:

unix > ls -FartCos

1 lrwxrwxrwx 1 gerard 16 Dec 23 15:41 omac

1 lrwxrwxrwx 1 gerard 21 Dec 23 15:41 odat

84 -rw-r--r-- 1 gerard 42794 Dec 23 16:14 p2_backup.o

84 -rw-r--r-- 1 gerard 42794 Dec 23 16:16 p2.o

1.3 - Well, well, what's all this then ?

The next time that you use O (i.e., now), just go to your work directory and type "ono", followed by the name of the file to which you saved your O database:

unix > ono p2.o

... Run 4d_ono

... Executing /nfs/taj/alwyn/o/bin/4d_ono p2.o

... For gerard on rigel at Thu Dec 23 16:18:03 MET 1993

O > Use of this program implies acceptance of conditions

O > described in Appendix 1 of the O manual

O > O version 5.9.2 , Tue Nov 23 12:20:13 MET 1993

O > Loading p2.o

O > Maximum inter-residue link distance = 2.00

O > There were 23 residues.

O > 175 atoms.

O > Do you want to use the display? [Yes]:

O > Graphics board GL4DXG-4.0

O > Making visibility data structures.

O > Making visibility data structures.

O > O > Trackball on (F7KEY)

Now O doesn't need to read "menu.o" etc. again; instead, it asks you immediately if you want to use the display (i.e., the graphics window). Check that the database contains the same number of datablocks as it did the last time you used O:

O > dir ;

Heap> .MENU_MAJOR_NAME C W 76

...

Heap> .OBJECT_OBJ_DISP_ATOMS T W 720

Heap> .OBJECT_VIS_DISP_ATOMS I W 20

Heap> 159 data blocks used, space for 2000

Heap> 1438 integer/real units used, space for 1000000

Heap> 1282 character units used, space for 200000

Heap> 23820 text units used, space for 500000

Hey, wait a minute ! Shouldn't you type "Directory", then hit the Return key, wait for O to ask you which datablocks to list and then hit Return again ? The answer is: you may, but:

* first, you usually don't have to type the full names of O commands (see below)

* second, you may provide parameters to commands (i.e., answers to questions that O asks you when you execute a command) on the same line as the command. This is often done in this tutorial in order to save space. You are, however, encouraged to find out what the parameters of the O commands that you learn are: check the O Manual, or simply type only the command, hit the Return key and let O prompt you for the values of the parameters.

* third, the semi-colon (;) is a special character in O. It means: "rather than waiting for O to ask me for a value for this parameter, I accept the default that O will come up with, whatever it may be".
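The way parameters on the command line line up with O's questions can be modelled as a queue of answers. The sketch below is a hypothetical model written for this tutorial, not O's actual code: each typed word answers the next question in order, a ";" takes that question's default, and any question left without a word is one O would still ask you interactively (shown as None).

```python
def answer_prompts(prompts, tokens):
    """Match typed words to O-style questions (hypothetical model).

    prompts: list of (question, default) pairs, in the order O asks them.
             A default may be a function of the answers given so far.
    tokens:  the words typed after the command name.
    """
    answers = []
    for index, (_question, default) in enumerate(prompts):
        if index >= len(tokens):
            answers.append(None)  # no word typed: O would prompt for this one
            continue
        word = tokens[index]
        if word == ";":
            # Accept the default, whatever it turns out to be
            word = default(answers) if callable(default) else default
        answers.append(word)
    return answers


# Centre_zone asks for a molecule, a first residue and a last residue;
# here the last residue is assumed to default to the first one.
centre_zone = [("Molecule", "P2A"), ("First residue", None),
               ("Last residue", lambda so_far: so_far[1])]
print(answer_prompts(centre_zone, ["p2a", "a1", ";"]))  # ['p2a', 'a1', 'a1']
```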

About abbreviating commands: in O you only have to type the part of a command that makes it unique, i.e. such that it cannot be confused with the name of another command which starts with the same letter(s). Let's take the "Directory" command: all of the following command abbreviations are recognised by O as meaning "Directory":

directory, director, directo, direct, direc, dire, dir

However, "di" is not unique, since there are several other O commands whose names begin with the letters "di":

O > di

O > DI is not a unique keyword.

O > Directory is a possibility.

O > Dial_box is a possibility.

O > Dial_previou is a possibility.

O > Dial_next is a possibility.

O > Dist_define is a possibility.

O > DI is not a visible command.
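The abbreviation rule is a unique-prefix lookup that ignores case. The Python sketch below reproduces the "di" example using only the five command names O listed above; with O's full command list the outcome for other abbreviations may differ, so treat the list as illustrative.

```python
# The five commands O listed as possible matches for "di" (a small subset of all O commands)
COMMANDS = ["Directory", "Dial_box", "Dial_previou", "Dial_next", "Dist_define"]


def resolve(abbreviation, commands=COMMANDS):
    """Return the single command whose name starts with the abbreviation (any case).

    Raises ValueError if no command or more than one command matches.
    """
    key = abbreviation.upper()
    matches = [name for name in commands if name.upper().startswith(key)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"{key} is not a known command")
    raise ValueError(f"{key} is not a unique keyword: " + ", ".join(matches))


print(resolve("dir"))   # Directory
print(resolve("DIST"))  # Dist_define
try:
    resolve("di")
except ValueError as error:
    print(error)
```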

1.4 - My first protein structure !

If you do not have a file called "p2a.pdb" in your work directory, ask your assistant for help. This file contains the coordinates of all non-hydrogen atoms of a protein called P2 myelin protein. This protein contains 131 amino-acid residues and a total of 1038 non-hydrogen atoms. The file is in so-called PDB format (have a look at its contents if you like, but don't change anything in it !).
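For the curious: a PDB file is plain text with one atom per ATOM (or HETATM) line, and each field sits in fixed columns. The Python sketch below reads those columns and counts atoms and residues, which is one way to check the residue and atom counts yourself. It is not how O reads the file; it ignores alternate locations, multiple models and every record type except ATOM/HETATM.

```python
def read_pdb_atoms(lines):
    """Collect atoms from ATOM/HETATM records, using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, CRYST1, END and other records
        atoms.append({
            "name": line[12:16].strip(),      # columns 13-16: atom name
            "res_type": line[17:20].strip(),  # columns 18-20: residue type
            "chain": line[21],                # column 22: chain identifier
            "res_seq": line[22:27].strip(),   # columns 23-27: residue number + insertion code
            "xyz": (float(line[30:38]),       # columns 31-38: x
                    float(line[38:46]),       # columns 39-46: y
                    float(line[46:54])),      # columns 47-54: z
            "b": float(line[60:66]),          # columns 61-66: temperature factor
        })
    return atoms


def count_residues(atoms):
    """Number of distinct residues, identified by chain and residue number."""
    return len({(atom["chain"], atom["res_seq"]) for atom in atoms})
```

Opening "p2a.pdb" and passing the file object to `read_pdb_atoms` gives a list whose length, together with `count_residues`, you can compare with the numbers O reports.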

Before you can draw this structure, you have to store the coordinates and names of the atoms in your O database. This can be done with the "Sam_atom_in" command:

O > s_a_i

Sam> Name of input file: p2a.pdb

Sam> O associated molecule name: p2a

...

Sam> Molecule P2A contained 131 residues and 1038 atoms

Note how the command name has been abbreviated ! O wants to know the name of the coordinate file (there is no default value). You also have to give the molecule a name (no more than 5 characters). This is necessary because you may have the coordinates of ten or fifty different proteins in your database; in that case, you have to be able to tell O which structure you want to make a drawing of, etc. Let's see if the structure exists in the database:

O > dir *

Heap> .MENU_MAJOR_NAME C W 76

...

Heap> P2A_ATOM_XYZ R W 3114

Heap> P2A_ATOM_B R W 1038

Heap> P2A_ATOM_WT R W 1038

Heap> P2A_ATOM_Z I W 1038

Heap> P2A_ATOM_VISIBLE I W 1038

Heap> P2A_ATOM_SELECT I W 1038

Heap> P2A_RESIDUE_NAME C W 131

Heap> P2A_RESIDUE_TYPE C W 131

Heap> P2A_ATOM_NAME C W 1038

Heap> P2A_RESIDUE_POINTERS I W 262

Heap> P2A_RESIDUE_CG R W 524

Heap> P2A_CELL R W 6

Heap> P2A_SPACEGROUP T W 13

Heap> P2A_PDB_SCALE R W 12

Heap> P2A_DATE T W 25

Heap> 170 data blocks used, space for 2000

Heap> 10506 integer/real units used, space for 1000000

Heap> 2582 character units used, space for 200000

Heap> 22418 text units used, space for 500000

Perhaps you expected just one datablock ("p2a.pdb") ? However, unlike Unix, O knows a thing or two about proteins (thanks to the files you gave it when you first created your database !). What O has done is to extract information about P2 myelin protein from the PDB file and store it in three types of datablocks:

* datablocks which contain information about the protein as a whole (e.g., "P2A_SPACEGROUP")

* datablocks which contain one item of information for each residue in the protein (e.g., "P2A_RESIDUE_TYPE")

* datablocks which contain information about the individual atoms in the protein (e.g., "P2A_ATOM_XYZ")

Note that the names of the datablocks have structure:

* they all start with "P2A_"; remember that you called this molecule "p2a" !

* residue information is stored in datablocks whose names begin with "P2A_RESIDUE_"

* atomic information is stored in datablocks whose names begin with "P2A_ATOM_"

This structure makes it easy to use the "Directory" command and get only a list of datablocks related to your P2A molecule:

O > dir p2a_*

Heap> P2A_ATOM_XYZ R W 3114

Heap> P2A_ATOM_B R W 1038

Heap> P2A_ATOM_WT R W 1038

Heap> P2A_ATOM_Z I W 1038

Heap> P2A_ATOM_VISIBLE I W 1038

Heap> P2A_ATOM_SELECT I W 1038

Heap> P2A_RESIDUE_NAME C W 131

Heap> P2A_RESIDUE_TYPE C W 131

Heap> P2A_ATOM_NAME C W 1038

Heap> P2A_RESIDUE_POINTERS I W 262

Heap> P2A_RESIDUE_CG R W 524

Heap> P2A_CELL R W 6

Heap> P2A_SPACEGROUP T W 13

Heap> P2A_PDB_SCALE R W 12

Heap> P2A_DATE T W 25

The parameter "p2a_*" means: all datablocks whose names begin with "p2a_". Similarly, to get a list of all residue-related datablocks of molecule P2A, use:

O > dir p2a_resid*

Heap> P2A_RESIDUE_NAME C W 131

Heap> P2A_RESIDUE_TYPE C W 131

Heap> P2A_RESIDUE_POINTERS I W 262

Heap> P2A_RESIDUE_CG R W 524

And to get a list of all molecules in your database for which you have atomic coordinates, use:

O > dir *xyz

Heap> ALPHA_ATOM_XYZ R W 105

Heap> BETA_ATOM_XYZ R W 120

Heap> DI_ATOM_XYZ R W 30

Heap> P2A_ATOM_XYZ R W 3114

Oops - you never loaded molecules called "ALPHA", "BETA" or "DI" !? These three are special "mini-molecules" whose use will not be discussed now. Just remember to never call one of your own molecules "ALPHA", "BETA" or "DI" !
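The wildcard patterns used with "Directory" behave much like Unix file-name patterns. Below is a Python sketch using the standard `fnmatch` module, assuming that "*" matches any run of characters and that matching ignores case, as O does for datablock names; note that `fnmatch` also gives "?" and "[...]" special meanings, which O may not.

```python
from fnmatch import fnmatchcase


def directory_matches(pattern, names):
    """Datablock names that match a Directory-style pattern, ignoring case.

    An empty pattern (just hitting Return) lists every datablock.
    """
    if not pattern:
        return list(names)
    wanted = pattern.upper()
    return [name for name in names if fnmatchcase(name.upper(), wanted)]


names = [".MENU_MAJOR_NAME", "ALPHA_ATOM_XYZ", "P2A_ATOM_XYZ",
         "P2A_RESIDUE_NAME", "P2A_RESIDUE_TYPE", "P2A_CELL"]
print(directory_matches("p2a_resid*", names))  # ['P2A_RESIDUE_NAME', 'P2A_RESIDUE_TYPE']
print(directory_matches("*xyz", names))        # ['ALPHA_ATOM_XYZ', 'P2A_ATOM_XYZ']
```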

Let's see what kind of information some of the datablocks of your P2A molecule contain:

P2A_ATOM_XYZ = Cartesian X, Y and Z coordinates

P2A_ATOM_B = Atomic temperature factors

P2A_ATOM_Z = Atomic numbers (e.g., 6 for carbon atoms)

P2A_ATOM_NAME = Names of the atoms (e.g., N, CA, CD1)

P2A_RESIDUE_NAME = Names of the residues (e.g., 1, 2, B3)

P2A_RESIDUE_TYPE = Types of the residues (e.g., Ala, Trp)
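The sizes in the directory listing follow from the number of atoms and residues. The Python sketch below compares the P2A listing with what you would expect for 1038 atoms and 131 residues. Two of the rules are inferences rather than documented facts: that RESIDUE_POINTERS holds a first and a last atom number per residue (compare the From/To columns printed by Sam_list_seq in section 1.6), and that RESIDUE_CG holds a centre (x, y, z) plus a radius per residue (compare the Centre/Radius columns).

```python
# Sizes copied from the "dir p2a_*" listing above
LISTING = {
    "P2A_ATOM_XYZ": 3114, "P2A_ATOM_B": 1038, "P2A_ATOM_WT": 1038,
    "P2A_ATOM_Z": 1038, "P2A_ATOM_NAME": 1038, "P2A_RESIDUE_NAME": 131,
    "P2A_RESIDUE_TYPE": 131, "P2A_RESIDUE_POINTERS": 262, "P2A_RESIDUE_CG": 524,
}


def expected_sizes(molecule, n_atoms, n_residues):
    """Expected datablock sizes; the POINTERS and CG rules are inferred, not documented."""
    per_atom = {"ATOM_XYZ": 3, "ATOM_B": 1, "ATOM_WT": 1, "ATOM_Z": 1, "ATOM_NAME": 1}
    per_residue = {"RESIDUE_NAME": 1, "RESIDUE_TYPE": 1,
                   "RESIDUE_POINTERS": 2,  # assumed: first and last atom number
                   "RESIDUE_CG": 4}        # assumed: centre x, y, z and a radius
    sizes = {f"{molecule}_{k}": v * n_atoms for k, v in per_atom.items()}
    sizes.update({f"{molecule}_{k}": v * n_residues for k, v in per_residue.items()})
    return sizes


mismatches = {name: (size, LISTING[name])
              for name, size in expected_sizes("P2A", 1038, 131).items()
              if LISTING[name] != size}
print(mismatches or "all sizes agree")
```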

Now, save and backup your database so that you won't have to read the molecule in again the next time that you use O.

1.5 - Connecting the dots

You are now ready to draw the structure of P2 myelin protein. This requires the following steps:

Tell O which molecule you want to use. This is done with the command "Molecule_name" (usually abbreviated "mol"):

O > mol

O > Current molecule has not been loaded.

Mol> Molecule code name []: p2a

In O, you can have many, many drawings in the graphics window. Each of these is called an "object", and, just like molecules, objects must have a name (so you can refer to them, for example when you want to change their colour, or when you want to delete one). The command to define such a name is called "Object_name" (often abbreviated "obj"). Names of objects may not contain more than 6 characters:

O > obj

Mol> Name of the new object [P2A ]: first

Now you have to start drawing things, for example, using the "Ca_zone" command (usually abbreviated "ca"). This command will draw lines between C-alpha atoms in neighbouring residues (provided their distance is not too large). O asks you which residues should be included in the drawing. Just accept the default ("all residues") for the moment:

O > ca

Mol> Ca zone [all molecule]:

Is there something wrong ? There's nothing on the screen ! Don't worry; O doesn't actually draw objects until you tell it that you have entered all the drawing commands for this particular object. You do this by issuing the "End_object" (abbreviated "end") command:

O > end

Still nothing ! But, there's one change: in the top right corner of the graphics window you can see the following:

On_off

^FIRST

This list of "words" is called the O "menu". Right now, there's not much there: an O command called "On_off" and -surprise !- the name of the graphics object that you just created: "FIRST", although it has a "caret" (^) in front of it. This means that the object really exists. The reason you can't see it is that you are somewhere in space -initially at (0,0,0)- and the protein is somewhere else. To fix that, type the following command:

O > centre_xyz 50 65 33

Yeah ! Bingo ! You have the drawing at the centre of the graphics window !
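Ca_zone only joins the C-alpha atoms of neighbouring residues that are close enough together. A minimal Python sketch of that idea follows; the 4.5 Å cutoff is a hypothetical value chosen for illustration (consecutive C-alpha atoms in a protein chain are normally about 3.8 Å apart), not the value O uses.

```python
import math


def ca_trace_bonds(ca_coords, max_gap=4.5):
    """Pairs (i, i + 1) of consecutive C-alpha atoms that would be joined by a line.

    ca_coords: (x, y, z) tuples in residue order, in Å.
    max_gap:   hypothetical cutoff in Å; larger gaps are left undrawn.
    """
    bonds = []
    for i in range(len(ca_coords) - 1):
        if math.dist(ca_coords[i], ca_coords[i + 1]) <= max_gap:
            bonds.append((i, i + 1))
    return bonds


# Three residues 3.8 Å apart, then a chain break before the fourth
cas = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(ca_trace_bonds(cas))  # [(0, 1), (1, 2)]
```

With a rule of this kind, a gap in a model shows up as a missing line in the C-alpha trace.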

1.6 - Shake, rattle 'n' roll

Of course, looking at a static picture of a protein is not very interesting (you don't need expensive computers to do just that). You want to rotate, translate, zoom in, etc. In O, there are three different ways to do this:

* using the dials: you have eight dials on the "dial box"; in the bottom left corner of the graphics window you see what each dial is supposed to do. Verify this and try to explain what the "slab" dial does (first zoom in; then turn the slab dial counter-clockwise). If the rotations are too fast, type: "db_set_data .dial_real 1 1 0.15" (don't worry about the meaning of this command).

* using the mouse:

RIGHTMOUSE = xyz rotation

RIGHTMOUSE + SHIFTKEY = x/y translation

RIGHTMOUSE + SHIFTKEY + MIDDLEMOUSE = z translation

RIGHTMOUSE + MIDDLEMOUSE = zoom

RIGHTMOUSE + LEFTMOUSE = slab

* using a pseudo-dial box on the screen: first press the F6 key on your keyboard while the cursor is in the graphics window. You'll see the outlines of a new window; move it somewhere outside your graphics window and press the left mouse button. Now move the cursor to the box marked "Rot Y" and put it on top of the letter "Y". Press the left mouse button and keep it pressed down while you slowly move the mouse to the left and to the right.

Play around with each of these methods and use the mechanism that you are most comfortable with. Can you get a view perpendicular to the axis of a helix ? To get rid of the pseudo-dials, press the F6 key again (while the cursor is in the graphics window). To switch off the mouse control, press the F7 key.

Type "centre_xyz 50 65 33" again. Now the middle of your molecule is at the centre of the graphics screen again. This is because the point with coordinates (50, 65, 33), in Å, is close to the centre of the molecule.

However, if you enter an unknown molecule into your database, you will not know where its centre is. The following, general, procedure can then be used. Use the command "Sam_list_seq" to find out what the names of the residues in the molecule are:

O > s_l_s

Sam> Molecule name [P2A ]:

Sam> Name Type From To Centre Radius

Sam> A1 SER 1 6 50.17 57.02 19.38 2.19

Sam> A2 ASN 7 14 46.61 56.25 17.61 2.89

...

Sam> A130 LYS 1023 1031 38.01 55.56 27.25 3.64

Sam> A131 VAL 1032 1038 33.84 57.47 27.11 2.33

In this case, you have 131 residues with NAMES A1, A2, ... A131. Now you can use another command, "Centre_zone", to put the centre of gravity of the molecule at the centre of the screen. A "zone" is an important concept in O: it defines a stretch of one or more consecutive residues in one molecule. You define it by giving O the molecule name, followed by the names of the first and the last residue in the zone:

O > ce_zo p2a a1 a131

As4> P2A A1 A131 FIRST

As4> Centering on zone from A1 to A131
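Conceptually, centring on a zone means working out the average position of the atoms in that zone and moving the view there. The Python sketch below illustrates only that arithmetic. It assumes an unweighted average of coordinates (whether O weights atoms, e.g. by mass, is not stated here), and the function name "zone_centre" is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def zone_centre(coords: Sequence[Point]) -> Point:
    """Return the unweighted mean (x, y, z) of a list of coordinates (in Å)."""
    if not coords:
        raise ValueError("zone contains no atoms")
    n = len(coords)
    # Average each Cartesian component separately.
    return (
        sum(x for x, _, _ in coords) / n,
        sum(y for _, y, _ in coords) / n,
        sum(z for _, _, z in coords) / n,
    )

# Example: the residue centres of A1 and A2 from the Sam_list_seq output,
# used here as stand-in points for their atoms.
print(zone_centre([(50.17, 57.02, 19.38), (46.61, 56.25, 17.61)]))
```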

If you want to centre on a single residue, type its name twice (or type it once, followed by a semicolon):

O > ce_zo p2a a1 ;

As4> No object defined.

As4> P2A A1 A1 FIRST

As4> Centering on zone from A1 to A1

If you want to centre on the C-alpha atom of a certain residue, use the "Centre_atom" command:

O > ce_at p2a a131

If you want to centre on an atom other than the C-alpha of a certain residue, use the same command, but add the atom name:

O > ce_at p2a a34 n

But, you may ask, how do you know which residues and atoms are which ? This is done by "picking" (a.k.a. "ID-ing"). This means that you move the cursor to one of the atoms in the picture and then press the left or middle mouse button. If you do this, you will see:

* a label appearing next to the picked atom (e.g., "A3 CA"; these are the name of the residue and the name of the atom, respectively)

* a text at the top of the graphics screen which tells you more about the atom; for example:
P2A A3 Lys CA , xyz = 44.97 54.24 20.97 ; B = 20.0 ; Z = 6 ;

Which two C-alpha atoms are not connected, even though they are in neighbouring residues ? Why are they not connected ?

There's another centre command which is useful in cases like these: "Centre_id". If you type this command, O expects you to pick an atom and it will centre on that atom. Use this command to centre on one of the two C-alpha atoms that are not connected in order to find out which residues they belong to.

Click on the text "^FIRST" in the graphics window with the left or middle mouse button. What happens ? Now type "^first" in the terminal window. What happens ? This demonstrates yet another important principle in O (about which you will learn more later): commands or parameters or even text (molecule names, for example) can be entered with the keyboard, or they can be put on the menu and then clicked.

1.7 - Question time !

(1) Explain the difference between working in Unix and working in O.

(2) Which of the following commands can you use in Unix, and which in O (and what do they do): ls, stop, rm, directory, cp, ca_zone, save_db, cat, centre_atom, jot ?

(3) Explain, define or describe the following concepts: default, caret, zoom, prompt, graphics window, wildcard, mouse, cursor, molecule, object, uppercase, underscore, asterisk, pseudo-dials.

(4) What is the difference between "real" and "integer" numbers ?

(5) What information about datablocks is listed with the "directory" command ?

(6) What is the minimal abbreviation of the following commands: save_db, sam_atom_in, ca_zone, centre_xyz, stop ?

(7) What are the full names and the parameters of the following commands: s_a_i, ce_zo, mol, obj, ce_xyz ?

(8) Suppose you have a protein structure in a file called "1guh.pdb". Which commands do you have to type (including their parameters) in order to read the structure into your database, to draw a C-alpha trace of it and to centre on the middle of the molecule ?

(9) Explain the difference between the "mol" and the "obj" command.

(10) Explain the concept of a "zone" as defined in O.

(11) Which commands can be used to centre on a particular point in space ?

(12) Move the cursor to your graphics window, type "dir *resid*" and hit the Return key. Explain your observations.

(13) Suppose that someone told you that there is a command called "CPK_object" in O. How can you verify that this command exists ? Describe what the command does and what its parameters are.

Your notes:

2.0 - Properties, structure and paint

In this chapter you will learn about properties of molecules, residues and individual atoms, and how to use these for painting your residues.

New O commands:

Clear_ID

Cover_Sphere

Delete_objec

Paint_case

Paint_colour

Paint_obj_at

Paint_obj_zo

Paint_object

Paint_proper

Paint_ramp

Paint_zone

Read_formatt

Sphere_centr

Write_format

YASSPA

Zone

Your notes:

2.1 - More detail

Start O again. The first thing you should notice is that the object that you created last time is still in the graphics window. O stores the drawing instructions in the database and these are therefore kept in between sessions.

The object that you have, "FIRST", is a simple C-alpha trace. If you want to see side chains, you can use the "Zone" command. After selecting the molecule ("mol") and starting a new object ("obj"), type "Zone" followed by the names of the first and the last residue of the zone whose side chains you want to see:

O > mol p2a

O > obj bit

O > zone a5 a14

O > zone a103 a109

O > zone a131 ;

O > z a1 a1

O > end

Note that the graphics object called "bit" contains several zones of residues and two individual residues. Also note that different chemical elements are drawn in different colours. What is the colour of oxygen atoms ? And of nitrogen atoms ? Switch the two objects on the screen off and on. What is the name of the tryptophan residue that you have drawn ? Switch all objects on. Centre on the N-epsilon atom of the arginine residue that you have drawn.

Often you will want to draw residues that are close to a certain atom or residue. If you want to draw, say, all residues that are within 5 Å of the N-epsilon atom of this arginine, you can use the "Cover_sphere" command. This command can be given in two different ways:

* cover_sphere residue_name radius

* cover_sphere residue_name atom_name radius

O > obj cover

O > cov_sph a106 ne 5

O > end

Experiment with the different ways of issuing the "Cover_sphere" command. Which residues are drawn if you use a 5 Å radius around the N-epsilon atom ?
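The selection that "Cover_sphere" makes can be pictured as a simple distance test: a residue is drawn if at least one of its atoms lies within the given radius of the chosen atom. The Python sketch below shows that test. The "any atom within the radius" rule, the dictionary layout and the function name "residues_within" are assumptions for illustration, not O's actual code.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def residues_within(
    residues: Dict[str, List[Point]], centre: Point, radius: float
) -> List[str]:
    """Return names of residues with at least one atom within `radius` of `centre`."""
    selected = []
    for name, atoms in residues.items():
        # math.dist is the Euclidean distance between two points (Python 3.8+).
        if any(math.dist(atom, centre) <= radius for atom in atoms):
            selected.append(name)
    return selected

# Made-up coordinates (in Å), not taken from P2 myelin protein.
toy = {
    "A1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A2": [(4.0, 0.0, 0.0)],
    "A3": [(10.0, 0.0, 0.0)],
}
print(residues_within(toy, (0.0, 0.0, 0.0), 5.0))
```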

Sometimes, you don't want to draw residues close to a particular atom, but rather those close to the current centre of the screen. In that case, use the "Sphere_centre" command; the only parameter of this command is the desired radius in Å. Centre on the middle of the molecule and use this command to draw all residues within 8 Å of this point:

O > ce_zo p2a a1 a131

As4> P2A A1 A131 SPH

As4> Centering on zone from A1 to A131

O > obj sph

O > sphere 8 end

Which aromatic residues are drawn now ?

Note that in the last line above, you typed "sphere 8 end". In other words, you typed TWO O commands on one line. This is perfectly valid ! You may even issue three, four or more commands on a single line.

Now make an object called "TEST" which contains all residues that are within 6 Å of the C-alpha atom of the N-terminal residue; type all commands on a single line. How many leucines are drawn ?

By now you have probably been clicking on quite a few atoms. The command "Clear_id" can be used to remove the labels from the graphics screen. Try this.

You may now also remove all objects except "FIRST". To do this, use the "Delete_object" command:

O > de

Mol> Objects = FIRST BIT COVER SPH TEST

Mol> Object name ( <CR> = exit ) : test

Mol> Objects = FIRST BIT COVER SPH

Mol> Object name ( <CR> = exit ) :

O > del cover sph

Mol> Objects = FIRST BIT

Mol> Object name ( <CR> = exit ) :

O > del bit ;

Compare the different ways of using this command in this example.

2.2 - Structure

It's time to do something non-trivial which is relevant for proteins. O contains a command that will figure out where in your protein the alpha helices and the beta strands are. It is called "YASSPA" ("Yet Another Secondary Structure Prediction Algorithm"). You have to execute this command twice, once to find the helices and once to find the strands. This command uses two of the three "mini-molecules" that you encountered earlier (which two, do you think ?). For each residue, O considers the two neighbouring residues on both sides as well, and uses the C-alpha coordinates of these five residues to decide if the central residue is in a helix or a strand. Type the commands that follow:

O > yasspa

Util> Molecule name ([P2A ]):

Util> Template molecule name ([alpha]):

Util> Cuttoff ([0.5Ang]):

Util> Template size : 5 residues.

Util> There were 17

O > yasspa p2a beta 0.8

Util> Template size : 5 residues.

Util> There were 75

Well, that's not very informative ... P2 myelin protein apparently contains 17 residues in helices and 75 in strands. But, hey, didn't O store a lot of information in the database ? Use the "Directory" command to check if there are any new datablocks related to molecule P2A:

Heap> P2A_ATOM_XYZ R W 3114

Heap> P2A_ATOM_B R W 1038

...

Heap> P2A_DATE T W 25

Heap> P2A_MOLECULE_TYPE C W 2

Heap> P2A_MOLECULE_CA C W 1

Heap> P2A_MOLECULE_CA_MXDST R W 1

Heap> P2A_ATOM_COLOUR I W 1038

Heap> P2A_RESIDUE_2RY_STRUC C W 131

In fact, there are five new ones (which ?). But the one created by "YASSPA" must be "P2A_RESIDUE_2RY_STRUC". How do you access the information in it ?

The first method is by looking at the contents of the datablock. This can be done with the "Write_formatted" command. The command has three parameters:

* the name of the datablock to be written

* the name of the file to which it should be written, OR a semi-colon (;) which means: "write to the terminal window"

* the format (don't worry about this; just use a semi-colon)

O > wr P2A_RESIDUE_2RY_STRUC ;;

P2A_RESIDUE_2RY_STRUC C 131 (1x,5a)

BETA BETA BETA

BETA BETA BETA

...

BETA BETA BETA BETA

Apparently, YASSPA has written the word "ALPHA" for every residue in a helix and "BETA" for residues in strands. What about the other residues ?

There is one place where a helix is followed immediately by a strand. This is not very likely to be true (it just shows that YASSPA isn't perfect). Therefore, write the datablock to a file and change the secondary structure assignment for the neighbouring ALPHA/BETA residues to "nothing", i.e. spaces, by editing the file (do this in Unix).
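If you would rather script this edit than make it by hand in an editor, the Python sketch below finds every place where an "ALPHA" residue is immediately followed by a "BETA" residue and blanks both assignments. It works on a plain list of per-residue labels; reading and writing the actual datablock file (header line and format) is left out, and the function name is hypothetical.

```python
from typing import List

def blank_helix_strand_junctions(labels: List[str], blank: str = "") -> List[str]:
    """Return a copy of `labels` with each directly adjacent ALPHA->BETA pair blanked."""
    result = list(labels)
    # Locate junctions on the original labels so earlier edits cannot hide later ones.
    junctions = [
        i for i in range(len(labels) - 1)
        if labels[i] == "ALPHA" and labels[i + 1] == "BETA"
    ]
    for i in junctions:
        result[i] = blank
        result[i + 1] = blank
    return result

print(blank_helix_strand_junctions(["BETA", "ALPHA", "ALPHA", "BETA", "BETA"]))
```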

Once you have edited the file, you have to get it back into O. To do this, use the "Read_formatted" command (the only parameter of this command is the name of the file). After that, verify that the datablock has indeed been changed and read correctly by typing its contents to the terminal window again.

Now you have a sort of list of the secondary structure types of the residues in P2 myelin protein. But you may like a more graphical representation, for example a drawing where each residue is coloured according to its type (e.g., helix red, strand green, rest yellow). In the next section you will learn how to do this.

2.3 - Painting

Colour is an extremely powerful means of conveying information in molecular drawings. Within O there are dozens of predefined (and zillions of user-definable) colours available. The command "Paint_colour" can be used to select a colour, or to get a list of all predefined colours (by using a question mark, ?, as parameter):

O > paint_colour ?

Paint> Available colors:

Paint> aquamarine black blue

Paint> blue_violet brown cadet_blue

Paint> coral cornflower_blue cyan

Paint> dark_green dark_olive_green dark_orchid

Paint> dark_slate_blue dark_slate_gray dark_slate_grey

...

Paint> thistle turquoise violet

Paint> violet_red wheat white

Paint> yellow yellow_green

Paint> Error condition [Colour name not in database] in askcol

You should still have the "FIRST" object on the graphics screen (if not, draw it again). If you want to paint this object in a nice colour, e.g. dark_slate_blue, then first select this colour with the "Paint_colour" command. Next, use the "Paint_object" command and either type the name of the object, or click on an atom in the object that you want to colour:

O > pai_col

Paint> Colour? [orange]: dark_slate_blue

O > pai_object first

Paint> FIRST

If you have some time to spare, and if you are curious to find out what all these predefined colours are, then type the following commands (you will learn later what the second one means):

O > read omac/colour_demo.odb

O > @col

Cute, isn't it ? Select a colour that appeals to you and use it to paint your "FIRST" object. Delete the "ALL_COL" object and centre on your molecule again. Whenever you want to see the colours again, just type "@col".

Instead of colouring an entire object, you may also paint a zone, a residue or even a single atom inside an object. Use the commands "Paint_obj_zone" and "Paint_obj_atom" to do this. For both commands you may either type the parameters in the terminal window, or you may pick one or two atoms to identify the zone/residue/atom:

O > pa_col medium_forest_green

O > pai_obj_zo p2a a1 a15 first

Paint> P2A A1 A15 FIRST

O > pa_col slate_blue pai_obj_zo p2a a131 a131 first

Paint> P2A A131 A131 FIRST

O > pa_co red pa_obj_at p2a a1 ca first

Paint> P2A A1 CA FIRST

Note that up until now, you have been colouring OBJECTS, but you haven't changed the colours of the atoms in the database. In O, each atom has a colour associated with it (which datablock contains the atom colours of your P2 myelin protein molecule, do you think ?). The default is to colour carbons yellow, oxygens red, nitrogens blue, sulphurs green etc. In addition, each molecular graphics object has colour information associated with it. So even though you coloured the C-alpha atom of the N-terminal residue red in the "FIRST" object, this atom still has the colour yellow associated with it in the P2A molecule in the database !

Verify this by drawing a zone containing only this residue. It is important that you realise the difference between colouring an object (zone/atom) and a molecule (zone/atom).
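One way to picture this is that an object takes its own copy of the atom colours at the moment it is created, after which the molecule's colours and the object's colours are independent. The Python sketch below models that behaviour with hypothetical "Molecule" and "GraphicsObject" classes. It mimics what you observe on the screen; it does not describe how O stores objects internally.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Molecule:
    # Atom label -> colour name, playing the role of the atom colours in the database.
    atom_colour: Dict[str, str] = field(default_factory=dict)

@dataclass
class GraphicsObject:
    colour: Dict[str, str]

def make_object(mol: Molecule) -> GraphicsObject:
    """Create an object holding its own copy of the molecule's atom colours."""
    return GraphicsObject(colour=dict(mol.atom_colour))

mol = Molecule({"A1 CA": "yellow", "A1 N": "blue"})
first = make_object(mol)
first.colour["A1 CA"] = "red"            # like Paint_obj_atom: only the object changes
print(mol.atom_colour["A1 CA"])          # the molecule still has yellow
mol.atom_colour["A1 N"] = "medium_blue"  # like Paint_zone: only the molecule changes
print(first.colour["A1 N"])              # the existing object keeps blue
```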

So far, you have only been colouring parts of objects that you selected yourself. This is useful, for instance, if you want to paint a particular helix, or residues near the active site or ligand-binding cavity. An even more interesting application of painting is colouring your molecule (i.e., not the object !) according to certain residue or atom properties. Residue properties are all those which are stored in datablocks called "P2A_RESIDUE_xxx"; atom properties are stored in datablocks called "P2A_ATOM_xxx".

There are four O commands which you can use to change the colour of your molecule (on a per-residue or per-atom basis): "Paint_zone", "Paint_case", "Paint_property" and "Paint_ramp".

The "Paint_zone" command is the equivalent of the "Paint_obj_zone" command, except that "Paint_zone" actually changes the colours associated with the atoms. Any objects that were generated previously retain their old colours (since the colour information is stored for each object). Verify this as follows:

O > paint_zone

Paint> What molecule [P2A ]:

Paint> Residue range [all molecule]: a16 a88

Paint> Colour? [red]: medium_blue

O > obj test zo ; end

The "Paint_case" command colours atoms according to the value of a property: you give a number of possible values ("cases") and a colour for each, and every atom whose property matches one of these values gets the corresponding colour. Try the following:

O > mol p2a pai_zon ; ; white

O > pain_cas

Paint> Colour-case a property in molecule P2A

Paint> Property [atom_z] : residue_type

Paint> How many cases [8] ? 4

Paint> Enter property values [ 1 2 3 4] :

Paint> Property value 1 : trp

Paint> Property value 2 : his

Paint> Property value 3 : phe

Paint> Property value 4 : tyr

Paint> Enter 4 colour names:

Paint> Colour? [white]: blue

Paint> Colour? [blue]: red

Paint> Colour? [red]: green

Paint> Colour? [green]: yellow

O > obj test2 zo ; end

What you have done is this:

* coloured all atoms in your molecule white

* coloured certain atoms in other colours, in case their residue type is Trp (blue), His (red), Phe (green) or Tyr (yellow)

* drawn an object with this colouring scheme
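The case logic can be sketched as a lookup table from property value to colour: atoms whose property matches one of the listed values get that value's colour, and all other atoms keep the colour they already had (which is why you painted everything white first). The Python sketch below assumes exactly that behaviour. The per-atom record layout is hypothetical, and values are compared case-insensitively as an assumption, since you typed "trp" for the residue type.

```python
from typing import Dict, List

def paint_case(
    atoms: List[Dict[str, str]], prop: str, cases: List[str], colours: List[str]
) -> None:
    """Recolour, in place, every atom whose `prop` value matches one of `cases`."""
    if len(cases) != len(colours):
        raise ValueError("need exactly one colour per case")
    # Pair each case value with its colour; compare values case-insensitively.
    table = {value.upper(): colour for value, colour in zip(cases, colours)}
    for atom in atoms:
        colour = table.get(atom[prop].upper())
        if colour is not None:
            atom["colour"] = colour  # unmatched atoms keep their current colour

atoms = [
    {"residue_type": "TRP", "colour": "white"},
    {"residue_type": "GLY", "colour": "white"},
    {"residue_type": "TYR", "colour": "white"},
]
paint_case(atoms, "residue_type", ["trp", "his", "phe", "tyr"],
           ["blue", "red", "green", "yellow"])
print([a["colour"] for a in atoms])
```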

From this object you can see immediately how many His, Trp, Phe and Tyr residues you have and where they are in the structure and with respect to one another. Which two phenylalanines are close in space and have almost parallel, stacked rings ?

Of course, you can also use properties of individual atoms to colour them. Try to figure out what happens in the following examples:

O > pai_zon ; ; white

O > pai_case atom_name 4 n ca c o blue yellow red magenta

O > obj test3 zo ; end

O > pai_zo ;; white

O > pa_ca atom_z 3 6 7 8 green blue red

O > obj test4 zo ; end

Now, see if you can paint the entire protein white, but all arginines, lysines and histidines blue and all glutamates and aspartates red ! Why might this be interesting ?

The "Paint_property" command is similar to the previous one, except that you may paint using comparison operators (=, > etc.) rather than just for a few individual cases. Explain what happens in the following examples:

O > pai_zon ; ; white

O > pai_prop res_type = gly red

O > pai_prop res_type = ala red

O > pai_prop res_type = trp blue

O > pai_prop res_type = arg blue

O > obj test5 zo ; end

O > pai_zon ; ; white

O > pai_prop res_name < a61 red

O > pai_prop res_name >= a61 blue

O > obj test6 zo ; end
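"Paint_property" generalises the case idea by testing each value with a comparison operator. The Python sketch below applies one such rule at a time to a numeric property, using the standard "operator" module to turn the symbols into comparisons. The set of symbols, the choice of the residue counter "residue_irc" as the property, and the record layout are illustrative assumptions. Comparisons on residue names such as "a61" depend on how O orders names and are not modelled here.

```python
import operator
from typing import Dict, List, Union

# Map comparison symbols to Python functions (the exact symbol set is an assumption).
OPS = {
    "=": operator.eq, "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

Atom = Dict[str, Union[int, str]]

def paint_property(atoms: List[Atom], prop: str, op: str, value: int, colour: str) -> None:
    """Recolour, in place, every atom for which `atom[prop] <op> value` holds."""
    compare = OPS[op]
    for atom in atoms:
        if compare(atom[prop], value):
            atom["colour"] = colour

atoms = [{"residue_irc": n, "colour": "white"} for n in (1, 60, 61, 131)]
paint_property(atoms, "residue_irc", "<", 61, "red")
paint_property(atoms, "residue_irc", ">=", 61, "blue")
print([a["colour"] for a in atoms])
```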

Do you think you can colour your molecule according to the secondary structure assignment now ? Here's how it works:

O > pai_zon ; ; white

O > pai_prop res_2ry = beta sky_blue

O > pai_prop res_2ry = alpha orange_red

O > obj yasspa ca ; end

Check that the two residues for which you changed the secondary structure assignments are indeed coloured white !

The final paint command that you will learn about in this chapter is "Paint_ramp". You can only use this with numerical properties. The example which you shall use here is that of the "O internal residue counter"; for the first residue in your sequence, this number is 1, for the next 2, etc. In this way, you can paint each residue depending on where it is in the sequence. What the "Paint_ramp" command does is to use the value of the property to calculate a colour which lies in between two "extreme colours" which you define. For example:

O > pa_ram

Paint> Colour-ramp a property in molecule P2A

Paint> Property [residue_irc] :

Paint> Minimum and maximum value of property [1 131] :

Paint> First colour [red] : sky_blue

Paint> Second colour [blue] : orange_red

O > obj test7 ca ; end

You now have a drawing of the C-alpha trace in which the N-terminal residue is coloured sky blue, the C-terminal one orange red, and all others have intermediate colours which gradually change from sky blue to orange red as you move along the sequence. Using this colouring scheme makes it much easier to follow the chain when you look at the picture !
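The in-between colours can be understood as interpolation: the property value is turned into a fraction between 0 (at the minimum) and 1 (at the maximum), and each red, green and blue component is blended by that fraction. The Python sketch below assumes simple linear blending in RGB space and uses the common X11 RGB values for sky_blue and orange_red; O's own colour table and blending method may differ.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

def ramp_colour(value: float, vmin: float, vmax: float, first: RGB, second: RGB) -> RGB:
    """Linearly blend `first` (at vmin) into `second` (at vmax) for a property value."""
    if vmax == vmin:
        return first
    # Fraction along the ramp, clamped so out-of-range values get an end colour.
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    return tuple(round(a + t * (b - a)) for a, b in zip(first, second))

SKY_BLUE: RGB = (135, 206, 235)   # X11 value, assumed
ORANGE_RED: RGB = (255, 69, 0)    # X11 value, assumed

for residue_number in (1, 66, 131):
    print(residue_number, ramp_colour(residue_number, 1, 131, SKY_BLUE, ORANGE_RED))
```

Clamping is a design choice for the sketch: values outside the minimum/maximum range you give simply get one of the two extreme colours.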

If you have time left, play around with some of the paint commands. Afterwards, reset the colours of your molecule such that all carbon atoms are yellow again, etc. Don't forget to backup your database before you stop !

2.4 - It's the Spanish Inquisition !

(1) Explain the difference between the "zone" and the "ca_zone" command.

(2) Which commands do you have to type (including their parameters) in order to generate a single object which contains a C-alpha trace of P2, all atoms of both the first and the last residue, plus all atoms of the residues that lie within 5 Å of the C-alpha atom of residue A83 ?

(3) How many Å fit into one meter ?

(4) Which amino acid types are aromatic ? Which are hydrophobic ? Which are charged ?

(5) How many O commands does the following line contain: "mol x obj y ca ; zo a1 ; zo a5 a6 sph 8 end" ?

(6) What are the parameters of the following O commands: cover_sphere, zone, yasspa, write_formatted, paint_obj_zone, paint_case, paint_property ?

(7) Explain the difference between "Paint_zone" and "Paint_obj_zone".

(8) Which command (including all parameters) would you type to reset the default colours for all atoms of a molecule ?

(9) In P2 myelin protein, which residues are in the two helices ? How many strands are there ? How many residues are in the longest strand ?

(10) One of the atomic properties is the so-called isotropic temperature factor (or B-factor; which datablock is this ?). This number is a measure of the mobility (or disorder) of the atom; the lower the B-factor, the less mobile and the better defined the atom is. Make a zone object of P2 myelin protein in which the atoms are coloured by their B-factors; make atoms with low temperature factors blue and those with high temperature factors red. Which residues contain atoms with the highest B-factors ? Centre on the C-zeta atom of arginine A106 and draw all residues within an 8 Å radius. Is any of these residues red, orange or yellow ? What does this tell you ?

(11) Suppose the following commands have been issued for a certain molecule (not P2): "pa_zone ;; white pa_prop res_type = trp blue pa_prop res_type = his magenta obj q1 zo ; end pa_prop res_name < A100 red obj q2 zo ; end pa_case atom_z 4 6 7 8 16 yellow blue red green". Determine which colours the following atoms will have (a) in object "q1", (b) in object "q2", and (c) in the database: the C-alpha atom of Trp A91, the N atom of His A173, the S-gamma atom of Cys A12.

Your notes:

3.0 - What's on the menu today ?

In this chapter you will learn how to use and customise the O menu on the graphics window, as well as how to change some of the default settings of the program.

New O commands:

Clear_flags

Connect_File

Db_Set_data

Menu_control

Your notes:

3.1 - What's a menu ?

Start up O again. If you look at the top right corner, you see the O command "On_off" and beneath it a list of your graphics objects, each preceded by a caret (^). You have already learned that this caret is a special O command which switches an object on and off. You have also learned that you can execute this command in two different ways:

* by clicking on the text "^OBJECT" on the graphics screen

* or by typing the string "^OBJECT" on the keyboard in the terminal window

This is a general feature of O: every command and parameter can be input in these two different ways (in the next chapter you'll learn about even more ways of doing this). The list of "words" (O commands, object names etc.) on the right of the graphics window is called "the menu". And, yes, you can put ANY STRING on this menu. For example, to put the "Clear_id" command on the menu, type the following (exactly as shown):

O > menu clear_i on

You'll see that the text "Clear_ID" has been appended to the menu; it is shown in purple and with a capital "C". The latter means that O has recognised that "clear_i" is a unique abbreviation of the built-in O command "Clear_ID". Now, click on a few atoms and type "clear_id" on the keyboard. Click on some more atoms, then move the cursor to the text "Clear_ID" on the menu and click it.

Now click on the text "On_off", which is at the top of the menu; watch closely to see what happens to the menu. Click it again; now nothing happens. Can you explain in your own words what the "On_off" command does ?

3.2 - Fast food

There is a quick way to put several O commands at the same time on the menu. O commands are grouped internally in so-called "Major Menus". The command to put them on the menu (and to remove them from it) is the same that you used above, namely "Menu_control". Execute this command, and when O asks you for the name of a major menu, just type a question mark (?):

O > menu

As2> ? gives list of menu names

As2> Major menu? (<cr> = refresh)?

As2> Available menus:

As2> Assorted_1 Draw_Mol O_heap_1 Sketch

As2> Lsq_align Assorted_2 Map Slider

...

As2> Object_4 Object_5

Let's put the major menu called "Draw_Mol" on the menu:

O > menu

As2> ? gives list of menu names

As2> Major menu? (<cr> = refresh)draw_mol

As2> [On]/off:

As2> Colour? (<cr> = no change): orange

O > on_off

All the O commands that have been added to the menu should be quite familiar to you by now, except one (which one ?) which will be discussed later in this chapter.

Now make a new object called "P2A" which is a C-alpha trace of the entire P2 myelin protein molecule, but use ONLY the menu to issue the commands:

{ click "molecule_na" }

O > Mol> Molecule code name [P2A ]:

{ hit the Return key }

{ click "Object_name" }

O > Mol> Name of the new object [P2A ]:

{ hit the Return key }

{ click "Ca_zone" }

O > Mol> Ca zone [all molecule]:

{ hit the Return key }

{ click "End_object" }

Let's add another useful major menu, "Assorted_1":

O > menu

As2> ? gives list of menu names

As2> Major menu? (<cr> = refresh)assorted_1

As2> [On]/off:

As2> Colour? (<cr> = no change): yellow

O > on

Again you have added several familiar commands to the menu. Now save your database (using the menu) and watch the menu item "Save_DB" closely while you click it ! If you didn't notice anything special, repeat it. Save your database again, and now watch what happens in the terminal window. Save your database for the third time, and now watch the top left corner of the graphics window. Can you explain what happened ?

The first time you saved your database, you saw that the menu item "Save_DB" changed colour for a little while. In general, when you hit an O command on the menu, this item will change colour as long as it is active. The second time, you noticed that as soon as O had executed the command, a new prompt (" O > ") was printed in the terminal window. The third time, you saw that the active command (in fact: ALL active commands) was displayed on the fourth line from the top of the graphics window. This is a useful feature, since you are often engrossed in your work in the graphics window and can easily miss whatever is written to the terminal window.

In the case of "Save_DB", the command is active until the database has been written to a file. Other commands stay active until you have supplied all necessary input (e.g., the "Ca_zone" command; verify this); after that they execute and when this is done, they are no longer active. There are still other commands (which you will run into later), that stay active until you explicitly switch them off.

Sometimes, you accidentally hit a wrong item on the menu. Depending on which item you hit, different things may happen:

* the command requires no further input and executes immediately. Could you give an example of such a command ?

* the command requires some input, but it can be rendered harmless. For example, if you accidentally hit "Delete_obje", just hit the Return key when O asks you which object should be deleted.

* other commands can often be deactivated by hitting one of the O commands that you just put on the menu, namely "Clear_flags".

* a few commands have their own "reset" command; you will meet some of these commands later.

3.3 - A la carte

On your current menu there are some commands which are easier to enter from the keyboard, for example the "Ca_zone" command. You will find that it is much quicker to type the command plus the commands that usually accompany it (which ?) plus the parameters. It's quicker to type "mol p2a obj t1 ca ; end" than it is to click on "molecule_na", go to the terminal window, hit the Return key, go back to the menu, click "Object_name", etc. etc. Also, you should NOT have the "Stop" command on the menu (it's too easy to hit this by accident and waste time by having to restart O).

Moreover, there are some commands that you don't know how to use yet, and there may be commands missing which you would like to add to the menu ("Paint_object", for instance).

There are two ways to change the menu. One you have already encountered, namely using the O command "Menu_control". For example:

O > menu wait_id off

O > menu connect_file off

O > menu pai_objec on on

However, if you want to make many changes, it is handier to use the following method. It won't come as a surprise that the list of menu items is stored as a datablock in your database:

O > dir *menu*

Heap> .MENU_MAJOR_NAME C W 76

Heap> .MENU_MINOR_NAME C W 758

Heap> .MENU_COLOUR I W 38

Heap> .MENU_DISPLAYED I W 38

Heap> .MENU_VISIBLE I W 38

Heap> .MENU_INTEGER I W 3

Heap> .MENU_REAL R W 2

Heap> .MENU T W 364

The datablock that you are looking for is called ".MENU". Write this datablock to the screen and subsequently also to a file (call this file "mymenu.odb", for example):

O > wr .menu ;;

.MENU T 28 12

Object_name

molecule_nam

Ca_zone

...

^YASSPA

^TEST7

^P2A

O > wr .menu mymenu.odb

Heap> Format:

Now use another terminal window and edit this file. You may change, add and delete items from the list at will. When you edit the file, you will notice that the first line is rather special. It may look like this: ".MENU T 28 12". The first item, ".MENU", is the name of the datablock; do not change this ! The second item, "T", identifies the type of the datablock, in this case Text (which other datablock types do you know ?). The third number is CRUCIAL: this number MUST be the number of items in the datablock. In the case of a text datablock, this is always equal to the number of lines in the file, minus one (why ?). The last number is the length of each item, in this case twelve characters. This means that only the first 12 characters of each menu item are read and stored. The simplest way to ensure that the number of items (lines) is correct is:

* edit the file and save it

* use the Unix utility "wc" ("word count"): the number of items in the text datablock is the first number that is printed, minus one

* update this number in the file, save it again and quit from the editor

unix > jot mymenu.odb

{ edit menu and save file }

unix > wc mymenu.odb

20 25 293 mymenu.odb

{ change number of items to "19"; save and quit }
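The count-fixing steps above can also be scripted. The Python sketch below reads a text datablock, counts every line after the header as one item (the same "lines minus one" rule that "wc" gives you) and rewrites the third field of the header. It assumes the header has exactly the four fields described above (name, type, number of items, item length); the function names are hypothetical. Note that "wc" counts newline characters, so it reports one line fewer if the last line of the file has no newline at the end; counting lines directly avoids that pitfall.

```python
def fix_item_count(text: str) -> str:
    """Rewrite the item count in a text datablock's header to match its contents."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty datablock file")
    fields = lines[0].split()
    # Expected header layout: name, type, item count, item length, e.g. ".MENU T 28 12".
    if len(fields) != 4 or fields[1].upper() != "T":
        raise ValueError("not a text datablock header: " + lines[0])
    fields[2] = str(len(lines) - 1)  # every line after the header is one item
    return "\n".join([" ".join(fields)] + lines[1:]) + "\n"

def fix_file(path: str) -> None:
    """Fix the header of the datablock file at `path` in place."""
    with open(path) as f:
        fixed = fix_item_count(f.read())
    with open(path, "w") as f:
        f.write(fixed)

print(fix_item_count(".MENU T 28 12\nCentre_ID\nClear_ID\nSave_DB\n"))
```

After editing your menu file, calling fix_file("mymenu.odb") would replace the "wc" step and the manual header edit.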

Read the file back into O and watch the menu change:

O > read mymenu.odb

Your menu may now look as follows:

O > wr .menu ;;

.MENU T 19 12

Centre_ID

Clear_ID

Clear_flags

Yes

No

Save_DB

Paint_object

ca ; end

On_off

^first

^test

^test2

^test3

^test4

^test5

^test6

^yasspa

^test7

^P2A

Note that a menu item may contain spaces and even multiple O commands ("ca ; end") ! Now add the following two lines to your menu: "obj sphere" and "sph ; end". Check that they do what you would expect them to do.

3.4 - The customiser is king

Make a C-alpha trace of P2 myelin protein and make this the only visible object on the screen. Find the N-terminus of the protein and click on the C-alpha atom of the first residue. The label "A1 CA" should appear in red next to the atom. Now click on the C-alpha atoms of residues two, three, four and five. It looks as if you can have only three labels on the screen simultaneously. Can't you have more ? And why do they have to be coloured red ? And why can't you have the residue type, rather than, or together with, the residue name as the label ?

The answers to these questions are: "yes, you can", "they don't" and "you can". It shouldn't come as a surprise that these features are controlled through datablocks again. Changing them from their default values to something else is called "customising". In fact, you have already customised something, namely the menu. The advantage of customising is that you can make O look/feel/work in a way that you are comfortable with.

There are several ways in which customisation can take place, but ALL of them involve changes in one or more datablocks in your O database:

* some groups of commands have a special "Setup" command with which you can set values for certain parameters which determine what the commands will do or how they will do it (you shall meet some of these later)

* other commands ask for a parameter the first time you execute them, and the value that you enter is stored and used in the future (the very first O command that you learned works this way)

* many default settings can be changed by editing a datablock

* some settings can be changed by adding a datablock to your database

Consider the "Save_DB" and "Backup_DB" commands: the very first time you executed these, they asked you to supply a file name. Ever since, O has been saving and backing up your database to these files. It's obvious that the names of these files must have been stored in the database (why is this obvious ?).

O > dir *file*

Heap> FILE_O_SAVE T W 73

Heap> FILE_O_BACKUP T W 73

Heap> FILE_DISPLAY_CONNECTIVITY T W 73

O > wr FILE_O_SAVE ;;

FILE_O_SAVE T 1 72

p2.o

O > wr FILE_O_BACKUP ;;

FILE_O_BACKUP T 1 72

p2_backup.o

Can you change the name of the backup file to something else ? Check that it actually works !

Okay, but how do you know which datablock to change, and how to change it, in order to change a certain parameter ? The only reliable source of such information is the O Manual. Most chapters end with a listing of important datablocks and an explanation of some of the items stored in them. Some datablocks are also described briefly in appendix 10.6 of this tutorial.

Type the following sequence of commands and check what happens:

O > wr .MOLEC_OBJ_REAL ;;

O > obj sph sphe 8 end

O > wr .MOLEC_OBJ_REAL ;;

Apparently, the datablock named ".MOLEC_OBJ_REAL" contains one real number which is the radius used by the "Sphere_centre" command. This explains how O comes up with the same value you used before as the default when you type only "Sphere_centre":

O > obj sph sphere

Mol> Residues will be chosen if within a radius of [ 8.00] :

O > end

Now type the following commands:

O > db_set_data .molec_obj_real 1 1 12.5

O > obj sph sphere

Mol> Residues will be chosen if within a radius of [ 12.50] :

O > end

The "DB_set_data" command can be used to set one or more items in an O datablock to a certain value. This command, however, can NOT be used to change datablocks of type Text; these you'll have to write to a file and edit yourself:

O > db_set_data

Heap> Full name of data block: .menu

Heap> Since this is very dangerous, we think

Heap> you should write the block out and then

Heap> edit it with the regular text editor.
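The examples in this section ("db_set_data .molec_obj_real 1 1 12.5" above, and "db_s_d .molec_obj_integer 9 9 10" below) are consistent with reading the parameters as: datablock name, first item, last item, new value. The Python sketch below models that reading. The "Datablock" class, the function name and the block contents are hypothetical stand-ins for O's database, not O's own code, and whether O accepts several different values for a range is not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Datablock:
    """Hypothetical stand-in for an O datablock: name, type letter, items."""
    name: str
    dtype: str                      # "R" = real, "I" = integer, "T" = text
    items: list = field(default_factory=list)


def db_set_data(block: Datablock, first: int, last: int, value) -> None:
    """Set items first..last (1-based, inclusive) of a datablock to one value."""
    dtype = block.dtype.upper()
    if dtype == "T":
        # mirrors O's refusal: text blocks must be written out and edited
        raise ValueError("write the block out and edit it with a text editor")
    if dtype not in ("R", "I"):
        raise ValueError(f"datablock type {block.dtype!r} not handled in this sketch")
    if not 1 <= first <= last <= len(block.items):
        raise IndexError("item range lies outside the datablock")
    cast = float if dtype == "R" else int
    for i in range(first - 1, last):
        block.items[i] = cast(value)


radius = Datablock(".MOLEC_OBJ_REAL", "R", [8.0])
db_set_data(radius, 1, 1, 12.5)
print(radius.items)   # [12.5]
```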

Okay; now change the maximum number of labels as follows (and check that it worked; also explain what the parameters of this command are):

O > db_s_d .molec_obj_integer 9 9 10

Changing their colour is a bit more involved. O stores colours internally in a rather arcane way, namely as positive integers which are encoded and decoded by various O commands. The colour red, for example, is represented by the number "16711680" (which is not entirely intuitive ...). The simplest way for you to find the number that corresponds to a certain colour is the following:

* execute the "Paint_colour" command and enter the NAME of the colour you want to use

* O encodes this colour and the resulting number is stored in a datablock called ".ACTIVE_COLOUR"; hence, write the contents of this datablock to the screen

* use this number to set the eighth element of the datablock ".MOLEC_OBJ_INTEGER"

O > pai_col sky_blue

O > wr .active_colour ;;

.ACTIVE_COLOUR I 1 (10(x,i7))

3316172

O > db_s_d .molec_obj_integer 8 8 3316172

Click on some atoms and see what happens to the labels.
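The two codes quoted above are consistent with O packing a colour as red * 65536 + green * 256 + blue, with each component between 0 and 255: red is 255 * 65536 = 16711680, and 3316172 unpacks to (50, 153, 204), a sky blue. This is an inference from those two numbers, not something taken from the O Manual, so "Paint_colour" followed by writing ".ACTIVE_COLOUR" remains the reliable route. The Python sketch below (hypothetical function names) encodes and decodes codes under that assumption.

```python
def encode_colour(r: int, g: int, b: int) -> int:
    """Pack 8-bit red, green and blue components into one integer."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("components must lie in 0..255")
    return (r << 16) | (g << 8) | b


def decode_colour(code: int) -> tuple:
    """Unpack an integer colour code into (red, green, blue)."""
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


print(encode_colour(255, 0, 0))   # 16711680, the code for red quoted above
print(decode_colour(3316172))     # (50, 153, 204), the sky_blue code
```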

Right - now you've changed the number of labels and their colour. But how about the actual text of the labels ? This, and the text which is shown on the third line from the top of the graphics window whenever you click on an atom, is controlled via a so-called "template". Type the following:

O > dir *templ*

Heap> .MESSAGE_TEMPLATE T W 369

O > wr .MESSAGE_TEMPLATE ;;

.MESSAGE_TEMPLATE T 9 40

%MOLNAM %RESNAM %Restyp %ATMNAM, xyz =

atom_xyz

; B =

atom_b

; Z =

atom_z

atom_bone

;

residue_2ry_struc

The datablock ".MESSAGE_TEMPLATE" controls what is written at the third line of the graphics window. It is called a template since it contains elements whose VALUE will be substituted when an atom is actually clicked:

* "%MOLNAM" will be replaced by the name of the molecule

* "%RESNAM" by the name of the residue

* "%Restyp" by the type of the residue

* "%ATMNAM" by the name of the atom you clicked on

* ", xyz = " is a text which will be printed literally

* "atom_xyz" (on a line by itself !) will be replaced by the X, Y and Z coordinates of the atom

* "; B = " is a literal text again, etc.
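The substitution idea can be illustrated with a small Python sketch. It is not how O implements templates: joining the template lines with single spaces, looking up %tokens without regard to capitalisation, and the sample atom values are all assumptions of the sketch (how O itself treats "%Restyp" versus "%RESTYP" is exactly what the question below asks you to find out).

```python
import re

# Made-up values for one clicked atom; in O they come from the database.
clicked = {
    "MOLNAM": "P2A", "RESNAM": "A20", "RESTYP": "GLY", "ATMNAM": "CA",
    "atom_xyz": "1.00 2.00 3.00", "residue_2ry_struc": "ALPHA",
}


def expand_template(lines, values):
    """Fill in a message template and join its lines into one message."""
    parts = []
    for line in lines:
        keyword = line.strip()
        if keyword in values:
            # a property keyword on a line of its own is replaced by its value
            parts.append(values[keyword])
        else:
            # %TOKENS inside a line are substituted; other text is kept literally
            parts.append(re.sub(
                r"%(\w+)",
                lambda m: values.get(m.group(1).upper(), m.group(0)),
                line,
            ))
    return " ".join(parts)


template = ["%MOLNAM %RESNAM %RESTYP %ATMNAM ; Coords", "atom_xyz",
            "; Sec Struc", "residue_2ry_struc"]
print(expand_template(template, clicked))
```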

Edit this datablock so that it looks as follows (remember to update the number of lines in the datablock !):

O > wr .MESSAGE_TEMPLATE ;;

.MESSAGE_TEMPLATE T 4 40

%MOLNAM %RESNAM %RESTYP %ATMNAM ; Coords

atom_xyz

; Sec Struc

residue_2ry_struc

What effect did changing the word "%Restyp" to "%RESTYP" have ?

The contents of the labels are also controlled by a datablock, this one called ".ID_TEMPLATE". Write ".MESSAGE_TEMPLATE" to a file, edit this file, change the name of the datablock to ".ID_TEMPLATE" and design a useful label. Read the file in again and check the results.

O > wr .id_template ;;

.ID_TEMPLATE T 2 40

%Restyp %RESNAM %ATMNAM

residue_2ry_struc
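Forgetting to update the line count after editing a text datablock is an easy mistake. Judging from the examples in this chapter (".MENU T 19 12", whose longest item "Paint_object" is 12 characters long, and ".ID_TEMPLATE T 2 40"), the header line holds the block name, the type letter, the number of lines and a maximum line length. That reading is an inference, so treat the Python helper below (a hypothetical name) as a sketch that counts the lines for you before you read the file into O.

```python
def format_text_datablock(name, lines, width=None):
    """Build the text of an O datablock file for a Text (T) block.

    Header: name, type letter, number of lines and (as inferred from the
    examples) the maximum line length.
    """
    if width is None:
        width = max((len(line) for line in lines), default=0)
    too_long = [line for line in lines if len(line) > width]
    if too_long:
        raise ValueError(f"lines longer than {width} characters: {too_long}")
    header = f"{name} T {len(lines)} {width}"
    return "\n".join([header, *lines]) + "\n"


print(format_text_datablock(".ID_TEMPLATE",
                            ["%Restyp %RESNAM %ATMNAM", "residue_2ry_struc"], 40))
```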

3.5 - Connecting the dots differently

A little while ago, you came across the "Connect_file" command. Execute this command, and when O asks for a file name, just accept the default.

O > conn_fil

Mol> Connectivity file? [/nfs/taj/alwyn/o/data/all.dat]:

Mol> Maximum inter-residue link distance = 2.00

Mol> There were 23 residues.

Mol> 175 atoms.

The file which you just read in determines which residue types and atoms are recognised by O, and which bonds are to be drawn within each residue type. There is another ready-made file, in the same directory, called "o.dat". Use this connectivity file and draw a zone of the entire P2 molecule:

O > con_fil o.dat

Mol> Maximum inter-residue link distance = 6.00

Mol> There were 23 residues.

Mol> 113 atoms.

O > mol p2a obj o zo ; end

Now use Unix to copy "o.dat" to your own directory (call it "weirdo.dat"); edit the file such that, for amino acids, only the C-alpha atoms and the atom that is furthest from the C-alpha atom in the side chain are drawn (if two atoms are the same number of bonds removed from the C-alpha atom, select one at random). Use this file to draw another zone of P2 and check that you have changed the file correctly.

O > con_fil weirdo.dat

O > obj weird zo ; end
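Finding "the atom that is furthest from the C-alpha atom" is a shortest-path question on the bond graph of a residue: count bonds outward from CA with a breadth-first search and take the side-chain atom with the largest count. The Python sketch below takes a plain list of bonded atom-name pairs (a hypothetical input format, not the layout of "o.dat") and, when there is a tie, returns whichever of the tied atoms it meets first instead of choosing at random.

```python
from collections import deque


def bond_counts(bonds, start="CA"):
    """Breadth-first search: number of bonds from start to every reachable atom."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    counts = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for other in neighbours.get(atom, []):
            if other not in counts:
                counts[other] = counts[atom] + 1
                queue.append(other)
    return counts


def furthest_side_chain_atom(bonds, backbone=("N", "CA", "C", "O")):
    """Side-chain atom with the most bonds between it and CA (None for Gly)."""
    counts = bond_counts(bonds)
    side_chain = {a: n for a, n in counts.items() if a not in backbone}
    if not side_chain:
        return None
    return max(side_chain, key=side_chain.get)


lysine = [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB"),
          ("CB", "CG"), ("CG", "CD"), ("CD", "CE"), ("CE", "NZ")]
print(furthest_side_chain_atom(lysine))   # NZ
```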

These connectivity files define explicit bonds for amino-acid residues. If you were to read in a DNA molecule now, bonds would be drawn using a distance criterion (which may produce wrong bonds, especially if hydrogen atoms are included in the structure). Be sure to read either "all.dat" or "o.dat" in again when you're done with "weirdo.dat" !
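A distance criterion simply joins any two atoms that are closer than some cutoff. The Python sketch below uses a hypothetical cutoff of 1.9 Å (O's actual rule is not given here) and shows one way hydrogens can break it: the two hydrogens of a CH2 group, about 1.78 Å apart for a C-H length of 1.09 Å and an H-C-H angle of 109.5 degrees, end up "bonded" to each other.

```python
import math
from itertools import combinations


def bonds_by_distance(atoms, cutoff=1.9):
    """Join every pair of atoms closer than cutoff (Å); atoms maps name -> xyz."""
    return [(a, b) for (a, pa), (b, pb) in combinations(atoms.items(), 2)
            if math.dist(pa, pb) < cutoff]


# A CH2 group: C-H = 1.09 Å, H-C-H angle = 109.5 degrees
half_angle = math.radians(109.5 / 2)
ch2 = {
    "C": (0.0, 0.0, 0.0),
    "H1": (1.09 * math.sin(half_angle), 0.0, 1.09 * math.cos(half_angle)),
    "H2": (-1.09 * math.sin(half_angle), 0.0, 1.09 * math.cos(half_angle)),
}
print(bonds_by_distance(ch2))   # includes the spurious ("H1", "H2") "bond"
```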

3.6 - What's up, Doc ?

(1) Explain the two different ways in which the "menu_control" command can be used. What are the parameters of this command for both cases ?

(2) Why is it useful to have the semi-colon character (;) on the menu ?

(3) What may happen, or what can you do, when you accidentally hit a wrong command on the menu ?

(4) Save your database. Press down the key marked "Ctrl" (the Control key) and keep it down as you hit the "C" key. What happened ?

(5) Which four types of O datablocks do you know ? Give an example of each of these from your own database.

(6) What are the parameters of the "db_set_data" command ?

(7) What are the O colour codes for black, white, yellow, cyan and magenta ?

(8) What is the difference between using "%RESTYP" and "%restyp" in the message template ?

(9) Explain the difference between a zone drawn with "all.dat" and one drawn with "o.dat". If you can't, look at the entry for glycine in both files and compare these.

Your notes:

4.0 - Again and again and again ...

In this chapter you will learn how to create and use macros (files with sequences of O instructions which you execute often).

New O commands:

Bell_ring

If_yes_no

Message

No

Print

Spawn

Symbols

Terminal_ID

Wait_ID

Yes

!

#

$ (symbols)

$ (Unix)

@

Your notes:

4.1 - Recipes

In the previous chapter you learnt a lot about the O menu; now it's time to take a look at some recipes. As you have probably noticed by now, there are many sequences of O commands which you execute time and again. Some of these sequences are always identical, others just vary in one or two parameters (e.g., the name of the molecule or object, or the zone to which a command applies).

O contains a mechanism which allows you to type such sequences of commands only once, and to execute them as often as you like. This mechanism is called the "O macro facility". The idea is that you write little O "programs", i.e. series of O commands, in a file, and from then on "execute" this file, rather than typing the commands again.

Use an editor to create a file (in your work directory) called "yasspa.omac" which contains the following commands:

! yasspa.omac

print ... yasspa.omac ...

print First I will run YASSPA on molecule P2A

print Then I will make an object called "yasspa"

print In which the CA atoms are coloured red

print If they are in a helix or blue if they

print Are in a strand

message Please wait a little while ...

mol p2a

yasspa p2a alpha 0.5

yasspa p2a beta 0.8

paint_zone p2a ; white

paint_property res_2ry_struc = "ALPHA" red

paint_property res_2ry_struc = "BETA" blue

obj yasspa ca ; end

bell on_off

message Done

There are several new O commands inside this file:

* an exclamation mark (!) followed by any text marks a line as being a comment line; it will be ignored by O. Use this in order to document your macros so that you will remember what they do even if you haven't looked at them for half a year.

* "Print" followed by a text will make O type that text to the terminal window

* "Message" followed by a text will make O put that text on the second line from the top of the graphics window

* "Bell_ring" rings the terminal bell; use this to alert the user (usually, yourself) that input is required, or that a macro has finished its job

Now execute the macro by typing an "at" sign (@), directly followed by the complete and exact name of the file (no spaces !):

O > @yasspa.omac

O > Macro in computer file-system.

As4> ... yasspa.omac ...

O > As4> First I will run YASSPA on molecule P2A

O > As4> Then I will make an object called "yasspa"

O > As4> In which the CA atoms are coloured red

O > As4> If they are in a helix or blue if they

O > As4> Are in a strand

O > O > O > Util> Template size : 5 residues.

Util> There were 17

O > Util> Template size : 5 residues.

Util> There were 75

O > O > O > O > O > O > O > O >

Well done - you have now written and executed your first O macro ! There is one little problem with this macro, though: it will only work for a molecule called "P2A". Also, the object will always be called "YASSPA", so you can't run it twice on different molecules (why not ?).

Normally, when you type "Molecule_name" followed by the Return key, O asks you which molecule you want to select. However, when O executes a macro, it expects all parameters to commands to be in that file. In other words, if you were to use "mol" instead of "mol p2a", O would read from the next line, find the word "yasspa" and assume that this is the name of the molecule you want to use. Of course, you don't have such a molecule in your database and the macro doesn't work as you hoped.

Fortunately, there is a way around this. In a macro, you may replace any parameter by a question to the user, enclosed in two hash signs (#). For instance, you could replace the line "mol p2a" by something like: "mol # Which molecule should I use ? #". Do this, and also change the line "obj yasspa ca ; end" so that the user can type the name of the object himself or herself and can type either "ca" or "zone". Execute the altered macro to check that it works:

O > @yasspa.omac

O > Macro in computer file-system.

As4> ... yasspa.omac ...

O > As4> First I will run YASSPA on molecule P2A

...

O > O > O > Which molecule should I use ? p2a

O > Util> Template size : 5 residues.

Util> There were 17

O > Util> Template size : 5 residues.

Util> There were 75

O > O > O > O > O > What should I call the object ? yazoo

O > O > Do you want a CA trace or a ZOne (CA/ZO) ? zo

O > O > O > O >
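Judging from the transcript, each "# question #" in a macro line is shown to the user and replaced by whatever (s)he types. The Python sketch below imitates that substitution. The function name, the idea of passing in the "ask" function (so that it can be tested without a keyboard) and the example macro line are assumptions of the sketch, not O internals.

```python
import re

PROMPT = re.compile(r"#([^#]*)#")


def expand_prompts(line, ask=input):
    """Replace every '# question #' in a macro line by the user's answer."""
    return PROMPT.sub(lambda m: ask(m.group(1).strip() + " "), line)


# Hypothetical macro line; canned answers stand in for the keyboard here.
answers = iter(["yazoo", "zo"])
line = "obj # What should I call the object ? # # CA trace or ZOne (CA/ZO) ? # ; end"
print(expand_prompts(line, lambda question: next(answers)))
# obj yazoo zo ; end
```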

How many aromatic residues are not in any helix or strand ? Can you write a macro which asks the user to enter the name of a molecule and then paints carbons yellow, oxygens red, nitrogens blue, sulphurs green, hydrogens magenta and all other elements white ? Execute this macro for your P2 molecule and check that it works all right.

4.2 - What's the ID ?

Often you will want to be able to select an atom or a zone of residues while executing a macro. For example, you may want to write a macro which draws a sphere of residues around any atom that you click on, using the radius that you used last. Try to write such a macro and execute it.

Once you've done this, it's probably time to execute the "Clear_flags" command ...

The problem is that O puts "Centre_ID" (which you probably used in your macro) on the list of active commands, ditto for the "Sphere_centre", then gets to and executes the "End_object" command and tells you that there are no bonds to be drawn. To make O wait after the "Centre_ID" until you have actually picked an atom, you have to insert the command "Wait_ID":

! sphere.omac

! pick an atom and draw a sphere of residues

centre_id

wait_id

obj sph

sphere ; end

Sometimes, when you want to look at a number of specific residues, you may want to enter the atom name via the keyboard (in particular if you are working with a molecule you're not very familiar with yet). In that case, use the "Terminal_ID" command. This tells O to expect a molecule name plus residue name plus atom name from the keyboard in the terminal window, rather than from a click in the graphics window:

! sphere_term.omac

! id an atom with the keyboard and draw a sphere of residues

centre_id

message "Sphere around which atom ?"

term_id # Mol, residue, atom ? # sph

obj sph sphere ; end

Type this macro in a file and execute it.

O > @sphere_term.omac

O > Macro in computer file-system.

O > O > O > Mol, residue, atom ? p2a a131 c

4.3 - What if ?

Occasionally, you may want to ask the user a question inside a macro, and, depending on the answer ("Yes" or "No", which, by the way, are also O "commands") execute one macro or another. For this purpose, you may use the command "If_yes_no". It has two parameters:

* the name of a macro which is executed when the user replies Yes

* the name of a(nother) macro which is executed in the case of a No

Both macro names must include the @ sign. Use this command to write three general O macros:

* paint_mol.omac - which asks the user to enter the name of a molecule, selects this molecule and asks the user whether (s)he wants to colour this molecule according to secondary structure assignment, or such that aromatic residues are drawn in green and all others in white; after the "If_yes_no" it should reset the normal colours for the most common chemical elements

* paint_aromatic.omac - which colours the molecule such that aromatic residues are drawn in green and all others in white, and draws a zone of the entire molecule in an object with a user-definable name

* paint_yasspa.omac - which colours the molecule according to secondary structure assignment and draws a C-alpha trace in an object with a user-definable name

4.4 - Symbolically speaking

There is yet another way to make macros work in general cases. This is done with the "symbol mechanism" in O. The associated command is "Symbols":

O > symbol

As2> Here are the current symbols :

As2> .ID_M P2A

As2> .ID_R A131

As2> .ID_A CA

As2> Symbol name : user

As2> Symbol expansion (<CR>=delete symbol from list) : Gerard

As2> Symbol inserted.

You now have a symbol called "USER"; however, the CONTENTS, or value, of this symbol is the text "Gerard". You can use the value of a symbol, both in typed commands and in macros, by putting a dollar sign ($) immediately in front of it (e.g., "$USER"). Check that this works:

O > print ... My name is $user

As4> ... My name is Gerard

O > symbol

As2> Here are the current symbols :

As2> .ID_M P2A

As2> .ID_R A131

As2> .ID_A CA

As2> USER Gerard

As2> Symbol name :

As you can see, there are three predefined symbols in O: ".ID_M", ".ID_R" and ".ID_A". Their VALUES are the name of the molecule/residue/atom, respectively, which was last identified. Click on any atom to verify that the values of these symbols change accordingly.

Usually, you will define symbols in macros as follows: "symbol my_symbol # Enter the value of ... #".
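Symbol expansion can be sketched as a lookup table plus a text substitution. In the Python sketch below, symbol names are stored in upper case (the listing above shows "user" stored as "USER" and "$user" still expanding), unknown symbols are left untouched, and a "$" followed by a space is not expanded, so that the Unix "$" command of the next section stays distinguishable. These rules and the function names are assumptions about O, not documented behaviour.

```python
import re

# The three predefined symbols, with the values from the listing above
symbols = {".ID_M": "P2A", ".ID_R": "A131", ".ID_A": "CA"}


def define_symbol(name, value):
    """Like 'symbol name value': names are stored in upper case."""
    symbols[name.upper()] = value


def expand_symbols(text):
    """Replace $NAME (or $.NAME) by the value of that symbol, if it exists."""
    def lookup(match):
        return symbols.get(match.group(1).upper(), match.group(0))
    return re.sub(r"\$(\.?\w+)", lookup, text)


define_symbol("user", "Gerard")
print(expand_symbols("print ... My name is $user"))   # print ... My name is Gerard
```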

4.5 - Unix speaking

There are two ways to "talk to Unix" from within O. If you want to execute just one or two Unix commands from inside O, you may use the "$" command. It's a bit confusing that this command is called "$" (see the previous section), but that's the way life is. To execute a Unix command, type a dollar sign followed by one or more spaces and the entire Unix command (i.e., including all its arguments and parameters). For example, if you want to change the menu:

O > write .menu q ;

O > $ jot q

O > read q

If you want to do more Unix work, you may use the "Spawn" command. This hands control to Unix, until you type "exit" from there. However, with modern, window-based workstations this command is not really necessary anymore, since it's so easy to start a new terminal window to do with Unix whatever you want to do.

4.6 - Do-it-yourself !

Try to write general macros which do the following (if you haven't written them yet):

* cnosh_colours.omac - reset the normal colours for the elements C, O, N, S and H for the current molecule

* acid_base.omac - this should select a molecule; colour acidic residues red, basic ones blue and all others yellow; draw an object "ACIBAS" with a C-alpha trace and reset the colours using cnosh_colours.omac

* change_id.omac - this should ask the user for a colour code and the maximum number of labels which can be displayed simultaneously

* rainbow.omac - this should draw a C-alpha trace of a user-defined molecule, where the colours vary from red at the N-terminus to blue at the C-terminus

* yasspa.omac - this should run YASSPA and draw a C-alpha trace coloured according to secondary structure assignment

Check section 10.4 for some other useful macros. You'll learn a thing or two by studying other people's macros !

There are two more things that are good to know about macros:

* you can store macros in files, but if they are extremely general, you may want to have them in all your O databases. In that case, store the macro in a text datablock. Call the datablock "@your_macro", i.e. include the at sign (@) as the first character of the datablock name.

* this has an additional advantage, namely that you can use short names (which, in turn, makes it possible to put them on the menu !).

To demonstrate these two points, create a file which contains the following (call it "all_on_off.odb"):

! @all_on_off - macro to toggle ALL objects ON/OFF

@all_on_off t 6 30

on_off message Wait

write .menu q.1 ;

$ grep '\^' q.1 > q.2

@q.2

$ \rm q.1 q.2

message Done

! type: menu @all_on_off on on

Read the file into O, put the macro on the menu and execute it:

O > read all_on_off.odb

Heap> @all_on_off

Heap> type: menu @all_on_off on on

O > menu @all_on_off on on

O > O > Macro in database.

O > O > O > O > Macro in computer file-system.

O > O > O > O > O > O > O > O > O > O > O >

O > O > O > O > O > O > O >

Note that the comment lines in the O datablock file are now printed in the terminal window (use this to include instructions to the user on how to install and use the macro). Execute the macro again to restore your old menu.

4.7 - Inter-course fun

Yes, yes, chapters three and four are a trifle boring because they are so technical and have fairly little to do with proteins specifically. Therefore, treat yourself to an "Aha-Erlebnis"; type the following commands literally (don't worry about what the commands do, just sit back and enjoy yourself):

O > ce_zo p2a a1 a131

O > @omac/sketch_setup.omac

O > sketch_auto p2a ; ;

O > on_off

O > spin

The "Sketch_auto" command has created a macro inside your database which actually produces the picture. Can you find out the name of this macro ? Try to figure out what the macro does.

If you can't get enough, play around with some of the function keys ("F-keys", they are in a row at the top of your keyboard); you have already encountered F6 and F7, but you may also be interested in the keys F12, F10, F9 and F8.

4.8 - Tell me why

(1) Explain what a macro is.

(2) Explain how symbols can be defined, listed and used in O.

(3) Discuss the two different uses of the dollar sign in O.

(4) What is the difference between "print" and "message" ?

(5) What is the difference between "wait_id" and "terminal_id" ?

5.0 - What a super position !

In this chapter you will learn how to superimpose the structures of two similar proteins and how to quantify their similarity.

New O commands:

Copy_db

Db_delete

Db_kill

Dial_next

Dial_previou

Lsq_Paired_a

Lsq_explicit

Lsq_improve

Lsq_molecule

Lsq_object

Move_object

O_setup

Sam_atom_out

Your notes:

5.1 - All lipocalins are equal ...

... But some lipocalins are more equal than others, George Orwell might have said (if he had lived today and had been a protein scientist). This leads us to the subject of structural similarity of proteins. How do you find out if (or: which) proteins are similar ? How do you analyse this similarity, use it on the display, quantify it ?

The first question is the most difficult one; one answer might be "through experience", another "by looking at each and every structure that is or has been solved", a third "by using a sufficiently clever program". O cannot be used to detect similarities between different protein structures. Suffice it to say that there is an accompanying program (called "DEJAVU") which can do this. However, use of this program is well beyond the scope of this tutorial.

P2 myelin protein, the protein you've been looking at and playing with so far, is a lipocalin, or lipid-binding protein (LBP). There's a whole family of lipocalins; all of them have similar three-dimensional structures and they bind similar ligands. In a separate exercise, you have retrieved a PDB file containing the structure of one member of this family. The example below uses the coordinates of cellular retinol-binding protein, CRBP, to illustrate the use of O. However, since you have another protein to compare to P2, the names and types of the residues that you compare, as well as the numbers that you obtain, will be different from those shown below !

Read your lipocalin molecule into O, give it a sensible name, draw a C-alpha trace (coloured from red to blue going from N- to C-terminus) and a zone object. Check if your PDB file contains one or more than one copy of the protein. Centre on the middle of the (first) molecule. Run YASSPA (use your macro from the previous chapter !) and compare the positions of the helices and strands with those in P2 myelin protein (this may give you an idea of how to align the structures !). Use the paint commands ! Does your structure contain a ligand ? If so, generate a separate object with only the ligand in it (use the "Zone" command). Does it contain anything else ? If so, can you guess what it is ?

O > s_a_i crbp.pdb crbp

Sam> File type is PDB

Sam> Database compressed.

Sam> Space for 136501 atoms

Sam> Space for 10000 residues

Sam> Molecule CRBP contained 248 residues and 1236 atoms

O > sam_list crbp

Sam> Name Type From To Centre Radius

Sam> 1 PRO 1 7 5.32 -13.06 -14.35 2.44

Sam> 2 VAL 8 14 4.21 -9.66 -11.64 2.49

...

Sam> 133 VAL 1085 1091 -2.06 4.07 -2.92 2.77

Sam> 134 HIS 1092 1102 -3.00 5.35 -7.96 3.62

Sam> 200 RTL 1103 1123 16.33 4.35 -2.23 7.80

Sam> 201 CD2 1124 1124 2.80 -16.23 -11.66 0.00

Sam> 202 CD2 1125 1125 13.39 -12.80 3.63 0.00

Sam> 203 HOH 1126 1126 14.26 0.68 9.40 0.00

...

Sam> 314 HOH 1235 1235 20.42 10.07 -5.77 0.00

Sam> 315 HOH 1236 1236 35.08 1.99 -6.12 0.00

O > mol crbp

O > pai_ramp

Paint> Colour-ramp a property in molecule CRBP

Paint> Property [residue_irc] :

Paint> Minimum and maximum value of property [1 248] : 1 134

Paint> First colour [blue] : red

Paint> Second colour [red] : blue

O > obj crbpt ca ; end

O > obj crbpz zone ; end

O > O > ce_zo crbp 1 134

As4> CRBP 1 134 CRBPZ

As4> Centering on zone from 1 to 134

O > @yasspa.omac

O > Macro in computer file-system.

O > Which molecule ? crbp

O > O > Util> Template size : 5 residues.

Util> There were 17

O > Util> Template size : 5 residues.

Util> There were 75

O > mol crbp obj ligand

O > zone 200 ; end

5.2 - Match-makers

The first thing you have to do is to get an initial idea of which residues in your molecule correspond to which residues in the structure (!) of P2 myelin protein. You could for example compare where the helices are.

In P2 the first helix (according to YASSPA) runs from Phe A16 to Leu A23, and the second from Leu A27 to Leu A35. In CRBP these helices run from Phe 16 to Leu 23 and from Val 27 to Leu 35, respectively. Apparently, P2 and CRBP are VERY similar.

Before you try to get an initial alignment, it's good practice to use a new O command to align two similar protein structures roughly by hand: "Move_object". When you type this command, O expects you to click on a molecular object. After that, you can use the dials (or the pseudo-dials if you hit the F6 key) to rotate and translate the object. Execute this command and type the name of your lipocalin object (i.e., NOT that of P2 myelin protein). (Hint: if your PDB file contained more than one protein molecule, make a new object of only one of them and use that object here.) Use the dials to position your lipocalin on top of P2 as well as you can.

O > move_object

Mnp> What object is to be moved around ? yasspa

Mnp> Fragment pivot point: 0.000 0.000 0.000

Wait a minute: you can only translate the object (unless you use the pseudo-dials) ! Time for another O command: "Dial_previous". You'll probably want to put this command and its little sister, "Dial_next", on your menu:

O > menu dial_next on on

O > menu dial_prev on on

O > dial_prev

Check the bottom-left corner of the graphics window as you click these two commands in turn: they toggle the assignment of three of the physical dials between rotation and translation of your object.

Another thing you may want to do as soon as your molecular object is in the neighbourhood of P2 is to centre on the middle of P2:

O > ce_zo p2a a1 a131

As4> P2A A1 A131 YASSPA

As4> Centering on zone from A1 to A131

Continue rotating and translating your molecule until you're happy with the fit; then click or type "Yes" and answer the question that follows:

O > yes

Mnp> The trnasformation s saved in ([obj_rt]): crbp_to_p2

No, this has nothing to do with the "Formation of tRNAs" ... It merely shows that some programmers write better Fortran than English. In fact, O asks you for the name of a datablock in which the transformation that you applied by hand can be stored. Have a look at this datablock:

O > wr CRBP_TO_P2 ; (3f15.8)

CRBP_TO_P2 R 12 (3f15.8)

0.71129280 0.49543828 0.49860156

-0.64166176 0.74725312 0.17286676

-0.28693673 -0.44289240 0.84941959

38.35695267 58.46720123 27.90679932

It contains twelve numbers; the first nine constitute a unitary rotation matrix, the last three a translation vector (in Å); the twelve numbers together define a so-called (RT) operator. If the numbers in the operator are denoted R1 to R12 as follows:

R1 R2 R3

R4 R5 R6

R7 R8 R9

R10 R11 R12

then the relationships between the coordinates of a point (X',Y',Z') after application of the transformation and the point (X,Y,Z) from which it originates are:

X' = R1 * X + R4 * Y + R7 * Z + R10

Y' = R2 * X + R5 * Y + R8 * Z + R11

Z' = R3 * X + R6 * Y + R9 * Z + R12

Now find a pair of C-alpha atoms, one in P2 and one in your lipocalin, which are very close together now that you have applied the transformation by hand, for example two residues in corresponding helices. Find out what coordinates these two atoms have. For the atom in P2 you can simply click on it and read the coordinates from the information line in the graphics window. Unfortunately, you cannot click on atoms in your transformed object (since O still thinks that the molecule is at its old position, which it actually IS since you haven't changed the coordinates of the molecule in the database ! All you've done is to move an OBJECT around!). However, you could either make a new object of your molecule, centre on it and find the corresponding residue, or use the "Terminal_id" command:

O > term_id p2a a20

As1> Object name? [YASSPA]: p2a

{ read coordinates: 51.80 68.08 44.45 }

O > term_id crbp 20 ca ;

{ read coordinates: 23.12 1.87 4.99 }

Apply the transformation (which you saved after moving your object) to the coordinates of the C-alpha atom in your lipocalin to find the coordinates of the transformed atom (you may want to use the Unix calculator "xcalc" if you don't have a pocket calculator at hand):

X' = 0.711 * 23.12 + -0.642 * 1.87 + -0.287 * 4.99 + 38.357 = 52.16

Y' = 0.495 * 23.12 + 0.747 * 1.87 + -0.443 * 4.99 + 58.467 = 69.10

Z' = 0.499 * 23.12 + 0.173 * 1.87 + 0.849 * 4.99 + 27.907 = 44.00

In other words: the C-alpha of residue A20 in P2 myelin protein is at (51.80,68.08,44.45) and the C-alpha of the corresponding residue 20 in CRBP is at (52.16,69.10,44.00). What is the distance between these two matched C-alpha atoms after transformation of CRBP ?

Distance = SQRT ( (X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2 ) =

= SQRT ( 0.36^2 + 1.02^2 + 0.45^2 ) =

= SQRT ( 1.3725 ) ~ 1.17 Å
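The hand calculation above can be checked with a few lines of Python. The sketch applies the twelve numbers exactly as listed by "wr CRBP_TO_P2", using the relations X' = R1 * X + R4 * Y + R7 * Z + R10 etc., and then computes the distance to the P2 atom. With the full eight-decimal operator the transformed atom comes out at about (52.17, 69.11, 44.00) and the distance at about 1.18 Å; the small differences from the values above come from rounding the operator to three decimals. The function name "apply_rt" is hypothetical.

```python
import math

# CRBP_TO_P2 as written by "wr CRBP_TO_P2 ; (3f15.8)": R1 ... R12 in file order
crbp_to_p2 = [
    0.71129280, 0.49543828, 0.49860156,
    -0.64166176, 0.74725312, 0.17286676,
    -0.28693673, -0.44289240, 0.84941959,
    38.35695267, 58.46720123, 27.90679932,
]


def apply_rt(rt, xyz):
    """Apply a 12-number O operator R1..R12 to one point (X, Y, Z)."""
    r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12 = rt
    x, y, z = xyz
    return (r1 * x + r4 * y + r7 * z + r10,
            r2 * x + r5 * y + r8 * z + r11,
            r3 * x + r6 * y + r9 * z + r12)


crbp_ca20 = apply_rt(crbp_to_p2, (23.12, 1.87, 4.99))
p2_ca20 = (51.80, 68.08, 44.45)
print([round(c, 2) for c in crbp_ca20])          # [52.17, 69.11, 44.0]
print(round(math.dist(crbp_ca20, p2_ca20), 2))   # 1.18
```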

How well did you do ? Did your C-alpha atoms lie within a distance of 1 Å, 1.5 Å, 2 Å or more ? One way to quantify the similarity between two protein structures is:

* superimpose the two molecules "as best as you can"

* count how many C-alpha atoms have distances less than x Å

* calculate the RMSD of these C-alpha atoms

The "RMSD" is the "root-mean-square distance"; it is calculated by summing the squares of the distances between matching atoms, dividing by the number of atoms, and taking the square root of the result. For instance, if you have three matching atoms (for simplicity !) with distances of 1.17, 1.35 and 0.98 Å respectively, then the RMSD is calculated as follows:

RMSD = SQRT ( (1.17^2 + 1.35^2 + 0.98^2) / 3 ) =

= SQRT ( 4.1518 / 3 ) ~ 1.18 Å

The RMSD is thus a kind of average distance between matched atoms, except that (due to the squares) larger distances are weighted more heavily.
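The three-step recipe above (superimpose, count the C-alpha pairs closer than a cutoff, compute the RMSD of those pairs) is easy to express in Python. The sketch assumes you already have paired coordinates after superposition; "similarity" is a hypothetical helper name, and the cutoff is yours to choose.

```python
import math


def rmsd(distances):
    """Root-mean-square of a list of distances between matched atoms."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def similarity(pairs, cutoff):
    """Return (number of pairs closer than cutoff, RMSD of those pairs)."""
    close = [d for d in (math.dist(a, b) for a, b in pairs) if d < cutoff]
    return len(close), (rmsd(close) if close else None)


print(round(rmsd([1.17, 1.35, 0.98]), 2))   # 1.18, as in the example above
```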

5.3 - Operator, what's the number ?

It's time to stow the calculators and to let O do some of the hard work. Aligning two structures usually entails three steps:

* finding an initial alignment (output from a program, manual rotation and translation, alignment of certain structural elements)

* getting an initial operator from this alignment

* optimising the alignment

You have already carried out the first step. Now there are two possible ways to proceed. The first is to identify stretches of corresponding residues in P2 and your lipocalin and to feed them to the O command "Lsq_explicit":

O > lsq_explicit

Lsq > Least squares match by explicit definition of atoms.

Lsq > Given 2 molecules A, B the transformation rotates B onto A

Lsq > What is the name of A (the not rotated molecule)? p2a

Lsq > What is the name of B (the rotated molecule)? crbp

Lsq > Now define what atoms in A [=P2A] are to be matched to B [=CRBP]

Lsq > Defining 3 names in P2A implies a zone and an atom name.

Lsq > Defining 2 names in P2A implies a zone and CA atoms.

Lsq > Defining 1 name in P2A implies the CA of that residue.

Lsq > Molecule CRBP just requires the start residue and atom name.

Lsq > A blank line terminates input.

Lsq > Define atoms from P2A (the not rotated molecule): a16 a23 ca

Lsq > Define atoms from CRBP (the rotated molecule): 16

Lsq > Define atoms from P2A (the not rotated molecule): a27 a35

Lsq > Define atoms from CRBP (the rotated molecule): 27

Lsq > Define atoms from P2A (the not rotated molecule):

Lsq > The 17 atoms have an r.m.s. fit of 0.618

Lsq > xyz(1) = 0.7280*x+ -0.6346*y+ -0.2595*z+ 37.6558

Lsq > xyz(2) = 0.6080*x+ 0.7725*y+ -0.1835*z+ 53.4470

Lsq > xyz(3) = 0.3169*x+ -0.0242*y+ 0.9481*z+ 32.3691

Lsq > The transformation can be stored in O.

Lsq > A blank is taken to mean do not store anything

Lsq > The transformation will be stored in .LSQ_RT_crbp_to_p2a

Make sure that you identify the molecules correctly, i.e. P2 is the "not rotated molecule" ! Also, be careful in specifying the zones which are to be aligned (check the residue names !). In this case, O was told to align 17 alpha-helical residues (zones A16-A23 and A27-A35 of P2A) on their C-alpha atoms. This gave an RMSD of 0.62 Å, about half that of the three-atom example calculated earlier. Compare the operator printed by O to the one you obtained by manual rotation and translation; they should be fairly similar. Note that the operator is stored in a datablock with a partially fixed name: ".LSQ_RT_" followed by any name that you want to give it (use meaningful names !).
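The input rules printed by Lsq_explicit can be mimicked with a short Python function:

* three names give a zone plus an atom name
* two names give a zone of CA atoms
* one name gives a single CA
* molecule B needs only the start residue

This is a hypothetical illustration, not O's code. It assumes each molecule's residue names are available as an ordered list. The residue lists used below (P2A residues named A1 to A131, CRBP residues named 1 to 134) are invented for the example.

```python
def expand_definition(names_a, start_b, residues_a, residues_b):
    """Turn one Lsq_explicit-style definition into (residue_A, residue_B, atom) pairs.

    names_a    : names typed for molecule A, e.g. ["a16", "a23", "ca"]
    start_b    : start residue typed for molecule B, e.g. "16"
    residues_a : ordered residue names of molecule A (assumed available)
    residues_b : ordered residue names of molecule B (assumed available)
    """
    names_a = [n.upper() for n in names_a]
    if len(names_a) == 3:      # zone plus atom name
        first, last, atom = names_a
    elif len(names_a) == 2:    # zone, C-alpha atoms
        first, last, atom = names_a[0], names_a[1], "CA"
    elif len(names_a) == 1:    # one residue, its C-alpha atom
        first = last = names_a[0]
        atom = "CA"
    else:
        raise ValueError("expected 1, 2 or 3 names for molecule A")

    i, j = residues_a.index(first), residues_a.index(last)
    zone_a = residues_a[i:j + 1]
    # The zone in B starts at start_b and has the same length as the zone in A
    k = residues_b.index(start_b.upper())
    zone_b = residues_b[k:k + len(zone_a)]
    if len(zone_b) < len(zone_a):
        raise ValueError("zone runs past the end of molecule B")
    return [(ra, rb, atom) for ra, rb in zip(zone_a, zone_b)]

# Invented residue lists for illustration only
p2a = [f"A{n}" for n in range(1, 132)]
crbp = [str(n) for n in range(1, 135)]

# The two definitions typed in the transcript
pairs = expand_definition(["a16", "a23", "ca"], "16", p2a, crbp)
pairs += expand_definition(["a27", "a35"], "27", p2a, crbp)
print(len(pairs))  # 17
```

The count of 17 pairs (8 + 9 residues) agrees with the number of atoms O reports for the fit.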

Make a new C-alpha trace of your lipocalin (call it "LSQO") and apply the new operator to it with the O command "Lsq_object":

O > lsq_obj

Lsq > Apply a transformation to an existing object.

Lsq > There are these transformations in the database

Lsq > CRBP_TO_P2A

Lsq > Which alignment [<CR>=restore a transformed object] ? crbp_to_p2a

Lsq > There is an object called FIRST

...

Lsq > There is an object called LSQO

Lsq > Which object ? lsqo
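Each of the three "xyz" lines printed by Lsq_explicit computes one new coordinate as a combination of the old x, y and z plus a translation. In matrix form this is x' = R x + t. Applying the operator to an object, as Lsq_object does, amounts to doing this for every atom. The sketch below uses the crbp_to_p2a operator from the transcript. The function names and the two trace coordinates are made up for illustration. The sketch also checks that the rotation part is orthonormal to within the four printed decimals, as a pure rotation should be.

```python
# Operator printed by Lsq_explicit (crbp_to_p2a): rows of R and translation t
R = [[0.7280, -0.6346, -0.2595],
     [0.6080,  0.7725, -0.1835],
     [0.3169, -0.0242,  0.9481]]
t = [37.6558, 53.4470, 32.3691]

def apply_operator(coords, rot, trans):
    """Return new (x, y, z) tuples: x'(i) = sum over j of rot[i][j] * x(j), plus trans[i]."""
    return [tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
                  for i in range(3))
            for p in coords]

def max_orthonormality_error(rot):
    """Largest deviation of rot * rot^T from the identity matrix."""
    err = 0.0
    for i in range(3):
        for k in range(3):
            dot = sum(rot[i][j] * rot[k][j] for j in range(3))
            err = max(err, abs(dot - (1.0 if i == k else 0.0)))
    return err

# Made-up C-alpha coordinates of a two-atom trace, moved into the P2A frame
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
moved = apply_operator(trace, R, t)
print(moved[0])                             # the origin lands exactly on t
print(max_orthonormality_error(R) < 1e-3)   # True
```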

The alignment you have now is based solely on the alignment of the two helices. Fortunately, O contains a command which will try to improve the alignment by looking for long stretches of matching residues in both proteins. This command is called "Lsq_improve":

O > lsq_imp

Lsq > Least squares match by Semi Automatic Alignment.

Lsq > There are these transformations in the database

Lsq > CRBP_TO_P2A

Lsq > Which alignment ? crbp_to_p2a

Lsq > Given 2 molecules A,B the transformation rotates B onto A

Lsq > What is the name of molecule A [P2A ]?

Lsq > Zone to look for alignment [all molecule A] :

Lsq > What is the name of molecule B [CRBP ]?

Lsq > Zone to look for alignment [all molecule B] :

Lsq > What atom [CA] ?

Lsq > Number of atoms in A/B to look for alignment 131 134

Note that you don't have to type the entire name of the operator since O expects operators to begin with ".LSQ_RT_". Once you give the name of the operator, O "remembers" which two molecules were compared. The defaults are to look for matching residues in the two complete molecules and to use the C-alpha atoms.

After this, O starts looking for long connected fragments; when it's finished, it will calculate a new operator by fitting only these residues. The new operator, the number of matched residues (C-alpha atoms) and their RMS distance are printed:

Lsq > A fragment of 41 residues located.

Lsq > A fragment of 30 residues located.

Lsq > A fragment of 15 residues located.

Lsq > A fragment of 6 residues located.

Lsq > A fragment of 4 residues located.

Lsq > Loop = 1 ,r.m.s. fit = 1.393 with 96 atoms

Lsq > x(1) = 0.7102*x+ -0.6767*y+ -0.1943*z+ 37.8776

Lsq > x(2) = 0.6238*x+ 0.7328*y+ -0.2719*z+ 53.9637

Lsq > x(3) = 0.3264*x+ 0.0719*y+ 0.9425*z+ 31.7407
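The fragment search can be illustrated with a deliberately simplified sketch. This is not O's algorithm, whose details the tutorial does not give. The sketch assumes the structures are already superimposed. It pairs each C-alpha atom of A with the nearest C-alpha atom of B, provided that atom lies within a cutoff. It then reports runs of consecutive residues in A whose partners are also consecutive in B. The default cutoff and minimum fragment length are arbitrary choices. Refitting on the matched residues, as O does after each search, would be the next step.

```python
import math

def find_fragments(ca_a, ca_b, cutoff=3.0, min_length=4):
    """Return (start_a, start_b, length) tuples for runs of matched residues.

    ca_a, ca_b : superimposed C-alpha coordinates in sequence order.
    Residue i of A is paired with the nearest C-alpha j of B if that distance
    is below `cutoff`; a fragment is a run in which i and j both go up by one.
    """
    partner = []
    for p in ca_a:
        dists = [math.dist(p, q) for q in ca_b]
        j = min(range(len(ca_b)), key=dists.__getitem__)
        partner.append(j if dists[j] < cutoff else None)

    fragments, i = [], 0
    while i < len(partner):
        if partner[i] is None:
            i += 1
            continue
        start = i
        # Extend the run while the partners in B stay consecutive
        while (i + 1 < len(partner) and partner[i + 1] is not None
               and partner[i + 1] == partner[i] + 1):
            i += 1
        length = i - start + 1
        if length >= min_length:
            fragments.append((start, partner[start], length))
        i += 1
    # Longest fragments first, like the order in O's listing
    return sorted(fragments, key=lambda f: -f[2])

# Made-up example: six residues, with residue 3 of B displaced far away
a = [(3.8 * k, 0.0, 0.0) for k in range(6)]
b = [(3.8 * k, 0.5, 0.0) for k in range(6)]
b[3] = (11.4, 20.0, 0.0)
print(find_fragments(a, b, cutoff=2.0, min_length=2))  # [(0, 0, 3), (4, 4, 2)]
```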

With this new operator, O repeats the fragment search process. This continues until no further improvement can be achieved:

Lsq > Search for connected fragments.

Lsq > A fragment of 42 residues located.

Lsq > A fragment of 30 residues located.

Lsq > A fragment of 27 residues located.

Lsq > A fragment of 24 residues located.

Lsq > Loop = 2 ,r.m.s. fit = 1.207 with 123 atoms

Lsq > x(1) = 0.6830*x+ -0.7115*y+ -0.1655*z+ 38.0744

Lsq > x(2) = 0.6473*x+ 0.6945*y+ -0.3141*z+ 53.9228

Lsq > x(3) = 0.3385*x+ 0.1074*y+ 0.9348*z+ 31.6207

...

Lsq > Search for connected fragments.

Lsq > A fragment of 42 residues located.

Lsq > A fragment of 32 residues located.

Lsq > A fragment of 27 residues located.

Lsq > A fragment of 24 residues located.

Lsq > Loop = 4 ,r.m.s. fit = 1.270 with 125 atoms

Lsq > x(1) = 0.6795*x+ -0.7158*y+ -0.1610*z+ 38.1367

Lsq > x(2) = 0.6527*x+ 0.6900*y+ -0.3130*z+ 53.8917

Lsq > x(3) = 0.3