1.14: Biochemical Reactions - Biology

Understanding chemistry is essential to fully understand biology. Why?

A general understanding of chemistry is necessary to understand biology. Essentially, our cells are just thousands of chemicals (made of elements like carbon, hydrogen, oxygen, nitrogen, phosphorus and sulfur) in just the right combinations. And these chemicals combine through chemical reactions.

Chemical Reactions

The element chlorine (Cl) is a greenish poison. Would you eat chlorine? Of course not, but you often eat a compound containing chlorine. In fact, you probably eat this chlorine compound just about every day. Do you know what it is? It’s table salt. Table salt is sodium chloride (NaCl), which forms when chlorine and sodium (Na) combine in certain proportions. How does chlorine, a toxic green chemical, change into harmless white table salt? It happens in a chemical reaction.

A chemical reaction is a process that changes some chemical substances into others. A substance that starts a chemical reaction is called a reactant, and a substance that forms as a result of a chemical reaction is called a product. During a chemical reaction, the reactants are used up to create the products.

An example of a chemical reaction is the burning of methane. In this chemical reaction, the reactants are methane (CH4) and oxygen (O2), and the products are carbon dioxide (CO2) and water (H2O). A chemical reaction involves the breaking and forming of chemical bonds. When methane burns, bonds break in the methane and oxygen molecules, and new bonds form in the molecules of carbon dioxide and water.

Chemical Equations

A chemical reaction can be represented by a chemical equation. For example, the burning of methane can be represented by the chemical equation

CH4 + 2O2 → CO2 + 2H2O

The arrow in a chemical equation separates the reactants from the products and shows the direction in which the reaction proceeds. If the reaction could occur in the opposite direction as well, two arrows pointing in opposite directions would be used. The number 2 in front of O2 and H2O shows that two oxygen molecules and two water molecules are involved in the reaction. (With no number in front of a chemical symbol, just one molecule is involved.)

Conservation of Matter

In a chemical reaction, the quantity of each element does not change; there is the same amount of each element in the products as there was in the reactants. This is because matter is always conserved. The conservation of matter is reflected in a reaction’s chemical equation. The same number of atoms of each element appears on each side of the arrow. For example, in the chemical equation above, there are four hydrogen atoms on each side of the arrow. Can you find all four of them on each side of the equation?
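The atom count for the methane equation can be checked mechanically. The following is a minimal sketch, not part of the original text: the helper names `count_atoms` and `side_totals` are hypothetical, and the parser assumes simple formulas without parentheses.

```python
import re
from collections import Counter


def count_atoms(formula):
    """Count atoms in a simple formula such as 'CH4' or 'H2O' (no parentheses)."""
    counts = Counter()
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_totals(terms):
    """Sum atoms over (coefficient, formula) pairs on one side of an equation."""
    totals = Counter()
    for coefficient, formula in terms:
        for symbol, n in count_atoms(formula).items():
            totals[symbol] += coefficient * n
    return totals


# CH4 + 2O2 -> CO2 + 2H2O
reactants = [(1, "CH4"), (2, "O2")]
products = [(1, "CO2"), (2, "H2O")]

print(side_totals(reactants))
print(side_totals(products))
print(side_totals(reactants) == side_totals(products))  # True when the equation is balanced
```

Both sides contain one carbon, four hydrogen, and four oxygen atoms, which is the conservation of matter described above.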


  • A chemical reaction is a process that changes some chemical substances into others. During a chemical reaction, the reactants are used up to create the products.
  • In a chemical reaction, matter is always conserved.


  1. Define a chemical reaction.
  2. Describe the roles of reactants and products in chemical reactions.
  3. How does a chemical equation show that matter is always conserved in a chemical reaction?
  4. Knowing that water (H2O) forms from hydrogen (H2) and oxygen (O2), write a chemical equation for the formation of water from these two elements.

Rethinking glycolysis: on the biochemical logic of metabolic pathways

Metabolic pathways may seem arbitrary and unnecessarily complex. In many cases, a chemist might devise a simpler route for the biochemical transformation, so why has nature chosen such complex solutions? In this review, we distill lessons from a century of metabolic research and introduce new observations suggesting that the intricate structure of metabolic pathways can be explained by a small set of biochemical principles. Using glycolysis as an example, we demonstrate how three key biochemical constraints—thermodynamic favorability, availability of enzymatic mechanisms and the physicochemical properties of pathway intermediates—eliminate otherwise plausible metabolic strategies. Considering these constraints, glycolysis contains no unnecessary steps and represents one of the very few pathway structures that meet cellular demands. The analysis presented here can be applied to metabolic engineering efforts for the rational design of pathways that produce a desired product while satisfying biochemical constraints.

Examples of Condensation Reactions


The basic glycosylation reaction happens when a molecule with a glycosyl group, like a carbohydrate, attaches to a functional group on another molecule. One of the most important and prevalent forms of glycosylation in nature is called N-linked glycosylation which is a post-translational process that is essential for the proper folding of proteins, matrix attachment between cells and modulating the function of proteins. In this condensation reaction, an oligosaccharide (sugar molecule) known as a glycan binds to the nitrogen atom of a protein molecule and produces a water molecule in the process.


Phosphorylation is a condensation reaction that is important for the functioning of sugars, lipids and proteins and is critical when it comes to regulating the function of enzymes. One of the simplest phosphorylation reactions in nature is the phosphorylation of glucose, which is the first step in glycolysis. When glucose is phosphorylated on the 6th carbon by adenosine triphosphate (ATP) with the help of the enzyme hexokinase, the products are glucose 6-phosphate and adenosine diphosphate (ADP).

Polypeptide and Polynucleotide Synthesis

Amino acids can condense into polypeptide molecules (proteins), releasing water as a byproduct. Also, DNA and RNA are made via polynucleotide synthesis which is also a condensation reaction. Polynucleotides form when the phosphate group on one nucleotide molecule reacts with the hydroxyl group on the carbohydrate group of another nucleotide. During this reaction, a water molecule is formed and released.

The image above shows polypeptide synthesis using the amine group (red) of one amino acid and the carboxyl group (red) of another amino acid. The condensation reaction forms a water molecule (blue).


Nylon is a man-made condensation polymer, meaning several identical molecular units are attached together using a condensation reaction. Nylon 66, one of the most important types of nylon, is made from adipic acid and hexamethylenediamine. The nylon polymer is formed by binding the amine groups on the hexamethylenediamine to the carboxylic acid groups on the adipic acid molecules in an alternating pattern. The byproduct of this condensation reaction is water.

The simplest form of nylon is Nylon 6, which is made from the amino acid 6-aminohexanoic acid. Water is also produced in this condensation reaction. A hydrogen atom from the amine end of the 6-aminohexanoic acid splits off and is joined by the hydroxyl group from the carboxylic acid group at the other end of another molecule of 6-aminohexanoic acid. Then, the nitrogen and carbon atoms at the ends of each of the 6-aminohexanoic acid molecules bond to form the nylon 6 polymer.

The image above shows the chemical structure of Nylon 66 and Nylon 6, two products of condensation reactions that split off water molecules.


Dacron is the trade name for polyethylene terephthalate (PET), a man-made polyester that results from the reaction between terephthalic acid and ethylene glycol. A terephthalic acid molecule has a carboxylic acid group at each end and ethylene glycol has a hydroxyl (alcohol) group at each end. Water is formed when a carboxylic acid group and a hydroxyl group are split off. The two molecules then bind together at the ends via ester linkages.

The image above shows how terephthalic acid and ethylene glycol combine to form polyethylene terephthalate via a condensation reaction that releases water as a byproduct.

Lecture 5: Biochemistry 4

Topics covered: Biochemistry 4

Instructors: Prof. Robert A. Weinberg


I just wanted to spend the first couple minutes clearing up three issues. None is a major conceptual issue, but we like to focus on details and get them right, get them correct here as well.

Firstly, I misdrew a reaction last time that described why RNA is alkali labile, i.e., if we have high pH we call that an alkali pH, or an alkaline pH, actually, to use the adjective. And we said that hydroxyl groups can cause the cleavage of the phosphodiester bonds of RNA but not DNA. And the way I described that happening is that the alkali group causes the formation of this five-membered ring right here, two carbons, two oxygens and a phosphate. And that resolves eventually to this where there's no longer any connection with the ribonucleoside monophosphate below. And I drew it like this, without an oxygen, and that's a no-no because, in fact, in truth, and as many of you picked up, this reverts to a two prime hydroxyl. So, please note there's a mistake there. There's also a couple other mistakes. For example, in the textbook it gives you the impression that when you polymerize nucleic acids you use a monophosphate to do so.

And, if you listened to my lecture last time, that doesn't make any sense, because you need to invest the energy of a triphosphate in order to create enough energy to generate enough energy for the polymerization. The textbook is incorrect there.

Textbooks are written by people, for better or worse, and as such, like everything else, they are mortal and fallible. So, the truth of the matter is, when you're polymerizing DNA or RNA you need one of the four ribonucleoside or deoxyribonucleoside triphosphates in order to donate the energy that makes possible this polymerization.

And please note that is a mistake in the book. Recall, as I said last time, the fact that ATP is really the currency of energy in the cell, and that its energy is stored and coiled up in this pent up spring where the mutual electrostatic repulsion between the three negatively charged phosphates carries with it enormous potential energy.

And some of that potential energy can be realized, during the synthesis or polymerization of nucleic acids, by cleaving this bond here. One can also generate potential energy by cleaving this bond here. This is the alpha, the beta and the gamma-phosphate. And cleavage of either can create substantial energy, which in turn can, as we'll indicate shortly, be invested in other reactions, for instance the reaction of polymerization. A second point I'd like to make to you is the following, and you'd say it's kind of coincidence. The currency of energy in the cell is ATP, adenosine triphosphate, we see its structure here, and this happens to be one of the four precursors of the RNA.

So, the same molecule is used in these two different ostensibly unrelated applications. One, to polymerize to make RNA where genetic information is stored and conveyed.

Or, alternatively it's used here in this context in order to serve as a currency for energy. High energy as ATP. ADP with a little lower energy. AMP monophosphate with even lower energy. And you might ask yourself, scratch your head and say why is the same molecule used for these two different things?

In fact, there are yet other applications of these ribonucleosides which also seem to be unrelated to the storage or the conveyance of genetic information. And it is believed, probably correctly, that the reason why the same molecule is used for these totally different applications is that early in the evolution of life on this planet there really were a rather small number of biological molecules that existed. Indeed, as we'll mention again later, it's probably the case that the first organisms didn't use DNA as genomes. It's an article of faith with us that one stores genetic information in DNA molecules.

And I implied that quite explicitly last time. But, the fact of the matter is, it's probably the case that the first organisms, the first pre-cellular life forms used RNA as the genetic material, RNA to store things, replicating RNA via double-stranded RNA molecules as a way of archiving genetic information. And only later during the evolution of life on this planet, when that later was we can't tell, but it could have been a hundred or two hundred million years later. Obviously, if we're talking about the origin of life as between 3. and 3.5 billion years ago, we can't really localize that in time very well, but only later was DNA assigned the job of storing, in a stable fashion, genetic information. And as a consequence, we come to realize as well yet another discovery, which is that all the catalysts that we're going to talk about today, the enzymes as we call them, almost all modern-day enzymes are proteins. And we talked about them briefly before. But over the last 15 years, 20 years there's been the discovery that certain RNA molecules also possess the ability to catalyze certain kinds of reactions. When I was taking biochemistry, if somebody would have told me that, I would have called the psychiatric ward because that was such an outlandish idea.

How can an RNA molecule catalyze a biochemical reaction?

It doesn't have all the side groups that one needs to create the catalytic sites for reactions. But we now realize, on the basis of research which actually led to a Nobel Prize being awarded about five years ago, that RNA molecules are able to catalyze certain kinds of reactions. And that begins to give us an insight into how life originated on this planet because RNA molecules may have stored genetic information, as I said before, RNA molecules, or their precursors like ATP, may have been their currency for storing high energy bonds, as is indicated here.

And RNA molecules may well have been the first enzymes to catalyze many of the reactions in the most primitive life forms that first existed on this planet. And, therefore, what I'm saying is that as life developed in the first hundred or two hundred million years, who knows how long it took, gradually DNA took over the job of storing information from RNA and gradually proteins took over the job of mediating catalysis, of acting as enzymes, taking over the job from RNA molecules. Today there are certain vestigial biochemical reactions which we believe are relics, echoes of the beginning of life on earth, which are still mediated by RNA catalysts. We think that they are throwbacks to these very early steps, maybe even in pre-cellular life forms where RNA was delegated the task of acting as a catalyst.

We're going to focus a lot today on the whole issue of biochemical reactions and the issue of energy. And this gets us into the realization that there really are two kinds of biochemical reactions.

Some of you may have learned this a long time ago.

Either exergonic reactions that release energy, that produce energy as they proceed, or conversely endergonic reactions which require an investment of energy in order to move forward.

So, here, obviously, if this is a high energy state and we're talking about the free energy of the system, which is one way to depict in thermodynamic language how much energy is in a molecule, if we go from a high energy state to a low energy state then we can draw this like this and we can realize that in order to conserve energy, the energy that was inherent in this molecule, the high potential energy is released as this ball or this molecule rolls down the hill. And, therefore, the reaction yields energy, it's exergonic. And, conversely, if we want this reaction to proceed, we need to invest free energy in order to make it happen. The free energy happens to be, more often than not, in the form of chemical bonds, i.e., energy that can be invested, for example, by taking advantage of the potential energy stored in these phosphodiester, in these phosphate-phosphate linkages indicated right here.

Here, by the way, is the space-filling model of ATP just for your information. That's the way it actually would look in life, and this is the way we actually draw it.

Now, having said that, if we look at the free energy profile of various biochemical changes then we can depict them, once again, in this very schematic way here.

And, by the way, free energy is called G, the Gibbs free energy after Josiah Gibbs who was a thermodynamic wiz in the 19th century at Yale in New Haven. And here what we see is that the change in free energy between the reactants and the products is given by delta G. So, by definition, we start out the reaction with reactants.

And we end up at the end of the reaction with products.

And, overall, if the reaction is exergonic and will proceed forward, it releases energy. And the net release of energy is indicated here by delta G. But, more often than not, biochemical reactions that are energetically favored, that are exergonic actually can't happen spontaneously.

They don't happen spontaneously because, for various reasons, they have to pass through an intermediate state.

Which actually represents a much higher free energy than the initial reactants possess. And this higher free energy, that they need to acquire in order to move over the hill and down into the valley, is called the energy of activation, the activation energy.

And, therefore, if I were to supply these reactants with energy, for instance, let's say I were to heat up these reactants and therefore give them a higher degree of thermal energy which they might be able to use to move up to this high energy state.

I supplied them with free energy by giving them heat.

Then they might be able to move up to here and then roll down the hill.

But in the absence of actually actively intervening and supplying them that energy, they'll remain right here, and they may remain right there for a million years, even though in principle, if they were to reach down here, they would be much happier in terms of reaching a much lower energy state. To state the obvious, all these kinds of reactions wish to reach the lowest energy state possible. But in real-time it can't happen if there is a high energy of activation. Now, what do enzymes do?

As always, I'm glad I asked that question. What they do is they lower the energy of activation. And this is in one sense obvious, and in one sense it's subtle, because enzymes have no effect on the free energy state of the reactants, they have no effect on the free energy of the products. All they do is to lower the hump, and they may lower it very substantially.

And because they lower it substantially, it might be that some of the reactants here may, just through a chance acquisition of thermal energy, be able to move over the much lowered hump and move down into this state right here. Now, the actual difference in the Gibbs free energy is totally unaffected.

All that happens is that the enzyme, by lowering the energy of activation, makes this possible in real-time. The fact is that ultimately, if one were to plot many kinds of reactions, many reactions, as is indicated here, have a very high activation energy, and therefore we look at it like this. But there could be other reactions which might have an activation energy that looks like this, almost nothing at all. And these reactions could happen spontaneously at room temperature in the absence of any intervention by an enzyme. For example, let's say we're talking about a carboxyl group which discharges a proton. We've talked about that already. Well, that reaction happens spontaneously at room temperature. It doesn't need an enzyme to make it happen. It can happen because there's essentially no energy of activation. But the great majority of biochemical reactions do have such an activation energy, and therefore do require a lowering like this in order to take place.
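To get a feel for how much a lowered hump matters, here is a minimal sketch using the standard Arrhenius relation, k = A·exp(-Ea/RT). The lecture does not name this equation, so its use here is an assumption, as are the activation energies in the example and the simplification that the enzyme leaves the pre-exponential factor A unchanged.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def rate_enhancement(ea_uncatalyzed, ea_catalyzed, temperature=310.0):
    """Ratio of catalyzed to uncatalyzed rate constants (Arrhenius, same prefactor A).

    Activation energies are in J/mol; temperature is in kelvin
    (310 K is roughly body temperature).
    """
    return math.exp((ea_uncatalyzed - ea_catalyzed) / (R * temperature))


# Hypothetical numbers: the enzyme lowers the barrier from 100 kJ/mol to 60 kJ/mol.
factor = rate_enhancement(100e3, 60e3)
print(f"rate increases by a factor of about {factor:.2e}")
```

Note that the free energies of the reactants and products appear nowhere in the calculation. Only the height of the barrier enters, which is why the enzyme changes the rate and nothing else.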

Now, let's imagine other versions of the energy profile of a reaction.

And keep in mind that what I'm showing here on the abscissa is just the course of the reaction. You could imagine I'm not really plotting time. I'm just talking about a situation where to the left the reaction hasn't happened and to the right it has happened. Can you see this over there? Then I won't write over there. All right. Let's see if this works.

Boy, here we are in the 21st century and we still haven't worked this out.

OK. Everybody can see this right here, right? OK.

So, look. Let's imagine we have a reaction that looks like this, a reaction profile that looks like this, where these two energies are actually equivalent. OK? I've tried to draw them on.

Well, they're not exactly, but they're pretty much on exactly the same level. And let's say we start out with a large number of molecules right over here. Now, if there were an enzyme around, the enzyme might lower the activation energy and, in so doing, make it possible for molecules to tunnel through this hill and move over here. The fact that when a molecule gets over here it has the same free energy as over there means that the catalyst may, in principle, also facilitate a back reaction.

What do I mean by a back reaction? I mean going in exactly the opposite direction. And so, once molecules over here are formed, the energy-lowering effects of the enzyme may allow them to move in both directions. And, therefore, what we will have is ultimately the establishment of an equilibrium.

If these two energy states are equivalent then, I will tell you, 50% of the molecules end up here and 50% of the molecules end up here. And here we're beginning now to wrestle between two different independent concepts, the rate of the reaction and the equilibrium state of the reaction.

Note that the enzyme has no effect whatsoever on the equilibrium state.

These two are at equal free energies, the equilibrium state.

Whether the energy barrier is this high or whether it's this high is irrelevant. The fact is if the enzyme makes possible this motion back and forth, the ultimate equilibrium state will be 50% of the molecules here and 50% of the molecules there.

And, therefore, the enzyme really only affects the rate at which the reaction takes place. Will it happen in a microsecond or will it happen in a day or will it happen in a million years?

The enzyme has no effect whatsoever on the ultimate end product, which in this case is the equilibrium.
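The distinction between rate and equilibrium can be sketched with a toy model. This assumes simple first-order kinetics for a reversible reaction A ⇌ B starting from pure A, which is a model the lecture does not spell out; the rate constants are hypothetical, and the enzyme is represented as multiplying the forward and back rate constants by the same factor.

```python
import math


def fraction_b(t, k_forward, k_reverse):
    """Fraction of molecules in state B at time t, starting from 100% A.

    Closed-form solution of dB/dt = k_forward * A - k_reverse * B with A + B = 1.
    """
    k_total = k_forward + k_reverse
    return (k_forward / k_total) * (1.0 - math.exp(-k_total * t))


slow = (1e-6, 1e-6)  # hypothetical uncatalyzed rate constants, per second
fast = (1e3, 1e3)    # same reaction with both directions sped up by the enzyme

for label, (kf, kr) in (("uncatalyzed", slow), ("catalyzed", fast)):
    print(label, "after 1 s:", fraction_b(1.0, kf, kr))
    print(label, "after 1e9 s:", fraction_b(1e9, kf, kr))
```

With equal forward and back rate constants, both cases end at half A and half B. The catalyzed case gets there within a second, while the uncatalyzed case has barely moved after one second.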

Of course, there is a simple mathematical formalism which relates the difference in free energies with the equilibrium.

Here we might have a situation where 80% of the molecules end up at equilibrium over here and 20% end up here. Or, we might end up as a state where 99. % of the molecules end up here and 0. % of the molecules end up here. But that ultimate equilibrium state is in no way influenced by the enzyme. They just make it happen in real-time. And, therefore, to repeat and echo a point I made last time, if most biochemical reactions are to occur in real-time, i.e., in the order of seconds or minutes, an enzyme has to be around to make sure they happen.
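The formalism relating the free energy difference to the equilibrium is presumably the standard thermodynamic relation ΔG° = -RT ln K, where K is the equilibrium ratio of products to reactants. The lecture does not write it out, so treat that identification as an assumption. A minimal sketch for a simple A ⇌ B reaction, with a hypothetical function name and an assumed temperature of 298.15 K:

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def equilibrium_fractions(delta_g, temperature=298.15):
    """Return (reactant_fraction, product_fraction) at equilibrium for A <-> B.

    delta_g is the standard free energy change in J/mol (negative = exergonic).
    Uses K = exp(-delta_g / (R * T)) and product_fraction = K / (1 + K).
    """
    k_eq = math.exp(-delta_g / (R * temperature))
    product_fraction = k_eq / (1.0 + k_eq)
    return 1.0 - product_fraction, product_fraction


# Equal free energies give an even split.
print(equilibrium_fractions(0.0))

# An 80/20 split corresponds to K = 4, i.e. delta_g = -R * T * ln(4).
delta_g_80_20 = -R * 298.15 * math.log(4)
print(delta_g_80_20, equilibrium_fractions(delta_g_80_20))
```

No activation energy appears in the calculation, consistent with the point that the enzyme does not influence where the equilibrium lies.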

In the absence of such an enzyme of its intermediation, it just won't happen in real-time. Even though, in principle, it's energetically favored. So, let's just keep that very much in mind in the course of discussions that happen. And let's just begin now to look at an important energy-generating reaction in the cell which is called glycolysis. We already know the prefix glyco.

Glyco refers to sugar. And lysis, L-Y-S-I-S refers to the breakdown of a certain compound. I am not going to ask you, nor is anyone else in the room going to ask you to memorize this sequence of reactions. But I'd like you to look at it and see what take-home lessons we can distill out of that, what wisdom we can learn from looking at such a complex series of reactions. Perhaps, the first thing we can learn is that when we think about biochemical reactions, we don't think of them as happening in isolation. Here I'm talking about, for example, in this case I could be talking about A plus B going to C plus D, and there might be a back reaction to reach equilibrium.

And we're just isolating that simple reaction from all others around it.

But in the real world in living cells most reactions are parts of very long pathways where each of these steps here indicates one of the others, a step in the pathway. What we're interested in here is how glucose, which I advertised two lectures ago as being an important energy source, is actually broken down.

How does the cell harvest the energy, which is inherent in glucose, in order to generate, among other things, ATP, which we've said repeatedly is the energy currency? ATP is used by hundreds of different biochemical reactions in order to make them happen.

These other biochemical reactions are endergonic, they require the investment of energy, and almost invariably, but not invariably, but almost invariably the cell will grab hold of an ATP molecule, break it down usually to AMP or ADP.

And then utilize the energy, which derives from breaking down ATP, it will invest that energy in an endergonic reaction, which otherwise would not happen. So, here we reach the idea that perhaps by investing energy in a reaction, the equilibrium is shifted. Because by investing energy, actually, the cell is able to lower the free energy difference between these two.

And that makes it possible for their equilibrium to be much more favored.

Let's look at this glycolytic pathway. Glycolytic refers, obviously, to glycolysis. And here we start out with glucose.

We're drawing it out flat rather than the circular structure we talked about last time. And let's look at what happens here, again, not because anyone wants you to memorize this, but because some of the details are in themselves very illustrative.

The goal of this exercise is to create ATP for the cell, but the first step in the reaction is actually totally counterproductive. Look at the first thing that happens. The first thing that happens is that the cell invests an ATP molecule to make glucose-6-phosphate.

I've advertised the goal of this is to generate ATP from ADP, adenosine diphosphate. But the first thing here, this is an endergonic reaction in which the cell invests energy to create this molecule here. So, this doesn't make sense.

But ostensibly it must make sense, at one level or another, because you and I, we're all here, and everybody in this room, at least this moment is metabolically active.

All right. So, we've got this molecule here, glucose-6-phosphate. And this can isomerize.

You see, here's glucose-6-phosphate, fructose-6-phosphate.

And, the fact of the matter is, there's no oxidation reduction reaction here. It's just an isomerization.

And this molecule and this molecule are virtually in the same free energy state. It happens to be the case that their profile will look very much like the one I drew you before. Their energy profile will look like this. And one needs an enzyme to lower it, but there's no energy that needs to be invested in converting one to the other because they're very similar molecules and therefore in comparable free energy states. Now look at the next step.

The next step is again another ostensibly totally counterproductive way of generating energy. Because, once again, ATP, the gamma-phosphate, its energy is invested in creating a diphosphorylated hexose, fructose 1,6-diphosphate where the numbers refer obviously to the identities of the carbon.

And now we have a diphosphorylated fructose molecule.

And so here you can actually see what the three-dimensional, what we would imagine closer to what the three-dimensional structures of these molecules look like. And we shouldn't focus this time on whether it's this or this. For all practical purposes, let's just focus on this pathway here. And here, for the first time, what now happens is that this hexose is broken down into two trioses, i.e., into two three carbon sugars.

And this is a slightly exergonic reaction.

It yields, it happens without the investment of energy.

And there's an enzyme, once again, that's required in order to catalyze it. But let's be really clear now.

Now we have to follow the fate of two molecules.

The first triose and the second triose. They have different names, but we're not going to focus on the names. One thing you notice about these trioses is that they're readily interconvertible.

Once again, we can imagine that we have a situation that looks like this. These are flipping back and forth.

And therefore, for all practical purposes from our point of view, these two are equivalent because they can be exchanged virtually instantaneously one with the other. Now, so far we've actually expended energy. We haven't harvested energy. But, keep in mind, the old economic dictum you have to invest money to make money.

And that's what's going on here. The first thing that happens is we have an oxidation reaction. What's an oxidation reaction?

We want to strip some electrons, a pair of electrons off of this particular triose, the 3 carbon sugar.

And by stripping off a pair of electrons we donate the electrons to NAD+, converting it to NADH. And here these structures are given in your book. But, it turns out, the electrons are pulled away from the triose and they're used to reduce NAD+ to NADH.

Keep in mind that in an oxidation reaction, one molecule that's being oxidized is deprived, is denied a pair of electrons.

The other molecule that's being reduced, in this case NAD, acquires a pair of electrons. And you can focus, if you want, on the charge of these molecules, one or the other. But, keep in mind, that in these oxidation reduction reactions, whether it's plus charged or minus charged is irrelevant. The real name of the game is the electrons. Forget about the protons, whether it has a plus charge or it's neutral. The real name of the game here is that two electrons are being used to reduce this molecule to this.

By the way, third mistake I forgot to tell you before, there's a double-bond in one of the pyrimidines in the book that doesn't make any sense. Whoever finds it gets a prize, but no one's figured out what the prize is yet. So, this double bond gets reduced. You see the difference between this and this over here. And this NADH, it turns out, is a high energy molecule. The street value of NADH is three ATPs, i.e., in the mitochondria NADH can be used to generate three ATPs, and that's worth something. So, NADH on its own is a high energy molecule. It can't be used for that many things, but it can be pulled into the mitochondria where it's converted to three ATPs.

So, we say, well, we're starting to make some money out of this investment because we've made, in fact, these NADHs.

See right here. Why do we say two NADHs?

Because there are two trioses we're working with, and each one of the trioses gives you an NADH. So, everything that's going on after this, starting from the top here, is now double because we're looking at the parallel behaviors of two identical three carbon sugars.

So, here we've so far generated, in principle, six ATPs.

How much did we invest already up to this point? Two.

We invested two but we harvested six. Already we're starting to make a little money because I told you the street value of an NADH is three ATPs on the black market. OK, so what happens next?

Next is another good thing. Each of the trioses, one can actually cause each of the trioses to generate an ATP molecule from an ADP. What happens here?

It turns out that this phosphate over here is actually in a pretty high energy state, in no small part because of electron negative-negative repulsion. And stripping this high energy phosphate off of this molecule here, whose name we will ignore, allows us to phosphorylate an ADP.

And since there are two trioses being converted, we're going to get two ATPs. So, in effect, now we're actually ahead. We started out investing two, we got six back from the NADHs, and we're getting two back here.

So, we've made two ATPs. This is a good thing. Keep in mind, ADP is lower energy, ATP is a high energy. Once again, we have an isomerization where these two molecules are at comparable states here and here, where the phosphate just jumps over to this state. And this hydrolyzes spontaneously and we get this molecule right over here, phosphoenolpyruvate at the end.

And, once again, we harvest two ATPs, one ATP from each of the trioses. And we end up, at the end of this reaction, with pyruvate. And you'll say this is terrific because we invested two ATPs, we harvested four, plus we got six from the NADHs, right? Two NADHs, each NADH gives us three each, so let's do the arithmetic. Let's do the balance sheet. We invested to begin with, with the one glucose, we invested two ATPs. That was early on. Then the return was first two NADHs, which I've told you equals six ATPs. Because an NADH is worth three ATPs.

This is so far good. And now subsequently we've made four ATPs so that the net yield looks pretty useful. Six plus four is ten minus two, a profit of eight ATPs from one glucose molecule.

This is terrific you may say, but there's a rub.

There's a catch. If glycolysis is occurring in the absence of oxygen, if that happens, then we have a problem here, because the only way that these NADHs can generate ATP is if there is oxygen around to take these electron pairs and use them to reduce an oxygen molecule. That is, by the way, part of the reason we breathe. Keep in mind that when you generate an NADH from an NAD molecule, you need to regenerate the NAD.

You can't just accumulate more and more NADHs. You need to regenerate the NAD. And, therefore, these NADHs, with their electron pairs, the electron pairs somehow have to be disposed of. You have to regenerate NAD. You can't just make more and more and more of this. So, how do cells get rid of it?

Well, how they get rid of it is simple.

You take the electron pairs and you slap them onto oxygen, and that's really called combustion. And you get a lot of energy out of that. But what happens if all of this is occurring anaerobically?

Anaerobically means the reaction is occurring in the absence of oxygen.

Well, if you have a yeast that's growing 14 feet underground, this is happening anaerobically. If you have a yeast that's fermenting in a big keg to make wine or beer, it's also probably happening anaerobically. If you start running in a 100 yard sprint, or let's say you had to run a mile, then initially there's enough oxygen, there's a lot of oxygen around to allow you to get rid of these NADHs and dump the electrons that they have acquired onto the oxygen molecule. And that's fine.

That's worth a lot because, in effect, what you're doing is you're taking oxygen and hydrogen and you're combusting them together.

And that's great. But as you start running down the street, soon the oxygen supply to your muscles is going to run out, and soon a lot of the energy production in your muscles happens anaerobically. Why? Because you can't get oxygen quickly enough to your muscles, and therefore, for a period of time, you start feeling that burning sensation in your muscles because oxidation of NADH isn't happening. And these NADHs instead are regenerated by another way. How are they regenerated? The electron pairs of the NADHs, must be, are dumped back onto this molecule right here, pyruvate. They're not used to make ATP because they can't be used to make ATP because there's no oxygen around to accept the electron pairs that these NADHs have acquired.

And so, what happens with these valuable NADHs?

Under anaerobic conditions this doesn't happen.

These NADHs are used instead, their electrons are donated to our friend pyruvate here, this three carbon sugar.

And what happens, when they are donated back to the pyruvate, in order to regenerate NAD you need more NAD to pick up to use later in the reaction, to use over again in another reaction.

When you donate the electrons from NADH back onto pyruvate, what happens? You get lactic acid. Lactic acid is what makes your muscles burn when you're running very quickly and you can't get enough oxygen into them to begin to burn up the NADH.

So, instead of using NADH to generate ATP, it's diverted to make lactic acid. That's in one sense good because you regenerate NAD.

Why do you need to regenerate NAD? Because you need a lot of NAD around for the earlier steps in the reaction. Keep in mind, early in the reaction you need NAD here. If you don't regenerate it then glycolysis grinds to a halt. So, even though you make NADH and it's a good thing in principle, in practice it has to be recycled.

And if it's not recycled to make more new NAD to allow this step to happen then the whole glycolytic reaction will shut down and you're in a mess. However, sadly, in the absence of oxygen, the only way to recycle this is to dump these electrons not onto oxygen, which is energy rich, but to dump them back onto pyruvic acid, creating lactic acid.

So, you reduce this bond right here. So, you get CH, COH. This bond right here is reduced and you get lactic acid.

So, instead of a carbonyl bond here you have CH and COH right here, that's a reduction reaction. And now you're able to regenerate the NAD. And now you say that's a great thing. But, keep in mind, that now the entire glycolytic reaction, how much is our net profit now? Before I was gloating about the fact that we made eight ATPs, we netted eight ATPs out of this. What are we back down to now?

What's the whole net yield now? Well, the TAs can't answer.

It's two, because we invested two and we got out four.
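The two balance sheets worked through so far can be written out as a small sketch. It uses only the bookkeeping figures quoted in the lecture (two ATPs invested, four harvested directly, two NADHs at a "street value" of three ATPs each). The function name and the `oxygen_available` flag are made up for illustration.

```python
# Lecture bookkeeping value: one NADH is "worth" three ATPs when oxygen is present.
ATP_PER_NADH = 3


def glycolysis_net_atp(oxygen_available: bool) -> int:
    """Net ATP per glucose from glycolysis, following the lecture's balance sheet."""
    invested = 2    # ATPs spent early on to prime one glucose
    direct_atp = 4  # two ATP-producing steps, once for each of the two trioses
    nadh = 2        # one NADH from each triose
    # Without oxygen, the NADH electrons are dumped onto pyruvate (lactic acid),
    # so the NADHs cannot be cashed in for ATP.
    nadh_atp = nadh * ATP_PER_NADH if oxygen_available else 0
    return direct_atp + nadh_atp - invested


print(glycolysis_net_atp(oxygen_available=True))   # 8
print(glycolysis_net_atp(oxygen_available=False))  # 2
```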

And it's only two. Now, why is this so interesting?

Well, until about six hundred million years ago there wasn't that much oxygen in the atmosphere. And in the absence of oxygen this is almost the only reaction that could be used in order to generate energy. And about six hundred million years ago more and more oxygen from photosynthesis became dumped into the atmosphere.

And soon oxygen became available to organisms like our ancestors.

And then they could actually begin to recycle this NADH in a much more productive way. And as a consequence what happened, instead of having glycolysis yielding two, we could go up to this theoretical eight because the NADHs could now deposit their electrons on oxygen, which is much more profitable.

In fact, I've just told you now that in the absence of oxygen you can only make two ATPs. I will tell you, without proving it to you, that in the presence of oxygen you can make 34 ATPs.

And 34 is, we can agree, much better than two in the presence of oxygen. Higher life forms could not evolve until this much more effective way of generating energy became available. And, therefore, if our ancestors who lived longer than six hundred million years ago were very sluggish and they weren't very smart, the reason why they were sluggish and they weren't very smart is because they couldn't generate the energy that was required to efficiently drive metabolism.

The metabolism, anaerobic metabolism, i.e., occurring in the absence of oxygen, is extremely inefficient.

It just doesn't happen very well. Now, what actually happens if we have oxygen around? Well, what happens is something like this. We take the pyruvate, which is the product of glycolysis and which is this much more primitive pathway, and we dump it into the mitochondria. And now we generate through this cycle here, which I'm not asking you memorize, please, don't do that. We generate the reactions which go from here and get us up to this 34 ATP yield per glucose. And the essence of the citric acid cycle, which happens in the mitochondria, keep in mind that the mitochondria look like this.

Keep in mind that the mitochondria are the descendants of bacteria which parasitized the cytoplasm of cells probably 1.5 billion years ago.

But if we now look at what happens in the mitochondrion, the pyruvate that we generated in the cytosol, in the soluble part of the cytoplasm is now pumped into the mitochondria, and there's a whole series of reactions that go on here, which takes this three-carbon sugar. The first thing that happens is that carbon is boiled off. Carbon dioxide, that's released.

Now we're down to a two carbon sugar. And then this two carbon sugar is added to a four carbon sugar and progressively oxidized.

And as it's oxidized what's spun off? Well, what's spun off is, for example, there's NADH which is spun off, there's ATP.

See, there's an NADH which is spun off. Here's an NADH that's spun off.

Here is a cousin of NADH. It's called FADH which, once again, generates a high energy molecule. Once again, the carbon molecules are oxidized, electrons are stripped away and used to create these high energy molecules, FADH and NADH.

By the way, FADH, a cousin of NADH, is only worth two ATPs on the open market. Whereas, NADH, as I've told you repeatedly, is worth three. And by the time we add up all of the NADHs that have been generated by this cycling and the carbon dioxides that are released, at the end of this cycle here we start with two carbons, add it to four and we get a six carbon molecule.

We spew off some carbon dioxides here and go back to four carbon sugar. Add another two, go up to six carbons. Go around again, spin around the wheel. And each time we do that we generate a lot of NADHs, we generate a lot of FADHs, and we generate a lot of ATP. In all cases, these are highly profitable reactions simply because the NADHs and the FADHs can be used in the mitochondrion to generate ATP. So, let's look at the energy profile of the entire thing. Put it all together. This is where we started out at the beginning, and this is the end of glycolysis, OK? So, now we're adding up the energy profiles of the whole sequence of reactions that constituted glycolysis, which begins up here and ends right here because pyruvate, as you will recall, is the product of glycolysis, the first step. The Krebs Cycle happens, or sometimes it's called the Citric Acid Cycle. So, let's just get these words straight. Citric Acid Cycle because it happens to be one of the cycles, or it's sometimes called the Krebs Cycle after the person who really discovered it, Krebs.

The Krebs Cycle begins here. You see how the shading changes from pyruvate. And here we go all the way down there. And let's now look at what happens in terms of energy exchange.

Recall that early on we needed to invest ATPs to kick up the energy state up to here. We invested ATPs at this stage right here, and then we began to get some back.

We got these two NADHs, one NADH coming from each of the three carbon sugars. We got some more ATPs here and we got some more ATPs here, but these NADHs could not be used productively for generating ATP in the absence of oxygen, but in the presence of oxygen now we can begin to use these very profitably. Each of these makes three ATPs and each of these, obviously, makes ATPs. And then let's look at what happens in the mitochondrion. Keep in mind here's the borderline between the cytosol, the cytoplasm and the mitochondrion.

Here is where the oxygen is actually used and here we generate all these NADHs here, here and here, FADHs. And I keep saying, and it's still true, just in spite of the fact I keep saying it, that these NADHs can be converted to ATPs, and the ATPs can then be diffused, transmitted throughout the entire cell where they're then invested in endergonic reactions.

Here we see all these NADHs. And look at the overall change in free energy. The initial steps in glycolysis didn't really take advantage. Glucose has inherent in it almost 680 kilocalories per mole of energy. It's pretty high up here. But by the time we get from here down to here, there's an enormous release of energy, it's harvested in the form of these molecules which are then reinvested.

In the absence of oxygen, this entire procedure can only go from here down to here. And a lot of this drop from six to seven is futile because we have to reinvest this NADH.

These cannot be used, actually, to generate more ATPs, as I've said repeatedly. So, this means in the end that we can generate an enormous amount of energy in the form of these coupled reactions. Having said that, let's actually look at what happens inside of the mitochondria.

Inside of the mitochondria there are actually different physical compartments. See the blue space there, the intermembrane space, the blue spaces there? The matrix is on the inside.

The intermembrane space is between the two, the inner and the outer membrane, and outside is the cytoplasm. The outer membrane, the inner membrane, in between it. So, look what happens, actually, in the mitochondrion. Those NADHs are used to pump protons from the inner space of the mitochondrion into the intermembrane space. I'm not showing you that happening.

But you'll have to take it on my word. So, protons pictured here are extracted from NADH and FADH, and they're used to pump protons out here. And, therefore, protons are moved from here to here.

Obviously, when you pump protons out the pH gets lower on the outside than it does on the inside, and because there's a gradient, there's a higher concentration of protons here than on the inside.

The protons begin to accumulate outside here in the intermembrane space. Are they in the cytoplasm? No. They're in the space between the inner and the outer membrane. You start to accumulate in this blue space lots of protons. And this pumping of protons into the space between the two membranes requires energy, and the energy comes from our friends NADH and FADH as it turns out. They are responsible for causing this accumulation of protons in the space between the inner and the outer membrane. So, now we get lots of protons out there. And what happens now, the protons like to flow back in because there is a higher concentration here than there is inside the space that's called the mitochondrial matrix, on the inside of the mitochondrion. So, what happens?

Here, yet another Nobel Prize winning discovery is the discovery of a very interesting molecule, or complex of proteins I should say, that looks in three-dimensions roughly like this.

And what this complex does is as the protons flow through the inner channel here, they're moving down an energy gradient.

They're going from a state of high concentration to a state of low concentration. What that does, that diffusional pressure actually yields energy.

And this complex right here harvests that energy in order to convert ADP into ATP. So, when I talk about NADH as being worth, each of them being worth three ATPs, what I'm really talking about is the fact that NADHs can be used to pump protons in the mitochondria outside here, and these protons can then be used, can then be pumped, can then flow in this way through this proton pump, which then uses ADP in the inner cavity of the mitochondria to create ATP. And here we get finally the conversion of ADP into ATP. We can realize, finally, this much promised benefit. And then these ATP molecules are exported from the mitochondria throughout the entire cell and used to drive many reactions. We've already encountered one important set of reactions, and those reactions are the polymerization of nucleic acids. Now, one final point I want to make is the following. We've just talked about metabolic, we've talked about the pathway of energy production in the cell.

And you might have had the illusion, for a brief instant, that those are all, that's the sum of all the biochemical reactions in the cell. But, in fact, if we plot out all the biochemical reactions in the cell, they're much more complicated. Here is the glycolytic pathway. You see it right down here where nothing is named? Here is the Krebs Cycle right here.

And we're not even talking about energy here. And as molecules move down this pathway from here to here to here to here, some of these molecules are diverted for other applications.

Not for energy production but for other applications.

And what happens out here, they are converted through a series of complex biochemical steps into other essential biological molecules. What do I mean by that?

If you give E. coli, a bacterium, you give it a simple carbon source like glucose and you give it phosphate and you give it a simple nitrogen source like ammonium acetate or something, E. coli can, from those simple atoms generate all the amino acids, can generate the purines and the pyrimidines, can generate all kinds of different complex biological molecules just from those simple building blocks. And so, the process of biosynthesis involves not only the creation of macromolecules, these steps of what are called intermediary metabolism are used to synthesize all the other biochemical entities that one needs to make a cell. They're used to synthesize purines and pyrimidines.

They're used to synthesize lipids, they're used to synthesize amino acids, and they're used to synthesize literally hundreds of other compounds. And when we see this chart like this, and nobody on the face of the planet has ever memorized this chart, each one of these steps, going from one molecule to the next, represents another biochemical reaction. And the vast majority of these biochemical reactions going from A to B to C to D.

Each one of these steps requires the intervention of an enzyme, a catalyst that is specialized for that particular step.

So, this begins to give you an appreciation of how many distinct biochemical steps one needs in a cell. The numbers probably to make a simple cell, you probably need about a thousand distinct biochemical reactions, each one of which requires the involvement of an enzyme. And many of these steps, importantly, many of these biochemical steps are endergonic reactions. Where do they get the energy for driving these reactions forward if they're endergonic? ATP. So, the ATP from the energy generating furnace down here is then spread throughout the cell to power all of these energy consuming reactions. Have a great weekend.


Department of Biophysics and Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA

William Peeples & Michael K. Rosen


M.K.R. and W.P. conceived the study and designed the research program. W.P. performed all experiments. M.K.R. secured funding and supervised the work. W.P. and M.K.R. wrote the manuscript.


Several subsequent steps are involved in the creation of initial models from KEGG pathways. All of these steps are described in detail in the following sections and depicted as a flowchart in Figure 1.

Generation of systems biology models from KEGG pathways. The flowchart shows all major steps involved in the creation of initial systems biology models from KEGG pathways. The whole method requires two sources: a KGML-formatted KEGG pathway and access to other KEGG databases, e.g., via the KEGG API. The preprocessing steps, depicted on the top, involve mainly the removal of inappropriate nodes and processing of reactions. An important step is the removal of duplicate entries. However, some further steps require information about these duplicates (e.g., when using the layout extension package for SBML) and thus, it is not always part of the preprocessing and may be performed at a later stage. Depending on the desired output format, separate processing steps are executed that involve appropriate conversion and annotation of the initial model.

The KEGG Markup Language (KGML)

KEGG uses the KGML format to encode its pathways [16]. For each pathway, a generic reference pathway exists that is derived for a plethora of different organisms. All nodes in those pathways mainly correspond to proteins, small molecules, other referenced pathways or complexes and are encoded as entries in KGML. These entries have a type attribute that further specifies their nature. Additionally, they may have a graphics attribute that is essential for pathway visualizations. Entries corresponding to groups contain components that refer to their contained entries.

Besides entries, KGML specifies reactions, which contain substrates and products that are essentially references to the corresponding entries. The only additional information that is given for reactions is a type attribute, which is either ‘reversible’ or ‘irreversible’. Moreover, KEGG specifies relations, which are primarily important for the visualization of signaling pathways. Relations contain network connections between two entries, such as “A phosphorylates B” or “A inhibits B”, but they do not provide sufficient information for conversions to complete biochemical reactions.
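As a minimal sketch of the structure just described, the following reads entries and reactions from a KGML-like fragment with Python's standard XML parser. The fragment is heavily trimmed, and every identifier in it (`org:geneA`, `cpd:X`, `rn:RA`, and so on) is a placeholder rather than a real KEGG record. The function name `summarize_kgml` is likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical KGML fragment; identifiers are placeholders.
KGML_SNIPPET = """
<pathway name="path:org00000" org="org" number="00000">
  <entry id="1" name="org:geneA" type="gene" reaction="rn:RA">
    <graphics name="geneA" type="rectangle"/>
  </entry>
  <entry id="2" name="cpd:X" type="compound"/>
  <entry id="3" name="cpd:Y" type="compound"/>
  <entry id="4" name="path:org00001" type="map"/>
  <reaction id="1" name="rn:RA" type="reversible">
    <substrate id="2" name="cpd:X"/>
    <product id="3" name="cpd:Y"/>
  </reaction>
</pathway>
"""


def summarize_kgml(xml_text):
    """Return ({entry_id: (name, type)}, [reaction dicts]) for a KGML string."""
    root = ET.fromstring(xml_text.strip())
    entries = {e.get("id"): (e.get("name"), e.get("type"))
               for e in root.findall("entry")}
    reactions = []
    for r in root.findall("reaction"):
        reactions.append({
            "name": r.get("name"),
            "type": r.get("type"),  # 'reversible' or 'irreversible'
            # Substrates and products are references to entry ids.
            "substrates": [s.get("id") for s in r.findall("substrate")],
            "products": [p.get("id") for p in r.findall("product")],
        })
    return entries, reactions


entries, reactions = summarize_kgml(KGML_SNIPPET)
print(entries["2"], reactions[0]["type"])
```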

Preprocessing and correcting issues in the input KGML

Prior to converting the KEGG pathways to other modeling languages, several issues need to be corrected in preprocessing steps directly on the input KGML. These include operations that involve adding or removing entries from the KGML document, as well as processing contained reactions. The actual conversion to models is independent of those steps and is performed after the preprocessing. To generate reliable models, one might want to remove links to other pathway maps from the document. These referenced pathway maps are not physical instances and thus need to be ignored for some model simulation software. However, they might be required for cross-linking pathways. Furthermore, orphans (i.e., entries that are not present in reactions or relations) might be useless for some modeling approaches and therefore may also be removed.

An important step towards building metabolic models is obtaining correct biochemical reactions. The reactions specified in the KGML require significant preprocessing in order to reliably translate them to SBML or BioPAX. KGML pathways often contain single XML reaction objects that point to multiple different biochemical reactions in the KEGG REACTION database. These bundled reactions must be disassembled into separate reaction objects in the XML document in order to obtain a model with balanced and correct biochemical reactions. Since the information provided in the KGML is limited, the KEGG API needs to be queried for further correction steps. From the KEGG API, information about the reversibility of the reaction is retrieved, as well as the reaction equation, including all substrates, products, catalysts, and stoichiometric information. The reversibility is directly annotated on the reaction, whereas the stoichiometric information has to be stored in separate classes, which are later translated to the desired output format. The equation is used to check for missing reaction participants.

However, simply comparing all KEGG identifiers that are present in the KGML to the reaction equation is not adequate. KEGG consists of many separate databases that contain information about compounds, drugs, glycans, etc. Therefore, one compound might have multiple KEGG identifiers, e.g., one in KEGG COMPOUND and another one in KEGG DRUG. The reaction equations specify just one identifier for each participant, which may be any of the available identifiers for that object. Therefore, more queries to the KEGG API are necessary in order to fetch all synonyms for all identifiers. Once all synonyms are known, it is possible to compare all reactants to the pathway components, check for missing reaction participants and eventually add those to the KGML. A similar method is required to check for missing enzymes (i.e., reaction modifiers): we use Enzyme Commission numbers (EC numbers) to check for missing enzymes.
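The offline part of this preprocessing (dropping pathway-map links, unbundling reactions, removing orphans) could be sketched as below. This is an illustration under simplifying assumptions, not the authors' implementation: entries and reactions are plain dictionaries, the `enzymes` key and all identifiers are hypothetical, and the KEGG API queries described above are omitted.

```python
def preprocess(entries, reactions, relations=()):
    """Sketch of KGML preprocessing on plain Python data.

    entries:   {entry_id: {"name": str, "type": str}}
    reactions: list of dicts with "name" (space-separated reaction ids),
               "substrates", "products" and optional "enzymes" (entry ids)
    relations: iterable of (entry1_id, entry2_id) pairs
    """
    # 1. Drop links to other pathway maps; they are not physical instances.
    kept = {i: e for i, e in entries.items() if e["type"] != "map"}

    # 2. Unbundle: one reaction object per referenced reaction id. Each copy
    #    would still need its own equation and stoichiometry from the KEGG API.
    unbundled = [{**r, "name": rid} for r in reactions for rid in r["name"].split()]

    # 3. Remove orphans, i.e. entries used in no reaction and no relation.
    used = set()
    for r in unbundled:
        used.update(r["substrates"])
        used.update(r["products"])
        used.update(r.get("enzymes", []))
    for a, b in relations:
        used.update((a, b))
    kept = {i: e for i, e in kept.items() if i in used}
    return kept, unbundled


entries = {
    "1": {"name": "org:geneA", "type": "gene"},
    "2": {"name": "cpd:X", "type": "compound"},
    "3": {"name": "cpd:Y", "type": "compound"},
    "4": {"name": "path:org00001", "type": "map"},
    "5": {"name": "cpd:Z", "type": "compound"},  # orphan
}
reactions = [{"name": "rn:RA rn:RB", "substrates": ["2"],
              "products": ["3"], "enzymes": ["1"]}]
kept, unbundled = preprocess(entries, reactions)
print(sorted(kept), [r["name"] for r in unbundled])
```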

One last important preprocessing step might be performed before converting the pathways to models. The KEGG database uses information about orthology to provide pathway maps for different organisms. Enzymes catalyzing reactions are annotated using EC numbers, which are independent of actual organisms. In some cases, this leads to annotated enzymes or entries in the KGML for which no physical instance in the current organism of interest is known. In other words, the entry probably does not exist in the current organism or its existence has not yet been proven. To visualize this information, KEGG changes the background color of those orthologous nodes to white. These nodes should also be removed in order to obtain organism-specific models.

Atom balance of reactions

After the described preprocessing step, the KGML document contains unbundled and complete reactions, for which the equation and stoichiometry have been annotated. Using the KEGG API, the chemical formula of each compound participating in a reaction can be fetched. By using this information together with the stoichiometry, it is possible to count and compare all atoms on the substrate and product side. There are some further properties that need to be considered: a generic ‘R’ is sometimes used on the substrate and product side to indicate any substituent, and variables like n and n+1 are used by KEGG to create more generic reactions. During our tests, we detected some simple cases in which an H+ or P+ was missing, but also some other cases in which multiple atoms (e.g., 2 C, 3 H and 1 P) were missing. Automatically correcting those issues is not recommended, because the real missing components are unknown. For example, if a P+ is missing on the substrate side, larger compounds could be missing on any side of the reaction. The possibilities of missing components on both sides include ATP → ADP, NADPH → NADH, and many others. Therefore, our implementation appends the result of each atom check as a comment on every reaction, and researchers might have to manually correct reactions with missing atoms.
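The atom check itself can be illustrated with a short sketch, assuming simple formulas without brackets, generic ‘R’ groups, or n-style variables. It is not the authors' implementation, and the function names are made up for illustration. The balanced example is the methane combustion equation, CH4 + 2O2 → CO2 + 2H2O.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets, R, or n)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_atoms(side):
    """Sum atoms over one side of a reaction given as [(coefficient, formula), ...]."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def atom_balance(substrates, products):
    """Return {element: products minus substrates} for every unbalanced element."""
    left, right = side_atoms(substrates), side_atoms(products)
    return {el: right[el] - left[el]
            for el in set(left) | set(right) if right[el] != left[el]}


# Balanced: CH4 + 2 O2 -> CO2 + 2 H2O
print(atom_balance([(1, "CH4"), (2, "O2")], [(1, "CO2"), (2, "H2O")]))  # {}
# Unbalanced: one O2 missing on the substrate side
print(atom_balance([(1, "CH4"), (1, "O2")], [(1, "CO2"), (2, "H2O")]))  # {'O': 2}
```

As in the text, the sketch only reports the difference and does not try to guess which compound is missing.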

Conversion and annotation of the KGML document

The completed and corrected KGML document can now be used to generate models. Therefore, conversions to BioPAX, SBML, SBML-qual and several other formats are required. Typically, the model instance has to be initialized and all entries need to be added to the model. Caution needs to be taken in this step, because multiple copies of an entry might exist in one KGML document. Usually, every graphical copy catalyzes different reactions. But for systems biology models, only one element should be created for all copies, representing a union of all physically identical entries. Furthermore, KGML specifies an entry type called ‘reaction’, which should not be converted to a physical entity in the resulting model. Depending on the modeling language, either the reactions or the relations or both need to be converted to the chosen format.
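The handling of graphical copies can be sketched as a grouping step: entries sharing a name become one model element, and entries of type ‘reaction’ are skipped. The tuple layout, function name, and identifiers below are assumptions made for illustration.

```python
from collections import defaultdict


def build_species(entries):
    """Group KGML entry ids by entry name, one model element per name.

    entries: list of (entry_id, name, entry_type) tuples.
    Entries of type 'reaction' are not turned into physical entities.
    """
    groups = defaultdict(list)
    for entry_id, name, entry_type in entries:
        if entry_type == "reaction":
            continue
        groups[name].append(entry_id)
    return dict(groups)


entries = [
    ("1", "org:geneA", "gene"),
    ("7", "org:geneA", "gene"),   # graphical copy of entry 1
    ("2", "cpd:X", "compound"),
    ("9", "rn:RA", "reaction"),   # skipped
]
print(build_species(entries))
```

Keeping the list of original entry ids per element preserves the information about duplicates that later steps, such as layout, may need.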

Besides those conversion steps, additional operations are required in order to facilitate further modeling efforts by researchers. This includes extensive annotations and comments for all elements. Hence, Gene Ontology terms, describing the elements and their function, as well as identifiers for a plethora of other databases for genes, proteins, interactions, structural information, small molecules, etc. are added to the model. In more detail, identifiers are added for Entrez Gene, OMIM, Ensembl, UniProt, ChEBI, DrugBank, Gene Ontology, HGNC, PubChem, 3DMET, NCBI Taxonomy, PDBeChem, GlycomeDB, LipidBank, EC numbers (enzyme nomenclature) and various KEGG databases (GENE, GLYCAN, REACTION, COMPOUND, DRUG, PATHWAY, ORTHOLOGY). Besides those cross-references, other helpful human and machine-readable annotations are added, for example, official gene symbols, synonyms, human-readable descriptions, links to more resources or visualizations, and the chemical formula and molecular weight for small molecules.

The annotation of the models is an important step, because simulations on real data or simple experimental data visualization tools require unique identifiers to map the experimental data on the pathway structure. If models provide a simple data structure with labels, but no reference identifiers, they are hardly usable in conjunction with experimental data.


Today, Level 3 is the most recent Level of BioPAX. But Level 2 is still common and there are some data structures in Level 3 that are not available in Level 2. Therefore, separate converters for BioPAX Level 2 and for Level 3 are required. First of all, a BioPAX model has to be created and a pathway object, corresponding to the input KGML, needs to be added to the model. Then, several annotations and cross-references are defined for this pathway. This includes, for instance, the organism, cross-references to other databases, and Gene Ontology terms to define the pathway’s function. The next step involves mapping each KGML element to a corresponding BioPAX element. Figure 2 gives an overview of these mappings.

Simplified class structure and mapping from KGML to BioPAX. The figure shows the raw mapping of KGML to BioPAX class instances. The type attribute of each entry determines how it is translated (see Table 1). Reactions that are catalyzed by enzymes are translated to Catalysis, whereas non-catalyzed reactions are translated directly to BiochemicalReactions. Relations are translated differently, depending on their subtype, the participating entities and the chosen BioPAX level (see Table 2). For clarity, the figure does not include the information that in BioPAX Level 2, control and conversion inherit from physicalInteraction. Furthermore, a Catalysis consists of two elements: a Controller and a Controlled element. For our purposes, Controller is always an enzyme and Controlled is a BiochemicalReaction. Similarly, KGML relations may be translated to a Control element that regulates either a Conversion or TemplateReaction.

Having the initial pathway model, the next step is to create BioPAX elements for each KGML entry. This translation mainly depends on the type of the KGML entry and is listed in detail in Table 1. Entries with the same identifier (graphical copies of the same element) are grouped to one instance and only one BioPAX element is created for those. Depending on the just created BioPAX element, further annotation steps are required. For Complexes, we need to add all of their components. For SmallMolecules, we add the molecular weight and chemical formula to the corresponding BioPAX fields, which facilitates further modeling steps. For each element, cross-references to other databases and more annotations are added as described in the previous section.

Table 1

BioPAX instances and SBO terms corresponding to KGML entry types

Entry type    BioPAX element    SBO term
compound      smallMolecule     247 (simple chemical)
enzyme        protein           252 (polypeptide chain)
gene          protein           252 (polypeptide chain)
ortholog      protein           252 (polypeptide chain)
group         complex           253 (non-covalent complex)
map           pathway           552 (reference annotation)

This table depicts the conversion of KGML entries to BioPAX or SBML. The conversion depends on the KGML entry type attribute. For BioPAX, different class instances are initialized. Conversions to SBML always involve the creation of a species with the given SBO term for each KGML entry. The KGML specification states that an entry of type ‘gene’ “is a gene product (mostly a protein)”. Additionally, a ‘group’ “is a complex of gene products (mostly a protein complex)” [16]. For compatibility with previous KGML versions, the deprecated type ‘genes’ corresponds to ‘group’ since KGML v0.6.1. Further, entries of type ‘reaction’ are not listed in the table, but discussed in a separate section.
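The mapping in Table 1 can be expressed as a simple lookup. The sketch below takes the values from the table and treats the deprecated type ‘genes’ as ‘group’, as the note above states. The function names are made up for illustration, and the seven-digit `SBO:` identifier format is the usual way SBO terms are written.

```python
# Values taken from Table 1: KGML entry type -> (BioPAX element, SBO term number).
ENTRY_TYPE_MAP = {
    "compound": ("smallMolecule", 247),  # simple chemical
    "enzyme": ("protein", 252),          # polypeptide chain
    "gene": ("protein", 252),
    "ortholog": ("protein", 252),
    "group": ("complex", 253),           # non-covalent complex
    "map": ("pathway", 552),             # reference annotation
}


def translate_entry_type(entry_type):
    """Return (BioPAX element, SBO number) for a KGML entry type."""
    if entry_type == "genes":  # deprecated alias for 'group'
        entry_type = "group"
    # Raises KeyError for types not in the table, e.g. 'reaction'.
    return ENTRY_TYPE_MAP[entry_type]


def sbo_id(number):
    """Format an SBO term number as an identifier such as 'SBO:0000247'."""
    return f"SBO:{number:07d}"


element, sbo = translate_entry_type("genes")
print(element, sbo_id(sbo))  # complex SBO:0000253
```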

KEGG reactions always correspond to biochemical reactions. Thus, a BiochemicalReaction is the appropriate data structure for those reactions, and one instance of this class is created for each KGML reaction. If catalyzing enzymes are annotated, a Catalysis instance is created. This Catalysis contains the catalyzing enzymes as Controllers and the BiochemicalReaction as the Controlled element. The reaction is annotated with its direction and whether it is reversible. Further, the stoichiometry of each participant is annotated, as well as the EC numbers of all catalyzing enzymes. Human-readable supporting information is also added to the reactions, such as the reaction equation, other pathways in which the reaction occurs, and a generic description. In addition, the result of the atom balance check is added as a further comment, together with comprehensive information on which atoms are on the substrate side, which are on the product side, and the difference between them.
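An atom balance check of this kind can be sketched as below. This is a simplified illustration rather than the check implemented in KEGGtranslator: the function names are hypothetical, and the formula parser assumes plain formulas without parentheses, charges or generic groups.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'CH4' or 'H2O'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_atoms(side):
    """Sum atoms over (stoichiometry, formula) pairs of one reaction side."""
    total = Counter()
    for coefficient, formula in side:
        for element, count in parse_formula(formula).items():
            total[element] += coefficient * count
    return total


def atom_balance(substrates, products):
    """Return atoms per side and the per-element difference (left - right)."""
    left, right = side_atoms(substrates), side_atoms(products)
    diff = {
        element: left[element] - right[element]
        for element in set(left) | set(right)
        if left[element] != right[element]
    }
    return left, right, diff


# CH4 + 2 O2 -> CO2 + 2 H2O is balanced, so the difference is empty.
_, _, balanced = atom_balance([(1, "CH4"), (2, "O2")], [(1, "CO2"), (2, "H2O")])
# With only one O2, two oxygen atoms are missing on the substrate side.
_, _, unbalanced = atom_balance([(1, "CH4"), (1, "O2")], [(1, "CO2"), (2, "H2O")])
```

An empty difference means the reaction is balanced; otherwise the dictionary lists exactly the elements that would be reported in the comment.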

Besides biochemical reactions, BioPAX also supports other kinds of relationships between entities. These include universal elements, such as Conversions or MolecularInteractions, which are convenient for translating generic KEGG relations that do not provide much information. Relations of type ‘activation’, ‘inhibition’ or ‘missing interaction’ are examples of such generic translations. The difference between those is that Conversions can be used to specify a source and a target, whereas MolecularInteractions (which correspond to physicalInteractions in BioPAX Level 2) only have a single pool of participating entities. Other KEGG relations can be converted to more specific BioPAX interaction classes. A ComplexAssembly, for example, is used to express a binding between multiple elements, but also a dissociation of elements. However, the usage of this class requires that the given product (or, in a disassembly, the substrate) is a Complex. If these requirements are not met, a generic Conversion is used. Relations that involve the modification of a protein are translated to BioPAX by creating controlled processes. This involves the creation of a Control element that contains a Process and a Controller that regulates this process. This is used to translate relations that describe, e.g., a phosphorylation.

To this end, a Conversion is generated, which contains the unphosphorylated protein as source and a phosphorylated variant as target. This conversion is controlled by an instance of Controller that contains the controlling protein.

In BioPAX Level 3, some additional improvements of the translations are performed, such as encoding phosphorylation or other modifications by adding a ModificationFeature to an entity. Furthermore, the expression of a protein can be encoded with a TemplateReaction. This type of interaction is used to describe the production of an RNA or Protein from a template sequence. This process is regulated by a TemplateReactionRegulation, which in most cases contains a transcription factor as regulator. In KEGG, this is specified by a relation that contains the transcription factor as source, the protein as target and the term ‘expression’ as subtype.

An InteractionVocabulary is created for each translated relation that specifies the type of interaction as a controlled vocabulary term and a human-readable string. For this purpose, terms from the Systems Biology Ontology (SBO) [17], Gene Ontology (GO) [18] and Molecular Interactions Ontology (MI) [19] are used. Protein modifications are further denoted by a SequenceModificationVocabulary in BioPAX Level 3, which uses terms from the Protein Modification Ontology (MOD) [20]. Table 2 shows in detail how each relation is converted and which ontology terms are used.

Table 2

BioPAX instances and ontology terms corresponding to KGML relation subtypes

Relation subtype | BioPAX element | SBO term | MI term | GO term
activation | conversion, control | 170 (stimulation) | none | none
inhibition | conversion, control | 169 (inhibition) | none | none
expression | TemplateReaction, -Regulation | 170 (stimulation) | none | 10467
repression | TemplateReaction, -Regulation | 169 (inhibition) | none | none
indirect effect | conversion | 344 (molecular interaction) | none | none
state change | conversion | 168 (control) | none | none
binding/association | ComplexAssembly | 177 (non-covalent binding) | 914 | 5488
dissociation | ComplexAssembly | 180 (dissociation) | none | none
missing interaction | MolecularInteraction | 396 (uncertain process) | none | none
phosphorylation | conversion, control | 216 (phosphorylation) | 217 | 16310
dephosphorylation | conversion, control | 330 (dephosphorylation) | 203 | 16311
glycosylation | conversion, control | 217 (glycosylation) | 559 | 70085
ubiquitination | conversion, control | 224 (ubiquitination) | 220 | 16567
methylation | conversion, control | 214 (methylation) | 213 | 32259

This table shows how relations are handled during conversion to BioPAX or SBML. The conversion depends on the subtype of each relation. For each subtype, the corresponding BioPAX element, as well as terms from different ontologies, are specified. When converting to BioPAX, all terms are annotated as an instance of InteractionVocabulary, whereas an SBML transition has a field for the SBO term and other terms are added as controlled vocabularies on the transition. Please note that some BioPAX elements are subject to certain conditions and others need to be replaced by more generic classes in BioPAX Level 2, due to differences between the two releases. Please see the KEGG to BioPAX section for more details.
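Table 2 can likewise be read as a lookup from relation subtype to ontology terms. The following sketch is an assumption-laden illustration: the names RELATION_SUBTYPE_MAP and ontology_terms are hypothetical, and the zero-padded identifier widths (seven digits for SBO and GO, four for MI) are the customary written forms rather than something the table itself specifies.

```python
# Subtype -> (BioPAX element, SBO, MI, GO); None where Table 2 lists "none".
RELATION_SUBTYPE_MAP = {
    "activation": ("conversion, control", 170, None, None),
    "inhibition": ("conversion, control", 169, None, None),
    "expression": ("TemplateReaction, -Regulation", 170, None, 10467),
    "repression": ("TemplateReaction, -Regulation", 169, None, None),
    "indirect effect": ("conversion", 344, None, None),
    "state change": ("conversion", 168, None, None),
    "binding/association": ("ComplexAssembly", 177, 914, 5488),
    "dissociation": ("ComplexAssembly", 180, None, None),
    "missing interaction": ("MolecularInteraction", 396, None, None),
    "phosphorylation": ("conversion, control", 216, 217, 16310),
    "dephosphorylation": ("conversion, control", 330, 203, 16311),
    "glycosylation": ("conversion, control", 217, 559, 70085),
    "ubiquitination": ("conversion, control", 224, 220, 16567),
    "methylation": ("conversion, control", 214, 213, 32259),
}


def ontology_terms(subtype):
    """List the ontology identifiers annotated for a relation subtype."""
    _, sbo, mi, go = RELATION_SUBTYPE_MAP[subtype]
    terms = [f"SBO:{sbo:07d}"]
    if mi is not None:
        terms.append(f"MI:{mi:04d}")
    if go is not None:
        terms.append(f"GO:{go:07d}")
    return terms


phosphorylation_terms = ontology_terms("phosphorylation")
```

Keeping the missing terms as None rather than omitting the columns makes it explicit which subtypes have no MI or GO annotation.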


Even though it is not the latest release of SBML, Level 2 Version 4 is still used in many applications and hence should be supported for the conversion of metabolic models. The most recent SBML Level 3 release introduces extension packages and is required to include qualitative models (qual), groups, and layout information in the document, which are essential for modeling signaling pathways. At first glance, conversion of KGML to SBML seems to be simple. This is also suggested by the mapping scheme depicted in Figure 3. But in SBML, the distinction between various relation or entry types is not made by using different class instances, as in BioPAX, but by using special attribute-value pairs, such as SBO terms. KEGG defines entries and an entry type, which specifies if the entry corresponds to a protein, complex, small molecule, referenced pathway map, or some other type. BioPAX provides different classes to distinguish between those types. SBML, similar to KGML, just has a class named species to encode all those entries. The type of the species should be specified by using terms from the Systems Biology Ontology (SBO) [17]. These SBO terms are hierarchically organized, and only SBO terms from the ‘material entity’ branch should be used to encode the entities. Table 1 shows which SBO terms are most appropriate to encode the different KGML entries. Furthermore, as in BioPAX translations, it is important to group graphical copies of the same entries into one element and to create only one species element for this entry. To make the model usable for further applications, extensive annotations and references to other databases are added, using standardized controlled vocabulary (CV) terms and MIRIAM identifiers [21,22].
Further, a description, various synonyms, the CAS number, chemical formula, a reference picture (structural formula for compounds, image of the pathway-map for pathways), molecular weight, and mass are added as human-readable annotation, if available.

Simplified class structure and mapping from KGML to SBML. This mapping includes the SBML qualitative models (qual) and groups extension packages. Most properties are encoded as attributes on the actual classes. Tables 1 and 2 give further details about the translation of entries and relations. SBML can only handle reactions. Therefore, SBML-qual is required to properly encode relations. This extension package requires its own model. Consequently, the SBML-core model and each species have to be duplicated to obtain a qualitativeModel including the translated relations. Furthermore, the groups extension package can be used for a proper encoding of groups in SBML.

Groups are not supported by SBML-core. In order to encode entries of type ‘group’ in SBML Level 3, one can use the groups extension package [23]. Prior to Level 3, groups can only be encoded through annotations, for example by adding a CV term with a BQB_IS_ENCODED_BY or BQB_HAS_PART qualifier that specifies the contents of the group. In any case, an SBO term should also be used, which marks this species as a complex of multiple other species.

KEGG reactions are converted to SBML reactions with correct SBO terms for substrates (SBO:0000015) and products (SBO:0000011). If the reaction is reversible, a generic reactant SBO term (SBO:0000010) should be applied to all reaction participants. In addition, the reversibility is annotated on the reaction itself and the stoichiometry is annotated on all reaction participants. Catalyzing enzymes are included as ModifierSpeciesReferences, and CV terms referring to the KEGG reaction identifier, as well as to all pathways in which this reaction occurs, are added. Human-readable annotations on reactions include the reaction definition, the equation, a reference to the reaction equation as an HTML image, and the result of the atom balance check (i.e., whether there are missing atoms in the reaction).
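The choice of SBO terms for reaction participants can be summarized in a few lines. This is a minimal sketch using only the three SBO identifiers named above; the function name participant_sbo_terms and the returned dictionary layout are assumptions made for illustration.

```python
SBO_REACTANT = "SBO:0000010"   # generic reactant, used for reversible reactions
SBO_PRODUCT = "SBO:0000011"
SBO_SUBSTRATE = "SBO:0000015"


def participant_sbo_terms(reversible):
    """Return the SBO terms to set on substrates and products of a reaction."""
    if reversible:
        # Direction is not fixed, so both sides get the generic term.
        return {"substrate": SBO_REACTANT, "product": SBO_REACTANT}
    return {"substrate": SBO_SUBSTRATE, "product": SBO_PRODUCT}


irreversible_terms = participant_sbo_terms(reversible=False)
reversible_terms = participant_sbo_terms(reversible=True)
```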

Relations are required to encode signaling pathways but cannot properly be included in core SBML. There is no structure that encodes, e.g., “A activates B”; we can only add reactions to SBML. For SBML Level 3, the recently proposed qualitative models (qual) extension package solves this problem [24]. This extension is designed for qualitative modeling and allows for modeling relationships that cannot be described in detail. Thus, to encode the KEGG relations, we have to convert the model to a qualitativeModel and create a qualitative transition for each relation. An SBO term, as given in Table 2, is assigned to the transition to specify its type. A GO term, mentioned in the same table, is further added as a CV term on the transition.

Further KGML characteristics

KGML entries that are reactions

The KGML specification allows entries to have a type called ‘reaction’. This can be used, for example, to let a relation point to a reaction. Actually, KGML only allows entries to be targets of relations but these constructs can be used to relax the constraints. However, BioPAX naturally allows interactions to point to other interactions as sources or targets. Hence, the document structure is not invalidated if entries with type ‘reaction’ are converted to real reactions in BioPAX and every use of this entry is replaced by using the BioPAX reaction.

In SBML, these entries are also converted to reactions. No species is created for entries with type ‘reaction’ in SBML-core. For SBML-qual, the specification has similar requirements as KGML: all transitions must have qualitativeSpecies as sources or targets. Therefore, for SBML-qual the translation is similar to the source KGML, and a qualitativeSpecies with adequate annotation is created for entries with type ‘reaction’.

Relations of subtype ‘compound’

Some KGML documents include reactions and exclusively relations of subtype ‘compound’. These compound-relations are mostly relations between enzymes and compounds. KEGG states that this compound is “shared with two successive reactions […]” [16]. In other words, these relations are copies of reactions that have been created by KEGG for the sake of better graphical representation of the pathway. Thus, translating both the reactions and the compound-relations would yield duplicated information.
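One way to avoid this duplication is to skip compound-relations whenever the document also contains reactions. The sketch below illustrates that idea only; the text does not spell out the exact rule used, so the function name relations_to_translate, the relation layout and the filtering rule itself are assumptions.

```python
def relations_to_translate(relations, has_reactions):
    """Drop relations of subtype 'compound' if reactions are translated too.

    Each relation is assumed (for this sketch) to be a dict with a
    "subtypes" list holding the names of its KGML subtype elements.
    """
    if not has_reactions:
        return list(relations)
    return [r for r in relations if "compound" not in r["subtypes"]]


# Illustrative relations between entry ids of a KGML document.
relations = [
    {"entry1": "10", "entry2": "11", "subtypes": ["compound"]},
    {"entry1": "12", "entry2": "13", "subtypes": ["activation"]},
]
kept = relations_to_translate(relations, has_reactions=True)
```

When a document has no reactions, nothing is filtered, so no information is lost in pathways that consist of relations only.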

Documents with glycans instead of compounds

Sometimes, KGML specifies glycans as reaction participants instead of compounds. Actually, there is nothing wrong with this, except that the KEGG API often returns reaction equations with compound identifiers and some attributes, such as chemical formula or molecular weight, are exclusively available for compounds. This leads to reactions that are erroneously detected as incorrect or to missing chemical formulas. Therefore, if a synonymous compound identifier is available for a KEGG glycan or another KEGG database identifier that contains synonyms in KEGG COMPOUND, it is advisable to fetch and internally work with the compound identifier. Otherwise, it is very likely that duplicates of the same entries but with different identifiers are created in a model and some relationships are not correctly resolved.

Implementation and availability

All described methods are implemented in the second release of KEGGtranslator (since version 2.2). The application uses and includes Paxtools, a Java™ library for working with BioPAX that facilitates building and writing the internal BioPAX data structure. To establish the SBML data structure, KEGGtranslator uses the Java™ library JSBML [25] and supports SBML Level 2 Version 4 [26] and SBML Level 3 Version 1 [27].

KEGGtranslator is implemented in Java™, provides an interactive, user-friendly and easy-to-use graphical user interface (GUI), and is freely available under the LGPL version 3 license. KGML pathways can be downloaded automatically from within KEGGtranslator. The application can convert KEGG pathways from KGML files to BioPAX Level 2, BioPAX Level 3, SBML (core), SBML (qual), or SBML-core and -qual in one model. If desired, graphical representations can be created in SBGN, SIF, GML, GraphML, JPG and some other formats. Furthermore, many options are provided that control the described (pre-)processing of KEGG conversions and allow users to customize the generated models to meet a great number of different requirements.


The ability to accurately identify microorganisms is fundamental to all aspects of fungal epidemiology and diagnosis. In phytopathology, the early identification of disease-causing agents is essential to the recognition of pathogens (60). In the last ten years, advancements have been made in the molecular diagnosis of fungi through PCR technology. Unlike conventional methods, samples can be tested directly through PCR and isolated without the need for cultures. The technique is fast and highly specific. It can be used to detect trace amounts of fungal DNA from environmental samples before symptoms occur. It therefore allows the implementation of early disease control methods. PCR can be performed routinely and does not require specialized skill to interpret the results. The technology can also offer more accurate quantitative data, providing additional information necessary for decision making and for assessing how effective fungal agents are in biological control. Since its introduction in the mid 1980s, PCR has become the cornerstone of DNA technology and has cleared the path for the creation of numerous associated technologies. It is remarkable for its ability to detect DNA amplified from one or a few original sequences. Conventional PCR is not quantitative, but rather qualitative. It has been used to detect, monitor and identify fungi from an entire set of environmental samples and is the core of molecular fungal diagnostics (4).

Fluorescent PCR in situ utilizes fluorescently marked primers or probes to detect and locate fungi in fixed environmental samples following semi-permeabilization (102). The fluorescence of the primers or probes is detected using a confocal microscope. This technique allows the direct detection of the organism in the sample. It also shows the spatial distribution, interactions with the host and other organisms. Bago, Piche, Simon (5) used in situ PCR to detect and locate infections caused by Arbuscular Mycorrhiza fungi. The scope of PCR is infinite (5). It can be used to investigate either a single species or entire communities (22,35,80).

Pneumocystis jiroveci (a fungus previously denominated Pneumocystis carinii) can cause severe pneumonia in patients infected with HIV or otherwise immunosuppressed, but its detection is restricted to the microscopy of specimens from the respiratory tract. Microscopy for the detection of P. jiroveci generally involves the use of stains. Immunofluorescence is more sensitive than these stains, but is more expensive and requires specialized facilities. PCR is more sensitive, especially in patients not infected with HIV, and can therefore be of considerable usefulness (36). PCR specificity is limited, however: because this microorganism is an omnipresent commensal, it can be detected through PCR in the absence of the disease (37).

Another example of the use of PCR technology in mycology is the detection of infection by Aspergillus spp. in patients with neutropenia. This disease is notoriously difficult to diagnose due to the poor sensitivity of the culture method and the difficulty of obtaining histopathological specimens from individuals with low platelet counts. Early treatment is essential to achieving the best results. PCR can reduce the time required for the specific diagnosis (111). Real-Time PCR has been successfully used to quantify the number of pathogens (7,13,21,112), thereby assisting in decisions regarding how to treat fungal diseases and assess the effects of fungi (57).

Parasitological diagnostics can be assisted by molecular methods. Many parasites are not cultivable in the laboratory, and diagnosis relies principally on serology and on relatively less sensitive microscopy. Microscopy still supports the diagnosis of malaria, but owing to its greater sensitivity, PCR can diagnose this illness even in difficult situations. Plasmodium species can also be detected in different infections, which can hinder microscopic discernment (99).

Enzymes and Biochemical Reactions

Most chemical reactions within organisms would be impossible under the conditions in cells. For example, the body temperature of most organisms is too low for reactions to occur quickly enough to carry out life processes. Reactants may also be present in such low concentrations that it is unlikely they will meet and collide. Therefore, the rate of most biochemical reactions must be increased by a catalyst. A catalyst is a chemical that speeds up chemical reactions. In organisms, catalysts are called enzymes. Essentially, enzymes are biological catalysts.

Like other catalysts, enzymes are not reactants in the reactions they control. They help the reactants interact but are not used up in the reactions. Instead, they may be used over and over again. Unlike other catalysts, enzymes are usually highly specific for particular chemical reactions. They generally catalyze only one or a few types of reactions.

Enzymes are extremely efficient in speeding up reactions. They can catalyze up to several million reactions per second. As a result, the difference in rates of biochemical reactions with and without enzymes may be enormous. A typical biochemical reaction might take hours or even days to occur under normal cellular conditions without an enzyme, but less than a second with an enzyme.

Figure 1 diagrams a typical enzymatic reaction. A substrate is the molecule or molecules on which the enzyme acts. In the urease catalyzed reaction, urea is the substrate.

Figure 1: The sequence of steps for a substrate binding to an enzyme in its active site, reacting, then being released as products.

The first step in the reaction is that the substrate binds to a specific part of the enzyme molecule, known as the active site. The binding of the substrate is dictated by the shape of each molecule. Side chains on the enzyme interact with the substrate in a specific way, resulting in the making and breaking of bonds. The active site is the place on an enzyme where the substrate binds. An enzyme folds in such a way that it typically has one active site, usually a pocket or crevice formed by the folding pattern of the protein. Because the active site of an enzyme has such a unique shape, only one particular substrate is capable of binding to that enzyme. In other words, each enzyme catalyzes only one chemical reaction with only one substrate. Once the enzyme/substrate complex is formed, the reaction occurs and the substrate is transformed into products. Finally, the product molecule or molecules are released from the active site. Note that the enzyme is left unaffected by the reaction and is now capable of catalyzing the reaction of another substrate molecule.

For many enzymes, the active site follows a lock and key (A in the figure below) model where the substrate fits exactly into the active site. The enzyme and substrate must be a perfect match so the enzyme only functions as a catalyst for one reaction. Other enzymes have an induced fit (B in the figure below) model. In an induced fit model, the active site can make minor adjustments to accommodate the substrate. This results in an enzyme that is capable of interacting with a small group of similar substrates. Look at the shape of the active site compared to the shape of the substrate in B of the figure below. The active site adjusts to accommodate the substrate.

Figure 2: (A) Lock and key enzyme model and (B) induced fit enzyme model.



Oshidari, R. et al. Nuclear microtubule filaments mediate non-linear directional motion of chromatin and promote DNA repair. Nat. Commun. 9, 2567 (2018).

Shin, Y. et al. Liquid nuclear condensates mechanically sense and restructure the genome. Cell 175, 1481–1491 (2018).

Torres-Rosell, J. et al. The Smc5-Smc6 complex and SUMO modification of Rad52 regulates recombinational repair at the ribosomal gene locus. Nat. Cell Biol. 9, 923–931 (2007).

Chiolo, I. et al. Double-strand breaks in heterochromatin move outside of a dynamic HP1a domain to complete recombinational repair. Cell 144, 732–744 (2011).

Ryu, T. et al. Heterochromatic breaks move to the nuclear periphery to continue recombinational repair. Nat. Cell Biol. 17, 1401–1411 (2015).

Oshidari, R., Mekhail, K. & Seeber, A. Mobility and repair of damaged DNA: random or directed? Trends Cell Biol. 30, 144–156 (2019).

Hubstenberger, A. et al. P-body purification reveals the condensation of repressed mRNA regulons. Mol. Cell 68, 144–157.e145 (2017).


Enzymes are proteins that have the ability to bind substrate in their active site and then chemically modify the bound substrate, converting it to a different molecule — the product of the reaction. Substrates bind to enzymes just like ligands bind to proteins. However, when substrates bind to enzymes, they undergo an enzyme-induced chemical change, and are converted to products.

Figure 4. Compare the protein-ligand interaction to the enzyme-substrate interaction. Notice that both binding proteins and enzymes have binding sites for their ligands (L) and substrates (S), respectively. In an enzyme, this binding site is called the active site because it also contains amino acids that are important for the conversion of substrate to product.

The substrate binds to the enzyme by interacting with amino acids in the binding site. The binding site on enzymes is often referred to as the active site because it contains amino acids that both bind the substrate and aid in its conversion to product.

You can often recognize that a protein is an enzyme by its name. Many enzyme names end with -ase. For example, the enzyme lactase breaks down the sugar lactose, found in mammalian milk. Other enzymes are known by a common name, such as pepsin, an enzyme that aids in the digestion of proteins in your stomach by breaking the peptide bonds in the proteins.

Enzymes are catalysts, meaning that they make a reaction go faster, but the enzymes themselves are not altered by the overall reaction. Figure 5 shows how enzymes work.

Figure 5. Simplified enzymatic reaction. The substrate reversibly binds to the active site of the enzyme, forming the enzyme-substrate (ES) complex. The bound substrate is converted to product by catalytic groups in the active site, forming the enzyme-product complex (EP). The bound products are released, returning the enzyme to its unbound form, ready to catalyze another round of converting substrate to product.
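The cycle in Figure 5 can be sketched as a tiny state machine. This is a minimal illustration, not a kinetic model: the function name `run_catalytic_cycle` and the state labels are hypothetical, and the reversible binding step is simplified to a one-way transition.

```python
def run_catalytic_cycle(substrate_count):
    """Walk one enzyme molecule through E -> ES -> EP -> E for each substrate.

    Simplifying assumption: every substrate that binds is converted to
    product (the reversible release of unreacted substrate is ignored).
    """
    enzyme_state = "E"      # free enzyme with an empty active site
    products = 0
    states_visited = []

    for _ in range(substrate_count):
        # ES: substrate bound; EP: product bound; E: product released
        for state in ("ES", "EP", "E"):
            enzyme_state = state
            states_visited.append(state)
        products += 1       # one product released per completed cycle

    return enzyme_state, products, states_visited


final_state, products, states = run_catalytic_cycle(3)
print(f"enzyme ends as {final_state!r} after making {products} products")
```

The same enzyme object handles every substrate and finishes in its unbound form, which is the sense in which a catalyst is not altered by the overall reaction.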

The amino acids in the active site of enzymes play two roles, and sometimes those roles overlap. Some of the amino acids in the active site are responsible for binding the substrate, and others are responsible for facilitating the chemical reaction. Enzymes are generally quite specific for their substrates. Although lactase and pepsin both catalyze the same type of reaction, breaking a bond using water (hydrolysis: “hydro” means “water” and “lysis” means “to break”), lactase acts only on lactose and pepsin breaks only peptide bonds.

Practice Question

Two substrates—lactose and a short protein—are shown on the left. Two enzymes are shown on the right, labeled A and B. Which of the two enzymes is lactase?

How Enzymes Work

Figure 6. Diagram of a catalytic reaction showing the difference in activation energy between the uncatalysed and catalysed reactions. The enzyme reduces the energy barrier required to activate the substrate, allowing more substrates to become activated, which increases the rate of product formation. Note that the energy difference between the substrate and the product is not changed by the enzyme.

In all chemical reactions, an initial input of energy is required before the reaction can occur. If this initial energy requirement (called the activation energy or energy barrier) is small, then the reaction will happen quickly and easily. If the activation energy is large, then the reaction will proceed more slowly. Enzymes reduce the activation energy required for a chemical reaction to occur.

The enzyme binds to the substrate and slightly distorts its shape. The change in shape activates the substrate molecule and decreases the total activation energy required for the substrate to be turned into product. As the number of activated substrate molecules increases, so does the conversion of substrate to product. An analogy for this effect is a ski hill: skiers at the bottom of one side of the hill represent substrates, skiers at the top of the hill represent activated substrates, and skiers who ski down the other side represent products. If the height of the hill is lowered (due to the presence of the enzyme), then more skiers can make it to the top, increasing the number that ski down to become products.
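To put rough numbers on the ski-hill picture, the sketch below assumes the Boltzmann factor exp(-Ea/RT) from physical chemistry as an estimate of the fraction of substrate molecules with enough energy to clear the barrier. That formula is not part of this text, and the two activation energies (75 and 50 kJ/mol) and the temperature (310 K, about body temperature) are made-up illustrative values, not measurements for any real enzyme.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def activated_fraction(activation_energy_kj_per_mol, temperature_k=310.0):
    """Estimate the fraction of molecules able to cross the energy barrier.

    Uses the Boltzmann factor exp(-Ea / (R * T)); this is an assumed model
    for illustration, with Ea converted from kJ/mol to J/mol.
    """
    ea_joules = activation_energy_kj_per_mol * 1000.0
    return math.exp(-ea_joules / (R * temperature_k))


# Hypothetical barrier heights: the "hill" without and with the enzyme.
uncatalysed = activated_fraction(75.0)
catalysed = activated_fraction(50.0)
speedup = catalysed / uncatalysed

print(f"activated fraction without enzyme: {uncatalysed:.3e}")
print(f"activated fraction with enzyme:    {catalysed:.3e}")
print(f"ratio (catalysed / uncatalysed):   {speedup:.0f}")
```

Under these assumptions, lowering the barrier by a third makes the activated fraction many thousands of times larger, even though the substrate and product themselves are unchanged.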

Practice Questions

Fill in the blank: When an enzyme catalyzes a reaction, ________.

  1. it raises the activation energy of the reaction.
  2. it is used once and discarded.
  3. it becomes a product.
  4. it acts as a reactant.
  5. it lowers the activation energy of the reaction.

What will happen to the rate at which a chemical reaction proceeds if the activation energy is increased?

  1. The reaction will happen faster (at a higher rate).
  2. The reaction will happen slower (at a lower rate).
  3. The reaction rate will not change.

In Summary: Chemical Reactions

The outer electron shell dictates how readily a particular atom forms chemical bonds and what type of bonds it forms. The formation of compounds is often outlined in chemical equations, which show the reactants participating in chemical reactions to form products.

Anabolic pathways assemble large molecules from smaller ones. Catabolic pathways break large molecules into small pieces.

Enzymes are proteins that speed up reactions by reducing the activation energy. Each enzyme typically binds only one specific substrate. Enzymes are not consumed during a reaction; instead, they are available to bind new substrates and catalyze the same reaction repeatedly.
