What are genes and the human genome? The concept of the genome and the organization of the human genome. Decoding the human genome in brief

Seven years ago, on June 26, 2000, representatives of two research groups, the International Human Genome Sequencing Consortium (IHGSC) and Celera Genomics, held a joint press conference attended by the President of the United States and the Prime Minister of Great Britain. They announced that the work on deciphering the human genome, begun in the 1970s, had been successfully completed and a draft version of it had been compiled. A new episode in the development of mankind had begun: the postgenomic era.

What can deciphering the genome give us, and was the result worth the money and effort? In 2000, Francis S. Collins, head of the American Human Genome Program, gave the following forecast for medicine and biology in the postgenomic era:

  • 2010: genetic testing, preventive measures that reduce the risk of disease, and gene therapy for up to 25 hereditary diseases. Nurses begin to perform medical genetic procedures. Preimplantation diagnosis is widely available, and the limitations of the method are actively debated. The US has passed laws to prevent genetic discrimination and protect privacy. Practical applications of genomics are not available to everyone, especially in developing countries.
  • 2020: drugs for diabetes, hypertension and other diseases, developed on the basis of genomic information, appear on the market. Cancer therapies are developed that specifically target the properties of cancer cells in particular tumors. Pharmacogenomics becomes the accepted approach to developing many drugs. The way mental illness is diagnosed changes, new treatments for it emerge, and society's attitude toward such illnesses shifts. Practical applications of genomics are still far from available everywhere.
  • 2030: determining the nucleotide sequence of an individual's entire genome becomes a routine procedure costing less than $1000. The genes involved in aging have been cataloged. Clinical trials are under way to increase the maximum human lifespan. Laboratory experiments on human cells have been replaced by experiments on computer models. Mass movements opposing advanced technologies gain strength in the USA and other countries.
  • 2040: all conventional health measures are based on genomics. Predisposition to most diseases is determined (even before birth). Effective preventive medicine tailored to the characteristics of the individual is available. Diseases are detected at early stages by molecular monitoring.
    Gene therapy is available for many diseases. Drugs are replaced by gene products that the body produces in response to therapy. Average life expectancy reaches 90 years thanks to improved socio-economic conditions. There is a serious debate about humanity's ability to control its own evolution.
    Inequality in the world persists, creating tension at the international level.

As the forecast shows, genomic information may soon become the basis for the treatment and prevention of many diseases. Without information about their genes (and it fits on a standard DVD), a person in the future will be able to get little more than a runny nose cured, and then only by some healer in the jungle. Does this seem like fantasy? But universal vaccination against smallpox, or the Internet (which, mind you, did not yet exist in the 1970s), once seemed just as fantastic! In the future, parents may be handed their child's genetic code at the maternity hospital. In theory, with such a disk the treatment and prevention of any of a person's ailments will become a mere trifle. A professional doctor will be able to make a diagnosis very quickly, prescribe effective treatment, and even estimate the likelihood of various diseases in the future. For example, modern genetic tests already make it possible to determine fairly accurately a woman's degree of predisposition to breast cancer. Almost certainly, in 40-50 years no self-respecting doctor will want to "treat blindly" without the patient's genetic code, just as surgery today cannot do without X-rays.
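The claim that a genome fits on a standard DVD can be checked with simple arithmetic. The Python sketch below is a minimal illustration, assuming each nucleotide (A, C, G or T) is stored in 2 bits, four bases per byte, and ignoring ambiguous bases, quality scores and file-format overhead. The 3.2 billion base pairs of one set of 23 chromosomes and the 4.7 GB capacity of a single-layer DVD are round figures; the functions `pack`, `unpack` and `packed_size_bytes` are hypothetical helpers, not part of any genomics library.

```python
# Minimal sketch: 2-bit packing of DNA and a genome-size estimate.
# Assumptions: only the letters A, C, G, T occur (no ambiguous bases),
# and 4 bases are packed into each byte. Not a real file format.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

HAPLOID_BP = 3_200_000_000   # one set of 23 chromosomes, ~3.2 billion bp
DVD_BYTES = 4_700_000_000    # single-layer DVD, ~4.7 GB


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, 2 bits per base, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = [BASES[(byte >> (2 * j)) & 0b11] for byte in data for j in range(4)]
    return "".join(bases[:length])


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at 2 bits per base (rounded up)."""
    return (2 * n_bases + 7) // 8


if __name__ == "__main__":
    size = packed_size_bytes(HAPLOID_BP)
    print(f"haploid genome: {size / 1e6:.0f} MB")   # 800 MB
    print(f"fits on a DVD: {size < DVD_BYTES}")     # True
    print(unpack(pack("GATTACA"), 7))               # GATTACA
```

Even both parental sets of chromosomes (about 6.4 billion bp) would need only about 1.6 GB in this encoding, so the DVD comparison holds with room to spare; real genome files are larger because they also carry quality scores, annotations and ambiguous positions.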

Let us ask ourselves: is all this true, or will reality turn out the other way around? Will people finally defeat all diseases and achieve universal happiness? Alas. To begin with, the Earth is small, and there is not enough happiness for everyone. In truth, there is not enough even for half the population of the developing countries. "Happiness" is intended mainly for states that are advanced in science, in particular in the biological sciences. For example, a technique for "reading" the genetic code of any person was patented long ago. It is a well-established automated technology, though an expensive and very delicate one. You can either buy a license or invent a new technique of your own. But far from every country will have enough money for such development! As a result, a few states will have medicine far ahead of that in the rest of the world. Naturally, in underdeveloped countries the Red Cross will build charity hospitals and genomic centers. Gradually, this will lead to the genetic information of patients in developing countries (which are the majority) being concentrated in the two or three powers that finance this charity. What could be done with such information is hard to imagine. Perhaps there is nothing to worry about. However, another outcome is also possible. The battle for priority that accompanied genome sequencing shows how much the availability of genetic information matters. Let us briefly recall some facts from the history of the Human Genome Program.

Opponents of deciphering the genome considered the task unrealistic, because human DNA is tens of thousands of times longer than the DNA molecules of viruses or plasmids. Their main argument was: "The project will require billions of dollars that other areas of science will be deprived of, so the genome project will slow down the development of science as a whole. And if the money is nevertheless found and the human genome is deciphered, the information obtained will not justify the costs..." However, James Watson, one of the discoverers of the structure of DNA and the ideologist of the program for reading the complete genetic information, wittily retorted: "It's better not to catch a big fish than not to catch a small one." The scientist's argument was heard: the genome question was brought before the US Congress, and as a result the national program "Human Genome" was adopted.

In the American city of Bethesda, not far from Washington, there is one of the coordinating centers of HUGO (Human Genome Organization). The center coordinates scientific work on the "Human Genome" topic in six countries: Germany, England, France, Japan, China and the USA. Scientists from many countries took part in the work, united in three teams: two intergovernmental ones, the American Human Genome Project and the British team from the Wellcome Trust Sanger Institute, and a private corporation from the state of Maryland, Celera Genomics, which joined the game a little later. This is perhaps the first case in biology of a private firm competing with intergovernmental organizations at such a high level.

The struggle was waged with colossal resources and capabilities. As Russian experts noted some time ago, Celera stood on the shoulders of the Human Genome Program, that is, it used what had already been done within the global project. Indeed, Celera Genomics joined the program not at the start, but when the project was already in full swing. However, Celera's experts improved the sequencing algorithm. In addition, a supercomputer was built to their order, which made it possible to assemble the identified DNA "bricks" into the final sequence faster and more accurately. Of course, none of this gave Celera an unconditional advantage, but the consortium was forced to reckon with the company as a full-fledged participant in the race.

The appearance of Celera Genomics sharply increased the tension: those employed in the government programs felt fierce competition. In addition, once the company was created, the question arose of how effectively public investment was being used. Celera was headed by Craig Venter, who had extensive experience of scientific work under the state "Human Genome" program. It was he who declared that all public programs are inefficient and that his company sequenced the genome faster and more cheaply. Then another factor came into play: the large pharmaceutical companies grasped what was at stake. If all the information about the genome were in the public domain, they would lose intellectual property, and there would be nothing to patent. Concerned about this, they invested billions of dollars in Celera Genomics (which was probably easier to deal with). This further strengthened the company's position. In response, the teams of the intergovernmental consortium urgently had to make their work on deciphering the genome more efficient. At first the two sides worked at cross purposes, but then certain forms of coexistence were found, and the race began to pick up pace.

The finale was beautiful: by mutual agreement, the competing organizations simultaneously announced the completion of work on deciphering the human genome. As we have already noted, this happened on June 26, 2000. But the time difference between America and England put the USA in first place.

Figure 1. The race for the genome, in which intergovernmental and private organizations took part, formally ended in a "draw": both groups of researchers published their results almost simultaneously. Craig Venter, head of the private company Celera Genomics, published his work in the journal Science with about 270 co-authors, scientists who worked under him. The work of the International Human Genome Sequencing Consortium (IHGSC) was published in the journal Nature, and its full list of authors includes about 2800 people who worked in almost three dozen centers around the world.

In total, the research took 15 years. Creating the first "draft" version of the human genome cost $300 million. However, about three billion dollars was allocated to all research on the topic, including comparative analyses and the resolution of a number of ethical problems. Celera Genomics invested about the same amount, although it spent it in only six years. The price is colossal, but it is insignificant compared with the benefit the developer country will receive from the final victory over dozens of serious diseases expected soon. In early October 2002, in an interview with The Associated Press, Craig Venter, the former president of Celera Genomics, said that one of his non-profit organizations planned to produce CDs containing as much information as possible about a client's DNA. The preliminary cost of such an order was more than 700 thousand dollars. And this year one of the discoverers of the structure of DNA, Dr. James Watson, was presented with two DVDs containing his genome, worth $1 million. Prices are falling: Michael Egholm, vice president of 454 Life Sciences, has said that the company will soon be able to bring the price of decoding a genome down to 100 thousand dollars.

Widespread fame and large-scale funding are a double-edged sword. On the one hand, with unlimited funds the work moves forward easily and quickly. On the other hand, the research is expected to produce exactly the result that was ordered. By the beginning of 2001, more than 20,000 genes had been identified in the human genome with complete certainty. This figure turned out to be three times smaller than had been predicted just two years earlier. A second team of researchers, from the US National Human Genome Research Institute led by Francis Collins, independently obtained the same results: between 20,000 and 25,000 genes in the genome of every human cell. However, two other international collaborative projects introduced uncertainty into the final estimates. Dr. William Haseltine (CEO of Human Genome Sciences) insisted that his company's database contained information on 140 thousand genes. And he is not going to share this information with the world community yet: his firm has invested in patents and intends to capitalize on the information it obtained, since it concerns genes for widespread human diseases. Another group claimed 120,000 identified human genes and likewise insisted that this figure reflected the total number of human genes.

It should be clarified here that these researchers sequenced not the genome itself but DNA copies of messenger RNA (mRNA, also called informational or matrix RNA). In other words, they studied not the entire genome, but only the part of it that the cell transcribes into mRNA and that directs protein synthesis. Since one gene can serve as a template for several different mRNAs (depending on many factors: cell type, stage of development of the organism, and so on), the total number of distinct mRNA sequences (and this is exactly what Human Genome Sciences patented) will be much larger than the number of genes. Most likely, it is simply incorrect to use this number to estimate the number of genes in the genome.
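Why the number of distinct mRNAs can far exceed the number of genes can be shown with a toy model. The Python sketch below assumes, purely hypothetically, that a gene's first and last exons are always kept and that every internal exon may independently be included or skipped; real splicing is much more constrained and also uses alternative splice sites, promoters and polyadenylation sites. The exon sequences are invented, and `exon_skipping_isoforms` is an illustrative helper, not an established tool.

```python
from itertools import combinations

# Toy model of alternative splicing by exon skipping (hypothetical):
# the first and last exons are always kept, each internal exon may be
# included or skipped, and exon order never changes.


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """Return every mRNA obtainable by skipping a subset of internal exons."""
    if len(exons) <= 2:
        return ["".join(exons)]
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        # combinations() preserves the original order of the exons
        for kept in combinations(internal, k):
            isoforms.append(first + "".join(kept) + last)
    return isoforms


# A hypothetical gene with five short exons (invented sequences).
gene = ["AUG", "GCU", "UUA", "CCG", "UAA"]
isoforms = exon_skipping_isoforms(gene)

if __name__ == "__main__":
    print(len(isoforms))   # 8 distinct mRNAs from a single gene
    print(isoforms[0])     # AUGUAA (all internal exons skipped)
    print(isoforms[-1])    # AUGGCUUUACCGUAA (all exons included)
```

With n exons this toy model allows 2^(n-2) mRNAs per gene, which is why a catalog of mRNA sequences, counted one entry per sequence, overestimates the number of genes.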

Obviously, the hastily "privatized" genetic information will be scrutinized in the coming years, until the exact number of genes finally becomes generally accepted. But it is alarming that in the process of "cognition" anything that can be patented gets patented. This is not merely selling the skin of a bear not yet killed: everything that was in the den has been divided up! Today the debate has calmed down, and the human genome officially has only 21,667 genes (NCBI version 35, dated October 2005). It should be noted that, for now, most of the information remains publicly available. There are databases (for example, EnsEMBL) that accumulate information on the structure of the genomes not only of humans but also of many other organisms. However, attempts to obtain exclusive rights to use particular genes or sequences for commercial purposes have always been made, are being made now, and will be made in the future.

To date, the main goals of the structural part of the program have largely been achieved: the human genome has been read almost completely. The first, "draft" version of the sequence, published in early 2001, was far from perfect. It lacked approximately 30% of the sequence of the genome as a whole, including about 10% of the sequence of so-called euchromatin, the gene-rich, actively expressed parts of the chromosomes. According to the latest estimates, euchromatin makes up approximately 93.5% of the entire genome. The remaining 6.5% is heterochromatin: these sections of the chromosomes are poor in genes and contain many repeats, which pose serious difficulties for scientists trying to read their sequence. Moreover, DNA in heterochromatin is believed to be inactive and not expressed. (This may explain the scientists' "inattention" to this remaining "small" percentage of the human genome.) But even the "draft" euchromatic sequences available in 2001 contained a large number of gaps, errors, and incorrectly joined and oriented fragments. Without detracting from the importance of this "draft" for science and its applications, it is worth noting that using this preliminary information in large-scale genome-wide analyses (for example, in studying the evolution of genes or the general organization of the genome) revealed many inaccuracies and artifacts. Therefore, further and no less painstaking work on the "last inches" was absolutely necessary.

Figure 2. Left: an automated line for preparing DNA samples for sequencing at the Whitehead Institute's Genomic Research Center. Right: a laboratory filled with machines for high-throughput DNA sequencing.

Finishing the sequence took several more years and almost doubled the cost of the entire project. However, as early as 2004 it was announced that 99% of the euchromatin had been read with an overall accuracy of one error per 100,000 base pairs. The number of gaps had decreased 400-fold. The accuracy and completeness of the sequence became sufficient for an effective search for genes responsible for particular hereditary diseases (for example, diabetes or breast cancer). In practice, this means that researchers no longer have to spend time confirming the sequences of the genes they work with, since they can fully rely on the sequence of the entire genome, which has been determined and is accessible to everyone.
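What "one error per 100,000 base pairs" means in absolute terms can be estimated with a few lines of Python. This is a back-of-the-envelope sketch, assuming a 3.2 billion bp genome, the 93.5% euchromatin share and the 99% coverage quoted in the text, and treating errors as independent and evenly spread (a simplification). The 30 kb gene is a hypothetical size chosen only for illustration.

```python
# Back-of-the-envelope estimate of residual errors in the 2004 sequence.
# Assumptions: errors are independent and spread evenly; figures are the
# round numbers quoted in the text (3.2 billion bp, 93.5% euchromatin,
# 99% of euchromatin finished, 1 error per 100,000 bp).

GENOME_BP = 3_200_000_000
EUCHROMATIN_FRACTION = 0.935
FINISHED_FRACTION = 0.99
ERRORS_PER_BP = 1 / 100_000


def finished_bp(genome_bp: float, euchromatin: float, finished: float) -> float:
    """Base pairs of euchromatin that were read."""
    return genome_bp * euchromatin * finished


def expected_errors(n_bp: float, error_rate: float) -> float:
    """Expected number of wrong bases in n_bp positions."""
    return n_bp * error_rate


read_bp = finished_bp(GENOME_BP, EUCHROMATIN_FRACTION, FINISHED_FRACTION)
total_errors = expected_errors(read_bp, ERRORS_PER_BP)
per_gene = expected_errors(30_000, ERRORS_PER_BP)  # hypothetical 30 kb gene

if __name__ == "__main__":
    print(f"finished sequence: {read_bp / 1e9:.2f} billion bp")  # 2.96
    print(f"expected errors: {total_errors:,.0f}")               # 29,621
    print(f"errors in a 30 kb gene: {per_gene:.1f}")             # 0.3
```

Roughly thirty thousand wrong bases sounds like a lot, but spread over almost three billion positions it means that a typical gene is, on average, more likely to contain no error at all, which is why researchers can rely on the public sequence.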

Thus, the original project plan was significantly overfulfilled. Has this helped us understand how our genome is structured and how it works? Undoubtedly. The authors of the Nature article in which the "final" (as of 2004) version of the genome was published carried out several analyses with it that would have been meaningless with only the "draft" sequence at hand. It turned out that more than a thousand genes were "born" quite recently (by evolutionary standards, of course) through duplication of an original gene followed by independent evolution of the daughter and parent genes. A little under forty genes recently "died", having accumulated mutations that made them completely inactive. Another article published in the same issue of Nature points directly to shortcomings of the method used by the Celera scientists. As a consequence of these shortcomings, numerous repeats were missed in the assembled DNA sequences, and the length and complexity of the entire genome were therefore underestimated. To avoid such mistakes in the future, the authors of the article proposed a hybrid strategy combining the highly efficient approach used by the Celera scientists (whole-genome shotgun sequencing) with the relatively slow and laborious, but more reliable, method used by the IHGSC researchers (clone-by-clone sequencing).

Where will this unprecedented research on the human genome go next? Something can already be said about this. The international ENCODE consortium (ENCyclopaedia Of DNA Elements), founded in September 2003, set itself the goal of discovering and studying the "control elements" (regulatory sequences) of the human genome. After all, the 3 billion base pairs of the human genome contain only about 22 thousand genes, scattered through this ocean of DNA in a way we do not yet understand. What controls their expression? Why do we need such an excess of DNA? Is it really ballast, or does it still manifest itself through some unknown functions?
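Simple division shows how sparse genes are in this "ocean of DNA". The Python sketch below uses the article's round numbers (3 billion bp, about 22 thousand genes) and assumes, unrealistically, that genes are spread evenly; in reality they cluster in some chromosome regions and are almost absent from others, so the result is only an order-of-magnitude illustration. The helper names are hypothetical.

```python
# Average spacing between genes, using the article's round numbers.
# Assumption: genes are treated as evenly spread, which they are not;
# the result is only an order-of-magnitude illustration.

GENOME_BP = 3_000_000_000
GENE_COUNT = 22_000


def average_spacing_kb(genome_bp: int, genes: int) -> float:
    """Average stretch of genome per gene, in kilobases."""
    return genome_bp / genes / 1000


def expected_genes(region_bp: int, genome_bp: int, genes: int) -> float:
    """Expected number of genes in a region if genes were spread evenly."""
    return region_bp * genes / genome_bp


if __name__ == "__main__":
    print(round(average_spacing_kb(GENOME_BP, GENE_COUNT)))    # 136
    # 1% of the genome (30 million bp), if genes were spread evenly:
    print(expected_genes(30_000_000, GENOME_BP, GENE_COUNT))   # 220.0
```

On average, then, there is one gene for every 136 thousand base pairs of DNA, and a stretch covering 1% of the genome would be expected to hold only a couple of hundred genes.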

As a start, in a pilot project, ENCODE scientists took a close look at a sequence making up 1% of the human genome (30 million base pairs), using the latest molecular biology research equipment. The results were published in June of this year in Nature. It turned out that most of the human genome (including regions previously considered "silent") serves as a template for various RNAs, many of which are not messenger RNAs, since they do not encode proteins. Many of these "non-coding" RNAs overlap with "classical" genes (DNA segments that code for proteins). Another unexpected result was how the regulatory regions of DNA are positioned relative to the genes whose expression they control. The sequences of many of these regions changed little during evolution, while other regions, considered important for cell control, mutated and changed at an unexpectedly high rate. All these findings have raised many new questions, which can only be answered by further research.

Another task to be solved in the near future is to determine the sequence of the remaining "small" percentage of the genome that makes up heterochromatin, i.e., the gene-poor, repeat-rich DNA sections needed for chromosome duplication during cell division. The repeats make these sequences impossible to decipher with existing approaches, so new methods will have to be invented. So do not be surprised if another article announcing the "end" of the decoding of the human genome appears in 2010: it will describe how heterochromatin was "cracked".
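Why repeats defeat read-based sequencing can be seen in a toy example. The Python sketch below assumes short, error-free reads and an invented reference in which the motif GGATCC occurs three times in tandem; `find_all` is a hypothetical helper built on Python's `str.find`. A read anchored in unique sequence maps to exactly one position, while a read lying entirely inside the repeat matches several, so its true origin, and the number of repeat copies, cannot be deduced from the read alone.

```python
# Toy illustration of why repeats are hard to sequence and assemble.
# Assumptions: error-free short reads and an invented reference in which
# the motif GGATCC occurs three times in tandem.

REPEAT = "GGATCC"
REFERENCE = "ACTTGCA" + REPEAT * 3 + "TTGACGT"


def find_all(reference: str, read: str) -> list[int]:
    """Return every start position (0-based) where `read` occurs,
    including overlapping occurrences."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


unique_read = "ACTTGCAGG"   # anchored in the unique left flank
repeat_read = "GATCCGGA"    # lies entirely inside the repeat

if __name__ == "__main__":
    print(find_all(REFERENCE, unique_read))   # [0]: placed unambiguously
    print(find_all(REFERENCE, repeat_read))   # [8, 14]: origin ambiguous
```

An assembler faced with such reads may even collapse several repeat copies into one, which shortens the assembled sequence; overcoming this requires reads longer than the repeats or other new approaches.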

Of course, for now we have at our disposal only a kind of "averaged" version of the human genome. Figuratively speaking, today we have only the most general description of a car's design: engine, chassis, wheels, steering wheel, seats, paint, upholstery, gasoline and oil, and so on. A closer look at the result shows that years of work lie ahead to refine our knowledge of each specific genome. The Human Genome Program has not ceased to exist; it is only changing its orientation, moving from structural genomics to functional genomics, which aims to establish how genes are controlled and how they work. Moreover, people differ at the level of genes in the same way that cars of the same model differ in the versions of their components. Not only can individual bases in the gene sequences of two people differ, but the number of copies of large DNA fragments, sometimes containing several genes, can also vary greatly. This means that detailed comparison of the genomes of, say, representatives of various human populations and ethnic groups, and even of healthy and sick people, is coming to the fore. Modern technologies allow such comparative analyses to be carried out quickly and accurately, something no one even dreamed of ten years ago. Another international scientific consortium is studying structural variation in the human genome. In the US and Europe, significant funds are allocated to bioinformatics, a young science at the intersection of computer science, mathematics and biology, without which it is impossible to make sense of the boundless ocean of information accumulated by modern biology.
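The two kinds of variation just described, single-base differences and differences in copy number, can be illustrated with a toy comparison in Python. Assumptions: the two "individuals" are invented short sequences already aligned base-for-base, and a short motif stands in for a large, multi-gene fragment. Real variant detection works on millions of aligned sequencing reads and must also handle insertions, deletions and sequencing errors; the helper names here are hypothetical.

```python
# Toy comparison of two individuals' sequences (invented data).
# Assumptions: the sequences are already aligned base-for-base, and a
# short motif stands in for a large, multi-gene DNA fragment.


def single_base_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (position, base_in_a, base_in_b) wherever aligned bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def copy_number(sequence: str, segment: str) -> int:
    """Count non-overlapping copies of `segment` in `sequence`."""
    return sequence.count(segment)


person_1 = "ATGCGTACGTTAGC"
person_2 = "ATGCATACGTTGGC"

SEGMENT = "CAGT"  # hypothetical stand-in for a duplicated fragment
region_1 = "TT" + SEGMENT * 2 + "AA"
region_2 = "TT" + SEGMENT * 5 + "AA"

if __name__ == "__main__":
    print(single_base_differences(person_1, person_2))
    # [(4, 'G', 'A'), (11, 'A', 'G')]
    print(copy_number(region_1, SEGMENT), copy_number(region_2, SEGMENT))
    # 2 5
```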
Bioinformatics methods will help us answer many interesting questions: "How did human evolution take place?", "Which genes determine particular features of the human body?", "Which genes are responsible for predisposition to disease?" As the English say, "This is the end of the beginning." This phrase accurately reflects the current situation. The most important and, I am quite sure, the most interesting part is just beginning: the accumulation of results, their comparison and further analysis.

"...Today we are releasing the first edition of the Book of Life, with our instructions," Francis Collins said on the Rossiya TV channel. "We will refer to it for decades, for hundreds of years. And soon people will wonder how they ever managed without this information."

Another point of view can be illustrated by quoting Academician V. A. Kordyum:

"...Hopes that new information about the functions of the genome will be completely open are purely symbolic. One can predict that gigantic centers will arise (on the basis of existing ones) capable of combining all the data into a single coherent whole, a kind of electronic version of a human being, and of realizing it in practice: in genes, proteins, cells, tissues, organs and anything else. But realized as what? Pleasing whom? For what purpose? During work on the "Human Genome" program, methods and equipment for determining the primary DNA sequence improved rapidly. In the largest centers, this has turned into a kind of factory activity. But even at the level of individual laboratory instruments (or rather, sets of them), equipment so advanced has already been created that it can determine, in three months, a volume of DNA sequence equal to the entire human genome. Not surprisingly, the idea of determining the genomes of individual people arose (and immediately began to be rapidly realized). Of course, it is very interesting to compare different individuals at the level of their fundamental basis. The benefit of such a comparison is also undeniable. It will be possible to determine who has which disorders in the genome, predict their consequences and eliminate whatever can lead to disease. Health will be guaranteed, and life will be greatly prolonged. That is one side. On the other side, nothing is at all obvious. To obtain and analyze the entire heredity of an individual means obtaining a complete, exhaustive biological dossier on him. Whoever holds this dossier will be able, if he so wishes, to do just as exhaustively anything he likes with that person.
This follows from an already familiar chain: a cell is a molecular machine; a person is made up of cells; the cell, in all its manifestations and in the entire range of its possible responses, is recorded in the genome; the genome can already be manipulated to a limited extent today, and in the foreseeable future it will be possible to manipulate it almost at will..."

However, it is probably too early to fear such gloomy forecasts (although one certainly needs to know about them). Their realization would require a complete rebuilding of many social and cultural traditions. Mikhail Gelfand, Doctor of Biological Sciences and acting deputy director of the Institute for Information Transmission Problems of the Russian Academy of Sciences, put it very well in an interview: "...if you have, say, one of the five genes associated with schizophrenia, imagine what can happen if this information, your genome, falls into the hands of a potential employer who understands nothing about genomics! (As a result, you may not be hired because you are considered a risk, even though you do not have and will never have schizophrenia. Author's note.) Another aspect: with the advent of individualized medicine based on genomics, insurance medicine will change completely. After all, it is one thing to provide for unknown risks and quite another to know them with absolute certainty. To be honest, society as a whole, Western and not only Russian, is not ready for the genomic revolution right now..."

Indeed, to use new information wisely, one must understand it. And to understand the genome, simply reading it is far from enough: that will take decades. The picture that emerges is too complex, and to grasp it we will have to overcome many stereotypes. In fact, therefore, the decoding of the genome is still going on and will continue. And whether we stand aside or finally become active participants in this race depends on us.

Literature

  1. Kiselev L. (2001). New biology started in February 2001. "Science and Life".
  2. Kiselev L. (2002). The second life of the genome: from structure to function. "Knowledge is Power", 7.
  3. Birney E., The ENCODE Project Consortium, Stamatoyannopoulos J. A., Dutta A., Guigó R., et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447, 799-816.
  4. Stein L. D. (2004). Human genome: end of the beginning. Nature, 431, 915-916.
  5. Gelfand M. (2007). The postgenomic era. "Commercial Biotechnology".

The international "Human Genome" project was launched in 1988. It is one of the most labor-intensive and expensive projects in the history of science. While in 1990 about $60 million in total was spent on it, in 1998 the US government alone spent $253 million, and private companies even more. Several thousand scientists from more than 20 countries take part in the project. Russia has also participated since 1989, with about 100 groups working on the project. All human chromosomes have been divided among the participating countries, and Russia was assigned chromosomes 3, 13 and 19.

The main goal of the project is to determine the sequence of nucleotide bases in all human DNA molecules and to establish the location of all human genes, i.e., to map them completely. Subprojects include the study of the genomes of dogs, cats, mice, butterflies, worms and microorganisms. It is expected that researchers will then determine the functions of all the genes and develop ways to use the data obtained.

What is the main subject of the project - the human genome?

It is known that the nucleus of each human somatic cell (DNA is also found outside the nucleus, in the mitochondria) contains 23 pairs of chromosomes, and each chromosome is a single DNA molecule. The total length of all 46 DNA molecules in one cell is approximately 2 m, and together they contain about 6.4 billion base pairs (each of the two sets of 23 chromosomes, maternal and paternal, holds about 3.2 billion). The total length of DNA in all the cells of the human body (there are approximately 5×10^13 of them) is about 10^11 km, several hundred (roughly 700) times the distance from the Earth to the Sun.
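The figures in this paragraph can be checked with a few lines of Python. The sketch assumes the standard rise of about 0.34 nm per base pair in B-form DNA, about 6.4 billion bp per diploid cell, the article's estimate of 5×10^13 cells, and an Earth-Sun distance of about 1.496×10^8 km; it is a rough estimate, not a measurement.

```python
# Checking the DNA-length figures with simple arithmetic.
# Assumptions: ~0.34 nm of length per base pair (B-form DNA),
# ~6.4 billion bp per diploid cell, 5e13 cells (the article's estimate),
# and an Earth-Sun distance of ~1.496e8 km.

RISE_NM_PER_BP = 0.34
DIPLOID_BP = 6.4e9
CELLS = 5e13
EARTH_SUN_KM = 1.496e8


def dna_length_m(bp: float) -> float:
    """Length in meters of a DNA double helix with `bp` base pairs."""
    return bp * RISE_NM_PER_BP * 1e-9


per_cell_m = dna_length_m(DIPLOID_BP)
body_total_km = per_cell_m * CELLS / 1000
earth_sun_ratio = body_total_km / EARTH_SUN_KM

if __name__ == "__main__":
    print(f"DNA per cell: {per_cell_m:.2f} m")            # 2.18 m
    print(f"DNA in the body: {body_total_km:.1e} km")     # 1.1e+11 km
    print(f"Earth-Sun distances: {earth_sun_ratio:.0f}")  # 727
```

The result, about 2 m of DNA per cell and roughly 10^11 km for the whole body, agrees with the figures above; the whole-body length comes out at several hundred Earth-Sun distances.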

How do such long molecules fit into the nucleus? It turns out that the nucleus has a mechanism of "forced" DNA folding: DNA is packed into chromatin through several successive levels of compaction.

The first level is the association of DNA with histone proteins, forming nucleosomes. Two molecules each of four histones (H2A, H2B, H3 and H4) form an octamer, a spool around which the DNA strand is wound. About 146 base pairs are wrapped around each octamer, and between neighboring nucleosomes there remains a DNA fragment of up to 60 base pairs, called a linker, so that one nucleosome accounts for about 200 base pairs. This level of folding reduces the linear dimensions of DNA by a factor of 6-7.
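The 6-7-fold figure can be reproduced roughly. The Python sketch below assumes about 0.34 nm of DNA length per base pair, about 200 bp per nucleosome repeat (core plus linker), a nucleosome diameter of about 11 nm, and neighboring nucleosomes packed close together so that each repeat spans roughly one nucleosome diameter along the fiber. It also estimates how many nucleosomes a diploid cell contains. These are idealized round numbers, not measurements.

```python
# Rough estimate of nucleosome-level compaction.
# Assumptions: ~0.34 nm per base pair, ~200 bp per nucleosome repeat
# (core plus linker), a nucleosome diameter of ~11 nm, and neighboring
# nucleosomes packed close together so each repeat spans about one diameter.

RISE_NM_PER_BP = 0.34
REPEAT_BP = 200
NUCLEOSOME_DIAMETER_NM = 11.0
DIPLOID_BP = 6.4e9


def compaction_factor(repeat_bp: int, rise_nm: float, diameter_nm: float) -> float:
    """How many times shorter one repeat becomes when folded into a bead."""
    return repeat_bp * rise_nm / diameter_nm


factor = compaction_factor(REPEAT_BP, RISE_NM_PER_BP, NUCLEOSOME_DIAMETER_NM)
nucleosomes_per_cell = DIPLOID_BP / REPEAT_BP

if __name__ == "__main__":
    print(f"compaction: about {factor:.1f}-fold")              # 6.2
    print(f"nucleosomes per cell: {nucleosomes_per_cell:.1e}")  # 3.2e+07
```

About 68 nm of DNA folded into a bead roughly 11 nm across gives a six-fold shortening, in line with the 6-7-fold figure, and a single cell holds some 30 million nucleosomes.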

At the next level, nucleosomes are packed into a fibril (solenoid). Each turn contains 6-7 nucleosomes, and the linear dimensions of DNA are reduced to about 1 mm, i.e., 25-30 times.

The third level of compaction is the looping of fibrils, i.e., the formation of loop domains that extend at an angle from the main axis of the chromosome. They can be seen under a light microscope in lampbrush chromosomes. The transverse banding characteristic of mitotic chromosomes reflects, to some extent, the order in which genes are arranged along the DNA molecule.

In prokaryotes, the length of a gene corresponds to the size of the protein it encodes, whereas in eukaryotes the DNA is much longer than the total length of the meaningful genes. This is explained, first, by the mosaic, or exon-intron, structure of genes: segments that are retained in the mature mRNA (exons) alternate with non-coding sections (introns). The whole gene is first transcribed into an RNA molecule, from which the introns are then cut out and the exons joined (splicing); in this form, the information in the mRNA molecule is read on the ribosome. The second reason for the colossal size of DNA is the large number of repeated sequences. Some are repeated tens or hundreds of times, and some occur up to 1 million times per genome. For example, the gene encoding rRNA is repeated about 2 thousand times.
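The processing step described above, cutting introns out of the primary transcript and joining the exons, can be sketched in Python. Assumptions: exon boundaries are known in advance and given as 0-based, half-open coordinates, and the short sequence is invented. In the cell this work is done by the spliceosome, which recognizes signals at the intron ends (introns typically begin with GU and end with AG, as in the invented introns here); `splice` and `introns` are hypothetical helpers, not a model of that machinery.

```python
# Minimal sketch of splicing: cut the introns out of a primary transcript
# and join the exons. Assumptions: exon coordinates are known in advance
# (0-based, half-open), and the short sequence is invented.

PRE_MRNA = (
    "AUGGCC"      # exon 1
    "GUAAGUCAG"   # intron 1
    "UUCGAA"      # exon 2
    "GUCCUAG"     # intron 2
    "UGGUAA"      # exon 3
)
EXONS = [(0, 6), (15, 21), (28, 34)]


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon segments in order; the gaps between them are discarded."""
    prev_end = 0
    for start, end in exons:
        if not (prev_end <= start < end <= len(pre_mrna)):
            raise ValueError("exons must be ordered, non-overlapping and in range")
        prev_end = end
    return "".join(pre_mrna[start:end] for start, end in exons)


def introns(pre_mrna: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return the segments lying between consecutive exons."""
    return [pre_mrna[e1:s2] for (_, e1), (s2, _) in zip(exons, exons[1:])]


if __name__ == "__main__":
    mrna = splice(PRE_MRNA, EXONS)
    print(mrna)                        # AUGGCCUUCGAAUGGUAA
    print(introns(PRE_MRNA, EXONS))    # ['GUAAGUCAG', 'GUCCUAG']
    print([mrna[i:i + 3] for i in range(0, len(mrna), 3)])
    # ['AUG', 'GCC', 'UUC', 'GAA', 'UGG', 'UAA']: start codon ... stop codon
```

Only after the introns are removed does the sequence read as a continuous run of codons from the start codon AUG to the stop codon UAA, which is the form the ribosome reads.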

Back in 1996 it was believed that humans have about 100 thousand genes; now bioinformatics specialists suggest that there are no more than 60 thousand genes in the human genome, that they account for only 3% of the total length of the cell's DNA, and that the functional role of the remaining 97% has not yet been established.

What have scientists achieved in more than ten years of work on the project?

The first major success was the complete decoding, in 1995, of the genome of the bacterium Haemophilus influenzae. Later, the genomes of more than 20 bacteria were fully described, including the causative agents of tuberculosis, typhus, syphilis, etc. In 1996 the DNA of the first eukaryotic cell, yeast, was mapped, and in 1998 the genome of a multicellular organism, the roundworm Caenorhabditis elegans, was mapped for the first time. By 1998, the nucleotide sequences of 30,261 human genes had been established, and about half of human genetic information had been deciphered.

Below are the known data on the number of genes involved in the development and functioning of some human organs and tissues.

Organ, tissue or cell: number of genes

1. Salivary gland: 17
2. Thyroid gland: 584
3. Smooth muscles: 127
4. Mammary gland: 696
5. Pancreas: 1094
6. Spleen: 1094
7. Gallbladder: 788
8. Small intestine: 297
9. Placenta: 1290
10. Skeletal muscle: 735
11. White blood cell: 2164
12. Testis: 370
13. Skin: 620
14. Brain: 3195
15. Eye: 547
16. Lungs: 1887
17. Heart: 1195
18. Erythrocyte: 8
19. Liver: 2091
20. Uterus: 1859

In recent years, international databases have been created for the nucleotide sequences of DNA from various organisms and for the amino acid sequences of proteins. In 1996, the International Society for Sequencing decided that any newly determined nucleotide sequence of 1-2 thousand bases or more must be made public on the Internet within a day of its decoding; otherwise, articles containing these data would not be accepted by scientific journals. Any specialist in the world can use this information.

During the implementation of the Human Genome Project, many new research methods have been developed, most of which have recently been automated, which significantly speeds up and reduces the cost of DNA decoding. The same methods of analysis can be used for other purposes: in medicine, pharmacology, forensics, etc.

Let us dwell on some specific achievements of the project, first of all, of course, related to medicine and pharmacology.

Every hundredth child in the world is born with some kind of hereditary defect. To date, about 10 thousand different human diseases are known, of which more than 3 thousand are hereditary. Mutations have already been identified that are responsible for diseases such as hypertension, diabetes, certain types of blindness and deafness, and malignant tumors. Genes responsible for one of the forms of epilepsy, gigantism, etc. have been discovered. Below are some diseases resulting from damage to genes, the structure of which was completely deciphered by 1997.

Diseases resulting from gene damage

1. Chronic granulomatous disease
2. Cystic fibrosis
3. Wilson's disease
4. Early-onset breast/ovarian cancer
5. Emery-Dreifuss muscular dystrophy
6. Spinal muscular atrophy
7. Ocular albinism
8. Alzheimer's disease
9. Hereditary paralysis
10. Dystonia

It is likely that in the coming years it will become possible to diagnose serious diseases very early and hence to fight them more successfully. Methods are currently being actively developed for targeted drug delivery to affected cells, for replacing diseased genes with healthy ones, and for switching alternative metabolic pathways on and off by activating or silencing the corresponding genes. Successful applications of gene therapy are already known. For example, a significant improvement was achieved in the condition of a child with severe congenital immunodeficiency by introducing normal copies of the damaged gene.

In addition to disease-causing genes, other genes directly related to human health have been found. It turned out that some genes determine a predisposition to occupational diseases in hazardous industries. For example, in the asbestos industry some people fall ill and die of asbestosis, while others are resistant to it. In the future, a special genetic service could be created to give recommendations on suitable occupations, taking into account a person's predisposition to occupational diseases.

It turned out that a predisposition to alcoholism or drug addiction can also have a genetic basis. Seven genes have already been discovered whose damage is associated with the development of dependence on chemicals. A mutant gene has been isolated from the tissues of alcoholics that leads to defects in the cellular receptors for dopamine, a substance that plays a key role in the work of the brain's pleasure centers. A lack of dopamine or defects in its receptors are directly related to the development of alcoholism. On chromosome 4, a gene was found whose mutations lead to early-onset alcoholism and manifest themselves already in early childhood as hyperactivity and attention deficit.

Interestingly, gene mutations do not always have negative consequences; sometimes they can be beneficial. It is known, for example, that in Uganda and Tanzania HIV infection rates among prostitutes reach 60-80%, yet some of them not only survive but also give birth to healthy children. Apparently, there is a mutation (or mutations) that protects a person from AIDS: people who carry it can be infected with the immunodeficiency virus but do not develop AIDS. A map has now been created that roughly reflects the distribution of this mutation in Europe. It is especially common (in 15% of the population) among the Finno-Ugric peoples. Identifying such a mutant gene could lead to a reliable way of fighting one of the most terrible diseases of our century.

It also turned out that different alleles of the same gene can cause people to react differently to drugs. Pharmaceutical companies plan to use these data to produce specific drugs for different groups of patients. This will help eliminate adverse drug reactions, clarify the mechanisms of drug action, and save millions in costs. A whole new field, pharmacogenetics, studies how particular features of DNA structure can weaken or enhance the effect of drugs.

Deciphering the genomes of bacteria makes it possible to create new effective and harmless vaccines and high-quality diagnostic preparations.

Of course, the achievements of the Human Genome Project can be applied not only in medicine or pharmaceuticals.

DNA sequences can be used to determine the degree of relatedness of people, and mitochondrial DNA can be used to accurately determine maternal kinship. A method of "genetic fingerprinting" has been developed, which makes it possible to identify a person by trace amounts of blood, skin flakes, etc. This method has been successfully used in forensics - thousands of people have already been acquitted or convicted on the basis of genetic analysis. Similar approaches can be used in anthropology, paleontology, ethnography, archeology, and even in such a seemingly distant field from biology as comparative linguistics.

As a result of the research, it became possible to compare the genomes of bacteria and various eukaryotic organisms. It turned out that in the process of evolutionary development in organisms, the number of introns increases, i.e. evolution is associated with the “dilution” of the genome: per unit length of DNA there is less and less information about the structure of proteins and RNA (exons) and more and more areas that do not have a clear functional significance (introns). This is one of the great mysteries of evolution.

Previously, evolutionary scientists distinguished two branches in the evolution of cellular organisms: prokaryotes and eukaryotes. Genome comparison showed that archaebacteria, unique unicellular organisms combining features of prokaryotes and eukaryotes, had to be placed in a separate branch.

Currently, the problem of the dependence of a person's abilities and talents on his genes is also being intensively studied. The main task of future research is the study of single nucleotide DNA variations in cells of different organs and the identification of differences between people at the genetic level. This will make it possible to create genetic portraits of people and, as a result, more effectively treat diseases, assess the abilities and capabilities of each person, identify differences between populations, assess the degree of adaptation of a particular person to a particular environmental situation, etc.

Finally, it is necessary to mention the danger of disseminating genetic information about specific people. In this regard, laws have already been passed in some countries prohibiting the dissemination of such information, and lawyers around the world are working on this problem. In addition, the Human Genome Project is sometimes associated with the revival of eugenics at a new level, which also causes concern among specialists.

Human genome analysis completed.

On April 6, 2000, the US Congressional Science Committee met in Washington, D.C., where Dr. J. Craig Venter announced that his company, Celera Genomics, had completed sequencing all the necessary fragments of the human genome. He expects the preliminary sequencing of all the genes (there are about 80,000 of them, in a genome of about 3 billion DNA "letters") to be completed in 3-6 weeks, i.e., much earlier than planned. Most likely, the final decoding of the human genome will be completed by 2003.

Celera joined the Human Genome Project 22 months ago. Its approaches were initially criticized by the so-called open consortium of project participants, but its fruit fly genome sequencing subproject, which it completed last month, has shown their effectiveness.

This time, no one criticized the forecasts made by C. Venter in the presence of the US presidential science adviser, Dr. N. Lane, and of a representative of the consortium, a leading specialist in genome sequencing, Dr. Robert Waterston.

The preliminary map of the genome will contain about 90% of all genes, but, nevertheless, it will be of great help to the work of scientists and doctors, since it will allow them to quite accurately find the necessary genes. Dr. Venter said he now intends to use his 300 sequencers to analyze the mouse genome, which will help him understand how human genes work.

The decoded genome belongs to a man, so it contains both X and Y chromosomes. The person's name is not known, and it does not matter, because extensive data on individual DNA variability have been and continue to be collected by both Celera and the consortium. Incidentally, the consortium uses genetic material obtained from various people in its research. Dr. Venter described the consortium's results as 500,000 deciphered but unordered fragments, from which it would be very difficult to assemble whole genes.

Dr. Venter said that once the structure of the genes is determined, he will host a conference to involve outside experts in positioning the genes in DNA molecules and determining their functions. After that, other researchers will have free access to the human genome data.

Negotiations were underway between Venter and the consortium on joint publication of the results, and one of the main points of the agreement was that genes could be patented only after their functions and positions in the DNA had been accurately determined.

However, negotiations were interrupted due to disagreements over what counts as the completion of genome sequencing. The problem is that in eukaryotic DNA, unlike prokaryotic DNA, there are fragments that cannot be deciphered by modern methods. These fragments can range in size from 50 to 150 kb, but fortunately these fragments contain very few genes. At the same time, there are fragments in DNA regions rich in genes that also cannot be deciphered yet.

Determination of the position and functions of genes is supposed to be carried out with the help of special computer programs. These programs will analyze the structure of genes and, comparing it with data on the genomes of other organisms, offer options for their possible functions. According to Celera, the work can be considered completed if the genes are almost completely determined and it is known exactly how the decoded fragments are located on the DNA molecule, i.e. in what order. This definition is satisfied by the results of Celera, while the results of the consortium do not allow one to unambiguously determine the position of the deciphered sections relative to each other.

After compiling a complete map of the human genome, Celera intends to make the data available to other researchers by subscription, with a very low fee for universities: 5-15 thousand dollars a year. This would seriously compete with the public GenBank database.

The Science Committee meeting was highly critical of companies like Incyte Pharmaceuticals and Human Genome Sciences for copying the consortium's Internet data every night and then filing patents for all the genes they found in those sequences.

When asked whether data on the human genome could be used to create a new type of biological weapon, for example, dangerous only for certain populations, Dr. Venter replied that data on the genomes of pathogenic bacteria and viruses pose a much greater danger. When asked by one of the congressmen whether purposeful change in the human race would now become a reality, Dr. Venter replied that it could take about a hundred years to completely determine the functions of all genes, and until then there is no need to talk about directed changes in the genome.

Recall that in December 1999, researchers from Great Britain and Japan announced that they had determined the structure of chromosome 22, the first human chromosome to be decoded. It contains 33 million base pairs, and 11 sections (about 3% of its length) remained undeciphered. The functions of about half of the genes on this chromosome have been determined. It has been established, for example, that 27 different diseases are associated with defects in this chromosome, including schizophrenia, myeloid leukemia and trisomy 22, the second most important cause of miscarriage.

At the time, British scientists were highly critical of Celera's sequencing methods, believing that they would take too long to decipher the sequences and determine the relative position of their fragments. Then, based on the known amount of decoded material, predictions were made that the next to be mapped would be the 7th, 20th, and 21st chromosomes.

A week after the announcement that the nucleotide sequences of the human genome had been decoded, the American Association for the Advancement of Science held a meeting at which US Secretary of Energy Bill Richardson announced that scientists from the Joint Genome Institute had determined the structures of human chromosomes 5, 16, and 19.

These chromosomes contain approximately 300 million base pairs, or 10-15 thousand genes, about 11% of the human genetic material. So far, 90% of the DNA of these chromosomes has been mapped; the remaining regions cannot yet be deciphered, but they contain few genes.

Genetic defects have been found on chromosome maps that can lead to certain kidney diseases, prostate and rectal cancer, leukemia, hypertension, diabetes, and atherosclerosis. According to Richardson, closer to the summer, information on the structure of chromosomes will be available to all researchers free of charge.


The Human Genome: An Encyclopedia Written in Four Letters Vyacheslav Zalmanovich Tarantul

PART I. STRUCTURE OF THE HUMAN GENOME

WHAT IS A GENOME?

Questions are eternal, answers are conditioned by time.

E. Chargaff

In a dialogue with life, it is not its question that matters, but our answer.

M. I. Tsvetaeva

From the very beginning, let us define what we mean here by the word genome. The term itself was first proposed in 1920 by the German geneticist H. Winkler. By then another scientific term already existed, genotype, introduced into the geneticists' arsenal by W. Johannsen back in 1909 to denote the totality of all the hereditary factors of a given cell or organism. Johannsen himself later remarked with surprise that his “word” had unexpectedly materialized in T. Morgan’s chromosome theory, which arose afterwards. But now a new term appeared: the genome. Unlike genotype, this term was meant to characterize a whole species of organisms rather than a specific individual. This marked a new stage in the development of genetics.

In biological dictionaries, the genome is defined as the set of genes contained in the haploid (single) set of chromosomes of a given species. To a non-specialist this formulation is not entirely clear, and, more importantly, it is inaccurate in the modern sense. The basis of the genome is the deoxyribonucleic acid molecule, well known by its abbreviation, DNA. All genomes (DNA) contain at least two types of information: coded information about the structure of intermediary molecules (RNA) and proteins (this information is contained in genes), and instructions that determine the time and place at which this information is expressed during development and the subsequent life of the organism (this information lies mainly in intergenic regions, although partly in the genes themselves). The genes themselves occupy a very small part of the genome, but at the same time they form its basis. The information recorded in the genes is a kind of "instruction" for the manufacture of proteins, the main building blocks of our body. "On the shoulders" of genes lies a huge responsibility for how each cell and the organism as a whole will look and work. They govern our lives from the moment of conception to the very last breath; without them not a single organ functions, blood does not flow, the heart does not beat, and the liver and brain do not work.

However, the information on protein structure embedded in the genome is not enough to characterize it completely. We also need data on the elements of the genetic apparatus that take part in the work (expression) of genes and regulate their manifestation at different stages of development and in different life situations.

But even this is not enough for a complete definition of the genome. The genome also contains elements that contribute to its self-reproduction (replication) and to the compact packaging of DNA in the nucleus, as well as some other regions whose role is still unclear, sometimes called “selfish” (that is, as if serving only themselves). For all these reasons, today the genome is usually taken to mean the entire set of DNA sequences present in the chromosomes of the cell nuclei of a given species, including, of course, the genes. In this book we will use just this definition. At the same time, remember that some other cell structures (organelles) also contain genetic information necessary for the functioning of organisms. In particular, all animals, including humans, also have a mitochondrial genome, that is, DNA molecules located in intracellular structures called mitochondria and containing a number of so-called mitochondrial genes. The human mitochondrial genome is very small compared to the nuclear genome located on the chromosomes, but its contribution to cellular metabolism is nevertheless very significant.

Clearly, knowledge of the DNA structure alone is by no means sufficient for a complete description of the cell's hereditary system. The literature offers the following analogy: information about the number and shape of bricks cannot reveal the design of a Gothic cathedral or the course of its construction. In a broader sense, the hereditary system of a cell includes not only the structure of DNA but also its other components; together with environmental factors, they determine how the genome will work, how individual development will proceed, and how the resulting organism will live.

"chromosome" - words that are familiar to every schoolchild. But the idea of ​​​​this issue is rather generalized, since deepening into the biochemical jungle requires special knowledge and a desire to understand all this. And it, if it is present at the level of curiosity, then quickly disappears under the weight of the presentation of the material. Let's try to understand the intricacies in a scientific polar form.

A gene is the smallest structural and functional unit of hereditary information in living organisms. In essence, it is a short section of DNA that carries information about the amino acid sequence of a specific protein or about the structure of a functional RNA. The gene determines the traits that will be inherited and passed on to descendants further along the genealogical chain. In some unicellular organisms, genes can also be transferred in ways unrelated to reproduction; this is called horizontal gene transfer.

"On the shoulders" of genes lies a huge responsibility for how each cell and organism as a whole will look and work. They govern our lives from conception to our very last breath.

The first scientific advance in the study of heredity was made by the Austrian monk Gregor Mendel, who in 1866 published his observations on crossing peas. His experiments clearly showed the patterns by which traits such as the color and shape of the seeds and the color of the flowers are transmitted. Mendel formulated the laws that marked the beginning of genetics as a science. Genes are inherited because each parent passes half of their chromosomes to the child. Thus the traits of mother and father come together in a new combination of already existing traits. Fortunately, the number of possible combinations exceeds the number of living creatures on the planet, so it is practically impossible to find two absolutely identical creatures.
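The claim about the number of combinations can be checked with simple arithmetic. A minimal sketch, assuming that each of the 23 human chromosome pairs sends one of its two members to a gamete independently of the others, and ignoring crossing over (which only increases the variety further):

```python
CHROMOSOME_PAIRS = 23  # in a human diploid cell


def gamete_combinations(pairs: int) -> int:
    """Each pair contributes one of its two homologues independently: 2**pairs."""
    return 2 ** pairs


per_parent = gamete_combinations(CHROMOSOME_PAIRS)
per_child = per_parent * per_parent  # one gamete from each parent

print(f"{per_parent:,}")  # 8,388,608
print(f"{per_child:,}")   # 70,368,744,177,664
```

About seventy trillion chromosome combinations are possible for the children of a single couple, before crossing over is even taken into account.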

Mendel showed that hereditary factors do not blend but are transmitted from parents to offspring as discrete (separate) units. These units, present in individuals in pairs (as alleles), remain discrete and are passed on to subsequent generations in male and female gametes, each of which contains one unit from each pair. In 1909, the Danish botanist Johannsen named these units genes. In 1912, the American geneticist Morgan showed that they are located in the chromosomes.
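Mendel's scheme of paired units that separate into gametes can be sketched as a monohybrid cross. The allele symbols A (dominant) and a (recessive) are hypothetical, and complete dominance is assumed.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Combine every gamete of one parent with every gamete of the other.

    Each parent is a two-letter genotype; each gamete carries one allele.
    """
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))


genotypes = cross("Aa", "Aa")
# Complete dominance assumed: any genotype containing "A" shows the dominant trait.
phenotypes = Counter(
    "dominant" if "A" in genotype else "recessive" for genotype in genotypes.elements()
)

print(sorted(genotypes.items()))                        # [('AA', 1), ('Aa', 2), ('aa', 1)]
print(phenotypes["dominant"], phenotypes["recessive"])  # 3 1
```

The 1:2:1 genotype ratio and 3:1 phenotype ratio are the proportions Mendel observed in the second generation of his crosses.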

Since then, more than a century and a half have passed, and research has advanced further than Mendel could have imagined. At the moment, scientists have settled on the opinion that the information contained in the genes determines the growth, development and functions of living organisms. Or maybe even their death.

Classification

A gene contains not only information about a protein but also instructions on when and how to read it, as well as non-coding sections that separate the information about different proteins and stop the synthesis of the messenger molecule.

There are two types of genes:

  1. Structural genes contain information about the structure of proteins or RNA chains. The sequence of nucleotides corresponds to the order of amino acids.
  2. Functional genes are responsible for the correct structure of all other sections of DNA and for the synchrony and order of its reading.
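For structural genes, the correspondence between nucleotides and amino acids is the genetic code: each triplet (codon) specifies one amino acid or a stop signal. A minimal sketch using the standard code; the input is a hypothetical coding sequence, and the sketch assumes reading starts at the first base and ends at the first stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: one-letter amino acids for all 64 codons, listed with the
# first base varying slowest in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCGAATTCTGGTAA"))  # MAEFW (Met-Ala-Glu-Phe-Trp, then stop)
```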

Today, scientists can also say how much hereditary information a cell carries: the haploid human set of 23 chromosomes contains about three billion nucleotide pairs. The gene is called the smallest structural unit of heredity, but a change in a single gene can change a person's life.

Mutations

A random or deliberate change in the sequence of nucleotides that make up a DNA chain is called a mutation. It may have virtually no effect on the structure of the protein, or it may completely distort its properties, so the consequences of such a change can be local or global.
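The range of consequences described above can be shown with single-base substitutions inside one codon. A sketch using the standard genetic code packed into a 64-letter string; the codons and substitutions are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Replace one base (position 0-2) of a codon and classify the effect."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent"     # same amino acid: protein unchanged
    if before == "*":
        return "stop-loss"  # stop codon lost: protein runs on
    if after == "*":
        return "nonsense"   # premature stop codon: protein cut short
    return "missense"       # a different amino acid is built in


for codon, position, base in [("GAA", 2, "G"), ("GAA", 1, "T"), ("TGG", 2, "A")]:
    print(codon, position, base, classify_substitution(codon, position, base))
```

GAA to GAG keeps glutamic acid (silent), GAA to GTA swaps it for valine (missense), and TGG to TGA turns tryptophan into a stop signal (nonsense).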

Mutations can be pathogenic, that is, manifest themselves as diseases, or lethal, preventing the organism from developing to a viable state. But most changes go unnoticed. Deletions and duplications constantly occur in DNA without affecting the course of an individual's life.

A deletion is the loss of a section of a chromosome together with the information it contains. Sometimes such changes are beneficial to the body, helping it defend itself against external threats such as the human immunodeficiency virus and plague bacteria.

A duplication is the doubling of a section of a chromosome, which means that the set of genes it contains is also doubled. Because the information is duplicated, the extra copy is under weaker selection, so it can quickly accumulate mutations and change the organism.
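Treating a chromosome segment as a string of gene labels gives a simple picture of both rearrangements. The letters A-G are hypothetical genes, and the sketch models only a tandem duplication (the copy placed right after the original).

```python
def delete(segment: str, start: int, end: int) -> str:
    """Deletion: drop the genes in positions [start, end)."""
    return segment[:start] + segment[end:]


def duplicate(segment: str, start: int, end: int) -> str:
    """Tandem duplication: insert a copy of [start, end) right after the original."""
    return segment[:end] + segment[start:end] + segment[end:]


chromosome = "ABCDEFG"  # hypothetical genes along a chromosome segment
print(delete(chromosome, 2, 4))     # ABEFG
print(duplicate(chromosome, 2, 4))  # ABCDCDEFG
```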

Gene Properties

Each person has a huge genome, and genes are the functional units of its structure. Even these small regions have their own unique properties that help maintain the stability of organic life:

  1. Discreteness - the ability of genes not to mix.
  2. Stability - preservation of structure and properties.
  3. Lability - the ability to change under the influence of circumstances, to adapt to hostile conditions.
  4. Multiple allelism - the existence in a population of more than two variants (alleles) of the same gene, which code for the same protein but differ in structure.
  5. Allelism - the presence of a gene in two forms (alleles) in a diploid organism.
  6. Specificity - one trait = one inherited gene.
  7. Pleiotropy - multiple effects of one gene.
  8. Expressivity is the degree of expression of a trait that is encoded by a given gene.
  9. Penetrance - the proportion of carriers of a given genotype in whom the corresponding trait actually appears.
  10. Amplification is the appearance of a significant number of copies of a gene in DNA.
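Penetrance is usually expressed as a fraction: the number of carriers of a genotype who show the trait divided by the total number of carriers. A minimal sketch; the survey numbers are hypothetical.

```python
def penetrance(carriers_with_trait: int, total_carriers: int) -> float:
    """Fraction of carriers of a genotype in whom the trait actually appears."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= carriers_with_trait <= total_carriers:
        raise ValueError("carriers_with_trait must be between 0 and total_carriers")
    return carriers_with_trait / total_carriers


# Hypothetical survey: 80 people carry a dominant allele, 60 of them show the trait.
print(f"Penetrance: {penetrance(60, 80):.0%}")  # Penetrance: 75%
```

Penetrance below 100% (incomplete penetrance) means that some carriers never show the trait; expressivity, by contrast, describes how strongly the trait shows in those who do.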

Genome

The human genome is all of the hereditary material found in a single human cell. It contains instructions for building the body, the work of organs, and physiological changes. A second definition of the term reflects structure rather than function: the human genome is the set of genetic material packaged in the haploid set of chromosomes (23 chromosomes) characteristic of the species.

The basis of the genome is the molecule well known as DNA. All genomes contain at least two types of information: coded information about the structure of the messenger molecules (so-called RNA) and protein (this information is contained in genes), as well as instructions that determine the time and place of manifestation of this information during the development of the organism. The genes themselves occupy a small part of the genome, but at the same time they are its basis. The information recorded in the genes is a kind of instruction for the manufacture of proteins, the main building blocks of our body.

However, for a complete characterization of the genome, the information on the structure of proteins embedded in it is not enough. We also need data on the elements that take part in the work of genes, regulate their manifestation at different stages of development and in different life situations.

But even this is not enough for a complete definition of the genome. After all, it also contains elements that contribute to its self-reproduction (replication), compact packaging of DNA in the nucleus, and some other still incomprehensible sections, sometimes called "selfish" (that is, supposedly serving only for themselves). For all these reasons, at the moment, when it comes to the genome, they usually mean the entire set of DNA sequences present in the chromosomes of the nuclei of cells of a certain type of organism, including, of course, genes.

Size and structure of the genome

It is logical to assume that genes, genomes and chromosomes differ among the various forms of life on Earth. Genomes can be very small or huge, containing billions of nucleotide pairs. The structure of a gene also depends on whose genome is being examined.

According to the ratio between the size of the genome and the number of genes included in it, two classes can be distinguished:

  1. Compact genomes of no more than ten million bases, in which the number of genes closely correlates with genome size. They are most characteristic of viruses and prokaryotes.
  2. Large genomes are over 100 million base pairs long, with no relationship between their length and the number of genes. More common in eukaryotes. Most of the nucleotide sequences in this class do not code for proteins or RNA.
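The difference between the two classes shows up clearly as gene density. A rough sketch with approximate figures: about 4.6 million base pairs and 4,300 genes for the bacterium Escherichia coli, and about 3 billion base pairs and 28,000 genes for humans, as cited in this article.

```python
# Approximate genome sizes (base pairs) and gene counts; see the lead-in.
GENOMES = {
    "Escherichia coli": (4.6e6, 4_300),
    "Homo sapiens": (3.0e9, 28_000),
}


def genes_per_megabase(length_bp: float, gene_count: int) -> float:
    """Average number of genes per million base pairs."""
    return gene_count / (length_bp / 1e6)


for species, (length_bp, gene_count) in GENOMES.items():
    print(f"{species}: {genes_per_megabase(length_bp, gene_count):.1f} genes per Mb")
```

With these figures the bacterial genome packs roughly a hundred times more genes into each megabase than the human one.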

Studies have shown that there are about 28,000 genes in the human genome. They are unevenly distributed over the chromosomes, but the meaning of this feature remains a mystery to scientists.

Chromosomes

Chromosomes are a way of packaging genetic material. They are found in the nucleus of every eukaryotic cell, and each consists of one very long DNA molecule. They are easily seen under a light microscope during cell division. A karyotype is the complete set of chromosomes, specific to each species. Every chromosome must have a centromere, telomeres and replication origins.

Changes in chromosomes during cell division

A chromosome is one link in a nested chain of hereditary information, in which each level contains the previous one (genes make up chromosomes, and chromosomes make up the genome). Chromosomes also change during the life of the cell. In interphase (the period between divisions), for example, the chromosomes lie loosely in the nucleus and take up a lot of space.

As the cell prepares for mitosis (division into two), the chromatin condenses and coils into chromosomes that become visible under a light microscope. In metaphase, each chromosome looks like a pair of rods (sister chromatids) lying close together and joined at the primary constriction, or centromere. The fibers of the division spindle attach at the centromere, and the chromosomes line up along the spindle. Depending on the position of the centromere, chromosomes are classified as follows:

  1. Acrocentric - the centromere lies close to one end of the chromosome.
  2. Submetacentric, when the arms (that is, the areas before and after the centromere) are of unequal length.
  3. Metacentric, if the centromere divides the chromosome exactly in the middle.

This classification was proposed in 1912 and is still used by biologists today.
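
In practice the position of the centromere is judged from measured arm lengths, because no real chromosome is divided "exactly in the middle". The sketch below classifies a chromosome by its arm ratio (long arm divided by short arm). The cut-offs of 1.7 and 3.0 are illustrative assumptions for this sketch, not values taken from the article.

```python
import math

# Illustrative cut-offs on the arm ratio (long arm / short arm); assumptions of this sketch.
METACENTRIC_MAX = 1.7
SUBMETACENTRIC_MAX = 3.0


def arm_ratio(p_arm: float, q_arm: float) -> float:
    """Return long-arm length divided by short-arm length (always >= 1)."""
    if p_arm < 0 or q_arm < 0:
        raise ValueError("arm lengths must be non-negative")
    short_arm, long_arm = sorted((p_arm, q_arm))
    if long_arm == 0:
        raise ValueError("chromosome length must be positive")
    if short_arm == 0:
        return math.inf  # centromere at the very end of the chromosome
    return long_arm / short_arm


def classify_by_centromere(p_arm: float, q_arm: float) -> str:
    """Classify a chromosome by where its centromere lies."""
    ratio = arm_ratio(p_arm, q_arm)
    if ratio < METACENTRIC_MAX:
        return "metacentric"      # centromere near the middle
    if ratio < SUBMETACENTRIC_MAX:
        return "submetacentric"   # arms of clearly unequal length
    return "acrocentric"          # centromere close to one end


print(classify_by_centromere(4.9, 5.1))   # metacentric
print(classify_by_centromere(3.0, 7.0))   # submetacentric
print(classify_by_centromere(0.5, 9.5))   # acrocentric
```

Using a ratio rather than absolute lengths makes the classification independent of how large the chromosome is or how strongly it has condensed.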

Chromosome anomalies

Like other structures of a living organism, chromosomes can undergo changes that affect their function:

  1. Aneuploidy. This is a change in the total number of chromosomes in the karyotype due to the gain or loss of a single chromosome. Such a mutation can be lethal to the fetus or can cause birth defects.
  2. Polyploidy. This is an increase in the number of chromosomes by a whole multiple of the haploid set (half the normal diploid number). It is most often found in plants, algae and fungi.
  3. Chromosomal aberrations, or rearrangements. These are changes in the structure of chromosomes, caused, for example, by environmental factors.
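
The difference between aneuploidy and polyploidy comes down to arithmetic on the chromosome count: a polyploid count is a whole multiple of the haploid set, while an aneuploid count is not. The sketch below assumes the human haploid number of 23. It is only a counting rule: it cannot detect structural aberrations or mosaicism, and a real karyotype also shows which chromosome is extra or missing.

```python
HUMAN_HAPLOID_NUMBER = 23  # one set: 22 autosomes + 1 sex chromosome


def classify_chromosome_count(count: int, haploid: int = HUMAN_HAPLOID_NUMBER) -> str:
    """Classify a karyotype by its total chromosome count only."""
    if count <= 0 or haploid <= 0:
        raise ValueError("counts must be positive")
    if count % haploid != 0:
        # Not a whole number of sets: one or more chromosomes gained or lost.
        return "aneuploid"
    sets = count // haploid
    if sets == 1:
        return "haploid"
    if sets == 2:
        return "diploid"
    return f"polyploid ({sets}n)"


for n in (46, 47, 45, 69, 92):
    print(n, classify_chromosome_count(n))
```

A count of 47, as in Down syndrome, is classified as aneuploid, while 69 and 92 correspond to three and four complete sets.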

Genetics

Genetics is a science that studies the patterns of heredity and variability, as well as the biological mechanisms that provide them. Unlike many other biological sciences, since its inception, it has strived to be an exact science. The entire history of genetics is the history of the creation and use of more and more precise methods and approaches. The ideas and methods of genetics play an important role in medicine, agriculture, genetic engineering, and the microbiological industry.

Heredity is the ability of organisms to pass on a range of morphological, biochemical and physiological traits and characteristics to their offspring. Through inheritance, the main species-specific, group (ethnic, population) and family features of the structure and functioning of organisms, and of their ontogenesis (individual development), are reproduced. What is inherited is not only certain structural and functional characteristics of the organism (facial features, some features of metabolism, temperament, etc.) but also the physicochemical features of the structure and function of the main cell biopolymers. Variability is the diversity of traits among members of a species, as well as the capacity of offspring to differ from their parents. Heredity and variability are two inseparable properties of living organisms.

Down syndrome

Down syndrome is a genetic disorder in which a person's karyotype contains 47 chromosomes instead of the usual 46. It is one of the forms of aneuploidy discussed above: an extra copy of chromosome 21 is present (trisomy 21), which introduces additional genetic information into the genome.

The syndrome is named after the physician John Langdon Down, who described it in 1866 as a form of mental disorder. Its genetic basis was discovered almost a hundred years later.

Epidemiology

Today, a karyotype of 47 chromosomes occurs in about one in a thousand newborns (the figures used to be different); the change is due to early diagnosis of this condition. The condition does not depend on the mother's race, ethnicity or social status. Age, however, does matter: the chance of having a child with Down syndrome rises after the age of thirty-five, and after forty the ratio of healthy children to affected ones is already 20 to 1. A father's age over forty also increases the chance of having a child with aneuploidy.

Forms of Down syndrome

The most common form is an extra chromosome 21 that is not inherited: it arises because the two chromosomes of this pair fail to separate along the division spindle during meiosis. In a small proportion of patients, mosaicism is observed: the extra chromosome is present in only some of the body's cells. Together these two forms account for ninety-five percent of people with this congenital condition. In the remaining five percent of cases, the syndrome is caused by a translocation, in which extra chromosome 21 material is attached to another chromosome; this form can be inherited from a parent. Even so, the chance of two children with the syndrome being born in the same family is generally low.

Clinical features

A person with Down syndrome can be recognized by characteristic signs, including:

- flattened face;
- shortened skull (the transverse dimension is greater than the longitudinal one);
- skin fold on the neck;
- a fold of skin covering the inner corner of the eye;
- excessive joint mobility;
- reduced muscle tone;
- flattened back of the head;
- short limbs and fingers;
- cataracts developing in children older than eight;
- abnormal development of the teeth and hard palate;
- congenital heart defects;
- possible epileptic seizures;
- increased risk of leukemia.

Of course, a diagnosis cannot be made reliably from external features alone; karyotyping is required.

Conclusion

Gene, genome, chromosome: these may seem to be just words whose meaning we grasp only in a general and rather remote way. In fact, they strongly influence our lives, and when they change, they change us too. People know how to adapt to circumstances, whatever they may be, and even for people with genetic anomalies there will always be a time and place where they are irreplaceable.

The puffer fish genome is about eight times smaller than the human genome, and 330 times smaller than the lungfish protopter genome. What kind of "ghosts" live in the "graveyards of genomes", and how much garbage is in our DNA?

Alexander Panchin

The renowned molecular biologist David Penny of the Allan Wilson Centre for Molecular Ecology and Evolution at Massey University in New Zealand once said: "I would be very proud to work with the group that designed the E. coli genome. However, I would never admit that I took part in designing the human genome. No other university could have messed up this project so badly." The amount of junk in our DNA is one of the hottest topics in the scientific community, and it sparks real verbal battles among scientists.


Replication (from the Latin replicatio, renewal) is the synthesis of a daughter deoxyribonucleic acid (DNA) molecule on a parental template. During the subsequent division, each daughter cell receives one copy of the DNA molecule, identical to the DNA of the original mother cell. DNA replication is carried out by the replisome, a large complex made up of 15–20 different proteins.
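
Why both daughter molecules end up identical to the parent follows from base pairing: each parental strand serves as a template, and A always pairs with T, C with G. The toy model below illustrates this semiconservative copying on strings; the enzymes of the replisome, strand direction during synthesis and proofreading are deliberately left out, and the sequence is hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(template: str) -> str:
    """Build the strand that pairs with `template`, position by position.

    Both strands are written aligned base-for-base (the partner strand is read
    in the opposite direction, 3'->5'), so complementarity is a per-position lookup.
    """
    return "".join(PAIRING[base] for base in template)


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Semiconservative copy: each parental strand templates one new strand."""
    top, bottom = duplex
    daughter_1 = (top, complementary_strand(top))        # old top + new bottom
    daughter_2 = (complementary_strand(bottom), bottom)  # new top + old bottom
    return [daughter_1, daughter_2]


parent = ("ATGCGTTA", complementary_strand("ATGCGTTA"))  # hypothetical sequence
daughters = replicate(parent)
print(daughters)
print(all(d == parent for d in daughters))  # True
```

Each daughter keeps one old strand and gains one new one, yet both are identical in sequence to the parental molecule.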

A bit of molecular genetics

Recall that the transmission of hereditary information is based on the double-stranded DNA molecule. It is a polymer of four types of monomers (nucleotides) carrying the bases adenine (A), thymine (T), cytosine (C) and guanine (G), and it is packed into chromosomes. A human has 23 pairs of chromosomes in the nucleus (22 pairs of non-sex chromosomes and one pair of sex chromosomes). They form the basis of our genome (another 37 genes are located in the circular mitochondrial DNA). If we took one human cell, joined the entire diploid (paired) set of chromosomes end to end and stretched it into a thread, we would get a molecule two meters long, consisting of six billion base pairs (nucleotides): three billion from dad and three billion from mom.
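
The "two meters" figure follows from simple arithmetic. Assuming the textbook spacing of about 0.34 nm between adjacent base pairs in the common B form of DNA, six billion base pairs stretch to roughly two meters:

```python
# Approximate rise per base pair in B-form DNA (a textbook value, ~0.34 nm).
RISE_PER_BP_M = 0.34e-9

HAPLOID_BP = 3_000_000_000        # "three billion from dad"
DIPLOID_BP = 2 * HAPLOID_BP       # "...and three from mom"


def stretched_length_m(base_pairs: int) -> float:
    """Length in meters of a DNA double helix with the given number of base pairs."""
    return base_pairs * RISE_PER_BP_M


print(f"{stretched_length_m(DIPLOID_BP):.2f} m")  # 2.04 m
```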


The fruit fly Drosophila melanogaster. A model insect genome. Genome: 120 million base pairs. Genes: 13,500.

The best-studied functional DNA sequences are protein-coding genes. An RNA molecule is read from such a gene and then serves as a template for protein synthesis, determining the protein's amino acid sequence. The coding part of the RNA molecule can be divided into nucleotide triplets (codons), each of which either corresponds to a particular amino acid or marks the end of protein synthesis (stop codons). The rule that matches codons to amino acids is called the genetic code. For example, the codon GCC codes for the amino acid alanine.
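
Reading a coding sequence codon by codon can be sketched as a table lookup that stops at the first stop codon. The table below is a deliberately small excerpt of the standard genetic code, and sequences are written in the DNA alphabet (T rather than U) for simplicity; any codon missing from this partial table raises an error.

```python
# A deliberately partial codon table from the standard genetic code.
# Sequences are written in the DNA alphabet (T rather than U), as on the coding strand.
CODON_TABLE = {
    "ATG": "Met",
    "GCC": "Ala",  # the example from the text
    "GGC": "Gly",
    "TTT": "Phe",
    "AAA": "Lys",
    "TAA": "STOP",
    "TAG": "STOP",
    "TGA": "STOP",
}


def translate(coding_seq: str) -> list[str]:
    """Read a coding sequence codon by codon until a stop codon is met."""
    protein = []
    for i in range(0, len(coding_seq) - 2, 3):          # step through complete triplets
        amino_acid = CODON_TABLE[coding_seq[i:i + 3]]   # KeyError if codon is not in this partial table
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(translate("ATGGCCTTTAAATAGGGC"))  # ['Met', 'Ala', 'Phe', 'Lys']
```

The codon GGC after the stop codon TAG is never read, just as a ribosome releases the protein when it reaches a stop codon.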


The partially synthetic bacterium Mycoplasma laboratorium. Its synthetic genome has the names of the scientists who synthesized it encoded in it. Genome: 580,000 base pairs. Genes: 381.

Shall we measure genes?

It was once thought that an organism as complex as a human must have a great many genes. As the Human Genome Project was nearing completion, scientists even organized a sweepstake on how many genes would be found. Imagine their surprise when it turned out that humans and the small roundworm Caenorhabditis elegans have roughly the same number of genes: the worm has about 20,000, and we have 20-25 thousand. For the "crown of creation" this is rather humbling, especially since many organisms have a larger genome (the genome of the lungfish Protopterus aethiopicus is 40 times larger than the human one) or more genes (rice has 32-50 thousand).


The free-living nematode Caenorhabditis elegans. Small model animal genome. Genome: 100 million base pairs. Genes: ~20,000.

But in fact, less than 2% of the human genome codes for proteins. What is the other 98% for? Maybe that is where the secret of our complexity lies? It turned out that there are important non-coding regions of DNA. Among them are promoters, nucleotide sequences to which the enzyme RNA polymerase binds and where synthesis of an RNA molecule begins, and binding sites for transcription factors, proteins that regulate gene activity. There are also telomeres, which protect the ends of chromosomes, and centromeres, which are needed for chromosomes to separate correctly to opposite poles of the cell during division. Some regulatory RNA molecules are known, for example microRNAs, which suppress the synthesis of proteins from the corresponding messenger RNAs (copies of the source gene), as well as RNA molecules that are part of important enzymatic complexes, such as ribosomes, which assemble proteins from individual amino acids as they move along messenger RNA. There are other examples of important non-coding regions of DNA as well.


Arabidopsis thaliana. Small model plant genome. Genome: 119 million base pairs. Genes: ~25,000.

Nevertheless, most of our genome is like a desert: repetitive sequences; the remains of "dead" viruses that were integrated into the genomes of our distant ancestors; so-called selfish mobile elements, DNA sequences that can jump from one part of the genome to another; and various pseudogenes, nucleotide sequences that have lost the ability to encode proteins through mutation but still retain some features of genes. And this is not a complete list of the "ghosts" that inhabit the "genome graveyard."

Twice as smart as flies

Dr. Ewan Birney came up with the idea of a sweepstake on the number of human genes in a bar at Cold Spring Harbor Laboratory shortly before the Human Genome Project was completed. As the finish approached, from 2000 to 2002, the stakes rose from $1 to $20. In the end, the pot was split three ways: between Paul Dear of the British Medical Research Council, who back in 2000 had bet 27,462, a number derived from his date of birth (27.04.1962); Lee Rowen of the Institute for Systems Biology in Seattle, who in 2001 bet 25,947; and Olivier Jaillon of Genoscope in France (26,500). When the main winner, Dr. Dear, was asked how he had guessed the number so accurately three years earlier, when everyone thought humans had at least 50,000 genes, he replied: "It was in a bar, late at night. Watching the behavior of people drinking, I thought it differed little from the behavior of fruit flies, which have 13,500 genes, so it seemed to me that twice the number of fly genes would be enough for people."

Minimal mouse

There is a view that most of the human genome is non-functional. In 2004 the journal Nature published a paper describing mice from whose genome large fragments of non-coding DNA, 0.8 and even 1.5 million nucleotides long, had been deleted. These mice did not differ from ordinary mice in body structure, development, lifespan or ability to produce offspring. Some differences could, of course, have gone unnoticed, but overall this was a serious argument for the existence of "junk DNA" that can be discarded without any particular consequences. It would be interesting to cut out not a couple of million nucleotides but a billion, leaving only the predicted gene sequences and known functional elements. Could such a "minimal mouse" be bred, and would it live normally? Could a person get by with a genome only half a meter long? Perhaps someday we will find out. Meanwhile, another important argument for junk DNA is that fairly similar organisms can have very different genome sizes. The puffer fish genome is about eight times smaller than the human genome (although it contains about the same number of genes) and 330 times smaller than the genome of the lungfish Protopterus mentioned earlier. If every nucleotide in the genome were functional, why would the onion genome be five times larger than ours?
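
The size comparisons in this paragraph can be checked against the genome sizes quoted in this article's figure captions. This sketch simply divides those rounded caption values; small departures from the text's round figures (for example, about 341 rather than 330 for lungfish versus puffer fish) come from that rounding.

```python
# Genome sizes in base pairs, as given in this article's figure captions.
GENOME_BP = {
    "fugu": 390e6,          # puffer fish Fugu rubripes
    "human": 3e9,           # Homo sapiens
    "onion": 16e9,          # Allium cepa
    "protopterus": 133e9,   # lungfish Protopterus aethiopicus
}


def size_ratio(larger: str, smaller: str) -> float:
    """How many times genome `larger` exceeds genome `smaller`."""
    return GENOME_BP[larger] / GENOME_BP[smaller]


for larger, smaller in [("human", "fugu"), ("protopterus", "fugu"),
                        ("onion", "human"), ("protopterus", "human")]:
    print(f"{larger} / {smaller}: {size_ratio(larger, smaller):.1f}x")
```

The printed ratios are 7.7, 341.0, 5.3 and 44.3, matching the text's "about eight", "330", "five" and "40 times" to within rounding.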


The evolutionary biologist Susumu Ohno drew attention to the huge differences in genome size among similar organisms, and he is credited with coining the term junk DNA. Back in 1972, long before the human genome was sequenced, Ohno made plausible estimates of both the number of genes in the human genome and the amount of "junk" in it. In his article "So Much 'Junk' DNA in Our Genome," he argued that the human genome should contain about 30,000 genes. This number, far from obvious at the time, turned out to be surprisingly close to the real one, which was established decades later. Ohno also estimated the functional share of the genome at 6%, declaring more than 90% of the human genome to be junk.


Mimivirus Acanthamoeba polyphaga mimivirus. The largest known virus genome. Genome: 1,181,404 base pairs. Genes: 979.

Treasure or trash?

The idea of junk DNA was challenged by the ENCODE project (the Encyclopedia of DNA Elements), whose first results were published in Nature in 2012. Having obtained extensive experimental data on which parts of the human genome interact with various proteins, take part in transcription (the synthesis of RNA copies of genes for subsequent translation, that is, protein synthesis from amino acids on a messenger RNA template) or are involved in other biochemical processes, the authors concluded that over 80% of the human genome is functional in one way or another. Naturally, this claim sparked heated debate in the scientific community.


The lungfish Protopterus aethiopicus. The largest known genome. Genome: 133 billion base pairs. Genes: many.

One of the more ironic papers, published in 2013 in the journal Genome Biology and Evolution by Dan Graur, a molecular evolutionary bioinformatician and professor at the University of Houston, and his colleagues, is titled "On the Immortality of Television Sets: 'Function' in the Human Genome According to the Evolution-Free Gospel of ENCODE." Its authors note that members of the ENCODE consortium themselves disagree on how much of the genome is functional. One of them soon clarified on Genomicron that the figure was not 80% but 40% of the genome, and another (in an article in Scientific American) lowered it to 20%, while still insisting that the term "junk DNA" should be dropped from the lexicon.


The human immunodeficiency virus (HIV). A rapidly changing viral genome. Genome: 9,749 nucleotides (though it has certainly mutated since). Genes: 9, but they code for 18 proteins.

According to the authors of "On the Immortality of Television Sets," members of the ENCODE consortium interpret the term "function" too loosely. Take histones, for example: proteins that bind the DNA molecule and help it fold compactly. Histones can undergo certain chemical modifications. According to ENCODE, the putative function of one of these histone modifications is "a preference for being at the 5′ end of genes" (the 5′ end of a gene is the end at which transcription begins). "By the same logic, one could say that the function of the White House is to occupy the plot of land at 1600 Pennsylvania Avenue, Washington, D.C.," the critics retort.

Kasha rode a motorcycle

The media sometimes use the incorrect phrase "the genetic code has mutated." But mutations occur not in the code but in the DNA molecule (in the genome), changing its nucleotide sequence. This is like replacing a letter in a word: the phrase "Masha rode a motorcycle" becomes "Sasha rode a motorcycle" if one letter M "mutates" into S. Changing the genetic code is far more serious; it is like changing the alphabet. Imagine that throughout the whole text the letter M suddenly turned into K. Now we have "Kasha rode a kotorcycle." Clearly, such changes have major consequences, so they occur extremely rarely in nature. But they do happen! For example, in some ciliates one of the stop codons can code for the amino acid glutamine. But this is the exception rather than the rule. Most organisms, whether a human, a worm or a cucumber, share the same genetic code, yet their genomes differ greatly: the same alphabet, but a different text.
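
The difference between a mutation and a change of code can be sketched in a few lines. A point mutation edits one gene's sequence; swapping the codon table changes how every gene is read. The ciliate-style table below follows the nuclear code of some ciliates such as Tetrahymena, where TAA and TAG are read as glutamine (Q) and TGA remains a stop; the tables are partial and the two genes are hypothetical.

```python
# Partial codon tables (DNA alphabet); "*" marks a stop codon.
STANDARD_CODE = {"ATG": "M", "GCC": "A", "CAG": "Q", "TAA": "*", "TAG": "*", "TGA": "*"}
# Variant used by some ciliates (e.g. Tetrahymena): TAA and TAG read as glutamine (Q),
# while TGA remains a stop codon.
CILIATE_CODE = {**STANDARD_CODE, "TAA": "Q", "TAG": "Q"}


def translate(seq: str, code: dict[str, str]) -> str:
    """Translate codon by codon with the given code, stopping at a stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = code[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(seq: str, position: int, new_base: str) -> str:
    """Change a single nucleotide: the 'typo in one word' of the analogy."""
    return seq[:position] + new_base + seq[position + 1:]


genes = {"gene_1": "ATGGCCTAGGCCTGA", "gene_2": "ATGTAAGCCTGA"}  # hypothetical genes

print(translate(genes["gene_1"], STANDARD_CODE))  # MA (TAG ends the protein)

# A mutation edits one gene's text; the other gene is untouched.
mutated = point_mutation(genes["gene_1"], 6, "C")  # TAG -> CAG
print(translate(mutated, STANDARD_CODE), translate(genes["gene_2"], STANDARD_CODE))  # MAQA M

# Changing the code changes how every gene is read.
print([translate(s, CILIATE_CODE) for s in genes.values()])  # ['MAQA', 'MQA']
```

The mutation lengthens only the first protein, while the code change lengthens both, which is why changes of the genetic code have such sweeping consequences.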

Assigning a function to a DNA segment raises another problem. Suppose a protein important for the cell can bind to a certain DNA region, and so ENCODE ascribes a "function" to that region. For example, a certain transcription factor, a protein that initiates the synthesis of messenger RNA, binds to the nucleotide sequence TATAAA. Consider two identical TATAAA sequences in different parts of the genome. When the transcription factor binds to the first one, synthesis of an RNA molecule begins, and this RNA serves as the template for another important protein. Mutations (substitutions of any of the nucleotides) in this sequence will impair transcription, the protein will not be made, and this will most likely harm the survival of the organism. Natural selection will therefore maintain the correct TATAAA sequence at this location in the genome, and here it is appropriate to say that the sequence has a function.


Fugu fish Fugu rubripes. The smallest known vertebrate genome. Genome: 390 million base pairs. Genes: 20-28 thousand.

Another TATAAA sequence arose in the genome by chance. Since it is identical to the first, the transcription factor binds to it too. But there is no gene nearby, so the binding leads to nothing. If a mutation occurs in this region, nothing changes and the organism does not suffer. In this case, it makes no sense to speak of a function for the second TATAAA site. However, it may turn out that many TATAAA sequences located far from genes are needed simply to bind the transcription factor and lower its effective concentration; in that case, selection will regulate the number of such sequences in the genome.
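
The two kinds of TATAAA sites can be told apart computationally only by context: a scan finds every occurrence, and each hit is then checked for a nearby gene. The sketch below uses a hypothetical toy sequence with one hypothetical gene start, and its 50-base "upstream" window is an assumption, not a biological constant. It also estimates how often a six-letter motif appears by chance, assuming equal, independent base frequencies on one strand, which shows why many TATAAA sites must be accidental.

```python
MOTIF = "TATAAA"


def find_motif(genome: str, motif: str = MOTIF) -> list[int]:
    """Return every start position of `motif` on this strand (overlaps included)."""
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start)
        start = genome.find(motif, start + 1)
    return positions


def is_upstream_of_gene(position: int, gene_starts: list[int], window: int = 50) -> bool:
    """True if the motif starts within `window` bases before some gene start.

    The 50-base window is an illustrative assumption of this sketch.
    """
    return any(0 < gene_start - position <= window for gene_start in gene_starts)


def expected_chance_hits(genome_length: int, motif_length: int = 6) -> float:
    """Expected random occurrences on one strand, assuming equal, independent bases."""
    return (genome_length - motif_length + 1) / 4 ** motif_length


toy_genome = "CCTATAAAGGGCCCATGGCCGCCTATAAACC"  # hypothetical sequence
gene_starts = [14]                              # hypothetical gene starting at ATG

for pos in find_motif(toy_genome):
    label = "near a gene" if is_upstream_of_gene(pos, gene_starts) else "far from genes"
    print(pos, label)  # 2 near a gene / 23 far from genes

print(round(expected_chance_hits(3_000_000_000)))  # roughly 730,000 on one strand
```

Binding alone cannot distinguish the two sites: the transcription factor sees the same six letters in both places, which is exactly why the text argues that binding is not proof of function.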


The onion, Allium cepa. One of the largest plant genomes. Genome: 16 billion base pairs. Genes: unknown.

To prove that a section of DNA is functional, it is not enough to show that some biological process takes place there (DNA binding, for example). Members of the ENCODE consortium write that DNA regions involved in transcription have a function. "But why focus on the fact that 74.7% of the genome is transcribed, when we could say that 100% of the genome takes part in a reproducible biochemical process: replication!" Graur quips.


The Antarctic flightless midge Belgica antarctica. The smallest arthropod genome. Genome: 99 million base pairs. Genes: ~14,000.

A good criterion for the functionality of a DNA segment is that mutations in it are harmful enough that the segment hardly changes from generation to generation. How can such regions be found? This is where bioinformatics comes to the rescue: the modern science, at the intersection of biology and mathematics, of analyzing gene and protein sequences. We can take the human and mouse genomes and find all the similar pieces of DNA in them. It turns out that some parts of the nucleotide sequences are very similar in the two species. For example, the genes needed to make ribosomal proteins are quite conserved: mutations in them are harmful enough that carriers of new mutations die out without leaving offspring. Such genes are said to be under negative selection, which weeds out harmful mutations. Other regions differ substantially between species, which suggests that mutations there are most likely harmless, so their functional role is small or does not depend on a specific nucleotide sequence. Several studies have estimated the share of human DNA under negative selection, and it turned out to be only about 6.5–10% of the genome; non-coding regions, unlike coding ones, are much less subject to negative selection. So by evolutionary criteria, less than 10% of the human genome is functional. Notice how close Ohno came to this estimate back in 1972!
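
The comparative idea can be illustrated with a crude conservation score: split two already-aligned sequences into windows, compute the fraction of identical positions in each, and count windows above a threshold. This is only a sketch under strong assumptions: the sequences are hypothetical, the window size and the 0.9 cut-off are arbitrary, and real analyses use proper genome alignments and models of neutral evolution rather than raw identity.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Fraction of identical positions in consecutive, non-overlapping windows.

    Assumes the two sequences are already aligned to equal length; a gap
    character such as '-' simply counts as a mismatch here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    identities = []
    for start in range(0, len(seq_a) - window + 1, window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(x == y for x, y in zip(a, b))
        identities.append(matches / window)
    return identities


def conserved_share(seq_a: str, seq_b: str, window: int = 10, threshold: float = 0.9) -> float:
    """Share of windows whose identity reaches `threshold` (an arbitrary cut-off)."""
    identities = window_identity(seq_a, seq_b, window)
    if not identities:
        raise ValueError("sequences are shorter than one window")
    return sum(score >= threshold for score in identities) / len(identities)


human_like = "ACGTACGTAC" + "GGGGGGGGGG"   # hypothetical aligned fragments
mouse_like = "ACGTACGTAC" + "GTCAGTCAGT"
print(window_identity(human_like, mouse_like, 10))
print(conserved_share(human_like, mouse_like))
```

Here the first window is identical between the two fragments and would count as conserved, while the second has diverged and would not.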


The bacterium Hodgkinia cicadicola. The smallest known bacterial genome. A symbiotic bacterium with a non-standard genetic code. Genome: 144,000 base pairs. Genes: 189.

Trash Fortress

But is the remaining 90% of the human genome really junk that we would be better off without? Not necessarily. There are reasons to think that a large genome may be useful in and of itself. In bacteria, genome replication is a serious limiting factor that requires a significant expenditure of energy. Their genomes are therefore usually small, and they shed everything superfluous. In large organisms, DNA replication in dividing cells usually contributes little to the body's total energy consumption compared with the costs of running the brain, muscles and excretory organs, maintaining body temperature, and so on. At the same time, a large genome may be an important source of genetic diversity, increasing the chance that mutations will turn non-functional regions into new functional ones that may prove useful in evolution. Mobile elements can carry regulatory elements, creating diversity in how genes are regulated. In theory, then, organisms with large genomes can adapt to environmental conditions faster, paying a relatively small extra cost for replicating a larger genome. We will not see such an effect in a single organism, but it can play an important role at the population level.


Humans, Homo sapiens. A genome that is supposedly 90% junk. Genome: 3 billion base pairs. Genes: 20-25 thousand.

Having a large genome may also reduce the chance that a virus inserts itself into a functional gene (which can disrupt the gene and, in some cases, lead to cancer). In other words, natural selection may act not only to preserve specific sequences in the genome but also to maintain a certain genome size, the nucleotide composition of some of its regions, and so on.
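
The "dilution" argument can be put in numbers with a toy model. If the amount of functional DNA stays fixed while the genome grows, a randomly placed insertion is less likely to land in functional DNA. The sketch assumes insertions land at uniformly random, independent positions, which real viruses and mobile elements do not strictly do, and it takes about 300 million functional base pairs (roughly 10% of a 3-billion-base-pair genome) as an illustrative assumption.

```python
def hit_probability(functional_bp: float, genome_bp: float) -> float:
    """Chance that one insertion at a uniformly random position lands in functional DNA."""
    if not 0 <= functional_bp <= genome_bp:
        raise ValueError("functional DNA must be between 0 and the genome size")
    return functional_bp / genome_bp


def chance_of_any_hit(p_single: float, insertions: int) -> float:
    """Chance that at least one of several independent insertions hits functional DNA."""
    return 1 - (1 - p_single) ** insertions


FUNCTIONAL_BP = 300e6  # assumption: about 10% of a 3-billion-bp genome is functional
for genome_bp in (1e9, 3e9, 6e9):
    p = hit_probability(FUNCTIONAL_BP, genome_bp)
    print(f"genome {genome_bp / 1e9:.0f} Gbp: single insertion {p:.2f}, "
          f"10 insertions {chance_of_any_hit(p, 10):.2f}")
```

Under these assumptions, doubling the genome while keeping the functional part constant halves the risk from each individual insertion.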


However, although the claim that 80% or even 20% of the human genome is functional is debatable, this does not mean that the entire ENCODE project deserves criticism. It has produced a huge amount of data on how different proteins bind to DNA, on gene regulation and more, and these data are of great interest to specialists. But it is unlikely that we will be able to get rid of the "junk" in the genome any time soon, either as a concept or as actual sequences.
