Cannabis is probably best known for one secondary compound, the psychotropic substance tetrahydrocannabinol (THC). Depending on the THC content of the plant, or more specifically the dried inflorescence, Cannabis is either classified as marijuana (or drug-type, plants above 0.3% THC) or hemp (fibre-type, below 0.3% THC), which is mainly a legal and not a strict taxonomic classification. A more refined classification of Cannabis according to the phytocannabinoid profile into distinct ‘chemotypes’ can also be useful, with chemotypes I and II being marijuana while chemotypes III, IV and V can be seen as hemp (see chapter 3).
Many countries have been easing the ban on medical and even recreational use of THC during the past decade. However, because of the prohibition of Cannabis in many countries throughout the last century, it was not bred to the same extent as other high-value crops. Hence, hemp and marijuana lines retain a high level of genetic variability and heterozygosity that is not found in other crops (Sawler et al., 2015).
Here, we review the biology as well as the applications and future perspectives of Cannabis research and breeding. We discuss Cannabis taxonomy and cannabinoid synthesis as well as flower development and flowering time control with an emphasis on sex determination in this predominantly dioecious species. We also summarize the currently available genomics resources. Since Cannabis is so versatile, we discuss its applications in medicine as well as in the building industry. Cannabis’ future role in a sustainable society is summarized, as is the future of cannabinoid production via cell suspension cultures.
Cannabis systematics
Cannabis is the botanical name of a genus that historically includes three species, C. sativa, C. ruderalis and C. indica. However, since the three species can intercross, they are also often considered one single species, C. sativa (Small, 2015). Recent genetic data support the single species concept and recommend that three subspecies should be recognized: Cannabis sativa subsp. sativa, subsp. indica and subsp. ruderalis (Q. Zhang et al., 2018).
Cannabis is a dioecious species, meaning there are male and female individuals (Figure 2a-c). However, through breeding, monoecious lines with male and female flowers on the same plant have also been generated (Figure 2d) (Moliterni et al., 2004).
The genus Cannabis is part of the Cannabaceae, a small family of flowering plants with 10 genera and some 120 species (Jin et al., 2020; Yang et al., 2013). The Cannabaceae have been estimated to have originated ca. 70 to 90 million years ago, and are distributed in temperate and tropical regions throughout the world (Figure 3) (Jin et al., 2020; Magallón et al., 2015). Most species of the Cannabaceae are trees or shrubs; Cannabis as a herb is therefore the exception rather than the rule in the family. However, a trait Cannabis shares with many other species in the family is the inconspicuous unisexual flowers (Yang et al., 2013).
The closest relative of Cannabis is the genus Humulus (Yang et al., 2013), which consists of three species, among which Humulus lupulus (hop) is economically important for the beer brewing industry. Both hop and Cannabis produce separate male and female flowers, and the trichomes in the female inflorescences are the site of the secondary compound production that makes both of those plants economically valuable (Page and Nagel, 2006).
Within the angiosperm phylogeny, Cannabaceae are most closely related to the Moraceae (mulberry or fig family) and Urticaceae (nettle family). Together with the Ulmaceae (elms and relatives) they form a group known as the urticalean rosids (Figure 3) (Sytsma et al., 2002). It is interesting to note that unisexual flowers appear to be prevalent in the urticalean rosids, whereas bisexual flowers are by far the dominant system in angiosperms in general (Renner, 2014; Sytsma et al., 2002). The evolution of sex expression and sex determination in this group is an interesting area of future research.
The urticalean rosids belong to the order Rosales, which are eudicots (The Angiosperm Phylogeny Group, 2016). Though the Rosales comprise some 7700 species (Zhang et al., 2011), they contain relatively few well characterized model plants. The flowering plant super-models Arabidopsis thaliana (thale cress, Brassicales) and Oryza sativa (rice, monocots) are only distantly related to Cannabis; the lineages leading to Arabidopsis and Cannabis separated some 120 million years ago, those leading to rice and Cannabis some 130 to 140 million years ago (Figure 3) (Magallón et al., 2015). Among the relatively well characterized plants that are more closely related to Cannabis are many Rosaceae species (rose family, apple, peach and relatives), for which several well assembled and annotated genomes exist (Aranzana et al., 2019; Zhang et al., 2019), the Cucurbitaceae (cucumber, pumpkin and relatives), which serve as an important model for sex determination and sex expression (Li et al., 2019; Schilling et al., 2020a; Zheng et al., 2019) and Fabaceae (bean family) for flowering time regulation (Cao et al., 2017; Schmutz et al., 2010).
Cannabis sativa itself is phenotypically extremely diverse. Cannabis plants vary in numerous traits including height, leaf shape, photoperiod response, tetrahydrocannabinol (THC) and cannabidiol (CBD) content, plant architecture and sex expression (Clarke and Merlin, 2016; Grassi and McPartland, 2017; Raman et al., 2017; Schilling et al., 2020b). The dioecy of many Cannabis lines and thus the relatively high levels of heterozygosity further contribute to the fact that even within one cultivar the phenotypic diversity can be substantial (our unpublished observations).
For breeders and farmers, the high level of genetic and phenotypic diversity can be problematic, as a crop is usually easiest to handle when it possesses a high degree of uniformity in the field. However, at the same time, the existing diversity can be harnessed by breeders to produce new lines for a multitude of different purposes. For plant genetics research, the phenotypic and genetic diversity is a gold mine, as it provides the possibility to study the genetic basis of many traits in Cannabis. Some developments in this arena are outlined in the subsequent chapters, but many more are sure to come.
Ever more complex: The genetics of phytocannabinoid biosynthesis
Phytocannabinoids are among the commercially most interesting and valuable products that can be generated from Cannabis plants. We use the term phytocannabinoids here for plant-derived cannabinoids, to distinguish them from synthetic cannabinoids or those produced by the human endocannabinoid system. Phytocannabinoids are of great interest for medical applications (see chapter 8 for a detailed discussion) as well as commercial exploitation for recreational use. Hence, one of the major breeding goals involves the accurate prediction and targeted manipulation of phytocannabinoid profiles to ensure the optimal combination of active components in plant extracts (see entourage effect, chapter 8) or legal compliance for non-psychoactive products.
While there are over 100 different phytocannabinoids described (Pertwee, 2014), three phytocannabinoids are usually at the centre of attention from a medical and commercial perspective: cannabigerol (CBG), cannabidiol (CBD) and tetrahydrocannabinol (THC) (Figure 4). Cannabis itself synthesizes phytocannabinoids in the carboxylated form with a carboxylic acid group, i.e. as CBGA, CBDA and THCA. However, to be active in the human endocannabinoid system, phytocannabinoids need to be consumed in their decarboxylated forms, which are usually generated by high temperature treatment (for example during smoking) (Moreno-Sanz, 2016). Phytocannabinoids are predominantly produced in female inflorescences, more precisely they are secreted from trichomes of perigonal bracts, subtending flowers, and leaves (‘sugar leaves’) within inflorescences. However, in lower concentrations, phytocannabinoids can also be detected in vegetative leaves at certain times during the growth period (Aizpurua-Olaizola et al., 2016).
Among all phytocannabinoids, THC is the major psychotropic one. However, chemically all molecules mentioned above are very similar in structure and are produced from the same precursor molecules (Figure 4). CBDA and THCA are biochemically synthesized by two closely related enzymes, CBDA synthase and THCA synthase (Shoyama et al., 2012; Taura et al., 1996). CBDA and THCA are both synthesized from CBGA, while CBGA is synthesized from two non-cannabinoids, olivetolic acid and geranyl pyrophosphate, by a prenyltransferase (Fellermeier and Zenk, 1998) (Figure 4). Cannabichromenic acid (CBCA) synthase converts CBGA to CBCA (Morimoto et al., 1997) and is closely related to THCA and CBDA synthase (Figure 5), but the CBCA content of most mature Cannabis flowers is low (de Meijer et al., 2009a). Interestingly, CBDA synthase-like genes have been found in other plants and fungi (Aryal et al., 2019; Vergara et al., 2019).
Cannabis plants can have very high levels of phytocannabinoids or close to no phytocannabinoids at all, or anything in between (Aizpurua-Olaizola et al., 2016; de Meijer et al., 2009a). This has prompted the description of different chemotypes that are characterized by their distinct phytocannabinoid profiles. The chemotypes are a very useful concept for chemical classifications and for breeding programmes. It should be kept in mind, however, that they do not necessarily constitute a phylogenetic classification based on evolutionary relationships (de Meijer et al., 2009b; Small and Beckstead, 1973). Cannabis plants can roughly be categorized into five different ‘chemotypes’ (Figure 4). Plants of chemotype I (short ‘type I’) produce high levels of THCA and only low levels of CBDA and CBGA (Small and Beckstead, 1973). This means the ratio of THCA/CBDA is much larger than 1. In type II Cannabis plants, THCA and CBDA are both produced in approximately equal amounts (Small and Beckstead, 1973). Both type I and type II plants are usually classified as ‘marijuana’ and can be subject to strict regulations, depending on the country or jurisdiction. These plants are bred to produce up to 20 % of their dry mass as phytocannabinoids.
In contrast, type III plants have high CBDA levels and low to very low amounts of THCA.
Chemotypes IV and V refer to Cannabis plants which have CBGA as their dominant phytocannabinoid or very low levels of phytocannabinoids overall, respectively (de Meijer et al., 2009a; de Meijer and Hammond, 2005) (Figure 4).
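The five chemotypes described above can be expressed as a simple decision rule. The following Python sketch is purely illustrative: the function name and both numeric cutoffs (`min_total` and `ratio_cutoff`) are assumptions chosen for demonstration and are not taken from the cited studies, which describe the chemotypes qualitatively.

```python
def classify_chemotype(thca, cbda, cbga, min_total=0.2, ratio_cutoff=10.0):
    """Assign a chemotype (I-V) from phytocannabinoid content in % dry flower mass.

    min_total and ratio_cutoff are hypothetical cutoffs, not published values.
    """
    total = thca + cbda + cbga
    # Type V: very low levels of phytocannabinoids overall
    if total < min_total:
        return "V"
    # Type IV: CBGA is the dominant phytocannabinoid
    if cbga > thca and cbga > cbda:
        return "IV"
    # Type I: THCA/CBDA ratio much larger than 1
    if cbda == 0 or thca / cbda >= ratio_cutoff:
        return "I"
    # Type III: high CBDA, low to very low THCA
    if thca == 0 or cbda / thca >= ratio_cutoff:
        return "III"
    # Type II: THCA and CBDA in roughly comparable amounts
    return "II"


# Example with made-up measurements (THCA, CBDA, CBGA)
for sample in [(18.0, 0.5, 0.3), (6.0, 7.0, 0.4), (0.4, 12.0, 0.3)]:
    print(sample, classify_chemotype(*sample))
```

With these assumed cutoffs, any THCA/CBDA ratio between 1:10 and 10:1 is treated as type II, which is broader than "approximately equal"; a breeding programme would need to calibrate the cutoffs against its own reference material.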
In addition to the five different chemotypes, the hemp-marijuana distinction is also used to characterize different Cannabis plants (Figure 4). If the THC/THCA content in the dry flower mass is below a threshold of 0.2 to 1 %, depending on the jurisdiction, these plants are usually categorized as hemp, above that as marijuana (Brunetti et al., 2020; Mead, 2017). The differentiation between hemp and marijuana can typically also be drawn genetically, with hemp and marijuana varieties forming two genetically distinct populations (Sawler et al., 2015). Further, hemp and marijuana can be phenotypically quite distinct, with marijuana plants generally being bushier with a dense set of inflorescences while hemp plants tend to be taller, less branched and with less dense flower structures. However, there are also plants with low THC/THCA content (type III) which strongly resemble marijuana in overall plant and inflorescence architecture (Grassa et al., 2018). Hence, the terms hemp and marijuana do not necessarily always refer to distinct genetic populations or phylogenetic categories. As the critical distinction between hemp and marijuana is the THC/THCA content, they can also be considered broader categories of chemotypes.
The underlying genetics of the different chemotypes have been studied in quite some detail in the last two decades (de Meijer et al., 2009a, 2009b, 2003; de Meijer and Hammond, 2005; Pacifico et al., 2006; Toth et al., 2020; Weiblen et al., 2015; Welling et al., 2016). However, the complex nature of the Cannabis genome with its many transposable elements, low complexity regions and high heterozygosity has made a conclusive analysis of the loci controlling phytocannabinoid production challenging (Grassa et al., 2018; Laverty et al., 2019; McKernan et al., 2018).
Different genetic loci have been postulated to determine a plant’s chemotype by encoding the different types of synthases. At locus B, two codominant alleles were hypothesized to exist: the allele BT encodes the THCA synthase, BD the CBDA synthase (Figure 4) (de Meijer et al., 2003). Depending on the presence of either or both alleles, the plant will be chemotype I (BT/BT), chemotype II (BT/BD) or chemotype III (BD/BD) (de Meijer et al., 2003; Toth et al., 2020; Welling et al., 2016). Additionally, non-functional alleles of the synthase gene (B0) are predicted to be associated with chemotype IV, where neither CBDA nor THCA are produced and the precursor, CBGA, accumulates (Figure 4) (de Meijer and Hammond, 2005; Onofri et al., 2015; Welling et al., 2016).
Further, according to this model, CBCA synthase is encoded by an independent locus (C) while another independent locus (O) is relevant for precursor production, with a knockout resulting in overall minimal phytocannabinoid levels (Figure 4) (de Meijer et al., 2009a, 2009b).
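The single-locus model outlined above amounts to a lookup from genotype to predicted chemotype. The sketch below illustrates this in Python; the function name and the string encoding of alleles are hypothetical. It also assumes that one functional allele next to a B0 allele is enough to yield the corresponding chemotype, which is a simplification not spelled out in the text, and it ignores the independent locus C.

```python
def predict_chemotype(b_alleles, o_knockout=False):
    """Predict the chemotype from two B-locus alleles ('BT', 'BD' or 'B0').

    Simplified reading of the classical model; heterozygotes carrying B0
    are assumed to follow their one functional allele (an assumption).
    """
    allowed = {"BT", "BD", "B0"}
    if len(b_alleles) != 2 or not set(b_alleles) <= allowed:
        raise ValueError("expected two alleles out of BT, BD, B0")
    # Knockout at locus O: precursor production fails, minimal cannabinoids
    if o_knockout:
        return "V"
    has_bt = "BT" in b_alleles
    has_bd = "BD" in b_alleles
    if has_bt and has_bd:
        return "II"   # codominant: both THCA and CBDA are produced
    if has_bt:
        return "I"    # THCA predominant
    if has_bd:
        return "III"  # CBDA predominant
    return "IV"       # B0/B0: no functional synthase, CBGA accumulates


print(predict_chemotype(("BT", "BD")))  # heterozygote under the model
```

As discussed in the following paragraphs, more recent genome assemblies suggest that BT and BD are not true alleles of one gene, so this lookup should be read as a summary of the classical model rather than of the actual genomic architecture.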
The genetic basis of the chemotypes was analysed in detail by producing a cross between high-THC Purple Kush (chemotype I) and low-THC Finola (chemotype III). This resulted in an F1 generation of mainly type II plants, producing both THCA and CBDA (Weiblen et al., 2015). This confirmed earlier findings of crosses between type I and type III plants, resulting in intermediate type II individuals (de Meijer et al., 2003). The segregation pattern of phytocannabinoid profiles in the F2 generation pointed towards a Mendelian inheritance pattern: type I, type II and type III plants were all observed in the F2 generation with the expected distribution of 1:2:1 (de Meijer et al., 2003; Weiblen et al., 2015). A correlation of the expression of either THCA or CBDA synthase with the respective chemotype was also observed and the THCAS/CBDAS locus could be mapped (Weiblen et al., 2015).
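Whether observed F2 counts are compatible with a 1:2:1 segregation is typically assessed with a chi-square goodness-of-fit test. The following Python sketch shows the calculation; the F2 counts are invented for illustration and are not data from the cited studies, and the function name is hypothetical. The critical value 5.991 corresponds to two degrees of freedom at a significance level of 0.05.

```python
def chi_square_1_2_1(n_type_i, n_type_ii, n_type_iii):
    """Chi-square statistic for observed F2 counts against a 1:2:1 expectation."""
    observed = [n_type_i, n_type_ii, n_type_iii]
    total = sum(observed)
    # Expected counts under codominant single-locus Mendelian segregation
    expected = [total * 0.25, total * 0.5, total * 0.25]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-square distribution (df = 2, alpha = 0.05)
CRITICAL_DF2_ALPHA_05 = 5.991

# Hypothetical F2 counts of type I, type II and type III plants
statistic = chi_square_1_2_1(26, 48, 26)
consistent = statistic < CRITICAL_DF2_ALPHA_05
print(f"chi-square = {statistic:.2f}, consistent with 1:2:1: {consistent}")
```

A statistic below the critical value means the 1:2:1 hypothesis is not rejected; it does not by itself prove that a single locus with two true alleles underlies the trait, which is exactly the caveat raised by the later genome assemblies.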
However, although these findings were consistent with the idea of codominant alleles at one single locus, it became apparent that the situation is more complex (Grassa et al., 2018; Laverty et al., 2019; Weiblen et al., 2015). New draft genomes generated with third generation sequencing technology indicated that the THCA and CBDA synthases do not seem to be encoded by alleles of one and the same gene, but rather by distinct loci in marijuana and hemp, respectively, without a clear counterpart in the other genome (Grassa et al., 2018; Laverty et al., 2019). Sequencing of the hemp cultivar ‘Finola’ and the marijuana cultivar ‘Purple Kush’ indicates that a functional CBDA synthase gene is present only in the ‘Finola’ genome while the ‘Purple Kush’ genome only encodes a functional THCA synthase (Laverty et al., 2019). While mapping to approximately the same region in both genomes, the DNA sequences surrounding the respective synthase genes are drastically different from each other. Further, a low albeit still detectable recombination rate between the two loci supports the notion that they are genetically distinct (Laverty et al., 2019). The sequencing of a different Cannabis variety (‘CBDRx’), which is a chemotype III hemp-marijuana hybrid, revealed an even more complex genomic arrangement with a number of pseudo- and functional synthase genes in three different cassettes on the same chromosome (Figure 5) (Grassa et al., 2018).
The CBDA and THCA synthase genes themselves seem to be embedded in cassettes of multiple tandem duplications of putatively non-functional synthase genes, which are regularly interspersed with long terminal repeat (LTR) retrotransposons, making the assembly and analysis of these loci even more challenging (Figure 5) (Grassa et al., 2018; Laverty et al., 2019). This is also the reason why these complex loci could not be resolved in the first published Cannabis genome, which relied on short-read sequencing data (van Bakel et al., 2011). This genomic constitution, where the difference between marijuana and hemp comes down to a large structural variation is, if true, very unusual. Hence, the aforementioned locus “B” with its different alleles might look very different from what was previously assumed to be simple isoforms of a single gene.
The complexity of phytocannabinoid synthases does not end there, though. Copy number variation of CBDA and THCA synthase genes might influence phytocannabinoid level and composition (Vergara et al., 2019), and the number of synthase (pseudo)genes most likely differs between the cultivars sequenced (Grassa et al., 2018; Laverty et al., 2019; McKernan et al., 2020).
High throughput assays for BT and BD markers have been developed and show that many plants actually contain both loci (Cascini et al., 2019; McKernan et al., 2020; Toth et al., 2020). Moreover, many BD/BD plants, especially those with higher CBDA levels, have THCA levels of above 0.3 % of dry flower mass, despite the absence of a functional BT allele (Toth et al., 2020). This residual THCA is probably at least to some extent a by-product of the CBDA synthase itself. The THCA and CBDA synthases have a relatively high sequence similarity (83.85 %, Figure 5) and process the same precursor molecule, CBGA (Figure 4). In vitro studies have shown that the CBDA synthase produces CBDA and THCA at a ratio of roughly 20:1 (Zirpel et al., 2018). This is similar to ratios observed in planta in high-CBD hemp varieties (Toth et al., 2020; Weiblen et al., 2015). This potentially results in the problem that, if CBDA production is increased, THCA also increases as a by-product, even if plants do not express a functional THCA synthase. Cannabis varieties with very high CBD levels may thus be at risk of exceeding legal THC thresholds.
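The practical consequence of the roughly 20:1 product ratio can be made explicit with a short calculation. The Python sketch below assumes, as a simplification, that THCA in a plant without functional THCA synthase arises only as a by-product of CBDA synthase at a fixed ratio; the function names and the 0.3 % default limit are illustrative choices, since legal thresholds differ between jurisdictions.

```python
def byproduct_thca(cbda_percent, cbda_to_thca_ratio=20.0):
    """THCA (% dry flower mass) expected as a by-product of CBDA synthase."""
    return cbda_percent / cbda_to_thca_ratio


def max_cbda_within_limit(thc_limit_percent=0.3, cbda_to_thca_ratio=20.0):
    """Highest CBDA content (% dry mass) that keeps by-product THCA at the limit."""
    return thc_limit_percent * cbda_to_thca_ratio


# CBDA contents in % of dry flower mass, chosen for illustration
for cbda in (5.0, 10.0, 15.0, 20.0):
    thca = byproduct_thca(cbda)
    status = "above" if thca > 0.3 else "within"
    print(f"CBDA {cbda:4.1f} % -> THCA ~{thca:.2f} % ({status} a 0.3 % limit)")
```

Under these assumptions, a plant with 15 % CBDA would carry about 0.75 % THCA, and only plants with up to about 6 % CBDA would stay within a 0.3 % limit, which illustrates why high-CBD varieties can run into compliance problems.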
Understanding the exact genetics underlying the different chemotypes will be important for future targeted breeding approaches. Tight restrictions across the world make it difficult for farmers to grow chemotype III, IV and V varieties, because the presence of residual THC creates regulatory problems and uncertainties. Especially type III plants often have THCA/THC levels slightly above the legal THC limit (Aizpurua-Olaizola et al., 2016; Toth et al., 2020). Hence, one important breeding goal is going to be the generation of zero-THC lines which still produce high levels of CBD in the range of 15 to 20 % of dry flower mass. Whether this is possible to achieve is difficult to say, since even in the absence of a THCA synthase, CBDA synthases produce THCA as a by-product (Toth et al., 2020; Zirpel et al., 2018). This will, therefore, require identification of a CBDA synthase that produces only very low or no amounts of THCA. In vitro experiments show that point mutations can alter the amount of by-products (Zirpel et al., 2018). Natural variation in synthase genes exists and has been linked to altered phytocannabinoid compositions (Onofri et al., 2015). Hence, naturally occurring or artificially generated CBDA synthase variants could be used for targeted breeding in this direction.
In addition, Cannabis varieties used for fibre or seed production could be selectively bred and genotyped to have 0 % overall phytocannabinoids (chemotype V), as currently even the farming of these kinds of varieties is heavily restricted in many countries.
Other phytocannabinoids like CBG(A) and CBC(A) as well as the manifold variants of terpenes produced in Cannabis flowers are increasingly coming into focus in the medical research fields (reviewed in Booth and Bohlmann, 2019; Deiana, 2017; Pollastro et al., 2018), hence generating lines with specific phytocannabinoid profiles might be of interest in further research.
A hairy topic: Flower development and morphology in Cannabis
The flower is the reproductive structure of flowering plants (angiosperms), which represent one of the most successful and diverse groups of organisms on this planet (Krizek and Fletcher, 2005). While the characteristic shape of the Cannabis leaf is often used as a symbol for the whole plant, Cannabis female flowers are of particular interest because they are the main site of production of pharmacologically active compounds (phytocannabinoids) (Spitzer-Rimon et al., 2019). Understanding the morphology of Cannabis flowers and their developmental genetics is therefore especially important.
The typical angiosperm flower consists of four different organ types, which are organized in concentric whorls: sepals, petals, stamens and carpels (Endress, 1992; Krizek and Fletcher, 2005). Sepals are in the outermost whorl and usually green and leaflike in appearance. Petals are in the second whorl and often coloured to attract pollinators. Petals together with sepals are termed the perianth and constitute the non-reproductive part of a flower. Stamens are typically located in the third floral whorl. They are the male reproductive organs and are composed of an anther and a filament. The anthers grow on top of the stalk-like filaments and are the site of pollen production. Finally, carpels develop in the fourth and central whorl of a typical flower. Carpels are the reproductive organs that contain an ovary inside which ovules develop. The tip of the carpel, the stigma, receives the pollen. The style connects the stigma to the ovary (Becker, 2020; Endress, 1992; Krizek and Fletcher, 2005).
Notably, the number, arrangement, and morphology of the floral organs vary substantially between different species of flowering plants (Endress, 2011; Theissen and Melzer, 2007). Most flowers contain, as described above, both carpels and stamens, and are therefore termed bisexual flowers (Renner, 2014). However, some 15 % of flowering plant species are monoecious or dioecious and have unisexual flowers that develop only stamens or carpels (Renner, 2014). In dioecious plants, female and male flowers develop on separate individuals. In contrast, in monoecious plants male and female flowers develop on the same individual (Renner, 2014).
Cannabis is primarily dioecious (Moliterni et al., 2004). The male Cannabis flower is green-yellow in appearance and has a perianth of five sepals, while petals are completely absent. Further, an individual male flower contains five free stamens, and no female reproductive organs (Figure 6a and b) (Leme et al., 2020; Spitzer-Rimon et al., 2019).
On the other hand, the female flower is enclosed within a green leaflike perigonal bract. The perigonal bract is sometimes also described as a sepal, but morphological studies agree that it is a bract (Leme et al., 2020; Spitzer-Rimon et al., 2019). As such, it is not strictly a part of the flower. Between the perigonal bract and the carpel is a membranous and hyaline perianth which tightly embraces the ovary (Leme et al., 2020; Reed, 1914; Spitzer-Rimon et al., 2019). It is worth noting that this inconspicuous perianth sometimes is not mentioned in the structure of female Cannabis flowers or is considered missing as it is not visible from the outside of the flower. Most likely, these membranous structures are homologous to sepals (Leme et al., 2020). At the top of the ovary are two filamentous styles. The stigma is brush-like and has epidermal cells elongated into hair-like projections (Reed, 1914; Leme et al., 2020) (Figure 6c and d).
The commercially interesting phytocannabinoids and terpenes are predominantly produced on the perigonal bracts of female flowers, more specifically in glandular trichomes that cover those bracts. Glandular trichomes can be categorized into sessile, stalked and bulbous trichomes (Hammond and Mahlberg, 1973), with bulbous trichomes being metabolically less active (Livingston et al., 2020). Cannabis plants also have non-glandular trichomes: hair-like uni- or multicellular trichomes which protect them from biotic and abiotic stresses (Andre et al., 2016; Dayanandan and Kaufman, 1976). However, glandular trichomes are the main site of phytocannabinoid synthesis (Furr and Mahlberg, 1981).
Because phytocannabinoids are cytotoxic in higher concentrations, they have to be secreted and are not stored within cellular compartments. Phytocannabinoids along with other secondary metabolites are secreted from glandular trichomes with a globose head-like structure (Figure 7). This head is formed by an enlarged secretory cavity which is surrounded by a cuticle that encapsulates the secreted secondary metabolites (Hammond and Mahlberg, 1973). At the base of the head is a layer of secretory cells (Kim and Mahlberg, 1991; Livingston et al., 2020). The head can be sessile, sitting directly on the epidermis and often found on vegetative leaves (sessile trichomes), or pre-stalked or stalked with the head being elevated above the epidermis (pre-stalked and stalked trichomes), which are mainly found on female inflorescences (Kim and Mahlberg, 1991; Livingston et al., 2020). Additionally, these structures can be distinguished by different levels of autofluorescence, cell numbers as well as phytocannabinoid and terpene profiles (Livingston et al., 2020; Turner et al., 1978). Stalked trichomes seem to develop from pre-stalked trichomes and contain a terpene profile distinct from true sessile trichomes (Livingston et al., 2020). Transcriptome analysis of floral trichomes of a CBD hemp (‘Finola’) confirmed high expression levels of genes involved in the synthesis of phytocannabinoids, terpenes and their respective precursor molecules in glandular trichomes, with expression differences between bulbous, sessile, and (pre-)stalked trichomes (Livingston et al., 2020).
It is not clear why predominantly female plants produce glandular trichomes within their inflorescence structures. Illuminating the genetic underpinnings of this sexual dimorphism remains a challenge for further research. Glandular trichomes also develop on male flowers (Leme et al., 2020), albeit at lower density and probably with fewer phytocannabinoids. Understanding which genetic factors restrict the development of glandular trichomes largely to female inflorescences during flower development would provide a valuable resource for increasing phytocannabinoid production.
The battle of the sexes: Sex determination in Cannabis