ABSTRACT

The Candida Genome Database (CGD) is a public resource that provides genomic information for researchers studying the molecular biology of the fungal pathogen Candida albicans. CGD curators collect Candida albicans gene names and aliases from the published literature and assign Gene Ontology terms, which describe the molecular function, biological process and subcellular localization of each gene product. In addition, CGD annotates mutant phenotypes and summarizes the function and biological context of each gene product in free-text description lines. The CGD search tools, such as Quick Search, Text Search, Gene/Sequence Resources, Ortholog Search and Pattern Match, are designed to search across multiple species. Results for all species are displayed at the top, with sections for species-specific results displayed below. Other CGD tools perform species- or sequence-specific searches (e.g., Gene/Sequence Resources, Pattern Match, Advanced Search, Batch Download, Restriction Mapper, GO Term Finder, GO Slim Mapper). The CGD Locus Summary Page (LSP) provides information about the identity of orthologous genes, as well as orthology-based functional predictions and gene descriptions in Candida glabrata. CGD displays both manually curated and computationally derived gene, protein and sequence information for Candida albicans and the recently added species Candida glabrata (Inglis et al., 2012). BLAST searches at CGD cover complete sequence sets for combinations of several species, including Candida albicans, Candida glabrata, Candida dubliniensis, Candida guilliermondii, Candida lusitaniae, Candida parapsilosis and Candida tropicalis, as well as Debaryomyces hansenii and Lodderomyces elongisporus (Altschul et al., 1990; Jones et al., 2004; Dujon et al., 2004; Butler et al., 2009). The LSP is the central organizing unit of the CGD website, and each gene in CGD is represented by an LSP. The LSP provides access to tools for retrieval, analysis and visualization of gene data.
The LSP also serves as a platform for accessing information about orthologs in Saccharomyces cerevisiae (Inglis et al., 2012). Orthology relationships are defined using the InParanoid algorithm, which identifies reciprocal best BLAST hits between species (Remm et al., 2001). The Protein tab on the LSP provides similarity-based information for each protein-coding gene through descriptions and a graphical display of conserved protein domains and motifs identified using the InterProScan software (Zdobnov and Apweiler, 2001; Hunter et al., 2009). It also displays the most similar protein in the Protein Data Bank and provides information on protein length, molecular weight and sequence, together with a table of calculated physicochemical properties (Rose et al., 2010). CGD is publicly funded and freely available at https://www.candidagenome.org (Arnaud et al., 2005).
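
The reciprocal best hit criterion mentioned above can be illustrated with a minimal Python sketch. It is not the InParanoid implementation, which also clusters in-paralogs and applies score cutoffs, and it is not CGD's pipeline. The input format (query, subject, bit score), the function names and the sequence identifiers are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit as (query_id, subject_id, bit_score). This tuple layout is an
# assumption for the sketch, not a format prescribed by CGD or InParanoid.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first hit wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Made-up identifiers and scores, used only to exercise the functions.
    species_a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
                      ("a2", "b2", 90.0), ("a3", "b2", 120.0)]
    species_b_vs_a = [("b1", "a1", 195.0), ("b2", "a3", 118.0),
                      ("b2", "a2", 80.0)]
    print(reciprocal_best_hits(species_a_vs_b, species_b_vs_a))
```

In this toy input, a2 is not paired with b2 because the best hit of b2 is a3. A pair is kept only when each sequence is the other's top-scoring match.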