Latest Updates and News from GeneNetwork
2009-12-04: Search page has been redesigned. The new design provides more help to new users. (Implemented by RW Williams.)
2009-11-20: TouchGraph's Navigator software has a nifty applet that lets you see the relationship between GeneNetwork and other sites on the web. Click here to check it out.
2009-11-12: The Swiss GeneNetwork site is on-line as of November 16. This mirror site is in the Laboratory for Integrative Systems Biology at the EPFL (Prof Johan Auwerx and colleagues) in Lausanne. (Implementation by Evan Williams, Lei Yan, and Johan Auwerx.)
2009-11-06: Fly Toxicogenomic data sets are expected to be available on the GN production web site in early December 2009. We are currently experimenting with two data sets generated by Douglas Ruden and colleagues:
- UAB Whole body D.m. mRNA control (Oct09) RMA
- UAB Whole body D.m. mRNA lead acetate (pbAc) (Oct09) RMA
Raw data were provided by Dr. Grier Page and can be found on the GeneNetwork development server. (Implementation by Arthur Centeno, Robert W. Williams, Doug Ruden, Grier Page, and Xiaodong Zhou.)
2009-10-27: Fixed links to PKU SynDB. (The API interface method was changed from GET to POST.) (Implementation by Lei Yan)
2009-09-24: A GeneNetwork Archive/Time Machine server has been set up. This system allows users to work with older versions of GeneNetwork that correspond well with specific publications. Each version is designated by a time stamp (2009-03-04, for the March 4, 2009 version). (Implementation by Lei Yan)
2009-09-21: We have added 150 behavioral traits that are related to anxiety levels in the BXD strains of mice into GeneNetwork. Mice fall into five different treatment groups: baseline (BASE), treated with a saline control injection (no stress, only saline; NOS), treated with an ethanol injection (no stress, only ethanol; NOE), subjected to mild restraint stress followed by a saline injection (RSS), and subjected to mild restraint stress followed by an ethanol injection (RSE). Each of these five sets of 30 behavioral assays is matched to a corresponding hippocampal gene expression data set (Illumina). Please search for "Melloni" in the BXD Phenotypes database ANY or ALL fields. (Implementation by Xiaodong Zhou, Melloni Cook, and Lu Lu.)
2009-09-09: Unit testing of all GeneNetwork servers and mirrors has been implemented. This system monitors the performance of six different GN functions using GET requests. Success is evaluated by comparing the length of the returned HTML document with the expected length. When a system fails, an error message is sent to the Sys Admin. (Implementation by Lei Yan)
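The length-based check described above might look roughly like the following Python sketch. The function names, the 10% tolerance, and the check names are assumptions made for illustration. The source does not state the actual URLs, expected lengths, or thresholds, and the step that fetches each page and the error email are left out.

```python
def length_check(html, expected_length, tolerance=0.10):
    """Return True if len(html) is within `tolerance` (a fraction) of expected_length.

    The 10% default tolerance is a hypothetical value, not one taken from GN.
    """
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    deviation = abs(len(html) - expected_length) / expected_length
    return deviation <= tolerance


def run_checks(pages, expected, tolerance=0.10):
    """Return the names of the checks that failed.

    pages    maps a check name to the HTML returned by a GET request,
             or to None if the request itself failed.
    expected maps the same check names to the expected document length.
    """
    failures = []
    for name, expected_length in expected.items():
        html = pages.get(name)
        if html is None or not length_check(html, expected_length, tolerance):
            failures.append(name)
    return failures
```

A monitoring script would call run_checks once per server and notify the administrator whenever the returned list is not empty.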
2009-09-09: Barley gene expression data have been integrated into GeneNetwork. All data are from Matthew Moscou, Roger Wise, and Nick Lauter at Iowa State University. Data are currently public with the corresponding acknowledgment of data use and disclaimer, and those interested in obtaining full access should contact moscou at iastate.edu. (Implementation by Matthew Moscou, Robert W. Williams, and Arthur Centeno)
2009-09-09: We have added a simple web interface used by data managers to delete phenotype traits. (Implementation by Xiaodong Zhou)
2009-08-24: PARTIAL CORRELATIONS have been implemented in GeneNetwork on the beta test site and will soon be moved to our production site. You can now compute correlations between a "reference trait" and any other large set of traits while controlling for variability associated with up to three other factors. For example, it is possible to compute the correlation between the expression of the gene formin 2 (Fmn2) and all other 45,000 probe sets in the BXD hippocampus data set while controlling for the effects of the genotype at the location of the Fmn2 gene itself. This is done by using the SNP rs6375522, which is located in Fmn2, as a controlled cofactor. In this particular situation the partial correlation then provides you with a measure of the association between differences in formin 2 expression and all other transcripts in the absence of genetic variation in formin 2 itself. Partial correlations are a great way to study correlations between genes while controlling for sometimes unwanted sources of variability--linkage with neighboring genes being one important unwanted source. Partial correlations can also be used to remove effects of sex, age, batch effect, etc. Users can upload their own "correction" factors that can be used to calculate partial correlations. (Implementation by David Crowell)
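To make the idea concrete, here is a minimal Python sketch of a partial correlation computed with the standard recursive formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). It is not the GeneNetwork implementation. The function names are hypothetical, and a genotype used as a control is assumed to be coded numerically (for example -1 and 1 for the two parental alleles).

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y after removing the linear effect of each control.

    `controls` is a list of sequences (for example a coded genotype, sex, age).
    The last control is removed using the first-order formula, applied to
    correlations that are already adjusted for the remaining controls.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[-1], controls[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If two traits are correlated only because both depend on the same genotype, pearson reports a clear correlation while partial_corr with that genotype as the single control returns a value near zero.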
2009-08-20: SNP BROWSER UPGRADE. We have reworked and renamed the SNP Browser module of GeneNetwork. The new name is Variant Browser, a change we made because we are adding data on indels (about 140,000 so far). All indel data is from our own comparative whole-genome sequence analysis of indels between C57BL/6J and DBA/2J. The figure below, for example, shows four indels in the mouse Hc gene. The Variant Browser is currently only useful for the analysis of mouse genomes, but we hope to provide useful links and tools from other species in the next few years. (Implementation by Evan G. Williams)
Legend: Example of the new Variant Browser illustrating results of an indel search for hemolytic complement (Hc).
2009-08-12: A massive human brain expression data set (363 cases) for the study of Late-Onset Alzheimer's Disease (LOAD) with age-matched elderly control subjects has been added to GeneNetwork. This cortical expression data set is taken from the work of Dr. Amanda Myers and colleagues (see GEO GSE15222, Webster et al., 2009). (Implementation by Robert W. Williams and Arthur Centeno)
2009-08-10: Service upgrade: We have developed a program that monitors free space on GeneNetwork servers. When hard drive space gets too tight, email notification is sent to system administrators. (Implementation by Lei Yan)
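A free-space check of this kind can be sketched in a few lines of Python using the standard library call shutil.disk_usage. The function name, the 10% threshold, and the message text are assumptions for illustration. The source does not describe how the actual monitor measures space or sends mail, so the email step is omitted here.

```python
import shutil


def check_free_space(path, min_free_fraction=0.10, disk_usage=shutil.disk_usage):
    """Return a warning string if free space on `path` is too low, else None.

    min_free_fraction is a hypothetical threshold (10% of the volume).
    disk_usage can be replaced by a stand-in for testing; by default the
    real shutil.disk_usage is called.
    """
    usage = disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        return f"Low disk space on {path}: {free_fraction:.1%} free"
    return None
```

A scheduled job could call this function for each data volume and pass any returned message to a mailer.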
2009-08-10: The QTL Heat Map and the QTL Cluster Map have been significantly improved. The QTL heat map now allows the user to reorder traits with much greater flexibility, using all of the sorting functions that are built into the Trait Collection window (sort by position, sort by database, sort by symbol, sort by LRS, etc.). By default, the order of traits in the Trait Collection window is used by the QTL heat map. In addition, the QTL heat maps can be clustered, as usual, by trait correlations--the Cluster Map function developed originally by Elissa Chesler. We have made some minor changes to the heat map display that allow figures to be oriented horizontally. (Implementation by Xiaodong Zhou.)
2009-07-29: We have fixed the "Set to Default" function so that it works properly with any species on GeneNetwork. (Implementation by Xiaodong Zhou)
2009-07-27: All mouse genome position data on GeneNetwork have been updated from mouse assembly mm8 to assembly mm9. This has affected the position of the genes, probes, and SNPs. The mm8 assembly remains on the mirrors, temporarily. (Implementation by Evan Williams)
2009-07-27: We have fixed the Principal Component Analysis (PCA) function to handle as many traits as computationally feasible given the sample size and numbers of cases. This improved PCA code is used when computing correlation matrices. (Implementation by Xiaodong Zhou)
2009-07-26: Human brain expression data for patients with Alzheimer's disease and age-matched elderly control subjects have been added. This cortical expression data set is taken from GEO GSE5281 (Liang et al. 2006). Samples were laser-captured from cortical layer 3 (except the hippocampus) and run on the Affymetrix U133 Plus 2.0 array. We renormalized the data to an average expression of 8 units on a log2 scale. Two versions of the data have been entered in GeneNetwork: one consisting of 157 of 161 arrays; the second consisting of what we regard as the best 102 arrays (those with mean correlations of better than 0.88 with all other arrays). Case IDs have the following code structure: Brain Region, GEO ID, Sex, Age, Disease Status. For example, E119615M63N is a sample of the entorhinal cortex of case GSM119615, a 63-year-old male normal case. The tissue codes are E = entorhinal cortex, H = hippocampus pyramidal layer, MT = medial temporal cortex, PC = posterior cingulate cortex, SP = superior frontal cortex, V = primary visual cortex. GeneNetwork does not allow sophisticated display of the data, but you can perform correlation analyses of any of the 56,000 probe sets. For example, expression of the APP transcript is higher in the AD cases and correlates well with many other AD related genes. At least 7.5% of cases are assigned incorrectly by sex (see INFO file). (Implementation by Robert W. Williams and Arthur Centeno)
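The case ID structure described above can be decoded mechanically. The following Python sketch is one way to do it. The function name and the returned field names are hypothetical, and the pattern is an assumption inferred from the single example E119615M63N. Only the status code N (normal) is documented in this entry, so the status is returned as the raw code rather than interpreted.

```python
import re

# Tissue codes as listed in the news item above.
REGIONS = {
    "E": "entorhinal cortex",
    "H": "hippocampus pyramidal layer",
    "MT": "medial temporal cortex",
    "PC": "posterior cingulate cortex",
    "SP": "superior frontal cortex",
    "V": "primary visual cortex",
}

# Region code, GEO sample digits, sex, age in years, disease status code.
# Two-letter region codes are listed first so they are tried before E, H, V.
CASE_ID = re.compile(r"^(MT|PC|SP|E|H|V)(\d+)([MF])(\d+)([A-Z]+)$")


def parse_case_id(case_id):
    """Split a case ID such as 'E119615M63N' into its five fields."""
    match = CASE_ID.match(case_id)
    if match is None:
        raise ValueError(f"unrecognized case ID: {case_id!r}")
    region, geo_digits, sex, age, status = match.groups()
    return {
        "region": REGIONS[region],
        "geo_id": "GSM" + geo_digits,  # the case ID omits the GSM prefix
        "sex": sex,
        "age": int(age),
        "status": status,  # raw code; only N = normal is documented here
    }
```

For E119615M63N this yields the entorhinal cortex, GSM119615, sex M, age 63, and status N, matching the worked example in the text.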
2009-07-24: We have added annotation for the Affymetrix U133 Plus 2.0 human microarray (GPL570). The annotation file was downloaded from NetAffx in July 2009, with minor reannotation by Arthur Centeno prior to entry into GeneNetwork. This annotation file will be used with GSE5281 (steph-affy-human-433773), a data set of gene expression in six brain regions from normal and Alzheimer's disease patients.
2009-07-22: We have finished the first phase of sequencing the genome of DBA/2J at UTHSC (SOLiD) and UCLA (Solexa) and have extracted a set of 2.8 million SNPs with comparatively high quality scores that differ between DBA/2J and C57BL/6J. Many of these SNPs are novel. Some overlap with Celera and Perlegen data. All SNPs have been added to the GeneNetwork SNP browser using laboratory identifier symbols of the form "MRS1xxxxxxx". (Implementation by Evan Williams and Xusheng Wang)
2009-07-21: Imputed alleles have been added for 6 million SNPs and a panel of 74 strains in the GeneNetwork SNP browser. Data were generated by the Genome Dynamics group at the Jackson Laboratory using the Perlegen data set as a reference. The GN SNP browser tables are now significantly larger in terms of both mouse strain and SNP coverage. The figure below illustrates results for Zim3 in the GN SNP browser. (Implementation by Evan Williams)
Legend: Example of the updated SNP browser; SNP search results for Zim3
2009-07-20: SymAtlas links in GeneNetwork have been replaced by BioGPS links. These new links provide summary data on expression of genes in 50 or more tissues in several species. BioGPS also provides data on expression QTLs. We thank Dr. Andrew Su and colleagues at the GNF. (Implemented by Xiaodong Zhou)
GeneNetwork is integrated with the Gene-set Cohesion Analysis Tool (GCAT). GCAT determines the functional coherence of gene sets by performing latent semantic analysis of Medline abstracts. To try this new function, select a set of genes/transcripts in one of the GeneNetwork Correlation Tables and then click on the GCAT button (upper left). (Implementation by Xiaodong Zhou, Lijing Xu, Ramin Homayouni and colleagues at the University of Memphis)
Legend: GCAT output graph of genes associated with Comt
2009-06-20: Expression data for the ventral tegmental area (VTA) of the midbrain have been added for the BXD panel of mouse strains. Three related data sets--saline control, acute ethanol, and ethanol vs saline contrast S scores--were generated by Dr. Michael Miles and colleagues. All data sets are currently being tested. Contact Dr. Miles at VCU Medical Center for access. (Implemented by Arthur Centeno, Xiaodong Zhou, and Michael Miles)
2009-06-14: We have upgraded the Network Graph (Association Network) functions in GN. You can now change color and thickness of lines (edges) that connect nodes (usually transcripts or phenotypes). Each node of these graphs is hyperlinked to the underlying data. (Implemented by Lei Yan)
Legend: Example of the type of network graph that can be generated using both expression and phenotype data sets.
2009-06-05: Upgraded the annotation for the Affymetrix Rat RAE 230 and RAE 230 2.0 microarrays. We have also incorporated the probe sequence data for the newer array. Annotation for the rat Exon 1.0 ST array is also in progress. (Implemented by Xiaodong Zhou and RWW)
2009-04-28: The edit HTML function was changed from the CGI technique to mod_python. Our goal is to eventually change the few functions in GN that still use the CGI technique to mod_python. (Implementation by Xiaodong Zhou)
2009-04-28: In each probeset info file, one link was added (Accession number: GNxxx) so that users can download the raw data of this particular dataset. (Implementation by Xiaodong Zhou)
2009-04-24: Information on polymorphisms in microRNA targets taken from the PolymiRTS Database has been added to interval map tables in GeneNetwork. (Implementation by Lei Yan)
2009-04-15: Microarray Annotation Files. We have begun to make all of the manually curated array annotation files used in GeneNetwork available here. (Implementation by Lei Yan, Arthur Centeno, Xusheng Wang, and Xiaodong Zhou)
2009-04-15: GN genotype data improvement. We have begun to revise all genotype database tables and data sets in anticipation of adding large mouse, rat, and human genotype data sets in the next few months. We have reconciled discrepancies between physical and genetic maps for all species and developed a new method to produce special files used for interval mapping (so-called "geno" files). (Implementation by Xiaodong Zhou and Rob Williams)
2009-04-15: GeneNetwork Database Improvement. All data (phenotype, genotype, and expression data) in GeneNetwork were previously entered into a single massive table (data table) in fully normal form. To improve performance we have now split quantitative data for individuals, cases, and strains into four related tables: 1. PublishData (classic phenotype data), 2. GenoData (genotypes), 3. ProbeSetData (gene expression data), and 4. ProbeData (individual probe expression data). The reason for the split is that these data types are used and updated in different ways. An original table that held all error term data (the SE table) has also been separated in the same way. Our intent is to improve SQL query performance and reduce recovery time. (Implemented by Xiaodong Zhou)
2009-04-07: Usage Statistics for Lily, one of the main GeneNetwork servers, are now available. GeneNetwork runs on a set of servers and we occasionally rotate the production server and staging server (currently named StatisticsLily and StatisticsProust). (Implemented by Lei Yan)
2009-02-20: We are making some improvements to the Genome Graph GUI. (Implemented by Lei Yan)
2009-02-20: Trait Correlation Upgrade. The most important innovation from the user's point of view is a fast method to compute correlations among transcripts that reduces the response latency about 20-fold--from 90 seconds to under 5 seconds using standard Affymetrix and Illumina data sets consisting of 45,000 probes or probe sets. Of even greater importance, this improvement means that GN can now handle massive Affymetrix Exon 1.0 ST data sets that have over 1.1 million probe sets in about 60 seconds. The new method exploits a set of text files that are external to the database (essentially a materialized view), a parallel computing technique (Parallel Python), and optimized SQL queries. (Implementation by Xiaodong Zhou and David Crowell).
2009-02-18: Data Security Upgrade. A new data security system has been active since Feb 18, 2009. You only have access to the confidential data sets to which you are assigned. If you have any questions, please contact Dr. Robert Williams or Xiaodong Zhou. (Implementation by Xiaodong Zhou and Hongqiang Li).
2009-02-13: Added a link to the Ontological Discovery Environment. (Implemented by Lei Yan)
2009-01-20: Human and Rat Trait Correlation pages have been upgraded to include data on tissue correlations and literature correlations. (Implementation by Xiaodong Zhou).
2009-01-16: Advanced Search GUI. The GUI for the advanced search function has been developed. This GUI is much more user friendly than the old text interface. (Implementation by Lei Yan).
2009-01-08: Link to WebGestalt 2.0. The links in GN to WebGestalt have been switched to its new edition, WebGestalt 2.0. (Implementation by Lei Yan).
2008-12-30: The UML class diagrams for GN python code have been finished. The inheritance among classes and dependency among modules are also represented by UML diagrams. (Implemented by Xiaodong)
2008-12-30: The UML database diagram for GN database has been finished (Implemented by Hongqiang)
2008-12-23: Heritability values have been added to the BXD Eye HEIMED mRNA data set. These data are shown in the Basic Statistics page for this particular data set. We hope to routinely add this type of information for new array data as they are added to GeneNetwork. (Implementation by Lei Yan and Xusheng Wang).
2008-12-19: Array Annotation Query, Download, and Annotate module has been added to GeneNetwork. (Implementation by Lei Yan and Xusheng Wang).
2008-12-18: New rat HXB/BXH gene expression data from Norbert Hubner and colleagues is being entered into GeneNetwork. The following four tissue types are being added: adrenal glands, soleus muscle, heart, and liver. These data are still experimental but will be opened in 2009.
2008-12-04: Tissue Correlation scores for the great majority of genes have been added to GeneNetwork. A high tissue correlation between two genes indicates that they tend to be expressed together across a set of diverse tissues and organs. Tissue Correlations can be computed from most Trait Data and Analysis pages by selecting Tissue Correlation, Pearson's r or Tissue Correlation, Spearman's rho using the Trait Correlation analysis tool. (Implementation by Xiaodong Zhou and Xusheng Wang)
2008-12-04: The University of Western Australia GN mirror site is up and running in the laboratory of Dr. Grant Morahan. This system was built and is maintained by Munish Mehta. Munish also has an active code development program with novel tools and resources.
2008-12-03: The JAX Mouse Diversity Genotyping array is being used to regenotype the BXD strains. This array was designed by Fernando Pardo-Manuel de Villena and Gary Churchill. The array generates as many as 625,000 SNPs and 900,000 invariant genomic probes (one SNP about every 4.3 kb, see JAX Notes, Winter 2008, No 512). The genotypes will be integrated into GeneNetwork in the next three months. (Implementation for GN by Lu Lu, R. Williams, and Hongqiang Li).
2008-11-28: Gene-set Cohesion Analysis Tool has been integrated into GN. GCAT Home (Implemented by Xiaodong Zhou)
2008-11-10: A new server, Lily, has been installed and configured as the main GN server. (Implemented by Lei Yan)
2008-10-20: The first human expression data set for lymphoblasts (Epstein-Barr virus immortalized B-cells) from the CEPH panel (large Mormon families) has been integrated into GeneNetwork. The data are originally from a study by Monks and colleagues (2004). Please note: These data can currently be used to study patterns of coexpression among transcripts, but we have not yet implemented mapping algorithms and do not expect to make interval mapping available until summer 2009. We thank Stephanie Santorico for providing her data and help. (Implemented by Hongqiang Li, Stephanie Monks Santorico, Xusheng Wang, and Arthur Centeno.)
2008-10-15: We have uploaded four LXS expression data sets for the mouse hippocampus (NOS, NOE, RSS, and RSE; n = 33 strains per data set generated using Illumina Mouse WG-6.1 Beadarray). The four sets are part of a study on the effects of ethanol and stress on brain gene expression. Please see the INFO files associated with these data sets for more background, for example, the "no restraint-saline" condition (NOS). (Implemented by Lu Lu and Arthur Centeno.)
2008-10-05: Major new BXD behavioral and drug-related phenotypes for 62 strains have been integrated into GeneNetwork by a large research consortium funded by the National Institute on Drug Abuse and the National Institute for Alcohol Abuse and Alcoholism. A total of 242 phenotypes were measured in collaboration with the Systems Genetics Group at the Oak Ridge National Laboratory. These records can be searched by selecting the BXD Phenotypes data set and entering the search string "Chesler." Please request permission from Drs. EJ Chesler and D Goldowitz to use these data. Primary phenotypic data on individual mice are available in the Mouse Track System. (Implemented by Elissa Chesler (email@example.com), Philip VM, Ansah TA, Blaha CD, Cook MN, Hamre KM, Lariviere WR, Matthews DB, Mittleman G, Goldowitz D)
2008-10-04: UCLA Department of Genetics GeneNetwork mirror site is now online. Data in this site is synchronized periodically (once per month) with the Tennessee production site. (Implemented by Evan G. Williams and Kev Adler.)
2008-10-05: An XML schema for expression genetic data sets of the type used by GeneNetwork has been generated by Ilze Druka and colleagues. This schema is based in part on the GeneNetwork database. (Implemented by Ilze Druka)
2008-09-15: GN code improvement by cleaning up "namespace pollution". Every GN Python module used to import other modules with "from SomeModule import *" so that the classes, functions, and variables of the imported module could be used without the module name as a prefix. Since a typical GN Python module imports over ten modules, and GN Python modules import one another in chains (B imports A, C imports B, D imports C ...), the namespace was seriously polluted. This not only made the source code very difficult to read (for anything not defined in the current module, a programmer might have to look through dozens of imported modules to find where it is defined), but also made the dependencies among modules unclear, so a change to one module often had "unpredictable" effects on other modules. As the GN software kept growing, the namespace pollution problem kept building up and made it harder and harder to maintain the software and develop new features. A lot of effort has gone into changing all GN Python modules to import with "import SomeModule" and to add the module's name as a prefix to the classes, functions, and variables of the imported module (thousands of places). This work not only greatly improves source code readability, but also makes the dependencies among GN Python modules clear, and hence makes maintenance and future development much easier. (Implemented by Xiaodong Zhou)
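The problem is easy to reproduce with two standard library modules that both define sqrt. This short Python sketch is an illustration only and does not use GN code. The wildcard imports are run in a scratch dictionary so that the demonstration does not pollute the module it lives in, and the names scratch, shadowed_sqrt, and hypotenuse are hypothetical.

```python
import cmath
import math

# Wildcard style: both modules export a function named sqrt, so the second
# import silently replaces the first. Nothing at the call site says which
# sqrt is in use.
scratch = {}
exec("from math import *\nfrom cmath import *", scratch)
shadowed_sqrt = scratch["sqrt"]  # this is cmath.sqrt, not math.sqrt


# Qualified style: the module prefix states exactly where each name comes
# from, and adding another import cannot change what math.sqrt means.
def hypotenuse(a, b):
    """Length of the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a * a + b * b)
```

With the wildcard imports, calling sqrt on a float returns a complex number because the cmath version won, which is the kind of "unpredictable" effect described above.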
2008-07-29: GeneNetwork TWiki site has been moved to a new server. (Implemented by Kev Adler)
GeneNetwork Roundup bug tracking site has been moved to a new server. (Implemented by Hongqiang Li)
2008-07-30: Massive UCLA microarray data sets for four F2 populations and four tissues have been integrated into GeneNetwork. Data are now available for up to 300 animals for each of these tissues:
- Adipose tissue (see 2007)
- Brain (see 2006)
- Liver (see 2007)
Data sets were generated by Jake Lusis, Eric Schadt, and colleagues and include one of the first expression genetics studies, published in 2003 (Mouse BDF2 UCLA). Papers that describe several of these new data sets have been published (Yang et al. 2006; Schadt et al., 2008) and several massive data sets are open access (Mouse BHF2-Apoe UCLA, from GEO GSE2814, GSE3086, GSE3087, and GSE3088). Several other data sets are still being analyzed. For early access to still unpublished data sets (e.g., BH/HB F2 UCLA and CastB6/B6CastF2 UCLA) please contact Dr. Aldons Jake Lusis and colleagues. (Implemented by Evan Williams)
2008-08-29: EYE data: The Mouse HEIMED whole eye gene expression data set has been extended to 99 strains, including 27 common inbred strains. With support from Dr. Barrett Haik, Eldon Geisert, Lu Lu and colleagues we have added data for 15 new strains of mice. Arrays were processed at the UTHSC by Weikuan Gu and colleagues. This data set is open for use without a password. (Implemented by RWW, Daniel Ciobanu, Lu Lu, and Arthur Centeno)
2008-07-15: The GeneNetwork MySQL configurations have been optimized, resulting in much faster and more reliable service. (Implemented by Kev Adler)
2008-06-20: GeneNetwork is now part of the NIH NCRR Biomedical Informatics Research Network (BIRN) with a fully configured BIRN equipment rack. Our thanks to the BIRN Coordinating Center and to the IT staff at UTHSC. (Implemented by Bao Nguyen, Mark James, James Martin, Billy Hatcher, and Kev Adler)
2008-05-20: Neocortex: With support from the High Q Foundation, we have added a matched neocortex (cerebral cortex) gene expression data set for 73 strains of mice to accompany the striatum data set highlighted below. The neocortex data set includes estimates of expression for 20 common inbred strains, 52 BXD strains, and B6D2F1 hybrids, generated using the Illumina Mouse-6 v1.1 Sentrix array. Samples were generated by Glenn Rosen, Lu Lu, and colleagues. Arrays were processed at the UTHSC. This data set is open for use without a password. (Implemented by HS, RWW, Lu Lu, and Arthur Centeno)
2008-05-20: Striatum: With support from the High Q Foundation, we have added a striatum (caudate-putamen) gene expression data set for 75 strains of mice. The data set includes estimates of expression for 20 common inbred strains, 54 BXD strains, and B6D2F1 hybrids, generated using the Illumina Mouse-6 v1.1 Sentrix array. Samples were generated by Glenn Rosen, Lu Lu, and colleagues. Arrays were processed at the UTHSC. This data set is open for use without a password. (Implemented by HS, RWW, Lu Lu, and Arthur Centeno)
2008-04-20: Lung: We have added a lung gene expression data set for 57 strains of mice. The data set includes estimates of expression for 8 common inbred strains, 47 BXD strains, and reciprocal F1 hybrids (B6D2F1 and D2B6F1) generated using the M430 2.0 Affymetrix array. Samples were generated by Lu Lu and colleagues. Arrays were processed by Yan Jiao and Weikuan Gu at the Memphis VA. This data set is still provisional and not available without a password. If you would like early access, please contact Prof. Klaus Schughart (Helmholtz Centre for Infection Research, Braunschweig, Germany) at firstname.lastname@example.org. (Implemented by HS, RWW, Lu Lu, and Arthur Centeno)
2008-04-10: Nucleus Accumbens: Dr. Michael Miles and colleagues have added gene expression data for the nucleus accumbens of the BXD strains into GN. The nucleus accumbens is an important part of the brain involved in emotional state and reward. It is also critically involved in drug abuse and alcoholism. Three complementary data sets have been submitted: expression in nucleus accumbens following a saline control injection, expression in nucleus accumbens following an injection of ethanol, and a data set that highlights the difference in expression between the two conditions (saline and ethanol). Each data set includes estimates of expression for 35 BXD strains, as well as C57BL/6J and DBA/2J. Samples were generated and processed by Miles and colleagues. The saline control data set is available without a password, but the two other data sets are still available only with a password. If you would like early access, please contact Prof. Michael F. Miles (Virginia Commonwealth University) at email@example.com. (Implemented by MM and Arthur Centeno.)
2008-04-10: Prefrontal Cortex: Dr. Michael Miles and colleagues have added gene expression data for the prefrontal cortex of the BXD strains into GN. The prefrontal cortex (PFC, or prelimbic neocortex) is an important part of the brain involved in emotional state and reward. It is also critically involved in drug abuse and alcoholism. Three complementary data sets have been submitted: expression in PFC following a saline control injection, expression in PFC following an injection of ethanol, and a data set that highlights the difference in expression between the two conditions (saline and ethanol). Each data set includes estimates of expression for 35 BXD strains, as well as C57BL/6J and DBA/2J. Samples were generated and processed by Miles and colleagues. The saline control data set is available without a password, but the two other data sets are still available only with a password. If you would like early access, please contact Prof. Michael F. Miles (Virginia Commonwealth University) at firstname.lastname@example.org. (Implemented by MM and Arthur Centeno.)
2008-03-26: Dr. Fan Zhang has left the GN software development group. He has spent the past five months substantially reworking the architecture of GN hardware, rewriting the code to increase its portability, and improving security. Thanks, Fan, for your many contributions and good luck back home in China.
2008-03-17: Two Affymetrix Mouse EXON ST 1.0 array data sets for the hippocampus (n = 84 strains) and striatum (n = 48 strains), including a variety of common inbred strains and numerous BXD strains, have been integrated this week into GeneNetwork (see Mouse--BXD--Hippocampus and Striatum). These data were generated with the support of Affymetrix, Dr. David Kulp, and the High Q Foundation. Annotation on these two large data sets is in progress. (Implemented by Manjunatha Jagalur, Arthur Centeno, Xusheng Wang, Lu Lu, and Hongqiang Li).
2008-03-14: GeneNetwork site has been moved to two new Dell PowerEdge 1950 and 2950 8-core servers to provide more capacity and higher performance. (Implemented by Fan Zhang).
2008-03-02: GeneNetwork site usage is now being studied using Google Analytics. We hope this will allow us to improve usage and performance for GN clients. (Implemented by Fan Zhang).
2008-02-17: We have finished synchronizing the updated GN code between the UTHSC GN main site and the HZI GeneNetwork mirror. This is still a manual process that needs to be done once every few months. (Implemented by Fan Zhang and Rudi Albert).
2008-02-14: We have implemented a system that improves the synchronization of the main production database with the set of servers that are part of the cluster. These servers are referred to by our development team as GN server "bundles". (Implemented by Fan Zhang and Kev Adler).
2008-01-20: GeneNetwork consists of a cluster of servers that have identical software code and identical databases. The creation of the individual servers (bundles) that are part of the GN cluster is quite complicated. Each bundle consists of an application stack (Linux CentOS 5, MySQL 5.0, Apache HTTP server, mod_python, and many code libraries and scripts). Components of the bundles ideally work well regardless of the physical hardware and network configuration. We have now rewritten and annotated the GN code and reconfigured bundles to make them as independent as possible from their particular network situation. We refer to this effort as "de-hardcoding" or refactoring GN. GN bundles are now easy to deploy with minimal manual work. The following tasks still need to be done by hand: configuration of IP addresses, specification of the absolute directory structure as DocumentRoot for Apache, and recompilation of certain third-party Python libraries according to the hardware architecture (i386 vs x86_64). (Implemented by Fan Zhang). The number of bundle servers has been increased to six.
GN Bundle Setup
2008-01-12: Probe level data have been added into the UCHSC BXD Whole Brain M430 2.0 November 2006 RMA data set. Affymetrix data sets in GeneNetwork always include the higher level "probe set" data, but in the case of this important whole brain data set from The University of Colorado we have also entered all of the individual probe level data. This means that users can drill down to examine the expression of individual 25-mer probes. To access the individual probe data please click on the PROBE TOOL button on the Trait Data and Analysis page of GN. Then click on any individual probe in the left-hand column. (Implemented by Arthur Centeno, Hongqiang Li, and Daniel Ciobanu.)
2008-01-08: We have added a large Mouse Striatum Expression data set into GeneNetwork. This data set includes replicate estimates of expression in the dorsal striatum for 75 strains of mice (54 BXDs and 21 common strains). All data were generated using the new Illumina Mouse 6.1 bead array with support from the High Q Foundation and the NIH. A matched neocortical expression data set will be uploaded into GN in the next few months. (Implemented by Glenn Rosen, Lu Lu, Arthur Centeno, Hongqiang Li, and Rob Williams.)
Legend: Expression of the dopamine D1a receptor mRNA in the striatum from the new Illumina data set. Each bar provides expression values (log2 transformed values) for a single strain.
2007-12-08: John Stuart and colleagues have uploaded a Spleen M430 2.0 Expression data set into GeneNetwork for the CXB recombinant inbred strains and the two parents of this RI set (C57BL/6By and BALB/cBy). This data set includes replicate estimates of expression for the whole spleen. (Implemented by Lu Lu, Arthur Centeno, Hongqiang Li, and Rob Williams.)
2007-10-20: In progress: GeneNetwork developers are working closely with the Biomedical Informatics Research Network to map GN data sets and metadata into OWL/RDF concepts. (Implemented by Rob Williams, Fan Zhang, and Bill Bug.)
2007-10-19: The GeneNetwork codebase has been more thoroughly documented/annotated over the past several weeks by Kev Adler and Fan Zhang. Many important changes have been made to abstract the code and its calls to external resources. (Implemented by Fan Zhang.)
2007-10-16: The performance of the MySQL database that serves as the data vault has been increased dramatically. The time required to run a standard correlation mapping has decreased from 3-4 minutes to less than 2 seconds. (Implemented by Fan Zhang.)
2007-10-16: A small Java application has been written to handle early steps in the processing and error-checking of microarray data sets that are heading into GeneNetwork. The program is still a beta version and is available upon request from Hongqiang Li. It is called ArrayPipeliner. (Implemented by Hongqiang Li.)
2007-10-12: GeneNetwork has been moved to a new cluster. The GN MySQL relational database is now running on a Dell 2950. Our apologies for inconvenience associated with this move. (Implemented by Fan Zhang and Hongqiang Li.)
2007-10-24: A preliminary version of the Standard Operating Procedures for entering new microarray data sets into GeneNetwork is now available on the GeneNetwork TWiki pages. (Implemented by Arthur Centeno, Hongqiang Li, and Rob Williams.)
2007-10-02: The Illumina mouse microarray annotation file has been substantially updated and extended by Xusheng Wang, Hongqiang Li, and Rob Williams. The new annotation file covers four variant arrays: Mouse 6, Mouse 6.1, Mouse 8, and Mouse 8.1. This annotation file is used by the Mouse LXS Hippocampus data (Mouse 6) and by three new Mouse BXD Hematopoietic cell data sets (Mouse 6.1) generated by Gerald de Haan and colleagues. The Illumina annotation files are available to users of GeneNetwork, but are not yet freely available as a complete text file download. If you need the new annotation file rather than that provided by Illumina, please contact Robert W. Williams. (Implemented by Xusheng Wang, Hongqiang Li, and Rob Williams.)
2007-07-27: The SGO Literature Correlation data generated by the Semantic Gene Organizer (SGO) team has been updated. SGO is a patented algorithm based on latent semantic analysis that provides a score of the terminological overlap between any two genes based on PubMed records and abstracts (M Berry, Kevin Heinrich, R Homayouni, and colleagues). These scores range from 0 to 1 (cosine similarity) and can be used like correlation coefficients, although all values are positive. GeneNetwork provides the Literature Correlations along with expression level correlations in most Correlation Table output pages. Roughly 5% of literature correlations have an r value greater than 0.6. The Literature Correlation covers roughly 75% of known genes in mouse, rat, and human.
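As a rough illustration of why these scores fall between 0 and 1, the sketch below computes the cosine similarity of two term-weight vectors. It assumes non-negative weights and uses invented vectors; it does not reproduce the SGO algorithm itself, which is based on latent semantic analysis and is not described in detail here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A gene with no terms shares nothing with any other gene.
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical term weights for three genes over the same four terms.
gene_a = [2.0, 0.0, 1.0, 0.0]
gene_b = [1.0, 0.0, 2.0, 0.0]
gene_c = [0.0, 3.0, 0.0, 1.0]

print(cosine_similarity(gene_a, gene_b))  # genes sharing terms score above 0
print(cosine_similarity(gene_a, gene_c))  # genes with no shared terms score 0
```

With non-negative weights the dot product cannot be negative, which is why such scores can be read like correlation coefficients that never drop below zero.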
The new 2007 dataset covers 21,903 genes and was derived from a corpus consisting of 242,365 MEDLINE abstracts. (The previous literature correlation dataset dated back to 2005 and was derived from a set of 108,367 abstracts covering 13,129 genes.)
We have also added a new SGO Literature Correlation feature to the Trait Data and Analysis Form. You can now select any gene and find the top genes (100 to 2000) with which it shares common terms and context based on the literature. To try this feature select one of the Calculate: SGO Literature Correlation options in the Trait Correlations part of the Trait Data and Analysis form. Normally GeneNetwork computes correlations based on expression and then provides the literature correlation as a secondary data type. This new feature reverses the situation; now the literature correlation is primary and the expression data are given as a secondary data type. This is a quick and powerful way to determine whether expression data support particular correlations extracted from the literature. (Implemented by Lijing Xu, Nick Furlotte, Ramin Homayouni.)
Legend: The Literature Correlation feature can be accessed from the Trait Correlation tool.
2007-07-26: Affymetrix Mouse Exon 1.0 ST data are now available for the first time in GeneNetwork for the striatum of 30 BXD strains and 20 standard inbred strains. These data were generated with the support of the High Q Foundation. (Implemented by Hongqiang Li, Arthur Centeno, Manjunatha Jagalur.)
Legend: Access to the Mouse Exon 1.0 ST Affymetrix expression data set from the High Q Foundation.
2007-06-21: A European GeneNetwork mirror site http://genesys.helmholtz-hzi.de has been established at The Helmholtz Zentrum für Infektionsforschung in Braunschweig, Germany. A GeNeSys private site will operate as part of the German Network for Systems Genetics (GeNeSys). (Implemented by Evan G. Williams and Klaus Schughart, Experimental Mouse Genetics.)
2007-04-26: Zhaohui Sun, one of our lead programmers, is moving to Dallas with his family. As a quick review of these recent News items will show, Zhaohui has made many important contributions to GeneNetwork over the past year. Thanks Zhaohui. We will miss you.
2007-04-26: The SNP Browser that is integrated into GN has been further improved. It is now possible to filter SNPs by domain (exon, intron, etc.) and by function (e.g., mis-sense, silent). The updated version also includes a wider variety of strains of mice. (Implemented by Zhaohui Sun.)
2007-04-25: GeneNetwork's Scriptable Interface is now being extended at the request of Dr. Graeme Wistow of the National Eye Institute NEIBank so that external users and bioinformatics site developers can directly access the "best" data for particular genes and transcripts from specific data sets.
For example, it is now possible to link directly to the Basic Statistics data page for the expression of the rhodopsin gene in the eyes of BXD mouse strains using a string that has this form:
"http://www.genenetwork.org/webqtl/WebQTL.py?FormID=showBest&gene=Rho&database=Eye_M2_0906_R" (no line break)
where "gene=Rho" can be replaced by "refseq=NM_145383" to retrieve data using the RefSeq identifier (do not use the decimal point and digit that may follow, such as "refseq=NM_145383.1"), or can be replaced by "geneid=212541" to retrieve data using the Entrez Gene identifier.
Although not recommended, the string "&searchAlias=1" can be added at the end of the command to retrieve data using the alias of a proper gene symbol when the original does not work. Thus "gene=RP4" will resolve to "gene=Rho" only if you add "&searchAlias=1" at the end of the command.
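The link format described above can be assembled programmatically. The following is a minimal sketch rather than an official client: the function name show_best_url and its keyword arguments are hypothetical, and only the query parameters quoted in this news item (FormID, gene, refseq, geneid, database, searchAlias) are used. Dropping a RefSeq version suffix automatically is a convenience assumed here, following the advice above not to include it.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.genenetwork.org/webqtl/WebQTL.py"

def show_best_url(database, gene=None, refseq=None, geneid=None,
                  search_alias=False):
    """Build a showBest link from exactly one gene identifier."""
    identifiers = {"gene": gene, "refseq": refseq, "geneid": geneid}
    chosen = [(k, v) for k, v in identifiers.items() if v is not None]
    if len(chosen) != 1:
        raise ValueError("give exactly one of gene, refseq, or geneid")
    key, value = chosen[0]
    value = str(value)
    if key == "refseq":
        # Drop a trailing version such as ".1", which should not be sent.
        value = value.split(".", 1)[0]
    params = [("FormID", "showBest"), (key, value), ("database", database)]
    if search_alias:
        # Not recommended, but lets an alias resolve to the proper symbol.
        params.append(("searchAlias", "1"))
    return BASE_URL + "?" + urlencode(params)

print(show_best_url("Eye_M2_0906_R", gene="Rho"))
print(show_best_url("Eye_M2_0906_R", refseq="NM_145383.1"))
print(show_best_url("Eye_M2_0906_R", geneid=212541))
print(show_best_url("Eye_M2_0906_R", gene="RP4", search_alias=True))
```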
When a particular gene is associated with more than one probe or probe set, our code selects the single probe or probe set with the highest average expression in that particular data set. For example, there are four probe sets that target different parts of rhodopsin mRNAs. Using the Affymetrix M430 2.0 array, probe set 1425172_at has the highest expression in the eyes of BXD strains (the probes target the last two exons and the proximal 3' UTR).
To choose a particular data set one needs to know the appropriate code, such as Eye_M2_0906_R in the example above. These codes can currently be obtained from:
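The selection rule can be sketched in a few lines. This is an illustration of "highest average expression" only, not the GeneNetwork code: the function name, the probe set labels, and the expression values below are all invented.

```python
from statistics import mean

def best_probe_set(expression_by_probe_set):
    """Return the probe set whose mean expression across strains is highest."""
    return max(expression_by_probe_set,
               key=lambda name: mean(expression_by_probe_set[name]))

# Invented log2 expression values for four hypothetical probe sets of one
# gene, measured in the same three strains.
example = {
    "ps_1": [8.1, 8.3, 7.9],
    "ps_2": [12.4, 12.9, 12.6],
    "ps_3": [9.0, 9.5, 9.2],
    "ps_4": [6.7, 7.0, 6.5],
}

print(best_probe_set(example))
```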
This implementation will be moved from our beta site to the production GN site in early May, 2007. (Implemented by Hongqiang Li.)
2007-03-23: We have annotated Illumina's Sentrix Mouse-6 1.0 microarray BeadArray platform (see Mouse LXS Hippocampus data sets). We have added or corrected gene assignments, symbols, and gene descriptions for almost all of the 46,166 Illumina probes. We added many data types not provided by Illumina and the MEEBO consortium in their original annotation files (http://www.microarray.org/data/download/MEEBO_Data.txt), including updated Entrez Gene IDs, gene ontology categories, human orthologs, OMIM identifiers, and the precise locations of the 50-mer probe sequences on the most recent mouse genome assembly (Feb 2006, mm8). Some helpful metrics on this annotation: For 46166 probes on the Mouse 6 array platform (including control probes) we have identified 36687 NCBI Entrez Gene IDs; 26180 matched human Gene IDs; 23899 matched rat Gene IDs; 26882 NCBI HomoloGene IDs; and 12790 OMIM IDs. Position data for the 50-mer Illumina Mouse-6 array were BLAT aligned to the latest mouse genome assembly by Hongqiang Li. Many of the probes and alignments have been error-checked to a limited extent by RWW. Annotation is still continuing and we will be adding new data over the next several months. (Curation by Robert W. Williams, implemented into GeneNetwork by Hongqiang Li.)
2007-03-02: The new production site of GeneNetwork has been built from the code base maintained in Subversion. Subversion has been fully incorporated into our software development practice. Previously we built a development / demo site - http://web2qtl.utmem.edu from Subversion, which will be our official beta site from now on. The previous beta site has been merged into our current production (main) site. Users can still access the old main and the old beta site through the archive: http://www.genenetwork.org:82/search.html. This archive site still uses the position data from the older assembly of the mouse genome (UCSC mm6) so it allows users to retrieve old results. (Implemented by Zhaohui Sun.)
2007-02-21: The SNP Browser that is integrated into GN has been updated to include all Perlegen/NIEHS SNPs, all Celera SNPs, Wellcome Trust-CTC SNPs, and the Mouse Haplotype Map SNPs. We will continue to improve the annotation and strain coverage of these SNPs, but you will find that you can already download +/- 100 nt around the SNP or automatically BLAT SNPs to the UCSC Genome Browser. It is also now possible to search for SNPs by some of their identifiers. We have intentionally left in duplicate SNPs to allow you to evaluate data consistency. The "Gap" value will be 0 for redundant SNPs. (Implemented by Zhaohui Sun.)
2006-11-28: We have finished converting all mouse genomic and genetic data sets to the latest mouse genome sequence assembly (mm8). Prior to this day all mouse sequence data used position data from the March 2005 mm6 assembly (equivalent to NCBI Build 34). We now use position data from mm8, equivalent to NCBI Mouse Build 36 and Ensembl Mus musculus version 40.36a, released in February 2006. (Implemented by Zhaohui Sun, Evan Williams and Rob Williams.)
2006-10-30: LXS Hippocampus Illumina Mouse-6 Sentrix array data (77 strains of mice represented by male and female pools, 43,514 probe sequences) have been uploaded into Beta site for testing. (Implemented by Lu Lu and Hongqiang Li.)
2006-09-16: The code bases for GeneNetwork, WebQTL, and many other code modules used by GN have been successfully moved into a source code revision management system called Subversion. This will greatly improve the GN software team's ability to maintain the code, develop new applications, and collaborate with other research groups to expand GN. This work provides a solid foundation for the scalability and portability of GeneNetwork. As part of this effort, we have set up a new beta site that is currently the official "product" of the Subversion tree at http://web2qtl.utmem.edu. (Implemented by Zhaohui Sun and Hongqiang Li.)
2006-08-21: GeneNetwork TWiki Code and Hardware documentation site has been set up at
http://webqtl.utmem.edu:81/bin/view/GeneNetwork/WebHome. (Implemented by Stephen Pitts and Zhaohui Sun, with help from Bill Bug, and Hongqiang Li.)
2006-08-20: Members of the Kidney Consortium have contributed a very large gene expression data set for 68 strains of mice (adults of both sexes), including 53 BXD strains, to GeneNetwork. The August 2006 and July 2006 data sets are preliminary and will be subject to change. Due to strong sex differences and imperfect representation of all strains by arrays from both sexes, we recommend using the sex-corrected data sets. The Kidney Consortium is led by Erwin Bottinger at Mt. Sinai School of Medicine. All array data were processed by Kremena Star (MSSM), and all samples were generated by Lu Lu and colleagues at UTHSC. Data have been normalized by Hongqiang Li, R Williams, and Kremena Star. (Implemented by Kremena Star, Lu Lu, Hongqiang Li, Russell Chesney, Robert Williams, and Erwin Bottinger.)
2006-08-16: We soon will convert all mouse genomic and genetic data sets to the latest mouse genome sequence assembly. Through August 2006 all mouse sequence data has used position data from the March 2005 mm6 assembly (equivalent to NCBI Build 34). Starting September 2006 we will convert all position data to mm8, equivalent to NCBI Mouse Build 36 and Ensembl Mus musculus version 40.36a, released in February 2006. (Implemented by Evan Williams and Rob Williams.)
2006-07-12: Barley gene expression data and new SNP genotypes are being integrated into the GeneNetwork beta site for testing. All data are from Arnis Druka at the Scottish Crop Research Institute. Data are currently password protected, and those interested in obtaining access should contact Arnis.Druka@scri.ac.uk (Implemented by Jintao Wang, Arnis Druka, and Rob Williams.)
Legend: Access to the new Barley Affymetrix expression data set from the Scottish Crop Research Institute.
2006-07-11: Jintao Wang, the lead programmer for GeneNetwork, WebQTL, GenomeGraph, and GeneWiki has accepted a position at Federal Express. We are really sorry to see Jintao leave later in July, but wish him all the best at this exciting new position in one of the world's greatest companies. (Implemented by Jintao Wang ;-)
2006-07-10: The Mouse Phenome Database (MPD) and several other large data sets are being integrated into GeneNetwork's Mouse Diversity Panel. To access these new data sets please select "MOUSE-GROUP-Mouse Diversity Panel". The Mouse Diversity Panel will eventually include the MPD, additional strain data sets extracted from the published literature, the Wellcome Trust-CTC SNP collection, and several large gene expression data sets, including those for whole brain, hippocampus, cerebellum, and eye. (Implemented by Jintao Wang and Evan G. Williams.)
Legend: Access to the new Mouse Diversity Panel data sets.
Legend: Bar chart of white blood cell counts across 43 strains of mice taken from the Mouse Diversity Panel. Virtually all of the phenotype data are provided by the Mouse Phenome Project.
2006-06-23: The GeneWiki annotation and open note making system has been upgraded and now has an independent search page (see the Search menu, pop-down). We expect to make additional changes to the interface over the next few weeks. (Implemented by Jintao Wang.)
2006-06-23: Three new versions of the Hippocampus Consortium expression data set for adult BXD strains are now available (June06 in MAS5, RMA, and PDNN versions). These data sets exclude several marginal arrays and correct for one incorrectly labeled strain in older data sets. (Implemented by Hongqiang Li.)
2006-06-12: GenomeGraph is being rewritten to exploit a Scalable Vector Graphics (SVG) interface that allows zooming and other advanced GUI features. Visit the beta version of GenomeGraph for a test drive. You will need an SVG plugin for your machine. We do not know of an effective universal SVG plug-in for Macintosh Intel machines. Therefore, if you have a Macintosh with an Intel processor you will currently need to force Safari to open using the Rosetta emulation mode. It is easy to do this. Just follow these directions. Running Safari using Rosetta will slow things down a bit, so consider this a temporary solution. (Implemented by Jintao Wang.)
2006-06-09: Powerful new search method (RIF=your-text-here) that exploits the new Gene "Reference into Function" (GeneRIF) taken from NCBI. A search string such as rif=autism or RIF=Autism will find all genes/transcripts/proteins that are known to be associated with autism based on the GeneRIF entries from NCBI (n = 125 hits using one of the Affymetrix M430 expression data sets). (Implemented by Jintao Wang.)
2006-06-08: GeneWiki has been upgraded to include the current set of NCBI GeneRIF entries. These GeneRIFs provide a summary of information about genes. You can search the GeneRIFs using this simple command in the ANY or ALL fields: "RIF=text_string" or "rif=text_string". For example, RIF=schizophrenia will generate a list of all genes with schizophrenia listed anywhere in their list of GeneRIFs. We encourage all users to enter their own comments and notes in the GeneWiki to supplement and extend the GeneRIF. (Implemented by Jintao Wang.)
2006-06-05: A new assembly of the mouse genome (mm8) is now being integrated into GeneNetwork databases and the web site. Please note that many data sets still rely on the mm6 assembly. (Implemented by Jintao Wang.)
2006-05-18: We are experimenting on the Beta site with Scalable Vector Graphics (SVG) displays of scatter plots generated from the Correlations Results tables. SVG allows you to modify the display size and the area of graphs using control clicks. You will need an SVG plug-in for your browser and hardware. SVG works fine with most Intel and Macintosh computers. However, if you have a Macintosh with an Intel processor you will find that the SVG version of the GenomeGraph does not work unless you force Safari to open using Rosetta. It is easy to do this. Just follow these directions. (Implemented by Jintao Wang.) Running Safari using Rosetta will slow it down somewhat, so consider this a temporary solution.
2006-05-11: New and final "Eye M430v2 (Apr06) RMA" database has been added to GN beta and production web site. This data set includes data for 71 strains including 55 BXD strains, C57BL/6J, DBA/2J, reciprocal F1 hybrids and 12 other strains of mice. The Info file is still incomplete. Data generated by Weikuan Gu, Eldon Geisert, Yan Jian, Lu Lu, and Rob Williams with support from Barrett Haik and the Hamilton Eye Institute. (Implemented by Hongqiang Li, Yanhua Qu, and Jintao Wang.)
2006-04-26: New and final Mouse BXD Eye mRNA expression database is being added to the Beta site using a new quality control procedure. The data are still being error corrected as of April 29, 2006. This data set includes 57 BXD strains, C57BL/6J, DBA/2J, F1 hybrids and 12 other strains of mice. Data generated by Weikuan Gu, Eldon Geisert, Yan Jian, Lu Lu, and Rob Williams with support from Barrett Haik and the Hamilton Eye Institute. (Implemented by Hongqiang Li, Yanhua Qu, and Jintao Wang.)
2006-04-26: Search menus are now being updated so that they provide a complete list of available databases in hierarchical pull-down menus. (Implemented by Jintao Wang.)
2006-04-18: We have converted our Python code to use Mod_python. Mod_python is an Apache code module that embeds a Python interpreter within the server and that will often run many times faster than a traditional Common Gateway Interface (CGI). Mod_python will not help much for those processes (e.g., interval mapping or correlation tables) that take a long time to compute. But for fast processes, such as generating AJAX menus or opening data-editing pages, it helps substantially. (Implemented by Jintao Wang.)
2006-03-31: Correlation Results tables now include a feature to add multiple columns of correlations. This makes it possible to quickly identify well and poorly conserved correlations across data sets and tissues. You may need to use a newer browser to exploit this new feature. (See News item of March 14th; Implemented by Jintao Wang.)
2006-03-16: NCBI Entrez Gene LinkOut established. LinkOut is a service of NCBI and Entrez that allows you to link directly from PubMed and other Entrez databases to a wide range of information and services beyond the Entrez system. NCBI pages now link from mouse and rat genes to GeneNetwork expression data sets. (Implemented by Hongqiang Li.)
2006-03-14: GENSAT BGEM link to GN established. The Brain Gene Expression Map is a large library of in situ gene expression images of the embryonic, neonatal, and adult mouse. It includes data for over 3000 genes. (Implemented by Tom Curran and the BGEM group at St Jude Children's Research Hospital.)
2006-03-15: Correlation Results tables are now implemented using AJAX code that allows rapid resorting of the top 100, 200, or 500 traits. You will now see small UP and DOWN sort arrows in the column heads. You may need to use a newer browser to exploit this new feature. Being able to resort tables is useful when you would like to filter a list of traits by expression value (usually from high to low) or by position. AJAX is a programming method that makes web pages more responsive and dynamic. (Implemented by Jintao Wang.)
2006-01-20: GeneNetwork's MySQL relational database has been moved to a dual dual-core AMD Opteron 280 computer system assembled by Monarch Computer for improved performance. This system has cut the time required to compute correlation tables by more than half, from about 100 seconds down to 40 seconds. (Implemented by Jintao Wang.)
2005-12-19: A short Review of GeneNetwork by William R. Lariviere on the American Pain Society web site.
2006-01-03: A GeneWiki system is being implemented. GeneWiki (also known as Gene Notes) allows any user of GN to add notes to the GN database. You can add annotations for genes of interest. All annotation is public. For example, RWW has added annotations on expression patterns of genes in different brain regions using data taken from the Allen Brain Atlas and GENSAT. Our first GeneWiki implementation does not conform to all WIKI standards, and it may be more appropriate to consider GeneWiki as a simple system for adding notes on genes. We hope to load GeneWiki with many of the NCBI GeneRIFs. (Implementation in progress by Jintao Wang.)
2005-12-19: An AJAX implementation of the Search Page is now being tested on the beta site. There should be almost no noticeable difference if you are using a current version of common web browsers (Explorer, Firefox, Safari). Please contact us if you have any problems. (Implemented by Jintao Wang.)
2005-12-15: GeneWiki feature added to GeneNetwork. You can add short annotations that relate to genes to the GN database using an interface we have borrowed from the NCBI Gene Reference into Function (GeneRIF). To read all annotations provided by all users please click on the Annotations button (or GeneRIF button). All annotations are open and public. Annotations should ideally be of use to the research community. Here is an example of a recent annotation entered for the mouse Etv1 gene: "Amygdala and hippocampal CA1 and subiculum expression signature, highly specific neocortical layer 5 expression signature, cerebellar granule cell expression signature (data from Allen Brain Atlas, ABA)."
When adding a note, if possible please provide a PubMed ID number or a web address (URL). You can use the Annotations feature to find groups of genes that belong to interesting functional categories. We are currently using this feature to define sets of "expression signatures" for different parts of the mouse brain, for example genes and transcripts with highly selective expression in the dentate gyrus of the hippocampal formation. (Implemented by Rob Williams and Jintao Wang.)
2005-12-02: New Advanced Search function now allows users to search for either cis-acting or trans-acting QTLs across entire expression data sets. The general format is "TransLRS=(Low_LRS_limit, High_LRS_limit, Mb_buffer)". This syntax can be combined in the ALL field with other conditions, such as the chromosome location of the QTL and the expression level of the trait. For a better explanation please see the Advanced Search page. (Implemented by Jintao Wang.)
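To make the three-part format concrete, here is a small sketch that reads such a term into its numeric parts. It is not the GeneNetwork parser: the function name parse_trans_lrs is hypothetical, the keyword is matched exactly as spelled above, and the sample values are arbitrary. The meaning of each limit is described on the Advanced Search page.

```python
import re

# Matches the documented shape: TransLRS=(low, high, buffer)
_TRANS_LRS = re.compile(r"TransLRS=\(([^,]+),([^,]+),([^,)]+)\)")

def parse_trans_lrs(term):
    """Return (low_lrs_limit, high_lrs_limit, mb_buffer) as floats."""
    match = _TRANS_LRS.fullmatch(term.strip())
    if match is None:
        raise ValueError("expected TransLRS=(low, high, mb_buffer)")
    # float() tolerates the spaces that may follow each comma.
    low, high, mb_buffer = (float(part) for part in match.groups())
    if low > high:
        raise ValueError("low LRS limit exceeds high LRS limit")
    return low, high, mb_buffer

print(parse_trans_lrs("TransLRS=(9.2, 999, 10)"))
```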
2005-11-21: Demonstration XML Schema for mouse data sets has been published for the use of the Biomedical Informatics Research Network (BIRN). For readability, please review the source code version of this page. This is an initial demonstration/proof-of-principle. (Implemented by Hongqiang Li.)
2005-11-15: Basic Statistics pages have been improved to handle larger data sets and to provide better graphic output. (Implemented by Jintao Wang and Rob Williams)
2005-11-14: Literature Correlations gene data set by Ramin Homayouni, Michael Berry and colleagues has been updated. The literature correlations are positive values between 0 and 1 that summarize the pair-wise similarity of genes (or transcripts) on the basis of the known literature using the methods described on the Semantic Gene Organizer site. (Implemented by Ramin Homayouni, Lai Wei, Kevin Heinrich, and Jintao Wang.)
2005-11-01: New Affymetrix M430v2 Eye Data Set for 63 strains of mice (C57BL/6J, DBA/2J, their reciprocal F1 hybrids, 47 BXD recombinant inbred strains, and 12 diverse inbred strains) has been entered on the beta site by the UTHSC Hamilton Eye Institute. Expression data for whole eye is available from Species = Mouse, Group = BXD, and Type = Eye. The Information (INFO) file that accompanies this M430 data set is still provisional. Use of these data in publications is currently limited to members of the HEIMED consortium pending addition of more data, publication, and formal release, but if you would like permission to make selected use of data please contact Robert W. Williams, UTHSC. (Implemented by Lu Lu, Yan Jiao, Yanhua Qu with support of the Hamilton Eye Institute.)
2005-10-24: New Affymetrix M430v2 Hippocampus Data Set for 96 strains of mice (65 BXD, 13 CXB, and 16 diverse inbred strains, B6D2F1 and D2B6F1) will be placed on the beta site by the Hippocampus Array Consortium at the end of October. Expression data for whole hippocampus will be available from Species = Mouse, Group = BXD, and Type = Hippocampus. The Information (INFO) file that accompanies this M430 data set is still provisional. Use of these data in publications is currently limited to members of the consortium pending data addition, publication, and formal release, but if you would like permission to make selected use of data please contact Robert W. Williams, UTHSC. (Implemented by Lu Lu, Shirlean Goodwin, Yanhua Qu, Rob Williams, and members of the Hippocampus Consortium.)
2005-10-10: New Affymetrix M430v2 Striatum Data Set for a B6D2F2 Intercross has been placed on the beta test site by Robert Hitzemann and colleagues. Expression data for the striatum of 30 males and 30 females are available from Species = Mouse and Group = BDF2-2005. The Information (INFO) file that accompanies the M430 data is still provisional. For use of these unpublished data please contact Robert Hitzemann, Department of Behavioral Neuroscience, Oregon Health & Science University. (Implemented by Yanhua Qu.)
2005-10-07: Advanced Search options have been improved. The main improvement involves combining Gene Ontology searches with other advanced search syntax. (Implemented by Hongqiang Li.)
2005-09-28: GeneNetwork Mouse SNP Browser has been upgraded with Perlegen/NIEHS data. The SNP Browser is a tool that is used in combination with the Interval Analyst to evaluate and rank genes and polymorphisms in intervals thought to be responsible for variation in traits. The SNP Browser includes all Celera Genomics mouse SNPs, all public mouse SNPs in dbSNP (as of August 2005), and all Perlegen-NIEHS SNPs (http://mouse.perlegen.com/mouse/download.html as of late Sept 26, 2005). We thank Paul Thomas and Richard Mural of Celera Genomics, Gary Churchill and Natalie Blade of the Jackson Laboratory, and the Perlegen/NIEHS sequencing consortium for help and access to data. (Implemented by Robert Crowell, Alex Williams, and Jintao Wang.)
An example: To search for SNPs type in this string and then modify position as desired:
2005-09-27: Gene Ontology searching is now possible. This search feature allows you to search for all genes/transcripts related to particular categories using the appropriate GO identifier. For example, to extract all transcripts associated with "synaptic vesicle exocytosis" enter the string "GO:0016079" in the ANY field. To browse GO terms and classes link to AmiGo. As of Sept 2005, the GO contains approximately 20,000 terms of which approximately 6300 GO terms can be associated with genes in one or more of the GeneNetwork databases. Approximately 700 high level GO terms will return well over 200 genes. Given the 500 transcript limit it is therefore useful to select lower level GO terms that will return 100 or fewer probe sets/transcripts/genes. (Implemented by Hongqiang Li.)
2005-09-20: The UCSC Gene Browser is now linked to GeneNetwork from the Gene Description and Page Index as a "Quick Link" for both mouse and rat genomes. (Implemented by Jintao Wang at UT and Fan Hsu at UCSC.)
2005-09-06: Phenotype Data Entry SOP. We are beginning to develop standard operating procedures (SOP) to allow colleagues to deposit new data sets into the GeneNetwork. Please review this initial Phenotype data entry SOP if you have traits that you would like added to either an existing or new mapping panel (Partially implemented by Rob Williams.)
2005-08-26: OHSU/VA B6D2F2 Brain mRNA 430 (Aug05) MAS5, RMA and PDNN array data sets now are available. These data sets include M430 Set A and Set B arrays (Implemented by Yanhua Qu.)
2005-08-19: GenomeGraph has been implemented for several large array data sets and can now be used for testing purposes. GenomeGraph is a new module of The GeneNetwork that is designed for the analysis of entire array data sets. (Implemented by Jintao Wang.)
2005-08-17: Dynamic GeneNetwork Database Schema Description allows database experts to review the data structure and fields used by the GeneNetwork MySQL relational database. We have just begun the textual annotation of the database tables and field. This new system will soon replace the current database "dump" available at http://www.genenetwork.org/schema.html (Implemented by Hongqiang Li.)
2005-08-16: Traits in the Selections Windows Now Sortable. The Selection command is used to move trait data from one or more databases into a single Selections window (aka the "shopping cart") for common analysis. For example, users can put classical phenotypes such as body and brain weight in the same Selections window with transcripts for growth hormone receptor (Ghr), GH releasing hormone (Ghrh), and GHRH receptor (Ghrhr) in liver and brain. The new feature makes it possible to sort items in the Selections window by database, position, or name. Sorting is helpful is reviewing contents of the window and in reordering items prior to calculating correlation matrices. Please recall that all itmes in a Selections window must come from a single genetic reference population or panel, for example the AXB/BXA strains of mice, the BXH strains of rat, or from one of several intercrosses. (Implemented by Jintao Wang.)
2005-08-12: New Mouse Liver and Metabolic Trait Databases have been released by Dr. Alan Attie and colleagues. While these data may be reviewed, their use is still are reserved until final publication. The primary database is an Affymetrix M430 survey of gene expression in the liver of 60 selected F2 mice (a B6 x BTBR F2-ob/ob cross) that includes data on approximately 45,000 probe sets. This array database is accompanied by 24 classical metabolic and blood chemistry traits. All F2 animals were genotyped a 194 microsatellite markers. (Implemented by Alan Attie and colleagues, Yanhua Qu, and Jintao Wang.)
2005-08-08: The Interval Analyst (IA) provides a tabular summary of known genes in a chromosomal interval with data on gene expression, gene size, SNP number and density, and human homologs. The IA is still a beta site function but will be released to the public site in the next week. The IA table is automatically generated with each chromosome map. IA tables can be extensively customized and resorted. For the BXD and AXB/BXA mouse genetic reference panels, the IA also provides access to Celera SNPs, as well as public SNPs from a variety of sources. Clicking on the SNP number for a specific gene in the IA generates a SNP browser table (at present, only for mouse). The purpose of the IA is to allow users to rank-order genes in an interval that may be contributing to variability in phenotypes. (Implemented by Evan Williams, Robert Crowell, Alex Williams, and Rob Williams.)
2005-08-08: The design of Chromosome and Whole Genome QTL Maps has been significantly improved and updated. These new physical QTL maps merge LRS or LOD functions with gene and SNP tracks and can be zoomed to the level of single genes and SNPs. Maps can be exported in 2X versions that are near publication quality. Below most maps you will now find an Interval Analyst table that can be customized to help rank-order candidate genes. Variants of these new maps have been introduced to handle all species and genetic reference populations. (Implemented by Robert Crowell, Alex Williams, Evan Williams, and Rob Williams; final integration by Jintao Wang.)
Legend: Sample of a new high resolution physical map. This map shows a locus that modulates the expression of the Cart transcript (cocaine and amphetamine regulated transcript) on distal Chr 10 in BXD mouse strains (brain tissue). The Control Block, top middle, permits users to customize the display and its resolution. Pink, blue, and beige horizontal bars above the map provide links to higher resolution maps (8x) or to the UCSC and ENSEMBL genome browsers. Statistical thresholds for linkage are marked by grey and pink horizontal lines and are based on 2,000 permutations. The Y-axis provides a scale for the plot of LRS or LOD scores that are plotted using a thicker blue line. The calculation of linkage statistics is based on a total of 147 useful markers that have been genotyped in all 89 BXD strains (The Wellcome-CTC Mouse Strain SNP database with added microsatellite markers). The far more digital look of the LRS function compared with traditional interval maps arises for the simple reason that the locations of recombinations in this cross have been precisely defined and only a few regions exploit a true interval mapping approach (see News item of 2005-06-17 for additional detail).
The thinner red and green lines and the right Y-axis display the additive effect size; green for high alleles inherited from one parent (DBA/2J in this example), and red for high alleles from the other parent (C57BL/6J). The units are log2 expression differences where 0.2 is equivalent to a 2^0.2-fold difference. The large number of closely packed tick marks along the top of the map show locations of genes on Chr 10. Gene blocks are color coded by the average density of SNPs per gene using a rainbow color sequence with low density in the blue/green spectrum and high density in orange/red spectrum. The bright orange hash marks along the X-axis provide a graphic estimate of numbers of SNPs that are segregating in the BXD strains in any particular chromosomal region. A long interval from 30 Mb to 65 Mb is almost identical by descent between the two parental strains.
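The conversion from the log2 scale of the right Y-axis to a fold difference is a single exponentiation. A minimal sketch (the function name is ours and is not part of GeneNetwork):

```python
def log2_diff_to_fold(diff):
    """Convert a log2 expression difference into a fold difference."""
    return 2.0 ** diff


# An additive effect of 0.2 log2 units is roughly a 1.15-fold difference.
print(round(log2_diff_to_fold(0.2), 3))
```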
Many regions of these maps are responsive to a mouse click. For example, the name and size of any gene can be determined by simply placing the mouse cursor over its mark. The same applies to the significance thresholds and the SNP track. Below each of these maps is a complete list of known genes in the interval with numerous links to other data types, including information on expression, lists of known SNPs in each gene, and corresponding regions of the human genome. All physical map positions in mouse are based on the Mouse Build 34, mm6 (March 2005).
2005-08-02: An Export Traits function button has been added to the set of tools available in each Selections Window (the Selections window is known informally as the "shopping cart"). Export Traits now joins other tools such as Cluster Tree, Network Graph, and Compare Correlates at both the top and bottom of each Selections window. Any set of traits in the Selections window can be easily exported, including conventional phenotypes, genotypes, and subsets of array data. The default output format is compatible with Microsoft Excel. (Implemented by Jintao Wang.)
2005-07-29: Rat RAE 230A and Mouse (M430 and U74A) Affymetrix Probe Set Annotation Tables have been significantly improved and realigned to rat and mouse genome assemblies. Information taken from the BLAT alignment data has been added to GeneNetwork data tables. Data types include the alignment score of concatenated probes, probe set specificity (usually the ratio of first hit score divided by second hit score), and position values of the 3' and 5' ends of the concatenated probe sequences. [Implemented by Senhua Yu (rat) and Yanhua Qu (mouse).]
2005-07-27: All Mouse Genotype Databases have now been fully updated using Wellcome-Illumina-CTC SNP data sets consisting of 13377 SNPs. These SNPs have been integrated with the older microsatellite markers used through July 2005. You can search for markers (see Advanced Search) and treat genotypes as a standard "trait." You can also align the sequence of any marker to the latest genome assembly to determine where a SNP or microsatellite is located. (Implemented by Jing Gu, Lu Lu, and Jintao Wang.)
2005-07-26: Complete Upgrade of the PUBLISHED PHENOTYPE Databases. All PubMed abstracts were searched in June and July of 2005 for publications pertaining to BXD, AXB, CXB, or BXH mouse recombinant inbred strains. Means and standard errors were collected, reviewed, and extracted from these papers. Data were then entered manually in GeneNetwork tables by Emily English and Elissa Chesler.
2005-07-26: Sorting Traits by several different variables is now possible in the Search Results page. Select from seven different ways to sort lists as shown in the screen shot below. (Implemented by Jintao Wang.)
2005-07-26: QTL Reaper tutorial has been added to the GeneNetwork site. QTL Reaper is a command line program for high throughput mapping of array data sets. (Implemented by Evan Williams.)
2005-07-25: An Error Detected and Corrected in SJUT Cerebellum databases dated March 2005. Data for BXD23 mistakenly included a BXD14 sample. All three March 2005 databases (RMA, PDNN, MAS5) have now been corrected. Values for the two affected strains are changed relative to data in this database prior to July 25, 2005. (Implemented by Jing Gu, Rob Williams, and Yanhua Qu.)
2005-07-22: Modified Linux Virtual Server configuration to eliminate problems with client institution firewall restrictions on numbers of simultaneous connections. Our thanks to Dr. Michael Miles for his help diagnosing firewall problems for clients. (Implemented by Jintao Wang.)
2005-07-21: Improved Advanced Search. It is now possible to combine search strings to generate complex queries. For example, this combination Mb=(Chr11 90 100) Mean=(12 20) when entered into the lower ALL field will find transcripts that map to Chr 11 between 90 and 100 Mb that also have mean expression between 12 and 20 units. (Implemented by Jintao Wang.)
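To illustrate how a combined query of this kind can be read as a logical AND of range filters, here is a minimal sketch. It is not the GeneNetwork parser; the record fields (chr, mb, mean) and both function names are assumptions made for the example.

```python
import re

# One term looks like Key=(arg arg ...); commas between arguments are optional.
TERM = re.compile(r"(\w+)\s*=\s*\(([^)]*)\)")


def parse_query(query):
    """Split a combined search string into {key: [arguments]}."""
    terms = {}
    for key, body in TERM.findall(query):
        terms[key] = body.replace(",", " ").split()
    return terms


def matches(record, terms):
    """Return True only if the record satisfies every term in the query."""
    if "Mb" in terms:
        chrom, lo, hi = terms["Mb"]
        if record["chr"] != chrom or not float(lo) <= record["mb"] <= float(hi):
            return False
    if "Mean" in terms:
        lo, hi = map(float, terms["Mean"])
        if not lo <= record["mean"] <= hi:
            return False
    return True


terms = parse_query("Mb=(Chr11 90 100) Mean=(12 20)")
# Hypothetical transcript records, for illustration only.
records = [
    {"chr": "Chr11", "mb": 95.2, "mean": 13.1},
    {"chr": "Chr11", "mb": 95.2, "mean": 9.0},
    {"chr": "Chr2", "mb": 95.2, "mean": 13.1},
]
hits = [r for r in records if matches(r, terms)]
```

Only the first record passes both filters.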
2005-07-15: GeneNetwork Mouse SNP Browser has been implemented. The SNP Browser is a tool that will eventually be used in combination with the Interval Analyst to evaluate and rank genes and polymorphisms in intervals thought to be responsible for variation in traits. The SNP Browser includes all Celera Genomics mouse SNPs, all public mouse SNPs in dbSNP (as of August 2005), and all Perlegen-NIEHS SNPs (http://mouse.perlegen.com/mouse/download.html as of late June 2005). The SNP Browser is still at an early stage of development. We thank Paul Thomas and Richard Mural of Celera Genomics, Gary Churchill and Natalie Blade of the Jackson Laboratory, and the Perlegen/NIEHS sequencing consortium for help and access to data. (Implemented by Robert Crowell, Alex Williams, and Jintao Wang.)
An example: To search for SNPs on Chr 5 from X to Y Mb:
2005-07-15: Access to GeneNetwork Archive site. The archive site provides access to old data sets and old genotype files that have now been superseded. We anticipate that it will be used mostly to verify old findings and to document changes in results. The Archive is now available from the main search page. (Implemented by Jintao Wang.)
2005-07-13: LXS Genotypes Upgraded. Genotypes for the large GRP of LXS strains have been greatly improved thanks to the Illumina-Wellcome-CTC SNP project. The original set of 330 markers has been replaced with a set of 2659 informative markers. Download either the LXS genotypes or BXD genotypes used by WebQTL as text files. (Implemented by Jing Gu and Jintao Wang.)
2005-07-12: Search Page Upgraded. Users now can change the default settings to those they most commonly use. Your browser must be configured to allow The GeneNetwork to retain a "cookie" on your computer. We have also added a new button labeled ADVANCED SEARCH that provides advice and syntax for searches. (Implemented by Jintao Wang.)
2005-07-12: Pair-Scan Upgraded. The pair scan now exploits the new Wellcome-Illumina high density genotype files. This results in more exhaustive searches for two-locus interactions. This is particularly true when single chromosome pairs are scanned by clicking on the initial DIRECT output graph. (Implemented by Jintao Wang.)
2005-07-12: Updated Affymetrix M430 GeneChip Annotation Data. We have realigned all M430 probes and probe set sequences onto the latest mouse assembly (Build 34 or mm6). This annotation is more complete than most other available M430 probe set annotations of which we are aware, including Affymetrix NetFX. (Implemented by Yanhua Qu.)
2005-06-17: New High Density Mapping Algorithm that exploits the Wellcome-CTC SNP data has been implemented for the BXD mouse genetic reference populations on both public and beta sites. In the case of the BXD panel (BXD1 through BXD100), the merged SNP and microsatellite maps are based on a total of 7636 informative markers that differ between the parental strains, C57BL/6J (B) and DBA/2J (D). The locations of these markers are known on the latest assembly of the mouse genome (Build 34, mm6). The median distance between markers in this subset is 178,831 bp. The mean distance is 324,493 bp. There are only 26 intervals between markers that are longer than 5 Mb. No interval is greater than 10 Mb except on Chr X. These long intervals are essentially monomorphic between the parental strains.
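Spacing statistics of this kind (median gap, mean gap, number of long gaps) can be computed from an ordered list of marker positions. The sketch below assumes positions in bp for a single chromosome; the function name and the toy positions are illustrative and are not taken from the BXD marker file.

```python
from statistics import mean, median


def spacing_summary(positions_bp, long_gap_bp=5_000_000):
    """Summarize distances between neighboring markers on one chromosome."""
    ordered = sorted(positions_bp)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return {
        "median": median(gaps),
        "mean": mean(gaps),
        "long_gaps": sum(gap > long_gap_bp for gap in gaps),
    }


# Toy positions only; the real map has thousands of markers.
summary = spacing_summary([0, 100_000, 300_000, 6_300_000])
```

A genome-wide summary would pool the gaps from every chromosome before taking the median and mean.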
The new algorithm exploits a selected subset of 3795 markers that includes all markers with unique strain distribution patterns (SDP), as well as pairs of markers (the most proximal and most distal markers) for SDPs represented by two or more markers. This BXD genotype data set can be downloaded by ftp at ftp://atlas.utmem.edu/public/BXD_WebQTL_Genotypes_June05.txt.
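One way to read this selection rule is as follows: walk along a chromosome, group neighboring markers that share an SDP, and keep the first and last marker of each group. The sketch below makes that assumption (runs of adjacent markers, one chromosome at a time); the marker names and SDP strings are made up.

```python
from itertools import groupby


def select_markers(markers):
    """Keep the proximal and distal marker of each run of identical SDPs.

    markers: list of (name, sdp) tuples ordered by position on one chromosome.
    """
    kept = []
    for _, run in groupby(markers, key=lambda marker: marker[1]):
        run = list(run)
        kept.append(run[0])        # most proximal marker of the run
        if len(run) > 1:
            kept.append(run[-1])   # most distal marker of the run
    return kept


markers = [
    ("m1", "BBD"), ("m2", "BBD"), ("m3", "BBD"),
    ("m4", "BDD"),
    ("m5", "DDD"), ("m6", "DDD"),
]
subset = select_markers(markers)
```

Here m2 is dropped because it lies between two markers with the same SDP and adds no information.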
The mapping algorithm is a mixture of simple marker regression, linear interpolation, and standard Haley-Knott interval mapping. If two adjacent markers have identical SDPs they will have identical linkage statistics, as will the entire interval between the markers (assuming complete and error-free haplotype data for all strains). On a physical map the LRS and the additive effect values will therefore be constant over this interval. Between neighboring markers that are separated by 1 cM or more we use a conventional interval mapping method (Haley-Knott) combined with a Haldane estimate of genetic distance. When the interval is less than 1 cM we simply interpolate linearly based on a physical scale between the markers. The result of this mixture mapping algorithm is a map of the trait that has an unusual profile that is particularly striking on a physical (Mb) scale, with many plateaus, abrupt linear transitions between plateaus, and a few regions with the standard graceful curves typical of interval maps.
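The three cases described above can be sketched as a small decision rule plus two helper formulas. This is an illustration under stated assumptions, not the WebQTL code: the function names are ours, the Haley-Knott regression itself is not shown, and the Haldane function is the standard map function for a distance in cM.

```python
import math


def haldane_recombination(distance_cm):
    """Haldane map function: recombination fraction for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def linear_interpolate(x, x0, y0, x1, y1):
    """Interpolate a statistic (e.g. LRS) at position x between two markers."""
    if x1 == x0:
        return y0
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def method_for_interval(sdp_left, sdp_right, distance_cm):
    """Choose how the interval between two neighboring markers is filled in."""
    if sdp_left == sdp_right:
        return "plateau"        # identical SDPs give a constant LRS
    if distance_cm >= 1.0:
        return "haley-knott"    # conventional interval mapping
    return "linear"             # interpolate on the physical (Mb) scale


# LRS halfway between markers at 90 Mb (LRS 10) and 100 Mb (LRS 20).
midpoint_lrs = linear_interpolate(95.0, 90.0, 10.0, 100.0, 20.0)
```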
The same procedure will soon be implemented for other mouse GRPs, including AXB/BXA, CXB, BXH, and AKXD.
For users that would like reference access to the old set of genotypes, we will set up an Archive site with the May 2005 microsatellite markers and maps.
To download the combined SNP and microsatellite genotype file used in WebQTL please link to ftp://atlas.utmem.edu/public/ and look for Illumina_UT_BXD_May05.xls (entire data set) or BXD_WebQTL_Genotypes_June05.txt (extracted subset of markers used by WebQTL), or link to Dr. Richard Mott's Mouse Inbred Line Genotype site for the original SNP data set. (Implemented by RW Williams, KF Manly, and JT Wang.)
2005-06-13: Rat HXB Fat Data Set released on the www.genenetwork.org/search3.html test site (stabilized RMA transform). The Affymetrix RAE230A data files generated by Tim Aitman and colleagues were downloaded from the Array Express site. The set of 120+ arrays covers a total of 30 RI strains and complements a recent paper (Hübner et al., 2005). Error checking is still in progress and this is a pre-release data set to use for test purposes. (Implemented by Senhua Yu, R. Williams, and Jintao Wang. More transforms are in progress.)
2005-06-12: Moved GeneNetwork and Upgraded Utilities. The GeneNetwork and the WebQTL module have been moved to a cluster of nine P4 single processor computers. Eight of the nodes are devoted to the GeneNetwork application code while the ninth node runs the Linux virtual server. The MySQL database server currently runs on a separate Proliant dual processor node. The Roundup issue tracking system has been upgraded to v. 0.83 and is now available at http://www.genenetwork.org:8080/webqtl/. Analog has also been upgraded to v 6.0. (Implemented by Jintao Wang, with thanks again to Ari Berman.)
2005-05-24: Ultra-high Resolution Mouse SNP Genetic Maps are now gradually replacing the previous generation of microsatellite maps. Until May 2005, all genetic maps of recombinant inbred strains of mice in WebQTL have relied heavily on a set of roughly 1500 microsatellite markers genotyped across all RI sets by the Informatics Center for Mouse Neurogenetics (Williams et al., 2001; Peirce, Lu et al, 2004). In collaboration with members of the CTC (Richard Mott, Jonathan Flint and colleagues), we have helped genotype a total of 480 strains using a panel of 13,377 SNPs. More than half of the SNPs are informative in most crosses. These SNPs have been combined with microsatellites to produce new consensus maps for BXD and other GRPs using the latest mouse genome assembly as a reference frame (Build 34 - mm6). In the case of the BXD GRP, a total of 88 strains were genotyped using the full set of SNPs of which 7482 are informative. The order of markers given in WebQTL is essentially the same as that given in Build 34. To reduce false positive errors when mapping using this ultradense map, we have eliminated most single genotypes that generate double-recombinant haplotypes. Double-recombinant haplotypes are most commonly produced by typing errors ("smoothed" genotypes). (Implemented by Lu Lu, Jing Gu, Jintao Wang, Ken Manly, and Rob Williams, with help from Jonathan Flint and Richard Mott).
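A single genotype that differs from both of its neighbors, where the neighbors agree with each other, implies a double recombinant. The sketch below flags such genotypes along one strain's chromosome and masks them as unknown. Whether the real pipeline masks the call or replaces it with the flanking genotype is not stated above, so the masking, the "U" code, and the function name are all assumptions.

```python
def mask_single_double_recombinants(genotypes, missing="U"):
    """Mask lone genotypes flanked on both sides by the opposite genotype.

    genotypes: sequence of calls (e.g. "B" or "D") for one strain,
    ordered by marker position along one chromosome.
    """
    cleaned = list(genotypes)
    for i in range(1, len(genotypes) - 1):
        left, mid, right = genotypes[i - 1], genotypes[i], genotypes[i + 1]
        if missing in (left, mid, right):
            continue
        if left == right and mid != left:
            cleaned[i] = missing
    return cleaned


# The lone D at the third marker is treated as a probable typing error.
cleaned = "".join(mask_single_double_recombinants("BBDBBDDD"))
```

The true recombination between the fifth and sixth markers is left untouched because the genotypes on either side of it differ.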
2005-05-23: Search Functions have been upgraded. It is now possible to find (1) all transcripts whose genes map to a given chromosomal location; (2) all traits and transcripts that have a mean value within a particular range; (3) all traits that have a peak genome-wide linkage score (LRS score or p value) within a particular range. These new search functions are still being tested on the test site (http://www.genenetwork.org/search3.html). (Implemented by Jintao Wang).
(1) To find transcripts by chromosomal position the search syntax needs to follow these rules:
- "Position in (ChrY 0.3 52.4)" or "Position = (Chr1, 98 104)" [Note: No space between "Chr" and the number or letter of the chromosome. ]
- "Pos in (ChrY 0.3 52.4)" or "Pos =(Chr1, 98 104)" [don't enter the quotes.]
- "Mb in (ChrY 0.3 52.4)" or "Mb = (Chr1, 98 104)" [don't enter the quotes.]
(2) To find traits by mean value, the search syntax needs to follow these rules:
- "Mean in (12.3, 12.4)" or "Mean=(12.3, 12.4)" [These strings will find those traits with a mean value from 12.3 to 12.4. Don't enter the quotes.]
(3) To find traits by LRS value or p value, the search syntax needs to follow these rules:
- "LRS in (20, 30)" or "LRS=(20, 30)" [These strings will find traits with LRS values ranging from 20 to 30. This search depends on the existence of database of precomputed LRS values. If this database has not yet been set up for a particular data set, then the search will not return any records. Don't enter the quotes.]
- "pvalue in (0.0001, 0.001)" or "pvalue=(0.0001, 0.001)" [These strings will find traits with p values ranging from 0.0001 to 0.001. This search depends on a database of precomputed values. If this database has not yet been set up for a particular data set, then the search will not return any records. Don't enter the quotes.]
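The syntax rules listed above can be summarized as a small normalizer that accepts either "in" or "=" and maps Position, Pos, and Mb to one key. This is a sketch of the rules as written here, not the actual search code; the function name, the returned tuple layout, and the case-insensitive matching are assumptions.

```python
import re

# Keyword aliases described in the rules above, mapped to one canonical key.
ALIASES = {
    "position": "Mb", "pos": "Mb", "mb": "Mb",
    "mean": "Mean", "lrs": "LRS", "pvalue": "pvalue",
}
PATTERN = re.compile(r"^\s*(\w+)\s*(?:=|in)\s*\(([^)]*)\)\s*$", re.IGNORECASE)


def normalize(term):
    """Turn one search term into (key, ...) with numeric bounds as floats."""
    match = PATTERN.match(term)
    if match is None:
        raise ValueError("unrecognized search term: %r" % term)
    key = ALIASES.get(match.group(1).lower())
    if key is None:
        raise ValueError("unknown keyword: %r" % match.group(1))
    args = match.group(2).replace(",", " ").split()
    if key == "Mb":
        if len(args) != 3:
            raise ValueError("expected chromosome, start, and end")
        return key, args[0], float(args[1]), float(args[2])
    if len(args) != 2:
        raise ValueError("expected lower and upper bounds")
    return key, float(args[0]), float(args[1])


position = normalize("Pos =(Chr1, 98 104)")
lrs = normalize("LRS in (20, 30)")
```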
2005-05-13: Virtual Server implementation of The GeneNetwork is being beta tested. The Linux Virtual Server (LVS) allows GeneNetwork to exploit a small cluster of servers to handle larger numbers of clients quickly. Performance is particularly critical during bioinformatics class projects when large numbers of students make nearly simultaneous requests. (Implemented by Jintao Wang, Senhua Yu, and Ari Berman).
2005-05-12: Genome Explorations Inc. has been provided a license to run a copy of the GeneNetwork and WebQTL software as part of a Phase I Small Business Innovation Research (SBIR) grant from NIAAA. The TCP/IP address is 188.8.131.52. The site currently contains three data sets (MAS5, RMA, and PDNN) generated at GE and UTHSC (subcontractor) using a total of 85 Affymetrix M430 2.0 arrays. The first data release consists of 26 BXD strains, the two parental strains, C57BL/6J and DBA/2J, and ten other inbred strains of mice (A/J, 129S1/SvJ, AKR/J, BALB/cJ, BALB/cByJ, C3H/HeJ, CAST/Ei, KK/HIJ, LG/J, and NOD/J). (Implemented by Jintao Wang, Yanhua Qu, Lu Lu, Robert Williams, Robert Rooney, and Divyen Patel).
2005-05-10: Whole Transcriptome Mapping Display: We are testing an interface that displays an entire transcriptome QTL map for a tissue similar to figures 3A and 3B of Chesler and colleagues (2005). Note that one parameter can be used to modify the false discovery rate of the points that are plotted. Plots have been precomputed for more than 30 databases and transforms. (Implemented by Jintao Wang).
2005-05-04: New Mouse Genome Assembly (NCBI Build 34, UCSC mm6) released by NCBI (implemented by Deanna Church and colleagues). Over the next several months all mouse genome megabase and nucleotide position data and links in the GeneNetwork (markers, probes, SNPs, genes) will be converted to this new assembly. BLAT searches initiated with WebQTL already exploit the most recent build. GeneNetwork users may find small discrepancies in gene and marker locations until all database tables are updated.
2005-04-22: Arabidopsis Data Sets released on the www.genenetwork.org/search3.html test site. The Genotypes and Phenotypes files for the Bay-0 x Shahdara cross data were all provided by Olivier Loudet. Please see the Information file. Implemented by O. Loudet, R. Williams, and Jintao Wang.
2005-04-21: Rat HXB Kidney Data Set released on the www.genenetwork.org/search3.html test site (original RMA transforms). The Affymetrix RAE230A data files were provided by Norbert Hübner and colleagues. The set of 120+ arrays covers a total of 30 RI strains and complements a recent paper (Hübner et al., 2005). Implemented by Senhua Yu, R. Williams, and Jintao Wang. More transforms are in progress (MAS5 added May 13, 2005).
2005-04-14: New S-Score Transform for the BXD Brain data set released on the www.genenetwork.org/search3.html test site. This data set complements existing MAS5, PDNN, RMA, dCHIP, and HWTIPM transforms. The Significance score method centers the expression of every probe set at 0. The signal values are therefore the strain deviations in Z score units from the grand mean based on 100 arrays. The S-score software is described in Zhang et al. (2002) and Kerns et al. (2003).
2005-04-08: Expanded HBP/Rosen Striatum Data Sets released on the www.genenetwork.org/search3.html test site (MAS5, RMA, and PDNN transforms). The new data set covers a total of 33 strains using 59 M430 2.0 arrays. A good demonstration of the improved performance of the expanded data set is Kcnj9 (probe set 1450712_at_A), a known cis-QTL in multiple data sets. This trait generates a peak LRS score of 27.0 in the initial November 2004 data set (MAS5) and a peak LRS of 47.8 in the April 2005 data set (MAS5). The peak LRS is approximately 600 Kb proximal to the Kcnj9 gene. The Heritability Weight Transform (HWT) data set will be added in the next several weeks.
2005-04-04: Expanded INIA Brain Data Sets released on the www.genenetwork.org/search3.html test site (MAS5, RMA, and PDNN transforms). Seventy-one new samples have been added, bringing the total to 105 arrays covering 42 BXD strains, both parents, and the F1 hybrid. A good demonstration of the improved performance of the expanded data set is Kcnj9 (probe set 1450712_at_A), a known cis-QTL in multiple data sets. This trait generates a peak LRS score of 14 in the initial October 2004 data set (MAS5) and a peak LRS of 41.9 in the April 2005 data set (MAS5). The peak LRS is approximately 2000 Kb distal to the Kcnj9 gene. We have also tested these data using probe set 1418908_at_A (Pam). This trait generates a peak LRS score of 52.8 in the initial October 2004 data (MAS5) and a peak LRS of 54.2 in the April 2005 (MAS5). The peak LRS is approximately 800 Kb distal to Pam gene. The Heritability Weight Transform (HWT) transform will be added in the next several weeks.
2005-03-21: Expanded Cerebellum Data Sets released on the www.genenetwork.org/search3.html test site (MAS5, RMA, and PDNN transforms). Fifty-four new samples have been added. We have tested these data using probe set 1418908_at_A (Pam), a known cis-QTL in multiple data sets. This trait generates a peak LRS score of 31.7 in the initial March 2003 data (MAS5), a peak LRS score of 32.3 in the October 04 data (MAS5), and a peak LRS of 52.2 in the March 2005 (MAS5). In the March 2005 data, the peak LRS is only 500 Kb from the 5' promoter region of the Pam gene. The abundantly expressed GABA alpha 6 receptor (Gabra6) transcript (1417121_at_A) is another good test case of a cis modulated trait in cerebellum. (Implemented by the GeneNetwork group and the Cerebellum Consortium). The Heritability Weight Transform (HWT) data set will be added in the next several weeks.
2005-03-15: Cluster Trees now compute and display up to 100 traits simultaneously. This makes it possible to select the top 100 covariates of a trait from a Correlation Results table and map all 100 as a hierarchically organized group. (Implementation by Jintao Wang).
2005-03-04: Literature Correlation data set has been integrated into GeneNetwork Correlation Results output tables. This important new feature provides an estimate of the strength of relations between pairs of genes that is based on a textual analysis of PubMed abstracts (latent semantic index correlations). Values are based on a matrix of 16,000 gene-gene similarity scores computed by Ramin Homayouni (UTHSC) and Michael Berry (UT Knoxville). This feature is still experimental, and GeneNetwork users should note that pairs of genes that are mentioned together in a small set of papers may have inappropriately high correlations. For more information on the algorithm please contact Ramin Homayouni. (Implementation by Ramin Homayouni and Jintao Wang).
2005-03-01: Network Graph output has been improved significantly. It is now possible to change the labels from probe set IDs to gene symbols. Nodes can also be color-coded by database. Markers and genotypes can be used as nodes. Literature Correlations can be used to define the lines (edges) between traits. (Implementation by Jintao Wang).
2005-03-01: Heritability Weighted Transform method has been published in Genome Biology. This method (HWT1PM) provides significantly higher signal than other common transforms. (Design and implementation by Ken Manly)
2005-02-23: Database Schema has been published online at http://www.genenetwork.org/schema.html. This schema (January 2005 version) was generated using MySQLdump v 9.1. (Implementation by Jintao Wang, Bill Bug, and Ken Manly)
2005-02-23: Scriptable Interface improved to handle queries from Genome Browser and other systems. The new interface provides a list of links to data from multiple tissues and strains for a single gene. For example, to retrieve expression estimates for Kcnj8 the URL query has this form: http://www.genenetwork.org/cgi-bin/beta/WebQTL.py?cmd=search&gene=kcnj8. This query does not resolve the many possible aliases for gene symbols, and requires the use of the preferred or official gene symbol. (RWW, implementation by Jintao Wang)
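A query URL of this form can be assembled programmatically. The sketch below only builds the string from the base URL and the two parameters shown above; it does not contact the server, and the function name is ours. No other parameters are assumed to exist.

```python
from urllib.parse import urlencode

# Base URL and parameter names are copied from the example above.
BASE_URL = "http://www.genenetwork.org/cgi-bin/beta/WebQTL.py"


def gene_search_url(symbol):
    """Build the search URL for one official gene symbol."""
    return BASE_URL + "?" + urlencode({"cmd": "search", "gene": symbol})


url = gene_search_url("kcnj8")
```

Because aliases are not resolved, the caller is responsible for passing the preferred or official symbol.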
2005-01-27: QTL Reaper 1.0.0 has been released. QTL Reaper is a platform-independent program for rapidly mapping thousands of traits. It is now available to advanced users at SourceForge (241 KB, written in Python and C with sample and help files). QTL Reaper can map well over 50,000 traits in under 12 hours on fast single-processor systems. It includes a sophisticated method (Besag and Clifford, 1991) to adjust the number of permutation tests to estimate genome-wide p values with reasonable precision down to values of approximately 10^-5 (10^6 permutations). This feature is useful for identifying reproducible QTLs in large transcriptome data sets, that is, sets of QTLs with defined false discovery rates. (Design by Ken Manly, implementation by Jintao Wang)
Besag J, and Clifford P (1991). Sequential Monte Carlo p-values. Biometrika 78: 301-304.
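The idea behind a sequential Monte Carlo p value is to stop permuting early when a trait is clearly not significant and to spend many permutations only on promising traits. The sketch below follows the stopping rule of Besag and Clifford as we understand it; it is not the QTL Reaper code, and the function name, the default values of h and max_draws, and the draw_null callback are assumptions.

```python
import random


def sequential_p_value(observed, draw_null, h=10, max_draws=999, rng=None):
    """Sequential Monte Carlo p value.

    draw_null(rng) returns one statistic simulated under the null, for
    example the peak LRS of a permuted trait. Sampling stops as soon as
    h null values are at least as large as the observed statistic.
    """
    rng = rng or random.Random()
    exceed = 0
    for n_drawn in range(1, max_draws + 1):
        if draw_null(rng) >= observed:
            exceed += 1
            if exceed == h:
                return h / n_drawn      # stopped early
    return (exceed + 1) / (max_draws + 1)   # used the full budget


# Deterministic stand-ins for a permutation routine, for illustration only.
always_larger = sequential_p_value(0.5, lambda rng: 1.0)
never_larger = sequential_p_value(1.0, lambda rng: 0.0)
```

With a budget of 10^6 permutations the smallest p value this scheme can report is on the order of 10^-6, which is consistent with reasonable precision near 10^-5.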
2005-01-26: The Pair-scan output tables now include a new analytic tool that provides a breakdown of strains in each genotype category (for example, the four two-locus genotypes: B/B, B/D, D/B, and D/D) either in the form of scatter plots or in the form of a box plot. This new feature is still being tested and refined and is currently available only on the test site (www.genenetwork.org/search3.html). This feature will be moved to the public site in February. (Implementation by Jintao Wang)
2005-01-22: Marker Genotype Databases have been added that complement trait and transcriptome databases for the following groups: AKXD, AXB/BXA, CXB, BXH, BXD, LXS, B6D2F2, and the rat HXB/BXH. These new databases enable you to use any marker genotype as a "trait" to search for transcripts or classical phenotypes that may be influenced by particular genomic regions. This is now possible using the new Genotype databases and the Compare Correlates tool. To find all markers on Chromosome 1 just type in "Chr 1" or "Chromosome 1" into the Search field. These marker genotype databases are currently available on the test site (www.genenetwork.org/search3.html) but will be moved to the public site by late January. (Implementation by Jing Gu, Lu Lu, Yanhua Qu, Rob Williams, and Jintao Wang)
2005-01-21: New Data Download feature has been added. The Information files for most UTHSC Brain databases (e.g., the RMA Orig transform) now have links to Excel workbooks that include the full Affymetrix U74Av2 data set of 100 arrays for each transform. These Excel workbooks also include a separate spreadsheet with the strain averages for each transform. Look for the word "Download" in the Information pages. (Implementation by Yanhua Qu)
2005-01-13: We have added a new BLAST probe analysis tool to the Probe Information tables associated with each Affymetrix probe set. This button-tool aligns any PM 25-mer probe to the GenBank sequence that Affymetrix lists as being the sequence source. When BLAT analysis of concatenated probes does not provide an unequivocal map location for a probe set, this method can be used to verify that the GenBank accession is correct. If so, it may then be appropriate to BLAT the entire GenBank entry to verify probe set map location. (Implementation by Yanhua Qu)
2005-01-11: Rat HXB/BXH Published Phenotype databases added to the GeneNetwork. The genetic maps that are used in combination with these phenotypes are based on a total of 770 markers. Phenotypes were all provided by Michal Pravenec. We thank Tim Aitman and Pierre Mormede for review of their data sets. (Implementation by RWW, MP, and JW)
2005-01-03: We now provide links to entire data files for the U74Av2 brain data set. All DAT, CEL, TXT, RPT, and EXP files can be downloaded. For example, here are data files for five C57BL/6J U74Av2 arrays. The complete U74Av2 data set consists of a total of 100 arrays, all of which can be reached from the Main Table in any of the Information Pages for these different transforms (MAS5, RMA, PDNN, HWT1PM, dChip). The DAT, CEL, RPT and EXP files will be identical among all transforms. The only differences among transforms are the TXT files. The appropriate reference to cite if you make use of these data files is:
Chesler EJ, Lu L, Shou S, Qu Y, Gu J, Wang J, Hsu HC, Mountz JD, Baldwin N, Langston MA, Threadgill DW, Manly KF, Williams RW (2005) Genetic dissection of gene expression reveals polygenic and pleiotropic networks modulating brain structure and function. Nature Genetics 37: 233-42.
2004-12-25: First draft of the WebQTL Glossary is completed. Many key terms are now defined. We will be adding links to the glossary from graphs and other pages.
2004-12-22: An annotated Links list has been added. Email RW Williams if you have suggestions for additional sites that have proved useful in combination with GeneNetwork resources.
2004-12-21: We have implemented a new method of transforming Affymetrix microarray data called the Heritability Weighted Transform (Manly et al. 2005). When used with large Affymetrix data sets of the type used by WebQTL, this method is considerably more powerful than other common probes-to-probeset transforms such as MAS5, PDNN, RMA, or dChip. To evaluate this new method please try the Mouse/BXD/Brain/Database called UTHSC Brain mRNA U74Av2 (Dec03) HWT1PM (HWT1PM is short for Heritability Weighted Transform Version 1, Perfect Match Probes only). For further details on this method see the Info page. The reference for this approach to transforming Affymetrix array data is:
Manly KF, Wang J, Williams RW (2005) Weighting by heritability for detection of quantitative trait loci with microarray estimates of gene expression. Genome Biology 6: R27.
2004-12-17: We have added mouse UniGene identifiers from Build 142. It is therefore now possible to enter search terms such as "Mm.1" to find data on S100 calcium binding protein A10 (S100a10). A total of 38,034 probe sets on the Affymetrix mouse expression array 430 2.0, have UniGene identifiers.
2004-12-14: First draft of the WebQTL Frequently Asked Questions is completed. We will be happy to answer any other questions you have. Please email RW Williams.
2004-12-13: Major additions are expected later in December in both the SJUT Cerebellum data set and in the INIA Brain data set. Sample size will be almost doubled in both data sets.
2004-12-10: Updated positions of Mouse Expression Agilent G4121A probes using the May 2004 (mm5) assembly of the mouse genome. This work was carried out by Yanhua Qu.
2004-12-10: We have begun to combine WebQTL and The GeneNetwork. WebQTL is the first and so far only "channel" of the GeneNetwork. However, our hope is that there will soon be other projects that will share use of the GeneNetwork. The main URL is now www.genenetwork.org. Requests to www.webqtl.org will resolve to www.genenetwork.org.
2004-12-03: Rat HXB/BXH genotype and published phenotype databases added to beta test site of WebQTL. The genetic maps are based on a total of 770 markers. Phenotypes were all provided by Dr. Michal Pravenec.
2004-12-02: Important new graphic and analytic tools have been added.
The first of these is the Compare Correlates tool. This function is available in Selection Windows. It is essentially a Venn diagram set tool. Instead of providing simple graphs, it provides lists of traits in different parts of a virtual Venn diagram. For example, to find traits that covary with Sonic Hedgehog, Indian Hedgehog, Desert Hedgehog, Patched1, and Gli3, you would select five key transcripts into a Selections window (use the "Add Selection" tool and then select a group of traits in the Selections window). Compare Correlates allows you to choose the target database to which the key traits will be correlated. Compare Correlates was designed by Elissa Chesler and Stephen Pitts. Code was written and optimized by Stephen Pitts.
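To illustrate the Venn diagram idea behind Compare Correlates, here is a minimal Python sketch. It is not the WebQTL code: the function name venn_regions, the trait names t1 to t5, and the correlate lists are all hypothetical, and it assumes that the list of correlates for each key trait has already been computed.

```python
def venn_regions(correlates):
    """Group traits by the exact set of key traits they correlate with.

    `correlates` maps each key trait to the set of traits that covary
    with it (hypothetical input; how the lists are produced is not shown).
    Returns a dict: frozenset of key traits -> set of traits in that
    region of the virtual Venn diagram.
    """
    regions = {}
    all_traits = set().union(*correlates.values())
    for trait in all_traits:
        # The key traits whose correlate lists contain this trait
        members = frozenset(k for k, hits in correlates.items() if trait in hits)
        regions.setdefault(members, set()).add(trait)
    return regions


# Hypothetical correlate lists for three key transcripts
correlates = {
    "Shh": {"t1", "t2", "t3"},
    "Ihh": {"t2", "t3", "t4"},
    "Gli3": {"t3", "t5"},
}
regions = venn_regions(correlates)

# Print the regions, most widely shared first
for keys, traits in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(keys), sorted(traits))
```

In this toy input, only t3 falls in the region shared by all three key traits, while t1, t4, and t5 each belong to a single key trait.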
The second new tool is Network Graph. This function displays a set of traits and their correlations in the form of a graph with nodes (traits) and lines (correlations). There are quite a few tunable parameters, including the correlation threshold used to draw (or not draw) a line between nodes. To use this new tool, you again need to have traits loaded into one of the Selections windows. Network Graph was designed by Elissa Chesler and Stephen Pitts. Code was written, optimized, and error-checked by Stephen Pitts.
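The role of the correlation threshold in Network Graph can be sketched as follows. This is an illustration only, not the WebQTL implementation: the function names, the trait values, the use of Pearson correlation, and the choice to compare the absolute value of the correlation against the threshold are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def network_edges(traits, threshold=0.5):
    """Return (trait_a, trait_b, r) for every pair with |r| >= threshold.

    Assumption: a line is drawn for strong positive or negative
    correlations; pairs below the threshold get no line.
    """
    edges = []
    for a, b in combinations(sorted(traits), 2):
        r = pearson(traits[a], traits[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical trait values across four strains
traits = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],    # rises with A
    "C": [4.0, 3.0, 2.0, 1.0],    # falls as A rises
    "D": [1.0, -1.0, -1.0, 1.0],  # unrelated to A, B, and C
}
edges = network_edges(traits, threshold=0.5)
for a, b, r in edges:
    print(f"{a} -- {b}  r = {r:+.2f}")
```

Raising the threshold removes lines between weakly correlated nodes; in this toy input, trait D stays unconnected at any positive threshold.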
2004-10-24: Updated positions of all Mouse Expression U74Av2, 430A, 430B, and 430 2.0 probe sets using the May 2004 (mm5) assembly of the mouse genome. This work was carried out by Yanhua Qu. The M430 data consists of 45,000 probe sets. Positions were obtained using a series of methods: Method 1. A BLAT analysis of the actual probe sequence using a 48-processor cluster (our thanks to Yan Cui). Roughly 90% of all probe sets were mapped using this method. If the probe sequence did not BLAT with a score above 99 AND an identity match of 100, then we used Method 2: We used the position of the probe set given in the affMOE430.txt.gz data file. This method recovered position data for approximately 5% of all probe sets. If Method 2 failed, then we used Method 3: We obtained the position given by Affymetrix in the files called "MOE430A Annotations, CSV (6.3 Mb, 10/12/04)" and "MOE430B Annotations, CSV (3.9 Mb, 10/12/04)". This method recovered positions for roughly 4% of probe sets. As a last resort we used Method 4: We retained position data from mm4 or mm3 without interpolation. No position data could be found for 198 records and no chromosome could be found for 46 probe sets. We estimate that 5 to 10% of position data are unreliable.
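The four-method fallback described above can be sketched in Python as follows. This is a simplified illustration, not the script that was actually used: the function names, the dictionary layouts, and the example probe set names and positions are all hypothetical, and the BLAT result is treated as one pre-computed hit per probe set.

```python
def blat_lookup(probe_set, blat_hits):
    """Method 1: accept a BLAT hit only if score > 99 and identity == 100."""
    hit = blat_hits.get(probe_set)
    if hit and hit["score"] > 99 and hit["identity"] == 100:
        return hit["position"]
    return None


def resolve_position(probe_set, blat_hits, ucsc_table, affy_annotations, old_positions):
    """Try each source in order and return (method_name, position).

    Returns (None, None) when no source has a position for the probe set.
    """
    methods = [
        ("BLAT", lambda: blat_lookup(probe_set, blat_hits)),
        ("affMOE430.txt.gz", lambda: ucsc_table.get(probe_set)),
        ("Affymetrix CSV", lambda: affy_annotations.get(probe_set)),
        ("mm4/mm3", lambda: old_positions.get(probe_set)),
    ]
    for name, lookup in methods:
        position = lookup()
        if position is not None:
            return name, position
    return None, None


# Hypothetical inputs: positions are (chromosome, megabases)
blat_hits = {
    "ps1": {"score": 120, "identity": 100, "position": ("1", 10.5)},
    "ps2": {"score": 80, "identity": 100, "position": ("2", 3.0)},  # score too low
}
ucsc_table = {"ps2": ("2", 3.1)}
affy_annotations = {"ps3": ("3", 7.0)}
old_positions = {"ps4": ("4", 1.0)}

for ps in ["ps1", "ps2", "ps3", "ps4", "ps5"]:
    print(ps, resolve_position(ps, blat_hits, ucsc_table, affy_annotations, old_positions))
```

Recording the name of the method that supplied each position makes it easier to flag the less reliable records later, for example those carried over from mm4 or mm3.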
2004-10-16: Expression data set for the striatum of BXD strains released by Glenn Rosen to the www.webqtl.org/search3.html beta site. This is the first WebQTL database that exploits the Mouse Expression 430 2.0 array from Affymetrix. Four versions were released: MAS5, RMA, PDNN, and the new GCRMA.
2004-10-11: New hierarchical Search Page interface released to main site (Choose species, cross, type, and database). New Info pages released. More complete annotation and explanation of the use of the pair-scan data is now provided when the "permutation" option is selected in the Analysis Tools area of the Trait Data and Editing Form.
2004-09-22: Pair-scan feature is now zoomable. Click on any single chromosome pair region to zoom in.
2004-08-20: Pair-scan permutation test is now available. It takes 90 seconds to do 500 permutations.
2004-07-15: A new Pair-scan function, which searches for pairs of chromosomal regions that may be involved in two-locus epistatic interactions, has been added to WebQTL.
2004-06-07: Interval mapping graph in 2X resolution is now available for downloading.
2004-06-02: Three new B6D2F2 databases have been added to WebQTL. Dominance estimation for interval mapping with F2 data is available.
2004-05-03: Cluster QTL map display has been added to WebQTL. These QTL heat maps can be drawn using three different color assignments.
2004-03-18: Users can now add their own traits to selections. The correlation matrix, multiple mapping, and some other features can be used with those traits.
2005-07-15: Database List Selector has been implemented for the administrator. This facility is used to select the best databases to use by external resources that link to the GeneNetwork. (Implemented by Jintao Wang.)
Information about this text file:
This text file was originally generated by RWW, March 2004.