
GADGET User Manual

GADGET is a tool that lets you find genes and metabolites associated with a given query in the biomedical literature.

Search for genes

When you search for genes, GADGET first finds PubMed abstracts that match your query. From the set of abstracts found, GADGET compiles a list of genes that are overrepresented in those abstracts, and thus related to your query. (If you've entered a keyword query, GADGET checks each abstract for your keywords. If you've entered gene symbols, GADGET checks several data sources to find abstracts that mention your genes.)

For each gene in the search results, you can see the abstracts that link this gene to your query by clicking the "Show abstracts" link.

You can download gene search results as a CSV file. Downloaded results will contain all of the genes that match your query, not only the top-scoring genes displayed on the page.

Search for metabolites

GADGET's metabolite search shows you a list of metabolites related to your query. It works the same way as the gene search: it first finds a set of abstracts related to your query, and then compiles a list of metabolites from those abstracts (see "Search for genes" above for details).

Searching by keyword

GADGET can find genes and metabolites that match any set of keywords or keyphrases, by finding PubMed abstracts that contain the keywords and keyphrases in your query.


GADGET supports Boolean queries for more specific searches. Terms can be separated with AND and OR. An AND query will match abstracts that contain both terms, while an OR query will match abstracts that contain either term. Terms separated by spaces are automatically treated as an AND query. You can also exclude terms from your query by using NOT followed by the term. You can use parentheses to make complex Boolean queries.

Examples:

Find genes or metabolites referenced in abstracts matching both "Cat" and "Dog":
Cat AND Dog or just Cat Dog
Find genes or metabolites referenced in abstracts matching "Cat" or "Dog", but not "Fish":
Cat OR Dog NOT Fish
Find genes or metabolites referenced in abstracts matching "Cat", or both "Dog" and "Fish":
Cat OR (Dog AND Fish)

To do an exact match on a multi-word phrase, enclose the phrase in quotes.

Match abstracts containing the terms "Embryonic", "Stem", and "Cell":
Embryonic stem cell
Match abstracts containing "Embryonic stem cell" all together:
"Embryonic stem cell"

You can also restrict abstracts by the year they were published by entering year: followed by the year. You can specify ranges of years by using year:[startyear to endyear]. You can leave either startyear or endyear blank for an open-ended range.

Match abstracts published in 2011:
year:2011
Match abstracts containing "apoptosis" published between 2005 and 2009 (inclusive):
apoptosis year:[2005 to 2009]
Match abstracts containing "surfactant" published in or before 2007:
surfactant year:[to 2007]
Match abstracts containing "chitin" published in or after 2009:
chitin year:[2009 to]
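
If you generate queries from a script, the year syntax above can be assembled with a small helper. This is only a sketch: the function name year_filter and its arguments are hypothetical and not part of GADGET; only the output format comes from this manual.

```python
def year_filter(start=None, end=None):
    """Build a GADGET year restriction from optional start and end years."""
    if start is None and end is None:
        raise ValueError("give at least one of start or end")
    if start is not None and start == end:
        return f"year:{start}"  # single year, e.g. year:2011
    # Leave one side out for an open-ended range, e.g. year:[to 2007]
    parts = [str(start) if start is not None else "", "to",
             str(end) if end is not None else ""]
    return "year:[" + " ".join(p for p in parts if p) + "]"


# Combine the filter with keywords by separating them with a space
query = "apoptosis " + year_filter(2005, 2009)
```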

GADGET accepts wildcards to match inexact terms. Including a question mark (?) in a term will match any single character, while an asterisk (*) will match any number of arbitrary characters. Note that using a wildcard character at the beginning of a term will significantly slow GADGET down. You cannot use wildcard characters within quotation marks.

Match abstracts containing "text", "test", etc.:
te?t
Match abstracts containing "BLAST", "blastula", "blastema", etc.:
blast*
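
To make the wildcard semantics concrete, the sketch below translates a wildcard term into a Python regular expression and tests it against single words. It illustrates the matching rules described above and is not GADGET's implementation; the function names are hypothetical, and it assumes that a term is compared against one whole word at a time.

```python
import re


def wildcard_to_regex(term):
    """Compile a GADGET-style wildcard term into a case-insensitive regex."""
    pieces = []
    for ch in term:
        if ch == "?":
            pieces.append(".")    # exactly one arbitrary character
        elif ch == "*":
            pieces.append(".*")   # any number of arbitrary characters
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.IGNORECASE)


def term_matches(term, word):
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None
```

For example, term_matches("te?t", "text") and term_matches("blast*", "BLAST") are both true, while term_matches("te?t", "tet") is false because ? must consume one character.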

Query terms in GADGET are not case sensitive. Boolean operators are case sensitive, however, and must be written in capitals (AND, OR, NOT).

You will often get better and faster results by using more specific terms in your searches. Avoid using common terms like "gene" in your query; they don't usually add much value to the results, and they can slow down GADGET.

Searching via related genes

You can enter a list of genes to search for related genes and metabolites. GADGET will find abstracts that mention your queried genes, and show you a list of genes/metabolites highly represented in those abstracts.

You can enter genes using their Entrez Gene symbols or Gene IDs (e.g., TAL1 or 6886).

You can select whether GADGET should use abstracts that mention any (at least one) or all of your genes.

You can use Boolean operators (AND, OR, and NOT in caps) in your gene list. These operators override your any / all choice.

You can also use * for wildcard matches.

Examples:

With the any option selected, match abstracts that mention either SHH or LBH (or both). With the all option selected, match abstracts that mention both SHH and LBH:
SHH LBH
The same as the previous example, using Entrez Gene IDs instead of gene symbols:
6469 81606
Match abstracts that mention PECAM1 and either KDR or TIE1 (or both):
PECAM1 AND (KDR OR TIE1)
Match abstracts that don't mention TWIST1:
NOT TWIST1
Match all of the gene symbols that start with HOX:
HOX*

You can also upload a file with a list of genes. The file needs to have one gene symbol or Entrez Gene ID per line.

Gene / metabolite scores

GADGET can use several different scores to rank search results for genes and metabolites:

F1 Balanced Score: The default F1 Balanced Score is a balanced measurement of precision and recall (adjusted precision and query-matching abstracts). It takes both the proportion of matching abstracts for each gene and the total number of matching abstracts into account, so it represents a tradeoff between the "Adjusted precision" and "Query-matching abstracts" scores.
Adjusted Precision: The adjusted precision score is based upon the proportion of a gene's abstracts that match your query. It highlights genes that occur frequently in abstracts matching your query but less frequently in other abstracts. For each gene, adjusted precision is calculated as (number of abstracts for the gene matching the query) ÷ (10 + total number of abstracts for the gene). The "pseudocount" of 10 in the denominator ensures that a gene with 10/10 matching abstracts (10 ÷ 20 = 0.5) scores higher than a gene with 1/1 matching abstracts (1 ÷ 11 ≈ 0.09).
Query-matching abstracts: The number of abstracts for each gene/metabolite that match your query.
Total abstracts: The total number of abstracts associated with each gene/metabolite. (This number is not based upon your query.)
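
The scores above can be sketched in a few lines of Python. The adjusted precision formula is the one given in this manual. The F1 part is an assumption: the manual does not spell out the recall term, so the sketch takes recall to be the gene's matching abstracts divided by the number of abstracts matching the query, and combines the two with the usual harmonic mean. All function and parameter names are hypothetical.

```python
PSEUDOCOUNT = 10  # from the manual: added to the denominator


def adjusted_precision(matching, total):
    """matching: abstracts for the gene that match the query.
    total: all abstracts for the gene."""
    return matching / (PSEUDOCOUNT + total)


def f1_score(matching, total, query_abstracts):
    """Harmonic mean of adjusted precision and an ASSUMED recall term
    (matching / abstracts matching the query); not confirmed by the manual."""
    precision = adjusted_precision(matching, total)
    recall = matching / query_abstracts if query_abstracts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With these definitions, adjusted_precision(10, 10) is 0.5 while adjusted_precision(1, 1) is about 0.09, which is the ordering the pseudocount is meant to produce.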

GADGET also computes a p-value for each gene using the hypergeometric test. The null hypothesis for the test is that the fraction of abstracts that match the query for a given gene/metabolite is equal to the "background" fraction of abstracts that match the query in the entire corpus of abstracts. The p-value thus indicates how confident we are that a gene/metabolite's abstract set is enriched for the query. (There is no option in GADGET to sort genes/metabolites by the p-value because it would be too slow, but you can still do this if you download your search results in a CSV file.)
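
If you want to reproduce or sort by such a p-value yourself, a hypergeometric enrichment test can be sketched with the Python standard library. This is an illustration under assumptions, not GADGET's code: it assumes a one-sided (upper-tail) test, and the parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(corpus_size, query_abstracts, gene_abstracts, overlap):
    """P(X >= overlap) when gene_abstracts abstracts are drawn without
    replacement from a corpus in which query_abstracts match the query."""
    max_overlap = min(query_abstracts, gene_abstracts)
    total = comb(corpus_size, gene_abstracts)
    # Sum the probability of every outcome at least as extreme as observed
    tail = sum(
        comb(query_abstracts, k)
        * comb(corpus_size - query_abstracts, gene_abstracts - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

Exact binomial coefficients are slow for a corpus the size of PubMed; in practice a statistics library such as SciPy (scipy.stats.hypergeom) is probably the better choice.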

Selecting a species

Gene search only

When you do a GADGET gene search, all of the genes in your results will be from a single species. GADGET lets you choose which species to view genes for. The currently supported species are:

NCBI Taxonomy ID    Name
9606                Homo sapiens (default)
10090               Mus musculus
559292              Saccharomyces cerevisiae

The metabolite search includes results from all of the above species, and cannot currently filter results by species.

For gene search queries, when you provide a list of gene symbols, GADGET will match them to genes from your selected species.

Optional inclusion of homologs

Gene search only

When finding and ranking genes that match your query, GADGET's gene search can optionally take homologs into account. To calculate scores for each gene, GADGET counts the number of abstracts matching your query that the gene appears in. If you choose to include homologs, GADGET will also count abstracts that refer to homologs of the gene.

The groups of homologs that GADGET uses come from Homologene. Homologs can come from the same species as the gene or from different species.

This feature can be useful for queries that match a small number of abstracts.
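
The effect of including homologs on the abstract count can be sketched as follows. The data structures, the function name, and the choice to count an abstract only once when it mentions both a gene and its homolog are assumptions made for illustration; the PubMed IDs are made up.

```python
def matching_count(gene, gene_abstracts, query_hits, homologs=None):
    """Count query-matching abstracts for a gene, optionally with homologs.

    gene_abstracts: dict mapping gene -> set of abstract IDs (hypothetical).
    query_hits: set of abstract IDs that match the query.
    homologs: dict mapping gene -> set of homologous genes, or None.
    """
    members = {gene}
    if homologs is not None:
        members |= homologs.get(gene, set())
    abstracts = set()
    for member in members:
        abstracts |= gene_abstracts.get(member, set())
    # Assumption: an abstract mentioning several members is counted once
    return len(abstracts & query_hits)


gene_abstracts = {"SHH": {1, 2}, "Shh": {2, 3, 4}}  # made-up abstract IDs
homologs = {"SHH": {"Shh"}}
query_hits = {2, 3, 5}
without = matching_count("SHH", gene_abstracts, query_hits)
with_homologs = matching_count("SHH", gene_abstracts, query_hits, homologs)
```

In this toy data the count for SHH rises from 1 to 2 when homologs are included, which is why the option helps most when a query matches few abstracts.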

Data sources

GADGET's gene search uses gene-abstract links from NCBI's gene2pubmed dataset, Saccharomyces Genome Database (SGD), and Mouse Genome Informatics (MGI). Gene homologs come from the Homologene database.

GADGET's metabolite search uses a list of metabolites from the Human Metabolome Database (HMDB). Metabolite-abstract links are generated by matching metabolite names and synonyms from HMDB in the set of PubMed abstracts collected for the gene search.

GADGET downloads abstract text from PubMed. New abstracts, gene-abstract links, and metabolite-abstract links are updated twice a week.

Data API - Using GADGET without a browser

If you want to automatically access GADGET without using a web browser, you can download comma-delimited CSV or XML files by fetching URLs from GADGET and sending arguments via the query string at the end of the URL. To download data, the query string must include &download=CSV.

To search genes, use this URL: http://gadget.biostat.wisc.edu/gadget/genelist?download=CSV&<query parameters here>

To search metabolites, use this URL: http://gadget.biostat.wisc.edu/gadget/metabolitelist?download=CSV&<query parameters here>

To send information to GADGET, just append it to the URL. Each argument is given as the name of the argument, an equals sign (=), and the argument's value, e.g. argument=value. Multiple arguments should be separated by ampersands (&). Here's a list of the arguments you can send:

download: This argument is required to download the results as a CSV file (&download=CSV).
q: The keyword query terms of your search.

In most browsers, you can include spaces and quotation marks in the URL. Special characters, like "&" and "/", must be percent encoded.

Either q or genes (or both) is required for GADGET queries.
genes: A list of gene symbols and/or Entrez Gene IDs to search for. You can use the same format and all of the same features for entering genes as you can in the graphical interface.

If you provide gene symbols, GADGET will match them to genes for your selected species.

Either q or genes (or both) is required for GADGET queries.
geneop: The "gene operator" to use for your list of genes: whether to match abstracts that mention any or all of your genes (see "Searching via related genes" above). If geneop isn't provided, any will be used by default.
species: Gene search only. (GADGET's metabolite search only supports Homo sapiens for now.)

The species to use for your gene search. All of the genes in your gene search results will belong to this species.

To specify a species, use the species's NCBI taxonomy ID (e.g. &species=9606 for humans). See "Selecting a species" above for a list of currently available species and taxonomy IDs. If you don't specify a species, Homo sapiens will be used by default.
usehomologs: Gene search only.

Whether or not to include homologs when calculating scores for each gene. (See "Optional inclusion of homologs" above.) To include homologs, include &usehomologs=1 in your query string. If usehomologs is omitted, homologs will not be included.
orderby: Which score to use to rank the results.

There are four possible values for orderby: f1_score, which orders the results by the F1 score; adjusted_precision, which orders the results by the adjusted precision; matching_abstracts, which orders the results by the number of abstracts matching your query; and total_abstracts, which orders the results by the total number of abstracts for each gene/metabolite.

If the orderby argument is omitted or invalid, GADGET will default to the F1 score. See "Gene / metabolite scores" above for more information about the different scores.
limit: The maximum number of results to return. This argument should be a number. If limit is not included or is invalid, GADGET will return all of the genes/metabolites found in the search.
offset: The rank (minus one) of the first gene/metabolite to include in the results. To omit the first 100 results, include &offset=100. If offset is omitted or invalid, GADGET will not omit any results.

If no abstracts or genes/metabolites match your query, or if q is omitted or invalid, the GADGET server will issue a 404 error instead of returning a CSV file.
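
When calling the API from a script rather than a browser, it is safer to percent-encode every value yourself. The sketch below builds a download URL from the arguments described above using Python's standard library. The helper name build_url is hypothetical, and the host name is the one used in this manual's example URLs.

```python
from urllib.parse import quote, urlencode

# Host taken from the manual's example URLs
BASE_URL = "http://gadget.biostat.wisc.edu/gadget"


def build_url(kind, q=None, genes=None, **options):
    """Return a CSV download URL for kind 'genelist' or 'metabolitelist'."""
    if kind not in ("genelist", "metabolitelist"):
        raise ValueError("kind must be 'genelist' or 'metabolitelist'")
    if not q and not genes:
        raise ValueError("either q or genes (or both) is required")
    params = {"download": "CSV"}
    if q:
        params["q"] = q
    if genes:
        params["genes"] = genes
    # Remaining arguments: species, usehomologs, orderby, limit, offset, geneop
    params.update({k: v for k, v in options.items() if v is not None})
    # quote_via=quote encodes spaces as %20 and also encodes "&" and "/"
    return f"{BASE_URL}/{kind}?{urlencode(params, quote_via=quote)}"


url = build_url("genelist", q="Loop of Henle", species=10090,
                limit=150, usehomologs=1)
```

The resulting URL can then be fetched with, for example, urllib.request.urlopen, where a 404 response is raised as urllib.error.HTTPError.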

Examples:

Download genes matching "arteriole", using homologs:
http://gadget.biostat.wisc.edu/gadget/genelist?download=CSV&usehomologs=1&q=arteriole
Download mouse genes matching "mesoderm", ranked by adjusted precision:
http://gadget.biostat.wisc.edu/gadget/genelist?download=CSV&species=10090&q=mesoderm&orderby=adjusted_precision
Download the top 150 genes for mice matching "Loop of Henle", using homologs:
http://gadget.biostat.wisc.edu/gadget/genelist?download=CSV&species=10090&q=Loop of Henle&limit=150&usehomologs=1
Download the top 80 metabolites related to CFTR matching "cystic fibrosis", ranked by number of matching abstracts:
http://gadget.biostat.wisc.edu/gadget/metabolitelist?download=CSV&q=cystic fibrosis&genes=CFTR&limit=80&orderby=matching_abstracts

Source code

GADGET's source code is available in this repository on GitHub.