GADGET is a tool that lets you find genes and metabolites associated with a given query in the biomedical literature.
When you search for genes, GADGET first finds PubMed abstracts that match your query. From the set of abstracts found, GADGET compiles a list of genes that are overrepresented in those abstracts, and thus related to your query. (If you've entered a keyword query, GADGET checks each abstract for your keywords. If you've entered gene symbols, GADGET checks several data sources to find abstracts that mention your genes.)
For each gene in the search results, you can see the abstracts that link this gene to your query by clicking the "Show abstracts" link.
You can download gene search results as a CSV file. Downloaded results will contain all of the genes that match your query, not only the top-scoring genes displayed on the page.
Search for metabolites
GADGET's metabolite search shows you a list of metabolites related to your query. It works the same way as the gene search: it first finds a set of abstracts related to your query, and then compiles a list of metabolites from those abstracts (see above for details).
Searching by keyword
GADGET can find genes and metabolites that match any set of keywords or keyphrases, by finding PubMed abstracts that contain the keywords and keyphrases in your query.
GADGET supports Boolean queries for more specific searches. Terms can be separated with AND and OR. An AND query will match abstracts that contain both terms, while an OR query will match abstracts that contain either term. Terms separated by spaces are automatically treated as an AND query. You can also exclude terms from your query by using NOT followed by the term. You can use parentheses to make complex Boolean queries.
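To illustrate these semantics, the sketch below treats each term as the set of abstracts that contain it. Under that model, AND, OR and NOT correspond to set intersection, union and difference. The terms and abstract IDs are made up. The sketch shows only the logic of the operators, not how GADGET evaluates queries internally.

```python
# Illustrative only: each term maps to the set of abstract IDs whose text
# contains it. The terms and IDs below are made up.
index = {
    "insulin": {1, 2, 3, 5},
    "glucose": {2, 3, 4},
    "obesity": {3, 5, 6},
}

# "insulin glucose" and "insulin AND glucose": abstracts containing both terms
both = index["insulin"] & index["glucose"]

# "insulin OR glucose": abstracts containing either term
either = index["insulin"] | index["glucose"]

# "(insulin OR glucose) NOT obesity": the parenthesized OR is evaluated first,
# then abstracts containing "obesity" are removed
grouped = (index["insulin"] | index["glucose"]) - index["obesity"]

print(sorted(both), sorted(either), sorted(grouped))  # [2, 3] [1, 2, 3, 4, 5] [1, 2, 4]
```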
To do an exact match on a multi-word phrase, enclose the phrase in quotes.
You can also restrict abstracts by the year they were published by entering year: followed by the year. You can specify ranges of years by using year:[startyear to endyear]. You can leave either startyear or endyear blank for an open-ended range.
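The following is a minimal sketch of how the year: syntax could be parsed and applied to an abstract's publication year. The helper names parse_year_filter and year_ok are hypothetical. Treating both ends of a range as inclusive is also an assumption, because the text above does not say whether they are.

```python
import re
from typing import Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]

# "year:2010", or "year:[start to end]" with either bound optionally blank.
_YEAR_FILTER = re.compile(
    r"year:(?:(\d{4})|\[\s*(\d{4})?\s*to\s*(\d{4})?\s*\])", re.IGNORECASE
)

def parse_year_filter(query: str) -> Optional[Bounds]:
    """Return (start, end) from the first year: filter in the query, or None."""
    m = _YEAR_FILTER.search(query)
    if m is None:
        return None
    single, start, end = m.groups()
    if single is not None:
        return int(single), int(single)
    return (int(start) if start else None, int(end) if end else None)

def year_ok(year: int, bounds: Bounds) -> bool:
    """True if year falls within bounds; None means that side is open-ended."""
    start, end = bounds
    # Assumption: both bounds are inclusive.
    return (start is None or year >= start) and (end is None or year <= end)

print(parse_year_filter("leukemia year:[2000 to 2010]"))  # (2000, 2010)
print(parse_year_filter("leukemia year:[ to 2005]"))      # (None, 2005)
```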
GADGET accepts wildcards to match inexact terms. Including a question mark (?) in a term will match any single character, while an asterisk (*) will match any number of arbitrary characters. Note that using a wildcard character at the beginning of a term will significantly slow GADGET down. You cannot use wildcard characters within quotation marks.
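Python's fnmatch module uses the same ? and * conventions, so you can use it to check what a wildcard term would match. This sketch assumes that GADGET's wildcards behave like fnmatch's. It also assumes that matching is case-insensitive, as stated below. term_matches is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def term_matches(term: str, word: str) -> bool:
    """Check whether a query term (possibly with ? or * wildcards) matches a word.

    Assumes '?' matches exactly one character, '*' matches any run of
    characters (including none), and case is ignored.
    """
    # fnmatch would also treat "[...]" as a character class; "[[]" is fnmatch's
    # way of writing a literal "[", so only ? and * act as wildcards here.
    pattern = term.lower().replace("[", "[[]")
    return fnmatchcase(word.lower(), pattern)

print(term_matches("leuk*", "Leukemia"))  # True
print(term_matches("tum?r", "tumor"))     # True
print(term_matches("tum?r", "tumour"))    # False: ? matches exactly one character
print(term_matches("tum*r", "tumour"))    # True
```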
Query terms in GADGET are not case sensitive (although Boolean operators are case sensitive).
You will often get better and faster results by using more specific terms in your searches. Avoid using common terms like "gene" in your query; they don't usually add much value to the results, and they can slow down GADGET.
Searching via related genes
You can enter a list of genes to search for related genes and metabolites. GADGET will find abstracts that mention your queried genes, and show you a list of genes/metabolites highly represented in those abstracts.
You can enter genes using their Entrez Gene symbols or Gene IDs (e.g. TAL1 or 6886).
You can select whether GADGET should use abstracts that mention any (at least one) or all of your genes.
You can use Boolean operators (AND, OR, and NOT in caps) in your gene list. These operators override your any / all choice.
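The sketch below illustrates the difference between the any and all choices on some invented gene-to-abstract links. "any" corresponds to the union of the genes' abstract sets, and "all" corresponds to their intersection. The data and the function name abstracts_for are hypothetical.

```python
# Hypothetical gene -> abstract-ID links (in GADGET these come from
# gene2pubmed, SGD and MGI; the IDs here are invented).
gene_abstracts = {
    "TAL1": {10, 11, 12},
    "GATA1": {11, 12, 13},
    "LMO2": {12, 14},
}

def abstracts_for(genes, geneop="any"):
    """Abstracts mentioning at least one ("any") or every ("all") listed gene."""
    sets = [gene_abstracts.get(gene, set()) for gene in genes]
    if not sets:
        return set()
    if geneop == "all":
        return set.intersection(*sets)
    return set.union(*sets)

print(sorted(abstracts_for(["TAL1", "GATA1", "LMO2"], "any")))  # [10, 11, 12, 13, 14]
print(sorted(abstracts_for(["TAL1", "GATA1", "LMO2"], "all")))  # [12]
```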
You can also use * for wildcard matches.
You can also upload a file with a list of genes. The file needs to have one gene symbol or Entrez Gene ID per line.
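If you prepare the upload file with a script, the sketch below reads back the expected layout: one entry per line, with blank lines skipped. The function name parse_gene_lines is hypothetical. The rule that all-digit entries are Entrez Gene IDs and everything else is a symbol is an assumption made for illustration.

```python
def parse_gene_lines(lines):
    """Split gene-list lines into (symbols, entrez_ids).

    Assumes one entry per line, as required for uploads; blank lines are skipped.
    Purely numeric entries are treated as Entrez Gene IDs, everything else as symbols.
    """
    symbols, entrez_ids = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        if entry.isdigit():
            entrez_ids.append(int(entry))
        else:
            symbols.append(entry)
    return symbols, entrez_ids

# Usage with a file (the path is hypothetical):
# with open("my_genes.txt", encoding="utf-8") as fh:
#     symbols, ids = parse_gene_lines(fh)
print(parse_gene_lines(["TAL1\n", "6886\n", "\n", "GATA1\n"]))  # (['TAL1', 'GATA1'], [6886])
```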
Gene / metabolite scores
GADGET can use several different scores to rank search results for genes and metabolites:
Score | Description |
---|---|
F1 Balanced Score | The default F1 Balanced Score is a balanced measurement of precision and recall (adjusted precision and query-matching abstracts). It takes both the proportion of matching abstracts for each gene and the total number of matching abstracts into account. The F1 score represents a tradeoff between the "Adjusted precision" and "Matching abstracts" scores. |
Adjusted Precision | The adjusted precision score is based on the proportion of a gene's abstracts that match your query. It highlights genes that occur frequently in abstracts matching your query but less frequently in other abstracts. For each gene in your results, adjusted precision is calculated as (number of abstracts for the gene matching the query) ÷ (10 + total number of abstracts for the gene). The "pseudocount" of 10 in the denominator ensures that a gene with 10/10 matching abstracts scores higher than a gene with 1/1 matching abstracts (10/20 = 0.5 versus 1/11 ≈ 0.09). |
Query-matching abstracts | The number of abstracts for each gene/metabolite that match your query. |
Total abstracts | The total number of abstracts associated with each gene/metabolite. (This number is not based on your query.) |
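The adjusted precision function below follows the definition above exactly. The F1 function is a hedged sketch. It assumes the standard F1 form, the harmonic mean of precision and recall. It also assumes that recall is the gene's query-matching abstracts divided by all abstracts that match the query. GADGET's actual recall term may differ.

```python
PSEUDOCOUNT = 10  # added to the denominator of adjusted precision

def adjusted_precision(matching: int, total: int) -> float:
    """(abstracts for the gene matching the query) / (10 + total abstracts for the gene)."""
    return matching / (PSEUDOCOUNT + total)

def f1_score(matching: int, total: int, query_total: int) -> float:
    """Harmonic mean of adjusted precision and an ASSUMED recall term.

    Assumption (not confirmed by GADGET's documentation): recall is
    matching / query_total, where query_total is the number of abstracts
    that match the query overall.
    """
    p = adjusted_precision(matching, total)
    r = matching / query_total
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

print(adjusted_precision(10, 10))  # 0.5
print(adjusted_precision(1, 1))    # about 0.0909
```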
GADGET also computes a p-value for each gene/metabolite using the hypergeometric test. The null hypothesis is that the fraction of a gene/metabolite's abstracts that match the query equals the "background" fraction of abstracts that match the query in the entire corpus. A small p-value therefore indicates that the gene/metabolite's abstract set is enriched for the query. (GADGET does not offer sorting by p-value because it would be too slow, but you can sort by p-value yourself after downloading your search results as a CSV file.)
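The sketch below computes a one-sided (upper-tail) hypergeometric p-value using only Python's standard library. It assumes that this is the form of the test GADGET applies. The function and parameter names are our own. Exact integer arithmetic keeps the result precise, although it can be slow for very large corpora.

```python
from math import comb

def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric, i.e. the chance of seeing at least k
    query matches among a gene's n abstracts if they were drawn at random.

    N: abstracts in the whole corpus
    K: corpus abstracts matching the query
    n: abstracts linked to this gene/metabolite
    k: this gene/metabolite's abstracts that match the query
    """
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than available, so
    # impossible terms contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# All 5 of a gene's abstracts match, in a 10-abstract corpus where 5 match:
print(hypergeom_pvalue(5, 5, 5, 10))  # 1/252, about 0.00397
```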
Selecting a species (gene search only)
When you do a GADGET gene search, all of the genes in your results will be from a single species. You can choose which species to view genes for. The currently supported species are:
NCBI Taxonomy ID | Name |
---|---|
9606 | Homo sapiens (default) |
10090 | Mus musculus |
559292 | Saccharomyces cerevisiae |
The metabolite search includes results from all of the above species, and cannot currently filter results by species.
For gene search queries, when you provide a list of gene symbols, GADGET will match them to genes from your selected species.
Optional inclusion of homologs (gene search only)
When finding and ranking genes that match your query, GADGET's gene search can optionally take homologs into account. To calculate scores for each gene, GADGET counts the number of abstracts matching your query that the gene appears in. If you choose to include homologs, GADGET will also count abstracts that refer to homologs of the gene.
The groups of homologs that GADGET uses come from HomoloGene. A gene's homologs can come from the same species as the gene or from other species.
This feature can be useful for queries that match a small number of abstracts.
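The sketch below shows how counting with homologs could work. It pools the abstracts of a gene and its homologs before counting, which raises the counts available for scoring when a query matches only a few abstracts. It assumes the abstract sets are combined as a union, so an abstract that mentions both a gene and its homolog counts once. The data, the homolog mapping, and the function name are hypothetical.

```python
def count_abstracts(gene, query_hits, gene_abstracts, homologs=None):
    """Return (query-matching abstracts, total abstracts) for one gene.

    gene_abstracts: gene -> set of abstract IDs
    query_hits: set of abstract IDs that match the query
    homologs: optional gene -> list of homologous genes (a HomoloGene-style
              grouping); when given, the homologs' abstracts are pooled in.
    """
    abstracts = set(gene_abstracts.get(gene, set()))
    if homologs is not None:
        for other in homologs.get(gene, ()):
            abstracts |= gene_abstracts.get(other, set())
    return len(abstracts & query_hits), len(abstracts)

# Invented example data
gene_abstracts = {"TP53": {1, 2}, "Trp53": {2, 3, 4}}
homologs = {"TP53": ["Trp53"]}
query_hits = {2, 3, 9}

print(count_abstracts("TP53", query_hits, gene_abstracts))            # (1, 2)
print(count_abstracts("TP53", query_hits, gene_abstracts, homologs))  # (2, 4)
```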
Data sources
GADGET's gene search uses gene-abstract links from NCBI's gene2pubmed dataset, the Saccharomyces Genome Database (SGD), and Mouse Genome Informatics (MGI). Gene homologs come from the HomoloGene database.
GADGET's metabolite search uses a list of metabolites from the Human Metabolome Database (HMDB). Metabolite-abstract links are generated by searching the PubMed abstracts collected for the gene search for HMDB metabolite names and synonyms.
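The following is a minimal sketch of linking an abstract to metabolites by name and synonym matching. The metabolite IDs and synonym lists are invented. Case-insensitive, whole-word matching is an assumption, because the text above does not specify GADGET's matching rules.

```python
import re

def find_metabolites(text, metabolites):
    """Return the IDs of metabolites whose name or any synonym occurs in text.

    metabolites maps a metabolite ID to a list of its names and synonyms.
    Assumption: matching is case-insensitive and whole-word, so "hexose"
    does not match inside "hexoses".
    """
    found = set()
    for met_id, names in metabolites.items():
        for name in names:
            # Unlike \b, these lookarounds also behave sensibly for names
            # that start or end with a non-word character.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(met_id)
                break
    return found

# Hypothetical IDs and synonym lists, for illustration only
metabolites = {
    "M1": ["D-Glucose", "glucose"],
    "M2": ["lactic acid", "lactate"],
    "M3": ["citrate"],
}
print(sorted(find_metabolites("Plasma glucose and lactate were elevated.", metabolites)))  # ['M1', 'M2']
```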
GADGET downloads abstract text from PubMed. New abstracts, gene-abstract links, and metabolite-abstract links are updated twice a week.
Data API - Using GADGET without a browser
If you want to access GADGET automatically without using a web browser, you can download comma-delimited CSV or XML files by fetching URLs from GADGET and sending arguments via the query string at the end of the URL. To download data, the query string must include &download=CSV.
To search genes, use this URL: http://gadget.biostat.org/gadget/genelist?download=csv&<query parameters here>
To search metabolites, use this URL: http://gadget.biostat.org/gadget/metabolitelist?download=csv&<query parameters here>
To send information to GADGET, append it to the URL. Each argument is given as the argument's name, an equals sign (=), and the argument's value, e.g. argument=value. Separate multiple arguments with ampersands (&). Here's a list of the arguments you can send:
Argument | Description |
---|---|
download | This argument is required to download the results as a CSV file (&download=CSV). |
q | The keyword query terms of your search. In most browsers, you can include spaces and quotation marks in the URL. Special characters, like "&" and "/", must be percent encoded. Either q or genes (or both) is required for GADGET queries. |
genes | A list of gene symbols and/or Entrez Gene IDs to search for. You can use the same format and all of the same features for entering genes as in the graphical interface. If you provide gene symbols, GADGET will match them to genes for your selected species. Either q or genes (or both) is required for GADGET queries. |
geneop | The "gene operator" to use for your list of genes: whether to match abstracts that mention any or all of your genes (see "Searching via related genes" above). If geneop isn't provided, any is used by default. |
species | Gene search only (the metabolite search does not currently filter results by species). The species to use for your gene search. All of the genes in your gene search results will belong to this species. To specify a species, use its NCBI Taxonomy ID (e.g. &species=9606 for humans). See "Selecting a species" above for the currently available species and their taxonomy IDs. If you don't specify a species, Homo sapiens is used by default. |
usehomologs | Gene search only. Whether to include homologs when calculating scores for each gene (see "Optional inclusion of homologs" above). To include homologs, add &usehomologs=1 to your query string. If usehomologs is omitted, homologs are not included. |
orderby | Which score to use to rank the results. There are four possible values: f1_score orders the results by the F1 score; adjusted_precision orders them by the adjusted precision; matching_abstracts orders them by the number of abstracts matching your query; and total_abstracts orders them by the total number of abstracts for each gene/metabolite. If orderby is omitted or invalid, GADGET defaults to the F1 score. See "Gene / metabolite scores" above for more information about the different scores. |
limit | The maximum number of results to return. This argument should be a number. If limit is not included or is invalid, GADGET will return all of the genes/metabolites found in the search. |
offset | The rank (minus one) of the first gene/metabolite to include in the results. To omit the first 100 results, include &offset=100. If offset is omitted or invalid, GADGET will not omit any results. |
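The sketch below assembles download URLs from the arguments listed above using Python's urllib.parse. The helper name gadget_url is hypothetical. Leaving out arguments that are set to None is a design choice that lets GADGET apply its own defaults. The sketch uses download=csv in lowercase, as in the example URLs above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://gadget.biostat.org/gadget"

def gadget_url(endpoint="genelist", **params):
    """Build a GADGET CSV download URL (hypothetical helper).

    endpoint: "genelist" or "metabolitelist".
    params: any of q, genes, geneop, species, usehomologs, orderby, limit, offset.
    Arguments set to None are left out so GADGET applies its defaults.
    """
    if endpoint not in ("genelist", "metabolitelist"):
        raise ValueError(f"unknown endpoint: {endpoint}")
    args = {"download": "csv"}
    args.update({name: value for name, value in params.items() if value is not None})
    # quote (instead of the default quote_plus) writes spaces as %20; with
    # urlencode's default safe="", "/" and "&" inside values are percent-encoded too.
    return f"{BASE_URL}/{endpoint}?{urlencode(args, quote_via=quote)}"

# Second page of 100 mouse genes, ranked by adjusted precision:
url = gadget_url(
    "genelist",
    q='"breast cancer" year:[2010 to ]',
    species=10090,
    usehomologs=1,
    orderby="adjusted_precision",
    limit=100,
    offset=100,
)
print(url)
```

Paging with limit and offset keeps each response small. For example, offset=100 with limit=100 returns results ranked 101 to 200.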
If no abstracts or genes/metabolites match your query, or if neither q nor genes is provided or the query is invalid, the GADGET server returns a 404 error instead of a CSV file.
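Because an empty result set comes back as a 404 error, a client needs to treat that status specially. The following is a minimal sketch using Python's standard library. It assumes the CSV is UTF-8 encoded and begins with a header row. The function names are hypothetical.

```python
import csv
import io
import urllib.error
import urllib.request

def parse_csv(text):
    """Parse GADGET CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def fetch_results(url, timeout=60):
    """Download a GADGET CSV; on a 404 (nothing matched, or a bad query) return []."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            text = resp.read().decode("utf-8")  # assumption: UTF-8 encoding
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise  # other HTTP errors are real failures
    return parse_csv(text)
```

Note that this treats "no matches" and "invalid query" the same way, because the server reports both with a 404.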
GADGET's source code is available in this repository on GitHub.