|
Features |
|
1) Introduction |
|
-
Domains are the building blocks of proteins and one
of the most useful characteristics for determining
protein function. The functions of the individual
domains of a multidomain protein contribute to our
understanding of the properties of the protein as a
whole. The sequential order of protein domains
is known as the domain architecture.
Architectures are useful for classifying
evolutionarily related proteins, detecting
evolutionarily distant homologs, and comparing
multidomain proteins.
-
DAhunter is a new web-based server that
identifies homologous proteins by comparing their
domain architectures (the sequential order of their
domains). DAhunter takes promiscuous domains into
account. These are domains that typically carry out
auxiliary functions and appear in many unrelated
proteins, so their presence is not directly related
to homology.
-
To detect promiscuous domains, we assigned a weight
score to each domain extracted from RefSeq proteins,
based on its abundance and versatility. We
used a domain's score to represent its importance
in the protein world. In scoring domains, we considered
domain combinations as well as single domains.
We use (1) the cosine similarity, (2) the
Goodman-Kruskal gamma function, and (3) the domain
duplication index to measure the similarity of a
pair of domain architectures.
|
|
|
2) Datasets |
|
|
|
3) How to assign a weight score to each domain unit |
|
Domain unit weight score = IAF*IVF
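
The following is a minimal sketch of how such a weight could be computed. This page does not define IAF and IVF, so the sketch assumes that they stand for inverse abundance frequency and inverse versatility frequency, that IAF is log2(total proteins / proteins containing the domain), and that IVF is 1 / (number of distinct neighboring domains). The function name `domain_weights` and the toy data are hypothetical, and only single domains are scored here, not domain combinations.

```python
import math
from collections import defaultdict


def domain_weights(architectures):
    """Return {domain: IAF * IVF} for a list of architectures.

    Each architecture is a list of domain names in sequence order.
    ASSUMED definitions (not given in the source page):
      IAF = log2(total proteins / proteins containing the domain)
      IVF = 1 / number of distinct domains found next to the domain
    """
    total = len(architectures)
    containing = defaultdict(int)   # number of proteins containing each domain
    neighbours = defaultdict(set)   # distinct adjacent domains for each domain

    for arch in architectures:
        for domain in set(arch):
            containing[domain] += 1
        for left, right in zip(arch, arch[1:]):
            if left != right:       # ignore tandem repeats of the same domain
                neighbours[left].add(right)
                neighbours[right].add(left)

    weights = {}
    for domain, count in containing.items():
        iaf = math.log2(total / count)
        # A domain with no neighbours is treated as having versatility 1.
        ivf = 1.0 / max(1, len(neighbours[domain]))
        weights[domain] = iaf * ivf
    return weights


# Toy data: domain "A" is abundant and has many different neighbours.
example = [["A", "B"], ["A", "C"], ["A", "D"], ["B"]]
weights = domain_weights(example)
```

Under these assumptions, an abundant and versatile domain such as "A" receives a much lower weight than the rarer domains, which is the behavior expected for a promiscuous domain.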
|
|
4) How to compare two domain architectures |
|
-
DAhunter searches for the best-matched domain
architecture in the domain architecture database,
which is built from RefSeq proteins,
UniProtKB/Swiss-Prot,
and UniProtKB/TrEMBL.
-
DAhunter compares three features of domain
architectures.
- domain unit content (x): the cosine similarity.
- domain order (y): the Goodman-Kruskal gamma
function.
- domain unit copy (z): the domain duplication index,
whose definition is similar to that of the inverse
document frequency (IDF).
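
The first two features can be illustrated with a short sketch. It assumes that each architecture is represented as a vector with one component per distinct domain, set to that domain's weight score, and that domain order is compared over pairs of shared domains using the position of their first occurrence. These representation choices, the function names, and the `weights` mapping are assumptions for illustration. The duplication index (z) is left out because its formula is not given on this page.

```python
import math
from itertools import combinations


def cosine_similarity(arch_a, arch_b, weights):
    """Cosine between weighted presence vectors of two architectures."""
    set_a, set_b = set(arch_a), set(arch_b)
    dot = sum(weights[d] ** 2 for d in set_a & set_b)
    norm_a = math.sqrt(sum(weights[d] ** 2 for d in set_a))
    norm_b = math.sqrt(sum(weights[d] ** 2 for d in set_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def goodman_kruskal_gamma(arch_a, arch_b):
    """Gamma = (C - D) / (C + D) over pairs of shared domains.

    A pair is concordant (C) if the two domains appear in the same
    relative order in both architectures, discordant (D) otherwise.
    Returns 0.0 when there is no comparable pair (a convention chosen here).
    """
    set_b = set(arch_b)
    shared = [d for d in dict.fromkeys(arch_a) if d in set_b]
    pos_a = {d: arch_a.index(d) for d in shared}  # first occurrence in A
    pos_b = {d: arch_b.index(d) for d in shared}  # first occurrence in B
    concordant = discordant = 0
    for d1, d2 in combinations(shared, 2):
        sign = (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)


# Toy example with hypothetical weight scores.
weights = {"A": 1.0, "B": 2.0, "C": 0.5}
x = cosine_similarity(["A", "B", "C"], ["C", "B"], weights)
y = goodman_kruskal_gamma(["A", "B", "C"], ["C", "B"])
```

In the toy example the two architectures share most of their weighted content, so x is close to 1, while the shared domains B and C occur in reversed order, so y is -1.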
|
|
5) How to fix the parameters and evaluate DAhunter |
|
- To fix parameters a
and b of the similarity score, we used HomoloGene DB release 61, which contains
44,481 groups. (1) From these groups, we obtained
8,290 domain architectures from the 5,215 groups having more
than 2 architectures. (2) We carried out 8,290 tests.
In each test, one of the 8,290 domain architectures was compared
to the other 8,289 while a and b were varied
from 1.0 to 0.0 in steps of 0.1. (3) We chose 0.8
for a and 0.3 for b because these values produced
the largest number of best matches belonging to the
same group as the query. To obtain the test results of DAhunter with
HomoloGene DB for each a and b value, click
here.
We also tested with the COG database in a similar manner.
The user can download the test results from
here.
- To evaluate the DAhunter algorithm, we compared the
DAhunter results (a=0.8, b=0.3) with the
PDART results (a=0.36, b=0.01, c=0.63).
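
The parameter scan described above can be sketched as a plain grid search. The sketch below is only an illustration under stated assumptions: the way a and b enter the similarity score is not given on this page, so the scoring function is passed in as an argument, and `toy_score`, the architectures, and the group labels are made-up stand-ins for the real score and the HomoloGene data.

```python
def grid_search(architectures, groups, score, step=0.1):
    """Count, for every (a, b), how many queries have a same-group best match.

    architectures: list of architectures (lists of domain names)
    groups:        group label of each architecture, same order
    score:         function score(query, target, a, b) -> float (supplied by caller)
    Returns {(a, b): number of queries whose best match is in their own group}.
    """
    n_steps = round(1 / step)
    values = [round(i * step, 10) for i in range(n_steps + 1)]  # 0.0 ... 1.0
    results = {}
    for a in values:
        for b in values:
            hits = 0
            for i, query in enumerate(architectures):
                # Compare the query with every other architecture.
                best = max(
                    (j for j in range(len(architectures)) if j != i),
                    key=lambda j: score(query, architectures[j], a, b),
                )
                if groups[best] == groups[i]:
                    hits += 1
            results[(a, b)] = hits
    return results


def toy_score(query, target, a, b):
    """Hypothetical stand-in score; NOT the DAhunter similarity score."""
    overlap = len(set(query) & set(target)) / len(set(query) | set(target))
    same_start = 1.0 if query[0] == target[0] else 0.0
    return a * overlap + b * same_start


archs = [["A", "B"], ["A", "B", "C"], ["X", "Y"], ["X", "Z"]]
groups = [0, 0, 1, 1]
results = grid_search(archs, groups, toy_score)
best_a, best_b = max(results, key=results.get)
```

With a step of 0.1 the scan covers 11 x 11 = 121 parameter pairs. Ties between candidate matches are resolved in favor of the first candidate, which is a simplification made for this sketch.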
|
|
|
|