A Design Methodology for a Document Indexing Tool
David R. Cheriton School of Computer Science
The huge increase in volume of online literature has led to a parallel surge in research into methods for retrieving meaningful information from this textual data: “content extraction” has emerged as a prominent field in natural language computing. However, little progress has as yet been made in determining the pragmatic content of a document, ‘hidden’ meaning such as the attitudes of the writer toward her audience, the intentions being communicated, the intra-textual relationships between document objects, and so forth. But pragmatic information carries a great deal of the underlying meaning in a document, and the inability to access this information means that current content extraction methods are very uninformed.
Our goal is to develop natural language systems capable of extracting this pragmatic information in text to provide more meaningful document understanding. To this end, we are developing automated methods, both discourse-based and using Machine Learning techniques, to recognize and interpret pragmatic cues in text. This pragmatic evidence may then be used to provide more-sophisticated document indexing to guide information extraction by providing detailed information on the fine-grained nature of the linking
Documents contain “text objects” that have many information retrieval uses. These text objects include: textual items, such as noun phrases, and metadata items, such as citations to other articles, hyperlinks to other documents or web pages, and XML attributes. Respective uses of these text objects include: keyword indexing to form links between keywords and documents; citation indexes; and XML attributes as an important metadata search item.
While it is a straightforward task to associate keywords with documents or build citation indexes which facilitate searches that ensure a high rate of recall in a search, the presence of a keyword or citation link does not necessarily mean a correspondingly high search precision. To improve search precision, each link should ideally be labelled with a domain-specific descriptive category that indicates a likely reason for the link. We propose to develop automated methods of link classification providing such typed links to enable more-effective literature indexing and analysis tools.
Our initial task is to construct an annotation tool for manually classifying rhetorical and other pragmatic cues in online texts to provide a training corpus for developing our automated document-link classification system.
∗Authors are listed in alphabetical order. An earlier version of this paper was given as a poster at the 2004 Joint Conference on Human Language Technology/North American Association for Computational Linguistics (HLT-NAACL) (BioLink 2004: Workshop on Linking Biological Literature, Ontologies and Databases: Tools for Users), Boston, May 2004.
With the explosion in the amount of online literature, our current techniques for information exploration have been overwhelmed. If we could recognize and use fine-grained relationships among documents to assist navigation through information networks, we could better address this problem.
Suppose that we wish to label a link to the following news article which is cited by a competitor company analysis:
“The U.S. Food and Drug Administration is planning to reverse additional patent protection for Biovail Corp.’s Tiazac, setting the stage for potential generic competition against the Mississauga company’s flagship drug.” (The Globe and Mail, Saturday 5 March 2001, page B2.)
Suppose also that we wish to label this link with either “Favourable development for competitor” or “Unfavourable development for competitor”. If we extract just the positive phrase “additional patent protection for [competitor’s product]” then, without additional information, this article would be labelled as “Favourable”. However, the positive phrase is obviously in the negative context indicated by “reverse”, so it should have been labelled as “Unfavourable”. If the verb had instead been “continue” (a positive context) then the positive sense would again prevail.
It is obvious from this example that an analysis of the text object context is crucial. What is not obvious is that the context could be structurally larger than just the enclosing sentence, even as large as a paragraph, the entire document, or a set of documents. The goal of this project is to develop new methods for discovering contextual information vital to the interpretation of text objects found in documents. This information can then be used to label links to the document that use the textual object. Although deep analysis of text would be required for complete understanding of all the nuanced relationships between documents, it is our contention that surface-cue and stylistic analysis, easier and more tractable than full syntactic and semantic understanding, can provide much of the information that will be
We are bootstrapping the development of a set of methods and software tools for the automated classification of links between documents in online corpora by focusing initially on the problem of automated citation classification in scientific articles. This is a particularly challenging problem as there can be upwards of 35 citation categories used in scholarly writing, with fine-grained distinctions among the category definitions. Determining the purpose of a citation can involve recognizing linguistic features at all levels of the text: lexical cues, syntactic arrangement, and overall discourse structure. We have demonstrated that automated citation classification is feasible, but to improve the performance of our classifier we need more-sophisticated techniques blending discourse understanding with statistical methods for large-scale corpus analysis.
Once we have determined the purpose of a citation, we can then use this knowledge to group together articles and authors into clusters that will allow better navigation of the literature in a subject domain, and mapping to social networks within a scientific community. We are applying knowledge from Computational Linguistics and Machine Learning to develop methods and software tools for automatically determining the function of citations. It is expected that these results will then be applicable to related problems in classifying other types of links and hyperlinks
Our resources include specialized repositories of biomedical articles (10,000) and physics articles (30,000), as well as the entire BioMed Central corpus. Our initial goal is to build a training set of manually classified citations in biomedical articles (using a set of 1000 protein-interaction articles we have curated from the larger biomedical corpus) that we could then use for developing our learning algorithms and for building scientific social networks.
We have developed an initial annotation tool for manually classifying citations in scientific articles and now plan to extend the tool to classify other types of surface pragmatic cues (e.g., hedging cues, indicators of uncertainty). These cues will then provide a training corpus to develop automated methods for classifying the types of links between
1. Development of Machine Learning algorithms (e.g., using Hidden Markov Models, Conditional Random Fields) for detection of linguistic features in text relevant to citation function (R. Radoulov, Master’s
2. Development of Machine Learning methods and software tools for automated classification of citations (J. Taylor, PhD student, UWO; R. Radoulov,
3. Analysis of discourse and argumentation structure (e.g., using lexical chaining, lexical style, classical argumentation models) as cues to citation function and inter-document relations (T. Maynard, Master’s student, UWO; B. White, PhD student, UWO; C.
4. Using citation network analysis to map the structure of scientific communities (F. Kroon, PhD student,
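The link-labelling example above (the “reverse” versus “continue” contexts for the Biovail article) can be illustrated with a minimal sketch. This is not our classifier: the cue sets, the phrase list, the "Unknown" label, and the function name label_link are assumptions introduced purely for illustration, and the context examined here is only the enclosing sentence.

```python
# Hypothetical cue lists: the text names only "reverse" (negative context)
# and "continue" (positive context); a real system would need many more.
NEGATING_VERBS = {"reverse"}
POSITIVE_PHRASES = ("additional patent protection",)


def label_link(sentence: str) -> str:
    """Label a link using a positive phrase and the polarity of its context."""
    text = sentence.lower()
    # Without a known positive phrase, this toy labeller has no opinion.
    if not any(phrase in text for phrase in POSITIVE_PHRASES):
        return "Unknown"
    words = set(text.replace(",", " ").replace(".", " ").split())
    # A negating verb in the sentence flips the positive sense of the phrase.
    if words & NEGATING_VERBS:
        return "Unfavourable development for competitor"
    return "Favourable development for competitor"
```

Looking only at the phrase would label the article “Favourable”; checking the governing context yields “Unfavourable”, which is the behaviour the example calls for. A context larger than one sentence would need a richer representation than this sketch provides.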
A citation index enables efficient retrieval of documents from a large collection; a citation index consists of source items and their corresponding lists of bibliographic descriptions of citing works. Citation indexing of scientific articles was invented by Dr. Eugene Garfield in the 1950s as a result of studies on problems of medical information retrieval and indexing of biomedical literature. Dr. Garfield later founded the Institute for Scientific Information (ISI), whose Science Citation Index [4] is now one of the most popular citation indexes. Recently, with the advent of digital libraries, Web-based indexing systems have begun to appear (e.g., ISI’s ‘Web of Knowledge’, CiteSeer
Authors of scientific papers normally include citations in their papers to indicate works that are connected in an important way to their paper. Thus, a citation connecting the source document and a citing document serves one of many functions. For example, one function is that the citing work gives some form of credit to the work reported in the source article. Another function is to criticize previous work. Other functions include identifying works that are foundational in their field, that provide background for their own work, or that are representative of complementary or contradictory research. Determining the nature of the exact relationship between a citing and cited paper often requires some level of understanding of the text that the citation is embedded in.
The near-term goal of our research project is the implementation of an indexing tool for scholarly scientific literature which uses rhetorical and other pragmatic cues in the context surrounding a citation to provide information about the relationship between the two papers connected by the citation. Ultimately, we hope to apply the methods and tools we will develop in classification of more-general kinds of document links to enhance literature indexing schemes, improve document retrieval precision, and advance social net-
A citation may be formally defined as a portion of a sentence in a citing document which references another document or a set of other documents collectively. For example, in sentence 1 below, there are two citations: the first citation is “Although the 3-D structure . . . progress”, with the set of references (Eger et al., 1994; Kelly, 1994); the second citation is “it was shown . . . submasses” with the single reference
(1) Although the 3-D structure analysis by x-ray crystallography is still in progress (Eger et al., 1994; Kelly, 1994), it was shown by electron microscopy that XO consists of three submasses (Coughlan et
Indexing tools, such as CiteSeer [3], play an important role in the scientific endeavour by providing researchers with a means to navigate through the network of scholarly scientific papers using the connections provided by citations. Citations relate articles within a research field by linking together works whose methods and results are in some way mutually relevant. Customarily, authors include citations in their papers to indicate works that are foundational in their field, background for their own work, or representative of complementary or contradictory research. Another researcher may then use the presence of citations to locate articles she needs to know about when entering a new field or to read in order to keep track of progress in a field where she is already well-established. But, with the explosion in the amount of scientific literature, a means to provide more information in order to give more intelligent control to the navigation process is warranted. A user normally wants to navigate more purposefully than “Find all articles citing a source article”. Rather, the user may wish to know whether other experiments have used similar techniques to those used in the source article, or whether other works have reported conflicting experimental results. In order to navigate a citation index in this more-sophisticated manner, the citation index must contain not only the citation-link information, but also must indicate the function of the citation in
In the biomedical field, a domain of particular interest to us, we believe that the usefulness of automated citation classification in literature indexing can be found both in the larger context of managing entire databases of scientific articles and in specific information-extraction problems. On the larger scale, database curators need accurate and efficient methods for building new collections by retrieving articles on the same topic from huge general databases. Simple systems (e.g., [1], [13]) consider only keyword frequencies in measuring article similarity. More-sophisticated systems, such as the Neighbors utility [22], may be able to locate articles that appear to be related in some way (e.g., finding related Medline abstracts for a set of protein names [2]), but the lack of specific information about the nature and validity of the relationship between articles may still make the resulting collection a less-than-ideal resource for subsequent analysis. Citation classification to indicate the nature of the relationships between articles in a database would make the task of building collections of related articles both easier and more accurate. And, the existence of additional knowledge about the nature of the linkages between articles would greatly enhance navigation among a space of documents to retrieve meaningful information about the related
A specific problem in information extraction that may benefit from the use of citation categorization involves mining the literature for protein-protein interactions (e.g., [2], [13], [21]). Currently, even the most-sophisticated systems are not yet capable of dealing with all the difficult problems of resolving ambiguities and detecting hidden knowledge. For example, Blaschke et al.’s system [2] is able to handle fairly complex problems in detecting protein-protein interactions, including constructing the network of protein interactions in cell-cycle control, but important implicit knowledge is not recognized. In the case of cell-cycle analysis for Drosophila, their system is able to determine that relationships exist between Cak, Cdk7, CycH, and Cdk2: Cak inhibits/phosphorylates Cdk7, Cak activates/phosphorylates Cdk2, Cdk7 phosphorylates Cdk2, CycH phosphorylates Cak and CycH phosphorylates Cdk2. However, the system is not able to detect that Cak is actually a complex formed by Cdk7 and CycH, and that the Cak complex regulates Cdk2. While the earlier literature describes inter-relationships among these proteins, the recognition of the generalization in their structure, i.e., that these proteins are part of a complex, is contained only in more-recent articles: “There is an element of generalization implicit in later publications, embodying previous, more dispersed findings. A clear improvement here would be the generation of associated weights for texts according to their level of generality” [2]. Citation categorization could provide just these kinds of ‘ancestral’ relationships between articles (whether an article is foundational in the field or builds directly on closely related work) and, if automated, could be used in forming collections of articles for study that are labelled with explicit semantic and rhetorical links to one another. Such collections of semantically linked articles might then be used as ‘thematic’ document clusters (cf. Wilbur [23]) to elicit much more meaningful information from documents
An added benefit of having citation categories available in text corpora used for studies such as extracting protein-protein interactions is that more, and more-meaningful, in-
Blaschke et al. [2] noted that they were able to discover many more protein-protein interactions when including in the corpus those articles found to be related by the Neighbors facility [22] (285 versus only 28 when relevant protein names alone were used in building the corpus). Lastly, very difficult problems in scientific and biomedical information extraction that involve aspects of deep-linguistic meaning may be resolved through the availability of citation categorization in curated texts: synonym detection, for example, may be enhanced if different names for the same entity occur in articles that can be recognized as being closely related
The automated labelling of citations with a specific citation function requires an analysis of the linguistic features in the text surrounding the citation, coupled with a knowledge of the author’s pragmatic intent in placing the citation at that point in the text. The author’s purpose for including citations in a research article reflects the fact that researchers wish to communicate their results to their scientific community in such a way that their results, or knowledge claims, become accepted as part of the body of scientific knowledge. This persuasive nature of the scientific research article, how it contributes to making and justifying a knowledge claim, is recognized as the defining property of scientific writing by rhetoricians of science, e.g., [7], [8], [9], [17]. Style (lexical and syntactic choice), presentation (organization of the text and display of the data), and argumentation structure are noted as the rhetorical means by which authors build a convincing case for their results.
Our approach to automated citation classification is based on the detection of fine-grained linguistic cues in scientific articles that help to communicate these rhetorical stances and thereby map to the pragmatic purpose of citations. As part of our overall research methodology, our goal is to map the various types of pragmatic cues in scientific articles to rhetorical meaning. Our previous work has described the importance of discourse cues in enhancing inter-article cohesion signalled by citation usage [15], [12]. We have also been investigating another class of pragmatic cues, hedging cues [16], that are deeply involved in creating the pragmatic effects that contribute to the author’s knowledge claim by linking together a mutually supportive network of researchers within a scientific community.
In extending our work to more-general types of document links, we are exploring other types of pragmatic connotations, including certainty categorization and how explicitly marked certainty can be predictably and dependably identified from newspaper article data. Certainty identification, in particular, can serve as a foundation for a novel type of text analysis that can enhance question-and-answering, search, and information retrieval capabilities ([18], [19]). Certainty identification is a part of the new and exciting direction in information retrieval, natural language processing, and text-mining, concerned with exploration of subjective, attitudinal, and affective aspects of texts [20].
In our preliminary study [15], we analyzed the frequency of the cue phrases from [14] in a set of scholarly scientific articles. We reported strong evidence that these cue phrases are used in the citation sentences and the surrounding text with the same frequency as in the article as a whole. In subsequent work [12], we analyzed the same dataset of articles to begin to catalogue the fine-grained discourse cues that exist in citation contexts. This study confirmed that authors do indeed have a rich set of linguistic and non-linguistic methods to establish discourse cues in citation contexts.
Another type of linguistic cue that we are studying is related to hedging effects in scientific writing that are used by an author to modify the affect of a ‘knowledge claim’. Hedging in scientific writing has been extensively studied by Hyland [9], including cataloging the pragmatic functions of the various types of hedging cues. As Hyland [9] explains, “[Hedging] has subsequently been applied to the linguistic devices used to qualify a speaker’s confidence in the truth of a proposition, the kind of caveats like I think, perhaps, might, and maybe which we routinely add to our statements to avoid commitment to categorical assertions. Hedges therefore express tentativeness and possibility in communication, and their appropriate use in scientific dis-
The following examples illustrate some of the ways in which hedging may be used to deliberately convey an attitude of uncertainty or qualification. In the first example, the use of the verb “suggested” hints at the author’s hesitancy to declare the absolute certainty of the claim:
(2) The functional significance of this modulation is suggested by the reported inhibition of MeSo-induced differentiation in mouse erythroleukemia cells constitutively expressing c-myb.
In the second example, the syntactic structure of the sentence, a fronted adverbial clause, emphasizes the effect of qualification through the rhetorical cue “Although”. The subsequent phrase, “a certain degree”, is a lexical modifier that also serves to limit the scope of the result:
(3) Although many neuroblastoma cell lines show a certain degree of heterogeneity in terms of neurotransmitter expression and differentiative potential, each cell has a prevalent behavior in response to differentiation inducers.
To investigate the connection between hedging cues and citations, we carried out a frequency analysis of hedging cues in citation contexts in a corpus of 985 biology articles. We obtained statistically significant results (summarized in Table 1) indicating that hedging is used more frequently in citation contexts than in the text as a whole. Given the presumption that writers make stylistic and rhetorical choices purposefully, we propose that we have further evidence that connections between fine-grained linguistic cues and rhetorical relations exist in citation contexts.
Table 1 shows the proportions of the various types of sentences that contain hedging cues, broken down by hedging-cue category (verb or nonverb cues), according to the different sections in the articles (background, methods, results and discussion, conclusions). For all but one combination, citation sentences are more likely to contain hedging cues than would be expected from the overall frequency of hedge sentences (p ≤ .01). Citation ‘window’ sentences (i.e., sentences in the text close to a citation) generally are also significantly (p ≤ .01) more likely to contain hedging cues than expected, though for certain combinations (methods, verbs and nonverbs; res+disc, verbs) the difference was not
Tables 2, 3, and 4 summarize the occurrence of hedging cues in citation ‘contexts’ (a citation sentence and the surrounding citation window). Table 5 shows the proportion of hedge sentences that either contain a citation or fall within a citation window; Table 5 suggests (last three columns) that the proportion of hedge sentences containing citations or being part of citation windows is at least as great as what would be expected just by the distribution of citation sentences and citation windows.
Table 1 indicates, with statistical significance, that in most cases the proportion of hedge sentences in the citation contexts is greater than what would be expected by the distribution of hedge sentences. Taken together, these conditional probabilities support the conjecture that hedging cues and citation contexts correlate strongly. Hyland [9] has catalogued a variety of pragmatic uses of hedging cues, so it is reasonable to speculate that these uses can be mapped to the rhetorical meaning of the text surrounding a citation, and from thence to the function of the citation.
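The kind of frequency comparison described for hedging cues (is the proportion of hedge sentences among citation sentences higher than the overall proportion?) can be sketched as below. The statistical test actually used is not stated here, so the one-sided normal-approximation test is an assumption chosen for illustration, as are the cue list (taken from the words quoted in this section), the naive substring matching, and the function names.

```python
import math

# Illustrative cue list only, built from cues quoted in the text
# ("I think", "perhaps", "might", "maybe", and the verb "suggested").
HEDGING_CUES = ("i think", "perhaps", "might", "maybe", "suggest")


def has_hedge(sentence: str) -> bool:
    """Return True if any cue occurs in the sentence (naive substring match)."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in HEDGING_CUES)


def one_sided_p_value(hedged: int, total: int, baseline: float) -> float:
    """P-value for 'hedge rate in this sample exceeds the baseline rate'.

    Uses the normal approximation to the binomial distribution; `baseline`
    is the overall proportion of hedge sentences in the corpus.
    """
    observed = hedged / total
    std_err = math.sqrt(baseline * (1.0 - baseline) / total)
    z = (observed - baseline) / std_err
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

With counts of hedged citation sentences and the corpus-wide hedge rate in hand, a p-value at or below .01 from such a test would correspond to the significance level reported for Table 1. Substring matching over-generates (for example, “might” also matches “mighty”), so a real analysis would need tokenization and a fuller cue inventory.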
Table 1. Proportion of sentences containing hedging cues, by type of sentence and hedging cue category.
Table 2. Number and proportion of citation contexts containing a hedging cue, by section and location of hedging cue.
Table 3. Proportion of citation contexts containing a verbal hedging cue, by section and location of hedging cue.
Table 4. Proportion of citation contexts containing a nonverb hedging cue, by section and location of hedging cue.
Table 5. Proportion of hedge sentences that contain citations or are part of a citation window, by section and hedging cue category.
In [16], we showed that the hedging cues proposed by Hyland occur more frequently in citation contexts than in the text as a whole. With this information we conjecture that hedging cues are an important aspect of the rhetorical relations found in citation contexts and that the pragmatics of hedges may help in determining the purpose of citations.
The indexing tool that we are designing is an enhanced citation index. The feature that we are adding to a standard citation index is the function of each citation, that is, given an agreed-upon set of citation functions, we want our tool to be able to automatically categorize a citation into one of these functional categories. To accomplish this automatic categorization we are using a decision tree; currently, we are building the decision tree by hand, but in future we intend to investigate machine learning techniques to induce a tree. Our aim is to have a working indexing tool whenever we add more knowledge to the categorization process. This goal appears very feasible given our design methodology choice of using a decision tree: adding more knowledge only refines the decision-making procedure of the pre-
Two factors influence the development of the tree as follows:
• The granularity of the citation categories determines how many leaves are in the decision tree; and
• The number of features that can be used to determine the category of a citation determines the po-
In earlier work, Garzone and Mercer ([5], [6]) proposed a citation classification scheme that, with 35 categories, was both more comprehensive than the union of all of the previous schemes and also amenable to implementation in an automated citation classifier. We use this categorization in the citation classifiers, but a finer or coarser granularity is
Concerning the features on which the decision tree makes its decisions, we have started with a simple, yet fully automatic prototype [5] which takes journal articles as input and classifies every citation found therein. Its decision tree is very shallow, using only sets of cue-words and polarity switching words (not, however, etc.), some simple knowledge about the IMRaD structure¹ of the article together with some simple syntactic structure of the citation-containing sentence. The prototype uses 35 citation categories. In addition to having a design which allows for easy incorporation of more-sophisticated knowledge, it also gives flexibility to the tool: categories can be easily coalesced to give users a tool that can be tailored to a variety of uses.
¹The corpus of biomedical papers all have the standard Introduction, Methods, Results, and Discussion or a slightly modified version in which
Although we anticipate some small changes to the number of categories due to category refinement, the major modifications to the decision tree will be driven by a more-sophisticated set of features associated with each citation. When investigating a finer granularity of the IMRaD structure, we came to realize that the structure of scientific writing at all levels of granularity was founded on rhetoric, which involves both argumentation structure and stylistic choices of words and syntax. This was the motivation for choosing the rhetoric of science as our guiding
We rely on the notion that rhetorical information is realized in linguistic ‘cues’ in the text, some of which, although not all, are evident in surface features (cf. Hyland [9] on surface hedging cues in scientific writing). Since we anticipate that many such cues will map to the same rhetorical features that give evidence of the text’s argumentative and pragmatic meaning, and that the interaction of these cues will likely influence the text’s overall rhetorical effect, the formal rhetorical relation (cf. [11]) appears to be the appropriate feature for the basis of the decision tree. So, our long-term goal is to map between the textual cues and rhetorical relations. Having noted that many of the cue words in the prototype are discourse cues, and with two recent important works linking discourse cues and rhetorical relations ([10], [14]), we began our investigation of this mapping with discourse cues. We have some early results that show that discourse cues are used extensively with citations and that some cues appear much more frequently in the citation context than in the full text [15]. Another textual device is the hedging cue, which we are currently investigating [16].
Although our current efforts focus on cue words which are connected to organizational effects (discourse cues) and writer intent (hedging cues), we are also interested in other types of cues that are associated more closely with the purpose and method of science. For example, the scientific method is, more or less, to establish a link to previous work, set up an experiment to test an hypothesis, perform the experiment, make observations, then finally compile and discuss the importance of the results of the experiment. Scientific writing reflects this scientific method and its purpose: one may find evidence even at the coarsest granularity of the IMRaD structure in scientific articles. At a finer granularity, we have many targeted words to convey the notions of procedure, observation, reporting, supporting, explaining, refining, contradicting, etc. More specifically, science categorizes into taxonomies or creates polarities. Scientific writing then tends to compare and contrast or refine. Not surprisingly, the morphology of scientific terminology exhibits comparison and contrasting features, for example, exo- and endo-. Science needs to measure, so scientific writing contains measurement cues by referring to scales (0–100), or using comparatives (larger, brighter, etc.). Experiments are described as a sequence of steps, so this is an
Finally, as for our prototype system, we will continue to evaluate the classification accuracy of the citation-indexing tool by a combination of statistical testing and validation by human experts. In addition, we would like to assess the tool’s utility in real-world applications such as database curation for studies in biomedical literature analysis. We have suggested earlier that there may be many uses of this tool, so a significant aspect of the value of our tool will be its ability to enhance other research projects.
The pragmatic connotations of citation function and other types of document links are a feature of scientific writing which can be exploited in a variety of ways. We anticipate more-informative citation and document indexes as well as more-intelligent database curation. Additionally, sophisticated information extraction may be enhanced when better selection of the dataset is enabled. For example, synonym detection in a corpus of papers may be made more tractable when the corpus is comprised of related papers derived from navigating a space of linked citations.
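The shallow, hand-built decision tree described earlier for the prototype (cue-word sets, polarity-switching words, and the IMRaD section of the citing sentence) can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the prototype itself: the cue words, the four category names, and the coarse labels are all hypothetical stand-ins for the 35-category scheme, and the syntactic features are omitted.

```python
from dataclasses import dataclass

# Hypothetical cue sets; only "not" and "however" are named in the text.
POSITIVE_CUES = {"confirmed", "demonstrated", "established"}
NEGATIVE_CUES = {"failed", "contradicts", "inconsistent"}
POLARITY_SWITCHERS = {"not", "however"}

# Hypothetical coalescing of fine-grained categories into coarser ones.
COARSE = {
    "supports": "positive",
    "uses-method": "positive",
    "criticizes": "negative",
    "background": "neutral",
}


@dataclass
class Citation:
    sentence: str
    section: str  # "introduction", "methods", "results", or "discussion"


def classify(citation: Citation) -> str:
    """Walk a three-level hand-built tree: cue polarity, switcher, section."""
    text = citation.sentence.lower().replace(",", " ").replace(".", " ")
    words = set(text.split())
    switched = bool(words & POLARITY_SWITCHERS)
    if words & POSITIVE_CUES:
        return "criticizes" if switched else "supports"
    if words & NEGATIVE_CUES:
        return "supports" if switched else "criticizes"
    # No cue word found: fall back on the IMRaD section alone.
    if citation.section == "methods":
        return "uses-method"
    return "background"


def coalesce(category: str) -> str:
    """Map a fine-grained category onto its coarser label."""
    return COARSE[category]
```

Adding a new feature in this design means adding a test below an existing branch, which refines rather than replaces the earlier decisions, and changing the `COARSE` table is all that is needed to present users with a coarser set of categories.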
In this paper we have motivated our approach to developing a literature indexing tool that computes the functions of citations. The function of a citation is determined by analyzing the rhetorical intent of the text that surrounds it. This analysis is founded on the guiding principle that the scientific method is reflected in scientific writing.
Our early investigations have determined that linguistic cues and citations are related in important ways. Our future work will be to map these linguistic cues to rhetorical relations and other pragmatic functions so that this information can then be used to determine the purpose of citations and from thence to more-general document links. The results of our research will be a set of algorithms, methods, and software tools that can be applied to the following problems in
• Automated analysis of document content for cues to
• Automated classification of semantic links between
• Mapping from typed document links to social net-
[4] E. Garfield. Information, power, and the science citation index. In Essays of an Information Scientist, Volume 1. Institute for Scientific Information, 1962–1973.
[5] M. Garzone. Automated classification of citations using linguistic semantic grammars. M.Sc. Thesis, The University of Western Ontario, 1996.
[6] M. Garzone and R. Mercer. Towards an automated citation classifier. In Proceedings of the 13th Biennial Conference of the CSCSI/SCEIO (AI’2000), pages 337–346. Lecture Notes in Artificial Intelligence, volume 1822, H.J. Hamilton (ed.),
[8] A. Gross, J. Harmon, and M. Reidy. Communicating Science: The Scientific Article from the 17th Century to the Present. Oxford University Press, 2002.
[9] K. Hyland. Hedging in Scientific Research Articles. John
[10] A. Knott. A data-driven methodology for motivating a set of coherence relations. Ph.D. thesis, University of Edinburgh, 1996.
[11] W. Mann and S. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 1988.
citation-related rhetorical cues in scientific texts. In Proceedings of the Pacific Association for Computational Linguistics (PACLING 2003) Conference, Halifax, Canada, 2003.
[13] E. M. Marcotte, I. Xenarios, and D. Eisenberg.
literature for protein-protein interactions. Bioinformatics,
[14] D. Marcu. The rhetorical parsing, summarization, and generation of natural language texts. Ph.D. thesis, University of
[1] M. A. Andrade and A. Valencia. Automatic extraction of keywords from scientific text: Application to the knowledge domain of protein families. Bioinformatics, 14(7):600–607,
[2] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia.
[15] R. Mercer and C. DiMarco. The importance of fine-grained
Automatic extraction of biological information from scien-
cue phrases in scientific citations. In Proceedings of the
tific text: Protein-protein interactions. In International Con-
16th Conference of the CSCSI/SCEIO (AI’2003), Halifax,
ference on Intelligent Systems for Molecular Biology (ISMB
[16] R. Mercer and C. DiMarco. The frequency of hedging cues
[3] B. Bollacker, S. Lawrence, and C. Giles. A system for au-
in citation contexts in scientific writing. In Proceedings of
tomatic personalized tracking of scientific literature on the
the 17th Conference of the CSCSI/SCEIO (AI’2004), Lon-
web. In Digital Libraries 99—The Fourth ACM Conference
on Digital Libraries, pages 105–113, New York, 1999. ACM
[17] G. Myers. Writing Biology. University of Wisconsin Press,
[18] V. Rubin, N. Kando, and E. Liddy. Certainty categorization
model. In AAAI Spring Symposium: Exploring Attitude andAffect in Text: Theories and Applications, Stanford, USA,2004.
[19] V. Rubin, E. Liddy, and N. Kando. Certainty identification
in texts: Categorization model and manual tagging results. In In: J.G. Shanahan, Y. Qu and J. Wiebe (Eds.), ComputingAttitude and Affect in Text: Theory and Applications (theInformation Retrieval Series): Springer-Verlag, New York,2005.
[20] J. Shanahan, Y. Qu, and J. W. (Eds.). Computing Attitude
and Affect in Text: Theory and Applications (the InformationRetrieval Series). Springer-Verlag, New York, 2005.
[21] J. Thomas, D. Milward, C. Ouzounis, S. Pulman, and
Automatic extraction of protein interactions
from scientific abstracts. In Proceedings of the 5th PacificSymposium on Biocomputing (PSB 2000), pages 538–549,2000.
[22] W. Wilbur and L. Coffee. The effectiveness of document
neighboring in search enhancement. Information ProcessingManagement, 30:253–266, 1994.
[23] W. J. Wilbur. A thematic analysis of the aids literature. In
Proceedings of the 7th Pacific Symposium on Biocomputing(PSB 2004), pages 386–397, 2002.