Description of DocuScope

by
Jeff Collins and Dave Kaufer
Carnegie Mellon University

Version current as of 12 November 2001

This description is available online as a PDF and HTML document.

DocuScope is © Copyright 1998-2001 by Carnegie Mellon University.
See Section 3.6 for a list of the current development team.

Contents

1  Introductory Overview

2  Representational Theory of Language
  2.1  Rhetorical Theory and Language Patterns
  2.2  Representational Effect Hierarchy
  2.3  Representational Effect Descriptions
    2.3.1  Effects of the Thought Cluster:
    2.3.2  Effects of the Description Cluster:
    2.3.3  Effects of the Linear Cues Cluster:

3  Software Information
  3.1  Theory Development
  3.2  Tagging Language Strings
  3.3  Language Visualization
  3.4  Software Operation
  3.5  Software Dissemination
  3.6  Development Team

Appendix A  DocuScope in Action (Screen Shots)

1  Introductory Overview

DocuScope is text tagging and visualization software, developed at Carnegie Mellon University's English Department to support writing courses taught to information design and professional writing students.

What does DocuScope do?   DocuScope is designed to let people visualize and understand representational effects in texts. It is not an attempt at artificial intelligence, and the program does not ``understand'' or analyze anything it ``reads.'' DocuScope simply goes through text documents and finds patterns of words that the humans using the program have told it are relevant to representation (more on this in Section 3). DocuScope then displays its findings in ways that help users see the representational patterns in the texts. The software can also export quantifications of these patterns to comma-delimited text files for analysis in statistical packages.
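
As a minimal sketch of that export step, assume the tagger has already produced one category label per matched string for each document. The input format, the function name `category_counts_to_csv`, and the column layout are hypothetical, not DocuScope's actual file format. Raw per-document counts can then be written with Python's standard `csv` module:

```python
import csv
import io
from collections import Counter

def category_counts_to_csv(tagged_docs, categories):
    """Return CSV text with one row per document and one count column per category.

    tagged_docs: mapping of document name -> list of category labels,
    one label per matched string (a hypothetical input format).
    categories: the column order to use.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["document"] + list(categories))
    for name, labels in tagged_docs.items():
        counts = Counter(labels)
        # Counter returns 0 for a category that never matched in this document.
        writer.writerow([name] + [counts[c] for c in categories])
    return buffer.getvalue()

docs = {
    "essay1.txt": ["First Person", "Inner Thinking", "First Person"],
    "essay2.txt": ["Cue Reasoning"],
}
print(category_counts_to_csv(docs, ["First Person", "Inner Thinking", "Cue Reasoning"]))
```

A statistical package can read the resulting text directly. Normalizing counts by document length is left to the analysis step in this sketch.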

Where can DocuScope be downloaded?   At this time it's not available to be downloaded. We have taught several courses and given workshops with DocuScope at academic conferences, but it is not a commercial product. If you are an academic researcher interested in evaluating DocuScope, contact David Kaufer (kaufer+@andrew.cmu.edu). DocuScope is owned by the authors and Carnegie Mellon University. You will need to sign a release from Carnegie Mellon's technology transfer office before we can send you a copy of the software.

What is representational composition?   Representational composition is the language theory responsible for the patterns DocuScope visualizes. We are at work on a forthcoming book [1] that explains the language theory fully. Even without the book, we've observed students learning this theory by using DocuScope and seeing how the computer tool ``reads'' and visualizes their texts. We discuss the theory in some detail in Section 2.

To help you see the big picture, here's a quick introduction to the point of this language theory. For a moment pretend the nasty chemical DDT has not yet been banned from widespread use in the U.S. and you want to write a paper arguing DDT should be banned. There are a myriad of options for designing such an argument, but here are three possible high-level plans:

Plan 1: Write an explicit argument that will provide your reader with the main reasons DDT ought to be banned.

Plan 2: Write a paper outlining your opinions of DDT and what you think about its continued use.

Plan 3: Write a description of DDT's effects (or potential future effects) on the environment.

Writing papers based on these three plans would lead to quite different reading experiences for your reader: the first would feel like a research paper, leading the reader to consider your evidence critically; the second would feel like an opinion piece, leading the reader to consider how much credence to give you and your opinion; the third would feel like a narrative, leading the reader to visualize the world you describe in your paper and compare it to the one the reader lives in and knows. These striking differences in reading experience are created linguistically: the language used to carry out your chosen plan will determine which experience your reader will have.

All three of these plans could potentially lead to effective papers arguing against DDT. The idea behind representational composition and DocuScope is that it is useful to understand the way language choices create these experiences for readers. In other words, representational composition attempts to draw attention to the linguistic choices authors make that lead readers to have particular experiences with texts. We believe writers design these experiences for their readers as they write. DocuScope is a program that lets writers see these differences explicitly and get better control of their representational language choices.

2  Representational Theory of Language

In this section, we provide an overview of the rhetorical theory behind DocuScope and give you some information about how the theory operates.

2.1  Rhetorical Theory and Language Patterns

DocuScope's method of characterizing language choice is drawn from rhetoricians' long-standing interest in the patterns of language that provide interactive experiences for an audience [2,3].

Perhaps an example of how subtle language changes affect rhetorical purpose would make this clear. The original (1) and revised (2) sentences below were written by a rental property owner to the manager of the property:

  1. Please ask if the tenants would be willing to move out early to accommodate renters who want to move in during the summer.

  2. Please ask if the tenants would be willing to move out early to accommodate any possible renters who might want to move in during the summer.

Nothing is particularly wrong with sentence (1) that would call for its revision. The difference between the sentences is a difference of rhetorical effect, achieved through subtle language change. The first sentence implies renters have been lined up and are ready to move in during the summer. The second sentence makes the renters more tentative and uncertain, suggesting it is the reader (the property manager) who would know about potential renters, not the writer of the letter.

The second sentence conveys a slightly more tentative way for the manager to approach the current tenants and suggests the manager should take action to look for renters who might want to move in. Of course there are other ways a writer could achieve this impression. It's quite likely a writer would combine this sentence with others in the letter to help clarify the relationship and to make the direction to the manager more careful (or perhaps the relationship is already clear from the context of the letter). Nonetheless, by making the specific language changes to the first sentence, the author revised it for a rhetorical purpose, one that was probably designed to help make the overall point of the letter.

Experienced writers have control of these subtle rhetorical shifts, manipulating their language choices here and there in their attempts to achieve different impressions for their readers. This is what is meant by the cumulative rhetorical effect of language choice: no one choice necessarily makes a strong impression, but cumulatively the choices lead to a particular impression for a reader. Historically, writers have attained this control through years of reading and writing practice, both in school and beyond it. To help our writing students understand these cumulative language effects more explicitly, we created DocuScope.

Figure 1: The two sentences in DocuScope. The different colors indicate different clusters of representational cues.

Using DocuScope to examine the two example sentences (Figure 1), we see the additions to the revised sentence have provided two dimensions of interactive experience for the reader (we describe all the dimensions in Section 2.3). First, the phrase ``any possible'' adds an indication of the writer's inner thinking. This means ``the renters'' have moved from being explicit, tangible people in (1) to being figures of imagination or potentiality in (2): they've gone from being in the world to being imagined in the writer's mind. They're not renters, but possible renters. The writer makes this clear for the reader by adding ``any possible'' to the sentence.

Second, the added word ``might'' combines with the pronoun ``who'' to create a phrase that overtly cues the reader that interactivity is required; it adds a conditional nature to the revised sentence. This means the renters have changed from being known by the writer in (1) to being unknown, potential renters who might want to move in, and some action is required on the part of the reader to know this. In other words, by formulating the phrase ``who might'' in sentence (2), the writer implies something like ``You need to find a renter'' or ``When you find a renter.'' The writer achieves this impression of interactivity with the reader by composing the interactive phrase ``who might'' in the revised sentence.

The task this software undertakes is to make the patterns suggested by this representational theory visible to the user. It accomplishes this by first isolating and categorizing the strings responsible for the effects and then presenting this information in a useful way.

2.2  Representational Effect Hierarchy

When we talk about the ``effects'' of representational composition, we are talking about extremely different micro-experiences for the reader. Consider the different effects of the word ``smeared'' in the following sentences:

  1. John smeared butter on his toast.
  2. John smeared his opponent.

The catalog behind DocuScope uses language strings to disambiguate such rhetorical functions, isolating recognized strings and assigning each to a classification within a hierarchical classification scheme. (This large-scale effort is beyond the scope of this paper and will be detailed elsewhere [1].)

The hierarchies of effects bear some rough and eclectic overlap with the work of rhetorician I.A. Richards and linguist Michael Halliday [4,5]. From Richards, we took the important but little-noticed idea that much of English separates into concepts of inner thought and outer sense: an important determiner for a rhetorical action is whether it is mental action spawning from a mind or outward action projecting a descriptive reality. Based on Richards' work, we constructed two major divisions (which we call ``clusters'') at the top of our hierarchy, Thought and Description. These two clusters provide the major source of language at the word and phrase level for disclosing thoughts and for implementing the reader effects of immersion in spatial and temporal situations. Thought and description, together, create the combined inner and outer depictions required for building acquaintance effects with readers.

From Halliday's systemic-functional grammar, we took the idea that a fundamental function of language involves what Halliday calls the ``interpersonal metafunction,'' the ability of language to structure interactive relationships with implied or addressed audiences. From Halliday's work, we created a third major cluster in our hierarchy, called ``Linear Cues.'' These cues are the main workhorse for information effects on readers. The cluster is central to decision-making insofar as getting the reader to decide depends upon getting and sustaining the reader's attention through a chain of reasoning, an immersive example, or an exhortation. Combined with (spatial) description, linear cues are central as micro-level actions that navigate readers through the spatial task of reading the text.

Figure 2: The hierarchy of representational effects.

Each of the three clusters of effects is further divided into two subcategories, as indicated in Figure 2. DocuScope uses these subcategories to choose the color for each matched string. This use of color is designed to benefit users by drawing attention to specific consistencies and variabilities within and among texts (more on this text visualization is found in Section 3.3).

We have not yet formally evaluated how well these clusters and subcategories correspond to a broad range of readers and their reading experiences with texts. Our anecdotal experience using the software in the writing classroom indicates that these clusters and subcategories are adequate for our aim: drawing student attention to the particular representational effects of their writing. The hierarchy (and concomitant color scheme) helps users of the software compare their use of representational effects in their writing to the writings of others.

2.3  Representational Effect Descriptions

The software at the time of this writing characterizes over 250 million strings according to 18 non-overlapping categories of representational effect. Below is a brief description of the effects, grouped according to the three clusters: thought, description, and linear cues. A fuller description may be found elsewhere [2] and will be available in our forthcoming book [1].
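
One way to hold the 18 categories is a mapping from each cluster to its categories, inverted so that a matched string's category leads back to its cluster. This is a sketch under our own assumptions: the category names come from the descriptions below, but the Python layout is illustrative, and the two subcategories per cluster shown in Figure 2 are omitted because they are not named in this text.

```python
# Category names follow Section 2.3; the data layout is an illustrative assumption.
CLUSTERS = {
    "Thought": ["First Person", "Inner Thinking", "Think Positive",
                "Think Negative", "Thinking Ahead", "Thinking Back"],
    "Description": ["Word Picture", "Space Interval", "Motion",
                    "Past Events", "Shifting Events", "Time Interval"],
    "Linear Cues": ["Cue Common Knowledge", "Cue Prior Text", "Cue Reader",
                    "Cue Notifier", "Cue Movement", "Cue Reasoning"],
}

# Invert the mapping so a matched string's category leads back to its cluster.
CATEGORY_TO_CLUSTER = {
    category: cluster
    for cluster, categories in CLUSTERS.items()
    for category in categories
}

# "Non-overlapping": no category name appears under two clusters.
all_categories = [c for cats in CLUSTERS.values() for c in cats]
assert len(all_categories) == len(set(all_categories)) == 18

print(CATEGORY_TO_CLUSTER["Motion"])  # Description
```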

Each description below is followed by phrases containing a few highlighted strings that are assigned to the representational effect.

2.3.1  Effects of the Thought Cluster:

First Person:
Often when reading we get an impression of a mind at work, either the author's or that of a narrator or other character. Writers often use strings containing self-referential pronouns to individuate the point of view of a single consciousness from objects outside of it. Example strings: I have peaches in my cart and I'm positive they're organic.

Inner Thinking:
Private and particular minds on the page don't necessarily require first person. Writers can use English strings that give readers the impression of private minds at work by using, for example, thinking verbs or assurances. Example strings: Never did we imagine the need to heed the meaning of ``organic.''

Think Positive:
Writers signal positive feeling in the worlds they create by using strings that contain ``feel good'' words and phrases. These strings are less constrained than inner thinking strings because they need not be associated with more complete thoughts. Example strings: Recent laws have made it easier to be sure what you get---a welcome change that promises for a new relationship with organic foods.

Think Negative:
Likewise, writers have available a large class of strings evoking distress in the mind of the writer, narrator, or character. Even wrapped around neutral descriptions, strings of negative affect signal the reader that the writer disapproves. Example strings: It was those abuses of food processing that produced many of the restraints that prohibit current labelling practices.

Thinking Ahead:
This dimension is evident when a text contains projections into the future. These strings capture a mind anticipating a potential event rushing toward the present with a prospect of actualization. Example strings: The government will get into oversight because no bureaucrat will want to be blamed for missing a problem.

Thinking Back:
Thinking-back language effects occur when a text contains retrospections on the past. The reader feels a mind recollecting an event that had assumed or experienced actuality and that is now receding into memory. Example strings: The legislation has made it easier and may have prevented a problem. The old law was to have kept shoppers guessing about ``organic'' labels.

2.3.2  Effects of the Description Cluster:

Word Picture:
Writers use these strings to prime images that embody all the major elements of the story. Writers prime a word picture to allow readers to ``see'' the skeleton of the story in mental imagery. Example strings: It set about hiring 100 analysts in 56 cities across Europe.

Space Interval:
These strings prime the reader's sense of spatial contiguity. English relies primarily on a handful of strings containing prepositions (e.g. on, under, across, against, over, alongside, in front of, in back of) to carry the burden for signaling relationships between objects occupying contiguous space. Example strings: It will share violations with news agencies, including the Times, he added, saying a new press office will be built near the Brandenburg Gate.

Motion:
These strings prime kinetic images in the mind of the reader. They force readers not only to build an image of motion, but also to specialize the kinetic shape as part of their basic understanding of the world created by the writer. Example strings: France's Renseignements Generaux can open mail and tap farmers' phones at will.

Past Events:
The simplest way for readers to feel the passage of time is through strings containing the simple past tense. The advantage of the simple past for writers is that event starts and stops can be signaled to the reader. Example strings: They just caught a case of fraud as the agency got set up and operated out of Germany.

Shifting Events:
Another way English conveys time's passage is by creating shifts across events. These shifts, often captured in strings containing adverbials, occur in both time and space. In the physical world, time and spatial shifts invariably co-occur. However, English phraseology often separates shifts in time from shifts in space, providing writers with rich access to time adverbials and space adverbials that do not overlap. Example strings: Once food enters the country it will be labeled at the same time it is inspected.

Time Interval:
Event uniqueness or repetition is usually indicated by writers through strategic selection of strings containing time adverbials. Temporal adverbs are often used in repeated event strings. Beyond single-word adverbs, writers of English encode repeated events through a large inventory of adverbial and prepositional strings, all signaling temporal recurrence. Example strings: The agency is already changing. The last time it got involved was during the cold war years.

2.3.3  Effects of the Linear Cues Cluster:

Cue Common Knowledge:
Writers use these language strings to cue a reader's prior knowledge of standards, resemblances, authorities, precedents, and values. The ancient rhetoricians had an umbrella name for this class of priors: commonplaces. Commonplaces increase the solidarity between writer and reader, as merely referencing a commonplace highlights shared knowledge, beliefs and values that the writer can use as leverage for persuasion. Strings cueing prior knowledge make communication with readers more efficient, as commonplaces function as implicit premises to carry conclusions. Example strings: The food will be safe only when security and liberty are assured.

Cue Prior Text:
Writers increase a reader's sense of familiarity by cueing information that readers have learned in the course of reading the text. Such cueing provides important grounding to readers within the text. Readers understand that the writer has composed the text to take into account the familiar information that results from the reader's progress through the text. Example strings: Does this sound familiar? To avoid more of them the agency needs oversight to let it know its boundaries.

Cue Reader:
Using these strings, the writer acknowledges (in some cases fully addresses) the reader's presence, guiding the reader explicitly or previewing coming features of a text. These are the chief strings through which writers indicate their intentions and purposes to readers as well as telegraph the plan of the text to come. Example strings: Does this seem silly? Remember, we are shielded by our laws and protections.

Cue Notifier:
Writers use these language strings to signal key terms and information to readers. By indicating the presence of chunks of information, they give readers a key discriminator for building schemata and convey the text's organization. Example strings: The paradox is that the agency would be a different organization, a kind of food intelligence agency.

Cue Movement:
These language strings address a reader who is negotiating a physical action in his or her proximate environment. The action may be related to some institutional procedure or practice (like filling out a tax form), may require mental focus and review, or may direct physical actions with the reader's hands or body. Example strings: To ensure organic purity put the food labels under more scrutiny. Rotate the package quickly and look for the new EU holographic symbol.

Cue Reasoning:
Using these strings, writers guide the reader as a thinking being who, in following the text, follows a path of reasoning. Strings of this type launch lines of reasoning the reader is meant to follow. This class is marked by language strings indicating logical connection. Example strings: Nor, for that matter, has the government, but even if industry needed nothing else, inspections might falter.

3  Software Information

This section outlines the effort that went into software development and describes some key aspects of the software's functionality.

3.1  Theory Development

While we can draw on a long history of rhetorical interest in patterns of language effect for the above understandings, neither the ancient nor the contemporary writers on rhetoric have had a theory of rhetorical patterning that a modern computer scientist would call specific enough to implement. Thus, the first six years of this development project were spent refining a detailed theory of rhetorical patterning. During this development, we were more concerned with the quality of the theory than with whether a computer program could ever result.

Drawing broadly from the rhetorical tradition, but mainly from rhetorical practice, we started to build a theory of rhetorical patterns within texts. From 1992 to 1995, Kaufer and Butler studied tens of thousands of phrases from the Lincoln-Douglas debates, coding and classifying them for the different functions they served. The book that resulted from their work [3] was a systematic exploration of the large variety of patterned experiences the debates make available to a listener or a reader. The texts of the debates were threads of heterogeneous patterns creating a wide variety of interactive experiences. Before the 1996 book project, they thought that what was known about genres (in this case the genre of public debate) could help them understand the patterns available in the debates. After that project, they were convinced of the need for a more empirical, bottom-up theory of English patterns and variation to understand not only the debates but also textual genres in general.

The challenge became to put various textual genres under the microscope for their distinctive rhetorical patterning. Kaufer and Butler pursued this challenge, beginning in 1996. They designed a writing curriculum in comparative genres and, from the student papers, began indexing characteristic patterns associated with different genres. They also began to keep external archives of texts, cross-checking the patterns they were finding from their students against published writers. After several years of iterations of the course, the patterns for the different genres began to crystallize and stabilize. Kaufer and Butler published a second book [2] to describe these patterns at a high level of generality.

While Kaufer and Butler had a large inventory of English genres and patterns and a theory of their contribution to rhetorical effect, they faced two lingering problems that technology seemed well equipped to solve.

First, although they had kept massive journals from their work on English genres, most of the notes on rhetorical patterns were less precise than literal catalogs of English phrases. English phrasal patterns are extremely difficult to store or manage on paper or even in standard databases. Different English patterns can share multiple words. What we needed was an information system to manage and differentiate rhetorical knowledge from naturally occurring English patterns. This required a system with the flexibility to categorize English patterns differently, even when they overlapped substantially. We knew of no existing information system equipped to do this.

Second, teaching the course in comparative genres was becoming both frustrating and exhausting. Each semester, Kaufer had to summon for students his tacit knowledge of English genres and phrases. This seemed to be re-inventing the wheel, semester after semester, without building on what we already knew. In discussion with Suguru Ishizaki and Kerry Ishizaki, the idea was hatched to design a computer program that would parse and visualize this knowledge automatically. Students could have some of a reader's tacit knowledge built into the system and could explore their own texts for projected rhetorical effects. They could learn at their own pace as part of discovery learning. They would not have to try to read the teacher's mind about tens and hundreds of thousands of rhetorical patterns the teacher could not articulate.

Creating a program that could visualize rhetorical patterns in a simple interface posed a new and daunting challenge, one that required considerable talent and skill in text parsing and visual interface design. (See Section 3.3 for a description.)

As our system began working well in our classrooms, our categories of reader-centered rhetorical function at the micro-level began to stabilize and become more concrete. We were able to enlist student help to fill in strings that constituted micro-rhetorical action. We created a public process by which students could make ``bug'' reports about omissions (a string should be added), ambiguities (a classified string has multiple classifications and can be lengthened to disambiguate it), and misclassifications (a string classified in category A should be in category B). Through in-class discussion and final logs, students made a game of trying to predict and improve upon the string-matcher's performance at capturing rhetorical actions in texts they wrote and read within the class.

To get a sense of this process, consider the motion word ``jump.'' This word seems to change rhetorical roles abruptly when embedded in the longer strings ``jump to the conclusion'' or ``jump at the chance.'' When embedded in these longer forms, does the motion sense still survive, or is it superseded by rhetorical actions of other types? Shall we, for example, classify the string ``jump to the conclusion'' as negative information based on one datum a student can find or think up (e.g., ``he jumped to the conclusion too quickly''), or is it better regarded as a syntax of anticipation based on another conceivable string (e.g., ``jumped to the conclusion that...'')? Should ``jump at'' retain its spatial action categorization based on a datum like ``jumped at Fred,'' or should it be abstracted out of space into a general marker of intensity (``I'd jump at the chance'')? Through inquiry of this sort, we were able to assign ``jump'' to multiple functions, depending upon the disambiguated strings in which it was embedded.
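
The ambiguity reports described above lend themselves to a mechanical check. A minimal sketch, assuming the catalog is kept as a list of (string, category) codings; the helper `find_ambiguous` and the category assignments below are hypothetical, chosen only to mirror the two readings of ``jump to the conclusion'' raised above:

```python
from collections import defaultdict

def find_ambiguous(codings):
    """Return each string coded with more than one category, with its sorted categories."""
    categories_by_string = defaultdict(set)
    for string, category in codings:
        categories_by_string[string.lower()].add(category)
    return {s: sorted(c) for s, c in categories_by_string.items() if len(c) > 1}

# Hypothetical codings: the short string was coded two ways and is flagged;
# each lengthened variant carries a single category and is not.
codings = [
    ("jump", "Motion"),
    ("jump to the conclusion", "Think Negative"),
    ("jump to the conclusion", "Thinking Ahead"),
    ("jumped to the conclusion too quickly", "Think Negative"),
    ("jumped to the conclusion that", "Thinking Ahead"),
]
print(find_ambiguous(codings))
# {'jump to the conclusion': ['Think Negative', 'Thinking Ahead']}
```

Resolving a flagged entry, as the bug-report process suggests, means replacing it with longer strings that each carry one category.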

3.2  Tagging Language Strings

DocuScope could accurately be described as flexible string-matching software: it can match strings of any length. We designed the string-matcher to support human coding schemes. Thus, the string-matcher does no automatic analysis of its own, but rather applies, across a text or corpus of texts, whatever coding of categories a human has supplied. In our case, the coding categories have been based on representational composition, as discussed in Section 2.

The ability to match strings of different lengths is very important, because a series of words may not disambiguate itself with respect to rhetorical action until deep into the series. Consider, for example, the two strings:

  1. On one hand there was a freckle. (``hand'' refers to spatial relationship)
  2. On one hand there was evidence of fraud. (``hand'' part of logical opening)

Rhetorically, the first sentence creates a spatial experience for a reader. The second creates a sense of engaging the reader's reasoning in an argument. These are extremely different experiences for the reader, but the strings do not start to disambiguate until the sixth word. To capture the difference, our string-matcher required an algorithm that can hold in memory all strings that begin with ``on one hand there was'' while it explores the next word in the series, seeking a match that disambiguates the rhetorical function. When a disambiguation is found, the string can be isolated and assigned to a classification within the hierarchical classification scheme (Section 2.2).
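
The prefix-holding behavior can be sketched as a word-level trie walked with a greedy longest-match policy. Both the trie and the policy are our illustrative choices (this text does not specify DocuScope's data structures), and the catalog entries and their category labels below are hypothetical. While the words read so far remain a prefix of some entry, such as ``on one hand there was'', the matcher keeps reading; it commits only to the longest complete entry it has seen.

```python
import re

END = object()  # sentinel key marking the end of a complete catalog entry

def build_trie(catalog):
    """catalog: mapping of multiword string -> category label."""
    root = {}
    for phrase, category in catalog.items():
        node = root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[END] = category
    return root

def tag(text, trie):
    """Greedy longest match over words; returns (matched string, category) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    tags, i = [], 0
    while i < len(words):
        node, best, j = trie, None, i
        # Keep reading while the words so far are still a prefix of some entry.
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if END in node:
                best = (j, node[END])  # longest complete entry seen so far
        if best is not None:
            end, category = best
            tags.append((" ".join(words[i:end]), category))
            i = end
        else:
            i += 1  # no entry starts at this word
    return tags

# Hypothetical entries: the two strings share a five-word prefix.
catalog = {
    "on one hand there was a": "Space Interval",
    "on one hand there was evidence": "Cue Reasoning",
}
trie = build_trie(catalog)
print(tag("On one hand there was a freckle.", trie))
print(tag("On one hand there was evidence of fraud.", trie))
```

Because every match in this sketch is a run of adjacent words, it shares the contiguity limitation discussed later in this section.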

The software we have built tags and recognizes contiguous strings that appear to do rhetorical work. In terms used by theorists of written communication, the software captures much of the language associated with writer-reader involvement: language that ties writers to readers through a textual experience that can stand in for face-to-face experience.

For several years now, the catalogs of strings have continued to grow incrementally, as a cooperative process with students in our writing classrooms. While the rhetorical tagging program we designed has done the job we needed it to do, it has limitations. For example, it tags only contiguous language parts, so it knows nothing about logical dependencies or about shifting speaking roles within textual dialog.

3.3  Language Visualization

Differences in rhetorical function at the string-level are often hard for the human eye to detect. Therefore, we attached our string-matcher to a visual interface that allowed human coders to see its performance on actual texts.

Visualization systems attempt to tap people's natural strengths in rapid visual pattern recognition to support performance in activities that involve information processing [6,7,8]. As has been demonstrated in many areas, graphics are useful for enhancing human performance because complex cognitive tasks (e.g., comparing numbers from a table) may be replaced by perceptual inferences (e.g., perceiving the relative height of two adjacent figures).

One of the key considerations in designing visualization systems, as Lohse [9] suggests, is to ``facilitate and direct attention to visual feature(s) that communicate the requisite information when it is needed during the task'' (p. 385, emphasis added). While visualization technologies have been shown to support thinking and decision-making in many non-technical, multivariate areas such as management and legal studies [10,11,12,13], they have not yet been widely employed in language study, outside of concept mapping and annotation or collaboration support [14,15,16,17]. There are likely several reasons for this, including the necessity of reading texts closely and serially [18,19]; the difficulty of aggregating textual understandings visually [20,21,22,23]; and the difficulty of providing closely coordinated and appropriate task representations and graphical support for reading/writing/revising performance [24,25].

The extraordinary modularity of texts requires people to manage their attention in support of their current writing and reading task(s) [26,27]. While empirical observation of these processes is difficult, researchers have outlined several of the constructive reading and writing processes [28,29,30]. Underlying these processes are four interdependent cognitive tasks that we hypothesize could be supported with visualization technology: (1) fixation and (2) spatial/temporal attention that lead toward (3) explicit and implicit ``seeing'' and (4) conscious perception [9,31].

To support our writing students and to begin to test our theories, we've associated the different representational effects with specific colors (see Sections 2.2 and 2.3) and, using colors, weighting, and positioning, have developed several visualization schemes within DocuScope to support some of the cognitive processes that underlie reading and writing. We have not formally evaluated the effectiveness of these schemes. In this paper we provide four screen captures to illustrate some of the visualization schemes that our students have found most beneficial (Figures 3-6 in Appendix A).
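
As a rough illustration of color coding, the sketch below wraps tagged segments of the revised rental sentence from Section 2.1 in colored HTML spans, one color per cluster as in Figure 1. The colors, the segment format, and the function `render_html` are hypothetical; DocuScope itself assigns colors by subcategory, which this text does not name. The cluster assignments follow our reading of the discussion of Figure 1 (``any possible'' as inner thinking, ``who might'' as a cue to the reader).

```python
import html

# Hypothetical palette, one color per cluster; not DocuScope's actual colors.
CLUSTER_COLORS = {"Thought": "#c0392b", "Description": "#2471a3", "Linear Cues": "#239b56"}

def render_html(segments):
    """segments: list of (text, cluster or None); untagged text stays uncolored."""
    parts = []
    for text, cluster in segments:
        escaped = html.escape(text)  # keep characters like < and & from breaking markup
        if cluster is None:
            parts.append(escaped)
        else:
            color = CLUSTER_COLORS[cluster]
            parts.append(f'<span style="color:{color}">{escaped}</span>')
    return "".join(parts)

segments = [
    ("Please ask if the tenants would be willing to move out early to accommodate ", None),
    ("any possible", "Thought"),
    (" renters ", None),
    ("who might", "Linear Cues"),
    (" want to move in during the summer.", None),
]
print(render_html(segments))
```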

3.4  Software Operation

Over four years of classroom use, DocuScope has evolved into a mature text tagging and visualization system. It comprises two integrated applications, one for single-text visualization (STV) and one for multiple-text visualization (MTV). In writing classrooms, students tend to start in MTV to locate significant rhetorical trends across all of the texts in the class. They then drill down into single texts to understand how individual student writers (and texts) produce these trends. A full account of the program's classroom use is beyond the scope of this document. A short tutorial is being tested and will eventually be disseminated to students, educators, and researchers learning to use the software.

3.5  Software Dissemination

We have made DocuScope available free of charge to academic researchers interested in evaluating it. Because it is not a commercial product, we cannot guarantee support, but we help where we can. DocuScope is a Java application that runs under Windows, Mac OS X, and Unix/Linux. If you are interested, contact David Kaufer (Kaufer@andrew.cmu.edu). DocuScope is owned by the authors and Carnegie Mellon University. You will need to sign a release from Carnegie Mellon's technology transfer office before we can send you a copy of the software.

We have given several DocuScope workshops for writing teachers in Pittsburgh's public and independent schools, and the response has been enthusiastic. In our experience the software is intuitive to use. About an hour of face-to-face training is enough to get teachers and students up to speed. However, we have not had the resources to provide the full training and multi-institutional testing needed for a large-scale, phased dissemination.

3.6  Development Team

DocuScope has been developed at Carnegie Mellon University's English Department to support writing courses taught to information design and professional writing students. The development team includes David Kaufer, Suguru Ishizaki, Brian Butler, Kerry Ishizaki, Milu Ritivoi, Pantelis Vlachos, and Jeff Collins.

References

[1]
David S. Kaufer, Suguru Ishizaki, Brian S. Butler, and Jeff Collins. Everyday Strings of English and the Rhetorical Priming of Audience: Unveiling the Speaker and Writer's Hidden Craft. Lawrence Erlbaum Associates, Mahwah, NJ, in press.

[2]
David S. Kaufer and Brian S. Butler. Principles of Writing as Representational Composition: Designing Interactive Worlds with Words. Lawrence Erlbaum Associates, Mahwah, NJ, 2000.

[3]
David S. Kaufer and Brian S. Butler. Rhetoric and the Arts of Design. Lawrence Erlbaum Associates, Mahwah, NJ, 1996.

[4]
I. A. Richards. Context theory of meaning and types of context. In Ann E. Berthoff, editor, Richards on Rhetoric, pages 111-117. Oxford UP, New York, 1991.

[5]
M. A. K. Halliday. An Introduction to Functional Grammar. Edward Arnold, London, 2nd edition, 1994.

[6]
N. Gershon and S. G. Eick. Information visualization. IEEE Computer Graphics and Applications, 17(4):29-31, 1997.

[7]
A. Spoerri. InfoCrystal: A visual tool for information retrieval. In Visualization '93, pages 150-157, San Jose, Calif., 1993.

[8]
Daniel A. Keim. Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1):1-12, 2002.

[9]
Gerald Lee Lohse. A cognitive model for understanding graphical perception. Human-Computer Interaction, 8:353-388, 1993.

[10]
Stuart K. Card, Jock D. Mackinlay, and Ben Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers, San Francisco, Calif., 1999.

[11]
Bart Verheij. Automated argument assistance for lawyers. In Seventh International Conference on Artificial Intelligence and Law, pages 43-52, Oslo, Norway, 1999. ACM Press.

[12]
Dan Suthers, Alan Lesgold, Sandy Katz, Arlene Weiner, Eva Toth, Kim Harrigal, Dan Jones, Joe Toth, John Connelly, and Yazmine DeLeon. An interface for collaborative and coached approaches to learning critical inquiry. In Proceedings of the 2nd International Conference on Intelligent User Interfaces, pages 217-220. ACM Press, 1997.

[13]
W. Wright. Business visualization applications. IEEE Computer Graphics and Applications, 17(4):66-70, 1997.

[14]
A. J. Bernheim Brush, David Bargeron, Anoop Gupta, and J. J. Cadiz. Robust annotation positioning in digital documents. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 285-292. ACM Press, 2001.

[15]
Camille Bierens de Haan, Gilles Chabré, Francis Lapique, Gil Regev, and Alain Wegmann. Oxymoron, a non-distance knowledge sharing tool for social science students and researchers. In Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work, pages 219-228. ACM Press, 1999.

[16]
James H. Morris, Christine M. Neuwirth, Susan Harkness Regli, Ravinder Chandhok, and Geoffrey C. Wenger. Interface issues in computer support for asynchronous communication. ACM Computing Surveys (CSUR), 31(2es):11, 1999.

[17]
Joanna L. Wolfe and Christine M. Neuwirth. From the margins to the center: The future of annotation. Journal of Business and Technical Communication, 15(3):337-371, 2001.

[18]
N. Ann Chenoweth. Reading for Revision: A Comparison of Teachers' and Students' Judgments. PhD thesis, Carnegie Mellon University, 1997.

[19]
Linda Flower. The construction of purpose in writing and reading. College English, 50(5):528-550, 1988.

[20]
James A. Wise, James J. Thomas, Kelly Pennock, David Lantrip, Marc Pottier, and Anne Schur. Visualizing the non-visual: Spatial analysis and interaction with information from text documents. In Proceedings of the Information Visualization Symposium 95, pages 51-58, Zurich, 1995. IEEE Computer Society.

[21]
Christine M. Neuwirth, R. Chandhok, David S. Kaufer, J. H. Morris, P. Erion, and D. Miller. Computer support for distributed collaborative writing: A coordination science perspective. In T. W. Olson and J. B. Smith, editors, Coordination Theory and Collaboration Technology, pages 535-558. Lawrence Erlbaum, Mahwah, NJ, 2001.

[22]
Susan Havre, Elizabeth Hetzler, Paul Whitney, and Lucy Nowell. ThemeRiver: Visualizing thematic changes in large document collections. IEEE Transactions on Visualization and Computer Graphics, 8(1):9-20, 2002.

[23]
B. G. Becker. Using MineSet for knowledge discovery. IEEE Computer Graphics and Applications, 17(4):75-78, 1997.

[24]
Christine M. Neuwirth, James H. Morris, Susan Harkness Regli, Ravinder Chandhok, and Geoffrey C. Wenger. Envisioning communication: task-tailorable representations of communication in asynchronous work. In Proceedings of the ACM 1998 conference on Computer supported cooperative work, pages 265-274. ACM Press, 1998.

[25]
Patricia Ericsson and Tim McGee. Squiggley green lines and red ink: Examining the ``innards'' of the Microsoft writing coach. Presentation at the 2001 Computers and Writing conference, 2001.

[26]
John R. Hayes. A new framework for understanding cognition and affect in writing. In C. Michael Levy and Sarah Ransdell, editors, The Science of Writing: Theories, Methods, Individual Differences, and Applications, pages 1-27. Lawrence Erlbaum, Mahwah, NJ, 1996.

[27]
David L. Wallace. From Intention to Text: Developing, Implementing, and Judging Intentions in Writing. PhD thesis, Carnegie Mellon University, Pittsburgh, Penn., 1991.

[28]
Christina Haas and Linda Flower. Rhetorical reading strategies and the construction of meaning. College Composition and Communication, 39(2):167-183, 1988.

[29]
Walter Kintsch. Learning from text. In Lauren B. Resnick, editor, Knowing, Learning, and Instruction, pages 25-46. Lawrence Erlbaum, Mahwah, NJ, 1989.

[30]
Christina Haas. Writing Technology: Studies on the Materiality of Literacy. Lawrence Erlbaum, Mahwah, NJ, 1996.

[31]
Marvin M. Chun and Jeremy M. Wolfe. Visual attention. In E. Bruce Goldstein, editor, Blackwell Handbook of Perception, pages 272-299. Blackwell Publishers, Malden, Mass., 2001.

Appendix A  DocuScope in Action (Screen Shots)

The screen shots in this appendix were captured while analyzing The Federalist Papers for a recent project. They illustrate four of the visualization methods the software offers.

Single Text Visualizer: Dimensional View

Figure 3: Single Text Visualizer (STV): Dimensional View

The STV dimensional view shows a single text that has been tagged by the software. The text's score on each of the 18 representational variables is listed at the left of the screen. Each of the 18 matrices at the right of the screen is a view of the same portion of the text.

Single Text Visualizer: Text View

Figure 4: Single Text Visualizer (STV): Text View

The STV text view shows a single text that has been tagged by the software. The text's score on each of the 18 representational variables is listed at the left of the screen. The text itself, with tagged strings underlined, appears at the right of the screen. The tags for a variable can be toggled on or off by clicking its name in the variable list at the left.
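This description does not say how DocuScope matches strings or computes the scores shown in the variable list. The Python sketch below is therefore only an illustration, and it rests on three assumptions. The dictionary of word sequences is small and hypothetical. Tagging uses a greedy longest match. A score is the percentage of a text's words covered by each variable's strings. Apart from ThinkNegative, the variable names are placeholders, and underscores stand in for on-screen underlining.

```python
import re
from collections import Counter

# Hypothetical dictionary of word sequences -> variable. Only "ThinkNegative"
# comes from the description; "Sequence" is a placeholder variable.
DICTIONARY = {
    ("i", "doubt"): "ThinkNegative",
    ("i", "fear", "that"): "ThinkNegative",
    ("first", "of", "all"): "Sequence",
    ("then",): "Sequence",
}
MAX_LEN = max(len(key) for key in DICTIONARY)


def tag(text):
    """Greedy longest-match tagging. Returns (words, [(start, end, variable), ...])."""
    words = re.findall(r"[a-z']+", text.lower())
    spans, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            variable = DICTIONARY.get(tuple(words[i:i + n]))
            if variable:
                spans.append((i, i + n, variable))
                i += n
                break
        else:  # no dictionary entry starts at this word
            i += 1
    return words, spans


def scores(words, spans):
    """Assumed metric: percentage of the text's words covered by each variable."""
    covered = Counter()
    for start, end, variable in spans:
        covered[variable] += end - start
    total = len(words) or 1
    return {variable: 100.0 * n / total for variable, n in covered.items()}


def render(words, spans, enabled):
    """Wrap strings of enabled variables in _underscores_ (stand-in for underlining)."""
    starts = {start: (end, variable) for start, end, variable in spans}
    out, i = [], 0
    while i < len(words):
        end, variable = starts.get(i, (None, None))
        if variable in enabled:
            out.append("_" + " ".join(words[i:end]) + "_")
            i = end
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


words, spans = tag("First of all, I doubt the plan. Then I fear that it fails.")
print(scores(words, spans))
print(render(words, spans, enabled={"ThinkNegative"}))
# second line printed: first of all _i doubt_ the plan then _i fear that_ it fails
```

In this sketch, toggling a variable in the text view amounts to adding its name to `enabled`, or removing it, and then rendering the text again.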

Multi-Text Visualizer (MTV): Range View

Figure 5: Multi-Text Visualizer (MTV): Range View

The MTV range view summarizes the scores of all the tagged texts in a collection of texts. The 18 representational variables are listed at the left of the screen, each with a modified boxplot* indicating the range of scores for the texts in the collection. The texts in the collection are listed at the right of the screen, sorted by the currently highlighted variable (``ThinkNegative'' in this shot, indicated by the green color). Clicking a variable highlights it. The texts may be sorted into groups and colored for easy separation, as shown in the lower right-hand corner of the screen shot.

* The endpoints of the boxplots represent the highest and lowest scores, including any outliers. A standard boxplot stops its whiskers at the most extreme non-outlying values, so we call these modified boxplots. The ``fences'' are marked in yellow above and below each boxplot. Outlying text scores are also marked with asterisks beside the text names at the right of the screen.
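This description does not give the fence rule or the quartile method behind the modified boxplots. The following Python sketch assumes two things. First, it uses Tukey's convention, with fences at Q1 - 1.5 IQR and Q3 + 1.5 IQR. Second, it uses Python's inclusive quartile method. Under those assumptions, the sketch shows how one variable's range-view summary could be computed, together with the sorted list of texts in which outliers are starred. The text names and scores are invented for illustration.

```python
import statistics


def modified_boxplot(values, k=1.5):
    """Box from the quartiles, whiskers to min/max, and Tukey-style fences (k * IQR)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "low": data[0], "q1": q1, "median": median, "q3": q3, "high": data[-1],
        "lower_fence": q1 - k * iqr, "upper_fence": q3 + k * iqr,
    }


def range_view(scores, k=1.5):
    """Sort texts by score (highest first) and star those beyond either fence."""
    box = modified_boxplot(scores.values(), k)
    rows = []
    for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        outlier = value < box["lower_fence"] or value > box["upper_fence"]
        rows.append(f"{'*' if outlier else ' '} {name} {value:.2f}")
    return box, rows


# Made-up scores on one variable for five hypothetical texts.
box, rows = range_view({"A": 1.0, "B": 2.0, "C": 2.5, "D": 3.0, "E": 9.0})
print(box["lower_fence"], box["upper_fence"])  # 0.5 4.5
print("\n".join(rows))  # only E, at 9.00, is starred
```

The whiskers in this sketch run to `low` and `high` rather than stopping at the fences, which matches the footnote's definition of a modified boxplot. The parameter `k` is exposed in case a different fence multiplier is wanted.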

Multi-Text Visualizer (MTV): Map View

Figure 6: Multi-Text Visualizer (MTV): Map View

The MTV map view shows all the texts in a collection of texts. The 18 representational variables are listed along the X and Y axes. Clicking a variable on an axis plots the texts' scores on that variable along that axis. The texts may be sorted into groups and colored for easy separation, as shown in the lower right-hand corner of the screen shot. The texts are listed at the right of the screen. Clicking an individual text highlights that text within the plot and generates the bar graph shown just to the left of the text names. The plot can be rescaled as needed.
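This description does not explain how the map view converts scores to screen positions. The Python sketch below assumes simple min-max scaling of the two selected variables into a fixed plot area. It measures y downward, as screen coordinates do. The function name, the plot size, and the Sequence variable are hypothetical.

```python
def map_points(texts, x_var, y_var, width=400, height=300):
    """Min-max scale each text's scores on two variables into a plot area.

    texts maps a text name to {variable: score}. Screen y grows downward,
    so higher y_var scores are drawn nearer the top.
    """
    xs = [scores[x_var] for scores in texts.values()]
    ys = [scores[y_var] for scores in texts.values()]

    def scale(value, lo, hi, size):
        # If every text has the same score, place them all at the axis origin.
        return 0.0 if hi == lo else (value - lo) / (hi - lo) * size

    return {
        name: (
            scale(scores[x_var], min(xs), max(xs), width),
            height - scale(scores[y_var], min(ys), max(ys), height),
        )
        for name, scores in texts.items()
    }


texts = {
    "A": {"ThinkNegative": 1.0, "Sequence": 10.0},
    "B": {"ThinkNegative": 3.0, "Sequence": 20.0},
    "C": {"ThinkNegative": 2.0, "Sequence": 15.0},
}
print(map_points(texts, "ThinkNegative", "Sequence"))
# {'A': (0.0, 300.0), 'B': (400.0, 0.0), 'C': (200.0, 150.0)}
```

In this sketch, rescaling the plot amounts to calling the function again with a different `width` and `height`. Clicking a different variable on an axis corresponds to passing a different `x_var` or `y_var`.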

