Appendix 2

Choice of parameters in drawing up a lexical net

In this appendix a short extract from one of the interview transcripts from Chapter 7 is used to illustrate the consequences of using different numerical parameters when drawing up a lexical net and to explain how particular parameters were decided on for the main analysis. The extract is taken from the first interview with interviewee number 44, a 21-year-old woman in her second admission to the hospital. It comes from a part of the interview dealing with reasons for admission and has been annotated using the COCOA format:

<topic why>

<p M> The first sort of standard question that I ask people is if you could tell me why you're here - you know the obvious thing.

<p P> Uh as far as I understand it I suffer depression. I'm not sure whether it's endogenous or whether it's reactive. I think it's a bit of both [ja], and I become extremely suicidal. I don't have a great love for life as it is [ja]. And that's basically why I'm here. And also to discover why you know I feel the way I do about my life and my circumstances.

<p M> What uh sort of form might that discovery take, do you think?

<p P> Uh...I have a lot of trouble expressing myself, especially my emotions, and it causes a lot of anger within me, and causes me to isolate myself from other people. It affects my life outside. And what I'm hoping to achieve here through therapy is to learn to express these emotions so that uh I can function normally, outside and not find myself hiding away [yes], and find myself acceptable to those outside.

<p M> But so you feel it's probably a kind of psychological thing really then.

<p P> I'm inclined to think it's more psychological, personally yes.

<topic preadmit>

<p M> Uh to change the topic slightly, I'd like to come back to this later, if you could just kind of tell me what happened in the two three weeks prior to coming in this time.

<p P> I was actually at X hospital before I came here [is it?]. I was there for five and a half weeks. What happened was...I started getting very suicidal and I decided to actually do something [yes]. And I happened to talk to a friend of mine who eventually had me certified and sent to X hospital.

<p M> Certified?

<p P> Ja [gmmm]. Where I, I was locked up there for four weeks, and then I went to an open ward for a week and a half, and then I managed, I requested transfer here, because I find this environment more therapeutic.

<p M> Much better, I'm sure...Uh how did you feel about this certify business, were you sort of [interruption at door]?

<p P> It was a shock, it really was a shock [is it]. I had been threatened with it before, uh, but I had never sort of really thought I'd actually end up there, and it was, it was very difficult. I was very angry in the beginning [mmm]. But uh, that subsided. I don't hold it against my friend for certifying me. I would have, I would have done the same for her. She was worried.

<p M> And it was the only way to kind of force you to come.

<p P> Ja [gmmm].

<topic helpprof>

<p M> OK ja, then I'd like to sort of hear the opinions of, or the opinion of some professional person that's, you feel has been helpful to you in the past. Uh or that you have felt close to, such as a psychologist, psychiatrist, social worker, whatever. Uh if you could say who the person is, maybe not by name, and uh how he or she defines your problem.

<p P> ...One professional person?

<p M> Preferably, but if you like, you could -

<p P> I've been seeing one of the psychologists here, a female psychologist, and...she's seen me in my, my good states, my bad states [ja] and it, it makes it easier to discuss things with her because she knows ME, for what I am, without a mask I'm inclined to put on. And it's helpful that she's there to listen. She understands, she doesn't judge you. She's objective, she's not subjective. know they're able to read, sometimes read between the lines so to say [ja]. So that you find yourself, something you can't express [yes], with their prompting and their aid, it makes it easier to express things, and it increases an awareness within one. Sometimes one gets a bit tired of talking and talking and talking, but you know one gets to realise that it's got to come from you [yes]. Although you sometimes feel frustrated, because you'd like them to give the answer, you know right there, and say look this is what you must do [yes]. You do realise that they're there to help you, and it's got to come from you, yourself [ja]. And it, it's sometimes difficult to talk about things, but it helps knowing that they're not going to judge you, they're not going to hold it against you, it's not going to go further than the professional team working here. And that in itself makes you feel a bit more comfortable, because sometimes things get a bit personal.

<p M> Yes but it's kind of, they're not part of your friends circle or something.

<p P> Ja, it's not as though you need worry about people finding out what you said, or how you feel about things [ja, ja]. And that, that in itself makes things a lot easier.

<topic helplay>

<p M> What about a non-professional, uh like family or an acquaintance or something. If you could sort of pick on some person whose been helpful to you there.

<p P> ....[sighs] I have a friend, who is also an X with me, and

<p M> Are you an X?

<p P> Ja. And she has been through much the same experience herself - she's actually the one that certified me - and I find that we're able to talk to one another quite freely, knowing that that person's not going to spread it, you know, around the whole group or whatever [ja]. And having been there herself, she's very understanding. And it's not of a case where she's trying to give advice. She might discuss her experience and one can learn from that. And it's, you know, it's helpful that you know the person and you feel comfortable with them. And they, OK it's subjective, but they know where you're coming from, they know what type of person you are, what lifestyle you lead, you know things like that, they know your background a bit better [ja, OK].

A conventional qualitative analysis of this extract could focus on a variety of different themes, such as: the nature of the interaction between the interviewer and interviewee (which appears to be structured to allow for maximal talking by the latter, while the former presents himself as an empathetic listener despite the sometimes brutal topic changes); the interviewee's easy familiarity with (sometimes outdated) clinical jargon such as 'expressed emotion' and 'endogenous' versus 'reactive' depression; the ostensibly high regard shown by the interviewee for others' opinions about her situation; or the interviewee's acquiescence in and resistance to various forms and degrees of incarceration.

An analysis making use of a lexical net, by contrast, would start by identifying repetitive linguistic patterns (for example, the repeated use of the "and...and...and" structure in the last paragraph of the interview transcript) as a starting point for further qualitative analysis and interpretation. Lexical nets provide an automated, objective means of identifying patterns that depend on the repeated co-occurrence of words. The following parameters affect what is counted as a significant co-occurrence:

Span: The total number of words, divided equally on either side of a target word, which are counted as co-occurring with the target word. (A span of four thus covers two words on either side.)

Minimum collocation size: The minimum strength of collocation (as calculated using the z-score) which is accepted as significant. Although all collocations with a z-score above 2.57 are statistically significant at the 1% level (two-tailed), in a large text inordinately many collocations reach significance and different (usually more stringent) cut-offs have to be set.

Minimum collocation frequency: In cases where words occur together consistently the statistical procedure will flag them as collocates even if they are used quite infrequently in a text. To prevent such rare words from becoming too prominent in a lexical net it is therefore necessary to exclude words that co-occur less frequently than a certain cut-off.

These three parameters are interdependent, such that a larger span will result in more statistically significant collocates and a consequent need to set more stringent cut-offs with regard to minimum collocation size and frequency. In addition, the number of statistically significant collocates also depends on the size of the text, with more significant co-occurrences identified as the length of the text increases. At present there is no objective method of deciding on the minimum collocation size and frequency, and a pragmatic approach aimed at keeping the lexical net within reasonable levels of complexity has to be followed.
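The way the three parameters act together can be illustrated with a minimal Python sketch. It is not the software used for the analysis. The z-score formula is an assumption (a commonly used collocation z-score in the style of Berry-Rogghe), and the function names and the toy token list are hypothetical.

```python
import math
from collections import Counter


def cooccurrences(tokens, half_span):
    """Count how often each collocate falls within half_span words
    on either side of each node word (ordered pairs: node, collocate)."""
    counts = Counter()
    for i, node in enumerate(tokens):
        lo = max(0, i - half_span)
        hi = min(len(tokens), i + half_span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(node, tokens[j])] += 1
    return counts


def z_score(observed, node_freq, coll_freq, n_tokens, half_span):
    """Assumed formula: compare the observed co-occurrence count with the
    count expected if the collocate were spread evenly through the text."""
    p = coll_freq / (n_tokens - node_freq)
    expected = p * node_freq * 2 * half_span
    sd = math.sqrt(expected * (1 - p))
    return (observed - expected) / sd if sd > 0 else 0.0


def lexical_net(tokens, half_span, min_z, min_freq):
    """Return the word pairs that pass both cut-offs, with (count, z)."""
    freq = Counter(tokens)
    n = len(tokens)
    edges = {}
    for (node, coll), k in cooccurrences(tokens, half_span).items():
        # Minimum collocation frequency: drop rarely co-occurring pairs.
        if node == coll or k < min_freq:
            continue
        z = z_score(k, freq[node], freq[coll], n, half_span)
        # Minimum collocation size: drop weak collocations.
        if z >= min_z:
            edges[(node, coll)] = (k, z)
    return edges


# Toy example (hypothetical tokens, one word on either side of the node).
tokens = "you know i think you know it helps you know".split()
net = lexical_net(tokens, half_span=1, min_z=0.0, min_freq=3)
```

In this sketch, raising either `min_z` or `min_freq` removes pairs from the net, while raising `half_span` adds co-occurrences, which is why the cut-offs have to be tightened as the span grows.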

The situation with regard to the first parameter, the collocation span, is somewhat different. In theory, a small span will result in the identification of lexical redundancies at the level of frequently used word-pairs and phrases, while a large span will result in the identification of co-occurrences which occur when words are frequently used in the same general context, but not necessarily next to each other or as part of the same stock phrase. As a general rule, a small span can therefore be expected to throw syntactic relationships between words into relief, while a larger span will tend towards the identification of semantic contingencies.

To illustrate this, three different lexical nets derived from the extract are presented below. The net in Figure 1 is based on a span of four words (two on either side of the target word), which (as discussed above) can be expected to highlight stylistic redundancies such as word pairs or short phrases. Figure 2, by contrast, uses a span of 40 words, which can be expected to show patterns of co-occurrence of words not necessarily in close proximity.

Figure 1. Net for a span of 4, minimum frequency of 5 and minimum size of 1.64

As can be seen from Figure 1, the shorter span does indeed reveal word pairs and phrases such as 'you know' (which occurs no fewer than 8 times) and a pairing between 'I' and 'have' which occurs 6 times in short phrases such as 'I have', 'I don't have' and 'I would have'. An extreme view would be that using a very short span simply leads to the rediscovery of grammatical rules at the syntactic level. (For example, 'I' and 'has', being a grammatical mismatch, are unlikely to be significant collocates for any text when the span is set to two.) Nevertheless, the collocational patterns found when a small span is used will not be the same for all texts and do reveal something of the stylistic 'signature' of a particular text. The 'you know' locution shown in Figure 1, for example, rarely occurs in written texts.

Figure 2. Net for a span of 40, minimum frequency of 8 and minimum size of 5.07

At the other extreme from the short-span net shown in Figure 1 is the net in Figure 2, which is based on a span of 20 words on either side of the target word. Thus, for Figure 2, words are taken to be in the same general area of the text if they occur within 20 words of each other. The first consequence of this, as is evident from Figure 2, is that many more word pairs are considered to be statistically related, despite the fact that far more stringent cut-offs have been set, with a minimum collocation size of 5.07 and a minimum collocation frequency of 8. As can be seen, some of the stronger short-range (or syntactic) collocations such as 'I-was' and 'you-know' have been preserved in this net, to which have been added a profusion of longer-range collocations with a somewhat more semantic flavour. For example, we can see that the word 'suicidal' tends to occur with statistically significant regularity in the vicinity of 'I', even though the two words do not form part of a stock phrase.

Another effect of using a very large span is that high frequency words (such as in this case 'you', 'I' and 'to') are shown as high-frequency collocates of a large number of words that rarely co-occur with one another. Thus a word such as 'you' will tend to occupy a central position in a net, with a large number of collocations radiating out from it. Taken to extremes, very large word spans could thus result in lexical nets that do little more than reproduce the word frequency table for a text.
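The limiting case described above can be checked with a small Python sketch. It assumes raw co-occurrence counts rather than z-scores, and the function name and toy tokens are hypothetical. Once the span covers the whole text, the count for each collocate of a node is simply the product of the two word frequencies, so the counts carry no more information than the frequency table.

```python
from collections import Counter


def collocate_counts(tokens, node, half_span):
    """Count every word falling within half_span words on either side
    of each occurrence of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - half_span)
        hi = min(len(tokens), i + half_span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Toy example (hypothetical tokens).
tokens = "you know i think you know it helps you know".split()
freq = Counter(tokens)

narrow = collocate_counts(tokens, "you", half_span=1)
wide = collocate_counts(tokens, "you", half_span=len(tokens))

# With a span covering the whole text, each collocate count equals
# freq[node] * freq[collocate], i.e. a rescaled frequency table.
same_as_frequency_table = all(
    wide[w] == freq["you"] * freq[w] for w in freq if w != "you"
)
```

With the narrow span only the immediate neighbours of 'you' are counted, whereas with the wide span every word in the text becomes a collocate of 'you' in proportion to its frequency.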

Figure 3, with a span of 8 and frequency cut-offs somewhere between those for Figures 1 and 2, represents a middle position. Strong short-range collocations such as 'I-was' and 'you-know' present in both the other figures are reflected here as well, as are some present in one of the figures only. The phenomenon of high-frequency words such as 'you' and 'and' occurring at the centre of a collocational web can also be observed, but is much less marked than in Figure 2.

Figure 3. Net for a span of 8, minimum frequency of 6 and minimum size of 2.57

None of the three lexical nets is a more accurate reflection of the text than the others, but each presents a somewhat different view of the text. Which view is more useful depends on the analytic purposes for which the net is being used. If the purpose is to know which words co-occur in the same broad areas of text, a large span is indicated. An example of this may be where the text has been segmented into a number of 'cases' and the analyst wishes to know which words tend to cluster together in the same case. A large span will ensure that all words in a case are counted as collocates, while words in different cases will not be counted as collocates (since case boundaries are not crossed). Different sections of the net will therefore tend to represent different types of cases, each with its characteristic pattern of collocation. Although some of the collocations may be short-range locutions such as repetitively used word pairs, many will be long-range and possibly have a more semantic flavour.
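A case-wise count of this kind can be sketched in Python as follows. This is an illustration only: it assumes that each case is already available as a list of tokens, and the function name and toy cases are hypothetical. Every pair of different words within a case is counted as co-occurring, which is equivalent to a span long enough to cover the case, and no pair is ever formed across a case boundary.

```python
from collections import Counter
from itertools import combinations


def casewise_cooccurrences(cases):
    """Count unordered word pairs that occur in the same case.
    Pairs are formed within each case only, so boundaries are not crossed."""
    counts = Counter()
    for case in cases:
        for i, j in combinations(range(len(case)), 2):
            if case[i] != case[j]:
                pair = tuple(sorted((case[i], case[j])))
                counts[pair] += 1
    return counts


# Toy example (hypothetical cases, each a short list of tokens).
cases = [
    ["suicidal", "i", "angry"],
    ["friend", "helpful", "i"],
]
pairs = casewise_cooccurrences(cases)
```

Here 'i' and 'suicidal' are counted as collocates because they share a case, while 'angry' and 'friend' are not, even though they are adjacent when the cases are read as running text.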

If, on the other hand, the purpose is to identify typical turns of phrase in a text, a shorter word span is indicated. If the purpose is to identify a mix of stock phrases and longer-range 'semantic' contingencies, a word span of intermediate length is indicated. From the three lexical nets presented here it is in any case clear that the nets are reasonably robust and that at least some of the more prominent features of the lexical interdependencies in a text are identified despite wide variations in the parameters used.

For the purposes of the analysis presented in Chapters 8 and 9, it was decided to use a word span of intermediate length. The data used for Chapter 8 consisted of a relatively small number of cases each containing a large number of words, thus precluding a case-wise analysis since even a very long word span would not have covered each case. Chapter 9, by contrast, made use of a large number of relatively short cases, and using a longer word span to cover each case would have been feasible. However, the purpose was not to find clusters of similar cases (correlated, perhaps, with psychiatric diagnosis or with the registrars making the diagnosis), but to highlight typical forms of expression occurring in the text taken as a whole. A very short word span was also considered inadvisable, as many of the relations among words identified in this manner would be likely to be of a trivial syntactic nature.

Finally, it should be borne in mind that, whatever the parameters used, lexical nets are not intended as an end in themselves, but as a starting point for further qualitative analysis. As a concise overview of a text, a lexical net can serve as a contextual backdrop for whatever analysis is performed with the text.