APPENDIX TO THE C-ORAL-ROM QUARTERLY REPORT January-March 2002

APPENDIX 1
Assessments for comparability within the multilingual corpora: notes to be added to the C-ORAL-ROM sampling structure and textual format

Sampling

Problems
The difficulties with authorization in the public informal field were discussed at the Madrid meeting. Probably for this reason, a set of samples that are private in nature has been classed as public in the corpus structure on the basis of some non-essential feature (for example, private conversations have sometimes been classified as public simply because they were recorded in a public place).

Solution
To allow the selection of informal texts of a clearly public nature within present limitations, the number of words in this field must be reduced to 25.500 (without prejudice to the significance of the C-ORAL-ROM resource), with a parallel rise in the Family/private field (keeping the previous Monologue/Dialogue proportion). The revised corpus structure in the informal part is as follows:

Family/Private context   124.500
Public context            25.500
Textual Format

Problem
The four corpora show many different interpretations of the information required in the headers.

Solution
The following definition of the headers must be followed consistently in each corpus:

Title: one or two words, not in English but in the language of the sub-corpus (French, Spanish, etc.).

Participant field: it must contain the role, understood as the role in the linguistic event (neutral role: participant or intervenient; interviewer vs. interviewee; inquisitor vs. accused; phone caller vs. answerer; seller vs. buyer). When the role in the linguistic event is identical to the profession (teacher vs. student), the neutral role is applied. By contrast, relationships of kinship, friendship, acquaintance, etc. (mother, daughter; husband, wife; friends, neighbour, colleague, etc.) must be made explicit in the Situation field.

Place: the toponym (e.g. Lisbon). For media, the place is where the company is established.

Situation: information about all of the following:
a) the recording situation (in the silent studio; in the street; at home; in a shop; at the Faculty, etc.);
b) information that helps to define the activity performed during the event (gossip; chat; quarrel; discussion; narration; claim, etc.). The neutral case is "talk". The information in the Class field (dialogue, conversation, etc.) should not be repeated;
c) the roles of the participants (see above).
E.g. gossip between friends at home during dinner.
For media corpora the situation is the name of the program.

Problems
The text-sound alignment of WPC requires a set of guidelines and some revision of the transcription format. The relation between alignment and prosodic tagging must be uniform throughout each corpus.

Solutions
Requirements on the transcription format due to the alignment program
1) All tagging signs must be preceded and followed by a space: e.g. space//space.
2) All dialogic turns must end with an empty space (this creates a problem with the previously defined C-ORAL-ROM format; see below).
3) The three dots "..." must be typed as three separate dots (the corresponding single symbol must not be used).
4) In overlapping, the brackets must immediately precede and follow the text: <text> instead of < text >.
5) N.B.: no double empty space in the text.
6) N.B.: symbols must be written correctly: e.g. "//" is correct while "/ /" is incorrect; [/] is correct while [/ ] is incorrect.

Other clarifications on the transcription format
Case 1:

*MAR: ha puesto un sofa /
*VIS: ah si //
*MAR: / una mesa de esas pequenas /
*VIS: si //
*MAR: / con un armario muy simple //

In this first case we have "intersection": MAR's first turn contains one tone unit, not an utterance (the utterance goes on in MAR's second turn and does not end until MAR's third turn). We mark this continuation with a slash / at the beginning of MAR's second and third turns. So we have the end of a tone unit, not an interruption. The same case may also occur with overlapping:

*MAR: ha puesto <un sofa> /
*VIS: [<] <ah si> //
*MAR: / una mesa de esas pequenas /
*VIS: si //
*MAR: / con un armario muy simple //

Case 2:

*MAR: es verdad // <y todavia>
*VIS: [<] <si> //
*MAR: / estas obsesionada //

In this case, "y todavia estas obsesionada" is a single tone unit, so we do not put a slash at the end of "todavia". That is the difference with respect to Case 1.

Case 3:

Because WinPitchCorpus requires an empty space at the end of every turn, the empty space can no longer be used in the format to signal interruptions. The empty space signalling interruption in the C-ORAL-ROM format must therefore be replaced with a specific sign at both levels of transcription. For example:

*MAR: es verdad // <y todavia> +
*VIS: [<] <si> //
*MAR: obsesionada //

The XML conversion program and the macro prepared in Florence will be updated to meet this requirement.

Clarification as regards the relation between prosodic tagging and the alignment unit

Each C-ORAL-ROM corpus must be aligned only with respect to perceptively relevant terminal contours, in order to give rise to a database of utterances (see the Annex: a) Alignment per utterance; b) terminal contours/utterance rough equivalence). This criterion must be strictly followed in both formal and informal speech, even if the alignment unit frequently turns out to be a very large textual unit in the formal corpora. Given the importance of this requirement for reaching the objectives of C-ORAL-ROM, the validation of the comparability of the four corpora is expected to meet a stricter criterion for the tagging of terminal contours than for non-terminal contours (e.g. agreement of four competent speakers in 90% vs. 75% of the selected cases).

An important result of the meeting must be stressed. A selection of different contexts of prosodic parsing has been verified across the Italian, Portuguese and Spanish corpora, with special attention to the representation of the main critical contexts in the three corpora. In all relevant cases considered, the perceptively relevant breaks in the speech flow are consistently perceived by Italian, Spanish and Portuguese speakers in the full set of Romance corpora, despite the different syllabic structures of the three languages and the speakers' lack of competence in the other languages.

Advice for starting WP3 on alignment
a) Alignment must be performed on TXT files.
b) Practise alignment on brief dialogues.
c) Start real alignment operations with texts of 1.500 words.
d) Start with monologues.
e) Before alignment, pass the texts through the XML conversion (Madrid) and through the verification macro (Florence) to test correctness with respect to the format.
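
The requirements on the transcription format listed earlier in this appendix lend themselves to a mechanical check before alignment. The following is only a minimal sketch in Python of such a check, not the Florence verification macro: the function name check_turn and the tag inventory KNOWN_TAGS are assumptions made for illustration, and only the three signs named in the requirements are covered.

```python
import re

# Tags named in the requirements; the full C-ORAL-ROM inventory is larger,
# so this set is an assumption and would need to be extended.
KNOWN_TAGS = {"/", "//", "[/]"}


def check_turn(turn: str) -> list[str]:
    """Return the requirement violations found in one dialogic turn."""
    problems = []
    if not turn.endswith(" "):
        problems.append("2: turn does not end with a space")
    if "\u2026" in turn:
        problems.append("3: single-character ellipsis instead of three dots")
    if re.search(r"<\s|\s>", turn):
        problems.append("4: space between overlap bracket and text")
    if "  " in turn:
        problems.append("5: double space")
    tokens = turn.split()
    for prev, tok in zip([""] + tokens, tokens):
        if "/" in tok and tok not in KNOWN_TAGS:
            # Covers a tag glued to a word ("sofa/") and a broken tag ("[/").
            problems.append(f"1/6: malformed or unspaced tag {tok!r}")
        elif tok in {"/", "//"} and prev in {"/", "//"}:
            problems.append("6: split tag, e.g. '/ /' instead of '//'")
    return problems
```

A turn such as "*MAR: ha puesto un sofa / " (with its final space) yields an empty list, while "*MAR: ha puesto un sofa/" is reported under requirements 2 and 1/6.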
APPENDIX 2 Meetings
Meeting with advisors
LABLITA, Italian Department, University of Florence, February 18-19, 2002
Corpus assessment and the exploitation of the C-ORAL-ROM deliverables, with related workshop: "Corpus Linguistics and contrastive research on the C-ORAL-ROM corpus"

February 18, Morning session
Claire Blanche-Benveniste (Ecole Pratique des Hautes Etudes): Quelqu'un, quelque chose, quelque part, quelque fois
Fernanda Bacelar do Nascimento (Centro de Linguistica da Universidade de Lisboa): Un outil pour l'étude des faits de grammaticalisation: les concordanciers
Emanuela Cresti (Università degli Studi di Firenze): La personne verbale dans l'italien parlé

February 18, Afternoon session
Paola Gramigni (Università degli Studi di Firenze): Les corpus Lablita. Une analyse comparative.
Valentina Firenzuoli (Università degli Studi di Firenze): A corpus based study on variation of illocution: some intonational contours
Sabrina Signorini (Università degli Studi di Firenze): Syntactic features and frequency of topic unit: analysis of a sample of Italian spoken language
Daniela Giani (Università degli Studi di Firenze): Le discours direct rapporté dans le corpus C-ORAL-ROM

February 19, Morning session
Dominique Willems (Ghent University): Le dictionnaire contrastif des valences verbales
Antonietta Scarano (Università degli Studi di Firenze): Les relatives dans l'italien parlé. Étude sur corpus
Claire Blanche-Benveniste (Ecole Pratique des Hautes Etudes): L'espace occupé par les chaînes verbales dans le parlé
Massimo Moneglia - Alessandro Panunzi (Università degli Studi di Firenze): Semantic variations of verbs in the C-ORAL-ROM Italian sub-corpus

February 19, Afternoon session
Assessment of deliverables 1.1 and 1.2

C-ORAL-ROM WP3 scheduled workshop: Application of prosodic tagging criteria to the C-ORAL-ROM corpora and alignment through WinPitchCorpus
LABLITA, Italian Department, 8-9-10 April, 2002

Monday 8
9,30-11: The theoretical framework (with application to Italian examples)
11-13: Critical contexts on the Italian corpus
Afternoon
15-18: Applications on the Romance corpus: cases and problems encountered in tagging the other Romance languages (examples verified through WinPitchCorpus)

Tuesday 9
9,30-12: Explanation of the alignment method of WinPitchCorpus
12-13: Alignment restrictions on the C-ORAL-ROM transcription format
Afternoon
15-18: Training of each participant in the meeting in concrete aligning operations

Wednesday 10
9,30-11: Introduction to XML, and the macro to convert the C-ORAL-ROM format into an equivalent XML format
11-13: Macro for counting and standard measures derived from transcription

Participants
Massimo Moneglia (UFIR.DIT)
Emanuela Cresti (UFIR.DIT)
Marius Spinu (UFIR.DIT)
Carlota Nicolas (UFIR.DIT-UAM)
Paola Gramigni (UFIR.DIT)
Sabrina Signorini (UFIR.DIT)
Antonietta Scarano (UFIR.DIT)
Guillermo de la Madrid (LLI.UAM)
Manuel Alcantara Plà (LLI.UAM)
Fernando Ares Chote (LLI.UAM)
Rita Veloso (FUL.CLUL)
Sandra Antunes (FUL.CLUL)

APPENDIX 3
Report by the advisors on the French Delivery.
Meeting in Florence on 17 February 2002 of the managers of the C-ORAL-ROM project, Massimo Moneglia, Emanuela Cresti and Fernanda Bacelar do Nascimento, and of two of the consultants representing the Advisory Board: Dominique Willems and Claire Blanche-Benveniste.

The consultants are concerned. The project is in jeopardy. The activities of the Aix team do not meet what was expected of them. The Italian project leaders must go to Luxembourg between 15 and 20 March and must be able to report precisely on the state of the Aix team's contribution.

Since the Aix team has so far been unable to hire a junior researcher directly with the funds obtained, it had been envisaged that the partners, Florence and Lisbon, would pay a grant to a researcher in Aix. This requires asking Luxembourg for a change to the contract. Luxembourg will then ask to see the parts of the work done so far, and in particular the comparable parts of the four corpora of the four teams (Spanish, Portuguese, Italian and French). The French team must be able to give a precise account of the work done and still to be done, with a justified timetable that takes its financial obstacles into account. It is understandable that the Aix team has not yet been able to build the whole corpus, as the three other teams have done, but it is imperative that it show at least part of this work, in full conformity with the standards adopted by the other teams, and that it make a precise commitment, in terms of timetable and of people engaged, to complete this phase.

The consultants have drawn up a list of requests, as things currently stand.

1. Florence would like to have direct contact with the people who carry out the work in Aix, and not only with the person, José Deulofeu, who organises this work.

2. Florence would also like an official letter from Jean Véronis, the official head of the project, stating in writing the name of the person who is the authorized contact person to be given to Luxembourg.

3. The work sent by the Aix team must imperatively be brought into the agreed formats and sent back to Florence by the end of February:
- format of the excerpts, with the exact number of words, in conformity with the decisions taken. What has been done and what remains to be done may be indicated (cf. the Portuguese model);
- presentation of the sound on the CDs in conformity with the standards agreed at the previous meetings. MP3 compression prevents comparison;
- headers of the excerpts to be produced following the model adopted by the Italian, Spanish and Portuguese teams and sent to J. Deulofeu on Friday 15 February. These headers must be checked against the content (cf. the observations made by Florence on 21/02, for example not labelling as a monologue something that clearly involves four participants);
- to assign the excerpts to the "formal" and "informal" categories, one can rely on both the roles and the situation, adding a brief explanation if necessary.
It is very important to preserve the "comparability" of the corpora among the four teams. This is what the Commission will check first of all. Documents have been given to the consultants and also sent directly to J. Deulofeu, to serve as a basis for making the work comparable.

4. The Aix team must specify, before the beginning of March:
- which part of the old corpus will be recovered and reformatted;
- which part of the new corpora will be produced, and when.

It is recalled that the three other teams have delivered this whole part of the work and that any useful information can be obtained from them. If these commitments are not kept, other solutions would have to be resorted to, which the Florence leaders would like to avoid.
APPENDIX 4 Main Communications from UPRO
José Deulofeu <jose.deulofeu@wanadoo.fr>
Saturday 2 March 2002, 12:39

Dear friends,

A) This is to inform you of the administrative situation of the staff working for C-ORAL-ROM.

1) Magali Seijido: As I announced in my previous message, after a long discussion with the new head of service I was able to verify that Magali now meets the administrative conditions for a one-year junior researcher contract starting on 1 March 2002. She will have a net salary of 1200 €, which will make it possible to pay her for the work done in February. This is really a good and unhoped-for outcome, and it solves the immediate problems. The timetable is as follows: I submitted the contract request on Friday 1 March. The employment contract should be signed in about ten days. I will let you know as soon as it is signed.

2) Sandra Bonnard: she is employed on an hourly basis ("vacations"), like Sandrine last year. No problem.

3) Estelle Campione: for her a grant managed by Florence will be needed, as we had planned: she cannot be paid by France. As she is currently paid by the university, her grant can start only in July, if that suits you.

B) Modification of the UPRO contract.
We are fortunate that the new President (elected in January) has appointed new heads of service who are younger and more European-minded. They will help us fulfil as much of our contract as possible. They are convinced that it would harm the university's reputation if we gave up too large a part of the grant amount. In this new context, I propose limiting the change of contract to two points:
1) The contract for Estelle, which must be managed directly by Florence.
2) The requested increase of the "travel and subsistence" item, in order to involve our colleagues from other universities (Paul, Bilger) in the rest of the work.
We will be able to ensure the updating of the concordancer.

I hope I have been clear and helpful.

Best regards,
José
(you may have received this message twice; that is an error)
Jean Veronis Jean.Veronis@newsup.univ-mrs.fr
Wednesday 3 April 2002, 9:55

Dear Colleagues,

I am pleased to announce that we have finally been able to recruit a substantial number of staff, which will allow us to make up for our delay as quickly as possible. In fact, we have already made up a good part of it.

Given the size that the team working here on C-ORAL-ROM is now reaching, we have agreed that from now on I will take on its entire scientific and operational direction, while José Deulofeu will continue to handle financial management. To allow the team to work as efficiently as possible, we would therefore like interaction with Florence and the other partners to take place from now on in an organised and centralised way, through me. I would therefore be grateful if you would from now on send me all information concerning the project. I will take care of circulating this information within the Aix team and of distributing the tasks so that the work plan can be respected.

As regards the meeting planned for this weekend in Florence, we were unfortunately informed of it rather late, and we are not in a position to send the team's prosody specialists, as they already have other commitments on that date. Moreover, during this university holiday period, in which the administrative services are closed, we would not have been able to have the tickets purchased or a financial advance paid to the staff travelling. Our university has a rather heavy bureaucracy, and travel must be notified at least one month in advance for the administrative services to make the necessary arrangements, especially if holidays fall in between.

You know the interest we take in prosodic studies, and I would nevertheless be very grateful if you would send me the minutes of the meeting, which we will study with the greatest care.

I am very happy to see that the project is now so well under way on our side, and I take this opportunity to thank you for your unceasing efforts.

Hoping to see you soon,
Jean Véronis
http://www.up.univ-mrs.fr/veronis/
Jean Veronis Jean.Veronis@newsup.univ-mrs.fr
Friday 3 May 2002, 12:03

Dear Colleagues,

As we had announced to you, a new president and new Councils have been leading our university for some time. We had to wait for the new team to be operational before we could effectively discuss the administrative and financial problems of the C-ORAL-ROM contract, which we have just done. I am pleased to announce that the new team has been extremely positive towards us, and that we have been able to resolve all the cash-flow and staffing difficulties that had unfortunately hampered us in the first year. We are therefore in a position to hire all the necessary staff and to carry the rest of the project through without any contractual change. We thank you for the steps you took with your university regarding the post-doctoral grant, which in the end is no longer necessary.

As regards the execution of the contract, we have made entirely satisfactory progress and we will be in a position to hand over all the data in the required format at the June meeting.

We have duly received your numerous messages and requests (fax, etc.), but most of them do not seem to us to be of a contractual nature. We have a very clear contract with the European Commission, and we will honour this contract. For the sake of efficiency, and given the delay, we would however prefer not to have to disperse our efforts in heavy bureaucracy and multiple electronic interactions outside the contractual framework.

May I add that I was very surprised by the authoritarian and aggressive tone of your various messages, which seems to me rather unusual compared with the inter-university projects in which I have had the opportunity to take part. I would like us, if possible, to return to a spirit of good collaborative understanding, which seems to me entirely desirable and necessary for the smooth running of the rest of the project.

Best regards,
Jean Véronis, Professor
Director of the DELIC team
APPENDIX 5 Formal request to the President of UPRO
To the President of the "Université de Provence"
3, Place Victor Hugo
13331 Marseille
France

cc: Jean Veronis (scientific person responsible for the C-ORAL-ROM project at UPRO)
Giovanni Battista Varile (IST Program, Multimedia Contents and Tools, Head of Unit)

Object: Request to the legal representative of the Université de Provence to fulfil the contractual obligations of the C-ORAL-ROM contract (IST 2000 26228). Deliverables 1.1; 1.2.

Dear President,

The above-mentioned Contract, signed by the Université de Provence with the Commission, provides for the delivery of deliverables 1.1 and 1.2 by December 2001 (see pages 39 and 47 of Annex 1 to the Contract).

Despite many informal requests to the persons in charge at Description Linguistique Informatisée sur Corpus (DELIC), which is the main department carrying out the work at the Université de Provence (at Aix), we have not yet received the full contribution of UPRO to the deliverables; to date we have received only some partial input, which falls far short of the contractual obligations of the Université de Provence. Attached to this letter, please find the deliverable description reported to the Commission, in which the missing portion of the work by the Université de Provence is detailed.

Therefore, on behalf of the C-ORAL-ROM Consortium, and in accordance with Article 7.3.b of the Contract, we kindly request you to provide the co-ordinator with the above-mentioned deliverables 1.1 and 1.2 within one month of receipt of this letter.

Yours sincerely,

Prof. Emanuela Cresti, Co-ordinator of the C-ORAL-ROM Project
Dipartimento di Italianistica
Università di Firenze
Piazza Savonarola, 1
50132 Florence
Italy
APPENDIX 6 C-ORAL-ROM paper at LREC 2002
The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus Emanuela Cresti, Massimo Moneglia*, Fernanda Bacelar do Nascimento*, Antonio Moreno Sandoval*, Jean Veronis*, Philippe Martin*, Kalid Choukri, Valerie Mapelli*, Daniele Falavigna*, Antonio Cid*, Claude Blum*
* Dipartimento di Italianistica, Università di Firenze
*Centro de Linguistica da Universidade de Lisboa
Complexo Interdisciplinar, Av Gama Pinto, 2, 1649-003 Lisboa Portugal
*Laboratorio de Lingüística Informática, Departamento de Lingüística, Universidad Autónoma de Madrid
Carretera de Colmenar Viejo Km 15 Cantoblanco 28049 Madrid Spain
*Description Linguistique Informatisée sur Corpus, Université de Provence
29, Avenue Robert Schuman, 13621 Aix-en-Provence Cedex 1, France
European Language Association Agency (ELDA)
55-57, Rue Brillat-Savarin 75013 Paris France
Istituto Trentino di Cultura (Centro per la ricerca scientifica e tecnologica)
*Instituto Cervantes, Oficina del Español en la Sociedad de la Información
Livreros, 23 28801 Alcalá de Henares - Madrid Spain
Abstract
C-ORAL-ROM is a multilingual corpus of spontaneous speech of around 1.200.000 words representing the four main Romance languages: French, Italian, Portuguese and Spanish. The resource will be delivered in standard textual format, aligned to the audio source in a multimedia edition. C-ORAL-ROM aims to ensure both a sufficient representation of spontaneous speech variation in each language resource, and comparability among the four resources with respect to a definite set of variation parameters. The multimedia conception of C-ORAL-ROM allows simultaneous alignment and full appreciation of the acoustic information through the speech software WINPITCHCORPUS. The storage of spoken language resources is based on the identification of utterances in the four corpora through perceptively relevant prosodic properties. In C-ORAL-ROM, all the textual information is tagged simultaneously with respect to prosodic parsing and utterance limits. Each prosodic unit corresponding to an utterance is easily and directly aligned to its acoustic counterpart, thus ensuring a natural text-sound correspondence and the definition of a database of possible speech acts in the four Romance languages.
1. Introduction

The main goal of the C-ORAL-ROM Project is to provide a comparable set of corpora of spontaneous speech for the main Romance Languages, namely French, Italian, Portuguese and Spanish (roughly 300,000 words for each language). The project has been funded under the IST program of the EU and is being carried out by a European consortium co-ordinated by the University of Florence [1]. The resource was set up during 2001 with a large reuse of corpora of spontaneous speech collected in previous academic studies (See Cresti, 2000; Bacelar do Nascimento, 2001; Lavacchi & Nicolas, 2000; Blanche-[...]

[1] C-ORAL-ROM (IST 2000 26228). Official web site: http://lablita.dit.unifi.it/coralrom

The C-ORAL-ROM Corpora will be delivered in the same textual format following present EU standards (EAGLE) in a multimedia edition on DVDs, integrated
with tools, assuring both concordances of the text and detailed analysis of the acoustic signal. The Corpus edition will be associated with comparative linguistic studies, models and standard linguistic measures of spontaneous spoken language variability. Publication and distribution for academic studies will be performed by Champion, while ELDA will distribute the LR to speech [...]

The paper focuses on two features of the project that [...]
• sampling criteria adopted to ensure comparability [...]
• the multimedia design of the C-ORAL-ROM [...]

2. Representation of spontaneous speech and comparability in a multilingual LR

2.1. The representation issue

The Spontaneous Spoken Language areas have become consolidated only in quite recent times (See Biber, 1988; Blanche-Benveniste, 1990; Cresti, 2000; Givon, 1979; Miller & Weinert, 1999). Spontaneous [...]
(b) face-to-face dialogue in a large variety of [...]
(c) mental programming simultaneous with vocal [...]
(d) contextually undetermined linguistic behaviour [...]

The setting up of Spontaneous Speech databases is a complex task. Spoken resources set up in controlled environments (such as telephone information, health dialogues, map tasking) constitute at present the majority of the databases used for the validation of language engineering. Their acoustic/phonetic quality is excellent, but they deal with highly predictable semantic domains. Should one wish to represent Spontaneous Speech in a LR, the constitution criteria must ensure the widest possible variation in speech contexts, and a low control on the speech event, that is exactly the opposite of what [...]

There are many reasons for this necessity. Variability is the main property of spontaneous spoken texts. As a matter of fact, almost the complete set of linguistic levels of language description varies their quantitative weight a lot, when considered with respect to different pragmatic [...]

Frequency lexicon level. The representation of a sufficient number of contexts covering, as far as possible, relevant types of speech events in the universe, is the only possible strategy to identify significant frequency lexicons. High frequency lexicon defined with respect to general corpora may be under-represented in specific pragmatic domains which on the contrary, by definition, maximise the probability of occurrence of low frequency lexical items. That is the real point of interest for the rigid definition of a semantic domain in the setting up of comparable corpora of dedicated resources.

Syntactic level. It has been noted that in general corpora (Biber et al., 1999) nouns are more frequent than verbs, but also that the relative frequency of nouns is much lower in informal conversations with respect to formal contexts (1/1 vs 1/3). Adjectives, on the contrary are much more frequent in formal speech.

In the domain of corpus based grammars, the induction of the main syntactic properties is strongly correlated to text variation parameters. For example in English, both main types of dependent clauses (relative and complement clauses) vary their relative frequency according to socio-linguistic parameters. Generally speaking, in syntactic structures controlled by a noun, the frequency of both that-clauses and to-clauses is higher in formal language, while, in verb-controlled structures, that-clauses are much more frequent in conversation (Biber, 2000). Similar conclusions can be drawn with respect to relative clauses. Relative constructions are much more frequent in formal speech, while the restrictive function is the more frequent, among relative clause functions, in the all corpus variation (Biber et al., 1999). In other words, the pragmatic domain of corpora collection strongly influences the probability of occurrence of syntactic properties of spontaneous speech [...]

In between syntactic and lexical properties. It is essential to the grammatical description of spoken language to note that the majority of complement clauses which depend on a verb, depend on a putandi verb in spontaneous conversation. However, such important data are also relative to variation parameters. For example, a complement clause depends quite frequently on a dicendi verb in broadcasting and media contexts (Biber et al., [...]

Prosodic level. In the map tasking coding scheme (Anderson et al., 1991), the set of possible dialogue acts, whose investigation is relevant to the link between prosodic and discourse structures, corresponds to roughly 16 possible moves in the map task (Stirling et al., 2001). On the contrary, current trends in corpora which document a huge variety of socio-linguistic and pragmatic domains, show that the set of possible speech acts includes as many as 80 categories which are distributed all over the corpus variation (Firenzuoli in preparation). Of course the inductive data on the link between prosody and speech acts have a severe limitation in map tasking and need to be [...]

The study of prosody needs natural speech variations for many reasons. For instance, quite surprisingly we noticed that thematic prosodic structures (topic/prefix intonation see. 't Hart et al., 1990), largely characterised formal texts, while the so called comma intonation (appendix/suffix 't Hart et al., 1990) strongly correlates to everyday dialogues (Tizzanini in press).

Middle length of utterances (MLU). The demarcation of the utterances, is an essential data for the interpretation of natural speech and it turns out that such tagging level allows the verification of important basic speech measurements (Biber et al., 1999). In recent works (see Tizzanini, in press; Rossi, 1999; Cresti, 2000, Moneglia, in press; Firenzuoli, 2000) has been verified, that MLU of texts marked by a strong degree of spontaneity (family conversations, country wakes, conversations among work colleagues and conversations among university students) systematically differs from MLU of formal texts (university lectures and radio interviews).
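
As a rough illustration of the MLU measure, the sketch below computes the mean number of words per utterance from tagged turns. It assumes that an utterance is closed by the terminal break "//", that an utterance may continue across a speaker's successive turns, and that length is counted in words; the function name and the counting rule are illustrative assumptions, not the procedure used in the cited studies.

```python
from collections import defaultdict


def mean_utterance_length(turns):
    """Mean number of words per utterance, with '//' closing an utterance."""
    open_words = defaultdict(int)  # words in each speaker's unfinished utterance
    lengths = []
    for turn in turns:
        speaker, _, text = turn.partition(":")
        speaker = speaker.strip()
        for token in text.split():
            word = token.strip("<>")  # overlap brackets are attached to words
            if token == "//":
                if open_words[speaker]:
                    lengths.append(open_words[speaker])
                open_words[speaker] = 0
            elif word and word[0].isalnum():
                # Tags such as "/", "[<]" or "+" are not counted as words.
                open_words[speaker] += 1
    return sum(lengths) / len(lengths) if lengths else 0.0


turns = [
    "*MAR: ha puesto un sofa / ",
    "*VIS: ah si // ",
    "*MAR: / una mesa de esas pequenas / ",
    "*VIS: si // ",
    "*MAR: / con un armario muy simple // ",
]
print(round(mean_utterance_length(turns), 2))  # prints 5.67
```

In this short dialogue the three utterances have 2, 1 and 14 words, because the first speaker's utterance runs over three turns before its terminal break.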
Fig. 1 shows that the MLU is almost constant all
In the domain of speech, parallel corpora are possible only
through the contextual variation with the significant
exception of formal contexts, where we find a iato2.
Comparability is quite easy to pursue with respect to
The systematic correlation between type of contexts
resources based on the selection of a specific semantic
and MLU allows a strong quantitative prevision on the
domain (telephone information, health information, map
internal structure of the texts defining the probability of
tasking etc.) “people in the same controlled situation
the possible length of the utterance in each domain.
doing the same things”. However such resources are
acquired in a restricted series of situations and are
submitted to elicitation parameters (limited contexts) and therefore lack the main character of spontaneous speech (character d).
If we assume that the representation of spontaneous
speech must necessarily represent spoken text variation, in
a multilingual resource the more variability is represented
in each language resource, the more the language resource is difficult to compare with the other resources and
comparability is a function of the application of variation
C-ORAL-ROM sampling
The definition of significant variation parameters is,
therefore, a basic step towards the development of a comparable LR of spontaneous speech.
A long tradition of socio-linguistic studies (see Bilger,
1997; Labov, 1966; Biber, 1998; Berruto, 1987; Gadet,
1996) has frequently dealt with the significance of "socio-
situational parameters": 1) Socio-linguistic (age,
education, occupation, sex); 2) semiological (monologue,
dialogue, conversation); 3) sociological (family, public);
4) transmission (face-to-face, transmitted); 5) genre. In practice4.
Figure 1. Middle Length Average in text typologies
languages resources is based on the following set of
variation parameters that constitutes the semiological and
The representation of spontaneous speech must
therefore necessarily represent spoken text variation.
(a) Dialogical structure (monologues, dialogues,
2.2. The comparability issue
(b) Social domain of use (family; private life, public
The central problem which a multilingual corpus of
Spontaneous Speech must solve is the question of
comparability among different language resources in the
(e) Speaker parameters (Age, Sex, Education, and
Comparability in large Written Language Corpora
In C-ORAL-ROM, which has quite a limited
• Parallel corpora (for ex. CRATER and EUROROM)
dimension, such parameters are not uniformly verified
Corpora of the same type or of the same specialised
throughout all the variation. That should of course be
much better. In particular the use in the sampling strategy
of the formal / informal partition, which is absent in the
Clearly, with respect to the task of collecting
Dutch corpus, allows one to restrict the number of
Multilingual Spontaneous Spoken Language Corpora,
parameters under investigation reducing the set of possible
only the second alternative is, in principle, available. As a matter of fact, it is impossible to realise parallel corpora
variations, with low damage for representation purposes.
without losing the spontaneity characteristic (Character c).
In particular, text genre variation is the main criterion applied in the formal part, while social contexts of use and
dialogue structure variation are the variation parameters
2 From Cresti, 2000. Legend: TOT.Sampling: total data of sampling; TOT.FAM.: family typology; TOT.PRIV.free: private free typology; TOT.PRIV.reg.: private regulated typology;
4 The Spoken Dutch Corpus (also under constitution at present)
TOT.PUB.free: public free typology; TOT.PUB.reg: public
is a concrete example of the use of such parameters in corpus
regulated typology; Media: media typology; Baby: baby talk
design (documenting the Netherlands and Flanders). We were
not aware of the corpus design of the Dutch Corpus when the C-
3 The prototype example is the relation between the Brown
ORAL-ROM project was prepared (1999), but when sampling
Corpus (early 60’s, Brown University USA) and LOB Corpus
was decided (January 2001), its structure at
(Lancaster/Oslo/Bergen, 1970) which realise together a
http//lands/let.kun.nl/cgn/edesign.htm, confirmed the overall
comparable sampling of American English and British English.
systematically adopted for the informal part, where on the
contrary genre variation is not strictly defined as a
C-ORAL-ROM does not represent dia-topical phonetic
variations. In a multilingual collection dia-topical limits for each language must be established. Corpora are
collected in Continental Portugal, Central Castile (Spain),
Southern France, Western Tuscany, and are intended to
represent some possible standard, rather than all the
varieties of pronunciation, which need collections of interlinguistic corpora with a wide dia-topical variation5.
Therefore, each corpus does not represent phonetic
variation, but rather is expected to demonstrate a sufficient
variation across language uses for at least studying
communicative acts, lexicon, syntax and prosody.
The main choices adopted in C-ORAL-ROM for the
representation of speech variability in four 300.000 word
splitting formal speech (50%) and informal speech (50%), variation ensuring a sufficient representation
of dialogical Informal Speech (which is the resource
• selecting distinct criteria for sampling the formal and
• defining a text weight ( from 1500 to 3000 words for
each text ) that ensures both the possible appreciation
of macro-textual properties and sufficient
representation of the universe in each 300.000 word
• representing a variety of possible recording situations
within the range of perception and intelligibility of the human ear 6.
recording as part of the meta-data: a) Speaker characteristics; (gender, age, geographical region education and occupation); b) acoustic quality of the
As a consequence of those choices, each corpus in the
multilingual resource cannot be said to be comparable to
the others with respect to specific semantic domains, but
The comparable Romance Spoken Corpus is identified
rather, with respect to the possible occurrence of spoken
by means of common Sampling criteria, and the same
language structure/s at both syntactic and prosodic levels
proportion of each type in the four corpora: the following
in a variety of possible significant contexts.
are the tables for the formal and informal part of each Romance corpus in the C-ORAL-ROM resource.
Textual format
The four Romance Corpora have been transcribed or
converted into standard textual format (Gibbon et al., 1997). The format definition of spoken texts involves: 1)
dialogue representation; 2) text co-ordinates; 3) prosodic
tagging. The C-ORAL-ROM textual format is defined as
an implementation of the CHAT architecture (Mac
a) Heading, containing a definite set of meta-textual
b) Text lines in orthographic transcription divided as
c) vertically, in dialogic turns (introduced by a speaker
d) horizontally, by prosodic parsing and utterance limit,
5 This limitation is quite severe for Italian, where local varieties
representing terminal and non terminal prosodic
may strongly diverge from the standard (De Mauro et al., 1993)
6 The sound files of the acoustic database are set on a quality scale (recording, volume, voice overlapping and noise) and are
comparable with respect to it. The quality scale extends from the
8 10 long samples 4.500w; at least 64 short sample, 1500w;
highest level of clarity of the voice signal to low levels of
7.500w collections of very short dialogues in public context
acoustic quality. The quality is gauged spectrographically.
9 2 or 3 samples for each genre, of 3000 words on average, with only
7 At least 23.000 conversations with more than two participants
e) Dependent tiers for context information and possible
mouse clicking as slowed speech is perceived, and
automatically building an aligned text database (up to 8
layers of text annotation and alignment). It incorporates a
The C-ORAL-ROM textual Corpus will turn out
mouse driven file segmentation tools, with precise time
tagged with respect to: a) utterances corresponding to
adjustment on on-screen speech spectrogram and prosodic
speech acts (Austin, 1962; Cresti, 1994 and 2000); b)
parameters display. This allows a fast and precise
prosodic parsing of each utterance (’t Hart et al., 1990);
segmentation of both long prosodic units (utterances) and
c) words vs. word fragments distinction; d) overlapping.
small speech units such as syllables or phones. Among its numerous features:
3. Multimedia
a) Recording, and playback of long signals (memory
limited) at standard sampling rates (8,000 Hz, 11,025
The definition of the text to speech interface in C-
ORAL-ROM is based on the idea that the access to
Hz, 16,000 Hz, 22050 Hz, 32,000 Hz, 44,000 Hz and
acoustic information in a multimedia corpus (alignment)
64,000 Hz) in mono or stereo mode, at 8 bit or 16 bit encoding;
must go hand in hand with the representation of prosody. Such a method can be proposed as a possible standard for
b) Standard black and white and color spectrogram of
storing oral language in multimedia and multi-modal
any part of the speech signal, with 3 distinct zooming tools (down to 1 sample resolution), 8 levels of
language resources. C-ORAL-ROM will ensure simultaneously:
bandwidth and 8 available analysis windows, 3
a) tagging with respect to prosodic parsing & action
c) Powerful fundamental frequency and intensity
b) acoustic analysis with special functions for F0
analysis (3 standard methods – spectral comb,
AMDF, harmonic selection) with all user adjustable parameters;
c) Utterance-based text - speech alignment
d) Prosodic morphing, user graphically defined
3.1. Acoustic format
modification of the prosodic parameters of natural speech (fundamental frequency, intensity, syllable
C-ORAL-ROM comes from the reuse of previously
established resources recorded with various analogue or
e) Easy insertion of text, bookmarks, comments. User
digital equipment and from new recordings. The following
are the requirements for the acoustic format:
Format: mono or stereo .wav files (Windows PCM),
WinPitch also complies with the MDI Windows
standard (Multiple Document Interface), and allows all
Recording and storing process for old Analogue
functions to be concurrently applied to multiple speech
recording: directly derived in wav files (20.050 hz 16
bit) from the original analogue tapes through a standard sound card (Sound Blaster live or compatible) with a
3.3. Alignment units Recording and storing process for new recording:
The storage of spoken language resources should be
a) dialogues: stereo DAT or minidisk recording
based on the selection of a natural alignment unit. In C-ORAL-ROM all the textual information is tagged
(44.100Hz) with two unidirectional microphones, converted into mono or stereo .wav files (Windows PCM,
simultaneously with respect to prosodic parsing and
22050Hz, 16 bit) via SPDIF port of a standard sound card
utterance limits, therefore, each prosodic unit corresponding to an utterance can be easily and directly
(Sound Blaster live or compatible) with a professional sound editor
aligned to its acoustic counterpart, thus ensuring a natural
b) conversations with more than two participants:
and meaningful text - sound correspondence.
This step is quite controversial at two levels. It implies
mono DAT or minidisk recording with cardioid or omni-directional microphone converted into mono .wav files
on the one hand that the notion of utterance should be
(Windows PCM, 22050Hz, 16 bit) via SPDIF port of a
preferred to other possible linguistic notions as a natural alignment unit and that, on the other hand, the criteria for
standard sound card (Sound Blaster live or compatible) with a professional sound editor.
the identification of utterances in a spoken language corpus are reliable.
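The acoustic requirements above lend themselves to an automatic check on each delivered sound file. The following is only a minimal sketch, assuming that the target values are those stated for new recordings (uncompressed PCM .wav, mono or stereo, 22050 Hz, 16 bit); the function names `check_wav_format` and `write_silent_wav` are hypothetical and are not part of any C-ORAL-ROM tool.

```python
import os
import tempfile
import wave

# Assumed targets, read from the requirements for new recordings.
EXPECTED_RATE_HZ = 22050
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bit
ALLOWED_CHANNELS = (1, 2)         # mono or stereo


def check_wav_format(path):
    """Return a list of problems; an empty list means the file conforms."""
    problems = []
    # wave.open raises wave.Error if the file cannot be read as PCM audio.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
    if channels not in ALLOWED_CHANNELS:
        problems.append(f"{channels} channels, expected mono or stereo")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{8 * width} bit samples, expected 16 bit")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    return problems


def write_silent_wav(path, rate_hz, channels=1, seconds=0.1):
    """Write a short silent 16-bit PCM file, used only to exercise the check."""
    n_frames = int(rate_hz * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames * channels)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.wav")
        bad = os.path.join(tmp, "bad.wav")
        write_silent_wav(good, 22050)
        write_silent_wav(bad, 44100)   # e.g. a DAT transfer not yet downsampled
        print(check_wav_format(good))
        print(check_wav_format(bad))
```

A check of this kind reports only the container parameters; the acoustic quality of the signal itself still has to be gauged separately.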
3.2. WinPitchCorpus
As far as the first question is concerned word based alignment (that has been preferred for example in the
In synthesis the function of the Align Programme in C-
Spoken Dutch Corpus) has low significance in
ORAL-ROM is to orient the sound signal exploitation
spontaneous speech, and it is hard to pursue for
allowing, not only the transit from text to sound, but also,
prosodic reasons. In spontaneous spoken language words
are co-articulated in prosodic units and the acoustic effect
Text -speech alignment and acoustic analysis are
of a word-based alignment is perceptively unnatural.
ensured through the speech software WinPitchCorpus
Moreover, the alignment becomes significant from a
linguistic point of view once it is defined with respect to a
WinPitchCorpus (see http://www.winpitch.com) is a
compositional linguistic domain, that is ranked over the
general purpose speech analysis tool working under
word level description. Therefore, the alignment problem
Windows 2000/XP with many functions devoted to the
is linked to the definition of the language structure in the
alignment and annotation of large corpora. In particular,
the text-speech aligner tool is based on a user-adjustable speech slow-down process, in order to easily select text by
The C-ORAL-ROM approach is based on the idea that
weak and too strong for the representation of natural
while Written language is characterised by a textual
speech and, moreover, it does not allow any prevision on
organisation based on syntax, Spoken language is mainly
spoken corpora segmentation even from a statistic point of
characterised by utterances, having a pragmatic nature and
corresponding to communicative acts (Quirk, et al., 1985; Biber, et al., 1999; Cresti, 2000). In fact, sentence based
Prosodic tagging
(or clause based) alignment turns out to be strongly
The segmentation of spoken texts into utterances
underdetermined in spontaneous spoken texts. For
corresponding to speech acts can be based on prosodic
example, considering textual information, the following
properties that are highly identifiable at the perceptual
dialogic turn is apparently one sentence:
In C-ORAL-ROM the prosodic tagging of the
transcribed text is not a transcription of the intonation, as
for example ToBi, or MARSEC that specifies the
%sit: in a garage, a secretary looking for some
intonation profiles according to a phonological typology.
In C-ORAL-ROM prosodic tagging specifies on the text
each perceptively relevant prosodic break in the speech
On the contrary the relevant acoustic information
reveals that the dialogic turn is composed of two
utterances, which can receive the following paraphrases:
a) Tone units with a non terminal contour, reported
"I wonder which kind of car this one is. Is it a Punto?".
every time a non terminal prosodic break can be
In other words, the two utterances define two
perceived in a word sequence by a competent
meaningful units for a linguistically relevant alignment,
while the syntactic approach will lead to a meaningless
b) Terminal contours (utterance limit) reported every
alignment from a linguistic point of view.
time that a terminal prosodic break can be perceived
Therefore textual information does not determine a
by a competent speaker: // ? (double slash or
significant alignment unit in spoken language, in which
non-textual information is frequently required and, as the
previous example shows, a meaningful alignment unit
The previous example will be transcribed as follows in
may not have a clause or sentence structure. So
syntactically based alignment is at least underdetermined.
The relevant linguistic events (utterances) must be
*SEC: che macchina l’è / codesta // Punto ?
selected in the speech continuum through the full
appreciation of the acoustic and pragmatic information.
%sit: in a garage, a secretary looking for some
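The tagging convention illustrated by this example can be read mechanically. The following is a minimal sketch, assuming that a single slash marks a non terminal break and that a double slash or a question mark marks a terminal break (utterance limit), as in the transcribed turn above; the names `Utterance`, `parse_turn` and `mean_length_in_words` are hypothetical and do not belong to the CHAT tools or to WinPitchCorpus.

```python
from dataclasses import dataclass
from typing import List, Tuple

NON_TERMINAL = "/"          # assumed mark for a non terminal prosodic break
TERMINAL = ("//", "?")      # assumed marks for a terminal break


@dataclass
class Utterance:
    tone_units: List[str]   # text of each tone unit, in order
    terminal: str           # the terminal mark that closed the utterance


def parse_turn(line: str) -> Tuple[str, List[Utterance]]:
    """Split one speaker line into utterances and tone units."""
    speaker, _, text = line.partition(":")
    speaker = speaker.lstrip("*").strip()
    utterances: List[Utterance] = []
    units: List[str] = []
    words: List[str] = []
    for token in text.split():
        if token == NON_TERMINAL or token in TERMINAL:
            if words:
                units.append(" ".join(words))
                words = []
            if token in TERMINAL and units:
                utterances.append(Utterance(units, token))
                units = []
        else:
            words.append(token)
    # Material left without a terminal mark is kept as an open utterance.
    if words:
        units.append(" ".join(words))
    if units:
        utterances.append(Utterance(units, ""))
    return speaker, utterances


def mean_length_in_words(utterances: List[Utterance]) -> float:
    """Mean number of whitespace-delimited words per utterance."""
    if not utterances:
        return 0.0
    total = sum(len(" ".join(u.tone_units).split()) for u in utterances)
    return total / len(utterances)


if __name__ == "__main__":
    speaker, utts = parse_turn("*SEC: che macchina l’è / codesta // Punto ?")
    for u in utts:
        print(speaker, u.tone_units, u.terminal)
    print(mean_length_in_words(utts))
```

On the example turn the sketch yields two utterances, the first made of two tone units and the second of one, which is the segmentation argued for in the text; a purely syntactic reading would have returned a single unit.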
This conclusion, however, leads us to the second question.
A definition of utterance as a speech continuum from one silence to another silence has been frequently
Crucially, terminal breaks indicate the prosodic
proposed, even as an objective mark allowing the
automatic detection of utterance limits on the acoustic
signal. However it must be stressed that the notion of
The definition of utterance in C-ORAL-ROM is
utterance as a speech continuum from one silence to
theoretically defined. Given that intonation parses the
another silence is both too weak and too strong for the
speech continuum with relevant F0 movements, we
representation of natural speech and therefore it does not
assume that the identification of utterances in the sound
allow any prevision on spoken corpora segmentation. In
continuum is linked to the detection of perceptively
particular we can highlight the following:
relevant F0 movements. Also very traditional studies of
prosody have noted that there is no such thing as an
a) segments of sound wave that are between two sound
utterance without a profile of terminal intonation
(Karcevsky, 1931; Crystal, 1975). Therefore the
b) in spontaneous speech frequently utterances start
systematic correlation between terminal contours and
and/or stop with no break in the sound wave.
utterance limit is an efficient heuristic method for speech
The quantitative relevance of both properties in
However, at the theoretical level, we must consider
spontaneous speech cannot be stated with precision but
that perception is highly sensitive to voluntary F0
only guessed. For example from 20% to 50% of
variation (’t Hart et al., 1990) and that every utterance in
utterances (depending on the text genre) of spontaneous
spoken language on the one hand is the voluntary
speech corpora have a topic unit (Signorini, 2001). A topic
accomplishment of a speech act (Austin, 1962) and on the
cannot be an utterance but is frequently in between two
other, it is necessarily parsed in one or more tone units.
The background theory of the C-ORAL-ROM project
Similarly the second utterance of the previous example
(Cresti, 1994, 2000) links the two properties: the
is not preceded by a temporal break. The frequency of
voluntary F0 variations do not simply scan the utterance,
new utterances that start with no temporal break (or less
but rather express functional values that are necessary to
than the voiceless part of a stop consonant) has not been
the accomplishment of speech acts. For this reason the
counted but it is of course a very high percentage in
selection of textual units corresponding to an utterance
can be based on prosodic properties. In particular, as we
In conclusion, the notion of utterance as a speech
did in the previous example, it is possible to identify an
continuum from one silence to another silence is both too
utterance each time the prosody makes it possible to
perceive the completion of a speech act; i.e. intonation
Bacelar do Nascimento, F., (ed.), 2001. Portugues falado:
permits the pragmatic interpretation of the text
varietades geograficas e sociais, Lisboa: CLUL &
illocutionary criterion has been successfully applied to
Bally, C.,1950. Linguistique générale et linguistique
both the corpora of Adult Spontaneous Speech and Infant
française, Berne: Francke Verlag.
Speech allowing their tagging in utterances (see Moneglia
Berruto, G., 1987. Sociolinguistica dell’italiano
The identification of functional values for prosody is
Biber, D., 1988. Variation across speech and writing,
also in some sense traditional (Bally, 1950; Halliday,
1985). For example, it has been noted that, within the
Biber, D., S. Johansson, G. Leech, E. Finegan (eds.) 1998.
possible tone units, the tone information which enables
Corpus linguistics: investigating language structure
one to identify the illocution, or modality, of the utterance
and use. Cambridge: Cambridge University Press.
lies in a specific scansion unit (Martin, 1978).
Biber D., S. Johansson, G. Leech, S. Conrad, E. Finegan
The theoretical approach we are referring to
(eds.) 1999, The Longman grammar of spoken and
systematically links the study of such values to the study
written English . London: Longman.
of spontaneous speech. The melodic pattern which scans
Biber D. 2000. Corpus based analysis of grammar:
an utterance can be simple (composed of a single tone
variability in the form and use of English complement
unit) or complex (in which case it is made up of two or
clauses. In M. Bilger (ed.), Corpus, Methodologie et
more tone units linked melodically together).
applications linguistique. Paris: Champion, 224-237.
Non terminal tone units correspond to the scanning of
Bilger, M. , 1997. Corpus de portugais & d’espagnol.
an utterance by means of a complex pattern: the type of
Revue de l’Association Français de linguistique
which is discriminated at the perceptual level on the base
of its form (intonation pattern , 't Hart, et al., 1990). In
Blanche-Benveniste, C. (ed.), 1990. Le français parlé:
principle each perceptively relevant tone unit conveys a
études grammaticales. Paris: Editions du CNRS.
specific functional value (informational patterning; see
Blanche-Benveniste, C. (ed.) in press. Corpus du
Cresti, 1994; Cresti & Firenzuoli in press). For example,
Français parlé. Echantillonages. Paris: Champion.
the first tone unit of the following utterance is a Topic
Cresti, E., 1994. Information and intonational patterning
(prefix contour) and is followed by an information unit
in Italian. In B. Ferguson, H. Gezundhajt, Ph. Martin
(with a root contour) allowing the identification of the
illocutionary value of the utterance (Comment).
phonologiques. Toronto: Editions Mélodie. 99-140.
Cresti, E., 2000. Corpus di italiano parlato, vol. I- II, CD-
Cresti, E., V. Firenzuoli in press. L’articolazione
informativa topic-comment e comment-appendice: correlati intonativi, In Atti delle XII° GFS (Macerata 15 Dicembre 2001). Macerata: Università di Macerata Press.
Crystal, D., 1975. The English tone of voice, London
De Mauro, T., F. Mancini, M. Vedovelli, M. Voghera
1993. Lessico di frequenza dell'italiano parlato. Milano: Etass Libri.
Firenzuoli, V., 2000. Nuovi dati statistici sull’italiano
parlato. Romanische Forschungen, 13: 213-225.
Firenzuoli, V., in preparation. Repertorio dei profili
The results obtained on the basis of the application of
intonativi di valore illocutivo in un corpus di italiano
the illocutionary criterion are crucially confirmed in the
parlato, Ph.D. thesis, Firenze: LABLITA.
macro-syntactic theory of spoken language (Blanche-
Gadet, F., 1996. Variabilité, variation, varieté: le Français
Benveniste, 1990) for which the syntactic noyau coincides
d’Europe. French Language Studies, 6:45-58.
with the tone unit bearing the illocutionary value.
Gibbon, D., R. More, R. Winski (eds.), 1997. The
C-ORAL-ROM Corpora represent the variety of
handbook of Standards and Resources for Spoken
speech acts performed in everyday language use and
language Systems. Berlin: Mouton & de Gruyter.
enables the description of their prosodic and syntactic
Givon, T. (ed.), 1979. Discourse and Syntax. In Givon T.
structure in the four Romance Languages, from a
(ed.), Syntax and Semantics, vol. 12. New York:
quantitative and qualitative point of view.
Halliday, M., 1985. Spoken and written languages.
4. References
Austin, J.L., 1962. How to do things with words, Oxford:
’t Hart, H., R. Collier, A. Cohen, 1990. A perceptual study on intonation. An experimental approach to speech
Anderson, A., M. Bader, E. Bard, E. Boyle, G. Doherty,
melody. Cambridge: Cambridge University Press.
S. Garrod, S. Isard, J. Kowtko, J. McAllister, J.
Karcevsky, S., 1931. Sur la phonologie de la phrase, in
Miller, C. Sotillo, H. Thompson, R. Weinert, 1991.
Travaux du Cercle linguistique de Prague, IV.
The HCRC map task corpus. Language and Speech, 34:
Labov, W., 1966. The social stratification of English in
Lavacchi, L., C. Nicolas, 2000. Dizionario Spagnolo Italiano, (CD-rom Edition). Firenze: Le Lettere.
MacWhinney, B., 1994. The CHILDES project: tools for
Martin, Ph. 1978. Questions de phonosyntaxe et de
Miller, J., R. Weinert, 1999. Spontaneous Spoken language. Oxford: Clarendon Press.
Moneglia, M., in press. I corpora dell’italiano parlato di
LABLITA: Criteri di costituzione, unità di analisi e comparabilità dei dati linguistici orali. In E. Burr (ed.), Atti del VII° Convegno internazionale SILFI. Pisa: Cesati.
Moneglia, M., E. Cresti, 1997. Intonazione e criteri di
trascrizione del parlato, in U. Bortolini, E. Pizzuto (eds.), Il progetto CHILDES Italia, Pisa: Del Cerro.
Quirk, R., S. Greenbaum, G. Leech, J. Svartvik, 1985. A comprehensive Grammar of the English Language. London: Longman.
Rossi, F., 1999. Le parole dello schermo. Roma: Bulzoni.
Signorini, S., 2001. Caratteristiche sintattiche e frequenze dei topic in un corpus di parlato italiano, Tesi di Laurea, University of Florence.
Stirling, J., I. Fletcher, R. Mushin, L. Wales, 2001.
Representational issues in annotation: Using the Australian map task corpus to relate prosody and discourse structure. Speech Communication, 33:113-134.
Tizzanini, G., in press. L’articolazione dell’informazione.
Dati quantitativi di un corpus di italiano parlato. In E. Burr (ed.), Atti del VII° Convegno internazionale SILFI, Pisa: Cesati.
APPENDIX 7
CV New manager
CURRICULUM VITAE
Eng. Marius Bogdan SPINU Ph.D
PERSONAL DATA:
• Place of birth: Vatra Dornei (Romania); Date: 03.20.1970
• Civil status: married.
• Residence: via Favini 47, PRATO 59100, Italy
• Phone: 055 4796365, 347.3182020 (mobile)
• E-mail: spinu@lablita.dir.unifi.it or: spinu@dsi.unifi.it
EDUCATION:
• 2002: Received the Ph.D in Computer Science and Telecommunication from the University
of Florence, Italy with a thesis on Watermarking music scores, transaction model for digital music distribution
• 1998: Received the degree in Telecommunication Engineering from the University of
• 1988: High school "Stefan Cel Mare", Suceava, Romania
COMPUTER SKILLS:
• Object-Oriented and structured programming
§ Pascal, C, C++ , Assembler80x86, HTML, Perl, ASP, CGI, JScript
• Operating systems: MS-DOS, Windows (NT, 3.x, 9x, 2000), Unix and Unix-like (Linux)
• Other:
§ Personal Computer components (characteristics, installing)
§ Computer architecture (microprocessors, memories, microcontrollers)
LANGUAGES:
• Romanian (native tongue).
• Italian (very good).
• English (good).
• French (fairly good).
WORK EXPERIENCES:
1988-1989 mechanic in a Romanian enterprise
1998-1999 hardware/software technician at Centro HL in Florence
1999-2000 Excel teacher at CESIT Florence
from 1998 consultant for several software houses in Florence
from 1999 scientific consultant at "Dipartimento di Sistemi e Informatica" at Florence
University working also for: • European ESPRIT MOODS project
• Protection techniques for IPR on music delivering field
from 2000 "Technological aspects in using the computer" teacher for CESCOT Toscana
• December 1998 - February 2000: EXETICA S.r.l. Firenze (via Cavour 8, 50129, Firenze)
• October 1999 – April 2000: realisation of a CD-ROM for Siderfor S.r.l. (viale Unità d'Italia
105/107, 57025, Piombino -LI-) inside the Campus "Multimedia" project
• December 1999 - January 2000: Consultant to l'AGENZIA PER L'ALTA TECNOLOGIA
CESVIT S.p.A., Via G. del Pian dei Carpini, 28) for an ESPRIT project proposal (for Provincia di Ascoli Piceno)
• October 2000 – June 2001: Engineering Ingegneria Informatica (Via G. del Pian dei Carpini,
1, Firenze) for teaching, software and web applications development
• From February 2002: Contract teacher at the University of Florence for the Microprocessor systems course at the Faculty of Information Engineering
CONFERENCES:
• In the “technical committee” of International Conference on Software Maintenance ICSM2001
• In the “technical committee” of Web Delivering of Music WEDELMUSIC2001
• In the “technical committee” of Web Delivering of Music WEDELMUSIC2002
PUBLICATIONS:
International Journals:
• P. Bellini, F. Fioravanti, P. Nesi, M. B. Spinu, "Cooperative Visual Manipulation of
Music Notation" Submitted to ACM transaction on Human Computer Interaction.
Italian Journals:
• P. Bellini, I. Bruno, R. Della Santa, P.Nesi, M. Spinu, "Musica e Internet, Distribuzione e Fruizione", Scienza&Business nr. 1 -2/2001, (2001).
International Conferences:
• M. Monsignori, P. Nesi, M. B. Spinu, "A high capacity technique for watermarking music sheet while printing". In proc. of MMSP2001 (pp. 493-498), October 3 -5, 2001, Cannes (France)
• M. Monsignori, P. Nesi, M. B. Spinu, "Watermarking music sheet"
In proc. of PCM2001 (The Second IEEE Pacific-Rim Conference on Multimedia) (pp. 646-653) , October 24-26, 2001, Beijing (China)
• M. Monsignori, P. Nesi, M. B. Spinu, "Watermarking music sheet while printing"
In Proc. of WEDEL2001 (pp. 28-35), November, 23-24, 2001, Florence
• C.Busch, P.Nesi, M.Schmucker, M.B.Spinu “Evolution of MusicScore Watermarking
Algorithm”, Accepted to SPIE 2002, 20–25 January 2002, San Jose, California USA.
Italian Conferences:
• P. Bellini, I. Bruno, R. Della Santa, P.Nesi, M. Spinu, "Music Distribution and Protection", In proceedings of the Electronic Imaging & the Visual Arts, EVA2001 conference, March, 26-30, 2001, Florence (Italy)
• P. Bellini, I. Bruno, R. Della Santa, P.Nesi, M. Spinu "Safe distribution of multimedia musical objects". In proc. of AICA2001 congress (pg. 101-112), September, 19-21, 2001, Como (Italy)
• P. Bellini, I. Bruno, R. Della Santa, P. Nesi, M. B. Spinu "A Format for Modelling and
Managing Integrated Musical Objects". In proc. of AIIA2001 – 7th Conference of the Italian Association for Artificial Intelligence.