JSS-2006

Date: December 18, 2006
Speaker: Satanjeev Banerjee
Title: Segmenting Meetings into Agenda Items by Extracting Implicit Supervision from Human Note-Taking
Abstract: Splitting a meeting into segments such that each segment contains discussion of exactly one agenda item is useful for tasks such as retrieval and summarization of agenda item discussions. However, accurate topic segmentation of meetings is a difficult task. We are investigating the idea of acquiring implicit supervision from human meeting participants to solve the segmentation problem. Specifically, we have implemented and tested SmartNotes, a note-taking interface that gives value to users by helping them organize and retrieve their notes easily, but that also extracts a segmentation of the meeting from the participants' note-taking behavior. In this talk I will report on our experiments with this notes-based segmentation. I will show that the notes-based segmentation improves over an unsupervised baseline by 45% relative, and also compares favorably with a current state-of-the-art algorithm. This research has been conducted under the guidance of Alex Rudnicky. This talk is in partial fulfillment of my LTI speaking requirement, and is also a practice talk for the Intelligent User Interfaces conference.
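
The abstract leaves the segmentation mechanism at a high level; the following is a minimal sketch of one way timestamped note-taking activity could drive segmentation, where the gap threshold and the note format are assumptions for illustration (this is not the SmartNotes algorithm):

```python
# Minimal sketch of notes-based meeting segmentation (an illustration,
# not the SmartNotes implementation). Assumption: each note carries a
# timestamp in seconds, and a long lull in note-taking activity
# suggests a boundary between agenda items.

def segment_by_notes(note_times, gap_threshold=120.0):
    """Group note timestamps into agenda-item segments.

    note_times: sorted list of note timestamps (seconds).
    gap_threshold: minimum gap (seconds) between notes treated as an
        agenda-item boundary (hypothetical value).
    Returns a list of (start, end) segment spans.
    """
    if not note_times:
        return []
    segments = []
    start = prev = note_times[0]
    for t in note_times[1:]:
        if t - prev > gap_threshold:
            segments.append((start, prev))
            start = t
        prev = t
    segments.append((start, prev))
    return segments

print(segment_by_notes([10, 40, 70, 400, 430, 900]))
# -> [(10, 70), (400, 430), (900, 900)]
```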

Date: September 14, 2006
Speaker: Roger Hsiao
Title: Optimizing Components for Handheld Two-way Speech Translation for an English-Iraqi Arabic System

Date: September 14, 2006
Speaker: Wilson Tam
Title: Unsupervised Language Model Adaptation Using Latent Semantic Marginals

Date: September 14, 2006
Speaker: Antoine Raux
Title: Doing Research on a Deployed Spoken Dialogue System: One Year of Let's Go! Experience

Date: September 14, 2006
Speaker: John Kominek
Title: The Blizzard Challenge 2006 CMU Entry: Introducing Hybrid Trajectory Selection Synthesis

Date: September 8, 2006
Speaker: Stan (Szu-Chen) Jou
Title: Towards Continuous Speech Recognition Using Surface Electromyography

Date: September 8, 2006
Speaker: Brian Langner
Title: Generating Time-Constrained Audio Presentations of Structured Information

Date: September 8, 2006
Speaker: Paisarn Charoenpornsawat
Title: Example-Based Grapheme-to-Phoneme Conversion for Thai
Abstract: Several characteristics of the Thai writing system make Thai grapheme-to-phoneme (G2P) conversion very challenging. In this paper we propose an example-based grapheme-to-phoneme conversion approach. It generates the pronunciation of a word by selecting, modifying, and combining pronunciations of syllables from a training corpus. The best system achieves 80.99% word accuracy and 94.19% phone accuracy, which significantly outperform previous approaches for Thai.
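
The abstract describes the select-and-combine idea only at a high level; below is a toy sketch of example-based G2P, assuming a hypothetical syllable lexicon and using greedy longest-match selection in place of the paper's actual selection and modification steps:

```python
# Toy sketch of example-based G2P: segment a word into known syllable
# graphemes (greedy longest match) and concatenate their pronunciations.
# The lexicon entries here are invented placeholders, not real Thai data.

SYLLABLE_LEXICON = {
    "ka": ["k", "a"],
    "ta": ["t", "a"],
    "kata": ["k", "a", "t", "a"],
}

def g2p(word, lexicon=SYLLABLE_LEXICON):
    """Greedy longest-match example-based G2P (illustrative only)."""
    phones, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest chunk first
            chunk = word[i:j]
            if chunk in lexicon:
                phones.extend(lexicon[chunk])
                i = j
                break
        else:
            raise ValueError(f"no syllable example covers {word[i:]!r}")
    return phones

print(g2p("kata"))    # whole-word example: ['k', 'a', 't', 'a']
print(g2p("kataka"))  # combines examples: ['k', 'a', 't', 'a', 'k', 'a']
```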

Date: July 21, 2006
Speaker: Friedrich Faubel
Title: Particle Filters for the Suppression of Background Noises in Speech
Abstract: Particle filters, originally developed for tracking applications such as following airplanes on radar or persons in video images, are increasingly pervading other fields of engineering, including navigation, robotics, and (industrial) process control. Recently they have found their way into speech recognition, where they are used to track background noises that contaminate speech spectra. The noise is then compensated for by statistical inference. The talk will present the original approach by Raj, Singh, and Stern, some extensions, and some of the remaining problems.
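
As a rough illustration of the underlying technique, here is a minimal bootstrap particle filter tracking a scalar noise level, with random-walk dynamics and a Gaussian observation model assumed purely for the sketch (it is not the Raj/Singh/Stern compensation algorithm):

```python
# Minimal bootstrap particle filter tracking a slowly varying noise
# level in one spectral bin (a generic illustration of the technique).
import numpy as np

rng = np.random.default_rng(0)
N = 500                                # number of particles
particles = rng.normal(0.0, 1.0, N)   # initial noise-level hypotheses
weights = np.full(N, 1.0 / N)

def pf_step(observation, particles, weights, drift_std=0.1, obs_std=0.5):
    # Predict: propagate each particle with random-walk dynamics.
    particles = particles + rng.normal(0.0, drift_std, len(particles))
    # Update: reweight by the Gaussian likelihood of the observation.
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:
        idx = rng.choice(len(particles), len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

for obs in [0.2, 0.3, 0.25, 0.4]:      # fake noisy observations
    particles, weights = pf_step(obs, particles, weights)
print("estimated noise level:", np.average(particles, weights=weights))
```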

Date: April 28, 2006
Speaker: Ian R. Lane
Title: Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures
Abstract: Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework which incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via an (English/Japanese) speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error rate) of 11% and 8% for the English and Japanese sides, respectively. When negligible ASR errors (those that do not affect translation) were ignored, further improvement was achieved, realizing a reduction in CER of up to 15% compared to the GPP case.
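
The abstract does not say how the three scores are fused; a simple weighted linear combination, with made-up weights and an arbitrary acceptance threshold, is enough to illustrate the shape of the framework (the paper may well combine them differently):

```python
# Sketch of a joint utterance-verification score combining GPP with the
# two proposed high-level measures. The linear weighting and the 0.5
# acceptance threshold are assumptions for illustration only.

def joint_confidence(gpp, in_domain, discourse_coherence,
                     weights=(0.6, 0.25, 0.15)):
    """All inputs assumed normalized to [0, 1]."""
    w_gpp, w_dom, w_disc = weights
    return w_gpp * gpp + w_dom * in_domain + w_disc * discourse_coherence

def verify(scores, threshold=0.5):
    return "accept" if joint_confidence(*scores) >= threshold else "reject"

# High ASR posterior but off-domain and incoherent -> rejected.
print(verify((0.7, 0.1, 0.1)))   # score 0.46 -> reject
# Moderate posterior backed by domain match and coherence -> accepted.
print(verify((0.5, 0.9, 0.8)))   # score 0.645 -> accept
```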

Date: February 17, 2006
Speaker: Alan W Black
Title: Statistical Parametric Synthesis for Multilingual Speech Synthesis
Abstract: Unit selection synthesis has offered high-quality speech synthesis, merely at the cost of a large, well-labeled, appropriate speech database. As the desire for an easier method of building voices increases, alternative methods are being sought. HMM-generation synthesis, as typified by NITECH's HTS, has been shown to produce high-quality, acceptable speech output without the laborious hand correction of large databases. This talk presents the FestVox CLUSTERGEN trainer and synthesizer for automatically building statistical parametric synthesis voices. In an effort to generalize HTS in a language-independent way, we have more tightly coupled a parametric synthesizer build process into FestVox. The process is language independent and robust to less perfect and smaller databases, and the resulting synthesis quality is comparable to HTS. To investigate multilingual synthesis, where cross-language data is used to build a target-language synthesizer, the talk will report on a number of multilingual experiments using the CLUSTERGEN synthesizer and GlobalPhone multilingual data originally collected for speech recognition modeling. CLUSTERGEN is not seen as a replacement for HTS; it is not as elaborate as current work in HMM-generation synthesis, but it offers a tighter coupling with FestVox and a framework in which we can carry out future statistical parametric synthesis work. Limitations and intended future improvements will also be discussed.
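
As a rough sketch of the statistical parametric idea (clustering speech frames by linguistic context and predicting parameter means for unseen contexts), the following uses a generic regression tree over random placeholder data; it is an illustration of the general approach, not FestVox/CLUSTERGEN code:

```python
# Rough sketch of frame-level statistical parametric synthesis: a
# regression tree's leaves play the role of context clusters, each
# predicting the mean acoustic parameter vector of its training frames.
# Features and data here are random placeholders.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_frames, n_feats, n_params = 1000, 8, 25   # e.g. spectral-vector size

X_train = rng.integers(0, 4, (n_frames, n_feats))     # context features
y_train = rng.normal(0.0, 1.0, (n_frames, n_params))  # acoustic params

# Train: cluster frames by linguistic context via a regression tree.
tree = DecisionTreeRegressor(min_samples_leaf=20).fit(X_train, y_train)

# "Synthesis": map each target frame's context to a parameter vector;
# a real system would then smooth the trajectory and run a vocoder.
X_target = rng.integers(0, 4, (40, n_feats))
trajectory = tree.predict(X_target)
print(trajectory.shape)   # (40, 25) predicted parameter frames
```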

Date: January 25, 2006
Speakers: Mari Ostendorf, Jeremy G. Kahn, and Dustin Hillard
Title: Parsing Spontaneous Speech
Abstract: With recent advances in automatic speech recognition (ASR), there are increasing opportunities for natural language processing of speech, including applications such as speech understanding, summarization, and translation. Parsing can play an important role here, but much of current parsing technology has been developed on written text. Spontaneous speech differs substantially from written text, posing challenges for parsing that include the absence of punctuation and the presence of disfluencies and ASR errors. Prosodic cues can help fill this gap, and there is a long history of linguistic research indicating that prosodic cues in speech can provide disambiguating context beyond that available from punctuation. However, leveraging prosodic cues can be challenging because of the many roles prosody serves in speech communication. This talk looks at means of leveraging prosody combined with lexical cues and ASR uncertainty models to improve parsing (and recognition) of spontaneous speech. The talk will begin with an overview of studies of prosody and syntax, both perceptual and computational. The focus of the talk will be on our work with a state-of-the-art statistical parser, discussing the issues of sentence segmentation, disfluencies, sub-sentence prosodic constituents, and ASR uncertainty. Finally, we outline challenges in speech processing that impact parsing, including ASR confidence prediction, tighter integration of ASR and parsing, and portability to new domains.