Generating Descriptions of Visual Objects

What do people describe when they look at objects? Can we model what they say? (Why does this matter?) This talk will characterize what makes up a visual description and define some of the methods needed to generate such language automatically. Taking this a step further, I will describe an end-to-end prototype system that reads in computer vision output and generates natural language descriptions. Time permitting, I will argue that improving visual descriptions can also improve computer vision, and that work on the interaction between the two may lead to advances in both computer vision and natural language generation.

My prototype vision-to-language system, largely developed during the 2011 Hopkins summer workshop in collaboration with vision researchers at Stony Brook and language researchers at the University of Maryland, is available at: http://recognition.cs.stonybrook.edu:8080/~mitchema/midge/

Meg Mitchell graduated from Reed College with a Bachelor's in Linguistics in 2005. Since then, she has worked at the Center for Spoken Language Understanding at Oregon Health and Science University, helping to diagnose neurological disorders automatically by analyzing the syntactic and phonetic characteristics of spoken language. She has also received a Master's in Computational Linguistics from the University of Washington and is working towards a Ph.D. in Computing Science at the University of Aberdeen.

