Generating Coherent and Informative Descriptions for Groups of Visual Objects and Categories: A Simple Decoding Approach

Nazia Attari, David Schlangen, Heiko Wersing, Sina Zarriess, "Generating Coherent and Informative Descriptions for Groups of Visual Objects and Categories: A Simple Decoding Approach", INLG 2022 Proceedings, 2022.

Abstract

State-of-the-art image captioning models achieve very good performance in generating descriptions for instances of visual categories and reasoning about them, e.g. imposing distinctiveness of the description in the context of distractors. In this work, we propose an inference mechanism that extends an instance-level captioning model to generate coherent and informative descriptions for groups of visual objects from the same or different categories. We test our model in the domain of bird descriptions. We show that group-level descriptions generated by our method are (i) coherent, pulling together properties that are true for all or majority of its instances, and (ii) informative, as they allow an external BERT-based text classifier to identify the target category more accurately in comparison to single-instance captions and are preferred by human evaluators.



