Data

A Large-Scale Corpus for Conversation Disentanglement (Kummerfeld et al., 2019)

This post is about my own paper, which will appear at ACL later this month. What is interesting about this paper will depend on your research interests, so that’s how I’ve broken down this blog post. A few key points first: Data and code are available on GitHub. The paper is also available. The general-purpose span labeling and linking annotation tool we used is also appearing at ACL. Check out DSTC 8 Track 2, which is based on this work.

PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution (Chen et al., 2018)

The OntoNotes dataset, which is the focus of almost all coreference resolution research, had several compromises in its development (as is the case for any dataset). Some of these are discussed in…

Frames: a corpus for adding memory to goal-oriented dialogue systems (El Asri et al., 2017)

A new dialogue dataset that has annotations of multiple plans (frames) and dialogue acts that indicate modifications to them.