Utterance-based reference chain

Round 1

A: A bunch of people with their backs facing the camera. One of them has an open pink umbrella

Round 2

B: a crowd of people, the top of a pink umbrella

Round 5

B: crowd with top of pink umbrella
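
The chain above can be read as an ordered list of (round, speaker, utterance) entries that all refer to the same image. A minimal sketch of that structure, assuming hypothetical names `Utterance` and `ReferenceChain` (neither comes from the source) and whitespace tokenization as a rough measure of length:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Utterance:
    round_no: int  # game round in which the utterance was produced
    speaker: str   # "A" or "B"
    text: str      # the referring utterance, verbatim


@dataclass
class ReferenceChain:
    # Hypothetical container: utterances about one target image, kept in round order.
    utterances: list = field(default_factory=list)

    def add(self, round_no, speaker, text):
        self.utterances.append(Utterance(round_no, speaker, text))
        self.utterances.sort(key=lambda u: u.round_no)

    def lengths(self):
        # Whitespace-separated token count per utterance, in round order.
        return [len(u.text.split()) for u in self.utterances]


chain = ReferenceChain()
chain.add(1, "A", "A bunch of people with their backs facing the camera. "
                  "One of them has an open pink umbrella")
chain.add(2, "B", "a crowd of people, the top of a pink umbrella")
chain.add(5, "B", "crowd with top of pink umbrella")
print(chain.lengths())  # [18, 10, 6]
```

Under this rough measure, the utterances get shorter from round to round, which matches the shortening visible in the chain above.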