Segment-based reference chain

Round 1

A: Do you have a girl holding a pizza in a kitchen?

B: no

Round 3

B: I don't have them

B: I have a woman with open mouth holding a large pizza at chin height

A: I don't have that one.

Round 4

B: I have the kid in red on the bench

A: I do not

A: I have the woman with an open mouth holding the large pizza.

Round 5

A: I have the open-mouthed woman holding the pizza again.

B: my last is the woman holding the large pizza by her chin - mouth open

B: yes