Segment-based reference chain

Round 1

A: Done.

A: My first image is of a woman in a green t-shirt taking something out of an oven at her home.

B: A man in a black shirt, standing in front of a grill in a backyard.

A: I have the man at the grill.

Round 2

B: I have the same 3

B: guy in front of a backyard bbq

A: I have the older woman making pizzas. Was that one?

B: yes, I have that

A: I don't have the bbq guy.

Round 3

A: I have the grill guy this time.

B: I don't.

Round 5

B: bbq guy

A: Nope.

B: green shirt woman