Broken Telephone

Text-to-picture and then picture-to-text… will the first and last images look the same?

Emmy Keogh


February 26, 2024

Ever heard of the game Broken Telephone?

Basically, it is the fraternal twin of the game Telephone. You start with a word or phrase, and as it passes from player to player it drifts further and further from the original. After learning how to code picture-to-text, I thought this would be a great test! How far from the original prompt can we get by generating a picture, turning that picture into text, feeding the text back in as a new prompt, and repeating? Well, let’s find out!

Prompt: Create a picture of a selfie of a dog wearing sunglasses, an upset cat, and a horse who is just happy to be there. Do not make it a drawing make it as realistic as possible.


Picture to text

content=“The image shows three animals, each with anthropomorphic expressions as if they are posing for a selfie. On the left is a cat with a somewhat grumpy expression. In the center is a dog wearing sunglasses, with its tongue out in a happy expression, reminiscent of a typical selfie smile. On the right is a horse, also smiling and showing teeth, as though it’s laughing or grinning. The styling of the image is highly detailed and realistic, and the animals are presented in a way that”

Next, I feed the picture-to-text response back in as the prompt for a new picture, to see how much it changes compared to the original.
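For anyone who wants to try this at home, here is a minimal sketch of the loop in Python. It is not necessarily the code behind the pictures in this post: `broken_telephone`, `generate_image`, and `describe_image` are hypothetical names, and the two model calls are passed in as functions so you can plug in whatever text-to-picture and picture-to-text services you have. The toy stand-ins at the bottom just pass strings around, so the loop runs without any API key.

```python
from typing import Callable, List, Tuple


def broken_telephone(
    prompt: str,
    generate_image: Callable[[str], bytes],   # hypothetical text-to-picture call
    describe_image: Callable[[bytes], str],   # hypothetical picture-to-text call
    rounds: int = 4,                          # assumed number of trips around the loop
) -> List[Tuple[str, bytes, str]]:
    """Run the text -> picture -> text loop and record every step.

    Each round turns the current prompt into an image, describes that
    image in words, and uses the description as the next prompt.
    """
    history = []
    current_prompt = prompt
    for _ in range(rounds):
        image = generate_image(current_prompt)  # text -> picture
        description = describe_image(image)     # picture -> text
        history.append((current_prompt, image, description))
        current_prompt = description            # feed the description back in
    return history


# Toy stand-ins so the sketch runs offline (not real models).
def fake_generate(prompt: str) -> bytes:
    # Pretend "image": the prompt itself, encoded as bytes.
    return prompt.encode("utf-8")


def fake_describe(image: bytes) -> str:
    # Pretend "caption": a lossy retelling that sneaks in an extra animal.
    return "A picture of " + image.decode("utf-8") + " and another dog"


steps = broken_telephone("a dog", fake_generate, fake_describe, rounds=2)
print(steps[-1][2])
# -> A picture of A picture of a dog and another dog and another dog
```

Even the toy version shows the effect from this experiment: every description adds or drops a little, and since each round only sees the previous description (never the original prompt), those small changes pile up.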


Picture to text

content=“This image depicts a trio of anthropomorphized animals posing as if they are taking a selfie. From left to right, there is a grumpy-looking cat holding a smartphone, a smiling dog in the center wearing sunglasses, and a horse showing its teeth in a wide grin on the right. The image is highly stylized and detailed, showcasing exaggerated human-like expressions on the animals’ faces, which suggests that it might be a digital artwork or a creative illustration, rather than real animals.”

This one lost the selfie element and some of its color saturation, but for the most part the description still fits.


Picture to text

content=“This image features a whimsical illustration of four animals as if they are posing for a selfie. From left to right, there’s a cat, a dog wearing sunglasses, another small enthusiastic dog, and a horse—all of which are displaying exaggerated, comical facial expressions. The dog with sunglasses appears to be holding the smartphone for the selfie. At the bottom of the image, there is a graphic depiction of a digital tablet or drawing pad with a stylus pen, indicating that this might be a digital.”

We’ve gained another animal! Now the cat is taking the selfie, and the image is clearly a digital drawing, so it has lost its realistic style.


Picture to text

content=‘This image shows a digital drawing on a graphics tablet. The drawing is of five animals with anthropomorphized features, making them appear as if they have human-like expressions. From left to right, there are two cats, two dogs, and a horse. The animals are characterized with features such as prominent eyes and expressive second from the left is wearing a pair of sunglasses, which adds a touch of humor to the image. The horse on the right is partially obscured by a human’

The picture has lost its original premise of a selfie of three animals. Now three animals are wearing sunglasses, and the horse is not as happy as it was at the start. At least the picture is back to a more realistic style.

Final Picture

Closing Remarks

We went from a cute selfie of a dog, a horse, and a cat to a drawing of a cat, three dogs, a horse, and a human. It seems the back-and-forth between prompts and descriptions confused the technology: each round only sees the previous description, so small changes keep piling up. This was a fun experiment, but I’d rather play this game with my friends!

Thanks for tuning in,
