
A TikTok creator discovered how to use Gemini's Gem feature (Google's counterpart to ChatGPT's Custom GPTs) to automatically extract detailed image descriptions as structured JSON, taking much of the guesswork out of image recreation. The results are surprisingly precise, and the technique works with any visual content.
A 47-second TikTok video just revealed one of the smartest prompt engineering techniques I've seen in months. While most people struggle to describe images with enough detail for AI recreation, this creator found a way to make Gemini do all the heavy lifting automatically.
Describing images to AI is harder than it looks. Miss the lighting details, forget about the composition, or skip the color palette, and your recreation attempt falls flat. Most people write prompts like "a person in a kitchen with modern appliances" and wonder why their results look generic.
The breakthrough here isn't just about better descriptions—it's about structured data. By converting visual analysis into JSON format, you create a systematic, tweakable blueprint that captures details your eyes might miss.
The difference between eyeballing an image description and using structured analysis is like the difference between sketching from memory and working from technical blueprints.
Here's where Gemini's Gem feature becomes your secret weapon. Think of Gems as Google's answer to ChatGPT's Custom GPTs, but with some unique advantages for visual analysis.
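The setup process is deceptively simple: create a new Gem, give it a name such as "Vision to JSON", and paste a specialized prompt into its instructions.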
The magic happens in that specialized prompt. While the TikTok creator keeps the exact wording behind a "comment for access" gate, the concept is clear: this prompt instructs Gemini to analyze images and output detailed descriptions in JSON format rather than natural language.
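Since the creator's exact wording isn't public, here is only a hypothetical sketch of what instructions of this kind might look like, paired with a small Python check that a reply really is JSON with the expected sections. The instruction text, the key names, and the `parse_description` helper are all assumptions made for illustration, not the gated prompt or any Gemini API.

```python
import json

# Hypothetical Gem instructions; NOT the creator's gated prompt.
VISION_TO_JSON_INSTRUCTIONS = """\
You are an image analyst. For every image you receive, reply with a single
JSON object and nothing else. Use these top-level keys: subject, composition,
lighting, color_palette, style. Be specific and exhaustive inside each key.
"""

# Assumed schema: these key names are illustrative, not taken from the video.
REQUIRED_KEYS = {"subject", "composition", "lighting", "color_palette", "style"}

# A Markdown code fence, built indirectly so this example stays readable.
FENCE = "`" * 3


def parse_description(reply: str) -> dict:
    """Parse a model reply and confirm it has the assumed top-level keys."""
    text = reply.strip()
    # Chat models often wrap JSON in a code fence; strip it if present.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

A check like this is optional when you work entirely in the Gemini chat window, but it becomes useful if you copy the JSON into other tools and want to catch truncated or malformed output early.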
JSON (JavaScript Object Notation) isn't just a data format—it's a way of thinking systematically about visual elements. Instead of a paragraph describing an image, you get structured fields like:
This structured approach forces the AI to be comprehensive rather than impressionistic.
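To make the idea concrete, here is an illustrative sketch of how such a description might be organized, written as a Python dictionary and printed as JSON. Every field name and value below is an assumption chosen for the example; the output of a real Gem depends entirely on the instructions you give it.

```python
import json

# Illustrative only: field names and values are assumptions,
# not Gemini's actual output format.
image_description = {
    "subject": {
        "type": "person",
        "pose": "standing at counter",
        "clothing": "grey linen apron",
    },
    "setting": {
        "location": "kitchen",
        "cabinets": "white shaker",
        "appliances": "stainless steel",
    },
    "lighting": {
        "source": "window, camera left",
        "quality": "soft",
        "color_temperature": "warm",
    },
    "color_palette": ["white", "brushed steel", "warm oak"],
    "composition": {
        "framing": "medium shot",
        "angle": "eye level",
        "depth_of_field": "shallow",
    },
}

# Serialize with indentation so individual fields are easy to find and edit.
print(json.dumps(image_description, indent=2))
```

Notice that each visual concern (lighting, palette, framing) gets its own slot, so nothing can be silently skipped the way it can in a one-sentence prompt.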
Once your Vision to JSON Gem is configured, the workflow becomes surprisingly elegant:
Drop any image into your custom Gem without writing a single word of description. The Gem analyzes the image and spits out detailed JSON that captures elements you probably wouldn't have noticed:
Here's where the structured format pays dividends. Instead of rewriting entire prompts, you can chat with Gemini to modify specific JSON fields:
Gemini updates the JSON accordingly, maintaining all the other detailed specifications while making your targeted changes.
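In the Gem this edit happens through chat, but the underlying operation is easy to picture in code. The sketch below changes one nested field and leaves everything else untouched; the `set_field` helper and the sample field names are hypothetical and exist only to illustrate the idea.

```python
import copy
import json


def set_field(description: dict, path: str, value) -> dict:
    """Return a copy of `description` with one dotted-path field replaced."""
    updated = copy.deepcopy(description)  # never mutate the original blueprint
    node = updated
    *parents, leaf = path.split(".")
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(f"unknown field: {path}")
    node[leaf] = value
    return updated


# Hypothetical description with assumed field names.
original = {
    "lighting": {"quality": "soft", "color_temperature": "warm"},
    "composition": {"framing": "medium shot"},
}

# Targeted change: cooler light, everything else preserved.
edited = set_field(original, "lighting.color_temperature", "cool")
print(json.dumps(edited, indent=2))
```

Rejecting unknown paths is a deliberate choice here: a typo in a field name fails loudly instead of quietly adding a new field that the image model may ignore.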
The final step: Copy the refined JSON, open a fresh Gemini chat, paste it in, and select Nano Banana Pro (or your preferred image generation model).
The beauty of this system is that you're not starting from scratch each time—you're methodically adjusting a comprehensive blueprint.
Most image recreation attempts follow this painful pattern:
The JSON approach flips this workflow:
Traditional approach: "A modern kitchen with white cabinets and stainless steel appliances"
JSON approach: Structured data specifying cabinet style, hardware finishes, lighting temperature, countertop material, any visible appliance branding, approximate spatial layout, and compositional framing, all extracted automatically.
The difference in output quality isn't subtle.
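One rough way to see the gap between the two approaches is to count how many separate attributes each one pins down. The sketch below flattens a nested description into dotted paths; the `flatten` helper and the sample kitchen fields are assumptions for illustration, not output from Gemini.

```python
def flatten(description: dict, prefix: str = ""):
    """Yield (dotted_path, value) pairs for every leaf in a nested dict."""
    for key, value in description.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical structured version of the kitchen example.
kitchen = {
    "cabinets": {"style": "shaker", "color": "white", "hardware": "brushed nickel"},
    "countertop": {"material": "quartz"},
    "lighting": {"color_temperature": "warm"},
    "composition": {"framing": "wide shot"},
}

leaves = dict(flatten(kitchen))
for path, value in leaves.items():
    print(f"{path}: {value}")
```

Each printed line is one decision the one-sentence prompt leaves to chance, which is also what makes the JSON version easy to audit before you generate anything.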
This technique demonstrates three fundamental prompt engineering principles:
1. Structure beats creativity. Systematic approaches often outperform artistic intuition when working with AI.
2. Custom instructions amplify capabilities. The same Gemini model produces dramatically different results when given specialized instructions through the Gem feature.
3. Iterative refinement works better than perfect first attempts. Starting with comprehensive JSON and making targeted edits beats trying to write the perfect prompt initially.
Think of this technique as training wheels for advanced image prompting—it shows you what comprehensive image description actually looks like.
For L1 learners, this approach provides a masterclass in systematic thinking about visual elements. Even if you eventually move beyond JSON-based workflows, understanding this level of detail transforms how you approach all image-related AI tasks.
What started as a TikTok hack reveals something deeper about effective AI interaction: the best results often come from systematic approaches rather than creative guesswork. By using Gemini's Gem feature to convert visual analysis into structured JSON data, you're not just improving image recreation; you're learning to break an image down into the explicit attributes a model can act on. The technique works because it replaces human blind spots with comprehensive, tweakable specifications. Whether you're recreating marketing visuals, analyzing competitor designs, or just trying to understand what makes images work, this systematic approach usually beats intuitive description.