The Custom GPT Secret That Turns Any Image Into Perfect JSON Prompts

A 47-second TikTok video just revealed one of the smartest prompt engineering techniques I've seen in months. While most people struggle to describe images with enough detail for AI recreation, this creator found a way to make Gemini do all the heavy lifting automatically.

Why This Changes Everything

Describing images to AI is harder than it looks. Miss the lighting details, forget about the composition, or skip the color palette, and your recreation attempt falls flat. Most people write prompts like "a person in a kitchen with modern appliances" and wonder why their results look generic.

The breakthrough here isn't just about better descriptions—it's about structured data. By converting visual analysis into JSON format, you create a systematic, tweakable blueprint that captures details your eyes might miss.

The difference between eyeballing an image description and using structured analysis is like the difference between sketching from memory and working from technical blueprints.

The Custom GPT That Does the Work for You

Here's where Gemini's Gem feature becomes your secret weapon. Think of Gems as Google's answer to ChatGPT's Custom GPTs: a saved set of instructions that Gemini applies to every conversation you start with that Gem.

The setup process is deceptively simple:

1. Navigate to Gemini and select the Gem option
2. Create a new Gem with the title "Vision to JSON"
3. Add a description (anything works; this is just for your reference)
4. Input the specialized prompt that transforms image analysis into structured JSON

The magic happens in that specialized prompt. While the TikTok creator keeps the exact wording behind a "comment for access" gate, the concept is clear: this prompt instructs Gemini to analyze images and output detailed descriptions in JSON format rather than natural language.
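Since the creator's actual prompt isn't public, here is a rough sketch of what instructions in this spirit could look like. Everything in it is an assumption for illustration: the wording, the `FIELDS` list, and the `build_instructions` helper are hypothetical and are not the prompt from the video.

```python
# Hypothetical field list; the creator's real prompt is not public.
FIELDS = ["composition", "lighting", "colors", "objects", "style"]


def build_instructions(fields):
    """Return illustrative Gem instructions that ask for JSON-only output."""
    field_lines = "\n".join(f'- "{name}"' for name in fields)
    return (
        "You are an image analyst. For every image the user uploads, "
        "respond with a single valid JSON object and no other text.\n"
        "Include these top-level keys:\n"
        f"{field_lines}\n"
        "Describe only what is visible. Use null for anything you cannot determine."
    )


# Paste the printed text into the Gem's instructions box and adjust to taste.
print(build_instructions(FIELDS))
```

Keeping the field list separate from the wording makes it easy to add or drop sections later without rewriting the whole instruction.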

Why JSON Makes All the Difference

JSON (JavaScript Object Notation) isn't just a data format—it's a way of thinking systematically about visual elements. Instead of a paragraph describing an image, you get structured fields like:

Composition: Rule of thirds, focal points, framing
Lighting: Direction, intensity, color temperature
Colors: Dominant palette, accent colors, saturation levels
Objects: Precise positioning, scale relationships
Style: Artistic techniques, filters, post-processing effects

This structured approach forces the AI to be comprehensive rather than impressionistic.
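To make those fields concrete, here is a minimal sketch of what such a blueprint might look like, written as a Python dictionary and printed as JSON. The key names and values are illustrative assumptions, not the output of any particular Gem; your own Gem will likely name and nest things differently.

```python
import json

# Hypothetical blueprint: every key and value here is an illustrative assumption.
blueprint = {
    "composition": {
        "framing": "medium shot",
        "focal_point": "product centered on countertop",
        "rule_of_thirds": True,
    },
    "lighting": {
        "direction": "upper left",
        "intensity": "soft",
        "color_temperature_k": 5200,
    },
    "colors": {
        "dominant_palette": ["#F4F1EC", "#2F3A45"],
        "accent_colors": ["#C9743A"],
        "saturation": "muted",
    },
    "objects": [
        {"name": "product", "position": "center", "relative_scale": 0.30},
        {"name": "cutting board", "position": "lower right", "relative_scale": 0.15},
    ],
    "style": {
        "technique": "product photography",
        "post_processing": ["light film grain"],
    },
}

# Serialize to the JSON text you would actually paste into a chat.
print(json.dumps(blueprint, indent=2))
```

Each top-level key maps to one of the categories above, which is what makes individual details easy to find and change later.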

The Two-Step Recreation Process

Once your Vision to JSON Gem is configured, the workflow becomes surprisingly elegant:

Step 1: Extract the Blueprint

Drop any image into your custom Gem without writing a single word of description. The Gem analyzes the image and spits out detailed JSON that captures elements you probably wouldn't have noticed:

Subtle lighting gradients
Estimated color hex values
Spatial relationships between objects
Texture descriptions
Compositional techniques
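If you want to sanity-check what the Gem returns before building on it, a small script can help. The sketch below assumes the reply contains exactly one JSON object, possibly surrounded by extra chat text, and that you expect the five top-level keys shown. Both are assumptions, and `parse_blueprint` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

# Assumed top-level keys; adjust to whatever your own Gem is told to produce.
REQUIRED_KEYS = {"composition", "lighting", "colors", "objects", "style"}


def parse_blueprint(reply_text):
    """Extract the JSON object from a model reply and check its top-level keys."""
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    blueprint = json.loads(reply_text[start:end + 1])
    missing = REQUIRED_KEYS - set(blueprint)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return blueprint


# Example reply with chat text around the JSON (made-up content).
reply = (
    "Here is the analysis: "
    '{"composition": {}, "lighting": {"direction": "upper left"}, '
    '"colors": {}, "objects": [], "style": {}} '
    "Let me know if you want changes."
)
print(parse_blueprint(reply)["lighting"])
```

A check like this catches truncated or incomplete replies early, before a missing section quietly degrades the generated image.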

Step 2: Refine and Execute

Here's where the structured format pays dividends. Instead of rewriting entire prompts, you can chat with Gemini to modify specific JSON fields:

"Make the product 20% larger"
"Change the background to a modern kitchen"
"Adjust the lighting to golden hour"

Gemini updates the JSON accordingly, maintaining all the other detailed specifications while making your targeted changes.
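The same idea can be sketched by hand: change one field and leave everything else untouched. This is not how Gemini edits the JSON internally. It is only a small illustration of why targeted edits are safe with structured data, and the `set_field` helper, the dotted-path convention, and the sample keys are all hypothetical.

```python
import copy


def set_field(blueprint, path, value):
    """Return a copy of blueprint with the field at dotted `path` replaced."""
    updated = copy.deepcopy(blueprint)  # never mutate the original blueprint
    node = updated
    *parents, leaf = path.split(".")
    for key in parents:
        node = node[key]
    node[leaf] = value
    return updated


# Made-up blueprint fragment for illustration.
original = {
    "lighting": {"direction": "upper left", "color_temperature_k": 5200},
    "background": {"setting": "studio backdrop"},
    "objects": {"product": {"relative_scale": 0.30}},
}

# "Adjust the lighting to golden hour" (a warmer color temperature, assumed value).
edited = set_field(original, "lighting.color_temperature_k", 3500)
# "Change the background to a modern kitchen"
edited = set_field(edited, "background.setting", "modern kitchen")
# "Make the product 20% larger"
new_scale = round(original["objects"]["product"]["relative_scale"] * 1.2, 2)
edited = set_field(edited, "objects.product.relative_scale", new_scale)

print(edited)
```

Only the three targeted fields change. The lighting direction and every other value carry over exactly, and the original blueprint stays available if you want to branch in a different direction.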

The final step: Copy the refined JSON, open a fresh Gemini chat, paste it in, and select Nano Banana Pro (or your preferred image generation model).

The beauty of this system is that you're not starting from scratch each time—you're methodically adjusting a comprehensive blueprint.

Why This Beats Traditional Prompt Engineering

Most image recreation attempts follow this painful pattern:

1. Look at target image
2. Write description based on obvious elements
3. Generate image
4. Realize you missed crucial details
5. Rewrite prompt from scratch
6. Repeat until frustrated

The JSON approach flips this workflow:

1. Let AI extract ALL details systematically
2. Review comprehensive specification
3. Make targeted adjustments to specific elements
4. Generate with confidence

Traditional approach: "A modern kitchen with white cabinets and stainless steel appliances"

JSON approach: Structured data specifying cabinet style, hardware finishes, lighting temperature, countertop material, any visible appliance branding, estimated spatial proportions, and compositional framing, all extracted automatically.

The difference in output quality isn't subtle.

The Broader Implications for L1 Learners

This technique demonstrates three fundamental prompt engineering principles:

1. Structure beats creativity. Systematic approaches often outperform artistic intuition when working with AI.

2. Custom instructions amplify capabilities. The same Gemini model produces dramatically different results when given specialized instructions through the Gem feature.

3. Iterative refinement works better than perfect first attempts. Starting with comprehensive JSON and making targeted edits beats trying to write the perfect prompt initially.

Think of this technique as training wheels for advanced image prompting—it shows you what comprehensive image description actually looks like.

For L1 learners, this approach provides a masterclass in systematic thinking about visual elements. Even if you eventually move beyond JSON-based workflows, understanding this level of detail transforms how you approach all image-related AI tasks.

The Bottom Line

What started as a TikTok hack reveals something deeper about effective AI interaction: the best results often come from systematic approaches rather than creative guesswork. By using Gemini's Gem feature to convert visual analysis into structured JSON data, you're not just improving image recreation. You're learning to describe visual elements the way an image model needs them described. The technique works because it replaces human blind spots with a comprehensive, tweakable specification. Whether you're recreating marketing visuals, analyzing competitor designs, or just trying to understand what makes images work, this systematic approach tends to beat intuitive description.

