Battlecat AI — Built on the AI Maturity Framework

The Custom GPT Secret That Turns Any Image Into Perfect JSON Prompts
L1 Instructor | Practice | intermediate | 5 min read

A TikTok creator showed how to use Gemini's Gem feature (Google's counterpart to ChatGPT's Custom GPTs) to extract detailed image descriptions as structured JSON, taking much of the guesswork out of image recreation. The results are surprisingly precise, and the technique applies to any kind of image.

custom instructions, image analysis, prompt engineering, structured prompts, Gemini, Nano Banana Pro

A 47-second TikTok video just revealed one of the smartest prompt engineering techniques I've seen in months. While most people struggle to describe images with enough detail for AI recreation, this creator found a way to make Gemini do all the heavy lifting automatically.

Why This Changes Everything

Describing images to AI is harder than it looks. Miss the lighting details, forget about the composition, or skip the color palette, and your recreation attempt falls flat. Most people write prompts like "a person in a kitchen with modern appliances" and wonder why their results look generic.

The breakthrough here isn't just about better descriptions—it's about structured data. By converting visual analysis into JSON format, you create a systematic, tweakable blueprint that captures details your eyes might miss.

The difference between eyeballing an image description and using structured analysis is like the difference between sketching from memory and working from technical blueprints.


The Custom GPT That Does the Work for You

Here's where Gemini's Gem feature becomes your secret weapon. Think of Gems as Google's answer to ChatGPT's Custom GPTs: a saved set of custom instructions that Gemini applies to every chat you start with that Gem.

The setup process is deceptively simple:

  1. Navigate to Gemini and select the Gem option
  2. Create a new Gem with the title "Vision to JSON"
  3. Add a description (anything works—this is just for your reference)
  4. Input the specialized prompt that transforms image analysis into structured JSON

The magic happens in that specialized prompt. While the TikTok creator keeps the exact wording behind a "comment for access" gate, the concept is clear: this prompt instructs Gemini to analyze images and output detailed descriptions in JSON format rather than natural language.
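Since the creator's prompt isn't public, here is a rough sketch of what instructions in that spirit could look like. Everything in it is an assumption for illustration: the wording, the `build_instructions` helper, and the field names are hypothetical, not the creator's gated prompt.

```python
# Hypothetical Gem instructions. This is NOT the creator's gated prompt,
# only an illustration of the idea: ask for JSON instead of prose.
FIELDS = ["composition", "lighting", "colors", "objects", "style"]


def build_instructions(fields):
    """Return instruction text asking for a JSON-only image description."""
    field_list = ", ".join(f'"{name}"' for name in fields)
    return (
        "You are an image analyst. For every image the user uploads, "
        "reply with a single valid JSON object and no other text. "
        f"Use exactly these top-level keys: {field_list}. "
        "Describe only what is visible; use null for anything you cannot determine."
    )


instructions = build_instructions(FIELDS)
print(instructions)
```

You would paste the printed text into the Gem's instructions box and adjust the key list to whatever you care about most.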

Why JSON Makes All the Difference

JSON (JavaScript Object Notation) isn't just a data format—it's a way of thinking systematically about visual elements. Instead of a paragraph describing an image, you get structured fields like:

  • Composition: Rule of thirds, focal points, framing
  • Lighting: Direction, intensity, color temperature
  • Colors: Dominant palette, accent colors, saturation levels
  • Objects: Precise positioning, scale relationships
  • Style: Artistic techniques, filters, post-processing effects

This structured approach forces the AI to be comprehensive rather than impressionistic.
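To make those fields concrete, here is a small made-up example of what such a description might look like, written as a Python dictionary and serialized to JSON. The key names and values are illustrative assumptions; a real Gem returns whatever structure its instructions ask for.

```python
import json

# Illustrative values only; not output from a real Gem.
blueprint = {
    "composition": {"framing": "medium shot", "focal_point": "center-left"},
    "lighting": {"direction": "left", "intensity": "soft", "color_temperature": "warm"},
    "colors": {"dominant": ["#F5F0E8", "#2B2B2B"], "accent": ["#C0392B"], "saturation": "muted"},
    "objects": [{"name": "coffee mug", "position": "foreground right", "scale": "small"}],
    "style": {"technique": "product photography", "post_processing": "light vignette"},
}

# indent=2 keeps the JSON readable when you paste it into a chat.
blueprint_json = json.dumps(blueprint, indent=2)
print(blueprint_json)
```

Each visual concern lives under its own key, which is what makes later tweaks so easy to target.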


The Two-Step Recreation Process

Once your Vision to JSON Gem is configured, the workflow becomes surprisingly elegant:

Step 1: Extract the Blueprint

Drop any image into your custom Gem without writing a single word of description. The Gem analyzes the image and spits out detailed JSON that captures elements you probably wouldn't have noticed:

  • Subtle lighting gradients
  • Estimated color hex values
  • Spatial relationships between objects
  • Texture descriptions
  • Compositional techniques
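If you want to sanity-check the Gem's reply before reusing it, a few lines of Python can confirm it is valid JSON. This sketch assumes the reply may arrive wrapped in a Markdown code fence, which chat models often add; `extract_json` is a hypothetical helper, not part of Gemini.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(reply: str) -> dict:
    """Parse JSON from a chat reply, with or without a surrounding code fence."""
    match = FENCED_BLOCK.search(reply)
    payload = match.group(1) if match else reply
    # json.loads raises json.JSONDecodeError if the payload is not valid JSON.
    return json.loads(payload.strip())


sample_reply = FENCE + 'json\n{"lighting": {"direction": "left"}}\n' + FENCE
parsed = extract_json(sample_reply)
print(parsed["lighting"]["direction"])
```

If parsing fails, that is a signal to ask the Gem to resend the output as JSON only.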

Step 2: Refine and Execute

Here's where the structured format pays dividends. Instead of rewriting entire prompts, you can chat with Gemini to modify specific JSON fields:

  • "Make the product 20% larger"
  • "Change the background to a modern kitchen"
  • "Adjust the lighting to golden hour"

Gemini updates the JSON accordingly, maintaining all the other detailed specifications while making your targeted changes.
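Under the hood, a targeted change like this amounts to updating one field and leaving the rest alone. In the workflow above Gemini does it for you in chat; the sketch below shows the same idea by hand, using a hypothetical `set_field` helper and made-up field names.

```python
import copy


def set_field(blueprint: dict, path: str, value):
    """Return a copy of blueprint with the dotted path set to value."""
    updated = copy.deepcopy(blueprint)  # leave the original untouched
    *parents, leaf = path.split(".")
    node = updated
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = value
    return updated


original = {
    "lighting": {"direction": "left", "color_temperature": "neutral"},
    "background": "plain studio backdrop",
}
edited = set_field(original, "lighting.color_temperature", "golden hour")
edited = set_field(edited, "background", "modern kitchen")
print(edited)
```

Only the two named fields change; the lighting direction carries over exactly as extracted.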

The final step: Copy the refined JSON, open a fresh Gemini chat, paste it in, and select Nano Banana Pro (or your preferred image generation model).

The beauty of this system is that you're not starting from scratch each time—you're methodically adjusting a comprehensive blueprint.


Why This Beats Traditional Prompt Engineering

Most image recreation attempts follow this painful pattern:

  1. Look at target image
  2. Write description based on obvious elements
  3. Generate image
  4. Realize you missed crucial details
  5. Rewrite prompt from scratch
  6. Repeat until frustrated

The JSON approach flips this workflow:

  1. Let AI extract ALL details systematically
  2. Review comprehensive specification
  3. Make targeted adjustments to specific elements
  4. Generate with confidence
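The review step can be partly mechanical. As a minimal sketch, assuming you expect the five top-level fields listed earlier, a hypothetical `missing_fields` check flags anything the extraction left out or left empty before you generate.

```python
REQUIRED = ("composition", "lighting", "colors", "objects", "style")


def missing_fields(blueprint: dict, required=REQUIRED) -> list:
    """List required top-level keys that are absent or empty."""
    return [key for key in required if not blueprint.get(key)]


draft = {
    "composition": {"framing": "wide"},
    "lighting": {},  # present but empty, so it counts as missing
    "colors": {"dominant": ["#FFFFFF"]},
}
gaps = missing_fields(draft)
print(gaps)  # ['lighting', 'objects', 'style']
```

Any key that shows up here is worth asking the Gem about before you move on to generation.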

Traditional approach: "A modern kitchen with white cabinets and stainless steel appliances"

JSON approach: Structured data specifying cabinet style, hardware finishes, lighting temperature, countertop material, any visible appliance branding, estimated spatial proportions, and compositional framing, all extracted automatically.

The difference in output quality isn't subtle.


The Broader Implications for L1 Learners

This technique demonstrates three fundamental prompt engineering principles:

1. Structure beats creativity. Systematic approaches often outperform artistic intuition when working with AI.

2. Custom instructions amplify capabilities. The same Gemini model produces dramatically different results when given specialized instructions through the Gem feature.

3. Iterative refinement works better than perfect first attempts. Starting with comprehensive JSON and making targeted edits beats trying to write the perfect prompt initially.

Think of this technique as training wheels for advanced image prompting—it shows you what comprehensive image description actually looks like.

For L1 learners, this approach provides a masterclass in systematic thinking about visual elements. Even if you eventually move beyond JSON-based workflows, understanding this level of detail transforms how you approach all image-related AI tasks.


The Bottom Line

What started as a TikTok hack reveals something deeper about effective AI interaction: the best results often come from systematic approaches rather than creative guesswork. By using Gemini's Gem feature to convert visual analysis into structured JSON, you're not just improving image recreation. You're learning to describe visual elements the way an image model needs them described. The technique works because it trades human blind spots for a comprehensive, tweakable specification. Whether you're recreating marketing visuals, analyzing competitor designs, or just trying to understand what makes images work, this systematic approach will usually beat an intuitive description.

Try This Now

  1. Create a "Vision to JSON" Gem in Google Gemini with specialized image analysis instructions
  2. Test the workflow by uploading an image to your custom Gem and analyzing the JSON output structure
  3. Practice iterative refinement by chatting with Gemini to modify specific JSON elements before final generation
  4. Compare results between traditional text descriptions and JSON-based prompts using the same source image

Sources (1)

  • https://www.tiktok.com/t/ZP8mNFmpu