Google Gemini

Advanced reasoning & thinking control

Google Gemini 2.5+ models support advanced reasoning capabilities. Control how deeply the model thinks about your problem, allocate tokens for internal reasoning, and even see the model's thought process.

Reasoning control basics

Use the reasoning_effort parameter to control how much computational effort the model spends on reasoning. Higher effort typically produces better results on complex problems, but consumes more tokens.

"none"

Fastest, no reasoning

"low"

Minimal reasoning

"medium"

Balanced (default)

"high"

Maximum reasoning

Quick start examples

Basic reasoning control

Simple parameter to enable/disable reasoning

from openai import OpenAI

client = OpenAI(
    api_key="your-zaguan-api-key",
    base_url="https://api.zaguanai.com/v1",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "Solve this complex problem: ..."}
    ],
    extra_body={
        "reasoning_effort": "high"  # Use maximum reasoning
    }
)

print(response.choices[0].message.content)

Simplified thinking toggle

Use the thinking boolean for convenience

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Your prompt"}],
    extra_body={
        "thinking": False  # Disable reasoning for speed
    }
)
Translation:
  • "thinking": true"reasoning_effort": "medium"
  • "thinking": false"reasoning_effort": "none"

Advanced: Thinking budgets

For fine-grained control, use thinking_config to allocate a specific token budget for internal reasoning and optionally see the model's thoughts.

response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "Design a distributed system architecture"}
    ],
    extra_body={
        "google": {
            "thinking_config": {
                "thinking_budget": 15000,      # Max tokens for reasoning
                "include_thoughts": True       # Show reasoning process
            }
        }
    }
)

# Response includes both thoughts and final answer
print(response.choices[0].message.content)
⚠️ Important: When using thinking_budget, the reasoning_effort parameter is automatically removed as they cannot be used together.
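
To see how much of the budget was actually spent, look at the usage block on the response. Exactly which accounting fields come back depends on what the gateway forwards; the sketch below assumes an OpenAI-style completion_tokens_details.reasoning_tokens breakdown and degrades gracefully if it is missing.

# Token accounting for the request above
usage = response.usage
print("prompt tokens:", usage.prompt_tokens)
print("completion tokens:", usage.completion_tokens)

# A separate reasoning-token count may be reported if the gateway forwards it
details = getattr(usage, "completion_tokens_details", None)
reasoning_tokens = getattr(details, "reasoning_tokens", None) if details else None
if reasoning_tokens is not None:
    print("reasoning tokens:", reasoning_tokens)
else:
    print("reasoning token breakdown not reported")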

Function calling

Gemini models support OpenAI-compatible function calling through the standard tools parameter. No special configuration needed—it just works.

Function calling example

Define tools using OpenAI's standard format

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"]
                }
            }
        }
    ],
    tool_choice="auto"
)

import json

# Handle tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        # Arguments arrive as a JSON string; parse them before calling your function
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)  # Execute your function here
        # Continue the conversation with the result (see the sketch below)
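
To complete the exchange, send each tool result back using the standard OpenAI tool-message format: the assistant message that requested the tool call, followed by a tool role message carrying your function's output, then a second request. This is a minimal sketch; get_weather here is a hypothetical local function standing in for your real implementation.

# Hypothetical stand-in for your real weather lookup
def get_weather(city: str) -> str:
    return f"Sunny, 24°C in {city}"

messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    response.choices[0].message,  # assistant turn that requested the tool call
]

for tool_call in response.choices[0].message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": get_weather(**args),
    })

# A second request lets the model turn the tool output into a natural-language answer
followup = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=messages,
)
print(followup.choices[0].message.content)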

Safety settings

Gemini models include adjustable safety filters to control content blocking. Zaguán passes these settings through the extra_body parameter, giving you fine-grained control over what content is allowed.

Safety categories

Four adjustable content filter categories

HARM_CATEGORY_HARASSMENT

Negative or harmful comments targeting identity/protected attributes

HARM_CATEGORY_HATE_SPEECH

Content that is rude, disrespectful, or profane

HARM_CATEGORY_SEXUALLY_EXPLICIT

Contains references to sexual acts or other sexually explicit content

HARM_CATEGORY_DANGEROUS_CONTENT

Promotes or enables harmful activities

Block thresholds

Control how aggressively content is filtered

  • BLOCK_NONE: Allow all content
  • BLOCK_ONLY_HIGH: Block high probability only
  • BLOCK_MEDIUM_AND_ABOVE: Block medium & high (default)
  • BLOCK_LOW_AND_ABOVE: Block low, medium & high

Using safety settings with Zaguán

Pass safety settings through extra_body

from openai import OpenAI

client = OpenAI(
    api_key="your-zaguan-api-key",
    base_url="https://api.zaguanai.com/v1",
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "user", "content": "Your prompt here"}
    ],
    extra_body={
        "google": {
            "safety_settings": [
                {
                    "category": "HARM_CATEGORY_HARASSMENT",
                    "threshold": "BLOCK_ONLY_HIGH"
                },
                {
                    "category": "HARM_CATEGORY_HATE_SPEECH",
                    "threshold": "BLOCK_ONLY_HIGH"
                },
                {
                    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                    "threshold": "BLOCK_MEDIUM_AND_ABOVE"
                }
            ]
        }
    }
)

print(response.choices[0].message.content)
⚠️ Important: The default threshold is BLOCK_MEDIUM_AND_ABOVE for most categories. Only adjust safety settings if your use case consistently requires it. Built-in protections against core harms (like child safety) cannot be disabled.
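
If a request does trip a filter, the blocked result surfaces through the normal OpenAI response fields. The sketch below checks finish_reason and an empty message body; whether Zaguán reports blocks specifically as "content_filter" is an assumption, so verify against the responses you actually receive.

choice = response.choices[0]

# A safety block typically appears as a content_filter finish reason
# and/or an empty message body (assumed mapping; confirm with real responses)
if choice.finish_reason == "content_filter" or not choice.message.content:
    print("Request was blocked by safety filters; consider rephrasing the prompt.")
else:
    print(choice.message.content)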

Supported models

Reasoning features work with these Gemini models:

google/gemini-2.5-pro

Best for complex reasoning tasks

google/gemini-2.5-flash

Faster, cost-effective reasoning

google/gemini-2.0-flash-thinking

Specialized thinking model

google/gemini-2.0-flash-exp

Experimental features

Best practices

  • Start with medium: Use reasoning_effort: "medium" as your baseline and adjust based on results
  • Use high for complex tasks: Math problems, code generation, and multi-step reasoning benefit from "high" effort
  • Disable for simple tasks: Use "none" for basic Q&A or when speed matters more than depth
  • Monitor token usage: Reasoning consumes additional tokens—check the usage field in responses (see the usage sketch below)
  • Enable thoughts for debugging: Set include_thoughts: true to understand how the model arrived at its answer
  • Use default safety settings: Start with the default BLOCK_MEDIUM_AND_ABOVE threshold and only adjust if your use case requires it
  • Test safety configurations: If you relax safety settings, thoroughly test with edge cases to ensure appropriate content filtering for your users
💡 Tip for solo devs: Start without reasoning features to establish a baseline. Add reasoning only when you notice the model struggling with complex tasks. This keeps costs predictable while you're building.
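
One way to act on these tips is to run the same prompt with reasoning disabled and at high effort, then compare cost and latency side by side. The sketch below only uses parameters already shown on this page; the prompt and model choice are placeholders.

import time

prompt = "Summarize the trade-offs between SQL and NoSQL databases."

for effort in ("none", "high"):
    start = time.time()
    resp = client.chat.completions.create(
        model="google/gemini-2.5-flash",
        messages=[{"role": "user", "content": prompt}],
        extra_body={"reasoning_effort": effort},
    )
    elapsed = time.time() - start
    print(f"effort={effort}: {resp.usage.total_tokens} total tokens, {elapsed:.1f}s")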