Advanced reasoning & thinking control
Google Gemini 2.5+ models support advanced reasoning capabilities. Control how deeply the model thinks about your problem, allocate tokens for internal reasoning, and even see the model's thought process.
Reasoning control basics
Use the reasoning_effort parameter to control how much computational effort the model spends on reasoning. Higher effort means better results for complex problems, but uses more tokens.
"none": Fastest, no reasoning
"low": Minimal reasoning
"medium": Balanced (default)
"high": Maximum reasoning
Quick start examples
Basic reasoning control
Simple parameter to enable/disable reasoning
from openai import OpenAI

client = OpenAI(
    api_key="your-zaguan-api-key",
    base_url="https://api.zaguanai.com/v1",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "Solve this complex problem: ..."}
    ],
    extra_body={
        "reasoning_effort": "high"  # Use maximum reasoning
    },
)

print(response.choices[0].message.content)
Simplified thinking toggle
Use the thinking boolean for convenience
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Your prompt"}],
    extra_body={
        "thinking": False  # Disable reasoning for speed
    },
)
The thinking flag maps to reasoning_effort as follows:
- "thinking": true → "reasoning_effort": "medium"
- "thinking": false → "reasoning_effort": "none"
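The mapping above can be mirrored on the client, for example to log which effort level a request will actually use. This is a sketch of the documented mapping only, not Zaguán's implementation; expand_thinking_flag is a hypothetical name. The page does not say which value wins if both thinking and reasoning_effort are sent, so this sketch simply lets the thinking flag override.

```python
def expand_thinking_flag(extra_body: dict) -> dict:
    """Return a copy of extra_body with "thinking" rewritten as reasoning_effort.

    Mirrors the documented mapping: True -> "medium", False -> "none".
    If both keys are present, the thinking flag wins in this sketch
    (an assumption; the gateway's precedence is not documented here).
    """
    body = dict(extra_body)  # shallow copy keeps the caller's dict unchanged
    if "thinking" in body:
        body["reasoning_effort"] = "medium" if body.pop("thinking") else "none"
    return body


print(expand_thinking_flag({"thinking": False}))  # {'reasoning_effort': 'none'}
```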
Advanced: Thinking budgets
For fine-grained control, use thinking_config to allocate a specific token budget for internal reasoning and optionally see the model's thoughts.
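This page notes that reasoning_effort is removed automatically when thinking_budget is set, because the two cannot be combined. If you assemble extra_body from several sources, you could apply the same rule on the client so the payload you log matches what the gateway actually uses. A minimal sketch, where with_thinking_budget is a hypothetical helper; it keeps any other keys under "google" (such as safety_settings) intact.

```python
import copy


def with_thinking_budget(extra_body: dict, budget: int, include_thoughts: bool = False) -> dict:
    """Return a copy of extra_body with google.thinking_config set.

    Any top-level reasoning_effort is dropped, mirroring the documented
    rule that it cannot be used together with thinking_budget.
    """
    body = copy.deepcopy(extra_body)  # never mutate the caller's dict
    body.pop("reasoning_effort", None)
    google = body.setdefault("google", {})  # preserve e.g. safety_settings
    google["thinking_config"] = {
        "thinking_budget": budget,  # max tokens for internal reasoning
        "include_thoughts": include_thoughts,
    }
    return body


print(with_thinking_budget({"reasoning_effort": "high"}, 15000, include_thoughts=True))
```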
response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[
        {"role": "user", "content": "Design a distributed system architecture"}
    ],
    extra_body={
        "google": {
            "thinking_config": {
                "thinking_budget": 15000,  # Max tokens for reasoning
                "include_thoughts": True,  # Show reasoning process
            }
        }
    },
)
# Response includes both thoughts and final answer
print(response.choices[0].message.content)
Note: When you set thinking_budget, the reasoning_effort parameter is automatically removed, because the two cannot be used together.
Function calling
Gemini models support OpenAI-compatible function calling through the standard tools parameter. No special configuration is needed.
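Because the calls come back in the OpenAI format, tool_call.function.arguments is a JSON string, and results are returned to the model as messages with role "tool" that reference the call's id. The sketch below runs offline: get_weather and TOOLS are hypothetical stand-ins for your own functions, and SimpleNamespace imitates the SDK's tool-call objects so no API request is needed.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> dict:
    """Hypothetical local tool; replace with a real weather lookup."""
    return {"city": city, "forecast": "sunny"}


# Map tool names (as declared in the tools parameter) to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list:
    """Execute each tool call and build the "tool" messages to send back."""
    messages = []
    for call in tool_calls:
        func = TOOLS[call.function.name]
        # arguments is a JSON-encoded string, not a dict
        args = json.loads(call.function.arguments or "{}")
        result = func(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    return messages


# Stand-in for response.choices[0].message.tool_calls
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Tokyo"}'),
)
print(run_tool_calls([fake_call]))
```

To continue the conversation, append the assistant message that contained the tool calls, then these tool messages, to messages and call client.chat.completions.create again.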
Function calling example
Define tools using OpenAI's standard format
response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    tool_choice="auto",
)
import json

# Handle tool calls
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        # arguments is a JSON string; decode it before calling your function
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)  # get_weather is your own implementation
        # Continue the conversation with the result...
Safety settings
Gemini models include adjustable safety filters to control content blocking. Zaguán passes these settings through the extra_body parameter, giving you fine-grained control over what content is allowed.
Safety categories
Four adjustable content filter categories
HARM_CATEGORY_HARASSMENT: Negative or harmful comments targeting identity/protected attributes
HARM_CATEGORY_HATE_SPEECH: Content that is rude, disrespectful, or profane
HARM_CATEGORY_SEXUALLY_EXPLICIT: Contains references to sexual acts or content
HARM_CATEGORY_DANGEROUS_CONTENT: Promotes or enables harmful activities
Block thresholds
Control how aggressively content is filtered
BLOCK_NONE: Allow all content
BLOCK_ONLY_HIGH: Block high probability only
BLOCK_MEDIUM_AND_ABOVE: Block medium & high (default)
BLOCK_LOW_AND_ABOVE: Block low, medium & high
Using safety settings with Zaguán
Pass safety settings through extra_body
from openai import OpenAI

client = OpenAI(
    api_key="your-zaguan-api-key",
    base_url="https://api.zaguanai.com/v1",
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "user", "content": "Your prompt here"}
    ],
    extra_body={
        "google": {
            "safety_settings": [
                {
                    "category": "HARM_CATEGORY_HARASSMENT",
                    "threshold": "BLOCK_ONLY_HIGH",
                },
                {
                    "category": "HARM_CATEGORY_HATE_SPEECH",
                    "threshold": "BLOCK_ONLY_HIGH",
                },
                {
                    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                    "threshold": "BLOCK_MEDIUM_AND_ABOVE",
                },
            ]
        }
    },
)
print(response.choices[0].message.content)
Note: The default threshold is BLOCK_MEDIUM_AND_ABOVE for most categories. Only adjust safety settings if your use case consistently requires it. Built-in protections against core harms (like child safety) cannot be disabled.
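Writing the safety_settings list by hand makes it easy to mistype a category or threshold name. A small builder can turn a {category: threshold} mapping into the structure shown above and reject unknown names locally. This is a sketch with hypothetical names (safety_extra_body, CATEGORIES, THRESHOLDS); it deliberately accepts only the four categories and four thresholds listed on this page, so it may reject values that Gemini itself supports. Categories you leave out are simply not sent.

```python
# Category and threshold names as listed on this page.
CATEGORIES = {
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
}
THRESHOLDS = {
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
}


def safety_extra_body(overrides: dict) -> dict:
    """Build extra_body["google"]["safety_settings"] from {category: threshold}.

    Names are checked locally so a typo fails before the request is sent.
    """
    settings = []
    for category, threshold in overrides.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown safety category: {category}")
        if threshold not in THRESHOLDS:
            raise ValueError(f"unknown threshold: {threshold}")
        settings.append({"category": category, "threshold": threshold})
    return {"google": {"safety_settings": settings}}


print(safety_extra_body({"HARM_CATEGORY_HARASSMENT": "BLOCK_ONLY_HIGH"}))
```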
Supported models
Reasoning features work with these Gemini models:
google/gemini-2.5-pro: Best for complex reasoning tasks
google/gemini-2.5-flash: Faster, cost-effective reasoning
google/gemini-2.0-flash-thinking: Specialized thinking model
google/gemini-2.0-flash-exp: Experimental features
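If one code path serves several models, you might attach reasoning parameters only for the models listed above. The page does not say what happens when reasoning parameters are sent to other models, so skipping them is a conservative assumption, not documented gateway behavior. request_kwargs and REASONING_MODELS are hypothetical names.

```python
# Model IDs copied from the supported-models list on this page.
REASONING_MODELS = frozenset({
    "google/gemini-2.5-pro",
    "google/gemini-2.5-flash",
    "google/gemini-2.0-flash-thinking",
    "google/gemini-2.0-flash-exp",
})


def request_kwargs(model: str, prompt: str, effort: str = "medium") -> dict:
    """Build chat.completions.create kwargs, adding reasoning only for listed models."""
    kwargs = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if model in REASONING_MODELS:
        kwargs["extra_body"] = {"reasoning_effort": effort}
    return kwargs


# Usage: client.chat.completions.create(**request_kwargs("google/gemini-2.5-pro", "..."))
print(request_kwargs("google/gemini-2.5-pro", "Hello", "high"))
```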
Best practices
- Start with medium: Use reasoning_effort: "medium" as your baseline and adjust based on results.
- Use high for complex tasks: Math problems, code generation, and multi-step reasoning benefit from "high" effort.
- Disable for simple tasks: Use "none" for basic Q&A or when speed matters more than depth.
- Monitor token usage: Reasoning consumes additional tokens, so check the usage field in responses.
- Enable thoughts for debugging: Set include_thoughts: true to understand how the model arrived at its answer.
- Use default safety settings: Start with the default BLOCK_MEDIUM_AND_ABOVE threshold and only adjust if your use case requires it.
- Test safety configurations: If you relax safety settings, thoroughly test with edge cases to ensure appropriate content filtering for your users.
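To act on the token-usage advice, you can pull the counts from response.usage after each call. In the OpenAI Python SDK, reasoning tokens are reported under usage.completion_tokens_details.reasoning_tokens; whether Zaguán fills that field in for Gemini models is not stated on this page, so this sketch reports a missing value as None. summarize_usage is a hypothetical helper, and SimpleNamespace stands in for a real usage object.

```python
from types import SimpleNamespace


def summarize_usage(usage) -> dict:
    """Collect token counts from a chat completion's usage object.

    reasoning_tokens may be absent for some providers, so it falls back to None.
    """
    details = getattr(usage, "completion_tokens_details", None)
    return {
        "prompt": usage.prompt_tokens,
        "completion": usage.completion_tokens,
        "total": usage.total_tokens,
        "reasoning": getattr(details, "reasoning_tokens", None),
    }


# In real code: summarize_usage(response.usage)
example = SimpleNamespace(
    prompt_tokens=12,
    completion_tokens=40,
    total_tokens=52,
    completion_tokens_details=None,
)
print(summarize_usage(example))
```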