[Preview] v1.79.1-stable - FAL AI Support

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.79.1-stable

Key Highlights

  • Container API Support - End-to-end OpenAI Container API support with proxy integration, logging, and cost tracking
  • FAL AI Image Generation - Native support for FAL AI image generation models with cost tracking
  • UI Enhancements - Guardrail Playground, Cache Settings, Tag Routing, SSO Settings
  • Batch API Rate Limiting - Input-based rate limits support for Batch API requests
  • Vector Store Expansion - Milvus vector store support and Azure AI virtual indexes
  • Memory Leak Fixes - Resolved issues accounting for 90% of memory leaks on Python SDK & AI Gateway

Dependency Upgrades

  • Dependencies
    • Build(deps): bump starlette from 0.47.2 to 0.49.1 - PR #16027
    • Build(deps): bump fastapi from 0.116.1 to 0.120.1 - PR #16054
    • Build(deps): bump hono from 4.9.7 to 4.10.3 in /litellm-js/spend-logs - PR #15915

New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|----------|-------|----------------|---------------------|----------------------|----------|
| Mistral | mistral/codestral-embed | 8K | $0.15 | - | Embeddings |
| Mistral | mistral/codestral-embed-2505 | 8K | $0.15 | - | Embeddings |
| Gemini | gemini/gemini-embedding-001 | 2K | $0.15 | - | Embeddings |
| FAL AI | fal_ai/fal-ai/flux-pro/v1.1-ultra | - | - | - | Image generation - $0.0398/image |
| FAL AI | fal_ai/fal-ai/imagen4/preview | - | - | - | Image generation - $0.0398/image |
| FAL AI | fal_ai/fal-ai/recraft/v3/text-to-image | - | - | - | Image generation - $0.0398/image |
| FAL AI | fal_ai/fal-ai/stable-diffusion-v35-medium | - | - | - | Image generation - $0.0398/image |
| FAL AI | fal_ai/bria/text-to-image/3.2 | - | - | - | Image generation - $0.0398/image |
| OpenAI | openai/sora-2-pro | - | - | - | Video generation - $0.30/video/second |

Features

  • Anthropic

    • Extended Claude 3-7 Sonnet deprecation date from 2026-02-01 to 2026-02-19 - PR #15976
    • Extended Claude Opus 4-0 deprecation date from 2025-03-01 to 2026-05-01 - PR #15976
    • Removed Claude Haiku 3-5 deprecation date (previously 2025-03-01) - PR #15976
    • Added Claude Opus 4-1, Claude Opus 4-0 20250513, Claude Sonnet 4 20250514 deprecation dates - PR #15976
    • Added web search support for Claude Opus 4-1 - PR #15976
  • Bedrock

    • Fix empty assistant message handling in AWS Bedrock Converse API to prevent 400 Bad Request errors - PR #15850
    • Allow using ARNs when generating images via Bedrock - PR #15789
    • Add per model group header forwarding for Bedrock Invoke API - PR #16042
    • Preserve Bedrock inference profile IDs in health checks - PR #15947
    • Added fallback logic for detecting file content-type when S3 returns generic type - When using Bedrock with S3-hosted files, if the S3 object's Content-Type is not correctly set (e.g., binary/octet-stream instead of image/png), Bedrock can now handle it correctly - PR #15635
  • Azure

    • Add deprecation dates for Azure OpenAI models (gpt-4o-2024-08-06, gpt-4o-2024-11-20, gpt-4.1 series, o3-2025-04-16, text-embedding-3-small) - PR #15976
    • Fix Azure OpenAI ContextWindowExceededError mapping from Azure errors - PR #15981
    • Add handling for v1 under Azure API versions - PR #15984
    • Fix Azure not accepting the extra_body parameter - PR #16116
  • OpenAI

    • Add deprecation dates for gpt-3.5-turbo-1106, gpt-4-0125-preview, gpt-4-1106-preview, o1-mini-2024-09-12 - PR #15976
    • Add extended Sora-2 modality support (text + image inputs) - PR #15976
    • Updated OpenAI Sora-2-Pro pricing to $0.30/video/second - PR #15976
  • OpenRouter

    • Add Claude Haiku 4.5 pricing for OpenRouter - PR #15909
    • Add base_url config with environment variables documentation - PR #15946
  • Mistral

    • Add codestral-embed-2505 embedding model - PR #16071
  • Gemini (Google AI Studio + Vertex AI)

    • Fix gemini request mutation for tool use - PR #16002
    • Add gemini-embedding-001 pricing entry for Google GenAI API - PR #16078 (usage example after this list)
    • Fix frequency_penalty and presence_penalty handling for the gemini-2.5-pro model - PR #16041
  • DeepInfra

    • Add vision support for Qwen/Qwen3-chat-32b model - PR #15976
  • Vercel AI Gateway

    • Fix vercel_ai_gateway entry for glm-4.6 (moved from vercel_ai_gateway/glm-4.6 to vercel_ai_gateway/zai/glm-4.6) - PR #16084
  • Fireworks

    • Don't add "accounts/fireworks/models" prefix for Fireworks Provider - PR #15938
  • Cohere

    • Add OpenAI-compatible annotations support for Cohere v2 citations - PR #16038
  • Deepgram

    • Handle Deepgram detected language when available - PR #16093
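
As a quick usage example for the gemini-embedding-001 entry above, the sketch below calls the model through the LiteLLM Python SDK. GEMINI_API_KEY is the variable LiteLLM uses for Google AI Studio; treat the exact response field access as an assumption.

```python
import os
import litellm

os.environ["GEMINI_API_KEY"] = "your-google-ai-studio-key"

# gemini-embedding-001 via Google AI Studio (see the pricing entry above)
response = litellm.embedding(
    model="gemini/gemini-embedding-001",
    input=["LiteLLM release notes v1.79.1"],
)

# Each item in response.data carries one embedding vector
print(len(response.data[0]["embedding"]))
```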

Bug Fixes

New Provider Support

  • FAL AI - Image generation support with cost tracking (see New Model Support above)

LLM API Endpoints

Features

  • Container API

    • Add end-to-end OpenAI Container API support to LiteLLM SDK - PR #16136 (usage sketch after this list)
    • Add proxy support for container APIs - PR #16049
    • Add logging support for Container API - PR #16049
    • Add cost tracking support for containers with documentation - PR #16117
  • Responses API

    • Respect LiteLLM-Disable-Message-Redaction header for Responses API - PR #15966
    • Add /openai routes for responses API (Azure OpenAI SDK Compatibility) - PR #15988
    • Redact reasoning summaries in ResponsesAPI output when message logging is disabled - PR #15965
    • Support text.format parameter in Responses API for providers without native ResponsesAPIConfig - PR #16023
    • Add LLM provider response headers to Responses API - PR #16091
  • Video Generation API

    • Add custom_llm_provider support for video endpoints (non-generation) - PR #16121
    • Fix documentation for videos - PR #15937
    • Add OpenAI client usage documentation for videos and fix navigation visibility - PR #15996
  • Moderations API

    • Moderations endpoint now respects api_base configuration parameter - PR #16087
  • Vector Stores

    • Milvus - search vector store support - PR #16035
    • Azure AI Vector Stores - support "virtual" indexes + create vector store on passthrough API - PR #16160
  • Passthrough Endpoints

    • Support multi-part form data on passthrough - PR #16035
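
Since the Container API now works end-to-end through the proxy (see the Container API items above), here is a hedged sketch of calling it with the OpenAI Python SDK pointed at a LiteLLM proxy. It assumes a proxy running on localhost:4000, a virtual key, and an openai SDK version that exposes client.containers; the container name is illustrative.

```python
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy (assumed local deployment + virtual key)
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-litellm-virtual-key",
)

# Create a container; the request is logged and cost-tracked by the proxy
container = client.containers.create(name="demo-container")
print(container.id)

# List containers routed through the same proxy
for c in client.containers.list():
    print(c.id, c.name)
```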

Management Endpoints / UI

Features

  • Virtual Keys

    • Validation for Proxy Base URL in SSO Settings - PR #16082
    • Test Key UI Embeddings support - PR #16065
    • Add Key Type Select in Key Settings - PR #16034
    • Key Already Exists Error Notification - PR #15993
  • Models + Endpoints

    • Changed API Base from Select to Input in New LLM Credentials - PR #15987
    • Remove limit from admin UI numerical input - PR #15991
    • Config Models should not be editable - PR #16020
    • Add tags in model creation - PR #16138
    • Add Tags to update model - PR #16140
  • Guardrails

    • Add Apply Guardrail Testing Playground - PR #16030
    • Make config-defined guardrails non-editable and fix guardrail info display - PR #16142
  • Cache Settings

    • Allow setting cache settings on UI - PR #16143
  • Routing

    • Allow setting all routing strategies, tag filtering on UI - PR #16139
  • Admin Settings

    • Add license metadata to health/readiness endpoint - PR #15997
    • Litellm Backend SSO Changes - PR #16029

Logging / Guardrail / Prompt Management Integrations

Features

  • OpenTelemetry

    • Enable OpenTelemetry context propagation by external tracers - PR #15940 (setup sketch after this list)
    • Ensure error information is logged on OTEL - PR #15978
  • Langfuse

    • Fix duplicate trace in langfuse_otel - PR #15931
    • Support tool usage messages with Langfuse OTEL integration - PR #15932
  • DataDog

    • Ensure the key's metadata and guardrail information are logged to DataDog - PR #15980
  • Opik

    • Enhance requester metadata retrieval from API key auth - PR #15897
    • Add documentation for user auth key metadata - PR #16004
  • SQS

    • Add Base64 handling for SQS Logger - PR #16028
  • General

    • Fix missing user API key, team ID, and user ID in custom callbacks - PR #15982
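
For the OpenTelemetry items above, a minimal setup sketch for the Python SDK is shown below. The "otel" callback name comes from LiteLLM's OTEL integration; the exporter and endpoint environment variable names are assumptions to verify against the OTEL integration docs.

```python
import os
import litellm

# Exporter configuration - verify variable names against the LiteLLM OTEL docs
os.environ["OTEL_EXPORTER"] = "otlp_http"
os.environ["OTEL_ENDPOINT"] = "http://localhost:4318/v1/traces"

# Register the OpenTelemetry callback so spans (including errors) are emitted
litellm.callbacks = ["otel"]

# Requires OPENAI_API_KEY in the environment for this particular model
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```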

Guardrails

Prompt Management


Spend Tracking, Budgets and Rate Limiting

  • Cost Tracking

    • Fix spend tracking for OCR/aOCR requests (log pages_processed + recognize OCRResponse) - PR #16070
  • Rate Limiting

    • Add Batch API rate limiting - the initial PR adds support for input-based rate limits - PR #16075
    • Handle multiple rate limit types per descriptor and prevent IndexError - PR #16039

MCP Gateway

  • OAuth
    • Add support for dynamic client registration - PR #15921
    • Respect X-Forwarded-* headers in OAuth endpoints - PR #16036

Performance / Loadbalancing / Reliability improvements

  • Memory Leak Fixes

    • Fix: prevent httpx DeprecationWarning memory leak in AsyncHTTPHandler - PR #16024
    • Fix: resolve memory accumulation caused by Pydantic 2.11+ deprecation warnings - PR #16110
    • Fix(apscheduler): prevent memory leaks from jitter and frequent job intervals - PR #15846
  • Configuration

    • Remove minimum validation for cache control injection index - PR #16149
    • Fix prompt_caching.md: wrong prompt_tokens definition - PR #16044

Documentation Updates

  • Provider Documentation

    • Use custom-llm-provider header in examples - PR #16055
    • LiteLLM docs README fixes - PR #16107
    • README fixes: add supported providers - PR #16109
  • Model References

    • Add supports_vision field to qwen-vl models in model_prices_and_context_window.json - PR #16106
  • General Documentation


New Contributors

  • @RobGeada made their first contribution in PR #15975
  • @shanto12 made their first contribution in PR #15946
  • @dima-hx430 made their first contribution in PR #15976
  • @m-misiura made their first contribution in PR #15971
  • @ylgibby made their first contribution in PR #15947
  • @Somtom made their first contribution in PR #15909
  • @rodolfo-nobrega made their first contribution in PR #16023
  • @bernata made their first contribution in PR #15997
  • @AlbertDeFusco made their first contribution in PR #15881
  • @komarovd95 made their first contribution in PR #15789
  • @langpingxue made their first contribution in PR #15635
  • @OrionCodeDev made their first contribution in PR #16070
  • @sbinnee made their first contribution in PR #16078
  • @JetoPistola made their first contribution in PR #16106
  • @gvioss made their first contribution in PR #16093
  • @pale-aura made their first contribution in PR #16084
  • @tanvithakur94 made their first contribution in PR #16041
  • @li-boxuan made their first contribution in PR #16044
  • @1stprinciple made their first contribution in PR #15938
  • @raghav-stripe made their first contribution in PR #16137
  • @steve-gore-snapdocs made their first contribution in PR #16149

Full Changelog

View complete changelog on GitHub