Generative AI models maintenance policy

This page describes the model maintenance policy for the Foundation Model APIs pay-per-token, Foundation Model APIs provisioned throughput, Batch Inference with ai_query, and Foundation Model Fine-tuning offerings.

To continue supporting state-of-the-art models, Databricks manages each model through a lifecycle that progresses from updates to deprecation and, eventually, retirement.

Update: Databricks applies incremental updates to a model to deliver optimizations. See Model updates.
Deprecated: A deprecated model is no longer recommended for new workloads, but remains available in workspaces with existing usage of the model. Workspaces that aren't using the model at the time of deprecation no longer have access to it.
Retired: A retired model is no longer accessible, and support for the model is fully discontinued. Any workload using the model stops working.
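
The access rules in the lifecycle stages above can be sketched as a small decision function. This is an illustrative model only, not a Databricks API: the names `Stage` and `can_access` are hypothetical, and the sketch assumes that a model receiving updates is otherwise fully available.

```python
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"          # assumed: includes models receiving incremental updates
    DEPRECATED = "deprecated"
    RETIRED = "retired"


def can_access(stage: Stage, used_at_deprecation: bool) -> bool:
    """Return whether a workspace can still call a model at the given stage."""
    if stage is Stage.ACTIVE:
        return True
    if stage is Stage.DEPRECATED:
        # Only workspaces with existing usage keep access until retirement.
        return used_at_deprecation
    return False  # retired: no workspace has access
```

For example, `can_access(Stage.DEPRECATED, False)` is `False`, which mirrors the rule that workspaces not using a model at deprecation time lose access to it.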

Model deprecation policy

When Databricks deprecates a model, that model is no longer recommended and is planned for retirement. Databricks announces retirement dates for deprecated models with the notification timelines summarized in the following sections. Retirement dates might be announced at the time of deprecation or at a later date. After the retirement date, the model is no longer accessible, and any workload using it stops working.

For deprecated and retired models and their announced retirement dates, see Deprecated and retired models. For partner models, see Partner model retirement policy.

Important

The deprecation policies for the Foundation Model APIs pay-per-token and Foundation Model Fine-tuning offerings apply only to supported chat and completion models.

Foundation Model APIs

The following table summarizes the deprecation policy for the Foundation Model APIs pay-per-token, Foundation Model APIs provisioned throughput, and Batch Inference with ai_query offerings.

Deprecation notification	Transition to retirement	On the retirement date
Databricks takes the following steps to notify customers about a model deprecation: In the Databricks UI, a warning message indicates that the model is deprecated. The applicable documentation contains a notice that indicates the model is deprecated, along with a retirement date if one has been announced.	After deprecating a model, Databricks announces a retirement date three months or more in the future. During this transition period: The model remains available only for workspaces with existing workloads that use it, until the announced retirement date. The deprecated model isn't accessible from workspaces that weren't actively using it at the time of deprecation. Customers with existing workloads should migrate to the recommended replacement model or sunset affected workloads.	The model is no longer available for use and is removed from the product. Any existing workloads using the model stop working. Applicable documentation is updated to indicate that the model is no longer available, and to recommend a replacement model.

Partner model retirement policy

Partner models are models that third-party partners — specifically OpenAI, Anthropic, and Google — provide through Foundation Model APIs. For these partner models, Databricks generally follows the same deprecation timelines and policies described above.

However, partners might announce retirements with less notice than the three-month transition period that Databricks publishes. In these cases, Databricks attempts to bridge the gap by temporarily redirecting requests for the retired model to a similar version, so customers receive the full transition time.

For example, if a partner model retirement is announced with one month's lead time instead of three, Databricks redirects the model for an additional two months to prevent immediate breakage and allow time for migration. Queries fail at the end of the full three-month period.

Note

This redirection can only occur if the replacement model has the same price and is backwards compatible. The replacement model is usually an incremental version, such as 3.1 replacing 3.0.
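
The redirect arithmetic in the partner example above can be sketched as follows. This is a rough illustration under stated assumptions: the helper names `add_months` and `redirect_window` are hypothetical, the dates in the usage note are invented, and the sketch ignores the price and compatibility conditions that decide whether a redirect is possible at all.

```python
import calendar
from datetime import date

TRANSITION_MONTHS = 3  # transition period stated in this policy


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def redirect_window(announced: date, partner_retirement: date):
    """Return (start, end) of the redirect period, or None if none is needed."""
    full_transition_end = add_months(announced, TRANSITION_MONTHS)
    if partner_retirement >= full_transition_end:
        return None  # the partner already gave the full transition time
    return partner_retirement, full_transition_end
```

With a hypothetical announcement on January 15, 2026 and a partner retirement on February 15, 2026, `redirect_window` returns a window from February 15, 2026 to April 15, 2026, which is the additional two months described in the example.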

Foundation Model Fine-tuning

The following table summarizes the deprecation policy for Foundation Model Fine-tuning.

Deprecation notification	Transition to retirement	On the retirement date
Databricks takes the following steps to notify customers about a model deprecation: In the Experiments tab, a warning message appears in the drop-down menu for Foundation Model Fine-tuning that indicates that the model is deprecated. The applicable documentation contains a notice that indicates the model is deprecated, along with a retirement date if one has been announced.	After deprecating a model, Databricks announces a retirement date three months or more in the future. During this transition period, customers should migrate their workloads to a recommended replacement model, or delete the affected endpoint.	The model is no longer available for use and is removed from the product. Applicable documentation is updated to recommend using a replacement model.

Model updates

Databricks might ship incremental model updates to deliver optimizations. When Databricks updates a model, the endpoint URL remains the same, but the model ID in the response object changes to reflect the date of the update. For example, if Databricks ships an update to meta-llama/Meta-Llama-3.3-70B on 3/4/2024, the model ID in the response object changes to meta-llama/Meta-Llama-3.3-70B-030424. Databricks maintains a version history of the updates. Reach out to your Databricks account team for more details.
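
Client code that logs or compares model IDs might need to separate the base name from the update suffix. The following is a minimal sketch, not an official parser: the function name `split_model_id` is hypothetical, and the sketch assumes the suffix is always six digits in MMDDYY order, which is inferred from the single example above.

```python
import re
from datetime import datetime

# Assumed pattern: base name, a hyphen, then a six-digit MMDDYY suffix.
_DATED_ID = re.compile(r"^(?P<base>.+)-(?P<stamp>\d{6})$")


def split_model_id(model_id: str):
    """Return (base_name, update_date); update_date is None without a suffix."""
    match = _DATED_ID.match(model_id)
    if match is None:
        return model_id, None
    try:
        stamp = datetime.strptime(match["stamp"], "%m%d%y").date()
    except ValueError:
        # Six digits that are not a valid MMDDYY date: treat as part of the name.
        return model_id, None
    return match["base"], stamp


base, updated = split_model_id("meta-llama/Meta-Llama-3.3-70B-030424")
```

Comparing on `base` rather than on the full ID keeps application logic stable when an update changes only the suffix.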

Deprecated and retired models

The following sections list models that are deprecated (no longer recommended for new workloads) or retired (end-of-life and no longer available). Retirement dates for deprecated models are announced at least three months in advance.

Foundation Model APIs retirements

The following tables show retired and soon-to-be-retired partner and open models, their retirement dates, and recommended replacement models for Foundation Model APIs pay-per-token and provisioned throughput serving workloads. Databricks recommends that you migrate your applications to replacement models before the indicated retirement date.

Note

OpenAI and Google Gemini models are only available through ADI Services, provided by Databricks.

Partner model	Retirement date	Recommended replacement model
OpenAI GPT-5.1 Codex Max	Pay-per-token: July 16, 2026	OpenAI GPT-5.5
OpenAI GPT-5.1 Codex Mini	Pay-per-token: July 16, 2026	OpenAI GPT-5.4 Codex Mini
OpenAI GPT-5.2 Codex	Pay-per-token: July 16, 2026	OpenAI GPT-5.5
Anthropic Claude 3.7 Sonnet	Pay-per-token: April 12, 2026	Use the latest Claude Sonnet model
Gemini 3 Pro	Provisioned throughput: March 26, 2026	Gemini 3.1 Pro. To allow more time for migration, between March 26, 2026 and June 7, 2026, API calls to Gemini 3 Pro will be temporarily redirected to Gemini 3.1 Pro. The pricing for both models is identical.

Open model	Retirement date	Recommended replacement model
Meta Llama 3.1 405B	Pay-per-token: February 15, 2026 Provisioned throughput: May 15, 2026	OpenAI GPT OSS 120B
DBRX / DBRX Instruct	Pay-per-token: April 30, 2025 Provisioned throughput: December 19, 2025	Pay-per-token: Meta-Llama-4-Maverick Provisioned throughput: Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
Mixtral 8x7B / Mixtral-8x7B Instruct	Pay-per-token: April 30, 2025 Provisioned throughput: February 27, 2026	Pay-per-token: Meta-Llama-4-Maverick Provisioned throughput: Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
Meta Llama 3 (70B)	Pay-per-token: July 23, 2024 (Meta-Llama-3-70B-Instruct); December 11, 2024 (Meta-Llama-3.1-70B-Instruct) Provisioned throughput: February 27, 2026	Pay-per-token: Meta-Llama-4-Maverick Provisioned throughput: Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
Meta Llama 3 8B	Provisioned throughput: February 27, 2026	Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
Meta Llama 2 70B / Meta-Llama-2-70B-Chat	Pay-per-token: October 30, 2024 Provisioned throughput: February 27, 2026	Pay-per-token: Meta-Llama-4-Maverick Provisioned throughput: Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
Meta Llama 2 13B	Provisioned throughput: February 27, 2026	Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
Meta Llama 2 7B	Provisioned throughput: February 27, 2026	Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
Mistral 7B	Provisioned throughput: February 27, 2026	Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
MPT 30B / MPT 30B Instruct	Pay-per-token: August 30, 2024 Provisioned throughput: December 19, 2025	Pay-per-token: Meta-Llama-4-Maverick Provisioned throughput: Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.
MPT 7B / MPT 7B Instruct	Pay-per-token: August 30, 2024 Provisioned throughput: December 19, 2025	Pay-per-token: Meta-Llama-4-Maverick Provisioned throughput: Comparable model on the same offering, like Llama 3.2, 3.3, or 4 model of similar size.

If you require long-term support for a specific model version, Databricks recommends using Foundation Model APIs provisioned throughput for your serving workloads.

Foundation Model Fine-tuning retirements

The following table shows retired model families, their retirement dates, and recommended replacement model families to use for Foundation Model Fine-tuning workloads. Databricks recommends that you migrate your applications to use replacement models before the indicated retirement date.

Model family	Retirement date	Recommended replacement model family
DBRX	April 30, 2025	Llama-3.1-70B
Mixtral	April 30, 2025	Llama-3.1-70B
Mistral	April 30, 2025	Llama-3.1-8B
Meta-Llama-3.1-405B	January 30, 2025	Llama-3.1-70B
Meta-Llama-3	January 7, 2025	Meta-Llama-3.1
Meta-Llama-2	January 7, 2025	Meta-Llama-3.1
Code Llama	January 7, 2025	Meta-Llama-3.1

Find workloads that use retired models

Use the following query to find workloads that use a deprecated or retired model and to identify their owners.

-- Summarize request volume and token usage per requester and endpoint for one model.
SELECT
  eu.requester,
  se.endpoint_name,
  se.entity_name,
  COUNT(*) AS request_count,
  SUM(eu.input_token_count) AS total_input_tokens,
  SUM(eu.output_token_count) AS total_output_tokens,
  MIN(eu.request_time) AS first_request,
  MAX(eu.request_time) AS last_request
FROM system.serving.endpoint_usage eu
JOIN system.serving.served_entities se
  ON eu.served_entity_id = se.served_entity_id
-- Enter the placeholder in lowercase: entity_name is lowercased before matching.
WHERE LOWER(se.entity_name) LIKE '%<retired-model-name>%'
GROUP BY eu.requester, se.endpoint_name, se.entity_name
ORDER BY request_count DESC
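
If you generate this query from a script, the placeholder has to be filled with a lowercase fragment, because the filter lowercases entity_name before matching. The following is one possible sketch of building that predicate. The function name `entity_name_filter` and the character allowlist are assumptions made for illustration; adjust the allowlist to the entity names in your workspace.

```python
import re

# Assumed allowlist: lowercase letters, digits, dots, slashes, and hyphens only.
_SAFE_NAME = re.compile(r"^[a-z0-9./-]+$")


def entity_name_filter(model_name: str) -> str:
    """Return the WHERE predicate for a model-name fragment, lowercased."""
    fragment = model_name.strip().lower()
    if not _SAFE_NAME.match(fragment):
        # Reject quotes, wildcards, and anything else outside the allowlist.
        raise ValueError(f"unexpected characters in model name: {model_name!r}")
    return f"LOWER(se.entity_name) LIKE '%{fragment}%'"
```

The allowlist is deliberately strict so that a name containing quotes or LIKE wildcards fails early instead of changing the meaning of the query.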

Last updated on 2026-07-02