Azure OpenAI image and audio REST API reference (2025-04-01-preview)

This article documents the image generation and audio (speech) data plane inference REST API operations for Azure OpenAI in the 2025-04-01-preview release. For chat completions, embeddings, assistants, responses, vector stores, and all other operations, see the official Azure OpenAI REST API reference.

API specs

Managing and interacting with Azure OpenAI models and resources is divided across three primary API surfaces:

Control plane
Data plane - authoring
Data plane - inference

Each API surface/specification encapsulates a different set of Azure OpenAI capabilities. Each API has its own unique set of preview and stable/generally available (GA) API releases. Preview releases currently tend to follow a monthly cadence.

Important

There is now a new preview inference API. Learn more in our API lifecycle guide.

API	Latest preview release	Latest GA release	Specifications	Description
Control plane	`2025-07-01-preview`	`2025-06-01`	Spec files	The control plane API is used for operations like creating resources, model deployment, and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.
Data plane	`v1 preview`	`v1`	Spec files	The data plane API controls inference and authoring operations.

Authentication

Azure OpenAI provides two methods for authentication: API keys or Microsoft Entra ID.

API key authentication: All API requests must include the API key in the `api-key` HTTP header. The Quickstart provides guidance for how to make calls with this type of authentication.
Microsoft Entra ID authentication: You can authenticate an API call using a Microsoft Entra token. Authentication tokens are included in a request as the `Authorization` header. The token must be preceded by `Bearer`, for example `Bearer YOUR_AUTH_TOKEN`. You can read our how-to guide on authenticating with Microsoft Entra ID.
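The two methods differ only in which header carries the credential. A minimal Python sketch, assuming you already hold either an API key or an Entra token (acquiring the token is out of scope here); `build_auth_headers` is a hypothetical helper name, not part of any SDK:

```python
from typing import Optional


def build_auth_headers(api_key: Optional[str] = None,
                       entra_token: Optional[str] = None) -> dict:
    """Return the single authentication header for one of the two methods above."""
    # Exactly one credential must be supplied.
    if (api_key is None) == (entra_token is None):
        raise ValueError("pass either api_key or entra_token, but not both")
    if api_key is not None:
        # API key authentication: the key travels in the api-key header.
        return {"api-key": api_key}
    # Microsoft Entra ID authentication: the token is prefixed with "Bearer ".
    return {"Authorization": f"Bearer {entra_token}"}


print(build_auth_headers(api_key="YOUR_API_KEY"))        # {'api-key': 'YOUR_API_KEY'}
print(build_auth_headers(entra_token="YOUR_AUTH_TOKEN"))  # {'Authorization': 'Bearer YOUR_AUTH_TOKEN'}
```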

REST API versioning

The service APIs are versioned using the `api-version` query parameter. Dated versions follow the YYYY-MM-DD structure, and preview versions add a `-preview` suffix, as in `2025-04-01-preview`. For example:

POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-06-01
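A small Python sketch of how such a URL can be composed with the standard library. It assumes the default `https://{resource}.openai.azure.com` hostname; `build_url` and its parameters are illustrative names only:

```python
from urllib.parse import urlencode


def build_url(resource_name: str, deployment: str, operation: str,
              api_version: str = "2025-04-01-preview") -> str:
    """Compose a data plane inference URL carrying the api-version query parameter.

    operation is the path after the deployment, for example "audio/transcriptions".
    """
    base = (f"https://{resource_name}.openai.azure.com"
            f"/openai/deployments/{deployment}/{operation}")
    return f"{base}?{urlencode({'api-version': api_version})}"


print(build_url("YOUR_RESOURCE_NAME", "YOUR_DEPLOYMENT_NAME", "chat/completions", "2024-06-01"))
```

With the arguments from the example above, it reproduces that URL exactly.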

Data plane inference

The rest of this article covers the image and audio operations in the 2025-04-01-preview release of the Azure OpenAI data plane inference specification.

For the GA image and audio operations, see the GA image and audio REST API reference.

Transcriptions - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

Transcribes audio into the input language.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), in the form https://{your-resource-name}.openai.azure.com. For example: `https://aoairesource.openai.azure.com`, where you replace "aoairesource" with your Azure OpenAI resource name.
deployment-id	path	Yes	string	The name of your model deployment.
api-version	query	Yes	string	The API version to use, for example `2025-04-01-preview`.

Request Header

Use either token-based authentication or an API key; each request must include one of the two headers below. Token-based authentication is recommended because it is more secure.

Name	Required	Type	Description
Authorization	True (if api-key isn't used)	string	Example: `Authorization: Bearer {Azure_OpenAI_Auth_Token}`. To generate an auth token using the Azure CLI: `az account get-access-token --resource https://cognitiveservices.azure.com`. Type: oauth2. Authorization URL: `https://login.microsoftonline.com/common/oauth2/v2.0/authorize`. Scope: `https://ai.azure.com/.default`
api-key	True (if Authorization isn't used)	string	Provide your Azure OpenAI API key here.

Request Body

Content-Type: multipart/form-data

Name	Type	Description	Required	Default
model	string	ID of the model to use. The options are `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `gpt-4o-mini-transcribe-2025-12-15`, `whisper-1`, and `gpt-4o-transcribe-diarize`.	Yes
file	string	The audio file object to transcribe.	Yes
language	string	The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency.	No
prompt	string	An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.	No
response_format	audioResponseFormat	Defines the format of the output.	No
temperature	number	The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.	No	0
timestamp_granularities[]	array	The timestamp granularities to populate for this transcription. `response_format` must be set to `verbose_json` to use timestamp granularities. Either or both of these options are supported: `word` and `segment`. Note: Segment timestamps add no latency, but generating word timestamps incurs additional latency.	No	['segment']
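The request body above is sent as multipart/form-data. The following standard-library Python sketch shows one way to assemble it and enforces two rules from the table: timestamp granularities require `verbose_json`, and only `word` and `segment` are accepted. The helper name is hypothetical, and encoding the array parameter as repeated `timestamp_granularities[]` fields is an assumption rather than something this reference states:

```python
import uuid
from typing import List, Optional, Tuple


def build_transcription_form(audio: bytes, filename: str, model: str,
                             response_format: str = "json",
                             timestamp_granularities: Optional[List[str]] = None,
                             language: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data transcription request."""
    granularities = timestamp_granularities or []
    if granularities and response_format != "verbose_json":
        # Timestamp granularities are only usable with verbose_json output.
        raise ValueError("timestamp_granularities[] requires response_format='verbose_json'")
    for g in granularities:
        if g not in ("word", "segment"):
            raise ValueError(f"unsupported timestamp granularity: {g}")

    fields = [("model", model), ("response_format", response_format)]
    if language:
        fields.append(("language", language))  # ISO-639-1 code, for example "en"
    # Assumption: each array element is sent as a repeated field with the same name.
    fields += [("timestamp_granularities[]", g) for g in granularities]

    boundary = uuid.uuid4().hex
    parts = [
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
         f"{value}\r\n").encode()
        for name, value in fields
    ]
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_header.encode() + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = build_transcription_form(
    b"RIFF...", "file.wav", "whisper-1",
    response_format="verbose_json", timestamp_granularities=["word", "segment"])
```

The returned `content_type` goes in the request's `Content-Type` header so the service can find the boundary.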

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/json	object	A structured object (when response_format was `json` or `verbose_json`).
text/plain	string	Transcribed text in the output format (when response_format was one of `text`, `vtt`, or `srt`).
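Which of the two response shapes you get depends on `response_format`. A minimal sketch that dispatches on the `Content-Type` response header; it assumes the JSON object exposes a `text` field, as the examples for this operation suggest, and `parse_transcription` is a hypothetical name:

```python
import json


def parse_transcription(content_type: str, body: bytes) -> str:
    """Return the transcript text from either response shape in the table above."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # json / verbose_json: a structured object; the transcript is in "text".
        return json.loads(body)["text"]
    if media_type == "text/plain":
        # text, srt, vtt: the body itself is the transcript or subtitle file.
        return body.decode("utf-8")
    raise ValueError(f"unexpected content type: {content_type}")
```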

Examples

Example

Gets transcribed text and associated metadata from provided spoken audio data.

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

Responses: Status Code: 200

{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets transcribed text and associated metadata from provided spoken audio data.

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/transcriptions?api-version=2025-04-01-preview

"---multipart-boundary\nContent-Disposition: form-data; name=\"file\"; filename=\"file.wav\"\nContent-Type: application/octet-stream\n\nRIFF..audio.data.omitted\n---multipart-boundary--"

Responses: Status Code: 200

{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}

Translations - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

Transcribes and translates input audio into English text.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), in the form https://{your-resource-name}.openai.azure.com. For example: `https://aoairesource.openai.azure.com`, where you replace "aoairesource" with your Azure OpenAI resource name.
deployment-id	path	Yes	string	The name of your model deployment.
api-version	query	Yes	string	The API version to use, for example `2025-04-01-preview`.

Request Header

Use either token-based authentication or an API key; each request must include one of the two headers below. Token-based authentication is recommended because it is more secure.

Name	Required	Type	Description
Authorization	True (if api-key isn't used)	string	Example: `Authorization: Bearer {Azure_OpenAI_Auth_Token}`. To generate an auth token using the Azure CLI: `az account get-access-token --resource https://cognitiveservices.azure.com`. Type: oauth2. Authorization URL: `https://login.microsoftonline.com/common/oauth2/v2.0/authorize`. Scope: `https://ai.azure.com/.default`
api-key	True (if Authorization isn't used)	string	Provide your Azure OpenAI API key here.

Request Body

Content-Type: multipart/form-data

Name	Type	Description	Required	Default
file	string	The audio file to translate.	Yes
prompt	string	An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.	No
response_format	audioResponseFormat	Defines the format of the output.	No
temperature	number	The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.	No	0
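The translations operation takes a similar multipart/form-data body, with fewer fields than transcriptions (no `language` or `timestamp_granularities[]`). This Python sketch only prepares the HTTP request with `urllib.request.Request` and does not send it; the deployment name `my-whisper` and the helper name are hypothetical, and `body` is assumed to be an already-encoded multipart payload:

```python
import urllib.request


def build_translation_request(endpoint: str, deployment: str, api_key: str,
                              body: bytes, content_type: str,
                              api_version: str = "2025-04-01-preview") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the translations operation."""
    url = (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
           f"/audio/translations?api-version={api_version}")
    return urllib.request.Request(
        url,
        data=body,  # multipart payload: file, plus optional prompt, response_format, temperature
        headers={"api-key": api_key, "Content-Type": content_type},
        method="POST",
    )


req = build_translation_request("https://aoairesource.openai.azure.com", "my-whisper",
                                "YOUR_API_KEY", b"...", "multipart/form-data; boundary=xyz")
print(req.get_method(), req.full_url)
# Once body holds a real multipart payload, urllib.request.urlopen(req) sends it.
```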

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/json	object	A structured object (when response_format was `json` or `verbose_json`).
text/plain	string	Transcribed text in the output format (when response_format was one of `text`, `vtt`, or `srt`).

Examples

Example

Gets English-language transcribed text and associated metadata from provided spoken audio data.

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

"---multipart-boundary\nContent-Disposition: form-data; name=\"file\"; filename=\"file.wav\"\nContent-Type: application/octet-stream\n\nRIFF..audio.data.omitted\n---multipart-boundary--"

Responses: Status Code: 200

{
  "body": {
    "text": "A structured object when requesting json or verbose_json"
  }
}

Example

Gets English-language transcribed text and associated metadata from provided spoken audio data.

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/translations?api-version=2025-04-01-preview

"---multipart-boundary\nContent-Disposition: form-data; name=\"file\"; filename=\"file.wav\"\nContent-Type: application/octet-stream\n\nRIFF..audio.data.omitted\n---multipart-boundary--"

Responses: Status Code: 200

{
  "type": "string",
  "example": "plain text when requesting text, srt, or vtt"
}

Speech - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/speech?api-version=2025-04-01-preview

Generates audio from the input text.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), in the form https://{your-resource-name}.openai.azure.com. For example: `https://aoairesource.openai.azure.com`, where you replace "aoairesource" with your Azure OpenAI resource name.
deployment-id	path	Yes	string	The name of your model deployment.
api-version	query	Yes	string	The API version to use, for example `2025-04-01-preview`.

Request Header

Use either token-based authentication or an API key; each request must include one of the two headers below. Token-based authentication is recommended because it is more secure.

Name	Required	Type	Description
Authorization	True (if api-key isn't used)	string	Example: `Authorization: Bearer {Azure_OpenAI_Auth_Token}`. To generate an auth token using the Azure CLI: `az account get-access-token --resource https://cognitiveservices.azure.com`. Type: oauth2. Authorization URL: `https://login.microsoftonline.com/common/oauth2/v2.0/authorize`. Scope: `https://ai.azure.com/.default`
api-key	True (if Authorization isn't used)	string	Provide your Azure OpenAI API key here.

Request Body

Content-Type: application/json

Name	Type	Description	Required	Default
input	string	The text to synthesize audio for. The maximum length is 4,096 characters.	Yes
response_format	enum	The format to synthesize the audio in. Possible values: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`	No
speed	number	The speed of the synthesized audio. Select a value from `0.25` to `4.0`. `1.0` is the default.	No	1.0
voice	enum	The voice to use for speech synthesis. Possible values: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`	Yes
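A sketch of client-side validation against the limits in this table (4,096-character input, speed from 0.25 to 4.0, and the listed voices and formats). It serializes the body as JSON, matching the example request for this operation; the helper name is hypothetical, and rejecting empty input is an added assumption:

```python
import json

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_body(text: str, voice: str, response_format: str = "mp3",
                      speed: float = 1.0) -> bytes:
    """Check the documented limits and return a UTF-8 JSON request body."""
    # Assumption: empty input is rejected client-side; the table only gives the maximum.
    if not text or len(text) > 4096:
        raise ValueError("input must be between 1 and 4,096 characters")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {"input": text, "voice": voice,
               "response_format": response_format, "speed": speed}
    return json.dumps(payload).encode("utf-8")


body = build_speech_body("Hi! What are you going to make?", "fable")
```

The 200 response body is the raw audio in the requested format, so it can be written straight to a file such as `speech.mp3`.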

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/octet-stream	string	The synthesized audio, in the requested response_format.

Examples

Example

Synthesizes audio from the provided text.

POST https://{endpoint}/openai/deployments/{deployment-id}/audio/speech?api-version=2025-04-01-preview

{
 "input": "Hi! What are you going to make?",
 "voice": "fable",
 "response_format": "mp3"
}

Responses: Status Code: 200

{
  "body": "101010101"
}

Image generations - Create

POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2025-04-01-preview

Generates a batch of images from a text caption on a given image generation model deployment.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), in the form https://{your-resource-name}.openai.azure.com. For example: `https://aoairesource.openai.azure.com`, where you replace "aoairesource" with your Azure OpenAI resource name.
deployment-id	path	Yes	string	The name of your model deployment.
api-version	query	Yes	string	The API version to use, for example `2025-04-01-preview`.

Request Header

Use either token-based authentication or an API key; each request must include one of the two headers below. Token-based authentication is recommended because it is more secure.

Name	Required	Type	Description
Authorization	True (if api-key isn't used)	string	Example: `Authorization: Bearer {Azure_OpenAI_Auth_Token}`. To generate an auth token using the Azure CLI: `az account get-access-token --resource https://cognitiveservices.azure.com`. Type: oauth2. Authorization URL: `https://login.microsoftonline.com/common/oauth2/v2.0/authorize`. Scope: `https://ai.azure.com/.default`
api-key	True (if Authorization isn't used)	string	Provide your Azure OpenAI API key here.

Request Body

Content-Type: application/json

Name	Type	Description	Required	Default
background	imageBackground	Sets transparency for the background of the generated images. This parameter is only supported for gpt-image-1 series models.	No	auto
n	integer	The number of images to generate. For dall-e-3, only n=1 is supported.	No	1
output_compression	integer	The compression level (0-100%) for the generated images. This parameter is only supported for gpt-image-1 series models with the jpeg output format.	No	100
output_format	imagesOutputFormat	The file format in which the generated images are returned. Only supported for gpt-image-1 series models.	No	png
prompt	string	A text description of the desired image(s). The maximum length is 32,000 characters for gpt-image-1 series models and 4,000 characters for dall-e-3.	Yes
partial_images	integer	The number of partial images to generate. This parameter is used for streaming responses that return partial images. Value must be between 0 and 3. When set to 0, the response is a single image sent in one streaming event. The final image may be sent before all the requested partial images are generated if the full image is generated more quickly.	No	0
stream	boolean	Generate the image in streaming mode.	No	`false`
quality	imageQuality	The quality of the image that will be generated.	No	auto
response_format	imagesResponseFormat	The format in which the generated images are returned. This parameter isn't supported for gpt-image-1 series models, which always return base64-encoded images. Possible values: `url`, `b64_json`.	No	url
size	imageSize	The size of the generated images.	No	auto
style	imageStyle	The style of the generated images. Only supported for dall-e-3.	No	vivid
user	string	A unique identifier representing your end-user, which can help to monitor and detect abuse.	No
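Several parameters in this table apply to only one model family. The following Python sketch rejects mismatched options before a request is sent. The `family` argument is a hypothetical label, and the per-family size and quality sets are an assumption based on the values in the imageSize and imageQuality schemas later on this page (the table doesn't say which value belongs to which model):

```python
import json

# Assumption: how the imageSize / imageQuality values split between model families.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
GPT_IMAGE_SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536"}
DALLE3_QUALITY = {"auto", "standard", "hd"}
GPT_IMAGE_QUALITY = {"auto", "high", "medium", "low"}
GPT_IMAGE_ONLY = ("background", "output_format", "output_compression")


def build_generation_body(prompt: str, family: str, **options) -> str:
    """Return a JSON body for images/generations after checking per-family rules.

    family: "dall-e-3" or "gpt-image-1" (hypothetical labels for the two model series).
    """
    if family == "dall-e-3":
        if len(prompt) > 4000:
            raise ValueError("dall-e-3 prompts are limited to 4,000 characters")
        if options.get("n", 1) != 1:
            raise ValueError("dall-e-3 supports only n=1")
        for key in GPT_IMAGE_ONLY:
            if key in options:
                raise ValueError(f"{key} is only supported for gpt-image-1 series models")
        sizes, qualities = DALLE3_SIZES, DALLE3_QUALITY
    elif family == "gpt-image-1":
        if len(prompt) > 32000:
            raise ValueError("gpt-image-1 prompts are limited to 32,000 characters")
        if "style" in options:
            raise ValueError("style is only supported for dall-e-3")
        if "response_format" in options:
            raise ValueError("gpt-image-1 series models always return base64-encoded images")
        if "output_compression" in options and options.get("output_format") != "jpeg":
            raise ValueError("output_compression applies only to output_format='jpeg'")
        sizes, qualities = GPT_IMAGE_SIZES, GPT_IMAGE_QUALITY
    else:
        raise ValueError(f"unknown model family: {family}")
    if "size" in options and options["size"] not in sizes:
        raise ValueError(f"size {options['size']} is not available for {family}")
    if "quality" in options and options["quality"] not in qualities:
        raise ValueError(f"quality {options['quality']} is not available for {family}")
    return json.dumps({"prompt": prompt, **options})


# The example request for this operation passes the dall-e-3 checks.
body = build_generation_body("In the style of WordArt, Microsoft Clippy wearing a cowboy hat.",
                             "dall-e-3", n=1, style="natural", quality="standard")
```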

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/json	generateImagesResponse

Status Code: default

Description: An error occurred.

Content-Type	Type	Description
application/json	dalleErrorResponse
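A successful generateImagesResponse carries a `data` array whose items hold either a `url` or the image bytes encoded as base64. A minimal sketch that normalizes both; it assumes the base64 item key is `b64_json`, mirroring the response_format value (this page doesn't spell out the item schema), and `extract_images` is a hypothetical helper:

```python
import base64
from typing import List, Union


def extract_images(response: dict) -> List[Union[bytes, str]]:
    """Return decoded bytes for b64_json items and the URL string for url items."""
    if "error" in response:
        # Non-200 responses use the dalleErrorResponse shape instead.
        raise RuntimeError(f"image request failed: {response['error']}")
    results: List[Union[bytes, str]] = []
    for item in response.get("data", []):
        if "b64_json" in item:
            results.append(base64.b64decode(item["b64_json"]))
        elif "url" in item:
            # Signed blob URLs expire (note the se= parameter in the example); download promptly.
            results.append(item["url"])
        else:
            raise ValueError("data item has neither b64_json nor url")
    return results
```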

Examples

Example

Creates images given a prompt.

POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2025-04-01-preview

{
 "prompt": "In the style of WordArt, Microsoft Clippy wearing a cowboy hat.",
 "n": 1,
 "style": "natural",
 "quality": "standard"
}

Responses: Status Code: 200

{
  "body": {
    "created": 1698342300,
    "data": [
      {
        "revised_prompt": "A vivid, natural representation of Microsoft Clippy wearing a cowboy hat.",
        "prompt_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          },
          "profanity": {
            "detected": false,
            "filtered": false
          },
          "custom_blocklists": {
            "filtered": false,
            "details": []
          }
        },
        "url": "https://dalletipusw2.blob.core.windows.net/private/images/e5451cc6-b1ad-4747-bd46-b89a3a3b8bc3/generated_00.png?se=2023-10-27T17%3A45%3A09Z&...",
        "content_filter_results": {
          "sexual": {
            "severity": "safe",
            "filtered": false
          },
          "violence": {
            "severity": "safe",
            "filtered": false
          },
          "hate": {
            "severity": "safe",
            "filtered": false
          },
          "self_harm": {
            "severity": "safe",
            "filtered": false
          }
        }
      }
    ]
  }
}

Image generations - Edit

POST https://{endpoint}/openai/deployments/{deployment-id}/images/edits?api-version=2025-04-01-preview

Edits an image from a text caption on a given gpt-image-1 model deployment.

URI Parameters

Name	In	Required	Type	Description
endpoint	path	Yes	string url	Supported Azure OpenAI endpoint (protocol and hostname), in the form https://{your-resource-name}.openai.azure.com. For example: `https://aoairesource.openai.azure.com`, where you replace "aoairesource" with your Azure OpenAI resource name.
deployment-id	path	Yes	string	The name of your model deployment.
api-version	query	Yes	string	The API version to use, for example `2025-04-01-preview`.

Request Header

Use either token-based authentication or an API key; each request must include one of the two headers below. Token-based authentication is recommended because it is more secure.

Name	Required	Type	Description
Authorization	True (if api-key isn't used)	string	Example: `Authorization: Bearer {Azure_OpenAI_Auth_Token}`. To generate an auth token using the Azure CLI: `az account get-access-token --resource https://cognitiveservices.azure.com`. Type: oauth2. Authorization URL: `https://login.microsoftonline.com/common/oauth2/v2.0/authorize`. Scope: `https://ai.azure.com/.default`
api-key	True (if Authorization isn't used)	string	Provide your Azure OpenAI API key here.

Request Body

Content-Type: multipart/form-data

Name	Type	Description	Required	Default
image	string or array	The image(s) to edit. Must be a supported image file or an array of images. Each image should be a PNG or JPG file smaller than 50 MB.	Yes
input_fidelity	string	Controls how much effort the model exerts to match the style and features, especially facial features, of input images. This parameter is only supported for gpt-image-1 series models. Supports `high` and `low`.	No	`low`
mask	string	An additional image whose fully transparent areas (that is, where alpha is zero) indicate where the image should be edited. If multiple images are provided, the mask is applied to the first image. Must be a valid PNG file, smaller than 4 MB, with the same dimensions as the image.	No
n	integer	The number of images to generate. Must be between 1 and 10.	No	1
prompt	string	A text description of the desired image(s). The maximum length is 32,000 characters.	Yes
quality	imageQuality	The quality of the image that will be generated.	No	auto
partial_images	integer	The number of partial images to generate. This parameter is used for streaming responses that return partial images. Value must be between 0 and 3. When set to 0, the response is a single image sent in one streaming event. The final image may be sent before all the requested partial images are generated if the full image is generated more quickly.	No	0
stream	boolean	Edit the image in streaming mode.	No	`false`
response_format	imagesResponseFormat	The format in which the generated images are returned. gpt-image-1 series models always return base64-encoded images.	No	url
size	imageSize	The size of the generated images.	No	auto
user	string	A unique identifier representing your end-user, which can help to monitor and detect abuse.	No
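Before uploading, the file constraints in this table can be checked locally. This standard-library sketch identifies PNG and JPEG files by their signature bytes and reads PNG dimensions from the IHDR header. It treats MB as 1024 * 1024 bytes (an assumption), skips the dimension comparison when the first image is a JPEG, and uses hypothetical helper names:

```python
import struct
from typing import List, Optional, Tuple

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"
MB = 1024 * 1024  # assumption: the limits are binary megabytes


def png_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from a PNG's IHDR chunk, or None if it isn't a PNG."""
    if not data.startswith(PNG_MAGIC) or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])  # big-endian width, height


def validate_edit_inputs(images: List[bytes], mask: Optional[bytes] = None) -> None:
    """Raise ValueError if the images or mask break the documented constraints."""
    if not images:
        raise ValueError("at least one image is required")
    for i, data in enumerate(images):
        if not (data.startswith(PNG_MAGIC) or data.startswith(JPEG_MAGIC)):
            raise ValueError(f"image {i} is not a PNG or JPEG file")
        if len(data) >= 50 * MB:
            raise ValueError(f"image {i} is not smaller than 50 MB")
    if mask is None:
        return
    mask_dims = png_size(mask)
    if mask_dims is None:
        raise ValueError("mask must be a valid PNG file")
    if len(mask) >= 4 * MB:
        raise ValueError("mask is not smaller than 4 MB")
    # With several images, the mask applies to the first one, so compare against it.
    first_dims = png_size(images[0])
    if first_dims is not None and first_dims != mask_dims:
        raise ValueError(f"mask size {mask_dims} differs from first image {first_dims}")
```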

Responses

Status Code: 200

Description: OK

Content-Type	Type	Description
application/json	generateImagesResponse

Status Code: default

Description: An error occurred.

Content-Type	Type	Description
application/json	dalleErrorResponse

Components

For the schema definitions used by chat, completions, embeddings, responses, and other text operations, see the Azure OpenAI REST API reference. The following schemas support the image and audio operations on this page.

innerErrorCode

Error codes for the inner error object.

Property	Value
Description	Error codes for the inner error object.
Type	string
Values	`ResponsibleAIPolicyViolation`

dalleErrorResponse

Name	Type	Description	Required	Default
error	dalleError		No

dalleError

Name	Type	Description	Required
inner_error	dalleInnerError	Inner error with additional details.	No
param	string		No
type	string		No

dalleInnerError

Inner error with additional details.

Name	Type	Description	Required
code	innerErrorCode	Error codes for the inner error object.	No
content_filter_results	dalleFilterResults	Information about each content filtering category (hate, sexual, violence, self_harm): whether it was detected, its severity level (`safe`, `low`, `medium`, or `high`, indicating the intensity and risk level of harmful content), and whether it was filtered. Also reports whether jailbreak content and profanity were detected and filtered, and whether a customer blocklist filtered the content, along with the blocklist ID.	No
revised_prompt	string	The prompt that was used to generate the image, if there was any revision to the prompt.	No
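Because `ResponsibleAIPolicyViolation` is the only documented inner error code, a client can single out content-policy rejections from other failures. A minimal sketch, assuming the error body has the dalleErrorResponse shape (`error` containing `inner_error`); the function name is hypothetical:

```python
import json
from typing import Optional


def policy_violation(error_body: str) -> Optional[dict]:
    """Return the dalleInnerError object if content policy blocked the request, else None."""
    error = json.loads(error_body).get("error") or {}
    inner = error.get("inner_error") or {}
    # ResponsibleAIPolicyViolation is the only innerErrorCode value listed on this page.
    if inner.get("code") == "ResponsibleAIPolicyViolation":
        return inner
    return None
```

Resending the same prompt unchanged is unlikely to help; the returned `revised_prompt` and `content_filter_results` can help identify which category triggered the block.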

contentFilterSeverityResult

Name	Type	Description	Required	Default
filtered	boolean		Yes
severity	string		No

contentFilterDetectedResult

Name	Type	Description	Required	Default
detected	boolean		No
filtered	boolean		Yes

contentFilterDetailedResults

Content filtering results, with details of the content filter IDs for the filtered segments.

Name	Type	Description	Required	Default
details	array		No
filtered	boolean		Yes

dalleFilterResults

Information about each content filtering category (hate, sexual, violence, self_harm): whether it was detected, its severity level (`safe`, `low`, `medium`, or `high`, indicating the intensity and risk level of harmful content), and whether it was filtered. Also reports whether jailbreak content and profanity were detected and filtered, and whether a customer blocklist filtered the content, along with the blocklist ID.

Name	Type	Description	Required
custom_blocklists	contentFilterDetailedResults	Content filtering results, with details of the content filter IDs for the filtered segments.	No
hate	contentFilterSeverityResult		No
jailbreak	contentFilterDetectedResult		No
profanity	contentFilterDetectedResult		No
self_harm	contentFilterSeverityResult		No
sexual	contentFilterSeverityResult		No
violence	contentFilterSeverityResult		No
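The properties above use three result shapes: severity results, detected results, and detailed results. A Python sketch that reduces a dalleFilterResults object to the categories that were filtered or detected; the contents of `details` entries aren't described on this page, so the sketch only counts them, and the function name is hypothetical:

```python
from typing import Dict


def flagged_categories(results: dict) -> Dict[str, str]:
    """Map each filtered or detected category to a short human-readable reason."""
    flagged: Dict[str, str] = {}
    for category, result in results.items():
        if not isinstance(result, dict):
            continue
        if result.get("filtered"):
            reason = "filtered"
            if "severity" in result:  # contentFilterSeverityResult
                reason += f" at severity {result['severity']}"
            if result.get("details"):  # contentFilterDetailedResults
                reason += f" (details: {len(result['details'])})"
            flagged[category] = reason
        elif result.get("detected"):
            # contentFilterDetectedResult: jailbreak or profanity seen but not filtered.
            flagged[category] = "detected, not filtered"
    return flagged
```

Applied to the `prompt_filter_results` in the image generation example, it returns an empty dictionary, because nothing there was filtered or detected.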

audioResponseFormat

Defines the format of the output.

Property	Value
Description	Defines the format of the output.
Type	string
Values	`json` `text` `srt` `verbose_json` `vtt`

imageQuality

The quality of the image that will be generated.

Property	Value
Description	The quality of the image that will be generated.
Type	string
Default	auto
Values	`auto` `high` `medium` `low` `hd` `standard`

imagesResponseFormat

The format in which the generated images are returned.

Property	Value
Description	The format in which the generated images are returned.
Type	string
Default	url
Values	`url` `b64_json`

imagesOutputFormat

The file format in which the generated images are returned. Only supported for gpt-image-1 series models.

Property	Value
Description	The file format in which the generated images are returned. Only supported for gpt-image-1 series models.
Type	string
Default	png
Values	`png` `jpeg`

imageSize

The size of the generated images.

Property	Value
Description	The size of the generated images.
Type	string
Default	auto
Values	`auto` `1792x1024` `1024x1792` `1024x1024` `1024x1536` `1536x1024`

imageStyle

The style of the generated images. Only supported for dall-e-3.

Property	Value
Description	The style of the generated images. Only supported for dall-e-3.
Type	string
Default	vivid
Values	`vivid` `natural`

imageBackground

Sets transparency for the background of the generated image(s). This parameter is only supported for gpt-image-1 series models.

Property	Value
Description	Sets transparency for the background of the generated image(s). This parameter is only supported for gpt-image-1 series models.
Type	string
Default	auto
Values	`transparent` `opaque` `auto`

generateImagesResponse

Name	Type	Description	Required
created	integer	The Unix timestamp when the operation was created.	Yes
data	array	The result data of the operation, if successful.	Yes
usage	imageGenerationsUsage	Represents token usage details for image generation requests. Only for gpt-image-1 series models.	No

imageGenerationsUsage

Represents token usage details for image generation requests. Only for gpt-image-1 series models.

Name	Type	Description	Required
input_tokens	integer	The number of input tokens.	No
input_tokens_details	object	A detailed breakdown of the input tokens.	No
└─ image_tokens	integer	The number of image tokens.	No
└─ text_tokens	integer	The number of text tokens.	No
output_tokens	integer	The number of output tokens.	No
total_tokens	integer	The total number of tokens used.	No
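When usage is reported (gpt-image-1 series models only), per-request token counts can be accumulated for budgeting. A minimal sketch using `collections.Counter`; it treats a missing `usage` object, as with dall-e-3 responses, as zero, and the helper name is hypothetical:

```python
from collections import Counter
from typing import Iterable


def total_usage(responses: Iterable[dict]) -> Counter:
    """Sum imageGenerationsUsage fields across several generateImagesResponse objects."""
    totals: Counter = Counter()
    for response in responses:
        usage = response.get("usage") or {}  # absent for non-gpt-image-1 models
        for key in ("input_tokens", "output_tokens", "total_tokens"):
            totals[key] += usage.get(key, 0)
        details = usage.get("input_tokens_details") or {}
        for key in ("image_tokens", "text_tokens"):
            totals[f"input_{key}"] += details.get(key, 0)
    return totals
```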

Next steps

Learn about models and fine-tuning with the REST API. Learn more about the underlying models that power Azure OpenAI.


Last updated on 2026-06-24
