How to set "annotate" for harmful categories within the new Foundry service.

Question

Srinivas Kasu 0 Microsoft Employee

Hi,
Is it possible to set the "Annotate" action for the risk types under content harms when creating a new Guardrail in the new Foundry service?
Right now, I can only see "Block" as the Action, as shown in the picture below.

User's image

Thanks in advance.

Anshika Varshney 14,085 Reputation points Microsoft External Staff Moderator

2026-06-25T20:03:52.1666667+00:00

Hello @Srinivas Kasu

I hope this helps you get back on track! If you're still facing issues, could you share more details?
Anshika Varshney 14,085 Reputation points Microsoft External Staff Moderator

2026-06-28T06:52:07.6133333+00:00

Hello @Srinivas Kasu

We haven't heard from you since the last response and are just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, please respond with more details and we will try to help.

Thank you!
Anshika Varshney 14,085 Reputation points Microsoft External Staff Moderator

2026-07-01T15:08:01.01+00:00

Hello @Srinivas Kasu

Please let me know if the issue persists after these checks. If you have any remaining questions or need additional details, I’ll be glad to provide further clarification or guidance.

If the above steps resolve your issue, please take a moment to mark it as Accepted. This helps others in the community with the same question find the solution more easily.

Thank you!

2 answers

Answer 1

Hello @Srinivas Kasu

You can configure the optional harmful categories in annotate mode, so they return detection metadata without filtering the response. Optional models support both annotate and filter modes, and annotations are returned separately from blocking behavior. [learn.microsoft]

For example, the docs describe annotate mode for models such as prompt shield, protected material, and groundedness, where the API returns detected / filtered fields and, for groundedness, offset details as well. The standard harm categories like hate, sexual, violence, and self-harm return severity and filtered status, but the optional models are the ones explicitly described as having annotate vs. filter modes.

If you are using the Azure OpenAI / Foundry API, make sure you are on a supported API version that returns annotations, such as 2024-02-01 GA or later preview versions mentioned in the doc.

I hope this helps. Do let me know if you have any further queries.
Thank you!

Answer 2

Hello Srinivas Kasu,

Greetings! Thanks for raising this question in the Q&A forum.

This is a great question and the behavior you are seeing in the new Foundry Guardrails portal is by design. Let me explain why and then walk you through how to get annotation-only behavior if that is what you need.

Why the portal only shows "Block" for harmful categories:

For the four core harm categories, which are hate, sexual, violence, and self-harm, the portal lets you configure severity thresholds (Low, Medium, High) and content is annotated by each category and blocked according to the threshold. The design intent for these core categories is that blocking is the primary enforcement action. The threshold slider is where you control sensitivity, not the action type.

Some filters, such as Prompt Shields and Protected material detection, enable you to determine if the model should annotate and/or block content, which is why you see the "Annotate" option available for those risk types but not for the core harmful categories in the portal UI.

However, annotation mode is fully supported for harmful categories at the API level:

Annotations can be enabled even for filters and severity levels that have been disabled from blocking content. Optional model annotations can be set to annotate mode (returns information when content is flagged, but not filtered) or filter mode (returns information when content is flagged and filtered).

This means if you want "Annotate only" behavior for harmful categories (detect and flag in the API response without actually blocking the content), you need to configure this through the REST API rather than the portal UI.

Step 1: Use the REST API to create a guardrail with annotate mode.

In the Azure AI Services REST API, a guardrail is represented as a RAI policy — a resource-level object in Azure Resource Manager. Use the RAI Policies - Create Or Update operation to create or update a guardrail. Specify the controls in the request body, including the risk category, severity level, and whether to block or annotate.

Here is an example REST API call to create a guardrail with annotate-only mode for harmful categories:

PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/raiPolicies/{raiPolicyName}?api-version=2024-10-01

{
  "properties": {
    "mode": "Asynchronous_filter",
    "contentFilters": [
      {
        "name": "hate",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      },
      {
        "name": "violence",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      },
      {
        "name": "sexual",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      },
      {
        "name": "selfharm",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      }
    ]
  }
}

Setting "blocking": false while keeping "enabled": true is what activates annotate-only mode. The guardrail will detect and flag content in the API response without blocking the request.
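If you would rather generate the request than hand-edit the JSON, here is a minimal Python sketch that builds the same URL and body. The helper names (build_policy_url, build_annotate_only_body) and the placeholder identifiers are my own and not part of any SDK. The sketch only constructs the request; actually sending the PUT still needs an Azure Resource Manager bearer token, which is left out here.

```python
import json

# The four core harm categories, named as in the request body above.
CORE_CATEGORIES = ("hate", "violence", "sexual", "selfharm")


def build_policy_url(subscription_id, resource_group, account_name, policy_name,
                     api_version="2024-10-01"):
    """Return the ARM URL for the RAI Policies - Create Or Update call."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.CognitiveServices"
        f"/accounts/{account_name}"
        f"/raiPolicies/{policy_name}"
        f"?api-version={api_version}"
    )


def build_annotate_only_body(sources=("Prompt",), threshold="Low"):
    """Build a policy body whose filters are enabled but never block."""
    filters = [
        {
            "name": name,
            "severityThreshold": threshold,
            "blocking": False,  # annotate only: do not block
            "enabled": True,    # keep detection switched on
            "source": source,
        }
        for source in sources
        for name in CORE_CATEGORIES
    ]
    return {"properties": {"mode": "Asynchronous_filter", "contentFilters": filters}}


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    url = build_policy_url("<subscription-id>", "<resource-group>",
                           "<account>", "annotate-only")
    print("PUT", url)
    print(json.dumps(build_annotate_only_body(), indent=2))
```

Called with the defaults, build_annotate_only_body() reproduces the four "Prompt" entries shown above. The sources parameter is there only so the same loop can emit entries for another source value if your policy needs one.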

Step 2: Read annotations from the API response.

When annotations are enabled, the API returns the following for the categories hate, sexual, violence, and self-harm: a severity level (safe, low, medium, or high) and filtered (true or false). A detected (true or false) field is returned for the optional models, such as Prompt Shields and protected material.

Your application can then read the content_filter_results object in the response and take its own action based on the reported severity, such as logging, routing to a human reviewer, or applying downstream logic.
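As a rough illustration of that downstream logic, here is a small Python sketch that picks out the categories worth acting on from a content_filter_results object. It assumes each category entry carries a filtered flag plus either a severity label or a detected flag, and it handles both. The function name flagged_categories and the sample payload are hypothetical; the payload is illustrative and not captured from a real call.

```python
# Severity labels in ascending order.
SEVERITY_ORDER = ("safe", "low", "medium", "high")


def flagged_categories(content_filter_results, min_severity="low"):
    """Return the sorted category names that were flagged.

    A category counts as flagged when its severity is at or above
    min_severity, or when it reports detected == True.
    """
    floor = SEVERITY_ORDER.index(min_severity)
    flagged = []
    for category, entry in content_filter_results.items():
        if not isinstance(entry, dict):
            continue  # skip anything that is not a per-category result
        severity = entry.get("severity")
        by_severity = (severity in SEVERITY_ORDER
                       and SEVERITY_ORDER.index(severity) >= floor)
        by_flag = entry.get("detected") is True
        if by_severity or by_flag:
            flagged.append(category)
    return sorted(flagged)


# Illustrative payload only (assumed shape, annotate-only so nothing is filtered).
sample = {
    "hate": {"filtered": False, "severity": "safe"},
    "self_harm": {"filtered": False, "severity": "safe"},
    "sexual": {"filtered": False, "severity": "low"},
    "violence": {"filtered": False, "severity": "medium"},
}

print(flagged_categories(sample))                         # ['sexual', 'violence']
print(flagged_categories(sample, min_severity="medium"))  # ['violence']
```

Because blocking is off, filtered stays false even for flagged content, so the decision on what to log or escalate rests entirely on checks like this one in your own application.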

Step 3: Check the annotation API version requirement.

Annotations are returned for all scenarios when using any preview API version starting from 2023-06-01-preview, as well as the GA API version 2024-02-01. Make sure your inference calls use at least API version 2024-02-01 or a later preview version to receive annotation data in the response.
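If you want to guard against an older api-version slipping into your configuration, a check along these lines could help. This is only a sketch of the rule as stated above: preview versions from 2023-06-01-preview onward, and GA versions from 2024-02-01 onward. Treating later GA versions as supported is my reading of "at least", and returns_annotations is a hypothetical helper name.

```python
from datetime import date

PREVIEW_SUFFIX = "-preview"


def returns_annotations(api_version):
    """Return True if the api-version string meets the floor stated above."""
    is_preview = api_version.endswith(PREVIEW_SUFFIX)
    stamp = api_version[:-len(PREVIEW_SUFFIX)] if is_preview else api_version
    released = date.fromisoformat(stamp)  # expects YYYY-MM-DD
    # Assumption: preview floor 2023-06-01, GA floor 2024-02-01.
    floor = date(2023, 6, 1) if is_preview else date(2024, 2, 1)
    return released >= floor


print(returns_annotations("2024-02-01"))          # True
print(returns_annotations("2023-06-01-preview"))  # True
print(returns_annotations("2023-05-15"))          # False
```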

Summary:

The Foundry portal currently surfaces "Block" as the only action for harmful categories by design, since blocking is the recommended enforcement mode for those risk types. Annotate mode returns information when content is flagged, but not filtered, and this is configurable via the RAI Policy REST API by setting blocking: false. This gives you full annotation visibility without the content being blocked, which is useful for monitoring, auditing, or building custom moderation workflows on top of the detection results.

If this answer helps, please accept it so that others with similar questions can find it.

Best Regards,

Jerald Felix.