Hello Srinivas Kasu,
Greetings! Thanks for raising this question in the Q&A forum.
This is a great question and the behavior you are seeing in the new Foundry Guardrails portal is by design. Let me explain why and then walk you through how to get annotation-only behavior if that is what you need.
Why the portal only shows "Block" for harmful categories:
For the four core harm categories, which are hate, sexual, violence, and self-harm, the portal lets you configure severity thresholds (Low, Medium, High) and content is annotated by each category and blocked according to the threshold. The design intent for these core categories is that blocking is the primary enforcement action. The threshold slider is where you control sensitivity, not the action type.
Some filters, such as Prompt Shields and Protected material detection, enable you to determine if the model should annotate and/or block content, which is why you see the "Annotate" option available for those risk types but not for the core harmful categories in the portal UI.
However, annotation mode is fully supported for harmful categories at the API level:
Annotations can be enabled even for filters and severity levels that have been disabled from blocking content. Optional model annotations can be set to annotate mode (returns information when content is flagged, but not filtered) or filter mode (returns information when content is flagged and filtered).
This means if you want "Annotate only" behavior for harmful categories (detect and flag in the API response without actually blocking the content), you need to configure this through the REST API rather than the portal UI.
Step 1: Use the REST API to create a guardrail with annotate mode.
In the Azure AI Services REST API, a guardrail is represented as a RAI policy, a resource-level object in Azure Resource Manager. Use the RAI Policies - Create Or Update operation to create or update a guardrail. Specify the controls in the request body, including the risk category, severity level, and whether to block or annotate.
Here is an example REST API call to create a guardrail with annotate-only mode for harmful categories:
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/raiPolicies/{raiPolicyName}?api-version=2024-10-01
{
  "properties": {
    "mode": "Asynchronous_filter",
    "contentFilters": [
      {
        "name": "hate",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      },
      {
        "name": "violence",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      },
      {
        "name": "sexual",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      },
      {
        "name": "selfharm",
        "severityThreshold": "Low",
        "blocking": false,
        "enabled": true,
        "source": "Prompt"
      }
    ]
  }
}
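Setting "blocking": false while keeping "enabled": true is what activates annotate-only mode: the guardrail detects and flags content in the API response without blocking the request. Note that this example covers only the "Prompt" source; to annotate model output as well, add matching entries with "source": "Completion".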
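If you would rather generate the request body from code than hand-edit JSON, here is a minimal Python sketch. The helper names (build_annotate_only_policy, build_put_request) are hypothetical and only illustrate the idea. It assumes you already have an Azure Resource Manager bearer token and the full management URL shown above. The request is prepared but not sent.

```python
import json
import urllib.request

# Category names as they appear in the contentFilters example above.
HARM_CATEGORIES = ("hate", "violence", "sexual", "selfharm")


def build_annotate_only_policy(sources=("Prompt",), threshold="Low"):
    """Return a request body with blocking disabled for every harm category."""
    filters = [
        {
            "name": name,
            "severityThreshold": threshold,
            "blocking": False,  # annotate only, do not block
            "enabled": True,    # keep detection switched on
            "source": source,
        }
        for source in sources
        for name in HARM_CATEGORIES
    ]
    return {"properties": {"mode": "Asynchronous_filter", "contentFilters": filters}}


def build_put_request(url, bearer_token, body):
    """Prepare (but do not send) the PUT request for the management endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )
```

To actually submit it, you could pass the prepared request to urllib.request.urlopen, or use any HTTP client you prefer. Keeping the body builder separate makes it easy to review the JSON before it reaches your resource.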
Step 2: Read annotations from the API response.
When annotations are enabled, the API returns the following for each of the categories hate, sexual, violence, and self-harm: the severity level (safe, low, medium, or high) and filtered (true or false). The detected (true or false) flag is returned for optional models such as Prompt Shields and protected material detection.
Your application can then read the content_filter_results object in the response and take its own action based on the severity and filtered values, such as logging, routing to a human reviewer, or applying downstream logic.
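As a rough illustration of that downstream logic, the sketch below walks a content_filter_results dictionary and collects the categories that were flagged but not blocked. The function name and the sample data are hypothetical. It assumes each category entry carries a filtered field plus either a severity or a detected field, so please adjust it to the exact response shape you see from your deployment.

```python
def summarize_filter_results(content_filter_results):
    """Return {category: info} for categories flagged but not blocked."""
    flagged = {}
    for category, result in content_filter_results.items():
        if not isinstance(result, dict):
            continue  # skip anything that is not a per-category object
        severity = result.get("severity")
        detected = result.get("detected", False)
        # Treat any severity other than "safe", or detected=True, as a flag.
        is_flagged = detected or (severity is not None and severity != "safe")
        if is_flagged and not result.get("filtered", False):
            flagged[category] = {"severity": severity, "detected": detected}
    return flagged


# Hypothetical sample, for illustration only.
sample_results = {
    "hate": {"filtered": False, "severity": "medium"},
    "violence": {"filtered": False, "severity": "safe"},
    "sexual": {"filtered": False, "severity": "safe"},
    "self_harm": {"filtered": False, "severity": "safe"},
}
flagged = summarize_filter_results(sample_results)
```

With the sample above, only the hate entry would be collected, and your application could then log it or send it to a reviewer.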
Step 3: Check the annotation API version requirement.
Annotations are returned for all scenarios when using any preview API version starting from 2023-06-01-preview, as well as the GA API version 2024-02-01. Make sure your inference calls use GA API version 2024-02-01 or later, or a preview version from 2023-06-01-preview onward, to receive annotation data in the response.
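If you manage several clients, a small guard like the one below can catch an outdated api-version string before a call is made. This is only a sketch: the function name is hypothetical, and it assumes versions follow the YYYY-MM-DD or YYYY-MM-DD-preview pattern and that GA versions later than 2024-02-01 also return annotations.

```python
from datetime import date

PREVIEW_SUFFIX = "-preview"


def supports_annotations(api_version):
    """Check an api-version string against the minimums quoted above."""
    is_preview = api_version.endswith(PREVIEW_SUFFIX)
    date_part = api_version[: -len(PREVIEW_SUFFIX)] if is_preview else api_version
    version_date = date.fromisoformat(date_part)
    # Preview versions qualify from 2023-06-01-preview, GA from 2024-02-01.
    minimum = date(2023, 6, 1) if is_preview else date(2024, 2, 1)
    return version_date >= minimum
```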
Summary:
The Foundry portal currently surfaces "Block" as the only action for harmful categories by design, since blocking is the recommended enforcement mode for those risk types. Annotate mode returns information when content is flagged, but not filtered, and this is configurable via the RAI Policy REST API by setting blocking: false. This gives you full annotation visibility without the content being blocked, which is useful for monitoring, auditing, or building custom moderation workflows on top of the detection results.
If this answer helps, kindly accept it so that it can help others who have similar questions.
Best Regards,
Jerald Felix.