
Get data from Azure Storage

In this article, you learn how to get data from Azure Storage (Azure Data Lake Storage Gen2 containers, blob containers, or individual blobs) into a table in a KQL database. You can ingest data continuously or as a one-time ingestion. After ingestion completes, the data is available for query.

  • Continuous ingestion (preview): Continuous ingestion sets up an ingestion pipeline that allows an eventhouse to listen to Azure Storage events. The pipeline notifies the eventhouse to pull information when subscribed events occur. The supported events are BlobCreated and BlobRenamed.

  • One-time ingestion: Use this method to retrieve data from Azure Storage as a one-time operation.


Prerequisites

Prerequisites for continuous ingestion

In Azure:

  • Register the Event Grid resource provider with your Azure subscription.
  • Assign Storage Blob Data Reader role permissions to the workspace identity.
  • Create a blob container to hold the data files.
    • Upload a data file. The data file structure is used to define the table schema. For more information, see Data formats supported by Real-Time Intelligence.

      Note

      You must upload a data file:

      • Before configuration, so that the table schema can be defined during setup.
      • After configuration, to trigger continuous ingestion, preview data, and verify the connection.

      Note

      Continuous ingestion from Azure Storage is also supported when the storage account is configured with private endpoints (Private Link). Make sure that the Fabric workspace can access the storage account through the configured private network path.

Add the workspace identity role assignment to the storage account

  1. From the Workspace settings in Fabric, copy your workspace identity ID.

    Screenshot of the workspace setting, with the workspace ID highlighted.

  2. In the Azure portal, browse to your Azure Storage account, and select Access Control (IAM) > Add > Add role assignment.

  3. Select Storage Blob Data Reader.

  4. In the Add role assignment dialog, select + Select members.

  5. Paste in the workspace identity ID, select the application, and then select Select > Review + assign.

Create a blob container and upload a data file

  1. In the storage account, select Containers.

  2. Select + Container, enter a name for the container, and select Save.

  3. Open the container, select Upload, and upload the data file you prepared earlier.

    For more information, see supported formats and supported compressions.

  4. From the container's context menu ([...]), select Container properties, and copy the URL to use during configuration.

    Screenshot of the container list with the context menu open and the Container properties option highlighted.

Select Azure Storage as the data source

Open the Get Data flow and select Azure Storage as the source.

  1. From your workspace, open the eventhouse, and select the database.

  2. On the KQL database ribbon, select Get Data.

  3. Select the data source from the available list. In this example, you're ingesting data from Azure Storage.

    Screenshot of the get data tiles with the Azure storage option highlighted.

Configure

  1. Select a destination table. To ingest data into a new table, select + New table and enter a table name.

    Note

    Table names can be up to 1,024 characters, including spaces, alphanumeric characters, hyphens, and underscores. Special characters aren't supported.

  2. In Configure Azure Blob Storage connection, make sure Continuous ingestion is turned on. It's turned on by default.

  3. Configure the connection by creating a new connection or by using an existing connection.

    To create a new connection:

    1. Select Connect to a storage account.

      Screenshot of configure tab with Continuous ingestion and connect to an account selected.

    2. Use the following descriptions to help fill in the fields.

      Setting | Field description
      Subscription | The storage account subscription.
      Blob storage account | The storage account name.
      Container | The storage container that contains the file you want to ingest.

    3. In the Connection field, open the dropdown and select + New connection, and then select Save > Close. The connection settings are prepopulated.

    Note

    Creating a new connection also creates a new eventstream named <storage_account_name>_eventstream. Don't remove this continuous ingestion eventstream from the workspace.

    To use an existing connection:

    1. Select Select an existing storage account.

      Screenshot of configure tab with Continuous ingestion and connect to an existing account selected.

    2. Use the following descriptions to help fill in the fields.

      Setting | Field description
      RTAStorageAccount | An eventstream connected to your storage account from Fabric.
      Container | The storage container that contains the file you want to ingest.
      Connection | This field is prepopulated with the connection string.

    3. In the Connection field, open the dropdown and select the existing connection string from the list. Then select Save > Close.

  4. Optionally, expand File filters and specify the following filters:

    Setting | Field description
    Folder path | Filter data to ingest only files with a specific folder path.
    File extension | Filter data to ingest only files with a specific file extension.

  5. In the Eventstream settings section, select the events to monitor in Advanced settings > Event types. By default, Blob created is selected. You can also select Blob renamed.

    Screenshot of Advanced settings with the Event types dropdown expanded.

  6. Select Next to preview the data.
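The configuration choices above can be summarized as a small settings object. The following is a minimal Python sketch, not Fabric's implementation, and the class name is hypothetical. It assumes that the table-name rule in the note means ASCII letters, digits, spaces, hyphens, and underscores; that Folder path behaves as a prefix match and File extension as a suffix match on the blob path; and that events arrive in the Azure Event Grid schema, where blob events have the type Microsoft.Storage.BlobCreated or Microsoft.Storage.BlobRenamed and a subject of the form /blobServices/default/containers/<container>/blobs/<path>.

```python
import re
from dataclasses import dataclass, field

# Event Grid event type names for the two events continuous ingestion can monitor.
BLOB_CREATED = "Microsoft.Storage.BlobCreated"
BLOB_RENAMED = "Microsoft.Storage.BlobRenamed"

# Assumption: "alphanumeric" means ASCII letters and digits; up to 1,024 characters.
_TABLE_NAME_RE = re.compile(r"[A-Za-z0-9 _-]{1,1024}")


@dataclass
class ContinuousIngestionSettings:  # hypothetical name, for illustration only
    table: str
    container: str
    folder_path: str = ""     # optional File filters > Folder path
    file_extension: str = ""  # optional File filters > File extension
    # Blob created is selected by default; Blob renamed can be added.
    event_types: set = field(default_factory=lambda: {BLOB_CREATED})

    def validate(self) -> None:
        """Raise ValueError if the table name or event types are not allowed."""
        if not _TABLE_NAME_RE.fullmatch(self.table):
            raise ValueError(f"invalid table name: {self.table!r}")
        unknown = self.event_types - {BLOB_CREATED, BLOB_RENAMED}
        if unknown:
            raise ValueError(f"unsupported event types: {sorted(unknown)}")

    def should_ingest(self, event: dict) -> bool:
        """Return True if an Event Grid blob event passes the configured filters."""
        if event.get("eventType") not in self.event_types:
            return False
        prefix = f"/blobServices/default/containers/{self.container}/blobs/"
        subject = event.get("subject", "")
        if not subject.startswith(prefix):
            return False  # event belongs to a different container
        blob_path = subject[len(prefix):]
        # Assumption: folder path is a prefix match, extension is a suffix match.
        if self.folder_path and not blob_path.startswith(self.folder_path.strip("/") + "/"):
            return False
        if self.file_extension and not blob_path.endswith(self.file_extension):
            return False
        return True


settings = ContinuousIngestionSettings(
    table="StormEvents", container="mycontainer", folder_path="logs", file_extension=".csv"
)
settings.validate()
event = {
    "eventType": BLOB_CREATED,
    "subject": "/blobServices/default/containers/mycontainer/blobs/logs/day1.csv",
}
print(settings.should_ingest(event))
```

Keeping validation and event matching in one object mirrors the Configure tab: everything the pipeline needs to decide whether a new blob is ingested is set in that one step.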

Inspect preview data before ingestion

The Inspect tab opens with a preview of the data.

To complete the ingestion process, select Finish.

Screenshot of the Inspect tab showing a preview of ingested data before selecting Finish.

Note

To trigger continuous ingestion and preview data, upload a new blob after you complete the configuration.

Optionally:

  • Use the schema definition file dropdown to change the file that the schema is inferred from.

  • Use the file type dropdown to explore Advanced options based on data type.

  • Use the Table_mapping dropdown to define a new mapping.

  • Select </> to open the command viewer to view and copy the automatic commands generated from your inputs. You can also open the commands in a KQL queryset.

  • Select the pencil icon to Edit columns.
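To give a sense of what the command viewer shows, the following Python sketch builds a `.create table` command and a CSV ingestion mapping from a list of columns. It approximates the kind of commands generated rather than reproducing Fabric's exact output: the helper names, the `_mapping` suffix, and the sample columns are hypothetical. Identifiers are bracket-quoted (`['name']`) so that names with spaces or hyphens remain valid in KQL; the sketch assumes names contain no quote characters, which the table-naming rule already excludes.

```python
import json


def quote(name: str) -> str:
    """Bracket-quote a KQL identifier so spaces and hyphens are allowed."""
    return f"['{name}']"


def create_table_command(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a .create table command from (column name, KQL type) pairs."""
    cols = ", ".join(f"{quote(name)}:{kql_type}" for name, kql_type in columns)
    return f".create table {quote(table)} ({cols})"


def csv_mapping_command(table: str, mapping_name: str, columns: list[tuple[str, str]]) -> str:
    """Build a CSV ingestion mapping that maps source fields to columns by ordinal."""
    mapping = [
        {"column": name, "Properties": {"Ordinal": str(i)}}
        for i, (name, _) in enumerate(columns)
    ]
    return (
        f".create table {quote(table)} ingestion csv mapping "
        f"{json.dumps(mapping_name)} '{json.dumps(mapping)}'"
    )


columns = [("Timestamp", "datetime"), ("Device Id", "string"), ("Reading", "real")]
print(create_table_command("Telemetry", columns))
print(csv_mapping_command("Telemetry", "Telemetry_mapping", columns))
```

Running the printed commands in a KQL queryset is one way to reproduce a setup outside the Get Data flow, but compare them against what the command viewer actually shows for your inputs.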

Edit columns

Note

  • For tabular formats (CSV, TSV, PSV), you can't map a column twice. To map to an existing column, first delete the new column.
  • You can't change the type of an existing column. If you map to a column that has a different format, you might end up with empty columns.

The changes you can make in a table depend on the following parameters:

  • Table type is new or existing
  • Mapping type is new or existing

Table type | Mapping type | Available adjustments
New table | New mapping | Rename column, change data type, change data source, mapping transformation, add column, delete column
Existing table | New mapping | Add column (you can then change its data type, rename it, and update it)
Existing table | Existing mapping | None

Screenshot of the column editor with table columns open for editing names, data types, and mappings.
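The two rules in the note at the start of this section can be checked before ingestion. This Python sketch is illustrative only. It represents the existing table schema as a dict of column name to type and the proposed mapping as a list of (source, target column, type) entries. Both structures and the function name are hypothetical, not a Fabric API.

```python
from collections import Counter


def check_column_edits(
    existing_schema: dict[str, str], mapping: list[tuple[str, str, str]]
) -> list[str]:
    """Return a list of problems with a proposed tabular mapping.

    existing_schema: column name -> KQL type of the destination table (empty for a new table).
    mapping: (source column, target column, target type) entries.
    """
    problems = []
    # Tabular formats (CSV, TSV, PSV) can't map two sources to the same column.
    counts = Counter(target for _, target, _ in mapping)
    for target, n in counts.items():
        if n > 1:
            problems.append(f"column {target!r} is mapped {n} times")
    # Existing column types can't be changed; a mismatch can produce empty values.
    for _, target, kql_type in mapping:
        current = existing_schema.get(target)
        if current is not None and current != kql_type:
            problems.append(f"column {target!r} is {current}, not {kql_type}")
    return problems


schema = {"Id": "long", "Name": "string"}
print(check_column_edits(schema, [("0", "Id", "long"), ("1", "Id", "string")]))
```

An empty list means neither rule is violated. Any other result points to the column to delete or remap in the Edit columns window.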

Mapping transformations

Some data format mappings (Parquet, JSON, and Avro) support simple ingest-time transformations. To apply mapping transformations, create or update a column in the Edit columns window.

Mapping transformations can be performed on a column of type string or datetime, with the source having data type int or long. For more information, see the full list of supported mapping transformations.
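For example, a JSON mapping can attach a transformation to a column so that an integer Unix timestamp in the source becomes a datetime value in the table. The Python sketch below builds a mapping entry that uses DateTimeFromUnixMilliseconds, one of the Unix-epoch transformations, and emulates its effect locally so you can preview the result. The column name, JSON path, and sample value are hypothetical. The local conversion approximates what the service does at ingestion time and is not the service's implementation.

```python
import json
from datetime import datetime, timedelta, timezone

# Units per second for each Unix-epoch transformation's input value.
UNIX_EPOCH_TRANSFORMS = {
    "DateTimeFromUnixSeconds": 1,
    "DateTimeFromUnixMilliseconds": 1_000,
    "DateTimeFromUnixMicroseconds": 1_000_000,
    "DateTimeFromUnixNanoseconds": 1_000_000_000,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def mapping_entry(column: str, json_path: str, transform: str) -> dict:
    """Build one JSON mapping entry that applies a Unix-epoch transformation."""
    if transform not in UNIX_EPOCH_TRANSFORMS:
        raise ValueError(f"not a Unix-epoch transformation: {transform}")
    return {"column": column, "Properties": {"Path": json_path, "Transform": transform}}


def preview(value: int, transform: str) -> datetime:
    """Local approximation of the transformation, for previewing only."""
    # Integer arithmetic avoids float rounding; detail below one microsecond
    # is dropped because Python datetime has microsecond resolution.
    micros = value * 1_000_000 // UNIX_EPOCH_TRANSFORMS[transform]
    return EPOCH + timedelta(microseconds=micros)


entry = mapping_entry("Timestamp", "$.ts", "DateTimeFromUnixMilliseconds")
print(json.dumps([entry]))
print(preview(1_700_000_000_000, "DateTimeFromUnixMilliseconds"))
```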

Advanced options based on data type

Tabular (CSV, TSV, PSV):

  • If you're ingesting tabular formats in an existing table, you can select Advanced > Keep table schema. Tabular data doesn't necessarily include the column names that are used to map source data to the existing columns. When this option is checked, mapping is done by order, and the table schema remains the same. If this option is unchecked, new columns are created for incoming data, regardless of data structure.

    Screenshot of advanced options.

  • Tabular data doesn't necessarily include the column names that are used to map source data to the existing columns. To use the first row as column names, select First row is column header.

    Screenshot of the First row is column header switch.

Tabular (CSV, TSV, PSV):

  • If you're ingesting tabular formats in an existing table, select Table_mapping > Use existing schema. Tabular data doesn't necessarily include the column names that are used to map source data to the existing columns. When you select this option, mapping is done by order, and the table schema remains the same. If you clear this option, new columns are created for incoming data, regardless of data structure.

  • To use the first row as column names, select First row header.

    Screenshot of advanced CSV options, including controls for first row header and delimiter settings.
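The difference between mapping by order and using the first row as a header can be sketched with Python's csv module. This illustrates the two behaviors described above and isn't Fabric's ingestion code. The function names and sample data are hypothetical. With the existing schema kept, values go to the table's columns by position. When the first row is a header, its names identify the columns instead.

```python
import csv
import io


def map_by_order(text: str, table_columns: list[str], skip_header: bool = False) -> list[dict]:
    """Assign values to existing table columns by position (existing schema kept)."""
    rows = list(csv.reader(io.StringIO(text)))
    if skip_header:
        rows = rows[1:]
    # zip stops at the shorter sequence: extra source fields are dropped,
    # and columns with no matching field are left out of the row.
    return [dict(zip(table_columns, row)) for row in rows]


def map_by_header(text: str) -> list[dict]:
    """Use the first row as column names."""
    return list(csv.DictReader(io.StringIO(text)))


data = "name,age\nAda,36\n"
print(map_by_header(data))
print(map_by_order(data, ["Name", "Age"], skip_header=True))
```

Note that mapping by order ignores the source names entirely. If the file's column order differs from the table's, values land in the wrong columns without any error.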

JSON:

  • To control how nested JSON data is split into columns, set Nested levels to a value from 1 to 100.

    Screenshot of advanced JSON options, including the Nested levels setting used to split JSON columns.
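One way to picture the Nested levels setting is that properties down to the chosen depth become separate columns, and anything deeper stays together in a single column (in KQL, typically of type dynamic). The Python sketch below flattens a JSON object to a given depth. The underscore-joined column names and the exact depth semantics are assumptions for illustration. The names Fabric generates might differ.

```python
def flatten(obj: dict, levels: int, prefix: str = "") -> dict:
    """Split nested objects into columns down to `levels` levels deep."""
    if not 1 <= levels <= 100:
        raise ValueError("levels must be between 1 and 100")
    columns = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict) and levels > 1:
            # Descend one level; the remaining depth shrinks by one.
            columns.update(flatten(value, levels - 1, name))
        else:
            # Deeper structure is kept whole, like a dynamic column.
            columns[name] = value
    return columns


record = {"id": 1, "device": {"name": "d1", "geo": {"lat": 1.5, "lon": 2.5}}}
for depth in (1, 2, 3):
    print(depth, flatten(record, depth))
```

A higher level produces more, narrower columns. A lower level keeps related fields together in fewer columns that you can still query with dynamic-property access.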

Review the ingestion summary

In the Summary window, all the steps show green check marks when data ingestion finishes successfully. You can select a card to explore the data, delete the ingested data, or create a dashboard with key metrics.

Screenshot of summary page for continuous ingestion with successful ingestion completed.

When you close the window, you can see the connection in the Explorer tab, under Data streams. From here, you can filter the data streams and delete a data stream.

Screenshot of the KQL database explorer with Data streams highlighted.