You can query data stored in Azure Blob Storage from SQL Server using PolyBase or OPENROWSET. Here’s a quick guide:

Prerequisites

  • SQL Server 2019 or later
  • Azure Blob Storage account
  • Data files (e.g., CSV, Parquet) in Blob Storage
  • Database master key created

Steps

1. Create Database Scoped Credential

Work with Azure admin to have the SAS token beforehand Then create below the Credential and External Data Source in a database

CREATE DATABASE SCOPED CREDENTIAL AzureBlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<your-SAS-token>';

2. Create External Data Source

CREATE EXTERNAL DATA SOURCE AzureBlobStorage
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://<your-storage-account>.blob.core.windows.net/<container>',
    CREDENTIAL = AzureBlobCredential
);

3. Query Data Using OPENROWSET

SELECT *
FROM OPENROWSET(
    BULK 'yourfile.csv',
    DATA_SOURCE = 'AzureBlobStorage',
    FORMAT = 'CSV',
    FIRSTROW = 2
) AS DataFile;

4. Yet another specific case to extract data from a bulk of XML files

First use API call to retrieve a file list. Note the parameters used below to query specifically

    SET @BlobAPILink =  [Azure Blob storage URL]+'?restype=container&comp=list&prefix=archive/PAIN002v3&maxresults=1000&'+@sasSecret;

....

    SET @response = DataPortal.dbo.[clr_http_request_modern]('GET', @BlobAPILink_Next, @parameters, @header, @Options);
    SET @response_json = @response.value('Response[1]/Body[1]', 'NVARCHAR(MAX)') 

    -- Remove XML declaration
    SET @pos = CHARINDEX('?>', @response_json);
    IF @pos > 0  
        SET @response_json = LTRIM(STUFF(@response_json, 1, @pos + 1, ''));

    SET @response_XML = CAST(@response_json AS XML);

    -- Extract the NextMarker value to construct next API call
    SET @NextMarker = @response_XML.value('(/EnumerationResults/NextMarker)[1]', 'VARCHAR(MAX)') ;

....

    SELECT
        Blob.value('(Name)[1]', 'VARCHAR(255)') AS RejectedFileName,
        REPLACE(Blob.value('(Properties/Creation-Time)[1]', 'VARCHAR(100)'), 'GMT','') AS CreationTime
    FROM
        @response_XML.nodes('/EnumerationResults/Blobs/Blob') AS XTbl(Blob)

Then use OPENROWSET to open the XML content

    SET @BlobQuery = N'select @output = BulkColumn
        FROM OPENROWSET(
            BULK '''+ @RejectedFileName+''',
            DATA_SOURCE = ''DS_ifspaymentstobanks'',
            SINGLE_BLOB
        ) AS DataFile';

    EXEC sp_executesql @BlobQuery, N'@output varchar(max) OUTPUT', @output = @xmlDoc OUTPUT;

Finally use XPath to extract data out of the XML

    SELECT
        -- Group Header
        Headers.value('(ns:GrpHdr/ns:MsgId)[1]', 'VARCHAR(50)') AS MsgId,
        Headers.value('(ns:GrpHdr/ns:CreDtTm)[1]', 'DATETIME') AS  CreDtTm,
        Headers.value('(ns:GrpHdr/ns:InitgPty/ns:Nm)[1]', 'VARCHAR(100)') AS  InitgPty_Nm,
        Headers.value('(ns:GrpHdr/ns:InitgPty/ns:Id/ns:OrgId/ns:BICOrBEI)[1]', 'VARCHAR(50)') AS  InitgPty_BICOrBEI,

        -- Original Group Info
        Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlMsgId)[1]', 'VARCHAR(50)') AS OrgnlMsgId,
        Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlMsgNmId)[1]', 'VARCHAR(50)') AS OrgnlMsgNmId,
        Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlCreDtTm)[1]', 'DATETIME') AS OrgnlCreDtTm,
        Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlNbOfTxs)[1]', 'INT') AS OrgnlNbOfTxs,
        Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlCtrlSum)[1]', 'DECIMAL(18,2)') AS OrgnlCtrlSum,

        -- Transaction Info
        Headers.value('(ns:OrgnlPmtInfAndSts/ns:OrgnlPmtInfId)[1]', 'VARCHAR(50)') AS OrgnlPmtInfId,

        Txs.value('(ns:StsId)[1]', 'VARCHAR(100)') AS StsId,
        Txs.value('(ns:OrgnlInstrId)[1]', 'VARCHAR(50)') AS OrgnlInstrId,
        Txs.value('(ns:OrgnlEndToEndId)[1]', 'VARCHAR(50)') AS OrgnlEndToEndId,
        Txs.value('(ns:TxSts)[1]', 'VARCHAR(50)') AS TxSts,

        Txs.value('(ns:StsRsnInf/ns:Rsn/ns:Cd)[1]', 'VARCHAR(50)') AS StsRsnInf_Rsn_Cd,
        Txs.value('(ns:StsRsnInf/ns:AddtlInf)[1]', 'VARCHAR(500)') AS StsRsnInf_AddtlInf,

        Txs.value('(ns:OrgnlTxRef/ns:Amt/ns:InstdAmt)[1]', 'DECIMAL(18,2)') AS OrgnlTxRef_InstdAmt,
        Txs.value('(ns:OrgnlTxRef/ns:Amt/ns:InstdAmt/@Ccy)[1]', 'VARCHAR(3)') AS OrgnlTxRef_Ccy,
        Txs.value('(ns:OrgnlTxRef/ns:ReqdExctnDt)[1]', 'DATE') AS OrgnlTxRef_ReqdExctnDt,
        Txs.value('(ns:OrgnlTxRef/ns:Dbtr/ns:Nm)[1]', 'VARCHAR(100)') AS OrgnlTxRef_Dbtr_Nm,

        Txs.value('(ns:OrgnlTxRef/ns:Dbtr/ns:Id/ns:OrgId/ns:Othr/ns:Id)[1]', 'VARCHAR(50)') AS OrgnlTxRef_Dbtr_Id,

        Txs.value('(ns:OrgnlTxRef/ns:DbtrAcct/ns:Id/ns:Othr/ns:Id)[1]', 'VARCHAR(50)') AS OrgnlTxRef_DbtrAcct_Id,

        Txs.value('(ns:OrgnlTxRef/ns:DbtrAgt/ns:FinInstnId/ns:BIC)[1]', 'VARCHAR(50)') AS OrgnlTxRef_DbtrAgt_BIC,
        Txs.value('(ns:OrgnlTxRef/ns:DbtrAgt/ns:FinInstnId/ns:ClrSysMmbId)[1]', 'VARCHAR(50)') AS OrgnlTxRef_DbtrAgt_ClrSysMmbId,

        Txs.value('(ns:OrgnlTxRef/ns:CdtrAgt/ns:FinInstnId/ns:ClrSysMmbId)[1]', 'VARCHAR(50)') AS OrgnlTxRef_CdtrAgt_ClrSysMmbId,
        Txs.value('(ns:OrgnlTxRef/ns:CdtrAgt/ns:FinInstnId/ns:PstlAdr/ns:Ctry)[1]', 'VARCHAR(50)') AS OrgnlTxRef_CdtrAgt_PstlAdr_Ctry,

        Txs.value('(ns:OrgnlTxRef/ns:Cdtr/ns:Nm)[1]', 'VARCHAR(100)') AS OrgnlTxRef_Cdtr_Nm,
        Txs.value('(ns:OrgnlTxRef/ns:CdtrAcct/ns:Id/ns:Othr/ns:Id)[1]', 'VARCHAR(50)') AS OrgnlTxRef_CdtrAcct_Id
    FROM
        @xmlData.nodes('/ns:Document/ns:CstmrPmtStsRpt') AS S(Headers)
    CROSS APPLY
        @xmlData.nodes('/ns:Document/ns:CstmrPmtStsRpt/ns:OrgnlPmtInfAndSts/ns:TxInfAndSts') AS T(Txs);

Notes

  • Replace placeholders with your actual values.
  • Ensure PolyBase is enabled if using external tables.
  • For more advanced scenarios, consider creating external tables.

References


Published

Category

Azure Blob Storage

Tags

Contact