You can query data stored in Azure Blob Storage from SQL Server using PolyBase or OPENROWSET. Here’s a quick guide:
Prerequisites
- SQL Server 2019 or later
- Azure Blob Storage account
- Data files (e.g., CSV, Parquet) in Blob Storage
- Database master key created
Steps
1. Create Database Scoped Credential
Work with Azure admin to have the SAS token beforehand Then create below the Credential and External Data Source in a database
CREATE DATABASE SCOPED CREDENTIAL AzureBlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<your-SAS-token>';
2. Create External Data Source
CREATE EXTERNAL DATA SOURCE AzureBlobStorage
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://<your-storage-account>.blob.core.windows.net/<container>',
CREDENTIAL = AzureBlobCredential
);
3. Query Data Using OPENROWSET
SELECT *
FROM OPENROWSET(
BULK 'yourfile.csv',
DATA_SOURCE = 'AzureBlobStorage',
FORMAT = 'CSV',
FIRSTROW = 2
) AS DataFile;
4. Yet another specific case to extract data from a bulk of XML files
First use API call to retrieve a file list. Note the parameters used below to query specifically
SET @BlobAPILink = [Azure Blob storage URL]+'?restype=container&comp=list&prefix=archive/PAIN002v3&maxresults=1000&'+@sasSecret;
....
SET @response = DataPortal.dbo.[clr_http_request_modern]('GET', @BlobAPILink_Next, @parameters, @header, @Options);
SET @response_json = @response.value('Response[1]/Body[1]', 'NVARCHAR(MAX)')
-- Remove XML declaration
SET @pos = CHARINDEX('?>', @response_json);
IF @pos > 0
SET @response_json = LTRIM(STUFF(@response_json, 1, @pos + 1, ''));
SET @response_XML = CAST(@response_json AS XML);
-- Extract the NextMarker value to construct next API call
SET @NextMarker = @response_XML.value('(/EnumerationResults/NextMarker)[1]', 'VARCHAR(MAX)') ;
....
SELECT
Blob.value('(Name)[1]', 'VARCHAR(255)') AS RejectedFileName,
REPLACE(Blob.value('(Properties/Creation-Time)[1]', 'VARCHAR(100)'), 'GMT','') AS CreationTime
FROM
@response_XML.nodes('/EnumerationResults/Blobs/Blob') AS XTbl(Blob)
Then use OPENROWSET to open the XML content
SET @BlobQuery = N'select @output = BulkColumn
FROM OPENROWSET(
BULK '''+ @RejectedFileName+''',
DATA_SOURCE = ''DS_ifspaymentstobanks'',
SINGLE_BLOB
) AS DataFile';
EXEC sp_executesql @BlobQuery, N'@output varchar(max) OUTPUT', @output = @xmlDoc OUTPUT;
Finally use XPath to extract data out of the XML
SELECT
-- Group Header
Headers.value('(ns:GrpHdr/ns:MsgId)[1]', 'VARCHAR(50)') AS MsgId,
Headers.value('(ns:GrpHdr/ns:CreDtTm)[1]', 'DATETIME') AS CreDtTm,
Headers.value('(ns:GrpHdr/ns:InitgPty/ns:Nm)[1]', 'VARCHAR(100)') AS InitgPty_Nm,
Headers.value('(ns:GrpHdr/ns:InitgPty/ns:Id/ns:OrgId/ns:BICOrBEI)[1]', 'VARCHAR(50)') AS InitgPty_BICOrBEI,
-- Original Group Info
Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlMsgId)[1]', 'VARCHAR(50)') AS OrgnlMsgId,
Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlMsgNmId)[1]', 'VARCHAR(50)') AS OrgnlMsgNmId,
Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlCreDtTm)[1]', 'DATETIME') AS OrgnlCreDtTm,
Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlNbOfTxs)[1]', 'INT') AS OrgnlNbOfTxs,
Headers.value('(ns:OrgnlGrpInfAndSts/ns:OrgnlCtrlSum)[1]', 'DECIMAL(18,2)') AS OrgnlCtrlSum,
-- Transaction Info
Headers.value('(ns:OrgnlPmtInfAndSts/ns:OrgnlPmtInfId)[1]', 'VARCHAR(50)') AS OrgnlPmtInfId,
Txs.value('(ns:StsId)[1]', 'VARCHAR(100)') AS StsId,
Txs.value('(ns:OrgnlInstrId)[1]', 'VARCHAR(50)') AS OrgnlInstrId,
Txs.value('(ns:OrgnlEndToEndId)[1]', 'VARCHAR(50)') AS OrgnlEndToEndId,
Txs.value('(ns:TxSts)[1]', 'VARCHAR(50)') AS TxSts,
Txs.value('(ns:StsRsnInf/ns:Rsn/ns:Cd)[1]', 'VARCHAR(50)') AS StsRsnInf_Rsn_Cd,
Txs.value('(ns:StsRsnInf/ns:AddtlInf)[1]', 'VARCHAR(500)') AS StsRsnInf_AddtlInf,
Txs.value('(ns:OrgnlTxRef/ns:Amt/ns:InstdAmt)[1]', 'DECIMAL(18,2)') AS OrgnlTxRef_InstdAmt,
Txs.value('(ns:OrgnlTxRef/ns:Amt/ns:InstdAmt/@Ccy)[1]', 'VARCHAR(3)') AS OrgnlTxRef_Ccy,
Txs.value('(ns:OrgnlTxRef/ns:ReqdExctnDt)[1]', 'DATE') AS OrgnlTxRef_ReqdExctnDt,
Txs.value('(ns:OrgnlTxRef/ns:Dbtr/ns:Nm)[1]', 'VARCHAR(100)') AS OrgnlTxRef_Dbtr_Nm,
Txs.value('(ns:OrgnlTxRef/ns:Dbtr/ns:Id/ns:OrgId/ns:Othr/ns:Id)[1]', 'VARCHAR(50)') AS OrgnlTxRef_Dbtr_Id,
Txs.value('(ns:OrgnlTxRef/ns:DbtrAcct/ns:Id/ns:Othr/ns:Id)[1]', 'VARCHAR(50)') AS OrgnlTxRef_DbtrAcct_Id,
Txs.value('(ns:OrgnlTxRef/ns:DbtrAgt/ns:FinInstnId/ns:BIC)[1]', 'VARCHAR(50)') AS OrgnlTxRef_DbtrAgt_BIC,
Txs.value('(ns:OrgnlTxRef/ns:DbtrAgt/ns:FinInstnId/ns:ClrSysMmbId)[1]', 'VARCHAR(50)') AS OrgnlTxRef_DbtrAgt_ClrSysMmbId,
Txs.value('(ns:OrgnlTxRef/ns:CdtrAgt/ns:FinInstnId/ns:ClrSysMmbId)[1]', 'VARCHAR(50)') AS OrgnlTxRef_CdtrAgt_ClrSysMmbId,
Txs.value('(ns:OrgnlTxRef/ns:CdtrAgt/ns:FinInstnId/ns:PstlAdr/ns:Ctry)[1]', 'VARCHAR(50)') AS OrgnlTxRef_CdtrAgt_PstlAdr_Ctry,
Txs.value('(ns:OrgnlTxRef/ns:Cdtr/ns:Nm)[1]', 'VARCHAR(100)') AS OrgnlTxRef_Cdtr_Nm,
Txs.value('(ns:OrgnlTxRef/ns:CdtrAcct/ns:Id/ns:Othr/ns:Id)[1]', 'VARCHAR(50)') AS OrgnlTxRef_CdtrAcct_Id
FROM
@xmlData.nodes('/ns:Document/ns:CstmrPmtStsRpt') AS S(Headers)
CROSS APPLY
@xmlData.nodes('/ns:Document/ns:CstmrPmtStsRpt/ns:OrgnlPmtInfAndSts/ns:TxInfAndSts') AS T(Txs);
Notes
- Replace placeholders with your actual values.
- Ensure PolyBase is enabled if using external tables.
- For more advanced scenarios, consider creating external tables.