Error while running U-SQL Activity in Pipeline in Azure Data Factory

1.7k views Asked by At

I am getting following error while running a USQL Activity in the pipeline in ADF:

Error in Activity:

{"errorId":"E_CSC_USER_SYNTAXERROR","severity":"Error","component":"CSC",
    "source":"USER","message":"syntax error.
     Final statement did not end with a semicolon","details":"at token 'txt', line 3\r\nnear the ###:\r\n**************\r\nDECLARE @in string = \"/demo/SearchLog.txt\";\nDECLARE @out string = \"/scripts/Result.txt\";\nSearchLogProcessing.txt ### \n",
    "description":"Invalid syntax found in the script.",
    "resolution":"Correct the script syntax, using expected token(s) as a guide.","helpLink":"","filePath":"","lineNumber":3,
    "startOffset":109,"endOffset":112}].

Here is the code of output dataset, pipeline and USQL script which i am trying to execute in pipeline.

OutputDataset:

{
"name": "OutputDataLakeTable",
"properties": {
    "published": false,
    "type": "AzureDataLakeStore",
    "linkedServiceName": "LinkedServiceDestination",
    "typeProperties": {
        "folderPath": "scripts/"
    },
    "availability": {
        "frequency": "Hour",
        "interval": 1
    }
}

Pipeline:

{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "script": "SearchLogProcessing.txt",
                    "scriptPath": "scripts\\",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/demo/SearchLog.txt",
                        "out": "/scripts/Result.txt"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataLakeTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataLakeTable"
                    }
                ],
                "policy": {
                    "timeout": "06:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "CopybyU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2017-01-03T12:01:05.53Z",
        "end": "2017-01-03T13:01:05.53Z",
        "isPaused": false,
        "hubName": "denojaidbfactory_hub",
        "pipelineMode": "Scheduled"
    }
}

Here is my USQL Script which i am trying to execute using "DataLakeAnalyticsU-SQL" Activity Type.

@searchlog =
    EXTRACT UserId          int,
            Start           DateTime,
            Region          string,
            Query           string,
            Duration        int?,
            Urls            string,
            ClickedUrls     string
    FROM @in
    USING Extractors.Text(delimiter:'|');

@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
WHERE Region == "kota";


OUTPUT @rs1   
    TO @out
      USING Outputters.Text(delimiter:'|');

Please suggest me how to resolve this issue.

3

There are 3 answers

4
wBob On BEST ANSWER

Your script is missing the scriptLinkedService attribute. You also (currently) need to place the U-SQL script in Azure Blob Storage to run it successfully. Therefore you also need an AzureStorage Linked Service, for example:

{
    "name": "StorageLinkedService",
    "properties": {
        "description": "",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=myAzureBlobStorageAccount;AccountKey=**********"
        }
    }
}

Create this linked service, replacing the Blob storage name myAzureBlobStorageAccount with your relevant Blob Storage account, then place the U-SQL script (SearchLogProcessing.txt) in a container there and try again. In my example pipeline below, I have a container called adlascripts in my Blob store and the script is in there:

Make sure the scriptPath is complete, as Alexandre mentioned. Start of the pipeline:

{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "adlascripts\\SearchLogProcessing.txt",
                    "scriptLinkedService": "StorageLinkedService",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/input/SearchLog.tsv",
                        "out": "/output/Result.tsv"
                    }
                },
...

The input and output .tsv files can be in the data lake and use the the AzureDataLakeStoreLinkedService linked service.

I can see you are trying to follow the demo from: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity#script-definition. It is not the most intuitive demo and there seem to be some issues like where is the definition for StorageLinkedService?, where is SearchLogProcessing.txt? OK I found it by googling but there should be a link in the webpage. I got it to work but felt a bit like Harry Potter in the Half-Blood Prince.

3
Alexandre Gattiker On

Remove the script attribute in your U-SQL activity definition and provide the complete path to your script (including filename) in the scriptPath attribute.

Reference: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity

0
Øystein S On

I had a similary issue, where Azure Data Factory would not recognize my script files. A way to avoid the whole issue, while not having to paste a lot of code, is to register a stored procedure. You can do it like this:

DROP PROCEDURE IF EXISTS master.dbo.sp_test;
CREATE PROCEDURE master.dbo.sp_test() 
AS  
BEGIN 

@searchlog =
EXTRACT UserId          int,
        Start           DateTime,
        Region          string,
        Query           string,
        Duration        int?,
        Urls            string,
        ClickedUrls     string
FROM @in
USING Extractors.Text(delimiter:'|');

@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
WHERE Region == "kota";


OUTPUT @rs1   
    TO @out
      USING Outputters.Text(delimiter:'|');
END;

After running this, you can use

"script": "master.dbo.sp_test()"

in your JSON pipeline definition. Whenever you update the U-SQL script, simply re-run the definition of the procedure. Then there will be no need to copy script files to Blob Storage.