Input Dataset not working


I've created an Azure Data Factory pipeline that schedules a U-SQL script using the "DataLakeAnalyticsU-SQL" activity. See the code below:

InputDataset:
{
    "name": "InputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceSource",
        "typeProperties": {
            "fileName": "SearchLog.txt",
            "folderPath": "demo/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "|",
                "quoteChar": "\""
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

OutputDataset:
{
    "name": "OutputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceDestination",
        "typeProperties": {
            "folderPath": "scripts/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

Pipeline:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "scripts\\SearchLogProcessing.txt",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/demo/SearchLog.txt",
                        "out": "/scripts/Result.txt"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataLakeTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataLakeTable"
                    }
                ],
                "policy": {
                    "timeout": "06:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "CopybyU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2016-12-21T17:44:13.557Z",
        "end": "2016-12-22T17:44:13.557Z",
        "isPaused": false,
        "hubName": "denojaidbfactory_hub",
        "pipelineMode": "Scheduled"
    }
}

I've created all the required linked services successfully. But after deploying the pipeline, no time slices are created for the input dataset (screenshot: no time slice created for the input dataset).

Meanwhile, the output dataset is waiting for an upstream time slice from the input dataset. As a result, the output dataset's time slices remain in the Pending Execution state and my Azure Data Factory pipeline is not working (screenshot: the output dataset expects a time slice from the input dataset and remains in the pending state). Any suggestions on how to resolve this issue?

Accepted answer by Alexandre Gattiker:

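If no other activity in your data factory produces InputDataLakeTable, Data Factory keeps waiting for an upstream activity to create its slices, so the downstream output slices never leave the pending state. To tell Data Factory that the data is produced outside the factory, add the following attribute to the input dataset's "properties" section: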

"external": true
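As a sketch, this is the input dataset from the question with the flag added. The "policy"/"externalData" block is optional, and its retry values are illustrative assumptions rather than values from the question; it controls how often and for how long Data Factory re-checks for the external file before marking a slice as failed.

```json
{
    "name": "InputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceSource",
        "external": true,
        "typeProperties": {
            "fileName": "SearchLog.txt",
            "folderPath": "demo/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "|",
                "quoteChar": "\""
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "policy": {
            "externalData": {
                "retryInterval": "00:01:00",
                "retryTimeout": "00:10:00",
                "maximumRetry": 3
            }
        }
    }
}
```

The output dataset and the pipeline can stay as they are; once the input slices are validated as Ready, the hourly U-SQL activity runs and produces the output slices.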

https://learn.microsoft.com/en-us/azure/data-factory/data-factory-faq

https://learn.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets