I have a parent folder named Source in ADLS Gen2. It contains many subfolders, and those subfolders contain the actual data files, as in the example below...
***Source:***
Folder name: 20221212
A_20221212.txt B_20221212.txt C_20221212.txt
Folder name: 20221219
A_20221219.txt B_20221219.txt C_20221219.txt
Folder name: 20221226
A_20221226.txt B_20221226.txt C_20221226.txt
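How can I use Azure Data Factory to copy the files from the subfolders into specific named folders (creating a new folder if it does not exist yet)? See the example below...
***Target:***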
Folder name: A
A_20221212.txt A_20221219.txt A_20221226.txt
Folder name: B
B_20221212.txt B_20221219.txt B_20221226.txt
Folder name: C
C_20221212.txt C_20221219.txt C_20221226.txt
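The Python sketch below makes the desired mapping concrete. It is local Python, not ADF code. It groups the source paths by the part of the file name before the first underscore. It assumes every file is named `<prefix>_<date>.txt`, as in the example. `group_by_prefix` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_prefix(source_paths):
    """Group '<date>/<prefix>_<date>.txt' paths by <prefix>, i.e. the target folder."""
    layout = defaultdict(list)
    for path in source_paths:
        file_name = PurePosixPath(path).name        # drop the date folder
        layout[file_name.split("_")[0]].append(file_name)
    # Sort folders and files so the listing is deterministic.
    return {folder: sorted(files) for folder, files in sorted(layout.items())}


# Hypothetical listing that mirrors the Source example above.
source_paths = [
    f"{date}/{prefix}_{date}.txt"
    for date in ("20221212", "20221219", "20221226")
    for prefix in "ABC"
]

for folder, files in group_by_prefix(source_paths).items():
    print(f"Folder name: {folder}")
    print("  " + " ".join(files))
```

Running it prints a listing with the same shape as the Target example.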
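Thanks a lot for your help.
Best answer
I reproduced your scenario and got the results below.
If your folders are all at the same directory level, you can use the Get Metadata activity with the following procedure.
This is my source folder structure: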
data
  20221212
    A_20221212.txt
    B_20221212.txt
    C_20221212.txt
  20221219
    A_20221219.txt
    B_20221219.txt
    C_20221219.txt
  20221226
    A_20221226.txt
    B_20221226.txt
    C_20221226.txt
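Source dataset:
Use this dataset in the Get Metadata activity and add the Child items (`childItems`) field to its field list.
Then pass the `childItems` array from the Get Metadata activity to a ForEach activity. Inside the ForEach, I used a Set Variable activity to store the folder name: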
@split(item().name,'_')[0]
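You can mimic the ForEach plus Set Variable step locally to see what `@split(item().name,'_')[0]` yields for each child item. This Python sketch assumes the Get Metadata output is a list of objects with `name` and `type` fields, which is the shape `childItems` returns. The sample values are hypothetical and assume the dataset points at a folder that directly contains the data files.

```python
# Hypothetical sample of a Get Metadata "childItems" output: each entry has
# a "name" and a "type" ("File" or "Folder").
child_items = [
    {"name": "A_20221212.txt", "type": "File"},
    {"name": "B_20221212.txt", "type": "File"},
    {"name": "C_20221212.txt", "type": "File"},
]


def folder_name_for(item_name: str) -> str:
    """Local equivalent of @split(item().name,'_')[0]: the text before the first '_'."""
    return item_name.split("_")[0]


# Sequential loop, mirroring the ForEach with "isSequential": true.
for item in child_items:
    folder_name = folder_name_for(item["name"])  # the value Set variable1 stores
    print(f"{item['name']} -> {folder_name}")
```

A name with no underscore comes back unchanged as the single element. Both Python's `str.split` and ADF's `split` behave this way, so a folder name such as `20221212` would map to itself.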
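Now add a Copy activity and use a wildcard path in its source, as in the `wildcardFolderPath` and `wildcardFileName` settings of the pipeline JSON below.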
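To check which files a wildcard source picks up, you can approximate it locally with Python's `fnmatch.fnmatchcase`. In ADF, `*` matches zero or more characters, and `fnmatchcase` treats `*` the same way. Only `*` is used here, because other wildcard characters do not behave identically. The file list and the `matches` helper are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical source listing, relative to the "data" folder.
source_files = [
    f"{date}/{prefix}_{date}.txt"
    for date in ("20221212", "20221219", "20221226")
    for prefix in "ABC"
]


def matches(path: str, folder_pattern: str, file_pattern: str) -> bool:
    """Rough stand-in for wildcardFolderPath + wildcardFileName (only '*' is used)."""
    p = PurePosixPath(path)
    return fnmatchcase(str(p.parent), folder_pattern) and fnmatchcase(p.name, file_pattern)


# An exact file name matches only that one file ...
print([f for f in source_files if matches(f, "*", "A_20221212.txt")])
# ... while a prefix pattern matches that prefix in every date folder.
print([f for f in source_files if matches(f, "*", "A_*.txt")])
```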
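For the sink, create dataset parameters (`folder_name` and `file_name`) and pass values to them from the Copy activity sink, as in the `outputs` section of the pipeline JSON below.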
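The Python sketch below illustrates the sink side: a parameterised folder that is created on first write. It copies each file into `<target>/<prefix>/` on the local file system. This is an analogy, not ADF code. `copy_by_prefix` is a hypothetical helper, and `mkdir(parents=True, exist_ok=True)` stands in for the sink creating the target folder when it does not exist yet.

```python
import shutil
import tempfile
from pathlib import Path


def copy_by_prefix(source_root: Path, target_root: Path) -> list[Path]:
    """Copy every <date>/<prefix>_<date>.txt under source_root into target_root/<prefix>/."""
    copied = []
    for src in sorted(source_root.glob("*/*.txt")):   # one level of subfolders, like "*"
        folder_name = src.name.split("_")[0]           # like @split(item().name,'_')[0]
        dest_dir = target_root / folder_name           # like the folder_name parameter
        dest_dir.mkdir(parents=True, exist_ok=True)    # create the folder if it is missing
        dest = dest_dir / src.name                     # like the file_name parameter
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "data"
        for date in ("20221212", "20221219", "20221226"):
            for prefix in "ABC":
                f = source / date / f"{prefix}_{date}.txt"
                f.parent.mkdir(parents=True, exist_ok=True)
                f.write_text("sample\n")
        target = Path(tmp) / "target"
        for path in copy_by_prefix(source, target):
            print(path.relative_to(target))
```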
My pipeline JSON:
{
  "name": "pipeline1",
  "properties": {
    "activities": [
      {
        "name": "Get Metadata1",
        "type": "GetMetadata",
        "dependsOn": [],
        "policy": {
          "timeout": "0.12:00:00",
          "retry": 0,
          "retryIntervalInSeconds": 30,
          "secureOutput": false,
          "secureInput": false
        },
        "userProperties": [],
        "typeProperties": {
          "dataset": {
            "referenceName": "sourcetxt",
            "type": "DatasetReference"
          },
          "fieldList": [
            "childItems"
          ],
          "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "enablePartitionDiscovery": false
          },
          "formatSettings": {
            "type": "DelimitedTextReadSettings"
          }
        }
      },
      {
        "name": "ForEach1",
        "type": "ForEach",
        "dependsOn": [
          {
            "activity": "Get Metadata1",
            "dependencyConditions": [
              "Succeeded"
            ]
          }
        ],
        "userProperties": [],
        "typeProperties": {
          "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
          },
          "isSequential": true,
          "activities": [
            {
              "name": "Copy data1",
              "type": "Copy",
              "dependsOn": [
                {
                  "activity": "Set variable1",
                  "dependencyConditions": [
                    "Succeeded"
                  ]
                }
              ],
              "policy": {
                "timeout": "0.12:00:00",
                "retry": 0,
                "retryIntervalInSeconds": 30,
                "secureOutput": false,
                "secureInput": false
              },
              "userProperties": [],
              "typeProperties": {
                "source": {
                  "type": "DelimitedTextSource",
                  "storeSettings": {
                    "type": "AzureBlobFSReadSettings",
                    "recursive": true,
                    "wildcardFolderPath": "*",
                    "wildcardFileName": {
                      "value": "@item().name",
                      "type": "Expression"
                    },
                    "enablePartitionDiscovery": false
                  },
                  "formatSettings": {
                    "type": "DelimitedTextReadSettings"
                  }
                },
                "sink": {
                  "type": "DelimitedTextSink",
                  "storeSettings": {
                    "type": "AzureBlobFSWriteSettings"
                  },
                  "formatSettings": {
                    "type": "DelimitedTextWriteSettings",
                    "quoteAllText": true,
                    "fileExtension": ".txt"
                  }
                },
                "enableStaging": false,
                "translator": {
                  "type": "TabularTranslator",
                  "typeConversion": true,
                  "typeConversionSettings": {
                    "allowDataTruncation": true,
                    "treatBooleanAsNumber": false
                  }
                }
              },
              "inputs": [
                {
                  "referenceName": "sourcetxt",
                  "type": "DatasetReference"
                }
              ],
              "outputs": [
                {
                  "referenceName": "targettxts",
                  "type": "DatasetReference",
                  "parameters": {
                    "folder_name": {
                      "value": "@variables('folder_name')",
                      "type": "Expression"
                    },
                    "file_name": {
                      "value": "@item().name",
                      "type": "Expression"
                    }
                  }
                }
              ]
            },
            {
              "name": "Set variable1",
              "type": "SetVariable",
              "dependsOn": [],
              "userProperties": [],
              "typeProperties": {
                "variableName": "folder_name",
                "value": {
                  "value": "@split(item().name,'_')[0]",
                  "type": "Expression"
                }
              }
            }
          ]
        }
      }
    ],
    "variables": {
      "folder_name": {
        "type": "String"
      }
    },
    "annotations": []
  }
}
Result:
A similar question, "azure - Copy files from one folder to multiple folders based on file name in Azure Data Factory", can be found on Stack Overflow: https://stackoverflow.com/questions/75213508/