azure - Copy files from one folder to multiple folders based on file name in Azure Data Factory

Tags: azure copy azure-data-factory move

I have a parent folder named Source in ADLS Gen2. It contains many subfolders, and those subfolders hold the actual data files, as in the example below...

***Source:***

Folder name: 20221212

A_20221212.txt B_20221212.txt C_20221212.txt

Folder name: 20221219

A_20221219.txt B_20221219.txt C_20221219.txt

Folder name: 20221226

A_20221226.txt B_20221226.txt C_20221226.txt

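How can I use Azure Data Factory to copy the files from these subfolders into folders named after the file name prefix (creating the folder if it does not already exist)? See the example below...

***Target:***

Folder name: A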

A_20221212.txt A_20221219.txt A_20221226.txt

Folder name: B

B_20221212.txt B_20221219.txt B_20221226.txt

Folder name: C

C_20221212.txt C_20221219.txt C_20221226.txt

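Any help is much appreciated.

Accepted answer

I reproduced the scenario above and got the results below.

If your folders are all at the same level, you can follow the process below using a Get Metadata activity.

This is my source folder structure.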

data
    20221212
        A_20221212.txt
        B_20221212.txt
        C_20221212.txt
    20221219
        A_20221219.txt
        B_20221219.txt
        C_20221219.txt
    20221226
        A_20221226.txt
        B_20221226.txt
        C_20221226.txt

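Source dataset:

[Screenshot: source dataset settings]

Use this dataset in the Get Metadata activity and select Child items in the field list.

Then pass the childItems array from the Get Metadata activity to a ForEach activity. Inside the ForEach, I used a Set variable activity to store the folder name.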

@split(item().name,'_')[0]

[Screenshot: Set variable activity inside the ForEach]
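
To make the expression concrete, here is a small Python sketch of what `@split(item().name,'_')[0]` evaluates to for each entry of a childItems array. It only mirrors the string logic and is not ADF code; the sample `child_items` list and the helper name `folder_name_for` are assumptions made for illustration.

```python
def folder_name_for(item_name: str) -> str:
    """Mirror @split(item().name,'_')[0]: the text before the first underscore."""
    return item_name.split("_")[0]


# Assumed sample of the childItems array returned by Get Metadata.
child_items = [
    {"name": "A_20221212.txt", "type": "File"},
    {"name": "B_20221212.txt", "type": "File"},
    {"name": "C_20221212.txt", "type": "File"},
]

for item in child_items:
    # Each iteration corresponds to one pass of the ForEach activity.
    print(item["name"], "->", folder_name_for(item["name"]))
```

A name without an underscore comes back unchanged, so such a file would end up in a folder named after the whole file name.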

Now add a Copy activity and use a wildcard path in the source, as shown below.

[Screenshot: Copy activity source with wildcard path]

For the sink, create dataset parameters and supply their values in the Copy activity sink, as shown below.

[Screenshot: sink dataset parameters]

[Screenshot: Copy activity sink settings]
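
As a rough illustration of how the two sink parameters combine, the Python sketch below builds the destination path from `folder_name` and `file_name`. It assumes the sink dataset uses `@dataset().folder_name` as the directory and `@dataset().file_name` as the file name; the file system name `target` and the helper `resolve_sink_path` are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_sink_path(file_system: str, folder_name: str, file_name: str) -> PurePosixPath:
    """Join file system, folder parameter and file parameter into one path."""
    return PurePosixPath(file_system) / folder_name / file_name


# "target" is an assumed file system (container) name.
print(resolve_sink_path("target", "A", "A_20221212.txt"))
```

Because the folder is part of the sink path, a folder that does not exist yet is created when the first file is written to it.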

My pipeline JSON:

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Get Metadata1",
                "type": "GetMetadata",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "sourcetxt",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobFSReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "DelimitedTextReadSettings"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Get Metadata1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get Metadata1').output.childItems",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Copy data1",
                            "type": "Copy",
                            "dependsOn": [
                                {
                                    "activity": "Set variable1",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "wildcardFolderPath": "*",
                                        "wildcardFileName": {
                                            "value": "@item().name",
                                            "type": "Expression"
                                        },
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureBlobFSWriteSettings"
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextWriteSettings",
                                        "quoteAllText": true,
                                        "fileExtension": ".txt"
                                    }
                                },
                                "enableStaging": false,
                                "translator": {
                                    "type": "TabularTranslator",
                                    "typeConversion": true,
                                    "typeConversionSettings": {
                                        "allowDataTruncation": true,
                                        "treatBooleanAsNumber": false
                                    }
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "sourcetxt",
                                    "type": "DatasetReference"
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "targettxts",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "folder_name": {
                                            "value": "@variables('folder_name')",
                                            "type": "Expression"
                                        },
                                        "file_name": {
                                            "value": "@item().name",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        },
                        {
                            "name": "Set variable1",
                            "type": "SetVariable",
                            "dependsOn": [],
                            "userProperties": [],
                            "typeProperties": {
                                "variableName": "folder_name",
                                "value": {
                                    "value": "@split(item().name,'_')[0]",
                                    "type": "Expression"
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "variables": {
            "folder_name": {
                "type": "String"
            }
        },
        "annotations": []
    }
}

Result:

[Screenshot: resulting target folders]
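
For a quick local check of the layout requested in the question, the Python sketch below reproduces the same grouping on a temporary directory. It is a local-filesystem analogue, not ADF: the functions `build_sample_tree` and `copy_by_prefix` and the folder names `Source` and `Target` are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def build_sample_tree(root: Path) -> None:
    """Create root/<date>/<prefix>_<date>.txt for the dates in the question."""
    for date in ("20221212", "20221219", "20221226"):
        folder = root / date
        folder.mkdir(parents=True)
        for prefix in "ABC":
            (folder / f"{prefix}_{date}.txt").write_text("sample\n")


def copy_by_prefix(source_root: Path, target_root: Path) -> None:
    """Copy each source_root/<date>/*.txt into target_root/<prefix>/."""
    for src in sorted(source_root.glob("*/*.txt")):
        prefix = src.name.split("_")[0]
        dest_dir = target_root / prefix
        dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        shutil.copy2(src, dest_dir / src.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source, target = Path(tmp) / "Source", Path(tmp) / "Target"
        build_sample_tree(source)
        copy_by_prefix(source, target)
        for folder in sorted(target.iterdir()):
            print(folder.name, sorted(p.name for p in folder.iterdir()))
```

Each target folder should list the three files that share its prefix, one per source date.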

Source: Stack Overflow question "Copy files from one folder to multiple folders based on file name in Azure Data Factory": https://stackoverflow.com/questions/75213508/
