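I am using data assets to read data from Azure Data Lake into my AzureML workspace.
However, I would like to know how to write data back to Azure Data Lake. I have a pandas DataFrame that I want to save as CSV/Parquet in the data lake.
Code: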
import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace and look up the registered data asset
mlClient = MLClient.from_config(credential=DefaultAzureCredential())
dataAsset = mlClient.data.get(name="MyDataAsset", version="1")
pathTest = {
    'folder': dataAsset.path
}
tblTest = mltable.from_parquet_files(paths=[pathTest])
dfBaseTest = tblTest.to_pandas_dataframe()  # ok, here is my pandas dataframe
##############
# ML operations..... result: dfResult
# How can I save dfResult to my data lake? Is it possible to use the data asset "MyDataAsset", or are data assets read-only?
##############
Thanks, RishabhM. It works.
Since I had already created the folders in the data lake, I did this:
import os
from azure.storage.filedatalake import (
    DataLakeServiceClient,
    DataLakeDirectoryClient,
    FileSystemClient
)
from azure.identity import DefaultAzureCredential

account_url = "https://<Account-Name>.dfs.core.windows.net"
token_credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(account_url, credential=token_credential)

# Get clients for the existing file system and folder instead of creating them
file_system_client = service_client.get_file_system_client(file_system="myFileSystem")
directory_client = file_system_client.get_directory_client("Folder1/Folder2")

# Write the DataFrame to a local CSV file, then upload that file
dfPandas.to_csv("./data.csv", index=False, encoding='utf-8', sep=';')
file_client = directory_client.get_file_client("data.csv")
with open(file=os.path.join("", "data.csv"), mode="rb") as data:
    file_client.upload_data(data, overwrite=True)
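The snippet above goes through a temporary local file. As a minimal sketch of an alternative, the DataFrame can be serialized in memory and the resulting bytes passed straight to `upload_data`. The helper names `dataframe_to_csv_bytes` and `upload_dataframe_as_csv` are hypothetical, not part of any library, and `file_client` is assumed to be a `DataLakeFileClient` obtained as shown above.

```python
import pandas as pd


def dataframe_to_csv_bytes(df: pd.DataFrame, sep: str = ";") -> bytes:
    """Serialize a DataFrame to UTF-8 encoded CSV bytes without touching disk."""
    # With no path argument, to_csv returns the CSV text as a string.
    return df.to_csv(index=False, sep=sep).encode("utf-8")


def upload_dataframe_as_csv(file_client, df: pd.DataFrame, sep: str = ";") -> int:
    """Upload a DataFrame as CSV through a Data Lake file client.

    file_client is assumed to be an azure.storage.filedatalake.DataLakeFileClient,
    e.g. directory_client.get_file_client("data.csv").
    Returns the number of bytes sent.
    """
    data = dataframe_to_csv_bytes(df, sep=sep)
    # overwrite=True replaces the remote file if it already exists.
    file_client.upload_data(data, overwrite=True)
    return len(data)
```

With the clients from the snippet above, the call would look like `upload_dataframe_as_csv(directory_client.get_file_client("data.csv"), dfPandas)`. The same pattern should carry over to Parquet by writing into an `io.BytesIO` buffer with `df.to_parquet`, assuming a Parquet engine such as pyarrow is installed.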
Best answer
One possible solution for uploading the data is to use the Azure Data Lake Storage client library for Python.
import os
from azure.storage.filedatalake import (
    DataLakeServiceClient,
    DataLakeDirectoryClient,
    FileSystemClient
)
from azure.identity import DefaultAzureCredential

account_url = "https://<Account-Name>.dfs.core.windows.net"
token_credential = DefaultAzureCredential()
service_client = DataLakeServiceClient(account_url, credential=token_credential)

# Create a new file system (container) and a directory inside it
file_system_client = service_client.create_file_system(file_system="dataasset2")
directory_client = file_system_client.create_directory("test")

# Upload a local file named data.csv into that directory
file_client = directory_client.get_file_client("data.csv")
with open(file=os.path.join("", "data.csv"), mode="rb") as data:
    file_client.upload_data(data, overwrite=True)
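The answer creates the file system, while the follow-up comment fetches one that already exists. A minimal sketch of a helper that covers both cases is shown below. The function name `get_or_create_file_system` is hypothetical, and `service_client` is assumed to be a `DataLakeServiceClient` built as in the answer.

```python
def get_or_create_file_system(service_client, name: str):
    """Return a file system client, creating the file system only if it is missing.

    service_client is assumed to be an
    azure.storage.filedatalake.DataLakeServiceClient.
    """
    # Getting a client is a local operation; it does not contact the service.
    file_system_client = service_client.get_file_system_client(file_system=name)
    # exists() queries the service; create only when the file system is absent.
    if not file_system_client.exists():
        file_system_client.create_file_system()
    return file_system_client
```

For example, `file_system_client = get_or_create_file_system(service_client, "dataasset2")` could stand in for the `create_file_system` call, so that rerunning the script does not fail because the file system is already there.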
Regarding azure - writing files to Azure Data Lake from AzureML, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/77012681/