javascript - Most efficient way to map two arrays to prepare data for 'upload'

Tags: javascript node.js arrays algorithm data-structures

Sorry if the title is a bit confusing; I wasn't sure how to put it in just a few words.

I'm currently working on a feature where a user uploads a .csv or Excel file, and the data has to be mapped correctly to prepare a bulk upload. It will make more sense once you read the code below!

Step one: the user uploads a .csv/Excel file, which gets converted into an array of arrays. The first array is normally the headers.

The data will look like this (headers included). It will range from 100 items up to ~100,000 items:

const DUMMY_DATA = [
['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
['Lambert', 'Beckhouse', 'StackOverflow', 'lbeckhouse0@stackoverflow.com', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
['Maryanna', 'Vassman', 'CDBABY', 'mvassman1@cdbaby.com', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
]

After the upload, the user maps each field to the correct schema. This can be all of the fields or just a selected few.

For example, say the user wants to exclude every part of the address except the zip code. We get back an array of "mapped fields", renamed to the correct schema names (i.e. First Name => firstName):

const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', <empty>, <empty>, <empty>, 'zipCode']

I've set it up so the indexes of the mapped fields always line up with the headers, so any header that wasn't mapped still has a placeholder value in its slot.

So in this case, we know to only upload the data at indexes [0, 1, 2, 3, 4, 8] (of DUMMY_DATA).
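As a quick sanity check, those indexes can be derived from MAPPED_FIELDS directly. A minimal sketch, assuming the unmapped slots come through as null (as the answers below represent them):

```javascript
// Derive the DUMMY_DATA column indexes that were actually mapped.
// Assumes unmapped header slots are represented as null.
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];

const mappedIndexes = MAPPED_FIELDS
  .map((key, idx) => (key !== null ? idx : -1))
  .filter(idx => idx !== -1);

console.log(mappedIndexes); // [ 0, 1, 2, 3, 4, 8 ]
```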

Then we get to the final part, where we want to upload the correct fields for all the data, so each schema key from MAPPED_FIELDS is matched with its mapped value from DUMMY_DATA...

const firstObjectToBeUploaded = {
  firstName: 'Lambert',
  lastName: 'Beckhouse',
  company: 'StackOverflow',
  email: 'lbeckhouse0@stackoverflow.com',
  phone: '512-555-1738',
  zipCode: '78721'
}

try {
  await uploadData(firstObjectToBeUploaded)
} catch (err) {
  console.log(err)
}

All the data will be sent to an AWS Lambda function written in Node.js that handles the upload/logic.

Since the data can get quite large, I'm struggling with how to implement this efficiently.
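Not part of the question's code, but one common way to keep memory and request sizes manageable at ~100,000 rows is to send the mapped objects in batches rather than one request per object. A rough sketch, where `uploadBatch` is a hypothetical stand-in for whatever the Lambda endpoint actually accepts:

```javascript
// Send rows to a (hypothetical) upload endpoint in fixed-size batches,
// so a large dataset isn't sent as one giant request or 100k tiny ones.
async function uploadInBatches(objects, uploadBatch, batchSize = 500) {
  for (let i = 0; i < objects.length; i += batchSize) {
    // slice() copies only the current window of rows
    await uploadBatch(objects.slice(i, i + batchSize));
  }
}
```

The batch size would need tuning against the Lambda payload limit and per-invocation time budget.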

Best Answer

If you're hoping for some performance gains at larger array sizes, you can apply the same logic as Nick's answer, but implemented with a standard for loop.

const DUMMY_DATA = [
  ['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
  ['Lambert', 'Beckhouse', 'StackOverflow', 'lbeckhouse0@stackoverflow.com', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
  ['Maryanna', 'Vassman', 'CDBABY', 'mvassman1@cdbaby.com', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
];

const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];

const fieldLength = MAPPED_FIELDS.length;
const dataLength = DUMMY_DATA.length;

const objectsToUpload = [];
for (let i = 1; i < dataLength; i++) {
  const obj = {};
  for (let j = 0; j < fieldLength; j++) {
    if (MAPPED_FIELDS[j] !== null) {
      obj[MAPPED_FIELDS[j]] = DUMMY_DATA[i][j];
    }
  }
  objectsToUpload.push(obj);
}

console.log(objectsToUpload);

for...of

Here the entries() of the MAPPED_FIELDS array are isolated once before the loop, to avoid repeatedly generating an entries iterator, and null keys are simply skipped rather than filtered out afterwards. The destructuring and the iterator creation/spread seem to make this slower than Nick's answer on small arrays, but faster on large ones (tested in a Chromium-based browser).

const DUMMY_DATA = [
  ['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
  ['Lambert', 'Beckhouse', 'StackOverflow', 'lbeckhouse0@stackoverflow.com', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
  ['Maryanna', 'Vassman', 'CDBABY', 'mvassman1@cdbaby.com', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
];

const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];
const MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];

const objectsToUpload = [];
for (const datum of DUMMY_DATA.slice(1)) {
  const obj = {};
  for (const [idx, key] of MAPPED_FIELDS_ENTRIES) {
    if (key !== null) {
      obj[key] = datum[idx];
    }
  }
  objectsToUpload.push(obj);
}

console.log(objectsToUpload);
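A further tweak along the same lines, not benchmarked here: filter the null slots out of the entries array once up front, so the inner loop carries no null check at all. Worth measuring on your own data before assuming it's faster:

```javascript
const DUMMY_DATA = [
  ['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
  ['Lambert', 'Beckhouse', 'StackOverflow', 'lbeckhouse0@stackoverflow.com', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721']
];

const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];

// Keep only the [index, key] pairs that were actually mapped.
const MAPPED_PAIRS = [...MAPPED_FIELDS.entries()].filter(([, key]) => key !== null);

const objectsToUpload = [];
for (const datum of DUMMY_DATA.slice(1)) {
  const obj = {};
  for (const [idx, key] of MAPPED_PAIRS) {
    obj[key] = datum[idx];
  }
  objectsToUpload.push(obj);
}

console.log(objectsToUpload);
```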


Below is a rough benchmark; the results on my machine are as follows.

for          1,000: 0.400ms
for...of     1,000: 2.900ms
entries      1,000: 1.700ms

for         10,000: 4.100ms
for...of    10,000: 11.700ms
entries     10,000: 13.900ms

for        100,000: 30.200ms
for...of   100,000: 56.500ms
entries    100,000: 100.200ms

const DUMMY_DATA = [
  ['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
  ['Lambert', 'Beckhouse', 'StackOverflow', 'lbeckhouse0@stackoverflow.com', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
  ['Maryanna', 'Vassman', 'CDBABY', 'mvassman1@cdbaby.com', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
];

const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];

function makeBigData(size) {
  const [header, ...data] = DUMMY_DATA;
  const r = [header];
  for (let l = 0; l < size; l += 1) {
    r.push([...data[Math.round(Math.random())]]);
  }
  return r;
}

let data = makeBigData(1000);
console.time('for          1,000');
let objectsToUpload = [];
let fieldLength = MAPPED_FIELDS.length, dataLength = data.length;
for (let i = 1; i < dataLength; i++) {
  const obj = {};
  for (let j = 0; j < fieldLength; j++) {
    if (MAPPED_FIELDS[j] !== null) {
      obj[MAPPED_FIELDS[j]] = data[i][j];
    }
  }
  objectsToUpload.push(obj);
}
console.timeEnd('for          1,000');

data = makeBigData(1000);
console.time('for...of     1,000');
objectsToUpload = [];
let MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];
for (const datum of data.slice(1)) {
  const obj = {};
  for (const [i, key] of MAPPED_FIELDS_ENTRIES) {
    if (key !== null) {
      obj[key] = datum[i];
    }
  }
  objectsToUpload.push(obj);
}
console.timeEnd('for...of     1,000');

data = makeBigData(1000);
console.time('entries      1,000');
objectsToUpload = data.slice(1).map(data =>
  Object.fromEntries(MAPPED_FIELDS
    .map((key, idx) => [key, data[idx]])
    .filter(a => a[0])
  )
)
console.timeEnd('entries      1,000');

console.log();

data = makeBigData(10000);
console.time('for         10,000');
objectsToUpload = [];
fieldLength = MAPPED_FIELDS.length, dataLength = data.length;
for (let i = 1; i < dataLength; i++) {
  const obj = {};
  for (let j = 0; j < fieldLength; j++) {
    if (MAPPED_FIELDS[j] !== null) {
      obj[MAPPED_FIELDS[j]] = data[i][j];
    }
  }
  objectsToUpload.push(obj);
}
console.timeEnd('for         10,000');

data = makeBigData(10000);
console.time('for...of    10,000');
objectsToUpload = [];
MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];
for (const datum of data.slice(1)) {
  const obj = {};
  for (const [i, key] of MAPPED_FIELDS_ENTRIES) {
    if (key !== null) {
      obj[key] = datum[i];
    }
  }
  objectsToUpload.push(obj);
}
console.timeEnd('for...of    10,000');

data = makeBigData(10000);
console.time('entries     10,000');
objectsToUpload = data.slice(1).map(data =>
  Object.fromEntries(MAPPED_FIELDS
    .map((key, idx) => [key, data[idx]])
    .filter(a => a[0])
  )
)
console.timeEnd('entries     10,000');

console.log();

data = makeBigData(100000);
console.time('for        100,000');
objectsToUpload = [];
fieldLength = MAPPED_FIELDS.length, dataLength = data.length;
for (let i = 1; i < dataLength; i++) {
  const obj = {};
  for (let j = 0; j < fieldLength; j++) {
    if (MAPPED_FIELDS[j] !== null) {
      obj[MAPPED_FIELDS[j]] = data[i][j];
    }
  }
  objectsToUpload.push(obj);
}
console.timeEnd('for        100,000');

data = makeBigData(100000);
console.time('for...of   100,000');
objectsToUpload = [];
MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];
for (const datum of data.slice(1)) {
  const obj = {};
  for (const [i, key] of MAPPED_FIELDS_ENTRIES) {
    if (key !== null) {
      obj[key] = datum[i];
    }
  }
  objectsToUpload.push(obj);
}
console.timeEnd('for...of   100,000');

data = makeBigData(100000);
console.time('entries    100,000');
objectsToUpload = data.slice(1).map(data =>
  Object.fromEntries(MAPPED_FIELDS
    .map((key, idx) => [key, data[idx]])
    .filter(a => a[0])
  )
)
console.timeEnd('entries    100,000');

Regarding "javascript - Most efficient way to map two arrays to prepare data for 'upload'", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74175664/
