javascript - Find elements that commonly appear next to each other in an array

Tags: javascript node.js

I am trying to find values that commonly appear next to each other in an array.

For example, given the array:

["dog","cat","goat","dog","cat","elephant","dog","cat","pig","seal","dog","cat","pig","monkey"]

it should return something like:

[[["dog","cat"],4],[["cat","pig"],2],[["dog","cat","pig"],2]]

Here is some better data: https://pastebin.com/UG4iswrZ

Any help would be appreciated. Here is my current failed attempt at doing something similar:

function findAssociations(words){
  var temp = [],tempStore = [],store = [],found = false;
  //loop through the words, counting occurrences of words together within a window of 5
  for(var i = 0;i<words.length-1;i++){
    if(i % 5 == 0){
      //on every fifth element, loop through store attempting to add combinations of words stored in tempStore
      for(var j = 0;j<5;j++){
        temp = []
        //create the current combination
        for(var k = 0;k<j;k++){
          temp.push(tempStore[k]);
        }
        //find if element is already stored, if it is, increment the occurrence counter
        for(var k = 0;k<store.length;k++){
          if(store[k][0]===temp){
            found = true;
            store[k][1] = store[k][1]+1;
          }
        }
        //if it isn't add it
        if(found == false){
          store.push([temp,1]);
        }
        found == false;
      }
      tempStore = [];
    } else {
      //add word to tempStore if i isn't a multiple of 5
      tempStore.push(words[i]);
    }
  }
}

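This script does not remove combinations that occur only once, does not sort the output by number of occurrences, and does not work at all. It is only an outline of how a possible solution might work (as suggested by benvc).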

Best answer

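Here is a generic solution that works with multiple group sizes.

You specify a range of group sizes, for example [2,4] for groups of 2 to 4 elements, and a minimum number of occurrences.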

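The function then generates all groups of neighboring elements of the given sizes, sorts each group, and counts the duplicates. The sorting step can be removed if the order of elements within a group matters.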
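The group-generation step can be pictured as a sliding window over the array. The following is a minimal sketch of that idea, not the answer's own code; the helper name `slidingWindows` and the `sample` array are made up for illustration.

```javascript
// Hypothetical helper: returns every run of `size` consecutive elements of `arr`.
function slidingWindows(size, arr) {
  const windows = [];
  // Stop as soon as a full window no longer fits.
  for (let start = 0; start + size <= arr.length; start++) {
    windows.push(arr.slice(start, start + size));
  }
  return windows;
}

const sample = ["dog", "cat", "goat", "dog", "cat"];
console.log(JSON.stringify(slidingWindows(2, sample)));
// [["dog","cat"],["cat","goat"],["goat","dog"],["dog","cat"]]
```

An array of length n yields n - size + 1 windows of a given size, so scanning sizes 2 to 4 stays roughly linear in the length of the input.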

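Duplicates are counted by building a dictionary whose keys are the elements of each group, sorted and joined with a special separator. The values in the dictionary are the counts.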
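As a rough illustration of the keyed counting, the sketch below uses a `Map` instead of a plain object. The names `countGroups` and `SEPARATOR` are hypothetical, and the separator is assumed never to occur inside an element.

```javascript
const SEPARATOR = "@#@"; // assumption: this string never appears in the data

// Hypothetical helper: counts groups that contain the same elements in any order.
function countGroups(groups) {
  const counts = new Map();
  for (const group of groups) {
    // Sort a copy so the input group keeps its original order.
    const key = [...group].sort().join(SEPARATOR);
    const entry = counts.get(key) || { group, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  }
  return [...counts.values()];
}

const pairs = [["dog", "cat"], ["cat", "dog"], ["cat", "pig"], ["dog", "cat"]];
console.log(JSON.stringify(countGroups(pairs)));
// [{"group":["dog","cat"],"count":3},{"group":["cat","pig"],"count":1}]
```

Because the key is built from the sorted elements, ["dog","cat"] and ["cat","dog"] land in the same bucket.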

Finally, it returns the groups sorted by number of occurrences and then by group size, both in descending order.
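The two-level ordering can be written as a single comparator. This is a small sketch with made-up entries; `byCountThenSize` is a hypothetical name.

```javascript
// Higher counts first; on a tie, longer groups first.
const byCountThenSize = (a, b) =>
  (b.count - a.count) || (b.group.length - a.group.length);

const entries = [
  { group: ["cat", "pig"], count: 2 },
  { group: ["dog", "cat"], count: 4 },
  { group: ["dog", "cat", "pig"], count: 2 },
];

// Copy before sorting, since Array.prototype.sort sorts in place.
const ranked = [...entries].sort(byCountThenSize);
console.log(JSON.stringify(ranked.map(e => [e.group, e.count])));
// [[["dog","cat"],4],[["dog","cat","pig"],2],[["cat","pig"],2]]
```

The `||` falls through to the size comparison only when the count difference is 0.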

const data = ["dog","cat","goat","dog","cat","elephant","dog","cat","pig","seal","dog","cat","pig","monkey"];

function findSimilarNeighbors(groupSizeRange, minOccurences, data) {
  const getNeighbors = (size, arr) => arr.reduce((acc, x) => {
    acc.push([]);
    for (let i = 0; i < size; ++ i) {
      const idx = acc.length - i - 1;
      (acc[idx] || []).push(x);
    }
    return acc;
  }, []).filter(x => x.length === size);

  const groups = [];
  for (let groupSize = groupSizeRange[0]; groupSize <= groupSizeRange[1]; ++groupSize) {
    groups.push(...getNeighbors(groupSize, data));
  }
  // Sort a copy so the stored group keeps its original element order.
  const groupName = group => [...group].sort().join('@#@'); // use a separator that won't occur in the strings

  const groupsInfo = groups.reduce((acc, group) => {
    const name = groupName(group);
    acc[name] = acc[name] || {};
    acc[name] = { group, count: (acc[name].count || 0) + 1 };
    return acc;
  }, {});
  
  return Object.values(groupsInfo)
    .filter(group => group.count >= minOccurences)
    .sort((a, b) => {
      const countDiff = b.count - a.count;
      return countDiff ? countDiff : b.group.length - a.group.length;
    })
    .map(({ group, count }) => [group, count]);
}

console.log(findSimilarNeighbors([2, 4], 2, data));
console.log(findSimilarNeighbors([4, 4], 2, data));
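Since the sorted key treats every permutation of the same words as one group, an order-sensitive variant may be closer to the output asked for in the question. The following is a sketch under that assumption and is not part of the original answer; `findOrderedNeighbors` and `animals` are hypothetical names, and the separator is assumed not to occur in the data.

```javascript
// Hypothetical order-sensitive variant: element order is part of the key.
function findOrderedNeighbors([minSize, maxSize], minOccurrences, data) {
  const SEPARATOR = "@#@"; // assumption: never appears inside an element
  const counts = new Map();
  for (let size = minSize; size <= maxSize; size++) {
    for (let start = 0; start + size <= data.length; start++) {
      const group = data.slice(start, start + size);
      const key = group.join(SEPARATOR); // no sort here
      const entry = counts.get(key) || { group, count: 0 };
      entry.count += 1;
      counts.set(key, entry);
    }
  }
  return [...counts.values()]
    .filter(entry => entry.count >= minOccurrences)
    .sort((a, b) => (b.count - a.count) || (b.group.length - a.group.length))
    .map(({ group, count }) => [group, count]);
}

const animals = ["dog","cat","goat","dog","cat","elephant","dog","cat","pig","seal","dog","cat","pig","monkey"];
console.log(JSON.stringify(findOrderedNeighbors([2, 4], 2, animals)));
// [[["dog","cat"],4],[["dog","cat","pig"],2],[["cat","pig"],2]]
```

On the sample array this yields the same three groups the question lists, ordered by count and then by group size.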

Regarding javascript - Find elements that commonly appear next to each other in an array, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55067241/
