C#: masking an array to exclude indices, as fast as in Python

Tags: c# python arrays bitmask

What I have:

//This data set contains columns (second index) having the same value in each row (first index)
double[][] dataSet = new double[][]
{
  new double[] {1, 2, 3, 4},
  new double[] {5, 6, 7, 4},
  new double[] {8, 9, 10, 4},
}; 

What I want to obtain:

// This data set has no column where the value in each row is the same
double[][] reducedDataSet = new double[][]
{
  new double[] {1, 2, 3},
  new double[] {5, 6, 7},
  new double[] {8, 9, 10},
}; 

In Python, this can be done easily with:

import numpy
all_equal_value_indices = numpy.all(data_set == data_set[0, :], axis=0)  # Boolean mask: True for every column that has the same value in each row
reduced_data_set = data_set[:, ~all_equal_value_indices]  # Takes all rows, with only those columns where the mask is False

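In C#, I can obtain an array containing the indices that should be excluded relatively quickly, but how can I use these indices as a mask to get only the columns that are not among them?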
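For reference, one way the exclusion mask might be computed in C# is sketched below. It mirrors the `numpy.all(..., axis=0)` call above. The names `ColumnMask` and `FindConstantColumns` are illustrative and not part of the original post, and the sketch assumes that every row has at least as many columns as the first row.

```csharp
using System;

public static class ColumnMask
{
    // Hypothetical helper (not from the original post): mask[c] is true
    // when every row holds the same value in column c as the first row.
    public static bool[] FindConstantColumns(double[][] data)
    {
        if (data.Length == 0) return new bool[0];

        double[] first = data[0];
        var mask = new bool[first.Length];
        for (int c = 0; c < mask.Length; c++) mask[c] = true;

        // Compare every later row against the first one, column by column.
        for (int r = 1; r < data.Length; r++)
        {
            double[] row = data[r];
            for (int c = 0; c < mask.Length; c++)
            {
                if (mask[c] && row[c] != first[c]) mask[c] = false;
            }
        }
        return mask;
    }
}
```

For the data set above, the mask would be true only for the last column.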

What I tried:

var numberOfDeletedColumns = 0;
var reducedDataSet = dataSet;

foreach (var columnToDelete in columnsToDelete)
{
  reducedDataSet = reducedDataSet.RemoveColumn(columnToDelete - numberOfDeletedColumns++);
}

RemoveColumn is an extension method provided by Accord.Net and has the following code:

/// <summary>Returns a new matrix without one of its columns.</summary>
public static T[][] RemoveColumn<T>(this T[][] matrix, int index)
{
  T[][] objArray = new T[matrix.Length][];
  for (int index1 = 0; index1 < matrix.Length; ++index1)
  {
    objArray[index1] = new T[matrix[index1].Length - 1];
    for (int index2 = 0; index2 < index; ++index2)
      objArray[index1][index2] = matrix[index1][index2];
    for (int index2 = index + 1; index2 < matrix[index1].Length; ++index2)
      objArray[index1][index2 - 1] = matrix[index1][index2];
  }
  return objArray;
}

But this is much slower than the Python implementation. Can anyone suggest a faster way to obtain the reduced data set?

Best answer

Using Array.Copy made it run about 2x faster on my machine.

static T[][] FastRemoveColumn<T>(T[][] matrix, int index)
{
    T[][] objArray = new T[matrix.Length][];
    for (int i = 0; i < matrix.Length; i++)
    {
        var line = matrix[i];
        var reducedline = new T[line.Length - 1];
        Array.Copy(line, 0, reducedline, 0, index);
        Array.Copy(line, index + 1, reducedline, index, line.Length - index - 1);
        objArray[i] = reducedline;                
    }
    return objArray;
}

I also tried multithreading. It runs very slowly:

static T[][] MultiThreadRemoveColumn<T>(T[][] matrix, int index)
{
    T[][] objArray = new T[matrix.Length][];
    Parallel.For(0, matrix.Length, i =>
    {
        var line = matrix[i];
        var reducedline = new T[line.Length - 1];
        Array.Copy(line, 0, reducedline, 0, index);
        Array.Copy(line, index + 1, reducedline, index, line.Length - index - 1);
        objArray[i] = reducedline;                
    });
    return objArray;
}

Test:

// init
double[][] arr = new double[2000][];
for (int i = 0; i < arr.Length; i++)            
    arr[i] = new double[2000];

double v = 0;
for (int i = 0; i < arr.Length; i++)
{
    for (int j = 0; j < arr[i].Length; j++)
    {
        arr[i][j] = v;
        v++;
    }
}

Stopwatch sw = Stopwatch.StartNew();
var reducedArr = RemoveColumn(arr, 200);
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
sw.Restart();
var reducedArr2 = FastRemoveColumn(arr, 200);    
sw.Stop();        
Console.WriteLine(sw.ElapsedMilliseconds);
sw.Restart();
var reducedArr3 = MultiThreadRemoveColumn(arr, 200); 
sw.Stop();     
Console.WriteLine(sw.ElapsedMilliseconds);

// Check the result
for (int i = 0; i < reducedArr.Length; i++)
{
    for (int j = 0; j < reducedArr[i].Length; j++)
    {
        if(reducedArr[i][j] != reducedArr2[i][j]) throw new Exception();
        if(reducedArr[i][j] != reducedArr3[i][j]) throw new Exception();   
    }
}
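The test above times each method exactly once, so each measurement also includes one-time costs such as JIT compilation of the method on its first call. A minimal sketch of a timing helper that runs one untimed warm-up call and then reports the best of several runs is shown here; `Bench` and `BestOfMs` are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Hypothetical helper: runs the action once untimed (warm-up),
    // then returns the fastest of `repeats` timed runs in milliseconds.
    public static double BestOfMs(Action action, int repeats)
    {
        action(); // warm-up, so JIT cost is not part of the timed runs

        double best = double.MaxValue;
        for (int i = 0; i < repeats; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

It could be called as `Bench.BestOfMs(() => FastRemoveColumn(arr, 200), 10)`. Whether this changes the ranking of the three methods has not been measured here.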

Update

A solution for deleting multiple columns:

public static T[][] DeleteColumns<T>(T[][] matrix, int[] columns)
{
    if (matrix.Length == 0) return new T[0][];
    bool[] delColumns = new bool[matrix[0].Length];
    foreach (int col in columns) delColumns[col] = true;
    List<int> remainCols = new List<int>();
    for (int i = 0; i < delColumns.Length; i++)
    {
        if (!delColumns[i]) remainCols.Add(i);
    }
    var target = new T[matrix.Length][];
    for (int rowIndex = 0; rowIndex < matrix.Length; rowIndex++)
    {
        T[] sourceRow = matrix[rowIndex];
        T[] targetRow = new T[remainCols.Count];
        for (int i = 0; i < remainCols.Count; i++)
        {
            targetRow[i] = sourceRow[remainCols[i]];
        }
        target[rowIndex] = targetRow;    
    }
    return target;
}

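Tested on a 2000x2000 matrix. Comparing it against Adam Brown's solution when deleting all columns is definitely unfair, but my solution is faster even when only one column is deleted.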
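Since the question asks about using a mask rather than a list of indices, the same idea can also be written to take a bool[] directly. This is an untimed sketch, not the code measured above; `MaskedColumns` and `DropMaskedColumns` are illustrative names, and it assumes the mask has one entry per column and that all rows are at least that long.

```csharp
using System;

public static class MaskedColumns
{
    // Hypothetical helper: keeps column c only where dropMask[c] is false,
    // similar in spirit to data_set[:, ~mask] in NumPy.
    public static T[][] DropMaskedColumns<T>(T[][] matrix, bool[] dropMask)
    {
        // Resolve the mask into a compact array of kept column indices once,
        // so the per-row loop does no branching on the mask.
        int keptCount = 0;
        foreach (bool drop in dropMask)
        {
            if (!drop) keptCount++;
        }
        var kept = new int[keptCount];
        for (int c = 0, k = 0; c < dropMask.Length; c++)
        {
            if (!dropMask[c]) kept[k++] = c;
        }

        var result = new T[matrix.Length][];
        for (int r = 0; r < matrix.Length; r++)
        {
            T[] source = matrix[r];
            var target = new T[kept.Length];
            for (int k = 0; k < kept.Length; k++)
            {
                target[k] = source[kept[k]];
            }
            result[r] = target;
        }
        return result;
    }
}
```

With the data set from the question and a mask that is true only for the last column, each resulting row would contain the first three values of the original row.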

Regarding masking an array in C# to exclude indices as fast as in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47835565/
