machine-learning - Using WeightedRandomSampler in PyTorch

Tags: machine-learning deep-learning computer-vision pytorch

I need to implement a multi-label image classification model in PyTorch. My data is imbalanced, so I used WeightedRandomSampler to create a custom dataloader. But when I iterate over the custom dataloader, I get the error: IndexError: list index out of range
I implemented the following code based on this link: https://discuss.pytorch.org/t/balanced-sampling-between-classes-with-torchvision-dataloader/2703/3?u=surajsubramanian

def make_weights_for_balanced_classes(images, nclasses):
    # Count how many images belong to each class; each item is a (path, class_index) tuple
    count = [0] * nclasses
    for item in images:
        count[item[1]] += 1
    # Weight each class inversely proportional to its frequency
    weight_per_class = [0.] * nclasses
    N = float(sum(count))
    for i in range(nclasses):
        weight_per_class[i] = N / float(count[i])
    # Assign every sample the weight of its class
    weight = [0] * len(images)
    for idx, val in enumerate(images):
        weight[idx] = weight_per_class[val[1]]
    return weight

weights = make_weights_for_balanced_classes(train_dataset.imgs, len(full_dataset.classes))
weights = torch.DoubleTensor(weights)
sampler = WeightedRandomSampler(weights, len(weights))

train_loader = DataLoader(train_dataset, batch_size=4, sampler=sampler, pin_memory=True)
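For reference, here is a minimal self-contained sketch of the same per-class weighting approach, assuming a torchvision ImageFolder dataset (whose .imgs attribute is a list of (path, class_index) tuples); the data path below is a placeholder:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# Hypothetical ImageFolder dataset; .imgs is a list of (path, class_index) tuples
full_dataset = datasets.ImageFolder("data/train", transform=transforms.ToTensor())

weights = make_weights_for_balanced_classes(full_dataset.imgs, len(full_dataset.classes))
weights = torch.DoubleTensor(weights)

# replacement=True (the default) lets minority-class samples be drawn repeatedly
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(full_dataset, batch_size=4, sampler=sampler, pin_memory=True)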

Based on the answer at https://stackoverflow.com/a/60813495/10077354, here is my updated code. But the same thing happens when I create a dataloader: loader = DataLoader(full_dataset, batch_size=4, sampler=sampler); len(loader) returns 1.

class_counts = [1691, 743, 2278, 1271]
num_samples = np.sum(class_counts)
labels = [tag for _, tag in full_dataset.imgs]

class_weights = [num_samples / class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(num_samples)]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), num_samples)
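As a side note, for a sampler-based DataLoader with drop_last=False, len(loader) equals ceil(len(sampler) / batch_size), so with 5983 samples and batch_size=4 it should be 1496 rather than 1. A quick sanity-check sketch, reusing the variables above and casting num_samples to a plain Python int (as the utility function below also does, since WeightedRandomSampler expects an integer num_samples):

import math

num_samples = int(np.sum(class_counts))  # plain Python int rather than a numpy integer
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), num_samples)

loader = DataLoader(full_dataset, batch_size=4, sampler=sampler)
print(len(sampler))                              # 5983
print(len(loader), math.ceil(num_samples / 4))   # both should be 1496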

Thanks a lot!

Based on the accepted answer below, I have included a utility function:

def sampler_(dataset):
    # n_classes is assumed to be defined globally
    dataset_counts = imageCount(dataset)
    num_samples = sum(dataset_counts)
    labels = [tag for _, tag in dataset]

    # Weight each sample by the inverse frequency of its class
    class_weights = [num_samples / dataset_counts[i] for i in range(n_classes)]
    weights = [class_weights[labels[i]] for i in range(num_samples)]
    sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
    return sampler

The imageCount function finds the number of images of each class in the dataset. Each row in the dataset contains the image and the class, so we take the second element of the tuple.

def imageCount(dataset):
    # Count how many samples belong to each class (the class index is the second element of each tuple)
    image_count = [0] * n_classes
    for item in dataset:
        image_count[item[1]] += 1
    return image_count
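For completeness, a usage sketch (the dataset name and batch size below are placeholders):

# Build a class-balanced DataLoader for any dataset yielding (image, class_index) pairs
sampler = sampler_(train_dataset)
train_loader = DataLoader(train_dataset, batch_size=4, sampler=sampler, pin_memory=True)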

Best Answer

That code looks a bit convoluted... You could try something like this:

# Let there be 9 samples and 1 sample in class 0 and class 1 respectively
class_counts = [9.0, 1.0]
num_samples = sum(class_counts)
labels = [0, 0, ..., 0, 1]  # corresponding labels of samples

class_weights = [num_samples / class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(int(num_samples))]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
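To see the effect end to end, here is a small runnable sketch (the TensorDataset below is a hypothetical stand-in for a real image dataset) that plugs the sampler into a DataLoader and counts how often each class is drawn in one epoch:

import torch
from collections import Counter
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical toy dataset: 9 samples of class 0 and 1 sample of class 1
targets = torch.tensor([0] * 9 + [1])
dataset = TensorDataset(torch.randn(10, 3), targets)

class_counts = [9.0, 1.0]
num_samples = int(sum(class_counts))
class_weights = [num_samples / c for c in class_counts]
weights = [class_weights[int(t)] for t in targets]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), num_samples)

loader = DataLoader(dataset, batch_size=2, sampler=sampler)
drawn = Counter(int(y) for _, ys in loader for y in ys)
print(drawn)  # the two classes should be drawn in roughly equal numbers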

Regarding machine-learning - Using WeightedRandomSampler in PyTorch, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/60812032/
