python-2.7 - Why doesn't the random seed keep the results constant in Python

Tags: python-2.7 random seed

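I am using the following code. I want to get the same results for the same random seed, but with the same seed (1 in this case) I get different results on each run. Here is the code: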

import pandas as pd
import numpy as np
from random import seed
# Load scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split
seed(1) ### <-----

file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data'
dataset2 = pd.read_csv(file_path, header=None, sep=',')

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

#Encoding
y = le.fit_transform(dataset2[60])
dataset2[60] = y
train, test = train_test_split(dataset2, test_size=0.1)
y = train[60] 
y_test = test[60] 
clf = RandomForestClassifier(n_jobs=100, random_state=0)
features = train.columns[0:59] 
clf.fit(train[features], y)

# Apply the Classifier we trained to the test data
y_pred = clf.predict(test[features])

# Decode 
y_test_label = le.inverse_transform(y_test)
y_pred_label = le.inverse_transform(y_pred)


from sklearn.metrics import accuracy_score
print (accuracy_score(y_test_label, y_pred_label))

# Results of two separate runs:
# 0.761904761905
# 0.90476190476

Best answer

Your code:

import numpy as np
from random import seed
seed(1) ### <-----

sets the seed of Python's built-in random module only.

But sklearn relies entirely on numpy's random module, as explained here:

For testing and replicability, it is often important to have the entire execution controlled by a single seed for the pseudo-random number generator used in algorithms that have a randomized component. Scikit-learn does not use its own global random state; whenever a RandomState instance or an integer random seed is not provided as an argument, it relies on the numpy global random state, which can be set using numpy.random.seed. For example, to set an execution’s numpy global random state to 42, one could execute the following in his or her script:

import numpy as np

np.random.seed(42)

So in general you should do:

np.random.seed(1)
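
The difference between the two seeds can be checked directly. The following is a minimal sketch, not part of the original answer; the helper names `draw_after_stdlib_seed` and `draw_after_numpy_seed` are hypothetical and exist only for this illustration. It draws from numpy's global generator after seeding each module in turn:

```python
import random

import numpy as np


def draw_after_stdlib_seed(seed_value):
    # Seeds only Python's built-in random module;
    # numpy's global random state is left untouched.
    random.seed(seed_value)
    return np.random.rand(3)


def draw_after_numpy_seed(seed_value):
    # Seeds numpy's global random state, which sklearn falls back on
    # when no random_state argument is passed.
    np.random.seed(seed_value)
    return np.random.rand(3)


# Re-seeding numpy reproduces the same draw.
a = draw_after_numpy_seed(1)
b = draw_after_numpy_seed(1)
numpy_seed_repeats = np.array_equal(a, b)

# Re-seeding the stdlib module does not reset numpy, so the draws differ.
c = draw_after_stdlib_seed(1)
d = draw_after_stdlib_seed(1)
stdlib_seed_repeats = np.array_equal(c, d)

print(numpy_seed_repeats, stdlib_seed_repeats)
```

Here `numpy_seed_repeats` should be true and `stdlib_seed_repeats` false, which matches the behaviour described in the question.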

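But that is only part of the truth: it is often unnecessary if you use all sklearn components carefully and call each of them explicitly with a seed!

As ShreyasG mentioned, this also applies to train_test_split.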
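
As a rough sketch of the explicit-seed approach (not taken from the original answer), the snippet below passes `random_state` to both `train_test_split` and `RandomForestClassifier`. To stay self-contained it uses synthetic data from `make_classification` instead of the sonar file, and the helper `run_once`, the seed values, and `n_estimators=50` are assumptions made for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_once(split_seed=1, model_seed=0):
    # Synthetic stand-in for the sonar data (assumption: 208 rows, 60 features).
    X, y = make_classification(n_samples=208, n_features=60, random_state=0)

    # Explicit seed for the split, so the same rows land in the test set each time.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=split_seed
    )

    # Explicit seed for the forest, so bootstrapping and feature sampling repeat.
    clf = RandomForestClassifier(n_estimators=50, random_state=model_seed)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


first = run_once()
second = run_once()
print(first == second)
```

With both seeds fixed, the two accuracies should be identical, and no global call to `np.random.seed` is needed.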

Source: the original Stack Overflow question, "Why doesn't the random seed keep the results constant in Python": https://stackoverflow.com/questions/46661426/
