Python: Error when loading MNIST data

Tags: python machine-learning mnist

An error occurs when I load the MNIST data with the following code. (I am using Anaconda and writing the code in Jupyter Notebook.)

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

I get a TimeoutError and I don't know what went wrong. I have already turned off my VPN proxy, but that didn't help. Help!

TimeoutError                              Traceback (most recent call last)
<ipython-input-1-3ba7b9c02a3b> in <module>()
      1 from sklearn.datasets import fetch_mldata
----> 2 mnist = fetch_mldata('MNIST original')

~\Anaconda3\lib\site-packages\sklearn\datasets\mldata.py in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
    152         urlname = MLDATA_BASE_URL % quote(dataname)
    153         try:
--> 154             mldata_url = urlopen(urlname)
    155         except HTTPError as e:
    156             if e.code == 404:

~\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224 
    225 def install_opener(opener):

~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    524             req = meth(req)
    525 
--> 526         response = self._open(req, data)
    527 
    528         # post-process response

~\Anaconda3\lib\urllib\request.py in _open(self, req, data)
    542         protocol = req.type
    543         result = self._call_chain(self.handle_open, protocol, protocol +
--> 544                                   '_open', req)
    545         if result:
    546             return result

~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    502         for handler in handlers:
    503             func = getattr(handler, meth_name)
--> 504             result = func(*args)
    505             if result is not None:
    506                 return result

~\Anaconda3\lib\urllib\request.py in http_open(self, req)
   1344 
   1345     def http_open(self, req):
-> 1346         return self.do_open(http.client.HTTPConnection, req)
   1347 
   1348     http_request = AbstractHTTPHandler.do_request_

~\Anaconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
   1319             except OSError as err: # timeout error
   1320                 raise URLError(err)
-> 1321             r = h.getresponse()
   1322         except:
   1323             h.close()

~\Anaconda3\lib\http\client.py in getresponse(self)
   1329         try:
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:
   1333                 self.close()

~\Anaconda3\lib\http\client.py in begin(self)
    295         # read until we get a non-100 response
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:
    299                 break

~\Anaconda3\lib\http\client.py in _read_status(self)
    256 
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:
    260             raise LineTooLong("status line")

~\Anaconda3\lib\socket.py in readinto(self, b)
    584         while True:
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:
    588                 self._timeout_occurred = True

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
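The traceback ends inside urlopen while the client is waiting for the server to answer, which points to a network-side problem rather than a bug in the notebook code. One quick way to check this is to test whether the download host accepts a TCP connection at all. The sketch below uses only the standard library; the helper name host_is_reachable is hypothetical, and the host and port you pass in (for example mldata.org on port 80, since fetch_mldata goes through http_open) are assumptions about where the data is downloaded from.

```python
import socket


def host_is_reachable(host, port=80, timeout=5.0):
    """Hypothetical helper: return True if a TCP connection to (host, port)
    can be opened within `timeout` seconds, otherwise False."""
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts are all OSError subclasses
        return False
```

For example, if host_is_reachable("mldata.org") returns False, the host cannot be reached from that machine, so changing the Python code alone will not make fetch_mldata succeed.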

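I downloaded the MNIST dataset and tried to load the data myself. I copied some code for loading MNIST, but loading the data failed again. I think I need to change some of the code rather than copying it wholesale from the internet, but I don't know where to make the changes (I am just a Python beginner). Below is the code I use to load the downloaded MNIST data. Is the problem that I put the data in the wrong folder?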

import numpy as np
from struct import unpack

def loadmnist(imagefile, labelfile):

    # Open the (uncompressed) image and label files in binary read mode
    images = open(imagefile, 'rb')
    labels = open(labelfile, 'rb')

    # Get metadata for images
    images.read(4)  # skip the magic_number
    number_of_images = images.read(4)
    number_of_images = unpack('>I', number_of_images)[0]
    rows = images.read(4)
    rows = unpack('>I', rows)[0]
    cols = images.read(4)
    cols = unpack('>I', cols)[0]

    # Get metadata for labels
    labels.read(4)
    N = labels.read(4)
    N = unpack('>I', N)[0]

    # Get data
    x = np.zeros((N, rows*cols), dtype=np.uint8)  # Initialize numpy array
    y = np.zeros(N, dtype=np.uint8)  # Initialize numpy array
    for i in range(N):
        for j in range(rows*cols):
            tmp_pixel = images.read(1)  # Just a single byte
            tmp_pixel = unpack('>B', tmp_pixel)[0]
            x[i][j] = tmp_pixel
        tmp_label = labels.read(1)
        y[i] = unpack('>B', tmp_label)[0]

    images.close()
    labels.close()
    return (x, y)

The part above runs fine.

train_img, train_lbl = loadmnist('data/train-images-idx3-ubyte'
                                 , 'data/train-labels-idx1-ubyte')
test_img, test_lbl = loadmnist('data/t10k-images-idx3-ubyte'
                               , 'data/t10k-labels-idx1-ubyte')

This is the error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-b23a5078b5bb> in <module>()
      1 train_img, train_lbl = loadmnist('data/train-images-idx3-ubyte'
----> 2                                  , 'data/train-labels-idx1-ubyte')
      3 test_img, test_lbl = loadmnist('data/t10k-images-idx3-ubyte'
      4                                , 'data/t10k-labels-idx1-ubyte')

<ipython-input-4-967098b85f28> in loadmnist(imagefile, labelfile)
      2 
      3     # Open the images with gzip in read binary mode
----> 4     images = open(imagefile, 'rb')
      5     labels = open(labelfile, 'rb')
      6 

FileNotFoundError: [Errno 2] No such file or directory: 'data/train-images-idx3-ubyte'

I put the downloaded data in a folder I had just created.

Best answer

If you want to load the dataset directly from a library instead of downloading it first and then loading it, load it from Keras.

You can do it like this:

from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

If you are new to machine learning and Python and want to learn more, I suggest you take a look at this excellent blog post.

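In addition, the file names you pass to the function must match the files on disk exactly, including any extension. For example, if the downloaded files still end in .gz, call the function like this (in that case the function must open the files with gzip.open rather than open, because the files are gzip-compressed):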

train_img, train_lbl = loadmnist('mnist/train-images-idx3-ubyte.gz'
                                 , 'mnist/train-labels-idx1-ubyte.gz')
test_img, test_lbl = loadmnist('mnist/t10k-images-idx3-ubyte.gz'
                               , 'mnist/t10k-labels-idx1-ubyte.gz')
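The loadmnist function in the question opens its arguments with plain open(), so a .gz path would be read as compressed bytes instead of image data. A minimal sketch of an alternative loader that accepts either the compressed .gz files or the extracted files is shown below. The names load_mnist_pair and _open_maybe_gzip are hypothetical, and the sketch assumes the standard MNIST IDX layout (big-endian header with magic number 2051 for images and 2049 for labels). It also reads each payload in one call with numpy.frombuffer instead of unpacking one byte at a time, which is much faster.

```python
import gzip
import struct

import numpy as np


def _open_maybe_gzip(path):
    """Hypothetical helper: open `path` for binary reading,
    decompressing it transparently if the name ends in .gz."""
    return gzip.open(path, "rb") if str(path).endswith(".gz") else open(path, "rb")


def load_mnist_pair(imagefile, labelfile):
    """Hypothetical loader: read an MNIST IDX image/label pair.

    Returns x with shape (N, rows*cols) and y with shape (N,), both uint8,
    the same shapes produced by loadmnist in the question.
    """
    with _open_maybe_gzip(imagefile) as f:
        # Image header: magic, count, rows, cols as four big-endian uint32 values
        magic, n_images, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError(f"{imagefile}: unexpected magic number {magic}")
        pixels = np.frombuffer(f.read(n_images * rows * cols), dtype=np.uint8)
        # frombuffer returns a read-only view; copy to get a writable array
        x = pixels.reshape(n_images, rows * cols).copy()

    with _open_maybe_gzip(labelfile) as f:
        # Label header: magic and count as two big-endian uint32 values
        magic, n_labels = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError(f"{labelfile}: unexpected magic number {magic}")
        y = np.frombuffer(f.read(n_labels), dtype=np.uint8).copy()

    if n_images != n_labels:
        raise ValueError(f"{n_images} images but {n_labels} labels")
    return x, y
```

With this loader the calls above work unchanged for the .gz files, and the same function also accepts the extracted names such as 'data/train-images-idx3-ubyte'.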

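In the code you use to load the data from your local disk, the error is raised because the files do not exist at the given location. Relative paths such as 'data/train-images-idx3-ubyte' are resolved against the current working directory, which in Jupyter is by default the folder containing the notebook. Make sure the mnist folder (data in your own code) exists in the same folder as your notebook.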
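To see exactly where Python is looking, you can print the working directory and check each path before calling the loader. The helper report_missing below is a hypothetical sketch using only os; for every missing file it also lists what the target folder actually contains, which makes a missing .gz extension or a misnamed folder easy to spot.

```python
import os


def report_missing(paths):
    """Hypothetical helper: print the working directory that relative paths
    are resolved against, and return the paths that are not existing files."""
    print("Current working directory:", os.getcwd())
    missing = []
    for p in paths:
        full = os.path.abspath(p)  # resolved against the cwd, as open() does
        if not os.path.isfile(full):
            missing.append(p)
            print("MISSING:", full)
            folder = os.path.dirname(full)
            if os.path.isdir(folder):
                # Show the real file names, e.g. to reveal a .gz extension
                print("  files actually in", folder, ":", sorted(os.listdir(folder)))
            else:
                print("  folder does not exist:", folder)
    return missing
```

For example, report_missing(['data/train-images-idx3-ubyte', 'data/train-labels-idx1-ubyte']) run in the notebook prints which of the two paths from the question cannot be found and why.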

This question, "Python: Error when loading MNIST data", comes from Stack Overflow: https://stackoverflow.com/questions/51245961/
