python - 使用 python 和 tensorflow 从图像中识别数字

标签 python opencv ubuntu tensorflow number-recognition

详细信息:Ubuntu 14.04(LTS)、OpenCV 2.4.13、Spyder 2.3.9(Python 2.7)、Tensorflow r0.10

我想识别号码来自 the image使用 PythonTensorflow(可选 OpenCV)。

此外,我想使用 MNIST 数据训练和 tensorflow

像这样(代码引用了this page的视频),

代码:

import tensorflow as tf
import random

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

### modeling ###

activation = tf.nn.softmax(tf.matmul(x, W) + b)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(activation), reduction_indices=1))

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

### training ###

for epoch in range(training_epochs) :

    avg_cost = 0
    total_batch = int(mnist.train.num_examples/batch_size)

    for i in range(total_batch) :

        batch_xs, batch_ys =mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
        avg_cost += sess.run(cross_entropy, feed_dict = {x: batch_xs, y: batch_ys}) / total_batch

    if epoch % display_step == 0 :
        print "Epoch : ", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost)

print "Optimization Finished"

### predict number ###

r = random.randint(0, mnist.test.num_examples - 1)
print "Prediction: ", sess.run(tf.argmax(activation,1), {x: mnist.test.images[r:r+1]})
print "Correct Answer: ", sess.run(tf.argmax(mnist.test.labels[r:r+1], 1))

但是,问题是我怎样才能使 numpy 数组像

代码补充:

mnist.test.images[r:r+1]

[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 0.50196081 0.50196081 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 1. 1. 1. 1. 0.50196081 0.25098041 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.25098041 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.74901962 1. 1. 1. 1. 0.50196081 0.50196081 0.50196081 0.74901962 1. 1. 1. 0.74901962 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 1. 0.74901962 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 0.74901962 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0.25098041 1. 1. 0.74901962 0.25098041 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.74901962 1. 1. 0.74901962 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.25098041 1. 1. 0.74901962 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 0.74901962 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.25098041 1. 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 0.25098041 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.25098041 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.25098041 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.74901962 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.74901962 1. 1. 1. 0.25098041 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.25098041 0.74901962 1. 1. 1. 1. 0.74901962 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 1. 1. 0.74901962 0. 0. 0. 0. 0. 0.25098041 0.50196081 1. 1. 1. 1. 1. 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.74901962 1. 1. 1. 1. 0.50196081 0.50196081 0.74901962 1. 1. 1. 1. 1. 1. 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.74901962 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.25098041 1. 1. 1. 1. 1. 1. 1. 0.50196081 0.25098041 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.50196081 0.50196081 0.50196081 0.50196081 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]

当我使用 OpenCV 解决问题时,我可以制作关于图像的 numpy 数组,但有点奇怪。 (我想把array做成一个28x28的vector)

代码补充:

image = cv2.imread("img_easy.jpg")
resized_image = cv2.resize(image, (28, 28))

[[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]

[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]

[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]

...,

[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]

[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]

[[255 255 255] [255 255 255] [255 255 255] ..., [255 255 255] [255 255 255] [255 255 255]]]

然后,我将 value('resized_image') 放入 Tensorflow 代码中。 像这样,

代码修改:

### predict number ###

print "Prediction: ", sess.run(tf.argmax(activation,1), {x: resized_image})
print "Correct Answer: 9"

因此,错误发生在这一行。

ValueError: Cannot feed value of shape (28, 28, 3) for Tensor u'Placeholder_2:0', which has shape '(?, 784)'

最后,

1)我想知道如何制作一个可以输入tensorflow代码的数据(可能是numpy数组[784])

2)你知道使用te​​nsorflow的数字识别例子吗?

我是机器学习的初学者。

请详细告诉我我该怎么做。

最佳答案

您使用的图像似乎是 RGB,因此是第 3 维 (28,28,3)。

因为原始 MNIST 图像是灰度,宽度和高度为 28。这就是为什么 x 占位符的形状是 [None, 784],因为 28*28= 784。

CV2 以 RGB 格式读取图像,而您希望它是灰度图像,即 (28,28) 在执行 imread 时,您可能会发现使用它很有帮助。

image = cv2.imread("img_easy.jpg", cv2.CV_LOAD_IMAGE_GRAYSCALE)

通过这样做,您的图像应该具有正确的形状 (28, 28)。

此外,CV2 图像值与您的问题中显示的 MNIST 图像不在同一范围内。您可能需要对图像中的值进行归一化,使它们在 0-1 范围内。

此外,您可能还想为此使用 CNN(稍微高级一些,但应该会提供更好的结果)。看本页教程https://www.tensorflow.org/tutorials/了解更多详情。

关于python - 使用 python 和 tensorflow 从图像中识别数字,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39032277/

相关文章:

ubuntu - PasswordAuthenticator 导致 Cassandra 连接超时

python - 如何在 django GET api(rest 框架)中返回单个对象

python - Django Createview保存表单并根据外键的id重定向到页面

matlab - OpenCV 2.3 相机标定

image - python : Concatenate 2 irregular sized images by automatically detecting the sizes

ubuntu - 如何在 Ubuntu 18.04+ 上安装 ksnapshot?

python - Django - 以geoJSON格式获取多边形的质心

python - Flask:Apache Httpd 字体端和 websocket 支持

c++ - Opencv C++ 图像导出到 C#

ubuntu - 我无法将 XRDP 配置为重新连接到 Ubuntu 20 中的现有 session