Notice

All code for this post is available at
https://github.com/super-213/business_data_analysis
Feel free to download it if you need it.

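Inspecting the data

import pandas as pd

# Load the dataset: a label column '对应数字' ("corresponding digit")
# plus 1024 pixel columns (a flattened 32x32 image)
df = pd.read_excel('手写字体识别.xlsx')

df.head()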

	对应数字	0	1	2	3	4	5	6	7	8	…	1014	1015	1016	1017	1018	1019	1020	1021	1022	1023
0	0	0	0	0	0	0	0	0	0	0	…	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	…	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	0	0	0	0	0	0	…	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	0	0	0	0	0	0	…	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	…	0	0	0	0	0	0	0	0	0	0
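
Each row holds one label plus 1024 pixel values, which the CNN section reshapes to 32×32. As a quick sanity check, a row can be reshaped and printed as text. This is a minimal sketch, assuming the pixel values are 0/1 and stored row by row; `row_to_ascii` is a hypothetical helper and the sample is synthetic, not taken from the dataset.

```python
import numpy as np

def row_to_ascii(pixels, side=32):
    """Render a flattened side*side image of 0/1 values as lines of text."""
    grid = np.asarray(pixels).reshape(side, side)
    return ["".join("#" if v else "." for v in row) for row in grid]

# Synthetic example: a vertical bar in columns 15 and 16.
sample = np.zeros((32, 32), dtype=int)
sample[4:28, 15:17] = 1
lines = row_to_ascii(sample.reshape(-1))
print("\n".join(lines))
```

With the real data, something like `row_to_ascii(df.drop(columns='对应数字').iloc[0])` should show the first sample.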

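Splitting the data

from sklearn.model_selection import train_test_split

# Features are the 1024 pixel columns; the target is the digit label
X = df.drop(columns='对应数字')
y = df['对应数字']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)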
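
To see what such a split produces, the sketch below mimics a seeded 80/20 shuffle split with NumPy. It illustrates the idea only and is not scikit-learn's implementation, so the selected rows will differ from `train_test_split` even with the same seed. `shuffle_split` is a hypothetical helper; the test share is rounded up, which, as far as I know, is how scikit-learn treats a fractional `test_size`.

```python
import math
import numpy as np

def shuffle_split(n_samples, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) taken from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = math.ceil(test_size * n_samples)  # test share rounded up
    return order[n_test:], order[:n_test]

train_idx, test_idx = shuffle_split(1000)
print(len(train_idx), len(test_idx))  # 800 200
```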

KNN

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Classify each test sample by its 5 nearest training samples
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9534883720930233
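
Conceptually, `KNeighborsClassifier(n_neighbors=5)` labels a sample by a majority vote among its five nearest training samples. Below is a minimal NumPy sketch of that idea, using Euclidean distance and brute-force search. `knn_predict` is a hypothetical helper, the data is synthetic, and tie-breaking may differ from scikit-learn.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training row
    nearest = np.argsort(dists)[:k]               # indices of the k closest rows
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # most frequent label wins

# Synthetic data: two well-separated clusters labelled 0 and 1.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y_demo = np.array([0] * 20 + [1] * 20)
print(knn_predict(X_demo, y_demo, np.array([4.8, 5.1])))  # 1
```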

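Testing

from PIL import Image
import numpy as np

# Load the test image, resize it to 32x32 and convert it to grayscale
img = Image.open('数字4.png')
img = img.resize((32, 32))
img = img.convert('L')

# Binarize: bright pixels (> 128) become 0, all others become 1
img_new = img.point(lambda x: 0 if x > 128 else 1)
# Flatten the 32x32 image into a single row of 1024 features
arr = np.array(img_new)
arr_new = arr.reshape(1, -1)

answer = knn.predict(arr_new)
print('The digit in the image is: ' + str(answer[0]))

The digit in the image is: 4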
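
The `point` call in the test code maps bright pixels to 0 and everything else to 1, so dark strokes on a light background become the 1s. The same thresholding can be written with NumPy, which makes the polarity easy to check. This is a sketch on a synthetic grayscale array; `binarize` is a hypothetical helper.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Map pixels brighter than threshold to 0 and all others to 1, then flatten to one row."""
    gray = np.asarray(gray)
    bits = np.where(gray > threshold, 0, 1)
    return bits.reshape(1, -1)

# Synthetic 32x32 "image": white background (255) with a dark stroke (20).
gray_demo = np.full((32, 32), 255, dtype=np.uint8)
gray_demo[10:20, 16] = 20
features = binarize(gray_demo)
print(features.shape, int(features.sum()))  # (1, 1024) 10
```

If an input image had light strokes on a dark background, the condition would probably need to be flipped, assuming the dataset encodes strokes as 1.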

SVM

from sklearn.svm import SVC

# Support vector classifier with an RBF kernel
svm = SVC(kernel='rbf')
svm.fit(X_train, y_train)
score = svm.score(X_test, y_test)
print('SVM accuracy: ' + str(score))

SVM accuracy: 0.9715762273901809
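
With `kernel='rbf'`, the similarity between two samples is exp(-γ‖x − y‖²). Below is a small NumPy sketch of that formula. `rbf_kernel_value` is a hypothetical helper, the data is synthetic, and `gamma` follows scikit-learn's documented default `gamma='scale'`, that is 1 / (n_features × X.var()).

```python
import numpy as np

def rbf_kernel_value(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Synthetic 0/1 feature rows standing in for flattened images.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(50, 1024))
gamma = 1.0 / (X_demo.shape[1] * X_demo.var())   # the 'scale' heuristic

print(rbf_kernel_value(X_demo[0], X_demo[0], gamma))  # 1.0 for identical samples
print(rbf_kernel_value(X_demo[0], X_demo[1], gamma))  # a value between 0 and 1
```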

Testing

answer = svm.predict(arr_new)
print('The digit in the image is: ' + str(answer[0]))

The digit in the image is: 4

Random Forest

from sklearn.ensemble import RandomForestClassifier

# Ensemble of 100 decision trees
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)
score = rf.score(X_test, y_test)
print('Random forest accuracy: ' + str(score))

Random forest accuracy: 0.9689922480620154
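
With default settings, each tree in a scikit-learn random forest is trained on a bootstrap sample: rows drawn with replacement from the training set. Below is a short NumPy sketch of what one such sample looks like. The size of 1,000 rows is illustrative, not the dataset's size.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1000  # illustrative size, not the real training-set size

# Draw n_rows indices with replacement, as one tree's bootstrap sample.
boot_idx = rng.integers(0, n_rows, size=n_rows)
unique_fraction = len(np.unique(boot_idx)) / n_rows
print(f"{unique_fraction:.2f}")  # typically close to 1 - 1/e, about 0.63
```

Because every tree sees a different subset of rows, the trees disagree on some samples, and combining their predictions smooths out individual errors.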

Testing

answer = rf.predict(arr_new)
print('The digit in the image is: ' + str(answer[0]))

The digit in the image is: 4

MLP

from sklearn.neural_network import MLPClassifier

# Multilayer perceptron with one hidden layer of 100 units
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)
mlp.fit(X_train, y_train)
score = mlp.score(X_test, y_test)
print('Neural network accuracy: ' + str(score))

Neural network accuracy: 0.958656330749354
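
`hidden_layer_sizes=(100,)` describes a network with one hidden layer of 100 units between the 1024 inputs and the 10 output classes. The forward pass of such a network can be sketched in NumPy as below, with ReLU in the hidden layer (the `MLPClassifier` default activation) and softmax at the output. The weights are random placeholders, not the trained model's, so the probabilities are meaningless; only the shapes and the computation are the point.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with ReLU, then a softmax output layer."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return softmax(hidden @ W2 + b2)

# Random placeholder weights: 1024 inputs -> 100 hidden units -> 10 classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.05, (1024, 100)), np.zeros(100)
W2, b2 = rng.normal(0, 0.05, (100, 10)), np.zeros(10)

x_demo = rng.integers(0, 2, size=(1, 1024)).astype(float)
probs = mlp_forward(x_demo, W1, b1, W2, b2)
print(probs.shape)  # (1, 10)
```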

Testing

answer = mlp.predict(arr_new)
print('The digit in the image is: ' + str(answer[0]))

The digit in the image is: 4

CNN

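import pandas as pd
from sklearn.model_selection import train_test_split
# layers and models are assumed to come from tensorflow.keras
from tensorflow.keras import layers, models

df = pd.read_excel('手写字体识别.xlsx')

X = df.drop(columns='对应数字')
y = df['对应数字']

# Reshape each 1024-value row into a 32x32 single-channel image
X = X.to_numpy().reshape(-1, 32, 32, 1).astype('float32')

# Scale the pixel values; images passed to predict should be scaled the same way
X = X / 255.0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Two convolution + pooling stages, then a dense classifier over the 10 digits
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,1)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Integer labels, so use the sparse variant of categorical cross-entropy
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# The held-out test split doubles as validation data here
history = model.fit(X_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_data=(X_test, y_test))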

Epoch 1/10
/opt/anaconda3/envs/tf-metal/lib/python3.11/site-packages/keras/src/layers/convolutional/base_conv.py:113: UserWarning: Do not pass an input_shape/input_dim argument to a layer. When using Sequential models, prefer using an Input(shape) object as the first layer in the model instead.
super().__init__(activity_regularizer=activity_regularizer, **kwargs)
2025-09-23 20:17:35.573018: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 33ms/step - accuracy: 0.1034 - loss: 2.3043 - val_accuracy: 0.0749 - val_loss: 2.3025
Epoch 2/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.1028 - loss: 2.3010 - val_accuracy: 0.1137 - val_loss: 2.2969
Epoch 3/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.1713 - loss: 2.2660 - val_accuracy: 0.2481 - val_loss: 2.1901
Epoch 4/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.5184 - loss: 1.9033 - val_accuracy: 0.6382 - val_loss: 1.4705
Epoch 5/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.7673 - loss: 1.0627 - val_accuracy: 0.8062 - val_loss: 0.7945
Epoch 6/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.8649 - loss: 0.5883 - val_accuracy: 0.8527 - val_loss: 0.5462
Epoch 7/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.8908 - loss: 0.4113 - val_accuracy: 0.8786 - val_loss: 0.4564
Epoch 8/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.9089 - loss: 0.3249 - val_accuracy: 0.9044 - val_loss: 0.3793
Epoch 9/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.9211 - loss: 0.2893 - val_accuracy: 0.9018 - val_loss: 0.3861
Epoch 10/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.9302 - loss: 0.2541 - val_accuracy: 0.9173 - val_loss: 0.3233
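
The layer stack of this model can be checked by hand. With Keras defaults (`padding='valid'` and stride 1 for `Conv2D`, pool stride equal to pool size for `MaxPooling2D`), a 3×3 convolution shrinks each side by 2 and a 2×2 pool halves it, rounding down. The sketch below walks the 32×32×1 input through the layers and counts the trainable parameters; the helper names are hypothetical.

```python
def conv_out(side, kernel=3):
    """Output side length of a 'valid' convolution with stride 1."""
    return side - kernel + 1

def pool_out(side, pool=2):
    """Output side length of non-overlapping max pooling."""
    return side // pool

def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out

side = pool_out(conv_out(32))          # Conv2D(32) -> 30, pool -> 15
side = pool_out(conv_out(side))        # Conv2D(64) -> 13, pool -> 6
flat = side * side * 64                # Flatten -> 2304

total_params = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
                + dense_params(flat, 64) + dense_params(64, 10))
print(side, flat, total_params)  # 6 2304 166986
```

The first-epoch loss of about 2.30 in the log matches ln(10) ≈ 2.3026, the cross-entropy of a uniform guess over 10 classes, so the network starts at chance level before it begins to learn.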

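Testing

from PIL import Image
import numpy as np

# Same preprocessing as in the KNN test: 32x32, grayscale, dark pixels -> 1
img = Image.open('数字4.png')
img = img.resize((32, 32))
img = img.convert('L')
img_new = img.point(lambda x: 0 if x > 128 else 1)

arr = np.array(img_new)

# Add batch and channel dimensions to match the model input shape (1, 32, 32, 1).
# Note: the training data was divided by 255.0, while this input is not.
arr_new = arr.reshape(1, 32, 32, 1).astype('float32')

# predict returns one row of 10 class probabilities; take the most likely class
answer = model.predict(arr_new)
predicted_digit = np.argmax(answer[0])

print('The digit in the image is: ' + str(predicted_digit))

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 582ms/step
The digit in the image is: 4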
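
`model.predict` returns one row of 10 softmax probabilities per input image, and `np.argmax` picks the index of the largest one. Below is a small sketch on a made-up probability row (not real model output) that also reports the confidence; `top_prediction` is a hypothetical helper.

```python
import numpy as np

def top_prediction(probs):
    """Return (digit, probability) for the most likely class in one row."""
    probs = np.asarray(probs)
    digit = int(np.argmax(probs))
    return digit, float(probs[digit])

# Made-up probabilities for digits 0..9; they sum to 1.
fake_probs = [0.01, 0.02, 0.02, 0.03, 0.80, 0.02, 0.03, 0.02, 0.03, 0.02]
digit, confidence = top_prediction(fake_probs)
print(digit, confidence)  # 4 0.8
```

Looking at the probability as well as the digit helps spot cases where the model is only marginally sure of its answer.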

智浩的Blog

Business Data Analysis: Handwritten Digit Recognition
