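Capturing Real-Time Video with the ESP32

Environment: ESP-IDF v5.1.4, Python v3.12.4

I looked through the examples on the Espressif website and found that ESP-IoT-Solution fits the requirements, so clone the code from GitHub: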

git clone --recursive https://github.com/espressif/esp-iot-solution

Step 1: Configuration

Go to esp-iot-solution/examples/camera/video_stream_server and run the following commands:

idf.py set-target esp32s3
idf.py menuconfig

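In the menu, open Camera Pin Configuration and select the development board you are using (leaving this unconfigured seems to cause problems).

To change the ESP32's Wi-Fi settings, open Example Connection Configuration. There you can set the SSID, IP address, and password used in AP mode, or configure STA mode (the SSID and password of the Wi-Fi network the ESP32 should join).

By default, the ESP32 runs in AP mode with no password, allows at most 1 connected client, and uses the IP address 192.168.4.1.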

Step 2: Flashing

On Ubuntu, simply run:

idf.py build flash monitor

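The serial port is detected automatically.

Step 3: Testing

Once connected to the ESP32's Wi-Fi, first ping it to confirm connectivity, then open 192.168.4.1/stream in a browser to view the camera feed.

In STA mode, you need to know the IP address that DHCP assigned to the ESP32. Assuming it is 192.168.1.10, ping that address, then open 192.168.1.10/stream in a browser.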
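The same check can be scripted. This is only a minimal sketch and not part of the original example: `stream_url` and `is_reachable` are hypothetical helper names, and I am assuming the HTTP server listens on port 80 (which is what the browser uses when no port is given). A TCP connection attempt stands in for ping here.

```python
import socket


def stream_url(host: str) -> str:
    """Build the stream address for an ESP32 IP (the /stream path comes from the example)."""
    return f"http://{host}/stream"


def is_reachable(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks
        return False
```

For example, `is_reachable("192.168.4.1")` should return True while you are connected to the ESP32's AP, and `stream_url("192.168.1.10")` gives the address to use in STA mode.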

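Receiving the Video Stream on the Host

Step 1: Environment Setup

Python version: 3.8.19

The code uses the OpenCV library, which needs to be installed beforehand: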

pip install opencv-python
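Note that the OpenCV package on PyPI is called opencv-python, while the module you import is cv2, which is easy to mix up. A quick way to confirm the environment is ready is to check whether the modules can be found. This is a small sketch using only the standard library; `missing_modules` is a hypothetical helper name of my own.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(["cv2"])` should return an empty list once the install has succeeded.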

Step 2: Streaming Code

The code:

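import cv2

# Address of the video stream
url = "http://192.168.2.8/stream"

# Open the network stream as a capture source
cap = cv2.VideoCapture(url)

while True:
    # Read one frame from the stream
    ret, frame = cap.read()
    if not ret:
        # Stop if the stream ended or the frame could not be read
        print("Failed to read a frame from the stream")
        break

    # Show the frame in a window
    cv2.imshow('Camera', frame)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the capture and close the window
cap.release()
cv2.destroyAllWindows()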

Run the code above and you will receive the video stream captured by the ESP32.
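One thing worth knowing: cap.read() returns a success flag together with the frame, and over Wi-Fi an occasional read can fail. Below is a minimal sketch of a loop that tolerates a few failed reads in a row before giving up. `read_frames` and `max_failures` are hypothetical names of my own, and the function only assumes the object has a read() method that returns (ok, frame), as cv2.VideoCapture does.

```python
def read_frames(cap, max_failures=30):
    """Yield frames from cap, stopping after max_failures consecutive failed reads."""
    failures = 0
    while failures < max_failures:
        ok, frame = cap.read()
        if ok:
            failures = 0  # a good frame resets the failure counter
            yield frame
        else:
            failures += 1
```

It could be used as `for frame in read_frames(cv2.VideoCapture(url)):` with the imshow and waitKey calls inside the loop body.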

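Model Training

Here I simply use an MNIST handwritten-digit model. If you want to train one yourself, you can follow the workflow in my other article:

Seeed_Studio_MNIST Example Implementation | Norlcyan’s Blog

I just used a ready-made model, which can be downloaded from the bottom of the following page: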

MNIST_Classification

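Model Deployment

This part took the longest. I had no prior experience in this area, so it was mostly pieced together from various examples online, plus a bit of AI magic.

My model is in TFLite format, which seemed a little easier to deploy.
The full code is below: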

import cv2
import time
import numpy as np
import tensorflow as tf
from PIL import Image

# Address of the video stream
url = "http://192.168.2.8/stream"

# Load the TFLite model
model_path = 'best_accuracy_top1_epoch_10_float32.tflite'
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

# Get the input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Open the network stream as a capture source
cap = cv2.VideoCapture(url)

# Record the start time
start_time = time.time()

# Preprocessing: load the saved image and shape it for the model
def preprocess(image_path):
    img = Image.open(image_path).convert('L')
    img = img.resize((32, 32))
    img = np.array(img).astype(np.float32)
    img = img.reshape(1, 32, 32, 1)  # TFLite expects NHWC format
    img /= 255.0
    return img

# Inference: return the index of the highest-scoring class
def predict(image_path):
    img = preprocess(image_path)
    interpreter.set_tensor(input_details[0]['index'], img)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    return np.argmax(output_data)

# Convert the grayscale image to black and white (I was stuck here for a while:
# without this step, the prediction was almost always 8)
def convert_to_binary(image):
    threshold_value = 144
    _, binary_image = cv2.threshold(image, threshold_value, 255, cv2.THRESH_BINARY)
    return binary_image

while True:
    # Read one frame from the stream
    ret, frame = cap.read()
    if not ret:
        # Stop if the stream ended or the frame could not be read
        print("Failed to read a frame from the stream")
        break

    # Undo the mirroring (horizontal flip)
    frame = cv2.flip(frame, 1)

    # Convert the frame to single-channel grayscale
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Convert the grayscale image to black and white
    binary_frame = convert_to_binary(gray_frame)

    # Show the black-and-white frame in a window
    cv2.imshow('Camera', binary_frame)

    # Get the current time
    current_time = time.time()

    # Save an image every 10 seconds and run inference on it
    if current_time - start_time >= 10:
        image_path = f'image_{int(current_time)}.jpg'
        cv2.imwrite(image_path, binary_frame)

        # Inference
        prediction = predict(image_path)
        print(f'Prediction: {prediction}')

        start_time = current_time

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the capture and close the window
cap.release()
cv2.destroyAllWindows()

This builds on the earlier streaming code, adding model loading, frame capture, and so on.
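Since the binarization step is what made recognition work for me, here is a tiny illustration of what it does to the pixel values. It uses plain Python lists so the rule is visible without OpenCV. `binarize` and `normalize` are hypothetical names of my own, and in the real code cv2.threshold with THRESH_BINARY applies the same rule to the whole image at once: a pixel strictly above the threshold becomes 255, everything else becomes 0.

```python
def binarize(gray_rows, threshold=144, max_value=255):
    """Apply the THRESH_BINARY rule: pixels above the threshold become max_value, others 0."""
    return [[max_value if px > threshold else 0 for px in row] for row in gray_rows]


def normalize(rows, scale=255.0):
    """Scale pixel values into [0, 1], as preprocess does before inference."""
    return [[px / scale for px in row] for row in rows]


# A made-up 2x3 grayscale patch, just for illustration
sample = [
    [10, 144, 145],
    [200, 90, 255],
]
binary = binarize(sample)
print(binary)             # [[0, 0, 255], [255, 0, 255]]
print(normalize(binary))  # [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
```

After both steps the model only ever sees 0.0 or 1.0, which is probably closer to the clean digits it was trained on than a washed-out grayscale camera frame.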