Using OCR to Organize Warehouse Inventory

Requirement: collect and organize the serial numbers of the Lenovo desktops in the warehouse.

Approaches:

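Barcode scanning: scan each S/N with a phone, record it, and export the list.
Lenovo lookup: the Lenovo mini program can recognize some S/Ns, but these models are old enough that its database is probably incomplete, and I could not find a usable API, only one external link that looks up an S/N: https://cas.wx.lenovo.com.cn/api/device/check/sn?sn=
Photo plus OCR: photograph the labels and run OCR, either one image at a time with Tianruo OCR (天若), or through the free Baidu AI Cloud API, after uploading the images to a personal Qiniu Cloud bucket so that Baidu AI Cloud can fetch them through external links.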
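
For the occasional manual check against the Lenovo link above, a small helper can build the lookup address for one serial number. This is only a sketch: `build_check_url` is a hypothetical name, upper-casing the input is an assumption, and since the response format of that link is not documented here, the sketch stops at building the URL.

```python
from urllib.parse import quote

# Link quoted in the text above; the serial number is appended after "sn=".
LENOVO_SN_CHECK = "https://cas.wx.lenovo.com.cn/api/device/check/sn?sn="

def build_check_url(serial: str) -> str:
    """Return the lookup link for one serial number (hypothetical helper)."""
    # Assumption: serial numbers are upper case and contain no spaces.
    return LENOVO_SN_CHECK + quote(serial.strip().upper())

print(build_check_url(" pc0abc12 "))
```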

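Workflow:

Take the photos. In testing, an iPhone 12 mini photo stays legible for about 6 machines per shot.
Rename the images to a uniform pattern (a batch renaming tool works) so they are easy to iterate over later.
Upload the images to Qiniu Cloud and generate CDN external links.
Send a POST request to the Baidu API for each image and save the returned recognition results locally.
Clean the data: parse the text as JSON, extract every words field into a new file, use a regular expression to match every string that starts with PC followed by 6 characters, and correct the misrecognized 'PCO' to 'PC0'.
Proofread manually afterwards.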
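
The renaming step can also be done with a few lines of Python instead of a batch renaming tool. The sketch below is one possible approach, not the tool used for this post: `rename_images` is a hypothetical helper, and it assumes the photos are `.jpg` files whose sorted file names reflect the shooting order.

```python
from pathlib import Path

def rename_images(folder: str, start: int = 1, suffix: str = ".jpg") -> list[str]:
    """Rename every image in `folder` to <number><suffix>, in sorted name order."""
    images = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
    # Phase 1: move everything to temporary names, so that a final name
    # can never overwrite a photo that has not been renamed yet.
    temporary = []
    for idx, path in enumerate(images):
        tmp = path.with_name(f"__tmp_{idx}{path.suffix}")
        path.rename(tmp)
        temporary.append(tmp)
    # Phase 2: assign the final sequential names.
    new_names = []
    for offset, tmp in enumerate(temporary):
        target = tmp.with_name(f"{start + offset}{suffix}")
        tmp.rename(target)
        new_names.append(target.name)
    return new_names

# Example call (the folder name is a placeholder):
# rename_images("photos", start=45)
```

Numbering from a chosen start value keeps the file names aligned with the index used by the request loop.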

Recognition via API requests

Recognition produces a txt file of JSON responses, whose content then needs to be cleaned and extracted.

import requests

# API Key and Secret Key from the Baidu AI Cloud console
API_KEY = ""
SECRET_KEY = ""

def main():

    url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic?access_token=" + get_access_token()
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Accept': 'application/json'
    }

    # Write one response per line; the with block closes the file on exit
    with open("PCNAME2.txt", "w", encoding="utf-8") as f:
        # Loop over the image numbers 45 to 116
        for i in range(45, 117):
            # Request body for image i. The template string is left blank here
            # and must be filled in with the form-encoded parameters, using {}
            # for the image number
            payload = ''.format(i)

            response = requests.request("POST", url, headers=headers, data=payload)
            f.write(response.text + '\n')
            print(response.text)

def get_access_token():
    """
    Generate the auth token (Access Token) from the AK and SK
    :return: access_token, or the string "None" if the response contains no token
    """
    url = "https://aip.baidubce.com/oauth/2.0/token"
    params = {"grant_type": "client_credentials", "client_id": API_KEY, "client_secret": SECRET_KEY}
    return str(requests.post(url, params=params).json().get("access_token"))

if __name__ == '__main__':
    main()
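
Because the loop writes every response to the file, failed requests end up there too. A quick pass over the saved lines can show which images need to be sent again. This is a sketch under assumptions: `split_responses` is a hypothetical helper, it assumes one response per line in request order starting at image 45, and it assumes a failed response carries an `error_msg` field instead of `words_result`.

```python
import json

def split_responses(lines, first_index=45):
    """Return (ok, failed): image numbers with a words_result, and
    (image number, reason) pairs for everything else."""
    ok, failed = [], []
    for offset, line in enumerate(lines):
        number = first_index + offset
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            failed.append((number, "invalid JSON"))
            continue
        if isinstance(data, dict) and "words_result" in data:
            ok.append(number)
        elif isinstance(data, dict):
            # Assumed error shape: an "error_msg" field describing the failure
            failed.append((number, data.get("error_msg", "no words_result")))
        else:
            failed.append((number, "unexpected JSON"))
    return ok, failed

sample = [
    '{"words_result": [{"words": "S/N PC0ABC12"}], "words_result_num": 1}',
    '{"error_code": 17, "error_msg": "Open api daily request limit reached"}',
    'not json',
]
ok, failed = split_responses(sample)
print(ok)
print(failed)
```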

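Data cleaning and consolidation

Recognition accuracy is not especially high: the digit 0 is often recognized as the letter O, and the letters I and L as the digit 1, so the results need a manual check. Some serial numbers may also be missed entirely, so keep the results in a fixed order while organizing them, which makes it easy to go back and fill in the missing data.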
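
To focus the manual check, the serial numbers that contain easily confused characters can be separated from the rest. This is a rough sketch: `needs_review` and `split_for_review` are hypothetical helpers, and the first three characters are assumed to be already normalized to `PC0`, so only the remainder is inspected.

```python
# Characters that OCR tends to mix up: O/0 and I/L/1
AMBIGUOUS = set("O0I1L")

def needs_review(serial: str) -> bool:
    """True if the part after the 3-character prefix contains an ambiguous character."""
    return any(ch in AMBIGUOUS for ch in serial[3:])

def split_for_review(serials):
    """Return (sure, check), both keeping the original order."""
    sure = [s for s in serials if not needs_review(s)]
    check = [s for s in serials if needs_review(s)]
    return sure, check

sure, check = split_for_review(["PC0ABC12", "PC0XYZ78", "PC0L4O99"])
print(sure)
print(check)
```

A serial number in the first list can still be wrong; the split only decides which entries to compare against the photos first.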

import json
import re

def re_matchPC(content):

    # Compile the pattern: PC followed by 6 characters (uppercase letters or digits)
    pattern = re.compile(r'PC[A-Z0-9]{6}')

    # Find all matches in the text
    matches = pattern.findall(content)

    # Remove duplicates while keeping the first-seen order
    unique_matches = list(dict.fromkeys(matches))

    print(f"Extraction done: found {len(unique_matches)} unique PC numbers")
    return unique_matches

def replace_O(lines):

    # Replace the misrecognized 'PCO' with 'PC0' in every entry
    modified_lines = [line.replace('PCO', 'PC0') for line in lines]

    print("Replaced 'PCO' with 'PC0'.")
    return modified_lines


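# Read the input file (OCR responses, one JSON object per line)
words_list = []

with open('new 1.txt', 'r', encoding='utf-8') as f:
    # Read and process the file line by line
    for line in f:
        try:
            # Parse the JSON on this line
            json_data = json.loads(line.strip())

            # Collect the words field of every entry in words_result
            if 'words_result' in json_data:
                for item in json_data['words_result']:
                    if 'words' in item:
                        words_list.append(item['words'])

        except json.JSONDecodeError:
            print(f"Warning: skipping invalid JSON line: {line[:50]}...")
        except Exception as e:
            print(f"Error while processing line: {str(e)}")


content = ' '.join(words_list)
print(content)
print(f"Extraction done: {len(words_list)} words entries extracted")

lines = re_matchPC(content)

# Correct 'PCO' to 'PC0', then deduplicate again, because a corrected
# entry can collide with one that was recognized correctly
results = list(dict.fromkeys(replace_O(lines)))

print(results)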
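
Joining all the text into one string loses track of which photo a serial number came from. To make missed machines easier to find, the same extraction can be done per response line. The sketch below is one way to do it, under assumptions: `serials_per_image` and `images_to_recheck` are hypothetical helpers, the responses are assumed to be in request order starting at image 45, and the expected count of 6 comes from the roughly 6 machines per photo mentioned in the workflow.

```python
import json
import re

PATTERN = re.compile(r"PC[A-Z0-9]{6}")

def serials_per_image(lines, first_index=45):
    """Map image number -> serial numbers found in that response, in reading order."""
    result = {}
    for offset, line in enumerate(lines):
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            data = {}
        if not isinstance(data, dict):
            data = {}
        text = " ".join(item.get("words", "") for item in data.get("words_result", []))
        # Correct a leading 'PCO' to 'PC0' without touching the rest of the serial
        found = ["PC0" + s[3:] if s.startswith("PCO") else s for s in PATTERN.findall(text)]
        result[first_index + offset] = list(dict.fromkeys(found))
    return result

def images_to_recheck(per_image, expected=6):
    """Image numbers with fewer serial numbers than expected per photo."""
    return [number for number, serials in per_image.items() if len(serials) < expected]

sample = [
    '{"words_result": [{"words": "S/N:PCOABC12"}, {"words": "PC0XYZ78"}]}',
    '{"error_code": 17, "error_msg": "limit"}',
]
per_image = serials_per_image(sample)
print(per_image)
print(images_to_recheck(per_image, expected=2))
```

Images on the recheck list are the ones to open next to the photo during manual proofreading.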
