scratchJR_OCRversion Secondary Development Notes

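An AI-assisted exercise in secondary software development, built mainly on the Electron and SQLite technology stack.

Notes on the secondary development of the scratchJR software to meet a specific requirement.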

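Development Purpose

Physical blocks let students see the structure and logic of code directly, which helps them grasp basic programming concepts. Handling the blocks draws students into the learning process and makes it more interactive and fun. Programming can feel abstract and complicated to beginners; physical blocks lower that barrier and make it more approachable. By combining different blocks into a program, students learn basic logic and sequential structure and develop logical thinking.

For these reasons, the scratchJR teaching software is extended with a feature that photographs specific physical code blocks, recognizes them, and converts them into a corresponding executable project file.

Requirements Analysis

  • The secondary-development build of scratchJR (hereafter "sjr") should add, on top of the original software, the ability to take a photo, recognize physical code blocks, convert them into a file sjr can read, and open it.

  • sjr should run correctly on the Windows platform (win32_x64), on Seewo teaching whiteboards, and on similar platforms.

Design Overview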

Scratch Jr. Architecture Diagram

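Because sjr is built on Electron, the workflow is split into two function buttons, one for capture + image recognition and one for processing + importing the image data, each marked with an icon and a number.

Implementation details are given under "Detailed Design".

Capture + Image Recognition

Each kind of code block is assigned a code and labeled with it. Python + OpenCV controls the camera, the captured picture is run through OCR text recognition, and the recognition result is passed via IPC to the next step, where the user chooses whether to import it or retake the picture.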

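Umi-OCR offline text recognition: a local OCR engine is called so that sjr works without a network connection. The OCR engine is already bundled into the sjr package and is ready to use on launch; the user only needs to start the OCR service before using sjr's recognition feature.

Open-source project: hiroi-sora/Umi-OCR: OCR software, free and offline, with built-in multi-language libraries. (github.com)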
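
Since recognition only works while the local OCR service is running, it can help to probe the service before taking a picture. The following is a minimal sketch, not part of the original project: the function name `ocr_service_available` is hypothetical, and the default port 1224 is taken from the OCR URL used later in this article. It only checks that something accepts TCP connections on that port, not that it is really Umi-OCR.

```python
import socket


def ocr_service_available(host: str = "127.0.0.1", port: int = 1224,
                          timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it does not verify that the listener is Umi-OCR,
    only that the port accepts connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    if ocr_service_available():
        print("OCR service is reachable; capture can proceed.")
    else:
        print("OCR service is not reachable; start it first.")
```

A check like this could run before the camera is opened, so the user is asked to start the OCR service up front instead of after a failed request.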

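Processing + Data Import

When the user presses the import button, the recognition result is written into the sjr database and the interface refreshes automatically. A new project named after the current time appears on the main screen, which indicates that the import succeeded.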

Detailed Design

Development and builds follow the official scratchJR secondary-development instructions.

Front-End Page Design

The required buttons and their functions are added to home.html:

<ul>
  <button id="myButton" class="shifted-image">
    <img src="./pic.png" alt="Capture and recognize">
  </button>

  <button id="loadJsonButton">
    <img src="./picture.png" alt="Import code">
  </button>

  <!-- Place these buttons at a suitable position inside the <body> tag -->

  <script src="./QSCode/QScript.js"></script>

  <div></div>
</ul>

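The button images and styles are configured as follows. The rules whose selectors begin with a dot (class selectors) adjust the position and size of the buttons.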

<!-- pic button style -->
<style>
  button {
    border: none;
    outline: none;
    background: none;
    padding: 0;
    cursor: pointer;
  }

  button img {
    display: block;
  }

  .shifted-image {
    /* Shift the button right by 59% of its container's width */
    margin-left: 59%;
  }

  .scaled-image {
    width: 70%;
    height: auto;
  }

  .both-images {
    margin-top: 30%;
    width: 80%;
    height: auto;
  }

  .button-container {
    display: flex;
  }
</style>
<!-- pic button style -->

The JavaScript behind the capture button calls the Python program and gives the user feedback in the front end:

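// Runs when the capture button is clicked
const { spawn } = require('child_process');
const path = require('path');

document.addEventListener('DOMContentLoaded', function () {
  // Look up the button element by its ID
  const button = document.getElementById('myButton');

  // Attach a click listener to the button
  button.addEventListener('click', function () {
    // Launch the Python program, packaged as an executable
    // const pythonProcess = spawn('python', ['./src/app/QSCode/a().py']);
    const pypath = path.join(__dirname, './QSCode/pic2json/dist/pic2code.exe');
    const pythonProcess = spawn(pypath);

    // Log the program's standard output
    pythonProcess.stdout.on('data', (data) => {
      console.log(`Python script output: ${data}`);
    });

    // Log the program's error output
    pythonProcess.stderr.on('data', (data) => {
      console.error(`Python script error: ${data}`);
    });

    // Report a failure to start the program, for example a missing executable
    pythonProcess.on('error', (err) => {
      console.error(`Failed to start Python script: ${err.message}`);
    });

    // When the program exits, tell the user what to do next
    pythonProcess.on('close', (code) => {
      console.log(`Python script closed with code ${code}`);
      if (code === 0) {
        alert('Picture taken. Make sure the OCR service is running, then click button "2".');
      } else {
        alert('Capture failed. Check the camera and the OCR service, then try again.');
      }
    });
  });
});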

Data Processing and Import

The JavaScript above calls an image-processing program written in Python:

# On startup, capture_image() saves the camera image to the pic folder, then
# convert_image_to_base64() encodes it as base64 and the encoding is sent to the
# back end. The back end calls the OCR interface, recognizes the text in the image,
# and returns the result to the front end, which displays it on the canvas.

# The OCR service must be started beforehand

import json
import cv2
import os
import base64
import requests
import sqlite3
import time
import re
import sys

# Project template: one page with one sprite and an empty script
result = {
    "pages": [
        "page 1"
    ],
    "currentPage": "page 1",
    "page 1": {
        "textstartat": 36,
        "sprites": [
            "Tic 1"
        ],
        "num": 1,
        "lastSprite": "Tic 1",
        "Tic 1": {
            "shown": True,
            "type": "sprite",
            "md5": "Blue.svg",
            "id": "Tic 1",
            "flip": True,
            "name": "Tic",
            "angle": 0,
            "scale": 0.5,
            "speed": 2,
            "defaultScale": 0.5,
            "sounds": [
                "pop.mp3"
            ],
            "xcoor": 123,
            "ycoor": 180,
            "cx": 73,
            "cy": 123,
            "w": 147,
            "h": 247,
            "homex": 123,
            "homey": 180,
            "homescale": 0.5,
            "homeshown": True,
            "homeflip": True,
            "scripts": [
                []
            ]
        },
        "layers": [
            "Tic 1"
        ]
    }
}

# Code printed on a physical block -> [block name, default argument, x, y]
BDict = {
    "A1": ["onflag", "null", 10, 10],
    "A2": ["onclick", "null", 10, 10],
    "A3": ["ontouch", "null", 10, 10],
    "A4": ["onmessage", "null", 10, 10],
    "A5": ["message", "Orange", 10, 10],

    "B1": ["home", "null", 10, 10],
    "B2": ["hop", "1", 10, 10],
    "B3": ["left", "1", 10, 10],
    "B4": ["right", "1", 10, 10],
    "B5": ["down", "1", 10, 10],
    "B6": ["up", "1", 10, 10],
    "B7": ["forward", "1", 10, 10],
    "B8": ["back", "1", 10, 10],

    "C1": ["say", "hi", 10, 10],
    "C2": ["grow", "1", 10, 10],
    "C3": ["shrink", "1", 10, 10],
    "C4": ["same", "null", 10, 10],
    "C5": ["hide", "null", 10, 10],
    "C6": ["show", "null", 10, 10],

    "D1": ["endstack", "null", 10, 10],
    "D2": ["forever", "null", 10, 10],

    "E1": ["playsnd", "pop.mp3", 10, 10],

    "F1": ["wait", "1", 10, 10],
    "F2": ["stopmine", "null", 10, 10],
    "F3": ["setspeed", "1", 10, 10],
    "F4": ["repeat", "1", 10, 10, []],
}

# The picture is stored in a pic folder next to the executable
executable_path = os.path.dirname(sys.executable)
pic_folder_path = os.path.join(executable_path, 'pic')
imagepath = os.path.join(pic_folder_path, 'capture.jpg')


def capture_image():
    # Create the pic folder (the one imagepath points into) if it doesn't exist
    os.makedirs(pic_folder_path, exist_ok=True)

    # Capture one frame from the default camera, then release the camera
    cap = cv2.VideoCapture(0)
    ret, frame = cap.read()
    cap.release()

    # Save the captured image
    if ret:
        cv2.imwrite(imagepath, frame)
        print("Image captured and saved successfully.")
    else:
        print("Failed to capture image.")


# Convert image to base64
def convert_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
    print("Image converted to base64 successfully.")
    return encoded_image


# Call the OCR interface
def ocr(encoded_image):
    # Umi-OCR (umi.exe) must already be running and serving this URL
    url = "http://127.0.0.1:1224/api/ocr"
    data = {
        "base64": encoded_image,
        "options": {
            "data.format": "text"
        }
    }

    headers = {"Content-Type": "application/json"}
    data_str = json.dumps(data)
    response = requests.post(url, data=data_str, headers=headers)
    if response.status_code == 200:
        res_dict = json.loads(response.text)
        text = res_dict.get("data")
        print("ocr result:\n", text)
        if not isinstance(text, str):
            # No text came back, so there is nothing to parse
            return ""
        # Keep only block codes (letter + digit), single digits, and "*"
        matches = re.findall(r'([A-Z]\d|\d|\*)', text)
        formatted_result = ' '.join(matches)
        print("formatted result:\n", formatted_result)
        return formatted_result
    else:
        print("Failed to call OCR API.")
        return ""


# Loops are supported: F4 is followed by the repeat count, then by the
# instructions inside the loop; "*" marks the end of the loop
def word2Json(userInput):
    inputList = userInput.split()
    loop_start = False
    loop_count = 0
    loop_instructions = []
    loop_x = 10
    script = result["page 1"]["Tic 1"]["scripts"][0]

    for i, token in enumerate(inputList):
        # Token that follows the current one, if any
        next_token = inputList[i + 1] if i + 1 < len(inputList) else None
        if loop_start:
            if token == "*":
                loop_start = False
                script.append(["repeat", loop_count, 0, 0, loop_instructions])
                loop_instructions = []
            elif token in BDict:
                # Copy the template so repeated blocks do not share one list
                loop_instructions.append(list(BDict[token]))
        elif token == "F4":
            # A repeat count must follow F4; otherwise the token is ignored
            if next_token is not None and next_token.isdigit():
                loop_start = True
                loop_count = int(next_token)
        elif token in BDict:
            block = list(BDict[token])
            if next_token is not None and next_token.isdigit():
                # A number after the block code is its argument; 0 means "null"
                block[1] = "null" if int(next_token) == 0 else next_token
            # Place each block 40 units to the right of the previous one
            loop_x += 40
            block[2] = loop_x
            script.append(block)

    with open('./a.json', 'w') as file:
        json.dump(result, file, indent=4)
    return result


def mainP2J():
    # Take a picture and save it
    capture_image()
    # Encode the picture as base64
    encoded_image = convert_image_to_base64(imagepath)
    # print(encoded_image)

    # Call the OCR interface
    userinput = ocr(encoded_image)

    # userinput = "B3 6 F4 2 B7 2 D1 null B8 3 * B8 3"
    word2Json(userinput)
    # linkSQL() is not defined in this listing
    linkSQL(result)


def debugMain():
    # Take a picture and save it
    capture_image()
    # Encode the picture as base64
    encoded_image = convert_image_to_base64(imagepath)
    # print(encoded_image)

    # Call the OCR interface
    userinput = ocr(encoded_image)

    # userinput = "B3 6 F4 2 B7 2 D1 null B8 3 * B8 3"
    word2Json(userinput)
    # linkSQL(result)


if __name__ == "__main__":
    debugMain()
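
The token-extraction step inside ocr() is easy to try on its own, without a camera or an OCR service. The sketch below reuses the same regular expression as the script; the helper name `extract_tokens` and the sample strings are made up for illustration.

```python
import re
from typing import List

# Same pattern as in ocr(): a block code (capital letter + digit),
# a single digit, or the loop terminator "*"
TOKEN_PATTERN = re.compile(r"[A-Z]\d|\d|\*")


def extract_tokens(ocr_text: str) -> List[str]:
    """Return the block codes, digits and '*' found in raw OCR text."""
    return TOKEN_PATTERN.findall(ocr_text)


if __name__ == "__main__":
    # Hypothetical OCR output: one label per line, plus some noise
    print(extract_tokens("B3 6\nF4 2\nB7 2\n*\nB8 3"))
    print(extract_tokens("Hello B3"))
    # A two-digit number is split into two single-digit tokens
    print(extract_tokens("F4 10"))
```

One consequence of matching `\d` rather than `\d+` is that multi-digit numbers are split into separate tokens, so arguments and repeat counts are effectively limited to a single digit unless the pattern is changed.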

Packaging the Python program with PyInstaller lets it run on an ordinary Windows machine without a separate Python installation.
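
The script locates its pic folder through sys.executable, which points at the packaged .exe only after PyInstaller has frozen the program; when run from source it points at the Python interpreter instead. A minimal sketch of a path helper that works in both cases follows. The names `app_base_dir` and `capture_path` are hypothetical; `sys.frozen` is the attribute PyInstaller sets in a packaged program.

```python
import os
import sys


def app_base_dir(script_file: str) -> str:
    """Directory of the packaged executable, or of the script when run from source."""
    if getattr(sys, "frozen", False):
        # Running as a PyInstaller bundle: sys.executable is the .exe itself
        return os.path.dirname(os.path.abspath(sys.executable))
    # Running with a regular interpreter: fall back to the script's location
    return os.path.dirname(os.path.abspath(script_file))


def capture_path(script_file: str) -> str:
    """Full path of pic/capture.jpg next to the program."""
    return os.path.join(app_base_dir(script_file), "pic", "capture.jpg")


if __name__ == "__main__":
    print(capture_path(__file__))
```

With a helper like this, the same code can be debugged from source and shipped as an executable without the picture ending up in the interpreter's directory.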

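The other button writes the data into sjr. When it is pressed, the Electron page notifies the renderer-process script electronClient.js, which sends a load-data message to the main process: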

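// When the loadJsonButton on home.html is clicked, a message of type
// 'load-json-file' with the content 'a.json' is sent to the main process.
// ipcRenderer is assumed to be available already in electronClient.js.
document.addEventListener('DOMContentLoaded', () => {
  const loadJsonButton = document.getElementById('loadJsonButton');
  loadJsonButton.addEventListener('click', () => {
    // The main process reads the file; the renderer only passes its name over IPC
    ipcRenderer.send('load-json-file', 'a.json');
  });
});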

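Importing data into sjr requires the SQLite database manager instance that the program provides, so a new method for writing data is added to main.js. Reading the project's main process reveals the following database method: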

/**
  runs a sql query on the database, returns the number of rows from the result
  @param {json} json object with stmt and values filled out
  @returns lastRowId
*/
stmt(jsonStrOrJsonObj) {
  try {
    // {"stmt":"select name,thumbnail,id,isgift from projects where deleted = ? AND version = ? AND gallery IS NULL order by ctime desc","values":["NO","iOSv01"]}

    // if it's a string, parse it. if not, use it if it's not null.
    const json = (typeof jsonStrOrJsonObj === 'string') ? JSON.parse(jsonStrOrJsonObj) : jsonStrOrJsonObj || {};
    const stmt = json.stmt;
    const values = json.values;

    if (DEBUG_DATABASE) debugLog('DatabaseManager executing stmt', stmt, values);

    const statement = this.db.prepare(stmt, values);

    while (statement.step()) statement.get();
    // return JSON.stringify(statement.getAsObject());

    const result = this.db.exec('select last_insert_rowid();');

    const lastRowId = result[0].values[0][0];

    return lastRowId;
  } catch (e) {
    if (DEBUG_DATABASE) debugLog('stmt failed', jsonStrOrJsonObj, e);
    console.error('stmt failed', jsonStrOrJsonObj, e);
    return -1;
  }
}

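An insert statement is built on top of this method, and the main process subscribes to the import message sent from the renderer process. When the listener receives the message, it runs the following handler: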

ipcMain.on('load-json-file', (event, filename) => {
  console.log('load-json-file');
  const db = dataStore.getDatabaseManager();

  // Read the JSON file written by the Python program
  const filePath = './a.json';
  if (!fs.existsSync(filePath)) {
    console.log('load-json-file: File could not be resolved.', filename);
    event.returnValue = null;
    return;
  }
  const data = fs.readFileSync(filePath, 'utf8');
  const newJsonData = JSON.parse(data);

  const currentTime = new Date().toISOString();
  const timestamp = new Date().getTime();

  // Full test record
  const newProject = {
    // ID: newID,
    CTIME: currentTime,
    MTIME: timestamp,
    ALTMD5: null, // no MD5 of other metadata is assumed, so NULL
    POS: null, // no specific position is assumed, so NULL
    NAME: `P${currentTime.slice(2, 15)}`, // project name: 'P' followed by part of the ISO timestamp
    JSON: JSON.stringify(newJsonData), // serialize the JavaScript object to a JSON string
    THUMBNAIL: JSON.stringify({ "pagecount": 1, "md5": "13_823d327a5d13e21aea5d06732a3c329d.png" }), // assumed thumbnail info
    OWNER: null, // no specific owner is assumed, so NULL
    GALLERY: null, // the project is assumed not to belong to a gallery, so NULL
    DELETED: 'NO', // the project is assumed not to be deleted
    VERSION: 'iOSv01', // assumed project version
    ISGIFT: 0, // the project is assumed not to be a gift
  };

  const newCMD = {
    stmt: "insert into projects (ctime, mtime, altmd5, pos, name,json, thumbnail, owner, gallery, deleted, version, isgift) values (?,?,?,?,?,?,?,?,?,?,?,?)",
    values: [newProject.CTIME, newProject.MTIME, newProject.ALTMD5, newProject.POS, newProject.NAME, newProject.JSON, newProject.THUMBNAIL, newProject.OWNER, newProject.GALLERY, newProject.DELETED, newProject.VERSION, newProject.ISGIFT]
  };
  event.returnValue = db.stmt(newCMD);
  if (win) {
    win.reload();
  }
  console.log('load-json-file result:', event.returnValue);
});
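
The same insert can be tried outside Electron with Python's standard sqlite3 module. This is only a sketch under stated assumptions: the column list is copied from the insert statement above, but the column types in `create_projects_table` are guesses, and the helper names (`js_iso_string`, `project_name`, `insert_project`) are hypothetical. It also reproduces how the handler derives the project name from the ISO timestamp.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Column order copied from the insert statement used in main.js
COLUMNS = ("ctime", "mtime", "altmd5", "pos", "name", "json",
           "thumbnail", "owner", "gallery", "deleted", "version", "isgift")


def create_projects_table(conn):
    # Assumed schema for illustration only; the real sjr schema may differ
    conn.execute(
        "create table projects (id integer primary key autoincrement, "
        "ctime text, mtime integer, altmd5 text, pos integer, name text, "
        "json text, thumbnail text, owner text, gallery text, "
        "deleted text, version text, isgift integer)"
    )


def js_iso_string(dt):
    """Format an aware datetime like JavaScript's Date.toISOString()."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def project_name(dt):
    """'P' plus characters 2..14 of the ISO string, as in currentTime.slice(2, 15)."""
    return "P" + js_iso_string(dt)[2:15]


def insert_project(conn, project, now):
    """Insert one project row and return its row id."""
    millis = int(now.timestamp()) * 1000 + now.microsecond // 1000
    thumbnail = {"pagecount": 1, "md5": "13_823d327a5d13e21aea5d06732a3c329d.png"}
    values = (js_iso_string(now), millis, None, None, project_name(now),
              json.dumps(project), json.dumps(thumbnail),
              None, None, "NO", "iOSv01", 0)
    placeholders = ",".join("?" for _ in COLUMNS)
    cur = conn.execute(
        f"insert into projects ({','.join(COLUMNS)}) values ({placeholders})",
        values,
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_projects_table(conn)
    now = datetime.now(timezone.utc)
    print(insert_project(conn, {"pages": ["page 1"]}, now), project_name(now))
```

Note that slice(2, 15) cuts the timestamp off after the tens digit of the minutes, so two imports made within the same ten-minute window receive the same project name.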

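Once the data is written, the page refreshes automatically so the new project becomes visible. The flow was verified with the following test data: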

// Test JSON data
const newJsonData = { "pages": ["page 1"], "currentPage": "page 1", "page 1": { "textstartat": 36, "sprites": ["Tic 1"], "num": 1, "lastSprite": "Tic 1", "Tic 1": { "shown": true, "type": "sprite", "md5": "Blue.svg", "id": "Tic 1", "flip": true, "name": "Tic", "angle": 0, "scale": 0.5, "speed": 2, "defaultScale": 0.5, "sounds": ["pop.mp3"], "xcoor": 123, "ycoor": 180, "cx": 75, "cy": 126, "w": 151, "h": 253, "homex": 123, "homey": 180, "homescale": 0.5, "homeshown": true, "homeflip": true, "scripts": [[["back", "1", 96, 44], ["left", "1", 161, 44], ["repeat", 2, 226, 28, [["forward", "1", 261, 44], ["endstack", "null", 326, 44]]]]] }, "layers": ["Tic 1"] } };
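
Before importing a generated a.json, a quick structural check can catch malformed scripts. The sketch below walks a project of the shape shown in the test data and counts its blocks, including those nested inside repeat. The block layout [name, argument, x, y, optional nested list] is inferred from that test data and not from an official specification, and the function names are hypothetical.

```python
def count_blocks(blocks):
    """Count blocks in one script, descending into nested repeat bodies."""
    total = 0
    for block in blocks:
        total += 1
        # A fifth element that is a list holds the blocks inside a repeat
        if len(block) > 4 and isinstance(block[4], list):
            total += count_blocks(block[4])
    return total


def summarize_project(project):
    """Map (page name, sprite name) to the number of blocks in its scripts."""
    summary = {}
    for page_name in project["pages"]:
        page = project[page_name]
        for sprite_name in page["sprites"]:
            scripts = page[sprite_name]["scripts"]
            summary[(page_name, sprite_name)] = sum(count_blocks(s) for s in scripts)
    return summary


if __name__ == "__main__":
    # Reduced copy of the test data: only the fields this sketch reads
    sample = {
        "pages": ["page 1"],
        "page 1": {
            "sprites": ["Tic 1"],
            "Tic 1": {
                "scripts": [[
                    ["back", "1", 96, 44],
                    ["left", "1", 161, 44],
                    ["repeat", 2, 226, 28, [
                        ["forward", "1", 261, 44],
                        ["endstack", "null", 326, 44],
                    ]],
                ]],
            },
        },
    }
    print(summarize_project(sample))
```

For the test data this counts three top-level blocks plus the two inside the repeat.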

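Packaging

The program is built with the packaging tools Electron provides or with the commands shipped with the original sjr project. The relevant configuration is: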

"scripts": {
"start": "electron-forge start",
"debugMain": "electron-forge start -i",
"package": "electron-forge package",
"make": "electron-forge make",
"make64": "electron-forge make --arch=x64",
"make32": "electron-forge make --arch=ia32",
"makeAll": "electron-forge make --arch=ia32,x64",
"publish": "electron-forge publish",
"lint": "eslint src --color"
},

This makes it possible to generate an executable for each target user platform.

References

  1. Umi-OCR GitHub Repository
  2. Umi-OCR Documentation
  3. How to Write a Highly Readable Software Engineering Design Document
  4. ScratchJr Desktop README
  5. my-electron-app GitHub Repository
  6. Electron Quick Start Guide