This article has been contributed by Lin Ma, Software Engineer and KVM Virtualization Specialist at SUSE. If you want to read more from him about virtualization, machine learning and artificial intelligence, have a look at the following articles:
Today I’m going to briefly introduce a homemade pen plotter driven by machine learning. It can detect, recognize, and answer very simple arithmetic exam questions.
The following sections detail the technical requirements and specifications of our setup.
- Raspberry Pi 3
- openSUSE Tumbleweed raspberrypi3 aarch64 2019.05.17 image
- TensorFlow 1.13.1 from the science:machinelearning repository in the Open Build Service (OBS)
Artificial Intelligence (AI) Components
- Pre-trained Connectionist Text Proposal Network (CTPN) neural network model for oriented text detection in natural scenes (for more details, please read https://www.suse.com/c/machine-learning-oriented-text-detection-from-natural-scene/)
- Baidu optical character recognition (OCR) service
Upper Computer Components
- Arduino UNO R3 plus Grbl firmware
- CNC Shield V3 Expansion Board (1x)
- A4988 stepper motor driver module (2x)
- NEMA 17 stepper motor (2x)
- Servo motor (1x)
- Camera (1x)
- Limit Switch (2x)
- Plain shaft with slider (2x)
- Threaded shaft with coupling & nut (2x)
- Aluminum alloy plate
1. Write your arithmetic questions on a piece of paper.
2. The AI pen plotter takes a picture through the camera and resizes it to 1280 × 1690 pixels, like in the example below:
— Neural Network Job START—
3. The pre-trained CTPN neural network model on the Raspberry Pi 3 processes the picture to find where the text is and records the relative positions.
4. A script crops the picture at these positions and sends the resulting sub-pictures, each of which contains text, to the Baidu OCR service. See the example below:
5. The Baidu OCR service processes these sub-pictures. Then it recognizes and returns the text.
— Neural Network Job DONE—
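The cropping in step 4 can be sketched roughly as follows. This is only an illustration, not the script used in the project: it assumes that the detector reports each text line as a quadrilateral of eight pixel coordinates (x1, y1, ..., x4, y4), and the names `crop_box`, `reading_order` and the `pad` parameter are hypothetical.

```python
import math

# Picture size after the resize in step 2.
IMAGE_WIDTH = 1280
IMAGE_HEIGHT = 1690


def crop_box(quad, width=IMAGE_WIDTH, height=IMAGE_HEIGHT, pad=4):
    """Return (left, top, right, bottom) enclosing a detected text quadrilateral.

    `quad` is assumed to be [x1, y1, x2, y2, x3, y3, x4, y4] in pixels.
    The box is grown by `pad` pixels and clamped to the picture borders.
    """
    xs = quad[0::2]
    ys = quad[1::2]
    left = max(0, math.floor(min(xs)) - pad)
    top = max(0, math.floor(min(ys)) - pad)
    right = min(width, math.ceil(max(xs)) + pad)
    bottom = min(height, math.ceil(max(ys)) + pad)
    return left, top, right, bottom


def reading_order(boxes):
    """Sort boxes top to bottom, then left to right."""
    return sorted(boxes, key=lambda box: (box[1], box[0]))
```

The returned tuple uses the (left, upper, right, lower) order that image libraries such as Pillow expect for a crop box, so each sub-picture could be cut out with one call per box before it is sent to the OCR service.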
6. A script checks the returned text, searches for keywords, and tries to understand the questions. Then it parses the questions, works out the solution process, and renders it as a picture. See the example below:
7. A script calculates the relative positions and converts the picture into a gcode file, as in the example below:
8. Run the command <gcode sender> to send these gcode instructions to the upper computer (the Arduino UNO running the Grbl firmware).
9. The Grbl firmware generates the step and direction signals according to the gcode instructions and passes them to the CNC Shield V3.
10. The CNC Shield V3 controls the linear XY motion with the stepper motors and the movement of the pen (up/down) with the servo motor.
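To make steps 6 and 7 more concrete, here is a minimal sketch of how a recognized question could be turned into gcode. It is not the project's actual script, and several things in it are assumptions: the question format (two integers and one operator), the seven-segment stroke font, the function names `solve` and `answer_to_gcode`, and above all the pen commands, which assume a servo-modified Grbl build where `M3 S90` lowers the pen and `M5` raises it. Only `G21`, `G90`, `G0` and `G1` are standard Grbl motion and setup commands.

```python
import operator
import re

# Assumed question format: "<int> <op> <int>", optionally followed by "=" and "?".
QUESTION = re.compile(r"^\s*(\d+)\s*([+\-*x×])\s*(\d+)\s*=?\s*\??\s*$")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "x": operator.mul, "×": operator.mul}

# Seven-segment strokes in a cell 1 unit wide and 2 units high: (start, end).
SEGMENTS = {
    "a": ((0, 2), (1, 2)), "b": ((1, 2), (1, 1)), "c": ((1, 1), (1, 0)),
    "d": ((1, 0), (0, 0)), "e": ((0, 0), (0, 1)), "f": ((0, 1), (0, 2)),
    "g": ((0, 1), (1, 1)),
}
GLYPHS = {
    "0": "abcdef", "1": "bc", "2": "abged", "3": "abgcd", "4": "fgbc",
    "5": "afgcd", "6": "afgecd", "7": "abc", "8": "abcdefg", "9": "abcdfg",
    "-": "g",
}

# Assumption: the firmware maps spindle commands to the pen servo.
PEN_DOWN = "M3 S90"
PEN_UP = "M5"


def solve(text):
    """Parse a question such as '12 + 7 =' and return the answer as a string."""
    match = QUESTION.match(text)
    if match is None:
        raise ValueError(f"unsupported question: {text!r}")
    left, op, right = match.groups()
    return str(OPS[op](int(left), int(right)))


def answer_to_gcode(answer, x0=0.0, y0=0.0, size=5.0, feed=1000):
    """Return gcode lines that draw `answer`, lower-left corner at (x0, y0) in mm.

    Each glyph is `size` mm wide and 2 * `size` mm high.
    """
    lines = ["G21", "G90", PEN_UP]  # millimetres, absolute coordinates, pen raised
    for index, char in enumerate(answer):
        origin_x = x0 + index * 1.5 * size  # glyph width plus half a glyph of spacing
        for name in GLYPHS[char]:
            (sx, sy), (ex, ey) = SEGMENTS[name]
            # Travel to the stroke start with the pen up, then draw the stroke.
            lines.append(f"G0 X{origin_x + sx * size:.2f} Y{y0 + sy * size:.2f}")
            lines.append(PEN_DOWN)
            lines.append(f"G1 X{origin_x + ex * size:.2f} Y{y0 + ey * size:.2f} F{feed}")
            lines.append(PEN_UP)
    return lines
```

For example, `"\n".join(answer_to_gcode(solve("12 + 7 =")))` produces a small program that draws "19". In a real setup, `x0` and `y0` would come from the text position detected in step 3, converted from pixels to millimetres on the paper.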
If you want to see a live demo of what is described above, just have a look at the videos below.
As you can see, processing a picture of this size, which contains only one short mathematical question, takes around 11 minutes. The time for the entire process could very likely be reduced by 50 percent if the code were changed to send the text detection job to the cloud instead of running it locally on the Raspberry Pi 3, or if the Raspberry Pi 3 were combined with Neural Compute Stick(s) to accelerate the inference. But this assumption still has to be proven :-).
Thank you very much for your attention!