Introduction
In this project, my friend and I assembled a "virtual pen" with an Arduino Nano 33 BLE Sense (1050-ABX00035-ND) that determines which digit the user is writing based on the motion made while holding the pen, without the use of a camera. The layout is shown in Fig. 1. A button is pressed to indicate the start of a motion.
Figure 1. The breadboard layout of the data collection device.
We aim to minimize the need for external computation or additional sensors outside of the device. Ideally, the user can "write" on any surface or in the air with little restriction on location or orientation, and without worrying about occlusion between the pen and a receiving device. We want the handwriting recognition to be robust to differences in scale (e.g., from writing on paper to the large strokes used when lecturing), writing speed, and handwriting style.
As a proof of concept, we will implement recognition for a subset of Arabic numerals. Our main objective is to make the device recognize the digits accurately when they are written with clear, deliberate motions. The device must be able to tell apart the hand motions made when writing different digits using only the motion data. The user will press a button to indicate that they are writing a single digit, as opposed to moving the pen between digits or not using the pen at all.
Approach
We divided this project into four main steps:
1. Collect IMU data from the Arduino for digits 0-9 using a Python script.
2. Preprocess the data on the Arduino so that it works well as an input to a machine learning system (a sketch of the kind of preprocessing we mean follows Figure 2).
3. Train a model using an RNN (recurrent neural network) or a CNN (convolutional neural network).
4. Test the system and check the accuracy of the model's predictions.
Figure 2. The overall approach of the project.
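As an illustration of step 2, the snippet below sketches the kind of preprocessing we mean: padding every recording to a fixed length and scaling each channel so that writing size and speed matter less. It is written in host-side Python for clarity, and the function name and constants are our own illustrative choices rather than the exact on-device code.

import numpy as np

def preprocess(sample, length=3 * 119):
    """Illustrative preprocessing: pad a variable-length IMU recording to a
    fixed length and scale each channel to roughly unit range."""
    sample = np.asarray(sample, dtype=np.float32)
    # Zero-pad (or truncate) along the time axis to a fixed number of steps.
    if sample.shape[0] < length:
        sample = np.pad(sample, ((0, length - sample.shape[0]), (0, 0)))
    else:
        sample = sample[:length]
    # Scale each channel by its maximum absolute value so that writing
    # scale and speed have less influence on the model input.
    scale = np.max(np.abs(sample), axis=0)
    scale[scale == 0] = 1.0
    return sample / scale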
Data Collection
Below is the Python code we used to collect data. To avoid the fatigue of writing the same digit over and over, we prompted the digits 0-9 in a random order, with each digit appearing twice per session. Each dataset therefore contains 20 digits and their corresponding acceleration information. Because each digit takes a different amount of motion and time to write, recordings that took less time were zero-padded so that all data points have the same length.
import struct
import numpy as np
import serial
from IPython.display import clear_output

target = np.repeat(np.arange(10), 2)  # each digit 0-9 appears twice per session
np.random.shuffle(target)
bigdata = np.zeros([20, 3*119, 7])  # 20 recordings x 357 time steps x 7 floats
for i in range(20):
    printmd("" + str(target[i]) + "")  # notebook helper that shows the digit to write
    data = []
    with serial.Serial('COM3', 38400, timeout=10) as ser:
        startmess = ser.readline()
        if not startmess:
            raise Exception("Timed Out")
        if startmess != b'IMU goes BBBBRRRR\r\n':
            raise Exception("Start message invalid:\n" + str(startmess))
        ser.timeout = 0.1
        while True:
            message = ser.read(28)  # one packet: 7 floats x 4 bytes
            if not message:
                raise Exception("Timed Out")
            if len(message) < 28:
                raise Exception("Message Too Short:\n" + str(message))
            values = np.array(struct.unpack('fffffff', message))
            if np.any(np.isnan(values)):
                if np.any(~np.isnan(values)):
                    raise Exception("Could not read data:\n" + str(message))
                break  # an all-NaN packet marks the end of the recording
            data.append(values)
    data = np.array(data)
    clear_output(wait=True)
    print(data.shape[0])
    data = np.pad(data, ((0, 3*119 - data.shape[0]), (0, 0)))  # zero-pad to fixed length
    print(np.mean(data, axis=0))
    bigdata[i] = data
Figure 3. Visual representation of a collected data point for the digit "8".
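A view like Figure 3 can be produced with a few lines of matplotlib. The sketch below is one way to get a similar plot from the bigdata array collected above; it is not the exact plotting code used for the figure.

import matplotlib.pyplot as plt

sample = bigdata[0]                      # one recording, shape (3*119, 7)
plt.figure(figsize=(8, 3))
for ch in range(sample.shape[1]):
    plt.plot(sample[:, ch], label="channel %d" % ch)
plt.xlabel("time step")
plt.ylabel("sensor reading")
plt.title("Recorded motion for digit %d" % target[0])
plt.legend(loc="upper right", fontsize="small")
plt.show()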
Classification Model
Because we eventually needed to deploy this model on the Arduino through TensorFlow Lite, which currently does not support RNNs, we implemented a CNN instead. Below are the layers we used in our model; this architecture worked well for classifying the 10 digit classes. We achieved 90% accuracy on the validation and test data, and a similar accuracy when using the device itself.
Figure 4. CNN layers used in the inertial handwriting recognition system.
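For readers who want a concrete starting point, the snippet below is a rough sketch of a small 1D CNN over the (3*119, 7) inputs, together with the TensorFlow Lite conversion step. The filter counts, kernel sizes, and file name are illustrative assumptions rather than the exact layers of Figure 4, and operator support in the Arduino's TFLite Micro runtime should be checked before deploying.

import tensorflow as tf

# Illustrative 1D CNN over 357-step, 7-channel IMU sequences (not the exact Figure 4 layers).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3 * 119, 7)),
    tf.keras.layers.Conv1D(16, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# After training, convert the Keras model to TensorFlow Lite so it can run on the Arduino.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("digit_cnn.tflite", "wb") as f:
    f.write(tflite_model)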