The HWRT database of handwritten symbols contains on-line recordings of handwritten symbols, including all alphanumeric characters, arrows, Greek letters and mathematical symbols such as the integral sign.
The database can be downloaded as bzip2-compressed tar files. Each tar file contains:
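All CSV files use ";" as the delimiter and "'" as the quote character. The recording data is given in YAML format as a list of lists of dictionaries: the outer list holds the strokes, and each stroke is a list of points. Each dictionary has the keys "x", "y" and "time"; (x, y) are the coordinates and time is the UNIX time, which in the example below is given in milliseconds.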
[[{"x":190,"y":578,"time":1400943241868}, {"x":195,"y":554,"time":1400943241973}, {"x":205,"y":490,"time":1400943241992}, {"x":205,"y":472,"time":1400943242016}, {"x":206,"y":455,"time":1400943242026}, {"x":206,"y":436,"time":1400943242038}, {"x":205,"y":416,"time":1400943242054}, {"x":204,"y":398,"time":1400943242070}, {"x":202,"y":383,"time":1400943242087}], [{"x":201,"y":370,"time":1400943242104}, {"x":200,"y":358,"time":1400943242118}, {"x":200,"y":349,"time":1400943242136}, {"x":200,"y":339,"time":1400943242152}, {"x":200,"y":334,"time":1400943242169}, {"x":200,"y":333,"time":1400943242186}, {"x":202,"y":332,"time":1400943242203}, {"x":203,"y":332,"time":1400943242218}, {"x":206,"y":335,"time":1400943242234}, {"x":211,"y":339,"time":1400943242251}, {"x":218,"y":350,"time":1400943242267}, {"x":227,"y":363,"time":1400943242283}]]
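
As a minimal sketch of how such a file could be read in Python: the delimiter and quote character come from the description above, while the column names `symbol_id` and `data`, the id value `31` and the shortened sample row are hypothetical and only serve as illustration. The sample recording is also valid JSON, so the standard `json` module is used here; a YAML parser would be needed if a file used YAML syntax beyond that subset.

```python
import csv
import io
import json

# Hypothetical excerpt: the column names and the id 31 are assumptions,
# and the recording is shortened to three points in two strokes.
SAMPLE_CSV = """symbol_id;data
31;'[[{"x":190,"y":578,"time":1400943241868}, {"x":195,"y":554,"time":1400943241973}], [{"x":201,"y":370,"time":1400943242104}]]'
"""


def read_recordings(handle):
    """Yield (symbol_id, strokes) pairs from a ';'-delimited, '-quoted CSV."""
    reader = csv.DictReader(handle, delimiter=";", quotechar="'")
    for row in reader:
        # The sample data is JSON-compatible, so json.loads is enough here.
        yield row["symbol_id"], json.loads(row["data"])


def summarize(strokes):
    """Return stroke count, point count, bounding box and duration."""
    points = [p for stroke in strokes for p in stroke]
    xs = [p["x"] for p in points]
    ys = [p["y"] for p in points]
    times = [p["time"] for p in points]
    return {
        "strokes": len(strokes),
        "points": len(points),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        # Assumes the timestamps are milliseconds, as in the sample.
        "duration_ms": max(times) - min(times),
    }


recordings = list(read_recordings(io.StringIO(SAMPLE_CSV)))
symbol_id, strokes = recordings[0]
summary = summarize(strokes)
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, for example `open(path, newline="")`.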
About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!
The Detexify data was filtered, some obviously wrong labels were corrected, and new data was added.
Just like Detexify, this dataset is published under ODbL (ODC Open Database License).
Date | Download | Size | DOI |
---|---|---|---|
2015-01-28 | Download | 134.3 MB | 10.5281/zenodo.50022 |
To verify that the download worked, you can use md5sum:
$ md5sum 2015-01-28-data.tar
2bf1d089ce65c0a39e57064516f1bd1c 2015-01-28-data.tar
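
On systems without `md5sum`, the same check can be sketched in Python with the standard `hashlib` module. The expected checksum is the one listed above; the function names and the chunk size are illustrative choices.

```python
import hashlib

# Checksum of 2015-01-28-data.tar as listed above.
EXPECTED_MD5 = "2bf1d089ce65c0a39e57064516f1bd1c"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

Calling `download_is_intact("2015-01-28-data.tar")` should then return `True` for an undamaged download.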
* Unless stated otherwise, the classifier uses scaling, shifting and linear resampling as preprocessing.
Classifier | Preprocessing* | Features | Training | TOP-3 Test Error Rate (%) | Reference |
---|---|---|---|---|---|
Neural Networks | | | | | |
160:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-batch gradient descent, η=0.1, α=0.1 | 6.80 | Thoma, 2014 |
160:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-batch gradient descent, η=0.1, α=0.1 | 5.75 | Thoma, 2014 |
160:500:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-batch gradient descent, η=0.1, α=0.1 | 5.74 | Thoma, 2014 |
160:500:500:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-batch gradient descent, η=0.1, α=0.1 | 6.12 | Thoma, 2014 |
167:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke); re-curvature; ink; # of strokes; aspect ratio | Mini-batch GD with SLP with η=0.1 and α=0.1. The complete MLP was then trained again with η=0.05 and α=0.1 | 4.04 | Thoma, 2014 |
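
The coordinate feature in the table uses 4 strokes with 20 points each and two coordinates per point, which gives 4 · 20 · 2 = 160 values and would account for the 160 input units of the first four topologies. The following is a rough sketch of how such a vector could be built from a recording, not the implementation behind the reported results. It rests on several assumptions: scaling keeps the aspect ratio and fits the symbol into the unit square, shifting moves the minimum to the origin, resampling interpolates linearly over the point index rather than over time, and missing strokes are padded with zeros. All function names are hypothetical.

```python
def scale_and_shift(strokes):
    """Shift the symbol to the origin and scale it into the unit square.

    Assumption: the aspect ratio is preserved by dividing both axes by
    the larger of the two extents.
    """
    xs = [p["x"] for s in strokes for p in s]
    ys = [p["y"] for s in strokes for p in s]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1
    return [
        [{"x": (p["x"] - min_x) / extent, "y": (p["y"] - min_y) / extent}
         for p in s]
        for s in strokes
    ]


def resample_stroke(stroke, n_points=20):
    """Linearly interpolate a stroke at n_points evenly spaced positions.

    Assumption: positions are spread over the point index, not over time.
    """
    if len(stroke) == 1:
        return [(stroke[0]["x"], stroke[0]["y"])] * n_points
    resampled = []
    for k in range(n_points):
        pos = k * (len(stroke) - 1) / (n_points - 1)
        i = min(int(pos), len(stroke) - 2)
        frac = pos - i
        a, b = stroke[i], stroke[i + 1]
        resampled.append((a["x"] + frac * (b["x"] - a["x"]),
                          a["y"] + frac * (b["y"] - a["y"])))
    return resampled


def coordinate_features(strokes, max_strokes=4, n_points=20):
    """Flatten up to max_strokes resampled strokes into one feature list."""
    normalized = scale_and_shift(strokes)[:max_strokes]
    features = []
    for stroke in normalized:
        for x, y in resample_stroke(stroke, n_points):
            features.extend([x, y])
    # Assumption: recordings with fewer strokes are padded with zeros.
    missing = max_strokes - len(normalized)
    features.extend([0.0] * (2 * n_points * missing))
    return features


example = [
    [{"x": 190, "y": 578, "time": 1400943241868},
     {"x": 205, "y": 490, "time": 1400943241992}],
    [{"x": 201, "y": 370, "time": 1400943242104},
     {"x": 227, "y": 363, "time": 1400943242283}],
]
features = coordinate_features(example)
```

Here `features` has 160 entries; the second half is zero because the example has only two strokes.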