The HWRT database of handwritten symbols

The HWRT database of handwritten symbols contains on-line data of handwritten symbols, including all alphanumeric characters, arrows, Greek characters and mathematical symbols such as the integral symbol.

The database can be downloaded in the form of bzip2-compressed tar files. Each tar file contains:

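All CSV files use ";" as the delimiter and "'" as the quote character. The data is given in YAML format as a list of lists of dictionaries, where each inner list is one stroke. Each dictionary has the keys "x", "y" and "time". (x, y) are the coordinates and time is the UNIX time (in milliseconds, judging by the 13-digit values in the example).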
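A minimal sketch of reading such a file with Python's standard `csv` module follows. The column names `symbol_id` and `data` and the in-memory sample are hypothetical, since the actual column layout is not given here. The sketch also assumes the recording is written in the JSON-compatible subset of YAML shown in the data example, so `json.loads` is enough; a YAML parser would be needed for anything beyond that subset.

```python
import csv
import io
import json

# Hypothetical two-column sample; the real files may use other column names.
SAMPLE_CSV = (
    "symbol_id;data\n"
    "31;'[[{\"x\":190,\"y\":578,\"time\":1400943241868},"
    "{\"x\":195,\"y\":554,\"time\":1400943241973}]]'\n"
)


def read_recordings(handle):
    """Yield (row, strokes) pairs from a CSV using ';' and the ' quote character."""
    reader = csv.DictReader(handle, delimiter=";", quotechar="'")
    for row in reader:
        # Assumption: the recording is JSON-compatible YAML.
        strokes = json.loads(row["data"])
        yield row, strokes


rows = list(read_recordings(io.StringIO(SAMPLE_CSV)))
for row, strokes in rows:
    print(row["symbol_id"], len(strokes), "stroke(s)")
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`.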

Data example

[[{"x":190,"y":578,"time":1400943241868},
  {"x":195,"y":554,"time":1400943241973},
  {"x":205,"y":490,"time":1400943241992},
  {"x":205,"y":472,"time":1400943242016},
  {"x":206,"y":455,"time":1400943242026},
  {"x":206,"y":436,"time":1400943242038},
  {"x":205,"y":416,"time":1400943242054},
  {"x":204,"y":398,"time":1400943242070},
  {"x":202,"y":383,"time":1400943242087}],
 [{"x":201,"y":370,"time":1400943242104},
  {"x":200,"y":358,"time":1400943242118},
  {"x":200,"y":349,"time":1400943242136},
  {"x":200,"y":339,"time":1400943242152},
  {"x":200,"y":334,"time":1400943242169},
  {"x":200,"y":333,"time":1400943242186},
  {"x":202,"y":332,"time":1400943242203},
  {"x":203,"y":332,"time":1400943242218},
  {"x":206,"y":335,"time":1400943242234},
  {"x":211,"y":339,"time":1400943242251},
  {"x":218,"y":350,"time":1400943242267},
  {"x":227,"y":363,"time":1400943242283}]]
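To illustrate the structure, the following sketch summarizes a shortened version of the recording above: the number of strokes and points, the bounding box, and the duration. It assumes that each inner list is one stroke and that `time` is in milliseconds; the function name `summarize` is made up for this example.

```python
from datetime import datetime, timezone

# Shortened copy of the example recording: two strokes.
recording = [
    [{"x": 190, "y": 578, "time": 1400943241868},
     {"x": 195, "y": 554, "time": 1400943241973},
     {"x": 205, "y": 490, "time": 1400943241992}],
    [{"x": 201, "y": 370, "time": 1400943242104},
     {"x": 200, "y": 358, "time": 1400943242118}],
]


def summarize(recording):
    """Return basic statistics for a list of strokes."""
    points = [point for stroke in recording for point in stroke]
    xs = [point["x"] for point in points]
    ys = [point["y"] for point in points]
    times = [point["time"] for point in points]
    return {
        "strokes": len(recording),
        "points": len(points),
        "bounding_box": (min(xs), min(ys), max(xs), max(ys)),
        "duration_ms": max(times) - min(times),
        # Assumption: timestamps are milliseconds since the UNIX epoch.
        "start": datetime.fromtimestamp(min(times) / 1000, tz=timezone.utc),
    }


summary = summarize(recording)
print(summary)
```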
   

Credits and License

About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!

The Detexify data was filtered, labels that were obviously wrong were corrected, and new data was added.

Just like Detexify, this dataset is published under ODbL (ODC Open Database License).

Downloads

Date | Download | Size | DOI
2015-01-28 | Download | 134.3 MB | 10.5281/zenodo.50022

To make sure that the download worked, you can use md5sum:


$ md5sum 2015-01-28-data.tar
2bf1d089ce65c0a39e57064516f1bd1c  2015-01-28-data.tar
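On systems without `md5sum`, the same check can be sketched with Python's `hashlib`. The helper name `md5_of_file` is an assumption of this example. The file name and expected checksum are the ones listed above, and the check only runs if the archive is in the current directory.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "2bf1d089ce65c0a39e57064516f1bd1c"  # checksum listed above


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("2015-01-28-data.tar")
if archive.exists():
    if md5_of_file(archive) == EXPECTED_MD5:
        print("checksum ok")
    else:
        print("checksum mismatch, please download again")
```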

Classifiers

* If nothing else is mentioned, then the classifier uses scaling, shifting and linear resampling.

Classifier | Preprocessing* | Features | Training | TOP-3 Test Error Rate (%) | Reference
Neural Networks
160:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-Batch gradient descent, η=0.1, α=0.1 | 6.80% | Thoma, 2014
160:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-Batch gradient descent, η=0.1, α=0.1 | 5.75% | Thoma, 2014
160:500:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-Batch gradient descent, η=0.1, α=0.1 | 5.74% | Thoma, 2014
160:500:500:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke) | Mini-Batch gradient descent, η=0.1, α=0.1 | 6.12% | Thoma, 2014
167:500:500:369 MLP | | Coordinates (4 strokes, 20 points per stroke); re-curvature; ink; # of strokes; aspect ratio | Mini-Batch GD with SLP with η=0.1 and α=0.1. The complete MLP was trained again with η=0.05 and α=0.1 | 4.04% | Thoma, 2014
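The 160 input units of the first four networks match 4 strokes × 20 points × 2 coordinates. The sketch below shows one way such a vector could be built with scaling, shifting and linear resampling. It is not the implementation used for the results above. The details are assumptions made for illustration: scaling into the unit square while keeping the aspect ratio, resampling evenly in time, zero-padding missing strokes, and all function names.

```python
def scale_and_shift(recording):
    """Shift to the origin and scale into the unit square (aspect ratio kept)."""
    points = [p for stroke in recording for p in stroke]
    min_x = min(p["x"] for p in points)
    min_y = min(p["y"] for p in points)
    width = max(p["x"] for p in points) - min_x
    height = max(p["y"] for p in points) - min_y
    factor = 1.0 / max(width, height) if max(width, height) > 0 else 1.0
    return [
        [{"x": (p["x"] - min_x) * factor,
          "y": (p["y"] - min_y) * factor,
          "time": p["time"]} for p in stroke]
        for stroke in recording
    ]


def resample_stroke(stroke, n_points=20):
    """Linearly interpolate a stroke to n_points, evenly spaced in time."""
    if len(stroke) == 1:
        return [(stroke[0]["x"], stroke[0]["y"])] * n_points
    t0, t1 = stroke[0]["time"], stroke[-1]["time"]
    result = []
    j = 0
    for i in range(n_points):
        t = t0 + (t1 - t0) * i / (n_points - 1)
        # Advance to the segment [stroke[j], stroke[j + 1]] that contains t.
        while j < len(stroke) - 2 and stroke[j + 1]["time"] < t:
            j += 1
        a, b = stroke[j], stroke[j + 1]
        span = b["time"] - a["time"]
        w = 0.0 if span == 0 else (t - a["time"]) / span
        result.append((a["x"] + w * (b["x"] - a["x"]),
                       a["y"] + w * (b["y"] - a["y"])))
    return result


def coordinate_features(recording, n_strokes=4, n_points=20):
    """Build a flat vector of n_strokes * n_points * 2 coordinates."""
    normalized = scale_and_shift(recording)
    features = []
    for i in range(n_strokes):
        if i < len(normalized):
            for x, y in resample_stroke(normalized[i], n_points):
                features.extend([x, y])
        else:
            # Assumption: strokes that do not exist are filled with zeros.
            features.extend([0.0] * (2 * n_points))
    return features


example = [
    [{"x": 190, "y": 578, "time": 1400943241868},
     {"x": 205, "y": 490, "time": 1400943241992}],
    [{"x": 201, "y": 370, "time": 1400943242104},
     {"x": 227, "y": 363, "time": 1400943242283}],
]
features = coordinate_features(example)
print(len(features))
```

Recordings with more than four strokes are simply truncated in this sketch. How the original classifiers handle that case is not stated here.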

Publications