Stuart Gentle, Publisher at Onrec

Textkernel announces Hungarian CV parsing

Textkernel is happy to announce version 2015.1 of its CV parsing software, Extract!, which introduces Hungarian CV parsing and further improvements to the German, Dutch and English parsers.

New: Hungarian CV parsing

In late 2014, Textkernel started working on Hungarian CV extraction and is now proud to announce the new Hungarian CV parsing model. With the addition of Hungarian, Textkernel now offers CV parsing for 16 languages.

Development of the Hungarian CV parser

The development of a new language model is a complex process. First, a large set of CVs has to be annotated. Hungarian linguistics students were hired to identify the different sections in each CV, such as education and experience, as well as more specific information such as the education level, position title, and company name.

Textkernel’s researchers then trained the CV parsing engine on these examples. The resulting Hungarian CV parsing model was optimised and fine-tuned on additional Hungarian CVs until the desired performance was achieved. Lastly, a Hungarian language guesser was added so that Hungarian CVs are routed to the new Hungarian CV parsing model.
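
To illustrate the routing step described above, here is a minimal sketch of how a language guesser might direct a CV to a language-specific parser. It is not Textkernel's implementation: the stopword lists, the function names guess_language and route_cv, the parser labels and the English fallback are all assumptions made for illustration. A production language guesser would typically rely on a trained statistical model rather than a handful of stopwords.

```python
import re

# Toy illustration only; NOT Textkernel's actual language guesser.
# The stopword lists, parser labels and fallback language are hypothetical.
STOPWORDS = {
    "hu": {"és", "a", "az", "hogy", "nem", "egy", "van", "volt", "között"},
    "en": {"and", "the", "of", "to", "in", "at", "for", "with", "was"},
    "de": {"und", "der", "die", "das", "mit", "bei", "von", "für", "als"},
    "nl": {"en", "de", "het", "van", "een", "bij", "met", "voor", "als"},
}

# One (hypothetical) parser label per supported language.
PARSERS = {lang: f"{lang}-cv-parser" for lang in STOPWORDS}

# Assumed fallback when no stopword from any list is found.
DEFAULT_LANGUAGE = "en"


def guess_language(text):
    """Return the language whose stopword list matches the most tokens."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_LANGUAGE


def route_cv(text):
    """Pick the parser label that should handle this CV text."""
    return PARSERS[guess_language(text)]


if __name__ == "__main__":
    sample = "Szoftverfejlesztő voltam 2010 és 2014 között egy budapesti cégnél."
    print(route_cv(sample))
```

The point of the design is that language detection runs before parsing, so each CV is handled by the model trained on CVs in its own language.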

Improving German CV parsing with Deep Learning

Last year, Textkernel’s R&D team started applying Deep Learning techniques to further improve the quality of their CV parsers. Following successes with the English and French models, Deep Learning is now being used for the first time to improve the German model. This new technology increases the robustness of the German CV parser and has improved extraction of experience and education items (such as job title and company name).
 


Read the full release notes

For more information on this release or about Textkernel’s CV parsers, please contact Textkernel.