Candidates Hate Having to Re-Enter Their Data
by Aaron Koller
Avoid this all-too-common candidate experience pitfall
Candidates shouldn’t have to re-enter information already on their resume when applying for jobs! Here's how we're solving this at Quadratic:
Employers want structured candidate data for many good reasons, but this gets tricky because PDF is a complicated format and there’s no general standard for resume design - each one is unique.
While there are many resume parsing solutions on the market, all have major limitations. Candidates frequently run into parsing failures, which often force them to re-enter even basic information like their name and email.
At Quadratic, we’re combining our expertise in low-level text rendering and verification with new technologies to create a superior parsing solution. Our parsing technology can be broken down into three main parts:
- Initial text extraction - we convert the PDF to a basic text format. This is not trivial because PDF files don’t expose text characters in a straightforward way. If you’ve tried “Convert PDF to doc”, you’ve likely noticed some of the resulting issues. Even Mozilla’s pdf.js - an extremely popular open source PDF library - does not properly handle many common edge cases with text extraction, such as typographic ligatures.
- Schematizing data - we assign each word, phrase, or sentence in the resume to a strict schema. This is challenging because resume data does not come explicitly tagged. Our system needs to figure out where everything goes.
- Validation, error detection and correction - we rigorously check the structured data against the original, correcting any problems and flagging them if they can’t be corrected. This critical and often underappreciated step allows our users to confidently rely on the data we provide (and know when they shouldn’t!). Detecting problems is also necessary for us to continuously improve the system.
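To make the three parts more concrete, here is a toy sketch in Python. It is not Quadratic's implementation: the function names, the two-field schema, and the "first line is the name" heuristic are all assumptions made for illustration, and it starts from text that has already been pulled out of the PDF. It shows one way extracted ligatures could be expanded (Unicode NFKC normalization), how text might be mapped onto a strict schema, and how each structured value could be checked against the source text and flagged when it doesn't match.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional


def normalize_text(raw: str) -> str:
    """Part 1 (simplified): clean up text already extracted from a PDF.

    NFKC normalization expands ligature code points such as U+FB01
    into plain letters ("fi"); runs of spaces and tabs are collapsed.
    """
    text = unicodedata.normalize("NFKC", raw)
    return re.sub(r"[ \t]+", " ", text).strip()


@dataclass
class ParsedResume:
    """Hypothetical strict schema with just two fields."""
    name: Optional[str] = None
    email: Optional[str] = None
    warnings: List[str] = field(default_factory=list)


# Deliberately simple pattern; real email matching is more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def schematize(text: str) -> ParsedResume:
    """Part 2 (simplified): assign pieces of text to schema fields.

    Assumption for illustration only: the first non-empty line is the name.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    match = EMAIL_RE.search(text)
    return ParsedResume(
        name=lines[0] if lines else None,
        email=match.group(0) if match else None,
    )


def validate(parsed: ParsedResume, source_text: str) -> ParsedResume:
    """Part 3 (simplified): check every value against the original text.

    Anything missing, or not found verbatim in the source, is flagged
    rather than silently trusted.
    """
    for field_name in ("name", "email"):
        value = getattr(parsed, field_name)
        if value is None:
            parsed.warnings.append(f"{field_name}: missing")
        elif value not in source_text:
            parsed.warnings.append(f"{field_name}: not found in source")
    return parsed


# Usage: the raw text contains the "fi" and "ffi" ligature code points.
raw = "Jane Doe\nCerti\ufb01ed o\ufb03ce manager\njane.doe@example.com"
text = normalize_text(raw)
resume = validate(schematize(text), text)
print(resume)
```

Even in this toy version, the validation step is what lets a caller decide whether to trust a field or ask the candidate to confirm it.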
While parsing resumes isn’t the most exciting thing we’re doing at Quadratic, it’s of foundational importance for everything else.
The underlying data needs to be correct - we’re aiming for perfection here!