I am building a machine learning recommendation system for matching candidates with job postings.
I have two data sets: one contains job postings, the other contains candidates. The job postings were originally retrieved in Swedish from the Swedish Unemployment Agency, and I wrote a Python script to translate them to English. Each job posting has a title and a description, which is free text of 1 to 20 sentences. The description field contains everything in the posting, from responsibilities and required skills to any other details.
The candidate data set contains age, education, previous experience, knowledge, and skills for each candidate. Each candidate has up to six skills. I collected all skills that appear in the data set and one-hot encoded them: there is one column per possible skill, set to 1 if the candidate has that skill and 0 otherwise.
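To make the encoding concrete, here is a minimal sketch of the idea in plain Python. The candidate records, IDs, and skill names are made up for illustration and are not taken from my actual data set, which has more columns.

```python
# Hypothetical candidate records; IDs and skills are invented for illustration.
candidates = [
    {"id": 1, "skills": ["python", "sql"]},
    {"id": 2, "skills": ["java"]},
    {"id": 3, "skills": ["sql", "excel", "python"]},
]

# Collect every distinct skill and fix a stable column order.
skill_columns = sorted({s for c in candidates for s in c["skills"]})


def one_hot(skills, columns):
    """Return a 0/1 list with one entry per skill column."""
    owned = set(skills)
    return [1 if col in owned else 0 for col in columns]


# Map each candidate ID to its one-hot skill vector.
encoded = {c["id"]: one_hot(c["skills"], skill_columns) for c in candidates}

print(skill_columns)
for cid, row in encoded.items():
    print(cid, row)
```

Each row has the same length as the skill vocabulary, so any keywords extracted from a job description would need to be mapped onto that same vocabulary before they can be compared with a candidate.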
Now I need to prepare the data for training the model. I have already split the candidates into a training set and a test set. What I still need is a way to extract keywords from the job descriptions and compare them to the candidates' skills. Do you have any ideas on how to do this, from extracting and defining the keywords to cross-checking each candidate against each job posting?
Any help would be greatly appreciated!