Data Scientist (Remote)
Job description
Digital Harbor, a leader in risk management, offers the industry’s most advanced end-to-end operational intelligence suite for the detection, investigation, assessment, and monitoring of risk and fraud. With its in-house Social Enterprise Technology (SET) Universal Platform, Digital Harbor enables seamless collaboration within and between enterprise knowledge workers, end users, and all relevant stakeholders. This supports effective risk management and informed decision-making, shifting enterprises' focus from 'better transactions' to 'better decisions'.
The AI Team at Digital Harbor is growing, and we are seeking an exceptional Senior NLP Data Scientist to join our team. As a Senior NLP Data Scientist, you will drive everything that concerns data in our data-centric AI team, helping both to identify new data sources and to make better use of existing data. You will establish annotation standards and use tools to improve the quality and quantity of our data. You will play a pivotal role in setting up data pipelines and preparing data for ingestion by our ML models at scale. You will join a highly collaborative, remote-first, cross-functional Agile team.
Key Responsibilities:
- Identify new data sources
- Lead annotation efforts
- Prepare data for ingestion by LLMs
- Perform prompt engineering with LLMs
- Leverage NLP tools like SpaCy and Snorkel to analyze, annotate, and augment data, using both supervised and unsupervised approaches
- Identify relevant datasets and determine annotation needs
- Set up data pipelines
- Version datasets for training in our pipelines
- Monitor ML models in production for data drift
- Stay current with the latest technologies in Machine Learning and Data Science
Qualifications:
- At least a Bachelor’s degree in Linguistics, Computational Linguistics, Computer Science, or a related field, and three (3) years of relevant experience, or an equivalent combination of education, experience, and training
- Exposure to modern NLP systems, including word embeddings, transformer architectures, and LLMs such as ChatGPT, as well as good software design principles
- Experience with NLP toolkits such as NLTK, SpaCy, and Gensim
- Experience leading teams and mentoring junior data scientists
- Fluency in Python
- Experience setting up CI/CD pipelines
- Strong analytical and problem-solving skills
- Excellent communication and collaboration skills
Preferred Qualifications:
- Knowledge of linguistic theories (morphosyntax, semantics, etc.)
- Experience with unsupervised techniques like clustering and topic modeling
- Experience with data versioning tools such as DVC
- Experience with weak-supervision (programmatic) tools for labeling data, e.g., Snorkel
- Experience with big data technologies such as Hadoop, Spark, or Kafka
What are we looking for from you?
- Passion, Curiosity, Hunger
- Fearless & Analytical Mindset
- Strong interest in Product Development
- Goal-oriented, get-it-done attitude
Job Type: Full-time
Pay: $120,000.00 - $130,000.00 per year
Benefits:
- 401(k) matching
- Dental insurance
- Flexible schedule
- Health insurance
- Life insurance
- Paid time off
- Retirement plan
- Vision insurance
Schedule:
- 8-hour shift
Supplemental pay types:
- Bonus pay
Education:
- Bachelor's (Required)
Experience:
- Python: 3 years (Required)
- IT: 3 years (Required)
- Natural language processing: 2 years (Required)
Work Location: Remote