Job Responsibilities:
- Identify and accurately label useful data from websites
- Operate Python-based web harvesting software, including the in-house native harvester, to scrape data from websites
- Build and maintain web scraping tools in Python (BeautifulSoup, ExtractTable API)
- Integrate third-party packages and API libraries into web harvesting software
- Adhere to website-scraping targets
- Review and ensure that scraped data is correct
- Troubleshoot quality issues
- Collaborate with developer teams for testing
- Respond to requests from the design team and management
- Monitor bug tracking and error reporting in Jira
Additional Knowledge (beneficial):
- Knowledge of AWS (S3)
- Knowledge of Apache Airflow for workflow orchestration
Requirements:
- Bachelor’s Degree
- Technically minded: able to quickly learn and master new software
- Experience in a technical role
- Experience with Python, BeautifulSoup, and Python's built-in modules
- Experience with workflow management tools like Airflow
- Experience with Mozenda or comparable harvesting software
- Good working knowledge of the Atlassian suite of products
- Good work ethic; self-motivated
- Fluency in English is a MUST
Only shortlisted candidates will be notified.