I. POSITION INTRODUCTION
We are looking for a professional Data Crawler who can operate a large-scale data collection system while ensuring stability, accuracy, and efficiency.
1. Professional Scraping System Development
Technical Requirements:
System Architecture:
- Design cross-platform Python crawling scripts
- Build scalable systems
- Develop parallel crawling solutions
- Manage large, multi-threaded data streams
Technologies:
- Scrapy, BeautifulSoup
- Selenium
- Asyncio, Multiprocessing
- Proxy management
- IP rotation techniques
2. Data Processing and Normalization
Processing Methods:
- Develop API data cleaning processes
- Data transformation algorithms
- Integrity checks
- Remove noisy data
Tools:
- Pandas
- Data validation techniques
- Machine Learning preprocessing
3. Database Management
Specialized Skills:
Advanced SQL:
- Complex queries
- Performance optimization
4. Monitoring & Optimization
Strategy:
- Manage scraping system operations
- Track scraping performance
- Challenge handling:
  - IP blocking
  - Rate limiting
  - CAPTCHA
II. PROFESSIONAL REQUIREMENTS
Education
- Bachelor's degree (GPA > 3.0)
- Major:
  - Data science
  - Computer engineering
  - Data-related fields
- English: TOEIC > 700 or IELTS > 5.5
Technical Skills
Python Ecosystem
- Asyncio, Multiprocessing
- Data cleaning techniques
- Machine Learning preprocessing
- Advanced error handling
Database & Big Data
- SQL (Intermediate to Advanced)
- NoSQL database management
- PySpark
- Data warehousing
In-depth Experience
- Minimum 1-2 years
- Project implementation:
  - Web scraping
  - Automatic data processing
  - Big data crawling
III. SOFT SKILLS
System analysis
Problem solving
Independent & team working
Time management
Logical thinking
IV. NICE-TO-HAVE EXPERIENCE
Big Data experience
Data pipeline design
Working with diverse APIs
Professional certifications
Creativity and initiative in proposing ideas
V. BENEFITS
Modern technology environment
Competitive salary
Development opportunities
Continuous training
VI. EVALUATION CRITERIA
System stability
Data quality
Processing efficiency
Scalability
VII. REPORTING
- Reports directly to: Manager and Board of Directors
- Reporting content: according to reporting regulations and reporting content for the technical
- Types of Reports:
  - Daily Progress Report
  - Weekly Report
  - Monthly Report
  - Milestone Quick Report
  - Incident Report
  - Performance Report
VIII. OTHER RELATED FACTORS
- Working hours: office hours, 7 hours/day (morning 08:00 - 11:30, afternoon 13:00 - 16:30), Monday to Friday; Saturday and Sunday off.
- Working equipment: provided
- Salary: 12 - 18 million VND/month
- Send your CV by email to: hr@webify.com.vn; application deadline: 15/03/2025