
Factual, Inc. is looking for an Intern Data Extraction Engineer to work with our team of Linguists and Engineers to find and upload structured data to our open data platform. Your work may include research and analysis of data on the web; extracting data using proprietary tools; aggregating, cleaning, and merging data; and evaluating algorithmically generated data extractors.
The job will entail building, maintaining, and operating the extraction software and scripts, as well as authoring CSS and XPath selectors.
Skills desired:
Linux shell scripting, including one-liners in Perl or Ruby and/or awk and sed
Familiarity with bash and its common tools and features (grep, sort, find, xargs, split, join, comm, input and output redirection, shell escaping, and defining aliases and functions) would be helpful
Ability to author regular expressions
Familiarity with the DOM, especially authoring and fine-tuning CSS and XPath selectors that work across multiple pages
You are likely to excel at this job if you:
Are detail-oriented
Generally love data
http://wiki.developer.factual.com/Factual-Developer-Internships




