Eugenio Lacuesta @eugenio

About Me



[Web] Scrapy beyond the first steps

Scrapy is the most popular web scraping / web crawling framework out there. It provides several out-of-the box configurable features for the most common needs for web crawlers in general.

However, when your project starts growing up, you will feel the need to tailor Scrapy further to your needs. You'll need to create custom request/response processors, data validators, etc.

In this talk you will learn how to take the best out of Scrapy in terms of design by taking advantage of built-in components like signals and extensions to respond to events that happen during the crawl.

We will also cover how to plug functionality into Scrapy by using a few additional components like:
- spider middlewares (customize resources going in and out of spiders, like requests, responses and scraped items)
- downloader middlewares (customize Scrapy's request/response processing)
- item exporters (serialize the extracted data into files)
- item pipelines (custom post processing/validation of extracted items)