San Francisco
September 30, 2014

Simple Web Scraping with Ruby: Using Nokogiri and SQLite3

In this class, we’ll learn how to use some Ruby gems to solve a simple web scraping problem and store scraped data locally. Bring your own favorite web scraping problem to class, and we’ll try to write as many scrapers as possible!

We will review some basic HTML and DOM concepts to understand how browsers and other programs interpret an HTML page. This makes it easier to search for specific elements and to extract both the visible and the invisible text on a webpage.

We will use the following exercise to illustrate these concepts: look at the newest stories on Reddit, and insert into a local database the following information about the top 10 of those stories - the author, the number of comments and points, the category, the title, and the date it was posted.

We will assume you have a basic understanding of Ruby constructs such as arrays, string manipulation, and methods.

The material covered in the class is available online - feel free to email us at team AT railsschool DOT org beforehand if you have read these and have any questions about the material:

If there is a webpage you're dying to scrape, email us about it ahead of time too, and we'll try to build a parser customized for your needs.



7pm Pacific - 9pm Pacific on September 30, 2014 at Noisebridge
13 students were there
