Web Scraping in Ruby

May 12, 2011

Hpricot is an HTML parser: a fantastic Ruby library that is easy to install and easy to use.

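To install:

sudo gem install hpricot

open-uri does not need to be installed. It ships with Ruby's standard library (there is no gem by that name), and it lets you open a URL and read from it like a file stream.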
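To see what "reading a URL like a stream" looks like, here is a minimal sketch. So that it runs without touching a real site, it starts a throwaway one-shot HTTP server on localhost; that server, its port, and the sample `body` are assumptions made up for this illustration. It uses `URI.open`, the form available on Ruby 2.5 and later.

```ruby
require 'open-uri'
require 'socket'

# Hypothetical stand-in for a real website: a one-shot HTTP server on localhost.
server = TCPServer.new('127.0.0.1', 0)   # port 0 lets the OS pick a free port
port   = server.addr[1]
body   = "<html><body><table><tr><td>hello</td></tr></table></body></html>"

Thread.new do
  client = server.accept
  # Read and discard the request headers up to the blank line.
  while (line = client.gets) && line != "\r\n"; end
  client.write "HTTP/1.1 200 OK\r\n" \
               "Content-Type: text/html\r\n" \
               "Content-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n\r\n" + body
  client.close
end

# open-uri hands back an IO-like object; read it just like a file.
page = URI.open("http://127.0.0.1:#{port}/") { |io| io.read }
puts page

server.close
```

Against a real site you would only need the `URI.open` line with the site's URL.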

Here is a simple web scraping script. It fetches the exam results for a group of students from the Anna University website.
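The "group" of students is just a run of consecutive register numbers, written as a range of strings. As a small sketch of how that range expands (the variable names `numbers`, `base` and `urls` are only for this illustration), no network access is needed:

```ruby
# Register numbers are strings, so the range steps from one to the next
# and keeps the same width.
exam_no = "52108621001".."52108621039"
numbers = exam_no.to_a

puts numbers.size    # 39
puts numbers.first   # 52108621001
puts numbers.last    # 52108621039

# Each register number is appended to the query string to build one URL per student.
base = "http://result.annauniv.edu/cgi-bin/result/re10.pl?regno="
urls = numbers.map { |n| base + n }
puts urls.first
```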


# Fetch my class's exam results from the Anna University site
# Programme name: scrabing_exam_results.rb
# Author  : Rajkumar.S
# Version : 0.01
# License : GNU GPL 3

require 'rubygems'
require 'open-uri'
require 'hpricot'

url = "http://result.annauniv.edu/cgi-bin/result/re10.pl?regno="
# exam_no is a range of register numbers (strings)
exam_no = "52108621001".."52108621039"

exam_no.each do |each_number|
  # fetch and parse the result page (newer Rubies need URI.open here instead of open)
  doc = Hpricot(open(url + each_number))
  data = doc.search('table')
  # append the tables to an HTML file so all results can be viewed on one page
  File.open("result.html", "a") { |f| f.puts(data) }
  # get the inner content of the table tags
  x = data.inner_html
  # strip the HTML tags
  a = x.gsub(/<\/?[^>]*>/, "")
  # split on whitespace and put each word on its own line
  b = a.split.join("\n")
  puts b + "\n" + "======================="

  # append the plain-text version to a second file
  File.open("result.txt", "a") { |f| f.puts(b + "\n\n" + "=================") }
end
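The tag-stripping step in the loop can be tried on its own, without Hpricot or a network connection. The sketch below applies the same `gsub` and `split.join` calls to a made-up snippet of table HTML; the sample markup and the helper name `table_text` are assumptions for illustration, not the real markup of the results page.

```ruby
# Hypothetical sample standing in for the inner HTML of a result table.
html = "<tr>\n <td>Reg No</td>\n <td>52108621001</td>\n</tr>"

# Same two steps as in the script: drop anything that looks like a tag,
# then split on whitespace and join the words with newlines.
def table_text(inner_html)
  inner_html.gsub(/<\/?[^>]*>/, "").split.join("\n")
end

text = table_text(html)
puts text
```

Note that `split` breaks on every run of whitespace, so a cell such as "Reg No" ends up as two lines. That is fine for a quick dump of results, but a regex is only a rough way to strip tags; for anything more careful, reading the cell text through the parser is the safer route.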

Comments
  1. sanmugam k permalink
    May 13, 2011 9:11 am

    super v good job..

  2. February 7, 2012 12:06 pm

    cool code…
    I tried to rewrite the code to scrape results from Madras University,
    but I can't install open-uri.
    It says:
    ERROR: Could not find a valid gem 'open-uri' (>= 0) in any repository

  3. manimaran permalink
    February 8, 2012 10:19 am

    now working…
    thanks

  4. February 9, 2012 10:54 pm

    fantastic
