Web Scraping in Ruby
May 12, 2011
Hpricot is an HTML parser for Ruby. It is a fantastic library that is easy to install and easy to use.
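To install:
sudo gem install hpricot
open-uri does not need to be installed as a gem. It is part of Ruby's standard library and lets you read an HTTP URL as a stream, much like a file.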
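As a quick check that open-uri comes with Ruby itself, the sketch below only requires it and inspects a parsed URI. It makes no network request. The base URL is the host used by the script in this post; the variable name is just for illustration.

```ruby
require 'uri'
require 'open-uri'  # standard library, so no gem install is needed

# open-uri mixes OpenURI::OpenRead into URI::HTTP,
# which gives HTTP URIs the #open and #read methods.
uri = URI.parse("http://result.annauniv.edu/")

puts uri.class                 # the parsed URI class
puts uri.respond_to?(:open)    # whether open-uri added #open
puts uri.respond_to?(:read)    # whether open-uri added #read
```

If the require line succeeds, nothing more has to be installed for the network part of the scraper.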
Here is a simple web scraping script.
It fetches the exam results for a group of students from the Anna University website.
# Fetch my class students' exam results from the Anna University site
# Program name: scrabing_exam_results.rb
# Author : Rajkumar.S
# version : 0.01
# License: GNU GPL 3
require 'rubygems'
require 'open-uri'
require 'hpricot'

url = "http://result.annauniv.edu/cgi-bin/result/re10.pl?regno="
# exam_no is a range of register numbers
exam_no = "52108621001".."52108621039"
exam_no.each do |each_number|
  doc = Hpricot(open(url + each_number))
  data = doc.search('table')
  # append the tables to an HTML file so all results can be viewed on one page
  File.open("result.html", "a") { |f| f.puts(data) }
  # get the inner content of the table tags
  x = doc.search('table').inner_html
  # remove the HTML tags
  a = x.gsub(/<\/?[^>]*>/, "")
  # split on whitespace and put each token on its own line
  b = a.split.join("\n")
  puts b + "\n" + "======================="
  File.open("result.txt", "a") { |f| f.puts(b + "\n\n" + "=================") }
end
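Two steps of the script can be tried without Hpricot or a network connection: iterating over a range of register numbers, and turning table markup into plain text. The sketch below is only an illustration. The sample HTML and the helper name table_text are made up, since the real result page's markup is not shown here.

```ruby
# A range of numeric strings steps through each register number in turn.
exam_no = "52108621001".."52108621039"
puts exam_no.to_a.size   # number of register numbers in the range

# Hypothetical helper wrapping the same gsub / split / join steps as the script.
def table_text(html)
  without_tags = html.gsub(/<\/?[^>]*>/, "")  # drop anything that looks like a tag
  without_tags.split.join("\n")               # one whitespace-separated token per line
end

# Made-up sample markup; the real page will look different.
sample = "<tr>\n<td>Register No</td>\n<td>52108621001</td>\n</tr>"
puts table_text(sample)
```

Note that split breaks on every run of whitespace, so a cell containing several words (such as "Register No" above) ends up on several lines. Also, if two cells have no whitespace between their tags, their text is joined together once the tags are removed.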
6 Comments
super v good job..
cool code…
I tried to rewrite the code to scrape results from Madras University,
but I can't install open-uri.
It says:
ERROR: Could not find a valid gem ‘open-uri’ (>= 0) in any repository
now working…
thanks
Fantastic
After I start your Feed it appears to be a ton of nonsense, is the problem on my part?