Getting a Web Page's Title with Ruby
First, install the gems the script needs:
gem install uri
gem install nokogiri
Create a Ruby script called get_titles.rb and add the following code to load the libraries, open a URL as a file, send its contents to Nokogiri, and extract the value of the <title> tag:
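require 'nokogiri'
require 'open-uri'

url = "https://google.com"

# URI.open (from open-uri) yields the response as a file-like object
URI.open(url) do |f|
  doc = Nokogiri::HTML(f)            # parse the HTML
  title = doc.at_css('title').text   # text of the first <title> element
  puts title
end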
Save the file and run the program:
ruby get_titles.rb
The program prints the title of Google's home page.
To do this for multiple URLs, put the URLs in an array manually, or get them from a file.
Reading URLs from a File
You may already have the URLs in a file, perhaps from a data export. Ruby's File.readlines turns such a file into an array with one element per line.
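File.readlines keeps each line's trailing newline unless you pass chomp: true (supported since Ruby 2.4). The sketch below writes a throwaway file with Tempfile to show both forms, and why String#chomp is safer than String#chop for removing line endings: chop always deletes the last character, so it truncates a final line that has no newline. The helper name read_both_ways and the sample contents are illustrative only.

```ruby
require 'tempfile'

# Hypothetical helper: write text to a temporary file, then return what
# File.readlines gives back without and with chomp: true.
def read_both_ways(text)
  Tempfile.create('links') do |file|
    file.write(text)
    file.flush # make sure the contents are on disk before re-reading
    [File.readlines(file.path), File.readlines(file.path, chomp: true)]
  end
end

# The last URL has no trailing newline, as often happens in hand-edited files.
raw, chomped = read_both_ways("https://google.com\nhttps://devto")
p raw              # => ["https://google.com\n", "https://devto"]
p chomped          # => ["https://google.com", "https://devto"]
p raw.map(&:chop)  # => ["https://google.com", "https://devt"]
p raw.map(&:chomp) # => ["https://google.com", "https://devto"]
```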
Create a new file called links.txt and add a couple of links. Include at least one bad URL so you can see how the script handles errors.
https://google.com
https://devto
Save the file.
Now return to get_titles.rb and modify the code so that it reads the file line by line and uses each line as a URL:
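# get_titles.rb
require 'nokogiri'
require 'open-uri'

lines = File.readlines('links.txt')

lines.each do |line|
  url = line.chomp       # remove the trailing line break
  next if url.empty?     # skip blank lines

  URI.open(url) do |f|
    doc = Nokogiri::HTML(f)
    title = doc.at_css('title').text
    puts title
  end
rescue SocketError
  # raised when the host name can't be resolved
  puts "#{url}: can't connect. Bad URL?"
end
Each line read from the file normally ends with a line break, which the .chomp method removes before the value is stored in the url variable. Unlike .chop, which always deletes the last character, .chomp removes only a trailing line ending, so a last line without a newline keeps its final character.
URI.open raises a SocketError when the host name can't be resolved, as with https://devto, so the script rescues that error and prints a sensible message instead of crashing.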
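Save the file and run the program again. The script prints the title for the first URL, and for the bad URL it prints the error message instead of stopping.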