Create a Ruby script called get_titles.rb and add the following code to load the libraries, open a URL as a file, send its contents to Nokogiri, and extract the value of the <title> tag:
require 'nokogiri'
require 'open-uri'

url = "https://google.com"

URI.open(url) do |f|
  doc = Nokogiri::HTML(f)
  title = doc.at_css('title').text
  puts title
end
Save the file and run the program:
ruby get_titles.rb
The result shows the page title for Google:
Google
To do this for multiple URLs, put the URLs in an array manually, or get them from a file.
Reading URLs from a File
You may already have the list of URLs in a file, which may have come from a data export. Using Ruby’s File.readlines, you can quickly convert the file into an array.
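Lines returned by File.readlines keep their trailing line breaks, and exported files often contain blank lines or stray spaces. The following is a small sketch of one way to get a clean array, assuming one URL per line. The read_urls helper is a hypothetical name, not something the tutorial's script defines:

```ruby
# Hypothetical helper: reads a file with one URL per line and returns a
# cleaned-up array of strings.
def read_urls(path)
  File.readlines(path, chomp: true) # chomp: true drops each trailing line break
      .map(&:strip)                 # remove stray leading/trailing whitespace
      .reject(&:empty?)             # skip blank lines
end

# Example usage, assuming links.txt exists in the current directory:
#   read_urls('links.txt').each { |url| puts url }
```

The chomp: true keyword is available in Ruby 2.4 and later; on older versions you would call .chomp on each line yourself.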
Create a new file called links.txt and add a couple of links. Make one of them a bad URL so you can see how the script handles errors.
https://google.com
https://devto
Save the file.
Now return to your get_titles.rb file and modify the code so it reads the file line by line and uses each line as a URL:
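# get_titles.rb
require 'nokogiri'
require 'open-uri'

# Read every line of links.txt into an array
lines = File.readlines('links.txt')

lines.each do |line|
  # Strip the trailing line break
  url = line.chomp
  URI.open(url) do |f|
    doc = Nokogiri::HTML(f)
    title = doc.at_css('title').text
    puts title
  end
rescue SocketError
  # Raised when the host can't be resolved or reached
  puts "#{url}: can't connect. Bad URL?"
end
Each line from the file has a line break at the end, which you remove with the .chomp method before storing the value in the url variable. Unlike .chop, which always removes the last character, .chomp removes only a trailing line break, so the last URL stays intact even if the file doesn't end with a newline.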
The URI.open method raises a SocketError if it can’t connect, so you rescue that error and print a helpful message instead of letting the script crash.
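SocketError covers connection failures, but open-uri also raises OpenURI::HTTPError when the server answers with an error status such as 404. The sketch below shows one way to keep the per-URL rescue pattern while handling both cases. The collect_titles helper and the shape of its result hashes are assumptions made for illustration. It takes a block that does the actual fetching, so the error handling can be tried without touching the network:

```ruby
require 'open-uri'

# Hypothetical helper: calls the block once per URL and records either the
# block's return value or a short error message, so one bad URL doesn't
# stop the rest of the list from being processed.
def collect_titles(urls)
  urls.map do |url|
    { url: url, title: yield(url) }
  rescue SocketError => e
    # Host could not be resolved or reached
    { url: url, error: "can't connect (#{e.message})" }
  rescue OpenURI::HTTPError => e
    # Server responded with an error status such as 404 or 500
    { url: url, error: "HTTP error (#{e.message})" }
  end
end

# Example usage with the tutorial's fetching code (needs nokogiri and a network):
#   collect_titles(urls) { |url| URI.open(url) { |f| Nokogiri::HTML(f).at_css('title').text } }
```

Returning a result per URL instead of printing inside the loop is a design choice: it lets you decide afterwards whether to print, log, or retry the failures.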