Parsing an XML file in Ruby

One of the first times I worked with files, I needed to parse an XML file. I’m going to show you two ways to do it: the first is the simple way I originally did it, and the second uses a Ruby gem called Nokogiri.

Parsing an XML file the old-fashioned way

Let’s look at the file we’re going to parse. It is a list of conversion rates from PuzzleNode challenge #1 (if you’re thinking about doing the PuzzleNode challenges, you should stop reading this right now!).

Rates.xml

<?xml version="1.0"?>
<rates>
  <rate>
    <from>AUD</from>
    <to>CAD</to>
    <conversion>1.0079</conversion>
  </rate>
  <rate>
    <from>CAD</from>
    <to>USD</to>
    <conversion>1.0090</conversion>
  </rate>
  <rate>
    <from>USD</from>
    <to>CAD</to>
    <conversion>0.9911</conversion>
  </rate>
</rates>

My initial solution was to go line by line and work out what kind of data each line held from the tag it started with. Once I reached the end of a rate, I added the data to an OpenStruct rate object.

require 'ostruct'

class RatesParser
	class << self

		def parse(xml_file)
			all_rates = []
			from_currency = to_currency = rate = nil

			# File.foreach closes the file once every line has been read
			File.foreach(xml_file) do |line|
				line.strip!
				from_currency = get_from_currency(line) if line.start_with?("<from>")
				to_currency = get_to_currency(line) if line.start_with?("<to>")
				rate = get_rate(line) if line.start_with?("<conversion>")
				# a closing </rate> tag means all three values have been collected
				if line.start_with?("</rate>")
					all_rates << OpenStruct.new(:from_currency => from_currency, :to_currency => to_currency, :rate => rate)
				end
			end
			all_rates
		end

		private
		
		def get_from_currency(line)
			line[6..8]
		end

		def get_to_currency(line)
			line[4..6]
		end

		def get_rate(line)
			line[12..17].to_f
		end
	end
end
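
To see why the slice ranges in the parser work, it can help to try them on single lines. The snippet below is a small sketch rather than part of the parser: the first three sample lines are copied from Rates.xml, and `longer` is a made-up line with extra decimal places that shows how the fixed offsets quietly drop digits.

```ruby
# Sample lines as they look after line.strip!
from_line = "<from>AUD</from>"
to_line   = "<to>CAD</to>"
rate_line = "<conversion>1.0079</conversion>"

# "<from>" is 6 characters long, so the currency code sits at indexes 6..8
from_currency = from_line[6..8]
# "<to>" is 4 characters long, so the code sits at indexes 4..6
to_currency = to_line[4..6]
# "<conversion>" is 12 characters long and the sample rates are 6 characters wide
rate = rate_line[12..17].to_f

puts from_currency
puts to_currency
puts rate

# Hypothetical line with a longer value: the fixed range ignores the extra digits
longer = "<conversion>1.007912</conversion>"
truncated = longer[12..17].to_f
puts truncated
```

The offsets only hold as long as every tag name and every value has exactly the width seen in the sample file.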

I was very happy with this solution. But you can pretty clearly see that if there are tens or hundreds of different kinds of nodes in the XML file, this parser would also need tens or hundreds of if-statements. Yuck.

Luckily there is a Ruby gem called Nokogiri that parses XML files for us.

The Nokogiri way

First, you need to install Nokogiri:

sudo gem install nokogiri

Next, we can start parsing. Here’s how it worked for me: I went through each ‘rate’ node in ‘rates’ and extracted the text from each of its child nodes, namely ‘from’, ‘to’, and ‘conversion’.

require 'ostruct'
require 'nokogiri'

class RatesParser
	def self.parse(xml_file = "SAMPLE_RATES.xml")
		# parse the file that was passed in rather than a hard-coded name
		doc = Nokogiri::XML(File.read(xml_file))
		# build one OpenStruct per <rate> node from the text of its children
		doc.xpath("//rates/rate").map do |node|
			from_currency = node.children.find { |n| n.name == "from" }.text.strip
			to_currency = node.children.find { |n| n.name == "to" }.text.strip
			rate = node.children.find { |n| n.name == "conversion" }.text.strip.to_f
			OpenStruct.new(:from_currency => from_currency, :to_currency => to_currency, :rate => rate)
		end
	end
end

Here’s a good Stack Overflow explanation of how to use Nokogiri. You can also head over to the Nokogiri tutorials to dive into the documentation.
