One of the first times I worked with files, I needed to parse an XML file. I’m going to show you two ways to do it: the first is the simple way I originally did it, and the second uses a Ruby gem called Nokogiri.
Parsing an XML file the old-fashioned way
Let’s look at the file we’re going to parse. This is a list of conversion rates which is part of PuzzleNode challenge #1 (if you’re thinking about doing the PuzzleNode challenges you should stop reading this right now!).
Rates.xml
    <?xml version="1.0"?>
    <rates>
      <rate>
        <from>AUD</from>
        <to>CAD</to>
        <conversion>1.0079</conversion>
      </rate>
      <rate>
        <from>CAD</from>
        <to>USD</to>
        <conversion>1.0090</conversion>
      </rate>
      <rate>
        <from>USD</from>
        <to>CAD</to>
        <conversion>0.9911</conversion>
      </rate>
    </rates>
My initial solution was to go through the file line by line and work out what kind of data each line held by checking what the line started with. Then I added the data to an OpenStruct rate object.
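    require 'ostruct'

    class RatesParser
      class << self
        def parse(xml_file)
          all_rates = []
          from_currency = to_currency = rate = nil

          # File.foreach closes the file once every line has been read.
          File.foreach(xml_file) do |line|
            line.strip!

            # The opening tag tells us which field this line holds.
            from_currency = get_from_currency(line) if line.start_with?("<from>")
            to_currency = get_to_currency(line) if line.start_with?("<to>")
            rate = get_rate(line) if line.start_with?("<conversion>")

            # A closing </rate> tag means the current rate is complete.
            if line.start_with?("</rate>")
              all_rates << OpenStruct.new(:from_currency => from_currency,
                                          :to_currency => to_currency,
                                          :rate => rate)
            end
          end

          all_rates
        end

        private

        # These slices assume three-letter currency codes and a
        # six-character rate such as 1.0079.
        def get_from_currency(line)
          line[6..8]          # the text right after "<from>"
        end

        def get_to_currency(line)
          line[4..6]          # the text right after "<to>"
        end

        def get_rate(line)
          line[12..17].to_f   # the text right after "<conversion>"
        end
      end
    end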
I was very happy with this solution. But you can pretty clearly see that if there were tens or hundreds of different nodes in the XML file, this parser would also need tens or hundreds of if-statements. Yuck.
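To make the if-statement problem concrete, here is a rough sketch of one way to tame it without a library: a lookup table keyed by tag name, so a new node means a new table entry rather than a new if-statement. The names LineRatesParser, FIELDS, and ELEMENT are made up for this illustration and are not part of my original solution. It still assumes one element per line, so it is just as fragile as the version above in every other respect.

```ruby
require 'ostruct'

module LineRatesParser
  # Hypothetical lookup table: tag name => [attribute to fill, conversion].
  FIELDS = {
    "from"       => [:from_currency, :to_s],
    "to"         => [:to_currency,   :to_s],
    "conversion" => [:rate,          :to_f]
  }

  # Matches a single-line element such as <from>AUD</from>.
  ELEMENT = /\A<(\w+)>(.*)<\/\1>\z/

  # Accepts anything that yields lines (a File, an Array, String#each_line).
  def self.parse(lines)
    rates = []
    current = {}

    lines.each do |raw|
      line = raw.strip

      if line == "</rate>"
        # The closing tag marks the end of one rate.
        rates << OpenStruct.new(current)
        current = {}
      elsif (match = ELEMENT.match(line)) && FIELDS.key?(match[1])
        attribute, converter = FIELDS[match[1]]
        current[attribute] = match[2].send(converter)
      end
    end

    rates
  end
end

sample = <<~XML
  <?xml version="1.0"?>
  <rates>
    <rate>
      <from>AUD</from>
      <to>CAD</to>
      <conversion>1.0079</conversion>
    </rate>
  </rates>
XML

rates = LineRatesParser.parse(sample.each_line)
puts rates.first.from_currency   # prints AUD
```

Because the regular expression captures the text between the tags, this version also stops depending on fixed character positions like line[6..8].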
Luckily there is a Ruby gem called Nokogiri that parses XML files for us.
The Nokogiri way
First, you need to install Nokogiri:
sudo gem install nokogiri
Next, we can start parsing. Here’s how it worked for me. I went through each ‘rate’ node in ‘rates’ and extracted the text from each of its child nodes, namely ‘from’, ‘to’, and ‘conversion’.
    require 'ostruct'
    require 'nokogiri'

    class RatesParser
      def self.parse(xml_file = "SAMPLE_RATES.xml")
        # Parse the file that was passed in; the block form closes it afterwards.
        document = File.open(xml_file) { |file| Nokogiri::XML(file) }

        # Build one OpenStruct for every <rate> node under <rates>.
        document.xpath("//rates/rate").map do |node|
          OpenStruct.new(:from_currency => child_text(node, "from"),
                         :to_currency => child_text(node, "to"),
                         :rate => child_text(node, "conversion").to_f)
        end
      end

      # Returns the stripped text of the first child node with the given name.
      def self.child_text(node, name)
        node.children.map { |n| n.text.strip if n.name == name }.compact[0]
      end
    end
Here’s a good Stack Overflow explanation of how to use Nokogiri. You can also head over to the Nokogiri tutorials to dive into the documentation.