First project is finally done!
The project consisted of using object-oriented Ruby to build a command line interface (CLI) that accesses data scraped from a website. The data being accessed is two levels deep; a level is a point where a user chooses an option and then receives more information about that choice.
Idea generation
I chose a topic that has recently piqued my interest: group fitness classes. My idea was to create an application that shows a list of group fitness class offerings and, after the user selects a class, returns the class schedule for nearby gym locations. I looked through numerous gym websites that offered group fitness classes, but most of them did not meet the two-levels-deep requirement. LA Fitness was the only gym I found that seemed to fit the criteria for my plan. I ran into issues during the second-level scrape, however, and they caused major delays in obtaining the data on the second page.
Project layout
To provide separation of concerns, I created separate classes for GroupFitness, GymLocation, Scraper, and the CLI. I first made a rough outline of the GroupFitness and GymLocation classes, since I had a better idea of what information I wanted those two classes to contain. The GroupFitness class is responsible for initializing each fitness class object with a name, description, and id value, and then saving it to the @@all class variable.
class Fitness::GroupFitness
  @@all = []

  attr_accessor :description
  attr_reader :name, :fitness_class_id

  def initialize(name, description=nil, fitness_class_id)
    @name = name
    @description = description
    @fitness_class_id = fitness_class_id
    @@all << self
  end

  def self.all
    @@all
  end
end
Similarly, the GymLocation class initializes each instance of location with a location name, address, and distance and saves it to the @@all class variable.
class Fitness::GymLocation
  @@all = []

  attr_accessor :class_schedule
  attr_writer :zip_code
  attr_reader :location_name, :address, :distance

  def initialize(location_name, address, distance)
    @location_name = location_name
    @address = address
    @distance = distance
    @@all << self
  end

  def self.all
    @@all
  end
end
The Scraper class is responsible for scraping the pertinent information from both levels of the website. By parsing HTML with CSS selectors, I was able to obtain the attributes specified in the GroupFitness class. The attributes specified in my GymLocation class were located in the second level, which appeared to use query string parameters. After multiple attempts to use these parameters to generate a new URL and parse it with CSS selectors, I realized that the class schedules were actually generated by AJAX (Asynchronous JavaScript and XML) calls and displayed by JavaScript. I had to make a POST request and parse the JSON (JavaScript Object Notation) that came back. After many frustrating hours, I was able to implement the POST request successfully. I was then able to extract the data I needed from the JSON document by accessing its array of hashes.
# Sends the selected class id and the user's zip code as a JSON body.
def self.get_locations_post(zip_code, group_fitness)
  response = HTTParty.post(
    URL_TO_POST,
    :body => JSON.generate({ClassId: group_fitness.fitness_class_id, ZipCode: zip_code, MileRange: 10}),
    :headers => {
      "Content-Type" => "application/json; charset=UTF-8",
      "Accept" => "*/*"
    }
  )
end
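The step of pulling data out of the array of hashes could look roughly like the sketch below. The real response format is not shown in this post, so the sample payload and its key names ("Description", "Address", "Distance") are made up for illustration, and Location is a stand-in Struct rather than the GymLocation class above.

```ruby
require 'json'

# Hypothetical response body; the real endpoint's keys are not shown in this post.
SAMPLE_BODY = <<~BODY
  [
    {"Description": "Downtown", "Address": "123 Main St", "Distance": 1.2},
    {"Description": "Uptown", "Address": "456 Oak Ave", "Distance": 4.8}
  ]
BODY

# Stand-in for GymLocation so the sketch runs on its own.
Location = Struct.new(:location_name, :address, :distance)

# Parse the JSON text into an array of hashes, then build one object per hash.
def parse_locations(body)
  JSON.parse(body).map do |entry|
    Location.new(entry["Description"], entry["Address"], entry["Distance"])
  end
end

locations = parse_locations(SAMPLE_BODY)
locations.each do |loc|
  puts "#{loc.location_name}: #{loc.address} (#{loc.distance} mi)"
end
```

JSON.parse returns plain Ruby arrays and hashes with string keys, so each entry can be read with ordinary hash lookups.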
Lastly, the CLI class holds all the information to run the program and display the prompts for the user to follow. This class also accounts for most of the edge cases that may arise with user inputs.
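As a rough idea of what handling those edge cases can involve, here is a minimal sketch of two input checks. The method names and the validation rules (a whole number within the menu range, a five-digit zip code) are my assumptions for illustration and not the exact code in the project.

```ruby
# Hypothetical helpers; names and rules are assumptions for illustration.

# Returns a zero-based index when the input is a valid menu number,
# otherwise nil (blank input, letters, or a number outside the menu).
def parse_menu_choice(input, item_count)
  text = input.to_s.strip
  return nil unless text.match?(/\A\d+\z/)

  choice = text.to_i
  choice.between?(1, item_count) ? choice - 1 : nil
end

# Accepts exactly five digits, the basic US zip code format.
def valid_zip_code?(input)
  input.to_s.strip.match?(/\A\d{5}\z/)
end

p parse_menu_choice("2", 5)    # => 1
p parse_menu_choice("abc", 5)  # => nil
p valid_zip_code?("90210")     # => true
p valid_zip_code?("9021")      # => false
```

Returning nil for bad input lets the CLI loop re-prompt the user instead of crashing on an unexpected entry.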
Project outcome
After numerous days of refactoring and debugging, the command line interface is working as expected! Several edge cases were tested and there are currently no known bugs! The CLI greets the user and asks them to select a group fitness class from a list.
After the user selects a class, they are given the class description and prompted to enter their zip code. The CLI then displays a list of the locations and schedule information for that class within a 10-mile radius. Here is a short demo of the CLI: