Spidr and Raingrams are back, now with specs

Raingrams is back in action. After it sat on RubyForge for quite some time, I was asked to add some features to this general-purpose n-grams library for Ruby. I ended up refactoring the code to handle probability calculations better (the Maximum Likelihood Estimation (MLE) is now recalculated only when the set of n-grams changes), removed the Unigram model (kinda pointless in an n-grams library), made it possible to dump a trained model to a file using Marshal, and added the ability to generate random text from trained models. Raingrams also gained a total of 133 new spec tests.
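To illustrate the ideas behind the refactoring, here is a minimal, hypothetical sketch of a bigram model in plain Ruby. It is not Raingrams' actual API: the class and method names (`BigramSketch`, `train`, `probability`, `random_text`, `save`, `load`) are made up for this example. It caches the MLE table, P(b | a) = count(a, b) / count(a, *), and discards the cache only when training adds new bigrams. It also dumps the whole model with Marshal and generates text by sampling each next word from the trained distribution.

```ruby
# Hypothetical sketch of a bigram model; NOT the Raingrams API.
class BigramSketch
  def initialize
    @counts = {}         # word => { next_word => count }
    @probabilities = nil # cached MLE table; nil means it must be rebuilt
  end

  # Count every adjacent word pair and invalidate the cached MLE table.
  def train(text)
    text.split.each_cons(2) do |a, b|
      (@counts[a] ||= Hash.new(0))[b] += 1
    end
    @probabilities = nil
    self
  end

  # MLE: P(b | a) = count(a, b) / count(a, *), rebuilt only when stale.
  def probabilities
    @probabilities ||= @counts.each_with_object({}) do |(a, followers), table|
      total = followers.values.sum.to_f
      table[a] = followers.transform_values { |count| count / total }
    end
  end

  def probability(a, b)
    probabilities.fetch(a, {}).fetch(b, 0.0)
  end

  # Random walk through the model: sample each next word from P(. | last word).
  def random_text(start, length, rng = Random.new)
    words = [start]
    (length - 1).times do
      dist = probabilities[words.last]
      break if dist.nil? || dist.empty? # dead end: no observed successor

      r = rng.rand
      cumulative = 0.0
      choice = dist.keys.last # fallback in case of floating-point rounding
      dist.each do |word, p|
        cumulative += p
        if r < cumulative
          choice = word
          break
        end
      end
      words << choice
    end
    words.join(" ")
  end

  # Persist the trained model with Marshal (works because @counts holds no procs).
  def save(path)
    File.binwrite(path, Marshal.dump(self))
  end

  def self.load(path)
    Marshal.load(File.binread(path))
  end
end
```

For example, `BigramSketch.new.train("the cat sat on the mat").random_text("cat", 3)` always returns `"cat sat on"`, since each of those words has exactly one observed successor. The counts use `||=` rather than `Hash.new { |h, k| ... }` because Marshal cannot dump a Hash that has a default proc.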

Install Raingrams:
$ sudo gem install raingrams

Spidr also received some new spec tests. After fixing a link-handling bug in Spidr 0.1.1, I decided to create a Web Spider Obstacle Course for testing purposes. The course contains all manner of links: remote, local, relative, absolute, JavaScript, empty URLs and infinitely looping links. The course also provides a JSON file describing how a web spider should navigate each of these links. I also wrote an RSpec helper that imports this information from the JSON file and auto-generates spec tests for how Spidr::Agent should navigate the links in the obstacle course.
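The data-driven approach can be sketched without RSpec itself. This is a hypothetical example: the JSON layout (`base`, `links`, `href`, `expect`), the `resolve_link` and `spider_decision` helpers, and the rule that remote hosts are off-limits are all assumptions for illustration, not the obstacle course's real format or Spidr's actual logic. Each JSON entry becomes its own named check, much as an RSpec helper can generate one example per entry.

```ruby
require "json"
require "set"
require "uri"

# Hypothetical spec format: the real obstacle course JSON may look different.
SPEC_JSON = <<~JSON
  {
    "base": "http://localhost/course/index.html",
    "links": [
      { "href": "relative.html",          "expect": "follow" },
      { "href": "/course/absolute.html",  "expect": "follow" },
      { "href": "http://remote.example/", "expect": "ignore" },
      { "href": "javascript:void(0)",     "expect": "ignore" },
      { "href": "",                       "expect": "ignore" },
      { "href": "index.html",             "expect": "ignore" }
    ]
  }
JSON

# Resolve an href against the page it appeared on. Returns nil for empty
# links, non-HTTP(S) schemes such as javascript:, and unparsable URLs.
def resolve_link(page, href)
  href = href.to_s.strip
  return nil if href.empty?

  uri = URI.join(page, href)
  return nil unless %w[http https].include?(uri.scheme)

  uri.fragment = nil # "page.html#top" and "page.html" are the same page
  uri
rescue URI::Error
  nil
end

# Hypothetical crawl rule: follow only new links on the same host.
def spider_decision(page, href, visited)
  uri = resolve_link(page, href)
  return "ignore" if uri.nil?                    # empty, javascript:, or malformed
  return "ignore" if uri.host != URI(page).host  # remote link (assumed off-limits)
  return "ignore" if visited.include?(uri.to_s)  # already crawled: breaks link loops

  "follow"
end

spec    = JSON.parse(SPEC_JSON)
base    = spec["base"]
visited = Set[base] # only the start page has been crawled so far

# Auto-generate one named check per JSON entry.
checks = spec["links"].to_h do |link|
  name = "should #{link['expect']} #{link['href'].inspect}"
  [name, -> { spider_decision(base, link["href"], visited) == link["expect"] }]
end

checks.each do |name, check|
  puts "#{check.call ? 'PASS' : 'FAIL'}: #{name}"
end
```

With RSpec available, the final loop would instead call `it(name) { ... }` once per JSON entry inside a `describe` block, so each link reports as a separate example.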

Install Spidr:
$ sudo gem install spidr
