Male vs Female Biographies on Wikipedia

[Screen capture: names and genders extracted from Wikipedia]

I attended the Boston Girl Geeks Dinner last week, and left with a mission – to extract the names from all the biographies on Wikipedia and analyze them for gender.

There’s a gem that compares a first name against a 40,000-name dictionary and determines whether that name is female, male, mostly female, mostly male, or androgynous/unknown.
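To make the idea concrete, here is a minimal sketch of that kind of dictionary lookup. It is not the gem's API: the `NAME_GENDERS` hash, its handful of entries, and the `gender_for` method are all made up for illustration, and the real dictionary is far larger.

```ruby
# Hypothetical, tiny stand-in for the 40,000-name dictionary.
# The entries and their categories are illustrative only.
NAME_GENDERS = {
  "grace"  => :female,
  "john"   => :male,
  "leslie" => :mostly_female,
  "robin"  => :androgynous
}.freeze

# Looks up the first word of a full name; anything not in the
# dictionary (including an empty string) comes back as :unknown.
def gender_for(full_name)
  first = full_name.to_s.strip.split(/\s+/).first
  return :unknown if first.nil?

  NAME_GENDERS.fetch(first.downcase, :unknown)
end

puts gender_for("Grace Hopper")
puts gender_for("Someone Unlisted")
```

Only the first name is consulted, which is why the extraction step has to hand over clean names.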

Wikipedia has an API, but getting at just the names of biographical subjects through it is a long, convoluted process. However, Wikipedia also has a set of pages that organize biographies according to the quality of the bio. Each of those pages lists … the *names* of the biography subjects. Bingo!

So, I got to learn how to extract data from a page with XPath, do a bit of work to pull exactly what I need out of the result (the names are not the only text in the returned string, so I had to strip out the extra content), and then run the names through the gender tool.
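Here is a rough sketch of that extract-then-strip step, not the actual tool. The sample markup, the `mw-pages` id, the `Talk:` prefix, and the method names are assumptions for illustration. It uses REXML so it runs without extra gems, but REXML only accepts well-formed XML; a real page would probably need an HTML parser such as Nokogiri.

```ruby
require "rexml/document"

# Hypothetical stand-in for a fetched page; the real markup will differ.
SAMPLE_HTML = <<~HTML
  <div id="mw-pages">
    <ul>
      <li><a href="/wiki/Talk:Ada_Lovelace">Talk:Ada Lovelace</a></li>
      <li><a href="/wiki/Talk:Grace_Hopper">Talk:Grace Hopper</a></li>
      <li><a href="/wiki/Talk:John_Smith_(explorer)">Talk:John Smith (explorer)</a></li>
    </ul>
  </div>
HTML

# Removes the (assumed) "Talk:" prefix and any trailing
# parenthetical disambiguator, leaving just the name.
def clean_name(raw)
  raw.strip
     .sub(/\ATalk:/, "")
     .sub(/\s*\([^)]*\)\z/, "")
end

# Selects the link text of every list item with XPath, then cleans it.
def extract_names(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//div[@id='mw-pages']//li/a").map do |link|
    clean_name(link.text)
  end
end

puts extract_names(SAMPLE_HTML)
```

Keeping the cleanup in its own method means the stripping rules can be adjusted without touching the XPath query.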

I’ve made the tool as gentle on Wikipedia as possible: there’s no automated run of the whole extraction. Each page has to be run separately, by hand, so the tool won’t create a big load and accidentally get my IP address banned. During coding I used a really small page, so the incessant testing only grabbed a single page and a few lines of data.
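One way to enforce the one-page-per-run rule is to make the entry point refuse anything other than a single page. This is only a sketch under that assumption: `fetch_one_page` and its `fetcher` parameter are hypothetical names, not part of the actual tool.

```ruby
require "net/http"
require "uri"

# Fetches exactly one page per call. Passing zero or several URLs
# raises, so a whole category can't be crawled in a single run.
# The fetcher is injectable so the logic can be exercised offline.
def fetch_one_page(args, fetcher: ->(uri) { Net::HTTP.get(uri) })
  unless args.length == 1
    raise ArgumentError, "pass exactly one page URL per run"
  end

  fetcher.call(URI.parse(args.first))
end

# Offline demonstration with a stub instead of a real request:
stub = ->(uri) { "<html>stub for #{uri}</html>" }
puts fetch_one_page(["https://example.org/small-page"], fetcher: stub)
```

In a script this would be called by hand with `ARGV`, once per page.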

And it works!

Next up: enabling people to edit records, to handle cases where the tool couldn’t identify the gender, and adding statistical analysis.

Fun with FizzBuzz

I decided to make FizzBuzz into something that reads (nominally) like English … because I’m easily amused.

fizz_or_buzz = (1..100)

fizzbuzz = fizz_or_buzz.map do |i|
  not_fizzy = (i % 3 != 0)
  not_buzzy = (i % 5 != 0)
  no_fizz_no_buzz = i.to_s
  when_fizzy = (i % 3)
  when_buzzy = (i % 5)

  if not_fizzy && not_buzzy
    fizziness = [no_fizz_no_buzz]
  else
    fizziness = [ ["Fizz"][when_fizzy], ["Buzz"][when_buzzy] ]
  end

  fizziness.compact.join
end

puts fizzbuzz

There’s a fun little trick in the else clause: indexing a one-element array at anything other than 0 returns nil, so each slot of the two-element array holds either the word or nil, and compact drops the nils before join. It’s a *really bad* coding practice, but fun to play with. You can see an explanation of the nil handling here.
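The trick is easier to see in isolation. The `parts_for` helper below is a made-up name that just wraps the expression from the else clause:

```ruby
# The else-clause expression on its own: each one-element array is
# indexed by a remainder, so any non-zero remainder yields nil.
def parts_for(i)
  [["Fizz"][i % 3], ["Buzz"][i % 5]]
end

p parts_for(3)                # ["Fizz", nil]
p parts_for(5)                # [nil, "Buzz"]
p parts_for(15)               # ["Fizz", "Buzz"]
p parts_for(7)                # [nil, nil]

# compact removes the nils, and join glues together what is left.
p parts_for(15).compact.join  # "FizzBuzz"
p parts_for(7).compact.join   # ""
```

The empty string for 7 is why the if branch is still needed to print the number itself.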
