Sunday, August 10, 2008

Some Numbers from the 2008 Olympics

During the opening ceremonies, the US broadcast displayed a graphic showing each country's delegation size and population. I wondered how strongly population correlates with the number of athletes competing, and what other variables might influence delegation size. In addition to population, GDP, climate, and some metric of civil rights for women might also help explain delegation size. When I went to research this, I had trouble finding the country and delegation data. What I did find was a list of all the athletes online. I turned the data into an Excel file and played around with it a little. The file can be downloaded here if anyone wants it. It is a .csv, which makes it easy to import and analyze with R.


I created some pivot tables in Excel. It is possible to do similar things in R with the tapply() function, but Excel makes pivot tables so easy to create and alter that I did not bother. I uploaded them to a Google spreadsheet and embedded a few of them at the bottom of this post. The rest of the numbers I mainly found using R. There are 204 countries competing in this Olympics. The largest delegation belongs to the United States, with 618 athletes. Ten countries have just one athlete competing, among them Aruba, Belize, Burundi, Central African Republic, Dominica, Gabon, Niger, and Nauru (which was featured in a surreal This American Life episode). The mean delegation size is 49.24 athletes. Surprisingly, the median delegation size is only 9: roughly half the countries send 9 or fewer athletes. Random fact: Only 27 of the 204 delegations have more women competing than men. Which two countries have the largest female-positive (i.e., more women than men) delegations?
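For anyone who would rather script the pivot table than build it in Excel, here is a minimal Python sketch of the same group-and-count idea that tapply() provides in R. The column names "Country" and "Gender" are assumptions (the real .csv may label them differently), and the rows are made up purely to keep the example self-contained.

```python
import csv
import io
from collections import Counter
from statistics import mean, median

# Made-up sample rows. The column names "Country" and "Gender" are
# assumptions, not taken from the real athlete file.
SAMPLE = """Name,Country,Gender
A,Norway,F
B,Norway,F
C,Norway,M
D,Nauru,M
E,Japan,M
F,Japan,F
G,Japan,M
H,Japan,M
"""

def delegation_stats(rows):
    """Count athletes per country and list delegations with more women than men."""
    rows = list(rows)
    sizes = Counter(r["Country"] for r in rows)
    women = Counter(r["Country"] for r in rows if r["Gender"] == "F")
    # Women outnumber men when women > (total - women).
    female_majority = sorted(
        country for country, total in sizes.items()
        if women[country] > total - women[country]
    )
    return sizes, female_majority

sizes, female_majority = delegation_stats(csv.DictReader(io.StringIO(SAMPLE)))
print("largest delegation:", sizes.most_common(1))
print("mean size:", mean(sizes.values()))
print("median size:", median(sizes.values()))
print("more women than men:", female_majority)
```

To try it on the real data, replace the io.StringIO(SAMPLE) argument with an open file handle for the downloaded .csv and adjust the column names to match.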


Another column of data on the official site lists the disciplines (sports) in which athletes compete. There are 38 discipline classifications. The most competed-in discipline is Athletics (track and field, I would guess) with 1943 competitors. Cycling BMX is the smallest, with only 24 competitors. The median and mean number of competitors per discipline are 182.5 (between Baseball and Table Tennis) and 264, respectively. Random fact: five sports have competitors of only one gender. Which ones are they?
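The single-gender question can be answered by collecting the set of genders seen in each discipline and keeping the disciplines where that set has exactly one member. The sketch below assumes each athlete record reduces to a (discipline, gender) pair; the pairs themselves are made up for illustration and are not the real counts.

```python
from collections import Counter, defaultdict

# Made-up (discipline, gender) pairs standing in for the athlete list.
ENTRIES = [
    ("Athletics", "M"), ("Athletics", "F"), ("Athletics", "F"),
    ("Baseball", "M"), ("Baseball", "M"),
    ("Softball", "F"),
]

def competitors_per_discipline(entries):
    """Number of competitors in each discipline."""
    return Counter(discipline for discipline, _ in entries)

def single_gender_disciplines(entries):
    """Disciplines in which every listed competitor has the same gender."""
    genders = defaultdict(set)
    for discipline, gender in entries:
        genders[discipline].add(gender)
    return sorted(d for d, seen in genders.items() if len(seen) == 1)

counts = competitors_per_discipline(ENTRIES)
print("most competed-in:", counts.most_common(1))
print("single-gender:", single_gender_disciplines(ENTRIES))
```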


A little manipulation in R turned the Date of Birth data into Year of Birth and then into an 'age estimate', where I took 2008 and subtracted Year of Birth to get current age. The oldest athlete is Hoketsu Hiroshi, a 67-year-old man representing Japan in Equestrian. The youngest competitor is 12-year-old swimmer Antoinette Joyce Guedia Mouafo from Cameroon. The median and mean age estimates are 26 and 26.37 years. The Random Fact was going to be the delegations with the oldest and youngest average ages, but Excel started acting up and I was a bit too tired to do more R (date manipulation can be tricky). There is plenty to explore in this data. Let me know if you find anything neat in the above file, and don't be afraid to add more columns of data.
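The age estimate described above is simple enough to sketch in a few lines. This version assumes the Date of Birth column is text in YYYY-MM-DD form, which may not match the real file, and the sample dates are made up. Subtracting years alone overstates the age by one for anyone whose birthday falls later in the year, so a second helper computes an exact age on a chosen reference date; using the day of the opening ceremony (8 August 2008) is my own choice, not something the original analysis did.

```python
from datetime import date, datetime

def age_estimate(dob_text, year=2008, fmt="%Y-%m-%d"):
    """Rough age: the reference year minus the year of birth."""
    return year - datetime.strptime(dob_text, fmt).year

def exact_age(dob_text, on=date(2008, 8, 8), fmt="%Y-%m-%d"):
    """Age in whole years on the date `on`."""
    dob = datetime.strptime(dob_text, fmt).date()
    # Subtract one year if the birthday has not yet occurred by `on`.
    birthday_pending = (on.month, on.day) < (dob.month, dob.day)
    return on.year - dob.year - int(birthday_pending)

# Made-up dates of birth, assumed to be in YYYY-MM-DD format.
for dob in ("1941-01-15", "1990-12-31"):
    print(dob, age_estimate(dob), exact_age(dob))
```

For a late-December birthday the two numbers differ by one, which is worth keeping in mind when reading the youngest and oldest ages.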


Answers: Norway and Sweden; baseball, softball, boxing, synchronised swimming, rhythmic gymnastics