Wednesday, April 14, 2010

Google and the Library of Congress to acquire Twitter Archives

Both Google and the Library of Congress announced today that they will acquire the entire Twitter archive, going back to March 21, 2006. Specifically, Google will handle search while the LC will handle archiving (Wired). Being able to look up President Barack Obama's first tweet as president-elect is one example of how useful such a collection could be.

A more profound application lies in the area of Humanities High Performance Computing (HHPC). The Office of Digital Humanities (ODH) defines 'High Performance Computing' as 'fast computers, capable of performing calculations many times faster than standard desktop machines. High Performance Computing is used mainly by scientific disciplines for processing huge amounts of data, data mining, and simulation. That is, using an enormous amount of data to simulate a physical object or series of events,' such as studying hurricanes.

HHPC applies this kind of computing power to humanities and social science projects. One could mine tweets for trending topics, correlations between location and particular topics, tags, snapshots of daily issues, historical insights, and early posts by individuals before they became famous, and even feed the results into data visualization techniques. As the ODH says, 'HHPC offers the humanist opportunities to sort through, mine, and better understand and visualize this data.'
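
To make the idea of mining tweets for trending topics a little more concrete, here is a minimal Python sketch that counts hashtags per day. It is only an illustration: the (timestamp, text) record format, the function names hashtags_by_day and top_tags, and the sample tweets are all made up for this post, and nothing here reflects how Google or the Library of Congress will actually store or expose the archive.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtags_by_day(tweets):
    """Count hashtags per calendar day.

    `tweets` is an iterable of (iso_timestamp, text) pairs. This record
    format is hypothetical, chosen only to keep the example small.
    """
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        day = datetime.fromisoformat(timestamp).date()
        for tag in HASHTAG.findall(text):
            counts[day][tag.lower()] += 1  # treat #LOC and #loc as one tag
    return counts


def top_tags(counts, day, n=3):
    """Return the n most frequent hashtags for the given date."""
    return counts[day].most_common(n)


# Invented sample data, not real tweets.
sample = [
    ("2010-04-14T09:00:00", "Library of Congress to archive tweets #loc #twitter"),
    ("2010-04-14T10:30:00", "Big news for researchers #twitter #archives"),
    ("2010-04-15T08:15:00", "Still reading about the #LOC announcement"),
]

if __name__ == "__main__":
    counts = hashtags_by_day(sample)
    for day in sorted(counts):
        print(day, top_tags(counts, day))
```

A single desktop can run this over a few thousand tweets. The point of HHPC is that the same kind of counting, applied to the full archive and combined with location data or visualization, needs far more computing power than that.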