Tools for data wrangling

Did you say you are interested in some data wrangling? Or perhaps some data scraping? Wait, you say you just want to learn how to clean data and maybe geocode it? For all this and much more, log on to School of Data now! You can even take a course online. The following are some of the recommended tools:

Extracting: Google Chrome Scraper extension, Google Spreadsheets, ScraperWiki, gImageReader + Tesseract

Cleaning: OpenRefine, Spreadsheets, Nomenklatura

Analysing: Spreadsheets, R, Gephi

Presenting: TileMill, Fusion Tables, Gephi, Many Eyes, D3

Sharing: The Datahub, Google Docs, GitHub

Source: School of Data

PS: Other good places to check out are Codecademy and the resource page at Exposing the Invisible.

Results from Census 2011 Household Listing

The results from the houselisting exercise conducted as part of the 2011 census have been online for a while now. Most of us know that India’s population is over a billion (about 1.2 billion), but some lesser-known facts are that this population lives in 244.6 million houses, spread across 0.6 million villages and 7,933 towns.

The listing exercise also collected data on the possession of some household assets for communication and transportation. The top 3 modes of communication were: mobile/telephone (63%), television (47%) and radio/transistor (20%), while the top 3 modes of transportation were: bicycle (45%), scooter/moped/motorcycle (21%) and car/jeep/van (5%).

However, the most interesting data point is that 18% of households do not possess any of these assets, i.e. no mobile/telephone, no TV, no radio, no computer, no bicycle and no other vehicle.
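
To get a feel for what these percentages mean in absolute terms, here is a rough Python sketch that converts them into household counts. It assumes the 244.6 million houses figure quoted above can stand in for the number of households, which is only an approximation; the variable and function names are made up for illustration.

```python
# Rough conversion of the reported percentages into household counts.
# Assumption: the 244.6 million houses figure is used as the household base,
# so every result is an approximation, not an official census number.
TOTAL_HOUSEHOLDS_MILLION = 244.6

# Percentages as quoted in the post.
asset_share_percent = {
    "mobile/telephone": 63,
    "television": 47,
    "radio/transistor": 20,
    "bicycle": 45,
    "scooter/moped/motorcycle": 21,
    "car/jeep/van": 5,
    "none of the listed assets": 18,
}


def households_in_millions(share_percent, total_million=TOTAL_HOUSEHOLDS_MILLION):
    """Return the approximate number of households (in millions) for a share."""
    return round(total_million * share_percent / 100, 1)


estimates = {
    asset: households_in_millions(share)
    for asset, share in asset_share_percent.items()
}

for asset, millions in estimates.items():
    print(f"{asset}: about {millions} million households")
```

Under that assumption, the 18% with none of the listed assets works out to roughly 44 million households.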

The following spreadsheet summarizes the distribution of various household assets:

Click here to view the spreadsheet in a new window/tab.