Text analysis of Rahul Gandhi’s interview

So, Arnab Goswami’s interview of Rahul Gandhi concluded a while ago and now that the transcript is online, it’s time to do some text analysis (I will leave the meta analysis to political commentators/analysts):

Total word count: 12720
Rahul’s word count: 7595 (60%)
Arnab’s word count: 5125 (40%)
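The split above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes a transcript where each line starts with a speaker label followed by a colon (a hypothetical format, not necessarily how the published transcript is laid out), and the function names are made up for illustration.

```python
import re
from collections import Counter

def word_counts_by_speaker(transcript):
    """Count words per speaker, assuming lines of the form 'Name: what they said'."""
    counts = Counter()
    for line in transcript.splitlines():
        speaker, sep, speech = line.partition(":")
        if not sep:
            continue  # no speaker label on this line, so skip it
        counts[speaker.strip()] += len(re.findall(r"[\w'’]+", speech))
    return counts

def shares(counts):
    """Turn raw word counts into whole-number percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total) for name, n in counts.items()}

# Plugging in the counts reported above:
print(shares(Counter({"Rahul": 7595, "Arnab": 5125})))  # {'Rahul': 60, 'Arnab': 40}
```

Different tools split words slightly differently (hyphens, numbers, contractions), so counts from a script like this may not match Textalyser exactly.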

The most frequently used words by Rahul (after filtering out some commonly used words):
system (70)
people (66)
going (52)
party (50)
country (45)
want/wants/wanted (40)
thing/things (37)
congress (34)
power (32)
rti (32)
political (31)
think/thinks/thinking (29)
one (28)
issue (26)
riots (25)

2 word phrase frequency:
i am or i’m (70)
in the (57)
going to (44)
the system (43)
this country (39)
we have (38)
i have (33)
of the (32)
to do (29)

3 word frequency:
the congress party (23)
in this country (22)
i want to (18)
we have to (13)

4 word frequency:
we are going to (9)
are we going to (8)
in the congress party (8)
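For anyone who wants to try this without Textalyser or ATLAS.ti, here is a minimal sketch of the same kind of counting in Python. It is not how those tools work internally; the tiny stopword set and the function names are placeholders I made up (the lists above used the Ranks.nl stopwords), and the sample sentence is invented rather than quoted from the interview.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; swap in a full list such as the Ranks.nl one.
STOPWORDS = {"the", "a", "to", "of", "in", "is", "we", "i", "are", "and"}

def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z'’]+", text.lower())

def word_frequencies(text, stopwords=STOPWORDS):
    """Single-word counts with stopwords filtered out."""
    return Counter(w for w in tokenize(text) if w not in stopwords)

def ngram_frequencies(text, n):
    """Count every run of n consecutive words (stopwords are kept for phrases)."""
    words = tokenize(text)
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

sample = "We are going to change the system. We are going to empower people."
print(word_frequencies(sample).most_common(1))      # [('going', 2)]
print(ngram_frequencies(sample, 4).most_common(1))  # [('we are going to', 2)]
```

Two caveats: this sketch lets phrases run across sentence boundaries, and it does not merge variants the way the lists above do (want/wants/wanted, or i am/i’m), so those would need a manual grouping step.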

Rahul’s word cloud

Arnab’s word cloud


Note: Word clouds were created using Wordle, and the text analysis was conducted using Textalyser and ATLAS.ti. The list of English stopwords was taken from Ranks.nl. To download the data in the spreadsheet, click here. (Please click on ‘File’ > ‘Download as’ to save a copy of the file on your computer.)

PS: In case you are wondering, Rahul Gandhi referred to himself in the third person 7 times; he rarely referred to his opponents by name (Akhilesh and Arvind Kejriwal had 0 references, but Modi had 3); and oh, the word empower or a version of it like empowering/empowered/empowerment had 23 occurrences.

The business of family politics in India

Acemoglu and Robinson, in their new book Why Nations Fail, argue that the main difference between successful nations and those that fail is not luck, not culture, not geography, but institutions. The economic and political institutions that a country builds, and how it maintains them, determine the fate of countries across the world. [For more, read their blog: Why Nations Fail.] I want to take off from their argument and look at political parties in India as institutions and how well they are functioning. (This, btw, is a great research topic, as there is little literature on this subject, except for some stuff here and there in newspapers/magazines.)

In this post, I will focus on a narrow aspect and examine dynastic politics. This is also the theme of a new working paper by Mendoza et al., who analyze the social and economic effects of political dynasties in the Philippines. [For an overview of the findings, see this VoxEU article. The paper has also been discussed by Rupa Subramanya at WSJ India Real Time and Amol Agrawal at Mostly Economics.] To summarize it in a sentence: the key finding is that districts which have dynastic legislator incumbents also have a higher incidence of poverty, suggesting a link between economic inequality and political structure.

One could replicate this study for India, and given that 3 out of 10 MPs in India are hereditary, the findings could be very interesting. However, in the absence of a dataset that maps district development indicators to parliamentary constituencies, I am unable to test this hypothesis. Nevertheless, by combining Patrick French’s dataset (on the biographies and political background of MPs) and ADR’s dataset (on the financial and criminal records of MPs), we can get a closer look at some of the issues. [Side note: to read about crorepati MPs, MPs with criminal records and analysis of Lok Sabha 2009 elections, please read ADR's main report (check out the maps - they are very good) and this analysis for only Lok Sabha MPs. To read about nepotism and family politics, read this.]

Hereditary MPs are 4.5 times wealthier than MPs with no significant political background

Consider table 1, which presents the mean value of total assets (declared by MPs in the affidavits they file along with their nomination papers before the elections), according to political background. The average Indian MP has declared assets worth Rs. 5 crore (Rs. 2 crore movable and Rs. 3 crore immovable assets). Predictably, MPs who have a business background are at the top of the table with an average of Rs. 15 crore of total assets. They are closely followed by MPs from royal families, who on average are worth Rs. 14 crore. Ranking third are, to borrow French’s term, “mummy-daddy” MPs worth Rs. 10 crore, and right behind them are MPs who were inducted (Rs. 9 crore).

This has to be more than a coincidence, right? You would expect actor-turned-MPs like Jaya Prada, Satabdi Roy and Shatrughan Sinha, and cricketer-turned-politician Azharuddin, to be rich because of their successful (?) past careers. But take the other MPs who were ‘inducted’, like IIT-IIM graduate and successful banker Prem Das Rai, Shashi Tharoor (India’s most Twitter-friendly politician), Annu Tandon (trustee with Mukesh Ambani’s Reliance group), or the US-returned Madhu Yaskhi and Janardhana Swamy, and it is hard to miss the crucial role of money in politics: all of the above inducted MPs are crorepatis.

On the other hand, the MPs who entered the political arena via student politics, the RSS route or just the regular way have only about Rs. 2 crore of total assets, and lie below the national average. Depending on your level of cynicism about Indian democracy, reactions to these numbers could range from “hmm, interesting” to “so what? tell me something new”.

Hyper-hereditary MPs are the wealthiest of all

Now, let’s look at table 2. If we divide the “mummy-daddy MPs” into hereditary and hyper-hereditary MPs (hyper-hereditary MPs are those who have multiple family connections – you can, loosely speaking, think of them as a proxy for dynastic politics) and run the same analysis, you will see that the “mummy-daddy MPs” category was masking a crucial distinction. A few points:
1. There is a wide difference in the average assets of hyper-hereditary and hereditary MPs: the former are almost twice as rich as the latter.
2. More importantly – and this is the result that surprised me – hyper-hereditary MPs are the richest folks in the Lok Sabha, and their average total assets are even higher than those of MPs who have a business background!
3. Another interesting finding: on average, inducted MPs are richer than hereditary MPs.
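To see how a pooled category can hide a gap like this, here is a toy sketch in Python. The records below are made-up numbers chosen only to illustrate the arithmetic; they are not taken from the ADR or Patrick French data, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration only: (background, total assets in Rs. crore).
mps = [
    ("hyper-hereditary", 30.0),
    ("hyper-hereditary", 10.0),
    ("hereditary", 12.0),
    ("hereditary", 8.0),
    ("business", 15.0),
    ("no background", 2.0),
]

def mean_assets(records, relabel=None):
    """Average assets per group; `relabel` optionally folds groups into a coarser one."""
    relabel = relabel or {}
    groups = defaultdict(list)
    for background, assets in records:
        groups[relabel.get(background, background)].append(assets)
    return {group: mean(values) for group, values in groups.items()}

# Coarse view: both hereditary groups pooled under one label.
pooled = mean_assets(mps, {"hyper-hereditary": "mummy-daddy", "hereditary": "mummy-daddy"})
# Fine view: the two groups kept apart.
split = mean_assets(mps)

print(pooled["mummy-daddy"])                            # 15.0
print(split["hyper-hereditary"], split["hereditary"])  # 20.0 10.0
```

In this toy data the pooled average of 15.0 sits between two quite different groups, which is the same kind of masking the split in table 2 brings out.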

Who are these ultra rich, hyper-hereditary MPs, you ask? In descending order, they are: Naveen Jindal, Gaddam Vivekanand, Harsimrat Kaur Badal, Preneet Kaur, Pinaki Misra, Daggubati Purandeswari, Maneka Gandhi, Shruti Choudhry, Dushyant Singh, Varun Gandhi, Sachin Pilot, Ajay Maken, Bharatsinh Madhavsinh Solanki, Salman Khurshid, Ashok Tanwar, Rahul Gandhi, Sandeep Dikshit, Pratik Prakashbapu Patil, Ravneet Singh, Sonia Gandhi, Vijay Bahuguna. Did you also note a common link among these MPs? A significant majority – 16 out of 21 – are part of the Congress.

Clearly, there are a lot of hereditary MPs in the Congress, and this, coupled with the lack of intra-party democracy, speaks volumes about how crippled political parties in India are as institutions.

Source: ADR dataset and Patrick French (PF) dataset. (Please refer to respective websites for clarifications on the data.)

Note: You may view/download the merged dataset as a Google Doc here. It is likely that the formatting was disturbed when converting the spreadsheet from Excel to Google Doc format. You may download the original merged dataset in Excel format from here. Small clarification: an analysis based on the ADR dataset will not represent the true picture. But given that these are affidavits and that there is no incentive for any candidate to overstate their assets, we can treat the declared assets as a lower bound, so results based on the data will only be biased downwards.

Documentation: If you are merging PF’s and ADR’s datasets yourself, be warned that they don’t have a strict one-to-one correspondence because the names of constituencies are spelled differently. If you are using the merged dataset, please refer to the ‘Merge (MASTER)’ sheet, as it takes care of these issues. Data from PF is highlighted in blue and data sourced from ADR is highlighted in green. ‘Merge (MASTER)’ contains information for all 545 MPs, and information that is missing is highlighted in yellow. During analysis, 3 MPs were dropped because their assets and liabilities information was not available. This new dataset is available under the ‘Merge (coded)’ sheet. Please refer to the codelist for questions on the codes. The MPs who were dropped are: Raj Babbar (ADR’s dataset and the ECI website have Akhilesh Yadav’s affidavits in place of Raj Babbar’s), Charles Dias and Ingrid McLeod. (The latter two are nominated Anglo-Indians, and again the corresponding information was not available in their case.) I tried to verify the merged dataset by comparing summary stats from it to this ADR report [link is now broken]. The data by party matches up perfectly, but there is some discrepancy when comparing average assets of MPs by state. I verified the list of constituencies in PF’s data against ECI’s data and it matches well. Since ADR’s dataset does not contain states, my guess is there may be a coding issue at their end.
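If you want to attempt the merge in code rather than in a spreadsheet, one possible approach is to normalize constituency names before matching and then fix whatever is left by hand. The sketch below is an assumption about how this could be done, not the procedure used for the merged sheet; the field names ("constituency", "background", "assets") and the sample rows are hypothetical.

```python
import re

def normalize(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def merge_by_constituency(pf_rows, adr_rows):
    """Join two lists of dicts on normalized constituency name.

    Returns the merged rows plus the PF constituencies with no ADR match,
    so that genuine spelling variants can be reconciled manually.
    """
    adr_index = {normalize(row["constituency"]): row for row in adr_rows}
    merged, unmatched = [], []
    for row in pf_rows:
        match = adr_index.get(normalize(row["constituency"]))
        if match is None:
            unmatched.append(row["constituency"])
        else:
            merged.append({**match, **row})  # PF values win on shared keys
    return merged, unmatched

# Hypothetical rows, only to show the mechanics.
pf = [{"constituency": "New Delhi", "background": "inducted"},
      {"constituency": "Barrackpore", "background": "hereditary"}]
adr = [{"constituency": "NEW-DELHI", "assets": 1.0},
       {"constituency": "Barrackpur", "assets": 2.0}]

merged, unmatched = merge_by_constituency(pf, adr)
print(len(merged), unmatched)  # 1 ['Barrackpore']
```

Normalizing only catches differences in case, spacing and punctuation; real spelling variants still end up in the unmatched list. Also note that constituency names that repeat across states would collide in this sketch, so a state column (which the ADR dataset lacks) would be needed to tell them apart.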