Tidy HansaRd: a Shiny app to reformat Hansards

Matt Chaib

2019/02/18

Recently, I’ve made several posts about analysing Hansards, i.e. transcripts of British Parliamentary debates, which can be found here. A big part of that process is simply getting the Hansard from plain text into a tidy data table with one row per speech: a column for the peer’s name, a column for the speech text, a speech_id column, and ideally columns for gender and party as well.

Essentially, how do you go from this:

"European Union (Notification of Withdrawal) Bill\r\n\r\nCommittee (2nd Day)\r\n\r\n15:37:00\r\n\r\nRelevant document: 8th Report from the Constitution Committee\r\n\r\nClause 1: Power to notify withdrawal from the EU\r\n\r\nAmendment 9A had been retabled as Amendment 16A.\r\n\r\nAmendment 9B\r\n\r\nMoved by\r\n\r\n9B: Clause 1, page 1, line 3, at end insert—\r\n “( ) Within three months of exercising the power under section 1(1), "

To this: