Code and resources for the PyData 2019 Democracy Hackathon (Saturday 13:30-15:45 in the Mortimer Room). Hosted by Newspeak House Fellow John Sandall and Richard Chadwick.
Data and technology can be powerful tools for understanding and improving the democratic process. Rather than weaponising these tools to produce unfair advantages, driving mistrust and disenfranchisement, the data community can do the opposite. This hackathon isn't "The Great Hack", but it will be a hack, it will be great, and it will be using data for good.
Why do some people exercise their right to vote whilst others stay at home? In an era of contentious discourse and political scandals, how can we restore democratic faith and trust in our elected representatives?
In this hackathon, hosted by Newspeak House, we present a series of challenges and datasets compiled by civic tech organisations working to upgrade democracy for the digital age. We will provide working examples, open-ended challenges, a Kaggle-style prediction competition, and plenty of support if this is your first data hackathon!
Newspeak House is an independent residential college founded in 2015 to study, nurture and inspire emerging communities of practice across the UK public sector and civil society. Find out more @nwspk or come to one of our upcoming events in Shoreditch.
- Machine learning competition. There will be a Kaggle-style machine learning competition for predicting the turnout of UK general elections. SixFifty has been working hard to source and produce model-ready datasets for solving this problem. All that remains is for someone to solve it!
- Voter engagement. For the hack most likely to get more people to turn out.
- Open data for democracy. Help improve discoverability and accessibility of open datasets and streamline getting them from raw to model-ready by contributing to Maven. Maven aims to reduce the time data scientists spend on data cleaning and preparation by providing easy access to open datasets in both raw and processed formats.
- Painless parsing of political PDFs. A huge amount of civic data is published as tables trapped in PDF prisons. Work towards liberating this information and set it free!
- Fake news, misinformation & public sentiment. It's becoming harder to distinguish legitimate news from demonstrably false news, and with a few taps we can instantly share the stories we consume on our phones with our social networks. More news doesn't mean better news, and big tech companies are increasingly having to moderate and filter the content they host. How can we use the vast quantity of information at our fingertips to create tools or insights that improve the quality of the information we receive online?
- Wildcard prize. The theme is democracy. The goal is a better world. You define how we get there. Should Parliament move to another city? What would be the perfect voting system? Perhaps we should go back to the wapentake or the Thing? Should Parliament delegate constitutionally contentious issues to a citizens' assembly? Should we replace all branches of Government with a Superintelligent AI?
You don't have to use these, but they're a good start.
- UK Politics Datasets: Crowdsourced document of links to useful datasets & munging tools. Candidates, polling stations, constituencies, parliament voting records, parliament speeches, Hansard, previous election/referendum results, registered financial interests, boundary maps, shapefiles, campaign expenses, registration rates, candidates' CVs, constituency stats, GE2017 manifestos…
- Democracy Club: Election identifiers, candidates' info since 2010 (name, email, photos, social media), polling stations, all CC-BY-SA.
- mySociety: mySociety have created a range of tools including Parliamentary Monitoring, structured data on every national politician in the world, information on election candidates around the world, how to contact elected representatives, the constituency/postcode matching tool MapIt, and published transcripts from all levels of government.
- mySociety Geographic data: When it comes to building predictive models, geocoded data is quite handy. Official "open" data portals can be broken, or the data only available via mail-order CD, so mySociety's cache of OS, ONS, and OSNI open geographic data going back to 2010 is a gold mine.
- Start with the notebook in turnout_model.
- SixFifty Datasets: Model-ready datasets for 2010/2015 elections, EU referendum, opinion polling at national/regional levels, all available in CSV, JSON and Feather.
- Opinion polling data: In 2017 SixFifty created a manually curated set of poll results that can be downloaded in JSON, CSV or Feather. See data/polls/ for more information including a data dictionary.
- Start with the README in voter_engagement.
- To understand what's been tried before, take a look at the tools and projects that were developed for the 2017 General Election. In the GE2017 Tech Initiatives Handbook you'll find a collection of resources, datasets, volunteers, existing projects and proposed projects.
- Start with Maven. How can we make commonly used datasets (e.g. those listed in Newspeak's Politics Datasets or under the "General" heading above) easier to discover, download and process?
- Start with the README in pdf_parsing.
- Start with the README in fake_news.
- Political Twitter: Richard has scraped a collection of tweets from UK politicians.
- Start with So you want to reform democracy?, but don't be disheartened! One thing is for sure, the world in 100 years will be very different.
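
If you are new to prediction competitions, it can help to set a naive baseline before building a model for the turnout challenge. The sketch below is only an illustration using the standard library: the CSV content, the column names (`turnout_prev`, `turnout`) and the choice of mean absolute error as a score are all assumptions made for this example, not the format of the SixFifty datasets or the competition's actual metric.

```python
import csv
import io
from statistics import mean

# Synthetic stand-in for a model-ready CSV. The column names and values are
# invented for illustration and are NOT taken from the SixFifty datasets.
SAMPLE_CSV = """constituency,turnout_prev,turnout
A,0.60,0.62
B,0.70,0.69
C,0.55,0.60
D,0.65,0.66
"""


def load_rows(handle):
    """Read CSV rows and convert the numeric columns to floats."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "constituency": row["constituency"],
            "turnout_prev": float(row["turnout_prev"]),
            "turnout": float(row["turnout"]),
        })
    return rows


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


rows = load_rows(io.StringIO(SAMPLE_CSV))
actual = [r["turnout"] for r in rows]

# Baseline 1: predict the same overall mean for every constituency.
# (On real data, compute this mean on a training split only.)
overall = mean(actual)
mae_mean = mean_absolute_error(actual, [overall] * len(actual))

# Baseline 2: assume turnout simply repeats the previous election.
mae_prev = mean_absolute_error(actual, [r["turnout_prev"] for r in rows])

print(f"mean baseline MAE: {mae_mean:.4f}")
print(f"previous-turnout baseline MAE: {mae_prev:.4f}")
```

To try this on a real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the column names to match the dataset's data dictionary. Any model worth submitting should beat both baselines.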
- 13:30 – Introduction to challenges & datasets.
- 13:50 – Hacking time!
- 15:25 – Presentations & wrapup.
- 15:45 – Event ends.
All attendees are expected to abide by the NumFOCUS Code of Conduct. Please take this opportunity to review it.
Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes and language are not appropriate for PyData. All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery is not appropriate. PyData is dedicated to providing a harassment-free event experience for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of participants in any form. Thank you for helping make this a welcoming, friendly community for all.
The full Code of Conduct and additional information can be found here.
If you wish to submit a Code of Conduct report click here.