Opinion

A new frontier: How big data can improve government services

We are living in a new data-driven era. Some estimates have suggested that 90 percent of the world’s data has been produced in the last 18 months. By all accounts, we are quickly approaching a new frontier in big data production, data management and data analytics.

Nowhere is this profound shift more evident than in government, where we are seeing fundamental changes in how data can improve government services and increase management accountability.

Take, for example, New York, where the state and city jurisdictions each run some of the most extensive open data portals in the country. In New York, concerned citizens, researchers and even curiosity-seekers can visit a publicly accessible website and download all kinds of useful, usable raw data.

But open data is just the tip of the iceberg in how data can be used to better serve taxpayers.

In a recent article in Government Technology magazine, Jessica Renee Napier explains how Bexar County, Texas – the sixth-fastest growing county in the U.S. – used big data analytics to help reduce its local jail population, saving the state’s taxpayers millions of dollars. Bexar County didn’t want to spend its precious funds on building a new jail; it preferred to spend the money on new roads or a new library.

Stephen Goldsmith, the Daniel Paul Professor and the director of the Innovations in American Government Program at Harvard University’s Kennedy School of Government, echoes this sentiment, noting that data-driven approaches are beginning to produce benefits not only for jurisdictions seeking to better manage their jail populations, but also throughout the criminal justice system.

Yet despite such strong praise for the efficacy of big data in the public sector, many government agencies have yet to take advantage of this new opportunity. Why?

One reason is that the potential value of using data across agencies is difficult to realize – there are many legal and technical hurdles to clear and only a few prototype successes to point to. Another is that the pressures to meet existing program needs make it difficult for agencies to try something new and create pipelines of new products. A third is that government salary structures make it difficult to hire and retain enough in-house data analysts, so agencies don’t have the capacity to work with new linked data. These combined challenges have led to the current situation: Agencies cannot get the significant resources necessary to make use of new data, and because they don’t use new data, they don’t get new resources.

A classic conundrum – or is it?

In a new book titled “Big Data and Social Science: A Practical Guide to Methods and Tools,” Julia Lane – a professor at New York University’s Robert F. Wagner Graduate School of Public Service and the NYU Center for Urban Science and Progress, a Julius Shiskin award recipient and one of the book’s editors – identifies three challenges that government agencies must address to harness big data successfully: protecting privacy and confidentiality; demonstrating that integrating data can substantially serve each agency’s mission; and developing the capacity to do the work. All three can, and must, be addressed if data is to inform decision-making.

Professional data stewardship must be the core of confidentiality protection. Historically, this stewardship has rested on a portfolio approach: safe people (restricted access), safe projects (project audits), safe settings (secure environments) and safe outputs (disclosure review). Too often this is handled manually, without recourse to the best available technologies. Ad hoc manual approaches can and must be replaced with automated tools so that access is possible and analyses are fully auditable and replicable.

Any initiative must also demonstrate value to each agency and the constituencies it serves. Many agencies are not legally permitted to share data unless the work is consistent with the agency mission. Since it is difficult to know in advance what new products integrating datasets across agencies might yield, pilot projects should be used to demonstrate the value of integrated data.

The creation of agency workforce capacity is critically important. Data scientists who can demonstrate to their fellow civil servants the value of the new types of data for solving practical problems are invaluable. Training programs must be established that engage staff across agencies and empower them to work on cross-agency projects using multiple sources of data.

Talented and dedicated teams of data scientists and scholars at NYU, the University of Chicago and the University of Maryland have worked to produce a set of certificate classes that do just that. The programs are structured around work with confidential data in a secure environment and aim to 1) create a pipeline of new prototype products central to agency missions as defined by senior management; 2) develop teams of practitioners who can demonstrate the value of the new types of data for solving real-world practical problems and who become embedded in their organizations; and 3) make new linked data available as an ongoing asset. More detailed information can be found on the course website (dataanalytics.umd.edu); the Laura and John Arnold Foundation is providing a limited number of scholarships to government agency staff.

The initial focus of this work is the earnings and employment outcomes of three populations: ex-offenders, welfare recipients and veterans. The value to society of better policies in these areas is clear.

For example, recent research by the Center for Economic and Policy Research estimates that the reduction in the overall employment rate caused by the barriers facing former prisoners and people convicted of felonies costs the United States $78 billion to $87 billion in annual GDP. Related research shows that former prisoners released to a county with higher low-skilled employment and higher average low-skilled wages face a significantly lower risk of recidivism.

It would appear that, at last, there is a clear opportunity for government agencies to integrate data to serve citizens. It is fair to say that a new data infrastructure can evolve – one that enables the joining of datasets across federal and local agencies and enhances decision-making.

At the federal level, the Evidence-Based Policymaking Commission Act of 2016, otherwise known as the Ryan-Murray Act, was signed into law March 30, 2016. At the state and local level, multiple leading foundations – such as the Bill and Melinda Gates Foundation, the Ewing Marion Kauffman Foundation, the Laura and John Arnold Foundation and Bloomberg Philanthropies – are funding efforts to support evidence-based decision-making. And at the city level, every major local government organization (the U.S. Conference of Mayors, the National League of Cities and the International City/County Management Association) has launched a major initiative and peer-learning group focused exclusively on data.

The new frontier is looking brighter by the moment.

Julia Lane is a professor of public service at New York University’s Robert F. Wagner Graduate School of Public Service and NYU’s Center for Urban Science and Progress. Tom Herzog is the former deputy commissioner of the New York State Department of Corrections and Community Supervision.