INTERVIEW: The Importance of Clarity and Priorities in Big Data

As co-founder of RJMetrics, Jake Stein is something of a big data expert. Since 2009, his Philadelphia-based start-up has provided companies with a data analysis platform for making sense of the mountains of information collected from their websites. Given how much coverage big data has earned on Next City in recent weeks, we got Stein on the horn to talk about what government leaders should consider as they begin using piles of data to improve the ways that cities work.

Next City: We did a story on ways that Cook County, Ill. is working to bring data sets together to figure out the vacant land that its new land bank should purchase. But it can take a long time for investments in big data to really pay off. Time is something political leaders are often not afforded, especially when money has been spent. Do you have advice for municipalities looking to get into better governing through using data?

Jake Stein: My advice for municipalities would be very similar to my advice for anyone who is trying to do their job better using data. The most important thing is establishing clear success criteria upfront. Is the goal to reduce vacant lots by a certain percentage within a certain amount of time? Or to create 10 more parks without increasing the budget by more than 10 percent? Without that, it’s very difficult to rally people to a common goal, and you waste a lot of time working at cross-purposes.

The other thing that I think is key for municipalities is focusing on the presentation of the data, and not just as an afterthought. Some insights from data analysis are very difficult to grasp when described in text, but become obvious in the right data visualization. One great example of this is a visualization that my colleague Ben Garvey made for the controller election [in Philadelphia].

NC: When you work with clients on data collection, what seems to be a realistic time frame before they start to see that the investment is worth it?

Stein: As you can imagine, it varies a lot. Our goal is for clients to see at least some value the first time they log in. However, it doesn’t always work that way. A lot depends on how much work they have put into data analysis before, and I think that will be the same for municipalities. If the data has been completely opaque or very hard to get at in the past, just making it accessible with some very simple charts can go a long way.

Obviously, the goal is to build on that and gain nuanced insights, but the foundational stuff should be the first step. That also ensures that you can do a sanity check on the numbers to make sure we don’t have a decimal place off and a city budget of 100 trillion.
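As a rough illustration of the kind of sanity check Stein describes, here is a minimal Python sketch. The function name, the plausible range and the budget figures are all hypothetical, chosen only to show how a misplaced decimal point can be caught before anyone builds a chart on top of it.

```python
def flag_implausible(values, low, high):
    """Return (label, value) pairs whose value falls outside [low, high]."""
    return [(label, v) for label, v in values.items() if not (low <= v <= high)]


# Hypothetical line items, in dollars. The last figure is off by many
# orders of magnitude, as if a decimal point had slipped during data entry.
budgets = {
    "parks": 12_500_000,
    "streets": 48_000_000,
    "total": 100_000_000_000_000,
}

# Assumed plausible range for a single city: zero to 50 billion dollars.
suspect = flag_implausible(budgets, low=0, high=50_000_000_000)
print(suspect)
```

A check like this does not say what the right number is. It only points a person at the rows worth a second look.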

NC: How good is technology getting at bringing diverse sets of data in wildly different formats together? Governing magazine did a story on this topic, and that’s one of the bottlenecks it raised.

Stein: There are incredibly powerful tools for bringing data sets together and normalizing different formats, but many of them are still very raw and accessible only to very technically sophisticated people. The open source community has made big strides in raw data processing power with tools like Hadoop, but the trickiest things can be getting data from a clunky old mainframe or from a paper record system. It takes a combination of knowledge of the format, understanding of the data set, understanding of statistics, and programming ability to do a big project like that. It’s not for the faint of heart.
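To make the normalization problem concrete, here is a minimal sketch in Python. It assumes two hypothetical sources describing the same vacant parcel: a mainframe export with zero-padded IDs and dates written as YYYYMMDD, and a spreadsheet with plain IDs and U.S.-style dates. The field names and values are invented for illustration and do not come from any real system.

```python
from datetime import datetime


def normalize_mainframe(row):
    """Map a hypothetical mainframe export row onto a common schema."""
    return {
        # Drop surrounding whitespace, then the zero padding.
        "parcel_id": row["PARCEL"].strip().lstrip("0"),
        "vacant_since": datetime.strptime(row["VACDT"], "%Y%m%d").date(),
    }


def normalize_spreadsheet(row):
    """Map a hypothetical spreadsheet row onto the same schema."""
    return {
        "parcel_id": row["Parcel ID"].strip(),
        "vacant_since": datetime.strptime(row["Vacant Since"], "%m/%d/%Y").date(),
    }


records = [
    normalize_mainframe({"PARCEL": "000417 ", "VACDT": "20110305"}),
    normalize_spreadsheet({"Parcel ID": "417", "Vacant Since": "03/05/2011"}),
]

# Once both rows share one schema, they can be compared or joined directly.
print(records[0] == records[1])
```

The code itself is short. The hard part, as Stein notes, is knowing each format well enough to write the mapping in the first place.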

NC: By having more information, decision-making may start to take power away from staff opinions, which they base on their experience, and give more power to the data set. It seems like this could create internal friction around deploying big data in cities and governments. Have you seen that kind of cultural resistance in organizations you’ve worked with?

Stein: There is a healthy tension in our customers between being consistent with brand image and trying to make data-driven decisions. I think that most of our customers are relatively forward-thinking, however, and they take it as a given that they should base their tactics and strategy on data analysis. However, what makes sense for some of our luxury customers and what makes sense for a price-focused discount site can be very different, regardless of what the data says. I think it would be the same for a city government.

Let’s say that you’re an ecommerce manager at Tiffany’s — not a customer of ours — and your job is to get the maximum number of visitors to the website to convert into buyers. That’s a great, data-focused goal. You might have more visitors convert into buyers if you have a dancing monkey shoot around the page offering a discount coupon. The one metric that you’re focusing on would go up, but it would be bad for the Tiffany’s brand. It would also probably show up in longer-term metrics like cohort analysis and customer lifetime value, but I think the brand team’s fundamental objection is based on gut and feelings. I think that is a good thing.

NC: Memphis police used external weather data, ran it against their own info and learned that thieves steal more cars when it is raining. How often do you see reluctance from leaders to go beyond their own in-house info and draw on what others know or collect?

Stein: I think that might be a big difference between the public and private sectors. I can’t remember a conversation with leaders at one of our customers or prospects where they weren’t looking for data or analysis that would give them an edge over their competitors. That said, there are cases of “not-invented-here” syndrome. Some people are very suspicious of the reliability or accuracy of anything their own team didn’t build. In a political context, where mistakes could cause a public scandal, I imagine that fear could be amplified.

NC: Boston is using a cell phone-based system to identify potholes and the state of New Jersey is using aggregate mobile phone and GPS data to spot accidents faster than cameras can. Is there some sense in which using big data could enable us to make the worst aspects of cities work better? For example, we need to get more people invested in other modes of transport, but big data is going to make our roadways smoother, which will make folks happier using them, which is not what cities really need.

Stein: Like any technology, big data isn’t good or bad on its own. Depending on how it’s used, there will be different winners and losers. In addition to pothole identification, data analysis might help us set surge pricing better for tolls, which helps get people on public transit. That’s why I think it’s so important to identify your success criteria upfront, because different people may have different goals and view the same outcome as a win or a loss.

NC: We’ve already seen notes of caution around big data brought up even by a group that represents data professionals in government. The National Association of State CIOs wrote in a recent report:

Investment in big data skills and technology competes with other investment pressures facing state CIOs. In general, states are still working on many other high priorities such as legacy modernization, consolidation and shared services, cloud computing, mobile services, and cyber security…

Is it too soon for big data investment at the local and state level? Do they need to do other things first?

Stein: I actually think that’s a very practical and mature way to look at the issue. I think it would be a mistake to prioritize “big data” as a goal in and of itself. Big data technologies are a means to an end, and the other things mentioned in that quote are other means that might be more effective in the near term. As a citizen, I care about potholes a lot more than I care about Hadoop in city government. If legacy modernization is a good way to fix potholes faster, then it’s a mistake to prioritize other technologies that have more buzz right now but offer less bang for the buck in this particular case.

NC: San Francisco’s chief innovation officer did an unhackathon in February of last year, where he just got folks together around the problem of inefficient taxicab distribution. It’s pretty basic democracy in action, but may be necessary to bring in as we use data more and more. Do you have advice for folks getting into big data on how they can balance it with human perspective?

Stein: I think you did a great job in framing that question by including the “human perspective.” That is the key. If you’re trying to resolve a problem, the best data scientists in the world will have a very hard time if they don’t have any context or domain knowledge. Data analysis skills are an important part of problem solving, but so is understanding the relevant stakeholder groups, political limitations, budget options, etc. Once you know what you want to do, you need to figure out whose buy-in you need to make it happen, and then persuade them to act. Data can help, but you usually need more than an awesome chart to persuade people.
