The Chicago Parks District has long tested water samples from 28 city beaches to measure levels of E. coli, a bacterium found in animal intestines that can make swimmers sick. But because results take 18 hours, too late to issue an advisory if levels are unsafe, the agency implemented a predictive model in 2012 to estimate what levels would be the following day. After attendees at Chi Hack Night realized that the model wasn’t predicting very well, the first direct collaboration between the city and these citizen data scientists was born. The result: Three new predictive models will be tested on Chicago’s beaches this summer.
The search for a better predictive model began with two projects initiated by Scott Beslow, a software developer who regularly attends Chi Hack Night. The free weekly event brings together researchers, web developers, data journalists and others interested in using technology to improve government. Beslow in particular says he is driven by “finding tough-to-find data and bringing it toward the forefront in a really publicly accessible way.” Along with three other developers, he created a website, istheresewageinthechicagoriver.com, that keeps a record of when Chicago’s water management agencies dump excess wastewater into the river because of heavy rain or snowmelt.
Next, Beslow and other volunteers with civic-tech app group OpenCity developed Drek Beach, which draws on the Parks District’s data to compare levels of E. coli across all the city’s beaches. Every day the District posts a prediction for E. coli levels that day plus the actual levels from the day before, as measured by that 18-hour lab test. Beslow realized there was no historical context: A user couldn’t go back to see whether the previous day’s prediction had been accurate. So he used Drek Beach to draw those comparisons for the entire 2015 beach season.
“At the end of the season, the lesson I got from looking at the data was the model that the city was using was not doing a very good job of keeping people out of the water when levels were dangerous,” says Beslow.
Tom Schenk, Chicago’s chief data officer, also attends Chi Hack Night, so Beslow approached him about what he’d discovered. Beslow said the Parks District’s model (developed by the U.S. Geological Survey and Michigan State University) was only predicting elevated levels less than 10 percent of the time. The EPA recommends a swim ban for E. coli counts higher than 1,000 CCE, but according to Drek Beach, of the 37 days in 2015 when E. coli levels at one of the beaches exceeded that number, bans were issued only 21.6 percent of the time.
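To put that percentage in concrete terms, the arithmetic can be worked backward. This is a minimal sketch: the count of eight bans is inferred from the 21.6 percent figure and is not stated in the Drek Beach numbers quoted above, and the variable names are illustrative.

```python
# Figures quoted from Drek Beach for the 2015 season.
exceedance_days = 37   # beach-days above the 1,000 CCE threshold
ban_rate = 0.216       # share of those days on which a ban was issued

# Inferred, not reported: the whole number of bans closest to 21.6 percent.
bans_issued = round(exceedance_days * ban_rate)
missed_days = exceedance_days - bans_issued

# Check that the inferred count reproduces the quoted percentage.
recovered_rate = round(bans_issued / exceedance_days * 100, 1)

print(bans_issued, missed_days, recovered_rate)
```

Under that assumption, roughly 29 of the 37 high-level days passed without a ban.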
The primary sources of E. coli on Chicago’s beaches are animal feces — particularly from seagulls — garbage and, occasionally, raw sewage. To estimate levels at a particular beach for the next day, the Parks District’s original predictive model drew on data including weather forecasts, that beach’s historical levels and the shape of the beach. A bowl-shaped beach would capture water for longer, for example, while a beach more perpendicular to the water flow would see more movement and lower levels. Elevated levels happen only about 15 percent of the time, which is “tough statistically,” says Schenk. The safest bet, based on past data, is that levels probably aren’t high.
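Why a 15 percent event rate is hard on a model can be shown with a toy calculation. The sketch below assumes 100 hypothetical beach-days, 15 of them elevated, and scores a rule that always predicts safe water. The data are invented for illustration and the function names are not from the city's code.

```python
def accuracy(actual, predicted):
    """Share of days on which the prediction matched the lab result."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def recall(actual, predicted):
    """Share of truly elevated days that the prediction flagged."""
    elevated_total = sum(actual)
    caught = sum(a and p for a, p in zip(actual, predicted))
    return caught / elevated_total


# Hypothetical season: True means E. coli was elevated that day.
actual = [True] * 15 + [False] * 85

# The "safest bet" rule: never predict elevated levels.
always_safe = [False] * len(actual)

baseline_accuracy = accuracy(actual, always_safe)
baseline_recall = recall(actual, always_safe)

print(baseline_accuracy, baseline_recall)
```

The rule is right 85 percent of the time yet catches none of the dangerous days, which is why overall accuracy alone says little about whether a model keeps swimmers out of unsafe water.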
Schenk launched a breakout group at Chi Hack Night to get volunteers working on the problem, in hopes of developing a new predictive model before the 2016 beach season began at the end of May. The Parks District released 15 years of data to the group, and Beslow and several other volunteers got to work. By May, the group had not one but three new models for the agency to test. They don’t draw on different sources of data, but rather take the existing publicly available data and make better use of it with more powerful machine-learning techniques. “These models can dig in a little bit deeper and pick up some nuance that the previous statistical model could not,” says Schenk.
To get a sense of the models’ accuracy, the team tested them on existing data. They could use the 2014 data to predict 2015 levels, for example, and then compare those predictions with what was actually measured. Based on such analysis, the models seem to work well. But the real test will come with data the models have never seen. This summer, the three new models will all be used alongside the original model and the daily water sample testing. Whichever model proves most accurate by the end of the season will be implemented for summer 2017.
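The idea of testing on one year after fitting on another can be sketched in a few lines. Everything below is an assumption made for illustration: the records are invented, rainfall stands in for the weather inputs, and a single rainfall cutoff stands in for the volunteers' machine-learning models, which the article does not describe in detail.

```python
from collections import namedtuple

# Hypothetical record: one beach-day with a rainfall reading and a lab result.
Reading = namedtuple("Reading", ["year", "rainfall_mm", "elevated"])

# Invented toy data, not Parks District measurements.
readings = [
    Reading(2014, 0.0, False),
    Reading(2014, 2.0, False),
    Reading(2014, 12.0, True),
    Reading(2014, 20.0, True),
    Reading(2015, 1.0, False),
    Reading(2015, 15.0, True),
    Reading(2015, 5.0, True),
    Reading(2015, 3.0, False),
]


def split_by_year(rows, train_year, test_year):
    """Keep the two seasons apart so the test year is never used for fitting."""
    train = [r for r in rows if r.year == train_year]
    test = [r for r in rows if r.year == test_year]
    return train, test


def count_correct(rows, cutoff):
    """Days on which 'rainfall at or above cutoff' matched the lab result."""
    return sum((r.rainfall_mm >= cutoff) == r.elevated for r in rows)


def fit_cutoff(train):
    """Choose the rainfall cutoff that classifies the training season best."""
    candidates = sorted({r.rainfall_mm for r in train})
    return max(candidates, key=lambda c: count_correct(train, c))


train, test = split_by_year(readings, 2014, 2015)
cutoff = fit_cutoff(train)
holdout_accuracy = count_correct(test, cutoff) / len(test)

print(cutoff, holdout_accuracy)
```

The point of the split is that the 2015 score reflects days the rule never saw while it was being fitted, which is closer to what a live season will demand.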
Other Chi Hack Night projects have included apps that help users locate nursing homes or read up on the quality of schools across the city. Schenk says the work of the breakout groups is now evolving beyond creating apps, which tend to be portals for citizens to connect with city data, and toward projects like the beach analytics that actually improve governmental processes.
“This is the first project that was a city-community collaboration, and that’s what’s exciting about it. No other project had been done at the service or the benefit of the city in this way,” says Schenk. “It’s more hands on deck. When you deal with technology and research, the more eyes that are looking at a problem, the more folks that are thinking about it, you get a better solution.”
The Works is made possible with the support of the Surdna Foundation.