A Halton Region Citizen Initiative

We are a group of citizens who believe in the transformational power of open data and open government. Our mission is to bring open data to the Halton Region.

06 April 2011 ~ 0 Comments

Collision Data


This morning's tragic news about a pedestrian killed in a collision on Britannia Road (via MiltonSearch) raises a question about data on similar collisions and accidents in the region. According to HRPS collision statistics, there were nearly 7,000 property damage collisions in 2009, along with 1,178 injury collisions. What if there were a way to analyze this data?

Are there areas across Halton where such collisions occur more frequently? Are there "patterns" in the data, or correlations with factors such as the availability of street lights, sidewalks, and crosswalks at intersections? Can we gain any insight into areas that are more likely to be dangerous for pedestrians?

The answers could be found if the historic collision data gathered by police were made available as open data. Our friends at Open Hamilton were able to get their hands on Hamilton's raw pedestrian accident data, which can be used to derive various insights, particularly when "mashed up" with other data. The map below, for example, mashes up geocoded pedestrian accident data with geocoded crossing guard data:

[map id="map1" z="10" w="600" maptype="ROADMAP" kml="http://openhalton.ca/openhamilton/xingaccidentmashup.kml"]
Legend: Pedestrian Accidents       Crossing Guards

There are many ways to derive insights from such data, as long as it carries at a minimum some useful information: date, time, intersection, distance and direction from the intersection (if the collision occurred between two streets, for example), the vehicles and/or pedestrians involved, and the severity of the accident. Made public, this type of data is not just a curiosity; it can help prevent accidents and collisions through better knowledge of when such incidents are more likely to occur. Think about analyses like collisions by time of day (day, dusk, night, etc.) combined with historic weather data, collisions by weather conditions (and visibility), or something as simple as heat maps of the most dangerous areas, or "roads to avoid walking on".
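As a rough sketch of the kind of analysis such data would enable – the column names and sample rows below are entirely made up, since the HRPS data isn't available yet – a few lines of Python are enough to bucket collisions by time of day:

```python
import csv
import io
from collections import Counter

def time_bucket(hhmm):
    """Bucket a 24-hour 'HH:MM' time into day / dusk / night."""
    hour = int(hhmm.split(":")[0])
    if 7 <= hour < 17:
        return "day"
    if 17 <= hour < 20:
        return "dusk"
    return "night"

# Hypothetical sample rows -- a real HRPS export would define its own columns.
sample = """date,time,intersection,severity
2009-03-02,08:15,Britannia Rd & Thompson Rd,injury
2009-06-14,18:40,Main St & Ontario St,property
2009-11-21,22:05,Derry Rd & Bronte St,injury
"""

counts = Counter(
    time_bucket(row["time"])
    for row in csv.DictReader(io.StringIO(sample))
)
print(counts)
```

The same grouping trick works for any of the other cuts mentioned above: swap the bucket function for one keyed on weather, intersection, or severity.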

The possibilities are endless; we just need the data to work with. I sent my request for the data to HRPS tonight. Fingers crossed…

01 April 2011 ~ 2 Comments

Blowing Smoke


Headlines like "Gas plant plume poses 'no threat'" and "Plant's yellow plume no cause for alarm" about Halton Hills Generating Station emissions, as well as statements like "It was just scrap; plenty of smoke, little danger" in a Hamilton Spectator post, are great at driving home one message: Keep Calm and Carry On.

But while that worked to raise the morale of the British during WWII, it's far from putting me at ease. Really? "No cause for alarm"? "No concerns regarding environmental impact"? Thank you, Mr. Spokesperson for the company (TransCanada) whose plant is worrying my neighbours enough that they want to complain to the Ministry of the Environment. I appreciate that the pollution may be within "normal limits", but wouldn't it be nice to know whether it's on the higher end of the norm, bordering the limit, or way below the industry average?

Our tap water is probably within "normal limits", but many of us choose to filter it before we drink it. All food in grocery stores has to meet health regulations, but many choose to buy organic. Baby products almost certainly pass safety regulations, but we still research them; we read energy star ratings on appliances and look for car safety data – all to ensure our families consume only the "best", by our own criteria.

So why wouldn't I care about emissions data for a plant only 3 minutes away from my home? Why wouldn't I want the option to drill into data on the volume of chemicals that leave that smokestack and enter my back yard, my home, my child's lungs? Oh, I am no chemist, nor am I an environmental activist. I'm just a guy who likes having the option to read the ingredients label to understand what I'm consuming, if I so choose!

That's where open data becomes such a critical factor in building trust, not only between the government and its citizens, but also between companies and their customers. Even though Environment Canada gathers and makes publicly available pollution data in its NPRI database, its most recent data is two years old! That's why I couldn't find any data on the Halton Hills plant in EMITTER.CA, a site that visualizes NPRI data for any address in Canada: according to the Ontario Power Authority's website, the Halton Hills plant only began commercial operations in 2010.

This is bigger than government regulations. It's an opportunity for the company whose operations are causing concern to step up and share the actual data on what's being released into the air. Yes, TransCanada, here's your chance to build a solid reputation with us, the residents of Halton, who drive by your plant daily wondering what, and how much, is being emitted from your stacks. I love your Public Safety and Awareness page and your Corporate Responsibility Reports, but how about the actual emissions data for your operations? Can we see that?

So, how about it, TransCanada? Put your money where your mouth is. Prove there's nothing to worry about by opening your emissions data. Let us make up our own minds based on the numbers, not your reassuring statements. Otherwise, it just seems like you're blowing smoke.

31 March 2011 ~ 0 Comments

3 Lessons from OpenHamilton Hackfest


Earlier this week a few of us from OpenHalton (Mark, Aaron, and I) joined the Open Hamilton crew at their first hackfest. The Monday night meet-up was timed with the City of Hamilton's release of the municipal election candidate financial disclosures. The objective: to lay the open data foundation for the 2010 candidate financial statements, which outline how much candidates spent on their campaigns and where that money came from.

For about 5 hours we hacked away to transform the PDF scans of the 9503P "Financial Statement – Auditor's Report" forms pulled from the city's website into a more usable dataset. I had my reservations about an "evening hackathon"-type event, since we had just a few hours to put together something useful, but we managed!

IMHO, the following 3 considerations made it a success:

1. Plan for Success: Logistics are important!

The Open Hamilton team planned the hackfest at an easy-to-reach location, right off the highway (and by a Timmy's, where some of us fueled up :)). Not to mention, it was a super cool place: Think|haus, a.k.a. home of Hamilton's "open data" crowd and the local hackerspace. Our host Richard Degelder, whose beard rivals that of Richard Stallman, got us access to an open, dedicated Wi-Fi connection with plenty of bandwidth and all the necessary ports for FTP, SSH, Remote Desktop, etc. The whiteboard and projector helped organize our thoughts, and a shared Google Doc helped aggregate links and coordinate our efforts. (In the past we've used wikis like PBworks, which work well, but it's nice to be able to edit the doc collaboratively. If you prefer the look and feel of Excel, you can use Office Live for free to edit simultaneously as well.)

2. Set Realistic Goals: Don't boil the ocean!

It is tempting to aim really high and set lofty goals for a hackfest: "we want the most comprehensive dataset", or "we'll build the coolest app that does everything you can possibly imagine". Don't do it! Set a modest goal of ending up with a "basic dataset X" or "basic functionality Y" in the app you're hacking on. Then you'll be happy to wrap up your hackathon with something to show for it. On Monday we decided to "scrape" only a small number of data fields – thirteen, to be exact – from the PDF forms, and to load the data into the OGDI catalogue that stores data for DataDOTgc.ca, so that we could visualize it. By the end of the hackfest we had successfully loaded the finance datasets, exposed the data via APIs, and were even projecting some dynamic OGDI charts on the whiteboard. Mission accomplished!

3. Do Your Homework: Build the Foundation!

We knew not only what we wanted to do, but also had a pretty good idea of how we would do it. Prior to the hackfest there was a planning thread to discuss the project and the tools we'd need. Going in, we knew we'd be extracting data into a simple format (CSV) and loading it into a data store. We also discussed the scope of the effort, and that we'd eventually want to mash up some of the data with ward maps. This meant we needed Hamilton ward data in a format like KML, so Joey Coleman and Richard did some pre-work to get the ward boundary data prepared ahead of time. We ended up with a clean version of the Hamilton Ward Boundaries KML, and because we planned for it, we included information like the Ward ID in V1 of the spending dataset, so it could later be easily mashed up with the ward KML layer. A bit of up-front planning saved us a bit of aggravation in the end.
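To illustrate the kind of mash-up a shared Ward ID enables – the KML snippet and spending figures below are invented placeholders, not the real Hamilton data – here's a sketch of joining a spending table to ward placemarks by Ward ID:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal ward KML -- the real file carries full boundary polygons.
KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>Ward 1</name><ExtendedData>
    <Data name="WARD_ID"><value>1</value></Data>
  </ExtendedData></Placemark>
  <Placemark><name>Ward 2</name><ExtendedData>
    <Data name="WARD_ID"><value>2</value></Data>
  </ExtendedData></Placemark>
</Document></kml>"""

# Hypothetical slice of the spending dataset, keyed by the same Ward ID.
spending = {"1": 24350.00, "2": 18100.50}

NS = {"k": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring(KML)
joined = {}
for pm in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
    # Pull the Ward ID out of the placemark's ExtendedData and look up spending.
    ward_id = pm.find(".//k:Data[@name='WARD_ID']/k:value", NS).text
    joined[pm.find("k:name", NS).text] = spending.get(ward_id)

print(joined)
```

Once the join works, the totals could be written back into each placemark's description so a map layer renders spending per ward directly.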

Taking PDF data into a spreadsheet or database and then a cloud catalogue doesn't seem like a big deal on its own. But orchestrating half a dozen or more people with various skills and levels of technology experience requires balancing the short-term objectives of the hackathon with the long-term goals of the local open data movement.

17 March 2011 ~ 0 Comments

If the feds can, why can’t we?


Those of you in the twittersphere or following Gov in Canada feeds/blogs have probably heard today's biggest news in the IT/gov sector: the launch of the Canadian Open Data portal, http://data.gc.ca

This is a catalyst for innovation, as companies and organizations find new and creative ways to analyze, visualize, integrate, and build upon these datasets. The UK, the US, and major cities across the world have spawned a number of web and mobile applications for their citizens, ranging from transit and parking to health and environment monitoring apps, helping drive better government services and more citizen engagement.

While many of Data.gc.ca's datasets seem to be pointers to older data (like the NPRI industrial pollution data that our team used earlier this year to build EMITTER.CA), there are some new and interesting datasets with numerous commercial and non-commercial applications. In particular, the 260,000+ geospatial datasets, which represent the vast majority of the data released, are seriously awesome data that nerds like myself would love to use in applications. From roads to province boundaries to coasts and inland water resources, many of these are in GIS formats, which makes for easier mash-up opportunities.

The objective of OpenHalton is to get similar data for the Halton Region. We're at the very end of an interesting project, WardRep, which would have already been released… had it not been for the lack of open data. WardRep is intended to let residents of a city easily and intuitively see which city ward they live in and who their city councillors are, and to see in one place the different ways to contact them (email, phone, Twitter, Facebook, etc.).

Unfortunately, at this time we only have Milton's ward boundary data (thanks to some diligent data scraping by one of our members), as well as data from Guelph and London, ON, which luckily arrived in a map-ready format. We NEED that same data for Oakville, Burlington, and Halton Hills in a GIS format to avoid unnecessary scraping and digitizing. What takes minutes for a city IT employee to export from a GIS system takes hours for someone to "scrape" from PDFs… and this is data the city already has and should be sharing.

It is my sincere hope that today's announcement will trigger some thinking and action on the part of our local municipalities to follow in the footsteps of other Canadian cities – and now the federal government – in becoming more open with our data!

24 February 2011 ~ 2 Comments

I Sing the Data Open


And if the body does not do as much as the Soul?
And if the body were not the Soul, what is the Soul?

“I Sing the Body Electric”
Walt Whitman, Leaves of Grass, 1855

Last Saturday I was invited to speak at the annual dinner for the IEEE Hamilton chapter at the beautiful Ancaster Mill:

Topic: “I Sing The Data Open” (Open Data Initiatives)

We'll discuss the what's and why's of open data, why it's a big deal both in Canada and worldwide, how it fits as part of Open Gov initiatives, and how private companies and organizations can find business opportunities in open data…

Once in a while I get inspired to build a new talk that not only shares my perspective, but also challenges me to revisit my own understanding of the subject, check my facts, and re-evaluate whether my views have changed. It's like going through spring cleaning or a mass indexing of your music or video library: a way to find out what I know while discovering and filling in some gaps along the way.

This was one of those talks.

To be able to articulate the differences between Open Gov, Open Data, and Gov 2.0, I had to revisit some oldie-but-oh-so-goodie definitions, such as this one for Open Gov, the 8 Principles of Open Data, and the 3 Laws of Open Data as captured by David Eaves. In the end, I found it necessary to build a Venn diagram to capture the relationship between Open, Data, and Government, to help orient the audience to the area of my interest: Open Data, with Open Gov Data at its core. Feel free to re-use it (Creative Commons on Flickr).

Along the way I found some truly great articles, like the one by Melanie Chernoff at Red Hat on what "open data" means – and what it doesn't – crisply articulating the difference between open data and publicly available data: "All open data is publicly available. But not all publicly available data is open."

Another post, Killer open-data apps from around the world by James McKinney of Montréal Ouvert, had a fantastic collection of links that helped me think through the different levels of citizen involvement with Open Gov, which I captured as follows:

And finally, as I tried to articulate the relationship between Open Data and Open Gov, and their impact on people’s lives, I thought back to Walt Whitman’s poem “I Sing the Body Electric”, where he explores the interconnectedness of the Soul and the Body and celebrates the importance of the Body in how it forges connections in our human society.

Borrowing from Whitman, Open Data is to Open Gov as the Body is to the Soul.

Open Gov as a movement and a governing principle is made real by Open Data. It is how an open government can interact with its citizens, and it is Open Data that helps bring to life the principles of Open Gov and through Gov 2.0 forges connections in our civil society.

In the end, I came out not only with a talk that people seemed to enjoy, but also with an understanding enriched by the work (and perspectives) of others.

09 February 2011 ~ 1 Comment

Liberating Milton Transit Data: Part 1


Part 1: Escape from PDF

It's been almost a month since we reached out to Milton Transit (MT) in hopes of getting hold of their "raw" transit data. While we got a reply promising an eventual response, we can't make progress on the Transit Project until we have data to work with. So we sprang into action to try to "liberate" at least some datasets, while remaining hopeful that we'll get more from MT directly.

The end goal for the Transit Project is a mash-up app that provides a more intuitive and interactive bus route map showing stops, with an integrated bus schedule. For data, this means we need to extract the data from MT's published maps and schedules, load it into a database (or a structured format like XML, CSV, or perhaps the Google Transit Feed format, GTFS), and add the missing pieces (like bus stop coordinates) by gathering data manually if our MT contacts don't come through.

Step 1 in setting the data free is getting the schedule data out of PDF and into a more useful format. Developers would probably use tools to extract the data, then write scripts to transform and load it into a database. I, unfortunately, am not one of those people, and tend to rely on tools like Excel to manipulate and "shape" the data.

Depending on how the PDF was produced, the data may be easier to "extract", particularly if it wasn't created from an image. MT's new bus schedule data looked like mostly tables, so I tried a few approaches and tools (listed on the wiki here) to get the data out:

  1. Basic copy + paste from PDF to Excel: each timetable row is transformed into one continuous string, with the times separated from one another by a space (or two). This may be acceptable if no other method works.
  2. An online PDF-to-Excel tool like Zamzar (http://www.zamzar.com/): I was impressed by the quality of conversion. It generated a pretty clean XLS that needed a few tweaks, like removing extra columns, but was otherwise useful.
  3. An offline PDF-to-Word tool like SmartSoft Free PDF to Word Converter (didn't work; it thought our PDF was password-protected) or Solid Documents' Solid PDF Converter (which worked pretty well in "flowing" mode with table recognition on).

The key thing in converting from PDF to any format is consistency. As long as you end up in a format you can write a formula or script against to transform the data, you can then export the data to a structured format like XML, CSV, or a database. In my case I wanted to get the data into a set of Excel tables, with a consistent header and cell structure, that I could write a few formulas against.
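Even the "one long string" result from approach 1 is workable once the spacing is consistent. As a tiny sketch (the times below are made up, not from the real Route 1 schedule), splitting on whitespace turns a pasted row into one time per cell:

```python
import csv
import io

# One pasted schedule row: times separated by one or two spaces (hypothetical data).
pasted = "6:05 6:35 7:05  7:35 8:05"
times = pasted.split()  # split() collapses runs of whitespace

# Write a consistent one-time-per-cell CSV row, ready for formulas or scripts.
buf = io.StringIO()
csv.writer(buf).writerow(times)
print(buf.getvalue().strip())
```

The same split-and-rewrite step could run over every pasted row, producing a uniform table regardless of how messy the original paste was.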

The end result of Part 1 is this Excel doc (the first 3 schedules took about 30 minutes from start to finish); you can also see the HTML version of it. Parts 2 and 3 will deal with transforming the Excel table data into a more structured "true table" format for routes, stops, schedules, etc., following the GTFS schema.
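As a preview of where Parts 2 and 3 are headed: GTFS is just a set of CSV text files with prescribed column names, stops.txt and stop_times.txt being the core ones for our purposes. A minimal sketch (the stop IDs, coordinates, and trip names below are invented for illustration):

```python
import csv
import io

# GTFS stops.txt rows: one row per physical bus stop (values are made up).
stops = [
    {"stop_id": "MT001", "stop_name": "Main St @ Ontario St",
     "stop_lat": "43.5132", "stop_lon": "-79.8830"},
]

# GTFS stop_times.txt rows: one row per scheduled stop on a trip (values are made up).
stop_times = [
    {"trip_id": "route1_am_1", "arrival_time": "06:05:00",
     "departure_time": "06:05:00", "stop_id": "MT001", "stop_sequence": "1"},
]

def to_gtfs_csv(rows):
    """Serialize a list of dicts into a GTFS-style CSV text block."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_gtfs_csv(stops))
print(to_gtfs_csv(stop_times))
```

The shared stop_id is what ties the schedule to the stop coordinates, which is exactly why the missing coordinate data matters so much for this project.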

07 February 2011 ~ 2 Comments

…and we’re back after a server crash!


A few days ago our OpenHalton.ca server went down due to a bad hard drive, and we lost the site and most of the data. It served as another lesson on the value of backing up your data, and I definitely learned it again this time… yes, again… :(

I've recovered all of the posts, but unfortunately most of the comments and user information are gone. So, please feel free to re-register. Apologies for the inconvenience…

While we were at it, we added the DISQUS and AddThis modules for WordPress and upgraded the WordPress install and database – and good riddance to all those annoying spam comments ;-)


25 November 2010 ~ 0 Comments

Open Data Hackathon – Dec 4th


Things have been quite busy with open data in the last few months. Some of the Open Halton team members (Mark, Aaron, and myself) have been heads-down working on Emitter.ca (Mark's post on it is here), and we were even featured on the front page of the Oakville Beaver. Woo hoo! We are still getting feedback and suggestions on Emitter, as well as ideas for new open data apps that could be created.

So, here's your opportunity to be part of building an open data application and make a difference for the Halton Region communities. In partnership with Open Guelph, our team is organizing an Open Data Hackathon on December 4th, where we'll tackle a project to make city ward and town council information easier for citizens to access. We're doing our part in the International Open Data Hackathon Day. At our hackathon we're planning to "liberate" the ward boundary data that today is locked in .PDF files on various websites, and to integrate that data with council contact information. All of this we hope to make available as a web site and a mobile application by the end of the day on December 4th.

Register HERE, particularly if you are comfortable with graphic design, work with data (at least know how to use Excel and how to move data from web pages/HTML into Excel or other spreadsheet software), or are a web or mobile developer – or even if you just have good ideas and want to help out at the hackathon; we'd love for you to come out. The hackathon will take place at Microsoft Canada (event sponsor) at the Meadowvale & Mississauga Rd intersection (401 & Derry area).

Check out the HACKATHON WIKI, where we are aggregating ideas, tips, developer links, etc., and feel free to add your own thoughts on what would make for a compelling application that addresses the challenge of more accessible ward/city council information.

It's a great opportunity to discover what open data can do for you and your city, and to network and enjoy some "geek time" with like-minded techies :) See you there?

15 November 2010 ~ 0 Comments

Tracking Pollution in Halton Region


We have been busy at Open Halton plugging away on some projects.  Today I’m happy to announce the recent launch of one called Emitter.

Some Background

Recently, my team at RedBit Development and I helped make pollution data released by Environment Canada easily searchable for citizens concerned about pollution in their neighbourhood. I live in Oakville, one of the communities with the most mature and flourishing trees, yet the town has quite a few factories, all of which obviously emit some sort of pollution into the atmosphere. As a five-year resident of Oakville, this has always been a concern for me, especially where the health of my three children is concerned. Pollution, and what facilities emit into the atmosphere, is a concern for most people, and hence the idea of allowing people to easily visualize it on a map was born.

Where did the data come from? 

Environment Canada actually releases pollution data that industries are mandated to report, and all of it is public: all pollution data for Canada is available on the Environment Canada site. If you clicked the link and looked at the data, you will have noticed it is not very friendly or easy to read. As a regular citizen, I just want to know whether my neighbourhood is safe; I don't necessarily care what type of pollution or chemicals are put into the atmosphere. The only time I would care is if there were large amounts of pollution in my area – then I would want to dig deeper and find out why. With the data in its current state, you can't easily find out "why?", or visualize whether it's in your neighbourhood.

Now, if you are a developer, using this data is a real challenge. It's not really in a standard format you can use in a custom application, and to decipher it you need to spend a lot of time figuring out its structure so it's usable for your specific purpose. All the pollution datasets have been made available by Environment Canada as Microsoft Access databases, posted publicly on their website.

Introducing Emitter.ca

Even though the data is open and available, it's not very easy to use. To make it easier to visualize and search, I got my team at RedBit Development involved in a project to help liberate pollution data. Emitter takes all the Environment Canada data and lets citizens easily visualize the pollution data around their neighbourhood. There are a few ways to search for data, all well explained on the Emitter site. The end result is that you can now very easily search and visualize pollution levels in and around the area you are interested in.

What does the search return? It returns the company releasing the pollution into the atmosphere, a ranking relative to other companies, and the federal riding and elected MP for that riding.

Building Emitter was a collaborative effort among many individuals across Canada, hosted by the Microsoft Canada Open Lab. The key contributors are listed on the Emitter site, but everyone definitely deserves a mention for the effort:

  1. Aaron McGowan – Main developer on Emitter.ca, open data activist, student, hacker
  2. Barranger Ridler – Responsible for integrating the pollution data with OGDI and helping get the site up and running on Windows Server
  3. David Eaves – Open Government activist who envisioned the Emitter.ca concept and helped bring it from start to finish
  4. Matthew Dance – The ‘brains’ behind the Emitter.ca methodology; a graduate student at the University of Alberta, responsible for interpreting and analyzing the pollution data and allowing us to present it in useful ways
  5. Nik Garkusha – Open Data enthusiast and Open Source Strategy Lead at Microsoft Canada. Nik took on the role of architect, helping to envision Emitter and, most importantly, providing funding and hosting at the Microsoft Open Lab
  6. Mark Arteaga – I took on the role of Project Manager, coordinating all the efforts, from figuring out the data to managing the development cycle.

Please go ahead and try out Emitter. It is currently in beta, and we are looking at extending the application's features. Be sure to follow Emitter on Twitter and send comments or questions via Twitter; we actively monitor that account.

01 September 2010 ~ 3 Comments

Sparking ideas at Silicon Halton


Open Data is about transforming government data into an open platform.
Open Halton is about transforming Halton with open data.
Silicon Halton is about transforming Halton with a network of high tech entrepreneurs and leaders.

The way I see it, all three can help and benefit from one another. Open Data = Open Halton = Silicon Halton. What a fit!

That's why I was so excited to stand up at my second Silicon Halton meet-up and put forward a topic of discussion – you guessed it – Open Data. It helped that this meet-up followed the "open space" concept, which was an ideal format to quickly:

  • Share what Open Data is all about in a 3-minute overview
  • Have a break-out session to throw around great ideas
  • Connect with others who are interested in exploring further

Open Data turned out to be a HOT TOPIC, getting the most votes for a breakout. Our breakout was loosely structured around identifying (a) needs and challenges with respect to open data, and (b) ideas and solutions for those needs. These two major "buckets" floated to the top of all the other needs and ideas:


Transit:

  • need for more accessible bus data vs. today's customary PDF-only map/schedule option
  • better tools for cross-region trip planning (e.g. taking a bus from a Burlington location to Milton)
  • need for rider feedback (and insights) as input for a more efficient transit schedule

Community information:

  • need for easier access to local arts and culture events and other activities
  • better access to information on school and community centre programs and schedules

The bus data seemed to be a particularly interesting area: the data already exists in some structured format, managed by local transit authorities, but it isn't very useful because it's mostly locked up in large PDFs that are cumbersome to read. As a result, planning a trip, particularly across regions, is a nightmare.

The passion of our breakout group and all the open data ideas inspired me to double down on the launch of the Open Halton community, and to earmark TRANSIT as our first open data project! I feel that with the help of Chris, Rick, and the Silicon Halton community, open data can be a transformational force in Halton's high-tech landscape.

Thanks to everyone @SiliconHalton for such interest and volume of ideas and suggestions! You guys rock!