DocumentCloud and the Overview Project

DocumentCloud is an exciting project created with funding from the Knight Foundation (below, from the FAQ):

DocumentCloud is both a repository of primary source documents and a tool for document-based investigative reporting. Think of the repository as a card catalog for primary source documents. We’re building tools that accelerate the work of reporters who need to make sense of large sets of documents. (You can use it on small sets, too.)

The software in use by DocumentCloud is available for use by others, and DocumentCloud's software and service are being used in other applications, like Overview (below, from the About page):

Overview is an open-source tool to help journalists find stories in large amounts of data, by cleaning, visualizing and interactively exploring large document and data sets. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.

There are good tools for searching within large document sets for names and keywords, but that doesn’t help find stories we’re not looking for. Overview will display relationships among topics, people, places and dates to help journalists to answer the question, “What’s in there?”

We’re building an interactive system where computers do the visualization, while a human guides the exploration. We will also produce documentation and training to help people learn how to use this system. The goal is to make this capability available to anyone who needs it.

Overview is also funded by the Knight Foundation and the AP, and both projects focus specifically on journalists. The technologies and functionality are applicable to all fields, however, so it’s extremely exciting to see these new tools and the new work that they and others could enable.