Natalie Evans Harris spent sixteen years at the National Security Agency (NSA), where she focused on improving the way the U.S. federal government used data to inform its work. Her results stood out. She was tapped by the Obama administration to serve as a senior policy advisor to the Chief Technology Officer of the United States. In this capacity, Natalie helped government agencies adopt data science best practices and even created a community of federal data professionals who gathered monthly to trade ideas.
It was meaningful, energizing work.
Yet Natalie also saw too many government pilots fail to achieve scalable results. Projects fell prey to “vendor lock-in” when public service agencies had to pay third-party companies every time they wanted to access or make changes to their own data infrastructure. This hampered their ability to drive projects in a sophisticated and informed manner.
Natalie knew there must be a better way. She wanted to help people working on critical social challenges access data in ethical, timely ways to inform their work. When she met Matt Gee, Andrew Means, and Thomas Plagge, the vision for how this could become a reality began to take shape.
Matt had worked in government, academia, and social entrepreneurship. Andrew came from the nonprofit and foundation world. Tom was an astrophysicist who loved building new things. Natalie was a longtime government employee. Collectively, they shared the belief that the data economy was fundamentally broken. They recognized two significant—but solvable—market failures were holding back the most socially beneficial uses of data:
- Lack of access and control. Organizations were often unable to create impact with their own data because they had problems organizing and accessing it. The team wanted to create better and more accessible data infrastructure that would allow groups of organizations working on related problems to securely and responsibly access and share data.
- Limited ability to combine data about related challenges. The team also wanted to create data infrastructure that would help organizations combine different types of data without having to spend significant amounts of money every time they wanted to layer in new information.
How could they leverage these insights to build a sustainable business model that tackled the tricky but important challenge of sharing data ethically?
LAUNCHING A SOCIAL ENTERPRISE FOCUSED ON RESPONSIBLY SHARING DATA
To meet this need for more accessible, scalable, and ethical data infrastructure, the team launched a new social enterprise called BrightHive, “a data collaborative company.” They incorporated as a for-profit B Corp and recently raised their first round of investment capital. (Full disclosure: Acumen is an investor in BrightHive.)
“We want to prove that a for-profit company can be profitable while prioritizing social impact and data responsibility, and to hold the CEO and the team to this above all else,” Natalie said.
BrightHive’s core service is known as a “Data Trust.” In its most basic form, a BrightHive Data Trust is a legal, technical, and governance framework that empowers a collective of organizations to securely connect their data and responsibly create and use new shared data resources. This responsible data sharing is made possible through a Data Trust legal agreement between the organizations that lays out the legal, technical, and ethical rules for how data will be shared and used. BrightHive’s open-source Data Trust platform automates the process and makes sure all the rules are followed.
Each Data Trust has a governance board made up of the organizations that contribute data. Collectively, the board oversees how the data can be used and who gets to use the data. Additionally, each Data Trust has an external trustee that ensures the Data Trust Agreement is adhered to and that data flows between parties in a responsible fashion.
Developing the agreement is a critical part of kicking off any new Data Trust. BrightHive has a free template available on GitHub that can be adapted to the specific needs of each Data Trust to make sure data sharing happens in compliance with relevant laws and the policies of participating organizations.
To ensure that data is used responsibly, BrightHive adds three components to each Data Trust Agreement:
EMBEDDING ETHICAL PRINCIPLES
BrightHive explicitly makes sure ethical and responsible data principles are foregrounded in each agreement it oversees. To do this, it starts by referencing the Global Data Ethics Project’s core set of principles. Then, BrightHive asks members of the Data Trust about other ethical considerations that are important to them. For example, how else would they define ethical and responsible data practices related to the work they are doing? These considerations are captured in an addendum to the agreement.
“We add this addendum because if we ever have a situation where people are using data inappropriately, the addendum provides a way for the Data Trust governing board to hold them accountable,” Natalie said.
APPOINTING A USER ADVOCATE TO THE GOVERNANCE BOARD
BrightHive also works with the data owners to identify a user advocate to sit on the governance board.
“One thing we always want to make sure is that the Data Trusts put people first,” Natalie explained. “It is one thing to be legally compliant, but it is a different thing to address the needs of populations you are trying to serve and make sure their voices are incorporated into decisions. This is why we help the Data Trust members appoint a user advocate. It can be a community group, a nonprofit organization, or a third party who can be the voice of the individual user whose data is represented in the Data Trust.”
INCORPORATING THE RIGHT TO AUDIT
In every Data Trust Agreement, BrightHive also builds in a provision allowing the technical trustee to audit the Data Trust on an annual basis.
“We don’t do this to punish the Data Trust,” Natalie said. “Instead, we want to make sure that the Data Trust members and the trustee continue to responsibly serve the Data Trust. There is always room for improvement and we want to make sure that the Data Trust is adhering to the agreement and continues the collaboration over time.”
A DATA TRUST IN ACTION
Natalie and her team see many opportunities to put Data Trusts into practice to help organizations answer questions related to complex social issues. One of their first use cases has been in Colorado.
Like many states in the United States, Colorado has been striving to become compliant with the Workforce Innovation and Opportunity Act (WIOA). This 2014 law, signed by President Barack Obama, is designed to help job seekers access employment, education, training, and support services to succeed in the labor market, and to match employers with the skilled workers they need to compete in the global economy.
Federal grants have been allocated to states to fund workforce development training programs. Now the states need to report back on whether students who benefited from these programs have successfully landed jobs. Before BrightHive, this process was painfully manual. Training providers would call people to collect information and send their results by email to the state. States would then manually clean the data in spreadsheets or whatever alternative systems they had set up.
“Often it takes many months to get good quality data,” Natalie explained. “It is a very manual and labor-intensive process.”
To enable seamless and responsible data sharing between training programs and the government, BrightHive created a Data Trust that currently consists of five members, including private training programs and community colleges. The team built a common API that gives training programs and the government consistent access to clean, linked data.
“We have automated the data sharing process, working with data providers and the Departments of Higher Education and Labor in Colorado,” Natalie said. “The goal is to get everyone using the same language around data. We want the government to be able to see what results they are achieving from these training programs. Have these grants helped individuals to increase their earnings? What programs are serving the target communities most effectively?”
“We also want the training providers to be able to compare their own impact and effectiveness to others,” she added. “We want this to be a mutually beneficial relationship automated in a way so that they are able to get information in a matter of days and weeks. What used to require an entire government IT team can now be done with the equivalent of one server. We work with the existing teams that the government agencies have in place so there are huge efficiencies and cost savings for them.”
BUILDING A BUSINESS MODEL TO SUSTAIN THIS WORK
BrightHive generates revenue through the Data Trust Agreements it sets up between multiple parties. In new markets, Data Trust Agreements are typically paid for by impact investors or funders looking to open the space for data sharing. Eventually, Data Trust members, or state and regional entities, become paying customers who sustain the model. They see the value and benefit of the data and how it can streamline state or federal reporting requirements, and they are willing to pay for this continued service.
BrightHive evaluates these new business opportunities carefully. It does not rush into signing new contracts, no matter how lucrative, until it has evaluated factors including ethics, policy incentives, and the motivations of the parties involved.
“We typically look for opportunities to set up a BrightHive Data Trust where a policy incentive is involved,” Natalie explained.
“Often this involves a future crisis that people are trying to prevent. For our projects in the workforce development sector, a key driver has been the focus on the future of work. People are motivated to quickly address skills gaps and prepare the workforce for new types of jobs.”
“We do a market evaluation and partner with nonprofits or think tanks who have evaluated the space,” she continued. “We ask them what challenges need to be solved and what data is currently available. Then we also begin to look at the data that is actually shareable. Will it be possible to share it with a high degree of certainty and confidence to answer questions and address outcomes? It’s one part technical and two parts about the appetite of the people in the organizations wanting to share the data. What is their motivation?”
Once BrightHive evaluates the impact opportunity, technical feasibility, and incentives of potential collaborating parties, they begin to pilot Data Trusts by helping put together a unifying Data Trust Agreement and deploying their technical infrastructure.
“We build in the costs of making sure the Data Trust is developed, deployed, and run in an ethically responsible fashion into our contracts. A culture of ethics is never a tack-on. It’s built into the way we do business,” Natalie said.
Concretely, BrightHive embeds ethics into its business model by creating a checklist of ethical data principles that its developers can reference as they build the software infrastructure. “We don’t have to spend a ton of money to do that,” Natalie pointed out. “Creating a checklist is not expensive.”
BrightHive also builds the costs of audits into its annual subscription.
“Audits are expensive,” Natalie said. “However, because our product as a whole is really affordable and produces a lot of cost savings for the parties involved, we can still be competitive in our pricing even when we build in the extra costs that might come with making sure we are being ethical and responsible at all stages of development and deployment.”
BrightHive also intentionally avoids charging for data collection itself. “We charge for the linking and use of data,” Natalie explained. “Our profits come from the value of the Data Trust, not from the quantities of data. Therefore, we only collect what is necessary to fulfill the use case agreed upon by the Data Trust members. We do not access or collect all of the organization's data. This is not a data lake. Instead, we only collect what is relevant to the specific use case. We also avoid collecting raw data and instead work with our customers to collect samples of synthetic data.”
“I’ve seen that commoditization of raw data itself is very problematic,” she added. “It’s hard to be ethical when you turn people’s data into a business model. I have not seen a good model of a business built around providing access to people’s raw data. Remember that just because data is technically or legally shareable doesn’t mean that sharing it is ethical.”
ADVICE FOR OTHERS LOOKING TO BUILD BUSINESS MODELS THAT ETHICALLY LEVERAGE DATA
While BrightHive is still in the early days of navigating these waters, having spent nearly two years developing its Data Trust solution, Natalie and the other founders have been thinking about questions of responsible data sharing for years. Recently, she has seen the conversations evolve.
“There is a dichotomy happening in discussions around data and ethics,” Natalie observed. “There is a robust community of academics and practitioners who have worked on questions of data ethics for years and whose focus has been on culture. They ask about what it looks like to share data ethically: How do I know I’m doing it right? Am I sharing data in a way that doesn’t put people at risk?”
“Then there is also a business conversation related to data and ethics that is really starting to gain steam as companies like Facebook and Google finally grapple with questions around ethics, privacy, and security. Companies tend to be focused on questions like: Will I get sued? How much will it cost me? What are the risks of doing this? They look to lawyers and other voices within companies who are thinking about protecting business interests.”
“Ideally, if we’re building sustainable business models with ethical data practices at their core, we need to figure out how to marry the two conversations,” Natalie said. “That’s the hardest thing to do as a company. You want to build a culture that is ethical and continues to evolve to keep up with responsible data sharing practices. You also need to think about building a sustainable business model. You can’t just have a business model that is purely theoretical; there also have to be ways to pay the bills.”
Here’s what she suggests other entrepreneurs should keep in mind:
EXPLICITLY DEFINE WHAT ETHICS MEANS FOR YOUR ORGANIZATION
“Define what ethics means for your organization and codify this in a manifesto or code of ethics. This should feed into everything you do and permeate your organization. If you don’t have a culture that intentionally talks about ethics, then you will fail.”
BUILD ETHICS INTO YOUR BUSINESS MODEL
“There is a natural inclination to try to find a black-and-white way to signify that you are using data ethically. In my experience, people want to find a shiny object or tool that they can shoot their data through and see if it has been collected ethically. They want to be able to check a box and state that they’ve been responsible. That doesn’t exist.”
“There is no magic tool or silver bullet. Ethics has to feed through your entire business model.”
“There shouldn’t be a piece of your business model that doesn’t ask about or incorporate ethical considerations.”
FOCUS ON BUILDING TRUST WITH THE PEOPLE WHOSE DATA YOU ARE USING
“We need to focus on building and earning the trust of people whose data we are using. There is also a lot more literacy that needs to happen among people like my mom and dad when it comes to consent and use of their data. We need to envision models where data can be used so well that people will trust those organizations much like they trust their doctors. They don’t exactly know what will happen when they go into surgery, but they trust that doctors are doing the right thing. We don’t have that same kind of trust when it comes to organizations using our data. How do we start to build it?”
GET THE “PEOPLE PIECE” OF ETHICS RIGHT
“I honestly believe that the future of ethical business models will come down to the strength of the individuals involved.”
“We need to be building enterprises where there is a diversity of experience and thought involved, alongside diversity of race, age, and sex. Additionally, any time we approach a new initiative, we need to make sure we don’t do it by ourselves. We need to bring in two or three partners who have different lenses than we do. That is really important as we build our data use cases and design our data infrastructure. We need to keep thinking about how we mitigate unintended consequences and intentionally keep our work with data very people-centric.”
This case study was written by Amy Ahearn of Acumen in March 2019 with generous input from Natalie Evans Harris and Matt Gee of BrightHive and Eliza Golden of Acumen.
Legal Notice and Disclaimer:
This Case Study or Interview Product, commissioned by the Open Society Foundations, is the product of a collaboration between Open Society Foundations and Acumen Fund. The content of this Case Study or Interview Product does not necessarily reflect the official opinion of the Open Society Foundations. Responsibility for the information and views expressed in the Case Study or Interview Product lies entirely with Acumen Fund.
The Open Society Foundations and Acumen Fund request due acknowledgement and quotes from this publication to be referenced as: “BrightHive: Building a Social Enterprise Focused on Sharing Data Ethically to Answer Complex Social Questions,” Open Society Foundations and Acumen Fund, March 2019.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit: https://creativecommons.org/licenses/by-nc-nd/4.0/. This Case Study or Interview Product and other materials associated with it are available at: plusacumen.org