Questions about Clean Insights

How are we pulling this off? (tracking data but not people)

  • In short, we help limit what is collected and when it is collected, while also providing capabilities to process measurements on the client side instead of sending raw data to a server. For more details, read the longer project description on the About page.
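As a rough illustration of the idea above, here is a minimal sketch in Python. It is not the Clean Insights SDK API: the `Campaign` class, its field names and the event names are all hypothetical, chosen only to show how a client could limit what is measured (an allowlist of events), when it is measured (a fixed time window), and what leaves the device (aggregate totals only).

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Campaign:
    """Hypothetical measurement campaign: limits what is counted and when."""
    allowed_events: frozenset
    start: datetime
    end: datetime
    counts: Counter = field(default_factory=Counter)

    def measure(self, event: str, now: datetime) -> bool:
        """Count an event locally; ignore it if out of scope or out of window."""
        if event not in self.allowed_events:
            return False
        if not (self.start <= now < self.end):
            return False
        self.counts[event] += 1
        return True

    def report(self) -> dict:
        """Return only aggregate totals: no per-event timestamps, no identifiers."""
        return {
            "period_start": self.start.date().isoformat(),
            "period_end": self.end.date().isoformat(),
            "totals": dict(self.counts),
        }


campaign = Campaign(
    allowed_events=frozenset({"photo_shared", "video_shared"}),
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 8),
)
campaign.measure("photo_shared", datetime(2024, 1, 2, 9, 30))
campaign.measure("photo_shared", datetime(2024, 1, 3, 18, 5))
campaign.measure("contact_opened", datetime(2024, 1, 3, 18, 6))  # not on the allowlist
campaign.measure("video_shared", datetime(2024, 2, 1))           # outside the window
summary = campaign.report()
```

Only `summary` would ever be sent to a server in this sketch; the individual events and their exact times stay on the device and can be discarded once the window closes.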

What are we doing with IP addresses?

  • In the client apps, we do not use or collect any unique identifiers tied to “real” device identifiers, names, emails or other contact information. On the server side, we will provide methods for removing IP address logging from any measurement infrastructure. You can read our blog post, “Tracking usage without tracking people”, for specific information on how to do this today.
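One way to picture server-side IP removal is the sketch below. It assumes a measurement record arrives as a dictionary with an `ip` key; that shape, and the function names `truncate_ip` and `scrub_record`, are hypothetical and not taken from any Clean Insights component. The default drops the address entirely; the optional coarse mode keeps only the network prefix (a /24 for IPv4, a /48 for IPv6, both arbitrary choices for this example).

```python
import ipaddress


def truncate_ip(address: str) -> str:
    """Zero out the host part of an address, keeping only a coarse prefix."""
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets ip_network() mask the host bits instead of raising.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def scrub_record(record: dict, keep_coarse_ip: bool = False) -> dict:
    """Return a copy of the record that is safe to store or log."""
    cleaned = {key: value for key, value in record.items() if key != "ip"}
    if keep_coarse_ip and "ip" in record:
        cleaned["ip"] = truncate_ip(record["ip"])
    return cleaned


incoming = {"ip": "203.0.113.77", "event": "photo_shared", "count": 2}
stored = scrub_record(incoming)                        # no address at all
coarse = scrub_record(incoming, keep_coarse_ip=True)   # prefix only
```

Dropping the field is the safer default. Truncation is shown only because some deployments want rough regional counts, and it should be weighed against the threat model before use.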

Is there any risk for my users? Could the data collected on their device put them in danger in any way? Are you collecting data points that could be combined to reveal personally identifying information about a user?

  • Clean Insights is much more than just code and tools. It is an entire approach to measurement that is opposed to the usual method of “damn the users, and collect everything you possibly can” that most web and app analytics services take. This includes teaching developers how to threat model as part of planning what they are going to measure, and why they are going to measure it. Through this, we strive to help developers avoid putting their users in danger as much as possible.

Deanonymization - How are we handling this potential?

  • Part of this is answered in the previous questions, but we can expand here. There are two approaches possible with Clean Insights. First, through client-side processing of measurement data to produce aggregate “insights”, we move away from users deanonymizing themselves by uploading vast amounts of data. Second, through advanced privacy-preserving techniques like Differential Privacy and Private Join and Compute, we can offer mathematically grounded methods for combating deanonymization.
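To make the differential privacy idea more concrete, here is a minimal sketch of the standard Laplace mechanism applied to a single count. It is a textbook illustration, not the Clean Insights implementation; the function names and the choice of epsilon are assumptions made for the example. The noise scale is sensitivity / epsilon, so a smaller epsilon means more noise and stronger privacy.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)  # seeded only so the example is repeatable
noisy = private_count(100, epsilon=1.0, rng=rng)
```

A sensitivity of 1 assumes each user can change the count by at most one. A single noisy value may be off by a few units, but totals over many reports remain useful, while any one person's contribution is hidden in the noise.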

My app doesn’t ask for trust. We demonstrate it. How does Clean Insights align with this value?

  • Through transparency in code, process and implementation. The goal is to be able to share publicly any measurements you are gathering.

How would Clean Insights build off of existing methodologies that have been worked on in other successful projects?

  • A prototype for Android exists, and was integrated with the OpenArchive mobile application as part of that work, to safely measure the types of media content being shared
  • Measurement of usage through server infrastructure has been carried out in collaboration with Rights Action Lab, and Clean Insights will build upon concepts pioneered there

What makes Clean Insights metrics distinct from other forms of measurement?

  • completely open-source and decentralized, and can be self-hosted in a variety of deployment configurations
  • provides insights to app developers and designers about usage of their apps without resorting to full surveillance-style tracking as provided by Google Analytics and others
  • has an appropriate threat model baked in from the beginning
  • focused on building trust and engagement with users, to make the apps better in a collaborative and not extractive way

How do you envisage other tool teams using this methodology and integrating into their products?

  • Through the software libraries, guidelines and methodologies we publish
  • Through direct collaboration and help with implementation, via developer support in community channels and professional service engagements
  • Through adoption by other existing partner projects and collaborations

How can we incentivize tool teams to adopt this methodology?

  • Require any measurement for reporting to be done in a privacy-preserving way, with guidelines regarding retention, data minimization practices, etc.: a “HIPAA for Human Rights”
  • Community awareness: highlight the security, privacy and dignity issues that most app analytics can cause

What are the hopes for long-term impact?

  • Mass industry adoption, in the way our SQLCipher library has achieved
  • Conceptual impact through more mainstream privacy-preserving features
  • More funding through other sources interested in extending this work, allowing us to fully implement differential privacy and other advanced features