But what about the data in aggregate? The simplest way to combine data from multiple users is to average it. For example, the most popular period tracking app, Flo, has an estimated 230 million users. Imagine three cases: a single user, the average of 230 million users, and the average of 230 million users plus 3.5 million users submitting unwanted data.
A single person’s data may be noisy, but averaging across many users smooths out the noise and makes the underlying trend more apparent. Junk data is just another kind of noise. The difference between the clean and dirty data is noticeable, but the general trend in the data is still clear.
This simple example illustrates three problems. People submitting unwanted data are unlikely to influence predictions for an individual app user. It would take an extraordinary amount of work to shift the underlying signal across the entire population. And even if this were to happen, poisoning the data risks rendering the app useless to those who need it.
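The averaging argument above can be sketched as a small simulation. This is only an illustration under stated assumptions, not how Flo or any other app actually processes data: the sine-wave trend, the 28-day cycle, the noise levels, and the function names are all hypothetical, and the user counts are scaled down (23,000 clean users and 350 junk submitters) to keep roughly the same ratio as 230 million to 3.5 million.

```python
import math
import random

DAYS = 28             # assumed cycle length, for illustration only
CLEAN_USERS = 23_000  # scaled-down stand-in for 230 million users
JUNK_USERS = 350      # scaled-down stand-in for 3.5 million junk submitters


def true_signal(day):
    """Hypothetical underlying trend: one smooth wave over the cycle."""
    return math.sin(2 * math.pi * day / DAYS)


def clean_user(rng):
    """One real user's data: the trend plus personal noise (assumed Gaussian)."""
    return [true_signal(d) + rng.gauss(0, 1.0) for d in range(DAYS)]


def junk_user(rng):
    """One junk submission: random values unrelated to the trend (assumed uniform)."""
    return [rng.uniform(-3, 3) for _ in range(DAYS)]


def average(series_list):
    """Day-by-day mean across all submitted series."""
    n = len(series_list)
    return [sum(s[d] for s in series_list) / n for d in range(DAYS)]


def rmse(series):
    """Root-mean-square distance between a series and the true trend."""
    squared = sum((series[d] - true_signal(d)) ** 2 for d in range(DAYS))
    return math.sqrt(squared / DAYS)


def simulate(seed=0):
    """Compare the three cases: single user, clean average, dirty average."""
    rng = random.Random(seed)
    clean = [clean_user(rng) for _ in range(CLEAN_USERS)]
    junk = [junk_user(rng) for _ in range(JUNK_USERS)]
    return {
        "single_user": rmse(clean[0]),
        "clean_average": rmse(average(clean)),
        "dirty_average": rmse(average(clean + junk)),
    }


if __name__ == "__main__":
    for case, error in simulate().items():
        print(f"{case}: error vs. true trend = {error:.4f}")
```

Under these assumptions, a single user’s error should be far larger than the error of either average, while the junk submissions shift the averaged trend only slightly. Real junk data need not be centered on zero as it is here, so the size of the shift in practice would depend on what people actually submit.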
Other approaches to protect privacy
In response to people’s concerns about their menstrual app data being used against them, some menstrual apps have made public statements about creating an anonymous mode, using end-to-end encryption and following European privacy laws.
The security of any “anonymous mode” depends on what it actually does. Flo’s statement says the company will de-identify data by removing names, email addresses and technical identifiers. Removing names and email addresses is a good start, but the company doesn’t define what it means by technical identifiers.
Texas has paved the way to legally sue anyone who helps someone else get an abortion, and 87% of people in the US can be identified from minimal demographic information such as zip code, gender and date of birth. As a result, any demographic data or identifiers can be harmful to people seeking reproductive health care. There is a huge market for user data, primarily for targeted advertising, which makes it possible to learn a terrifying amount about almost anyone in the US.
While end-to-end encryption and the European General Data Protection Regulation (GDPR) can protect your data from legal inquiries, these solutions unfortunately don’t help with the digital footprints everyone leaves behind through everyday use of technology. Even users’ search histories can reveal how far along they are in a pregnancy.
What do we really need?
Rather than brainstorming ways to circumvent technology to mitigate potential harm and legal problems, we believe that people should advocate for digital privacy protections and restrictions on data use and sharing. Businesses need to communicate effectively with people, and get their feedback, about how their data is being used, their level of risk from exposure to potential harm, and the value of their data to the business.
People have been concerned about digital data collection in recent years. However, in a post-Roe world, more people may be at legal risk from doing standard health tracking.
Katie Siek is a professor and chair of computer science at Indiana University. Alexander L. Hayes and Zaidat Ibrahim are Ph.D. students in health informatics at Indiana University.