If you’d rather listen than read, I recorded an audio version of this essay for paid subscribers at the end of the post. Thank you for being here!
Sometime in the mid-2000s, most of us started handing over pieces of ourselves to the internet without giving the exchange a second thought. We created email accounts, signed up for social media, bought things online, downloaded apps, swiped loyalty cards, connected fitness trackers, stored photos in the cloud, and agreed to terms of service that almost none of us have ever read in full. We did this thousands of times over two decades and counting, and each interaction felt small enough to be inconsequential.
But the accumulation is enormous. More than 6 billion people now use the internet, and each one makes an estimated 5,000 digital interactions per day. Most of those interactions happen without our conscious awareness: a GPS ping, a page load, an app opening, a browser cookie refreshing, a device checking in with a cell tower. The average person in 2010 made an estimated 298 digital interactions per day. In fifteen years, that number multiplied more than sixteenfold. Those digital interactions produce records that can persist indefinitely: stored, copied, indexed, bought, sold, and combined with other records to build profiles of extraordinary detail.
If we’ve been online since the late 1990s or early 2000s, our data footprint can include social media accounts we’ve created, online purchases we’ve made, forums we’ve posted in, loyalty cards we’ve used, and apps we’ve installed going back decades. Some of that information lives on platforms we’ve long forgotten. Some of it was collected by companies that have since been acquired or dissolved, with our data potentially passing to successor entities we’ve never heard of. The digital life most of us have been living for 15 to 25 years has produced a layered, evolving archive that only grows more valuable to the people who buy and sell it as time goes on.
Most of us sense that something is off about all of this. In a 2023 survey, Pew Research found that roughly eight in ten Americans feel they have little to no control over the data companies collect about them, 71% are concerned about government data use, and 67% say they understand little to nothing about what companies are doing with their personal information. The concern is real and widespread. And so is the feeling of helplessness: 60% of Americans believe it’s impossible to go through daily life without having their data tracked. The unease is there. What’s missing is a clear picture of what’s happening on the other side of the transaction.
What “My Data” Looks Like as a File
When we say “my data,” we tend to picture the things we’ve actively typed into a form: our name, email address, maybe a credit card number. But the scope of what companies collect and brokers sell extends far beyond what we’ve consciously shared. The majority of our data profile is generated not from what we’ve entered but from what we’ve done: where we’ve gone, what we’ve browsed, how long we’ve lingered, what we’ve bought, who we’ve contacted, and what patterns emerge when all of those behaviors are tracked over time.
Beyond the identifying details we'd expect (name, date of birth, government IDs), the categories of personal information being collected, sold, and traded include, among others:
Financial records: credit history, transaction logs, bank account activity, loan applications
Location data: GPS coordinates from our phones, Wi-Fi connection logs, cell tower pings, the places we’ve checked in and the routes we’ve traveled
Behavioral data: the sites we’ve browsed, the searches we’ve run, the products we’ve lingered on, the apps we’ve opened and how long we’ve spent inside them
Biometric data: fingerprints, facial recognition templates, voiceprints
Communications metadata: who we’ve contacted, when, how often, and from where, even when the content of the message itself isn’t captured
Health-related data: pharmacy purchases, fitness tracker output, symptom searches, insurance claims
Social data: our contacts, our connections, our group memberships, who we interact with and how frequently
These categories don’t exist in isolation. For example, a pharmacy purchase is one data point on its own. Combined with a location trail, a search history, and a social media profile, it becomes part of a behavioral mosaic that can be used to infer things we never disclosed: our health conditions, our financial stability, our family status, even our political orientation. Acxiom, one of the largest data brokers, advertises more than 10,000 unique data attributes in its consumer profiles. The profile that results from all of this collection isn’t a list of facts we shared. It’s a composite portrait assembled from fragments that we never intended to be read together.
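To make the mosaic concrete, here’s a minimal sketch in Python of how fragments from unrelated sources can be joined on a shared identifier and read together. Every source, field name, and inference rule below is hypothetical, invented purely for illustration; real broker pipelines operate at vastly larger scale, but the mechanic is the same.

```python
# Toy sketch of data fusion. Every source, field name, and inference
# rule here is hypothetical, invented purely for illustration.

# Fragments collected by unrelated companies, keyed to the same person
pharmacy = {"person_id": 4121, "purchases": ["prenatal vitamins"]}
location = {"person_id": 4121, "frequent_stops": ["obstetrics clinic"]}
searches = {"person_id": 4121, "queries": ["maternity leave rights"]}

def build_profile(*fragments):
    """Merge records that share an identifier into one composite profile."""
    profile = {}
    for fragment in fragments:
        profile.update(fragment)
    return profile

def infer_attributes(profile):
    """Derive attributes the person never disclosed from combined signals."""
    signals = (profile.get("purchases", [])
               + profile.get("frequent_stops", [])
               + profile.get("queries", []))
    inferred = []
    if any(term in s for s in signals
           for term in ("prenatal", "obstetrics", "maternity")):
        inferred.append("likely_expecting_parent")  # never stated anywhere
    return inferred

composite = build_profile(pharmacy, location, searches)
print(infer_attributes(composite))  # ['likely_expecting_parent']
```

No single fragment above says anything sensitive on its own; the inference only appears once the records are joined, which is exactly why the composite is worth more than the parts.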
Who Has It
“They have my data” implies a single owner, but the data ecosystem is a layered network, and each layer operates differently. The platform we signed up for, the trackers running behind the webpage we visited, the broker who bought our behavioral profile, and the insurance company that used it to adjust our premium are all separate organizations with separate business models. Understanding who “they” are requires seeing how these layers connect.
The most visible layer is the platforms and services we interact with directly: Google, Apple, Meta, Amazon, Microsoft, and hundreds of smaller apps and websites. These companies collect data as a condition of use. Google processes billions of searches per day (estimates range from 8.5 to 14 billion depending on the source and methodology). Facebook’s engineering team disclosed in 2014 that its data warehouse alone was ingesting about 600 terabytes of new data per day, and the volume has grown substantially since. Every interaction on these platforms produces records that are stored, indexed, and fed into behavioral models.
Behind those platforms sits an advertising and tracking infrastructure that most of us never see. When we load a webpage, dozens of third-party trackers can fire simultaneously, each one logging our browser type, our device, our location, our referring page, and our behavior on the site. A single visit to a news article or product page can involve data transmissions to 50 or more companies we’ve never interacted with directly. These ad networks and analytics firms build cross-platform behavioral profiles that follow us from device to device, assembling a picture of our habits that no single app or website could construct alone.
And finally, there are the end buyers: insurance companies, financial institutions, employers, landlords, retailers, political consultants, government agencies, and AI companies. These are the organizations that purchase brokered data and use it to make decisions that affect our lives directly, from the interest rate we’re offered on a loan to the price we’re shown for a pair of shoes to whether our rental application gets approved. The distance between the data we generated and the decision it informs can be vast, and the connection between the two is almost never disclosed.
Where It Lives and How It Moves
Our data doesn’t sit in one place waiting to be accessed. It’s distributed across more than 4,000 data centers in the United States alone, operated by technology companies, cloud providers, brokers, and government agencies. Cloud storage worldwide is projected to exceed 200 zettabytes in the coming years, a figure that translates to more than 200 trillion gigabytes.
But the numbers matter less than the mechanics. What makes the data economy so difficult to see is the way information flows between layers. A common path looks something like this: a fitness app collects our running routes and resting heart rate. The app’s developer shares that data with an analytics partner. The analytics partner sells aggregated behavioral data to a data broker. The broker combines it with our purchase history, our location patterns, and our credit profile, then sells the resulting bundle to an insurance company, a financial institution, or a marketing firm. At no point in this chain did we interact with anyone beyond the fitness app. We agreed to the app’s terms of service, which included a clause about sharing data with “third-party partners,” and the rest of the chain followed from there.
At each stage, the data is processed: cleaned, categorized, cross-referenced with other datasets, scored, and segmented. Algorithms organize us into consumer profiles based on inferred income, predicted purchasing behavior, estimated health risk, political leanings, and thousands of other variables. The processing is what transforms raw data into something commercially valuable, and it happens largely outside our awareness.
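As a rough sketch of what “scored and segmented” can mean in practice, here is a toy Python example. The fields, thresholds, and segment labels are all invented for illustration; real models draw on thousands of variables, but the shape of the transformation is the same: a raw combined record goes in, and a labeled, salable profile comes out.

```python
# Toy illustration of broker-side processing. The fields, thresholds,
# and segment labels are all invented for this example.

def score_and_segment(profile):
    """Clean, score, and sort a combined record into marketable segments."""
    # "Cleaning": normalize a possibly missing field before scoring
    income = profile.get("inferred_income") or 0

    # "Scoring": collapse behavioral signals into a single salable number
    risk_score = 0
    if profile.get("late_night_app_use"):
        risk_score += 2
    if profile.get("payday_loan_searches"):
        risk_score += 5

    # "Segmenting": attach labels that downstream buyers can filter on
    segments = []
    if income < 40_000:
        segments.append("budget_conscious_household")
    if risk_score >= 5:
        segments.append("elevated_financial_risk")
    return {"segments": segments, "risk_score": risk_score}

print(score_and_segment({
    "inferred_income": 35_000,
    "payday_loan_searches": True,
}))
# {'segments': ['budget_conscious_household', 'elevated_financial_risk'],
#  'risk_score': 5}
```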
What They’re Using It For
Some of the ways our data gets used are familiar, and some of them are useful in ways we can feel. When a streaming platform recommends a film based on what we’ve watched before, or a search engine surfaces local results because it knows our location, or a retailer suggests a product similar to one we recently bought, what we’re seeing is data-driven personalization working as intended. Most of us have experienced moments where targeted content saved us time or introduced us to something we wouldn’t have found otherwise. The advertising model also funds a tremendous amount of the free content and services we use every day, from search engines to email to social media to news.
Insurance companies purchase lifestyle and behavioral data to assess risk and adjust premiums without ever asking us directly. If a data profile suggests certain health patterns, certain driving routes, or a certain kind of neighborhood, those details can shape what we’re offered and what we’re charged. The data acts as a proxy questionnaire we never filled out.
Financial institutions use brokered data for credit decisions, loan eligibility, and fraud detection. Some of that serves a protective function. But the algorithms doing this work are proprietary and opaque, and studies have documented that algorithmic credit scoring models produce systematically lower scores for Black and Latino communities compared to white and Asian populations. Consumers do have legal rights under the Fair Credit Reporting Act (FCRA) to dispute inaccuracies, and lenders are required to provide adverse action notices explaining the principal reasons for a denial. But consumers often don’t know which brokered data fed into the score in the first place, and companies aren’t required to reveal the proprietary formulas their models use. The Consumer Financial Protection Bureau (CFPB) has noted that data brokers operating in a gray area between regulated credit reporting and unregulated data sales make the process of identifying and challenging errors especially difficult in practice.
Employers and landlords use data from people-search sites to screen applicants. These sites source their information from data brokers and public records, and they frequently contain errors, because brokers don’t verify what they aggregate. The Federal Trade Commission and organizations like the Electronic Privacy Information Center have documented cases where inaccurate broker data cost people jobs and housing.
Retailers use personal data to set individualized prices. In January 2025, the Federal Trade Commission released findings from a study of what it calls “surveillance pricing,” in which intermediary firms hired by retailers track browser history, mouse movements, purchase patterns, and location to adjust the price of the same product for different buyers. The FTC described a scenario in which a consumer profiled as a new parent would be shown higher-priced baby products at the top of their search results. New York became the first state to require retailers to disclose when a price was set by an algorithm using the consumer’s personal data; the law was signed in May 2025 and took effect in November of that year after surviving a legal challenge. Also in 2025, dozens of state legislatures introduced bills to regulate various forms of algorithmic pricing, including surveillance pricing and algorithmic rent-setting.
AI companies are training their models on personal data scraped from the web. Web crawlers pull content from blogs, social media profiles, online marketplaces, photo-sharing platforms, and anywhere else that isn’t behind a login wall. A 2025 MIT Technology Review investigation of a major AI training dataset found thousands of identifiable faces, identity documents, and job applications in a sample representing just 0.1% of the data, and estimated the full set contained hundreds of millions of images with personal information. Meta has said its AI models are partially trained on public Facebook and Instagram posts. LinkedIn began using member data to train its AI tools before updating its terms of service to reflect that change. Because AI companies have scraped content posted long before generative AI existed, the people whose data is in those training sets never had the opportunity to consent to that use. And there’s a feedback loop: we generate data by using AI systems, that data refines those systems, and those systems shape the information environment we navigate. Our data becomes part of the infrastructure that determines what information reaches us next.
Government agencies buy personal data from the same commercial brokers that serve advertisers and insurance companies. The Supreme Court ruled in 2018 that law enforcement needs a warrant to access a person’s historical cell phone location data. But federal agencies, including the FBI, ICE, and the Department of Defense, have argued that purchasing location data from a commercial broker is a market transaction, not a compelled disclosure, and therefore doesn’t require one. The Brennan Center for Justice has called this a loophole that allows agencies to bypass constitutional protections, and has documented cases including the Department of Defense purchasing location data collected from prayer apps to monitor Muslim communities. The same data pipeline built for advertising can be repurposed for surveillance, and the legal framework hasn’t caught up to that reality.
And the brokers themselves get breached. In July 2025, hackers accessed names, Social Security numbers, and dates of birth for more than 4.4 million people through a third-party application used by TransUnion, one of the three major U.S. credit bureaus. In a separate incident disclosed that same year, LexisNexis Risk Solutions confirmed that hackers accessed names, Social Security numbers, driver’s license numbers, and dates of birth for more than 364,000 people through a third-party software development platform. The companies that centralize our data become single points of failure, and when they’re compromised, the exposure isn’t one transaction or one relationship. It’s a cross-section of an entire life in one place.
Data We Never Shared
Not all personal data comes from something we’ve handed over. Companies also generate what’s known as inferred or derived data: new information produced by running existing records through predictive algorithms. The inference is drawn from the pattern, not from anything we volunteered. In 2012, a New York Times feature revealed that Target had built a pregnancy prediction algorithm around roughly 25 products whose purchase patterns correlated with pregnancy stages: unscented lotion, certain vitamin supplements, extra-large bags of cotton balls. The algorithm could estimate a due date within a narrow window based entirely on shopping behavior.
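A toy Python version of how a purchase-pattern predictor like this could work appears below. The products, weights, and threshold are invented for illustration (the actual model and its weights were never published); the point is only that a handful of innocuous purchases can add up to a confident inference.

```python
# Toy purchase-pattern predictor in the spirit of the Target story.
# The products, weights, and threshold below are invented; the real
# model reportedly used about 25 products and far more data.

PREGNANCY_SIGNALS = {
    "unscented lotion": 0.30,
    "magnesium supplement": 0.25,
    "extra-large cotton balls": 0.20,
    "scent-free soap": 0.15,
}

def pregnancy_score(purchases):
    """Sum the weights of signal products found in a purchase history."""
    return sum(PREGNANCY_SIGNALS.get(item, 0.0) for item in purchases)

history = ["unscented lotion", "magnesium supplement",
           "extra-large cotton balls", "bread"]
score = pregnancy_score(history)  # 0.75
print(score >= 0.6)  # True: flagged as likely pregnant, never disclosed
```

Each purchase is unremarkable alone; the flag comes from the pattern, which is the defining feature of inferred data.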
Insurance companies can infer health risks from purchase histories and location patterns. Financial institutions can infer economic instability from app usage and transaction frequency. Data brokers categorize consumers into segments like “single parents,” “fitness enthusiasts,” and “budget conscious households” based on behavioral inferences, not declared preferences. These inferred profiles are then sold to the same range of buyers as declared data, with the same range of consequences. But we never agreed to share the information those profiles contain; the information didn’t exist until a model generated it from our behavior.
This means that even careful, privacy-conscious choices about what to share can be partially circumvented by inference. We might choose not to disclose a health condition, a pregnancy, a financial difficulty, or a political affiliation, and a predictive model can generate a probability estimate of that very thing based on the patterns in the data we did share. The profile that follows us through the data economy isn’t limited to what we put into the system. It includes what automated models have inferred from patterns in our behavior, patterns that become labeled, scored, and treated as facts by the companies that buy them.
Why the Context Is the Problem
The philosopher Helen Nissenbaum has a framework for what’s happening here: contextual integrity. The idea is that privacy isn’t about secrecy. We share information willingly all the time, when the context fits. We tell our doctor about a health condition because we expect that information to stay within the medical relationship. We search for symptoms on a health website because we assume that search won’t follow us into an insurance application. In the current data economy, that’s exactly the kind of boundary that dissolves, because the company collecting the data and the company buying it are operating in completely different contexts.
This is an information literacy problem as much as a privacy problem. Information literacy is usually framed around consumption: evaluating sources, questioning claims, recognizing bias in what we read and watch. But every time we interact with a digital service, we’re also producing information: generating a record that will be read, interpreted, scored, and acted on by organizations we may never interact with directly. Many of us have gotten better at questioning the information that comes at us, but we haven’t developed equivalent habits around the information that flows from us: where it goes after we hand it over, who reads the record, what incentives they have, and what conclusions they draw. The gap between what we think we’re consenting to and what we’ve agreed to in practice is where the real exposure lives, and the system is designed to keep that gap invisible.
The Illusion of Choice
One of the reasons the “so what” question is hard to answer with action is that opting out of data collection often means opting out of participation. Declining a social media platform’s terms of service means not using the platform. Refusing location permissions can mean losing access to navigation, ride-sharing, weather, and delivery apps. Choosing not to create an account can mean paying more, seeing less, or being locked out of services that have become essential infrastructure for work, communication, healthcare, banking, and education.
The architecture of digital consent treats data sharing as a binary: agree to the terms or don’t use the product. There’s rarely a middle option that allows us to use a service while limiting what data gets collected and where it goes. The result is that the “choice” to share data often functions as a condition of entry into daily life rather than an informed negotiation. We’re not handing over data because we’ve weighed the tradeoff and decided it’s fair. We’re handing it over because the alternative is exclusion from services we rely on.
This is the structural context behind the Pew Research Center finding that more than half of Americans believe it’s impossible to go through daily life without being tracked. For many of us, it isn’t possible, at least not without significant inconvenience or sacrifice. The question isn’t whether we can avoid data collection entirely, because for the vast majority of people who participate in modern life, the answer is no. The question is whether we can make more informed decisions within the constraints we’re operating in, and whether the system can be pushed, through regulation, market pressure, and better tools, toward something more transparent.
Can We Get It Back?
California’s Delete Act, which took effect in January 2026, is the strongest example of what’s emerging. It created a platform called DROP (Delete Request and Opt-Out Platform) that lets California residents submit a single deletion request to every registered data broker in the state. Brokers are required to process those requests, maintain suppression lists to prevent re-collection, and check the platform regularly for new requests. The European Union’s GDPR provides similar individual rights, and a handful of other U.S. states have enacted their own privacy laws with varying levels of protection. But the coverage is uneven: what’s available to a California or EU resident may not extend to someone in a state without comparable legislation.
Some services now automate parts of the opt-out process, submitting removal requests to dozens of brokers on our behalf. These can’t erase the data trail entirely, but they can narrow what’s actively available for sale.
Beyond deletion, there are smaller choices that reduce how much new data we generate. We can audit which apps have permission to track our location or access our contacts, since a surprising amount of behavioral data comes from apps that don’t need those permissions to function. We can treat “sign in with Google” and “sign in with Facebook” buttons as what they are: data-sharing agreements that can link a new service to an existing profile. And we can glance at the first few lines of a privacy policy before agreeing, looking for some version of “we may share your information with our partners,” where “partners” just means anyone willing to pay.
So can we get it back? Not entirely. Data that’s already been collected, copied, sold, and processed across multiple systems can’t be fully recalled. What we can do is reduce what’s actively available for sale, slow the flow of new data going forward, and take advantage of legal tools that didn’t exist a few years ago. The archive of our past digital lives is too distributed to undo, but the file is still being written, and we have more say over the next page than we did over the last twenty years of them.
The Other Side of the Transaction
Most of us don’t read privacy policies, and the policies aren’t built to be read. They average thousands of words of dense legal language filled with terms like “legitimate interest,” “data processor,” and “de-identified data.” Studies consistently put them at a late high school to early college reading level (grade 12 to 14), but the difficulty goes beyond reading level: the concepts are abstract, the volume of agreements we encounter is enormous, and the design of the consent process itself pushes us through as fast as possible. Think of pre-checked boxes, auto-scrolling agreement windows, and “accept all” buttons positioned prominently while “customize settings” options sit behind additional clicks. These are dark patterns, design choices that make the path of least resistance the path of maximum data sharing.
The result is a gap between the moment we share a piece of information and the moment that information shapes a decision about our lives. We don’t connect the app to the insurance premium or the loyalty card to the rental application because the chain of custody between them is long, complex, and designed to stay out of view.
The same critical thinking we’ve learned to apply to the information flowing toward us (checking sources, questioning claims, looking for bias) applies to the information flowing from us: who’s collecting this, what will they do with it, who else will see it, and what did we agree to? The difference is that in the data economy, we’re the product being evaluated, and the questions are being asked about us rather than by us.
So what if they have our data? The tradeoff extends well beyond better ads. It reaches into the prices we’re charged, the credit we’re offered, the jobs we’re considered for, the insurance premiums we pay, the AI systems trained on our behavior, the accuracy of the profiles used to make decisions about our lives, and the degree to which government agencies can monitor our movements without a warrant. Every new service we sign up for, every permission we grant, and every terms-of-service agreement we accept adds another layer to that file. We can’t close the file entirely, but we can make more informed decisions about what goes into it next.
Have you read the Founding Member Report: The State of AI yet?
A comprehensive guide for information navigators who want to understand where AI is actually heading and what it means for how we find, evaluate, and use information in 2026.
Prefer to listen? My audio narration of this essay is available to paid subscribers below.