Built for Access: How Brazil's Identity System Opened a Door It Wasn't Designed to Close

By @yaso, Mar 25, 2026

Part 1 of 2. This piece covers the history of digital identity in Brazil and the data infrastructure that made industrial-scale fraud possible. Part 2 covers how fraud co-evolves with digital infrastructure, and what that means for every country building digital identity infrastructure today.


We tend to talk about digital fraud as if it were a technology problem. A bug to patch, a model to retrain, a vendor to procure. But the deeper you look at how Brazil's identity infrastructure actually developed, from handwritten ID cards to deepfakes, the more you realize the system was never designed to protect anyone. It was designed to grant access. And for most of its history, that was exactly the right design goal.

The same decisions that made financial inclusion possible at scale are the ones that left the door open. The problem was never the ambition. The problem was the absence of adversarial thinking: the failure to ask, at each step, who else might walk through the door being built.

The Document Was Never About You

Look at a Brazilian identity document from the 1940s. You'll find: name, date of birth, parents' names, skin color, hair, beard, height, civil status, thumbprint, and a photograph. It reads more like a police file than a personal document. And that's exactly what it was: a record kept by the state, about the citizen, for the state's purposes.

The card was designed for a clerk at a counter to verify that the person in front of them matched a record in a ledger. The threat model was simple: the opportunist. Someone trying to claim they were someone else, in person, with a forged document. The system worked reasonably well against that specific, low-scale attack.

What it couldn't anticipate was a world where identity information would flow digitally, at scale, with zero friction. This is the foundational problem: the information model of the Brazilian identity was designed for a physical world and never fundamentally rebuilt when the world became digital. Every subsequent layer of the system was built on top of that foundation, inheriting its assumptions, its data fields, its implicit threat model, and each layer compounded the same structural vulnerability.

What we call the "identity system" in Brazil is not actually a system. It is a collection of overlapping records: the RG issued by state security secretariats, the CPF managed by the Federal Revenue Service, the CNH linked to Detrans, the electoral title, professional registrations, and now gov.br, which might be mistaken for a digital identity document but is actually a unified access credential for more than 7,000 types of government services. Each was created by a different institution, for a different purpose, with different data standards and security models. They were never designed to interoperate. The result is a fragmented identity infrastructure with many entry points and no single authority responsible for its integrity.

The Bifurcation Points

Complex systems don't fail gradually. They accumulate pressure until a small change in conditions produces a qualitative shift in behavior. In systems theory, these are called bifurcation points: moments where the possibility space expands and the system tips into a fundamentally new dynamic. Brazil's digital identity landscape has several of these. Each one was driven by a legitimate and largely successful policy or market goal. Each one also opened a new surface that fraud adapted to exploit. I write about the main ones below:

2003 — Bancarization at scale. Bolsa Família and the expansion of correspondent banking brought tens of millions of Brazilians into the formal financial system for the first time. This was genuinely transformative social policy, and its effects on poverty reduction were real and lasting. It also created a large new population of people with no prior experience navigating financial institutions, no established transaction history that fraud systems could use as a behavioral baseline, and no intuitive sense of what a legitimate bank interaction looked like versus an illegitimate one. The "caderneta de fiado", the shopkeeper's handwritten credit book and a trust system built on years of face-to-face relationships, was being replaced by a debit card, often held by someone who had never held one before. You cannot defraud someone out of a bank account they don't have. Now they had one.

2006 — Social networks and the emergence of digital profiles. Orkut, YouTube, and the early social web created something that had never existed before: a publicly accessible, searchable, persistent record of who people were. Faces, voices, social graphs, and location patterns all flowed into platforms whose business model depended on aggregating and distributing that information. Identity, which had previously required physical presence to steal, became remotely accessible. This is the period in which the raw material for social engineering at scale first became widely available.

2012 — WhatsApp becomes the national communication layer. When WhatsApp crossed class, age, and geographic lines to become the dominant communication channel in Brazil, it handed fraud operators something extraordinary: a channel that felt personal and intimate, that people used to communicate with family and close friends, and that had no centralized mechanism to verify the identity of the sender. The cloned WhatsApp account, the fake bank support call, the urgent message from a "son" asking for a wire transfer: none of these tactics works without a channel that carries the full weight of personal trust. Software designed for legitimate mass communication became, with trivial modification, a fraud delivery infrastructure.

2018 — Facial biometrics enter the identity ecosystem. The digital CNH linked facial biometrics to Detran's database and made them available for third-party validation. This was understood as a security upgrade, and in a narrow sense it was. But it also centralized biometric data in a state system with an imperfect security record, and it trained the entire market to treat a facial match as sufficient proof of identity. A generation of onboarding flows was built on the assumption that "face matches document" was equivalent to "this is the actual account holder initiating this transaction." That assumption would eventually be broken by generative AI.
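The gap between "face matches document" and "this is the actual account holder" can be made concrete with a small sketch. Everything in it is hypothetical: the signal names, the 0.90 threshold, and the three outcomes are illustrative assumptions, not any bank's or vendor's actual onboarding logic. It only shows how a flow that treats a face match as sufficient differs from one that treats it as one signal among several.

```python
from dataclasses import dataclass


@dataclass
class OnboardingSignals:
    """Hypothetical signals collected during a remote onboarding session."""
    face_match_score: float    # similarity between selfie and document photo, 0.0 to 1.0
    liveness_passed: bool      # did a liveness (presentation-attack) check pass?
    device_seen_before: bool   # was this device already tied to other applications?
    data_in_known_leak: bool   # do the submitted identity fields appear in known leaks?


def face_only_decision(signals: OnboardingSignals, threshold: float = 0.90) -> str:
    """The assumption described above: a face match alone is proof of identity."""
    return "approve" if signals.face_match_score >= threshold else "reject"


def layered_decision(signals: OnboardingSignals, threshold: float = 0.90) -> str:
    """Treat the face match as necessary but not sufficient (illustrative rules only)."""
    if signals.face_match_score < threshold:
        return "reject"
    if not signals.liveness_passed:
        # A high match score without liveness may be a replayed or synthetic image.
        return "reject"
    if signals.device_seen_before or signals.data_in_known_leak:
        # Contextual risk: escalate to manual review instead of approving instantly.
        return "review"
    return "approve"


# A synthetic face can score very high on similarity while failing every other check.
synthetic_applicant = OnboardingSignals(
    face_match_score=0.97,
    liveness_passed=False,
    device_seen_before=True,
    data_in_known_leak=True,
)

print(face_only_decision(synthetic_applicant))  # approve
print(layered_decision(synthetic_applicant))    # reject
```

The design point is that the face-only flow has a single condition to defeat, which is exactly the condition generative AI targets. The layered flow adds friction, which is the trade-off the fintech era was reluctant to accept.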

2019 — Frictionless digital onboarding becomes the standard. The consolidation of digital banks (the fintechs) normalized the idea that you could open a financial account from your phone in minutes. Speed became a competitive advantage. Security friction became a competitive liability. "Mule accounts" (known in Brazil as "contas laranja"), bank accounts opened under real or synthetic identities to receive and launder fraud proceeds, became a commodity that could be acquired in bulk. The fraud supply chain gained a critical piece of infrastructure.

2020 — Pix. This is the most consequential bifurcation. Pix introduced two properties that fundamentally changed the fraud calculus: instantaneity and irreversibility. Every fraud that had previously required hours or days to clear now completed in seconds. The attack window shrank to zero. The recovery window shrank to zero as well. Pix is a genuine achievement in financial infrastructure: its transaction volumes have no equivalent at scale in most developed economies. That success is precisely what made it so useful to fraud operators.

2023+ — Generative AI and deepfakes. The most recent bifurcation is still unfolding. Synthetic identity documents, biometric bypass, voice cloning, and AI-assisted social engineering didn't create a new category of fraud. They industrialized categories that already existed and removed the last remaining need for human effort in running them at scale.

The Data That Fuels Everything

To understand where Brazil's fraud ecosystem is today, you need to understand what happened to the country's data. In January 2021, Brazilian cybersecurity researchers discovered what became known as the megavazamento: a collection of databases, likely aggregated over years from multiple sources, compiled into a single archive approaching one terabyte in size. The numbers are difficult to fully absorb: records for 223 million individuals (more than the entire living population of Brazil at the time, because the dataset included deceased persons) and 40 million companies.

This was not a leak of usernames and passwords. It was a leak of the complete socioeconomic profile of essentially every Brazilian adult. The records included full name, CPF, date of birth, parents' names, phone numbers, email addresses, employment history with employer CNPJ, and more. Separate datasets included facial images linked to identity documents, vehicle data for 100 million vehicles, and information on public servants. The data was organized, cleaned, and structured, categorized in the precise way that a credit bureau would organize it for commercial use.

The initial exposure was on a dark web forum. But accessibility moved quickly beyond dark web infrastructure. A condensed version was made available for free. The complete database was priced between $0.075 and $1 per person depending on data richness and volume, with packages starting at around $500, payable in Bitcoin. At some point, links to portions of the data were indexed by Google Search. Access to dark web infrastructure was not required.

The megavazamento was not an isolated event. It was the most visible point in a pattern of accumulating leaks: government health databases, Detran records, Federal Revenue data, and leaks from private data brokers and private companies have each contributed to a growing underground inventory of Brazilian personal data. Each new leak adds fields. Each new combination increases the resolution of the profile available to anyone who wants to buy it.
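For anyone assessing exposure, the compounding effect is easy to underestimate, so here is a minimal sketch of it. The records, field names, and the "person-001" identifier are entirely synthetic and hypothetical; the point is only that sources which each look partial become a much fuller profile once they share a stable key such as a national ID number.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_leaks(leaks: List[List[Record]]) -> Dict[str, Record]:
    """Merge records from several sources, keyed by a shared identifier."""
    profiles: Dict[str, Record] = {}
    for leak in leaks:
        for record in leak:
            profile = profiles.setdefault(record["id"], {})
            for field, value in record.items():
                # Keep the first value seen; later sources only contribute new fields.
                profile.setdefault(field, value)
    return profiles


def fields_known_after_each_leak(leaks: List[List[Record]], person_id: str) -> List[int]:
    """Count the fields known about one person as each source is added in turn.

    The count includes the identifier field itself.
    """
    counts = []
    for n in range(1, len(leaks) + 1):
        merged = merge_leaks(leaks[:n])
        counts.append(len(merged.get(person_id, {})))
    return counts


# Three synthetic sources, each incomplete on its own.
health_leak = [{"id": "person-001", "name": "Example Person", "birth_date": "1980-01-01"}]
vehicle_leak = [{"id": "person-001", "name": "Example Person", "vehicle": "example hatchback"}]
broker_leak = [{"id": "person-001", "employer": "Example Ltda", "phone": "000-0000"}]

print(fields_known_after_each_leak([health_leak, vehicle_leak, broker_leak], "person-001"))
# [3, 4, 6]
```

Because the merge needs nothing more than a shared key, the practical assumption for defenders is that any field exposed in any single source is available alongside every other exposed field.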

The Telegram Layer

What happened to that data next matters, because it moved somewhere far more accessible than the dark web. Telegram has become the primary distribution infrastructure for leaked Brazilian personal data. Researchers who monitored public and semi-public Telegram groups over a six-month period documented what they called "Pull Groups": channels where users can request personal data on specific individuals and receive it within seconds, via automated bots. In that monitoring period alone, more than 12 million personal data records were extracted from these groups.

The mechanics are worth understanding. A user sends a CPF number or a name to a bot. The bot queries a database assembled from multiple leak sources and returns a structured record: full name, date of birth, address, phone numbers, income data, employer, family connections. The response is formatted, clean, and immediate. It looks less like criminal infrastructure and more like a consumer data API, because functionally, that's what it is.

The barrier to entry is low. Many of these groups are publicly accessible and searchable through Telegram's own search function. Some have close to 200,000 participants (the platform's standard limit for groups). A small number of high-volume groups account for the majority of the data distributed. Before a fraud operator places a social engineering call or initiates an account takeover, they can pull a complete dossier on the target for a few reais. They will know the target's income bracket, employer, whether they have a car, their mother's name, and their credit score. The call that follows isn't a cold call. It's a warm call, built on data the target will recognize as authentic, which is exactly what creates the trust that enables the fraud.

The Regulatory Gap

The data infrastructure described above didn't emerge in a regulatory vacuum. It emerged in a regulatory gap that remains only partially closed. LGPD, Brazil's data protection law, exists and is broadly aligned with international standards. But enforcement infrastructure is still maturing. The ANPD is building its penalty framework, and the precedent cases that would give the law genuine deterrent force are still working through the system. In the meantime, the collection and aggregation of personal data continued largely as before.

Identity remains fragmented across systems that do not fully interoperate. Instead, these systems interconnect in inconsistent ways: companies pull data from governmental APIs, governmental apps rely on private data as a source of truth, and private companies collect biometrics and personal data using the "legitimate interest" legal basis as if personal data were not personal. This means there is no single authority that can respond when credentials are compromised at scale. If your biometric data is leaked, there is no mechanism equivalent to canceling a credit card. If the image of your face or your fingerprint scans are exposed, the data is out. It stays out.

The only response available is downstream: better detection, more friction, because the upstream compromise is irreversible.

Responsibility for Pix fraud sits in genuine legal ambiguity. When a victim transfers money under social engineering, the question of who bears the loss involves multiple institutions and no clear framework. That ambiguity distributes the cost of fraud across victims and institutions in ways that reduce the pressure on any single actor to solve the systemic problem. Criminal law has not kept pace with the sophistication of the threats it needs to address. The Telegram groups distributing millions of personal records, the bot infrastructure, and the laranja account markets all operate in legal categories that were written for different kinds of crimes, which makes them difficult to prosecute at the speed and scale at which the ecosystem operates.


Part 2, "Build It and They'll Come: How Fraud Co-Evolves with Digital Infrastructure", examines why fraud attacks good systems rather than weak ones, and what that means for every government building digital public infrastructure today.

Yasodara Córdova thinks and builds at the intersection of digital identity, fraud ecosystems, privacy, security and why systems break in ways nobody planned for. She has developed work with Harvard University, the World Bank, W3C, the World Economic Forum, and SXSW, among others.