Learning How Groups Vote Without Knowing Who Voted

The Fundamental Problem

Most voting systems force a choice between transparency and privacy. If ballots are anonymous, nobody can verify that their vote was counted. An observer cannot audit the election. Fraud becomes hard to detect. If ballots are identifiable so that voters can verify their inclusion, then everyone can see how everyone else voted. Privacy evaporates.

Election officials naturally want to report accurate findings: “52% of Muslim voters chose candidate A.” But discovering that fact requires storing two pieces of information together—how a specific person voted and that person’s religion. Once you combine those facts into a single record, you’ve created something that could identify someone. Add their age to the record. Add their zip code. Introduce their occupation. Suddenly an individual voter becomes identifiable by the precise intersection of their demographic attributes. This is called an intersection attack. It’s one of the hardest problems in privacy research.

This document describes a system that separates what traditional voting systems bundle together. Instead of one election that records both a vote and a demographic profile, the system deliberately tears apart three critical questions: Was I eligible? Was my vote included? How did I vote? The system provides answers to the first two without ever answering the third.

The Core Architecture: Separating Three Questions

Traditional elections bundle eligibility, inclusion, and ballot content into a single record. A voter proves they are eligible, casts a vote, and that vote becomes identified either with their name or with a demographic profile. Either way, the system knows three things about one record: eligibility, inclusion, and content.

This system tears those three apart. It answers the eligibility question through anonymous credentials. It answers the inclusion question through public commitments and Merkle trees. It answers the content question through homomorphic tallying or mixnets that deliberately destroy the ability to link any individual’s vote to the final count.

Most importantly, it answers the demographic question—how did Muslims vote, how did younger people vote—by creating entirely separate elections, one for each demographic dimension. Each election sees only one demographic attribute. No election sees any other attribute for the same voter. Therefore, no record ever exists that combines religion with age, age with geography, or any two demographic attributes together.

The dangerous dataset is never created. Cryptography still protects each individual ballot, but the privacy of demographic intersections emerges from architectural separation rather than from encryption, differential privacy, or policy.

Part 1: Anonymous Credentials Through Blind Signatures

A voter begins by obtaining anonymous credentials—one for religion, one for age, one for geography. These credentials are deliberately unlinkable.

For each demographic dimension, the voter generates a random token, hashes it, then blinds that hash. The blinded value goes to the registration authority. The authority verifies eligibility and signs the blinded hash without seeing the underlying token. The voter unblinds the signature, producing a valid credential the authority cannot later identify.

This yields three cryptographically independent credentials:

  • T_religion (proves membership in the Muslim demographic)
  • T_age (proves membership in the 30–39 demographic)
  • T_region (proves membership in the Northeast demographic)

Even the registration authority cannot determine that these credentials belong to the same person. This property—unlinkability—is the foundation of the entire system.
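The blind-signing steps described above can be sketched with textbook RSA. This is a minimal illustration, not the system's specified scheme: the choice of RSA, the toy key built from two small Mersenne primes, and the function names (blind, authority_sign, unblind, verify) are all assumptions made for the example. A real deployment would use a vetted blind-signature construction with proper key sizes and padding.

```python
import hashlib
import math
import secrets

# Toy RSA key for the registration authority (assumption: illustrative only,
# far too small and too structured for real use).
P, Q = 2**31 - 1, 2**61 - 1          # two known Mersenne primes
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

def hash_to_int(token: bytes) -> int:
    """Hash the voter's random token into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N

def blind(token: bytes):
    """Voter side: hide the hashed token behind a random blinding factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (hash_to_int(token) * pow(r, E, N)) % N, r

def authority_sign(blinded: int) -> int:
    """Authority side: sign the blinded value without seeing the token."""
    return pow(blinded, D, N)

def unblind(blind_sig: int, r: int) -> int:
    """Voter side: strip the blinding factor to obtain a normal signature."""
    return (blind_sig * pow(r, -1, N)) % N

def verify(token: bytes, sig: int) -> bool:
    """Anyone: check the signature against the authority's public key."""
    return pow(sig, E, N) == hash_to_int(token)

# One credential for one demographic dimension (hypothetical flow).
token = secrets.token_bytes(32)
blinded, r = blind(token)
sig = unblind(authority_sign(blinded), r)
print(verify(token, sig))
```

Repeating this flow with a fresh token and a fresh blinding factor for each dimension gives credentials that share no common value the authority could use to connect them.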

Part 2: Commitments as Sealed Envelopes

When a vote is cast, the system publishes a commitment rather than the vote itself. A commitment is a sealed envelope: the voter hashes their vote with random salt and nonce, producing a fingerprint everyone can see but nobody can reverse.

The voter submits three ballots—one to each demographic election. Each ballot consists of a credential proving eligibility plus a commitment:

  • Religion Election: [Credential T_religion, Commitment C_religion]
  • Age Election: [Credential T_age, Commitment C_age]
  • Region Election: [Credential T_region, Commitment C_region]

Each election receives exactly one ballot and operates independently. They do not communicate or share data.
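A hash-based commitment of the kind described above might look like the following sketch. The use of SHA-256, the length-prefixed encoding, and the names commit and open_commitment are assumptions for illustration; the text only specifies that the vote is hashed together with a random salt and nonce.

```python
import hashlib
import secrets

def commit(vote: str, salt: bytes, nonce: bytes) -> str:
    """Return a hex commitment to `vote`, hidden by the random salt and nonce."""
    h = hashlib.sha256()
    for part in (vote.encode(), salt, nonce):
        # Length-prefix each field so different inputs cannot collide by
        # shifting bytes from one field into the next.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()

def open_commitment(commitment: str, vote: str, salt: bytes, nonce: bytes) -> bool:
    """Check that (vote, salt, nonce) is the content of the sealed envelope."""
    return secrets.compare_digest(commitment, commit(vote, salt, nonce))

# The same vote is committed three times, once per election, with fresh
# randomness each time so the three commitments cannot be matched up.
vote = "candidate A"
openings = {name: (secrets.token_bytes(16), secrets.token_bytes(16))
            for name in ("religion", "age", "region")}
commitments = {name: commit(vote, salt, nonce)
               for name, (salt, nonce) in openings.items()}
print(commitments)
```

Because each commitment uses its own salt and nonce, identical votes produce unrelated fingerprints in the three elections.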

Part 3: Merkle Trees and Inclusion Proofs

As ballots are accepted, their commitments become leaves of a Merkle tree—a cryptographic structure that summarizes millions of commitments into a single root hash. The root is published publicly. Any modification to any ballot changes the root, making tampering detectable.

The election publishes the Merkle root and provides each voter with a cryptographic path proving their commitment exists in the tree. This is the crucial innovation: the voter can prove “my ballot exists in the official record” without revealing how they voted. The commitment is just a hash. The vote content remains private.

The voter now has something unprecedented: a receipt for voting that does not reveal the vote itself.
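The Merkle tree and inclusion path can be sketched as follows. This is a simplified illustration under stated assumptions: SHA-256 as the hash, one-byte prefixes to separate leaf hashes from interior hashes, and duplication of the last node on odd-sized levels. The function names are hypothetical, and a production design would need to handle the padding rule more carefully.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every level of the tree, from hashed leaves up to the root."""
    levels = [[h(b"\x00" + leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur.append(cur[-1])  # pad odd levels by repeating the last node
        levels.append([h(b"\x01" + cur[i] + cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels

def inclusion_path(levels, index):
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling is left?)
        index //= 2
    return path

def verify_path(leaf: bytes, path, root: bytes) -> bool:
    """Voter-side check: does this commitment hash up to the published root?"""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in path:
        node = h(b"\x01" + sibling + node) if sibling_is_left else h(b"\x01" + node + sibling)
    return node == root

# Five placeholder commitments stand in for the accepted ballots.
commitments = [f"commitment-{i}".encode() for i in range(5)]
levels = build_levels(commitments)
root = levels[-1][0]
path = inclusion_path(levels, 4)
print(verify_path(commitments[4], path, root))
```

The path contains only sibling hashes, so checking it reveals nothing about the vote sealed inside the commitment.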

Part 4: Casting and Validation

Each election validates submitted ballots: Is the credential valid? Has it been used before? Is it valid for this specific election? If all checks pass, the ballot is accepted.
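The three checks can be sketched as a small acceptance routine. Everything here is a hypothetical shape for illustration: the Ballot and ElectionBox types, the election_id field, and the injected verify_signature callable are assumptions, and the stand-in verifier in the demo replaces real signature verification against the election's own key.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class Ballot:
    election_id: str
    token: bytes        # the unblinded credential token
    signature: int      # the authority's signature on that token
    commitment: str     # sealed vote

@dataclass
class ElectionBox:
    election_id: str
    verify_signature: Callable[[bytes, int], bool]
    spent: Set[bytes] = field(default_factory=set)
    accepted: List[str] = field(default_factory=list)

    def submit(self, ballot: Ballot) -> str:
        # Is the credential meant for this specific election?
        if ballot.election_id != self.election_id:
            return "rejected: wrong election"
        # Is the credential valid (signed by the authority)?
        if not self.verify_signature(ballot.token, ballot.signature):
            return "rejected: invalid credential"
        # Has it been used before?
        if ballot.token in self.spent:
            return "rejected: credential already used"
        self.spent.add(ballot.token)
        self.accepted.append(ballot.commitment)
        return "accepted"

# Stand-in verifier (assumption): a real box would check a signature instead.
box = ElectionBox("religion", verify_signature=lambda token, sig: sig == 42)
results = [
    box.submit(Ballot("religion", b"tok-1", 42, "c1")),
    box.submit(Ballot("religion", b"tok-1", 42, "c1-again")),
    box.submit(Ballot("age", b"tok-2", 42, "c2")),
    box.submit(Ballot("religion", b"tok-3", 7, "c3")),
]
print(results)
```

Each election keeps its own set of spent tokens, so the double-use check never requires looking at another election's data.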

Critically, each election knows nothing about the other elections. The religion election cannot see or access the age or region elections. It cannot compare credentials across systems or determine which religion credentials belong to the same voter as which age credentials. The elections are sealed off by design.

This separation prevents intersection attacks. If all elections existed in one database, a query could join them: “Show records where religion = Muslim AND age = 30-39 AND region = Northeast.” But these elections are separate systems. No query can join them because the data is not stored together.

Part 5: Mixnets and Tallying

After voting ends, the encrypted ballots are shuffled through a mixnet: each server re-encrypts the ballots, randomly permutes them, and passes them to the next server. Like shaking cards in a box a thousand times, everyone can verify the same cards emerge, but nobody knows which card belonged where. Mathematical proofs accompany the mixnet, proving no ballots were added, removed, or modified.
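A re-encryption mixnet can be sketched with ElGamal over a toy group. This is an illustration under loud assumptions: the tiny parameters (p = 467, q = 233, g = 4) offer no security, votes are encoded as powers of g, a single key pair stands in for what would normally be a shared key, and the zero-knowledge shuffle proofs mentioned above are omitted entirely.

```python
import secrets

# Toy group (assumption): p = 2q + 1 with q prime, g generates the order-q subgroup.
P, Q, G = 467, 233, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def encrypt(h, m):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P

def reencrypt(h, c):
    """Refresh the randomness of a ciphertext without changing its plaintext."""
    s = secrets.randbelow(Q - 1) + 1
    return (c[0] * pow(G, s, P)) % P, (c[1] * pow(h, s, P)) % P

def mix(h, ciphertexts):
    """One mix server: re-encrypt every ballot, then permute the batch."""
    out = [reencrypt(h, c) for c in ciphertexts]
    secrets.SystemRandom().shuffle(out)
    return out

def decrypt(x, c):
    return (c[1] * pow(c[0], -x, P)) % P  # negative exponent needs Python 3.8+

x, h = keygen()
votes = [1, 2, 1, 1, 2]                    # hypothetical candidate indices
encoded = [pow(G, k, P) for k in votes]    # encode each vote as a group element
board = [encrypt(h, m) for m in encoded]

mixed = board
for _ in range(3):                         # three mix servers in sequence
    mixed = mix(h, mixed)

print(sorted(decrypt(x, c) for c in mixed) == sorted(encoded))
```

The decrypted outputs are the same multiset of votes that went in, but the re-encryption and permutation at each server remove any positional or bitwise correspondence between input and output ciphertexts.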

Alternatively, the system can use homomorphic encryption: ballots are encrypted, summed without decryption, and only the aggregate total is decrypted.
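Homomorphic tallying can be sketched with exponential ElGamal, where multiplying ciphertexts adds the underlying votes. The scheme choice, the toy parameters, and the function names are assumptions for illustration; the text does not name a specific cryptosystem, and a real system would also need proofs that each ballot encrypts a valid value.

```python
import secrets

# Toy group (assumption): p = 2q + 1 with q prime, g generates the order-q subgroup.
P, Q, G = 467, 233, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def encrypt(h, m):
    """Encrypt a small integer m in the exponent: (g^r, g^m * h^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (pow(G, m, P) * pow(h, r, P)) % P

def add(c, d):
    """Component-wise product of ciphertexts encrypts the sum of plaintexts."""
    return (c[0] * d[0]) % P, (c[1] * d[1]) % P

def decrypt_sum(x, c, max_total):
    """Decrypt to g^total, then recover the small total by exhaustive search."""
    g_total = (c[1] * pow(c[0], -x, P)) % P  # negative exponent needs Python 3.8+
    for t in range(max_total + 1):
        if pow(G, t, P) == g_total:
            return t
    raise ValueError("total out of range")

x, h = keygen()
votes = [1, 0, 1, 1, 0]                 # 1 = vote for candidate A (hypothetical)
ballots = [encrypt(h, v) for v in votes]

total = (1, 1)                          # trivial encryption of zero
for b in ballots:
    total = add(total, b)

print(decrypt_sum(x, total, len(votes)))
```

Only the aggregate ciphertext is ever decrypted, so no individual ballot is opened at any point in the tally.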

In both cases, the final tally is valid but no individual ballot is linked to any individual commitment. The connection between “I submitted this commitment” and “this entry in the tally is mine” is destroyed by design, while the published proofs still show that every committed ballot was counted.

Part 6: End-to-End Verifiability

Now the voter can perform two separate verifications that together provide end-to-end auditability.

First, the voter verifies inclusion: “Was my ballot included?” They retrieve their commitment and the Merkle path they received at submission. They verify that the path leads from their commitment to the published Merkle root. This proves their ballot exists in the official election record. The election cannot deny this. If the ballot is missing, the proof fails. The voter has evidence of misconduct.

Second, the voter verifies accuracy: “Was every included ballot counted correctly?” The tally authority publishes mathematical proofs showing that the set of encrypted/committed ballots that entered the mixnet or homomorphic aggregation exactly matches the set of commitments published on the bulletin board. These proofs demonstrate that no ballots were added, removed, or modified during tallying. Any observer can verify these proofs independently.

Together, these two verifications provide something almost never achieved in voting systems: end-to-end verifiability without requiring any trust in officials. The voter knows their ballot was counted. The public knows all ballots were counted. The mathematics does the verification.

Part 7: Receipt-Freeness and Vote Buying Prevention

Here is the subtle but crucial point that makes vote buying impossible. The voter has a receipt—their commitment and the Merkle path. But the receipt does not reveal how they voted. The commitment is just a hash. The salt is private. The nonce is private.

After the mixnet shuffles ballots, there is no way to trace the commitment back to the final tally. The shuffling deliberately breaks that link. The voter cannot demonstrate which entry in the final tally corresponds to their commitment. They cannot prove they voted for candidate A even if someone demanded proof.

This seemingly small property destroys vote-buying markets. A buyer wants to purchase votes. The buyer approaches a voter and offers money. The voter agrees and later claims to have voted for the buyer’s preferred candidate. The buyer demands proof. The voter cannot produce it. The system deliberately made it impossible.

Even if the voter is telling the truth, they have no way to prove it. The buyer cannot be confident they received the service they paid for. Consequently, the vote market collapses. You cannot buy what you cannot verify delivery of.

This property is called receipt-freeness. The voter has a receipt for voting, but not a receipt for how they voted. The voter can credibly tell anyone trying to coerce them: “I cannot prove how I voted even if I wanted to. The system made it impossible. The shuffling destroyed the link.”

Part 8: Demographic Elections and Unlinkable Credentials

Now comes the part that makes this system unique: demographics.

The system does not conduct one election. It conducts multiple elections—one for each demographic dimension society wants to measure. The Religion Election measures how Muslims voted versus Christians. The Age Election measures how younger voters voted versus older voters. The Region Election measures how different geographic areas voted.

Each election is completely independent. Each election sees exactly one demographic dimension. The Religion Election sees only religion. It does not know the age of any voter. It does not know their region. It knows only their vote choice and that they are Muslim (or Christian, or Jewish, etc.). The Age Election knows only age and vote choice. The Region Election knows only region and vote choice.

Because the credentials used in each election are unlinkable—because T_religion, T_age, and T_region are cryptographically independent—even the most clever observer cannot determine which credentials belong to the same person. The ballots are cast into separate systems. The ballots are tallied separately. The results are published separately.

The system can therefore report that 52% of Muslims voted for candidate A. It can report that 47% of the 30–39 age group voted for A. It can report that 58% of Northeast voters voted for A. Each statistic is accurate and verifiable. But there is no statistic reporting how many 30–39-year-old Muslims in the Northeast voted for A, because that data never existed.

No database contains the intersection. No query can produce it. No future computation can reconstruct it. The information was never collected.

Part 9: Standing to Complain and Accountability

This architecture creates an unusual legal and political property that is rarely discussed but profoundly important. Traditional voting systems face a dilemma about accountability. If ballots are secret, a voter cannot prove misconduct happened to them. They cannot say “my vote was discarded” because they cannot prove they voted or how they voted. If ballots are identifiable to enable verification, then voters can prove misconduct but at the cost of destroying privacy.

This system breaks the dilemma. A voter can prove misconduct without revealing their vote.

Suppose election officials secretly discard 10% of ballots. The voter checks whether their commitment appears in the published Merkle tree. If their commitment is missing, they can prove it. They can show that they submitted a ballot—they have the receipt—but the commitment is not in the official record. This is evidence of fraud.

Or suppose election officials accept ballots but then replace some votes in the tally. The mixnet proofs fail to validate. The tally authority cannot produce a valid proof showing that the encrypted ballots that entered the mixnet match the committed ballots published beforehand. Again, the voter has standing to complain.

For perhaps the first time in history, a voter can demonstrate election misconduct without revealing how they voted. They can prove “something was done to my ballot” without proving “I voted for candidate X.”

This is possible because inclusion and content are separated. The commitment proves inclusion. The vote content is kept separate. A compromise of one does not compromise the other.

Part 10: The Consistency Problem and the Intentional Tradeoff

Here is where the system makes a controversial but necessary choice. A voter could theoretically vote for candidate A in the Religion Election but candidate B in the Age Election. This seems like a problem. Wouldn’t inconsistent votes skew the results?

The system intentionally does not solve this problem. It does not enforce per-voter consistency across elections. The reason is subtle but crucial: any consistency proof would link credentials across elections, which would reintroduce the intersection attack vulnerability the system is designed to prevent.

If the system required voters to prove that T_religion, T_age, and T_region all contained the same vote, it would have to make those credentials linkable. The proofs would create a bridge connecting them. Once credentials become linkable, an observer can reconstruct demographic profiles. Privacy disappears.

The system therefore makes a deliberate tradeoff: it sacrifices per-voter consistency enforcement in exchange for complete unlinkability. Most voters will vote the same way in all elections anyway. They make a choice and stick with it. Some voters might vote differently in different elections, either because they are uncertain or because they are testing the system. But a small percentage of inconsistency has limited impact on aggregate results. If 98% of voters are consistent and 2% are not, a candidate’s share in any one election can shift by at most 2 percentage points, and that worst case occurs only if every inconsistent voter deviates in the same direction. Deviations that run in different directions largely cancel, so the typical shift is much smaller.

This small amount of noise is considered a worthwhile price for preventing intersection attacks and preserving privacy. The system uses aggregate consistency checks rather than per-voter checks. Election officials can verify that the total number of accepted ballots makes sense, that credentials were properly issued, that the tally matches the ballots. But they do not attempt to enforce consistency at the individual level.

Part 11: Architectural Privacy vs. Statistical Privacy

Differential privacy adds noise to protect records that exist.

This system prevents those records from existing.

The difference is categorical, not incremental.

Differential privacy is a statistical technique used in conventional systems: collect very detailed data about people, then add noise to make individual records resistant to reconstruction attacks. You store 100 million voter records with dozens of demographic attributes each, then run the numbers through a privacy mechanism that introduces uncertainty. The results become approximate, but individual records should resist intersection attacks.

This system does not need differential privacy because it never creates the dangerous dataset in the first place.

Differential privacy and this system approach the problem from opposite sides.

Differential privacy asks: “We have already collected demographic profiles. How do we add noise so intersections cannot be reconstructed?”

This system asks: “How do we collect demographic data so that intersections never exist to be reconstructed?”

One protects records that exist by making them fuzzy. The other prevents the records from existing at all.

Consider the practical implications. Differential privacy adds uncertainty to results. A query about how Muslims voted might return 51.3% or 52.7% instead of 52%, because noise was added to protect individual records. You lose analytic precision in exchange for privacy. This is an inherent tradeoff. You cannot both have perfect accuracy and perfect privacy if the dangerous data exists.
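The noise that differential privacy adds can be illustrated with the Laplace mechanism applied to a counting query. This is a generic textbook sketch, not part of the system described here; the count, the epsilon value, and the function names are assumptions chosen for the example.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise; a counting query has sensitivity 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
true_count = 5200          # hypothetical: votes for A within one group
noisy = dp_count(true_count, epsilon=0.5, rng=rng)

print("differentially private release:", round(noisy, 1))
print("separate-election release:     ", true_count)
```

A smaller epsilon means stronger protection and larger noise, which is the accuracy cost the paragraph above describes. The separate-election tally publishes the exact count instead.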

This system eliminates that tradeoff. It loses some possible questions—you cannot ask “how did 30–39-year-old Muslims in the Northeast vote?”—but the results you do get are exact. They are the actual tallies from independent elections. They are not noisy approximations. You sacrificed a question you cannot safely ask anyway, not accuracy of the questions you can ask.

More importantly, the security model is fundamentally different. Differential privacy protects what is published, not what is stored. A hacker who breaches a conventional centralized system that applies differential privacy to its outputs may find the underlying records the noise was meant to protect, and even the noisy outputs might be combined with other sources to re-identify people. A hacker who breaches this system finds no demographic profiles to steal. They were never stored together. They do not exist in any database, any ledger, any proof. A future authoritarian government cannot access linked voter data because it does not exist in linkable form. A researcher cannot reverse-engineer intersections because the data was never collected together.

Privacy in this system does not depend on how well the noise generator was built, how long the encryption keys are, or how strong the access controls are. Privacy emerges from the absence of dangerous information, not from the obfuscation of it. The system is secure not because it hides data, but because it refuses to create the data that needs hiding.

This is why the system achieves something differential privacy cannot: it provides demographic transparency to a population that does not trust electronic systems with their data. Voters who understand that the system cannot possibly store their demographic profile together have stronger privacy guarantees than voters who are told “we added noise really well, trust us.” Trust is not required. Mathematical impossibility is.

This represents a categorical paradigm shift in how privacy can be achieved in systems where both transparency and privacy are required. Instead of asking “how do we collect data safely?” the system asks “how do we collect transparency safely?” The answers are completely different.

Part 12: Statistical Inference Without Observation

Although the system never measures demographic intersections directly, society can still estimate them using external information and statistical inference.

The system publishes that Muslims chose candidate A at a rate of 52%, that voters aged 30–39 chose A at a rate of 47%, and that Northeast residents chose A at a rate of 58%. These are three independent statistics from three independent elections. None of them are approximate. All are exact.

From census data, we know what fraction of the population is Muslim, what fraction is aged 30–39, and what fraction lives in the Northeast. Using statistical methods—Bayesian inference, regression analysis, demographic modeling—researchers can estimate the probability that a randomly selected voter who is Muslim AND aged 30–39 AND from the Northeast voted for A.

This estimate is not based on stored data. It is an inference from multiple independent sources combined using known statistical relationships and demographic models. The estimate depends on modeling assumptions, such as how the attributes relate to one another, and it will generally be less precise than a direct measurement would be. But the advantage is enormous: no voter profile was ever stored, so no privacy was compromised. No intersection record ever existed.
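One very simple way to combine the published marginals is a naive-Bayes style odds calculation. This sketch makes strong assumptions that are not part of the system: a two-candidate race, demographic attributes that are conditionally independent given the vote, and a hypothetical overall share of 50% for candidate A. Real analyses would use richer models and census data, and could reach different numbers.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)

def combine_marginals(marginals, overall: float) -> float:
    """Estimate P(A | all attributes) from per-attribute rates P(A | attribute).

    Under conditional independence of the attributes given the vote:
        odds(A | x1..xn) = odds(A) * product(odds(A | xi) / odds(A))
    """
    base = odds(overall)
    combined = base
    for p in marginals:
        combined *= odds(p) / base
    return combined / (1.0 + combined)

# Published group results from the three separate elections.
published = {"Muslim": 0.52, "aged 30-39": 0.47, "Northeast": 0.58}
overall_share = 0.50  # hypothetical overall result, not given in the text

estimate = combine_marginals(published.values(), overall_share)
print(round(estimate, 3))
```

With these inputs the estimate comes out at about 0.57. The number is a model output rather than a measurement, and it changes if the independence assumption or the overall share changes.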

Researchers and policymakers can get the insights they need without running a surveillance system.

Part 13: The Core Insight — Architectural Privacy

Differential privacy adds noise to existing records. This system prevents records from existing in the first place. That is a categorical difference, not a degree of difference.

For 20+ years, privacy research assumed a fundamental tradeoff: if you want demographic insight, you must collect demographic data somewhere, which creates privacy risk. The only question was how to manage that risk—through encryption, access controls, or statistical noise.

This system breaks that assumption. It achieves demographic transparency by ensuring demographic intersections never exist to be protected. A Muslim voter. A 30–39-year-old. A Northeast resident. All separate. Unlinkable. No database can join them. No query can retrieve them together. No hacker can steal what was never stored.

Voters who don’t trust electronic systems now have a reason to trust them. Not because the encryption is strong or the algorithms are clever, but because the dangerous data was never collected. Privacy is not maintained through secrecy. It is guaranteed through absence.

This is architectural privacy: designing systems so that the dangerous state cannot be reached, rather than protecting the dangerous state once it exists.

Part 14: The Complete Picture

Consider what the system has achieved:

The public can learn how demographic groups voted. Religious organizations can know how their members voted collectively. Age-based advocacy groups can know how their demographic voted. Geographic regions can know how their residents voted. Every demographic result is accurate, verifiable, and published.

Individual voters remain completely anonymous. No observer can link a vote to a voter. No authority maintains a record of which way any person voted. The mixnet shuffling ensures that even the election officials who conducted the tally cannot determine who voted for whom.

Voters can verify their ballots were included and counted, without proving how they voted. They have accountability without exposure. They can detect fraud without compromising privacy.

Vote buying becomes economically impossible because buyers cannot verify delivery. Coercion becomes ineffective because victims cannot prove their choices. Election officials cannot selectively discard ballots because discarded ballots will be missing from the Merkle tree and the voter can prove it.

The entire system is publicly verifiable. No trust is required. The mathematics does the auditing.

And the protection of demographic intersections is achieved not through encrypting a combined dataset, not through differential privacy, not through legal protections, but through architectural separation. The dangerous dataset is never created. The information simply does not exist.

This is why privacy emerges naturally from the structure rather than from noise, trust, or policy.

How This Compares to Paper Ballots

Paper ballots have been the gold standard of election security for centuries. They are simple, verifiable, and hard to hack remotely. But simplicity comes with costs that are rarely acknowledged.

In a paper ballot system, a voter cannot verify that their specific ballot was counted. They can observe the poll worker place their ballot in a box, but they have no cryptographic proof of inclusion. They cannot audit the election without disclosing how they voted. If officials claim to have counted the ballots correctly, the voter has no mathematical proof to verify this claim. Trust in officials is required.

A famous example: the 2000 U.S. presidential election between George W. Bush and Al Gore turned on the manual recount of paper ballots in Florida. The recount took so long that the Supreme Court intervened before it could be completed. We may never know definitively what a full recount would have shown, because paper ballots, while physical, are not cryptographically verifiable. They must be recounted by hand, and hand recounts are slow, error-prone, and subject to human dispute.

More recently, the COVID-19 pandemic created practical concerns about paper voting. Shared voting machines and paper ballots handled by many people create disease transmission risks. Electronic voting with individual devices reduces physical contact, but then voters lose the ability to observe the process and verify their ballot.

This demographic voting system achieves what neither pure paper ballots nor simple electronic voting can: it combines the verifiability advantages of paper ballots with the speed and efficiency advantages of electronic systems. A voter receives a cryptographic commitment proving their ballot exists. They can verify this commitment appears in the official election record without revealing how they voted. Election officials cannot deny that a ballot was submitted or counted without the mathematical proofs failing to validate.

The system is also vastly faster than paper ballots. Tallying happens through cryptographic aggregation rather than hand counting. Results are available immediately upon completion of voting. Audits happen through mathematical verification rather than manual recounts. There is no waiting days or weeks for results. There is no dispute about whether ballots were miscounted because the math either checks out or it does not.

The Evolution of Voting Technology: From Blockchain to Demographic Elections

In 2020, this author published an article in CoinDesk titled “In Defense of Blockchain Voting” arguing that cryptographic voting systems represented the future of elections. The core insight was that for every technology we use today, there was a time it was laughably inadequate as a replacement for what came before—chess engines were once curiosities but now beat grandmasters, and voting systems would eventually move beyond paper to cryptographically verifiable electronic systems.

However, the blockchain systems discussed in that article faced a fundamental scalability problem. Bitcoin and Ethereum could not handle millions of simultaneous votes without creating congestion and transaction backlogs. Voting must happen at a specific time, on a specific day, and the system must be able to handle millions of concurrent submissions. Centralized “layer 2” solutions were proposed, but centralization defeats the purpose of using blockchain in the first place.

The demographic voting system described here solves the scalability problem through a different architectural approach. Rather than relying on a monolithic blockchain where every node processes every transaction, the system uses a sharded architecture similar to peer-to-peer networks like BitTorrent. Each demographic election is a separate, independent system. Each election sees only a fraction of the overall network traffic. The network scales naturally by distributing load across multiple independent elections rather than centralizing it into a single bottleneck.

The system also solves the demographic privacy problem that neither traditional blockchain voting nor paper ballots address. Paper ballots offer no mechanism for demographic analysis without compromising privacy—you cannot ask “how did Muslims vote?” without either collecting demographic information (creating surveillance) or giving up the ability to know. Traditional blockchain voting stores everything on a public ledger, making privacy nearly impossible to achieve. This system separates demographic dimensions so thoroughly that demographic transparency and voter privacy coexist.

The result is a voting architecture that learns from the strengths of all three approaches: the simplicity and physical verifiability of paper ballots, the cryptographic verification and scalability of blockchain systems, and the demographic insight previously available only through centralized systems that compromise privacy.

Conclusion

Most voting systems force a tradeoff. Either voters receive receipts, which enable verification but also coercion, or they receive no receipts, which preserves privacy but eliminates accountability.

This system breaks the false choice. It provides receipts for participation without receipts for preference. It allows verification without enabling coercion. It allows society to learn how groups voted without learning who voted how.

By decomposing a single logical election into multiple independent demographic elections and issuing voters unlinkable credentials for each dimension, the system ensures that no participant ever possesses enough information to reconstruct demographic intersections.

By using commitments, Merkle trees, and mixnets, the system ensures that voters can prove their ballot was included and counted without proving how they voted.

The result is something rare in the history of voting: transparency without surveillance, accountability without identification, analysis without intersection attacks.

Elections that know how groups voted. That never know how any individual voted. That achieve demographic insight by ensuring demographic intersections never exist.

This represents a paradigm shift in how privacy can be achieved in systems where transparency is required. Rather than the traditional approach—collect data, then protect it through encryption, access controls, or statistical noise—this system asks a different question: how do we collect transparency itself safely, without ever creating the dangerous intermediate state?

The answer is architectural privacy: designing systems so that the dangerous state cannot be reached, rather than protecting the dangerous state once it exists. This principle extends far beyond voting. Anywhere demographic or intersectional analysis is needed—healthcare systems, employment records, housing data, social research—the same logic applies. Separate the data by dimension. Make the dimensions unlinkable. Ensure intersections never coexist. The insight is not about voting. It is about how to think about privacy in systems that demand transparency.