Multi-party Private Set Intersection

Summary

Problem Definition

Given \(n\) parties \(P_1, P_2, \ldots, P_n\), where each party \(P_i\) holds a private set \(X_i\) over a common domain \(\mathcal{U}\), MPSI computes the intersection \(\bigcap_{i=1}^n X_i\) such that:

Correctness: The designated output party (typically \(P_1\)) learns exactly \(\bigcap_{i=1}^n X_i\).
Privacy: No party learns any information about elements in other parties' sets beyond what can be inferred from the intersection and their own input.

The corruption model varies across protocols:

Semi-honest: Adversaries follow the protocol but try to extract additional information from the transcript.
Malicious: Adversaries may deviate arbitrarily from the protocol.
Collusion resistance: Security holds even if up to \(t\) parties collude (with or without an honest majority).

The key challenges over two-party Private Set Intersection are:

Scalability: Communication and computation tend to scale with the number of parties \(n\).
Collusion: In two-party PSI, security is binary: one party is honest, one may be corrupt. In MPSI, any subset of parties may collude and combine their protocol views to extract information about honest parties' sets. The protocol must be simulatable for every possible coalition, which is a combinatorially harder security requirement. This is why many MPSI protocols restrict the corruption threshold or require an honest majority.

Taxonomy of Approaches

See [1] for a comprehensive classification. The main approaches are:

Private homomorphic set representations (polynomials, hash sets, sorted multisets). Encode each set into a representation that supports a homomorphic operation \(+\) such that \(\mathsf{Dec}(\mathsf{Enc}(X_1) + \cdots + \mathsf{Enc}(X_n))\) yields the intersection. For example, represent sets as polynomials with elements as roots and use the property that a random linear combination \(r_1 P_1 + r_2 P_2\) preserves common roots while randomizing the rest.
Leaky homomorphic set representations (Bloom filters, garbled Bloom filters). Same idea as above, but the encoding leaks some information (e.g., Bloom filters leak false positives through hash collisions).
Aggregatable membership queries (Oblivious Programmable PRF, Oblivious Key Value Store, polynomial embeddings). Instead of combining set representations, test each candidate element for membership across all parties and aggregate the results. For example, using zero-sharing ([2]): each party \(P_i\) obtains a share \(s_i(x)\) for every element \(x\) such that \(\bigoplus_{i=1}^n s_i(x) = 0\). Each party then masks its set elements with its share via an Oblivious Programmable PRF. For an element in the intersection, all parties hold it, so the shares XOR to zero and the element is recognizable. For non-intersection elements, at least one share is pseudorandom, making the XOR indistinguishable from random.

Papers

[2] First implemented MPSI protocol; introduces Oblivious Programmable PRF to avoid public-key operations, achieving semi-honest security against any number of corrupted parties without an honest majority.
[3] Combines Bloom filters with exponential ElGamal encryption in a star topology; achieves \(O(n \cdot m_{\max})\) communication and computation independent of \(n\) for non-designated parties.
[4] Extends two-party garbled Bloom filters to the multi-party setting; concretely efficient with communication linear in \(n\) and set sizes, semi-honest model.
[5] Uses Oblivious Linear Evaluation to add polynomials while preserving roots; UC-secure against malicious adversaries with asymptotically optimal \(O((n^2 + nm)\kappa)\) communication, avoiding homomorphic threshold encryption and ZK proofs.
[6] Extends two-party symmetric-key PSI to the multi-party setting using an untrusted third-party model with hardware tokens.

Applications

Multiple Shop owners or digital services want to launch a collective promotion campaign. To do so, these shops must identify their mutual set of customers, without violating the privacy of the customers. ([3] [2])
Multiple organizations come together to catch an intruder in a common network. The organizations keep a list of suspicious IPs and by computing the intersection they narrow them down. Since IP addresses contain personal information it is important to keep them private. ([2, 4, 7])
Detect botnets. ([5])
An investigative agency needs to narrow down a list of suspects by cooperating with multiple other agencies. Since none of the agencies can share plain data with each other, they engage in an MPSI protocol to only consider relevant suspects. [6]
More applications in [1]

References

[1] Jelle Vos, Mauro Conti, and Zekeriya Erkin. 2024. “SoK: Collusion-resistant Multi-party Private Set Intersections in the Semi-honest Model.” https://ieeexplore.ieee.org/abstract/document/10646669?casa_token=SC9GM3SMrZcAAAAA:GWpdE0SfvbK5E4vPnmP-wCRLV2udPyeerw8Xtuqv5KZMjhaPDKjIuVt0Ih03_U1kCiqb1sEtKd4v.

[2] Vladimir Kolesnikov, Naor Matania, Benny Pinkas, Mike Rosulek, and Ni Trieu. 2017. “Practical Multi-party Private Set Intersection from Symmetric-Key Techniques.” https://dl.acm.org/doi/10.1145/3133956.3134065.

[3] Atsuko Miyaji, and Shohei Nishida. 2015. “A Scalable Multiparty Private Set Intersection.”

[4] Roi Inbar, Eran Omri, and Benny Pinkas. 2018. “Efficient Scalable Multiparty Private Set-Intersection via Garbled Bloom Filters.”

[5] Satrajit Ghosh, and Tobias Nilges. 2017. “An Algebraic Approach to Maliciously Secure Private Set Intersection.” https://eprint.iacr.org/2017/1064.

[6] Zhusheng Wang, Karim Banawan, and Sennur Ulukus. 2021. “Multi-Party Private Set Intersection: An Information-Theoretic Approach.” https://ieeexplore.ieee.org/document/9349622/.

[7] Lea Kissner, and Dawn Song. 2005. “Privacy-Preserving Set Operations.” http://link.springer.com/10.1007/11535218_15.