Accelerating Scientific Research with Gemini: Case Studies and Common Techniques
Cite this paper
Academic & Research Attribution
@misc{arxiv_paper__unknown__2602.03837,
author = {Unknown},
title = {Accelerating Scientific Research with Gemini: Case Studies and Common Techniques Paper},
year = {2026},
howpublished = {\url{https://free2aitools.com/paper/arxiv-paper--unknown--2602.03837}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
[2602.03837] Accelerating Scientific Research with Gemini: Case Studies and Common Techniques
David P. Woodruff *, †, ‡
Google Research
Carnegie Mellon University
Vincent Cohen-Addad †, ‡
Google Research
Lalit Jain ‡
Google Research
Jieming Mao ‡
Google Research
Song Zuo †, ‡
Google Research
MohammadHossein Bateni †
Google Research
Simina Brânzei †
Purdue University
Google Research
Michael P. Brenner †
Google Research
Harvard University
Lin Chen †
Google Research
Ying Feng †
MIT
Lance Fortnow †
Illinois Institute of Technology
Gang Fu †
Google Research
Ziyi Guan †
EPFL
Zahra Hadizadeh †
University of California, Irvine
Mohammad T. Hajiaghayi †
Google Research
University of Maryland, College Park
Mahdi JafariRaviz †
University of Maryland, College Park
Adel Javanmard †
Google Research
University of Southern California
Karthik C. S. †
Rutgers University
Ken-ichi Kawarabayashi †
National Institute of Informatics, Tokyo and The University of Tokyo
Ravi Kumar †
Google Research
Silvio Lattanzi †
Google Research
Euiwoong Lee †
University of Michigan
Yi Li †
Nanyang Technological University
Ioannis Panageas †
University of California, Irvine
Dimitris Paparas †
Google Research
Benjamin Przybocki †
Carnegie Mellon University
Bernardo Subercaseaux †
Carnegie Mellon University
Ola Svensson †
EPFL
Shayan Taherijam †
University of California, Irvine
Xuan Wu †
Nanyang Technological University
Eylon Yogev †
Bar-Ilan University
Morteza Zadimoghaddam †
Google Research
Samson Zhou †
Texas A&M University
Yossi Matias
Google Research
James Manyika
Google Research
Vahab Mirrokni *, †, ‡
Google Research
Abstract
Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to contribute to novel, expert-level mathematical discovery is less understood. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models, specifically Google’s Gemini-based models (in particular Gemini Deep Think and its advanced variants), to solve open problems, refute conjectures, and generate new proofs across diverse areas in theoretical computer science, as well as other areas such as economics, optimization, and physics. Based on these experiences, we extract common techniques for effective human-AI collaboration in theoretical research, such as iterative refinement, problem decomposition, and cross-disciplinary knowledge transfer. While the majority of our results stem from this interactive, conversational methodology, we also highlight specific instances that push beyond standard chat interfaces. These include deploying the model as a rigorous adversarial reviewer to detect subtle flaws in existing proofs, and embedding it within a "neuro-symbolic" loop that autonomously writes and executes code to verify complex derivations. Together, these examples highlight the potential of AI not just as a tool for automation, but as a versatile, genuine partner in the creative process of scientific discovery.
* Corresponding authors: [email protected], [email protected]. † Individual section contributor. Authors of individual sections are only responsible for the correctness of their section. ‡ Contributed to advanced variants of Google models used in this paper.
1
Introduction
The integration of artificial intelligence into the scientific workflow has traditionally focused on data analysis, simulation, and routine automation. However, the emergence of frontier large language models (LLMs) with enhanced reasoning capabilities suggests a shift towards AI systems that can act as effective research collaborators and contribute to the core intellectual tasks of research: formulating hypotheses, designing algorithms, developing novel techniques to tackle open problems, and proving theorems.
This paper documents a series of independent experiments where researchers utilized advanced AI models to tackle specific, often long-standing, open problems in their respective fields. The results range from resolving conjectures in information theory and submodular maximization to deriving exact analytical spectra for cosmic strings and improving bounds for graph algorithms.
Key Contributions: Given the breadth of the fields and case studies covered, we highlight three primary contributions of this paper:
Demonstrating the power of hybrid models: We show the potential of a hybrid human-AI model in tackling real, highly complex open problems across a diverse set of scientific and mathematical domains.
Advancing state-of-the-art science: We make concrete, notable contributions to real problems in science, yielding novel results such as resolving conjectures, improving algorithmic bounds, and identifying critical flaws in the literature.
Showcasing the future of scientific discovery: We illustrate what the future of research may look like in practice. By detailing novel workflows, human-AI collaboration techniques, and interactive problem-solving strategies, we highlight the practical implications for how science is likely to be conducted in the coming years.
Crucially, these are not hypothetical scenarios but real-world instances where AI played a pivotal role in advancing state-of-the-art research. By analyzing these diverse case studies, we identify recurring patterns and effective strategies for leveraging AI in theoretical research. The significance of these results lies in the generality of these techniques and the existence of common recipes that can be applied using publicly available Gemini models. These include:
•
Agentic Execution Loops: Moving beyond manual chat interfaces, models can be embedded in automated “neuro-symbolic” pipelines. In these setups, the AI proposes a mathematical solution, writes code to numerically verify it, and automatically ingests execution errors (e.g., Python tracebacks) to self-correct and autonomously prune invalid mathematical branches (see Section 6.1).
•
Deep Technical Review and Bug Detection: Beyond constructive tasks, AI models can act as adversarial reviewers. We present a case where an LLM, guided by an iterative self-correction protocol, identified a fatal flaw in a recent cryptography preprint claiming a major breakthrough (SNARGs from LWE) [52]: a subtle inconsistency between a definition and a construction that had escaped initial human review.
•
Deep Literature Synthesis and Connection: AI models can identify obscure connections between disparate fields (e.g., linking Steiner trees to the Kirszbraun Extension Theorem) that human experts might overlook.
•
Counterexample Generation: Models are adept at constructing counterexamples to refute plausible conjectures, saving researchers from pursuing dead ends.
•
Algorithmic Insight and Optimization: In algorithmic research, AI can propose novel data structures or analysis techniques (e.g., adapting quadtrees for different norms) to improve time complexity bounds.
•
Automated Proof Generation and Verification: For well-defined subproblems, AI can generate rigorous proofs, sometimes requiring minimal human intervention, or verify complex manual derivations.
•
Interactive Refinement: A recurring theme is the iterative "conversation" where the researcher guides the model, correcting errors and refining the problem statement, which often leads to the final solution.
•
Theoretical Justification of Heuristics: AI models can bridge the gap between empirical success and theory by deriving rigorous justifications for heuristic methods, such as characterizing the implicit regularization induced by specific architectural choices like the Self-regularized Gumbel Sigmoid.
1.1
Model Description
Unless otherwise specified in the testimonials, the model used in this work is a Google-internal advanced version of Gemini Deep Think [73], an enhanced reasoning model for complex problems that incorporates some of our latest research techniques, including parallel thinking, i.e., exploring multiple proof branches simultaneously. Similar models were recently used at the International Mathematical Olympiad, achieving gold-medal standard performance [73]. This setup enables the model to explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought. This approach aligns with prior research demonstrating the efficacy of tree-based search methods in mathematical reasoning [44].
To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini with novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving, and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems [73]. Finally, we added a long linear chain of interactive verification calls at the output to increase reasoning depth and provide extensive iterative verification, which was augmented by human expert verification.
Figure 1: Overview of the reasoning architecture used in many testimonials: an extensive exploration of the solution space combined with deep reasoning, a long tail of automated and human verification, and, in several cases, guidance and iterative feedback.
1.2
Related Work
Recent advances in Large Language Models (LLMs) have spurred significant interest in their potential to accelerate scientific discovery. Several concurrent works explore the capabilities of AI in assisting mathematical and theoretical research. Here, we situate our contributions in the context of two particularly relevant studies: OpenAI’s investigation into GPT-5’s scientific capabilities [16] and the work of Nagda et al. on using AlphaEvolve for hardness of approximation [76].
Comparison with OpenAI’s GPT-5 Experiments.
Concurrent work by OpenAI [16] documents a series of case studies demonstrating GPT-5’s ability to contribute to research in fields ranging from biology and physics to mathematics. While both our work and theirs highlight the potential of frontier models as research assistants, there are differences in scope and focus. Our work places a heavier emphasis on Theoretical Computer Science (TCS), covering a wide array of subfields including algorithmic game theory, approximation algorithms, complexity theory, cryptography, graph theory, information theory, optimization, online algorithms, streaming algorithms, and query complexity.
Furthermore, recent activity on the Erdős Problem database has showcased the growing role of AI in resolving open mathematical conjectures. Notably, Erdős Problem #728, concerning factorial divisibility, was solved autonomously by an AI system (Aristotle), with GPT-5 playing a crucial role in generating the proof strategy and subsequent expositions [16]. This achievement, verified by Terence Tao and the broader community, underscores the capacity of these models not only to solve problems but also to rapidly iterate on and improve mathematical writing. Other problems, such as #729, #401, #205, #333, #281, and #460, have also seen AI-assisted progress or full resolutions, further validating the trend of AI-accelerated mathematical discovery. Our work complements these findings by providing a detailed methodology and a broader set of examples across TCS, demonstrating that these capabilities are not isolated incidents but part of a larger shift in research practices.
Comparison with AlphaEvolve for Hardness of Approximation.
Nagda, Raghavan, and Thakurta [76] present significant results in hardness of approximation for MAX-CUT, MAX-k-CUT, and TSP. Their primary methodology relies on AlphaEvolve [78], a specialized evolutionary algorithm that uses an LLM as a mutation operator to search for combinatorial structures (gadgets) that optimize a specific objective function (inapproximability ratios).
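For readers unfamiliar with this style of search, the loop below is a minimal, illustrative sketch of evolutionary search with a pluggable mutation operator and a fixed objective. It is not AlphaEvolve. The hypothetical `mutate` function (a random bit flip) stands in for LLM-driven mutation. The toy objective (edges cut in a 5-cycle) stands in for an inapproximability ratio.

```python
import random
from typing import Callable, Tuple

Genome = Tuple[int, ...]

# Toy objective standing in for an inapproximability ratio: edges cut in a 5-cycle.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]


def cut_value(assignment: Genome) -> int:
    return sum(assignment[u] != assignment[v] for u, v in EDGES)


def mutate(assignment: Genome, rng: random.Random) -> Genome:
    """Hypothetical stand-in for an LLM mutation operator: flip one random coordinate."""
    child = list(assignment)
    child[rng.randrange(len(child))] ^= 1
    return tuple(child)


def evolve(objective: Callable[[Genome], int], n: int,
           generations: int = 200, population: int = 8, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    pool = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(population)]
    for _ in range(generations):
        parent = max(rng.sample(pool, 2), key=objective)  # binary tournament selection
        child = mutate(parent, rng)
        worst = min(pool, key=objective)
        if objective(child) >= objective(worst):  # steady-state replacement of the worst
            pool.remove(worst)
            pool.append(child)
    return max(pool, key=objective)


best = evolve(cut_value, n=5)
print(best, cut_value(best))
```

Only the worst member is ever replaced, and only by a child that scores at least as well. The best score in the pool therefore never decreases. A real system would swap in model-generated edits for `mutate` and the gadget's objective for `cut_value`.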
In contrast, our work focuses on the utility of general-purpose LLMs (specifically Gemini and its reasoning-enhanced variants like Deep Think) as conversational research partners. While AlphaEvolve is a powerful tool for search problems where a clear objective function exists (e.g., finding a gadget with specific properties), our case studies demonstrate that general-purpose models can be effective across a broader range of intellectual tasks where the "objective" is less well-defined, such as generating a proof idea, identifying a relevant theorem from a different field, or interactively refining a definition.
For instance, our derivation of the analytical spectrum for cosmic strings (Section 6.1) involved a tree-search over derivation strategies and mathematical concepts, rather than a direct optimization of a numerical value. Similarly, the resolution of the "Simplex is the Best for Graph Embeddings" conjecture (Section 4.2) relied on conceptual bridging (linking Steiner trees to Lipschitz extensions) rather than combinatorial search. Thus, our work complements [76] by showing that standard frontier models, without specialized evolutionary wrappers, can drive progress in theoretical research through dialogue and reasoning.
Concurrent Work.
A recent work [48] pursues mathematical discovery at scale using AI-assisted methodologies. Our work aligns with it in demonstrating the potential of AI as a collaborative partner in theoretical research, but we offer a distinct set of case studies and a focus on specific TCS domains. Additionally, we note independent works [89, 91] that further contribute to the growing body of literature on AI in mathematics and science.
In concurrent and independent work, Feng et al. [36, 37] study similar Gemini-based models using similar techniques, though their focus is on pure mathematics problems, whereas ours is mostly on other disciplines, such as many areas of theoretical computer science, as well as economics and physics.
Roadmap
The remainder of this paper is organized as follows. Section 2 synthesizes the common techniques and methodologies derived from these case studies and concludes, in Section 2.8, with a discussion of the broader implications for the future of theoretical research. The subsequent sections give detailed and extensive testimonials from each collaboration, roughly categorized by the main role of the AI in the collaboration.
2
Techniques for AI-Assisted Research
Across the various successful collaborations documented here, several common techniques emerged. These strategies represent a "playbook" for researchers looking to integrate AI into their theoretical work.
2.1
Iterative Prompting and Refinement
Rarely does a model solve a deep open problem in a single shot. Success often comes from an iterative dialogue.
•
Initial Broad Query: Start by asking the model to digest a relevant paper or problem statement to gauge its understanding.
•
Specific Sub-tasks: Break down the main problem into smaller, verifiable lemmas or calculations.
•
Error Correction: When the model makes a mistake (e.g., a wrong constant or invalid assumption), pointing it out specifically often leads to a correct and sometimes more elegant solution in the next turn.
•
Scaffolding: Providing the model with a high-level proof strategy or "scaffold" allows it to fill in the technical details effectively.
•
Adversarial Self-Correction for Review: When tasked with reviewing complex proofs, standard prompts often yield superficial results. A rigorous protocol that instructs the model to (1) generate an initial review, (2) critique its own findings for hallucinations, and (3) iteratively refine the logic enables deep technical critique. This was critical in identifying the flaw in the SNARGs paper (Section 3.2), where the model distinguished between a definition of perfect consistency and a construction of statistical consistency.
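The three-step protocol can be written as a small driver around any chat model. The sketch below assumes a hypothetical `ask(prompt) -> reply` callable. The prompt wording is illustrative only, since the exact review prompts are not given here.

```python
from typing import Callable

# Hypothetical interface: `ask` sends one prompt to a chat model and returns its reply.
Ask = Callable[[str], str]


def adversarial_review(ask: Ask, proof: str, rounds: int = 2) -> str:
    """Review, then self-critique and refine `rounds` times (prompt wording is illustrative)."""
    # (1) Initial review.
    review = ask(f"Review this proof rigorously and list every potential flaw:\n\n{proof}")
    for _ in range(rounds):
        # (2) Self-critique: re-check each claimed flaw against the proof to catch hallucinations.
        critique = ask(
            "For each issue in the review below, re-check it against the proof and label it "
            f"CONFIRMED or HALLUCINATED, with a justification.\n\nProof:\n{proof}\n\nReview:\n{review}"
        )
        # (3) Refine: keep only confirmed issues and sharpen the argument for each.
        review = ask(
            "Rewrite the review keeping only CONFIRMED issues, stating each precisely."
            f"\n\nReview:\n{review}\n\nCritique:\n{critique}"
        )
    return review


# Stub model for a dry run: records each prompt and returns a numbered placeholder reply.
calls = []


def stub_ask(prompt: str) -> str:
    calls.append(prompt)
    return f"reply {len(calls)}"


final = adversarial_review(stub_ask, "<proof text>", rounds=2)
print(len(calls), final)  # 5 reply 5
```

A stub `ask` that returns canned replies is a cheap way to test the control flow before you connect a real model. The `rounds` parameter trades cost against depth of self-critique.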
2.2
Cross-Pollination of Ideas
Models have ingested vast amounts of literature across many fields. They excel at:
•
Finding Analogies: Identifying similar problems in different domains (e.g., applying techniques from computational geometry to graph theory).
•
Retrieving obscure theorems: Bringing relevant but less-known theorems to the researcher’s attention (e.g., Stone-Weierstrass or Kirszbraun Extension Theorem) to bridge gaps in a proof.
(a) From Discrete Combinatorics to Continuous Measure Theory: To resolve an open question about bounded-rank SDP solutions for Max-Cut, the AI reframed a discrete combinatorial problem involving unit vectors as an energy minimization problem over continuous probability measures on the unit sphere $S^{d-1}$ (illustrated in this panel). This cross-pollination allowed the AI to bypass traditional approximation toolkits and instead apply the Stone-Weierstrass Theorem from functional analysis to establish the necessary variance bounds. See Section 4.1.
(b) From Graph Embeddings to Hilbert Space Geometry: To resolve the "Simplex is Best" conjecture for Euclidean Steiner Trees, the AI constructed a mapping from the Euclidean embedding of an arbitrary graph (orange network, left) to the embedding of a star graph, which forms a regular geometric simplex (green tetrahedron, right). By recognizing this as a distance-compressing (1-Lipschitz) map, the AI realized it could apply the obscure Kirszbraun Extension Theorem to formally guarantee that transforming the graph into a simplex never increases the Steiner tree cost. See Section 4.2 .
Figure 2: Visualizing Cross-Pollination: The AI model successfully resolved open problems by reframing them in entirely different mathematical domains, bringing advanced topological and geometric theorems to bear on algorithmic and combinatorial problems.
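Kirszbraun's theorem states that a 1-Lipschitz map from a subset of one Hilbert space into another extends to a 1-Lipschitz map on the whole space. Apply that extension to the Steiner points of any Steiner tree on the original terminals. The result is a connecting tree on the image terminals with no longer edges, which is why the image cost cannot exceed the original.

The hypothesis on the terminals is easy to test numerically. The helper below (the name `is_distance_compressing` is ours) checks it for finite point sets. It does not construct the extension itself.

```python
import itertools
import math
from typing import Sequence


def is_distance_compressing(
    src: Sequence[Sequence[float]], dst: Sequence[Sequence[float]], tol: float = 1e-12
) -> bool:
    """True if src[k] -> dst[k] never increases a pairwise Euclidean distance (1-Lipschitz)."""
    if len(src) != len(dst):
        raise ValueError("src and dst must list the same number of points")
    return all(
        math.dist(dst[a], dst[b]) <= math.dist(src[a], src[b]) + tol
        for a, b in itertools.combinations(range(len(src)), 2)
    )


# Orthogonal projection from R^3 onto the xy-plane is 1-Lipschitz; scaling by 2 is not.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0), (0.0, 3.0, 1.0), (2.0, 2.0, 2.0)]
projected = [(x, y) for x, y, _ in points]
doubled = [(2 * x, 2 * y, 2 * z) for x, y, z in points]
print(is_distance_compressing(points, projected))  # True
print(is_distance_compressing(points, doubled))    # False
```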
2.3
Simulation and Counterexample Search
For conjectures, models can be tasked to:
•
Construct Counterexamples: Generating specific instances (graphs, matrices, set systems) that violate a proposed conjecture.
•
Verify Small Cases: Writing code to computationally verify a conjecture for small $n$, providing empirical evidence before attempting a general proof.
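As a toy illustration of both bullets (not one of the paper's conjectures), take the classical claim that $n^2 + n + 41$ is prime for every $n \ge 0$. It holds for every $n$ up to 39 and fails at $n = 40$, where the value is $41^2 = 1681$. A brute-force search of the kind a model might write could look like this; the helper names are ours:

```python
from typing import Callable, Optional


def is_prime(k: int) -> bool:
    """Trial division; adequate for the small values checked here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def first_counterexample(holds: Callable[[int], bool], limit: int) -> Optional[int]:
    """Smallest n in [0, limit) where the conjectured property fails, else None."""
    for n in range(limit):
        if not holds(n):
            return n
    return None


# Toy conjecture (false): n^2 + n + 41 is prime for every n >= 0.
print(first_counterexample(lambda n: is_prime(n * n + n + 41), 1000))  # 40
```

Passing small cases is only evidence, as this example shows. The same search that builds confidence for small $n$ also surfaces the first counterexample once the range is large enough.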
2.4
Formalization and Rigor Checks
While models can hallucinate, they are increasingly capable of formal reasoning when prompted correctly.
•
Proof Sketch to Formal Proof: Asking the model to expand a high-level sketch into a rigorous LaTeX proof.
•
Sanity Checking: Using the model to check consistent usage of notation or to verify that all conditions of a theorem are met.
•
Mathematical Derivation: Researchers can offload the mechanical heavy lifting of complex derivations to the model, such as simplifying expressions, computing limits, or solving integrals, allowing them to focus on the high-level logic.
2.5
Interactive Proof Construction with External Validation
A powerful technique involves using the model to identify necessary external theorems and then validating those theorems with external sources.
•
Identifying Dependencies: Asking the model to list all external theorems required for a proof.
•
External Verification: The researcher finds the formal statements of these theorems (e.g., via Google Search or textbooks) and feeds them back to the model.
•
Self-Contained Proof Generation: The model then incorporates these verified statements to generate a rigorous, self-contained proof.
2.6
Agentic Tool-Use and Automated Feedback
While most of the collaborations documented here rely on manual, iterative dialogue, frontier models can also be deployed as agents within automated programmatic loops. For problems requiring heavy algebraic manipulation or where symbolic math must be rigorously grounded in numerical reality (e.g., the physics case study in Section 6.4), researchers can construct a “neuro-symbolic” pipeline consisting of the following steps:
•
Symbolic Proposal: The LLM generates a mathematical hypothesis or intermediate expression.
•
Code Generation: The LLM autonomously writes an executable script (e.g., in Python) to evaluate its proposed math against a known numerical baseline.
•
Automated Feedback: The system executes the code. If the code fails, hits a runtime error, or reveals numerical instability (such as catastrophic cancellation), the automated harness captures the exact execution traceback and injects it back into the LLM’s context window.
This programmatic loop allows the AI to systematically explore a tree of mathematical solutions, autonomously pruning dead-ends and self-correcting its algebra without requiring a human-in-the-loop for every intermediate step.
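The loop can be sketched as a small harness. Here the LLM is replaced by a hypothetical `propose(feedback)` callable that returns Python source defining a function `candidate`. A `check` function compares that candidate against a numerical baseline, and any exception's traceback is fed back on the next call. All names are illustrative, and `exec` on model-generated code should only ever run inside a sandbox.

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical model interface: receives the latest feedback (None on the first call)
# and returns Python source that defines a function named `candidate`.
Propose = Callable[[Optional[str]], str]
Check = Callable[[Callable], bool]


def run_loop(propose: Propose, check: Check, max_iters: int = 5) -> Optional[Tuple[int, Callable]]:
    """Propose, execute, and feed errors back until `check` passes or the budget runs out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        source = propose(feedback)
        namespace: dict = {}
        try:
            exec(source, namespace)  # only ever run model-written code in a sandbox
            passed = check(namespace["candidate"])
        except Exception:
            feedback = traceback.format_exc()  # the exact traceback goes back into context
            continue
        if passed:
            return attempt, namespace["candidate"]
        feedback = "Numerical check failed against the baseline."
    return None


# Scripted stand-in for the LLM: a runtime bug, then a wrong formula, then a fix.
_scripts = iter([
    "def candidate(n):\n    return n * (n + 1) / twoo\n",
    "def candidate(n):\n    return n * n / 2\n",
    "def candidate(n):\n    return n * (n + 1) // 2\n",
])
feedback_log = []


def scripted_model(feedback: Optional[str]) -> str:
    feedback_log.append(feedback)
    return next(_scripts)


def matches_baseline(f: Callable) -> bool:
    # Closed form for 0 + 1 + ... + n versus direct summation.
    return all(f(n) == sum(range(n + 1)) for n in range(20))


result = run_loop(scripted_model, matches_baseline)
print(result[0])  # 3
```

The first attempt raises a `NameError` whose traceback is returned to the "model". The second passes execution but fails the numerical check. The third is accepted. This mirrors, in miniature, the prune-and-retry behavior described above.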
2.7
Human-AI Collaboration Dynamics
It is important to note that AI models like Gemini function best as powerful collaborators rather than autonomous researchers. In the successful case studies presented here, the partnership between the model and the human expert was key to the results.
•
Selection and Refinement: Models are capable of generating a high volume of diverse mathematical statements. Human expertise is valuable for filtering these outputs and identifying the most promising directions for further investigation.
•
Iterative Guidance: While models can solve some problems in a single shot, tackling deep open problems is often most successful through an iterative process. The researcher guides the model, refining the problem statement and narrowing the focus to achieve the desired result.
•
Standard Verification: As with any research collaboration, the AI can make mistakes, and AI-generated proofs and counterexamples benefit from rigorous verification. The model serves as an excellent accelerator for ideation and drafting, while the researcher validates the mathematical correctness.
•
Optimizing Context: Performance is often optimized by providing clear, self-contained definitions, particularly when using highly specialized notation that may deviate from standard literature.
•
Leveraging Literature: We found that incorporating relevant papers directly into the context significantly enhanced the model’s ability to construct correct proofs for specialized domains.
•
Context De-Identification: The model sometimes avoids non-trivial machinery (for example, the Kirszbraun extension theorem, as in Section 4.2). It may treat such proofs as non-elementary, or the prompt may steer it toward conservatism to avoid hallucinations; either way, it can abandon an otherwise viable approach. Separately, when the paper is supplied as context, the model occasionally recognizes the statement to be proved as a conjecture from that paper and refuses to attempt it on the grounds that it is an open problem. One way to bypass both issues is context de-identification: remove the paper and provide only the problem statement and definitions. The model then typically engages with the problem (and, in cases like Section 4.2, may ultimately draw on deeper results to resolve a conjecture).
We view the AI as a tireless, knowledgeable, and creative junior collaborator. Its value lies in its ability to synthesize vast amounts of information and generate novel hypotheses that human researchers can then validate and build upon.
2.8
Summary: The AI-Assisted Research Playbook
Taken together, the techniques outlined above represent a fundamental shift in how theoretical research can be conducted. The LLM is no longer acting merely as a search engine or a syntax formatter; it is functioning as a combinatorial reasoning engine and a sounding board for abstract ideation.
However, the most successful collaborations documented in the following case studies all share a common denominator: strong human orchestration. Although several of our successes came from a single “zero-shot” prompt, many required scaffolded reasoning, i.e., breaking down deep open problems into verifiable parts, testing hypotheses through adversarial prompting, and actively steering the model. Informally, this interactive workflow has been called “vibe-proving”.
By mastering the techniques outlined above—particularly iterative refinement and adversarial self-correction—researchers can effectively elevate the AI from a passive tool into an active, high-leverage research partner. The following sections document these dynamics in practice across a wide array of disciplines.
Chat interactions are presented using a light, color-coded design to distinguish speakers while maintaining a calm and unobtrusive appearance. When present, the overall chat box uses a sand-colored background, providing a warm, neutral container that visually groups multiple messages within a conversation. In cases where only a single message appears—such as an isolated user message or an isolated AI response—the surrounding chat box may be omitted, and only the message styling is shown. Messages authored by the user are displayed on a light, subtle blue background, chosen to clearly indicate user input while remaining easy to read for extended or technical text. AI-generated messages are shown on a very light red background; this tone is intentionally muted to avoid the visual intensity typically associated with red, while still providing a clear distinction from user messages. Across all elements, colors are kept near-white and low in saturation so that speaker differences are conveyed through hue rather than darkness, ensuring readability on screens, in print, and in accessibility-conscious settings.
3
Deep Technical Review and Counterexamples
In this section, we see how AI can act as a critical adversary, finding subtle bugs in human proofs, refuting plausible hypotheses, and hunting for counterexamples.
3.1
Online Algorithms: Submodular Welfare
Written by Morteza Zadimoghaddam and Vahab Mirrokni.
Problem Context
In online submodular welfare maximization, the Greedy algorithm achieves a competitive ratio of $0.5$ in the worst case. A conjecture by Korula et al. (2015) suggested that the gain from "copying" an item to the end of the stream is bounded by the gain of "moving" it. Proving this would imply a competitive ratio of $0.567$ for Greedy in the random-order model.
AI Contribution
The researcher provided the research paper, which included the formal definitions and the conjecture statement, and tasked the model, zero-shot, with simply identifying an open question in the paper and solving it. Throughout this paper, we append a prompt that asks the model to be rigorous in its mathematical proofs and to check its answers thoroughly; we do not specify the details of this prompt.
Please try to improve the paper by identifying and solving an open question from it.
Remarkably, the model selected the conjecture of Korula et al. on its own and refuted it autonomously with the counterexample below. The subsequent verification calculations were also generated by the model without extra interaction or supervision. This demonstrates its ability to construct valid, non-trivial instances that violate human-proposed conjectures. The human researchers defined the initial search parameters and independently verified the model’s arithmetic.
•
Refutation Strategy: The model independently selected the minimal non-trivial dimensions ($n=3$ items, $m=2$ agents).
•
Autonomous Construction: In a single output, the model successfully defined the specific, valid submodular valuation functions (presented in Table 1) and the $\varepsilon$-perturbation required to strictly break the bound.
•
Automated Verification: Without human intervention, the model correctly performed the tedious calculations of expected marginal gains across all $3!=6$ permutations to formally verify the violation.
Technical Details
In the problem of Online Submodular Welfare Maximization (Online SWM), items arrive one by one and must be irrevocably assigned to agents with submodular valuation functions. The goal is to maximize the total social welfare. It is well established that the simple Greedy algorithm achieves a competitive ratio of $1/2$ in the adversarial-order setting. In the random-order model, which is the focus here, Korula et al. proved that Greedy achieves a ratio of at least $0.5052$, strictly beating $1/2$ [67].
To obtain a stronger bound, the authors proposed a natural conjecture (Conjecture 15) relating the expected marginal gain of the Greedy algorithm when an item is moved to the end of the sequence versus when it is duplicated at the end. If true, this conjecture would imply a competitive ratio of at least $0.567$.
In this note, we refute this conjecture. We construct a counterexample involving specific submodular valuation functions and a set of items for which the conjectured inequality fails to hold.
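The Greedy rule itself is short. The sketch below makes two assumptions of its own for illustration: valuations are Python callables on frozensets, and ties go to the lowest-indexed agent.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence, Tuple

# A valuation maps a bundle (frozenset of items) to a number.
Valuation = Callable[[FrozenSet[Hashable]], float]


def greedy_swm(
    items: Sequence[Hashable], valuations: List[Valuation]
) -> Tuple[List[FrozenSet[Hashable]], List[float]]:
    """Online Greedy: give each arriving item to an agent with the largest marginal gain.

    Assignments are irrevocable; ties go to the lowest-indexed agent.
    Returns the final bundles and the marginal gain realized by each item.
    """
    bundles: List[FrozenSet[Hashable]] = [frozenset() for _ in valuations]
    gains: List[float] = []
    for item in items:
        marginal = [v(b | {item}) - v(b) for v, b in zip(valuations, bundles)]
        winner = max(range(len(valuations)), key=lambda a: marginal[a])
        bundles[winner] = bundles[winner] | {item}
        gains.append(marginal[winner])
    return bundles, gains


# Example: agent 0 values items additively; agent 1 wants any single item (worth 3).
def additive(s):
    return len(s)


def unit_demand(s):
    return 3 * min(len(s), 1)


bundles, gains = greedy_swm(["a", "b", "c"], [additive, unit_demand])
print(bundles, gains)
```

In the example, item `a` goes to agent 1 (gain 3 versus 1). Agent 1 then has no further use for items, so `b` and `c` go to agent 0 with gain 1 each.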
The Open Question
We focus on the definitions and conjecture provided in Section 5 of [67]. Let $\pi=(\pi_{1},\pi_{2},\pi_{3},\dots,\pi_{n})$ be a permutation of $n$ items.
Definition 3.1
(Permutation Variants [67]).
Fix a permutation $\pi$.
•
Let $\pi^{Move,i}$ be the permutation obtained by moving the item $\pi_{i}$ to the end of the sequence:
$$\pi^{Move,i}=(\pi_{1},\dots,\pi_{i-1},\pi_{i+1},\dots,\pi_{n},\pi_{i})$$
•
Let $\pi^{Copy,i}$ be the sequence of $n+1$ items obtained by copying $\pi_{i}$ to the end without removing the original $\pi_{i}$:
$$\pi^{Copy,i}=(\pi_{1},\dots,\pi_{i-1},\pi_{i},\pi_{i+1},\dots,\pi_{n},\pi_{i})$$
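Definition 3.1 translates directly into tuple operations. This sketch keeps the 1-indexed $i$ of the definition; the function names `pi_move` and `pi_copy` are ours.

```python
from typing import Tuple, TypeVar

T = TypeVar("T")


def pi_move(pi: Tuple[T, ...], i: int) -> Tuple[T, ...]:
    """pi^{Move,i}: remove the i-th item (1-indexed) and append it at the end."""
    k = i - 1
    return pi[:k] + pi[k + 1:] + (pi[k],)


def pi_copy(pi: Tuple[T, ...], i: int) -> Tuple[T, ...]:
    """pi^{Copy,i}: keep pi intact and append a duplicate of the i-th item (length n + 1)."""
    return pi + (pi[i - 1],)


pi = ("e", "x1", "x2")
print(pi_move(pi, 1))  # ('x1', 'x2', 'e')
print(pi_copy(pi, 1))  # ('e', 'x1', 'x2', 'e')
```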
Let $\mathrm{MG}(k,\sigma)$ denote the marginal gain that the Greedy algorithm obtains by allocating the $k$-th arriving item in a sequence $\sigma$. Let $\mathbb{S}_{n}$ be the set of all $n!$ permutations. The conjecture proposed by Korula et al. is as follows:
Conjecture 3.2
(Conjecture 15 of [67]).
For any instance of the online submodular welfare maximization problem,
$$\mathbb{E}_{\pi\sim\mathbb{S}_{n}}\left[\sum_{i=1}^{n}\mathrm{MG}(n+1,\pi^{Copy,i})\right]\leq\mathbb{E}_{\pi\sim\mathbb{S}_{n}}\left[\sum_{i=1}^{n}\mathrm{MG}(n,\pi^{Move,i})\right] \tag{1}$$
The right-hand side (RHS) of the inequality equals $n\cdot\mathbb{E}_{\pi\sim\mathbb{S}_{n}}[\mathrm{MG}(n,\pi)]$. This is because, for each fixed $i$, the map $\pi\mapsto\pi^{Move,i}$ is a bijection on $\mathbb{S}_{n}$, so $\pi^{Move,i}$ and $\pi$ share the same (uniform) distribution. The conjecture essentially posits that the gain from a duplicated item at the end of the stream is, in expectation, bounded by the gain of the last item in a standard random permutation.
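Written out, the reduction of the RHS uses only linearity of expectation and a change of variables. The second step relies on $\pi\mapsto\pi^{Move,i}$ being a bijection of $\mathbb{S}_{n}$: its inverse moves the last item back to position $i$.

```latex
\begin{aligned}
\mathbb{E}_{\pi\sim\mathbb{S}_{n}}\Big[\sum_{i=1}^{n}\mathrm{MG}(n,\pi^{Move,i})\Big]
&= \sum_{i=1}^{n}\mathbb{E}_{\pi\sim\mathbb{S}_{n}}\big[\mathrm{MG}(n,\pi^{Move,i})\big]
&& \text{(linearity of expectation)}\\
&= \sum_{i=1}^{n}\mathbb{E}_{\sigma\sim\mathbb{S}_{n}}\big[\mathrm{MG}(n,\sigma)\big]
&& \text{($\sigma=\pi^{Move,i}$ is uniform on $\mathbb{S}_{n}$)}\\
&= n\cdot\mathbb{E}_{\pi\sim\mathbb{S}_{n}}\big[\mathrm{MG}(n,\pi)\big].
\end{aligned}
```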
Refutation of the Conjecture
We answer the open question negatively. We show that there exist submodular valuation functions and instances where the expected marginal gain from the copied item exceeds the bound proposed in Conjecture 3.2.
Theorem 3.3.
Conjecture 3.2 is false. There exists an instance of Online SWM such that:
$$\mathbb{E}_{\pi\sim\mathbb{S}_{n}}\left[\sum_{i=1}^{n}\mathrm{MG}(n+1,\pi^{Copy,i})\right]>\mathbb{E}_{\pi\sim\mathbb{S}_{n}}\left[\sum_{i=1}^{n}\mathrm{MG}(n,\pi^{Move,i})\right]$$
Counterexample Construction
We construct an instance with $n=3$ items $N=\{e,x_1,x_2\}$ and $m=2$ agents $\{A_1,A_2\}$. We define the valuation functions $v_1$ and $v_2$. Let $\varepsilon=0.1$. We define $v_1(S)=v_1^0(S)+\varepsilon|S|$, where $v_1^0(S)$ and $v_2(S)$ are defined in Table 1.
Table 1: Valuation functions

| $S$ | $v_1^0(S)$ | $v_2(S)$ | $v_1(S)$ |
|---|---|---|---|
| $\emptyset$ | 0 | 0 | 0 |
| $\{e\}$ | 10 | 10 | 10.1 |
| $\{x_1\}$ | 10 | 7 | 10.1 |
| $\{x_2\}$ | 10 | 11 | 10.1 |
| $\{e,x_1\}$ | 15 | 17 | 15.2 |
| $\{e,x_2\}$ | 20 | 16 | 20.2 |
| $\{x_1,x_2\}$ | 20 | 16 | 20.2 |
| $\{e,x_1,x_2\}$ | 25 | 21 | 25.3 |
Verification of Validity. We must verify that $v_1$ and $v_2$ are monotone and submodular.
Monotonicity: It is clear from Table 1 that both $v_1$ and $v_2$ are monotone (i.e., $v(S)\le v(T)$ if $S\subseteq T$).
Submodularity of $v_2$: We check the marginal gains (MG), listing each conditioning set followed by the resulting gain in parentheses.
• $\mathrm{MG}_2(e)$: $\emptyset\,(10)$, $\{x_1\}\,(10)$, $\{x_2\}\,(5)$, $\{x_1,x_2\}\,(5)$.
• $\mathrm{MG}_2(x_1)$: $\emptyset\,(7)$, $\{e\}\,(7)$, $\{x_2\}\,(5)$, $\{e,x_2\}\,(5)$.
• $\mathrm{MG}_2(x_2)$: $\emptyset\,(11)$, $\{e\}\,(6)$, $\{x_1\}\,(9)$, $\{e,x_1\}\,(4)$.
In all cases, the marginal gains are non-increasing as the conditioning set grows. Thus, $v_2$ is submodular.
Submodularity of $v_1$: We first check $v_1^0$.
• $\mathrm{MG}_1^0(e)$: $\emptyset\,(10)$, $\{x_1\}\,(5)$, $\{x_2\}\,(10)$, $\{x_1,x_2\}\,(5)$.
• $\mathrm{MG}_1^0(x_1)$: $\emptyset\,(10)$, $\{e\}\,(5)$, $\{x_2\}\,(10)$, $\{e,x_2\}\,(5)$.
• $\mathrm{MG}_1^0(x_2)$: $\emptyset\,(10)$, $\{e\}\,(10)$, $\{x_1\}\,(10)$, $\{e,x_1\}\,(10)$.
Hence $v_1^0$ is submodular. Since $v_1(S)=v_1^0(S)+\varepsilon|S|$ is the sum of a submodular function and a modular function, $v_1$ is submodular.
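These validity checks are mechanical, so they can also be confirmed by brute force. The Python sketch below is our illustration, not part of the original text: it encodes Table 1 with exact rational arithmetic (so $\varepsilon=0.1$ introduces no round-off) and tests monotonicity and the diminishing-returns condition $\mathrm{MG}(j\mid S)\ge\mathrm{MG}(j\mid T)$ for all $S\subseteq T$ and $j\notin T$. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

ITEMS = ("e", "x1", "x2")
EPS = Fraction(1, 10)  # epsilon = 0.1, kept exact

# Table 1, keyed by the subset written as a tuple in ITEMS order.
V1_0 = {(): 0, ("e",): 10, ("x1",): 10, ("x2",): 10,
        ("e", "x1"): 15, ("e", "x2"): 20, ("x1", "x2"): 20,
        ("e", "x1", "x2"): 25}
V2 = {(): 0, ("e",): 10, ("x1",): 7, ("x2",): 11,
      ("e", "x1"): 17, ("e", "x2"): 16, ("x1", "x2"): 16,
      ("e", "x1", "x2"): 21}


def canon(s):
    """Canonical dictionary key for a set of items."""
    return tuple(i for i in ITEMS if i in s)


def v1(s):
    return V1_0[canon(s)] + EPS * len(s)  # v1(S) = v1^0(S) + eps * |S|


def v2(s):
    return Fraction(V2[canon(s)])


def subsets():
    for r in range(len(ITEMS) + 1):
        for c in combinations(ITEMS, r):
            yield frozenset(c)


def is_monotone(v):
    return all(v(s) <= v(t) for s in subsets() for t in subsets() if s <= t)


def is_submodular(v):
    """Diminishing returns: MG(j | S) >= MG(j | T) whenever S <= T and j not in T."""
    return all(v(s | {j}) - v(s) >= v(t | {j}) - v(t)
               for s in subsets() for t in subsets() if s <= t
               for j in ITEMS if j not in t)


for name, v in (("v1", v1), ("v2", v2)):
    print(name, "monotone:", is_monotone(v), "submodular:", is_submodular(v))
```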
Calculation of the RHS
The RHS is the sum of the expected gains of each item when it arrives last. Let $W(j)$ denote this expectation for item $j$.
$W(e)$ ($e$ arrives last). We consider the permutations of $\{x_1,x_2\}$.
• $(x_1,x_2)$. $x_1$: $A_1\,(10.1)$ vs. $A_2\,(7)$, so $x_1\to A_1$.
$x_2$: $\mathrm{MG}_1(x_2\mid x_1)=10.1$, $\mathrm{MG}_2(x_2)=11$, so $x_2\to A_2$.
Gain of $e$: $\mathrm{MG}_1(e\mid x_1)=5.1$, $\mathrm{MG}_2(e\mid x_2)=5$. Gain is 5.1.
• $(x_2,x_1)$. $x_2$: $A_1\,(10.1)$ vs. $A_2\,(11)$, so $x_2\to A_2$.
$x_1$: $\mathrm{MG}_1(x_1)=10.1$, $\mathrm{MG}_2(x_1\mid x_2)=5$, so $x_1\to A_1$.
Gain of $e$: 5.1.
Result: $W(e)=5.1$.
$W(x_1)$ ($x_1$ arrives last). We consider the permutations of $\{e,x_2\}$.
• $(e,x_2)$. $e$: $A_1\,(10.1)$ vs. $A_2\,(10)$, so $e\to A_1$.
$x_2$: $\mathrm{MG}_1(x_2\mid e)=10.1$, $\mathrm{MG}_2(x_2)=11$, so $x_2\to A_2$.
Gain of $x_1$: $\mathrm{MG}_1(x_1\mid e)=5.1$, $\mathrm{MG}_2(x_1\mid x_2)=5$. Gain is 5.1.
• $(x_2,e)$. $x_2\to A_2$. $e\to A_1$. Gain of $x_1$ is 5.1.
Result: $W(x_1)=5.1$.
$W(x_2)$ ($x_2$ arrives last). We consider the permutations of $\{e,x_1\}$.
• $(e,x_1)$. $e\to A_1$.
$x_1$: $\mathrm{MG}_1(x_1\mid e)=5.1$, $\mathrm{MG}_2(x_1)=7$, so $x_1\to A_2$.
Gain of $x_2$: $\mathrm{MG}_1(x_2\mid e)=10.1$, $\mathrm{MG}_2(x_2\mid x_1)=9$. Gain is 10.1.
• $(x_1,e)$. $x_1\to A_1$.
$e$: $\mathrm{MG}_1(e\mid x_1)=5.1$, $\mathrm{MG}_2(e)=10$, so $e\to A_2$.
Gain of $x_2$: $\mathrm{MG}_1(x_2\mid x_1)=10.1$, $\mathrm{MG}_2(x_2\mid e)=6$. Gain is 10.1.
Result: $W(x_2)=10.1$.
$$RHS=W(e)+W(x_1)+W(x_2)=5.1+5.1+10.1=20.3.$$
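The case analysis for $W(j)$ can be replayed by simulating Greedy directly. This is a minimal sketch under two stated assumptions: the valuations are those of Table 1 (with $\varepsilon=0.1$ kept exact), and ties, which never occur in this instance, would go to the lower-indexed agent. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

ITEMS = ("e", "x1", "x2")
EPS = Fraction(1, 10)
V1_0 = {(): 0, ("e",): 10, ("x1",): 10, ("x2",): 10,
        ("e", "x1"): 15, ("e", "x2"): 20, ("x1", "x2"): 20,
        ("e", "x1", "x2"): 25}
V2 = {(): 0, ("e",): 10, ("x1",): 7, ("x2",): 11,
      ("e", "x1"): 17, ("e", "x2"): 16, ("x1", "x2"): 16,
      ("e", "x1", "x2"): 21}


def canon(s):
    return tuple(i for i in ITEMS if i in s)


# VALS[0] is agent A1 (v1 = v1^0 + eps*|S|); VALS[1] is agent A2 (v2).
VALS = (lambda s: V1_0[canon(s)] + EPS * len(s),
        lambda s: Fraction(V2[canon(s)]))


def greedy(order):
    """Give each arriving item to the agent with the largest marginal gain.

    Returns the final bundles and the marginal gain realized at each step.
    Ties go to the lower-indexed agent (an assumption; none arise here).
    """
    bundles = [frozenset(), frozenset()]
    gains = []
    for item in order:
        mg = [v(b | {item}) - v(b) for v, b in zip(VALS, bundles)]
        a = max(range(len(VALS)), key=lambda k: mg[k])
        bundles[a] = bundles[a] | {item}
        gains.append(mg[a])
    return bundles, gains


def W(j):
    """Expected gain of item j when it arrives last, the others in uniformly random order."""
    orders = list(permutations([i for i in ITEMS if i != j]))
    return sum(greedy(list(p) + [j])[1][-1] for p in orders) / len(orders)


rhs = sum(W(j) for j in ITEMS)
print({j: float(W(j)) for j in ITEMS}, "RHS =", float(rhs))
```

Using exact fractions rather than floats makes the final comparison with the LHS free of rounding questions.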
Calculation of the LHS
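The LHS is the expected total residual welfare: on $\pi^{Copy,i}$, Greedy allocates the first $n$ items exactly as on $\pi$, so $\mathrm{MG}(n+1,\pi^{Copy,i})$ is the best marginal gain any agent obtains from a copy of $\pi_i$, and summing over $i$ gives the total residual $R(\pi)$. We calculate $R(\pi)$ for all $3!=6$ permutations. The residual of item $j$, $R_j(\pi)$, is the maximum marginal gain of $j$ for any agent given the final allocation by Greedy on $\pi$ (the agent who received $j$ has zero marginal gain for a copy).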
$\pi_1=(e,x_1,x_2)$.
$e\to A_1$. $x_1\to A_2$ (7 vs. 5.1). $x_2\to A_1$ (10.1 vs. 9).
Allocation: $A_1=\{e,x_2\}$, $A_2=\{x_1\}$. $R_e=\mathrm{MG}_2(e\mid x_1)=10$. $R_{x_1}=\mathrm{MG}_1(x_1\mid e,x_2)=5.1$. $R_{x_2}=\mathrm{MG}_2(x_2\mid x_1)=9$. $R(\pi_1)=24.1$.
$\pi_2=(e,x_2,x_1)$. $e\to A_1$. $x_2\to A_2$ (11 vs. 10.1). $x_1\to A_1$ (5.1 vs. 5). Allocation: $A_1=\{e,x_1\}$, $A_2=\{x_2\}$. $R_e=\mathrm{MG}_2(e\mid x_2)=5$. $R_{x_1}=\mathrm{MG}_2(x_1\mid x_2)=5$. $R_{x_2}=\mathrm{MG}_1(x_2\mid e,x_1)=10.1$. $R(\pi_2)=20.1$.
$\pi_3=(x_1,e,x_2)$. $x_1\to A_1$. $e\to A_2$ (10 vs. 5.1). $x_2\to A_1$ (10.1 vs. 6). Allocation: $A_1=\{x_1,x_2\}$, $A_2=\{e\}$. $R_e=\mathrm{MG}_1(e\mid x_1,x_2)=5.1$. $R_{x_1}=\mathrm{MG}_2(x_1\mid e)=7$. $R_{x_2}=\mathrm{MG}_2(x_2\mid e)=6$. $R(\pi_3)=18.1$.
$\pi_4=(x_2,e,x_1)$. $x_2\to A_2$. $e\to A_1$. $x_1\to A_1$. Allocation: $A_1=\{e,x_1\}$, $A_2=\{x_2\}$. $R(\pi_4)=R(\pi_2)=20.1$.
$\pi_5=(x_1,x_2,e)$. $x_1\to A_1$. $x_2\to A_2$. $e\to A_1$. Allocation: $A_1=\{x_1,e\}$, $A_2=\{x_2\}$. $R(\pi_5)=20.1$.
$\pi_6=(x_2,x_1,e)$. $x_2\to A_2$. $x_1\to A_1$. $e\to A_1$. Allocation: $A_1=\{x_1,e\}$, $A_2=\{x_2\}$. $R(\pi_6)=20.1$.
The sum of the total residuals is $24.1+20.1+18.1+20.1+20.1+20.1=122.6$.
$$LHS=\mathbb{E}[R(\pi)]=\frac{122.6}{6}.$$
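A companion sketch computes $R(\pi)$ for all six permutations. It re-declares the Table 1 valuations so that it runs on its own and compares the resulting LHS with the RHS value $20.3$ obtained above; as before, the names and the tie-breaking rule are our assumptions, not part of the original argument.

```python
from fractions import Fraction
from itertools import permutations

ITEMS = ("e", "x1", "x2")
EPS = Fraction(1, 10)
V1_0 = {(): 0, ("e",): 10, ("x1",): 10, ("x2",): 10,
        ("e", "x1"): 15, ("e", "x2"): 20, ("x1", "x2"): 20,
        ("e", "x1", "x2"): 25}
V2 = {(): 0, ("e",): 10, ("x1",): 7, ("x2",): 11,
      ("e", "x1"): 17, ("e", "x2"): 16, ("x1", "x2"): 16,
      ("e", "x1", "x2"): 21}


def canon(s):
    return tuple(i for i in ITEMS if i in s)


VALS = (lambda s: V1_0[canon(s)] + EPS * len(s),   # agent A1
        lambda s: Fraction(V2[canon(s)]))          # agent A2


def greedy(order):
    """Final bundles of Greedy on `order` (ties to the lower-indexed agent, an assumption)."""
    bundles = [frozenset(), frozenset()]
    for item in order:
        mg = [v(b | {item}) - v(b) for v, b in zip(VALS, bundles)]
        a = max(range(len(VALS)), key=lambda k: mg[k])
        bundles[a] = bundles[a] | {item}
    return bundles


def residual(order):
    """R(pi): for every item, the largest marginal gain any agent would get from a copy of it."""
    bundles = greedy(order)
    # The agent already holding j gains 0 from the copy, because b | {j} == b.
    return sum(max(v(b | {j}) - v(b) for v, b in zip(VALS, bundles)) for j in ITEMS)


res = {p: residual(p) for p in permutations(ITEMS)}
lhs = sum(res.values()) / len(res)
rhs = Fraction(203, 10)  # RHS value computed in the text
for p, r in res.items():
    print(p, float(r))
print("LHS =", float(lhs), "RHS =", float(rhs), "LHS > RHS:", lhs > rhs)
```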
Conclusion
We compare the LHS and the RHS:
$$RHS=20.3=\frac{121.8}{6},\qquad LHS=\frac{122.6}{6}.$$
Since $\frac{122.6}{6}>\frac{121.8}{6}$, we have $LHS>RHS$. This violates the inequality stated in Conjecture 15 (Conjecture 3.2 above). Therefore, Conjecture 15 is false.
The failure of Conjecture 15 implies that the specific path to proving a $0.567$ competitive ratio proposed by [67] is invalid. While the Greedy algorithm is known to beat $1/2$ in the random-order model, establishing the tightest possible bound remains an open challenge. Our result indicates that the interaction between duplicated items and previous greedy allocations is more complex than the “Move” vs. “Copy” intuition suggests.
3.2 Cryptography: AI-Assisted Bug Detection in SNARGs
Written by Ziyi Guan, David P. Woodruff, and Eylon Yogev
Problem Context
Succinct non-interactive arguments (SNARGs) are among the most important and actively studied primitives in modern cryptography. A SNARG enables a prover to convince a verifier of some computational statement (such as knowledge of a solution to a complex equation, or the validity of a transaction according to network rules) using a proof that is extremely short and efficient to verify. Remarkably, the proof size and verification time can be exponentially smaller than the time required to check the statement directly. Succinct proofs and fast verification are critical for real-world applications. For instance:
• Verifiable Outsourced Computation: A weak client (like a smartphone) can offload a heavy computation to a cloud server and receive a short proof that the result is correct, without trusting the server.
• Blockchain Scalability: Blockchains can use SNARGs (often in the form of zk-SNARKs) to compress the verification of thousands of transactions into a single tiny proof, enabling massive throughput improvements (e.g., ZK-Rollups).
Constructing SNARGs is notoriously difficult despite active research. For decades, researchers have sought to build them based on standard cryptographic assumptions: assumptions that are widely believed to be true and have withstood years of cryptanalytic attacks. The “holy grail” in this area is to build a SNARG from the learning with errors (LWE) assumption. LWE is the foundation of modern lattice-based cryptography; it is believed to be secure even against quantum computers (post-quantum security) and enables powerful functionalities such as fully homomorphic encryption.
Existing SNARG constructions typically fall short of this ideal in the following ways:
• Idealized Models: The most efficient SNARGs are usually constructed in the random oracle model (ROM), where hash functions are modeled as perfect random functions. However, random oracles cannot be instantiated in the real world and thus only provide heuristic security.
• Non-Standard Assumptions: Another common way to build SNARGs (with knowledge) relies on assumptions like “Knowledge of Exponent,” which are non-falsifiable and sometimes imply conclusions too strong to be true.
• Indistinguishability Obfuscation (iO): While iO can build almost anything, including SNARGs, current constructions of iO are extremely complex, inefficient, and rely on very heavy mathematical machinery.
The paper “SNARGs for NP from LWE” by Ziyi Guan and Eylon Yogev [ 52 ] claimed to achieve the breakthrough: a SNARG for all of NP based solely on the sub-exponential hardness of LWE. Their proposed construction involved two novel components: “PCP shadows” (a way to compress probabilistically checkable proofs) and “function vector commitments” (a lattice-based vector commitment scheme).
AI Contribution via Advanced Prompting
As part of an experiment to evaluate the capability of AI models in theoretical computer science research, we analyzed the preprint of the paper using a Google-internal LLM. Crucially, we did not simply ask the model to “check the proof,” as standard prompts often lead to superficial reviews or hallucinations when dealing with complex mathematical logic.
Instead, we employed a rigorous iterative self-correction prompt. The prompt was designed to force the model to act as a meticulous and adversarial reviewer. The explicit instructions are shown in Figure 3.
1. Generate an initial review that was strictly objective, focusing only on identifying errors and suggesting improvements.
2. Self-correct its first review by rigorously critiquing its own findings. The model was told to verify every derivation, check for hallucinations, and ensure that any claim of an error was substantive.
3. Generate a revised review incorporating these corrections.
4. Perform a second round of self-correction to further refine the logic and ensure comprehensive coverage of the paper, including appendices.
5. Produce a final, verified review adhering to strict mathematical standards.
For your proof, adhere to: ⟨rigor text⟩.
Figure 3 : Iterative self-correction prompt.
Furthermore, the prompt included a specific protocol for mathematical rigor. It required the model to distinguish between a “Complete Proof” (only if every step was verified) and “Structured Partial Progress” (if any gaps were found). It also required the model to explicitly flag gaps or unproven assumptions using a specific tag. This approach was used by Google recently for providing preliminary feedback to authors of STOC submissions, see [ 23 ] .
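The five-stage prompt can be viewed as a simple loop in which each stage sees the paper, the rigor protocol, and the previous stage's output. The sketch below is hypothetical: `model` stands for any text-in/text-out LLM call (no specific API is implied), the stage wording paraphrases Figure 3, and `RIGOR` paraphrases the protocol described in this paragraph rather than reproducing the actual ⟨rigor text⟩, which is not given here.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a model reply.
Model = Callable[[str], str]

# Paraphrase of the rigor protocol described in the text (the real <rigor text> is not shown).
RIGOR = ("Call the result a 'Complete Proof' only if every step is verified; otherwise "
         "call it 'Structured Partial Progress' and explicitly tag every gap or unproven assumption.")

# Paraphrase of the five stages in Figure 3.
STAGES = [
    "Write a strictly objective review that only identifies errors and suggests improvements.",
    "Critique your review: verify every derivation, check for hallucinations, "
    "and keep only substantive claims of error.",
    "Write a revised review incorporating these corrections.",
    "Self-correct again, refining the logic and covering the whole paper, including appendices.",
    "Produce a final, verified review adhering to strict mathematical standards.",
]


def iterative_review(model: Model, paper: str) -> List[str]:
    """Run the stages in order, feeding each stage the previous output; return all outputs."""
    outputs: List[str] = []
    for stage in STAGES:
        previous = outputs[-1] if outputs else "(none)"
        prompt = (f"PAPER:\n{paper}\n\nRIGOR PROTOCOL:\n{RIGOR}\n\n"
                  f"PREVIOUS OUTPUT:\n{previous}\n\nTASK: {stage}")
        outputs.append(model(prompt))
    return outputs


# Usage with a stand-in "model" that simply echoes its task line.
echo = lambda prompt: "reply to: " + prompt.rsplit("TASK: ", 1)[-1]
print(iterative_review(echo, "toy paper text")[-1])
```

Keeping every intermediate output (rather than only the last) makes it possible to see which claimed errors survived the self-correction rounds.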
The Finding and Verification
Using this method, the model’s final review flagged a critical, fatal flaw in the paper’s construction of “PCP with shadow soundness” (Section 4).
The issue lay in the discrepancy between the definition of the scheme and its actual construction:
•
The Definition: Definition 4.1 (Property 3) of the paper required perfect consistency . This meant that for any two valid proofs that shared a local view, the generated “shadows” (succinct digests of the proof) must be identical for every single choice of randomness used in the shadow generation process.
•
The Construction: The construction presented in Section 4.3, however, only achieved imperfect consistency . This meant that the shadows would match with high probability over the choice of randomness, but there could exist bad randomness values where they differed.
The model argued that this was not a minor technicality. The security proof for the SNARG (specifically the soundness argument) relied heavily on the strong, perfect consistency definition. It assumed that once a commitment was opened, the underlying shadow was unique and well-defined. By only achieving statistical consistency, the construction opened the door to an adversary who could find specific bad randomness values to break the binding between the proof and its shadow. This would allow an adversary to forge a valid argument for a false statement, completely undermining the security of the SNARG.
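To see why consistency “with high probability” is weaker than consistency “for every choice of randomness,” consider a toy example that is unrelated to the actual PCP-shadow construction: two proofs that differ in a single position, and a “shadow” that reveals $K$ pseudo-randomly chosen positions. The shadows agree for most seeds, but an adversary who can influence the seed simply searches for one on which they differ. All parameters and names here are illustrative assumptions.

```python
import random

N, K = 100, 5            # toy proof length and shadow size (illustrative assumptions)
PROOF_A = [0] * N
PROOF_B = [0] * N
PROOF_B[37] = 1          # the two toy proofs agree everywhere except position 37


def shadow(proof, seed):
    """Toy 'shadow': the proof restricted to K positions chosen pseudo-randomly from the seed."""
    positions = sorted(random.Random(seed).sample(range(N), K))
    return tuple(proof[p] for p in positions)


seeds = range(10_000)
bad = [s for s in seeds if shadow(PROOF_A, s) != shadow(PROOF_B, s)]

# Imperfect consistency: the shadows differ only on a small fraction of seeds (about K/N) ...
print(f"fraction of bad seeds: {len(bad) / len(seeds):.3f} (about K/N = {K / N})")
# ... but perfect consistency fails, and a seed-choosing adversary just picks a bad one.
print("adversary's seed:", bad[0], shadow(PROOF_A, bad[0]), shadow(PROOF_B, bad[0]))
```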
Expert Verification and Outcome
To validate the model’s finding, we shared the detailed critique with cryptography experts Aayush Jain and Zhengzhong Jin. They performed an independent review of the paper and the model’s output.
Their assessment was definitive: the model was correct. They confirmed that the gap between the definition and construction of perfect consistency was a fatal flaw in the logic. They noted that while the model produced some noise (flagging other, less relevant issues), this specific insight was accurate and hit the core of the problem. Without perfect consistency, the reduction used in the security proof fails, and the main theorem does not hold.
We subsequently communicated these findings to the paper’s authors, Ziyi Guan and Eylon Yogev. They acknowledged the validity of the issue raised by the model. They agreed that the gap between the definition and construction of consistency was a significant error. Following this, they updated their paper on the Cryptology ePrint Archive. The updated manuscript includes a note in red text explicitly stating that a gap in the proof of the main theorem had been found.
This episode serves as a powerful testament to the evolving capabilities of large language models in scientific research. While LLMs are often criticized for their inability to reason or their tendency to hallucinate, this case shows that with the right prompting strategy—specifically one that enforces iterative self-correction and rigorous verification—they can identify subtle, deep technical flaws in state-of-the-art research.
4 Cross-Pollination of Ideas
In this section we see AI acting as an interdisciplinary bridge, retrieving obscure theorems from entirely different mathematical domains to overcome roadblocks.
4.1 Approximation Algorithms: Max-Cut
Written by Euiwoong Lee.
Problem Context
The Goemans-Williamson (GW) algorithm uses Semidefinite Programming (SDP) to achieve a 0.878-approximation for Max-Cut, which is optimal assuming the Unique Games Conjecture. A long-standing open question asks whether rounding SDP solutions of bounded rank $d$ can yield a better approximation ratio. While the answer is known to be positive for $d=2,3$, the case of general $d$ remained open.
I realized that this could be resolved if the following simple and natural geometric property held: does the variance of the sum of random variables $X=\sum_i\mathrm{sgn}(\langle g,v_i\rangle)$ (where the $v_i$ are unit vectors whose pairwise inner products are bounded from below) admit a lower bound of the form $C(d)n^2$?
AI Contribution
The model solved this geometric variance question.
• Synthesizing Advanced Math: The model recognized this as a problem solvable via geometric functional analysis, a field outside the typical approximation algorithms toolkit.
• Proof via Measure Theory: It constructed a proof using the compactness of the space of probability measures on the sphere.
• Iterative Rigor: When we questioned the validity of using “relaxed” measure spaces, the model justified its steps by citing and applying the Stone-Weierstrass Theorem, the Riesz Representation Theorem, and properties of spherical harmonics.
Max-Cut on Bounded-dimension SDP Solutions
Max-Cut is one of the most fundamental problems in combinatorial optimization and approximation algorithms. Given an undirected graph $G=(V,E)$, the goal is to partition $V$ into $V^+$ and $V^-$ to maximize the number of edges between $V^+$ and $V^-$; equivalently, compute $x:V\to\{\pm1\}$ to maximize $\sum_{(u,v)\in E}\frac{1-x(u)x(v)}{2}$.
While combinatorial or linear programming (LP)-based methods only yielded a $1/2$-approximation (i.e., the returned solution’s value is at least half of the optimal value) in polynomial time, the celebrated result of Goemans and Williamson [51] used a semidefinite programming (SDP) relaxation to obtain an approximation ratio of $\alpha_{GW}:=\min_{a\in[-1,+1]}\frac{\arccos(a)/\pi}{(1-a)/2}\approx 0.878$, which was later proved to be optimal for all polynomial-time algorithms assuming the Unique Games Conjecture [65]. This tight relationship between algorithms, complexity, and SDPs was later extended to numerous other combinatorial optimization tasks, most notably all Constraint Satisfaction Problems (CSPs) [81].
Given the importance of this result, it is natural to ask which additional assumptions allow polynomial-time algorithms to achieve an approximation ratio $\alpha_{GW}+C$ for some constant $C>0$. Structural properties of the input graph $G=(V,E)$ have been the most popular assumptions, including when $G$ is dense (i.e., $|E|=\Omega(n^2)$ [4]), when $G$ has bounded degree [35, 58], or when $G$ has a certain expansion property [9].
In order to motivate assumptions depending on the structure of SDP solutions, let us briefly recall Goemans-Williamson’s SDP relaxation. Given $G=(\{1,\dots,n\},E)$, their SDP relaxation, with a symmetric matrix $M\in\mathbb{R}^{n\times n}$ as the variable, is as follows:
$$\begin{aligned}
\text{maximize}\quad & \sum_{(i,j)\in E}\frac{1-M_{i,j}}{2}\\
\text{subject to}\quad & M_{i,i}=1\qquad\forall i\in[n],\\
& M\succeq 0.
\end{aligned}$$
The fact that this SDP is a relaxation for Max-Cut (i.e., the SDP optimal value is at least the Max-Cut optimal value) follows because, for any solution $x:V\to\{\pm1\}$ for Max-Cut, $M=xx^T$ is a feasible solution for the SDP with the same objective value. For the $\alpha_{GW}$-approximation, the algorithm is the following:
1. Compute the optimal SDP solution $M$.
2. Compute its Gram decomposition $M=VV^T$ for some $V\in\mathbb{R}^{n\times d}$. Let $v_i$ be the $i$th row of $V$.
3. Sample a random Gaussian vector $g\sim N(0,I_d)$.
4. For each $i\in[n]$, let $x_i\leftarrow\mathrm{sgn}(\langle v_i,g\rangle)$. (I.e., $x_i\leftarrow+1$ if $\langle v_i,g\rangle\ge 0$ and $x_i\leftarrow-1$ otherwise.)
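The rounding steps 3 and 4 are easy to sketch once Gram vectors are available; solving the SDP itself (steps 1 and 2) needs an SDP solver, so this Python sketch assumes the vectors are given. As a toy input we use the 5-cycle with the feasible rank-2 solution $v_i=(\cos(4\pi i/5),\sin(4\pi i/5))$, whose SDP value is about $4.52$. For this configuration every random hyperplane cuts exactly 4 edges, which is also the maximum cut of the odd 5-cycle.

```python
import math
import random


def hyperplane_round(vectors, rng):
    """Steps 3-4: sample g ~ N(0, I_d) and set x_i = +1 if <v_i, g> >= 0, else -1."""
    g = [rng.gauss(0.0, 1.0) for _ in range(len(vectors[0]))]
    return [1 if sum(a * b for a, b in zip(v, g)) >= 0 else -1 for v in vectors]


def cut_value(x, edges):
    return sum((1 - x[u] * x[v]) / 2 for u, v in edges)


# Toy input (an assumption): the 5-cycle with a feasible rank-2 SDP solution in which
# neighbouring vertices are 4*pi/5 apart, so <v_i, v_j> = cos(4*pi/5) on every edge.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
vectors = [(math.cos(4 * math.pi * i / n), math.sin(4 * math.pi * i / n)) for i in range(n)]
sdp_value = sum((1 - sum(a * b for a, b in zip(vectors[u], vectors[v]))) / 2 for u, v in edges)

rng = random.Random(0)
cuts = [cut_value(hyperplane_round(vectors, rng), edges) for _ in range(1000)]
print(f"SDP value {sdp_value:.3f}; cut sizes observed: {sorted(set(cuts))}")
```

Each edge is cut with probability $\arccos(\cos(4\pi/5))/\pi=0.8$, so the expected cut is $4$; since no cut of an odd 5-cycle exceeds $4$, every rounding attains it.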
The $\alpha_{GW}$-approximation can be proved by a simple edge-by-edge analysis: each edge $(i,j)$ contributes $\frac{1-\langle v_i,v_j\rangle}{2}$ to the SDP objective and, in expectation, $\arccos(\langle v_i,v_j\rangle)/\pi$ to the Max-Cut objective (because this is the probability that $\mathrm{sgn}(\langle v_i,g\rangle)\neq\mathrm{sgn}(\langle v_j,g\rangle)$). By the definition of $\alpha_{GW}$, the second quantity is at least $\alpha_{GW}$ times the first for every edge.
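The edge-by-edge bound also shows where the constant comes from: $\alpha_{GW}$ is the worst-case ratio between an edge's cut probability and its SDP contribution. A plain grid search recovers the familiar value; this is a numerical sketch, not a proof, and the grid resolution is an arbitrary choice.

```python
import math


def gw_ratio(a):
    """Cut probability arccos(a)/pi divided by the SDP contribution (1 - a)/2 of an edge."""
    return (math.acos(a) / math.pi) / ((1 - a) / 2)


# Grid over [-1, 1); a = 1 is excluded because the ratio is 0/0 there (it tends to +inf as a -> 1).
steps = 200_000
grid = [-1 + 2 * k / steps for k in range(steps)]
a_star = min(grid, key=gw_ratio)
alpha_gw = gw_ratio(a_star)
print(f"alpha_GW ~ {alpha_gw:.5f}, attained near a = {a_star:.3f}")
```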
So, what about structural properties of the optimal SDP solution $M$? Perhaps one of the most natural assumptions on $M$ is its rank; the gist of the Max-Cut problem itself is to force the rank of $M$ to be $1$. The question is then: if $M$ is a feasible solution to the SDP with rank $d$, can we round $M$ to a Max-Cut solution of value at least $(\alpha_{GW}+C(d))$ times the SDP value of $M$, where $C(d)>0$ is a constant depending only on $d$?
This natural question has been asked since the Goemans-Williamson algorithm, and Goemans gave a positive answer when $d=2$. Avidor and Zwick [5] refined it and gave a positive answer for $d=3$ as well, but to the best of my knowledge, even the case $d=4$ is open. This is surprising, especially given that analogous questions have already been answered for more general Grothendieck problems [14, 15]; however, they are not directly comparable, as the optimal ratios for Grothendieck problems are worse than $\alpha_{GW}$.
While studying the recent result of Hsieh and Kothari [ 58 ] , which simplifies and extends the result of Feige, Karpinski, and Langberg [ 35 ] for Max-Cut on bounded-degree graphs, I realized that their algorithmic framework could be applied to the bounded-dimension case as well, modulo one very natural geometric question, which is stated below.
The rest of the section details how the model “solved it.” As a result, there now exists a polynomial-time algorithm that, given an SDP solution $M$ of rank $d$, produces a Max-Cut solution whose value is at least $(\alpha_{GW}+C(d))$ times the value of $M$, for some $C(d)>0$. Before detailing the answer, let us mention that it is not completely satisfactory, in the sense that it does not produce any explicit expression for $C$ as a function of $d$; in contrast, when $G$’s degree is at most $d$, the best-known approximation ratio is $\alpha_{GW}+\widetilde{\Omega}\big(\frac{1}{d^2\log d}\big)$ [58].
On the other hand, the proof is based on results from geometric analysis, including the compactness of a certain space of probability measures, which have not been used much in the design of approximation algorithms. Will it be among the first of many examples where such analytic tools are used in approximation algorithms (and inspire more “constructive proofs” with “more explicit bounds”)? In any case, this experience reinforces common beliefs that large language models can help researchers in areas where they are not experts.
Geometric Question
The original question I asked the model is depicted in Figure 4.
Problem 4.1.
Let $v_1,\dots,v_n$ be unit vectors in $\mathbb{R}^d$ such that $\langle v_i,v_j\rangle\ge-\beta$ for every $i\neq j$, where $\beta=0.9$. Let $g\sim N(0,I_d)$, $X_i=\mathrm{sgn}(\langle g,v_i\rangle)$, and $X=\sum_{i=1}^n X_i$. Is it always true that $|X|\ge\varepsilon n$ with probability at least $\delta$, for some $\varepsilon>0$ and $\delta>0$ depending only on the dimension $d$?
Figure 4 : Original prompt for model
Using standard second-moment methods and the fact that $\mathbb{E}[X_i]=0$, the model reduced it to the following equivalent question about the variance of $X$, which is indeed the version one needs for the Max-Cut application.
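One standard way to see the equivalence is sketched below (our reconstruction, since the model's own derivation was removed from the text). It uses only $|X|\le n$ and the Paley–Zygmund inequality $\Pr[Z>\theta\,\mathbb{E}Z]\ge(1-\theta)^2\,\mathbb{E}[Z]^2/\mathbb{E}[Z^2]$ for $Z\ge 0$:

```latex
% Positive answer to Problem 4.1 => positive answer to Problem 4.2:
\mathbb{E}[X^2] \;\ge\; \varepsilon^2 n^2 \,\Pr\big[|X| \ge \varepsilon n\big] \;\ge\; \delta\,\varepsilon^2\, n^2 .

% Positive answer to Problem 4.2 => positive answer to Problem 4.1:
% apply Paley--Zygmund to Z = X^2 with theta = 1/2, using E[X^4] <= n^4 (since |X| <= n).
\Pr\Big[X^2 > \tfrac12\,\mathbb{E}[X^2]\Big]
  \;\ge\; \frac14\cdot\frac{\mathbb{E}[X^2]^2}{\mathbb{E}[X^4]}
  \;\ge\; \frac14\cdot\frac{C^2 n^4}{n^4} \;=\; \frac{C^2}{4},
% so |X| > \sqrt{C/2}\, n with probability at least C^2/4.
```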
Problem 4.2.
The variance of $X$ is $\mathbb{E}[X^2]=\sum_{i=1}^n\sum_{j=1}^n h(\langle v_i,v_j\rangle)$, where $h(t)=\frac{2}{\pi}\arcsin(t)$. Is $\mathbb{E}[X^2]\ge C(d,\beta)n^2$ for some constant $C(d,\beta)>0$?
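The formula $\mathbb{E}[X_iX_j]=\frac{2}{\pi}\arcsin\langle v_i,v_j\rangle$ follows from Sheppard's formula $\Pr[X_i\neq X_j]=\arccos\langle v_i,v_j\rangle/\pi$, and it can be sanity-checked by Monte Carlo. The four vectors below are an arbitrary toy choice in $\mathbb{R}^3$ whose pairwise inner products are at least $-0.9$; this illustration is ours and is not part of the original argument.

```python
import math
import random


def unit(v):
    r = math.sqrt(sum(c * c for c in v))
    return [c / r for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def h(t):
    # Clamp to [-1, 1] so round-off in dot products cannot push asin out of its domain.
    return 2 / math.pi * math.asin(max(-1.0, min(1.0, t)))


# Toy instance (an assumption): four unit vectors in R^3.
vs = [unit(v) for v in ([1, 0, 0], [0, 1, 0], [1, 1, 1], [-1, 1, 0.5])]
beta = -min(dot(u, v) for i, u in enumerate(vs) for j, v in enumerate(vs) if i != j)

exact = sum(h(dot(u, v)) for u in vs for v in vs)   # formula from Problem 4.2

rng = random.Random(1)
trials = 100_000
total = 0
for _ in range(trials):
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]
    x = sum(1 if dot(g, v) >= 0 else -1 for v in vs)   # X = sum_i sgn(<g, v_i>)
    total += x * x
print(f"beta = {beta:.3f}; formula {exact:.3f}; Monte Carlo {total / trials:.3f}")
```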
Due to space constraints, I will not fully elaborate on how a positive answer to Problem 4.2 leads to an improved algorithm for Max-Cut in the bounded-dimension case. It is simple to check formally, because one can just replace Lemma 7 of [58] with the answer to Problem 4.2, and the rest of their (elegant) seven-page paper works as is. Intuitively, their algorithm, which was originally suggested by [35], performs the standard rounding algorithm mentioned above, followed by the natural local improvement step: if flipping the assignment of a vertex strictly improves the value, flip it. (The actual algorithm implements this in a more conservative way.) In this context, Problem 4.2 seems natural, because from the perspective of a vertex $v$, the more variance its neighbors have, the bigger the chance that the standard rounding algorithm (which already guarantees an $\alpha_{GW}$-approximation) made a wrong decision for $v$, and the local improvement step further improves upon it by flipping $v$. In particular, the answer to Problem 4.2 is applied to the neighborhood of every vertex.
First Response
The response in this section is fully generated by the model, except for removing standard applications of second-moment methods.
Let $\mu=\frac{1}{n}\sum_{i=1}^n\delta_{v_i}$ be the empirical probability measure on the unit sphere $S^{d-1}$. The normalized variance is:
$$Q(\mu)=\frac{1}{n^2}\mathbb{E}[X^2]=\iint_{S^{d-1}\times S^{d-1}}h(\langle x,y\rangle)\,d\mu(x)\,d\mu(y).$$
The constraint on the vectors implies that the support $S_\mu$ of $\mu$ satisfies $\langle x,y\rangle\ge-\beta$ for all $x,y\in S_\mu$. Let $\mathcal{M}_\beta(d)$ be the set of all probability measures on $S^{d-1}$ satisfying this support constraint. We define:
$$C(d,\beta)=\inf_{\mu\in\mathcal{M}_\beta(d)}Q(\mu).$$
Lemma 4.3.
For finite $d\ge 1$ and $0\le\beta<1$, $C(d,\beta)>0$.
Proof.
We proceed in three steps:
1. Compactness and Continuity:
Since $d$ is finite, $S^{d-1}$ is compact. The space of probability measures $\mathcal{P}(S^{d-1})$ is compact in the weak topology. We show that $\mathcal{M}_\beta(d)$ is a closed subset. Let $\mu_k\in\mathcal{M}_\beta(d)$ converge weakly to $\mu$.
Let $S_k$ and $S$ be their respective supports. Let $x,y\in S$. For any $\varepsilon>0$, the balls $B(x,\varepsilon/2)$ and $B(y,\varepsilon/2)$ have positive measure under $\mu$. By the Portmanteau theorem, for sufficiently large $k$, $\mu_k(B(x,\varepsilon/2))>0$. Thus, there exist $x_k\in S_k$ and $y_k\in S_k$ such that $\|x-x_k\|$ […] $a_m>0$ for all odd $m$. Let $T_m(\mu)=\int x^{\otimes m}\,d\mu(x)$ be the $m$-th moment tensor. We can write:
$$Q(\mu)=\sum_{m\text{ odd}}a_m\iint\langle x,y\rangle^m\,d\mu(x)\,d\mu(y)=\sum_{m\text{ odd}}a_m\|T_m(\mu)\|^2.$$
3. Symmetry and Contradiction:
Suppose $C(d,\beta)=0$. Then $Q(\mu^*)=0$. Since $a_m>0$, this implies $T_m(\mu^*)=0$ for all odd $m$. This means $\int P(x)\,d\mu^*(x)=0$ for any odd polynomial $P(x)$. By the Stone-Weierstrass theorem, odd polynomials are dense in the space of continuous odd functions on $S^{d-1}$ (in the uniform norm).
Thus, $\int g(x)\,d\mu^*(x)=0$ for any continuous odd function $g$. This implies that $\mu^*$ is a symmetric measure (i.e., $\mu^*(A)=\mu^*(-A)$).
Let $S^*$ be the support of $\mu^*$. If $\mu^*$ is symmetric, $S^*$ must be symmetric. (If $x\in S^*$ but $-x\notin S^*$, there is an open ball $B(-x,r)$ such that $\mu^*(B(-x,r))=0$. By symmetry, $\mu^*(B(x,r))=\mu^*(-B(-x,r))=0$, contradicting $x\in S^*$.)
Since $S^*$ is non-empty and symmetric, there exists $x\in S^*$ such that $-x\in S^*$. The support constraint requires $\langle x,-x\rangle\ge-\beta$. But $\langle x,-x\rangle=-\|x\|^2=-1$, so $-1\ge-\beta$, i.e., $\beta\ge 1$. This contradicts the assumption $\beta=0.9<1$. Therefore, $C(d,\beta)>0$. ∎
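Step 2 of the proof relies on the expansion $h(t)=\sum_{m\text{ odd}}a_mt^m$ with every $a_m>0$, and on the identity $\iint\langle x,y\rangle^m\,d\mu\,d\mu=\|T_m(\mu)\|^2$. Both can be checked numerically. The sketch below uses the standard series $\arcsin t=\sum_{k\ge0}\binom{2k}{k}\frac{t^{2k+1}}{4^k(2k+1)}$ and a toy three-point measure on $S^1$ of our choosing; it is a sanity check, not a proof, and the function names are hypothetical.

```python
import math
from itertools import product


def coeff(m):
    """Coefficient a_m of t^m in h(t) = (2/pi) * arcsin(t); zero for even m."""
    if m % 2 == 0:
        return 0.0
    k = (m - 1) // 2
    return 2 / math.pi * math.comb(2 * k, k) / (4 ** k * (2 * k + 1))


def h_series(t, terms=200):
    return sum(coeff(m) * t ** m for m in range(1, 2 * terms, 2))


# (i) The odd coefficients are positive and the series reproduces h for |t| < 1.
print(all(coeff(m) > 0 for m in range(1, 200, 2)),
      h_series(0.7), 2 / math.pi * math.asin(0.7))

# (ii) For m = 3 and a toy empirical measure on S^1 (an assumption),
#      (1/n^2) * sum_{p,q} <p,q>^m equals ||T_m(mu)||^2 with T_m = (1/n) * sum_p p^{(x)m}.
pts = [(1.0, 0.0), (0.6, 0.8), (-0.28, 0.96)]
n, m = len(pts), 3
T = {idx: sum(math.prod(p[i] for i in idx) for p in pts) / n
     for idx in product(range(2), repeat=m)}
gram_side = sum(sum(x * y for x, y in zip(p, q)) ** m for p in pts for q in pts) / n ** 2
tensor_side = sum(val ** 2 for val in T.values())
print(gram_side, tensor_side)
```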
This establishes that $\mathbb{E}[X^2]\ge C(d,\beta)n^2$.
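A two-point measure (our illustration, not part of the model's response) shows concretely why the contradiction hinges on $\beta<1$. Take $\mu=\frac12(\delta_x+\delta_y)$ with $\langle x,y\rangle=t$, and note $h(1)=1$:

```latex
Q(\mu) = \tfrac14\big(h(1) + h(1) + 2h(t)\big) = \tfrac12\big(1 + h(t)\big),
\qquad h(t) = \tfrac{2}{\pi}\arcsin(t).
% t = -1 (antipodal pair, admissible only if beta >= 1):   Q(mu) = (1 + h(-1))/2 = 0.
% t = -0.9 (the most spread pair admissible for beta = 0.9): Q(mu) = (1 - (2/pi) arcsin 0.9)/2 \approx 0.144 > 0.
```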
Interactions
The first response, though it looked like a natural and sound approach using geometric functional analysis, had some gaps in the proof and was not fully verifiable, partly due to my lack of expertise on the topic. Two major hurdles were (1) the foundations of measure theory needed to establish the compactness of $\mathcal{M}_\beta(d)$ and (2) working knowledge of spherical harmonics to see how $T_m(\mu^*)=0$ for all odd $m$ implies $\int P(x)\,d\mu^*(x)=0$ for any odd polynomial $P(x)$. Note that (2) was somewhat expected, as the previous results for $d=2,3$ [5] and for Grothendieck problems [14, 15] used spherical harmonics and the related Gegenbauer polynomials, but (1), which relaxes $n$-point discrete measures to $\mathcal{M}_\beta(d)$, is a type of argument rarely used in the approximation algorithms literature. To understand it formally, one needs to review the foundations of measure theory. For example, what is the weak topology? How is the Stone-Weierstrass theorem formally applied here? How does $\int g(x)\,d\mu^*(x)=0$ for every odd $g$ imply that $\mu^*$ is a symmetric measure?
After several iterations of asking for the reference behind each nontrivial step and manually searching each reference and feeding it back to the model, it produced the output in the next section. Now it is understandable to me, whose only formal exposure to analysis was an undergraduate course almost twenty years ago.
Second Response
This response is fully generated by the model, except for (1) removing standard applications of second-moment methods, (2) adding two footnotes further justifying some steps, and (3) minor cosmetic changes.
Foundational Theorems
We rely on the following established theorems, sourced externally (e.g., via Google Search, Wikipedia, or standard textbooks).
External Result 1 (Stone-Weierstrass Theorem (Real Version)).
Suppose $X$ is a compact Hausdorff space and $A$ is a subalgebra of $C(X,\mathbb{R})$ which contains a non-zero constant function. Then $A$ is dense in $C(X,\mathbb{R})$ (under the uniform norm) if and only if it separates points.
External Result 2 (Riesz Representation Theorem).
Let $\Omega$ be a compact Hausdorff space. The dual space $C(\Omega)^*$ is isomorphic to the space of finite regular signed measures on $\Omega$.
External Result 3 (Banach-Alaoglu Theorem).
The closed unit ball of the dual space of a normed vector space is compact in the weak* topology.
External Result 4 (Definition and Properties of Spherical Harmonics).
(Definition): Spherical harmonics of degree $k$ on $S^{d-1}$ are the restrictions to the sphere of homogeneous harmonic polynomials of degree $k$ in $\mathbb{R}^{d}$. Let $H_{k}$ denote this space.
(Completeness): The spaces $H_{k}$ are mutually orthogonal in $L^{2}(S^{d-1})$. The algebraic sum $\bigoplus_{k=0}^{\infty}H_{k}$ is dense in $C(S^{d-1})$ and $L^{2}(S^{d-1})$.
(Parity): A spherical harmonic $Y_{k}\in H_{k}$ has parity $(-1)^{k}$. That is, $Y_{k}(-x)=(-1)^{k}Y_{k}(x)$ for $x\in S^{d-1}$.
External Result 5 (Addition Theorem for Spherical Harmonics).
Let $d\geq 2$ and $\lambda=(d-2)/2$. The Gegenbauer polynomial $C_{k}^{\lambda}(\langle x,y\rangle)$ is a positive semi-definite kernel. The energy integral
$$I_{k}(\mu)=\iint C_{k}^{\lambda}(\langle x,y\rangle)\,d\mu(x)\,d\mu(y)$$
is non-negative, and $I_{k}(\mu)=0$ if and only if $\mu$ is orthogonal to $H_{k}$.
Energy Minimization Framework
If $d=1$, the constraint forces all $v_{i}$ to be identical, so $|X|=n$. Assume $d\geq 2$. Let $\mu_{n}=\frac{1}{n}\sum_{i}\delta_{v_{i}}$. The normalized variance is:
$$F(\mu_{n})=\frac{\mathbb{E}[X^{2}]}{n^{2}}=\iint K(\langle x,y\rangle)\,d\mu_{n}(x)\,d\mu_{n}(y),$$
where $K(t)=\frac{2}{\pi}\arcsin(t)$. Let $\mathcal{M}_{\beta}^{\prime}(S^{d-1})$ be the set of probability measures $\mu$ on $S^{d-1}$ such that $\langle x,y\rangle\geq-\beta$ for any distinct $x,y$ in the support $S_{\mu}$. We want to show $C(d)=\inf_{\mu\in\mathcal{M}_{\beta}^{\prime}}F(\mu)>0$.
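As a quick numerical sanity check on the energy $F$, the following Python sketch evaluates $F(\mu_n)$ for an empirical measure given as a list of unit vectors and tests membership in $\mathcal{M}_{\beta}^{\prime}$. The helper names `kernel`, `energy`, and `in_constraint_set` are illustrative and not from the source. A point mass gives $F=1$ and three orthonormal vectors give $F=1/3$, while an antipodal pair $\{x,-x\}$ gives $F=0$; such a pair has $\langle x,-x\rangle=-1<-\beta$, so it is excluded from $\mathcal{M}_{\beta}^{\prime}$, which is what the positivity argument ultimately exploits.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def kernel(t: float) -> float:
    """K(t) = (2/pi) * arcsin(t); t is clipped to [-1, 1] to absorb rounding error."""
    return (2.0 / math.pi) * math.asin(max(-1.0, min(1.0, t)))

def energy(points: Sequence[Vector]) -> float:
    """F(mu_n) = (1/n^2) * sum_{i,j} K(<v_i, v_j>) for mu_n = (1/n) sum_i delta_{v_i}."""
    n = len(points)
    total = 0.0
    for x in points:
        for y in points:
            inner = sum(a * b for a, b in zip(x, y))
            total += kernel(inner)
    return total / n**2

def in_constraint_set(points: Sequence[Vector], beta: float) -> bool:
    """True if <v_i, v_j> >= -beta for all i != j.

    Repeated points give <v, v> = 1 >= -beta, so checking index pairs is the same
    as checking distinct support points.
    """
    return all(
        sum(a * b for a, b in zip(points[i], points[j])) >= -beta
        for i in range(len(points))
        for j in range(len(points))
        if i != j
    )
```

For example, `energy([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])` is zero, and `in_constraint_set` rejects that pair for `beta=0.9`.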
Compactness and Continuity
We consider the set $\mathcal{M}(S^{d-1})$ of Borel probability measures on $S^{d-1}$ with the weak-* topology. $S^{d-1}$ is a compact metric space.
Lemma 4.4 (Compactness of $\mathcal{M}(S^{d-1})$).
$\mathcal{M}(S^{d-1})$ is compact in the weak-* topology.
Proof.
By External Results 2 and 3, $\mathcal{M}(S^{d-1})$ is identified with a weak-* closed subset of the unit ball in $C(S^{d-1})^{*}$, which is compact by Banach-Alaoglu. ∎
Lemma 4.5 (Portmanteau Lemma (Partial)).
If $\mu_{k}\to\mu$ weakly in a metric space $X$, then for every open set $G\subset X$, $\liminf_{k}\mu_{k}(G)\geq\mu(G)$.
Proof.
Let $G$ be open and $F=G^{c}$. Define the continuous function $f_{m}(x)=\min(1,m\cdot\mathrm{dist}(x,F))$. As $m\to\infty$, $f_{m}(x)\nearrow\mathbf{1}_{G}(x)$. Since $0\leq f_{m}\leq\mathbf{1}_{G}$, for each fixed $m$,
$$\liminf_{k}\mu_{k}(G)\geq\liminf_{k}\int f_{m}\,d\mu_{k}=\int f_{m}\,d\mu.$$
By the Monotone Convergence Theorem, as $m\to\infty$, the right-hand side converges to $\mu(G)$. ∎
Lemma 4.6.
$\mathcal{M}_{\beta}^{\prime}(S^{d-1})$ is a closed subset of $\mathcal{M}(S^{d-1})$ and thus compact.
Proof.
Let $\mu_{k}\in\mathcal{M}_{\beta}^{\prime}$ with $\mu_{k}\to\mu$. Suppose $\mu\notin\mathcal{M}_{\beta}^{\prime}$. Then there exist distinct $x,y\in S_{\mu}$ such that $\langle x,y\rangle<-\beta$. By continuity of the inner product, there are disjoint open balls $B_{x}\ni x$ and $B_{y}\ni y$ such that $\langle x^{\prime},y^{\prime}\rangle<-\beta$ for all $x^{\prime}\in B_{x}$ and $y^{\prime}\in B_{y}$. Since $x,y\in S_{\mu}$, we have $\mu(B_{x})>0$ and $\mu(B_{y})>0$. By Lemma 4.5, $\liminf_{k}\mu_{k}(B_{x})>0$ and $\liminf_{k}\mu_{k}(B_{y})>0$. For large $k$, there exist $x_{k},y_{k}$ in the support of $\mu_{k}$ within $B_{x}$ and $B_{y}$, respectively. Thus $x_{k}\neq y_{k}$. As $\mu_{k}\in\mathcal{M}_{\beta}^{\prime}$, $\langle x_{k},y_{k}\rangle\geq-\beta$, contradicting $\langle x_{k},y_{k}\rangle<-\beta$. ∎
Lemma 4.7 (Product Convergence).
If $\mu_{k}\to\mu$ weakly on a compact space $X$, then $\mu_{k}\times\mu_{k}\to\mu\times\mu$ weakly on $X\times X$.
Proof.
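Let $H\in C(X\times X)$. By External Result 1, finite sums of separable functions $H_{\text{approx}}(x,y)=\sum_{i}f_{i}(x)g_{i}(y)$ are dense in $C(X\times X)$, since they form a subalgebra that contains the constants and separates points. For $H_{\text{approx}}$, the convergence holds by the definition of $\mu_{k}\to\mu$, because $\iint f_{i}(x)g_{i}(y)\,d\mu_{k}(x)\,d\mu_{k}(y)=\int f_{i}\,d\mu_{k}\int g_{i}\,d\mu_{k}$. The result extends to $H$ by density, since all the measures involved have total mass $1$. ∎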
Lemma 4.8.
$F(\mu)$ is continuous on $\mathcal{M}(S^{d-1})$.
Proof.
The function $H(x,y)=K(\langle x,y\rangle)$ is continuous. If $\mu_{k}\to\mu$, then by Lemma 4.7, $\mu_{k}\times\mu_{k}\to\mu\times\mu$. By the definition of weak convergence, $F(\mu_{k})\to F(\mu)$.
∎
Since $F$ is continuous and $\mathcal{M}_{\beta}^{\prime}$ is compact, the infimum $C(d)$ is attained.
Strict Positivity via Harmonic Analysis
We show $C(d)>0$. Let $\lambda=(d-2)/2$. We write $K(t)=\sum_{k=0}^{\infty}b_{k}C_{k}^{\lambda}(t)$. The energy is $F(\mu)=\sum_{k=0}^{\infty}b_{k}I_{k}(\mu)$. By External Result 5, $I_{k}(\mu)\geq 0$.
Lemma 4.9 (Monomial Expansion Positivity).
Let $d\geq 2$ (so $\lambda\geq 0$). The expansion of $t^{n}$ in the basis of Gegenbauer polynomials $C_{k}^{\lambda}(t)$ has strictly positive coefficients.
Proof.
Case $\lambda>0$ ($d>2$). The inversion formula is known to be:¹
$$t^{n}=\frac{n!}{2^{n}}\sum_{k=0}^{\lfloor n/2\rfloor}\frac{(n-2k+\lambda)}{\lambda\cdot k!\cdot(\lambda+1)_{n-k}}C_{n-2k}^{\lambda}(t),$$
where $(a)_{m}$ is the rising factorial (Pochhammer symbol). Since $\lambda>0$, $n-2k\geq 0$, and the factorials and Pochhammer symbols are positive, the coefficients are strictly positive.
¹ The section author’s note: It can be proved by induction using the recurrence relation given in https://en.wikipedia.org/wiki/Gegenbauer_polynomials.
Case $\lambda=0$ ($d=2$). The basis consists of the Chebyshev polynomials $T_{k}(t)$. The inversion formula is:
$$t^{n}=\frac{1}{2^{n-1}}\sum_{k=0}^{\lfloor n/2\rfloor}\binom{n}{k}\varepsilon_{n-2k}T_{n-2k}(t),$$
where $\varepsilon_{j}$ is $1/2$ if $j=0$ and $1$ if $j>0$. The coefficients are positive. ∎
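The two inversion formulas in Lemma 4.9 can be spot-checked numerically. The Python sketch below is our own illustration (all helper names are hypothetical): it evaluates $C_k^{\lambda}$ and $T_k$ with their standard three-term recurrences, computes the coefficients given by the two formulas, and checks that they are positive and that the expansion reproduces $t^n$ at a few sample points. It is a sanity check for small $n$, not a proof.

```python
import math

def gegenbauer(n: int, lam: float, t: float) -> float:
    """C_n^lam(t) via m*C_m = 2t(m+lam-1)*C_{m-1} - (m+2lam-2)*C_{m-2}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 2.0 * lam * t  # C_0 and C_1
    for m in range(2, n + 1):
        prev, cur = cur, (2.0 * t * (m + lam - 1) * cur - (m + 2 * lam - 2) * prev) / m
    return cur

def chebyshev(n: int, t: float) -> float:
    """T_n(t) via T_m = 2t*T_{m-1} - T_{m-2}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, t
    for _ in range(2, n + 1):
        prev, cur = cur, 2.0 * t * cur - prev
    return cur

def rising(a: float, m: int) -> float:
    """Rising factorial (a)_m = a(a+1)...(a+m-1)."""
    out = 1.0
    for i in range(m):
        out *= a + i
    return out

def gegenbauer_coeffs(n: int, lam: float) -> dict:
    """Map j = n-2k to the coefficient of C_j^lam in the expansion of t^n (case lam > 0)."""
    return {
        n - 2 * k: math.factorial(n) / 2**n * (n - 2 * k + lam)
        / (lam * math.factorial(k) * rising(lam + 1, n - k))
        for k in range(n // 2 + 1)
    }

def chebyshev_coeffs(n: int) -> dict:
    """Map j = n-2k to the coefficient of T_j in the expansion of t^n (case lam = 0)."""
    return {
        n - 2 * k: math.comb(n, k) * (0.5 if n == 2 * k else 1.0) / 2 ** (n - 1)
        for k in range(n // 2 + 1)
    }

def residual_gegenbauer(n: int, lam: float, t: float) -> float:
    """|sum_j c_j C_j^lam(t) - t^n| for the coefficients above."""
    approx = sum(c * gegenbauer(j, lam, t) for j, c in gegenbauer_coeffs(n, lam).items())
    return abs(approx - t**n)

def residual_chebyshev(n: int, t: float) -> float:
    """|sum_j c_j T_j(t) - t^n| for the coefficients above."""
    approx = sum(c * chebyshev(j, t) for j, c in chebyshev_coeffs(n).items())
    return abs(approx - t**n)
```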
We analyze the coefficients $b_{k}$ for $K(t)=\frac{2}{\pi}\arcsin(t)$.
Parity: $K(t)$ is odd, and $C_{k}^{\lambda}(t)$ has the parity of $k$. Thus, $b_{k}=0$ for even $k$.
Positivity of $b_{k}$: $\arcsin(t)=\sum_{m\geq 0}c_{m}t^{2m+1}$ with $c_{m}>0$. By Lemma 4.9, $t^{2m+1}$ is a sum of $C_{2j+1}^{\lambda}(t)$ with strictly positive coefficients. Therefore, $b_{k}>0$ for all odd $k$.
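For $d=3$ we have $\lambda=1/2$ and $C_k^{1/2}=P_k$ (the Legendre polynomials), so the coefficients $b_k$ can be approximated by quadrature as an illustration of the parity and positivity claims. The Python sketch below is our own illustration (function names are hypothetical): it substitutes $t=\sin\theta$, which removes the endpoint singularity of the derivative of $\arcsin$, and applies composite Simpson's rule. For $k=1$ the exact value is $b_1=3/4$, since $\int_{-1}^{1}t\arcsin(t)\,dt=\pi/4$.

```python
import math

def legendre(k: int, t: float) -> float:
    """P_k(t) = C_k^{1/2}(t) via m*P_m = (2m-1)t*P_{m-1} - (m-1)*P_{m-2}."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, t
    for m in range(2, k + 1):
        prev, cur = cur, ((2 * m - 1) * t * cur - (m - 1) * prev) / m
    return cur

def kernel_coefficient(k: int, steps: int = 2000) -> float:
    """Approximate b_k = (2k+1)/2 * integral_{-1}^{1} K(t) P_k(t) dt for K(t) = (2/pi) arcsin(t).

    With t = sin(theta) the integrand becomes (2/pi) * theta * P_k(sin theta) * cos(theta)
    on [-pi/2, pi/2], which is smooth; composite Simpson's rule is applied (steps must be even).
    """
    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 == 1 else 2)
        total += weight * (2 / math.pi) * theta * legendre(k, math.sin(theta)) * math.cos(theta)
    return (2 * k + 1) / 2 * total * h / 3
```

Even-indexed coefficients come out at rounding-error level because the transformed integrand is odd in $\theta$.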
Suppose $F(\mu)=0$. Since $b_{k}>0$ for odd $k$ and $I_{k}(\mu)\geq 0$, we must have $I_{k}(\mu)=0$ for all odd $k$. By External Result 5, $\mu$ is orthogonal to $H_{k}$ for all odd $k$.
Lemma 4.10 (Parity of Polynomials and Spherical Harmonics).
A polynomial $P(x)$ restricted to $S^{d-1}$ is an odd function if and only if its expansion in the basis of spherical harmonics consists solely of spherical harmonics of odd degree.
Proof.
By External Result 4 (Completeness and Definition), the space of polynomials restricted to $S^{d-1}$ is the algebraic direct sum $\bigoplus H_{k}$. Let $P(x)=\sum_{k=0}^{N}Y_{k}(x)$, where $Y_{k}\in H_{k}$. By External Result 4 (Parity), $Y_{k}(-x)=(-1)^{k}Y_{k}(x)$, so $P(-x)=\sum_{k=0}^{N}(-1)^{k}Y_{k}(x)$. If $P$ is odd, then $P(-x)=-P(x)$, so $\sum_{k=0}^{N}((-1)^{k}+1)Y_{k}(x)=0$. Since the spaces $H_{k}$ are orthogonal, they are linearly independent. Thus $((-1)^{k}+1)Y_{k}(x)=0$ for all $k$. If $k$ is even, $2Y_{k}(x)=0$, so $Y_{k}=0$. Thus, $P(x)$ consists only of odd-degree harmonics. The converse is immediate. ∎
We show that $F(\mu)=0$ implies $\mu$ is symmetric (i.e., $\int f\,d\mu=0$ for any continuous odd function $f$). Let $f$ be continuous and odd. By External Result 1, let polynomials $P_{n}\to f$ uniformly. The odd part $Q_{n}(x)=(P_{n}(x)-P_{n}(-x))/2$ is an odd polynomial, and $Q_{n}\to f$ uniformly. By Lemma 4.10, $Q_{n}$ is composed solely of spherical harmonics of odd degree. Since $\mu$ is orthogonal to these, $\int Q_{n}(x)\,d\mu(x)=0$. By uniform convergence, $\int f(x)\,d\mu(x)=0$. Thus, $\mu$ is symmetric.²
² The section author’s note: For the formal justification of this step, note that the pushforward $A_{*}\mu$ under the antipodal map $A(x)=-x$ satisfies $\int f\,d\mu=\int f\,dA_{*}\mu$ for every continuous $f$. As the Riesz Representation Theorem shows that $\mu$ is fully characterized by $(\int f\,d\mu)_{f\in C(S^{d-1})}$, we can conclude that $\mu=A_{*}\mu$.
Now we use the constraint $\mu\in\mathcal{M}_{\beta}^{\prime}$. Suppose $C(d)=0$. The minimizer $\mu^{*}$ has $F(\mu^{*})=0$, so $\mu^{*}$ is symmetric. Let $S$ be the support of $\mu^{*}$, and let $x\in S$. Since $\mu^{*}$ is symmetric, $-x\in S$. Since $\|x\|=1$, $x\neq-x$. By the definition of $\mathcal{M}_{\beta}^{\prime}$, $\langle x,-x\rangle\geq-\beta$. This means $-1\geq-\beta$, or $\beta\geq 1$.
This contradicts $\beta=0.9$. Therefore, $C(d)>0$.
4.2 Computational Geometry: Steiner Trees
Written by Karthik C. S.
Problem Context
The "Simplex is the Best for Graph Embeddings" conjecture states that among all graphs with $m$ edges, with each edge embedded into Euclidean space as its characteristic vector, the star graph configuration yields the minimum Steiner tree cost. Proving this is a key step towards establishing APX-hardness for high-dimensional Steiner Tree problems.
AI Contribution
The model’s involvement was decisive.
•
The Crucial Link: The model suggested an approach based on the Kirszbraun Extension Theorem, which guarantees that a Lipschitz map from a subset of one Hilbert space into another Hilbert space extends to the whole space with the same Lipschitz constant.
•
Overcoming "Elementary" Bias: Initially, the model dismissed this approach as "non-elementary". However, when the researcher clarified that an elementary proof was not required, the model successfully formalized the argument.
•
Formal Proof: The model constructed a mapping from any graph embedding to the star graph embedding and used Kirszbraun’s theorem to show that a Steiner tree for the former can be transformed into a valid tree for the latter without increasing cost.
Technical Details: Steiner Trees and Graph Embeddings
In the Euclidean Steiner Tree problem, we are given as input a set of points (called terminals) in the $\ell_{2}$-metric space, and the goal is to find the minimum-cost tree connecting them. Additional points from the space (called Steiner points) can be introduced as nodes in the solution.
The seminal works of Arora [ 3 ] and Mitchell [ 75 ] provide a Polynomial Time Approximation Scheme (PTAS) for solving the Euclidean Steiner Tree problem in fixed dimensions. However, the problem remains poorly understood in higher dimensions (such as when the dimension is logarithmic in the number of terminals). Ruling out a PTAS for the problem in high dimensions is a notoriously long-standing open problem (for example, see Trevisan [ 92 ] ).
In [ 40 ] , the authors proposed various conjectures related to Steiner trees in order to make progress on understanding the inapproximability of the high-dimensional Euclidean Steiner Tree problem. Assuming these, they could prove APX-hardness of the Euclidean Steiner Tree problem in dimensions polynomial in the input size.
The simplest such conjecture, called the Simplex is the Best for Graph Embeddings Conjecture in [ 40 ] , was a special case of the widely open generalized Gilbert-Pollak conjecture [ 50 , 60 , 29 ] . In this report, we detail the resolution of this conjecture by Gemini, including the process.
First, we state the conjecture and provide the necessary context. Next, we describe our interaction with Gemini that led to its resolution. Finally, we present the proof produced by Gemini.
Research Context
A natural approach to prove the APX-hardness of the Euclidean Steiner Tree problem is via a reduction from the Vertex Cover problem on bounded-degree graphs (for example, see [41]). A candidate reduction proposed in [40] is to embed a vertex cover instance graph $G=(V,E)$ into $\mathbb{R}^{|V|}$ by mapping each edge $\{u,v\}\in E$ to $\mathbf{e}_{u}+\mathbf{e}_{v}$, where $\mathbf{e}_{u}$ is the standard basis vector with $1$ in the coordinate indexed by $u$ and $0$ elsewhere. Thus, each edge is embedded as its characteristic vector. The embedding of the set of edges incident to a single node forms the vertices of a regular simplex of side length $\sqrt{2}$. The point configuration as a whole consists of the vertices of a regular simplicial complex (where we take the union of the simplices associated with each node).
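The embedding used in this reduction is easy to reproduce. In the Python sketch below (function names are our own; vertices are labelled $0,\dots,|V|-1$), each edge $\{u,v\}$ becomes $\mathbf{e}_u+\mathbf{e}_v$. Pairwise distances are $\sqrt{2}$ for edges sharing an endpoint and $2$ for disjoint edges, so the edges at a single node form a regular simplex of side length $\sqrt{2}$.

```python
import itertools
import math

def embed_edges(edges, num_vertices):
    """Map each edge {u, v} to the characteristic vector e_u + e_v in R^num_vertices."""
    points = []
    for u, v in edges:
        x = [0.0] * num_vertices
        x[u] += 1.0
        x[v] += 1.0
        points.append(x)
    return points

def distinct_distances(points, digits=9):
    """Set of pairwise Euclidean distances, rounded to absorb floating-point noise."""
    return {round(math.dist(p, q), digits) for p, q in itertools.combinations(points, 2)}

star = embed_edges([(0, 1), (0, 2), (0, 3)], 4)  # all edges share vertex 0
path = embed_edges([(0, 1), (1, 2), (2, 3)], 4)  # edges (0,1) and (2,3) are disjoint
```

Here `distinct_distances(star)` contains only $\sqrt{2}$, while `distinct_distances(path)` contains both $\sqrt{2}$ and $2$.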
Proving that the above reduction is gap-preserving, by analyzing the completeness and soundness cases, requires proving the following conjecture:
Conjecture 4.11 (Euclidean Steiner Tree for Regular Simplicial Complexes [40]).
For all constants $r\in(0,1)$ and $\alpha\in(0,1/r-1)$, there exist constants $s,\beta>0$ and $M\in\mathbb{Z}^{+}$ sufficiently large so that, for all $m\geq M$, given a regular, unit, simplicial complex on $m$ vertices:
Completeness: If the vertices can be partitioned into the vertices of at most $rm$ unit regular simplices, then the point configuration of the $m$ vertices admits a Euclidean Steiner tree of cost at most $sm$.
Soundness: If the vertices cannot be partitioned into the vertices of fewer than $(1+\alpha)rm$ unit regular simplices, then the point configuration of the $m$ vertices does not admit a Euclidean Steiner tree of cost less than $(1+\beta)sm$.
As a first step toward proving the above conjecture, [ 40 ] proposed the following much simpler conjecture about Steiner ratios of specialized point-sets.
Conjecture 4.12 (Simplex is the Best for Graph Embeddings [40]).
Over all simple graphs with $m$ edges, the embedding³ of the star graph on $m$ edges has the minimum cost Steiner tree.
³ Here we allude to embedding each edge by its characteristic vector, as detailed in the aforementioned reduction from the Vertex Cover problem.
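Observe that, when restricted to connected graphs with the same number of edges, the minimum spanning tree costs are identical: any two distinct points are at distance at least $\sqrt{2}$, and since the line graph of a connected graph is connected, the $m$ points can be spanned by $m-1$ segments of length exactly $\sqrt{2}$, for a total of $(m-1)\sqrt{2}$. Consequently, Conjecture 4.12 over connected graphs is equivalent to finding the point configuration with the minimum Steiner ratio. In [40], the authors verified the conjecture computationally up to $m=10$ using the exact algorithm of Smith [90] and established the following structural property.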
Lemma 4.13 (Fleischmann et al. [40]).
For any fixed $m\geq 1$, the graph of size $m$ whose embedding (as above) has the minimum cost Steiner tree has diameter at most $2$.
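The observation that connected graphs with $m$ edges share the same minimum spanning tree cost, $(m-1)\sqrt{2}$, can be checked with a short Python sketch (function names and example graphs are our own): Prim's algorithm on the complete Euclidean graph over the embedded points. A disconnected graph, such as two disjoint edges, has to use a segment of length $2$ and therefore costs more.

```python
import math

def embed_edges(edges, num_vertices):
    """Map each edge {u, v} to e_u + e_v in R^num_vertices."""
    points = []
    for u, v in edges:
        x = [0.0] * num_vertices
        x[u] += 1.0
        x[v] += 1.0
        points.append(x)
    return points

def mst_cost(points):
    """Prim's algorithm on the complete Euclidean graph over `points` (O(m^2) time)."""
    m = len(points)
    if m == 0:
        return 0.0
    in_tree = [False] * m
    best = [math.inf] * m  # cheapest known connection of each point to the current tree
    best[0] = 0.0
    total = 0.0
    for _ in range(m):
        i = min((j for j in range(m) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        total += best[i]
        for j in range(m):
            if not in_tree[j]:
                best[j] = min(best[j], math.dist(points[i], points[j]))
    return total

# Three connected graphs with m = 4 edges: (edge list, number of vertices).
connected = {
    "star": ([(0, 1), (0, 2), (0, 3), (0, 4)], 5),
    "path": ([(0, 1), (1, 2), (2, 3), (3, 4)], 5),
    "triangle_plus_pendant": ([(0, 1), (1, 2), (0, 2), (2, 3)], 4),
}
```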
Methodology
I asked David Woodruff to formulate a prompt to solve Conjecture 4.12 using the Google internal model, providing only the paper [40] as context.
The model proposed two approaches but did not offer a conclusive answer. The second approach involved starting from an arbitrary graph G G and applying local transformations that do not increase the optimal Steiner tree cost, eventually converging to the star graph. As we had already attempted this strategy, it offered no new insights.
The first approach, however, was based on the Kirszbraun Extension Theorem [66, 93]. I must confess that while I was aware of various extension theorems from my work on the computation of fixed points (see, for example, [45]), I had not immediately seen the connection to Steiner trees. To my frustration, instead of exploring this direction further, the model noted that the proof of the Kirszbraun Extension Theorem is non-elementary and concluded that this approach would not yield an elementary proof of Conjecture 4.12.
As I was not seeking an elementary proof, and indeed had not requested one, this comment left me slightly frustrated yet intrigued. Consequently, I decided to run my own experiments using Gemini Ultra (utilizing the standard “thinking” mode rather than “deep think”).
Initially, I provided the exact prompt used for the internal model, but Gemini consistently declined to attempt the problem, insisting that it was an open problem that could not be solved.
Next, I removed the paper from the context and provided only the problem statement along with the relevant definitions. At first, the model suggested a hand-waving argument; however, when I pointed out the parts that required formalization, it successfully provided them. It first derived the same lemma suggested by the internal model, though this time with a proof. As the proof remained somewhat informal, I provided Gemini with the summary of the internal model’s output and with its explanation of why it had abandoned the approach. At that point, Gemini successfully formalized the lemma using the Kirszbraun Extension Theorem (the tool it had been missing earlier).
Through this process, I have learned about the power of the Kirszbraun Extension Theorem for Steiner tree computation and analysis. To the best of my knowledge, this is a new connection (yet one that feels very natural!). This discovery makes me more optimistic about the prospects of establishing the APX-hardness of computing the Euclidean Steiner Tree in high dimensions.
Gemini’s Proof of Conjecture 4.12
Theorem 4.14.
Let $f$ be an embedding of a graph into Euclidean space such that for any edge $\{u,v\}\in E(G)$, $f(\{u,v\})=e_{u}+e_{v}$, where $\{e_{u}:u\in V(G)\}$ are standard basis vectors. Let $\mathsf{Star}_{m}$ denote the star graph with $m$ edges, and let $G$ be any graph with $m$ edges. Let $\mathsf{SMT}(P)$ denote the length of the Euclidean Steiner Minimal Tree of a point set $P$. Then:
$$\mathsf{SMT}(f(\mathsf{Star}_{m}))\leq\mathsf{SMT}(f(G)).$$
Proof.
Let $G=(V_{G},E_{G})$ be a graph with $|E_{G}|=m$ edges and $|V_{G}|=k$ vertices. The embedding $f$ maps edges to points in $\mathbb{R}^{k}$. Let $P_{G}=\{x_{1},\dots,x_{m}\}\subset\mathbb{R}^{k}$ be the set of $m$ points generated by the edges of $G$. Similarly, let $\mathsf{Star}_{m}$ have vertex set $V_{\mathsf{Star}}$ with $|V_{\mathsf{Star}}|=m+1$. Its embedding lies in $\mathbb{R}^{m+1}$. Let $P_{\mathsf{Star}}=\{y_{1},\dots,y_{m}\}\subset\mathbb{R}^{m+1}$ be the corresponding point set generated by the edges of the star graph.
The star graph edges share a center. For any distinct pair $y_{i},y_{j}\in P_{\mathsf{Star}}$, the distance is:
$$\|y_{i}-y_{j}\|=\sqrt{2}.$$
Thus, $P_{\mathsf{Star}}$ forms the vertices of a regular simplex in $\mathbb{R}^{m+1}$.
For $P_{G}\subset\mathbb{R}^{k}$, the distance between points depends on edge adjacency: for adjacent edges we have $\|x_{i}-x_{j}\|=\sqrt{2}$, and for disjoint edges we have $\|x_{i}-x_{j}\|=2$. Thus, $\|x_{i}-x_{j}\|\geq\sqrt{2}$ for all $x_{i},x_{j}\in P_{G}$ with $i\neq j$.
We construct a bijection $\phi:P_{G}\to P_{\mathsf{Star}}$. Let $\phi(x_{i})=y_{i}$ for $i=1,\dots,m$. We verify the Lipschitz condition for $\phi$ on the domain $P_{G}$. Consider any distinct $x_{i},x_{j}\in P_{G}$:
$$\|\phi(x_{i})-\phi(x_{j})\|=\|y_{i}-y_{j}\|=\sqrt{2}\leq\|x_{i}-x_{j}\|.$$
The map $\phi$ is 1-Lipschitz on the set of terminal points $P_{G}$.
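The 1-Lipschitz property of $\phi$ on the terminals can be verified mechanically for small examples. The Python sketch below (names are our own) embeds a graph $G$ and the star with the same number of edges, pairs the $i$-th points, and checks $\|\phi(x_i)-\phi(x_j)\|\le\|x_i-x_j\|$ for all pairs. The reverse map, from the star to a graph containing two disjoint edges, fails, since it would stretch a distance of $\sqrt{2}$ to $2$.

```python
import itertools
import math

def embed_edges(edges, num_vertices):
    """Map each edge {u, v} to e_u + e_v in R^num_vertices."""
    points = []
    for u, v in edges:
        x = [0.0] * num_vertices
        x[u] += 1.0
        x[v] += 1.0
        points.append(x)
    return points

def star_edges(m):
    """Edges of the star with center 0 and leaves 1..m."""
    return [(0, i) for i in range(1, m + 1)]

def is_one_lipschitz(source, target, tol=1e-12):
    """Check ||target[i] - target[j]|| <= ||source[i] - source[j]|| for all i < j."""
    return all(
        math.dist(target[i], target[j]) <= math.dist(source[i], source[j]) + tol
        for i, j in itertools.combinations(range(len(source)), 2)
    )

m = 4
P_star = embed_edges(star_edges(m), m + 1)
P_path = embed_edges([(0, 1), (1, 2), (2, 3), (3, 4)], m + 1)
P_matching = embed_edges([(0, 1), (2, 3), (4, 5), (6, 7)], 8)
```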
Let $T_{G}$ be the Euclidean Steiner Minimal Tree for $P_{G}$. $T_{G}$ is a subset of the ambient space $\mathbb{R}^{k}$. We must map this tree to the ambient space of the star graph, $\mathbb{R}^{m+1}$. We now use the Kirszbraun Extension Theorem [66, 93] in the following formulation.
Theorem 4.15 (Theorem 1.31 in [88]).
Let $H_{1}$ and $H_{2}$ be Hilbert spaces. If $U\subset H_{1}$ and $\phi:U\to H_{2}$ is a 1-Lipschitz map, there exists an extension $\Phi:H_{1}\to H_{2}$ such that $\Phi|_{U}=\phi$ and $\Phi$ is 1-Lipschitz on all of $H_{1}$.
We apply this theorem with $H_{1}=\mathbb{R}^{k}$ and $H_{2}=\mathbb{R}^{m+1}$. The extension $\Phi:\mathbb{R}^{k}\to\mathbb{R}^{m+1}$ satisfies:
$$\|\Phi(u)-\Phi(v)\|\leq\|u-v\|\quad\forall u,v\in\mathbb{R}^{k}.$$
We treat the Euclidean Steiner Minimal Tree $T_{G}$ for the point set $P_{G}$ as a graph defined by a set of vertices $V_{T}$ and a set of edges $E_{T}$:
•
$V_{T}=P_{G}\cup S_{G}$, where $P_{G}$ are the terminal points and $S_{G}$ are the Steiner points introduced to minimize the total length.
•
The total cost is the sum of the lengths of the edges:
$$\mathsf{SMT}(f(G))=\mathsf{Length}(T_{G})=\sum_{\{u,v\}\in E_{T}}\|u-v\|.$$
We construct a candidate tree $T^{\prime}$ for the star graph configuration $P_{\mathsf{Star}}$ by applying the extended map $\Phi$ to every vertex in $V_{T}$. Let $V^{\prime}_{T}=\{\Phi(z)\mid z\in V_{T}\}$. Since $\Phi$ is an extension of $\phi$, we have $\Phi(P_{G})=P_{\mathsf{Star}}$. Thus, $V^{\prime}_{T}$ contains the required terminals $P_{\mathsf{Star}}$. For every edge $\{u,v\}\in E_{T}$, we form a corresponding edge $\{\Phi(u),\Phi(v)\}$ in $T^{\prime}$. Since $T_{G}$ is connected, the image graph $T^{\prime}$ is also connected and spans $P_{\mathsf{Star}}$. If $\Phi$ identifies vertices or creates cycles, deleting edges from $T^{\prime}$ yields a Steiner tree for $P_{\mathsf{Star}}$ of no greater length; in either case, $T^{\prime}$ provides a valid Steiner tree for $P_{\mathsf{Star}}$ whose length is at most that of $T^{\prime}$.
We now compare the lengths using the 1-Lipschitz property of $\Phi$:
$$\mathsf{Length}(T^{\prime})=\sum_{\{u,v\}\in E_{T}}\|\Phi(u)-\Phi(v)\|.$$
Since $\|\Phi(u)-\Phi(v)\|\leq\|u-v\|$ for every pair $u,v$:
$$\mathsf{Length}(T^{\prime})\leq\sum_{\{u,v\}\in E_{T}}\|u-v\|=\mathsf{Length}(T_{G}).$$
Finally, since $\mathsf{SMT}(P_{\mathsf{Star}})$ is the infimum of the lengths of all valid trees connecting $P_{\mathsf{Star}}$, it must be less than or equal to the length of our candidate tree $T^{\prime}$:
$$\mathsf{SMT}(P_{\mathsf{Star}})\leq\mathsf{Length}(T^{\prime}).$$
Combining these inequalities:
$$\mathsf{SMT}(f(\mathsf{Star}_{m}))=\mathsf{SMT}(P_{\mathsf{Star}})\leq\mathsf{Length}(T^{\prime})\leq\mathsf{Length}(T_{G})=\mathsf{SMT}(f(G)).$$
∎
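The length comparison in this proof only uses the fact that a 1-Lipschitz map cannot lengthen any edge. The Python sketch below illustrates this with a hypothetical stand-in for $\Phi$: an orthogonal coordinate projection, which is 1-Lipschitz but is not the Kirszbraun extension used in the proof (that extension is not given explicitly). The small tree with one Steiner point `s` is likewise made up for illustration; the map is applied vertex by vertex while keeping the same edge list, as in the construction of $T^{\prime}$.

```python
import math

def network_length(vertices, edges):
    """Total Euclidean length of a network given vertex coordinates and an edge list."""
    return sum(math.dist(vertices[u], vertices[v]) for u, v in edges)

def project(point, dims):
    """Hypothetical 1-Lipschitz map: keep only the first `dims` coordinates."""
    return tuple(point[:dims])

# A small tree: three terminals joined through one Steiner point "s".
tree_vertices = {
    "a": (0.0, 0.0, 0.0),
    "b": (1.0, 0.0, 1.0),
    "c": (0.0, 1.0, 1.0),
    "s": (0.3, 0.3, 0.6),
}
tree_edges = [("s", "a"), ("s", "b"), ("s", "c")]

# Apply the map to every vertex and keep the same edges.
image_vertices = {name: project(p, 2) for name, p in tree_vertices.items()}
```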
4.3 Graph Theory: Computing Perfect Matchings in Regular Bipartite Graphs
Written by Ken-ichi Kawarabayashi
The main contribution of the AI here is to provide a solution to a problem in Graph Theory by pulling in tools from:
Statistical Physics (Bethe Approximation),
Number Theory (Coprime integers and integrality gaps), and
Spectral Analysis (Ihara-Bass identity).
Remark added after the first version was posted
After we posted the first version on arXiv, Leonid Gurvits made several remarks, including that (i) the AI and I had missed several vital citations, including an improvement over Schrijver’s bound stated below, and (ii) the strategy based on spectral graph theory suggested by the AI is, indeed, an important ingredient in the solution of the Kadison-Singer problem by Adam Marcus, Daniel Spielman, and Nikhil Srivastava, which is one of the most fascinating and perhaps one of the most important breakthroughs in 21st-century mathematics.
Below, we add these remarks, marked “Post Remark”. We sincerely thank Leonid Gurvits for these suggestions, which make this section even more interesting.
Problem Context
The goal was to improve bounds on the number of perfect matchings (permanents) in regular bipartite graphs. In 1998, Schrijver provided a lower bound with a superior base [ 86 ] .
Theorem 4.16 (Schrijver, 1998).
Any $k$-regular bipartite graph with $2n$ vertices has at least
$$B_{S}(k,n)=\left(\frac{(k-1)^{k-1}}{k^{k-2}}\right)^{n}$$
perfect matchings.
The base in Schrijver’s bound is best possible, as shown in [ 87 ] , and the equality for all k k is shown by Schrijver [ 86 ] . The goal is to improve not the base, but a multiplicative factor.
Post Remark.
Both the AI and I missed the following important results [53, 55]. Gurvits has shown that the number of perfect matchings in $k$-regular bipartite graphs with $2n$ vertices is at least
$$k^{n}\left(\left(\frac{k-1}{k}\right)^{k-1}\right)^{n-k}\frac{k!}{k^{k}}>B_{S}(k,n).$$
We point out that the AI proved a slightly stronger form, phrased in terms of the Bethe approximation; this form may be of independent interest and is slightly different from an improvement of Schrijver’s bound.
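To gauge the size of Gurvits’ improvement, note that $B_S(k,n)=k^{n}\left(\frac{k-1}{k}\right)^{(k-1)n}$, so Gurvits’ bound divided by $B_S(k,n)$ equals $\left(\frac{k}{k-1}\right)^{k(k-1)}\frac{k!}{k^{k}}$, which does not depend on $n$; for $k=2$ the ratio is $2$. The exact-arithmetic Python sketch below (function names are our own) evaluates both bounds exactly as stated above and checks this for small $k$ and $n\geq k$.

```python
from fractions import Fraction
from math import factorial

def schrijver_bound(k: int, n: int) -> Fraction:
    """B_S(k, n) = ((k-1)^(k-1) / k^(k-2))^n from Theorem 4.16 (requires k >= 2)."""
    return Fraction((k - 1) ** (k - 1), k ** (k - 2)) ** n

def gurvits_bound(k: int, n: int) -> Fraction:
    """k^n * (((k-1)/k)^(k-1))^(n-k) * k!/k^k, as stated in the Post Remark."""
    return k**n * Fraction(k - 1, k) ** ((k - 1) * (n - k)) * Fraction(factorial(k), k**k)

def predicted_ratio(k: int) -> Fraction:
    """(k/(k-1))^(k(k-1)) * k!/k^k, the n-independent ratio of the two bounds."""
    return Fraction(k, k - 1) ** (k * (k - 1)) * Fraction(factorial(k), k**k)
```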
AI Contributions
We fed Schrijver’s paper [86] to the model and asked it to improve the bound.
•
Iterative Proof Discovery via Integrality Gaps: The AI’s initial proof attempt had errors (it misunderstood the proof in [86] and cited some wrong papers and theorems),
but through an iterative prompting process, the AI successfully connected Schrijver’s bound to the Bethe permanent [54]. By identifying a number-theoretic integrality gap, the AI established a strictly stronger lower bound (a marginal improvement, but in a stronger form).
I should mention that the AI demonstrated a strong capacity for self-correction. Acting as its own adversarial peer reviewer (in the style of a STOC review), it critiqued its own intermediate proofs, identifying and correcting not only typographical errors but also substantive logical flaws.
•
Generalization via Exact Bounds:
The AI analyzed cubic bipartite graphs to extract exact boundary constants. By recovering Voorhoeve’s exact differential formula, it successfully extracted a strict asymptotic improvement factor.
•
A New Perspective: Finally, the AI suggested a strategy based on spectral graph theory for further improvement. This insight was not obvious to the researchers, as it arose from a “worst”-case analysis (i.e., by considering Ramanujan/expander graphs, the AI brought in the Alon-Boppana theorem [77] and the Kesten-McKay law [64, 74]). Although the AI’s improvement is marginal (though in a slightly stronger form), this part is perhaps of interest.
In addition, this collaboration could serve as a prime example of AI-driven cross-pollination: to make progress on a pure graph-theoretic question, the AI autonomously synthesized tools from statistical physics, number theory, and spectral analysis.
•
Post Remark:
It turns out that the AI’s suggested strategy based on spectral graph theory was an essential ingredient in the solution of the Kadison-Singer problem by Marcus, Spielman, and Srivastava [72].
Thus, this suggestion appears highly promising, given the prior success. Indeed, it is surprising that, although their work is not directly connected to counting perfect matchings in regular bipartite graphs, the AI has already identified one of the most successful approaches in 21st-century mathematics.
Technical Details
For an $n\times n$ matrix $\mathbf{A}$, let $a_{i,j}$ be the entry in the $i$th row and $j$th column. The permanent of $\mathbf{A}$ is defined as
$$\mathrm{perm}(\mathbf{A})=\sum_{\sigma\in\mathcal{P}}\prod_{i=1}^{n}a_{i,\sigma(i)}\tag{2}$$
where $\mathcal{P}$ is the set of all permutations of $\{1,2,\ldots,n\}$. Computing the permanent is known to be #P-complete [94].
We are interested in the case when $\mathbf{A}$ is the bi-adjacency matrix of a $k$-regular bipartite graph $G$ with $2n$ vertices. In this case, $\mathrm{perm}(\mathbf{A})$ equals the number of perfect matchings in $G$ (such a bipartite graph always has a perfect matching by Hall’s theorem; see [70]).
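For very small matrices, definition (2) can be evaluated directly. The Python sketch below is our own illustration: it sums over all $n!$ permutations, so it is only usable for tiny $n$, consistent with the #P-completeness noted above. For the bi-adjacency matrix of the 6-cycle (a 2-regular bipartite graph with $n=3$) it returns $2$, the number of perfect matchings of the cycle, and for the all-ones $3\times 3$ matrix (the graph $K_{3,3}$) it returns $3!=6$.

```python
from itertools import permutations
from math import prod

def permanent(matrix):
    """perm(A) = sum over permutations sigma of prod_i a[i][sigma(i)] (brute force, O(n * n!))."""
    n = len(matrix)
    return sum(prod(matrix[i][sigma[i]] for i in range(n)) for sigma in permutations(range(n)))

cycle6 = [[1, 1, 0],
          [0, 1, 1],
          [1, 0, 1]]              # bi-adjacency matrix of the 6-cycle
k33 = [[1] * 3 for _ in range(3)]  # bi-adjacency matrix of K_{3,3}
```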
A famous result concerning the lower bound is the resolution of the van der Waerden conjecture, proved by Falikman [33] and Egorychev [32], which implies that the number of perfect matchings in a $k$-regular bipartite graph $G$ with $2n$ vertices is at least
$$\frac{k^{n}n!}{n^{n}}.$$
This bound is approximately $\sqrt{2\pi n}(k/e)^{n}$.
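The following Python sketch (function names are our own) compares the van der Waerden bound $k^n n!/n^n$ with its Stirling approximation $\sqrt{2\pi n}(k/e)^n$, and compares its base $k/e$ with the base $(k-1)^{k-1}/k^{k-2}=k(1-1/k)^{k-1}$ of Schrijver’s bound in Theorem 4.16. The latter base is larger because $(1-1/k)^{k-1}>1/e$ for every $k\geq 2$.

```python
import math

def van_der_waerden_bound(k: int, n: int) -> float:
    """k^n * n! / n^n, the lower bound implied by the van der Waerden conjecture."""
    return k**n * math.factorial(n) / n**n

def stirling_estimate(k: int, n: int) -> float:
    """sqrt(2*pi*n) * (k/e)^n, obtained by applying Stirling's formula to n!."""
    return math.sqrt(2 * math.pi * n) * (k / math.e) ** n

def schrijver_base(k: int) -> float:
    """(k-1)^(k-1) / k^(k-2), the base of Schrijver's bound B_S(k, n)."""
    return (k - 1) ** (k - 1) / k ** (k - 2)
```

For $n=50$ the Stirling estimate is within about $0.2\%$ of the exact van der Waerden bound, since the relative error of Stirling’s formula is roughly $1/(12n)$.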
In 1998, Schrijver provided a different lower bound with a superior base [86], as in Theorem 4.16. Two new proofs are given in [25, 53]. The base in Schrijver’s bound is best possible, as shown in [87], and the equality for all $k$ is shown by Schrijver [86]. Let $p(k,n)$ be the minimum number of perfect matchings in any $k$-regular bipartite graph with $2n$ vertices. As mentioned above, we cannot improve the base, but Schrijver remarked:
[Question 1] It might be possible to derive a sharper lower bound for $p(k,n)$ with the methods of the present paper.
This is the question the model was asked. Below, we essentially present the AI’s answers and write-ups.
Post Remark.
The AI proved a slightly stronger form (in terms of the Bethe approximation; this form may be of independent interest, as it represents slightly different progress from an improvement of Schrijver’s bound), but, as mentioned above, this question had already been answered by Gurvits [53, 55].
A slightly better bound on (Q1)
The model answers Q1 affirmatively for $k\geq 2$ (with a small improvement) by utilizing the connection between Schrijver’s bound and the Bethe approximation, relying on results by Gurvits [54] and Vontobel [95] (so the AI gives a stronger form). The proof and write-up below were produced by the AI, incorporating the “revisions” it suggested when reviewing its own proof and write-up (at the end, we (KK) will make remarks on the proof and write-up).
Theorem 4.17.
For $k\geq 2$ and $n\geq 1$, $p(k,n)>B_{S}(k,n)$. In fact, we show a stronger form: $p(k,n)=\mathrm{perm}(A_{G})>\mathrm{perm}_{\mathrm{B}}(A_{G})=B_{S}(k,n)$, where $A_{G}$ is the bi-adjacency matrix of a $k$-regular bipartite graph $G$ with $2n$ vertices that attains $p(k,n)$, and $\mathrm{perm}_{\mathrm{B}}(A_{G})=\mathrm{Perm}_{\mathrm{Bethe}}(G)$ is defined below.
Proof.
The proof relies on establishing the relationship between the true permanent, the Bethe permanent, and Schrijver’s bound.
Step 1: Schrijver’s Bound is the Bethe Approximation. The connection is fundamental. As derived by Gurvits [54] (specifically in Section 5.3), for any $k$-regular bipartite graph $G$, Schrijver’s bound $B_{S}(k,n)$ corresponds exactly to the Bethe approximation of its permanent, denoted $\mathrm{Perm}_{\mathrm{Bethe}}(G)$ (for the precise definition, see below). Gurvits calculates the functional corresponding to the Bethe free energy at the uniform point and shows that it evaluates exactly to $B_{S}(k,n)$ for regular graphs (see below for a proof).
Below, we prove the following to show Step 1.
Lemma 4.18.
For any $k$-regular bipartite graph $G$, Schrijver’s bound $B_{S}(k,n)$ corresponds exactly to the Bethe approximation of its permanent, $\mathrm{Perm}_{\mathrm{Bethe}}(G)$.
Proof.
We aim to show that $B_{S}(k,n)=\mathrm{Perm}_{\mathrm{Bethe}}(G)$.
Step 1: Define the Matrix and Bounds. Let $G$ be a $k$-regular bipartite graph. We consider the associated $n\times n$ normalized bi-adjacency matrix $P$. Since $G$ is $k$-regular, $P(i,j)=1/k$ if an edge exists between $i$ and $j$, and $0$ otherwise. The matrix $P$ is doubly stochastic ($P\in\Omega_{n}$, where $\Omega_{n}$ is the Birkhoff polytope of doubly stochastic matrices).
Since $P$ is normalized by $k$, Schrijver’s bound for the permanent of such a matrix $P$ is given by:
$$B_{S}(k,n)=\left(\frac{k-1}{k}\right)^{(k-1)n}$$
Note that the bound here differs from the one in Theorem 4.16 because of the scaling: since $A_{G}=kP$, we have $\mathrm{perm}(A_{G})=k^{n}\,\mathrm{perm}(P)$.
The Bethe approximation of the permanent, $\mathrm{Perm}_{\mathrm{Bethe}}(P)$, is defined in the context of Gurvits’ paper as $\exp\left(\max_{Q\in\Omega_{n}}CW(P,Q)\right)$ (see below for details).
Step 2: Simplifying the Bethe Approximation for Doubly Stochastic $P$. The functional $CW(P,Q)$ is defined as:
$$CW(P,Q)=\sum_{i,j}\left[(1-Q(i,j))\log(1-Q(i,j))-Q(i,j)\log\left(\frac{Q(i,j)}{P(i,j)}\right)\right]$$
When the matrix $P$ is itself doubly stochastic (which is true for the normalized bi-adjacency matrix of a $k$-regular graph), the maximum of $CW(P,Q)$ is attained when $Q=P$.
Let’s evaluate $CW(P,P)$:
$$\begin{aligned}
CW(P,P)&=\sum_{i,j}\left[(1-P(i,j))\log(1-P(i,j))-P(i,j)\log\left(\frac{P(i,j)}{P(i,j)}\right)\right]\\
&=\sum_{i,j}\left[(1-P(i,j))\log(1-P(i,j))-P(i,j)\log(1)\right]\\
&=\sum_{i,j}(1-P(i,j))\log(1-P(i,j))
\end{aligned}$$
Therefore, the Bethe approximation is:
$$\mathrm{Perm}_{\mathrm{Bethe}}(P)=\exp(CW(P,P))=\prod_{i,j}(1-P(i,j))^{1-P(i,j)}$$
This quantity is denoted $F(P)$ in [54] (Equation 14).
Step 3: Evaluating $F(P)$ for the $k$-regular Case. We calculate $F(P)$ for the specific matrix $P$ derived from the $k$-regular graph. Recall that the entries of $P$ are $P(i,j)\in\{0,1/k\}$.
We evaluate the product term by term:
If $P(i,j)=0$, the contribution is $(1-0)^{1-0}=1$.
If $P(i,j)=1/k$, the contribution is:
$$\left(1-\frac{1}{k}\right)^{1-\frac{1}{k}}=\left(\frac{k-1}{k}\right)^{\frac{k-1}{k}}$$
Since the graph is $k$-regular, every row (and column) of $P$ has exactly $k$ non-zero entries. Thus, the total number of non-zero entries in the $n\times n$ matrix $P$ is $kn$.
Step 4: Final Calculation. The total value of $F(P)$ is the product of the contributions of the $kn$ non-zero entries:
$$F(P)=\left[\left(\frac{k-1}{k}\right)^{\frac{k-1}{k}}\right]^{kn}$$
We simplify the exponent:
$$\frac{k-1}{k}\cdot kn=(k-1)n$$
Therefore,
$$\mathrm{Perm}_{\mathrm{Bethe}}(P)=F(P)=\left(\frac{k-1}{k}\right)^{(k-1)n}$$
This exactly matches Schrijver’s bound $B_{S}(k,n)$. ∎
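As a quick numerical sanity check of Steps 2 to 4, the Python sketch below evaluates $\exp(CW(P,P))$ directly from the definition of $CW$ and compares it with the closed form $((k-1)/k)^{(k-1)n}$. The test graph is an assumption made for illustration only: a hypothetical circulant $k$-regular bipartite graph in which left vertex $i$ is joined to right vertices $(i+s) \bmod n$ for $s=0,\dots,k-1$. The helper names are ours, not from [54], and the sketch assumes $k\geq 2$ so that no entry of $P$ equals 1.

```python
import math

def circulant_biadjacency(n, k):
    """Hypothetical k-regular bipartite test graph (needs 1 <= k <= n):
    left vertex i is adjacent to right vertices (i + s) mod n, s = 0..k-1."""
    return [[1 if (j - i) % n < k else 0 for j in range(n)] for i in range(n)]

def cw_functional(P, Q):
    """CW(P, Q) as defined in Step 2. Assumes Q is supported on the support
    of P and has no entry equal to 1; entries with Q(i,j) = 0 contribute 0."""
    total = 0.0
    for p_row, q_row in zip(P, Q):
        for p, q in zip(p_row, q_row):
            if q > 0:
                total += (1 - q) * math.log(1 - q) - q * math.log(q / p)
    return total

def bethe_at_uniform_point(A, k):
    """exp(CW(P, P)) for the normalized bi-adjacency matrix P = A / k."""
    P = [[a / k for a in row] for row in A]
    return math.exp(cw_functional(P, P))

def schrijver_normalized(n, k):
    """Closed form ((k-1)/k)^((k-1)n) obtained in Step 4."""
    return ((k - 1) / k) ** ((k - 1) * n)

for k, n in [(2, 5), (3, 4), (4, 6)]:
    A = circulant_biadjacency(n, k)
    print(k, n, bethe_at_uniform_point(A, k), schrijver_normalized(n, k))
```

The two printed values should agree up to floating-point rounding, because each of the $kn$ non-zero entries contributes the same factor $\left(\frac{k-1}{k}\right)^{\frac{k-1}{k}}$.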
Step 2: The Inequality. Gurvits ([54], Theorem 2.2, or item 2 of the Corollaries in Section 3) established that the true permanent is lower-bounded by the Bethe approximation:
$$\mathrm{perm}(A_{G})\geq\mathrm{Perm}_{\mathrm{Bethe}}(G).$$
Lemma 4.19 (Gurvits [54]).
The true permanent is lower-bounded by the Bethe approximation.
Combining this with Step 1, we confirm that $\mathrm{perm}(A_{G})\geq B_{S}(k,n)$.
Below, we write $\mathrm{perm}_{\mathrm{B}}(A_{G})$ for $\mathrm{Perm}_{\mathrm{Bethe}}(G)$, where $A_{G}$ is the bi-adjacency matrix of $G$.
Step 3: Strict Inequality (Condition for Equality). To prove the theorem, we must show that the inequality is strict ($>$). This requires characterizing the condition for equality. We rely on the analysis provided by Vontobel [95].
1. Exactness on Forests (Sufficient Condition).
The Bethe permanent is defined via the minimization of the Bethe free energy, which corresponds to the fixed points of the Sum-Product Algorithm (SPA). In Section VI-F (p. 17) of [ 95 ] , under the heading “Relevance of Finite Graph Covers,” Vontobel explicitly establishes the exactness of the SPA on cycle-free graphs:
“If the NFG [Normal Factor Graph] $\mathsf{N}(\theta)$ had no cycles, then the SPA could be used to exactly compute the partition function… the partition function $Z_{G}(\mathsf{N}(\theta))=\mathrm{perm}(\theta)$ could be computed…”
By Definition 11 in [95], $\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ is defined as this partition function at the SPA fixed point. Thus, if $G$ is a forest, $\mathrm{perm}(\boldsymbol{\theta})=\mathrm{perm}(A_{G})=\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})=\mathrm{perm}_{\mathrm{B}}(A_{G})$, where $A_{G}$ is the bi-adjacency matrix of $G$.
2. Strict Inequality for Regular Graphs ($k\geq 2$).
Let $G$ be a $k$-regular bipartite graph with $2n$ vertices ($n\geq 1$). We show that $\mathrm{perm}(A_{G})>\mathrm{perm}_{\mathrm{B}}(A_{G})$. First, we recall Theorem 49 in [95] (attributed to Gurvits; see the lemma above), which establishes the lower bound $\mathrm{perm}(\boldsymbol{\theta})\geq\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})$. Indeed, in this case $\mathrm{perm}(\boldsymbol{\theta})=\mathrm{perm}(A_{G})\geq\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})=\mathrm{perm}_{\mathrm{B}}(A_{G})$.⁴ To prove strictness, we distinguish two cases based on the degree $k$.
⁴ When $G$ is a $k$-regular bipartite graph with $2n$ vertices, we can assume $N(\boldsymbol{\theta})=G$. Hence, we only need to look at the bi-adjacency matrix $A_{G}$ of $G$.
Case 1: $k=2$ (Disjoint Cycles). A 2-regular bipartite graph is a union of disjoint even cycles.
•
Bethe Value: Using Example 41 (p. 14) and Lemma 40 in [95], Vontobel proves that for a cycle, $\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})=\mathrm{perm}_{\mathrm{B}}(A_{G})$ computes the maximum weight of a local matching. For a standard 2-regular graph (0-1 entries), this value is $1$.
•
True Value: A cycle of length $2m$ has exactly two perfect matchings, so a union of $c\geq 1$ disjoint cycles has exactly $2^{c}$ of them. Thus $\mathrm{perm}(A_{G})\geq 2$.
Since $2>1$, the strict inequality holds.
Case 2: $k\geq 3$ (Integrality Gap). We use a number-theoretic argument. The true permanent $\mathrm{perm}(A_{G})$ counts the perfect matchings in $G$, so it must be an integer. In contrast, Vontobel [95] derives the explicit value of the Bethe permanent for a homogeneous $k$-regular matrix in Section VII-E (p. 21, Eq. 13):
$$\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})=\left(\frac{(k-1)^{k-1}}{k^{k-2}}\right)^{n}.\tag{3}$$
Again, in this case $\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})=\mathrm{perm}_{\mathrm{B}}(A_{G})$, as a homogeneous $k$-regular matrix is indeed the bi-adjacency matrix $A_{G}$ of a $k$-regular bipartite graph $G$ with $2n$ vertices. We observe that for any integer $k\geq 3$, this value is not an integer:
The base fraction is $b=\frac{(k-1)^{k-1}}{k^{k-2}}$.
Since $k\geq 3$, the denominator satisfies $k^{k-2}>1$. Let $p$ be any prime factor of $k$.
Since $\gcd(k,k-1)=1$, $p$ does not divide $k-1$. Thus, $p$ cannot divide the numerator $(k-1)^{k-1}$.
Therefore, the fraction $b$ is irreducible and not an integer. The same argument applies to $b^{n}=\frac{(k-1)^{(k-1)n}}{k^{(k-2)n}}$, whose numerator is coprime to its denominator $k^{(k-2)n}>1$, so $b^{n}$ is never an integer for any $n\geq 1$.
Since $\mathrm{perm}(\boldsymbol{\theta})=\mathrm{perm}(A_{G})$ is an integer and $\mathrm{perm}_{\mathrm{B}}(\boldsymbol{\theta})=\mathrm{perm}_{\mathrm{B}}(A_{G})$ is not, they cannot be equal. Combining this with the lower bound $\mathrm{perm}(A_{G})\geq\mathrm{perm}_{\mathrm{B}}(A_{G})$, we conclude:
$$\mathrm{perm}(A_{G})>\mathrm{perm}_{\mathrm{B}}(A_{G}).$$
∎
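Both ingredients of Step 3, the integrality of $\mathrm{perm}(A_{G})$ and the non-integrality of Eq. (3) for $k\geq 3$, can be checked on small examples with the Python sketch below. It computes the permanent exactly with Ryser’s inclusion-exclusion formula (a standard method chosen here for illustration; it is exponential in $n$, so only tiny graphs are feasible) and evaluates Eq. (3) as an exact rational with `fractions.Fraction`. The circulant test graphs are a hypothetical choice, not part of the original argument.

```python
from fractions import Fraction
from itertools import combinations

def permanent_ryser(A):
    """Exact permanent via Ryser's formula:
    perm(A) = sum over nonempty column sets S of (-1)^(n-|S|) * prod_i sum_{j in S} A[i][j]."""
    n = len(A)
    total = 0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            prod = 1
            for row in A:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** (n - r) * prod
    return total

def circulant_biadjacency(n, k):
    """Hypothetical k-regular bipartite test graph (needs 1 <= k <= n)."""
    return [[1 if (j - i) % n < k else 0 for j in range(n)] for i in range(n)]

def bethe_permanent_regular(k, n):
    """Eq. (3) as an exact rational: ((k-1)^(k-1) / k^(k-2))^n."""
    return Fraction((k - 1) ** (k - 1), k ** (k - 2)) ** n

for k, n in [(2, 5), (3, 4), (4, 5)]:
    p = permanent_ryser(circulant_biadjacency(n, k))
    b = bethe_permanent_regular(k, n)
    print(f"k={k}, n={n}: perm={p}, Bethe={b} (~{float(b):.4f}), integer? {b.denominator == 1}")
```

For $k\geq 3$ the reduced denominator of the Bethe value is exactly $k^{(k-2)n}$, matching the coprimality argument; for $k=2$ the Bethe value is $1$, matching Case 1.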
Remark 4.20.
While the AI generated a complete proof of this strict inequality (detailed above), human review revealed that the core logic could be streamlined: because $k$ and $k-1$ are consecutive integers, they are coprime. This makes the Bethe approximation an irreducible, non-integer rational for $k\geq 3$. Since the true permanent is an integer and is at least this rational, it must be strictly greater than it.
Post Remark.
Regarding the Bethe approximation, the best result in this context is a deterministic polynomial-time $\sqrt{2}^{n}$-approximation of the permanent of nonnegative matrices [2]. Moreover, the solution of the asymptotic version of Friedland’s monomer-dimer conjecture, which is a generalization of Schrijver’s bound to matchings of size $pn$ for 0
Further Discussion on Improvement (Q1)
While Theorem 4.17 proves that Schrijver’s bound is never tight for $k\geq 2$, a deeper interpretation of Q1 asks whether the asymptotic improvement factor $C_{k}^{\infty}$ (i.e., the limiting value of $p(k,n)/B_{S}(k,n)$ as $n\to\infty$) is strictly greater than 1.
Case $k=2$
For $k=2$, $B_{S}(2,n)=1$. A 2-regular bipartite graph is a disjoint union of even cycles. The minimum number of perfect matchings occurs for a connected graph (a single cycle $C_{2n}$), which has 2 perfect matchings. Thus $p(2,n)=2$.
$$C_{2}^{\infty}=2.$$
Case $k=3$
Theorem 4.21.
For cubic bipartite graphs, the asymptotic improvement factor is bounded below by $C_{3}^{\infty}\geq\frac{81}{32}=2.53125>1$.
Proof.
Schrijver’s Bound: Schrijver’s lower bound for $k=3$ [86] is $B_{S}(3,n)=\left(\frac{(3-1)^{3-1}}{3^{3-2}}\right)^{n}=\left(\frac{4}{3}\right)^{n}$.
The Exact Voorhoeve Lower Bound [96] (1979): It is a foundational result that the minimum number of perfect matchings in any 3-regular bipartite graph with $2n$ vertices ($n\geq 3$) is bounded below by the explicit formula:
$$\mathrm{perm}(G)\geq 6\left(\frac{4}{3}\right)^{n-3}$$
(Note: Schrijver [86] utilized Voorhoeve’s [96] local differential tree-pruning arguments to generalize the base to all $k$, but absorbed the boundary conditions into $O(1)$ terms for asymptotic simplicity. By retrieving the exact boundary constants from Voorhoeve’s original differential equation, we recover the explicit asymptotic surplus.)
Evaluating the Asymptotic Ratio: Taking the ratio of this lower bound to Schrijver’s bound as $n\to\infty$:
$$C_{3}^{\infty}=\liminf_{n\to\infty}\min_{G}\frac{\mathrm{perm}(G)}{B_{S}(3,n)}\geq\lim_{n\to\infty}\frac{6\left(\frac{4}{3}\right)^{n-3}}{\left(\frac{4}{3}\right)^{n}}=6\left(\frac{3}{4}\right)^{3}=6\cdot\frac{27}{64}=\frac{81}{32}=2.53125$$
∎
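The constant in Theorem 4.21 is plain rational arithmetic. A minimal Python check with exact fractions, taking Voorhoeve’s bound $6(4/3)^{n-3}$ for $n\geq 3$ as stated in the proof, shows that its ratio to $B_{S}(3,n)=(4/3)^{n}$ equals $81/32$ for every $n$, not only in the limit:

```python
from fractions import Fraction

def voorhoeve_bound(n):
    """Voorhoeve's lower bound 6 * (4/3)^(n-3) (cubic bipartite graphs, 2n vertices, n >= 3)."""
    return 6 * Fraction(4, 3) ** (n - 3)

def schrijver_bound_cubic(n):
    """Schrijver's bound B_S(3, n) = (4/3)^n."""
    return Fraction(4, 3) ** n

ratios = {voorhoeve_bound(n) / schrijver_bound_cubic(n) for n in range(3, 50)}
print(ratios)  # {Fraction(81, 32)}
```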
Remark 4.22.
The human researcher noted that the crucial insight in the AI’s derivation was the rigorous extraction of the $O(1)$ boundary constants. While Schrijver’s general asymptotic formulation omitted these for simplicity, the AI recognized that retrieving Voorhoeve’s exact bound allows for the recovery of the explicit asymptotic surplus.
Post Remark.
The $k=3$ bound corresponds to Gurvits’ bound above [53].
An AI-Proposed Spectral Roadmap
Below, the AI provides some attempts. To tackle the general $k\geq 4$ case, the AI autonomously recognized the limitations of local combinatorial methods, identifying a ‘Harmonic Divergence’. Remarkably, it proposed shifting the paradigm entirely to spectral graph theory. By invoking the Ihara-Bass identity and the Kesten-McKay spectral measure for Ramanujan graphs, the AI reframed the combinatorial roadblock as an analytic singularity, outlining a highly non-trivial roadmap for future research. (That said, it would be interesting if spectral graph theory, via Ramanujan/expander graphs, does play a role.)
Proving $C_{k}^{\infty}>1$ remains a significant open problem. The graphs that minimize the number of perfect matchings are expected to be those with the largest possible girth (i.e., Ramanujan/expander graphs), as they locally resemble the infinite $k$-regular tree $T_{k}$, for which the Bethe approximation (Schrijver’s bound) is exact (see also below).
Heuristic arguments based on correlation decay suggest that the convergence of the normalized log-permanent of a graph $G$ to the tree value depends exponentially on the girth $g(G)$. Since the maximum girth grows logarithmically with $n$, this suggests that the ratio $\mathrm{perm}(G)/B_{S}(k,n)$ might tend to a constant greater than 1. However, rigorously identifying the structure of the minimizing graphs and proving these convergence rates are challenging open questions.
The Attempt: By the Bethe Loop Calculus (Vontobel [ 95 ] ), the ratio of the true permanent to Schrijver’s bound is a cycle gas partition function:
$$\frac{\mathrm{perm}(G)}{B_{S}(k,n)}=1+\sum_{S\in\mathcal{E}}(k-1)^{-|E(S)|}$$
where $\mathcal{E}$ is the set of vertex-disjoint cycle unions. The difficulty: in Ramanujan graphs, $g(n)\approx 2\log_{k-1}(n)\to\infty$. Thus, the weight of any individual shortest cycle becomes $(k-1)^{-2\log_{k-1}n}=\frac{1}{n^{2}}$. As $n\to\infty$, individual cycle contributions vanish. If one naively sums over all cycles, the number of cycles of length $\ell$ in a regular graph grows asymptotically as $\frac{(k-1)^{\ell}}{\ell}$. When we sum the expected weights of all cycles to find the gap, we get:
$$\sum_{\ell=g(n)}^{\infty}\mathrm{Count}(\ell)\cdot\mathrm{Weight}(\ell)\approx\sum_{\ell}\frac{(k-1)^{\ell}}{\ell}(k-1)^{-\ell}=\sum_{\ell}\frac{1}{\ell}$$
This yields the divergent harmonic series ($\sum\frac{1}{\ell}\to\infty$). Thus, local combinatorial correlation decay fails to bound the $n\to\infty$ limit, trapping researchers in an analytic divergence. The remaining open question is the exact asymptotic value of $C_{k}^{\infty}$ for $k\geq 4$. Overcoming the Harmonic Divergence requires moving from standard combinatorics to spectral graph theory and the geometry of zeta functions.
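The heuristic count above can be made concrete with the Python sketch below. It takes the two approximations in the text at face value (shortest cycle length $g\approx 2\log_{k-1}n$, and $\mathrm{Count}(\ell)\approx (k-1)^{\ell}/\ell$); these are heuristics, not exact cycle counts of any particular graph, and the function names are ours. The partial sums grow like $\ln(L/g)$, which is the harmonic divergence described above.

```python
import math

def shortest_cycle_weight(k, n):
    """Weight (k-1)^(-g) of one cycle of heuristic length g = 2 log_{k-1}(n); needs k >= 3."""
    g = 2 * math.log(n, k - 1)
    return (k - 1) ** (-g)

def heuristic_gap_partial_sum(k, g, L):
    """Sum over lengths g..L of Count * Weight, with the heuristics
    Count(length) ~ (k-1)^length / length and Weight(length) = (k-1)^(-length).
    Computed in log space so that (k-1)^length never overflows."""
    log_base = math.log(k - 1)
    total = 0.0
    for length in range(g, L + 1):
        log_count = length * log_base - math.log(length)
        log_weight = -length * log_base
        total += math.exp(log_count + log_weight)  # equals 1/length up to rounding
    return total

for L in (10**2, 10**3, 10**4, 10**5):
    print(L, heuristic_gap_partial_sum(4, 10, L), math.log(L / 10))
```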
The Crux: The Ihara Zeta Singularity
By the Ihara-Bass identity, the infinite cycle sum for the permanent on regular graphs can be analytically continued as the inverse square root of a determinant involving the Non-Backtracking Matrix $B$:
$$\frac{\mathrm{perm}(G)}{B_{S}(k,n)}\propto\left[\det\left(I-\frac{1}{k-1}B\right)\right]^{-1/2}$$
The gap preventing the completion of the general proof is a spectral singularity. For any $k$-regular graph, the matrix $B$ possesses a trivial Perron-Frobenius eigenvalue $\lambda=k-1$, with the all-ones vector on directed edges as eigenvector, since every directed edge has exactly $k-1$ non-backtracking continuations. Substituting this eigenvalue into the determinant, the corresponding factor is $1-\frac{k-1}{k-1}=0$, so
$$\det\left(I-\frac{1}{k-1}B\right)=0.$$
The theoretical limit $C_{k}^{\infty}$ resides exactly at this $0/0$ spectral singularity. Attempting to pass to the limit locally without deflating this pole results in a harmonic explosion.
Because resolving this singularity requires theorems on the spectrum of the Non-Backtracking matrix (e.g., the Alon-Boppana theorem, see [ 77 ] , and the Kesten-McKay laws [ 64 , 74 ] ), which are non-elementary, this portion of the conjecture must remain an identified gap, ready for future spectral exploration.
Note that the Ihara-Bass identity relates the non-backtracking matrix $B$ to the adjacency matrix $A$ via $\det(I-uB)=(1-u^{2})^{(k-2)n}\det(I-uA+u^{2}(k-1)I)$, where the exponent $(k-2)n=|E|-|V|$ is the number of edges $kn$ minus the number of vertices $2n$. Evaluating the second determinant at the Bethe pole $u=\frac{1}{k-1}$ yields $\det\left(I-\frac{1}{k-1}A+\frac{1}{k-1}I\right)=\det\left(\frac{kI-A}{k-1}\right)$. The factors corresponding to the eigenvalues $\lambda$ of $A$ are thus $\frac{k-\lambda}{k-1}$.
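The identity and the vanishing at the Bethe pole can be checked exactly on small graphs. The Python sketch below builds the non-backtracking matrix $B$ indexed by directed edges, using the usual Hashimoto convention $B_{(a\to b),(c\to d)}=1$ iff $b=c$ and $d\neq a$, computes both sides with exact rational arithmetic, and confirms that both vanish at $u=\frac{1}{k-1}$. The circulant test graphs and all helper names are illustrative assumptions.

```python
from fractions import Fraction

def det(M):
    """Exact determinant via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    size = len(M)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result  # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            if factor != 0:
                for c in range(col, size):
                    M[r][c] -= factor * M[col][c]
    return result

def circulant_bipartite_edges(n, k):
    """Hypothetical k-regular bipartite test graph on 2n vertices (1 <= k <= n):
    left vertex i is joined to right vertex n + (i + s) % n for s = 0..k-1."""
    return [(i, n + (i + s) % n) for i in range(n) for s in range(k)]

def ihara_bass_sides(n, k, u):
    """Return (det(I - uB), (1 - u^2)^((k-2)n) * det(I - uA + u^2 (k-1) I))."""
    edges = circulant_bipartite_edges(n, k)
    arcs = edges + [(b, a) for a, b in edges]  # every edge in both directions
    m2 = len(arcs)
    # Non-backtracking matrix: (a->b) feeds (c->d) iff b == c and d != a.
    B = [[1 if arcs[i][1] == arcs[j][0] and arcs[j][1] != arcs[i][0] else 0
          for j in range(m2)] for i in range(m2)]
    lhs = det([[(1 if i == j else 0) - u * B[i][j] for j in range(m2)]
               for i in range(m2)])

    V = 2 * n
    A = [[0] * V for _ in range(V)]
    for a, b in edges:
        A[a][b] = A[b][a] = 1
    M = [[(1 + u * u * (k - 1) if i == j else 0) - u * A[i][j] for j in range(V)]
         for i in range(V)]
    rhs = (1 - u * u) ** ((k - 2) * n) * det(M)
    return lhs, rhs

for n, k in [(3, 3), (4, 3), (4, 2)]:
    lhs, rhs = ihara_bass_sides(n, k, Fraction(1, 3))
    print(f"n={n}, k={k}: identity holds: {lhs == rhs}")

# At the Bethe pole u = 1/(k-1) both sides vanish.
print(ihara_bass_sides(3, 3, Fraction(1, 2)))
```

Exact fractions avoid any doubt about whether a tiny floating-point determinant at the pole is really zero.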
The remaining limit will be determined by integrating the non-trivial eigenvalues over the Kesten-McKay spectral measure [64, 74], $\mu_{KM}(\lambda)$, of the infinite $k$-regular tree:
$$\ln(C_{k}^{\infty})\propto-\frac{1}{2}\int_{-2\sqrt{k-1}}^{2\sqrt{k-1}}\ln\left(\frac{k-\lambda}{k-1}\right)d\mu_{KM}(\lambda)$$
By the Alon-Boppana theorem, the non-trivial eigenvalues of high-girth graphs are strictly bounded by $2\sqrt{k-1}$. Since 2 k − 1
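For $k\geq 3$ the integral in the display above can be evaluated numerically. The Python sketch below uses the standard Kesten-McKay density $\frac{k\sqrt{4(k-1)-\lambda^{2}}}{2\pi(k^{2}-\lambda^{2})}$ on $|\lambda|\leq 2\sqrt{k-1}$ and the substitution $\lambda=2\sqrt{k-1}\cos\theta$, which turns the square-root endpoints into a smooth periodic integrand suited to the midpoint rule. It only evaluates the stated integral; because the relation above is a proportionality, the output should not be read as a value of $C_{k}^{\infty}$. Function names and step counts are our own choices.

```python
import math

def kesten_mckay_density(lam, k):
    """Kesten-McKay density of the infinite k-regular tree on |lam| <= 2 sqrt(k-1)."""
    edge = 4 * (k - 1)
    if lam * lam >= edge:
        return 0.0
    return k * math.sqrt(edge - lam * lam) / (2 * math.pi * (k * k - lam * lam))

def km_integral(f, k, steps=4000):
    """Integrate f against the Kesten-McKay measure (needs k >= 3).
    Substitutes lam = r cos(theta), r = 2 sqrt(k-1), so d(lam) -> r sin(theta) d(theta),
    and applies the midpoint rule on theta in [0, pi]."""
    r = 2 * math.sqrt(k - 1)
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        lam = r * math.cos(theta)
        total += f(lam) * kesten_mckay_density(lam, k) * r * math.sin(theta)
    return total * h

def spectral_log_term(k, steps=4000):
    """-1/2 * integral of ln((k - lam)/(k - 1)) d mu_KM(lam); finite since 2 sqrt(k-1) < k for k >= 3."""
    return -0.5 * km_integral(lambda lam: math.log((k - lam) / (k - 1)), k, steps)

for k in (3, 4, 5):
    print(k, km_integral(lambda lam: 1.0, k), spectral_log_term(k))
```

The second printed column is the total mass of the measure and should be 1 up to rounding, which is a cheap check that the density and the change of variables are set up correctly.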
Post Remark.
It turns out that the strategy the AI suggested, shifting the paradigm to spectral methods, echoes an important factor in the solution of the Kadison-Singer problem by Marcus, Spielman, and Srivastava (MSS) [72], which is perhaps one of the most important breakthroughs in 21st-century mathematics.
Indeed, there is a similar flavor in the first paper of MSS’s work on Ramanujan graphs [71]. It is worth noting that, rather than spectral graph theory and linear operators, MSS took a different direction: real stability, which eventually led to a solution of the Kadison-Singer problem [72].
It is surprising that the AI identified one of the most successful approaches in 21st-century mathematics, given that MSS’s work is not directly connected to counting perfect matchings in regular bipartite graphs. What remains is the last question: writing the discrete spectral sum for finite graphs (indeed, Ramanujan/expander graphs) with $2n$ vertices more accurately.
Thus, this suggestion appears highly promising, given the prior success.
5 Using an AI-integrated IDE to “vibe-code”
In this section, we explore a different modality of interaction: embedding an LLM directly into a LaTeX integrated development environment (IDE). This setup allows the researcher to act as a high-level orchestrator while the AI autonomously drafts and refines technical proofs, effectively "vibe-coding" a research paper from scratch.
5.1 Search vs. Decision in $S_{2}^{P}$
Written by Lance Fortnow.
Problem Context
The complexity class $S_{2}^{P}$ represents problems solvable by a game between two competing provers. A fundamental question is the relationship between the decision version (does a strategy exist?) and the search version (find the strategy). It was known that decision is in $\mathsf{ZPP}^{\mathsf{NP}}$ (Cai 2001), but the status of search was unclear.
AI Contribution
The researcher used an AI-powered IDE to write a paper on this topic from scratch.
•
"Vibe-Coding" a Paper: The researcher provided high-level prompts (e.g., "Plan a paper showing finding an $S_{2}^{P}$ witness is equivalent to $\mathsf{TFNP}^{\mathsf{NP}}$").
•
Autonomous Proof Discovery: The AI independently generated the proof of the main equivalence.
•
Self-Correction: When the AI made an incorrect assumption in a corollary (assuming a containment that is open), the researcher pointed it out, and the AI immediately corrected the proof to use a reduction instead.
Experiment Narrative
The story starts with a talk by Purdue philosophy of science professor Eamon Duede [ 30 ] . Duede argued that if artificial intelligence makes writing papers much easier, researchers will publish less important research results. He worried about an explosion of AI-generated mediocre research papers, but I saw the chance to get research out there that wouldn’t be there otherwise. AI itself can help sort through the papers.
But what about the premise itself: can AI simplify writing a quality paper, especially one with a result that might not otherwise have been published? So I picked one of my theorems that I never wrote up and decided to try it out myself.
In short, the process worked even better than I expected. I created the paper [42] with eight prompts, not counting requests to compile. Surprisingly, it came up with the proof of the main result on its own, though it did assume something it shouldn’t have for the corollary, which went away when I pointed it out.
I give the detailed discussions below followed by a few lessons. At the end, I give an informal description of the paper itself for those so inclined.
Details of the Process
I used Google Antigravity, an AI integrated development environment, with Gemini 3 Pro as the AI model. I had successfully used Antigravity to fix some old links on my blog, so why not try it for this paper? I used an IDE instead of a chatbot because I see LaTeX as code that produces a paper.
Below is a transcript between myself and Antigravity, lightly edited for clarity. The plan.md and paper.tex files mentioned below are available in a GitHub repository [ 43 ] .
Let’s plan a paper showing that finding a $S_{2}^{P}$ witness is equivalent to $\mathrm{TFNP}^{\mathrm{NP}}$.
The paper’s author is Lance Fortnow (me) and should be written in my style.
I’ve drafted a plan for the paper, outlining the argument that finding an $S_{2}^{P}$ witness is equivalent to $\mathrm{TFNP}^{\mathrm{NP}}$.
The plan is saved in plan.md .
The core argument rests on two directions:
$S_{2}^{P}$-Search $\in\mathrm{TFNP}^{\mathrm{NP}}$: This is the straightforward direction.
Verifying a witness (strategy) requires checking a universal quantifier ($\forall z\,P(x,y,z)$), which is a coNP query. Since $S_{2}^{P}$ guarantees a witness always exists, the problem is total.
$\mathrm{TFNP}^{\mathrm{NP}}\subseteq S_{2}^{P}$-Search: This is the main technical contribution.
We need to show that any total search problem with an NP-oracle verifier can be reduced to the game-theoretic setting of $S_{2}^{P}$. This likely involves encoding the $\mathrm{P}^{\mathrm{NP}}$ verification circuit into the symmetric alternation structure.
Next Steps
•
Review the Plan: Does the structure in plan.md look right to you?
•
Start the Draft: I can initialize a paper.tex file with this structure and your standard preamble.
•
Flesh out the Proofs: We can start sketching the reduction for the hard direction ($\mathrm{TFNP}^{\mathrm{NP}}\subseteq S_{2}^{P}$-Search) in more detail.
Not a bad start and it has the right ideas for the proof. But it doesn’t mention Cai’s work or the search versus decision angle.
Don’t forget to cite Cai’s paper that $S_{2}^{P}$ is in $\mathrm{ZPP}^{\mathrm{NP}}$.
Add as a corollary that reducing search to decision for $S_{2}^{P}$ would put $\Sigma_{2}^{P}\cap\Pi_{2}^{P}$ in $\mathrm{ZPP}^{\mathrm{NP}}$.
I’ve updated the plan to include the corollary and the citation for Jin-Yi Cai’s paper.
Updates
•
Section 4: Added the corollary: “If search reduces to decision for $S_{2}^{P}$, then $\Sigma_{2}^{P}\cap\Pi_{2}^{P}\subseteq\mathrm{ZPP}^{\mathrm{NP}}$.”
•
References: Added “Cai, J.-Y. (2001). $S_{2}^{P}\subseteq\mathrm{ZPP}^{\mathrm{NP}}$.”
The plan is now updated in plan.md . Ready to proceed with drafting the content whenever you are.
Mention the corollary in the abstract, and note the stark difference between search and decision.
I’ve updated the abstract in plan.md to include the corollary and emphasize the stark difference between the search and decision complexities of $S_{2}^{P}$, specifically referencing Cai’s result. The plan is now fully updated with your requirements. Would you like to proceed with creating the LaTeX file for the paper?
Proceed.
I’ve created the paper.tex file with the content we discussed. It includes the abstract, introduction, preliminaries, the equivalence proof sketch, the corollary regarding search-to-decision, and the references. You can now compile this LaTeX file to generate the PDF. Would you like me to try and compile it for you (if a LaTeX environment is available), or is there anything else you’d like to add to the paper first?
I had expected to have to walk through the proof with the IDE, but it came up with the proof on its own. That surprised me. It isn’t a technically difficult theorem but it has never been written up before. But there was a mistaken assumption in the corollary.
It’s open whether $\Sigma_{2}^{P}\cap\Pi_{2}^{P}\subseteq S_{2}^{P}$.
The point is that $\Sigma_{2}^{P}\cap\Pi_{2}^{P}$ can be reduced to $\mathrm{TFNP}^{\mathrm{NP}}$.
I’ve corrected the proof in paper.tex. It now correctly states that the search problem for $\Sigma_{2}^{P}\cap\Pi_{2}^{P}$ is in $\mathrm{TFNP}^{\mathrm{NP}}$, which (via our main result) connects it to $S_{2}^{P}$-Search and subsequently $\mathrm{ZPP}^{\mathrm{NP}}$. Is there anything else you’d like to refine in the paper?
That’s all it took to fix the proof of the corollary. Still Gemini got a little lazy in the write up.
Avoid statements like “it is known” and put in details instead.
I’ve expanded the proof in paper.tex to explicitly detail why the search problem for $\Sigma_{2}^{P}\cap\Pi_{2}^{P}$ falls into $\mathrm{TFNP}^{\mathrm{NP}}$, replacing the “it is known” statement. The proof now walks through the definitions of membership in $\Sigma_{2}^{P}$ and $\Pi_{2}^{P}$, the existence of witnesses for either case, and the verification complexity using an NP oracle. Is there anything else you’d like to modify or add?
Similarly give the details of predicate $Q$.
I’ve updated the paper with the detailed construction of the predicate $Q$ and the logic for the reduction.
This should make the proof much more rigorous.
The paper now contains:
Equivalence of $S_{2}^{P}$-Search and $\mathrm{TFNP}^{\mathrm{NP}}$.
Detailed reduction construction.
Corollary connecting search-to-decision for $S_{2}^{P}$ with $\Sigma_{2}^{P}\cap\Pi_{2}^{P}\subseteq\mathrm{ZPP}^{\mathrm{NP}}$.
Corrected proof for the corollary.
Is there anything else you’d like to refine?
The paper at this point was pretty well written but I wanted to change the emphasis to search versus decision.
Change the title to “Search versus Decision for $S_{2}^{P}$” and adjust the abstract and introduction appropriately.
I have read paper.tex and understood the necessary changes. Now I am creating the implementation plan. I have created an implementation plan to update the paper’s title, abstract, and introduction. Please review it. I have verified the changes in paper.tex . The title, abstract, and introduction have been correctly updated. Now I am creating the walkthrough artifact. I have updated the paper’s title, abstract, and introduction as requested. You can review the changes in paper.tex .
I had been using a separate LaTeX editor to compile the paper. Maybe I didn’t need to.
Can you convert the tex file to pdf?
I have successfully compiled paper.tex to paper.pdf . You can find the output at paper.pdf .
The citations weren’t going to the latest journal versions.
Check Google Scholar that we have the latest versions of the papers in the citation.
I have checked the citations on Google Scholar. I found updated journal versions for Canetti (IPL 1996) and Cai (JCSS 2007). Russell–Sundaram (1998) and Megiddo–Papadimitriou (1991) are already correct. I found updated journal versions for Canetti and Cai. I have created a plan to update the citations. Please review it. I have verified the updated citations in paper.tex . They are correct. I have updated the citations for Canetti and Cai to their journal versions and recompiled the paper. You can check paper.pdf for the latest version.
Finally I should come clean that I had AI write this paper.
Add an acknowledgment section: “While the results are fully due to the author, this paper was generated using the large language model Gemini 3 Pro with prompting from the author.
The author takes full responsibility for its contents.”
I have added the acknowledgment section and recompiled the paper. You can check paper.pdf for the final version.
And that’s how I "vibe-coded" a research paper in eight prompts, not counting the two used to compile. I did some light editing afterwards to get it ready for the arXiv submission but I didn’t need to.
Lessons
I did this as an experiment on a result that may never have seen the light of day otherwise, and I was fully open about how I had AI write the paper. Nevertheless, it felt wrong, like I cheated somehow. I felt a similar way when I first used LaTeX in the 1980s: it produced a paper that looked far better than it deserved. After that, all papers looked the same, and maybe with AI all papers will read the same.
The experience felt similar to working with a graduate student writing their first research paper. I would just make suggestions until they got it right.
In AI coding you can get better behavior when you give detailed instructions using markdown files like the plan.md that Gemini created for me. I could have taken the approach of creating a markdown file myself, instead of having AI create one for me. I could have a separate file that describes how I personally like papers written. This might lead to a system where you write mathematical papers in LaTeX without ever looking at the LaTeX produced, and the markdown files become the true paper source.
Is low-friction research paper writing good for science? It’s a question that philosophers like Duede contemplate. But I see no one suggesting we go back to quill and scroll.
Informal Theorem Description
The complexity class $\mathsf{S}_{2}^{\mathsf{P}}$ [18, 82] can be thought of as an exponential-sized 0-1 matrix with the promise that either (a) there is a row of all ones or (b) there is a column of all zeros (you can’t have both). The decision problem for $\mathsf{S}_{2}^{\mathsf{P}}$ is to tell whether (a) or (b) holds, and the search problem is to find the appropriate row or column. An $\mathsf{NP}$ oracle is an extra capability to ask existential questions, like whether a specified row has a zero. Jin-Yi Cai [17] showed that the decision problem can be solved by a randomized algorithm with access to an $\mathsf{NP}$ oracle. But his proof did not necessarily find the row or column. The new result shows that the search problem is equivalent to a likely harder problem, total search problems verifiable with access to an NP oracle, a class called $\mathsf{TFNP}^{\mathsf{NP}}$.
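A toy, brute-force rendering of this informal picture may help. The Python sketch below takes an explicit small 0-1 matrix and returns which case holds together with a witness. In the actual class the matrix is exponential-sized and given implicitly by a polynomial-time predicate, so nothing like this full scan is efficient; the function name and return format are our own illustrative choices.

```python
def s2p_toy(M):
    """Brute-force toy for the informal picture of S_2^P.

    M is an explicit 0-1 matrix promised to have (a) a row of all ones or
    (b) a column of all zeros. Both cannot hold at once: that row and that
    column would meet in an entry that is both 1 and 0.
    Returns ("a", row_index) or ("b", column_index).
    """
    for i, row in enumerate(M):
        # "Is this row all ones?" is the complement of the existential
        # question "does this row contain a zero?" from the text.
        if all(x == 1 for x in row):
            return ("a", i)
    for j in range(len(M[0])):
        if all(row[j] == 0 for row in M):
            return ("b", j)
    raise ValueError("promise violated: no all-ones row and no all-zeros column")

print(s2p_toy([[1, 0, 1], [1, 1, 1], [0, 0, 1]]))  # ('a', 1)
print(s2p_toy([[1, 0, 1], [0, 0, 1], [1, 0, 0]]))  # ('b', 1)
```

In this toy, deciding and searching are the same scan. For the exponentially large, implicitly given matrix, the result described above indicates that the two tasks can differ: decision is in $\mathsf{ZPP}^{\mathsf{NP}}$, while search is equivalent to $\mathsf{TFNP}^{\mathsf{NP}}$.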
6 Autonomous Verification and Neuro-Symbolic Loops
Standard text-based chat interfaces are fundamentally limited by the AI’s tendency to hallucinate during long symbolic derivations. This section highlights a methodological leap: embedding the AI in an automated "neuro-symbolic" loop where it autonomously writes and executes code to numerically verify its own mathematical hypotheses, effectively pruning its own invalid reasoning branches.
6.1 Physics: Cosmic String Spectra
Written by Michael P. Brenner, Vincent Cohen-Addad, and David P. Woodruff.
Problem Context
Predicting the gravitational radiation emitted by cosmic strings requires solving a specific, notoriously difficult integral over the sphere. Cosmic strings are hypothetical one-dimensional topological defects that may have formed during symmetry-breaking phase transitions in the early universe. The study of these strings as sources of gravitational radiation has seen renewed interest following recent observations of a stochastic gravitational wave background by Pulsar Timing Arrays.
A critical quantity in predicting this radiation is the power spectrum $P_N$ of the $N$-th harmonic emitted by a cosmic string loop. For the well-studied class of Garfinkle-Vachaspati strings, the power emitted at frequency $\omega_N=4\pi N/L$ (where $L$ is the loop length) is governed by a core integral $I(N,\alpha)$ evaluated over the unit sphere $S^2$:
$$P_{N}=\frac{32G\mu^{2}}{\pi^{3}N^{2}}\,I(N,\alpha) \qquad (4)$$
where $\mu$ is the mass per unit length. The core integral $I(N,\alpha)$ is defined as:
$$I(N,\alpha)=\int_{S^{2}}d\Omega\,\frac{[1-(-1)^{N}\cos(N\pi e_{1})][1-(-1)^{N}\cos(N\pi e_{2})]}{(1-e_{1}^{2})(1-e_{2}^{2})} \qquad (5)$$
where $e_1=\hat{\mathbf{r}}\cdot\hat{\mathbf{a}}$ and $e_2=\hat{\mathbf{r}}\cdot\hat{\mathbf{b}}$ are projection factors. The vectors $\hat{\mathbf{a}}$ and $\hat{\mathbf{b}}$ are three-dimensional unit vectors characterizing the string trajectory, and the problem is defined entirely by the loop opening angle $\alpha$ between them.
Evaluating this integral has been a persistent roadblock in theoretical astrophysics. Both denominators vanish at the poles ($e_{1,2}=\pm 1$); for integer $N$ the numerators vanish there as well, but the resulting 0/0 cancellation is numerically delicate. Furthermore, as $N$ grows large, the integrand in angular coordinates becomes highly oscillatory (so spiky that it resembles a sea urchin), which makes standard numerical integration grids unstable and computationally prohibitive. Analytical expansions (e.g., in standard Legendre polynomials) are exceedingly difficult because the weight $(1-e^{2})^{-1}$ in the denominator does not match the weight of the expansion basis. Previous human and AI-assisted efforts yielded only partial solutions for odd $N$ or asymptotic approximations for large $N$, and finding a unified, exact, closed-form analytical solution for arbitrary loop geometries remained a significant open problem. We do give an exact closed-form analytical solution (with an infinite number of terms), though we do not give a simple expression for the large-$N$ asymptotics at arbitrary $\alpha$. While the formulae discovered are all infinite expansions of known functions, they can be used to derive effective asymptotic formulae; we will report on this in forthcoming work.
AI Contribution
To tackle this integral, we deployed a hybrid neuro-symbolic system combining the Gemini Deep Think reasoning engine with a systematic Tree Search (TS) algorithm. This interaction highlighted three remarkable capabilities of modern AI as an active research partner:
• Automated Pruning and Grounded Verification: To bridge the gap between symbolic mathematical manipulation and ground-truth verification, we utilized an automated numerical feedback loop. At each node in the search space, the model proposed an intermediate mathematical expression in LaTeX and autonomously generated an executable Python function to evaluate it. The TS algorithm scored the node against a high-precision numerical baseline. If the proposed expression exhibited numerical instability (e.g., catastrophic cancellation) or divergence, the evaluation harness caught the exception and injected the Python traceback directly back into Gemini’s context window. This successfully pruned over 80% of the approximately 600 candidate branches early, keeping the LLM strictly grounded in mathematical reality.
• Methodological Diversity via Negative Prompting: The system was highly adept at finding alternative mathematical routes. Once the model successfully found a valid solution path, we utilized negative prompting to force broader methodological exploration. By explicitly instructing the model, “One way of solving this problem is to use the following method… DO NOT use this method. Reflect on your plan and try a different plan,” the AI autonomously discovered six distinct analytical methods to solve the integral.
• Hierarchical Refinement and Self-Correction: The most profound moment came during the final verification stage. The TS framework initially produced an exact solution for our preferred method (Method 6) expressed as an infinite tail sum of coefficients. We then passed this intermediate result to a larger, unconstrained version of Gemini Deep Think, asking it to rigorously verify the proofs and search for further simplifications. This advanced model independently spotted an algebraic oversight in a related recurrence (Method 5, where a denominator dependency had been missed). By correcting this, it established a mathematical equivalence between the two methods, and brilliantly recognized that a localized recurrence structure allowed the infinite tail sum to telescope into a finite, closed-form expression.
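The paper does not publish its harness, so the following is only a minimal sketch, under our own assumptions, of the scoring step described in the first bullet: evaluate a candidate formula, compare it with an independent numerical baseline, and return any traceback or mismatch as feedback text. As a stand-in target it scores candidate formulas for the constant $C_0=\frac{1}{2}\int_{-1}^{1}\frac{1-(-1)^N\cos(N\pi t)}{1-t^2}\,dt$ against midpoint-rule quadrature. The names `score_candidate`, `baseline_c0`, and `midpoint`, the tolerance, and the tested values of $N$ are all hypothetical.

```python
import math
import traceback
from typing import Callable, Optional, Tuple


def midpoint(fn: Callable[[float], float], a: float, b: float, n: int = 20_000) -> float:
    """Composite midpoint rule; never evaluates fn at the endpoints (0/0 there)."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))


def baseline_c0(N: int) -> float:
    """Numerical ground truth for C_0 = (1/2) * integral_{-1}^{1} f_N(t) dt."""
    A = N * math.pi
    return 0.5 * midpoint(lambda t: (1 - (-1) ** N * math.cos(A * t)) / (1 - t * t), -1.0, 1.0)


def score_candidate(candidate: Callable[[int], float], Ns=(1, 2, 3),
                    rel_tol: float = 1e-6) -> Tuple[bool, Optional[str]]:
    """Accept a candidate formula for C_0 only if it matches the baseline.

    On failure, return feedback text (a traceback or a numeric mismatch)
    that a driver could append to the model's context.
    """
    for N in Ns:
        try:
            value = float(candidate(N))
        except Exception:
            return False, traceback.format_exc()
        if not math.isfinite(value):
            return False, f"N={N}: non-finite value {value!r}"
        truth = baseline_c0(N)
        if abs(value - truth) > rel_tol * max(1.0, abs(truth)):
            return False, f"N={N}: candidate {value:.12g} vs baseline {truth:.12g}"
    return True, None


cin = lambda z: midpoint(lambda x: (1 - math.cos(x)) / x, 0.0, z)
print(score_candidate(lambda N: 0.5 * cin(2 * N * math.pi))[0])   # True
print(score_candidate(lambda N: 1 / (N - 1))[0])                  # False
```

The second candidate raises `ZeroDivisionError` at $N=1$, and the full traceback is what would be fed back; a real harness would also need sandboxing and timeouts, which this sketch omits.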
Technical Details: The AI’s Discovery Process
To demonstrate the depth of the AI’s mathematical exploration, we detail the progression of the six methods it discovered. The AI first recast the integral into the following general form over the unit sphere $S^2$:
$$I(N,\alpha)=\int_{S^{2}}d\Omega(\mathbf{u})\,f_{N}(\mathbf{u}\cdot\mathbf{z})\,f_{N}(\mathbf{u}\cdot\mathbf{a}) \qquad (6)$$
where $\mathbf{z}$ and $\mathbf{a}$ are unit vectors with $\mathbf{z}\cdot\mathbf{a}=\cos\alpha$. The function $f_N(t)$ is defined as:
$$f_{N}(t)=\frac{1-(-1)^{N}\cos(N\pi t)}{1-t^{2}},\quad t\in[-1,1]. \qquad (7)$$
For convergence at the poles $t=\pm 1$, $N$ must be an integer (so that the numerator also vanishes there), and we define $A=N\pi$.
Class I: Monomial Basis Approaches (The Unstable Solutions)
The AI first explored expanding $f_N(t)$ in the Taylor (monomial) basis $\{t^{2k}\}$. It found the coefficients $d_{2k}$ by expanding the right-hand side using the cosine power series:
$$(1-t^{2})\sum_{k=0}^{\infty}d_{2k}t^{2k}=1-(-1)^{N}\sum_{m=0}^{\infty}\frac{(-1)^{m}(At)^{2m}}{(2m)!}. \qquad (8)$$
Matching coefficients of $t^{2k}$ yields $d_0=1-(-1)^N$ and, for $k\geq 1$, the recurrence $d_{2k}-d_{2k-2}=-(-1)^{N}\frac{(-1)^{k}A^{2k}}{(2k)!}$, implying:
$$d_{2k}=-(-1)^{N}\sum_{j=1}^{k}\frac{(-1)^{j}A^{2j}}{(2j)!}+(1-(-1)^{N}). \qquad (9)$$
Substituting this expansion into the integral yields a double sum:
$$I(N,\alpha)=\sum_{k=0}^{\infty}\sum_{j=0}^{\infty}d_{2k}d_{2j}J_{2k,2j}(\alpha)\quad\text{where}\quad J_{k,l}(\alpha)=\int_{S^{2}}(\mathbf{u}\cdot\mathbf{z})^{k}(\mathbf{u}\cdot\mathbf{a})^{l}\,d\Omega. \qquad (10)$$
To evaluate the angular moments J k , l ( α ) J_{k,l}(\alpha) , the AI discovered three separate sub-methods:
Method 1: Generating Function Approach. The AI defined a generating function $G(\lambda,\mu)=\int_{S^{2}}e^{\lambda\mathbf{u}\cdot\mathbf{z}+\mu\mathbf{u}\cdot\mathbf{a}}\,d\Omega$. Letting $\mathbf{K}=\lambda\mathbf{z}+\mu\mathbf{a}$, with $K=|\mathbf{K}|$, and aligning the polar axis with $\mathbf{K}$, the AI integrated to find:
$$G(\lambda,\mu)=2\pi\frac{e^{K}-e^{-K}}{K}=4\pi\frac{\sinh K}{K}, \qquad (11)$$
where $K^{2}=\lambda^{2}+\mu^{2}+2\lambda\mu\cos\alpha$. The AI then expanded $\sinh K/K$:
$$\frac{\sinh K}{K}=\sum_{s=0}^{\infty}\frac{K^{2s}}{(2s+1)!}=\sum_{s=0}^{\infty}\frac{(\lambda^{2}+\mu^{2}+2\lambda\mu\cos\alpha)^{s}}{(2s+1)!}. \qquad (12)$$
This allowed the AI to compute $J_{2k,2j}=\left[\partial_{\lambda}^{2k}\partial_{\mu}^{2j}G\right]_{\lambda=\mu=0}$, resulting in an explicit sum involving factorials and powers of $\cos\alpha$.
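A minimal sketch of Method 1, assuming we only need numerical values: instead of differentiating symbolically, it reads off the coefficient of $\lambda^{2k}\mu^{2j}$ in (12). Only the $s=k+j$ term contributes, and the multinomial expansion of $(\lambda^2+\mu^2+2\lambda\mu\cos\alpha)^s$ keeps the terms containing $(2\lambda\mu\cos\alpha)^{2r}$, which gives
$J_{2k,2j}=\frac{4\pi(2k)!(2j)!}{(2k+2j+1)!}\sum_{r=0}^{\min(k,j)}\frac{(k+j)!\,(2\cos\alpha)^{2r}}{(k-r)!\,(j-r)!\,(2r)!}$.
The function name `moment_J` is ours.

```python
import math


def moment_J(k: int, j: int, alpha: float) -> float:
    """J_{2k,2j}(alpha): integral over S^2 of (u.z)^{2k} (u.a)^{2j}, with z.a = cos(alpha).

    Reads the lambda^{2k} mu^{2j} coefficient of G = 4*pi*sinh(K)/K from (12):
    only s = k + j contributes, and the surviving multinomial terms are those
    containing (2*lambda*mu*cos(alpha))^{2r}, 0 <= r <= min(k, j).
    """
    s = k + j
    two_c = 2.0 * math.cos(alpha)
    coeff = sum(
        math.factorial(s)
        / (math.factorial(k - r) * math.factorial(j - r) * math.factorial(2 * r))
        * two_c ** (2 * r)
        for r in range(min(k, j) + 1)
    )
    # The 2k-th lambda- and 2j-th mu-derivatives at 0 multiply the coefficient by (2k)!(2j)!.
    return 4 * math.pi * math.factorial(2 * k) * math.factorial(2 * j) / math.factorial(2 * s + 1) * coeff


print(moment_J(1, 1, 0.0), 4 * math.pi / 5)   # both are the integral of (u.z)^4
```

For example, at $\alpha=0$ the moment $J_{2,2}$ reduces to $\int(\mathbf{u}\cdot\mathbf{z})^4\,d\Omega=4\pi/5$, and at $\alpha=\pi/2$ to $4\pi/15$.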
Method 2: Gaussian Integral Lifting. Alternatively, the AI lifted the basic equation into $\mathbb{R}^3$ by introducing a Gaussian weight:
$$M=\int_{\mathbb{R}^{3}}e^{-r^{2}}(\mathbf{r}\cdot\mathbf{z})^{2k}(\mathbf{r}\cdot\mathbf{a})^{2j}\,d^{3}\mathbf{r}. \qquad (13)$$
By switching to spherical coordinates $\mathbf{r}=r\mathbf{u}$, the radial integral separates and evaluates to $\frac{1}{2}\Gamma(k+j+3/2)$, implying $M=\frac{1}{2}\Gamma(k+j+3/2)J_{2k,2j}$. Simultaneously, applying the differential operator identity $(\mathbf{r}\cdot\mathbf{z})^{2k}=\partial_{\lambda}^{2k}\big|_{\lambda=0}e^{\lambda\mathbf{r}\cdot\mathbf{z}}$ directly to the standard Gaussian integral allowed the AI to evaluate $M$ analytically:
$$M=\pi^{3/2}\,\partial_{\lambda}^{2k}\partial_{\mu}^{2j}\Big|_{\lambda=\mu=0}\exp\left(\frac{\lambda^{2}+\mu^{2}+2\lambda\mu\cos\alpha}{4}\right). \qquad (14)$$
Equating the two representations of $M$ isolates $J_{2k,2j}$.
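The same moments via Method 2, as a sketch under the same assumption (coefficient extraction rather than symbolic differentiation; `moment_J_gaussian` is a hypothetical name). Only the $s=k+j$ term $Q^{s}/(4^{s}s!)$ of the exponential in (14) contributes, with $Q=\lambda^2+\mu^2+2\lambda\mu\cos\alpha$, and dividing by the radial factor $\frac12\Gamma(k+j+3/2)$ recovers $J_{2k,2j}$. Agreement with Method 1 follows from $\Gamma(s+3/2)=\frac{(2s+2)!\sqrt{\pi}}{4^{s+1}(s+1)!}$.

```python
import math


def moment_J_gaussian(k: int, j: int, alpha: float) -> float:
    """J_{2k,2j}(alpha) via the Gaussian lifting (13)-(14).

    M = pi^{3/2} (2k)! (2j)! * [lambda^{2k} mu^{2j}] exp(Q/4), and only the
    s = k + j term Q^s / (4^s s!) of the exponential series contributes.
    """
    s = k + j
    two_c = 2.0 * math.cos(alpha)
    coeff_Qs = sum(                                   # [lambda^{2k} mu^{2j}] Q^s
        math.factorial(s)
        / (math.factorial(k - r) * math.factorial(j - r) * math.factorial(2 * r))
        * two_c ** (2 * r)
        for r in range(min(k, j) + 1)
    )
    M = math.pi ** 1.5 * math.factorial(2 * k) * math.factorial(2 * j) * coeff_Qs / (4 ** s * math.factorial(s))
    radial = 0.5 * math.gamma(s + 1.5)               # integral_0^inf r^{2s+2} e^{-r^2} dr
    return M / radial


print(moment_J_gaussian(1, 1, 0.0), 4 * math.pi / 5)
```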
Method 3: Hybrid Coordinate Transformation. The AI’s third approach projected the power series onto a Legendre basis $P_{2m}(t)$, expanding $t^{2k}=\sum_{m=0}^{k}\mathcal{T}_{k,m}P_{2m}(t)$, where $\mathcal{T}_{k,m}$ are known analytical coefficients. Substituting this back into the Taylor expansion yielded $f_{N}(t)=\sum_{m=0}^{\infty}\left(\sum_{k=m}^{\infty}d_{2k}\mathcal{T}_{k,m}\right)P_{2m}(t)$.
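The coefficients $\mathcal{T}_{k,m}$ are not written out above. One standard choice (our assumption about which closed form is meant) is the classical expansion of monomials in Legendre polynomials, $x^{n}=\sum_{l}\frac{(2l+1)\,n!}{2^{(n-l)/2}\left(\frac{n-l}{2}\right)!\,(n+l+1)!!}P_{l}(x)$ over $l\equiv n \pmod 2$, $0\le l\le n$. The sketch below evaluates it with $n=2k$, $l=2m$ and checks the reconstruction numerically; the helper names are hypothetical.

```python
import math


def double_factorial(n: int) -> int:
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def legendre_T(k: int, m: int) -> float:
    """Coefficient T_{k,m} of P_{2m}(t) in t^{2k} (zero unless 0 <= m <= k)."""
    if not 0 <= m <= k:
        return 0.0
    n, l = 2 * k, 2 * m
    half = (n - l) // 2
    return (2 * l + 1) * math.factorial(n) / (2 ** half * math.factorial(half) * double_factorial(n + l + 1))


def legendre_P(l: int, t: float) -> float:
    """P_l(t) via Bonnet's recurrence (n+1) P_{n+1} = (2n+1) t P_n - n P_{n-1}."""
    p_prev, p = 1.0, t
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * t * p - n * p_prev) / (n + 1)
    return p


t, k = 0.37, 3
print(sum(legendre_T(k, m) * legendre_P(2 * m, t) for m in range(k + 1)), t ** (2 * k))
```

For instance, $t^2=\frac13P_0(t)+\frac23P_2(t)$, so $\mathcal{T}_{1,0}=\frac13$ and $\mathcal{T}_{1,1}=\frac23$.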
AI Self-Correction: While mathematically correct, the AI’s automated Python verification routines revealed that all three Monomial methods were highly unstable. The calculation of the Taylor coefficients $d_{2k}$ involves alternating sums of massive numbers. For large $N$, this triggers catastrophic $O(e^{N\pi})$ cancellation, rendering the methods computationally intractable without arbitrary-precision libraries. Recognizing this, the AI autonomously pivoted to Spectral methods.
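The instability is easy to reproduce. This sketch builds $d_{2k}$ from (9) by accumulating the cosine series term by term (so no factorial or power overflows), sums the monomial series for $f_N(t)$ in double precision, and compares it with the direct formula (7). The truncation $K=400$, the evaluation point $t=0.87$, and the function names are arbitrary choices of ours.

```python
import math


def taylor_coeffs(N: int, K: int) -> list:
    """d_0, d_2, ..., d_{2K} from (9): d_{2k} = 1 - (-1)^N * sum_{j<=k} (-1)^j A^{2j} / (2j)!."""
    A = N * math.pi
    sign = (-1) ** N
    term, partial = 1.0, 1.0              # j = 0 term of the cosine series
    d = [1.0 - sign * partial]            # d_0 = 1 - (-1)^N
    for k in range(1, K + 1):
        term *= -A * A / ((2 * k - 1) * (2 * k))   # next term, built from the previous one
        partial += term
        d.append(1.0 - sign * partial)
    return d


def f_direct(N: int, t: float) -> float:
    return (1 - (-1) ** N * math.cos(N * math.pi * t)) / (1 - t * t)


def f_taylor(N: int, t: float, K: int = 400) -> float:
    return sum(dk * t ** (2 * k) for k, dk in enumerate(taylor_coeffs(N, K)))


for N in (1, 20):
    print(N, f_direct(N, 0.87), f_taylor(N, 0.87))
```

For $N=1$ the two values agree to near machine precision. For $N=20$ the partial sums in $d_{2k}$ grow to roughly $10^{26}$ before cancelling, so the monomial sum is wrong by far more than the size of $f_N$ itself.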
Class II: Spectral Basis Approaches (The Pivot to Stability)
The AI recognized that because $I(N,\alpha)$ is a spherical self-convolution, expanding $f_{N}(t)=\sum_{j}C_{2j}P_{2j}(t)$ (where $P$ are standard Legendre polynomials; only even degrees appear because $f_N$ is even) allows the direct application of the Funk-Hecke Convolution Theorem to diagonalize the integral:
$$I(N,\alpha)=4\pi\sum_{j=0}^{\infty}\frac{C_{2j}^{2}}{4j+1}P_{2j}(\cos\alpha). \qquad (15)$$
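A quick sanity check of (15), using a test function whose Legendre coefficients are known (our choice, not part of the derivation): for $f(t)=t^{2}=\frac13P_0(t)+\frac23P_2(t)$, so that $C_0=\frac13$ and $C_2=\frac23$, the Funk-Hecke formula reproduces the moment $J_{2,2}(\alpha)$ of Method 1.

```latex
\begin{aligned}
\int_{S^{2}}(\mathbf{u}\cdot\mathbf{z})^{2}(\mathbf{u}\cdot\mathbf{a})^{2}\,d\Omega
 &= 4\pi\left[\frac{C_{0}^{2}}{1}+\frac{C_{2}^{2}}{5}P_{2}(\cos\alpha)\right]
  = 4\pi\left[\frac{1}{9}+\frac{4}{45}\cdot\frac{3\cos^{2}\alpha-1}{2}\right]\\
 &= \frac{4\pi\,(1+2\cos^{2}\alpha)}{15},
\end{aligned}
```

which equals $4\pi/5$ at $\alpha=0$ (the integral of $(\mathbf{u}\cdot\mathbf{z})^4$) and $4\pi/15$ at $\alpha=\pi/2$.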
The problem now reduced to finding the Legendre coefficients $C_{2j}$ efficiently without relying on the unstable Taylor series. The AI found two stable $O(N)$ methods to do this:
Method 4: Spectral Galerkin (Matrix Method). The AI formulated a linear system by defining $g(t)=(1-t^{2})f_{N}(t)=1-(-1)^{N}\cos(At)$. Substituting $f_{N}=\sum_{j}C_{2j}P_{2j}$ and projecting onto a test function $P_{2i}$ yielded:
$$\sum_{j}C_{2j}\int_{-1}^{1}(1-t^{2})P_{2i}(t)P_{2j}(t)\,dt=\int_{-1}^{1}P_{2i}(t)g(t)\,dt. \qquad (16)$$
This forms a matrix equation $\mathbf{G}\mathbf{C}=\mathbf{b}$. Using the recurrence identity $t^{2}P_{l}=A_{l}P_{l+2}+B_{l}P_{l}+C_{l}P_{l-2}$, the AI deduced an explicit formula for the matrix elements $G_{ij}$ and proved that $\mathbf{G}$ is a symmetric positive-definite tridiagonal matrix. The right-hand-side vector $\mathbf{b}$ was evaluated using the Bauer plane wave expansion to yield $b_{i}=2\delta_{i0}-2(-1)^{N+i}j_{2i}(A)$, where $j_n$ denotes the spherical Bessel function of the first kind, allowing the coefficients to be solved with high stability.
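A minimal stdlib-only sketch of Method 4, under stated assumptions: the matrix entries are derived here from the three-term expansion of $t^2P_l$ (not quoted from the AI's formula), spherical Bessel values come from Miller's backward recurrence, the system is truncated at $C_{2J}$ with $2J$ well beyond $A$, and it is solved with the Thomas algorithm. All function names are hypothetical.

```python
import math


def spherical_jn_list(nmax: int, x: float) -> list:
    """j_0(x), ..., j_nmax(x) for x > 0 and nmax >= 1 via Miller's backward recurrence.

    The unnormalised sequence is rescaled to whichever closed form,
    j_0 = sin x / x or j_1 = sin x / x^2 - cos x / x, is larger in magnitude
    (j_0 vanishes at x = N*pi, which is exactly the case here).
    """
    start = nmax + int(x) + 40
    vals = [0.0] * (nmax + 1)
    j_next, j_cur = 0.0, 1e-300                   # trial values for j_{start+1}, j_start
    for n in range(start, 0, -1):
        j_next, j_cur = j_cur, (2 * n + 1) / x * j_cur - j_next   # j_cur is now j_{n-1}
        if n - 1 <= nmax:
            vals[n - 1] = j_cur
        if abs(j_cur) > 1e250:                    # rescale everything to avoid overflow
            j_next, j_cur = j_next * 1e-250, j_cur * 1e-250
            vals = [v * 1e-250 for v in vals]
    j0 = math.sin(x) / x
    j1 = math.sin(x) / x ** 2 - math.cos(x) / x
    scale = j0 / vals[0] if abs(j0) > abs(j1) else j1 / vals[1]
    return [v * scale for v in vals]


def galerkin_coeffs(N: int, J: int) -> list:
    """C_0, C_2, ..., C_{2J} from the truncated tridiagonal system G C = b (16)."""
    A = N * math.pi
    jn = spherical_jn_list(2 * J + 1, A)
    diag, off, rhs = [], [], []
    for i in range(J + 1):
        l = 2 * i
        B_l = (2 * l * l + 2 * l - 1) / ((2 * l - 1) * (2 * l + 3))   # P_l coefficient of t^2 P_l
        A_l = (l + 1) * (l + 2) / ((2 * l + 1) * (2 * l + 3))         # P_{l+2} coefficient of t^2 P_l
        diag.append(2 / (2 * l + 1) * (1 - B_l))                       # G_ii
        off.append(-2 * A_l / (2 * l + 5))                             # G_{i,i+1} = G_{i+1,i}
        rhs.append((2.0 if i == 0 else 0.0) - 2 * (-1) ** (N + i) * jn[l])
    cp, dp = [0.0] * (J + 1), [0.0] * (J + 1)     # Thomas algorithm: forward elimination
    for i in range(J + 1):
        sub = off[i - 1] if i > 0 else 0.0
        denom = diag[i] - sub * (cp[i - 1] if i > 0 else 0.0)
        cp[i] = off[i] / denom
        dp[i] = (rhs[i] - sub * (dp[i - 1] if i > 0 else 0.0)) / denom
    C = [0.0] * (J + 1)                           # back substitution
    C[J] = dp[J]
    for i in range(J - 1, -1, -1):
        C[i] = dp[i] - cp[i] * C[i + 1]
    return C


print(galerkin_coeffs(3, 30)[:3])
```

Truncation is harmless because the coefficients decay super-exponentially once $2j$ exceeds $A$. As a consistency check, $C_0$ from this solve can be compared with (26), and one can confirm $C_2=5(C_0-\frac32)$, which follows from (27) because $A(-1)^N j_1(A)=-1$ at $A=N\pi$.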
Method 5: Spectral Volterra (Recurrence Method). Pushing further, the AI derived a forward recurrence for $C_{2j}$. Starting from $C_{l}=\frac{2l+1}{2}\int_{-1}^{1}f_{N}(t)P_{l}(t)\,dt=\frac{2l+1}{2}\gamma_{l}$, the AI multiplied the Legendre differential equation by $f_N(t)$ and integrated by parts:
$$l(l+1)\gamma_{l}=\int_{-1}^{1}f_{N}^{\prime}(t)(1-t^{2})P_{l}^{\prime}(t)\,dt. \qquad (17)$$
Differentiating $(1-t^{2})f_{N}(t)=1-(-1)^{N}\cos(At)$ gives $(1-t^{2})f_{N}^{\prime}(t)=(-1)^{N}A\sin(At)+2tf_{N}(t)$. Substituting this back into the integral brilliantly split the result into two tractable terms, $T_1(l)$ and $T_2(l)$:
$$l(l+1)\gamma_{l}=\underbrace{\int_{-1}^{1}(-1)^{N}A\sin(At)P_{l}^{\prime}(t)\,dt}_{T_{1}(l)}+\underbrace{\int_{-1}^{1}2tf_{N}(t)P_{l}^{\prime}(t)\,dt}_{T_{2}(l)}. \qquad (18)$$
The AI evaluated $T_1(2j)$ via spherical Bessel functions as $-2A^{2}(-1)^{N+j}j_{2j}(A)$. It expanded $T_2(2j)$ into a telescoping sum structure, yielding a clean $O(N)$ forward recurrence:
$$\frac{4(2j^{2}-j)}{4j+1}C_{2j}=T_{1}(2j)+S_{R}(j), \qquad (19)$$
where $S_{R}(j)=\sum_{m=0}^{j-1}(8m+2)\gamma_{2m}$ is a running sum of lower-order coefficients.
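A sketch of the Method 5 recurrence. Two observations, both following from the formulas above, shape it. First, since $\gamma_{2m}=2C_{2m}/(4m+1)$, the running sum simplifies to $S_R(j)=4\sum_{m<j}C_{2m}$. Second, at $j=0$ the left side of (19) vanishes (and so does $T_1(0)$, because $j_0(N\pi)=0$), so the recurrence only determines $C_{2j}$ for $j\ge 1$ and $C_0$ must be supplied separately. Here it is seeded by midpoint quadrature, which is our assumption; the closed form (26) serves the same purpose. Spherical Bessel values come from Miller's backward recurrence, and all names are hypothetical.

```python
import math


def spherical_jn_list(nmax: int, x: float) -> list:
    """j_0(x), ..., j_nmax(x) for x > 0, nmax >= 1 (Miller backward recurrence)."""
    start = nmax + int(x) + 40
    vals = [0.0] * (nmax + 1)
    j_next, j_cur = 0.0, 1e-300
    for n in range(start, 0, -1):
        j_next, j_cur = j_cur, (2 * n + 1) / x * j_cur - j_next   # j_cur is now j_{n-1}
        if n - 1 <= nmax:
            vals[n - 1] = j_cur
        if abs(j_cur) > 1e250:
            j_next, j_cur = j_next * 1e-250, j_cur * 1e-250
            vals = [v * 1e-250 for v in vals]
    j0 = math.sin(x) / x
    j1 = math.sin(x) / x ** 2 - math.cos(x) / x
    scale = j0 / vals[0] if abs(j0) > abs(j1) else j1 / vals[1]   # j_0(N*pi) = 0, so use j_1 then
    return [v * scale for v in vals]


def volterra_coeffs(N: int, J: int, n_quad: int = 200_000) -> list:
    """C_0, C_2, ..., C_{2J}: C_0 by quadrature, then the forward recurrence (19) for j >= 1."""
    A = N * math.pi
    h = 2.0 / n_quad
    C0 = 0.5 * h * sum(
        (1 - (-1) ** N * math.cos(A * t)) / (1 - t * t)
        for t in (-1 + (i + 0.5) * h for i in range(n_quad))
    )
    jn = spherical_jn_list(2 * J, A)
    C = [C0]
    running = 0.0                                  # S_R(j) = 4 * sum_{m<j} C_{2m}
    for j in range(1, J + 1):
        running += 4.0 * C[j - 1]
        T1 = -2.0 * A * A * (-1) ** (N + j) * jn[2 * j]
        C.append((4 * j + 1) / (4.0 * (2 * j * j - j)) * (T1 + running))
    return C


print(volterra_coeffs(3, 20)[:3])
```

Errors in the seed propagate linearly, as $(4j+1)\,\delta C_0$ in $C_{2j}$, so a modestly accurate $C_0$ is enough for small $N$.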
Class III: The Exact Analytic Solution (Method 6)
While Methods 4 and 5 provided stable algorithmic resolutions, the crowning achievement was finding an exact analytic closed-form solution via the Gegenbauer Method.
The AI insightfully chose to expand $f_N(t)$ not in standard Legendre polynomials, but in the basis of Gegenbauer polynomials $C_{2m}^{(3/2)}(t)$:
$$f_{N}(t)=\sum_{m=0}^{\infty}b_{2m}C_{2m}^{(3/2)}(t). \qquad (20)$$
It realized that the Gegenbauer polynomials $C_{k}^{(3/2)}$ are orthogonal with respect to the specific weight $w(t)=1-t^{2}$. This specific weight miraculously cancels the problematic singular denominator in $f_N(t)$ when determining the coefficients by orthogonality:
$$b_{2m}=\frac{1}{h_{2m}}\int_{-1}^{1}\underbrace{\left[\frac{1-(-1)^{N}\cos(N\pi t)}{1-t^{2}}\right]}_{f_{N}(t)}C_{2m}^{(3/2)}(t)\underbrace{(1-t^{2})}_{\text{weight}}\,dt, \qquad (21)$$
where $h_{2m}=\int_{-1}^{1}(1-t^{2})\big[C_{2m}^{(3/2)}(t)\big]^{2}\,dt=\frac{2(2m+1)(2m+2)}{4m+3}$ is the standard normalization constant. By using the identity $C_{k}^{(3/2)}(t)=P_{k+1}^{\prime}(t)$ and integrating by parts, the boundary terms vanish (because the numerator $1-(-1)^{N}\cos(N\pi t)$ evaluates to 0 at $t=\pm 1$), and the integral elegantly reduces to the Fourier transform of Legendre polynomials:
$$b_{2m}=-A(-1)^{N+m}j_{2m+1}(A)\frac{4m+3}{(2m+1)(2m+2)}. \qquad (22)$$
Using the expansion $C_{2m}^{(3/2)}(t)=\sum_{j=0}^{m}(4j+1)P_{2j}(t)$, the AI then relates the Legendre coefficients $C_{2j}$ (from Equation 15) to these Gegenbauer coefficients via a tail sum:
$$C_{2j}=(4j+1)\sum_{m=j}^{\infty}b_{2m}, \qquad (23)$$
noting a critical property: when $j=0$, the sum of all Gegenbauer coefficients simply equals $C_0$:
$$C_{0}=\sum_{m=0}^{\infty}b_{2m}. \qquad (24)$$
Finally, during the hierarchical refinement phase, the advanced Gemini model exploited the partial fraction decomposition $\frac{1}{1-t^{2}}=\frac{1}{2}\left(\frac{1}{1-t}+\frac{1}{1+t}\right)$ to telescope the series and find an exact analytical expression for $C_0$:
$$C_{0}=\frac{1}{2}\int_{-1}^{1}\frac{1-(-1)^{N}\cos(N\pi t)}{1-t^{2}}\,dt=\frac{1}{2}\int_{0}^{2N\pi}\frac{1-\cos x}{x}\,dx. \qquad (25)$$
Recognizing this final integral as the definition of the entire cosine integral $\operatorname{Cin}(z)\equiv\int_{0}^{z}\frac{1-\cos t}{t}\,dt$ (for $z>0$, $\operatorname{Cin}(z)=\gamma_{\mathrm{E}}+\ln z-\operatorname{Ci}(z)$, with $\gamma_{\mathrm{E}}$ the Euler-Mascheroni constant), the AI produced the final, exact closed-form solution:
$$C_{0}=\frac{1}{2}\operatorname{Cin}(2N\pi). \qquad (26)$$
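The step from the first to the second integral in (25) can be filled in as follows. It uses only the evenness of the integrand, the substitution $u=1-t$, and $\sin N\pi=0$.

```latex
\begin{aligned}
\int_{-1}^{1}\frac{1-(-1)^{N}\cos(N\pi t)}{1-t^{2}}\,dt
 &= \int_{-1}^{1}\frac{1-(-1)^{N}\cos(N\pi t)}{1-t}\,dt
 && \text{(even integrand: both partial fractions give the same integral)}\\
 &= \int_{0}^{2}\frac{1-\cos(N\pi u)}{u}\,du
 && \big(u=1-t,\ \cos(N\pi-N\pi u)=(-1)^{N}\cos(N\pi u)\big)\\
 &= \int_{0}^{2N\pi}\frac{1-\cos x}{x}\,dx=\operatorname{Cin}(2N\pi)
 && (x=N\pi u).
\end{aligned}
```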
Combining Equations 22, 23, and 24, we obtain
$$C_{2j}=(4j+1)\left(C_{0}+A(-1)^{N}\sum_{m=0}^{j-1}(-1)^{m}j_{2m+1}(A)\frac{4m+3}{(2m+1)(2m+2)}\right). \qquad (27)$$
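The following sketch exercises Method 6 end to end, in the spirit of the numerical feedback loop: it computes $C_{2j}$ from (26) and (27), sums the Funk-Hecke series (15), and compares the result with brute-force midpoint quadrature of (6) on the sphere. The quadrature resolutions, the truncation $J=2N+20$, and all function names are our assumptions; $\operatorname{Cin}$ is evaluated by quadrature rather than with a special-function library.

```python
import math


def spherical_jn_list(nmax: int, x: float) -> list:
    """j_0(x), ..., j_nmax(x) for x > 0, nmax >= 1 (Miller backward recurrence)."""
    start = nmax + int(x) + 40
    vals = [0.0] * (nmax + 1)
    j_next, j_cur = 0.0, 1e-300
    for n in range(start, 0, -1):
        j_next, j_cur = j_cur, (2 * n + 1) / x * j_cur - j_next   # j_cur is now j_{n-1}
        if n - 1 <= nmax:
            vals[n - 1] = j_cur
        if abs(j_cur) > 1e250:
            j_next, j_cur = j_next * 1e-250, j_cur * 1e-250
            vals = [v * 1e-250 for v in vals]
    j0 = math.sin(x) / x
    j1 = math.sin(x) / x ** 2 - math.cos(x) / x
    scale = j0 / vals[0] if abs(j0) > abs(j1) else j1 / vals[1]
    return [v * scale for v in vals]


def cin(z: float, n: int = 200_000) -> float:
    """Cin(z) = int_0^z (1 - cos t)/t dt by the midpoint rule (the integrand is smooth)."""
    h = z / n
    return h * sum((1 - math.cos((i + 0.5) * h)) / ((i + 0.5) * h) for i in range(n))


def legendre_coeffs_eq27(N: int, J: int) -> list:
    """C_0, C_2, ..., C_{2J} from the closed form (26)-(27)."""
    A = N * math.pi
    jn = spherical_jn_list(2 * J + 1, A)
    C0 = 0.5 * cin(2 * A)
    coeffs, partial = [], 0.0   # partial = sum_{m<j} (-1)^m j_{2m+1}(A) (4m+3)/((2m+1)(2m+2))
    for j in range(J + 1):
        coeffs.append((4 * j + 1) * (C0 + A * (-1) ** N * partial))
        partial += (-1) ** j * jn[2 * j + 1] * (4 * j + 3) / ((2 * j + 1) * (2 * j + 2))
    return coeffs


def I_spectral(N: int, alpha: float) -> float:
    """Funk-Hecke sum (15), truncated well beyond degree A = N*pi."""
    J = 2 * N + 20
    C = legendre_coeffs_eq27(N, J)
    x = math.cos(alpha)
    P = [1.0, x]                                   # P_0 .. P_{2J} at cos(alpha)
    for n in range(1, 2 * J):
        P.append(((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1))
    return 4 * math.pi * sum(c * c / (4 * j + 1) * P[2 * j] for j, c in enumerate(C))


def I_bruteforce(N: int, alpha: float, n: int = 200) -> float:
    """Midpoint rule in (theta, phi), polar axis along z, a = (sin alpha, 0, cos alpha)."""
    A = N * math.pi
    f = lambda t: (1 - (-1) ** N * math.cos(A * t)) / (1 - t * t)
    sa, ca = math.sin(alpha), math.cos(alpha)
    ht = hp = math.pi / n                          # n points in theta, 2n points in phi
    total = 0.0
    for i in range(n):
        st, ct = math.sin((i + 0.5) * ht), math.cos((i + 0.5) * ht)
        fz = f(ct)
        for k in range(2 * n):
            total += st * fz * f(st * math.cos((k + 0.5) * hp) * sa + ct * ca)
    return total * ht * hp


print(I_spectral(3, 1.0), I_bruteforce(3, 1.0))
```

Because $f_N(\pm1)=0$, the leading midpoint-rule error term at the poles vanishes, so a modest $200\times400$ grid suffices for small $N$. The brute-force grid has to grow with $N$, whereas the spectral sum only needs degrees somewhat beyond $N\pi$.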
Summary of Discovered Methods
The table below summarizes the AI’s discoveries and their computational tradeoffs. This AI-driven breakthrough provided a completely stable, closed-form analytic solution for the spectral coefficients.
| Method | Core Technique | Complexity | Numerical Stability |
| 1, 2, 3 | Monomial Expansions | $O(N^2)$ | Unstable (Catastrophic Cancellation) |
| 4 | Spectral Galerkin Matrix | $O(N)$ | Stable (Tridiagonal SPD Matrix) |
| 5 | Spectral Volterra Recurrence | $O(N)$ | Stable (Forward Step Recurrence) |
| 6 | Gegenbauer Expansion | $O(1)$ | Stable (Exact Analytic Closed-Form) |
Table 2: Comparison of AI-discovered methods for evaluating the cosmic string radiation integral $I(N,\alpha)$.
7 Algorithms and Bounds
In this section we see the AI acting as an algorithmic optimizer—tightening mathematical bounds, removing logarithmic factors, and discovering exact optimal constants.
7.1 Graph Theory: Biclique Partitions
Written by Benjamin Przybocki and Bernardo Subercaseaux.
Problem Context
Biclique partitions are a well-studied topic in graph theory with applications to secret sharing, circuit design, and graph compression. Together with Andrew Krapivin and Nicolás Sanhueza-Matamala, we recently proved that every $n$-vertex graph admits an integral biclique partition of weight at most $(\frac{1}{2}+o(1))n^{2}/\lg n$, matching an information-theoretic lower bound [68]. However, in the context of cryptographic secret sharing, fractional solutions to the biclique partition problem are also studied. Previous work by Csirmaz, Ligeti, and Tardos [26] had shown a fractional upper bound with a constant of $\frac{1}{2}$ and a fractional lower bound of $0.265$. Since we were able to match their fractional upper bound with integral solutions, a natural question was whether we could use our techniques to push the fractional upper bound strictly below $\frac{1}{2}$.
AI Contribution
We fed our paper to the model and asked it to resolve the open problem with the following prompt:
Please try to improve the paper by identifying and solving the open problem from it.
While its initial proof attempt had errors, it ultimately led us to solve the problem by providing a crucial insight on how one of our lemmas could be repurposed in a way we had not seen.
• Proof Strategy: The AI suggested a strategy based on partitioning vertices into “good” (degree close to $n/2$) and “bad”. We had considered this before, so we had some understanding of the technical hurdle the AI would need to overcome to make it work.
• Overcoming the Hurdle: In the AI’s proof attempt, it pointed out a crucial idea that allowed us to get better control on the proportion of bad vertices. This insight was not obvious to us as it was a consequence of a construction we had used for a different purpose.
• From Insight to Proof: The AI’s initial proof attempt had errors (incorrect constants and Taylor approximations), but the core idea, that “bad” vertices actually help reduce the partition weight, was correct.
Technical Details: Fractional and Integral Biclique Partitions
Definition 7.1. Given a graph $G$, a fractional biclique partition is a function $\mathcal{B}:\mathfrak{B}\to[0,1]$, where $\mathfrak{B}$ is the set of all complete bipartite subgraphs of $G$, such that for all $e\in E(G)$, we have
$$\sum_{B\in\mathfrak{B}}\mathbf{1}_{[e\in B]}\cdot\mathcal{B}(B)=1.$$
The weight $w(\mathcal{B})$ of such a partition is $w(\mathcal{B}):=\sum_{B\in\mathfrak{B}}\mathcal{B}(B)\cdot|V(B)|$. A fractional biclique partition $\mathcal{B}$ is integral if $\mathcal{B}(B)\in\{0,1\}$ for all $B\in\mathfrak{B}$.
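To make Definition 7.1 concrete, the sketch below (our illustration; all names are hypothetical) checks that a weighted family of bicliques is a valid fractional partition of a small graph and returns its weight, using exact `Fraction` arithmetic. On the triangle $K_3$, every biclique is a single edge or a two-edge star, so the best integral partition (a star plus an edge) has weight $3+2=5$. Giving each of the three stars weight $\frac12$ instead covers every edge exactly once with total weight $\frac92$.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[int]
Biclique = Tuple[FrozenSet[int], FrozenSet[int]]   # (left side, right side), assumed disjoint


def biclique_edges(b: Biclique) -> Set[Edge]:
    """All edges of the complete bipartite graph between the two sides."""
    left, right = b
    return {frozenset((x, y)) for x in left for y in right}


def partition_weight(graph_edges: Iterable[Tuple[int, int]],
                     partition: Dict[Biclique, Fraction]) -> Fraction:
    """Validate a fractional biclique partition (Definition 7.1) and return its weight."""
    cover = {frozenset(e): Fraction(0) for e in graph_edges}
    for b, weight in partition.items():
        if not 0 <= weight <= 1:
            raise ValueError(f"weight {weight} of {b} is outside [0, 1]")
        for e in biclique_edges(b):
            if e not in cover:                       # B must be a subgraph of G
                raise ValueError(f"{sorted(e)} is not an edge of G")
            cover[e] += weight
    uncovered = [sorted(e) for e, total in cover.items() if total != 1]
    if uncovered:
        raise ValueError(f"edges not covered with total weight exactly 1: {uncovered}")
    # w = sum over bicliques of weight * |V(B)|
    return sum((w * (len(b[0]) + len(b[1])) for b, w in partition.items()), Fraction(0))


def S(*xs: int) -> FrozenSet[int]:
    """Shorthand for a vertex set."""
    return frozenset(xs)


triangle = [(0, 1), (0, 2), (1, 2)]
half = Fraction(1, 2)
print(partition_weight(triangle, {(S(0), S(1, 2)): Fraction(1), (S(1), S(2)): Fraction(1)}))  # 5
print(partition_weight(triangle, {(S(0), S(1, 2)): half, (S(1), S(0, 2)): half,
                                  (S(2), S(0, 1)): half}))                                    # 9/2
```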
The following notation will be useful for discussing the problem at hand.
Definition 7.2. Given a graph $G$, let $\textsf{BP}(G)$ (resp., $\textsf{BP}^{\ast}(G)$) be the minimum weight of an integral (resp., fractional) biclique partition of $G$. Then, given $n\geq 1$, let
$$\textsf{BP}(n):=\max_{G,\;|V(G)|=n}\textsf{BP}(G)\quad\text{and}\quad\textsf{BP}^{\ast}(n):=\max_{G,\;|V(G)|=n}\textsf{BP}^{\ast}(G).$$
We had recently proven the following theorem regarding optimal integral biclique partitions:
Theorem 7.3 (Krapivin, Przybocki, Subercaseaux, and Sanhueza-Matamala [68]). $\textsf{BP}(n)\sim\frac{1}{2}\cdot\frac{n^{2}}{\lg n}$.
In turn, the best bounds for fractional biclique partitions were:
Theorem 7.4 (Csirmaz, Ligeti, and Tardos [26]). $(0.265-o(1))\frac{n^{2}}{\lg n}\leq\textsf{BP}^{\ast}(n)\leq(\frac{1}{2}+o(1))\cdot\frac{n^{2}}{\lg n}$.
Importantly, Csirmaz, Ligeti, and Tardos also proved that on graphs of large minimum degree, a better upper bound can be obtained:
Theorem 7.5 (C.L.T. [26]). Let $p\in[0,1]$ be a fixed constant, and let $G$ be a graph on $n$ vertices such that $\deg(v)\geq p\cdot n$ for every $v\in V(G)$. Then,
$$\textsf{BP}^{\ast}(G)\leq(0.725\cdot(1-p)+o(1))\frac{n^{2}}{\lg n}.$$
In particular, for any graph $G$ with minimum degree at least $0.32n$ (such as a $G(n,\frac{1}{2})$ graph, w.h.p.), this gives an upper bound strictly better than Theorem 7.4, since $0.725\cdot(1-0.32)\approx 0.493<\frac{1}{2}$. On the other hand, for graphs whose edge density is far from $0.5$, we also have an improved upper bound, even for integral biclique partitions:
Theorem 7.6 (K.P.S.S.-M. [68]). Let $\gamma\in(0,1)$ be such that $\max\{\gamma^{-1},(1-\gamma)^{-1}\}=n^{o(1)}$. Then, given a graph $G$ of density $\gamma$, we have
$$\textsf{BP}(G)\leq(\frac{1}{2}+o(1))\cdot h_{2}(\gamma)\frac{n^{2}}{\lg n},$$
where $h_{2}(x):=-x\lg x-(1-x)\lg(1-x)$ is the binary entropy function.
Moreover, we also proved a similar result that applies when the degree of every vertex is bounded away from 0.5 0.5 , even if the global density is close to 0.5 0.5 :
Theorem 7.7 (K.P.S.S.-M. [68]). Let $\dot{\gamma}\in(0,1)$ be such that $\dot{\gamma}^{-1}=n^{o(1)}$. Then, given a graph $G$ such that $\min\{d(v)/n,1-d(v)/n\}\leq\dot{\gamma}$ for every $v\in V(G)$, we have $\textsf{BP}(G)\leq(\frac{1}{2}+o(1))\cdot h_{2}(\dot{\gamma})\frac{n^{2}}{\lg n}$.
Therefore, in light of Theorems 7.5, 7.6, and 7.7, if it is not possible to improve the $\frac{1}{2}$ constant from Theorem 7.4, then the problematic graphs must have a particular structure: (i) a constant fraction of vertices of degree at most $0.32n$, (ii) edge density $0.5$, and (iii) a constant fraction of vertices whose degree is approximately $\frac{n}{2}$.
Naturally, we thought of decomposing problematic graphs into different pieces that could be separately handled by the three aforementioned results. Roughly, the idea was to partition the vertices of a given graph into those whose degree is close to $\frac{n}{2}$ (call these $V_{\text{good}}$) and those whose degree is far from $\frac{n}{2}$ (call these $V_{\text{bad}}$). Our hope was to use Csirmaz, Ligeti, and Tardos’s result to construct a biclique partition for $G[V_{\text{good}}]$ and then try to construct a biclique partition for the remaining edges using a different method. But, in order for this to work, we needed $V_{\text{bad}}$ to be a small proportion of the vertices, and it did not seem possible to ensure this with our strategy. We therefore left this as an open problem.
The AI model was fed our paper and, at our suggestion, was asked to resolve this open problem. [Footnote 5: Initially, the AI model tried a different open problem from our paper, but its response did not seem very promising to us, and we were more interested in this problem anyway.]
The proof attempt generated by the AI was similar to the strategy outlined in the previous paragraph, which was impressive given that we did not share our ideas with the AI. But there was one crucial difference. The AI noticed that the construction we used for Theorem 7.7 actually implies that we can improve the $\frac{1}{2}$ constant from Theorem 7.4 whenever the average of $|\deg(v)/n-\frac{1}{2}|$ is bounded away from 0. We neglected to consider this ourselves, since Theorem 7.7, as stated here, is in fact a corollary of a stronger theorem in our paper bounding the maximum number of bicliques any given vertex belongs to; for this stronger result, it was necessary to assume that $|\deg(v)/n-\frac{1}{2}|$ is bounded away from 0 for every vertex rather than merely on average.
However, despite the promising high-level strategy, the AI’s proof attempt contained several mistakes, which did not seem to be fixable by local modifications of the proof attempt, at least as written. For example, the AI asserted that some expression is $\approx 0.31966$ when it is actually $\approx 0.13933$, and it had a nonsensical asymptotic expression $Ax+x^{3}/2+O(x^{2})$, where $x$ is a constant. Nevertheless, the AI’s observation in the previous paragraph turned out to be exactly what we needed to overcome the difficulties we had encountered in our first attempt, and we proved that every $n$-vertex graph admits a fractional biclique partition of weight at most $(0.4999+o(1))n^{2}/\lg n$. We made no attempt to optimize the constant $0.4999$; in fact, it seems to us that where the AI went wrong was in trying too aggressively to optimize the constant in its proof, which resulted in overly complicated algebraic calculations that the AI struggled to manipulate accurately.
Final Proof
For technical reasons, Theorems 7.6 and 7.7 require $\gamma$ or $\dot{\gamma}$ not to be too close to 0 or 1. For similar reasons, it is convenient to define a “clipped” version of the binary entropy function, which allows us to state Lemma 7.9, a lemma corresponding to the AI’s central insight mentioned above. The AI originally stated the lemma using the non-clipped binary entropy function, although the lemma is incorrect when stated that way. Interestingly, when the AI was asked to provide a rigorous proof of the lemma, it realized that the lemma could not be true as stated, although it did not discover that the clipped binary entropy function provides a simple way to salvage it that suffices for its application.
Definition 7.8. Let $h_{2}(x):=-x\lg x-(1-x)\lg(1-x)$ be the binary entropy function, and let $h_{2}^{\dagger}\colon[0,1]\to[0,1]$ be defined by
$$h_{2}^{\dagger}(x)=\begin{cases}h_{2}(x)&\text{if }x\in[0.01,0.99],\\ h_{2}(0.01)&\text{if }x\not\in[0.01,0.99].\end{cases}$$
Lemma 7.9. Let $G$ be an $n$-vertex graph, and let $\bar{h}=\bar{h}(G):=\frac{1}{n}\sum_{v\in V}h_{2}^{\dagger}(\deg(v)/n)$. Then,
$$\textsf{BP}^{\ast}(G)\leq\left(\frac{\bar{h}}{2}+o(1)\right)\frac{n^{2}}{\lg n}.$$
The proof of Lemma 7.9 is a relatively straightforward modification of the proof of Theorem 7.7, which corresponds to [68, Theorem 26]. With it, the proof of the final result is surprisingly simple.
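The quantity $\bar h$ in Lemma 7.9 is easy to compute from a degree sequence. In the minimal sketch below (function names hypothetical), a graph whose degrees are all $n/2$ has $\bar h=1$, which recovers the constant $\frac12$, while degrees far from $n/2$ pull $\bar h$ below 1.

```python
import math
from typing import Iterable


def h2(x: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def h2_clipped(x: float) -> float:
    """h2^dagger from Definition 7.8: values outside [0.01, 0.99] are replaced by h2(0.01)."""
    return h2(x) if 0.01 <= x <= 0.99 else h2(0.01)


def h_bar(degrees: Iterable[int], n: int) -> float:
    """Average clipped entropy of the normalised degrees of an n-vertex graph (Lemma 7.9)."""
    return sum(h2_clipped(d / n) for d in degrees) / n


n = 1000
print(h_bar([500] * n, n))                   # every degree n/2
print(h_bar([500] * 500 + [100] * 500, n))   # half the vertices at degree 0.1n
```

In the second example $\bar h=\frac12+\frac12h_2(0.1)\approx 0.73$, so Lemma 7.9 would give a constant of roughly $0.37$ for such a degree profile.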
Theorem 7.10. $\textsf{BP}^{\ast}(n)\leq(0.4999+o(1))\cdot\frac{n^{2}}{\lg n}$.
Proof. Let $G$ be an arbitrary $n$-vertex graph. Let $V_{\text{good}}=\{v\in V(G)\mid d(v)\geq 0.37n\}$ and $V_{\text{bad}}=V(G)\setminus V_{\text{good}}$. Since $h_{2}^{\dagger}(x)\leq h_{2}(0.37)$ for $x<0.37$ and $h_{2}^{\dagger}\leq 1$ everywhere, we have
$$\begin{aligned}
\bar{h}(G)&=\frac{1}{n}\sum_{v\in V}h_{2}^{\dagger}(\deg(v)/n)\\
&=\frac{1}{n}\left(\sum_{v\in V_{\text{bad}}}h_{2}^{\dagger}(\deg(v)/n)+\sum_{v\in V_{\text{good}}}h_{2}^{\dagger}(\deg(v)/n)\right)\\
&\leq\frac{1}{n}\left(|V_{\text{bad}}|\cdot h_{2}(0.37)+|V_{\text{good}}|\cdot 1\right).
\end{aligned}$$
Thus, if $|V_{\text{bad}}|\geq n/100$, then by Lemma 7.9,
$$\textsf{BP}^{\ast}(G)\leq\left(\frac{0.01\cdot h_{2}(0.37)+0.99}{2}+o(1)\right)\frac{n^{2}}{\lg n}\leq(0.4999+o(1))\frac{n^{2}}{\lg n}.$$
Therefore, we may assume that $|V_{\text{bad}}|<n/100$. Then every vertex of $V_{\text{good}}$ has degree at least $0.37n-|V_{\text{bad}}|\geq 0.36n$ in $G[V_{\text{good}}]$, so by Theorem 7.5 (with $p=0.36$),
$$\textsf{BP}^{\ast}(G[V_{\text{good}}])\leq(0.725\cdot(1-0.36)+o(1))\frac{n^{2}}{\lg n}=(0.464+o(1))\frac{n^{2}}{\lg n}.$$
Let $H$ be the graph with $V(H)=V(G)$ and $E(H)=\{e\in E(G)\mid e\cap V_{\text{bad}}\neq\emptyset\}$. It remains to construct a fractional biclique partition for $H$. Note that $|E(H)|\leq|V_{\text{bad}}|\cdot 0.37n\leq 0.0037n^{2}$, so the edge density of $H$ is at most $0.0074+o(1)$. Hence, by Theorem 7.6 (and since $\textsf{BP}^{\ast}\leq\textsf{BP}$),
$$\textsf{BP}^{\ast}(H)\leq\left(\frac{h_{2}(0.0074)}{2}+o(1)\right)\frac{n^{2}}{\lg n}\leq(0.0316+o(1))\frac{n^{2}}{\lg n}.$$
Since $E(G)=E(H)\sqcup E(G[V_{\text{good}}])$, we have
$$\textsf{BP}^{\ast}(G)\leq\textsf{BP}^{\ast}(G[V_{\text{good}}])+\textsf{BP}^{\ast}(H)\leq(0.496+o(1))\frac{n^{2}}{\lg n}. \qquad ∎$$
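The numerical constants in the proof of Theorem 7.10 can be checked mechanically. This sketch (our own check, not part of the original proof) evaluates the case $|V_{\text{bad}}|\ge n/100$ at the threshold share $0.01$. That is sufficient because the averaged bound decreases as the share of bad vertices grows, since $h_2(0.37)<1$. The sketch also evaluates the two contributions in the case $|V_{\text{bad}}|<n/100$, and recomputes the minimum-degree threshold above which Theorem 7.5 beats $\frac12$. All names are hypothetical.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def many_bad_constant(frac_bad: float) -> float:
    """Lemma 7.9 constant when a frac_bad share of vertices has degree below 0.37n."""
    return (frac_bad * h2(0.37) + (1 - frac_bad)) / 2


few_bad_good_part = 0.725 * (1 - 0.36)   # Theorem 7.5 on G[V_good] with p = 0.36
few_bad_h_part = h2(0.0074) / 2          # Theorem 7.6 on H, edge density <= 0.0074
p_threshold = 1 - 0.5 / 0.725            # Theorem 7.5 beats 1/2 iff p exceeds this

print(f"h2(0.37)          = {h2(0.37):.5f}")
print(f"|V_bad| >= n/100  : {many_bad_constant(0.01):.5f}")
print(f"|V_bad| <  n/100  : {few_bad_good_part:.4f} + {few_bad_h_part:.5f}"
      f" = {few_bad_good_part + few_bad_h_part:.5f}")
print(f"Theorem 7.5 wins for p > {p_threshold:.4f}")
```

It reports about $0.49975$ for the first case and about $0.4640+0.0315\approx 0.4955$ for the second, both below $0.4999$. The threshold of about $0.31$ is consistent with the $0.32n$ used after Theorem 7.5.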
7.2 Query Complexity: Local Search on General Graphs
Written by Simina Brânzei, Ioannis Panageas, and Dimitris Paparas [13].
Problem Context
We analyze the query complexity of the abstract problem of finding a local minimum of a function defined on a general graph using $t$ rounds of interaction with the oracle. This theoretical model captures optimization tasks such as training neural networks, where the goal is to minimize a loss function, each “query” is an expensive loss evaluation, and batching queries is crucial for efficiency.
Prior work [ 12 ] resolved the query complexity of local search in rounds for the d d -dimensional grid. However, the problem remained open for general graphs, which are essential for modeling non-Euclidean geometries such as manifold discretizations.
AI Contribution
The AI was instrumental in obtaining the following results on the query complexity of local search on arbitrary graphs:
• A deterministic upper bound as a function of the number of vertices $n$, the number of rounds $t$, and the separation number $s$ of the graph.
• A randomized lower bound dependent on $n$ and $t$.
These results constitute a research paper [13], developed entirely by iterating with the AI via a “scaffolded reasoning” process. Rather than asking the model to generate the paper from scratch, we treated it as a junior research assistant: we defined the lemmas and tasked the AI with writing the proofs, which we then rigorously verified and refined. We directed the model to expand unclear sections and flagged issues where applicable. This approach can be characterized as a form of “vibe proving”.
• Algorithm Design: We tasked the AI with deriving an upper bound for two-round local search parameterized by the graph’s separation number. The model synthesized a two-round algorithm with $O(\sqrt{ns\Delta})$ queries and stated a supporting “Shattering Lemma” about separator decompositions.
• Iterative Improvements: For all the upper and lower bounds, we guided the model to gradually generalize (e.g., obtain a lower bound for a tree in two rounds and then generalize to any graph in two rounds). We gave several relevant papers as context in all iterations.
• Adversarial Hypothesis Testing: When we hypothesized a linear lower bound for local search in two rounds on constant-degree expanders, the AI refuted it by proposing a counter-algorithm (Randomized Parallel Steepest Descent) and demonstrating an upper bound of $O(n/\log n)$ for it.
Problem choice. We chose this problem for several reasons. One of the authors has expertise on the topic of local search (with a prior paper that focuses on grids [12]), and had proposed investigating the query complexity for general geometries in an NSF grant. Moreover, one of the journal reviewers of the paper on local search in rounds on grids [12] independently raised this question. Pedagogically, the problem seemed ideal for a collaboration with a student, as it allowed for a natural progression from simple cases, such as two-round search on trees, to general results.
Strengths. The AI model demonstrated distinct strengths during our collaboration. One of the key contributions was the model’s tailoring of the classical staircase construction to the round setting. While the staircase technique is well-known, the model’s instantiation ensured that in the hard distribution, every vertex is the (only) local minimum with equal probability. This created a recursive structure where, at the end of each round, the set of viable candidates forms a smaller instance of the original problem, enabling a clean inductive argument to go through. The elegance of this structure was a very nice surprise.
Weaknesses. The model also exhibited weaknesses, such as sometimes producing incorrect proofs or conveniently misinterpreting the question so that it could answer it. For instance, when we first asked for a lower bound, it would often switch to proving an existential one: when queried about a lower bound for local search on graphs with $n$ vertices and separation number $s$, it constructed a single “hard graph” for these parameters (such as a path in which each vertex is connected to a clique of size $s$) rather than a lower bound for every such graph. We also encountered occasional hallucinations of non-existent results. Finally, when we prompted without suggesting graph features, the AI model gave upper bounds that were less interpretable.
Model and Results
Let $G=(V,E)$ be a connected undirected graph with vertex set $V=[n]=\{1,\ldots,n\}$. Let $f:V\to\mathbb{R}$. We call a vertex $v\in V$ a local minimum if $f(v)\leq f(u)$ for every neighbor $u$ of $v$. In the local search problem, we are given $G$ and oracle access to the values of $f$. The goal is to locate a local minimum using a minimum number of queries, where a query at vertex $v$ reveals $f(v)$.
Suppose the interaction with the oracle has $t$ rounds. In each round $j$, the algorithm selects a set of queries based on the outcomes of rounds $1,\dots,j-1$, submits them, and receives the answers. The algorithm terminates after round $t$ with a proposed solution.
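To make the round-based model concrete, here is a minimal Python sketch of the local-minimum test and of an oracle that answers at most $t$ batches of queries. The class `RoundOracle` and its method names are hypothetical helpers introduced for illustration, not part of [13].

```python
from typing import Callable, Dict, Hashable, Iterable, Sequence

Vertex = Hashable


def is_local_min(adj: Dict[Vertex, Sequence[Vertex]],
                 f: Callable[[Vertex], float], v: Vertex) -> bool:
    """v is a local minimum if f(v) <= f(u) for every neighbor u of v."""
    return all(f(v) <= f(u) for u in adj[v])


class RoundOracle:
    """Hypothetical oracle that answers at most t batches (rounds) of queries."""

    def __init__(self, f: Callable[[Vertex], float], t: int) -> None:
        self._f = f
        self.max_rounds = t
        self.rounds_used = 0
        self.queries_used = 0

    def query_round(self, batch: Iterable[Vertex]) -> Dict[Vertex, float]:
        """Answer one batch; the batch must be fixed before seeing its answers."""
        if self.rounds_used >= self.max_rounds:
            raise RuntimeError("round budget exhausted")
        self.rounds_used += 1
        answers = {v: self._f(v) for v in set(batch)}  # duplicates in a batch count once
        self.queries_used += len(answers)
        return answers
```

Counting a vertex again if it is re-queried in a later round is a simplifying choice; an algorithm can always cache earlier answers.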
Query Complexity.
The deterministic query complexity is the minimum, over deterministic algorithms, of the worst-case number of queries needed to find a local minimum.
The randomized query complexity is the minimum worst-case number of queries required by a randomized algorithm to find a local minimum with probability at least $9/10$ for every input.⁶
⁶ The constant $9/10$ can be replaced by any other constant strictly greater than $1/2$.
Graph Features.
Let $\Delta$ be the maximum degree of $G$, and let $dist(u,v)$ denote the shortest-path distance between $u,v\in V$.
Given a parameter $1/2\leq\alpha$
The separation number $s(G)$ is the minimum $s$ such that every subgraph of $G$ admits an $(s,2/3)$-separator. The separation number $s(G)$ is within a constant factor of the graph’s treewidth.
Experiment Narrative
We began the investigation by trying to understand the query complexity for two rounds. One of the prompts used in the early stages of writing the paper is shown in Figure 5.
“Given a connected undirected graph $G=(V,E)$, there is an unknown function $f:V\to\mathbb{R}$, which assigns a value to every vertex $v$ (say the values are distinct). We have oracle access to the function (i.e. an algorithm can learn the value of the function at a vertex by probing that vertex). A vertex $v$ is a local minimum if $f(v)\leq f(u)$ for all neighbors $u$ of $v$. What is the query complexity of finding a local minimum in $t$ rounds of interaction with the oracle?”
Suppose we focus on two rounds. Can you design algorithms as a function of graph features/properties like minimum vertex cover, chromatic number, treewidth?
Figure 5: Early prompt for trying to understand the query complexity for two rounds.
The full response to this prompt is included below.
Response:
This is an excellent problem that explores the trade-offs between the number of interaction rounds and the total query complexity for finding a local minimum in a graph. When restricted to two rounds, algorithms must strategically balance the exploration in the first round with an adaptive, focused search in the second round. We can indeed design effective algorithms by leveraging graph structural properties.
General Framework: Decomposition-Based Approach.
The core strategy for a two-round algorithm involves partitioning the vertices $V$ into a separator set $S$ and the remainder $V\setminus S$. The first round queries $S$. The key is that $G[V\setminus S]$ must be decomposed into small connected components, allowing the second round to adaptively explore the neighborhood of the most promising candidate.
Let $K$ be the maximum size of any connected component in $G[V\setminus S]$.
Generalized Decomposition Algorithm:
Decomposition: Identify a separator set $S$.
Round 1: Query all vertices in $S$ ($Q_{1}=S$).
Adaptivity: Identify the minimum value found: $v_{min}=\arg\min_{v\in S}f(v)$.
Round 2: Identify all connected components of $G[V\setminus S]$ that are adjacent to $v_{min}$. Let $Q_{2}$ be the union of vertices in these components. Query $Q_{2}$.
Output: Output $v^{*}=\arg\min_{v\in S\cup Q_{2}}f(v)$.
Correctness: The algorithm guarantees finding a local minimum. If $v^{*}\in S$ (so $v^{*}=v_{min}$), all its neighbors are queried. If $v^{*}\in Q_{2}$, then $f(v^{*})$ … $K>1$. Let $s_{K}(G)$ be the minimum size of a separator $S$ such that all components of $G[V\setminus S]$ have size at most $K$. The complexity is: $QC=\min_{K}(s_{K}(G)+K\cdot\Delta)$.
Graphs with low treewidth can be efficiently shattered. Specifically, many graph classes with bounded treewidth satisfy the property $s_{K}(G)=O(N\cdot w/K)$ (known as having $1/K$-separators).
Optimized Complexity: We minimize $O(Nw/K)+K\Delta$. The balance is achieved when the two terms are roughly equal: $Nw/K\approx K\Delta$. This yields an optimal component size $K\approx\sqrt{Nw/\Delta}$. The resulting query complexity is: $QC_{TW}=O(\sqrt{N\cdot tw(G)\cdot\Delta})$.
Implications: This demonstrates that sublinear query complexity is achievable in two rounds for structured graphs.
• Paths and Trees: For paths ($w=1,\Delta=2$) or bounded-degree trees, the complexity is $O(\sqrt{N})$.
• Bounded Treewidth/Degree Graphs: If $tw(G)=O(1)$ and $\Delta=O(1)$, the complexity is $O(\sqrt{N})$.
Example: Planar Graphs. Planar graphs utilize stronger separator theorems ($s_{K}(G)=O(N/\sqrt{K})$). Optimizing $O(N/\sqrt{K})+K\Delta$ yields a complexity of $O(N^{2/3}\Delta^{1/3})$.
Summary.
By exploiting graph structures, we can design efficient two-round algorithms. The Vertex Cover and Chromatic Number approaches provide parameterized complexity based on the $K=1$ decomposition. The Treewidth approach leverages generalized decomposition (optimized $K$) and allows for sublinear query complexity in various important graph classes.
The separation number and treewidth are related by a constant factor; thus, approach (3) from the response is essentially the two-round algorithm presented in our full paper.
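The decomposition-based two-round algorithm from the response can be sketched in Python as follows, assuming the separator $S$ is given as a non-empty vertex set (finding a good separator is a separate problem and is not attempted here). Function names are hypothetical. One detail goes beyond the response: when the smallest queried value is tied, the sketch returns $v_{min}$, so the returned vertex always has all of its neighbors queried even without the distinct-values assumption.

```python
from collections import deque


def components_outside(adj, separator):
    """Label each vertex outside the separator with the id of its component (BFS)."""
    comp = {}
    next_id = 0
    for s in adj:
        if s in separator or s in comp:
            continue
        comp[s] = next_id
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in separator and w not in comp:
                    comp[w] = next_id
                    queue.append(w)
        next_id += 1
    return comp


def two_round_local_min(adj, f, separator):
    """Two-round decomposition search; returns (vertex, number of queries).

    Assumes `separator` is a non-empty subset of the vertices."""
    comp = components_outside(adj, separator)
    # Round 1: query every separator vertex and locate the smallest one.
    seen = {v: f(v) for v in separator}
    v_min = min(seen, key=seen.get)
    # Round 2: query every component adjacent to v_min.
    touched = {comp[u] for u in adj[v_min] if u not in separator}
    seen.update({v: f(v) for v, c in comp.items() if c in touched})
    # Output the smallest queried vertex, preferring v_min on ties.
    v_star = min(seen, key=seen.get)
    if seen[v_star] == seen[v_min]:
        v_star = v_min
    return v_star, len(seen)
```

On a path of 50 vertices with every 7th vertex in $S$, the components have 6 vertices each, so the sketch makes at most $8+2\cdot 6=20$ queries.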
In subsequent interactions, we prompted the model to generalize this two-round strategy to $t\geq 2$ rounds.
The model successfully synthesized a $t$-round algorithm anchored by a “Shattering Lemma”. The lemma is: “For any parameter $K\in[1,n]$, there exists a separator $S\subseteq V$ such that every connected component of $G[V\setminus S]$ has size at most $K$, with $|S|$
In this way, we obtained the following theorem.
Theorem 7.11 (Theorem 1 in [13]).
Let $G=(V,E)$ be a connected undirected graph with $n$ vertices. The deterministic query complexity of finding a local minimum on $G$ in $t\geq 2$ rounds is at most
$$\min\bigl(4t\cdot n^{\frac{1}{t}}(s\Delta)^{1-\frac{1}{t}},\,n\bigr),$$
where $\Delta$ is the maximum degree and $s$ is the separation number of $G$.
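As a quick sanity check on Theorem 7.11, the sketch below evaluates the bound numerically. For $t=2$ it equals $\min(8\sqrt{ns\Delta},n)$, matching the $O(\sqrt{ns\Delta})$ two-round algorithm. The function name is ours, and the bound is only evaluated here, not proved.

```python
import math


def thm_7_11_bound(n: int, s: int, delta: int, t: int) -> float:
    """min(4t * n^(1/t) * (s*Delta)^(1 - 1/t), n), the upper bound of Theorem 7.11."""
    if t < 2:
        raise ValueError("Theorem 7.11 is stated for t >= 2")
    return min(4 * t * n ** (1 / t) * (s * delta) ** (1 - 1 / t), n)


# Example: a path (s = 1, Delta = 2) on one million vertices.
for t in (2, 3, 5, 10):
    print(t, round(thm_7_11_bound(10**6, 1, 2, t)))
```

More rounds shrink the $n^{1/t}$ factor, while the cap at $n$ reflects that querying every vertex always suffices.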
The development of the lower bounds was more iterative. Initially, the model insisted on constructing hard graphs (e.g., a path with a clique of size $s$ attached to each vertex) rather than finding a hard distribution of functions for a fixed graph. Some of the lower bounds derived at this stage were incorrect, or relied on the longest path in the graph or on Menger’s theorem.
Eventually we created a prompt that had the following elements:
• explicitly clarifying it must keep the given graph fixed and find a hard input distribution (of functions) for that graph;
• including as context prior papers on lower bounds for local search (specifically, [12] and [84]);
• asking it to focus on local search in two rounds on a tree and try to prove a lower bound of $\Omega(\sqrt{n})$.
The prompt, given in Figure 6, also included the draft of the paper up to that point, which contained only algorithmic results (Theorem 7.11).
I am working on the following manuscript. Can you read it (ignoring the todos/notes) and then help me prove a lower bound of $\sqrt{n}$ on the randomized query complexity of finding a local minimum in two rounds on a tree? The statement should be of the form:
Theorem.
Let $G=(V,E)$ be an undirected connected tree with $n$ vertices. Then the randomized query complexity of finding a local minimum in two rounds on $G$ is $\Omega(\sqrt{n})$.
Note the lower bound should hold for the given tree $G$, so don’t create a hard family of trees. Rather, you need to create a hard family of functions for the given tree $G$. Make it so that for each function in the family, there is a unique local minimum. Also don’t hide the constants under big Omega until the very end of the proof. Attaching two papers from this literature together with my current draft; see especially the paper on local search in rounds on grids for examples of how to use Yao’s lemma in such proofs (i.e. finding a hard distribution of functions and analyzing the performance of a deterministic algorithm).
Figure 6: Prompt for lower bound on trees in two rounds
The model’s response to the prompt is included below:
Theorem.
Let $G=(V,E)$ be an undirected connected tree with $n>1$ vertices. Then the randomized query complexity of finding a local minimum in two rounds on $G$, with success probability at least $c=9/10$, is $\Omega(\sqrt{n})$. Specifically, it is at least $1.8\sqrt{n}-2$.
Proof.
We use Yao’s Minimax Principle. We define a distribution $\mathcal{D}$ over input functions such that each function has a unique local minimum. We then analyze the expected query complexity of any deterministic two-round algorithm $\mathcal{A}$ that succeeds with probability at least $c$ over $\mathcal{D}$.
- The Hard Distribution $\mathcal{D}$. Fix an arbitrary vertex $r\in V$ as the root. Let $d(v)$ be the depth of $v$ (distance from $r$). Let $Anc(v)$ be the set of ancestors of $v$ (the unique path from $r$ to $v$, inclusive). We write $u\preceq v$ if $u\in Anc(v)$.
For each $v^{*}\in V$, we define the function $f_{v^{*}}:V\to\mathbb{Z}$:
$$f_{v^{*}}(x)=\begin{cases}-d(x)&\text{if }x\preceq v^{*}\\ \phantom{-}d(x)&\text{if }x\not\preceq v^{*}\end{cases}$$
$f_{v^{*}}$ has a unique local minimum at $v^{*}$.
Let $\mathcal{D}$ be the uniform distribution over $\{f_{v^{*}}\}_{v^{*}\in V}$. The input is $f_{V^{*}}$, where $V^{*}$ is chosen uniformly at random from $V$.
- Analysis of a Deterministic Two-Round Algorithm $\mathcal{A}$.
Round 1. $\mathcal{A}$ queries a fixed set $Q_{1}\subset V$. Let $q_{1}=|Q_{1}|$. Since the structure of $G$ is known, $d(x)$ is known. A query to $f_{V^{*}}(x)$ reveals whether $x\preceq V^{*}$. The outcome of Round 1 is the signature $S_{V^{*}}=Anc(V^{*})\cap Q_{1}$.
Let $\mathcal{S}$ be the set of possible signatures. The outcomes partition $V$ into $K=|\mathcal{S}|$ blocks $C_{1},\ldots,C_{K}$, where $C_{i}$ is the set of $v^{*}$ corresponding to the $i$-th signature.
Lemma 7.12.
The number of distinct outcomes $K$ is at most $q_{1}+1$.
Proof.
For any non-empty signature $S\in\mathcal{S}$, let $m(S)$ be the unique element in $S$ farthest from the root $r$. Note $m(S)\in Q_{1}$. We show the map $m:\mathcal{S}\setminus\{\emptyset\}\to Q_{1}$ is injective.
Suppose $m(S_{1})=m(S_{2})=x$. Let $S_{1}=Anc(v_{1})\cap Q_{1}$. Since $x\in S_{1}$, we have $x\preceq v_{1}$. We claim $S_{1}=Anc(x)\cap Q_{1}$. ($\subseteq$): If $y\in S_{1}$, then $y\in Q_{1}$ and $y\preceq v_{1}$. Since $x$ is the deepest element in $S_{1}$, $y$ must be an ancestor of $x$ (as all elements in $Anc(v_{1})$ are comparable). Thus $y\in Anc(x)\cap Q_{1}$. ($\supseteq$): Let $y\in Anc(x)\cap Q_{1}$. Since $x\preceq v_{1}$, we have $Anc(x)\subseteq Anc(v_{1})$, so $y\in Anc(v_{1})\cap Q_{1}=S_{1}$. Therefore, $S_{1}=Anc(x)\cap Q_{1}$. Similarly, $S_{2}=Anc(x)\cap Q_{1}$, so $S_{1}=S_{2}$. Since $m$ is injective, there are at most $|Q_{1}|=q_{1}$ non-empty signatures. Including the possibility of the empty signature, $K\leq q_{1}+1$. ∎
Round 2. If the outcome implies $V^{*}\in C_{i}$, $\mathcal{A}$ queries $Q_{2,i}$ of size $q_{i}$. This further partitions $C_{i}$ into $K_{i}$ sub-blocks. By the same logic as Lemma 7.12, $K_{i}\leq q_{i}+1$.
Success Probability and Expected Cost. Let $P_{i}$ be the conditional success probability given $V^{*}\in C_{i}$. Since the distribution within $C_{i}$ is uniform, and the algorithm makes a deterministic guess for each sub-block $C_{i,j}$, it succeeds on at most one vertex per sub-block.
$$P_{i}=\sum_{j=1}^{K_{i}}P(V^{*}\in C_{i,j}\mid V^{*}\in C_{i})\,P(\text{Success}\mid V^{*}\in C_{i,j})\leq\sum_{j=1}^{K_{i}}\frac{|C_{i,j}|}{|C_{i}|}\cdot\frac{1}{|C_{i,j}|}=\frac{K_{i}}{|C_{i}|}.$$
(The inequality becomes equality if the algorithm guesses optimally within each sub-block.) Since $K_{i}\leq q_{i}+1$, we have $|C_{i}|P_{i}\leq q_{i}+1$, so $q_{i}\geq|C_{i}|P_{i}-1$. Also, $P_{i}\leq 1$ since $K_{i}\leq|C_{i}|$.
We require the overall success probability $P_{S}\geq c$, where $P_{S}=\sum_{i=1}^{K}P(V^{*}\in C_{i})P_{i}=\frac{1}{n}\sum_{i=1}^{K}|C_{i}|P_{i}$. Let $S=cn$. We require $\sum_{i=1}^{K}|C_{i}|P_{i}\geq S$.
The expected query complexity $Q_{exp}$ of $\mathcal{A}$ under $\mathcal{D}$ is:
$$\begin{aligned}Q_{exp}&=q_{1}+\frac{1}{n}\sum_{i=1}^{K}|C_{i}|q_{i}\geq q_{1}+\frac{1}{n}\sum_{i=1}^{K}|C_{i}|(|C_{i}|P_{i}-1)=q_{1}+\frac{1}{n}\left(\sum_{i=1}^{K}P_{i}|C_{i}|^{2}\right)-\frac{1}{n}\sum_{i=1}^{K}|C_{i}|\\&=q_{1}-1+\frac{1}{n}\sum_{i=1}^{K}P_{i}|C_{i}|^{2}.\end{aligned}$$
Optimization. We use the Cauchy-Schwarz inequality to lower bound the summation:
$$\left(\sum_{i=1}^{K}P_{i}|C_{i}|^{2}\right)\left(\sum_{i=1}^{K}P_{i}\right)\geq\left(\sum_{i=1}^{K}(\sqrt{P_{i}}|C_{i}|)\sqrt{P_{i}}\right)^{2}=\left(\sum_{i=1}^{K}P_{i}|C_{i}|\right)^{2}\geq S^{2}.$$
Therefore, $\sum_{i=1}^{K}P_{i}|C_{i}|^{2}\geq S^{2}/(\sum_{i=1}^{K}P_{i})$.
We bound the denominator. Since $P_{i}\leq 1$, $\sum P_{i}\leq K$. By Lemma 7.12, $K\leq q_{1}+1$, so
$$\sum_{i=1}^{K}P_{i}|C_{i}|^{2}\geq\frac{S^{2}}{q_{1}+1}.$$
Substituting this back into the expected cost:
$$Q_{exp}\geq q_{1}-1+\frac{S^{2}}{n(q_{1}+1)}=q_{1}-1+\frac{c^{2}n}{q_{1}+1}.$$
This lower bound holds for any deterministic algorithm $\mathcal{A}$, characterized by its choice of $q_{1}$. To find the overall lower bound via Yao’s principle, we minimize this expression over $q_{1}\geq 0$. Let $x=q_{1}+1$ ($x\geq 1$). We minimize $g(x)=x-2+\frac{c^{2}n}{x}$. By the AM-GM inequality, $x+c^{2}n/x\geq 2\sqrt{c^{2}n}=2c\sqrt{n}$. Equality holds when $x=c\sqrt{n}$. (Assuming $c\sqrt{n}\geq 1$, which holds for $c=9/10$ and $n>1$.)
$$Q_{exp}\geq 2c\sqrt{n}-2.$$
Substituting $c=9/10$: $Q_{exp}\geq 2(0.9)\sqrt{n}-2=1.8\sqrt{n}-2$. Thus, the randomized query complexity is $\Omega(\sqrt{n})$. ∎
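The two combinatorial facts used in this proof can be checked by brute force on small trees: each $f_{v^{*}}$ has $v^{*}$ as its unique local minimum, and Lemma 7.12 bounds the number of round-1 signatures by $q_{1}+1$. The Python sketch below does this on random trees (vertex $0$ plays the root $r$; the random-tree generator and helper names are our own assumptions) and also minimizes $q_{1}-1+c^{2}n/(q_{1}+1)$ over integers to confirm it is at least $2c\sqrt{n}-2$. Passing these checks is evidence, not a proof.

```python
import math
import random


def random_tree_parents(n, rng):
    """parent[v] is a uniformly random earlier vertex; vertex 0 plays the root r."""
    return [None] + [rng.randrange(v) for v in range(1, n)]


def depths(parent):
    depth = [0] * len(parent)
    for v in range(1, len(parent)):
        depth[v] = depth[parent[v]] + 1  # parents precede children
    return depth


def ancestors(parent, v):
    """Anc(v): vertices on the root-to-v path, inclusive."""
    out = set()
    while v is not None:
        out.add(v)
        v = parent[v]
    return out


def hard_function(parent, depth, v_star):
    """f_{v*}(x) = -d(x) if x is an ancestor of v*, else +d(x)."""
    anc = ancestors(parent, v_star)
    return [-depth[x] if x in anc else depth[x] for x in range(len(parent))]


def local_minima(parent, f):
    nbrs = [[] for _ in parent]
    for v in range(1, len(parent)):
        nbrs[v].append(parent[v])
        nbrs[parent[v]].append(v)
    return [v for v in range(len(parent)) if all(f[v] <= f[u] for u in nbrs[v])]


def num_signatures(parent, q1):
    """Number K of distinct round-1 outcomes Anc(v*) & Q1 over all v*."""
    q1 = set(q1)
    return len({frozenset(ancestors(parent, v) & q1) for v in range(len(parent))})


def min_round1_cost(n, c):
    """min over integers q1 >= 0 of q1 - 1 + c^2 n / (q1 + 1)."""
    return min(q1 - 1 + c * c * n / (q1 + 1) for q1 in range(n + 1))
```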
Building on the lower bound for trees in two rounds, we asked the model to generalize the result to any graph $G$ in two rounds. The proposed approach defines an arbitrary spanning tree $T$ of $G$ rooted at some vertex $r$ and a family of functions $\mathcal{F}=\{f_{v}\mid v\in V\}$ parameterized by each vertex $v\in V$ that also depends on $T$. In particular, for every $v\in V$, let $f_{v}:V\to\mathbb{Z}$ be such that for all $x\in V$,
$$f_{v}(x)=\begin{cases}-dist_{T}(r,x)&\text{if }x\preceq_{T}v\\ \phantom{-}dist_{T}(r,x)&\text{otherwise}\end{cases}\tag{28}$$
where $dist_{T}(r,x)$ represents the distance between $r$ and $x$ in the spanning tree $T$, and $x\preceq_{T}v$ means that $x$ lies on the path from $r$ to $v$ in $T$.
Moreover, we denote by $\mathcal{D}$ the uniform distribution over $\mathcal{F}$. The distribution $\mathcal{D}$ is the one used to obtain the lower bound for any graph and number $t$ of rounds. Unlike the two-round case where the round-1 queries are fixed, the $t\geq 3$ round case requires analyzing adaptive histories. To handle the adaptivity, the model suggested the notion of a candidate set. Suppose $H$ is a history reachable after $i$ rounds. The candidate set, denoted $\mathcal{C}(H)$, represents the vertices of $G$ that could still be local minima given the history $H$.
(a) Graph with $n=16$ vertices.
(b) Spanning tree with the candidate sets attainable at the end of round 1.
Figure 7: The left figure (a) shows a graph $G$; the input function is drawn from $\mathcal{F}$. The right figure (b) shows a spanning tree of $G$ rooted at vertex $1$. The queries $Q_{1}=\{2,3,10\}$ submitted in round 1 are circled in red. The partition into candidate sets attainable at the end of round 1 (depending on the answers to the queries) is illustrated using yellow frames.
The model observed several properties of the candidate sets, which we consolidated into a key lemma stating that at the end of each round $i$:
• (a) each attainable candidate set is a subtree of the original spanning tree $T$ of $G$; and
• (b) the candidate sets partition $V$, and there is a bijection between the set of histories reachable at the end of round $i$ and the set of candidate sets attainable at the end of round $i$.
While the model’s intuition was correct, obtaining a rigorous proof for part (b) was more challenging. It initially treated the result as self-evident; when pressed for details, it produced a circular argument (very briefly, to prove the bijection, it assumed that the candidate sets already form a partition of the set of vertices). We tried guiding it toward an inductive proof, and later toward a direct non-inductive one, but the circular argument persisted in both. The turning point came when we gave it a hint: assume by contradiction that a vertex lies in two candidate sets, and use part (a) to analyze the roots of the trees underlying the two sets. With this hint, the model generated a correct proof that is now found in the paper.
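Both parts of the candidate-set lemma can be observed for round 1 on small random graphs. The sketch below builds a BFS spanning tree $T$ (any spanning tree would do; BFS is our choice), defines $f_{v}$ by Equation (28), groups vertices by the answers they produce on a fixed query set $Q_{1}$, and checks that the groups partition $V$ and that each group induces a connected subtree of $T$. It also checks that $f_{v}$ has $v$ as its unique local minimum in $G$ itself, including across non-tree edges. The graph generator and helper names are hypothetical.

```python
import random
from collections import deque


def random_connected_graph(n, extra_edges, rng):
    """Random tree on 0..n-1 plus `extra_edges` random chords (hypothetical generator)."""
    adj = {v: set() for v in range(n)}
    for v in range(1, n):
        u = rng.randrange(v)
        adj[u].add(v)
        adj[v].add(u)
    for _ in range(extra_edges):
        a, b = rng.sample(range(n), 2)
        adj[a].add(b)
        adj[b].add(a)
    return adj


def bfs_tree(adj, root=0):
    """Parent pointers and depths dist_T(r, x) of a BFS spanning tree T rooted at r."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w] = u
                depth[w] = depth[u] + 1
                queue.append(w)
    return parent, depth


def tree_ancestors(parent, v):
    """Vertices x with x preceq_T v, i.e. the r-v path of T, inclusive."""
    out = set()
    while v is not None:
        out.add(v)
        v = parent[v]
    return out


def f_v(parent, depth, v):
    """Equation (28)."""
    anc = tree_ancestors(parent, v)
    return {x: -depth[x] if x in anc else depth[x] for x in parent}


def candidate_sets(parent, depth, q1):
    """Round-1 candidate sets: group each v by the answers f_v gives on Q1."""
    groups = {}
    for v in parent:
        fv = f_v(parent, depth, v)
        groups.setdefault(tuple(fv[q] for q in q1), set()).add(v)
    return list(groups.values())


def is_subtree(parent, nodes):
    """A non-empty vertex set induces a connected subtree of T iff exactly
    one member has its parent outside the set (that member is its top)."""
    return sum(1 for v in nodes if parent[v] not in nodes) == 1
```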
The remainder of the $t$-round lower bound argument was correct. Thus we obtained:
Theorem 7.13 (Theorem 2 in [13]).
Let $G=(V,E)$ be a connected undirected graph with $n$ vertices. The randomized query complexity of finding a local minimum on $G$ in $t\in\mathbb{N}^{*}$ rounds is $\Omega(tn^{1/t}-t)$.
We also asked the model to prove a linear lower bound for local search in two rounds on constant-degree expanders. Instead, it responded by proposing a Parallel Steepest Descent with a Warm Start algorithm. This algorithm has a query complexity slightly better than linear even for two rounds on any graph with constant maximum degree: $O(n/\log n)$. The bound given by this algorithm is stated next for any number of rounds.
Proposition 7.14 (Proposition 1 in [13]).
Let $G=(V,E)$ be a graph with $n$ vertices and maximum degree $\Delta$. The randomized query complexity of finding a local minimum in $t\geq 2$ rounds is $O(\sqrt{n}+t)$ when $\Delta\leq 2$ and $O\bigl(\frac{n}{t\cdot\log_{\Delta}n}+t\Delta^{2}\sqrt{n}\bigr)$ when $\Delta\geq 3$.
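The algorithm behind Proposition 7.14 is not reproduced in this section. For background only, the following is plain sequential steepest descent on a graph, the fully adaptive primitive its name refers to: it moves to the lowest-valued neighbor until no neighbor is lower, spending one round per step. It is a hedged illustration of the descent step, not the parallel warm-start algorithm of [13].

```python
def steepest_descent(adj, f, start):
    """Sequential steepest descent; returns (local minimum, distinct queries made)."""
    cache = {}

    def query(v):
        if v not in cache:
            cache[v] = f(v)
        return cache[v]

    v = start
    while True:
        best = min(adj[v], key=query, default=None)  # lowest-valued neighbor
        if best is None or query(best) >= query(v):
            return v, len(cache)  # no neighbor is strictly lower: local minimum
        v = best
```

Each step depends on the previous answer, so this procedure can use as many rounds as steps; the round-limited setting studied here is what rules out running it directly.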
Concluding Remarks.
Our experiment demonstrates that the model can expedite theoretical research when treated as a research assistant. For instance, when guided toward separation numbers, the model developed the divide-and-conquer algorithm and its proof, leveraging the “Shattering Lemma” as a key component for the analysis. This process required vigilance; we had to rigorously audit the model’s claims, as illustrated by a circular argument we uncovered in one of the lower bound lemmas. Despite the need for verification, the interaction was very fruitful, yielding a lower bound construction with a surprisingly elegant recursive structure.
Acknowledgements.
Simina Brânzei was supported by US National Science Foundation grant CCF-2238372. Ioannis Panageas was supported by US National Science Foundation grant CCF-2454115.
7.3 Robust Coresets
Written by Yi Li, David P. Woodruff, and Xuan Wu.
Problem Context
In the standard coreset framework, one is given a finite weighted collection of functions $\mathcal{F}=\{(f,w_{f})\}$, where each function $f:\mathbb{R}^{d}\to\mathbb{R}_{\geq 0}$ is associated with a weight $w_{f}\geq 0$. The loss function for $\mathcal{F}$ is defined as $\mathcal{L}(\mathcal{F};x)=\sum_{f\in\mathcal{F}}w_{f}f(x)$. An $\varepsilon$-coreset is a weighted subset $\tilde{\mathcal{F}}\subset\mathcal{F}$ such that
$$(1-\varepsilon)\mathcal{L}(\mathcal{F};x)\leq\mathcal{L}(\tilde{\mathcal{F}};x)\leq(1+\varepsilon)\mathcal{L}(\mathcal{F};x)$$
holds simultaneously for all possible values of $x$. Classical problems such as $\ell_{p}$-regression and clustering naturally fit into this framework. A typical approach to construct a coreset is sensitivity sampling, where each function $f\in\mathcal{F}$ is sampled with probability proportional to its sensitivity, defined as
$$\sigma_{\mathcal{F}}(f)=\sup_{x}\frac{w_{f}f(x)}{\mathcal{L}(\mathcal{F};x)}.$$
For example, in $\ell_{2}$-subspace embedding and regression, these sensitivities coincide with leverage scores. In the more general $\ell_{p}$-subspace embedding and regression, exact sensitivities are harder to compute and Lewis weights are commonly used as an effective proxy. Similarly, in clustering, recent work [8] has shown that sensitivity sampling can achieve near-optimal coreset sizes. We also define the sensitivity of the function set $\mathcal{F}$ as $\sigma(\mathcal{F})=\sum_{f\in\mathcal{F}}\sigma_{\mathcal{F}}(f)$.
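For $\ell_{2}$-regression-style losses $f_{i}(x)=(a_{i}^{\top}x)^{2}$ with unit weights, the sensitivity of row $i$ is its leverage score, which can be read off a thin QR factorization; when $A$ has full column rank the scores sum to $d$, so the total sensitivity is $d$. The NumPy sketch below computes these and draws one common form of sensitivity sample (independent inclusion with probability $\min(1,k\sigma_{i}/\sum_{j}\sigma_{j})$, reweighted by the inverse probability). The sampling variant and the parameter $k$ are our assumptions, not a construction from this paper.

```python
import numpy as np


def l2_sensitivities(A: np.ndarray) -> np.ndarray:
    """Leverage scores of A: sup_x (a_i . x)^2 / ||Ax||^2 for full-column-rank A."""
    Q, _ = np.linalg.qr(A, mode="reduced")  # orthonormal basis of the column space
    return np.sum(Q * Q, axis=1)


def sensitivity_sample(A: np.ndarray, k: float, rng: np.random.Generator):
    """Keep row i independently w.p. p_i = min(1, k * sigma_i / sum(sigma)); weight 1 / p_i."""
    sigma = l2_sensitivities(A)
    p = np.minimum(1.0, k * sigma / sigma.sum())
    keep = rng.random(p.shape[0]) < p
    return np.flatnonzero(keep), 1.0 / p[keep]
```

By Cauchy-Schwarz, $(a_{i}^{\top}x)^{2}/\|Ax\|^{2}$ never exceeds the leverage score of row $i$, which is why the scores serve as sensitivities here.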
Robust coresets can be studied within the same framework by replacing the standard loss with a trimmed loss. Specifically, for an integer $m\geq 0$, the trimmed loss is defined as
$$\mathcal{L}^{(m)}(\mathcal{F};x)=\min_{\substack{\mathcal{F}'\subset\mathcal{F}\\ |\mathcal{F}\setminus\mathcal{F}'|\leq m}}\sum_{(f,w_{f})\in\mathcal{F}'}w_{f}f(x),$$
which discards (at most) the $m$ largest contributions to the loss. Correspondingly, an $(\varepsilon,m)$-robust coreset is a weighted subset $\tilde{\mathcal{F}}\subset\mathcal{F}$ such that
$$(1-\varepsilon)\mathcal{L}^{(m)}(\mathcal{F};x)\leq\mathcal{L}^{(m)}(\tilde{\mathcal{F}};x)\leq(1+\varepsilon)\mathcal{L}^{(m)}(\mathcal{F};x)$$
for all $x$. When $m=0$, this definition reduces to the standard $\varepsilon$-coreset. Prior work [59] has shown that if the total sensitivity of $\mathcal{F}$ is bounded by $T$ (i.e., $\sigma(\mathcal{F}')\leq T$ for every non-empty subset $\mathcal{F}'\subset\mathcal{F}$), then there exists an $(\varepsilon,m)$-robust coreset of size $O(Tm/\varepsilon\cdot\log(Tm/\varepsilon))+Q$, where $Q$ denotes the size of a standard $\varepsilon$-coreset.
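Since every contribution $w_{f}f(x)$ is nonnegative, the minimum in the definition of $\mathcal{L}^{(m)}$ is attained by discarding the $m$ largest contributions (or all of them if there are fewer than $m$). A minimal Python sketch at a fixed $x$, with the contributions passed in as a list (our interface choice):

```python
from typing import Sequence


def trimmed_loss(contributions: Sequence[float], m: int) -> float:
    """L^(m) at a fixed x, given the nonnegative terms w_f * f(x): drop the m largest."""
    if m < 0:
        raise ValueError("m must be nonnegative")
    kept = sorted(contributions)[: max(len(contributions) - m, 0)]
    return float(sum(kept))
```

With $m=0$ this is the ordinary loss $\mathcal{L}(\mathcal{F};x)$, matching the remark that the robust definition reduces to the standard one.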
AI Contribution
The AI provided a sharper analysis that eliminates the logarithmic factor, improving the robust coreset size bound to $O(Tm/\varepsilon)+Q$, which is known to be tight.
Technical Details
We first review the algorithms from [59] and the associated guarantees before presenting the tighter analysis produced by the AI.
Algorithm 1 $\mathrm{Uniform}(A,\varepsilon,m)$
Input: A set $A$ of functions, parameters $\varepsilon$ and $m$
Output: A subset $D\subseteq A$
1: $B\leftarrow\emptyset$
2: for each $f\in A$, with probability $\frac{1}{m}$, add $f$ to $B$
3: for each $f\in B$, compute the sensitivity $\sigma_{B}(f)$
4: $D\leftarrow\{f\in B:\sigma_{B}(f)\geq\frac{\varepsilon}{4}\}$
5: return $D$
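Algorithm 1 needs the sensitivities $\sigma_{B}(f)$ of the subsample, which are hard to compute in general. As a hypothetical instantiation, the NumPy sketch below takes $f_{i}(x)=(a_{i}^{\top}x)^{2}$ with unit weights, for which $\sigma_{B}$ is the leverage score within $B$ (computed with a pseudo-inverse in case $B$ is rank-deficient). For this family the total sensitivity of any subset equals its rank, so $T=d$ and the output always satisfies $|D|\leq 4d/\varepsilon$.

```python
import numpy as np


def leverage_scores(B: np.ndarray) -> np.ndarray:
    """sigma_B(f_i) for f_i(x) = (b_i . x)^2, computed as b_i^T (B^T B)^+ b_i."""
    gram_pinv = np.linalg.pinv(B.T @ B)
    return np.einsum("ij,jk,ik->i", B, gram_pinv, B)


def uniform(A: np.ndarray, eps: float, m: int, rng: np.random.Generator) -> np.ndarray:
    """Algorithm 1 for the rows of A; returns the indices (into A) of D."""
    # Subsample B: keep each row independently with probability 1/m.
    b_idx = np.flatnonzero(rng.random(A.shape[0]) < 1.0 / m)
    if b_idx.size == 0:
        return b_idx
    # Sensitivities within B; keep the rows whose sensitivity is at least eps/4.
    sigma = leverage_scores(A[b_idx])
    return b_idx[sigma >= eps / 4]
```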
Algorithm 2 $\mathrm{Refine}(D,\varepsilon,m)$
Input: A coreset $D$, parameters $\varepsilon$ and $m$
Output: A refined subset $\tilde{D}$ adapted for the robust optimization problem
1: $\tilde{D}\leftarrow\emptyset$
2: for $(f,\omega_{f})\in D$ do
3:   compute the sensitivity $\sigma_{D}(f)$
4:   $n_{f}\leftarrow\lceil\frac{m}{\varepsilon}\cdot\sigma_{D}(f)\rceil$
5:   add $n_{f}$ copies of $(f,\frac{\omega_{f}}{n_{f}})$ to $\tilde{D}$
6: end for
7: return $\tilde{D}$
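A Python sketch of Algorithm 2, treating functions as opaque keys in a dict `{f: w_f}` and taking $\sigma_{D}$ as a caller-supplied function (both are assumptions; computing sensitivities is problem-specific). Splitting $(f,\omega_{f})$ into $n_{f}$ copies of weight $\omega_{f}/n_{f}$ leaves the total weight of each function, and hence the untrimmed loss, unchanged.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple


def refine(D: Dict[Hashable, float], eps: float, m: int,
           sensitivity: Callable[[Hashable], float]) -> List[Tuple[Hashable, float]]:
    """Replace each (f, w_f) by n_f = ceil(m/eps * sigma_D(f)) copies of (f, w_f / n_f)."""
    refined = []
    for f, w in D.items():
        n_f = math.ceil(m / eps * sensitivity(f))
        if n_f <= 0:
            # Not covered by Algorithm 2 as stated: zero sensitivity would give n_f = 0.
            raise ValueError(f"sensitivity of {f!r} must be positive")
        refined.extend([(f, w / n_f)] * n_f)
    return refined
```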
Algorithm 3 $\mathrm{Coreset}(A,\varepsilon,m)$
Input: A set $A$ of functions, parameters $\varepsilon$ and $m$, and an algorithm $\mathrm{Vanilla}(A)$ to construct an $\varepsilon$-coreset for $A$
Output: An $(\varepsilon,m)$-robust coreset for $A$
1: $S\leftarrow\emptyset$
2: $R\leftarrow\Theta(m\log\frac{Tm}{\varepsilon})$
3: for $i=1,2,\cdots,R$ do
4:   $D\leftarrow\mathrm{Uniform}(A,\varepsilon,m)$
5:   $S\leftarrow S\cup D$
6: end for
7: $V\leftarrow\mathrm{Vanilla}(A\setminus S)$
8: $\tilde{S}\leftarrow\{(f,1):f\in S\}$
9: return $\tilde{S}\cup\mathrm{Refine}(V,\varepsilon,m)$
Algorithm 4 $\mathrm{ModifiedCoreset}(A,\varepsilon,m)$
Input: A set $A$ of functions, parameters $\varepsilon$ and $m$, and an algorithm $\mathrm{Vanilla}(A)$ to construct an $\varepsilon$-coreset for $A$
Output: An $(\varepsilon,m)$-robust coreset for $A$
1: $c_{f}\leftarrow 0$ for every $f\in A$
2: $R\leftarrow 40m\ln(2000Tm/\varepsilon)$
3: for $i=1,2,\cdots,R$ do
4:   $D_{i}\leftarrow\mathrm{Uniform}(A,\varepsilon,m)$
5:   for $f\in D_{i}$ do
6:     $c_{f}\leftarrow c_{f}+1$
7:   end for
8: end for
9: $S\leftarrow\{f\in A:c_{f}\geq\frac{R}{10m}\}$
10: $V\leftarrow\mathrm{Vanilla}(A\setminus S)$
11: $\tilde{S}\leftarrow\{(f,1):f\in S\}$
12: return $\tilde{S}\cup\mathrm{Refine}(V,\varepsilon,m)$
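Algorithms 3 and 4 differ in how $S$ is formed: a union of all outputs of $\mathrm{Uniform}$ versus a frequency threshold. The Python sketch below contrasts the two on a toy stand-in for $\mathrm{Uniform}$. The stand-in `toy_uniform` is entirely hypothetical: it returns designated “contributing” items with probability $1/(5m)$ (the lower bound of Lemma 7.15 below) and all other items with an arbitrary smaller probability $1/(100m)$. With $R=\lceil 40m\ln(2000Tm/\varepsilon)\rceil$ rounds, the union tends to collect many rarely returned items in this toy, while the threshold $R/(10m)$ keeps the frequently returned ones.

```python
import math
import random
from collections import Counter


def union_selection(uniform_round, R):
    """How Algorithm 3 forms S: the union of the outputs of R independent calls."""
    S = set()
    for _ in range(R):
        S |= set(uniform_round())
    return S


def frequency_selection(uniform_round, R, m):
    """How Algorithm 4 forms S: keep f if it is returned in at least R / (10 m) calls."""
    counts = Counter()
    for _ in range(R):
        counts.update(set(uniform_round()))
    return {f for f, c in counts.items() if c >= R / (10 * m)}


def toy_uniform(contributing, others, m, rng):
    """Hypothetical stand-in for Uniform(A, eps, m); probabilities are illustrative only."""
    def run():
        hits = [f for f in contributing if rng.random() < 1 / (5 * m)]
        return hits + [f for f in others if rng.random() < 1 / (100 * m)]
    return run
```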
The algorithms from [59] are reproduced in Algorithms 1 to 3, where Algorithm 3 outputs the final robust coreset. A function $f\in A$ is called contributing if there exists $x\in\mathbb{R}^{d}$ such that $f(x)\geq\frac{\varepsilon}{m}\cdot\mathcal{L}^{(m)}(A;x)$. The following guarantee of Algorithm 1 was established in Lemma 4.3 of [59].
Lemma 7.15.
Assume that $f$ is contributing. Then, with probability at least $\frac{1}{5m}$, the set returned by $\mathrm{Uniform}(A,\varepsilon,m)$ contains $f$.
It was then argued in [59] that the number of contributing functions in $A$ is $O(\frac{Tm}{\varepsilon}\cdot\log\frac{Tm}{\varepsilon})$. Consequently, Algorithm 3 executes $\Theta(m\log(Tm/\varepsilon))$ rounds of Algorithm 1 in order to capture all contributing functions with high probability, thereby forming the basis of the robust coreset that can handle outliers.
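The logarithmic round count can be illustrated with a back-of-the-envelope calculation. It assumes the following:

- a per-round capture probability of at least $1/(5m)$ (Lemma 7.15);
- independence across rounds;
- a union bound over $N$ contributing functions.

When $N$ is polynomial in $Tm/\varepsilon$, the resulting $R$ is $\Theta(m\log(Tm/\varepsilon))$. The function name and the target failure probability `delta` are illustrative choices, not taken from [59].

```python
import math


def rounds_for_union_bound(m: int, n_contributing: int, delta: float) -> int:
    """Smallest integer R with n_contributing * exp(-R / (5m)) <= delta.

    A fixed contributing function is missed in one round with probability
    at most 1 - 1/(5m), hence in all R independent rounds with probability
    at most (1 - 1/(5m))**R <= exp(-R / (5m)).
    """
    return math.ceil(5 * m * math.log(n_contributing / delta))


m, N, delta = 3, 10_000, 0.01
R = rounds_for_union_bound(m, N, delta)
miss_all_rounds = (1 - 1 / (5 * m)) ** R  # per-function miss probability bound
union_bound = N * miss_all_rounds          # bound on Pr(some contributing f missed)
```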
We now describe our interaction with the AI. We first entered the following prompt.
Prompt 1.
Can you resolve the logarithmic factor gap in the coreset construction in the attached paper? Feel free to give both a tighter analysis as well as to change the algorithmic framework in the paper in order to achieve this improved bound.
Gemini.
Yes, the logarithmic factor gap in the coreset construction size can be resolved. The paper presents a coreset of size $O(K\log K)+Q$, where $K=Tm/\varepsilon$. We demonstrate that this can be improved to $O(K)+Q$ in expectation and with constant probability.
This improvement is achieved in two steps: first, by providing a tighter analysis of the number of contributing functions, and second, by modifying Algorithm 3 to filter out non-contributing functions more effectively using frequency estimation.
…
The AI produces the following bound on the number of contributing functions, which already improves the argument in [ 59 ] .
Lemma 7.16.
The number of contributing functions in $A$ is at most $\frac{20Tm}{\varepsilon}$.
Proof.
Let $C$ be the set of contributing functions and $N=|C|$. Let $D$ be the output of a single execution of Algorithm 1. Algorithm 1 returns $D=\{f\in B:\sigma_{B}(f)\geq\varepsilon/4\}$. By the assumption of strong total sensitivity $T$, we have $\sigma_{B}\leq T$. Therefore, $|D|\cdot(\varepsilon/4)\leq\sum_{f\in D}\sigma_{B}(f)\leq\sigma_{B}\leq T$. This implies $|D|\leq 4T/\varepsilon$ deterministically. Let $W=4T/\varepsilon$. Consequently, the expected size of $D$ is bounded: $\mathbb{E}[|D|]\leq W$.
Let $p_{f}=\Pr(f\in D)$. By Lemma 7.15, if $f\in C$, then $p_{f}\geq 1/(5m)$. By linearity of expectation:
$$\mathbb{E}[|D|]=\sum_{f\in A}p_{f}\geq\sum_{f\in C}p_{f}\geq N\cdot\frac{1}{5m}.$$
Combining the inequalities:
$$\frac{N}{5m}\leq W\implies N\leq 5mW=5m\cdot\frac{4T}{\varepsilon}=\frac{20Tm}{\varepsilon}.$$ ∎
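The proof rests on two counting facts: the deterministic cap $|D|\leq 4T/\varepsilon$ and the arithmetic $N\leq 5mW=20Tm/\varepsilon$. Both can be sanity-checked numerically. The sketch assumes only that the sensitivities $\sigma_B(f)$ are nonnegative and sum to at most $T$. The random sensitivities are synthetic and not drawn from any actual function class. This is a check of the averaging step, not a proof.

```python
import random


def heavy_count(sens, eps):
    """Size of D = {f : sigma(f) >= eps/4} for a list of sensitivities."""
    return sum(1 for s in sens if s >= eps / 4)


rng = random.Random(1)
T, eps, m = 3.0, 0.25, 4
W = 4 * T / eps  # = 48, the deterministic cap on |D|

worst = 0
for _ in range(1000):
    raw = [rng.uniform(0.01, 1.0) for _ in range(rng.randint(1, 200))]
    scale = T / sum(raw)          # rescale so the sensitivities sum to T
    sens = [x * scale for x in raw]
    worst = max(worst, heavy_count(sens, eps))

# Lemma 7.16 arithmetic: N/(5m) <= W implies N <= 5 m W = 20 T m / eps.
N_bound = 5 * m * W
```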
The AI then proceeds with a modified version of Algorithm 3 that tracks how often each function is sampled into $S$ and discards the ones with low frequency. It then presents the following theorem.
Theorem 7.17 (Improved Coreset Construction).
Algorithm 3-Modified constructs an $(\varepsilon,m)$-robust coreset for $A$ of expected size $O(K)+Q$. The construction succeeds with high probability (at least $1-O(1/n)$).
The proof is correct, but we omit it here because the theorem only bounds the expected size of the coreset, which is weaker than desired. We then entered the following prompt.
Prompt 2.
It looks correct, but the Theorem only gives the size in expectation. The earlier version had a high-probability bound, which shouldn’t be hard to get. Maybe you can prompt it to produce an upper bound that holds with high probability? Actually we need only probability $\geq 0.99$ instead of $1-1/n$.
Gemini.
We demonstrate that the logarithmic factor in the coreset size bound of Theorem 4.1 in the paper "Robust Sparsification via Sensitivity" can be removed. The original paper presents a coreset of size $O(K\log K)+Q$, where $K=Tm/\varepsilon$. We show that a modification of the algorithm yields a coreset of size $O(K)+Q$ with high probability (at least 0.99).
…
Key Properties of Algorithm 1:
Deterministic Size Bound: The output $D$ always satisfies $|D|\leq W$. This is because $\sum_{f\in D}\sigma_{B}(f)\leq\sigma_{B}\leq T$. Since $\sigma_{B}(f)\geq\varepsilon/4$ for $f\in D$, we have $|D|\cdot(\varepsilon/4)\leq T$, so $|D|\leq W$.
Expected Size Bound: $\mathbb{E}[|D|]\leq W$.
Capture Probability (Lemma 4.3): If $f\in C$, the probability $p_{f}=P(f\in D)$ satisfies $p_{f}\geq 1/(5m)$.
…
The AI then returns a correct version of the theorem.
Theorem 7.18.
The Modified Algorithm constructs an $(\varepsilon,m)$-robust coreset for $A$ of size $O(K)+Q$ with probability at least 0.99.
The proof is correct, though somewhat suboptimal: it uses Markov’s inequality to bound the total number of sampled functions over $R$ rounds, which is unnecessary. This suboptimality stems from a weaker property stated in the AI’s response above. In particular, Point 1 already provides the deterministic bound $|D|\leq W$, which makes the expectation bound in Point 2 unnecessary.
Therefore, we further prompt the AI to improve the argument.
Prompt 3.
This looks correct though it is a bit suboptimal when bounding the size. Specifically, it says …
In the first bullet point, it already obtains that $|D|\leq W$, so the second bullet point is unnecessary. In the subsequent proof of Part 2 of the theorem, it could directly use the deterministic bound $|D|\leq W$ instead of the expectation bound. This will further tighten the analysis a bit.
Gemini.
The results are correct and are reproduced below with minor modifications to fit the present context. The algorithm generated by the AI is presented in Algorithm 4 , with slight simplifications.
Theorem 7.19.
Let $\varepsilon\in(0,\frac{1}{2})$ and let $\mathcal{F}$ be a unit-weight function set with total sensitivity $T$. Suppose that there exists an algorithm that computes a vanilla $\varepsilon$-coreset for $\mathcal{F}$ of size $Q$. Then Algorithm 4 computes an $(\varepsilon,m)$-robust coreset for $\mathcal{F}$ of size $O(\frac{Tm}{\varepsilon})+Q$ with probability at least $0.99$.
Proof.
Size bound. We analyze the size of the set $S$ returned by the algorithm. Let $C_{\mathrm{total}}$ be the total count of all functions captured across all $R$ iterations:
$$C_{\mathrm{total}}=\sum_{i=1}^{R}|D_{i}|.$$
By the deterministic size bound of Algorithm 1, we know $|D_{i}|\leq W:=4T/\varepsilon$ for all $i$. Therefore, $C_{\mathrm{total}}$ is deterministically bounded:
$$C_{\mathrm{total}}\leq\sum_{i=1}^{R}W=RW.$$
Now we relate the size of $S$ to $C_{\mathrm{total}}$. By definition, $f\in S$ if and only if $c_{f}\geq R\theta$ for $\theta=1/(10m)$. Hence
$$C_{\mathrm{total}}=\sum_{f\in A}c_{f}\geq\sum_{f\in S}c_{f}\geq\sum_{f\in S}R\theta=|S|\cdot R\theta.$$
Therefore, we have a deterministic bound on $|S|$:
$$|S|\leq\frac{C_{\mathrm{total}}}{R\theta}\leq\frac{RW}{R\theta}=\frac{W}{\theta}=\frac{40Tm}{\varepsilon}.$$
Thus $|S|\leq 40K$, where $K=Tm/\varepsilon$. The size bound $|S|=O(K)$ holds deterministically.
Correctness. We want to bound the probability that a contributing function $f\in C$ is missed ($f\notin S$). The count $c_{f}$ is a sum of $R$ independent indicators. If $f\in C$, we have $\mathbb{E}[c_{f}]\geq R/(5m)$. If $f\notin S$, then $c_{f}<R\theta=\frac{R}{10m}\leq\frac{1}{2}\mathbb{E}[c_{f}]$. By the Chernoff lower-tail bound at half the mean,
$$\Pr(f\notin S)=\Pr(c_{f}<R\theta)\leq\Pr\left(c_{f}\leq\tfrac{1}{2}\mathbb{E}[c_{f}]\right)\leq\exp\left(-\frac{R/(5m)}{8}\right)=\exp\left(-\frac{40m\ln(2000K)}{40m}\right)=\frac{1}{2000K}.$$
We apply a union bound over $C$. By Lemma 7.16, $|C|\leq 20K$, so
$$\Pr(C\not\subseteq S)=\Pr(\exists f\in C:f\notin S)\leq\sum_{f\in C}\Pr(f\notin S)\leq 20K\cdot\frac{1}{2000K}=\frac{1}{100}=0.01.$$
The algorithm deterministically produces a set $S$ of size $|S|\leq 40K$. With probability at least $1-0.01=0.99$, the algorithm ensures $C\subseteq S$. The remainder of the correctness proof follows exactly as in Theorem 4.1 of [59]. ∎
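The constants in this proof are chosen so that the Chernoff exponent cancels the logarithm in $R$ exactly. The short sketch below checks that arithmetic for a few illustrative values of $m$ and $K$. It treats $R$ as real-valued, as the proof does. Rounding $R$ up to an integer in practice only lowers the failure bound.

```python
import math


def failure_bound(m: int, K: float) -> float:
    """Union-bound failure probability from the proof of Theorem 7.19.

    R = 40 m ln(2000 K); a contributing f has E[c_f] >= R/(5m), and the
    Chernoff lower tail at half the mean gives Pr(f missed) <= exp(-(R/(5m))/8).
    """
    R = 40 * m * math.log(2000 * K)
    per_function = math.exp(-(R / (5 * m)) / 8)  # simplifies to 1/(2000 K)
    return 20 * K * per_function                  # |C| <= 20 K by Lemma 7.16


cases = [(1, 1.0), (5, 37.5), (50, 1e6)]
bounds = [failure_bound(m, K) for m, K in cases]  # each is 0.01 up to rounding
```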
7.4 Submodular Function Maximization in a Stream
Written by Ola Svensson.
Problem context
We consider monotone submodular maximization under a cardinality constraint in the adversarial injection streaming model introduced in the paper “Robust Algorithms under Adversarial Injections” [46]. In this model, the stream consists of an (unknown) set $E_{\mathrm{good}}$ of “genuine” elements arriving in uniformly random order, interleaved with an arbitrary set $E_{\mathrm{noise}}$ of injected elements placed by an adversary. The motivation for this semi-random model is to design algorithms for a beyond-worst-case setting without overfitting to a specific assumption (such as all elements arriving in uniformly random order). The goal is to output a set $S\subseteq E_{\mathrm{good}}\cup E_{\mathrm{noise}}$ of size at most $k$ with value close to
$$\mathsf{OPT}:=\max_{|T|\leq k,\;T\subseteq E_{\mathrm{good}}}f(T),$$
where $f$ is a nonnegative monotone submodular function given by value-oracle access. It is important that $\mathsf{OPT}$ is defined as the optimum over elements of $E_{\mathrm{good}}$ only. Indeed, if $\mathsf{OPT}$ were also allowed to contain elements of $E_{\mathrm{noise}}$, then the model would be equivalent to a fully adversarial one.
The above paper proposes a streaming algorithm for this setting (the tree algorithm for monotone submodular maximization). It achieves an approximation factor of around $0.55$ while storing a number of elements that is independent of $|E_{\mathrm{good}}|$ (but exponential in $k$). Two concrete questions were left open for submodular maximization in this model: (i) whether one can reach the offline-optimal constant $(1-1/e)$, and (ii) whether one can reduce memory to $\mathrm{poly}(k)$ elements.
The tree algorithm and where the analysis bottleneck appears
At a high level, the tree algorithm maintains a rooted tree of depth $k$ whose root-to-leaf paths represent candidate solutions of size at most $k$. When a new element $e$ arrives, the algorithm considers attaching $e$ beneath existing nodes based on its marginal contribution. The tree is kept from exploding by merging equal-marginal states: within each level, only one representative for each marginal-gain value is retained. This keeps the algorithm stream-length independent, but in the worst case it still requires exponentially many stored elements as a function of $k$.
The original approximation analysis proceeds by tracking a carefully chosen leaf as the $k$ (unknown) optimal elements from $E_{\mathrm{good}}$ appear in the stream. A central technical device is a threshold parameter $t$ used in a case split comparing (i) the best “available” marginal gain since the last optimal element appeared, versus (ii) the marginal gain of the next unseen optimal element. This case split yields a recurrence $R(k,h)$ that lower bounds the fraction of $\mathsf{OPT}$ captured after “progress” through $h$ optimal elements with budget $k$, with the final guarantee given by $R(k,k)$. In the paper, $t$ is chosen globally (independent of the state $(k,h)$); numerically, setting $t\approx 0.8$ yields $R(k,k)\approx 0.55067$.
A key insight by Gemini, which ultimately enabled our improvement, is that $t$ is not an algorithm parameter. It only appears in the analysis, meaning we are free to choose a different threshold in different states as long as the inequalities used in the proof remain valid.
AI Contribution. The AI identified a latent degree of freedom in the existing algorithmic analysis. By replacing a global threshold with a state-dependent threshold, the model autonomously derived and proved an optimized recurrence relation, strictly improving the algorithm’s approximation ratio from $\approx 0.55$ to $2-\sqrt{2}$.
We used Gemini to explore whether the analysis could be sharpened.
Gemini did not produce a new algorithm achieving $(1-1/e)$, nor did it find a way to reduce the memory requirement to $\mathrm{poly}(k)$ elements in this adversarial injection setting. However, it did identify a clean way to improve the existing analysis of the tree algorithm: replace the single global threshold $t$ by a state-dependent threshold $t_{k,h}$ in the recurrence.
Concretely, Gemini suggested using
$$t_{k,h}:=\frac{k}{\sqrt{k^{2}+h^{2}}},$$
and first predicted and then formally proved that the corresponding optimized recurrence achieves the exact constant
$$R(k,k)=2-\sqrt{2}\approx 0.585786,$$
thereby improving the analyzed approximation ratio of the same underlying algorithm from $\approx 0.55067$ to $\approx 0.58579$.
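The recurrence behind this constant is stated in full in Prompt 3 below and in Definition 7.22. It can be evaluated numerically with memoization. The sketch compares two choices:

- the state-dependent threshold $t_{k,h}=k/\sqrt{k^2+h^2}$;
- a single global threshold $t=0.8$.

We have not checked that the constant-threshold run reproduces exactly the recurrence of [46], so treat its values as indicative only. For $t=0.8$, however, $R\leq C=1/1.8\approx 0.5556$ always holds, which is below $2-\sqrt{2}$. The helper name `make_R` is our own.

```python
import math
from functools import lru_cache
from typing import Callable


def make_R(threshold: Callable[[int, int], float]) -> Callable[[int, int], float]:
    """Memoized R(k, h) of the generalized recurrence for a threshold T(k, h)."""

    @lru_cache(maxsize=None)
    def R(k: int, h: int) -> float:
        if h == 0:
            return 0.0                                    # base case R(k, 0) = 0
        t = threshold(k, h)
        A = t / k + (1 - t / k) * R(k, h - 1)
        B = 1 / k + (1 - (1 + t) / k) * R(k - 1, h - 1)
        C = 1 / (1 + t)
        return min(A, B, C)

    return R


state_dependent = make_R(lambda k, h: k / math.sqrt(k * k + h * h))
global_08 = make_R(lambda k, h: 0.8)  # one global threshold for every state

table = [(k, global_08(k, k), state_dependent(k, k)) for k in (1, 2, 5, 20)]
```

On the diagonal, $t_{k,k}=1/\sqrt{2}$ gives $C_{k,k}=2-\sqrt{2}$ for every $k$. The numerics therefore mainly confirm that the $A$ and $B$ branches never drop below this cap.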
The remainder of this testimonial highlights the four prompts that led from “idea” to a complete, algorithm-level guarantee.
Prompt 1: exploratory questions, and the key analysis insight
Prompt 1.
Please investigate if you can improve the submodular part of the attached paper. Specifically:
• Can we get $(1-1/e)$ for submodular maximization?
• Can we only store $\mathrm{poly}(k)$ elements instead of exponentially many in $k$?
Be mathematically rigorous.
Gemini (summary).
The model did not produce a new algorithm achieving $(1-1/e)$, and it did not resolve how to store only $\mathrm{poly}(k)$ elements.
However, it identified that the analysis recurrence in the paper can be strengthened by letting the threshold depend on the state: instead of a global $t$, introduce $t_{k,h}$.
It further hypothesized that the best constant achievable within this recurrence framework is $2-\sqrt{2}$, and that a natural candidate is a closed-form threshold of the form $t_{k,h}=t(h/k)$.
Figure 8: The first prompt, where Gemini identified the direction for improvement and the improved guarantee.
In the first interaction with Gemini (Figure 8), we gave a fairly general prompt with two open questions. Even though the response did not resolve the two open algorithmic questions, it pinpointed a previously unexplored degree of freedom in the analysis and proposed (correctly) that exploiting it improves the approximation factor.
Prompt 2: a rigorous (but asymptotic) derivation via continuous limits
Prompt 2.
Can you rigorously improve it to obtain a $2-\sqrt{2}$ approximation, as suggested?
Gemini (summary).
Gemini produced a rigorous argument in a continuous/asymptotic regime (as $k\to\infty$) by recasting the recurrence optimization as a minimax problem that leads to differential inequalities/ODE-type conditions.
While this was not yet the discrete proof we needed, it strongly indicated the correct functional form of the optimal thresholds, and guided us to the explicit choice
$$t_{k,h}=\frac{k}{\sqrt{k^{2}+h^{2}}}.$$
This directly set up Prompt 3, where we asked for a fully discrete proof.
Figure 9: In the second prompt, we asked for a rigorous proof of the improved guarantee.
In the second prompt (Figure 9), we asked for a rigorous proof, and Gemini responded with an argument in the asymptotic regime. Even though the ODE argument was “not exactly what we asked for”, it served as a highly effective derivation tool: it revealed the right closed-form threshold and the right constant before we invested time in a discrete induction.
Prompt 3: a complete discrete proof of the optimized recurrence
We next asked Gemini to prove the optimized recurrence formally, with the thresholds fixed to the candidate form; see Figure 10 for the third prompt.
Prompt 3 (condensed).
Define $t_{k,h}=\frac{k}{\sqrt{k^{2}+h^{2}}}$ and the recurrence
$$R(k,h)=\min\{A_{k,h},B_{k,h},C_{k,h}\}$$
with
$$
\begin{aligned}
A_{k,h}&:=\frac{t_{k,h}}{k}+\Bigl(1-\frac{t_{k,h}}{k}\Bigr)R(k,h-1),\\
B_{k,h}&:=\frac{1}{k}+\Bigl(1-\frac{1+t_{k,h}}{k}\Bigr)R(k-1,h-1),\\
C_{k,h}&:=\frac{1}{1+t_{k,h}},
\end{aligned}
$$
and base case $R(k,0)=0$. Prove that $R(k,k)=2-\sqrt{2}$ for all integers $k\geq 1$.
Gemini (summary).
Gemini proposed an explicit closed-form candidate $P(k,h)$ (as a function of $h/k$) and showed that it satisfies the recurrence by induction, yielding the exact identity $R(k,k)=2-\sqrt{2}$ for all $k\geq 1$.
Figure 10: In our third prompt, we asked for and received a complete proof of the discrete recurrence.
At this point, Gemini had delivered an almost complete proof of the mathematical core: the optimal solution of the improved recurrence. This was the key missing technical ingredient needed to upgrade the approximation factor.
Prompt 4: connecting the recurrence back to the tree algorithm (and a hidden monotonicity issue)
Finally, we asked Gemini to connect the recurrence to the original algorithm analysis (see Figure 11). This step is subtle because the original paper uses a global $t$ in an inequality of the form $R(k-1,h-1)\leq 1/(1+t)$, whereas our thresholds vary with $(k,h)$.
Prompt 4 (condensed).
Please relate the recurrence with state-dependent thresholds $t=t_{k,h}$ to the analysis of the tree algorithm. In particular, in the analogue of Lemma 4 from the paper, the proof used $R(k-1,h-1)\leq 1/(1+t)$ for a global $t$. Now $t$ depends on $(k,h)$, so this step requires care. Please write step-by-step how $R(k,h)$ bounds the approximation guarantee, and identify what additional property is needed.
Gemini (summary).
Gemini provided a step-by-step mapping from the algorithm’s case analysis to the three terms $A_{k,h}$, $B_{k,h}$, and $C_{k,h}$, showing how $R(k,h)$ lower bounds the value of a carefully chosen leaf as the optimal elements appear.
Crucially, it identified that the “global-$t$” inequality is replaced by a requirement of the form
$$R(k-1,h-1)\leq\frac{1}{1+t_{k,h}},$$
which does not automatically hold for arbitrary state-dependent thresholds. Gemini then pinpointed a sufficient condition, namely a monotonicity relation along the diagonal:
$$t_{k,h}\leq t_{k-1,h-1}.$$
This condition implies $\frac{1}{1+t_{k-1,h-1}}\leq\frac{1}{1+t_{k,h}}$ and allows the induction to go through. Gemini also proved this monotonicity for the chosen optimal thresholds $t_{k,h}=\frac{k}{\sqrt{k^{2}+h^{2}}}$.
Figure 11: In the fourth prompt, we asked Gemini to correctly relate the improved analysis of the recurrence to the improved analysis of the algorithm.
This last step was important for correctness: it is easy to “solve” an abstract recurrence while accidentally breaking an inequality needed to justify the recurrence from the algorithm. Gemini not only identified the missing monotonicity condition but also proved it for the optimal threshold schedule, thereby completing the logical chain from the recurrence back to the streaming algorithm.
Outcome and takeaway
The outcome of this interaction was a strictly improved approximation guarantee for the same tree algorithm in the adversarial injection streaming model:
$$\text{approximation ratio improved from }\approx 0.55067\text{ to }2-\sqrt{2}\approx 0.58579,$$
achieved by refining the analysis (via a sharper, state-dependent choice of thresholds).
From a “how to use Gemini” perspective, this case study highlights a valuable contribution. Even when a model does not solve the headline open problems (here, $(1-1/e)$ or $\mathrm{poly}(k)$ memory), it can still make a decisive contribution by (i) identifying a latent degree of freedom in an existing proof, (ii) guessing the right closed form for an optimal constant or parameter schedule, and (iii) producing formal proofs with minimal guidance.
The Formal Improved Analysis by Gemini
The following is the complete proof that was automatically generated by Gemini. We have kept the writing exactly as automatically produced.
Preliminaries and Notation
Submodularity.
A function $f$ is submodular if for all $S\subseteq T\subset E$ and $e\in E\setminus T$, the marginal gain $f(e|S):=f(S\cup\{e\})-f(S)$ satisfies $f(e|S)\geq f(e|T)$. It is monotone if $f(S)\leq f(T)$ for $S\subseteq T$.
Lemma 7.20 (Average Marginal Bound).
Let $O$ be a set of size $k$ and $S$ be any partial solution. For a monotone submodular function $f$, the expected marginal gain of a random element $o\in O$ satisfies:
$$\mathbb{E}_{o\sim O}[f(o\mid S)]\geq\frac{f(O\cup S)-f(S)}{k}$$
If $S=\emptyset$ and $O=E_{opt}$, then $\mathbb{E}_{o\sim E_{opt}}[f(o)]\geq\frac{OPT}{k}$.
Proof.
This follows from the definition of submodularity. The sum of the marginals of elements in $O$ added to $S$ is at least the total gain $f(O\cup S)-f(S)$. Taking the average yields the result. ∎
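As an illustrative sanity check (not part of the generated proof), the inequality of Lemma 7.20 can be tested on random coverage functions. Multiplied through by $k=|O|$, it reads $\sum_{o\in O}f(o\mid S)\geq f(O\cup S)-f(S)$. Coverage functions, $f(S)=|\bigcup_{X\in S}X|$, are a standard example of monotone submodular functions. The universe size, set sizes, and seed below are arbitrary choices, and passing trials are evidence, not a proof.

```python
import random
from typing import FrozenSet, List


def coverage(sets: List[FrozenSet[int]]) -> int:
    """Coverage function f(S) = |union of the sets in S|, monotone submodular."""
    return len(frozenset().union(*sets))


def check_average_marginal(trials: int, seed: int = 7) -> int:
    """Count random instances satisfying sum_o f(o|S) >= f(O u S) - f(S)."""
    rng = random.Random(seed)
    universe = range(30)
    passed = 0
    for _ in range(trials):
        O = [frozenset(rng.sample(universe, rng.randint(1, 8)))
             for _ in range(rng.randint(1, 6))]
        S = [frozenset(rng.sample(universe, rng.randint(0, 8)))
             for _ in range(rng.randint(0, 4))]
        marginals = [coverage(S + [o]) - coverage(S) for o in O]
        total_gain = coverage(S + O) - coverage(S)
        # Lemma 7.20 multiplied by k = |O| gives this integer inequality.
        if sum(marginals) >= total_gain:
            passed += 1
    return passed
```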
Stream Notation
Let $\sigma$ be the input stream. $E_{opt}=\{o_{1},\dots,o_{k}\}$. Let $\pi$ be the random permutation of $E_{opt}$. $o^{\pi}_{i}$ is the $i$-th element of $E_{opt}$ to arrive in the stream. $O^{\pi}_{i}=\{o^{\pi}_{1},\dots,o^{\pi}_{i}\}$. $OPT=f(E_{opt})$.
The Tree Algorithm
We briefly review the Tree Algorithm from [46]. For simplicity, we assume the algorithm knows $OPT$ and that the set of possible marginal gains $I$ is small (these assumptions can be removed using standard techniques involving discretization and guessing, as detailed in the appendix of [46]).
The algorithm maintains a rooted tree $\mathcal{T}$ of height at most $k$. Each node represents a partial solution (the set of elements on the path from the root to the node).
Algorithm 5 Tree Algorithm
1: Initialize tree $\mathcal{T}$ with only the root (representing $S=\emptyset$).
2: for each element $e$ arriving in the stream $\sigma$ do
3:   for each node $v$ in $\mathcal{T}$ do
4:     Let $S_{v}$ be the solution corresponding to $v$.
5:     if $|S_{v}|<k$ then
6:       Calculate the marginal gain $g=f(e|S_{v})$.
7:       if $v$ has no child $c$ such that $f(c|S_{v})=g$ then
8:         Add a new child node corresponding to $S_{v}\cup\{e\}$.
9:       end if
10:    end if
11:  end for
12: end for
13: Return the solution $S_{ALG}$ corresponding to the leaf with maximum value $f(S_{ALG})$.
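A compact Python sketch of Algorithm 5 (our illustration, not part of the generated text) follows. It makes these assumptions:

- Marginal gains are compared for exact equality, matching the discretized-gains assumption above.
- Solutions are represented as tuples of elements.
- The inner loop runs over a snapshot of the current nodes, so a child created for $e$ is not itself extended by $e$ in the same step.

The coverage function and the three-element stream are illustrative choices.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Solution = Tuple[Hashable, ...]


def tree_algorithm(stream: List[Hashable], k: int,
                   f: Callable[[Solution], float]) -> Solution:
    """Sketch of Algorithm 5: one child per distinct marginal gain per node."""
    nodes: List[Solution] = [()]               # the root is the empty solution
    child_gains: Dict[Solution, Set[float]] = {(): set()}
    for e in stream:
        for S_v in list(nodes):                # snapshot of the current tree
            if len(S_v) < k:
                g = f(S_v + (e,)) - f(S_v)     # marginal gain f(e | S_v)
                if g not in child_gains[S_v]:  # no child with this gain yet
                    child = S_v + (e,)
                    child_gains[S_v].add(g)
                    child_gains[child] = set()
                    nodes.append(child)
    # f is monotone, so the best node overall has the value of the best leaf.
    return max(nodes, key=f)


def coverage(S: Solution) -> int:
    """Illustrative monotone submodular f: size of the union of chosen sets."""
    return len(frozenset().union(*S))


a, b, c = frozenset({1, 2}), frozenset({3, 4}), frozenset({1})
best = tree_algorithm([a, b, c], k=2, f=coverage)
```

In this run, $b$ is not attached to the root because the root already has a child ($a$) with the same gain of 2. It is attached below $a$ instead, which yields the covering pair $(a,b)$.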
Analysis via Generalized Recurrence
The core of the analysis involves defining a specific path in the execution tree and lower bounding its expected value. We generalize the approach by introducing a state-dependent threshold.
The Generalized Recurrence Relation
We define the domain $\mathcal{D}:=\{(k,h)\in\mathbb{Z}_{\geq 0}^{2}:0\leq h\leq k\}$. We introduce a generalized recurrence $R(k,h)$ which aims to lower bound the approximation ratio when the optimal solution has size $k$ and the algorithm’s budget is $h$.
Definition 7.21 (Threshold Function).
We define a threshold function $T:\mathcal{D}\setminus\{(0,0)\}\to(0,1]$.
Definition 7.22 (Generalized Recurrence).
Let $R:\mathcal{D}\to\mathbb{R}_{\geq 0}$ be defined as follows: $R(k,0):=0$ for all $k\geq 0$. For $(k,h)\in\mathcal{D}$ with $k\geq 1,h\geq 1$, let $t=T(k,h)$.
$$R(k,h):=\min\{A_{k,h},B_{k,h},C_{k,h}\}$$
where
$$
\begin{aligned}
A_{k,h}&=\frac{t}{k}+\left(1-\frac{t}{k}\right)R(k,h-1),\\
B_{k,h}&=\frac{1}{k}+\left(1-\frac{1+t}{k}\right)R(k-1,h-1),\\
C_{k,h}&=\frac{1}{1+t}.
\end{aligned}
$$
Adapted Path Selection Strategy
To utilize this recurrence, the analysis employs a path selection strategy that dynamically uses the threshold corresponding to the current subproblem. This strategy defines a path in the tree whose expected value we analyze.
We define the path $S^{\pi}_{h}=\{s^{\pi}_{1},\dots,s^{\pi}_{h}\}$ inductively. We track the state $(k_{i},h_{i},n_{i})$: remaining OPT size, remaining budget, and index of the current OPT element. Initialize $(k_{1},h_{1},n_{1})=(k,h,1)$.
At step $i$ (for $i=1,\dots,h$):
- Set the threshold $t_{i}=T(k_{i},h_{i})$.
- Define $u^{\pi}_{i}$ (the best non-OPT element seen so far) and $v^{\pi}_{i}$ (the better of $u^{\pi}_{i}$ and the current OPT element $o^{\pi}_{n_{i}}$) relative to $S^{\pi}_{i-1}$ (as defined in [46]).
- Apply the decision rule:
• Case 1 (Greedy Choice): If $\mathbb{E}_{\pi}[f(u^{\pi}_{i}|S^{\pi}_{i-1})]\geq t_{i}\cdot\mathbb{E}_{\pi}[f(o^{\pi}_{n_{i}}|S^{\pi}_{i-1})]$. Set $s^{\pi}_{i}=u^{\pi}_{i}$. Update: $(k_{i+1},h_{i+1})=(k_{i},h_{i}-1)$, $n_{i+1}=n_{i}$.
• Case 2 (Wait for OPT): Otherwise. Set $s^{\pi}_{i}=v^{\pi}_{i}$. Update: $(k_{i+1},h_{i+1})=(k_{i}-1,h_{i}-1)$, $n_{i+1}=n_{i}+1$.
Connecting the Recurrence to the Algorithm
We now prove that R ( k , h ) R(k,h) bounds the performance of this strategy, provided the threshold function satisfies a monotonicity property.
Definition 7.23 (Monotonic Threshold Condition (MTC)).
A threshold function $T(k,h)$ satisfies the Monotonic Threshold Condition (MTC) if for all $k\geq 2,h\geq 1$:
$$T(k,h)\leq T(k-1,h-1).$$
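For the threshold $t_{k,h}=\frac{k}{\sqrt{k^{2}+h^{2}}}$ proposed by Gemini, viewed as $T(k,h)$, the MTC can be checked directly. The short derivation below is our own verification (not part of the generated text). It covers $(k,h)\in\mathcal{D}$ with $k\geq 2$ and $h\geq 1$.

```latex
T(k,h) = \frac{1}{\sqrt{1+(h/k)^2}}, \qquad
\frac{h}{k} - \frac{h-1}{k-1}
  = \frac{h(k-1) - k(h-1)}{k(k-1)}
  = \frac{k-h}{k(k-1)} \;\geq\; 0 \quad (h \leq k).
```

Since $\frac{h-1}{k-1}\geq 0$ and $x\mapsto 1/\sqrt{1+x^{2}}$ is decreasing on $[0,\infty)$, it follows that $T(k,h)\leq T(k-1,h-1)$.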
Lemma 7.24.
If $T(k,h)$ satisfies the MTC, then for all $k\geq 1,h\geq 1$, letting $t=T(k,h)$:
$$R(k-1,h-1)\leq\frac{1}{1+t}.$$
Proof.
If $k=1$, then $h=1$. $R(0,0)=0$. Since $t>0$, the inequality holds. Assume $k\geq 2$. By definition of the recurrence:
$$R(k-1,h-1)\leq C_{k-1,h-1}=\frac{1}{1+T(k-1,h-1)}.$$
By the MTC, $T(k,h)\leq T(k-1,h-1)$. Thus $1+T(k,h)\leq 1+T(k-1,h-1)$.
$$R(k-1,h-1)\leq\frac{1}{1+T(k-1,h-1)}\leq\frac{1}{1+T(k,h)}=\frac{1}{1+t}.$$
∎
We can now prove the main connection theorem.
Theorem 7.25.
If the threshold function $T(k,h)$ satisfies the MTC, then for all instances of the problem and $(k,h)\in\mathcal{D}$, the solution $S^{\pi}_{h}$ defined by the adapted strategy satisfies:
$$\mathbb{E}_{\pi}[f(S^{\pi}_{h})]\geq R(k,h)\cdot OPT.$$
Proof.
We proceed by induction on $h$.
Base Case ($h=0$): $R(k,0)=0$. $\mathbb{E}_{\pi}[f(S^{\pi}_{0})]=f(\emptyset)=0$. The theorem holds.
Inductive Step: Assume the theorem holds for $h-1$. We prove it for $(k,h)$, where $k\geq 1,h\geq 1$. Let $t=T(k,h)$. We analyze the first step ($i=1$). By Lemma 7.20, due to the random permutation, $\mathbb{E}_{\pi}[f(o^{\pi}_{1})]\geq OPT/k$.
Case 1: $\mathbb{E}_{\pi}[f(u^{\pi}_{1})]\geq t\cdot\mathbb{E}_{\pi}[f(o^{\pi}_{1})]$. We set $s^{\pi}_{1}=u^{\pi}_{1}$.
As argued in [46], $u^{\pi}_{1}$ is determined by the elements arriving before the first OPT element. Since the adversary injects noise without knowledge of the permutation $\pi$, the set of elements in this prefix is independent of $\pi$. Therefore, $u^{\pi}_{1}$ is independent of $\pi$. Let $s_{1}=u^{\pi}_{1}$.
We consider the remaining subproblem $I^{\prime}$. It has $k^{\prime}=k$ OPT elements and budget $h^{\prime}=h-1$. The function is $f^{\prime}(T)=f(T|s_{1})$. The optimum of the subproblem is $OPT^{\prime}=f(E_{opt}|s_{1})$. By submodularity and monotonicity, $OPT^{\prime}\geq OPT-f(s_{1})$.
The remainder of the path $S^{\pi}_{h}\setminus\{s_{1}\}$ corresponds exactly to the path constructed by our strategy applied to $I^{\prime}$. By the Inductive Hypothesis (IH): $\mathbb{E}_{\pi}[f(S^{\pi}_{h}|s_{1})]\geq R(k,h-1)\cdot OPT^{\prime}$.
$$
\begin{aligned}
\mathbb{E}_{\pi}[f(S^{\pi}_{h})]
&=f(s_{1})+\mathbb{E}_{\pi}[f(S^{\pi}_{h}|s_{1})]\\
&\geq f(s_{1})+R(k,h-1)(OPT-f(s_{1}))\\
&=f(s_{1})(1-R(k,h-1))+R(k,h-1)\cdot OPT.
\end{aligned}
$$
We verify that $R(k,h-1)<1$. Since $T(k,h-1)>0$, we have $R(k,h-1)\leq\frac{1}{1+T(k,h-1)}<1$, so $1-R(k,h-1)>0$.
By the condition for Case 1 and Lemma 7.20, $f(s_{1})\geq t\cdot\mathbb{E}_{\pi}[f(o^{\pi}_{1})]\geq t\cdot OPT/k$.
$$
\begin{aligned}
\mathbb{E}_{\pi}[f(S^{\pi}_{h})]
&\geq\frac{t\cdot OPT}{k}(1-R(k,h-1))+R(k,h-1)\cdot OPT\\
&=OPT\cdot\left[\frac{t}{k}+\left(1-\frac{t}{k}\right)R(k,h-1)\right]=A_{k,h}\cdot OPT.
\end{aligned}
$$
Case 2: $\mathbb{E}_{\pi}[f(u^{\pi}_{1})]<t\cdot\mathbb{E}_{\pi}[f(o^{\pi}_{1})]$.
We consider the remaining subproblem $I'(\pi)$. It depends on $\pi$ because both $s^{\pi}_{1}$ and the remaining OPT elements $E'_{opt}(\pi)=E_{opt}\setminus\{o^{\pi}_{1}\}$ depend on $\pi$. The parameters are $(k-1,h-1)$, and the function is $f'_{\pi}(T)=f(T\mid s^{\pi}_{1})$.
We analyze the expected value over $\pi$: $\mathbb{E}_{\pi}[f(S^{\pi}_{h})]=\mathbb{E}_{\pi}[f(s^{\pi}_{1})]+\mathbb{E}_{\pi}[f(S^{\pi}_{h}\mid s^{\pi}_{1})]$.
By the IH applied to the subproblem (noting that, conditional on $o^{\pi}_{1}$, the remaining elements are still uniformly permuted), $\mathbb{E}_{\pi}[f(S^{\pi}_{h}\mid s^{\pi}_{1})]\geq R(k-1,h-1)\cdot\mathbb{E}_{\pi}[OPT'(\pi)]$.
The expected optimum of the subproblem is $\mathbb{E}_{\pi}[OPT'(\pi)]=\mathbb{E}_{\pi}[f(E'_{opt}(\pi)\mid s^{\pi}_{1})]$. By submodularity, $\mathbb{E}_{\pi}[OPT'(\pi)]\geq OPT-\mathbb{E}_{\pi}[f(s^{\pi}_{1}\cup o^{\pi}_{1})]$.
Combining these, $\mathbb{E}_{\pi}[f(S^{\pi}_{h})]\geq\mathbb{E}_{\pi}[f(s^{\pi}_{1})]+R(k-1,h-1)\cdot\left(OPT-\mathbb{E}_{\pi}[f(s^{\pi}_{1}\cup o^{\pi}_{1})]\right)$.
We bound the loss term $\mathbb{E}_{\pi}[f(s^{\pi}_{1}\cup o^{\pi}_{1})]$. Note that if $s^{\pi}_{1}\neq o^{\pi}_{1}$, then $s^{\pi}_{1}$ must have arrived before $o^{\pi}_{1}$. Since $s^{\pi}_{1}=v^{\pi}_{1}$ is the best element up to $o^{\pi}_{1}$, it must be that $s^{\pi}_{1}=u^{\pi}_{1}$.
We can obtain a bound using the Case 2 condition:
$$\begin{aligned}
\mathbb{E}_{\pi}[f(s^{\pi}_{1}\cup o^{\pi}_{1})] &= \mathbb{E}_{\pi}[f(o^{\pi}_{1})]+\mathbb{E}_{\pi}[f(s^{\pi}_{1}\mid o^{\pi}_{1})]\\
&= \mathbb{E}_{\pi}[f(o^{\pi}_{1})]+P(s^{\pi}_{1}=o^{\pi}_{1})\,\mathbb{E}[f(s^{\pi}_{1}\mid o^{\pi}_{1})\mid s^{\pi}_{1}=o^{\pi}_{1}]+P(s^{\pi}_{1}\neq o^{\pi}_{1})\,\mathbb{E}[f(s^{\pi}_{1}\mid o^{\pi}_{1})\mid s^{\pi}_{1}\neq o^{\pi}_{1}].
\end{aligned}$$
If $s^{\pi}_{1}=o^{\pi}_{1}$, the marginal gain is 0. If $s^{\pi}_{1}\neq o^{\pi}_{1}$, then $s^{\pi}_{1}=u^{\pi}_{1}$. Therefore
$$\begin{aligned}
\mathbb{E}_{\pi}[f(s^{\pi}_{1}\cup o^{\pi}_{1})] &= \mathbb{E}_{\pi}[f(o^{\pi}_{1})]+P(s^{\pi}_{1}\neq o^{\pi}_{1})\,\mathbb{E}[f(u^{\pi}_{1}\mid o^{\pi}_{1})\mid s^{\pi}_{1}\neq o^{\pi}_{1}]\\
&\leq \mathbb{E}_{\pi}[f(o^{\pi}_{1})]+\mathbb{E}_{\pi}[f(u^{\pi}_{1})].\quad\text{(by submodularity and non-negativity)}
\end{aligned}$$
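Using the Case 2 condition $\mathbb{E}_{\pi}[f(u^{\pi}_{1})]<t\cdot\mathbb{E}_{\pi}[f(o^{\pi}_{1})]$, we get $\mathbb{E}_{\pi}[f(s^{\pi}_{1}\cup o^{\pi}_{1})]<(1+t)\,\mathbb{E}_{\pi}[f(o^{\pi}_{1})]$.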
Substituting back into the main inequality:
$$\mathbb{E}_{\pi}[f(S^{\pi}_{h})]>\mathbb{E}_{\pi}[f(s^{\pi}_{1})]+R(k-1,h-1)\cdot\left(OPT-(1+t)\,\mathbb{E}_{\pi}[f(o^{\pi}_{1})]\right).$$
By definition of $v^{\pi}_{1}$, $f(s^{\pi}_{1})\geq f(o^{\pi}_{1})$ for all $\pi$. Thus $\mathbb{E}_{\pi}[f(s^{\pi}_{1})]\geq\mathbb{E}_{\pi}[f(o^{\pi}_{1})]$, and
$$\mathbb{E}_{\pi}[f(S^{\pi}_{h})]>\mathbb{E}_{\pi}[f(o^{\pi}_{1})]\,\left(1-(1+t)R(k-1,h-1)\right)+R(k-1,h-1)\cdot OPT.$$
Crucially, by Lemma 7.24 (which relies on the MTC), the coefficient $1-(1+t)R(k-1,h-1)$ is non-negative. We substitute $\mathbb{E}_{\pi}[f(o^{\pi}_{1})]\geq OPT/k$ (Lemma 7.20):
$$\begin{aligned}
\mathbb{E}_{\pi}[f(S^{\pi}_{h})] &> \frac{OPT}{k}\,\left(1-(1+t)R(k-1,h-1)\right)+R(k-1,h-1)\cdot OPT\\
&= OPT\cdot\left[\frac{1}{k}+\left(1-\frac{1+t}{k}\right)R(k-1,h-1)\right]=B_{k,h}\cdot OPT.
\end{aligned}$$
In both cases, $\mathbb{E}_{\pi}[f(S^{\pi}_{h})]\geq\min(A_{k,h},B_{k,h})\cdot OPT\geq R(k,h)\cdot OPT$. The induction is complete. ∎
Analysis of the Specific Recurrence
We now analyze the recurrence using a specific threshold function that optimizes the fluid limit.
The Optimal Threshold Function
We define the threshold function $T^{*}(k,h)=\frac{k}{\sqrt{k^{2}+h^{2}}}$.
Lemma 7.26.
The threshold function $T^{*}(k,h)$ satisfies the Monotonic Threshold Condition (MTC).
Proof.
We need to show $T^{*}(k,h)\leq T^{*}(k-1,h-1)$ for $k\geq 2$, $h\geq 1$. Let $g(x)=(1+x^{2})^{-1/2}$, which is strictly decreasing for $x\geq 0$. Then $T^{*}(k,h)=g(h/k)$ and $T^{*}(k-1,h-1)=g((h-1)/(k-1))$. We compare $x_{1}=h/k$ and $x_{0}=(h-1)/(k-1)$. Since $(k,h)\in\mathcal{D}$, we have $k\geq h$, so $-k\leq -h$ and hence $hk-k\leq hk-h$, i.e., $k(h-1)\leq h(k-1)$. Since $k\geq 2$, both $k>0$ and $k-1>0$, so $(h-1)/(k-1)\leq h/k$, i.e., $x_{0}\leq x_{1}$. Since $g$ is decreasing, $g(x_{1})\leq g(x_{0})$. Thus $T^{*}(k,h)\leq T^{*}(k-1,h-1)$. ∎
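As a numerical sanity check, separate from the proof, the MTC can also be tested on a finite grid. The sketch below uses hypothetical helper names (`t_star`, `mtc_violations`). It evaluates $T^{*}(k,h)$ for all $2\le k\le K$, $1\le h\le k$ and lists every pair where $T^{*}(k,h)>T^{*}(k-1,h-1)$. A small tolerance absorbs floating-point rounding on the diagonal $h=k$, where both sides equal $1/\sqrt{2}$.

```python
import math

def t_star(k: int, h: int) -> float:
    """Threshold T*(k, h) = k / sqrt(k^2 + h^2)."""
    return k / math.hypot(k, h)

def mtc_violations(K: int, tol: float = 1e-12) -> list[tuple[int, int]]:
    """Pairs (k, h), 2 <= k <= K, 1 <= h <= k, with T*(k, h) > T*(k-1, h-1) + tol."""
    bad = []
    for k in range(2, K + 1):
        for h in range(1, k + 1):  # (k, h) in D requires h <= k
            if t_star(k, h) > t_star(k - 1, h - 1) + tol:
                bad.append((k, h))
    return bad

print(mtc_violations(300))  # an empty list means no violation on this grid
```

A grid check like this cannot replace Lemma 7.26, but it catches transcription errors in the threshold formula.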
Since $T^{*}(k,h)$ satisfies the MTC, Theorem 7.25 applies: the approximation ratio of the Tree Algorithm is lower bounded by $R(k,k)$ computed with this threshold function.
Solving the Recurrence
We now prove that the recurrence defined with $T^{*}(k,h)$ evaluates to $2-\sqrt{2}$ when $h=k$.
Theorem 7.27.
Let $R(k,h)$ be the recurrence defined using $T^{*}(k,h)$. Then for all $k\geq 1$, $R(k,k)=2-\sqrt{2}$.
To prove this, we introduce continuous helper functions that represent the fluid limit of the recurrence and use them to establish bounds.
The Continuous Helper Functions
Let $\rho:[0,1]\to[0,1]$ and $t:[0,1]\to(0,1]$ be defined as
$$\rho(x)=1+x-\sqrt{x^{2}+1},\qquad t(x)=\frac{1}{\sqrt{x^{2}+1}}.$$
Note that $T^{*}(k,h)=t(h/k)$. We define the comparison function $P(k,h)=\rho(h/k)$ for $k\geq 1$.
We establish key properties of these functions.
Lemma 7.28 (Properties of $\rho(x)$ and $t(x)$).
1. Boundary values: $\rho(0)=0$, $\rho(1)=2-\sqrt{2}$.
2. Concavity: $\rho(x)$ is strictly concave on $[0,1]$.
3. Convex auxiliary function: $W(x)=\frac{1}{1-\rho(x)}=\sqrt{x^{2}+1}+x$ is strictly convex on $[0,1]$.
4. Rate 1 identity: $\rho'(x)=t(x)(1-\rho(x))$ (equivalently, $W'(x)=W(x)t(x)$).
5. Rate 2 identity: $\rho'(x)(1-x)=1-(1+t(x))\rho(x)$.
6. Threshold relation: $\rho(x)\leq\frac{1}{1+t(x)}$.
7. Monotonicity of $t$: $t(x)$ is strictly decreasing on $[0,1]$.
Proof.
1. $\rho(0)=1+0-\sqrt{1}=0$ and $\rho(1)=1+1-\sqrt{2}=2-\sqrt{2}$.
2. $\rho'(x)=1-\frac{x}{\sqrt{x^{2}+1}}$, and
$$\rho''(x)=-\frac{\sqrt{x^{2}+1}-x\,(x/\sqrt{x^{2}+1})}{x^{2}+1}=-\frac{(x^{2}+1)-x^{2}}{(x^{2}+1)^{3/2}}=-\frac{1}{(x^{2}+1)^{3/2}}.$$
Since $\rho''(x)<0$, $\rho(x)$ is strictly concave.
3. $W'(x)=\frac{x}{\sqrt{x^{2}+1}}+1$ and $W''(x)=\frac{1}{(x^{2}+1)^{3/2}}$. Since $W''(x)>0$, $W(x)$ is strictly convex.
4. $t(x)(1-\rho(x))=\frac{1}{\sqrt{x^{2}+1}}\left(1-(1+x-\sqrt{x^{2}+1})\right)=\frac{1}{\sqrt{x^{2}+1}}\left(\sqrt{x^{2}+1}-x\right)=1-\frac{x}{\sqrt{x^{2}+1}}=\rho'(x)$.
5. Let $S=\sqrt{x^{2}+1}$, so that $t(x)=1/S$, $\rho(x)=1+x-S$, and $\rho'(x)=1-x/S$. Then
$\text{LHS}=(1-x/S)(1-x)=1-x-x/S+x^{2}/S=1-x+\frac{x^{2}-x}{S}$, and
$\text{RHS}=1-(1+1/S)(1+x-S)=1-\left(1+x-S+\frac{1+x}{S}-1\right)=1-x+S-\frac{1+x}{S}=1-x+\frac{S^{2}-(1+x)}{S}=1-x+\frac{(x^{2}+1)-1-x}{S}=1-x+\frac{x^{2}-x}{S}$.
Hence LHS = RHS. (There was a slight algebraic error in the previous draft's proof of this point, which is corrected here.)
6. We want $\rho(x)(1+t(x))\leq 1$. By the Rate 2 identity, this is equivalent to $1-(1-x)\rho'(x)\leq 1$, or $(1-x)\rho'(x)\geq 0$. Since $x\in[0,1]$, $1-x\geq 0$. Since $\sqrt{x^{2}+1}>x$, $\rho'(x)=1-x/\sqrt{x^{2}+1}>0$. The inequality holds, with equality only if $x=1$.
7. $t'(x)=-\frac{1}{2}(x^{2}+1)^{-3/2}(2x)=-x(x^{2}+1)^{-3/2}$, which is negative for $x>0$.
∎
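The seven properties in Lemma 7.28 can be spot-checked numerically. This is a minimal sketch that is not part of the proof. It evaluates the closed forms of $\rho$, $\rho'$, $t$ and $W$ on a uniform grid of $[0,1]$ and records the largest residual of each identity. For concavity and convexity it uses the signs of central second differences. The grid size and step are arbitrary choices, and the variable names are illustrative.

```python
import math

def rho(x: float) -> float:
    return 1.0 + x - math.sqrt(x * x + 1.0)

def rho_prime(x: float) -> float:
    return 1.0 - x / math.sqrt(x * x + 1.0)

def t(x: float) -> float:
    return 1.0 / math.sqrt(x * x + 1.0)

def W(x: float) -> float:
    return 1.0 / (1.0 - rho(x))

grid = [i / 1000 for i in range(1001)]  # x in [0, 1]
step = 1e-3                              # step for second differences

# Item 1: boundary values.
boundary_err = max(abs(rho(0.0)), abs(rho(1.0) - (2 - math.sqrt(2))))
# Items 4 and 5: largest residuals of the Rate 1 and Rate 2 identities.
rate1_err = max(abs(rho_prime(x) - t(x) * (1 - rho(x))) for x in grid)
rate2_err = max(abs(rho_prime(x) * (1 - x) - (1 - (1 + t(x)) * rho(x))) for x in grid)
# Item 3: closed form W(x) = sqrt(x^2 + 1) + x.
w_err = max(abs(W(x) - (math.sqrt(x * x + 1) + x)) for x in grid)
# Item 6: slack in rho(x) <= 1/(1 + t(x)); should be >= 0 up to rounding (tight at x = 1).
threshold_slack = min(1 / (1 + t(x)) - rho(x) for x in grid)
# Items 2 and 3: second differences are negative for rho and positive for W.
interior = grid[1:-1]
max_rho_d2 = max(rho(x - step) - 2 * rho(x) + rho(x + step) for x in interior)
min_W_d2 = min(W(x - step) - 2 * W(x) + W(x + step) for x in interior)
# Item 7: t strictly decreasing along the grid.
t_decreasing = all(t(a) > t(b) for a, b in zip(grid, grid[1:]))

print(rate1_err, rate2_err, threshold_slack)
```

The residuals of the two identities come out at floating-point rounding level. The threshold slack is essentially zero only at $x=1$, which matches the equality case noted in item 6.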
We next give the proof of Theorem 7.27 .
Proof.
We use a "Diagonal Sandwich" argument. We first prove a global lower bound $R(k,h)\geq P(k,h)$, and then prove a tight upper bound on the diagonal $h=k$.
Step 1: Global Lower Bound. We prove by strong induction that $R(k,h)\geq P(k,h)=\rho(h/k)$ for all $(k,h)\in\mathcal{D}$ with $k\geq 1$.
Base Conditions:
• $h=0$, $k\geq 1$: $R(k,0)=0$ and $P(k,0)=\rho(0)=0$. Holds.
• $k=1$ (so $h=1$): $t_{1,1}=T^{*}(1,1)=1/\sqrt{2}$. Then $A_{1,1}=t_{1,1}+(1-t_{1,1})R(1,0)=1/\sqrt{2}$, $B_{1,1}=1+(1-(1+t_{1,1}))R(0,0)=1$, and $C_{1,1}=\frac{1}{1+1/\sqrt{2}}=\frac{\sqrt{2}}{\sqrt{2}+1}=2-\sqrt{2}$. Hence $R(1,1)=\min(1/\sqrt{2},1,2-\sqrt{2})$. Since $2-\sqrt{2}\approx 0.586$ and $1/\sqrt{2}\approx 0.707$, $R(1,1)=2-\sqrt{2}=\rho(1)=P(1,1)$. Holds.
Inductive Step: Fix $(k,h)$ with $k\geq 2$, $h\geq 1$. Assume $R(k',h')\geq P(k',h')$ for all $(k',h')\in\mathcal{D}$ with $k'\leq k$, $h'\leq h$, and $(k',h')\neq(k,h)$. Let $x=h/k$, so that $t_{k,h}=T^{*}(k,h)=t(x)$ and $P(k,h)=\rho(x)$.
Analysis of $C_{k,h}$: $C_{k,h}=\frac{1}{1+t(x)}$. By Lemma 7.28.6, $\rho(x)\leq\frac{1}{1+t(x)}$, so $C_{k,h}\geq P(k,h)$.
Analysis of $A_{k,h}$: Let $\Delta x=1/k$. The previous state is $(k,h-1)$, with ratio $x'=(h-1)/k=x-\Delta x$. By the IH, $R(k,h-1)\geq P(k,h-1)=\rho(x-\Delta x)$. Since $1-t(x)\Delta x\geq 0$,
$$A_{k,h}\geq t(x)\Delta x+(1-t(x)\Delta x)\,\rho(x-\Delta x).$$
We want to show this is $\geq\rho(x)$, which is equivalent to
$$t(x)\Delta x\,(1-\rho(x-\Delta x))\geq\rho(x)-\rho(x-\Delta x).$$
Let $W(x)=1/(1-\rho(x))$. Then the inequality reads
$$\frac{t(x)\Delta x}{W(x-\Delta x)}\geq\frac{1}{W(x-\Delta x)}-\frac{1}{W(x)}.$$
Multiplying by $W(x)W(x-\Delta x)$ (which is positive):
$$t(x)\Delta x\,W(x)\geq W(x)-W(x-\Delta x),\quad\text{i.e.,}\quad W(x-\Delta x)\geq W(x)-W(x)t(x)\Delta x.$$
By Lemma 7.28.4 (Rate 1 identity), $W'(x)=W(x)t(x)$, so this becomes
$$W(x-\Delta x)\geq W(x)-W'(x)\Delta x.$$
This inequality holds because $W(x)$ is strictly convex (Lemma 7.28.3). A convex function lies above its tangent line, $W(y)\geq W(x)+W'(x)(y-x)$, and setting $y=x-\Delta x$ yields the result. Thus $A_{k,h}\geq P(k,h)$.
Analysis of $B_{k,h}$: The previous state is $(k-1,h-1)$. Let $x_{1}=h/k$ and $x_{0}=(h-1)/(k-1)$. By the IH, $R(k-1,h-1)\geq P(k-1,h-1)=\rho(x_{0})$, so
$$B_{k,h}\geq\frac{1}{k}+\left(1-\frac{1+t(x_{1})}{k}\right)\rho(x_{0}).$$
We want to show this is $\geq\rho(x_{1})$, i.e.,
$$\frac{1}{k}\left(1-(1+t(x_{1}))\rho(x_{0})\right)\geq\rho(x_{1})-\rho(x_{0}).$$
We analyze the relationship between $1/k$ and the ratios:
$$x_{1}-x_{0}=\frac{h}{k}-\frac{h-1}{k-1}=\frac{h(k-1)-k(h-1)}{k(k-1)}=\frac{k-h}{k(k-1)},\qquad 1-x_{0}=1-\frac{h-1}{k-1}=\frac{k-1-(h-1)}{k-1}=\frac{k-h}{k-1}.$$
Therefore $x_{1}-x_{0}=\frac{1}{k}(1-x_{0})$, so $1/k=(x_{1}-x_{0})/(1-x_{0})$.
(This assumes $h<k$, so that $1-x_{0}\neq 0$.)
Case 1: $1\leq h<k$. Then $1-x_{0}>0$ and $x_{1}-x_{0}>0$. Substituting $1/k$:
$$\frac{x_{1}-x_{0}}{1-x_{0}}\left(1-(1+t(x_{1}))\rho(x_{0})\right)\geq\rho(x_{1})-\rho(x_{0}),$$
or equivalently
$$\frac{1-(1+t(x_{1}))\rho(x_{0})}{1-x_{0}}\geq\frac{\rho(x_{1})-\rho(x_{0})}{x_{1}-x_{0}}.$$
The RHS is the slope of the secant line of $\rho(x)$ from $x_{0}$ to $x_{1}$. Since $\rho(x)$ is strictly concave (Lemma 7.28.2), the slope of the secant is less than the slope of the tangent at $x_{0}$:
$$\frac{\rho(x_{1})-\rho(x_{0})}{x_{1}-x_{0}}<\rho'(x_{0}).$$
It is sufficient to prove
$$\frac{1-(1+t(x_{1}))\rho(x_{0})}{1-x_{0}}\geq\rho'(x_{0}).$$
By Lemma 7.28.5 (Rate 2 identity) applied at $x_{0}$, $\rho'(x_{0})=\frac{1-(1+t(x_{0}))\rho(x_{0})}{1-x_{0}}$. We need to show
$$1-(1+t(x_{1}))\rho(x_{0})\geq 1-(1+t(x_{0}))\rho(x_{0}),\quad\text{i.e.,}\quad(1+t(x_{0}))\rho(x_{0})\geq(1+t(x_{1}))\rho(x_{0}).$$
Since $h\geq 1$, $x_{0}=(h-1)/(k-1)\geq 0$, so $\rho(x_{0})\geq 0$. We therefore need $t(x_{0})\geq t(x_{1})$. This holds because $t(x)$ is decreasing (Lemma 7.28.7) and $x_{0}<x_{1}$. Thus $B_{k,h}\geq P(k,h)$.
Case 2: $h=k$ (with $k\geq 2$). We want to show $B_{k,k}\geq P(k,k)=2-\sqrt{2}$. Here $x_{1}=1$ and $t(x_{1})=1/\sqrt{2}$. The previous state is $(k-1,k-1)$, so $x_{0}=1$. By the IH, $R(k-1,k-1)\geq P(k-1,k-1)=\rho(1)=2-\sqrt{2}$, hence
$$B_{k,k}\geq\frac{1}{k}+\left(1-\frac{1+1/\sqrt{2}}{k}\right)(2-\sqrt{2}).$$
We check whether this is $\geq 2-\sqrt{2}$, i.e.,
$$\frac{1}{k}\geq(2-\sqrt{2})-\left(1-\frac{1+1/\sqrt{2}}{k}\right)(2-\sqrt{2})=(2-\sqrt{2})\left(1-\left(1-\frac{1+1/\sqrt{2}}{k}\right)\right)=(2-\sqrt{2})\left(\frac{1+1/\sqrt{2}}{k}\right),$$
which is equivalent to $1\geq(2-\sqrt{2})(1+1/\sqrt{2})$. The right-hand side is
$$(2-\sqrt{2})\left(\frac{\sqrt{2}+1}{\sqrt{2}}\right)=\frac{(2-\sqrt{2})(\sqrt{2}+1)}{\sqrt{2}}=\frac{2\sqrt{2}+2-2-\sqrt{2}}{\sqrt{2}}=\frac{\sqrt{2}}{\sqrt{2}}=1.$$
The inequality $1\geq 1$ holds. Thus $B_{k,k}\geq P(k,k)$.
Conclusion of Step 1: We have shown $A_{k,h}\geq P(k,h)$, $B_{k,h}\geq P(k,h)$, and $C_{k,h}\geq P(k,h)$. Therefore $R(k,h)=\min(A_{k,h},B_{k,h},C_{k,h})\geq P(k,h)$ for all $(k,h)\in\mathcal{D}$. In particular, $R(k,k)\geq P(k,k)=\rho(1)=2-\sqrt{2}$.
Step 2: Diagonal Upper Bound. We consider the upper bound at $h=k$. By definition of the recurrence, $R(k,k)\leq C_{k,k}$, where $t_{k,k}=T^{*}(k,k)=k/\sqrt{k^{2}+k^{2}}=k/\sqrt{2k^{2}}=1/\sqrt{2}$. Hence
$$C_{k,k}=\frac{1}{1+t_{k,k}}=\frac{1}{1+1/\sqrt{2}}=2-\sqrt{2},$$
so $R(k,k)\leq 2-\sqrt{2}$.
Conclusion of the Proof (Diagonal Sandwich): Since $2-\sqrt{2}\leq R(k,k)$ (Step 1) and $R(k,k)\leq 2-\sqrt{2}$ (Step 2), we must have $R(k,k)=2-\sqrt{2}$ for all $k\geq 1$. ∎
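Theorem 7.27 can be cross-checked by tabulating the recurrence directly. The sketch below assumes the recurrence has the form that the proofs above use: $R(k,h)=\min(A_{k,h},B_{k,h},C_{k,h})$ with $t=T^{*}(k,h)$, $A_{k,h}=t/k+(1-t/k)R(k,h-1)$, $B_{k,h}=1/k+(1-(1+t)/k)R(k-1,h-1)$, $C_{k,h}=1/(1+t)$, and base values $R(k,0)=0$, including $R(0,0)=0$ as in the base case. The function names are illustrative. The sketch reports the largest deviation of $R(k,k)$ from $2-\sqrt{2}$ and the smallest gap $R(k,h)-\rho(h/k)$, which Step 1 says should be non-negative.

```python
import math

def t_star(k: int, h: int) -> float:
    """Threshold T*(k, h) = k / sqrt(k^2 + h^2)."""
    return k / math.hypot(k, h)

def rho(x: float) -> float:
    """Comparison function rho(x) = 1 + x - sqrt(x^2 + 1)."""
    return 1.0 + x - math.sqrt(x * x + 1.0)

def solve_recurrence(K: int) -> list[list[float]]:
    """Tabulate R(k, h) for 0 <= h <= k <= K; row k holds entries h = 0..k.

    Assumed form: R(k, h) = min(A, B, C) with t = T*(k, h),
      A = t/k + (1 - t/k) * R(k, h-1)
      B = 1/k + (1 - (1+t)/k) * R(k-1, h-1)
      C = 1/(1 + t)
    and base values R(k, 0) = 0 for all k >= 0.
    """
    R = [[0.0] * (k + 1) for k in range(K + 1)]
    for k in range(1, K + 1):
        for h in range(1, k + 1):
            t = t_star(k, h)
            a = t / k + (1 - t / k) * R[k][h - 1]
            b = 1 / k + (1 - (1 + t) / k) * R[k - 1][h - 1]
            c = 1 / (1 + t)
            R[k][h] = min(a, b, c)
    return R

K = 200
R = solve_recurrence(K)
target = 2 - math.sqrt(2)
diag_err = max(abs(R[k][k] - target) for k in range(1, K + 1))
lower_gap = min(R[k][h] - rho(h / k) for k in range(1, K + 1) for h in range(k + 1))
print(f"max |R(k,k) - (2 - sqrt(2))| = {diag_err:.1e}")
print(f"min R(k,h) - rho(h/k)        = {lower_gap:.1e}")
```

On the diagonal both $B_{k,k}$ and $C_{k,k}$ equal $2-\sqrt{2}$ in exact arithmetic (Case 2 of Step 1 is tight), so any deviation printed here comes from floating-point rounding.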
Conclusion
We have rigorously established the connection between the Tree Algorithm for submodular maximization under adversarial injections and a generalized recurrence relation featuring a state-dependent threshold T ( k , h ) T(k,h) .
We proved that if the threshold function satisfies the Monotonic Threshold Condition (MTC), the recurrence R ( k , h ) R(k,h) provides a lower bound on the expected approximation ratio of the algorithm (Theorem 7.25 ), correctly utilizing the inequalities inherent to submodular functions.
We defined a specific threshold function $T^{*}(k,h)=k/\sqrt{k^{2}+h^{2}}$ and verified that it satisfies the MTC (Lemma 7.26).
We analyzed the resulting recurrence using a "Diagonal Sandwich" argument and proved that $R(k,k)=2-\sqrt{2}$ (Theorem 7.27).
This implies that the Tree Algorithm achieves an approximation ratio of at least $2-\sqrt{2}\approx 0.5858$ in the adversarial injections model. This improves upon the 0.55 approximation derived in [46] using a fixed-threshold analysis.
7.5 Streaming Algorithms: Entropy, Low Rank Approximation, and Chamfer Distance
Written by Ying Feng, David P. Woodruff, and Samson Zhou.
Problem Context
This case study examines several fundamental problems in streaming algorithms, focusing on settings where space is not the only resource constraint. In addition to minimizing memory usage, modern systems often operate under limits on write bandwidth, communication, or the frequency with which internal state can be modified. We consider four representative problems:
Entropy Estimation: Can the Shannon entropy of a data stream be estimated with significantly fewer internal state changes than the current $\tilde{O}(\sqrt{n})$ bound?
Consistent LRA: Is the optimal low-rank approximation subspace stable under row updates?
Efficient Encoding for LRA: Can low-rank sketches computed on different parts of the data be globally compressed without losing accuracy?
Chamfer Distance: Can the fast $\ell_{1}$ Chamfer distance algorithm be extended to the $\ell_{2}$ metric?
Together, these problems highlight a common theme: understanding how algorithmic guarantees interact with more refined complexity measures beyond space, such as stability, recourse, and internal state changes.
AI Contribution
Across these problems, the model played a role in identifying structural properties that were not explicit in prior analyses and in turning those observations into concrete algorithmic improvements.
• Entropy: The model observed that the standard algorithm only queries moments $F_{p}$ for $p\approx 1$, avoiding the high-variance regime ($p=2$) entirely. This immediately implied a polylogarithmic bound on the number of internal state changes.
• LRA Stability: The model produced a proof showing that the optimal rank-$k$ subspace changes by at most a constant amount under a single row insertion, using eigenvalue interlacing and dimension arguments.
• LRA Encoding: The model suggested decomposing local sketches into a global “head” and quantized “tail,” and proved that this representation preserves projection costs with small additive error.
• Chamfer Distance: The model adapted the quadtree-based algorithm from $\ell_{1}$ to $\ell_{2}$, introducing a dimensionality reduction step (Johnson-Lindenstrauss) to handle high dimensions efficiently.
7.5.1 Internal State Changes for Streaming Entropy
In streaming algorithms, the foremost priority is typically to minimize the amount of memory (space) required to process a stream, since data streams can be massive and cannot be stored in full. Recently, [62] noted that minimizing the number of internal state changes, which counts how often the algorithm updates its memory representation, is also a reasonable secondary priority, particularly in settings with limited write bandwidth or in distributed monitoring systems.
In the streaming model considered here, the stream consists of a sequence of elements from $[n]=\{1,2,\ldots,n\}$ and implicitly defines a frequency vector $f\in\mathbb{Z}_{\geq 0}^{n}$, where $f_{i}$ denotes the number of times item $i$ has appeared in the data stream. Each update to the stream therefore increments the corresponding coordinate of $f$ by one. The number of internal state changes is the number of times the algorithm modifies its memory state over the duration of the data stream.
The $F_{p}$ moment estimation problem asks to approximate the $p$-th frequency moment of the stream, defined as $F_{p}=\sum_{i=1}^{n}f_{i}^{p}$. Accurate $F_{p}$ estimation is fundamental because many other statistics can be expressed in terms of frequency moments. In particular, $F_{p}$ moments are central tools in streaming algorithms for estimating norms, identifying heavy hitters, and, as in this discussion, approximating Shannon entropy.
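To make the definitions concrete, the following sketch builds $f$ from a stream over $[n]$ and evaluates $F_{p}$ exactly. The helper names are hypothetical. This is an offline illustration only: it stores all of $f$, using $\Theta(n)$ space, and writes to memory on every update, which is exactly the cost that space-efficient streaming algorithms with few state changes avoid by approximating $F_{p}$ instead.

```python
def frequency_vector(stream: list[int], n: int) -> list[int]:
    """f[i-1] = number of times item i in [n] = {1, ..., n} appears in the stream."""
    f = [0] * n
    for item in stream:
        f[item - 1] += 1  # each stream update increments one coordinate by one
    return f

def frequency_moment(f: list[int], p: float) -> float:
    """F_p = sum_i f_i^p; zero coordinates are skipped (they contribute 0 for p > 0)."""
    return sum(fi ** p for fi in f if fi > 0)

stream = [1, 3, 3, 2, 3, 1]
f = frequency_vector(stream, n=4)
print(f)                       # [2, 1, 3, 0]
print(frequency_moment(f, 1))  # 6, the stream length m
print(frequency_moment(f, 2))  # 14 = 2^2 + 1^2 + 3^2
```

Note that $F_{1}$ always equals the stream length $m$ in the insertion-only model.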
Please try to improve the paper by identifying and solving an open question from it.
For your proof, adhere to: ⟨rigor text⟩.
We identify and resolve an open question arising from the analysis of the entropy estimation algorithm presented in Section 3.3 of the paper.
Open Question:
The paper establishes in Theorem 3.8 that the Shannon entropy of an insertion-only stream can be estimated with an $\varepsilon$-additive error using $\tilde{O}(\sqrt{n}/\varepsilon^{O(1)})$ internal state changes (assuming $m=\mathrm{poly}(n)$) and polylogarithmic space. This $\tilde{O}(\sqrt{n})$ bound stems from the worst-case complexity of $F_{p}$ moment estimation for $p\in(0,2]$, which is $\tilde{O}(n^{1-1/p})$, maximized as $p\to 2$. The open question is: Can this $\tilde{O}(\sqrt{n})$ upper bound on the number of state changes for streaming entropy estimation be improved?
We answer this question affirmatively. By analyzing the specific moments required by the underlying [HNO08] framework used in the paper, we show that they are sufficiently close to 1 that the polynomial dependence on $n$ vanishes.
Figure 12 : Model identification of open problem and progress direction
Shannon entropy, defined as $H=-\sum_{i=1}^{n}p_{i}\log p_{i}$ with $p_{i}=f_{i}/\|f\|_{1}$, measures the uncertainty or information content in a dataset. Estimating Shannon entropy is a fundamental problem with numerous applications, including network traffic analysis (to detect anomalies or DDoS attacks), database summarization (to quantify diversity), natural language processing (to measure the information content of text corpora), and data compression (to determine optimal coding schemes). Because entropy is sensitive to the entire distribution of frequencies, accurately approximating it in a streaming setting is challenging and typically relies on sophisticated techniques, such as Chebyshev interpolation of multiple $F_{p}$ moment estimates, as observed by [57].
[62] analyzed the internal-state complexity required for approximating $F_{p}$ in one-pass insertion-only streams. Their main theorem distinguishes two regimes: for $p\in(0,1]$, a $(1+\varepsilon)$-approximation can be achieved with high probability using only $\mathrm{poly}\left(\frac{1}{\varepsilon},\log n\right)$ internal state changes, while for $p\geq 1$, estimating $F_{p}$ requires $\tilde{O}(n^{1-1/p})$ internal state changes, which can be as large as $\tilde{O}(\sqrt{n})$ when $p=2$. In their discussion of Shannon entropy, they interpreted the Chebyshev interpolation observation by [57] and noted that it requires evaluating multiple $F_{p}$ moment estimates. They assumed the exponents lie in the range $p\in(0,2)$, which implies that estimating $F_{2}$ could be necessary. Because $F_{2}$ sketches can incur up to $\sqrt{n}$ internal state changes, this led to the conclusion that Shannon entropy estimation inherits an $O(\sqrt{n})$ state-change cost.
However, upon closer examination, the model noticed that all evaluation points in the interpolation procedure are of the form $1+y_{i}$, where $y_{i}=f(\cos(i\pi/k))$, and showed that $1+y_{i}\in(0,1)$ for all $i$ (cf. Lemma 7.31). This implies that the entropy algorithm never requires $F_{p}$ estimates for $p\geq 1$: all required frequency moments lie strictly within the low-$p$ regime. Consequently, the worst-case $\tilde{O}(n^{1-1/p})$ state-change barrier of the $p\geq 1$ region is avoided. It then follows that the necessary $F_{p}$ estimates for $p\in(0,1)$ use only $\mathrm{poly}\left(\frac{1}{\varepsilon},\log n\right)$ internal state changes (cf. Theorem 7.29). As a result, the Shannon entropy algorithm can compute an additive $\varepsilon$-approximation using $\tilde{O}\left(\frac{1}{\varepsilon^{2}}+\log n\right)$ bits of space and $\mathrm{poly}\left(\frac{1}{\varepsilon},\log n\right)$ internal state changes, significantly improving upon the previously believed $\tilde{O}(\sqrt{n})$ bound.
Theorem 7.29 ([62]).
Let $\varepsilon\in(0,1)$ be an approximation parameter, $\delta\in(0,1/3)$ be a failure probability, and $n$ be the domain size. There exists a one-pass insertion-only streaming algorithm for the $p$-th frequency moment $F_{p}$:
For $p\in(0,1]$: The algorithm achieves a $(1+\varepsilon)$-approximation (w.h.p.) using $\mathrm{poly}\left(\log n,\frac{1}{\varepsilon},\log\frac{1}{\delta}\right)$ internal state changes and $\tilde{O}\left(\frac{1}{\varepsilon^{2}}\left(\log\log n+\log\frac{1}{\varepsilon}\right)+\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)}\log n\right)$ bits of space.
For $p\geq 1$: The algorithm outputs $\widehat{F_{p}}$ such that $\mathbf{Pr}\left[\left|\widehat{F_{p}}-F_{p}\right|\leq\varepsilon\cdot F_{p}\right]\geq\frac{2}{3}$, with $\tilde{O}(n^{1-1/p})$ internal state changes. The space complexity depends on $p$:
• If $p\in[1,2]$: space is $\tilde{O}\left(\frac{1}{\varepsilon^{4+4p}}\right)\cdot\mathrm{polylog}(mn)$.
• If $p>2$: space is $\tilde{O}\left(\frac{1}{\varepsilon^{4+4p}}n^{1-2/p}\right)$.
Algorithm 6 Additive approximation of empirical Shannon entropy
1: Input: error parameter $\tilde{\varepsilon}$, points $\{y_{0},\dots,y_{k}\}$
2: for $i=0,\dots,k$ do
3:   Compute $\tilde{F}_{1+y_{i}}$, a $(1+\tilde{\varepsilon})$-approximation of $F_{1+y_{i}}$
4:   $\tilde{H}(y_{i})\leftarrow-\log\left(\tilde{F}_{1+y_{i}}/\|A\|_{1}^{1+y_{i}}\right)/y_{i}$
5:   $\tilde{T}(y_{i})\leftarrow\left(1-\tilde{F}_{1+y_{i}}/\|A\|_{1}^{1+y_{i}}\right)/y_{i}$
6: end for
7: return an estimate of $H(0)$ or $T(0)$ by interpolating $\{\tilde{H}(y_{i})\}$ or $\{\tilde{T}(y_{i})\}$
Lemma 7.30 ([57]).
To achieve an additive $\varepsilon$-approximation of the Shannon entropy, it suffices to implement Algorithm 6 with $\tilde{\varepsilon}=\frac{\varepsilon}{12(k+1)^{3}\log m}$ using points $y_{0},\ldots,y_{k}$ with $k=\log\frac{1}{\varepsilon}+\log\log m$ and $y_{i}=f(\cos(i\pi/k))$ for $f(y)=\frac{(k^{2}\ell)\cdot y-\ell\cdot(k^{2}+1)}{2k^{2}+1}$, where $\ell=\frac{1}{2(k+1)\log m}$.
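The interpolation step of Algorithm 6 can be illustrated offline. The sketch below makes several simplifying assumptions:

- Exact moments $F_{1+y_{i}}$ replace the $(1+\tilde{\varepsilon})$-approximate streaming estimates.
- Natural logarithms are used throughout.
- $k$ is a fixed integer, whereas the $k$ of Lemma 7.30 need not be an integer.

It follows the $\tilde{H}$ branch of the algorithm. It forms the points $y_{i}$ from Lemma 7.30, computes $\tilde{H}(y_{i})=-\log(F_{1+y_{i}}/m^{1+y_{i}})/y_{i}$ with $m$ the stream length, and evaluates the Lagrange interpolating polynomial at $y=0$. All function names are hypothetical.

```python
import math
from collections import Counter

def interpolation_points(k: int, m: int) -> list[float]:
    """y_i = f(cos(i*pi/k)), i = 0..k, with f and ell as in Lemma 7.30 (natural log assumed)."""
    ell = 1.0 / (2 * (k + 1) * math.log(m))
    def f(y: float) -> float:
        return ((k * k * ell) * y - ell * (k * k + 1)) / (2 * k * k + 1)
    return [f(math.cos(i * math.pi / k)) for i in range(k + 1)]

def renyi_estimate(F: float, m: int, y: float) -> float:
    """H~(y) = -log(F_{1+y} / m^{1+y}) / y, normalizing by the stream length m."""
    return -(math.log(F) - (1 + y) * math.log(m)) / y

def interpolate_at_zero(xs: list[float], vals: list[float]) -> float:
    """Evaluate the Lagrange interpolating polynomial through (xs, vals) at 0."""
    total = 0.0
    for j, (xj, vj) in enumerate(zip(xs, vals)):
        weight = 1.0
        for i, xi in enumerate(xs):
            if i != j:
                weight *= (0.0 - xi) / (xj - xi)
        total += vj * weight
    return total

def entropy_by_interpolation(stream, k: int = 6) -> float:
    counts = Counter(stream)
    m = len(stream)
    ys = interpolation_points(k, m)
    # Exact moments stand in for the approximate F_p sketches of Algorithm 6.
    hs = [renyi_estimate(sum(c ** (1 + y) for c in counts.values()), m, y) for y in ys]
    return interpolate_at_zero(ys, hs)

stream = list("abracadabra")
m = len(stream)
exact = -sum((c / m) * math.log(c / m) for c in Counter(stream).values())
print(entropy_by_interpolation(stream), exact)
```

Every exponent $1+y_{i}$ used here lies in $(0,1)$, so in a streaming implementation each moment falls in the low-$p$ regime of Theorem 7.29. The points are Chebyshev extrema mapped to an interval just left of $0$, which keeps extrapolation to $y=0$ well conditioned.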
Lemma 7.31.
Let $y_{i}=f(\cos(i\pi/k))$ for $f(y)=\frac{(k^{2}\ell)\cdot y-\ell\cdot(k^{2}+1)}{2k^{2}+1}$, with $\ell=\frac{1}{2(k+1)\log m}$. Then $1+y_{i}\in(0,1)$ for all $i\geq 0$.
Proof.
Observe that $1+y_{i}<1$ if and only if $f(\cos(i\pi/k))<0$, or equivalently,
$$(k^{2}\ell)\cdot\cos(i\pi/k)-\ell\cdot(k^{2}+1)<0.$$
Since $\ell>0$, this is equivalent to $k^{2}\cdot\cos(i\pi/k)<k^{2}+1$, which holds because $\cos(i\pi/k)\leq 1$.
Similarly, observe that $1+y_{i}>0$ if and only if $f(\cos(i\pi/k))>-1$, or equivalently,
$$(k^{2}\ell)\cdot\cos(i\pi/k)-\ell\cdot(k^{2}+1)>-(2k^{2}+1).$$
Since $|\cos(i\pi/k)|\leq 1$ for all $i$,
$$(k^{2}\ell)\cdot\cos(i\pi/k)-\ell\cdot(k^{2}+1)\geq-\ell(2k^{2}+1).$$
The desired claim then follows because $\ell\in(0,1)$. ∎
From Theorem 7.29 and Lemma 7.31, we have:
Corollary 7.32.
Given an accuracy parameter $\varepsilon\in(0,1)$ and a stream of length $m=\mathrm{poly}(n)$ over a universe of size $n$, there exists a one-pass insertion-only streaming algorithm that outputs an additive $\varepsilon$-approximation $\widehat{H}$ to the entropy $H$ of the data stream with high probability, using $\tilde{O}\left(\frac{1}{\varepsilon^{2}}+\log n\right)$ bits of space and $\mathrm{poly}\left(\frac{1}{\varepsilon},\log n\right)$ internal state changes.
7.5.2 Consistent Low-Rank Approximation
Low-rank approximation is a central tool in data analysis and machine learning. Given a matrix $A\in\mathbb{R}^{n\times d}$, the goal is to find a rank-$k$ subspace, represented by a matrix $V\in\mathbb{R}^{k\times d}$ with orthonormal rows, that minimizes the approximation error $\|A-AV^{\top}V\|_{F}^{2}$, measured in Frobenius norm. In static settings, this problem is well understood: the optimal solution is given by the top $k$ right singular vectors of $A$, and efficient algorithms with strong guarantees are known.
Many modern applications, however, are inherently dynamic. Data arrive sequentially, are corrected, or are removed, and low-rank approximations are recomputed repeatedly as part of larger pipelines. In these settings, approximation quality alone is not sufficient. Each time the output subspace changes, downstream systems may need to retrain models, update features, or revalidate decisions, all of which can be expensive. This motivates a second objective beyond accuracy, namely consistency.
Consistent low-rank approximation is a relatively new problem formulation that makes this tradeoff explicit. Instead of computing a single low-rank approximation to a fixed dataset, the algorithm must maintain a sequence of near-optimal subspaces as the data evolve, while keeping the changes between successive outputs small. Formally, we fix an accuracy parameter $\varepsilon > 0$ and a target rank $k$. The input is a matrix $A \in \mathbb{R}^{n\times d}$ that changes over time; for example, the rows of $A$ are revealed incrementally, or there is a sequence of updates to the entries of $A$. At each time step $t$, the algorithm observes the current matrix $A^{(t)}$ and outputs a matrix $V^{(t)} \in \mathbb{R}^{k\times d}$ with orthonormal rows.
The first requirement is accuracy. For every $t$, the subspace $V^{(t)}$ should give a near-optimal rank-$k$ approximation to $A^{(t)}$, in the sense that
$$\|A^{(t)} - A^{(t)}(V^{(t)})^\top V^{(t)}\|_F^2 \leq (1+\varepsilon)\cdot\mathsf{OPT}_t,$$
where
$$\mathsf{OPT}_t = \min_{V}\|A^{(t)} - A^{(t)}V^\top V\|_F^2$$
is the optimal rank-$k$ approximation error at time $t$, the minimum being taken over all $V \in \mathbb{R}^{k\times d}$ with orthonormal rows.
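For small dense matrices, the accuracy requirement can be checked directly. The following minimal sketch assumes NumPy; it computes $\mathsf{OPT}_t$ from the singular values of $A^{(t)}$ via the Eckart-Young theorem and compares it with the cost of a candidate $V$. The helper names are illustrative and do not come from the paper.

```python
import numpy as np


def top_k_rows(A: np.ndarray, k: int) -> np.ndarray:
    """V in R^{k x d} whose orthonormal rows span the top-k right singular subspace of A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k]


def rank_k_cost(A: np.ndarray, V: np.ndarray) -> float:
    """||A - A V^T V||_F^2 for a V with orthonormal rows."""
    R = A - A @ V.T @ V
    return float(np.sum(R * R))


def opt_cost(A: np.ndarray, k: int) -> float:
    """OPT: sum of the squared singular values of A beyond the k-th (Eckart-Young)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s[k:] ** 2))


def is_accurate(A: np.ndarray, V: np.ndarray, k: int, eps: float) -> bool:
    """Accuracy requirement at one time step, with a small floating-point slack."""
    return rank_k_cost(A, V) <= (1 + eps) * opt_cost(A, k) + 1e-9


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 6))
print(is_accurate(A, top_k_rows(A, 2), 2, 0.1))  # the SVD solution attains OPT
```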
The second requirement is stability. As the matrix $A$ evolves, the sequence of outputs $V^{(1)}, \ldots, V^{(n)}$ should change minimally. We formalize this desideratum through the notion of recourse, defined as the squared Frobenius distance between the projection matrices of consecutive subspaces. For two rank-$k$ matrices $R, T \in \mathbb{R}^{k\times d}$, let $P_R$ and $P_T$ be the orthogonal projection matrices onto their row spans, and define $\operatorname{Recourse}(R,T) = \|P_R - P_T\|_F^2$. The goal is to minimize the total recourse
$$\sum_{t=2}^{n}\operatorname{Recourse}(V^{(t)}, V^{(t-1)}),$$
while still meeting the approximation guarantee at every time step. This formulation reflects the practical intuition behind consistent low-rank approximation. The algorithm is allowed to adapt as the input evolves, but large or frequent rotations of the maintained subspace are discouraged, since they correspond to costly changes in downstream representations.
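The recourse of a sequence of outputs can be computed in a few lines. This sketch assumes NumPy and that every output has full row rank; `row_span_projector`, `recourse`, and `total_recourse` are illustrative names. Since the recourse depends only on the row spans, rescaling or rotating the rows of an output within its span costs nothing, and any two rank-$k$ subspaces have recourse at most $2k$.

```python
import numpy as np


def row_span_projector(R: np.ndarray) -> np.ndarray:
    """Orthogonal projection onto the row span of R in R^{k x d} (R assumed full row rank)."""
    Q, _ = np.linalg.qr(R.T)  # reduced QR: d x k, orthonormal columns spanning rowspan(R)
    return Q @ Q.T


def recourse(R: np.ndarray, T: np.ndarray) -> float:
    """Recourse(R, T) = ||P_R - P_T||_F^2; it depends only on the two row spans."""
    D = row_span_projector(R) - row_span_projector(T)
    return float(np.sum(D * D))


def total_recourse(outputs: list) -> float:
    """Sum of Recourse(V^(t), V^(t-1)) over consecutive outputs."""
    return sum(recourse(outputs[t], outputs[t - 1]) for t in range(1, len(outputs)))


I = np.eye(4)
print(recourse(I[:2], 3.0 * I[:2]))  # same row span
print(recourse(I[:2], I[2:]))  # orthogonal 2-dim spans: the maximum 2k
```

Identical spans give recourse $0$, and two orthogonal $2$-dimensional spans give $2k = 4$.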
This notion of consistency is not addressed by standard streaming algorithms such as Frequent Directions [49], which may change their output subspaces frequently in order to achieve strong approximation guarantees; for example, even small perturbations in the data can cause a complete change in the reported subspace, leading to large recourse. However, the model noticed that the optimal rank-$k$ subspace is inherently stable under row insertions; that is, the recourse between the optimal subspaces before and after a row insertion is bounded by a fixed constant.
Analyze whether the optimal rank-$k$ subspace is stable under rank-one updates when the stream has polynomially bounded condition number.
Provide a rigorous proof establishing the resulting recourse bound.
We show that when a single row is appended to a matrix, the optimal rank-$k$ right singular subspace changes by at most a constant amount. Specifically, if $V^*_{t-1}$ and $V^*_t$ denote the top-$k$ subspaces before and after the row insertion, then the squared Frobenius distance between their projection matrices satisfies $\operatorname{Recourse}(V^*_{t-1}, V^*_t) \leq 8$. This establishes that the optimal subspace is inherently stable under row insertions, providing a concrete bound on recourse in the row-arrival model.
Figure 13 : Model-generated resolution of a stability question in consistent low-rank approximation
In particular, the model produced a complete and correct proof showing:
Theorem 7.33.
Suppose $A^{(t)}$ is obtained from $A^{(t-1)}$ by adding one new row. Let $V^*_{t-1}$ and $V^*_t$ denote the optimal rank-$k$ subspaces of the matrices before and after the update. Then the change between these two subspaces is at most $\operatorname{Recourse}(V^*_{t-1}, V^*_t) \leq 8$.
Proof.
Consider the covariance matrices
$$B_{t-1} = (A^{(t-1)})^\top A^{(t-1)}, \qquad B_t = (A^{(t)})^\top A^{(t)}.$$
Since $A^{(t)}$ is just $A^{(t-1)}$ with the new row $A_t$ appended, we have
$$B_t = B_{t-1} + A_t^\top A_t,$$
so $B_t$ is a rank-one positive semidefinite update of $B_{t-1}$. By the Eckart-Young theorem, the optimal rank-$k$ subspaces $V^*_{t-1}$ and $V^*_t$ are spanned by the top $k$ eigenvectors of $B_{t-1}$ and $B_t$, respectively.
Next, we consider how much these subspaces can change. Define the subspace orthogonal to the new row as
$$S_a = \{v \in \mathbb{R}^d : A_t v = 0\},$$
which has dimension $d-1$, assuming $A_t$ is nonzero. Let $W = V^*_{t-1} \cap S_a$ be the part of the old subspace that is unaffected by the new row. By basic dimension counting, $\dim(W) \geq \dim(V^*_{t-1}) - (d - (d-1)) = k-1$.
For any vector $w \in W$, we have $B_t w = B_{t-1} w$, so $w$ is an eigenvector for both covariance matrices with the same eigenvalue. Let $\lambda_1 \geq \ldots \geq \lambda_d$ be the eigenvalues of $B_{t-1}$ and $\mu_1 \geq \ldots \geq \mu_d$ those of $B_t$. By the Cauchy interlacing theorem,
$$\mu_1 \geq \lambda_1 \geq \mu_2 \geq \lambda_2 \geq \ldots \geq \mu_k \geq \lambda_k \geq \mu_{k+1} \geq \ldots$$
Since $W \subseteq V^*_{t-1}$, its vectors correspond to eigenvalues among the top $k$ of $B_{t-1}$, and by interlacing they lie within the top $k+1$ eigenvectors of $B_t$. From this, we conclude that the intersection of the old and new subspaces has dimension at least $k-2$.
Finally, consider the projection matrices $P_{t-1}$ and $P_t$ onto $V^*_{t-1}$ and $V^*_t$. Let $W_\cap = V^*_{t-1} \cap V^*_t$, and let $P_\cap$ be the projection matrix onto $W_\cap$.
If $\dim(W_\cap) = k$, the subspaces are identical and the recourse is zero. If $\dim(W_\cap) = k-1$, then
$$P_{t-1} = P_\cap + u_1u_1^\top, \qquad P_t = P_\cap + u_2u_2^\top,$$
where $u_1$ and $u_2$ are unit vectors orthogonal to the shared subspace. Then $P_t - P_{t-1} = u_2u_2^\top - u_1u_1^\top$, so by the triangle inequality and $(x+y)^2 \leq 2x^2 + 2y^2$,
$$\operatorname{Recourse}(P_t, P_{t-1}) \leq 2\|u_1u_1^\top\|_F^2 + 2\|u_2u_2^\top\|_F^2 = 4.$$
Similarly, if $\dim(W_\cap) = k-2$, then $P_t - P_{t-1}$ is the difference of two rank-$2$ projections (each the sum of two rank-one terms), each with squared Frobenius norm $2$, so the same argument gives $\operatorname{Recourse}(P_t, P_{t-1}) \leq 2\cdot 2 + 2\cdot 2 = 8$. ∎
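The objects in the proof can be inspected numerically. The sketch below (assuming NumPy and random Gaussian data) appends one row, checks the eigenvalue interlacing used above, and measures the resulting recourse. It only illustrates the setting: any two rank-$k$ projections satisfy $\|P - P'\|_F^2 \leq 2k$, so with $k = 3$ the observed value is below $8$ regardless of the theorem.

```python
import numpy as np


def top_k_projector(A: np.ndarray, k: int) -> np.ndarray:
    """Projection onto the top-k eigenvectors of A^T A (optimal rank-k subspace, Eckart-Young)."""
    _, evecs = np.linalg.eigh(A.T @ A)  # eigenvalues in ascending order
    V = evecs[:, -k:]
    return V @ V.T


def insertion_recourse(A: np.ndarray, row: np.ndarray, k: int) -> float:
    """||P_t - P_{t-1}||_F^2 when `row` is appended to A."""
    A_new = np.vstack([A, row])
    D = top_k_projector(A_new, k) - top_k_projector(A, k)
    return float(np.sum(D * D))


def interlaces(A: np.ndarray, row: np.ndarray, tol: float = 1e-9) -> bool:
    """Check mu_1 >= lambda_1 >= mu_2 >= lambda_2 >= ... for B_t = B_{t-1} + row^T row."""
    B_old = A.T @ A
    lam = np.linalg.eigvalsh(B_old)[::-1]  # descending
    mu = np.linalg.eigvalsh(B_old + np.outer(row, row))[::-1]  # descending
    slack = tol * max(1.0, mu[0])
    return bool(np.all(mu >= lam - slack) and np.all(lam[:-1] >= mu[1:] - slack))


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
row = rng.standard_normal(8)
print(insertion_recourse(A, row, k=3), interlaces(A, row))
```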
This result has immediate algorithmic implications and has been incorporated into a work accepted for publication at ICLR 2026 [97]. Although the theorem is stated for row insertions, the result also applies to row deletions by symmetry. Moreover, a change to a single entry of a matrix can be expressed as a row deletion followed by a row insertion. Thus, the result directly implies a constant upper bound on recourse for these operations, thereby establishing an $\mathcal{O}(1)$ amortized recourse bound for the dynamic model, where the matrix is updated by a sequence of entry modifications, row insertions, or row deletions.
7.5.3 Global Efficient Encoding for Low-Rank Approximation
In many applications, we work with a large matrix $A \in \mathbb{R}^{n\times d}$, where both dimensions $n$ and $d$ may be very large. Directly storing or processing such matrices can be computationally expensive, especially when we only care about their action on low-dimensional subspaces or about rank-$k$ approximations. Low-rank approximations (LRA) provide a natural tool to reduce storage and computation by approximating $A$ with a matrix of rank at most $k \ll \min(n,d)$, while preserving important properties such as the Frobenius or spectral norm. A stronger guarantee is projection-cost preservation (PCP). In our specific context, the rows of an input matrix $A \in \mathbb{R}^{n\times d}$ arrive sequentially, and the goal is to compute a matrix $B \in \mathbb{R}^{m\times d}$ such that for all rank-$k$ orthogonal projection matrices $P \in \mathbb{R}^{d\times d}$, we have
$$(1-\varepsilon)\|A - AP\|_F^2 \leq \|B - BP\|_F^2 \leq (1+\varepsilon)\|A - AP\|_F^2,$$
where $\|\cdot\|_F$ denotes the Frobenius norm. We then say that $B$ is a $(1+\varepsilon)$-PCP of $A$.
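Because the PCP condition quantifies over every rank-$k$ projection, it cannot be certified by sampling, but sampling can expose a violation. The sketch below (NumPy assumed; function names hypothetical) reports the range of $\|B - BP\|_F^2 / \|A - AP\|_F^2$ over random rank-$k$ projections $P$.

```python
import numpy as np


def random_rank_k_projection(d: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """A random rank-k orthogonal projection in R^{d x d}."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal d x k basis
    return Q @ Q.T


def pcp_ratio_range(A: np.ndarray, B: np.ndarray, k: int, trials: int = 200, seed: int = 0):
    """Min and max of ||B - BP||_F^2 / ||A - AP||_F^2 over sampled rank-k projections P.
    A (1+eps)-PCP keeps every ratio in [1 - eps, 1 + eps]; sampling can refute this, not prove it."""
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(trials):
        P = random_rank_k_projection(A.shape[1], k, rng)
        num = np.linalg.norm(B - B @ P, "fro") ** 2
        den = np.linalg.norm(A - A @ P, "fro") ** 2
        ratios.append(num / den)
    return min(ratios), max(ratios)
```

As a sanity check, $B = UA$ for any orthogonal $U$ is an exact PCP, since $\|UX\|_F = \|X\|_F$; practical sketches are of interest because they have far fewer rows than $A$.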
A common setting arises when $A$ can be decomposed as a concatenation of blocks,
$$A = Q_1 \circ Q_2 \circ \cdots \circ Q_m,$$
where each block $Q_i \in \mathbb{R}^{n_i\times d}$ represents a submatrix of $A$. This block structure naturally appears in streaming or distributed scenarios, where each $Q_i$ corresponds to a portion of the data arriving at different times or stored across multiple nodes. For each block, one can compute a local low-rank approximation $W_i$ that summarizes $Q_i$ while approximately preserving its projection costs, i.e., the squared Frobenius norm under all rank-$k$ orthogonal projections. In particular, it is known that it suffices for $W_i$ to consist of just $r := \tilde{O}(k/\varepsilon^2)$ subsampled and reweighted rows of $Q_i$ [22]. However, naïvely storing all local approximations $W_i$ may still be expensive, especially if $m$ or the number of rows in each local approximation $W_i$ is large. The goal of the global efficient encoding is to compress these local sketches into a representation that:
Uses significantly fewer bits of storage than storing all entries of the $W_i$ explicitly.
Allows efficient reconstruction of matrices $W_i'$ such that the concatenated matrix $W' = W_1' \circ \cdots \circ W_m'$ still forms a good approximation to the original matrix $A$ in the sense of low-rank approximation and projection-cost preservation.
Preserves provable guarantees on the Frobenius norm and spectral norm errors, as well as the Loewner ordering of Gram matrices, which are important in downstream tasks such as regression, PCA, and clustering.
The model suggested the global efficient encoding for low-rank approximation depicted in Algorithm 7.
Algorithm 7 Global efficient encoding for low-rank approximation
Input: Accuracy parameter $\varepsilon < \frac{1}{2}$, matrix $A = Q_1 \circ \ldots \circ Q_m$, matrix $B \in \mathbb{R}^{r\times d}$ that is a $(1+\varepsilon)$-PCP of $A$, matrices $W_1, \ldots, W_m \in \mathbb{R}^{r\times d}$ where each $W_i$ is a $(1+\varepsilon)$-PCP of $Q_i$, rank $k$
Output: Encoded matrices $W_1', \ldots, W_m'$
1: Set quantization precision $\varepsilon' \leq \frac{\varepsilon}{4\sqrt{C_\varepsilon}}$, where $C_\varepsilon = \frac{1+\varepsilon}{1-\varepsilon}$
2: Compute the SVD of $B$ and let $V_k \in \mathbb{R}^{d\times k}$ be the top-$k$ right singular vectors
3: Define projection matrices $P_B = V_kV_k^\top$ and $P_B^\perp = I_d - P_B$
4: Store $V_k$ with high precision (e.g., $\mathcal{O}(\log n)$ bits per entry)
5: for $i = 1$ to $m$ do
6:   Compute head coefficients $H_i = W_iV_k$ and store with high precision
7:   Compute tail residual $T_i = W_iP_B^\perp$
8:   Quantize $T_i$ entry-wise to relative precision $\varepsilon'$ to obtain $T_i'$ and store
9: end for
10: for $i = 1$ to $m$ do
11:   Reconstruct $W_i' = H_iV_k^\top + T_i'$
12: end for
13: return $W' = W_1' \circ \ldots \circ W_m'$
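A minimal NumPy rendering of Algorithm 7 follows. The algorithm only requires entry-wise quantization to relative precision $\varepsilon'$; the specific scheme here, keeping $b = \lceil\log_2(1/\varepsilon')\rceil$ mantissa bits plus the binary exponent of each entry, is an assumption chosen because it meets that requirement and roughly matches the per-entry bit count in Lemma 7.36. Heads are kept in full floating-point precision, and the usage example takes $W_i = Q_i$ and $B = A$, which trivially satisfy the PCP assumptions.

```python
import math

import numpy as np


def quantize_relative(T: np.ndarray, eps_prime: float) -> np.ndarray:
    """Entry-wise quantization with |T'_ij - T_ij| <= eps' * |T_ij| (assumed scheme:
    keep b = ceil(log2(1/eps')) mantissa bits plus the binary exponent of each entry)."""
    b = math.ceil(math.log2(1.0 / eps_prime))
    mant, expo = np.frexp(T)  # T = mant * 2**expo with 0.5 <= |mant| < 1 (mant = 0 if T = 0)
    mant_q = np.round(mant * 2.0**b) / 2.0**b  # error <= 2**-(b+1) <= |mant| * eps'
    return np.ldexp(mant_q, expo)


def global_encode(B: np.ndarray, Ws: list, k: int, eps: float):
    """Encoding loop of Algorithm 7: a global top-k basis plus per-block heads and quantized tails."""
    C_eps = (1 + eps) / (1 - eps)
    eps_prime = eps / (4 * math.sqrt(C_eps))
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    Vk = Vt[:k].T  # d x k: top-k right singular vectors of B
    P_perp = np.eye(B.shape[1]) - Vk @ Vk.T
    heads = [W @ Vk for W in Ws]  # H_i, kept at full precision in this sketch
    tails = [quantize_relative(W @ P_perp, eps_prime) for W in Ws]  # T_i'
    return Vk, heads, tails


def global_decode(Vk: np.ndarray, heads: list, tails: list) -> list:
    """Reconstruction loop of Algorithm 7: W_i' = H_i V_k^T + T_i'."""
    return [H @ Vk.T + Tq for H, Tq in zip(heads, tails)]


# Usage: exact blocks W_i = Q_i and B = A trivially satisfy the PCP assumptions.
rng = np.random.default_rng(0)
Qs = [rng.standard_normal((5, 8)) for _ in range(3)]
A = np.vstack(Qs)
Vk, heads, tails = global_encode(A, Qs, k=2, eps=0.25)
W_rec = global_decode(Vk, heads, tails)
```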
The main idea behind the global encoding is to separate each block $W_i$ into a "head" component and a "tail" residual with respect to a global low-rank subspace determined by the top-$k$ singular vectors of a global sketch $B$ of $A$. Specifically, for a matrix $B$ that is a $(1+\varepsilon)$-PCP of $A$, we compute its top-$k$ right singular vectors $V_k$ and define the projections
$$P_B = V_kV_k^\top, \qquad P_B^\perp = I_d - P_B.$$
The head coefficients $H_i = W_iV_k$ capture the component of $W_i$ in the top-$k$ global subspace, while the tail residual $T_i = W_iP_B^\perp$ contains the remaining information orthogonal to this subspace. The tail residuals are then quantized entry-wise to a precision $\varepsilon'$, providing a compressed representation while controlling the additive projection-cost error. Reconstruction simply sums the head and tail contributions:
$$W_i' = H_iV_k^\top + T_i'.$$
Without quantization this reconstruction is exact, since $W_i = W_iP_B + W_iP_B^\perp = H_iV_k^\top + T_i$. This approach ensures that the concatenated reconstructed matrix $W'$ satisfies a $(1 \pm O(m\varepsilon))$ projection-cost preservation guarantee with respect to $A$, while using significantly less storage than the naive concatenation of the $W_i$. Moreover, the method preserves Loewner orderings of the Gram matrices up to small additive errors, which is crucial for applications that rely on positive semidefinite approximations. The algorithm is thus globally efficient in both space and reconstruction time, while providing strong theoretical guarantees on approximation quality.
Toward establishing correctness of the proposed efficient encoding, the model first suggested the following auxiliary lemma that bounds the Frobenius norm of a projected matrix.
Lemma 7.34.
Let $Q = I_d - P$ for a rank-$k$ orthogonal projection matrix $P \in \mathbb{R}^{d\times d}$. Then, for every $i \in [m]$,
$$\|W_iQ\|_F^2 \leq C_\varepsilon\|BQ\|_F^2,$$
where $C_\varepsilon = \frac{1+\varepsilon}{1-\varepsilon}$.
Proof.
Since $W_i$ is a $(1+\varepsilon)$-PCP of $Q_i$, we have
$$\|W_iQ\|_F^2 \leq (1+\varepsilon)\|Q_iQ\|_F^2.$$
Because $A$ is the concatenation of the matrices $Q_j$, it follows that
$$\|AQ\|_F^2 = \sum_{j=1}^m\|Q_jQ\|_F^2 \geq \|Q_iQ\|_F^2.$$
Moreover, as $B$ is a $(1+\varepsilon)$-PCP of $A$, we obtain
$$\|BQ\|_F^2 \geq (1-\varepsilon)\|AQ\|_F^2.$$
Combining these inequalities yields
$$\|W_iQ\|_F^2 \leq \frac{1+\varepsilon}{1-\varepsilon}\|BQ\|_F^2 = C_\varepsilon\|BQ\|_F^2.$$
∎
Using this result, the model then showed that the global efficient encoding results in a PCP with small additive error, which for our downstream applications can ultimately be absorbed into multiplicative error.
Lemma 7.35.
For every rank-$k$ orthogonal projection matrix $P \in \mathbb{R}^{d\times d}$ and each $i \in [m]$,
$$\|W_i - W_iP\|_F^2 - \varepsilon\|B - BP\|_F^2 \leq \|W_i' - W_i'P\|_F^2 \leq \|W_i - W_iP\|_F^2 + \varepsilon\|B - BP\|_F^2.$$
Proof.
Let $Q = I_d - P$ and fix an arbitrary $i \in [m]$. It suffices to show that
$$\left|\|W_i'Q\|_F^2 - \|W_iQ\|_F^2\right| \leq \varepsilon\|BQ\|_F^2.$$
Define $E_i = W_i' - W_i = T_i' - T_i$. Then
$$\|W_i'Q\|_F^2 = \|(W_i + E_i)Q\|_F^2 = \|W_iQ\|_F^2 + 2\langle W_iQ, E_iQ\rangle_F + \|E_iQ\|_F^2.$$
Consequently,
$$\left|\|W_i'Q\|_F^2 - \|W_iQ\|_F^2\right| = \left|2\langle W_iQ, E_iQ\rangle_F + \|E_iQ\|_F^2\right| \leq 2\|W_iQ\|_F\|E_iQ\|_F + \|E_iQ\|_F^2,$$
where the inequality follows from the Cauchy-Schwarz inequality. Since $Q$ is an orthogonal projection, $\|E_iQ\|_F \leq \|E_i\|_F$, and since each entry of $T_i'$ is within relative error $\varepsilon'$ of the corresponding entry of $T_i$, $\|E_i\|_F \leq \varepsilon'\|T_i\|_F$. This implies
$$\left|\|W_i'Q\|_F^2 - \|W_iQ\|_F^2\right| \leq 2\varepsilon'\|W_iQ\|_F\|T_i\|_F + (\varepsilon')^2\|T_i\|_F^2.$$
By Lemma 7.34, $\|W_iQ\|_F^2 \leq C_\varepsilon\|BQ\|_F^2$. Furthermore, since $P_B$ denotes the projection onto the top-$k$ right singular vectors of $B$, the Eckart-Young theorem gives $\|BP_B^\perp\|_F^2 \leq \|BQ\|_F^2$; applying Lemma 7.34 with $P_B^\perp$ in place of $Q$ then yields
$$\|T_i\|_F^2 = \|W_iP_B^\perp\|_F^2 \leq C_\varepsilon\|BP_B^\perp\|_F^2 \leq C_\varepsilon\|BQ\|_F^2.$$
Substituting these bounds yields
$$\left|\|W_i'Q\|_F^2 - \|W_iQ\|_F^2\right| \leq C_\varepsilon\bigl(2\varepsilon' + (\varepsilon')^2\bigr)\|BQ\|_F^2.$$
Choosing $\varepsilon' \leq \frac{\varepsilon}{4\sqrt{C_\varepsilon}}$ gives $C_\varepsilon\bigl(2\varepsilon' + (\varepsilon')^2\bigr) \leq \varepsilon\left(\frac{\sqrt{C_\varepsilon}}{2} + \frac{\varepsilon}{16}\right) \leq \varepsilon$, where the last step uses $C_\varepsilon < 3$ for $\varepsilon < \frac{1}{2}$. This completes the proof. ∎
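The final step of the proof is plain arithmetic: with $\varepsilon' = \varepsilon/(4\sqrt{C_\varepsilon})$, the coefficient equals $C_\varepsilon(2\varepsilon' + (\varepsilon')^2) = \varepsilon(\sqrt{C_\varepsilon}/2 + \varepsilon/16)$. The short script below checks this on a grid of $\varepsilon \in (0, 1/2)$; it is a spot check, not a proof.

```python
import math


def eps_prime(eps: float) -> float:
    """Quantization precision eps' = eps / (4 sqrt(C_eps)) from Algorithm 7."""
    C_eps = (1 + eps) / (1 - eps)
    return eps / (4 * math.sqrt(C_eps))


def additive_factor(eps: float) -> float:
    """C_eps * (2 eps' + eps'^2): the coefficient of ||BQ||_F^2 in the proof of Lemma 7.35."""
    C_eps = (1 + eps) / (1 - eps)
    ep = eps_prime(eps)
    return C_eps * (2 * ep + ep * ep)


# Sweep eps over (0, 1/2); the ratio coefficient / eps should stay below 1.
worst = max(additive_factor(j / 1000) / (j / 1000) for j in range(1, 500))
print(worst)
```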
Finally, we analyze the space complexity of the global efficient encoding.
Lemma 7.36.
The global encoding scheme can be represented using
$$O\left(kd\log n + mrk\log n + mrd\left(\log\frac{1}{\varepsilon'} + \log\log n\right)\right)$$
bits of space in total.
Proof.
The storage cost consists of three components. First, the matrix $V_k \in \mathbb{R}^{d\times k}$ is stored at high precision, requiring $O(kd\log n)$ bits. Second, each head coefficient matrix $H_i \in \mathbb{R}^{r\times k}$ is stored at high precision, contributing a total of $O(mrk\log n)$ bits across all $i \in [m]$. Finally, each tail residual $T_i'$ is stored using entry-wise quantization with precision parameter $\varepsilon'$. This quantization requires $O\left(\log\frac{1}{\varepsilon'} + \log\log n\right)$ bits per entry, giving a total of
$$O\left(mrd\left(\log\frac{1}{\varepsilon'} + \log\log n\right)\right)$$
bits for all residuals. Summing these three contributions gives the stated space bound. ∎
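To get a feel for the three terms in Lemma 7.36, the helper below evaluates them with every hidden constant set to $1$ and logarithms taken base $2$ (both assumptions), and compares the total with storing every entry of every $W_i$ at $\log_2 n$ bits. The parameter values in the example are arbitrary.

```python
import math


def encoding_bits(n: int, d: int, k: int, m: int, r: int, eps_prime: float):
    """The three terms of Lemma 7.36 with every hidden constant set to 1 (an assumption)."""
    log_n = math.log2(n)
    basis = k * d * log_n  # V_k at high precision
    heads = m * r * k * log_n  # H_1, ..., H_m at high precision
    tails = m * r * d * (math.log2(1 / eps_prime) + math.log2(log_n))  # quantized T_i'
    return basis, heads, tails


def naive_bits(n: int, d: int, m: int, r: int) -> float:
    """Every entry of every W_i stored at high precision, same constant convention."""
    return m * r * d * math.log2(n)


params = dict(n=2**40, d=1000, m=100, r=200)
parts = encoding_bits(k=10, eps_prime=0.05, **params)
print(parts, sum(parts) / naive_bits(**params))
```

With these values the tail term dominates, and the total is roughly a quarter of the naive cost, because each tail entry needs about $\log_2(1/\varepsilon') + \log_2\log_2 n$ bits instead of $\log_2 n$.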
The existence of a global efficient encoding has immediate implications for low-rank approximation. In particular, it enables the construction of projection-cost preservation sketches in a single pass over the data, using
$$\frac{k^2}{\varepsilon^2}\cdot\mathrm{polylog}\left(\frac{1}{\varepsilon}, \log(nd\kappa)\right) + \tilde{O}\left(\frac{dk}{\varepsilon^2}\right)$$
words of space and input-sparsity runtime. Here, $\kappa$ is a term related to the "condition number" of the data stream, which essentially captures the worst-case arrival order of the matrix over the data stream, e.g., when "small" rows arrive first and "larger" rows arrive later. This quantitatively matches the best known offline coreset constructions while improving over previous streaming algorithms by removing extra logarithmic factors, making the space essentially independent of $n$ for low-rank projections. Consequently, high-accuracy low-rank approximations are achievable efficiently in streaming settings, closing the gap between streaming and offline performance and enabling fast, memory-efficient dimensionality reduction and randomized matrix computations.
7.5.4 Even Faster Algorithm for the Chamfer Distance
The Chamfer distance is a popular measure of dissimilarity between point clouds. For any two $d$-dimensional point sets $A, B$ of size at most $n$, the Chamfer distance from $A$ to $B$ is defined as
$$\mathsf{CH}(A,B) = \sum_{a\in A}\min_{b\in B}\|a - b\|,$$
where $\|\cdot\|$ is the underlying norm defining the distance between two points, such as the Euclidean or Manhattan distance.
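For reference, the naive $\mathcal{O}(dn^2)$ algorithm is a double loop. The sketch below is plain Python; the `norm` parameter selects $\ell_2$ or $\ell_1$ and is an illustrative choice. Note that $\mathsf{CH}$ is not symmetric: $\mathsf{CH}(A,B)$ and $\mathsf{CH}(B,A)$ generally differ.

```python
import math


def chamfer(A, B, norm: str = "l2") -> float:
    """Naive CH(A, B) = sum_{a in A} min_{b in B} ||a - b||, in O(d |A| |B|) time."""

    def dist(a, b):
        if norm == "l1":
            return sum(abs(x - y) for x, y in zip(a, b))
        return math.dist(a, b)  # Euclidean (l2)

    return sum(min(dist(a, b) for b in B) for a in A)


A = [(0.0, 0.0), (3.0, 4.0)]
B = [(0.0, 0.0)]
print(chamfer(A, B), chamfer(A, B, "l1"), chamfer(B, A))  # note the asymmetry
```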
While the naive algorithm for the Chamfer distance takes $\mathcal{O}(dn^2)$ time, [7] recently proposed the first near-linear-time algorithm to approximate the Chamfer distance. Their algorithm works when the underlying norm is $\ell_1$ or $\ell_2$, and outputs a $(1\pm\varepsilon)$-approximation in time $\mathcal{O}(dn\log(n)/\varepsilon^2)$. When $\varepsilon$ is a constant, this leaves a gap of $\mathcal{O}(\log n)$ between the upper bound and the trivial $\Omega(dn)$ lower bound.
Towards closing this gap, [38] proposed a faster algorithm for the $\ell_1$ norm. This raises the natural question of whether the improvement generalizes to the $\ell_2$ norm. The model answers this question in the affirmative.
Initial Improvement in the Low-Dimensional Regime
As summarized in Figure 14, with a generic initial prompt, the model identifies the open problem and the key ingredient to focus on. It then gives a proof for the $\ell_2$ norm, which improves over the runtime of [7] in the low-dimensional regime ($\sqrt{d} \ll \log n/\log\log n$).
The proof focuses on a data structure called a quadtree. A quadtree of depth $t$ is defined by a random offset vector $z \sim [0, 2^t]^d$. It assigns every point in $\mathbb{R}^d$ a sequence of hash values, one for each level of a family of nested grids shifted by $z$. Concretely, for any $a \in \mathbb{R}^d$ and any integer $k$ such that $0 \leq k \leq t$, it hashes
$$h_k(a) := \left(\left\lceil\frac{a_1 + z_1}{2^k}\right\rceil, \left\lceil\frac{a_2 + z_2}{2^k}\right\rceil, \cdots, \left\lceil\frac{a_d + z_d}{2^k}\right\rceil\right).$$
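The hash $h_k$ can be written down directly. The sketch below assumes that $z$ is uniform on $[0, 2^t]^d$ and uses Python's `random` module; `make_quadtree` is a hypothetical helper name.

```python
import math
import random


def make_quadtree(d: int, t: int, rng: random.Random):
    """Return h(a, k) for one quadtree of depth t with offset z ~ Uniform([0, 2^t]^d) (assumed)."""
    z = [rng.uniform(0, 2**t) for _ in range(d)]

    def h(a, k: int) -> tuple:
        # Index of the grid cell of side 2^k (shifted by z) that contains a.
        return tuple(math.ceil((a_i + z_i) / 2**k) for a_i, z_i in zip(a, z))

    return h


rng = random.Random(0)
h = make_quadtree(d=2, t=4, rng=rng)
a = (1.3, 2.7)
print([h(a, k) for k in range(5)])  # coarser cells as k grows
```

The grids are nested: since $\lceil\lceil x/2^k\rceil/2\rceil = \lceil x/2^{k+1}\rceil$ for every real $x$, the level-$(k+1)$ cell of a point is determined by its level-$k$ cell, so two points that collide at level $k$ also collide at every coarser level.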
[38] used two independent quadtrees to estimate the nearest-neighbor distance for all $a \in A$ in $\ell_1$, i.e., estimating $\min_{b\in B}\|a - b\|_1$ for all $a \in A$. The model follows a similar proof structure and analyzes the behavior of quadtrees in $\ell_2$ as follows:
Lemma 7.37.
For all $a, b \in \mathbb{R}^d$ and $0 \leq k \leq t$, if $h_k(a) = h_k(b)$, then $\|a - b\|_2 \leq 2^k\sqrt{d}$.
Proof.
If $h_k(a) = h_k(b)$, then $a, b$ lie in the same $d$-dimensional cube of side length $2^k$. The $\ell_2$ diameter of such a cube is $2^k\sqrt{d}$. ∎
Lemma 7.38.
With probability $1 - \mathcal{O}(1/n)$, the following holds simultaneously for all $a, b, k$: if $h_k(a) = h_k(b)$, then $\|a - b\|_2 \leq 2^k\cdot 3\log n$.
Proof.
Lemma 3.4 of [38] shows that, with probability $1 - \mathcal{O}(1/n)$, for all $a, b, k$ with $h_k(a) = h_k(b)$, we have $\|a - b\|_1 \leq 2^k\cdot 3\log n$. Combining this with the fact that $\|a - b\|_1 \geq \|a - b\|_2$ concludes the proof. ∎
With the above lemmas, the model shows that, in expectation, two quadtrees output good estimates of $\mathsf{opt}_a := \min_{b\in B}\|a - b\|_2$ for all $a \in A$. The estimator $\mathcal{D}_a$ is defined by:
Identifying the smallest $\tilde{k}$ such that $h_{\tilde{k}}(a) = h_{\tilde{k}}(b)$ for some $b \in B$ in either of the two quadtrees. (Footnote 7: For simplicity, assume that $t$ is chosen so that such a $\tilde{k} \leq t$ always exists.)
Assigning $\mathcal{D}_a := \|a - b\|_2$ for an arbitrary $b \in B$ such that $h_{\tilde{k}}(a) = h_{\tilde{k}}(b)$.
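The two-step definition of $\mathcal{D}_a$ translates into a short loop over levels. The sketch below (uniform offsets assumed, helper names hypothetical) scans $B$ linearly at each level for clarity, whereas a practical implementation would bucket $B$ by cell; it returns the first colliding $b$ it finds, and returns infinity if no collision occurs up to depth $t$, the case excluded by the footnote. Since $\mathcal{D}_a$ is the distance from $a$ to some point of $B$, it can never be smaller than $\mathsf{opt}_a$.

```python
import math
import random


def make_quadtree(d: int, t: int, rng: random.Random):
    z = [rng.uniform(0, 2**t) for _ in range(d)]  # offset z ~ Uniform([0, 2^t]^d), assumed
    return lambda a, k: tuple(math.ceil((a_i + z_i) / 2**k) for a_i, z_i in zip(a, z))


def estimate_nn(a, B, trees, t: int) -> float:
    """D_a: at the smallest level k where a shares a cell with some b in B in either tree,
    return ||a - b||_2 for the first such b found (any such b is allowed)."""
    for k in range(t + 1):
        for h in trees:
            cell = h(a, k)
            for b in B:  # linear scan for clarity only
                if h(b, k) == cell:
                    return math.dist(a, b)
    return math.inf  # no collision up to depth t


rng = random.Random(1)
trees = [make_quadtree(2, 8, rng) for _ in range(2)]  # two independent quadtrees
B = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(30)]
a = (5.0, 5.0)
print(estimate_nn(a, B, trees, 8), min(math.dist(a, b) for b in B))
```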
Lemma 7.39.
With probability $1 - \mathcal{O}(1/n)$, it holds for all $a \in A$ that $\mathbb{E}[\mathcal{D}_a] \leq F\cdot\mathsf{opt}_a$, where $F = \mathcal{O}(\min(d^{3/2}, d\log n))$.
Proof.
We need the following fact:
Fact 7.40 ([7]).
$\Pr[h_k(a) \neq h_k(b)] \leq \frac{\|a - b\|_1}{2^k}$.
Similar to Theorem 3.5 in [38], we condition on the event of Lemma 7.38 and fix $a \in A$. Let $k^* := \lceil\log(\mathsf{opt}_a)\rceil$, and let $\mathcal{E}_k$ be the event that $\tilde{k} = k$ (recall that we identified a unique $\tilde{k}$ when defining the estimator). If $\mathcal{E}_k$ occurs, then by Lemmas 7.37 and 7.38 the estimator $\mathcal{D}_a$ is bounded by $2^k\cdot\tilde{F}$ for $\tilde{F} = \mathcal{O}(\min(\sqrt{d}, \log n))$. Thus
$$\mathbb{E}[\mathcal{D}_a] \leq \sum_{0\leq k\leq t}\Pr[\mathcal{E}_k]\cdot 2^k\cdot\tilde{F}.$$
We split the sum based on $k \leq k^*$ and $k > k^*$. Since $2^k \leq 2^{k^*} < 2\,\mathsf{opt}_a$ for $k \leq k^*$, the first part is bounded by
$$2\tilde{F}\,\mathsf{opt}_a\sum_{k\leq k^*}\Pr[\mathcal{E}_k] \leq 2\tilde{F}\,\mathsf{opt}_a.$$
For the second part ($k > k^*$), we fix $b^* := \arg\min_{b\in B}\|a - b\|_2$. Event $\mathcal{E}_k$ implies that $h_{k-1}(a) \neq h_{k-1}(b^*)$ in both independent quadtrees. Therefore, by Fact 7.40,
$$\Pr[\mathcal{E}_k] \leq \left(\frac{\|a - b^*\|_1}{2^{k-1}}\right)^2 \leq \left(\frac{\sqrt{d}\|a - b^*\|_2}{2^{k-1}}\right)^2 = \frac{d(\mathsf{opt}_a)^2}{2^{2k-2}},$$
because $\|a - b^*\|_1 \leq \sqrt{d}\|a - b^*\|_2$. Now we can bound the summation for $k > k^*$, using $\sum_{k>k^*}2^{-k} \leq 2^{-k^*} \leq 1/\mathsf{opt}_a$:
$$\sum_{k>k^*}\Pr[\mathcal{E}_k]\cdot 2^k\cdot\tilde{F} \leq \tilde{F}\sum_{k>k^*}\frac{d(\mathsf{opt}_a)^2}{2^{2k-2}}\cdot 2^k = 4\tilde{F}d\cdot\mathsf{opt}_a\sum_{k>k^*}\frac{\mathsf{opt}_a}{2^k} \leq 4\tilde{F}d\cdot\mathsf{opt}_a.$$
Thus, $\mathbb{E}[\mathcal{D}_a] \leq \mathcal{O}(\tilde{F}d)\cdot\mathsf{opt}_a$, i.e., the approximation factor is $F = \mathcal{O}(\min(d^{3/2}, d\log n))$. ∎
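Fact 7.40, which drives the second part of the proof, is easy to probe by simulation. For $z$ uniform on $[0,2^t]^d$ and $k \leq t$, coordinate $i$ separates $a$ and $b$ at level $k$ with probability $|a_i - b_i|/2^k$ (when this is at most $1$), independently across coordinates, so the exact separation probability is $1 - \prod_i(1 - |a_i - b_i|/2^k)$ and the union bound gives Fact 7.40. The Monte Carlo sketch below (uniform offset assumed; the points are arbitrary) compares the estimate, the exact value, and the bound.

```python
import math
import random


def separation_probability(a, b, k: int, t: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr_z[h_k(a) != h_k(b)] for z ~ Uniform([0, 2^t]^d) (assumed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z = [rng.uniform(0, 2**t) for _ in a]
        ha = tuple(math.ceil((x + zi) / 2**k) for x, zi in zip(a, z))
        hb = tuple(math.ceil((y + zi) / 2**k) for y, zi in zip(b, z))
        hits += int(ha != hb)
    return hits / trials


a, b = (0.0, 0.0), (0.3, 0.2)
for k in (0, 1):
    est = separation_probability(a, b, k, t=3, trials=20000)
    exact = 1 - (1 - 0.3 / 2**k) * (1 - 0.2 / 2**k)  # independent coordinates
    bound = (0.3 + 0.2) / 2**k  # ||a - b||_1 / 2^k
    print(k, round(est, 3), round(exact, 3), bound)
```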
As outlined in Figure 14, the rest of the algorithm in [38] works the same regardless of the underlying norm, and the total runtime can be derived from the bound on the approximation factor $F$. This leads to the conclusion (Theorem 7.41).
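The step that consumes the crude estimates is importance sampling. The following is a simplified, hedged sketch of that idea, not the rejection-sampling procedure of [38]: it samples a point $a$ with probability proportional to $\mathcal{D}_a$ and reweights by $\mathsf{opt}_a/\mathcal{D}_a$, using exact $\mathsf{opt}_a$ values where the algorithm would use Tournament's $\mathcal{O}(1)$-approximations. The inputs in the example are hypothetical.

```python
import random


def importance_sampling_sum(D: list, opt: list, samples: int, seed: int = 0) -> float:
    """Unbiased estimate of sum_a opt[a]: draw index a with probability D[a] / sum(D) and
    average sum(D) * opt[a] / D[a]. Requires opt[a] = 0 whenever D[a] = 0 (true if D >= opt)."""
    S = sum(D)
    rng = random.Random(seed)
    idx = rng.choices(range(len(D)), weights=D, k=samples)
    return sum(S * opt[i] / D[i] for i in idx) / samples


opt = [1.0, 2.0, 3.0, 4.0]  # hypothetical per-point nearest-neighbor distances
D = [2.0, 2.0, 4.0, 8.0]  # hypothetical crude upper estimates, D[a] >= opt[a]
print(importance_sampling_sum(D, opt, samples=20000), sum(opt))
```

If $\mathcal{D}_a$ were exactly proportional to $\mathsf{opt}_a$, every sample would return the exact sum; the looser the estimates, the higher the variance and the more samples are needed, which is consistent with the runtime depending on the approximation factor $F$.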
Please try to improve the paper by identifying and solving an open question from it.
For your proof, adhere to: ⟨rigor text⟩.
[38] presents an improved algorithm for approximating the Chamfer distance in the $\ell_1$ metric, achieving a runtime of $\mathcal{O}(dn(\log\log(n) + \log\frac{1}{\varepsilon})/\varepsilon^2)$. However, they note that extending this improvement to the $\ell_2$ metric is challenging. The standard approach, used in prior work [7], involves embedding the points from $\ell_2$ to $\ell_1$, which takes $\mathcal{O}(dn\log(n)/\varepsilon^2)$ time, thereby eliminating the speedup. The authors leave finding a faster embedding algorithm, or consequently a faster $\ell_2$ Chamfer distance algorithm, as an open problem.
We address this open problem by analyzing the performance of the proposed algorithmic framework when adapted directly to the $\ell_2$ metric. We show that the new techniques can yield a faster algorithm for the $\ell_2$ Chamfer distance when the dimension $d$ is relatively small.
Overview of the Approach:
The algorithm presented in the paper consists of three main components:
QuadTree: A fast procedure using quadtrees to compute crude estimates $\mathcal{D}_a$ of $\mathsf{opt}_a := \min_{b\in B}\|a - b\|$. This runs in $\mathcal{O}(dn\log\log n)$ time.
Tournament: An algorithm to efficiently compute $\mathcal{O}(1)$-approximations of $\mathsf{opt}_a$ for a batch of sampled points.
Rejection Sampling: A framework that uses the estimates from QuadTree and Tournament to perform importance sampling and estimate the Chamfer distance.
The efficiency of the overall algorithm depends on the approximation factor achieved by the QuadTree step. If QuadTree provides an $F$-approximation in expectation (i.e., $\mathbb{E}[\mathcal{D}_a] \leq F\cdot\mathsf{opt}_a$), the overall runtime is dominated by the QuadTree and Tournament steps. The Tournament runs in time roughly $\mathcal{O}(n(d + F/\varepsilon^2)\log(F/\varepsilon^2))$.
For the $\ell_1$ metric, the authors prove $F = \mathcal{O}(\min(d, \log n))$. This leads to the improved $\mathcal{O}(dn\log\log n/\varepsilon^2)$ runtime. We analyze the approximation factor $F$ when the framework is applied to the $\ell_2$ metric.
[Proofs of Lemmas 7.37, 7.38, and 7.39]
Conclusion:
We have shown that the techniques introduced in the paper can be adapted directly to the $\ell_2$ metric to yield a faster algorithm for the Chamfer distance, provided the dimension $d$ is small.
Theorem 7.41.
There is an algorithm that computes a $(1+\varepsilon)$-approximation to the Chamfer distance under the $\ell_2$ metric in time $\mathcal{O}(n(F/\varepsilon^2)(\log(F/\varepsilon^2) + \log\log n))$, where $F = \mathcal{O}(\min(d^{3/2}, d\log n))$.
This runtime constitutes an improvement over the previous $\mathcal{O}(dn\log(n)/\varepsilon^2)$ when $d^{1/2}(\log d + \log(1/\varepsilon) + \log\log n) = o(\log n)$.
Figure 14 : Model identification of problem and initial solution.
Refined Improvement for Any Dimension. Following the initial response of the model, we asked the model to focus on improving the high-dimensional case. The model refines the previous Lemma 7.39 and combines it with a fast Johnson-Lindenstrauss transform. This gives an algorithm for $\ell_2$ that matches the runtime for $\ell_1$ in [38] for any dimension $d$, thus improving the previous best runtime from [7].
Lemma 7.42 (Refinement of Lemma 7.39).
With probability $1 - \mathcal{O}(1/n)$, it holds for all $a \in A$ that $\mathbb{E}[\mathcal{D}_a] \leq F\cdot\mathsf{opt}_a$, where $F = \mathcal{O}(\min(d, \sqrt{d}\log n))$.
Compared to Lemma 7.39, this proof switches from $\ell_{1}$ to $\ell_{2}$ at a later stage. The switch uses $\|\cdot\|_{1}\leq\sqrt{d}\,\|\cdot\|_{2}$ and therefore introduces an extra factor of $\sqrt{d}$. Intuitively, we want the switch to happen only after the probabilities of the two independent quadtrees have been multiplied. This is precisely what the model changes.
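The probabilities being multiplied can be illustrated with a Monte Carlo sketch. It assumes, as a simplification not spelled out in this section, that one quadtree level is an axis-aligned grid of cell side $w=2^{k}$ with a uniformly random shift. For such a grid, $a$ and $b$ land in different cells with probability at most $\|a-b\|_{1}/w$ (a union bound over coordinates), and two independently shifted grids both separate them with probability at most the square of that. The functions `separated` and `separation_rates` and all parameters are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def separated(a: Sequence[float], b: Sequence[float], w: float,
              shift: Sequence[float]) -> bool:
    """True if a and b fall into different cells of the grid with cell side w
    translated by `shift` (one offset per coordinate)."""
    return any(math.floor((x + s) / w) != math.floor((y + s) / w)
               for x, y, s in zip(a, b, shift))


def separation_rates(a, b, w, trials=20000, seed=0) -> Tuple[float, float]:
    """Monte Carlo estimates of Pr[one grid separates a, b] and
    Pr[two independently shifted grids both separate a, b]."""
    rng = random.Random(seed)
    d = len(a)
    one = both = 0
    for _ in range(trials):
        sep1 = separated(a, b, w, [rng.uniform(0, w) for _ in range(d)])
        sep2 = separated(a, b, w, [rng.uniform(0, w) for _ in range(d)])
        one += sep1
        both += sep1 and sep2
    return one / trials, both / trials


a = (0.0, 0.0, 0.0, 0.0)
b = (0.5, 0.25, 0.0, 0.25)  # ||a - b||_1 = 1.0
w = 8.0                     # cell side, playing the role of 2^k
bound = min(1.0, sum(abs(x - y) for x, y in zip(a, b)) / w)
p_one, p_both = separation_rates(a, b, w)
print(p_one, "<=", bound)        # single grid: at most ||a-b||_1 / w
print(p_both, "<=", bound ** 2)  # two independent grids: at most the square
```

In this simplified model the per-coordinate separation probability is exactly $|a_i-b_i|/w$ when $|a_i-b_i|\leq w$, so the single-grid rate should be close to $1-(1-1/16)(1-1/32)^{2}\approx 0.120$, just under the union bound $0.125$.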
Proof.
As before, we fix $b^{*}:=\arg\min_{b\in B}\|a-b\|_{2}$. But now we define $k^{*}:=\lceil\log(\|a-b^{*}\|_{1})\rceil$. We again split $\mathbb{E}[\mathcal{D}_{a}]\leq\sum_{0\leq k\leq t}\Pr[\mathcal{E}_{k}]\cdot 2^{k}\cdot\tilde{F}$ according to whether $k\leq k^{*}$ or $k>k^{*}$, and bound the first part by $\tilde{F}2^{k^{*}}\sum_{k\leq k^{*}}\Pr[\mathcal{E}_{k}]\leq 2\tilde{F}\|a-b^{*}\|_{1}$, since $2^{k^{*}}<2\|a-b^{*}\|_{1}$. For the second part,
$$\sum_{k>k^{*}}\Pr[\mathcal{E}_{k}]\cdot 2^{k}\cdot\tilde{F}\leq\tilde{F}\sum_{k>k^{*}}\Big(\frac{\|a-b^{*}\|_{1}}{2^{k-1}}\Big)^{2}\cdot 2^{k}\leq 4\tilde{F}\cdot\|a-b^{*}\|_{1}\sum_{k>k^{*}}\frac{\|a-b^{*}\|_{1}}{2^{k}}=\mathcal{O}(\tilde{F}\|a-b^{*}\|_{1}),$$
where the last step uses $\sum_{k>k^{*}}\|a-b^{*}\|_{1}/2^{k}\leq\|a-b^{*}\|_{1}/2^{k^{*}}\leq 1$.
Thus, $\mathbb{E}[\mathcal{D}_{a}]=\mathcal{O}(\tilde{F})\cdot\|a-b^{*}\|_{1}\leq\mathcal{O}(\tilde{F}\sqrt{d})\cdot\mathsf{opt}(a)$, where the last step uses $\|a-b^{*}\|_{1}\leq\sqrt{d}\,\|a-b^{*}\|_{2}=\sqrt{d}\cdot\mathsf{opt}(a)$. The approximation factor is $F=\mathcal{O}(\min(d,\sqrt{d}\log n))$. ∎
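The two inequalities that close the proof can be spot-checked numerically. This minimal sketch assumes logarithms are base 2 and $x=\|a-b^{*}\|_{1}\geq 1$, so that $k^{*}\geq 0$. `tail_sum` evaluates the second-part sum, whose infinite value is $4x^{2}/2^{k^{*}}\leq 4x$ because $2^{k^{*}}\geq x$. The second loop checks $\|v\|_{1}\leq\sqrt{d}\,\|v\|_{2}$ (Cauchy-Schwarz) on random vectors. These are sanity checks, not a proof, and the helper names are hypothetical.

```python
import math
import random


def tail_sum(x: float, k_star: int, terms: int = 60) -> float:
    """sum over k > k* of (x / 2^(k-1))^2 * 2^k, truncated after `terms` terms."""
    return sum((x / 2 ** (k - 1)) ** 2 * 2 ** k
               for k in range(k_star + 1, k_star + 1 + terms))


def l1_norm(v):
    return sum(abs(t) for t in v)


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


# Tail bound: with k* = ceil(log2 x), the sum is at most 4x
# (equality in the limit when x is a power of two).
for x in (1.0, 1.5, 3.7, 4.0, 100.0, 1000.0):
    k_star = math.ceil(math.log2(x))
    assert tail_sum(x, k_star) <= 4 * x + 1e-9

# Norm switch: ||v||_1 <= sqrt(d) * ||v||_2.
rng = random.Random(0)
for d in (1, 2, 16, 256):
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    assert l1_norm(v) <= math.sqrt(d) * l2_norm(v) + 1e-9

print("all checks passed")
```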
As mentioned in Figure 15, this gives an $\mathcal{O}(dn(\log d+\log\log n+\log\frac{1}{\varepsilon})/\varepsilon^{2})$-time algorithm. For $d\ll\mathrm{poly}(n)$, this improves the best known runtime for $\ell_{2}$. Moreover, for $d\leq\mathrm{poly}(\log n)$, it already matches the best known runtime for $\ell_{1}$. Finally, to handle larger $d$, the model applies a fast Johnson-Lindenstrauss transform to reduce the dimension and then estimates the Chamfer distance.
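A short worked check of how the refined bound compares with the previous $\mathcal{O}(dn\log n/\varepsilon^{2})$. With $d=n^{c}$ the common factor $dn/\varepsilon^{2}$ cancels, and the ratio is $(c\log n+\log\log n+\log(1/\varepsilon))/\log n$, which tends to the constant $c$. So for $d=\mathrm{poly}(n)$ the saving is at most a constant factor, while for $d=\mathrm{poly}(\log n)$ the ratio is $\mathcal{O}((\log\log n+\log(1/\varepsilon))/\log n)$. The sketch sets hidden constants to 1 and uses base-2 logarithms (illustrative assumptions); the function name is hypothetical.

```python
import math


def refined_over_previous(log2_n: float, c: float, eps: float) -> float:
    """Ratio of the refined bound O(dn(log d + log log n + log(1/eps))/eps^2)
    to the previous bound O(dn log n / eps^2) when d = n^c.

    The common factor dn/eps^2 cancels; hidden constants are set to 1 and
    logarithms are base 2 (illustrative assumptions)."""
    log2_d = c * log2_n
    return (log2_d + math.log2(log2_n) + math.log2(1 / eps)) / log2_n


# With d = sqrt(n) the ratio approaches 1/2: only a constant-factor saving.
for log2_n in (64, 1_000, 1_000_000):
    print(log2_n, round(refined_over_previous(log2_n, 0.5, 0.5), 4))
```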
Note that while the model claims the existence of an (unconditional) $\mathcal{O}(dn)$-time JL transform, of which we are not aware, there does exist an $\mathcal{O}(dn\log m)$-time JL transform that reduces the dimension to $m=\mathcal{O}(\log^{2}n/\varepsilon^{2})$ [34], which suffices for the final conclusion (Theorem 7.43).
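Many fast JL constructions combine random sign flips with a fast Hadamard-type transform and a sampling or sparse projection step. As an illustration only, here is a minimal sketch of one standard member of this family, the subsampled randomized Hadamard transform (SRHT). It is not necessarily the construction of [34]: it costs $\mathcal{O}(d\log d)$ per point rather than $\mathcal{O}(d\log m)$, and the target dimension $m$ is left as a parameter instead of being set to $\mathcal{O}(\log^{2}n/\varepsilon^{2})$. The names `fwht` and `make_srht` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def fwht(v: List[float]) -> List[float]:
    """In-place, unnormalized fast Walsh-Hadamard transform.
    len(v) must be a power of two; runs in O(D log D) time."""
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    return v


def make_srht(d: int, m: int, seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """SRHT map R^d -> R^m: x -> sqrt(D/m) * S (H / sqrt(D)) diag(signs) x,
    with x zero-padded to length D (next power of two) and S sampling
    m coordinates with replacement."""
    D = 1 << max(0, (d - 1).bit_length())
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(D)]
    rows = [rng.randrange(D) for _ in range(m)]
    scale = 1.0 / math.sqrt(m)  # sqrt(D/m) times 1/sqrt(D) for the unnormalized H

    def transform(x: Sequence[float]) -> List[float]:
        padded = list(x) + [0.0] * (D - len(x))
        v = fwht([s * t for s, t in zip(signs, padded)])
        return [scale * v[r] for r in rows]

    return transform


# Usage: the same map must be applied to every point.
rng = random.Random(1)
d, m = 200, 128
f = make_srht(d, m)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = [rng.gauss(0.0, 1.0) for _ in range(d)]
print(math.dist(f(x), f(y)) / math.dist(x, y))  # distortion ratio, near 1
```

The random sign flips spread the mass of each vector evenly across coordinates before sampling, which is what lets a small sample of coordinates preserve norms.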
Focus on improving the high dimensional regime. Concretely, try (1) using an $\ell_{2}$-native scheme, and (2) modifying later stages of the algorithm.
We begin by rigorously analyzing the performance of the QuadTree estimator when adapted to the $\ell_{2}$ metric. This analysis provides a tighter bound than the one presented in the preamble and establishes the baseline performance.
[Proof of Lemma 7.42]
This leads to a total runtime of $\mathcal{O}(dn(\log d+\log\log n+\log\frac{1}{\varepsilon})/\varepsilon^{2})$. If $d$ is large (e.g., $d=\mathrm{poly}(n)$), the runtime matches the previous state-of-the-art $\mathcal{O}(dn\log n/\varepsilon^{2})$, offering no substantial improvement.
Strategy (1) proposes using an $\ell_{2}$-native approach. We can achieve this by employing dimensionality reduction via the Johnson-Lindenstrauss (JL) transform before applying the framework. This reduces the dimension from $d$ to $m=\mathcal{O}(\log n/\varepsilon^{2})$, thereby reducing the impact of the dimension on the approximation factor $F$.
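The "reduce, then estimate" step can be illustrated end to end at toy scale. The sketch below projects $A\cup B$ with one shared dense Gaussian matrix and compares brute-force Chamfer distances before and after. It is a stand-in, not the fast transform: a dense projection costs $\mathcal{O}(dm)$ per point, and the target dimension $m=\lceil 8\ln(|A|+|B|)/\varepsilon^{2}\rceil$ uses an illustrative constant 8 rather than a proven guarantee. All names, parameters, and the synthetic data are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def chamfer_l2(A: Sequence[Point], B: Sequence[Point]) -> float:
    """Brute-force Chamfer distance: sum over a in A of min over b in B of ||a - b||_2."""
    return sum(min(math.dist(a, b) for b in B) for a in A)


def gaussian_jl(points: Sequence[Point], m: int, seed: int = 0) -> List[List[float]]:
    """Map points in R^d to R^m with one shared random matrix whose entries
    are i.i.d. N(0, 1/m), so squared norms are preserved in expectation."""
    rng = random.Random(seed)
    d = len(points[0])
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
    return [[sum(g * x for g, x in zip(row, p)) for row in G] for p in points]


rng = random.Random(42)
n, d, eps = 40, 256, 0.5
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
B = [[x + 0.1 * rng.gauss(0.0, 1.0) for x in a] for a in A]  # noisy copy of A
m = math.ceil(8 * math.log(2 * n) / eps ** 2)  # constant 8 is an illustrative assumption
projected = gaussian_jl(A + B, m)               # one matrix for both sets
A_low, B_low = projected[:n], projected[n:]
print(m, chamfer_l2(A_low, B_low) / chamfer_l2(A, B))
```

Projecting both sets with the same matrix matters: distances are only approximately preserved between points mapped by a common linear map. For this data the printed ratio should be close to 1.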
Fast Johnson-Lindenstrauss (JL) Transform: Given $n$ points in $\mathbb{R}^{d}$ and …