Can legal AI platforms provide useful insight into how the U.S. Supreme Court might decide difficult cases?
To explore that question, we ran a mini-experiment using Chatrie v. United States, a pending Fourth Amendment case involving law enforcement’s use of a geofence warrant, as a test case.
Following oral arguments in the case, we provided Thomson Reuters’s CoCounsel, Harvey (without the LexisNexis tie-in), and LexisNexis Protégé with the same case materials and asked each platform to simulate how the Supreme Court would decide Chatrie. The results were striking. All three platforms predicted that the Supreme Court would reverse the Fourth Circuit’s en banc decision affirming the denial of Chatrie’s suppression motion. Two platforms predicted reversal by a 5-4 vote, while one predicted reversal by a broader 7-2 vote.
Because Chatrie remains pending before the Supreme Court, we do not yet know whether any of the platforms’ predictions will match the Court’s eventual decision. We are publishing the results now, before the opinion issues, for an important methodological reason: once the Court decides Chatrie, the opinion, commentary, summaries, and legal analysis may begin appearing in legal databases, search results, and AI tool environments. At that point, it would be harder to know whether a later “prediction” was based on the uploaded case materials or influenced by post-decision information. For that reason, these results can be understood as a pre-decision simulation.
The purpose of this experiment is not to suggest that AI tools should replace human Justices, or to suggest how this case should be decided. Rather, the experiment offers a limited snapshot of the current state of legal AI technology in a particularly challenging setting: a case that is difficult and important enough to reach the Supreme Court of the United States.
This also warrants a brief observation about how the performance of AI technology may vary depending on the difficulty of the case being analyzed. Edge cases involving unsettled doctrine, fractured lower-court opinions, and competing constitutional principles are inherently harder to predict than routine cases governed by clear precedent. At the same time, Supreme Court prediction may present certain advantages for AI tools because there is a substantial public record regarding each Justice’s jurisprudence, prior opinions, oral-argument questions, methodological commitments, and voting patterns. That level of decisionmaker-specific data is often less available for lower-court judges.
Accordingly, we expect the reliability of predictions from AI tools to depend on both the novelty of the legal issue and the quality of the available information about the decisionmakers.
These platforms are typically used to develop arguments designed to persuade Justices and judges. This experiment turns the lens in the opposite direction. Instead of advocacy, we seek prediction: given the same inputs, can these tools predict what the Justices are likely to do? The goal is not to determine how the case should be decided, but to anticipate how it may be decided in practice. Of course, the tools’ predictive accuracy is also instructive regarding the quality of the tools’ advocacy advice.
Finally, although we identify the platforms included in the experiment, we have anonymized which platforms produced which result. That choice is intentional. The purpose of this exercise is not to endorse, rank, or criticize any particular platform. This decision is also a recognition of the limits of this experiment. This is one case, one snapshot, one prompt, and one run per platform. We do not want to overstate the significance or implications of the results.
Background on Chatrie
Chatrie presents the question of whether the government violated the Fourth Amendment when it obtained location data from Google to identify devices near the scene of a bank robbery. According to the case record, Google internally developed a three-step process for providing law enforcement with users’ “Location History” data. In this case, consistent with the three-step process, law enforcement obtained a geofence warrant from a state magistrate for the “Location History” of users within a certain area near the site of the robbery during a limited period of time. The search initially covered a 150-meter radius around the bank and the 30 minutes before and after the robbery. Google first supplied law enforcement with anonymized information about users within the geofence during the requested time period. Then, the warrant authorized the officer – without additional court involvement – to “attempt to narrow down the list” and seek additional information regarding those users, extending outside the geofence and over an additional period of time. Finally, the warrant authorized the officer to “attempt to narrow down the list” and seek “identifying account information,” including names, telephone numbers, and email addresses. Following this process, law enforcement ultimately obtained identifying information for three accounts, one of which belonged to Okello T. Chatrie. That location evidence helped lead to Chatrie’s prosecution and eventual conviction for bank robbery.
The U.S. District Court for the Eastern District of Virginia found that the geofence warrant likely violated the U.S. Constitution but did not suppress the evidence due to the officer’s good-faith reliance on the warrant (590 F. Supp. 3d 901 (E.D. Va. 2022)). On appeal, a highly fractured Fourth Circuit issued an en banc affirmance with no majority opinion, which resulted in the conviction being upheld (136 F.4th 100 (4th Cir. 2025)). The per curiam opinion affirming the District Court was joined by fourteen of the fifteen members of the court. Eight concurring opinions and one dissenting opinion were written.
Ultimately, the en banc court divided sharply on the threshold Fourth Amendment question. Seven judges concluded that no search occurred because Chatrie had shared his Location History with a third party, Google. Seven judges concluded that a search did occur. Chief Judge Diaz affirmed based solely on the District Court’s finding of good faith without deciding whether the warrant effected a search. Still, other judges indicated that, even if a search occurred or the warrant was constitutionally defective, suppression would be inappropriate under the good-faith exception. As a result, counting the district judge and the fifteen members of the en banc Fourth Circuit, fifteen of the sixteen judges to address the suppression issue declined to suppress the evidence, albeit for different reasons.
Methodology and Limitations
Our methodology was straightforward. We provided each platform with the same set of case materials: the petition for certiorari, petitioner’s opening merits brief, respondent’s merits brief, Volume I of the Joint Appendix, and the oral argument transcript.
After uploading those documents, we used the same master prompt across all three platforms without modification. The full prompt is reproduced below. In summary, the prompt instructed each tool to act as the Supreme Court and decide the case based solely on the uploaded record. It required each platform to summarize the case, address threshold issues and governing law, predict each Justice’s vote, provide a final vote count, draft a simulated majority opinion and a separate opinion, identify limiting principles, and assess confidence.
The purpose of the exercise was not to create a statistically valid benchmark. It was to explore what these tools can and cannot do with respect to evaluating litigation risk and anticipating potential case outcomes. In that sense, the experiment is best understood as a practical demonstration rather than a scientific study. It offers a snapshot of how these tools reasoned from the same materials under the same prompt.
To be clear, there are many limitations in this mini-experiment. This was one case simulation, involving one prompt, one set of uploaded documents, and one run per platform. The platforms may differ in context-window size, document ingestion, retrieval architecture, access to legal data, prompting behavior, model configuration, and product design. The results also may have been affected by the wording of the prompt, the specific versions of the tools used at the time of the experiment, and the fact that the tools were asked to perform numerous tasks and steps within one prompted workflow. We also cannot fully know whether any tool’s response was influenced by external information, even though the prompt instructed each system to rely only on the uploaded materials. And because portions of the Joint Appendix were sealed, each tool necessarily worked from an incomplete record.
There is also an important accuracy limitation. Even if one or more platforms ultimately predict the correct outcome, that would not prove predictive accuracy in any rigorous sense. A single simulation of a single case cannot establish that a tool can reliably forecast Supreme Court decisions.
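A rough calculation illustrates the point. The sketch below is ours, not part of the experiment, and it rests on simplifying assumptions that do not hold for real dockets: each case is treated as an independent trial, and a prediction counts as "correct" if it gets the disposition right. Under those assumptions, it computes the exact (Clopper-Pearson) two-sided lower confidence bound on a tool's true accuracy when every one of n predictions turned out to be correct. The function name `lower_bound_all_correct` is hypothetical.

```python
def lower_bound_all_correct(n_cases: int, confidence: float = 0.95) -> float:
    """Exact two-sided lower confidence bound on true accuracy,
    assuming all n_cases independent predictions were correct.

    With n successes in n trials, the Clopper-Pearson lower bound p
    solves p**n = alpha / 2, so p = (alpha / 2) ** (1 / n).
    """
    if n_cases < 1:
        raise ValueError("n_cases must be at least 1")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n_cases)


# Illustrative sample sizes only; these are not results from the experiment.
for n in (1, 5, 10, 30):
    print(f"{n:>2} correct out of {n:>2}: true accuracy >= "
          f"{lower_bound_all_correct(n):.3f} (95% confidence)")
```

With a single correct prediction, the bound is 0.025: the result is statistically compatible with a tool whose true accuracy is as low as 2.5 percent. The bound rises only as the number of correctly predicted cases grows.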
But those limitations do not eliminate the exercise’s practical value. In fact, they may make the experiment closer to how lawyers and litigants would actually use these tools in practice. Lawyers often ask AI systems to evaluate litigation risk, pressure-test arguments, and identify likely legal issues based on incomplete records and imperfect information. Few real-world uses of legal AI occur under laboratory conditions.
Accordingly, while this experiment should not be treated as a scientific benchmark or a definitive assessment of any platform’s predictive accuracy, it may still provide useful insight into how legal AI tools perform when used in a realistic litigation-strategy setting.
Results Overview
All three platforms, Thomson Reuters’s CoCounsel, Harvey, and LexisNexis Protégé, which we have anonymized below, predicted that the Supreme Court would reverse the Fourth Circuit. Platform 1 and Platform 2 reached the same predicted vote alignment: a 5-4 reversal, with Chief Justice Roberts and Justices Sotomayor, Kagan, Gorsuch, and Jackson in the majority. Platform 3 also predicted reversal, but by a broader 7-2 vote, adding Justices Kavanaugh and Barrett to the majority and leaving only Justices Thomas and Alito in dissent. Thus, the platforms’ predictions regarding disposition differ only with respect to two Justices: Justice Kavanaugh and Justice Barrett.
| Platform | Predicted Outcome | Predicted Majority | Predicted Dissent |
| --- | --- | --- | --- |
| Platform 1 | Reverse, 5-4 | Roberts, Sotomayor, Kagan, Gorsuch, Jackson | Thomas, Alito, Kavanaugh, Barrett |
| Platform 2 | Reverse, 5-4 | Roberts, Sotomayor, Kagan, Gorsuch, Jackson | Thomas, Alito, Kavanaugh, Barrett |
| Platform 3 | Reverse, 7-2 | Roberts, Sotomayor, Kagan, Gorsuch, Kavanaugh, Barrett, Jackson | Thomas, Alito |
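
For readers who want to check the tallies, the table can be encoded in a few lines of Python. This is a minimal sketch of our own for illustration: the majority line-ups are copied from the table, while the variable and function names (`predicted_majority`, `vote_count`, `contested_justices`) are hypothetical and not part of any platform's output.

```python
# The nine Justices, in the order used in the prompt.
JUSTICES = ["Roberts", "Thomas", "Alito", "Sotomayor", "Kagan",
            "Gorsuch", "Kavanaugh", "Barrett", "Jackson"]

# Predicted majority (vote to reverse) per platform, copied from the table.
# Every Justice not listed is in that platform's predicted dissent.
predicted_majority = {
    "Platform 1": {"Roberts", "Sotomayor", "Kagan", "Gorsuch", "Jackson"},
    "Platform 2": {"Roberts", "Sotomayor", "Kagan", "Gorsuch", "Jackson"},
    "Platform 3": {"Roberts", "Sotomayor", "Kagan", "Gorsuch",
                   "Kavanaugh", "Barrett", "Jackson"},
}


def vote_count(majority: set) -> str:
    """Format a majority line-up as a 'majority-dissent' vote count."""
    return f"{len(majority)}-{len(JUSTICES) - len(majority)}"


def contested_justices(predictions: dict) -> list:
    """Justices placed in the majority by some platforms but not by all."""
    in_any = set.union(*predictions.values())
    in_all = set.intersection(*predictions.values())
    return sorted(in_any - in_all)


counts = {name: vote_count(m) for name, m in predicted_majority.items()}
contested = contested_justices(predicted_majority)
# Justices that every platform placed in dissent (vote to affirm).
always_dissent = sorted(set(JUSTICES) - set.union(*predicted_majority.values()))

print(counts)          # vote count per platform
print(contested)       # Justices on whom the platforms disagree
print(always_dissent)  # Justices all platforms placed in dissent
```

Encoding the line-ups as sets keeps the comparison mechanical: the Justices on whom the platforms disagree are those in the union of the predicted majorities but not in their intersection.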
The headline result is that all three platforms reached the same bottom-line prediction: reversal. But the more important question is how each platform reached its conclusion. We therefore compared not only the predicted outcome, but also the doctrinal path each tool used to get there, including how each platform addressed reasonable expectations of privacy, the third-party doctrine, particularity, general-warrant concerns, and the good-faith exception.
While fourteen of the fifteen judges on the Fourth Circuit voted to affirm the District Court’s decision, all three AI platforms predicted that the Supreme Court would reverse. Platforms 1 and 2 predicted that Justices Kavanaugh and Barrett would both vote to affirm, placing them in a four-Justice dissent. Platform 3, however, predicted that both Justices would vote to reverse and thereby form part of a seven-Justice majority.
The platforms’ divergent treatment of Justices Kavanaugh and Barrett is therefore the key point of disagreement. Their respective rationales can be summarized as follows:
Justice Kavanaugh: Platforms 1 and 2 predicted that Justice Kavanaugh would view the search warrant as reasonable and sufficiently particularized. Both platforms also predicted that, even if the warrant had constitutional defects, Justice Kavanaugh would likely conclude that suppression was inappropriate under the good-faith exception. Platform 3, however, predicted that Justice Kavanaugh would vote to reverse on narrower grounds, focusing on the lack of judicial oversight over Steps Two and Three of the geofence-warrant process.
Justice Barrett: Platforms 1 and 2 likewise predicted that Justice Barrett would vote to affirm. Platform 1 predicted that she would focus on the opt-in nature of Google Location History and conclude that the overall process was reasonable. Platform 2 predicted that Justice Barrett would likely focus on the voluntary nature of providing the Location History data and would be less likely to find a clearly protected privacy interest. Both Platforms 1 and 2 also suggested that Justice Barrett may rely, at least alternatively, on the good-faith exception. Platform 3, however, predicted that Justice Barrett would vote to reverse, reasoning that she would be concerned by the lack of judicial oversight over Steps Two and Three and would view that feature of the warrant as raising textualist and structural concerns about government overreach.
Across all of the Justices, the platforms described the rationales for the Justices’ votes as follows:
Affirmance based on the absence of a protected privacy interest or application of the third-party doctrine:
- Justice Thomas: Platforms 1, 2, and 3
- Justice Alito: Platform 1
- Justice Barrett: Platform 1
Affirmance based on the reasonableness or particularity of the warrant:
- Justice Kavanaugh: Platforms 1 and 2
- Justice Alito: Platform 2
- Justice Barrett: Platform 2
Reversal based on an unconstitutional search, insufficient judicial oversight of Steps Two and Three, or general-warrant concerns:
- Chief Justice Roberts: Platforms 1, 2, and 3
- Justice Kagan: Platforms 1, 2, and 3
- Justice Jackson: Platforms 1, 2, and 3
- Justice Sotomayor: Platforms 1, 2, and 3
- Justice Gorsuch: Platforms 1, 2, and 3, with particular emphasis on property-rights principles
- Justice Kavanaugh: Platform 3
- Justice Barrett: Platform 3
Affirmance based on the good-faith exception, either as the primary rationale or as an alternative ground:
- Justice Alito: Platforms 1, 2, and 3
- Justice Barrett: Platforms 1 and 2
- Justice Kavanaugh: Platform 1
- Justice Thomas: Platform 2
- Justice Jackson: Platform 1, which suggested a possible remand for further good-faith analysis
*****
Taken together, the platforms did not predict a categorical prohibition on geofence warrants. Instead, their predictions suggest a narrower possible outcome: one in which the Court imposes greater judicial oversight over the process by which law enforcement obtains and narrows location-history data, particularly at the later stages of the geofence-warrant process.
The main point is not whether any platform ultimately “got it right” in Chatrie. One case cannot prove predictive reliability. But the exercise does suggest that legal AI tools may offer useful strategic insight into difficult appellate questions. That possibility warrants further study across more cases, prompts, platforms, and procedural settings, with attention not only to bottom-line accuracy, but also to reasoning quality, consistency, and treatment of uncertainty.
*****
The full prompt used in this mini-experiment is available below:
#######
PROMPT: SUPREME COURT SIMULATION
You are acting as the Justices of the Supreme Court of the United States deciding this case based solely on the uploaded record. Do not rely on external knowledge of how the case was actually decided or predicted. Base your analysis only on the materials provided.
TASK 1: CASE OVERVIEW
Provide a concise, neutral summary:
- Question Presented
- Relevant facts
- Procedural posture
TASK 2: THRESHOLD ISSUES (MANDATORY)
Analyze in order:
- Standing
- Jurisdiction
- Mootness / ripeness (if applicable)
State clearly whether any threshold issue is dispositive. If so, resolve the case on that basis.
TASK 3: GOVERNING LEGAL FRAMEWORK
Identify and explain:
- Controlling constitutional or statutory provisions
- Relevant precedent (limit to the most important cases)
- Standard of review
Do not proceed to the merits until this framework is clearly established.
TASK 4: JUSTICE-BY-JUSTICE VOTE PREDICTION
Assign a vote to each Justice:
- Chief Justice John Roberts: ___
- Justice Clarence Thomas: ___
- Justice Samuel Alito: ___
- Justice Sonia Sotomayor: ___
- Justice Elena Kagan: ___
- Justice Neil Gorsuch: ___
- Justice Brett Kavanaugh: ___
- Justice Amy Coney Barrett: ___
- Justice Ketanji Brown Jackson: ___
For each Justice:
- State the vote (affirm / reverse / other disposition)
- Provide a 2–3 sentence justification grounded in that Justice’s interpretive approach (e.g., textualism, originalism, pragmatism, institutional concerns)
TASK 5: FINAL VOTE COUNT
Provide:
- Majority outcome (e.g., “Affirmed, 6–3”)
- Identify which Justices are in the majority and dissent
TASK 6: MAJORITY OPINION (REQUIRED)
Write a majority opinion that:
- Follows standard Supreme Court structure
- Applies the governing framework identified above
- Resolves the case narrowly unless broader resolution is necessary
- Avoids policy arguments not grounded in law
TASK 7: CONCURRENCE OR DISSENT (IF APPLICABLE)
Write:
- One concurrence (if appropriate), OR
- One dissent
The opinion should reflect a distinct methodological disagreement (not just outcome disagreement).
TASK 8: LIMITING PRINCIPLE AND IMPLICATIONS
State:
- The limiting principle of the decision
- Likely impact on future cases
TASK 9: CONFIDENCE ASSESSMENT
Provide:
- Confidence level (High / Medium / Low)
- Brief explanation of uncertainty (e.g., unclear precedent, factual ambiguity, competing doctrines)
#######