In February 1912, Franz Reichelt, an impoverished tailor, died after jumping from the first floor of the Eiffel Tower to test a wearable parachute of his own design. The event drew journalists and a small crowd because Reichelt had publicized it himself, and he intended to make the leap in person rather than use a dummy. His fatal attempt remains a stark reminder of the risks inherent in untested innovations.

Reichelt’s story continues to resonate in contemporary discussions about evidence-based decision-making, particularly the use of randomized controlled trials (RCTs) to evaluate the effectiveness of interventions. While RCTs are widely regarded as the gold standard for establishing causal relationships, their applicability and necessity are still debated across medicine, education, criminal justice, economics, and policy-making.

A notable example highlighting this debate emerged in 2003, when a satirical review published in The BMJ pointed out that no RCTs had ever tested whether parachutes prevent death and severe injury from falls. The authors facetiously challenged proponents of evidence-based medicine to organize such a trial. Fifteen years later, a parody RCT was conducted in which participants jumped from stationary aircraft on the ground; it found no difference in injury rates between parachute users and non-users. The authors cautioned that the results could not be generalized to high-altitude jumps, underscoring the limits of extrapolating findings beyond the conditions of a study.

These examples illustrate a broader lesson: some technologies or practices, such as parachutes, have benefits regarded as self-evident, making formal trials unnecessary or even unethical. Conversely, RCTs are not infallible; they require careful design and interpretation, and overgeneralization of results can mislead decision-makers and the public. The challenge lies in determining when rigorous experimentation is appropriate and when alternative forms of evidence must suffice.

Advocates for randomized experiments emphasize their power to eliminate confounding factors that obscure true effects. For instance, simply comparing outcomes across different schools does not yield reliable conclusions, because the schools differ in their pupils and circumstances as well as in the practice being evaluated. Random assignment balances such differences across groups on average, so carefully designed experiments provide clearer insights into causality. History also shows that expert consensus can err, as with long-standing but unsupported medical practices such as bloodletting, or routine procedures administered to pregnant women in the 1980s that lacked evidence-based justification.
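
The confounding problem in the school comparison can be illustrated with a small simulation. The sketch below is purely hypothetical: the "family resources" confounder, the score scale, the effect size of 2 points, and the function name `simulate` are all assumptions made for illustration, not figures from any study. It compares a naive contrast, in which better-resourced pupils are more likely to end up in the programme, with a coin-flip assignment applied to the same simulated pupils.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Return (naive_estimate, randomized_estimate) of a programme's effect.

    All quantities are invented for illustration only.
    """
    rng = random.Random(seed)
    naive_treated, naive_control = [], []
    rand_treated, rand_control = [], []

    for _ in range(n):
        # Hypothetical confounder: family resources raise test scores
        # whether or not the pupil receives the programme.
        resources = rng.gauss(0, 1)
        baseline = 50 + 5 * resources + rng.gauss(0, 1)

        # Observational setting: better-resourced pupils are more likely
        # to attend a school that runs the programme.
        attends = resources + rng.gauss(0, 1) > 0
        score = baseline + (true_effect if attends else 0.0)
        (naive_treated if attends else naive_control).append(score)

        # Randomized setting: a coin flip decides who gets the programme,
        # so assignment is unrelated to resources.
        assigned = rng.random() < 0.5
        score = baseline + (true_effect if assigned else 0.0)
        (rand_treated if assigned else rand_control).append(score)

    naive = mean(naive_treated) - mean(naive_control)
    randomized = mean(rand_treated) - mean(rand_control)
    return naive, randomized


if __name__ == "__main__":
    naive, randomized = simulate()
    print(f"true effect:         2.00")
    print(f"naive comparison:    {naive:.2f}")
    print(f"randomized estimate: {randomized:.2f}")
```

Under these assumptions the naive comparison overstates the effect several times over, because it mixes the programme's effect with the advantage of better-resourced pupils, while the randomized estimate lands close to the assumed true value.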

The COVID-19 pandemic provided a recent example of rapid evidence generation through randomized trials. The RECOVERY trial demonstrated that dexamethasone, a low-cost steroid, significantly reduced mortality in mechanically ventilated patients. This finding contradicted initial expert skepticism and concerns about immunosuppressive risks, highlighting the value of controlled experimentation even under urgent circumstances. The trial was launched swiftly and run alongside standard care, yielding life-saving results within months.

When high-quality experimental evidence is unavailable or unattainable, some experts recommend what Helen Pearson terms “aluminium standard” evidence: an approach that states its hypotheses clearly and uses the best available data to test causal assumptions. Although less definitive than gold-standard RCTs, such evidence can guide decisions when time or ethical considerations preclude formal trials.

Reichelt’s fatal experiment serves as both a cautionary tale and an impetus to improve how society evaluates novel ideas. While rigorous evidence is invaluable, understanding its limits and complementing it with other forms of inquiry remains critical. Ultimately, recognizing that experts often know less than they presume encourages ongoing skepticism and the judicious application of research methods.