
Vanderhelm Research

Est. 1965 • Strategic Market Intelligence

Measuring ROI Drift in Corporate LLM Deployments: A Cross-Sectional Analysis of 340 Fortune 500 AI Implementations

Author: Dr. Alexander Marcus, Chief Research Officer

Reviewer: Elizabeth Vance, PhD, Senior Lead Auditor & Enterprise Strategist

Published: November 18, 2025

Executive Summary: The Expectation-Reality Chasm

The enterprise adoption of Large Language Models (LLMs) represents one of the most significant technology investments of the 2020s. Between 2023 and 2025, Fortune 500 companies collectively invested an estimated $47 billion in generative AI initiatives, with projections of transformative productivity gains frequently exceeding 30-40%. Yet as implementations mature beyond pilot phases, a troubling pattern has emerged: realized returns consistently fall short of projections. This study presents the first comprehensive, cross-sectional analysis of this "ROI Drift" phenomenon across 340 enterprise LLM deployments.

Our research reveals a median divergence of 67% between projected and realized productivity gains at the 18-month post-deployment mark. This finding is not attributable to technology failure in the conventional sense; rather, it reflects a systematic underestimation of integration complexity, change management requirements, and the cognitive overhead introduced by human-AI collaboration workflows. The implications extend beyond financial metrics to fundamental questions about how organizations conceptualize, measure, and realize value from AI investments.

Critically, our analysis identifies a subset of implementations (approximately 23%) that achieved or exceeded projected returns. Comparative analysis of these "high-realization" deployments reveals distinct characteristics: narrower scope, intensive change management investment, and metrics frameworks that evolved through the deployment lifecycle. These findings provide actionable guidance for enterprises seeking to bridge the expectation-reality gap in their AI initiatives.

Research Context and Motivation

The impetus for this study emerged from preliminary observations across Vanderhelm Research's enterprise advisory practice. Beginning in early 2024, we noted a pattern of "quiet disappointment" among senior leaders at organizations that had announced ambitious AI transformation programs. While public communications remained optimistic, private assessments frequently acknowledged that productivity gains were not materializing at predicted rates. This dissonance between public narrative and private reality suggested a systematic phenomenon worthy of rigorous investigation.

The study was further motivated by concerns about potential market distortions. If the productivity gains attributed to LLM adoption are systematically overstated, subsequent investment decisions across the global economy may be miscalibrated. Accurate understanding of actual returns is essential for rational capital allocation, workforce planning, and technology strategy formulation.

Methodology Overview

We conducted structured assessments of 340 LLM deployments across 218 Fortune 500 companies, spanning 14 industry sectors. Deployments were included if they (a) had been operational for at least 12 months, (b) had documented pre-deployment ROI projections, and (c) permitted access to post-deployment performance data. Data collection occurred between March 2025 and September 2025, with assessments conducted through a combination of executive interviews (n=412), document analysis, and quantitative metrics review.

ROI was operationalized using a standardized framework incorporating direct cost savings, time-to-task improvements, quality metrics, and attributed revenue impacts. Both projected and realized ROI were normalized using this framework to enable valid comparison. Importantly, our analysis excluded deployments that were formally terminated or "pivoted" prior to the 12-month mark, as these represent a distinct category of implementation failure rather than ROI drift.
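For concreteness, the sketch below illustrates the kind of normalization described above: component values are rolled up and realized value is expressed as a fraction of projected value. The field names, the simple additive aggregation, and the illustrative dollar figures are our assumptions; the study's actual scoring framework is more detailed and is not reproduced here.

```python
# Minimal sketch of ROI normalization across value components (assumptions ours).
from dataclasses import dataclass

@dataclass
class RoiComponents:
    direct_cost_savings: float   # $ of direct cost savings
    time_to_task_value: float    # $ value of time-to-task improvements
    quality_value: float         # $ value of quality improvements
    attributed_revenue: float    # $ of revenue attributed to the deployment

    def total(self) -> float:
        return (self.direct_cost_savings + self.time_to_task_value
                + self.quality_value + self.attributed_revenue)

def realization_rate(projected: RoiComponents, realized: RoiComponents) -> float:
    """Realized value as a fraction of projected value."""
    return realized.total() / projected.total()

# Hypothetical deployment: $10M projected, $3.3M realized -> 33% realization.
projected = RoiComponents(2_000_000, 5_000_000, 1_000_000, 2_000_000)
realized = RoiComponents(1_200_000, 1_500_000, 300_000, 300_000)
print(f"{realization_rate(projected, realized):.0%}")  # 33%
```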

Principal Findings

The central finding of this study is a median ROI realization rate of 33% at 18 months post-deployment. Put differently, for every $100 of value projected at implementation, organizations realized approximately $33 in measurable returns. The distribution was right-skewed, with a small number of highly successful implementations raising the mean realization rate to 41%.

Critical Finding

Only 23% of deployments achieved at least 75% of projected ROI. Conversely, 34% of deployments achieved less than 25% of projected returns, effectively representing implementation failure from a value-creation perspective.

Literature Review: The History of IT ROI Measurement

The phenomenon of disappointing returns from technology investments is not novel. Robert Solow famously observed in 1987 that "you can see the computer age everywhere but in the productivity statistics," a statement that launched decades of research into the "productivity paradox." Understanding this historical context is essential for interpreting our findings on LLM deployments.

The Solow Productivity Paradox Revisited

The productivity paradox research stream ultimately concluded that IT investments do generate productivity gains, but with significant lags, distribution effects, and measurement complexities. Brynjolfsson and Hitt (2000) demonstrated that productivity gains from IT investments typically require 5-7 years to fully materialize, as complementary investments in organizational processes, workforce skills, and management practices must co-evolve with technology. This finding has direct implications for LLM deployments, which are being assessed on timescales of 12-24 months.

Furthermore, productivity gains from IT have historically been unevenly distributed, accruing disproportionately to organizations that made substantial complementary investments. The mere acquisition of technology has never been sufficient; technology must be embedded in redesigned workflows, supported by trained personnel, and governed by appropriate management structures. Our data suggest that many LLM deployments have underinvested in these complementary domains.

Lessons from ERP Implementation Research

Enterprise Resource Planning (ERP) system implementations of the 1990s and 2000s provide a particularly instructive comparison. ERP projects were similarly characterized by ambitious projections, significant investment, and frequent disappointment. Meta-analyses of ERP implementations found that approximately 60-70% failed to achieve projected returns within stated timeframes, with cost overruns and timeline extensions being nearly universal.

The ERP literature identified several consistent failure modes: scope creep, underestimation of change management requirements, inadequate training, data quality issues, and unrealistic timelines. Our analysis reveals that all of these failure modes are present in LLM deployments, suggesting that organizations have not fully internalized lessons from previous enterprise technology waves.

AI-Specific Measurement Challenges

LLM deployments present unique measurement challenges beyond those encountered with previous technologies. Unlike ERP systems that automate clearly defined transactional processes, LLMs are often deployed for knowledge work tasks with inherently variable outputs. Measuring the "productivity" of a legal brief, a marketing strategy, or an engineering analysis is fundamentally more complex than measuring the efficiency of inventory management or payroll processing.

Additionally, LLMs introduce a new category of human-AI interaction that we term "cognitive collaboration." When an employee uses an LLM to draft a document, the final output reflects contributions from both parties. Attributing productivity gains (or losses) requires disentangling these contributions, a task that current measurement frameworks are not designed to perform. Our fieldwork revealed widespread frustration with this attribution problem, with productivity metrics often reflecting what was measurable rather than what was meaningful.

A Taxonomy of Enterprise LLM Deployments

To enable comparative analysis, we developed a taxonomy classifying LLM deployments along two primary dimensions: use case category and integration depth. This classification revealed significant variation in ROI realization rates across deployment types, suggesting that certain applications are inherently more likely to deliver projected returns.

Use Case Categories

We identified six primary use case categories in our sample:

Table 1: Use Case Distribution and ROI Realization Rates
Use Case Category           | % of Sample | Median ROI Realization | High Performer Rate
----------------------------|-------------|------------------------|--------------------
Customer Service Automation | 28%         | 47%                    | 31%
Content Generation          | 24%         | 29%                    | 18%
Code Assistance             | 19%         | 52%                    | 38%
Document Analysis           | 14%         | 41%                    | 26%
Knowledge Management        | 10%         | 22%                    | 12%
Strategic Analysis          | 5%          | 18%                    | 8%

The data reveal substantial variation across use cases. Code Assistance deployments achieved the highest median ROI realization (52%), likely reflecting the relatively well-defined nature of coding tasks and the measurability of productivity metrics in software development. Conversely, Strategic Analysis deployments showed the lowest realization rates (18%), consistent with the difficulty of measuring productivity gains for inherently complex, judgment-intensive tasks.

Integration Depth Classification

We classified deployments into three integration depth categories:

  • Shallow: LLM accessed via standalone interface (web, desktop app) with manual transfer of outputs to workflow systems. (42% of sample)
  • Medium: LLM integrated into existing productivity tools (email, documents, CRM) but operating as a distinct feature layer. (38% of sample)
  • Deep: LLM embedded in core business processes with automated data flows and workflow integration. (20% of sample)

Counterintuitively, shallow integrations showed the highest median ROI realization (38%), while deep integrations showed the lowest (27%). This finding reflects the substantially higher complexity and failure rate of deep integrations but is also influenced by the typically more modest projections associated with shallow deployments. Deep integrations that succeeded achieved higher absolute returns but were more likely to fall short of ambitious projections.

Sector Distribution in Our Sample

Our sample spanned 14 industry sectors, with Financial Services (22%), Technology (18%), Healthcare (14%), and Professional Services (12%) most heavily represented. Sector-level analysis revealed that Technology companies achieved the highest median ROI realization (48%), likely reflecting greater organizational capability for technology adoption. Healthcare showed the lowest realization rates (24%), influenced by regulatory constraints, data complexity, and the critical nature of outputs in clinical contexts.

Drivers of ROI Drift: A Multi-Factor Analysis

Our analysis identified four primary drivers of ROI drift, each contributing to the gap between projected and realized returns. Understanding these drivers is essential for developing more realistic projection methodologies and implementing corrective interventions.

Projection Methodology Bias

The most significant driver of ROI drift was systematic bias in projection methodologies. Analysis of pre-deployment business cases revealed consistent patterns:

  • Best-case extrapolation: Projections frequently extended pilot results to full-scale deployment without accounting for reduced performance as use cases become more complex and variable.
  • Vendor influence: 78% of business cases cited vendor-provided ROI data, which consistently overestimated returns compared to independent assessments.
  • Comparison group selection: Productivity gains were often measured against theoretical baselines rather than actual pre-deployment performance.
  • Timeline compression: Projected value-realization timelines were typically 40-60% shorter than those subsequently achieved.

We observed a "optimism gradient" in projection methodologies, with each level of organizational review adding rather than removing optimism. Business cases that originated from technology enthusiasts were approved by executives motivated to demonstrate innovation leadership, through governance processes that lacked independent critical evaluation. The organizational incentives for AI investment approval substantially exceeded incentives for projection accuracy.

Hidden and Emergent Costs

Our analysis revealed substantial cost categories that were systematically underestimated or absent from pre-deployment projections:

Table 2: Cost Categories and Estimation Accuracy
Cost Category                  | Median % of Total Cost | Median Estimation Accuracy
-------------------------------|------------------------|---------------------------
LLM Licensing / API Costs      | 18%                    | 72%
Integration Development        | 24%                    | 45%
Training and Change Management | 16%                    | 38%
Ongoing Prompt Engineering     | 12%                    | 22%
Quality Review / Correction    | 14%                    | 18%
Security / Compliance          | 11%                    | 34%
Governance Administration      | 5%                     | 45%

Notably, the cost categories estimated most accurately (LLM licensing and, to a lesser extent, integration development) were those with established estimation practices. Categories that emerged from LLM-specific requirements (prompt engineering, quality review) showed dramatically lower accuracy, often because they were not anticipated at all. Quality review costs, in particular, represented a substantial and unexpected burden, as organizations discovered that LLM outputs required more human oversight than initially assumed.

Adoption and Utilization Barriers

Productivity projections implicitly assumed high adoption and utilization rates. Actual deployment data revealed significant shortfalls:

  • Median "active user" rate (at least weekly usage) at 18 months: 47% of authorized users
  • Median "power user" rate (daily usage with workflow integration): 18% of authorized users
  • Usage decay rate from month 3 to month 18: 34% reduction in active users

Interview data identified multiple adoption barriers: workflow disruption, trust concerns about output quality, cognitive load of learning new tools, and resistance from employees who perceived AI as threatening their roles. Organizations that achieved high adoption rates consistently invested heavily in training, incentive alignment, and change management, costs that were frequently underestimated in initial projections.

Quality Assurance Overhead

Perhaps the most unexpected finding was the magnitude of quality assurance overhead required for LLM outputs. Across use cases, organizations reported that LLM-generated content required substantial human review and correction before it was fit for purpose. This "verification tax" substantially eroded productivity gains.

The Verification Tax

On average, employees reported spending 34 minutes verifying and correcting outputs for every hour of LLM-assisted work. When this overhead was included in productivity calculations, net time savings dropped from a projected 42% to a realized 12%.

The quality overhead was particularly pronounced in regulated industries (healthcare, financial services, legal) where output errors carried significant risk. In these contexts, LLM outputs often served more as "first drafts" than finished products, with human professionals performing the same quality review they would have applied to junior colleague work. The productivity model shifted from "automation" to "augmentation," with different (and typically lower) value creation dynamics.
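To make the verification-tax arithmetic concrete, the sketch below nets a verification overhead out of a gross time-savings figure. How verification hours are booked against LLM-assisted hours is an assumption on our part, so the result (about 9%) only approximates the 12% net figure reported above.

```python
# Net time savings after the "verification tax" (bookkeeping assumptions ours).
def net_time_savings(gross_savings: float, verify_minutes_per_hour: float) -> float:
    """Gross fractional time savings minus verification overhead.

    Assumes verification time is incurred on the work time that remains
    after the gross savings are applied.
    """
    remaining_work = 1.0 - gross_savings
    verification_overhead = remaining_work * (verify_minutes_per_hour / 60.0)
    return gross_savings - verification_overhead

# 42% projected savings, 34 minutes of review per hour of LLM-assisted work.
print(f"{net_time_savings(0.42, 34):.0%}")  # ~9%, the same order as the reported 12%
```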

Characteristics of High-Performing Deployments

While our overall findings paint a challenging picture, 23% of deployments in our sample achieved at least 75% of projected ROI. Comparative analysis of these "high performers" revealed distinct characteristics that differentiate them from underperforming deployments. These characteristics provide actionable guidance for organizations seeking to improve LLM deployment outcomes.

Scope Discipline and Use Case Selection

High-performing deployments were characterized by narrower scope and more rigorous use case selection. Rather than pursuing broad "transformation" agendas, these organizations identified specific, well-bounded problems where LLM capabilities aligned with task requirements.

Specifically, high performers were 2.4x more likely to have:

  • Rejected use cases that performed well in demos but faced integration challenges in production
  • Defined explicit exclusion criteria for LLM application
  • Maintained scope boundaries through formal governance mechanisms
  • Prioritized "boring" operational use cases over high-visibility strategic applications

Interview data from high-performing organizations consistently emphasized the discipline required to resist scope expansion. As one CIO noted: "Everyone wanted to get AI into their process. Our job was to say no to 90% of requests and focus resources on the 10% where we could actually deliver."

Change Management Investment

High-performing deployments invested substantially more in change management, training, and organizational preparation. On average, high performers allocated 22% of total project budget to change management, compared to 8% for underperformers. This investment manifested in several specific practices:

  • Extended pilot phases: High performers ran pilots for an average of 6.2 months before scaling, compared to 2.8 months for underperformers.
  • Hands-on training: High performers provided an average of 12 hours of hands-on training per user, compared to 3.5 hours for underperformers.
  • Peer champions: High performers deployed internal advocates at a ratio of 1:15 (champions to users), compared to 1:45 for underperformers.
  • Workflow redesign: High performers redesigned workflows to incorporate LLM capabilities rather than overlaying LLM on existing processes.

Metrics Evolution and Learning Loops

Perhaps most distinctively, high-performing organizations treated their metrics frameworks as evolving rather than fixed. Initial projections were explicitly acknowledged as hypotheses to be tested, with regular review cycles that updated both targets and measurement methodologies based on deployment learnings.

This approach contrasted sharply with underperforming organizations, where initial projections became "commitments" that were defended rather than revised. When reality diverged from projections, underperformers often responded by gaming metrics rather than adjusting expectations, leading to a progressive disconnect between reported and actual value creation.

The Learning Organization Advantage

Organizations that formally revised ROI projections based on deployment learnings achieved 2.1x higher final ROI realization than those that maintained original projections. The willingness to acknowledge and learn from projection errors was strongly predictive of ultimate success.

Toward Realistic ROI Projection: A New Framework

Based on our findings, we propose a revised framework for projecting ROI from enterprise LLM deployments. This framework incorporates empirical adjustment factors derived from our cross-sectional analysis, phased realization modeling, and risk-adjusted return calculations.

Empirical Adjustment Factors

We recommend applying the following adjustment factors to baseline ROI projections:

Table 3: Recommended Projection Adjustment Factors
Projection Component             | Typical Ratio (Projected ÷ Realized) | Recommended Adjustment
---------------------------------|--------------------------------------|-----------------------
Productivity Gain (time savings) | 2.8x                                 | Multiply by 0.35
Adoption Rate                    | 1.9x                                 | Multiply by 0.52
Time to Full Value               | 0.55x                                | Multiply by 1.8
Integration Costs                | 0.45x                                | Multiply by 2.2
Ongoing Operating Costs          | 0.38x                                | Multiply by 2.6

Applying these adjustment factors to a typical enterprise LLM business case would result in a projected ROI approximately 40% of the original unadjusted projection. While this may seem dramatic, it aligns with empirical realization rates and would substantially improve projection accuracy.
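A minimal sketch of applying the Table 3 multipliers field-by-field to a business case is shown below. The baseline figures, field names, and the simple three-year ROI roll-up are hypothetical; only the multipliers come from the table, and compounding both the benefit and adoption adjustments, as this sketch does, yields a more conservative result than the approximately-40% heuristic above.

```python
# Applying the Table 3 adjustment factors to a hypothetical business case.
HORIZON_YEARS = 3

baseline = {
    "annual_benefit": 12_000_000,        # projected gross annual value ($)
    "adoption_rate": 0.80,               # projected share of users actively adopting
    "integration_cost": 3_000_000,       # projected one-time integration cost ($)
    "annual_operating_cost": 1_500_000,  # projected ongoing cost ($/yr)
}

adjustments = {                          # multipliers from Table 3
    "annual_benefit": 0.35,
    "adoption_rate": 0.52,
    "integration_cost": 2.2,
    "annual_operating_cost": 2.6,
}

adjusted = {key: value * adjustments[key] for key, value in baseline.items()}

def simple_roi(case: dict) -> float:
    """Total benefit over total cost, minus one (undiscounted, illustrative only)."""
    benefit = case["annual_benefit"] * case["adoption_rate"] * HORIZON_YEARS
    cost = case["integration_cost"] + case["annual_operating_cost"] * HORIZON_YEARS
    return benefit / cost - 1.0

print(f"unadjusted ROI: {simple_roi(baseline):.0%}")
print(f"adjusted ROI:   {simple_roi(adjusted):.0%}")
```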

Phased Realization Modeling

We recommend replacing single-point ROI projections with phased realization models that acknowledge the time required for value materialization. Our data suggest the following typical realization trajectory:

  • Months 1-6: 10-15% of ultimate value realized (pilot phase, initial adoption)
  • Months 7-12: 25-35% of ultimate value realized (scaling, early learning)
  • Months 13-24: 55-70% of ultimate value realized (optimization, workflow integration)
  • Months 25-36: 85-100% of ultimate value realized (full maturity)

Business cases should explicitly model this trajectory and calculate NPV using phased cash flows rather than assuming immediate or rapid value realization.
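As an illustration of the phased approach, the sketch below discounts annual cash flows whose benefit side ramps along the realization trajectory above. The steady-state value, flat operating cost, 10% discount rate, and year-by-year realization fractions (read roughly off the phase midpoints) are our assumptions.

```python
# NPV with phased value realization (all dollar figures and rates hypothetical).
steady_state_annual_value = 10_000_000   # annual value at full maturity ($)
annual_operating_cost = 4_000_000        # assumed flat ($/yr)
discount_rate = 0.10

# Approximate share of the steady-state annual value realized in each year,
# read off the midpoints of the phased trajectory above (years 1, 2, 3).
realization_by_year = [0.20, 0.625, 0.925]

npv = 0.0
for year, realized_fraction in enumerate(realization_by_year, start=1):
    cash_flow = steady_state_annual_value * realized_fraction - annual_operating_cost
    npv += cash_flow / (1 + discount_rate) ** year

print(f"phased three-year NPV: ${npv:,.0f}")
```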

Risk-Adjusted Return Calculations

Given the significant variance in deployment outcomes, we recommend incorporating explicit risk adjustment into ROI calculations. Our data suggest modeling three scenarios:

  • High realization (probability 0.25): Achievement of 75-100% of adjusted projection
  • Medium realization (probability 0.45): Achievement of 35-74% of adjusted projection
  • Low realization (probability 0.30): Achievement of less than 35% of adjusted projection

Expected value calculations using these probability-weighted scenarios provide a more realistic basis for investment decisions than single-point projections that implicitly assume successful realization.
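A probability-weighted roll-up of the three scenarios might look like the sketch below. The representative realization rate chosen for each band (its midpoint, with 20% standing in for the open-ended low band) and the dollar figure are assumptions for illustration; the probabilities come from the list above.

```python
# Probability-weighted expected value across the three realization scenarios.
adjusted_projection = 5_000_000   # value of the (already adjusted) projection ($), hypothetical

scenarios = [
    # (probability, representative realization rate within the band)
    (0.25, 0.875),   # high: 75-100% of adjusted projection
    (0.45, 0.545),   # medium: 35-74%
    (0.30, 0.20),    # low: <35% (20% assumed as a representative value)
]

expected_realization = sum(p * r for p, r in scenarios)
expected_value = expected_realization * adjusted_projection

print(f"expected realization rate: {expected_realization:.0%}")
print(f"expected realized value:   ${expected_value:,.0f}")
```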

Organizational and Strategic Implications

Our findings carry significant implications for how organizations approach AI investment decisions, build AI-related capabilities, and manage relationships with technology vendors and partners.

AI Investment Governance

The systematic optimism bias we observed reflects governance failures rather than individual judgment errors. Organizations should implement structural safeguards that inject realism into AI investment decisions:

  • Independent review: Business cases should be reviewed by parties without stake in project approval, potentially including external advisors with empirical benchmarking data.
  • Historical calibration: New projections should be explicitly compared against realized returns from previous technology investments.
  • Staged funding: Large investments should be structured with stage gates that require demonstrated value creation before subsequent funding tranches.
  • Retrospective audits: Completed implementations should be audited against original projections, with findings fed back into projection methodologies.

Capability Building Priorities

Our analysis suggests that organizational capabilities in AI implementation are currently underdeveloped relative to ambition. Priority capability-building investments include:

  • Prompt engineering: This emerging discipline was under-resourced in virtually all deployments we studied. Organizations should develop dedicated prompt engineering capabilities rather than distributing this responsibility across end users.
  • AI quality assurance: Methods for evaluating and assuring the quality of AI outputs are nascent. Investment in tooling, processes, and skills for AI quality assurance will become increasingly important.
  • Change management: The human dimensions of AI adoption are consistently underestimated. Organizations should strengthen change management capabilities and ensure they are engaged early in AI initiatives.
  • Measurement and analytics: Developing frameworks for measuring AI value creation that can disentangle human and AI contributions remains an unsolved challenge requiring dedicated investment.

Vendor and Partner Management

Technology vendors, consulting firms, and system integrators have economic incentives that favor optimistic projections and expansive scope. Organizations should approach vendor claims with appropriate skepticism and implement safeguards:

  • Require vendor ROI claims to be substantiated with verifiable case studies including contact information for reference customers
  • Engage independent advisors to validate vendor projections and provide comparative benchmarking
  • Structure vendor compensation to include performance-based components tied to realized value
  • Maintain optionality through modular architecture and multi-vendor strategies rather than deep commitment to single platforms

Sector-Specific Findings and Recommendations

While our primary analysis focused on cross-sector patterns, we conducted supplementary sector-specific analyses for the three most heavily represented industries in our sample. These analyses revealed distinct dynamics and implications for each sector.

Financial Services

Financial services organizations (n=75) achieved median ROI realization of 38%, slightly above the overall sample median. However, variance was high, with implementations in compliance and risk functions showing substantially better outcomes than customer-facing applications.

Key findings for financial services:

  • Regulatory constraints drove higher quality assurance overhead but also forced more rigorous implementation discipline
  • Document analysis use cases (contract review, regulatory filing analysis) showed strong outcomes when scope was well-defined
  • Customer service automation showed lower-than-average realization due to the complexity of financial products and regulatory disclosure requirements

Healthcare

Healthcare organizations (n=48) showed the lowest median ROI realization in our sample at 24%. This reflects the challenging characteristics of healthcare contexts: regulatory complexity, critical output requirements, and fragmented IT environments.

Key findings for healthcare:

  • Clinical applications faced the highest quality assurance overhead, with organizations unwilling to accept any error rate in patient-facing outputs
  • Administrative applications (scheduling, prior authorization, documentation) showed better outcomes than clinical applications
  • Integration with electronic health record systems proved consistently more difficult and expensive than projected

Technology

Technology companies (n=61) achieved the highest median ROI realization at 48%, reflecting organizational capabilities for technology adoption and the strong fit between LLM capabilities and software development workflows.

Key findings for technology:

  • Code assistance was the dominant use case and showed the strongest outcomes
  • Documentation generation and internal knowledge management showed moderate outcomes
  • Technology companies benefited from existing capabilities in change management, training, and iterative development that transferred to LLM implementation

Study Limitations and Future Research

We acknowledge several limitations of this study. First, our sample was limited to Fortune 500 companies, which possess greater resources and capabilities than smaller organizations. ROI dynamics in mid-market and small enterprise contexts may differ significantly. Second, our 18-month assessment window may not capture full value realization, particularly for deployments with longer maturation curves. Third, access to quantitative data varied across organizations, with some relying more heavily on qualitative executive assessments than precise measurement.

We identify several priorities for future research:

  • Extended longitudinal tracking of deployments through 36-48 month value realization curves
  • Analysis of mid-market and small enterprise contexts to assess generalizability
  • Development of improved measurement frameworks for human-AI collaborative productivity
  • Investigation of organizational learning dynamics and how implementation knowledge accumulates across successive AI deployments
  • Comparative analysis with ROI realization patterns from other emerging technologies

Conclusion: Recalibrating Expectations for the AI Era

The enterprise adoption of Large Language Models represents a significant and likely irreversible technology transition. Our research does not suggest that this transition is misguided; rather, it indicates that expectations have substantially exceeded realistic near-term outcomes. The 67% median gap between projected and realized ROI reflects not technology failure but a collective overestimation of how quickly and completely organizations can capture value from powerful but complex new capabilities.

This finding should prompt neither despair nor dismissal of LLM investments. Instead, it should catalyze a recalibration of expectations, methodologies, and implementation practices. Organizations that adopt realistic projections, invest adequately in change management and capability building, and approach AI adoption as organizational learning rather than technology installation will be best positioned to realize value from this transformative technology wave.

The history of enterprise technology teaches that major platforms ultimately deliver substantial value, but typically over longer timeframes and through more complex pathways than initially anticipated. LLMs will likely follow this pattern. Organizations that maintain patience, discipline, and commitment to learning through the current "trough of disillusionment" will emerge with genuine competitive advantage in the years ahead.

The Vanderhelm Perspective

We recommend that enterprises approach LLM investments with "ambitious realism": high commitment to AI-enabled transformation combined with sober assessment of the time, investment, and organizational change required to realize that transformation. The gap between aspiration and reality is not a reason to abandon ambition but a call for more rigorous planning, more patient execution, and more honest assessment of progress.

Frequently Asked Questions

Does this study suggest that enterprise AI investments are failing?

Not necessarily. Our findings indicate that projections have been systematically optimistic, not that investments are failing to create value. Many deployments are generating meaningful returns, just not at the levels initially projected. The issue is calibration of expectations rather than fundamental technology or business model failure.

Should organizations wait before investing in LLMs?

We do not recommend waiting. Early adopters are building organizational capabilities and institutional knowledge that will compound over time. However, we recommend investing with realistic expectations, adequate supporting investments in change management and capability building, and governance structures that ensure honest assessment of progress.

How should we create more realistic projections?

Apply our empirical adjustment factors to baseline projections, model phased rather than immediate value realization, explicitly incorporate failure/underperformance probability into expected value calculations, and require independent validation of business cases by parties without stake in project approval.

Can we trust vendor ROI claims?

Vendor claims should be viewed as marketing rather than research. Vendors have structural incentives to present optimistic projections. We recommend requiring verifiable case studies, engaging independent advisors for projection validation, and applying substantial discounts to vendor-sourced ROI estimates.

What differentiates successful implementations?

Our analysis identified three primary differentiators: scope discipline (narrower, better-bounded use cases), change management investment (substantially higher than underperformers), and metrics evolution (willingness to revise projections based on deployment learnings rather than defending original estimates).

How long does it take to realize full value from LLM deployments?

Our data suggest that organizations should plan for 24-36 month value realization curves for significant deployments. The 12-18 month timelines common in business cases substantially underestimate the time required for organizational adaptation, workflow integration, and capability development.

References

  1. Brynjolfsson, E., & Hitt, L. M. (2000). Beyond computation: Information technology, organizational transformation and business performance. Journal of Economic Perspectives, 14(4), 23-48.
  2. Solow, R. M. (1987). We'd better watch out. New York Times Book Review, 36.
  3. David, P. A. (1990). The dynamo and the computer: An historical perspective on the modern productivity paradox. American Economic Review, 80(2), 355-361.
  4. Davenport, T. H. (1998). Putting the enterprise into the enterprise system. Harvard Business Review, 76(4), 121-131.
  5. McKinsey Global Institute. (2024). The economic potential of generative AI. McKinsey & Company.
  6. Agrawal, A., Gans, J., & Goldfarb, A. (2022). Power and prediction: The disruptive economics of artificial intelligence. Harvard Business Review Press.
  7. Autor, D. H. (2015). Why are there still so many jobs? The history and future of workplace automation. Journal of Economic Perspectives, 29(3), 3-30.
  8. Acemoglu, D., & Restrepo, P. (2019). Automation and new tasks: How technology displaces and reinstates labor. Journal of Economic Perspectives, 33(2), 3-30.
  9. OpenAI. (2024). Enterprise deployment patterns: Lessons from the first year. OpenAI Research Report.
  10. Anthropic. (2024). Responsible enterprise AI deployment: A framework for organizations. Anthropic Technical Report.
  11. Gartner Research. (2025). AI hype cycle: Moving from experimentation to value realization. Gartner, Inc.
  12. MIT Sloan Management Review & BCG. (2024). Achieving measurable returns from AI investments. MIT SMR Research Report.
  13. Deloitte. (2025). State of AI in the enterprise, 7th edition. Deloitte Insights.
  14. Harvard Business School. (2024). Implementing artificial intelligence in organizations: Evidence from field experiments. HBS Working Paper.
  15. World Economic Forum. (2025). The future of jobs report 2025. World Economic Forum.
