How AI can put mistakes into overdrive
Researchers have demonstrated that knowledge workers can significantly degrade the quality of their work by asking OpenAI's GPT-4 to perform tasks it wasn't trained for, and then failing to spot "hallucinations" in its answers.
- A new study of more than 750 strategy consultants showed that AI helped them produce better content more quickly on many tasks, but the consultants were "less likely to produce correct solutions" when they attempted tasks of similar difficulty that fell outside the AI model's capabilities.
Why it matters: AI at work remains a double-edged sword — able to add great value but also able to destroy it when workers use it without coaching.
- Highly skilled workers struggled to navigate what AI can and can't do well, and a reliance on clichéd GPT-4 outputs reduced the group's diversity of thought by 41%.
- The researchers termed the uneven nature of AI capabilities a "jagged technological frontier."
Details: Researchers from Harvard, MIT, Wharton and other institutions teamed up with Boston Consulting Group's think tank to study 758 BCG consultants given complex tasks "selected by industry experts to replicate real-world workflows."
- Half the participants were asked to conceptualize and develop new product ideas — such as new beverages and types of footwear — and the others were asked to identify the root cause of one company's challenges using performance data and executive interviews.
- Participants first completed an assessment task to establish a baseline for the quality of their work; the later experimental task involved 18 sub-tasks graded by two human evaluators.
- All participants began their work without AI before being randomly split into three groups midway: no AI access, GPT-4 access, or GPT-4 access plus a prompt-engineering overview.
What happened: Of the consultants asked to develop new ideas — a challenge within GPT-4's known capabilities — those who received both AI access and guidance "consistently" performed better than those given AI access only. Both groups outperformed the control group.
- Those with AI access also worked faster and completed more of the 18 tasks, on average.
- Consultants who scored below average on the assessment task got the biggest boost from using AI in the later creativity experiment: their output improved by 43%, compared to 17% for above-average consultants.
- The other half of the participants tackled one company's deep-seated problems, a topic GPT-4 knew little about. Those using AI in their responses were "more likely to make mistakes" and 19% less likely to produce correct solutions than those without AI, per the report.
The intrigue: Employers face a dilemma with these results.
- AI can quickly and cheaply boost worker performance, but taking shortcuts on AI training can cause more mistakes, which could blow up in an employer's face.
- Even the AI experts behind this report are stumped by "novel and unexpected" AI capabilities and capability gaps in the workplace.
Zoom out: AI holds the promise of significantly improving productivity growth — which is appealing to both CEOs and politicians because of its correlation with strong economic growth.
- But industrial revolutions from the steam engine to personal computing took decades to translate into widespread productivity growth.
Yes, but: Long-term projections of generative AI's economic impact are largely guesswork at this early stage of the technology's development.
What they're saying: "Even participants who were warned about the possibility of wrong answers from the tool did not challenge its output" when GPT-4 made things up, BCG analysts noted in a review of the study.
- But when participants used AI for brainstorming ideas, "around 90% of participants improved their performance" and "people did best when they did not attempt to edit GPT-4's output."
- "This study highlights the importance of validating and interrogating AI" and the importance of applying "experts' judgment" when working with AI, per the report.
The other side: The work measured by the study isn't rocket science, or even software coding, and the assessment of quality is inevitably subjective.
- Some critics argue that boosting business-consultant productivity with AI is a pointless game of maximizing output for "a profession of bulls--t generators."