New Study: Consultants Using AI 25% Quicker With 40% Higher Quality

A Harvard study reports major improvements in consultants’ work, as long as the task falls within the model’s capability frontier.

In a new Harvard Business School working paper, researchers found that people using AI can deliver higher productivity and quality on consulting and strategy-related tasks. Key findings include:

Consultants with AI access finished tasks 25.1% faster, completed 12.2% more tasks on average, and produced results of more than 40% higher quality than a control group without AI.
On a task outside the capabilities of current generative AI, consultants using AI were 19 percentage points less likely to produce correct solutions than those without AI. The AI’s output sometimes presented mistakes and inaccuracies as if they were facts (the hallucination problem).
Consultants who used AI successfully fell into two patterns: those who divide and delegate activities between themselves and the AI (“Centaurs” in the paper’s terms), and those who integrate AI fully into their task flow by continually interacting with the tool (“Cyborgs”).

You can read the abstract or download the paper, titled “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality,” from the Social Science Research Network (SSRN) website.

Summarizing their findings, the researchers suggest that “the capabilities of AI create a ‘jagged technological frontier’ where some tasks are easily done by AI, while others, though seemingly similar in difficulty level, are outside the current capability of AI.”

The gains point to a real productivity boost from AI, but the losses show that relying on generative AI for the wrong tasks carries risk. That said, most LLMs, including ChatGPT, are likely to improve as they are refined over time.

On the topic of humans performing worse in tasks “outside the frontier,” the paper points to the uneven capabilities of current LLMs:

Some unexpected tasks (like idea generation) are easy for AIs, while other tasks that seem to be easy for machines to do (like basic math) are challenges for some LLMs. This creates a “jagged Frontier,” where tasks that appear to be of similar difficulty may either be performed better or worse by humans using AI.
Harvard Business School Working Paper 24-013, page 4.

In short, inside the frontier GPT-4 acts as a quality and productivity booster, and outside the frontier it is a disruptor. In his coverage, Axios AI+ writer Ryan Heath describes the use of generative AI as a double-edged sword, noting that “a reliance on clichéd GPT-4 outputs reduced the group’s diversity of thought by 41%.”

For tasks such as idea generation, product development, and conceptualization, gen AI can significantly boost both productivity and quality, particularly among knowledge workers who traditionally perform below average (43% improvement in creativity). But for tasks beyond the AI’s capabilities, such as deeper company-specific problems that the model knows little about, users were 19 percentage points less likely to produce correct solutions than people not using AI.

Study background: The Boston Consulting Group is a 21,000-strong global management consulting firm that provides problem-solving, strategy, business, and management services for clients across industries. The experiment took 758 consultants from the company’s workforce and randomly assigned each to one of three conditions: no AI access, GPT-4 access, or GPT-4 access with a prompt engineering overview. The idea was to test the impact of GPT-4 on highly skilled knowledge workers (professionals with high human capital) and see whether AI augmented, disrupted, or otherwise influenced traditional workflows.

The study was mentioned in Boston Consulting Group’s article titled How People Can Create—and Destroy—Value with Generative AI.

Image taken from Business Insider article Inside Boston Consulting Group’s Office.

By Abhimanyu