The Computer Society

Large Language Models as Pretrained Data Engineers

Time to read: 50 mins

Publication: 03/18/25

Language: English

Summary

This technical report discusses the integration of large language models (LLMs) into data engineering workflows and highlights their potential to improve data management tasks. It outlines an architectural framework that applies LLMs across three stages: data wrangling, analytical querying, and table augmentation for machine learning. It describes how LLMs can simplify and optimize data preparation and transformation, extend querying capabilities, and improve performance on data-centric tasks. The report also reviews current research and presents three systems: UniDM for data wrangling, DAIL-SQL for Text2SQL, and SMARTFEAT for automated feature engineering. Finally, it examines the challenges of integrating LLMs into complex data workflows and outlines future directions for LLM-assisted data engineering. These directions include automated systems for efficient data preparation and richer querying interfaces for unstructured data.
