This case study examines the development and implementation of a custom Codasip L31 core with an AI accelerator, built using Codasip Studio. It discusses the shift from cloud-level to device-level AI processing in IoT and IIoT applications, emphasizing that running AI tasks locally reduces security exposure, data transfer costs, and latency. The Codasip Application Engineering team used TensorFlow Lite for Microcontrollers to compare the performance of the L31 core using only the standard ISA against the same core extended with custom AI accelerator instructions. The results showed a performance improvement of more than a factor of five and a reduction in energy consumption of more than a factor of three. The study outlines the architecture of the convolutional neural network used for image classification and details the profiling that identified the image convolution function as the key target for optimization. It then describes the implementation of a convolution accelerator, highlighting its efficiency and the simplicity of designing it in CodAL. The case study concludes that the design delivers improved performance and energy efficiency at the cost of additional silicon area.
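To make the optimization target concrete, the following is a minimal sketch of the kind of image convolution loop that profiling tends to flag as a hot spot. It is an illustration only: the function name `conv2d_int8`, the single-channel layout, the stride of 1, and the absence of padding are all assumptions made here, and the code is neither the TensorFlow Lite for Microcontrollers kernel nor Codasip's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference kernel (not the TFLite Micro or Codasip code).
 * Computes a stride-1, unpadded 2D convolution of a single-channel int8
 * image with an int8 filter, accumulating in int32. As in most CNN
 * frameworks, the filter is applied without flipping (cross-correlation).
 * 'out' must hold (in_h - k_h + 1) * (in_w - k_w + 1) elements. */
void conv2d_int8(const int8_t *in, size_t in_h, size_t in_w,
                 const int8_t *k, size_t k_h, size_t k_w,
                 int32_t *out)
{
    /* The filter must be non-empty and fit inside the image. */
    assert(k_h >= 1 && k_w >= 1 && k_h <= in_h && k_w <= in_w);

    const size_t out_h = in_h - k_h + 1;
    const size_t out_w = in_w - k_w + 1;

    for (size_t oy = 0; oy < out_h; ++oy) {
        for (size_t ox = 0; ox < out_w; ++ox) {
            int32_t acc = 0;
            for (size_t ky = 0; ky < k_h; ++ky) {
                for (size_t kx = 0; kx < k_w; ++kx) {
                    /* Innermost step: two loads, a multiply, and an
                     * accumulate. This runs k_h * k_w times per output
                     * pixel, so it dominates the cycle count. */
                    acc += (int32_t)in[(oy + ky) * in_w + (ox + kx)] *
                           (int32_t)k[ky * k_w + kx];
                }
            }
            out[oy * out_w + ox] = acc;
        }
    }
}
```

On a standard ISA, each pass through the innermost loop costs several instructions for address arithmetic, loads, the multiply, and the add. That repeated multiply-accumulate step is the sort of operation a custom instruction could plausibly target, although this summary does not specify which operations the accelerator instructions actually implement.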