Advances in Production Engineering & Management ISSN 1854-6250 
Volume 20 | Number 2 | June 2025 | pp 224–238 Journal home: apem-journal.org 
https://doi.org/10.14743/apem2025.2.537 Original scientific paper 
 
 
Large language models for G-code generation in CNC 
machining: A comparison of ChatGPT-3.5 and ChatGPT-4o 
Šket, K.ᵃ,*, Potočnik, D.ᵃ, Brezocnik, M.ᵃ, Ficko, M.ᵃ, Klančnik, S.ᵃ

ᵃ University of Maribor, Faculty of Mechanical Engineering, Maribor, Slovenia 
 
 
A B S T R A C T  A R T I C L E   I N F O 
This research explores the viability of producing ISO G-code for 3-axis ma-
chining with OpenAI's Chat Generative Pre-Trained Transformer models, 
particularly ChatGPT-3.5 and the newer GPT-4o. G-code (RS-274-D, ISO 6983) 
converts human directives into commands that machines can understand, 
controlling toolpaths, spindle velocities, and feed rates to produce particular 
aspects of an object. Previously, G-code was generated either by hand or 
through the use of computer-aided manufacturing (CAM) software along with 
machine-specific post-processors, both of which may require considerable 
time and expense. This research aimed to assess the practicality and effec-
tiveness of specific large language models (LLMs) in generating G-code. The 
assessment took place in three distinct phases on a sample component that 
required 3-axis machining. These phases included: (1) the self-generated 
production of G-code for the sample component, (2) the examination of the 
independently generated G-code in the CAM application, and (3) the recogni-
tion and justification of mistakes in the G-code. The outcomes indicated vary-
ing abilities with promising findings. This method could accelerate and possi-
bly enhance manufacturing workflows by decreasing reliance on expensive 
CAM software and specialized knowledge. 
 Keywords: 
Generative artificial intelligence; 
Intelligent manufacturing; 
Large language models (LLM); 
ChatGPT; 
CNC machining; 
G-code programming  
*Corresponding author:  
kristijan.sket@um.si 
(Šket, K.) 
Article history:  
Received 9 May 2025 
Revised 13 June 2025 
Accepted 19 June 2025 
 
Content from this work may be used under the terms of 
the Creative Commons Attribution 4.0 International 
Licence (CC BY 4.0). Any further distribution of this work 
must maintain attribution to the author(s) and the title of 
the work, journal citation and DOI. 
 
 
1. Introduction  
In advanced manufacturing, especially CNC machining, the incorporation of generative AI models 
like ChatGPT signifies a novel frontier. This integration aligns with the larger trend towards smart 
manufacturing and machining as outlined in [1, 2]. The rapid advancement of AI technologies,
especially data-driven systems, swarm intelligence, and hybrid human-machine systems, marks the
progress of smart manufacturing [3, 4], in which AI's analytical and predictive power can be
applied to enhance and streamline CNC machining processes, advancing the continuous technological
revolution in manufacturing characterized by the new age of the Internet and AI [5-7].  
The objective of this study was to employ OpenAI's Chat Generative Pre-Trained Transformer, 
widely referred to as ChatGPT, in CNC machining to explore its ability to identify, comprehend, and 
produce ISO G-code for milling operations. This study draws inspiration from the extensive adoption
of AI in production and manufacturing systems. The research conducted by Li and colleagues [1]
offers a technical examination of the elements that are crucial for the acceptance of AI. 
Bernhard Heiden and colleagues [8] demonstrate how AI can integrate with manufacturing tech-
niques to establish a self-organizing system. Both studies highlight the significance of AI in enhancing
material movement and process effectiveness. This emphasis on data analysis aligns with the 
increasing trend of employing AI to assess and enhance manufacturing parameters. 
1.1 Related work 
Rane et al. [9] review ChatGPT, Bard, and various other generative AI technologies. Through a
literature review and bibliometric analysis, they identify the prevailing trends and significant
factors in the incorporation of AI tools like ChatGPT into manufacturing, and point to these
technologies as a crucial element in the continuous advancement of production engineering. 
Wang et al. [10] analyse the use of ChatGPT in the manufacturing industry and evaluate its
advantages and disadvantages. The model performs well in providing structured and comprehensive
answers, yet drawbacks were identified, particularly in the provision of accurate technical
expertise and the tendency to generate incorrect information when queries fall outside the
training data. The authors emphasize the importance of human verification of answers for
efficient communication.  
In their study, Javaid, Haleem and Singh [11] discuss the incorporation of ChatGPT into the 
framework of Industry 4.0, exploring how ChatGPT can be adapted to automate tasks and exam-
ining its applications in Industry 4.0, including improving human-robot collaboration, support-
ing predictive maintenance, ensuring quality control, and performing big data analytics. 
Some research has addressed the use of ChatGPT in additive manufacturing (AM) and of G-code
in that field. Badini et al. [12] evaluated its usability for optimizing G-code generation for
fused filament fabrication. ChatGPT’s ability to process and optimize suboptimal G-code data
showed its potential to streamline the process. In their 
article [13], Sriwastwa et al. examined the role of ChatGPT in enhancing training for medical 3D 
printing. Their research indicated that it offers precise and beneficial responses to fundamental 
inquiries, particularly for novice trainees or newcomers; nonetheless, as the difficulty of the ques-
tions rises, particularly for situations demanding practical experience or thorough technical exper-
tise, the shortcomings become increasingly clear. In [14], the authors discuss the use of large lan-
guage models (LLM) in AM, with a particular focus on their ability to understand and process G-
code. The study shows that while models such as GPT-4 and Claude-2 perform excellently in sev-
eral areas, their ability to comprehensively analyse and capture the complicated geometry of G-
code is significantly limited, mainly due to the short length of the context windows. 
1.2 Study justification 
To the authors’ knowledge, there are no studies that focus on the use of generative AI models, such 
as ChatGPT, for the automatic generation, interpretation, and correction of G-code in CNC ma-
chining, although there is a variety of research on AI and its applications in manufacturing. This 
lack of literature highlights the need for a focused study to evaluate the usefulness and draw-
backs of these models in actual production scenarios. 
This study is the first evaluation of ChatGPT's ability to generate, decode and correct ISO G-
code for CNC machining. It explores the potential of ChatGPT to automate key steps in the CNC 
programming process, in contrast to previous studies that have investigated more general appli-
cations of AI in manufacturing. The focus of the study is not on the architectural novelty of the 
models, but on the empirical evaluation of how these differences affect the performance of G-
code programming. 
The article is based on experiments that evaluated responses with three main objectives. 
First, the ability to create G-code with inputs typically used in commercial computer-aided man-
ufacturing (CAM) programs was determined. Second, ChatGPT's understanding of G-code was 
evaluated by creating a simple program for 3-axis machining. Third, ChatGPT's 
ability to recognize errors and attempt to correct the given G-code was evaluated. The goal of 
this investigation was to assess ChatGPT's capabilities for this type of work and to determine 
whether it is possible to create G-code for simple CNC machining problems using ChatGPT alone, 
which could reduce the need for commercial CAM software in the future. 
2. Materials and methods  
2.1 Used artificial intelligence method 
LLMs have transformed natural language processing through transformer-based neural network 
architectures (TSMs) [15, 16]. The progression of natural language processing encompasses four 
key stages: statistical language models, neural language models, pre-trained language models, 
and LLMs, each stage building on the capabilities of the prior generation to improve language 
understanding and generation [15, 16]. These models have shown potential in areas such as 
healthcare [15], education [16], and scientific research [17]; yet some challenges remain, includ-
ing data bias, high computational costs, and ethical considerations. 
TSMs are typically divided into two main types: scratch-trained and pre-trained. Models devel-
oped from the ground up are tailored for specific tasks, while pre-trained models first undergo ex-
tensive training on large text datasets using self-supervised learning methods before they are fine-
tuned for specific downstream applications. Instances of pre-trained models are BERT [18] (Bidirec-
tional Encoder Representations from Transformers) and GPT. BERT employs a bidirectional tech-
nique to understand the context of words in both directions in a sentence, whereas GPT utilizes a 
unidirectional approach that focuses on forecasting the next word in a sequence [19, 20].  
The functionality and implementation of ChatGPT are complex, but the result is a system that
can respond to queries and cues like a human [21]. Thanks to its scalability, it can manage
multiple conversations simultaneously, increasing productivity and reducing the need for human
intervention. Its efficiency in processing large amounts of data quickly also saves time.  
However, ChatGPT also has significant drawbacks. It can reproduce biases from the training 
set, potentially promoting discrimination or stereotyping. Additionally, since its knowledge 
is limited to its training data, its answers may contain errors on unusual or specialized topics.
In this study, the limitations of ChatGPT were tested in the generation of G-code for CNC machining. 
2.2 ISO G-code 
G-code is a language used to control toolpaths and generate the profiles of an object's features in 
CNC machining. It is essentially a set of instructions that translate human directives into ma-
chine-readable instructions so that CNC machines can operate automatically. The primary aim of 
G-code is to regulate different aspects of the machining process, including the motion of the cut-
ting tool, feed rates, spindle speeds, and coolant flow, to ensure high precision and efficiency in 
production, while also enabling safe operation and collision-free machining [22-24]. 
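As a minimal illustration of how such instructions are structured, the sketch below splits one G-code block (line) into its words, each consisting of an address letter and a number. The function name `parse_block`, the sample line, and the handling of `;` comments are assumptions made for illustration; comment syntax in particular varies between controllers.

```python
import re

# One G-code "word": an address letter followed by a signed number.
WORD_RE = re.compile(r"([A-Za-z])\s*([+-]?(?:\d+\.?\d*|\.\d+))")

def parse_block(line: str) -> list[tuple[str, float]]:
    """Split one G-code block into (address, value) pairs.

    Parenthesised comments and anything after ';' are ignored
    (an assumption; comment conventions differ between controllers).
    """
    line = re.sub(r"\(.*?\)", "", line)   # drop (inline comments)
    line = line.split(";", 1)[0]          # drop ; trailing comments
    return [(letter.upper(), float(number))
            for letter, number in WORD_RE.findall(line)]

# Hypothetical block: feed move (G1) to X25 Y25 Z-5 at F250 with coolant on (M8).
example = "N40 G1 X25. Y25. Z-5. F250 M8 (feed move with coolant on)"
for address, value in parse_block(example):
    print(address, value)
```

Each pair can then be interpreted by its address: G and M select motion and machine functions, X, Y and Z give target coordinates, F the feed rate, and S the spindle speed.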
G-code can be generated automatically with CAM software that uses a virtual 3D model as input 
to produce the necessary code for machining the part, or it can be manually written. However, 
creating the G-code manually requires computer and programming skills. Generating the G-code 
with CAM requires specialized software and a machine-specific post-processor, which can be quite 
expensive and requires a trained expert to operate. On the other hand, writing the code manually 
can be time-consuming and repetitive. The use of AI models for the rapid creation of G-code for 
simple and small series of parts could, therefore, be a cost-effective and fast solution [22]. 
2.3 Example part 
The example part consists of a cube (Fig. 1) measuring 150 × 150 × 150 mm. On its top side, 
there are four symmetrically arranged holes, each with a diameter of 16 mm and a depth of 50 mm,
located 25 mm from the edges of the cube. In the centre there is a through-hole 
with a diameter of 12 mm. This central through-hole is counterbored with a diameter of 25 mm 
and a depth of 25 mm. In addition, the upper surface must be face-milled by 1 mm. 
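The hole positions implied by this description can be written out explicitly. The sketch below is illustrative only and rests on two assumptions that the text does not state: the coordinate origin lies at one corner of the top face, and the 25 mm distance is measured from the edges to the hole centres. The names `SIDE`, `EDGE_OFFSET`, and `corner_hole_centres` are hypothetical.

```python
SIDE = 150.0         # cube edge length in mm (from the part description)
EDGE_OFFSET = 25.0   # assumed distance from each edge to a hole centre, in mm

def corner_hole_centres(side: float, offset: float) -> list[tuple[float, float]]:
    """Return the XY centres of four holes placed symmetrically near the corners."""
    low, high = offset, side - offset
    return [(low, low), (high, low), (high, high), (low, high)]

centres = corner_hole_centres(SIDE, EDGE_OFFSET)
centre_hole = (SIDE / 2, SIDE / 2)   # central through-hole and counterbore

print(centres)
print(centre_hole)
```

If the origin were placed at the centre of the top face instead, every coordinate would shift by 75 mm in X and Y.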
 
Fig. 1 Design of the example part 
 
2.4 Response verification method 
To check the answers generated by ChatGPT, the G-code required to produce the example part 
was created separately in the CAM software Siemens NX version 2312 build 1700 where every-
thing required for a functioning ISO G-code was defined. A part measuring 150 × 150 × 151 mm 
was defined as the blank and four different tools were provided to create the desired part. Tool 
characteristics and intended uses are listed in Table 1.  
Table 1 Tool properties and intended use 
Tool number | Tool name | Feed rate (mm/min) | Surface speed (mm/min) | Operation 
T01 | Insert cutter ϕ 50 mm | 250 | 100 | Top face milling 
T02 | Drill ϕ 16 mm | 250 | 100 | Drilling holes ϕ 16 
T03 | Drill ϕ 12 mm | 250 | 100 | Drilling hole ϕ 12 
T04 | Carbide end mill ϕ 15 mm | 250 | 100 | Pocket and central hole milling 
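Because the end mill T04 (ϕ 15 mm) is narrower than the ϕ 25 mm counterbore, the pocket requires a circular pass whose tool-centre radius is (25 − 15) / 2 = 5 mm. The sketch below shows one way such a pass could be written; it is not the toolpath produced by the CAM software in this study. It assumes the XY plane (G17), a hole centre at X75 Y75, and I/J words given as incremental offsets from the arc start point to the arc centre, which is common but controller-dependent. The function name `full_circle` is hypothetical.

```python
def full_circle(cx: float, cy: float, path_radius: float,
                feed: float, clockwise: bool = False) -> list[str]:
    """Return G-code blocks for one full circular pass around (cx, cy).

    G2 mills clockwise and G3 counter-clockwise when viewed from +Z.
    I and J are assumed to be incremental offsets from the start
    point to the arc centre.
    """
    start_x = cx + path_radius
    arc = "G2" if clockwise else "G3"
    return [
        f"G1 X{start_x:.3f} Y{cy:.3f} F{feed:.0f}",                     # move to the arc start
        f"{arc} X{start_x:.3f} Y{cy:.3f} I{-path_radius:.3f} J0.000",   # full circle back to start
    ]

counterbore_d, tool_d = 25.0, 15.0
path_radius = (counterbore_d - tool_d) / 2   # 5.0 mm tool-centre radius

for block in full_circle(75.0, 75.0, path_radius, feed=250):
    print(block)
```

Swapping G2 and G3 reverses the cutting direction, which is the kind of mistake listed among the errors of the generated code in Section 3.1.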
2.5 Query encoding 
The requests were simple and informative, aiming to provide ChatGPT with all relevant infor-
mation, especially when it was expected to create the G-code for the requested part. The model 
was encouraged to make suggestions about what information might be missing or could be bet-
ter presented.  
It was decided that the model should first create its own version of the code before the sepa-
rately created code was fed into the model for debugging and explanation. This way, the model 
did not have access to any pre-learned data. Additionally, the conversations with the model were 
conducted in a temporary chat, in which, according to the service provider, the input data are 
not used for model training. 
2.6 G-code preparation 
To prepare the input for G-code generation, an STL file of the example part and a technical 
drawing were created. In addition, text descriptions of the part, the tool library, and the
capabilities of the CNC machine were added. Settings to be used in the code, such as planes, 
units, feed modes, coordinate systems, and tool length compensation modes, were specified 
to describe clearly what should be manufactured and how. The input text is shown in 
Fig. 2. 
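To make the listed settings concrete, the sketch below assembles a program preamble from them. The specific codes (G17, G21, G94, G54, G43) are commonly used on Fanuc-style controllers and are an assumption here; they are not copied from the query in Fig. 2, and other controllers may use different codes. The names `MODAL_SETTINGS` and `build_preamble` are hypothetical.

```python
# Settings named in the text, each mapped to a commonly used code.
# The codes are assumptions (Fanuc-style dialect), not taken from Fig. 2.
MODAL_SETTINGS = [
    ("G17", "XY working plane"),
    ("G21", "units in millimetres"),
    ("G94", "feed per minute"),
    ("G54", "work coordinate system 1"),
]

def build_preamble(tool_number: int) -> str:
    """Return a preamble as text, one commented block per line."""
    lines = [f"{code} ({meaning})" for code, meaning in MODAL_SETTINGS]
    # G43 activates tool length compensation using offset register H.
    lines.append(f"G43 H{tool_number:02d} (tool length compensation)")
    return "\n".join(lines)

print(build_preamble(tool_number=1))
```

Stating these settings explicitly in a prompt leaves less for the model to guess, since each of them changes how the subsequent coordinates and feed values are interpreted.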
 
 Fig. 2 Text query for G-code generation using ChatGPT-3.5 and GPT-4o 
To test the image recognition capabilities of the GPT-4o model and its ability to interpret
STL files, the text explanation of the designed part in the query was replaced with an STL 
file of the 3D model and an image of the technical documentation (Fig. 1), as presented in
Appendix Fig. A1. 
2.7 Debugging G-code 
For the requests to debug and repair the created G-code, a part description and the entire G-code 
with three implemented errors (Table 2) were provided to the model along with the request for 
ChatGPT to identify and fix them. 
For the implemented errors, types were selected that have a major impact on the stability of 
the process and can lead to damage to the workpiece, tool or machine. The absence of the M3 
command can cause the process to start with the spindle switched off, potentially leading to a 
collision between the tool and the workpiece. The next error is a rapid traverse movement (G0) 
instead of a feed movement (G1), which means that the tool moves into the workpiece at a much 
higher speed, which can also result in a collision. In the last error implemented, the spindle 
speed was set to an impossible speed. In practice, this may cause the process to stop, but it could 
also result in machining at the maximum speed the machine can deliver, producing an unstable 
process that can cause serious damage to the workpiece, tool, and machine. The entire query for 
ChatGPT-3.5 is shown in Fig. A2.  
Table 2 Implemented errors 
No. of error | Line in code | Error | Correct 
1 | 4 | Absence of M3 (spindle on) | M3 after spindle speed set 
2 | 31 | G0 (rapid traverse movement) | G1 (movement at feed rate) 
3 | 53 | Spindle speed set at 716000 RPM | Spindle speed set at 7160 RPM 
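The three error classes in Table 2 can also be caught by simple rule-based checks, which gives a baseline against which an LLM's debugging can be compared. The sketch below is a heuristic illustration, not the verification method used in this study. It assumes a hypothetical machine limit `MAX_SPINDLE_RPM` (the study does not state one) and that Z = 0 is the top of the stock, so that any rapid move to a negative Z is suspicious.

```python
import re

MAX_SPINDLE_RPM = 12000   # hypothetical machine limit; not stated in the study
WORD_RE = re.compile(r"([A-Za-z])\s*([+-]?(?:\d+\.?\d*|\.\d+))")

def parse(line: str) -> list[tuple[str, float]]:
    """Split a G-code block into (address, value) pairs, ignoring (comments)."""
    line = re.sub(r"\(.*?\)", "", line)
    return [(a.upper(), float(v)) for a, v in WORD_RE.findall(line)]

def check_program(lines, max_rpm=MAX_SPINDLE_RPM, stock_top_z=0.0):
    """Return (line number, message) warnings for three heuristic checks."""
    warnings = []
    spindle_on = False
    motion = None   # current modal motion code: 0, 1, 2 or 3
    for number, line in enumerate(lines, start=1):
        block = parse(line)
        for address, value in block:
            if address == "S" and value > max_rpm:
                warnings.append((number, f"spindle speed {value:.0f} exceeds {max_rpm} RPM"))
            elif address == "M" and value in (3, 4):
                spindle_on = True
            elif address == "M" and value == 5:
                spindle_on = False
            elif address == "G" and value in (0, 1, 2, 3):
                motion = int(value)
        z_values = [v for a, v in block if a == "Z"]
        moves_axis = any(a in "XYZ" for a, _ in block)
        if motion in (1, 2, 3) and moves_axis and not spindle_on:
            warnings.append((number, "cutting move with spindle off"))
        if motion == 0 and z_values and z_values[0] < stock_top_z:
            warnings.append((number, "rapid move (G0) below the stock surface"))
    return warnings

# Hypothetical program fragment containing all three error classes.
program = [
    "G17 G21 G90 G94",
    "T01 M6",
    "S716000",
    "G0 X0 Y0 Z5",
    "G0 Z-1",
    "G1 X150 F250",
]
for number, message in check_program(program):
    print(number, message)
```

Such rules only flag patterns that were anticipated in advance; they do not know the part geometry, so a rapid move below Z = 0 outside the stock would be reported as well.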
2.8 Explaining G-code 
For ChatGPT’s explanation of the code, the model was provided with a separately created G-code 
for the specified part and asked to provide a detailed description of each line of code. The re-
quest is shown in Fig. 3. 
 
Fig. 3 Query for G-code explanation for both models 
3. Results 
As mentioned earlier, ChatGPT's capabilities with respect to ISO G-code for 3-axis CNC machin-
ing were evaluated by verifying its ability to automatically generate the code with the given in-
structions, detect and correct errors in written code and explain the meaning of the code in de-
tail. In this way, a relevant and practical comparison was achieved by creating realistic scenarios 
that a CNC programmer could encounter. The tested models GPT-3.5 and GPT-4o showed a large 
discrepancy in understanding both the code and the instructions. 
Both models were tested with identical prompts and analysed using Siemens NX CAM soft-
ware (version 2312 build 1700) for simulation validation. The target geometry, tool set, and all 
machine constraints were standardized to ensure fairness. The following subsections present 
the observed differences in model performance. 
3.1 G-code generation 
When working with the model GPT-3.5, no fully functional code was generated. The closest re-
sult was obtained with a semi-functional code, as shown in Fig. 4A. The code contained several 
errors in both milling operations, such as: 
• a collision between the tool and the workpiece, 
• movements over the same position, 
• circular interpolation in the wrong direction, 
• missing sections of the milling operations. 
The code was tested in the simulation software Siemens NX version 2312 build 1700. The de-
sired part is shown in Fig. 4B and the resulting part in Fig. 4C. The errors in the code resulted in 
an unfinished face milling operation, with the turquoise colour representing areas where no face 
milling operation was performed and the darker blue colour representing areas with a complet-
ed face milling operation.  
The code also caused a collision and an off-center milled blind hole with an unfinished bot-
tom (see Fig. 4C). In the authors’ limited experience, GPT-3.5 is not very useful when working 
with ISO G-code for 3-axis CNC machining, especially for milling operations. However, it can 
produce functional code for pure drilling operations. 
GPT-4o produced fully functional code (Fig. 5A) that can be used directly for manufacturing 
on an ISO G-code compatible CNC machine. The code is well organized and follows the given 
instructions. In addition, a request was sent to the model to display the features generated by 
the code in 2D (Fig. 5B). The generated code was then tested in the Siemens NX CAM
software to verify both the code and the resulting part. The part generated with the 
GPT-4o code matches the expected design and is shown in Fig. 5C. 
 
Fig. 4 G-code output by ChatGPT-3.5 (A), the desired result of the code (B), and the actual result of the 
simulated code in Siemens NX version 2312 build 1700 (C) 
 
Fig. 5 G-code output by GPT-4o (A), the model's 2D depiction of the generated features (B), and the result of the
simulated code in Siemens NX version 2312 build 1700 (C) 
  
3.2 Debugging G-code 
A deliberately flawed G-code was presented to both models. The bugs built into the code proved 
too difficult for model 3.5 to detect. As can be seen in Fig. A3, not only did the model fail to
detect any errors, it also claimed that none were present and that the code should work as expected. 
This is unacceptable, as it could lead to damage and a failed process. Based on these results,
the use of model 3.5 for error detection is not recommended. 
In contrast, the ChatGPT-4o model not only successfully identified the errors and implement-
ed the correct syntax, but also explained what the errors were and why they could lead to prob-
lems. It additionally warned about other parts of the code that could be redundant or problemat-
ic and corrected them. 
Considering only the introduced errors, the model showed that it understands the code and 
the process that the code specifies, taking into account the part, the tool and the machine. It
correctly recognized the missing command to switch on the spindle, without which machining would
start with the spindle stopped; it correctly changed the G0 command, which would have resulted
in a collision, to G1; and it replaced the set spindle speed with a more reasonable value,
mitigating the risk of errors or damage.  
The model's responses are shown in Fig. 6. The code was tested again using the Siemens NX 
simulation tool to check that it still worked as intended, which it did. 
 
Fig. 6 ChatGPT-4o model error detection capabilities (corrections highlighted in yellow) 
3.3 Explaining G-code 
When it comes to explaining the meaning of the commands in the code, GPT-3.5 provided only a 
very limited explanation. Although it was instructed to explain the code line by line, many
commands were skipped, and those that were explained were described very briefly, though correctly. 
A notable issue is that the model skipped commands related to movements during machining 
operations, which are important to understand. The explanation of the code by GPT-3.5 is shown in 
Fig. A4. 
As in the previous categories, GPT-4o once again outperformed GPT-3.5, this time in code ex-
planation. The model provided a line-by-line explanation of each command in a readable and 
coherent manner. It correctly recognized different types of commands, such as program start 
and end, plane and unit selection, rapid and feed movements, tool selection, coordinate system 
selection, tool approach and retraction moves, and machine settings. This indicates that the model 
can recognize and correctly explain various aspects of CNC machining. 
Such detailed explanations could be useful for individuals learning ISO G-code or for training 
custom machine learning models. The detailed explanation of the code is shown in Fig. A5. 
4. Discussion 
The results show that the performance differences between GPT-3.5 and GPT-4o are not only 
quantitative (i.e., in terms of the number of parameters) but also functional, particularly in the 
context of ISO G-code. In contrast to previous general AI comparisons, this study introduces a 
task-specific benchmarking framework that is validated with professional CAM tools. The study 
provides a basis for understanding the role of LLMs in CNC programming; however, several critical 
aspects, including the handling of complex geometries, integration with CAM systems, and specific
limitations, require further discussion. 
4.1 Practical implementation challenges 
The application of GPT-4o in CNC machining practice poses various practical difficulties. The 
conclusion of the study emphasizes the need for professional monitoring and highlights that the 
models cannot yet completely replace the CAM software. Even small errors in the G-code can 
lead to costly machine damage, production delays or safety risks and therefore require strict 
validation processes. In practice, ensuring error-free G-code requires experienced operators to 
check outputs, which can reduce the appeal of models in demanding production environments. 
The reliance on carefully crafted prompts emphasises the importance of input quality. Incom-
plete or inconsistent prompts can lead to erroneous G-code, especially for users with little CNC 
knowledge who may struggle to define machining parameters such as feed rates, toolpaths or 
coordinate systems. This presents a challenge because operators may have varying levels of 
technical knowledge. 
The lack of direct integration with CNC machines or CAM systems complicates implementa-
tion. Unlike CAM software, which can seamlessly interface with CAD models and machine con-
trollers, GPT operates as a standalone tool. Users must manually input data and transfer the out-
put to the machining systems, a process that can introduce errors. Integration as a CAM plug-in 
is possible but would require significant development effort to achieve real-time data transfer 
and compatibility with different machine controllers. 
Additionally, the research assumes that the ISO G-code is compatible with all CNC controls, as 
most modern controls (e.g. from Fanuc, Siemens or Heidenhain) comply with this standard. 
However, controls often require machine-specific post-processing to accommodate slight differ-
ences in syntax, proprietary M-codes or specified cycles. GPT-4o has effectively generated G-
code that has been validated in Siemens NX, but its ability to adapt to control-specific details 
without direct guidance has yet to be evaluated. Some controls may require additional com-
mands to change tools or specific formatting for coordinate systems. 
4.2 Model performance for complex geometries 
ChatGPT-3.5 generated a partially functional G-code, whereas GPT-4o generated a functional G-
code, demonstrating its improved ability to understand and generate machining instructions. 
Nevertheless, both models are likely to struggle with complex geometries, such as free-form surfaces or 
complex toolpaths, which were not investigated in this study. 
Complicated shapes require accurate toolpath calculations and an understanding of complex 
machining dynamics. The research focuses on a basic 3-axis component, raising concerns about 
the models’ ability to scale to multi-axis machining (e.g., 4- or 5-axis), where tool orientation and 
simultaneous multi-axis movements add complexity. Both models, particularly ChatGPT-3.5, are 
likely to struggle in such scenarios due to their reliance on text-based input and limited capacity 
to process complex spatial information. 
4.3 Specific LLM limitations 
Specific LLM constraints, such as the size of the context window, have a significant impact on 
G-code generation performance. The context window, which determines the amount of text an LLM 
can process at once, poses a challenge for large G-code files or complex parts with thousands of 
lines of code.  
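One possible workaround for long programs is to pass the G-code to the model in pieces. The sketch below packs lines greedily into chunks under a character budget; it is only an illustration, and using characters as a stand-in for tokens is an assumption, since the actual ratio depends on the model's tokenizer. The function name `split_program` is hypothetical.

```python
def split_program(lines: list[str], max_chars: int) -> list[list[str]]:
    """Greedily group G-code lines into chunks of at most max_chars characters.

    A single line longer than max_chars still forms its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        length = len(line) + 1            # +1 for the newline
        if current and size + length > max_chars:
            chunks.append(current)        # budget reached: start a new chunk
            current, size = [], 0
        current.append(line)
        size += length
    if current:
        chunks.append(current)
    return chunks

# Hypothetical three-line program with a budget of 12 characters per chunk.
print(split_program(["G0 X0", "G1 X1", "G1 X2"], max_chars=12))
```

Chunking has a cost of its own: G-code is modal, so a chunk that starts mid-program no longer shows the active plane, units, tool, or motion mode unless that state is restated at the top of each chunk.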
The scope of the training data also affects performance. While GPT-4o’s training likely in-
cludes diverse text data, it may lack comprehensive CNC machining datasets, especially for spe-
cialised processes or proprietary control syntax. This gap can lead to errors in generating G-code 
for niche applications or in interpreting ambiguous prompts. For example, the study shows that 
ChatGPT-3.5 had difficulty with milling operations, likely due to insufficient training in CNC-
specific terminology and processes. The improved performance of GPT-4o suggests a broader 
training dataset, yet its limitations become apparent with advanced toolpath strategies that re-
quire fine-tuned prompts to specify parameters accurately. Addressing these limitations may 
require fine-tuning the LLMs on CNC-specific data sets or enlarging the context windows to han-
dle larger G-code files. 
4.4 Future directions 
Future research should focus on several key factors to improve the application of LLMs in CNC 
machining. First, evaluating GPT-4o with complex geometries and multi-axis machining scenarios 
would reveal its scalability and limitations. Second, establishing standardised metrics such as 
error rates, generation time and simulation success would allow a thorough comparison with 
CAM software. Third, implementing CAM plug-ins could connect customised LLM applications 
with industrial practice. Finally, addressing LLM-specific constraints, such as the size of the 
context window and the scope of the training data, through fine-tuning or customised data sets 
may improve performance in advanced machining tasks. 
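The standardised metrics mentioned above could be recorded per generated program and then aggregated. The sketch below shows one possible layout; the class `Trial`, the function `summarise`, and all field names are hypothetical, and the numbers in the example are placeholders, not measurements from this study.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    """One generated program and how it fared (hypothetical record layout)."""
    model: str
    generation_seconds: float
    error_count: int          # errors found when reviewing the G-code
    simulation_passed: bool   # True if the simulated part matched the design

def summarise(trials: list[Trial]) -> dict[str, float]:
    """Aggregate error rate, generation time and simulation success."""
    n = len(trials)
    return {
        "mean_generation_seconds": sum(t.generation_seconds for t in trials) / n,
        "mean_errors_per_program": sum(t.error_count for t in trials) / n,
        "simulation_success_rate": sum(t.simulation_passed for t in trials) / n,
    }

# Placeholder values for illustration only; not results of this study.
example_trials = [
    Trial("model-a", generation_seconds=10.0, error_count=2, simulation_passed=False),
    Trial("model-a", generation_seconds=20.0, error_count=0, simulation_passed=True),
]
print(summarise(example_trials))
```

Recording the same fields for CAM-generated programs would put both approaches on a common scale.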
In summary, GPT-4o has potential for creating, debugging, and clarifying G-code for basic
3-axis CNC machining and could serve as a cost-effective alternative to conventional CAM software
for simple parts. 
Nevertheless, the problems associated with complicated shapes, challenging toolpath techniques 
and LLM limitations require further research and development. Under expert supervision and 
potential integration into CAM systems, GPT-4o could become an indispensable tool in manufac-
turing, particularly for small-batch production or for educational purposes. 
5. Conclusion 
This study has demonstrated the functionality of ChatGPT-3.5 and ChatGPT-4o in CNC machining 
using ISO G-code. Notable differences in performance were observed when evaluating the 
ability of the AI models to generate, interpret and correct ISO G-code. ChatGPT-3.5
showed limitations, particularly in identifying errors and explaining the code, frequently 
skipping lines and providing terse and uninformative descriptions. It also struggled with milling, 
although it showed some ability with simpler tasks such as drilling. 
In comparison, ChatGPT-4o produced fully functional ISO G-code for the example part, 
demonstrating its capability for applications with simple geometries. It showed an improved 
understanding of the code, successfully detecting and correcting errors while providing clear 
and thorough explanations for each line of code. This makes it a useful resource for learning and 
teaching G-code programming, as well as an additional verification method when code simula-
tion is not possible. Despite its advances, ChatGPT-4o cannot yet replace traditional CAM pro-
gramming, especially for complex operations. Its limitations, such as the requirement for text-
only input, can lengthen the information input process. However, for simple operations and in 
situations where CAM software is not available, it can significantly reduce the time required to 
manually write the G-code. 
In conclusion, ChatGPT, especially the GPT-4o model, has some potential to improve G-code 
programming; however, it still requires expert supervision. Subsequent studies should aim to 
improve the functionalities for more complicated machining operations and increase the inte-
gration of these AI models into current CAM systems. 
Funding and acknowledgment 
The authors acknowledge the financial support from the Slovenian Research and Innovation Agency (research core 
funding No. P2-0157). 
Declaration of competing interests 
The authors declare that they have no known competing financial interests or personal relationships that could influ-
ence the work in this article. 
References 
[1] Li, B.-H., Hou, B.-C., Yu, W.-T., Lu, X.-B., Yang, C.-W. (2017). Applications of artificial intelligence in intelligent 
manufacturing: A review, Frontiers of Information Technology & Electronic Engineering, Vol. 18, 86-96, doi: 
10.1631/FITEE.1601885. 
[2] Yang, T., Yi, X., Lu, S., Johansson, K.H., Chai, T. (2021). Intelligent manufacturing for the process industry driven 
by industrial artificial intelligence, Engineering, Vol. 7, No. 9, 1224-1230, doi: 10.1016/j.eng.2021.04.023. 
[3] Tao, F., Qi, Q., Liu, A., Kusiak, A. (2018). Data-driven smart manufacturing, Journal of Manufacturing Systems, Vol. 
48, Part C, 157-169, doi: 10.1016/j.jmsy.2018.01.006. 
[4] Davis, J., Edgar, T., Graybill, R., Korambath, P., Schott, B., Swink, D., Wang, J., Wetzel, J. (2015). Smart manufactur-
ing, Annual Review of Chemical and Biomolecular Engineering, Vol. 6, 141-160, doi: 10.1146/annurev-
chembioeng-061114-123255. 
[5] Wan, J., Li, X., Dai, H.-N., Kusiak, A., Martinez-Garcia, M., Li, D. (2021). Artificial-intelligence-driven customized 
manufacturing factory: Key technologies, applications, and challenges, Proceedings of the IEEE, Vol. 109, No. 4, 
377-398, doi: 10.1109/JPROC.2020.3034808. 
[6] Lee, J., Davari, H., Singh, J., Pandhare, V. (2018). Industrial artificial intelligence for industry 4.0-based manufac-
turing systems, Manufacturing Letters, Vol. 18, 20-23, doi: 10.1016/j.mfglet.2018.09.002. 
[7] Yao, X., Zhou, J., Zhang, J., Boer, C.R. (2017). From intelligent manufacturing to smart manufacturing for industry 
4.0 driven by next generation artificial intelligence and further on, In: Proceedings of 2017 5
th
 International Con-
ference on Enterprise Systems (ES), Beijing, China, 311-318, doi: 10.1109/ES.2017.58. 
[8] Heiden, B., Alieksieiev, V., Volk, M., Tonino-Heiden, B. (2021). Framing artificial intelligence (AI) additive manu-
facturing (AM), Procedia Computer Science, Vol. 186, 387-394, doi: 10.1016/j.procs.2021.04.161. 
[9] Rane, N., Choudhary, S., Rane, J. (2024). Intelligent manufacturing through generative artificial intelligence, such 
as ChatGPT or Bard, SSRN Electronic Journal, doi: 10.2139/ssrn.4681747. 
[10] Wang, X., Anwer, N., Dai, Y., Liu, A. (2023). ChatGPT for design, manufacturing, and education, Procedia CIRP, Vol. 
119, 7-14, doi: 10.1016/j.procir.2023.04.001. 
[11] Javaid, M., Haleem, A., Singh, R.P. (2023). A study on ChatGPT for industry 4.0: Background, potentials, challeng-
es, and eventualities, Journal of Economy and Technology, Vol. 1, 127-143, doi: 10.1016/j.ject.2023.08.001. 
[12] Badini, S., Regondi, S., Frontoni, E., Pugliese, R. (2023). Assessing the capabilities of ChatGPT to improve additive 
manufacturing troubleshooting, Advanced Industrial and Engineering Polymer Research, Vol. 6, No. 3, 278-287, 
doi: 10.1016/j.aiepr.2023.03.003. 
[13] Sriwastwa, A., Ravi, P., Emmert, A., Chokshi, S., Kondor, S., Dhal, K., Patel, P., Chepelev, L.L., Rybicki, F.J., Gupta, R. 
(2023). Generative AI for medical 3D printing: A comparison of ChatGPT outputs to reference standard educa-
tion, 3D Printing in Medicine, Vol. 9, Article No. 21, doi: 10.1186/s41205-023-00186-8. 
[14] Jignasu, A., Marshall, K., Ganapathysubramanian, B., Balu, A., Hegde, C., Krishnamurthy, A. (2023). Towards foun-
dational AI models for additive manufacturing: Language models for G-code debugging, manipulation, and com-
prehension, ArXiv, doi: 10.48550/arXiv.2309.02465. 
[15] Thirunavukarasu, A.J., Ting, D.S.J., Elangovan, K., Gutierrez, L., Tan, T.F., Ting, D.S.W. (2023). Large language mod-
els in medicine, Nature Medicine, Vol. 29, 1930-1940, doi: 10.1038/s41591-023-02448-8. 
[16] Dai, W., Lin, J., Jin, H., Li, T., Tsai, Y.-S., Gašević, D., Chen, G. (2023). Can large language models provide feedback to 
students? A case study on ChatGPT, In: Proceedings of 2023 IEEE International Conference on Advanced Learning 
Technologies (ICALT), Orem, USA, 323-325, doi: 10.1109/ICALT58122.2023.00100. 
[17] Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, 
Y., Yu, P.S., Yang, Q., Xie, X. (2024). A survey on evaluation of large language models, ACM Transactions on Intelli-
gent Systems and Technology, Vol. 15, No. 3, 1-45, doi: 10.1145/3641289. 
[18] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for 
language understanding, In: Proceedings of the 2019 Conference of the North American Chapter of the Association 
for Computational Linguistics: Human Language Technologies, Minneapolis, USA, 4171-4186, doi: 10.18653/ 
v1/N19-1423. 
[19] Bouschery, S.G., Blazevic, V., Piller, F.T. (2023). Augmenting human innovation teams with artificial intelligence: 
exploring transformer-based language models, Journal of Product Innovation Management, Vol. 40, No. 2, 139-
153, doi: 10.1111/jpim.12656. 
[20] Pearce, K., Zhan, T., Komanduri, A., Zhan, J. (2021). A comparative study of transformer-based language models 
on extractive question answering, ArXiv, doi: 10.48550/arXiv.2110.03142. 
Large language models for G-code generation in CNC machining: A comparison of ChatGPT-3.5 and ChatGPT-4o 
 
Advances in Production Engineering & Management 20(2) 2025 235 
 
[21] Kalla, D., Smith, N. (2023). Study and analysis of Chat GPT and its impact on different fields of study, Internation-
al Journal of Innovative Science and Research Technology, Vol. 8, No. 3, 827-833, doi: 10.5281/zenodo.7767675. 
[22] Chitsaart, C., Rianmora, S., Rattana-Areeyagon, M., Namjaiprasert, W. (2014). Automatic generating CNC-code for 
milling machine, International Journal of Mechanical, Aerospace, Industrial, Mechatronic and Manufacturing Engi-
neering, Vol. 7, No. 12, 2607-2613. 
[23] Zhang, Y., Zeng, Q., Mu, G., Yang, Y., Yan, Y., Song, W. , Gong, Y. (2018). A design for a novel open, intelligent and 
integrated CNC system based on ISO 10303-238 and PMAC, Tehnički Vjesnik – Technical Gazette, Vol. 25, No. 2, 
470-478, doi: 10.17559/TV-20170419111243. 
[24] Gu, Y., Wang, Y., Lin, J., Yuan, X. (2017). Fault location in CNC system software based on the architecture expan-
sion, Tehnički Vjesnik – Technical Gazette, Vol. 24, No. 2, 619-625, doi: 10.17559/TV-20160704190047. 
Appendix A 
 
Fig. A1 Query for G-code generation and file recognition using ChatGPT-4o model 
Fig. A2 Query for G-code debugging using ChatGPT-3.5 model (lines with error marked in yellow) 
 
Fig. A3 ChatGPT-3.5 model error detection capabilities (problematic part highlighted in red) 
Fig. A4 ChatGPT-3.5 model G-code explanation 
Fig. A5 ChatGPT-4o model G-code explanation