Advances in Production Engineering & Management
ISSN 1854-6250
Volume 20 | Number 2 | June 2025 | pp 224–238
Journal home: apem-journal.org
https://doi.org/10.14743/apem2025.2.537
Original scientific paper

Large language models for G-code generation in CNC machining: A comparison of ChatGPT-3.5 and ChatGPT-4o

Šket, K. a,*, Potočnik, D. a, Brezocnik, M. a, Ficko, M. a, Klančnik, S. a
a University of Maribor, Faculty of Mechanical Engineering, Maribor, Slovenia

ABSTRACT

This research explores the viability of producing ISO G-code for 3-axis machining with OpenAI's Chat Generative Pre-Trained Transformer models, particularly ChatGPT-3.5 and the newer GPT-4o. G-code (RS-274-D, ISO 6983) converts human directives into commands that machines can understand, controlling toolpaths, spindle velocities, and feed rates to produce particular aspects of an object. Previously, G-code was generated either by hand or through the use of computer-aided manufacturing (CAM) software along with machine-specific post-processors, both of which may require considerable time and expense. This research aimed to assess the practicality and effectiveness of specific large language models (LLMs) in generating G-code. The assessment took place in three distinct phases on a sample component that required 3-axis machining. These phases included: (1) the self-generated production of G-code for the sample component, (2) the examination of the independently generated G-code in the CAM application, and (3) the recognition and justification of mistakes in the G-code. The outcomes indicated varying abilities with promising findings. This method could accelerate and possibly enhance manufacturing workflows by decreasing reliance on expensive CAM software and specialized knowledge.

ARTICLE INFO
Keywords: Generative artificial intelligence; Intelligent manufacturing; Large language models (LLM); ChatGPT; CNC machining; G-code programming

*Corresponding author: kristijan.sket@um.si (Šket, K.)

Article history: Received 9 May 2025; Revised 13 June 2025; Accepted 19 June 2025

Content from this work may be used under the terms of the Creative Commons Attribution 4.0 International Licence (CC BY 4.0). Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

In advanced manufacturing, especially CNC machining, the incorporation of generative AI models like ChatGPT signifies a novel frontier. This integration aligns with the larger trend towards smart manufacturing and machining as outlined in [1, 2]. The rapid advancement of AI technologies, especially data-driven systems, swarm intelligence, and hybrid human-machine systems, marks the progress of smart manufacturing [3, 4], in which AI's analytical and predictive power can be applied to enhance and streamline CNC machining processes, advancing the continuous technological revolution in manufacturing characterized by the new age of the Internet and AI [5-7].

The objective of this study was to employ OpenAI's Chat Generative Pre-Trained Transformer, widely referred to as ChatGPT, in CNC machining to explore its ability to identify, comprehend, and produce ISO G-code for milling operations. This study draws inspiration from the extensive adoption of AI in production and manufacturing systems. The research conducted by Hu Li and colleagues [1] offers a technical examination of the elements that are crucial for the acceptance of AI. Bernhard Heiden and colleagues [8] demonstrate how AI can integrate with manufacturing techniques to establish a self-organizing system.
Both studies highlight the significance of AI in enhancing material movement and process effectiveness. This emphasis on data analysis aligns with the increasing trend of employing AI to assess and enhance manufacturing parameters.

1.1 Related work

The paper [9] authored by Rane et al. reviews ChatGPT, Bard, and various other generative AI technologies and performs a comprehensive examination, featuring a literature review and bibliometric analysis, to identify the prevailing trends and significant factors in the incorporation of AI tools like ChatGPT into manufacturing, pointing out these technologies as a crucial element in the continuous advancement in production engineering.

In a study by Wang et al. [10] the authors analyse the use of ChatGPT in the manufacturing industry and evaluate its advantages and disadvantages. It works well with structured and comprehensive answers, yet drawbacks were identified, particularly in the provision of accurate technical expertise and the tendency to generate incorrect information when queries are made from outside the training data. The authors emphasize the importance of human verification of answers for efficient communication.

In their study, Javaid, Haleem and Singh [11] discuss the incorporation of ChatGPT into the framework of Industry 4.0, exploring how ChatGPT can be adapted to automate tasks and examining its applications in Industry 4.0, including improving human-robot collaboration, supporting predictive maintenance, ensuring quality control, and performing big data analytics.

Some extensive research has been conducted into ChatGPT's usage in additive manufacturing (AM) and usage of G-code in that field. Badini et al. [12] conducted an evaluation of usability by optimizing the generation of G-codes for fused filament fabrication.
ChatGPT's ability to process and optimize suboptimal G-code data has shown its potential to streamline the process. In their article [13], Sriwastwa et al. examined the role of ChatGPT in enhancing training for medical 3D printing. Their research indicated that it offers precise and beneficial responses to fundamental inquiries, particularly for novice trainees or newcomers; nonetheless, as the difficulty of the questions rises, particularly for situations demanding practical experience or thorough technical expertise, the shortcomings become increasingly clear. In [14], the authors discuss the use of large language models (LLM) in AM, with a particular focus on their ability to understand and process G-code. The study shows that while models such as GPT-4 and Claude-2 perform excellently in several areas, their ability to comprehensively analyse and capture the complicated geometry of G-code is significantly limited, mainly due to the short length of the context windows.

1.2 Study justification

To the authors' knowledge, there are no studies that focus on the use of generative AI models, such as ChatGPT, for the automatic generation, interpretation, and correction of G-code in CNC machining, although there is a variety of research on AI and its applications in manufacturing. This lack of literature highlights the need for a focused study to evaluate the usefulness and drawbacks of these models in actual production scenarios.

This study is the first evaluation of ChatGPT's ability to generate, decode and correct ISO G-code for CNC machining. It explores the potential of ChatGPT to automate key steps in the CNC programming process, in contrast to previous studies that have investigated more general applications of AI in manufacturing. The focus of the study is not on the architectural novelty of the models, but on the empirical evaluation of how the differences between the models affect the performance of G-code programming.
The article is based on experiments that evaluated responses with three main objectives. First, the ability to create G-code with inputs typically used in commercial computer-aided manufacturing (CAM) programs was determined. Second, ChatGPT's understanding of G-code was evaluated by creating a simple program for 3-axis machining. In the third iteration, ChatGPT's ability to recognize errors and attempt to correct the given G-code was evaluated. The goal of this investigation was to assess ChatGPT's capabilities for this type of work and to determine whether it is possible to create G-code for simple CNC machining problems using ChatGPT alone, which could reduce the need for commercial CAM software in the future.

2. Materials and methods

2.1 Used artificial intelligence method

LLMs have transformed natural language processing through transformer-based neural network architectures (TSMs) [15, 16]. The progression of natural language processing encompasses four key stages: statistical language models, neural language models, pre-trained language models, and LLMs, each stage building on the capabilities of the prior generation to improve language understanding and generation [15, 16]. These models have shown potential in areas such as healthcare [15], education [16], and scientific research [17]; yet some challenges remain, including data bias, high computational costs, and ethical considerations.

TSMs are typically divided into two main types: scratch-trained and pre-trained. Models developed from the ground up are tailored for specific tasks, while pre-trained models first undergo extensive training on large text datasets using self-supervised learning methods before they are fine-tuned for specific downstream applications. Instances of pre-trained models are BERT [18] (Bidirectional Encoder Representations from Transformers) and GPT.
BERT employs a bidirectional technique to understand the context of words in both directions in a sentence, whereas GPT utilizes a unidirectional approach that focuses on forecasting the next word in a sequence [19, 20].

The functionality and implementation of ChatGPT are complicated and advanced, but the product is a system that can respond to queries and cues like a human [21]. Thanks to its scalability, it can manage multiple conversations simultaneously, increasing productivity and reducing the need for human intervention. Its efficiency in processing large amounts of data quickly also saves time. However, ChatGPT also has some significant drawbacks. It can reproduce biases from the training set, potentially promoting discrimination or stereotyping. Additionally, since its knowledge base is limited to its training data, its answers may contain errors on unusual or specialized topics. In this study, the limitations of ChatGPT were tested in the generation of G-code for CNC machining.

2.2 ISO G-code

G-code is a language used to control toolpaths and generate the profiles of an object's features in CNC machining. It is essentially a set of instructions that translate human directives into machine-readable instructions so that CNC machines can operate automatically. The primary aim of G-code is to regulate different aspects of the machining process, including the motion of the cutting tool, feed rates, spindle speeds, and coolant flow, to ensure high precision and efficiency in production, while also enabling safe operation and collision-free machining [22-24].

G-code can be generated automatically with CAM software that uses a virtual 3D model as input to produce the necessary code for machining the part, or it can be manually written. However, creating the G-code manually requires computer and programming skills.
Generating the G-code with CAM requires specialized software and a machine-specific post-processor, which can be quite expensive and requires a trained expert to operate. On the other hand, writing the code manually can be time-consuming and repetitive. The use of AI models for the rapid creation of G-code for simple and small series of parts could, therefore, be a cost-effective and fast solution [22].

2.3 Example part

The example part consists of a cube (Fig. 1) measuring 150 × 150 × 150 mm. On its top side, there are four symmetrically arranged holes, each with a diameter of 16 mm, located 25 mm from the edges of the cube and reaching a depth of 50 mm. In the centre there is a through-hole with a diameter of 12 mm. This central through-hole is counterbored with a diameter of 25 mm and a depth of 25 mm. In addition, the upper surface must be face-milled by 1 mm.

Fig. 1 Design of the example part

2.4 Responses verifying method

To check the answers generated by ChatGPT, the G-code required to produce the example part was created separately in the CAM software Siemens NX version 2312 build 1700, where everything required for a functioning ISO G-code was defined. A part measuring 150 × 150 × 151 mm was defined as the blank and four different tools were provided to create the desired part. Tool characteristics and intended uses are listed in Table 1.
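The hole layout described in Section 2.3 can be turned into coordinates with a few lines of Python. The following is only a minimal sketch under stated assumptions: the 25 mm offset is taken to be the distance from each edge to the hole centre, the work origin is assumed to sit at one corner of the finished top face with Z0 on that face, and the function names and the 5 mm clearance height are hypothetical. The emitted blocks use plain G0/G1 moves with the 250 mm/min feed rate from Table 1. The output is a fragment, not a complete program: it has no tool change, spindle start, or pecking, and it is not the code produced by Siemens NX or by either model.

```python
from itertools import product

SIDE = 150.0         # cube edge length in mm (from the part description)
EDGE_OFFSET = 25.0   # ASSUMPTION: distance from each edge to a hole centre, in mm
HOLE_DEPTH = 50.0    # depth of the four symmetric holes in mm
FEED = 250           # feed rate in mm/min (Table 1)
CLEARANCE = 5.0      # ASSUMPTION: safe height above the top face, in mm


def corner_hole_centres(side=SIDE, offset=EDGE_OFFSET):
    """Return the (x, y) centres of the four symmetrically arranged holes."""
    coords = (offset, side - offset)
    return [(x, y) for x, y in product(coords, repeat=2)]


def drilling_blocks(centres, depth=HOLE_DEPTH, feed=FEED, clearance=CLEARANCE):
    """Emit a simple position / plunge / retract sequence for each centre."""
    blocks = []
    for x, y in centres:
        blocks.append(f"G0 X{x:.1f} Y{y:.1f}")    # rapid to the hole position
        blocks.append(f"G0 Z{clearance:.1f}")     # rapid to clearance height
        blocks.append(f"G1 Z{-depth:.1f} F{feed}")  # plunge at feed rate
        blocks.append(f"G0 Z{clearance:.1f}")     # retract above the part
    return blocks


if __name__ == "__main__":
    for block in drilling_blocks(corner_hole_centres()):
        print(block)
```

Keeping the geometry in named constants makes the edge-offset assumption easy to change if the 25 mm dimension refers to the hole edge rather than the hole centre.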
Table 1 Tools properties and intended use

| Tool number | Tool name | Feed rate (mm/min) | Surface speed (mm/min) | Operation |
|---|---|---|---|---|
| T01 | Insert cutter ⌀ 50 mm | 250 | 100 | Top face milling |
| T02 | Drill ⌀ 16 mm | 250 | 100 | Drilling holes ⌀ 16 |
| T03 | Drill ⌀ 12 mm | 250 | 100 | Drilling hole ⌀ 12 |
| T04 | Carbide end mill ⌀ 15 mm | 250 | 100 | Pocket and central hole milling |

2.5 Queries encoding

The requests were simple and informative, aiming to provide ChatGPT with all relevant information, especially when it was expected to create the G-code for the requested part. The model was encouraged to make suggestions about what information might be missing or could be better presented.

It was decided that the model should first create its own version of the code before the separately created code was fed into the model for debugging and explanation. This way, the model did not have access to any pre-learned data. Additionally, the conversations with the model were performed in the temporary chat, where the input data should not be used for model learning (specified by the service provider).

2.6 G-code preparation

To prepare the G-code readable file for the model, an STL file of the example part and a technical drawing were created. In addition, text descriptions of the part, the tool library, and the capabilities of the CNC machine were added. Features that should be used in the code such as planes, units, feed modes, coordinate systems, and tool lengths compensation modes, etc. were specified to clearly describe what and how the part should be manufactured. The input text is shown in Fig. 2.

Fig. 2 Text query for G-code generation using ChatGPT-3.5 and GPT-4o

To test the image recognition capabilities of the GPT-4o model and the recognition capabilities of STL files, the text explanation of the designed part in the query was replaced with an STL file of the 3D model and an image of the technical documentation (Fig. 1), as presented in Appendix Fig. A1.

2.7 Debugging G-code

For the requests to debug and repair the created G-code, a part description and the entire G-code with three implemented errors (Table 2) were provided to the model along with the request for ChatGPT to identify and fix them. For the implemented errors, types were selected that have a major impact on the stability of the process and can lead to damage to the workpiece, tool or machine. The absence of the M3 command can cause the process to start with the spindle switched off, potentially leading to a collision between the tool and the workpiece. The next error is a rapid traverse movement (G0) instead of a feed movement (G1), which means that the tool moves into the workpiece at a much higher speed, which can also result in a collision. In the last error implemented, the spindle speed was set to an impossible speed. In practice, this may cause the process to stop, but it could also result in machining at the maximum speed the machine can deliver, producing an unstable process that can cause serious damage to the workpiece, tool, and machine. The entire query for ChatGPT-3.5 is shown in Fig. A2.

Table 2 Implemented errors

| No. of error | Line in code | Error | Correct |
|---|---|---|---|
| 1 | 4 | Absence of M3 (spindle on) | M3 after spindle speed set |
| 2 | 31 | G0 (rapid traverse movement) | G1 (movement at feed rate) |
| 3 | 53 | Spindle speed set at 716000 RPM | Spindle speed set at 7160 RPM |

2.8 Explaining G-code

For ChatGPT's explanation of the code, the model was provided with a separately created G-code for the specified part and asked to provide a detailed description of each line of code. The request is shown in Fig. 3.
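The three error types in Table 2 are simple enough that a rule-based check can flag them independently of any language model. The sketch below illustrates that idea and is not part of the study's method. It makes several assumptions: the spindle limit `MAX_SPINDLE_RPM` is a hypothetical value, Z0 is taken to be the top of the workpiece, the program contains no comments, and modal G-codes are not tracked, so only blocks that state G0 or G1 explicitly are examined. A rapid move to negative Z is reported as suspicious rather than as a certain collision.

```python
import re

MAX_SPINDLE_RPM = 12000.0  # ASSUMPTION: hypothetical machine limit, not from the paper

# One G-code word: an address letter followed by a signed number, e.g. "Z-5" or "S7160".
WORD = re.compile(r"([A-Z])(-?\d+(?:\.\d+)?)")


def check_program(lines, max_rpm=MAX_SPINDLE_RPM):
    """Return a list of (line_number, message) findings for three simple rules."""
    findings = []
    spindle_on = False
    for number, raw in enumerate(lines, start=1):
        words = [(letter, float(value)) for letter, value in WORD.findall(raw.upper())]
        g_codes = {value for letter, value in words if letter == "G"}
        m_codes = {value for letter, value in words if letter == "M"}

        # Rule 3: spindle speed above the assumed machine limit.
        for letter, value in words:
            if letter == "S" and value > max_rpm:
                findings.append((number, f"spindle speed {value:.0f} exceeds limit"))

        # Track spindle state: M3/M4 switch it on, M5 switches it off.
        if m_codes & {3.0, 4.0}:
            spindle_on = True
        if 5.0 in m_codes:
            spindle_on = False

        # Rule 1: feed move while the spindle has not been started.
        if 1.0 in g_codes and not spindle_on:
            findings.append((number, "feed move with spindle off"))

        # Rule 2: rapid move to a Z value below the assumed workpiece top.
        if 0.0 in g_codes:
            for letter, value in words:
                if letter == "Z" and value < 0:
                    findings.append((number, "rapid move below Z0"))
    return findings


if __name__ == "__main__":
    flawed = ["G21 G90", "T1 M6", "S716000", "G0 X0 Y0 Z5", "G0 Z-5", "G1 X10 F250"]
    for number, message in check_program(flawed):
        print(number, message)
```

A check of this kind does not replace simulation in CAM software. It only covers the rules written into it, which is why the findings are phrased as warnings for a human reviewer.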
Fig. 3 Query for G-code explanation for both models

3. Results

As mentioned earlier, ChatGPT's capabilities with respect to ISO G-code for 3-axis CNC machining were evaluated by verifying its ability to automatically generate the code with the given instructions, detect and correct errors in written code and explain the meaning of the code in detail. In this way, a relevant and practical comparison was achieved by creating realistic scenarios that a CNC programmer could encounter. The tested models GPT-3.5 and GPT-4o showed a large discrepancy in understanding both the code and the instructions.

Both models were tested with identical prompts and analysed using Siemens NX CAM software (version 2312 build 1700) for simulation validation. The target geometry, tool set, and all machine constraints were standardized to ensure fairness. The following subsections present the observed differences in model performance.

3.1 G-code generation

When working with the model GPT-3.5, no fully functional code was generated. The closest result was obtained with a semi-functional code, as shown in Fig. 4A. The code contained several errors in both milling operations, such as:
• a collision between the tool and the workpiece,
• movements over the same position,
• circular interpolation in the wrong direction,
• missing sections of the milling operations.

The code was tested in the simulation software Siemens NX version 2312 build 1700. The desired part is shown in Fig. 4B and the resulting part in Fig. 4C. The errors in the code resulted in an unfinished face milling operation, with the turquoise colour representing areas where no face milling operation was performed and the darker blue colour representing areas with a completed face milling operation.
The code also caused a collision and an off-center milled blind hole with an unfinished bottom (see Fig. 4C). In the authors' limited experience, GPT-3.5 is not very useful when working with ISO G-code for 3-axis CNC machining, especially for milling operations. However, it can produce functional code for pure drilling operations.

GPT-4o produced fully functional code (Fig. 5A) that can be used directly for manufacturing on an ISO G-code compatible CNC machine. The code is well organized and follows the given instructions. In addition, a request was sent to the model to display the features generated by the code in 2D (Fig. 5B). The generated code was then tested again in the dedicated CAM software Siemens NX to verify both the code and the resulting part. The part generated with the GPT-4o code matches the expected design and is shown in Fig. 5C.

Fig. 4 ChatGPT model 3.5 outputted G-code (A), the desired result of the code (B), and the actual result of the simulated code in Siemens NX version 2312 build 1700 (C)

Fig. 5 ChatGPT GPT-4o model output G-code (A), the output depiction (B), and the result of the simulated code in Siemens NX version 2312 build 1700 (C)

3.2 Debugging G-code

A deliberately flawed G-code was presented to both models. The bugs built into the code proved too difficult for model 3.5 to detect. As can be seen in Fig. A3, not only were no errors detected, but the model even claimed that none were present and that the code should work as expected. This is of course unacceptable and could lead to damage and a failed process. Based on these results, the use of model 3.5 for error detection is not recommended.
In contrast, the ChatGPT-4o model not only successfully identified the errors and implemented the correct syntax, but also explained what the errors were and why they could lead to problems. It additionally warned about other parts of the code that could be redundant or problematic and corrected them.

Looking only at the errors that were deliberately introduced, the model showed that it understands the code and the process that the code specifies, considering the part, the tool and the machine. It correctly recognized the missing command that switches on the spindle, changed the G0 command that would have resulted in a collision to G1, and replaced the set spindle speed with a more reasonable value, so that the occurrence of errors or damage is mitigated. The model's responses are shown in Fig. 6. The code was tested again using the Siemens NX simulation tool to check that it still worked as intended, which it did.

Fig. 6 ChatGPT-4o model error detection capabilities (corrections highlighted in yellow)

3.3 Explaining G-code

When it comes to explaining the meaning of the commands in the code, GPT-3.5 provides only a very limited explanation. Although it was instructed to explain the code line by line, many commands were skipped, and those that were explained are presented very briefly, though correctly. A potential issue is that the model skipped commands related to movements during machining operations, which are important to know. The explanation of the code by GPT-3.5 is shown in Fig. A4.

As in the previous categories, GPT-4o once again outperformed GPT-3.5, this time in code explanation. The model provided a line-by-line explanation of each command in a readable and coherent manner.
It correctly recognized different types of commands, such as program start and end, selected planes, units, rapid and work movements, tool selection, coordinate system selection, tool in moves and retraction, machine settings, etc. This demonstrates that the model understands various aspects of CNC machining and can recognize and explain them correctly. Such detailed explanations could be useful for individuals learning ISO G-code or for teaching custom machine learning models. The detailed explanation of the code is shown in Fig. A5.

4. Discussion

The results show that the performance differences between GPT-3.5 and GPT-4o are not only quantitative (i.e., in terms of the number of parameters) but also functional, particularly in the context of ISO G-code. In contrast to previous general AI comparisons, this study introduces a task-specific benchmarking framework that is validated with professional CAM tools. The study provides a basis for understanding the role of LLMs in CNC programming; however, several critical aspects, including the handling of complex geometries, integration with CAM systems, and specific limitations, require further discussion.

4.1 Practical implementation challenges

The application of GPT-4o in CNC machining practice poses various practical difficulties. The conclusion of the study emphasizes the need for professional monitoring and highlights that the models cannot yet completely replace the CAM software. Even small errors in the G-code can lead to costly machine damage, production delays or safety risks and therefore require strict validation processes. In practice, ensuring error-free G-code requires experienced operators to check outputs, which can reduce the appeal of models in demanding production environments.

The reliance on carefully crafted prompts emphasises the importance of input quality.
Incomplete or inconsistent prompts can lead to erroneous G-code, especially for users with little CNC knowledge who may struggle to define machining parameters such as feed rates, toolpaths or coordinate systems. This presents a challenge because operators may have varying levels of technical knowledge.

The lack of direct integration with CNC machines or CAM systems complicates implementation. Unlike CAM software, which can seamlessly interface with CAD models and machine controllers, GPT operates as a standalone tool. Users must manually input data and transfer the output to the machining systems, a process that can introduce errors. Integration as a CAM plug-in is possible but would require significant development effort to achieve real-time data transfer and compatibility with different machine controllers.

Additionally, the research assumes that the ISO G-code is compatible with all CNC controls, as most modern controls (e.g. from Fanuc, Siemens or Heidenhain) comply with this standard. However, controls often require machine-specific post-processing to accommodate slight differences in syntax, proprietary M-codes or specified cycles. GPT-4o has effectively generated G-code that has been validated in Siemens NX, but its ability to adapt to control-specific details without direct guidance has yet to be evaluated. Some controls may require additional commands to change tools or specific formatting for coordinate systems.

4.2 Model performance for complex geometries

ChatGPT-3.5 generated a partially functional G-code, whereas GPT-4o generated a functional G-code, demonstrating its improved ability to understand and generate machining instructions. Nevertheless, both models struggle with complex geometries, such as free-form surfaces or complex toolpaths, which were not investigated in this study. Complicated shapes require accurate toolpath calculations and an understanding of complex machining dynamics.
The research focuses on a basic 3-axis component, raising concerns about the models' ability to scale to multi-axis machining (e.g., 4- or 5-axis), where tool orientation and simultaneous multi-axis movements add complexity. Both models, particularly ChatGPT-3.5, are likely to struggle in such scenarios due to their reliance on text-based input and limited capacity to process complex spatial information.

4.3 Specific LLM limitations

Specific LLM constraints, such as the size of the context window, have a significant impact on performance in G-code tasks. The context window, which determines the amount of text an LLM can process at once, poses a challenge for large G-code files or complex parts with thousands of lines of code.

The scope of the training data also affects performance. While GPT-4o's training likely includes diverse text data, it may lack comprehensive CNC machining datasets, especially for specialised processes or proprietary control syntax. This gap can lead to errors in generating G-code for niche applications or in interpreting ambiguous prompts. For example, the study shows that ChatGPT-3.5 had difficulty with milling operations, likely due to insufficient training in CNC-specific terminology and processes. The improved performance of GPT-4o suggests a broader training dataset, yet its limitations become apparent with advanced toolpath strategies that require fine-tuned prompts to specify parameters accurately. Addressing these limitations may require fine-tuning the LLMs on CNC-specific data sets or enlarging the context windows to handle larger G-code files.

4.4 Future directions

Future research should focus on several key factors to improve the application of LLMs in CNC machining.
First, evaluating GPT-4o with complex geometries and multi-axis machining scenarios would reveal its scalability and limitations. Secondly, establishing standardised metrics such as error rates, generation time and simulation success would allow a thorough comparison with CAM software. Thirdly, implementing CAM plugins could connect customised LLM applications with the industry. Finally, addressing LLM-specific constraints, such as the size of the context window and the amount of training data, through fine-tuning or customised data sets may improve performance in advanced machining activities.

In summary, GPT-4o has potential for creating, debugging, and clarifying G-code for basic 3-axis CNC machining and serves as a cost-effective alternative to conventional CAM software. Nevertheless, the problems associated with complicated shapes, challenging toolpath techniques and LLM limitations require further research and development. Under expert supervision and potential integration into CAM systems, GPT-4o could become an indispensable tool in manufacturing, particularly for small-batch production or for educational purposes.

5. Conclusion

This study has demonstrated the functionality of ChatGPT-3.5 and ChatGPT-4o in CNC machining using ISO G-code. Notable differences in performance were observed when evaluating the ability of the AI models to generate, interpret and correct ISO G-code. ChatGPT-3.5 showed limitations, particularly in identifying errors and explaining the code, frequently skipping lines and providing terse and uninformative descriptions. It also struggled with milling, although it showed some ability with simpler tasks such as drilling. In comparison, ChatGPT-4o produced fully functional ISO G-code for the example part, demonstrating its capability for applications with simple geometries.
It showed an improved understanding of the code, successfully detecting and correcting errors while providing clear and thorough explanations for each line of code. This makes it a useful resource for learning and teaching G-code programming, as well as an additional verification method when code simulation is not possible. Despite its advances, ChatGPT-4o cannot yet replace traditional CAM programming, especially for complex operations. Its limitations, such as the requirement for text-only input, can lengthen the information input process. However, for simple operations and in situations where CAM software is not available, it can significantly reduce the time required to manually write the G-code.

In conclusion, ChatGPT, especially the GPT-4o model, has some potential to improve G-code programming; however, it still requires expert supervision. Subsequent studies should aim to improve the functionalities for more complicated machining operations and increase the integration of these AI models into current CAM systems.

Funding and acknowledgment

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency (research core funding No. P2-0157).

Declaration of competing interests

The authors declare that they have no known competing financial interests or personal relationships that could influence the work in this article.

References

[1] Li, B.-H., Hou, B.-C., Yu, W.-T., Lu, X.-B., Yang, C.-W. (2017). Applications of artificial intelligence in intelligent manufacturing: A review, Frontiers of Information Technology & Electronic Engineering, Vol. 18, 86-96, doi: 10.1631/FITEE.1601885.
[2] Yang, T., Yi, X., Lu, S., Johansson, K.H., Chai, T. (2021). Intelligent manufacturing for the process industry driven by industrial artificial intelligence, Engineering, Vol. 7, No. 9, 1224-1230, doi: 10.1016/j.eng.2021.04.023.
[3] Tao, F., Qi, Q., Liu, A., Kusiak, A. (2018). Data-driven smart manufacturing, Journal of Manufacturing Systems, Vol. 48, Part C, 157-169, doi: 10.1016/j.jmsy.2018.01.006.
[4] Davis, J., Edgar, T., Graybill, R., Korambath, P., Schott, B., Swink, D., Wang, J., Wetzel, J. (2015). Smart manufacturing, Annual Review of Chemical and Biomolecular Engineering, Vol. 6, 141-160, doi: 10.1146/annurev-chembioeng-061114-123255.
[5] Wan, J., Li, X., Dai, H.-N., Kusiak, A., Martinez-Garcia, M., Li, D. (2021). Artificial-intelligence-driven customized manufacturing factory: Key technologies, applications, and challenges, Proceedings of the IEEE, Vol. 109, No. 4, 377-398, doi: 10.1109/JPROC.2020.3034808.
[6] Lee, J., Davari, H., Singh, J., Pandhare, V. (2018). Industrial artificial intelligence for industry 4.0-based manufacturing systems, Manufacturing Letters, Vol. 18, 20-23, doi: 10.1016/j.mfglet.2018.09.002.
[7] Yao, X., Zhou, J., Zhang, J., Boer, C.R. (2017). From intelligent manufacturing to smart manufacturing for industry 4.0 driven by next generation artificial intelligence and further on, In: Proceedings of 2017 5th International Conference on Enterprise Systems (ES), Beijing, China, 311-318, doi: 10.1109/ES.2017.58.
[8] Heiden, B., Alieksieiev, V., Volk, M., Tonino-Heiden, B. (2021). Framing artificial intelligence (AI) additive manufacturing (AM), Procedia Computer Science, Vol. 186, 387-394, doi: 10.1016/j.procs.2021.04.161.
[9] Rane, N., Choudhary, S., Rane, J. (2024). Intelligent manufacturing through generative artificial intelligence, such as ChatGPT or Bard, SSRN Electronic Journal, doi: 10.2139/ssrn.4681747.
[10] Wang, X., Anwer, N., Dai, Y., Liu, A. (2023). ChatGPT for design, manufacturing, and education, Procedia CIRP, Vol. 119, 7-14, doi: 10.1016/j.procir.2023.04.001.
[11] Javaid, M., Haleem, A., Singh, R.P. (2023).
A study on ChatGPT for industry 4.0: Background, potentials, challeng- es, and eventualities, Journal of Economy and Technology, Vol. 1, 127-143, doi: 10.1016/j.ject.2023.08.001. [12] Badini, S., Regondi, S., Frontoni, E., Pugliese, R. (2023). Assessing the capabilities of ChatGPT to improve additive manufacturing troubleshooting, Advanced Industrial and Engineering Polymer Research, Vol. 6, No. 3, 278-287, doi: 10.1016/j.aiepr.2023.03.003. [13] Sriwastwa, A., Ravi, P., Emmert, A., Chokshi, S., Kondor, S., Dhal, K., Patel, P., Chepelev, L.L., Rybicki, F.J., Gupta, R. (2023). Generative AI for medical 3D printing: A comparison of ChatGPT outputs to reference standard educa- tion, 3D Printing in Medicine, Vol. 9, Article No. 21, doi: 10.1186/s41205-023-00186-8. [14] Jignasu, A., Marshall, K., Ganapathysubramanian, B., Balu, A., Hegde, C., Krishnamurthy, A. (2023). Towards foun- dational AI models for additive manufacturing: Language models for G-code debugging, manipulation, and com- prehension, ArXiv, doi: 10.48550/arXiv.2309.02465. [15] Thirunavukarasu, A.J., Ting, D.S.J., Elangovan, K., Gutierrez, L., Tan, T.F., Ting, D.S.W. (2023). Large language mod- els in medicine, Nature Medicine, Vol. 29, 1930-1940, doi: 10.1038/s41591-023-02448-8. [16] Dai, W., Lin, J., Jin, H., Li, T., Tsai, Y.-S., Gašević, D., Chen, G. (2023). Can large language models provide feedback to students? A case study on ChatGPT, In: Proceedings of 2023 IEEE International Conference on Advanced Learning Technologies (ICALT), Orem, USA, 323-325, doi: 10.1109/ICALT58122.2023.00100. [17] Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P.S., Yang, Q., Xie, X. (2024). A survey on evaluation of large language models, ACM Transactions on Intelli- gent Systems and Technology, Vol. 15, No. 3, 1-45, doi: 10.1145/3641289. [18] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2019). 
BERT: Pre-training of deep bidirectional transformers for language understanding, In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, USA, 4171-4186, doi: 10.18653/ v1/N19-1423. [19] Bouschery, S.G., Blazevic, V., Piller, F.T. (2023). Augmenting human innovation teams with artificial intelligence: exploring transformer-based language models, Journal of Product Innovation Management, Vol. 40, No. 2, 139- 153, doi: 10.1111/jpim.12656. [20] Pearce, K., Zhan, T., Komanduri, A., Zhan, J. (2021). A comparative study of transformer-based language models on extractive question answering, ArXiv, doi: 10.48550/arXiv.2110.03142. Large language models for G-code generation in CNC machining: A comparison of ChatGPT-3.5 and ChatGPT-4o Advances in Production Engineering & Management 20(2) 2025 235 [21] Kalla, D., Smith, N. (2023). Study and analysis of Chat GPT and its impact on different fields of study, Internation- al Journal of Innovative Science and Research Technology, Vol. 8, No. 3, 827-833, doi: 10.5281/zenodo.7767675. [22] Chitsaart, C., Rianmora, S., Rattana-Areeyagon, M., Namjaiprasert, W. (2014). Automatic generating CNC-code for milling machine, International Journal of Mechanical, Aerospace, Industrial, Mechatronic and Manufacturing Engi- neering, Vol. 7, No. 12, 2607-2613. [23] Zhang, Y., Zeng, Q., Mu, G., Yang, Y., Yan, Y., Song, W. , Gong, Y. (2018). A design for a novel open, intelligent and integrated CNC system based on ISO 10303-238 and PMAC, Tehnički Vjesnik – Technical Gazette, Vol. 25, No. 2, 470-478, doi: 10.17559/TV-20170419111243. [24] Gu, Y., Wang, Y., Lin, J., Yuan, X. (2017). Fault location in CNC system software based on the architecture expan- sion, Tehnički Vjesnik – Technical Gazette, Vol. 24, No. 2, 619-625, doi: 10.17559/TV-20160704190047. Appendix A Fig. 
A1 Query for G-code generation and file recognition using ChatGPT-4o model Šket, Potočnik, Brezocnik, Ficko, Klančnik 236 Advances in Production Engineering & Management 20(2) 2025 Fig. A2 Query for G-code debugging using ChatGPT-3.5 model (lines with error marked in yellow) Fig. A3 ChatGPT-3.5 model error detection capabilities (problematic part highlighted in red) Large language models for G-code generation in CNC machining: A comparison of ChatGPT-3.5 and ChatGPT-4o Advances in Production Engineering & Management 20(2) 2025 237 Fig. A4 ChatGPT-3.5 model G-code explanation Šket, Potočnik, Brezocnik, Ficko, Klančnik 238 Advances in Production Engineering & Management 20(2) 2025 Fig. A5 ChatGPT-4o model G-code explanation