https://doi.org/10.31449/inf.v48i14.5175
Informatica 48 (2024) 189–204

Optimization Strategy of Customer Relationship Management based on Big Data Analysis

Baohua Zhang 1*, Sanbao Zhang 2, Chaojie Zhang 3
1 School of Business Administration, Tongling University, Tongling Anhui, 244061, China
2 School of Finance, Tongling University, Tongling Anhui, 244061, China
3 School of Mathematics and Computer Science, Nanchang University, Nanchang Jiangxi, 330031, China
E-mail: 164108@tlu.edu.cn
* Corresponding author

Keywords: customer relationship, data mining, apriori algorithm, real-time decision engine

Received: September 10, 2023

This paper analyzes the basic customer management process and chooses the Apriori algorithm to build a CRM model based on data mining. In addition, this paper designs a customer relationship management system based on big data. The system is divided into three layers: data source, batch processing, and real-time processing. The system architecture is built on the Hadoop platform. The batch processing layer consists of four parts: a NoSQL database, an Oracle database, an ETL architecture, and the Hadoop platform. This paper gives the logical architecture design diagram. The real-time processing layer mainly includes a real-time decision engine and a service bus. The key part of this layer is the real-time decision engine, whose design uses the Bayesian algorithm and a product recommendation prediction model. Finally, this paper takes K company as an example to demonstrate the model and management system. After applying the analytical model and management system, the sales of K company keep increasing.

Povzetek (summary): The customer relationship management process is analyzed, and a CRM model based on the Apriori algorithm and big data analysis is proposed. The system, designed on the Hadoop platform, improves the company's sales.

1 Introduction

Today's world is gradually stepping into a new economic era.
Social productive forces are continuously developing, and our society is changing from the era of low production efficiency into the current surplus of various products with many more types and styles in our surroundings [1]. The variety and channels of products that customers can choose from are also significantly increased. The market is constantly cyclical, and the market may boom or bust. In the cycle of market economy development, the only thing that can be determined is that market competition will become increasingly fierce. Under such fierce competition, customer relationships have become increasingly obvious [2]. It is increasingly important for the company to consider the customer relationship as an extremely important factor for sustainable development in the future. As is known to all, customers are becoming the main force of a company's economy in this society. To develop customers to their maximum value and increase sales, a company must pay attention to managing customer relations and continuously develop new products to satisfy customers. Thus, the company can reach the goal of maintaining its competitive advantage in the market [3]. The way to build a good relationship with customers is mainly three parts. Developing new customers, improving customer satisfaction, and conducting effective customer segmentation are three important things for companies to optimize customer relationship management. In the business process oriented by customer demand, companies must take customer demand as the starting point in all aspects of the product. The company should pay attention to the whole production process, from the first step, product design, to the last step, after-sales service. Only by continuously meeting the needs of customers can the company's own operational capabilities and business capabilities be improved. 
With the customer at the centre, the company should innovate its marketing, sales, service and other functions so that it can provide customers with satisfactory and thoughtful services more efficiently. Existing databases lack a complete data mining function, so the rules implied in massive data and the correlations between data items cannot be found, and it is difficult to evaluate and predict future developments from existing data. Without a technology to mine these rules, a large amount of data becomes "data garbage". The emergence of customer relationship management solves the problem that a large amount of customer data in the enterprise database cannot be utilized. CRM is the set of technologies and methods by which enterprises use computer information technology to investigate and understand customer information, analyze it and provide service, and finally achieve the retention of original customers and the discovery of new customers. Nowadays, the development of enterprises is customer-centric. The marketplace is like a battlefield. In a highly competitive environment, it is not possible to retain customers by improving product quality alone. Only by improving service can enterprises maximize their interests and win the final victory. How can service be improved in such fierce business competition? How can customer management be achieved easily and good contact with existing customers be maintained, while attracting more new customers? How can the total cost of the enterprise be reduced and the profit maximized? The emergence of customer relationship management systems makes these problems easier to solve. Association rule mining is one of the algorithms needed to implement a customer relationship management system.
The application of association rules in CRM mainly includes classification and prediction, covering aspects such as customer group classification, customer profitability analysis, customer acquisition and retention, and customer satisfaction analysis. Big data processing technologies now provide more methods for meeting each company's data needs. In this era, every company generates a large amount of data every year, which ends up forming a huge database. Therefore, every company can use data analysis to promote its business development. More and more companies realize the value of large databases and use them as an effective way to build good customer relationships, which also requires information technology support. It can be said that the development of big data applications provides new ideas and new technical support for companies to reform their customer relationship management. This paper designs a new CRM model and system to give companies new tools for optimizing customer management. Current science and technology make it possible to analyze customer demand preferences quickly and effectively, which provides a new economic growth point for companies and promotes the development of related technology as well. From the perspective of practical application, customer relationship management research has been largely limited to theory and is not really applicable to the actual development needs of companies. This paper applies the theoretical model and the designed system to the customer relationship management of K company as an example and analyzes the current research theories in detail.

2 Related work

In the middle and late 20th century, research on customer relationship management gradually emerged, and the United States was the first to explore the field of scientific management decision-making. In the 1980s, the United States first put forward the very important topic of "contact management" in the marketing field.
This introduced relationship marketing, a marketing concept based on collecting customer information. In the 1990s, building on relationship marketing, scholars put forward the concept of "Customer Care", which can be used by all enterprises and industries [4]. At that time, some scholars proposed that the core of customer relationship management should be automation applied across the entire business process, so that processes can be improved in an automated way to form a complete system. Some scholars believe that using data warehouse technology in companies, comprehensively integrating and analyzing massive user data, segmenting customers through different classifications, and distinguishing customer preferences can help corporate decision-makers improve business strategy [5]. This was the starting point for combining customer relationship theory with user data, but at that stage it existed only in theory. From the mid-1990s to 2014, customer relationship management was gradually transformed into practical applications. After 2000, system software for customer relationship management was developed, and customer relationships began to receive attention in commercial applications. The main feature of this stage is technology [6]. At this time, research focused on applying database technology to customer relationship management. However, the technology was still relatively immature, as were the methods of data acquisition and calculation. The term big data first appeared in a research report published by McKinsey in 2011. In this report, McKinsey researchers argued that big data technology would change the economic development of the whole world.
Nowadays, with the rapid development of computer technology, network technology and database technology, big data technology is applied to the whole process of customer relationship management. At this stage, many foreign scholars have carried out research on the integration of customer relationship management and big data technology [7]. Some scholars believe that accurate customer search and precision marketing through big data in e-commerce is a major future trend. Customer satisfaction data obtained through data mining can be integrated to propose effective improvement measures, which can also enhance the competitiveness of enterprises [8]. Nowadays, big data is widely used in information management all over the world.

Related research has also been carried out from the perspective of information systems development. In 2001, a seven-part CRM framework was proposed, comprising a customer behaviour database, acquisition of the above-mentioned data sources, application of the database, customer preference and choice, customer acquisition, relationship marketing, and CRM evaluation criteria. Some companies have already applied a complete set of customer relationship management systems in their workflow [9]. The basic function of such a system is to collect customer information in real time, process related customer management services, better maintain customer relationships and increase customer activity. Some companies have added support vector machine technology to the customer relationship management systems they developed. This has broadened the application scope of these systems, which can better mine customer data and transmit big data to the server to realize customer input, automation of business processing, customer discovery, and customer return visits.
Some scholars believe that big data is an important tool for enterprises to create value in the future. If enterprises do not keep up with the pace of big data and cannot effectively organize the information left by customers and discover value in it, companies that rely only on traditional experience will be unable to retain old customers and acquire new ones efficiently, and will eventually be eliminated. Big data technology is no longer a matter of the future. Using data now can help companies analyze the market better, increase market share, improve profit margins and form new economic growth points, which is an important direction that every company should pay attention to [10]. The methods, results and shortcomings of existing CRM models and systems are shown in Table 1. However, the prediction of customer value in the above studies is based on the assumption that customer behavior will not change, and it does not dynamically measure the entire customer lifecycle. It relies only on a division by static historical indicators, does not capture customer buying trends, and does not consider cross-selling opportunities or customer growth and upgrading, so it will miss potential high-quality customer groups that are worth tracking.

3 Customer relationship management model based on big data

3.1 Customer relationship management process

According to the customer life cycle theory, we have sorted out the entire process of customer contact with the company. We have set up some external channels because there are too many online and offline channels. Customer management ability becomes particularly important in this complex customer acquisition and maintenance system [11].
Table 1: Methods, results and shortcomings of existing CRM models and systems

Existing CRM model or system | Method adopted | Result achieved | Shortcoming
Contact management | Customer care with call centres to support data analysis | Customer relationship management sprouts | Only limited to the collection of customer information
Customer RFM classification model | Clustering and SOM neural network technology | Customer value was evaluated | There is no dynamic measurement of the entire customer lifecycle
RBF neural network model | Establish customer evaluation index system | Categorize by category | There is no consideration of cross-selling opportunities for customer growth and upgrading

Figure 1: The flow chart of customer acquisition, service and sales realization

According to the flow chart of customer-enterprise contact (Figure 1), we can divide customer relationship management into three aspects. The first is the customer information management system. The basis of customer information management is to collect customer information, including the customer's basic information, consumption information, consumption habits, credit value, etc. The second part of customer relationship management is operation information management. This part integrates multiple modules, such as policy information and competitor information, and draws on these aspects of the situation to plan a scientific and reasonable management policy for the development of the enterprise. It can be seen that factors such as the number of customers, information, business and market competition can greatly affect the final results of customer analysis and the direction of business operations. The third part is sales information management.
Sales information management mainly manages customer and sales-related information, chiefly product information, sales activity information, sales channels, and after-sales management. Through sales analysis, companies can better understand customer preferences and market dynamics and formulate better directions and strategies for maintaining customer relationships and increasing market share. The three parts of sales information management are shown in Figure 2.

3.2 CRM model based on data mining

Before starting to mine data and make plans, it is necessary to clarify the project objectives, become familiar with the business fields of the relevant departments, acquire the relevant background knowledge, understand the business content, determine the business object, and carry out a feasibility analysis and evaluation of the project in terms of resource allocation, technology and economy. In the data mining process, the preliminary data preparation and the evaluation of the mining results are very important. The analysis objectives set at the beginning of the mining task guide the evaluation of the mining results, after which the novelty and validity of the discovered knowledge patterns are evaluated by experts in the field. After evaluation by experts and machines, redundant or meaningless patterns must be removed. Sometimes patterns cannot meet the actual requirements or reach the desired effect. To obtain an effective knowledge pattern, it is then necessary to return to the previous processing steps and repeat the extraction until a meaningful knowledge pattern is found, so as to discover more effective and accurate knowledge. Generally speaking, two data mining process models are commonly used in academia: the process model summarized by Fayyad and the process model that follows the CRISP-DM standard. This paper adopts the Fayyad process model.
Its process model mainly includes the following seven steps: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation [12]. The procedure is shown in Figure 3.

Figure 2: The three parts of sales information management

Figure 3: Fayyad process model

First, the noise data is filtered. After the filtering work is completed, the Clementine platform is used to filter and analyze the data to ensure the effectiveness of data processing on each platform. After good data processing nodes are obtained, the data content is integrated to realize the processing of logical business information.

The basic idea of the Apriori algorithm is to first find all frequent itemsets in the original data set. The algorithm is built on a recursive method based on frequent-set theory. Each frequent itemset must meet the minimum support requirement. Association rules that do not meet the minimum confidence threshold are then eliminated, and the remaining rules are strong association rules that satisfy both requirements. This algorithm needs to scan the transaction database many times, which takes a long time, and a program must be written to carry out the calculation. Because the running time is long, the number of iterations cannot be too large, and further optimization of the program and its parameters may yield better results [13]. The algorithm uses prior knowledge of a property of frequent itemsets: all non-empty subsets of a frequent itemset must also be frequent, and all supersets of an infrequent itemset are also infrequent.
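As a minimal sketch of the idea described above, and not the implementation used in this paper, the following Python code finds frequent itemsets level by level, prunes candidates that have an infrequent subset, and then keeps only the rules whose confidence reaches a threshold. The function names `apriori` and `strong_rules` and the service names in the usage note are hypothetical and chosen for illustration; support is assumed to be an absolute transaction count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    `min_support` is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan the transactions once to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is always available here.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

For example, with the hypothetical baskets `[{"data", "sms"}, {"data", "ringtone"}, {"data", "sms", "ringtone"}, {"sms"}]` and a minimum support count of 2, the pair `{"sms", "ringtone"}` is infrequent, so the three-item candidate is pruned without being counted.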
In this way, the Apriori algorithm uses this prior knowledge and adopts an iterative, layer-by-layer search: it uses frequent k-itemsets to explore (k+1)-itemsets in order to identify all frequent itemsets whose support exceeds the set threshold in the target data set, and then computes conditional probabilities to construct, from the frequent itemsets, the strong association rules that meet the set confidence. The algorithm uses the following two properties to reduce the search space.

Property 1: Any non-empty subset of a frequent itemset is frequent.
Property 2: Any superset of an infrequent itemset is infrequent.

On the basis of these two properties, the Apriori algorithm generates all frequent itemsets through the following process, where M_l is the set of frequent l-itemsets, D_l is the set of candidate l-itemsets, and E is the transaction database:

    M_1 = {frequent 1-itemsets};
    for (l = 2; M_(l-1) ≠ ∅; l++) do begin
        D_l = apriori-gen(M_(l-1));
        for every transaction u ∈ E do begin
            D_u = subset(D_l, u);
            for every candidate d ∈ D_u do
                d.count++;
        end
        M_l = {d ∈ D_l | d.count ≥ minimum support};
    end

Although the Apriori algorithm is simple and accurate, it has certain defects in efficiency. Therefore, a derivative algorithm can be used to make up for its shortcomings. In this paper, we choose Apriori_RD for this purpose. The Apriori_RD algorithm operates on a bit representation of the database using the logical AND ("&") operation, and mines and analyzes frequent itemsets and strong association rules. Following the operation of mining algorithms based on the Apriori property, the improvement in Apriori_RD rests mainly on the following three aspects:
(1) All subsets of a frequent l-itemset are frequent as well.
(2) If the number of frequent l-itemsets is less than l+1, then no frequent (l+1)-itemset exists.
(3) Each itemset in the candidate l-itemset collection C_l generated by the self-join of L_(l-1) is produced l*(l-1)/2 times.

During execution, the minimum confidence is set to 11% and the maximum number of antecedents is set to 2. The customer data clustering model is shown in Figure 4. The calculation is divided into three steps. The flowchart is shown in Figure 5.

Figure 4: Customer data clustering model based on apriori algorithm

Figure 5: Apriori algorithm flowchart

After all usable data in the database has been mined, the next step is to obtain the corresponding association rules. The confidence of a rule can be calculated with the following formula, where support_um(X) denotes the support count of itemset X:

$$\mathrm{Confidence}(B \Rightarrow C) = Q(C \mid B) = \frac{\mathrm{support\_um}(B \cup C)}{\mathrm{support\_um}(B)} \qquad (1)$$

4 Customer relationship management system design

4.1 CRM system architecture

According to the results of customer segmentation, enterprise resources should be allocated reasonably, and the operations and business processes of the enterprise should be organized around the customer, so as to bring more benefits to the enterprise and increase customers' loyalty to and satisfaction with the enterprise.
The integration of enterprise information system with existing computer technology provides the conditions for the successful implementation of CRM. CRM realizes the informationization of business processes, that is, the integration of sales, marketing and customer service. The integration of customer communication modules and the automation of communication means are realized. The generated information is integrated and processed to achieve customer intelligence and give decision-making guidance for the business marketing skills and development strategies of enterprises. The Hadoop platform used in this project adopts a distributed computing cluster of big data all-in-one machines based on an X86 server, which is composed of multiple nodes [14]. Any query or processing request for platform data is processed by multiple nodes. When system capacity or processing capacity becomes the bottleneck, nodes can be added by "building blocks", and each data block exists on multiple data server nodes, which ensures data reliability. This greatly reduces the coupling between the business layer and the data layer, improves the scalability and maintainability of the system, and improves the development efficiency of the system [15]. The system architecture has good scalability, which ensures the dynamic expansion of cloud platform services and the rapid launch of new services. The immutable nature of distributed data storage also enhances the credibility of the data, thereby increasing the platform's credibility. In the specific design, the big data is first cut into pieces. Then according to the concept of distributed decentralized storage, these pieces of data are stored in different clusters or computers of the Hadoop system respectively. When reading system data, this distributed storage structure can freely realize parallel extraction of related diced files from different machines or clusters. The logical structure of the project is shown in Figure 6. 
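The block-based storage described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the round-robin placement, the node names and the function names are hypothetical and do not reproduce the actual placement policy of HDFS. It shows why keeping each block on several nodes lets the data survive the loss of one node.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replicas):
    """Assign each block index to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replicas)]
        for b in range(num_blocks)
    }


def read_back(blocks, placement, failed_nodes=()):
    """Reassemble the data, provided each block has a surviving replica."""
    out = []
    for b, block in enumerate(blocks):
        if all(node in failed_nodes for node in placement[b]):
            raise IOError(f"block {b} lost: all replicas are on failed nodes")
        out.append(block)
    return b"".join(out)
```

With three nodes and two replicas per block, any single node can fail and `read_back` still returns the original bytes. If two nodes fail, at least one block has no surviving replica and the read raises an error.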
Figure 6: Logical architecture design

The data source layer is the provider of Hadoop platform data. There are multiple data sources in the system, and data is collected and stored in different ways. This requires complete or differential extraction from different data platforms, including source tables, databases, and data files of various forms. The data are distributed in different regions and run in different system environments, and the data formats produced also differ. The company's data sources mainly include the following components: customer information from stores, customer data brought by all media, external partners, etc., and data generated in the process of operating services. A lot of business data comes from suppliers, which can also be used to manage suppliers. The full data file can be exported from the private data analysis platform and loaded into the Hadoop platform to obtain the stored historical data. For daily data acquisition, the business system is the main source of data. Through a combination of full and incremental extraction, the source system provides the corresponding data according to the relevant interface specifications, and the data is downloaded through the data download platform. The data is transmitted in the form of text.
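The combination of a full load and later incremental loads can be sketched as follows. The pipe-delimited text format `op|customer_id|value` and the operation codes `I`, `U` and `D` are assumptions made for this illustration, since the interface specification itself is not given here.

```python
def parse_lines(text):
    """Parse 'op|customer_id|value' lines into tuples; blank lines are skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            op, key, value = line.split("|", 2)
            records.append((op, key, value))
    return records


def load_full(text):
    """Build the initial table from a full export."""
    return {key: value for _op, key, value in parse_lines(text)}


def apply_incremental(table, text):
    """Return a new table with one incremental file applied to `table`."""
    updated = dict(table)
    for op, key, value in parse_lines(text):
        if op == "D":
            updated.pop(key, None)   # delete, ignoring unknown keys
        else:
            updated[key] = value     # "I" (insert) and "U" (update) overwrite
    return updated
```

In this sketch the full export is loaded once to obtain the historical state, and each daily incremental file is then applied on top of it, so only changed records need to be transmitted.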
Enterprises generate a large amount of data in the course of business development, providing a reliable basis for analysis in enterprise management and decision-making. Companies can use this data to conduct precision marketing or provide data to other companies or units to gain profits. The batch processing layer is composed of the Hadoop platform, a NoSQL database (SequoiaDB), and an Oracle database combined with an ETL architecture. The batch processing layer architecture diagram is shown in Figure 7. The linear expansion and low cost of the Hadoop platform overcome the shortcomings of the traditional architecture (IOE) for building a data platform [16]. The introduction of the Hadoop platform has greatly increased the room for elastic expansion of data storage and processing performance. The distributed file system in Hadoop (Hadoop Distributed File System, HDFS) can store a large amount of structured and unstructured data by means of distributed storage. The distributed computing model MapReduce simplifies complex data processing and computation and is very suitable for analyzing and processing huge, complex and disordered data. Each distributed storage server holds only pieces of the data rather than the whole, which better protects the security of the data. In addition, efficient processing power can help businesses create large data centres without increasing costs. The cooperation of these tools enables the Hadoop platform to exchange data with existing database systems and become a platform with comprehensive functions in the field of massive data computing, such as file systems, computing models, database systems, and management modules. The Hadoop platform processes and stores data from business systems through the Hive and HBase components.
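The MapReduce computing model mentioned above can be illustrated in a few lines of single-process Python. This is a conceptual sketch only, not Hadoop code: the three phases run in memory, and the record layout `(customer, product, amount)` and all function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each group of values into a single result per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Run map, shuffle and reduce in sequence."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def spend_mapper(record):
    """Emit (customer, amount) for one transaction record."""
    customer, _product, amount = record
    yield customer, amount


def sum_reducer(_key, values):
    """Total the amounts collected for one customer."""
    return sum(values)
```

Calling `run_job(records, spend_mapper, sum_reducer)` on a list of transaction tuples returns the total spend per customer. On a real cluster the map and reduce calls would run in parallel on different nodes, which is what makes the model suitable for very large data sets.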
The result set processed by the Hadoop platform is imported into the Oracle database for use by the real-time decision engine, and other result sets, such as the thin and wide tables of after-sales information, are imported into the NoSQL database. NoSQL is a non-relational database that can store different types of structured and unstructured data [17]. It is very suitable for accessing massive data and massive customer information. The database is dynamic, and its underlying architecture is distributed, highly integrated and fast to respond, which makes it very suitable for relational and non-relational data with repeated records. The NoSQL database adopts a more flexible data model than the relational one to achieve real-time data analysis, good usability and scalability in a big data environment. Each data mirror of a NoSQL database is stored in a different location, which protects against data loss and provides high availability and flexibility. The main purpose of the ETL architecture design is to provide a long-term foundation that can continue to meet actual needs as business needs grow and change. In the ETL development process, the SAP data service tool is used to design the data extraction, conversion and integration process. The data in the business system is extracted into the ODS and the data warehouse, and automatic data loading is implemented graphically in the design process, as shown in Figure 8.

Figure 7: Batch processing layer architecture

Figure 8: ETL technology architecture

The real-time processing layer mainly includes two parts, the real-time decision engine and the service bus. Real-time data processing is the default business logic of the data processing system and is also the core part of the data processing system.
The key part of the project is the real-time decision engine, which includes modules such as models, algorithms, rules, decision-making, and self-learning [18]. Business logic is centrally located and interacts with all data sources to meet actual business needs. The real-time decision engine adopts the Bayesian algorithm and a product recommendation prediction model. The driving factors of the Bayesian prediction model are derived from the wide table of customer information processed by the Hadoop platform. The real-time decision engine receives customer information from various channels such as stores, external partners, and new media diversion, analyzes it and calculates recommended products based on customer information, product purchases, and historical marketing campaign results, and connects to marketing channels through the service bus to push product recommendation information to customers. It also conducts comprehensive analysis and prediction of customer needs in order to fully tap those needs. On this basis, it collects, sorts and analyzes customers' individual needs and carries out demand forecasting. At the same time, it combines data visualization tools and decision optimization algorithm tools to provide users with decision support. The real-time decision engine receives the user's product purchase information downloaded from the business system. It then optimizes the Bayesian prediction model through its self-learning function so that its recommendation and success rates keep rising. The model mainly predicts how strongly target customers will respond to a certain product or service, based on basic customer information and historical transaction data, in order to carry out targeted marketing, improve the marketing response rate and reduce marketing costs.
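The paper does not specify which Bayesian model the engine uses. As one possible illustration, the sketch below assumes a categorical naive Bayes classifier with add-one smoothing, in which the self-learning step simply folds each newly observed outcome into the counts. The class name `ResponseModel`, the feature names and the labels are all hypothetical.

```python
from collections import defaultdict


class ResponseModel:
    """Categorical naive Bayes with add-one smoothing, updated record by record."""

    def __init__(self):
        self.class_counts = defaultdict(int)     # label -> count
        self.feature_counts = defaultdict(int)   # (label, feature, value) -> count
        self.feature_values = defaultdict(set)   # feature -> values seen so far

    def learn(self, features, label):
        """Self-learning step: fold one observed outcome into the counts."""
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[(label, name, value)] += 1
            self.feature_values[name].add(value)

    def predict_proba(self, features):
        """Return {label: posterior probability} for one customer."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = n / total  # prior P(label)
            for name, value in features.items():
                seen = self.feature_counts.get((label, name, value), 0)
                distinct = len(self.feature_values[name] | {value})
                # Smoothed likelihood P(value | label).
                score *= (seen + 1) / (n + distinct)
            scores[label] = score
        norm = sum(scores.values())
        return {label: s / norm for label, s in scores.items()}
```

In use, each marketing outcome received from the business system would be passed to `learn`, and `predict_proba` would be called for a target customer before a recommendation is pushed. Because the counts are updated incrementally, the predicted response shifts as new outcomes arrive, without retraining from scratch.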
At the same time, the engine collects a large amount of product feedback data so that products can be updated and adjusted, products better aligned with customer needs can be offered, products and services can be visualized, and the service needs of different customers can be met. The design of the real-time decision engine is shown in Figure 9.
[Figure 8 components: data download platform (core system business data, company featured data, other data); FTP data directory; temporary, detail, aggregate, and interface data areas processed with HiveQL; data quality check; ETL process scheduling with task scheduling and log management engines; task execution agents.]
Figure 9: Schematic diagram of the real-time decision engine
4.2 Customer relationship management system application
In this part, we take Company K as an example. Company K is mainly engaged in the communication business. In the modern era of rapid development of communication technology, value-added services such as mobile data communication have brought significant development opportunities to Company K. The introduction of new network technologies and new business requirements and the construction of new operation management systems will become inevitable trends of enterprise development. Against this background, precision marketing has emerged as a new and efficient marketing method.
It continuously integrates advanced marketing concepts and, through a high degree of information integration, helps enterprises locate customers accurately and achieve precision marketing, which can both reduce enterprises' marketing costs and improve their economic benefits. Nowadays, the competition among operators for value-added services is very fierce. To enhance the operator's customer value and provide new growth points for operating income, Company K must improve its business logic and design new business strategies and development paths. In the process of applying the model to Company K, we found that the Apriori algorithm is very inefficient when it has to search for a large number of frequent itemsets. For example, for a data set with n items, the number of possible non-empty itemsets is 2^n - 1, so the algorithm faces a very large amount of computation. Our experiments showed that many of the newly generated candidate itemsets are infrequent, so the number of candidate itemsets should be reduced as far as possible before they are verified against the database. Reducing the candidates in turn reduces the number of candidate subsets that must be checked. In addition, the Apriori algorithm searches level by level (breadth-first): candidate itemsets of length k are generated from the frequent itemsets of length k - 1. The algorithm scans the database repeatedly during this search, which consumes a great deal of running space and time. Regarding data preparation, Company K holds data on a number of value-added services. After screening, the value-added services retained in this paper are shown in Table 2.
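The candidate explosion and the pruning idea discussed above can be made concrete with a compact level-wise Apriori sketch. The baskets below are hypothetical and are not Company K's data; the function name `apriori` and the `min_support` threshold of 0.4 are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # One pass over the data per candidate; this is the costly part.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset,
        # so it is never counted against the data.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical subscription baskets, one set of services per customer.
baskets = [
    {"game", "video"},
    {"game", "video", "news"},
    {"game", "news"},
    {"video", "reading"},
    {"game", "video", "reading"},
]
frequent = apriori(baskets, min_support=0.4)
possible = 2 ** 4 - 1  # non-empty itemsets over the 4 distinct services
```

With four distinct services there are 2^4 - 1 = 15 possible non-empty itemsets, but the prune step discards every three-item candidate here before its support is counted, because each contains an infrequent pair. This is the mechanism by which reducing candidates early saves database scans.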
[Figure 9 components: entities, choices, decisions, business rules, predictive models, performance goals, and recommendation(s) to external applications.]
The Apriori algorithm analyzes the data to obtain the association rules between value-added services. The specific association rules are shown in Table 3. From the analysis in the table, we can conclude that the correlation between mobile game data and mobile video data is the strongest, followed by that between mobile phone report data and mobile phone reading data. Because the data processed by the Apriori algorithm is of Boolean type, it is best represented with symbols such as true/false or 0/1, which focus on the data structure and thereby show the relationship between the data. At the same time, this allows a specific description of the data. To better describe the relationship between the numerical values processed by the algorithm, numerical data is used to describe the related variables, and the algorithm is used to perform the related data mining operations. A network graph node is added after the type node, and the fields participating in the network graph construction are selected. This article uses the thickness of the connecting line to show how strong or weak the relationship between services is. The resulting correlation diagram is shown in Figure 10.
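As a rough illustration of how Boolean subscription flags lead to rule metrics and link strengths, the sketch below computes support and confidence for one rule and the co-occurrence counts that could drive line thickness in a network graph. The records are hypothetical, and the definitions are assumptions: support is taken as the share of customers holding both services and confidence as that share among holders of the former service. The paper does not state which definition its tool reports, and some tools report the support of the former service alone.

```python
from itertools import combinations

# Hypothetical Boolean records: one row per customer, 1 = subscribed.
services = ["game", "video", "news", "reading"]
rows = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (1, 0, 1, 0),
    (0, 1, 0, 1),
    (1, 1, 0, 1),
]


def rule_metrics(rows, services, former, latter):
    """Support and confidence of the rule former -> latter."""
    i, j = services.index(former), services.index(latter)
    both = sum(1 for r in rows if r[i] and r[j])
    former_count = sum(1 for r in rows if r[i])
    support = both / len(rows)
    confidence = both / former_count if former_count else 0.0
    return support, confidence


def link_counts(rows, services):
    """Number of customers holding both services, for every service pair."""
    counts = {}
    for (i, a), (j, b) in combinations(enumerate(services), 2):
        counts[(a, b)] = sum(1 for r in rows if r[i] and r[j])
    return counts


support, confidence = rule_metrics(rows, services, "video", "game")
edges = link_counts(rows, services)
strongest = max(edges, key=edges.get)
```

In this toy data the game and video pair has the largest co-occurrence count, so it would be drawn with the thickest line, mirroring how the strongest link is read off the correlation diagram.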
Table 2: Business of Company K found by association rules
Company K's major business
  Mobile game ordering identification information
  Weather forecast order identification information
  Mobile video ordering identification information
  Wireless music order identification information
  Mobile news order identification information
  Mobile securities order identification information
  Mobile phone reading order identification information

Table 3: Association rules find the required value-added services
Latter              Former            Support (%)   Confidence (%)
Game                Video             3.21          24.34
Video               Game              6.2           12.34
Weather forecast    Wireless Music    4.3           12.32
Game                News              2.3           14.21
Weather forecast    News              2.5           10.22
Game                Reading           1.77          14.23
Video               Reading           1.77          13.54
News                Securities        0.53          22.08
Game                Securities        0.53          17.23
Video               Securities        0.53          13.22

Figure 10: Associated network of value-added services
[Figure 10 nodes: Mobile Game, Mobile News, Mobile Video, Mobile Reading, Weather Forecast, Wireless Music, Mobile Securities.]
The association network of value-added services effectively displays the relationships between value-added services, enabling us to conduct data mining and analysis through an intuitive knowledge graph. The latter column of Table 3 describes the services associated with the former column, so the relationship between the mobile services can be represented more clearly. At the same time, it allows a specific description of the data. In Figure 10, the link with the largest number of users is the one between mobile game and mobile video subscriptions, with a value of 344. Next is wireless music, with 123 users. The analysis shows that the current marketing direction of Company K should focus on these two aspects.
Company K can not only carry out precision marketing based on the analysis results but also increase the sales volume of this business, which is also very important for its mobile phone sales. This paper takes Company K as an example to apply the customer management model based on big data. The model is applied to the enterprise OA management system to improve its practicability, and the sales of Company K within one year after applying the optimized customer management system were tallied. It can be seen from Figure 11 that the sales of Company K in 2022 increased significantly compared with the sales in 2021, and the growth trend is obvious. Good customer relationships are the reason why the company's performance continues to grow and why the company is becoming more and more influential in the market. New customers have opened up new development space and brought new growth points for the company, while stable customer resources have laid the foundation for its stable development. At the same time, the company is actively exploring and adopting new technologies in research and development, launching new products in a timely manner according to market demand to meet the needs of consumers, and strengthening management to enhance customer value. As the company's level of informatization has risen, its operating efficiency has improved continuously, which has raised the company's performance and brought its operation into a virtuous circle.
Figure 11: Company K's 2020-2021 sales comparison chart
[Figure 11 axes: 2020 sales and 2021 sales, in millions, on a scale of 0 to 7.]
5 Discussion
With the development of the economy, the needs of customers have become diversified, changeable, and personalized, and the complexity of the relationship between customers and enterprises is growing exponentially. Existing enterprise customer relationship management systems can no longer fully meet the management needs that arise as enterprises develop. In the era of the Internet and big data, the question is how to dig out potential and valuable business patterns from complex historical transaction information intelligently and automatically. It is an inevitable trend that the focus of enterprise customer relationship management shifts from operation to analysis and intelligent decision-making. In recent years, the field of enterprise customer relationship management has been undergoing revolutionary changes. The extensive application of mathematical models and the rapid development of computer technology have laid a good foundation for dealing with the complex customer relationships of enterprises. The emergence of data mining technology provides a new opportunity to improve the level of enterprise customer relationship management. Therefore, it is necessary to make full use of enterprise information resources, actively promote the transformation of the enterprise management mode from product-centered to customer-centered, combine intelligent knowledge mining technology with the enterprise customer relationship management system, raise the utilization of information resources to the advanced stage of knowledge innovation, and realize scientific enterprise management decision-making. This is of great practical significance for enterprises that want to stand out in fierce market competition.
This research aims to provide a scientific and intelligent method for analyzing customer relationship management data and information, and thereby to improve the level of enterprise customer relationship management. It analyzes the current situation and demands of enterprise customer relationship management, integrates knowledge from the logistics field, and comprehensively applies the intelligent technology of data mining. Covering the construction of a data mining process framework, the implementation of association rule mining, association rule analysis, and the improvement of customer relationship management, this paper studies knowledge mining from enterprise customer relationship management data systematically, establishes a preliminary application framework for the data mining method, and implements its application. The approach removes the subjectivity and efficiency limits of manually predicting customers' potential demand and uses intelligent technology to improve enterprise customer relationship management. The research has theoretical and practical significance for the study and application of enterprise customer relationship management theory and of association rule mining as an intelligent method. This study establishes an initial set of systematic methods and technical routes for knowledge mining in enterprise customer relationship management, and makes the analysis of customer relationship management data and information more scientific. The method system of the knowledge mining process framework is described in terms of enterprise business goal definition, transaction data preprocessing, rule mining, and result analysis.
The association rule mining process framework is applied to the specific field of enterprise customer relationship management, and the mined association rules can provide scientific and intelligent support for enterprise management decision-making. They help enterprises find customer purchasing trends, predict potential customer demand accurately, guide cross-selling, promote the development of customer value, enhance enterprise competitiveness, and improve the scientific management level of enterprises. By using transaction data effectively, a new customer value analysis method is put forward from another angle. Based on the analysis of the association rules mined by the Apriori algorithm, this paper selects the purchasing pattern characteristics of customer groups that are more likely to purchase high-margin services and increase the net present value of the customer life cycle, and puts forward another method to identify the potential value of high-quality customers from their future purchasing trends.
6 Conclusion
In this paper, the basic process of customer management is analyzed, and the Apriori algorithm is chosen to build a CRM model based on data mining. In addition, this paper designs a customer relationship management system based on big data. The system is divided into three layers: the data source layer, the batch processing layer, and the real-time processing layer. The system architecture is built on the Hadoop platform. The batch processing layer consists of four parts: the NoSQL database, the Oracle database, the ETL architecture, and the Hadoop platform. This paper gives the logical architecture design diagram. The real-time processing layer mainly includes a real-time decision engine and a service bus. The key part of this layer is the real-time decision engine, whose design uses the Bayesian algorithm and a product recommendation prediction model.
Finally, this paper takes Company K as an example to demonstrate the model and management system. After the analytical model and management system were applied, the sales of Company K kept increasing. The research scope of big data is very wide, and the variables of customer relationship management in the era of big data need to be explored from multiple perspectives and dimensions. This paper analyzes only several aspects of customer relationship management from the management perspective, and the scope and depth of the research are far from sufficient. It is hoped that, as research deepens in the future, strategies to improve customer relationship management can be studied along more dimensions. The data sets in enterprise customer relationship management systems are characterized by huge capacity, different abstraction layers, and multi-dimensional structure. The Apriori algorithm adopted in this paper takes a long time to compute and has a limited scope of application. Subsequent improvement of the algorithm can make the mining results more efficient and accurate.
Competing interests
The authors declare no competing interests.
Authorship contribution statement
Baohua Zhang: Writing-Original draft preparation, Conceptualization, Supervision. Sanbao Zhang: Language review, Project administration. Chaojie Zhang: Methodology, Software, Validation.
Data availability
On request.
Declarations
Not applicable.
Conflicts of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Author statement
The manuscript has been read and approved by all the authors, the requirements for authorship, as stated earlier in this document, have been met, and each author believes that the manuscript represents honest work.
Funding
1).
The Major Project of Humanities and Social Sciences of University in Anhui, A Study on the Path of Improving, Transforming, and Upgrading Tourism Consumption in Anhui Province under the New Pattern of Double Circulation, SK2020ZD41.
2). The Major Project of Scientific Research Compilation Plan of University in Anhui, The Mechanism and Path Selection of Digital Inclusive Finance to Promote the Structural Upgrade of Anhui Manufacturing Industry, 2022AH040245.
Ethical approval
All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.
References
[1] N. Capuano, L. Greco, P. Ritrovato, and M. Vento, “Sentiment analysis for customer relationship management: an incremental learning approach,” Applied Intelligence, 51: 3339–3352, 2021. https://doi.org/10.1007/s10489-020-01984-x
[2] B. Melović, B. Rondović, S. Mitrović-Veljković, S. B. Očovaj, and M. Dabić, “Electronic customer relationship management assimilation in Southeastern European companies—cluster analysis,” IEEE Trans Eng Manag, 69(4): 1081–1100, 2020. https://doi.org/10.1109/TEM.2020.2972532
[3] M. Parast and D. Golmohammadi, “The impact of firm size and business strategy on response to service disruptions: evidence from the US domestic airline industry,” IEEE Trans Eng Manag, 69(5): 1944–1957, 2020. https://doi.org/10.1109/TEM.2020.2994828
[4] R. Zhang, W. Chen, T.-C. Hsu, H. Yang, and Y.-C. Chung, “ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining,” J Supercomput, 75: 646–661, 2019. https://doi.org/10.1007/s11227-017-2049-z
[5] A. Kumar et al., “Effects of HTL and ETL thicknesses on the performance of PQT-12/PCDTBT: PC 61 BM/ZnO QDs solar cells,” IEEE Photonics Technology Letters, 32(12): 677–680, 2020. https://doi.org/10.1109/LPT.2020.2991536
[6] M. Zaharia et al., “Apache spark: a unified engine for big data processing,” Commun ACM, 59(11): 56–65, 2016.
https://doi.org/10.1145/2934664
[7] Y. Lei, F. Jia, J. Lin, S. Xing, and S. X. Ding, “An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data,” IEEE Transactions on Industrial Electronics, 63(5): 3137–3147, 2016. https://doi.org/10.1109/TIE.2016.2519325
[8] S. Athey, “Beyond prediction: Using big data for policy problems,” Science, 355(6324): 483–485, 2017. https://doi.org/10.1126/science.aal4321
[9] Y. Chen, J. D. E. Argentinis, and G. Weber, “IBM Watson: how cognitive computing can be applied to big data challenges in life sciences research,” Clin Ther, 38(4): 688–701, 2016. https://doi.org/10.1016/j.clinthera.2015.12.001
[10] K. Zheng, Z. Yang, K. Zhang, P. Chatzimisios, K. Yang, and W. Xiang, “Big data-driven optimization for mobile networks toward 5G,” IEEE Netw, 30(1): 44–51, 2016. https://doi.org/10.1109/MNET.2016.7389830
[11] X. Wang, Y. Zhang, V. C. M. Leung, N. Guizani, and T. Jiang, “D2D big data: Content deliveries over wireless device-to-device sharing in large-scale mobile networks,” IEEE Wirel Commun, 25(1): 32–38, 2018. https://doi.org/10.1109/MWC.2018.1700215
[12] Y. Chen and Y. Chi, “Harnessing structures in big data via guaranteed low-rank matrix estimation: Recent theory and fast algorithms via convex and nonconvex optimization,” IEEE Signal Process Mag, 35(4): 14–31, 2018. https://doi.org/10.1109/MSP.2018.2821706
[13] M. Asch et al., “Big data and extreme-scale computing: Pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry,” Int J High Perform Comput Appl, 32(4): 435–479, 2018. https://doi.org/10.1177/1094342018778123
[14] N. Zhang, P. Yang, J. Ren, D. Chen, L. Yu, and X. Shen, “Synergy of big data and 5G wireless networks: opportunities, approaches, and challenges,” IEEE Wirel Commun, 25(1): 12–18, 2018.
https://doi.org/10.1109/MWC.2018.1700193
[15] Z. Chang, L. Lei, Z. Zhou, S. Mao, and T. Ristaniemi, “Learn to cache: Machine learning for network edge caching in the big data era,” IEEE Wirel Commun, 25(3): 28–35, 2018. https://doi.org/10.1109/MWC.2018.1700317
[16] Y. Zhao, S. Bin, and G. Sun, “[Retracted] Research on Information Propagation Model in Social Network Based on BlockChain,” Discrete Dyn Nat Soc, 2022(1): 7562848, 2022. https://doi.org/10.1155/2022/7562848
[17] H. Hong, P. Tsangaratos, I. Ilia, J. Liu, A.-X. Zhu, and W. Chen, “Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang County, China,” Science of the Total Environment, 625: 575–588, 2018. https://doi.org/10.1016/j.scitotenv.2017.12.256
[18] M. Sattarian, J. Rezazadeh, R. Farahbakhsh, and A. Bagheri, “Indoor navigation systems based on data mining techniques in internet of things: a survey,” Wireless Networks, 25: 1385–1402, 2019. https://doi.org/10.1007/s11276-018-1766-4