DIKWP-based Semantic Space Intellectual Property Infringement




Artificial General Intelligence (AGI) Evaluation DIKWP Laboratory

2025-11-01

DIKWP-based Semantic Space Intellectual Property Infringement Identification and Detection

Yucong Duan

International Standardization Committee of Networked DIKWP for Artificial Intelligence Evaluation (DIKWP-SC)

World Artificial Consciousness CIC (WAC)

World Conference on Artificial Consciousness (WCAC)

(Email: duanyucong@hotmail.com)

I. Research Background and Significance

In recent years, as technological innovation has deepened and global competition has intensified, the number of patent applications and grants has continued to grow. Statistics show that the number of patent grants in China rose steadily from 2014 to 2020 and, despite a slight decline after 2020, remains at a historical high. Over the same period, patent infringement disputes have grown explosively, from 7,671 cases in 2014 to 67,375 cases in 2024, a nearly 9-fold increase in ten years. Technological innovation is therefore increasingly active, but the accompanying risk of infringement has risen sharply with it. Traditional infringement detection methods, which rely mainly on manual comparison or keyword retrieval, struggle to capture deep semantic associations and innovative intent across different technical fields. In the era of big data and artificial intelligence, identifying potentially infringing patents efficiently and accurately has become an urgent need for intellectual property protection. Some scholars have pointed out that AI tools can "accelerate and enhance" tasks such as classification and retrieval in the patent life cycle, improving examination efficiency. Building an automated infringement identification system on advanced semantic technology is therefore of great significance and can provide technical support for maintaining an innovative ecosystem and market order.

II. Literature Review

The DIKWP (Data-Information-Knowledge-Wisdom-Purpose) model, proposed by Yucong Duan et al., is a five-dimensional semantic cognitive framework designed to simulate the human process that runs from perceiving data, through making wise decisions, to purpose-driven behavior. The model has already been applied in fields such as medical diagnosis and human-machine consciousness, and formal semantic definitions and reasoning mechanisms have been proposed for it. Notably, the DIKWP model differs from the traditional DIKW model by adding the "Purpose" (P) layer, which expresses the value orientation and goals of a technical solution or behavior. Research shows that the DIKWP model forms a closed-loop semantic topology (Data → Information → Knowledge → Wisdom → Purpose → Data), ensuring the internal logical self-consistency and circular closure of the cognitive space. That is, starting from any data, the result of reasoning and mapping remains within the same semantic space, avoiding "semantic escape." This self-consistent semantic mathematical foundation allows the system to iterate and enrich its knowledge continuously without self-contradiction. The DIKWP model has also been extended to the TRIZ invention methodology, forming the DIKWP-TRIZ framework. By incorporating the five DIKWP elements into traditional TRIZ principles, it provides a unified model for incomplete problems and the innovation process. This framework emphasizes the key role of the "Purpose" dimension in technological innovation and ethical considerations, striving to balance technological advancement and social well-being.

In the field of patent analysis, various semantics-based intelligent methods have emerged. Scholars use technologies such as knowledge graphs and semantic embedding for in-depth analysis of patent texts, for example by extracting Subject-Action-Object (SAO) structures from patents to calculate technical similarity. A recent review pointed out that Large Language Models (LLMs) and the Transformer framework show strong potential in patent classification, retrieval, and value assessment. They can be pre-trained on large-scale corpora and then further trained on patent corpora to cope with differences in domain terminology. In addition, Graph Neural Networks (GNNs) are often used to construct and embed patent knowledge graph nodes to capture complex relational structures. DIKWP2Vec, as a novel algorithm, transforms the five-dimensional semantic representation of DIKWP into a vector space for quantitatively comparing semantic similarity between patents. In summary, this project intends to combine existing DIKWP theory with the latest deep learning technologies to fill the research gap in patent semantic infringement detection and to improve detection automation and accuracy.

III. Research Content and Objectives

This research adopts a phased strategy:

Phase 1 (2025.11–2026.12): Construct a semantic space based on a networked DIKWP model, using over 240 publicly available patents. The main tasks include:

Mathematical Definition and Inferential Derivation of the DIKWP Model. Formally define the semantic meanings of the five layers: Data, Information, Knowledge, Wisdom, and Purpose. Introduce mathematical symbols and mapping functions to describe hierarchical transformations, such as $I = f_{DI}(D)$, $K = f_{IK}(I)$, $W = f_{KW}(K)$, $P = f_{WP}(W)$, etc., and deduce the inferential logic between layers.

Integration of Semantic Modeling and Transfer Mechanisms. Combine DIKWP with various technologies: use the DIKWP2Vec algorithm to vectorize patent text into five-dimensional semantics; employ deep models like Transformer/LLM for text encoding and feature extraction; perform embedding learning on the constructed patent knowledge graph via GNN, thereby achieving feature fusion and cross-level transfer within the semantic space.

The DIKWP×DIKWP Paradigm and Self-Consistency Mechanism. Study the mathematical principles by which the DIKWP model generates a closed semantic space through the Cartesian product on the type and instance layers. Prove the closure and completeness of the cognitive space under semantic operations, i.e., that no closed reasoning loop produces semantic drift, ensuring the consistency and robustness of the five-dimensional semantic representation.

Empirical Analysis of Patent Scenarios. Select representative patent cases from typical fields (AI algorithms, healthcare, autonomous driving) and apply the constructed DIKWP semantic model for simulation experiments: vectorize and calculate similarity using DIKWP2Vec to analyze the potential infringement risk between candidate patents and target patents. Draw on research in AI-driven patent analysis methods to verify the DIKWP framework's ability to align technical details and capture innovative intent.
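As a toy illustration of the closure property targeted by the DIKWP×DIKWP task above, the following sketch enumerates the Cartesian product of the five layer types and checks that a path of transformations never leaves the layer set. It works at the type level only and is not a proof of closure; the names `LAYERS`, `TRANSFORMS`, and `apply_path` are hypothetical and do not come from the DIKWP literature.

```python
from itertools import product

# The five DIKWP layer types.
LAYERS = ("D", "I", "K", "W", "P")

# Every ordered (source, target) pair of layers: the DIKWP x DIKWP product.
TRANSFORMS = set(product(LAYERS, repeat=2))


def apply_path(start, path):
    """Follow a sequence of target layers from `start`.

    Raises ValueError if any step is not a member of TRANSFORMS,
    i.e. if the path would leave the five-layer semantic space.
    """
    current = start
    for target in path:
        if (current, target) not in TRANSFORMS:
            raise ValueError(f"no transformation {current}->{target}")
        current = target
    return current


# The closed loop D -> I -> K -> W -> P -> D ends where it started.
loop_end = apply_path("D", ["I", "K", "W", "P", "D"])
print(len(TRANSFORMS), loop_end)
```

Because every target in `TRANSFORMS` is itself a member of `LAYERS`, any finite path built from these pairs stays inside the five-layer space, which is the type-level analogue of avoiding "semantic escape."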

Phase 2 (2027.1–2027.5): Conduct research on semantic innovation and ethical embedding based on the DIKWP-TRIZ model. The main tasks include:

DIKWP Modeling of New Patent Applications and Development of Innovation Tools. Perform multi-dimensional semantic analysis on over 100 new patent applications by Professor Yucong Duan using the DIKWP-TRIZ framework. Combine with TRIZ invention principles to guide the discovery and evaluation of innovative solutions, and develop intelligent creative assistance tools to enhance patent mining and design efficiency.

Embedding of Ethics and Value Frameworks. Introduce multi-dimensional ethical evaluation into the technical system: draw on consciousness measurement methods like IIT (Integrated Information Theory) to assess the complexity and transparency of the AI system; incorporate CARE ethical principles (Ethics of Care) and the Moral Machine (autonomous driving ethical dilemma) framework to ensure innovative outcomes meet social responsibility requirements. Design a specific evaluation indicator system, such as fairness, transparency, trustworthiness, etc., and integrate it into the system's feedback loop to promote the system's evolution towards being "warm and responsible."

Research Objectives: Construct a patent semantic infringement detection and innovation support system that integrates DIKWP theory and advanced AI technologies, achieving the following expectations:

Perform multi-dimensional semantic modeling on existing target patents and external candidate patents to achieve accurate quantitative measurement of technical similarity and infringement risk assessment.

Develop a prototype tool that supports patent innovation assistance (based on DIKWP-TRIZ) and automatic infringement scanning, with a certain degree of engineering feasibility.

Integrate ethical norms to realize value-based evaluation and constraints on AI decisions, promoting the transformation of intellectual property technology towards responsibility.

IV. Technical Route and Key Methods

This research adopts the technical path of "Concept Space → Semantic Space → Evaluation Output," as shown in Figure 3. It mainly includes the following key links:

Formal Definition of the DIKWP Model: In accordance with the DIKWP philosophy, mathematically define the five semantic layers. The Data layer ($D$) is defined as a set of raw materials $D = \{d_1, \ldots, d_n\}$; the Information layer ($I$) consists of organized and processed data, formalized as $I = \{i_1, i_2, \ldots\}$ and obtained through the mapping function $I = f_{DI}(D)$; the Knowledge layer ($K$) is represented as a concept-relationship network $K = (E, R)$, with the mapping $K = f_{IK}(I)$; the Wisdom layer ($W$) is the set of strategies generated from the application of knowledge, $W = \{w_1, w_2, \ldots\}$, derived through $W = f_{KW}(K)$; the Purpose layer ($P$) is represented as a goal description $P = p$, formed through $P = f_{WP}(W)$. Thus, the complete hierarchical mapping chain can be expressed as $I = f_{DI}(D) \rightarrow K = f_{IK}(I) \rightarrow W = f_{KW}(K) \rightarrow P = f_{WP}(W)$, where the transformations and reasoning between layers follow corresponding semantic logic.
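The mapping chain above can be read as a composition of four functions. The sketch below is a minimal illustration under strong assumptions: the bodies of `f_DI`, `f_IK`, `f_KW`, and `f_WP` are toy stand-ins invented for this example (string normalization, adjacent-word relations, and so on), and only the function names and the order of composition follow the formal definition.

```python
from functools import reduce


def f_DI(data: list[str]) -> list[str]:
    """Data -> Information: toy stand-in that normalizes and de-duplicates."""
    seen, info = set(), []
    for d in data:
        item = d.strip().lower()
        if item and item not in seen:
            seen.add(item)
            info.append(item)
    return info


def f_IK(info: list[str]) -> tuple[set[str], set[tuple[str, str]]]:
    """Information -> Knowledge: toy graph K = (E, R) from adjacent words."""
    entities, relations = set(), set()
    for item in info:
        words = item.split()
        entities.update(words)
        relations.update(zip(words, words[1:]))
    return entities, relations


def f_KW(knowledge: tuple[set[str], set[tuple[str, str]]]) -> list[str]:
    """Knowledge -> Wisdom: toy 'strategies', one per relation, sorted."""
    _, relations = knowledge
    return sorted(f"use {a} with {b}" for a, b in relations)


def f_WP(wisdom: list[str]) -> str:
    """Wisdom -> Purpose: toy goal description summarizing the strategies."""
    return f"achieve {len(wisdom)} strategy goal(s)"


def compose(*functions):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), functions, x)


# P = f_WP(f_KW(f_IK(f_DI(D))))
d_to_p = compose(f_DI, f_IK, f_KW, f_WP)
purpose = d_to_p(["Sensor fusion ", "sensor fusion", "graph embedding"])
print(purpose)
```

Keeping each layer transition as a separate function means an individual mapping can be replaced (for example, by a learned model) without changing the chain itself.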

DIKWP2Vec Semantic Vectorization: Based on the model framework above, design the DIKWP2Vec algorithm to map patent text into semantic space vectors. The process includes:

Semantic Element Extraction: Use natural language processing and knowledge extraction on the full patent text to extract content related to the five dimensions: Data, Information, Knowledge, Wisdom, and Purpose. For example, numerical parameters and experimental results in the patent are classified as the Data dimension; sentences involving method flows are classified as the Information dimension; core technical principles and models as the Knowledge dimension; performance optimization strategies as the Wisdom dimension; and the invention's purpose and value proposition as the Purpose dimension.

Semantic Embedding Representation: Use appropriate models to generate vectors for the text of each dimension: pre-trained large models (like BERT, GPT) or specially trained word vector models can be used to process various corpora, obtaining data vector $v_D$, information vector $v_I$, knowledge vector $v_K$, wisdom vector $v_W$, and purpose vector $v_P$. Since the semantic types of each dimension are different, different feature representation methods (such as TF-IDF, relational embedding, etc.) can be selected.

Dimensional Vector Fusion: Concatenate the vectors of the five dimensions in a fixed order, and perform normalization/weighting processing to obtain a unified five-dimensional semantic vector $V = [v_D; v_I; v_K; v_W; v_P]$. Through this process, the patent text is quantitatively represented while maintaining its semantic structure.

Similarity Calculation: For the semantic vectors $V_1$, $V_2$ of two patents, use cosine similarity to measure their semantic proximity (Formula 1):

$$\cos(V_1, V_2) = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert} \tag{1}$$

Where a cosine value closer to 1 indicates the patents are more semantically similar. Based on this, independent similarity for each dimension can also be calculated and fused into a comprehensive infringement index.
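The cosine measure of Formula 1, applied both to the fused vector and to each dimension separately, can be sketched as follows. This is a minimal standard-library illustration: the dictionary layout keyed by `"D"`, `"I"`, `"K"`, `"W"`, `"P"`, the helper names, and the convention of returning 0.0 for a zero vector are assumptions made for this example.

```python
import math

DIMENSIONS = ("D", "I", "K", "W", "P")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (Formula 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for an empty dimension
    return dot / (norm_u * norm_v)


def fused_vector(patent):
    """Concatenate the five per-dimension vectors in fixed D, I, K, W, P order."""
    return [x for dim in DIMENSIONS for x in patent[dim]]


def per_dimension_similarity(p1, p2):
    """Cosine similarity computed independently for each DIKWP dimension."""
    return {dim: cosine(p1[dim], p2[dim]) for dim in DIMENSIONS}


# Toy vectors; real ones would come from the embedding step.
patent_a = {"D": [1.0, 0.0], "I": [0.5, 0.5], "K": [1.0, 2.0], "W": [0.0, 1.0], "P": [1.0, 1.0]}
patent_b = {"D": [1.0, 0.0], "I": [0.5, 0.5], "K": [2.0, 4.0], "W": [1.0, 0.0], "P": [1.0, 1.0]}

overall = cosine(fused_vector(patent_a), fused_vector(patent_b))
by_dimension = per_dimension_similarity(patent_a, patent_b)
print(round(overall, 3), by_dimension)
```

In this toy pair the Wisdom vectors are orthogonal while the other four dimensions point in the same direction, which shows why per-dimension similarities can carry information that a single fused score hides.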

System Architecture Design: The overall system is divided into three modules: data collection, semantic mapping, and similarity calculation:

Module 1: Target Patent Data Collection. For the target institutions or inventors to be analyzed, retrieve and collect all their relevant patent information (including structured attributes and unstructured text), and perform preprocessing (denoising, normalization, completion) to build a complete dataset.

Module 2: Candidate Patent DIKWP Mapping. For potential infringement sources (enterprises or individuals, etc.), collect their patent library and apply the DIKWP mapping method to parse the original patent text into multi-dimensional concept space resources: extract the technical elements and process relationships of the patent as Data resources, capture text features and patterns as Information resources, build knowledge graph nodes/relationships as Knowledge resources, form Wisdom resources by combining context and value judgments, and extract the innovation's purpose to form Purpose resources. Subsequently, perform DIKWP2Vec transformation to vectorize these concept resources and generate a unified semantic representation.

Module 3: Semantic Similarity Calculation and Infringement Assessment. Use the target patent and candidate patent semantic vectors generated by Module 1 and Module 2 to calculate technical similarity; construct an infringement index function based on the calculation results and semantic features. Specifically, a comprehensive score function $f_{\mathrm{risk}}$ can be defined, fusing multi-dimensional similarities (as shown in Formula 2):

$$\mathrm{RiskScore} = f_{\mathrm{risk}}(D_{\mathrm{sim}}, I_{\mathrm{sim}}, K_{\mathrm{sim}}, W_{\mathrm{sim}}, P_{\mathrm{sim}}) \tag{2}$$

Where $D_{\mathrm{sim}}$, $I_{\mathrm{sim}}$, $K_{\mathrm{sim}}$, $W_{\mathrm{sim}}$, $P_{\mathrm{sim}}$ respectively represent the similarity measures of the five semantic dimensions. A threshold on the score then separates high-risk pairs, which are selected as potentially infringing patent pairs and summarized in a visual risk assessment report.
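The source leaves the form of the score function in Formula 2 open. One simple possibility, shown here purely as an assumption, is a weighted average of the five per-dimension similarities followed by a threshold test. The weights in `DEFAULT_WEIGHTS`, the threshold of 0.8, and the function names are illustrative placeholders, not values proposed by this research.

```python
# Illustrative weights only; the source does not specify them.
DEFAULT_WEIGHTS = {"D": 0.1, "I": 0.2, "K": 0.3, "W": 0.2, "P": 0.2}


def risk_score(sims, weights=DEFAULT_WEIGHTS):
    """Weighted average of per-dimension similarities (one choice of f_risk)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[dim] * sims[dim] for dim in weights) / total


def flag_high_risk(pairs, threshold=0.8, weights=DEFAULT_WEIGHTS):
    """Return (pair_name, score) for pairs at or above the threshold, highest first."""
    scored = ((name, risk_score(sims, weights)) for name, sims in pairs.items())
    flagged = [(name, score) for name, score in scored if score >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Toy per-dimension similarities for two target/candidate pairs.
pairs = {
    "target-1 vs candidate-A": {"D": 0.9, "I": 0.9, "K": 0.9, "W": 0.9, "P": 0.9},
    "target-1 vs candidate-B": {"D": 0.3, "I": 0.3, "K": 0.3, "W": 0.3, "P": 0.3},
}
high_risk = flag_high_risk(pairs)
print(high_risk)
```

A weighted average keeps the score in the same range as the input similarities, so the threshold can be interpreted on the same scale; other fusion functions (for example, a learned classifier) would fit Formula 2 equally well.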

Key Technical Difficulties and Innovation Points: The formalization and self-consistency proof of DIKWP semantic mathematics is one of the theoretical difficulties. Decomposing patent semantics into five dimensions and vectorizing them involves deep NLP and knowledge graph construction, requiring the comprehensive application of technologies like Transformer, GNN, and LLM. The innovation of this research lies in proposing the DIKWP×DIKWP closed-loop semantic paradigm (cross-mapping of multi-level semantic spaces) and designing the DIKWP2Vec vector algorithm to apply this paradigm to patent semantic alignment. At the same time, the system integrates the TRIZ innovation method and an ethical evaluation framework, achieving an organic combination of infringement detection and value assessment.

Pseudocode Example: DIKWP2Vec Vectorization Processing

Input: Patent text `text`

Output: Five-dimensional semantic vector V = [v_D; v_I; v_K; v_W; v_P]

1. Extract five-dimensional semantic units:
    data_text = extract_data_related(text)
    info_text = extract_information_related(text)
    knowledge_text = extract_knowledge_related(text)
    wisdom_text = extract_wisdom_related(text)
    purpose_text = extract_purpose_related(text)

2. Generate semantic vectors:
    v_D = Vectorize(data_text)          # e.g., TF-IDF or numerical encoding
    v_I = Embedding(info_text)          # e.g., BERT encoding
    v_K = GraphEmbed(knowledge_text)    # e.g., graph embedding
    v_W = SemanticFeat(wisdom_text)     # e.g., rule-based reasoning output
    v_P = Embedding(purpose_text)       # e.g., target vector generated by LLM

3. Normalize and fuse:
    Standardize v_D, v_I, v_K, v_W, v_P
    V = concatenate([v_D, v_I, v_K, v_W, v_P])

4. Return V
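For readers who want to run the four steps end to end, the following standard-library sketch mirrors the pseudocode with deliberately simple stand-ins. It is not the DIKWP2Vec algorithm itself: keyword cues (`CUES`, hypothetical) replace the real extraction step, and a hashed bag-of-words (`hashed_bow`, hypothetical) replaces TF-IDF, BERT, graph embedding, and LLM encoders for all five dimensions.

```python
import math
import re
import zlib

DIMENSIONS = ("D", "I", "K", "W", "P")

# Hypothetical cue words standing in for real NLP / knowledge extraction.
CUES = {
    "D": ("parameter", "measured", "result", "value"),
    "I": ("step", "process", "flow", "then"),
    "K": ("principle", "model", "based on", "theory"),
    "W": ("optimize", "improve", "strategy", "trade-off"),
    "P": ("purpose", "aim", "objective", "in order to"),
}


def extract_units(text):
    """Step 1: assign each sentence to every dimension whose cue it contains."""
    units = {dim: [] for dim in DIMENSIONS}
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        for dim in DIMENSIONS:
            if any(cue in lowered for cue in CUES[dim]):
                units[dim].append(sentence)
    return units


def hashed_bow(sentences, size=16):
    """Step 2 stand-in: hashed bag-of-words counts (not a learned embedding)."""
    vec = [0.0] * size
    for sentence in sentences:
        for token in re.findall(r"[a-z0-9]+", sentence.lower()):
            # crc32 is deterministic across runs, unlike the built-in hash().
            vec[zlib.crc32(token.encode("utf-8")) % size] += 1.0
    return vec


def l2_normalize(vec):
    """Step 3a: scale to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dikwp2vec(text, size=16):
    """Steps 1-4: extract, vectorize, normalize, and concatenate in D, I, K, W, P order."""
    units = extract_units(text)
    fused = []
    for dim in DIMENSIONS:
        fused.extend(l2_normalize(hashed_bow(units[dim], size)))
    return fused


sample = "The purpose is to improve accuracy. The measured value was 0.9."
vector = dikwp2vec(sample)
print(len(vector))
```

Normalizing each dimension before concatenation keeps any single dimension from dominating the fused vector merely because its text is longer; a weighting step, as mentioned in the fusion description, could be added at the same point.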

V. Ethics and Value Framework Embedding

To ensure the "warmth and responsibility" of technological development, this research introduces a multi-dimensional ethical evaluation mechanism into the system design. Specific practices include: embedding Integrated Information Theory (IIT) metrics at the algorithm level to analyze the complexity and transparency of system decisions; incorporating Ethics of Care (CARE) principles at the requirements level to focus on the interests and fairness of affected groups; and combining Moral Machine experiment rules at the application level to impose moral constraints on decision-making outcomes in scenarios like autonomous driving. By evaluating the system's output results against ethical indicators (such as fairness, justice, trustworthiness, etc.), we ensure that the infringement detection and innovation tools are both technically effective and aligned with social values. In addition, indicator weights and feedback mechanisms will be set in the evaluation system, allowing the system to continuously optimize during operation and correct for potential biases and adverse effects.

VI. Expected Achievements and Innovations

This project plans to complete the following main achievements:

Construct a DIKWP-based Patent Semantic Model: Including mathematical definitions, reasoning formulas, and semantic transformation algorithms, forming a complete five-dimensional semantic analysis methodology.

Implement a Prototype System for Patent Semantic Infringement Detection: Integrating technologies like DIKWP2Vec, knowledge graphs, and Transformer to conduct comparative analysis on case patents and candidate patents, generating a list of high-risk infringing patents and a risk report.

Develop a Semantic Innovation Assistance Tool: Based on the DIKWP-TRIZ model's invention creativity module, to help inventors discover design change solutions.

Propose an Ethical Evaluation Indicator System: Design the system's ethical embedding mechanism and provide corresponding evaluation standards, offering a reference for responsible AI.

The above research has strong scientific feasibility and technical depth: using existing public patents and mature AI models for empirical analysis is expected to yield deployable tools and methods. At the same time, it theoretically innovates by combining DIKWP and TRIZ and integrating an ethical framework, enhancing the plan's foresight and responsibility.

VII. Research Plan and Schedule

2025.11 – 2026.6: Complete Phase 1 theoretical research, including the mathematical definition of the DIKWP model, DIKWP2Vec algorithm design, and prototype system architecture construction.

2026.7 – 2027.2: Implement Phase 1 experiments, collect target and candidate patent data, and complete semantic modeling, similarity calculation, and infringement assessment. Optimize algorithms based on preliminary results.

2027.3 – 2027.5: Carry out Phase 2 work, including DIKWP-TRIZ model construction, ethical framework integration, and overall system testing, to form the final technical report and tool demo.

References:

References cited in this report include DIKWP-related literature and AI + patent analysis research:

A DIKWP Semantic-based Patent Potential Infringement Detection Report: Taking "Semantic Modeling and Abstraction Enhancement Method Based on Data Graph, Information Graph, and Knowledge Graph" as an Example (in Chinese)

A Comprehensive Survey on AI-based Methods for Patents

The DIKWP (Data, Information, Knowledge, Wisdom, Purpose) Revolution: A New Horizon in Medical Dispute Resolution

DIKWP×DIKWP Semantic Mathematics Helps Large Models Break Through Cognitive Limits Research Report (in Chinese)

DIKWP-TRIZ: A Revolution on Traditional TRIZ towards Invention for Artificial Consciousness (v2). Preprints.org