
Chinese Discourse Representation Structure Parsing: Feasibility, Pipeline, and Evaluation

Explores Chinese semantic parsing to DRS without labeled data, proposing a data collection pipeline and a test suite for fine-grained evaluation, highlighting challenges with adverbs.

1. Introduction

This work addresses a significant gap in semantic parsing research: the parsing of Chinese text into formal meaning representations, specifically Discourse Representation Structures (DRS). While neural parsers for DRS have achieved remarkable performance for English and other Latin-alphabet languages, the feasibility for Chinese—a language with a different character set and linguistic properties—remains largely unexplored due to the lack of labeled Chinese DRS data. The paper investigates whether high-quality Chinese semantic parsing can be achieved and compares two primary approaches: training a model directly on (silver-standard) Chinese data versus using a machine translation (MT) pipeline coupled with an English parser.

2. Background & Motivation

2.1. The Challenge of Multilingual Semantic Parsing

Semantic parsing transforms natural language into structured meaning representations like Abstract Meaning Representation (AMR) or Discourse Representation Structures (DRS). These representations are often considered language-neutral. However, practical parsing faces the "named entity problem": entities may have different orthographies across languages (e.g., Berlin vs. Berlino) or entirely different character sets (e.g., Latin vs. Chinese characters). Expecting a Chinese parser to output Latin-script named entities is impractical for real-world applications.

2.2. The Case for Chinese DRS Parsing

The core research question is whether Chinese semantic parsing can match English performance with comparable data resources. The study explores if a dedicated Chinese parser is necessary or if an MT-based approach using an existing English parser is sufficient, thereby evaluating the true "language-neutrality" of DRS in practice.

3. Methodology: Data Pipeline for Chinese DRS

The key innovation is creating a silver-standard dataset for Chinese DRS parsing without manual annotation.

3.1. Data Source: Parallel Meaning Bank (PMB)

The Parallel Meaning Bank (PMB) provides aligned multilingual texts (including Chinese and English) paired with English DRS annotations. This serves as the foundational parallel corpus.

3.2. Named Entity Alignment with GIZA++

To handle the named entity problem, GIZA++ (a statistical machine translation alignment tool) is used on word-segmented Chinese and English text. This generates Chinese-English named entity alignment pairs. The aligned Chinese named entities are then used to replace the corresponding English named entities within the DRS structures derived from the English side, creating a Chinese-anchored DRS.
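The substitution step can be sketched as follows. This is a minimal illustration, not the paper's actual code: the function name, the clause-string format, and the alignment dictionary are all assumptions.

```python
# Illustrative sketch of the entity-substitution step: English named
# entities in DRS clauses are replaced by their GIZA++-aligned Chinese
# counterparts. Data formats here are invented for the example.

def substitute_entities(drs_clauses, alignments):
    """drs_clauses: list of clause strings, e.g. 'named(x1, "Berlin")'
    alignments:  dict mapping English entity -> Chinese entity
    """
    out = []
    for clause in drs_clauses:
        for en, zh in alignments.items():
            # Replace only quoted entity mentions inside clauses.
            clause = clause.replace(f'"{en}"', f'"{zh}"')
        out.append(clause)
    return out

clauses = ['named(x1, "Berlin")', 'city(x1)']
print(substitute_entities(clauses, {"Berlin": "柏林"}))
# ['named(x1, "柏林")', 'city(x1)']
```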

3.3. Linearization for Seq2Seq Models

The resulting DRS graphs (now with Chinese entities) are linearized into a sequence format suitable for training sequence-to-sequence neural network models, such as Transformers.

Key Pipeline Output

Input: Parallel (Chinese Text, English Text, English DRS) from PMB.

Process: GIZA++ alignment → Chinese entity substitution into DRS.

Output: Silver-standard (Chinese Text, Chinese-anchored DRS) pairs for model training.

4. Experimental Setup & Test Suite

4.1. Model Training

Two experimental setups are compared:

  1. Direct Parsing: Train a seq2seq model directly on the generated silver-standard Chinese DRS data.
  2. MT + Parsing Pipeline: First, translate Chinese text to English using an MT system. Then, parse the English translation using a state-of-the-art English DRS parser.
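The contrast between the two setups, and the error propagation inherent in the second, can be sketched with toy stand-ins (the `mt`, `en_parser`, and `zh_parser` objects below are invented placeholders, not real systems):

```python
# Toy illustration of the two experimental setups; every component
# here is a hypothetical stand-in, not a real MT system or parser.

def parse_direct(zh, zh_parser):
    # Setup 1: a dedicated Chinese parser sees the source directly.
    return zh_parser(zh)

def parse_via_mt(zh, mt, en_parser):
    # Setup 2: any MT error propagates into the final DRS.
    return en_parser(mt(zh))

mt = {"他跑。": "He runs."}.get                # lossy dictionary "MT"
en_parser = lambda s: f"drs({s})" if s else "parse-failure"
zh_parser = lambda s: f"drs({s})"

print(parse_direct("他游泳。", zh_parser))      # drs(他游泳。)
print(parse_via_mt("他游泳。", mt, en_parser))  # parse-failure
```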

4.2. Chinese-Focused Test Suite Design

A novel contribution is a test suite designed explicitly for evaluating Chinese semantic parsing. It provides fine-grained evaluation across linguistic phenomena, allowing researchers to pinpoint specific challenges (e.g., adverbs, negation, quantification) rather than relying solely on aggregate scores like F1.
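A per-phenomenon scorer in the spirit of this suite might look like the sketch below. The category names and the clause-overlap F1 are simplifications invented for illustration, not the paper's actual metric.

```python
# Sketch of fine-grained, per-phenomenon F1 scoring: instead of one
# aggregate number, report a score for each linguistic category.
from collections import defaultdict

def per_phenomenon_f1(examples):
    """examples: list of (phenomenon, gold_clauses, pred_clauses)."""
    stats = defaultdict(lambda: [0, 0, 0])  # tp, n_pred, n_gold
    for phen, gold, pred in examples:
        tp, n_pred, n_gold = stats[phen]
        stats[phen] = [tp + len(set(gold) & set(pred)),
                       n_pred + len(pred), n_gold + len(gold)]
    scores = {}
    for phen, (tp, n_pred, n_gold) in stats.items():
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        scores[phen] = 2 * p * r / (p + r) if p + r else 0.0
    return scores

print(per_phenomenon_f1([
    ("negation", ["not(b1)"], ["not(b1)"]),     # perfect match
    ("adverb", ["fast(e1)", "run(e1)"], ["run(e1)"]),  # missed adverb
]))
```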

5. Results & Analysis

5.1. Direct Parsing vs. MT+Parsing Pipeline

The experimental results show that training a model directly on Chinese data yields slightly higher performance than the MT+Parsing pipeline. This indicates that while meaning representations are theoretically language-neutral, the parsing process itself benefits from direct exposure to the source language's syntactic and lexical patterns. The MT step introduces an additional layer of potential error propagation.

5.2. Error Analysis: The Adverb Challenge

A critical finding from the fine-grained test suite is that the primary difficulty in Chinese semantic parsing stems from adverbs. Chinese adverbs often have flexible positions and complex interactions with aspect and modality, making their mapping to precise logical operators in DRS particularly challenging. This insight is crucial for guiding future model improvements.

Key Insights

  • Feasibility Proven: Effective Chinese DRS parsing is achievable using a silver-standard data pipeline.
  • Direct Approach Superior: A dedicated Chinese parser outperforms an MT-based pipeline, justifying language-specific development.
  • Adverbs are the Bottleneck: The test suite reveals adverbs as the major source of parsing errors, a specific linguistic challenge for Chinese.
  • Value of Diagnostic Evaluation: The Chinese-focused test suite is a vital tool for moving beyond black-box evaluation.

6. Technical Details & Framework

DRS Formalism: A DRS is a recursive first-order logic structure comprising discourse referents (variables for entities) and conditions (predicates relating them). A simple DRS for "John runs" can be represented as a box:

    [ x e ]
    named(x, john)
    event(e)
    run(e)
    agent(e, x)
    

Linearization: For seq2seq models, this graph is converted to a string, e.g., using a prefix notation: (drs [ x e ] (named x john) (event e) (run e) (agent e x)).
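The box-to-string conversion can be sketched as follows (toy code; the serialization used in the actual experiments may differ):

```python
# Toy linearizer: turn a referent list and clause tuples into the
# prefix notation shown above.

def linearize(referents, clauses):
    """referents: list of variable names; clauses: list of tuples."""
    refs = " ".join(referents)
    body = " ".join("(" + " ".join(c) + ")" for c in clauses)
    return f"(drs [ {refs} ] {body})"

print(linearize(["x", "e"],
                [("named", "x", "john"), ("event", "e"),
                 ("run", "e"), ("agent", "e", "x")]))
# (drs [ x e ] (named x john) (event e) (run e) (agent e x))
```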

Alignment Objective: The GIZA++ alignment models (here, IBM Model 2) aim to maximize the translation probability $P(f|e) = \prod_{j=1}^{m} \sum_{i=0}^{n} t(f_j | e_i)\, a(i | j, m, n)$, where $f$ is the Chinese sentence of length $m$, $e$ is the English sentence of length $n$ (with $e_0$ a special NULL word), $t$ is the lexical translation probability, and $a$ is the alignment probability.
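A toy computation of this quantity, with invented probability tables, makes the formula concrete:

```python
# Toy IBM Model 2 computation of P(f|e). The probability tables are
# invented for illustration; real values come from EM training.

def p_f_given_e(f, e, t, a):
    """f, e: token lists (e[0] is the NULL word).
    t: dict (f_word, e_word) -> lexical translation prob t(f_j | e_i).
    a: dict (i, j) -> alignment prob a(i | j, m, n) for this m, n."""
    m, n = len(f), len(e) - 1
    prob = 1.0
    for j, fj in enumerate(f, start=1):
        # Sum over all English positions (including NULL at i = 0).
        prob *= sum(t.get((fj, e[i]), 0.0) * a.get((i, j), 0.0)
                    for i in range(n + 1))
    return prob

p = p_f_given_e(["柏林"], ["NULL", "Berlin"],
                {("柏林", "Berlin"): 0.9}, {(1, 1): 1.0})
print(p)  # 0.9
```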

7. Core Analyst Insight

Core Insight: This paper is a pragmatic, resource-conscious blueprint for expanding formal semantic parsing beyond its English-centric stronghold. It correctly identifies that true "language neutrality" is a practical engineering challenge, not just a theoretical claim, and tackles the most non-trivial case: Chinese.

Logical Flow: The argument is sound. 1) Acknowledge the named entity roadblock for non-Latin scripts. 2) Propose an automated, scalable pipeline (PMB + GIZA++) to sidestep costly manual annotation—a move reminiscent of leveraging weak supervision in other NLP domains. 3) Conduct a crucial ablation study (Direct vs. MT+Parsing) that provides a clear cost-benefit analysis for future projects. 4) Use a diagnostic test suite to move from "it works" to "why it fails," isolating adverbs as the key adversary.

Strengths & Flaws: The major strength is its practicality. The pipeline is reproducible. The test suite is a significant contribution for model diagnostics, akin to the role of GLUE or SuperGLUE for English understanding. The weakness, acknowledged by the authors, is the reliance on silver-standard data. Noise from automatic alignment and potential translation artifacts in the PMB could limit the ceiling performance. As work on cross-lingual transfer for AMR has shown, the quality of the seed data is paramount. The study also doesn't deeply explore modern contextual embedding-based alignment versus GIZA++, which could improve entity mapping.

Actionable Insights: For researchers: Build on this test suite. It's the perfect benchmark for probing the semantic competence of large Chinese language models like ERNIE or GLM. For engineers: The direct parsing approach is justified. If you need Chinese DRS, train a dedicated model; don't just pipe through MT. The ROI on collecting/refining silver data is positive. The next step is clear: integrate this pipeline with massively multilingual pre-trained models (e.g., mT5, XLM-R) in a fine-tuning setup. The adverb problem specifically calls for incorporating linguistic features or adversarial training on adverb-heavy examples, a technique successful in other structured prediction tasks.

8. Applications & Future Directions

Applications:

  • Cross-lingual Information Extraction: DRS parsing can serve as an intermediate, language-neutral layer for extracting events, relations, and coreference from Chinese text for knowledge base population.
  • Advanced Machine Translation: DRS can be used as an interlingua for semantically-aware MT between Chinese and other languages, potentially improving the translation of meaning over form.
  • Question Answering & Dialogue Systems: A formal semantic representation of Chinese user queries can enable more precise reasoning and database querying in customer service chatbots or intelligent assistants.

Future Directions:

  • From Silver to Gold: Using the silver-standard data as a starting point for active learning or human-in-the-loop annotation to create a high-quality gold-standard Chinese DRS corpus.
  • Integrating Large Language Models (LLMs): Exploring prompt-based or fine-tuning approaches with multilingual LLMs (e.g., GPT-4, Claude) for zero-shot or few-shot Chinese DRS parsing.
  • Expanding the Framework: Applying the same pipeline methodology to other meaning representations (e.g., Chinese AMR) and other non-Latin script languages (e.g., Arabic, Japanese).
  • Architectural Innovations: Developing graph-based neural parsers that directly generate DRS structures from Chinese text, potentially better handling the graph semantics than linearized seq2seq models.

9. References

  1. Abzianidze, L., Bjerva, J., Evang, K., Haagsma, H., van Noord, R., & Bos, J. (2017). The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL).
  2. Bos, J. (2015). Open-domain semantic parsing with Boxer. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA).
  3. Kamp, H., & Reyle, U. (1993). From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer.
  4. Och, F. J., & Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics.
  5. Ribeiro, L. F., Zhang, Y., & Gurevych, I. (2021). Structural Adapters in Pretrained Language Models for AMR-to-Text Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  6. van Noord, R., Abzianidze, L., Toral, A., & Bos, J. (2018). Exploring Neural Methods for Parsing Discourse Representation Structures. Transactions of the Association for Computational Linguistics (TACL).
  7. Wang, C., Zhang, X., & Bos, J. (2023). Discourse Representation Structure Parsing for Chinese. arXiv preprint arXiv:2306.09725.