Heterogeneous Dependency Graph-Guided Attention for Patent Representation Learning

Pre-trained language models have advanced patent classification and retrieval by encoding claims as flat token sequences, yet they overlook the dependency hierarchy among claims. Incorporating this hierarchy into self-attention poses two challenges. First, claim dependencies involve relation types of varying reliability: treating them indiscriminately allows noisy technical relations to corrupt cleaner legal citation signals. Second, the dependency graph is defined over claims, whereas Transformer models operate at the token level; naively broadcasting claim-level adjacency to tokens can dilute structural information across unrelated token pairs. A novel Patent Heterogeneous Attention Graph Encoder (PHAGE) addresses these challenges. To handle heterogeneous dependencies, PHAGE constructs a typed graph that separates legal citations from technical relations as distinct edge types. To bridge the claim-token gap, PHAGE introduces a connectivity mask with learnable relation-aware biases that projects the claim-level topology into token-level attention. PHAGE is trained with a dual-granularity contrastive objective that aligns representations with inter-patent taxonomy and intra-patent topology. Experiments show that PHAGE outperforms domain-adapted and citation-aware baselines on patent classification, retrieval, and clustering. The results further indicate that intra-patent claim topology provides a stronger inductive bias than inter-patent structure.
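
As a rough illustration of how a claim-level typed graph might be projected into token-level attention, the following minimal Python sketch builds a connectivity mask with one additive bias per relation type. It is not the PHAGE implementation: the function names, the edge labels "legal" and "technical", the fixed bias values (which would be learned parameters in a trained model), and the choice to always allow attention within a claim are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def token_bias_matrix(token_to_claim, claim_edges, relation_bias):
    """Expand a typed claim-level graph into a token-level additive bias.

    token_to_claim: claim index of each token.
    claim_edges:    {(src_claim, dst_claim): relation_type}, directed (assumed).
    relation_bias:  {relation_type: float}; stands in for learnable biases.
    """
    n = len(token_to_claim)
    bias = [[NEG_INF] * n for _ in range(n)]  # default: pair is masked out
    for p in range(n):
        for q in range(n):
            a, b = token_to_claim[p], token_to_claim[q]
            if a == b:
                bias[p][q] = 0.0  # tokens of the same claim (assumption)
            elif (a, b) in claim_edges:
                bias[p][q] = relation_bias[claim_edges[(a, b)]]
    return bias


def masked_softmax(scores, bias):
    """Row-wise softmax over scores + bias; masked pairs receive weight 0."""
    out = []
    for s_row, b_row in zip(scores, bias):
        logits = [s + b for s, b in zip(s_row, b_row)]
        m = max(logits)  # finite, since each token can attend to itself
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


# Hypothetical toy patent: 5 tokens across 3 claims.
token_to_claim = [0, 0, 1, 1, 2]
# Claim 1 cites claim 0 (legal); claim 2 relates to claim 0 (technical).
claim_edges = {(1, 0): "legal", (2, 0): "technical"}
# Illustrative values only; in a model these would be trained.
relation_bias = {"legal": 1.0, "technical": -0.5}

bias = token_bias_matrix(token_to_claim, claim_edges, relation_bias)
scores = [[0.0] * 5 for _ in range(5)]  # placeholder for QK^T / sqrt(d)
attn = masked_softmax(scores, bias)
```

In this sketch, token pairs whose claims share no edge are masked entirely rather than receiving a diluted share of attention, and the two edge types contribute separate biases so that one relation type can be weighted differently from the other.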