Data Science
Store
Stories

Ultraviolet: Turn Hidden Document Data into an AI Advantage

Session Abstract

Every PDF hides a world of structure, metadata and embedded signals that can silently influence AI based processing. With ultraviolets, we reveal how those can be exploited for malicious purposes and even become powerful tools for smarter applications. Designing for both humans and machines become a vital aspect of AI experience design.

Session Description

Artificial intelligence is no longer only something we build — it is something we design for. As AI systems increasingly mediate how users access information, make decisions, and interact with digital products, a new role is emerging: designing how intelligence itself is perceived, trusted, and behaves in real-world environments. This perspective becomes especially critical when AI systems depend on complex information artifacts such as documents.

Documents remain one of the primary means of information exchange across industries, with PDFs alone accounting for billions of files generated each year. Despite their ubiquity, PDFs are often treated merely as containers of visible text and images. In reality, they encapsulate a much richer and more complex internal structure, including annotations, cross-references, accessibility artifacts (such as alternate text), hidden or layered content, embedded attachments, metadata, and other non-obvious elements. These components are largely invisible to users, yet they can have a profound impact on downstream artificial intelligence systems.

This talk explores how agentic workflows, automated information extraction, and retrieval-augmented generation (RAG) can be influenced, or even exploited by the way PDF internals are interpreted. We examine the types of hidden information that can be found or intentionally included within PDFs, how parsers and document processing tools handle (or ignore) this information.

We further investigate the risks and opportunities associated with PDF metadata and hidden content. On one hand, poorly handled metadata can introduce vulnerabilities, including malicious data-injection attacks that target AI pipelines at the document layer. On the other hand, these same mechanisms may offer untapped potential: can documents embed structured signals, pre-computed representations, or even vector-like information that could enhance retrieval, indexing, or storage? Could documents themselves act as intelligent carriers of contextual knowledge?

Using practical examples, the talk aims to make “visible” the “invisible” layer behind visualized text and images, and its interaction with AI systems. Framed through the lens of AI experience design, we discuss what it means to make content truly AI-ready, why structure and intent matter when information is consumed by both humans and machines, and how responsible design can improve reliability, transparency, and control.

Participants will gain a deeper understanding of how hidden document structures affect AI behavior, how to safeguard pipelines against adversarial or accidental misuse, and how to responsibly leverage document internals to build more robust, trustworthy, and intentionally designed AI-powered knowledge systems.

Short Talk