FontCode Embeds Hidden Data into Text Docs to Increase Security and Validity


(NEW YORK)–FontCode, developed by Columbia research scientists, embeds hidden information in ordinary text by subtly changing the shapes of fonts in the text. The technique could prevent document tampering and protect copyrights without altering the look or layout of a document.

Someone using FontCode would supply a secret message and a carrier text document. FontCode converts the secret message to a bit string (ASCII or Unicode) and then into a sequence of integers. Each integer is assigned to a five-letter block in the regular text where the numbered locations of each letter sum to the integer.
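The steps in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' implementation: it reads "numbered locations" as the index of a glyph variant in a per-letter codebook, and the codebook size (`GLYPHS_PER_LETTER`), the bits carried per block (`BITS_PER_BLOCK`), and all function names are hypothetical.

```python
from typing import List

BLOCK_LEN = 5          # letters per block, as described in the article
GLYPHS_PER_LETTER = 4  # hypothetical: glyph variants per letter, numbered 0..3
BITS_PER_BLOCK = 3     # hypothetical: each block carries an integer in 0..7


def message_to_bits(message: str) -> str:
    """Convert the secret message to a bit string (UTF-8 here)."""
    return "".join(f"{byte:08b}" for byte in message.encode("utf-8"))


def bits_to_integers(bits: str, width: int = BITS_PER_BLOCK) -> List[int]:
    """Split the bit string into fixed-width chunks, zero-padding the last one."""
    padded = bits + "0" * (-len(bits) % width)
    return [int(padded[i:i + width], 2) for i in range(0, len(padded), width)]


def integer_to_block(value: int) -> List[int]:
    """Choose one glyph index per letter so that the five indices sum to value."""
    max_sum = BLOCK_LEN * (GLYPHS_PER_LETTER - 1)
    if not 0 <= value <= max_sum:
        raise ValueError(f"value must be in 0..{max_sum}")
    indices = []
    remaining = value
    for _ in range(BLOCK_LEN):
        # Greedy assignment: give each letter the largest index still needed.
        idx = min(remaining, GLYPHS_PER_LETTER - 1)
        indices.append(idx)
        remaining -= idx
    return indices


def block_to_integer(indices: List[int]) -> int:
    """Recover the integer from a block by summing its glyph indices."""
    return sum(indices)


if __name__ == "__main__":
    integers = bits_to_integers(message_to_bits("A"))
    blocks = [integer_to_block(n) for n in integers]
    print(integers, blocks)
```

A greedy assignment is used only because it is easy to follow. A real system would presumably choose among the many index combinations with the same sum, for example to keep the perturbations as unobtrusive as possible.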

“While there are obvious applications for espionage, we think FontCode has even more practical uses for companies wanting to prevent document tampering or protect copyrights, and for retailers and artists wanting to embed QR codes and other metadata without altering the look or layout of a document,” says Changxi Zheng, associate professor of computer science and the paper’s senior author.

Zheng created FontCode with his students Chang Xiao (PhD student) and Cheng Zhang MS’17 (now a PhD student at UC Irvine) as a text steganographic method that can embed text, metadata, a URL, or a digital signature into a text document or image, whether it’s digitally stored or printed on paper. It works with common font families, such as Times Roman, Helvetica, and Calibri, and is compatible with most word processing programs, including Word and FrameMaker, as well as image-editing and drawing programs, such as Photoshop and Illustrator.

Data hidden using FontCode can be extremely difficult to detect. Even if an attacker notices font changes between two texts, which is highly unlikely given the subtlety of the perturbations, it simply isn’t practical to scan every file entering and leaving a company.

To read the full article written by Holly Evarts at Columbia School of Engineering on May 11, 2018 visit

About FontCode

We introduce FontCode, an information embedding technique for text documents. Given a text document with specific fonts, our method embeds user-specified information in the text by perturbing the glyphs of text characters while preserving the text content. We devise an algorithm to choose unobtrusive yet machine-recognizable glyph perturbations, leveraging a recently developed generative model that alters the glyphs of each character continuously on a font manifold. We then introduce an algorithm that embeds a user-provided message in the text document and produces an encoded document whose appearance is minimally perturbed from the original document. We also present a glyph recognition method that recovers the embedded information from an encoded document stored as a vector graphic or pixel image, or even printed on paper. In addition, we introduce a new error-correction coding scheme that rectifies a certain number of recognition errors. Lastly, we demonstrate that our technique enables a wide array of applications, using it as a text document metadata holder, an unobtrusive optical barcode, a cryptographic message embedding scheme, and a text document signature.
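To give a feel for how redundancy can rectify a limited number of recognition errors, here is a minimal sketch of a plain repetition code with majority voting. It is a deliberately simple stand-in and not the error-correction scheme described above, which is not detailed here. The repetition factor `REPEAT` and the function names are hypothetical.

```python
from collections import Counter
from typing import List

REPEAT = 3  # hypothetical: each integer is embedded in three separate blocks


def encode_with_repetition(values: List[int], repeat: int = REPEAT) -> List[int]:
    """Repeat every integer so a few misread blocks can be outvoted."""
    return [v for v in values for _ in range(repeat)]


def decode_with_majority(received: List[int], repeat: int = REPEAT) -> List[int]:
    """Recover each integer as the most common value within its group."""
    if len(received) % repeat != 0:
        raise ValueError("received length must be a multiple of repeat")
    decoded = []
    for i in range(0, len(received), repeat):
        group = received[i:i + repeat]
        decoded.append(Counter(group).most_common(1)[0][0])
    return decoded


if __name__ == "__main__":
    sent = encode_with_repetition([2, 0, 2])
    sent[4] = 3  # simulate one glyph block being misrecognized
    print(decode_with_majority(sent))
```

With three copies per integer, this sketch tolerates one misread copy in each group; two wrong copies in the same group would defeat it.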

About Columbia Technology Ventures

Columbia Technology Ventures is the technology transfer office for Columbia University and a central location for many of the technology development initiatives, entrepreneurial activities, external industry collaborations, and commercially-oriented multidisciplinary technology innovations across the university. CTV’s core mission is to facilitate the transfer of inventions from academic research labs to the market for the benefit of society on a local, national, and global basis.