Electronic Document Conference 2019
June 18, 2019, Seattle, WA, USA
Leveraging vector graphics in PDF

About the talk

Jean Haney, Visual Integrity | Electronic Document Conference 2019

About speaker

Jean Haney
Owner at Visual Integrity

Jean Haney is the co-founder and president of Visual Integrity, long-standing specialists in PDF and vector graphics technologies. With more than 35 years of software industry experience and a fascination with good page design and typography, Jean enjoys distilling topics down to their essence and providing her audience with a few, "I didn't know you could do that with PDF!" moments.

Transcript

So, good morning, everybody. During this session, I'm looking forward to sharing our experiences with the sorts of things you can do with the vector graphics contained inside PDF documents. Before I get started, I just want to pick up on a couple of themes I've noticed over the last couple of days. First was David Blattner. He talked about nuance. Being in the business of graphics, we care a lot about how things look, and our customers tend to care about that too. They put a lot of work into developing their PDFs, and they want them to look exactly the same when they go into another format or are displayed on another system. Once you start noticing good design, you become more sensitive to it.

Another speaker talked about the mindless mantra that PDF is where data goes to die. And while I agree wholeheartedly with his approach of attaching data to the PDF file, sometimes that simply isn't possible. Sometimes it's worthwhile to pull meaningful information out of a PDF. It's not always "scraping"; that sounds kind of negative. Sometimes it's targeted extraction with a purpose, and sometimes we don't need the data at all. We just need a visual, and that's what we're trying to mine from the PDF.

A lot of the discussions here have been about getting organizations to adopt more of the PDF/A standard. But we also have to respect the number of PDFs that exist today, and that number is overwhelming. As Kenny Swope said, part of the digital transformation is looking at the content we already have and trying to leverage more information from it, and that's basically what a lot of my presentation is going to be about. Okay, my name is Jean Haney. I'm the co-founder and CEO of Visual Integrity.

[Name unclear] is my partner, and he and I started the company together in 1993, the same year that Adobe first released Acrobat and PDF. Up until that point, we had already built up a good deal of experience in developing vector drawing applications, in converting vector graphics into PostScript, and in converting vector graphics from PostScript and EPS files into vector formats like WMF, CGM, and MIF for tech pubs systems. Back in those days, a lot of content was being created in engineering companies on Sun and other workstations, and those systems mostly output PostScript, but the tech pubs people needed EPS or MIF or another format for their technical manuals. So that's sort of where we got our start. The two of us have, and I really hate to admit this, almost 70 years of combined experience in vector graphics technologies, products, customers, and workflows.

I know that you're a technically astute audience, so I'm not going to spend a lot of time on the difference between raster and vector graphics, but I am going to set the stage for anybody who may not be in this world. Then we're going to talk a little bit about how end users can use the graphics in PDF today, how integrators are using them to automate workflow solutions, and how developers can access and work with vector PDF information in their applications at both the file and the object level.

Okay. So first, vector vs. raster. The first thing we ask ourselves is: what is visual information? Visual information is all of the things you see in this chart. Let me just find my place here. You've all heard the adage that a picture is worth a thousand words. When we think of pictures, we probably think of both photos and line art, and the average user might never stop to consider that these are, by nature, very different types of graphics. Photos are made up of pixels, while illustrations are made up of objects derived from data. Pixels are just dots on a screen or page, devoid of any information about what they collectively represent. Illustrations, on the other hand, are rendered from data: every line, character, object, group, and layer contains information that could be extracted to further understand or drive your business. This is in line with what Kenny Swope was saying in his presentation this morning: you can have a huge amount of information locked up in the PDFs in your business, and digital transformation is about figuring out how to mine those documents for even more information once you can see them digitally rather than just on paper.

This slide summarizes many of the types of visual information that contain data. These are vector graphics, and that's what we're going to focus on today. Scanned documents are raster documents; they're raster PDFs. Even though they look like drawings, act like drawings, and say "drawing" right on them, they're still bitmapped images. They're like photocopies or photographs, a snap from a moment in time, and there's no intelligence in them. You can't leverage what is lost, so there's not much we can do with scanned files.
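The scanned-versus-vector distinction shows up directly in a page's content stream: a vector page issues path operators such as `m`, `l`, `c`, `re`, `S`, and `f`, while a scanned page typically just places an image with `Do` (or an inline image with `BI`). The Python sketch below is a rough heuristic written for illustration, not a tool from the talk. It assumes you already have the page's decompressed content stream as a string, and it only counts operators, so treat its verdict as a hint. A scan with an invisible OCR text layer, for example, would come out as "mixed", and `Do` can also paint form XObjects, which this sketch does not distinguish.

```python
import re
from collections import Counter

# Path construction and painting operators from the PDF content-stream syntax.
PATH_OPS = {"m", "l", "c", "v", "y", "h", "re",
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*"}
TEXT_OPS = {"Tj", "TJ", "'", '"'}
IMAGE_OPS = {"Do", "BI"}  # assumption: every "Do" is treated as an image here


def count_operators(content: str) -> Counter:
    """Count operator tokens in an uncompressed content stream (heuristic)."""
    # Remove literal strings innermost-first so nested parentheses are handled,
    # then hex strings, so string contents are never mistaken for operators.
    while True:
        stripped = re.sub(r"\((?:\\.|[^\\()])*\)", " ", content)
        if stripped == content:
            break
        content = stripped
    content = re.sub(r"<[0-9A-Fa-f\s]*>", " ", content)
    tokens = re.findall(r"/?[^\s\[\]()<>{}/%]+", content)
    # Names such as /Im0 or /F1 are operands, not operators.
    return Counter(t for t in tokens if not t.startswith("/"))


def classify_page(content: str) -> str:
    """Return 'raster-only', 'vector', 'mixed', or 'empty' for a page."""
    ops = count_operators(content)
    paths = sum(ops[o] for o in PATH_OPS)
    text = sum(ops[o] for o in TEXT_OPS)
    images = sum(ops[o] for o in IMAGE_OPS)
    if images and not (paths or text):
        return "raster-only"
    if (paths or text) and not images:
        return "vector"
    if images:
        return "mixed"
    return "empty"


print(classify_page("q 612 0 0 792 0 0 cm /Im0 Do Q"))  # a typical scanned page
print(classify_page("0 0 m 100 100 l S"))               # a stroked vector line
```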

Computer-generated documents are primarily vector PDF files. But remember: garbage in, garbage out. When a lot of people create PDFs, they don't pay much attention to the graphics. Even if a graphic was computer generated and was a vector, they might take a screen snap of it, and a lot of the conversion engines out there will simply convert all the graphics to bitmaps, so that information is lost. So what we're advocating is to keep as much of the vector content in your file as possible. Why do I say "primarily"? Because in the PDF file, the vector art, the graphs and charts, the callouts, the block diagrams, the logos, drawings, schematics, and tables are all vector information, and there's intelligence in there. The raster information would be the photographs, the screen snaps, and the original art that might be in there. So the basic way to look at it is that rasters come from paint programs and scanners, while vectors come from any kind of illustration package, technical drawing package, or original line art.

Okay, so maximize your vector content. This image right here basically shows the difference between a bitmap and a vector, and I'm sure you're all familiar with that. The great thing about vector graphics is that they're clear and crisp no matter how large or small they are. This is really important for responsive design on the web and for display on devices of all different sizes. Vectors are device-independent, resolution-independent, simple to edit, searchable, actionable, intelligent, and precise.
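Why vectors stay crisp at any size can be shown in a few lines. A vector shape is stored as coordinates, so enlarging it means recomputing those coordinates exactly with an affine matrix, written here in PDF's `[a b c d e f]` convention; a bitmap can only have its existing pixels resampled. This Python sketch uses plain lists and nearest-neighbour resampling purely for illustration (the function names are hypothetical); real renderers use smarter filters, but no filter can recover detail that was never sampled.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# A PDF-style transformation matrix [a b c d e f]:
#   x' = a*x + c*y + e,   y' = b*x + d*y + f
Matrix = Tuple[float, float, float, float, float, float]


def transform(m: Matrix, pts: Sequence[Point]) -> List[Point]:
    """Apply an affine matrix to every vertex of a vector path."""
    a, b, c, d, e, f = m
    return [(a * x + c * y + e, b * x + d * y + f) for x, y in pts]


def upscale_nearest(pixels: List[List[int]], k: int) -> List[List[int]]:
    """Enlarge a bitmap k times by repeating each pixel in a k-by-k block."""
    return [[v for v in row for _ in range(k)] for row in pixels for _ in range(k)]


# Enlarging a vector triangle 4x just recomputes its vertices exactly.
triangle = [(0, 0), (10, 0), (5, 8)]
print(transform((4, 0, 0, 4, 0, 0), triangle))
# → [(0, 0), (40, 0), (20, 32)]

# Enlarging a 2x2 bitmap 2x only duplicates the samples it already had.
print(upscale_nearest([[0, 1], [1, 0]], 2))
# → [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```

The vector result is still three exact vertices that a renderer can draw sharply at any resolution, while the bitmap result is a blockier copy of the same four samples.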

Okay. Now I'm going to switch gears a little bit and talk about actual examples, so you can see how people are using the graphics in their PDF documents right now. These are all examples of real customers or real products that exist today.

We'll start with PowerPoint. In this example, we started with the World Wildlife Fund's 2017 annual report, downloaded from the web. Let's imagine that we're a local organizing group and we want to get contributions for the WWF. We go to the document, open the first page in PowerPoint, and grab the logo. Then we grab a graphic that talks about how important individual contributions are to their program. On the left, and I knew it was going to be pretty hard to see, is the original: page 30 of the annual report. On the right is the slide. We've grabbed the top graphic and the bottom graphic, brought them over into the slide, enlarged them, and changed the color of the two areas we're interested in. One shows how much the WWF relies on individual contributions, to show people how important their donations are. So basically you can take an ordinary corporate document with very high-quality graphics, grab those graphics, and adapt them a little bit for your own purpose. I have a video of how it works here, so we'll just look at that. Hopefully it starts on its own; if not...

That's not good. That was supposed to be a video demonstration, and I don't know how else to get it to work here. I guess we should have tested that part. Okay, so I'm going to go back to that slide for a second and manually walk you through what we did. We grabbed the logo up in the very corner of the front page of the document. In grabbing it, we grouped it to make sure it scaled properly, and then we were able to enlarge it. We could annotate it. We were able to do a lot with these parts of the slide, and we were able to leverage graphics that were professionally designed, in a way that was very powerful. I can show the video demonstration on my laptop after this to anybody who wants to take a look; I don't know why it didn't work here.

Then we'll move on to the next product, which is Visio. A lot of people receive a Visio floor plan or layout diagram, maybe a floor plan from somebody in the company that was done a year ago, and they want to modify a few things: move a few chairs and tables around. What you can do with a PDF is open it as an editable Visio diagram. You can specify which page you want to open, and you can scale the drawing to a drawing scale or to a page size. Remember that PDF is a page format. Things like CAD and Visio are more spatially oriented; they're based on scale. When the PDF is created, all of the dimensions are lost, and you no longer really know what the measurements are. So you scale the drawing either to a drawing scale or to a page. You can emulate PDF cropping, you can rotate, and you get editable text with kerning and font substitution. All of these features have been developed to compensate for differences between PDF and the target formats. For example, you might have Bézier curves in PDF, but a format like WMF doesn't support Bézier curves, so we have to create polylines. A lot of work has gone into compensating for the differences between formats. You can also do things like auto-recognizing objects. A circle or ellipse is constructed as a collection of Bézier curves and arcs, and we can look at it and say, hey, the way that's set up and the way those pieces are connected, it looks like a circle and acts like a circle, so let's make it a circle. Chances are it's a circle, and if it's not, the end user can look at it and change or modify whatever they need to.

As far as AutoCAD goes, there are a lot of solutions out there already for opening and editing PDF drawings in CAD. Autodesk themselves included a feature in AutoCAD that allows you to open a PDF file, and you can specify the scaling factor. It's the same thing as with Visio: a PDF has a page size, so when you bring it into CAD you have to be able to add the scale and the dimensions. Hopefully you know those from the legend on the drawing, so you can tell what one DXF unit equals in feet or inches. You can extract drawings from multi-page PDF files. You can use the layers in PDF to combine or merge them, or to create separate files for each layer. More features have been developed to compensate for the differences: rendering paths as polylines or polygons, and recognizing dashed lines. When you open a PDF, a dashed line comes in as a bunch of very small dashes, and products and technology like this can go in and say, hey, that's the way that's behaving, we think that's a line segment, and turn it back into a line segment. Then there's adapting line width, ignoring paths, text, or images, keeping or removing 3D effects, adjusting for incompatibilities, and layers, which I talked about. You can do this with external programs or with plug-ins, and many of the CAD programs are starting to add it, either themselves or through technology like ours. So in the CAD world it's becoming very common to be able to open and edit PDF.

This is an example of what a drawing looks like once it's been imported into AutoCAD. You can see that everything looks the way it should. One of the gotchas here is the text. AutoCAD and other CAD programs support TrueType and OpenType text now, but they also have something called SHX text. That's shape text, and it's basically plotted text: every letter is made up of a bunch of tiny strokes. At the moment there aren't really any good ways to recognize that text with any kind of OCR, and there are companies, including us, working on it. Autodesk has a simple feature for that in AutoCAD, but it requires you to know which SHX font was used. It requires you to know a lot in order for it to review that text and say, here's what we think it says. I did a lot of tests with it on the test files we've accumulated over many years, and it rarely got it right. So it's a really difficult thing to do well.

The next thing we're going to talk about is server and API solutions. These are examples of companies we've worked with and the ways they have implemented solutions that focus on the graphics content in PDF files. I'm just going to scroll along here for a second to get to my point. Okay. They've all implemented some type of PDF document process automation or service related to vector graphics. They fall into a few categories: one is content management, another is technical publishing, and then there's workflow in general.

Let's look at the CMS and publishing examples first. On the left is [unclear], a tech pubs company with three products. One example is their software for FDA submissions, which are very long, involved documents. They wanted to make sure that graphics from PDF could be inserted as EPS to neutralize font issues, because there are a lot of issues around fonts, as we've all heard. When you have complicated equations for pharmaceutical compounds, it's very easy for them to get off a little bit, and when that happens the whole submission falls apart, because that compound is what the whole submission is about. So we find ways to do things like render the equation fonts as Bézier curves, so they look exactly right no matter what happens to them afterwards. In that case the data has been taken out of the equation, so you wouldn't be able to open it up again in Matlab or whatever you're using, but you do have an exact representation that can go through your document system, and you never have to worry about it losing its meaning. So in that example you ensure WYSIWYG representations.

Another example in publishing is ABB, which converts PDF, as well as EPS, AI, and other drawing formats, into WMF print streams in their LinkOne publishing system to produce parts catalogs. NXP Semiconductors and its spin-off Nexperia convert PDF into SVG and EPSF for their tech pubs system, and [unclear] does something similar, going from PDF to EPS and SVG. A lot of companies are interested in going from the graphics in PDF to SVG because of the web: SVG is the HTML standard vector graphics format, and they're trying to make their systems more browser-based.

The other companies on the slide are doing volume automation of reports, statements, and direct mail by converting PDF to WMF/EMF print streams. They merge graphics and text from the PDF file: the text is pulled from the PDF as ASCII and then placed precisely on the WMF form, which may be a statement or a check. Change Healthcare, which we all know as WebMD, converts 10,000-plus-page PDF documents into automated WMF/EMF print streams as a service business for healthcare providers. They also extract selected ASCII text from the documents to populate their data management system. First Hawaiian Bank and Bank of the West have similar operations, where they process 10,000-plus-page documents, their PDF mainframe reports, to extract and present text within a fixed 232-character-per-line constraint, and then they merge their logo with it and regenerate all of that as a PDF. So sometimes it's PDF in, PDF out, but you're doing a lot to the PDF to add value to it along the way, by using the value of the object data.

Examples of workflow would be [unclear], which converts PDF-based Oracle reports and casts them to the browser as SVG, and Lufthansa, which converts flight navigation maps from PDF to SVG for web access. That wraps up these command-line examples.

Next, we'll move to examples where developers can use an API to do a lot of these things. First off, PDF conversion. With just two API calls, a conversion solution can be implemented in an application in half a day, allowing it to open and edit PDF files. Support for PDF import and editing is becoming a checklist item for commercial applications, and conversion is not always straightforward. For it to be done well, there has to be compensation for the differences between PDF and the target format, as I mentioned before: things like PDF cropping, line type definitions, layers, and fonts.
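One such compensation, the Bézier-to-polyline conversion mentioned earlier for WMF, can be sketched as follows. This is a generic technique, not Visual Integrity's implementation: it samples a cubic Bézier at evenly spaced parameter values and chooses the number of segments with Wang's formula, which bounds how far the polyline can stray from the true curve. The tolerance is in the same units as the coordinates (PDF points, for instance), and the function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t using the Bernstein form."""
    mt = 1.0 - t
    a, b, c, d = mt * mt * mt, 3 * mt * mt * t, 3 * mt * t * t, t * t * t
    return (a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
            a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1])


def segments_needed(p0: Point, p1: Point, p2: Point, p3: Point, tol: float) -> int:
    """Wang's formula for a cubic: n = ceil(sqrt(3*M / (4*tol))), where M is
    the largest second difference of the control points."""
    m = max(math.hypot(p0[0] - 2 * p1[0] + p2[0], p0[1] - 2 * p1[1] + p2[1]),
            math.hypot(p1[0] - 2 * p2[0] + p3[0], p1[1] - 2 * p2[1] + p3[1]))
    return max(1, math.ceil(math.sqrt(3 * m / (4 * tol))))


def flatten_cubic(p0: Point, p1: Point, p2: Point, p3: Point,
                  tol: float = 0.25) -> List[Point]:
    """Approximate a cubic Bézier with a polyline deviating at most tol from it."""
    n = segments_needed(p0, p1, p2, p3, tol)
    return [cubic_point(p0, p1, p2, p3, i / n) for i in range(n + 1)]


# An arch-shaped curve flattened to within 0.1 units.
arch = flatten_cubic((0, 0), (0, 10), (10, 10), (10, 0), tol=0.1)
print(len(arch))  # → 12 (11 segments)
```

Uniform sampling keeps the sketch short; a production converter might subdivide adaptively so that flat stretches get fewer points, which matters when the target is a size-constrained print stream.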

By mapping your format to PDF through an API, you can also add features like snapping, underlays, or viewing to your software or service. Mapping PDF to another format is never a one-to-one exercise, so these compensations become very important. One example is Graebert, which adds PDF import to its application, whereas DraftSight created a plug-in so they can also get a little bit of revenue for that functionality from their customers. Then there are laser cutting machines. Their makers expect DXF files from their customers, but not everybody can create a DXF file. So by allowing customers to give them PDF, they expand the range of documents they can consume and then cut with their laser cutting machines. That's a big area where PDF is becoming an import format for those sorts of machines.

As far as PDF creation APIs go, there are a lot of them out there, both open source and commercial, and it's pretty much your choice. But when you have vector-graphics-intensive documents, you need to pay attention to get WYSIWYG accuracy. There are APIs to create PDF from scratch, from data, text, print streams, PostScript, EPS, and PDF, and APIs that allow you to modify that content by merging, combining, stamping, and watermarking; we've heard about some of those already at this conference. One example I want to tell you about is the National Bank of Norway. They feed their economists' PS, PDF, and EPS charts and graphics from the various analytical tools they use into their production system, which produces PDF for their presentations and reports, EMF for MS Office charting, or SVG for the web. So they're basically creating a graphics system where they can convert graphics and then send them off to multiple different systems. It's like the World Wildlife logo we started out with: it would be great if it existed in one place and everybody could use it, but what happens today is that people take a screen snap of a logo and stick it in something, it gets all skewed or resized, and it looks awful. So this is a way to ensure quality. The Bank of Norway also adds their logo and page numbers to the documents after they do all of that work.

We're going to have to finish up pretty quickly here; I had a lot to go through. PDF object access APIs are a really interesting area of development activity: object-level control for pre-processing PDF files. You can extract or search for objects on a page. You can create your own conversion engine with no intermediate step or print driver, which gives you better performance and better quality. You can find text object strings and then pass them to something else. You can perform operations on object data, like finding and replacing it. You can delete object data, you can change attributes of the object data, and you can implement features like snapping. Two customers that use this type of API that I know of are the Open Design Alliance, which manages the DWG format outside of AutoCAD and used it to create a PDF underlay for the format and to add clipping and snapping to their CAD platform, and GstarCAD, which mapped its internal format to the Open Design Alliance's object database so that it could use the objects within, import PDF, and export PDF.

So let's look a little bit at the future. I'm just going to catch up here with my notes. I'll show you a couple of short-term projects we think are important, and I'll also talk briefly about some game-changing possibilities. What sort of short-term challenges are there to improve extraction? The first would be plotted text recognition. This is what I talked about with SHX fonts in AutoCAD: you can have PDF files with this plotted text in them, and right now there's no way to do OCR on it. There are a couple of reasons why. One is that when fonts get involved, it's really hard to know the difference between an O and a zero. How do you tell them apart? How do you know which one it is? Also, this Q, for example, is actually a compound object: it's an ellipse and a straight line, so it takes two objects to make a Q. There are a lot of issues there, and we're working on them.

Next is recognizing hatch fills. When you open a PDF file in a CAD program and there's something with a hatch fill in it, the program doesn't see it as a circle with a pattern. It sees it as a circle with 20 lines in the middle of it, and that explodes the file and makes it very large. So being able to see that hatch fill as a hatch fill and replace it with a pattern is an important advancement to make.

And then we also talked about fonts and how difficult they are. Not only are there font mapping issues, but even with the same font: if you use Windows 7, you probably have Arial MT on your system, and if you use Windows 10, you probably have Arial. Those are exactly the same font, but does the PDF know that? No. So when you have a PDF with Arial MT and you send it to somebody who has regular Arial, it won't know that it's the same Arial. So there's work to do to understand the font.
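The Arial MT versus Arial problem comes down to matching font names that differ only in decoration. Below is a naive Python sketch of that kind of normalization. The suffix list is an assumption covering a few common cases: the "MT"/"PSMT" vendor suffixes, the six-uppercase-letter subset tag (plus a "+") that PDF prefixes to embedded font subsets, and the "Family,Style" form used for some TrueType fonts. It is not a complete font-matching algorithm; a production system would also consult the font's own name table and metrics.

```python
import re
from typing import Tuple

# PDF marks embedded font subsets with exactly six uppercase letters and a "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")
# Assumed, non-exhaustive vendor suffixes; "PSMT" must be tried before "MT".
FAMILY_SUFFIXES = ("PSMT", "MT", "PS")
STYLE_SUFFIXES = ("MT",)


def _strip_suffix(s: str, suffixes: Tuple[str, ...]) -> str:
    """Remove the first matching suffix, never reducing s to an empty string."""
    for suffix in suffixes:
        if s.endswith(suffix) and len(s) > len(suffix):
            return s[: -len(suffix)]
    return s


def normalize_font_name(name: str) -> Tuple[str, str]:
    """Reduce a PDF BaseFont name to a heuristic (family, style) key."""
    name = SUBSET_TAG.sub("", name)
    # "Arial-BoldMT" and "Arial,Bold" both split into family and style.
    parts = re.split(r"[-,]", name, maxsplit=1)
    family = _strip_suffix(parts[0], FAMILY_SUFFIXES)
    style = parts[1] if len(parts) == 2 else ""
    style = _strip_suffix(style, STYLE_SUFFIXES) or "Regular"
    return family, style


def same_font(a: str, b: str) -> bool:
    """True if two font names normalize to the same family and style."""
    return normalize_font_name(a) == normalize_font_name(b)


print(normalize_font_name("ArialMT"))                    # → ('Arial', 'Regular')
print(same_font("ABCDEF+Arial-BoldMT", "Arial,Bold"))    # → True
```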

Font variant name recognition means recognizing the font names that are in the PDF files. We also have to look at how graphics fit into the work of the PDF Association and the ISO standards, because the focus, as we've seen at this conference, is very much on accessibility. It's maybe not as much yet on using the PDFs that exist to help people transform their digital businesses and find the intelligence that exists in their organizations today, and we can do a lot of that through this.

So the last thing, real quick, since I know my time is up, is our long-term goal. We're working on something called [unclear] search and actions. This is part of the digital transformation: organizations have a wealth of information locked inside their PDF-based visual content, and mining it can lead to better business intelligence, streamlined workflows, and ultimately increased competitive advantage. The solution for this is something we call search and action, and it needs a platform or API that enables the enterprise to use the intelligence stored in its visual content, through facilities for retrieval, viewing, discrete search and actions, dynamic modifications, analysis, and publishing, and ways to find, correlate, analyze, and publish business results. We've created a prototype, developed with grants from the Netherlands government innovation funds; we do our development in Holland. It has two parts. The first is compound object recognition, where you can pinpoint specific objects within files. A simple object would be something like a circle, but a compound object would be something like a CAD part. You would be able to search the parts catalog for that CAD part and find other parts that look like it, at that size or at scale, and at any rotation or orientation. You would really be able to start looking into the graphics of files. To operate this system, you also need an object query language, and that's the second part. It allows for a lot of dynamic functions within the graphics in PDF: data-driven graphics, merging of files, stamping, and watermarks. That's it.

So if you have questions, I'm around. If you want to see the little video I was going to show of how PDF can be used in PowerPoint, let me know. And if you want to contact me afterwards, there's my information. Thanks a lot.
