About the talk
Aleksy Jones, PDFTron | Electronic Document Conference 2019
Aleksy Jones is the Head of Systems and Rendering at PDFTron Systems Inc. He is responsible for PDFTron's Office to PDF conversion technology, a position that has taught him a deep appreciation for the complexity and intricacies of both PDF and Office formats. Aleksy has spent many years crafting technological solutions to real business problems, and has developed a strong appreciation for standards and how much easier they make it to get work done.
Good morning, everyone. My name is Aleksy Jones, and I'm here to talk about Office and whether reliable Office rendering is possible. Just to get things started, what I'm going to argue over the course of this talk is that Office is a good format for document creation, but PDF is better for sharing, viewing, and archival. So the answer to the question of whether reliable Office rendering is possible is no, it is not, and that's why PDF is better for sharing, viewing, and archival on different devices, now and in the future as well.
Before we start, I should say that I'm going to be pretty loose with my language in terms of what an Office file is. What I mean is basically Office Open XML, the format Microsoft standardized: Word, PowerPoint, or Excel. So here's our road map for what we're going to go through while we're together: an introduction where we define the problem, then an in-depth look at different aspects of the problem, then a look at some solutions, and then a summary.
To start off, we want to define the problem in terms of these four operations: viewing, archival, sharing, and editing. There are a lot of Office files in the world. This format has been around for quite a long time, Microsoft Office has been around for quite a long time, and people have been producing documents using productivity software for a long time. We can edit documents, we can perhaps collaborate on those documents and then maybe archive them, and then we can maybe edit them again or archive them. We still have the same basic operations here: we're still viewing, we're still archiving, we're sharing, we're editing.
Of course, if you're looking at making some sort of user-facing app, then it's a similar sort of problem. In that case, you have a lot of documents that have been out there in the world forever, people are still creating them, and it's your job to process those documents so that perhaps other people can share a document and collaborate on it.
And again, we have the same basic operations. What I'm trying to claim here is that, whereas Office is quite good for editing, for the other operations it's not really that great. Now we're going to take a look at why that might be the case. The next few slides come in two parts: basic facts about the formats that are commonly known, as well as the implications of those facts.
So here we go. PDF is basically a fixed-layout format. What I mean is that each element on a page is independent: when you lay out element A, it does not change the position of element B. You always have some well-defined formula for determining where each of these elements is placed. Office is reflowable: you need to look at elements A through N and determine how much space they took up and where they were on the page in order to find out where your later content should be placed.
And again, non-contentious, if we look at the file formats themselves: PDF is just a single flat file, and within that flat file you have a binary lookup, so you can jump to any position in the file and you can jump to any page in the file. Office is slightly different here. First of all, we have a package. Yes, you can jump to one part of the package, but if you look at something like Microsoft Word, or I should say a word-processing document, in that case all of your content is in one single XML file. So a 600-page document will generally be in one single XML file, and you can't randomly access within it. If you want to look at page 599, you need to read the entire file.
Taking these two things together, I'm going to make what may be a contentious statement, especially considering that I am looking here only at an imaging model that is compatible with Office, so nothing complicated. If you want to look at 18 nested transparency groups on a page, that can take a long time.
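The contrast between fixed and reflowable layout can be sketched in a few lines of Python. This is a toy illustration only, not how any real Office or PDF engine works: the `Block` type, the `reflow` function, the block heights, and the page height of 100 units are all assumptions invented for the example.

```python
from dataclasses import dataclass

PAGE_HEIGHT = 100  # hypothetical page height, arbitrary units


@dataclass(frozen=True)
class Block:
    name: str
    height: int  # in a real engine this depends on fonts, wrapping, etc.


def reflow(blocks, page_height=PAGE_HEIGHT):
    """Reflowable layout: stack blocks top to bottom, breaking pages as needed.

    The position of each block depends on the heights of all earlier blocks,
    so finding block N means laying out blocks 1..N-1 first.
    """
    placed = {}
    page, y = 1, 0
    for block in blocks:
        # Start a new page when the block no longer fits on the current one.
        if y + block.height > page_height and y > 0:
            page, y = page + 1, 0
        placed[block.name] = (page, y)
        y += block.height
    return placed


doc = [Block("A", 40), Block("B", 40), Block("C", 20), Block("D", 30)]
before = reflow(doc)

# Make block A slightly taller and lay the document out again.
edited = [Block("A", 45)] + doc[1:]
after = reflow(edited)

# Fixed layout: every position is stored explicitly, so looking up one
# element never requires examining any other element.
fixed_positions = dict(before)

for name in ("A", "B", "C", "D"):
    print(name, "reflow before:", before[name], "reflow after:", after[name],
          "fixed:", fixed_positions[name])
```

In this sketch, growing block A by five units moves every later block, and block C ends up on the second page instead of the first, while the stored fixed-layout positions are unaffected.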
If you're looking at purely Office-based content, though, PDF is actually very good here. With Office, when you lay out your 600-page document and you want to look at page 300, you definitely need to do pages 1 through 299 first. This is not the case, of course, with PDF. You want to render page 300? You just look it up and you render it. So it's possible to render more quickly, and it's possible to offer a better user experience using less CPU power. You can argue about whether that really matters these days, since CPUs are pretty fast. But generally this comes down to battery life: more CPU power means more battery usage. Or if you're looking at rendering something on the client side on the web, performance is still absolutely critical.
Now, not to make things totally unfair here for Office: when you edit a document, there's something people take for granted in productivity software. You enter a paragraph at the top of your page, and it's going to push a paragraph off the bottom, and that's going to go to the next page, and it's going to continue like this all the way down to page 600. The entire document reflows, and you would not want to do that process manually. So Office, the format itself, is definitely well suited for quick and effective editing.
And then we get to the crux of the issue here. Unfortunately, Office is unreliable, and what I mean by that is inconsistent: when you view a file on one platform versus another platform,
you're going to get different results. We'll get into why right now. First of all, fonts: you can either embed a font or you can not embed a font. With Office files, fonts are generally not embedded. A lot of this has to do with permissions in fonts. Office files are meant for editing, so if you embed a subset of the font with only a few letters, say E, D, and F, you can't actually make any more letters using the same font. So you have to embed the entire font, and many fonts don't actually allow this through their permissions, so it's very hard to do. PDFs generally do embed fonts, unless people try hard not to.
And even if you have a case in which the font is not embedded in a PDF, at least you're going to have the sizes from your font, so you know how big each glyph is supposed to be and you know where it should reside. Office unfortunately does not give you that. This leads to a sort of situation that I think everyone who has used productivity software has seen before. So Alice
has created this lovely document here, and she's used Californian, which comes with Word. It looks a little bit like Times New Roman, and I'm sorry if there are any designers in the audience. Alice is a normal person, so she decides this picture should be about as big as it can be, and expands it to fill the page. Bob wants to view the document, and unfortunately, on his system, he does not have Californian. So unfortunately he has a document that looks quite different. Maybe you get a fourteen-point difference in the height of that paragraph, and now something has been pushed to an entirely different page.
Unfortunately, when you take the combination of pages and reflowable layout, you end up with this situation quite often: small changes leading to large differences in rendering. This can actually be a pretty big deal if you imagine that, through no fault of anyone's own, it was something more important, like a legal disclaimer that was supposed to be on this page of the document. So in general this problem is sort of unavoidable: small differences lead to large changes with reflowable layout.
But you don't have to take my word for it. We can just ask everyone, and as Google says: it looks different on another computer. That's totally not contrived; this happens a lot. To sum up this argument, basically
Office rendering will never be stable, in the sense that it will never be consistent on more than one platform at the same time. And this is almost built into the format. The reason is that it's so huge, and there are a lot of aspects of the rendering that are totally undefined.
So take the PDF standard. It's 950 pages, and, this again might be contentious, we'll leave out complicated imaging models, especially transparency groups. If you took one person who is sufficiently motivated and another person who is sufficiently motivated, put them into different rooms, and had each come up with a renderer from the spec with nothing else, you would end up with something that was reasonably close: given the same document, their two different clean-room implementations would generate similar output, certainly semantically similar.
Contrast that with Office. If we look at the standard, there's a lot of information in there, and certainly everything that's supposed to be in an Office file is accounted for in the standard. But unfortunately it doesn't give you any information about rendering itself. So if you put one person in one room with these 9,000 pages, and I'm sorry for them, and another person in the other room with the same pages, you would end up with something that certainly read everything from the Office file.
One of these areas of what I would call undefined behavior in the spec is wrapping around floating content. This combination is undefined, and it's also kind of difficult to do. With floating content, you have normal text on the page, and you also have special elements
that can be in a particular position. We're looking at the case where the reflowable content has to go around the floating content. Office has a very rich model for doing this; you can do it a whole bunch of different ways. You can make it look like this, or you can make it look like this. And this puts a very large burden on implementers, and you end up with situations like this.
On the left we have MS Word 365 for Windows, a new version. Alice has made a new document and tried to highlight a particular part of it by saying, you know, this part is important, using one of Word's, or Office's, floating content methods. Then we have the exact same vintage, the online version. It just takes the same file, the content is not wrapped in the same way, and it has been moved as well. And you end up with what I would say is a different document.
You might say, well, what if it's just floating content? So I came up with a little list of things that I would definitely consider undefined. Overlap: when you consider a character overlapping with another character, or maybe a shape or a paragraph, the rules are all different. What if you have a table with a cell with another table in it, with another cell in there with another table, and then that goes across the page? That, unfortunately, is undefined by the spec as well. PowerPoint animations in PDF,
and not in the sense of having actual animated PDFs, but where people just want a result for every click in the PowerPoint animation. Having learned over the last two years, I did not go to the specification first. I tried it out. So make a file, the simplest thing you can imagine: one slide, one title that fades in over half a second. And if you look at the XML, you find that you have an animation effect, transition "in", filter "fade". Why do we need 18 levels of nesting? But with some context we can imagine what they were going for.
Then there's "fill". I need to know more; it seems important. Let's look it up in the spec and see what it actually means. "fill" describes the fill type. Not the most helpful thing. So there's this simple type, which specifies what modifications the effect leaves on the target element's properties when the effect ends. Maybe a little bit too much for me, but we can get something from here: it's about what happens at the end. We have four different choices, and there's nothing else. You're stuck; you're looking at a lot of work, and this lot of work applies to basically everything.
So if you're looking at different Office implementations and thinking, are they going to do all of the stuff that is in the format, are they all going to go through this work? Probably not. So I would say that if you have Alice, and she sends her documents to a whole bunch of Bobs, on different platforms, probably mobile as well, she has no guarantee that any
of them see what she sees, and they may see something different between them as well. That's the argument for PDF for sharing, archival, and viewing. I'm going to mention here that I would consider sharing with yourself in the future, or with someone else in the future, very similar to actually just sharing with someone else.
So now that we've looked at the things that Office is not good at, we can create workflows that combine the strengths of the two formats. Here we're going to look at, maybe, a larger company with a document management workflow. The idea here is that you have a user, and they do the conversion themselves. The onus is on the user to perform their editing, then save as a PDF, and then transmit that to the rest of the company, into the document management system, after which it can be viewed by others and collaborated on. Maybe feedback would go back, and then we do the same cycle again.
This will definitely work, at least in the sense that everyone is going to be looking at the same thing, at least in this case. Unfortunately, that assumes the user does the conversion and does it correctly. Are they embedding fonts correctly? Are they using Word to do this? You need policies to ensure it, and it takes a lot of work for everyone to do this; every little minute counts in your work day, and enforcement of the policy is also not inexpensive. So that is one possibility, and at least we will be reliably communicating. Another possibility here is that we can just use
a central server, instead of the users, to do the conversion to PDF. Unfortunately, we can't use Office for this, or at least we probably shouldn't. Take Office itself: most servers, or at least many servers, are Linux. It's a GUI application, and even if you automate it, it still tries to do GUI things every now and then. Its threading model is not really compatible with servers, and it uses a lot of resources as well. And even if you are okay with all of that, if you look at the licensing for Office, it's not something you can put in your own product: Office is a standalone application, and you can't wrap it in your own app.
So with that said, we're going to have to use some sort of third-party software here. What's going on is we have the same thing as before, but this time it gets converted centrally. We still have, at this point, the potential for divergence. We are converting from Office, we're converting to PDF, and we need an accurate conversion, but let's put that to the side for a second. Once we have done that, once the document is in the system, we're at that point totally sure that everyone is at least looking at the same thing. So there is a possibility of collaborating. If you're trying to collaborate and two people are seeing two different documents, you have effectively malformed packets: the communication doesn't work. This solves that problem.
Also, now that we're not using Microsoft's software, we can embed the conversion in our own software, and you have the possibility of taking in external documents, converting those within your own software as soon as they hit your system, and then pushing those into the server.
So, what we need here to complete the picture is essentially just some way of ensuring that the conversion is accurate. That raises another problem, which I'll leave for another talk, and it's basically: how do we quantify accurate conversion? How do you take any solution, essentially, and put a number on how reliable it is on all documents?
I think there's another solution to this problem too, though I'm speculating a little bit here. The issue is in the format, which inherently makes it difficult to communicate: it just doesn't include the necessary information to make sure that your visual information is represented properly. Maybe we could ask, hey, what if we embedded a fixed-layout format within the Office file? Office Open XML is a package format, so you could just put a PDF directly in there.