About the talk
01:30 What will be indexed
06:41 Canonical document
24:27 User-agent detection
29:32 Mobile-friendly test
Good morning, everyone. My name is Tom Greenaway and I'm a partner developer advocate from Google Sydney with a focus on the indexability of progressive web applications. And I'm John, from Zurich. It's great to see so many of you here, even at this early hour. Now, as you can imagine, John and I have a lot of experience with the work web developers must do to ensure that websites are indexable, which is another way of saying whether a web page can be found and understood by search engines. But do search engines see all web pages in exactly the same way? At
looking at this initial HTML that's been delivered from the server. The initial HTML that's been sent down is completely devoid of any content. See here, in the app-root? That's all there is in the body of the page, except for some script tags. So that means there's nothing here to index. And to be clear, Angular isn't the only web framework that sends an empty response on its initial server-side render. React and Vue have similar issues by default. So what does this mean for the indexability of these websites from the perspective of
Google Search? Well, to answer that question better, we'll take a little step back and talk about the web in general, why search engines exist, and why search crawlers are necessary. A good question to start with is: how big is the web? Well, we can tell you that we've actually found over 130 trillion documents on the web. So in other words, it's really big. And as you know, the aim of all search engines, including Google, is to provide a list of relevant search results based on a user's search query. For those results, we need an index similar to the
catalog of a gigantic library, and given the size of the web, that's a really complex task. So to build this index to power our search engine, we need another tool: a search crawler. Traditionally, a search crawler was basically just a computer and a piece of software that performed two key steps. One, it aims to find a piece of content to be crawled, and to do this, the content must be retrievable via a URL. And once we have a URL, we get its content and we sift through the HTML to index the page and find new links to crawl as well. And thus the cycle repeats. So let's take that first step, crawling, and
break it down. Oh, and yes, as an Australian, I felt it was imperative that I include some spiders in my talk. So this is the cutest one I could find. John, what do you think? Okay. Well, I have a few more in the deck, so maybe he'll come around. To ensure that crawling is possible, there are some key things to keep in mind. Firstly, we need URLs to be reachable, as in there shouldn't be any issue when the crawler wants to request the web pages and retrieve the resources necessary for indexing them from your website. And secondly, if there are multiple documents that contain the same
providing a recommended set of URLs to crawl initially for a site. There's no guarantee these URLs will get crawled. It's just another signal that search crawlers will consider. Okay, but now let's talk about that duplicate content scenario and how search crawlers deal with this situation. Sometimes websites want multiple pages to have the same content, right? Even if it's a different website. For example, some bloggers will publish articles on their website and cross-post to services like Medium to increase the reach of that content, and this is called content syndication. Which URL
and history as well. So a trick was invented, which leveraged something called the fragment identifier. Its purpose is deep linking into the sub-content of a page, like a subsection of an encyclopedia article. And because fragment identifiers were supported by browsers for history and navigation, this meant developers could trick the browser into fetching new content dynamically without reloading the browser page, and yet also support the history and navigation we love about the web. But we realized that using the fragment identifier for two purposes, subsections on pages and also deep
managing the history, or the history state, of the URL without requiring a complete reload of the browser, all through JavaScript. So we get the best of both worlds: dynamically fetched content with clean URLs. And I can tell you that from Google's perspective, we no longer index that single hash workaround, and we discourage the use of the hashbang trick as well. Okay. Well, that's crawling out of the way. Let's move on to the indexing step. Crawlers ideally want to find all the content on your website. If the crawlers can't see some content, then how are they going to index it?
And the core content of the page includes all the text, imagery, video, and even hidden elements like structured metadata. Beyond that, don't forget about embedded content: crawlers want to see that too. Now, this might seem really obvious, but I want to emphasize that at Google we take HTTP codes pretty seriously, especially 404 Not Found codes. If crawlers find a page that has a 404 status code, then they probably won't even bother indexing it. And lastly, of course, a crawler wants to find all
mentioned earlier, you can navigate to pages purely on the client and fetch new content dynamically. You can do that, so long as you use anchor tags with href attributes, like in this last example, because most search crawlers, including Google, will not simulate navigation of a page to find links. Only anchor tags will be followed for linking. But is that really everything? In order to sift through the HTML to index the page, we need to have the HTML in the first place, and in the early days of the web, the server likely gave it to us. But nowadays that's not really the case.
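As a rough illustration of the point about anchor tags, the sketch below collects links the way a non-interactive crawler might: it parses the HTML and keeps only `<a>` elements that carry an `href`. It uses Python's standard `html.parser` module. The names `LinkCollector` and `extract_links`, the `goTo` handler, and the sample markup are made up for this example, and none of this describes how Googlebot itself is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects crawlable links: <a> tags that have an href attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # click handlers on other elements are never followed
        href = dict(attrs).get("href")
        if href:
            # Resolve relative URLs against the URL of the page itself.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical markup: only the first element is a real link.
SAMPLE = """
<a href="/sessions/42">Session 42</a>
<span onclick="goTo('/sessions/43')">Session 43</span>
<a onclick="goTo('/sessions/44')">Session 44</a>
"""

print(extract_links(SAMPLE, "https://example.com/"))
```

With the sample above, only `https://example.com/sessions/42` is collected. The `span` and the anchor without an `href` rely on a click handler, so a crawler that does not simulate navigation would never discover sessions 43 and 44.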
call that hybrid rendering, right? If it's rendered on the server, the HTML is available immediately. But if it's rendered on the client, then things get a little bit trickier, right? And that's going to be the challenge that we discuss today. But one last detail: you might be wondering, what is Google Search's crawler? Is it a computer with some software running on it? Maybe in the 90s that was the case. But nowadays, due to the sheer size of the web, Googlebot is comprised of thousands of machines
running distributed software that's constantly crunching data to understand all of this continuously expanding information on the web. And to be honest, I think we sometimes... I recently learned that with the Knowledge Graph, which is a database of the information we have on the web, we've mapped out how more than 1 billion things in the real world are connected, with over 70 billion facts between them. Okay. Well, now that we know the principles of a search crawler, let's see how the three key steps, crawling, rendering, and indexing, all connect, because one
crucial thing to understand is the cycle of how Googlebot works, or how it should ideally work. As you can see, we want these three steps to hand over to one another instantly. And as soon as the content is fully rendered, we want to index it to keep the Google Search index as fresh as possible. This all sounds simple, right? Well, it would be if all the content was rendered on the server and complete when we crawl it. But as you know, if a site uses client-side rendering, that's not going to be the case, just like that Angular sample I showed you earlier. So what does Google do in this situation?
Well, I'll show you. In reality, Googlebot's process looks a bit different. We crawl the page, we fetch the server-side rendered content, and then we run some initial indexing on that document. But rendering JavaScript-powered web pages takes processing power and memory, and while Googlebot is very, very powerful, it doesn't have infinite resources. So if the page has JavaScript in it, the rendering is actually deferred until we have the resources ready to render the client-side content, and then we index the content further. So Googlebot might index a page before rendering is complete, and the final render can actually arrive several
you. We're going to talk more about these issues later in this talk. But the important thing to take away right now is that these really aren't minor issues. These are real issues that could affect your indexability. Canonical tags and HTTP codes affect how search crawlers understand the content on your web pages. And to be clear, not all web pages on a website necessarily need to be indexed. For example, on one website, we found that the individual session pages weren't being indexed, because the canonical tags were rendered on the client and the URLs were fragment-identifier based. So we built a template with clean URLs and server-side
rendered canonical tags to ensure the session descriptions would be properly indexed, because we care about that content. And to ensure these documents were crawled, we added them to the sitemap as well. But what about the single-page app which allows for filtering sessions on that page? So ask yourself this: do the pages I care about, from the perspective of content and indexing, use client-side rendering anyway? Okay. So now you know: when building a client-side rendered website, you must tread carefully. As the web and the industry have gotten bigger, so too have teams and companies become more complex.
The people building websites aren't always the same people promoting or marketing those websites, and so this challenge is one that we're all facing together as an industry, both from Google's perspective and yours as developers. After all, you want your content indexed by search engines, and so do we. This seems like a good opportunity to change tack. So John, do you want to take over and tell everyone about the Google Search policy changes and some of the best practices they can apply, so we can meet this challenge together? Sure. Thanks, Tom. That was a great summary of how search works
our job as a search engine is pretty easy here. We just get the pre-rendered HTML content. We call this hybrid rendering. This is our long-term recommendation. We think this is probably where things will end up in the long run. However, in practice, implementing that can still be a bit tricky, and most frameworks don't make this easy. A quick call-out to Angular, since we featured them in the beginning as an example of a page that was hard to pick up: they have built a hybrid rendering mode with Angular Universal that allows you to do that a little bit easier. Over time, I imagine more
before. So we call it dynamic because your site dynamically detects whether or not the requester is a search engine crawler like Googlebot, and only then sends the server-side rendered content directly to the client. You can include other web services here as well that can't deal with rendering, for example social media services or chat services, anything that tries to extract structured information from your pages. And for all other requesters, so your normal users, you would serve your normal hybrid or client-side rendered code. This also gives you the best of both worlds and makes
Puppeteer, which is a Node.js library that wraps a headless version of Google Chrome underneath. This allows you to render pages on your own. And another option is Rendertron, which you can run as software or as a service that renders and caches your content on your side as well. Both of these are open source, so you could make your own version or use something from a third party that does something similar as well. For more information on these, I'd recommend checking out the I/O session on Headless Chrome. I believe there's a recording of that already. Either way, rendering
can be pretty resource-intensive. So we recommend doing this out of band from your normal web server and implementing caching as you need it. So let's take a quick look at what your server infrastructure might look like with a dynamic renderer integrated. Requests from Googlebot come in on the side here. They are sent to your normal server and then, perhaps through a reverse proxy, they're sent to the dynamic renderer. There, it requests and renders the complete final page and sends that back to the search engine. So without needing to implement or maintain any new code, this
setup could enable a website that's designed only for client-side rendering to perform dynamic rendering of the content to Googlebot and to other appropriate clients. If you think about it, this kind of solves the problems that Tom mentioned before, and now we can be kind of confident that the important content of our web pages is available to Googlebot when it performs indexing. So how might you recognize Googlebot requests? It's actually pretty easy. The easiest way to do that is to find "Googlebot" in the user-agent string. You can do something similar for other
services that you want to serve pre-rendered content to. And for Googlebot, as well as some others, you can also do a reverse DNS lookup if you want to be sure that you're serving it just to legitimate clients. One thing to kind of watch out for here is that if you serve adaptive content to smartphone users versus desktop users, or you redirect users to different URLs depending on the device that they use, you must make sure that dynamic rendering also returns device-focused content. In other words, when mobile clients go to your web pages, they should see the mobile version of
coding conventions like arrow functions aren't supported by Googlebot, and any API that was added after Chrome 41 currently isn't supported. You can check these on a site like Can I Use. And while you could theoretically install an older version of Chromium, we don't recommend doing that, for obvious security reasons. Additionally, there are some APIs that Googlebot doesn't support because they don't provide additional value for search. We'll check these out too. All right, so you might be thinking, this sounds like a lot of work. Do I really need to do this?
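To make the earlier point about recognizing Googlebot requests more concrete, here is a minimal sketch of the two checks mentioned in the talk: looking for "Googlebot" in the user-agent string, then confirming with a reverse DNS lookup. The function names are invented for this example. The forward-confirmation step and the `googlebot.com` / `google.com` hostname suffixes follow Google's published verification guidance as I understand it, so check the current documentation before relying on them. The lookups are injectable so the logic can be exercised without network access.

```python
import socket


def claims_to_be_googlebot(user_agent):
    """Cheap first check: does the User-Agent header mention Googlebot?"""
    return "googlebot" in (user_agent or "").lower()


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then confirm via forward DNS.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    default to the standard socket calls; pass fakes to test offline.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Assumed suffixes; anything else is treated as an impostor.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must resolve back to the address that made the request.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) count as "not verified".
        return False


def should_serve_prerendered(user_agent, ip, **lookups):
    """Serve pre-rendered HTML only to requesters that pass both checks."""
    return claims_to_be_googlebot(user_agent) and is_verified_googlebot(ip, **lookups)


sample_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(claims_to_be_googlebot(sample_ua))
```

The user-agent check alone is easy to spoof, which is why the DNS step exists. Since DNS lookups add latency, a real setup would likely cache the verification result per IP address.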
have any libraries that can't be transpiled back to ES5, then dynamic rendering can help you there. That said, we continue to recommend using proper graceful degradation techniques so that even older clients have access to your content. And finally, there is a third reason to also look into this, in particular if you use social media sites: if your site relies on sharing through social media or through chat applications, and these services require access to your page's content, then dynamic rendering can help you there, too. So when might you not use dynamic rendering?
I think the main aspect here is balancing the time and effort needed to implement and run this with the gain that is received. Remember, implementation and maintenance of dynamic rendering can use a significant amount of server resources. And if you don't see rapid, high-frequency changes to your site, maybe you don't need to actually implement anything special. Most sites should be able to let Googlebot render their pages just fine. As I mentioned, your pages probably don't need dynamic rendering in that case.
Let's take a look at a few tools to help you figure out what the situation is. When diagnosing rendering, we recommend doing so incrementally: first checking the raw HTTP response, and then checking the rendered version, either on mobile or on mobile and desktop if you serve different content, for example. Let's take a quick look at these. So, looking at the raw HTTP response, one way to do that is to use Google Search Console. To gain access to Google Search Console and to a few other features that they have there, you first need to
verify ownership of your website. This is really easy to do. There are a few ways to do that, so I'd recommend doing that regardless of what you're working on. Once you have it verified, you can use a tool called Fetch as Google, which will show the HTTP response that was received by Googlebot, including the response code on top and the HTML that was provided before any rendering was done. This is a great way to double-check what is happening on your server, especially if you're using dynamic rendering to serve different content to Googlebot. Once you've checked the raw response,
I recommend checking how the pages are actually rendered. The tool I use for this is the Mobile-Friendly Test. It's a really fast way of checking Google's rendering of a page. As the name suggests, it's made for mobile devices. As you might know, over time our indexing will be primarily focused on the mobile version of a page. We call this mobile-first indexing. So it's good to already start focusing on the mobile version when you're testing rendering. We recommend testing a few pages of each kind of page within your website. So for example, if you have an
e-commerce site, check the homepage, some of the category pages, and some of the detail pages. You don't need to check every page on the whole website, because a lot of times the templates will be pretty similar. If your pages render well here, then chances are pretty high that Googlebot can render them for search as well. One thing that's kind of a downside here is that you just see the screenshot; you don't see the rendered HTML here. What's one way to check the HTML? Well, I think we launched this yesterday: we've added a way to review the HTML after
responsive. A lot of times, not everything needs to be crawled, kind of like Tom mentioned. For example, if you have tracking pixels on a page, Googlebot doesn't really need to render those tracking pixels. But if you use an API to pull in content from somewhere else and that API endpoint is blocked by robots.txt, then we can't pull in that content at all. An aggregate list of all of these issues is also available in Search Console. So when pages fail in a browser, usually I check the developer console for more information, to see more details on exceptions and
is implemented, Googlebot may be able to trigger it, and with that may be able to pick up these images for indexing. For example, if the images are above the fold and your lazy loading kind of loads those images automatically, then we'll probably see that. However, if you want to be sure that Googlebot is able to pick up lazy-loaded images, one way to do that is to use a noscript tag. So you can add a noscript tag around the normal image tag, and we'll be able to pick that up for Image Search directly. Another approach is to use structured data on a page. When we see
structured data that refers to an image, we can also pick that up for Image Search. Note that images referenced only through CSS are not indexed: we currently only index images that are embedded with structured data or with image tags. Apart from lazy-loaded images, other content may need an interaction to be loaded. What about tabs that load the content after you click on them, or infinite scroll patterns on a site? Googlebot generally won't interact with a page, so it wouldn't be able to see these. There are two ways that you can get this to Googlebot. Either you can preload the content and just use CSS to toggle visibility
on and off. That way, Googlebot can see that content from the preloaded version. Or alternatively, you can just use separate URLs and navigate the user and Googlebot to those pages individually. Googlebot is a patient bot, but there are a lot of pages that we have to crawl, so we have to be efficient and kind of go through pages fairly quickly. When pages are slow to load or render, Googlebot might miss some of the content. And since embedded resources are aggressively cached for search, rendering timeouts are really hard to test
for. To limit these problems, we recommend making performant and efficient web pages, which you're hopefully already doing for users anyway, right? In particular, limit the number of embedded resources and avoid artificial delays, like timed interstitials, like here. You can test pages with the usual set of tools and roughly test rendering with the Mobile-Friendly Test. While timeouts here are a little bit different for indexing, in general, if the pages work in the Mobile-Friendly Test, they'll work for search indexing too. Additionally, Googlebot wants to see the page as a new user would
see it, so APIs that try to store something locally would not be supported. If you use any of these technologies, make sure to use graceful degradation techniques to allow anyone to view your pages, even if these APIs are not supported. And that was it with regards to critical best practices. Now it's time to go back and review what we've seen. So first, we recommend checking for proper implementation of the best practices that we talked about. In particular, lazy-loaded images are really
common. Second, test a sample of your pages with the Mobile-Friendly Test and use the other testing tools as well. Remember, you don't need to test all of your pages; just make sure that you have all of the templates covered. And then finally, if your sites are large and quickly changing, and you can't reasonably fix rendering across the site, then maybe consider using dynamic rendering techniques to serve Googlebot and other crawlers a pre-rendered version of your page. And finally, if you do decide to use dynamic rendering, make sure to double-check the results there as
well. One thing to keep in mind: indexing isn't the same as ranking, but generally speaking, pages do need to be indexed before their content can appear in search at all. I don't know, Tom, do you think that covers about everything? Well, it was a lot to take in, John. Some amazing content. But I guess one question I have, and I think maybe other people in the audience have this on their mind as well, is: is it always going to be this way, John? That's a great question, Tom. I don't know. I think things never stay the same. So as you mentioned in the beginning, this is a challenge for us
that's important within Google Search. We want our search results to reflect the web as it is, regardless of the type of website that's used. So our long-term vision is that you, the developers, shouldn't need to worry as much about this for search crawlers. So circling back on the diagram that Tom showed in the beginning with deferred rendering, one change we want to make is to move rendering closer to crawling and indexing. Another change we want to make is to have Googlebot use a more modern version of Chrome over time. Both of these will take a bit of time. I don't like making
websites work well in Google Search. If you have any questions, we'll be in the mobile web sandbox area together with the Search Console team, and alternatively, you can always reach out to us online as well, be it at our live office-hours hangouts or in the webmaster help forum. So thanks, everyone, for your time. Thank you.