How can I fix PDFs and DOCs that fail to index or have the wrong title?
If some PDF or DOC files are missing from your collection or show the wrong title, work through the steps below.
- First, check how the crawler sees your document by entering the document's URL in the debug page.
- If the debug page shows that the document is indexed correctly, go to the Domains section of the console and use “Diagnose” to see the document's current crawl status. A status of no-index or redirect means that a rule in the collection, or a no-index tag on the document, is preventing us from crawling it.
- If the debug page shows an error saying that it can't download the document, the file is likely corrupt. Some programs may still be able to open a corrupt file, but not all can. We recommend re-saving or exporting the file with a different program or version.
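To narrow down the corrupt-file case before re-exporting, a rough local check of the file's leading bytes can help. The sketch below is an illustration only and not how our crawler validates files: the function names `sniff_format` and `pdf_has_eof_marker` are hypothetical, and the check relies only on well-known file signatures (`%PDF-` for PDF, the OLE2 container used by legacy .doc, and the ZIP container used by .docx).

```python
from typing import Optional

# Well-known file signatures ("magic bytes").
PDF_MAGIC = b"%PDF-"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc (also .xls, .ppt)
ZIP_MAGIC = b"PK\x03\x04"                        # .docx (and any other ZIP file)


def sniff_format(data: bytes) -> Optional[str]:
    """Return a coarse container type based on the first bytes, or None."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return None


def pdf_has_eof_marker(data: bytes) -> bool:
    """Heuristic truncation check: a complete PDF has %%EOF near its end."""
    return b"%%EOF" in data[-1024:]
```

To try it on a local copy, read the file with `open(path, "rb").read()` and pass the bytes in. A result of `None`, or a PDF without the end marker, suggests the file was truncated or saved incorrectly, though a file that passes can still be damaged in other ways.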
For documents with the wrong title: we take the title from the document's metadata. If no title is present, we use the filename instead. To update the title:
- Update either the metadata or the filename, then upload the file to your CMS/website.
- Once added, we will index the PDF on the next crawl cycle. If you want the change to take effect immediately, re-index the PDF's URL with the Diagnose tool in the Domains section.
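The title rule described above (metadata title first, filename as fallback) can be sketched as follows. This is a minimal illustration, not our actual implementation: the function name `resolve_title` is hypothetical, and two details are assumptions, namely that a whitespace-only metadata title counts as missing and that the filename is used as-is, including its extension.

```python
from posixpath import basename
from typing import Optional
from urllib.parse import unquote, urlsplit


def resolve_title(metadata_title: Optional[str], url: str) -> str:
    """Prefer the metadata title; fall back to the filename in the URL."""
    # Assumption: an empty or whitespace-only title is treated as missing.
    title = (metadata_title or "").strip()
    if title:
        return title
    # Assumption: the fallback is the decoded last path segment, extension kept.
    return unquote(basename(urlsplit(url).path))
```

For example, `resolve_title(None, "https://example.com/files/report-2023.pdf")` falls back to `report-2023.pdf`, which is why renaming the file changes the displayed title only when the metadata title is empty.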