Google doesn’t find every page on the web. Search engines start at some popular spots and repeatedly follow links from there, “crawling” their way across the web. A new website that nobody links to will not be found this way unless its admins explicitly submit it to Google, asking for it to be crawled.
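To make the link-following idea concrete, here is a minimal sketch of a breadth-first crawl over a tiny in-memory link graph. The graph `TOY_WEB`, the `.example` host names and the `crawl` function are all made up for illustration; a real crawler fetches pages over HTTP, parses out links and respects robots.txt, none of which is shown here.

```python
from collections import deque


def crawl(seeds, links):
    """Follow links breadth-first from the seed pages.

    `links` maps each page to the pages it links to.
    Returns the pages in the order they were discovered.
    """
    seen = set()
    order = []
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            order.append(seed)
            queue.append(seed)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical toy web: keys are pages, values are the pages they link to.
TOY_WEB = {
    "popular.example": ["blog.example", "shop.example"],
    "blog.example": ["shop.example", "forum.example"],
    "shop.example": [],
    "forum.example": ["popular.example"],
    # This site links out, but no other page links to it.
    "new-site.example": ["popular.example"],
}

discovered = crawl(["popular.example"], TOY_WEB)
print(discovered)
print("new-site.example" in discovered)
```

Starting from the single popular seed, the crawl reaches every page that has an inbound link, but never reaches `new-site.example`, because outbound links alone do not make a page reachable.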
Google gets to know about new websites through these three sources:
Domain Discovery: Google DNS: Almost every time you visit a website, your computer first has to look up the IP address for that site's domain. Google DNS is a very popular DNS service around the world, and DNS logs are very useful for discovering domains.
Domain Registrars: HostGator, Hostinger, InMotion and other domain registrars.
Web Page Discovery through: Google Toolbar / Google Omnibox / Mozilla Suggestions / IE Suggestions:
Google/Bing make very heavy use of toolbar/omnibox data. Whenever a user visits a page, the request is logged by the browser/toolbar.
Browser/toolbar logs are a very rich source of signals for URL discovery and ranking. As long as a page is visited by at least one person, even if that is only its creator, Google can discover it from the logs.
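As a rough illustration of log-based discovery, the sketch below compares a list of visited URLs against a set of already-known URLs and keeps the ones that are new. The function name `new_urls_from_visits` and the sample data are assumptions made up for this example; the actual format of browser or toolbar logs, and how a search engine processes them, is not public and is not represented here.

```python
def new_urls_from_visits(visited_urls, known_urls):
    """Return URLs that appear in the visit log but are not yet known.

    Results keep first-seen order and contain no duplicates.
    """
    known = set(known_urls)
    found = []
    for url in visited_urls:
        if url not in known:
            known.add(url)  # avoid reporting the same URL twice
            found.append(url)
    return found


# Hypothetical data: one visit by the page's creator is enough.
known_index = {"https://popular.example/"}
visit_log = [
    "https://popular.example/",
    "https://new-site.example/hello.html",
    "https://new-site.example/hello.html",
]

candidates = new_urls_from_visits(visit_log, known_index)
print(candidates)
```

The point is only that a single logged visit puts a URL on the list of candidates to crawl, with no inbound link required.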
Partners: sitemap.xml / RSS feed
Website owners can use sitemap.xml to tell search engines about the structure of the website and about its orphan pages, that is, pages that no other page links to.
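For reference, a sitemap is just an XML file listing the URLs the owner wants crawled. The sketch below builds a minimal one with Python's standard library. The `build_sitemap` helper and the example.com URLs are placeholders invented for illustration; the `urlset`, `url` and `loc` element names and the namespace come from the sitemaps.org protocol. Optional fields such as `lastmod` are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs; the second one stands in for an orphan page
# that no other page on the site links to.
sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/orphan-page.html",
])
print(sitemap_xml)
```

The resulting text would typically be saved as sitemap.xml at the site root and submitted through the search engine's webmaster tools or referenced from robots.txt.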
How Google search engine works? Explanation by Matt Cutts