how google crawls dynamic pages? [closed]

how google crawls dynamic pages? [closed] - php

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I am about to create an Online Shopping site for my one of the client. I have to make this site SEO Friendly and therefore I must have to understand few things before I proceed to make a custom CMS Based website.
As I said I am going to make a Custom CMS Based website so that my client will be able to add new content through CMS but I don't understand few things.
For Example: I have an index.php page which has many links to different products and all of these links are created through Database using PHP. Site Link like
http://www.def.com/shoes/Men-Shoes
My Questions:
1) I want to know that when the GoogleBot crawls my site, will it also open my dynamically created links and index them? Will GoogleBot also index the content of my dynamic links?
2) Do I have to create seperate pages for all of the products on site and store them on my server? Or just a single page which serves dynamically according to user query for every product?
I read this
"It functions much like your web browser, by sending a request to a web server for a web page, downloading the entire page, then handing it off to Google’s indexer."
is it right?
my above query was actually looking like this and I used .htaccess file to make it pretty
http://www.def.com/shoes.php?type=Men-Shoes
so is it right and google will crawl it to index?

SEO is a complex science in itself and Google is always changing the goal posts and modifying their algorithm.
While you don't need to create separate pages for each product, creating friendly URL's using the .htaccess file can make them look better and easier to navigate. Also creating a site map and submitting this to Google Via their webmaster tools will help them to know which pages to index.
GoogleBot will follow the links in your site, including dynamically created one, but it is important not to try and game the system using Blackhat methods if long term success is your aim.
Also, use social media (Twitter, Facebook, Google+) to help promote your brand and make sure you follow Google's guidelines with regards to SEO and inpage optimisation.
There is a huge amount of information on the internet on this subject, but be careful what advice you follow.

Google and other search engines index the dynamic links too. So a way to avoid duplicate content is to use the "Crawl"->"URL Parameters" tool in Google Webmasters. You can read more about how that works here https://support.google.com/webmasters/answer/6080548?rd=1. Set "Crawl" field to "No URLs". By this way you could hide from search dynamic links but you have to have a list of all of your dynamic links of your website/CMS in order not to hide important content accidentally. The "URL Parameters" feature is available in Bing Webmaster tools also http://searchenginewatch.com/sew/how-to/2195777/bing-webmaster-tools-an-overview#.

Related

How do I protect individual wordpress urls from being attacked? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I am trying to secure my WordPress site from hackers - more specifically, individual non-content pages that appear to be getting more hits. I am using Siteground and installed WordPress a few months ago. Checking the website statistics I was taken aback by what I saw. I have briefly summarised the page hits below.
https:// ... /wp-admin/admin-ajax.php --> this page has been viewed over 16k times each month. Considering my site has been live for only 2 months, contains no SEO, and no-one knows about its existence this is odd!
https:// ... /index.php/wp-json/wp/v2/users/ --> this page is giving away my usernames.
https:// ... /index.php/wp-json/wp/v2/pages --> appears to display code from one of my main pages.
And a whole load of pages that appear very odd to be accessing:
https:// ... /index.php/wp-json/wp/v2/taxonomies
https:// ... /index.php/wp-json/wp/v2/categories
https:// ... /index.php/wp-json/wp/v2/taxonomies/post_tag
https:// ... /index.php/wp-json/wp/v2/taxonomies/category
https:// ... /index.php/wp-json/wp/v2/tags
https:// ... /wp-admin/load-styles.php --> shows blank screen
There's a whole load more, some redirects to the WordPress login page and others show a blank screen. There's also some URLs that allow any user to download a *.woff file (whatever that is?!).
Point is, I thought WordPress would be secure enough to not let these pages appear visible and show details at the very least.
Is there anything I can do? As I pointed out, I'm using Siteground which doesn't use cPanel.
I thought the most difficult part of a blog site is the content creation and overall web design. I'm not sure now.
Any help and/or advice would be greatly appreciated.
Thank you.

As for accessing login pages, that's to be expected with a WordPress site - bots love them.
You should have a strong password and use a nonstandard username for your admin-rights accounts. Bots will always access the default page with default login credentials to try it out. You could go another step and move the login page, too, that will massively drop accesses to the real login page, there's a plugin for it if you aren't comfortable coding that yourself: WPS Hide Login.
As for the wp-json URLs, you can ensure they are requiring logins / disabled with answers provided here, such as a plugin that disables it: Disable REST API.
Concerning the .woff files, those are just font files, either a bot scrapes over them or a user is accessing them to view the web page as it was designed; not a concern really.
WordPress has a decent article on additional things you can do to secure your website as well here.

Content management for my already coded website [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I am in secondary school and learning web development and our latest school project was to come up with our own company based around a website.
Basically my website is going to display aspiring animator’s videos, there is going to be a place where other users of the website can comment feedback on these videos and there are going to be other resources for the animators to use.
I have already created the base of the website. I have placeholder youtube videos on the home screen (where the user’s videos would go) and I have a contact page and a resource page.
Basically, my teacher told me that if I wanted the website to actually function, that is to have a login system where users can go in and be able to post their own videos for the other users to see, (posting videos would most likely be in the form of submitting a youtube link, there the video would be displayed on the home page) and have a comment system for other users to be able to leave feedback on other user’s videos and so on, my best option was to use a CMS e.g. Drupal. I was unsure if this would be my best option, because as far as my research goes, I believe that CMS are made for users to use their web templates and it doesn’t work well for those who have already got a website coded. (unsure)
I am new to making websites but I am quite capable with a bit of learning. Basically, all I need to know is what method I should use to integrate this login system for users to be able to post and comment to my website and a way for an admin who would run the website to be able to manage the content on the website easily without having to change any of the code. Considering that I have already coded my website, I am unsure if this is possible and I do not have the time to start again.
Thanks for your help.

Actually I belive that it would be lot easier to simply take your coded website and convert it to template for one of the most popular CMS platforms (Joomla, for example). It would allow you to use thousands of free plugins (also for video uploading and galleries, for that matter), and will make your site LOT safer. It's lot faster than coding your own CMS too - if design is not very complicated and you don't have lot of functions, I belive it would take you few days max to install Joomla, find, add and configure few necessary plugins, and follow one of hundreds of tutorials about converting your HTML to Joomla template.
If you insist on coding your own CMS, start with this tutorial
https://css-tricks.com/php-for-beginners-building-your-first-simple-cms/
It's old, from 2009, but it covers most of the basics of working with simple databases, user login sessions, etc.

How to crawl links from a database of a website? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I am new to search engines, and I find googlenews very interesting.
I would like to write a simple crawler which
parse only the article links of three different news sites.
Save the links in database (mysql) with the timestamp in which the link has been advertised on the website (not the time in which the link has been detected by the crawler).
As you know, news website generate links on a daily basis (And I would like basically to parse all their links (not just those who are printed today, but also all the links that were generated before...and all these links are kept in the news website database).
I dont know which database is used by the news websites that I want to crawl and I also don`t have access permission to it.
So how does googlenews able to parse all the article links of all news sites, including the links which have been generated long time ago? Does googlenews have access to all those websites databases?
How does a crawler know that a NEW link has been added to the website? if for example, a news site posted a new article, and I want my crawler to parse the link immediately, how can the crawler knows that (googlenews also able to do it...so how...?) i.e does the crawler knows immediately about the new article link? or google just crawls the website on a fixed interval (every one hour etc...)?
How does google news crawler know when a new website has been launched?
Does the crawler looks automatically for new websites, or google engineers basically holds a fixed list of news website to crawl?
The same question can be asked regarding google search crawler i.e crawler should be aware that a new domain has been launched so it can crawl it and therefore make sure google database reflect the most updated state of the world wide web.
So is there any open worldwide database which keeps all the domains ever launched and google basically crawls it?
What will be the best tool to implement my news website crawler?
Apache Lucene, Nutch, Solr, ElasticSearch?
Maybe http://phpcrawl.cuab.de/?
I am REALLY curious to the answer of the above four questions.
Please assist.
Thanks in advance.

You have some key questions here which I'll answer but first you should understand what is a crawler.
What is a crawler?
The crawler's job is to scan the internet by reading a page, getting all the links he contains and then reading those pages as well. The main purpose of this action is to find new content automatically. A good crawler will start crawling few big and familiar websites that updates often, this way he can update and index these sites and also get new content and new sites fast (because big websites often contains links to other sites).
Regarding your questions:
Does googlenews have access to all those websites databases?
No, if you got access to the database there is no need for a crawler.
How does a crawler know that a NEW link has been added to the website?
Google crawls every site once in a while and searches for new links inside the site. Usually a new page or an article will be linked through the main page that already stored in Google's database.
How does google news crawler know when a new website has been
launched?
The simple answer is: the crawler finds a link to the new website, checks if the website is in the system and if not, adds it.
How they get the links of the old articles?
Easy, they save those links in a huge database. Google started crawling the internet years ago. Old links probably won't show up if Google will start crawling the internet today all over again.
How do I get the timing in which the site posted the article?
That's depends on the site you're crawling. If each article have a date you need to parse the page and extract this date. This article have a date in the top and it's easy to find the the HTML dom by searching the date class: <span class="date">6 June 2014</span>.
If the date does not appear, you won't have a way to know when they published it.
As a developer you can make the life of Google easier and ask Google to crawl your new website via Google Webmaster Tools.
While crawling the web, Google also counts how many links lead to a page, this will affect the page's ranking. Many links to your site will indicate you have a valuable content and you should appear higher in the search results.
Writing a simple crawler is easy. You get a page's content with php cURL or file_get_contents, parse it, select and save the data you want, extract all the links in this page and then recursively crawl the links you found.

In page A/B testing [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I looked all over google and the internet as to performing In Page A/B testing.
What I am trying to do is perform A/B testing on a single page, but that page's content varies on the referring url, performed through <?php include
Say if you come from Google, it displays, 'Hey, are you new here?!', or if you come from another page on our website it will display 'Let's get you started'. The goal is then to see which page has longer visit duration.
Does anyone know of how to do this through Google analytics/Optimizely or any other analytics plugin?

Ryan,
I believe this shouldn't be too difficult to do... just depends on the tools you use :-).
Personally, I can recommend Visual Website Optimizer, which allows you to segment a running test to specific segment based on various conditions. Referring URL is one of them (see screen below).
However, you can then use only one variation of the page, so if you have more segments that you would like to test, you would need to:
Duplicate the test itself,
Change the copy in the variation,
Set up the segment rules according to your needs,
Follow this procedure with every segment :)
I have done this, but can't say it had much impact. It was too much work and I personally prefer segmenting based on customer data (new/existing customer etc.), where you can notice much more impact and it's also then "easier" to report since the differences are quite noticeable.
Hope this helps!

You should be able to set this up in any modern A/B testing tool on the market. Here’s how to do it in Optimizely:
Create a new experiment and go to Options and Targeting. Select the page that the experiment should run on and select the referrers it should run for in Visitor Conditions:
Make sure that the messaging isn’t displayed for the the Original / Control / A in the experiment.
For the Variation / A, use the visual editor to add an element with the messaging or select an existing element on the page to change it’s text. You can also write Javascript code to insert the element (via ‘Edit Code’).
If you want to display different messages for different referrers, click on the the ‘Edit Code’ ribbon in Optimizely and wrap the Javascript in if clauses for each referrer (and create a backup message), like so:
if (document.referrer.match(/^https?:\/\/([^\/]+\.)?reddit\.com(\/|$)/i)) {
$('#welcome-message').text('Hi redditor!');
} if (document.referrer.match(/^https?:\/\/([^\/]+\.)?huffingtonpost\.com(\/|$)/i)) {
$('#welcome-message').text('Hi Huffington Post reader!');
} else {
$('#welcome-message').text('Hi! I’m a backup message, just in case !');
}
Select Options and Analytics Integration. Enable Google Analytics.
Start the experiment. Within a few minutes you should see the first results in Optimizely. In a few hours, results will be available in Google Analytics, where you can drill down to see how things like visit duriation, pageview per visit and bounce rate changed based on custom segments (that’s how the variations are displayed in Google Analytics).

How can I get search engines to index dynamically generated pages [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I'm trying to create a blog ..and for every entry I have a template page called entry.php
I create the pages by creating a link from my home page like this
mywebsite.com/entry.php?title=My Title&content=This is my blog
This link is passed to entry.php and a page is generated on the fly based on the link i wrote but search engines wont index these.
Do I really have to create a unique page for every entry I create?
Do sites like youtube have individual pages for every single video or are they generated dynamically like im trying to do. If so how do the videos show up in search results?
Ive heard something called .htaccess or sitemap.xml can be used for this I have no idea what these are though.

Learn about .htaccess
There are a lot of generators for .htaccess and you can use one of them. The best thing to do is, using a .htaccess file, create the paths this way:
mywebsite.com/entry.php?title=My Title&content=This is my blog
Change them to:
mywebsite.com/My Title/This is my blog
And the code for the same is:
RewriteEngine On
RewriteRule ^([^/]*)/([^/]*)\.html$ /entry.php?title=$1&content=$2 [L]
Websites offering generators:
.htaccess redirect
Mod Rewrite Generator
Search Engine Optimization Tools » mod_rewrite rewriterule generator

As long as the page is linked to from somewhere, search engines will find the page. It does not matter how the page is generated. A link is a URL, when requesting this URL the browser or any other client (including search engines) receive the page content, period. It works like any other regular page, search engines don't know or care what happens behind the scenes to generate this page.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.