I am developing a website similar to a web forum. Users will post queries and others will reply to help them. On many sites like mine, the topic of the query is included in the URL, e.g. www.sample.com/topic-1.html or www.sample.com/topic-2.html, and these links can be found via search engines.
How can I dynamically generate these HTML pages and configure my website so that search engines can index them?
No, those sites usually aren't putting these files on the web server manually. They are rewriting URLs at the web server level (e.g. Apache or nginx), so a friendly URL is mapped to a dynamic script. Check the response headers to get more information about what happens behind the scenes.
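For example, with Apache's mod_rewrite you could map those friendly URLs to a single PHP script. This is only a minimal sketch; the topic.php file name and the id parameter are placeholders:

    # .htaccess (assumes mod_rewrite is enabled)
    RewriteEngine On
    # Map /topic-123.html to topic.php?id=123 internally
    RewriteRule ^topic-([0-9]+)\.html$ topic.php?id=$1 [L,QSA]

The visitor (and the search engine) only ever sees /topic-123.html, while topic.php looks up the query in the database and renders the page.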
Please see How to create friendly URL in php?
So, I have a website where I allow users to aggregate certain settings into readable HTML pages that can be emailed, or copied as HTML to be shared elsewhere.
What I would like to do now is use a REST API to allow users to post this content directly to their WordPress blogs (independently hosted or on wordpress.com). From my research, a REST API seems to be the best way to go about this. However, I cannot seem to find any reliable resources on how exactly this is to be implemented (or whether it is possible at all).
I'm hoping there are one or two people who have had experience with this sort of thing and who can provide me with some guidance!
There is no standard REST interface to WordPress. However, WordPress does have an XML-RPC interface for posting, editing, and otherwise managing content. Information on this interface is available in the WordPress Codex at:
https://codex.wordpress.org/XML-RPC_WordPress_API
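As a rough sketch, here is how a post could be created through that interface from PHP, assuming the xmlrpc extension and cURL are available; the blog URL, credentials, and post fields below are placeholders, and wp.newPost is the method documented on the Codex page above:

    <?php
    // Build an XML-RPC request for wp.newPost (requires PHP's xmlrpc extension).
    $request = xmlrpc_encode_request('wp.newPost', array(
        1,            // blog_id (usually 1 on a single-site install)
        'username',   // WordPress username
        'password',   // WordPress password
        array(
            'post_title'   => 'Shared settings page',
            'post_content' => '<p>The HTML generated on my site...</p>',
            'post_status'  => 'publish',
        ),
    ));

    // Send it to the blog's xmlrpc.php endpoint.
    $ch = curl_init('https://example.com/xmlrpc.php');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $request);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);

    // On success the decoded response is the new post's ID.
    var_dump(xmlrpc_decode($response));

The target blog does need XML-RPC enabled; self-hosted installs expose it at /xmlrpc.php.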
I'm trying to detect the CNAME used by incoming traffic so I can customize a site accordingly. I have a site that displays some info to the client (it's actually more complicated, but this will work as an example). Some of my customers send their own customers to my site to see this info, and they use a CNAME to get to my site. I would like to display certain logos, etc., to the viewer based on which CNAME was used to reach my page.
What I have come up with so far is using dns_get_record. Am I on the right track with that, if this can be done at all?
Thanks for any help!
It depends on the language your web content is written in; it also depends on what web server you are using. What you want to look for in the web content is the "Host" header, since the browser sends the hostname the visitor used (i.e. the CNAME) with every request. If you want to handle it at the web server level instead, what you want to look for is "virtual hosts".
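In PHP, for instance, the Host header is available in $_SERVER['HTTP_HOST']. A minimal sketch, where the hostnames and logo paths are made up:

    <?php
    // The browser sends the hostname the visitor used (e.g. the CNAME)
    // in the Host header; PHP exposes it as $_SERVER['HTTP_HOST'].
    $host = strtolower($_SERVER['HTTP_HOST']);

    // Map known customer hostnames to their branding.
    $logos = array(
        'info.customer-a.com' => '/img/customer-a-logo.png',
        'info.customer-b.com' => '/img/customer-b-logo.png',
    );

    $logo = isset($logos[$host]) ? $logos[$host] : '/img/default-logo.png';
    echo '<img src="' . htmlspecialchars($logo) . '" alt="Logo">';

Note that dns_get_record isn't needed here: you never see the DNS lookup itself, only the hostname the browser ends up requesting.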
I am new to programming, so please don't judge me if I say something stupid.
I was wondering if there is any way to trick web crawlers, so that some of a website's content is different for a human visitor than for a web spider.
So here's an idea I had.
Every time a visitor enters a page, a script will identify the user's gender from the Facebook API. If there is a response (i.e. the user is logged in to Facebook in the same browser), then some code will be printed into the page with PHP. If it's a crawler, there will be no response, so that code will not exist in the source of that page.
I know that PHP is a server-side language, so web crawlers don't have permission to scan that code. If I am not right, please correct me.
Thank you.
I think what you are trying to do can be accomplished with robots.txt.
This file can sit at the root of your web directory and it defines the rules for web crawlers.
See here: http://www.robotstxt.org/
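For example, a robots.txt like the following (the /members-only/ path is just a placeholder) tells well-behaved crawlers to stay out of part of the site:

    # https://www.example.com/robots.txt
    User-agent: *
    Disallow: /members-only/

Keep in mind that robots.txt only asks crawlers not to fetch those URLs; it doesn't serve them different content.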
I would like to prevent Google from following links I have in JS.
I couldn't find how to do that in robots.txt.
Am I looking in the wrong place?
Some more information:
I'm seeing that Google is crawling those pages even though the links only appear in JS.
The reason I don't want it to crawl them is that this content depends on external APIs, and I don't want to waste my rate limit with those APIs on Google's crawlers; I only want to call them on actual user demand.
Direct from Google:
http://www.google.com/support/webmasters/bin/answer.py?answer=96569
Google probably won't find any links you have hidden in JS, but someone else could link to the same place.
It isn't links that matter, though; it is URLs. Just specify the URLs you don't want search engines to visit in robots.txt. The fact that you usually expose them to the browser via JS is irrelevant.
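For example, if the JS fetches everything under a path like /api-content/ (a hypothetical path; substitute your own), you can tell Googlebot to keep out:

    # robots.txt at the site root
    User-agent: Googlebot
    Disallow: /api-content/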
If you really want to limit access to the content, then just reducing discoverability probably isn't sufficient and you should put an authentication layer (e.g. password protection) in place.
Hi friends, I want to implement Bing search for my static site, so that when I put something in the search bar, it searches my whole site for content matching the search keywords.
Can anyone please help me do this?
http://www.bing.com/siteowner
Here is Bing's "Getting Started" site, which should guide you through setting this up on your website.
You don't necessarily have to submit your site to Bing, but it might be a good idea if searches are not generating results. Bing has a web crawler, the so-called "BingBot" that indexes web content. It may have already found your site and indexed it. You might check the robots.txt file to make sure that it doesn't contain directives blocking crawlers.
To submit your site to Bing:
http://www.bing.com/webmaster/SubmitSitePage.aspx
From personal experience: we didn't formally submit our site to Bing. We've got a Bing search box, and it's returning results.
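If you just want a basic search box without going through that setup, one common approach (not necessarily what Bing's own box widget does) is a small form that sends the visitor to Bing with the query restricted to your domain via the site: operator. A minimal sketch, where example.com is a placeholder for your domain:

    <form action="https://www.bing.com/search" method="get"
          onsubmit="this.q.value = 'site:example.com ' + this.q.value;">
        <input type="text" name="q" placeholder="Search this site">
        <input type="submit" value="Search">
    </form>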