We're using the Google CSE (Custom Search Engine) paid service to index content on our website. The site is built mostly of PHP pages assembled with include files, but there are some dynamic pages that pull info from a database into a single page template (new releases, for example). The issue is that I can set an expiry date on content in the database, so, say, "id=2" will bring up a "This content is expired" notice. However, if ID 2 had an uploaded PDF attached to it, the PDF file remains in the search index.
I know I could write a cleanup script and have cron run it: it would look at the DB, find expired content, check whether any uploaded files were attached, and either rename or remove them. But there has to be a better solution (I hope).
Please let me know if you have encountered this in the past, and what you suggest.
Thanks,
D.
There's unfortunately no way to give you a straight answer at this time: we have no knowledge of how your PDFs are "attached" to your pages or how your DB is structured.
The best solution would be to create a robots.txt file that blocks the URLs for the particular PDF files that you want to remove. Google will drop them from the index on its next pass (usually in about an hour).
http://www.robotstxt.org/
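For example, if the expired PDFs sit under a known directory, the robots.txt entries might look like this (the paths below are only placeholders, not your actual structure):

User-agent: *
Disallow: /uploads/expired/
Disallow: /uploads/expired-report.pdf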
What we ended up doing was tying a check script to the upload script: once the current upload completed, old files were "unlinked" and their DB records were deleted.
For us, this works because it's kind of an "add one / remove one" situation where we want a set number of items to appear in a rolling order.
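A stripped-down sketch of that kind of cleanup step; the table name (content) and columns (expires, file_path) are assumptions, not the actual schema:

<?php
// After a successful upload: drop expired rows and their attached files.
// Table and column names here are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$expired = $pdo->query("SELECT id, file_path FROM content WHERE expires < NOW()");
$delete  = $pdo->prepare("DELETE FROM content WHERE id = ?");

foreach ($expired as $row) {
    if (!empty($row['file_path']) && file_exists($row['file_path'])) {
        unlink($row['file_path']);       // remove the attached PDF from disk
    }
    $delete->execute(array($row['id'])); // remove the DB record
}
?>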
I recently installed Sphider on my site; it was simple to do, and indexing the pages was very simple, however I ran into a small issue.
I have a lot (seriously, loads) of pages on my site, and a lot of them weren't indexed. I have a page which takes a .csv file and creates a table using a foreach loop in PHP, and the first column is a hyperlink to each item, with a dedicated page for that item. My issue is that Sphider does not index these individual pages; it only indexes the table page. I'm in a right two and eight because I have no idea why these pages are not indexed.
I checked to see if I had any, but I didn't, and I even set Sphider to index a random one of the individual pages from the table, and it appeared in the search. I'd do this with all the pages, but I keep adding new pages every time we get a new item, so I would get inundated with things to add to the index list.
My question is this: is there some solution where I can have a script that adds each URL to Sphider's database, seeing as that seems to make them appear? Or am I being a complete div and missing something really obvious here, and something goes wrong because of the .csv PHP table, maybe?
I would really appreciate your help because I am completely confused.
Thanks, Carty
PS, What's the standard for including tl; dr? Is that just for Redditors? :P
I had a similar problem when I first started using Sphider Search: when I tried to spider a folder on my website, e.g. www.mysite.com/myfolder, which contained 900 different HTML pages, it would only spider / list one link in the database, which was www.mysite.com/myfolder.
I figured out that Sphider won't spider a whole directory if it has an 'index.html', 'home.html' or 'index.php' file in said folder.
So I temporarily deleted my index.html file, successfully spidered all 900 HTML files,
then re-uploaded my index.html.
If index & home HTML files are not the cause, it might be that your spidering link depth settings are not high enough.
Lastly, Sphider respects the rel="nofollow" attribute in tags, so it won't index those links either.
Hope this helps.
If your page contains fewer than 3 words, Sphider is not able to index it by default. You have to change the following setting in
/sphider/settings/conf.php
as per your requirement:
$min_words_per_page = 0;
I am trying to create a web page with a tab menu. I want to be able to dynamically add and delete tabs (and other content). There is a perfect example of what I want here: http://www.dhtmlgoodies.com/index.html?whichScript=tab-view . I want the newly created tabs to be persistent through page loads. So basically if I add a tab and refresh I want the tab to still be there. If I close the browser and reload the page a month later I would like the tab and any content to still be there. This page is for personal use and will be hosted on my computer and accessed through the browser alone, not any kind of web server. Although I'm not against using a web server if I need to.
Looking at the code, it seems that the 'add tab' functions just add HTML to the page in memory, but I need it to permanently change the HTML of the page. Is there a way to write dynamic changes to the DOM back to disk? I'm not quite sure where to go with this, and searching for a week has left me with too many language and implementation options to look into. I am not an experienced web developer, and there are so many different ways to create web pages and so many new terms that I'm a little overloaded now.
I do realize that this is a little outside the realm of a typical web site. It is generally not a good idea to let the client side make changes to data on the server side. But since I am the only person who will be using this and it will not be accessible from the internet, security is not an issue.
I'm not opposed to any particular scripting language, but I would like to keep it as simple as possible, i.e. one HTML page, one CSS file, and maybe a script file. Whatever is necessary. I am not opposed to reading and learning on my own either, so being pointed down the right path is fine for me.
If you need a rock-solid method, then you need some record of those tabs existing. That means having a database that knows that the tab exists, which tab it was, and what content it contained. HTML5's local browser storage (not to be confused with cookies) could also be a viable solution, but browser compatibility is an issue (for now).
You also need some sort of "user accounts system" so you know who among your users had this set of tabs open. Otherwise, if you had a single "tabs list" for everyone, everyone would open the same tabs!
For the dynamic HTML and JS for the "tab adding", you are on the right track. You need PHP to interact with the database, which is MySQL. What PHP does is receive data on the server from the browser about what happened, like:
know which user is logged in
what action did he choose (add or remove tab)
add to the database or delete a record
reply with a success or error, whichever happened
For MySQL, you need to create a database with a table for your "tab list" (a rough sketch follows the list below). This table must have:
User id (to know which user did what among the ones in the list)
Tab id (know which tab is which among the ones in the list)
Tab content (it may be a link for an iframe, actual html, text etc.)
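A rough sketch of what the PHP side could look like; the endpoint name (add_tab.php), the tabs table and its columns are assumptions for illustration only:

<?php
// add_tab.php - hypothetical endpoint that adds/removes tabs for a user.
// Assumed table:
//   CREATE TABLE tabs (
//     id INT AUTO_INCREMENT PRIMARY KEY,  -- tab id
//     user_id INT NOT NULL,               -- which user owns the tab
//     content TEXT                        -- a link for an iframe, actual html, text etc.
//   );
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$userId = 1; // single-user setup; swap in your login system's user id
$action = isset($_POST['action']) ? $_POST['action'] : '';

if ($action === 'add') {
    $stmt = $pdo->prepare('INSERT INTO tabs (user_id, content) VALUES (?, ?)');
    $stmt->execute(array($userId, isset($_POST['content']) ? $_POST['content'] : ''));
    echo json_encode(array('status' => 'ok', 'id' => $pdo->lastInsertId()));
} elseif ($action === 'remove') {
    $stmt = $pdo->prepare('DELETE FROM tabs WHERE id = ? AND user_id = ?');
    $stmt->execute(array(isset($_POST['tab_id']) ? (int) $_POST['tab_id'] : 0, $userId));
    echo json_encode(array('status' => 'ok'));
} else {
    echo json_encode(array('status' => 'error', 'message' => 'unknown action'));
}
?>

The page's JavaScript would then re-read the tab list from the database on load, so a tab added today is still there after a refresh or a month later.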
Friend, when you talk of closing the browser and not losing the data, then you are talking about data persistence or data durability. In other words, you have to save your data somewhere, and load it next time.
For storage you can use a flat file (a simple text file), a database, an XML file, etc. However, you need to learn a lot to save the information and content of the new tab somewhere, and next time load it.
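If you go the flat-file route, a minimal sketch could be as small as this; the file name (tabs.json) and the idea of storing the whole tab list as one JSON blob are just an example:

<?php
// save_tabs.php - hypothetical script that persists the tab list in a flat file
$file = __DIR__ . '/tabs.json';

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // The browser posts the full tab list as JSON in the request body
    $json = file_get_contents('php://input');
    file_put_contents($file, $json, LOCK_EX);
    echo 'saved';
} else {
    // On page load, return whatever was saved last time (or an empty list)
    header('Content-Type: application/json');
    echo file_exists($file) ? file_get_contents($file) : '[]';
}
?>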
I'm wondering if there's a more efficient way to serve a large number of link redirects. For background: my site receives tens of thousands of users a day, and we "deep link" to a large number of individual product pages on affiliate websites.
To "cloak" the affiliate links and keep them all in one place, I'm currently serving all our affiliate links from a single PHP file, e.g. a user clicks on mysite.com/go.php?id=1 and is taken to the page on the merchant's site, appended with our affiliate cookie, where you can buy the product. The code I'm using is as follows:
<?php
$path = array(
    '1' => 'http://affiliatelink1.com',
    '2' => 'http://affiliatelink2.com',
    '3' => 'http://affiliatelink3.com',
);
if (array_key_exists($_GET['id'], $path))
    header('Location: ' . $path[$_GET['id']]);
?>
The problem I'm having is that we link to lots of unique products every day, and the PHP file now contains 11K+ links and is growing daily. I've already noticed it takes ages to simply download and edit the file via FTP, as it is nearly 2MB in size, and the links don't work on our site while the file is being uploaded. I also don't know if it's good for the server to serve that many links through a single PHP file - I haven't noticed any slowdowns yet, but can certainly see that happening.
So I'm looking for another option. I was thinking of simply starting a new .php file (e.g. go2.php) to house more links, since go.php is so large, but that seems inefficient. Should I be using a database for this instead? I'm running WordPress too, so I'm concerned about using MySQL too much, and simply doing it in PHP seems faster, but again, I'm not sure.
My other option is to find a way to dynamically create these affiliate links, i.e. create another PHP file that will take a product's URL and append our affiliate code to it, eliminating the need for me to manually update a php file with all these links, however I'm not sure about the impact on the server if we're serving nearly 100K clicks a day through something like this.
Any thoughts? Is the method I'm using spelling certain death for our server, or should I keep things as is for performance? Would doing this with a database or dynamically put more load on the server than the simple php file I'm using now? Any help/advice would be greatly appreciated!
What I would do is the following:
Change the URL format to have the product name in it for SEO purposes, such as something like "my_new_product/1"
Then use mod_rewrite to map that URL to a page with a query string such as:
RewriteRule ^([a-zA-Z0-9_-]*)/([0-9]*)$ index.php?id=$2 [L]
Then create a database table containing the following fields:
id (autonumber, unique id)
url (the url to redirect to)
description (the text to make the url on your site)
Then, you can build a simple CRUD thing to keep those up to date easily and let your pages serve up the list of links from the DB.
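A sketch of what go.php could then look like; the table name (affiliate_links) and the connection details are assumptions, and the existing mysite.com/go.php?id=1 URLs would keep working:

<?php
// go.php - look up the destination in the database instead of a giant in-file array
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;

$stmt = $pdo->prepare('SELECT url FROM affiliate_links WHERE id = ?');
$stmt->execute(array($id));
$url = $stmt->fetchColumn();

if ($url) {
    header('Location: ' . $url);  // redirect to the affiliate link
} else {
    header('HTTP/1.1 404 Not Found');
    echo 'Unknown link';
}
exit;
?>

Adding a new link then becomes an INSERT instead of an FTP upload of a 2MB file, and existing redirects keep working while you do it.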
I have a live cricket scores website in which I am controlling the news section dynamically.
I have my own custom-built CMS system in PHP, where the admin adds the news to the web portal.
If I generate the sitemap, the dynamically created pages won't be added to it.
Is this a good practice, or do we need to add the dynamically created links to the sitemap?
If yes, can you please share how we can add dynamic links?
One more observation I have made: whatever news is added gets cached by Google within 4 hours.
Please share your thoughts, thanks in advance.
If the pages are important, then you should add them to the site map so they can be indexed for future reference. However, if the pages are going to disappear after the match, then I wouldn't put them on the site map as they may get indexed then disappear, which may have a negative impact on your search engine rankings.
You can add these dynamic pages to a site map in a couple of ways:
Whenever a new dynamic page is created, re-create your site map. Do this by looking through the database for the pages which will be valid and writing them out into an XML site map file.
When a new page is created, read the current XML site map, and insert a new entry into the relevant place.
I would say the easiest option is option 1, as you can quickly and easily build a site map without having to read what you already have. That option also means that when you remove one of the dynamic pages, it will be removed from the site map when it is re-built, without the need to read through what you have, find the entry and remove it.
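A rough sketch of option 1, assuming a pages table with url, last_modified and expires_at columns (all names here are illustrative):

<?php
// rebuild_sitemap.php - regenerate sitemap.xml from the database
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

// Only include pages that are still valid
$stmt = $pdo->query("SELECT url, last_modified FROM pages WHERE expires_at IS NULL OR expires_at > NOW()");
foreach ($stmt as $row) {
    $xml .= "  <url>\n";
    $xml .= '    <loc>' . htmlspecialchars($row['url']) . "</loc>\n";
    $xml .= '    <lastmod>' . date('Y-m-d', strtotime($row['last_modified'])) . "</lastmod>\n";
    $xml .= "  </url>\n";
}
$xml .= '</urlset>';

file_put_contents(__DIR__ . '/sitemap.xml', $xml);
?>

Call it (or a function like it) whenever a new page is created or an old one expires.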
Google Code has a number of different options for you: some you can download and run, others look like they need implementing within your own code.
Yes, if the content of these pages needs to be referenced by search engines, of course they have to be in the sitemap.
I have worked on a lot of e-business websites and, of course, almost 99% of the pages were dynamically generated: almost 1000 product pages versus the 3 static sales-conditions & legal pages.
So the sitemap itself was dynamic and regenerated every 15 minutes (to avoid dumping the whole product base and running thousands of queries each time the sitemap is called).
You can use a separate script to do this: I would do one static template part if you have static pages, and another embedding the dynamically generated URLs.
It is easier if your CMS already embeds a URL management (or routing) system.
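A minimal sketch of that 15-minute regeneration, written as a self-contained script; the products table and column names are assumptions:

<?php
// sitemap.php - serve a cached sitemap, rebuilt at most every 15 minutes
function rebuild_sitemap($path) {
    $pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($pdo->query('SELECT url FROM products') as $row) {
        $xml .= '  <url><loc>' . htmlspecialchars($row['url']) . "</loc></url>\n";
    }
    $xml .= '</urlset>';
    file_put_contents($path, $xml);
}

$cache = __DIR__ . '/sitemap.xml';
if (!file_exists($cache) || time() - filemtime($cache) > 15 * 60) {
    rebuild_sitemap($cache);  // only hit the product base when the cache is stale
}

header('Content-Type: application/xml');
readfile($cache);
?>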
Just yesterday I finished a site which provides video watching.
Now I need to show the number of views of each film. I have never written such a thing, so I don't know what to do.
Maybe I should add one field to the MySQL database and increase it every time the video is opened?
But I use a Flash player, and I can't write a script on the player's onclick.
So, could you give me an idea...
Thanks
The best way is probably going to be to just parse your webserver's log and look for when Flash requests the video stream (nightly cron job). I'm guessing you are using a pre-existing Flash video player, where you may not have the ability to modify the Flash to push a server request to update the view count when PLAY is clicked, nor do I recommend it because you might switch to another player or to HTML5 streaming.
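A rough sketch of that nightly job; the log path, the /videos/ URL prefix, and a filename column on the video table are all assumptions:

<?php
// count_views.php - run nightly from cron; tallies requests for video files.
// Assumes the access log is rotated daily so each request is only counted once.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$counts = array();
$log = fopen('/var/log/apache2/access.log.1', 'r');
while (($line = fgets($log)) !== false) {
    // Match requests like: GET /videos/myclip.flv
    if (preg_match('#GET /videos/([^ ?]+)#', $line, $m)) {
        $file = $m[1];
        $counts[$file] = isset($counts[$file]) ? $counts[$file] + 1 : 1;
    }
}
fclose($log);

$stmt = $pdo->prepare('UPDATE video SET view_count = view_count + ? WHERE filename = ?');
foreach ($counts as $file => $n) {
    $stmt->execute(array($n, $file));
}
?>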
As already suggested, I also agree: just add a "view_count" column to your "video" table and increment it for the video in question. You can do it in one step (no need to retrieve, add and update):
UPDATE video SET view_count = view_count + 1 WHERE id = 123
Also keep in mind, if you are incrementing your view count upon loading the video page, make sure to exclude page views from search bots (e.g. Google), because half or more of your view counts could end up being search bots. You can find lists of search bot user-agent strings online, so you know what to look for.
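A simple sketch of that kind of filter; the user-agent strings below are just a few common crawlers, not a complete list:

<?php
// Rough bot check to run before incrementing the view count
function is_search_bot() {
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
    $bots = array('googlebot', 'bingbot', 'slurp', 'baiduspider', 'yandexbot');
    foreach ($bots as $bot) {
        if (strpos($ua, $bot) !== false) {
            return true;
        }
    }
    return false;
}

if (!is_search_bot()) {
    // run the UPDATE above to increment view_count for this video
}
?>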
I'd create a table called video_views or something, have it contain the id of the video being viewed. I'd then use Flash's ExternalInterface functionality to call a JavaScript function which would, in turn, hit a remote URL via ajax to add a new record to video_views.
I've done this a few times in the past and it's always worked well. You could also just maintain a view_count value in your videos table and increment that each time a video is viewed.
[Edit]
One argument in favor of tracking each individual view would be the ability to add a timestamp to each record and gain further insight into things like peak traffic times.
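A sketch of the endpoint that the ajax call could hit; video_views matches the table described above, and the timestamp column is the extra detail mentioned in the edit (the endpoint name and connection details are assumptions):

<?php
// record_view.php - called via ajax when Flash reports a play event.
// Assumed table: video_views(id, video_id, viewed_at)
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$videoId = isset($_POST['video_id']) ? (int) $_POST['video_id'] : 0;
if ($videoId > 0) {
    $stmt = $pdo->prepare('INSERT INTO video_views (video_id, viewed_at) VALUES (?, NOW())');
    $stmt->execute(array($videoId));
}
echo 'ok';
?>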
Another way to go about it is to pass the web (Apache) logs through a script in real time by using the CustomLog directive, as such:
CustomLog |/var/www/view_count.php common
This would allow for real time statistics. Not sure if you need those or not.
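A sketch of what the piped script could look like; Apache keeps the program running and writes one log line per request to its stdin (the /videos/ prefix and the filename column are assumptions, and the script needs to be executable):

#!/usr/bin/php
<?php
// view_count.php - receives Apache log lines on stdin and updates counts in real time
$pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');
$stmt = $pdo->prepare('UPDATE video SET view_count = view_count + 1 WHERE filename = ?');

while (($line = fgets(STDIN)) !== false) {
    if (preg_match('#GET /videos/([^ ?]+)#', $line, $m)) {
        $stmt->execute(array($m[1]));
    }
}
?>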
Alternatively, fetch the current count, increment it and write it back, like below:
$viewd = $results['viewd'];  // $results is the row previously fetched for this video
$viewd += 1;
mysql_query("update user set viewd = '$viewd' where id='1'");