I'm using the "Snoopy" class to fetch HTML for parsing.
The problem is that for one of the pages I need to follow redirects automatically, because I'm using the site's search, and if it finds an exact match it redirects.
Here is my Snoopy code:
if ($snoopy->fetch("http://www.rottentomatoes.com/search/?search=$pagelink&sitesearch=rt")) {
    $printable = $snoopy->results;
}
If the search is exact it will place me on a page like this...
http://www.rottentomatoes.com/m/captain-america/
I need to get the link above.
Any help would be great,
Thanks!
From poking around in the code a little, it seems you should be able to check the property $snoopy->lastredirectaddr, which should be set if you got redirected (if not, it should be empty).
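A minimal sketch of that check, assuming Snoopy exposes lastredirectaddr as a public property as described above. The helper name finalUrl is hypothetical and not part of Snoopy; it accepts any object carrying that property, so it can be tried without the library installed.

```php
<?php
// Hypothetical helper (not part of Snoopy): returns the address the last
// fetch ended up on, or the requested URL when no redirect was recorded.
function finalUrl(object $snoopy, string $requestedUrl): string
{
    // empty() covers a missing property, false, and a blank string.
    if (!empty($snoopy->lastredirectaddr)) {
        return (string) $snoopy->lastredirectaddr;
    }
    return $requestedUrl;
}

// Intended use with a real Snoopy instance (assumed to be created elsewhere):
//   $url = "http://www.rottentomatoes.com/search/?search=$pagelink&sitesearch=rt";
//   if ($snoopy->fetch($url)) {
//       $movieUrl = finalUrl($snoopy, $url);
//   }
```

If the search was not an exact match, the helper simply hands back the search URL, so the caller can compare the two to tell whether a redirect happened.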
Related
This is my first question; I hope to get good guidance.
I'm trying to grab the content of a webpage using file_get_contents().
On many occasions it works fine, but there is one thing that is driving me crazy.
I'm splitting a long link into three parts and putting it back together with the code below. The link is a pagination link and the "3" indicates the page, so with this particular link I want to see page 3.
$combinedlink = $firstpart."3".$secondpart."3".$thirdpart."1445256372";
$input = file_get_contents($combinedlink);
When I now echo $input, it shows me page 1 instead of page 3. When I echo $combinedlink and follow it, it takes me to the correct page. Now the shocking part: when I copy the output of echo $combinedlink; and insert it like this:
$input = file_get_contents("http://www.ReallyLongLink.de/EvenMoreStuff");
It works fine and takes me to page 3. So the variable appears to contain exactly the right thing, yet it only works when I hard-code the link. var_dump() also shows me string(178) followed by the string in quotation marks.
The website you are trying to crawl might be using some means of pagination other than the URL, such as a cookie or session. That might explain why the link works in your browser but not in your script.
To track cookies sent by the website, you may want to try using a library, such as Guzzle, to fetch the pages.
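UPDATED: if the link was taken from HTML source it may contain entities such as &amp;, which look like a plain & when echoed in a browser. Decoding them before the request may help: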
$input = file_get_contents(html_entity_decode($combinedlink));
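As a small illustration of why decoding can matter, here is a sketch with a made-up URL (www.example.com and its parameters are assumptions, not the asker's real link). It shows how a link that still carries an HTML entity differs from the decoded one, even though both look the same when echoed into a browser page.

```php
<?php
// Example link as it might look after being scraped out of HTML source.
// The URL itself is made up for illustration.
$combinedlink = 'http://www.example.com/list?page=3&amp;sort=asc';

// html_entity_decode() turns "&amp;" back into a plain "&".
$decoded = html_entity_decode($combinedlink);

// Echoed into a browser page the two strings look identical, but the raw
// one is longer, which is what the length reported by var_dump() reveals.
var_dump($combinedlink);
var_dump($decoded);
```

Comparing the length reported by var_dump() with the number of visible characters is a quick way to spot this kind of hidden difference.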
I have never seen anything like this before, and I searched but couldn't find anything about it.
I created a sitemap for my site and noticed that quite a few URLs looked like this:
url.com/page1.php/page.php
url.com/page1.php/page1.php
url.com/page1.php/page2.php
Any idea why it would be doing that? I checked all my code and don't see anything out of the ordinary, but since I have no clue why it would happen, I'm not sure what to look for.
You can use the header() function that is built into PHP. It lets you redirect the user to a specific URL, and it must be called before any output is sent.
header('Location: http://www.example.com/');
I have this website, http://www.kdomestriha.cz/recenze-kadernicvi, which basically shows a list of the hairdressers you searched for. When you enter a one-word term and search (you can try "Praha"), the ajax update on pagination works perfectly fine. However, if you enter two words (you can try "Hradec Králové"), pagination refreshes the whole site. I am not sure whether showing all of my code would help... Does anyone have a clue what could cause this strange behavior? Thanks
Since you didn't post any code, I could only look at your page source; here is what I found wrong in it.
If I search for the word "Praha", your site generates the following:
<div id="Praha" class="list-view">....
Then, in jQuery, you have code that accesses it with this syntax:
$('#Praha').....
It looks like you use the search term as the element id. That causes the problem when the term has more than one word: a space is not valid in an id attribute, and in a jQuery selector the space is read as a descendant combinator, so the selector below does not match the element.
$('#Hradec Králové').. //failed
It does not raise any error, but it won't work as you expect.
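One possible way around this, sketched here under the assumption that the id is generated server-side in PHP, is to derive the id from the search term instead of using the term itself. The helper name listViewId and the "list-" prefix are hypothetical.

```php
<?php
// Hypothetical helper: builds an element id that contains no spaces or
// special characters, whatever the user typed into the search box.
function listViewId(string $searchTerm): string
{
    // md5() always yields 32 hexadecimal characters; the prefix keeps the
    // id starting with a letter.
    return 'list-' . md5($searchTerm);
}

$id = listViewId('Hradec Králové');

// Server side: echo '<div id="' . $id . '" class="list-view">';
// Client side: $('#' + id) can then match the element by that id.
echo $id, "\n";
```

The same term always produces the same id, so the server-rendered markup and the script that looks it up stay in sync as long as both use the generated value.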
If pagination is causing your site to refresh, the most likely cause is that something in the search/filter result triggers a JavaScript error, which makes Yii fall back to a full page refresh.
I'd advise that you look at your page in Chrome's web inspector (because that's what I use) after searching and confirm that the JavaScript isn't broken.
I have a PHP website that I send users to via a dynamic URL like this:
http://mwebsitehere.com/?gw=1
The page I send them to works great with the code I am using to do certain things if the dynamic content is set in the URL. But whenever they click on a link on the page (and the links are ALWAYS changing), the dynamic content in the URL is completely gone. For instance:
Let's say they are on the homepage, which looks like this: http://mwebsitehere.com/?gw=1, and then they click on a link that looks like this: http://mwebsitehere.com/new-page/. Notice that ?gw=1 is completely gone from the URL.
Is there a way to keep the dynamic part on every page if the URL has dynamic content?
For example, if it said ?gw=2, could all the links they click on somehow keep ?gw=2 on every page? And if it said ?gw=1, could it do the same thing?
Any help would be appreciated! Let me know if I need to explain my question better. Thanks!
I am also using WordPress, just in case you know anything WordPress-specific! Thanks!
The only reason to have a GET variable such as ?gw=2 in the URL is that it is needed for that page. If you want it on all pages,
have your scripts check whether it exists in the $_GET array or the $_COOKIE array. If it is in $_GET but not in $_COOKIE, set it in the cookie. That way your script will still see it by checking the cookie.
There is no sense in cluttering the URL with variables that don't always need to be shown.
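The check described above could be sketched roughly as follows. The helper name resolveGw and the one-hour cookie lifetime are assumptions for illustration; the function takes plain arrays so it can be tried outside a web request.

```php
<?php
// Hypothetical helper: prefers the value in the URL and falls back to the
// cookie. Returns null when "gw" is present in neither array.
function resolveGw(array $get, array $cookie): ?string
{
    if (isset($get['gw'])) {
        return (string) $get['gw'];
    }
    if (isset($cookie['gw'])) {
        return (string) $cookie['gw'];
    }
    return null;
}

// Intended use near the top of each page, before any output is sent:
//   $gw = resolveGw($_GET, $_COOKIE);
//   if (isset($_GET['gw']) && ($_COOKIE['gw'] ?? null) !== $gw) {
//       setcookie('gw', $gw, time() + 3600, '/'); // one hour, whole site
//   }
```

Giving the URL value priority means a visitor who arrives with a different ?gw= value later on gets the new one stored rather than the stale cookie.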
If you want the exact same variable available on every page, why not use
$_SESSION['gw'];
or
$_COOKIE['gw'];
to store "gw".
Otherwise you would have to pass it on in each link, as follows.
For example, on the page http://mwebsitehere.com/?gw=1:
<a href="http://mwebsitehere.com/new-page/?gw=1">Link</a>
There are a few ways you can do this.
You can use $_SERVER['QUERY_STRING'] and append it to every link on your page. That way each link repeats the query string the current page was requested with.
You could also store the data in a session, which lets you carry data from one page to another. Take a look at the PHP manual.
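For the $_SERVER['QUERY_STRING'] approach, a minimal sketch might look like this. The helper name withQueryString is hypothetical, and the sketch ignores links that contain a #fragment.

```php
<?php
// Hypothetical helper: appends the given query string to a link, using "?"
// or "&" depending on whether the link already has a query string.
// Links containing a #fragment are not handled in this sketch.
function withQueryString(string $href, string $queryString): string
{
    if ($queryString === '') {
        return $href; // nothing to carry over
    }
    $separator = (strpos($href, '?') === false) ? '?' : '&';
    return $href . $separator . $queryString;
}

// Intended use when printing a link inside a page:
//   $href = withQueryString('/new-page/', $_SERVER['QUERY_STRING'] ?? '');
//   echo '<a href="' . htmlspecialchars($href) . '">Link</a>';
```

Escaping the result with htmlspecialchars() when printing it matters here, because the query string comes straight from the visitor's request.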
Good luck!
Hi, I'm using ajax to load all the pages into the main page, but I'm not able to control what happens on refresh: if somebody refreshes, the page goes back to the main page. Can anybody give me any solutions? I would really appreciate the help...
You could add an anchor (#something) to your URL and, on every ajax event, change it to something you can decode into a particular page state.
Then in body.onload, check the anchor and decode it into that state.
The back button (at least in Firefox) will work all right too. If you want the back button to work in IE6, you need to add some iframe magic.
Check the various JavaScript libraries designed to support the back button or history in an ajax environment; this is probably what you really need. For example, the jQuery history plugin.
You can rewrite the current URL so that it gives pointers to where the user was; see Facebook for examples of this.
I always store the 'current' state in PHP session.
So, user can refresh at any time and page will still be the same.
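A rough sketch of that session idea, assuming the ajax requests pass the wanted page in a "page" parameter. The helper name currentPage, the "current_page" session key, and the "main" default are all hypothetical. The session array is passed in by reference so the function can be tried with a plain array.

```php
<?php
// Hypothetical helper: remembers which page was last loaded via ajax.
// $session is passed by reference; in a real request it would be $_SESSION.
function currentPage(array &$session, ?string $requested, string $default = 'main'): string
{
    if ($requested !== null && $requested !== '') {
        $session['current_page'] = $requested; // an ajax call changed the state
    }
    return $session['current_page'] ?? $default;
}

// Intended use:
//   session_start();
//   $page = currentPage($_SESSION, $_GET['page'] ?? null);
//   // render $page; a refresh without "page" re-renders the stored one
```

Note that state kept only in the session cannot be bookmarked or shared, since it is not reflected in the URL.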
"if somebody refreshes the page returns back to the main page can anybody give me any solutions"
This is a feature of the browser, not a bug. You need to change the URL for different pages. Nothing is worse than websites that use some kind of magic, on either the client side or the server side, that makes a bunch of completely different pages share the same URL. Why? How the heck am I going to link to a specific page? What if I like something and want to copy and paste the URL into an IM window?
In other words, consider the use cases. What constitutes a "page"? For example, if you have a website for stock quotes, should each stock have a unique URL? Yes. Should you have a unique URL for every variation you can make to the graph (e.g. logarithmic vs. linear)? It depends. If you don't, at least provide a "share this" option like Google Maps does, so there is some kind of URL you can share.
That said, I agree with the suggestion to set the #anchor and parse it out. It is probably the most elegant solution.