PHP - Edit Request Being Made by Other Request to the Server?

Hi everyone.
I'm using cURL to download a page from a website. After the page loads in a browser, it makes a few additional requests to the server for files to load into the HTML: a handful of decorative images I want to compare, selected at random from a larger set, so a different few are displayed each time.
I want to load the images for the comparisons via cURL, but only after the page finished downloading.
Right now, the HTML I get back from my request contains <img> tags, but their src attributes don't work for me, because the images are referenced by paths local to the server I'm fetching the page from.
Because of that, the page on the original website can load the pictures (since it's served from that server), yet I'm unable to fetch them successfully.
I already have the page as a string variable. I know the page's address and the URLs/paths of the images, but I don't want to make a fresh request for the page, because then the images would be re-randomized, and I want to compare the exact set chosen when the page loaded.
How can I do that? Is it possible to make such a request? Is it possible to somehow extend my existing request for the page, or do anything else that would get me the images?
I know how to make requests, but I've never dealt with this kind of "depth" of requests.
Thanks a lot in advance!
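One way to read the situation above: you can't reopen a finished HTTP request, but the image files themselves are usually static and keyed by URL, so fetching the exact src values found in the HTML you already hold (within the same cookie session) should return the same images that page load selected. A minimal sketch, assuming the page HTML is already in a string and a cookie jar file was used for the original request; all names here are illustrative:

```php
<?php
// Sketch: collect the <img> src values from the HTML you already downloaded,
// resolve them against the page's URL, then fetch each image with cURL using
// the SAME cookie jar as the original request so the server sees one session.

// Resolve a (possibly relative) src against the page URL.
function resolveUrl(string $base, string $src): string {
    if (preg_match('#^https?://#i', $src)) {
        return $src;                          // already absolute
    }
    $p    = parse_url($base);
    $root = $p['scheme'] . '://' . $p['host'];
    if ($src[0] === '/') {
        return $root . $src;                  // site-root relative
    }
    $dir = rtrim(dirname($p['path'] ?? '/'), '/');
    return $root . $dir . '/' . $src;         // document relative
}

// Pull every <img src> out of the HTML string.
function extractImgSrcs(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);                   // @ silences warnings on messy HTML
    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

// Fetch one image, reusing the cookie jar from the first request.
function fetchImage(string $url, string $cookieJar) {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $cookieJar, // same session as the page load
        CURLOPT_COOKIEJAR      => $cookieJar,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
```

If the server re-randomizes per request rather than per image URL, this won't help, but for static files under randomized selection it should.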

Related

Console Style PHP Script Response Pushing and Max Time

I realize the title is a bit incomprehensible, but I'm struggling to define what it is I'm trying to accomplish.
Basically, I am creating my own photo gallery because none of the existing ones met my needs. One part of the script reads the directory where my photos are located, adds them to the database, and loads the image tags, XMP info, etc. The problem is, it takes way too long. As a challenge, I want to do it without tampering with the max execution time.
I think what would be best is for the PHP script to send notifications to the browser on a console-style page, i.e. "File 3232 processed", "File 3233 processed". Then, using a shutdown function registered before the max time is hit, it would tell the browser it got shut down at file 3234, and a JavaScript function would restart the script at file 3234 via AJAX or something similar.
My issue is that I don't know how to get the PHP script to send the console-style messages instantaneously. I tried various AJAX load commands, but they wait for the entire script to finish, then send one huge glob of results. I have tried the flush command and ob_implicit_flush(1), to no avail.
Links to examples, tutorials, or posts would be great. Thanks.
You can do it the other way around: send the full list of images to the browser, and have the browser make AJAX requests to process one specific image at a time.
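A minimal sketch of that inverted approach: one endpoint returns the file list, and a second endpoint processes exactly one image per HTTP request, so each request stays well under max_execution_time and the browser gets a "console line" back after every file. The endpoint and function names below are assumptions, and the actual XMP/database work is left as a placeholder:

```php
<?php
// process.php - handle exactly one image per HTTP request.
// The browser loops over the file list from list.php and calls
// process.php?file=... once per image, appending each JSON result
// to its console-style log as it arrives.

function processOne(string $file): array {
    // ... read tags / XMP info and insert into the database here ...
    return ['file' => $file, 'status' => 'processed'];
}

if (isset($_GET['file'])) {
    header('Content-Type: application/json');
    // basename() keeps the request from escaping the photo directory.
    echo json_encode(processOne(basename($_GET['file'])));
}
```

If one request dies, the browser knows exactly which file failed and can retry just that one, which replaces the shutdown-function bookkeeping described in the question.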

Pointing crawler to HTML snapshot

I'm trying to make my AJAX website crawlable:
Here is the website in question.
I've created an htmlsnapshot.php that generates the page content (this file needs to be passed the hash fragment to be able to generate the right content).
I don't know how to get the crawler to load this file while getting normal users to load the normal file.
I don't really understand what the crawler does with the hash fragment (and that's probably part of my problem).
Does anybody have any tips?
The crawler will divert itself. You just need to configure your PHP script to handle the GET parameter that Google will be sending your site (instead of relying on the AJAX).
Basically, when Google finds a link to yourdomain.com/#!something, instead of requesting / and running the JavaScript to make an AJAX request for something, Google will automatically (WITHOUT you doing anything) translate whatever comes after #! in your URL into ?_escaped_fragment_=something.
You just need to (in your PHP script) check if $_GET['_escaped_fragment_'] is set, and if so, display the content for that value of something.
It's actually very easy.
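A minimal sketch of that check, assuming htmlsnapshot.php from the question renders the snapshot and renderNormalPage() is a placeholder for the usual AJAX shell:

```php
<?php
// Decide which page to serve: the crawler arrives with
// ?_escaped_fragment_=something (Google's rewrite of #!something),
// humans arrive without it and get the normal AJAX page.

function pickPage(array $get): string {
    if (isset($get['_escaped_fragment_'])) {
        // crawler path: serve the snapshot for this fragment
        return 'snapshot:' . $get['_escaped_fragment_'];
    }
    return 'normal'; // human path: normal AJAX-driven page
}

// In index.php this would look something like:
//   if (isset($_GET['_escaped_fragment_'])) {
//       include 'htmlsnapshot.php';   // pass $_GET['_escaped_fragment_'] to it
//   } else {
//       renderNormalPage();           // placeholder for the AJAX shell
//   }
```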

Using YQL in javascript/php to scrape article html?

I'm new to YQL, and just trying to learn how to do some fairly simple tasks.
Let's say I have a list of URLs and I want to get their HTML source as a string in javascript (so I can later insert it to a database via ajax). How would I go about getting this info back in Javascript? Or would I have to do it in PHP? I'm fine with either, really - whatever can work.
Here's an example query I'd run in their console:
select * from html where url="http://en.wikipedia.org/wiki/Baroque_music"
And the goal is to essentially save the HTML or maybe just the text or something, as a string.
How would I go about doing this? I somewhat understand how the querying works, but not really how to integrate it with JavaScript and/or PHP (say I have a list of URLs and I want to loop through them, getting the HTML at each one and saving it somewhere).
Thanks.
You can't read other pages with Javascript due to a built-in security feature in web browsers. It is called the Same origin policy.
The usual method is to scrape the content of these sites from the server using PHP.
There is another option in JavaScript, called a bookmarklet.
You can add the bookmarklet to your bookmarks bar, and click it each time you want the content of a site.
A script will be loaded into the host page; it can read the content and post it back to your server.
Oddly enough, the same origin policy does not prevent you from POSTing data from the host page to your domain. You need to POST a form to an iframe whose source is hosted on your domain.
You won't be able to read the response you get back from the POST.
But you can poll with setInterval, making a JSONP call to your domain, to find out whether the POST was successful.
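The "usual method" mentioned above (scraping from the server with PHP, where no same-origin policy applies) can be sketched as a loop over the URL list. The fetcher is injectable here so the loop itself can be exercised without network access; a real run would pass the cURL-based fetcher:

```php
<?php
// Loop over a list of URLs server-side and collect each page's HTML
// as a string, keyed by URL, ready to insert into a database.

function fetchAll(array $urls, callable $fetch): array {
    $pages = [];
    foreach ($urls as $url) {
        $pages[$url] = $fetch($url); // HTML source as a string
    }
    return $pages;
}

// A real fetcher (network required) would look like this:
function curlFetch(string $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Usage (sketch):
//   $pages = fetchAll($myUrlList, 'curlFetch');
```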

PHP and Javascript / Ajax caching for load speed - JSON and SimpleXML

I have a site that gets content from other sites via some JSON and XML APIs. To prevent loading problems and problems with rate limits, I do the following:
PHP - Show the cached content with PHP, if any exists.
PHP - If the content was never cached, show an empty error page and return 404. (The second time the page loads it will be fine, "success 200".)
Ajax - If a date field does not exist in the database, or the current date is earlier than the stored date, load/add content from the API and store a future date in the database. (This makes the page load fast, and the Ajax caches the content AFTER the page is loaded.)
I use Ajax just to trigger the PHP file; I fetch the content itself with PHP.
Questions
Because I cache the content AFTER it was loaded, the user will see the old content. What is the best way to show the NEW content to the user? I'm thinking of automatically reloading the page with JavaScript, or showing a message/nag. Other preferred ways?
If I use very many APIs, the Ajax load time will be long, and there's a bigger risk that some error will occur. Is there a clever way of splitting the load?
The second question is the important one.
Because I cache the content AFTER it was loaded the user will see the old content. Which is the best way to show the NEW content to the user. I'm thinking automatically with Javascript reload the page or message-nag. Other prefered ways?
I don't think you should reload the page via JavaScript; just use jQuery's .load(). This way the new content is inserted into the DOM without reloading the entire page. Maybe highlight the newly inserted content by adding some CSS via addClass().
If I use very many API:s the Ajax loadtime will be long and it's a bigger risk that some error will accur. Is there a clever way of splitting the load?
You should not be splitting the content in the first place; you should try to minimize the number of HTTP requests. If possible, do all the API calls offline using some sort of message queue, for example beanstalkd or redis. Also cache the data in an in-memory database such as redis. You can get a free redis instance thanks to http://redistogo.com; to connect to it you could use predis.
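The cache-aside pattern behind that advice can be sketched as follows. The store interface here is an assumption (a tiny in-process stand-in is included so the sketch runs by itself); with predis the set call would be something like $client->setex($key, $ttl, $value):

```php
<?php
// Check the in-memory cache before hitting an API; only on a miss do we
// make the slow remote call, then store the result with a TTL so an
// offline worker (or the next miss) refreshes it later.

function cachedFetch(string $key, $store, callable $fetchFromApi, int $ttl = 300) {
    $hit = $store->get($key);
    if ($hit !== null) {
        return $hit;                  // fast path: serve cached copy
    }
    $fresh = $fetchFromApi($key);     // slow path: one API request
    $store->set($key, $fresh, $ttl);  // e.g. predis: setex($key, $ttl, $fresh)
    return $fresh;
}

// Tiny in-process stand-in for redis, enough to run the sketch.
class ArrayStore {
    private $data = [];
    public function get($k) { return $this->data[$k] ?? null; }
    public function set($k, $v, $ttl) { $this->data[$k] = $v; }
}
```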
Why not use the following structure:
AJAX load content.php
And in content.php
check if content is loaded; yes > check if the date is new; yes > return content
there is content, but it's older > reload content from external > return content
there is no content > reload content from external > return content
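The decision logic of that content.php structure can be sketched as one function. The cached-row shape and the fetchExternal callback are assumptions standing in for the database and API code:

```php
<?php
// content.php decision flow: serve cached content while it is still fresh,
// refresh from the external API when it is stale or missing.
// $cached is null (never cached) or ['body' => ..., 'expires' => timestamp].

function contentFor(?array $cached, int $now, callable $fetchExternal): string {
    if ($cached !== null && $cached['expires'] > $now) {
        return $cached['body'];   // content loaded and date still new
    }
    return $fetchExternal();      // older or missing: reload from external
}
```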
And for your second question: it depends on how often the content of the APIs needs to be refreshed. If it's daily, you could run a script at night (or whenever the fewest people are active) to get all the new content, and then present that content during the day. This way you minimize the calls to external resources during peak hours.
If you have access to multiple servers, the clever way to split the load is to have each server handle a part of the requests.

Detecting when a link has been clicked with PHP

I have built an in browser engine that will retrieve pages without executing server side scripting... seems ridiculous, I know, but I'm doing this as part of a school project.
The problem that I am having is that once it displays the page if a link is clicked it will bring you to www.their-site.com instead of www.my-site.com?site=www.their-site.com.
Basically, I need my PHP page to detect when a link is clicked and, if so, add "www.my-site.com?" before it, so that all sites are still rendered without their server-side scripting. Is there any way to do this?
---------------EDIT---------------------------------------------------------------------------
Ok, I guess I wasn't clear enough the first time; sorry about that.
I have made a PHP page that displays the contents of any site without executing the server-side scripting that belongs to that page. This lets you get around those annoying news articles that give you a two-second glimpse before a login box appears. The problem is that once you've accessed a page, if you click any links you are connected to their server and the scripts turn back on. I want MY PHP to execute, not THEIRS.
You need to know what you want first.
You say no server side scripting, then you mention php.
To do this, I don't think you can manage with just JS.
You need to fetch the pages using PHP and, depending on the content, modify them so that when a link is clicked it sends an AJAX call to another page. This will require either regex replacement or an HTML DOM parser.
When a link is clicked, the AJAX call should go to your PHP page, which can then request that page, make the same modifications, and send it back to the browser. You can then use JS to replace the page contents.
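The rewriting step can be sketched with DOMDocument (the DOM-parser route rather than regex). The ?site= parameter name comes from the question; everything else is an assumption:

```php
<?php
// Load the fetched page into DOMDocument and prefix every <a href> with
// the proxy URL, so clicks stay on www.my-site.com?site=... instead of
// going straight to the target server.

function rewriteLinks(string $html, string $proxy): string {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ silences warnings on real-world HTML
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $a->setAttribute('href', $proxy . urlencode($href));
        }
    }
    return $doc->saveHTML();
}

// Usage (sketch):
//   echo rewriteLinks($fetchedHtml, 'http://www.my-site.com/?site=');
```

Relative hrefs would still need resolving against the target site's base URL before encoding; that step is omitted here.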