which is better to get the contents of a web page - php

To get the contents of a web page, which is better:
1) Using cURL
2) Using file_get_contents
3) Anything better than the above two?
Thanks in advance for any answers.

It all boils down to your needs. file_get_contents is a simple and convenient way to get the contents of a webpage, if the HTTP headers it sends are okay for you. However, if you need more complexity, like HTTP authentication or custom headers, cURL will be a better fit.
If you just want to retrieve the contents of a public URL, I'd go for file_get_contents.

1) cURL gives you much more advanced features for sending cookies and custom headers, setting timeouts and retries, etc., and is widely supported on web hosts.
2) On the other hand, file_get_contents is better for quick jobs (and for beginners) when you do not need anything fancy and you are sure that URL wrappers are enabled on the server (allow_url_fopen); on many hosts they are disabled.
3) There is nothing better than cURL.
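For illustration, a minimal sketch of both approaches; the URL, timeout and user-agent values are placeholders, not part of the original answers:

```php
<?php
// Simple case: file_get_contents with a stream context for a timeout and
// a custom User-Agent (assumes allow_url_fopen is enabled on the server).
function fetch_simple($url) {
    $context = stream_context_create([
        'http' => ['timeout' => 10, 'user_agent' => 'MyFetcher/1.0'],
    ]);
    return file_get_contents($url, false, $context);
}

// More control: cURL with a timeout, redirect following and a custom User-Agent.
function fetch_curl($url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_USERAGENT      => 'MyFetcher/1.0',
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body; // false on failure
}
```

Both return the page body as a string, or false on failure.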

Related

file_get_contents gets different file from google than shown in browser

I use file_get_contents to find out whether a URL shows up in the search results I am looking at:
http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate
If I go to this URL in my browser, a different page is displayed than what I see when I echo file_get_contents:
$url = "http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate";
$google_search = file_get_contents($url);
What's wrong with my code?
Nothing, really. The problem is that the page uses JavaScript and AJAX to load its contents. So, in order to get a "snapshot" of the page, you need to "run" it; that is, you need to execute the JavaScript code, which PHP does not do.
Your best bet is to use a headless browser such as PhantomJS. If you search, you will find some tutorials explaining how to do it.
NOTE
If all you're looking for is a way to retrieve raw data from the search, you might want to try to use google's search api.
I assume Google is definitely checking the user agent to block automated searches.
So you should at least use cURL and set a proper user-agent string (i.e. the same as a common browser) to "trick" Google.
Somehow I fear it will not be that easy to trick Google, but maybe I'm just paranoid, and at the very least you may learn something about cURL.
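A minimal sketch of such a request with cURL; the user-agent string below is just an example of a browser-like UA, and Google may still detect or block automated queries:

```php
<?php
// The Google search URL from the question goes here:
$url = 'http://www.google.com/search?q=...';

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    // Browser-like User-Agent (example string, not authoritative):
    CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0',
]);
$html = curl_exec($ch);
curl_close($ch);
```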

How to properly serve CSS

Say I for some reason want to serve my CSS through PHP (because of pre-processing, merging, etc). What do I need to do in my PHP to make this work well? Other than the most obvious:
header('Content-Type: text/css; charset=utf-8');
What about headers related to caching, modification times, etags, etc? Which ones should I use, why and how? How would I parse incoming headers and respond appropriately (304 Not Modified for example)?
Note: I know this can be tricky and that it would be a lot easier to just do what I want to do with the CSS before I deploy it as a regular CSS file. If I wanted to do it that way, I wouldn't have asked this question. I'm curious to how to do this properly and would like to know. What I do or could do beforehand with the CSS is irrelevant; I just want to know how to serve it properly :)
Note 2: I really would like to know how to do this properly. I feel most of the activity on this question has turned into me defending why I would want to do this, rather than getting answers on how to do this. Would very much appreciate it if someone could answer my question rather than just suggesting things like SASS. I'm sure it's awesome, and I might try it out sometime, but that's not what I'm asking about now. I want to know how to serve CSS through PHP and learn how to deal with the caching and things like that properly.
A commendable effort. Caching gets way too little good will. Please enjoy my short prose attempting to help you on your way.
The summary
Sending an ETag and a Last-Modified header will enable the browser to send an If-Modified-Since and an If-None-Match header back to your server on subsequent requests. You may then, when applicable, respond with a 304 Not Modified HTTP status code and an empty body, i.e. Content-Length: 0. Including an Expires header will help you serve fresh content one day when the content has indeed changed.
The apprentice
Sounds simple enough, but it can be a bit tricky to get just right. Luckily for us all, there is really good guidance available.
Once you get it up and running, please turn to REDbot to help you smooth out any rough corners you may have left in.
The expert
For the value of the ETag, you will want to have something you can reproduce, but will still change whenever the content does. Otherwise you will not be able to tell whether the incoming value matches or not. A good candidate for a reproducible value which still changes when the content does, is an MD5 hash of the mtime of the file being served through the cache. In your case, it would probably be a sum for all the files being merged.
For Last-Modified the logical answer is the actual mtime of the file being served. Why neglect the obvious. Or for a group of files, as in your case, use the most recent mtime in the bunch.
For Expires, simply choose an appropriate TTL, or time-to-live, for the asset. Add this number to the asset's mtime, or the value you chose for Last-Modified, and you have your answer.
You may also want to include Cache-Control headers to let possible proxies on the way know how to properly serve their clients.
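To make the above concrete, here is a minimal sketch under the stated assumptions; the file paths and TTL are placeholders, and the validator scheme follows the mtime-hash suggestion above:

```php
<?php
// Decide whether a 304 is appropriate, given our validators and the
// browser's conditional headers (null when the header was absent).
function not_modified($etag, $lastModified, $ifNoneMatch, $ifModifiedSince) {
    if ($ifNoneMatch !== null) {
        return $ifNoneMatch === $etag;  // ETag wins when both are sent
    }
    return $ifModifiedSince !== null && $ifModifiedSince >= $lastModified;
}

// Hypothetical list of the stylesheets being merged:
$files = ['/path/to/a.css', '/path/to/b.css'];
$mtimes = array_map('filemtime', $files);
$lastModified = max($mtimes);
$etag = '"' . md5(implode(',', $mtimes)) . '"';  // MD5 over the mtimes
$ttl  = 86400; // one day

header('Content-Type: text/css; charset=utf-8');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
header('ETag: ' . $etag);
header('Expires: ' . gmdate('D, d M Y H:i:s', $lastModified + $ttl) . ' GMT');
header('Cache-Control: public, max-age=' . $ttl);

$inm = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] : null;
$ims = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
     ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) : null;

if (not_modified($etag, $lastModified, $inm, $ims)) {
    header('HTTP/1.1 304 Not Modified');
    exit; // empty body, Content-Length: 0
}
foreach ($files as $f) {
    readfile($f); // serve the merged stylesheets
}
```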
The scholar
For a more concrete response to your question, please refer to these questions predating yours:
What headers do I want to send together with a 304 response?
Get Browser to send both If-None-Match and If-Modified-Since
HTTP if-none-match and if-modified-since and 304 clarification in PHP
Is my implementation of HTTP Conditional Get answers in PHP is OK?
The easiest way to serve CSS (or JavaScript) through PHP would be to use Assetic, a super-useful PHP asset manager similar to Django's contrib.staticfiles or Ruby's Jammit. It handles caching and cache invalidation, dynamic minification, compression, and all the "tricky bits" that were mentioned in other answers.
To understand how to write your own asset server properly, I strongly recommend you read Assetic's source code. It's very commented and readable, and you'll learn a lot about best practices regarding caching, minification, and everything else that Assetic does so well.
One common pattern is to include a meaningless GET parameter. In fact, Stack Exchange sites do exactly this:
<link ... href="http://cdn.sstatic.net/stackoverflow/all.css?v=0285b0392b5c">
The v (version) is presumably a hash of some kind, probably of the css file itself. They do not store the old sheets, it's just a way to force the browser to download the new file and not use the cached one.
With this setup, it is safe to set Cache-Control:max-age to a large value.
The ETag will make the server reply with 304 if the file is not modified; you might as well use the same hash:
header('ETag: "' . md5_file('/path/to/css/file') . '"');
I just finished explaining here why I don't think PHP-processed CSS is a good idea; I believe most people who implement it would be better served by another application structure. Take a look.
If you must do it, making caching work will require keeping track of each variant independently and having the client send a parameter which uniquely identifies that variant (so you can say "not modified").
The Content-Type header is a good start, but not the tricky bit...
You can add a query string to the end of the CSS (or JavaScript) file's URL; that is a good way to tell browsers it is a new file, since until then they keep treating it as the same cached file:
www.example.com/css/tooltip.css?version=1.0
or
www.example.com/css/tooltip.css?12-01-2012
This way the browser understands it is a new file, reloads it, and keeps it in cache until the next release. It is also easy to maintain if you automatically append the date (or file modification time) to the query string using PHP.
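A small sketch of automating this with the file's modification time; the helper name and paths are mine, not from the answer:

```php
<?php
// Hypothetical helper: append the file's mtime as a version parameter,
// so the URL changes automatically whenever the file changes.
function versioned_css_url($webPath, $fsPath) {
    return $webPath . '?v=' . filemtime($fsPath);
}

echo '<link rel="stylesheet" href="'
    . versioned_css_url('/css/tooltip.css', __DIR__ . '/css/tooltip.css')
    . '">';
```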

Strip all headers in PHP response

How, if at all, can I strip all the headers from a PHP response served through Apache, in order to stream just the text response? I've tried adding a custom .htaccess, but to no avail. I have limited control of the hosting server. The stream is read by an embedded device which doesn't need any headers.
It gets to a point where certain headers are NEEDED by the browser so it can render the output. If the reason you want to remove the headers is a chat-like feature, think about using a persistent keep-alive connection.
Tips in reducing bandwidth
Use ajax: keep the response from PHP in JSON format and update DOM elements
Gzip.
Just don't worry about headers: typically an HTTP OK response will only take up < 200 bytes, hardly anything in comparison to the actual page content. Focus on where it really matters.
Edit:
To suit your case, look into using sockets (UDP would be a good option if you want to cut back on a lot of bandwidth): socket_listen() (non-UDP) or socket_bind(), which is capable of UDP.
That's impossible.
You are using the HTTP protocol, and an HTTP response always contains headers.
Either do not use HTTP, or teach your device to strip the headers; it's not that hard.
Anyway, PHP has very little to do with removing headers. There is also a web server that actually interacts with your device and is the one taught to send the proper headers.
There is a PHP function called header_remove(). I have never used it before, but you can try whether it works for you. Note that this function is available since PHP 5.3.0.
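A small sketch of what header_remove() can (and cannot) do; the header names below are examples:

```php
<?php
// PHP >= 5.3: drop individual response headers before any output is sent.
header_remove('X-Powered-By'); // PHP's own signature header
header_remove('Pragma');
// Caveat: this cannot remove the HTTP status line itself, and Apache may
// still add headers of its own (Date, Server, ...), so the device will
// still receive some header text.
echo 'payload';
```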

How to get full text of request (not response) headers?

I have to ensure that curl is sending cookies properly, so I need to view the full text of the request (not response!) HTTP headers. Is there any way to do it?
Thank you
Obviously, you need to intercept the request headers at some point. Since you want to see the headers curl sends, you could set up a transparent proxy. For a relatively simpler solution, I would suggest a tool such as Wireshark.
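Alternatively, PHP's cURL binding can record the request headers it sent, via the CURLINFO_HEADER_OUT option; a sketch (the URL and cookie are placeholders):

```php
<?php
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);    // track outgoing request headers
curl_setopt($ch, CURLOPT_COOKIE, 'name=value'); // the cookie being verified
curl_exec($ch);
$requestHeaders = curl_getinfo($ch, CURLINFO_HEADER_OUT);
curl_close($ch);
echo $requestHeaders; // full text of the request headers, cookies included
```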

CURL vs fopen vs fsocketopen?

I want to write a WordPress plugin that parses all image sources and checks whether each link is broken.
My idea is:
Select all images from posts and pages with a regex against the MySQL data
Request each image URL and read the response header (404, 403 errors, etc.)
Print a report
Since I don't actually need to download the binary files, how do cURL, fopen, and fsockopen compare in performance? Which one is best to use?
And one more question: which method can execute in multiple threads?
The cost of opening a connection to the remote server makes the performance of the library a fairly moot point. In other words it isn't worth worrying about the performance of the functions.
A better option would be to use whatever function allows you to make HEAD requests (which only return the HTTP headers). While you can do it with fsockopen (I don't know about fopen), it is a lot of work when cURL already has code written to send the request and parse the response.
For an example of how to do a head request using cURL see this answer.
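A sketch of such a HEAD request; the helper name is mine, not from the linked answer:

```php
<?php
// Hypothetical helper: fetch only the status code of a URL via a HEAD request.
function head_status($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // e.g. 200, 403, 404
    curl_close($ch);
    return $code;
}
```

Calling head_status() on each image URL and collecting the non-200 codes would give the report the question asks for.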
And one more question, which method can execute in multi-thread?
PHP doesn't have native threads.
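That said, cURL's "multi" interface lets a single PHP process run many requests concurrently, which covers the link-checking use case; a sketch with placeholder URLs:

```php
<?php
$urls = ['http://example.com/a.png', 'http://example.com/b.png']; // placeholders
$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}
// Drive all transfers until every one has finished.
do {
    curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running > 0);
foreach ($handles as $url => $ch) {
    echo $url, ' => ', curl_getinfo($ch, CURLINFO_HTTP_CODE), "\n";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```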
