I'm trying to get Wikipedia pages (from a particular category) using the MediaWiki API. For this I'm following this tutorial, Listing 3. Listing pages within a category. My question is: how can I get Wikipedia pages without using Zend Framework? And are there any PHP-based REST clients that don't need to be installed? Zend requires installing their package first, plus some configuration, and I don't want to do all that.
After some googling and investigation I found a tool called cURL; using cURL with PHP, one can also build a REST service. I'm really new to implementing REST services, but I've already tried to implement something in PHP:
<?php
header('Content-type: application/xml; charset=utf-8');
function curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$wiki = "http://de.wikipedia.org/w/api.php?action=query&list=allcategories&acprop=size&acprefix=haut&format=xml";
$result = curl($wiki);
var_dump($result);
?>
But I got errors in the result. Could anyone help with this?
UPDATE:
This page contains the following errors:
error on line 1 at column 1: Document is empty
Below is a rendering of the page up to the first error.
Sorry for taking so long to reply, but better late than never...
When I run your code on the command line, the output I get is:
string(120) "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.
"
So it seems the problem is that you're running into the Wikimedia User-Agent policy for bots by not telling cURL to send a custom User-Agent header. To fix this, follow the advice given at the bottom of that page and add lines like the following to your script (alongside the other curl_setopt() calls):
$agent = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
Ps. You probably also don't want to set an application/xml content type unless you're sure that the content actually is valid XML. In particular, the output of var_dump() will not be valid XML, even if the input is.
For testing and development, I'd suggest either running PHP from the command line or using the text/plain content type. Or, if you prefer, use text/html and encode your output with htmlspecialchars().
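Putting these pieces together, a minimal sketch of the corrected script might look like the following. The function names buildCategoryQueryUrl() and fetchWithUserAgent() are made up for illustration, the User-Agent string in the usage comment is a placeholder to replace with your own contact details, and the query parameters are the ones from the question. The sketch requests the API over https, which is an assumption on my part rather than something the question does.

```php
<?php
// Build the api.php URL from an array of parameters (helper name is hypothetical).
function buildCategoryQueryUrl(string $prefix): string
{
    $params = [
        'action'   => 'query',
        'list'     => 'allcategories',
        'acprop'   => 'size',
        'acprefix' => $prefix,
        'format'   => 'xml',
    ];
    return 'https://de.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Fetch a URL while sending a descriptive User-Agent header.
// Returns the response body as a string, or false if the transfer failed.
function fetchWithUserAgent(string $url, string $agent)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// Usage (performs a network request):
// header('Content-Type: text/plain; charset=utf-8');
// $agent = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';
// echo fetchWithUserAgent(buildCategoryQueryUrl('haut'), $agent);
```

Sending text/plain here means the browser shows the raw response instead of trying to parse it as XML.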
Ps. Made this a community wiki answer, since I realized that this question has already been asked and answered before.
I'm trying to set up a bot for Bittrex using the Bittrex API. I previously tried using Python but had a hard time, as the documentation was in PHP (https://bittrex.com/Home/Api), so I decided to switch to PHP. I'm trying to create the bot but am having a hard time getting started. I pasted the initial code:
$apikey='xxx';
$apisecret='xxx';
$nonce=time();
$uri='https://bittrex.com/api/v1.1/market/getopenorders?apikey='.$apikey.'&nonce='.$nonce;
$sign=hash_hmac('sha512',$uri,$apisecret);
$ch = curl_init($uri);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('apisign:'.$sign));
$execResult = curl_exec($ch);
$obj = json_decode($execResult);
And according to this video: (sorry I had to add space because it doesn't allow me to post more than 2 links with low rep)
https:// youtu.be/K0lDTK3D-74?t=5m30s
It should return this: (Same as Above)
http:// i.imgur.com/jCoAUT9.png
But when I try to put the same thing in a PHP file, with my own API key and secret, I just get a blank web page with nothing on it. This is what my PHP file looks like (API key and secret removed for security reasons):
http://i.imgur.com/DYYoY0g.png
Any idea why this could be happening and how I could fix it?
Edit: No need for help anymore. I decided to go back to python and try to do it there and finally made it work :D
The video you're working from has faked their results. Their code doesn't do anything with the value of $obj, so I wouldn't expect anything to show up on the web page. (And definitely not with the formatting they show.)
If you're unfamiliar enough with PHP that this issue wasn't immediately apparent to you, this is probably a sign that you should step back and get more familiar with PHP before you continue -- especially if you're going to be running code that could make you lose a lot of money if it isn't working properly.
You need to echo your $obj or at least var_dump() it to see the content on a webpage.
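As a rough sketch of that fix, the snippet below wraps the question's request in two functions and prints the decoded result. It assumes the v1.1 endpoint and the apisign header from the question behave as described there; the function names buildSignedRequest() and getOpenOrders() are hypothetical. It also sets CURLOPT_RETURNTRANSFER, because without that option curl_exec() prints the response directly and returns true instead of the body.

```php
<?php
// Build the request URI and its HMAC-SHA512 signature, as in the question.
function buildSignedRequest(string $apikey, string $apisecret, int $nonce): array
{
    $uri = 'https://bittrex.com/api/v1.1/market/getopenorders?apikey=' . $apikey . '&nonce=' . $nonce;
    $sign = hash_hmac('sha512', $uri, $apisecret);
    return [$uri, $sign];
}

// Send the signed request and return the decoded JSON, or null if the transfer failed.
function getOpenOrders(string $apikey, string $apisecret)
{
    [$uri, $sign] = buildSignedRequest($apikey, $apisecret, time());
    $ch = curl_init($uri);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('apisign:' . $sign));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $execResult = curl_exec($ch);
    curl_close($ch);
    return $execResult === false ? null : json_decode($execResult);
}

// Usage (performs a network request):
// var_dump(getOpenOrders('xxx', 'xxx'));
```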
I am creating a PHP package that I want anyone to be able to use.
I've not done any PHP dev in a few years and I'm unfamiliar with pear and pecl.
The first part of my question is related to Pecl and Pear:
It seems to me that Pear and pecl are updating my computer, rather than doing anything to my code base, which leads me to the assumption that anything I do with them will also need to be duplicated by anyone wanting to use my package. Is that correct?
The 2nd part of my question is specific: I just want to do a simple HTTP (POST) request, and ideally I'd like to do it without any configuration required from those who use my package.
These are options I'm considering :
HTTPRequest seems like the perfect option, but it says "Fatal error: Uncaught Error: Class 'HttpRequest' not found" when I try and use it out of the box, and when I follow these instructions for installing it I get, "autoheader: error: AC_CONFIG_HEADERS not found in configure.in
ERROR: `phpize' failed" -- I don't want to debug something crazy like that in order to do a simple HTTP request, nor do I want someone using my package to have to struggle through something like that.
I've used HTTP_Request2 via a pear install and it works for me, but there is nothing added to my codebase at all, so presumably this will break for someone trying to use my package unless they follow the same install steps?
I know that I can use CURL but the syntax for that seems way over the top for such a simple action (I want my code to be really easy to read)
I guess I can use file_get_contents() .. is that the best option?
and perhaps I'll phrase the 2nd part of my question as :
Is there an approach that is considered best practice for (1) doing an HTTP request in PHP, and (2) creating a package that anyone can use easily?
This really depends on what you need your request for. While it can be daunting when you're first learning it, I prefer to use cURL most of the time, unless all I need to do is fetch a page without setting any headers. In my opinion it becomes pretty readable once you get used to the syntax and the various options. When all I need to do is fetch a page without headers, I usually use file_get_contents(), as it is simpler and nicer looking. I think most PHP developers would agree with me on this. I recommend cURL because, when you do need to set headers, its options keep the request well organized, and it is more commonly used for that than file_get_contents().
EDIT
When learning how to do cURL in PHP, the list of options on the documentation page is your friend! http://php.net/manual/en/function.curl-setopt.php
Here's an example of a simple POST request using PHP that will return the response text:
$data = array("arg1" => "val1", "arg2" => true); // POST data included in your query
$ch = curl_init("http://example.com"); // Set url to query
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST"); // Send via POST
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data)); // Set POST data
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return response text
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/x-www-form-urlencoded")); // send POST data as form data
$response = curl_exec($ch);
curl_close($ch);
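Since the question also asks about file_get_contents(), here is a hedged sketch of the same POST request using a stream context instead of cURL. It relies only on built-in functions; buildPostContext() and httpPost() are illustrative names, and http://example.com is a placeholder as above.

```php
<?php
// Build the stream-context options for a form-encoded POST (helper name is hypothetical).
function buildPostContext(array $data): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($data),
        ],
    ];
}

// Send the POST request and return the response text, or false on failure.
function httpPost(string $url, array $data)
{
    $context = stream_context_create(buildPostContext($data));
    return file_get_contents($url, false, $context);
}

// Usage (performs a network request):
// $response = httpPost('http://example.com', array('arg1' => 'val1', 'arg2' => true));
```

This variant needs no extension, but it only works for URLs when allow_url_fopen is enabled in the PHP configuration.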
This is the site I am referring to.
I have searched through Stack Overflow and tried various suggested PHP methods such as file_get_contents() and readfile(), but they cannot retrieve the table values from the site.
I tried viewing the page source and could not locate the table values there either. I also tried looking for an iframe src, but to no avail.
Not sure if there is any method which I can use to retrieve such value from the site?
Please advise.
The table's html seems to be generated on the client side (in your browser) with javascript, so it won't show up in the server's response in the way you see it in the browser (you can try disabling javascript and check the site). You can either:
Switch technology and use some kind of remote-controlled browser like PhantomJS.
Try to use their raw data. Just open up your browser's developer tools (usually F12) and check which URLs are fetched. You might need to analyze the site's JavaScript code to make sense of these. You should see something like this:
In both cases, check with the site's owners whether they are OK with this kind of use (read their data use policy if they have one, or just e-mail them); most site owners are not exactly happy about this kind of crawling.
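If the developer tools do reveal a JSON endpoint, the raw-data option can be sketched roughly as below. The URL and the 'rows', 'name' and 'value' keys are invented for illustration; the real endpoint and response shape have to be read off the network tab.

```php
<?php
// Turn a JSON response body into a list of [name, value] pairs.
// The 'rows', 'name' and 'value' keys are assumptions about the payload.
function extractRows(string $json): array
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded['rows']) || !is_array($decoded['rows'])) {
        return []; // not JSON, or not the shape we assumed
    }
    $rows = [];
    foreach ($decoded['rows'] as $row) {
        $rows[] = [$row['name'] ?? null, $row['value'] ?? null];
    }
    return $rows;
}

// Usage (hypothetical URL, performs a network request):
// $json = file_get_contents('https://example.com/data.json');
// if ($json !== false) {
//     print_r(extractRows($json));
// }
```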
Use cURL; please refer to this example:
<?php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
?>
I have a PHP script with which I'm trying to get the contents of a page. The code I'm using is below:
$url = "http://test.tumblr.com";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$txt = curl_exec($ch);
curl_close($ch);
echo "$txt";
It works fine for me as it is now. The problem I'm having is, if I change the string URL to
$url = "http://-test.tumblr.com"; or $url = "http://test-.tumblr.com";
It will not work. I understand that -test.example.com and test-.example.com are not valid hostnames, but with Tumblr they do exist. Is there a workaround for this?
I even tried creating a header redirect in another PHP file so that cURL would first get a valid hostname, but it behaves the same way.
Thank you
Domain Names with hyphens
As you can see in a previous question about the allowed characters in a subdomain, - is not a valid character to start or end a subdomain with. So this is actually correct behavior.
The same problem was reported on the curl mailing list some time ago, but since curl follows the standard, there is actually nothing to change on their side.
Most likely tumblr knows about this and therefore offers some alternative address leading to the same site.
Possible workaround
However, you could try using nslookup to manually look up the IP and then send your request directly to this IP (manually setting the hostname to the correct value). I didn't try this out, but it seems that nslookup is capable of resolving malformed domain names that start or end with a hyphen.
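That workaround could be sketched as follows, with the same caveat that it is untested against Tumblr. The IP address is a documentation placeholder standing in for whatever nslookup returns, and buildDirectIpOptions() is a made-up helper name.

```php
<?php
// Build cURL options that request the server by IP address while
// sending the original hostname in the Host header.
function buildDirectIpOptions(string $ip, string $hostname): array
{
    return [
        CURLOPT_URL            => 'http://' . $ip . '/',
        CURLOPT_HTTPHEADER     => ['Host: ' . $hostname],
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Usage (performs a network request; 203.0.113.10 is a placeholder IP):
// $ch = curl_init();
// curl_setopt_array($ch, buildDirectIpOptions('203.0.113.10', '-test.tumblr.com'));
// $txt = curl_exec($ch);
// curl_close($ch);
// echo $txt;
```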
curl
Additionally, you should know that PHP's cURL functions are a thin wrapper around libcurl, the same library the curl command-line tool is built on. If you encounter special behavior, it is therefore most likely due to the logic in libcurl and not the PHP functions.
I am still stuck on my screen scraping problem...
Link: screen scraping in php problem
This problem was solved to some extent by adding '&num=100' to the Google search query, which cut the number of requests by a factor of ten. But the captcha problem is still there. To work around it, I used the sleep(seconds) function.
Now the problem is that I have to scrape it myself (those are my orders). That means I can't use 'simple_html_dom.php', because catching warnings and errors is difficult (for me) with it. I've been instructed to do it myself, so how can I do it? I know two methods: 1. file_get_contents() 2. cURL.
But it's very tedious work to fetch the page, search for your content and count the rank at the same time, as using regular expressions to parse the DOM is hell. Read this link to convince yourself: RegEx match open tags except XHTML self-contained tags
Tasks to implement:
Catch the captcha error (or warning) so I can stop further execution.
Use headers, so that the request looks like a genuine, valid, human request to Google.
simple_html_dom.php can't catch errors; it shows a warning when a captcha error occurs. How can I catch that warning?
Please help... I've been working on this module for a long time. Please give suggestions for solving each of the problems described here.
Don't know about the first problem (captcha), but you can send headers easily with curl, for example:
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Charset: utf-8'));
And to set the user agent:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64; rv:2.2a1pre) Gecko/20110324 Firefox/4.2a1pre');
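A slightly fuller sketch, combining both options and adding a status check, is shown below. Treating an HTTP 429 or 503 response as a sign of blocking is an assumption about how the remote site signals rate limiting, not something confirmed in this thread, and the function names looksBlocked() and fetchPage() are hypothetical.

```php
<?php
// Decide whether a response status suggests the request was blocked
// or rate limited (the choice of 429 and 503 is an assumption).
function looksBlocked(int $status): bool
{
    return $status === 429 || $status === 503;
}

// Fetch a page with a custom header and user agent.
// Returns [status, body]; body is false if the transfer failed.
function fetchPage(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Charset: utf-8'));
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return [$status, $body];
}

// Usage (performs a network request):
// [$status, $body] = fetchPage('http://example.com', 'Mozilla/5.0 (X11; Linux x86_64)');
// if ($body === false || looksBlocked($status)) {
//     exit("Request failed or was blocked (HTTP $status)\n");
// }
```

Checking the status code before parsing gives the script a clean point at which to stop further execution.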