Fetching multiple URLs using cURL - PHP

Need a code example and/or guidance about fetching multiple URLs stored in a .txt file using cURL. Do I need to use a spider, or can I modify the code below, which works well for one URL?
<?php
$c = curl_init('http://www.example.com/robots.txt');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($c);
curl_close($c);
?>

Your question is vague, but I will try to answer it with the information you provided.
I would use PHP's explode() function:
$lines = explode(PHP_EOL, $page);
foreach ($lines as $line) {
    $val = explode(':', $line);
    // guard against lines that contain no colon
    if (isset($val[1])) {
        echo $val[1];
    }
}
Something like this should do the job.
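If the .txt file contains one URL per line, you do not need a spider; just read the file into an array and run your existing cURL snippet once per URL. A minimal sketch, assuming the list is stored in a file called urls.txt (the file name is only an example):
<?php
// Read the list of URLs, skipping blank lines.
$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$pages = array();
foreach ($urls as $url) {
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    // Store each response keyed by its URL for later processing.
    $pages[$url] = curl_exec($c);
    curl_close($c);
}
?>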

Related

Set Browser Cookies From Curl CookieJar/File

I am setting/storing cURL cookies with:
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
And retrieving them / trying to set them in my browser with:
setcookie($cookie);
But what goes in between, please?
The cookie variable is defined like this:
$cookie = "cookie.txt";
Is there some way to parse the cookie file as an array?
I can't find any official library that can do it for you.
This question has a script to parse the cookie file; pay attention to the answer about HttpOnly cookies.
Or you might want to parse the cookies directly from the cURL response. Then check this question.
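As a rough sketch of that second approach, you can ask cURL to include the response headers and pull the Set-Cookie lines out with a regular expression (the URL below is only a placeholder):
<?php
// Sketch: collect name/value pairs from Set-Cookie response headers.
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_HEADER, true);         // include headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
$cookies = array();
if (preg_match_all('/^Set-Cookie:\s*([^=]+)=([^;\r\n]*)/mi', $response, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $cookies[trim($m[1])] = $m[2];
    }
}
?>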
This is the way I did it:
$cookies = [];
$lines = file($cookiesFile);
foreach ($lines as $line) {
    // skip comment lines and blank lines in the Netscape cookie file
    if ($line[0] !== '#' && $line[0] !== "\n") {
        // fields are tab-separated; index 5 is the cookie name, index 6 its value
        $tokens = explode("\t", $line);
        $cookies[$tokens[5]] = trim($tokens[6]);
    }
}
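With the file parsed into $cookies like that, the missing step in between is to hand each pair to setcookie() before any output has been sent; a minimal sketch:
<?php
// Sketch: re-set the parsed cookies in the visitor's browser.
// $cookies is the name => value array built above.
foreach ($cookies as $name => $value) {
    setcookie($name, $value); // pass expiry, path, domain as extra arguments if needed
}
?>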

Getting resulting data from website query

I need to get the resulting data from a website query, for example:
http://www.uniprot.org/uniprot/?query=organism:9606+AND+gene:AEBP1+AND+reviewed:yes&sort=score&format=tab&columns=entry%20name
The resulting page shows:
Entry name
AEBP1_HUMAN
I need the result, in this case "AEBP1_HUMAN", to be displayed on my website. I'm confused about how to get it. Thanks.
The key point is that you can read the content of any URL like a file, because PHP supports stream wrappers for a variety of protocols.
The first example uses the file() function, which reads the entire content and splits it into an array of lines:
<?php
$content = file($url);
echo $content[1]; // index 1 is the second line, e.g. AEBP1_HUMAN
?>
In the second example you get the whole content as a string, so you have to split it at the line endings with explode():
<?php
$content = file_get_contents($url);
$lines = explode("\n", $content);
echo $lines[1];
?>
The third example uses a standard file open in combination with fgets(), which reads the content line by line:
<?php
$fp = fopen($url, 'r');
$line = fgets($fp); // first line: the "Entry name" header
$line = fgets($fp); // second line: the value we want
fclose($fp);
echo $line;
?>
The last example shows the usage of cURL. Don't forget to set the right options:
<?php
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
curl_close($ch);
$lines = explode("\n", $content);
echo $lines[1];
?>
Sometimes you may experience problems on public hosting servers where reading remote content is blocked, typically because the allow_url_fopen setting is disabled.
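If that happens, one quick thing to check is the allow_url_fopen setting; a small sketch (assuming $url is already defined) that falls back to cURL when the URL wrappers are disabled:
<?php
if (ini_get('allow_url_fopen')) {
    // URL wrappers are available, so the file-based functions work.
    $content = file_get_contents($url);
} else {
    // Fall back to cURL when remote reads via wrappers are blocked.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $content = curl_exec($ch);
    curl_close($ch);
}
?>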

How to get Wikipedia content section by section using Wikipedia API - PHP

Is there any better way to fetch the text content of particular sections from Wikipedia? I have the code below to skip some sections, but the process is taking too long to fetch the data I am looking for.
for ($i = 0; $i > 10; $i++) {
    if ($i != 2 || $i != 4) {
        $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section='.$i;
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_USERAGENT, "TestScript");
        $c = curl_exec($ch);
        $json = json_decode($c);
        $content = $json->{'parse'}->{'text'}->{'*'};
        print preg_replace('/<\/?a[^>]*>/', '', $content);
    }
}
For starters, your loop condition is $i>10, which is false from the very first check (since $i starts at 0), so the loop body never runs at all. Change it to $i<10, or if you need only a handful of sections, try:
foreach (array(1,3,5,6,7) as $i) {
    //your code
}
(Also note that $i != 2 || $i != 4 is always true; you would need && to actually skip sections 2 and 4.)
Second, decoding the JSON into an associative array like this:
$json = json_decode($c, true);
and referencing it as $json['parse']['text']['*'] is easier to work with, but that's up to you.
And third, you'll find that strip_tags() will likely function faster and more accurately than stripping tags with regular expressions.
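Putting the three points together, a rough sketch (keeping the page name from your code and the example section numbers from above) might look like this:
<?php
// Sketch: fetch only the wanted sections and strip the markup with strip_tags().
$sections = array(1, 3, 5, 6, 7); // example section numbers; adjust to your needs
foreach ($sections as $i) {
    $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section=' . $i;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, "TestScript");
    $c = curl_exec($ch);
    curl_close($ch);
    $json = json_decode($c, true); // associative array
    if (isset($json['parse']['text']['*'])) {
        print strip_tags($json['parse']['text']['*']);
    }
}
?>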

Pull text from another website

Is it possible to pull text data from another domain (not currently owned) using PHP? If not, is there any other method? I've tried using iframes, and because my page is a mobile website, things just don't look good. I'm trying to show a marine forecast for a specific area. Here is the link I'm trying to display.
Update:
This is what I ended up using. Maybe it will help someone else. However, I felt there was more than one right answer to my question.
<?php
$ch = curl_init("http://forecast.weather.gov/MapClick.php?lat=29.26034686&lon=-91.46038359&unit=0&lg=english&FcstType=text&TextType=1");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);
echo $content;
?>
This works as I think you want it to, except that it depends on the weather site keeping the same format (and on "Outlook" being displayed).
<?php
//define the URL of the resource
$url = 'http://forecast.weather.gov/MapClick.php?lat=29.26034686&lon=-91.46038359&unit=0&lg=english&FcstType=text&TextType=1';
//function from http://stackoverflow.com/questions/5696412/get-substring-between-two-strings-php
function getInnerSubstring($string, $boundstring, $trimit = false)
{
    $res = false;
    $bstart = strpos($string, $boundstring);
    if ($bstart !== false) {
        $bend = strrpos($string, $boundstring);
        if ($bend !== false && $bend > $bstart) {
            $res = substr($string, $bstart + strlen($boundstring), $bend - $bstart - strlen($boundstring));
        }
    }
    return $trimit ? trim($res) : $res;
}
//if the URL is reachable
if ($source = file_get_contents($url)) {
    //keep only the <hr> tags, then cut out the forecast text between them
    $raw = strip_tags($source, '<hr>');
    echo '<pre>' . substr(strstr(trim(getInnerSubstring($raw, "<hr>")), 'Outlook'), 7) . '</pre>';
} else {
    echo 'Error';
}
?>
If you need any revisions, please comment.
Try using a user agent as shown below. Then you can use SimpleXML to parse the contents and extract the text you want. For more info, see the SimpleXML documentation.
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "User-agent: www.example.com"
    )
);
$content = file_get_contents($url, false, stream_context_create($opts));
$xml = simplexml_load_string($content);
You may use cURL for that. Have a look at http://www.php.net/manual/en/book.curl.php

Use PHP to get embed src page information?

Sort of a weird question.
From 4shared video site, I get the embed code like the following:
<embed src="http://www.4shared.com/embed/436595676/acfa8f75" width="420" height="320" allowfullscreen="true" allowscriptaccess="always"></embed>
Now, if I access the URL in that embed src, the video loads and the URL of the page changes to one containing information about the video.
I am wondering if there is any way for me to access that info using PHP. I tried file_get_contents but it gives me lots of weird characters.
So, can I use PHP to load the embed URL and get the information present in the address bar?
Thanks for all your help! :)
Yes, e.g. with PHP's cURL library. It lets you inspect the redirect headers from the server, which contain the new/real URL of the video.
Here's some sample code:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.4shared.com/embed/436595676/acfa8f75");
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
// we want to further handle the content, so return it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// execute the request; because of CURLOPT_RETURNTRANSFER the headers end up in $result
$result = curl_exec($ch);
// did we get a good result?
if (!$result)
die ("error getting url");
// if we got a redirection http-code, split the content in
// lines and search for the Location-header.
$location = null;
if ((int)(curl_getinfo($ch, CURLINFO_HTTP_CODE) / 100) == 3) {
    $lines = explode("\n", $result);
    foreach ($lines as $line) {
        // skip lines that are not "Header: value" pairs
        if (strpos($line, ':') === false) {
            continue;
        }
        list($head, $value) = explode(":", $line, 2);
        if ($head == 'Location') {
            $location = trim($value);
            break;
        }
    }
}
if ($location == null)
die("no redirect found in header");
// close cURL resource, and free up system resources
curl_close($ch);
// your location is now in here.
var_dump($location);
?>
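Alternatively, if you only need the final address and not the intermediate headers, you could let cURL follow the redirect itself and then ask for the effective URL; a brief sketch:
<?php
// Sketch: let cURL follow the redirect and report the final URL.
$ch = curl_init("http://www.4shared.com/embed/436595676/acfa8f75");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location headers automatically
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
var_dump($finalUrl);
?>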
