I am a beginner in PHP, and I want to use a regex to remove the part of a string that comes after the first whitespace I find.
My code looks like this:
<?php
$barcode = $_POST['barcode'];
$adress = "https://...";
$timeout = 40;
$ch = curl_init($adress);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if (strpos($adress, 'https://') === 0) {
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
}
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page_content = curl_exec($ch);
$dom = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$node = $dom->loadHTML($page_content);
$listeImages = $dom->getElementsByTagName('img');
foreach ($listeImages as $image)
{
$link = $dom->createElement("a");
$href = $dom->createAttribute('srcset');
$href->value = $image->getAttribute('srcset');
$value_img = print( $href -> nodeValue.'<br />');
$image->parentNode->replaceChild($link, $image);
$link->appendChild($href);
/*$exclusion_space_url=array(' ', ' 1x');
$url_cleaned=$this->$value_img;
foreach ($exclusion_space_url as $key=>$value) {
$url_cleaned=str_replace($value," ",$url_cleaned);*/ // what I tried to do
}
}
curl_close($ch);
?>
And the result of this is:
An HTML page with 3-4 lines of URLs separated by spaces.
Thanks for your help.
$yourString = "Your String With Space";
$str= strtok($yourString,' ');
echo $str;
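Applied to the question's case, the same idea could look like the sketch below. The srcset value is a made-up example (in the real code it would come from $image->getAttribute('srcset')), and the preg_replace line is only there because the question asked for a regex; strtok alone should be enough.

```php
<?php
// Hypothetical srcset value; the real one would come from $image->getAttribute('srcset').
$srcset = 'https://example.com/img/a.jpg 1x, https://example.com/img/a@2x.jpg 2x';

// strtok() returns everything before the first occurrence of the delimiter.
$firstUrl = strtok($srcset, ' ');

// Regex alternative: remove the first whitespace character and everything after it.
$viaRegex = preg_replace('/\s.*$/s', '', $srcset);

echo $firstUrl, PHP_EOL;
echo $viaRegex, PHP_EOL;
```

Both variables end up holding only the first URL of the list, without the " 1x" descriptor.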
I'm getting "Invalid argument supplied for foreach()" on the foreach line when trying to read a specific item of the XML.
PHP:
$urls="http://www.floatrates.com/daily/usd.xml";
$url= $urls;
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
$data = curl_exec($ch);
curl_close($ch);
$feed = new SimpleXMLElement($data,LIBXML_NOCDATA);
foreach ($feed->channel->item as $item) {
$title = $item->title;
if (preg_match('/ARS/',$title)) {
echo $title;
}
}
$urls = "http://www.floatrates.com/daily/usd.xml";
$url = $urls;
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
$data = curl_exec($ch);
curl_close($ch);
$feed = new SimpleXMLElement($data, LIBXML_NOCDATA);
foreach ($feed->children() as $item) {
if ("item" == $item->getName()) {
$title = $item->title;
if (preg_match('/ARS/', $title)) {
echo $title;
}
}
}
It probably means that $feed does not contain what you expect it to contain, so you are not passing something iterable to foreach. Use print_r($feed); to see what it actually contains, and go from there.
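A minimal sketch of that kind of check, using an inline XML string instead of the live feed. The sample markup is an assumption: it puts channel at the root, which is what the working loop over $feed->children() suggests, but the real feed may differ.

```php
<?php
// Hypothetical sample standing in for the string returned by curl_exec().
$data = '<channel><title>Rates</title>'
      . '<item><title>1 USD = 350 ARS</title></item>'
      . '<item><title>1 USD = 0.9 EUR</title></item>'
      . '</channel>';

// curl_exec() returns false on failure, so check before parsing.
if (!is_string($data) || $data === '') {
    exit("Empty response, nothing to parse\n");
}

libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
$feed = simplexml_load_string($data, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($feed === false) {
    exit("Could not parse the response as XML\n");
}

// Shows which element is the root, so you know where the items hang.
echo 'Root element: ', $feed->getName(), PHP_EOL;

$matches = [];
foreach ($feed->item as $item) {
    $title = (string) $item->title;
    if (strpos($title, 'ARS') !== false) {
        $matches[] = $title;
    }
}
print_r($matches);
```

If the root element is already channel, then $feed->channel->item does not exist, which would explain the warning on the foreach line.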
When I decode the commented-out "$jsonString" string, it works very well.
But after using cURL it does not work and shows NULL.
Please help me with this.
if (isset($_POST['dkno'])) {
$dcktNo = $_POST['dkno'];
$url = 'http://ExampleStatus.php?dkno=' . $dcktNo;
$myvars = '';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $myvars);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonString = curl_exec($ch);
// $jsonString = '[{"branchname":"BHUBNESHWAR","consignee":"ICICI BANK LTD","currentstatus":"Delivered by : BHUBNESHWAR On - 25/07/2015 01:00","dlyflag":"Y","PODuploaded":"Not Uploaded"}]';
if ($jsonString != '') {
$json = str_replace(array('[', ']'), '', $jsonString);
echo $json;
$obj = json_decode($json);
if (is_null($obj)) {
die("<br/>Invalid JSON, don't need to keep on working on it");
} else {
$podStatus = $obj->PODuploaded;
}
}
}
After the cURL call, I used the following approach to get only the JSON data from the HTML page.
1) fetchData.php
$url = 'http://DocketStatusApp.aspx?dkno=' . $dcktNo;
$myvars = '';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $myvars);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonString = curl_exec($ch);
// now get only value
$dom = new DOMDocument();
$dom->loadHTML($jsonString);
$thediv = $dom->getElementById('Label1');
echo $thediv->textContent;
2) JSONprocess.php
if (isset($_POST['dkno'])) {
$dcktNo = $_POST['dkno'];
ob_start(); // begin collecting output
include_once 'fetchData.php';
$result = ob_get_clean(); // Completed collecting output
// Now it will show & take only JSON Data from Div Tag
$json = str_replace(array('[', ']'), '', $result);
$obj = json_decode($json);
if (is_null($obj)) {
die("<br/>Invalid JSON, don't need to keep on working on it");
} else {
$podStatus = $obj->PODuploaded;
}
}
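As a side note, stripping the brackets with str_replace is not strictly needed and would also damage any value that happens to contain "[" or "]". A minimal sketch of decoding the array as it is, using the sample payload from the commented-out line in the question (the variable $rows is just an illustrative name):

```php
<?php
// Sample payload copied from the commented-out line in the question.
$jsonString = '[{"branchname":"BHUBNESHWAR","consignee":"ICICI BANK LTD","currentstatus":"Delivered by : BHUBNESHWAR On - 25/07/2015 01:00","dlyflag":"Y","PODuploaded":"Not Uploaded"}]';

// Decode the JSON array directly; trim() removes stray whitespace around the text.
$rows = json_decode(trim($jsonString), true);
if (!is_array($rows) || !isset($rows[0])) {
    // json_last_error_msg() says why decoding failed.
    exit('Invalid JSON: ' . json_last_error_msg() . PHP_EOL);
}

// The first (and here only) element of the array is the record.
$podStatus = $rows[0]['PODuploaded'];
echo $podStatus, PHP_EOL;
```

Printing json_last_error_msg() when decoding fails usually makes it obvious whether the response was HTML rather than JSON.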
I was using the following code to expand the tinyurl from Twitter:
function MyURLDecode($url)
{
$ch = @curl_init($url);
@curl_setopt($ch, CURLOPT_HEADER, TRUE);
@curl_setopt($ch, CURLOPT_NOBODY, TRUE);
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
@curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$url_resp = @curl_exec($ch);
preg_match('/Location:\s+(.*)\n/i', $url_resp, $i);
if (!isset($i[1]))
{
return $url;
}
return $i[1];
}
$url = MyURLDecode($url);
Then I use this result to get the tweet count value:
function get_twitter_url_count($url) {
$encoded_url = urlencode($url);
$content = @file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url=' . $encoded_url);
return $content ? json_decode($content)->count : 0;
}
The above code works fine. But due to some issue I need to change the URL decoding function, which now looks like this:
function TextAfterTag($input, $tag)
{
$result = '';
$tagPos = strpos($input, $tag);
if (!($tagPos === false))
{
$length = strlen($input);
$substrLength = $length - $tagPos + 1;
$result = substr($input, $tagPos + 1, $substrLength);
}
return trim($result);
}
function expandUrlLongApi($url)
{
$format = 'json';
$api_query = "http://api.longurl.org/v2/expand?" .
"url={$url}&response-code=1&format={$format}";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $api_query );
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_HEADER, false);
$fileContents = curl_exec($ch);
curl_close($ch);
$s1=str_replace("{"," ","$fileContents");
$s2=str_replace("}"," ","$s1");
$s2=trim($s2);
$s3=array();
$s3=explode(",",$s2);
$s4=TextAfterTag($s3[0],(':'));
$s4=stripslashes($s4);
return $s4;
}
This code gives the decoded URL correctly, in fact with better accuracy than the previous one, but now its result does not work with this function:
function get_twitter_url_count($url) {
$encoded_url = urlencode($url);
$content = @file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url=' . $encoded_url);
return $content ? json_decode($content)->count : 0;
}
Rather than giving the correct count value, it always gives 0.
Can someone tell me what is wrong here?
Full code:
<?php
function TextAfterTag($input, $tag)
{
$result = '';
$tagPos = strpos($input, $tag);
if (!($tagPos === false))
{
$length = strlen($input);
$substrLength = $length - $tagPos + 1;
$result = substr($input, $tagPos + 1, $substrLength);
}
return trim($result);
}
function expandUrlLongApi($url)
{
$format = 'json';
$api_query = "http://api.longurl.org/v2/expand?" .
"url={$url}&response-code=1&format={$format}";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $api_query );
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_HEADER, false);
$fileContents = curl_exec($ch);
/* preg_match('/Location:\s+(.*)\n/i', $fileContents, $i);
if (!isset($i[1]))
{
return '';
}
else
$fileContents = $i[1]; */
curl_close($ch);
$s1=str_replace("{"," ","$fileContents");
$s2=str_replace("}"," ","$s1");
$s2=trim($s2);
$s3=array();
$s3=explode(",",$s2);
$s4=TextAfterTag($s3[0],(':'));
$s4=stripslashes($s4);
return $s4;
}
$url = expandUrlLongApi('http://t.co/SFdM0wjtGA');
echo "Url is : $url <br/>";
$url = get_twitter_url_count($url);
?>
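One possible cause, sketched below under an assumption: if the API response is a JSON object whose first field holds the URL, the manual string surgery leaves the JSON double quotes around the value, so the count lookup receives a URL wrapped in quotes. The response body and the "long-url" key here are hypothetical stand-ins, not taken from the API documentation.

```php
<?php
// Hypothetical response body; the real one comes from curl_exec() and its shape is an assumption.
$fileContents = '{"long-url":"http:\/\/example.com\/page","response-code":"200"}';

// Roughly what the manual approach produces: the JSON double quotes are still there.
$firstField = explode(',', trim($fileContents, '{} '))[0];
$manual = stripslashes(trim(substr($firstField, strpos($firstField, ':') + 1)));

// json_decode() unescapes the value and drops the surrounding quotes.
$decoded = json_decode($fileContents, true);
$longUrl = is_array($decoded) && isset($decoded['long-url']) ? $decoded['long-url'] : '';

var_dump($manual);  // still wrapped in double quotes
var_dump($longUrl); // plain URL
```

If echo "Url is : $url" prints the URL with quotes around it, this is likely what is happening, and passing the json_decode() result to get_twitter_url_count() would be worth trying.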
How can I parse the images on this site with cURL?
With this code I can show the whole site's HTML, but I need only the images:
$ch = curl_init('http://www.lamoda.ru/shoes/sapogi/?sitelink=leftmenu&sf=16&rdr565=1#sf=16');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, '1');
$text = curl_exec($ch);
curl_close($ch);
if (!preg_match('/src="https?:\/\/"/', $text))
$text = preg_replace('/src="(.*)"/', "src=\"$MY_BASE_URL\\1\"", $text);
echo $text;
Thank you!
I tried this:
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, '1');
$text = curl_exec($ch);
curl_close($ch);
$doc = new DOMDocument();
@$doc->loadHTML($text->content);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img)
{
$imgarray[] = $img -> getAttribute('src');
}
return $imgarray;
BUT: on this site the images are loaded via JS, so it doesn't show any images at all =((
You can use a DOM Parser to achieve this:
$ch = curl_init('URL_GOES_HERE');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, '1');
$text = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument;
$dom->loadHTML($text);
foreach ($dom->getElementsByTagName('img') as $img) {
echo $img->getAttribute('src');
}
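A slightly more defensive sketch of the same DOM approach, run against an inline HTML string so it can be tried without a network call. The markup and $baseUrl are made-up values; the base URL handling only covers plain relative paths, not protocol-relative ones.

```php
<?php
// Hypothetical markup standing in for the page fetched with cURL.
$text = '<html><body><img src="/img/a.jpg"><img src="http://cdn.example.com/b.png"><img alt="no src"></body></html>';
$baseUrl = 'http://www.example.com'; // assumed site root for relative paths

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($text);
libxml_clear_errors();

$images = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    if ($src === '') {
        continue; // no src in the markup, e.g. an image filled in later by JavaScript
    }
    // Keep absolute URLs, prefix everything else with the assumed base URL.
    $images[] = preg_match('#^https?://#i', $src) ? $src : $baseUrl . '/' . ltrim($src, '/');
}
print_r($images);
```

Note that this only sees images present in the HTML the server sends; anything injected by JavaScript after page load will not be in $text.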
You can use the HTML parser simple_html_dom:
http://simplehtmldom.sourceforge.net/manual.htm
// Create DOM from URL or file
$url = 'http://www.lamoda.ru/shoes/sapogi/?sitelink=leftmenu&sf=16&rdr565=1#sf=16';
$html = file_get_html($url);
// Find all images
foreach($html->find('img') as $element)
echo $element->src;
I am trying to create a program that will open a text file with URLs separated by |. It will then take the first URL from the text document, crawl that URL, and remove it from the text file. Each URL is to be scraped by a basic crawler. I know the crawler part works because if I enter one of the URLs in quotation marks, rather than a variable from the text file, it works. I am at the point where it will not return anything because the URL simply will not be accepted.
This is a basic version of my code, because I had to break it down a lot to isolate the problem.
$urlarray = explode("|", $contents = file_get_contents('urls.txt'));
$url = $urlarray[0];
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($url);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $element)
{
$title = $element->getAttribute('title');
$class = $element->getAttribute('class');
if($class == 'result_link')
{
$title = str_replace('Synonyms of ', '', $title);
echo $title . "<br />";
}
}
The code below works like a champ; I tested it with your example data:
<?php
$urlarray = explode("|", $contents = file_get_contents('urls.txt'));
$url = $urlarray[0];
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $element)
{
$title = $element->getAttribute('title');
$class = $element->getAttribute('class');
if($class == 'result_link')
{
$title = str_replace('Synonyms of ', '', $title);
echo $title . "<br />";
}
}
?>
Almost forgot: let's now put it in a loop to go through all the URLs:
<?php
$urlarray = explode("|", $contents = file_get_contents('urls.txt'));
$url = $urlarray[0];
foreach($urlarray as $url) {
if(!empty($url)) {
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,trim($url));
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $element)
{
$title = $element->getAttribute('title');
$class = $element->getAttribute('class');
if($class == 'result_link')
{
$title = str_replace('Synonyms of ', '', $title);
echo $title . "<br />";
}
}
echo '<hr />';
}
}
?>
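The question also mentions removing the crawled URL from the text file, which the loop above does not do. A minimal sketch of that step, assuming the file holds URLs separated by | and using a temporary file in place of urls.txt:

```php
<?php
// A temporary file stands in for urls.txt; its contents are made up.
$file = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($file, "http://a.example|http://b.example|http://c.example\n");

// Split on "|", trim whitespace and newlines, and drop empty entries.
$urls = array_values(array_filter(array_map('trim', explode('|', file_get_contents($file))), 'strlen'));

// Take the first URL off the list and write the rest back to the file.
$url = array_shift($urls);
file_put_contents($file, implode('|', $urls));

$remaining = file_get_contents($file);
unlink($file); // clean up the temporary file

echo "Crawling: $url", PHP_EOL;
echo "Left in file: $remaining", PHP_EOL;
```

The trim step matters here: a trailing newline on the last entry is enough to make an otherwise valid URL fail.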
So if you put in a URL manually, $url = 'http://www.mywebsite.com';, everything works as expected?
If so, there is a problem here:
$urlarray = explode("|", $contents = file_get_contents('urls.txt'));
Are you sure urls.txt is loading? Are you sure it contains http://a.com|http://b.com etc.?
I would var_dump
$contents = file_get_contents('urls.txt') before the explode statement to see if it is loading.
If yes, then I would explode it into $urlarray and var_dump $urlarray[0].
If it looks right, I would trim it before sending it to the DOM with trim($urlarray[0]).
I might even go as far as using a regex to make sure these URLs are in fact URLs before sending them to the DOM.
Let me know the results and I will try to help further, or post all the sample code including urls.txt
and I will run it locally.
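The trim-and-validate steps could be sketched as follows. Instead of a hand-written regex this uses PHP's built-in filter_var() with FILTER_VALIDATE_URL, and the file contents are a made-up example with a bad entry and a trailing newline.

```php
<?php
// Hypothetical file contents, including stray whitespace and an entry that is not a URL.
$contents = "http://a.com|not a url| http://b.com\n";

$valid = [];
foreach (explode('|', $contents) as $candidate) {
    $candidate = trim($candidate); // remove surrounding spaces and newlines
    if (filter_var($candidate, FILTER_VALIDATE_URL) !== false) {
        $valid[] = $candidate;
    }
}
var_dump($valid);
```

Only the entries that survive this check would then be handed to cURL or DOMDocument.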