PHP file_get_html on Pinterest: some seriously weird behaviour

Trying to scrape a bit of basic account info from Pinterest pages (no, I'm not scraping pins, before I get accused of using this maliciously; it's simply a competitor research tool).
Some accounts work fine with file_get_html, others return nothing at all, and I can't figure out why. I've built the test code below with completely random pages of different sizes to try to narrow it down... still no further forward.
It uses Simple HTML DOM, and here is my test code trying to figure out why some aren't working:
$pinterestUrl1 = "https://uk.pinterest.com/sfashionality/";
$pinterestUrl2 = "https://uk.pinterest.com/serenebathrooms/";
$pinterestUrl3 = "https://uk.pinterest.com/jenstanbrook/";
$pinterestUrl4 = "https://uk.pinterest.com/homebaseuk/";
$pinterestUrl5 = "https://uk.pinterest.com/thedoifter/";
$pinterestUrl6 = "https://uk.pinterest.com/coolshitibuy/";
$html1 = file_get_html($pinterestUrl1);
$html2 = file_get_html($pinterestUrl2);
$html3 = file_get_html($pinterestUrl3);
$html4 = file_get_html($pinterestUrl4);
$html5 = file_get_html($pinterestUrl5);
$html6 = file_get_html($pinterestUrl6);
echo $pinterestUrl1 . " - "; if (is_object($html1)) { echo "Returns object okay<br/>"; } else { echo "Failed<br/>"; };
echo $pinterestUrl2 . " - "; if (is_object($html2)) { echo "Returns object okay<br/>"; } else { echo "Failed<br/>"; };
echo $pinterestUrl3 . " - "; if (is_object($html3)) { echo "Returns object okay<br/>"; } else { echo "Failed<br/>"; };
echo $pinterestUrl4 . " - "; if (is_object($html4)) { echo "Returns object okay<br/>"; } else { echo "Failed<br/>"; };
echo $pinterestUrl5 . " - "; if (is_object($html5)) { echo "Returns object okay<br/>"; } else { echo "Failed<br/>"; };
echo $pinterestUrl6 . " - "; if (is_object($html6)) { echo "Returns object okay<br/>"; } else { echo "Failed<br/>"; };
Result:
https://uk.pinterest.com/sfashionality/ - Returns object okay
https://uk.pinterest.com/serenebathrooms/ - Returns object okay
https://uk.pinterest.com/jenstanbrook/ - Failed
https://uk.pinterest.com/homebaseuk/ - Failed
https://uk.pinterest.com/thedoifter/ - Returns object okay
https://uk.pinterest.com/coolshitibuy/ - Returns object okay
I can't see any reason why some of these return objects and others don't... and because nothing comes back, I don't even know where to start debugging this kind of thing.
Any ideas at all on this one? Thanks

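Simple HTML DOM defines a constant MAX_FILE_SIZE with a value of 600000, and file_get_html() returns false when the downloaded content is empty or longer than that. The pages that fail contain slightly more HTML than the limit.
You can define MAX_FILE_SIZE with a larger value before including the library. The library's own define() will then produce a PHP notice (a warning on PHP 8) because the constant already exists, but the HTML will be processed. Code I have tested this with: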
<?php
// Will produce a notice when the library tries to define it again, but we need to define it first
define('MAX_FILE_SIZE', 6000000);
include_once './simplehtmldom_1_5/simple_html_dom.php';
$urls = array(
    'https://uk.pinterest.com/sfashionality/',
    'https://uk.pinterest.com/serenebathrooms/',
    'https://uk.pinterest.com/jenstanbrook/',
    'https://uk.pinterest.com/homebaseuk/',
    'https://uk.pinterest.com/thedoifter/',
    'https://uk.pinterest.com/coolshitibuy/',
);
foreach ($urls as $url) {
    // Fetch the page ourselves, then hand the string to the parser
    $content = file_get_contents($url);
    $html = str_get_html($content);
    echo $url . ' - ';
    if (is_object($html)) {
        echo 'Returns object okay<br/>';
    } else {
        echo 'Failed<br/>';
    }
}
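Since file_get_html() gives no hint about why it returned false, a small check can tell an empty response apart from one that is over the size limit. This is a minimal sketch, not part of Simple HTML DOM: diagnoseHtmlSize() is a hypothetical helper, and it assumes the library's default limit of 600000 bytes, compared against the byte length that strlen() reports. Passing 6000000 as the second argument mirrors the raised limit used above.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): explains why
// file_get_html()/str_get_html() would return false for a page.
// Assumes the library's default MAX_FILE_SIZE of 600000 bytes.
function diagnoseHtmlSize($contents, int $limit = 600000): string
{
    // file_get_contents() returns false on failure; treat that like empty
    if ($contents === false || $contents === '') {
        return 'empty: the request returned no content';
    }
    // strlen() counts bytes, which is what the size limit is compared against
    $bytes = strlen($contents);
    if ($bytes > $limit) {
        return sprintf('too large: %d bytes exceeds the %d byte limit', $bytes, $limit);
    }
    return sprintf('ok: %d bytes', $bytes);
}

// Example usage (performs a real HTTP request):
// echo diagnoseHtmlSize(file_get_contents('https://uk.pinterest.com/homebaseuk/'));
```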

Related

Concatenation & conditions in php

I'm learning PHP and I'm trying to show " €" when, and only when, $autocollant_total_ht_custom is set.
This is what I wrote:
$euro = " €";
if (isset($autocollant_total_ht_custom)) {
$autocollant_total_ht_custom = $autocollant_total_ht_custom . $euro;
} else echo " ";
However my " €" is always showing even when $autocollant_total_ht_custom is not set.
I spent 75 minutes on it, trying and failing again and again despite researching.
I also tried !is_null() and !empty(), with the same result.
I'm fairly certain that my logic isn't wrong, but the way I'm implementing it is.
Anyone to the rescue?
Have a nice Saturday, everyone!
Mike.
Edit 1:
My goal was to only show the content of a cell if there was actually something in it. By default I could see 0 in the empty cells.
if (!$autocollant_total_ht_lot10) {
$autocollant_total_ht_lot10 = " ";
} else echo "error ";
if (!$autocollant_total_ht_lot20) {
$autocollant_total_ht_lot20 = " ";
} else echo " ";
if (!$autocollant_total_ht_lot50) {
$autocollant_total_ht_lot50 = " ";
} else echo " ";
if (!$autocollant_total_ht_custom) {
$autocollant_total_ht_custom = " ";
} else echo " ";
I know my code must look primitive, but it works, and I don't see it conflicting with what we are trying to achieve in the initial question.
Then, as asked, this is what I'm writing in the table row and table data:
<tr>
<td class=table_align_left>A partir de 100</td>
<td><?php echo $autocollant_prix ?></td>
<td><?php echo $autocollant_custom?></td>
<td><?php echo $autocollant_total_ht_custom?> </td>
</tr>
So, in short, I'm trying not to show anything if there's no value to be shown (which currently works), and then to add " €" after the variable if there's something to be shown.
Edit 3:
$autocollant_total_ht_custom is already conditioned to be shown earlier, in this statement:
} elseif($autocollant_quantité >= 90 && $autocollant_quantité <= 99){
$autocollant_quantité_lot50 = 2;
} elseif($autocollant_quantité >= 100 && $autocollant_quantité <= 1000){
$autocollant_custom = $autocollant_quantité;
} else echo "entrée invalide";
$autocollant_total_ht_custom = $autocollant_prix * $autocollant_custom;
$autocollant_total_ht_lot10 = $autocollant_prix_lot10 * $autocollant_quantité_lot10;
$autocollant_total_ht_lot20 = $autocollant_prix_lot20 * $autocollant_quantité_lot20;
$autocollant_total_ht_lot50 = $autocollant_prix_lot50 * $autocollant_quantité_lot50;
$pointeuse_total_ht = $pointeuse_prix * $pointeuse_quantité;
$pointeuse_autocollant_offert = $pointeuse_quantité * 10;
$pointeuse_autocollant_offert_total_ht = $pointeuse_autocollant_offert * $autocollant_prix;
I posted my code in case it helps.
Mike.
//$autocollant_total_ht_custom = null;
$autocollant_total_ht_custom = "something that isnt null";
//if you swap which of the two assignments above is commented out, you will see it behaves as expected.
$euro = "€";
if (isset($autocollant_total_ht_custom))
{
echo $autocollant_total_ht_custom = $autocollant_total_ht_custom . " " .$euro;
}
else
{
//if we reach this point, $autocollant_total_ht_custom is null, so nothing is echoed. This is why I'm unsure what your requirements are.
echo $autocollant_total_ht_custom;
}
Something like this maybe? It's hard to understand your exact requirements.
isset() checks whether a variable is set and is not null. If it holds anything other than null, it passes the test, and if you're assigning strings to this variable, it will never be null, meaning the euro sign will always show up. (In your Edit 3, $autocollant_total_ht_custom is always assigned the result of a multiplication, so it always holds a number, possibly 0, and is never null either.)
If the variable is null, you fail the conditional test, hit the else branch, and echo nothing (a null string).
If you can update your question with what you would expect $autocollant_total_ht_custom to be set to, I can help better.
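To see why the euro sign keeps appearing, it helps to compare isset() with empty() on the kinds of values a total can hold. This is a minimal sketch; the array of sample values is made up for illustration. The key row is zero: it is set (so isset() passes) yet counts as empty.

```php
<?php
// Sample values a "total" variable might hold (made up for illustration).
// isset() is false only for null; empty() is also true for 0, '' and '0'.
$values = [
    'null'         => null,
    'empty string' => '',
    'zero'         => 0,
    'string zero'  => '0',
    'total'        => 150,
];

foreach ($values as $label => $value) {
    printf(
        "%-12s isset: %-5s empty: %s\n",
        $label,
        var_export(isset($value), true),
        var_export(empty($value), true)
    );
}
```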
EDIT:
It seems to me you can simplify what you want. We are only concerned with echoing a string at all if something is set; otherwise there's no point doing anything, so your checks could be as simple as:
$autocollant_total_ht_lot10 = null;
$autocollant_total_ht_lot11 = "";
$autocollant_total_ht_custom = "1,000";
$euro = "€";
if (isset($autocollant_total_ht_custom))
{
echo 'ht custom';
echo TDFromString($autocollant_total_ht_custom, $euro);
}
//notice this doesn't output anything, because the variable is null
if (isset($autocollant_total_ht_lot10, $euro))
{
echo 'lot 10';
echo TDFromString($autocollant_total_ht_lot10, $euro);
}
//notice this does output, because the string, while empty, is something that isn't null
if (isset($autocollant_total_ht_lot11))
{
echo 'lot 11';
echo TDFromString($autocollant_total_ht_lot11, $euro);
}
//let's set it to null and see what happens
$autocollant_total_ht_lot11 = null;
if (isset($autocollant_total_ht_lot11))
{
echo 'lot 11 AGAIN';
echo TDFromString($autocollant_total_ht_lot11, $euro);
}
//it doesn't get printed!
//create a function that takes the string in question
//and, for the sake of your use case, also the currency to output;
//that way you could rename 'euro' to 'currency'
//and have the sign change based on the value of the $currency
//string, e.g. $currency = "£"
function TDFromString($string, $currency)
{
return '<td>' . $string . ' ' .$currency . '</td>';
}
Live example: https://3v4l.org/r5pKt
A more explicit example: https://3v4l.org/JtnoF
I added an extra echo (and newlines) to indicate which variable is being printed; you don't need it, of course!
I'll just note that the function name is a good example of a bad function name: it not only wraps the string in a td but also inserts the currency, so you may want to name it a little better :)
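Along those lines, here is a sketch of a more explicitly named variant that also treats 0 as "nothing to show", which is what the empty cells in the question actually contain. It assumes the totals are plain numbers, as they are when computed by multiplication in the question's Edit 3, rather than formatted strings such as "1,000" (which is_numeric() rejects). priceCell() is a hypothetical name.

```php
<?php
// Sketch: build a table cell for a total, adding the currency sign only
// when there is something to show. Assumes totals are plain numbers
// (as produced by the multiplications in the question), so 0, null and ''
// all count as "nothing to show". priceCell() is a hypothetical name.
function priceCell($total, string $currency = '€'): string
{
    if (is_numeric($total) && $total > 0) {
        return '<td>' . $total . ' ' . $currency . '</td>';
    }
    // Empty cell instead of "0" or a lone currency sign
    return '<td></td>';
}

echo priceCell(0), "\n";
echo priceCell(null), "\n";
echo priceCell(150), "\n";
```

The three calls print an empty cell, another empty cell, and <td>150 €</td>.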
EDIT EDIT:
A final edit, outside the scope of your question: you should look into keeping your data in arrays and working on them instead.
Using the previous example, we can reduce the code to just this!
$autocollant_total_ht_lot10 = null;
$autocollant_total_ht_lot11 = "";
$autocollant_total_ht_lot12 = "2,0000000";
$autocollant_total_ht_custom = "1,000";
$euro = "€";
//create an array and put all our strings in it; from now on, if we need to do something to one of the strings (or all of them!), we do it through the array
$arrayofLots = array($autocollant_total_ht_lot10, $autocollant_total_ht_lot11, $autocollant_total_ht_lot12, $autocollant_total_ht_custom);
//go over each array entry: the first time it's $autocollant_total_ht_lot10, then $autocollant_total_ht_lot11, etc.
foreach ($arrayofLots as $lot)
{
//and we've been over this bit :)
//$lot is the variable we use to refer to the array entry we are currently on; we could just as easily name it anything else
if (isset($lot))
{
echo TDFromString($lot, $euro);
}
}
function TDFromString($string, $currency)
{
return '<td>' . $string . ' ' .$currency . '</td>';
}
Good day. An else branch without braces is valid PHP, but writing it with explicit braces makes the structure clearer:
if (isset($autocollant_total_ht_custom)) {
$autocollant_total_ht_custom = $autocollant_total_ht_custom . $euro;
} else {
echo " ";
}

Using PHP to print to HTML5 output tag

I am new to this. If I have some PHP code like the example below, I can use echo to print the result, but echo always prints at the top of the screen. How do I format the markup so that, in this case, the result $myPi is printed to the screen using an HTML5 output tag? I am a newbie, so please be kind and don't flame my post; I tried to format the code. Thanks, QJB.
function taylorSeriesPi($Iteration)
{
    $count = 0;
    $myPi = 0.0;
    for ($count = 0; $count < $Iteration; $count++)
    {
        if (($count % 4) == 1)
        {
            $myPi = $myPi + (1 / $count);
        }
        if (($count % 4) == 3)
        {
            $myPi = $myPi - (1 / $count);
        }
    }
    $myPi *= 4.0;
    echo ("Pi is " . $myPi . " After " . $Iteration . " iterations");
}
You can insert PHP anywhere in your document and call functions defined elsewhere in the document or in included files.
For example:
<?php
function taylorSeriesPi($Iteration)
{
    $count = 0;
    $myPi = 0.0;
    for ($count = 0; $count < $Iteration; $count++)
    {
        ...
    }
    $myPi *= 4.0;
    // Return the value so we can use this function later.
    return $myPi;
}
?>
<html>
<body>
<div id="somediv">
<?php
    $iteration = 6/*or whatever*/;
    echo "Pi is " . taylorSeriesPi($iteration) . " After " . $iteration . " iterations";
?>
</div>
</body>
</html>
This will put the returned value and associated string within the <div> tag, but you can put it anywhere in your HTML, as the output of the echo will simply be text by the time the markup is seen by your browser.
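If you specifically want the HTML5 output element the question asks about, the same approach works: return the value from the function and echo it inside the element wherever it should appear in the page. This is a minimal sketch using the Leibniz-style series from the question; the name attribute value and the 1000 iterations are arbitrary choices, and htmlspecialchars() escapes the text for HTML.

```php
<?php
// Leibniz-style series from the question, changed to return the value
function taylorSeriesPi(int $iterations): float
{
    $pi = 0.0;
    for ($count = 0; $count < $iterations; $count++) {
        if ($count % 4 === 1) {
            $pi += 1 / $count;   // +1/1, +1/5, +1/9, ...
        } elseif ($count % 4 === 3) {
            $pi -= 1 / $count;   // -1/3, -1/7, -1/11, ...
        }
    }
    return $pi * 4.0;
}

$iterations = 1000;
$message = 'Pi is ' . taylorSeriesPi($iterations) . ' after ' . $iterations . ' iterations';

// Echo the text inside an HTML5 <output> element
echo '<output name="pi-result">' . htmlspecialchars($message) . '</output>';
```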

If statement and Objects

This is by far the weirdest thing I have ever seen, and I am completely confused. Please, someone help me with this.
$variable=array();
$count=0;
// now I am going to loop through a resource that I made
while(!feof($job))
{
$data=fgets($job);
// I am searching for different things below: name, date, employer
// I am using regex to search, by the way
// assume the objects in the class work fine (they do)
if(search for eg name in $data, storing in $variable[$count].first($match))
// at this point I have access to
// $variable[$count].getFirst() (returns the value set by first() above)
if(search for eg Employer in $data, storing in variable[$count].next($match))
// I have access here as well:
// $variable[$count].getFirst() (returns the value set by first() above)
if(search for 3rd search in $data, storing in variable[$count].name($match))
// down here, after the second if, I can no longer see any of the values set more than two if statements ago
// $variable[$count].getFirst() no longer returns the value set by first() above
if(search for 4th search in $data, storing in variable[$count].foo($match))
// check if everything is set, then $count++;
}
Each of these methods is completely independent of the next, but after two if statements I am just not able to access $variable[$count]->getFirst();
the result is null.
Edit: this is the actual code:
require "functions/decodeEncodedUrl.php";
require "objects/jobObject.php";
$url=decodeEncodedUrl();
$profile=array();
$companies=0;
$url_search='http://www.jobbank.gc.ca/';
$startReading=0;
$job=fopen($url['url'], 'r')or die("JobBanks is failing to respond.<br>Please Try again Later");
while(!feof($job))
{
set_time_limit(500);
$profile[$companies]= new jobProfile();
$trash=fgets($job);
if(!$startReading)
{
if(preg_match('~RepeaterSearchResults_hypJobItem_[0-9]+~',$trash,$matches))
{
$startReading=true;
}
}
if($startReading)
{
$data=$trash;
if(preg_match("~href=\".*\"~",$data,$matches))
{
$temp=preg_replace("~href=~",'',$matches[0]);
$temp=preg_replace("~\"~",'',$temp);
$profile[$companies]->setLink($url_search.$temp);
var_dump($profile[$companies]);
echo "<br>";
echo "<br>";
}
if(preg_match("~>[A-Za-z-, ]+\(~",$data,$matches))
{
$temp=preg_replace("~>|\(~",'',$matches[0]);
$profile[$companies]->setPosition(ucfirst($temp));
var_dump($profile[$companies]);
echo "<br>";
echo "<br>";
}
if(preg_match("~# *[0-9]+~",$data,$matches))
{
$profile[$companies]->setOrderNum(preg_replace("~#| ~",'',$matches[0]));
var_dump($profile[$companies]);
echo "<br>";
echo "<br>";
}
if(preg_match("~Employer:</strong>.*~",$data,$matches))
{
$temp=preg_replace("~Employer:</strong> ~",'',$matches[0]);
$temp=preg_replace("~<br.*~",'',$temp);
$temp=ucfirst($temp);
$profile[$companies]->setEmployer($temp);
var_dump($profile[$companies]);
echo "<br>";
echo "<br>";
}
if(preg_match("~[$][0-9]+.*~",$data,$matches))
{
$temp=preg_replace("~/.*~",'',$matches[0]);
$profile[$companies]->setSalary(preg_replace("~[$]~","$ ",$temp));
var_dump($profile[$companies]);
echo "<br>";
echo "<br>";
}
if(preg_match("~[$][0-9]+.*~",$data,$matches))
{
$temp=preg_replace('~[$A-Za-z0-9. ]*[/] ?~','',$matches[0]);
$profile[$companies]->setRate(preg_replace('~<.*~','',$temp));
var_dump($profile[$companies]);
echo "<br>";
echo "<br>";
}
if(preg_match("~Location:.*~",$data,$matches))
{
$temp=preg_replace('~.*;~','',$matches[0]);
$temp=preg_replace('~^ |,~','',$temp);
$profile[$companies]->setCity(ucfirst($temp));
//echo ucfirst($temp)."<br>";
}
if(preg_match("~Location[:<>/\,A-Za-z ]*~",$data,$matches))
{
$profile[$companies]->setProvince($matches[0]);
//echo " ".$matches[0]."<br>\n";
//echo $profile[$companies]->getLocation()."\n<br>";
}
if(preg_match("~[0-9]{4}/[0-9]{2}/[0-9]{1,2}~",$data,$matches))
{
echo $profile[$companies]->displayHTML();
$profile[$companies]->setDate($matches[0]);
if($profile[$companies]->allDataSet())
{
//echo "data was set"."<br>";
$startReading=false;
$companies++;
}
else
{
$startReading=false;
$companies++;
echo "Data was Not set";
}
}
}
}
fclose($job);
Everything works, except that $profile[number] doesn't hold anything at all after the third if statement that stores a value:
if
{
//Profile[number] info stored
}
if
{
//Profile[number] info available
}
if
{
//profile[number] info available
}
if
{
//profile[number] info is gone
}
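variable[$count].next($match)
In PHP, . is the string concatenation operator, not a method call, so .next($match) calls PHP's built-in next() function, which moves the internal pointer of the array to the next element. Methods are called with ->, e.g. $variable[$count]->next($match).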
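Looking at the actual code, another likely cause is that $profile[$companies]= new jobProfile(); runs at the top of the while loop, so every line read with fgets() replaces the profile with a fresh, empty object, and values stored while processing earlier lines are lost. The sketch below creates the object only once per record. It uses a hypothetical JobProfile stand-in class (not the question's real class) and made-up input lines, and needs PHP 7.4+ for the typed properties.

```php
<?php
// Hypothetical stand-in for the question's jobProfile class (not its real code)
class JobProfile
{
    public ?string $employer = null;
    public ?string $date = null;
}

// Made-up input lines, one record spread over two lines
$lines = ['Employer:</strong> Acme<br>', 'posted 2016/05/12'];
$profiles = [];
$companies = 0;

foreach ($lines as $line) {
    // Create the object only once per record instead of on every line read;
    // running `new` on every line would discard what earlier lines stored.
    if (!isset($profiles[$companies])) {
        $profiles[$companies] = new JobProfile();
    }
    if (preg_match('~Employer:</strong> ([^<]+)~', $line, $m)) {
        $profiles[$companies]->employer = $m[1];
    }
    if (preg_match('~[0-9]{4}/[0-9]{2}/[0-9]{1,2}~', $line, $m)) {
        $profiles[$companies]->date = $m[0];
        $companies++; // record complete: the next line starts a new profile
    }
}
```

In the question's loop, the equivalent change would be to wrap the new jobProfile() line in the same isset() check.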

How to replace whitespace in jquery to pass to php?

I'm trying to make an auto-suggest feature for an app, but I'm having some problems. I wrote this jQuery code to detect when the input changes:
function soletsgo(){
    $('#theinput').keyup(function(){
        value = $('#theinput').val();
        if ( value.length > 2 ){
            word = $('#theinput').val();
            word.replace(/\s/g, "&nbsp");
            $("#autodatathing").load("../pages/searchy.php?word="+word+"");
        } else {
        }
    });
}
And there is a PHP file (searchy.php) that processes some stuff:
<?php
$now = htmlentities($_GET['word']);
echo "<p>";
echo $now;
echo "</p>";
?>
But when I type a space (whitespace) into the input, there is no result!
Could someone help?
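You should encode the value in JavaScript before putting it in the URL. escape(word) is sometimes suggested, but escape() is deprecated and does not produce UTF-8 percent-encoding for non-ASCII characters, which is a common source of UTF-8 issues; encodeURIComponent(word) is the standard choice. PHP decodes query-string parameters automatically, so $_GET['word'] already contains the spaces and no rawurldecode() call is needed on the server.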
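You can just use the following. Note that replace() returns a new string rather than changing word in place, so its result would have to be assigned; encodeURIComponent() takes care of spaces (as %20) and other special characters such as & in one step:
function soletsgo(){
    $('#theinput').keyup(function(){
        var value = $('#theinput').val();
        if ( value.length > 2 ){
            var word = encodeURIComponent(value);
            $("#autodatathing").load("../pages/searchy.php?word=" + word);
        }
    });
}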
You could also handle the display on the server side, turning spaces into non-breaking spaces after escaping:
<?php
$now = str_replace(" ", "&nbsp;", htmlentities($_GET['word']));
echo "<p>";
echo $now;
echo "</p>";
?>
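On the PHP side, query-string values arrive already decoded, so the real work is encoding the value correctly on the client. The sketch below simulates that round trip in PHP only: rawurlencode() stands in for JavaScript's encodeURIComponent() (they agree for the characters used here), and parse_str() stands in for the decoding PHP does when it fills $_GET. The sample word is made up, and the file is assumed to be saved as UTF-8.

```php
<?php
// Made-up input containing spaces and a non-ASCII character (UTF-8 source)
$typed = 'café au lait';

// rawurlencode() encodes these characters the same way JavaScript's
// encodeURIComponent() does: space -> %20, é -> %C3%A9 (its UTF-8 bytes)
$query = 'word=' . rawurlencode($typed);

// parse_str() applies the same decoding PHP uses to fill $_GET,
// so $params['word'] is what searchy.php would see in $_GET['word']
parse_str($query, $params);

echo $query, "\n";
echo $params['word'], "\n";
```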

Why is my PHP variable created by my XPath query throwing an "Object of class DOMAttr could not be converted to string" fatal error?

I previously asked a question to figure out why my PHP code, which is meant to grab the only MP3 URL that exists in each post's content on my WordPress installation (for a custom use of that content), wasn't working: Why doesn't my properly defined variable evaluate for length correctly (and subsequently work in the rest of my code)?
I have made several edits and updates since, and now need to re-formulate both the question and the context. User mrtsherman pointed out that, given the order in which I had written the code, I redefine $doc but don't load it into $xpath. However, when I try to adjust this, my code throws the fatal error "Object of class DOMAttr could not be converted to string" on the line at the very end where I echo the variable $BEmp3s, the end result of all this. Does this mean it cannot convert the attribute to a string?
I know I am so close to the solution that it's nearly killing me, but I also think I have been looking at this way too much and for too long. Any insight is golden at this point! Here is my code, which now branches correctly throughout:
<?php
// Start MP3 URL
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
$xpath = new DOMXpath($doc);
// End MP3 URL
$a = 1;
if (have_posts()) :
while ( have_posts() ) : the_post();
?>
<?php
$BEpost_content = get_the_content();
if (strlen($BEpost_content) > 0) {
echo "<div id='debug_content'>get_the_content has something</div>";
} else {
echo "<div id='debug_content'>BEpost_content is empty</div>" ;
};
$success = $doc->loadHTML($BEpost_content);
$xpath = new DOMXpath($doc);
if ($success === FALSE) {
echo "<div id='debug_loadcontent'>loadHTML failed to load post content</div>";
} else {
$hrefs = $xpath->query("//a[contains(@href,'mp3')]/@href");
if ($hrefs->length > 0) {
echo "<div id='debug_xpath'>xpath found something</div>";
} else {
echo "<div id='debug_xpath'>xpath found nothing</div>";
};
$BEmp3s = $hrefs->item(0);
};
?>
<script type="text/javascript">
var myCP<?php echo $a; ?> = new CirclePlayer("#jquery_jplayer_<?php echo $a; ?>",
{
mp3: "<?php echo $BEmp3s; ?>"
}, {
cssSelectorAncestor: "#cp_container_<?php echo $a; ?>",
volume: 0.5
});
</script>
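$hrefs->item(0) returns a DOMAttr object (or null if nothing matched), and PHP can't echo an object directly. Try using its value property:
mp3: "<?php echo $BEmp3s->value; ?>"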
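For completeness, here is a self-contained sketch of the whole lookup outside WordPress. The post content is a made-up string standing in for get_the_content(); note the @ in @href (XPath's attribute axis) and the ->value read on the DOMAttr. Using json_encode() to emit the URL as a valid, escaped JavaScript string literal is an addition beyond the original answer.

```php
<?php
// Made-up post content standing in for WordPress's get_the_content()
$content = '<p>Listen: <a href="https://example.com/audio/episode1.mp3">Episode 1</a></p>';

// Collect libxml warnings about imperfect markup instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($content);
libxml_clear_errors();

// Create the DOMXPath after loadHTML(), as in the question's updated code
$xpath = new DOMXPath($doc);
// @href selects the attribute, so the query returns DOMAttr nodes
$hrefs = $xpath->query("//a[contains(@href, 'mp3')]/@href");

// item(0) is a DOMAttr (or null if nothing matched); ->value is its string
$mp3 = $hrefs->length > 0 ? $hrefs->item(0)->value : '';

// json_encode() yields a quoted, escaped JavaScript string literal
echo 'mp3: ' . json_encode($mp3) . "\n";
```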
