Puphpeteer - Get text and href-attribute from link - php

I am using "#nesk/puphpeteer": "^2.0.0" Link to Github-Repo and want get the text and the href-attribute from a link.
I tried the following:
<?php
require_once '../vendor/autoload.php';
use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;
$debug = true;
$puppeteer = new Puppeteer([
'read_timeout' => 100,
'debug' => $debug,
]);
$browser = $puppeteer->launch([
'headless' => !$debug,
'ignoreHTTPSErrors' => true,
]);
$page = $browser->newPage();
$page->goto('http://example.python-scraping.com/');
//get text and link
$links = $page->querySelectorXPath('//*[#id="results"]/table/tbody/tr/td/div/a', JsFunction::createWithParameters(['node'])
->body('return node.textContent;'));
// iterate over links and print each link and its text
// get single text
$singleText = $page->querySelectorXPath('//*[#id="pagination"]/a', JsFunction::createWithParameters(['node'])
->body('return node.textContent;'));
$browser->close();
When I run the above script I get the nodes from the page, BUT I cannot access the attributes or the text?
Any suggestions how to do this?
I appreciate your replies!

querySelectorXPath return array of ElementHandle. one more thing querySelectorXPath does not support callback function.
first get all node ElementHandle
$links = $page->querySelectorXPath('//*[#id="results"]/table/tbody/tr/td/div/a');
then loop over links to access attributes or text of node
foreach($links as $link){
// for text
$text = $link->evaluate(JsFunction::createWithParameters(['node'])
->body('return node.innerText;'));
// for link
$link = $link->evaluate(JsFunction::createWithParameters(['node'])
->body('return node.href;'));
}

Disclaimer: This is just an intermediate answer - I would update once I've got more specific requests on HTML attrs or other expectations to be retrieved.
tl;dr: Mentioned composer package nesk/puphpeteer really is just a wrapper to underlying NodeJS based implementation of puppeteer. Thus, accessing data and structures has to be "similar" to their JavaScript counterparts...
Maybe Codeception (headless) or symfony/dom-crawler (raw markup) might be better and more mature alternatives.
Anyway, let's pick the example from above and go through it step by step:
$links = $page->querySelectorXPath(
'//*[#id="results"]/table/tbody/tr/td/div/a',
JsFunction::createWithParameters(['node'])->body('return node.textContent;')
);
XPath query $x() would result in an array of ElementHandle items
to access exported node.textContent (from JsFunction), corresponding data gets fetched via ElementHandle.getProperty(prop)
exporting a scalar value (to PHP) is then done via ElementHandle.jsonValue()
Thus, after that we would have something like this:
$links = $page->querySelectorXPath(
'//*[#id="results"]/table/tbody/tr/td/div/a',
JsFunction::createWithParameters(['node'])->body('return node.textContent;')
);
/** #var \Nesk\Puphpeteer\Resources\ElementHandle $link */
foreach ($links as $link) {
var_dump($link->getProperty('textContent')->jsonValue());
}
Which outputs the following raw data (as retrieved from http://example.python-scraping.com/):
string(12) " Afghanistan"
string(14) " Aland Islands"
string(8) " Albania"
string(8) " Algeria"
string(15) " American Samoa"
string(8) " Andorra"
string(7) " Angola"
string(9) " Anguilla"
string(11) " Antarctica"
string(20) " Antigua and Barbuda"

Related

Issue with accents and encoding with Bing Spell Check API v7

So I am trying to work with Bing's Spell Check API in PHP, but I'm having an issues where accents and other special characters aren't decoded properly, creating many errors that aren't in the original text and messing with the offsets.
My implementation is quite simple - it's heavily based on the example they give in their documentation. I'm not sure if I am supposed to be doing something differently or if it is an issue on their side with how they decode those special characters (which seems highly unlikely - me messing something up is much more probable..!)
Here's the code:
$host = 'https://api.cognitive.microsoft.com';
$path = '/bing/v7.0/spellcheck?';
$data = array (
'mkt' => $lang,
'mode' => 'proof',
'text' => urlencode($text)
);
$encodedData = http_build_query($data);
$key = 'subscription key redacted for obvious reasons';
$headers = "Content-type: application/x-www-form-urlencoded\r\n" .
"Ocp-Apim-Subscription-Key: $key\r\n";
if (isset($_SERVER['REMOTE_ADDR']))
$headers .= "X-MSEdge-ClientIP: " . $_SERVER['REMOTE_ADDR'] . "\r\n";
$options = array (
'http' => array (
'header' => $headers,
'method' => 'POST',
'content' => $encodedData
)
);
$context = stream_context_create ($options);
$result = file_get_contents ($host . $path, false, $context);
if ($result === FALSE) {
# Handle error
}
$decodedResult = json_decode($result, true);
If, for example, I try to spell check the following string:
d'institution
$encodedData becomes the following:
mkt=fr-CA&method=proof&text=d%25E2%2580%2599institutions
And the results I get from the API are the following:
array(2) {
["_type"]=>
string(10) "SpellCheck"
["flaggedTokens"]=>
array(1) {
[0]=>
array(4) {
["offset"]=>
int(8)
["token"]=>
string(14) "99institutions"
["type"]=>
string(12) "UnknownToken"
["suggestions"]=>
array(2) {
[0]=>
array(2) {
["suggestion"]=>
string(15) "99 institutions"
["score"]=>
float(0.93191315174102)
}
[1]=>
array(2) {
["suggestion"]=>
string(14) "99 institution"
["score"]=>
float(0.6518044080768)
}
}
}
}
}
As you can see, the decoding seems to be problematic, as the % gets encoded twice, and is only decoded once apparently. Now, if I remove the url_encode() when setting the value of 'text' in $data, it'll work fine for the apostrophe, but it doesn't work with accents. For example, the following string:
Responsabilité
is interpreted by the API as
Responsabilité
which returns an error.
This could very well be something simple that I'm overlooking, but I've been struggling with this for quite a while and would appreciate any help I can get.
Thanks,
- Émile
[ Edit ] Well, as always... when in doubt, assume you're wrong. The API recommended to change all of the accents for regular letters because even if the specified language was French, it still gave suggestions in English instead of returning an empty array. As for the accents that didn't seem to be decoded, well... I was var_dump-ing that data without any doctype set, so of course it would show without the proper encoding. Sorry about that - in the end, simply removing the urlencode() does the trick!
As per the docs:
The API supports two proofing modes, Proof and Spell. The default mode is Proof. The Proof spelling mode provides the most comprehensive checks, but it's available only in the en-US (English-United States) market. For all other markets, set the mode query parameter to Spell. The Spell mode finds most spelling mistakes but doesn't find some of the grammar errors that Proof catches (for example, capitalization and repeated words).

PHP XML converter response

im quite new on XML, i am currently work in PHP and i have a request to other server which normaly and always return the callback with XML
In any case i would not talk about how i get the response, but how to convert the XML reponse.
They are two type of the XML response, we could call it as $response
first :
<BASEELEMENT SCHEMA="RESULT" METHOD="ADD-MBX">
<RETURN_VALUE>4</RETURN_VALUE>
<ERROR_DESCRIPTION>
<![CDATA[Error (7004): Login name already exists. Please use another login name.]]>
</ERROR_DESCRIPTION>
</BASEELEMENT>
and the second one is :
<baseelement schema="TIMEOUT" method="ADD-MBX"></baseelement>
i have searched and i got how to convert the xml reponse to obejct or array, which is
$response = new \SimpleXMLElement($response);
after convert :
object(SimpleXMLElement)#256 (3) {
["#attributes"]=>
array(2) {
["SCHEMA"]=>
string(6) "RESULT"
["METHOD"]=>
string(7) "ADD-MBX"
}
["RETURN_VALUE"]=>
string(1) "4"
["ERROR_DESCRIPTION"]=>
object(SimpleXMLElement)#257 (0) {
}
}
as we can see the ERROR_DESCRIPTION become 0.
So i would like to ask, is there any xml converter else simplexml_load_string() or new \SimpleXMLElement()
i cant use that function because i could not get the ERROR_DESCRIPTION
thank you :)
Loading it as you are with -
$response = new \SimpleXMLElement($response);
You can fetch the value quite simply as...
echo (string)($response->ERROR_DESCRIPTION).PHP_EOL;
If you want the full content (as in an XML version)...
echo $response->ERROR_DESCRIPTION->asXML().PHP_EOL;
Using XPath, you can fetch the values by...
echo (string)$response->xpath("//ERROR_DESCRIPTION")[0].PHP_EOL;
The output you are seeing shows that ERROR_DESCRIPTION is an object of SimpleXMLElement type, not that it isn't loaded.

Composer Package (PHPLeague) in Codeigniter loading class

I am aiming to use the following package for my project Color Extractor.
I have composer working and setup in my project fine as per #philsturgeon's tutorial here, however I am stuck as the function returns an empty array for an image I know is there.
I am autoloading in the index.php using require FCPATH.'vendor/autoload.php';
And I have tested this using Phil's example.
My Code looks like this:
$client = new League\ColorExtractor\Client();
$image = $client->loadJpeg(FCPATH.'assets/images/tumblr_ma7gmzwfAq1r780z3o1_250.jpg');
// Get the most used color hex code
$palette = $image->extract();
// Get three most used color hex code
$palette = $image->extract(3);
// Change the Minimum Color Ratio (0 - 1)
// Default: 0
$image->setMinColorRatio(1);
$palette = $image->extract();
var_dump($palette);
Can anyone tell me what I am doing wrong here as I don't have any errors in my log and I get the standard output.
array(0) { }
I was being utterly stupid apologies for anyone who viewed this question I was trying to declare multiple variables with the same name and variants of the function.
My final code looks like
$client = new League\ColorExtractor\Client();
$image = $client->loadJpeg(FCPATH.'assets/images/tumblr_ma7gmzwfAq1r780z3o1_250.jpg');
$palette = $image->extract(3);
var_dump($palette);
Which returns an expected array like this.
array(3) { [0]=> string(7) "#E09D73" [1]=> string(7) "#AC4C34" [2]=>
string(7) "#EEDF6C" }

PHPmyGraph: send an array with GET

I could not find any answer to my question.
I'm using PhPmyGraph ( http://phpmygraph.abisvmm.nl/ ) to display a graph of some data from my databases.
The problem is that I have to create my arrays in the file itself, and if I want 2 graphs on the page I need to create 2 different files.
Apparently the file is easier to use with a CMS but I'm not using one.
This is the file graph.php:
<?php
//Set content-type header for the graphs
header("Content-type: image/png");
//Include phpMyGraph5.0.php
include_once('../phpMyGraph5.0.php');
//Set config directives
$cfg['title'] = 'Example graph';
$cfg['width'] = 500;
$cfg['height'] = 250;
//Set data
$data = array(
'Jan' => 12,
'Nov' => 78,
'Dec' => 23
);
//Create phpMyGraph instance
$graph = new phpMyGraph();
//Parse
$graph->parseVerticalPolygonGraph($data, $cfg);
?>
I call it in my page index.php:
echo " < img src=\"graph.php\"> ";
Is there another way to do it? And send the data from index.php to graph.php?
Or maybe move the code graph.php into index.php ? The problem is for the image object, I don't really know how to do it!
UPDATE:
I have almost found a solution, my code is now:
in graph.php:
//Parse
$graph->parseVerticalPolygonGraph(unserialize($_GET['data']), $cfg);
index.php :
$select_daily = mysql_query("SELECT * FROM table");
while ($row_daily = mysql_fetch_assoc($select_daily) ){
$y = substr($row_daily['ymd'], 0, -4); // Year
$m = substr($row_daily['ymd'], 4, -2); // Month
$d = substr($row_daily['ymd'], -2); // Day
$key = $d."/".$m."/".$y;
$data_daily [$key] = $row_daily['members'];
}
foreach($data_daily as $key => $value) {
echo $key ,' : ', $value ,'<br/>';
}
echo "< img src=\"graph.php?data=".serialize($data_daily)."\">";
But I get the error "provided data is not an array"
I can't see what's wrong with it?
if I do var_dump($data_daily) I get:
array(8) { ["14/12/2011"]=> string(1) "0" ["13/12/2011"]=> string(2)
"11" ["12/12/2011"]=> string(1) "0" ["11/12/2011"]=> string(1) "2"
["10/12/2011"]=> string(1) "9" ["09/12/2011"]=> string(1) "3"
["08/12/2011"]=> string(1) "6" ["07/12/2011"]=> string(1) "6" }
UPDATE2:
var_dump($data1); gives:
array(12) { ["Jan"]=> int(12) ["Feb"]=>
int(25) ["Mar"]=> int(0) ["Apr"]=> int(7) ["May"]=> int(80) ["Jun"]=>
int(67) ["Jul"]=> int(45) ["Aug"]=> int(66) ["Sep"]=> int(23)
["Oct"]=> int(23) ["Nov"]=> int(78) ["Dec"]=> int(23) }
and var_dump($s_data1 = serialize($data1)) gives:
a:12:s:3:"Jan";i:12;s:3:"Feb";i:25;s:3:"Mar";i:0;s:3:"Apr";i:7;s:3:"May";i:80;s:3:"Jun";i:67;s:3:"Jul";i:45;s:3:"Aug";i:66;s:3:"Sep";i:23;s:3:"Oct";i:23;s:3:"Nov";i:78;s:3:"Dec";i:23;}
Then unserialize($s_data1); gives the same thing than $data1
So the argument 1 of the parse should be correct... I can’t see what is wrong
I finally gave up and loaded my arrays in graph.php:
if ($_GET['data'] == 'daily'){
$cfg['title'] = 'daily';
$graph->parseVerticalPolygonGraph($data_daily, $cfg);
}
And I call the file like that:
echo "<img src=\"graph.php?data=daily\">";
Thanks for your help anyway
I previously needed a page to display multiple graphs using phpMyGraph and the approach I took was to use data URI's and php's ob_start() and ob_get_clean()
Simply use this for each graph:
ob_start();
$graph->parseVerticalPolygonGraph($data, $cfg);
$img = ob_get_clean();
echo "<img src='data:image/gif;base64," . base64_encode($img) . "/>";
I recommend using gif's for the format since that way your page size will not be huge, you can do this by setting $cfg["type"] to "gif" (See here http://phpmygraph.abisvmm.nl/#ConfigDirectives)
This will also reduce the overhead of multiple requests and prevent hotlinking to the images.
You can read more about data URI's here
http://en.wikipedia.org/wiki/Data_URI_scheme
you might want to try
echo "< img src=\"graph.php?data=".urlencode(serialize($data_daily))."\">"
I might be misunderstanding which script is throwing the error, however (I'm presuming that it's graph.php that's giving you the provided data is not an array).
Try using json instead of serialize
echo "< img src=\"graph.php?data=".urlencode(json_encode($data_daily))."\">"
$graph->parseVerticalPolygonGraph(json_decode($_GET['data'],true), $cfg);
I see no reason for this to throw an error.

php if statement

I am using a twitter class to post updates to my account for this I have removed my twitter credentials so I am aware that XXXXX is wrong. I am able to parse the text from the remote xml file. This xml files text always reads "There are no active codes." So in my if statement i said that if the xml file reads "There are no active codes." i dont want to post anything to my twitter, but if it changes to anything else then i would like to parse that information and post it to my twitter. So today when there was an update to the xml file nothing happened. I know that the twitter portion is correct because I have a similar script that does not use an if statement and it posts fine. Once i introduced the if statement i have run into problem of not being able to post. So what can i do to post to twitter only when the xml file changes from "There are no active codes." to anything else?
// Parse Message
$source = file_get_contents('WEBSITE_URL_GOES_HERE');
$dom = new DOMDocument();
#$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
$match = $xml->xpath("//code_message");
//Twitter class (Updating status)
require_once 'twitteroauth.php';
//Twitter credentials
define("CONSUMER_KEY", "XXXXXX");
define("CONSUMER_SECRET", "XXXXXX");
define("OAUTH_TOKEN", "XXXXXX-XXXXXX");
define("OAUTH_SECRET", "XXXXXX");
// Verify credentials
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN, OAUTH_SECRET);
$content = $connection->get('account/verify_credentials');
//If Statement
if ( $match[0] == "There are no active codes." ) {
/* Do Nothing */;
} else {
$connection->post('statuses/update', array('status' => 'New Code Available - ' . $match[0] ));
return $connection;
}
var_dump of the $match array:
array(1) { [0]=> object(SimpleXMLElement)#3 (1) { [0]=> string(32) "There are no active codes." } }
You should probably use a string comparison function.
like strcmp : http://php.net/manual/en/function.strcmp.php
if (strcmp($match,"There are no active codes.") != 0 )
{
$connection->post('statuses/update', array('status' => 'New Code Available - ' . $match[0] ));
return $connection;
}
Why don't you just add some debugging code and check what's going on? Echo you $match[0] and check what's in there. It's hard to imagine "if" being broken, isn't it? Maybe var_dump($match) just to check what's going on. Then you should probably either fix your condition or fix the retrieval of $match.
Check your array: the [0] element is an object, not a string. You need to get the first element of that object, if you know what I mean.
You are comparing:
Object(SimpleXMLElement)#3 (1) { [0]=> string(32) "There are no active codes." }
to
"There are no active codes."
Which is obviously not the same.
I can't test it from here, but check out the manual of simpelXMLElement: http://php.net/manual/en/class.simplexmlelement.php
You should probably get away with a simple call. Just check out what kind of object you have. A simple example would be something like "$match[0]->childname", but I can't quickly see what the childname is. Check out the manual for some sort of getchild or something, shouldn't be too tricky

Categories