Foreach does not get xpath results from node - php

I use WebDriver with XPath to find a div in the page, and I need to read data from each child node of that div, but it is not working.
HTML:
<div class="elements">
<div class="element"><div class="title">Title A</div></div>
<div class="element"><div class="title">Title B</div></div>
<div class="element"><div class="title">Title C</div></div>
</div>
PHP Code:
$elements = array();
$data = $driver->findElements(WebDriverBy::xpath("//div[@class='elements']//div[@class='element']"));
foreach ($data as $i => $element) {
$elements[$i]["title"] = $element->findElement(WebDriverBy::xpath("//div[@class='title']"))->getText();
}
Result Array $elements being returned:
Array
(
[0] => Array
(
[title] => Title A
)
[1] => Array
(
[title] => Title A
)
[2] => Array
(
[title] => Title A
)
)
The above script only returns Title A, three times.
I need it to behave as if the XPath had a numeric index [x]. Example:
(//div[@class='elements']//div[@class='element'])[1]//div[@class='title'] for Title A
(//div[@class='elements']//div[@class='element'])[2]//div[@class='title'] for Title B
(//div[@class='elements']//div[@class='element'])[3]//div[@class='title'] for Title C
I can't use a numeric index because the XPath is too long and it would clutter the code a lot.
Shouldn't the XPath inside the foreach be evaluated relative to the current node?

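When you use a WebElement to locate another WebElement with XPath, start the path with . (the current context node). A path that begins with // is evaluated from the document root, which is why every iteration returned the first title.
$element->findElement(WebDriverBy::xpath(".//div[@class='title']"))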
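The same XPath rule can be reproduced without a browser. The sketch below swaps WebDriver for PHP's built-in DOMDocument and DOMXPath (an assumption made only so the example is self-contained) and compares the // and .// forms against the HTML from the question:

```php
<?php
// HTML copied from the question.
$html = '<div class="elements">
<div class="element"><div class="title">Title A</div></div>
<div class="element"><div class="title">Title B</div></div>
<div class="element"><div class="title">Title C</div></div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$absolute = array(); // results of the "//" form
$relative = array(); // results of the ".//" form

foreach ($xpath->query("//div[@class='elements']//div[@class='element']") as $element) {
    // "//" starts at the document root even though a context node is passed.
    $absolute[] = $xpath->query("//div[@class='title']", $element)->item(0)->textContent;
    // ".//" starts at the context node, so each element yields its own title.
    $relative[] = $xpath->query(".//div[@class='title']", $element)->item(0)->textContent;
}

print_r($absolute);
print_r($relative);
```

The first array contains Title A three times; the second contains Title A, Title B and Title C.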

Related

PHP preg_match_all extract id and name, where id in tag is optional

I have following code:
<?php
$html = '<div>
<div class="block">
<div class="id">10</div>
<div class="name">first element</div>
</div>
<div class="block">
<div class="name">second element</div>
</div>
<div class="block">
<div class="id">30</div>
<div class="name">third element</div>
</div>
</div>';
preg_match_all('/<div class="block">[\s]+<div class="id">(.*?)<\/div>[\s]+<div class="name">(.*?)<\/div>[\s]+<\/div>/ms', $html, $matches);
print_r($matches);
I want to get an array with id and name, but the second block has no id, so my pattern skips it. How can I build the array without skipping that block, and print something like [ ... [id => 0 // or null, name => 'second element'] ...]?
Use DOMDocument to solve this task; there are a lot of good reasons not to use regular expressions.
Assuming your HTML code is stored in $html variable, create an instance of DOMDocument, load the HTML code, and initialize DOMXPath:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html, LIBXML_NOBLANKS);
$dom->formatOutput = true;
$xpath = new DOMXPath($dom);
Use DOMXPath to search for all <div> nodes with class "name" and prepare an empty array for the results:
$nodes = $xpath->query('//div[@class="name"]');
$result = array();
For each node found, run an additional query to find the optional node with class "id", then add a record to the results array:
foreach ($nodes as $node) {
$id = $xpath->query('div[@class="id"]', $node->parentNode);
$result[] = array(
'id' => $id->count() ? $id->item(0)->nodeValue : null,
'name' => $node->nodeValue
);
}
print_r($result);
This is the result:
Array
(
[0] => Array
(
[id] => 10
[name] => first element
)
[1] => Array
(
[id] =>
[name] => second element
)
[2] => Array
(
[id] => 30
[name] => third element
)
)
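If a regular expression has to be used anyway, one possible sketch (not part of the answer above, and only reasonable for markup as regular as this sample) wraps the id div in an optional non-capturing group. An unmatched optional group is reported as an empty string, which is mapped to null here:

```php
<?php
// HTML copied from the question.
$html = '<div>
<div class="block">
<div class="id">10</div>
<div class="name">first element</div>
</div>
<div class="block">
<div class="name">second element</div>
</div>
<div class="block">
<div class="id">30</div>
<div class="name">third element</div>
</div>
</div>';

// (?: ... )? makes the whole id div optional; group 1 is the id, group 2 the name.
$pattern = '/<div class="block">\s*(?:<div class="id">(.*?)<\/div>\s*)?<div class="name">(.*?)<\/div>\s*<\/div>/s';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$result = array();
foreach ($matches as $m) {
    $result[] = array(
        'id'   => $m[1] !== '' ? $m[1] : null, // empty string means the id div was absent
        'name' => $m[2],
    );
}

print_r($result);
```

This breaks as soon as the markup gains extra attributes or nested divs, which is the main argument for the DOMDocument approach.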

Is it possible to exclude parts of the matched string in preg_match?

While writing a script that is supposed to download content from a specific div, I was wondering whether it is possible to skip part of the pattern so that it is not included in the match result.
Example:
<?php
$html = '
<div class="items">
<div class="item-s-1827">
content 1
</div>
<div class="item-s-1827">
content 2
</div>
<div class="item-s-1827">
content 3
</div>
</div>
';
preg_match_all('/<div class=\"item-s-([0-9]*?)\">([^`]*?)<\/div>/', $html, $match);
print_r($match);
/*
Array
(
[0] => Array
(
[0] => <div class="item-s-1827">
content 1
</div>
[1] => <div class="item-s-1827">
content 2
</div>
[2] => <div class="item-s-1827">
content 3
</div>
)
[1] => Array
(
[0] => 1827
[1] => 1827
[2] => 1827
)
[2] => Array
(
[0] =>
content 1
[1] =>
content 2
[2] =>
content 3
) ) */
Is it possible to omit class=\"item-s-([0-9]*?)\" so that it does not appear in the $match variable?
In general, you can assert strings precede or follow your search string with positive lookbehinds / positive lookaheads. In the case of a lookbehind, the pattern must be of a fixed length which stands in conflict with your requirements. But fortunately there's a powerful alternative to that: You can make use of \K (keep text out of regex), see http://php.net/manual/en/regexp.reference.escape.php:
\K can be used to reset the match start since PHP 5.2.4. For example, the pattern foo\Kbar matches "foobar", but reports that it has matched "bar". The use of \K does not interfere with the setting of captured substrings. For example, when the pattern (foo)\Kbar matches "foobar", the first substring is still set to "foo".
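The behaviour described in that quote can be checked with a minimal sketch that uses the manual's own foobar example:

```php
<?php
// \K discards everything matched so far from the reported match,
// but capture groups opened before it are still filled in.
preg_match('/(foo)\Kbar/', 'foobar', $m);

print_r($m);
```

Here $m[0] holds "bar" (the reported match) and $m[1] holds "foo" (the capture group).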
So here's the regex (I made some additional changes to that), with \K and a positive lookahead:
preg_match_all('/<div class="item-s-[0-9]+">\s*\K[^<]*?(?=\s*<\/div>)/', $html, $match);
print_r($match);
prints
Array
(
[0] => Array
(
[0] => content 1
[1] => content 2
[2] => content 3
)
)
The preferred way to parse HTML in PHP is to use DomDocument to load the HTML and then DomXPath to search the result object.
Update
Modified based on comments to question so that <div> class names just have to begin with item-s-.
$html = '<div class="items">
<div class="item-s-1827">
content 1
</div>
<div class="item-s-18364">
content 2
</div>
<div class="item-s-1827">
content 3
</div>
</div>';
$doc = new DomDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$divs = $xpath->query("//div[starts-with(@class,'item-s-')]");
$values = array();
foreach ($divs as $div) {
$values[] = trim($div->nodeValue);
}
print_r($values);
Output:
Array (
[0] => content 1
[1] => content 2
[2] => content 3
)

Using xPath to access values of simpleXML

I have a XML object result from my database containing settings.
I am trying to access the values for a particular settingName:
SimpleXMLElement Object
(
[settings] => Array
(
[0] => SimpleXMLElement Object
(
[settingName] => Test
[settingDescription] => Testing
[requireValue] => 1
[localeID] => 14
[status] => 1
[value] => 66
[settingID] => 5
)
[1] => SimpleXMLElement Object
(
[settingName] => Home Page Stats
[settingDescription] => Show the Top 5 Teammate / Teamleader stats?
[requireValue] => 0
[localeID] => 14
[status] => 0
[value] => SimpleXMLElement Object
(
)
[settingID] => 3
)
)
)
I tried using xPath and have this so far:
$value = $fetchSettings->xpath("//settingName[text()='Test']/../value");
which returns:
Array ( [0] => SimpleXMLElement Object ( [0] => 66 ) )
How can I get the actual value and not just another array/object?
The end result will just be 66 for the example above.
SimpleXMLElement::xpath() returns a plain PHP array of "search results"; the first result will always be index 0 if any results were found.
Each "search result" is a SimpleXMLElement object, which has a magic __toString() method for getting the direct text content of a node (including CDATA, but not including text inside child nodes, etc). The simplest way to call it is with (string)$my_element; (int)$my_element will also invoke it, then convert the result to an integer.
So:
$xpath_results = $fetchSettings->xpath("//settingName[text()='Test']/../value");
if ( count($xpath_results) > 0 ) {
$value = (string)$xpath_results[0];
}
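A self-contained sketch of the cast is shown below. The XML string and its root element are assumptions reconstructed from the print_r dump in the question, since the real document is not shown:

```php
<?php
// Hypothetical XML, shaped like the dump in the question.
$xml = '<root>'
     . '<settings><settingName>Test</settingName><value>66</value><settingID>5</settingID></settings>'
     . '<settings><settingName>Home Page Stats</settingName><value/><settingID>3</settingID></settings>'
     . '</root>';

$fetchSettings = simplexml_load_string($xml);

// xpath() returns a plain array of SimpleXMLElement objects.
$xpath_results = $fetchSettings->xpath("//settingName[text()='Test']/../value");

$value = null;
if (count($xpath_results) > 0) {
    $value = (string) $xpath_results[0];   // direct text content as a string
    $number = (int) $xpath_results[0];     // same text converted to an integer
}

var_dump($value);
```

Keeping the count() check avoids an undefined-offset notice when no setting with that name exists.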
Alternatively, the DOMXPath class can return results other than element and attribute nodes, due to the DOM's richer object model. For instance, you can have an XPath expression ending //text() to refer to the text content of a node, rather than the node itself (SimpleXML will do the search, but give you an element object anyway).
The downside is it's rather more verbose to use, but luckily you can mix and match the two sets of functions (using dom_import_simplexml() and its counterpart) as they have the same underlying representation:
// WARNING: Untested code. Please comment or edit if you find a bug!
// string() makes evaluate() return the text itself instead of a DOMNodeList of text nodes.
$fetchSettings_dom = dom_import_simplexml($fetchSettings);
$xpath = new DOMXPath($fetchSettings_dom->ownerDocument);
$value = $xpath->evaluate(
"string(//settingName[text()='Test']/../value/text())",
$fetchSettings_dom
);
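As a rough illustration of what DOMXPath::evaluate() hands back, the sketch below runs the query in two forms on a small hypothetical XML string (the element names follow the question, the root element is an assumption). A node-set expression yields a DOMNodeList, while wrapping it in the XPath string() function yields a PHP string:

```php
<?php
// Hypothetical XML, shaped like the dump in the question.
$fetchSettings = simplexml_load_string(
    '<root><settings><settingName>Test</settingName><value>66</value></settings></root>'
);

// Switch from SimpleXML to DOM; both share the same underlying document.
$node = dom_import_simplexml($fetchSettings);
$xpath = new DOMXPath($node->ownerDocument);

// Node-set expression: the result is a DOMNodeList of text nodes.
$textNodes = $xpath->evaluate("//settingName[text()='Test']/../value/text()", $node);
$fromList = $textNodes->length > 0 ? $textNodes->item(0)->nodeValue : null;

// Typed expression: string() makes evaluate() return a plain string.
$asString = $xpath->evaluate("string(//settingName[text()='Test']/../value)", $node);

var_dump($fromList, $asString);
```

The string() form returns an empty string when nothing matches, so it cannot distinguish a missing setting from an empty value.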
Because any element in an XML file can appear multiple times, the parser always returns an array. If you are sure there is only a single item, you can use current():
echo (string) current($value);
Note, that I cast the SimpleXMLElement to a string (see http://php.net/manual/simplexmlelement.tostring.php ) to get the actual value.
Use the DOMXPath class instead.
http://php.net/manual/en/domxpath.evaluate.php
The sample from php.net is equivalent to what you'd like to achieve:
<?php
$doc = new DOMDocument;
$doc->load('book.xml');
$xpath = new DOMXPath($doc);
$tbody = $doc->getElementsByTagName('tbody')->item(0);
// our query is relative to the tbody node
$query = 'count(row/entry[. = "en"])';
$entries = $xpath->evaluate($query, $tbody);
echo "There are $entries english books\n";
In this way, you can get values straight from the XML.

PHP Query in what tag is the word.

I'm currently using this code to retrieve tags.
$title = $pq->find("title")->text();
$h1 = $pq->find("h1")->text();
$p = $pq->find("p")->text();
Is this the proper way of doing it?
Secondly, I have to find out which word from my array $array_words is in which tag. So I have retrieved the page with file_get_contents, removed all tags and put all the words in an array. Now let's take this for example:
Array
(
[0] => hello
[1] => there
[2] => this
[3] => is
[4] => a
[8] => test
[9] => array
)
and this would be the HTML:
<html>
<head>
<title>
hello there
</title>
</head>
<body>
<h1>
this is a
</h1>
<p>
test array
</p>
</body>
</html>
How can I find out which word is found in which tag?
I hope I made somewhat clear what I'm trying to do.
As I understand the question, you need to build a map showing which words from $array_words appear inside which HTML tag.
So you have an array of tags that you want to check, right?
This is how I see it:
1. Collect all the tags that you want to check.
2. Loop over the tags with a foreach.
3. Inside the foreach, use phpQuery to find the text inside each tag.
4. phpQuery returns text, so split it with explode into a new array of words called $words_from_text.
5. In a nested foreach, use in_array to check which words from $words_from_text are in $array_words.
6. If a word from $words_from_text is found in $array_words, add it to the tags array under the key of the current tag.
$array_tags = array('h1', 'div', 'title');
$array_words = array(
0 => 'hello',
1 => 'there',
2 => 'this',
3 => 'is',
4 => 'a',
8 => 'test',
9 => 'array',
);
The final array with the results should look like this:
$array_tags = array(
'title' => array('word1', 'word2'),
'h1' => array('word3', 'word4'),
'div' => array('word5', 'word6'),
);
So if this example is what you need, you can use this guideline to resolve your problem.
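The guideline above can be sketched as follows. To keep the example self-contained it uses PHP's built-in DOMDocument instead of phpQuery, and preg_split instead of explode so that line breaks inside a tag are handled; both substitutions are assumptions, not part of the original answer. The tag list is adapted to the HTML from the question:

```php
<?php
// HTML and word list taken from the question.
$html = '<html><head><title>
hello there
</title></head><body><h1>
this is a
</h1><p>
test array
</p></body></html>';
$array_words = array('hello', 'there', 'this', 'is', 'a', 'test', 'array');
$array_tags = array('title', 'h1', 'p');

$dom = new DOMDocument();
$dom->loadHTML($html);

$words_by_tag = array();
foreach ($array_tags as $tag) {
    $words_by_tag[$tag] = array();
    foreach ($dom->getElementsByTagName($tag) as $node) {
        // Split the tag text on any whitespace, dropping empty pieces.
        $words_from_text = preg_split('/\s+/', $node->textContent, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words_from_text as $word) {
            // Keep the word if it is in the list and not recorded for this tag yet.
            if (in_array($word, $array_words, true) && !in_array($word, $words_by_tag[$tag], true)) {
                $words_by_tag[$tag][] = $word;
            }
        }
    }
}

print_r($words_by_tag);
```

For the sample page this maps title to hello and there, h1 to this, is and a, and p to test and array.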

Find h3 and h4 tags beneath it

This is my HTML:
<h3>test 1</h3>
<p>blah</p>
<h4>subheading 1</h4>
<p>blah</p>
<h4>subheading 2</h4>
<h3>test 2</h3>
<h4>subheading 3</h4>
<p>blah</p>
<h3>test 3</h3>
I am trying to build an array of the h3 tags, with the h4 tags nested within them. An example of the array would look like:
Array
(
[test 1] => Array
(
[0] => subheading 1
[1] => subheading 2
)
[test 2] => Array
(
[0] => subheading 3
)
[test 3] => Array
(
)
)
Happy to use preg_match or DOMDocument, any ideas?
With DOMDocument:
1. Use the XPath "//h3" to find all <h3> elements. These will be the first-level entries in your array.
2. For each of them:
- keep a counter $i as part of the loop (count from 1!)
- use the XPath "./following::h4[count(preceding::h3) = $i]" to find any subordinate <h4>; these will be the second-level entries in your array
The XPath expression means "select all <h4> that have exactly $i preceding <h3>". For the <h4> elements under the first <h3> that count is 1, for those under the second it is 2, and so on.
Be sure to execute the XPath expression in the context of the respective <h3> nodes.
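Put together, the steps might look like the sketch below. It assumes the HTML from the question is loaded with DOMDocument::loadHTML and that the heading texts are unique, because they are used as array keys:

```php
<?php
// HTML copied from the question.
$html = '<h3>test 1</h3>
<p>blah</p>
<h4>subheading 1</h4>
<p>blah</p>
<h4>subheading 2</h4>
<h3>test 2</h3>
<h4>subheading 3</h4>
<p>blah</p>
<h3>test 3</h3>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = array();
$i = 0;
foreach ($xpath->query('//h3') as $h3) {
    $i++; // 1 for the first <h3>, 2 for the second, and so on
    $subheadings = array();
    // Evaluated relative to the current <h3>: every later <h4> that has
    // exactly $i <h3> elements before it belongs to this heading.
    foreach ($xpath->query("./following::h4[count(preceding::h3) = $i]", $h3) as $h4) {
        $subheadings[] = $h4->textContent;
    }
    $result[$h3->textContent] = $subheadings;
}

print_r($result);
```

A heading with no subordinate <h4>, such as test 3, ends up with an empty array.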
