There are lots of tutorials around the net, but none of them explains this to me:
How do I select a single element (in a table, for example), having its absolute XPath?
Example:
I have this:
/html/body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span
Which PHP function gets the text of that element?
I really could not find an answer. I found lots of guides and hints for getting all the elements of a table, all the buttons of a form, etc., but not what I need.
Thank you.
$xml = simplexml_load_string($html_content_string);
$arr = $xml->xpath("//body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span");
var_dump($arr);
Load your HTML document into a DOM object, then make a DOMXPath object from it and let it evaluate your query string.
It's all described in detail here: http://php.net/manual/en/book.dom.php
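A minimal sketch of that approach, assuming PHP 5 or later with the DOM extension enabled. The sample markup and the shortened XPath are made up for illustration. Note that browser developer tools often insert `tbody` steps that are not in the page source; the DOM extension does not add them, so an XPath copied from a browser may need those steps removed.

```php
<?php
// Hypothetical sample markup; a real page would come from a file or an HTTP response.
$html = '<html><body><table>'
      . '<tr><td>a</td><td><span>first</span></td></tr>'
      . '<tr><td>b</td><td><span>second</span></td></tr>'
      . '</table></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Absolute path to one element; there is no tbody step because the sample markup has none.
$nodes = $xpath->query('/html/body/table/tr[2]/td[2]/span');

// query() returns a DOMNodeList; take the first match, if there is one.
$text = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $text, "\n";
```

If the query matches nothing, `$text` stays `null`, which is usually the first sign that the path does not match the parsed document.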
Related
I have two lines of XML data that have attributes but also contain data inside them, and they are repeating fields. They are being stored in a SimpleXML variable.
<inputField Type="Name">John Doe</inputField>
<inputField Type="DateOfHire">Tomorrow</inputField>
(Clearly this isn't real data, but the syntax is what is actually in my data; I'm just using string data in them.)
Everything that I've seen says to access the data like this, which I have tried and it worked perfectly. But my data is dynamic, so the data isn't always going to be in the same place, and this doesn't fit my needs.
$xmlFile->inputField[0];
$xmlFile->inputField[1];
This works fine until one of the lines is missing, and I can have anywhere from 0 to 5 lines. So what I was wondering is whether there is any way to access the data by attribute value. Potentially like this:
$xmlFile->inputField['Name'];
or
$xmlFile->inputField->Name;
I use these as examples strictly to illustrate what I'm trying to do, I am aware that neither of the above lines of code are syntactically correct.
Just a note this information is being generated externally so I cannot change the format.
If anyone needs clarification feel free to let me know and would be happy to elaborate.
Maybe like this?
echo $xmlFile->inputField->attributes()->Type;
And what are you using? DOMDocument or SimpleXML?
You don't say, but I assume you're using SimpleXMLElement?
If you want to access every item, just iterate:
foreach ($xmlFile->inputField as $inputField) { ... }
If you want to access an attribute use array notation:
$inputField['Type']
If you want to access only one specific element, use xpath:
$xmlFile->xpath('inputField[@Type="Name"]');
Perhaps you should read through the basic examples of usage in the SimpleXMLElement documentation?
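The XPath lookup can be wrapped so that a missing field simply yields `null`. This is only a sketch: the `<form>` wrapper element and the `fieldByType` helper are hypothetical names, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical wrapper element around the two lines from the question.
$xmlFile = simplexml_load_string(
    '<form>'
    . '<inputField Type="Name">John Doe</inputField>'
    . '<inputField Type="DateOfHire">Tomorrow</inputField>'
    . '</form>'
);

/**
 * Return the text of the first inputField whose Type attribute equals $type,
 * or null when no such field exists.
 * Assumes $type contains no double quote, since it is placed inside the XPath string.
 */
function fieldByType(SimpleXMLElement $xml, string $type): ?string
{
    $matches = $xml->xpath('inputField[@Type="' . $type . '"]');
    // xpath() returns an array of matches; an empty array means the field is missing.
    return $matches ? (string) $matches[0] : null;
}

var_dump(fieldByType($xmlFile, 'Name'));
var_dump(fieldByType($xmlFile, 'Missing'));
```

Because the lookup goes by attribute value, it does not matter how many of the fields are present or in which order they appear.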
For example, you can grab the data like this:
$xmlFile = simplexml_load_file($file);
foreach($xmlFile->inputField as $res) {
// The attribute is called Type; the element's text is its value.
echo $res["Type"] . ": " . $res . "\n";
}
Let's say I have this block of code:
<div id="id1">
This is some text
<div class="class1"><p>lala</p> Some markup</div>
</div>
What I want is only the text "This is some text", without the contents of the child element .class1. I can do it in jQuery using $('#id1').contents().eq(0).text(); how can I do this in phpQuery?
Thanks.
My bad, I was doing
pq('#id1.contents().eq(0).text()')
instead of
pq('#id1')->contents()->eq(0)->text()
If compatibility is what you are after, and you want to traverse/manipulate elements as DOM objects, then perhaps the PHP DOM XML library is what you are after: http://www.php.net/manual/en/book.domxml.php
Your code would look something like this:
$xml = xmldoc('<div id="id1">This is some text<div class="class1"><p>lala</p> Some markup</div></div>');
$node = $xml->get_element_by_id("id1");
$content = $node->get_content();
I'm sorry, I don't have time to run a test of this right now, but hopefully it sets you in the right direction, and forms the basis for a decent revision... There is a good list of DOM traversal functions in the PHP documentation though :)
References: http://www.php.net/manual/en/book.domxml.php, http://www.php.net/manual/en/function.domdocument-get-element-by-id.php, http://www.php.net/manual/en/function.domnode-get-content.php
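The domxml functions above belong to the PHP 4 extension. On PHP 5 and later, the same idea could be sketched with the bundled DOM extension (DOMDocument and DOMXPath). The XPath step `text()` selects only the text nodes that are direct children of the div, so the contents of .class1 are left out.

```php
<?php
$html = '<div id="id1">
This is some text
<div class="class1"><p>lala</p> Some markup</div>
</div>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Concatenate the div's own text nodes, skipping everything inside child elements.
$ownText = '';
foreach ($xpath->query('//div[@id="id1"]/text()') as $textNode) {
    $ownText .= $textNode->nodeValue;
}
$ownText = trim($ownText);

echo $ownText, "\n";
```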
I'm trying to parse this feed: http://musicbrainz.org/ws/1/artist/c0b2500e-0cef-4130-869d-732b23ed9df5?type=xml&inc=url-rels
I want to grab the URLs inside the 'relation-list' tag.
I've tried fetching the URL with PHP using simplexml_load_file(), but I can't access it using $feed->artist->relation-list as PHP interprets "list" as the list() function.
I have a feeling I'm going about this wrong (not much XML experience), and even if I was able to get hold of the elements I want, I don't know how to extract their attributes (I just want the type and target fields).
Can anyone gently nudge me in the right direction?
Thanks.
Matt
Have a look at the examples on the php.net page, they actually tell you how to solve this:
// $feed->artist->relation-list
$feed->artist->{'relation-list'}
To get an attribute of a node, just use the attribute name as array index on the node:
foreach( $feed->artist->{'relation-list'}->relation as $relation ) {
$target = (string)$relation['target'];
$type = (string)$relation['type'];
// Do something with it
}
(Untested)
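Here is a self-contained sketch of the same loop that runs without fetching the feed. The inline XML is a trimmed-down, hypothetical stand-in for the real response, and the type and target values are invented.

```php
<?php
// Hypothetical stand-in for the feed; the real response has more elements and attributes.
$feed = simplexml_load_string(
    '<metadata><artist>'
    . '<relation-list>'
    . '<relation type="Wikipedia" target="http://example.com/wiki"/>'
    . '<relation type="Discogs" target="http://example.com/discogs"/>'
    . '</relation-list>'
    . '</artist></metadata>'
);

$links = [];
// Curly-brace syntax lets a property name contain a hyphen.
foreach ($feed->artist->{'relation-list'}->relation as $relation) {
    // Attributes are read with array notation and cast to plain strings.
    $links[(string) $relation['type']] = (string) $relation['target'];
}

print_r($links);
```

Casting to string matters here: without it, the array would hold SimpleXMLElement objects rather than the attribute values.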
Are there built-in functions in the latest versions of PHP specifically designed to aid in this task?
Use a DOM parser like SimpleXML to split the HTML code into nodes, and walk through the nodes to build the array.
For broken/invalid HTML, SimpleHTMLDOM is more lenient (but it's not built in).
String replace and explode would work if the HTML code is clean and always the same, but as soon as you have new attributes it will break.
So the only dependable solution would be using regular expressions or an XML/HTML parser.
Check http://php.net/manual/en/book.dom.php
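A rough sketch of the parser route with the DOM extension, assuming the goal is one sub-array per table row and that the markup contains a single table with no nested tables. The `tableToArray` name is hypothetical, and the type hints assume PHP 7.0 or later.

```php
<?php
/**
 * Convert the rows of an HTML table into an array of arrays of cell text.
 * Assumes a single table without nested tables.
 */
function tableToArray(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only header and data cells; skip whitespace text nodes.
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}

$rows = tableToArray(
    '<table><tr><th>Name</th><th>Age</th></tr><tr><td>Ann</td><td>30</td></tr></table>'
);
print_r($rows);
```

New attributes on the cells or rows do not affect the result, which is the main advantage over string replacement and explode.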
An alternative to using a native DOM parser could be using YQL. This way you don't have to do the actual parsing yourself. The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet.
For instance, to grab the HTML table with the class example given at
http://www.w3schools.com/html/html_tables.asp
you can do
$yql = 'http://tinyurl.com/yql-table-grab';
$yql = json_decode(file_get_contents($yql));
print_r( $yql->query->results );
I've deliberately shortened the URL so it does not mess up the answer. $yql actually links to the YQL API, adds some options and contains the query:
select * from html
where xpath="//table[@class='example']"
and url="http://www.w3schools.com/html/html_tables.asp"
YQL can return JSON and XML. I've made it return JSON and then decoded it, which results in a nested structure of stdClass objects and arrays (so it's not all arrays). You have to see if that fits your needs.
You can try out the interactive YQL console to see how it works.
I don't know if this is the fastest way, but you can check this class (it uses preg_replace):
http://wonshik.com/snippet/Convert-HTML-Table-into-a-PHP-Array
If you want to convert the html-description of a table, here's how I would do it:
remove all closing tags (</...>) ( http://php.net/manual/de/function.str-replace.php)
split string at opening tags (<...>) using a regular expression ( http://php.net/manual/en/function.split.php)
You have to work out the details on your own, since I do not know if you want to handle different lines as subarrays or you want to merge all lines into one big array or something else.
You could use the explode function to turn the table columns and rows into arrays.
See: PHP explode
I'm trying to fetch data from a div (based on its id) using PHP's PCRE. The goal is to fetch the div's contents based on its id, using recursion / depth to get everything inside it. The main problem here is getting the other divs inside the "main div", because the regex stops at the first </div> it finds after the initial <div id="test">.
I've tried many different approaches, and none of them worked. The best solution, in my opinion, is to use the R parameter (recursion), but I never got it to work properly.
Any ideas?
Thanks in advance :D
You'd be much better off using some form of DOM parser - regex really isn't suited to this problem. If all you want is basic HTML dom parsing, something like simplehtmldom would be right up your alley. It's trivial to install (just include a single PHP file) and trivial to use (2-3 lines will do what you need).
include('simple-html-dom.php');
$dom = str_get_html($bunchofhtmlcode);
$testdiv = $dom->find('div#test',0); // 0 for the first occurrence
$testdiv_contents = $testdiv->innertext;
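If you would rather avoid a third-party file, roughly the same result can be sketched with PHP's bundled DOM extension. This assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument, and the sample markup is made up. Nested divs are not a problem because the parser tracks the nesting for you.

```php
<?php
// Hypothetical markup with divs nested inside the target div, plus a sibling div.
$html = '<div id="test"><p>outer</p><div class="inner"><div>deep</div></div></div>'
      . '<div>after</div>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="test"]')->item(0);

// Serialize each child node to rebuild the inner HTML of the div.
$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}

echo $inner, "\n";
```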