Parsing Youtube Annotations XML with PHP - php
While parsing XML with php is not exactly a new topic, I have had problems ensuring I parse Annotations from Youtube due to the varying information coming back.
I have spent quite a lot of time reading up about iterating over the attributes (which I think would be the "safe way") but, unfortunately, I have been unsuccessful with my attempts.
Iterating:
$data = file_get_contents("https://www.youtube.com/annotations_invideo?video_id=GUmcQuJVCrI");
$xml = simplexml_load_string($data);
foreach($xml->children() as $child) {
$role = $child->attributes();
foreach($child as $key => $value) {
if($role == "annotation")
echo "Role:".$role. "<br />";
echo("[".$key ."] ".$value . "<br />");
}
}
Another way I have tried to parse the information is by actually following the paths of the nested elements for example:
$data = file_get_contents("https://www.youtube.com/annotations_invideo?video_id=tFYaQwYbvxg");
$xml = simplexml_load_string($data);
$xml = $xml->annotations[0];
Then with a foreach loop:
foreach($xml->annotation as $annotation) {
I have been able to retrieve some of the information I require
$startX = ($annotation->segment->movingRegion->rectRegion[0]->attributes()->x[0]);
$endX = ($annotation->segment->movingRegion->rectRegion[1]->attributes()->x[0]);
$startY = ($annotation->segment->movingRegion->rectRegion[0]->attributes()->y[0]);
$endY = ($annotation->segment->movingRegion->rectRegion[1]->attributes()->y[0]);
$startW = ($annotation->segment->movingRegion->rectRegion[0]->attributes()->w[0]);
$endW = ($annotation->segment->movingRegion->rectRegion[1]->attributes()->w[0]);
$startH = ($annotation->segment->movingRegion->rectRegion[0]->attributes()->h[0]);
$endH = ($annotation->segment->movingRegion->rectRegion[1]->attributes()->h[0]);
$startT = ($annotation->segment->movingRegion->rectRegion[0]->attributes()->t[0]);
$endT = ($annotation->segment->movingRegion->rectRegion[1]->attributes()->t[0]);
I am well aware this is very messy, and it fails if the data is in any way different or if there is an annotation type that I didn't know about.
I have been able to stop it failing for example type "text" annotations by ignoring some annotation types like so...
$type = $annotation->attributes()->type;
if (($type !="promotion") &&($type !="branding") &&($type !="card")&&($type !="drawer"))
but this is not really what I want to do in the long run. I really need to be able to parse all types, even if YouTube introduced a new type tomorrow!
Summary:
I really need a safe, reliable way to be able to retrieve all annotations from any Youtube video and store all the information about each one such as the position X and Y, the start time and end time, the text if exists, the link if exists.
I also need the styling if exists.
If exists is the main problem here I think!
I have it working well for some annotations but not all....
I would appreciate if someone could advise me how to better format this so it is reliable and efficient.
Thanks for your time.
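One way to cope with the "if exists" problem is to stop indexing into fixed paths and instead collect whatever attributes and regions are actually present. The following is a minimal sketch, not a definitive parser: the sample XML is invented, and only the names annotations, annotation, type, segment, movingRegion, rectRegion and x/y/w/h/t are taken from the question. The TEXT element, the ids and the helper names attributesOf() and summarizeAnnotations() are assumptions made for illustration.

```php
<?php
// Hypothetical sample: <TEXT> and the ids are invented for illustration;
// the other element and attribute names appear in the question.
$sample = <<<'XML'
<document>
  <annotations>
    <annotation id="a1" type="text">
      <TEXT>Hello</TEXT>
      <segment>
        <movingRegion type="rect">
          <rectRegion x="10.0" y="20.0" w="30.0" h="5.0" t="0:00:01.0"/>
          <rectRegion x="10.0" y="20.0" w="30.0" h="5.0" t="0:00:04.0"/>
        </movingRegion>
      </segment>
    </annotation>
    <annotation id="a2" type="promotion"/>
  </annotations>
</document>
XML;

/** Copy every attribute of an element into a plain array of strings. */
function attributesOf(SimpleXMLElement $element): array
{
    $result = [];
    foreach ($element->attributes() as $name => $value) {
        $result[$name] = (string) $value;
    }
    return $result;
}

/** Summarize every annotation without assuming which parts exist. */
function summarizeAnnotations(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return [];
    }
    $summaries = [];
    // '//annotation' finds annotations wherever they sit in the document.
    foreach ($xml->xpath('//annotation') ?: [] as $annotation) {
        $regions = [];
        // '*' accepts any region element, not only rectRegion.
        foreach ($annotation->xpath('segment/movingRegion/*') ?: [] as $region) {
            $regions[] = attributesOf($region);
        }
        $summaries[] = [
            'attributes' => attributesOf($annotation), // includes type, whatever it is
            'text'       => isset($annotation->TEXT) ? trim((string) $annotation->TEXT) : null,
            'regions'    => $regions,                  // empty when there is no segment
        ];
    }
    return $summaries;
}

$result = summarizeAnnotations($sample);
print_r($result);
```

Because xpath() returns an empty array when nothing matches, an annotation type without a segment simply yields an empty regions list instead of an error, so unknown types do not need to be filtered out beforehand.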
Related
How to get all data from a multi-dimensional object array when it changes all the time
At first I built an HTML table for one specific XML array, and it worked. But when I tested my code on others, it failed to read all levels :( Here are two paths that I try to access manually:
$newaCref->Debts->LiabilityDebts[$i]->Debt->Sum[$b]->Total
$newaCref->Debts->LiabilityDebts[$k]->Debt[$j]->Sum->Total
The first part, $newaCref->Debts->LiabilityDebts, is the same in both, but then the structure changes and goes even deeper. My question is: how can I make it go through all the levels I need automatically? This is how I do it now for each <td> row, where $newaCref is the parsed XML:
$newaCref = new SimpleXMLElement($xmldataC, LIBXML_NOCDATA);
for($i=0;$i<count($newaCref->Debts->LiabilityDebts);$i++){
    for($b=0;$b<count($newaCref->Debts->LiabilityDebts[$i]->Debt->Sum);$b++){
        foreach($newaCref->Liabilities->Liability[$i]->Sum[$b]->Total as $row){
            echo '<td>'.$row.'</td>';
        }
    }
}
Not tested, just to explain:
$newaCref = new SimpleXMLElement($xmldataC, LIBXML_NOCDATA);
$result = $newaCref->xpath('/Debts/LiabilityDebts');
foreach ($result as $node) {
    $subresult = $node->xpath('Debt/Sum');
    foreach ($subresult as $subnode) {
        echo '<td>'.((string)$subnode).'</td>';
    }
}
Notes:
In your example the inner foreach makes no sense at this point: $newaCref->Liabilities->Liability[$i]. The XPath for this would be /Liabilities/Liability.
If you are showing $xmldataC in your example, why are you saying "@RiggsFolly i cant do it :(" and not just doing print_r(htmlentities($xmldataC))?
So take this as an example and get it to work for your needs. But keep in mind that XML can be tricky if you don't know how it works, especially in PHP. :)
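To make the varying depth concrete, here is a small sketch that uses a relative descendant query so that both Debt->Sum[$b]->Total and Debt[$j]->Sum->Total are found by the same code. The sample XML, including the Report root element and the numbers, is invented for illustration; only the Debts, LiabilityDebts, Debt, Sum and Total names come from the question.

```php
<?php
// Hypothetical data: the <Report> root and all values are assumptions.
$xmldataC = <<<'XML'
<Report>
  <Debts>
    <LiabilityDebts>
      <Debt>
        <Sum><Total>100</Total></Sum>
        <Sum><Total>250</Total></Sum>
      </Debt>
    </LiabilityDebts>
    <LiabilityDebts>
      <Debt><Sum><Total>40</Total></Sum></Debt>
      <Debt><Sum><Total>60</Total></Sum></Debt>
    </LiabilityDebts>
  </Debts>
</Report>
XML;

$newaCref = new SimpleXMLElement($xmldataC, LIBXML_NOCDATA);

$rows = [];
// The path is relative to the root element, so the root name does not matter.
foreach ($newaCref->xpath('Debts/LiabilityDebts') as $i => $liability) {
    $totals = [];
    // './/Sum/Total' matches Total under Sum at any depth below this node.
    foreach ($liability->xpath('.//Sum/Total') as $total) {
        $totals[] = (string) $total;
    }
    $rows[$i] = $totals;
}

// Print one table row per LiabilityDebts element.
foreach ($rows as $totals) {
    echo '<tr>';
    foreach ($totals as $total) {
        echo '<td>' . htmlspecialchars($total) . '</td>';
    }
    echo "</tr>\n";
}
```

The descendant axis trades precision for tolerance: it no longer matters whether Debt or Sum is the repeated element, but it would also pick up a Sum/Total nested somewhere unexpected, so a stricter path may be preferable once the structure is known.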
Parsing NWS XML Data - cap:geocode
I followed the info here: Parse XML namespaces with php SimpleXML And that works for everything except the information contained in the "cap:geocode" and "cap:parameter" entries. $geocode = $entry->children('cap',true)->geocode; returns an empty value. Any ideas on how to get at the data inside of the cap:geocode and cap:parameter entries? <cap:geocode> <valueName>FIPS6</valueName> <value>048017 048079</value> <valueName>UGC</valueName> <value>TXZ027 TXZ033</value> </cap:geocode> I need to read the ValueName/Value pairs.
I used this example here: https://github.com/tylerlane/php.news-leader.com/blob/master/weather/alerts.php
and simplified it for my purposes to get this (echo.php just prints the data out):
$dataFileName = "wx/CAP.xml";
//load the feed
$capXML = simplexml_load_file($dataFileName);
//how many items
$itemsTotal = count($capXML->entry);
if($itemsTotal):
    $capXML->registerXPathNamespace('prefix', 'http://www.w3.org/2005/Atom');
    $result = $capXML->xpath("//prefix:entry");
    foreach($result as $capXML):
        $dc = $capXML->children('urn:oasis:names:tc:emergency:cap:1.1');
        $event = $dc->event;
        $effective = $dc->effective;
        $expires = $dc->expires;
        $status = $dc->status;
        $msgType = $dc->msgType;
        $category = $dc->category;
        $urgency = $dc->urgency;
        $severity = $dc->severity;
        $certainty = $dc->certainty;
        $areadesc = $dc->areaDesc;
        $geopolygon = $dc->polygon;
        //get the children of the geocode element
        $geocodechildren = $dc->geocode->children();
        //only interested in FIPS6 for now
        //no guarantee that FIPS6 will be the first child so we have to deal with that
        if($geocodechildren->valueName == "FIPS6"){
            //isolate all the FIPS codes
            $fips = explode( " ", $geocodechildren->value );
        } else {
            //hide everything else so we don't fail
            $fips = Array();
        }
        //get the VTEC
        $parameter_children = $dc->parameter->children();
        if($parameter_children->valueName == "VTEC"){
            //isolate all VTEC codes
            $vtec = explode( ".", $parameter_children->value );
        } else {
            //hide anything else that may show up
            $vtec = Array();
        }
        include('echo.php');
        print_r($fips);
        echo "<br/>";
        print_r($vtec);
        echo "<hr/>";
    endforeach;
endif;
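Since there is no guarantee that FIPS6 is the first valueName, one option is to walk the children of cap:geocode in document order and pair each valueName with the value that follows it. This is only a sketch under assumptions: the sample feed is invented, it declares both namespaces properly, and it assumes the unprefixed valueName/value elements sit in the Atom default namespace. The function name readPairs() is hypothetical.

```php
<?php
const ATOM_NS = 'http://www.w3.org/2005/Atom';
const CAP_NS  = 'urn:oasis:names:tc:emergency:cap:1.1';

// Hypothetical feed fragment with both namespaces declared.
$feed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:cap="urn:oasis:names:tc:emergency:cap:1.1">
  <entry>
    <cap:event>Winter Storm Warning</cap:event>
    <cap:geocode>
      <valueName>FIPS6</valueName>
      <value>048017 048079</value>
      <valueName>UGC</valueName>
      <value>TXZ027 TXZ033</value>
    </cap:geocode>
  </entry>
</feed>
XML;

/** Pair each valueName with the value that follows it, in document order. */
function readPairs(SimpleXMLElement $container): array
{
    $pairs = [];
    $current = null;
    foreach ($container->children(ATOM_NS) as $child) {
        if ($child->getName() === 'valueName') {
            $current = trim((string) $child);
        } elseif ($child->getName() === 'value' && $current !== null) {
            // Values are whitespace-separated lists of codes.
            $pairs[$current] = preg_split('/\s+/', trim((string) $child), -1, PREG_SPLIT_NO_EMPTY);
            $current = null;
        }
    }
    return $pairs;
}

$xml = simplexml_load_string($feed);
foreach ($xml->entry as $entry) {
    $cap = $entry->children(CAP_NS);
    $geocodes = readPairs($cap->geocode);
    print_r($geocodes['FIPS6'] ?? []);
}
```

Looking the codes up by name afterwards ($geocodes['FIPS6'], $geocodes['UGC']) means the order of the pairs in the feed no longer matters.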
Any ideas on how to get at the data inside of the cap:geocode and cap:parameter entries?
The key point in your case is that the XML provided in the other question is invalid. You would have noticed that if you had followed a good practice in PHP development: enable reporting of errors, warnings and notices at the highest level, display them as well as log them to a file, and then track those warnings. In your case you should have seen a message like:
Warning: simplexml...: namespace error : Namespace prefix cap on event is not defined in /path/to/script.php on line 42
This is a notice that the cap XML namespace prefix is undefined. That means that SimpleXML will drop it. Those elements are then put into the default namespace of the document so that you can access them directly.
So first of all, make yourself comfortable with setting up the php.ini file on your development system for development error reporting, so that you'll be notified about unexpected input values. One stop for that is the following question: How to get useful error messages in PHP?
Next, you need to decide why the input is wrong and how you'd like to deal with errors. Should it fail (the design of XML suggests going with the fail route, which is also considered a design issue for XML), do you want to "repair" the XML, or do you want to work with the invalid XML? That decision is up to you. SimpleXML does work as announced; it's just that in your case the error went unnoticed and you're not doing any error handling so far.
The same problem with similar XML has been asked and answered previously:
SimpleXML PHP Parsing [Duplicate] (marked as duplicate, albeit the duplicate does not talk about the error)
Create a WS-Security header using SimpleXML? (create an XML document with SimpleXML with namespace prefixes)
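Besides raising the error reporting level, the parser messages can also be collected in code so that input problems such as an undefined prefix do not go unnoticed. The following is a sketch using the libxml error functions; the helper name loadXmlWithMessages() and the sample strings are made up for illustration, and the exact message text depends on the installed libxml version.

```php
<?php
/**
 * Parse a string and return [SimpleXMLElement|null, list of parser messages].
 * Hypothetical helper; it only wraps standard libxml/SimpleXML functions.
 */
function loadXmlWithMessages(string $xmlString): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($xmlString);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$xml === false ? null : $xml, $messages];
}

// The cap prefix is used but never declared in this invented input.
[$xml, $messages] = loadXmlWithMessages('<feed><cap:event>Flood Warning</cap:event></feed>');
foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

With the messages in hand, the calling code can decide whether to fail, log and continue, or attempt a repair, which is the decision the answer above leaves to the developer.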
Breaking foreach at certain string/ Reading through text file and generating XML
I don't know if this is the right way to go about it, but right now I am dealing with a very large text file of membership details. It is really inconsistent though, but typically conforming to this format: Name School Department Address Phone Email &&^ (indicating the end of the individual record) What I want to do with this information is read through it, and then format it into XML. So right now I have a foreach reading through the long file like this: <?php $textline = file("asrlist.txt"); foreach($textline as $showline){ echo $showline . "<br>"; } ?> And that's where I don't know how to continue. Can anybody give me some hints on how I could organize these records into XML?
Here is a straightforward solution using SimpleXML:
// $textline must hold the whole file as one string here, e.g. from file_get_contents()
$members = explode('&&^', $textline); // building array $members
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><members></members>');
$fieldnames = array('name','school','department','address','phone','email');
// set $fieldsep to character(s) that separate fields from each other in your textfile
$fieldsep = '\p\n'; // a wild guess...
foreach ($members as $member) {
    $m = explode($fieldsep, $member); // build array $m; $m[0] would contain "name" etc.
    $xmlmember = $xml->addChild('member');
    foreach ($m as $key => $data) $xmlmember->addChild($fieldnames[$key],$data);
} // foreach $members
$xml->asXML('mymembers.xml');
For reading and parsing the text file, CSV-related functions could be a good alternative, as mentioned by other users.
To read big files you can use fgetcsv
If && works as a delimiter for records in that file, you could start with replacing it with </member><member>. Prepend whole file with <member> and append </member> at the end. You will have something XML alike. How to replace? You might find unix tools like sed useful. sed 's/&&/\<\/member\>\<member\>/' <input.txt >output.xml You can also accomplish it with PHP, using str_replace(): foreach($textline as $showline){ echo str_replace( '&&', '</member><member>', $showline ) . "<br>"; }
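Plain string replacement produces something XML-like, but characters such as & or < in the data would break it. A sketch that builds the document with DOMDocument instead is shown below. It assumes one field per line and &&^ as the record separator; the function name membersToXml(), the fallback element name extra and the sample records are invented for illustration.

```php
<?php
/** Convert "&&^"-separated records (one field per line, an assumption) to XML. */
function membersToXml(string $text): string
{
    $fieldNames = ['name', 'school', 'department', 'address', 'phone', 'email'];

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement('members'));

    foreach (explode('&&^', $text) as $record) {
        // Split the record into trimmed, non-empty lines.
        $lines = array_values(array_filter(
            array_map('trim', preg_split('/\R/', $record)),
            static function ($line) { return $line !== ''; }
        ));
        if ($lines === []) {
            continue; // skip the empty tail after the last separator
        }
        $member = $root->appendChild($doc->createElement('member'));
        foreach ($lines as $i => $line) {
            // Lines beyond the known fields are kept under a generic name.
            $field = $member->appendChild($doc->createElement($fieldNames[$i] ?? 'extra'));
            // createTextNode() escapes &, < and > when the document is saved.
            $field->appendChild($doc->createTextNode($line));
        }
    }
    return $doc->saveXML();
}

// Hypothetical records; the second one is deliberately incomplete.
$text = "Ann Lee\nState U\nMath & Stats\n1 Main St\n555-0100\nann@example.org\n&&^\nBob Ray\nCity College\n&&^\n";
$output = membersToXml($text);
echo $output;
```

Since the records are described as inconsistent, a record with fewer lines simply produces fewer child elements; mapping lines to field names by position is still a guess that may need per-field checks on real data.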
Passing Object Operators As Strings (PHP)
I'm building a script that takes the contents of several (~13) news feeds, parses the XML data and inserts the records into a database. Since I don't have any control over the structure of the feeds, I need to tailor an object operator for each one to drill down into the structure in order to get the information I need.
The script works just fine if the target node is one step below the root, but if my string contains a second step, it fails ('foo' works, but 'foo->bar' fails). I've tried escaping characters and eval(), but I feel like I'm missing something glaringly obvious. Any help would be greatly appreciated.
// Roadmaps for xml navigation
$roadmap[1] = "deal"; // works
$roadmap[2] = "channel->item"; // fails
$roadmap[3] = "deals->deal";
$roadmap[4] = "resource";
$roadmap[5] = "object";
$roadmap[6] = "product";
$roadmap[8] = "channel->deal";
$roadmap[13] = "channel->item";
$roadmap[20] = "product";
$xmlSource = $xmlURL[$fID];
$xml=simplexml_load_file($xmlSource) or die(mysql_error());
if (!(empty($xml))) {
    foreach($xml->$roadmap[$fID] as $div) {
        include('./_'.$incName.'/feedVars.php');
        include('./_includes/masterCategory.php.inc');
        $test = sqlVendors($vendorName);
    } // end foreach
    echo $vUpdated." records updated.<br>";
    echo $vInserted." records Inserted.<br><br>";
} else {
    echo $xmlSource." returned an empty set!";
} // END IF empty $xml result
While Fosco's solution will work, it is indeed very dirty. How about using xpath instead of object properties? $xml->xpath('deals/deal');
PHP isn't going to magically turn your string which includes -> into a second-level search. Quick and dirty hack...
eval("\$node = \$xml->" . $roadmap[$fID] . ";");
foreach($node as $div) {
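The same roadmap strings can also be followed without eval() by splitting them on -> and descending one property at a time. The sketch below assumes the roadmap format from the question; the function name descend() and the sample RSS string are invented for illustration.

```php
<?php
/**
 * Follow a roadmap such as "channel->item" from $root.
 * Returns the matching element list, or null if a step does not exist.
 */
function descend(SimpleXMLElement $root, string $roadmap): ?SimpleXMLElement
{
    $node = $root;
    foreach (explode('->', $roadmap) as $segment) {
        $segment = trim($segment);
        if ($segment === '' || !isset($node->{$segment})) {
            return null;
        }
        // Variable property access: one level per loop iteration.
        $node = $node->{$segment};
    }
    return $node;
}

// Hypothetical feed used only to exercise the function.
$xml = simplexml_load_string(
    '<rss><channel><item><title>A</title></item><item><title>B</title></item></channel></rss>'
);

$titles = [];
foreach (descend($xml, 'channel->item') ?? [] as $div) {
    $titles[] = (string) $div->title;
}
print_r($titles);
```

Compared with eval(), a malformed or unexpected roadmap string can at worst yield null here, rather than executing arbitrary code.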
Parsing XML with PHP (simplexml)
Firstly, may I point out that I am a newcomer to all things PHP so apologies if anything here is unclear and I'm afraid the more layman the response the better. I've been having real trouble parsing an xml file in to php to then populate an HTML table for my website. At the moment, I have been able to get the full xml feed in to a string which I can then echo and view and all seems well. I then thought I would be able to use simplexml to pick out specific elements and print their content but have been unable to do this. The xml feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed in to the right format within a string although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?! $file = fopen("compress.zlib://$url", 'r'); $xmlstr = file_get_contents($url); $xml = new SimpleXMLElement($url,null,true); foreach($xml as $name) { echo "{$name->awCat}\r\n"; } Many, many thanks in advance, Chris PS The actual feed
Since no one followed my close vote, I may as well post my comments as an answer.
First of all, SimpleXML can load URIs directly, and it can do so with stream wrappers, so your three calls at the beginning can be shortened to the following (note that you are not using $file at all):
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values, you can either use the implicit SimpleXML API and drill down to the wanted elements (as shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
    echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly:
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
    echo $awCat, PHP_EOL;
}
Note that fetching all $awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" as their value. Of course, you can also mix XPath and the implicit API and just fetch the prod elements and then drill down to their various children.
Using XPath should be somewhat faster than iterating over the SimpleXMLElement object graph, though the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see A simple program to CRUD node and node values of xml file and PHP Manual - SimpleXml Basic Examples
Try this... $url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/"; $zd = gzopen($url, "r"); $data = gzread($zd, 1000000); gzclose($zd); if ($data !== false) { $xml = 
simplexml_load_string($data); foreach ($xml->merchant->prod as $pr) { echo $pr->cat->awCat . "<br>"; } }
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
    // $line->cat->awCat -> you can use this
}
Use print_r($xml) to see the structure of the parsed XML feed. Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
    print $prod->pId;
    print $prod->text->name;
    print $prod->cat->awCat; # <-- which is what you wanted
    print $prod->price->buynow;
}
$url = 'your URL here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000));
foreach($xml->xpath ('//prod') as $name) {
    echo (string) $name->cat->awCatId, "\r\n";
}