PHPQuery not extracting meta details into an array (PHP)

I am experimenting with PHPQuery (https://code.google.com/p/phpquery/) to scrape data from my website.
I want to extract meta information from a page.
Here is what I have tried so far:
$html = phpQuery::newDocumentHTML($file, $charset = 'utf-8');
$MetaItems = [];
foreach (pq('meta') as $keys) {
    $names = trim(strtolower(pq($keys)->attr('name')));
    if ($names !== null && $names !== '') {
        array_push($MetaItems, $names);
    }
}
for ($i=0; $i < count($MetaItems); $i++) {
    $test = 'meta[name="' . $MetaItems[$i] . '"]';
    echo pq($test)->html();
}
In the code above, $MetaItems receives all the meta name attributes, and this array is filled correctly.
But selecting the elements and extracting their text does not work. How do I get the above code to work?
Thanks.

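A <meta> element is a void element with no inner HTML, so ->html() returns an empty string; the value you want lives in its content attribute. You want an associative array of name => content, correct? Try this:
$metaItems = array();
foreach(pq('meta') as $meta) {
    $key = pq($meta)->attr('name');
    $value = pq($meta)->attr('content');
    $metaItems[$key] = $value;
}
var_dump($metaItems);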
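For comparison, here is a minimal sketch of the same name => content map using PHP's built-in DOM extension (DOMDocument) instead of phpQuery. This swaps in a different library; the function name extractMetaTags and the sample HTML are hypothetical. Tags such as <meta charset> that have no name attribute are skipped, so they do not end up under an empty key.

```php
<?php
// Hypothetical helper: build a name => content map from <meta> tags
// using DOMDocument (a swap-in for phpQuery, not phpQuery's API).
function extractMetaTags(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect real-world HTML instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $node) {
        $name = $node->getAttribute('name'); // '' when the attribute is absent
        if ($name === '') {
            continue; // e.g. <meta charset> or <meta http-equiv>
        }
        // The value is in the content attribute, not in the element's body.
        $meta[$name] = $node->getAttribute('content');
    }
    return $meta;
}

$html = '<html><head><meta charset="utf-8">'
      . '<meta name="description" content="A test page">'
      . '<meta name="keywords" content="php, scraping">'
      . '</head><body></body></html>';

print_r(extractMetaTags($html));
```

Note that the name is used exactly as written in the page (no strtolower or trim), so mixed-case names are preserved.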

Going under the assumption that the values you are extracting are exactly the same as the values of the name attributes you're trying to get: I'm pretty sure the value of the name attribute is case sensitive. You need to remove the strtolower and the trim, since both could be causing issues. I would replace the first part with this:
$html = phpQuery::newDocumentHTML($file, $charset = 'utf-8');
$MetaItems = [];
foreach (pq('meta') as $keys) {
    $names = pq($keys)->attr('name');
    if (!empty($names) && trim($names)) {
        array_push($MetaItems, $names);
    }
}
Hope that helps.

Related

How to fix repeated names and ids in HTML using PHP?

I have HTML generated by another app that has some inconsistencies:
some link
<div id="ftn1">
content
</div>
another link
<div id="ftn1">
another content
</div>
As you can see, there is an inconsistency here because the id "ftn1" is used twice. The same happens with the name _ftnref1.
So I would like to know whether there is a library or a native way to fix these repeated ids and names by "incrementing" their numbers, so that no value is repeated.
Thanks in advance
First, download and include the php-simple-html-dom-parser library, then try this function:
// Requires simple_html_dom.php (php-simple-html-dom-parser) for str_get_html().
// Declared as a plain function; add "public" back if you put it inside a class.
function fixIdNames($html_original) {
    // Counts how many times each id/name has been seen ("encontrados" = "found")
    $id_encontrados = [];
    // Create DOM from string
    $html = str_get_html($html_original);

    // Renumber duplicate div ids: the second "ftn1" becomes "ftn2", and so on
    foreach ($html->find("div[id]") as $element) {
        $id = $element->id;
        if (isset($id_encontrados[$id])) {
            $id_encontrados[$id]++;
            // Strip the digits and append the occurrence count
            $element->id = preg_replace('/[0-9]+/', '', $id) . $id_encontrados[$id];
        } else {
            $id_encontrados[$id] = 1;
        }
    }

    // Same treatment for duplicate anchor names such as "_ftnref1"
    foreach ($html->find("a[name]") as $element) {
        $name = $element->name;
        if (isset($id_encontrados[$name])) {
            $id_encontrados[$name]++;
            $element->name = preg_replace('/[0-9]+/', '', $name) . $id_encontrados[$name];
        } else {
            $id_encontrados[$name] = 1;
        }
    }

    return (string) $html;
}
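The function above can produce a clash: if the page already contains an "ftn2", renaming the second "ftn1" to "ftn2" creates a new duplicate. Below is a minimal sketch that avoids this by first recording every value already in use, written with PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom (a swapped-in technique). The name renumberDuplicates is hypothetical, and it assumes only the trailing digits are the counter. Like the original, it does not update href="#..." links that point at renamed targets.

```php
<?php
// Hypothetical helper: give every repeated div id / a name a fresh number,
// never reusing a value that already exists in the document.
function renumberDuplicates(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    foreach (['id' => '//div[@id]', 'name' => '//a[@name]'] as $attr => $query) {
        $nodes = iterator_to_array($xpath->query($query));

        // Every value currently present, so generated values never collide.
        $used = [];
        foreach ($nodes as $node) {
            $used[$node->getAttribute($attr)] = true;
        }

        $seen = [];
        foreach ($nodes as $node) {
            $value = $node->getAttribute($attr);
            if (!isset($seen[$value])) {
                $seen[$value] = true; // first occurrence keeps its value
                continue;
            }
            // Drop the trailing counter and find the lowest free number.
            $prefix = preg_replace('/[0-9]+$/', '', $value);
            $n = 1;
            while (isset($used[$prefix . $n])) {
                $n++;
            }
            $used[$prefix . $n] = true;
            $node->setAttribute($attr, $prefix . $n);
        }
    }
    return $doc->saveHTML();
}

$input = '<a href="#ftn1" name="_ftnref1">some link</a><div id="ftn1">content</div>'
       . '<a href="#ftn1" name="_ftnref1">another link</a><div id="ftn1">another content</div>';
$out = renumberDuplicates($input);
echo $out;
```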

PHP array & XML: can't get all content

I'm trying to get all the content from this XML: https://api.eveonline.com/eve/SkillTree.xml.aspx
so that I can save it in a MySQL DB.
But some data is missing.
Could anyone who understands PHP arrays and XML help me, please?
This is my code to get the content:
<?php
$filename = 'https://api.eveonline.com/eve/SkillTree.xml.aspx';
$xmlbalance = simplexml_load_file($filename);
$skills = array();
for ($x=0;$x<sizeOf($xmlbalance->result->rowset->row);$x++) {
    $groupName = $xmlbalance->result->rowset->row[$x]->attributes()->groupName;
    $groupID = $xmlbalance->result->rowset->row[$x]->attributes()->groupID;
    for ($y=0;$y<sizeOf($xmlbalance->result->rowset->row[$x]->rowset->row);$y++) {
        $skills[$x]["skillID"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->attributes()->typeID;
        $skills[$x]["skillName"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->attributes()->typeName;
        $skills[$x]["skillDesc"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->description;
        $skills[$x]["skillRank"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rank;
        $skills[$x]["skillPrimaryAtr"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->requiredAttributes->primaryAttribute;
        $skills[$x]["skillSecondAtr"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->requiredAttributes->secondaryAttribute;
        $o = 0;
        for ($z=0;$z<sizeOf($xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->row);$z++) {
            if ($xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->attributes()->name == "requiredSkills") {
                $skills[$x]["requiredSkills"]["".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->row[$z]->attributes()->typeID] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->row[$z]->attributes()->skillLevel;
                $o++;
            }
        }
    }
}
echo '<pre>'; print_r($skills); echo '</pre>';
?>
If you go to the original XML (link), at line 452, you will see:
<row groupName="Spaceship Command" groupID="257">
And that isn't shown in my array (link).
That is one thing I found missing.
I think more content is probably missing too.
Why, and how can I fix it?
Thank you!
You will only get a total of sizeof($xmlbalance->result->rowset->row) records, one per skill group, because your second for loop stores every skill of a group in the same array element, $skills[$x]. Each skill overwrites the previous one, so only the last skill of each group survives.
Try this (I also highly encourage you to be as lazy as you can when you write code; by lazy I mean avoid repeating or rewriting the same code over and over if possible):
$filename = 'https://api.eveonline.com/eve/SkillTree.xml.aspx';
$xmlbalance = simplexml_load_file($filename);
$skills = array();
// One iteration per skill group
foreach ($xmlbalance->result->rowset->row as $row)
{
    $groupName = $row->attributes()->groupName;
    $groupID = $row->attributes()->groupID;
    // One iteration per skill inside the group
    foreach ($row->rowset->row as $subRow)
    {
        // Start a fresh array for every skill so values (notably
        // requiredSkills) do not carry over from the previous skill
        $skill = array();
        $skill['groupName'] = (string) $groupName;
        $skill['groupID'] = (string) $groupID;
        $skill['skillID'] = (string) $subRow->attributes()->typeID;
        $skill['skillName'] = (string) $subRow->attributes()->typeName;
        $skill['skillDesc'] = (string) $subRow->description;
        $skill['skillRank'] = (string) $subRow->rank;
        $skill['skillPrimaryAtr'] = (string) $subRow->requiredAttributes->primaryAttribute;
        $skill['skillSecondAtr'] = (string) $subRow->requiredAttributes->secondaryAttribute;
        foreach ($subRow->rowset as $subSubRowset)
        {
            if ($subSubRowset->attributes()->name == 'requiredSkills')
            {
                foreach ($subSubRowset->row as $requiredSkill)
                {
                    $skill['requiredSkills'][(string) $requiredSkill->attributes()->typeID] = (string) $requiredSkill['skillLevel'];
                }
            }
        }
        // [] appends, so every skill gets its own element
        $skills[] = $skill;
    }
}
print_r($skills);
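The overwrite can be reproduced without calling the API. The sketch below uses a tiny, hypothetical stand-in for the SkillTree XML (one group with two made-up skills, not real API data) and fills two arrays side by side: one keyed by the group counter $x as in the question, and one that appends with [] as in the answer.

```php
<?php
// Hypothetical, trimmed-down stand-in for the SkillTree XML: one group, two skills.
$source = <<<XML
<eveapi><result><rowset name="skillGroups">
  <row groupName="Spaceship Command" groupID="257">
    <rowset name="skills">
      <row typeName="Skill A" typeID="1"/>
      <row typeName="Skill B" typeID="2"/>
    </rowset>
  </row>
</rowset></result></eveapi>
XML;
$xml = simplexml_load_string($source);

$overwritten = [];
$appended = [];
$x = 0;
foreach ($xml->result->rowset->row as $group) {
    foreach ($group->rowset->row as $skill) {
        // Same key $x on every inner iteration: each skill replaces the last one.
        $overwritten[$x]['skillName'] = (string) $skill['typeName'];
        // [] creates a new element, so every skill is kept.
        $appended[] = [
            'groupName' => (string) $group['groupName'],
            'skillName' => (string) $skill['typeName'],
        ];
    }
    $x++;
}

echo count($overwritten), "\n"; // 1
echo count($appended), "\n";    // 2
```

A manual $x counter is used because foreach over SimpleXML children yields the element name ("row") as the key, not a numeric index.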

jQuery tag suggestion

I'm trying to include this plugin on my site. It works fine, but I'm having some problems with the PHP side.
When I write the PHP like this:
$default_tags = 'Avanture, Giorgio, Armani, Depeche, Mode, Pevanje, Francuska, usluživanje, Pravo, Menadžer, prodaje, Advokat';
if (!@$_SESSION['existing_tags']) {
    $_SESSION['existing_tags'] = $default_tags;
}
$existing_tags = $_SESSION['existing_tags'];
$tags = split(' ', $default_tags);
if (isset($_SERVER['HTTP_X_REQUESTED_WITH']) && @$_GET['tag']) {
    $match = array();
    foreach ($tags as $tag) {
        if (stripos($tag, $_GET['tag']) === 0) {
            $match[] = $tag;
        }
    }
    echo json_encode($match);
}
exit;
}
It works fine, but when I try to get the tags from the database I have problems.
I have tried:
$query = mysql_query("SELECT * FROM tags");
while($row = mysql_fetch_array($query)) {
    $default_tags = ''.$row['keyz'].', ';
    if (!@$_SESSION['existing_tags']) {
        $_SESSION['existing_tags'] = $default_tags;
    }
    $existing_tags = $_SESSION['existing_tags'];
    $tags = split(' ', $default_tags);
    if (isset($_SERVER['HTTP_X_REQUESTED_WITH']) && @$_GET['tag']) {
        $match = array();
        foreach ($tags as $tag) {
            if (stripos($tag, $_GET['tag']) === 0) {
                $match[] = $tag;
            }
        }
        echo json_encode($match);
    }
    exit;
}
This approach does not work for me. Here is also a screenshot of my tags database table. What is wrong with the code above?
Your problem is that you keep overwriting the $default_tags variable over and over again.
At the end of your while loop, all you have is the last row with a comma at the end.
In other words, you are storing nothing but the last row in that variable.
If you do the following, you get something similar to what you're trying to do:
$default_tags_arr = array();
while($row = mysql_fetch_array($query)) {
    array_push($default_tags_arr, $row["keyz"]);
}
$default_tags = join(", ",$default_tags_arr);
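A minimal sketch of the whole flow with the database replaced by a plain array: $rows is a hypothetical stand-in for the rows returned by the tags query (only the keyz column is used, as in the question), and $typed stands in for $_GET['tag']. It splits on the same ", " separator used for joining, and uses explode() because split() was removed in PHP 7.0.

```php
<?php
// Hypothetical stand-in for the rows returned by "SELECT * FROM tags".
$rows = [
    ['keyz' => 'Avanture'],
    ['keyz' => 'Advokat'],
    ['keyz' => 'Pravo'],
];

// Accumulate every row instead of overwriting the variable.
$defaultTagsArr = [];
foreach ($rows as $row) {
    $defaultTagsArr[] = $row['keyz'];
}
$defaultTags = implode(', ', $defaultTagsArr);

// Split on the same ", " separator used to join, so no tag keeps a trailing comma.
$tags = explode(', ', $defaultTags);

$typed = 'a'; // hypothetical value of $_GET['tag']
$match = [];
foreach ($tags as $tag) {
    // Case-insensitive "starts with" check, as in the question.
    if (stripos($tag, $typed) === 0) {
        $match[] = $tag;
    }
}
echo json_encode($match); // ["Avanture","Advokat"]
```

Note that the matching and the exit belong after the loop, not inside it; otherwise the script stops after the first row.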
You should describe exactly what the problem is. Do you get an error message?
One thing I'm seeing that seems wrong to me is the fetch from the database:
while($row = mysql_fetch_array($query)) {
$default_tags = ''.$row['keyz'].', ';
For every row you are overwriting $default_tags completely. I think you meant:
$default_tags = ''; // start empty so .= has something to append to
while($row = mysql_fetch_array($query)) //where is the closing curly brace in the original code?
    $default_tags .= ''.$row['keyz'].', '; //notice the . before =
to concatenate the tags.
Aside from this, I'm having trouble understanding the code:
$existing_tags = $_SESSION['existing_tags'];
$tags = split(' ', $default_tags);
Here you assign the variable $existing_tags, which you never use later in the code. Did you mean to use it in the next line instead of $default_tags? What exactly is the code supposed to do?

How do I merge matching strings in an array?

Hi, I currently have this code that retrieves the tags of each image in the database. The tags are separated by commas, and I append each image's set of tags to the end of an array. Now I want to build an array of the retrieved tags with any duplicates merged.
function get_tags()
{
    $tag_array = array();
    $query = mysql_query("
        SELECT tags
        FROM gallery_image
    ");
    while($row = mysql_fetch_assoc($query))
    {
        $tags = $row['tags'];
        $tag_array[] = $tags;
    }
    echo $tag_array[0] . '<br>' . $tag_array[1] . '<br>' .$tag_array[2];
}
Your question is not very clear, but array_unique may be what you need:
function get_tags()
{
    $query = mysql_query('SELECT tags '.
                         'FROM gallery_image');
    $tag_array = array();
    while($row = mysql_fetch_assoc($query))
        $tag_array = array_merge($tag_array, explode(',', $row['tags']));
    return array_unique($tag_array);
}
You probably want something like this:
$tags = array(
    'one,two',
    'one,three',
);
$result = array_unique(array_reduce($tags,
    function($curr, $el) {
        return array_merge($curr, explode(',', $el));
    },
    array()));
This processes each result row (which I assume looks like "tag1,tag2") in turn with array_reduce, splitting out its tags with explode and collecting them into an intermediate array with one tag per element. array_unique then filters out the duplicate tags to produce the final result.
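A minimal sketch of the same flatten-then-deduplicate idea as a standalone function, under two assumptions not stated in the question: tags may carry stray spaces around the commas (so each one is trimmed), and a sequentially indexed result is wanted (array_unique keeps the original keys, so array_values re-indexes it). The name uniqueTags is hypothetical.

```php
<?php
// Hypothetical helper: flatten comma-separated tag strings into one list
// with duplicates removed.
function uniqueTags(array $tagStrings): array
{
    $all = [];
    foreach ($tagStrings as $s) {
        // "one, three" -> ["one", "three"] once each piece is trimmed
        $all = array_merge($all, array_map('trim', explode(',', $s)));
    }
    // array_unique keeps the first occurrence but also its key;
    // array_values re-indexes the result as 0..n-1.
    return array_values(array_unique($all));
}

print_r(uniqueTags(['one,two', 'one, three', 'two']));
```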
Try this:
$unique_tags = array();
foreach ($tag_array as $value) {
    $unique_tags = array_merge($unique_tags, explode(",", $value));
}
$unique_tags = array_unique($unique_tags);

PHP foreach Loop Element Index

I have an XPath query that gets the genres of a movie.
$genreXpath = $xml_data->xpath("//category");
I get the attributes from $genreXpath like this:
$genreName=array();
$genresID=array();
$i=0;
foreach($genreXpath as $node) {
    $genre = $node->attributes();
    $genreName[$i] = $node["name"];
    $genresID[$i] = $node["id"];
    $i++;
}
I'm going to write these values to a DB, hence the two separate arrays.
This code works, but I suspect there is a better way of doing this, whether with a 2D array, without the $i counter, or with something more obvious that I haven't figured out. Any pointers?
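foreach($genreXpath as $i=>$node) { //note $i is your index of the current $node
    $genre = $node->attributes();
    $genreName[$i] = $node["name"];
    $genresID[$i] = $node["id"];
}
Because xpath() returns a plain, zero-indexed PHP array, $i is simply the key of the current element: it counts up from 0 on its own, so you don't need to declare or increment it yourself.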
Use foreach($genreXpath as $key => $node) {
If you're looking for a multidimensional array, you could do:
$genres = array();
foreach($genreXpath as $node) {
    // Cast to string so the array holds plain values rather than SimpleXMLElement objects
    $genres[] = array((string) $node["name"], (string) $node["id"]);
}
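A self-contained sketch of the 2D-array approach with named keys, which tends to be easier to read when binding values to a DB insert than positional pairs. The <movie>/<category> XML and its values are a hypothetical stand-in for $xml_data; only the //category query and the name/id attributes come from the question.

```php
<?php
// Hypothetical stand-in for $xml_data from the question.
$xml_data = simplexml_load_string(
    '<movie><category name="Drama" id="18"/><category name="Comedy" id="35"/></movie>'
);

$genres = [];
// xpath() returns a zero-indexed array, so $i is 0, 1, ... without a manual counter.
foreach ($xml_data->xpath('//category') as $i => $node) {
    // Cast attributes to string so the array holds plain values
    // rather than SimpleXMLElement objects.
    $genres[$i] = [
        'name' => (string) $node['name'],
        'id'   => (string) $node['id'],
    ];
}
print_r($genres);
```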
