I have an HTML document generated by another app that has some inconsistencies:
some link
<div id="ftn1">
content
</div>
another link
<div id="ftn1">
another content
</div>
As you can see, there is an inconsistency here because the id "ftn1" is used twice. The same happens with the name _ftnref1.
So I would like to know whether there is a library or a native way to fix these repeated ids and names by incrementing their numbers so that nothing is repeated.
Thanks in advance
First of all, you need to download the php-simple-html-dom-parser library. Then try this function:
public function fixIdNames($html_original) {
    // Maps each id/name seen so far to the number of times it has occurred
    $id_encontrados = [];
    // Create DOM from string
    $html = str_get_html($html_original);
    // Renumber repeated ids on div elements
    $a_nodes = $html->find("div[id]");
    foreach ($a_nodes as $key => $element) {
        if (isset($id_encontrados[$element->id])) {
            $id_encontrados[$element->id]++;
            // Strip the old number and append the occurrence count, e.g. ftn1 -> ftn2
            $element->id = preg_replace('/[0-9]+/', '', $element->id) . $id_encontrados[$element->id];
        } else {
            $id_encontrados[$element->id] = 1;
        }
    }
    // Do the same for repeated names on anchors
    $a_nodes = $html->find("a[name]");
    foreach ($a_nodes as $key => $element) {
        if (isset($id_encontrados[$element->name])) {
            $id_encontrados[$element->name]++;
            $element->name = preg_replace('/[0-9]+/', '', $element->name) . $id_encontrados[$element->name];
        } else {
            $id_encontrados[$element->name] = 1;
        }
    }
    return ((string) $html);
}
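If installing an extra library is not an option, the same renumbering idea can be sketched with PHP's bundled DOM extension. This is only a sketch under assumptions: the function name renumberDuplicates is hypothetical, it assumes ext-dom is enabled, and it picks the next number that is not already used anywhere in the document, which differs slightly from the counting approach above. Note that DOMDocument::loadHTML wraps fragments in html/body tags, and that links pointing at the old ids (for example href="#ftn1") are not updated here.

```php
<?php
// Hypothetical helper: gives every repeated value of $attribute a new,
// unused "<prefix><number>" value (ftn1, ftn1 -> ftn1, ftn2).
function renumberDuplicates(string $html, string $attribute): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@' . $attribute . ']');

    // First pass: record every value that already exists in the document.
    $taken = [];
    foreach ($nodes as $node) {
        $taken[$node->getAttribute($attribute)] = true;
    }

    // Second pass: rename each repeat to the next free number.
    $seen = [];
    foreach ($nodes as $node) {
        $value = $node->getAttribute($attribute);
        if (isset($seen[$value])) {
            // Split "ftn1" into the prefix "ftn" and the trailing number "1".
            preg_match('/^(.*?)(\d*)$/', $value, $m);
            $n = (int) $m[2];
            do {
                $n++;
                $candidate = $m[1] . $n;
            } while (isset($taken[$candidate]));
            $node->setAttribute($attribute, $candidate);
            $taken[$candidate] = true;
            $value = $candidate;
        }
        $seen[$value] = true;
    }
    return $doc->saveHTML();
}

echo renumberDuplicates('<div id="ftn1">content</div><div id="ftn1">another content</div>', 'id');
```

Calling it once with 'id' and once more with 'name' on the result would cover both attributes from the question.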
I'm trying to get all the content from this XML: https://api.eveonline.com/eve/SkillTree.xml.aspx
and save it in a MySQL DB.
But some data is missing...
Could anyone who understands PHP, arrays, and XML help me, please?
This is my code to get the content:
<?php
$filename = 'https://api.eveonline.com/eve/SkillTree.xml.aspx';
$xmlbalance = simplexml_load_file($filename);
$skills = array();
for ($x=0;$x<sizeOf($xmlbalance->result->rowset->row);$x++) {
$groupName = $xmlbalance->result->rowset->row[$x]->attributes()->groupName;
$groupID = $xmlbalance->result->rowset->row[$x]->attributes()->groupID;
for ($y=0;$y<sizeOf($xmlbalance->result->rowset->row[$x]->rowset->row);$y++) {
$skills[$x]["skillID"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->attributes()->typeID;
$skills[$x]["skillName"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->attributes()->typeName;
$skills[$x]["skillDesc"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->description;
$skills[$x]["skillRank"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rank;
$skills[$x]["skillPrimaryAtr"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->requiredAttributes->primaryAttribute;
$skills[$x]["skillSecondAtr"] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->requiredAttributes->secondaryAttribute;
$o = 0;
for ($z=0;$z<sizeOf($xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->row);$z++) {
if ($xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->attributes()->name == "requiredSkills") {
$skills[$x]["requiredSkills"]["".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->row[$z]->attributes()->typeID] = "".$xmlbalance->result->rowset->row[$x]->rowset->row[$y]->rowset->row[$z]->attributes()->skillLevel;
$o++;
}
}
}
}
echo '<pre>'; print_r($skills); echo '</pre>';
?>
If you go to the original XML (link), at line 452, you will see:
<row groupName="Spaceship Command" groupID="257">
And that isn't shown in my array (link)...
That is one thing I found that is missing...
I think there is probably more content missing too.
Why? How can I fix it, please?
Thank you!!!
You will only get a total of sizeof($xmlbalance->result->rowset->row) records, because in your second for loop you keep storing the result in the same array element, $skills[$x], so each skill overwrites the previous one in its group.
Try this (I also highly encourage you to be as lazy as you can when you write code; by lazy I mean avoid repeating or rewriting the same code over and over where possible):
$filename = 'https://api.eveonline.com/eve/SkillTree.xml.aspx';
$xmlbalance = simplexml_load_file($filename);
$skills = array();
foreach ($xmlbalance->result->rowset->row as $row)
{
    // Group-level attributes, available for every skill in this group
    $groupName = $row->attributes()->groupName;
    $groupID = $row->attributes()->groupID;
    foreach ($row->rowset->row as $subRow)
    {
        // Start from an empty array so requiredSkills of the previous skill do not carry over
        $skill = array();
        $skill['skillID'] = (string) $subRow->attributes()->typeID;
        $skill['skillName'] = (string) $subRow->attributes()->typeName;
        $skill['skillDesc'] = (string) $subRow->description;
        $skill['skillRank'] = (string) $subRow->rank;
        $skill['skillPrimaryAtr'] = (string) $subRow->requiredAttributes->primaryAttribute;
        $skill['skillSecondAtr'] = (string) $subRow->requiredAttributes->secondaryAttribute;
        foreach ($subRow->rowset as $subSubRowset)
        {
            if ($subSubRowset->attributes()->name == 'requiredSkills')
            {
                foreach ($subSubRowset->row as $requiredSkill)
                {
                    $skill['requiredSkills'][(string) $requiredSkill->attributes()->typeID] = (string) $requiredSkill['skillLevel'];
                }
            }
        }
        // Append one element per skill instead of overwriting $skills[$x]
        $skills[] = $skill;
    }
}
print_r($skills);
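To check the one-element-per-skill idea without hitting the live API, a small inline sample can stand in for the feed. The XML below is invented; it only mimics the nesting that the loops in this thread rely on (result, rowset, row, nested rowset). The function name collectSkills is hypothetical, and attaching the group to each skill is an assumption about what the asker wants to store.

```php
<?php
// Invented sample data shaped like the nesting used in this thread.
$xml = <<<XML
<eveapi>
  <result>
    <rowset name="skillGroups">
      <row groupName="Spaceship Command" groupID="257">
        <rowset name="skills">
          <row typeID="1" typeName="Skill A">
            <rowset name="requiredSkills">
              <row typeID="9" skillLevel="3"/>
            </rowset>
          </row>
          <row typeID="2" typeName="Skill B">
            <rowset name="requiredSkills"/>
          </row>
        </rowset>
      </row>
    </rowset>
  </result>
</eveapi>
XML;

// Hypothetical helper: returns one array element per skill.
function collectSkills(SimpleXMLElement $root): array
{
    $skills = [];
    foreach ($root->result->rowset->row as $group) {
        foreach ($group->rowset->row as $skillRow) {
            // A fresh array per skill keeps data from leaking between skills.
            $skill = [
                'groupID'        => (string) $group['groupID'],
                'groupName'      => (string) $group['groupName'],
                'skillID'        => (string) $skillRow['typeID'],
                'skillName'      => (string) $skillRow['typeName'],
                'requiredSkills' => [],
            ];
            foreach ($skillRow->rowset as $rowset) {
                if ((string) $rowset['name'] !== 'requiredSkills') {
                    continue;
                }
                foreach ($rowset->row as $required) {
                    $skill['requiredSkills'][(string) $required['typeID']] = (string) $required['skillLevel'];
                }
            }
            $skills[] = $skill;
        }
    }
    return $skills;
}

$skills = collectSkills(simplexml_load_string($xml));
print_r($skills);
```

Each row of the result maps naturally onto one INSERT into a skills table, with the group columns repeated per skill.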
I am trying to compare two text files whose data is separated by -. One file holds all the data; the other, called db_tmp.txt, holds only the entries to change, using the same ids.
The structure of both files is this:
Text file (the first field is the id), db/text.txt
1a34-Mark Jhonson
1a56-Jhon Smith
1a78-Mary Walter
The file to compare against holds the data to change, for example the same id with different content (db_tmp.txt):
1a56-Tom Tom
I created a function that compares both files to detect whether the same id exists in both and needs changing:
<?php
$cron_file = file("db_tmp.txt");
$cron_compare = file("db/test.txt");
function cron($file_temp, $file_target)
{
for ($fte = 0; $fte < sizeof($file_temp); $fte++) {
$exp_id_tmp = explode("-", $file_temp[$fte]);
$cr_temp[] = "" . $exp_id_tmp[0] . "";
}
for ($ftt = 0; $ftt < sizeof($file_target); $ftt++) {
$exp_id_targ = explode("-", $file_target[$ftt]);
$cr_target[] = "" . $exp_id_targ[0] . "";
}
$diff = array_diff($cr_target, $cr_temp);
$it = 0;
foreach ($diff as $diff2 => $key) {
echo $diff2;
echo "--- $key";
print "<br>";
}
}
cron($cron_file, $cron_compare);
?>
If the same id exists in the tmp file, I must find that entry in the other file and change it to the value from the tmp file. I tried, but I can't get this to work: the function runs, but it does not handle everything. If anyone can help me, that would be perfect, because I don't know how to continue from here and save the result.
If you want to match according to id, a simple foreach would suffice, then just check during the loop if they have the same key. Consider this example:
// sample data from file('db_text.txt');
$contents_from_text = array('1a34-Mark Jhonson','1a56-Jhon Smith', '1a87-Mary Walter');
// reformat
$text = array();
foreach($contents_from_text as $element) {
list($key, $value) = explode('-', $element);
$text[$key] = $value;
}
$tmp = array();
// sample data from file('db_tmp.txt');
$contents_from_tmp = array('1a56-Tom Tom', '1a32-Magic Johnson', '1a23-Michael Jordan', '1a34-Larry Bird');
foreach($contents_from_tmp as $element) {
list($key, $value) = explode('-', $element);
$tmp[$key] = $value;
}
// compare
foreach($tmp as $key => $value) {
foreach($text as $index => $element) {
if($key == $index) {
$tmp[$key] = $element;
}
}
}
$contents_from_tmp = $tmp;
print_r($contents_from_tmp);
Sample Output
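If the goal is the direction described in the question (entries in the main file take the value from db_tmp.txt when the id matches), it can be sketched as follows. The helper names parseLines and applyChanges are hypothetical, and the sketch assumes the id itself never contains a hyphen; ids that exist only in the tmp file are ignored.

```php
<?php
// Hypothetical helper: turns "id-value" lines into an id => value map.
function parseLines(array $lines): array
{
    $map = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, '-') === false) {
            continue; // skip blank or malformed lines
        }
        // Limit 2 keeps any further hyphens inside the value.
        list($id, $value) = explode('-', $line, 2);
        $map[$id] = $value;
    }
    return $map;
}

// Hypothetical helper: overwrites target entries whose id appears in the changes.
function applyChanges(array $targetLines, array $changeLines): array
{
    $target = parseLines($targetLines);
    foreach (parseLines($changeLines) as $id => $value) {
        if (array_key_exists($id, $target)) {
            $target[$id] = $value;
        }
    }
    $out = [];
    foreach ($target as $id => $value) {
        $out[] = $id . '-' . $value;
    }
    return $out;
}

$updated = applyChanges(
    ['1a34-Mark Jhonson', '1a56-Jhon Smith', '1a78-Mary Walter'],
    ['1a56-Tom Tom']
);
print_r($updated);
```

With real files, the two arrays would come from file() with FILE_IGNORE_NEW_LINES, and the result could be written back with file_put_contents() and implode(PHP_EOL, $updated).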
I am trying to make a simple way to create a box in a class.
The problem is that it only gives me one element of the array. When I echo $values I get the whole CSS code, but when I place it in the div's style attribute I still get only the last element.
My current code looks like this:
class general {
public function box($content,$style,$width = 50,$height = 50) {
foreach ($style as $k => $v) {
$values = ''.$k.':'.$v.';';
echo($values);
$box = '<div class="testBox" style="'.$values.'">'.$content.'</div> ';
}
return $box;
}
}
$general = new general();
$test = array(
'background-color' => '#000',
'font-size' => '120px'
);
echo $general->box('testValue',$test);
Try like this:
public function box($content,$style,$width = 50,$height = 50) {
$values = '';
foreach ($style as $k => $v) {
$values .= ''.$k.':'.$v.';';
}
$box = '<div class="testBox" style="'.$values.'">'.$content.'</div> ';
return $box;
}
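Change
$box = '<div class="testBox" style="'.$values.'">'.$content.'</div> ';
to
$box .= '<div class="testBox" style="'.$values.'">'.$content.'</div> ';
and declare
$box = '';
before the loop. You need to concatenate the data using the . operator. Note that this produces one div per style entry, because $values is still overwritten on each iteration.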
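The accumulate-first, build-once idea can also be written with implode, which avoids managing the separator by hand. This is a sketch, not the class from the question: buildBox is a hypothetical standalone function, and escaping the style and content with htmlspecialchars is an addition that assumes $content is plain text rather than HTML.

```php
<?php
// Hypothetical standalone variant of box(): collects all declarations
// first, then builds a single div.
function buildBox(string $content, array $style): string
{
    $declarations = [];
    foreach ($style as $property => $value) {
        $declarations[] = $property . ':' . $value;
    }
    $css = implode(';', $declarations);

    // Escaping is an assumption: it treats $content as plain text.
    return '<div class="testBox" style="' . htmlspecialchars($css, ENT_QUOTES) . '">'
        . htmlspecialchars($content, ENT_QUOTES) . '</div>';
}

echo buildBox('testValue', ['background-color' => '#000', 'font-size' => '120px']);
```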
I am experimenting with PHPQuery (https://code.google.com/p/phpquery/) to scrape data from my website.
I want to extract meta information from a page.
Here is what I have tried so far :
$html = phpQuery::newDocumentHTML($file, $charset = 'utf-8');
$MetaItems = [];
foreach (pq('meta') as $keys) {
$names = trim(strtolower(pq($keys)->attr('name')));
if ($names !== null && $names !== '') {
array_push($MetaItems, $names);
}
}
for ($i=0; $i < count($MetaItems); $i++) {
$test = 'meta[name="' . $MetaItems[$i] . '"]';
echo pq($test)->html();
}
In the code above, $MetaItems receives all the meta name attributes, and this array is filled correctly.
But selecting the elements and extracting their text is not working. How do I get the above code to work?
Thanks.
You want an assoc array with name => content, correct? Try this:
$metaItems = array();
foreach(pq('meta') as $meta) {
$key = pq($meta)->attr('name');
$value = pq($meta)->attr('content');
$metaItems[$key] = $value;
}
var_dump($metaItems);
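The same name => content mapping can be sketched without phpQuery, using PHP's bundled DOM extension. The function name extractMeta and the sample markup are made up for illustration, and the sketch assumes ext-dom is enabled. It reads the content attribute because meta elements have no inner HTML, which is why html() returns nothing for them.

```php
<?php
// Hypothetical helper: returns meta name => content pairs from an HTML string.
function extractMeta(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = trim($tag->getAttribute('name'));
        if ($name !== '') {             // skip tags such as http-equiv
            $meta[$name] = $tag->getAttribute('content');
        }
    }
    return $meta;
}

// Invented sample page.
$sample = '<html><head><meta name="description" content="Demo page">'
    . '<meta http-equiv="refresh" content="30">'
    . '<meta name="keywords" content="a,b"></head><body></body></html>';
print_r(extractMeta($sample));
```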
Going on the assumption that the values you are extracting are exactly the same as the values of the name attributes you're trying to get: I'm pretty sure the value of the name attribute is case sensitive in the selector. You need to remove the strtolower and the trim, as both could be causing issues. I would replace the first part with this:
$html = phpQuery::newDocumentHTML($file, $charset = 'utf-8');
$MetaItems = [];
foreach (pq('meta') as $keys) {
$names = pq($keys)->attr('name');
if (!empty($names) && trim($names)) {
array_push($MetaItems, $names);
}
}
hope that helps
Let's assume we want to process this Feed: http://tools.forestview.eu/xmlp/xml_feed.php?aid=1094&cid=1000
I'm trying to show the nodes of an XML file this way:
deals->deal->dealsite
deals->deal->deal_id
deals->deal->deal_title
This is in order to be able to process feeds that we don't know what their XML tags are. So we will let the user choose that deals->deal->deal_title is the Deal Title and will recognize it that way.
I have been trying for ages to do this with this code:
class HandleXML {
var $root_tag = false;
var $xml_tags = array();
var $keys = array();
function parse_recursive(SimpleXMLElement $element)
{
$get_name = $element->getName();
$children = $element->children(); // get all children
if (empty($this->root_tag)) {
$this->root_tag = $this->root_tag.$get_name;
}
$this->xml_tags[] = $get_name;
// only show children if there are any
if(count($children))
{
foreach($children as $child)
{
$this->parse_recursive($child); // recursion :)
}
}
else {
$key = implode('->', $this->xml_tags);
$this->xml_tags = array();
if (!in_array($key, $this->keys)) {
if (!strstr('>', $key) && count($this->keys) > 0) { $key = $this->root_tag.'->'.$key; }
if (!in_array($key, $this->keys)) {
$this->keys[] = $key;
}
}
}
}
}
$xml = new SimpleXMLElement($feed_url, null, true);
$handle_xml = new HandleXML;
$handle_xml->parse_recursive($xml);
foreach($handle_xml->keys as $key) {
echo $key.'<br />';
}
exit;
but here's what I get instead:
deals->deal->dealsite
deals->deal_id
deals->deal_title
Notice that on the 2nd and 3rd lines the deal-> part is missing.
I have also tried with this code: http://pastebin.com/FkPWXF64 but it's definitely not the best way to go and it doesn't always work.
No matter how many times I tried, I couldn't do it.
On one of my sites I use a slightly different approach to handle an XML feed. In your case it would look like:
$xml = simplexml_load_file("http://tools.forestview.eu/xmlp/xml_feed.php?aid=1094&cid=1000");
foreach($xml->{'deal'} as $deal)
{
$dealsite = $deal->{'dealsite'};
$deal_id = $deal->{'deal_id'};
$deal_title = $deal->{'deal_title'};
$deal_url = $deal->{'deal_url'};
$deal_city = $deal->{'deal_city'};
$deal_category = $deal->{'deal_category'};
// and so on for the rest
// do some stuff with the variables like insert into MySQL
}
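For feeds whose tags are not known in advance, the paths can be collected by passing the current path down through the recursion instead of keeping it in shared object state. This is a sketch: collectPaths is a hypothetical function, the sample XML is invented (only the tag names come from the question), and namespaced children are not handled.

```php
<?php
// Hypothetical helper: returns the distinct "a->b->c" paths of all leaf elements.
function collectPaths(SimpleXMLElement $element, string $prefix = '', array &$paths = []): array
{
    // Each call extends the path received from its parent.
    $path = ($prefix === '') ? $element->getName() : $prefix . '->' . $element->getName();
    $children = $element->children();

    if (count($children) === 0) {
        $paths[$path] = true;           // array keys de-duplicate repeated paths
    } else {
        foreach ($children as $child) {
            collectPaths($child, $path, $paths);
        }
    }
    return array_keys($paths);
}

// Invented sample using the tag names from the question.
$xml = simplexml_load_string(
    '<deals><deal><dealsite>x</dealsite><deal_id>1</deal_id><deal_title>A</deal_title></deal>'
    . '<deal><dealsite>y</dealsite><deal_id>2</deal_id><deal_title>B</deal_title></deal></deals>'
);
$paths = collectPaths($xml);
echo implode("\n", $paths), "\n";
```

Because the path is a parameter rather than a property that gets reset at each leaf, sibling elements keep the full deals->deal-> prefix.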