php associative arrays, regex, array - php

I currently have the following code:
$content = "
<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
I need to find a way to create an array of name => value pairs, e.g. Manufacturer => John Deere.
Can anyone help me with a simple code snippet? I tried some regex, but it doesn't even work to extract the names or values, e.g.:
$pattern = "/<name>Manufacturer<\/name><value>(.*)<\/value>/";
preg_match_all($pattern, $content, $matches);
$st_selval = $matches[1][0];

You don't want to use regex for this. Try out something like SimpleXML
EDIT
Well, why don't you start with this:
<?php
$content = "<root>" . $content . "</root>";
$xml = new SimpleXMLElement($content);
print_r($xml);
?>
EDIT 2
Despite the fact that some of the answers posted using regular expression MAY work, you should get in the habit of using the correct tool for the job and regular expressions are not the correct tool for parsing of XML.
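To get from the parsed document to the name => value array the question asks for, one possible sketch collects both element lists with SimpleXML's xpath() and zips them with array_combine(). It assumes the names and values always appear in matching order and in equal numbers; $pairs is just an illustrative variable name.

```php
<?php
$content = '<name>Manufacturer</name><value>John Deere</value>'
         . '<name>Year</name><value>2001</value>'
         . '<name>Location</name><value>NSW</value>'
         . '<name>Hours</name><value>6320</value>';

// Wrap the fragment in a single root element so it is well-formed XML.
$xml = new SimpleXMLElement('<root>' . $content . '</root>');

// xpath() returns plain arrays of SimpleXMLElement objects; cast each to a string.
$names  = array_map('strval', $xml->xpath('name'));
$values = array_map('strval', $xml->xpath('value'));

// Assumption: both lists have the same length and pair up by position.
$pairs = array_combine($names, $values);

print_r($pairs);
```

Note that every value comes back as a string, so 2001 and 6320 would need an explicit cast if you want integers.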

I'm using your $content variable:
$preg1 = preg_match_all('#<name>([^<]+)#', $content, $name_arr);
$preg2 = preg_match_all('#<value>([^<]+)#', $content, $val_arr);
$array = array_combine($name_arr[1], $val_arr[1]);

This is rather simple, can be solved by regex. Should be:
$name = '<name>\s*([^<]+)</name>\s*';
$value = '<value>\s*([^<]+)</value>\s*';
$pattern = "|$name$value|";
preg_match_all($pattern, $content, $matches);
# create hash
$stuff = array_combine($matches[1], $matches[2]);
# display
var_dump($stuff);
Regards
rbo

First of all, never use regex to parse xml...
You could do this with an XPATH query...
First, wrap the content in a root tag to make the parser happy (if it doesn't already have it):
$content = '<root>' . $content . '</root>';
Then, load the document
$dom = new DomDocument();
$dom->loadXml($content);
Then, initialize the XPATH
$xpath = new DomXpath($dom);
Write your query:
$xpathQuery = 'string(//name[text()="Manufacturer"]/following-sibling::value[1])';
Then, execute it:
$manufacturer = $xpath->evaluate($xpathQuery);
If I got the XPath right, $manufacturer should be John Deere...
You can see the docs on DomXpath, a basic primer on XPath, and a bunch of XPath examples...
Edit: My first attempt failed only because I had misspelled following-sibling; the axis itself is supported. If you would rather not rely on it, you could walk the DOM instead of using the XPath query above:
$xpathQuery = '//name[text()="Manufacturer"]';
$elements = $xpath->query($xpathQuery);
$manufacturer = $elements->item(0)->nextSibling->nodeValue;
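The lookup above returns a single value. As a rough sketch of how the same idea could build the whole name => value array, the loop below visits every name element and walks forward to the next element sibling. It assumes each name is followed by its value element; $pairs is an illustrative name.

```php
<?php
$content = '<name>Manufacturer</name><value>John Deere</value>'
         . '<name>Year</name><value>2001</value>'
         . '<name>Location</name><value>NSW</value>'
         . '<name>Hours</name><value>6320</value>';

$dom = new DOMDocument();
$dom->loadXML('<root>' . $content . '</root>');
$xpath = new DOMXPath($dom);

$pairs = [];
foreach ($xpath->query('/root/name') as $nameNode) {
    // Skip whitespace or other non-element nodes between <name> and <value>.
    $sibling = $nameNode->nextSibling;
    while ($sibling !== null && $sibling->nodeType !== XML_ELEMENT_NODE) {
        $sibling = $sibling->nextSibling;
    }
    // Only pair the name with a directly following <value> element.
    if ($sibling !== null && $sibling->nodeName === 'value') {
        $pairs[$nameNode->textContent] = $sibling->textContent;
    }
}

print_r($pairs);
```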

I think this is what you're looking for:
<?php
$content = "<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
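// [^<]* instead of \w* so that values containing spaces (e.g. "John Deere") match
$pattern = "#<name>([^<]*)</name><value>([^<]*)</value>#";
preg_match_all($pattern, $content, $matches);
$arr = array();
// $matches[1] holds the names, $matches[2] the values
for ($i=0; $i<count($matches[1]); $i++){
$arr[$matches[1][$i]] = $matches[2][$i];
}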
/* This is an example on how to use it */
echo "Location: " . $arr["Location"] . "<br><br>";
/* This is the array */
print_r($arr);
?>
If your array has a lot of elements, don't call count() in the loop condition; calculate the value once beforehand and reuse it.

I may still edit this as my PHP is rusty, but here's some PHP code to give some direction.
$pattern = '|<name>([^<]*)</name>\s*<value>([^<]*)</value>|';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$arr = array();
for($i = 0; $i < count($matches); $i++) {
$arr[$matches[$i][1]] = $matches[$i][2];
}
$arr is the array that stores the name/value pairs.
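With PREG_SET_ORDER each match is a small array of [full match, name, value], so the loop could also be replaced by array_column(). This is only a sketch of that alternative, assuming the same $content string as in the question:

```php
<?php
$content = '<name>Manufacturer</name><value>John Deere</value>'
         . '<name>Year</name><value>2001</value>'
         . '<name>Location</name><value>NSW</value>'
         . '<name>Hours</name><value>6320</value>';

$pattern = '|<name>([^<]*)</name>\s*<value>([^<]*)</value>|';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// Column 2 (the value) becomes the array value, column 1 (the name) the key.
$arr = array_column($matches, 2, 1);

print_r($arr);
```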

Using XMLReader:
$content = '<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>';
$content = '<content>' . $content . '</content>';
$output = array();
$reader = new XMLReader();
$reader->XML($content);
$currentKey = null;
$currentValue = null;
while ($reader->read()) {
switch ($reader->name) {
case 'name':
$reader->read();
$currentKey = $reader->value;
$reader->read();
break;
case 'value':
$reader->read();
$currentValue = $reader->value;
$reader->read();
break;
}
if (isset($currentKey) && isset($currentValue)) {
$output[$currentKey] = $currentValue;
$currentKey = null;
$currentValue = null;
}
}
print_r($output);
The output is:
Array
(
[Manufacturer] => John Deere
[Year] => 2001
[Location] => NSW
[Hours] => 6320
)

Related

php remove duplicate except original

This is my code:
<?php
$string = 'this
this
good
good
hahah';
$rows = explode("\n",$string);
$unwanted = 'this|good';
$cleanArray= preg_grep("/$unwanted/i",$rows,PREG_GREP_INVERT);
$cleanString=implode("\n",$cleanArray);
print_r ( $cleanString );
?>
It displays:
hahah
I want output like this:
this
good
hahah
I want to keep one copy of each duplicated line.
Please help me, thanks guys.
This code checks each line against your $unwanted pattern and keeps an array of the matches it has already encountered. If a line matches and its match has been seen before (checked with in_array()), it uses unset() to remove that line from the original $rows...
$string = 'this
this
good
good
hahah';
$rows = explode("\n",$string);
$unwanted = 'this|good';
$matched = [];
foreach ( $rows as $line => $row ) {
if ( preg_match("/$unwanted/i",$row, $matches)) {
if ( in_array(trim($matches[0]), $matched) === true ) {
unset($rows[$line]);
}
$matched[] = $matches[0];
}
}
$cleanString=implode("\n",$rows);
print_r ( $cleanString );
<?php
$string = 'this
this
good
yyyy
good
xxxx
hahah';
print_r(
implode("\n",
array_diff(array_unique(
array_map(function($v) { return trim($v);}, explode("\n",$string))
)
,array('xxxx', 'yyyy')))
);
?>
output:
this
good
hahah
Refer: https://ideone.com/Eo0MIM
You can use this simple code to get result:
$result = array_unique(explode("\n",str_replace(" ", "", $string)));
print_r ($result);
If you want more control over your data, use this code
$rows = explode("\n", $string);
$words = [];
foreach($rows as $row) {
$row = trim($row);
$words[$row] = true;
}
foreach($words as $word => $tmp) {
echo $word . "\n";
}
Here is one way you could do this:
$string = 'this
this
good
good
hahah';
preg_match_all('/([a-z])+/', $string, $matches);
$string = implode("\n",array_unique($matches[0]));
echo $string;
You can use PHP's built-in array_unique function:
<?php
$string = 'this
this
good
good
haha';
$rows = explode("\n",$string);
$cleanArray = array_unique($rows);
$cleanString=implode("\n",$cleanArray);
print_r ( $cleanString );
//result is this good haha
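The array_unique() answers above compare lines exactly, so a trailing space or a Windows line ending would make two otherwise identical lines look different. A minimal sketch that normalises the lines first might look like this; uniqueLines() is a hypothetical helper name, not something from the answers.

```php
<?php
// Hypothetical helper: keep the first occurrence of each line, in order.
function uniqueLines(string $text): string
{
    // \R matches any line ending (\n, \r\n or \r); trim() strips stray spaces.
    $lines = array_map('trim', preg_split('/\R/', $text));

    // array_unique() keeps the first occurrence and drops later repeats.
    return implode("\n", array_unique($lines));
}

echo uniqueLines("this\r\nthis \ngood\ngood\nhahah");
```

Unlike the preg_grep() attempt in the question, this does not need a list of unwanted words; any repeated line is reduced to one copy.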

PHP Regex or DOMDocument for Matching & Removing URLs?

I'm trying to extract links from html page using DOM:
$html = file_get_contents('links.html');
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$a = $DOM->getElementsByTagName('a');
foreach($a as $link){
//echo out the href attribute of the <A> tag.
echo $link->getAttribute('href').'<br/>';
}
Output:
http://dontwantthisdomain.com/dont-want-this-domain-name/
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/
http://domain1.com/page-X-on-domain-com.html
http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/
http://domain.com/page-XZ-on-domain-com.html
http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/
http://dontwantthisdomain2.com/same-as-above/
http://domain3.com/page-XYZ-on-domain3-com.html
I would like to remove all results matching dontwantthisdomain.com, dontwantthisdomain2.com and dontwantthisdomain3.com so the output will look like this:
http://domain1.com/page-X-on-domain-com.html
http://domain.com/page-XZ-on-domain-com.html
http://domain3.com/page-XYZ-on-domain3-com.html
Some people say I should not use regex for HTML and others say it's OK. Could somebody point out the best way to remove unwanted URLs from my HTML file? :)
Maybe something like this:
function extract_domains($buffer, $whitelist) {
preg_match_all("#<a\s+.*?href=\"(.+?)\".*?>(.+?)</a>#i", $buffer, $matches);
$result = array();
foreach($matches[1] as $url) {
$url = urldecode($url);
$parts = @parse_url((string) $url);
// relative links have no host, so check that the key exists first
if ($parts !== false && isset($parts['host']) && in_array($parts['host'], $whitelist)) {
$result[] = $parts['host'];
}
}
return $result;
}
$domains = extract_domains(file_get_contents("/path/to/html.htm"), array('stackoverflow.com', 'google.com', 'sub.example.com'));
It does a rough match on all the <a> tags with href=, grabs what's between the quotes, then filters the hosts against your whitelist of domains.
Non-regex solution (without potential errors :-) :
$html='
http://dontwantthisdomain.com/dont-want-this-domain-name/
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/
http://domain1.com/page-X-on-domain-com.html
http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/
http://domain.com/page-XZ-on-domain-com.html
http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/
http://dontwantthisdomain2.com/same-as-above/
http://domain3.com/page-XYZ-on-domain3-com.html
';
$html=explode("\n", $html);
$dontWant=array('dontwantthisdomain.com','dontwantthisdomain2.com','dontwantthisdomain3.com');
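$final_result=array();
foreach ($html as $link) {
$ok=true;
foreach($dontWant as $notWanted) {
// strpos() returns false when there is no match; 0 is a valid position
if (strpos($link, $notWanted)!==false) {
$ok=false;
}
}
// skip the empty lines at the start and end of the list
if (trim($link)=='') $ok=false;
if ($ok) $final_result[]=$link;
}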
echo '<pre>';
print_r($final_result);
echo '</pre>';
outputs
Array
(
[0] => http://domain1.com/page-X-on-domain-com.html
[1] => http://domain.com/page-XZ-on-domain-com.html
[2] => http://domain3.com/page-XYZ-on-domain3-com.html
)
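Since the question already extracts the links with DOMDocument, the filtering could also happen in that same loop by comparing each link's host with parse_url(). The following is a sketch under that assumption; filterLinks() and $blockedHosts are hypothetical names.

```php
<?php
// Hypothetical helper: return the hrefs whose host is not in $blockedHosts.
function filterLinks(string $html, array $blockedHosts): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $kept = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);

        // Links without a host (relative or malformed) are skipped here.
        if (is_string($host) && !in_array(strtolower($host), $blockedHosts, true)) {
            $kept[] = $href;
        }
    }
    return $kept;
}

$html = '<a href="http://dontwantthisdomain.com/dont-want-this-domain-name/">a</a>'
      . '<a href="http://domain1.com/page-X-on-domain-com.html">b</a>';

print_r(filterLinks($html, ['dontwantthisdomain.com', 'dontwantthisdomain2.com', 'dontwantthisdomain3.com']));
```

The comparison is an exact host match, so a www. prefix or other subdomain would have to be listed separately or stripped first.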

PHP grabbing content between two strings

// get CONTENT from united domains footer
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');
// remove spaces from CONTENT
$content = preg_replace('/\s+/', '', $content);
// match all tld tags
$regex = '#target="_parent">.(.*?)</a></li><li>#';
preg_match($regex, $source, $matches);
print_r($matches);
I want to match all of the TLDs.
Each TLD is preceded by target="_parent">. and followed by </a></li><li>
I want to end up with an array like array('africa','amsterdam','bnc', etc.)
What am I doing wrong here?
NOTE: The second step to remove all the spaces is just to simplify things.
Here's a regular expression that will do it for that page.
\.\w+(?=</a></li>)
PHP
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');
preg_match_all('/\.\w+(?=<\/a><\/li>)/m', $content, $matches);
print_r($matches);
Here are the results:
.africa, .amsterdam, .bcn, .berlin, .boston, .brussels, .budapest, .gent, .hamburg, .koeln, .london, .madrid, .melbourne, .moscow, .miami, .nagoya, .nyc, .okinawa, .osaka, .paris, .quebec, .roma, .ryukyu, .stockholm, .sydney, .tokyo, .vegas, .wien, .yokohama, .africa, .arab, .bayern, .bzh, .cymru, .kiwi, .lat, .scot, .vlaanderen, .wales, .app, .blog, .chat, .cloud, .digital, .email, .mobile, .online, .site, .mls, .secure, .web, .wiki, .associates, .business, .car, .careers, .contractors, .clothing, .design, .equipment, .estate, .gallery, .graphics, .hotel, .immo, .investments, .law, .management, .media, .money, .solutions, .sucks, .taxi, .trade, .archi, .adult, .bio, .center, .city, .club, .cool, .date, .earth, .energy, .family, .free, .green, .live, .lol, .love, .med, .ngo, .news, .phone, .pictures, .radio, .reviews, .rip, .team, .technology, .today, .voting, .buy, .deal, .luxe, .sale, .shop, .shopping, .store, .eus, .gay, .eco, .hiv, .irish, .one, .pics, .porn, .sex, .singles, .vin, .vip, .bar, .pizza, .wine, .bike, .book, .holiday, .horse, .film, .music, .party, .email, .pets, .play, .rocks, .rugby, .ski, .sport, .surf, .tour, .video
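The question asked for the names without the leading dot. One way to get that from the same expression is to put the word part in a capture group and read $matches[1]. The sketch below runs against a small hypothetical sample rather than the live page, whose markup is assumed to follow the pattern described in the question.

```php
<?php
// Hypothetical sample standing in for the downloaded footer markup.
$content = '<ul><li><a href="#" target="_parent">.africa</a></li>'
         . '<li><a href="#" target="_parent">.amsterdam</a></li>'
         . '<li><a href="#" target="_parent">.bcn</a></li></ul>';

// The dot stays outside the group, so group 1 holds only the TLD name.
preg_match_all('/\.(\w+)(?=<\/a><\/li>)/', $content, $matches);
$tlds = $matches[1];

print_r($tlds);
```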
Using the DOM is cleaner:
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.uniteddomains.com/index/footer/');
$xpath = new DOMXPath($doc);
$items = $xpath->query('/html/body/div/ul/li/ul/li[not(@class)]/a[@target="_parent"]/text()');
$result = '';
foreach($items as $item) {
$result .= $item->nodeValue; }
$result = explode('.', $result);
array_shift($result);
print_r($result);

Extract the number from a string in PHP

How can I extract the number 12345 from the following string in PHP?
<span id="jordan934" itemprop="distance"><span class='WebDistance'>#$#20B9; </span>12345</span></h3>
I was using the following, which worked as long as that '#$#20B9;' string was not in it:
$results = $dom->query('#jordan934"]');
$distance = false;
if (count($results)) {
$distance = (int)trim($results->current()->textContent);
}
return $distance;
}
Try using a regular expression:
$str = 'jordan934';
preg_match_all('!\d+!', $str, $matches);
print_r($matches);
You could use a dom object
<?php
$html = '<span id="jordan934" itemprop="distance"><span class=\'WebDistance\'>#$#20B9; </span>12345</span></h3>';
$dom = new DomDocument();
$dom->loadHTML($html);
$finder = new DomXPath($dom);
$classname="my-class";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
var_dump($nodes);
Another, less widely known function is:
$str = 'jordan934';
$int = filter_var($str, FILTER_SANITIZE_NUMBER_INT);
http://php.net/manual/pl/function.filter-var.php
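The answers above operate on the id string rather than on the markup from the question. As a hedged sketch of reading the number out of the HTML itself, the XPath below selects only the outer span's own text nodes, which leaves out the nested WebDistance span and its currency text. The sample omits the stray </h3> from the question.

```php
<?php
$html = '<span id="jordan934" itemprop="distance">'
      . '<span class="WebDistance">#$#20B9; </span>12345</span>';

$dom = new DOMDocument();

// Collect parser warnings silently instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// text() matches direct text children only, so the inner span is ignored.
$text = $xpath->evaluate('string(//span[@id="jordan934"]/text())');
$distance = (int) trim($text);

var_dump($distance);
```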

I want to modify the withdrawal of an array of strings where the start and end are found

I want to modify code that extracts an array of strings found between a start marker and an end marker.
<?php
$file = ('http://gdata.youtube.com/feeds/base/users/BBCArabicNews/uploads?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile');
$string=file_get_contents($file);
function findinside($start, $end, $string) {
preg_match_all('/' . preg_quote($start,'/') . '(.+?)'. preg_quote($end, '/').'/si', $string, $m);
return $m[1];
}
$start = ':video:';
$end = '</guid>';
$out = findinside($start, $end, $string);
foreach($out as $string){
echo $string;
echo "<p></td>\n";
}
?>
Results
Q80QSzgPDD8
ozei4GysBN8
ak3bbs_UxP0
rUs-r3ilTG4
p4BO6FI5sPY
j5lclrPzeVU
dK5VWTYsJaM
mERug-d536k
h0zqd3bC0-E
ije5kuSfLKY
H9XXMPvEpHM
EK5UoQqYl4U
This works properly for extracting one array of strings. I also want to add:
$start = '</pubDate><atom:updated>';
$end = '</atom:updated>';
I want to show the two arrays of strings together.
Example:
xSD0XJLkLQid
2011-11-08T17:36:14.000Z
bFU066NwVnD
2011-12-08T17:36:14.000Z
Can I do this with this code?
Greetings
You can use PHP's DOMDocument parser like this:
$objDOM = new DOMDocument();
$objDOM->load($file); // the long one from youtube
$dates = $objDOM->getElementsByTagName("pubDate");
foreach ($dates as $node)
{
echo $node->nodeValue;
}
Use a DOM parser first, and then apply a regex to individual elements in the DOM (found with things like getElementById()). It works better and is more robust.
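If you want to stay with the findinside() function from the question, one possible sketch is to call it once per marker pair and then zip the two result arrays by position. The feed excerpt below is a hypothetical two-item sample, and the pairing assumes every item contains both a guid and an atom:updated element in the same order.

```php
<?php
function findinside($start, $end, $string) {
    preg_match_all('/' . preg_quote($start, '/') . '(.+?)' . preg_quote($end, '/') . '/si', $string, $m);
    return $m[1];
}

// Hypothetical two-item excerpt standing in for the downloaded feed.
$string = '<item><guid>tag:youtube.com,2008:video:xSD0XJLkLQid</guid>'
        . '<pubDate>Tue, 08 Nov 2011 17:36:14 +0000</pubDate>'
        . '<atom:updated>2011-11-08T17:36:14.000Z</atom:updated></item>'
        . '<item><guid>tag:youtube.com,2008:video:bFU066NwVnD</guid>'
        . '<pubDate>Thu, 08 Dec 2011 17:36:14 +0000</pubDate>'
        . '<atom:updated>2011-12-08T17:36:14.000Z</atom:updated></item>';

$ids   = findinside(':video:', '</guid>', $string);
$dates = findinside('</pubDate><atom:updated>', '</atom:updated>', $string);

// array_map(null, ...) zips the arrays into [id, date] pairs by position.
$pairs = array_map(null, $ids, $dates);

foreach ($pairs as [$id, $date]) {
    echo $id, "\n", $date, "\n";
}
```

If an item were missing one of the two elements, the positions would drift apart, which is where parsing each item with a DOM parser becomes the safer choice.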
