Match multiple results on a single line with a PHP regex

I would like to match multiple results in a single-line string, but I only get the last iteration instead of all the results I expected.
For example, I have this string: <ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>
I would like to get :
test1
test2
test3
as the result, but I only get "test3".
I tested this regex on https://regex101.com/: <ul>(<li><a.*>(.*)<\/a><\/li>)*<\/ul> but I don't know what I did wrong.
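The reason only "test3" comes back is that a capturing group inside a repeated (...)* keeps only the value from its last repetition. To get every item, match one <li> at a time and let preg_match_all() do the repeating. A minimal sketch, assuming the items contain <a href="#"> links as the patterns in this thread expect:

```php
<?php
// Sample string from the question, with <a href="#"> links (assumed markup).
$html = '<ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>';

// A repeated capturing group only remembers its last iteration,
// so preg_match() with a (...)* group reports the third item only.
preg_match('~<ul>(?:<li><a[^>]*>([^<]*)</a></li>)*</ul>~', $html, $one);
echo $one[1], "\n"; // test3

// preg_match_all() applies a single-item pattern repeatedly instead.
preg_match_all('~<li><a[^>]*>([^<]*)</a></li>~', $html, $all);
echo implode(', ', $all[1]), "\n"; // test1, test2, test3
```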

Use a parser instead:
<?php
$html = <<<DATA
<ul>
<li><a href="#">test1</a></li>
<li><a href="#">test2</a></li>
<li><a href="#">test3</a></li>
</ul>
DATA;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Select every <a> that is a direct child of an <li>
$links = $xpath->query("//li/a");
foreach ($links as $link) {
    echo $link->textContent, "\n";
}
?>
This loads the HTML into a DOM and uses an XPath expression (//li/a) to select the elements.

Try something like this:
(?<=<a href="#">)[\s\S]+?(?=<\/a>)
or
(?<=">)[\s\S]+?(?=<\/a>)
or
(?<=<a href="#">).+?(?=<\/a>)
Examples:
https://regex101.com/r/MHnxxh/1
or
https://regex101.com/r/MHnxxh/2
<?php
$str = '
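<ul>
<li><a href="#">test1</a></li>
<li><a href="#">test2</a></li>
<li><a href="#">test3</a></li>
</ul>
';
// The lookbehind (#">) and lookahead (</a>) are not part of the match, so $matches[0] holds only the link texts
preg_match_all('/(?<=#">)[\s\S]+?(?=<\/a>)/', $str, $matches);
// display the raw array if needed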
echo "<pre>";
print_r($matches);
// display list
foreach ($matches[0] as $key => $value) {
echo $value ."\r\n";
}
?>

preg_match_all('~#">([a-z]\w+)</a>~', $str, $out, PREG_PATTERN_ORDER);
This is the regex pattern; try this:
#">([a-z]\w+)</a>
It extracts only the link texts, which end up in $out[1].
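preg_match_all() fills its output array differently depending on the flag passed. A minimal sketch comparing PREG_PATTERN_ORDER with PREG_SET_ORDER, assuming the links look like <a href="#">...</a> as in the question:

```php
<?php
$str = '<ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>';

// PREG_PATTERN_ORDER (the default): $out[0] holds all full matches,
// $out[1] holds all values of capture group 1.
preg_match_all('~#">([a-z]\w+)</a>~', $str, $out, PREG_PATTERN_ORDER);
print_r($out[1]); // the three link texts

// PREG_SET_ORDER: one array per match, each with [0] => full match, [1] => group 1.
preg_match_all('~#">([a-z]\w+)</a>~', $str, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[1], "\n";
}
```

Note that [a-z]\w+ only accepts texts that start with a lowercase letter and are at least two characters long.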

You can use preg_replace() to replace every tag with a space:
$test = '<ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>';
echo preg_replace('/<[^>]*>/', ' ', $test);
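The preg_replace() call above leaves the texts as one string separated by runs of spaces. If you need them as an array, one option is to split that result on whitespace. A sketch, assuming the item texts themselves contain no spaces:

```php
<?php
$test = '<ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>';

// Replace every tag with a space...
$text = preg_replace('/<[^>]*>/', ' ', $test);

// ...then split on runs of whitespace, dropping the empty pieces at the ends.
$items = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($items);
```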

Related

How to get specific attribute of html in string using php?

I have a string and I need to find all the data-id numbers.
This is the string:
<li data-type="mentionable" data-id="2">bla bla...
<li data-type="mentionable" data-id="812">some test
<li>bla bla </li>more text
<li data-type="mentionable" data-id="282">
In the end it should find: 2,812,282
Use DOMDocument instead:
<?php
$data = <<<DATA
<li data-type="mentionable" data-id="2">bla bla...
<li data-type="mentionable" data-id="812">some test
<li>bla bla </li>more text
<li data-type="mentionable" data-id="282">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
$ids = [];
foreach ($xpath->query("//li[@data-id]") as $item) {
$ids[] = $item->getAttribute('data-id');
}
print_r($ids);
?>
This gives you 2, 812 and 282; see a demo on ideone.com.
You can use a regex with preg_match_all() to find the target parts of the string:
preg_match_all("/data-id=\"(\d+)\"/", $str, $matches);
// $matches[1] is an array containing the captured values
echo implode(',', $matches[1]); // prints 2,812,282
See the result of the code in the demo.
Because your string is HTML, you can also use the DOMDocument class to parse it and read the target attribute.

need regular expression for li

How can I get the strings between my li tags in PHP? I have tried a lot of PHP code, but none of it works.
<li class="release">
<strong>Release info:</strong>
<div>
How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
</div>
<div>
How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
</div>
<div>
How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
</div>
</li>
You can try this:
$myPattern = "/<li class=\"release\">(.*?)<\/li>/s";
$myText = '<li class="release">*</li>'; // * stands for the content of the list item
preg_match($myPattern,$myText,$match);
echo $match[1];
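The pattern above returns the whole block inside the <li>. To go one step further and get each release name, a second preg_match_all() over that block could look like this. A sketch that assumes each name sits alone inside a plain <div> with no attributes, as in the question:

```php
<?php
$html = <<<HTML
<li class="release">
<strong>Release info:</strong>
<div>
How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
</div>
<div>
How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
</div>
<div>
How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
</div>
</li>
HTML;

// Step 1: grab everything between <li class="release"> and </li> (the s flag lets . match newlines).
preg_match('~<li class="release">(.*?)</li>~s', $html, $match);

// Step 2: pull each <div>'s text out of that block, trimming the surrounding whitespace.
preg_match_all('~<div>\s*(.*?)\s*</div>~s', $match[1], $divs);
print_r($divs[1]);
```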
You don't need a regular expression. Using regular expressions to parse HTML is a common mistake (I took the URL from T.J. Crowder's comment).
Use a tool that parses HTML, for instance the DOM library.
This is a solution to get all strings (I'm assuming those are the values of the text nodes):
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//li//text()');
$strings = array();
foreach($nodes as $node) {
$string = trim($node->nodeValue);
if( $string !== '' ) {
$strings[] = $string;
}
}
print_r($strings); outputs:
Array
(
[0] => Release info:
[1] => How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
[2] => How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
[3] => How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
)

Get last Element (<a>) tag content from html

I have a string with some HTML. In the HTML is a list of anchors (<a> tags) and I would like to get the last of those anchors.
<div id="breadcrumbs">
Home
Suppliers
This One i needed
<span class="currentpage">Amrapali</span>
</div>
Use the DOMDocument class:
<?php
$html='<div id="breadcrumbs">
Home
Suppliers
This One i needed
<span class="currentpage">Amrapali</span>
</div>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$arr = [];
// Collect the text of every <a> in document order
foreach ($dom->getElementsByTagName('a') as $tag) {
    $arr[] = $tag->nodeValue;
}
echo array_pop($arr); // prints "This One i needed"
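XPath can also pick the last anchor directly instead of collecting all of them and popping the last one. A minimal sketch; the <a href="#"> markup is a hypothetical stand-in, since the anchors' href values are not given in the question:

```php
<?php
// Hypothetical breadcrumb markup: the hrefs are placeholders.
$html = '<div id="breadcrumbs">
<a href="#">Home</a>
<a href="#">Suppliers</a>
<a href="#">This One i needed</a>
<span class="currentpage">Amrapali</span>
</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// (//a)[last()] selects the last <a> in document order;
// //a[last()] would instead mean "the last <a> among its siblings".
$last = $xpath->query('(//a)[last()]')->item(0);
echo $last ? $last->textContent : '', "\n";
```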
You can match the last a tag with a negative lookahead that makes sure no other <a follows:
(?s)<a(?!.*<a).+</a>
And the code:
preg_match("#(?s)<a(?!.*<a).+</a>#", $html, $result);
print_r($result);
Output:
Array
(
[0] => This One i needed
)
Regex demo | PHP demo
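As written, $result[0] holds the whole last <a>...</a> element, not just its text. Putting a group around the link text keeps the same negative lookahead but gives you the text on its own in $result[1]. A sketch, again with hypothetical placeholder hrefs:

```php
<?php
// Hypothetical markup with placeholder hrefs.
$html = '<div id="breadcrumbs">
<a href="#">Home</a>
<a href="#">Suppliers</a>
<a href="#">This One i needed</a>
<span class="currentpage">Amrapali</span>
</div>';

// (?!.*<a) rejects any <a that is followed by another <a;
// [^>]* skips the attributes, and (.*?) captures the link text up to </a>.
preg_match('#(?s)<a(?!.*<a)[^>]*>(.*?)</a>#', $html, $result);
echo $result[1], "\n";
```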
Try this:
<?php
$string = '<div id="breadcrumbs">
Home
Suppliers
This One i needed
<span class="currentpage">Amrapali</span>
</div>';
//$matches[0] will have all the <a> tags
preg_match_all("/<a.+>.+<\/a>/i", $string, $matches);
//Now we remove the <a> tags and store the tag content in an array called $result
$result = [];
foreach($matches[0] as $key => $value){
$find = array("/<a\shref=\".+\">/", "/<\/a>/");
$replace = array("", "");
$result[] = preg_replace($find, $replace, $value);
}
//Make the last item in the $result array become the first
$result = array_reverse($result);
$last_item = $result[0];
echo $last_item;
?>

Extract Text from within tags using RegExp PHP

I am trying to extract some strings from the source code of a web page, which looks like this:
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
I'm pretty sure those strings are the only things that end with a single line break (<br />). Everything else ends with two or more line breaks. I tried using this:
preg_match_all('~(.*?)<br />{1}~', $source, $matches);
But it doesn't work as it's supposed to: it also returns some other text along with those strings.
DOMDocument and XPath to the rescue.
$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
foreach ($xp->query('//p[contains(concat(" ", @class, " "), " someclass ")]') as $node) {
echo $node->textContent;
}
Demo
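The XPath above prints the paragraph's whole text in one go. To get String1, String2 and String3 as separate strings, one option is to query the paragraph's text nodes, which the <br /> elements split apart. A minimal sketch using the markup from the question:

```php
<?php
$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// Select the text nodes directly inside the paragraph; each <br /> ends one text node.
$strings = [];
foreach ($xp->query('//p[contains(concat(" ", @class, " "), " someclass ")]/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') { // skip the whitespace-only node after the last <br />
        $strings[] = $value;
    }
}
print_r($strings);
```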
I wouldn't recommend using a regular expression to get the values. Instead, use PHP's built in HTML parser like this:
$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//p[@class="someclass"]');
$text = array(); // to hold the strings
if ($elements !== false) { // query() returns false for a malformed expression
foreach ($elements as $element) {
$text[] = strip_tags($element->nodeValue);
}
}
print_r($text); // print out all the strings
This is tested and working. You can read more about PHP's DOMDocument class here: http://www.php.net/manual/en/book.dom.php
Here's a demonstration: http://phpfiddle.org/lite/code/0nv-hd6 (click 'Run')
Try this:
preg_match_all('~^(.*?)<br />$~m', $source, $matches);
Should work. Please try this:
preg_match_all("/([^<>]*?)<br\s*\/?>/", $source, $matches);
Or, if your strings may contain some HTML code, use this one:
preg_match_all("/(.*?)<br\s*\/?>\\n/", $source, $matches);
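None of these patterns excludes a line that ends in two <br /> tags: for example, ~^(.*?)<br />$~m captures "text<br />" from a line "text<br /><br />". One way to require exactly one trailing <br /> is a tempered dot, (?:(?!<br />).)*, which refuses to step over a <br />. A sketch, with a made-up extra line to show the exclusion:

```php
<?php
// Hypothetical input: three single-<br /> lines plus one line ending in two <br /> tags.
$source = "<p class=\"someclass\">\nString1<br />\nString2<br />\nString3<br />\n</p>\nOther text<br /><br />\n";

// Per line (m flag): capture text that contains no "<br />" of its own,
// followed by exactly one "<br />" at the end of the line.
preg_match_all('~^((?:(?!<br />).)*)<br />$~m', $source, $matches);
print_r($matches[1]);
```

The "Other text" line is rejected because its text must be followed by a single <br /> and then the end of the line.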

DOMDocument : how to get inner HTML as Strings separated by line-breaks?

<blockquote>
<p>
2 1/2 cups sweet cherries, pitted<br>
1 tablespoon cornstarch <br>
1/4 cup fine-grain natural cane sugar
</p>
</blockquote>
Hi, I want to get the text inside the 'p' tag. As you can see, there are three different lines, and I want to print them separately after adding some extra text to each line. Here is my code block:
$tags = $dom->getElementsByTagName('blockquote');
foreach($tags as $tag)
{
$datas = $tag->getElementsByTagName('p');
foreach($datas as $data)
{
$line = $data->nodeValue;
echo $line;
}
}
The main problem is that $line contains the full text inside the 'p' tag, including the 'br' tags. How can I separate the three lines so I can handle each one individually?
Thanks in advance.
You can do that with XPath. All you have to do is query the text nodes. There is no need to explode or anything like that:
$dom = new DOMDocument;
$dom->loadHtml($html);
$xp = new DOMXPath($dom);
foreach ($xp->query('/html/body/blockquote/p/text()') as $textNode) {
echo "\n<li>", trim($textNode->textContent);
}
The non-XPath alternative would be to iterate the children of the P tag and only output them when they are DOMText nodes:
$dom = new DOMDocument;
$dom->loadHtml($html);
foreach ($dom->getElementsByTagName('p')->item(0)->childNodes as $pChild) {
if ($pChild->nodeType === XML_TEXT_NODE) {
echo "\n<li>", trim($pChild->textContent);
}
}
Both will output (demo)
<li>2 1/2 cups sweet cherries, pitted
<li>1 tablespoon cornstarch
<li>1/4 cup fine-grain natural cane sugar
Also see DOMDocument in php for an explanation of the node concept. It's crucial to understand when working with DOM.
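You can use explode(), but on the paragraph's HTML rather than on nodeValue, which no longer contains the <br> tags:
$lines = array_map('strip_tags', explode('<br>', $dom->saveHTML($data)));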
Here is the same idea in JavaScript syntax:
var tempArray = $line.split("<br>");
console.log(tempArray[0]);
console.log(tempArray[1]);
console.log(tempArray[2]);
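You can use PHP's explode() function like this (assuming the lines in your <p> tag are separated by <br>). Note that nodeValue has the tags stripped already, so explode the element's HTML instead:
$tags = $dom->getElementsByTagName('blockquote');
foreach($tags as $tag)
{
$datas = $tag->getElementsByTagName('p');
foreach($datas as $data)
{
// saveHTML($node) serializes the <p> element including its <br> tags
$contents = $dom->saveHTML($data);
$lines = explode('<br>',$contents);
foreach($lines as $line) {
// remove the leftover <p>/</p> tags and surrounding whitespace
echo trim(strip_tags($line)), "\n";
}
}
}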
