I have to find the values between specific tags in an HTML page using PHP regex, but I only want to run preg_match_all if the page contains multiple values; otherwise I want to do nothing.
For example, if preg_match finds 4 values in the HTML, then run preg_match_all in the next phase; if preg_match finds only 1 tag value, then do nothing.
<td class="page">
<span class="my-tag">value1</span>
<span class="my-tag">value2</span>
<span class="my-tag">value3</span>
<span class="my-tag">value4</span>
</td>
preg_match('/<td class="page">(.*?)<\/td>/s', $html, $matches);
Now run preg_match_all in the next phase, because preg_match found 4 values:
preg_match_all('|\<span class="my-tag"\>(.*?)\</span\>|', $html, $string);
And if the HTML contains only 1 value, like this:
<td class="page">
<span class="my-tag">value1</span>
</td>
So if the HTML contains only 1 value, then do nothing.
Basically, from your preg_match, you would be getting a $matches array back that looks like this:
Array
(
[0] => <td class="page">
<span class="my-tag">value1</span>
<span class="my-tag">value2</span>
<span class="my-tag">value3</span>
<span class="my-tag">value4</span>
</td>
[1] =>
<span class="my-tag">value1</span>
<span class="my-tag">value2</span>
<span class="my-tag">value3</span>
<span class="my-tag">value4</span>
)
With that, we can just go ahead and match the spans, whether the cell contains one or several. (Matching one item instead of four does no harm, so I am proposing to move the decision further down in your code.) Then we count how many spans were found and store that in a variable named $count.
// CHECK TO SEE IF WE FOUND A MATCH
if (isset($matches[1])) {
// GO AHEAD AND DO THE MATCH ON THE SPANS
preg_match_all('~<span class="my-tag">(.*?)</span>~s', $matches[1], $span_matches);
$count = count($span_matches[1]);
// IF WE FOUND MULTIPLE MATCHES, LIST THEM OUT
if ($count > 1) {
print 'COUNT IS: '.$count;
print_r($span_matches[1]);
}
// WE DID NOT MATCH ANY SPAN TAGS
elseif ($count == 0) {
print 'COUNT IS ZERO - CRAP';
}
// IF WE ONLY FOUND ONE MATCH, WE DON'T NEED TO DO ANYTHING
else {
print 'COUNT IS EXACTLY 1 - DO NOTHING';
}
}
// WE DID NOT FIND AN INITIAL MATCH TO BEGIN WITH
else {
print 'WE DID NOT FIND A MATCH';
}
From there, it's just a simple if/else statement to do what you want with it.
Here is a working demo:
http://ideone.com/SiPiOx
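For reference, the whole flow can be written as one self-contained sketch. The function name multipleTagValues and the two sample strings are made up for illustration; the patterns are the ones used above.

```php
<?php
// Hypothetical helper: returns the span values only when the cell holds more than one.
function multipleTagValues(string $html): array
{
    // Step 1: isolate the contents of the <td class="page"> cell.
    if (!preg_match('~<td class="page">(.*?)</td>~s', $html, $matches)) {
        return [];
    }
    // Step 2: collect every my-tag span inside that cell.
    preg_match_all('~<span class="my-tag">(.*?)</span>~s', $matches[1], $spanMatches);
    // Step 3: keep the values only if there is more than one.
    return count($spanMatches[1]) > 1 ? $spanMatches[1] : [];
}

// Sample inputs (assumed, modeled on the question).
$many = "<td class=\"page\">\n<span class=\"my-tag\">value1</span>\n<span class=\"my-tag\">value2</span>\n</td>";
$one  = "<td class=\"page\">\n<span class=\"my-tag\">value1</span>\n</td>";

print_r(multipleTagValues($many)); // both values
print_r(multipleTagValues($one));  // empty array: a single value means "do nothing"
```

An empty array covers both the "only one value" and the "no match" cases, so the caller only has to check whether anything came back.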
You can combine multiple patterns in a single regex with the pipe character in parentheses:
preg_match('/(cats?|dogs?|re.*tion)/', $string, $matches);
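A short sketch of how that alternation behaves; the sample strings are invented for illustration. Because the alternatives sit inside one pair of parentheses, whichever one matched ends up in $matches[1].

```php
<?php
$pattern = '/(cats?|dogs?|re.*tion)/';
$samples = ['two cats', 'a dog', 'registration', 'a bird']; // assumed test data

$found = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match and puts the matching alternative in $matches[1].
    if (preg_match($pattern, $sample, $matches)) {
        $found[$sample] = $matches[1];
    }
}
print_r($found); // 'a bird' is absent because none of the alternatives match it
```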
[PHP] I have a variable that stores a very large page source as a string. I want to echo only the strings I am interested in (dozens of them, which I need to extract for a project). They sit inside the quotation marks of the href attribute of <a> tags,
but I only want to capture the values that start with the letter n (news):
<a href="/news7044449/exclusive_news_sunday_"
That is, I think the match will have to be anchored on: a href="/n
How can I make the echo discard all the other text in the variable and show only those values?
Note that there are other href attributes with values that start with other letters, such as the letter 'p': href="/profiles... (These do not interest me.)
$string = '</div><span class="news-hd-mark">HD</span></div><p>exclusive_news_sunday_</p><p class="metadata"><span class="bg">Czech AV<span class="mobile-hide"> - 5.4M Views</span>
- <span class="duration">7 min</span></span></p></div><script>xv.thumbs.preparenews(7044449);</script>
<div id="news_31720715" class="thumb-block "><div class="thumb-inside"><div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="https://static-hw.xnewss.com/img/lightbox/lightbox-blank.gif"';
I imagine something like this:
$removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n = ('/something regex expresion I think /' or preg_match, substring?);
echo $string = str_replace($removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n,'',$string);
expected output: /news7044449/exclusive_news_sunday_
NOTE: the source does not have to be a variable; the text to extract from can just as well come from a .txt file.
Thanks.
I believe this will help.
<?php
$source = file_get_contents("code.html");
preg_match_all("/<a href=\"(\/n(?:.+?))\"[^>]*>/", $source, $results);
var_export( end($results) );
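To get just the links out of the $results array from Valdeir's answer, loop over the first capture group, $results[1] ($results itself is an array of arrays, so echoing its elements directly would only print "Array"):
foreach ($results[1] as $r) {
echo $r;
// alt: to display them with an HTML break tag after each one, use this line instead
// echo $r."<br>\n";
}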
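Putting the two answers together, a minimal sketch might look like the following. The $source string is a shortened, made-up stand-in for the real page, and the pattern uses [^"]+ instead of .+? so that a capture can never run past the closing quote; that is a small variation, not the pattern from the answer above.

```php
<?php
// Shortened stand-in for the page source (assumed for illustration).
$source = '<a href="/profiles/someone">p</a>'
        . '<a href="/news7044449/exclusive_news_sunday_"><img src="x.gif"></a>'
        . '<a href="/news31720715/my_sister_running_every_single_morning">';

// Capture only href values that begin with /n; [^"]+ stops at the closing quote.
preg_match_all('~<a href="(/n[^"]+)"~', $source, $results);

// $results[0] holds the full matches, $results[1] the captured paths.
foreach ($results[1] as $link) {
    echo $link, "\n";
}
```

The /profiles link is skipped because its value does not start with /n.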
I have the following string:
<strong>Test: </strong> BD-F5300
I am interested in getting the value BD-F5300. The value could be anything: text or numbers.
How can I get it? Thanks.
You could make use of preg_replace
<?php
$str='<strong>Test: </strong> BD-F5300';
echo $str = trim(preg_replace("~<(/)?strong>(.*?)<(/)?strong>~", "", $str)); // trim() drops the space left after </strong>
OUTPUT :
BD-F5300
You can do it like this in JavaScript:
var src = "<strong>Test: </strong> BD-F5300";
var reg = /.*<\/.*>\s*([a-zA-Z0-9-]+)/g;
var group = reg.exec(src);
console.log(group[1]+'\r\n'); //group[1] is what you want !
If all you need is to get some content after </strong> then you can just use:
preg_match('#</strong> (.+)#', $string, $matches);
The desired match will be in $matches[1]. However, this requires that the </strong> tag and the text content you want to find are on the same line, separated by a single space. If there are several of these to match, you may want to use preg_match_all.
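A small sketch of both variants; the second line of the sample string is invented to show the multiple-match case.

```php
<?php
// Two lines of sample input; the second line is assumed for illustration.
$string = "<strong>Test: </strong> BD-F5300\n<strong>Other: </strong> AB-1200";

// First occurrence only.
preg_match('#</strong> (.+)#', $string, $matches);
echo $matches[1], "\n";

// Every occurrence. By default . does not match a newline,
// so each capture stops at the end of its own line.
preg_match_all('#</strong> (.+)#', $string, $all);
print_r($all[1]);
```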
If there is always a space before the beginning of the final text you want and if there are never any spaces in the actual number text you want, you can avoid regex by doing this:
$str = '<strong>Test: </strong> BD-F5300';
$solution = substr($str, strrpos($str, ' ') + 1);
var_dump($solution);
I have a string that I need to split into an array of values. What would be the best approach?
The string can look like this:
<span class="17">118</span><span style="display: inline">.</span><span style="display:none"></span>
or
125<span class="17">25</span>354
The rules are:
The string can start with a number, followed by a span or a div
The string can start with a span or a div
The string can end with a number
The string can end with a /span or a /div
The divs/spans can have a style/class
What I need is to split the string so that I get the elements separately, such as:
0 => 123
1 => <span class="potato">123</span>
2 => <span style="color: black">123</span>
I have tried a custom regex, but regex is not my strong suit:
$pattern = "/<div.(.*?)<\/div>|<span.(.*?)<\/span>/";
// I know it won't detect a number before the div; that would still be an issue even if the rest worked
I cannot use simple_html_dom; it has to be done with regex.
Splitting the string between every >< might work, but ">(.*?)<" splits after the < for some reason.
You might get better performance if you load the string into the DOM and then walk it yourself, programming your logic like this:
var el = document.createElement( 'div' );
el.innerHTML = '125<span class="17">25</span>354';
// test your first element (125) index=0 (you can make for loop)
if(el.childNodes[0].nodeType == 3) alert('this is number first, validate it');
else if(el.childNodes[0].nodeType == 1) alert('this is span or div, test it');
// you can test for div or span with el.childNodes[0].nodeName
// store first element to your array
// then continue, test el.childNodes[next one, index=1 (span)...]
// then continue, test el.childNodes[next one, index=2 (354)...]
Since you already know what you are looking for, it can be as simple as that.
Try /(<(span|div)[^>]*>)*([^<]*)(<\/(span|div)>)*/
The regex says something like: there can be a span or div or nothing, then some text, then a /span or /div or nothing, and that whole statement can match zero or many times.
Here is an example:
<?php
$pattern = "/(<(span|div)[^>]*>)*([^<]*)(<\/(span|div)>)*/";
$txt = '<span class="17">118</span><span style="display: inline">.</span><span style="display:none"></span>';
preg_match_all($pattern, $txt,$foo);
print_r($foo[0]);
$txt = '125<span class="17">25</span>354';
preg_match_all($pattern, $txt,$foo);
print_r($foo[0]);
?>
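If the goal is exactly the array layout shown in the question (bare text as one element, each complete span or div as another), a different pattern may be easier to reason about. This is a sketch of an alternative, not the pattern above: it uses a backreference (\1) to pair each opening tag with its own closing tag, and it assumes that tags of the same name are never nested inside each other.

```php
<?php
// Alternative 1: a whole <span>...</span> or <div>...</div> element,
//                where \1 repeats whichever tag name was opened.
// Alternative 2: a run of plain text containing no '<'.
$pattern = '~<(span|div)\b[^>]*>.*?</\1>|[^<]+~s';

$txt = '125<span class="17">25</span>354';
preg_match_all($pattern, $txt, $parts);
print_r($parts[0]);

$txt2 = '<span class="17">118</span><span style="display: inline">.</span><span style="display:none"></span>';
preg_match_all($pattern, $txt2, $parts2);
print_r($parts2[0]);
```

Only $parts[0] (the full matches) is of interest here; $parts[1] just records the tag name of each element.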
My html code is as follows
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
What I want to do is extract the 'i want this text' leaving all of the other elements behind. I've tried several iterations of the following, but none return the text I need:
$name = trim($page->find('span[class!=ignore^] a[class!=also^] span[class=phone]',0)->innertext);
Some guidance would be appreciated as the simple_html_dom section on filters is quite bare.
What about using PHP's preg_match (http://php.net/manual/en/function.preg-match.php)?
Try the code below:
<?php
$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;
$result = preg_match('#class="phone".*\n(.*)#', $html, $matches);
echo $matches[1];
?>
regex explained:
Find the text class="phone", then continue to the end of the line, matching any characters with .*. Then cross the line break with \n and capture everything on the next line by wrapping .* in parentheses.
The result is stored in the $matches array. $matches[0] holds the text matched by the whole regex, while $matches[1] holds the text captured by the parentheses.
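To make the difference between the two entries concrete, here is a shortened sketch. It assumes the wanted text is always on the line directly after the opening tag, and it adds trim() to drop any indentation or stray carriage return around the captured line.

```php
<?php
// Shortened version of the markup from the question.
$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
</span>
EOF;

$text = null;
if (preg_match('#class="phone".*\n(.*)#', $html, $matches)) {
    // $matches[0]: everything from class="phone" to the end of the next line.
    // $matches[1]: only the captured second line.
    $text = trim($matches[1]);
    echo $text;
}
```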
I searched many threads on Stack Overflow but didn't find anything relevant to my case.
Here is the source text :
<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>
I want to extract 70,89 with a regular expression.
So I tried :
<span class="red"><span>([0-9]+)(<\/span><span style="display:none">1<\/span><span>)(,[0-9]+)<\/span>
which returns an array (with preg_match_all in PHP) containing 3 groups:
1/ 70
2/
</span><span style="display:none">1</span><span>
3/ ,89
I would like to exclude group 2 and merge groups 1 and 3.
So I also tried :
<span class="red"><span>([0-9]+)(?:<\/span><span style="display:none">1<\/span><span>)(,[0-9]+)<\/span>
but it returns :
70
,89
How can I merge the two groups?
Thanks a lot for your answers, I am going crazy searching for this regular expression! :)
Have a good day!
Just match the numbers that are wrapped with a plain <span>:
$str = '<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>';
if (preg_match_all('#<span>([,\d]+)</span>#', $str, $matches)) {
echo join('', $matches[1]);
}
// output: 70,89
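As for merging the groups: a single capture group cannot skip over text in the middle of its match, so the joining has to happen in PHP after the match. A sketch using the asker's second pattern (the one with the non-capturing group), with # as the delimiter so the slashes need no escaping:

```php
<?php
$str = '<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>';

// (?:...) keeps the hidden span out of the results, leaving two numbered groups.
$pattern = '#<span class="red"><span>([0-9]+)(?:</span><span style="display:none">1</span><span>)(,[0-9]+)</span>#';

$price = null;
if (preg_match($pattern, $str, $m)) {
    // Concatenate the integer part and the decimal part.
    $price = $m[1] . $m[2];
    echo $price;
}
```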