Prevent line break after specific word (PHP)

I have a long text and I would like to prevent a line break after specific keywords, say 'Mr.', 'the', and 'an'. The only problem is that I do not know which word will follow the keyword.
So if I have a text like:
... there is an elephant in the room ...
script should change it to:
... there is <span class="no-wrap">an elephant</span> in <span class="no-wrap">the room</span> ...
I know that it should be done with a regular expression of some sort, but I am really bad at those. Any tips on how to do this in PHP?

Capture one of the strings Mr., the, or an together with the following word in a single group:
(\b(?:Mr\.|the|an)\h+\S+)
Replacement string:
<span class="no-wrap">$1</span>
Code:
<?php
$string = "... there is an elephant in the room ...";
echo preg_replace('~(\b(?:Mr\.|the|an)\h+\S+)~', '<span class="no-wrap">$1</span>', $string);
?>
Output:
... there is <span class="no-wrap">an elephant</span> in <span class="no-wrap">the room</span> ...
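The same idea can be sketched as a reusable function. This is only an illustration under a few assumptions: the function name wrapNoWrap and the sample sentence are made up here, the keyword list is passed in as an array, and the i flag is added so that "The" is matched as well as "the".

```php
<?php
// Hypothetical helper: wraps each keyword plus the word that follows it.
function wrapNoWrap(string $text, array $keywords): string
{
    // Escape regex metacharacters such as the dot in "Mr.".
    $alternatives = array_map(
        fn (string $word): string => preg_quote($word, '~'),
        $keywords
    );

    // \h+ is horizontal whitespace, \S+ is the following "word".
    $pattern = '~\b(?:' . implode('|', $alternatives) . ')\h+\S+~i';

    // $0 is the whole match: keyword, whitespace, and the next word.
    return preg_replace($pattern, '<span class="no-wrap">$0</span>', $text);
}

echo wrapNoWrap('The cat met Mr. Smith in an alley.', ['Mr.', 'the', 'an']);
```

Because \S+ takes every non-whitespace character, trailing punctuation such as the final period ends up inside the span. If that matters, \w+ could be used instead.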


How to echo only the values found in a specific part of a string (href attributes)?

[PHP] I have a variable that stores a very large page source as a string. I want to echo only the interesting values (there are dozens of them that I need to extract for a project). They sit inside the quotation marks of the href attribute,
but I only want to capture the values that start with the letter n (news):
<a href="/news7044449/exclusive_news_sunday_"
That is, I think the match will have to be anchored on: a href="/n
How can I discard all the other text in the variable and show only those values?
Note that there are other href attributes whose values start with other letters, such as 'p': href="/profiles... (These do not interest me.)
$string = '</div><span class="news-hd-mark">HD</span></div><p>exclusive_news_sunday_</p><p class="metadata"><span class="bg">Czech AV<span class="mobile-hide"> - 5.4M Views</span>
- <span class="duration">7 min</span></span></p></div><script>xv.thumbs.preparenews(7044449);</script>
<div id="news_31720715" class="thumb-block "><div class="thumb-inside"><div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="https://static-hw.xnewss.com/img/lightbox/lightbox-blank.gif"';
I imagine something like this:
$removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n = ('/something regex expresion I think /' or preg_match, substring?);
echo $string = str_replace($removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n,'',$string);
expected output: /news7044449/exclusive_news_sunday_
NOTE: the input does not have to be a variable; the text could also be read from a .txt file.
Thanks.
I believe this will help.
<?php
$source = file_get_contents("code.html");
preg_match_all("/<a href=\"(\/n(?:.+?))\"[^>]*>/", $source, $results);
var_export( end($results) );
To get just the links out of the $results array from Valdeir's answer, loop over the captured group in $results[1] ($results itself is an array of arrays):
foreach ($results[1] as $r) {
echo $r."\n";
// alt: to display them with an HTML break tag after each one
// echo $r."<br>\n";
}
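As a rough sketch of the whole extraction in one place, the function below returns only the href values that begin with /n. The function name extractNewsLinks and the short $sample string are assumptions made for this illustration, and the pattern assumes double-quoted href attributes directly after the tag name.

```php
<?php
// Hypothetical helper: returns every href value that begins with "/n".
function extractNewsLinks(string $html): array
{
    // [^"]* stops at the closing quote, so a match never leaves the attribute.
    preg_match_all('~<a\s+href="(/n[^"]*)"~', $html, $matches);

    // $matches[0] holds the full matches, $matches[1] the captured paths.
    return $matches[1];
}

// Made-up sample with one link to skip and one link to keep.
$sample = '<a href="/profiles/someone">x</a>'
    . '<a href="/news7044449/exclusive_news_sunday_">y</a>';

foreach (extractNewsLinks($sample) as $link) {
    echo $link, "\n";
}
```

For input read from a file, the string could come from file_get_contents() instead of a literal.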

Change one part of the selection with Regex

I would like to insert a class with regex and preg_replace
echo preg_replace("/<li\>\s*<p\>[a-z]\)\s/", "/<li class=\"inciso\"\>\s*<p\>[a-z]\)\s/", $documento);
This is the model of the lines in my document:
<li>
<p>a) long text</p>
</li>
<li>
<p>b) long text</p>
</li>
<li>
<p>c) long text</p>
</li>
New example: let's say it is not HTML, just a simple list, and you want to change it from this:
a) long text
b) long text
c) long text
To this:
a) new text long text
b) new text long text
c) new text long text
echo preg_replace("/[a-z]\)\s/", "/[a-z]\)\snew\stext/", $documento);
Is this correct?
IF, and I emphasize again IF, the input text you have is like the one you posted here, then you can assume there is a safe pattern to replace, as you won't see this pattern anywhere else:
preg_replace("/<li>/", "<li class=\"inciso\">", $documento);
This will replace every occurrence of <li> with the modified version. If there are <li> elements that you do not want to replace, it becomes more difficult and you should use a DOM or SAX parser.
UPDATE after your update: You can match a word and add something before it with:
preg_replace('/(long)/', 'new text $1', $documento);
Have a look at backreferences
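A backreference can also cover the original case, where only the <li> elements followed by a lettered paragraph should get the class. The sketch below is one possible way to do it; the function name addIncisoClass and the sample input are assumptions for illustration.

```php
<?php
// Hypothetical helper: adds class="inciso" only to <li> elements whose
// paragraph starts with a lowercase letter and a closing parenthesis.
function addIncisoClass(string $html): string
{
    return preg_replace(
        '~<li>(\s*<p>[a-z]\)\s)~',   // group 1: everything after the <li>
        '<li class="inciso">$1',     // $1 puts the captured text back
        $html
    );
}

$documento = "<li>\n<p>a) long text</p>\n</li>\n<li>\n<p>Intro</p>\n</li>";

echo addIncisoClass($documento);
```

The replacement string is plain text plus references such as $1; regex syntax like \s or [a-z] has no meaning there, which is why the captured group is reused instead.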
Use str_replace instead. Note the argument order: search, replacement, subject.
$find = '<li>';
$replace = '<li class="inciso">';
echo str_replace($find, $replace, $documento);

SIMPLE HTML DOM - how to ignore nested elements?

My html code is as follows
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
What I want to do is extract the 'i want this text' leaving all of the other elements behind. I've tried several iterations of the following, but none return the text I need:
$name = trim($page->find('span[class!=ignore^] a[class!=also^] span[class=phone]',0)->innertext);
Some guidance would be appreciated as the simple_html_dom section on filters is quite bare.
What about using PHP's preg_match (http://php.net/manual/en/function.preg-match.php)?
Try the following:
<?php
$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;
$result = preg_match('#class="phone".*\n(.*)#', $html, $matches);
echo $matches[1];
?>
Regex explained:
Find the text class="phone", then proceed to the end of the line, matching any character with .* (the dot does not match a newline by default). Then move to the next line with \n and grab everything on that line by enclosing .* in parentheses.
The result is stored in the array $matches. $matches[0] holds the text matched by the whole regex, while $matches[1] holds the text captured by the parentheses.
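The explanation above can be wrapped into a small function that also checks whether the pattern matched at all and trims surrounding whitespace. This is only a sketch: the name firstLineAfterPhone and the indented sample markup are assumptions, and it relies on the wanted text sitting alone on the line right after the opening tag.

```php
<?php
// Hypothetical helper: returns the text on the line after class="phone".
function firstLineAfterPhone(string $html): ?string
{
    // .* never crosses a newline (no "s" flag), so group 1 is a single line.
    if (preg_match('#class="phone".*\n(.*)#', $html, $matches) !== 1) {
        return null; // pattern not found
    }

    // $matches[0] is the full match, $matches[1] the parenthesised part.
    return trim($matches[1]);
}

$html = "<span class=\"phone\">\n"
    . "  i want this text\n"
    . "  <span class=\"ignore-this-one\">01234567890</span>\n"
    . "</span>";

var_export(firstLineAfterPhone($html));
```

If the markup is not laid out line by line like this, a parser-based approach is likely to be more robust than the regex.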

Exclude a group regular expression

I searched many threads on Stack Overflow but didn't find anything relevant to my case.
Here is the source text :
<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>
I want to extract 70,89 with a regular expression.
So I tried :
<span class="red"><span>([0-9]+)(<\/span><span style="display:none">1<\/span><span>)(,[0-9]+)<\/span>
which returns an array (with preg_match_all in PHP) with 3 groups :
1/ 70
2/ </span><span style="display:none">1</span><span>
3/ ,89
I would like to exclude group 2 and merge 1 & 3.
So I also tried :
<span class="red"><span>([0-9]+)(?:<\/span><span style="display:none">1<\/span><span>)(,[0-9]+)<\/span>
but it returns :
70
,89
How can I merge the two groups ?
Thanks a lot for your answers, I am going crazy searching for this regular expression! :)
Have a good day !
Just match the numbers that are wrapped with a plain <span>:
$str = '<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>';
if (preg_match_all('#<span>([,\d]+)</span>#', $str, $matches)) {
echo join('', $matches[1]);
}
// output: 70,89
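A single capture group always covers one contiguous piece of the input, so two separated parts cannot be merged inside the regex itself; they can be concatenated after the match instead. The sketch below keeps the stricter pattern from the question and joins the two groups. The function name extractPrice is an assumption, and [^<]* is used here so the hidden span may contain any text.

```php
<?php
// Hypothetical helper: returns the visible price, e.g. "70,89", or null.
function extractPrice(string $html): ?string
{
    $pattern = '#<span class="red"><span>([0-9]+)</span>'
        . '<span style="display:none">[^<]*</span>'   // hidden part, not captured
        . '<span>(,[0-9]+)</span>#';

    if (preg_match($pattern, $html, $m) !== 1) {
        return null; // markup did not have the expected shape
    }

    // Join the integer part and the decimal part after matching.
    return $m[1] . $m[2];
}

$str = '<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>';

echo extractPrice($str);
```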

Regex find the first word

I'm trying to use regex to add a span to the first word of content for a page, however the content contains HTML so I am trying to ensure just a word gets chosen. The content changes for every page.
Current script is:
preg_match('/(<(.*?)>)*/i',$page_content,$matches);
$stripped = substr($page_content,strlen($matches[0]));
preg_match('/\b[a-z]* \b/i',$stripped,$strippedmatch);
echo substr($page_content, 0, strlen($matches[0])).'<span class="h1">'.$strippedmatch[0].'</span>'.substr($stripped, strlen($strippedmatch[0]));
However if the $page_content is
<p><span class="title">This is </span> my title!</p>
Then my regex thinks the first word is "span" and adds the tags around that.
Is there any way to fix this? (or a better way to do it).
This seems to work...
(?<=\>)\b\w*\b|^\w*\b
If you also want to allow leading whitespace (remember to trim the resulting string):
(?<=>)\s*\b\w*\b|^\s*\w*\b
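One way the first pattern could be plugged into preg_replace is sketched below. Two things here are assumptions rather than part of the answer: the function name wrapFirstWord, and the switch from \w* to \w+ so that an empty match can never be wrapped. The limit argument of 1 stops after the first word.

```php
<?php
// Hypothetical helper: wraps the first word that directly follows a ">"
// (or that starts the string) in a span.
function wrapFirstWord(string $html): string
{
    return preg_replace(
        '/(?<=>)\b\w+\b|^\w+\b/',       // lookbehind keeps ">" out of the match
        '<span class="h1">$0</span>',   // $0 is the matched word
        $html,
        1                               // replace only the first occurrence
    );
}

echo wrapFirstWord('<p><span class="title">This is </span> my title!</p>');
```

Words inside a tag, such as span or class, are not preceded by ">", which is why they are skipped. Text that follows a tag after some whitespace is not matched by this variant.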
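If I understand you correctly, you want a tag around the first word that is not part of a tag.
With regex you could do that like this:
$code = preg_replace('/^((?:<.+?>\s*)+?)(\w+)/i', '\1<span class="h1">\2</span>', $code);
This pattern steps over the leading tags until it finds text outside them. The tags are collected in one outer group, because a repeated capture group would keep only its last repetition and the earlier tags would be lost in the replacement.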
You shouldn't be using regex for this, but if you insist, you can try something like this:
<?php
$texts = array(
'<p><span class="title">This is </span> my title!</p>',
'<1> <2> <3> blah blah <4> <5> blah',
'garbage <1> <2> real stuff begins <3> <4>',
);
foreach ($texts as $text) {
print preg_replace('/(>\s*)(\w+)/', '\1{{\2}}', $text, 1)."\n";
}
?>
This prints:
<p><span class="title">{{This}} is </span> my title!</p>
<1> <2> <3> {{blah}} blah <4> <5> blah
garbage <1> <2> {{real}} stuff begins <3> <4>
