[PHP] I have a variable that holds a very large page source as a string. I want to echo only the values I'm interested in (dozens of them, which I need to extract for a project). They are inside the quotation marks of the href attribute, and I only want the values that start with the letter n (news), for example:
<a href="/news7044449/exclusive_news_sunday_"
That is, I think the match will have to start at: a href="/n
How can I make the echo discard all the other text in the variable and show only those values?
Note that there are other href attributes whose values start with other letters, such as p: href="/profiles... (these don't interest me).
$string = '</div><span class="news-hd-mark">HD</span></div><p>exclusive_news_sunday_</p><p class="metadata"><span class="bg">Czech AV<span class="mobile-hide"> - 5.4M Views</span>
- <span class="duration">7 min</span></span></p></div><script>xv.thumbs.preparenews(7044449);</script>
<div id="news_31720715" class="thumb-block "><div class="thumb-inside"><div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="https://static-hw.xnewss.com/img/lightbox/lightbox-blank.gif"';
I imagine something like this:
$removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n = ('/something regex expresion I think /' or preg_match, substring?);
echo $string = str_replace($removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n,'',$string);
Expected output: /news7044449/exclusive_news_sunday_
Note: the source doesn't have to be a variable; the text to extract from could also come from a .txt file.
Thanks.
I believe this will help you.
<?php
$source = file_get_contents("code.html");
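// Capture every href value that starts with /n (group 1)
preg_match_all("/<a href=\"(\/n(?:.+?))\"[^>]*>/", $source, $results);
// end() returns the last element of $results: the array of group-1 captures
var_export( end($results) );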
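To print just the links from the $results array in Valdeir's answer, loop over $results[1] (the captured group); looping over $results itself gives arrays, not strings:
foreach ($results[1] as $r) {
echo $r;
// alt: to display them with an HTML break tag after each one, use this line instead:
// echo $r."<br>\n";
}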
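A minimal sketch of the same extraction as a small function, assuming href values are always double-quoted. The function name extractNewsHrefs and the sample HTML are made up for illustration. Using [^"]* instead of .+? makes the stopping point explicit: a match can never run past the closing quote of the attribute.

```php
<?php
// Sketch: collect every href value that starts with "/n" from an HTML string.
// Assumption: href attributes are always written with double quotes.
function extractNewsHrefs(string $html): array
{
    // [^"]* stops at the closing quote, so a match never leaves the attribute value.
    preg_match_all('/href="(\/n[^"]*)"/', $html, $matches);
    return $matches[1]; // group 1 only, without the surrounding href="..."
}

// Hypothetical sample input; the /profiles link must be ignored.
$html = '<a href="/news7044449/exclusive_news_sunday_">A</a>'
      . '<a href="/profiles/someone">B</a>'
      . '<a href="/news31720715/my_sister_running_every_single_morning">C</a>';

foreach (extractNewsHrefs($html) as $href) {
    echo $href, "\n";
}
```

To read from a file instead of a variable, call it as extractNewsHrefs(file_get_contents("code.html")).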
I have a character string like (ascii codes):
32,13,7,11,11,
"string1,blah;like: this...", 10,10, 32,32,32,32, 138,138, 32,32,32,32, 13,7, 11,11,
"string2/lorem/example-text...", 10,10, 32,32,32,32,32, 143,143,143,143,143
So the sequence is:
- any characters, followed by my search string, followed by any characters
- 11,11
- the string I want to replace
- any non-printable characters
If the block contains string1 then I need to replace the next string with something else. The second string always starts directly after the 11,11.
I'm using PHP.
I tried something like this, but I'm not getting the correct result:
$updated = preg_replace("/(.*string1.*?\\v+)([[:print:]]+)([[:ascii:]]*)/mi", "$1"."new string"."$3", $orig);
This puts "new string" between the 10,10 and the 138,138 (and replaces the 32's).
I also tried \xb instead of \v.
Normally I test with regex101, but I'm not sure how to do that with non-printable characters. Any suggestions from regex gurus?
Edit: the expected output is the sequence:
32,13,7,11,11,
"string1,blah;like: this...", 10,10, 32,32,32,32, 138,138, 32,32,32,32, 13,7, 11,11,
"new string", 10,10, 32,32,32,32,32, 143,143,143,143,143
Edit: sorry for the confusion regarding the ascii codes.
Here's a complete example:
<?php
$s = chr(32).chr(32).chr(7).chr(11).chr(11);
$s .= "string1,blah;like: this...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138);
$s .= chr(32).chr(32).chr(32).chr(32).chr(13).chr(7).chr(11).chr(11);
$s .= "string2/lorem/example-text...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143);
$result = preg_replace('/(.*string1.*?\v+)([[:print:]]+)([[:ascii:]]*)/mi', "$1"."new string"."$3", $s);
echo "\n------------------------\n";
echo $result;
echo "\n------------------------\n";
The text string2/lorem/example-text... should be replaced by new string.
My php-cli halted every time preg_match reached chr(138), and I don't know why.
I'll throw my hat in the ring with this regex (note: \v matches any vertical whitespace character, including chr(10) and chr(11); no flags are set):
"[^"]*"[^\x0b]+\v{2}"\K[^"]*
PHP code:
$source = chr(32).chr(13).chr(7).chr(11).chr(11)."\"string1,blah;like: this...\"".chr(10).
chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138).chr(32).chr(32).chr(32).chr(32).
chr(13).chr(7).chr(11).chr(11)."\"string2/lorem/example-text...\"".chr(10).chr(10).chr(32).
chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143).chr(143).chr(143);
echo preg_replace('~"[^"]*"[^\x0b]+\v{2}"\K[^"]*~', "new string", $source);
Beautiful output:
"string1,blah;like: this..."
��
"new string"
�����
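In the regex above, \K is what keeps the quoted first string and everything up to the opening quote of the second one out of the replacement: it resets the start of the reported match, so preg_replace() replaces only the text matched after \K. A tiny sketch on a made-up input:

```php
<?php
// Sketch: \K drops everything matched so far from the reported match,
// so preg_replace() replaces only the part of the match after \K.
$input = 'id=42 id=7';

$withK    = preg_replace('/id=\K\d+/', 'X', $input);
$withoutK = preg_replace('/id=\d+/', 'X', $input);

echo $withK, "\n";    // id=X id=X
echo $withoutK, "\n"; // X X
```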
Solved. It was a combination of things:
/mis was needed (instead of /mi)
\x0b was needed (instead of \v)
Complete working example:
<?php
$s = chr(32).chr(32).chr(7).chr(11).chr(11);
$s .= "string1,blah;like: this...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138);
$s .= chr(32).chr(32).chr(32).chr(32).chr(13).chr(7).chr(11).chr(11);
$s .= "string2/lorem/example-text...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143);
$result = preg_replace('/(.*string1.*?\x0b+)([[:print:]]+)/mis', "$1"."new string", $s);
echo "\n------------------------\n";
echo $result;
echo "\n------------------------\n";
Thanks for everyone's suggestions. It put me on the right track.
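Why \x0b worked where \v did not: in PCRE, \v is not the vertical-tab escape but the class of all vertical whitespace, which includes chr(10) as well as chr(11). So the lazy .*?\v+ in the first attempt most likely stopped at the 10,10 right after string1, which fits the symptom in the question ("new string" landing after the 10,10). Once \x0b is used, .*? has to cross those line feeds, and . only does that with the s flag. A small check:

```php
<?php
// Sketch: compare what \v and \x0b match in PCRE.
// \v is the vertical-whitespace class (LF, VT, FF, CR, ...); \x0b is only VT, chr(11).
$samples = [10 => chr(10), 11 => chr(11), 13 => chr(13)];

foreach ($samples as $code => $char) {
    printf(
        "chr(%d): \\v=%d \\x0b=%d\n",
        $code,
        preg_match('/\v/', $char),
        preg_match('/\x0b/', $char)
    );
}
// chr(10): \v=1 \x0b=0
// chr(11): \v=1 \x0b=1
// chr(13): \v=1 \x0b=0
```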
Please take a look at the following situation:
[reply="292"] Text Here [/reply]
What I am trying to get is the number between the quotation marks in reply="NUMBER". I want to extract it into one variable, and the text between [reply="NUMBER"] and [/reply] into another variable.
So for this example:
[reply="292"] Text Here [/reply]
I want to extract the reply number, 292, and the text between the reply tags, Text Here.
I have tried this:
\[reply\=\"]([A-Z]\w)\[\/reply]
But this only matches up to the reply tag and doesn't work after that. How can I go about doing this?
I left the captures generic (.*), but you can use a more specific pattern, such as digits only (\d+).
php:
$s = '[reply="292"] Text Here [/reply]';
$expr = '/\[reply=\"(.*)\"\](.*)\[\/reply\]/';
if(preg_match($expr,$s,$r)){
var_dump($r);
}
javascript:
s = '[reply="292"] Text Here [/reply]'
s.match(/\[reply=\"(.*)\"\](.*)\[\/reply\]/)
//["[reply="292"] Text Here [/reply]", "292", " Text Here "]
Easy!
\[reply\=\"(\d+)\"]([\w\s]+)\[\/reply]
Explanation
\d for digit
+ for 1 or more occurrence of the specified character.
[\w\s] for any word character (\w) or whitespace character (\s)
Then apply it to PHP like this:
<?php
$str = "[reply=\"292\"] Text Here [/reply]";
preg_match('/\[reply\=\"(\d+)\"]([\w\s]+)\[\/reply]/', $str, $re);
print_r($re[1]); // printing group 1, the reply number
print_r($re[2]); // printing group 2, the text
?>
Important!!
Read only the capture groups ($re[1] and $re[2]), not the whole match array; you only need those parts anyway.
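Since the goal is two separate variables, here is a small sketch that assigns the captures directly; the variable names $replyNumber and $replyText are just illustrative. The (.*?) is lazy so that, when a string contains several reply tags, each match stops at its own [/reply].

```php
<?php
// Sketch: put the reply number and the reply text into two separate variables.
// The lazy (.*?) stops at the first [/reply]; trim() removes the surrounding spaces.
$s = '[reply="292"] Text Here [/reply]';

if (preg_match('/\[reply="(\d+)"\](.*?)\[\/reply\]/', $s, $m)) {
    $replyNumber = (int) $m[1];   // 292
    $replyText   = trim($m[2]);   // "Text Here"
    echo $replyNumber, "\n", $replyText, "\n";
}

// Several reply tags in one string: PREG_SET_ORDER gives one array per tag.
$input = '[reply="1"] first [/reply] and [reply="22"] second [/reply]';
preg_match_all('/\[reply="(\d+)"\](.*?)\[\/reply\]/', $input, $all, PREG_SET_ORDER);
foreach ($all as $match) {
    echo $match[1], ': ', trim($match[2]), "\n";
}
```

If the text inside a tag can span several lines, add the s modifier so that . also matches line breaks.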
In PHP I have a String $string and an array $acronyms (in the form "UK" => "United Kingdom").
Now I want to replace all acronyms within $string with HTML tags. For example, Hello UK should turn into Hello <acronym title="United Kingdom">UK</acronym>.
I do it this way:
foreach($acronyms as $acronym => $tooltip){
$string = preg_replace('/'.$acronym.'/i', '<acronym title="'.$tooltip.'">'.$acronym.'</acronym>', $string);
}
The problem: say I have the text Hello UK and an array that replaces "UK" with "United Kingdom" and "Kingdom" with "RandomWord". Then the text turns into Hello <acronym title="United <acronym title="RandomWord">Kingdom</acronym>">UK</acronym>, which is obviously a mess.
So the question is: how do I make my preg_replace only look for the words while they are NOT within an <acronym> tag (neither in the title attribute nor inside the tag itself)?
Edit: second attempt, based on a response (posted here because I can't put code in a reply). Still the same problem: the text inside the acronym tag gets replaced a second time...
foreach($acronyms as $acronym => $tooltip){
$acronyms[$acronym] = '<acronym title="'.$tooltip.'">'.$acronym.'</acronym>';
}
$string = str_ireplace(array_keys($acronyms), array_values($acronyms), $string);
You can use strtr(). It doesn't rescan the string after performing a replacement:
foreach ($acronyms as $acronym => $tooltip) {
$acronyms[$acronym] = sprintf('<acronym title="%s">%s</acronym>',
htmlspecialchars($tooltip),
htmlspecialchars($acronym)
);
}
echo strtr($string, $acronyms);
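A self-contained version of the strtr() approach, run on the UK/Kingdom case from the question. The function name tagAcronyms is just for illustration. Note that strtr() is case-sensitive, unlike the /i regex in the question, and that with an array it tries the longest keys first at each position.

```php
<?php
// Sketch: strtr() scans the original string once and never re-scans text it has
// already replaced, so the "Kingdom" inside a new title attribute is left alone.
function tagAcronyms(string $text, array $acronyms): string
{
    $replacements = [];
    foreach ($acronyms as $acronym => $tooltip) {
        $replacements[$acronym] = sprintf(
            '<acronym title="%s">%s</acronym>',
            htmlspecialchars($tooltip),
            htmlspecialchars($acronym)
        );
    }
    return strtr($text, $replacements);
}

$acronyms = ['UK' => 'United Kingdom', 'Kingdom' => 'RandomWord'];
echo tagAcronyms('Hello UK', $acronyms), "\n";
// Hello <acronym title="United Kingdom">UK</acronym>
```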
Here's an attempt at the regex version:
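foreach($acronyms as $acronym => $tooltip){
// preg_quote() escapes any regex metacharacters in the acronym
$rexp = '/' . preg_quote($acronym, '/') . '(?!((?!<acronym).)*<\/acronym>)/i';
$string = preg_replace($rexp, '<acronym title="'.$tooltip.'">'.$acronym.'</acronym>', $string);
}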
Seems to work for me. It does the following:
Match the $acronym value, followed by a negative lookahead...
...which rejects the match if a closing </acronym> tag can be found ahead,
...but stops looking as soon as an opening <acronym tag comes before it.
Ultimately this matches only where it's not within an acronym tag (including all attributes such as the title).
Here's an example of it in action: gSkinner regex example
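To see what the negative lookahead filters out, this sketch lists every occurrence of "Kingdom" that the pattern accepts in a made-up string that already contains one acronym tag. Only the occurrence outside the tag should be reported; the one in the title attribute is followed by </acronym> with no <acronym in between, so it is skipped.

```php
<?php
// Sketch: list the occurrences of "Kingdom" that survive the negative lookahead.
$text = '<acronym title="United Kingdom">UK</acronym> Kingdom';
$pattern = '/Kingdom(?!((?!<acronym).)*<\/acronym>)/i';

preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);
foreach ($m[0] as [$match, $offset]) {
    echo "$match at offset $offset\n"; // Kingdom at offset 45
}
```

Note that . does not match line breaks without the s flag, so this lookahead only looks ahead within the current line.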
Don't try to do everything with regexes:
Parse your HTML using an HTML/XML parsing library.
Iterate over the text nodes (skipping those inside <acronym> elements) and replace what you have to replace.
Ask your HTML parsing library to serialize the result back into an HTML string.
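A sketch of those steps with PHP's built-in DOM extension (DOMDocument and DOMXPath), used here as the parsing library. Assumptions: the input is an ASCII/Latin-1 HTML fragment (loadHTML() does not assume UTF-8 by default), acronyms are matched case-sensitively as whole words, and text inside script or style elements is not excluded. The function name tagAcronymsDom is illustrative, and the arrow function needs PHP 7.4+.

```php
<?php
// Sketch of the parser-based approach using PHP's DOM extension.
// Assumptions: ASCII/Latin-1 HTML fragment, case-sensitive whole-word acronyms,
// and <script>/<style> contents are not treated specially.
function tagAcronymsDom(string $html, array $acronyms): string
{
    if ($acronyms === []) {
        return $html;
    }
    $doc = new DOMDocument();
    // Wrap the fragment in one <div> so the document has a single root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $root = $doc->documentElement;

    // Only text nodes outside <acronym>; attribute values (title="...") are not text nodes.
    $xpath = new DOMXPath($doc);
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::acronym)]', $root));

    $alternatives = array_map(fn($a) => preg_quote($a, '/'), array_keys($acronyms));
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/';

    foreach ($textNodes as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matched acronyms.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // nothing to replace in this text node
        }
        $parent = $node->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $tag = $doc->createElement('acronym');
                $tag->setAttribute('title', $acronyms[$part]);
                $tag->appendChild($doc->createTextNode($part));
                $parent->insertBefore($tag, $node);
            } elseif ($part !== '') {
                $parent->insertBefore($doc->createTextNode($part), $node);
            }
        }
        $parent->removeChild($node); // the old text node has been rebuilt in pieces
    }

    // Serialize the children of the wrapper <div> back into an HTML string.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$acronyms = ['UK' => 'United Kingdom', 'Kingdom' => 'RandomWord'];
echo tagAcronymsDom('Hello UK, see the <acronym title="United Kingdom">UK</acronym> page.', $acronyms), "\n";
```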
I have a string that is separated by spaces. I want to show each space-separated part of the string on a new line. How can I do that?
base1|123|wen dsj|test base2|sa|7243|sdg custom3|dskkjds|823|kd
If there are no more pipes after the initial pipe, the space should break the line, so it looks like this:
base1|123|wen dsj|test
base2|sa|7243|sdg
custom3|dskkjds|823|kd
echo str_replace(' ',"\n",$string);
or
echo str_replace(' ',PHP_EOL,$string);
This is pretty messy, and the last empty result still needs cleaning up:
$string = 'base1|123|wen dsj|test base2|sa|7243|sdg custom3|dskkjds|823|kd';
preg_match_all('/(?P<line>(?:[^\\| ]*\\|{0,1})*(?: [^\\| ]*\\|[^\\| ]*(?: |\\z){0,1})*)(?: |\\z)/',$string,$matches,PREG_SET_ORDER);
print_r($matches);
Edit: Actually, this is pretty horrible.
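The space in "wen dsj" is what makes a plain space split too aggressive: it sits inside a field. Here is a simpler sketch that relies on a structural assumption instead: every record has exactly four pipe-separated fields (three | per record), and the last field of a record contains no spaces. The function name splitRecords is just for illustration.

```php
<?php
// Sketch, assuming each record has exactly four "|"-separated fields
// and that the last field of a record contains no spaces.
function splitRecords(string $string): array
{
    // Three "field|" groups (fields may contain spaces), then the last field.
    preg_match_all('/(?:[^|]*\|){3}[^| ]*/', $string, $matches);
    // Matches after the first start with the separating space; trim it off.
    return array_map('trim', $matches[0]);
}

$string = 'base1|123|wen dsj|test base2|sa|7243|sdg custom3|dskkjds|823|kd';
echo implode(PHP_EOL, splitRecords($string)), PHP_EOL;
```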