Get At Least X Characters Before Keyword PHP

I'm making a small search system using PHP and MySQL.
I have this:
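$q = preg_quote($s, '#'); // escape regex metacharacters in the keyword
// first occurrence of the keyword plus exactly 75 characters on each side
preg_match('#(.{75}' . $q . '.{75})#s', $b, $match);
if (isset($match[1])) {
    // wrap the keyword in <span><b>...</b></span>
    return preg_replace('#(.+?)' . $q . '(.+)#s', '$1<span><b>' . $s . '</b> </span>$2', $match[1]);
} else {
    return 'Error';
}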
This does a good job of finding the first occurrence of the keyword(s) and grabbing 75 characters before and after it. The only problem is that if there are fewer than 75 characters on either side, the pattern doesn't match at all. I am pretty new to regex, and I got help with the above code, so it's not fully mine.

If I understand correctly, you want to match anywhere from n up to 75 characters. In that case, all you have to do is use a range quantifier such as {10,75}, where 10 is the lower limit n. Use {0,75} if the keyword can also appear at the very start or end of the text:
'#(.{10,75}' . $s . '.{10,75})#s'
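A minimal sketch of the range-quantifier idea as a complete function, assuming UTF-8 text (the u modifier makes . count characters rather than bytes). The function name snippet and the $radius parameter are hypothetical. It escapes the keyword with preg_quote(), since a keyword containing characters like + or . would otherwise be read as regex syntax, and it uses {0,N} so a keyword near the start or end of the text still matches. The leading quantifier is lazy so the first occurrence is the one returned.

```php
<?php
// Hypothetical helper: return up to $radius characters either side of the
// first occurrence of $keyword, with the keyword wrapped in <b>...</b>.
// Returns null when the keyword is not found. Assumes UTF-8 input.
function snippet(string $keyword, string $text, int $radius = 75): ?string
{
    // Escape regex metacharacters (+, ., ( etc.) in the keyword
    $q = preg_quote($keyword, '#');

    // {0,N} instead of {N}: succeeds even when fewer than N characters exist.
    // The leading quantifier is lazy so the first occurrence is used.
    $pattern = '#(.{0,' . $radius . '}?)(' . $q . ')(.{0,' . $radius . '})#su';

    if (!preg_match($pattern, $text, $m)) {
        return null;
    }

    // Escape the pieces before putting them into HTML
    return htmlspecialchars($m[1]) . '<b>' . htmlspecialchars($m[2]) . '</b>'
        . htmlspecialchars($m[3]);
}

echo snippet('mysql', 'A small search system using php and mysql.');
// A small search system using php and <b>mysql</b>.
```

Returning null instead of the string 'Error' lets the caller decide what to show when nothing matches.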

Related

RegEx (preg_match_all in PHP) to capture series of <tags containing numbers> up to the first alphanumeric character

The problem here is that digits also count as alphanumeric characters, which conflicts with the problem description.
Given the text:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The
next 11 keys can change the SWING from OFF (50%) to
<19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during
arpeggiator or sequencer operation.<33><34>
I need to extract the following four groups:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>
<19><20><21><22><23><24><25>
<26><27><28><29><30><31><32>
<33><34>
Reason: we want to display this in a much more user-friendly way as...
[1]The next 11 keys can change the SWING from OFF (50%) to [2]80[3]%
during arpeggiator or sequencer operation.[4]
Current code:
$pattern = '<[\d<>' . REGSTART . REGEND . REGSTARTSQ . REGENDSQ . '\{\}]+>';
$numberofsupertags = preg_match_all('/(' . $pattern . ')/', $source, $superchunks);
echo '<pre>';
print_r($superchunks);
echo '</pre><br>';
(REGSTART/REGEND/REGSTARTSQ/REGENDSQ refer to other possible pairs of symbols, like 【】 or 〖〗 etc.)
gives three groups:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>
<19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>
<33><34>
As you can see, the RegEx fails to take into account sequences of only numbers between tags.
I've tried lots of things:
$pattern = '([<|' . REGSTART . REGSTARTSQ . '|\{]\d+?[>|' . REGEND . REGENDSQ . '|\}])+';
$pattern = '<[\d<>' . REGSTART . REGEND . REGSTARTSQ . REGENDSQ . '\{\}]+[>(?=\d)|>]';
...but to no avail.
What is the correct solution and where do I go wrong? This looks really simple, but apparently it isn't.
You can use
(?:<(?:{\d+}|【\d+】|〖\d+〗|\d+)>)+
See the regex demo. Details:
(?: - start of a non-capturing group:
< - a < char
(?:{\d+}|【\d+】|〖\d+〗|\d+) - one of the alternatives: { + one or more digits + }, 【 + one or more digits + 】, 〖 + one or more digits + 〗 or one or more digits
> - a > char
)+ - one or more times.
See the PHP demo:
$source = '<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The next 11 keys can change the SWING from OFF (50%) to <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during arpeggiator or sequencer operation.<33><34>';
$cnt = 0;
echo preg_replace_callback('~(?:<(?:{\d+}|【\d+】|〖\d+〗|\d+)>)+~u', function($m) use (&$cnt) {
return '['. ++$cnt .']';
}, $source);
// => [1]The next 11 keys can change the SWING from OFF (50%) to [2]80[3]% during arpeggiator or sequencer operation.[4]
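The question asks to extract the four groups rather than replace them. The same pattern should also work with preg_match_all(), whose $m[0] holds every full match; a sketch using the sample text from the question (the variable $m is just an illustrative name):

```php
<?php
$source = '<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The next 11 keys can change the SWING from OFF (50%) to <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during arpeggiator or sequencer operation.<33><34>';

// Same pattern as above: one or more consecutive <N>, <{N}>, <【N】> or <〖N〗> tags.
// The u modifier is needed because 【】 and 〖〗 are multibyte UTF-8 characters.
preg_match_all('~(?:<(?:{\d+}|【\d+】|〖\d+〗|\d+)>)+~u', $source, $m);

// $m[0] is the list of full matches, i.e. the runs of tags
print_r($m[0]);
```

Because the pattern only ever starts at a < and only accepts digits inside a tag, a bare 80 between tags ends one run and the next run starts at <26>.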

convert decimal value to unicode characters in php

I need to convert decimal values into Unicode and display the corresponding character in PHP.
For example, 602 should display as this character: ɚ
After referencing this SO question/answer, I was able to piece this together:
echo json_decode('"' . '\u0' . dechex(602) . '"' );
This seems pretty error-prone. Is there a better way to do this?
I was unable to get utf8_encode to work, since it seems to expect a string, not a decimal number.
EDIT: for characters between 230 and 250, two leading zeros are required:
echo json_decode('"' . '\u00' . dechex(240) . '"' ); // ð
echo json_decode('"' . '\u00' . dechex(248) . '"' ); // ø
echo json_decode('"' . '\u00' . dechex(230) . '"' ); // æ
In some cases, no zero is required:
echo json_decode('"' . '\u' . dechex(8592) . '"' ); // ←
This seems strange.
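While eval is generally to be avoided, it seems strictly controlled enough to be fine here.
echo eval(sprintf('return "\u{%x}";', $val)); // PHP 7+ "\u{...}" string escape
echo json_decode(sprintf('"\u%04x"', $val)); // %04x zero-pads to the four hex digits \u needs (code points up to U+FFFF only)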
This ultimately worked for me, but I would not have found it without the answer from Niet the Dark Absol.
Normally, when I attempt to answer my own question, some SO wizard comes along and shows me a built-in function that I should have known about. But until that happens, this is all I can think of:
$hex = dechex($val);
// \u needs exactly four hex digits, so left-pad with zeros based on the
// length of the hex string (this form only covers code points up to U+FFFF)
$leading_zeros = str_repeat('0', max(0, 4 - strlen($hex)));
echo json_decode('"' . '\u' . $leading_zeros . $hex . '"');
EDIT: when trying to do something similar in JavaScript, the documentation tells me the format is supposed to be "\u####", with exactly four hex digits. I don't know whether this is similar to PHP or not.
If you have IntlChar available I'd recommend using IntlChar::chr:
var_dump(IntlChar::chr(602));
Failing that, something like the following avoids any eval/json_decode trickery:
var_dump(iconv('UTF-32BE', 'UTF-8', pack('N', 602)));
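If neither intl nor iconv is installed, the conversion can also be done without any extension by building the UTF-8 bytes by hand. A sketch, where the function name codepointToUtf8 is hypothetical; it follows the standard 1- to 4-byte UTF-8 bit layout. On PHP 7.2+ with mbstring, mb_chr($cp, 'UTF-8') should return the same string.

```php
<?php
// Hypothetical helper: encode a Unicode code point (an int) as a UTF-8 string.
function codepointToUtf8(int $cp): string
{
    // Reject values that are not valid Unicode scalar values (incl. surrogates)
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("Invalid code point: $cp");
    }
    if ($cp < 0x80) {          // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo codepointToUtf8(602); // ɚ
```

Unlike the \uXXXX trick, this needs no zero padding and also handles code points above U+FFFF.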

PHP string replace with characters in between tags

I need help with my PHP. I'm using str_ireplace() and I want to filter something out and replace it with something else.
I find it hard to explain what I am talking about, so I will give an example below.
This is what I have:
$string = "<error> " . md5(rand(0, 1000)) . time() . " </error> Test:)";
Then I want to remove the whole <error> ... </error> part and replace it with nothing.
So the end result should just print 'Test:)'.
Your question is not perfectly clear, but I believe I may understand what you are asking. This code may do the trick:
$string = "<error> " . md5(rand(0, 1000)) . time() . " </error> Test:)";
$newstring = preg_replace("/.*?\ /", "", $string);
This uses a regular expression to remove every chunk of text that ends in a space (including the space itself), so everything up to and including the last space is stripped, leaving just Test:).
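str_ireplace() only replaces fixed strings, so it cannot match "whatever is between the tags". A more targeted sketch that removes just the <error>...</error> block, assuming the tags are literally <error> and </error> and are never nested:

```php
<?php
$string = "<error> " . md5(rand(0, 1000)) . time() . " </error> Test:)";

// Remove the <error>...</error> block plus any whitespace after it.
// The lazy .*? stops at the first </error>; the s modifier lets . span newlines.
$clean = preg_replace('~<error>.*?</error>\s*~s', '', $string);

echo $clean; // Test:)
```

Unlike splitting on spaces, this leaves any spaces inside the remaining text alone.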

PHP regex split text to insert HTML

Very(!) new to regex but...
I have the following text strings output from a $title variable:
A. This is a title
B. This is another title
etc...
I'm after the following:
<span>A.</span> This is a title
<span>B.</span> This is another title
etc...
Currently I have the following code:
$title = $element['#title'];
if (preg_match("([A-Z][\.])", $title)) {
return '<li' . drupal_attributes($element['#attributes']) . ">Blarg</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
This replaces any title containing A. through Z. with Blarg, but I'm not sure how to take this further.
In the TextWrangler app I could wrap parts of the regex in parentheses and output each argument like so:
argument 1 = \1
argument 2 = \2
etc...
I know I need to add an additional regex to grab the remainder of the text string.
Perhaps a regex guru could help a novice out!
Thanks,
Steve
Try
$title = 'A. This is a title';
$title = preg_replace('/^[A-Z]\./', '<span>$0</span>', $title);
echo $title;
// <span>A.</span> This is a title
If the string contains newlines and other titles following them, add the m modifier after the ending delimiter.
If the regex doesn't match then no replacements will be made, so there is no need for the if statement.
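A short sketch of the m modifier mentioned above, using hypothetical sample data with two titles separated by a newline. With m, ^ matches at the start of every line rather than only at the start of the whole string:

```php
<?php
$titles = "A. This is a title\nB. This is another title";

// m modifier: ^ anchors at the start of each line
$result = preg_replace('/^[A-Z]\./m', '<span>$0</span>', $titles);

echo $result;
```

Without m, only the first line's A. would be wrapped.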
Is it always just two characters ("A.", "B.", "C.", ...)? If so, you could work with a substring instead of a regex.
Just take the first two characters of the title and wrap the span around that substring.
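A minimal sketch of the substring idea, assuming every title really does start with exactly two characters such as "A." (nothing here checks that):

```php
<?php
$title = 'A. This is a title';

// Wrap the first two characters in a span and append the rest unchanged
$title = '<span>' . substr($title, 0, 2) . '</span>' . substr($title, 2);

echo $title; // <span>A.</span> This is a title
```

If some titles might not have that prefix, the anchored preg_replace('/^[A-Z]\./', ...) version is safer, because it simply leaves non-matching titles alone.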
Try this (untested):
$title = $element['#title'];
if (preg_match("/([A-Z]\.)(.*)/", $title, $matches)) {
return '<li' . drupal_attributes($element['#attributes']) . "><span>{$matches[1]}</span>{$matches[2]}</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
The change here was to first add / to the start and end of the pattern (to mark it as a regex), then remove the [ and ] around the period, because the escaped \. already matches a literal period on its own, and then add another group that matches the rest of the string. I also added a $matches argument to preg_match() so the captured groups can be used later, which we do on the next line: $matches[1] holds the letter and period, and $matches[2] holds the rest of the title ($matches[0] is the whole match).
Note: You could also do this instead:
$title = preg_replace('/^([A-Z]\.)/', "<span>$1</span>", $title);
This will simply replace the capital letter followed by a period at the start of the string (anchored with the ^ character) with <span>, that same text (captured by the parentheses) and </span>.
Again, that's not tested, but it should give you a head start :)

preg_replace_callback() memory issue

I'm having a memory issue while testing a find/replace function.
Say the search subject is:
$subject = "I wrote an article in the A+ magazine.
It'\s very long and full of words.
I want to replace every A+ instance in this text by a link
to a page dedicated to A+.";
The string to be found:
$find='A+';
$find = preg_quote($find,'/');
The replace function callback:
function replaceCallback($match)
{
if (is_array($match)) {
return '<a class="tag" rel="tag-definition" title="Click to know more about ' .stripslashes($match[0]) . '" href="?tag=' . $match[0]. '">' . stripslashes($match[0]) . '</a>';
}
}
And the call:
$result = preg_replace_callback($find, 'replaceCallback', $subject);
Now, the complete search pattern is drawn from the database. As of now, it is:
$find = '/(?![^<]+>)\b(voice recognition|test project reference|test|synesthesia|Superflux 2007|Suhjung Hur|scripts|Salvino a. Salvaggio|Professional Lighting Design Magazine|PLDChina|Nicolas Schöffer|Naziha Mestaoui|Nabi Art Center|Markos Novak|Mapping|Manuel Abendroth|liquid architecture|LAb[au] laboratory for Architecture and Urbanism|l'Arca Edizioni|l' ARCA n° 176 _ December 2002|Jérôme Decock|imagineering|hypertext|hypermedia|Game of Life|galerie Roger Tator|eversion|El Lissitzky|Bernhard Tschumi|Alexandre Plennevaux|A+)\b/s';
This $find pattern is then looked for (and replaced if found) in 23 columns across 7 MySQL tables.
Using the suggested preg_replace() instead of preg_replace_callback() seems to have solved the memory issue, but I'm having new issues down the line: the subject returned by preg_replace() is missing a lot of content...
UPDATE:
The content loss is due to using preg_quote($find,'/');
It now works, except for... 'A+', which becomes 'A ' after the process.
I'm trying to reproduce your error but there's a parse error that needs to be fixed first. Either this isn't enough code to be a good sample or there's genuinely a bug.
First of all, the value you store in $find is not a full pattern, so I had to add pattern delimiters.
Secondly, your replace string doesn't include the closing element for the anchor tags.
$subject = "
I wrote an article in the A+ magazine. It'\s very long and full of words. I want to replace every A+ instance in this text by a link to a page dedicated to A+.
";
$find='A+';
$find = preg_quote($find,'/');
function replaceCallback($match)
{
if (is_array($match)) {
return '<a class="tag" rel="tag-definition" title="Click to know more about ' .stripslashes($match[0]) . '" href="?tag=' . $match[0]. '">' . stripslashes($match[0]) . '</a>';
}
}
$result = preg_replace_callback( "/$find/", 'replaceCallback', $subject);
echo $result;
This code works, but I'm not sure it's what you want. Also, I have a strong suspicion that you don't need preg_replace_callback() at all.
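A sketch of what the plain preg_replace() version might look like, using $0 (the whole match) in the replacement instead of a callback. The single-sentence $subject is a shortened sample. One caveat: the href ends up as ?tag=A+, and a + in a query string decodes to a space, so the tag would arrive as "A "; that may be related to the 'A ' symptom in the update, and it is one reason to keep a callback that can call urlencode().

```php
<?php
$subject = "I wrote an article in the A+ magazine.";

// $0 in the replacement is the whole match, so no callback is needed
// as long as the link text and href can use the match unchanged
$result = preg_replace(
    '/A\+/',
    '<a class="tag" rel="tag-definition" title="Click to know more about $0" href="?tag=$0">$0</a>',
    $subject
);

echo $result;
```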
This works for me. I had to change the pattern a bit, but it turns every A+ into a link. You are also missing a </a> at the end.
$subject = "I wrote an article in the A+ magazine. It'\s very long and full of words. I want to replace every A+ instance in this text by a link to a page dedicated to A+.";
function replaceCallback($match)
{
if (is_array($match))
{
return '<a class="tag" rel="tag-definition" title="Click to know more about ' .stripslashes($match[0]) . '" href="?tag=' . $match[0]. '">' . stripslashes($match[0]) . '</a>';
}
}
$result = preg_replace_callback("/A\+/", "replaceCallback", $subject);
echo $result;
Alright, I can see now why you're using the callback.
First of all, I'd change your callback to this:
function replaceCallback( $match )
{
if ( is_array( $match ) )
{
$htmlVersion = htmlspecialchars( $match[1], ENT_COMPAT, 'UTF-8' );
$urlVersion = urlencode( $match[1] );
return '<a class="tag" rel="tag-definition" title="Click to know more about ' . $htmlVersion . '" href="?tag=' . $urlVersion. '">' . $htmlVersion . '</a>';
}
return $match;
}
The stripslashes commands aren't going to do you any good.
As far as addressing the memory issue, you may want to break down your pattern into multiple patterns and execute them in a loop. I think your match is just too big/complex for PHP to handle it in a single call cycle.
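The update notes that running preg_quote() over the whole pattern caused the content loss. A sketch of escaping each term individually instead, assuming the terms come from the database as a plain PHP array and the subject is UTF-8 plain text with no existing HTML tags to skip. The function name linkTerms is hypothetical. Because "+" is not a word character, \b after A\+ fails before a space or period, so this sketch uses (?<!\w) and (?!\w) instead of \b.

```php
<?php
// Hypothetical helper: link every occurrence of each term in one pass.
function linkTerms(string $subject, array $terms): string
{
    if (!$terms) {
        return $subject; // an empty alternation would match everywhere
    }

    // Try longer terms first so "test project reference" wins over "test"
    usort($terms, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    // Escape each term on its own so "+", "[", "]" etc. are literal
    $escaped = array_map(function ($t) {
        return preg_quote($t, '/');
    }, $terms);

    // (?<!\w) and (?!\w) stand in for \b so terms ending in a non-word
    // character (like "A+") still match before a space or period
    $pattern = '/(?<!\w)(' . implode('|', $escaped) . ')(?!\w)/u';

    return preg_replace_callback($pattern, function ($m) {
        $html = htmlspecialchars($m[1], ENT_COMPAT, 'UTF-8');
        $url  = urlencode($m[1]);
        return '<a class="tag" rel="tag-definition" title="Click to know more about '
            . $html . '" href="?tag=' . $url . '">' . $html . '</a>';
    }, $subject);
}

echo linkTerms('I wrote an article in the A+ magazine.', ['A+', 'test']);
```

The callback mirrors the htmlspecialchars()/urlencode() version above, so A+ becomes ?tag=A%2B in the href.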
