regex php special characters - php

$word = file_get_contents('http://www.pixelmon-server-list.com/list.txt');
$content = file_get_contents('http://www.pixelmon-server-list.com/fleetyfleet.txt');
$found_dimensions = array(); // our array
$word_array = explode(' ', $word); // explode the list
foreach($word_array as $one_word) { // loop over it
$str = 'DimensionName'.$one_word; // what are we looking for?
if(strstr($content, $str) !== false) { // look for it!
echo $one_word; // Just for demonstration purposes
$found_dimensions[] = $one_word; // add to the array
}
}
okay i have a list.text and a fleetyfleet.txt
both can be viewed here i didn't post them for space sake
http://pastebin.com/7hWDUG1b
but what i want to do is find the words in list.txt but only add them to array if there prefix is Dimension�����Name� but the special characters make it kinda tough I'm not sure what i should do

I had a similar problem where I needed to remove all non-ascii characters from a file. Here's the regex I used:
s/[^\x00-\x7F]//g
If you're on linux, here's a quick one-liner:
perl -p -i -e "s/[^\x00-\x7F]//g" list.txt

Having to guess a little bit as I can't see the files from behind my firewall. The following code might help you:
<?php
$f1 = explode("\n",file_get_contents("./file1.txt"));
$f2 = file_get_contents("./file2.txt");
$found = array();
foreach($f1 as $x) {
$str = "/DimensionName.....$x./";
if (strlen($x)>0) {
if (preg_match($str, $f2, $matches)) {
echo $matches[0]."\n";
}
}
}
?>
This prints out lines that include a pattern DimensionName followed by 5 "anything" followed by whatever word was read from the first file file1.txt.
If you need this to be further refined, please leave a comment.

Related

PhP regex for a string, what would be the best way to do it?

I have an array with rule field that has a string like this:
FREQ=MONTHLY;BYDAY=3FR
FREQ=MONTHLY;BYDAY=3SA
FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR
FREQ=MONTHLY;UNTIL=20170527T100000Z;BYDAY=4SA
FREQ=WEEKLY;BYDAY=SA
FREQ=WEEKLY;INTERVAL=2;BYDAY=TH
FREQ=WEEKLY;BYDAY=TH
FREQ=WEEKLY;UNTIL=20170610T085959Z;BYDAY=SA
FREQ=MONTHLY;BYDAY=2TH
Each line is a different array, I am giving a few clues to get an idea of what I need.
What I need is to write a regex that would take off all unnecessary values.
So, I don't need FREQ= ; BYDAY= etc. I basically need the values after = but each one I want to store in a different variable.
Taking third one as an example it would be:
$frequency = WEEKLY
$until = 20170728T080000Z
$day = MO, TU, WE, TH, FR
It doesn't have to be necessarily one regex, there can be one regex for each value. So I have one for FREQ:
preg_match("/[^FREQ=][A-Z]+/", $input_line, $output_array);
But I can't do it for the rest unfortunately, how can I solve this?
The only way to go would be PHP array destructuring:
$str = "FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR";
preg_match_all('~(\w+)=([^;]+)~', $str, $matches);
[$freq, $until, $byday] = $matches[2]; // As of PHP 7.1 (otherwise use list() function)
echo $freq, " ", $until, " ", $byday;
// WEEKLY 20170728T080000Z MO,TU,WE,TH,FR
Live demo
Be more general
Using extract function:
preg_match_all('~(\w+)=([^;]+)~', $str, $m);
$m[1] = array_map('strtolower', $m[1]);
$vars = array_combine($m[1], $m[2]);
extract($vars);
echo $freq, " ", $until, " ", $byday;
Live demo
Notice: For this problem, I recommend the generell approach #revo posted, it's concise and safe and easy on the eyes -- but keep in mind, that regular expressions come with a performance penalty compared to fixed string functions, so if you can use strpos/substr/explode/..., try to use them, don't 'knee-jerk' to a preg_-based solution.
Since the seperators are fixed and don't seem to occur in the values your are interested in, and you furthermore rely on knowledge of the keys (FREQ:, etc) you don't need regular-expressions (as much as I like to use them anywhere I can, and you can use them here); why not simply explode and split in this case?
$lines = explode("\n", $text);
foreach($lines as $line) {
$parts = explode(';', $line);
$frequency = $until = $day = $interval = null;
foreach($parts as $part) {
list($key, $value) = explode('=', $part);
switch($key) {
case 'FREQ':
$frequency = $value;
break;
case 'INTERVAL':
$interval = $value;
break;
// and so on
}
}
doSomethingWithTheValues();
}
This may be more readable and efficient if your use-case is as simple as stated.
You need to use the Pattern
;?[A-Z]+=
together with preg_split();
preg_split('/;?[A-Z]+=/', $str);
Explanation
; match Semikolon
? no or one of the last Character
[A-Z]+ match one or more uppercase Letters
= match one =
If you want to have each Line into a seperate Array, you should do it this Way:
# split each Line into an Array-Element
$lines = preg_split('/[\n\r]+/', $str);
# initiate Array for Results
$results = array();
# start Looping trough Lines
foreach($lines as $line){
# split each Line by the Regex mentioned above and
# put the resulting Array into the Results-Array
$results[] = preg_split('/;?[A-Z]+=/', $line);
}

Piglatin program in PHP without regular expressions?

Basically I'm trying to write a pretty basic program in PHP that just takes user input and translates it into Piglatin with PHP without using regular expressions. This is what my code so far looks like, which is fine:
<?php # script
$original = $_REQUEST['original'];
$array = explode(" ", $original);
$piglatin = "";
foreach($array as $word)
{
$word = trim($word);
$first = substr($word,0,1);
$thsh = substr($word,1,2);
$thshrest = substr($word,2, strlen($word)-2);
$rest = substr($word,1,strlen($word)-1);
if(trim($word))
{
$piglatin .= (strlen($word)==1)?$first." ":$rest.$first. "ay ";
}
}
echo $original ." becomes: ".$piglatin;
?>
except it doesn't take into account the special cases, like if a word begins with a vowel (in which case, the word "igloo" for example should be printed as "iglooway"), or if it begins with "th" or "sh" (in which case, the word "thimble" for example should be printed as "imblethay", taking the first two letters and bringing them to the end instead of just the first one.)
I've already started the process of creating variables out of the strings that start with "th" and "sh" (see $thsh and $thshrest), but I'm really confused as to where I should go from here?
A regex solution (probably not what you want, but I want to show just how easy it is):
$pig_latin = preg_replace('#^([^aeiou]+)([aeiou]+)(.*)#', '$2$3$1ay', $original);
$pig_latin = preg_replace('#(^[aeiou].*)#', '$1way', $pig_latin);
How about:
foreach ($array as $word) {
if (preg_match('/^[aeiou]/', $word)) {
$word = preg_replace('/^([aeiou].+)$/', "$1way", $word);
} else {
$word = preg_replace('/^(th|sh)(.+)$/', "$2$1ay", $word);
}
$piglatin .= $word ." ";
}

Unspin text in php from spun text on sentence level

I need to neatly output spun text in a php page.
I already have the prespun text in {hi|hello|greetings} format.
I have a php code that i found elsewhere, but it does not output the spun text on sentence level, where two {{ come.
Here is the code that needs fixing.
<?php
function spinText($text){
$test = preg_match_all("#\{(.*?)\}#", $text, $out);
if (!$test) return $text;
$toFind = Array();
$toReplace = Array();
foreach($out[0] AS $id => $match){
$choices = explode("|", $out[1][$id]);
$toFind[]=$match;
$toReplace[]=trim($choices[rand(0, count($choices)-1)]);
}
return str_replace($toFind, $toReplace, $text);
}
echo spinText("{Hello|Hi|Greetings}!");;
?>
The output will be randomly chose word: Hello OR Hi OR Greetings.
However, if there is a sentence level spinning, the output is messed up.
E.g.:
{{hello|hi}.{how're|how are} you|{How's|How is} it going}
The output is
{hello.how're you|How is it going}
As you can see the text has not been spun completely.
Thank you
This is a recursive problem, so regular expressions aren't that great; but recursive patterns can help though:
function bla($s)
{
// first off, find the curly brace patterns (those that are properly balanced)
if (preg_match_all('#\{(((?>[^{}]+)|(?R))*)\}#', $s, $matches, PREG_OFFSET_CAPTURE)) {
// go through the string in reverse order and replace the sections
for ($i = count($matches[0]) - 1; $i >= 0; --$i) {
// we recurse into this function here
$s = substr_replace($s, bla($matches[1][$i][0]), $matches[0][$i][1], strlen($matches[0][$i][0]));
}
}
// once we're done, it should be safe to split on the pipe character
$choices = explode('|', $s);
return $choices[array_rand($choices)];
}
echo bla("{{hello|hi}.{how're|how are} you|{How's|How is} it going}"), "\n";
See also: Recursive patterns

Regex to transform abcd to (a(b(c(d))))

I'm using PHP's preg_replace, and trying to transform the string
abcd
into
(a(b(c(d))))
This is the best I've got:
preg_replace('/.(?=(.*$))/', '$0($1)', 'abcd');
// a(bcd)b(cd)c(d)d()
Is it even possible with regex?
Edit I've just discovered this in the PCRE spec: Replacements are not subject to re-matching, so my original approach isn't going to work. I wanted to keep it all regex because there's some more complicated matching logic in my actual use case.
How about:
preg_replace('/./s', '($0', 'abcd') . str_repeat(')', strlen('abcd'));
?
(Does that count as "with regex"?)
You can use preg_match_all. Not sure what kind of characters you want, though. So I'll give an example for all characters:
$val = 'abcd1234';
$out = '';
if(preg_match_all('#.#', $val, $matches))
{
$i = 0; // we'll use this to keep track of how many open paranthesis' we have
foreach($matches[0] as &$v)
{
$out .= '('.$v;
$i++;
}
$out .= str_repeat(")", $i);
}
else
{
// no matches found or error occured
}
echo $out; // (a(b(c(d(1(2(3(4))))))))
Will be easy to customise further, as well.
This is my way to do it =) :
<?php
$arr = str_split("abcd");
$new_arr = array_reverse($arr);
foreach ($new_arr as $a) {
$str = sprintf('(%s%s)', $a, $str);
}
echo "$str\n";
?>
KISS isn't it ? (few lines : 6)
I went with something along the lines of a combination of the above answers:
preg_match_all('/./ui', 'abcd', $matches);
$matches = $matches[0];
$string = '('.implode('(', $matches).str_repeat(')', count($matches));

Can I use regex for this?

Is this possible with regex?
I have a file, and if a '#' is found in the file, the text after the '#' with the '#' is to be replaced with the file with the same name as after the '#'.
File1: "this text is found in file1"
File2: "this file will contain text from file1: #file1".
File2 after regex: "this file will contain text from file1: this text is found in file1".
I wish to do this with php and I've heard that the preg function is better than the ereg, but whatever works is fine with me =)
Thanks a lot!
EDIT:
It has to be programmed, so that it looks through file2 without knowing which files to concatenate before it has gone through all occurrences of a # :)
PHP's native functions str_pos and str_replace are better to use when you're searching through larger files or strings. ;)
First of all the grammar of your templating is not a very good one becuase the parser may not exactly sure when will the file name ends.
My suggestion would be that you change to the one that can better detect the boundry like {#:filename}.
Anyhow, the code I give below follows your question.
<?php
// RegEx Utility functions -------------------------------------------------------------------------
function ReplaceAll($RegEx, $Processor, $Text) {
// Make sure the processor can be called
if(!is_callable($Processor))
throw new Exception("\"$Processor\" is not a callable.");
// Do the Match
preg_match_all($RegEx, $Text, $Matches, PREG_OFFSET_CAPTURE + PREG_SET_ORDER);
// Do the replacment
$NewText = "";
$MatchCount = count($Matches);
$PrevOffset = 0;
for($i = 0; $i < $MatchCount; $i++) {
// Get each match and the full match information
$EachMatch = $Matches[$i];
$FullMatch = is_array($EachMatch) ? $EachMatch[0] : $EachMatch;
// Full match is each match if no grouping is used in the regex
// Full match is the first element of each match if grouping is used in the regex.
$MatchOffset = $FullMatch[1];
$MatchText = $FullMatch[0];
$MatchTextLength = strlen($MatchText);
$NextOffset = $MatchOffset + $MatchTextLength;
// Append the non-match and the replace of the match
$NewText .= substr($Text, $PrevOffset, $MatchOffset - $PrevOffset);
$NewText .= $Processor($EachMatch);
// The next prev-offset
$PrevOffset = $NextOffset;
}
// Append the rest of the text
$NewText .= substr($Text, $PrevOffset);
return $NewText;
}
function GetGroupMatchText($Match, $Index) {
if(!is_array($Match))
return $Match[0];
$Match = $Match[$Index];
return $Match[0];
}
// Replacing by file content -----------------------------------------------------------------------
$RegEx_FileNameInText = "/#([a-zA-Z0-9]+)/"; // Group #1 is the file name
$ReplaceFunction_ByFileName = "ReplaceByFileContent";
function ReplaceByFileContent($Match) {
$FileName = GetGroupMatchText($Match, 1); // Group # is the gile name
// $FileContent = get_file_content($FileName); // Get the content of the file
$FileContent = "{# content of: $FileName}"; // Dummy content for testing
return $FileContent; // Returns the replacement
}
// Main --------------------------------------------------------------------------------------------
$Text = " === #file1 ~ #file2 === ";
echo ReplaceAll($RegEx_FileNameInText, $ReplaceFunction_ByFileName, $Text);
This will returns === {# content of: file1} ~ {# content of: file2} ===.
The program will replace all the regex match with the replacement returned from the result of the given function name.
In this case, the callback function is ReplaceByFileContent in which the file name is extract from the group #1 in the regex.
I believe my code is self documented but if you have any question, you can ask me.
Hope I helps.
Much cleaner:
<?php
$content = file_get_content('content.txt');
$m = array();
preg_match_all('`#([^\s]*)(\s|\Z)`ism', $content, $m, PREG_SET_ORDER);
foreach($m as $match){
$innerContent = file_get_contents($match[1]);
$content = str_replace('#'.$match[1], $innerContent, $content);
}
// done!
?>
regex tested with: http://www.spaweditor.com/scripts/regex/index.php

Categories