Assign regex pattern as key to an array - php

I have an array of regular expressions and am looping through a text document. When I find a match for the first pattern, I want to use it as an array key. Every match for the second pattern that follows should be stored as a value under that key, until a new match for the first pattern starts a new key.
Text document structure:
Subject: sometext
Email: someemail#email.com
source: www.google.com www.stackoverflow.com www.reddit.com
So I have an array of expressions:
$expressions=array(
'email'=>'(\b[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b)',
'url'=>'([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:#&~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:#&~=%-]{0,1000}))?)'
);
I want to loop through my text document, match the email address and use it as an array key, then assign all URLs that follow as its values, so the output for the above text would be:
array(
'someemail#email.com' => array (
0 => 'www.google.com',
1 => 'www.stackoverflow.com',
2 => 'www.reddit.com'
)
)

One way to do such a thing:
$parts = preg_split("/(emailexpr)/",$txt,-1,PREG_SPLIT_DELIM_CAPTURE);
$res = array();
// note: $parts[0] will be everything preceding the first emailexpr match
for ( $i=1; isset($parts[$i]); $i+=2 )
{
$email = $parts[$i];
$chunk = $parts[$i+1];
if ( preg_match_all("/domainexpr/",$chunk,$match) )
{
$res[$email] = $match[0];
}
}
Replace emailexpr and domainexpr with your own regular expressions.

I would do:
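// FILE_SKIP_EMPTY_LINES only takes effect together with FILE_IGNORE_NEW_LINES
$lines = file('input_file', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$array = array();
$email = ''; // no email seen yet
foreach($lines as $line) {
if(preg_match('/^Subject:/', $line)) {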
$email = '';
} elseif(preg_match('/^Email: (.*)$/', $line, $m)) {
if(preg_match($expressions['email'], $m[1])) {
$email = $m[1];
}
} elseif(preg_match('/^source: (.*)$/', $line, $m) && $email) {
foreach(explode(' ', $m[1]) as $url) {
if(preg_match($expressions['url'], $url)) {
$array[$email][] = $url;
}
}
}
}
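A single-pass variant, as a rough sketch only: both patterns are combined into one alternation, and each match either starts a new key or is appended to the current one. The function name groupUrlsByEmail() and the two simplified patterns are assumptions made for this example, not the expressions from the question; substitute your own (without delimiters).

```php
<?php
// Hypothetical, simplified stand-ins for the real expressions (no delimiters).
$emailExpr = '[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}';
$urlExpr   = 'www\.[A-Za-z0-9.-]+';

function groupUrlsByEmail($text, $emailExpr, $urlExpr)
{
    $result  = array();
    $current = null; // the key currently being filled, if any

    // Every match is either an email (group "email") or a URL (group "url").
    preg_match_all("~(?<email>$emailExpr)|(?<url>$urlExpr)~", $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        if ($m['email'] !== '') {
            // A new email starts a new key.
            $current = $m['email'];
            if (!isset($result[$current])) {
                $result[$current] = array();
            }
        } elseif ($current !== null && isset($m['url']) && $m['url'] !== '') {
            // URLs seen before the first email are ignored.
            $result[$current][] = $m['url'];
        }
    }
    return $result;
}

$text = "Subject: sometext\nEmail: someemail#email.com\nsource: www.google.com www.stackoverflow.com www.reddit.com";
$grouped = groupUrlsByEmail($text, $emailExpr, $urlExpr);
print_r($grouped);
```

The `~` delimiter is used because the email pattern itself contains `#`; pick a delimiter that does not occur in your expressions.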


How to split a string and add an additional string to it

I have a string and I need to add an HTML tag at certain indexes of the string.
$comment_text = 'neethu and Dilnaz Patel  check this'
Array ( [start_index_key] => 0 [string_length] => 6 )
Array ( [start_index_key] => 11 [string_length] => 12 )
I need to split at each start_index_key, taking the length given in string_length.
The expected final output is:
$formattedText = '<span>#neethu</span> and <span>#Dilnaz Patel</span>  check this'
What should I do?
This is a very strict method that will break at the first change.
Do you have control over the creation of the string? If so, you can create a string with placeholders and fill the values.
That said, you can do this with a regex:
$pattern = '/(.+[^ ])\s+and (.+[^ ])\s+check this/i';
$string = 'neehu and Dilnaz Patel check this';
$replace = preg_replace($pattern, '<b>#$\1</b> and <b>#$\2</b> check this', $string);
But this is still a very rigid solution.
If you can, try creating the string with placeholders for the names. This will be much easier to manage and change in the future.
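A minimal sketch of the placeholder idea, assuming you control how the string is built. The {first} and {second} placeholder names and the $template variable are made up for this example.

```php
<?php
// Hypothetical template; the placeholder names are assumptions for this sketch.
$template = '{first} and {second} check this';
$names = array('{first}' => 'neethu', '{second}' => 'Dilnaz Patel');

// Wrap every name in a tag before substituting it into the template.
// array_map() keeps the string keys when given a single array.
$replacements = array_map(function ($name) {
    return '<span>#' . htmlspecialchars($name) . '</span>';
}, $names);

$formattedText = strtr($template, $replacements);
echo $formattedText, PHP_EOL;
```

Because the names are filled in by key, no character offsets need to be tracked when the surrounding text changes.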
<?php
function my_replace($string,$array_break)
{
$break_open = array();
$break_close = array();
$start = 0;
foreach($array_break as $key => $val)
{
// for tag <span>
if($key % 2 == 0)
{
$start = $val;
$break_open[] = $val;
}
else
{
// for tag </span>
// start + length - 1 is the index of the last character inside the span
$break_close[] = $start + $val - 1;
}
}
$result = array();
for($i=0;$i<strlen($string);$i++)
{
$current_char = $string[$i];
if(in_array($i,$break_open))
{
$result[] = "<span>".$current_char;
}
else if(in_array($i,$break_close))
{
$result[] = $current_char."</span>";
}
else
{
$result[] = $current_char;
}
}
return implode("",$result);
}
$comment_text = 'neethu and Dilnaz Patel check this';
$my_result = my_replace($comment_text,array(0,6,11,12));
var_dump($my_result);
Explanation:
Pass an array in which the even indexes (0, 2, 4, ...) hold each start_index_key and the odd indexes (1, 3, 5, ...) hold the matching string_length.
Read every break point and store it in $break_open or $break_close.
Create the array $result for the result.
Loop over the string and add <span>, add </span>, or add nothing around each character, depending on the break points.
Result:
string '<span>neethu</span> and <span>Dilnaz Patel</span> check this' (length=60)
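A shorter alternative, as a rough sketch: substr_replace() can wrap each span directly if the mentions are processed from the last one to the first, so that earlier offsets are not shifted by the inserted tags. The function name wrapMentions() and its $prefix parameter are hypothetical, and the offsets are assumed to be byte offsets into single-byte text.

```php
<?php
// Hypothetical helper; $mentions uses the array shape shown in the question.
function wrapMentions($text, array $mentions, $prefix = '#')
{
    // Sort by start index, descending, so replacements run right to left.
    usort($mentions, function ($a, $b) {
        return $b['start_index_key'] - $a['start_index_key'];
    });

    foreach ($mentions as $m) {
        $name = substr($text, $m['start_index_key'], $m['string_length']);
        $text = substr_replace(
            $text,
            '<span>' . $prefix . $name . '</span>',
            $m['start_index_key'],
            $m['string_length']
        );
    }
    return $text;
}

$mentions = array(
    array('start_index_key' => 0,  'string_length' => 6),
    array('start_index_key' => 11, 'string_length' => 12),
);
$formatted = wrapMentions('neethu and Dilnaz Patel check this', $mentions);
echo $formatted, PHP_EOL;
```

For multibyte text the offsets would need to be converted first, for example with the mb_* string functions.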

Generate all possible matches for regex pattern in PHP

There are quite a few questions on SO asking about how to parse a regex pattern and output all possible matches to that pattern. For some reason, though, every single one of them I can find (1, 2, 3, 4, 5, 6, 7, probably more) are either for Java or some variety of C (and just one for JavaScript), and I currently need to do this in PHP.
I’ve Googled to my heart’s (dis)content, but whatever I do, pretty much the only thing that Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.
My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:
[] for character classes
() for subgroups (capturing not required)
| (pipe) for alternative matches within subgroups
? for zero-or-one matches
So an example might be [ct]hun(k|der)(s|ed|ing)? to match all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of sixteen permutations.
Ideally, there’d be a library or tool for PHP which will iterate through (finite) regex patterns and output all possible matches, all ready to go. Does anyone know if such a library/tool already exists?
If not, what is an optimised way to approach making one? This answer for JavaScript is the closest I’ve been able to find to something I should be able to adapt, but unfortunately I just can’t wrap my head around how it actually works, which makes adapting it more tricky. Plus there may well be better ways of doing it in PHP anyway. Some logical pointers as to how the task would best be broken down would be greatly appreciated.
Edit: Since apparently it wasn’t clear how this would look in practice, I am looking for something that will allow this type of input:
$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
– and printing $possibleMatches should then give something like this (the order of the elements is not important in my case):
Array
(
[0] => chunk
[1] => thunk
[2] => chunks
[3] => thunks
[4] => chunked
[5] => thunked
[6] => chunking
[7] => thunking
[8] => chunder
[9] => thunder
[10] => chunders
[11] => thunders
[12] => chundered
[13] => thundered
[14] => chundering
[15] => thundering
)
Method
You need to strip out the variable patterns; you can use preg_match_all to do this
preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);
/* Regex:
/(\[\w+\]|\([\w|]+\))/
/ : Pattern delimiter
( : Start of capture group
\[\w+\] : Character class pattern
| : OR operator
\([\w|]+\) : Capture group pattern
) : End of capture group
/ : Pattern delimiter
*/
You can then expand the capture groups to letters or words (depending on type)
$array = str_split($cleanString, 1); // For a character class
$array = explode("|", $cleanString); // For a capture group
Recursively work your way through each $array
Code
function printMatches($pattern, $array, $matchPattern)
{
$currentArray = array_shift($array);
foreach ($currentArray as $option) {
$patternModified = preg_replace($matchPattern, $option, $pattern, 1);
if (!count($array)) {
echo $patternModified, PHP_EOL;
} else {
printMatches($patternModified, $array, $matchPattern);
}
}
}
function prepOptions($matches)
{
$possibilities = array();
foreach ($matches as $match) {
$cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);
if ($match[0] === "[") {
$array = str_split($cleanString, 1);
} elseif ($match[0] === "(") {
$array = explode("|", $cleanString);
}
// negative string offsets require PHP 7.1 or later
if ($match[-1] === "?") {
$array[] = "";
}
$possibilities[] = $array;
}
return $possibilities;
}
$regex = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";
preg_match_all($matchPattern, $regex, $matches);
printMatches(
$regex,
prepOptions($matches[0]),
$matchPattern
);
Additional functionality
Expanding nested groups
In practice, you would run this before the preg_match_all call.
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
$output = explode("|", $array[3]);
if ($array[0][-1] === "?") {
$output[] = "";
}
foreach ($output as &$option) {
$option = $array[2] . $option;
}
return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;
Output:
This happen(s|ed) to (become|be|have|having) test case 1?
Matching single letters
The bones of this would be to update the regex:
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";
and add an else to the prepOptions function:
} else {
$array = [$cleanString];
}
Full working example
function printMatches($pattern, $array, $matchPattern)
{
$currentArray = array_shift($array);
foreach ($currentArray as $option) {
$patternModified = preg_replace($matchPattern, $option, $pattern, 1);
if (!count($array)) {
echo $patternModified, PHP_EOL;
} else {
printMatches($patternModified, $array, $matchPattern);
}
}
}
function prepOptions($matches)
{
$possibilities = array();
foreach ($matches as $match) {
$cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);
if ($match[0] === "[") {
$array = str_split($cleanString, 1);
} elseif ($match[0] === "(") {
$array = explode("|", $cleanString);
} else {
$array = [$cleanString];
}
// negative string offsets require PHP 7.1 or later
if ($match[-1] === "?") {
$array[] = "";
}
$possibilities[] = $array;
}
return $possibilities;
}
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";
$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
$output = explode("|", $array[3]);
if ($array[0][-1] === "?") {
$output[] = "";
}
foreach ($output as &$option) {
$option = $array[2] . $option;
}
return $array[1] . implode("|", $output);
}, $regex);
preg_match_all($matchPattern, $regex, $matches);
printMatches(
$regex,
prepOptions($matches[0]),
$matchPattern
);
Output:
This happens to become test case 1
This happens to become test case
This happens to be test case 1
This happens to be test case
This happens to have test case 1
This happens to have test case
This happens to having test case 1
This happens to having test case
This happened to become test case 1
This happened to become test case
This happened to be test case 1
This happened to be test case
This happened to have test case 1
This happened to have test case
This happened to having test case 1
This happened to having test case
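If you need the matches returned as an array (as in the parseRegexPattern() example from the question) and want nesting handled without a separate pre-processing step, a small recursive-descent parser is one option. This is only a sketch under the question's constraints ([], (), | and ? only, no ranges or escapes, well-formed input); the names expandPattern(), parseAlternation() and parseSequence() are made up here.

```php
<?php
// Entry point: returns every string the (finite) pattern can match.
function expandPattern($pattern)
{
    $pos = 0;
    return array_values(array_unique(parseAlternation($pattern, $pos)));
}

// alternation := sequence ('|' sequence)*
function parseAlternation($p, &$pos)
{
    $out = parseSequence($p, $pos);
    while ($pos < strlen($p) && $p[$pos] === '|') {
        $pos++; // skip '|'
        $out = array_merge($out, parseSequence($p, $pos));
    }
    return $out;
}

// sequence := (atom '?'?)*, where atom is a class, a group or a literal
function parseSequence($p, &$pos)
{
    $out = array('');
    while ($pos < strlen($p) && $p[$pos] !== '|' && $p[$pos] !== ')') {
        $ch = $p[$pos];
        if ($ch === '[') {
            // Character class: one option per character (no ranges supported).
            $end = strpos($p, ']', $pos);
            $options = str_split(substr($p, $pos + 1, $end - $pos - 1));
            $pos = $end + 1;
        } elseif ($ch === '(') {
            $pos++; // skip '('
            $options = parseAlternation($p, $pos);
            $pos++; // skip ')'
        } else {
            $options = array($ch);
            $pos++;
        }
        if ($pos < strlen($p) && $p[$pos] === '?') {
            $options[] = ''; // zero-or-one: the atom may be absent
            $pos++;
        }
        // Combine everything built so far with each new option.
        $next = array();
        foreach ($out as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $out = $next;
    }
    return $out;
}

$possibleMatches = expandPattern('[ct]hun(k|der)(s|ed|ing)?');
print_r($possibleMatches);
```

The order of the results follows the order of the alternatives in the pattern, which the question says does not matter.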

Unset array Items matching a pattern [duplicate]

I have the following Array :
Array
{
[0]=>"www.abc.com/directory/test";
[1]=>"www.abc.com/test";
[2]=>"www.abc.com/directory/test";
[3]=>"www.abc.com/test";
}
I only want the items that have something in the middle of the URL, like /directory/, and want to unset the items that do not.
Output should be like:
Array
{
[0]=>"www.abc.com/directory/test";
[1]=>"www.abc.com/directory/test";
}
An example without closures. Sometimes you just need to understand the basics first, before you can move on to the neater stuff.
$newArray = array();
foreach($array as $value) {
// strpos() returns 0 for a match at the start, so compare against false explicitly
if ( strpos( $value, '/directory/') !== false ) {
$newArray[] = $value;
}
}
Try using array_filter like this:
$result = array_filter($data, function($el) {
$parts = parse_url($el);
return substr_count($parts['path'], '/') > 1;
});
If the URL has something in the middle, its path will always contain at least 2 slashes.
So for input data
$data = Array(
"http://www.abc.com/directory/test",
"www.abc.com/test",
"www.abc.com/directory/test",
"www.abc.com/test/123"
);
your output will be
Array
(
[0] => http://www.abc.com/directory/test
[2] => www.abc.com/directory/test
[3] => www.abc.com/test/123
)
A couple of approaches:
$urls = array(
'www.abc.com/directory/test',
'www.abc.com/test',
'www.abc.com/foo/directory/test',
'www.abc.com/foo/test',
);
$matches = array();
// if you want /directory/ to appear anywhere:
foreach ($urls as $url) {
if (strpos($url, '/directory/') !== false) {
$matches[] = $url;
}
}
var_dump($matches);
$matches = array();
// if you want /directory/ to be the first path:
foreach ($urls as $url) {
// make the strings valid URLs
if (0 !== strpos($url, 'http://')) {
$url = 'http://' . $url;
}
$parts = parse_url($url);
if (isset($parts['path']) && substr($parts['path'], 0, 11) === '/directory/') {
$matches[] = $url;
}
}
var_dump($matches);
<?php
$array = Array("www.abc.com/directory/test",
"www.abc.com/test",
"www.abc.com/directory/test",
"www.abc.com/test",
);
var_dump($array);
array_walk($array, function($val,$key) use(&$array){
if (strpos($val, 'directory') === false) {
unset($array[$key]);
}
});
var_dump($array);
Requires PHP >= 5.3.0 (closures).
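For comparison, a minimal array_filter sketch that keeps only the URLs containing /directory/ and then reindexes the result with array_values(), which matches the 0, 1 keys shown in the question. The variable names are chosen for this example.

```php
<?php
$urls = array(
    'www.abc.com/directory/test',
    'www.abc.com/test',
    'www.abc.com/directory/test',
    'www.abc.com/test',
);

// array_filter() preserves the original keys; array_values() renumbers from 0.
$filtered = array_values(array_filter($urls, function ($url) {
    return strpos($url, '/directory/') !== false;
}));

print_r($filtered);
```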

php -> delete items from array which contain words from a blacklist

I have an array of Twitter tweets and want to delete every tweet in this array that contains one of the following words: blacklist|blackwords|somemore
Could anyone help me with this?
Here's a suggestion:
<?php
$banned_words = 'blacklist|blackwords|somemore';
$tweets = array( 'A normal tweet', 'This tweet uses blackwords' );
$blacklist = explode( '|', $banned_words );
// Check each tweet
foreach ( $tweets as $key => $text )
{
// Search the tweet for each banned word
foreach ( $blacklist as $badword )
{
if ( stristr( $text, $badword ) )
{
// Remove the offending tweet from the array
unset( $tweets[$key] );
}
}
}
?>
You can use array_filter() function:
$badwords = ... // initialize badwords array here
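function filter($text)
{
global $badwords;
foreach ($badwords as $word) {
// reject the tweet as soon as any bad word is found
if (strpos($text, $word) !== false) {
return false;
}
}
// no bad word found: keep the tweet
return true;
}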
$result = array_filter($tweetsArray, "filter");
Use array_filter. Check this sample:
$tweets = array();
function safe($tweet) {
$badwords = array('foo', 'bar');
foreach ($badwords as $word) {
if (strpos($tweet, $word) !== false) {
// Baaaad
return false;
}
}
// OK
return true;
}
$safe_tweets = array_filter($tweets, 'safe');
You can do it in a lot of ways, so without more information, here is some starter code:
$a = Array(" fafsblacklist hello hello", "white goodbye", "howdy?!!");
$clean = Array();
$blacklist = '/(blacklist|blackwords|somemore)/';
foreach($a as $i) {
if(!preg_match($blacklist, $i)) {
$clean[] = $i;
}
}
var_dump($clean);
Using regular expressions:
// preg_grep() takes the pattern first, then the array
preg_grep("/blacklist|blackwords|somemore/", $array, PREG_GREP_INVERT);
But I warn you that this may be inefficient, and you must take care of punctuation characters in the blacklist.
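One way to deal with punctuation in the blacklist is to escape each word with preg_quote() before building the pattern. The sketch below assumes a case-insensitive substring match is what you want; removeBlacklisted() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the tweets that contain none of the blacklisted words.
function removeBlacklisted(array $tweets, array $blacklist)
{
    if (count($blacklist) === 0) {
        return array_values($tweets); // an empty pattern would match everything
    }

    // Escape regex metacharacters (and the "/" delimiter) in every word.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blacklist);

    $pattern = '/' . implode('|', $quoted) . '/i';

    // PREG_GREP_INVERT returns the entries that do NOT match.
    return array_values(preg_grep($pattern, $tweets, PREG_GREP_INVERT));
}

$tweets = array('A normal tweet', 'This tweet uses blackwords', 'Price is $5.00 (today)');
$clean  = removeBlacklisted($tweets, explode('|', 'blacklist|blackwords|somemore'));
print_r($clean);
```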

Finding tags in query string with regular expression

I have to set some routing rules in my php application, and they should be in the form
/%var/something/else/%another_var
In other words, I need a regex that returns every URI piece marked by the % character. Strings marked by % represent variable names, so they can be almost any string.
another example:
from /%lang/module/controller/action/%var_1
i want the regex to extract lang and var_1
I tried something like
/.*%(.*)[\/$]/
but it doesn't work.
Seeing as it's routing rules, and you may need all the pieces at some point, you could also split the string the classical way:
$path_exploded = explode("/", $path);
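// a leading "/" yields an empty first fragment, so check for that before indexing
foreach ($path_exploded as $fragment) if ($fragment !== "" && $fragment[0] == "%")
echo "Found $fragment";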
$str='/%var/something/else/%another_var';
$s = explode("/",$str);
$whatiwant = preg_grep("/^%/",$s);
print_r($whatiwant);
I don’t see the need to slow down your script with a regex … trim() and explode() do everything you need:
function extract_url_vars($url)
{
if ( FALSE === strpos($url, '%') )
{
// no variables: return an empty array so the return type stays consistent
return array();
}
$found = array();
$parts = explode('/%', trim($url, '/') );
foreach ( $parts as $part )
{
$tmp = explode('/', $part);
$found[] = ltrim( array_shift($tmp), '%');
}
return $found;
}
// Test
print_r( extract_url_vars('/%lang/module/controller/action/%var_1') );
// Result:
Array
(
[0] => lang
[1] => var_1
)
You can use:
$str = '/%lang/module/controller/action/%var_1';
if(preg_match('#/%(.*?)/[^%]*%(.*?)$#',$str,$matches)) {
echo "$matches[1] $matches[2]\n"; // prints lang var_1
}
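If the number of variables is not fixed at two, preg_match_all() can collect all of them. A minimal sketch, assuming a variable is any path segment that starts with % and runs up to the next slash; extractRouteVars() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the names of all %-prefixed path segments.
function extractRouteVars($route)
{
    // A segment starts at the beginning of the string or after a slash.
    preg_match_all('#(?:^|/)%([^/]+)#', $route, $matches);
    return $matches[1];
}

$vars = extractRouteVars('/%lang/module/controller/action/%var_1');
print_r($vars);
```

A route without any % segment simply yields an empty array.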
