How to parse a string pattern in PHP when the string is complex - php

I need your help parsing a string. I have a string with the structure below:
MALANG|TVhHMTAwMDBK MALANGBONG,GARUT|QkRPMjA3MTlK MALANGKE BARAT,MASAMBA|VVBHMjMzMDVK MALANGKE,MASAMBA|VVBHMjMzMDRK
I'm confused about how to parse this string so that I get output like this:
MALANG|TVhHMTAwMDBK
MALANGBONG,GARUT|QkRPMjA3MTlK
MALANGKE BARAT,MASAMBA|VVBHMjMzMDVK
MALANGKE,MASAMBA|VVBHMjMzMDRK
Each output line follows the pattern City_Name|RandomCode.
I have tried using explode on spaces, but the city name sometimes contains a space too. Which PHP function could I use to solve this problem?

Try this one out. It fits your example:
$str = 'MALANG|TVhHMTAwMDBK MALANGBONG,GARUT|QkRPMjA3MTlK MALANGKE BARAT,MASAMBA|VVBHMjMzMDVK MALANGKE,MASAMBA|VVBHMjMzMDRK';
$pattern = '/(?<=^| )[A-Z, ]+?\|[A-Za-z0-9]+(?= |$)/';
if (preg_match_all($pattern, $str, $matches)) {
$parts = $matches[0];
}
You may need to tweak some of the character classes if, say, your city names contain anything other than capital letters, spaces and commas.
Example here - http://codepad.viper-7.com/6ujl3p
Alternatively, if the RandomCode parts are guaranteed to all be 12 characters long, preg_split may be a better fit, e.g.
$pattern = '/(?<=\|[A-Za-z0-9]{12}) /';
$parts = preg_split($pattern, $str);
Demo here - http://codepad.viper-7.com/Wd4Wmc
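If the city names can contain characters outside the [A-Z, ] class, a looser variant may be easier to maintain. The following is only a sketch: it assumes (the question does not say so) that a city name never contains a pipe and that a code never contains whitespace.

```php
<?php
$str = 'MALANG|TVhHMTAwMDBK MALANGBONG,GARUT|QkRPMjA3MTlK MALANGKE BARAT,MASAMBA|VVBHMjMzMDVK MALANGKE,MASAMBA|VVBHMjMzMDRK';

// \s*    skip the separator in front of an entry (not captured)
// [^|]+  city name: everything up to the pipe (assumption: no pipe in names)
// \|\S+  the pipe plus the code (assumption: no whitespace in codes)
$parts = [];
if (preg_match_all('/\s*([^|]+\|\S+)/', $str, $matches)) {
    $parts = $matches[1];
}

print_r($parts);
```

Because the city name is defined as "anything but a pipe", spaces and commas inside names need no special handling.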

Suggestion for finding matches in a string with PHP using regex

I am trying to find this match in a string:
1. I need to take only the numbers after the character '#', as long as the match contains no spaces, for example:
String = 'This is a test #VVC345RR, text, and more text 12345';
I want to take only this from my string -> 345.
My example:
$s = '\"access_token=103782364732640461|2. myemail#domain1.com ZmElnDTiZlkgXbT8e3 #DD234 4Jrw__.3600.1281891600-10000186237005';
$matches = array();
$s = preg_match('/#([0-9]+)/', $s, $matches);
print_r($matches);
This only works when I have one # and numbers.
Thanks!
Maybe:
#\D*\K(\d+)
Does this accomplish what you want?
This will look for a #, then any non-digits, and then capture the digits. The \K discards everything matched before it, so only the digits are reported as the match.
https://regex101.com/r/gNTccx/1/
I'm unclear what you mean by "has not spaces"; there are no spaces in the example string.
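For reference, a minimal PHP sketch of that pattern applied to the example string from the question. The second, stricter pattern is an assumption about what "has not spaces" might mean (no whitespace between the # and the digits); it is not part of the original answer.

```php
<?php
$text = 'This is a test #VVC345RR, text, and more text 12345';

// Pattern from the answer: "#", any non-digits, then \K resets the match start
// so that only the digits end up in $m[0].
$found = preg_match('/#\D*\K\d+/', $text, $m) ? $m[0] : null;

// Hypothetical stricter variant: no whitespace allowed between "#" and the digits.
$strict = preg_match('/#[^\d\s]*\K\d+/', '# 12345', $n) ? $n[0] : null;

var_dump($found, $strict);
```

Here $found holds the digits that follow #VVC, while $strict stays null because a space separates the # from the number.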

Expected output is not displayed by PHP code

This is the code:
<?php
$pattern =' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
$text_split = str_split($text,1);
$data = '';
foreach($text_split as $value){
if (preg_match("/".$value."/", $pattern )){
$data = $data.$value;
}
if (!preg_match('/'.$value.'/', $pattern )){
break;
}
}
echo $data;
?>
Current output:
kdaiuyq7e611422^^$^vbnvcn^vznbsjhf
Expected output:
kdaiuyq7e611422
Please help me fix the error in my code. The pattern contains no ^ or $, yet preg_match reports a match, which seems wrong.
Your string $text contains ^, which as a regex matches the beginning of the string $pattern.
So preg_match('/^/', $pattern) reports a match, and the ^ is then appended to $data.
You should escape the ^ so that it is treated as a literal character rather than a special one, as in preg_match('/\^/', $pattern). preg_quote() will do that escaping for you.
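A minimal sketch of the original loop with preg_quote() applied, assuming the intent is to stop at the first character that is not in the allowed list. The variable names $allowed and $char are illustrative.

```php
<?php
$allowed = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';

$data = '';
foreach (str_split($text) as $char) {
    // preg_quote() escapes regex metacharacters such as ^ and $,
    // and the second argument also escapes the "/" delimiter.
    if (!preg_match('/' . preg_quote($char, '/') . '/', $allowed)) {
        break; // first disallowed character: stop collecting
    }
    $data .= $char;
}

echo $data;
```

With the escaping in place, ^ and $ are searched for literally, are not found in $allowed, and the loop stops there.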
There is no need to split your string up like that, the whole point of a regular expression is you can specify all the conditions within the expression. You can condense your entire code down to this:
$pattern = '/^[[:word:] ]+/';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';
preg_match($pattern, $text, $matches);
echo $matches[0];
Kris has accurately isolated that escaping in your method is the monkey wrench. This can be solved with preg_quote() or wrapping pattern characters in \Q ... \E (force characters to be interpreted literally).
Slapping that bandaid on your method (as you have done while answering your own question) doesn't help you to see what you should be doing.
I recommend that you do away with the character mask, the str_split(), and the looped calls of preg_match(). Your task can be accomplished far more briefly/efficiently/directly with a single preg_match() call. Here is the clean way that obeys your character mask fully:
Code: (Demo)
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
echo preg_match('/^[a-z\d ]+/i',$text,$out)?$out[0]:'No Match';
Output:
kdaiuyq7e611422
miknik's method was close to this, but it did not maintain 100% accuracy given your question requirements. I'll explain:
[:word:] is a POSIX Character Class (functioning like \w) that represents letters (uppercase and lowercase), numbers, and an underscore. Unfortunately for miknik, the underscore is not in your list of wanted characters, so this renders the pattern slightly inaccurate and may be untrustworthy for your project.
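The difference only shows up when the input contains an underscore. A small sketch with a made-up input string (not from the question) illustrates it:

```php
<?php
// Hypothetical input containing an underscore before the first unwanted character.
$text = 'abc_def^ghi';

// [:word:] also accepts "_", so the match runs up to the "^".
preg_match('/^[[:word:] ]+/', $text, $withWord);

// The explicit class stops at the "_", which is not in the allowed list.
preg_match('/^[a-z\d ]+/i', $text, $explicit);

var_dump($withWord[0], $explicit[0]);
```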

Search for matching words without false positives

I found this link and am working off of it, but I need to extend it a little further.
Check if string contains word in array
I am trying to create a script that checks a webpage for known bad words. I have one array with a list of bad words, and it compares it to the string from file_get_contents.
This works at a basic level, but returns false positives. For example, if I am loading a webpage with the word "title" it returns that it found the word "tit".
Is my best bet to strip all HTML and punctuation, then explode the text on spaces and put each individual word into an array? I am hoping there is a more efficient process than that.
Here is my code so far:
$url = 'http://somewebsite.com/';
$content = strip_tags(file_get_contents($url));
//list of bad words separated by commas
$badwords = 'tit,butt,etc'; //this will eventually come from a db
$badwordList = explode(',', $badwords);
foreach($badwordList as $bad) {
$place = strpos($content, $bad);
if (!empty($place)) {
$foundWords[] = $bad;
}
}
print_r($foundWords);
Thanks in advance!
You can just use a regex with preg_match_all():
$badwords = 'tit,butt,etc';
$regex = sprintf('/\b(%s)\b/', implode('|', explode(',', $badwords)));
if (preg_match_all($regex, $content, $matches)) {
print_r($matches[1]);
}
The second statement builds the regex that matches and captures the listed words in the page content. It splits the $badwords string on commas and joins the pieces with |. The resulting string is then used as the pattern, like so: /\b(tit|butt|etc)\b/. \b (a word boundary) ensures that only whole words are matched.
This pattern matches any of those words, and the words found in the webpage are stored in the array $matches[1].
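Since the word list will eventually come from a database, it may be worth escaping each word and matching case-insensitively. The sketch below is one way to do that; the sample $content string is made up for illustration, and the /i flag is an assumption about the desired behaviour.

```php
<?php
// Made-up page text standing in for the file_get_contents() result.
$content = 'The title of this page mentions a butt and a Tit.';
$badwords = 'tit,butt,etc';

// Escape every word so that characters like "." or "+" in the list
// are matched literally instead of being read as regex syntax.
$escaped = array_map(function ($word) {
    return preg_quote($word, '/');
}, explode(',', $badwords));

$regex = '/\b(' . implode('|', $escaped) . ')\b/i';

$foundWords = [];
if (preg_match_all($regex, $content, $matches)) {
    $foundWords = $matches[1];
}

print_r($foundWords);
```

"title" is not reported, because the word boundary after "tit" fails when the next character is a letter.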

Regexp in php: how do I filter dynamic strings like abc/123/...?

I am trying to filter out all characters before the first / sign. I have strings like
ABC/123/...
and I am trying to filter out ABC, 123 and ... into separate strings. I have almost succeeded in parsing the first letters before the / sign, except that the / sign is part of the match, which I don't want.
<?php
$string = "ABC/123/...";
$pattern = '/.*?\//';
preg_match($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
The letters before the first / can differ both in length and characters, so a string could also look like EEEE/1111/aaaa.
If you are trying to split the string using / as the delimiter, you can use explode.
$array = explode("/", $string);
And if you are looking only for the first element, you can use array_shift.
$parts = explode("/", $string); // array_shift() takes its argument by reference, so pass a variable
$first = array_shift($parts);
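For comparison, here is a short sketch of three ways to get at the pieces, using the example string from the question. The regex variant is an assumption about how the original pattern could be adjusted so that the / is excluded from the match.

```php
<?php
$string = 'ABC/123/...';

// All parts, split on "/".
$parts = explode('/', $string);

// Only the part before the first "/"; strstr() returns false if there is no "/".
$first = strstr($string, '/', true);

// Regex alternative: match up to, but not including, the first "/".
preg_match('/^[^\/]+/', $string, $m);

var_dump($parts, $first, $m[0]);
```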

Identifying a random repeating pattern in a structured text string

I have a string that has the following structure:
ABC_ABC_PQR_XYZ
Where PQR has the structure:
ABC+JKL
and
ABC itself is a string that can contain alphanumeric characters and a few other characters like "_", "-", "+", "." and follows no set structure:
e.g. qWe_rtY-asdf or pkl123
so, in effect, the string can look like this:
qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ
My goal is to find out what string constitutes ABC.
I was initially just using
$arrString = explode("_",$string);
to return $arrString[0] before I was made aware that ABC ($arrString[0]) itself can contain underscores, thus rendering it incorrect.
My next attempt was exploding it on "_" anyway and then comparing each of the exploded string parts with the first string part until I get a semblance of a pattern:
function getPatternABC($string)
{
$count = 0;
$pattern ="";
$arrString = explode("_", $string);
foreach($arrString as $expString)
{
if(strcmp($expString,$arrString[0])!==0 || $count==0)
{
$pattern = $pattern ."_". $arrString[$count];
$count++;
}
else break;
}
return substr($pattern,1);
}
This works great - but I wanted to know if there was a more elegant way of doing this using regular expressions?
Here is the regex solution:
'^([a-zA-Z0-9_+-]+)_\1_\1\+'
What this does is match (starting from the beginning of the string) the longest possible sequence consisting of the characters inside the square brackets (edit that per your spec). The sequence must appear exactly twice, each time followed by an underscore, and then must appear once more followed by a plus sign (this is actually the first half of PQR with the delimiter before JKL). The rest of the input is ignored.
You will find ABC captured as capture group 1.
So:
$input = 'qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ';
$result = preg_match('/^([a-zA-Z0-9_+-]+)_\1_\1\+/', $input, $matches);
if ($result) {
echo $matches[1];
}
See it in action.
Sure, just make a regular expression that matches your pattern. In this case, something like this:
preg_match('/^([a-zA-Z0-9_+.-]+)_\1_\1\+JKL_XYZ$/', $string, $match);
Your ABC is in $match[1].
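As a drop-in replacement for getPatternABC(), the backreference approach could be wrapped like this. It is a sketch only: the function name findAbc is hypothetical, and the pattern checks just the ABC_ABC_ABC+ prefix rather than the literal JKL_XYZ tail.

```php
<?php
// Returns the repeated ABC part, or null when the input does not
// start with ABC_ABC_ABC+ (hypothetical helper, not from the question).
function findAbc(string $input): ?string
{
    // Group 1 is ABC; \1 requires the exact same text to repeat.
    if (preg_match('/^([a-zA-Z0-9_+.-]+)_\1_\1\+/', $input, $m)) {
        return $m[1];
    }
    return null;
}

$first = findAbc('qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ');
$second = findAbc('pkl123_pkl123_pkl123+JKL_XYZ');
$none = findAbc('no match here');

var_dump($first, $second, $none);
```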
If the presence of underscores in these strings has a low frequency, it may be worth checking to see if a simple explode() will do it before bothering with regex.
<?php
$str = 'ABC_ABC_PQR_XYZ';
if(substr_count($str, '_') == 3)
$abc = explode('_', $str)[0]; // reset() needs a variable, so index the result directly
else
$abc = regexy_function($str);
?>
