How to combine Regex with removing space and # character - php

I have this array, and I need to remove the whitespace and the # (hashtag) character from each element:
array (size=7)
0 => string 'darwin' (length=6)
1 => string ' #nature' (length=8)
2 => string ' explore' (length=8)
3 => string ' galapagos' (length=10)
4 => string 'karma' (length=5)
foreach ($feedSinglePosts["hashtags_list"] as $key=>&$item) {
$item = preg_replace('/(\s|^)/', '', $item);
$item = preg_replace('/\#+/', '', $item);
}
The Regex above works well but I want to make it one line if possible.
When I do: /(\s|^)\#+/ it outputs this:
array (size=7)
0 => string 'darwin' (length=6)
1 => string 'nature' (length=6)
2 => string ' explore' (length=8)
3 => string ' galapagos' (length=10)
4 => string 'karma' (length=5)
How can I write a one-line regex that removes both the whitespace and the hashtag?

It appears that the characters will be at the beginning or end of each string. If so, there is no need for loops or regex:
array_walk($feedSinglePosts["hashtags_list"], function(&$v) { $v = trim($v, "\n\r #"); });
If you need to remove them anywhere:
$feedSinglePosts["hashtags_list"] = str_replace(["\n","\r"," ","#"], "", $feedSinglePosts["hashtags_list"]);
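To see how the two approaches differ, here is a small sketch. The sample array, including the made-up ' gala pagos#' entry, is an assumption for illustration only: trim() strips the listed characters from both ends, while str_replace() removes them wherever they occur.

```php
<?php
// Hypothetical sample data; the last entry has a space inside and a trailing #.
$tags = ['darwin', ' #nature', ' explore', ' gala pagos#'];

// trim() only strips the listed characters from the start and end of each string.
$trimmed = array_map(function ($v) {
    return trim($v, "\n\r #");
}, $tags);

// str_replace() accepts an array as subject and removes the characters everywhere.
$stripped = str_replace(["\n", "\r", " ", "#"], "", $tags);

print_r($trimmed);  // darwin, nature, explore, gala pagos
print_r($stripped); // darwin, nature, explore, galapagos
```

Which one is right depends on whether a space or # inside a tag is data you want to keep.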

A non-regex way, using array_walk() and trim():
<?php
$array = ['darwin' ,' #nature',' explore', ' galapagos','karma'];
function remove_hash_space(&$value,$key){
$value = trim($value,'# ');
}
array_walk($array, 'remove_hash_space');
print_r($array);
?>
DEMO: https://3v4l.org/nOadH
Or as a single line with array_map():
$array = array_map(function($e){return trim($e,'# ');},$array);
DEMO: https://3v4l.org/OaS1F

You may use
$arr = ['darwin',' #nature',' explore',' galapagos','karma'];
print_r( preg_replace('~^[\s#]+~', '', $arr) );
// => Array ( [0] => darwin [1] => nature [2] => explore [3] => galapagos [4] => karma )
The ^[\s#]+ pattern matches 1 or more occurrences (+) of whitespace or # characters ([\s#]) at the start of the string (^).
If your strings may contain some weird Unicode whitespace, consider adding the u modifier: ~^[\s#]+~u.
If you only need to handle horizontal whitespace, replace \s with \h.
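As a rough comparison of the variants mentioned here, the sketch below runs three patterns over the same array. The sample strings (the "\n #galapagos" and 'kar ma' entries) and the unanchored pattern are my own additions for illustration, not part of the question.

```php
<?php
// Hypothetical sample data: one entry starts with a newline, one has an inner space.
$tags = ['darwin', ' #nature', ' explore', "\n #galapagos", 'kar ma'];

// Anchored: strips whitespace and # only at the start of each string.
$leading = preg_replace('~^[\s#]+~', '', $tags);

// Unanchored: strips whitespace and # anywhere in each string.
$anywhere = preg_replace('~[\s#]+~', '', $tags);

// \h matches horizontal whitespace only, so a leading newline blocks the match.
$horizontal = preg_replace('~^[\h#]+~', '', $tags);

print_r($leading);    // 'kar ma' keeps its inner space
print_r($anywhere);   // 'kar ma' becomes 'karma'
print_r($horizontal); // "\n #galapagos" is left unchanged
```

Dropping the ^ anchor gives the one-line regex the question asks for when the characters can appear in the middle of a string as well.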

Related

PHP Regex Match getting unexpected output

I'm trying to create a simple PHP script that retrieves info from a string and puts it into an array. I've looked around on some sites for multi-capture regexes for one pattern, but I can't seem to get the output I'm looking for.
Currently this is my script.
$input = "username: jack number: 20";
//$input = file_get_contents("test.txt");
preg_match_all("/username: ([^\s]+)|number: ([^\s]+)/", $input, $data);
var_dump($data);
Which produces this output:
0 =>
array (size=2)
0 => string 'username: jack' (length=14)
1 => string 'number: 20' (length=10)
1 =>
array (size=2)
0 => string 'jack' (length=4)
1 => string '' (length=0)
2 =>
array (size=2)
0 => string '' (length=0)
1 => string '20' (length=2)
I'm looking to get the data into the form of:
0 =>
array (size=x)
0 => string 'jack'
1 =>
array (size=x)
0 => string '20'
Or two different arrays where the keys correspond to the same user/number combo
You can use match-reset \K:
preg_match_all('/\b(?:username|number):\h*\K\S+/', $input, $data);
print_r($data[0]);
Array
(
[0] => jack
[1] => 20
)
Regex breakdown:
\b => a word boundary
(?:username|number) => matches username or number; (?:..) is a non-capturing group
:\h* => matches a colon followed by optional horizontal whitespace
\K => match reset; everything matched so far is dropped from the reported match
\S+ => matches 1 or more non-whitespace characters
Or else you can use a capturing group to get your matched data like this:
preg_match_all('/\b(?:username|number):\h*(\S+)/', $input, $data);
print_r($data[1]);
Array
(
[0] => jack
[1] => 20
)
(?<=username:|number:)\s*(\S+)
You can use a lookbehind here. See the demo:
https://regex101.com/r/mG8kZ9/10
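If you want two arrays whose keys line up per user/number pair, named groups are one option. This is only a sketch: it assumes every username is directly followed by its number, and the second user ("jill", 31) is invented to show more than one pair.

```php
<?php
// Hypothetical input with two username/number pairs.
$input = "username: jack number: 20 username: jill number: 31";

// One match per pair; the named groups collect the two values separately.
preg_match_all(
    '/\busername:\h*(?<username>\S+)\s+number:\h*(?<number>\S+)/',
    $input,
    $m
);

$usernames = $m['username']; // index i here ...
$numbers   = $m['number'];   // ... belongs to index i here

print_r($usernames);
print_r($numbers);
```

With the default PREG_PATTERN_ORDER flag, $m['username'][0] and $m['number'][0] come from the same match, so the keys correspond.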

The capturing group in the regular expression isn't output

<?php
$string = 'This is my regular expression';
$array = array();
preg_match('/^.*((my)? regular (expression)?)$/i', $string, $array);
var_dump($array);
?>
After execution of this script I have:
array (size=4)
0 => string 'This is my regular expression' (length=29)
1 => string ' regular expression' (length=19)
2 => string '' (length=0)
3 => string 'expression' (length=10)
Why doesn't it output the capturing group (my)?
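That is because the greedy quantifier .* before the group first consumes the whole string and then backtracks only as far as it has to. The first position where the rest of the pattern can match is just after "my", so the optional (my)? matches nothing. Use the non-greedy quantifier .*? instead: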
<?php
$string = 'This is my regular expression';
$array = array();
preg_match('/^.*?((my)? regular (expression)?)$/i', $string, $array);
var_dump($array);
?>
Output:
array (size=4)
0 => string 'This is my regular expression' (length=29)
1 => string 'my regular expression' (length=21)
2 => string 'my' (length=2)
3 => string 'expression' (length=10)
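A minimal side-by-side sketch of the two patterns from this question, printing only what capture group 2 holds in each case (the $patterns and $group2 names are just for this illustration):

```php
<?php
$string = 'This is my regular expression';

$patterns = [
    'greedy' => '/^.*((my)? regular (expression)?)$/i',
    'lazy'   => '/^.*?((my)? regular (expression)?)$/i',
];

$group2 = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $string, $m);
    // Group 2 is the optional (my)? group.
    $group2[$label] = $m[2];
    printf("%-6s => group 2 is '%s'\n", $label, $m[2]);
}
// greedy => group 2 is ''
// lazy   => group 2 is 'my'
```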

Get content from html file

I have a list of HTML files. Each file contains the string onClick="rpd(SOME_NUMBER)" many times. I know how to get the content of the HTML files; what I want is a list of the SOME_NUMBER values. I saw that I might need preg_match, but I'm horrible at regular expressions. I tried:
$file_content = file_get_contents($url);
$pattern= 'onClick="rpd(#);"';
preg_match($pattern, $file_content);
As you could imagine... it didn't work. What would be the best way to get this done? Thanks!
This should get it done:
$file_content ='234=fdf donClick="rpd(5);"as23 f2 onClick="rpd(7);" dff fonClick="rpd(8);"';
$pattern= '/onClick="rpd\((\d+)\);"/';
preg_match_all($pattern, $file_content,$matches);
var_dump( $matches);
The output is like this:
array (size=2)
0 =>
array (size=3)
0 => string 'onClick="rpd(5);"' (length=17)
1 => string 'onClick="rpd(7);"' (length=17)
2 => string 'onClick="rpd(8);"' (length=17)
1 =>
array (size=3)
0 => string '5' (length=1)
1 => string '7' (length=1)
2 => string '8' (length=1)
Maybe something like this?
preg_match('/onClick="rpd\((\d+)\);"/', $file_content,$matches);
print $matches[1];
I don't know PHP, but the regular expression to match that would be:
'onClick="rpd\(([0-9]+)\)"'
Note that we need to escape those parentheses with \ because of their special meaning. We also wrapped the digits in a regular pair of parentheses to capture them separately.
If preg_match also supports lookahead/lookbehind expressions:
'(?<=onClick="rpd\()[0-9]+(?=\)")'
will also work.
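PHP's preg_* functions use PCRE, which does support lookarounds, so the lookaround version can be used as is once delimiters are added. A small sketch with a made-up input string:

```php
<?php
// Hypothetical HTML fragment for illustration.
$html = 'x onClick="rpd(56)" y onClick="rpd(43)" z';

// The lookbehind and lookahead assert the surrounding text without
// including it in the match, so $m[0] holds only the digits.
preg_match_all('/(?<=onClick="rpd\()[0-9]+(?=\)")/', $html, $m);

$numbers = $m[0];
print_r($numbers); // 56 and 43
```

The lookbehind works here because the text inside it has a fixed length, which PCRE requires.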
$file_content='blah blah onClick="rpd(56)"; blah blah\nblah blah onClick="rpd(43)"; blah blah\nblah blah onClick="rpd(11)"; blah blah\n';
$pattern= '/onClick="rpd\((\d+)\)";/';
preg_match_all($pattern, $file_content, $matches);
print_r($matches);
That outputs:
Array
(
[0] => Array
(
[0] => onClick="rpd(56)";
[1] => onClick="rpd(43)";
[2] => onClick="rpd(11)";
)
[1] => Array
(
[0] => 56
[1] => 43
[2] => 11
)
)
You can play around with my example here: http://ideone.com/TzShPG
A clean way to do this is to use DOMDocument and XPath:
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$ress= $xpath->query("//*[contains(@onclick,'rpd(')]/attribute::onclick");
foreach ($ress as $res) {
echo substr($res->value,4,-1) . "\n";
}
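Here is a self-contained sketch of the DOM approach that can be run without a URL. The inline HTML is invented for the example, and instead of substr() it pulls the number out of the attribute value with a small regex, which is my own variation rather than part of the answer above.

```php
<?php
// Hypothetical markup; note the mixed-case onClick, which the HTML parser lowercases.
$html = '<p><a href="#" onClick="rpd(5)">a</a> '
      . '<span onclick="rpd(42)">b</span> '
      . '<a onclick="other(1)">c</a></p>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$numbers = [];
foreach ($xpath->query("//*[contains(@onclick, 'rpd(')]/@onclick") as $attr) {
    // $attr is a DOMAttr; its value looks like "rpd(5)".
    if (preg_match('/rpd\((\d+)\)/', $attr->value, $m)) {
        $numbers[] = (int) $m[1];
    }
}

print_r($numbers); // 5 and 42
```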

Split by whitespace only if not surrounded by [,<,{ or ],>,}

I have a string like this one:
traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] [no-dns] [source <ip-address>] [tos <type-of-service>] {router <router-instance>] | all}
I'd like to create an array like this:
$params = array(
<ip-address|dns-name>
[ttl <ttl>]
[wait <milli-seconds>]
[no-dns]
[source <ip-address>]
[tos <tos>]
{router <router-instance>] | all}
);
Should I use preg_split('/someregex/', $mystring) ?
Or is there any better solution?
Use negative lookarounds. This one uses a negative lookahead for a <. This means it will not split if it finds a < ahead of the whitespace.
$regex='/\s(?!<)/';
$mystring='traceroute <192.168.1.1> [ttl <120>] [wait <1500>] [no-dns] [source <192.168.1.11>] [tos <service>] {router <instance>] | all}';
$array=array();
$array = preg_split($regex, $mystring);
var_dump($array);
And my output is
array
0 => string 'traceroute <192.168.1.1>' (length=24)
1 => string '[ttl <120>]' (length=11)
2 => string '[wait <1500>]' (length=13)
3 => string '[no-dns]' (length=8)
4 => string '[source <192.168.1.11>]' (length=23)
5 => string '[tos <service>]' (length=15)
6 => string '{router <instance>]' (length=19)
7 => string '|' (length=1)
8 => string 'all}' (length=4)
You could use preg_match_all such as:
preg_match_all("/\\[[^]]*]|<[^>]*>|{[^}]*}/", $str, $matches);
And get your result from the $matches array.
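As a quick check, here is that idea run against the string from the question. It is a sketch only: the pattern is the same alternation written in a single-quoted string with the brackets and braces escaped for readability, and $tokens is just a name chosen for this example.

```php
<?php
$str = 'traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] '
     . '[no-dns] [source <ip-address>] [tos <type-of-service>] '
     . '{router <router-instance>] | all}';

// Each alternative matches one kind of top-level token:
// [ ... ] up to the first ], < ... > up to the first >, { ... } up to the first }.
preg_match_all('/\[[^\]]*\]|<[^>]*>|\{[^}]*\}/', $str, $matches);

$tokens = $matches[0];
print_r($tokens); // 7 tokens, from <ip-address|dns-name> to {router <router-instance>] | all}
```

Because the [ ... ] alternative is tried first at each position, the <ttl> inside [ttl <ttl>] stays part of the bracketed token instead of becoming a token of its own.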
Yes, preg_split makes sense and is probably the most efficient way to do this.
Try:
preg_split('/[\{\[<](.*?)[>\]\}]/', $mystring);
Or if you want to match rather than split, you may want to try:
$matches=array();
preg_match('/[\{\[<](.*?)[>\]\}]/',$mystring,$matches);
print_r($matches);
Updated
I missed that you're trying to get the tokens, not the content of the tokens. I think you are going to need to use preg_match. Try something like this one for a good start:
$matches = array();
preg_match_all('/(\{.*?[\}])|(\[.*?\])|(<.*?>)/', $mystring,$matches);
var_dump($matches);
I get:
Array
(
[0] => Array
(
[0] => <ip-address|dns-name>
[1] => [ttl <ttl>]
[2] => [wait <milli-seconds>]
[3] => [no-dns]
[4] => [source <ip-address>]
[5] => [tos <type-of-service>]
[6] => {router <router-instance>] | all}
)

php - How do I convert a string to an associative array of its keywords

Take this string as an example: "will see you in London tomorrow and Kent the day after tomorrow".
How would I convert this to an associative array that contains the keywords as keys, whilst preferably missing out the common words, like this:
Array ( [tomorrow] => 2 [London] => 1 [Kent] => 1)
Any help greatly appreciated.
I would say you could :
split the string into an array of words, with explode or preg_split, depending on how complex you allow your word separators to be
use array_filter to keep only the items (i.e. words) you want; the callback function has to return false for every word that is not valid
and, then, use array_count_values on the resulting list of words, which counts how many times each word is present in the array
EDIT : and, just for fun, here's a quick example :
First of all, the string, that gets exploded into words :
$str = "will see you in London tomorrow and Kent the day after tomorrow";
$words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
var_dump($words);
Which gets you :
array
0 => string 'will' (length=4)
1 => string 'see' (length=3)
2 => string 'you' (length=3)
3 => string 'in' (length=2)
4 => string 'London' (length=6)
5 => string 'tomorrow' (length=8)
6 => string 'and' (length=3)
7 => string 'Kent' (length=4)
8 => string 'the' (length=3)
9 => string 'day' (length=3)
10 => string 'after' (length=5)
11 => string 'tomorrow' (length=8)
Then, the filtering :
function filter_words($word) {
    // a pretty simple filter: keep only words of at least 5 characters
    return strlen($word) >= 5;
}
$words_filtered = array_filter($words, 'filter_words');
var_dump($words_filtered);
Which outputs :
array
4 => string 'London' (length=6)
5 => string 'tomorrow' (length=8)
10 => string 'after' (length=5)
11 => string 'tomorrow' (length=8)
And, finally, the counting :
$counts = array_count_values($words_filtered);
var_dump($counts);
And the final result :
array
'London' => int 1
'tomorrow' => int 2
'after' => int 1
Now, up to you to build up from here ;-)
Mainly, you'll have to work on :
A better exploding function, that deals with punctuation (or deal with that during filtering)
An "intelligent" filtering function, that suits your needs better than mine
Have fun !
You could keep a table of common words and go through your string one word at a time. If a word is in the table, skip it; otherwise add it to your associative array, or add 1 to its count if it is already there.
Using a blacklist of words that should not be included:
$str = 'will see you in London tomorrow and Kent the day after tomorrow';
$skip_words = array( 'in', 'the', 'will', 'see', 'and', 'day', 'you', 'after' );
// get words in sentence that aren't to be skipped and count their values
$words = array_count_values( array_diff( explode( ' ', $str ), $skip_words ) );
print_r( $words );
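Putting the pieces from these answers together, a sketch could look like the following. The keyword_counts() function name, the punctuation in the sample sentence, and the case-insensitive stop-word check are my own assumptions; the stop-word list is the one from the answer above.

```php
<?php
// Hypothetical helper: counts words in $text that are not in $stopWords.
function keyword_counts(string $text, array $stopWords): array
{
    // Split on runs of anything that is not a letter or digit, so punctuation is dropped.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Flip the stop list so that lookups are by key; compare in lowercase.
    $stop = array_flip(array_map('strtolower', $stopWords));

    $kept = array_filter($words, function ($word) use ($stop) {
        return !isset($stop[strtolower($word)]);
    });

    // Counts are case-sensitive: 'Kent' and 'kent' would be separate keys.
    return array_count_values($kept);
}

$skip_words = ['in', 'the', 'will', 'see', 'and', 'day', 'you', 'after'];
$counts = keyword_counts(
    'Will see you in London tomorrow, and Kent the day after tomorrow.',
    $skip_words
);
print_r($counts); // London => 1, tomorrow => 2, Kent => 1
```

Compared with explode(' ', ...), splitting on non-letter characters keeps "tomorrow," and "tomorrow." from being counted as different words.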
