Php - Group by similar words

I was wondering how we could group or separate similar words in PHP or MySQL. For instance, I have a Samsung Galaxy Ace. Is it possible to recognize S120, S-120, s120, S-120 as the same word?
Is this even possible?
Thanks

What you could do is strip all non-alphanumeric characters (spaces included) and strtoupper() the string.
$new_string = preg_replace("/[^a-zA-Z0-9]/", "", $string);
$new_string = strtoupper($new_string);

Only those? Easily.
/S-?120/i
But if you want to extend this to other names, you'll probably need to move from regex to something a little more sophisticated.

The best thing to do here is to pick a format and standardise on it. So for your example, you would just store S120, and when you get a value from a user, strip all non-alphanumeric characters from it and convert it to upper case.
You can do this in PHP with this code:
$result = strtoupper(preg_replace('/(\W|_)+/', '', $userInput));
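As a rough sketch of the "pick a format and standardise on it" idea, the snippet below reduces each spelling to a canonical key and groups the raw inputs under it. The helper name normalizeModel() and the sample inputs are assumptions made for illustration, not part of the original answers.

```php
<?php
// Hypothetical helper: reduce a model name to a canonical key
// by dropping everything that is not a letter or digit, then upper-casing.
function normalizeModel(string $input): string
{
    return strtoupper(preg_replace('/[^a-zA-Z0-9]/', '', $input));
}

// Sample inputs (assumed): several spellings of the same model plus one other.
$inputs = ['S120', 'S-120', 's120', 's 120', 'S-130'];

// Group the raw spellings under their canonical key.
$groups = [];
foreach ($inputs as $raw) {
    $groups[normalizeModel($raw)][] = $raw;
}

print_r($groups);
```

If the canonical key is also stored in the database, grouping in MySQL becomes an ordinary GROUP BY on that column.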

Related

Php Replace Any Number Of A Certain Character

I have urls with strings in them that look like this:
search?q=FAIRMONT+FREE+STANDING+SPACE+SAVER+CABINET+IN+ESPRESSO++++++++++++++++++++++++++++++++++++++++++++++&
I've been trying to replace the extra plus symbols, but it's always a different number of them.
$ss = str_replace('+++++++', '+', $row[0]);
I assume I need to do a regex to match "any number of plus signs" here on the first parameter, but I'm not sure how to do that syntactically, or if that can still be done in str_replace?
Any help appreciated?
Edit -- this is not an "exact duplicate" as it was marked -- it asks a specific question about how to replace multiple characters in PHP. While one way to do this is regex using preg_replace, I assume there may be other methods as well. The question is not just about regex, considering there could be other solutions.

This seems to work. You need to use preg_replace(), because str_replace() can only match fixed strings:
$ss = preg_replace('~\+{2,}~', '+', $ss);
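Here is a small sketch of that replacement on a shortened version of the URL from the question, followed by one possible non-regex alternative, since the question asks whether other methods exist. The variable names and the shortened sample string are assumptions.

```php
<?php
// Shortened sample of the query string from the question (assumed).
$ss = 'CABINET+IN+ESPRESSO++++++&';

// Regex approach: collapse every run of two or more plus signs into one.
$collapsed = preg_replace('~\+{2,}~', '+', $ss);
echo $collapsed, "\n";

// Non-regex alternative: split on '+', drop the empty pieces, rejoin.
// Unlike the regex above, this also drops leading and trailing plus signs.
$words  = array_filter(explode('+', 'CABINET+IN+ESPRESSO++++++'), 'strlen');
$joined = implode('+', $words);
echo $joined, "\n";
```

The two approaches differ at the edges: the regex keeps a single trailing plus, while the explode/implode version removes it.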

Split string on non-alphanumerics in PHP? Is it possible with php's native function?

I was trying to split a string on non-alphanumeric characters or, simply put, I want to split it into words. The approach that immediately came to my mind is to use regular expressions.
Example:
$string = 'php_php-php php';
$splitArr = preg_split('/[^a-z0-9]/i', $string);
But there are two problems that I see with this approach.
It is not a native PHP function, and is totally dependent on the PCRE library running on the server.
An equally important problem: what if I have punctuation inside a word?
Example:
$string = 'U.S.A-men\'s-vote';
$splitArr = preg_split('/[^a-z0-9]/i', $string);
Now this will split the string as [{U}{S}{A}{men}{s}{vote}]
But I want it as [{U.S.A}{men's}{vote}]
So my questions are:
How can we split them according to words?
Is it possible to do this with a native PHP function, or in some other way that does not depend on PCRE?
Regards

Sounds like a case for str_word_count(), using the oft-forgotten value of 1 or 2 for the second argument so that it returns the words instead of a count, and a third argument listing the extra characters you wish to treat as word parts (hyphens, full stops, apostrophes or whatever else). Follow it with an array_walk() that trims those characters from the beginning and end of the resulting array values, so they are only kept when they are actually embedded in the "word".
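A minimal sketch of that approach follows. The sample sentence is an assumption, and array_map() is used in place of array_walk() purely for brevity. Note that str_word_count() already accepts apostrophes and hyphens inside words by default, so the hyphenated string from the question would come back as a single word; only the full stop has to be added through the third argument here.

```php
<?php
// Assumed sample sentence with an abbreviation and a contraction.
$string = "The U.S.A. isn't far, really.";

// Format 1 returns the words as an array; '.' is added as an extra word character.
$words = str_word_count($string, 1, '.');

// Trim the punctuation from both ends so it survives only when embedded.
$words = array_map(function ($word) {
    return trim($word, ".'-");
}, $words);

// Drop anything that was pure punctuation and reindex the array.
$words = array_values(array_filter($words, 'strlen'));

print_r($words);
```

Digits are not word characters for str_word_count() unless they are listed in the third argument as well.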

Either you have PHP installed (then you also have PCRE), or you don't. So your first point is a non-issue.
Then, if you want to exclude punctuation from your splitting delimiters, you need to add them to your character class:
preg_split('/[^a-z0-9.\']+/i', $string);
If you want to treat punctuation characters differently depending on context (say, make a dot only be a delimiter if followed by whitespace), you can do that, too:
preg_split('/\.\s+|[^a-z0-9.\']+/i', $string);
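To see how the two patterns differ, here is a short sketch that runs both over one string. The sample string extends the question's example with a sentence boundary, and the PREG_SPLIT_NO_EMPTY flag is an addition to avoid empty pieces.

```php
<?php
// Assumed sample: a sentence boundary followed by the string from the question.
$string = "Hello world. U.S.A-men's-vote";

// Dots and apostrophes are never delimiters, so "world." keeps its dot.
$simple = preg_split('/[^a-z0-9.\']+/i', $string, -1, PREG_SPLIT_NO_EMPTY);

// A dot followed by whitespace is also a delimiter, so "world" loses its dot
// while the dots inside "U.S.A" are kept.
$contextual = preg_split('/\.\s+|[^a-z0-9.\']+/i', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($simple);
print_r($contextual);
```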

As per my comment, you might want to try (add as many separators as needed)
$splitArr = preg_split('/[\s,!\?;:-]+|[\.]\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
You'd then have to handle the case of a "quoted" word. That is not so easy to do in a regular expression: in 'is" "this', what is quoted, and how?
So I think it's best to keep ' and " within words (so that "it's" is a single word, and "they 'll" is two words) and then deal with those cases separately. For example, a regexp would have some trouble correctly handling
they 're 'just friends'. Or that's what they say.
whereas at application level you can handle both "'re" and a run of words in which the first is left-quoted and the last is right-quoted, provided the first is not a known contraction suffix ('s, 're, 'll, 'd ...).

This is not a PHP problem, but a logical one.
Words can be joined by a hyphen. Abbreviations can look like short sentences.
You can match your example directly by writing a solution that fits only this particular phrase. But you can't get a solution for all possible phrases. That would require content recognition based on neural computing.

filter non-alphanumeric "repeating" characters

What's the best way to filter non-alphanumeric "repeating" characters?
I would rather not build a list of characters to check for. Is there a good regex for this that I can use in PHP?
Examples:
...........
*****************
!!!!!!!!
###########
------------------
~~~~~~~~~~~~~
Special case patterns:
=*=*=*=*=*=
->->->->

Based on @sln's answer:
$str = preg_replace('~([^0-9a-zA-Z])\1+|(?:=[*])+|(?:->)+~', '', $str);

The pattern could be something like this (Perl syntax): s/([\W_]|=\*|->)\1+//g
or, if you want to replace with just a single instance: s/([\W_]|=\*|->)\1+/$1/g
Edit: any special sequence should probably come first in the alternation. If you need to make something like == special, it then won't be grabbed by [\W_] first.
So something like s/(==>|=\*|->|[\W_])\1+/$1/g where special cases are first.
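Translated to PHP, the last Perl substitution might look like the sketch below. preg_replace() already replaces every match, so there is no equivalent of the g flag. The sample string is an assumption.

```php
<?php
// Assumed sample containing repeated symbols and two of the special sequences.
$str = 'Wow!!!!!! see ->->->-> here =*=*=* done';

// Special sequences come first in the alternation, then any single
// non-alphanumeric character; \1+ requires at least one repeat of the capture.
$clean = preg_replace('~(==>|=\*|->|[\W_])\1+~', '$1', $str);

echo $clean, "\n";
```

Because [\W_] also matches whitespace, a run of several spaces would be collapsed to one space as well.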

preg_replace('~\W+~', '', $str);

sln's solution is pretty good, but the \W "non-word" class includes whitespace. I don't think you want to be removing sequences of tabs or spaces! Using a negated class (something like '[^A-Za-z0-9\s]') would work better.

This will filter out all symbols (using preg_replace(), since ereg_replace() was removed in PHP 7.0):
[code]
$q = preg_replace('/[^A-Za-z0-9 ]/', '', $q);
[/code]

replace(/([^A-Za-z0-9\s]+)\1+/, "")
will remove repeated patterns of non-alphanumeric non-whitespace strings.
However, this is a bad practice because you'll also be removing all non-ASCII European and other international language characters in the Unicode base.
The only place where you really won't ever care about internationalization is in processing source code, but then you are not handling text quoted in strings and you may also accidentally de-comment a block.
You may want to be more restrictive in what you try to remove by giving a list of characters to replace instead of the catch-all.
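To illustrate the internationalization concern, the sketch below contrasts an ASCII-only catch-all with a Unicode-aware pattern built on the \p{L} (letter) and \p{N} (number) properties and the u modifier. This is one possible way to spare non-ASCII letters, not something the answer itself proposes; the sample strings are assumptions, and the source file is assumed to be saved as UTF-8.

```php
<?php
// ASCII-only catch-all: the two bytes of "é" are stripped along with the symbols.
$asciiOnly = preg_replace('/[^A-Za-z0-9 ]/', '', 'Café');

// Unicode-aware: only a repeated character that is neither a letter, a number
// nor whitespace is removed, so accented letters survive.
$text      = 'Café!!! über ### straße';
$unicodeOk = preg_replace('/([^\p{L}\p{N}\s])\1+/u', '', $text);

echo $asciiOnly, "\n";
echo $unicodeOk, "\n";
```

The second result keeps the two spaces that surrounded the removed "###" run.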
Edit: I have done similar things before when trying to process early-version ShoutCAST radio names. At that time, stations tried to call attention to themselves by having obnoxious names like: <<!!!!--- GREAT MUSIC STATION ---!!!!>>. I used similar code to get rid of repeated symbols, but then learnt (the hard way) to be careful about what I eventually remove.

This works for me:
preg_replace('/(.)\1{3,}/i', '', $sourceStr);
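It removes any character (letters and digits included) that repeats four or more times in a row: one captured character followed by at least three more copies. Use \1{2,} to catch runs of three or more.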

regular expr question

I've got a string like <>1 <>2 <>3
I want to remove every '<>' and turn the symbol after each '<>' into an expression like www.test.com/1.jpg, www.test.com/2.jpg, www.test.com/3.jpg
Is it possible to do this with regex? I only know how to find '/<>.?/'

preg_replace('/<>(\d+)/', 'www.test.com/bla/$1.jpg', $input);
(assuming your replaced elements are just numbers. If they are more general, you'll need to replace '\d+' with something else. Note that preg_replace() replaces every match by default, so no global flag is needed.)

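// Either a plain string replacement, which leaves off the ".jpg" suffix:
str_replace('<>', 'www.test.com/', $input);
// or a regex replacement that captures the number:
preg_replace('~<>([0-9]+)~', 'www.test.com/$1.jpg', $input);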

$string = '<>1 <>2 <>3';
$temp = explode(' ',preg_replace('/<>(\d)/','www.test.com/\1.jpg',$string));
$newString = implode(', ',$temp);
echo $newString;

Based on your example, I don’t think you need regex at all.
$str = '<>1 <>2 <>3';
print_r(str_replace('<>', 'www.test.com/', $str));

Regexes let you manipulate a string in almost any fashion you like. To modify the string the way you describe, you would use the following regex:
<>(\d)
and then use a backreference to keep the value captured by the group in parentheses, in this case a single digit. The backreference is typically written as the $ symbol followed by the number of the group you are referencing, as follows:
www.test.com/$1
This is used in a regex replace call, which is implemented in different ways depending on the language you are working in.
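In PHP, that replace scenario could be sketched as follows. The pattern uses \d+ rather than \d so that multi-digit numbers are also captured; the sample input and variable names are assumptions.

```php
<?php
// Assumed sample input, including a two-digit number.
$input = '<>1 <>2 <>10';

// $1 in the replacement refers to the digits captured by the first group.
$replaced = preg_replace('/<>(\d+)/', 'www.test.com/$1.jpg', $input);

// Optional: turn the space-separated result into a comma-separated list.
$list = implode(', ', preg_split('/\s+/', $replaced));

echo $list, "\n";
```

If the backreference had to be followed directly by a digit, the ${1} form would be needed to keep the group number unambiguous.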

Regex to match all characters except letters and numbers

I want to clean the filenames of all uploaded files. I want to remove all characters except periods, letters and numbers. I'm not good with regex so I thought I would ask here.
Can someone show me how to put this together? I'm using PHP.

$newfilename=preg_replace('/[^a-zA-Z0-9.]/','',$filename);

s/[^.a-zA-Z\d]//g
This is the Perl way of writing the regex. In PHP you do:
$output = preg_replace('/[^.a-zA-Z\d]/', '', $input);
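Wrapped in a function, the same pattern can be tried on a couple of filenames. The function name cleanFilename() and the sample filenames are assumptions for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: keep only ASCII letters, digits and periods.
function cleanFilename(string $filename): string
{
    return preg_replace('/[^a-zA-Z0-9.]/', '', $filename);
}

// Spaces and parentheses are removed, the extension dot is kept.
echo cleanFilename('my photo (1).JPG'), "\n";

// Accented letters and underscores are removed too, which may not be desired.
echo cleanFilename('résumé_v2.pdf'), "\n";
```

Since periods are kept, a name made only of dots would pass through unchanged, so an upload handler would likely need further checks beyond this pattern.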

Try to use this:
$cleanString = preg_replace('#\W#', '', $string);
It will remove everything except letters, numbers and underscores. Note that \W also removes periods, so file extensions lose their dot.
