I have a line of text that contains acronyms, something like this:
$draft="The war between the CIA and NSA started in K2 when the FBI hired M";
I can't for the life of me figure out how to create a new string with all acronyms removed.
I need this output...
$newdraft="The war between the and started in when the hired";
The only PHP functions I can find remove only words that you declare statically, like this:
$newdraft= str_replace("CIA", " ", $draft);
Anyone have any ideas, or an already created function?
OK, let's try to write something (although I can't see what it would be useful for).
<?php
function remove_acronyms($str)
{
$str_arr = explode(' ', $str);
if (empty($str_arr)) return false;
foreach ($str_arr as $index => $val)
{
if ($val==strtoupper($val)) unset($str_arr[$index]);
}
return implode(' ', $str_arr);
}
$draft = "The war between the CIA and NSA started in K2 when the FBI hired M";
print remove_acronyms($draft);
http://codepad.org/cIZSwwhV
Definition of an acronym: any word that is fully capitalized, and at least 2 chars long.
<?php
$draft="The war between the CIA and NSA started in K2 when the FBI hired M";
$words = explode(' ', $draft);
foreach($words as $i => $word)
{
if (!strcmp($word, strtoupper($word)) && strlen($word) >= 2)
{
unset($words[$i]);
}
}
$clean = implode(' ', $words);
echo $clean;
?>
Try to define an acronym. You'd have to cut some corners, but a rule like 'any single word that is shorter than 5 characters and in all capitals' should be correct for this sample, and you'd be able to write a regular expression for it.
Other than that, you could make a huge list of known acronyms and just replace those.
Regex to remove multiple caps and/or numbers appearing together:
$draft="The war between the CIA and NSA started in K2 when the FBI hired M";
$newdraft = preg_replace('/[A-Z0-9][A-Z0-9]+/', '', $draft);
echo $newdraft;
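The pattern above also matches runs of capitals inside longer tokens and leaves double spaces behind. A minimal sketch of a stricter variant, assuming an acronym is a standalone token of two or more capitals or digits that begins with a capital letter (the function name strip_acronyms is made up for illustration). Under this definition the single letter "M" is kept; changing + to * in the pattern would remove it too, along with ordinary one-letter words such as "I" and "A".

```php
<?php
// Hypothetical helper: assumes "acronym" means a standalone token of
// 2+ capitals/digits that begins with a capital letter.
function strip_acronyms(string $text): string
{
    // \b keeps the match from starting or ending inside a longer word
    $stripped = preg_replace('/\b[A-Z][A-Z0-9]+\b/', '', $text);
    // Collapse the runs of spaces left behind by the removed tokens
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

$draft = "The war between the CIA and NSA started in K2 when the FBI hired M";
echo strip_acronyms($draft), "\n";
```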
I want to create a highlighting search function in PHP.
When I search for part of a word, the whole word should be highlighted.
For example, this is a sample text:
Text: British regulators say traders used private online chatrooms to coordinate their buying and selling to shift currency prices in their favor.
When I search for "th", the output should look like this:
Text: British regulators say traders used private online chatrooms to coordinate their buying and selling to shift currency prices in their favor.
So I tried this code. Please help me complete it.
This is the algorithm:
$text= "British regulators say...";
foreach($word in $text)
{
if( IS There "th" in $word)
{
$word2= '<b>'.$word.'</b>'
replace($word with word2 and save in $text)
}
}
How can I do it in PHP?
function highLightWords($string,$find)
{
return preg_replace('/\b('.$find.'\w+)\b/', "<b>$1</b>", $string);
}
Usage:
$string="British regulators say traders used private online chatrooms to coordinate their buying and selling to shift currency prices in their favor.";
$find="th";
print_r(highLightWords($string,$find));
Edit after your comment:
...How can I do it for middle characters? for example "line"
Very easy, just update the regex pattern accordingly:
return preg_replace("/\b(\w*$find\w*)\b/", "<b>$1</b>", $string);
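Both patterns above paste $find into the regex as-is and match case-sensitively. A minimal sketch of a more defensive variant, assuming you want case-insensitive matching and a search term that may contain regex metacharacters (the name highlight_words is illustrative, not from the answer):

```php
<?php
// Hypothetical variant of highLightWords(); the name is illustrative.
function highlight_words(string $text, string $find): string
{
    if ($find === '') {
        return $text; // nothing to search for
    }
    // preg_quote() escapes regex metacharacters in the search term;
    // the i flag makes the match case-insensitive
    $pattern = '/\b\w*' . preg_quote($find, '/') . '\w*\b/i';
    // $0 is the whole matched word
    return preg_replace($pattern, '<b>$0</b>', $text);
}

echo highlight_words("shift currency prices in their favor", "th"), "\n";
```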
Use strpos() to find the position of the string you are searching for, then read on from that position until you reach a space.
Should be much easier:
$word = "th";
$text = preg_replace("/\b($word.*?)\b/", "<b>$1</b>", $text);
There are a few things to say here.
First, as you know, PHP is server-side code, so this only works if you don't mind reloading the page each time or using Ajax.
I think the correct way to achieve this is with JavaScript.
That said, to split the text into words you need another function, so that you know exactly what you are working with.
Something like:
$str = "Hello world. It's a beautiful day.";
$words = explode(" ",$str);
Now $words contains the exploded string.
You can then loop over it and replace what you need, then rebuild the string and print it or do something else with it.
You can go with the following code
<?php
$string = "British regulators say traders used private online chatrooms to coordinate their buying and selling to shift currency prices in their favor";
$keyword = "th";
echo highlightkeyword($string , $keyword );
function highlightkeyword($str, $search) {
    $len = strlen($search);
    if ($len === 0) {
        return $str; // nothing to highlight
    }
    $newstring = '';
    $offset = 0;
    // Walk through every case-insensitive occurrence of $search
    while (($pos = stripos($str, $search, $offset)) !== false) {
        // Copy the text before the match, then the match itself wrapped in <b>
        $newstring .= substr($str, $offset, $pos - $offset)
            . '<b>' . substr($str, $pos, $len) . '</b>';
        $offset = $pos + $len;
    }
    // Append whatever follows the last match
    return $newstring . substr($str, $offset);
}
?>
Check here https://eval.in/220395
I have a piece of PHP code as follows:
$words = array(
'Art' => '1',
'Sport' => '2',
'Big Animals' => '3',
'World Cup' => '4',
'David Fincher' => '5',
'Torrentino' => '6',
'Shakes' => '7',
'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
if (strpos(strtolower($text), strtolower($word)) !== false) {
$all_keywords[] = $word;
$all_keys[] = $key;
}
}
echo $keywords_list = implode(',', $all_keywords) ."<br>";
echo $keys_list = implode(',', $all_keys) . "<br>";
The code echoes Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. However, it is very simple and not accurate enough to return the right keywords. For example, it returns 'Shakes' => '7' because of the word Shakespeare in $text, but "Shakes" is not a proper keyword for "Shakespeare". Basically, I want to get Art,Sport,World Cup,William Shakespeare and 1,2,4,8 instead of Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. Could you please help me write better code that extracts the keywords without this kind of problem? Thanks for your help.
You may want to look at regular expressions to weed out partial matches:
// create regular expression by using alternation
// of all given words
$re = '/\b(?:' . join('|', array_map(function($keyword) {
return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';
preg_match_all($re, $text, $matches);
foreach ($matches[0] as $keyword) {
echo $keyword, " ", $words[$keyword], "\n";
}
The expression uses the \b assertion to match word boundaries, i.e. the word must be on its own.
Output
World Cup 4
William Shakespeare 8
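One caveat with the snippet above: the /i flag lets the pattern match text whose case differs from the array key, and $words[$keyword] then fails to find it. A minimal sketch of one way around that, assuming a lower-cased copy of the map is acceptable (the shortened $words and $text and the $lookup and $found names are illustrative):

```php
<?php
$words = array('Art' => '1', 'World Cup' => '4', 'Shakes' => '7', 'William Shakespeare' => '8');
$text  = "Brazil world cup matches are very good. William Shakespeare is very famous.";

// Lower-cased copy of the keyword map, so matches can be looked up regardless of case
$lookup = array_change_key_case($words, CASE_LOWER);

// Same alternation as above: every keyword, escaped, between word boundaries
$re = '/\b(?:' . implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';

preg_match_all($re, $text, $matches);

$found = array();
foreach ($matches[0] as $keyword) {
    // Look the match up by its lower-cased form
    $found[$keyword] = $lookup[strtolower($keyword)];
}
print_r($found);
```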
You're better off using regular expressions if you want accurate matches.
I modified your original code to use them instead of strpos() as it will result in partial matches, as was the case with your code.
There's room for improvement, but hopefully you get the basic gist of it.
Let me know if you have any questions.
The code was turned into a command-line script, so save it as demo.php and run chmod +x demo.php && ./demo.php
#!/usr/bin/php
<?php
// array of regular expressions to match your words/phrases
$words = array(
    '/\b[Aa]rt\b/',
    '/\bI\b/',
    '/\bSport\b/',
    '/\bBig Animals\b/',
    '/\bWorld Cup\b/',
    '/\bDavid Fincher\b/',
    '/\bTorrentino\b/',
    '/\bShakes\b/',
    '/\b[sS]port[s]{0,1}\b/',
    '/\bWilliam Shakespeare\b/',
);
$text = "I like artists and art, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = array(); // changed formatting for clarity
$all_keys = array();
foreach ($words as $regex) {
    $m = array();
    if (preg_match_all($regex, $text, $m, PREG_OFFSET_CAPTURE) >= 1) {
        // $m[0] holds every full-pattern match as a [text, offset] pair
        foreach ($m[0] as $mm) {
            $key = $mm[1]; // key is the offset in $text where the match begins
            $word = $mm[0]; // the matched word/phrase
            $all_keywords[] = $word;
            $all_keys[] = $key;
        }
    }
}
echo "\$text = \"$text\"\n";
echo $keywords_list = implode(',', $all_keywords) ."<br>\n";
echo $keys_list = implode(',', $all_keys) . "<br>\n";
Replace
strpos(strtolower($text), strtolower($word)) !== false
With
preg_match('/\b'.$word.'\b/',$text)
Or, since you don't seem to care about capital letters:
preg_match('/\b'.strtolower($word).'\b/', strtolower($text))
In that case I suggest calling strtolower($text) once beforehand, for instance just before the foreach loop. Note that preg_match() returns 1 or 0 rather than a position, so the !== false comparison has to go as well.
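Put together, the replacement might look like the sketch below. It assumes the keywords may contain regex metacharacters, hence preg_quote(), and uses the /i flag instead of lower-casing both strings; the shortened $words and $text are illustrative.

```php
<?php
$words = array('Art' => '1', 'Sport' => '2', 'Shakes' => '7', 'William Shakespeare' => '8');
$text  = "I like artists and sport. William Shakespeare is very famous.";

$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
    // preg_quote() guards against regex metacharacters in a keyword;
    // preg_match() returns 1 on a match and 0 otherwise, so test it directly
    if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text)) {
        $all_keywords[] = $word;
        $all_keys[] = $key;
    }
}
echo implode(',', $all_keywords), "\n";
echo implode(',', $all_keys), "\n";
```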
Off the top of my head, I think two additional steps would make this function a bit more robust.
If we sort the $words array by strlen (descending, longer words at the top and shorter ones at the bottom), there is a greater chance of getting the desired match.
In the loop, when a word matches, we can remove the matched word from the string to avoid further unnecessary matches (e.g. Shakes will always match wherever William Shakespeare matches).
P.S. The SO iOS app rocks! But it's still not easy to code on (bloody autocorrect!)
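The two steps above can be sketched as follows. This is only an illustration with a shortened keyword list; $remaining is a made-up name, and because it still relies on strpos(), partial matches such as "Art" inside "artists" are not prevented.

```php
<?php
$words = array('Shakes' => '7', 'World Cup' => '4', 'William Shakespeare' => '8');
$text  = "Brazil World Cup matches are very good. William Shakespeare is very famous.";

// Longest keywords first, so "William Shakespeare" is tried before "Shakes"
uksort($words, function ($a, $b) {
    return strlen($b) - strlen($a);
});

$remaining = strtolower($text);
$all_keywords = array();
foreach ($words as $word => $key) {
    $needle = strtolower($word);
    if (strpos($remaining, $needle) !== false) {
        $all_keywords[] = $word;
        // Remove the matched phrase so shorter keywords cannot match inside it
        $remaining = str_replace($needle, '', $remaining);
    }
}
echo implode(',', $all_keywords), "\n";
```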
I'm trying to find a way to negate sentences based on POS-tagging. Please consider:
include_once 'class.postagger.php';
function negate($sentence) {
$tagger = new PosTagger('includes/lexicon.txt');
$tags = $tagger->tag($sentence);
foreach ($tags as $t) {
$input[] = trim($t['token']) . "/" . trim($t['tag']) . " ";
}
$sentence = implode(" ", $input);
$postagged = $sentence;
// Concatenate "not" to every JJ, RB or VB
// Todo: ignore negative words (not, never, neither)
$sentence = preg_replace("/(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/", "not$1/$2", $sentence);
// Remove all POS tags
$sentence = preg_replace("/\/[A-Z$]+/", "", $sentence);
return "$postagged<br>$sentence";
}
BTW: In this example, I'm using the POS-tagging implementation and lexicon of Ian Barber. An example of this code running would be:
echo negate("I will never go to their place again");
I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB
I notwill notnever notgo to their place notagain
As you can see (and this issue is also noted in a code comment), the negating words themselves are being negated as well: never becomes notnever, which obviously shouldn't happen. Since my regex skills aren't that strong, is there a way to exclude these words from the regex?
[edit] Also, I would very much welcome other comments / critiques you might have in this negating implementation, since I'm sure it's (still) quite flawed :-)
Give this a try:
$sentence = preg_replace("/(\s)(?:(?!never|neither|not)(\w*))\/(JJ|MD|RB|VB|VBD|VBN)\b/", "$1not$2", $sentence);
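An alternative that avoids the lookahead is to decide per token in a callback. A minimal sketch, assuming the input is already a "word/TAG" string like the one shown in the question; the function name negate_tagged and the list of negative words are assumptions, not part of the PosTagger class:

```php
<?php
// Hypothetical helper working on an already tagged "word/TAG word/TAG" string.
function negate_tagged(string $tagged): string
{
    $negatives = array('not', 'never', 'neither', 'no', 'nor'); // assumed list
    return preg_replace_callback(
        '~\b(\w+)/(JJ|MD|RB|VBD|VBN|VB)\b~',
        function (array $m) use ($negatives) {
            // Leave words that are already negative untouched
            if (in_array(strtolower($m[1]), $negatives, true)) {
                return $m[0];
            }
            return 'not' . $m[1] . '/' . $m[2];
        },
        $tagged
    );
}

$tagged = 'I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB';
echo negate_tagged($tagged), "\n";
```

Unlike the prefix lookahead, the in_array() check compares whole words, so a word such as "notable" would still be negated.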
Can someone tell me please how to do this:
Input:
hello http://DOMAIN.com/asdakdjk.php?asd=231&adsj=23 u.s. nicely done!
Result:
Hello http://DOMAIN.com/asdakdjk.php?asd=231&adsj=23 U.S. Nicely Done!
If possible, this should include words separated by '.', such as U.S.
Thanks
try this:
<?php
function capitalizeNonURLs($input)
{
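    // Find the first URL so its original casing can be restored afterwards
    if (!preg_match('#(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)#', $input, $matches)) {
        return ucwords($input); // no URL present
    }
    $url = $matches[1];
    $temp = ucwords($input); // capitalizes every word, including the URL scheme
    $output = str_ireplace($url, $url, $temp); // put the original URL back
    return $output;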
}
$str = "hello http://domain.com/asdakdjk.php?asd=231&adsj=23 u.s. nicely done!";
echo capitalizeNonURLs($str);
Keep in mind that this function does not handle abbreviations (it won't change usa to USA). Country codes can be handled in several different ways. One is to make a hashmap of country codes and replace them or use regular expression for that as well.
To leave URLs untouched:
$strarray = explode(' ', $str);
for ($i = 0; $i < count($strarray); $i++)
{
    // capitalize every word that is not a URL
    if (substr($strarray[$i], 0, 4) != 'http')
    {
        $strarray[$i] = ucfirst($strarray[$i]);
    }
}
$new_str = implode(' ', $strarray);
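Neither snippet above turns u.s. into U.S. A minimal sketch that does, assuming words are separated by single spaces and that every lowercase letter directly after a dot should be capitalized (the function name is made up; note that this rule would also turn file.txt into File.Txt):

```php
<?php
// Hypothetical helper; assumes space-separated tokens.
function capitalize_words_keep_urls(string $input): string
{
    $tokens = explode(' ', $input);
    foreach ($tokens as $i => $token) {
        if (preg_match('~^https?://~i', $token)) {
            continue; // leave URLs untouched
        }
        // Uppercase the first letter and any letter that follows a dot (u.s. -> U.S.)
        $tokens[$i] = preg_replace_callback(
            '~(?:^|(?<=\.))[a-z]~',
            function (array $m) {
                return strtoupper($m[0]);
            },
            $token
        );
    }
    return implode(' ', $tokens);
}

echo capitalize_words_keep_urls('hello http://DOMAIN.com/asdakdjk.php?asd=231&adsj=23 u.s. nicely done!'), "\n";
```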
Hey all. I have a string of names separated by commas. I'm exploding this string of names into an array of names with the comma as the delimiter. I need a RegEx to remove a white space(if any) only after the comma and not the white space between the first and last name.
So as an example:
$nameStr = "Sponge Bob,Bart Simpson, Ralph Kramden,Uncle Scrooge,Mickey Mouse";
See the space before Ralph Kramden? I need to have that space removed and not the space between the names. And I need to have any other spaces before names that would occur removed as well.
P.S.: I've noticed an interesting behavior regarding white space and this situation. Take the following example:
When not line breaking an echo like so:
$nameStr = "Sponge Bob,Bart Simpson, Ralph Kramden,Uncle Scrooge,Mickey Mouse";
$nameArray = explode(",", $nameStr);
foreach($nameArray as $value)
{
echo $value;
}
The result is: Sponge BobBart Simpson Ralph KramdenUncle ScroogeMickey Mouse
Notice the white space still there before Ralph Kramden
However, when line breaking the above echo like so:
echo $value . "<br />";
The output is:
Sponge Bob
Bart Simpson
Ralph Kramden
Uncle Scrooge
Mickey Mouse
And all of the names line up with what appears to be no white space before the name.
So what exactly is PHP's behavior regarding a white space at the start of a string?
Cheers all. Thanks for replies.
What's with today's fixation on using regexp to solve every little problem?
$nameStr = "Sponge Bob,Bart Simpson, Ralph Kramden,Uncle Scrooge,Mickey Mouse";
$nameArray = explode(",", $nameStr);
foreach($nameArray as $value)
{
echo trim($value);
}
EDIT
PHP's behaviour re white space is to treat it as the appropriate character and do what it's told by you.
HTML's behaviour (or at least that of web browsers) is rather different... and you'll need to learn and understand that difference
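To see this for yourself, inspect the value outside the browser's rendering. The short sketch below uses var_dump(), which prints the string length and the quoted value, so the leading space is visible; browsers collapse whitespace at the start of a rendered line, which is why it seems to vanish after each <br />.

```php
<?php
$nameStr = "Sponge Bob,Bart Simpson, Ralph Kramden,Uncle Scrooge,Mickey Mouse";
$nameArray = explode(",", $nameStr);

// The space after the comma is still part of the third element
var_dump($nameArray[2]);        // string(14) " Ralph Kramden"
var_dump(trim($nameArray[2]));  // string(13) "Ralph Kramden"
```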
Try
$nameStr = preg_replace("/,([\s])+/",",",$nameStr);
$nameArray = explode(",", $nameStr);
This is a workable regex solution, but as others have pointed out above, a simple trim() will do the job with what you already have.
You said you only want to remove white space after the comma, so any space before a comma is left alone here.
You can also use below:
$nameStr = "Sponge Bob, Bart Simpson, Ralph Kramden,Uncle Scrooge,Mickey Mouse";
while (strpos($nameStr, ', ') !== FALSE) {
$nameStr = str_replace(', ', ',', $nameStr);
}
echo $nameStr;
After this, you can simply explode it as:
$allNames = explode(',', $nameStr);
Otherwise the regex by Michael is very good.
Why don't you just preg_split?
$names = preg_split('~,\s*~', $nameStr);
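A slightly extended sketch of the same idea, assuming you also want to drop white space before a comma and at either end of the string (the extra spaces in the sample input are added for illustration):

```php
<?php
$nameStr = "Sponge Bob,Bart Simpson, Ralph Kramden,Uncle Scrooge ,  Mickey Mouse";

// \s* on both sides of the comma swallows white space before and after it;
// trim() handles the very start and end of the string
$names = preg_split('~\s*,\s*~', trim($nameStr));
print_r($names);
```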
PHP couldn't care less what's in a string, unless it's parsing it for variable interpolation. A space is like any other character, an ascii or unicode value that just happens to show up as a "blank".
How are you replacing those post-comma spaces?
$str = str_replace(', ', ',', $str); // note the space after the comma in the first argument
If that's not catching the space before your Ralph name, then most likely whatever's there isn't a space character (ascii 32). You can try displaying what it is with:
echo ord(substr($nameArray['index of Ralph here'], 0, 1));
Have you tried it with the trim function?
$nameStr = "Sponge Bob,Bart Simpson, Ralph Kramden,Uncle Scrooge,Mickey Mouse";
$nameArray = explode(",", $nameStr);
for($i = 0; $i < count($nameArray); $i++) {
$nameArray[$i] = trim($nameArray[$i]);
}
$newNameStr = implode(',', $nameArray);
You can do this with a RegExp, but since it looks like you aren't very experienced with RegExp you shouldn't use them, because when doing it wrong they cost a good chunk of performance.
An easy solution is to avoid using regex and apply the trim function:
$nameStr = 'Bart Simpson, Lisa Simpson,Homer Simpson, Madonna';
$names = explode(',', $nameStr);
foreach($names as &$name) {
$name = trim($name);
}
unset($name); // break the reference left over from the loop
//this also doesn't falter when a name is only one word
This one works for me:
$string = "test1, test2, test3, test4, , ,";
array_map('trim', array_filter(explode(',', $string), 'trim'))
output
=> [
"test1",
"test2",
"test3",
"test4",
]