Extracting matching words to JSON file - php

Is it possible to extract the words that match those in my JSON file and save them as separate JSON data?
<?php
$searchArray = array('settings all','print', 'sum', 'industry'); // total 50K words
function sanitize($string,$searchArray) {
$repl = array_map("dashReplace", $searchArray);
$pattern = array_map("insertWordBoundaries", $searchArray);
$string = preg_replace($pattern,$repl,$string);
return $string;
}
function dashReplace($str) {
return "<span class='txtOlg'>" . $str . "</span>";
}
function insertWordBoundaries($str){
return "/\b". preg_quote($str,"/") ."\b/";
}
$text = 'Lorem Ipsum is simply dummy text of the printing and typesettings all industry.';
echo sanitize($text,$searchArray);
My goal is to put only the matching words into a separate JSON file.
How can I do this? Can you guide me?

OK, after some research I came up with the following ways to make a search faster using PHP and MySQL.
1. CREATE INDEX Example
The SQL statement below creates an index named "idx_lastname" on the "LastName" column in the "Persons" table.
CREATE INDEX idx_lastname
ON Persons (LastName);
See also: Slow PHP searching in database
There are several ways of creating indexes (on combinations of columns, etc.); you probably know them.
2. FULLTEXT
I use FULLTEXT in all my projects. It works fine for me and I have never had a performance problem.
MySQL site: https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html
See also: http://www.bakale.com/mysql/myfultext.htm
Other solutions
Check your MySQL setup, use the faster options, and create your columns accordingly.
Example: CHAR can be faster than VARCHAR, but CHAR has a lower maximum length.
You need to look into ways of making MySQL faster.
The given answer is the best solution for the approach you asked about, so I won't post a duplicate answer.
But I really don't think that solution will improve your server's performance.
It may even make it slower. You need to look into your MySQL setup, how to make it faster, how to send as few queries as possible, and so on.

You could use array_intersect($array1, $array2), which returns the values of the first array that are also present in the second. https://www.php.net/manual/en/function.array-intersect.php
<?php
$searchArray = array('settings all','print', 'sum', 'industry');
$text = 'Lorem Ipsum is simply dummy text of the printing and typesettings all industry.';
$text = str_replace('.', '', $text); // and any other characters you don't want
$checkArray = explode(' ', $text);
var_dump(array_intersect($searchArray, $checkArray));
https://3v4l.org/17cWj
The only problem with this approach is that one of your search terms is actually TWO words, so matching 'settings all' wouldn't work.
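If multi-word terms matter, one option is to keep the word-boundary regex from the question and only collect the terms that match, then pass the result to json_encode. This is a minimal sketch, not a tuned solution: findMatches is a hypothetical helper name, and running one preg_match per term may well be too slow for a 50K-word list.

```php
<?php
// Hypothetical helper: returns the search terms that occur in $text
// as whole words or whole phrases.
function findMatches(string $text, array $searchArray): array
{
    $matches = [];
    foreach ($searchArray as $term) {
        // Same idea as insertWordBoundaries() in the question.
        $pattern = '/\b' . preg_quote($term, '/') . '\b/';
        if (preg_match($pattern, $text) === 1) {
            $matches[] = $term;
        }
    }
    return $matches;
}

$searchArray = ['settings all', 'print', 'sum', 'industry'];
$text = 'Lorem Ipsum is simply dummy text of the printing and typesettings all industry.';

// Only the matching terms end up in the JSON string.
$json = json_encode(findMatches($text, $searchArray));
echo $json;
```

With the sample text only 'industry' matches as a whole word, because 'print', 'sum' and 'settings all' appear only inside longer words. The string in $json can then be written out with file_put_contents.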

Related

PHP preg_replace inside for loop

I'm currently trying out PHP's preg_replace function and I've run into a small problem. I want to replace all the <br /> tags with a div with an ID, unique for every div, so I thought I would put it in a for loop. But strangely, it only changes the first line and gives it an ID of 49, which is the last ID they can get. Here's my code:
$res = mysqli_query($mysqli, "SELECT * FROM song WHERE id = 1");
$row = mysqli_fetch_assoc($res);
mysqli_set_charset("utf8");
$lyric = $row['lyric'];
$lyricHTML = nl2br($lyric);
$lines_arr = preg_split('[<br />]',$lyricHTML);
$lines = count($lines_arr);
for($i = 0; $i < $lines; $i++) {
$string = preg_replace(']<br />]', '</h4><h4 id="no'.$i.'">', $lyricHTML, 1);
echo $i;
}
echo '<h4>';
echo $string;
echo '</h4>';
How it works is that I have a large amount of text in my database, and when I load it into the lyric variable, it's just plain text. But when I run nl2br on it, it gets a <br /> after every line, which I use here. I get the number of lines by using the little "lines_arr" method as you can see, and then basically iterate in a for loop.
The only problem is that it only outputs on the first line and gives that an ID of 49. When I move it outside the for loop and remove the limit, it works and all lines get an <h4> around them, but then I don't get the unique ID I need.
This is some text I pulled out from the database
Mama called about the paper turns out they wrote about me
Now my broken heart´s the only thing that's broke about me
So many people should have seen what we got going on
I only wanna put my heart and my life in songs
Writing about the pain I felt with my daddy gone
About the emptiness I felt when I sat alone
About the happiness I feel when I sing it loud
He should have heard the noise we made with the happy crowd
Did my Gran Daddy know he taught me what a poem was
How you can use a sentence or just a simple pause
What will I say when my kids ask me who my daddy was
I thought about it for a while and I'm at a loss
Knowing that I´m gonna live my whole life without him
I found out a lot of things I never knew about him
All I know is that I´ll never really be alone
Cause we gotta lot of love and a happy home
And my goal is to give every line an <h4 id="no1">TEXT</h4>, for example, and the number after no, like no1 or no4, should be incremented on every iteration. That's why I chose a for loop.
Looks like you need to delimit and escape your regexp properly
preg_replace('/<br \/>/', ...);
Really though, this is a classic XY Problem. Instead of asking us how to fix your solution, you should ask us how to solve your problem.
Show us some example text in the database and then show us how you would like it to be formatted. It's very likely there's a better way.
I would use array_walk for this.
$lines = preg_split("/[\r\n]+/", $row['lyric']);
array_walk($lines, function(&$line, $idx) {
$line = sprintf('<h4 id="no%d">%s</h4>', $idx+1, $line);
});
echo implode("\n", $lines);
Output
<h4 id="no1">Mama called about the paper turns out they wrote about me</h4>
<h4 id="no2">Now my broken heart's the only thing that's broke about me</h4>
<h4 id="no3">So many people should have seen what we got going on</h4>
...
<h4 id="no16">Cause we gotta lot of love and a happy home</h4>
Explanation of solution
nl2br doesn't really help us here. It inserts a <br /> before each newline, but then we'd just end up splitting the string on the <br />. We might as well split on the newlines to start with. I'm going to use /[\r\n]+/ because it splits on any run of one or more \r or \n characters, which also covers \r\n.
$lines = preg_split("/[\r\n]+/", $row['lyric']);
Now we have an array of strings, each containing one line of lyrics. But we want to wrap each string in an <h4 id="noX">...</h4> where X is the number of the line.
Ordinarily we would use array_map for this, but the array_map callback does not receive an index argument. Instead we will use array_walk which does receive the index.
One more note about this line is the use of &$line as the callback parameter. This allows us to alter the contents of $line and have it "saved" in our original $lines array. (See Example #1 in the PHP docs to compare the difference.)
array_walk($lines, function(&$line, $idx) {
Here's where the h4 comes in. I use sprintf for formatting HTML strings because I think they are more readable. And it allows you to control how the arguments are output without adding a bunch of view logic in the "template".
Here's the world's tiniest template: '<h4 id="no%d">%s</h4>'. It has two inputs, %d and %s. The first will be output as a number (our line number), and the second will be output as a string (our lyrics).
$line = sprintf('<h4 id="no%d">%s</h4>', $idx+1, $line);
Close the array_walk callback function
});
Now $lines is an array of our newly-formatted lyrics. Let's output the lyrics by separating each line with a \n.
echo implode("\n", $lines);
Done!
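For comparison, the loop in the question can also be repaired. As far as I can tell, it only ever changes one tag because every iteration calls preg_replace on the untouched $lyricHTML and overwrites $string, so only the last iteration's result survives. A minimal sketch of the fix, using a short made-up lyric in place of $row['lyric']:

```php
<?php
// Sample input standing in for $row['lyric'] (an assumption for illustration).
$lyric = "line one\nline two\nline three";

$string = nl2br($lyric);                  // adds "<br />" before each newline
$breaks = substr_count($string, '<br />');

for ($i = 0; $i < $breaks; $i++) {
    // Work on $string itself, so each pass replaces the next remaining <br />.
    $string = preg_replace('/<br \/>/', '</h4><h4 id="no' . ($i + 2) . '">', $string, 1);
}

// The first line gets id no1 from the opening tag added here.
$html = '<h4 id="no1">' . $string . '</h4>';
echo $html;
```

The array_walk version above is still simpler, since it never needs the <br /> tags in the first place.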
If the text in your database has one entry per line, why not just explode it on the \n character?
Always try to find a solution without using the preg set of functions, because they are heavy memory consumers:
I would go like this:
$lyric = $row['lyric'];
$lyrics = explode("\n", $lyric);
$lyricsHtml = [];
$i=0;
foreach($lyrics as $val){
$i++;
$lyricsHtml[] = '<h4 id="no'.$i.'">'.$val.'</h4>';
}
$lyricsHtml = implode("\n",$lyricsHtml);
Another way, with preg_replace_callback:
$id = 0;
$lyric = preg_replace_callback('~(^)|$~m',
function ($m) use (&$id) {
return (isset($m[1])) ? '<h4 id="no' . ++$id . '">' : '</h4>'; },
$lyric);

How to expand variables in a string

Problem
I'd like to expand variables in a string in the same manner that variables in a double-quoted string get expanded.
$string = '<p>It took $replace s</>';
$replace = 40;
expression_i_look_for;
$string should become '<p>It took 40 s</>';
I see an obvious solution like this:
$string = str_replace('"', '\"', $string);
eval('$string = "$string";');
But I really don't like it, because eval() is insecure. Is there any other way to do this?
Context
I'm building a simple templating engine; that's where I need this.
Example Template (view_file.php)
<h1>$title</h1>
<p>$content</p>
Template rendering (simplified code):
$params = array('title' => ...);
function render($view_file, $params) {
extract($params);
ob_start();
include($view_file);
$text = ob_get_contents();
ob_end_clean();
expression_i_look_for; // this will expand the variables in the template
return $text;
}
The expansion of the variables in the template simplifies its syntax. Without it, the above example template would be:
<h1><?php echo $title;?></h1>
<p><?php echo $content;?></p>
Do you think this approach is good? Or should I look in another direction?
Edit
Finally I understand that there is no simple solution, due to the flexible way PHP expands variables (even ${$var}->member[0] would be valid).
So there are only two options:
Adopt an existing full fledged templating system
Stick with something very basic that essentially is limited to including the view files via include.
I would rather suggest using an existing template engine, for example Smarty, but if you really want to do it yourself, you can use a simple regular expression to match all variables made up of, for example, letters and numbers, and then replace them with the corresponding variables:
<?php
$text = 'hello $world, what is the $matter? I like $world!';
preg_match_all('/\$([a-zA-Z0-9]+)/',
$text,
$out, PREG_PATTERN_ORDER);
$world = 'World';
$matter = 'matter';
foreach(array_unique($out[1]) as $variable){
$text=str_replace('$'.$variable, $$variable, $text);
}
echo $text;
?>
prints
hello World, what is the matter? I like World!
Parse
Parse the string and look for $ followed by a valid variable name (i.e. [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
Variable²
Use variable variables syntax (i.e. $$var notation).
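The two steps above can be sketched with preg_replace_callback. Note that this sketch swaps the variable-variables lookup for a lookup in an explicit array, so the template can only see the values you pass in. expandVars is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: expands $name placeholders from an explicit array
// and leaves unknown placeholders untouched.
function expandVars(string $template, array $params): string
{
    return preg_replace_callback(
        // "$" followed by a valid PHP variable name.
        '/\$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)/',
        function (array $m) use ($params) {
            return array_key_exists($m[1], $params) ? (string) $params[$m[1]] : $m[0];
        },
        $template
    );
}

$result = expandVars('<p>It took $replace s</p>', ['replace' => 40]);
echo $result; // <p>It took 40 s</p>
```

Because each match is replaced in a single pass, a name such as $world cannot accidentally clobber the start of a longer name such as $worldwide, which can happen with repeated str_replace calls.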
Are you trying to do this?
templater.php:
<?php
$first = "first";
$second = "second";
$third = "third";
include('template.php');
template.php:
<?php
echo 'The '.$first.', '.$second.', and '.$third.' variables in a string!';
When templater.php is run, produces:
"The first, second, and third variables in a string!"
Do you want something like this ?
$replace = 40;
$string = '<p>It took {$replace}s</p>';
Instead of using single quotes
$string = '<p>It took $replace s</>';
$replace = 40;
use double quotes
$replace = 40;
$string = "<p>It took $replace s</>";
However, for readability and to enable you to remove the space between $replace and the s I would use:
$replace = 40;
$string = '<p>It took ' . $replace . 's</>';
The correct way is probably to parse your document as a tree, identify your parser tags (because you are managing your own parser, they don't have to follow PHP conventions if you don't want them to) and then add in your values from an associative array or other data structure as the opportunity arises.
This is a more complex solution but will make it far easier when you realise that you want to be able to display lists whose length is unknown ahead of time using some kind of looping structure based on a standard display option. In the long run, you won't find many serious templating systems that aren't parsing the documents into some kind of in-memory tree where the placeholders can be inserted and then the document constructed as required. This also offers many opportunities for caching. Also, if you are unafraid of recursion you will be able to perform a lot of operations on it fairly simply.
However, this is not an uncommon problem to solve and as I commented on the question, there are almost guaranteed to be libraries and extensions around that provide most of the functionality you need. Unless this is a purely academic process for you, I would find some existing solutions and either use one of those or get a solid understanding of how it works so you have a starting point for adapting your own solution.
This is a snippet I pulled out from Lejlot's answer. I tested it and it works fine.
function resolve_vars_in_str( $input )
{
preg_match_all('/\$([a-zA-Z0-9]+)/', $input, $out, PREG_PATTERN_ORDER);
foreach(array_unique($out[1]) as $variable) $input=str_replace('$'.$variable, $GLOBALS["$variable"], $input);
return $input ;
}

CodeIgniter - Search results - How to create good query and URI?

I'm working on a script and I would like to add a search feature. My project is based on CodeIgniter and I want all of the code to be compatible with it.
I already have a working search query, but it's not good because it doesn't support more than one word. If I enter the word "test" in my search form (assume that the word is in one of the database fields) there will be a few results, but if I enter "test test test" (again assuming the word is in one of the fields) there are no results at all.
My current query:
$this->db->select('title, content, date, hyperlink')->or_like(array('title' => $query, 'content' => $query))->order_by('id_article', 'desc')->get('news');
The next problem is with the URI: if I search for "some article, today is sunny" there is a problem with the comma, and probably with other disallowed characters in other cases. I have read a few other questions here, but all that I found is $this->uri->uri_to_assoc(), which I think wouldn't work in my case, because I don't use English in my project.
What's the best solution for these two problems?
To solve the search issue you should probably explode the query into pieces.
$query = 'test test 2 test3';
$parts = explode(' ', $query);
Then loop through the parts and add or_like conditions.
foreach ($parts as $q)
{
$this->db->or_like(array('title' => $q, 'content' => $q));
}
Now it checks for any of the words.
You may also want to allow quoted values to be treated as one word so that "test 2" would search for the sequence test 2 and not just test and 2. You would need to adjust how you get parts, maybe by a regular expression like:
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $query, $parts);
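Here is a minimal sketch of that splitting step in plain PHP, so it can be tried outside CodeIgniter. It uses a simpler pattern than the one above (no support for escaped quotes inside a phrase), and splitQuery is a hypothetical helper name.

```php
<?php
// Hypothetical helper: splits a search query into terms,
// keeping "quoted phrases" together as one term.
function splitQuery(string $query): array
{
    preg_match_all('/"([^"]*)"|(\S+)/', $query, $matches, PREG_SET_ORDER);
    $terms = [];
    foreach ($matches as $m) {
        // Group 1 holds a quoted phrase, group 2 holds a bare word.
        $term = $m[1] !== '' ? $m[1] : ($m[2] ?? '');
        if ($term !== '') {
            $terms[] = $term;
        }
    }
    return $terms;
}

$parts = splitQuery('test "test 2" test3');
print_r($parts); // test, test 2, test3
```

Each element of $parts can then be passed to or_like in the foreach loop shown earlier.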
As for the invalid characters in the URI, pass the string through urlencode on one end then urldecode on the other.
$uri = urlencode('some article, today is sunny');
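A quick round trip shows what happens to the comma and the spaces. This is only a sketch of the two PHP functions themselves; how the encoded value travels through your routes is up to your CodeIgniter setup.

```php
<?php
$query = 'some article, today is sunny';

$encoded = urlencode($query);   // spaces become "+", the comma becomes "%2C"
$decoded = urldecode($encoded); // recovers the original search string

echo $encoded, "\n", $decoded, "\n";
```

Non-English characters are percent-encoded in the same way and come back unchanged after urldecode.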

php search and replace

I am trying to merge database fields into a document (RTF) using PHP,
i.e. if I have a document that starts
Dear Sir,
Customer Name: [customer_name], Date of order: [order_date]
After retrieving the appropriate database record I can use a simple search and replace to insert the database field into the right place.
So far so good.
I would however like to have a little more control over the data before it is replaced. For example I may wish to Title Case it, or convert a delimited string into a list with carriage returns.
I would therefore like to be able to add extra formatting commands to the field to be replaced. e.g.
Dear Sir,
Customer Name: [customer_name, TC], Date of order: [order_date, Y/M/D]
There may be more than one formatting command per field.
Is there a way that I can now search for these strings? The format of the strings is not set in stone, so if I have to change the format then I can.
Any suggestions appreciated.
You could use a templating system like Smarty, that might make your life easier, as you can do {$customer_name|ucwords} or actually put PHP code in your email template.
Try a RegEx and preg_replace_callback:
function replace_param($matches)
{
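// $matches[1] holds the text between the brackets, e.g. "customer_name, TC"
$parts = array_map('trim', explode(',', $matches[1]));
// $parts now contains an array like: customer_name, TC, SE, YMD
// do some substitutions, build $text from them, and:
return $text;
}
$rtf = preg_replace_callback('/\[([^\]]+)\]/', 'replace_param', $rtf);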
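To make the "do some substitutions" step concrete, here is a small self-contained sketch. The record, the mergeFields helper and the handling of the TC and Y/M/D commands are all assumptions for illustration; your own command names and formats may differ.

```php
<?php
// Hypothetical database record (assumption for illustration only).
$record = ['customer_name' => 'jane doe', 'order_date' => '2020-01-31'];

// Hypothetical helper: replaces [field, CMD, ...] with the formatted field value.
function mergeFields(string $doc, array $record): string
{
    return preg_replace_callback('/\[([^\]]+)\]/', function (array $m) use ($record) {
        $parts = array_map('trim', explode(',', $m[1]));
        $field = array_shift($parts);          // first part is the field name
        if (!array_key_exists($field, $record)) {
            return $m[0];                      // unknown field: leave it as it is
        }
        $value = $record[$field];
        foreach ($parts as $command) {         // apply each formatting command in order
            if ($command === 'TC') {
                $value = ucwords(strtolower($value));       // title case
            } elseif ($command === 'Y/M/D') {
                $value = date('Y/m/d', strtotime($value));  // reformat the date
            }
        }
        return $value;
    }, $doc);
}

$merged = mergeFields(
    'Customer Name: [customer_name, TC], Date of order: [order_date, Y/M/D]',
    $record
);
echo $merged;
```

Adding another command is then a matter of adding another branch, or of swapping the if/elseif chain for a lookup table of callables.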
You can use explode on it to separate them into array values.
For Example:
$customer_name = 'customer_name, TC';
$get_fields = explode(',', $customer_name);
foreach($get_fields as $value)
{
$new_val = trim($value);
// Now do whatever you want to these in here.
}
Sorry if I'm not understanding you.

Assistance with building an inverted-index

It's part of an information retrieval project I'm doing for school. The plan is to create a hashmap of words, using the first two letters of each word as the key and all words that start with those two letters saved as a string value. So,
hashmap["ba"] = "bad barley base"
Once I'm done tokenizing a line I take that hashmap, serialize it, and append it to the text file named after the key.
The idea is that if I spread my data over hundreds of files, I'll lessen the time it takes to fulfill a search by lessening the density of each file. The problem I am running into is that when I'm making 100+ files in each run, it chokes on creating a few of them for whatever reason, and so those entries are empty. Is there any way to make this more efficient? Is it worth continuing this, or should I abandon it?
I'd like to mention I'm using PHP. The two languages I know relatively intimately are PHP and Java. I chose PHP because the front end will be very simple to do and I will be able to add features like autocompletion/suggested search without a problem. I also see no benefit in using Java. Any help is appreciated, thanks.
I would use a single file to get and put the serialized string. I would also use JSON as the serialization format.
Put the data
$string = "bad barley base";
$data = explode(" ",$string);
$hashmap["ba"] = $data;
$jsonContent = json_encode($hashmap);
file_put_contents("a-z.txt",$jsonContent);
Get the data
$jsonContent = file_get_contents("a-z.txt");
$hashmap = json_decode($jsonContent);
foreach($hashmap as $firstTwoCharacters => $value) {
if ($firstTwoCharacters == 'ba') {
$wordCount = count($value);
}
}
You didn't explain the problem you are trying to solve. I'm guessing you are trying to make a full text search engine, but you don't have document ids in your hashmap so I'm not sure how you are using the hashmap to find matching documents.
Assuming you want a full text search engine, I would look into using a trie for the data structure. You should be able to fit everything in it without it growing too large. Nodes that match a word you want to index would contain the ids of the documents containing that word.
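As a rough illustration of the "ids of the documents" idea, here is a sketch that uses a plain associative array instead of a trie, which is enough to show the shape of the data. The documents and their ids are made up for the example.

```php
<?php
// Hypothetical documents keyed by id (assumption for illustration).
$documents = [
    1 => 'bad barley',
    2 => 'barley base',
];

// Build the inverted index: word => list of ids of documents containing it.
$index = [];
foreach ($documents as $id => $body) {
    $words = array_unique(preg_split('/\W+/', strtolower($body), -1, PREG_SPLIT_NO_EMPTY));
    foreach ($words as $word) {
        $index[$word][] = $id;
    }
}

// The whole index can be serialized as one JSON string and read back later.
$json = json_encode($index);
$restored = json_decode($json, true);

echo $json;
```

Looking up a word is then a single array access, for example $restored['barley'] gives the ids of every document that contains it.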
