find segments of similar text within a string with php

I am familiar with the similar_text() function in PHP, but what I am trying to work out is how to find potential links I could add in my content to other articles that I have.
What I want is to be able to scan through all my content in each post and find a segment of text in a post that is similar to a title of another post.
So let's say I have the following structure:
$contentfromapost = "this is example text of the content in this article. It talks about things that people like to do for fun and vacation spots where they can do them all around the world";
$titleofpost1 = "Yellow cats are fun to throw in the snow";
$titleofpost2 = "Vacation Rentals in Fun parts of the world";
So my idea is to scan the first post content, then scan the titles of all my articles.
As you can see in my example, $titleofpost2 shares keywords with $contentfromapost.
Then I would want to grab that segment of text in $contentfromapost and use it as a link to the post with the similar title. I might use the anchor text "fun and vacation spots where they can do them all around the world" to link to that second post.
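One way to sketch this idea, assuming a plain keyword-overlap score instead of similar_text() (which scores two whole strings against each other and does not locate a matching segment): slide a fixed-size window of words over the post, count how many distinct title keywords each window contains, and keep the best window trimmed to its first and last matching word. The function names, the stop-word list, the 15-word window and the two-keyword minimum are all illustrative assumptions.

```php
<?php
// Sketch: find the run of words in a post that shares the most keywords
// with another post's title. The stop-word list, window size and minimum
// hit count are illustrative assumptions, not a standard.

const STOP_WORDS = ['a', 'all', 'and', 'are', 'can', 'do', 'in', 'is', 'it',
    'of', 'that', 'the', 'them', 'they', 'this', 'to', 'where'];

/** Lower-cased, de-duplicated words of $text, minus stop words. */
function keywords(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return array_values(array_diff(array_unique($m[0]), STOP_WORDS));
}

/**
 * Best-matching segment of $content for $title, or null when no window
 * contains at least $minHits distinct title keywords.
 */
function bestSegment(string $content, string $title, int $window = 15, int $minHits = 2): ?string
{
    $titleWords = keywords($title);
    // Split on whitespace so each word keeps its original spelling and punctuation.
    $words = preg_split('/\s+/', trim($content));
    $best = null;
    $bestHits = 0;

    // Slide the window one word at a time; keep the first window with the most hits.
    for ($start = 0; $start < count($words); $start++) {
        $slice = array_slice($words, $start, $window);
        $hits = count(array_intersect($titleWords, keywords(implode(' ', $slice))));
        if ($hits > $bestHits) {
            $bestHits = $hits;
            $best = $slice;
        }
    }
    if ($best === null || $bestHits < $minHits) {
        return null;
    }

    // Trim leading and trailing words that contain no title keyword.
    $isMatch = fn(string $w): bool => count(array_intersect(keywords($w), $titleWords)) > 0;
    $first = 0;
    while (!$isMatch($best[$first])) {
        $first++;
    }
    $last = count($best) - 1;
    while (!$isMatch($best[$last])) {
        $last--;
    }
    return implode(' ', array_slice($best, $first, $last - $first + 1));
}
```

With the example strings from the question, this picks "fun and vacation spots where they can do them all around the world" for $titleofpost2 and returns null for $titleofpost1, which shares only the word "fun". A real version would need a larger stop-word list and probably stemming ("cats" vs "cat").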
I want to build this to help me find other posts that I could link to within an article. Ideally, I would like it to add the link to that section of text automatically.
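For the automatic part, a minimal sketch that wraps the first occurrence of a chosen segment in a link. It assumes the post body is HTML and that the segment was copied verbatim from it; it does not check whether the match falls inside an existing tag or link, so treat it as a starting point. linkPhrase() and the example URL are hypothetical.

```php
<?php
// Sketch: wrap the first occurrence of $phrase in $html with a link to $url.
// Assumes $phrase was copied verbatim from $html; it does not check whether
// the match sits inside an existing tag or link.

function linkPhrase(string $html, string $phrase, string $url): string
{
    if ($phrase === '') {
        return $html;
    }
    $pos = strpos($html, $phrase);
    if ($pos === false) {
        return $html; // segment not found: leave the post unchanged
    }
    $anchor = '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">' . $phrase . '</a>';
    // Replace exactly the matched bytes, so later occurrences are untouched.
    return substr_replace($html, $anchor, $pos, strlen($phrase));
}

// Hypothetical usage: the URL stands in for the second post's permalink.
echo linkPhrase(
    'people like to do for fun and vacation spots where they can do them all around the world',
    'fun and vacation spots where they can do them all around the world',
    '/vacation-rentals-in-fun-parts-of-the-world'
), PHP_EOL;
```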
Anyway, I'm trying to see how I could structure this; any ideas would help.

Here is my suggestion on this. This might be helpful.
You can do it in two steps:
First Step (Data collection)
1) Make a table that contains the keywords and PostId (here I am assuming each post has a unique ID), along with any other required fields.
2) To build the keyword list, parse each post and pick out the nouns and verbs.
3) Each keyword/PostId pair gets its own mapping row in the table.
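A rough sketch of the data-collection step, under two assumptions: posts arrive as an array of id => text, and a short stop-word list stands in for the noun/verb filtering (PHP has no built-in part-of-speech tagger, so that part is swapped for a simpler filter). The nested array plays the role of the keyword/PostId table, and the occurrence count doubles as a weight. buildKeywordIndex() is a hypothetical name.

```php
<?php
// Sketch of the data-collection step. A stop-word filter replaces the
// noun/verb extraction described above (an assumption, not POS tagging).
// The result mirrors a hypothetical table post_keyword(keyword, post_id, weight).

const STOP_WORDS = ['a', 'an', 'and', 'are', 'for', 'in', 'is', 'of', 'the', 'to'];

/** Returns [keyword => [postId => occurrences in that post]]. */
function buildKeywordIndex(array $posts): array
{
    $index = [];
    foreach ($posts as $postId => $text) {
        preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
        foreach ($m[0] as $word) {
            // Skip stop words and very short tokens.
            if (in_array($word, STOP_WORDS, true) || strlen($word) < 3) {
                continue;
            }
            $index[$word][$postId] = ($index[$word][$postId] ?? 0) + 1;
        }
    }
    return $index;
}
```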
Second Step (Data Extraction)
Now suppose you have to find the posts similar to POST-1.
1) First, find all the keywords stored for POST-1 and store them in a data structure.
2) Now look up each keyword in the table, skipping rows that belong to POST-1. You now have a data structure of PostIds for each keyword. Since one PostId can appear multiple times for a keyword, we can also give each PostId a weight.
3) Suppose POST-1 has 5 keywords. That means you have 5 data structures of PostIds with their weights.
4) Now combine all the data structures. While combining them, if a PostId appears more than once, increase its weight accordingly.
5) Finally, you get the PostId with the most weight. That post will be the closest to POST-1.
Hope this makes sense.
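The data-extraction steps can be sketched with the keyword/PostId mapping held as a nested array keyword => [postId => weight]. Here the weight is assumed to be how often the keyword occurs in that post, and "combining the data structures" is read as summing weights per PostId; that is one interpretation of the answer, not the only one. similarPosts() is a hypothetical name.

```php
<?php
// Sketch of the data-extraction step over an index of
// keyword => [postId => weight]. Weights of other posts are summed
// across all keywords the target post has.

function similarPosts(array $index, int $targetId): array
{
    // Step 1: keywords that belong to the target post.
    $targetKeywords = [];
    foreach ($index as $keyword => $postWeights) {
        if (isset($postWeights[$targetId])) {
            $targetKeywords[] = $keyword;
        }
    }

    // Steps 2-4: for each keyword, add the weights of every other post.
    $scores = [];
    foreach ($targetKeywords as $keyword) {
        foreach ($index[$keyword] as $postId => $weight) {
            if ($postId === $targetId) {
                continue; // skip rows that belong to the target post
            }
            $scores[$postId] = ($scores[$postId] ?? 0) + $weight;
        }
    }

    // Step 5: highest combined weight first (postId => total weight).
    arsort($scores);
    return $scores;
}
```

The first key of the returned array is the closest post; taking the first few keys gives a "related posts" list.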

Related

How to get related content WITHOUT using tags?

I need a way to get related content without using tags, because in my case there are too many tags and they are entered by users (so in most cases they forget to use them).
YouTube does the same thing: if, for example, you are watching a funny video, YouTube shows you other funny videos in the related content.
For instance, if the article's title is "Barack Obama, president of USA, go to Miami", I need to get other articles that contain "Barack Obama", "USA", "president" or "Miami" in the title and, if possible, other articles on the same topic.
This can be very complex to do, so I'm asking for some advice.
One possible solution is to use Zend Lucene.
http://framework.zend.com/manual/1.12/en/zend.search.lucene.html
It's a search engine that runs entirely in PHP. You can use it as a component separate from Zend Framework, and it's fairly easy to implement.
Index all your content. Use the (for some reason undocumented) boost feature to give more weight to the more relevant parts of the content (i.e. title, user tags).
Example here: http://davedash.com/2007/05/29/boosting-terms-in-zend-search-lucene/
Then use the title as a keyword query and display the x highest-scoring results to your users (making sure to filter out the content the user is currently looking at).
For optimization, you could cache the search results per page.
You can tweak the outcomes:
- Which content best describes an item: boost those fields while indexing
- What you use when searching (title, user tags, a combination)
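Zend Lucene handles scoring internally, so the sketch below is neither its API nor its scoring formula. It is only a plain-PHP illustration of the boosting idea: a query term found in a boosted field (title, tags) counts more than one found in the body, and the article being viewed is excluded. The field names, boost values and relatedArticles() are assumptions.

```php
<?php
// Plain-PHP illustration of field boosting (NOT Zend Lucene's API or
// scoring). Field names and boost values are assumptions.

const FIELD_BOOSTS = ['title' => 3.0, 'tags' => 2.0, 'body' => 1.0];

/** Lower-cased word tokens of $text. */
function terms(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return $m[0];
}

/**
 * Score every article against $query (e.g. the current article's title),
 * skip the article being viewed, and return up to $limit ids, best first.
 */
function relatedArticles(array $articles, string $query, int $currentId, int $limit = 5): array
{
    $queryTerms = array_unique(terms($query));
    $scores = [];
    foreach ($articles as $id => $fields) {
        if ($id === $currentId) {
            continue; // don't recommend the article the user is reading
        }
        $score = 0.0;
        foreach (FIELD_BOOSTS as $field => $boost) {
            $counts = array_count_values(terms($fields[$field] ?? ''));
            foreach ($queryTerms as $term) {
                $score += $boost * ($counts[$term] ?? 0);
            }
        }
        if ($score > 0) {
            $scores[$id] = $score;
        }
    }
    arsort($scores);
    return array_slice(array_keys($scores), 0, $limit);
}
```

In practice you would also drop stop words such as "of" and "to" from the query before scoring.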

multiple links per row in mysql

I have a MySQL database in which each row represents an episode of a podcast. I would like to include show notes for each episode and therefore need to be able to extract multiple links per row via PHP.
What would be the best data field to achieve this? I'm thinking that including the links via a linked table may be the only way to do this, but if anybody knows a simpler way I'd love to hear about it.
I would definitely recommend using a new table (podcast_link) because the number of links per podcast is flexible. Adding a text field to the podcast table wouldn't be very efficient due to the parsing of the links when you want to display them.
This will also allow you to e.g. count the number of links per podcast, so you can display "Show related links (4)" and you can add more fields to the links, so that you don't only display the links, but also a title for the link. Especially going forward you might want to add more information per link.
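A sketch of the separate-table approach using PDO. It runs against an in-memory SQLite database (this needs the pdo_sqlite extension) only so that it is self-contained; the table and column names (podcast, podcast_link, title, url) are hypothetical. In MySQL you would typically declare the key as id INT AUTO_INCREMENT PRIMARY KEY instead.

```php
<?php
// Sketch: one row per link in a separate podcast_link table.
// SQLite in memory keeps the example self-contained; names are hypothetical.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE podcast (id INTEGER PRIMARY KEY, title TEXT NOT NULL)');
$db->exec('CREATE TABLE podcast_link (
    id INTEGER PRIMARY KEY,
    podcast_id INTEGER NOT NULL REFERENCES podcast(id),
    title TEXT,
    url TEXT NOT NULL
)');

$db->exec("INSERT INTO podcast (id, title) VALUES (1, 'Episode 1')");
$insert = $db->prepare('INSERT INTO podcast_link (podcast_id, title, url) VALUES (?, ?, ?)');
$insert->execute([1, 'Show notes', 'http://example.com/notes01']);
$insert->execute([1, 'Guest site', 'http://example.com/guest']);

// All links for one episode, in insertion order.
$stmt = $db->prepare('SELECT title, url FROM podcast_link WHERE podcast_id = ? ORDER BY id');
$stmt->execute([1]);
$links = $stmt->fetchAll(PDO::FETCH_ASSOC);

// The per-episode count for a "Show related links (N)" label.
$count = (int) $db->query('SELECT COUNT(*) FROM podcast_link WHERE podcast_id = 1')->fetchColumn();

echo 'Show related links (' . $count . ')' . PHP_EOL;
foreach ($links as $link) {
    printf('<a href="%s">%s</a>' . PHP_EOL,
        htmlspecialchars($link['url'], ENT_QUOTES),
        htmlspecialchars($link['title'], ENT_QUOTES));
}
```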
explode() function:
You can save your links in the database row as a comma-separated list, for example:
http://example.com/podcast01.mp3,http://example.com/podcast02.mp3,http://example.com/podcast03.mp3
Then, when you want to extract them, you can use the explode() function like this:
$links = $row['links']; // the comma-separated links stored in the row
$links_array = explode(",", $links); // now you have an array, and you can access each element individually
For example:
echo $links_array[0]; // output: http://example.com/podcast01.mp3
Good luck!
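If you do go with a comma-separated field, a slightly more defensive sketch uses implode() when saving and explode() when reading, trimming whitespace and dropping empty entries. It assumes no URL contains a comma, which is one reason the separate-table answer is more robust. The function names are hypothetical.

```php
<?php
// Sketch: store links as one comma-separated string and read them back.
// Assumes no URL contains a comma.

/** Join links into the value stored in the row. */
function linksToField(array $links): string
{
    return implode(',', array_map('trim', $links));
}

/** Split the stored value back into a clean, re-indexed array of links. */
function fieldToLinks(string $field): array
{
    $parts = array_map('trim', explode(',', $field));
    // Drop empty entries, e.g. from a trailing comma or an empty field.
    return array_values(array_filter($parts, fn(string $s): bool => $s !== ''));
}
```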

I need to echo more columns from my Mysql

So far, and with great support from SO members, I am on the verge of finishing my music database program, along with all its complexities. As previously suggested, I am using MySQL, PHP, jQuery and the DataTables plugin, which gives great paginated results. All my search results work as intended.
My database holds 15 columns of data in one table (OK for my current needs). I am currently able to POST and ECHO 12 columns of search results within a 900px table.
To finish my project, I also need to show 3 more columns of data that hold longer text (song description (150 chars), producer name (80 chars) and publisher name (80 chars)), which obviously will not fit in the same row of a table this size, even with wrapping.
But how do I output the last 3 columns in a show/hide hidden div, so that users can click on a link and have these 3 pieces of information appear underneath any one row of the 900px table?
I have struggled for hundreds of hours just to get to this final stretch...So I need a final suggestion (or push off a cliff) as to where to look next for this answer...
Thank you in advance for any "easy" to understand suggestions you may have to offer me!!
Since you said that you are using the DataTables plugin, you can use the following example to display lengthy details. Once you click the expand button, it expands that particular row.
http://datatables.net/release-datatables/examples/api/row_details.html
Users don't need (and usually don't care about) all this information. Allow them to configure which columns they can see, and if they choose too many for the width, then it's not your problem.
Create a link in your rightmost column (for example), using an anchor link like this:
<a href="#extra-<?php echo $counter; ?>" class="see-more">See More</a>
In the next table row, put in a hidden <div id="extra-<?php echo $counter; ?>" class="hidden-more-data">..your data here..</div>
You can structure your data any way you like in those elements.
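In CSS, you can hide these elements with .hidden-more-data { display: none; }
Using jQuery, you can attach a delegated click handler to .see-more in this fashion (.live() was deprecated in jQuery 1.7 and removed in 1.9; .on() replaces it):
$(document).on('click', '.see-more', function () {
    var target = $(this).attr('href'); // "#extra-N"; this.href would return the full absolute URL
    $(target).toggle();
    return false;
});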
And various similar possibilities.
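The PHP side of this show/hide approach might look like the sketch below: each song gets a visible row with the "See More" link, followed by a row holding the hidden div with the three long fields. The array keys (title, description, producer, publisher), the two-column layout and renderSongRows() are assumptions. Note that DataTables does not support colspan cells in the table body, so with that plugin the row-details example from the DataTables answer is the safer route; this markup suits a plain table.

```php
<?php
// Sketch: render each song as a visible row plus a hidden details row.
// Array keys and the two-column layout are assumptions; set $colspan to
// the number of visible columns in your table.

function renderSongRows(array $songs, int $colspan = 2): string
{
    $e = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES);
    $html = '';
    foreach (array_values($songs) as $counter => $song) {
        // Visible row: the short columns plus the "See More" link.
        $html .= '<tr><td>' . $e($song['title']) . '</td>'
            . '<td><a href="#extra-' . $counter . '" class="see-more">See More</a></td></tr>' . "\n";
        // Next row: the hidden div whose id matches the link's href.
        $html .= '<tr><td colspan="' . $colspan . '">'
            . '<div id="extra-' . $counter . '" class="hidden-more-data">'
            . 'Description: ' . $e($song['description']) . '<br>'
            . 'Producer: ' . $e($song['producer']) . '<br>'
            . 'Publisher: ' . $e($song['publisher'])
            . '</div></td></tr>' . "\n";
    }
    return $html;
}
```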
Use short headings and show the full headings on mouse-over via the title attribute.
Show a limited number of characters in table cells, and for the detailed view show the full text in pop-up divs or on mouse-over.
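A small sketch of the "limited characters plus mouse-over" idea: the cell shows a shortened value and the full text goes into the title attribute, which browsers display as a tooltip on hover. The 30-character default and shortCell() are arbitrary, hypothetical choices; it relies on the mbstring extension so multi-byte text is not cut mid-character.

```php
<?php
// Sketch: truncate long cell text and keep the full value in the title
// attribute for mouse-over. The default limit is an arbitrary example.

function shortCell(string $text, int $max = 30): string
{
    $full = htmlspecialchars($text, ENT_QUOTES);
    if (mb_strlen($text) <= $max) {
        return '<td>' . $full . '</td>'; // short enough: show as-is
    }
    // Keep $max - 1 characters and add an ellipsis.
    $short = htmlspecialchars(rtrim(mb_substr($text, 0, $max - 1)) . '…', ENT_QUOTES);
    return '<td title="' . $full . '">' . $short . '</td>';
}
```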

getting popular or relevant words from input for use as tags

We all know how tagging on SO works. We make a post, tag it and it helps in searches and is used in folksonomy.
This is what I want to do: instead of forcing people to tag posts, I would like to somehow fetch relevant words from the post to use as tags.
Apart from, say, fetching repeated words, is there a method of getting relevant words from a post? Maybe a language parser that can detect important words?
Please give me your own ideas. It doesn't have to be along the lines I am thinking.
Thanks.
Make a table that is a look-up for your keywords. Then search the text entries against your list for hits.
Since you presumably want your tags generated on-the-fly, you would need a large list of words. Those are easy to find (search on "word list"). Then you can delete words of little interest, or even rank them according to relevance.
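A sketch of the look-up approach: match a post's words against a keyword list and return the most frequent hits as tags. The array stands in for the look-up table, ranking by frequency is one simple choice of "relevance", and suggestTags() and the tag limit are hypothetical.

```php
<?php
// Sketch: suggest tags by matching a post against a keyword look-up list.
// $keywordList stands in for the look-up table; $maxTags is arbitrary.

function suggestTags(string $post, array $keywordList, int $maxTags = 5): array
{
    // array_flip gives constant-time look-ups: keyword => position.
    $known = array_flip(array_map('strtolower', $keywordList));
    preg_match_all('/[a-z0-9]+/', strtolower($post), $m);

    $hits = [];
    foreach ($m[0] as $word) {
        if (isset($known[$word])) {
            $hits[$word] = ($hits[$word] ?? 0) + 1;
        }
    }
    arsort($hits); // most frequent matches first
    return array_slice(array_keys($hits), 0, $maxTags);
}
```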

PHP Detect Pages Genre/Category

I was wondering if there was any sort of way to detect a page's genre/category.
Possibly there is a way to find keywords or something?
Unfortunately I don't have any idea so far, so I don't have any code to show you.
But if anybody has any ideas at all, let me know.
Thanks!
EDIT #Nican
Perhaps there is a way to set up, let's say, 10 categories (Entertainment, Funny, Tech).
Then create keywords for these categories (Funny = Laughter, Funny, Joke, etc.).
Then search through a web page (maybe fetched with cURL) for these keywords and assign it to the right category.
Hope that makes sense.
What you are talking about is basically what Google AdSense and similar services do: it's based on analyzing the content of a page and matching it to topics. Generally, this kind of thing is beyond what you would call simple programming/development and would require significant resources to get it to work "right".
A basic system might work along the following lines:
Get page content
Get the X most commonly used words (omitting words like "and", "or", etc.)
Get words used in headings
Assign weights to different words according to a set of factors (is used in heading, is used in more than one paragraph, is used in link anchors)
Match the filtered words against a database of words related to a specific "category"
If the cumulative score > threshold, classify the site as belonging to that category
Rinse and repeat
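A sketch of the basic system above, under several assumptions: the page HTML has already been fetched, headings are the only weighting factor used (each heading word gets 2 extra points on top of its body count), every word is scored rather than only the top X, script and style contents are not stripped, and categories map to keyword lists like the ones in the question's edit. The function names are hypothetical.

```php
<?php
// Sketch of a keyword-weight classifier. Assumes the HTML is already
// fetched; only headings get extra weight; script/style text is not removed.

const STOP_WORDS = ['a', 'an', 'and', 'in', 'is', 'of', 'or', 'the', 'to'];

/** Replace tags with spaces so adjacent elements don't merge words. */
function textOf(string $html): string
{
    return preg_replace('/<[^>]+>/', ' ', $html);
}

/** word => count, lower-cased, stop words removed. */
function wordCounts(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return array_count_values(array_diff($m[0], STOP_WORDS));
}

/** category => score for every category whose score exceeds $threshold. */
function classifyPage(string $html, array $categories, float $threshold): array
{
    preg_match_all('/<h[1-6][^>]*>(.*?)<\/h[1-6]>/is', $html, $h);
    $headingCounts = wordCounts(textOf(implode(' ', $h[1])));
    $bodyCounts = wordCounts(textOf($html)); // heading words are counted here once too

    $matched = [];
    foreach ($categories as $category => $keywords) {
        $score = 0.0;
        foreach ($keywords as $keyword) {
            $keyword = strtolower($keyword);
            // Heading words get 2 extra points on top of their body count.
            $score += ($bodyCounts[$keyword] ?? 0) + 2 * ($headingCounts[$keyword] ?? 0);
        }
        if ($score > $threshold) {
            $matched[$category] = $score;
        }
    }
    arsort($matched); // best-matching category first
    return $matched;
}
```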
Folksonomy may be a way of accomplishing what you're looking for:
http://en.wikipedia.org/wiki/Folksonomy
For instance, in Drupal they have a Folksonomy module:
http://drupal.org/node/19697 (Note this module appears to be dead, see http://drupal.org/taxonomy/term/71)
Couple that with a tag cloud generator, and you may get somewhere:
http://drupal.org/project/searchcloud
Plus, a little more complexity may be able to derive mapped relationships to other terms, especially if you control the structure of the tagging options.
http://intranetblog.blogware.com/blog/_archives/2008/5/22/3707044.html
EDIT
In general, the type of system you're trying to build relies on unique word values on a page. So you would need to...
Get unique word values from your content (index values or create a bot to crawl your site)
Remove all words and symbols you can't use (at, the, or, and, etc.)
Count the number of times the unique words appear on the page
Add them to some type of datastore so you can call them based on the relationships you're mapping
If you have a root label system in place, associate those values with the word counts on the page (such as a query or derived table)
This is very general, and there are a number of ways this can be implemented/interpreted. Folksonomies are meant to "crowdsource" much of the effort for you, in a "natural way", as long as you have a user base that will contribute.
