I'm writing a blog and I want to show short versions of posts on the main page. I assume native php string functions aren't appropriate here since posts can be large and it would take long to substr all posts in loop. So, what is the common strategy here? I hope the question is clear and specific.
I don't want to shorten posts on client side with JS, that's not an option.
The solution I use is to make another field in db table with posts where I put short version of post, cuted begining or something like that.
It's faster, and better, you don't have to worry about length becouse you control it, there is no problem with evenual html tags used in context, and you can have a bit diffrent text on mainpage
I can think of two options. The first one involves you writing excerpts for your blog posts manually. Doing this, you don't have to worry about PHP at all.
If you do want to go ahead and automatically generate excerpts, I would set a upper character limit and then cut at the end of the sentence nearest the chosen limit. This approach may or may not produce good results depending on how your post is written.
Related
At the risk of getting redirected to this answer (yes, I read it and spent the last 5 minutes laughing out loud at it), allow me to explain this issue, which is just one in a list of many.
My employer asked me to review a site written in PHP, using Smarty for templates and MySQL as the DBMS. It's currently running very slowly, taking up to 2 minutes (with a entirely white screen through it all, no less) to load completely.
Profiling the code with xdebug, I found a single preg_replace call that takes around 30 seconds to complete, which currently goes through all the HTML code and replaces each URL found to its SEO-friendly version. The moment it completes, it outputs all of the code to the browser. (As I said before, that's not the only issue -the code is rather old, and it shows-, but I'll focus on it for this question.)
Digging further into the code, I found that it currently looks through 1702 patterns with each appropriate match (both matches and replacements in equally-sized arrays), which would certainly account for the time it takes.
Code goes like this:
//This is just a call to a MySQL query which gets the relevant SEO-friendly URLs:
$seourls_data = $oSeoShared->getSeourls();
$url_masks = array();
$seourls = array();
foreach ($seourls_data as $seourl_data)
{
if ($seourl_data["url"])
{
$url_masks[] = "/([\"'\>\s]{1})".$site.str_replace("/", "\/", $seourl_data["url"])."([\#|\"'\s]{1})/";
$seourls[] = "$1".MAINSITE_URL.$seourl_data["seourl"]."$2";
}
}
//After filling both $url_masks and $seourls arrays, then the HTML is parsed:
$html_seo = preg_replace($url_masks, $seourls, $html);
//After it completes, $html_seo is simply echo'ed to the browser.
Now, I know the obvious answer to the problem is: don't parse HTML with a regexp. But then, how to solve this particular issue? My first attempt would probably be:
Load the (hopefully, well-formed) HTML into a DOMDocument, and then get each href attribute in each a tag, like so.
Go through each node, replacing the URL found for its appropriate match (which would probably mean using the previous regexps anyway, but on a much-reduced-size string)
???
Profit?
but I think it's most likely not the right way to solve the issue.
Any ideas or suggestions?
Thanks.
As your goal is to be SEO-friendly, using canonical tag in the target pages would tell the search engines to use your SEO-friendly urls, so you don't need to replace them in your code...
Oops ,That's really tough, bad strategy from the beginning , any way that's not your fault,
i have 2 suggestion:-
1-create a caching technique by smarty so , first HTML still generated in 2 min >
second HTMl just get from a static resource .
2- Don't Do what have to be done earlier later , so fix the system ,create a database migration that store the SEO url in a good format or generate it using titles or what ever, on my system i generate SEO links in this format ..
www.whatever.com/jobs/722/drupal-php-developer
where i use 722 as Id by parsing the url to get the right page content and (drupal-php-developer) is the title of the post or what ever
3 - ( which is not a suggestion) tell your client that project is not well engineered (if you truly believe so ) and need a re structure to boost performance .
run
I run a photo website where users are free to enter any tag they like, even tags not used before. As a result, a photo of a tag may sometimes be tagged as "insect" whilst somebody else tags it as "insects".
I'd like to keep the free-tagging capability, yet would like to have a way to filter out such near-duplicates. The total collection of tags is currently at 1,500. My idea is to read all of them from the DB into mem and then run an alghoritm on it that displays "suspects".
My idea of a suspect is that x% of the characters in the string are the same (same char and order), where x is configurable. I could probably code a really inefficient way to do this but I was wondering if there is an existing solution to this problem?
Edit: Forgot to mention: just sorting the tags isn't enough, as that would require me to go through the entire set to find dupes.
There are some flaws in your logic. For example, what happens when the plural of an object is different from the singular (i.e. person vs. people or even candy vs. candies).
If English is the primary language, check out Soundex which allows phonetic matches. Also consider using a crowd-sourced synonym model where users can create links to existing tags.
Maybe the algorithm you are looking for is approximate string matching.
http://en.wikipedia.org/wiki/Approximate_string_matching.
by a given word you can match it to list of words and if the 'distance' is close add it to suspects.
A fast implementation is to use dynamic programming like the Needleman–Wunsch algorithm.
I have made a blog example of this in C# where you can configure the 'distance' using a matrix character lookup file.
http://kunuk.wordpress.com/2010/10/17/dynamic-programming-example-with-c-using-needleman-wunsch-algorithm/
Is "either contains either" fine? You could do a SQL query something like this, if your images are in a database (which would only make sense):
SELECT * FROM ImageTags WHERE INSTR('theNewTag', TagName) > 0 OR INSTR(TagName, 'theNewTag') > 0 LIMIT 1;
If you really want to do this efficiently I would suggest some sort of JavaScript implementation that displays possibilities as the user is typing in a tag that they want. Not only will it save the user time to happily see 5 suggestions as they type. It will automatically stop them from typing "suspects" when "suspect" shows up as a suggestion. That is, of course, unless they really want "suspects" as a point of urgency.
You could load a huge list of words and as the user types narrow them down. I get the feeling that this could be very simplistic esp if you want to anticipate correctly spelled words. If someone misses a letter, they'll probably go back to fix it when they see a list of suggestions that isn't at all what they meant to type. And when they do correctly type a word it'll pop up in the suggestions.
Does anyone know a clever way to create even columns of text using php?
So lets say I have a few paragraphs of text and I want to split this into two columns of even length (not string length, I'm talking even visible length).
At the moment I'm splitting based on word count, which (as you can imagine) isn't working too well. For instance, on one page I have a list (ul li style) which is increasing the line breaks but not the word count. eg: whats happening is that the left column (with the list in it) is visibly longer than the right column (and if there was a list in the right hand column then it would be the same the other way round).
So does anyone have a clever way to split text? For instance using my knowledge of objective c there is a "size that fits" function. I know how wide the columns are going to be, so is there any way to take that, and the string, and work out how high its going to be? Then cut it in half? Or similar?
Thanks
ps: no css3 nonsense please, we're targeting browsers as far back as ie6 (shudder). :)
I know you're looking at a PHP solution but since the number of lines will depend on how it's rendered in the browser, you'll need to use some javascript.
You basically need to know the dimensions of the container the text is in and using the height divided by the text's line-height, you'll get the number of lines.
Here's a fiddle using jQuery: http://jsfiddle.net/bh8ZR/
There is not a lot of information here as to the source data. However, if you know that you have 20 lines of data, and want to split it, why not simply use an array of the display lines, then divide by two. Then you can take the first half of the PHP array and push it into the second column when you hit the limit of the first.
I think you're going to have trouble displaying these columns in a web browser and having a consistent look and feel because you're trying to apply simple programming logic to a visual layout. CSS and jQuery were designed to help layout issues. jQuery does have IE6 compatibility.
I really don't think you're going to find a magic bullet here if you have HTML formatting inside the data you're trying to display. The browser is going to render this based on a lot of variables. Page width, font size, etc. This is exactly why CSS and other layout styles are there, to handle this sort of formatting.
Is there any reason why you're not trying to solve this in the browser instead of PHP? IE6 to me is not a strong enough case not to do this where it belongs.
I am facing a problem on developing my web app, here is the description:
This webapp (still in alpha) is based on user generated content (usually short articles although their length can become quite large, about one quarter of screen), every user submits at least 10 of these articles, so the number should grow pretty fast. By nature, about 10% of the articles will be duplicated, so I need an algorithm to fetch them.
I have come up with the following steps:
On submission fetch a length of text and store it in a separated table (article_id,length), the problem is the articles are encoded using PHP special_entities() function, and users post content with slight modifications (some one will miss the comma, accent or even skip some words)
Then retrieve all the entries from database with length range = new_post_length +/- 5% (should I use another threshold, keeping in mind that human factor on articles submission?)
Fetch the first 3 keywords and compare them against the articles fetched in the step 2
Having a final array with the most probable matches compare the new entry using PHP's levenstein() function
This process must be executed on article submission, not using cron. However I suspect it will create heavy loads on the server.
Could you provide any idea please?
Thank you!
Mike
Text similarity/plagiat/duplicate is a big topic. There are so many algos and solutions.
Lenvenstein will not work in your case. You can only use it on small texts (due to its "complexity" it would kill your CPU).
Some projects use the "adaptive local alignment of keywords" (you will find info on that on google.)
Also, you can check this (Check the 3 links in the answer, very instructive):
Cosine similarity vs Hamming distance
Hope this will help.
I'd like to point out that git, the version control system, has excellent algorithms for detecting duplicate or near-duplicate content. When you make a commit, it will show you the files modified (regardless of rename), and what percentage changed.
It's open source, and largely written in small, focused C programs. Perhaps there is something you could use.
You could design your app to reduce the load by not having to check text strings and keywords against all other posts in the same category. What if you had the users submit the third party content they are referencing as urls? See Tumblr implementation-- basically there is a free-form text field so each user can comment and create their own narrative portion of the post content, but then there are formatted fields also depending on the type of reference the user is adding (video, image, link, quote, etc.) An improvement on Tumblr would be letting the user add as many/few types of formatted content as they want in any given post.
Then you are only checking against known types like a url or embed video code. Combine that with rexem's suggestion to force users to classify by category or genre of some kind, and you'll have a much smaller scope to search for duplicates.
Also if you can give each user some way of posting to their own "stream" then it doesn't matter if many people duplicate the same content. Give people some way to vote up from the individual streams to a main "front page" level stream so the community can regulate when they see duplicate items. Instead of a vote up/down like Digg or Reddit, you could add a way for people to merge/append posts to related posts (letting them sort and manage the content as an activity on your app rather than making it an issue of behind the scenes processing).
I've got a client that requires that an article be displayed in two, sometimes three, columns in Joomla. I am fairly sure they won't be happy with having to edit 3 articles for 3 columns so the splitting would have to be done automatically.
I've done something similar before where it'll split a chunk of HTML into n columns, but have no real idea how to accomplish this within Joomla itself.
Any ideas gratefully recieved!
An alternative approach:
Use Javascript to split up the Article in several column in the browser. Here I could imagine a full-automated approach could work.
Advantages (over the first approach):
As Javascript can know, which height the paragraphs actually have in the browser (unlinke PHP), you could find the optimum split more accurately.
This can be implemented in the template php-File: you tell the template to include the js-File. So it could be made context-dependent,E.g.: If the left column is collapsed (because there are no modules in it), tell the JavaScript-File to initialize to 3 columns, else 2 columns.
However, have in mind that it should rest usable for those who have Javascript disabled.
This doesn't seem to be easy.
At first thought this should be an CSS attribute, but if it exists, it is part of CSS 3.0 and as such only understood by modern browsers (if at all). But I didn't find any way to do this in CSS.
So you actually have to modify your HTML code. I would propose the following:
A Button (editor-xtd plugin) that splits the article into several parts, each one for one column, showing a dotted line in the editor box (similar to the "read-more"-Button).
E.g. it inserts in the article: (you will have to define hr.column in /templates/system/css/editor.css).
A (content) plugin that creates the multiple colum-style,
E.g. replacing the hr-Tag with table or floating divs.
This way, it is half-automized, without mangeling in the Joomla! files but only adding to extensions to it.
the CSS 3 rules for multi-columns are:
-column-width
-column-gap
-column-rule
-column-count
with the vendor label (-moz, -webkit) before.
More info at http://www.css3.info/preview/multi-column-layout/
I would use css and tell the people with Explorer to change browser! (i'm jocking of course)
Otherwise javascript is the way like said before. This script should do (not tested) http://13thparallel.com/archive/column-script/
This should be done through the template, some PHP coding is involved.
One of our clients asked us to do the exact same thing before, and we have done it through template. Note that for very small articles we increased the font in order to split the article into 3 columns.