Some time ago, before converting to WordPress, I wrote a word density counter function that reports how many times each word is used and what percentage of the whole document that represents.
The $min_word_char parameter removes words shorter than that number of characters. The default of 2 removes all single-character words from the count. (There was a good reason for this; not sure what it was.)
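As a rough illustration of that length filter on its own, here is a minimal sketch. The sample sentence and the $kept variable are made up for this example, and it uses a closure rather than anything specific to the listing further down.

```php
<?php
// Hypothetical sample text, not from the original function.
$words = str_word_count('a cat is on a mat', 1);
$min_word_char = 2;

// Keep only the words that are at least $min_word_char characters long.
$kept = array_values(array_filter($words, function ( $word ) use ( $min_word_char ) {
    return strlen($word) >= $min_word_char;
}));

// Both occurrences of the single-character word "a" are dropped.
echo implode(' ', $kept), "\n";
```

Note that strlen() counts bytes, so a word made of multibyte characters may pass the filter with fewer visible characters than $min_word_char.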
Excluded words are left out of the results, but they still count toward the total word count that the percentages are based on. (This was done at the client's request.)
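One consequence is that the reported percentages will not add up to 100% when words are excluded. The arithmetic can be sketched like this; the sample text, the excluded word, and the variable names are assumptions chosen for illustration only.

```php
<?php
// Hypothetical sample data, not from the original function.
$text     = 'the cat and the hat';
$excluded = array('the');

// Every word, excluded or not, counts toward the total.
$all_words   = str_word_count($text, 1);
$total_words = count($all_words); // 5 words

// Excluded words are dropped only from the list that gets reported.
$reported = array_values(array_diff($all_words, $excluded));

// "cat" appears once, so its share is 1 out of all 5 words.
$cat_count   = count(array_keys($all_words, 'cat'));
$cat_percent = number_format(($cat_count * 100) / $total_words, 2) . '%';

echo $cat_percent, "\n"; // prints 20.00%
```

The three reported words come to 20.00% each, 60% in total; the remaining 40% belongs to the two excluded occurrences of "the".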
<?php
function calculate_word_density( $string, $min_word_char = 2, $exclude_words = array() ) {
    // remove all html and php tags from the text
    $string = strip_tags($string);
    // convert all text to lowercase
    $string = mb_strtolower($string);
    // get an array containing all the words found inside $string
    $initial_words_array = str_word_count($string, 1);
    // count the size of the array to get the total words
    $total_words = sizeof($initial_words_array);
    // nothing to measure; this also avoids dividing by zero below
    if ( $total_words === 0 ) {
        return array();
    }
    // replace excluded words with blank
    $new_string = $string;
    foreach( $exclude_words as $filter_word ) {
        // preg_quote() escapes any regex metacharacters in the word
        $new_string = preg_replace('/\b'.preg_quote($filter_word, '/').'\b/i', '', $new_string);
    }
    // get an array without the excluded words
    $words_array = str_word_count($new_string, 1);
    // keep only the words that are >= the minimum word character length
    // (a closure replaces create_function(), which was removed in PHP 8.0)
    $words_array = array_filter($words_array, function( $var ) use ( $min_word_char ) {
        return strlen($var) >= $min_word_char;
    });
    // remove any duplicate words from the array
    $unique_words_array = array_unique($words_array);
    $density = array();
    foreach( $unique_words_array as $key => $word ) {
        // count occurrences of the word in the full (unfiltered) text
        preg_match_all('/\b'.preg_quote($word, '/').'\b/i', $string, $out);
        $count = count($out[0]);
        $percent = number_format((($count * 100) / $total_words), 2);
        $density[$key]['word'] = $word;
        $density[$key]['count'] = $count;
        $density[$key]['percent'] = $percent.'%';
    }
    // sort by count, ascending; a closure avoids redeclaring a named cmp()
    // function (a fatal error) when this function is called more than once
    usort($density, function( $a, $b ) {
        if ( $a['count'] == $b['count'] ) {
            return 0;
        }
        return ($a['count'] > $b['count']) ? 1 : -1;
    });
    return $density;
}
?>
strip_tags() ● mb_strtolower() ● str_word_count() ● sizeof() ● preg_replace() ● array_filter() ● create_function() ● strlen() ● array_unique() ● preg_match_all() ● count() ● number_format() ● usort()
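The function returns its rows rather than printing them, sorted by count from lowest to highest. A minimal sketch of turning that result into output, most frequent word first, might look like the following. The $density rows here are hypothetical stand-ins in the same shape as the return value, and the line format is just one possible choice.

```php
<?php
// Hypothetical rows in the shape calculate_word_density() returns,
// already sorted by count in ascending order.
$density = array(
    array('word' => 'hat', 'count' => 1, 'percent' => '20.00%'),
    array('word' => 'cat', 'count' => 2, 'percent' => '40.00%'),
);

// Flip the order so the most frequent word comes first.
$most_frequent_first = array_reverse($density);

$lines = array();
foreach ( $most_frequent_first as $row ) {
    // Build one "word: count (percent)" line per row.
    $lines[] = sprintf('%s: %d (%s)', $row['word'], $row['count'], $row['percent']);
}

echo implode("\n", $lines), "\n";
```

With real data, the $density assignment would be replaced by a call to calculate_word_density() on the text to analyze.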