* `igraph::pagerank`
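LexRank scores sentences by running PageRank over a sentence-similarity graph. As a rough illustration of what `igraph::pagerank` computes (this is a pure-Python sketch with made-up numbers, not the package's R code), a minimal power-iteration PageRank over a weighted similarity matrix:

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a weighted adjacency matrix.

    adj[i][j] is the weight of the edge from j to i (symmetric for
    a sentence-similarity graph). Illustrative sketch only.
    """
    n = len(adj)
    # normalize each node's outgoing weight so columns sum to 1
    out = [sum(adj[j][i] for j in range(n)) or 1.0 for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - damping) / n
            + damping * sum(adj[i][j] * rank[j] / out[j] for j in range(n))
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            rank = new
            break
        rank = new
    return rank

# toy similarity matrix: sentence 0 is strongly tied to both others,
# so it should come out as the most "central" sentence
sim = [[0.0, 0.8, 0.6],
       [0.8, 0.0, 0.1],
       [0.6, 0.1, 0.0]]
scores = pagerank(sim)
```

The most central sentence (here, sentence 0) receives the highest score, which is what makes PageRank a useful sentence ranker.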
* `smart_stopwords` moved to be internal data so that the package doesn't need to be explicitly loaded with `library` to be able to parse
* idf calculation changed from `idf(d, t) = log( n / df(d, t) )` to `idf(d, t) = log( n / df(d, t) ) + 1` to avoid zeroing out common word tfidf values
* `lexRank` and `unnest_sentences`
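A quick worked example of why the `+ 1` matters: a term appearing in every one of `n` documents has `df = n`, so the old idf is `log(1) = 0` and the term's tfidf weight vanishes entirely. A small Python sketch of the two formulas:

```python
import math

def idf_old(n_docs, doc_freq):
    # original: idf(d, t) = log( n / df(d, t) )
    return math.log(n_docs / doc_freq)

def idf_new(n_docs, doc_freq):
    # updated: idf(d, t) = log( n / df(d, t) ) + 1
    return math.log(n_docs / doc_freq) + 1

# a term that appears in every one of 10 documents
old = idf_old(10, 10)  # log(1) = 0, so its tfidf is zeroed out
new = idf_new(10, 10)  # 1.0, so common terms keep a nonzero weight
```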
* `unnest_sentences` and `unnest_sentences_` to parse sentences in a dataframe following tidy data principles
* `bind_lexrank` and `bind_lexrank_` to calculate lexrank scores for sentences in a dataframe following tidy data principles (`unnest_sentences` & `bind_lexrank` can be used on a df in a magrittr pipeline)
* `sentenceSimil`
now calculated using Rcpp; improves speed by ~25%-30% over the old implementation using the `proxy` package
* Added logic to avoid naming conflicts in `proxy::pr_DB` in `sentenceSimil` (#1, @AdamSpannbauer)
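The LexRank paper measures sentence similarity with idf-modified cosine between tf-weighted word vectors. Assuming `sentenceSimil` computes something along these lines (the package's actual implementation is in Rcpp), a pure-Python sketch of the measure:

```python
import math
from collections import Counter

def idf_cosine(tokens_a, tokens_b, idf):
    """Idf-modified cosine similarity between two tokenized sentences.

    Illustrative sketch of the LexRank similarity measure; the idf
    values below are made up for demonstration.
    """
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    shared = tf_a.keys() & tf_b.keys()
    num = sum(tf_a[w] * tf_b[w] * idf.get(w, 0.0) ** 2 for w in shared)
    den_a = math.sqrt(sum((tf_a[w] * idf.get(w, 0.0)) ** 2 for w in tf_a))
    den_b = math.sqrt(sum((tf_b[w] * idf.get(w, 0.0)) ** 2 for w in tf_b))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

idf = {"cat": 1.5, "sat": 1.2, "mat": 1.8, "dog": 1.5}
s = idf_cosine(["cat", "sat"], ["cat", "sat"], idf)  # identical sentences
```

Identical sentences score 1.0 and sentences with no shared terms score 0.0, so the values plug directly into a similarity matrix for the pagerank step.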
* Added a check and error for cases where no sentences are above the threshold in `lexRankFromSimil` (#2, @AdamSpannbauer)
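The failure mode behind that check: LexRank keeps only sentence pairs whose similarity exceeds a threshold, and if nothing survives there is no graph left to rank. A hypothetical sketch of the guard (`threshold_graph` and the numbers are illustrative, not the package's API):

```python
def threshold_graph(sim, threshold=0.2):
    """Keep only edges above the similarity threshold.

    Raises an error when no pair survives, mirroring the check
    described above. Hypothetical helper, not lexRankr's API.
    """
    edges = [(i, j, w)
             for i, row in enumerate(sim)
             for j, w in enumerate(row)
             if i < j and w > threshold]
    if not edges:
        raise ValueError("no sentence pairs exceed the similarity threshold")
    return edges

sim = [[0.0, 0.8, 0.6],
       [0.8, 0.0, 0.1],
       [0.6, 0.1, 0.0]]
edges = threshold_graph(sim, threshold=0.2)
```

With `threshold=0.9` the same matrix would yield no edges and raise the error instead of silently returning an empty graph.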
* `tokenize` now has stricter punctuation removal: removes all non-alphanumeric characters as opposed to removing only `[:punct:]`
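One way the two rules differ is non-ASCII punctuation: `[:punct:]` targets the ASCII punctuation characters, while stripping everything non-alphanumeric also removes curly quotes, ellipses, currency signs, and similar symbols. A Python approximation of the two behaviors (the package does this in R; the character-class ranges below emulate POSIX `[:punct:]`):

```python
import re

text = "it’s £5… (e-mail me)"

# emulate removing only ASCII [:punct:] — the ranges !-/ :-@ [-` {-~
# cover exactly the ASCII punctuation characters
punct_stripped = re.sub(r"[!-/:-@\[-`{-~]", " ", text)

# stricter rule: drop every character that is not alphanumeric or whitespace
alnum_stripped = re.sub(r"[^a-zA-Z0-9\s]", " ", text)
```

`punct_stripped` still contains `’`, `£`, and `…`; `alnum_stripped` removes them as well, which is the behavior change described above.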