5. The world is full of unstructured, text-rich
data: everything from emails to
customer tweets.
The information buried in all that
text holds the potential to deliver
valuable business insights.
6. Text analytics is the practice of using
technology to gather, store and mine
textual information for hidden signals
that can be used to inform smarter
business decisions.
8. Many types of organizations are
experiencing explosive growth in their
unstructured enterprise data.
At the same time, they have access to
external sources of data such as social
media, blogs, and mobile data.
9. Until now, much of this information
passed through the organization virtually
unanalyzed. Today, new tools for
handling large amounts of complex data
make it easier to squeeze value from
such unlikely sources.
18. Part-of-speech
tagging explanation
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential “there”
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
nltk.help.upenn_tagset()  # prints the full tag set
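A small sketch of how tagger output can be read against the tag list above. The tagged tokens here are hand-written and hypothetical; they only mimic the list of (word, tag) pairs that a tagger such as `nltk.pos_tag` returns. The `TAG_DESCRIPTIONS` dictionary and the `describe` helper are illustrative names, not part of NLTK.

```python
# Descriptions for a subset of the tags listed on the slide.
TAG_DESCRIPTIONS = {
    "CC": "Coordinating conjunction",
    "CD": "Cardinal number",
    "DT": "Determiner",
    "IN": "Preposition or subordinating conjunction",
    "JJ": "Adjective",
    "MD": "Modal",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
}


def describe(tagged_tokens):
    """Attach a human-readable description to each (word, tag) pair."""
    return [
        (word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag"))
        for word, tag in tagged_tokens
    ]


# Hypothetical, hand-tagged example (not produced by a real tagger).
tagged = [
    ("two", "CD"),
    ("angry", "JJ"),
    ("customers", "NNS"),
    ("and", "CC"),
    ("a", "DT"),
    ("blog", "NN"),
]

described = describe(tagged)
for word, tag, meaning in described:
    print(f"{word}\t{tag}\t{meaning}")
```

With NLTK installed and its tagger data downloaded, the hand-written `tagged` list could be replaced by the output of a real tagger.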
23. Document similarity
detection
Tf-idf stands for term frequency-inverse document frequency, and the tf-idf weight is
a weight often used in information retrieval and text mining. This weight is a statistical
measure used to evaluate how important a word is to a document in a collection or
corpus.
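A minimal sketch of the weighting described above, assuming one common variant: tf is the term count divided by document length, and idf is the natural log of (number of documents / number of documents containing the term). Other variants exist, and the slide does not say which one is meant. The toy corpus, the `tf_idf` function, and the `cosine` helper are hypothetical; cosine similarity is used here as one plausible way to compare documents by their tf-idf weights.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "the customer liked the product".split(),
    "the customer returned the product".split(),
    "the blog post praised the service".split(),
]

weights = tf_idf(corpus)
print(weights[0])
print(cosine(weights[0], weights[1]), cosine(weights[0], weights[2]))
```

In this toy corpus "the" appears in every document, so its idf is log(1) = 0 and it contributes nothing, while a word found in only one document gets the largest weight for its count. The first two documents share weighted terms and so score higher against each other than against the third.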
27. “Market and product research”
“Social CMS”
1.97 billion social network users
“Customer profiling / analytics”
70% of marketers used Facebook to gain
6.7 million people blog on blogging sites