Python NLTK

90% of the world's data was generated over the last two years

A common Internet user creates visual and textual content:
Instagram
Flickr
Vscocam
Facebook
Tumblr
Blogger
Twitter
Facebook
Emails
Customer Reviews

The world is full of unstructured, text-rich
data: everything from emails to
customer tweets.
The information buried in all that
text holds the potential to deliver
valuable business insights.

Text analytics is the practice of using
technology to gather, store and mine
textual information for hidden signals
that can be used to inform smarter
business decisions

An explosion of unstructured data

Many types of organizations are
experiencing explosive growth in their
unstructured enterprise data.
At the same time, they have access to
external sources of data such as social
media, blogs, and mobile data.

Until now, much of this information
passed through the organization virtually
unanalyzed. Today, new tools for
handling large amounts of complex data
make it easier to squeeze value from
such unlikely sources.

sentiment analysis
spam filtering
text categorization
topic detection
keyword frequency
plagiarism detection
document similarity
phrase extraction

Natural Language Toolkit (NLTK)
A leading platform for building
Python programs to work with
human language data

sentence and word tokenization
text classification
corpora
parsing
clustering
part-of-speech tagging
text stemming
and much more...
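
A few of the features listed above can be tried in a handful of lines. The following is a minimal sketch, assuming NLTK is installed; the sample sentence is made up for illustration, and `wordpunct_tokenize` is used here only because it works without downloading extra NLTK data (the `word_tokenize` and `sent_tokenize` helpers need a tokenizer model to be downloaded first).

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Made-up sample text, for illustration only.
text = "Customers loved the new phones. Customers are loving the phone cameras!"

# Word tokenization: split into word and punctuation tokens,
# then keep lowercase alphabetic tokens only.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Stemming: reduce each token to its Porter stem so that inflected
# forms such as "loved" and "loving" are counted together.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# Keyword frequency: FreqDist behaves like a counter over the stems.
freq = FreqDist(stems)
print(freq.most_common(3))
```

Counting stems rather than raw tokens is a design choice: it merges inflected forms at the cost of sometimes producing stems that are not dictionary words.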

Part-of-speech tagging explanation
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential "there"
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
nltk.help.upenn_tagset()  # prints descriptions of all tags in the tag set
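
To show what tagged output looks like with the tags above, here is a minimal sketch built on NLTK's `RegexpTagger` with a `DefaultTagger` backoff. The patterns and the sample phrase are hand-written assumptions for illustration, not a serious tagger; in practice the pretrained `nltk.pos_tag` function is the usual choice, but it needs a tagger model to be downloaded first.

```python
from nltk.tag import DefaultTagger, RegexpTagger

# Illustrative hand-written patterns, tried in order; the first match wins.
patterns = [
    (r"^[0-9]+$", "CD"),                        # cardinal number
    (r"^(the|a|an)$", "DT"),                    # determiner
    (r"^(and|or|but)$", "CC"),                  # coordinating conjunction
    (r"^(can|could|will|would|must)$", "MD"),   # modal
    (r"^.*s$", "NNS"),                          # crude guess: trailing "s" is a plural noun
]

# Tokens that match no pattern fall back to the default tag "NN".
tagger = RegexpTagger(patterns, backoff=DefaultTagger("NN"))

tokens = "the 2 cats and a dog".split()
tagged = tagger.tag(tokens)  # list of (token, tag) pairs
print(tagged)
```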

Text classification algorithms in NLTK
Naive Bayes
Maximum Entropy
Decision Tree
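
As a rough illustration of the first algorithm, the sketch below trains NLTK's `NaiveBayesClassifier` on four made-up review snippets. The training texts, the labels "pos"/"neg", and the `bag_of_words` helper are assumptions for this example only; a real classifier needs far more labeled data.

```python
from nltk import NaiveBayesClassifier


def bag_of_words(text):
    """Hypothetical feature extractor: mark every word in the text as present."""
    return {word: True for word in text.lower().split()}


# Tiny made-up training set of (feature dict, label) pairs.
train_set = [
    (bag_of_words("great product love it"), "pos"),
    (bag_of_words("excellent quality great value"), "pos"),
    (bag_of_words("terrible product hate it"), "neg"),
    (bag_of_words("poor quality terrible support"), "neg"),
]

classifier = NaiveBayesClassifier.train(train_set)

# Classify a new, unseen snippet.
label = classifier.classify(bag_of_words("great value"))
print(label)
```

`MaxentClassifier` and `DecisionTreeClassifier` in NLTK are trained on the same (feature dict, label) format, so the feature extraction step can be reused when comparing them.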

Sentiment analysis
https://github.com/pumpurs/SentimentWordsLV/

Document similarity detection
Tf-idf stands for term frequency-inverse document frequency, and the tf-idf weight is
a weight often used in information retrieval and text mining. This weight is a statistical
measure used to evaluate how important a word is to a document in a collection or
corpus.
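
The weighting described above can be sketched in plain Python. This is one common variant, assumed here for illustration: term frequency is the term's count divided by the document length, and inverse document frequency is the natural log of (number of documents / number of documents containing the term). The function names and the three sample documents are made up; documents are then compared by the cosine of the angle between their tf-idf vectors.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up sample documents, already split into tokens.
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock markets fell sharply today".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # the two cat sentences share terms
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

A term that appears in every document gets an idf of zero under this variant, so it contributes nothing to the similarity score.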

“Market and product research”
“Social CMS”
1.97 billion social network users
“Customer profiling / analytics”
70% of marketers used Facebook to gain
6.7 million people blog on blogging sites

pumpurs.alberts@gmail.com
Big Data, Startups, Text Analysis, Internet of Things, Web Development
