Slides from a presentation about Text::Perfide::BookCleaner given at PtPW2011. T::P::BC is a Perl module created to clean books in plain text format, making them suitable for further automatic text processing activities.
Cleaning plain text books with Text::Perfide::BookCleaner
1. Cleaning plain text books with
Text::Perfide::BookCleaner
André Santos
andrefs@cpan.org
September 23, 2011
2. Introduction Per-Fide
1 Introduction
Per-Fide
Text alignment
Books
2 Text::Perfide::BookCleaner
3 Conclusions, wish list and future work
6. Introduction Per-Fide
Project Per-Fide
Joint venture between the Computer Science
Department and the School of Humanities of
the University of Minho
Portuguese in parallel with six languages:
Español, Russian, Français, Italiano, Deutsch,
English
Build parallel corpora that will establish a
relation between Portuguese and the other 6
languages
7. Introduction Per-Fide
[Parallel] Corpora
Corpora: collections of natural language texts
Parallel corpora: collections of natural language bitexts
Bitext: a pair formed by a text in a given language and its translation into another language, frequently aligned
Alignment: a mapping between the sentences, paragraphs or words of one text and those of the other
12. Introduction Per-Fide
Project Per-Fide
Original texts in the seven languages and their
translations
Two main genres: contemporary fiction
and non-fiction
non-fiction: judicial, journalistic, religious,
technical, ...
fiction: contemporary novels and short
stories
per-fide.di.uminho.pt
18. Introduction Text alignment
Text alignment
Manual or automatic
Paragraph/sentence/word level
Automatic alignment tools/algorithms
generally fall into three categories:
length based: “when two sentences correspond, the
words in them also correspond”
lexical/dictionary based: relies on lexical
information or dictionaries to perform the
alignment
partial similarity (cognates) based: relies on
occurrences of tokens graphically or
otherwise identical (cognates)
19. Introduction Text alignment
Text alignment – Example
Table: Extract of sentence-level alignment performed using
Portuguese and Russian subtitles from the movie Tron.
23. Introduction Books
Books
Obtained directly from publishers or, if in
public domain, from Project Gutenberg and
similar projects
Large variety of formats: PDF, MS Word,
HTML, ebook formats, ...
If not already in plain text, they need to be
converted before the alignment
This is where all the trouble starts!
24. Introduction Books
Book alignment problems
pagination – page numbers, headers, footers, ...
previous text formatting – sub/superscript, bold, italics, ...
sections
paragraphs
translineations and transpaginations
footnotes
text encoding
...
25. Introduction Books
Book alignment problems – Example
(...)
gaiement. Sur le devant s<92>’ouvrait la porte
d<92>’entrée, donnant accès dans la salle commune.
Une légère véranda, qui en proté-
<96>- 86 <96>-
^L geait la partie antérieure contre l<92>’action
des rayons solaires, reposait sur de sveltes bambous.
Le tout était peint d<92>’une fraîche
(...)
La Jangada, Jules Verne
26. Text::Perfide::BookCleaner
1 Introduction
Per-Fide
Text alignment
Books
2 Text::Perfide::BookCleaner
3 Conclusions, wish list and future work
30. Text::Perfide::BookCleaner
First approach
Well-intentioned but:
Too naïve
Big mess
A more sophisticated approach was needed!
32. Text::Perfide::BookCleaner
Architecture
Build a pipeline; each step handles a specific set of
problems.
1 pages
2 sections
3 paragraphs
4 footnotes
5 chars
6 ...
7 commit
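The pipeline idea can be sketched as a chain of functions that each take the text and return it transformed. This is a minimal Python illustration, not the module's actual Perl code: the step names `pages` and `commit` mirror the list above, but their bodies and the `_pb_` mark are placeholder assumptions.

```python
def run_pipeline(text, steps):
    """Apply each cleaning step in order, feeding one step's output into the next."""
    for step in steps:
        text = step(text)
    return text

# Hypothetical placeholder steps, for illustration only.
def pages(text):
    # Replace form feeds (page breaks) with an assumed custom mark.
    return text.replace("\f", "_pb_")

def commit(text):
    # Final step: drop the custom marks added by the earlier steps.
    return text.replace("_pb_", "\n")

cleaned = run_pipeline("first page\fsecond page", [pages, commit])
```

Because every step has the same signature, steps can be added, removed or reordered without touching the others.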
34. Text::Perfide::BookCleaner
Architecture
whenever possible, use ontologies and DSLs
they help organize the work
they make it possible to abstract from the code and
discuss details at a higher level (even with
people from other areas)
35. Text::Perfide::BookCleaner
Pages
Goal
Identify and remove from the text the elements related to
book pagination:
page numbers
headers
footers
page breaks
These elements often degrade the performance of
the aligner.
36. Text::Perfide::BookCleaner
Pages – Example
est vrai qu’il fallait être assez chanceux pour
rencontrer le nabab, et assez audacieux pour
s’emparer de sa personne.
Page 3
^L La maison à vapeur Jules Verne
Le faquir, - évidemment le seul entre tous
que ne surexcitât pas l’espoir de gagner la
prime, - filait au milieu des groupes, s’arrêtant
La Maison à Vapeur, Jules Verne
37. Text::Perfide::BookCleaner
Pages – Algorithm
1 identify page breaks (e.g., ^L)
2 treat the lines near each page break as candidate
headers and footers
3 count the occurrences of each normalized
candidate
4 take as headers and footers the candidates
which occur more often than a threshold value
5 replace everything with a custom mark
6 move all the necessary information to a
standoff file
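Steps 1 to 4 can be sketched as follows. This is a rough Python illustration under stated assumptions, not the module's code: the function names, the choice of looking at only one line on each side of the page break, and the normalization (collapsing whitespace and replacing digit runs with `#`) are all hypothetical.

```python
import re
from collections import Counter

def normalize(line):
    """Collapse whitespace and replace digit runs, so 'Page 3' and 'Page 41' count together."""
    return re.sub(r"\d+", "#", " ".join(line.split()))

def find_headers_footers(text, threshold=1):
    """Return (headers, footers): normalized lines seen next to more than `threshold` page breaks."""
    pages = text.split("\f")  # step 1: ^L is the form feed character
    headers, footers = Counter(), Counter()
    for before, after in zip(pages, pages[1:]):
        last = [l for l in before.splitlines() if l.strip()]
        first = [l for l in after.splitlines() if l.strip()]
        if last:   # step 2: the line just before the break is a footer candidate
            footers[normalize(last[-1])] += 1
        if first:  # ... and the line just after it is a header candidate
            headers[normalize(first[0])] += 1

    def frequent(counts):
        # steps 3-4: keep candidates that occur more often than the threshold
        return {c for c, n in counts.items() if n > threshold}

    return frequent(headers), frequent(footers)

sample = (
    "text of page one\nPage 1\n"
    "\fLa maison à vapeur Jules Verne\ntext of page two\nPage 2\n"
    "\fLa maison à vapeur Jules Verne\ntext of page three\nPage 3\n"
)
headers, footers = find_headers_footers(sample, threshold=1)
```

Counting normalized candidates is what separates a real running header from an ordinary sentence that merely happens to sit next to a page break.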
39. Text::Perfide::BookCleaner
Pages – Example
est vrai qu’il fallait être assez chanceux pour
rencontrer le nabab, et assez audacieux pour
s’emparer de sa personne. _pb2_
Le faquir, - évidemment le seul entre tous
que ne surexcitât pas l’espoir de gagner la
prime, - filait au milieu des groupes, s’arrêtant
La Maison à Vapeur, Jules Verne
40. Text::Perfide::BookCleaner
Sections
Goal
Identify and normalize the divisions between the
several sections of a book (parts, chapters, acts,
scenes, epilogue, afterword, ...)
An ontology was created, containing types of
divisions and subdivisions, in several languages.
41. Text::Perfide::BookCleaner
Sections – Ontology
Example
cap
PT capítulo, cap, capitulo
FR chapitre, chap
EN chapter, chap
NT sec
PT fim
FR fin
EN the_end
BT _alone
This ontology is used to automatically generate
part of the code.
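One way to picture code generated from the ontology is a regular expression built from the terms of an entry. The Python sketch below is only an assumption about what such generated code could look like: the `ONTOLOGY` dictionary layout and `build_pattern` are hypothetical, and only the terms of the `cap` entry come from the slide.

```python
import re

# Hypothetical in-memory form of the "cap" entry shown above.
ONTOLOGY = {
    "cap": {
        "PT": ["capítulo", "cap", "capitulo"],
        "FR": ["chapitre", "chap"],
        "EN": ["chapter", "chap"],
    },
}

def build_pattern(entry):
    """Build a case-insensitive regex matching any term of the entry at the start of a line."""
    terms = {t for language_terms in entry.values() for t in language_terms}
    # Try longer terms first so "chapter" is preferred over "chap".
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(rf"^\s*({alternatives})\b\.?", re.IGNORECASE)

cap_re = build_pattern(ONTOLOGY["cap"])
```

With this approach, supporting a new language or abbreviation means editing the ontology rather than the matching code.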
42. Text::Perfide::BookCleaner
Sections – Example
PRIMEIRA PARTE
FANTINE
^L LIVRO PRIMEIRO
UM JUSTO
O abade Myriel
Em 1815, era bispo de Digne, o reverendo Carlos
Francisco Bemvindo Myriel, o qual contava setenta e
Os Miseráveis, Vitor Hugo
43. Text::Perfide::BookCleaner
Sections – Algorithm
1 Search for potential section divisions:
lines with keywords – capítulo, chapter, Chap.,
Appendix, Table des Matières, ...
pages or lines containing only numbers
roman numbering
...
2 Insert a custom mark immediately before the
section identified
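A minimal Python sketch of this search follows. It is illustrative only: the keyword subset, the four-word limit on keyword lines and the bare `_sec_` mark are assumptions, and the marks the module really produces carry more information (see the example slide).

```python
import re

KEYWORDS = ["parte", "livro", "capítulo", "chapter", "chap"]  # hypothetical subset
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)
NUMBER_RE = re.compile(r"^\s*\d+\s*$")
# Upper-case Roman numeral alone on a line, optionally followed by a dot.
ROMAN_RE = re.compile(
    r"^\s*(?=[IVXLCDM])M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.?\s*$"
)

def is_section_line(line):
    """Heuristic: a short line with a keyword, or a line holding only a number."""
    short_keyword_line = len(line.split()) <= 4 and KEYWORD_RE.search(line)
    return bool(short_keyword_line or NUMBER_RE.match(line) or ROMAN_RE.match(line))

def mark_sections(text):
    """Insert the assumed mark _sec_ on its own line before each detected division."""
    out = []
    for line in text.splitlines():
        if is_section_line(line):
            out.append("_sec_")
        out.append(line)
    return "\n".join(out)

marked = mark_sections(
    "PRIMEIRA PARTE\nFANTINE\nLIVRO PRIMEIRO\nUM JUSTO\nO abade Myriel"
)
```

Restricting keyword matches to short lines is one cheap way to avoid marking ordinary sentences that merely contain a word such as "parte".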
45. Text::Perfide::BookCleaner
Sections – Example
_sec+O:PARTE=PRIMEIRA_
FANTINE
_sec+O:LIVRO=PRIMEIRO_
UM JUSTO
O abade Myriel
Em 1815, era bispo de Digne, o reverendo Carlos
Francisco Bemvindo Myriel, o qual contava setenta e
Os Miseráveis, Vitor Hugo
46. Text::Perfide::BookCleaner
Sections
Identifying the different parts within a bitext:
makes it possible to subsequently compare the two
versions and remove parts which can only be
found in one of them
makes it possible to perform a structural alignment
(see Text::Perfide::BookSync)
47. Text::Perfide::BookCleaner
Paragraphs
Goal
Identify and normalize paragraph notation,
direct speech, etc.
48. Text::Perfide::BookCleaner
Paragraphs – Example
L’hôtesse prit la défense de son curé:
- D’ailleurs, il en plierait quatre comme vous sur
son genou. Il a, l’année dernière, aidé nos gens à
rentrer la paille; il en portait jusqu’à six bottes
à la fois, tant il est fort!
- Bravo! dit le pharmacien. Envoyez donc vos filles
en confesse à des gaillards d’un tempérament pareil!
Moi, si j’étais le gouvernement, je voudrais qu’on
saignât les prêtres une fois par mois.
49. Text::Perfide::BookCleaner
Paragraphs – Example
L’hôtesse prit la défense de son curé:
"D’ailleurs, il en plierait quatre comme vous sur
son genou. Il a, l’année dernière, aidé nos gens à
rentrer la paille; il en portait jusqu’à six bottes
à la fois, tant il est fort! "
"Bravo!" dit le pharmacien. "Envoyez donc vos filles
en confesse à des gaillards d’un tempérament pareil!
Moi, si j’étais le gouvernement, je voudrais qu’on
saignât les prêtres une fois par mois."
50. Text::Perfide::BookCleaner
Paragraphs – Algorithm
paragraph identification is performed by
calculating metrics based on the number of
blank lines and indentation
identification and normalization of direct
speech:
punctuation, paragraph, dash
text in quotes
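The paragraph identification part can be illustrated with a small Python sketch. The metric used here (comparing the number of blank lines with the number of indented lines) is an assumed, simplified stand-in for whatever metrics the module actually computes, and the function names are hypothetical.

```python
import re

def detect_paragraph_style(text):
    """Guess whether paragraphs are marked by blank lines or by indentation."""
    lines = text.splitlines()
    blank = sum(1 for l in lines if not l.strip())
    indented = sum(1 for l in lines if re.match(r"[ \t]+\S", l))
    return "blank_line" if blank >= indented else "indent"

def split_paragraphs(text):
    """Split the text using the detected notation and join each paragraph into one line."""
    if detect_paragraph_style(text) == "blank_line":
        chunks = re.split(r"\n\s*\n", text)
    else:
        # A new paragraph starts at each indented line.
        chunks = re.split(r"\n(?=[ \t]+\S)", text)
    return [" ".join(c.split()) for c in chunks if c.strip()]

by_blank = split_paragraphs("First line\nsecond line\n\nNew paragraph\nmore")
by_indent = split_paragraphs("  First line\nsecond line\n  New paragraph\nmore")
```

Both inputs yield the same two paragraphs, which is the point of normalizing the notation before alignment.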
51. Text::Perfide::BookCleaner
Footnotes
Goal
Identify and remove footnote call marks and
footnote expansions
52. Text::Perfide::BookCleaner
Footnotes – Example
On fit un inventaire de son argent comptant, et on
le mena dans le château que fit construire le roi
Charles V, fils de Jean II, auprès de la rue
Saint-Antoine, à la porte des Tournelles[1].
[1] La Bastille, qui fut prise par le peuple de
Paris, le 14 juillet 1789, puis démolie. B.
^L Quel était en chemin l’étonnement de l’Ingénu!
je vous le laisse à penser. Il crut d’abord
que c’était un rêve.
Oeuvres de Voltaire, Voltaire
53. Text::Perfide::BookCleaner
Footnotes – Algorithm
1 Search for footnote expansions (lines beginning
with <<1>>, [2], ^3, ...)
2 Replace with custom mark
3 Only footnote call marks are left
4 Search again for the same patterns in the
middle of the text
5 Replace with custom mark
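The two-pass idea can be sketched in Python as below. This is an illustration under assumptions, not the module's implementation: an expansion is assumed to run from a line starting with a label up to the next blank line or page break, and the marks `_fne_` and `_fnr_` are simplified versions of the numbered marks shown in the example.

```python
import re

# Footnote labels such as <<1>>, [2] or ^3 (shapes taken from the slide).
LABEL = r"(?:<<\d+>>|\[\d+\]|\^\d+)"

# Steps 1-2: a label at the start of a line opens an expansion.
EXPANSION_RE = re.compile(
    rf"^{LABEL}.*?(?=\n\s*\n|\n\f|\Z)", re.MULTILINE | re.DOTALL
)

def mark_footnotes(text):
    """Mark expansions first, then treat every remaining label as a call mark."""
    text = EXPANSION_RE.sub("_fne_", text)
    # Steps 3-5: only call marks are left, so the same label pattern finds them.
    return re.sub(LABEL, "_fnr_", text)

sample = (
    "la porte des Tournelles[1].\n"
    "[1] La Bastille, qui fut prise par le peuple de\n"
    "Paris, le 14 juillet 1789, puis démolie. B.\n"
    "\fQuel était en chemin"
)
marked = mark_footnotes(sample)
```

Handling expansions first matters: once they are replaced, the same label pattern can no longer confuse an expansion with a call mark.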
55. Text::Perfide::BookCleaner
Footnotes – Algorithm
On fit un inventaire de son argent comptant, et on
le mena dans le château que fit construire le roi
Charles V, fils de Jean II, auprès de la rue
Saint-Antoine, à la porte des Tournelles_fnr29_.
_fne8_
^L Quel était en chemin l’étonnement de l’Ingénu!
je vous le laisse à penser. Il crut d’abord
que c’était un rêve.
Oeuvres de Voltaire, Voltaire
57. Text::Perfide::BookCleaner
Report
Previous steps produce a report
Summarizes what was found, what was
assumed and what was done
Main goal is to make it possible to diagnose what
the program did and to manually correct what
is wrong
58. Text::Perfide::BookCleaner
Report
livros/_FR_15.pdf.txt:
footers=[’( Page) = 241’]
headers=[
"(La maison x{e0} vapeur Jules Verne) = 241"]
ctrL=1;
pagnum_ctrL=241;
sectionsO=2;
sectionsN=30;
word_tr=58;
words=118036;
59. Text::Perfide::BookCleaner
Commit
Final and irreversible step which removes all
the custom marks added by the previous steps
Outputs a cleaned copy of the document
This is the last stage before the alignment (or
any other further processing)
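A commit step of this kind could look roughly like the Python sketch below. The mark shapes in `MARK_RE` are inferred from the examples in these slides (`_pb2_`, `_fnr29_`, `_fne8_`, `_sec+O:PARTE=PRIMEIRA_`); the exact mark grammar and the whitespace tidying are assumptions, not the module's documented behaviour.

```python
import re

# Assumed mark shapes, modelled on the examples shown in the slides.
MARK_RE = re.compile(r"_(?:pb\d*|fnr\d*|fne\d*|sec[^_\s]*)_")

def commit(text):
    """Remove every custom mark and tidy the whitespace left behind."""
    text = MARK_RE.sub("", text)
    text = re.sub(r"[ \t]+\n", "\n", text)  # trailing spaces left by removed marks
    return re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of empty lines

page_result = commit("s’emparer de sa personne. _pb2_\nLe faquir")
note_result = commit("des Tournelles_fnr29_.\n_fne8_\nQuel était")
```

Keeping the marks in the text until this point is what makes the earlier steps reversible and inspectable.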
60. Conclusions, wish list and future work
1 Introduction
Per-Fide
Text alignment
Books
2 Text::Perfide::BookCleaner
3 Conclusions, wish list and future work
62. Conclusions, wish list and future work
Conclusions and wish list
There is no de facto standard format for plain
text books (documents?)
Documents are highly heterogeneous
(provenance, type and quantity, notation
formats, ...)
Hurrah to regular expressions!
20/80 rule applies
63. Conclusions, wish list and future work
Conclusions and wish list
Ontologies and DSLs lead to a better structure
Common pattern:
search text
calculate metrics
perform action accordingly
Report generated at the end should present a
smart summary of what was found and done
67. Conclusions, wish list and future work
Related ongoing work
Text::Perfide::BookPairs: finds repeated books and pairs of books (same book in different languages) within a collection
Text::Perfide::BookSync: uses the section delimitation made by T::P::BC to make a structural alignment
Text::Perfide::CorporaFlow: uses a DSL to guide the corpora preparation workflow (to be done)
Text::Perfide::SciPaperCleaner: cleaner for scientific papers (to be done)
68. Conclusions, wish list and future work
Future work
Standoff annotation – no changes in the
original file until commit
Export to ebook formats – .fb2, .epub, ...
...
70. Conclusions, wish list and future work
CPAN
Is it on CPAN yet?
No, but it will be really, really soon!
Missing
More and better documentation
More and better tests
71. Conclusions, wish list and future work
Questions
o/
André Santos
andrefs@cpan.org